Hardware encoders even without B frames are more efficient than x265 & x264 combined?


birdie
26th October 2014, 12:57
This short video clip (http://wikisend.com/download/472094/27seconds.mp4) (pardon the quality and shaking) was shot on an HTC One S smartphone, and it has an average bitrate of just 9Mbit/sec.

What I find confusing is that neither x264 nor x265 can compress this file more than it's already compressed. The source video has a lot of fine detail, and if I try to preserve it, the file produced by x264/x265 is actually bigger than the source, even though the source has no B frames, while both x264 and x265 use them.

I don't use any fancy options for x264/x265 - I try to compress using ffmpeg (2.4.3)/x265(from GIT) this way:

ffmpeg -i source.mp4 -c:a copy -c:v libx265 -preset veryslow -x265-params crf=20 output.mkv

or in case of x264 (also from GIT):

ffmpeg -i source.mp4 -c:a copy -c:v libx264 -preset placebo -crf 18 output.mkv

Can anyone tell me what I'm doing wrong? Maybe there are some secret options to preserve details? I tried two-pass encoding (average bitrate=4300k) but it didn't help much - the loss of detail is very visible.
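To put the two bitrates mentioned above side by side, here is a rough size calculation in Python. It is only a sketch: the 27-second duration is an assumption taken from the file name "27seconds.mp4", the function name is made up for illustration, and audio and container overhead are ignored.

```python
def size_megabytes(bitrate_kbit_per_s: float, duration_s: float) -> float:
    """Approximate video stream size in megabytes (10**6 bytes)."""
    bits = bitrate_kbit_per_s * 1000 * duration_s
    return bits / 8 / 1_000_000

# Assumed duration: 27 s, inferred only from the file name "27seconds.mp4".
DURATION_S = 27

source_mb = size_megabytes(9000, DURATION_S)  # source at about 9 Mbit/s
target_mb = size_megabytes(4300, DURATION_S)  # two-pass target of 4300k
saving = 1 - target_mb / source_mb            # fraction of the size saved

print(f"source ~{source_mb:.1f} MB, target ~{target_mb:.1f} MB, saving ~{saving:.0%}")
```

Under these assumptions the two-pass target asks the encoder to get by on a little less than half of the source bitrate.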

Atak_Snajpera
26th October 2014, 18:34
Your source footage is already of crappy quality. Just take a look at the total lack of detail in the grass. This proves again that real-time hardware encoders (especially in phones) are terrible. No idea what you want to improve here if the recorded video has no detail.

Blue_MiSfit
26th October 2014, 20:00
Transcoding is always lossy. It's even worse when you start with a poor quality source.

Garbage in, garbage out - as the saying goes :)

Asmodian
27th October 2014, 19:34
Do not blame x264. Your phone got to start with the full raw video off the camera sensor, while x264/x265 had to start with the output of your phone's hardware encoder. You can lose a lot going from the real image to the video your phone made, but you can lose absolutely nothing when re-compressing the phone's video if you want the same quality.

Maybe this thread (http://forum.doom9.org/showthread.php?t=171282) will help explain.

birdie
28th October 2014, 21:41
"Your source footage is already of crappy quality. Just take a look at the total lack of detail in the grass. This proves again that real-time hardware encoders (especially in phones) are terrible. No idea what you want to improve here if the recorded video has no detail."

That's weird, because I can perfectly see the blades of grass in this video. Re-encoding this particular video clip is akin to re-encoding a high bitrate old audio codec to a lower bitrate new audio codec - there's no perceptible audio quality loss, yet the resulting file is substantially smaller. It seems like you're all saying that it's not possible with x264, yet I've successfully re-encoded many video clips (less shaky but encoded using hardware H.264) using x264 and I got a bitrate decrease of 50 to 75% without any visual quality loss. I could also reduce the bitrate of some DVDs by a factor of about three, again using x264.

I smell some serious contradiction here, but I'm not gonna argue with you. Besides, x265 should have helped, right? But even with x265 and a bitrate around 9Mb/sec (equal to the original) I couldn't preserve the level of detail. It seems like I don't understand video encoding and video codecs.

Or someone just cannot give me valid reasons as to why this particular video cannot be re-encoded more efficiently using B frames, better motion vectors, better intra prediction, you name it.

vivan
28th October 2014, 23:11
"Re-encoding this particular video clip is akin to re-encoding a high bitrate old audio codec to a lower bitrate new audio codec"
No, it's more like re-encoding from a terrible low-bitrate codec to an even lower bitrate with a better codec.

"yet I've successfully re-encoded many video clips (less shaky but encoded using hardware H.264) using x264 and I got a bitrate decrease of 50 to 75% without any visual quality loss. I could also reduce the bitrate of some DVDs by a factor of about three, again using x264."
Because they had actual details instead of artifacts.

"Or someone just cannot give me valid reasons as to why this particular video cannot be re-encoded more efficiently using B frames, better motion vectors, better intra prediction, you name it."
The video is full of compression artifacts, and compressing artifacts is hard (especially since a different encoder is used, which makes completely different decisions).

http://en.wikipedia.org/wiki/Generation_loss

Video encoders always work with uncompressed data (raw frames); they don't know how the source was encoded. Artifacts and details look the same to them. So they have to spend more bitrate on reproducing the artifacts than the previous encoder spent on the source video. This leaves less bitrate for actual details, even if the encoder is much more efficient.
This is why re-encoding (to save space) is not always a good idea - it works well only with high-quality sources.

Analogy: we all know that PNG is very effective at compressing screenshots.
Take a screenshot, save it as PNG (1). Then recompress it using JPEG (2). Then try to save it as PNG again.
The result will not only look worse than (1), but will also take more space than (2).
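The first half of that analogy (artifacts are expensive for a lossless compressor) can be tried out with a small Python sketch. Everything here is an assumption made for illustration: the "screenshot" is a synthetic grayscale image made of flat bands, the lossy step is simulated with small seeded random perturbations instead of a real JPEG codec, and plain zlib stands in for PNG, whose compression is DEFLATE-based. It does not show the comparison against the JPEG file size.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

def flat_image() -> bytes:
    """Synthetic 8-bit grayscale 'screenshot': four flat horizontal bands."""
    rows = []
    for y in range(HEIGHT):
        shade = 64 * (y * 4 // HEIGHT)  # 0, 64, 128 or 192
        rows.append(bytes([shade]) * WIDTH)
    return b"".join(rows)

def add_fake_artifacts(pixels: bytes, seed: int = 0) -> bytes:
    """Stand-in for lossy-codec damage: shift every pixel by -2..+2.
    This is NOT a JPEG implementation, only a crude model of its noise."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-2, 2))) for p in pixels)

clean = flat_image()
damaged = add_fake_artifacts(clean)

# Lossless compression of both versions (zlib as a rough PNG stand-in).
clean_size = len(zlib.compress(clean, 9))
damaged_size = len(zlib.compress(damaged, 9))

print("clean:", clean_size, "bytes; with artifacts:", damaged_size, "bytes")
```

The damaged image looks almost identical to the clean one, yet the compressor has to store every perturbed pixel faithfully, so the output grows by a large factor. A video encoder fed the decoded output of another encoder is in a similar position.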

foxyshadis
29th October 2014, 00:16
The #1 problem is the shakiness, and it's exacerbated by the rolling shutter. (That's the effect that causes distortion when the camera shakes.) The shaking causes a significant amount of sharp-blur-sharp transitions, which makes it harder for the encoder to lock on correctly and much harder to code, especially when everything is subtly (or wildly) changing shape every frame. Attempting to keep detail in this case is like attempting to efficiently encode a tree waving in the wind; it's just too random, even if it doesn't appear so to our eyes.

I used DePanStabilize and the video not only looked better, it compressed better as well.

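# Load the clip (needs the L-SMASH Works source plugin)
LSMASHVideoSource("C:\Users\foxy\Downloads\27seconds.mp4")
# Estimate the global (camera) motion between frames
dp = DepanEstimate()
# Compensate the shake; mirror=15 fills the empty borders on all four sides, blur=30 blurs that fill
DePanStabilize(last,dp,dxmax=200,dymax=200,mirror=15,blur=30)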

Personally, I'd then crop it by at least 20-30 pixels on each side, especially the top (and resize it back up to 1920x1080 if required). Not only is compression improved, the ugly mirrored borders are minimized.
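For the crop suggested above, the resulting frame sizes are easy to work out. The Python sketch below assumes the same border is cropped from all four sides of a 1920x1080 frame; the function name is hypothetical and it is not part of any AviSynth plugin.

```python
def crop_report(width: int, height: int, border: int) -> dict:
    """Crop `border` pixels from every side and report the resize needed
    to get back to the original frame size."""
    cropped_w = width - 2 * border
    cropped_h = height - 2 * border
    if cropped_w <= 0 or cropped_h <= 0:
        raise ValueError("border is too large for this frame size")
    return {
        "cropped": (cropped_w, cropped_h),
        "scale_x": width / cropped_w,   # horizontal upscale factor
        "scale_y": height / cropped_h,  # vertical upscale factor
    }

for border in (20, 30):
    r = crop_report(1920, 1080, border)
    print(border, r["cropped"], round(r["scale_x"], 4), round(r["scale_y"], 4))
```

With a 30-pixel border the frame becomes 1860x1020. Because the same number of pixels is removed from a shorter dimension, the vertical scale factor is slightly larger than the horizontal one, so resizing straight back to 1920x1080 stretches the picture a little vertically; cropping proportionally more from the left and right would avoid that.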

birdie
30th October 2014, 11:53
Now it's all become clear. Thank you, vivan and foxyshadis, for the insightful comments!