ffmpeg libschroedinger quality [Archive]


ghelyar

25th May 2009, 14:23

I have heard a lot of good things about Dirac/VC-2 and the schroedinger library. It's supposed to be able to produce very high quality video at a relatively small file size.

However, when I tried to convert some video clips to it with ffmpeg on Windows, I noticed that the quality was pretty poor. I took an h264 source file (I don't know if that makes a difference) and used the same settings twice to make 2 new videos of the first 30 seconds. The only setting I changed was the vcodec, between libschroedinger and libx264.

The Dirac output was a little bit smaller for the same bitrate but the quality is nowhere near as good as x264, particularly when there is movement (a mouth speaking or whatever).

The output file sizes were as follows:
8,185,419 out_drac.mp4
9,632,311 out_x264.mp4

The ffmpeg console output:

C:\>ffmpeg -t 30 -i in.mp4 -acodec copy -vcodec libschroedinger -b 2000k out_drac.mp4
FFmpeg version SVN-r16573, Copyright (c) 2000-2009 Fabrice Bellard, et al.
configuration: --extra-cflags=-fno-common --enable-memalign-hack --enable-pthreads --enable-libmp3lame --enable-libxvid --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libfaac --enable-libgsm --enable-libx264 --enable-libschroedinger --enable-avisynth --enable-swscale --enable-gpl --enable-shared --disable-static
libavutil 49.12. 0 / 49.12. 0
libavcodec 52.10. 0 / 52.10. 0
libavformat 52.23. 1 / 52.23. 1
libavdevice 52. 1. 0 / 52. 1. 0
libswscale 0. 6. 1 / 0. 6. 1
built on Jan 13 2009 03:17:03, gcc: 4.2.4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x9ee6d0]edit list not starting at 0, a/v desync might occur, patch welcome
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'in.mp4':
Duration: 00:28:16.58, start: 0.000000, bitrate: 2186 kb/s
Stream #0.0(eng): Video: h264, yuv420p, 1280x720, 24.00 tb(r)
Stream #0.1(eng): Audio: aac, 44100 Hz, stereo, s16
Stream #0.2(eng): Data: rtp / 0x20707472
Stream #0.3(eng): Data: rtp / 0x20707472
Output #0, mp4, to 'out_drac.mp4':
Stream #0.0(eng): Video: libschroedinger, yuv420p, 1280x720, q=2-31, 2000 kb/s, 24.00 tb(c)
Stream #0.1(eng): Audio: libfaac, 44100 Hz, stereo, s16
Stream mapping:
Stream #0.0 -> #0.0
Stream #0.1 -> #0.1
Press [q] to stop encoding
frame= 721 fps= 13 q=0.0 Lsize= 7994kB time=30.00 bitrate=2182.8kbits/s
video:7505kB audio:469kB global headers:0kB muxing overhead 0.245660%

C:\>ffmpeg -t 30 -i in.mp4 -acodec copy -vcodec libx264 -b 2000k out_x264.mp4
FFmpeg version SVN-r16573, Copyright (c) 2000-2009 Fabrice Bellard, et al.
configuration: --extra-cflags=-fno-common --enable-memalign-hack --enable-pthreads --enable-libmp3lame --enable-libxvid --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libfaac --enable-libgsm --enable-libx264 --enable-libschroedinger --enable-avisynth --enable-swscale --enable-gpl --enable-shared --disable-static
libavutil 49.12. 0 / 49.12. 0
libavcodec 52.10. 0 / 52.10. 0
libavformat 52.23. 1 / 52.23. 1
libavdevice 52. 1. 0 / 52. 1. 0
libswscale 0. 6. 1 / 0. 6. 1
built on Jan 13 2009 03:17:03, gcc: 4.2.4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x3e6b0]edit list not starting at 0, a/v desync might occur, patch welcome
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'in.mp4':
Duration: 00:28:16.58, start: 0.000000, bitrate: 2186 kb/s
Stream #0.0(eng): Video: h264, yuv420p, 1280x720, 24.00 tb(r)
Stream #0.1(eng): Audio: aac, 44100 Hz, stereo, s16
Stream #0.2(eng): Data: rtp / 0x20707472
Stream #0.3(eng): Data: rtp / 0x20707472
Output #0, mp4, to 'out_x264.mp4':
Stream #0.0(eng): Video: libx264, yuv420p, 1280x720, q=2-31, 2000 kb/s, 24.00 tb(c)
Stream #0.1(eng): Audio: libfaac, 44100 Hz, stereo, s16
Stream mapping:
Stream #0.0 -> #0.0
Stream #0.1 -> #0.1
[libx264 @ 0x28e1520]using cpu capabilities: MMX2 SSE2Fast SSSE3 PHADD SSE4.1 Cache64
[libx264 @ 0x28e1520]profile Baseline, level 3.1
Press [q] to stop encoding
frame= 721 fps= 19 q=-2.0 Lsize= 9407kB time=30.00 bitrate=2568.6kbits/s
video:8920kB audio:469kB global headers:1kB muxing overhead 0.176093%
[libx264 @ 0x28e1520]slice I:64 Avg QP:22.44 size: 40737
[libx264 @ 0x28e1520]slice P:657 Avg QP:25.23 size: 9935
[libx264 @ 0x28e1520]mb I I16..4: 53.3% 0.0% 46.7%
[libx264 @ 0x28e1520]mb P I16..4: 17.6% 0.0% 0.0% P16..4: 37.3% 0.0% 0.0% 0.0% 0.0% skip:45.2%
[libx264 @ 0x28e1520]final ratefactor: 27.94
[libx264 @ 0x28e1520]SSIM Mean Y:0.9773081
[libx264 @ 0x28e1520]kb/s:2432.5
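
A quick sanity check on the two logs above: both runs asked for 2000 kbit/s, but the video payloads differ. The sketch below recomputes the average video bitrate from the "video:" sizes and the frame count in the logs. It assumes ffmpeg's "kB" means 1024 bytes and that the clip runs at a constant 24 fps; the helper name is made up for illustration.

```python
# Figures copied from the two ffmpeg logs above.
FRAMES = 721          # "frame= 721"
FPS = 24.0            # "24.00 tb(c)", assumed constant frame rate
TARGET_KBPS = 2000    # "-b 2000k"
VIDEO_KB = {"libschroedinger": 7505, "libx264": 8920}  # "video:" field

def video_kbps(size_kb, frames=FRAMES, fps=FPS):
    """Average video bitrate in kbit/s, assuming 1 kB = 1024 bytes."""
    duration_s = frames / fps
    return size_kb * 1024 * 8 / duration_s / 1000

for codec, size_kb in VIDEO_KB.items():
    rate = video_kbps(size_kb)
    print(f"{codec}: {rate:.0f} kbit/s ({rate / TARGET_KBPS - 1:+.0%} vs target)")
```

Under those assumptions the libschroedinger stream lands at roughly 2047 kbit/s and the libx264 stream at roughly 2432 kbit/s, which agrees with the kb/s:2432.5 that x264 reports itself. So the x264 file got about 19% more video bits, and the two encodes are not quite a same-bitrate comparison.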

I have tried with several different bitrates (always with the same results), as well as other suggestions I have found such as using qscale or sameq (both of which produced massive files that VLC could not play smoothly anyway, even on a relatively powerful computer).

In fact, if I set x264 to lower bitrates I have better quality and smaller file size, which is what I was expecting from dirac in the first place.

I am new to this and my command line arguments are fairly simple. Is there something I am missing here or is dirac just over-hyped?

25th May 2009, 14:43

In fact, if I set x264 to lower bitrates I have better quality and smaller file size, which is what I was expecting from dirac in the first place.
Now that would have been surprising. I'd expect the current Dirac encoders to give similar quality to MPEG-4 ASP or VC-1 at best.

Is there something I am missing here or is dirac just over-hyped?
Maybe dirac-research would be better than Schrödinger. But, is anyone even claiming that these Dirac implementations would beat the best H.264 encoders?

ghelyar

25th May 2009, 15:02

It has been a few months since I did the research on it (roughly when it got VC-2 status and when VLC could first play it - before then there wasn't much point in trying) so I do not have any references to cite at hand but yes pretty much everything I found back then said that Dirac would be better than H.264 and VC-1, was the only one good enough for "Super Hi-Vision" (a.k.a. Ultra HD), needed to be developed because the current codecs were not good enough to transmit HD signals over SD wires, etc.

I know it's still very new but most of the recent improvements have only been speed optimisations in the encoders/decoders. I think the quality should be about the same now as it will be down the line and the elephants dream / big buck bunny 1080p videos I saw months ago were incredibly high quality at very small file sizes.

I would guess that I am doing something wrong. There is so much that can be set in ffmpeg that I do not know how to do or do not understand enough to know how to use it to optimise anything. Perhaps it is better if schroedinger is used directly instead of through ffmpeg, for example.

Asking if it was over-hyped (because the hype is or was definitely there) was really just a way of asking whether other people have the same problems as me or if I am doing something wrong. "Is it just me or ..."

Edit: I have a debian box in which ffmpeg is slightly newer and uses "--enable-libdirac --disable-decoder=libdirac --enable-libschroedinger --disable-encoder=libschroedinger" so I will try on there. This version does not let me put drac in mp4 though even though it works on the windows ffmpeg. VLC 0.98 or 1.0 trunk don't seem to want to play drac at all unless it is in mp4 (even though ed and bbb 1080p were in .ts before)

Dark Shikari

25th May 2009, 16:09

25th May 2009, 16:12

It has been a few months since I did the research on it (roughly when it got VC-2 status and when VLC could first play it - before then there wasn't much point in trying) so I do not have any references to cite at hand but yes pretty much everything I found back then said that Dirac would be better than H.264 and VC-1, was the only one good enough for "Super Hi-Vision" (a.k.a. Ultra HD), needed to be developed because the current codecs were not good enough to transmit HD signals over SD wires, etc.
Well, these claims are hype and Dirac is nowhere near quality dominance in any segment. The only advantage it has is the royalty-free format.

I know it's still very new but most of the recent improvements have only been speed optimisations in the encoders/decoders. I think the quality should be about the same now as it will be down the line and the elephants dream / big buck bunny 1080p videos I saw months ago were incredibly high quality at very small file sizes.
Care to dig up some links? I've only seen unimpressive SD encodes.

Here's a good x264 encode for comparison: http://mirror05.x264.nl/Dark/force.php?file=./x264clips/BigBuckBunny.mkv

I would guess that I am doing something wrong. There is so much that can be set in ffmpeg that I do not know how to do or do not understand enough to know how to use it to optimise anything. Perhaps it is better if schroedinger is used directly instead of through ffmpeg, for example.
I'm quite certain that you won't reach the quality/size level of x264 even if you used the best options of dirac_research or Schrödinger encoders and the defaults of x264.

I just made a few low-bitrate test encodes with dirac_encoder of Dirac 1.0.2 myself and they seemed to support my view as well.

ghelyar

25th May 2009, 18:28

OK well as long as I know it was all a big lie I can move on and encode my media into 264. That was the point in the thread. There are plenty of resources on ffmpeg 264 optimisations that I can read anyway (as well as some presets). I can file dirac away with the similar failure - jpeg 2000.

As for the links, the only ones I can find now are things like http://dirac.kw.bbc.co.uk/download/video/maybefinal/bbb-tr2200.ts and they are nowhere near 1080p (though they still do not have the artefacts that I am getting* in my encodes now). What I saw before was probably made with something like this wiki article (http://diracvideo.org/wiki/index.php/Encode_Big_Buck_Bunny) but I'm not going to go and do it myself to find out.

I know they were 1080p because I used them to test my newest 1080p screen out when I got it. It was about 150mb and you could not really see any artefacts (maybe a little if you looked incredibly carefully at the edges of things when they moved but at a normal viewing distance for a given screen size there was no chance of seeing any). I suppose that probably falls in line with this, as the 264 link you gave is smaller (122mb) and at least as good quality. It could have just been that because it was so high resolution, the artefacts were not enlarged as they would be for a smaller video that is upscaled to a larger display.

Edit:
*to show just how bad the artefacts I am getting at 2000k are, here (http://www.helyar.net/files/schro_sucks.png) is an image (approx. 800kB png) comparing libschroedinger on the left to libx264 on the right, taken from the encodes logged in the first post. Even from that low res bbb I linked to in this post, you can see that I am getting a lot of crap. In the video, the text moves very slowly - it is not stationary - so it is pretty much impossible to read (not that you would anyway, it's just a good example from the sample I am using).

nurbs

28th May 2009, 12:17

If you want a detailed technical explanation of why Dirac (in its current spec-frozen form) is doomed to suck, I can give that, too.
I'd be interested, if it isn't too much trouble.

Last time I tried dirac was more than a year ago. It didn't look good, and my 2.2 GHz Athlon64 (X2) apparently wasn't fast enough to play back 720x576@25fps in realtime. That was before the 1.0 release and schroedinger.

ghelyar

28th May 2009, 12:52

Actually, I'm quite interested in this too. I just assumed you were being facetious :)

Dark Shikari

28th May 2009, 16:44

I'd be interested, if it isn't too much trouble.

Last time I tried dirac was more than a year ago. It didn't look good, and my 2.2 GHz Athlon64 (X2) apparently wasn't fast enough to play back 720x576@25fps in realtime. That was before the 1.0 release and schroedinger.

Actually, I'm quite interested in this too. I just assumed you were being facetious :)

1. Wavelets suck, visually. Nobody has found a way around this yet. One could say they have a high ratio of PSNR to visual quality ;)

2. Wavelets suck for intra coding compared to H.264's intra prediction. Hence why JPEG-2000 comes out worse than JPEG.

3. Dirac has constant-size partitions, either 8x8 or 16x16, because they couldn't find a good way to mix them in OBMC.

4. Dirac has a pretty crappy entropy coder (far fewer contexts than H.264!). I suspect this is why despite being in theory superior to Snow due to having B-frames, it often comes out worse.

5. Dirac's current implementation is not very good to begin with.

6. And it's slow as hell (OBMC + 8-tap motion compensation -> insanely slow).

ghelyar

28th May 2009, 22:48

I thought wavelets were supposed to be better than DCT.

For speed, libschroedinger is definitely much faster than libdirac (at least 5 times faster for me).

Dark Shikari

28th May 2009, 22:53

I thought wavelets were supposed to be better than DCT.
"Supposed to be" according to who?

For speed, libschroedinger is definitely much faster than libdirac (at least 5 times faster for me)
That's because libdirac is the slow reference implementation. I'm surprised it's only 5 times faster.

ghelyar

29th May 2009, 02:55

I thought wavelets were supposed to be better than DCT.
"Supposed to be" according to who?

When I was at university, writing DCTs, FTs and the like in MATLAB, there were a lot of academics (computer scientists, mathematicians, etc) that all shared this view. I'm pretty sure that wavelets are supposed to be an evolution in this transformation to the frequency domain in a similar way to the evolution to elliptic curve cryptography (though nobody has actually proven the security of elliptic curves yet AFAIK). There is still a lot of research going into wavelets so a lot of smart people must see their potential.

A better question would be what do I mean by "better" (because I am not entirely sure myself how they are intended to be better). This may be a better size reduction but be more easily perceived (i.e. look worse to the human eye), for example.

Dark Shikari

29th May 2009, 03:04

benwaggoner

29th May 2009, 03:09

When I was at university, writing DCTs, FTs and the like in MATLAB, there were a lot of academics (computer scientists, mathematicians, etc) that all shared this view. I'm pretty sure that wavelets are supposed to be an evolution in this transformation to the frequency domain in a similar way to the evolution to elliptic curve cryptography (though nobody has actually proven the security of elliptic curves yet AFAIK). There is still a lot of research going into wavelets so a lot of smart people must see their potential.

A better question would be what do I mean by "better" (because I am not entirely sure myself how they are intended to be better). This may be a better size reduction but be more easily perceived (i.e. look worse to the human eye), for example.
Wavelets have been shown to do quite nicely for still image compression; the best wavelet still encoders are in the ballpark of the best DCT still image codecs. Still, an H.264 High Profile I-frame can beat JPEG 2000.

The nut no one has cracked has been to get decent motion estimation with a wavelet codec, while DCT is very good at that: small blocks + block based motion estimation are a great match. A wavelet transform touches a much bigger image area, so motion doesn't map to the transform at all.

So, as near as I can tell, wavelet codecs bet that wavelets > dct for intra by a big enough margin that dct > wavelets for inter won't matter that much.

However, if you think about the average video encode, what's the ratio of bits spent on intra blocks to predicted blocks? And let's say you made the intra blocks 2x as efficient while reducing the efficiency of predicted blocks by 20%. Probably still a lousy deal.

And wavelet motion estimation is more than 20% less efficient than the best dct motion estimation, while wavelet intra coding isn't anywhere near 2x as efficient as the best dct intra coding.
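
The trade-off in that hypothetical can be put into numbers. The sketch below is only a back-of-the-envelope model: it reads "2x as efficient" as intra bits being halved and "20% less efficient" as predicted blocks needing 1/0.8 = 1.25x the bits for the same quality. Both readings, and the function names, are assumptions made for illustration.

```python
def relative_bits(intra_share, intra_gain=2.0, inter_loss=0.20):
    """Total bits relative to the baseline codec (1.0 = no change).

    intra_share: fraction of the baseline bits spent on intra blocks.
    intra_gain:  intra blocks need 1/intra_gain of their old bits.
    inter_loss:  predicted blocks need 1/(1 - inter_loss) of their old bits.
    """
    intra_bits = intra_share / intra_gain
    inter_bits = (1.0 - intra_share) / (1.0 - inter_loss)
    return intra_bits + inter_bits

def break_even_share(intra_gain=2.0, inter_loss=0.20):
    """Intra share at which the trade neither gains nor loses bits."""
    k = 1.0 / (1.0 - inter_loss)
    # Solve s / g + (1 - s) * k = 1 for s.
    return (k - 1.0) / (k - 1.0 / intra_gain)

for share in (0.1, 0.2, 0.3, 0.5):
    print(f"intra share {share:.0%}: total bits x{relative_bits(share):.3f}")
print(f"break-even intra share: {break_even_share():.1%}")
```

With these numbers the trade only pays off once more than a third of the bits go to intra blocks. Below that share the 20% penalty on predicted blocks outweighs the intra saving, which is the "lousy deal" case.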

Dark Shikari

29th May 2009, 03:12

Wavelets have been shown to do quite nicely for still image compression; the best wavelet still encoders are in the ballpark of the best DCT still image codecs. Still, an H.264 High Profile I-frame can beat JPEG 2000.

The nut no one has cracked has been to get decent motion estimation with a wavelet codec, while DCT is very good at that: small blocks + block based motion estimation are a great match. A wavelet transform touches a much bigger image area, so motion doesn't map to the transform at all.

So, as near as I can tell, wavelet codecs bet that wavelets > dct for intra by a big enough margin that dct > wavelets for inter won't matter that much.

However, if you think about the average video encode, what's the ratio of bits spent on intra blocks to predicted blocks? And let's say you made the intra blocks 2x as efficient while reducing the efficiency of predicted blocks by 20%. Probably still a lousy deal.

And wavelet motion estimation is more than 20% less efficient than the best dct motion estimation, while wavelet intra coding isn't anywhere near 2x as efficient as the best dct intra coding.
I'd say it's the opposite; with H.264-style intra prediction, DCTs should beat wavelets (at least conventional ones) in intra coding by a large factor, significantly greater than any difference in inter coding.

Part of the issue is that the large-scale correlation of wavelets is nearly useless at sane bitrates; it only makes a serious difference at very low bitrates... at which point you would be better off downscaling before encoding anyways. JPEG-2000 shows these characteristics; it begins beating JPEG at something like 80x compression, at which point the image is completely destroyed anyways.

ghelyar

29th May 2009, 03:29

Clearly as it is at the moment for video quality, it isn't great. I just don't think that "wavelets suck" is "a detailed technical explanation of why Dirac is doomed to suck".

The problems mentioned are problems with the current implementations. H.264 shares things in common with old codecs that are not very good by today's standards. The difference is just how they are implemented. In time, people have ideas that improve the output. Things like those mentioned are mostly just because people have had a lot longer to solve problems with DCT than with wavelets.

I'm sure wavelets will have their uses at some point. We are just not quite there yet.

Dark Shikari

29th May 2009, 03:32

Clearly as it is at the moment for video quality, it isn't great. I just don't think that "wavelets suck" is "a detailed technical explanation of why Dirac is doomed to suck".
You're omitting the second sentence: "Nobody has found a way around this yet."

Most of the issues with Dirac are inherent issues in any OBMC-wavelet format: the inability to perform spatial intra prediction, the inability to mix partition sizes, the inability to use an adaptive transform size, and so forth.

ghelyar

29th May 2009, 03:34

You're omitting the second sentence: "Nobody has found a way around this yet."

Most of the issues with Dirac are inherent issues in any OBMC-wavelet format: the inability to perform spatial intra prediction, the inability to mix partition sizes, the inability to use an adaptive transform size, and so forth.

Even if "Nobody has found a way around this yet", the very fact that you said "yet" means that there is a possibility of it being solved, which means it isn't doomed.

Regardless of the rest of the sentence, "it sucks" is never a valid reason why "it sucks", for anything.

Dark Shikari

29th May 2009, 03:36

Even if "Nobody has found a way around this yet", the very fact that you said "yet" means that there is a possibility of it being solved, which means it isn't doomed.

Regardless of the rest of the sentence, "it sucks" is never a valid reason why "it sucks", for anything.
"Doomed" refers to the current Dirac specification. If there's a way around it, it will almost surely not be possible under the current specification.

ghelyar

29th May 2009, 03:39

Clearly there is no getting through to some people. You seem to have something personal against all wavelets.

If you cannot provide "a detailed technical explanation", do not offer one. benwaggoner's answer was much more useful.

Dark Shikari

29th May 2009, 03:53

Clearly there is no getting through to some people.
Yes; no matter how many times I explain to you why my statements are of limited scope, you insist on trolling this thread.

You seem to have something personal against all wavelets.
Actually, most of the problems result from OBMC, not wavelets. The two aren't inherently linked.

If you cannot provide "a detailed technical explanation", do not offer one. benwaggoner's answer was much more useful.
Do you need a more detailed technical explanation?

Traditional spatial intra prediction is impossible with overlapped-block motion compensation because it requires spatial reconstruction to be done before the prediction of the next block. However, with OBMC, there is no clear boundary between blocks; you cannot predict from the previous block because the previous block cannot be fully decoded without the current block's information. There is no "row" of decoded pixels to predict from. One possible solution would be to use something like Markov chain intra prediction, but I suspect that still will perform worse when you don't have explicit neighbors to use for prediction.
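
To make the "no clear boundary between blocks" point concrete, here is a simplified 1-D sketch of overlapped-block motion compensation. The block size, overlap width, linear ramp window and all names are assumptions chosen for illustration; they are not Dirac's actual parameters or window shape.

```python
BLOCK = 8        # nominal block size in pixels (hypothetical)
OVERLAP = 4      # width of the blend zone centred on each block boundary (hypothetical)
NUM_BLOCKS = 3   # picture width is BLOCK * NUM_BLOCKS pixels

def ramp(x, boundary):
    """Linear ramp from 0 to 1 across the blend zone around `boundary`."""
    t = (x + 0.5 - (boundary - OVERLAP / 2)) / OVERLAP
    return min(1.0, max(0.0, t))

def weight(block, x):
    """Blend weight of `block` at pixel x; picture edges get no ramp."""
    rise = 1.0 if block == 0 else ramp(x, block * BLOCK)
    fall = 0.0 if block == NUM_BLOCKS - 1 else ramp(x, (block + 1) * BLOCK)
    return rise * (1.0 - fall)

def contributors(x):
    """Blocks whose motion vector influences pixel x."""
    return [b for b in range(NUM_BLOCKS) if weight(b, x) > 0.0]

def obmc_predict(reference, motion_vectors):
    """Blend each block's motion-compensated prediction using its weight."""
    pred = []
    for x in range(BLOCK * NUM_BLOCKS):
        value = 0.0
        for b in contributors(x):
            # Clamp the source position to the reference line.
            src = min(len(reference) - 1, max(0, x + motion_vectors[b]))
            value += weight(b, x) * reference[src]
        pred.append(value)
    return pred

for x in range(BLOCK * NUM_BLOCKS):
    print(x, contributors(x), [round(weight(b, x), 3) for b in contributors(x)])
```

In this toy setup the pixels near a block boundary list two contributing blocks, so their predicted values cannot be finalised until the neighbouring block's motion vector is known. That is the dependency that leaves no finished row of pixels for a conventional intra predictor to read from.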

Variable-size partitions are extremely messy with OBMC because of overlapping. OBMC generally has an explicit overlap distance, something which variable-size partitions tend to mess with. Curiously enough, larger partitions actually tend to be more effective with OBMC than you would think, as larger overlap distances tend to help with compression despite the loss of motion vector precision. I suspect the "best" way to do this is something like TJ Davies' partition tree approach, or more generally, arbitrary-position motion vectors with implicit overlap and grid MC. Of course, all of these approaches make the problem even more computationally intractable, both in terms of optimization and decoding.

The adaptive transform size problem stems from the overlap problem: transforms are overlapped too, so adaptive transform size in the classical sense doesn't seem to make any real sense. You'd probably have to make such adaptivity internal to the transform itself, and I'm not sure how well that would interact with the overlap. This would involve changing the transform into something not-quite-wavelet, which may be the right approach.

Visually, wavelets tend to underperform DCTs for a number of reasons. The wavelet structure tends to greatly increase ringing, something that doesn't hurt PSNR much at all but looks rather bad visually. This is because the wavelet function itself inherently rings (http://upload.wikimedia.org/wikipedia/commons/2/23/Wavelet_-_Morlet.png). DCTs can avoid this kind of ringing with very small transforms, like H.264 uses (4x4); see the "adaptive transform" issue above. Another problem involves energy in the frequency domain. A single coefficient in a DCT can create a texture across the entire transform block, while in a wavelet, a single coefficient generally either creates a low-frequency effect across the block or a single high frequency effect in a small part of the block. This may make more sense in theory, since it should be superior for coding small objects, but since the HVS loves "detail," even if it isn't real, the HVS tends to prefer the DCT approach.
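
The single-coefficient claim can be checked on a toy 1-D block. The sketch below reconstructs an 8-sample block from exactly one non-zero coefficient, once with an inverse DCT and once with an inverse Haar transform. Haar is used only because it is the simplest wavelet to write down; it is an assumption for illustration and not necessarily the filter a Dirac encoder would pick, and the coefficient layout is likewise made up here.

```python
import math

N = 8  # block length (illustrative)

def unit(index, n=N):
    """Coefficient vector that is zero except for a 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(n)]

def inverse_dct(coeffs):
    """Unnormalised inverse DCT: a weighted sum of cosine basis functions."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def inverse_haar(coeffs):
    """Unnormalised inverse Haar transform.

    Layout: [average, coarsest detail, 2 mid details, 4 finest details].
    """
    signal = [coeffs[0]]
    pos = 1
    while len(signal) < len(coeffs):
        details = coeffs[pos:pos + len(signal)]
        pos += len(signal)
        finer = []
        for a, d in zip(signal, details):
            finer.extend([a + d, a - d])  # each value splits into a pair
        signal = finer
    return signal

def support(signal, eps=1e-9):
    """Number of samples a reconstruction actually changes."""
    return sum(1 for v in signal if abs(v) > eps)

print("DCT, coefficient 5:   ", [round(v, 2) for v in inverse_dct(unit(5))])
print("Haar, finest detail:  ", inverse_haar(unit(5)))
print("Haar, coarsest detail:", inverse_haar(unit(1)))
```

The one DCT coefficient produces an oscillation over all eight samples. The finest Haar detail coefficient touches only two neighbouring samples, and the coarsest one gives a single low-frequency step across the block, which matches the two cases described above.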

Is that more clear? I generally don't go into this much depth because most people who ask this question don't have the technical background to understand such a detailed explanation.

avih

29th May 2009, 09:50

Clearly there is no getting through to some people. You seem to have something personal against all wavelets.
...
Usage of Ad hominem (http://en.wikipedia.org/wiki/Ad_hominem) arguments does not make them more convincing and doesn't encourage a proper discussion. Please refrain from such statements and stay on topic. That's a warning.

*.mp4 guy

29th May 2009, 11:36

Mihai Cartoaje

22nd June 2009, 00:41

Until there is a wavelet coder that outperforms AVC intra while being less than two times as inherently computationally costly, wavelets are completely irrelevant to modern compression standards. And this completely ignores all of the problems with translating a wavelet image coder into an efficient video codec, which, as has been stated, will most likely never be solved at competitive computational cost.
I designed a transform that is similar to wavelets and is good for video. It is described here (http://libima.com/video.htm). Comp.compression thread (http://groups.google.com/group/comp.compression/browse_thread/thread/d764ba921c81b49d).