SVC and > 8-bit: any use or demos
benwaggoner
2nd May 2009, 19:08
Working away on my book, I'm trying to get the 1st draft of the H.264 chapter done this weekend (although I've said that before :)).
My basic take on SVC and the > 8-bit High profiles is "interesting, but it's not clear if, where, and when they'll be used in practice." Anyone have any good examples of anyone doing anything with either of them yet (outside of videoconferencing for SVC, and at all with High 4:2:2 and High 4:4:4)?
I'm certainly going to be talking about them in some detail, because they are interesting and I can think of plenty of places I'd like to see them used. I just don't want to make any over-optimistic predictions after the MPEG-2 pt 2 Fine Grain Scalability prediction debacle of the 1st edition.
So, anyone have any good examples, demos, etcetera for either?
On a broader level, if any community members are interested in reviewing a draft of the chapter down the road a bit, PM me. I'd much rather be yelled at for my dumb mistakes before it goes to print and I'm haunted by them for years. I still feel daily shame over my poor description of the zig-zag and Yeltsin Walk patterns...
4:2:2 10 bits will be used for contribution : SDI is 4:2:2 10 bits, so without 10 bits, encoding forces a needless reduction of the bitdepth. Since increased bitdepth doesn't increase bitrate (and since contribution could afford it anyway), it will move toward 10 bits - provided SDI remains long enough. There were some demos at NAB about it.
SVC is trickier. It can do some pretty neat things but I'm not sure they are useful enough to be adopted. Still, it's far more realistic than FGS for mpeg2.
benwaggoner
3rd May 2009, 02:35
4:2:2 10 bits will be used for contribution : SDI is 4:2:2 10 bits, so without 10 bits, encoding forces a needless reduction of the bitdepth. Since increased bitdepth doesn't increase bitrate (and since contribution could afford it anyway), it will move toward 10 bits - provided SDI remains long enough. There were some demos at NAB about it.
So H.264 as a capture/mezzanine codec for 10-bit? Interesting.
Honestly, I've not been a big fan of H.264 for mezzanine; decode and random access are rather slow, and the efficiency gains aren't nearly as significant at those bitrates. There was some good work going on a couple years ago for some simplified studio profiles (no CABAC, no loop filter, 1-2 reference frames max); is this the product of that?
SVC is trickier. It can do some pretty neat things but I'm not sure they are useful enough to be adopted. Still, it's far more realistic than FGS for mpeg2.
Yeah, I must have had two dozen NAB conversations about SVC, which all basically amounted to everyone being interested in theory and keeping an eye on it, but no one actually committed to using it, or having validated it for any particular scenario.
Someone's going to need to just jump in and try to go big, and we'll see what they can pull off. So much of the theoretical value depends on network topology and other stuff.
I'd just love to see a good public demo we could all watch and ruminate on.
Sagekilla
3rd May 2009, 04:12
IIRC, if you're doing lossless compression the loop filter is disabled. Or, if you're encoding at sufficiently high bitrates it gets disabled anyway.
I'd think disabling CABAC would be a bit of a bad idea though. It saves a significant amount of bitrate and depending on how much bitrate you save, could save CPU time too.
Dark Shikari
3rd May 2009, 04:15
I'd think disabling CABAC would be a bit of a bad idea though. It saves a significant amount of bitrate and depending on how much bitrate you save, could save CPU time too.
At QP1 intra-only, the gain of CABAC is less than 3% over CAVLC.
Sagekilla
3rd May 2009, 04:19
Thanks, didn't know that Dark. I know at extremely high QPs CABAC has absurd compression gains over CAVLC, but how quickly does this diminish in the lower QPs?
Dark Shikari
3rd May 2009, 04:20
Thanks, didn't know that Dark. I know at extremely high QPs CABAC has absurd compression gains over CAVLC, but how quickly does this diminish in the lower QPs?
http://akuvian.org/src/x264/entropy.png
It's worse for intra-only.
benwaggoner
3rd May 2009, 05:55
At QP1 intra-only, the gain of CABAC is less than 3% over CAVLC.
And at that kind of lossless data rate, the computational cost of CABAC is going to be pretty enormous.
CABAC is a powerful tool when targeting perf-rich environments of ASICs that support it, but it's also not amenable to parallelization and hence GPU acceleration.
There's plenty of use cases where you'd be better with it off as you're more bound by decode complexity than efficiency.
Like most tools in most codecs, if they made it possible to turn off, there was probably a reason why you'd want it off :).
Dark Shikari
3rd May 2009, 05:59
And at that kind of lossless data rate, the computational cost of CABAC is going to be pretty enormous.
CABAC is a powerful tool when targeting perf-rich environments of ASICs that support it, but it's also not amenable to parallelization and hence GPU acceleration.
CAVLC isn't amenable to parallelization either...
So H.264 as a capture/mezzanine codec for 10-bit? Interesting.
Honestly, I've not been a big fan of H.264 for mezzanine; decode and random access are rather slow, and the efficiency gains aren't nearly as significant at those bitrates.
Ah, but here's the trick : mpeg2 - which is currently used for contribution - doesn't support 10 bits, and jpeg2K, which does support 10bits, isn't efficient enough since it is intra only.
I have also forgotten AVC intra, which already exists and is 4:2:2 10 bits.
CAVLC isn't amenable to parallelization either, so I have no idea what you're talking about. With CAVLC, you can decode one symbol at a time, but that symbol is several bits long. With CABAC, you can decode one bin at a time, and that represents less than one bit.
BTW, slices exist. 4 slices for HD content doesn't reduce coding efficiency, and does allow parallelization.
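To make the slice point concrete, here is a minimal sketch of how one 1080p frame could be divided into 4 slices that separate cores can entropy-decode independently. The function name is hypothetical, and it assumes 16x16 macroblocks and equal-sized raster-order slices; a real encoder is free to place slice boundaries differently.

```python
# Sketch: split the macroblocks of one frame into near-equal slices.
# Assumptions (not from the thread): 16x16 macroblocks, raster-order
# slices, and the hypothetical helper name slice_ranges.

def slice_ranges(width, height, num_slices, mb_size=16):
    """Return one (first_mb, last_mb) index pair per slice."""
    mbs_wide = -(-width // mb_size)   # ceiling division
    mbs_high = -(-height // mb_size)  # 1080 lines round up to 68 rows
    total = mbs_wide * mbs_high
    base, extra = divmod(total, num_slices)
    ranges, start = [], 0
    for i in range(num_slices):
        count = base + (1 if i < extra else 0)  # spread any remainder
        ranges.append((start, start + count - 1))
        start += count
    return ranges

ranges = slice_ranges(1920, 1080, 4)
for n, (first, last) in enumerate(ranges):
    print(f"slice {n}: macroblocks {first}..{last}")
```

Each range can be handed to its own decoding thread, since entropy coding state does not carry across slice boundaries.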
Dark Shikari
3rd May 2009, 06:16
With CAVLC, you can decode one symbol at a time, but that symbol is several bits long. With CABAC, you can decode one bin at a time, and that represents less than one bit.
BTW, slices exist. 4 slices for HD content doesn't reduce coding efficiency, and does allow parallelization.
Of course CABAC is more complex than CAVLC; doesn't mean it's more or less parallelizable though.
benwaggoner
3rd May 2009, 06:16
Ah, but here's the trick : mpeg2 - which is currently used for contribution - doesn't support 10 bits, and jpeg2K, which does support 10bits, isn't efficient enough since it is intra only.
I have also forgotten AVC intra, which already exists and is 4:2:2 10 bits.
I'm a Cineform man for my mezzanine, personally. It can do 10-bit 4:2:2 and even higher, and has served me well for many years. Plus being wavelet, editors with integrated support can do cool tricks like doing subband decoding for fast scrubbing and previews.
The whole Xbox/Zune Marketplace teams have long since standardized on it. I don't know how many petabytes of preprocessed source content they must have archived by now.
Agreed J2K is too complex for mez, but once you're that close to lossless, H.264's complexity doesn't really offer you that much added efficiency, particularly once things are I-frame only. Really, a computationally expensive codec is of decreasing value with increasing bitrates. Waveletish approaches seem very interesting there as the subbands are finally good for something.
Motion JPEG XR may turn into something interesting as well, although its subbands are a lot coarser (4x instead of 2x per step).
Hmm... SVC for mez?
benwaggoner
3rd May 2009, 06:21
Of course CABAC is more complex than CAVLC; doesn't mean it's more or less parallelizable though.
I didn't mean to imply that CAVLC was more parallelizable, but that CABAC is enough of a long pole that the lack of a way to parallelize it can be the gating factor for decode performance at these kinds of data rates. Really, an HD mez could have peaks up to 80 Mbps I imagine. Even with 4 slices and 4 available cores that could be a lot to process.
Certainly not worth it for a 3% gain.
The nice thing about CABAC is you can more afford its complexity at the bitrates where it's worth using.
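A rough back-of-envelope version of that trade-off, as a sketch only. The 80 Mbps peak, 4 slices, and the under-3% CABAC gain are the figures quoted in this thread; the 30 fps frame rate and the even split of bits across slices are assumptions for illustration.

```python
# Back-of-envelope load per core for a high-bitrate mezzanine stream.
# From the thread: 80 Mbps peak, 4 slices on 4 cores, CABAC gain < 3%.
# Assumed here: 30 frames per second and bits split evenly across slices.

PEAK_BITRATE = 80_000_000  # bits per second
NUM_SLICES = 4
FPS = 30                   # assumption, not stated in the thread
CABAC_GAIN_PERCENT = 3     # upper bound quoted for QP1 intra-only

bits_per_core = PEAK_BITRATE // NUM_SLICES
bits_per_slice_per_frame = bits_per_core / FPS
max_saving = PEAK_BITRATE * CABAC_GAIN_PERCENT // 100

print(f"entropy-decode load per core: {bits_per_core / 1e6:.0f} Mbps")
print(f"bits per slice per frame:     {bits_per_slice_per_frame / 1e3:.0f} kbit")
print(f"most CABAC could save:        {max_saving / 1e6:.1f} Mbps")
```

Since a CABAC bin carries less than one bit, as noted earlier in the thread, the serial bin count per core is at least as large as the bits-per-core figure.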
akupenguin
3rd May 2009, 11:55
At QP1 intra-only, the gain of CABAC is less than 3% over CAVLC.
True, but QP0 is often lower bitrate than QP1 (whether or not intra is involved), and CABAC has a bigger effect on QP0.
I expect 10bit QP13 to be better approximated by 8bit QP1 than by 8bit QP13, just nitpicking so that people don't get the wrong idea about high bitrate encodings of ordinary sources.
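One way to read the 10-bit QP13 versus 8-bit QP1 comparison, as a sketch: the H.264 quantizer step doubles every 6 QP, and 10-bit samples span 4 times the 8-bit range, so the same step relative to the signal sits 12 QP lower on the 8-bit scale. The function names are hypothetical, and the smooth power-of-two curve only approximates the standard's tabulated step sizes.

```python
import math

# Sketch of the QP arithmetic behind "10-bit QP13 ~ 8-bit QP1".
# Assumption: step size modeled as doubling every 6 QP (approximation).

def relative_qstep(qp):
    """Quantizer step relative to QP 0."""
    return 2.0 ** (qp / 6.0)

def equivalent_8bit_qp(qp_10bit):
    """8-bit QP with the same step relative to the sample range."""
    extra_bits = 10 - 8
    return qp_10bit - 6 * extra_bits  # each extra bit is worth 6 QP

qp8 = equivalent_8bit_qp(13)
print(f"10-bit QP13 is roughly 8-bit QP{qp8}")

# Scaling the 10-bit step down by 4 gives the matching 8-bit step.
same_step = math.isclose(relative_qstep(13) / 4, relative_qstep(qp8))
print(f"step sizes match after scaling: {same_step}")
```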
benwaggoner
3rd May 2009, 17:10
True, but QP0 intra-only is often lower bitrate than QP1, and CABAC has a bigger effect on QP0.
Interesting. And getting interframe coding out of the mix certainly saves a whole lot of perf. I presume one would have an independent CABAC per frame.
I expect 10bit QP13 to be better approximated by 8bit QP1 than by 8bit QP13, just nitpicking so that people don't get the wrong idea about high bitrate encodings of ordinary sources.
Agreed. 8-bit codecs are great for storing a preprocessed master ready for encoding (definition of mezzanine), but it's much better to have 10-bits for anything that's going to get further video processing, to reduce banding. A mathematically lossy 10-bit can substantially outperform a lossless 8-bit for a source archive.
Dithering to 8-bit can produce great quality with a good algorithm, but you only want to do it once!
One problem I have with much of today's compression workflow, including AVISynth, is that it presumes 8-bit input, and just truncates any LSBs. I'd really be able to use 10-bit as input to my scaling algorithms, levels controls, etcetera, and then have access to a tunable dither on the output side.
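A minimal sketch of the difference between truncating LSBs and dithering, using a flat 10-bit patch whose value falls halfway between two 8-bit codes. The function names are hypothetical, and the plain random dither here is only a stand-in for illustration, not a claim about what any particular tool or a good production dither does.

```python
import random

# Sketch: 10-bit to 8-bit conversion by truncation vs. random dither.
# Assumptions: integer samples in 0..1023, uniform random dither.

def truncate_to_8bit(sample10):
    """Drop the two least significant bits."""
    return sample10 >> 2

def dither_to_8bit(sample10, rng):
    """Add noise spanning one 8-bit step before dropping the LSBs."""
    noisy = sample10 + rng.randrange(4)  # 0..3 in 10-bit units
    return min(255, noisy >> 2)          # clamp at the top code

rng = random.Random(0)   # fixed seed so runs are repeatable
flat = [514] * 10000     # 514 / 4 = 128.5 in 8-bit terms

truncated = [truncate_to_8bit(s) for s in flat]
dithered = [dither_to_8bit(s, rng) for s in flat]

mean_trunc = sum(truncated) / len(truncated)
mean_dith = sum(dithered) / len(dithered)
print(f"truncated mean: {mean_trunc}")
print(f"dithered mean:  {mean_dith:.3f}")
```

Truncation maps the whole patch to a single code, which is where banding comes from on gradients; the dithered output mixes the two neighboring codes so the average level survives.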
Hence I wind up doing some high-end projects in After Effects using 32-bit float, outputting to Cineform 10-bit, and then using a special DirectShow dithering filter to render out my YV12 LAG master for encoding. Results are awesome, but it's rather a lot of work...
There's been some interesting research around using 10-bit as input to the encoder even when going to 8-bit output in order to integrate dithering and banding reduction decisions with quantization. But I haven't seen anything come far enough along to even provide a good demo.
Codec guys (myself often included) often assume the world starts with YV12. But lots of psychovisual processing has happened (explicitly or implicitly) before it even hit YV12.
I wish we could get beyond the "waterfall" media pipeline approach, and let pipelines share bidirectional information. Wouldn't it be handy to know where the motion vectors and field/frame decisions were in the source when deinterlacing? Or even using those as a hint for the initial regions to use for motion search? Or having a noise reduction filter with a strength that could be adaptive to QP of output frames?
Hard to do that kind of stuff in the current way the tools are architected, though.
akupenguin
3rd May 2009, 17:23
Wouldn't it be handy to know where the motion vectors and field/frame decisions were in the source when deinterlacing?
You don't have to break the waterfall model to do that. At least libavcodec exports that info, and some of MPlayer's filters use it (dunno about deinterlacing in particular).
Or having a noise reduction filter with a strength that could be adaptive to QP of output frames?
That... could be tricky, and I fail to imagine any general purpose API of which that is a special case. But yes it would be handy.
benwaggoner
3rd May 2009, 18:43
You don't have to break the waterfall model to do that. At least libavcodec exports that info, and some of MPlayer's filters use it (dunno about deinterlacing in particular).
Yeah, that's more about decoder<>preprocessor integration. Although I could see cases where being able to dynamically increase decoder deblocking/deringing based on the encoder's heuristics could be useful, maybe.
That... could be tricky, and I fail to imagine any general purpose API of which that is a special case. But yes it would be handy.
It'd be something like the codec requesting a new version of the frame with different preprocessing settings. But it'd be hard to do without coupling it closely to a particular denoising algorithm.
Well, I suppose if denoisers offered a single linear strength scale, the codec could iteratively find the lowest denoise setting that hit its internal quality metric or something...
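A sketch of what that iterative search might look like, assuming the denoiser exposes one strength value and that a higher strength never makes the frame fail a target it already met. Both the function and the toy predicate are hypothetical; no real denoiser or encoder API is implied.

```python
# Sketch: bisect for the lowest denoise strength that meets a target.
# Assumptions: a single strength scale in [lo, hi] and a monotonic
# pass/fail callback supplied by the encoder (hypothetical interface).

def lowest_passing_strength(meets_target, lo=0.0, hi=1.0, tol=0.01):
    """Return the smallest passing strength, or None if hi still fails."""
    if not meets_target(hi):
        return None          # even maximum denoising is not enough
    if meets_target(lo):
        return lo            # no denoising needed at all
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if meets_target(mid):
            hi = mid         # mid passes, look for something lower
        else:
            lo = mid         # mid fails, need more denoising
    return hi

# Toy stand-in for "encode the frame and check the quality metric".
result = lowest_passing_strength(lambda strength: strength >= 0.37)
print(f"lowest passing strength: {result:.3f}")
```

In practice each call to the callback would mean re-filtering and re-encoding the frame, so the number of bisection steps, set here by tol, is the real cost.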