Doom9's Forum - View Single Post - How VP9 works, technical details & diagrams

pieter3d · 22nd October 2013, 00:52

Quote:

Originally Posted by benwaggoner

Certainly. But since WPP doesn't have nearly the quality overhead of AVC slices, I can imagine that it might become standard in general encodes.

For my professional use, I generally can tune my encodes to devices and vice versa.

Yes, when you are the encoder it is a terrific feature. Also when you control the whole pipe (like if you are Netflix), then it is great too. But when you just need to decode any Main Profile clip, you can't rely on it sadly.

Quote:

Originally Posted by benwaggoner

Well, I think 8K video is probably farther out than VP10.

How do tiles compare in practice to Slices/WPP as a parallelization mechanism? Say, 128x128 tiles?

Tiles have a coding efficiency impact, since you break the coding dependencies along boundaries, and CABAC adaptation is reset at the start of each tile (same argument goes for slices). If you go to small tiles like 128x128, you will see substantial coding loss due to adaptation being reset many times. Note that 128x128 is actually not allowed by HEVC Main Profile, the smallest allowed tile is 256x64.
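To put a number on "reset many times", here is a rough sketch that counts tiles (one CABAC reset each) for a nominal tile size and checks it against the 256x64 minimum mentioned above. The function name is made up for illustration, and it ignores any other limits a profile or level may place on the tile grid.

```python
import math

# Minimum tile size quoted above for HEVC Main Profile (width x height, luma samples).
MIN_TILE_W, MIN_TILE_H = 256, 64

def tile_grid(pic_w, pic_h, tile_w, tile_h):
    """Return (columns, rows, allowed) for a nominal tile size.

    Hypothetical helper: edge tiles may be smaller than the nominal size,
    and only the nominal size is checked against the minimum.
    """
    cols = math.ceil(pic_w / tile_w)
    rows = math.ceil(pic_h / tile_h)
    allowed = tile_w >= MIN_TILE_W and tile_h >= MIN_TILE_H
    return cols, rows, allowed

print(tile_grid(1920, 1080, 128, 128))    # (15, 9, False)
print(tile_grid(3840, 2160, 1920, 1080))  # (2, 2, True)
```

So 128x128 tiles on a 1080p picture would mean 135 independent CABAC start-ups per picture, versus 4 for a 2x2 grid on 4K.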

I have spoken to some of the HEVC engineers at JCT-VC meetings who have said that in some sequences the distortion introduced by tile boundary independence can actually make the tile boundaries visible (a smart encoder could allocate more bits near the boundaries to compensate though).

HEVC tiles have a major advantage that you do not get with WPP: workload balancing. At the start of each picture, you can resize the tiles, making one wider/taller than another, or adding more. This lets you adapt your encoder threads so that they all have close to 100% duty cycle (they all finish at approximately the same time). You can also add or remove tiles as physical threads free up or get allocated elsewhere due to changing load on a machine.
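As a sketch of what that balancing could look like: suppose the encoder keeps a per-CTB-column cost estimate from the previous picture (an assumption, any complexity measure would do) and picks tile column widths so each tile gets a similar share of the total. The function name and the greedy prefix-sum approach are mine, not something from an actual encoder, and it does not enforce the minimum tile width.

```python
from itertools import accumulate

def balance_tile_columns(ctb_col_costs, n_tiles):
    """Split CTB columns into n_tiles contiguous groups of similar total cost.

    Returns the tile widths in CTB columns. Hypothetical helper; assumes
    n_tiles <= len(ctb_col_costs) and ignores minimum tile width.
    """
    total = sum(ctb_col_costs)
    prefix = list(accumulate(ctb_col_costs))
    boundaries = []
    col = 0
    for k in range(1, n_tiles):
        target = total * k / n_tiles
        # Last column this tile may end on while leaving one column per remaining tile.
        limit = len(prefix) - (n_tiles - k) - 1
        while prefix[col] < target and col < limit:
            col += 1
        boundaries.append(col + 1)
        col += 1
    edges = [0] + boundaries + [len(ctb_col_costs)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Two expensive columns on the left, six cheap ones on the right.
print(balance_tile_columns([3, 3, 1, 1, 1, 1, 1, 1], 2))  # [2, 6]
```

Here the two threads each get a cost of 6, even though one tile is three times as wide as the other.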

An advantage of tiles over slices is that they can be more square, so you have better spatial correlation than long thin slices. Of course you also avoid coding the slice header more than once. You can also use tiles to quickly make a 4k encoder using 4 threads (hw or sw) of a 1080p encoder without very many modifications.
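One detail to watch with the 4x1080p idea: tile boundaries fall on CTB boundaries, and 1080 is not a multiple of the CTB size. A quick sketch of a 2x2 split, assuming 64x64 CTBs and an even split in CTB units (the helper name is hypothetical):

```python
import math

def split_2x2(pic_w, pic_h, ctb=64):
    """Return the four tiles of a 2x2 grid as (x, y, width, height) in luma samples.

    Boundaries are placed on the CTB grid; the right and bottom tiles
    absorb whatever is left of the picture.
    """
    cols = math.ceil(pic_w / ctb)   # picture width in CTBs
    rows = math.ceil(pic_h / ctb)   # picture height in CTBs
    x_edges = [0, (cols // 2) * ctb, pic_w]
    y_edges = [0, (rows // 2) * ctb, pic_h]
    return [(x0, y0, x1 - x0, y1 - y0)
            for y0, y1 in zip(y_edges, y_edges[1:])
            for x0, x1 in zip(x_edges, x_edges[1:])]

for tile in split_2x2(3840, 2160):
    print(tile)
# (0, 0, 1920, 1088)
# (1920, 0, 1920, 1088)
# (0, 1088, 1920, 1072)
# (1920, 1088, 1920, 1072)
```

So under these assumptions the four sub-encoders see 1920x1088 and 1920x1072 regions rather than exactly 1920x1080.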

WPP makes the CABAC engine start each CTB row with the state saved near the start of the CTB row above (after its second CTB), which really means that the adaptation is more spatially relevant. Normally when you finish one row and start the next, the context variables don't receive any special treatment despite the big spatial jump. With WPP, you can actually see some coding gains there, on top of the multithreading advantage. However, you do have to code entry points for each CTB row in the slice header, which can take up a significant chunk of data at low bitrates or in pictures with lots of skip CUs.
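To show the wavefront shape this produces, here is a toy schedule. It assumes every CTB takes one time unit, there is one thread per CTB row, and a CTB can start once its left neighbour and the CTB above and to the right are done (which is what the two-CTB lag between rows amounts to). The function is illustrative only.

```python
def wpp_schedule(rows, cols):
    """Return finish[r][c], the time step at which CTB (r, c) completes.

    Toy model: unit cost per CTB, one thread per CTB row.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = finish[r][c - 1] if c > 0 else 0
            # Row above must be one CTB ahead (clamped at the last column).
            above_right = finish[r - 1][min(c + 1, cols - 1)] if r > 0 else 0
            finish[r][c] = max(left, above_right) + 1
    return finish

print(wpp_schedule(2, 3))            # [[1, 2, 3], [3, 4, 5]]
# 1080p with 64x64 CTBs is 30 columns by 17 rows.
print(wpp_schedule(17, 30)[-1][-1])  # 62
```

In this idealized model a 1080p picture finishes in 62 steps instead of 510 serially, with each row trailing the one above by two CTBs.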