Slices vs Tiles/Wpp parallelization [Archive]

rudyb

16th April 2015, 03:38

I cannot really work out from the spec how slices are used for parallelization.

I can see that the spec talks about the concept of entry points, and I believe these are associated with WPP/Tiles.
In the slice segment header syntax, there are syntax elements associated with these entry points, which give the byte offset of each entry point.
This way the decoder knows exactly where each entry point (subset or substream) starts, and can decode them independently in parallel.
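As a minimal sketch of how those byte offsets could be turned into substream ranges: the slice segment header signals `entry_point_offset_minus1[i]` values, and each one plus one is the size in bytes of a substream within the slice segment data. The function name and the `slice_data_size` parameter below are made up for illustration, and the header is assumed to be parsed already.

```python
from typing import List, Tuple


def substream_byte_ranges(entry_point_offset_minus1: List[int],
                          slice_data_size: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) ranges, one per substream.

    Offsets are relative to the first byte of the slice segment data
    (the bytes that follow the slice segment header).
    """
    ranges = []
    first = 0
    for off_minus1 in entry_point_offset_minus1:
        size = off_minus1 + 1          # the syntax element stores size - 1
        ranges.append((first, first + size - 1))
        first += size
    if first >= slice_data_size:
        raise ValueError("entry point offsets exceed the slice segment data")
    # The last substream has no signalled size; it runs to the end of the data.
    ranges.append((first, slice_data_size - 1))
    return ranges


# Hypothetical numbers: two signalled offsets give three substreams.
print(substream_byte_ranges([99, 49], 200))
```

Each range could then be handed to its own worker, subject to the usual WPP or tile dependencies.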

However, I don't see anything similar for slice segments!

I was hoping to see some parameters in one of the parameter sets (PPS/SPS/VPS) that give the location of the different slice segments, so that the decoder can start decoding independent slice segments in parallel.

But how is this done?
If no information about the slice segment start locations is provided to the decoder anywhere, how can the decoder decode independent slices in parallel?

The closest I found is the syntax element "slice_segment_address", which is provided in the slice segment header. But it only describes that particular slice segment, and it doesn't give the decoder the ability to start different slice segments at the same time. It is just used for slice segment re-synchronization: if we lose a slice, the next slice will start from the correct position, but that is about it.

Can anyone explain how an HEVC decoder knows the locations of the different slice segments, so that it can start decoding them in parallel?
Shouldn't there be some syntax (e.g. in the PPS) that tells how many slice segments there are, with the corresponding address offset for each one?!

Sulik

16th April 2015, 05:00

The slice segment address can tell you exactly where the slice segment starts in the picture (basically just a ctb count in a fixed scan order).
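A rough sketch of that mapping, assuming the address counts CTBs in picture raster-scan order and that the picture width and CTB size are already known from the SPS (the function and parameter names are invented for this example):

```python
from typing import Tuple


def ctb_position(slice_segment_address: int,
                 pic_width_luma: int,
                 ctb_size: int) -> Tuple[int, int]:
    """Map a slice segment address to (ctb_x, ctb_y) in the picture."""
    # Number of CTB columns; the right-most CTB may be only partly inside
    # the picture, hence the ceiling division.
    pic_width_in_ctbs = -(-pic_width_luma // ctb_size)
    ctb_y, ctb_x = divmod(slice_segment_address, pic_width_in_ctbs)
    return ctb_x, ctb_y


# 1920 luma samples wide with 64x64 CTBs gives 30 CTBs per row,
# so address 65 is the 6th CTB (x = 5) of the 3rd row (y = 2).
print(ctb_position(65, 1920, 64))  # (5, 2)
```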

rudyb

16th April 2015, 05:39

But that slice segment address information is inside the slice segment header.
For a continuous bitstream, this means that the decoder cannot start Slice Segment #2 until it is done with the RBSP of Slice Segment #1. That is because it doesn't yet know the start address of Slice Segment #2: that address is in the header of Slice Segment #2, and the decoder is still processing the RBSP of Slice Segment #1.
The only way for the decoder to start slice segments concurrently is to somehow have access to all the slice segment start addresses beforehand. This means that some other syntax, besides the slice segment header, must say how many slice segments there are and what the offset address of each one is...
But the question is, how does the decoder get this information? I cannot see how the decoder gets access to it!

xooyoozoo

16th April 2015, 08:19

Slices are packaged in NAL units. Skipping ahead to the start of the next NAL unit is pretty straightforward: it relies on a fixed byte sequence that acts as a marker (the 0x000001 start code prefix in an Annex B byte stream).
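A minimal sketch of that skip-ahead for an Annex B byte stream, assuming the whole stream is in memory. It looks for the three-byte start code prefix 0x000001 and reads the NAL unit type from the first header byte. The helper names are invented for the example; this is not taken from any particular decoder.

```python
from typing import List, Tuple

START_CODE = b"\x00\x00\x01"


def find_nal_units(stream: bytes) -> List[Tuple[int, int]]:
    """Return (begin, end) byte ranges of NAL units, end exclusive."""
    starts = []
    i = stream.find(START_CODE)
    while i != -1:
        starts.append(i + len(START_CODE))
        i = stream.find(START_CODE, i + len(START_CODE))
    units = []
    for n, begin in enumerate(starts):
        end = starts[n + 1] - len(START_CODE) if n + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros are padding or the
        # leading zero byte of a four-byte start code that follows.
        while end > begin and stream[end - 1] == 0:
            end -= 1
        units.append((begin, end))
    return units


def nal_unit_type(stream: bytes, begin: int) -> int:
    """HEVC nal_unit_type: the 6 bits after forbidden_zero_bit."""
    return (stream[begin] >> 1) & 0x3F


# Tiny hand-made stream: a type-32 (VPS) unit, then a type-19 (IDR) slice.
data = b"\x00\x00\x00\x01\x40\x01\xaa" + b"\x00\x00\x01\x26\x01\xbb\xcc"
for begin, end in find_nal_units(data):
    print(begin, end, nal_unit_type(data, begin))
```

A cheap scan like this, done before any entropy decoding, finds every slice segment NAL unit of a picture, so each one can be handed to a separate worker that parses its own header.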

pieter3d

17th April 2015, 23:50

That applies to Annex B streams. If the stream is instead packaged in a container format like MKV, then you can use the container's own framing to find the other slices.
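As an illustration of the container case, here is a sketch that assumes each NAL unit in a sample is preceded by a big-endian length field, which is how the ISO base media file format (MP4) stores HEVC. The 4-byte default for `length_size` is an assumption; in practice the size comes from the codec configuration record, and other containers may frame NAL units differently.

```python
from typing import List, Tuple


def split_length_prefixed(sample: bytes,
                          length_size: int = 4) -> List[Tuple[int, int]]:
    """Return (begin, end) byte ranges of NAL units, end exclusive."""
    units = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(sample):
            raise ValueError("NAL unit runs past the end of the sample")
        units.append((pos, pos + nal_len))
        pos += nal_len                 # jump straight to the next length field
    return units


# Hand-made sample holding a 3-byte and a 2-byte NAL unit.
sample = (3).to_bytes(4, "big") + b"\x02\x01\xaa" + (2).to_bytes(4, "big") + b"\x26\x01"
print(split_length_prefixed(sample))  # [(4, 7), (11, 13)]
```

No scanning is needed here: the length fields alone are enough to locate every slice segment before decoding any of them.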

Note that by doing parallelism with slices you are not being as efficient as possible. Each slice means you have to repeat the slice header, wasting bits. Also, prediction is broken across slice boundaries, resulting in a loss of compression efficiency. WPP addresses this. Tiles break prediction too, but at least don't have the repeated header overhead.