Quarkboy
27th April 2008, 11:31
I've been reading through the ITU H.264 specification (2005 revision), mainly for fun (yes, I know, I'm sadistic). Now that I have a much better grasp of the exact structure of the format, a few lingering questions remain about the overall purpose of some of the included syntax/semantics:
First, what exactly is the practical use of slices? I understand that it's useful in theory to split up a picture into smaller logical units that aren't necessarily the same size, but then there are all those various slice mappings (section 8.2.2), which, assuming I'm reading the spec right, basically map various sub-rectangles of the picture to slices and use non-standard orderings of the macroblocks within them (i.e. not simply raster ordering but a bunch of other variants).
While I could see the theoretical benefit of using these types of slices in specific cases, the computational complexity of searching for the optimal organization would seem to far outweigh any benefit of using these non-standard scan orders... especially considering that you can already decide intra/inter prediction at sub-macroblock precision...
I know that x264 uses slices to do multi-threaded encoding on multi-core processors, but there seems to be little use for anything but the simplest style of slicing. Why is such a complicated collection part of the specification (and required for all profiles, I might add)?
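To make the question concrete, here is a minimal Python sketch of the two simplest mappings from section 8.2.2 as I read them: type 0 (interleaved) and type 1 (dispersed). The function names and arguments are my own, not spec syntax elements; in particular, `run_lengths` holds the actual run lengths (the spec's `run_length_minus1` plus one) and `num_slice_groups` is `num_slice_groups_minus1 + 1`. The dispersed formula is transcribed from my reading of 8.2.2.2, so treat it as an assumption and check it against the spec.

```python
def interleaved_map(total_map_units, run_lengths):
    """Type 0 sketch: hand out run_lengths[g] consecutive map units to
    slice group g, cycling through the groups until the picture is full.
    run_lengths must contain positive integers (hypothetical helper)."""
    out = []
    while len(out) < total_map_units:
        for group, run in enumerate(run_lengths):
            out.extend([group] * run)
    return out[:total_map_units]


def dispersed_map(pic_width_in_mbs, pic_height_in_mbs, num_slice_groups):
    """Type 1 sketch: scatter map units so that neighbours tend to land
    in different slice groups (formula as I read it in 8.2.2.2)."""
    total = pic_width_in_mbs * pic_height_in_mbs
    return [
        ((i % pic_width_in_mbs)
         + ((i // pic_width_in_mbs) * num_slice_groups) // 2) % num_slice_groups
        for i in range(total)
    ]


if __name__ == "__main__":
    width, height = 4, 2
    groups = dispersed_map(width, height, 2)
    # Print the map one macroblock row per line.
    for row in range(height):
        print(groups[row * width:(row + 1) * width])
    # With two slice groups this gives a checkerboard:
    # [0, 1, 0, 1]
    # [1, 0, 1, 0]
```

Note that neither mapping involves any search: the encoder just picks a type and a few parameters, and the map follows mechanically. The checkerboard case also hints at what these layouts buy you, since every macroblock in one group is surrounded by macroblocks from the other group.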
Second, I don't understand how SI and SP macroblocks can be practically applied to the situations they were intended for: random access and ease of editing. For one thing, they're only allowed in Extended profile, but Extended also has a bunch of other restrictions that make it a lot closer to Baseline than to High profile, including no CABAC and no higher-precision color modes. The basic idea seems to be that they are redundant slices enabling the frames to be decoded in different orderings, but I don't see how an encoder could reasonably make use of this feature to ensure true random access or editing capability. Wouldn't it be easier just to force insertion of IDR frames in the encoder to create cut points than to try to optimize SP/SI macroblocks? Not to mention that video editing is done per frame, not per slice, so why are SI and SP defined at the slice level in the first place?
I realize that x264 doesn't implement Extended profile, and maybe some of my questions are among the reasons for that, but I'd like to hear what the elders here think about these questions.