Slicing in x264
immars
18th June 2009, 13:27
Hello,
Is there any possibility to specify parameters to control the slicing procedure? Changing the size of slices, for example.
I've searched through the help message but got nothing about slice.
And, according to the last lines of x264 debug message:
...
x264 [debug]: frame=45778 QP=37.36 NAL=2 Slice:P Poc:8 I:6 P:163 SKIP:7991 size=657 bytes
x264 [info]: slice I:1513 Avg QP:35.56 size: 16721
x264 [info]: slice P:44266 Avg QP:39.10 size: 2123
x264 [info]: mb I I16..4: 95.0% 0.0% 5.0%
x264 [info]: mb P I16..4: 6.3% 0.0% 0.0% P16..4: 6.3% 0.5% 0.0% 0.0% 0.0% skip:86.8%
...
It seems x264 does not do slicing within frames at all: exactly 1 slice for every frame (slice I + slice P = 1513 + 44266 = 45779 = frame count).
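For what it's worth, that arithmetic can be checked mechanically. This is only a minimal sketch: it assumes the summary lines keep exactly the format pasted above, and the regex and the `slice_counts` helper are my own names, not anything provided by x264. It also assumes the `frame=45778` in the last debug line is zero-based, which would make 45779 frames in total.

```python
import re

# The two summary lines copied from the log above.
LOG = """\
x264 [info]: slice I:1513 Avg QP:35.56 size: 16721
x264 [info]: slice P:44266 Avg QP:39.10 size: 2123
"""

# Hypothetical pattern: matches "slice I:1513 Avg QP" style summary lines only.
SLICE_RE = re.compile(r"slice ([IPB]):(\d+)\s+Avg QP")

def slice_counts(log_text):
    """Return {slice_type: count} parsed from x264 summary lines."""
    return {m.group(1): int(m.group(2)) for m in SLICE_RE.finditer(log_text)}

counts = slice_counts(LOG)
total_slices = sum(counts.values())
print(counts, total_slices)
```

If the total equals the number of encoded frames, every frame was written as a single slice.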
Am I correct about this?
Thanks for any tips.
Sharktooth
18th June 2009, 13:39
x264 doesn't actually support slices.
LoRd_MuldeR
18th June 2009, 13:39
x264 does not use slices! Long ago it used to use slice-based multi-threading, but up-to-date x264 uses frame-based multi-threading!
As far as I understand, the "slice" entries in x264's log simply refer to "frame types". But definitely frames are not split into slices...
(Note: Frame-based multi-threading is far superior to slice-based multi-threading, because slices hurt compression. The more processors there are, the more slices are needed to keep them all busy, and the more quality/compression suffers. x264's frame-based multi-threading scales very well to 16 cores or even more, with near-zero quality loss for multiple threads.)
sabeelmk
18th June 2009, 14:27
@Lord_mulder
If frame-level multithreading is preferable, how does one take care of dependencies between frames in frame-level multi-threaded encoding, if we try to encode each frame on, say, its own core of a multi-core processor?
LoRd_MuldeR
18th June 2009, 14:35
@Lord_mulder
If frame-level multithreading is preferable, how does one take care of dependencies between frames in frame-level multi-threaded encoding, if we try to encode each frame on, say, its own core of a multi-core processor?
x264 does take care of this, don't worry ;)
Each frame/thread references only those parts of its reference frames that have already been encoded. So x264 can start to encode a frame even if the reference frames are not completed yet.
This somewhat restricts vertical motion search in theory, but doesn't have any significant effect on quality in reality. Only very fast upward motion may suffer...
http://git.videolan.org/gitweb.cgi?p=x264.git;a=blob;f=doc/threads.txt
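To make the vertical restriction concrete, here is a rough sketch of the idea, not x264's actual code. It assumes 16-pixel macroblock rows and ignores the extra margin a real encoder needs for sub-pixel interpolation and deblocking; `max_downward_mv` is a made-up helper name. See the linked threads.txt for how x264 really handles it.

```python
MB_SIZE = 16  # luma macroblock height in pixels

def max_downward_mv(cur_mb_row, ref_rows_done, mb_size=MB_SIZE):
    """Largest vertical MV in pixels (positive = pointing down in the
    reference frame) that stays inside the already-encoded rows.

    cur_mb_row    -- macroblock row being encoded in the current frame
    ref_rows_done -- number of macroblock rows finished in the reference frame
    Returns None when nothing of the reference frame is available yet.
    A negative result means only blocks at least that far *up* are usable.
    """
    if ref_rows_done == 0:
        return None
    # Bottom of the current block must not pass the last finished reference row.
    return (ref_rows_done - cur_mb_row - 1) * mb_size

# Reference frame is 3 rows ahead of the row being encoded in this frame:
print(max_downward_mv(cur_mb_row=5, ref_rows_done=8))
```

A vector pointing further down than this limit would read rows the other thread has not written yet. Such vectors correspond to content moving upward on screen, which is why only fast upward motion can suffer.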
benwaggoner
18th June 2009, 17:34
This somewhat restricts vertical motion search in theory, but doesn't have any significant effect on quality in reality. Only very fast upward motion may suffer...
Hmmm. Anyone ever do any statistical analysis to determine the ratio of upward versus downward motion vectors :)?
Given gravity, this probably turns out to be a very clever optimization.
benwaggoner
18th June 2009, 17:46
But more broadly, aren't there cases where you'd want slices for decode complexity? Doing HD software decoding on multicore can be a bit easier with a couple slices so CABAC decode can be parallelized.
And it's required by spec (although not generally in practice) for high bitrate Blu-ray.
Sharktooth
18th June 2009, 17:52
maybe one day x264 devs will add slices support, but i think it's not high in their priorities... if it's there at all...
LoRd_MuldeR
18th June 2009, 19:36
Hmmm. Anyone ever do any statistical analysis to determine the ratio of upward versus downward motion vectors :)?
There is a "Speed vs. PSNR" benchmark in the document I linked to. It contains both methods, "old" (slice-based) and "new" (frame-based) multi-threading.
With 8 threads the "new" method has a penalty of -0.019 PSNR, while the "old" method already has a -0.095 PSNR penalty at only 4 threads/slices.
BTW: I'd guess that most motion is in the horizontal direction anyway (just speculating here). So it's more a question of vertical vs. horizontal motion, I would think.
akupenguin
18th June 2009, 22:41
But more broadly, aren't there cases where you'd want slices for decode complexity? Doing HD software decoding on multicore can be a bit easier with a couple slices so CABAC decode can be parallelized.
So use frame based threading in the decoder too. It has the same dependencies as encoding (or slightly less, since motion estimation needs a neighborhood around the final mv).
benwaggoner
18th June 2009, 23:07
BTW: I'd guess that most motion is in the horizontal direction anyway (just speculating here). So it's more a question of vertical vs. horizontal motion, I would think.
Yes, motion is absolutely dominated by horizontal motion. Like 10:1 or something.
LoRd_MuldeR
18th June 2009, 23:27
So use frame based threading in the decoder too. It has the same dependencies as encoding (or slightly less, since motion estimation needs a neighborhood around the final mv).
Just want to add that ffmpeg-MT already works like that :cool:
immars
19th June 2009, 04:26
Thanks to all of you guys!
Learned much from the replies :)
benwaggoner
19th June 2009, 04:28
Just want to add that ffmpeg-MT already works like that :cool:
Yep. Presumably adds a frame or two of decode latency (need enough pipeline for that big I-frame which takes more than one frame's duration to de-CABAC), but well worth it.
skal
29th June 2009, 19:53
There are probably so-so decoders out there that will *only* support slice-level parallel decoding (if ever), because that's the easiest to implement.
LoRd_MuldeR
29th June 2009, 21:23
There are probably so-so decoders out there that will *only* support slice-level parallel decoding (if ever), because that's the easiest to implement.
Who cares? We have enough multi-threaded H.264 decoders available. Both, OpenSource (ffmpeg/libavcodec) and Proprietary (CoreAVC Decoder, DivX H.264 Decoder, etc).
The one and only reason for re-adding slice support to x264 that I see is "full" BluRay conformity. And even the "real world" BluRay players don't require slices...
benwaggoner
29th June 2009, 21:28
Who cares? We have enough multi-threaded H.264 decoders available. Both, OpenSource (ffmpeg/libavcodec) and Proprietary (CoreAVC Decoder, DivX H.264 Decoder, etc).
The one and only reason for re-adding slice support to x264 that I see is "full" BluRay conformity...
Lower end-to-end broadcast delay is another useful place for slices, since slicing allows per-frame parallelization at encode and decode. I don't know how important a scenario that is for x264.
Semi-random question; do we have any idea how much of the efficiency hit of slicing comes from the loss of motion vectors at boundaries versus entropy encoding being less efficient as there's less to work with in each slice?
If I had to make a random guess it'd be at least 85% entropy and at most 15% vectors for typical content.
Dark Shikari
29th June 2009, 22:02
Lower end-to-end broadcast delay is another useful place for slices, since slicing allows per-frame parallelization at encode and decode. I don't know how important a scenario that is for x264.
Semi-random question; do we have any idea how much of the efficiency hit of slicing comes from the loss of motion vectors at boundaries versus entropy encoding being less efficient as there's less to work with in each slice?
If I had to make a random guess it'd be at least 85% entropy and at most 15% vectors for typical content.
I doubt entropy is a very big deal at all; remember, CABAC was designed for CIF resolution, so it adapts really fast.
And you don't lose motion vectors at boundaries either, by the way.
LoRd_MuldeR
29th June 2009, 22:04
And you don't lose motion vectors at boundaries either, by the way.
But they can't point across the slice border. And that's why we lose compression efficiency, right?
Dark Shikari
29th June 2009, 22:12
But they can't point across the slice border.
Sure they can.
LoRd_MuldeR
29th June 2009, 22:16
Sure they can.
So where do slices lose compression efficiency then? :confused:
Dark Shikari
29th June 2009, 22:20
So where do slices lose compression efficiency then? :confused:
1. Worse entropy coding.
2. Worse prediction (you can't predict motion vectors/etc from the previous frame IIRC).
benwaggoner
30th June 2009, 05:44
I've never seen a slice implementation where motion vectors would cross boundaries....
Dust Signs
30th June 2009, 07:17
@benwaggoner: But Dark Shikari is right, it's the same as the fact that motion vectors can point "out of the picture"/frame. The number of cases where this is actually useful may of course be quite low.
@Dark Shikari: If I remember correctly, you can predict MVs from the previous frame but you cannot predict them from one of the other slices in the current frame. But I'll have to check that.
Dust Signs
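To illustrate the "out of the picture" case: as far as I know, H.264 handles such vectors by clamping the reference coordinates to the picture edges, which is equivalent to padding the reference frame. Motion compensation reads from an already decoded reference picture, so slice borders in the current frame do not have to limit it. A minimal sketch; `ref_sample` and the toy `plane` are invented purely for illustration:

```python
def ref_sample(plane, x, y):
    """Fetch one reference sample, clamping (x, y) to the picture edges."""
    height = len(plane)
    width = len(plane[0])
    cx = min(max(x, 0), width - 1)   # clamp column into [0, width - 1]
    cy = min(max(y, 0), height - 1)  # clamp row into [0, height - 1]
    return plane[cy][cx]

# Toy 3x2 "reference picture" (rows of sample values).
plane = [
    [10, 20, 30],
    [40, 50, 60],
]

print(ref_sample(plane, 1, 1))   # inside the picture
print(ref_sample(plane, -5, 0))  # left of the picture: repeats the edge sample
print(ref_sample(plane, 7, 9))   # beyond bottom-right: repeats the corner sample
```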
Manao
30th June 2009, 08:53
Dust Signs : only skip/direct in bframes might be predicted from another frame, and even so, you need the frame to be encoded using temporal direct and not spatial direct (which happens 1% of the time when you use --direct auto).
benwaggoner : you mustn't have looked hard enough. The only sliced encoding for which MVs didn't cross slice boundaries was an HD mpeg2 encode. Most likely, the encoder was made of 6 SD encoders which each encoded its own part of the video, without seeing the other parts. But apart from such a hackish construct, sliced encoders ought to allow MVs to cross slice boundaries.
squid808
1st July 2009, 11:13
What would be the reason for not letting the motion vectors cross the slice boundaries?
I've never seen a slice implementation where motion vectors would cross boundaries....
yeye69
1st July 2009, 16:40
Guys, can you tell when multi-threading will be in the main branch of ffmpeg/mplayer? (I know about ffmpeg-mt)
akupenguin
3rd July 2009, 01:52
Do we have any idea what the relative efficiency hit of slicing comes from the loss of motion vectors at boundaries versus entropy encoding being less efficient as there's less to work with in each slice?
Tested at 720p with 45 slices (one per mb row) to maximize the total cost for easy measurement. Averaged over 4 movies at crf20 and crf30. Total cost: +30% bitrate at constant psnr.
I enabled the various components of slicing one at a time, and measured the portion of that cost they contribute:
34% intra prediction
25% redundant slice headers, nal headers, and rounding to whole bytes
16% mv prediction
16% reset cabac contexts
6% deblocking between slices (you don't strictly have to turn this off just for standard compliance, but you do if you want to use slices for decoder multithreading)
2% cabac neighbors (cbp, skip, etc)
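A quick back-of-the-envelope conversion of those shares into bitrate terms. This is only a sketch: it assumes the shares can be treated as simple linear fractions of the +30% total, which glosses over any interaction between the components, and the dictionary keys are just shortened labels for the list above.

```python
TOTAL_COST_PCT = 30.0  # total bitrate increase at constant PSNR, in percent

# Share of the total cost attributed to each component, in percent.
SHARES = {
    "intra prediction": 34,
    "redundant headers and byte rounding": 25,
    "mv prediction": 16,
    "reset cabac contexts": 16,
    "deblocking between slices": 6,
    "cabac neighbors": 2,
}

# 720p has 720 / 16 = 45 macroblock rows, hence 45 slices at one per row.
mb_rows = 720 // 16

# Approximate bitrate increase (percentage points) due to each component.
bitrate_cost = {name: TOTAL_COST_PCT * share / 100 for name, share in SHARES.items()}

for name, cost in bitrate_cost.items():
    print(f"{name:38s} ~{cost:.1f}% extra bitrate")
print("shares sum to", sum(SHARES.values()), "percent (rounding)")
```

Under that assumption, intra prediction alone would account for roughly 10 percentage points of the 30% increase.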
benwaggoner
3rd July 2009, 09:34
Tested at 720p with 45 slices (one per mb row) to maximize the total cost for easy measurement. Averaged over 4 movies at crf20 and crf30. Total cost: +30% bitrate at constant psnr.
I enabled the various components of slicing one at a time, and measured the portion of that cost they contribute:
34% intra prediction
25% redundant slice headers, nal headers, and rounding to whole bytes
16% mv prediction
16% reset cabac contexts
6% deblocking between slices (you don't strictly have to turn this off just for standard compliance, but you do if you want to use slices for decoder multithreading)
2% cabac neighbors (cbp, skip, etc)
Wow, what a delightfully data-filled response.
The 25% from redundancy I hadn't even thought about. Of course, 45 slices is certainly an edge case.
Thanks for some good chewy numbers.
akupenguin
3rd July 2009, 10:52
Of course, 45 slices is certainly an edge case.
The proportional cost of redundant headers should certainly depend on bitrate (since the header size is constant and everything else depends on bitrate). Deblocking should too (due to varying deblock strength).
But none of the proportions should depend strongly on the number of slices: some are triggered per slice while some are triggered per macroblock-that's-on-the-edge-of-a-slice, but as long as there's no more than 1 slice per row, the relative frequency of those two conditions is determined solely by the image width.
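A small sketch of that counting argument. It assumes every slice consists of whole macroblock rows and counts only the macroblocks directly below each slice boundary as "edge" macroblocks; `slicing_events` is a hypothetical helper, not anything in x264.

```python
def slicing_events(width_px, height_px, n_slices, mb_size=16):
    """Count the two kinds of cost triggers for a frame cut into n_slices.

    Returns (boundaries, edge_mbs):
      boundaries -- extra slices beyond the first (per-slice costs)
      edge_mbs   -- macroblocks in the first row of each extra slice
                    (per-edge-macroblock costs)
    """
    mbs_per_row = width_px // mb_size
    mb_rows = height_px // mb_size
    if not 1 <= n_slices <= mb_rows:
        raise ValueError("expected between 1 and mb_rows slices")
    boundaries = n_slices - 1
    edge_mbs = boundaries * mbs_per_row
    return boundaries, edge_mbs

for n in (2, 5, 45):
    boundaries, edge_mbs = slicing_events(1280, 720, n)
    # The ratio is always the frame width in macroblocks (80 for 1280 pixels).
    print(n, boundaries, edge_mbs, edge_mbs // boundaries)
```

Both counts grow in step with the number of slices, so their ratio stays fixed at the frame width in macroblocks, whatever the slice count.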
Sharktooth
3rd July 2009, 13:35
does adding slicing support overcomplicate the x264 codebase?
benwaggoner
3rd July 2009, 19:59
does adding slicing support overcomplicate the x264 codebase?
x264 used to use slicing for multithreading, no? So adding it back in to enable lower-latency multithreading shouldn't be that big a deal.
If anything, single-slice multithreading is more complicated, but better as we can see here.
Dark Shikari
3rd July 2009, 21:25
x264 used to use slicing for multithreading, no? So adding it back in to enable lower-latency multithreading shouldn't be that big a deal.
We don't intend to support sliced multithreading even if we add slicing support.
benwaggoner
3rd July 2009, 21:39
We don't intend to support sliced multithreading even if we add slicing support.
Certainly. But since you knew how to do "full" slicing, I wouldn't think slicing-for-CABAC would add much complexity.