11th July 2007, 12:56 | #1 | Link | |||
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Futzing with the x264 code -- possible improvements
I've been futzing with the x264 code today. I figured it would be a fun idea to find parts of the code that could be improved to increase encoding quality, even if just a bit.
I found the following in the code of the --me UMH motion search function, probably the most commonly used one: Quote:
1: Original, no change. 2: Quote:
3. Quote:
My command lines were relatively low-bitrate with pretty high settings (tests 1, 2, and 3 respectively):
Test 1: --trellis 2 --no-fast-pskip --subme 7 --bframes 4 --ref 16 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
Test 2: --trellis 2 --subme 6 --bframes 4 --ref 6 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
Test 3: --trellis 1 --subme 6 --bframes 4 --ref 3 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
Results:
To measure the improvement with a single % value, I used the following formula as my quality-per-bitrate metric:
    (1 / (1 - SSIM)) / Bitrate
It divides by (1 - SSIM) because SSIM indicates higher quality as it converges towards 1. Since doubling the distance of the SSIM value from 1 is taken to halve the quality, 1/(1-SSIM) effectively converts the nonlinear SSIM metric into a linear metric that can be compared directly.
Test 1: 0.1725110171 % quality improvement with Single CROSS modification
Test 2: 0.1697954506 % quality improvement with Single CROSS modification
Test 3: 0.3167555529 % quality improvement with Single CROSS modification
As you can see, the tests get a free 0.15-0.35% quality improvement for zero extra CPU cost; not bad! Since the Single CROSS modification had no significant effect on speed (it's running the same code, just from a different starting point), all the FPS figures are within error tolerance of the original.
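As a rough illustration of the formula above, here is a minimal C sketch of the quality-per-bitrate score and the percentage change between two encodes. The function names are hypothetical and are not part of x264; only the formula itself comes from the post.

```c
#include <assert.h>

/* Quality-per-bitrate score as defined in the post:
 * (1 / (1 - SSIM)) / bitrate.
 * Requires 0 <= ssim < 1 and bitrate > 0. */
double quality_per_bitrate(double ssim, double bitrate_kbps)
{
    assert(ssim >= 0.0 && ssim < 1.0 && bitrate_kbps > 0.0);
    return (1.0 / (1.0 - ssim)) / bitrate_kbps;
}

/* Percent change of the score going from a baseline encode to a
 * modified encode (positive means the modified encode scores higher). */
double percent_improvement(double ssim_base, double kbps_base,
                           double ssim_mod,  double kbps_mod)
{
    double base = quality_per_bitrate(ssim_base, kbps_base);
    double mod  = quality_per_bitrate(ssim_mod,  kbps_mod);
    return 100.0 * (mod / base - 1.0);
}
```

Under this metric, holding SSIM fixed while halving the bitrate doubles the score (+100%), and so does halving the distance of SSIM from 1 at a fixed bitrate.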
Test 1: 0.4983579238 % quality improvement with Double CROSS modification
Test 2: 0.5774497533 % quality improvement with Double CROSS modification
Test 3: 0.3762606716 % quality improvement with Double CROSS modification
(Note that the above are improvements over the original, not over the Single CROSS modification.) Double CROSS gave even more of a boost: 0.37% at the least, and around 0.5% on the first two tests.
Test 1: 9.1346153846 % speed loss with Double CROSS modification
Test 2: 5.9615384615 % speed loss with Double CROSS modification
Test 3: 2.5839793282 % speed loss with Double CROSS modification
The disadvantage, of course, is a speed loss ranging from 2.5-9%.
In summary, there are two possible changes here: one that adds a minor quality boost at zero CPU cost, and another that gives a further minor quality boost, comparable to minor options like --no-fast-pskip, at a pretty reasonable CPU cost. Either would need to be tested further before being incorporated into the code, but this is my futzing for today.
System: Core 2 Duo 2 GHz / 2 GB RAM
Compiler: Cygwin gcc 3.4.4
Extra compiler options: -march=opteron (seems to work best on my Core 2 Duo)
Last edited by Dark Shikari; 11th July 2007 at 13:11. |
|||
11th July 2007, 13:06 | #2 | Link |
Turkey Machine
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
|
Could be useful as a HQ option extension for UMH.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld |
11th July 2007, 13:10 | #3 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Yeah, I think we could use something between UMH and ESA; perhaps a modification of UMH with some extra searches added in intelligent places to improve overall quality/bitrate.
This is partially because ESA is completely useless in most cases because of its speed; one might even be better off with a longer-range UMH search than a shorter-range ESA.
Another note is that most of the settings that really slow down H.264 encoding don't give much improvement, like --ref 16 instead of --ref 6 on non-cartoon sources, or even --subme 7, which I find is overall not very useful compared to --subme 6. So the "bar" of quality improvement relative to speed loss for a new version of UMH for x264 is not very high, and it should not be that difficult to find improvements that are worth using.
Last edited by Dark Shikari; 11th July 2007 at 13:27. |
11th July 2007, 16:36 | #4 | Link |
Turkey Machine
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
|
So substituting this for ESA would work better, d'you think? I personally never use ESA, and I don't know what benefits it would bring, because UMH seems to do enough in the search process... if you're checking every pixel position like ESA does, you'd need a very fast and efficient algorithm or fast hardware, so this would do well to replace ESA.
Just chucking things round that would appeal to akupenguin.
11th July 2007, 16:36 | #5 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
I've had some other ideas of what to do with this section that I'll work with later; one is to use an if statement to only do the second CROSS if the motion vector changed previously. Another possibility is to perform a search other than CROSS in that if statement and see if that does any better.
Quote:
|
|
11th July 2007, 17:02 | #6 | Link |
Registered User
Join Date: Aug 2006
Posts: 2,229
|
I've heard from somewhere (can't remember where) that the code for x264 is rather unoptimised, and that there are a lot of places where MMX/MMXEXT/SSE/SSE2/SSE3/SSSE3 code could be included for extra speed but is currently missing. Is this true?
|
11th July 2007, 17:53 | #7 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Dark Shikari: everything you did is OK, except the way you compare the results.
As you have noticed, the resulting quality depends on both the bitrate and the metric (PSNR, SSIM, ...), and since both change at the same time, it's not easy to compare them. You decided to avoid that issue by saying, arbitrarily, that 'quality = 1/(1-SSIM)/bitrate', and then comparing those qualities with each other. That is definitely not how it should be done. The proper way is to encode at several CRFs and then plot the metric-versus-bitrate curve. Once the curves are drawn, you can compare the modifications. In particular, you can say "at the same bitrate, the metrics differ by XXX", or "at the same metric, the bitrate differs by YYY %". It's slower, but it works.
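The curve-based comparison described above could be sketched roughly as follows. This is only an illustration: the `rd_point` type and `metric_at_bitrate` function are hypothetical names, and linear interpolation between neighbouring samples is an assumption (the post does not say how to read values off the curve).

```c
#include <stddef.h>

/* One encode: the bitrate it produced and the metric (PSNR or SSIM) it reached. */
typedef struct {
    double bitrate;
    double metric;
} rd_point;

/* Linearly interpolate the metric at target_bitrate.
 * pts must be sorted by strictly increasing bitrate, with n >= 2.
 * Returns 0 on success, or -1 if target_bitrate lies outside the
 * measured range (this sketch refuses to extrapolate). */
int metric_at_bitrate(const rd_point *pts, size_t n,
                      double target_bitrate, double *out)
{
    if (n < 2 || target_bitrate < pts[0].bitrate ||
        target_bitrate > pts[n - 1].bitrate)
        return -1;

    for (size_t i = 1; i < n; i++) {
        if (target_bitrate <= pts[i].bitrate) {
            double span = pts[i].bitrate - pts[i - 1].bitrate;
            double t = (target_bitrate - pts[i - 1].bitrate) / span;
            *out = pts[i - 1].metric
                 + t * (pts[i].metric - pts[i - 1].metric);
            return 0;
        }
    }
    return -1;
}
```

Calling this once per encoder variant with the same `target_bitrate` gives two metric values that can be subtracted directly, which is the "at the same bitrate, the metrics differ by XXX" statement.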
|
11th July 2007, 18:00 | #8 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
burfadel: you've heard wrong. x264 can be made faster; everything can be made faster. But it's definitely not "rather unoptimized".
What is missing, last time I checked, is SSSE3 for 32-bit OSes (since akupenguin uses a 64-bit OS) and, perhaps, some SSE2 functions instead of MMXEXT (it would help on P4/Conroe). IMHO, that won't represent more than a 5-10% speed gain. And, IMHO, if development time were to be spent on x264, I would rather look toward psychovisual enhancements: there are none at the moment, and they can dramatically improve things.
|
11th July 2007, 18:22 | #9 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
While you're at it, remove MMX1, SSE1, and SSE3 from your list of instruction sets. SSE1 and SSE3 are floating-point and thus useless for video coding, and the last CPU that only had MMX1 was way too slow for x264 anyway.
|
11th July 2007, 19:01 | #10 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
I agree that the results will differ at different CRFs, of course. I would have to do testing at multiple CRFs to see the true results at more bitrates. I would guess from experience that the higher the bitrate, the less effective the optimizations. Even if you disagree with my metric, you can always compare quality and bitrate separately, i.e. say "quality improved by 1%" and "bitrate improved by 1%" as separate statements. I'm testing a number of improvements to the code other than the one I've stated that should improve the effectiveness of the UMH algorithm... I'll post with more later. Last edited by Dark Shikari; 11th July 2007 at 19:10. |
|
11th July 2007, 19:29 | #11 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
    CROSS( cross_start, i_me_range, i_me_range/2 );
    /* only repeat the cross if the first one moved the best MV */
    if( saved_omx != bmx || saved_omy != bmy )
    {
        omx = bmx; omy = bmy;
        CROSS( cross_start, i_me_range, i_me_range/2 );
    }
gives the exact same results and seems to be a bit faster, so this would be preferable to the Double CROSS solution above. It requires the following to be placed after the previous instance of omx = bmx; omy = bmy;
    int saved_omx = omx;
    int saved_omy = omy;
Last edited by Dark Shikari; 11th July 2007 at 19:33. |
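To see why the conditional version can match the unconditional Double CROSS, here is a self-contained toy model. It is not x264's code: `toy_cost`, `cross`, and `double_cross` are hypothetical stand-ins, the cost function is a simple L1 distance to a made-up "true" vector rather than SAD plus MV cost, and the cross pattern is only a simplified approximation of what the CROSS macro does.

```c
#include <stdlib.h>

typedef struct {
    int bx, by;   /* best motion vector found so far */
    int bcost;    /* cost of that vector */
    int evals;    /* number of cost evaluations performed */
} search_state;

/* Toy cost: L1 distance to a hidden target vector (tx, ty). */
static int toy_cost(int mx, int my, int tx, int ty)
{
    return abs(mx - tx) + abs(my - ty);
}

/* Evaluate one candidate; keep it only if it is strictly better. */
static void try_mv(search_state *s, int mx, int my, int tx, int ty)
{
    int c = toy_cost(mx, my, tx, ty);
    s->evals++;
    if (c < s->bcost) { s->bcost = c; s->bx = mx; s->by = my; }
}

/* Simplified cross: probe odd offsets along both axes around (ox, oy). */
static void cross(search_state *s, int ox, int oy,
                  int x_max, int y_max, int tx, int ty)
{
    for (int i = 1; i <= x_max; i += 2) {
        try_mv(s, ox + i, oy, tx, ty);
        try_mv(s, ox - i, oy, tx, ty);
    }
    for (int i = 1; i <= y_max; i += 2) {
        try_mv(s, ox, oy + i, tx, ty);
        try_mv(s, ox, oy - i, tx, ty);
    }
}

/* One cross from (sx, sy), then a second cross re-centred on the new best.
 * If conditional is non-zero, the second cross is skipped when the first
 * one did not move the best vector. */
search_state double_cross(int sx, int sy, int tx, int ty,
                          int range, int conditional)
{
    search_state s = { sx, sy, toy_cost(sx, sy, tx, ty), 1 };
    int saved_x = s.bx, saved_y = s.by;

    cross(&s, saved_x, saved_y, range, range / 2, tx, ty);
    if (!conditional || s.bx != saved_x || s.by != saved_y)
        cross(&s, s.bx, s.by, range, range / 2, tx, ty);
    return s;
}
```

In this toy, when the first cross leaves the best vector where it was, an unconditional second cross would only re-probe the same candidates, none of which can strictly beat the current best, so skipping it changes the evaluation count but not the result.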
11th July 2007, 19:44 | #12 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
The problem isn't comparing over a wide range of bitrates. Nonlinearity kicks in even for arbitrarily small ranges: while the curves will be straight on sufficiently small intervals, their slopes are not necessarily the same.
In general, 0.05 dB PSNR is equivalent to 1% bitrate. But that doesn't mean 40.06 dB PSNR at 1010 kb/s is better than 40.00 dB PSNR at 1000 kb/s, because sometimes it's 0.03 dB per % bitrate and sometimes it's 0.07 dB per % bitrate. (The same applies to SSIM; I just picked PSNR because I know the right numbers off-hand.)
Sure, you can say "encode A is 1% better quality and 1% better bitrate than encode B", knowing that it doesn't mean A is 2% better than B: it may be more or less than 2% depending on how exactly quality maps to bitrate. The problem comes when encode A is 3% better quality and 1% worse bitrate; that's not necessarily better at all.
If you encode at multiple values of CRF then you can interpolate between them. You can think of it as experimentally determining the constant of proportionality between quality and bitrate for your specific content and settings, though interpolation is more general in that, with sufficiently many samples, it can handle a wide range of bitrates and thus non-constant proportionality. |
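The 40.06 dB versus 40.00 dB example can be worked through numerically. The helper below is a hypothetical sketch, not anything from x264: it assumes the quality/bitrate relationship is locally linear with a given slope in dB per percent of bitrate, and discounts the PSNR that the extra bitrate would be expected to buy.

```c
/* Net PSNR advantage (in dB) of encode A over encode B, after subtracting
 * the PSNR that A's extra bitrate would be expected to buy at the given
 * local slope (dB per 1% of bitrate). Positive means A is genuinely better.
 * Assumes a locally linear curve, which only holds for small differences. */
double net_psnr_gain(double psnr_a, double kbps_a,
                     double psnr_b, double kbps_b,
                     double db_per_percent)
{
    double bitrate_pct = 100.0 * (kbps_a / kbps_b - 1.0);
    return (psnr_a - psnr_b) - db_per_percent * bitrate_pct;
}
```

With A at 40.06 dB and 1010 kb/s against B at 40.00 dB and 1000 kb/s, a slope of 0.03 dB/% gives a net gain of about +0.03 dB for A, while a slope of 0.07 dB/% gives about -0.01 dB, so the same pair of encodes can rank either way depending on the local slope.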
11th July 2007, 19:52 | #13 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
Of course, in my case both the bitrate and the quality improved, so the point is moot unless a true bitrate/quality tradeoff is involved. If I find a change that creates such a tradeoff I'll make sure to try your method first; I agree that the quality/bitrate curve can be nastily nonlinear at times, and you certainly have much more experience with the curve as one of the coders behind x264, so I'll trust you on that. Last edited by Dark Shikari; 11th July 2007 at 19:59. |
|
12th July 2007, 05:14 | #14 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
It appears that changing the hexagon grid in UMH to:
    /* hexagon grid */
    omx = bmx; omy = bmy;
    for( i = 1; i <= i_me_range/4; i++ )
    {
        static const int hex4[20][2] = {
            {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
            { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
            { 2, 3}, { 0, 4}, {-2, 3},
            {-2,-3}, { 0,-4}, { 2,-3},
            { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
        };
        if( 4*i > X264_MIN4( mv_x_max-omx, omx-mv_x_min,
                             mv_y_max-omy, omy-mv_y_min ) )
        {
            /* close to the MV range limit: range-check each point */
            for( j = 0; j < 20; j++ )
            {
                int mx = omx + hex4[j][0]*i;
                int my = omy + hex4[j][1]*i;
                if( CHECK_MVRANGE(mx, my) )
                    COST_MV( mx, my );
            }
        }
        else
        {
            /* fully inside the MV range: evaluate four points per call */
            COST_MV_X4( -4*i, 2*i, -4*i, 1*i, -4*i, 0*i, -4*i,-1*i );
            COST_MV_X4( -4*i,-2*i,  4*i,-2*i,  4*i,-1*i,  4*i, 0*i );
            COST_MV_X4(  4*i, 1*i,  4*i, 2*i,  2*i, 3*i,  0*i, 4*i );
            COST_MV_X4( -2*i, 3*i, -2*i,-3*i,  0*i,-4*i,  2*i,-3*i );
            COST_MV_X4( -3*i, 2*i, -3*i,-2*i,  3*i, 2*i,  3*i,-2*i );
        }
    }
gives a decent boost on the clips/settings I've tried it on (adding 4 more spots to the hexagon). |
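To make the size of the enlarged pattern concrete, the sketch below enumerates the candidates visited by a 20-point grid at scales i = 1 .. me_range/4. It is an illustration only: `multi_hex_candidates` and `mv_offset` are hypothetical names, and the single symmetric `mv_limit` bound is a simplification of the per-axis mv_x_min/mv_x_max/mv_y_min/mv_y_max checks in the post.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_offset;

/* The 20 offsets listed in the post's hex4 table. */
static const int hex20[20][2] = {
    {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
    { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
    { 2, 3}, { 0, 4}, {-2, 3},
    {-2,-3}, { 0,-4}, { 2,-3},
    { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
};

/* Write into out[] every candidate visited around (cx, cy) for scales
 * i = 1 .. me_range/4, skipping candidates whose |x| or |y| exceeds
 * mv_limit. At most cap entries are written; returns the number written. */
size_t multi_hex_candidates(int cx, int cy, int me_range, int mv_limit,
                            mv_offset *out, size_t cap)
{
    size_t n = 0;
    for (int i = 1; i <= me_range / 4; i++) {
        for (int j = 0; j < 20; j++) {
            int mx = cx + hex20[j][0] * i;
            int my = cy + hex20[j][1] * i;
            if (abs(mx) > mv_limit || abs(my) > mv_limit)
                continue;           /* outside the allowed MV range */
            if (n == cap)
                return n;           /* output buffer is full */
            out[n].x = mx;
            out[n].y = my;
            n++;
        }
    }
    return n;
}
```

With a merange of 16 and no clipping this yields 4 scales of 20 points, i.e. 80 candidates per search in this model, compared with 64 for a 16-point grid.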
12th July 2007, 09:37 | #15 | Link | |
Turkey Machine
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
|
Quote:
|
12th July 2007, 10:18 | #16 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quality, obviously, not speed. How in the world would adding more spots to the hexagon boost speed?
I'll run some more metrics in a bit; I didn't save the particulars, but it'll be easy to run again and post here. One thing I noticed is that adding more Y-direction motion searching didn't help much, probably because most input video is wider than it is tall and has more side-to-side motion than up-down motion, so it's safe to be biased in that manner; if anything, further Y-direction motion searching actually hurt SSIM. |
24th July 2007, 04:15 | #17 | Link |
<The VFW Sheep of Death>
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
|
Any results for those metrics you were planning to run?
Assuming this "futzing" would indeed yield such improvement in the general case, what effect would changing --merange have with this new algorithm? Would X- and Y-direction motion searching be offset proportionally to the overall extension in search range? Also, in the neighborhood of suggested improvements, I would without hesitation suggest shunting the Exhaustive search onto a different thread than all the other processing. That is, if it proves too difficult to implement ESA into the current multi-thread framework.
__________________
Recommended all-in-one stop for x264/GCC needs on Windows: Komisar x264 builds! Last edited by DeathTheSheep; 24th July 2007 at 04:34. |
|
|