Dark Shikari
11th July 2007, 13:56
I've been futzing with the x264 code today. I figured it would be a fun idea to find parts of the code that could be improved to increase encoding quality, even if just a bit.
I found the following in the code of the --me UMH motion search function, probably the most commonly used one:
/* FIXME if the above DIA2/OCT2/CROSS found a new mv, it has not updated omx/omy. We are still centered on the same place as the DIA2. is this desirable? */
CROSS( cross_start, i_me_range, i_me_range/2 );
In other words, even if the above functions found a new motion vector, the CROSS motion search still starts from the previous position... but it's not clear whether this is necessarily a bad thing. So I decided to test this. I grabbed a short clip from Ocean's Eleven, about 10 seconds, that had a lot of motion and was relatively denoised. I tested three different code sets:
1: Original, no change.
2: omx = bmx; omy = bmy;
CROSS( cross_start, i_me_range, i_me_range/2 );
i.e. doing exactly what the comment suggested.
3: CROSS( cross_start, i_me_range, i_me_range/2 );
omx = bmx; omy = bmy;
CROSS( cross_start, i_me_range, i_me_range/2 );
i.e. doing both the original and what the comment suggested. My reasoning for this is that it's quite possible that an earlier search missed some vectors near the origin, and that CROSS exists here to catch some of them. On the other hand, the comment is making a good suggestion; it might be good to run the CROSS at the new location, too. So why not try both and see what happens?
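To make the difference between the three variants concrete, here is a toy sketch in C. It is not x264's actual CROSS macro or cost function: toy_cost, toy_cross, check_mv, search_t and run_variant are hypothetical names, the cost is just the distance to a made-up "true" vector at (7, 3), and the starting best vector (6, 0) is an assumed result of the earlier searches. The example is deliberately constructed so that the three variants end up at different places; it says nothing about how often that happens in real footage.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy search state: (omx, omy) is the search center, (bmx, bmy) the best
 * vector found so far and bcost its cost. The field names mirror the post;
 * the struct itself is hypothetical. */
typedef struct { int omx, omy, bmx, bmy, bcost; } search_t;

/* Made-up cost: city-block distance to an assumed true vector (7, 3). */
static int toy_cost(int mx, int my)
{
    return abs(mx - 7) + abs(my - 3);
}

/* Evaluate one candidate and keep it if it beats the current best. */
static void check_mv(search_t *s, int mx, int my)
{
    int c = toy_cost(mx, my);
    if (c < s->bcost) { s->bcost = c; s->bmx = mx; s->bmy = my; }
}

/* Simplified stand-in for CROSS: probe every second offset along the
 * horizontal and vertical axes through the center (omx, omy). */
static void toy_cross(search_t *s, int start, int x_max, int y_max)
{
    for (int i = start; i < x_max; i += 2) {
        check_mv(s, s->omx + i, s->omy);
        check_mv(s, s->omx - i, s->omy);
    }
    for (int i = start; i < y_max; i += 2) {
        check_mv(s, s->omx, s->omy + i);
        check_mv(s, s->omx, s->omy - i);
    }
}

/* variant 1: original; 2: recenter, then CROSS; 3: CROSS, recenter, CROSS.
 * Returns the final best cost. */
int run_variant(int variant)
{
    assert(variant >= 1 && variant <= 3);
    /* Assumed situation: center still at (0, 0), earlier steps found (6, 0). */
    search_t s = { 0, 0, 6, 0, 0 };
    const int cross_start = 1, i_me_range = 8;
    s.bcost = toy_cost(s.bmx, s.bmy);

    if (variant == 1 || variant == 3)
        toy_cross(&s, cross_start, i_me_range, i_me_range / 2);
    if (variant == 2 || variant == 3) {
        s.omx = s.bmx; s.omy = s.bmy;   /* the change the FIXME hints at */
        toy_cross(&s, cross_start, i_me_range, i_me_range / 2);
    }
    return s.bcost;
}
```

In this contrived setup run_variant(1) ends with cost 3, run_variant(2) with cost 1 and run_variant(3) with cost 0, which is the kind of ordering the double CROSS idea is hoping for. Whether real video behaves this way is exactly what the encodes below are meant to check.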
My command lines were relatively low-bitrate with pretty high settings:
--trellis 2 --no-fast-pskip --subme 7 --bframes 4 --ref 16 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
--trellis 2 --subme 6 --bframes 4 --ref 6 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
--trellis 1 --subme 6 --bframes 4 --ref 3 --b-pyramid --partitions all --8x8dct --me umh --bime --b-rdo --mixed-refs --direct auto --weightb --progress --crf 35 --deblock 0:0
(tests 1, 2, and 3 respectively)
Results:
http://i11.tinypic.com/4xo0wtz.png
To measure the improvement with a single % value, I used the following formula:
(1 / (1 - SSIM)) / Bitrate
as my quality-per-bitrate metric. It divides by (1-SSIM) to represent the fact that SSIM increases in quality as it converges towards 1. Since doubling the distance of the SSIM value from 1 halves the quality, 1/(1-SSIM) effectively converts the nonlinear SSIM metric into a linear metric that can be directly compared.
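For anyone who wants to redo the arithmetic, a minimal sketch of the metric in C follows. The function names are my own, and the percentage step (the ratio of the two scores, minus one) is an assumption about how the improvement figures are derived; the formula itself is the one given above.

```c
#include <assert.h>

/* Quality-per-bitrate score: (1 / (1 - SSIM)) / bitrate.
 * The bitrate unit does not matter as long as both encodes use the same one. */
double quality_per_bitrate(double ssim, double bitrate)
{
    assert(ssim >= 0.0 && ssim < 1.0);  /* SSIM of exactly 1 would divide by zero */
    assert(bitrate > 0.0);
    return (1.0 / (1.0 - ssim)) / bitrate;
}

/* Percentage improvement of a modified encode over the original one
 * (assumed definition: relative change of the score). */
double percent_improvement(double score_orig, double score_mod)
{
    assert(score_orig > 0.0);
    return (score_mod / score_orig - 1.0) * 100.0;
}
```

As a made-up illustration (these are not values from my tests): at the same bitrate, going from SSIM 0.90 to SSIM 0.95 halves the distance from 1, so the score doubles and percent_improvement reports about 100%.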
Test 1: 0.1725110171 % quality improvement with Single CROSS modification
Test 2: 0.1697954506 % quality improvement with Single CROSS modification
Test 3: 0.3167555529 % quality improvement with Single CROSS modification
As you can see, the tests get a free 0.17-0.32% quality improvement for zero extra CPU cost; not bad!
Since the Single CROSS modification had no significant effect on the speed (as it's running the same code, just from a different starting point), all the FPS figures are within error tolerances of the original.
Test 1: 0.4983579238 % quality improvement with Double CROSS modification
Test 2: 0.5774497533 % quality improvement with Double CROSS modification
Test 3: 0.3762606716 % quality improvement with Double CROSS modification
(note that the above are improvements from the original, not from the Single CROSS modification)
Double CROSS gave even more of a boost: 0.37% at the least, and 0.5-0.6% on the first two tests.
Test 1: 9.1346153846 % speed loss with Double CROSS modification
Test 2: 5.9615384615 % speed loss with Double CROSS modification
Test 3: 2.5839793282 % speed loss with Double CROSS modification
The disadvantage, of course, is a speed loss ranging from 2.5-9%.
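The speed-loss percentages can be reproduced from two FPS readings along these lines. This is a small sketch assuming the loss is measured relative to the original build's FPS; speed_loss_percent is a hypothetical helper name.

```c
#include <assert.h>

/* Percentage speed loss of a modified build relative to the original build,
 * assuming both FPS values come from encoding the same clip with the same settings. */
double speed_loss_percent(double fps_orig, double fps_mod)
{
    assert(fps_orig > 0.0 && fps_mod >= 0.0);
    return (fps_orig - fps_mod) / fps_orig * 100.0;
}
```

For example, with made-up readings of 10.0 fps before and 9.5 fps after, this gives a loss of about 5%.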
In summary, there are two possible changes here: one that adds a minor quality boost at zero CPU cost, and another that gives a further minor quality boost, comparable to minor command-line options like --no-fast-pskip, at a pretty reasonable CPU cost.
Either would need to be tested further before being incorporated into the code, but this is my futzing for today :cool:
System: Core 2 Duo 2GHz/2GB RAM
Compiler: Cygwin gcc 3.4.4
Extra compiler options: -march=opteron (seems to work best on my Core 2 Duo)