Current Patches, Where to get them, How they affect speed/output [Archive]

morph166955

29th September 2007, 16:30

Something I've noticed is that while we have Cef's repository of patches that he uses on his builds at http://mirror05.x264.nl/Cef/?dir=./patches there is no central place to explain what they do, what their effect on both the speed of the encode as well as the output is, who wrote it/where it originated, and if they aren't on Cef's site where to get them (or where they originally came from in case they are updated and not updated on Cef's site). I have included below a list of the ones currently on Cef's as well as an explanation of the one that I know. I would appreciate if people could fill in for some others, I'll update this post with the explanations as people make posts. Please try to use the format that I use below for the thread pool patch so that I don't have to parse through it for the info. Thanks in advance for all who contribute!

Thread Pool Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_thread_pool.04c.r680.diff
Other Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_thread_pool.r680.diff
Origin: http://forum.doom9.org/showthread.php?t=124557
Creator: akupenguin
Maintainer: morph166955/Cef
Description: Forces x264 to use the same threads over and over again instead of creating and destroying threads as needed. Speed benefits seen on Quad-Core and Octa-Core machines (as much as a 20% speed boost seen on my Octa-Core), either little or negative speed change seen on single and dual core systems. The current revision on Cef's site was modified by him to work with r680, the one on my site is basically the same.
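
To make the idea of reusing threads concrete, here is a minimal sketch of a fixed worker pool using POSIX threads. It is not the patch's code: every name in it (pool_t, pool_init, pool_submit, pool_destroy, square_job, POOL_THREADS, QUEUE_SIZE) is hypothetical, and it only illustrates the general technique of creating the workers once and feeding them jobs through a queue instead of creating and joining a thread per job.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_THREADS 4   /* hypothetical worker count */
#define QUEUE_SIZE   64  /* hypothetical queue capacity */

typedef void (*job_fn)(void *arg);

typedef struct {
    pthread_t       threads[POOL_THREADS];
    pthread_mutex_t lock;
    pthread_cond_t  has_work;  /* signalled when a job is queued or on shutdown */
    pthread_cond_t  has_room;  /* signalled when a job leaves the queue */
    job_fn          fn[QUEUE_SIZE];
    void           *arg[QUEUE_SIZE];
    int             head, count, shutdown;
} pool_t;

/* Each worker lives for the whole lifetime of the pool and loops over jobs. */
static void *pool_worker(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutdown)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        if (pool->count == 0) {          /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        job_fn fn  = pool->fn[pool->head];
        void  *arg = pool->arg[pool->head];
        pool->head = (pool->head + 1) % QUEUE_SIZE;
        pool->count--;
        pthread_cond_signal(&pool->has_room);
        pthread_mutex_unlock(&pool->lock);
        fn(arg);                         /* run the job outside the lock */
    }
}

/* Create the workers once. Returns 0 on success, -1 if a thread could not
 * be started (a real implementation would also clean up in that case). */
static int pool_init(pool_t *pool)
{
    pool->head = pool->count = pool->shutdown = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    pthread_cond_init(&pool->has_room, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&pool->threads[i], NULL, pool_worker, pool) != 0)
            return -1;
    return 0;
}

/* Queue a job; blocks while the queue is full. */
static void pool_submit(pool_t *pool, job_fn fn, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->count == QUEUE_SIZE)
        pthread_cond_wait(&pool->has_room, &pool->lock);
    int tail = (pool->head + pool->count) % QUEUE_SIZE;
    pool->fn[tail]  = fn;
    pool->arg[tail] = arg;
    pool->count++;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers finish all queued jobs, then join and release them. */
static void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
    pthread_cond_destroy(&pool->has_room);
}

/* Example job: square an int in place. Each job touches only its own int. */
static void square_job(void *arg)
{
    int *v = arg;
    *v = *v * *v;
}
```

The saving comes from paying the thread creation cost once in pool_init rather than once per job, which fits the report above that the benefit shows up mainly on machines with many cores.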

Faster DIA patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_faster-dia.diff
Current: http://www.benswebs.com/public/x264/patches/x264_faster-dia.r680.diff
Creator/Maintainer: Dark Shikari
Description: Tiny patch, 3.5% faster DIA for better first pass.

Subme 7 Improvement
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_subme7_vc8.diff
Creator/Maintainer: Dark Shikari
Description: Improved subme 7. Basically no speed impact, small quality boost.

SATD ESA Fullpel Comparison Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_satd_fpel.11.diff
Creator/Maintainer: Dark Shikari
Description: Allows SATD to be used as a fullpel comparison metric. Totally useless with any search other than ESA, since the SATD ESA has been optimized so well by Akupenguin.

ME Prepass Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_me-prepass_ham.diff (use with hadamard patch)
Current: http://www.benswebs.com/public/x264/patches/x264_me-prepass_noham.diff (use without hadamard patch)
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_me-prepass.diff
Creator/Maintainer: Dark Shikari
Description: Runs an ME prepass on the predictors before actually doing the motion search. Somewhat bugged--it can probably be a lot better than it currently is.

IMH Motion Estimation Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_IMH.diff
Creator/Maintainer: Dark Shikari
Description: A motion search slower than UMH but faster than ESA. Not that worthwhile since ESA is now threaded.

HD HRD/Pulldown Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_hrd_pulldown.diff
Creator: Ian Caulfield/Trahald
Description: HRD and pulldown for HD compatibility.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_bssd.diff

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_bchanges.diff

AQ/BRDO Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_aq-brdo.diff
Description: This was added to the source a while ago, fixing a bug with AQ and BRDO.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_2pass_vbv.diff

Second Pass ETA Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_fp-eta.01.r680.diff
Creator/Maintainer: morph166955
Description: Forces x264 to use the frame count from the stats file on a second pass if the frame count can't be calculated for some reason (such as the use of a fifo pipe).
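
The fallback described for the Second Pass ETA patch can be sketched as below. This is only an illustration under assumptions, not the patch itself: the function names (eta_total_frames, eta_seconds) and the convention that a non-positive count means "unknown" are hypothetical.

```c
/* Pick the frame total used for progress/ETA reporting.
 * input_frames <= 0 means the input could not report its length
 * (for example when reading from a pipe); in that case fall back to
 * the frame count recorded in the first-pass stats file. */
static int eta_total_frames(int input_frames, int stats_frames)
{
    if (input_frames > 0)
        return input_frames;
    if (stats_frames > 0)
        return stats_frames;
    return 0; /* still unknown */
}

/* Estimated seconds remaining, assuming a constant encoding rate.
 * Returns -1.0 when no estimate is possible. */
static double eta_seconds(int done, int total, double elapsed)
{
    if (total <= 0 || done <= 0 || done > total)
        return -1.0;
    return elapsed * (total - done) / done;
}
```

For example, with no frame count from the input, 1500 frames in the stats file, and 500 frames done after 10 seconds, the estimate is 20 seconds remaining.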

Dark Shikari

29th September 2007, 20:07

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_faster-dia.diff
Tiny patch, 3.5% faster DIA for better first pass.
http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_subme7_vc8.diff
Improved subme 7. Basically no speed impact, small quality boost.
http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_satd_fpel.11.diff
Allows SATD to be used as a fullpel comparison metric. Totally useless with any search other than ESA, since the SATD ESA has been optimized so well by Akupenguin.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_me-prepass.diff

Runs an ME prepass on the predictors before actually doing the motion search. Somewhat bugged--it can probably be a lot better than it currently is.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_IMH.diff

A motion search slower than UMH but faster than ESA. Not that worthwhile since ESA is now threaded.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_hrd_pulldown.diff

HRD and pulldown for HD compatibility.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_aq-brdo.diff

This was added to the source a while ago, fixing a bug with AQ and BRDO.

J_Darnley

30th September 2007, 01:39

There is another patch, the clock/timing/progress one. I don't know if it still works, the diff I have is from rev. 614

http://users.telenet.be/darnley/x264_clock1-614.diff

It prints the total encoding time and prints progress 10000 times per file instead of 1000.

Sharktooth

30th September 2007, 02:25

moooo

Dark Shikari

30th September 2007, 06:30

Here's my fixed ME_Prepass patch.

Index: common/common.c
===================================================================
--- common/common.c (revision 675)
+++ common/common.c (working copy)
@@ -441,6 +441,8 @@
p->analyse.i_mv_range_thread = atoi(value);
OPT2("subme", "subq")
p->analyse.i_subpel_refine = atoi(value);
+ OPT2("me-prepass", "meprepass")
+ p->analyse.i_me_prepass = atobool(value);
OPT("bime")
p->analyse.b_bidir_me = atobool(value);
OPT("chroma-me")
@@ -879,6 +881,7 @@
s += sprintf( s, " analyse=%#x:%#x", p->analyse.intra, p->analyse.inter );
s += sprintf( s, " me=%s", x264_motion_est_names[ p->analyse.i_me_method ] );
s += sprintf( s, " subme=%d", p->analyse.i_subpel_refine );
+ s += sprintf( s, " me-prepass=%d", p->analyse.i_me_prepass );
s += sprintf( s, " brdo=%d", p->analyse.b_bframe_rdo );
s += sprintf( s, " mixed_ref=%d", p->analyse.b_mixed_references );
s += sprintf( s, " me_range=%d", p->analyse.i_me_range );
Index: encoder/me.c
===================================================================
--- encoder/me.c (revision 675)
+++ encoder/me.c (working copy)
@@ -61,6 +61,23 @@
COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
}

+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+ int stride = 16; \
+ uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+ cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}
+
+#define COST_MV_HPEL3( mx, my) \
+{ \
+ int stride = 16; \
+ uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+ int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+ COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
+}
+
#define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
{\
uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
@@ -177,18 +194,85 @@
pmx = ( bmx + 2 ) >> 2;
pmy = ( bmy + 2 ) >> 2;
bcost = COST_MAX;
-
+
/* try extra predictors if provided */
if( h->mb.i_subpel_refine >= 3 )
{
COST_MV_HPEL( bmx, bmy );
- for( i = 0; i < i_mvc; i++ )
+ if(!h->param.analyse.i_me_prepass)
{
- const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
- const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
- if( mx != bpred_mx || my != bpred_my )
- COST_MV_HPEL( mx, my );
+ for( i = 0; i < i_mvc; i++ )
+ {
+ const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+ const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+ if( mx != bpred_mx || my != bpred_my )
+ COST_MV_HPEL( mx, my );
+ }
+ }
+ else
+ {
+ for( i = 0; i < i_mvc; i++ )
+ {
+ const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+ const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+ int doSearch = 1;
+ int j;
+ for(j = 0; j < i; j++)
+ {
+ if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
+ }
+ if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
+ {
+ int bestcost;
+ int bestx = mx;
+ int besty = my;
+ COST_MV_HPEL2( mx, my, bestcost );
+ COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
+ if(bestcost < 2*bpred_cost)
+ {
+ int n;
+ int dir = -2;
+ COST_MV_HPEL2(bestx-4,besty,costs[0]);
+ COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
+ COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
+ COST_MV_HPEL2(bestx+4,besty,costs[3]);
+ COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
+ COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
+ COPY2_IF_LT( bestcost, costs[0], dir, 0 );
+ COPY2_IF_LT( bestcost, costs[1], dir, 1 );
+ COPY2_IF_LT( bestcost, costs[2], dir, 2 );
+ COPY2_IF_LT( bestcost, costs[3], dir, 3 );
+ COPY2_IF_LT( bestcost, costs[4], dir, 4 );
+ COPY2_IF_LT( bestcost, costs[5], dir, 5 );
+ if( dir != -2 )
+ {
+ static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
+ bestx += hex2[dir+1][0];
+ besty += hex2[dir+1][1];
+ for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
+ {
+ static const int mod6[8] = {5,0,1,2,3,4,5,0};
+ const int odir = mod6[dir+1];
+ COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
+ COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
+ COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
+ dir = -2;
+ COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
+ COPY2_IF_LT( bestcost, costs[1], dir, odir );
+ COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
+ if( dir == -2 )
+ break;
+ bestx += hex2[dir+1][0];
+ besty += hex2[dir+1][1];
+ }
+ }
+ COST_MV_HPEL3(bestx+2,besty-2);
+ COST_MV_HPEL3(bestx+2,besty);
+ COST_MV_HPEL3(bestx+2,besty+2);
+ COST_MV_HPEL3(bestx,besty-2);
+ COST_MV_HPEL3(bestx,besty+2);
+ COST_MV_HPEL3(bestx-2,besty-2);
+ COST_MV_HPEL3(bestx-2,besty);
+ COST_MV_HPEL3(bestx-2,besty+2);
+ COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
+ }
+ }
+ }
}
bmx = ( bpred_mx + 2 ) >> 2;
bmy = ( bpred_my + 2 ) >> 2;
COST_MV( bmx, bmy );
}
Index: x264.c
===================================================================
--- x264.c (revision 675)
+++ x264.c (working copy)
@@ -232,7 +232,8 @@
H1( " --mvrange-thread <int> Minimum buffer between threads [-1 (auto)]\n" );
H0( " -m, --subme <integer> Subpixel motion estimation and partition\n"
" decision quality: 1=fast, 7=best. [%d]\n", defaults->analyse.i_subpel_refine );
- H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6.\n" );
+ H0( " --me-prepass Run an ME prepass on predictors. Requires subme 3 or higher.\n");
+ H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6 or higher.\n" );
H0( " --mixed-refs Decide references on a per partition basis\n" );
H1( " --no-chroma-me Ignore chroma in motion estimation\n" );
H1( " --bime Jointly optimize both MVs in B-frames\n" );
@@ -398,6 +399,7 @@
{ "mvrange", required_argument, NULL, 0 },
{ "mvrange-thread", required_argument, NULL, 0 },
{ "subme", required_argument, NULL, 'm' },
+ { "me-prepass", no_argument, NULL, 0 },
{ "b-rdo", no_argument, NULL, 0 },
{ "mixed-refs", no_argument, NULL, 0 },
{ "no-chroma-me", no_argument, NULL, 0 },
Index: x264.h
===================================================================
--- x264.h (revision 675)
+++ x264.h (working copy)
@@ -220,6 +220,7 @@
int i_mv_range; /* maximum length of a mv (in pixels). -1 = auto, based on level */
int i_mv_range_thread; /* minimum space between threads. -1 = auto, based on number of threads. */
int i_subpel_refine; /* subpixel motion estimation quality */
+ int i_me_prepass; /* run an ME prepass on predictors */
int b_bidir_me; /* jointly optimize both MVs in B-frames */
int b_chroma_me; /* chroma ME for subpel and mode decision in P-frames */
int b_bframe_rdo; /* RD based mode decision for B-frames */

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.
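
For readers who want the shape of the search without reading the diff, here is a simplified sketch of a hexagon search that only visits half-pel positions, followed by a square refinement. It is not the patch: it re-evaluates all six hexagon points on every iteration instead of the three-point update used in the diff, it ignores motion-vector range limits, and toy_cost and hpel_hex_search are hypothetical names. Coordinates are in quarter-pel units, so steps of 2 and 4 stay on the half-pel grid.

```c
#include <stdlib.h>

typedef int (*cost_fn)(int mx, int my);

/* Toy cost for illustration: L1 distance to a made-up "true" motion
 * vector at (10, -6) in quarter-pel units. */
static int toy_cost(int mx, int my)
{
    return abs(mx - 10) + abs(my + 6);
}

/* Hexagon search on the half-pel grid starting from (*mx, *my),
 * then one pass over the 8 half-pel neighbours of the final centre. */
static void hpel_hex_search(cost_fn cost, int *mx, int *my, int max_iter)
{
    static const int hex[6][2] = {
        {-4, 0}, {-2, 4}, {2, 4}, {4, 0}, {2, -4}, {-2, -4}
    };
    static const int sq[8][2] = {
        {2, -2}, {2, 0}, {2, 2}, {0, -2}, {0, 2}, {-2, -2}, {-2, 0}, {-2, 2}
    };
    int bx = *mx, by = *my;
    int best = cost(bx, by);

    for (int n = 0; n < max_iter; n++) {
        int dir = -1;
        for (int i = 0; i < 6; i++) {
            int c = cost(bx + hex[i][0], by + hex[i][1]);
            if (c < best) { best = c; dir = i; }
        }
        if (dir < 0)
            break;              /* centre is the best point: stop */
        bx += hex[dir][0];
        by += hex[dir][1];
    }

    /* Square refinement around the final hexagon centre. */
    int cx = bx, cy = by;
    for (int i = 0; i < 8; i++) {
        int c = cost(cx + sq[i][0], cy + sq[i][1]);
        if (c < best) { best = c; bx = cx + sq[i][0]; by = cy + sq[i][1]; }
    }
    *mx = bx;
    *my = by;
}
```

Because every step is a multiple of 2 quarter-pels, a search started on a half-pel position never lands on a quarter-pel one, which is the "eliminating the qpel aspect" part of the change.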

morph166955

30th September 2007, 08:33

awesome! i'm heading off to bed but i'll update the first post in the morning.

One thing I noticed though was that the faster-dia patch came up saying unexpected end of file when I ran it (looked like it was missing a new line at the end). Just wanted to make sure it was just that and not a missing bit of code at the end or something.

Dark Shikari

30th September 2007, 08:34

awesome! i'm heading off to bed but i'll update the first post in the morning.

One thing I noticed though was that the faster-dia patch came up saying unexpected end of file when I ran it (looked like it was missing a new line at the end). Just wanted to make sure it was just that and not a missing bit of code at the end or something.
Nah, don't worry, just me being retarded I think.

Terranigma

30th September 2007, 16:36

Here's my fixed ME_Prepass patch.

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.

I would love to see a custom build with this a.s.a.p. :D

morph166955

30th September 2007, 18:41

Ok I just updated the first post a little, needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff on the new ME_Prepass that you posted the code for above as well as making a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs as well as Cef's for people who want them.

le_canz

30th September 2007, 18:45

:thanks:

morph166955

30th September 2007, 20:58

Here's my fixed ME_Prepass patch.

Patch refuses to compile.
encoder/me.c: In function 'x264_me_search_ref':
encoder/me.c:229: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:235: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:236: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:237: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:238: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:239: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:240: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:252: warning: implicit declaration of function 'CHECK_MVRANGE4'
encoder/me.c:256: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:257: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:258: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:269: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:270: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:271: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:272: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:273: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:274: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:275: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:276: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
make: *** [encoder/me.o] Error 1

Dark Shikari

30th September 2007, 21:06

Patch refuses to compile.
encoder/me.c: In function 'x264_me_search_ref':
encoder/me.c:229: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:235: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:236: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:237: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:238: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:239: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:240: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:252: warning: implicit declaration of function 'CHECK_MVRANGE4'
encoder/me.c:256: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:257: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:258: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:269: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:270: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:271: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:272: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:273: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:274: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:275: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:276: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
make: *** [encoder/me.o] Error 1

That's because if you want to compile ME-prepass without the Hadamard patch (--fpelcmp), you must replace all instances of "fpelcmp" with "sad" in me.c. Find/replace. They modify some of the same lines so I can't create a generic patch for this reason.

And also, oops, another small mistake in the patch--should be easily fixable:

replace
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
with
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];
int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )

I forgot to include this in the diff.

Sorry I don't have a better diff, but my version control is nonexistent ;) I assume you can probably make a new diff once these are fixed.
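
The point of the second macro is that the prepass works on quarter-pel coordinates, while the existing limits are in full pels (1 full pel = 4 quarter pels). A small standalone sketch of the same check follows; mv_range_t, range_to_qpel and mv_in_range are hypothetical names, and the sketch multiplies by 4 instead of shifting because left-shifting a negative int is undefined behaviour in ISO C.

```c
/* Inclusive motion-vector limits, in whatever unit the caller chooses. */
typedef struct {
    int x_min, y_min, x_max, y_max;
} mv_range_t;

/* Convert full-pel limits to quarter-pel limits. */
static mv_range_t range_to_qpel(mv_range_t fpel)
{
    mv_range_t q = { fpel.x_min * 4, fpel.y_min * 4,
                     fpel.x_max * 4, fpel.y_max * 4 };
    return q;
}

/* Same test as the CHECK_MVRANGE macros, written as a function so the
 * arguments are evaluated once and need no extra parentheses. */
static int mv_in_range(mv_range_t r, int mx, int my)
{
    return mx >= r.x_min && mx <= r.x_max &&
           my >= r.y_min && my <= r.y_max;
}
```

A quarter-pel vector such as (40, -64) is valid for a full-pel range of -16..16, but would be rejected if it were compared against the unscaled limits, which is the kind of mismatch the extra macro avoids.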

morph166955

30th September 2007, 21:47

got it, new diffs created, original post updated to have both options (w/ and w/o hadamard)

Dark Shikari

30th September 2007, 21:55

Some mistakes in your post... not all those that I explained were created or maintained by me ;)

AQ/BRDO isn't mine, HRD isn't mine.

lexor

30th September 2007, 21:56

I've asked this question in the multi-thread discussion, but answers were inconclusive, so I'll ask it here again.

Are these patches actually applied to Cef's builds? At first I was told that thread_pool patch is applied, but then someone said that it didn't work with 680 and had to be fixed (which it is now), so it couldn't have been applied before.

So perhaps we need another line for each patch stating if it is applied?

morph166955

30th September 2007, 21:57

fixed.

Trahald

30th September 2007, 22:29

the hrd in the patch was done by Ian Caulfield and the pulldown part I added.

morph166955

30th September 2007, 22:32

updated.

martino

30th September 2007, 22:51

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks

Sagekilla

30th September 2007, 22:58

Would be nice if there was a little scrollover icon that would tell you what patches (that haven't been merged in to the main build) are applied to it..

Edit: Also, I too would like to know what patches are in cef's latest build.

Terranigma

1st October 2007, 01:05

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks
I guess you didn't see This (http://forum.doom9.org/showpost.php?p=1048519&postcount=113) ?
Cef, you think you could update the exp build for now and include AQ, Thread Pool, and the new ME-Prepass patch, then for future references, include me-prepass in your regular builds? :scared:

Dark Shikari

1st October 2007, 01:12

Also add the --subme 7 patch, since it's been proven quite thoroughly to increase quality in basically all cases at minimal speed cost.

Faster DIA, IMH, and SATD shouldn't be applied yet. One thought I did have was to instead of making SATD an option for all ME search methods, instead add a 5th ME search method:

DIA
HEX
UMH
ESA
HES (Hadamard Exhaustive Search: Better than all the above methods, but correspondingly slower)

The reason for this is simply that Aku's testing showed that SATD slowed down all the other methods so much that it was better to use SAD ESA than SATD anything else. However, SATD ESA is so heavily optimized that it's still useful, and not too much slower than regular ESA.
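
For reference, the two metrics being compared can be sketched on a 4x4 block as follows. This is a plain, unoptimized illustration and not x264's implementation; the function names are hypothetical, and halving the SATD sum is an assumed normalisation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 4x4 block. */
static int sad_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(a[y * sa + x] - b[y * sb + x]);
    return sum;
}

/* Sum of absolute transformed differences: apply a 4x4 Hadamard
 * transform to the difference block, then sum the absolute values. */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int d[4][4], t[4][4], sum = 0;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];

    /* 4-point Hadamard on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* 4-point Hadamard on each column, accumulating absolute values. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum / 2; /* assumed normalisation */
}
```

With these definitions a flat difference of 3 on every pixel gives SAD 48 and SATD 24, while a single pixel differing by 8 gives SAD 8 and SATD 64. The extra transform work per candidate is why SATD is the slower metric.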

Terranigma

1st October 2007, 01:18

Also add the --subme 7 patch, since it's been proven quite thoroughly to increase quality in basically all cases at minimal speed cost.
Oh I thought the new subme-7 made it to the svn, seems I was wrong :p

Faster DIA, IMH, and SATD shouldn't be applied yet.

I agree with you on imh and dia (At first I was all for imh, but suddenly I changed my mind :D).

One thought I did have was to instead of making SATD an option for all ME search methods, instead add a 5th ME search method:

DIA
HEX
UMH
ESA
HES (Hadamard Exhaustive Search: Better than all the above methods, but correspondingly slower)

The reason for this is simply that Aku's testing showed that SATD slowed down all the other methods so much that it was better to use SAD ESA than SATD anything else. However, SATD ESA is so heavily optimized that it's still useful, and not too much slower than regular ESA.

I like this idea. Only allowing SATD to be used with the motion search algorithm it gains any real benefit from. I'm all for the new --hes :)

akupenguin

1st October 2007, 02:07

Not sure I like HES, too similar to HEX. Maybe TES (same "T"ransform as in SATD). Or ESH.
But I'm not sure that SATD is only useful in ESA: There are good reasons that integral-based successive elimination for SAD can only be efficient in ESA, but it's possible that SAD-based successive elimination for SATD could work in other search patterns. The cost of a SAD or a SATD is high enough that the overhead of random access needn't be fatal.
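
As background for the term "successive elimination": for SAD it rests on the bound |sum(cur) - sum(ref)| <= SAD(cur, ref), so a candidate whose block sums already differ by at least the best SAD so far cannot win and its SAD need not be computed. The sketch below shows this in one dimension with prefix sums standing in for the integral image. It is an illustration only, not x264's ESA code, and all names and sizes in it are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK     8    /* block length (hypothetical) */
#define MAX_REF 256  /* longest reference this sketch handles */

static int sad_1d(const uint8_t *a, const uint8_t *b)
{
    int s = 0;
    for (int i = 0; i < BLK; i++)
        s += abs(a[i] - b[i]);
    return s;
}

/* Exhaustive 1-D search of cur[0..BLK) over every offset in ref.
 * Returns the best offset (or -1 on bad input) and stores in *n_sads
 * how many full SADs had to be evaluated. */
static int sea_search_1d(const uint8_t *cur, const uint8_t *ref,
                         int ref_len, int *n_sads)
{
    int prefix[MAX_REF + 1];
    int cur_sum = 0, best = -1, best_cost = INT_MAX;

    *n_sads = 0;
    if (ref_len < BLK || ref_len > MAX_REF)
        return -1;

    /* prefix[i] = ref[0] + ... + ref[i-1], so any window sum is O(1). */
    prefix[0] = 0;
    for (int i = 0; i < ref_len; i++)
        prefix[i + 1] = prefix[i] + ref[i];
    for (int i = 0; i < BLK; i++)
        cur_sum += cur[i];

    for (int off = 0; off + BLK <= ref_len; off++) {
        int ref_sum = prefix[off + BLK] - prefix[off];
        if (abs(cur_sum - ref_sum) >= best_cost)
            continue;           /* lower bound cannot beat the best: skip */
        int cost = sad_1d(cur, ref + off);
        (*n_sads)++;
        if (cost < best_cost) { best_cost = cost; best = off; }
    }
    return best;
}
```

The cheap window sums depend on a structure built once over the whole search area, which is easy to amortise in an exhaustive scan and harder in patterns that visit only a few scattered positions.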

Dark Shikari

1st October 2007, 02:14

Not sure I like HES, too similar to HEX. Maybe TES (same "T"ransform as in SATD). Or ESH.
TES seems fine to me; it avoids starting with the same letter as any of the others.

fields_g

1st October 2007, 02:42

However, SATD ESA is so heavily optimized that it's still useful, and not too much slower than regular ESA.

Here's another solution... set regular ESA to SATD ESA?

Let SAD IMH be the (speed) middle ground between SAD UMH and SATD ESA?

akupenguin, could you update your chart here (http://forum.doom9.org/showthread.php?p=1047085#post1047085) with Dark Shikari's new ME-prepass found here (http://forum.doom9.org/showthread.php?p=1050161#post1050161)?

Depending on the results, I might even suggest regular ESA to be prepass SATD ESA.

I see it like this... If you are mad enough to do ESA, you are quite likely going to do SATD and prepass also.

Dark Shikari

1st October 2007, 02:43

Here's another solution... set regular ESA to SATD ESA?

Let SAD IMH be the (speed) middle ground between SAD UMH and SATD ESA?

akupenguin, could you update your chart here (http://forum.doom9.org/showthread.php?p=1047085#post1047085) with Dark Shikari's new ME-prepass found here (http://forum.doom9.org/showthread.php?p=1050161#post1050161)?

Depending on the results, I might even suggest regular ESA to be prepass SATD ESA.

I see it like this... If you are mad enough to do ESA, you are quite likely going to do SATD and prepass also.
Shouldn't force users to use what they don't want to--better to give them the option.

fields_g

1st October 2007, 02:51

Shouldn't force users to use what they don't want to--better to give them the option.

I agree.. I am just looking for some other option to making another me-type that really isn't anything but a preset. Make the defaults best practices and allow switches to deviate.

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.

Dark Shikari

1st October 2007, 02:54

I agree.. I am just looking for some other option to making another me-type that really isn't anything but a preset. Make the defaults best practices and allow switches to deviate.

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.
How is using --fpel-cmp satd on ESA a "best practice"? It's even slower than regular ESA, and so only useful for those who have even more time to waste.

Also note that if TES/whatever it's called uses SATD, the --fpel-cmp satd option will be removed.

akupenguin

1st October 2007, 03:24

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.
A new value for --me is simpler interface-wise than a new top-level option.

fields_g

1st October 2007, 03:32

How is using --fpel-cmp satd on ESA a "best practice"? It's even slower than regular ESA, and so only useful for those who have even more time to waste.

Also note that if TES/whatever it's called uses SATD, the --fpel-cmp satd option will be removed.

Using the chart (http://forum.doom9.org/showthread.php?p=1047085#post1047085) as an approximation roughly:
SAD - UMH - ME32
has the same quality as
SAD - ESA - ME7
and
SATD - ESA - ME4

However, the FPS is 47 vs. 42 vs. 39 respectively. SATD is slowest, but not by too much. Even though SATD is 7-8% slower, the magic is that as you add computation (me-range increments), SATD gains quality much quicker and peaks much higher. Additionally, SATD ESA-me6 beats the quality of SAD ESA-me12 at the same FPS! Therefore, SAD ESA only has a place for me-range less than 12.

So instead of telling people that ESA only has benefits over other ME-types from me-7 through me-12, you could tell them ESA picks up quality-wise where UMH stops. It seems a little cleaner to me. I just hope the explanation is understandable.

Maybe I'm a little more willing to throw computation at it than others, but I think the average person using ESA would usually do this anyway.

Dark Shikari

1st October 2007, 03:44

Using the chart (http://forum.doom9.org/showthread.php?p=1047085#post1047085) as an approximation roughly:
SAD - UMH - ME32
has the same quality as
SAD - ESA - ME7
and
SATD - ESA - ME4

However, the FPS is 47 vs. 42 vs. 39 respectively. SATD is slowest, but not by too much. Even though SATD is 7-8% slower, the magic is that as you add computation (me-range increments), SATD gains quality much quicker and peaks much higher. Additionally, SATD ESA-me6 beats the quality of SAD ESA-me12 at the same FPS! Therefore, SAD ESA only has a place for me-range less than 12.

So instead of telling people that ESA only has benefits over other ME-types from me-7 through me-12, you could tell them ESA picks up quality-wise where UMH stops. It seems a little cleaner to me. I just hope the explanation is understandable.

Maybe I'm a little more willing to throw computation at it than others, but I think the average person using ESA would usually do this anyway.
Except that on some sources, SATD is inferior to SAD as a metric ;)

Anime in particular seems to suffer from this, in my experience.

fields_g

1st October 2007, 03:55

Except that on some sources, SATD is inferior to SAD as a metric ;)

Anime in particular seems to suffer from this, in my experience.

I was just about to state that I might be overusing this single chart (source) a bit! I'll be downloading one of these fancy "bundle-o-patches" builds and start going at it! Quite honestly, I'll be able to follow either scheme and will be happy as long as I have prepass SATD ESA around in some form, especially with the improvements listed here (http://forum.doom9.org/showthread.php?p=1050161#post1050161).

Cef

1st October 2007, 11:00

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks

As I already said, my builds have AQ and thread pool applied. x264_xxx_dark was including all Dark_shikari's patches at the time it was posted (except faster first pass iirc), and x264_xxx_exp was a build requested by Sagittaire with some Dark's patches and hrd.

I completely agree this is confusing, my organization is terrible on this, but I don't have much time to dedicate to it, and I usually spend it fixing conflicts with new rev's or between patches. If you have any suggestion it's welcome.

martino

1st October 2007, 16:57

I guess you didn't see This (http://forum.doom9.org/showpost.php?p=1048519&postcount=113) ?
I did in fact, and heck. At this point there are two AQ patches. ;_;

But I can say at this point that it's the "old" one. And thanks Cef for explaining.

I'm not sure whether I'd have any good suggestions, but perhaps just a small txt in the directory where your builds are located (on x264.nl) which would state which patch(es) was/were applied to which build(s). Or maybe if morph would be so kind to interpret this into the introductory post in this thread... Whatever works really.

Terranigma

1st October 2007, 17:46

martino, do you know how to compile x264? It doesn't look like anyone's too eager to compile an experimental build with aq, thread pool, new subme7, new pre-pass, and keep --fpel-cmp sad/satd like it is as suggested by akupenguin, and if you must implement the new aq, add it as an optional command. Maybe something like aq2-strength. You can find the latest aq2 algorithm by Dark Shikari here (http://forum.doom9.org/showpost.php?p=1046740&postcount=41). I'm not sure if that's the latest, so he's the only one that can confirm or deny this.

Dark Shikari

1st October 2007, 18:14

martino, do you know how to compile x264? It doesn't look like anyone's too eager to compile an experimental build with aq, thread pool, new subme7, new pre-pass, and keep --fpel-cmp sad/satd like it is as suggested by akupenguin, and if you must implement the new aq, add it as an optional command. Maybe something like aq2-strength. You can find the latest aq2 algorithm by Dark Shikari here (http://forum.doom9.org/showpost.php?p=1046740&postcount=41). I'm not sure if that's the latest, so he's the only one that can confirm or deny this.
New AQ is definitely not ready, and that is quite old IIRC.

Don't add it yet. It's way too experimental.

DeathTheSheep

2nd October 2007, 00:10

I think the reason why most people don't have a go at compiling these patches is that the patches themselves are quite troublesome to apply. :)

I did it with a lot of manual patching, so it's definitely possible. But of course I also messed with a lot of other stuff in the code and then finally deleted the folder. I think it's best at this point to wait it out until more stability/commits/developments occur. Else just use the older build Cef (?) made, there shouldn't be much difference.

Terranigma

2nd October 2007, 00:35

I think it's best at this point to wait it out until more stability/commits/developments occur. Else just use the older build Cef (?) made, there shouldn't be much difference.

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.

42% is a huge difference, or so I would think :scared:

Dark Shikari

2nd October 2007, 00:56

42% is a huge difference, or so I would think :scared:

Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.
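
The arithmetic above can be sketched in a few lines. This is only an illustration of relative versus absolute gain: the 2% baseline is the hypothetical figure from this post, not a measurement, and the function name is made up for the example.

```python
def absolute_gain(baseline_gain_pct, relative_improvement_pct):
    """Scale a baseline gain (in percent) by a relative improvement (in percent)."""
    return baseline_gain_pct * (1 + relative_improvement_pct / 100)


# Hypothetical baseline: the old ME prepass gives a 2% quality boost.
old_gain = 2.0
new_gain = absolute_gain(old_gain, 42.0)

# The "42% better" figure only adds the difference between the two gains.
extra_points = new_gain - old_gain
print(f"new prepass: {new_gain:.2f}% boost, {extra_points:.2f} points over the old prepass")
```

So a "42% better" patch moves the overall quality gain by well under one percentage point when the baseline gain is small.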

Terranigma

2nd October 2007, 01:04

Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.

good enough for me, and it's faster to boot. :)

DeathTheSheep

2nd October 2007, 07:18

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. :p But anyone who uses merange 4 is truly insane, and for a different reason. :D

Dark Shikari

2nd October 2007, 07:36

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. :p But anyone who uses merange 4 is truly insane, and for a different reason. :D

That would be because the one in the original post is the old ME patch, which still hasn't been updated ;)

DeathTheSheep

2nd October 2007, 07:40

OK, I just updated the first post a little; it needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff of the new ME_Prepass that you posted the code for above, as well as a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs, as well as Cef's, for people who want them.

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now... :D

Dark Shikari

2nd October 2007, 08:20

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now... :D
The numbers in the patch look quite different from those in the diff I posted :p

fields_g

2nd October 2007, 16:47

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!

Dark Shikari

2nd October 2007, 16:58

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!

SVN diff does this.

fields_g

2nd October 2007, 17:29

SVN diff does this.

Great! Trying to identify a patch revision as "the one found in post xxx" or, for example, "the original AQ" vs. "Dark Shikari's old AQ" vs. "Dark Shikari's new AQ" gets complicated and limiting once there are more than a couple of variations. (Don't you love that we have bright people here developing new things to try?)

This will help people who are making builds explicitly describe what is in their builds also!
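
As a rough sketch of how a build maker could read that information back out: `svn diff` writes the base revision into each file's `---` header line, so a short script can list which revision a patch was made against. The sample diff text and the function name below are made up for illustration, and the script assumes the diff was produced by `svn diff` against a single committed revision.

```python
import re

# `svn diff` header lines look like:
#   --- encoder/me.c    (revision 680)
#   +++ encoder/me.c    (working copy)
REV_RE = re.compile(r"^--- (?P<path>.+?)\s+\(revision (?P<rev>\d+)\)$")


def base_revisions(diff_text):
    """Map each patched file to the base revision named in its '---' header."""
    revs = {}
    for line in diff_text.splitlines():
        match = REV_RE.match(line)
        if match:
            revs[match.group("path")] = int(match.group("rev"))
    return revs


# Made-up sample header, not taken from any real patch in this thread.
sample = """Index: encoder/me.c
===================================================================
--- encoder/me.c\t(revision 680)
+++ encoder/me.c\t(working copy)
@@ -1,3 +1,3 @@
"""

print(base_revisions(sample))
```

A build description could then say "me-prepass diff against r680" instead of "the one found in post xxx".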

burfadel

2nd October 2007, 17:50

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4, and 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass command!

fields_g

2nd October 2007, 18:19

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4, and 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass command!

Interesting... I'm not sure, but either you are suggesting a new approach or are mixing two different (though related) things together.

1) There is a ME type: Dia, Hex, UMH, ESA
2) There is a subpixel refinement of 1-7

Earlier discussion raised the question of making prepass dependent on ME type (if ESA then ON, else OFF), not on subpixel refinement. I'll let someone else comment on how wise it would be to connect prepass to subpixel refinement.
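
To make the difference between the two ideas concrete, here is a small sketch of both rules side by side. It is not x264 code: the function names and the `user_choice` parameter are hypothetical, and leaving prepass off below subme 3 is an assumption, since the suggestion above does not cover those modes.

```python
ME_TYPES = ("dia", "hex", "umh", "esa")


def prepass_by_me_type(me):
    """Earlier idea: tie prepass to the ME type (ESA on, everything else off)."""
    if me not in ME_TYPES:
        raise ValueError(f"unknown ME type: {me}")
    return me == "esa"


def prepass_by_subme(subme, user_choice=False):
    """burfadel's idea: always on for subme 6-7, optional for subme 3-5."""
    if not 1 <= subme <= 7:
        raise ValueError(f"subme must be 1-7, got {subme}")
    if subme >= 6:
        return True
    if subme >= 3:
        return user_choice  # optional: follows whatever the user asked for
    return False  # assumption: the post does not say what happens for subme 1-2


# The two rules can disagree for the same settings, e.g. UMH with subme 7:
print(prepass_by_me_type("umh"), prepass_by_subme(7))
```

The two rules key off independent settings, which is why mixing them up changes which encodes would get the prepass.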