#1 | Link |
Registered User
Join Date: Mar 2006
Posts: 443
|
Current Patches, Where to get them, How they affect speed/output
Something I've noticed is that while we have Cef's repository of patches that he uses on his builds at http://mirror05.x264.nl/Cef/?dir=./patches, there is no central place that explains what they do, what their effect on both the speed of the encode and the output is, who wrote them or where they originated, and, if they aren't on Cef's site, where to get them (or where they originally came from, in case they are updated elsewhere and not on Cef's site). I have included below a list of the ones currently on Cef's site, along with explanations of the ones that I know. I would appreciate it if people could fill in some of the others; I'll update this post with the explanations as people post them. Please try to use the format that I use below for the thread pool patch so that I don't have to parse through it for the info. Thanks in advance to all who contribute!

Thread Pool Patch:
Current: http://www.benswebs.com/public/x264/....04c.r680.diff
Other Current: http://mirror05.x264.nl/Cef/force.ph...pool.r680.diff
Origin: http://forum.doom9.org/showthread.php?t=124557
Creator: akupenguin
Maintainer: morph166955/Cef
Description: Forces x264 to use the same threads over and over again instead of creating and destroying threads as needed. Speed benefits seen on Quad-Core and Octa-Core machines (as much as a 20% speed boost seen on my Octa-Core); either little or negative speed change seen on single and dual core systems. The current revision on Cef's site was modified by him to work with r680; the one on my site is basically the same.

Faster DIA Patch:
Current: http://mirror05.x264.nl/Cef/force.ph...aster-dia.diff
Current: http://www.benswebs.com/public/x264/...-dia.r680.diff
Creator/Maintainer: Dark Shikari
Description: Tiny patch, 3.5% faster DIA for better first pass.

Subme 7 Improvement:
Current: http://mirror05.x264.nl/Cef/force.ph...ubme7_vc8.diff
Creator/Maintainer: Dark Shikari
Description: Improved subme 7. Basically no speed impact, small quality boost.

SATD ESA Fullpel Comparison Patch:
Current: http://mirror05.x264.nl/Cef/force.ph...d_fpel.11.diff
Creator/Maintainer: Dark Shikari
Description: Allows SATD to be used as a fullpel comparison metric. Totally useless with any search other than ESA, since the SATD ESA has been optimized so well by Akupenguin.

ME Prepass Patch:
Current: http://www.benswebs.com/public/x264/...epass_ham.diff (use with hadamard patch)
Current: http://www.benswebs.com/public/x264/...ass_noham.diff (use without hadamard patch)
Current: http://mirror05.x264.nl/Cef/force.ph...e-prepass.diff
Creator/Maintainer: Dark Shikari
Description: Runs an ME prepass on the predictors before actually doing the motion search. Somewhat bugged; it can probably be a lot better than it currently is.

IMH Motion Estimation Patch:
Current: http://mirror05.x264.nl/Cef/force.ph.../x264_IMH.diff
Creator/Maintainer: Dark Shikari
Description: A motion search slower than UMH but faster than ESA. Not that worthwhile since ESA is now threaded.

HD HRD/Pulldown Patch:
Current: http://mirror05.x264.nl/Cef/force.ph..._pulldown.diff
Creator: Ian Caulfield/Trahald
Description: HRD and pulldown for HD compatibility.

http://mirror05.x264.nl/Cef/force.ph...x264_bssd.diff
http://mirror05.x264.nl/Cef/force.ph..._bchanges.diff

AQ/BRDO Patch:
Current: http://mirror05.x264.nl/Cef/force.ph...4_aq-brdo.diff
Description: This was added to the source a while ago, fixing a bug with AQ and BRDO.

http://mirror05.x264.nl/Cef/force.ph...2pass_vbv.diff

Second Pass ETA Patch:
Current: http://www.benswebs.com/public/x264/...a.01.r680.diff
Creator/Maintainer: morph166955
Description: Forces x264 to use the frame count from the stats file on a second pass if the frame count can't be calculated for some reason (such as the use of a FIFO pipe).

Last edited by morph166955; 30th September 2007 at 22:32. Reason: added descriptions on some patches
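To illustrate what the thread pool description above means by reusing the same threads instead of creating and destroying them, here is a minimal sketch using POSIX threads. It is not the code from the patch: every name in it (pool_t, pool_create, pool_submit, pool_wait, pool_destroy, square_job) is hypothetical, and the real patch may be organised quite differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names here are hypothetical; nothing is taken from the actual patch. */
typedef struct job { void (*fn)(void *); void *arg; struct job *next; } job_t;

typedef struct {
    pthread_t *threads;
    int n_threads;
    job_t *head, *tail;          /* FIFO of jobs not yet started */
    int pending;                 /* jobs queued or still running */
    int shutdown;
    pthread_mutex_t lock;
    pthread_cond_t has_work, idle;
} pool_t;

/* Each worker is created once and loops, picking up jobs until shutdown. */
static void *worker(void *p)
{
    pool_t *pool = p;
    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (!pool->head && !pool->shutdown)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        if (!pool->head) break;  /* shutdown requested and queue is empty */
        job_t *j = pool->head;
        pool->head = j->next;
        if (!pool->head) pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        j->fn(j->arg);           /* run the job without holding the lock */
        free(j);
        pthread_mutex_lock(&pool->lock);
        if (--pool->pending == 0)
            pthread_cond_broadcast(&pool->idle);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

/* Lets queued jobs drain, then joins the workers and frees the pool. */
void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->n_threads; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
    pthread_cond_destroy(&pool->idle);
    free(pool->threads);
    free(pool);
}

pool_t *pool_create(int n_threads)
{
    assert(n_threads > 0);
    pool_t *pool = calloc(1, sizeof(*pool));
    if (!pool) return NULL;
    pool->threads = calloc((size_t)n_threads, sizeof(*pool->threads));
    if (!pool->threads) { free(pool); return NULL; }
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    pthread_cond_init(&pool->idle, NULL);
    for (int i = 0; i < n_threads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            break;               /* keep whatever threads did start */
        pool->n_threads++;
    }
    if (pool->n_threads == 0) { pool_destroy(pool); return NULL; }
    return pool;
}

int pool_submit(pool_t *pool, void (*fn)(void *), void *arg)
{
    job_t *j = malloc(sizeof(*j));
    if (!j) return -1;
    j->fn = fn; j->arg = arg; j->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail) pool->tail->next = j; else pool->head = j;
    pool->tail = j;
    pool->pending++;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Blocks until every submitted job has finished. */
void pool_wait(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->pending > 0)
        pthread_cond_wait(&pool->idle, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

/* Example job: squares the int that arg points to. */
static void square_job(void *arg) { int *v = arg; *v *= *v; }
```

A caller would create the pool once, call pool_submit for each unit of work, call pool_wait at each synchronisation point, and call pool_destroy only at the very end. The cost of thread creation is then paid once rather than per job, which fits the observation that the benefit shows up mainly on machines with many cores.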
#2 | Link | ||||||
x264 developer
Join Date: Sep 2005
Posts: 8,667
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
This was added to the source a while ago, fixing a bug with AQ and BRDO. |
#3 | Link |
Registered User
Join Date: May 2006
Posts: 957
|
There is another patch, the clock/timing/progress one. I don't know if it still works; the diff I have is from rev. 614:
http://users.telenet.be/darnley/x264_clock1-614.diff
It prints the total encoding time and prints progress 10000 times per file instead of 1000.
Last edited by J_Darnley; 30th September 2007 at 01:45.
#4 | Link |
Mr. Sandman
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 11,768
|
moooo
#5 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,667
|
Here's my fixed ME_Prepass patch.
Code:
Index: common/common.c
===================================================================
--- common/common.c (revision 675)
+++ common/common.c (working copy)
@@ -441,6 +441,8 @@
         p->analyse.i_mv_range_thread = atoi(value);
     OPT2("subme", "subq")
         p->analyse.i_subpel_refine = atoi(value);
+    OPT2("me-prepass", "meprepass")
+        p->analyse.i_me_prepass = atobool(value);
     OPT("bime")
         p->analyse.b_bidir_me = atobool(value);
     OPT("chroma-me")
@@ -879,6 +881,7 @@
     s += sprintf( s, " analyse=%#x:%#x", p->analyse.intra, p->analyse.inter );
     s += sprintf( s, " me=%s", x264_motion_est_names[ p->analyse.i_me_method ] );
     s += sprintf( s, " subme=%d", p->analyse.i_subpel_refine );
+    s += sprintf( s, " me-prepass=%d", p->analyse.i_me_prepass );
     s += sprintf( s, " brdo=%d", p->analyse.b_bframe_rdo );
     s += sprintf( s, " mixed_ref=%d", p->analyse.b_mixed_references );
     s += sprintf( s, " me_range=%d", p->analyse.i_me_range );
Index: encoder/me.c
===================================================================
--- encoder/me.c (revision 675)
+++ encoder/me.c (working copy)
@@ -61,6 +61,23 @@
     COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
 }
 
+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+    int stride = 16; \
+    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+    cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+         + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}
+
+#define COST_MV_HPEL3( mx, my) \
+{ \
+    int stride = 16; \
+    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+    COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
+}
+
 #define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
 {\
     uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
@@ -177,18 +194,85 @@
     pmx = ( bmx + 2 ) >> 2;
     pmy = ( bmy + 2 ) >> 2;
     bcost = COST_MAX;
-
+
     /* try extra predictors if provided */
     if( h->mb.i_subpel_refine >= 3 )
     {
         COST_MV_HPEL( bmx, bmy );
-        for( i = 0; i < i_mvc; i++ )
+        if(!h->param.analyse.i_me_prepass)
         {
-            const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
-            const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
-            if( mx != bpred_mx || my != bpred_my )
-                COST_MV_HPEL( mx, my );
+            for( i = 0; i < i_mvc; i++ )
+            {
+                const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+                const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+                if( mx != bpred_mx || my != bpred_my )
+                    COST_MV_HPEL( mx, my );
+            }
+        }
+        else
+        {
+            for( i = 0; i < i_mvc; i++ )
+            {
+                const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+                const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+                int doSearch = 1;
+                int j;
+                for(j = 0; j < i; j++)
+                {
+                    if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
+                }
+                if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
+                {
+                    int bestcost;
+                    int bestx = mx;
+                    int besty = my;
+                    COST_MV_HPEL2( mx, my, bestcost );
+                    COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
+                    if(bestcost < 2*bpred_cost)
+                    {
+                        int n;
+                        int dir = -2;
+                        COST_MV_HPEL2(bestx-4,besty,costs[0]);
+                        COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
+                        COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
+                        COST_MV_HPEL2(bestx+4,besty,costs[3]);
+                        COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
+                        COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
+                        COPY2_IF_LT( bestcost, costs[0], dir, 0 );
+                        COPY2_IF_LT( bestcost, costs[1], dir, 1 );
+                        COPY2_IF_LT( bestcost, costs[2], dir, 2 );
+                        COPY2_IF_LT( bestcost, costs[3], dir, 3 );
+                        COPY2_IF_LT( bestcost, costs[4], dir, 4 );
+                        COPY2_IF_LT( bestcost, costs[5], dir, 5 );
+                        if( dir != -2 )
+                        {
+                            static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
+                            bestx += hex2[dir+1][0];
+                            besty += hex2[dir+1][1];
+                            for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
+                            {
+                                static const int mod6[8] = {5,0,1,2,3,4,5,0};
+                                const int odir = mod6[dir+1];
+                                COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
+                                COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
+                                COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
+                                dir = -2;
+                                COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
+                                COPY2_IF_LT( bestcost, costs[1], dir, odir );
+                                COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
+                                if( dir == -2 )
+                                    break;
+                                bestx += hex2[dir+1][0];
+                                besty += hex2[dir+1][1];
+                            }
+                        }
+                        COST_MV_HPEL3(bestx+2,besty-2);
+                        COST_MV_HPEL3(bestx+2,besty);
+                        COST_MV_HPEL3(bestx+2,besty+2);
+                        COST_MV_HPEL3(bestx,besty-2);
+                        COST_MV_HPEL3(bestx,besty+2);
+                        COST_MV_HPEL3(bestx-2,besty-2);
+                        COST_MV_HPEL3(bestx-2,besty);
+                        COST_MV_HPEL3(bestx-2,besty+2);
+                        COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
+                    }
+                }
+            }
         }
         bmx = ( bpred_mx + 2 ) >> 2;
         bmy = ( bpred_my + 2 ) >> 2;
         COST_MV( bmx, bmy );
     }
Index: x264.c
===================================================================
--- x264.c (revision 675)
+++ x264.c (working copy)
@@ -232,7 +232,8 @@
     H1( " --mvrange-thread <int> Minimum buffer between threads [-1 (auto)]\n" );
     H0( " -m, --subme <integer> Subpixel motion estimation and partition\n"
         " decision quality: 1=fast, 7=best. [%d]\n", defaults->analyse.i_subpel_refine );
-    H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6.\n" );
+    H0( " --me-prepass Run an ME prepass on predictors. Requires subme 3 or higher.\n");
+    H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6 or higher.\n" );
     H0( " --mixed-refs Decide references on a per partition basis\n" );
     H1( " --no-chroma-me Ignore chroma in motion estimation\n" );
     H1( " --bime Jointly optimize both MVs in B-frames\n" );
@@ -398,6 +399,7 @@
         { "mvrange", required_argument, NULL, 0 },
         { "mvrange-thread", required_argument, NULL, 0 },
         { "subme", required_argument, NULL, 'm' },
+        { "me-prepass", no_argument, NULL, 0 },
         { "b-rdo", no_argument, NULL, 0 },
         { "mixed-refs", no_argument, NULL, 0 },
         { "no-chroma-me", no_argument, NULL, 0 },
Index: x264.h
===================================================================
--- x264.h (revision 675)
+++ x264.h (working copy)
@@ -220,6 +220,7 @@
         int i_mv_range; /* maximum length of a mv (in pixels). -1 = auto, based on level */
         int i_mv_range_thread; /* minimum space between threads. -1 = auto, based on number of threads. */
         int i_subpel_refine; /* subpixel motion estimation quality */
+        int i_me_prepass; /* run an ME prepass on predictors */
         int b_bidir_me; /* jointly optimize both MVs in B-frames */
         int b_chroma_me; /* chroma ME for subpel and mode decision in P-frames */
         int b_bframe_rdo; /* RD based mode decision for B-frames */

Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)
Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.
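For readers who find the macros in the patch above hard to follow, the core idea of the prepass (a hexagon search started from a predictor, followed by a check of the eight surrounding points) can be sketched on a plain integer grid. This is a simplified illustration and not the patch's code: cost_fn, hex_search and toy_cost are hypothetical names, the cost function is a toy, and the sketch re-evaluates all six hexagon points on every step, whereas the patch uses its hex2 and mod6 tables to evaluate only the three new points.

```c
#include <assert.h>

/* Hypothetical callback: returns the cost of candidate vector (x, y). */
typedef int (*cost_fn)(int x, int y, void *ctx);

/* The six corners of the hexagon, relative to its centre. */
static const int hex_pts[6][2] = {
    {-2, 0}, {-1, 2}, {1, 2}, {2, 0}, {1, -2}, {-1, -2}
};

/* Hexagon search from (*px, *py); the best vector found is written back. */
void hex_search(cost_fn cost, void *ctx, int max_iter, int *px, int *py)
{
    assert(cost && px && py);
    int bx = *px, by = *py;
    int bcost = cost(bx, by, ctx);

    /* Move the hexagon to its cheapest corner until no corner improves. */
    for (int n = 0; n < max_iter; n++) {
        int best_dir = -1;
        for (int d = 0; d < 6; d++) {
            int c = cost(bx + hex_pts[d][0], by + hex_pts[d][1], ctx);
            if (c < bcost) { bcost = c; best_dir = d; }
        }
        if (best_dir < 0)
            break;
        bx += hex_pts[best_dir][0];
        by += hex_pts[best_dir][1];
    }

    /* Final refinement: the eight neighbours of the hexagon's centre. */
    int cx = bx, cy = by;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (!dx && !dy)
                continue;
            int c = cost(cx + dx, cy + dy, ctx);
            if (c < bcost) { bcost = c; bx = cx + dx; by = cy + dy; }
        }

    *px = bx;
    *py = by;
}

/* Toy cost for experiments: squared distance to a target {x, y} in ctx. */
static int toy_cost(int x, int y, void *ctx)
{
    const int *t = ctx;
    return (x - t[0]) * (x - t[0]) + (y - t[1]) * (y - t[1]);
}
```

In the patch the same pattern runs in quarter-pel units (hexagon offsets of 4 and 2, refinement offsets of 2), is bounded by i_me_range and CHECK_MVRANGE4, and is only attempted when the predictor's cost is below twice the best cost found so far.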
#6 | Link |
Registered User
Join Date: Mar 2006
Posts: 443
|
Awesome! I'm heading off to bed but I'll update the first post in the morning.
One thing I noticed, though, was that the faster-dia patch came up saying unexpected end of file when I ran it (it looked like it was missing a newline at the end). Just wanted to make sure it was just that and not a missing bit of code at the end or something. |
#7 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,667
|
Quote:
|
|
#8 | Link | |
*Space Reserved*
Join Date: May 2006
Posts: 953
|
Quote:
#9 | Link |
Registered User
Join Date: Mar 2006
Posts: 443
|
OK, I just updated the first post a little; it still needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff of the new ME_Prepass that you posted the code for above, as well as a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs as well as Cef's for people who want them.
|
#11 | Link |
Registered User
Join Date: Mar 2006
Posts: 443
|
Patch refuses to compile.
Code:
encoder/me.c: In function 'x264_me_search_ref':
encoder/me.c:229: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:235: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:236: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:237: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:238: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:239: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:240: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:252: warning: implicit declaration of function 'CHECK_MVRANGE4'
encoder/me.c:256: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:257: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:258: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:269: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:270: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:271: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:272: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:273: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:274: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:275: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:276: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
make: *** [encoder/me.o] Error 1
#12 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,667
|
Quote:
And also, oops, another small mistake in the patch--should be easily fixable: replace
Code:
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];
#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
with
Code:
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];
int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;
#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )
Sorry I don't have a better diff, but my version control is nonexistent.
Last edited by Dark Shikari; 30th September 2007 at 21:11.
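The reason a second macro is needed appears to be a matter of units: the prepass in the patch clips its candidates against mv_x_min*4 and so seems to work in quarter-pel units, while CHECK_MVRANGE compares against full-pel limits. A standalone sketch of that scaling follows; the type mv_range_t and both function names are made up for illustration. It multiplies by 4 rather than shifting by 2 because the minimum limits are normally negative, and left-shifting a negative int is undefined behaviour in standard C.

```c
#include <assert.h>

/* Hypothetical container for motion vector limits in some unit. */
typedef struct { int x_min, y_min, x_max, y_max; } mv_range_t;

/* Convert full-pel limits to quarter-pel limits (4 quarter-pel steps per pixel). */
static mv_range_t range_fpel_to_qpel(mv_range_t r)
{
    assert(r.x_min <= r.x_max && r.y_min <= r.y_max);
    mv_range_t q = { r.x_min * 4, r.y_min * 4, r.x_max * 4, r.y_max * 4 };
    return q;
}

/* Same test as the CHECK_MVRANGE macros, written as a function. */
static int mv_in_range(mv_range_t r, int mx, int my)
{
    return mx >= r.x_min && mx <= r.x_max && my >= r.y_min && my <= r.y_max;
}
```

With full-pel limits of -16..16, a quarter-pel vector such as (40, -64) is rejected by the full-pel check but accepted by the scaled one, which is the kind of mismatch that comparing quarter-pel coordinates against full-pel limits would produce.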
|
#15 | Link |
Registered User
Join Date: Jan 2004
Posts: 849
|
I've asked this question in the multi-thread discussion, but answers were inconclusive, so I'll ask it here again.
Are these patches actually applied to Cef's builds? At first I was told that thread_pool patch is applied, but then someone said that it didn't work with 680 and had to be fixed (which it is now), so it couldn't have been applied before. So perhaps we need another line for each patch stating if it is applied?
#19 | Link |
masktools2 (ab)user
Join Date: Oct 2006
Location: PAL-I :(
Posts: 235
|
I'd also like to know which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds and more patches, and to me it looks like a hell of a mess where finding an answer is rather hard...
Thanks |
#20 | Link |
x264aholic
Join Date: Jul 2007
Location: New York
Posts: 1,752
|
It would be nice if there were a little scrollover icon that told you which patches (that haven't been merged into the main build) are applied to it.
Edit: Also, I too would like to know what patches are in Cef's latest build. |
Tags |
h.264, x264, x264 builds, x264 patches, x264 unofficial builds |