Current Patches, Where to get them, How they affect speed/output [Archive]

morph166955

29th September 2007, 16:30

Something I've noticed is that while we have Cef's repository of patches that he uses on his builds at http://mirror05.x264.nl/Cef/?dir=./patches, there is no central place to explain what they do, what their effect on both the speed of the encode and the output is, who wrote them or where they originated, and, if they aren't on Cef's site, where to get them (or where they originally came from, in case they have been updated elsewhere but not on Cef's site). I have included below a list of the ones currently on Cef's site as well as an explanation of the ones that I know. I would appreciate it if people could fill in the others; I'll update this post with the explanations as people post them. Please try to use the format that I use below for the thread pool patch so that I don't have to parse through posts for the info. Thanks in advance to all who contribute!

Thread Pool Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_thread_pool.04c.r680.diff
Other Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_thread_pool.r680.diff
Origin: http://forum.doom9.org/showthread.php?t=124557
Creator: akupenguin
Maintainer: morph166955/Cef
Description: Forces x264 to use the same threads over and over again instead of creating and destroying threads as needed. Speed benefits seen on Quad-Core and Octa-Core machines (as much as a 20% speed boost seen on my Octa-Core), either little or negative speed change seen on single and dual core systems. The current revision on Cef's site was modified by him to work with r680, the one on my site is basically the same.

Faster DIA patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_faster-dia.diff
Current: http://www.benswebs.com/public/x264/patches/x264_faster-dia.r680.diff
Creator/Maintainer: Dark Shikari
Description: Tiny patch, 3.5% faster DIA for better first pass.

Subme 7 Improvement
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_subme7_vc8.diff
Creator/Maintainer: Dark Shikari
Description: Improved subme 7. Basically no speed impact, small quality boost.

SATD ESA Fullpel Comparison Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_satd_fpel.11.diff
Creator/Maintainer: Dark Shikari
Description: Allows SATD to be used as a fullpel comparison metric. Totally useless with any search other than ESA, since the SATD ESA has been optimized so well by Akupenguin.

ME Prepass Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_me-prepass_ham.diff (use with hadamard patch)
Current: http://www.benswebs.com/public/x264/patches/x264_me-prepass_noham.diff (use without hadamard patch)
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_me-prepass.diff
Creator/Maintainer: Dark Shikari
Description: Runs an ME prepass on the predictors before actually doing the motion search. Somewhat bugged--it can probably be a lot better than it currently is.

IMH Motion Estimation Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_IMH.diff
Creator/Maintainer: Dark Shikari
Description: A motion search slower than UMH but faster than ESA. Not that worthwhile since ESA is now threaded.

HD HRD/Pulldown Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_hrd_pulldown.diff
Creator: Ian Caulfield/Trahald
Description: HRD and pulldown for HD compatibility.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_bssd.diff

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_bchanges.diff

AQ/BRDO Patch:
Current: http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_aq-brdo.diff
Description: This was added to the source a while ago, fixing a bug with AQ and BRDO.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_2pass_vbv.diff

Second Pass ETA Patch:
Current: http://www.benswebs.com/public/x264/patches/x264_fp-eta.01.r680.diff
Creator/Maintainer: morph166955
Description: Forces x264 to use the frame count from the stats file on a second pass if the frame count can't be calculated for some reason (such as the use of a fifo pipe).
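
As a rough illustration of the idea behind the thread pool patch listed above (reusing a fixed set of worker threads rather than creating and destroying one per task), here is a minimal sketch in C with POSIX threads. It is not taken from the patch: pool_t, pool_init, pool_submit, pool_shutdown and count_job are hypothetical names, and the fixed-size queue is an assumption made to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_THREADS 4   /* hypothetical worker count */
#define QUEUE_CAP    64  /* hypothetical fixed queue size */

typedef void (*job_fn)(void *arg);

typedef struct {
    pthread_t       threads[POOL_THREADS];
    pthread_mutex_t lock;
    pthread_cond_t  has_work;
    struct { job_fn fn; void *arg; } queue[QUEUE_CAP];
    int head, count, shutting_down;
} pool_t;

/* Each worker is created once and loops, picking up jobs until shutdown. */
static void *worker_main(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutting_down)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        if (pool->count == 0) {          /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        job_fn fn  = pool->queue[pool->head].fn;
        void  *arg = pool->queue[pool->head].arg;
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);
        fn(arg);                         /* run the job outside the lock */
    }
}

int pool_init(pool_t *pool)
{
    pool->head = pool->count = pool->shutting_down = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&pool->threads[i], NULL, worker_main, pool) != 0)
            return -1;                   /* sketch only: no cleanup on failure */
    return 0;
}

/* Queue a job for an already-running worker; returns -1 if the queue is full. */
int pool_submit(pool_t *pool, job_fn fn, void *arg)
{
    assert(fn != NULL);
    pthread_mutex_lock(&pool->lock);
    if (pool->count == QUEUE_CAP || pool->shutting_down) {
        pthread_mutex_unlock(&pool->lock);
        return -1;
    }
    int tail = (pool->head + pool->count) % QUEUE_CAP;
    pool->queue[tail].fn  = fn;
    pool->queue[tail].arg = arg;
    pool->count++;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Let the workers drain the queue, then join them. */
void pool_shutdown(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
}

/* Hypothetical demo job: bump a shared atomic counter. */
void count_job(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

The threads are created once in pool_init and only joined in pool_shutdown, so submitting many small jobs avoids a create/destroy cycle per job. That is the general mechanism the description refers to; how the actual patch schedules work inside x264 is not shown here.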

Dark Shikari

29th September 2007, 20:07

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_faster-dia.diff
Tiny patch, 3.5% faster DIA for better first pass.
http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_subme7_vc8.diff
Improved subme 7. Basically no speed impact, small quality boost.
http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_satd_fpel.11.diff
Allows SATD to be used as a fullpel comparison metric. Totally useless with any search other than ESA, since the SATD ESA has been optimized so well by Akupenguin.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_me-prepass.diff

Runs an ME prepass on the predictors before actually doing the motion search. Somewhat bugged--it can probably be a lot better than it currently is.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_IMH.diff

A motion search slower than UMH but faster than ESA. Not that worthwhile since ESA is now threaded.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_hrd_pulldown.diff

HRD and pulldown for HD compatibility.

http://mirror05.x264.nl/Cef/force.php?file=./patches/x264_aq-brdo.diff

This was added to the source a while ago, fixing a bug with AQ and BRDO.

J_Darnley

30th September 2007, 01:39

There is another patch, the clock/timing/progress one. I don't know if it still works, the diff I have is from rev. 614

http://users.telenet.be/darnley/x264_clock1-614.diff

It prints the total encoding time and prints progress 10000 times per file instead of 1000.

Sharktooth

30th September 2007, 02:25

moooo

Dark Shikari

30th September 2007, 06:30

Here's my fixed ME_Prepass patch.

Index: common/common.c
===================================================================
--- common/common.c (revision 675)
+++ common/common.c (working copy)
@@ -441,6 +441,8 @@
p->analyse.i_mv_range_thread = atoi(value);
OPT2("subme", "subq")
p->analyse.i_subpel_refine = atoi(value);
+ OPT2("me-prepass", "meprepass")
+ p->analyse.i_me_prepass = atobool(value);
OPT("bime")
p->analyse.b_bidir_me = atobool(value);
OPT("chroma-me")
@@ -879,6 +881,7 @@
s += sprintf( s, " analyse=%#x:%#x", p->analyse.intra, p->analyse.inter );
s += sprintf( s, " me=%s", x264_motion_est_names[ p->analyse.i_me_method ] );
s += sprintf( s, " subme=%d", p->analyse.i_subpel_refine );
+ s += sprintf( s, " me-prepass=%d", p->analyse.i_me_prepass );
s += sprintf( s, " brdo=%d", p->analyse.b_bframe_rdo );
s += sprintf( s, " mixed_ref=%d", p->analyse.b_mixed_references );
s += sprintf( s, " me_range=%d", p->analyse.i_me_range );
Index: encoder/me.c
===================================================================
--- encoder/me.c (revision 675)
+++ encoder/me.c (working copy)
@@ -61,6 +61,23 @@
COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
}

+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+ int stride = 16; \
+ uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+ cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}
+
+#define COST_MV_HPEL3( mx, my) \
+{ \
+ int stride = 16; \
+ uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+ int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+ COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
+}
+
#define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
{\
uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
@@ -177,18 +194,85 @@
pmx = ( bmx + 2 ) >> 2;
pmy = ( bmy + 2 ) >> 2;
bcost = COST_MAX;
-
+
/* try extra predictors if provided */
if( h->mb.i_subpel_refine >= 3 )
{
COST_MV_HPEL( bmx, bmy );
- for( i = 0; i < i_mvc; i++ )
+ if(!h->param.analyse.i_me_prepass)
{
- const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
- const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
- if( mx != bpred_mx || my != bpred_my )
- COST_MV_HPEL( mx, my );
+ for( i = 0; i < i_mvc; i++ )
+ {
+ const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+ const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+ if( mx != bpred_mx || my != bpred_my )
+ COST_MV_HPEL( mx, my );
+ }
+ }
+ else
+ {
+ for( i = 0; i < i_mvc; i++ )
+ {
+ const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
+ const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
+ int doSearch = 1;
+ int j;
+ for(j = 0; j < i; j++)
+ {
+ if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
+ }
+ if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
+ {
+ int bestcost;
+ int bestx = mx;
+ int besty = my;
+ COST_MV_HPEL2( mx, my, bestcost );
+ COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
+ if(bestcost < 2*bpred_cost)
+ {
+ int n;
+ int dir = -2;
+ COST_MV_HPEL2(bestx-4,besty,costs[0]);
+ COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
+ COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
+ COST_MV_HPEL2(bestx+4,besty,costs[3]);
+ COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
+ COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
+ COPY2_IF_LT( bestcost, costs[0], dir, 0 );
+ COPY2_IF_LT( bestcost, costs[1], dir, 1 );
+ COPY2_IF_LT( bestcost, costs[2], dir, 2 );
+ COPY2_IF_LT( bestcost, costs[3], dir, 3 );
+ COPY2_IF_LT( bestcost, costs[4], dir, 4 );
+ COPY2_IF_LT( bestcost, costs[5], dir, 5 );
+ if( dir != -2 )
+ {
+ static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
+ bestx += hex2[dir+1][0];
+ besty += hex2[dir+1][1];
+ for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
+ {
+ static const int mod6[8] = {5,0,1,2,3,4,5,0};
+ const int odir = mod6[dir+1];
+ COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
+ COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
+ COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
+ dir = -2;
+ COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
+ COPY2_IF_LT( bestcost, costs[1], dir, odir );
+ COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
+ if( dir == -2 )
+ break;
+ bestx += hex2[dir+1][0];
+ besty += hex2[dir+1][1];
+ }
+ }
+ COST_MV_HPEL3(bestx+2,besty-2);
+ COST_MV_HPEL3(bestx+2,besty);
+ COST_MV_HPEL3(bestx+2,besty+2);
+ COST_MV_HPEL3(bestx,besty-2);
+ COST_MV_HPEL3(bestx,besty+2);
+ COST_MV_HPEL3(bestx-2,besty-2);
+ COST_MV_HPEL3(bestx-2,besty);
+ COST_MV_HPEL3(bestx-2,besty+2);
+ COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
+ }
+ }
+ }
}
bmx = ( bpred_mx + 2 ) >> 2;
bmy = ( bpred_my + 2 ) >> 2;
COST_MV( bmx, bmy );
}
Index: x264.c
===================================================================
--- x264.c (revision 675)
+++ x264.c (working copy)
@@ -232,7 +232,8 @@
H1( " --mvrange-thread <int> Minimum buffer between threads [-1 (auto)]\n" );
H0( " -m, --subme <integer> Subpixel motion estimation and partition\n"
" decision quality: 1=fast, 7=best. [%d]\n", defaults->analyse.i_subpel_refine );
- H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6.\n" );
+ H0( " --me-prepass Run an ME prepass on predictors. Requires subme 3 or higher.\n");
+ H0( " --b-rdo RD based mode decision for B-frames. Requires subme 6 or higher.\n" );
H0( " --mixed-refs Decide references on a per partition basis\n" );
H1( " --no-chroma-me Ignore chroma in motion estimation\n" );
H1( " --bime Jointly optimize both MVs in B-frames\n" );
@@ -398,6 +399,7 @@
{ "mvrange", required_argument, NULL, 0 },
{ "mvrange-thread", required_argument, NULL, 0 },
{ "subme", required_argument, NULL, 'm' },
+ { "me-prepass", no_argument, NULL, 0 },
{ "b-rdo", no_argument, NULL, 0 },
{ "mixed-refs", no_argument, NULL, 0 },
{ "no-chroma-me", no_argument, NULL, 0 },
Index: x264.h
===================================================================
--- x264.h (revision 675)
+++ x264.h (working copy)
@@ -220,6 +220,7 @@
int i_mv_range; /* maximum length of a mv (in pixels). -1 = auto, based on level */
int i_mv_range_thread; /* minimum space between threads. -1 = auto, based on number of threads. */
int i_subpel_refine; /* subpixel motion estimation quality */
+ int i_me_prepass; /* run an ME prepass on predictors */
int b_bidir_me; /* jointly optimize both MVs in B-frames */
int b_chroma_me; /* chroma ME for subpel and mode decision in P-frames */
int b_bframe_rdo; /* RD based mode decision for B-frames */

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.
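
For readers who don't want to trace the diff, the prepass is essentially a small hexagon search around each predictor followed by a square refinement. Below is a simplified, self-contained sketch of that pattern. It works in abstract grid units and re-evaluates all six hexagon points on every step, whereas the patch works in quarter-pel coordinates and only evaluates the three new points after the first step. hex_refine, cost_fn, mv_result_t and demo_cost are hypothetical names, and demo_cost is a made-up cost surface standing in for the real SAD/SATD plus MV cost.

```c
#include <assert.h>

typedef int (*cost_fn)(int x, int y);

typedef struct { int x, y, cost; } mv_result_t;

/* Hexagon search followed by one 8-neighbour square refinement. */
mv_result_t hex_refine(cost_fn cost, int start_x, int start_y, int max_iters)
{
    static const int hex[6][2] = {
        {-2, 0}, {-1, 2}, {1, 2}, {2, 0}, {1, -2}, {-1, -2}
    };
    static const int square[8][2] = {
        {1, -1}, {1, 0}, {1, 1}, {0, -1}, {0, 1}, {-1, -1}, {-1, 0}, {-1, 1}
    };
    mv_result_t best = { start_x, start_y, cost(start_x, start_y) };

    assert(max_iters >= 0);

    for (int n = 0; n < max_iters; n++) {
        int dir = -1;
        int best_cost = best.cost;
        for (int i = 0; i < 6; i++) {
            int c = cost(best.x + hex[i][0], best.y + hex[i][1]);
            if (c < best_cost) { best_cost = c; dir = i; }
        }
        if (dir < 0)
            break;                       /* centre is already the best point */
        best.x += hex[dir][0];
        best.y += hex[dir][1];
        best.cost = best_cost;
    }

    /* Final refinement around the hexagon's resting point. */
    const int cx = best.x, cy = best.y;
    for (int i = 0; i < 8; i++) {
        int x = cx + square[i][0], y = cy + square[i][1];
        int c = cost(x, y);
        if (c < best.cost) { best.x = x; best.y = y; best.cost = c; }
    }
    return best;
}

/* Hypothetical cost surface with its minimum at (7, -3). */
int demo_cost(int x, int y)
{
    int dx = x - 7, dy = y + 3;
    return dx * dx + dy * dy;
}
```

Calling hex_refine(demo_cost, 0, 0, 16) walks the hexagon toward the minimum and lets the square step land on it. A real encoder would also clamp candidates to the legal MV range, which this sketch leaves out.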

morph166955

30th September 2007, 08:33

awesome! i'm heading off to bed but i'll update the first post in the morning.

One thing I noticed though was that the faster-dia patch came up saying unexpected end of file when I ran it (looked like it was missing a new line at the end). Just wanted to make sure it was just that and not a missing bit of code at the end or something.

Dark Shikari

30th September 2007, 08:34

awesome! i'm heading off to bed but i'll update the first post in the morning.

One thing I noticed though was that the faster-dia patch came up saying unexpected end of file when I ran it (looked like it was missing a new line at the end). Just wanted to make sure it was just that and not a missing bit of code at the end or something.
Nah, don't worry, just me being retarded I think.

Terranigma

30th September 2007, 16:36

Here's my fixed ME_Prepass patch.

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.

I would love to see a custom build with this a.s.a.p. :D

morph166955

30th September 2007, 18:41

Ok, I just updated the first post a little; it needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff of the new ME_Prepass that you posted the code for above, as well as a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs, as well as Cef's, for people who want them.

le_canz

30th September 2007, 18:45

:thanks:

morph166955

30th September 2007, 20:58

Here's my fixed ME_Prepass patch.

Patch refuses to compile.
encoder/me.c: In function 'x264_me_search_ref':
encoder/me.c:229: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:235: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:236: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:237: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:238: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:239: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:240: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:252: warning: implicit declaration of function 'CHECK_MVRANGE4'
encoder/me.c:256: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:257: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:258: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:269: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:270: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:271: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:272: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:273: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:274: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:275: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:276: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
make: *** [encoder/me.o] Error 1

Dark Shikari

30th September 2007, 21:06

Patch refuses to compile.
encoder/me.c: In function 'x264_me_search_ref':
encoder/me.c:229: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:235: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:236: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:237: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:238: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:239: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:240: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:252: warning: implicit declaration of function 'CHECK_MVRANGE4'
encoder/me.c:256: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:257: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:258: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:269: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:270: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:271: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:272: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:273: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:274: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:275: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
encoder/me.c:276: error: 'x264_pixel_function_t' has no member named 'fpelcmp'
make: *** [encoder/me.o] Error 1

That's because if you want to compile ME-prepass without the Hadamard patch (--fpelcmp), you must replace all instances of "fpelcmp" with "sad" in me.c. Find/replace. They modify some of the same lines so I can't create a generic patch for this reason.

And also, oops, another small mistake in the patch--should be easily fixable:

replace
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
with
int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];
int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )

I forgot to include this in the diff.

Sorry I don't have a better diff, but my version control is nonexistent ;) I assume you can probably make a new diff once these are fixed.
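
The reason a separate CHECK_MVRANGE4 is needed is that the prepass works on quarter-pel coordinates while the existing bounds are in full-pel units. A minimal self-contained sketch of the two checks, using a hypothetical mv_bounds_t struct in place of h->mb.mv_min_fpel / h->mb.mv_max_fpel (the function names are also made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Legal motion vector range, expressed in full-pel units. */
typedef struct { int x_min, y_min, x_max, y_max; } mv_bounds_t;

/* Equivalent of CHECK_MVRANGE: mx/my are full-pel coordinates. */
int mv_in_range_fpel(const mv_bounds_t *b, int mx, int my)
{
    assert(b != NULL);
    return mx >= b->x_min && mx <= b->x_max &&
           my >= b->y_min && my <= b->y_max;
}

/* Equivalent of CHECK_MVRANGE4: mx/my are quarter-pel coordinates,
 * so the full-pel bounds are scaled by 4 before comparing. */
int mv_in_range_qpel(const mv_bounds_t *b, int mx, int my)
{
    assert(b != NULL);
    return mx >= b->x_min * 4 && mx <= b->x_max * 4 &&
           my >= b->y_min * 4 && my <= b->y_max * 4;
}
```

With bounds of -16..16 full-pel, a quarter-pel vector of (40, 0) is 10 pixels and perfectly legal, but the full-pel check would reject it, which is why the quarter-pel search needs its own macro. The sketch multiplies by 4 instead of shifting because left-shifting a negative int is undefined in standard C; in practice the << 2 in the snippet above behaves the same on the usual compilers.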

morph166955

30th September 2007, 21:47

Got it, new diffs created, original post updated to have both options (with and without hadamard).

Dark Shikari

30th September 2007, 21:55

Some mistakes in your post... not all those that I explained were created or maintained by me ;)

AQ/BRDO isn't mine, HRD isn't mine.

lexor

30th September 2007, 21:56

I've asked this question in the multi-thread discussion, but answers were inconclusive, so I'll ask it here again.

Are these patches actually applied to Cef's builds? At first I was told that thread_pool patch is applied, but then someone said that it didn't work with 680 and had to be fixed (which it is now), so it couldn't have been applied before.

So perhaps we need another line for each patch stating if it is applied?

morph166955

30th September 2007, 21:57

fixed.

Trahald

30th September 2007, 22:29

the hrd in the patch was done by Ian Caulfield and the pulldown part I added.

morph166955

30th September 2007, 22:32

updated.

martino

30th September 2007, 22:51

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks

Sagekilla

30th September 2007, 22:58

Would be nice if there was a little scrollover icon that would tell you what patches (that haven't been merged in to the main build) are applied to it..

Edit: Also, I too would like to know what patches are in cef's latest build.

Terranigma

1st October 2007, 01:05

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks
I guess you didn't see This (http://forum.doom9.org/showpost.php?p=1048519&postcount=113) ?
Cef, you think you could update the exp build for now and include AQ, Thread Pool, and the new ME-Prepass patch, then for future references, include me-prepass in your regular builds? :scared:

Dark Shikari

1st October 2007, 01:12

Also add the --subme 7 patch, since it's been proven quite thoroughly to increase quality in basically all cases at minimal speed cost.

Faster DIA, IMH, and SATD shouldn't be applied yet. One thought I did have: instead of making SATD an option for all ME search methods, add a 5th ME search method:

DIA
HEX
UMH
ESA
HES (Hadamard Exhaustive Search: Better than all the above methods, but correspondingly slower)

The reason for this is simply that Aku's testing showed that SATD slowed down all the other methods so much that it was better to use SAD ESA than SATD with anything else. However, SATD ESA is so heavily optimized that it's still useful, and not too much slower than regular ESA.

Terranigma

1st October 2007, 01:18

Also add the --subme 7 patch, since its been proven quite thoroughly to increase quality in basically all cases at minimal speed cost.
Oh I thought the new subme-7 made it to the svn, seems I was wrong :p

Faster DIA, IMH, and SATD shouldn't be applied yet.

I agree with you on imh and dia (At first I was all for imh, but suddenly I changed my mind :D).

One thought I did have was to instead of making SATD an option for all ME search methods, instead add a 5th ME search method:

DIA
HEX
UMH
ESA
HES (Hadamard Exhaustive Search: Better than all the above methods, but correspondingly slower)

The reason for this is simply that Aku's testing showed that SATD slowed down all the other methods so much that it was better to use SAD ESA than SATD anything else. However, SATD ESA is so heavily optimized that its still useful, and not too much slower than regular ESA.

I like this idea. Only allowing SATD to be used with the motion search algorithm it gains any real benefit from. I'm all for the new --hes :)

akupenguin

1st October 2007, 02:07

Not sure I like HES, too similar to HEX. Maybe TES (same "T"ransform as in SATD). Or ESH.
But I'm not sure that SATD is only useful in ESA: There are good reasons that integral-based successive elimination for SAD can only be efficient in ESA, but it's possible that SAD-based successive elimination for SATD could work in other search patterns. The cost of a SAD or a SATD is high enough that the overhead of random access needn't be fatal.
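
For anyone unfamiliar with successive elimination: it rests on the bound |sum(cur) - sum(ref)| <= SAD(cur, ref), so a candidate whose block sums differ by at least the best SAD found so far cannot win and can be skipped without computing its SAD. The sketch below illustrates only that bound for SAD. It is not x264's integral-based implementation, the function names are hypothetical, and whether an analogous bound makes SATD practical in other search patterns is exactly the open question raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of all pixels in a block. An exhaustive search can get these sums
 * cheaply for every candidate position, e.g. from an integral image. */
int block_sum(const uint8_t *pix, int n)
{
    assert(n > 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += pix[i];
    return sum;
}

/* Plain sum of absolute differences between two blocks of n pixels. */
int block_sad(const uint8_t *a, const uint8_t *b, int n)
{
    assert(n > 0);
    int sad = 0;
    for (int i = 0; i < n; i++)
        sad += abs(a[i] - b[i]);
    return sad;
}

/* Lower bound on the SAD, by the triangle inequality. */
int sea_lower_bound(int sum_cur, int sum_ref)
{
    return abs(sum_cur - sum_ref);
}

/* A candidate can be skipped if even its lower bound is no better
 * than the best SAD found so far. */
int sea_can_skip(int sum_cur, int sum_ref, int best_sad)
{
    return sea_lower_bound(sum_cur, sum_ref) >= best_sad;
}
```

The check costs one subtraction per candidate once the sums are available, which is why it pays off when every position in the window is visited anyway.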

Dark Shikari

1st October 2007, 02:14

Not sure I like HES, too similar to HEX. Maybe TES (same "T"ransform as in SATD). Or ESH.
TES seems fine to me; it avoids starting with the same letter as any of the others.

fields_g

1st October 2007, 02:42

However, SATD ESA is so heavily optimized that its still useful, and not too much slower than regular ESA.

Here's another solution... set regular ESA to SATD ESA?

Let SAD IMH be the (speed) middle ground between SAD UMH and SATD ESA?

akupenguin, could you update your chart here (http://forum.doom9.org/showthread.php?p=1047085#post1047085) with Dark Shikari's new ME-prepass found here (http://forum.doom9.org/showthread.php?p=1050161#post1050161)?

Depending on the results, I might even suggest regular ESA to be prepass SATD ESA.

I see it like this... If you are mad enough to do ESA, you are quite likely going to do SATD and prepass also.

Dark Shikari

1st October 2007, 02:43

Here's another solution... set regular ESA to SATD ESA?

Let SAD IMH be the (speed) middle ground between SAD UMH and SATD ESA?

akupenguin, could you update your chart here (http://forum.doom9.org/showthread.php?p=1047085#post1047085) with Dark Shikari's new ME-prepass found here (http://forum.doom9.org/showthread.php?p=1050161#post1050161)?

Depending on the results, I might even suggest regular ESA to be prepass SATD ESA.

I see it like this... If you are mad enough to do ESA, you are quite likely going to do SATD and prepass also.
Shouldn't force users to use what they don't want to--better to give them the option.

fields_g

1st October 2007, 02:51

Shouldn't force users to use what they don't want to--better to give them the option.

I agree.. I am just looking for some other option to making another me-type that really isn't anything but a preset. Make the defaults best practices and allow switches to deviate.

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.

Dark Shikari

1st October 2007, 02:54

I agree.. I am just looking for some other option to making another me-type that really isn't anything but a preset. Make the defaults best practices and allow switches to deviate.

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.
How is using --fpel-cmp satd on ESA a "best practice"? It's even slower than regular ESA, and so only useful for those who have even more time to waste.

Also note that if TES/whatever it's called uses SATD, the --fpel-cmp satd option will be removed.

akupenguin

1st October 2007, 03:24

How about normal default esa be SATD and with a switch it can SAD and ditch the new me-type. All choices still remain.
A new value for --me is simpler interface-wise than a new top-level option.

fields_g

1st October 2007, 03:32

How is using --fpel-cmp satd on ESA a "best practice"? Its even slower than regular ESA, and so only useful for those who have even more time to waste.

Also note that if TES/whatever its called uses SATD, the --fpel-cmp satd option will be removed.

Using the chart (http://forum.doom9.org/showthread.php?p=1047085#post1047085) as an approximation roughly:
SAD - UMH - ME32
has the same quality as
SAD - ESA - ME7
and
SATD - ESA - ME4

However, the FPS is 47 vs. 42 vs. 39 respectively. SATD is slowest, but not by too much. Even though SATD is 7-8% slower, the magic is that, with more computation (me-range increments), SATD gains quality much quicker and peaks much higher. Additionally, SATD ESA-me6 beats the quality of SAD ESA-me12 at the same FPS! Therefore, SAD ESA only has a place for me-range less than 12.

So instead of telling people that ESA only has benefits over other ME types from me-7 through me-12, you could tell them ESA picks up quality-wise where UMH stops. It seems a little cleaner to me. I just hope the explanation is understandable.

Maybe I'm a little more willing to throw computation at it than others, but I think the average person using ESA would usually do this anyway.

Dark Shikari

1st October 2007, 03:44

Using the chart (http://forum.doom9.org/showthread.php?p=1047085#post1047085) as an approximation roughly:
SAD - UMH - ME32
has the same quality as
SAD - ESA - ME7
and
SATD - ESA - ME4

However, the FPS is 47 vs. 42 vs. 39 respectively. SATD is slowest, but not by too much. Even though SATD is 7-8% slower, the magic is that, with more computation (me-range increments), SATD gains quality much quicker and peaks much higher. Additionally, SATD ESA-me6 beats the quality of SAD ESA-me12 at the same FPS! Therefore, SAD ESA only has a place for me-range less than 12.

So instead of telling people that ESA only has benefits over other ME types from me-7 through me-12, you could tell them ESA picks up quality-wise where UMH stops. It seems a little cleaner to me. I just hope the explanation is understandable.

Maybe I'm a little more willing to throw computation at it than others, but I think the average person using ESA would usually do this anyway.
Except that on some sources, SATD is inferior to SAD as a metric ;)

Anime in particular seems to suffer from this, in my experience.

fields_g

1st October 2007, 03:55

Except that on some sources, SATD is inferior to SAD as a metric ;)

Anime in particular seems to suffer from this, in my experience.

I was just about to state that I might be overusing this single chart (source) a bit! I'll be downloading one of these fancy "bundle-o-patches" builds and start going at it! Quite honestly, I'll be able to follow either scheme and will be happy as long as I have prepass SATD ESA around in some form, especially with the improvements listed here (http://forum.doom9.org/showthread.php?p=1050161#post1050161).

Cef

1st October 2007, 11:00

I'd also like to know as to which patches are present in Cef's build. I don't mean the "dark" or "exp" version. It's just confusing, since there seem to be a few builds, more patches, and to me it looks like a hell of mess where trying to find an answer is rather hard...

Thanks

As I already said, my builds have AQ and thread pool applied. x264_xxx_dark was including all Dark_shikari's patches at the time it was posted (except faster first pass iirc), and x264_xxx_exp was a build requested by Sagittaire with some Dark's patches and hrd.

I completely agree this is confusing; my organization is terrible on this, but I don't have much time to dedicate to it, and I usually spend it fixing conflicts with new revs or between patches. If you have any suggestions, they're welcome.

martino

1st October 2007, 16:57

I guess you didn't see This (http://forum.doom9.org/showpost.php?p=1048519&postcount=113) ?
I did in fact, and heck. At this point there are two AQ patches. ;_;

But I can say at this point that it's the "old" one. And thanks Cef for explaining.

I'm not sure whether I'd have any good suggestions, but perhaps just a small txt in the directory where your builds are located (on x264.nl) which would state which patch(es) was/were applied to which build(s). Or maybe if morph would be so kind to interpret this into the introductory post in this thread... Whatever works really.

Terranigma

1st October 2007, 17:46

martino, do you know how to compile x264? It doesn't look like anyone's too eager to compile an experimental build with aq, thread pool, new subme7, new pre-pass, and keep --fpel-cmp sad/satd like it is as suggested by akupenguin, and if you must implement the new aq, add it as an optional command. Maybe something like aq2-strength. You can find the latest aq2 algorithm by Dark Shikari here (http://forum.doom9.org/showpost.php?p=1046740&postcount=41). I'm not sure if that's the latest, so he's the only one that can confirm or deny this.

Dark Shikari

1st October 2007, 18:14

martino, do you know how to compile x264? It doesn't look like anyone's too eager to compile an experimental build with aq, thread pool, new subme7, new pre-pass, and keep --fpel-cmp sad/satd like it is as suggested by akupenguin, and if you must implement the new aq, add it as an optional command. Maybe something like aq2-strength. You can find the latest aq2 algorithm by Dark Shikari here (http://forum.doom9.org/showpost.php?p=1046740&postcount=41). I'm not sure if that's the latest, so he's the only one that can confirm or deny this.
New AQ is definitely not ready, and that is quite old IIRC.

Don't add it yet. It's way too experimental.

DeathTheSheep

2nd October 2007, 00:10

I think the reason why most people don't have a go at compiling these patches is that the patches themselves are quite troublesome to apply. :)

I did it with a lot of manual patching, so it's definitely possible. But of course I also messed with a lot of other stuff in the code and then finally deleted the folder. I think it's best at this point to wait it out until more stability/commits/developments occur. Else just use the older build Cef (?) made, there shouldn't be much difference.

Terranigma

2nd October 2007, 00:35

I think it's best at this point to wait it out until more stability/commits/developments occur. Else just use the older build Cef (?) made, there shouldn't be much difference.

Speed: 25% faster (25% less impact on speed as compared to the old ME-prepass)
Quality: 42% better (42% more increase in quality as compared to the old ME-prepass)

Not surprisingly, eliminating the qpel aspect of the search gave a huge speed boost with an actual slight increase in quality.

42% is a huge difference, or so I would think :scared:

Dark Shikari

2nd October 2007, 00:56

42% is a huge difference, or so I would think :scared:

Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.
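
The arithmetic here treats the 42% as a relative improvement on top of the old prepass's gain, not as 42 percentage points. A minimal sketch in C, where `scaled_gain` is a hypothetical helper name and the 2% baseline is only the illustrative figure from this post, not a measurement:

```c
#include <assert.h>

/* Illustrative helper (not part of x264): scale an absolute gain, given in
 * percent, by a relative improvement that is also given in percent. */
double scaled_gain( double base_gain_pct, double relative_improvement_pct )
{
    /* A relative change of -100% or less would make the result meaningless. */
    assert( relative_improvement_pct > -100.0 );
    return base_gain_pct * ( 1.0 + relative_improvement_pct / 100.0 );
}
```

Calling `scaled_gain( 2.0, 42.0 )` gives roughly 2.84, so under that assumed baseline the new prepass would add about 0.84 percentage points over the old one.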

Terranigma

2nd October 2007, 01:04

Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.

good enough for me, and it's faster to boot. :)

DeathTheSheep

2nd October 2007, 07:18

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. :p But anyone who uses merange 4 is truly insane, and for a different reason. :D

Dark Shikari

2nd October 2007, 07:36

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. :p But anyone who uses merange 4 is truly insane, and for a different reason. :D
That would be because the one in the original post is the old ME patch, which still hasn't been updated ;)

DeathTheSheep

2nd October 2007, 07:40

Ok I just updated the first post a little, needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff on the new ME_Prepass that you posted the code for above as well as making a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs, as well as Cef's, for people who want them.

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now... :D

Dark Shikari

2nd October 2007, 08:20

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now... :D
The numbers in the patch look quite different from those in the diff I posted :p

fields_g

2nd October 2007, 16:47

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!

Dark Shikari

2nd October 2007, 16:58

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!
SVN diff does this.

fields_g

2nd October 2007, 17:29

SVN diff does this.

Great! Trying to identify a patch revision as "the one found in post xxx" or for example "the original AQ" vs. "Dark Shikari's old AQ" vs. "Dark Shikari's new AQ" is a bit complicated/limited if there are more than a couple of variations. (Don't you love that we have bright people here developing new things to try?)

This will help people who are making builds explicitly describe what is in their builds also!
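
The revision that svn diff records sits in the `---` header line of each file, as in the `--- encoder/me.c (revision 676)` lines quoted elsewhere in this thread. A small sketch in C of pulling that number out of a header line; `svn_diff_base_revision` is a hypothetical helper written for illustration, not something svn or x264 provides:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative helper (not part of svn or x264): return the base revision
 * named in an svn diff header line such as
 *   "--- encoder/me.c (revision 676)"
 * or -1 if the line carries no "(revision N)" marker. */
long svn_diff_base_revision( const char *line )
{
    static const char marker[] = "(revision ";
    const char *p;
    char *end;
    long rev;

    assert( line != NULL );
    p = strstr( line, marker );
    if( p == NULL )
        return -1;                  /* e.g. "+++ encoder/me.c (working copy)" */
    p += sizeof(marker) - 1;        /* skip past "(revision " */
    rev = strtol( p, &end, 10 );
    if( end == p || *end != ')' || rev < 0 )
        return -1;                  /* no digits, or malformed marker */
    return rev;
}
```

Run over every `---` line of two diffs, this would be enough to say which revision each patch was made against.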

burfadel

2nd October 2007, 17:50

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4 and 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass option!

fields_g

2nd October 2007, 18:19

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4 and 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass option!

Interesting... I'm not sure, but either you are suggesting a new approach or are mixing two different (though related) things together.

1) There is a ME type: Dia, Hex, UMH, ESA
2) There is a subpixel refinement of 1-7

Earlier discussion questioned making prepass dependent on ME type (if ESA then ON, else OFF), not on subpixel refinement. I'll let someone else comment on how wise it would be to connect prepass to subpixel refinement.

burfadel

2nd October 2007, 19:01

I didn't mean to connect it in that sense :) It could also be suggested to have --me-prepass enabled when UMH mode is selected (no point for ESA I believe?...), just as a matter of principle, since mode 7 or UMH are usually selected for quality.

Terranigma

2nd October 2007, 22:37

I still think --me-prepass should be added as a default option.
Yes, I agree. Aku, any chance we'll ever see this in the svn? I could care less now about imh, but prepass, otoh, is pretty useful with esa as you've shown from your graphical comparisons. :)

DeathTheSheep

3rd October 2007, 02:36

How the heck do you apply this diff to the source? What program?!

I always get crap like this every time I apply these patches:

$ patch -u -p1 < subme7.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c (revision 676)
|+++ encoder/me.c (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 28 (offset 1 line).
Hunk #2 succeeded at 853 (offset 51 lines).
patch unexpectedly ends in middle of line
Hunk #3 FAILED at 912.
1 out of 3 hunks FAILED -- saving rejects to encoder/me.c.rej

Then I manually patch. And that's just for subme.. Take a look at prepass:

$ patch -u -p1 < me-prepass.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: common/common.c
|===================================================================
|--- common/common.c (revision 675)
|+++ common/common.c (working copy)
--------------------------
File to patch: common/common.c
patching file `common/common.c'
Hunk #1 succeeded at 444 (offset 3 lines).
Hunk #2 succeeded at 882 with fuzz 2 (offset 1 line).
can't find file to patch at input line 26
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c (revision 675)
|+++ encoder/me.c (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 65 (offset 4 lines).
patch: **** malformed patch at line 142: + }

And nothing happens at all. No .rej is created to manually patch off of like for subme, so I have to go through line by line and type it all in by hand.

I think I'm going to memorize this algorithm by heart by the time I'm through with these darn patch problems!

Again, how do you guys do it?! I'm in msys 1.0 using the standard $ patch program... the settings I used are shown above...
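
One likely contributor to the "can't find file to patch" prompts is the strip level. The headers quoted above name paths such as `encoder/me.c`, which are already relative to the source root, and `-pN` tells patch to drop the first N path components before looking for the file. The sketch below models that rule in C; `strip_components` is a hypothetical helper, not code from GNU patch, and returning NULL for a path with too few components is a convention chosen for this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of patch's -pN option (not code from GNU patch):
 * skip the shortest prefix of `path` that contains `n` slashes, counting a
 * run of adjacent slashes as one. Returns NULL when the path has fewer than
 * `n` slashes; that return convention is a choice made for this sketch. */
const char *strip_components( const char *path, int n )
{
    assert( path != NULL && n >= 0 );
    while( n > 0 )
    {
        while( *path != '\0' && *path != '/' )
            path++;                 /* advance to the next slash */
        if( *path == '\0' )
            return NULL;            /* ran out of components */
        while( *path == '/' )
            path++;                 /* swallow the slash run */
        n--;
    }
    return path;
}
```

With `n = 1`, `encoder/me.c` becomes `me.c`, which does not exist in the top-level x264 directory; with `n = 0` the path is left alone. This does not account for the "malformed patch" and "unexpectedly ends in middle of line" messages, which point at damage in the diff text itself rather than at the strip level.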

DeathTheSheep

3rd October 2007, 03:14

And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone? :p

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?! :D

Dark Shikari

3rd October 2007, 03:20

And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone? :p

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?! :D
Hpel is in the original code, so it doesn't need to be defined again :p

DeathTheSheep

3rd October 2007, 03:36

True, but in the context of your patch, it jumps right from cost_mv (which is already defined) to cost_mv_hpel2. Meaning when I insert it over the context (first and last lines in your patch), it is no longer in the original source!

Meaning, of course, when I Ctrl+A to select all the code for each code block in the patch and paste that block into the .c, I simply start from the first context line and overwrite everything in the original until the last context line, meaning everything in between is overwritten with the lines of the patch.

I then go in and delete all the little "+" signs next to the added lines and manually remove all the lines marked with "-." I know hpel is not "removed" as in marked with the "-", but if you look at your patch's context lines...
So you can understand why the patch threw me off :).

Index: encoder/me.c
===================================================================
--- encoder/me.c (revision 675)
+++ encoder/me.c (working copy)
@@ -61,6 +61,23 @@
COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \ (this is the first line I overwrote, extending to the end...)
}

<but hpel was in here, so it disappeared when I overwrote it with this patch, since it's obviously not here now!>

+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+ int stride = 16; \
+ uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+ cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}

:p
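
For anyone applying hunks by hand, the header `@@ -61,6 +61,23 @@` carries the bookkeeping: the hunk covers 6 lines of the original file starting at line 61, and 23 lines of the patched file starting at line 61. In a well-formed unified diff, lines with no prefix are context that must already be present and stay untouched, `+` lines are inserted and `-` lines are removed, so nothing between two context lines is meant to be overwritten. A small C sketch of reading those numbers; `hunk_range_t`, `parse_hunk_header` and `hunk_net_added` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Line ranges named by a unified-diff hunk header "@@ -a,b +c,d @@". */
typedef struct
{
    int old_start, old_len;   /* lines a .. a+b-1 of the original file */
    int new_start, new_len;   /* lines c .. c+d-1 of the patched file  */
} hunk_range_t;

/* Illustrative parser (hypothetical helper, not from patch or x264).
 * Returns 1 on success, 0 if the header does not have the full
 * "-a,b +c,d" form; the short form that omits a length of 1 is not handled. */
int parse_hunk_header( const char *line, hunk_range_t *r )
{
    assert( line != NULL && r != NULL );
    return sscanf( line, "@@ -%d,%d +%d,%d @@",
                   &r->old_start, &r->old_len,
                   &r->new_start, &r->new_len ) == 4;
}

/* Net number of lines the hunk adds (negative if it removes more). */
int hunk_net_added( const hunk_range_t *r )
{
    return r->new_len - r->old_len;
}
```

For the header above this reports 17 net added lines: 6 existing lines kept as context and 17 new lines placed among them. Whether the posted diff actually honours that layout is a separate question.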

Dark Shikari

3rd October 2007, 03:53

Sorry if my diffing skills are nonexistent :p

DeathTheSheep

3rd October 2007, 03:59

Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way. :D

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW? :p Tested with esa, of course... Satd.

[edit] Yes, as I suspected there is something hideously wrong here. Even without any prepass at all, the output differs drastically from that of an old build without it. Yeah, some patched sources would help like crazy. XD

Dark Shikari

3rd October 2007, 04:50

Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way. :D

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW? :p Tested with esa, of course... Satd.

[edit] Yes, as I suspected there is something hideously wrong here. Even without any prepass at all, the output differs drastically from that of an old build without it. Yeah, some patched sources would help like crazy. XD
Here is the beginning of my source up to the start of ME-DIA and such:

/*****************************************************************************
* me.c: h264 encoder library (Motion Estimation)
*****************************************************************************
* Copyright (C) 2003 Laurent Aimar
* $Id: me.c,v 1.1 2004/06/03 19:27:08 fenrir Exp $
*
* Authors: Laurent Aimar <fenrir@via.ecp.fr>
* Loren Merritt <lorenm@u.washington.edu>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111, USA.
*****************************************************************************/

#include "common/common.h"
#include "me.h"
#include <limits.h>

/* presets selected from good points on the speed-vs-quality curve of several test videos
* subpel_iters[i_subpel_refine] = { refine_hpel, refine_qpel, me_hpel, me_qpel }
* where me_* are the number of EPZS iterations run on all candidate block types,
* and refine_* are run only on the winner. */
//The --subme 7 values are much higher because since they get the motion search
//closer to the optimal value, they actually tend to save time in the more intensive
//RD search that follows.
static const int subpel_iterations[][4] =
{{1,0,0,0},
{1,1,0,0},
{0,1,1,0},
{0,2,1,0},
{0,2,1,1},
{0,2,1,2},
{0,0,2,2},
{0,0,4,10}};

static void refine_subpel( x264_t *h, x264_me_t *m, int hpel_iters, int qpel_iters, int *p_halfpel_thresh, int b_refine_qpel );

#define BITS_MVD( mx, my )\
(p_cost_mvx[(mx)<<2] + p_cost_mvy[(my)<<2])

#define COST_MV( mx, my )\
{\
int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE,\
&p_fref[(my)*m->i_stride[0]+(mx)], m->i_stride[0] )\
+ BITS_MVD(mx,my);\
COPY3_IF_LT( bcost, cost, bmx, mx, bmy, my );\
}

#define COST_MV_HPEL( mx, my ) \
{ \
int stride = 16; \
uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
}

#define COST_MV_HPEL2( mx, my, cost ) \
{ \
int stride = 16; \
uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
}

#define COST_MV_HPEL3( mx, my) \
{ \
int stride = 16; \
uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+ p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
}

#define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
{\
uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
h->pixf.fpelcmp_x3[i_pixel]( m->p_fenc[0],\
pix_base + (m0x) + (m0y)*m->i_stride[0],\
pix_base + (m1x) + (m1y)*m->i_stride[0],\
pix_base + (m2x) + (m2y)*m->i_stride[0],\
m->i_stride[0], costs );\
(costs)[0] += BITS_MVD( bmx+(m0x), bmy+(m0y) );\
(costs)[1] += BITS_MVD( bmx+(m1x), bmy+(m1y) );\
(costs)[2] += BITS_MVD( bmx+(m2x), bmy+(m2y) );\
}

#define COST_MV_X4( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
uint8_t *pix_base = p_fref + omx + omy*m->i_stride[0];\
h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
pix_base + (m0x) + (m0y)*m->i_stride[0],\
pix_base + (m1x) + (m1y)*m->i_stride[0],\
pix_base + (m2x) + (m2y)*m->i_stride[0],\
pix_base + (m3x) + (m3y)*m->i_stride[0],\
m->i_stride[0], costs );\
costs[0] += BITS_MVD( omx+(m0x), omy+(m0y) );\
costs[1] += BITS_MVD( omx+(m1x), omy+(m1y) );\
costs[2] += BITS_MVD( omx+(m2x), omy+(m2y) );\
costs[3] += BITS_MVD( omx+(m3x), omy+(m3y) );\
COPY3_IF_LT( bcost, costs[0], bmx, omx+(m0x), bmy, omy+(m0y) );\
COPY3_IF_LT( bcost, costs[1], bmx, omx+(m1x), bmy, omy+(m1y) );\
COPY3_IF_LT( bcost, costs[2], bmx, omx+(m2x), bmy, omy+(m2y) );\
COPY3_IF_LT( bcost, costs[3], bmx, omx+(m3x), bmy, omy+(m3y) );\
}

#define COST_MV_X4_ABS( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
p_fref + (m0x) + (m0y)*m->i_stride[0],\
p_fref + (m1x) + (m1y)*m->i_stride[0],\
p_fref + (m2x) + (m2y)*m->i_stride[0],\
p_fref + (m3x) + (m3y)*m->i_stride[0],\
m->i_stride[0], costs );\
costs[0] += p_cost_mvx[m0x<<2]; /* no cost_mvy */\
costs[1] += p_cost_mvx[m1x<<2];\
costs[2] += p_cost_mvx[m2x<<2];\
costs[3] += p_cost_mvx[m3x<<2];\
COPY3_IF_LT( bcost, costs[0], bmx, m0x, bmy, m0y );\
COPY3_IF_LT( bcost, costs[1], bmx, m1x, bmy, m1y );\
COPY3_IF_LT( bcost, costs[2], bmx, m2x, bmy, m2y );\
COPY3_IF_LT( bcost, costs[3], bmx, m3x, bmy, m3y );\
}

/*  1  */
/* 101 */
/*  1  */
#define DIA1_ITER( mx, my )\
{\
omx = mx; omy = my;\
COST_MV_X4( 0,-1, 0,1, -1,0, 1,0 );\
}

#define DIA2_ITER( mx, my )\
{\
omx = mx; omy = my;\
COST_MV_X4( 0,-2, 0,2, -2,0, 2,0 );\
}

#define CROSS( start, x_max, y_max )\
{\
i = start;\
if( x_max <= X264_MIN(mv_x_max-omx, omx-mv_x_min) )\
for( ; i < x_max-2; i+=4 )\
COST_MV_X4( i,0, -i,0, i+2,0, -i-2,0 );\
for( ; i < x_max; i+=2 )\
{\
if( omx+i <= mv_x_max )\
COST_MV( omx+i, omy );\
if( omx-i >= mv_x_min )\
COST_MV( omx-i, omy );\
}\
i = start;\
if( y_max <= X264_MIN(mv_y_max-omy, omy-mv_y_min) )\
for( ; i < y_max-2; i+=4 )\
COST_MV_X4( 0,i, 0,-i, 0,i+2, 0,-i-2 );\
for( ; i < y_max; i+=2 )\
{\
if( omy+i <= mv_y_max )\
COST_MV( omx, omy+i );\
if( omy-i >= mv_y_min )\
COST_MV( omx, omy-i );\
}\
}

#define ME_HEX(X,Y,range)\
{\
static const int mod6[8] = {5,0,1,2,3,4,5,0};\
bmx = X;\
bmy = Y;\
dir = -2;\
COST_MV_X3_DIR( -2,0, -1, 2, 1, 2, costs );\
COST_MV_X3_DIR( 2,0, 1,-2, -1,-2, costs+3 );\
COPY2_IF_LT( bcost, costs[0], dir, 0 );\
COPY2_IF_LT( bcost, costs[1], dir, 1 );\
COPY2_IF_LT( bcost, costs[2], dir, 2 );\
COPY2_IF_LT( bcost, costs[3], dir, 3 );\
COPY2_IF_LT( bcost, costs[4], dir, 4 );\
COPY2_IF_LT( bcost, costs[5], dir, 5 );\
if( dir != -2 ) {\
static const int hex2[8][2] = {{-1,-2}, {-2,0}, {-1,2}, {1,2}, {2,0}, {1,-2}, {-1,-2}, {-2,0}};\
bmx += hex2[dir+1][0];\
bmy += hex2[dir+1][1];\
for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
{\
const int odir = mod6[dir+1];\
COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
hex2[odir+1][0], hex2[odir+1][1],\
hex2[odir+2][0], hex2[odir+2][1],\
costs );\
dir = -2;\
COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
COPY2_IF_LT( bcost, costs[1], dir, odir );\
COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
if( dir == -2 ) break;\
bmx += hex2[dir+1][0];\
bmy += hex2[dir+1][1];}\
if(dir == -2 || bcost > bestCost) {}\
else{\
for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
{\
const int odir = mod6[dir+1];\
COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
hex2[odir+1][0], hex2[odir+1][1],\
hex2[odir+2][0], hex2[odir+2][1],\
costs );\
dir = -2;\
COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
COPY2_IF_LT( bcost, costs[1], dir, odir );\
COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
if( dir == -2 ) break;\
bmx += hex2[dir+1][0];\
bmy += hex2[dir+1][1];}}}\
omx = bmx; omy = bmy;\
COST_MV_X4( 0,-1, 0,1, -1,0, 1,0 );\
COST_MV_X4( -1,-1, -1,1, 1,-1, 1,1 );\
}\

void x264_me_search_ref( x264_t *h, x264_me_t *m, int (*mvc)[2], int i_mvc, int *p_halfpel_thresh )
{
int cost;
const int bw = x264_pixel_size[m->i_pixel].w;
const int bh = x264_pixel_size[m->i_pixel].h;
const int i_pixel = m->i_pixel;
int i_me_range = h->param.analyse.i_me_range;
int bmx, bmy, bcost;
int bpred_mx = 0, bpred_my = 0, bpred_cost = COST_MAX;
int omx, omy, pmx, pmy;
uint8_t *p_fref = m->p_fref[0];
DECLARE_ALIGNED( uint8_t, pix[16*16], 16 );

int i, j;
int dir;
int costs[6];

int mv_x_min = h->mb.mv_min_fpel[0];
int mv_y_min = h->mb.mv_min_fpel[1];
int mv_x_max = h->mb.mv_max_fpel[0];
int mv_y_max = h->mb.mv_max_fpel[1];
int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )

const int16_t *p_cost_mvx = m->p_cost_mv - m->mvp[0];
const int16_t *p_cost_mvy = m->p_cost_mv - m->mvp[1];

bmx = x264_clip3( m->mvp[0], mv_x_min*4, mv_x_max*4 );
bmy = x264_clip3( m->mvp[1], mv_y_min*4, mv_y_max*4 );
pmx = ( bmx + 2 ) >> 2;
pmy = ( bmy + 2 ) >> 2;
bcost = COST_MAX;

/* try extra predictors if provided */
if( h->mb.i_subpel_refine >= 3 )
{
COST_MV_HPEL( bmx, bmy );
if(!h->param.analyse.i_me_prepass)
{
for( i = 0; i < i_mvc; i++ )
{
const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
if( mx != bpred_mx || my != bpred_my )
COST_MV_HPEL( mx, my );
}
}
else
{
for( i = 0; i < i_mvc; i++ )
{
const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
int doSearch = 1;
int j;
for(j = 0; j < i; j++)
{
if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
}
if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
{
int bestcost;
int bestx = mx;
int besty = my;
COST_MV_HPEL2( mx, my, bestcost );
COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
if(bestcost < 2*bpred_cost)
{
int n;
int dir = -2;
COST_MV_HPEL2(bestx-4,besty,costs[0]);
COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
COST_MV_HPEL2(bestx+4,besty,costs[3]);
COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
COPY2_IF_LT( bestcost, costs[0], dir, 0 );
COPY2_IF_LT( bestcost, costs[1], dir, 1 );
COPY2_IF_LT( bestcost, costs[2], dir, 2 );
COPY2_IF_LT( bestcost, costs[3], dir, 3 );
COPY2_IF_LT( bestcost, costs[4], dir, 4 );
COPY2_IF_LT( bestcost, costs[5], dir, 5 );
if( dir != -2 )
{
static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
bestx += hex2[dir+1][0];
besty += hex2[dir+1][1];
for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
{
static const int mod6[8] = {5,0,1,2,3,4,5,0};
const int odir = mod6[dir+1];
COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
dir = -2;
COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
COPY2_IF_LT( bestcost, costs[1], dir, odir );
COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
if( dir == -2 )
break;
bestx += hex2[dir+1][0];
besty += hex2[dir+1][1];
}
}
COST_MV_HPEL3(bestx+2,besty-2);
COST_MV_HPEL3(bestx+2,besty);
COST_MV_HPEL3(bestx+2,besty+2);
COST_MV_HPEL3(bestx,besty-2);
COST_MV_HPEL3(bestx,besty+2);
COST_MV_HPEL3(bestx-2,besty-2);
COST_MV_HPEL3(bestx-2,besty);
COST_MV_HPEL3(bestx-2,besty+2);
COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
}
}
}
}
bmx = ( bpred_mx + 2 ) >> 2;
bmy = ( bpred_my + 2 ) >> 2;
COST_MV( bmx, bmy );
}
else
{
/* check the MVP */
COST_MV( pmx, pmy );
/* I don't know why this helps */
bcost -= BITS_MVD(bmx,bmy);

for( i = 0; i < i_mvc; i++ )
{
const int mx = x264_clip3( ( mvc[i][0] + 2 ) >> 2, mv_x_min, mv_x_max );
const int my = x264_clip3( ( mvc[i][1] + 2 ) >> 2, mv_y_min, mv_y_max );
if( mx != bmx || my != bmy )
COST_MV( mx, my );
}
}

COST_MV( 0, 0 );

DeathTheSheep

3rd October 2007, 04:53

This is with subme7 patch and satd, obviously, which is good. Any other patches in here that would cause conflicts? And I assume this is r680?

If this is all clear, this is ready and rearin' to go!! :)

Dark Shikari

3rd October 2007, 05:00

This is with subme7 patch and satd, obviously, which is good. Any other patches in here that would cause conflicts? And I assume this is r680?

If this is all clear, this is ready and rearin' to go!! :)
r676, but I don't think anything since then has changed this part of the file.

DeathTheSheep

3rd October 2007, 05:21

Nice.

PS: Only 3 more posts to go, Dark Shikari... :)

morph166955

3rd October 2007, 12:07

easiest way to create a diff:

1) make distclean (if needed)
2) svn diff > mydiff.diff

DONE! That's how I build my patches. To apply them I use "patch -Np0 -i mydiff.diff".

Inventive Software

3rd October 2007, 17:35

F***. I have no internet for a few days, and this happens! So many patches, so little time to investigate all their merits and caveats and speed boosts and quality improvements and I'm exhausted typing this sentence already!

Back to simplicity for now! F***! :D

Good work morph166955 on compiling the list so it's at least more readable. Good work Dark Shikari on firstly explaining the patches well, and on getting 1000 posts. :)

Suggestion to the mods: STICKY!!! ;)

And I hit 1000 posts a while ago and I didn't notice! :D

akupenguin

3rd October 2007, 17:43

1) make distclean (if needed)
Superfluous. None of the generated files is in the repository, so svn diff knows to ignore them.

morph166955

3rd October 2007, 23:09

Fair enough... I wasn't sure, and I figured it couldn't really hurt matters to do a distclean before generating the diff.

dirio49

8th October 2007, 00:39

No idea if this is the right thread,
but can anybody tell me what I need to cross-compile x264 for Windows?
I have gcc-mingw32 4.2.1 installed (on Gentoo).
What command(s) do I need to run?

Thanks.

P.S. If it is in the wrong thread, please move it.

TheRyuu

8th October 2007, 00:58

How would one go about updating the Subme 7 patch to be compatible with rev. 680?

or, would someone please do it? :)
I want to try it out but it's not patching correctly with rev 680.

Thanks.

foxyshadis

8th October 2007, 03:19

No idea if this is the right thread,
but can anybody tell me what I need to cross-compile x264 for Windows?
I have gcc-mingw32 4.2.1 installed (on Gentoo).
What command(s) do I need to run?

Thanks.

P.S. If it is in the wrong thread, please move it.

Check this thread: https://forum.doom9.org/showthread.php?t=92726

TheRyuu

11th October 2007, 04:40

OK, I did some fiddling around and built it with the subme7 patch, faster dia patch, thread pool patch, and aq patch.

I think those 4 patches are probably the 4 most useful patches there are right now.

http://www.sendspace.com/file/oo3j2l

That's the build. It's generic (the CPU needs MMX support though, which should basically be all of us).
It was built with avis input, pthread, and mp4 output enabled.

I have no idea if it works. All the patches applied successfully except for the subme7 patch, which I had to copy and paste over manually (and delete lines), but it built, so I assume I did it correctly.

If you want me to try and build any other combination of patches, ask me and I might have free time to do it.

Raere

11th October 2007, 15:00

OK, I did some fiddling around and built it with the subme7 patch, faster dia patch, thread pool patch, and aq patch.

I think those 4 patches are probably the 4 most useful patches there are right now.

http://www.sendspace.com/file/oo3j2l

That's the build. It's generic (the CPU needs MMX support though, which should basically be all of us).
It was built with avis input, pthread, and mp4 output enabled.

I have no idea if it works. All the patches applied successfully except for the subme7 patch, which I had to copy and paste over manually (and delete lines), but it built, so I assume I did it correctly.

If you want me to try and build any other combination of patches, ask me and I might have free time to do it.

Are you using r680?

Maybe you could make another build with imh and me-prepass? I use those, so they're useful for me. If they're not hard to implement, why not? Also, I don't know what they're calling hadamard these days, but can that be implemented too? As far as I know that's still somewhat useful. And maybe an SSE2 build, as I think most everyone who's encoding x264 at least has SSE2. I dunno, just a thought if you have time. Thanks!

Terranigma

11th October 2007, 16:18

Are you using r680?
Also, I don't know what they're calling hadamard these days, but can that be implemented too?
--hadamard has been renamed to --fpel-cmp, followed by satd (Sum of Absolute Transformed Differences, using a Hadamard transform).
(By default, it uses Sum of Absolute Differences/SAD.)
So, an example would be
--fpel-cmp satd
satd is hadamard, and IIRC, it made it into the svn.
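
To make the difference between the two metrics concrete, here is a generic textbook sketch in C for a single 4x4 block. It is not x264's fpelcmp code: the function names are made up for illustration, and the SATD result is left unnormalised, whereas x264's own routine may scale it differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 4x4 block. */
int sad_4x4( const uint8_t *a, int a_stride, const uint8_t *b, int b_stride )
{
    int x, y, sum = 0;
    assert( a != NULL && b != NULL );
    for( y = 0; y < 4; y++ )
        for( x = 0; x < 4; x++ )
            sum += abs( a[y*a_stride+x] - b[y*b_stride+x] );
    return sum;
}

/* Sum of absolute Hadamard-transformed differences over a 4x4 block.
 * Unnormalised; a real encoder may scale the result. */
int satd_4x4( const uint8_t *a, int a_stride, const uint8_t *b, int b_stride )
{
    int d[4][4], t[4][4];
    int x, y, sum = 0;
    assert( a != NULL && b != NULL );

    for( y = 0; y < 4; y++ )
        for( x = 0; x < 4; x++ )
            d[y][x] = a[y*a_stride+x] - b[y*b_stride+x];

    /* 1-D Hadamard transform of each row (two butterfly stages). */
    for( y = 0; y < 4; y++ )
    {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;  t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;  t[y][3] = d01 - d23;
    }
    /* Same transform down each column, summing absolute coefficients. */
    for( x = 0; x < 4; x++ )
    {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs( s01 + s23 ) + abs( s01 - s23 )
             + abs( d01 + d23 ) + abs( d01 - d23 );
    }
    return sum;
}
```

A uniform difference of 1 across the block scores 16 under both metrics, while a single pixel that is off by 3 scores 3 under SAD and 48 under this SATD. The transform weights differences by how they spread across frequencies, which is also why it costs more per comparison than plain SAD.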

Sharktooth

11th October 2007, 16:55

--fpel-cmp is not in the svn.

Terranigma

11th October 2007, 17:22

--fpel-cmp is not in the svn.
You're right, I just tested using rev. 680 from x264.nl:
http://i24.tinypic.com/f4henl.png

:p

TheRyuu

12th October 2007, 06:11

Are you using r680?

Maybe you could make another build with imh and me-prepass? I use those, so they're useful for me. If they're not hard to implement, why not? Also, I don't know what they're calling hadamard these days, but can that be implemented too? As far as I know that's still somewhat useful. And maybe an SSE2 build, as I think most everyone who's encoding x264 at least has SSE2. I dunno, just a thought if you have time. Thanks!

x264's performance-critical code is written in asm, so a build compiled with SSE2 enabled would be pretty much useless in terms of speed increase (only 1 part of x264 uses code that the compiler generates that way).

http://www.sendspace.com/file/ei6xlw

That is x264 with the following patches:
-thread pool
-AQ
-IMH
-Faster dia
-Subme7
-me-prepass
-fpel (hadamard, or however you spell it)

Causes x264.exe to weigh in at a hefty 1.03 MB :p
I have absolutely no idea if it'll work. It built fine without errors, so it should work. Have fun trying it out :)
All the patches together might in some way screw something up. I have no idea, but the patches either applied without errors or, where they did give errors, were applied manually by hand.

foxyshadis

12th October 2007, 11:47

Seems to work okay for me. SATD halving the speed while raising quality the tiniest bit. :p The other patches seem to be working okay too.

Raere

12th October 2007, 13:46

x264's performance-critical code is written in asm, so a build compiled with SSE2 enabled would be pretty much useless in terms of speed increase (only 1 part of x264 uses code that the compiler generates that way).

http://www.sendspace.com/file/ei6xlw

That is x264 with the following patches:
-thread pool
-AQ
-IMH
-Faster dia
-Subme7
-me-prepass
-fpel (hadamard, or however you spell it)

Causes x264.exe to weigh in at a hefty 1.03 MB :p
I have absolutely no idea if it'll work. It built fine without errors, so it should work. Have fun trying it out :)
All the patches together might in some way screw something up. I have no idea, but the patches either applied without errors or, where they did give errors, were applied manually by hand.

Awesome, thanks! Works for me. I really have no idea about processor optimizations, so thanks for the info.

TheRyuu

12th October 2007, 21:40

SATD halving the speed while raising quality the tiniest bit. :p

That's normal, right? :)

TheRyuu

14th October 2007, 00:18

Another question:
Which patches are actually worthwhile to use (i.e. don't affect speed at all, or only affect it a little, and obviously increase the quality)?

So far I think that these are worthwhile:
-faster dia
-subme7 patch
-AQ
-thread pool (doesn't really affect quality, but still a needed patch)

I'm curious about the following patches:
-fpel patch (heard this was really slow)
-me-prepass
-IMH (vs. exhaustive vs UMH)

btw, what do the bchanges and bssd patch do?

Thanks

Dark Shikari

14th October 2007, 00:25

-faster dia
Not worthwhile. Proven to give dubious benefits in only some cases.

-fpel patch (heard this was really slow)
The --fpel patch with ESA is not much slower than regular ESA and somewhat better, so it's "worthwhile" as an insane option.

-me-prepass
Useful as a higher-end option in place of using absurd numbers of refs. Can give a roughly 1-3% quality boost with the most recent version.

-IMH (vs. exhaustive vs UMH)
Pretty much useless with ESA now threaded.

Gabriel_Bouvigne

15th October 2007, 15:34

btw, what do the bchanges and bssd patch do?

http://mailman.videolan.org/pipermail/x264-devel/2007-August/003559.html

bssd:
when using --direct auto in multipass mode, in case of tie between direct auto and direct temporal, use SSD as an extra criterion to choose direct mode

Sagekilla

15th October 2007, 20:10

Not worthwhile. Proven to give dubious benefits in only some cases.

The --fpel patch with ESA is not much slower than regular ESA and somewhat better, so its "worthwhile" as an insane option.
Useful as a higher-end option in place of using absurd numbers of refs. Can give a roughly 1-3% quality boost with the most recent version.
Pretty much useless with ESA now threaded.

But IMH would be better than ESA if you want a sort of middle ground without too huge a speed loss, correct?

Sharktooth

15th October 2007, 20:14

no... IMH is practically useless. the difference in speed is so small it isnt worthwhile.

Terranigma

15th October 2007, 23:25

no... IMH is practically useless. the difference in speed is so small it isnt worthwhile.

I agree with these guys here. Sharktooth and Shikari made me realize that it's virtually useless when you now have esa multithreaded. :D

Sagekilla

16th October 2007, 01:28

Wow good point.. I just tried out the threaded esa and I must say it's insanely fast!

Raere

16th October 2007, 03:24

Will esa's multithreading be useful on single-cores, or is it only good for multi-cores?

Sharktooth

16th October 2007, 03:48

Multicores as usual.

Raere

19th October 2007, 04:01

So, imh is still useful on single-cores then?

Terranigma

19th October 2007, 14:21

So, imh is still useful on single-cores then?

Sorta, but who has a single-core processor nowadays?

lexor

19th October 2007, 15:09

Sorta, but who has a single-core processor nowadays?

Aw, my pride...

Lots of people still use single-core machines. I'd hazard a guess that the majority is still on single core (the majority in the world, not just on these boards).

Sharktooth

19th October 2007, 16:22

single core ppl should not bother using ESA or IMH...
They should stick to saner settings unless they dont care about encoding speed at all...

Unearthly

19th October 2007, 16:28

So, imh is still useful on single-cores then?

Not really. The reason IMH was useful before was that it offered higher quality than UMH, but was multi-threaded. At the time, ESA was only single-threaded, so the speed difference on multi-core systems was very large.

In other words, IMH was never really useful on single core machines. Just use ESA if you want more than UMH.

SpAwN_gUy

30th October 2007, 13:49

i'm sorry...
is there somebody who will update first-post, so all changes to the patches would be more "in one place" and i'd be happy to use them with 681 (as seen on x264.nl)

reason for this: i've noticed some chat about changing option names in Dark Shikari's patches..

fields_g

30th October 2007, 15:15

Another idea to add to the OP is limitations of the patch and/or an explanation of why it hasn't been added to SVN. Really help point out the maturity and support of each patch. This might need the help of akupenguin.

Dark Shikari

30th October 2007, 16:19

Another idea to add to the OP is limitations of the patch and/or an explanation of why it hasn't been added to SVN. Really help point out the maturity and support of each patch. This might need the help of akupenguin.

Subme7 is ready to add to SVN, hopefully it'll be done soon.

ME-Prepass is next.

Inventive Software

30th October 2007, 17:49

From the x264 changelog:
use hex instead of dia for rdo mv refinement. ~0.5% lower bitrate at subme=7.
patch by Dark Shikari.

That the patch you're on about?

Dark Shikari

30th October 2007, 18:06

From the x264 changelog:
That the patch you're on about?

Yup, that's it.

morph166955

30th October 2007, 22:50

i'm sorry...
is there somebody who will update first-post, so all changes to the patches would be more "in one place" and i'd be happy to use them with 681 (as seen on x264.nl)

reason for this: i've noticed some chat about changing option names in Dark Shikari's patches..

If someone gives me a list of them I'll be happy to update my post.

bob0r

3rd February 2008, 05:48

x264.736.modified.01.exe (http://files.x264.nl/x264.736.modified.01.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.4.MatMaul.diff
http://mailman.videolan.org/pipermail/x264-devel/2008-January/004015.html
x264_hrd_pulldown.04.diff
- HRD and pulldown for HD compatibility

Edit:
Link to x264 patches collected: http://files.x264.nl/x264_patches/

bob0r

4th February 2008, 15:13

x264.736.modified.02.exe (http://files.x264.nl/x264.736.modified.02.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.4.MatMaul.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3521
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Edit:
Link to x264 patches collected: http://files.x264.nl/x264_patches/

survivant001

5th March 2008, 20:08

thanks for the build. always welcome

leoenc

6th March 2008, 20:31

x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

bob0r, are you sure you applied the interlaced patch to 747? adding the --tff switch results in "uknown option --tff"

bob0r

6th March 2008, 22:13

bob0r, are you sure you applied the interlaced patch to 747? adding the --tff switch results in "uknown option --tff"

Any new version then?
Cus that patch was only to fix the patch with interlacing....

ggab

7th March 2008, 00:12

i hope version 748 will be less problematic, we have several bugs in version 745/746/747

maybe videolan team's official svn will help us with these (great of course) patched releases we are having :)

csy

7th March 2008, 07:14

bob0r, are you sure you applied the interlaced patch to 747? adding the --tff switch results in "uknown option --tff"

I've also noticed the --tff switch is missing. Something must have gone wrong with the patch and this build.

bob0r

7th March 2008, 07:23

I added x264_hrd_pulldown.04.diff instead of x264_hrd_pulldown.04_interlace.diff
(because it has to be run outside the x264 dir.... my bad :p)

I'll put up a new build with the next x264 update. Since there is only a speed difference (not that much), use the previous build if you really need interlacing <-- EVIL!

burfadel

7th March 2008, 07:36

Whats with version 748 I saw on techouse's site? I can't seem to access www.x264.tk or techouse.project357.com for the last day or so. Is there any major difference?

Looking on the track log there doesn't seem to be a difference, so I presume thats still 747?

For x264.nl, why not just check for a change in the timestamp thats listed on the log page? A change in the timestamp would mean a new version and thus can be compiled. As this won't have a revision number at the moment, you could list it on the website as a latest revision & date until the revision number for that build is known! You could even have a patched and non patched version done automatically?

Just an idea, I know its not perfect, but I presume it would be reasonably easy to implement for the time being until something else can be worked out. Or will the problem be rectified in the next week or so?...
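
The timestamp check suggested above could look roughly like the following. This is only a minimal sketch, assuming the timestamp has already been read from the log page as a string; the function names and the state file are hypothetical and are not how x264.nl actually works.

```python
from pathlib import Path


def timestamp_changed(current: str, state_file: Path) -> bool:
    """Return True if `current` differs from the timestamp saved last time."""
    # No state file yet means nothing was built before, so treat it as changed.
    previous = state_file.read_text().strip() if state_file.exists() else None
    return current.strip() != previous


def record_timestamp(current: str, state_file: Path) -> None:
    """Remember the timestamp just built, so the next check can compare."""
    state_file.write_text(current.strip() + "\n")
```

A build script would call timestamp_changed() on each poll, compile when it returns True, and then call record_timestamp() so the same log entry is not built twice.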

bob0r

7th March 2008, 09:47

Thanks to akupenguin (pengvado) x264.nl is auto updating again!!

So here is 748 (based on GIT updates count) + fixed HRD interlacing patch

x264.748.modified.exe (http://files.x264.nl/x264.748.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

leoenc

7th March 2008, 14:46

--tff is working fine now, thanks bob0r!

burfadel

17th March 2008, 09:16

Any chance of an AQ version of 757 (or later)?!

bob0r

17th March 2008, 14:14

x264.757.modified.exe (http://files.x264.nl/x264.757.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

burfadel

17th March 2008, 14:37

Thanks :)

bob0r

19th March 2008, 04:11

x264.763.modified.exe (http://files.x264.nl/x264.763.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Sharktooth

19th March 2008, 04:20

thanks. r763.modified is on megui autoupdate server.

burfadel

19th March 2008, 23:41

764 AQ patched version? thanks :) I take it 764 may provide a fractional speed increase over 763?

Dark Shikari

19th March 2008, 23:52

764 AQ patched version? thanks :) I take it 764 may provide a fractional speed increase over 763?

Probably a small speed boost (mostly on Intel CPUs). Might as well wait for when pengvado commits my SSSE3 4x4/4x8 SATD patch, along with my "skip intra encode" patch, both of which give speedups that don't affect the output. The latter is the biggest boost: in particular, it gives the following speed boosts:

1. Small speed increase with no RD and no trellis.
2. Slightly bigger speed increase with RD and trellis 0 or 1.
3. Large speed increase with RD and trellis 2.

bob0r

20th March 2008, 04:50

764 AQ patched version? thanks :) I take it 764 may provide a fractional speed increase over 763?

Tomorrow, tired now :)

Dark Shikari

20th March 2008, 04:58

Tomorrow, tired now :)

And we're already up to 767... :p

burfadel

20th March 2008, 05:09

And we're already up to 767... :p

Its great to see these improvements, can't wait to try out 767+ with AQ! - got a whole lot of stuff to encode, lol

bob0r

21st March 2008, 00:54

x264.774.modified.exe (http://files.x264.nl/x264.774.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

TheRyuu

21st March 2008, 01:48

Can you build one with the exact same patches as above only switch out the VAQ patch for the old haali aq patch? (for anime encoding)

burfadel

21st March 2008, 04:11

There's 775 out now :) An AQ version of 775 would be nice, thank you! I hope it sounds like I'm asking politely and not demanding! lol

MuLTiTaSK

21st March 2008, 04:30

wow it seems like everytime i check theres a new version and a patched mod follows blazing fast thanks alot bob0r i appreciate your dedication

the devs of this awesome encoder are making it better and much quicker with every update, making backups of my DVDs to H.264 look a lot better than they did with XviD :thanks:

bob0r

21st March 2008, 04:48

Tomorrow i hope --me tesa and possible other options (only pengvado knows) will be added to the make fprofiled part of the code.

When i wake up ill check for another build.

No, I will not create a build with Haali's AQ. If you truly have some issues, discuss them with Dark Shikari. If he says it can't be or won't be done, too bad.

Else start making a way i can compile BOTH patches into one single .exe, but that's up to you.

Dark Shikari

21st March 2008, 05:22

No, I will not create a build with Haali's AQ. If you truly have some issues, discuss them with Dark Shikari. If he says it can't be or won't be done, too bad.

Don't stick the burden on me, I don't care either way about it :p

burfadel

21st March 2008, 09:08

wow, 776 just shows up on x264.nl and already 777 is on git! thats some fast work, great to see!

buzzqw

21st March 2008, 10:22

thanks bob0r!

BHH

MythCreator

22nd March 2008, 04:45

This is my first Modified edition:thanks::thanks::thanks:

Based on Rev.779, without mp4-output

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550

Link to x264 patches collected: http://files.x264.nl/x264_patches/

bob0r

22nd March 2008, 05:09

Try this AQ patch, its basically the same only GIT-optimized:
http://forum.doom9.org/showthread.php?p=1115494#post1115494

I am waiting for pengvado to finish the new fprofiled settings + AQ is part of GIT.

Then a new hddvd/bluray patched .exe will be created, but probably not so often as now..... the main reason now is: AQ

@MythCreator
Good job. Do you want it tested, do you plan to compile more often, or is it just for fun?

MythCreator

22nd March 2008, 05:20

MythCreator
Good job. Do you want it tested, do you plan to compile more often, or is it just for fun?

I'll compile at least one per day if there is a new GIT version:)

BTW,where can I get the VAQ 2.0?

Dark Shikari

22nd March 2008, 05:25

BTW, where can I get the VAQ 2.0?

It's not released yet, still in early development ;)

1.0 will go into official GIT soon.

MythCreator

22nd March 2008, 05:28

1.0 will go into official GIT soon.

It's so nice~

bob0r

22nd March 2008, 05:28

I'll compile at least one per day if there is a new GIT version:)

BTW,where can I get the VAQ 2.0?

Oh my bad, i thought there was a link, just wait for it to be added to GIT, sorry :D

Btw do try to enable mp4 output, and have you used pthreads?
Guess we shall see when your builds are approved, but i dont think a mod is going to approve a new build each day :p

*zzzz :D

MythCreator

22nd March 2008, 06:58

Oh my bad, i thought there was a link, just wait for it to be added to GIT, sorry :D

Btw do try to enable mp4 output, and have you used pthreads?
Guess we shall see when your builds are approved, but i dont think a mod is going to approve a new build each day :p

*zzzz :D

I used pthreads. But I ran into a problem with gpac and I don't know how to solve it..

bob0r

22nd March 2008, 14:24

I used pthreads. But I ran into a problem with gpac and I don't know how to solve it..

To compile the needed gpac files i do:
- cvs -z3 -d:pserver:anonymous@gpac.cvs.sourceforge.net:/cvsroot/gpac co -P gpac
- cd gpac
- configure
- make clean
- make lib
- copy: gpac/include/gpac (COMPLETE dir) to /local/include/
- copy: gpac/bin/gcc/libgpac_static.a to /local/lib/

Note: My gcc system is installed to local, your include and lib may have other paths, likely just /include/ and /lib/.

Hope this helps...

bob0r

22nd March 2008, 14:37

x264.785.modified.exe (http://files.x264.nl/x264.785.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Inventive Software

22nd March 2008, 15:40

I very much appreciate people keeping the version number going, but does GIT not have one? Cos it would make things so much easier to read in the trunk! :D

MythCreator

22nd March 2008, 15:42

To compile the needed gpac files i do:
- cvs -z3 -d:pserver:anonymous@gpac.cvs.sourceforge.net:/cvsroot/gpac co -P gpac
- cd gpac
- configure
- make clean
- make lib
- copy: gpac/include/gpac (COMPLETE dir) to /local/include/
- copy: gpac/bin/gcc/libgpac_static.a to /local/lib/

Note: My gcc system is installed to local, your include and lib may have other paths, likely just /include/ and /lib/.

Hope this helps...

Got it..Thanks a lot~

MythCreator

22nd March 2008, 16:24

Based on Rev.786, without AQ..enable mp4 output

Download Address:
http://www.megaupload.com/?d=U19O3CM4

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

MeteorRain

22nd March 2008, 17:02

Based on Rev.786, without AQ..enable mp4 output

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Cool, but I think you should post the file on another free netdisk such as fs2you or so

MuLTiTaSK

22nd March 2008, 19:55

bob0r would it be possible to add a rss feed to x264.nl for new builds?

bob0r

22nd March 2008, 22:48

I have no clue how to, nor do i want to know how, but just check x264.nl before you start an encode?

Or have a script pick up latest version always :)

http://mirror01.x264.nl/x264/x264.exe always points to the latest version (on all mirrors)
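
A script like that could be as small as the following Python sketch. It assumes only the URL given above; the function name, the default destination, and the `opener` parameter (there so the download can be swapped out for testing) are made up for illustration, and it does not check whether the mirror is reachable.

```python
import shutil
import urllib.request

# URL taken from the post above; everything else here is an assumption.
LATEST_URL = "http://mirror01.x264.nl/x264/x264.exe"


def fetch_latest(dest="x264.exe", url=LATEST_URL, opener=urllib.request.urlopen):
    """Download whatever `url` currently points to and save it as `dest`."""
    with opener(url) as response, open(dest, "wb") as out:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return dest
```

Calling fetch_latest() before starting an encode overwrites the local x264.exe with whatever the mirror serves at that moment.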

Zep

23rd March 2008, 22:24

both 785 and 786 crash hard and quick for me but

core:58 r763M 0949975 works great still from MeGui update a few days back (the march 19th release I think)

bob0r

24th March 2008, 03:06

How did it crash?
What commandline?
What source?
Can you put a reproducible package online?

We need details!!!

MythCreator

24th March 2008, 09:38

x264-rev.789 non patch edition
download link:
http://www.fs2you.com/files/80b45d40-f97d-11dc-b91b-0014221f4662/

there is no difference from GIT, but it was built with make fprofiled using GCC 4.2.1

SpAwN_gUy

24th March 2008, 09:46

any chance building latest GITs with MSVS? (Cef has some troubles.. and i've got fresh setUP .. so i can't test)

bob0r

24th March 2008, 12:46

x264.790.modified.exe (http://files.x264.nl/x264.790.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

MythCreator

25th March 2008, 02:00

x264.791.modified

download link:
http://www.fs2you.com/files/d13ff451-fa06-11dc-ab82-0014221f4662/

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

make fprofiled using GCC 4.2.1

MythCreator

25th March 2008, 13:37

x264.796.modified

download link:
http://www.fs2you.com/files/8f5859fa-fa68-11dc-a2ed-0014221f3995/

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.ph...19#post1047919
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Zep

25th March 2008, 20:49

How did it crash?
What commandline?
What source?
Can you put a reproducible package online?

We need details!!!

I would have but my encode took days (went back to older version) and by the time it was done so I could get some details, the new version was out and that one seems stable. :D

I'll just keep my mouth shut unless I can paste the crash log etc... lol

thanks

bob0r

26th March 2008, 00:33

....

I'll just keep my mouth shut unless I can paste the crash log etc... lol

thanks

Please don't. Just report the commandline + revision (version number) next time, plus as much info as you can.

I am glad the new .exe files have been stable for you.

If you read the x264.nl notes, you can also read what versions may have been unstable (just scroll down)...

Thanks for your input!

bob0r

26th March 2008, 01:59

@Sharktooth and MythCreator

Please read:
http://forum.doom9.org/showthread.php?t=134391

I am not saying that your builds are crashing, but so far gcc 3.4.6 still seems the best gcc to build x264.exe.

Just a reminder if people start to report x264 related crashes which cannot be reproduced with my .exe files.

bob0r

26th March 2008, 02:16

x264.798.modified.exe (http://files.x264.nl/x264.798.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Sharktooth

26th March 2008, 04:28

ok, ill update ASAP.

MythCreator

26th March 2008, 09:07

x264.798.modified.beta.exe

Download Link:
http://www.fs2you.com/files/cde2a07d-fb0b-11dc-9640-00142218fc6e/

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.v...093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.ph...19#post1047919
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

PS: make fprofiled with GCC 4.3.0. Maybe someone would like to help me with some testing? Thanks a lot

burfadel

26th March 2008, 10:42

On my C2D, your 798.modified GCC 4.3.0 version is fractionally slower than bob0r's 798.modified version. By fractionally I mean around .05 - 0.1 fps! (consistently)

MythCreator

26th March 2008, 10:51

On my C2D, your 798.modified GCC 4.3.0 version is fractionally slower than bob0r's 798.modified version. By fractionally I mean around .05 - 0.1 fps! (consistently)

no crash?

burfadel

26th March 2008, 11:31

No crash :) I only did just under 1000 frames though, but from different segments.

bob0r

26th March 2008, 12:09

@MythCreator

Did you compile gcc 4.3.0 yourself or did you use some pre-packaged version?

MythCreator

26th March 2008, 12:18

@MythCreator

Did you compile gcc 4.3.0 yourself or did you use some pre-packaged version?

Pre-packaged. I can't connect to SVN and I don't know why.

burfadel

26th March 2008, 13:17

Isn't SVN no longer used?

LoRd_MuldeR

26th March 2008, 13:59

Isn't SVN no longer used?

Right, it's no longer used. VideoLAN, which also hosts x264, switched to git recently...

burfadel

26th March 2008, 16:28

Thats what I thought, and would explain the problem he's having!

Wishbringer

26th March 2008, 17:27

Thats what I thought, and would explain the problem he's having!

I thought he was talking about the compiler's SVN repo...

@MythCreator

Did you compile gcc 4.3.0 yourself or did you use some pre-packaged version?
...
Pre-packaged. I can't connect to SVN and I don't know why.

:rolleyes:

akupenguin

27th March 2008, 03:14

On my C2D, your 798.modified GCC 4.3.0 version is fractionally slower than bob0r's 798.modified version. By fractionally I mean around .05 - 0.1 fps! (consistently)
I can think of 3 possible reasons:
* Bssd is slower than unpatched (applies only with direct=auto).
* Different video content for fprofile. Yes, I have said that the content doesn't matter much, but that doesn't mean I'd discount the possibility of .05 fps. (Well, maybe I would. .05 out of what, .2? Speed differences are meaningless. Please don't ever post one again. The meaningful measure is speed ratio, preferably accompanied by the standard deviation of N trials. The base fps that the ratio is relative to (or the amount of the difference, interderivable) is optional and is the least important datum.)
* Different compiler. No, I do not have faith in newer gcc being consistently faster than old versions. (Consider that in a program as small as x264, even a single pessimized instruction in the wrong place could have a significant effect on total speed.)
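
To make the "speed ratio plus standard deviation of N trials" point concrete, here is a minimal Python sketch. The function name is hypothetical, and it assumes each build was run several times on the same source with the same settings, with the fps of every run collected into a list.

```python
import statistics


def speed_ratio(fps_a, fps_b):
    """Compare two builds from repeated fps measurements.

    Returns (mean fps of a / mean fps of b, sample std dev of a, sample std dev of b).
    Each list needs at least two trials for the standard deviation to be defined.
    """
    mean_a = statistics.mean(fps_a)
    mean_b = statistics.mean(fps_b)
    return mean_a / mean_b, statistics.stdev(fps_a), statistics.stdev(fps_b)
```

If the ratio differs from 1.0 by less than the run-to-run spread suggests, the measured difference is probably noise rather than a real effect of the compiler or the patch.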

Sharktooth

27th March 2008, 03:35

gcc 4.x was always slower than 3.4.x in my tests...

bob0r

27th March 2008, 04:03

@MythCreator

Doom9 forum shortens URLs, so copying plain text isn't going to work.

Here to make it easy for you:
http://x264.nl/x264.modified.txt

Also note the updated x264_2pass_vbv.7.diff patch.
grab it from the collected patches dir, apply like this:
patch -p1 < x264_patches/x264_2pass_vbv.7.diff
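
For applying the whole set in one go, something like the following Python sketch might do. It assumes GNU patch is on the PATH and uses its --dry-run option to check each diff before applying it for real; the function name and the `run` parameter are made up for illustration. Patch paths are resolved relative to where the script is started, while patch itself runs inside `source_dir`.

```python
import subprocess


def apply_patches(patch_files, source_dir=".", run=subprocess.run):
    """Apply each diff with `patch -p1`, testing it with --dry-run first."""
    for path in patch_files:
        # First pass only checks that the diff applies; second pass applies it.
        for extra in (["--dry-run"], []):
            with open(path, "rb") as diff:
                # check=True raises CalledProcessError on failure, so a bad
                # dry run stops everything before the tree is modified.
                run(["patch", "-p1", *extra], stdin=diff, cwd=source_dir, check=True)
```

Since the patches are applied in list order, a diff that depends on an earlier one has to come after it in `patch_files`.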

MythCreator

27th March 2008, 05:10

@MythCreator

Doom9 forum shortens URLs, so copying plain text isn't going to work.

Here to make it easy for you:
http://x264.nl/x264.modified.txt

Also note the updated x264_2pass_vbv.7.diff patch.
grab it from the collected patches dir, apply like this:
patch -p1 < x264_patches/x264_2pass_vbv.7.diff

Thanks~~~~

MythCreator

27th March 2008, 11:31

x264.798.modified.beta2.exe (http://www.fs2you.com/files/5e08fb02-fbe9-11dc-bde1-0014221b798a/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

make fprofiled by GCC 4.3.0 , just for test

Inventive Software

27th March 2008, 15:55

@clsid: If you still read this thread, can you provide an install of MinGW with GCC 3.4.6 please?

@everybody: What's the difference between fprofiled and "normal" builds?

survivant001

27th March 2008, 15:57

@MythCreator: I get "unknown option --nal-hrd" with your latest build

Inventive Software

27th March 2008, 16:00

x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.ph...19#post1047919

That patch adds the option. If you don't know what it is, don't use it. ;)

survivant001

27th March 2008, 16:05

That patch adds the option. If you don't know what it is, don't use it. ;)

I know that, and yes, I need it, but it's not included in the 798beta2

MythCreator

27th March 2008, 16:14

I know that, and yes, I need it, but it's not included in the 798beta2

That's my fault... I forgot to add it

MythCreator

27th March 2008, 16:28

x264.798.modified.beta2.fixed.exe (http://www.fs2you.com/files/930a6038-fc12-11dc-a915-0014221b798a/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled using GCC 4.3.0, just for testing

MythCreator

28th March 2008, 11:51

x264.798.modified.final.exe (http://www.fs2you.com/files/06ebf71c-fcb5-11dc-8543-0014221f3995/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919
x264_bssd.diff

Link to x264 patches collected: http://files.x264.nl/x264_patches/

make fprofiled by GCC 4.3.0

BTW: Is thread pool still useful?

survivant001

28th March 2008, 16:53

For me it is. I don't know where else to get the latest patched build.

buzzqw

28th March 2008, 17:04

i agree with survivant

BHH

MythCreator

28th March 2008, 17:48

For me it is. I don't know where else to get the latest patched build.

I mean, is the Thread Pool.diff still useful at this time?

burfadel

28th March 2008, 19:18

Well, the modified.final build of Mythcreator's is fractionally (...?!) slower than Bobor's modified 798. In a test I just did, it ran at 99.998 percent (rounded up!) of the performance of Bobor's modified build.

I'm just curious whether Mythcreator would be willing, purely as an experiment, to try making a build with GCC 4.4.0. It is very much at the testing stage, but snapshots can be downloaded from mirror sites. They're updated weekly according to the gcc site. One such site is:

http://gcc-ca.internet.bs/snapshots/

(of course then select the latest 4.4 folder at the bottom of the list)

Of course, you could always build your own latest 4.4.0 snapshot from the GCC SVN
svn://gcc.gnu.org/svn/gcc/trunk
http://gcc.gnu.org/svn/gcc/trunk
^^ view contents

I realise you probably know all this already, I listed it in case anyone was wondering :)

GCC 4.4.0 supposedly has many optimisations that may finally allow it to be faster than GCC 3.4.6. Would be interesting to see anyway!

MythCreator

29th March 2008, 05:27

I know, but 4.4.0 is still in development and may have some serious bugs in it

burfadel

29th March 2008, 07:52

I was actually taking that into consideration! As I said above, it would be a purely experimental build, just to see whether GCC 4.4.0 performs as well as or outperforms the older 3.4.6. It may even turn out to be more stable for x264 than 4.3.0 or 4.2.2, since the crashes and slowdowns suggest something's not quite right with those two releases!

Just thought it would be interesting to see, and to compare with your 4.3.0 build. It's OK if it's too much trouble, I just thought it might be interesting!

morph166955

29th March 2008, 13:58

I mean, is the Thread Pool.diff still useful at this time?

For those of us who have very fast octa-core machines, it is. For those who don't, results have varied from a minimal speed boost to nothing at all to even a slight decrease in speed. Unfortunately, for a few months now I haven't been able to find a thread pool diff that applies cleanly. The whole reason it still works for the higher-speed machines is that our threads are completing and destroying themselves faster than x264 expects them to, so a lag time is created. In some cases it actually takes longer to create/destroy a thread than it does for the thread to do its job. Having the thread pool there means we don't have to wait. In tests I ran a while back with the thread pool patch installed, I was getting a pretty significant speed boost on SD content (HD content is large enough that it maxes the CPU out anyway).

If we could get a thread pool patch that applied cleanly to the current version, that would be awesome. What would be even more awesome is putting it into the git version in such a way that it would normally run as it does now, but we could pass something like --threadpool on the CLI and it would run in that mode. No idea how hard that would be to do, but I think that would be optimal, since one could then choose the threading method on the fly.
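
To illustrate the create/destroy overhead described above, here is a minimal Python sketch. It is not x264's C code and not the actual thread pool patch; `encode_slice`, the task sizes, and the worker count are hypothetical stand-ins chosen only to show the two threading strategies side by side.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def encode_slice(n):
    # Hypothetical stand-in for one small unit of encoding work.
    return sum(i * i for i in range(n))


def run_with_new_threads(tasks):
    # One thread is created and destroyed per task (the unpatched behaviour
    # described in the post, in spirit only).
    results = [None] * len(tasks)

    def worker(idx, n):
        results[idx] = encode_slice(n)

    threads = [threading.Thread(target=worker, args=(i, n))
               for i, n in enumerate(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_with_pool(tasks, workers=4):
    # A fixed set of worker threads is reused for every task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_slice, tasks))


if __name__ == "__main__":
    tasks = [1000] * 200
    assert run_with_new_threads(tasks) == run_with_pool(tasks)
```

Both functions produce the same results; the difference is only in how often threads are set up and torn down, which matters most when each task is very short.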

MythCreator

29th March 2008, 18:50

x264.798.modified.experimental.exe (http://www.fs2you.com/files/d366136e-fdb8-11dc-a5be-0014221f4662/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264_aq_var.48.786.fixed.diff
http://forum.doom9.org/showthread.php?t=132760
x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled using GCC 4.4.0 20080328 experimental, purely for experiment & test

To test the speed change, please use this build, my beta2.fixed, and bob0r's build

burfadel

29th March 2008, 19:54

The experimental version worked fine, no crashes or anything unexpected... with good results!
On the test clip I used, with 1000 frames (not much, I know, but good enough for this purpose), I did several runs of each build and took the average of all the runs. I ran the builds one after another rather than all at the same time, to ensure accurate results.

With 798.modified.final (GCC 4.3.0):
Speed: 37.11 fps

With Bobor's 798 modified (GCC 3.4.6):
Speed: 37.25 fps

With 798.experimental (GCC 4.4.0):
Speed: 37.37 fps

Of course the speeds were slightly different on each run, but the slowest run of Bobor's build was still faster than the fastest run of 798.modified, and the slowest run of 798.experimental was still faster than the fastest run of Bobor's build.

GCC 4.4.0 looks promising: it could be faster and regain the speed lost so far with GCC 4.x.x! The final release may even be slightly better again (an assumption, but one would presume it will be optimised further).
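
To put the three averages above in proportion, a small Python sketch can express each one relative to the GCC 3.4.6 build. The fps figures are the ones posted above; the helper name `percent_faster` is just an illustrative choice.

```python
def percent_faster(fps_new, fps_old):
    """Return how much faster fps_new is than fps_old, in percent."""
    return (fps_new / fps_old - 1.0) * 100.0


# Average speeds reported in the post above.
averages = {
    "798.modified.final (GCC 4.3.0)": 37.11,
    "Bobor's 798 modified (GCC 3.4.6)": 37.25,
    "798.experimental (GCC 4.4.0)": 37.37,
}

baseline = averages["Bobor's 798 modified (GCC 3.4.6)"]
for name, fps in averages.items():
    print(f"{name}: {percent_faster(fps, baseline):+.2f}% vs GCC 3.4.6")
```

The gaps work out to a few tenths of a percent in either direction, so the ordering is consistent but the differences are small.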

Thanks for the trial, it looks good! Maybe others with different CPUs could also test; the results may be different with AMDs, for example?...

Dethis

29th March 2008, 21:45

Burfadel, thanks for the test

But, as MythCreator suggested, you should use "beta2.fixed" instead of ".final". The ".final" build contains the bssd patch, which probably adds some computational load that is absent from the "bobor" and "experimental" versions.

MasterNobody

29th March 2008, 23:02

Here is my variant of the thread-pool patch, made against the current git version: http://stashbox.org/96770/x264_thread_pool.r798.diff
As far as I know, Dark Shikari's AQ patch slightly conflicts with the thread-pool patch (in ratecontrol.c), so one of them needs some modifications for compatibility.

morph166955

30th March 2008, 02:06

Sweet! Thanks! Can't wait to give it a shot.

MythCreator

31st March 2008, 05:55

x264.805.modified.experimental.exe (http://www.fs2you.com/files/0aed3c00-fedf-11dc-9380-0014221f3995/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled using GCC 4.4.0 20080328 experimental, purely for experiment & test

bob0r

31st March 2008, 06:40

x264.805.modified.exe (http://files.x264.nl/x264.805.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

burfadel

31st March 2008, 14:35

Another test with revision 805, the GCC 4.4.0 version was faster every time!

Bobor's normal and modified builds with GCC 3.4.6 for this test averaged 29.84 fps (they both averaged very close so I grouped them together, I believe the additional patches don't affect the settings I was using).

The GCC 4.4.0 build averaged 30.32 fps, and again the slowest run was still faster than the fastest run of the GCC 3.4.6 builds. I used the exact same settings for both, run from the command line.

It looks like GCC 4.4.0 will be a good option for Bobor's website once it is finalised, since it's a noticeable improvement over 4.3.0 and lower!

survivant001

31st March 2008, 19:08

x264.805.modified.exe (http://files.x264.nl/x264.805.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.6.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3550
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

why did you include vbv patch 7 ?

Wishbringer

31st March 2008, 19:22

Tested x264 bob0r's gcc 3.4.6 build 805 vs. MythCreator's gcc 4.4.0 build 805:
System: EVGA 780i, QX6700 (at 3.2GHz - Multi=12), 8GB DDR2-800 RAM, Vista64 Ultimate

bob0r
-[Information] Log for job1 (video, Test.avs -> Test_video_bob0r.mp4)
--[Information] [31.03.2008 19:47:05] Started handling job
--[Information] [31.03.2008 19:47:05] Preprocessing
--[NoImage] Job commandline: "C:\Program Files (x86)\megui\tools\x264\x264.805.modified.exe" --crf 18 --level 4.1 --ref 8 --mixed-refs --no-fast-pskip --bframes 16 --b-pyramid --b-rdo --bime --weightb --direct auto --subme 7 --trellis 2 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --vbv-bufsize 9000 --vbv-maxrate 24000 --me umh --threads auto --thread-input --sar 4993:5000 --progress --no-dct-decimate --output "D:\Filme\Test_video_bob0r.mp4" "D:\Filme\Test.avs" --aud --nal-hrd --me-prepass
--[Information] [31.03.2008 19:47:06] Encoding started
--[NoImage] Standard output stream:
--[NoImage] Standard error stream
---[NoImage] avis [info]: 1920x816 @ 25.00 fps (2977 frames)
---[NoImage] x264 [info]: using SAR=4993/5000
---[NoImage] x264 [warning]: DPB size (18800640) > level limit (12582912)
---[NoImage] x264 [info]: using cpu capabilities: MMX MMXEXT SSE SSE2 SSE3 SSSE3 Cache64
---[NoImage] mp4 [info]: initial delay 2 (scale 25)
---[NoImage] x264 [info]: slice I:44 Avg QP:17.87 size:181942 PSNR Mean Y:45.82 U:50.39 V:51.29 Avg:46.92 Global:44.63
---[NoImage] x264 [info]: slice P:1291 Avg QP:19.59 size:104266 PSNR Mean Y:43.02 U:48.19 V:49.09 Avg:44.21 Global:43.92
---[NoImage] x264 [info]: slice B:1642 Avg QP:21.21 size: 37879 PSNR Mean Y:42.19 U:48.12 V:48.98 Avg:43.46 Global:43.20
---[NoImage] x264 [info]: mb I I16..4: 16.9% 51.8% 31.3%
---[NoImage] x264 [info]: mb P I16..4: 3.3% 11.2% 5.1% P16..4: 49.2% 23.4% 6.2% 0.0% 0.0% skip: 1.6%
---[NoImage] x264 [info]: mb B I16..4: 0.7% 2.0% 1.0% B16..8: 44.8% 3.5% 5.0% direct: 9.8% skip:33.3%
---[NoImage] x264 [info]: 8x8 transform intra:56.1% inter:39.6%
---[NoImage] x264 [info]: direct mvs spatial:99.9% temporal:0.1%
---[NoImage] x264 [info]: ref P 80.0% 13.0% 2.8% 1.4% 0.8% 0.8% 0.6% 0.5%
---[NoImage] x264 [info]: ref B 83.7% 13.1% 1.5% 0.7% 0.4% 0.3% 0.2%
---[NoImage] x264 [info]: SSIM Mean Y:0.9805205
---[NoImage] x264 [info]: PSNR Mean Y:42.605 U:48.185 V:49.062 Avg:43.840 Global:43.514 kb/s:13759.53
---[NoImage] encoded 2977 frames, 2.41 fps, 13763.75 kb/s
--[Information] Final statistics
---[NoImage] Desired video bitrate: 18 kbit/s
---[NoImage] Obtained video bitrate (approximate: 13764 kbit/s
--[Information] [31.03.2008 20:07:45] Job completed
--[Information] [31.03.2008 20:07:45] Postprocessing
---[Information] Deleting intermediate files

MythCreator
-[Information] Log for job1 (video, Test.avs -> Test_video_MythCreator.mp4)
--[Information] [31.03.2008 19:15:23] Started handling job
--[Information] [31.03.2008 19:15:23] Preprocessing
--[NoImage] Job commandline: "C:\Program Files (x86)\megui\tools\x264\x264.805.modified.experimental.exe" --crf 18 --level 4.1 --ref 8 --mixed-refs --no-fast-pskip --bframes 16 --b-pyramid --b-rdo --bime --weightb --direct auto --subme 7 --trellis 2 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --vbv-bufsize 9000 --vbv-maxrate 24000 --me umh --threads auto --thread-input --sar 4993:5000 --progress --no-dct-decimate --output "D:\Filme\Test_video_MythCreator.mp4" "D:\Filme\Test.avs" --aud --nal-hrd --me-prepass
--[Information] [31.03.2008 19:15:24] Encoding started
--[NoImage] Standard output stream:
--[NoImage] Standard error stream
---[NoImage] avis [info]: 1920x816 @ 25.00 fps (2977 frames)
---[NoImage] x264 [info]: using SAR=4993/5000
---[NoImage] x264 [warning]: DPB size (18800640) > level limit (12582912)
---[NoImage] x264 [info]: using cpu capabilities: MMX MMXEXT SSE SSE2 SSE3 SSSE3 Cache64
---[NoImage] mp4 [info]: initial delay 2 (scale 25)
---[NoImage] x264 [info]: slice I:44 Avg QP:17.87 size:181942 PSNR Mean Y:45.82 U:50.39 V:51.29 Avg:46.92 Global:44.63
---[NoImage] x264 [info]: slice P:1291 Avg QP:19.58 size:104282 PSNR Mean Y:43.02 U:48.19 V:49.09 Avg:44.21 Global:43.92
---[NoImage] x264 [info]: slice B:1642 Avg QP:21.21 size: 37887 PSNR Mean Y:42.19 U:48.12 V:48.98 Avg:43.46 Global:43.20
---[NoImage] x264 [info]: mb I I16..4: 16.9% 51.8% 31.3%
---[NoImage] x264 [info]: mb P I16..4: 3.3% 11.2% 5.1% P16..4: 49.2% 23.4% 6.2% 0.0% 0.0% skip: 1.6%
---[NoImage] x264 [info]: mb B I16..4: 0.7% 2.0% 1.0% B16..8: 44.7% 3.5% 5.0% direct: 9.9% skip:33.3%
---[NoImage] x264 [info]: 8x8 transform intra:56.0% inter:39.6%
---[NoImage] x264 [info]: direct mvs spatial:99.9% temporal:0.1%
---[NoImage] x264 [info]: ref P 80.0% 13.0% 2.8% 1.4% 0.9% 0.8% 0.6% 0.5%
---[NoImage] x264 [info]: ref B 83.6% 13.1% 1.5% 0.7% 0.4% 0.3% 0.2%
---[NoImage] x264 [info]: SSIM Mean Y:0.9805220
---[NoImage] x264 [info]: PSNR Mean Y:42.605 U:48.184 V:49.062 Avg:43.840 Global:43.514 kb/s:13761.79
---[NoImage] encoded 2977 frames, 2.41 fps, 13766.00 kb/s
--[Information] Final statistics
---[NoImage] Desired video bitrate: 18 kbit/s
---[NoImage] Obtained video bitrate (approximate: 13767 kbit/s
--[Information] [31.03.2008 19:36:01] Job completed
--[Information] [31.03.2008 19:36:01] Postprocessing
---[Information] Deleting intermediate files

bob0r's build has a slightly lower bitrate: 13763.75 kb/s vs. 13766.00 kb/s
maybe because of "x264_2pass_vbv.6.diff" instead of "x264_2pass_vbv.7.diff"

Both encoded at 2.41 fps: bob0r = 20min40sec; MythCreator = 20min38sec
The 2 sec difference seems to be within the margin of error
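
The job durations quoted above can be recomputed from the "Started handling job" and "Job completed" timestamps in the two logs. The following Python sketch does that; the helper names are illustrative, and the timestamps and frame count are copied from the logs.

```python
from datetime import datetime

# Timestamp layout used in the MeGUI log lines above.
FMT = "%d.%m.%Y %H:%M:%S"


def elapsed_seconds(start, end):
    """Wall-clock seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


FRAMES = 2977  # frame count from the "avis [info]" line

# "Started handling job" -> "Job completed" for each build.
bob0r = elapsed_seconds("31.03.2008 19:47:05", "31.03.2008 20:07:45")
myth = elapsed_seconds("31.03.2008 19:15:23", "31.03.2008 19:36:01")

print(f"bob0r:       {bob0r:.0f} s, {FRAMES / bob0r:.3f} fps")
print(f"MythCreator: {myth:.0f} s, {FRAMES / myth:.3f} fps")
print(f"difference:  {(bob0r - myth) / bob0r * 100:.2f}% of the run")
```

These job-level times include MeGUI's own setup around the encode, so the fps derived this way comes out slightly below the 2.41 fps that x264 itself reports.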

bob0r

31st March 2008, 20:25

why did you include vbv patch 7 ?

Fixed, just a typo, 7 was used.

Seems there is 8 now?

tenkai

31st March 2008, 22:19

L4.1 with ref 8? And 16 bframes? Do you think that will work out? :D I'm just wondering if I missed anything.. wasn't the max 5 ref at 1920x800, for example, and 3 bframes for L4.1 encoding?
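
The reference-frame part of this question can be checked against the DPB warning in the logs above. The sketch below is a rough Python calculation, assuming 8-bit 4:2:0 frames (1.5 bytes per pixel) and taking the level 4.1 limit of 12582912 bytes straight from the x264 warning; the function names are illustrative. With those assumptions, 8 references at 1920x816 give exactly the 18800640 bytes shown in the warning. It says nothing about the b-frame count, which is a separate player constraint.

```python
# Level limit in bytes, as printed in the x264 "DPB size" warning above.
MAX_DPB_BYTES_LEVEL_4_1 = 12582912


def frame_bytes(width, height):
    """Size of one 8-bit 4:2:0 frame, with dimensions rounded up to 16."""
    w = (width + 15) // 16 * 16
    h = (height + 15) // 16 * 16
    return w * h * 3 // 2


def max_ref_frames(width, height, max_dpb=MAX_DPB_BYTES_LEVEL_4_1, cap=16):
    """Largest number of reference frames that fits in the DPB limit."""
    return min(cap, max_dpb // frame_bytes(width, height))


for w, h in [(1920, 816), (1920, 800), (1920, 1080), (720, 576)]:
    print(f"{w}x{h}: up to {max_ref_frames(w, h)} reference frames")
```

Under these assumptions 1920x816 and 1920x800 allow 5 references and 1920x1080 allows 4, while SD material is limited only by the cap of 16.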

Wishbringer

31st March 2008, 22:24

Fixed, just a typo, 7 was used.

Now I am a bit curious. I thought that the same build with the same patches should produce bit-identical encoded video streams, independent of the compiler used...

See my previous post, where bob0r's build produced a slightly smaller encoded video.
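
One simple way to check whether two encodes really are bit-identical is to compare file hashes. This is a generic Python sketch, not something from the thread; the function names are illustrative, and comparing raw .264 streams is likely to be more meaningful than comparing .mp4 files, since a container may carry metadata that differs between runs.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bitstream(path_a, path_b):
    """True if both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling `same_bitstream` on the two output files would show directly whether the compilers produced the same stream.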

Wishbringer

31st March 2008, 22:28

@tenkai:

I didn't say that this is a usable clip on any standalone player.
I used my PS3-SD profile on an HD clip because I wanted to compare encoding speed with nearly all options maxed out.
On the other hand, these settings work very well with my SD DVD conversions for my PS3.

tenkai

31st March 2008, 22:44

OK.. SD with ref 8 etc. shouldn't be a problem, yeah. So am I right? For 1080p encoding with good quality and full support on PS3 etc., the max is still ref 4/5 with a maximum of 3 bframes? I'm asking because I want the maximum possible quality while still having a compatible encode.. and I have no idea if I can reach that with ref 5 and 3 bframes.. even with subme 7 etc. :(

tenkai

31st March 2008, 22:46

Any ideas and tweaks are welcome, of course :)

bob0r

2nd April 2008, 07:56

x264.808.modified.exe (http://files.x264.nl/x264.808.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

MythCreator

2nd April 2008, 11:04

x264.808.modified.experimental.exe (http://www.fs2you.com/files/564e6247-009c-11dd-855e-0014221f4662/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919
x264_fix_win_stdin.diff
http://forum.doom9.org/showthread.php?p=1120065#post1120065

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled using GCC 4.4.0 20080331 experimental, purely for experiment & test

audyovydeo

3rd April 2008, 10:20

bob0r,

I noticed the Thread Pool patch hasn't been applied in a while.
I haven't seen at which point it has been dumped : is it no longer useful, or it simply hasn't been updated ?

Also, am I alone in regretting SVN versioning ? Git seems pretty f_d up.

cheers
a/v

SpAwN_gUy

3rd April 2008, 10:44

Also, am I alone in regretting SVN versioning ? Git seems pretty f_d up.
i was also thinking about WHY? ... SVN is just "New" (comparing to CVS) and git.. is just "newer"...
so.. why?

nm

3rd April 2008, 10:53

i was also thinking about WHY? ... SVN is just "New" (comparing to CVS) and git.. is just "newer"...
so.. why?
Git is in many ways better than SVN and it makes things easier for collaborative development.

Henrikx

3rd April 2008, 12:40

What patches would be useful in a Linux build (Ubuntu)?

J_Darnley

3rd April 2008, 13:08

What do you want your build to do differently than the git source? The patches (not the Win stdin patch) should work the same way on all systems.

Henrikx

3rd April 2008, 14:25

@J_Darnley

The patches (not the Win stdin patch) should work the same way on all systems.
That is precisely what I wanted to know.

THX!

DeathTheSheep

3rd April 2008, 21:43

Git might be newer, and from a developer's standpoint it might be "supposedly" advantageous, but for pete's sake, the web interface is AWFUL. It can't even list the revision number on its (butt-ugly) log.

Ironic an encoder that makes video quite pretty has such an ugly developer's interface. Simplicity, I know...

Sharktooth

3rd April 2008, 21:48

that's coz there's no revision number in git...

DeathTheSheep

3rd April 2008, 21:50

Exactly my point. :)

Maybe you should "make the move" and switch your builds to a "last modified date," rather than revision number?

...on second thought, I've grown fond of the revision number system. :P

[edit]
More on topic, I'm thinking of optimizing prepass a bit more, at least so that it shares x264's new mv clipping method. Additionally, I'm also thinking of re-releasing AQ0.46 to overwrite the current 1.0 AQ (possibly deciding which rounding method to use based on whether or not an "--anime" tag is specified).

What sayeth thee?

SpAwN_gUy

4th April 2008, 09:44

Additionally, I'm also thinking of re-releasing AQ0.46 to overwrite the current 1.0 AQ (possibly deciding which rounding method to use based on whether or not an "--anime" tag is specified).

What sayeth thee?
Cool. :) .. nice and simple... just --anime... and you're an Anime Encoder :) ..

maybe, it would be nice to make such optimisations not only for AQ? (like lame with its presets..)

Razorholt

4th April 2008, 16:46

How about grown-ups who don't like watching cartoons? Kiiiiiding :)

What's your intention DeathTheSheep in re-releasing 0.46? Using it for anime only? Don't you think VAQ2 is better?

Dark Shikari

4th April 2008, 17:57

How about grown-ups who don't like watching cartoons? Kiiiiiding :)

What's your intention DeathTheSheep in re-releasing 0.46? Using it for anime only? Don't you think VAQ2 is better?
His purpose is to discourage me from ever releasing any patches publicly again by intentionally distributing broken code with my name on it. :rolleyes:

~bT~

4th April 2008, 18:38

^ that's sad :p

keep up the gr8 work Dark!

burfadel

4th April 2008, 20:07

It's probably that the combination of settings he's using is falsely giving the impression that 0.46 is better, when, for example, in crf mode the change in objective quality and file size corresponds more closely to simply lowering the crf value. A much better way would be to lower the crf instead. It could also be a settings issue, where non-ideal settings are used? Of course, with the solid colour in anime (the Japanese definition of which is all animation), it probably just means that lowering the AQ strength slightly, say to 0.7, may be of benefit, along with disabling fast-pskip via --no-fast-pskip, which is purported to be better in those situations.

Actually Dark_shikari, what is the relationship between --no-fast-pskip and AQ, do they affect each other? I was just thinking that if a slight deterioration occurs on solid colour due to pskip, AQ may be compensating, hence causing the problems referred to by deaththesheep? I'm only guessing here...

Dark Shikari

4th April 2008, 20:25

Actually Dark_shikari, what is the relationship between --no-fast-pskip and AQ, do they affect each other? I was just thinking that if a slight deterioration occurs on solid colour due to pskip, AQ may be compensating, hence causing the problems referred to by deaththesheep? I'm only guessing here...
AQ will reduce the number of blocks in which fast-pskip is activated, yes. The primary effect is a reduction in speed.

burfadel

4th April 2008, 21:13

Therefore, for anime, 0.46 was possibly not correcting the pskip-induced deterioration that occurs in flat areas, at least not to the extent that the proper VAQ does, leaving more bits for lines?

If that's the case, is variable fast-pskip possible, that is, applying fast-pskip only to non-flat areas? I'm guessing that would improve the picture quality and the effectiveness of VAQ, and since fast-pskip would only be disabled in flat (or largely flat) areas, it would not incur the speed penalty that occurs when --no-fast-pskip is applied? That would be a good default option if possible! I believe it's only flat areas that are visually penalised by using fast-pskip?

DeathTheSheep

5th April 2008, 00:43

Mmkay guys, lots of stuff to address:

What's your intention DeathTheSheep in re-releasing 0.46? Using it for anime only? Don't you think VAQ2 is better?
My intentions at the time of initial posting were nil, which is precisely the reason why I was gathering input. I've made extensive quality comparisons of the two and have found 0.46 produces a significantly higher SSIM and (perceived) visual quality on low-bitrate (high quantizer) anime sources, especially in baseline profile, and regardless of other options used. However, I do plan to wait it out a bit in the hopes that DS will in fact address anime specifically in VAQ2.0 (read below).

His purpose is to discourage me from ever releasing any patches publicly again by intentionally distributing broken code with my name on it.
What! I appreciate your work as much as (or perhaps more than) most. Recall I was of the very first to support your first x264 hacks (remember x264-sad-opt and such?), and was always in favor of you releasing all and anything you could. I can't understand your accusation of me intentionally discouraging you...unless the underlying intention of this statement is the OS equivalent of FUD (I'm teasing here, but really now, it's unprofessional). Look at my sources, people--any anime sources, including Gundam Seed--and test yourself. My tests were all very clear, replete with commandlines, the source, tons of screenshots, visual analysis, the sample clips, metrics (especially the one found in the "megathread"). Besides, the code is certainly not "broken" (though the rounding is mathematically suboptimal), and if you'd like I can take your name off of it, DS. You can disown your ugly first born, if you will. But at your strongly suggested request, I will hold off on releasing it and instead watch the progress of VAQ 2.0.

It's probably that the combination of settings he's using is falsely giving the impression that 0.46 is better
Wrong, my good sir. While I do not profess nearly the code affinity and proficiency of DS, I understand the quality settings of x264 at least well enough to conduct proper and thorough testing (note: baseline/mobile profile). However, in doubt, I also used an exhaustive testing rubric, trying a ridiculously large combination of settings in the effort to convince myself. As you recall, I used no-fast-pskip in the majority of my testing--but certainly not all--and please remember the fact that the underlying algorithm is essentially a quantizer redistribution according to a variance function, not an intentional modification of color information. The difference between 1.0 and 0.46, for instance, is the rounding for core formula. That is all.

If you want a technical explanation, refer to the previous thread. 1.0 is no longer a patch. The admittedly interesting postulate that 0.46 is "favoring lines" isn't necessarily the case either. The quantizers were properly raised, but to the extent that the x264 deblocker would (conveniently) efficiently conceal artifacts. Additionally, relatively flat backgrounds and such were indeed awarded lower quantizers, but not to the extent that blocks would reappear due to an excess or an irregularity of distribution across the entire background. However, it isn't a matter of simply decreasing AQ strength on 1.0 to re-achieve this fragile balance--rather, the two algorithms have a markedly distinct visual effect from one another, for better (as DS strongly supports), or for worse (as DS strongly opposes, since his interests understandably lie with his newer codebase, the evolution of his brainchild).
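
As a rough illustration of what "a quantizer redistribution according to a variance function" can look like, here is a toy Python sketch. It is not the formula from AQ 0.46, 0.48, 1.0, or VAQ; the log2-of-variance measure, the centring on the frame average, and the names `block_variance` and `qp_offsets` are all assumptions made purely for illustration.

```python
import math


def block_variance(pixels):
    """Population variance of the pixel values in one block."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def qp_offsets(blocks, strength=1.0):
    """Toy per-block QP offsets: flat blocks get a lower QP, busy blocks a higher one.

    Hypothetical formula: offset = strength * (log2(var + 1) - frame average),
    rounded to the nearest integer.
    """
    energies = [math.log2(block_variance(b) + 1.0) for b in blocks]
    average = sum(energies) / len(energies)
    return [round(strength * (e - average)) for e in energies]


if __name__ == "__main__":
    flat = [128] * 16        # flat background block
    textured = [0, 255] * 8  # high-contrast block
    print(qp_offsets([flat, textured], strength=1.0))
    print(qp_offsets([flat, textured], strength=0.5))
```

In this toy version the strength parameter only scales the offsets, and the final rounding step decides which integer QP each block ends up with, which is the kind of detail the posts above are arguing about.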

And indeed, this is only noticeable on anime. Perhaps 0.48/1.0 is in the lead as regards live footage/non-anime content. One would be more willing to accept that a perfected mathematical formula to account for real quantization error would perform more optimally on non-artificial footage unlike anime.

But when all is said and done, this isn't a settings discussion, or a question of whether or not anybody/anything knows what he/it's doing, or who/what is right and wrong about the subtleties of the algorithms at hand--I had wanted a re-release so that any user can come along and see it for himself with his sources and bitrates which performed more optimally for his needs.

The point of fact is, there is far too much contention, and I fear re-releasing AQ 0.46 at this point will do much in the way of hurting DS (his feelings, his name, his efforts) and perhaps the concept of "progress" itself, so I will not. At least not until 2.0 has matured enough to make differences meaningful, not to mention worthwhile. I have my eyes on it, in the meantime...

Dark Shikari

5th April 2008, 00:48

Look at my sources, people--any anime sources, including Gundam Seed--and test yourself. My tests were all very clear, replete with commandlines, the source, tons of screenshots, visual analysis, the sample clips, metrics (especially the one found in the "megathread").
Your tests were completely invalid, and only proved one thing: lower AQ strength was better on certain frames in anime. Because 0.46 had an effectively lower strength, it therefore performed better on those frames. You have never ever posted any comparison clip in which AQ 0.46 offered a "better" distribution of QPs except in that it offered a weaker AQ.

Yet you cannot accept this simple fact, and continue to simply post cherrypicked comparison frames and intentionally misleading comparisons in order to promote an algorithm that does not make sense.

I welcome an attempt to make a better AQ for anime; I have had a number of ideas in mind, in fact, such as intentionally raising QP for flat blocks while putting lambda much lower, to take advantage of the higher deblocking.

I don't welcome an attempt to rip off a broken, bugged version of my own work and, through a careful campaign of misleading comparisons, declare it "better."

DeathTheSheep

5th April 2008, 00:51

Nobody sweepingly declared anything better, and people can determine for themselves whether or not my tests were valid. I strongly believe in their suggestive validity. The higher 0.46 AQ strength was optimal metrically and visually when keyframes (or the frames you "cherrypicked" [that's a funny word] immediately proceeding them!) were respectively boosted. Again, the frames I chose were random. Again, I did not base my comparisons off of still shots.

I welcome your ideas to make AQ better for anime. So much so I'm not re-releasing 0.46, as I said. Please don't misinterpret, and please don't go on the defensive/offensive about this. How about you email me, and I'll take this whole shebang off air, eh?

Dark Shikari

5th April 2008, 00:53

I welcome your ideas to make AQ better for anime.

If you want to work on this, drop by #x264dev; I'd be happy to help. I'm somewhat interested in the concept myself, and given your success in getting QNS to work, I suspect you're good enough at coding that given enough guidance you can try out some ideas ;)

So much so I'm not re-releasing 0.46, as I said.

Good that's cleared up then.

DeathTheSheep

5th April 2008, 00:56

Hey, I'm not one to so easily forget about that little QNS present you presented me. I owe you big time for help on that. And for that, period. :)

Razorholt

5th April 2008, 01:05

Can we once and for all determine how to declare what's best? If comparing frames doesn't make sense at all (although you, DS, as well as others use that method very often) and if SSIM doesn't always make sense either, then what does? I'm talking about professional judgment and not personal taste.

I have the feeling that this little war between VAQ 0.46 partisans and VAQ 1.0 enthusiasts will not end soon... And I don't think it benefits the majority of us (x264 users).

I'm personally spending a lot of hours encoding and comparing VAQ settings because it's the least I can do to “give back”, but I want to know whether my time is spent wisely.

Thanks,
- Dan

DeathTheSheep

5th April 2008, 01:10

Actually, trying to find the "best" is banned in this forum. Even the word. :) You can do metrics, or you can do a double-blind elsewhere.

But, as DS suggested, it's best to pick up 2.0 and optimize that specifically rather than keep around the old guy.

Razorholt

5th April 2008, 01:10

So, SSIM + frames comparison will work or not? - I'm talking about comparing settings.

DeathTheSheep

5th April 2008, 01:13

Why not? Nobody's stopping you, knock yourself out. But in retrospect, 1.0 is committed, and there's a 2.0 under wraps, so why not just integrate 0.46's benefit (not the formula, but some "anime optimization") into 2.0 while it's still not out of the oven?

Razorholt

5th April 2008, 01:33

No no, I'm talking about comparing different settings using VAQ2 exclusively. Oh well, I'll stick to frames + SSIM and that's it. :)

burfadel

5th April 2008, 06:24

So variable fast-pskip, so that it only applies to non-flat areas, wouldn't actually work?!

lexor

5th April 2008, 18:58

No no, I'm talking about comparing different settings using VAQ2 exclusively. Oh well, I'll stick to frames + SSIM and that's it. :)

I think the original quote by DS about frame to frame comparison being useless is being misinterpreted. What I think it should say is that you can't take just one frame and compare that frame with different settings. If you take enough frames out of the stream and compare them using different settings, it's valid.

Think of it this way: bits have to go somewhere. If one frame has lost some, another had to gain some to maintain bitrate (especially with 2pass). So selecting like 1 or 2% of frames at random (random is important here) and comparing them is useful and will yield a valid result. Though it will be difficult if the source has a large number of frames (i.e. you aren't testing on a short clip). You can of course reduce the number of frames (2% I would say is overkill) at the expense of confidence in your conclusion. However, don't just use avisynth's selectevery()/even/whatever-else function; you have got to pick randomly as many frames as you are willing to compare. So get a decent pseudo-random number generator (random.org is a good place for a quick one), generate N numbers between 1 and max_num_frames in your clip, then compare the frames corresponding to those numbers.
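
The random pick described above can be sketched in a few lines of Python. This is only an illustration: the function name pick_random_frames, the 30000-frame clip length and the seed are made up, and frame numbers are assumed to run from 1 to max_num_frames as in the post (subtract 1 if your tool counts from 0).

```python
import random

def pick_random_frames(max_num_frames, n, seed=None):
    """Return n distinct frame numbers in 1..max_num_frames, sorted for easy seeking."""
    if not 1 <= n <= max_num_frames:
        raise ValueError("n must be between 1 and max_num_frames")
    rng = random.Random(seed)  # a fixed seed makes the pick reproducible
    return sorted(rng.sample(range(1, max_num_frames + 1), n))

# Example: 1% of a hypothetical 30000-frame clip
frames = pick_random_frames(30000, 300, seed=2008)
```

Using the same frame list for every encode being compared keeps the comparison like for like.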

Dark Shikari

5th April 2008, 20:04

I think the original quote by DS about frame to frame comparison being useless is being misinterpreted. What I think it should say is that you can't take just one frame and compare that frame with different settings. If you take enough frames out of the stream and compare them using different settings, it's valid.

One trick is to also compare frame sizes: if two frames are quite different sizes, it's probably not a valid comparison.

With P/B frames, one also has to look at the size of the last I-frame and recent frames, too.
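
A rough sketch of that size check, assuming you already have per-frame sizes for both encodes as dictionaries keyed by frame number. The function name, the 1.2 ratio threshold and the sample sizes are hypothetical; the sketch also ignores the point about the last I-frame and recent frames, which would need extra bookkeeping.

```python
def comparable_frames(sizes_a, sizes_b, frames, max_ratio=1.2):
    """Keep only frames whose sizes in the two encodes are within max_ratio of each other."""
    keep = []
    for f in frames:
        a, b = sizes_a[f], sizes_b[f]
        small, large = min(a, b), max(a, b)
        if small > 0 and large / small <= max_ratio:
            keep.append(f)
    return keep

# Made-up sizes in bytes, indexed by frame number, purely for illustration
sizes_a = {10: 4000, 11: 900, 12: 1500}
sizes_b = {10: 4100, 11: 1800, 12: 1450}
usable = comparable_frames(sizes_a, sizes_b, [10, 11, 12])  # frame 11 is dropped
```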

DeathTheSheep

5th April 2008, 20:09

Hot diggity dang, QNS is so good it hurts. And so slow it hurts. (Hey, there's always the tradeoff).

It's simply amazing in baseline. It brings CAVLC encoding to the efficiency of CABAC+trellis encoding. Well, you know. Pretty much. It's absolutely, utterly, ridiculously good on low-bitrate anime. No, no, forget AQ for now, this thing is already on the table. This is the first real progress baseline profile has seen since AQ. And that was the first ever, pretty much, since general RDO refinements. Usually CABAC and B-frames and trellis are required when quality is increased, but nobody seemed to care about where the quality hits home the most--in the lowest bitrates, for older computers or crappy decoders or handhelds or small screens and so on.

QNS is astounding. I can't believe something like tucking quant error in strange places does something this...marked. Is there any hope of a speedup??!

Dark Shikari

5th April 2008, 20:27

Hot diggity dang, QNS is so good it hurts. And so slow it hurts. (Hey, there's always the tradeoff).

It's simply amazing in baseline. It brings CAVLC encoding to the efficiency of CABAC+trellis encoding. Well, you know. Pretty much. It's absolutely, utterly, ridiculously good on low-bitrate anime. No, no, forget AQ for now, this thing is already on the table. This is the first real progress baseline profile has seen since AQ. And that was the first ever, pretty much, since general RDO refinements. Usually CABAC and B-frames and trellis are required when quality is increased, but nobody seemed to care about where the quality hits home the most--in the lowest bitrates, for older computers or crappy decoders or handhelds or small screens and so on.

QNS is astounding. I can't believe something like tucking quant error in strange places does something this...marked. Is there any hope of a speedup??!

This is because QNS does two things:

1. It serves as a non-optimal version of trellis that works on CAVLC. Normal trellis might actually have to be exponential time to work on CAVLC. This means you get most of the benefits of trellis on CAVLC, at a high speed cost (but not as high as true trellis). One thing you might want to try is to not use the variance-weighting at all (use a constant weight for each pixel) and then change the "error *= 38" line to make the bitrate equal to what it was before in CRF mode; this will make the result of QNS basically like trellis. Considering the amazing benefit of both trellis and CABAC on low-bitrate anime, it's not surprising that QNS is also so useful.

2. It weights the value of pixels based on their local variance.

Possible speedups:

1. Find a way to estimate the bit cost of a decision without doing an actual macroblock_write. Trellis already has this; obviously its method is CABAC-only.

2. Make a special IDCT function that adds only a single basis vector to an existing IDCT output, since we're only adjusting one coefficient at a time. FFmpeg has something like this for the regular DCT, which is even more important there since MPEG-2/MPEG-4 ASP use much slower DCT algorithms than H.264 does.

3. ASM-ize the quality metric used.

4. Don't look at coefficients that are already zeroed.

5. Don't consider raising or lowering any coefficient more than once.
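
Speedup 2 relies on the inverse transform being linear: changing one coefficient by delta changes the output by delta times that coefficient's basis image. The sketch below checks this with a floating-point 4x4 DCT as a stand-in. It is not x264 or FFmpeg code, H.264's real transform is an integer approximation with rounding that an actual implementation would have to respect, and all names are made up for illustration.

```python
import math

N = 4

def dct_basis_1d(k, n):
    """Orthonormal DCT-II basis function k sampled at position n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))

def idct_4x4(coef):
    """Full inverse transform: sum of every coefficient times its 2-D basis image."""
    return [[sum(coef[u][v] * dct_basis_1d(u, y) * dct_basis_1d(v, x)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]

def add_basis(pixels, u, v, delta):
    """Update an existing inverse-transform output in place after coefficient (u, v) changes by delta."""
    for y in range(N):
        for x in range(N):
            pixels[y][x] += delta * dct_basis_1d(u, y) * dct_basis_1d(v, x)

# Arbitrary example coefficients
coef = [[52, -3, 0, 1], [4, 0, -2, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
pixels = idct_4x4(coef)

# Lower coefficient (0, 1) by one step, as a QNS-style trial might
coef[0][1] -= 1
add_basis(pixels, 0, 1, -1)

# The incremental result should match a full recomputation up to float error
full = idct_4x4(coef)
max_diff = max(abs(pixels[y][x] - full[y][x]) for y in range(N) for x in range(N))
```

The incremental update touches 16 pixels per trial instead of redoing the whole transform, which is where the saving would come from.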

DeathTheSheep

5th April 2008, 20:46

Variance weighting: I wonder how it would stack on top of Variance AQ. At the same filesize and settings, I get:
Normal 0.46: SSIM: 0.9752356
w/QNS 0.46: SSIM: 0.9758788

I notice fewer artifacts in the QNS encode (surprised? :P).
That's a sizable boost even with VAQ on (admittedly I did use 0.46, though, as it was the only one with adjustable sensitivity I had lying around).
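
To put the two SSIM figures above on a more readable scale, one common trick is to convert them to a dB-style value, -10*log10(1 - SSIM), and to look at how much (1 - SSIM) shrinks. A small sketch using only the two numbers from this post; the helper name ssim_db is made up.

```python
import math

def ssim_db(ssim):
    """Map an SSIM value in [0, 1) to a dB-style scale where bigger is better."""
    return -10.0 * math.log10(1.0 - ssim)

normal = 0.9752356  # Normal 0.46
qns = 0.9758788     # w/QNS 0.46

gain_db = ssim_db(qns) - ssim_db(normal)
error_reduction = 1.0 - (1.0 - qns) / (1.0 - normal)
print(f"{gain_db:.3f} dB, {error_reduction:.1%} less (1 - SSIM)")
```

That works out to roughly 0.11 dB, or about 2.6% less (1 - SSIM), at the same filesize.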

So if I disable variance weighting, will this likely go up or down with VAQ? :)

Dark Shikari

5th April 2008, 21:06

So if I disable variance weighting, will this likely go up or down with VAQ? :)

Try and see ;)

lexor

5th April 2008, 21:35

One trick is to also compare frame sizes: if two frames are quite different sizes, it's probably not a valid comparison.

With P/B frames, one also has to look at the size of the last I-frame and recent frames, too.

It's a bad thing to try to manually find specific frames to compare by an arbitrary criterion like you describe. Just because frames found by that criterion may be better with AQ, the price you pay for them on other frames may be too high. It is the overall quality that we want to see; having a bunch of really good frames counts for little if the rest of the frames tank.

Random selection is as close as we can get to fairness (with large enough sample pool, it will actually achieve fairness), due to natural balancing properties of random selection.

fields_g

6th April 2008, 14:24

No no, I'm talking about comparing different settings using VAQ2 exclusively. Oh well, I'll stick to frames + SSIM and that's it. :)

I've been seeing messages about using PSNR finally disappearing. In a non-AQ world, SSIM has much more meaning than now with VAQ. This is because VAQ, in some cases, will lower SSIM to achieve better subjective quality. This is because SSIM is not a perfect match to the average user's perception (although better than PSNR).

With newer VAQ moving bits between frames, select frames can look worse while a segment of encoded time is better perceptually.

My coding abilities are nowhere close to many here so I'm not one for implementation, but I think I see the need to complete another tool before we further try to tweak VAQ. Do you guys also see a need for some comparison tool (player)?

Load 2 (or more) encoded files, have a single time slider, blind the user of which is which, Side by side and/or toggled playback, user rating, etc. Have you guys seen what tools are available for audio abx? They even have blinded submissions so results can be combined with other people's results!
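
The blinding part of such a tool is the easy bit and could look something like the sketch below. The function name and the file names are hypothetical; playback, the shared time slider and rating collection are left out entirely.

```python
import random

def blind_labels(files, seed=None):
    """Map neutral labels (A, B, ...) to files in a random order the viewer does not see."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    return {chr(ord("A") + i): name for i, name in enumerate(shuffled)}

# Hypothetical file names; the mapping is revealed only after ratings are collected
key = blind_labels(["vaq_0.5.mkv", "vaq_1.0.mkv"])
```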

We can successfully make crude tweaks to psy, but when we start smaller shifts, we do really need something better.

IgorC

6th April 2008, 18:23

I totally agree with you.
Public ABX is the optimal way to go when testing psychovisual enhancements.
There was an application, MSU ABX, for video.

lexor

6th April 2008, 20:55

I totally agree with you.
Public ABX is the optimal way to go when testing psychovisual enhancements.
There was an application, MSU ABX, for video.

Actually I don't think this should be done as a group. Anything but personal tests are not very useful in this case. If you look at the audio ABX tests, they ask one question "can you hear a difference between 2 files?". Testing AQ doesn't just ask the question of "can you see a difference?", it also has to ask "is the difference better?". And "better" is not just a question of more or less noise (as it is in audio abx between multiple formats), it's too much of a personal preference in AQ's case.

I remember when DS first started working on his AQ he asked us to rate a bunch of pics (while fine tuning some params) and the thing that looked best to me was a sharper detailed picture, but others voted for blurrier ones, because that's more DVD like (or so they said).

While ABX will detect difference reliably, in this case it can't really rate quality effectively.

fields_g

7th April 2008, 02:14

lexor,
I see what you are saying about abx about perceivable differences. I think you are right, however if it is tied to using high/low anchors and a rating system, meaningful "x" is better than "y" can be determined. Take for example the 64 kbps encoding comparison here (http://www.listening-tests.info/mf-64-1/results.htm). I think your concern also connects to the important note found on the page:
Important note: These plots represent group preferences (for the particular group of people who participated in the test). Individual preferences vary somewhat. The best codec for a person is dependent on his own preferences and the type of music he prefers.
It is true that the rating is based on the participants, and not all participants may agree. That is where the averages and confidence intervals come into play.
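
As a rough illustration of how averages and confidence intervals would be used here, the sketch below computes a mean rating and an approximate 95% interval for two encodes. The ratings are made up, the function name is hypothetical, and the 1.96 multiplier is a normal approximation; with this few viewers a Student's t multiplier would give somewhat wider intervals.

```python
import math
import statistics

def mean_with_ci(ratings, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    m = statistics.mean(ratings)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))  # standard error of the mean
    return m, z * sem

# Made-up 1-5 ratings from eight hypothetical viewers, for illustration only
encode_x = [4, 5, 4, 4, 3, 5, 4, 4]
encode_y = [2, 3, 3, 2, 3, 3, 2, 3]

mx, hx = mean_with_ci(encode_x)
my, hy = mean_with_ci(encode_y)

# Rough check: the group preference is convincing only if the intervals do not overlap
separated = (mx - hx) > (my + hy)
```

If the intervals overlap, the group has not really expressed a preference and more viewers (or a bigger quality difference) would be needed.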

But your concern that this should be done independently because personal preference varies, I believe, is wrong. Psy is all about compromises. We strive to have the compromises have as little visual impact on the end product for most people. Opinions will differ at points and what wins out should be what is most pleasing for the majority. Therefore it should be tested as a group, with group data backing our decisions.

I'm not saying that only one version of psy will be committed. Eventually I hope we develop multiple psy models, for example anime, noise, darkness, bitrate, etc and interactions between these. Hopefully scene detection will come into the mix and an auto detect setting would choose the correct model for that scene. We are not looking at psy this way yet, and may not ever. We are trying to find broad psy that works on most for most.

We will not be able to commit every conceived psy into x264 and at some point we will need to trim to the most useful. Personal taste needs to be set aside for group tastes at these times. This is where I believe a comparison tool fits.

I'm not trying to sound overly egalitarian, but it has its place when judging subjectively.

Oh yea... acceptance is also based on if x264 authors are willing to maintain the psy code also.

MythCreator

10th April 2008, 11:30

x264.815.modified.experimental.exe (http://www.fs2you.com/files/3d178ab5-06e9-11dd-a2d4-00142218fc6e/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919
x264_fix_win_stdin.diff
http://forum.doom9.org/showthread.php?p=1120065#post1120065

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled in GCC 4.4.0 20080331 experimental; strictly for experimentation and testing.

bob0r

10th April 2008, 11:49

x264.816.modified.exe (http://files.x264.nl/x264.816.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

MythCreator

10th April 2008, 12:05

x264.816.modified.experimental.exe (http://www.fs2you.com/files/5cad8a23-06ee-11dd-8be7-0014221b798a/)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919
x264_fix_win_stdin.diff
http://forum.doom9.org/showthread.php?p=1120065#post1120065

Link to x264 patches collected: http://files.x264.nl/x264_patches/

Built with make fprofiled in GCC 4.4.0 20080331 experimental; strictly for experimentation and testing.

Atak_Snajpera

10th April 2008, 21:36

I cannot find --aq-mode in longhelp :(

MuLTiTaSK

10th April 2008, 22:34

I cannot find --aq-mode in longhelp :(

x264 core:59 r816M 761630d
Syntax: x264 [options] -o outfile infile [widthxheight]

Infile can be raw YUV 4:2:0 (in which case resolution is required),
or YUV4MPEG 4:2:0 (*.y4m),
or AVI or Avisynth if compiled with AVIS support (yes).
Outfile type is selected by filename:
.264 -> Raw bytestream
.mkv -> Matroska
.mp4 -> MP4 if compiled with GPAC support (yes)

Options:

-h, --help List the more commonly used options
--longhelp List all options

Frame-type options:

-I, --keyint <integer> Maximum GOP size [250]
-i, --min-keyint <integer> Minimum GOP size [25]
--scenecut <integer> How aggressively to insert extra I-frames [40]
--pre-scenecut Faster, less precise scenecut detection.
Required and implied by multi-threading.
-b, --bframes <integer> Number of B-frames between I and P [0]
--no-b-adapt Disable adaptive B-frame decision
--b-bias <integer> Influences how often B-frames are used [0]
--b-pyramid Keep some B-frames as references
--no-cabac Disable CABAC
-r, --ref <integer> Number of reference frames [1]
--no-deblock Disable loop filter
-f, --deblock <alpha:beta> Loop filter AlphaC0 and Beta parameters [0:0]
--interlaced Enable pure-interlaced mode (tff)
--tff Alias for --interlaced
--bff Enable pure-interlaced mode (bff)

Ratecontrol:

-q, --qp <integer> Set QP (0=lossless) [26]
-B, --bitrate <integer> Set bitrate (kbit/s)
--crf <float> Quality-based VBR (nominal QP)
--vbv-maxrate <integer> Max local bitrate (kbit/s) [0]
--vbv-bufsize <integer> Enable CBR and set size of the VBV buffer (kbit) [0]
--vbv-init <float> Initial VBV buffer occupancy [0.9]
--qpmin <integer> Set min QP [10]
--qpmax <integer> Set max QP [51]
--qpstep <integer> Set max QP step [4]
--ratetol <float> Allowed variance of average bitrate [1.0]
--ipratio <float> QP factor between I and P [1.40]
--pbratio <float> QP factor between P and B [1.30]
--chroma-qp-offset <integer> QP difference between chroma and luma [0]
--aq-mode <integer> How AQ distributes bits [2]
- 0: Disabled
- 1: Avoid moving bits between frames
- 2: Move bits between frames
--aq-strength <float> Reduces blocking and blurring in flat and
textured areas. [1.0]
- 0.5: weak AQ
- 1.5: strong AQ

-p, --pass <1|2|3> Enable multipass ratecontrol
- 1: First pass, creates stats file
- 2: Last pass, does not overwrite stats file
- 3: Nth pass, overwrites stats file
--stats <string> Filename for 2 pass stats ["x264_2pass.log"]
--rceq <string> Ratecontrol equation ["blurCplx^(1-qComp)"]
--qcomp <float> QP curve compression: 0.0 => CBR, 1.0 => CQP [0.60]
--cplxblur <float> Reduce fluctuations in QP (before curve compression) [20.0]
--qblur <float> Reduce fluctuations in QP (after curve compression) [0.5]
--zones <zone0>/<zone1>/... Tweak the bitrate of some regions of the video
Each zone is of the form
<start frame>,<end frame>,<option>
where <option> is either
q=<integer> (force QP)
or b=<float> (bitrate multiplier)
--qpfile <string> Force frametypes and QPs

Analysis:

-A, --partitions <string> Partitions to consider ["p8x8,b8x8,i8x8,i4x4"]
- p8x8, p4x4, b8x8, i8x8, i4x4
- none, all
(p4x4 requires p8x8. i8x8 requires --8x8dct.)
--direct <string> Direct MV prediction mode ["spatial"]
- none, spatial, temporal, auto
--direct-8x8 <-1|0|1> Direct prediction size [-1]
- 0: 4x4
- 1: 8x8
- -1: smallest possible according to level
-w, --weightb Weighted prediction for B-frames
--me <string> Integer pixel motion estimation method ["hex"]
- dia: diamond search, radius 1 (fast)
- hex: hexagonal search, radius 2
- umh: uneven multi-hexagon search
- esa: exhaustive search
- tesa: hadamard exhaustive search (slow)
--merange <integer> Maximum motion vector search range [16]
--mvrange <integer> Maximum motion vector length [-1 (auto)]
--mvrange-thread <int> Minimum buffer between threads [-1 (auto)]
-m, --subme <integer> Subpixel motion estimation and partition
decision quality: 1=fast, 7=best. [5]
--me-prepass Run an ME prepass on predictors. Requires subme 3 or higher.
--b-rdo RD based mode decision for B-frames. Requires subme 6 or higher.
--mixed-refs Decide references on a per partition basis
--no-chroma-me Ignore chroma in motion estimation
--bime Jointly optimize both MVs in B-frames
-8, --8x8dct Adaptive spatial transform size
-t, --trellis <integer> Trellis RD quantization. Requires CABAC. [0]
- 0: disabled
- 1: enabled only on the final encode of a MB
- 2: enabled on all mode decisions
--no-fast-pskip Disables early SKIP detection on P-frames
--no-dct-decimate Disables coefficient thresholding on P-frames
--nr <integer> Noise reduction [0]

--deadzone-inter <int> Set the size of the inter luma quantization deadzone [21]
--deadzone-intra <int> Set the size of the intra luma quantization deadzone [11]
Deadzones should be in the range 0 - 32.
--cqm <string> Preset quant matrices ["flat"]
- jvt, flat
--cqmfile <string> Read custom quant matrices from a JM-compatible file
Overrides any other --cqm* options.
--cqm4 <list> Set all 4x4 quant matrices
Takes a comma-separated list of 16 integers.
--cqm8 <list> Set all 8x8 quant matrices
Takes a comma-separated list of 64 integers.
--cqm4i, --cqm4p, --cqm8i, --cqm8p
Set both luma and chroma quant matrices
--cqm4iy, --cqm4ic, --cqm4py, --cqm4pc
Set individual quant matrices

Video Usability Info (Annex E):
The VUI settings are not used by the encoder but are merely suggestions to
the playback equipment. See doc/vui.txt for details. Use at your own risk.

--overscan <string> Specify crop overscan setting ["undef"]
- undef, show, crop
--videoformat <string> Specify video format ["undef"]
- component, pal, ntsc, secam, mac, undef
--fullrange <string> Specify full range samples setting ["off"]
- off, on
--colorprim <string> Specify color primaries ["undef"]
- undef, bt709, bt470m, bt470bg
smpte170m, smpte240m, film
--transfer <string> Specify transfer characteristics ["undef"]
- undef, bt709, bt470m, bt470bg, linear,
log100, log316, smpte170m, smpte240m
--colormatrix <string> Specify color matrix setting ["undef"]
- undef, bt709, fcc, bt470bg
smpte170m, smpte240m, GBR, YCgCo
--chromaloc <integer> Specify chroma sample location (0 to 5) [0]

Input/Output:

-o, --output Specify output file
--sar width:height Specify Sample Aspect Ratio
--fps <float|rational> Specify framerate
--seek <integer> First frame to encode
--frames <integer> Maximum number of frames to encode
--level <string> Specify level (as defined by Annex A)

-v, --verbose Print stats for each frame
--progress Show a progress indicator while encoding
--quiet Quiet Mode
--no-psnr Disable PSNR computation
--no-ssim Disable SSIM computation
--threads <integer> Parallel encoding
--thread-input Run Avisynth in its own thread
--non-deterministic Slightly improve quality of SMP, at the cost of repeatability
--no-asm Disable all CPU optimizations
--visualize Show MB types overlayed on the encoded video
--sps-id <integer> Set SPS and PPS id numbers [0]
--aud Use access unit delimiters
--nal-hrd Use NAL HRD parameters
--pulldown <integer> Use 3:2 pulldown
- 32: TBT,BT,BTB,BT pattern
- 64: triple,double *recommended for 720p

Atak_Snajpera

10th April 2008, 22:48

I need stronger glasses :)

bob0r

10th April 2008, 22:49

I need stronger glasses :)

Or put them on (avatar) :rolleyes:

bob0r

12th April 2008, 12:03

x264.818.modified.exe (http://files.x264.nl/x264.818.modified.exe)

General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

Link to x264 patches collected: http://files.x264.nl/x264_patches/

SpAwN_gUy

14th April 2008, 15:03

can i post this here?

ok. i've found a way to build x264 in MSVC 2005 ... and even with some patches..
but..

i'm having strange compiler errors about "missing ';' before 'type'" when x264.gaussian.cplxblur.01.diff is applied (just before the first declaration of "double gaussian_weight").. is it fixable?

and i'm a bit confused.. how to determine REVISION, when having access only to daily tar-balls?

is there a way to configure git to use proxy?

is there any simple how-to for applying patches? (git-merge applied only x264_hrd_pulldown.04_interlace.diff ... and even then not to ALL files...)

upd. can anyone test this?
revision... em.. from 20080410. modified.
GPAC, pthreads - enabled :)

applied patches:
x264_me-prepass_DeathTheSheep.01.diff
x264_2pass_vbv.7.diff
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing

removed...

SpAwN_gUy

15th April 2008, 15:14

ok.. i've made another one... this one is a bit MORE proper :) .. the previous one had constant glitches on some scenes of my testencode... this one went just fine... if anyone interested...

BTW.. i've changed the first patch a bit... 'cause under MSVC it still gave me
"error C2143: syntax error : missing ';' before 'type' d:\CVS\x264farm-GUI\x264\build\x264\encoder\ratecontrol.c line: 1757"

so i had to make it like this:
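if(weight < .0001){
    break;
}
else {
    /* MSVC 2005 compiles .c files as C89, where declarations must come
       before any statement in a block, hence the extra else { } here */
    double gaussian_weight = weight * exp(-j*j/200.0);
    weight_sum += gaussian_weight;
    cplx_sum += gaussian_weight * (qscale2bits(rcj, 1) - rcj->misc_bits);
}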

x264 rev.819, modified, MSVC2005 build
General thread:
http://forum.doom9.org/showthread.php?t=130364

x264.gaussian.cplxblur.01.diff
Dark Shikari: - gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
x264_me-prepass_DeathTheSheep.01.diff
http://forum.doom9.org/showthread.php?p=1093523
x264_2pass_vbv.7.diff
http://thread.gmane.org/gmane.comp.video.x264.devel/3093/focus=3748
x264_hrd_pulldown.04_interlace.diff
- HRD and pulldown for HD compatibility, updated patch for interlacing
http://forum.doom9.org/showthread.php?p=1047919#post1047919

grab it here (http://rapidshare.com/files/107703602/x264.819.msvc2005.modified.exe)

buzzqw

15th April 2008, 15:37

@ALL

please apply the x264_fix_win_stdin.diff too!

BHH