Current Patches, Where to get them, How they affect speed/output - Page 3

Terranigma · 2nd October 2007, 01:04

Quote:

Originally Posted by Dark Shikari

Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.

Good enough for me, and it's faster to boot.

DeathTheSheep · 2nd October 2007, 07:18

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC.

But anyone who uses merange 4 is truly insane, and for a different reason.

Dark Shikari · 2nd October 2007, 07:36

Quote:

Originally Posted by DeathTheSheep

They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC.

But anyone who uses merange 4 is truly insane, and for a different reason.

That would be because the one in the original post is the old ME patch, which still hasn't been updated.

DeathTheSheep · 2nd October 2007, 07:40

Quote:

Originally Posted by morph166955

Ok I just updated the first post a little, needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff on the new ME_Prepass that you posted the code for above as well as making a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diff's as well as Cef's for people who want them.

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now...

Dark Shikari · 2nd October 2007, 08:20

Quote:

Originally Posted by DeathTheSheep

Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now...

The numbers in the patch look quite different from those in the diff I posted.

fields_g · 2nd October 2007, 16:47

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!

Dark Shikari · 2nd October 2007, 16:58

Quote:

Originally Posted by fields_g

Would it be possible to have a revision number line commented into the diff file, or is that against file syntax? I'd love to be able to say "compare version xxx with yyy"!

SVN diff does this.

fields_g · 2nd October 2007, 17:29

Quote:

Originally Posted by Dark Shikari

SVN diff does this.

Great! Trying to identify a patch revision as "the one found in post xxx" or for example "the original AQ" vs. "Dark Shikari's old AQ" vs. "Dark Shikari's new AQ" is a bit complicated/limited if there is more than a couple variations. (Don't you love that we have bright people here developing new things to try?)

This will help people who are making builds explicitly describe what is in their builds also!
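For anyone labelling builds: the svn-style diffs quoted in this thread carry the base revision in their header lines, e.g. "--- encoder/me.c (revision 676)". A minimal sketch of pulling that number out follows. parse_revision is a hypothetical helper name for illustration, not part of svn or patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the revision number from an svn-diff header line such as
 * "--- encoder/me.c   (revision 676)".
 * Returns -1 when the line carries no revision, e.g. "(working copy)".
 * parse_revision is a hypothetical helper, not an svn or patch API. */
static int parse_revision(const char *line)
{
    const char *tag;
    int rev;

    assert(line != NULL);
    tag = strstr(line, "(revision ");
    if (tag == NULL || sscanf(tag, "(revision %d)", &rev) != 1)
        return -1;
    return rev;
}
```

Fed the "---" line of a patch, this returns the revision the diff was made against; the "+++" line normally says "(working copy)" and yields -1.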

burfadel · 2nd October 2007, 17:50

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4, 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass command!

fields_g · 2nd October 2007, 18:19

Quote:

Originally Posted by burfadel

I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4, 5. Realistically, if people choose subme mode 7 they're aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass command!

Interesting... I'm not sure, but either you are suggesting a new approach or are mixing two different (though related) things together.

1) There is a ME type: Dia, Hex, UMH, ESA
2) There is a subpixel refinement of 1-7

Earlier discussion questioned making prepass dependent on ME type (if ESA then ON, else OFF), not on subpixel refinement. I'll let someone else comment on how wise it would be to connect prepass to subpixel refinement.
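To make the two ideas concrete, here is a rough sketch of each rule as a small C function. Every name here is hypothetical and none of it comes from the x264 source. Keeping prepass off below subme 3 is an assumption, based only on the me.c excerpt posted later in the thread, where the prepass branch sits inside the i_subpel_refine >= 3 block.

```c
#include <assert.h>

/* Hypothetical names throughout; this is not x264 code. */
enum me_method { ME_DIA, ME_HEX, ME_UMH, ME_ESA };

/* Rule A, the earlier suggestion: tie prepass to the search type. */
static int prepass_by_me_type(enum me_method me)
{
    return me == ME_ESA;
}

/* Rule B, burfadel's suggestion: tie prepass to subpixel refinement.
 * Levels 6 and 7 force it on, levels 3 to 5 honour the user's flag,
 * and anything lower leaves it off (assumption, see the note above). */
static int prepass_by_subme(int subme, int user_flag)
{
    assert(subme >= 1 && subme <= 7);
    if (subme >= 6)
        return 1;
    if (subme >= 3)
        return user_flag != 0;
    return 0;
}
```

The two rules key off independent settings, so a combination like UMH with subme 7 gets prepass under rule B but not under rule A.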

burfadel · 2nd October 2007, 19:01

I didn't mean to connect it in that sense.

It could also be suggested to have --me-prepass enabled when UMH mode is selected (no point for ESA, I believe?), just as a matter of principle, since mode 7 or UMH are usually selected for quality.

Terranigma · 2nd October 2007, 22:37

Quote:

Originally Posted by burfadel

I still think --me-prepass should be added as a default option.

Yes, I agree. Aku, any chance we'll ever see this in the svn? I could care less now about imh, but prepass, otoh, is pretty useful with esa as you've shown from your graphical comparisons.

DeathTheSheep · 3rd October 2007, 02:36

How the heck do you apply this diff to the source? What program?!

I always get crap like this every time I apply these patches:

Code:

$ patch -u -p1  < subme7.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c       (revision 676)
|+++ encoder/me.c       (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 28 (offset 1 line).
Hunk #2 succeeded at 853 (offset 51 lines).
patch unexpectedly ends in middle of line
Hunk #3 FAILED at 912.
1 out of 3 hunks FAILED -- saving rejects to encoder/me.c.rej

Then I manually patch. And that's just for subme.. Take a look at prepass:

Code:

$ patch -u -p1  < me-prepass.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: common/common.c
|===================================================================
|--- common/common.c    (revision 675)
|+++ common/common.c    (working copy)
--------------------------
File to patch: common/common.c
patching file `common/common.c'
Hunk #1 succeeded at 444 (offset 3 lines).
Hunk #2 succeeded at 882 with fuzz 2 (offset 1 line).
can't find file to patch at input line 26
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c    (revision 675)
|+++ encoder/me.c    (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 65 (offset 4 lines).
patch: **** malformed patch at line 142: +                }

And nothing happens at all. No .rej is created to manually patch off of like for subme, so I have to go through line by line and type it all in by hand.

I think I'm going to memorize this algorithm by heart by the time I'm through with these darn patch problems!

Again, how do you guys do it?! I'm in msys 1.0 using the standard $ patch program... the settings I used are shown above...
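One thing that may explain the "can't find file to patch" prompts above: the diff headers name paths like encoder/me.c, and patch's -pN option strips the smallest prefix containing N slashes. A minimal sketch of that trimming follows. strip_components is a hypothetical helper, and it simplifies the real behaviour (it returns NULL when there are too few slashes and does not merge repeated slashes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mimic how patch's -pN option trims a file name from a diff header:
 * drop the shortest prefix that contains N slashes.
 * Returns NULL when the path has fewer than N slashes (a choice made
 * for this sketch, not necessarily what patch does).
 * strip_components is a hypothetical helper for illustration only. */
static const char *strip_components(const char *path, int n)
{
    assert(path != NULL && n >= 0);
    while (n-- > 0) {
        const char *slash = strchr(path, '/');
        if (slash == NULL)
            return NULL;
        path = slash + 1;
    }
    return path;
}
```

With -p1, "encoder/me.c" becomes "me.c", which does not exist at the top of the source tree, so patch has to ask. With -p0 from the top of the tree the name stays "encoder/me.c". This only covers the file lookup; the "malformed patch" and "unexpectedly ends in middle of line" messages have some other cause.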

DeathTheSheep · 3rd October 2007, 03:14

And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone?

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?!

Dark Shikari · 3rd October 2007, 03:20

Quote:

Originally Posted by DeathTheSheep

And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone?

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?!

Hpel is in the original code, so it doesn't need to be defined again

DeathTheSheep · 3rd October 2007, 03:36

True, but in the context of your patch, it jumps right from cost_mv (which is already defined) to cost_mv_hpel2. Meaning when I insert it over the context (first and last lines in your patch), it is no longer in the original source!

Meaning, of course, when I Ctrl+A to select all the code for each code block in the patch and paste that block into the .c, I simply start from the first context line and overwrite everything in the original until the last context line, meaning everything in between is overwritten with the lines of the patch.

I then go in and delete all the little "+" signs next to the added lines and manually remove all the lines marked with "-." I know hpel is not "removed" as in marked with the "-", but if you look at your patch's context lines...
So you can understand why the patch threw me off.

Code:

Index: encoder/me.c
===================================================================
--- encoder/me.c    (revision 675)
+++ encoder/me.c    (working copy)
@@ -61,6 +61,23 @@
     COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \ (this is the first line I overwrote, extending to the end...)
 }
 
<but hpel was in here, so it disappeared when I overwrote it with this patch, since it's obviously not here now!>

+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+    int stride = 16; \
+    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+    cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}
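For reading hunks like the one above by hand: the "@@ -61,6 +61,23 @@" header gives the start line and length of the hunk in the old file, then in the new file. Lines starting with a space are unchanged context that exists in both files, and only "+" lines are inserted and "-" lines removed, so nothing between the context lines gets overwritten. A minimal sketch of decoding the header follows; struct hunk, parse_hunk and hunk_net_added are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of a unified-diff hunk header "@@ -a,b +c,d @@". */
struct hunk {
    int old_start, old_len;   /* span in the original file */
    int new_start, new_len;   /* span in the patched file  */
};

/* Returns 1 on success, 0 if the line is not a full hunk header.
 * Headers that omit the ",len" part are not handled in this sketch. */
static int parse_hunk(const char *line, struct hunk *h)
{
    assert(line != NULL && h != NULL);
    return sscanf(line, "@@ -%d,%d +%d,%d @@",
                  &h->old_start, &h->old_len,
                  &h->new_start, &h->new_len) == 4;
}

/* Net number of lines the hunk adds (negative if it removes more). */
static int hunk_net_added(const struct hunk *h)
{
    return h->new_len - h->old_len;
}
```

For the header quoted above, six lines of the old file starting at line 61 correspond to 23 lines of the new file, a net gain of 17 lines.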

Dark Shikari · 3rd October 2007, 03:53

Sorry if my diffing skills are nonexistent

DeathTheSheep · 3rd October 2007, 03:59

Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way.

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW?

Tested with esa, of course... Satd.

[edit] Yes, as I suspected, there is something hideously wrong here. Even without any prepass at all, the output differs drastically from an old build without it. Yeah, some patched sources would help like crazy. XD

Dark Shikari · 3rd October 2007, 04:50

Quote:

Originally Posted by DeathTheSheep

Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way.

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW?

Tested with esa, of course... Satd.

[edit] Yes, as I suspected, there is something hideously wrong here. Even without any prepass at all, the output differs drastically from an old build without it. Yeah, some patched sources would help like crazy. XD

Here is the beginning of my source up to the start of ME-DIA and such:

Code:

/*****************************************************************************
 * me.c: h264 encoder library (Motion Estimation)
 *****************************************************************************
 * Copyright (C) 2003 Laurent Aimar
 * $Id: me.c,v 1.1 2004/06/03 19:27:08 fenrir Exp $
 *
 * Authors: Laurent Aimar <fenrir@via.ecp.fr>
 *          Loren Merritt <lorenm@u.washington.edu>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111, USA.
 *****************************************************************************/

#include "common/common.h"
#include "me.h"
#include <limits.h>

/* presets selected from good points on the speed-vs-quality curve of several test videos
 * subpel_iters[i_subpel_refine] = { refine_hpel, refine_qpel, me_hpel, me_qpel }
 * where me_* are the number of EPZS iterations run on all candidate block types,
 * and refine_* are run only on the winner. */
 //The --subme 7 values are much higher because since they get the motion search
 //closer to the optimal value, they actually tend to save time in the more intensive
 //RD search that follows.
static const int subpel_iterations[][4] = 
   {{1,0,0,0},
    {1,1,0,0},
    {0,1,1,0},
    {0,2,1,0},
    {0,2,1,1},
    {0,2,1,2},
    {0,0,2,2},
    {0,0,4,10}};

static void refine_subpel( x264_t *h, x264_me_t *m, int hpel_iters, int qpel_iters, int *p_halfpel_thresh, int b_refine_qpel );

#define BITS_MVD( mx, my )\
    (p_cost_mvx[(mx)<<2] + p_cost_mvy[(my)<<2])

#define COST_MV( mx, my )\
{\
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE,\
                   &p_fref[(my)*m->i_stride[0]+(mx)], m->i_stride[0] )\
             + BITS_MVD(mx,my);\
    COPY3_IF_LT( bcost, cost, bmx, mx, bmy, my );\
}

#define COST_MV_HPEL( mx, my ) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
    COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
}

#define COST_MV_HPEL2( mx, my, cost ) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
}

#define COST_MV_HPEL3( mx, my) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
    COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
}

#define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
{\
    uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
    h->pixf.fpelcmp_x3[i_pixel]( m->p_fenc[0],\
        pix_base + (m0x) + (m0y)*m->i_stride[0],\
        pix_base + (m1x) + (m1y)*m->i_stride[0],\
        pix_base + (m2x) + (m2y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    (costs)[0] += BITS_MVD( bmx+(m0x), bmy+(m0y) );\
    (costs)[1] += BITS_MVD( bmx+(m1x), bmy+(m1y) );\
    (costs)[2] += BITS_MVD( bmx+(m2x), bmy+(m2y) );\
}

#define COST_MV_X4( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
    uint8_t *pix_base = p_fref + omx + omy*m->i_stride[0];\
    h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
        pix_base + (m0x) + (m0y)*m->i_stride[0],\
        pix_base + (m1x) + (m1y)*m->i_stride[0],\
        pix_base + (m2x) + (m2y)*m->i_stride[0],\
        pix_base + (m3x) + (m3y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    costs[0] += BITS_MVD( omx+(m0x), omy+(m0y) );\
    costs[1] += BITS_MVD( omx+(m1x), omy+(m1y) );\
    costs[2] += BITS_MVD( omx+(m2x), omy+(m2y) );\
    costs[3] += BITS_MVD( omx+(m3x), omy+(m3y) );\
    COPY3_IF_LT( bcost, costs[0], bmx, omx+(m0x), bmy, omy+(m0y) );\
    COPY3_IF_LT( bcost, costs[1], bmx, omx+(m1x), bmy, omy+(m1y) );\
    COPY3_IF_LT( bcost, costs[2], bmx, omx+(m2x), bmy, omy+(m2y) );\
    COPY3_IF_LT( bcost, costs[3], bmx, omx+(m3x), bmy, omy+(m3y) );\
}

#define COST_MV_X4_ABS( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
    h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
        p_fref + (m0x) + (m0y)*m->i_stride[0],\
        p_fref + (m1x) + (m1y)*m->i_stride[0],\
        p_fref + (m2x) + (m2y)*m->i_stride[0],\
        p_fref + (m3x) + (m3y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    costs[0] += p_cost_mvx[m0x<<2]; /* no cost_mvy */\
    costs[1] += p_cost_mvx[m1x<<2];\
    costs[2] += p_cost_mvx[m2x<<2];\
    costs[3] += p_cost_mvx[m3x<<2];\
    COPY3_IF_LT( bcost, costs[0], bmx, m0x, bmy, m0y );\
    COPY3_IF_LT( bcost, costs[1], bmx, m1x, bmy, m1y );\
    COPY3_IF_LT( bcost, costs[2], bmx, m2x, bmy, m2y );\
    COPY3_IF_LT( bcost, costs[3], bmx, m3x, bmy, m3y );\
}

/*  1  */
/* 101 */
/*  1  */
#define DIA1_ITER( mx, my )\
{\
    omx = mx; omy = my;\
    COST_MV_X4( 0,-1, 0,1, -1,0, 1,0 );\
}

#define DIA2_ITER( mx, my )\
{\
    omx = mx; omy = my;\
    COST_MV_X4( 0,-2, 0,2, -2,0, 2,0 );\
}

#define CROSS( start, x_max, y_max )\
{\
    i = start;\
    if( x_max <= X264_MIN(mv_x_max-omx, omx-mv_x_min) )\
        for( ; i < x_max-2; i+=4 )\
            COST_MV_X4( i,0, -i,0, i+2,0, -i-2,0 );\
    for( ; i < x_max; i+=2 )\
    {\
        if( omx+i <= mv_x_max )\
            COST_MV( omx+i, omy );\
        if( omx-i >= mv_x_min )\
            COST_MV( omx-i, omy );\
    }\
    i = start;\
    if( y_max <= X264_MIN(mv_y_max-omy, omy-mv_y_min) )\
        for( ; i < y_max-2; i+=4 )\
            COST_MV_X4( 0,i, 0,-i, 0,i+2, 0,-i-2 );\
    for( ; i < y_max; i+=2 )\
    {\
        if( omy+i <= mv_y_max )\
            COST_MV( omx, omy+i );\
        if( omy-i >= mv_y_min )\
            COST_MV( omx, omy-i );\
    }\
}

#define ME_HEX(X,Y,range)\
{\
	static const int mod6[8] = {5,0,1,2,3,4,5,0};\
	bmx = X;\
	bmy = Y;\
	dir = -2;\
	COST_MV_X3_DIR( -2,0, -1, 2,  1, 2, costs   );\
	COST_MV_X3_DIR(  2,0,  1,-2, -1,-2, costs+3 );\
	COPY2_IF_LT( bcost, costs[0], dir, 0 );\
	COPY2_IF_LT( bcost, costs[1], dir, 1 );\
	COPY2_IF_LT( bcost, costs[2], dir, 2 );\
	COPY2_IF_LT( bcost, costs[3], dir, 3 );\
	COPY2_IF_LT( bcost, costs[4], dir, 4 );\
	COPY2_IF_LT( bcost, costs[5], dir, 5 );\
	if( dir != -2 )	{\
		static const int hex2[8][2] = {{-1,-2}, {-2,0}, {-1,2}, {1,2}, {2,0}, {1,-2}, {-1,-2}, {-2,0}};\
		bmx += hex2[dir+1][0];\
		bmy += hex2[dir+1][1];\
		for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
		{\
			const int odir = mod6[dir+1];\
			COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
							hex2[odir+1][0], hex2[odir+1][1],\
							hex2[odir+2][0], hex2[odir+2][1],\
							costs );\
			dir = -2;\
			COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
			COPY2_IF_LT( bcost, costs[1], dir, odir   );\
			COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
			if( dir == -2 ) break;\
			bmx += hex2[dir+1][0];\
			bmy += hex2[dir+1][1];}\
		if(dir == -2 || bcost > bestCost) {}\
		else{\
			for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
			{\
				const int odir = mod6[dir+1];\
				COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
								hex2[odir+1][0], hex2[odir+1][1],\
								hex2[odir+2][0], hex2[odir+2][1],\
								costs );\
				dir = -2;\
				COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
				COPY2_IF_LT( bcost, costs[1], dir, odir   );\
				COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
				if( dir == -2 ) break;\
				bmx += hex2[dir+1][0];\
				bmy += hex2[dir+1][1];}}}\
	omx = bmx; omy = bmy;\
	COST_MV_X4(  0,-1,  0,1, -1,0, 1,0 );\
	COST_MV_X4( -1,-1, -1,1, 1,-1, 1,1 );\
}\

void x264_me_search_ref( x264_t *h, x264_me_t *m, int (*mvc)[2], int i_mvc, int *p_halfpel_thresh )
{
    int cost;
    const int bw = x264_pixel_size[m->i_pixel].w;
    const int bh = x264_pixel_size[m->i_pixel].h;
    const int i_pixel = m->i_pixel;
    int i_me_range = h->param.analyse.i_me_range;
    int bmx, bmy, bcost;
    int bpred_mx = 0, bpred_my = 0, bpred_cost = COST_MAX;
    int omx, omy, pmx, pmy;
    uint8_t *p_fref = m->p_fref[0];
    DECLARE_ALIGNED( uint8_t, pix[16*16], 16 );
    
    int i, j;
    int dir;
    int costs[6];

    int mv_x_min = h->mb.mv_min_fpel[0];
    int mv_y_min = h->mb.mv_min_fpel[1];
    int mv_x_max = h->mb.mv_max_fpel[0];
    int mv_y_max = h->mb.mv_max_fpel[1];
    int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
    int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
    int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
    int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )

    const int16_t *p_cost_mvx = m->p_cost_mv - m->mvp[0];
    const int16_t *p_cost_mvy = m->p_cost_mv - m->mvp[1];

    bmx = x264_clip3( m->mvp[0], mv_x_min*4, mv_x_max*4 );
    bmy = x264_clip3( m->mvp[1], mv_y_min*4, mv_y_max*4 );
    pmx = ( bmx + 2 ) >> 2;
    pmy = ( bmy + 2 ) >> 2;
    bcost = COST_MAX;
    
    /* try extra predictors if provided */
    if( h->mb.i_subpel_refine >= 3 )
    {
        COST_MV_HPEL( bmx, bmy );
        if(!h->param.analyse.i_me_prepass)
        {
            for( i = 0; i < i_mvc; i++ )
            {
                 const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
                 const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
                 if( mx != bpred_mx || my != bpred_my )
                     COST_MV_HPEL( mx, my );
            }
        }
        else
        {
            for( i = 0; i < i_mvc; i++ )
            {
                const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
                const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
                int doSearch = 1;
                int j;
                for(j = 0; j < i; j++)
                {
                    if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
                }
                if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
                {
                    int bestcost;
                    int bestx = mx;
                    int besty = my;
                    COST_MV_HPEL2( mx, my, bestcost );
                    COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
                    if(bestcost < 2*bpred_cost)
                    {
                        int n;
                        int dir = -2;
                        COST_MV_HPEL2(bestx-4,besty,costs[0]);
                        COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
                        COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
                        COST_MV_HPEL2(bestx+4,besty,costs[3]);
                        COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
                        COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
                        COPY2_IF_LT( bestcost, costs[0], dir, 0 );
                        COPY2_IF_LT( bestcost, costs[1], dir, 1 );
                        COPY2_IF_LT( bestcost, costs[2], dir, 2 );
                        COPY2_IF_LT( bestcost, costs[3], dir, 3 );
                        COPY2_IF_LT( bestcost, costs[4], dir, 4 );
                        COPY2_IF_LT( bestcost, costs[5], dir, 5 );
                        if( dir != -2 )
                        {
                            static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
                            bestx += hex2[dir+1][0];
                            besty += hex2[dir+1][1];
                            for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
                            {
                                static const int mod6[8] = {5,0,1,2,3,4,5,0};
                                const int odir = mod6[dir+1];
                                COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
                                COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
                                COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
                                dir = -2;
                                COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
                                COPY2_IF_LT( bestcost, costs[1], dir, odir   );
                                COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
                                if( dir == -2 )
                                    break;
                                bestx += hex2[dir+1][0];
                                besty += hex2[dir+1][1];
                            }
                        }
                        COST_MV_HPEL3(bestx+2,besty-2);
                        COST_MV_HPEL3(bestx+2,besty);
                        COST_MV_HPEL3(bestx+2,besty+2);
                        COST_MV_HPEL3(bestx,besty-2);
                        COST_MV_HPEL3(bestx,besty+2);
                        COST_MV_HPEL3(bestx-2,besty-2);
                        COST_MV_HPEL3(bestx-2,besty);
                        COST_MV_HPEL3(bestx-2,besty+2);
                        COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
                    }
                }
            }
        }
        bmx = ( bpred_mx + 2 ) >> 2;
        bmy = ( bpred_my + 2 ) >> 2;
        COST_MV( bmx, bmy );
    }
    else
    {
        /* check the MVP */
        COST_MV( pmx, pmy );
        /* I don't know why this helps */
        bcost -= BITS_MVD(bmx,bmy);
        
        for( i = 0; i < i_mvc; i++ )
        {
             const int mx = x264_clip3( ( mvc[i][0] + 2 ) >> 2, mv_x_min, mv_x_max );
             const int my = x264_clip3( ( mvc[i][1] + 2 ) >> 2, mv_y_min, mv_y_max );
             if( mx != bmx || my != bmy )
                 COST_MV( mx, my );
        }
    }
    
    COST_MV( 0, 0 );
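A note on the units in the excerpt above: the candidate vectors in mvc[] and the bpred_* values are in quarter-pel units, while mv_x_min and friends are full-pel, which is why the limits are multiplied by 4 before clipping and why the result is converted back with "( bpred_mx + 2 ) >> 2". A standalone sketch of just that arithmetic follows. clip3 is a stand-in written for this example (an assumption about how x264_clip3 behaves, based on how it is called here), and qpel_to_fpel and clip_qpel are hypothetical names.

```c
#include <assert.h>

/* Stand-in for x264_clip3 as it is used in the excerpt:
 * clamp v to the range [lo, hi]. Written for this example only. */
static int clip3(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Quarter-pel to nearest full-pel, as in "( bpred_mx + 2 ) >> 2".
 * Right-shifting a negative int is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static int qpel_to_fpel(int q)
{
    return (q + 2) >> 2;
}

/* Clamp a quarter-pel candidate against full-pel limits, mirroring
 * x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 ). */
static int clip_qpel(int q, int fpel_min, int fpel_max)
{
    return clip3(q, fpel_min * 4, fpel_max * 4);
}
```

So a quarter-pel value of 5 (1.25 pixels) rounds to full-pel 1, and 6 (1.5 pixels) rounds up to 2.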

DeathTheSheep · 3rd October 2007, 04:53

This is with subme7 patch and satd, obviously, which is good. Any other patches in here that would cause conflicts? And I assume this is r680?

If this is all clear, this is ready and rearin' to go!!
