Old 2nd October 2007, 01:04   #41  |  Link
Terranigma
*Space Reserved*
 
Join Date: May 2006
Posts: 953
Quote:
Originally Posted by Dark Shikari
Let's say the original gave a 2% quality boost.

42% better quality over the original ME Prepass = 2.84% quality boost.
Good enough for me, and it's faster to boot.
Old 2nd October 2007, 07:18   #42  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. But anyone who uses merange 4 is truly insane, and for a different reason.
__________________
Recommended all-in-one stop for x264/GCC needs on Windows: Komisar x264 builds!

Last edited by DeathTheSheep; 2nd October 2007 at 07:30.
Old 2nd October 2007, 07:36   #43  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DeathTheSheep
They behave identically on my system. Bit-for-bit identical outputs, and no speed boost to boot!

I'm using the 9.29KB me-prepass diff from the first post. It's a bit bigger than the one I used before, so I assume it's new.
Started from fresh r680 source and applied (in order) satd, subme7, me-prepass.

Yep.

[edit] Ah, wait, finally at merange 4 I notice a teensy weensy bit of difference (<.1%). Probably compiler differences, though, since I updated GCC. But anyone who uses merange 4 is truly insane, and for a different reason.
That would be because the one in the original post is the old ME patch, which still hasn't been updated
Old 2nd October 2007, 07:40   #44  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
Quote:
Originally Posted by morph166955
Ok I just updated the first post a little; it needs a few more tweaks. I also updated my site with a few of the patches and made some diffs that are clean against r680. Most notably, I made a diff of the new ME_Prepass that you posted the code for above, as well as a clean diff for the faster-dia patch. Both are on my site and the links are above. I'm going to try to keep my site updated with diffs as well as Cef's for people who want them.
Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now...
Old 2nd October 2007, 08:20   #45  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DeathTheSheep
Really? He seems to indicate otherwise. So does the difference in filesize... But I'll actually have a look at the code now...
The numbers in the patch look quite different from those in the diff I posted
Old 2nd October 2007, 16:47   #46  |  Link
fields_g
x264... Brilliant!
 
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
Would it be possible to have a revision number line commented into the diff file, or is that against the diff file syntax? I'd love to be able to say "compare version xxx with yyy"!
Old 2nd October 2007, 16:58   #47  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by fields_g
Would it be possible to have a revision number line commented into the diff file, or is that against the diff file syntax? I'd love to be able to say "compare version xxx with yyy"!
SVN diff does this.
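The header lines of the patches quoted in this thread show what that looks like, e.g. `--- encoder/me.c (revision 676)`. A minimal sketch in C of pulling the number out of such a line, in case someone wants to label builds automatically. The function name `diff_base_revision` is made up for illustration, and it only handles the `(revision N)` form seen here:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the base revision named in an svn-style
 * diff header line such as "--- encoder/me.c (revision 676)",
 * or -1 if the line carries no "(revision N)" tag. */
static int diff_base_revision( const char *line )
{
    int rev;
    const char *tag;
    assert( line != NULL );
    tag = strstr( line, "(revision " );
    if( !tag || sscanf( tag, "(revision %d)", &rev ) != 1 )
        return -1;
    return rev;
}
```

Called on the `+++ ... (working copy)` line it returns -1, so only the `---` line identifies the base revision.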
Old 2nd October 2007, 17:29   #48  |  Link
fields_g
x264... Brilliant!
 
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
Quote:
Originally Posted by Dark Shikari
SVN diff does this.
Great! Trying to identify a patch revision as "the one found in post xxx" or, for example, "the original AQ" vs. "Dark Shikari's old AQ" vs. "Dark Shikari's new AQ" is a bit complicated/limited if there are more than a couple of variations. (Don't you love that we have bright people here developing new things to try?)

This will also help people who make builds describe explicitly what is in them!
Old 2nd October 2007, 17:50   #49  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4 and 5. Realistically, people who choose subme mode 7 are aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass option!
Old 2nd October 2007, 18:19   #50  |  Link
fields_g
x264... Brilliant!
 
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
Quote:
Originally Posted by burfadel
I still think --me-prepass should be added as a default option. It should be enabled on principle for subme modes 6 and definitely 7, and optional on 3, 4 and 5. Realistically, people who choose subme mode 7 are aiming for quality/filesize, so it would hardly seem logical to select subme 7 but refuse to use the --me-prepass option!
Interesting... I'm not sure, but either you are suggesting a new approach or are mixing two different (though related) things together.

1) There is a ME type: Dia, Hex, UMH, ESA
2) There is a subpixel refinement of 1-7

Earlier discussion questioned making prepass dependent on ME type (if ESA then ON, else OFF), not on subpixel refinement. I'll let someone else comment on how wise it would be to connect prepass to subpixel refinement.
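To make the distinction concrete, here is a rough C sketch of the two policies side by side. The enum and function names are stand-ins invented for this post, not anything from the x264 source, and neither policy is something the patch actually does:

```c
#include <assert.h>

/* Stand-in names for this sketch only; not taken from the x264 headers. */
typedef enum { ME_DIA, ME_HEX, ME_UMH, ME_ESA } me_method_t;

/* Policy from the earlier discussion: prepass follows the ME type
 * (if ESA then ON, else OFF). */
static int prepass_by_me_method( me_method_t me )
{
    return me == ME_ESA;
}

/* Policy burfadel describes: prepass follows the subpixel refinement
 * level, on for subme 6 and 7 and off (user's choice) below that. */
static int prepass_by_subme( int subme )
{
    assert( subme >= 1 && subme <= 7 );
    return subme >= 6;
}
```

The two functions take different inputs, which is the point: ME type and subme are separate settings, so a default has to be tied to one, the other, or some combination of both.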
Old 2nd October 2007, 19:01   #51  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
I didn't mean to connect it in that sense. It could also be suggested to enable --me-prepass when UMH mode is selected (no point for ESA, I believe?), just as a matter of principle, since subme 7 or UMH are usually selected for quality.
Old 2nd October 2007, 22:37   #52  |  Link
Terranigma
*Space Reserved*
 
Join Date: May 2006
Posts: 953
Quote:
Originally Posted by burfadel
I still think --me-prepass should be added as a default option.
Yes, I agree. Aku, any chance we'll ever see this in the svn? I couldn't care less now about imh, but prepass, otoh, is pretty useful with esa, as you've shown in your graphical comparisons.
Old 3rd October 2007, 02:36   #53  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
How the heck do you apply this diff to the source? What program?!

I always get crap like this every time I apply these patches:
Code:
$ patch -u -p1  < subme7.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c       (revision 676)
|+++ encoder/me.c       (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 28 (offset 1 line).
Hunk #2 succeeded at 853 (offset 51 lines).
patch unexpectedly ends in middle of line
Hunk #3 FAILED at 912.
1 out of 3 hunks FAILED -- saving rejects to encoder/me.c.rej
Then I manually patch. And that's just for subme.. Take a look at prepass:
Code:
$ patch -u -p1  < me-prepass.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: common/common.c
|===================================================================
|--- common/common.c    (revision 675)
|+++ common/common.c    (working copy)
--------------------------
File to patch: common/common.c
patching file `common/common.c'
Hunk #1 succeeded at 444 (offset 3 lines).
Hunk #2 succeeded at 882 with fuzz 2 (offset 1 line).
can't find file to patch at input line 26
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: encoder/me.c
|===================================================================
|--- encoder/me.c    (revision 675)
|+++ encoder/me.c    (working copy)
--------------------------
File to patch: encoder/me.c
patching file `encoder/me.c'
Hunk #1 succeeded at 65 (offset 4 lines).
patch: **** malformed patch at line 142: +                }
And nothing happens at all. No .rej is created to manually patch off of like for subme, so I have to go through line by line and type it all in by hand.

I think I'm going to memorize this algorithm by heart by the time I'm through with these darn patch problems!

Again, how do you guys do it?! I'm in msys 1.0 using the standard $ patch program... the settings I used are shown above...
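A note on the first error in both logs: the diff headers name `encoder/me.c` and `common/common.c` relative to the top of the source tree, and `-p1` strips the first path component, so `patch` goes looking for `me.c` and asks for the file name. With headers like these, running `patch -p0` from the top of the source tree is the usual fix. The "ends in middle of line" and "malformed patch" errors look like a separate problem, possibly a diff that was truncated or mangled when copied; that part is a guess. A small C sketch of the stripping rule (the function name is made up, and the fallback when there are too few slashes is this sketch's choice, not necessarily what `patch` does):

```c
#include <assert.h>
#include <string.h>

/* Illustrative only: mimics how "patch -pN" drops the first n
 * slash-separated components from a file name in a diff header.
 * Returns a pointer into the same string. If the name has fewer than
 * n slashes, this sketch simply returns whatever is left. */
static const char *strip_components( const char *path, int n )
{
    assert( path != NULL && n >= 0 );
    while( n-- > 0 )
    {
        const char *slash = strchr( path, '/' );
        if( !slash )
            break;
        path = slash + 1;
    }
    return path;
}
```

With `n = 0` the name `encoder/me.c` is left alone, which matches the layout of the source tree; with `n = 1` it becomes `me.c`, which does not exist at the top level.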
Old 3rd October 2007, 03:14   #54  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone?

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?!

Last edited by DeathTheSheep; 3rd October 2007 at 03:17.
Old 3rd October 2007, 03:20   #55  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DeathTheSheep
And what exactly happened to cost_mv_hpel? You go right from cost_mv to cost_mv_hpel2... It's still used, but its define is gone?

Maybe you just have some odd organization and moved it somewhere else in the code so the patch's context is off. I'm glad I caught that, though--I just wonder what other crucial instructions I've unwittingly overwritten as I blindly followed the patch?!
Hpel is in the original code, so it doesn't need to be defined again
Old 3rd October 2007, 03:36   #56  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
True, but in the context of your patch, it jumps right from cost_mv (which is already defined) to cost_mv_hpel2. Meaning when I insert it over the context (first and last lines in your patch), it is no longer in the original source!

Meaning, of course, when I Ctrl+A to select all the code for each code block in the patch and paste that block into the .c, I simply start from the first context line and overwrite everything in the original until the last context line, meaning everything in between is overwritten with the lines of the patch.

I then go in and delete all the little "+" signs next to the added lines and manually remove all the lines marked with "-." I know hpel is not "removed" as in marked with the "-", but if you look at your patch's context lines...
So you can understand why the patch threw me off.
Code:
Index: encoder/me.c
===================================================================
--- encoder/me.c    (revision 675)
+++ encoder/me.c    (working copy)
@@ -61,6 +61,23 @@
     COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \ (this is the first line I overwrote, extending to the end...)
 }
 
<but hpel was in here, so it disappeared when I overwrote it with this patch, since it's obviously not here now!>

+#define COST_MV_HPEL2( mx, my, cost ) \
+{ \
+    int stride = 16; \
+    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
+    cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
+             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
+}
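For anyone else applying these by hand: the `@@ -61,6 +61,23 @@` line says the hunk covers 6 lines starting at line 61 of the old file and 23 lines starting at line 61 of the new one. Only the lines inside that range are touched; the lines starting with a space are unchanged context used to locate the spot, and nothing outside the hunk is meant to be overwritten. A small C sketch of reading that header (struct and function names are invented for illustration, and it assumes both counts are written out, which the unified format does not always do):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the four numbers in a unified-diff hunk header. */
typedef struct
{
    int old_start, old_len;   /* range in the original file */
    int new_start, new_len;   /* range in the patched file  */
} hunk_range_t;

/* Parses "@@ -a,b +c,d @@". Returns 1 on success, 0 otherwise. */
static int parse_hunk_header( const char *line, hunk_range_t *r )
{
    assert( line != NULL && r != NULL );
    return sscanf( line, "@@ -%d,%d +%d,%d @@",
                   &r->old_start, &r->old_len,
                   &r->new_start, &r->new_len ) == 4;
}

/* Net number of lines the hunk adds (negative if it removes more). */
static int hunk_net_added( const hunk_range_t *r )
{
    return r->new_len - r->old_len;
}
```

For the header above the net change is 17 added lines, so the 6 original lines stay in the file and the new macros are inserted among them.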

Last edited by DeathTheSheep; 3rd October 2007 at 03:39.
Old 3rd October 2007, 03:53   #57  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Sorry if my diffing skills are nonexistent
Old 3rd October 2007, 03:59   #58  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way.

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW? Tested with esa, of course... Satd.

[edit] Yes, as I suspected, there is something hideously wrong here. Even without any prepass at all, the output differs drastically from an old build without it. Yeah, some patched sources would help like crazy. XD

Last edited by DeathTheSheep; 3rd October 2007 at 04:13.
Old 3rd October 2007, 04:50   #59  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DeathTheSheep
Lol, no problem. But next time could you put up the whole function (or source code?) instead of the diff? Much easier to manually apply that way.

Oh, I noticed the new prepass beefs up the filesize along with the SSIM at constant quantization. Is this normal, or is something b0rked for me?

And quality remains constant (and filesize increases!) as merange is increased... FtW? Tested with esa, of course... Satd.

[edit] Yes, as I suspected, there is something hideously wrong here. Even without any prepass at all, the output differs drastically from an old build without it. Yeah, some patched sources would help like crazy. XD
Here is the beginning of my source up to the start of ME-DIA and such:

Code:
/*****************************************************************************
 * me.c: h264 encoder library (Motion Estimation)
 *****************************************************************************
 * Copyright (C) 2003 Laurent Aimar
 * $Id: me.c,v 1.1 2004/06/03 19:27:08 fenrir Exp $
 *
 * Authors: Laurent Aimar <fenrir@via.ecp.fr>
 *          Loren Merritt <lorenm@u.washington.edu>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111, USA.
 *****************************************************************************/

#include "common/common.h"
#include "me.h"
#include <limits.h>

/* presets selected from good points on the speed-vs-quality curve of several test videos
 * subpel_iters[i_subpel_refine] = { refine_hpel, refine_qpel, me_hpel, me_qpel }
 * where me_* are the number of EPZS iterations run on all candidate block types,
 * and refine_* are run only on the winner. */
 //The --subme 7 values are much higher because since they get the motion search
 //closer to the optimal value, they actually tend to save time in the more intensive
 //RD search that follows.
static const int subpel_iterations[][4] = 
   {{1,0,0,0},
    {1,1,0,0},
    {0,1,1,0},
    {0,2,1,0},
    {0,2,1,1},
    {0,2,1,2},
    {0,0,2,2},
    {0,0,4,10}};

static void refine_subpel( x264_t *h, x264_me_t *m, int hpel_iters, int qpel_iters, int *p_halfpel_thresh, int b_refine_qpel );

#define BITS_MVD( mx, my )\
    (p_cost_mvx[(mx)<<2] + p_cost_mvy[(my)<<2])

#define COST_MV( mx, my )\
{\
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE,\
                   &p_fref[(my)*m->i_stride[0]+(mx)], m->i_stride[0] )\
             + BITS_MVD(mx,my);\
    COPY3_IF_LT( bcost, cost, bmx, mx, bmy, my );\
}

#define COST_MV_HPEL( mx, my ) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
    COPY3_IF_LT( bpred_cost, cost, bpred_mx, mx, bpred_my, my ); \
}

#define COST_MV_HPEL2( mx, my, cost ) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
}

#define COST_MV_HPEL3( mx, my) \
{ \
    int stride = 16; \
    uint8_t *src = h->mc.get_ref( m->p_fref, m->i_stride[0], pix, &stride, mx, my, bw, bh ); \
    int cost = h->pixf.fpelcmp[i_pixel]( m->p_fenc[0], FENC_STRIDE, src, stride ) \
             + p_cost_mvx[ mx ] + p_cost_mvy[ my ]; \
    COPY3_IF_LT( bestcost, cost, bestx, mx, besty, my ); \
}

#define COST_MV_X3_DIR( m0x, m0y, m1x, m1y, m2x, m2y, costs )\
{\
    uint8_t *pix_base = p_fref + bmx + bmy*m->i_stride[0];\
    h->pixf.fpelcmp_x3[i_pixel]( m->p_fenc[0],\
        pix_base + (m0x) + (m0y)*m->i_stride[0],\
        pix_base + (m1x) + (m1y)*m->i_stride[0],\
        pix_base + (m2x) + (m2y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    (costs)[0] += BITS_MVD( bmx+(m0x), bmy+(m0y) );\
    (costs)[1] += BITS_MVD( bmx+(m1x), bmy+(m1y) );\
    (costs)[2] += BITS_MVD( bmx+(m2x), bmy+(m2y) );\
}

#define COST_MV_X4( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
    uint8_t *pix_base = p_fref + omx + omy*m->i_stride[0];\
    h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
        pix_base + (m0x) + (m0y)*m->i_stride[0],\
        pix_base + (m1x) + (m1y)*m->i_stride[0],\
        pix_base + (m2x) + (m2y)*m->i_stride[0],\
        pix_base + (m3x) + (m3y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    costs[0] += BITS_MVD( omx+(m0x), omy+(m0y) );\
    costs[1] += BITS_MVD( omx+(m1x), omy+(m1y) );\
    costs[2] += BITS_MVD( omx+(m2x), omy+(m2y) );\
    costs[3] += BITS_MVD( omx+(m3x), omy+(m3y) );\
    COPY3_IF_LT( bcost, costs[0], bmx, omx+(m0x), bmy, omy+(m0y) );\
    COPY3_IF_LT( bcost, costs[1], bmx, omx+(m1x), bmy, omy+(m1y) );\
    COPY3_IF_LT( bcost, costs[2], bmx, omx+(m2x), bmy, omy+(m2y) );\
    COPY3_IF_LT( bcost, costs[3], bmx, omx+(m3x), bmy, omy+(m3y) );\
}

#define COST_MV_X4_ABS( m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y )\
{\
    h->pixf.fpelcmp_x4[i_pixel]( m->p_fenc[0],\
        p_fref + (m0x) + (m0y)*m->i_stride[0],\
        p_fref + (m1x) + (m1y)*m->i_stride[0],\
        p_fref + (m2x) + (m2y)*m->i_stride[0],\
        p_fref + (m3x) + (m3y)*m->i_stride[0],\
        m->i_stride[0], costs );\
    costs[0] += p_cost_mvx[m0x<<2]; /* no cost_mvy */\
    costs[1] += p_cost_mvx[m1x<<2];\
    costs[2] += p_cost_mvx[m2x<<2];\
    costs[3] += p_cost_mvx[m3x<<2];\
    COPY3_IF_LT( bcost, costs[0], bmx, m0x, bmy, m0y );\
    COPY3_IF_LT( bcost, costs[1], bmx, m1x, bmy, m1y );\
    COPY3_IF_LT( bcost, costs[2], bmx, m2x, bmy, m2y );\
    COPY3_IF_LT( bcost, costs[3], bmx, m3x, bmy, m3y );\
}

/*  1  */
/* 101 */
/*  1  */
#define DIA1_ITER( mx, my )\
{\
    omx = mx; omy = my;\
    COST_MV_X4( 0,-1, 0,1, -1,0, 1,0 );\
}

#define DIA2_ITER( mx, my )\
{\
    omx = mx; omy = my;\
    COST_MV_X4( 0,-2, 0,2, -2,0, 2,0 );\
}

#define CROSS( start, x_max, y_max )\
{\
    i = start;\
    if( x_max <= X264_MIN(mv_x_max-omx, omx-mv_x_min) )\
        for( ; i < x_max-2; i+=4 )\
            COST_MV_X4( i,0, -i,0, i+2,0, -i-2,0 );\
    for( ; i < x_max; i+=2 )\
    {\
        if( omx+i <= mv_x_max )\
            COST_MV( omx+i, omy );\
        if( omx-i >= mv_x_min )\
            COST_MV( omx-i, omy );\
    }\
    i = start;\
    if( y_max <= X264_MIN(mv_y_max-omy, omy-mv_y_min) )\
        for( ; i < y_max-2; i+=4 )\
            COST_MV_X4( 0,i, 0,-i, 0,i+2, 0,-i-2 );\
    for( ; i < y_max; i+=2 )\
    {\
        if( omy+i <= mv_y_max )\
            COST_MV( omx, omy+i );\
        if( omy-i >= mv_y_min )\
            COST_MV( omx, omy-i );\
    }\
}

#define ME_HEX(X,Y,range)\
{\
	static const int mod6[8] = {5,0,1,2,3,4,5,0};\
	bmx = X;\
	bmy = Y;\
	dir = -2;\
	COST_MV_X3_DIR( -2,0, -1, 2,  1, 2, costs   );\
	COST_MV_X3_DIR(  2,0,  1,-2, -1,-2, costs+3 );\
	COPY2_IF_LT( bcost, costs[0], dir, 0 );\
	COPY2_IF_LT( bcost, costs[1], dir, 1 );\
	COPY2_IF_LT( bcost, costs[2], dir, 2 );\
	COPY2_IF_LT( bcost, costs[3], dir, 3 );\
	COPY2_IF_LT( bcost, costs[4], dir, 4 );\
	COPY2_IF_LT( bcost, costs[5], dir, 5 );\
	if( dir != -2 )	{\
		static const int hex2[8][2] = {{-1,-2}, {-2,0}, {-1,2}, {1,2}, {2,0}, {1,-2}, {-1,-2}, {-2,0}};\
		bmx += hex2[dir+1][0];\
		bmy += hex2[dir+1][1];\
		for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
		{\
			const int odir = mod6[dir+1];\
			COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
							hex2[odir+1][0], hex2[odir+1][1],\
							hex2[odir+2][0], hex2[odir+2][1],\
							costs );\
			dir = -2;\
			COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
			COPY2_IF_LT( bcost, costs[1], dir, odir   );\
			COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
			if( dir == -2 ) break;\
			bmx += hex2[dir+1][0];\
			bmy += hex2[dir+1][1];}\
		if(dir == -2 || bcost > bestCost) {}\
		else{\
			for( i = 1; i < range && CHECK_MVRANGE(bmx, bmy); i++ )\
			{\
				const int odir = mod6[dir+1];\
				COST_MV_X3_DIR( hex2[odir+0][0], hex2[odir+0][1],\
								hex2[odir+1][0], hex2[odir+1][1],\
								hex2[odir+2][0], hex2[odir+2][1],\
								costs );\
				dir = -2;\
				COPY2_IF_LT( bcost, costs[0], dir, odir-1 );\
				COPY2_IF_LT( bcost, costs[1], dir, odir   );\
				COPY2_IF_LT( bcost, costs[2], dir, odir+1 );\
				if( dir == -2 ) break;\
				bmx += hex2[dir+1][0];\
				bmy += hex2[dir+1][1];}}}\
	omx = bmx; omy = bmy;\
	COST_MV_X4(  0,-1,  0,1, -1,0, 1,0 );\
	COST_MV_X4( -1,-1, -1,1, 1,-1, 1,1 );\
}\

void x264_me_search_ref( x264_t *h, x264_me_t *m, int (*mvc)[2], int i_mvc, int *p_halfpel_thresh )
{
    int cost;
    const int bw = x264_pixel_size[m->i_pixel].w;
    const int bh = x264_pixel_size[m->i_pixel].h;
    const int i_pixel = m->i_pixel;
    int i_me_range = h->param.analyse.i_me_range;
    int bmx, bmy, bcost;
    int bpred_mx = 0, bpred_my = 0, bpred_cost = COST_MAX;
    int omx, omy, pmx, pmy;
    uint8_t *p_fref = m->p_fref[0];
    DECLARE_ALIGNED( uint8_t, pix[16*16], 16 );
    
    int i, j;
    int dir;
    int costs[6];

    int mv_x_min = h->mb.mv_min_fpel[0];
    int mv_y_min = h->mb.mv_min_fpel[1];
    int mv_x_max = h->mb.mv_max_fpel[0];
    int mv_y_max = h->mb.mv_max_fpel[1];
    int mv_x_min4 = h->mb.mv_min_fpel[0]<<2;
    int mv_y_min4 = h->mb.mv_min_fpel[1]<<2;
    int mv_x_max4 = h->mb.mv_max_fpel[0]<<2;
    int mv_y_max4 = h->mb.mv_max_fpel[1]<<2;

#define CHECK_MVRANGE(mx,my) ( mx >= mv_x_min && mx <= mv_x_max && my >= mv_y_min && my <= mv_y_max )
#define CHECK_MVRANGE4(mx,my) ( mx >= mv_x_min4 && mx <= mv_x_max4 && my >= mv_y_min4 && my <= mv_y_max4 )

    const int16_t *p_cost_mvx = m->p_cost_mv - m->mvp[0];
    const int16_t *p_cost_mvy = m->p_cost_mv - m->mvp[1];

    bmx = x264_clip3( m->mvp[0], mv_x_min*4, mv_x_max*4 );
    bmy = x264_clip3( m->mvp[1], mv_y_min*4, mv_y_max*4 );
    pmx = ( bmx + 2 ) >> 2;
    pmy = ( bmy + 2 ) >> 2;
    bcost = COST_MAX;
    
    /* try extra predictors if provided */
    if( h->mb.i_subpel_refine >= 3 )
    {
        COST_MV_HPEL( bmx, bmy );
        if(!h->param.analyse.i_me_prepass)
        {
            for( i = 0; i < i_mvc; i++ )
            {
                 const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
                 const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
                 if( mx != bpred_mx || my != bpred_my )
                     COST_MV_HPEL( mx, my );
            }
        }
        else
        {
            for( i = 0; i < i_mvc; i++ )
            {
                const int mx = x264_clip3( mvc[i][0], mv_x_min*4, mv_x_max*4 );
                const int my = x264_clip3( mvc[i][1], mv_y_min*4, mv_y_max*4 );
                /* skip candidates identical to one already examined */
                int doSearch = 1;
                int j;
                for(j = 0; j < i; j++)
                {
                    if(mvc[i][0] == mvc[j][0] && mvc[i][1] == mvc[j][1]) doSearch = 0;
                }
                if( ( mx != bpred_mx || my != bpred_my ) && doSearch)
                {
                    int bestcost;
                    int bestx = mx;
                    int besty = my;
                    COST_MV_HPEL2( mx, my, bestcost );
                    COPY3_IF_LT( bpred_cost, bestcost, bpred_mx, bestx, bpred_my, besty );
                    if(bestcost < 2*bpred_cost)
                    {
                        int n;
                        int dir = -2;
                        COST_MV_HPEL2(bestx-4,besty,costs[0]);
                        COST_MV_HPEL2(bestx-2,besty+4,costs[1]);
                        COST_MV_HPEL2(bestx+2,besty+4,costs[2]);
                        COST_MV_HPEL2(bestx+4,besty,costs[3]);
                        COST_MV_HPEL2(bestx+2,besty-4,costs[4]);
                        COST_MV_HPEL2(bestx-2,besty-4,costs[5]);
                        COPY2_IF_LT( bestcost, costs[0], dir, 0 );
                        COPY2_IF_LT( bestcost, costs[1], dir, 1 );
                        COPY2_IF_LT( bestcost, costs[2], dir, 2 );
                        COPY2_IF_LT( bestcost, costs[3], dir, 3 );
                        COPY2_IF_LT( bestcost, costs[4], dir, 4 );
                        COPY2_IF_LT( bestcost, costs[5], dir, 5 );
                        if( dir != -2 )
                        {
                            static const int hex2[8][2] = {{-2,-4}, {-4,0}, {-2,4}, {2,4}, {4,0}, {2,-4}, {-2,-4}, {-4,0}};
                            bestx += hex2[dir+1][0];
                            besty += hex2[dir+1][1];
                            for( n = 1; n < i_me_range && CHECK_MVRANGE4(bestx, besty); n++ )
                            {
                                static const int mod6[8] = {5,0,1,2,3,4,5,0};
                                const int odir = mod6[dir+1];
                                COST_MV_HPEL2(hex2[odir+0][0]+bestx,hex2[odir+0][1]+besty,costs[0]);
                                COST_MV_HPEL2(hex2[odir+1][0]+bestx,hex2[odir+1][1]+besty,costs[1]);
                                COST_MV_HPEL2(hex2[odir+2][0]+bestx,hex2[odir+2][1]+besty,costs[2]);
                                dir = -2;
                                COPY2_IF_LT( bestcost, costs[0], dir, odir-1 );
                                COPY2_IF_LT( bestcost, costs[1], dir, odir   );
                                COPY2_IF_LT( bestcost, costs[2], dir, odir+1 );
                                if( dir == -2 )
                                    break;
                                bestx += hex2[dir+1][0];
                                besty += hex2[dir+1][1];
                            }
                        }
                        COST_MV_HPEL3(bestx+2,besty-2);
                        COST_MV_HPEL3(bestx+2,besty);
                        COST_MV_HPEL3(bestx+2,besty+2);
                        COST_MV_HPEL3(bestx,besty-2);
                        COST_MV_HPEL3(bestx,besty+2);
                        COST_MV_HPEL3(bestx-2,besty-2);
                        COST_MV_HPEL3(bestx-2,besty);
                        COST_MV_HPEL3(bestx-2,besty+2);
                        COPY3_IF_LT(bpred_cost,bestcost,bpred_mx,bestx,bpred_my,besty);
                    }
                }
            }
        }
        bmx = ( bpred_mx + 2 ) >> 2;
        bmy = ( bpred_my + 2 ) >> 2;
        COST_MV( bmx, bmy );
    }
    else
    {
        /* check the MVP */
        COST_MV( pmx, pmy );
        /* I don't know why this helps */
        bcost -= BITS_MVD(bmx,bmy);
        
        for( i = 0; i < i_mvc; i++ )
        {
             const int mx = x264_clip3( ( mvc[i][0] + 2 ) >> 2, mv_x_min, mv_x_max );
             const int my = x264_clip3( ( mvc[i][1] + 2 ) >> 2, mv_y_min, mv_y_max );
             if( mx != bmx || my != bmy )
                 COST_MV( mx, my );
        }
    }
    
    COST_MV( 0, 0 );
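The excerpt leans on `COPY2_IF_LT` / `COPY3_IF_LT`, which are defined elsewhere in the x264 tree and not shown above. Here is a sketch of what they are assumed to do, together with the quarter-pel to full-pel rounding used at `bmx = ( bpred_mx + 2 ) >> 2`. The macro bodies are written from their apparent behaviour in the code above, and `qpel_to_fpel` and `pick_best` are names invented for this illustration:

```c
#include <assert.h>

/* Assumed behaviour of the helper macros used above (defined elsewhere
 * in the x264 source; reproduced here only as an illustration):
 * if y beats x, take y and record the values that produced it. */
#define COPY2_IF_LT( x, y, a, b ) \
    if( (y) < (x) ) { (x) = (y); (a) = (b); }
#define COPY3_IF_LT( x, y, a, b, c, d ) \
    if( (y) < (x) ) { (x) = (y); (a) = (b); (c) = (d); }

/* Quarter-pel to full-pel with rounding to nearest, as in
 * "bmx = ( bpred_mx + 2 ) >> 2". Restricted to non-negative input here
 * because right-shifting a negative int is implementation-defined in C. */
static int qpel_to_fpel( int qpel )
{
    assert( qpel >= 0 );
    return ( qpel + 2 ) >> 2;
}

/* Keeps the cheapest of n candidate vectors, the way the search loops do.
 * best_x / best_y are left untouched if no candidate beats the start value. */
static int pick_best( const int cost[], const int mvx[], const int mvy[],
                      int n, int *best_x, int *best_y )
{
    int i, best = 1 << 28;   /* stand-in for COST_MAX */
    for( i = 0; i < n; i++ )
        COPY3_IF_LT( best, cost[i], *best_x, mvx[i], *best_y, mvy[i] );
    return best;
}
```

The same compare-and-keep step is what the prepass does with `bestcost`, `bestx` and `besty` before folding the winner back into `bpred_cost`.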
Old 3rd October 2007, 04:53   #60  |  Link
DeathTheSheep
<The VFW Sheep of Death>
 
Join Date: Dec 2004
Location: Deathly pasture of VFW
Posts: 1,149
This is with subme7 patch and satd, obviously, which is good. Any other patches in here that would cause conflicts? And I assume this is r680?

If this is all clear, this is ready and rearin' to go!!

Tags
h.264, x264, x264 builds, x264 patches, x264 unofficial builds
