Variance AQ Megathread (AQ v0.48 update--defaults changed) [Archive]

Dark Shikari

15th December 2007, 10:57

So, after collecting enough magic pixie dust, I've come up with an AQ algorithm that just might work. Its purpose is to avoid blocking in flat areas, like regular AQ, but more importantly to avoid blurring in relatively flat textured areas, such as grass at a football game or film grain. It seems to be relatively ineffective (or worse than no AQ at all) on low-bitrate anime, but I have gotten reports that it's quite useful at higher bitrates, so try it yourself and see.

Patch (http://files.x264.nl/AQ/AQ_0.48.diff)
Build (http://files.x264.nl/AQ/x264.736.dark.aq.0.48.exe)
PthreadGC2.dll if you need it (http://mirror05.x264.nl/Dark/force.php?file=./pthreadGC2.dll).
A 5 minute or so 1080p sample encoded with the new AQ (0.45) at 5 megabits per second (http://mirror05.x264.nl/Dark/force.php?file=./PlanetEarthSample.mkv)
A 44-second sample of 1080p encoded at 10 megabits with AQ 0.47 (strength 0.9, sensitivity 14, qcomp=1) (http://x264.nl/x264.736.aq.0.47.mkv)

How to use AQ:

1. AQ is on by default at strength 0.5. Change --aq-strength to make it stronger or weaker.
2. In 2pass, use AQ on both passes with the same settings.
3. Watch in wonder as all the blurred details come back to life and the SSIM of your video rises.

Version history:

0.48: AQ strength 0.5, sensitivity 13 made the defaults. Updated to r736. Qcomp is now scaled based on AQ strength automatically.
0.47: Rounding error fixed at low QPs. Code cleanup/optimization by Akupenguin.
0.46: Code cleanup, documentation updated, defaults changed based on testing, and RDRC removed from the main patch in preparation for putting AQ in SVN. RDRC builds and patches can still be found on Mirror05/Dark.
0.45: Fixed bug if variance=0. Additionally, when static sensitivity is used, no limit is put on the quantizers other than qpmin/qpmax; this allows one to use AQ as a form of quasi-ratecontrol to redistribute more bits to flatter frames to improve quality.
0.44: Bug in x264 CABAC encoding (mb_qp_delta) fixed. While this isn't a bug in AQ, it only showed up when using CABAC, interlaced mode, and AQ.
0.431: Crash bug fixed.
0.43: Code cleaned and re-organized. Massive speed increase of the AQ itself due to performance optimizations.
0.42: Lambda-based AQ removed due to incompatibilities with deadzone that will take some work to resolve.

0.4 (Huge overhaul):

1. Totally rewritten AQ. Same basic concept, but now uses a logarithmic scale instead of a hackneyed exponential one.
2. For B-frames, uses a tricky bit of lambda-changing instead of QP changing; this requires absolutely no bits for QP-deltas!
3. For P-frames, uses a slight bit of trickery to reduce the bit cost of QP deltas.
4. Totally rewritten, far faster automatic sensitivity. Respects bitrate in CRF mode better also.
5. Now based on r720.

0.3 (Major overhaul of automatic thresholding and options)
0.21 (Fixed overflow bug with variance function. Code now is allowed to raise quantizers. If this causes problems I may restrict it somewhat.)
0.2 (Re-introduced pre-pass for automagic thresholding)
0.11 (Heavily optimized code)
0.1 (Initial release, fixed scaling formula, removed pre-pass)
0.01 (Initial algorithm)

Results from a test of AQ 0.42:

(1-0.9769377) / (1- 0.9799512) = 15% SSIM boost
(37.751-36.792)/0.05 = 19% PSNR drop
7416.06 / 6960.12 - 1 = 6.55% bitrate drop
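
The SSIM and bitrate ratios above can be re-derived with a few lines of C. This is only a sketch: the function names are made up for illustration, and which figure belongs to the AQ encode is inferred from the "boost" and "drop" labels rather than stated in the post. The PSNR line is left out because the post does not explain its 0.05 divisor.

```c
#include <assert.h>

/* Ratio of SSIM "distortions" (1 - SSIM). A result above 1 means the
 * second encode is closer to a perfect score of 1.0 than the first. */
double ssim_distortion_ratio(double ssim_a, double ssim_b)
{
    assert(ssim_b < 1.0);            /* avoid dividing by zero */
    return (1.0 - ssim_a) / (1.0 - ssim_b);
}

/* Relative difference between two bitrates, as a fraction of the second. */
double relative_bitrate_change(double kbps_a, double kbps_b)
{
    assert(kbps_b > 0.0);
    return kbps_a / kbps_b - 1.0;
}
```

Calling `ssim_distortion_ratio(0.9769377, 0.9799512)` gives roughly 1.15 (the 15% figure), and `relative_bitrate_change(7416.06, 6960.12)` gives roughly 0.0655 (the 6.55% figure).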

Inventive Software

15th December 2007, 10:59

Trust you to come up with this an hour before I go home for 3 weeks! :D

I'll have a play next week, if you're not too busy changing the innards of the algorithm. ;)

ToS_Maverick

15th December 2007, 12:23

OMG Dark Shikari you are a HERO :D

i could only do a quick test for now, but @CRF20 it was COMPLETELY transparent, at CRF22 it was VERY good...

a few questions:
- how does it work, it's very effective?
- where can i donate :D

more detailed test with screens coming soon ;)

Dark Shikari

15th December 2007, 12:30

It works by using the AC energy of each macroblock as a metric. In other words, it takes the average of the block's pixels, subtracts that average from each original pixel, and then takes the sum of squares of the result. This means that blocks that aren't completely flat but have a lot of texture still get hit by AQ.

There are some slight optimizations to this (for example, the result is mathematically equivalent to SSD - SAD^2/N, where SSD and SAD are taken against an all-zero block and N is the number of pixels in the block). Also note I made a major change to the build that's up there (I reuploaded): --aq-sensitivity now affects the algorithm. It affects the thresholding in the following way: lower values mean that blocks have to be "flatter" to be affected by AQ, while higher values mean blocks don't have to be as flat.
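
To make the metric concrete, here is a small standalone sketch of the AC energy computed both ways: directly from the definition, and through the sum-of-pixels / sum-of-squares shortcut. The function names and the flat `pix`/`n` interface are assumptions for illustration and are not x264's API, which works on strided 16x16 and 8x8 planes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct definition: sum of squared deviations from the block mean. */
double ac_energy_direct(const uint8_t *pix, size_t n)
{
    assert(n > 0);
    double sum = 0.0, energy = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += pix[i];
    double mean = sum / (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = pix[i] - mean;
        energy += d * d;
    }
    return energy;
}

/* Shortcut: against an all-zero block, SAD is the sum of the pixels and
 * SSD is the sum of their squares, so the energy is SSD - SAD^2 / N. */
double ac_energy_shortcut(const uint8_t *pix, size_t n)
{
    assert(n > 0);
    uint64_t sad = 0, ssd = 0;
    for (size_t i = 0; i < n; i++) {
        sad += pix[i];
        ssd += (uint64_t)pix[i] * pix[i];
    }
    return (double)ssd - (double)(sad * sad) / (double)n;
}
```

For the four pixels 10, 20, 30, 40 the mean is 25, the deviations are -15, -5, 5, 15, and both functions return 500. A perfectly flat block returns 0 either way.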

AGDenton

15th December 2007, 16:54

Can you do an svn diff against some revision of x264 ? I'd like to test this, but I'm not under win32...

Dark Shikari

15th December 2007, 20:26

Index: encoder/encoder.c
===================================================================
--- encoder/encoder.c (revision 712)
+++ encoder/encoder.c (working copy)
@@ -472,6 +472,8 @@
if( !h->param.b_cabac )
h->param.analyse.i_trellis = 0;
h->param.analyse.i_trellis = x264_clip3( h->param.analyse.i_trellis, 0, 2 );
+ if( h->param.analyse.b_aq && h->param.analyse.f_aq_strength <= 0 )
+ h->param.analyse.b_aq = 0;
h->param.analyse.i_noise_reduction = x264_clip3( h->param.analyse.i_noise_reduction, 0, 1<<16 );

{
Index: encoder/analyse.c
===================================================================
--- encoder/analyse.c (revision 712)
+++ encoder/analyse.c (working copy)
@@ -29,6 +29,7 @@
#endif

#include "common/common.h"
+#include "common/cpu.h"
#include "macroblock.h"
#include "me.h"
#include "ratecontrol.h"
@@ -2031,8 +2032,61 @@
}
}

+//Finds the total AC energy of the block in all planes.
+static int ac_energy_mb(x264_t *h)
+{
+    DECLARE_ALIGNED( static uint8_t, zero[FDEC_STRIDE*16], 16 );
+    int avg[3];
+    int x,y;
+    for(y = 0; y < 16; y++)
+        for(x = 0; x < 16; x++)
+            zero[FDEC_STRIDE*y+x]=0;
+    avg[0] = h->pixf.sad[PIXEL_16x16](zero,FDEC_STRIDE,h->mb.pic.p_fenc[0],FENC_STRIDE) >> 8;
+    avg[1] = h->pixf.sad[PIXEL_8x8](zero,FDEC_STRIDE,h->mb.pic.p_fenc[1],FENC_STRIDE) >> 6;
+    avg[2] = h->pixf.sad[PIXEL_8x8](zero,FDEC_STRIDE,h->mb.pic.p_fenc[2],FENC_STRIDE) >> 6;
+    int totalSSD = 0;
+    for(y = 0; y < 16; y++)
+        for(x = 0; x < 16; x++)
+            zero[FDEC_STRIDE*y+x]=avg[0];
+    totalSSD += h->pixf.ssd[PIXEL_16x16](zero,FDEC_STRIDE,h->mb.pic.p_fenc[0],FENC_STRIDE);
+    for(y = 0; y < 8; y++)
+        for(x = 0; x < 8; x++)
+            zero[FDEC_STRIDE*y+x]=avg[1];
+    totalSSD += h->pixf.ssd[PIXEL_8x8](zero,FDEC_STRIDE,h->mb.pic.p_fenc[1],FENC_STRIDE);
+    for(y = 0; y < 8; y++)
+        for(x = 0; x < 8; x++)
+            zero[FDEC_STRIDE*y+x]=avg[2];
+    totalSSD += h->pixf.ssd[PIXEL_8x8](zero,FDEC_STRIDE,h->mb.pic.p_fenc[2],FENC_STRIDE);
+    return totalSSD;
+}

/*****************************************************************************
+ * x264_adaptive_quant:
+ * check if mb is "flat", i.e. has most energy in low frequency components, and
+ * adjust qp down if it is
+ *****************************************************************************/
+void x264_adaptive_quant( x264_t *h, x264_mb_analysis_t *a )
+{
+    int qp = h->mb.i_qp;
+    int ac_energy = ac_energy_mb(h);
+    x264_cpu_restore(h->param.cpu);
+    float result = ac_energy;
+    const float expconst = 0.367879441;
+    float threshold = powf(h->param.analyse.f_aq_sensitivity,4)/2;
+    if(result < threshold)
+    {
+        if(result == 0) result = 1;
+        else
+            result = (expconst-expf(-powf(threshold/result,0.2))) * 2.71828183;
+    }
+    else result = 0;
+    int qp_adj = (qp * result * h->param.analyse.f_aq_strength) / 2;
+    qp_adj = x264_clip3(qp_adj, 0, qp/2);
+    h->mb.i_qp = a->i_qp = qp - qp_adj;
+    h->mb.i_chroma_qp = i_chroma_qp_table[x264_clip3( h->mb.i_qp + h->pps->i_chroma_qp_index_offset, 0, 51 )];
+}
+
+/*****************************************************************************
* x264_macroblock_analyse:
*****************************************************************************/
void x264_macroblock_analyse( x264_t *h )
@@ -2040,9 +2094,14 @@
x264_mb_analysis_t analysis;
int i_cost = COST_MAX;
int i;
+
+ h->mb.i_qp = x264_ratecontrol_qp( h );

+ if( h->param.analyse.b_aq )
+ x264_adaptive_quant( h, &analysis );
+
/* init analysis */
- x264_mb_analyse_init( h, &analysis, x264_ratecontrol_qp( h ) );
+ x264_mb_analyse_init( h, &analysis, h->mb.i_qp );

/*--------------------------- Do the analysis ---------------------------*/
if( h->sh.i_type == SLICE_TYPE_I )
Index: x264.c
===================================================================
--- x264.c (revision 712)
+++ x264.c (working copy)
@@ -243,6 +243,12 @@
" - 2: enabled on all mode decisions\n", defaults->analyse.i_trellis );
H0( " --no-fast-pskip Disables early SKIP detection on P-frames\n" );
H0( " --no-dct-decimate Disables coefficient thresholding on P-frames\n" );
+ H0( " --aq-strength <float> Amount to adjust QP per MB [%.1f]\n"
+ " 0.0: no AQ\n"
+ " 1.1: strong AQ\n", defaults->analyse.f_aq_strength );
+ H0( " --aq-sensitivity <float> \"Flatness\" threshold to trigger AQ [%.1f]\n"
+ " 5: applies to almost no blocks\n"
+ " 35: applies to almost all blocks\n", defaults->analyse.f_aq_sensitivity );
H0( " --nr <integer> Noise reduction [%d]\n", defaults->analyse.i_noise_reduction );
H1( "\n" );
H1( " --deadzone-inter <int> Set the size of the inter luma quantization deadzone [%d]\n", defaults->analyse.i_luma_deadzone[0] );
@@ -406,6 +412,8 @@
{ "trellis", required_argument, NULL, 't' },
{ "no-fast-pskip", no_argument, NULL, 0 },
{ "no-dct-decimate", no_argument, NULL, 0 },
+ { "aq-strength", required_argument, NULL, 0 },
+ { "aq-sensitivity", required_argument, NULL, 0 },
{ "deadzone-inter", required_argument, NULL, '0' },
{ "deadzone-intra", required_argument, NULL, '0' },
{ "level", required_argument, NULL, 0 },
Index: common/pixel.c
===================================================================
--- common/pixel.c (revision 712)
+++ common/pixel.c (working copy)
@@ -213,6 +213,14 @@
PIXEL_SATD_C( x264_pixel_satd_4x8, 4, 8 )
PIXEL_SATD_C( x264_pixel_satd_4x4, 4, 4 )

+static int x264_pixel_count_8x8( uint8_t *pix, int i_pix, uint32_t threshold )
+{
+ int x, y, sum = 0;
+ for( y=0; y<8; y++, pix += i_pix )
+ for( x=0; x<8; x++ )
+ sum += pix[x] > (uint8_t)threshold;
+ return sum;
+}

/****************************************************************************
* pixel_sa8d_WxH: sum of 8x8 Hadamard transformed differences
@@ -473,6 +481,8 @@
pixf->ads[PIXEL_16x8] = pixel_ads2;
pixf->ads[PIXEL_8x8] = pixel_ads1;

+ pixf->count_8x8 = x264_pixel_count_8x8;
+
#ifdef HAVE_MMX
if( cpu&X264_CPU_MMX )
{
Index: common/pixel.h
===================================================================
--- common/pixel.h (revision 712)
+++ common/pixel.h (working copy)
@@ -84,6 +84,8 @@
void (*ads[7])( int enc_dc[4], uint16_t *sums, int delta,
uint16_t *res, int width );

+ int (*count_8x8)( uint8_t *pix, int i_pix, uint32_t threshold );
+
/* calculate satd of V, H, and DC modes.
* may be NULL, in which case just use pred+satd instead. */
void (*intra_satd_x3_16x16)( uint8_t *fenc, uint8_t *fdec, int res[3] );
Index: common/common.c
===================================================================
--- common/common.c (revision 712)
+++ common/common.c (working copy)
@@ -123,6 +123,9 @@
param->analyse.i_chroma_qp_offset = 0;
param->analyse.b_fast_pskip = 1;
param->analyse.b_dct_decimate = 1;
+ param->analyse.b_aq = 0;
+ param->analyse.f_aq_strength = 0.0;
+ param->analyse.f_aq_sensitivity = 15;
param->analyse.i_luma_deadzone[0] = 21;
param->analyse.i_luma_deadzone[1] = 11;
param->analyse.b_psnr = 1;
@@ -455,6 +458,13 @@
p->analyse.b_fast_pskip = atobool(value);
OPT("dct-decimate")
p->analyse.b_dct_decimate = atobool(value);
+ OPT("aq-strength")
+ {
+ p->analyse.f_aq_strength = atof(value);
+ p->analyse.b_aq = (p->analyse.f_aq_strength > 0.0);
+ }
+ OPT("aq-sensitivity")
+ p->analyse.f_aq_sensitivity = atof(value);
OPT("deadzone-inter")
p->analyse.i_luma_deadzone[0] = atoi(value);
OPT("deadzone-intra")
@@ -939,6 +949,9 @@
s += sprintf( s, " zones" );
}

+ if( p->analyse.b_aq )
+ s += sprintf( s, " aq=1:%.1f:%.1f", p->analyse.f_aq_strength, p->analyse.f_aq_sensitivity );
+
return buf;
}

Index: x264.h
===================================================================
--- x264.h (revision 712)
+++ x264.h (working copy)
@@ -230,6 +230,9 @@
int i_trellis; /* trellis RD quantization */
int b_fast_pskip; /* early SKIP detection on P-frames */
int b_dct_decimate; /* transform coefficient thresholding on P-frames */
+ int b_aq; /* psy adaptive QP */
+ float f_aq_strength;
+ float f_aq_sensitivity;
int i_noise_reduction; /* adaptive pseudo-deadzone */

/* the deadzone size that will be used in luma quantization */

akupenguin

15th December 2007, 20:43

In addition to being unnecessary as discussed before, your zero array is not thread safe. And if you did for whatever reason need a dc array, its stride should be 0 to reduce the amount of data to initialize.
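
The stride-0 point can be illustrated with a plain C stand-in for a strided SAD. This is a sketch under assumptions: `sad_16x16`, `zero_row` and `block_sum_16x16` are hypothetical names, not x264 functions, and the comment above also notes the array is unnecessary in the first place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain C 16x16 SAD with an explicit stride for each input. */
int sad_16x16(const uint8_t *a, int stride_a, const uint8_t *b, int stride_b)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(a[x] - b[x]);
        a += stride_a;   /* with stride 0 the same 16 bytes are re-read for every row */
        b += stride_b;
    }
    return sum;
}

/* One read-only row is enough: 16 bytes instead of a full 16-row buffer,
 * and being const it can be shared between threads safely. */
static const uint8_t zero_row[16] = { 0 };

/* Sum of all pixels in a 16x16 block, computed as SAD against zero. */
int block_sum_16x16(const uint8_t *block, int stride)
{
    assert(stride >= 16);
    return sad_16x16(zero_row, 0, block, stride);
}
```

The design choice is that nothing is ever written to the shared row, so there is no per-call initialization and no data race.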

Dark Shikari

15th December 2007, 20:50

Yup, yup, I will fix the code. Wait, the zero array isn't threadsafe though? Isn't that what ordinary AQ uses?

akupenguin

15th December 2007, 20:52

That zero array contains zeros. Yours gets modified. In short: there shouldn't be any non-const static variables.

Dark Shikari

15th December 2007, 21:10

Bleh, fixed code.

Not bit-equivalent to the old one, but close enough, and faster.

Index: encoder/encoder.c
===================================================================
--- encoder/encoder.c (revision 712)
+++ encoder/encoder.c (working copy)
@@ -472,6 +472,8 @@
if( !h->param.b_cabac )
h->param.analyse.i_trellis = 0;
h->param.analyse.i_trellis = x264_clip3( h->param.analyse.i_trellis, 0, 2 );
+ if( h->param.analyse.b_aq && h->param.analyse.f_aq_strength <= 0 )
+ h->param.analyse.b_aq = 0;
h->param.analyse.i_noise_reduction = x264_clip3( h->param.analyse.i_noise_reduction, 0, 1<<16 );

{
Index: encoder/analyse.c
===================================================================
--- encoder/analyse.c (revision 712)
+++ encoder/analyse.c (working copy)
@@ -29,6 +29,7 @@
#endif

#include "common/common.h"
+#include "common/cpu.h"
#include "macroblock.h"
#include "me.h"
#include "ratecontrol.h"
@@ -2031,8 +2032,51 @@
}
}

+//Finds the total AC energy of the block in all planes.
+static int ac_energy_mb(x264_t *h)
+{
+    DECLARE_ALIGNED( static uint8_t, zero[16], 16 );
+    int sad,ssd;
+    int totalSSD = 0;
+    sad = h->pixf.sad[PIXEL_16x16](zero,0,h->mb.pic.p_fenc[0],FENC_STRIDE);
+    ssd = h->pixf.ssd[PIXEL_16x16](zero,0,h->mb.pic.p_fenc[0],FENC_STRIDE);
+    totalSSD += ssd - ((sad * sad) >> 8);
+    sad = h->pixf.sad[PIXEL_8x8](zero,0,h->mb.pic.p_fenc[1],FENC_STRIDE);
+    ssd = h->pixf.ssd[PIXEL_8x8](zero,0,h->mb.pic.p_fenc[1],FENC_STRIDE);
+    totalSSD += ssd - ((sad * sad) >> 6);
+    sad = h->pixf.sad[PIXEL_8x8](zero,0,h->mb.pic.p_fenc[2],FENC_STRIDE);
+    ssd = h->pixf.ssd[PIXEL_8x8](zero,0,h->mb.pic.p_fenc[2],FENC_STRIDE);
+    totalSSD += ssd - ((sad * sad) >> 6);
+    return totalSSD;
+}

/*****************************************************************************
+ * x264_adaptive_quant:
+ * check if mb is "flat", i.e. has most energy in low frequency components, and
+ * adjust qp down if it is
+ *****************************************************************************/
+void x264_adaptive_quant( x264_t *h, x264_mb_analysis_t *a )
+{
+    int qp = h->mb.i_qp;
+    int ac_energy = ac_energy_mb(h);
+    x264_cpu_restore(h->param.cpu);
+    float result = ac_energy;
+    const float expconst = 0.367879441;
+    float threshold = powf(h->param.analyse.f_aq_sensitivity,4)/2;
+    if(result < threshold)
+    {
+        if(result == 0) result = 1;
+        else
+            result = (expconst-expf(-powf(threshold/result,0.2))) * 2.71828183;
+    }
+    else result = 0;
+    int qp_adj = (qp * result * h->param.analyse.f_aq_strength) / 2;
+    qp_adj = x264_clip3(qp_adj, 0, qp/2);
+    h->mb.i_qp = a->i_qp = qp - qp_adj;
+    h->mb.i_chroma_qp = i_chroma_qp_table[x264_clip3( h->mb.i_qp + h->pps->i_chroma_qp_index_offset, 0, 51 )];
+}
+
+/*****************************************************************************
* x264_macroblock_analyse:
*****************************************************************************/
void x264_macroblock_analyse( x264_t *h )
@@ -2040,9 +2084,14 @@
x264_mb_analysis_t analysis;
int i_cost = COST_MAX;
int i;
+
+ h->mb.i_qp = x264_ratecontrol_qp( h );

+ if( h->param.analyse.b_aq )
+ x264_adaptive_quant( h, &analysis );
+
/* init analysis */
- x264_mb_analyse_init( h, &analysis, x264_ratecontrol_qp( h ) );
+ x264_mb_analyse_init( h, &analysis, h->mb.i_qp );

/*--------------------------- Do the analysis ---------------------------*/
if( h->sh.i_type == SLICE_TYPE_I )
Index: x264.c
===================================================================
--- x264.c (revision 712)
+++ x264.c (working copy)
@@ -243,6 +243,12 @@
" - 2: enabled on all mode decisions\n", defaults->analyse.i_trellis );
H0( " --no-fast-pskip Disables early SKIP detection on P-frames\n" );
H0( " --no-dct-decimate Disables coefficient thresholding on P-frames\n" );
+ H0( " --aq-strength <float> Amount to adjust QP per MB [%.1f]\n"
+ " 0.0: no AQ\n"
+ " 1.1: strong AQ\n", defaults->analyse.f_aq_strength );
+ H0( " --aq-sensitivity <float> \"Flatness\" threshold to trigger AQ [%.1f]\n"
+ " 5: applies to almost no blocks\n"
+ " 35: applies to almost all blocks\n", defaults->analyse.f_aq_sensitivity );
H0( " --nr <integer> Noise reduction [%d]\n", defaults->analyse.i_noise_reduction );
H1( "\n" );
H1( " --deadzone-inter <int> Set the size of the inter luma quantization deadzone [%d]\n", defaults->analyse.i_luma_deadzone[0] );
@@ -406,6 +412,8 @@
{ "trellis", required_argument, NULL, 't' },
{ "no-fast-pskip", no_argument, NULL, 0 },
{ "no-dct-decimate", no_argument, NULL, 0 },
+ { "aq-strength", required_argument, NULL, 0 },
+ { "aq-sensitivity", required_argument, NULL, 0 },
{ "deadzone-inter", required_argument, NULL, '0' },
{ "deadzone-intra", required_argument, NULL, '0' },
{ "level", required_argument, NULL, 0 },
Index: common/pixel.c
===================================================================
--- common/pixel.c (revision 712)
+++ common/pixel.c (working copy)
@@ -213,6 +213,14 @@
PIXEL_SATD_C( x264_pixel_satd_4x8, 4, 8 )
PIXEL_SATD_C( x264_pixel_satd_4x4, 4, 4 )

+static int x264_pixel_count_8x8( uint8_t *pix, int i_pix, uint32_t threshold )
+{
+ int x, y, sum = 0;
+ for( y=0; y<8; y++, pix += i_pix )
+ for( x=0; x<8; x++ )
+ sum += pix[x] > (uint8_t)threshold;
+ return sum;
+}

/****************************************************************************
* pixel_sa8d_WxH: sum of 8x8 Hadamard transformed differences
@@ -473,6 +481,8 @@
pixf->ads[PIXEL_16x8] = pixel_ads2;
pixf->ads[PIXEL_8x8] = pixel_ads1;

+ pixf->count_8x8 = x264_pixel_count_8x8;
+
#ifdef HAVE_MMX
if( cpu&X264_CPU_MMX )
{
Index: common/pixel.h
===================================================================
--- common/pixel.h (revision 712)
+++ common/pixel.h (working copy)
@@ -84,6 +84,8 @@
void (*ads[7])( int enc_dc[4], uint16_t *sums, int delta,
uint16_t *res, int width );

+ int (*count_8x8)( uint8_t *pix, int i_pix, uint32_t threshold );
+
/* calculate satd of V, H, and DC modes.
* may be NULL, in which case just use pred+satd instead. */
void (*intra_satd_x3_16x16)( uint8_t *fenc, uint8_t *fdec, int res[3] );
Index: common/common.c
===================================================================
--- common/common.c (revision 712)
+++ common/common.c (working copy)
@@ -123,6 +123,9 @@
param->analyse.i_chroma_qp_offset = 0;
param->analyse.b_fast_pskip = 1;
param->analyse.b_dct_decimate = 1;
+ param->analyse.b_aq = 0;
+ param->analyse.f_aq_strength = 0.0;
+ param->analyse.f_aq_sensitivity = 15;
param->analyse.i_luma_deadzone[0] = 21;
param->analyse.i_luma_deadzone[1] = 11;
param->analyse.b_psnr = 1;
@@ -455,6 +458,13 @@
p->analyse.b_fast_pskip = atobool(value);
OPT("dct-decimate")
p->analyse.b_dct_decimate = atobool(value);
+ OPT("aq-strength")
+ {
+ p->analyse.f_aq_strength = atof(value);
+ p->analyse.b_aq = (p->analyse.f_aq_strength > 0.0);
+ }
+ OPT("aq-sensitivity")
+ p->analyse.f_aq_sensitivity = atof(value);
OPT("deadzone-inter")
p->analyse.i_luma_deadzone[0] = atoi(value);
OPT("deadzone-intra")
@@ -939,6 +949,9 @@
s += sprintf( s, " zones" );
}

+ if( p->analyse.b_aq )
+ s += sprintf( s, " aq=1:%.1f:%.1f", p->analyse.f_aq_strength, p->analyse.f_aq_sensitivity );
+
return buf;
}

Index: x264.h
===================================================================
--- x264.h (revision 712)
+++ x264.h (working copy)
@@ -230,6 +230,9 @@
int i_trellis; /* trellis RD quantization */
int b_fast_pskip; /* early SKIP detection on P-frames */
int b_dct_decimate; /* transform coefficient thresholding on P-frames */
+ int b_aq; /* psy adaptive QP */
+ float f_aq_strength;
+ float f_aq_sensitivity;
int i_noise_reduction; /* adaptive pseudo-deadzone */

/* the deadzone size that will be used in luma quantization */

EXE updated.

LigH

15th December 2007, 21:41

Hooray!

Thanks for this patch. I bet some friends in the german board will test it too.

Sagekilla

16th December 2007, 00:39

So now the general starting point for using AQ-strength would be 1.0 and not 0.5 as before? If so that would be quite nice, since I'd imagine it'd give me some more leeway with only using tiny amounts of AQ in those pesky movies where there are relatively few dark scenes.

Dark Shikari

16th December 2007, 01:47

I found a bug with my energy function where I get an integer overflow in some extremely bright blocks, resulting in AQ not being activated even if the block is flat. A fix will come in a bit.

I'm also working on a magical algorithm to automatically find a good threshold value for each frame. :)
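
One plausible source of the overflow mentioned above, judging from the patch posted earlier in the thread (an inference, not something the post confirms): the luma term squares the SAD in a signed `int`. For a 16x16 block the SAD against zero can reach 255 * 256 = 65280, and its square exceeds INT_MAX once the average pixel value passes roughly 181. The sketch below does the squaring in 64 bits instead; `ac_energy_plane` and its interface are hypothetical and may differ from the fix that was actually shipped.

```c
#include <assert.h>
#include <stdint.h>

/* AC energy of one plane from its sum of pixels (SAD against zero) and
 * its sum of squared pixels (SSD against zero). shift is log2 of the
 * pixel count: 8 for a 16x16 block, 6 for an 8x8 block. */
uint32_t ac_energy_plane(uint32_t sad, uint32_t ssd, int shift)
{
    assert(shift >= 0 && shift < 32);
    /* Square in 64 bits so very bright blocks cannot wrap around. */
    uint64_t mean_sq = ((uint64_t)sad * sad) >> shift;
    /* SSD is never smaller than SAD^2 / N for real pixel data; the guard
     * only protects against inconsistent inputs. */
    return ssd > mean_sq ? (uint32_t)(ssd - mean_sq) : 0;
}
```

With a flat, fully white 16x16 block (sad = 65280, ssd = 16646400) this returns 0, which is the case a 32-bit signed multiply gets wrong.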

Sagekilla

16th December 2007, 02:10

Could this be magically added to a multi-patched x264 with all your other wonderful patches too? :)

Dark Shikari

16th December 2007, 02:15

Soon. In the meantime, a teaser of the latest algorithm:

(1-pass ABR, 1000 kbit, a comparison of two I-frames)

Original:

http://i6.tinypic.com/72riyc7.png
http://i2.tinypic.com/6lb7k2g.png

AQ:

http://i4.tinypic.com/8ftnotu.png
http://i17.tinypic.com/8borxbp.png

Note most of the ringing is from the original source, which was not a very well-encoded DVD (and so blurring obscures the ringing when AQ isn't used).

If you want a huge contrast between the two, look at the wheel in the background on the first image. Or the bricks in the background on the second image.

Sagekilla

16th December 2007, 02:16

Very nice, some good detail retention in areas that I'd imagine would otherwise be killed off..

kumi

16th December 2007, 04:38

I see a huge difference in CRF output size with --aq-sensitivity 0. Normal?

--crf 21.5
Size: 9.99 MB
Bitrate (Avg): 1.169

--crf 21.5 --aq-str 1.0
Size: 8.12 MB
Bitrate (Avg): 0.950

--crf 21.5 --aq-str 1.0 --aq-sens 0
Size: 36.8 MB
Bitrate (Avg): 4.309

Sagekilla

16th December 2007, 04:41

Perhaps it may be borked with 0, like one of those divide by zero errors.

kumi

16th December 2007, 04:56

Yes, but
"2. For the automagic thresholding algorithm, use --aq-sensitivity 0."

Dark Shikari

16th December 2007, 05:11

Try with bitrate mode to make the results more comparable. It's likely screwing up ratecontrol--I will try to see what I can do to make it avoid blowing up the filesize.

That is natural though--AQ does drastically raise filesize with CRF. It's just that in this case it's raised it a bit more than usual.

check

16th December 2007, 05:43

what do you get with a sensitivity very near 1?

Sagekilla

16th December 2007, 05:43

I have to say, I find your AQ to be quite interesting.. I actually got a -huge- reduction in bitrate when I used it. 1738 kbps without vs 1475 kbps with, in one of my tests. That was using a simple --aq-strength 0.5 --aq-sensitivity 15.

@Check: At that point I think it'd be still running at the regular non-adaptive sensitivity so it would activate on very few blocks according to what the help says (low aq = less blocks activated on, high aq = more blocks activated on)

Dark Shikari

16th December 2007, 06:17

0 = adaptive, any other value = regular scheme.

It's possible adaptive could be a bit too strong by default, so experiment with lower strength values (and I could experiment with slightly better adaptive schemes).

Sagekilla

16th December 2007, 06:41

It seems like it, because I tried the adaptive mode (wouldn't that make it... adaptive adaptive quantization?) myself and I ended up with a severely bloated file compared to having it at the default sensitivity of 15. Personally I like how it decreases the file sizes while not really decreasing the quality at all, so that's just about reason enough for me to just go with strength 0.9 and sensitivity 15.

Dark Shikari

16th December 2007, 11:18

I found some serious problems with automagic thresholding--it was consistently overestimating the necessary threshold.

As a result, I implemented a much more brute-force (and as a result slightly slower) algorithm that should be able to find a better threshold. Try it out--unlike before, it shouldn't screw up ratecontrol.

Strength has also been moved to a different part of the formula for easier control over the results of the algorithm.

The goal of this latest algorithm is to keep the average QP per frame the same. This, in most cases, keeps the bits per frame relatively similar, which means AQ should no longer drastically increase or decrease bitrate at a given CRF/QP.

kumi

16th December 2007, 11:41

Great! Can't wait to test :D

ToS_Maverick

16th December 2007, 13:23

what i found out about 0.3 with BlackPearl:
- --aq-strength 1.0 --aq-sensitivity 20 and CRF20 is transparent
- your new AQ produces bigger files, but the quality is better than ever!
- str 1.0 is very balanced
- below sens 20 some areas get left behind
- auto mode (sens 0) is producing heavily undersized files
- --aq-strength 1.0 --aq-sensitivity 20 and CRF20 = --aq-strength 1.0 and CRF15, about the same size and quality (only a small difference)
- SSIM is the same (CRF20 with AQ compared to CRF17 without, same size)
- PSNR is 1 dB lower (OMG ;))

why is your adaptive-mode acting so weird? what is it supposed to do?

Sagekilla

16th December 2007, 16:43

The "adaptive mode" for adaptive quantization is supposed to dynamically choose the best sensitivity for each frame, so it can change the qps accordingly, or that's what it seems to be doing from what I can infer.

Dark Shikari

16th December 2007, 21:07

And it defines "best" as the sensitivity that results in the average QP for that frame not changing--i.e. if it raises 20 QPs by 5, it also has to lower other QPs by a total of 100.
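
The balancing idea can be sketched as a brute-force search: try each candidate threshold and keep the one whose QP offsets sum closest to zero over the frame. Everything here is an assumption made for illustration, in particular the toy per-block rule in `toy_qp_offset` (lower QP by 2 below the threshold, raise it by 1 otherwise), which is not the formula used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy per-block rule (hypothetical): flat blocks get a lower QP,
 * everything else gets a slightly higher one. */
static int toy_qp_offset(int energy, int threshold)
{
    return energy < threshold ? -2 : 1;
}

/* Net QP change over a frame for one candidate threshold. */
long net_qp_change(const int *energy, size_t n_blocks, int threshold)
{
    long total = 0;
    for (size_t i = 0; i < n_blocks; i++)
        total += toy_qp_offset(energy[i], threshold);
    return total;
}

/* Brute force: return the candidate whose net QP change is closest to 0. */
int balanced_threshold(const int *energy, size_t n_blocks,
                       const int *candidates, size_t n_candidates)
{
    assert(n_candidates > 0);
    int best = candidates[0];
    long best_err = labs(net_qp_change(energy, n_blocks, best));
    for (size_t i = 1; i < n_candidates; i++) {
        long err = labs(net_qp_change(energy, n_blocks, candidates[i]));
        if (err < best_err) {
            best_err = err;
            best = candidates[i];
        }
    }
    return best;
}
```

With block energies 10, 20, 30, 40, 50, 60 and candidates 15, 25, 35, 45, the threshold 25 lowers two blocks by 2 and raises four by 1, for a net change of zero, so it is the one selected.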

Sagekilla

16th December 2007, 21:38

Does this affect the qps of each block after x264 chooses a qp for a given frame or does this intermix with the qp decision to give the frame?

Dark Shikari

16th December 2007, 21:43

After, because x264's frame-QP decision is already based on the relative complexity of the frame.

Sagekilla

16th December 2007, 21:45

Gotcha, so in this case the new adaptive mode will just be adding and removing bits here and there without actually reducing or increasing the bitrate significantly?

Dark Shikari

16th December 2007, 22:48

Ideally, yes.

ToS_Maverick

16th December 2007, 23:22

then why does it lead to a massive undersize with this sample, while it actually should increase the bitrate?

and why are the final quants so low (15-17)?

Dark Shikari

16th December 2007, 23:28

Can you upload the .h264 stream so I can look at it?

ToS_Maverick

17th December 2007, 00:00

you should have the sample, try it with these settings:
--crf 20.0 --level 4.1 --keyint 100 --min-keyint 1 --ref 3 --mixed-refs --no-fast-pskip --bframes 2 --b-pyramid --bime --weightb --filter -2,-2 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --vbv-bufsize 9781 --vbv-maxrate 29400 --threads auto --thread-input --progress --no-dct-decimate --output "output" "input" --aq-strength 1.0

i can't upload it today, if you need it i'll upload it tomorrow!

thx and good night ;)

Dark Shikari

17th December 2007, 00:05

What sample? What do you mean, "you should have the sample"? That's not exactly descriptive... :rolleyes:

kumi

17th December 2007, 02:57

It seems the oversizing is a little better now, only +14% @ 0.9 strength. I haven't encountered any undersizing yet :rolleyes:

25.1 MB --crf 21.5
25.4 MB --crf 21.5 --aq-strength 0.3 --aq-sensitivity 0
26.7 MB --crf 21.5 --aq-strength 0.6 --aq-sensitivity 0
29.2 MB --crf 21.5 --aq-strength 0.9 --aq-sensitivity 0

I compared AQ on a 2% sample of bright, outdoor, well-shot prosumer SD DV movie. Scenes consist of lots of close-ups of people's faces talking, with lots of action in the background. No dark scenes at all.

25.1 MB --crf 21.5
vs
25.2 MB --crf 22.5 --aq-strength 0.9 --aq-sensitivity 0

Right away the most noticeable improvement is in the increased facial detail and skin tones. I mean HUGE improvement. Mosquito noise, blocking and banding are all much less visible. And not just the dark and/or detailed flat areas, either. Everywhere there is detail to bring out, like human hair, it seems to bring it out. In fact I can't find areas that look worse than before... where are the extra bits coming from?! This is voodoo magic! :eek:

If there's anything I would ask, it would be to speed it up a bit (if possible), and release a fast-ref-search/AQ binary :p But this is #$%@ing great work you've done here, thank you. :cool:

Dark Shikari

17th December 2007, 03:03

Right away the most noticeable improvement is in the increased facial detail and skin tones. I mean HUGE improvement. Mosquito noise, blocking and banding are all much less visible. And not just the dark and/or detailed flat areas, either. Everywhere there is detail to bring out, like human hair, it seems to bring it out. In fact I can't find areas that look worse than before... where are the extra bits coming from?! This is voodoo magic! :eek:

It takes the bits from the areas with the highest variance--a very sharp boundary with strong brightness differences, for example, would get bits taken away. I'm not sure if this would have a negative effect in anime--in live action any negative effect seems to be nearly invisible.

One thing you'll find when looking at bit distribution of non-AQ encodes is that often the vast majority of the bits are concentrated in very small areas; one can easily take a few away without there being much noticeable difference.
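As a rough illustration of "take bits from the highest-variance areas", here is a minimal Python sketch. The log-variance mapping, the `qp_offsets` name, and the test blocks are all assumptions made for illustration; this is not the formula used in the AQ patch.

```python
import math

def block_variance(block):
    """Population variance of a flat list of luma samples."""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n

def qp_offsets(blocks, strength=0.5):
    """Hypothetical mapping (NOT the patch's formula): shift each block's QP
    by how far its log-variance sits from the frame average."""
    log_var = [math.log2(block_variance(b) + 1.0) for b in blocks]
    avg = sum(log_var) / len(log_var)
    return [strength * (lv - avg) for lv in log_var]

# Three made-up 16x16 blocks (256 luma samples each).
flat = [128] * 256                 # perfectly flat area
textured = [126, 130] * 128        # faint texture, like grain or grass
edge = [16] * 128 + [235] * 128    # hard black/white boundary

offsets = qp_offsets([flat, textured, edge])
for name, off in zip(("flat", "textured", "edge"), offsets):
    # Negative offset = lower QP = more bits; positive = fewer bits.
    print(f"{name:9s} QP offset {off:+.2f}")
```

In this toy model the flat and lightly textured blocks get a negative offset (more bits) and the hard edge pays for it with a positive one, which is the direction described above.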

Sagekilla

17th December 2007, 03:40

Everywhere there is detail to bring out, like human hair, it seems to bring it out. In fact I can't find areas that look worse than before... where are the extra bits coming from?! This is voodoo magic! :eek:

Voodoo magic? No.. This.. Is.. SPARTA!!

@Dark Shikari: If it were to be that way, wouldn't it be a good idea to use the mode you're using right now as a "real life" mode, and then use a different type of AQ as an "anime" mode? Because, I do encode a few anime sources where I do need to use AQ, and if the new AQ will harm anime then I really think you should consider adding a switch to choose anime/real life mode or something to that effect.

Sharktooth

17th December 2007, 04:01

why don't you try the new AQ on your anime, see it with your eyes and report back?
it would be really useful info...

Dark Shikari

17th December 2007, 04:29

why don't you try the new AQ on your anime, see it with your eyes and report back?
it would be really useful info...

I've tried it, and it's hard to tell. It really does salvage some detail, much like in ordinary encodes, but I'm really not that sure about it.

Sharktooth

17th December 2007, 15:57

... it was directed to sagekilla ...

i know you probably tested it on anime too, but a second POV would be useful...

ToS_Maverick

17th December 2007, 19:55

Dark Shikari, i meant the BlackPearlSample, my main testsample ;)

from your screens i could see you still got it, anyway, i posted the link here:
http://forum.doom9.org/showthread.php?p=1028047#post1028047

i used this script:
DGDecode_mpeg2source("Black.Pearl.Sample.d2v", idct=7)
trim(2,0)
crop(0,58,0,-62)
LanczosResize(768,320)

crf 15 --aq-strength 1.0:
--[NoImage] Job commandline: "C:\Programme\megui\tools\x264\x264.exe" --crf 15.0 --level 4.1 --keyint 100 --min-keyint 1 --ref 3 --mixed-refs --no-fast-pskip --bframes 2 --b-pyramid --bime --weightb --filter -2,-2 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --vbv-bufsize 9781 --vbv-maxrate 29400 --threads auto --thread-input --progress --no-dct-decimate --output "F:\Video\BlackPearl\Black.Pearl.Sample crf 15 newaq10.mkv" "F:\Video\BlackPearl\Black.Pearl.Sample.avs" --aq-strength 1.0
--[Information] [16.12.2007 13:08:58] Encoding started
--[NoImage] Standard output stream
--[NoImage] Standard error stream
---[NoImage] avis [info]: 768x320 @ 23.98 fps (3622 frames)
---[NoImage] x264 [info]: using cpu capabilities: MMX MMXEXT SSE SSE2 SSE3 SSSE3 Cache64
---[NoImage] x264 [info]: slice I:98 Avg QP:14.66 size: 33513 PSNR Mean Y:46.78 U:48.70 V:49.63 Avg:47.40 Global:46.44
---[NoImage] x264 [info]: slice P:1955 Avg QP:16.93 size: 16632 PSNR Mean Y:44.88 U:47.55 V:48.51 Avg:45.69 Global:45.52
---[NoImage] x264 [info]: slice B:1569 Avg QP:18.58 size: 6651 PSNR Mean Y:43.48 U:46.79 V:47.66 Avg:44.39 Global:44.23
---[NoImage] x264 [info]: mb I I16..4: 14.4% 24.3% 61.2%
---[NoImage] x264 [info]: mb P I16..4: 8.8% 21.7% 13.7% P16..4: 21.7% 21.7% 9.9% 0.0% 0.0% skip: 2.5%
---[NoImage] x264 [info]: mb B I16..4: 2.1% 5.5% 2.1% B16..8: 42.6% 3.6% 8.0% direct:14.7% skip:21.4%
---[NoImage] x264 [info]: 8x8 transform intra:47.9% inter:29.8%
---[NoImage] x264 [info]: ref P 74.8% 17.5% 7.7%
---[NoImage] x264 [info]: ref B 79.5% 16.6% 3.9%
---[NoImage] x264 [info]: SSIM Mean Y:0.9818539
---[NoImage] x264 [info]: PSNR Mean Y:44.322 U:47.254 V:48.177 Avg:45.170 Global:44.935 kb/s:2448.41
---[NoImage] encoded 3622 frames, 31.19 fps, 2448.63 kb/s

crf 20 --aq-strength 1.0 --aq-sensitivity 20
--[NoImage] Job commandline: "C:\Programme\megui\tools\x264\x264.exe" --crf 20.0 --level 4.1 --keyint 100 --min-keyint 1 --ref 3 --mixed-refs --no-fast-pskip --bframes 2 --b-pyramid --bime --weightb --filter -2,-2 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --vbv-bufsize 9781 --vbv-maxrate 29400 --threads auto --thread-input --progress --no-dct-decimate --output "F:\Video\BlackPearl\Black.Pearl.Sample crf 20 newaq10 sens10.mkv" "F:\Video\BlackPearl\Black.Pearl.Sample.avs" --aq-strength 1.0 --aq-sensitivity 20
--[Information] [16.12.2007 11:58:52] Encoding started
--[NoImage] Standard output stream
--[NoImage] Standard error stream
---[NoImage] avis [info]: 768x320 @ 23.98 fps (3622 frames)
---[NoImage] x264 [info]: using cpu capabilities: MMX MMXEXT SSE SSE2 SSE3 SSSE3 Cache64
---[NoImage] x264 [info]: slice I:98 Avg QP:19.66 size: 28360 PSNR Mean Y:45.00 U:47.60 V:48.59 Avg:45.74 Global:43.20
---[NoImage] x264 [info]: slice P:1955 Avg QP:21.93 size: 15696 PSNR Mean Y:44.31 U:47.18 V:48.16 Avg:45.15 Global:44.86
---[NoImage] x264 [info]: slice B:1569 Avg QP:23.58 size: 6291 PSNR Mean Y:42.89 U:46.36 V:47.24 Avg:43.83 Global:43.60
---[NoImage] x264 [info]: mb I I16..4: 15.5% 28.0% 56.5%
---[NoImage] x264 [info]: mb P I16..4: 8.2% 21.3% 14.6% P16..4: 22.7% 22.1% 8.9% 0.0% 0.0% skip: 2.3%
---[NoImage] x264 [info]: mb B I16..4: 2.3% 6.0% 2.1% B16..8: 43.3% 3.6% 7.1% direct:14.3% skip:21.2%
---[NoImage] x264 [info]: 8x8 transform intra:47.9% inter:29.6%
---[NoImage] x264 [info]: ref P 74.8% 17.6% 7.6%
---[NoImage] x264 [info]: ref B 79.4% 16.6% 4.0%
---[NoImage] x264 [info]: SSIM Mean Y:0.9807162
---[NoImage] x264 [info]: PSNR Mean Y:43.712 U:46.837 V:47.774 Avg:44.595 Global:44.223 kb/s:2294.88
---[NoImage] encoded 3622 frames, 32.24 fps, 2295.10 kb/s

for crf 20 --aq-strength 1.0 i get
918 kb/s
SSIM 0.969
which is way too low in size, SSIM and visual quality

Dark Shikari

17th December 2007, 19:58

for crf 20 --aq-strength 1.0 i get
918 kb/s
SSIM 0.969
which is way too low in size, SSIM and visual quality

How about you compare two different videos, at the same bitrate, visually?

ToS_Maverick

17th December 2007, 20:13

ok, i think i have to express myself a bit more clearly ;)

the crf 15 vid has 150 kb/s more bitrate than the crf 20 one, that's 6.5 %. for me, that's close enough. i compared them visually of course.

to give you an idea:
vid1=directshowsource("Black.Pearl.Sample crf 15 newaq10.mkv", audio=false).lanczosresize(1280,536)
vid2=directshowsource("Black.Pearl.Sample crf 20 newaq10 sens20.mkv", audio=false).lanczosresize(1280,536)
interleave(vid1,vid2)

normally, i predict the quality of my encodes with the average quant/ratefactor. now a sample, that is transparent at 20, suddenly needs 15. that's a bit weird for me.
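For reference, the bitrate gap quoted above can be checked directly against the two x264 logs. A small Python sketch; the variable names are just labels for the two encodes, the kb/s figures are taken from the logs as posted.

```python
# kb/s figures from the final "PSNR Mean ... kb/s" lines of the two logs.
crf15_aq10_kbps = 2448.41          # --crf 15 --aq-strength 1.0
crf20_aq10_sens20_kbps = 2294.88   # --crf 20 --aq-strength 1.0 --aq-sensitivity 20

diff_kbps = crf15_aq10_kbps - crf20_aq10_sens20_kbps
relative_pct = diff_kbps / crf20_aq10_sens20_kbps * 100

print(f"difference: {diff_kbps:.2f} kb/s")
print(f"relative to the crf 20 encode: {relative_pct:.1f} %")
```

So the two encodes are within roughly 6-7 % of each other, close enough for a visual comparison at near-equal bitrate.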

Dark Shikari

17th December 2007, 20:23

ok, i think i have to express myself a bit more clearly ;)

the crf 15 vid has 150 kb/s more bitrate than the crf 20 one, that's 6.5 %. for me, that's close enough. i compared them visually of course.

to give you an idea:
vid1=directshowsource("Black.Pearl.Sample crf 15 newaq10.mkv", audio=false).lanczosresize(1280,536)
vid2=directshowsource("Black.Pearl.Sample crf 20 newaq10 sens20.mkv", audio=false).lanczosresize(1280,536)
interleave(vid1,vid2)

normally, i predict the quality of my encodes with the average quant/ratefactor. now a sample, that is transparent at 20, suddenly needs 15. that's a bit weird for me.

I'll have to do some testing with this to see why the bitrate is changing so much--whether it's the fact that it doesn't need all that bitrate when AQ is applied, or whether AQ is just being applied unevenly.

I might have found the problem. It might be because of the black on the bottom and the top, the letterbox padding--this is counted as "flat" and so the formula screws up completely. But I'm not 100% sure about this... /goes back to testing.
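One way the letterbox hypothesis could play out, sketched in Python. This assumes (purely for illustration) that the AQ decision is made relative to a frame-wide average of log-variance; the actual patch may work differently, and the block contents below are made up.

```python
import math

def variance(samples):
    """Population variance of a flat list of luma samples."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / n

def mean_log_variance(blocks):
    """Frame-wide average of log2(variance + 1); a hypothetical AQ reference level."""
    return sum(math.log2(variance(b) + 1.0) for b in blocks) / len(blocks)

# Made-up frame: 12 picture blocks with moderate texture, 8 black letterbox blocks.
picture = [[100, 140] * 128 for _ in range(12)]
bars = [[16] * 256 for _ in range(8)]   # pure black padding, variance exactly 0

without_bars = mean_log_variance(picture)
with_bars = mean_log_variance(picture + bars)

print(f"reference level without bars: {without_bars:.2f}")
print(f"reference level with bars:    {with_bars:.2f}")
```

The zero-variance bars drag the reference level down, so under this assumption every real picture block looks "more complex than average" and would be pushed toward higher quantizers. Whether that is what actually happens in the patch is exactly the open question here.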

foxyshadis

17th December 2007, 22:50

Maybe the quant accounting is just off, since it'll average out to the same quant but a very different bit allocation. It probably doesn't really matter that much, forcing the same bitrate would require modifying the RC or accounting for how many bits every quant change adds or subtracts, a lot of work for questionable gain.
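To illustrate the point about "same average quant, very different bit allocation", here is a toy Python model. It assumes a block's bits scale as 2^(-ΔQP/6), a rule of thumb borrowed from the fact that the H.264 quantizer step doubles every 6 QP; real encodes only follow this loosely, and the baseline bit counts are invented.

```python
def relative_bits(qp_offset):
    """Assumed model: bits scale by 2^(-offset/6). A rule of thumb, not x264's RC."""
    return 2.0 ** (-qp_offset / 6.0)

def total_bits(baseline_bits, qp_offsets):
    """Total bits after applying a per-block QP offset under the assumed model."""
    return sum(b * relative_bits(o) for b, o in zip(baseline_bits, qp_offsets))

# Two blocks get QP-3 (more bits), two get QP+3 (fewer bits): average offset is 0.
offsets = [-3, -3, 3, 3]
average_offset = sum(offsets) / len(offsets)

even = [100, 100, 100, 100]   # bits spread evenly before AQ
skewed = [10, 10, 190, 190]   # most bits sit in the blocks that get penalized

even_before, even_after = sum(even), total_bits(even, offsets)
skewed_before, skewed_after = sum(skewed), total_bits(skewed, offsets)

print(f"average QP offset: {average_offset}")
print(f"even frame:   {even_before} -> {even_after:.0f} bits")
print(f"skewed frame: {skewed_before} -> {skewed_after:.0f} bits")
```

With the average QP unchanged, the evenly spread frame grows while the skewed frame shrinks, so under this model both oversizing and undersizing can show up depending on where the bits were concentrated to begin with.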

Gilgamesh83

17th December 2007, 23:49

Hi!

Was just wondering about what the aq does in a codec,
does it:

a) pull bitrate from dynamic parts of the frame to non dynamic part of the frame?

b) pull bitrate from brighter parts of the frame to darker parts of the frame? (does it have to do with the fact that codecs give less bitrate in darker areas with the same quantizer in bright areas? or something like that.)

again a quick answer is ok since im a noob or rather, I know nothing about programming but am an avid user of x264 in megui.
I have always imagined that aq could in a frame have different quantizers to e.g. only a small dynamic part of a frame. (like when in an anime there is a frame that is still but has a tv that shows static, that frame would still get a great amount of bitrate cause of the dynamics in the static tv part.)

Dark Shikari

18th December 2007, 01:38

a) pull bitrate from dynamic parts of the frame to non dynamic part of the frame?

This would be some sort of motion-based AQ. I've never seen one, personally.

b) pull bitrate from brighter parts of the frame to darker parts of the frame? (does it have to do with the fact that codecs give less bitrate in darker areas with the same quantizer in bright areas? or something like that.)

That's called brightness-based AQ. Elecard supports this, and regular x264 AQ, though not brightness-based, is thresholded by brightness.

Regular x264 AQ finds the flattest parts of the frame (least complex) and gives them more bits. Mine is somewhat similar, but does it using different math and is much more willing to move bits around.