x264 MMXEXT patch for 8x8dct transform


bond
25th August 2005, 13:01
just saw this on the x264 mailing list:
Hi,

attached is a patch based on rev. 287 that implements
* x264_sub8x8_dct8_mmxext
* x264_add8x8_idct8_mmxext
which are 3.3 and 4.0 times faster than their C counterparts
(respectively) on my AthlonXP.

Of course they produce bit-identical output compared to the C
implementation, and the overall speed gain was 2.23% for my non-synthetic
test inputs. An SSE2-optimized version is also possible (only a typing
exercise) but will only result in a minor speedup (estimated 3.8/4.5
times faster than C) since only a few parts can be optimized for SSE2.

So far it assembles with nasm. I haven't tested it for other assemblers.

btw. I noticed that the C sub8x8_dct8 isn't exactly the inverse of
add8x8_idct8. I wonder if this is really intended (to add
compressibility with quant/dequant perhaps) or just a bug.

regards,
Christian

http://people.via.ecp.fr/~admin/20050825-videolan/x264-dct8-idct8-mmxext.diff

enjoy
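
On the question in the quoted mail about sub8x8_dct8 not being the exact inverse of add8x8_idct8: one likely reason is that the H.264 8x8 integer transform is orthogonal but not orthonormal, so the forward and inverse passes only cancel once a per-coefficient scale is applied, and H.264 folds that scale into quant/dequant. The 1-D sketch below illustrates this. The matrix T8 is the 8x8 integer transform as I remember it from the H.264 High profile, not something taken from the patch or from x264's source, so treat it as an assumption.

```python
# 1-D illustration only. T8 is assumed to be the H.264 8x8 integer
# transform matrix (written from memory, not copied from x264).
T8 = [
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3],
]

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Gram matrix T8 * T8^T: off-diagonal zeros mean the rows are orthogonal.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in T8] for r1 in T8]
is_orthogonal = all(gram[i][j] == 0
                    for i in range(8) for j in range(8) if i != j)

# Squared row norms are the diagonal of the Gram matrix.
row_norms = [gram[i][i] for i in range(8)]

# Forward transform followed by the plain transpose, with no scaling.
x = [1, 2, 3, 4, 5, 6, 7, 8]
roundtrip = matvec(transpose(T8), matvec(T8, x))

# If the rows all had the same norm, roundtrip would be a constant times x.
is_scaled_copy = all(r * x[0] == roundtrip[0] * xi
                     for r, xi in zip(roundtrip, x))

print("rows orthogonal:", is_orthogonal)
print("distinct squared row norms:", sorted(set(row_norms)))
print("roundtrip is a scaled copy of the input:", is_scaled_copy)
```

Because the rows have three different norms, no single shift or constant turns the unscaled roundtrip back into the input. That fits the transform pair being intentionally asymmetric rather than buggy, but it does not prove anything about the specific C code in rev. 287.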

Sharktooth
25th August 2005, 13:28
on my way ;)

bond
25th August 2005, 13:33
:thanks:

squid_80
25th August 2005, 13:58
Can someone actually check if this new build is faster? Seems weird that optimizing functions that use less than 1% of encoding time would yield an overall gain of 2.2% :confused:

akupenguin
25th August 2005, 16:43
It depends on settings. With RD, dct8 is performed about 2.5 times per MB. Without RD, dct8 is performed only during the final encode, about .5 times per MB. I do see 2% speedup at -m1 or -m6, and less than 1% at -m5.
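
The arithmetic behind the question above can be sketched with Amdahl's law. This is a rough model, not a profile of x264: it assumes a single local speedup of 3.3x (the mail quotes 3.3x and 4.0x for the two functions), and the function names are made up for the illustration.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the
    runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

def fraction_needed(overall, local_speedup):
    """Inverse of the above: the share of runtime that must be sped up
    by `local_speedup` to reach a whole-program speedup of `overall`."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local_speedup)

# If dct8/idct8 really took only 1% of encoding time:
gain_at_1_percent = (overall_speedup(0.01, 3.3) - 1.0) * 100.0

# Runtime share needed to explain the reported 2.23% overall gain:
share_for_reported_gain = fraction_needed(1.0223, 3.3) * 100.0

print(f"gain if dct8 is 1% of runtime: {gain_at_1_percent:.2f}%")
print(f"share needed for a 2.23% gain: {share_for_reported_gain:.1f}%")
```

Under these assumptions a 1% share gives a gain of roughly 0.7%, and a 2.23% gain needs the transforms to take roughly 3% of the runtime. So the two numbers only fit together if dct8 is called more often than the 1% estimate implies, which is what the RD settings do.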

Sirber
25th August 2005, 17:03
Cool! :D

vortex_hl
25th August 2005, 21:58
I noticed a 3.5% performance gain on my AthlonXP 3200+

Source:
loadplugin("C:\Program Files\AutoGK\DGMPGDec\dgdecode.dll")
loadplugin("C:\Program Files\AutoGK\filters\autocrop.dll")
mpeg2Source("C:\vts.d2v")
autocrop(mode=0)
lanczosresize(720,400)
trim(0,1800)

Without Patch:
x264 -B 2000 -b 3 -r 5 --b-pyramid -w -8 -m 6 --progress --no-psnr -o sd.mp4 sd.avs
avis [info]: 720x400 @ 25.00 fps (1801 frames)
x264 [info]: using cpu capabilities MMX MMXEXT SSE 3DNow!
mp4 [info]: initial delay 2 (scale 25)
x264 [info]: slice I:43 Avg QP:17.65 Avg size: 27744
x264 [info]: slice P:901 Avg QP:18.85 Avg size: 13387
x264 [info]: slice B:857 Avg QP:19.95 Avg size: 5168
x264 [info]: slice I Avg I4x4:14.1% I8x8:65.5% I16x16:20.4%
x264 [info]: slice P Avg I4x4:4.7% I8x8:19.5% I16x16:8.6% P:49.7% P8x8:5.3% PSKIP:12.3%
x264 [info]: slice B Avg I4x4:1.6% I8x8:2.6% I16x16:1.0% P:32.8% B:12.6% B8x8:4.3% DIRECT:6.7% BSKIP:38.4%
x264 [info]: 8x8 transform intra:59.1% inter:56.8%
x264 [info]: kb/s:1963.7

encoded 1801 frames, 4.76 fps, 1963.89 kb/s

With Patch:
x264 -B 2000 -b 3 -r 5 --b-pyramid -w -8 -m 6 --progress --no-psnr -o sd.mp4 sd.avs
avis [info]: 720x400 @ 25.00 fps (1801 frames)
x264 [info]: using cpu capabilities MMX MMXEXT SSE 3DNow!
mp4 [info]: initial delay 2 (scale 25)
x264 [info]: slice I:43 Avg QP:17.65 Avg size: 27744
x264 [info]: slice P:901 Avg QP:18.85 Avg size: 13387
x264 [info]: slice B:857 Avg QP:19.95 Avg size: 5168
x264 [info]: slice I Avg I4x4:14.1% I8x8:65.5% I16x16:20.4%
x264 [info]: slice P Avg I4x4:4.7% I8x8:19.5% I16x16:8.6% P:49.7% P8x8:5.3% PSKIP:12.3%
x264 [info]: slice B Avg I4x4:1.6% I8x8:2.6% I16x16:1.0% P:32.8% B:12.6% B8x8:4.3% DIRECT:6.7% BSKIP:38.4%
x264 [info]: 8x8 transform intra:59.1% inter:56.8%
x264 [info]: kb/s:1963.7

encoded 1801 frames, 4.93 fps, 1963.89 kb/s
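
For anyone repeating the comparison, the gain can be read straight off the two summary lines. A small sketch, assuming the "encoded N frames, X fps" format shown in the logs above (the helper name is made up):

```python
import re

def fps_from_summary(line):
    """Pull the fps figure out of an x264 summary line such as
    'encoded 1801 frames, 4.76 fps, 1963.89 kb/s'."""
    match = re.search(r"encoded (\d+) frames, ([\d.]+) fps", line)
    if match is None:
        raise ValueError("not an x264 summary line: %r" % line)
    return float(match.group(2))

without_patch = fps_from_summary("encoded 1801 frames, 4.76 fps, 1963.89 kb/s")
with_patch = fps_from_summary("encoded 1801 frames, 4.93 fps, 1963.89 kb/s")

# Relative throughput gain in percent.
gain_percent = (with_patch / without_patch - 1.0) * 100.0
print(f"{gain_percent:.1f}% faster")
```

That works out to about 3.6%. The frame sizes, QPs and bitrate are the same in both logs, which is consistent with the patch producing bit-identical output. Note that this is a single run per build, so some of the difference could be timing noise.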