
View Full Version : Current Patches, Where to get them, How they affect speed/output



ajp_anton
10th January 2009, 00:49
That explains why, when I batch-encoded a few hundred short clips a week ago, I would visit the encoding computer and find it had randomly crashed after (successfully) encoding a clip and needed a manual "OK, move on" =)

skystrife
14th January 2009, 05:50
x264.1077M.exe (http://www.mediafire.com/?ajqz4ettgam) - Alternate Download (http://skystrife.com/x264/x264.1077M.exe)

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)

gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1077M.x64.exe (http://www.mediafire.com/?2ahygyjd1zn) - Alternate Download (http://skystrife.com/x264/x264.1077M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff)

gcc 4.4.0 fprofiled build.

Audionut
14th January 2009, 08:02
x264-r1077.7z (http://rapidshare.com/files/183042373/x264-r1077.7z)


Patched with,
x264_custom_strtok_r.r1074.diff
x264_fix_stats_file_work.r1074.diff
x264_multithreading_bug_check.r1074.diff
x264_no_b_adapt_with_pre_scenecut.r1074.diff
x264_bm_error_memoryleaks.04.r1074.diff
x264_bm_thread_pool.02.r1074.diff
x264_thread_priority_with_pool.02.diff
x264_log_file.03k.diff
x264_hrd_pulldown.09_interlace.diff
x264_single_frame_flash.diff

roozhou
14th January 2009, 14:19
Hey, r1077 does not compile with MSVC since the linker fails to link log2f. Can anyone fix this or just tell me in which library I can link to log2f?

Gabriel_Bouvigne
14th January 2009, 16:20
#define log2f(a) (logf(a)/logf(2))

akupenguin
14th January 2009, 17:15
And r1076 does compile in MSVC? I thought I broke MSVC in r1066.

Dark Shikari
14th January 2009, 17:23
And r1065 does compile in MSVC? I thought I broke MSVC in r1060.

roozhou
14th January 2009, 18:20
#define log2f(a) (logf(a)/logf(2))
Thanks, now it compiles ok.
And this should run faster.

Index: common/osdep.h
===================================================================
--- common/osdep.h
+++ common/osdep.h
@@ -58,6 +58,7 @@
 #endif
 #if defined(_MSC_VER) || defined(SYS_SunOS) || defined(SYS_MACOSX)
 #define sqrtf sqrt
+#define log2f(x) (logf(x)/0.69314718055994530941f)
 #endif
 #ifdef _WIN32
 #define rename(src,dst) (unlink(dst), rename(src,dst)) // POSIX says that rename() removes the destination, but win32 doesn't.

Atak_Snajpera
14th January 2009, 18:49
+#define log2f(x) (logf(x)/0.69314718055994530941f)
I would use multiplication

+#define log2f(x) (logf(x)*1.44269504088896340736f)

Dark Shikari
14th January 2009, 19:00
I would use multiplication

Not as if it matters, as this code is only run 16384 times for each QP at most. If anybody cared, I would have made a LUT.

techouse
15th January 2009, 12:54
x264_x86_r1080_techouse (http://techouse.project357.com/builds/x264_x86_r1080_techouse.7z)
Source: x264 r1080 GIT (git://git.videolan.org/x264.git)

Applied patches (current versions):


x264_hrd_pulldown.09_interlace.diff

x264_win_zone_parse_fix_05.diff


Please check http://forum.doom9.org/showthread.php?t=130364 and http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog for more info

Compiled by techouse on January 15th 2009, 12:44:54 CET with GCC-4.3.2 on Windows Vista Business SP-1 64-bit.

Commandline used: ./configure --extra-cflags="-march=core2" && make fprofiled

Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
debug: no
gprof: no
PIC: no
shared: no
visualize: no

roozhou
15th January 2009, 14:40
Not as if it matters, as this code is only run 16384 times for each QP at most. If anybody cared, I would have made a LUT.

Will you define them as constants (64k larger executable) or calculate them at run-time?

Dark Shikari
15th January 2009, 15:55
Will you define them as constants (64k larger executable) or calculate them at run-time?

Runtime, of course. No point in wasting executable size on a huge table.

skystrife
24th January 2009, 03:31
x264.1088M.exe (http://www.mediafire.com/?zoiryjtrikf) - Alternate Download (http://skystrife.com/x264/x264.1088M.exe)

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)

gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1088M.x64.exe (http://www.mediafire.com/?xjdmzzwtte0) - Alternate Download (http://skystrife.com/x264/x264.1088M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
x264_win64_support.02.r1077.diff

gcc 4.4.0 fprofiled build.

Mr VacBob
24th January 2009, 20:11
Thanks, now it compiles ok.
And this should run faster.

The compiler is (probably) capable of this.

CruNcher
25th January 2009, 18:04
Has no one done a build yet with DaKaZ's threaded slice and lookahead patch?
I mean, forget most of the other patches; for a long time now those have been the most important ones :)

Audionut
25th January 2009, 18:09
So, I spoke too soon - Dark Shikari was able to break the patch on his first try :( Turns out that in b-adapt 2 mode, the slice type decide process takes too long, and all threads enter a deadlock state. To fix this, I am going to have to put the main x264_encoder_encode process into its own thread.... stay tuned.

DaKaZ


Perhaps it's better to wait a little while, don't you think!!

kemuri-_9
25th January 2009, 18:46
Has no one done a build yet with DaKaZ's threaded slice and lookahead patch?
I mean, forget most of the other patches; for a long time now those have been the most important ones :)

it's not finished yet...
A. Dark_Shikari broke it on the first try. (as previously mentioned above)
B. they were working on it this morning in the channel:

[11:28] <DaKaZ> I just successfully ran an encode with b-adpat 2 and lookahead ;)
[11:28] <DaKaZ> let me test a little more
[11:31] <DaKaZ> pengvado - thanks for the help... I think we have it!
[11:32] <DaKaZ> damn... seg fault :(


so it's still broken.
this patch is one of the patches that akupenguin/pengvado and Dark_Shikari are looking forward to the most... (the other being holger's)
it'll make the repository once it's tested to be safe.

CruNcher
25th January 2009, 19:08
Thanks for the info, kemuri-_9. I've read the logs now, and DaKaZ also replied. Great stuff if it finally makes its way into x264, and Holger's optimizations too, of course :)

Audionut
28th January 2009, 03:59
x264-r1090.7z (http://rapidshare.com/files/190455492/x264-r1090.7z)

Patched with,
x264_custom_strtok_r.r1089.diff
x264_fix_stats_file_work.r1089.diff
x264_multithreading_bug_check.r1089.diff
x264_no_b_adapt_with_pre_scenecut.r1089.diff
x264_bm_error_memoryleaks.04.r1089.diff
x264_bm_thread_pool.02.r1089.diff
x264_thread_priority_with_pool.02.diff
x264_log_file.03k.diff
x264_hrd_pulldown.09_interlace.diff

skystrife
29th January 2009, 00:07
x264.1093M.exe (http://www.mediafire.com/?yzywnyrhzgz) - Alternate Download (http://skystrife.com/x264/x264.1093M.exe)

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)

gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------

x264.1094M.x64.exe (http://skystrife.com/x264/x264.1094M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.06.r1093.diff

gcc 4.4.0 fprofiled build.

^-- Provided tentatively. fprofiling with two videos works fine, but with my usual three creates a bad build. There may still be bugs existent in this build--use at your own risk. (fprofiled with two videos, see Win64 x264 thread for more info)

techouse
2nd February 2009, 17:52
x264_x86_r1097_techouse (http://techouse.project357.com/builds/x264_x86_r1097_techouse.7z)
Source: x264 r1097 GIT (git://git.videolan.org/x264.git)

Applied patches (current versions):


x264_hrd_pulldown.09_interlace.diff

x264_win_zone_parse_fix_05.diff


Please check http://forum.doom9.org/showthread.php?t=130364 and http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog for more info

Compiled by techouse on February 2nd 2009, 17:18:03 CET with GCC-4.3.2 on Windows Vista Business SP-1 64-bit.

Commandline used: ./configure --extra-cflags="-march=core2" && make fprofiled

Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
debug: no
gprof: no
PIC: no
shared: no
visualize: no

techouse
5th February 2009, 22:59
x264_x86_r1101_techouse (http://techouse.digitalpulse.us/builds/x264_x86_r1101_techouse.7z)
Source: x264 r1101 GIT (git://git.videolan.org/x264.git)

Applied patches (current versions):


x264_hrd_pulldown.09_interlace.diff

x264_win_zone_parse_fix_05.diff


Please check http://forum.doom9.org/showthread.php?t=130364 and http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog for more info

Compiled by techouse on February 5th 2009, 22:48:17 CET with GCC-4.3.2 on Windows Vista Business SP-1 64-bit.

Commandline used: ./configure --extra-cflags="-march=core2" && make fprofiled

Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
debug: no
gprof: no
PIC: no
shared: no
visualize: no

ACrowley
6th February 2009, 09:54
I have the feeling that the latest builds are a little bit slower on my quad-core Q6600. CPU load in the 2nd pass is mostly not around 100%.
But I had full CPU load on all cores with slightly older builds.

Audionut
6th February 2009, 10:54
CPU load means nothing.

Don't rely on it, or your feeling. Measure it with some benchmarks.

Sharktooth
6th February 2009, 13:29
request: skystrife's updated builds ;)

Audionut
6th February 2009, 15:51
Can anyone help me to get configure working with the changes Loren made here http://git.videolan.org/?p=x264.git;a=commit;h=1df50b9287c83d5443d19482345b6842b78081c3

edit: how to get make to work in cygwin?

kemuri-_9
6th February 2009, 16:41
Can anyone help me to get configure working with the changes Loren made here http://git.videolan.org/?p=x264.git;a=commit;h=1df50b9287c83d5443d19482345b6842b78081c3

edit: how to get make to work in cygwin?

doesn't cygwin already have bash?
but either way, can always join #x264 on irc.freenode.net for help.

request: skystrife's updated builds ;)
until the win64 patch is updated again, can only get up to r1100 working in x64. (r1101 broke it).

skystrife
7th February 2009, 03:20
x264.1101M.exe (http://www.mediafire.com/?dejo1zijm2b) - Alternate Download (http://skystrife.com/x264/x264.1101M.exe)

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff

gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------

x264.1100M.x64.exe (http://www.mediafire.com/?dhmz4undgyw) - Alternate Download (http://skystrife.com/x264/x264.1100M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.06.r1093.diff

gcc 4.3.4 fprofiled build.

Audionut
7th February 2009, 10:58
http://komisar.gin.by/x.patch/BugMaster/x264_win64_support.07.r1096.diff

kemuri-_9
7th February 2009, 16:07
http://komisar.gin.by/x.patch/BugMaster/x264_win64_support.07.r1096.diff

Edit:
there's a newer win64 patch than that:
http://stashbox.org/392450/x264_win64_support.08.r1101.diff

MasterNobody
7th February 2009, 16:12
indeed that is the correct latest patch, since ver 06 broke with the asm additions with revision 1096.
The latest is x264_win64_support.08.r1101.diff (http://komisar.gin.by/x.patch/BugMaster/x264_win64_support.08.r1101.diff)

kemuri-_9
7th February 2009, 16:14
The latest is x264_win64_support.08.r1101.diff (http://komisar.gin.by/x.patch/BugMaster/x264_win64_support.08.r1101.diff)

yeah i didn't see it until i checked my x264-dev emails just now ;)

Edit:
i do want to point out that ver 06 broke with the asm additions of r1096, so that x64 build skystrife has could likely cause corruptions and crash.

skystrife
7th February 2009, 18:47
x264.1101M.x64.exe (http://www.mediafire.com/?njeloooqzlz) - Alternate Download (http://skystrife.com/x264/x264.1101M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.08.r1101.diff

gcc 4.3.4 fprofiled build.

komisar
7th February 2009, 23:02
version 1101 of x264 from Komisar (gcc 4.3.3 fprofiled build):
x264.1101.k_GIT.generic.x32.exe (http://komisar.gin.by/x264.1101.k_GIT.generic.x32.exe)
x264.1101.k_GIT.generic.x64.exe (http://komisar.gin.by/x264.1101.k_GIT.generic.x64.exe)

Patches for k_GIT:
x264_win64_support.08.r1101.diff
k.75.cross_compile.01.diff
01_x264_custom_strtok_r.r1089.diff
x264_hrd_pulldown.09_interlace.diff
x264_mingw_aligned_04.diff

(An explanation of my builds can be found at the bottom of my page)
(My cross-compile toolchain with gcc-4.3.3 can also be found here: tools (http://komisar.gin.by/tools/))

imk
8th February 2009, 09:54
Built with ICC v11.0.066 (with profiling):
x264.r1101M.SSE2.x32.imk.exe (http://imk.cx/pc/x264/x264.r1101M.SSE2.x32.imk.exe)
x264.r1101M.SSSE3.x32.imk.exe (http://imk.cx/pc/x264/x264.r1101M.SSSE3.x32.imk.exe)

x264.r1101M.SSE2.x64.imk.exe (http://imk.cx/pc/x264/x264.r1101M.SSE2.x64.imk.exe)
x264.r1101M.SSSE3.x64.imk.exe (http://imk.cx/pc/x264/x264.r1101M.SSSE3.x64.imk.exe)


Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_icc.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.08.r1101.diff (for the 64-bit build only)

burfadel
8th February 2009, 11:37
Hey imk, your SSSE3 and SSE2 versions are both pointed towards the SSE2 versions in the links. Changing the link manually to SSSE3 does show the correct file actually exists! :)

imk
8th February 2009, 12:26
Oops. Thanks. :) Corrected.

XhmikosR
8th February 2009, 12:42
I did a quick comparison. The results are here:

480p sample

start /high /b x264-1101 --pass 1 --progress --quiet --bitrate 1823 --stats "1.stats" --level 3.1 --bframes 3 --b-pyramid --direct auto --subme 1 --analyse none --threads auto --thread-input --me hex --no-dct-decimate --no-psnr --no-ssim --vbv-bufsize 14000 --vbv-maxrate 17500 --output NUL "test-480p.avs" --aud

start /high /b x264-1101 --pass 2 --progress --quiet --bitrate 1823 --stats "1.stats" --level 3.1 --ref 5 --mixed-refs --no-fast-pskip --bframes 3 --b-pyramid --weightb --direct auto --subme 7 --trellis 1 --analyse all --8x8dct --vbv-bufsize 14000 --vbv-maxrate 17500 --threads auto --thread-input --me hex --sar 427:360 --no-dct-decimate --no-psnr --no-ssim --output "../run1-480p.mkv" "test-480p.avs" --aud

Results for x264 v0.66.1101 [gcc 3.4.6-x264.nl]
encoded 1419 frames, 147.67 fps, 1835.23 kb/s
encoded 1419 frames, 149.35 fps, 1835.23 kb/s
encoded 1419 frames, 149.12 fps, 1835.23 kb/s
encoded 1419 frames, 148.88 fps, 1835.23 kb/s
encoded 1419 frames, 50.93 fps, 1802.78 kb/s
encoded 1419 frames, 50.67 fps, 1803.55 kb/s
encoded 1419 frames, 50.82 fps, 1804.16 kb/s
encoded 1419 frames, 50.82 fps, 1803.53 kb/s
====================================
Results for x264 v0.66.1101 [ICC11.SSSE3.x32-imk]
encoded 1419 frames, 153.39 fps, 1835.23 kb/s
encoded 1419 frames, 153.92 fps, 1835.23 kb/s
encoded 1419 frames, 153.65 fps, 1835.23 kb/s
encoded 1419 frames, 153.90 fps, 1835.23 kb/s
encoded 1419 frames, 52.67 fps, 1803.50 kb/s
encoded 1419 frames, 52.55 fps, 1803.16 kb/s
encoded 1419 frames, 52.67 fps, 1803.63 kb/s
encoded 1419 frames, 52.19 fps, 1803.60 kb/s


720p sample

start /high /b x264-1101 --quiet --progress --pass 1 --bitrate 3959 --stats "1.stats" --level 4.1 --keyint 24 --min-keyint 1 --bframes 3 --direct auto --subme 1 --analyse none --ipratio 1.4 --pbratio 1.3 --vbv-bufsize 50000 --vbv-maxrate 50000 --qcomp 0.5 --me dia --threads auto --thread-input --sar 1:1 --progress --no-psnr --no-ssim --output NUL "test-720p.avs" --mvrange 511 --aud --nal-hrd

start /high /b x264-1101 --quiet --progress --pass 2 --bitrate 3959 --stats "1.stats" --level 4.1 --keyint 24 --min-keyint 1 --ref 3 --mixed-refs --bframes 3 --weightb --direct auto --subme 7 --analyse p8x8,b8x8,i4x4,i8x8 --8x8dct --ipratio 1.4 --pbratio 1.3 --vbv-bufsize 50000 --vbv-maxrate 50000 --qcomp 0.5 --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --no-ssim --output "../run1-720p.mkv" "test-720p.avs" --mvrange 511 --aud --nal-hrd

Results for x264 v0.66.1101 [gcc 3.4.5-skystrife]
encoded 1442 frames, 75.15 fps, 3989.43 kb/s
encoded 1442 frames, 76.08 fps, 3989.43 kb/s
encoded 1442 frames, 76.08 fps, 3989.43 kb/s
encoded 1442 frames, 76.27 fps, 3989.43 kb/s
encoded 1442 frames, 20.55 fps, 3940.81 kb/s
encoded 1442 frames, 20.60 fps, 3940.70 kb/s
encoded 1442 frames, 20.62 fps, 3942.12 kb/s
encoded 1442 frames, 20.61 fps, 3939.54 kb/s
====================================
Results for x264 v0.66.1101 [ICC11.SSSE3.x32-imk]
encoded 1442 frames, 79.01 fps, 3989.43 kb/s
encoded 1442 frames, 79.62 fps, 3989.43 kb/s
encoded 1442 frames, 78.87 fps, 3989.43 kb/s
encoded 1442 frames, 79.34 fps, 3989.43 kb/s
encoded 1442 frames, 21.19 fps, 3941.79 kb/s
encoded 1442 frames, 21.18 fps, 3941.50 kb/s
encoded 1442 frames, 21.19 fps, 3940.68 kb/s
encoded 1442 frames, 21.14 fps, 3940.40 kb/s


Intel Core2 Quad Q6600 (G0) 2.40GHz @ 3GHz
2x1GB DDR2 @ 800MHz
Vista Business SP1 32bit

burfadel
8th February 2009, 13:30
That's a nice little 4 percent speed boost :)
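
As a rough check on that figure, here is a small sketch that averages the fps numbers posted above and computes the relative gain of the ICC build over the gcc build in each test. It assumes the first four lines of each result block are the pass 1 runs and the last four are the pass 2 runs; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* fps figures copied from the results posted above (4 runs each). */
static const double gcc_480p_pass1[] = { 147.67, 149.35, 149.12, 148.88 };
static const double icc_480p_pass1[] = { 153.39, 153.92, 153.65, 153.90 };
static const double gcc_480p_pass2[] = {  50.93,  50.67,  50.82,  50.82 };
static const double icc_480p_pass2[] = {  52.67,  52.55,  52.67,  52.19 };
static const double gcc_720p_pass1[] = {  75.15,  76.08,  76.08,  76.27 };
static const double icc_720p_pass1[] = {  79.01,  79.62,  78.87,  79.34 };
static const double gcc_720p_pass2[] = {  20.55,  20.60,  20.62,  20.61 };
static const double icc_720p_pass2[] = {  21.19,  21.18,  21.19,  21.14 };

/* Arithmetic mean of n fps samples. */
static double mean_fps(const double *fps, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += fps[i];
    return sum / (double)n;
}

/* Relative gain of `candidate` over `baseline`, in percent. */
static double speedup_percent(const double *baseline,
                              const double *candidate, size_t n)
{
    return (mean_fps(candidate, n) / mean_fps(baseline, n) - 1.0) * 100.0;
}
```

With the posted numbers this works out to about 3.3 to 3.4% for both 480p passes, about 4.4% for the 720p first pass and about 2.8% for the 720p second pass.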

imk
8th February 2009, 13:34
I do maintain some personal benchmarks here:
http://spreadsheets.google.com/pub?key=pbffjdC6iUPWs2HtYHwZ2VQ&hl=sv

You may always do your own.

menlvd
8th February 2009, 14:11
That's a nice little 4 percent speed boost :)

at the cost of a very small reduction in quality
imk - http://pic.ipicture.ru/uploads/090208/4347/thumbs/qUo5dS3dV3.png (http://ipicture.ru/Gallery/Viewfull/12988164.html)
komisar - http://pic.ipicture.ru/uploads/090208/4347/thumbs/iZZwCaZi5C.png (http://ipicture.ru/Gallery/Viewfull/12988056.html)

cmd:
set o1=-p 1 -B 7000 --vbv-maxrate 40000 --vbv-bufsize 62500 -I 240 -i 1 -r 5 -b 3 -f -3,-3 -A all -8 --me tesa -t 2 -m 9 -w --mixed-refs --no-fast-pskip --threads 6 --no-psnr --no-ssim --level 4.1 --progress --aq-mode 1 --aq-strength 1.7 --psy-rd 0.6:0.1 --b-adapt 2 -o NUL %in% --stats "c:\x264_2pass.log" --sar %sar% --direct auto --thread-input --no-dct-decimate --b-pyramid --qpmax 35 --qcomp 1 --qpmin 5 --ratetol 3

set o2=-p 2 -B 7000 --vbv-maxrate 40000 --vbv-bufsize 62500 -I 240 -i 1 -r 5 -b 3 -f -3,-3 -A all -8 --me tesa -t 2 -m 9 -w --mixed-refs --no-fast-pskip --threads 6 --no-psnr --no-ssim --level 4.1 --progress --aq-mode 1 --aq-strength 1.7 --psy-rd 0.6:0.1 --b-adapt 2 -o %out% %in% --stats "c:\x264_2pass.log" --sar %sar% --direct auto --thread-input --no-dct-decimate --b-pyramid --qpmax 35 --qcomp 1 --qpmin 5 --ratetol 3

imk
8th February 2009, 18:03
Quote from BugMaster on IRC:
"Probably the difference is due to multithreading non-determinism with VBV."

imk
8th February 2009, 18:39
I just did a comparison myself.

gcc v4.3.3:

yuv4mpeg: 1920x1080@30/1fps, 1:1
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: profile High, level 5.1
x264 [info]: slice I:18 Avg QP:17.70 size:398805 PSNR Mean Y:44.88 U:95.96 V:95.40 Avg:46.64 Global:46.55
x264 [info]: slice P:32 Avg QP:23.85 size:220833 PSNR Mean Y:37.61 U:48.26 V:49.09 Avg:39.20 Global:38.83
x264 [info]: consecutive B-frames: 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
x264 [info]: mb I I16..4: 0.6% 96.3% 3.1%
x264 [info]: mb P I16..4: 0.1% 30.1% 1.4% P16..4: 21.4% 29.2% 14.3% 2.0% 1.5% skip: 0.0%
x264 [info]: 8x8 transform intra:95.9% inter:56.5%
x264 [info]: ref P L0 92.0% 4.5% 1.4% 0.5% 0.5% 0.3% 0.2% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0%
x264 [info]: SSIM Mean Y:0.9447691
x264 [info]: PSNR Mean Y:40.227 U:65.431 V:65.758 Avg:41.877 Global:40.372 kb/s:68376.64


icc v11.0.074 (gcc compatibility mode):

yuv4mpeg: 1920x1080@30/1fps, 1:1
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: profile High, level 5.1
x264 [info]: slice I:18 Avg QP:17.70 size:398805 PSNR Mean Y:44.88 U:95.96 V:95.40 Avg:46.64 Global:46.55
x264 [info]: slice P:32 Avg QP:23.85 size:220833 PSNR Mean Y:37.61 U:48.26 V:49.09 Avg:39.20 Global:38.83
x264 [info]: consecutive B-frames: 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
x264 [info]: mb I I16..4: 0.6% 96.3% 3.1%
x264 [info]: mb P I16..4: 0.1% 30.1% 1.4% P16..4: 21.4% 29.2% 14.3% 2.0% 1.5% skip: 0.0%
x264 [info]: 8x8 transform intra:95.9% inter:56.5%
x264 [info]: ref P L0 92.0% 4.5% 1.4% 0.5% 0.5% 0.3% 0.2% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0%
x264 [info]: SSIM Mean Y:0.9447691
x264 [info]: PSNR Mean Y:40.227 U:65.431 V:65.758 Avg:41.877 Global:40.372 kb/s:68376.65


icc v11.0.074:

yuv4mpeg: 1920x1080@30/1fps, 1:1
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: profile High, level 5.1
x264 [info]: slice I:18 Avg QP:17.70 size:398805 PSNR Mean Y:44.88 U:95.96 V:95.40 Avg:46.64 Global:46.55
x264 [info]: slice P:32 Avg QP:23.85 size:220833 PSNR Mean Y:37.61 U:48.26 V:49.09 Avg:39.20 Global:38.83
x264 [info]: consecutive B-frames: 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
x264 [info]: mb I I16..4: 0.6% 96.3% 3.1%
x264 [info]: mb P I16..4: 0.1% 30.1% 1.4% P16..4: 21.4% 29.2% 14.3% 2.0% 1.5% skip: 0.0%
x264 [info]: 8x8 transform intra:95.9% inter:56.5%
x264 [info]: ref P L0 92.0% 4.5% 1.4% 0.5% 0.5% 0.3% 0.2% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0%
x264 [info]: SSIM Mean Y:0.9447691
x264 [info]: PSNR Mean Y:40.227 U:65.431 V:65.758 Avg:41.877 Global:40.372 kb/s:68376.65


The 0.01 kb/s larger bitrate in the icc builds has to do with the version string being longer; otherwise all numbers are completely identical.

These tests were done with this line:
--bframes 16 --b-pyramid --direct auto --ref 16 --crf 20 --partitions all --weightb --me tesa --subme 9 --mixed-refs --8x8dct --no-fast-pskip --no-dct-decimate --trellis 2 --qpmin 0 --progress --threads auto --frames 50

Using mplayer I dumped the 29th frame out of each resulting encode to a png with mplayer -vo png:z=1. I did a diff between the three PNGs and all three are 100% identical to each other.

kemuri-_9
8th February 2009, 20:02
here's a bench i did...
using the following makefile i created for this purpose:
x264bench (http://kemuri9.net/dev/x264/other/x264bench)
the options are taken from the fprofiling paths, so it hits a lot of things...

executing it like so:

$ make -f x264bench BINS="x264_x64_k8.exe x264.r1101M.SSE2.x64.imk.exe" VIDS="foreman_176x144.yuv x264_build1.y4m"


it also checks for gpac, avs, pthread support, and
the ability to use more than 1 zone (nonbroken strtok_r or win_zone_parse_fix_xx.diff)

resulting files are
x264.r1101M.SSE2.x64.imk.exe_bench.log (http://kemuri9.net/dev/x264/other/x264.r1101M.SSE2.x64.imk.exe_bench.log)
x264_x64_k8.exe_bench.log (http://kemuri9.net/dev/x264/other/x264_x64_k8.exe_bench.log)

Some notes are:
1. there are some small kb/s discrepancies, which are a result of one of my patches that moves where the post-encoding stats are generated and displayed from.
2. as I'm on a Phenom, I don't have SSSE3 and I use -march=k8 when compiling my x64 builds.

imk
8th February 2009, 20:38
You don't specify a target architecture with ICC; you specify a minimum target instruction set instead. The SSE2 build will work on anything SSE2 and higher, and the SSSE3 build will work on any processor with SSSE3 support and higher. All builds will still take advantage of whatever your processor supports, i.e. the SSE2 build will still use SSSE3, or SSE4.2, etc.

I have a benchmark script on my site:
http://imk.cx/pc/x264/bench_x264.pl

The only module that you should have to install is Data::Types.
The script works on Linux, Mac OS X, and Windows. I have not tested it with Cygwin on Windows, but if you use ActivePerl, it works with that.
To get it to work with ActivePerl, run "ppm install Data::Types" and then change the $output line inside of the script and run it.

The script will output a single value if you only do 1 run. For 2-5 runs, it will display an average. Anything 6 runs or higher will calculate outliers and filter out data to give a more accurate number.
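
A minimal sketch of that kind of summary, written in C rather than Perl. How bench_x264.pl actually detects outliers is not stated here, so the 1.5 * IQR rule below is an assumption, as are all of the names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RUNS 64  /* arbitrary cap for this sketch */

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Linearly interpolated quantile of a sorted array, q in [0, 1]. */
static double quantile_sorted(const double *s, size_t n, double q)
{
    double pos = q * (double)(n - 1);
    size_t lo = (size_t)pos;
    size_t hi = (lo + 1 < n) ? lo + 1 : lo;
    return s[lo] + (pos - (double)lo) * (s[hi] - s[lo]);
}

/* 1 run: the value itself. 2-5 runs: plain average.
 * 6+ runs: average of the runs inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. */
static double summarize_fps(const double *fps, size_t n)
{
    assert(n > 0 && n <= MAX_RUNS);

    double sum = 0.0;
    if (n < 6) {
        for (size_t i = 0; i < n; i++)
            sum += fps[i];
        return sum / (double)n;
    }

    double sorted[MAX_RUNS];
    memcpy(sorted, fps, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double q1 = quantile_sorted(sorted, n, 0.25);
    double q3 = quantile_sorted(sorted, n, 0.75);
    double lo = q1 - 1.5 * (q3 - q1);
    double hi = q3 + 1.5 * (q3 - q1);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (sorted[i] >= lo && sorted[i] <= hi) {
            sum += sorted[i];
            kept++;
        }
    }
    return sum / (double)kept; /* kept >= 1: the middle runs always pass */
}
```

With six or more runs, a single run that was disturbed by background activity no longer drags the reported number down.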

If you feed it no arguments, it will ask for input, but you may also specify the arguments to have it run directly.

./bench_x264.pl <binary> <input> <test number [0-3]> <threads [0-# or auto]> <number of runs>

Examples:
./bench_x264.pl ./x264-r1101M-icc ./SOCCER_352x288_30_orig_02.yuv 0 0 20
./bench_x264.pl ./x264-r1101M-icc ./SOCCER_352x288_30_orig_02.yuv 1 auto 20

Fr4nz
9th February 2009, 11:52
Hello imk, I appreciate your work and I have a request for you: is it possible to implement mp4 output in your build?

Thanks for any answer!

G_M_C
9th February 2009, 12:09
Hello imk, I appreciate your work and I have a request for you: is it possible to implement mp4 output in your build?

Thanks for any answer!

There was a thread about MP4 output that seemed to suggest that x264 has to output an x264 elementary stream, without a container etc. Muxing into some form of container has to be left to muxers, not the encoder. This seems very reasonable to me, since x264 is primarily designed for encoding, and muxing into a file format is a different task (and therefore should be left to other applications).

Fr4nz
9th February 2009, 12:11
There was a thread about MP4 output that seemed to suggest that x264 has to output an x264 elementary stream, without a container etc. Muxing into some form of container has to be left to muxers, not the encoder. This seems very reasonable to me, since x264 is primarily designed for encoding, and muxing into a file format is a different task (and therefore should be left to other applications).

I got the point, thanks for the clarification.

burfadel
9th February 2009, 12:36
Wow! nice list of updates an hour ago! It would be interesting to see the cumulative performance benefit over the 4 submissions. One gives slight quality improvement, the others add:

- Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO. Up to 7% faster overall in extreme cases.
- Faster coeff_last64 on 32-bit
- SSSE3 version of predict_8x8_hu
- SSE2 version of predict_8x8c_p
- SSSE3 versions of both planar prediction functions
- Optimizations to predict_16x16_p_sse2
- Some unnecessary REP_RETs -> RETs.
- SSE2 version of predict_8x8_vr by Holger.
- SSE2 version of predict_8x8_hd.
- Don't compile MMX versions of some of the pred functions on x86_64.
- Remove now-useless x86_64 C versions of 4x4 pred functions.
- Rewrite some of the x86_64-only C functions in asm.

It's a good thing it's not 1 April, or people would think it's a cruel April Fool's joke! :)