x264cli with ICC9
uray
6th December 2005, 21:29
it is based on x264 rev.384 from SVN, compiled with Intel C Compiler 9.0
you can download it here : http://s1.math.itb.ac.id/~uray/x264/
here is my test :
avisynth script:
Crop(0,76,-0,-84)
TomsMoComp(0,5,1)
encoder parameters:
pass 1 :
-b 4 -r 16 -f 1:1 -B 512 -p 1 -A all -w --me umh --merange 16 -m 7 --b-rdo --mixed-refs -8 -t 2 --progress
pass 2 :
-b 4 -r 16 -f 1:1 -B 512 -p 1 -A all -w --me umh --merange 16 -m 7 --b-rdo --mixed-refs -8 -t 2 --progress
and here is the result :
my build :
Pass 1 : encoded 1181 frames, 2.52 fps, 464.48 kb/s
Pass 2 : encoded 1181 frames, 2.60 fps, 515.20 kb/s
build from x264.nl :
Pass 1 : encoded 1181 frames, 2.37 fps, 464.48 kb/s
Pass 2 : encoded 1181 frames, 2.46 fps, 515.20 kb/s
my build was tested using x264 rev.384 compiled with ICC9 with Pentium 4 [MMX,SSE,SSE2] optimization (x264cli-r384-IA32P4.exe)
both running on : Intel Pentium M 740 1.73 GHz, 2 MB L2 cache, 512 MB DDR2 memory
i haven't tested other processors yet...
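The summary lines above can be compared directly. This is a minimal Python sketch, assuming the log format shown in this post; the helper names `parse_summary` and `speedup_percent` are made up for illustration and are not part of x264.

```python
import re

# Summary lines copied from the results above.
MY_BUILD = """\
Pass 1 : encoded 1181 frames, 2.52 fps, 464.48 kb/s
Pass 2 : encoded 1181 frames, 2.60 fps, 515.20 kb/s"""
X264_NL = """\
Pass 1 : encoded 1181 frames, 2.37 fps, 464.48 kb/s
Pass 2 : encoded 1181 frames, 2.46 fps, 515.20 kb/s"""

# Assumed line format: "Pass N : encoded F frames, X fps, Y kb/s"
LINE_RE = re.compile(
    r"Pass (\d+) : encoded (\d+) frames, ([\d.]+) fps, ([\d.]+) kb/s"
)

def parse_summary(text):
    """Return {pass_number: (frames, fps, kbps)} for every summary line."""
    return {
        int(m.group(1)): (int(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in LINE_RE.finditer(text)
    }

def speedup_percent(fast, slow):
    """Relative fps gain of `fast` over `slow`, per pass, in percent."""
    a, b = parse_summary(fast), parse_summary(slow)
    return {p: (a[p][1] / b[p][1] - 1.0) * 100.0 for p in a}

if __name__ == "__main__":
    for p, gain in sorted(speedup_percent(MY_BUILD, X264_NL).items()):
        print(f"pass {p}: {gain:+.1f}% fps")
```

With these figures the gain comes out at roughly 6% in each pass, on this one clip and this one CPU only.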
Sirber
6th December 2005, 21:32
good! Did you include all patches?
uray
6th December 2005, 21:35
what patches? it has AVIS and MP4 output, and i just downloaded the whole SVN snapshot and recompiled it.
why? the executable is large (~1 MB), maybe because Intel uses its own runtime library (like the msvcrt stuff), so i linked it into the exe
Sharktooth
6th December 2005, 21:42
well, 0.15 fps is not much... and the x264.nl builds have no GCC compiler optimizations.
as i always said building with ICL is not worth trying.
Do the test again vs this SVN build: http://www.webalice.it/f.corriga/temp/x264_384svn.7z
puffpio
6th December 2005, 22:23
unless you think of it as a 5-6% speedup...which would then be worthwhile :)
uray
6th December 2005, 22:24
well, at least when you're doing a 2-pass encode of a 1h40m 25 fps movie (that is 150000 frames), you'll save 3 hours with +0.15 fps
btw, that file is corrupt... can you give me another one or re-upload it?
edit: never mind, it's my fault. ok, I'll test it with that build
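The time-saving estimate can be checked with a few lines of arithmetic. This is a rough sketch, assuming the fps figures from the first post (x264.nl build vs. ICC9 build) stay constant over the whole movie, which a real encode will not do exactly; the function names are illustrative.

```python
FRAMES = 150_000  # 1h40m at 25 fps, as stated in the post

# (x264.nl build fps, ICC9 build fps) per pass, taken from the first post
FPS = {1: (2.37, 2.52), 2: (2.46, 2.60)}

def hours_saved(frames, slow_fps, fast_fps):
    """Encoding time difference in hours, assuming constant fps."""
    return (frames / slow_fps - frames / fast_fps) / 3600.0

def total_hours_saved(frames=FRAMES, fps=FPS):
    """Sum of the per-pass savings over both passes."""
    return sum(hours_saved(frames, slow, fast) for slow, fast in fps.values())

if __name__ == "__main__":
    for p, (slow, fast) in sorted(FPS.items()):
        print(f"pass {p}: {hours_saved(FRAMES, slow, fast):.2f} h saved")
    print(f"total: {total_hours_saved():.2f} h saved")
```

Under these assumptions the saving is about one hour per pass, so just under two hours for the full 2-pass encode, somewhat less than the 3 hours quoted.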
Sharktooth
6th December 2005, 22:24
unless you think of it as a 5-6% speedup...which would then be worthwhile :)
Only on Intel CPUs... ICL9 is known to be not so kind on AMDs (even slower than previous versions!)...
Sharktooth
6th December 2005, 22:25
well, at least when you're doing a 2-pass encode of a 1h40m 25 fps movie (that is 150000 frames), you'll save 3 hours with +0.15 fps
btw, that file is corrupt... can you give me another one or re-upload it?
The file is working. Use 7-zip 4.31 to open it.
uray
6th December 2005, 22:48
ok, here are the results :
from http://www.webalice.it/f.corriga/temp/x264_384svn.7z :
Pass 1 : encoded 1181 frames, 2.46 fps, 464.48 kb/s
Pass 2 : encoded 1181 frames, 2.56 fps, 515.20 kb/s
and...
from http://files.x264.nl/Sharktooth/force.php?file=./x264-Lite_r384D.7z :
Pass 1 : encoded 1181 frames, 1.80 fps, 463.51 kb/s
Pass 2 : encoded 1181 frames, 1.86 fps, 513.94 kb/s
uray
6th December 2005, 22:50
Only on Intel CPUs... ICL9 is known to be not so kind on AMDs (even slower than previous versions!)...
or it won't even run at all... :)
Sirber
6th December 2005, 22:50
Sharktooth's build has more patches, which boost quality but kill framerate. Can you apply the same patches he did?
http://files.x264.nl/Sharktooth/?dir=./x264_patches
uray
6th December 2005, 22:54
ok I'll try..
Sharktooth
6th December 2005, 23:10
so the real difference between your ICL build and my GCC build (both SVN) is about 0.05 FPS or, in other words, about 2%... ;)
... and still ICL9 is not as "fast" on AMD cpus...
EDIT: i forgot to say my builds do not use the "profiling" feature of the GCC 4.x compilers coz i still use GCC 3.4.4. 4.x with profiling may be faster than ICL (tests have shown a 4-8% encoding speed gain with gcc 4.0.2), but i'll wait for a final GCC 4.1 before using that feature.
akupenguin
6th December 2005, 23:25
You mean you don't use profiling at all, or just that it's improved in 4.x? Because 3.4 benefits from profiling too.
Sharktooth
6th December 2005, 23:25
i do not use profiling at all AND 4.x has improved profiling, especially 4.1.
maybe i will enable it for a win32 test build though.
Sharktooth
7th December 2005, 00:28
Try this one against your ICL9 build: http://www.webalice.it/f.corriga/temp/x264_384svn+fprofile.7z
it was made with GCC profiling.
uray
7th December 2005, 04:09
here we go...
ICC9 build with applied patches:
- x264_new_decimation.2
- x264_umh_termination.0
- x264_signature
- x264_p8rd.9.update2
- x264_aq.5
- x264_check_subpel_predictors.1
- x264_experimental_rd_pskip.2
binaries are available on my site (see my first post)
(x264-r384D-IA32P4.exe)
Pass 1 : encoded 1181 frames, 1.82 fps, 463.60 kb/s
Pass 2 : encoded 1181 frames, 1.89 fps, 514.16 kb/s
I thought the instruction sets of the Pentium 4 and Pentium M
were the same (both MMX, SSE, and SSE2), but I noticed that the executable size
produced by the compiler is different. So I tried this build, which results
in slightly faster code, since I'm using a Pentium M for this test.
(x264-r384D-IA32PM.exe)
Pass 1 : encoded 1181 frames, 1.83 fps, 463.60 kb/s
Pass 2 : encoded 1181 frames, 1.90 fps, 514.16 kb/s
from : http://www.webalice.it/f.corriga/temp/x264_384+pathes+fprofile.7z
Pass 1 : encoded 1181 frames, 1.80 fps, 463.51 kb/s
Pass 2 : encoded 1181 frames, 1.86 fps, 513.94 kb/s
berrinam
7th December 2005, 12:58
(x264-r384D-IA32P4.exe)
Pass 1 : encoded 1181 frames, 1.82 fps, 463.60 kb/s
Pass 2 : encoded 1181 frames, 1.89 fps, 514.16 kb/s
(x264-r384D-IA32PM.exe)
Pass 1 : encoded 1181 frames, 1.83 fps, 463.60 kb/s
Pass 2 : encoded 1181 frames, 1.90 fps, 514.16 kb/s
from : http://www.webalice.it/f.corriga/temp/x264_384+pathes+fprofile.7z
Pass 1 : encoded 1181 frames, 1.80 fps, 463.51 kb/s
Pass 2 : encoded 1181 frames, 1.86 fps, 513.94 kb/s
To me, this signals that something is up. Why is the GCC build producing a different bitrate from the ICC9 build?
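To put a number on that gap, here is a small Python sketch using only the kb/s figures quoted above; the names are illustrative. It quantifies the difference between the patched ICC9 and GCC builds but says nothing about its cause.

```python
# (pass 1 kb/s, pass 2 kb/s) from the patched-build results quoted above
ICC9_KBPS = (463.60, 514.16)
GCC_KBPS = (463.51, 513.94)

def relative_diff_percent(a, b):
    """How much larger `a` is than `b`, in percent of `b`."""
    return (a - b) / b * 100.0

def bitrate_gaps(icc=ICC9_KBPS, gcc=GCC_KBPS):
    """Per-pass bitrate gap of the ICC9 build relative to the GCC build."""
    return [relative_diff_percent(i, g) for i, g in zip(icc, gcc)]

if __name__ == "__main__":
    for n, gap in enumerate(bitrate_gaps(), start=1):
        print(f"pass {n}: ICC9 bitrate is {gap:+.3f}% vs GCC")
```

The gap is only a few hundredths of a percent, but it is nonzero in both passes, so the two builds are not producing identical output.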
Sharktooth
7th December 2005, 13:29
As you can see the difference is still not that "big". Also you use CPU specific optimizations, while im using a generic Pentium2 MMX optimization.
Sirber
7th December 2005, 13:30
@Sharktooth
Why not move to P3-specific (MMX, SSE)? I'm not sure people with a P2 use x264... ;)
Sharktooth
7th December 2005, 13:32
there is no floating point math in x264 C code (except maybe the ratecontrol), and MMX is usually faster than iSSE.
Sirber
7th December 2005, 13:40
Does the current build use MMX and SSE2? I read about some ASM SSE2 optimizations for x264...
Sharktooth
7th December 2005, 13:42
i mean in the C code.
uray
7th December 2005, 14:49
As you can see the difference is still not that "big". Also you use CPU specific optimizations, while im using a generic Pentium2 MMX optimization.
do you mean a compiler optimization, or a hand-tuned code optimization? if it is a compiler optimization, why not build different versions for P2, P3, P4, AMD Athlon, etc., so everyone can get the real processor power on their PC... since it only needs a few clicks in the compiler settings...
Sirber
7th December 2005, 14:54
He used to do that, but it's too long :)
I used to do that too, but it's too long as well :)
uray
7th December 2005, 17:57
i don't understand, what do you mean by "too long" ?
Sirber
7th December 2005, 18:02
Too much time and effort. I remember the time when there were like 5 new builds a day...
foxyshadis
7th December 2005, 22:57
When all the time-critical parts are in C, it can be useful to have separate builds, but when they're already hand-optimized MMX/SSE assembly that the compiler won't optimize any further, why optimize the rest of the non-critical parts? You gain a few percent, at the price of an extra hour a day of the builder's time. >.>
Sharktooth
8th December 2005, 14:15
in x264 all the DSP parts (the critical ones) are already "hand-optimized" and written in MMX/SSE/SSE2 (and maybe even 3DNOW, but i dont remember...) assembly.
So spending much time on C code optimizations is useless coz the gains will be minimal (if not zero) but the time spent trying tweaks and releasing different processor-optimized builds will be huge. Also, over-optimization may even hurt encoding speed...
uray
9th December 2005, 10:14
i don't mean hand optimization for a specific processor, but compiler optimization, that is, only changing the compiler settings for a specific processor. it won't take too much time... even if the improvement is not significant, i guess it is worth trying for speed geeks... just my opinion. that's ok, I'll build Intel processor-specific versions of x264 now... btw i've seen amazing code inside x264.... hehehe!!!
Sharktooth
9th December 2005, 15:10
... i already did CPU specific releases... it wasn't worth a cent...