Current Patches, Where to get them, How they affect speed/output
kemuri-_9
27th August 2008, 04:53
So, march=athlon should be the best choice for AMD, as it works with both of those but doesn't require much else :)
confirmed ;)
pentium2 (for ez link access) (http://kemuri9.net/dev/x264/x264_profile.gcc-3.4.5.pentium2.log)
AMD Series:
k6 (http://kemuri9.net/dev/x264/x264_profile.gcc-3.4.5.k6.log), k6-2 (http://kemuri9.net/dev/x264/x264_profile.gcc-3.4.5.k6-2.log), athlon (http://kemuri9.net/dev/x264/x264_profile.gcc-3.4.5.athlon.log)
k6 was slower than p2.
k6-2 was better, but still slower than p2.
athlon was the first to come out faster than p2 on a nearly consistent basis.
Ranguvar
27th August 2008, 05:24
http://sites.google.com/site/ranguvar13/x264-builds
Direct download (http://sites.google.com/site/ranguvar13/x264-builds/rang_x264_r0950.7z?attredirects=0), Mirrors (http://www.rapidspread.com/file.jsp?id=nznfhwe88u)
x264 r950 from Git (patched).
Open this archive with the free, multi-platform tools 7-Zip or p7zip. Compressed with LZMA.
The src folder contains the patched source code.
The bin folder contains a binary executable for Athlon and later AMD CPUs (those with 3DNow! and 3DNow!+ support),
and one for those without. There are also DLLs for those apps that use them (NOT for AviDemux).
Official source and vanilla builds: http://x264.nl/
Changelog: http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog
Patches and discussion at Doom9's forum: http://forum.doom9.org/
Applied patches (included, unchanged, in the patches folder):
patch -p1 < ../x264diffs/x264_dll_alignment_fix.01.diff
patch < ../x264diffs/x264.progress.indication.01.diff
patch -p1 < ../x264diffs/x264_hrd_pulldown.09_interlace.diff
patch -p1 < ../x264diffs/x264-psyrd-0.6.r950.diff
patch -p1 < ../x264diffs/x264.new.bframes.decision.04.5.diff
Compiled by Ranguvar on August 27th, 2008, with GCC 4.3.1 on Windows XP Professional x64 SP2.
CLI used for non-AMD build: ./configure --enable-shared --extra-cflags="-march=pentium2 -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
CLI used for AMD build: ./configure --enable-shared --extra-cflags="-march=athlon -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
gtk: no
debug: no
gprof: no
PIC: no
shared: yes
visualize: no
Ranguvar
27th August 2008, 05:28
Thanks, kemuri, for the continued research! :D
gigah72
27th August 2008, 05:28
ot:
where can i get the needed package/tools to add the version info to the build?
or can someone upload the needed files?
Ranguvar
27th August 2008, 05:31
Kemuri, I believe, posted the list of stuff needed a while back. Find that and then use Google. First search with "mingw" added to see if there's a binary out there, otherwise, compile from source.
Hint: I believe everything you need is on the MinGW Sourceforge page, except for Git, which can be found by installing msysgit and then grabbing its stuff. Doing a ./configure will tell you if you're missing something.
bob0r
27th August 2008, 05:40
bob0r, skystrife, may I ask how you are compiling the DLL? I know how to do it, but x264 seems to need some modifications (http://avidemux.org/admForum/viewtopic.php?pid=28804) before the DLL will work in AviDemux. Is that exactly what you have done?
My .dll has no modifications compared to the Git code, other than the patches themselves.
kemuri-_9
27th August 2008, 05:52
if the build doesn't include the revision number which it's automatically configured to do on compilation, then it doesn't have access to some CLI utils:
GIT CLI : git-rev-list, git-status
binutils: awk (usually via gawk or mawk), grep, head, join, sed, sort, wc
be sure these binaries are on the PATH environment variable or it won't be able to include the revision number within the build.
many of these are part of Msys:
https://sourceforge.net/project/showfiles.php?group_id=2435&package_id=24963
coreutils package: head, join, sort, wc
msysCORE package: sed
gawk package: awk (via gawk)
the GIT CLI i have via msysgit:
http://code.google.com/p/msysgit/downloads/list
msysgit has LOTS of stuff in it besides just git cli binaries as well...
since it can set up a fully working bash shell for CLI usage, if you don't want to use the GUI and don't want to add it to your Windows PATH variable for use within the command prompt.
in fact msysgit comes with awk (via gawk), grep, head, sed, sort, wc (only missing join for needed purposes)
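A quick way to see which of the tools listed above are not reachable from the current shell is sketched below. The function name missing_tools is made up for this example, and plain git stands in for git-rev-list / git-status, since newer Git installs expose those as subcommands rather than separate binaries on the PATH.

```shell
#!/bin/sh
# Print every given tool that cannot be found through PATH, one per line.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names taken from the list above (git stands in for the git-* helpers).
missing_tools git awk grep head join sed sort wc
```

Run it from the same MSYS shell that will run make; anything it prints still needs to be installed or added to the PATH.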
Ranguvar
27th August 2008, 05:52
@bob0r: OK, so your DLL is b0rked, too... read back a few pages :)
gigah72
27th August 2008, 06:09
many of these are part of Msys:
https://sourceforge.net/project/showfiles.php?group_id=2435&package_id=24963
coreutils package: head, join, sort, wc
msysCORE package: sed
gawk package: awk (via gawk)
the GIT CLI i have via msysgit:
http://code.google.com/p/msysgit/downloads/list
msysgit has LOTS of stuff in it besides just git cli binaries as well...
since it can set up a fully working bash shell for CLI usage, if you don't want to use the GUI and don't want to add it to your Windows PATH variable for use within the command prompt.
in fact msysgit comes with awk (via gawk), grep, head, sed, sort, wc (only missing join for needed purposes)
thanks,
i put all the files together, but now this:
fatal: Not a git repository
any idea what's wrong in the setup?
Ranguvar
27th August 2008, 06:09
@gigah72: You need to download x264 from Git, and make sure not to delete the .git folder in the downloaded files. (git clone git://git.videolan.org/x264.git)
@all: r950 was b0rked until now, only use binaries compiled after this post...
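For the "Not a git repository" error, a minimal check that the source folder still carries its .git directory might look like this. The helper name has_git_dir is hypothetical, and the sketch assumes an ordinary clone where .git is a directory at the top of the tree.

```shell
#!/bin/sh
# Succeed if the given directory (default: the current one) contains .git.
has_git_dir() {
    [ -d "${1:-.}/.git" ]
}

if has_git_dir .; then
    echo ".git found: the revision number can be looked up"
else
    echo ".git missing: expect 'fatal: Not a git repository'"
fi
```

Run it from the top of the x264 source folder before ./configure.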
Dark Shikari
27th August 2008, 06:10
Warning to all: r950 was b0rked until now, only use binaries compiled after this post...
Correct, I made a typo in the commit and modified history in order to fix it. You'll have to re-update your sources and recompile.
Ranguvar
27th August 2008, 06:21
http://sites.google.com/site/ranguvar13/x264-builds
Direct download (http://sites.google.com/site/ranguvar13/x264-builds/rang_x264_r0950-2.7z?attredirects=0), Mirrors (http://www.rapidspread.com/file.jsp?id=rsk1zh9m1w)
x264 r950 (NON-B0RKED VERSION) from Git (patched).
Open this archive with the free, multi-platform tools 7-Zip or p7zip. Compressed with LZMA.
The src folder contains the patched source code.
The bin folder contains a binary executable for Athlon and later AMD CPUs (those with 3DNow! and 3DNow!+ support),
and one for those without. There are also DLLs for those apps that use them (NOT for AviDemux).
Official source and vanilla builds: http://x264.nl/
Changelog: http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog
Patches and discussion at Doom9's forum: http://forum.doom9.org/
Applied patches (included, unchanged, in the patches folder):
patch -p1 < ../x264diffs/x264_dll_alignment_fix.01.diff
patch < ../x264diffs/x264.progress.indication.01.diff
patch -p1 < ../x264diffs/x264_hrd_pulldown.09_interlace.diff
patch -p1 < ../x264diffs/x264-psyrd-0.6.r950.diff
patch -p1 < ../x264diffs/x264.new.bframes.decision.04.5.diff
Compiled by Ranguvar on August 27th, 2008, with GCC 4.3.1 on Windows XP Professional x64 SP2.
CLI used for non-AMD build: ./configure --enable-shared --extra-cflags="-march=pentium2 -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
CLI used for AMD build: ./configure --enable-shared --extra-cflags="-march=athlon -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
gtk: no
debug: no
gprof: no
PIC: no
shared: yes
visualize: no
egrimisu
27th August 2008, 08:37
Hi, where can I get a good, up-to-date x264 build with psy RDO 0.6?
alfadude
27th August 2008, 08:56
Hi, where can I get a good, up-to-date x264 build with psy RDO 0.6?
In this thread only a few posts back.
egrimisu
27th August 2008, 09:06
In this thread only a few posts back.
And whose build would you recommend?
alfadude
27th August 2008, 09:12
Skystrife builds them to my satisfaction. :-)
egrimisu
27th August 2008, 09:23
Skystrife builds them to my satisfaction. :-)
Is there an optimized version for Core 2 Duo?
Is Ranguvar's release better than the one you like? I saw that the latest version of your preferred one has some b-frame thing, I don't know what, in experimental mode!!!
alfadude
27th August 2008, 09:32
Try it and you will see which one you like best.
The new bframes method is slow when used with many bframes but as standard all builds still use the old method. If you want to test the new method put --b-adapt 2 in your command line.
This is not new information, it was all already answered for those who like to read.
akupenguin
27th August 2008, 09:45
I'm gonna guess that if march=k8 is faster on AMD CPU's, it's because of the 3dNOW! and 3dNOW!+.
gcc doesn't use 3dnow nor mmx, not even if -march / -mmmx / -m3dnow allow it. (The rationale (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552) is that using mmregs requires emms and gcc is too stupid to figure out where to put emms.) This statement applies not only to autovectorization, but even to generic vector intrinsics (__attribute__((vector_size))). Only explicit 3dnow intrinsics use 3dnow.
Sharktooth
27th August 2008, 12:28
Is there an optimized version for Core 2 Duo?
Is Ranguvar's release better than the one you like? I saw that the latest version of your preferred one has some b-frame thing, I don't know what, in experimental mode!!!
here we go... again...get a generic build.
skystrife
27th August 2008, 14:19
x264.950.modified.02.exe (http://www.mediafire.com/?e030nattaca) - Alternate Download (http://skystrife.com/x264/x264.950.modified.02.exe)
(discontinuing dll unless someone was using it for something other than avidemux and had it working as I was building it; I'm not shifting from 3.4.5)
Patches used:
x264_psy_rdo_0.6_r950.diff <-- Updated to patch with r950 (check psyRDO thread).
x264_new_bframe_decision_04.5.diff <-- This patch is highly experimental, only enabled with --b-adapt 2. The --no-b-adapt parameter now works. Updated version.
x264_hrd_pulldown.09_interlace.diff
x264.progress.indication.01.diff
gcc 3.4.5 fprofiled build.
kemuri-_9
27th August 2008, 15:13
gcc doesn't use 3dnow nor mmx, not even if -march / -mmmx / -m3dnow allow it. (The rationale (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552) is that using mmregs requires emms and gcc is too stupid to figure out where to put emms.) This statement applies not only to autovectorization, but even to generic vector intrinsics (__attribute__((vector_size))). Only explicit 3dnow intrinsics use 3dnow.
if that's true, then what is gcc doing with -march=athlon to make it faster for AMDs than -march=pentium2?
*Removed all profiling logs and binaries for builds besides -march=athlon and -march=pentium2 from my site.*
akupenguin
27th August 2008, 15:18
Instruction choice and scheduling. Very often there are several possible instructions that fill the same role, such as add/shift/lea/mul, and different combinations are faster on different cpus.
bob0r
27th August 2008, 18:44
@skystrife:
x264_new_bframe_decision_04.5.diff where to get?
Ranguvar
27th August 2008, 18:50
Kemuri posted it a while back in this thread.
But it's not patching with r951 right now...
cyberbeing
27th August 2008, 18:50
@skystrife:
x264_new_bframe_decision_04.5.diff where to get?
kemuri-_9 posted it here: http://forum.doom9.org/showpost.php?p=1175019&postcount=774
J_Darnley
27th August 2008, 18:50
Look up, to post #774 (http://forum.doom9.org/showthread.php?p=1175019#post1175019).
kemuri-_9
27th August 2008, 18:55
I just tried it on r951, has a rejection, see if i can fix it.
EDIT:
r951 had this alteration
index 3b720af..5217253 100644
@@ -299,9 +299,11 @@ void x264_frame_init_lowres( x264_t *h, x264_frame_t *frame )
i_stride, frame->i_stride_lowres, frame->i_width_lowres, frame->i_lines_lowres );
x264_frame_expand_border_lowres( frame );
- for( y=0; y<16; y++ )
- for( x=0; x<16; x++ )
- frame->i_cost_est[y][x] = -1;
+ memset( frame->i_cost_est, -1, sizeof(frame->i_cost_est) );
+
+ for( x = 0; x < h->param.i_bframe + 2; x++ )
+ for( y = 0; y < h->param.i_bframe + 2; y++ )
+ frame->i_row_satds[y][x][0] = -1;
}
static void frame_init_lowres_core( uint8_t *src0, uint8_t *dst0, uint8_t *dsth, uint8_t *dstv, uint8_t *dstc,
which creates a rejection for the new bframe patch located at
@@ -299,9 +299,7 @@ void x264_frame_init_lowres( x264_t *h, x264_frame_t *frame )
i_stride, frame->i_stride_lowres, frame->i_width_lowres, frame->i_lines_lowres );
x264_frame_expand_border_lowres( frame );
- for( y=0; y<16; y++ )
- for( x=0; x<16; x++ )
- frame->i_cost_est[y][x] = -1;
+ memset( frame->i_cost_est, -1, sizeof(frame->i_cost_est) );
}
static void frame_init_lowres_core( uint8_t *src0, uint8_t *dst0, uint8_t *dsth, uint8_t *dstv, uint8_t *dstc,
so.... functionally, the change was already added in (or reads to me that way), so the fix is to strike the mc.c sections from the .diff:
x264_new_bframe_decision_04.6.diff -- link removed
DS/akupenguin, is this reasoning correct?
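A rejection like this can be caught before anything in the tree is modified by doing a dry run first. The sketch below assumes a patch program that supports --dry-run (GNU patch does); the function name patch_applies and the paths in the usage comment are placeholders, not part of anyone's build setup.

```shell
#!/bin/sh
# Report whether a diff would apply cleanly to a source tree without
# changing any files.
#   $1 = source tree, $2 = diff file, $3 = strip level (default 1, as in -p1)
patch_applies() {
    # The diff is opened on stdin before the cd, so a relative path to it
    # is still resolved from the caller's directory.
    ( cd "$1" && patch -p"${3:-1}" --dry-run -f -s >/dev/null 2>&1 ) < "$2"
}

# Hypothetical usage:
#   patch_applies path/to/x264 path/to/some.diff || echo "would be rejected"
```

If the check fails, rerunning the same patch command by hand without the output redirection shows which hunk is the problem.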
Ranguvar
27th August 2008, 19:09
I (with Kovensky's help) just did :)
http://pastebin.com/f4c77facb
EDIT: NVM, that one fails upon compile. Here's Sharktooth's.
http://www.webalice.it/f.corriga/x264/x264_new_bframe_decision_04.6.diff
kemuri-_9
27th August 2008, 19:13
you still have the
- int x, y;
+ int y;
section, the new r951 code still uses x; so that section needs to be removed too (as i did in mine)
Ranguvar
27th August 2008, 19:29
http://sites.google.com/site/ranguvar13/x264-builds
Direct download (http://sites.google.com/site/ranguvar13/x264-builds/rang_x264_r0951.7z?attredirects=0), Mirrors (http://www.rapidspread.com/file.jsp?id=uatst2lss5)
x264 r951 from Git (patched).
Open this archive with the free, multi-platform tools 7-Zip or p7zip. Compressed with LZMA.
The src folder contains the patched source code.
The bin folder contains a binary executable for Athlon and later AMD CPUs (those with 3DNow! and 3DNow!+ support),
and one for those without. There are also DLLs for those apps that use them (NOT for AviDemux).
Official source and vanilla builds: http://x264.nl/
Changelog: http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog
Patches and discussion at Doom9's forum: http://forum.doom9.org/
Applied patches (included, unchanged, in the patches folder):
patch -p1 < ../x264diffs/x264_dll_alignment_fix.01.diff
patch < ../x264diffs/x264.progress.indication.01.diff
patch -p1 < ../x264diffs/x264_hrd_pulldown.09_interlace.diff
patch -p1 < ../x264diffs/x264-psyrd-0.6.r950.diff
patch -p1 < ../x264diffs/x264.new.bframes.decision.04.6.diff
Compiled by Ranguvar on August 27th, 2008, with GCC 4.3.1 on Windows XP Professional x64 SP2.
CLI used for non-AMD build: ./configure --enable-shared --extra-cflags="-march=pentium2 -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
CLI used for AMD build: ./configure --enable-shared --extra-cflags="-march=athlon -pipe" && make fprofiled VIDS="../enctests/deadline_cif.y4m"
Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
gtk: no
debug: no
gprof: no
PIC: no
shared: yes
visualize: no
LoRd_MuldeR
27th August 2008, 19:48
Ranguvar, what does --extra-cflags="-pipe" do ???
Sharktooth
27th August 2008, 19:51
speed up compiling using pipes... binaries are unchanged.
especially useful on multicore CPUs.
-pipe
Use pipes rather than temporary files for communication between the various
stages of compilation. This fails to work on some systems where the assembler
is unable to read from a pipe; but the GNU assembler has no trouble.
alexins
27th August 2008, 22:33
Safe Cflags (http://gentoo-wiki.com/Safe_Cflags)
kemuri-_9
28th August 2008, 01:13
all of those who use gcc 4.3.1 should start looking for mingw versions of gcc 4.3.2 soon, since 4.3.2 was released today.
Ranguvar
28th August 2008, 01:26
Thanks for the heads-up, kemuri :) 4.3.2 looks nice.
I'll wait until TDM posts a build.
Underground78
28th August 2008, 07:22
Does somebody know why the "git-status" command gives me this result :
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: build/win32/libx264.vcproj
# modified: build/win32/x264.sln
# modified: build/win32/x264.vcproj
# modified: gtk/fr.po
#
no changes added to commit (use "git add" and/or "git commit -a")
after a "git clone git://git.videolan.org/x264.git" without changing any of these files ?
egrimisu
28th August 2008, 08:10
Hi,
I tested Ranguvar's 951-2 x264 against bob0r's 928, but I'm confused. Since I only have a small monitor right now (14" CRT), I can't really tell the difference between the two encodes, one with Ranguvar's build and the other with bob0r's. Another dilemma is that I set ~1800 kbit/s for both encodes, with the same settings, and instead of a 314 MB output I get 150 MB with bob0r's and 250 MB with Ranguvar's. Why can't the output be 314 MB as I set it? The speed is about the same; I can't tell exactly since I was doing other stuff in the meantime. Which release would you recommend for anime encoding at approx. 1700-2000 kbit/s with almost all settings maxed out? The speed is approx. 3 hours (1 pass) for a 24-minute anime.
Ranguvar
28th August 2008, 08:21
Hi,
I tested Ranguvar's 951-2 x264 against bob0r's 928, but I'm confused. Since I only have a small monitor right now (14" CRT), I can't really tell the difference between the two encodes, one with Ranguvar's build and the other with bob0r's.
So? My build is newer, there's been quite a few changes since r928 (but note that bob0r also has newer builds). So, it should be higher quality at the same bitrate, yes. But that of course depends on your settings. Or, the bitrate may just be too high to see the difference.
Another dilemma is that I set ~1800 kbit/s for both encodes, with the same settings, and instead of a 314 MB output I get 150 MB with bob0r's and 250 MB with Ranguvar's. Why can't the output be 314 MB as I set it?
It could be a multitude of things. We'd need your settings to give more insight. And that belongs in its own thread, really...
Which release would you recommend for anime encoding at approx. 1700-2000 kbit/s with almost all settings maxed out? The speed is approx. 3 hours (1 pass) for a 24-minute anime.
Just grab a recent build with the patches you want. I of course prefer my own builds, but there's generally very little difference between builds of the same revision with the same patches.
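On the file-size question above, the arithmetic itself is easy to check. This is only a sketch: it assumes exactly 24 minutes of video, counts the video stream alone, takes 1 MB as 10^6 bytes, and the function names are made up for the example.

```shell
#!/bin/sh
# Expected stream size in bytes for an average bitrate (kbit/s) and a
# duration (seconds): kbit/s * 1000 / 8 * s. Integer arithmetic only.
expected_size_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

# Average bitrate in kbit/s (rounded down) implied by a size in bytes
# and a duration in seconds.
avg_kbps() {
    echo $(( $1 * 8 / 1000 / $2 ))
}

expected_size_bytes 1800 1440   # 324000000 bytes, about 324 MB
avg_kbps 150000000 1440         # 833 kbit/s for the 150 MB encode
avg_kbps 250000000 1440         # 1388 kbit/s for the 250 MB encode
```

So 1800 kbit/s over 24 minutes lands in the same ballpark as the expected 314 MB, while both actual outputs correspond to average bitrates well below the requested one, which points at the settings rather than at the builds.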
techouse
28th August 2008, 08:46
thanks for the heads-up, kemuri :) 4.3.2 looks nice.
I'll wait until tdm posts a build.
+1 :)
P.S.: MSYS got updated yesterday as well ;) http://sourceforge.net/forum/forum.php?forum_id=860793
Ranguvar
28th August 2008, 09:36
Thanks for that, too ;)
Although I believe I already have that version, since msys died (again) and I re-did everything from scratch.
skystrife
30th August 2008, 02:43
x264.953.modified.exe (http://www.mediafire.com/?nv5cgawxeow) - Alternate Download (http://skystrife.com/x264/x264.953.modified.exe)
Patches used:
x264_psy_rdo_0.6_r953.diff (http://skystrife.com/x264/x264_psy_rdo_0.6_r953.diff) <-- Updated to patch with r953, removes changing X264_BUILD.
x264_new_bframe_decision_04.6.diff <-- This patch is highly experimental, only enabled with --b-adapt 2. The --no-b-adapt parameter now works. Updated version.
x264_hrd_pulldown.09_interlace.diff
x264.progress.indication.01.diff
gcc 3.4.5 fprofiled build.
kemuri-_9
30th August 2008, 03:45
nooooo.... now all my .stat files are poo with r953+ :(
Sagekilla
30th August 2008, 05:32
Could x264 have a modification to check if itex and ptex are both present, and if they are to use the sum of both?
burfadel
30th August 2008, 08:23
Skystrife, there's something wrong with your latest 953 build!
Settings used in all cases were:
--crf 24 --keyint 450 --ref 5 --mixed-refs --bframes 5 --b-pyramid --b-rdo --bime --weightb --direct auto --subme 7 --trellis 2 --partitions all --8x8dct --me umh --threads 4 --progress --thread-input --no-psnr --no-ssim --b-adapt 2
Except for the vanilla x264, which used the same settings without --b-adapt 2, aka:
--crf 24 --keyint 450 --ref 5 --mixed-refs --bframes 5 --b-pyramid --b-rdo --bime --weightb --direct auto --subme 7 --trellis 2 --partitions all --8x8dct --me umh --threads 4 --progress --thread-input --no-psnr --no-ssim
With the vanilla 953, the encoding speed was 22.5fps
With 950 your build the speed was (remember with --b-adapt 2)
17.97 fps
With 953 your build the speed was 2.5fps!!
(actually I stopped the encode, the speed was continually dropping)!!
The clip was a short one with high motion, but the same difference applies to all clips... your 953 build is chronically slow with those settings! Going by the speeds of your 950 build and the vanilla 953 build, this must be a compiling or patch-porting problem (in any modifications you made for the patches to work) introduced with the 953 revision.
The processor used is a core 2 duo e6600.
Thanks!
burfadel
30th August 2008, 08:33
Out of interest, I did some tests with Ranguvar's x264 951 versions, again with the same settings as in the post above (different clip, high motion though):
--crf 24 --keyint 450 --ref 5 --mixed-refs --bframes 5 --b-pyramid --b-rdo --bime --weightb --direct auto --subme 7 --trellis 2 --partitions all --8x8dct --me umh --threads 4 --progress --thread-input --no-psnr --no-ssim --b-adapt 2
The AMD build got 15.64 and the non-AMD build got 15.32.
So the AMD build is faster on a core 2 than the non AMD build! :)
Skystrife's 950 build also got 15.64!
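To put the two fps figures above in proportion, the relative difference can be worked out as below. The helper name pct_faster is made up for this example, and it assumes an awk on the PATH.

```shell
#!/bin/sh
# Percentage by which speed $1 exceeds speed $2 (both in fps), one decimal.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

pct_faster 15.64 15.32    # AMD build vs non-AMD build: prints 2.1
```

That is a gap of roughly 2%, small enough that a single run per build may not separate it from run-to-run variation.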
techouse
30th August 2008, 14:20
x264_x86_r953_techouse (http://techouse.project357.com/builds/x264_x86_r953_techouse.7z)
Source: x264 r953 GIT (git://git.videolan.org/x264.git)
Applied patches (current versions):
x264_progress.diff
x264_psy_rdo.0.6_r953.diff
x264_hrd_pulldown.09_interlace.diff
x264_new_bframe_decision_04.6.diff
Please check http://forum.doom9.org/showthread.php?t=130364 and http://git.videolan.org/gitweb.cgi?p=x264.git;a=shortlog for more info
Compiled by techouse on August 30th 2008, 14:13:32 CEST with GCC-4.3.2 on Windows Vista Business SP-1 64-bit.
Commandline used: ./configure --extra-cflags="-march=core2 -pipe" && make fprofiled
Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
gtk: no
debug: no
gprof: no
PIC: no
shared: no
visualize: no
P.S.: I've found it to be 4% faster than my r951 built with gcc 4.3.1 and the generic march option.
LoRd_MuldeR
30th August 2008, 14:22
x264 SVN-r953 + Psy RD (Psy Trellis) v0.6 + Gruntster's Alignment-Fix:
http://www.mediafire.com/?z9pj84ly2kg
Compiled with MinGW GCC 4.3.1-tdm-1 (fprofiled). Cannot notice any major speed-drop.
burfadel
30th August 2008, 14:31
The speed drop was definitely anomalous in the test I did earlier, I tried to show that :) I don't know what could have caused it, but it only occurred with that one build. The rest of the builds show speeds along the lines one would expect!... I've also tried more than one 953 modded build as well :)
LoRd_MuldeR
30th August 2008, 14:50
x264 SVN-r953 + Psy RD (Psy Trellis) v0.6 + Gruntster's Alignment-Fix:
http://www.mediafire.com/?emoikztuc7o
Compiled with MinGW GCC 4.3.2-tdm-1 (fprofiled). It seems GCC 4.3.2 doesn't break it :)