View Full Version : Current Patches, Where to get them, How they affect speed/output


kemuri-_9
30th August 2008, 14:54
lol it's not on SVN anymore, so should probably get your tag right
and why are you doing the alignment fix if gcc doesn't have the problem?

LoRd_MuldeR
30th August 2008, 14:56
lol it's not on SVN anymore, so should probably get your tag right
and why are you doing the alignment fix if gcc doesn't have the problem?

Because my "x264.exe" is simply a waste-product when building the libx264.dll for Avidemux :D

And I stick to SVN revision numbers, because everybody is using them and because those git hashcodes are not very user friendly...

skystrife
30th August 2008, 16:39
The speed drop was definitely anomalous with the test I did earlier, I tried to show that :) I don't know what could have caused it, but it's only with that one build that it occurred. The rest of the builds show speed along the lines that one would expect!... I've also tried more than one 953 modded build as well :)

So another r953 build with the same patches doesn't exhibit the issue? Was it compiled with gcc 4?

burfadel
30th August 2008, 17:03
The build of 953 that I tried was Kemuri's. Kemuri's 953 is built with GCC 3.4.5 according to the text file on the website

Techouse's build shown at post 847 is built with GCC-4.3.2, and that DOES have the same problem as yours (Skystrife's)!

Lord_mulder's build also works fine (without --b-adapt 2 of course, since that build doesn't include that patch)

After taking out --b-adapt 2 your build works fine! so it must be something relating to the way the b-frame decision patch is applied.

Whatever Kemuri has done differently could be the key...

skystrife
30th August 2008, 17:14
The builds of 953 that I tried were Kemuri's and Techouse's. Kemuri's 953 is built with GCC 3.4.5 according to the text file on the website, and techouse build shown at post 847 is GCC-4.3.2.

Both of those builds worked fine!

Lord_mulder's build also works fine (without --b-adapt 2 of course, since that build doesn't include that patch)

Confirmed bug, I had a broken patch it seems. Making a new build now.

EDIT:

x264.953.modified.02.exe (http://www.mediafire.com/?qcjlxdaciei) - Alternate Download (http://skystrife.com/x264/x264.953.modified.02.exe)

Apologies for not catching the bug earlier.

burfadel
30th August 2008, 17:28
Thanks! Just did a quick test (about 10 seconds' worth) and now it works fine! I made a mistake in my earlier post that you referenced (I've corrected it): the Techouse build did exhibit the same issue as yours.

kemuri-_9
30th August 2008, 21:51
Hmm... looking back at my profiling logs with r953, -march=athlon is now slower than -march=pentium2...
so i tried some other AMD -march types, and athlon-xp (the next higher chip) was back to being faster than pentium2 again....

logs:
pentium2 (http://kemuri9.net/dev/x264/x264_profile.i686.gcc-3.4.5.pentium2.log)
athlon (http://kemuri9.net/dev/x264/x264_profile.i686.gcc-3.4.5.athlon.log), athlon-xp (http://kemuri9.net/dev/x264/x264_profile.i686.gcc-3.4.5.athlon-xp.log), k8 (http://kemuri9.net/dev/x264/x264_profile.i686.gcc-3.4.5.k8.log)

really weird how -march=athlon slowed down drastically on average from r951 to r953....
might have been related to the giant rceq code scrapping, but i wouldn't know for sure.

Once again this is only relevant for people on AMD CPUs.
you Intel folk can just ignore the jibber :p

skystrife
31st August 2008, 19:27
Could x264 have a modification to check if itex and ptex are both present, and if they are to use the sum of both?

http://skystrife.com/x264/x264_itex_ptex_compatibility_sum.01.diff

I know quite literally next to nothing about coding in C, so someone let me know if that's an absolutely, completely and utterly retarded way of accomplishing this.

A sample binary (psyrd, new bframe, hrd, progress, and the above patch):

x264.953.modified.03.exe (http://www.mediafire.com/?oejvdrw2luk) - Alternate Download (http://skystrife.com/x264/x264.953.modified.03.exe)

kemuri-_9
31st August 2008, 20:50
imo it would be more practical to write a conversion program to just convert the old format to the new one, rather than actually having it scan for both which would slow down the reading/scanning section some.

if you want to continue doing a double format scanner, it would be easier to use
if (strstr(p, "itex") != NULL)
as a detector for the old format
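
A minimal sketch of that double-format idea, not the code from the posted diff. It assumes an old-format stats line carries adjacent `itex:<n> ptex:<n>` fields and that the new format wants a single `tex:<n>` holding their sum. The name `convert_stats_line` and the buffer handling are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite one stats line from the old
 * "itex:N ptex:M" form to "tex:(N+M)". Lines without "itex:" are
 * copied unchanged. Returns 0 on success, -1 if the output buffer is
 * too small or the old-format fields cannot be parsed. */
static int convert_stats_line(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);

    /* strstr() doubles as the old-format detector. */
    const char *p = strstr(line, "itex:");
    if (p == NULL) {
        /* New format (or unrelated line): copy as is. */
        int n = snprintf(out, out_size, "%s", line);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }

    unsigned long itex, ptex;
    int consumed = 0;
    /* %n records how many characters the two fields occupied. */
    if (sscanf(p, "itex:%lu ptex:%lu%n", &itex, &ptex, &consumed) != 2)
        return -1;

    /* Prefix before "itex:", then the summed field, then the rest. */
    int n = snprintf(out, out_size, "%.*stex:%lu%s",
                     (int)(p - line), line, itex + ptex, p + consumed);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this sketch, a line containing `itex:100 ptex:23` comes out with `tex:123` in the same position, and a line that already uses `tex:` passes through untouched. Doing the conversion once, offline, keeps the extra scan out of the encoder's stats reader.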

akupenguin
31st August 2008, 20:56
perl -pi~ -e 's/itex:(\d+) ptex:(\d+)/"tex:".($1+$2)/e'

LoRd_MuldeR
31st August 2008, 21:33
Why do people want to use old stats files anyway?

Isn't it highly recommended to create the stats file with the same revision that is used to do the final encode ???

kemuri-_9
31st August 2008, 21:58
the differences between r951 and r953 were pretty much the rceq removal and the .stats change,
otherwise there was no functional difference;
if you had a bunch of .stats made from r951, you wouldn't want to spend the time remaking them now would you?

are you trying to say that i should knowingly use an old revision of x264 when there's plenty of feature updates and bug fixes that can still use the old stats file in a new revision?
that's basically shooting yourself in the foot.

LoRd_MuldeR
31st August 2008, 22:03
the differences between r951 and r953 were pretty much the rceq removal and the .stats change,
otherwise there was no functional difference;
if you had a bunch of .stats made from r951, you wouldn't want to spend the time remaking them now would you?

are you trying to say that i should knowingly use an old revision of x264 when there's plenty of feature updates and bug fixes that can still use the old stats file in a new revision?
that's basically shooting yourself in the foot.

Unless there was a "major" change, it should be safe to use old stats files, yes.
However you could simply stick to the older version to complete your encode in that case.

And if there was a "major" change, you should repeat both passes with the new version...

kemuri-_9
31st August 2008, 22:19
the 1st pass /.stats are pretty much only used for frame decisions (as far as i've gathered),
so unless there was a change in the way frames are decided from one revision to the next, you can update without a hitch.

LoRd_MuldeR
31st August 2008, 23:49
the 1st pass /.stats are pretty much only used for frame decisions (as far as i've gathered),
so unless there was a change in the way frames are decided from one revision to the next, you can update without a hitch.

I'm pretty sure the information from the stats file is also used to calculate the bitrate distribution during the second pass.
And the bitrate can change significantly with new features like VAQ, Psy RDO/Trellis. There's a reason why Psy + VAQ should already be ON in the first pass.

Sharktooth
1st September 2008, 00:23
psy-rdo isnt needed in the first pass. also bitrate is computed in the second pass (IIRC). however, i still cant understand why you should reuse the stats files. i usually delete them when im finished encoding.

kemuri-_9
1st September 2008, 01:15
for anime related things, we do things by the release candidate system, often the video can get re-encoded a few times.

edit:
for those that don't wanna install a scripting language to convert to the new format:
stat_convert.php (http://kemuri9.net/dev/x264/stat_convert.php)

MythCreator
1st September 2008, 14:33
A problem:
I set --asm sse3, but the CLI is still using SSE2

And then, just a piece of advice: please add SSE4A support...

Sharktooth
1st September 2008, 14:46
why is it advice? are you sure sse4a will add to the encoder speed? i think not... if it was useful it would have been added...
however what's --asm sse3?

MythCreator
1st September 2008, 14:51
--asm <integer> Override CPU detection

LoRd_MuldeR
1st September 2008, 14:52
I set --asm sse3 but CLI still using SSE2

Because there is no SSE3 code in x264 yet !?!?

And then,Just a advice, please add SSE4A support...

Do you know any function in x264 that would actually benefit from SSE4A instructions? :rolleyes:

Sharktooth
1st September 2008, 14:54
--asm <integer> Override CPU detection
--asm <integer>
integer is an integer value ... not a string...

@lord_mulder: there is SSE3 in x264...

LoRd_MuldeR
1st September 2008, 14:57
@lord_mulder: there is SSE3 in x264...

I think there only is SSSE3 code in x264 at the moment...

From my Q6600 system:
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64

The CPU has "SSE3" as well as "SSSE3", but x264 only uses the latter.

komisar
1st September 2008, 15:13
--asm <integer>
integer is an integer value ... not "sse3"...

@lord_mulder: there is SSE3 in x264...
Try this:
x264 --asm Altivec,MMX2
This also works...
All recognized flags:Altivec,MMX2,MMXEXT,SSE2Slow,SSE2,SSE2Fast,SSE3,SSSE3,PHADD,SSE4,Cache32,Cache64,Slow_mod4_stack
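
A rough sketch of how a comma-separated name list like that could be turned into a capability mask. The flag names are taken from the list above, but the bit values, the `cpu_flag_t` table, `parse_asm_flags` and the case-insensitive matching are all assumptions made for illustration; this is not x264's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only: names are from the thread, bit values are made up. */
typedef struct { const char *name; unsigned bit; } cpu_flag_t;

static const cpu_flag_t cpu_flags[] = {
    { "MMX2",    1u << 0 },
    { "SSE2",    1u << 1 },
    { "SSE3",    1u << 2 },
    { "SSSE3",   1u << 3 },
    { "SSE4",    1u << 4 },
    { "Cache32", 1u << 5 },
    { "Cache64", 1u << 6 },
};

/* Case-insensitive compare of a token of length len against a full name. */
static int name_matches(const char *tok, size_t len, const char *name)
{
    if (strlen(name) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Parse e.g. "sse3,cache64" into a bitmask. Returns 0 on success,
 * -1 on an unknown name (the mask is then left untouched). */
static int parse_asm_flags(const char *arg, unsigned *mask)
{
    assert(arg != NULL && mask != NULL);
    unsigned result = 0;
    const char *p = arg;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current token */
        if (len > 0) {
            size_t i, n = sizeof cpu_flags / sizeof cpu_flags[0];
            for (i = 0; i < n; i++)
                if (name_matches(p, len, cpu_flags[i].name))
                    break;
            if (i == n)
                return -1;              /* name not in the table */
            result |= cpu_flags[i].bit;
        }
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    *mask = result;
    return 0;
}
```

Matching on the whole token rather than a prefix is what keeps `sse3` and `ssse3` apart in this sketch.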

akupenguin
1st September 2008, 15:21
--asm sse3,cache64 if you want to force it to use a version of the sad function that is slower than what it picks automatically...
sse3 works only on pentium4D. it is completely useless on all other cpus, including core2 and k8.

skystrife
1st September 2008, 17:05
x264.955.modified.exe (http://www.mediafire.com/?xd334ayvaut) - Alternate Download (http://skystrife.com/x264/x264.955.modified.exe)

Patches used:

x264_psy_rdo_0.6_r953.diff
x264_new_bframe_decision_04.6.diff <-- This patch is highly experimental, only enabled with --b-adapt 2. The --no-b-adapt parameter now works.
x264_hrd_pulldown.09_interlace.diff
x264.progress.indication.01.diff

gcc 3.4.5 fprofiled build.

Avenger007
1st September 2008, 18:07
x264_new_bframe_decision_04.6.diff <-- This patch is highly experimental, only enabled with --b-adapt 2. The --no-b-adapt parameter now works.
What is "highly experimental" about the new_bframe_decision patch? :confused:

LoRd_MuldeR
1st September 2008, 18:11
What is "highly experimental" about the new_bframe_decision patch? :confused:

It's really slow when allowing a high number of consecutive b-frames. Future improvements might solve this problem...
Also it causes crazy things when used with Avidemux, I still wonder why :confused:

burfadel
1st September 2008, 18:24
With numerous tests, it seems around 5 to 6 b-frames with the new patch is ideal for both speed and output... (3 or 4 still provides excellent results though)! Any higher even with anime didn't seem to make much difference, except for the slowdown :)

bob0r
1st September 2008, 18:59
you still have the

- int x, y;
+ int y;

section, the new r951 code still uses x; so that section needs to be removed too (as i did in mine)


Any proper x264_new_bframe_decision_04.X.diff then?

kemuri-_9
1st September 2008, 19:01
i compared the results from a build that kept the modified r951 section with one that removed it.
there was no difference in results in my tests,
so i would say to just remove it, which is how the one Sharktooth posted is.

bob0r
1st September 2008, 19:06
i compared the results from a build that kept the modified r951 section with one that removed it.
there was no difference in results in my tests,
so i would say to just remove it, which is how the one Sharktooth posted is.

Ok, you should edit your post or remove your link for clarity. Usually you guys have to modify dead projects to make .diff files work, but Dark is still very active on both psyrdo and new-b-frame, so i have asked him if he can check the patches and, if needed, modify them so we get the correct patches. (New build coming up after some vbv-speed issue patch is committed, should be r956 and soon :D)

bob0r
1st September 2008, 20:39
x264.956.modified.01.exe (http://files.x264.nl/x264.956.modified.01.exe)
libx264-61.956.modified.01.dll (http://files.x264.nl/libx264-61.956.modified.01.dll)

x264_psy_rdo_0.6_r953.diff (on by default, adjust with: --psy-rd)
x264_new_bframe_decision_04.7.diff (highly experimental, enabled with: --b-adapt 2, changes: removed the mc.c hunk, based on Dark_Shikari's words)
x264_hrd_pulldown.09_interlace.diff
x264.progress.indication.01.diff


Patching process:
patch -p1 -l < x264_psy_rdo_0.6_r953.diff (http://files.x264.nl/x264_patches/x264_psy_rdo_0.6_r953.diff) ( -l parameter = ignore whitespaces )
patch -p1 < x264_new_bframe_decision_04.7.diff (http://files.x264.nl/x264_patches/x264_new_bframe_decision_04.7.diff)
patch -p1 < x264_hrd_pulldown.09_interlace.diff (http://files.x264.nl/x264_patches/x264_hrd_pulldown.09_interlace.diff)
patch -p0 < x264.progress.indication.01.diff (http://files.x264.nl/x264_patches/x264.progress.indication.01.diff)

Quote:

[20:52] (x'Dark_Shikari): .......
[20:52] (x'Dark_Shikari): for( x = 0; x < h->param.i_bframe + 2; x++ )
[20:52] (x'Dark_Shikari): - for( y = 0; y < h->param.i_bframe + 2; y++ )
[20:52] (x'Dark_Shikari): - frame->i_row_satds[y][x][0] = -1;
[20:52] (x'Dark_Shikari): WHO IS MAKIN THESE PATCHES? <-- referring to x264_new_bframe_decision_04.6.diff
[20:53] (x'Dark_Shikari): *MAKING
[20:53] (x'Dark_Shikari): gg breaking VBV again


It had to be said, before more bad patches are created.

Revision 956 should solve the speed issues.... it did for me!

TEST BEFORE PUTTING THIS IN YOUR ENCODING TOOLS/GUIS

My results:
x264.956M.hq.mkv (http://x264.nl/x264.956M.hq.mkv)
x264.956M.tv.mkv (http://x264.nl/x264.956M.tv.mkv)
If you told me this was x264 + this bitrate a year ago, i would have laughed at you :D

kemuri-_9
1st September 2008, 21:20
ah ha, there's that code from r951 that sharktooth's patch removed.

so it was necessary after all... i turned out to be right and had no idea i was...

Ranguvar
1st September 2008, 22:34
http://sites.google.com/site/ranguvar13/x264-builds

Direct download (http://sites.google.com/site/ranguvar13/x264-builds/rang_x264_r0956.7z?attredirects=0), Mirrors (http://www.rapidspread.com/file.jsp?id=l6ou1hqkae)

x264 r956 (fixes speed issues) from Git (patched).
Compiled by Ranguvar on September 1st, 2008, with GCC 4.3.2 on Windows XP Professional x64 SP2.

Open this archive with the free, multi-platform tools 7-Zip or p7zip. Compressed with LZMA.
The src folder contains the patched source code.
The bin folder contains a binary executable for Athlon and later AMD CPUs, and one for those without.
There are also DLLs for those apps that use them (NOT for AviDemux - get those from LoRd_MuldeR).

Git: git://git.videolan.org/x264.git
Info, and source tarballs: http://www.videolan.org/developers/x264.html
Changelog: http://git.videolan.org/gitweb.cgi?p=x264.git
Vanilla builds: http://x264.nl/
Discussion: http://forum.doom9.org/forumdisplay.php?f=77
http://forum.doom9.org/showthread.php?t=130364


Applied patches (included, unchanged, in the patches folder):

patch -p1 < ../x264diffs/x264_dll_alignment_fix.01.diff
patch < ../x264diffs/x264.progress.indication.01.diff
patch -p1 < ../x264diffs/x264_hrd_pulldown.09_interlace.diff
patch -p1 < ../x264diffs/x264_psy_rdo_0.6_r953.diff
patch -p1 < ../x264diffs/x264_new_bframe_decision_04.7.diff


CLI used for non-AMD build: ./configure --enable-shared --extra-cflags="-march=pentium2 -pipe"
make fprofiled VIDS="../enctests/deadline_cif.y4m"
CLI used for AMD build: ./configure --enable-shared --extra-cflags="-march=athlon -pipe"
make fprofiled VIDS="../enctests/deadline_cif.y4m"

Platform: X86
System: MINGW
asm: yes
avis input: yes
mp4 output: yes
pthread: yes
gtk: no
debug: no
gprof: no
PIC: no
shared: yes
visualize: no

Avenger007
1st September 2008, 23:01
It's really slow when allowing a high number of consecutive b-frames.
Does that really fall under "highly experimental"? I thought highly experimental meant it was alpha and had lots of potential bugs.
Efficiency isn't a bug -- it's a direct consequence of the algorithm chosen.

Soichiro
1st September 2008, 23:08
I still have reason to believe that there are bugs in it, but the devs seem to believe that I'm loony.

After all, having 99% of b-frames in an ep in 16 frame sequences is completely normal.

Not that that happens every time, of course, but there have been reports, and even one occurrence of a bug is too many.

LoRd_MuldeR
1st September 2008, 23:17
Does that really fall under "highly experimental"? I thought highly experimental meant it was alpha and had lots of potential bugs.
Efficiency isn't a bug -- it's a direct consequence of the algorithm chosen.

Well, unless the speed-loss is worth the quality-gain, extreme slowness can be considered a bug. Also the slowness of the algorithm chosen is caused by certain limitations in x264 - the frame-decision is not multi-threaded yet. In case they "fix" that problem, the new algo should become much faster. Last but not least, DS said he's working on some heuristics to speed-up the algo. So at the moment this patch simply isn't something everybody should use blindly and it's not in a final state yet. Thus it's called "(highly) experimental" ...

Avenger007
1st September 2008, 23:59
Well, unless the speed-loss is worth the quality-gain, extreme slowness can be considered a bug. Also the slowness of the algorithm chosen is caused by certain limitations in x264 - the frame-decision is not multi-threaded yet. In case they "fix" that problem, the new algo should become much faster. Last but not least, DS said he's working on some heuristics to speed-up the algo. So at the moment this patch simply isn't something everybody should use blindly and it's not in a final state yet. Thus it's called "(highly) experimental" ...
Thus it should be called "inefficient" not "(highly) experimental".
Essentially you're saying the algorithm is a "bug" even though it works functionally as far as I can tell.

LoRd_MuldeR
2nd September 2008, 00:08
Thus it should be called "inefficient" not "(highly) experimental".
Essentially you're saying the algorithm is a "bug" even though it works functionally as far as I can tell.

I don't say it is a bug. I just say it's currently less efficient than it could be and it's not in a state where everybody should start using it blindly.
And I said that a patch (not this one) could be called "buggy", if it causes an extreme slowdown for a minimal quality-gain.

After all it doesn't matter whether the b-frame patch is labeled "(highly) experimental", "inefficient" or something else.
If you want to use it, then you can feel free to use it. And if you are afraid to test "experimental" stuff, then keep away from it...

Avenger007
2nd September 2008, 00:13
If you want to use it, then you can feel free to use it. And if you are afraid to test "experimental" stuff, then keep away from it...
That's what I want to know... what is there to be afraid of?
Is efficiency the only/main reason why the patch hasn't been committed as yet?

kemuri-_9
2nd September 2008, 00:34
That's what I want to know... what is there to be afraid of?
Is efficiency the only/main reason why the patch hasn't been committed as yet?

yes, it works perfectly fine outside of the fact it grows incredibly slower with more b-frames

Ranguvar
2nd September 2008, 00:43
I believe so, I think aku/DS were saying they could optimize it a little more, so it's probably not being committed until then.

Sharktooth
2nd September 2008, 01:30
ah ha, there's that code from r951 that sharktooth's patch removed.

so it was necessary after all... i turned out to be right and had no idea i was...
uh? what code did i remove?
i just manually did what the patch command didn't... i looked at the rejected changes and did them manually by removing the lines with a minus sign in front of them... so i definitely didn't remove any code that the previous patch (0.4.5) didn't already remove...

kemuri-_9
2nd September 2008, 01:44
the code that the pre r951 patch removed was different than the code that there was for r951. the code looked similar but was actually different:
pre r951 patch - '4.5'
- for( y=0; y<16; y++ )
- for( x=0; x<16; x++ )
- frame->i_cost_est[y][x] = -1;


r951 patch - your '4.6'
- for( x = 0; x < h->param.i_bframe + 2; x++ )
- for( y = 0; y < h->param.i_bframe + 2; y++ )
- frame->i_row_satds[y][x][0] = -1;


In the end, DS came and settled the situation, so it's all fine now.

Sharktooth
2nd September 2008, 01:47
oh... then something went wrong during the creation of the .rej file... or im just blind (that could be possible as well since my left eye is recently almost really blind...).

edit: i confirm the .rej file is wrong, so my "patch" bin is definitely screwed...

Quark.Fusion
2nd September 2008, 11:33
options: --qp 0 --ref 2 --subme 3 --no-cabac --mixed-refs --progress --threads 6 --thread-input
avis [info]: 320x240 @ 23.98 fps (2280 frames)

——————————

x264 - core 61 r955kVAQmod.PsyRDO d4265bb
x264 [info]: kb/s:4033.9

BROKEN FILE

——————————

x264 - core 61 r956 7b71d58
x264 [info]: kb/s:5985.2

PLAYABLE

——————————

x264 - core 61 r956kVAQmod.PsyRDO 7b71d58
x264 [info]: kb/s:4033.9

BROKEN FILE

——————————

options: --aq-mode 0 --qp 0 --ref 2 --subme 3 --no-cabac --mixed-refs --progress --threads 6 --thread-input
x264 - core 61 r956kVAQmod.PsyRDO 7b71d58
x264 [info]: slice I:16 Avg QP: 0.00 size: 25373
x264 [info]: kb/s:4033.9

BROKEN FILE

——————————

options: --qp 0 --ref 2 --subme 3 --no-cabac --mixed-refs --progress --threads 1 --thread-input
x264 - core 61 r956kVAQmod.PsyRDO 7b71d58
x264 [info]: slice I:11 Avg QP: 0.00 size: 31243
x264 [info]: kb/s:4033.5

BROKEN FILE

——————————

What's wrong?


P.S. I'm currently encoding 1920x832 (196071 frames) with the broken build and different options. It is playable, but is it lossless?

Quark.Fusion
2nd September 2008, 11:59
Now encoded with "--partitions i8x8,b8x8,p8x8,i4x4 --ref 4 --subme 4 --me hex --8x8dct --bframes 16 --bime --b-pyramid --b-adapt 1 --b-rdo --deadzone-inter 0 --deadzone-intra 0 --colormatrix bt709 --colorprim bt709 --qp 0 --direct auto --no-deblock --fullrange on --mixed-refs --no-dct-decimate --no-fast-pskip --no-psnr --no-ssim --progress --threads 6 --thread-input --trellis 0 --weightb ":
x264 [info]: slice I:16 Avg QP: 0.00 size: 31473
x264 [info]: slice P:1626 Avg QP: 0.00 size: 29513
x264 [info]: slice B:638 Avg QP: 0.00 size: 22953
x264 [info]: kb/s:5311.3

Seems that my big encode must be the correct one.


Edit: seems that it is a bug with "--no-cabac".

komisar
2nd September 2008, 12:35
Yes, --no-cabac is broken. I found the "breaker"... It has not worked since revision 928...

@Quark.Fusion: --qp 0 is lossless...

komisar
2nd September 2008, 13:33
--no-cabac is broken after make fprofiled.
This needs confirmation from other "patchers-builders"...