View Full Version : Current Patches, Where to get them, How they affect speed/output


G_M_C
9th February 2009, 12:36
I got the point, thanks for the clarification.

Mind you, the outcome of that poll wasn't clear, probably because the poll question was "multi-interpretable" ;) But what I wrote above was the direction the discussion was going.

wyti
9th February 2009, 16:30
Hi, I've been searching for a while for an up-to-date x264 RDRC patch (ideally compatible with r1096), but I can't find one.
I would be very happy if someone knows where to find this patch and has enough time to compile an x264 build (win32) with that patch applied.

J_Darnley
9th February 2009, 16:34
There aren't any, public ones at least. Who knows what Dark Shikari has tucked away.

Sharktooth
9th February 2009, 19:33
Hi, I've been searching for a while for an up-to-date x264 RDRC patch (ideally compatible with r1096), but I can't find one.
I would be very happy if someone knows where to find this patch and has enough time to compile an x264 build (win32) with that patch applied.

Come on, are you serious?
RDRC is useful for debugging. No one wants to run RDRC in real-life encodes. It's too damn slow!!!

Dark Shikari
9th February 2009, 19:34
Come on, are you serious?
RDRC is useful for debugging. No one wants to run RDRC in real-life encodes. It's too damn slow!!!

You underestimate the amount of time people have on their hands ;)

imk
9th February 2009, 19:52
Hello imk, I appreciate your work and I'd have a request for you: is it possibile to implement mp4 output in your build?

That's another mess that I don't really want to play around with. I just use mp4box directly or mkvmerge. :)

imk
9th February 2009, 20:12
Built with ICC v11.0.066 (with profiling):
x264.r1106M.SSE2.x32.imk.exe (http://imk.cx/pc/x264/x264.r1106M.SSE2.x32.imk.exe)
x264.r1106M.SSSE3.x32.imk.exe (http://imk.cx/pc/x264/x264.r1106M.SSSE3.x32.imk.exe)

x264.r1106M.SSE2.x64.imk.exe (http://imk.cx/pc/x264/x264.r1106M.SSE2.x64.imk.exe)
x264.r1106M.SSSE3.x64.imk.exe (http://imk.cx/pc/x264/x264.r1106M.SSSE3.x64.imk.exe)


Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_icc.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.09.r1106.diff (for the 64-bit build only)

wyti
9th February 2009, 23:20
Come on, are you serious?
RDRC is useful for debugging. No one wants to run RDRC in real-life encodes. It's too damn slow!!!

Yes, I'm serious. I only want to try it for myself, and only after that will I know whether it is too slow for me or not.

kemuri-_9
10th February 2009, 00:23
Yes, I'm serious. I only want to try it for myself, and only after that will I know whether it is too slow for me or not.


<blue_misfit> RDRC looks interesting too!!! probably too slow for me to use
<Dark_Shikari> RDRC is not really stable enough for practical use. not only slow and unthreaded, but it dies horribly on fades (because that's what is RD-optimal to do in the case of no weighted prediction)

so you should hold off on it for the time being.

skystrife
10th February 2009, 00:39
x264.1109M.exe (http://www.mediafire.com/?nwzijnmnzzy) - Alternate Download (http://skystrife.com/x264/x264.1109M.exe)

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff

gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------

x264.1109M.x64.exe (http://www.mediafire.com/?4qd4dmynktz) - Alternate Download (http://skystrife.com/x264/x264.1109M.x64.exe)

Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff
x264_win64_support.09.r1106.diff

gcc 4.3.4 fprofiled build.

burfadel
12th February 2009, 08:24
x264 on www.x264.nl is now 2 builds old; it's not updating properly... (I realise that since it's automatic, it should create a new build even if the changes aren't relevant for x86)...

kemuri-_9
12th February 2009, 16:09
Windows 64-bit support
A "make distclean" is probably required after updating to this revision.


That second line is probably the cause. The build likely can't compile correctly until that is done.

bob0r
12th February 2009, 18:59
x264 on www.x264.nl is now 2 builds old; it's not updating properly... (I realise that since it's automatic, it should create a new build even if the changes aren't relevant for x86)...

Windows update reboots ftw.
I know it can be turned off, but that computer is standalone and direct online, so better safe than sorry.

Compiling.... :cool:

burfadel
12th February 2009, 20:36
Thanks :)

Snowknight26
12th February 2009, 20:41
Windows update reboots ftw.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU]
"NoAutoRebootWithLoggedOnUsers"=dword:00000001

JohannesL
13th February 2009, 20:23
<Dark_Shikari> RDRC is not really stable enough for practical use. not only slow and unthreaded, but it dies horribly on fades (because that's what is RD-optimal to do in the case of no weighted prediction)
That explains the blur on the treetops in the beginning fade in your BBB encode. (Can't that be solved with a zones setting though?)

Dark Shikari
13th February 2009, 20:32
That explains the blur on the treetops in the beginning fade in your BBB encode. (Can't that be solved with a zones setting though?)

RDRC doesn't respect zones (I think?). It really should be a fix in the algorithm anyway, not a zone hack.

skystrife
15th February 2009, 00:48
x264 r1113 (unpatched) (http://www.mediafire.com/?dbdzwljmykz) - Alternate Download (http://skystrife.com/x264/revision1113/x264.exe)
gcc 4.3.4 fprofiled build.

-------------------------

x264.1113M.x86.exe (http://www.mediafire.com/?t32dn0mzym1) - Alternate Download (http://skystrife.com/x264/x264.1113M.x86.exe) / x264.1113M.x64.exe (http://www.mediafire.com/?xztmzqntwdj) - Alternate Download (http://skystrife.com/x264/x264.1113M.x64.exe)
gcc 3.4.5 fprofiled build with -march=pentium2. / gcc 4.3.4 fprofiled build.

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff

bob0r
16th February 2009, 00:13
skystrife's unpatched builds will be (manually) updated on x264.nl.

Thanks!

LoRd_MuldeR
16th February 2009, 00:58
skystrife's unpatched builds will be (manually) updated on x264.nl.

Thanks!

I wonder: Isn't it possible to automatically compile/update the x64 builds, just like you do it with the x86 ones?

kemuri-_9
16th February 2009, 01:28
I wonder: Isn't it possible to automatically compile/update the x64 builds, just like you do it with the x86 ones?

he can't profile the x64 ones, so he's using skystrife's which are profiled.

LoRd_MuldeR
16th February 2009, 02:26
he can't profile the x64 ones, so he's using skystrife's which are profiled.

I assume the problem is that his machine isn't 64-Bit.

Well, if profiling doesn't measure realtime, but something like "instructions per function call" or "number of calls per function", it could be run inside an emulator.

QEMU should be able to emulate an x86-64 processor on an x86 machine.

The problem I see is that running the profiling inside an emulator may take a very long time... :(

Egh
16th February 2009, 10:22
One thing which may be good to add a patch for:

http://forum.doom9.org/showthread.php?p=1250312

Apparently, as I identified, the current AVIS code supports AVI file input but doesn't check whether the file is over 2 GB, so x264 just finishes the job "successfully" when it reaches the 2 GB limit (no errors shown, of course ;)). It seems it would be relatively easy to adjust the code to understand OpenDML AVI files.

Could someone please take a look at it?

imk
17th February 2009, 01:46
Here's a bunch of ICC builds:

x264.r1114M.SSE2.x32.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSE2.x32.imk.exe)
x264.r1114M.SSE2.x32.mp4.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSE2.x32.mp4.imk.exe)
x264.r1114M.SSSE3.x32.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSSE3.x32.imk.exe)
x264.r1114M.SSSE3.x32.mp4.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSSE3.x32.mp4.imk.exe)

x264.r1114M.SSE2.x64.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSE2.x64.imk.exe)
x264.r1114M.SSE2.x64.mp4.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSE2.x64.mp4.imk.exe)
x264.r1114M.SSSE3.x64.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSSE3.x64.imk.exe)
x264.r1114M.SSSE3.x64.mp4.imk.exe (http://imk.cx/pc/x264/x264.r1114M.SSSE3.x64.mp4.imk.exe)

x264.r1114.SSE2.linux.static.x64.imk.lzma (http://imk.cx/pc/x264/x264.r1114.SSE2.linux.static.x64.imk.lzma)
x264.r1114.SSSE3.linux.static.x64.imk.lzma (http://imk.cx/pc/x264/x264.r1114.SSSE3.linux.static.x64.imk.lzma)


The builds with mp4 in them have mp4 output built in. All I did was compile gpac_lib in MSVC2005 and link x264 against the .lib file. I didn't need any patches or anything for it to work. I personally don't use mp4 output so I haven't tested it. Let me know how it works.

The Linux binaries can be extracted with unlzma. There will be no 32-bit builds of x264 with ICC for Linux since my environment is not multilib (it's 64-bit only).

Older builds, build scripts, information, etc. can all be found here (http://imk.cx/pc/x264/).

G_M_C
17th February 2009, 07:53
I'd like to start using the ICC builds, because I've noticed they seem moderately faster on my XP SP3 system with 4 GB and a C2D 6750. But I just want to make sure: are there any differences noticed in the output of the ICC builds, compared to the other builds?

video_magic
17th February 2009, 08:40
Hello imk,
thanks for providing the x264 builds. I have failed to understand one of your posts earlier; please would you tell me whether I should be downloading the SSE2 or the SSSE3 version, I have this CPU:
http://processorfinder.intel.com/details.aspx?sSpec=SL9KF

You don't specify a target architecture with ICC; you specify a minimum target instruction set instead. The SSE2 build will work on anything SSE2 and higher, and the SSSE3 build will work on any processor with SSSE3 support and higher. All builds will still take advantage of whatever your processor supports, I.E. the SSE2 build will still use SSSE3, or SSE4.2, etc.
...

Also, is there some sort of DLL file I am meant to put on my system to ensure the ICC build works fully as it should? Thank you again.

Audionut
17th February 2009, 09:00
SSE2 and no to the dll file.

video_magic
17th February 2009, 09:30
Thanks!

Are these two statements true, then, if I have understood correctly?

The SSE2 build would be the fastest for me.

There are some SSSE3 chips which don't have SSE2.

Dark Shikari
17th February 2009, 09:35
There are some SSSE3 chips which don't have SSE2.

Er, no.

video_magic
17th February 2009, 09:58
For what reasons might someone prefer to download an SSSE3 build rather than downloading an SSE2 build, from the ICC selection above?

I am only curious so thanks for any simple explanations.

imk
17th February 2009, 10:12
For what reasons might someone prefer to download an SSSE3 build rather than downloading an SSE2 build, from the ICC selection above?

I am only curious so thanks for any simple explanations.

The difference just has to do with which instruction set was targeted when profiling. Builds targeted for SSE2 will still take advantage of any instruction set your processor supports, and they will work on any SSE2-capable processor. The same goes for the SSSE3 builds. When compiled targeting SSSE3, ICC will take advantage of instructions found in SSSE3.

So if your processor doesn't support anything higher than SSE2, then grab the SSE2 build. If you have a Core 2 Duo or Core 2 Quad, or any other processor with SSSE3 support, then grab the SSSE3 build.

I haven't benchmarked the difference between the builds, but I wouldn't be surprised if there's absolutely no difference between them.

I just build them because it doesn't really require any extra effort. :)


Are there any differences noticed in the output of the ICC builds, compared to the other builds ?

There should be no difference.

video_magic
17th February 2009, 21:33
Thanks very much guys.
If the SSE2 build will use the other instructions of my CPU as it needs then that is my query answered I think!

LoRd_MuldeR
17th February 2009, 21:35
Thanks very much guys.
If the SSE2 build will use the other instructions of my CPU as it needs then that is my query answered I think!

x264 uses its own runtime CPU detection code anyway. Also most (all?) performance critical functions are written as hand optimized assembler code.

Hence if your CPU supports any instructions that are useful for x264, then x264 will detect and use them, regardless of the compiler settings that were used to make the build.

The compiler optimizations don't affect anything but the plain C code in x264...
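
For illustration, the runtime dispatch described above can be sketched roughly as follows. This is only a toy model in Python, not x264's actual code: the function names, the flag strings, and the table layout are all made up for the example, and the "SIMD" variants are plain stand-ins for what would be hand-written assembly.

```python
# Toy model of runtime CPU dispatch (hypothetical names, not x264's real API).

def sad_c(a, b):
    """Plain reference version: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sad_sse2(a, b):
    """Stand-in for a hand-written SSE2 routine; must return the same result."""
    return sad_c(a, b)

def sad_ssse3(a, b):
    """Stand-in for a hand-written SSSE3 routine; must return the same result."""
    return sad_c(a, b)

# Ordered from most to least capable; None marks the fallback that always works.
SAD_IMPLEMENTATIONS = [("ssse3", sad_ssse3), ("sse2", sad_sse2), (None, sad_c)]

def select_sad(cpu_flags):
    """Return the best implementation that the detected CPU flags allow."""
    for required_flag, func in SAD_IMPLEMENTATIONS:
        if required_flag is None or required_flag in cpu_flags:
            return func
    raise RuntimeError("no implementation available")  # not reached while a fallback exists

if __name__ == "__main__":
    # Pretend the CPU reported these flags at startup.
    sad = select_sad({"mmx", "sse", "sse2"})
    print(sad.__name__, sad([1, 5, 9], [2, 3, 9]))
```

The choice depends only on what the CPU reports at run time, so it is unaffected by the instruction set the compiler was told to target.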

kemuri-_9
17th February 2009, 22:12
Are there any differences noticed in the output of the ICC builds, compared to the other builds ?

possibly yes, for the x86 builds:
since the ICC builds use the SSE floating-point math instruction set rather than the 387 instruction set that gcc uses by default (because gcc doesn't use SSE for x86 builds unless told to),
under certain settings you can get different binary outputs between the x86 gcc builds and the x86 ICC builds above.

this was pointed out by BugMaster originally for differences between x86 and x64 gcc binary outputs, since 387 is default for x86 and sse is default for x64.
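
The effect is easy to reproduce in miniature. The sketch below is an analogy only: Python cannot select 387 or SSE math, so it uses 64-bit versus 32-bit rounding of intermediates as a stand-in for two floating-point units that keep intermediates at different widths. The helper names are made up for the example.

```python
import struct

def round_to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def residual_wide(a, b):
    """Compute (a + b) - a, keeping the intermediate at 64-bit precision."""
    return (a + b) - a

def residual_narrow(a, b):
    """Same arithmetic, but every intermediate is rounded to 32 bits."""
    t = round_to_f32(a + b)
    return round_to_f32(t - a)

if __name__ == "__main__":
    a, b = 1.0, 2.0 ** -30
    wide = residual_wide(a, b)
    narrow = residual_narrow(a, b)
    # The same source expression yields two different values.
    print(wide, narrow, wide == narrow)
```

If an encoder compares such a value against a threshold, two builds can take different decisions at that point and so produce different bitstreams, even though neither is wrong.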

skystrife
21st February 2009, 14:36
x264 r1114 (unpatched) (http://www.mediafire.com/?2kdijwzhwxz) - Alternate Download (http://skystrife.com/x264/revision1114/x264.exe)
gcc 4.3.4 fprofiled build.

-------------------------

x264.1114M.x86.exe (http://www.mediafire.com/?2oaynozyuxq) - Alternate Download (http://skystrife.com/x264/x264.1114M.x86.exe) / x264.1114M.x64.exe (http://www.mediafire.com/?in1zo2ilimi) - Alternate Download (http://skystrife.com/x264/x264.1114M.x64.exe)
gcc 3.4.5 fprofiled build with -march=pentium2. / gcc 4.3.4 fprofiled build.

Patches used:

x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff

The 2GB files issue should be fixed, but I didn't test it.

EDIT: Revert to 1113 if you experience the issue below. I'm not sure what caused this, but a rebuild appears to have fixed it so I will update the links when it is finished.

komisar
21st February 2009, 15:20
skystrife, sorry, but:
"The image file D:\Temp\movie\.test\vaq.test\x264.1114M.x64.exe is valid, but is for a machine type other than the current machine."
http://komisar.gin.by/img/trash/err1.png
on my AMD Athlon64

LoRd_MuldeR
21st February 2009, 15:37
skystrife, sorry, but:
"The image file D:\Temp\movie\.test\vaq.test\x264.1114M.x64.exe is valid, but is for a machine type other than the current machine."

Works on my Core2 under WindowsXP x64-Edition :confused:

Chengbin
21st February 2009, 15:44
You underestimate the amount of time people have on their hands ;)

+1

I got too much time on my hands. I would like the RDRC patch too. I seriously don't care about encoding time, as long as it is not more than 4x slower than the current x264, as I can't always leave the computer on overnight, so I must be able to finish encoding a movie in 16 hours.

Any idea when RDRC is stable enough for release? Do you think we can see a functional (threaded), and stable release by early 2010? Does it work in main profile? A decoder doesn't care if the video is encoded with RDRC or 2 pass right?

How long did it take to encode the BBB episode? What processor? What kind of bitrate do you need to get roughly equivalent quality using 2 pass or crf?

Is there a webpage that explains how RDRC works and why the quality is so good?

Oh crap I just saw that RDRC is not threaded. Even if it is the same speed as 2 pass, it would be over my time limit. Looks like until it is threaded, I can only use it on TV episodes and anime, because they're shorter.

komisar
21st February 2009, 16:19
For testing (profiled,mp4,pthread, gcc 4.3.4 20090220 prerelease):
x264.1114.kGIT.generic.x32.test.exe (http://komisar.gin.by/test/x264.1114.kGIT.generic.x32.test.exe)
x264.1114.kGIT.generic.x64.test.exe (http://komisar.gin.by/test/x264.1114.kGIT.generic.x64.test.exe)

patches:
01_x264_custom_strtok_r.r1089.diff
x264_hrd_pulldown.09_interlace.diff
x264_mingw_aligned_04.diff

P.S. Also see my post (http://forum.doom9.org/showthread.php?p=1252042#post1252042)

Dark Shikari
21st February 2009, 21:05
+1

I got too much time on my hands. I would like the RDRC patch too. I seriously don't care about encoding time, as long as it is not more than 4x slower than the current x264, as I can't always leave the computer on overnight, so I must be able to finish encoding a movie in 16 hours.

Any idea when RDRC is stable enough for release? Do you think we can see a functional (threaded), and stable release by early 2010? Does it work in main profile? A decoder doesn't care if the video is encoded with RDRC or 2 pass right?

How long did it take to encode the BBB episode? What processor? What kind of bitrate do you need to get roughly equivalent quality using 2 pass or crf?

Is there a webpage that explains how RDRC works and why the quality is so good?

Oh crap I just saw that RDRC is not threaded. Even if it is the same speed as 2 pass, it would be over my time limit. Looks like until it is threaded, I can only use it on TV episodes and anime, because they're shorter.

RDRC is incredibly simple.

1. Encode the current frame at a certain quantizer. This isn't at all incompatible with AQ and works how you think it would.

2. Encode the next X frames at the quantizers they were encoded at in the first pass.

3. Measure the RD score (bits*lambda+SSD) of all the frames encoded.

4. Pick a new quantizer, GOTO 1. Do this for a whole bunch of quantizers.

5. Pick the best quantizer of all those tried. This is the quantizer for the frame. Encode the frame.

6. Go to the next frame. :devil:

(This is not as slow as you would think it is because this lookahead pass is done with very very very fast encoding settings and with bitstream writing turned off, so you can pretty much calculate the performance of this by making a rough estimate of how many QPs are tried (say, 10) and multiplying times the lookahead size and using the speed equal to the speed of a very fast first pass.)
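
The steps above can be sketched as a short loop. This is a toy model under stated assumptions, not the actual patch: toy_encode is a made-up stand-in for a fast trial encode, and the complexity numbers, the error-propagation factor, and the fixed lambda are invented. Only the control flow and the bits*lambda+SSD score follow the description in the post.

```python
def toy_encode(complexity, qp, ref_ssd):
    """Hypothetical stand-in for a fast trial encode. Returns (bits, ssd).

    Assumes the quantizer step doubles every 6 QP (as in H.264), that bits
    fall and distortion rises with the step, and that half of the reference
    frame's error carries over. All constants are invented.
    """
    step = 2.0 ** ((qp - 4) / 6.0)
    bits = 1000.0 * complexity / step
    ssd = complexity * step * step + 0.5 * ref_ssd
    return bits, ssd


def rdrc_pick_qps(complexities, first_pass_qps, candidate_qps, lookahead, lam):
    """Pick one quantizer per frame; also count the trial encodes performed."""
    chosen = []
    trials = 0
    ref_ssd = 0.0  # distortion of the previous frame after its final encode
    n = len(complexities)
    for i in range(n):
        best_qp, best_score = None, float("inf")
        for qp in candidate_qps:  # step 4: try a whole bunch of quantizers
            # Step 1: encode the current frame at the trial quantizer.
            bits, ssd = toy_encode(complexities[i], qp, ref_ssd)
            total_bits, total_ssd, prev_ssd = bits, ssd, ssd
            trials += 1
            # Step 2: encode the next X frames at their first-pass quantizers.
            for j in range(i + 1, min(i + 1 + lookahead, n)):
                bits, ssd = toy_encode(complexities[j], first_pass_qps[j], prev_ssd)
                total_bits += bits
                total_ssd += ssd
                prev_ssd = ssd
                trials += 1
            # Step 3: RD score over all frames encoded in this trial.
            score = total_bits * lam + total_ssd
            if score < best_score:
                best_qp, best_score = qp, score
        # Step 5: keep the best quantizer and do the final encode.
        _, ref_ssd = toy_encode(complexities[i], best_qp, ref_ssd)
        chosen.append(best_qp)  # step 6: move on to the next frame
    return chosen, trials


if __name__ == "__main__":
    complexities = [4.0, 9.0, 2.0, 6.0, 3.0]
    first_pass_qps = [24, 24, 24, 24, 24]
    qps, trials = rdrc_pick_qps(complexities, first_pass_qps, range(20, 30), 2, 0.5)
    print(qps, trials)
```

With 10 candidate quantizers and a lookahead of 2, the five frames in this example need 120 trial encodes, which is the "number of QPs times lookahead size" cost estimate in miniature.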

Chengbin
21st February 2009, 22:30
Thank you Dark Shikari.

Obviously RDRC can't be that simple, otherwise it would be released already.

Do you mind answering the other questions in my previous post? I really need to know them in order to know if RDRC is for me, or it is for fun.

Can you use RDRC and specify a specific size?

Dark Shikari
21st February 2009, 22:41
Thank you Dark Shikari.

Obviously RDRC can't be that simple, otherwise it would be released already.

Simple does not mean it should be released.

Can you use RDRC and specify a specific size?

No. That could be coded, but it would require some modifications to ratecontrol.

Chengbin
21st February 2009, 22:48
Simple does not mean it should be released.

Then why bother coding it?

It works with main profile right?

Assuming you use the same settings in CRF and RDRC, roughly how much more efficient is RDRC? Roughly how much longer will it take?

Dark Shikari
21st February 2009, 22:54
Then why bother coding it?

Because it's a good platform for testing experimental ratecontrol ideas, much like QNS is a good platform for testing arbitrary quality metrics.

It works with main profile right?

It works with any profile.

Assuming you use the same settings in CRF and RDRC, roughly how much more efficient is RDRC?

I couldn't give you a number, and RDRC optimizes purely for PSNR (and my modification of it optimizes for SSIM). This may not be the best from a visual standpoint.

MB-tree AQ would be a better way to solve the problem.

cyberbeing
21st February 2009, 23:59
MB-tree AQ would be a better way to solve the problem.
The mythical MB-tree patch. :devil:
Did you ever find someone committed to coding the MB-tree patch? Is there still any chance of making it a project for Google Summer of Code, does it still seem you may have to code it yourself, or are there still more important things which need to be finished first?

Sagittaire
22nd February 2009, 00:17
RDRC is incredibly simple.

1. Encode the current frame at a certain quantizer. This isn't at all incompatible with AQ and works how you think it would.

2. Encode the next X frames at the quantizers they were encoded at in the first pass.

3. Measure the RD score (bits*lambda+SSD) of all the frames encoded.

4. Pick a new quantizer, GOTO 1. Do this for a whole bunch of quantizers.

5. Pick the best quantizer of all those tried. This is the quantizer for the frame. Encode the frame.

6. Go to the next frame. :devil:



Isn't it possible to do RDRC at the macroblock level instead of the frame level?

1. Encode the current frame at a certain quantizer.

2. Encode the next X frames at the quantizers they were encoded at in the first pass.

3. Measure the RD score (bits*lambda+SSD) at macroblock level of all the frames encoded.

4. Pick a new quantizer, GOTO 1. Do this for a whole bunch of quantizers.

5. Pick the best quantizer for each macroblock of all those tried. Encode the frame.

6. Go to the next frame. :devil:

That should work like a complexity mask: a higher relative quantizer for high-complexity blocks and a lower relative quantizer for low-complexity blocks, if you choose a lower relative lambda for high-complexity blocks (bit-saving advantage) and a higher relative lambda for low-complexity blocks (quality-saving advantage).

Dark Shikari
22nd February 2009, 00:19
Isn't it possible to do RDRC at the macroblock level instead of the frame level?

1. Encode the current frame at a certain quantizer.

2. Encode the next X frames at the quantizers they were encoded at in the first pass.

3. Measure the RD score (bits*lambda+SSD) at macroblock level of all the frames encoded.

4. Pick a new quantizer, GOTO 1. Do this for a whole bunch of quantizers.

5. Pick the best quantizer for each macroblock of all those tried. Encode the frame.

6. Go to the next frame. :devil:

That should work like a complexity mask: a higher relative quantizer for high-complexity blocks and a lower relative quantizer for low-complexity blocks, if you choose a lower relative lambda for high-complexity blocks (bit-saving advantage) and a higher relative lambda for low-complexity blocks (quality-saving advantage).

You could only really do this correctly if you did it for each macroblock separately, which would make it about 1000 times slower.

However, this would accurately emulate the effect of MBtree (in fact, it would emulate a completely-optimal MBtree) and might be practical to use on something like QCIF footage for testing.
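
For a sense of scale, a quick count of macroblocks per frame helps. This is a minimal sketch, assuming H.264's 16x16 macroblocks; the function name and the list of resolutions are picked for the example, and reading the "about 1000 times" figure as a rough macroblock count is an interpretation, not something stated in the post.

```python
import math

def macroblocks_per_frame(width, height, mb_size=16):
    """Number of mb_size x mb_size macroblocks needed to cover a frame."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

if __name__ == "__main__":
    resolutions = {"QCIF": (176, 144), "720x480": (720, 480), "1280x720": (1280, 720)}
    for name, (width, height) in resolutions.items():
        print(name, macroblocks_per_frame(width, height))
```

QCIF has 99 macroblocks per frame, against 1350 for 720x480 and 3600 for 1280x720, which fits the suggestion that a per-macroblock search is only practical on very small footage.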

Sagittaire
22nd February 2009, 00:30
However, this would accurately emulate the effect of MBtree (in fact, it would emulate a completely-optimal MBtree) and might be practical to use on something like QCIF footage for testing.

Yes, and the advantage of RDRC at the macroblock level is that you can introduce HVS masking (spatial and temporal complexity, luma, contrast...) with a different lambda at the macroblock level. Certainly the Holy Grail for RC... with a very powerful CPU.

SZGY
22nd February 2009, 03:47
I did a really small benchmark (1000 frames) on my Q6600 under XP x64 with x264.r1114M w/ avs64. The 64 bit version seems to be less than 4% faster than the 32bit one. But hey, that's a start :)

lexor
22nd February 2009, 16:03
MeGUI just updated me from 1113 to 1114 skystrife build and the thing now says it is incompatible with my x64 OS. :( (same problem as komisar reported on previous page)