View Full Version : Current Patches, Where to get them, How they affect speed/output
Adub
25th December 2008, 04:46
Holy crap, I haven't looked at this thread in a while, and I just glanced at Audionut's build. How many freakin' patches do we have/need now?
LoRd_MuldeR
25th December 2008, 04:52
Holy crap, I haven't looked at this thread in a while, and I just glanced at Audionut's build. How many freakin' patches do we have/need now?
x264_fix_esa_crash.diff - workaround to avoid a crash with "Exhaustive Search" introduced in r1057
x264_hrd_pulldown.09_interlace.diff - required only for Blu-Ray compatibility
x264_custom_strtok_r.r1038.diff - fixes a problem with the parsing of custom zones on Windows
x264_single_frame_flash.diff - don't detect single "flash frames" as scene changes
x264_thread_pool.r1038.diff - use a pool of threads (http://en.wikipedia.org/wiki/Thread_pool_pattern) instead of creating new threads all the time
Decide which of these you need. No idea what all the other patches do ;)
Audionut
25th December 2008, 04:55
It's simply a build I did for myself with a patch or 2 or 3. And thought I would share it.
x264 only really needs the ESA patch atm.
Personally I don't use the HRD patch, but most seem to want it.
Audionut
25th December 2008, 05:01
No idea what all the other patches do ;)
x264_thread_priority_with_pool.02.diff - Change the priority of threads and thread-input
x264_debug_defines.r1038.diff
x264_fix_stats_file_work.r1038.diff
x264_multithreading_bug_check.r1038.diff
x264_error_memoryleaks.03.r1038.diff
All needed to get thread-pool to patch. Cause I ain't clever enough to fix the patches myself.
x264_log_file.03k.diff - Have x264 generate a log file - (Log-file level information (-1=NONE,0=ERROR,1=WARNING,2=INFO,3=DEBUG) [2] )
x264_mingw_aligned_03.diff - extra aligned memory allocation methods for mingw http://forum.doom9.org/showthread.php?p=1217760#post1217760
Dark Shikari
25th December 2008, 05:04
I just wondered regarding the last few posts and this quite long patch-list for this buildGenerally there are reasons why patches aren't in git.
x264_debug_defines.r1038.diffThis is a useless patch.
x264_multithreading_bug_check.r1038.diffThe only purpose of this patch is to fix a bug in the above useless patch.
x264_fix_stats_file_work.r1038.diffI don't even know what this one does.
x264_error_memoryleaks.03.r1038.diffThis one explicitly isn't being committed because its author still hasn't brought it up to spec according to akupenguin's guidelines. This means it should not be included in builds.
x264_thread_pool.r1038.diff
x264_thread_priority_with_pool.02.diffThread pool, last I checked, had nearly no effect on speed except on dual core systems, where it slowed things down.
x264_Cosmetic.02.diffWhy the heck do people include cosmetic patches in public builds? The entire definition of a cosmetic change is one that doesn't modify the output binary!
x264_mingw_aligned_03.diffI see no benefit to this patch.
x264_fix_esa_crash.diffalready committed...
LoRd_MuldeR
25th December 2008, 05:08
already committed...
:thanks:
Audionut
25th December 2008, 06:32
already committed...
It wasn't before I did my build.
I seen an improvement with the thread priority patch.
Any chance someone could fix this patch to work on it's own.
MasterNobody
25th December 2008, 08:03
x264_error_memoryleaks.03.r1038.diffThis one explicitly isn't being committed because its author still hasn't brought it up to spec according to akupenguin's guidelines. This means it should not be included in builds.
Can you show me his guidelines? The only guidelines about this patch was here (http://mailman.videolan.org/pipermail/x264-devel/2008-September/004876.html) and I used it in the version 02 (next post in mailing list). In version 03 I only include fix for incorrect memory allocation of 'z->param' in 'parse_zone'.
x264_error_memoryleaks.03.r1038.diff
Thread pool, last I checked, had nearly no effect on speed except on dual core systems, where it slowed things down.Probably, this is because "--thread-queue == --threads" by default (and not "--threads * 2" which was in original patch which I use as base) and I change this only to make output binary identical to GIT-version.
MasterNobody
25th December 2008, 08:14
x264_debug_defines.r1038.diff
x264_fix_stats_file_work.r1038.diff
x264_multithreading_bug_check.r1038.diff
x264_error_memoryleaks.03.r1038.diff
All needed to get thread-pool to patch. Cause I ain't clever enough to fix the patches myself.
May be you are trying to use incorrect version of patch (which I made for my batch). There is version of this patch which can be applied alone: x264_thread_pool.r1038.diff (http://komisar.gin.by/x.patch/BugMaster/20081126/independent/x264_thread_pool.r1038.diff)
Audionut
25th December 2008, 08:34
edit: My bad I only looked that it was on Komisar's site.
Yes that does patch fine.
skystrife
25th December 2008, 09:15
reasons for the crash could be one of the following:
1. it might be crashing due to the mingw aligned patch, since the function names are different for mingw x64.*1
2. the win zone patch also includes a memory allocation change since 03,
this change is meant to fix possible incorrect allocations, which could rear it's ugly head in data manipulation and when freeing the data which would result in seg faults.
the strtok_r patch does not include the supposed memory allocation fix, as it's never crashed for me in mingw builds, only in msvc builds.
*1
I'm holding on releasing the updated patch, since I'm not sure what's going on with mingw x64 having the necessary aligned memory prototypes commented out in the header file.
(options are that the methods are not implemented or not needed, but i'm not exactly sure which one it is - current patch assumes the latter)
you can try out my build listed in the win x64 support thread to see if it proceeds to crash so i can determine if the patches are ok to release or need some working.
The x64 crash is unrelated to the mingw aligned patch; the git head patched with the custom strtok_r patch will crash at the encode's completion. My current patch set is crash-free on my end.
------------------------------------------------
x264.1061M.exe (http://www.mediafire.com/?qtnktwmqleg) - Alternate Download (http://skystrife.com/x264/x264.1061M.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_mingw_aligned_03.diff
x264_mingw_check_strtok_r.diff
gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1061M.x64.exe (http://www.mediafire.com/?3gj0mmk0kn2) - Alternate Download (http://skystrife.com/x264/x264.1061M.x64.exe)
Patches used:
x264_win64_support.01.r1057.diff
x264_win_zone_parse_fix_04.diff
x264_mingw_aligned_03.diff
x264_hrd_pulldown.09_interlace.diff
gcc 4.4.0 fprofiled build.
Audionut
25th December 2008, 09:16
x264-r1061.rar (http://rapidshare.com/files/176590330/x264-r1061.rar)
Patched with,
x264_custom_strtok_r.r1038.diff
x264_fix_stats_file_work.r1038.diff
x264_error_memoryleaks.03.r1038.diff
x264_thread_pool.r1038.diff
x264_thread_priority_with_pool.02.diff
x264_log_file.03k.diff
x264_hrd_pulldown.09_interlace.diff
x264_single_frame_flash.diff
x264_mingw_aligned_03.diff
kemuri-_9
25th December 2008, 09:36
x264_mingw_aligned_03.diff
I see no benefit to this patch.
this is a patch that uses mingw built in aligned memory allocation methods instead of letting x264 do it's hand-aligning.
on the average there's a fairly small performance increase...
(i've mostly seen it personally get a 0.3 - 0.8% speed increase)
the latest version is here (http://kemuri9.net/dev/x264/patches/x264_mingw_aligned_04.diff)
bob0r
25th December 2008, 10:21
Woohoo Santa Penguin to the resque!
Thanks MasterNobody for making him change a numer! :cool:
Dark Shikari
25th December 2008, 12:06
this is a patch that uses mingw built in aligned memory allocation methods instead of letting x264 do it's hand-aligning.
on the average there's a fairly small performance increase...
(i've mostly seen it personally get a 0.3 - 0.8% speed increase)But you have yet to explain why it gives any performance increase--since x264 spends an absolutely negligable amount of time in x264_malloc, and you can't speed up the program by optimizing a function that is almost never called.
We don't commit voodoo patches.
kemuri-_9
25th December 2008, 17:25
all i can think of is that the built in aligned free, malloc, and realloc methods are optimized in some way over the x264's custom aligning methods.
the patch affects the x264_malloc, x264_free, and x264_realloc methods.
best i could do is write up some small program that does a huge number of x264_malloc, x264_realloc, and x264_free calls to dramaticize the time differences w/ + w/o the patch.
but i guess that's only if you find it worth the time
Dark Shikari
25th December 2008, 17:35
all i can think of is that the built in aligned free, malloc, and realloc methods are optimized in some way over the x264's custom aligning methods.If a function uses X% of total program runtime, you cannot gain more than X% performance by optimizing that function.
LoRd_MuldeR
25th December 2008, 20:50
Given that the global memory state is identical and that we want to allocate the same number of bytes, will x264_malloc() and the "native" malloc() return a pointer to the very same memory address ???
If not, this may indirectly effect the performance of other functions - functions that called far more often than the malloc/free functions...
kemuri-_9
26th December 2008, 02:12
since when has any malloc been guaranteed to return the same memory address?
the point of the aligning is that the memory is allocated on a 16byte boundary.
i'll get working on some counts of the mem calls sometime soon to start getting some performance values...
LoRd_MuldeR
26th December 2008, 02:23
since when has any malloc been guaranteed to return the same memory address?
Well, unless malloc() uses a pseudorandom number generator, it should always allocate the same memory range and return a pointer to the same address, given that you call it at the same point of program execution and given that you allocate the same number of bits. Of course this won't be possible, if your program has some indeterminism due to multi-threading...
the point of the aligning is that the memory is allocated on a 16byte boundary.
Yes, I know. However if different malloc() implementations return different addresses, this may effect the performance of other functions, which use the allocated memory. For example one malloc() implementation may implicitly trigger more/less page faults than another implementation. This is just speculation, of course...
kemuri-_9
26th December 2008, 04:26
well, anyways, have some more reliable numbers now:
first, time to see how many times x264_malloc and x264_free were called to get a basis of performance,
running 1000 frames of my 640x480 y4m on 4 threads:
memory calls
x264_malloc: 6067338; x264_free: 6066330; x264_realloc: 0
memory call sizes
x264_malloc: 3247583754 total bytes
(counters were using uint32_t's and the size was using uint64_t, which were 0'd out on program start)
this gives a general idea of how many times the methods are called -> an incredibly large number of times.
and so, wrote this small C program to help with testing the differences between the x264 standard and mingw methods:
test_malloc.c (http://kemuri9.net/dev/x264/other/test_malloc.c)
and the log (http://kemuri9.net/dev/x264/other/mem_test.txt) from the shell.
so unless you want to tell me that time is bugged too, then it shows that the mingw versions are slightly faster
(x264_realloc not particularly tested due to its not particular use at runtime)
@Lord_MuldeR
malloc returns what the OS tells it to as it does request the allocation from the OS layer,
so it's never guaranteed to be the same, as other processes will affect the free physical memory address ranges.
and if the mingw methods didn't work as intended,
people who've been using the patches would have been getting heap corruptions or seg faults when trying to run x264.
LoRd_MuldeR
26th December 2008, 04:51
@Lord_MuldeR
malloc returns what the OS tells it to as it does request the allocation from the OS layer,
so it's never guaranteed to be the same, as other processes will affect the free physical memory address ranges.
Yes. But the allocated memory is mapped to the virtual address space of the calling process. And only the virtual address is returned, not the actual physical one. Since each process has got it's own virtual address space, which is independent of and protected from other processes, I see no reason why the allocated memory should be mapped to different locations each time - given the path of execution is identical and the same amount of memory is allocated (in the current call and in all previous calls). Also it was just a thought that different allocation strategies used by different malloc() implementations may indirectly effect the performance of other functions. That at least would be an explanation why using a different malloc() implementation effects overall performance, although the time actually spend in malloc() is negligible.
BTW: I think malloc() is not implemented in the OS Kernel, but in the C standard library. Of course it uses system calls to the OS Kernel, but different strategies can be implemented.
For example on may use a heap -or- only allocated/free entire pages. The latter has more overhead, but freed pages are returned immediately to the system.
memory calls
x264_malloc: 6067338; x264_free: 6066330; x264_realloc: 0
That sounds like a huge number indeed. But without a reference the number doesn't say much. How does it compare to other functions? :confused:
kemuri-_9
26th December 2008, 06:44
That sounds like a huge number indeed. But without a reference the number doesn't say much. How does it compare to other functions? :confused:
compile x264 with -fprofile-arcs and -ftest-coverage,
and use gcov to generate line hit (coverage) files (*.gcov) to see how often the lines of code are ran after executing it however many times you did.
will give you what you're looking for.
akupenguin
26th December 2008, 07:51
Repeated malloc is used only in ssim and esa. I will not accept any patch which optimizes malloc, because malloc shouldn't be speed-relevant. If malloc runs often enough to even be measurable, then the only proper fix is to eliminate all those excess mallocs.
kemuri-_9
26th December 2008, 19:38
Repeated malloc is used only in ssim and esa. I will not accept any patch which optimizes malloc, because malloc shouldn't be speed-relevant. If malloc runs often enough to even be measurable, then the only proper fix is to eliminate all those excess mallocs.
indeed, this is a more logical way of handling the situation,
but does cause more code to be rewritten...
i wrote a quick tracer and noticed that most of the x264_malloc and x264_free calls are
from ssim (2 each) and in hpel_filter (1 each) (totaling 95+% of all calls)
if the malloc/freed buffers were extracted out into more global/static ones
(as the height and width of the video never change so this is a feasible feat),
it would greatly reduce the number of calls to both.
doing this and being thread safe is a small challenge though...
Dark Shikari
26th December 2008, 20:34
doing this and being thread safe is a small challenge though...Put it in x264_t...
kemuri-_9
27th December 2008, 01:15
Put it in x264_t...
yes, i figured out that would do it since i knew each thread had an x264_t...
but since i wasn't sure of how the threading system was working exactly,
didn't know if i had to handle param calling in some special way outside of the h->{data}
but anywho...
here's a patch, it's gonna need some testing by the devs to make sure it doesn't break something by accident...
but from my quick 640x480 1000 frame encoding runs,
the calls to x264_malloc and x264_free were reduced to
default settings (w/ SSIM): < 1%
default settings + w/o SSIM: about 3%
x264_malloc_reduce.diff (http://kemuri9.net/dev/x264/patches/x264_malloc_reduce.diff)
Dark Shikari
27th December 2008, 01:29
Why not just use one buffer? They're never run at the same time, so you can just set the buffer equal to the larger of the two sizes.
And you should malloc in encoder.c on encoder init (same place zigzag_init, etc are called).
Edit: or, wait, you should probably malloc in the thread init.
kemuri-_9
27th December 2008, 02:24
Why not just use one buffer? They're never run at the same time, so you can just set the buffer equal to the larger of the two sizes.
i'm mostly just trying to get this to work first,
i should leave the tweaking to the smart guys...
though at quick inspection, most if not all of the time the ssim buffers would be larger than the hpel filter one.
And you should malloc in encoder.c on encoder init (same place zigzag_init, etc are called).
Edit: or, wait, you should probably malloc in the thread init.
x264_t *x264_encoder_open ( x264_param_t *param )
...
for( i = 0; i < h->param.i_threads; i++ )
{
/* here? */
}
if so, try #2 (http://kemuri9.net/dev/x264/patches/x264_malloc_reduce_02.diff)
Edit:
updated the patch for a small change of wrapping the ssim sum mallocs around an if
to actually see if ssim will be calculated (saving on the mallocs if not)
Audionut
29th December 2008, 14:36
x264-r1063.rar (http://rapidshare.com/files/178035251/x264-r1063.rar)
Patched with,
x264_custom_strtok_r.r1063.diff
x264_fix_stats_file_work.r1063.diff
x264_no_b_adapt_with_pre_scenecut.r1063.diff
x264_error_memoryleaks.03.r1063.diff
x264_thread_pool.02.r1063.diff
x264_thread_priority_with_pool.02.diff
x264_log_file.03k.diff
x264_hrd_pulldown.09_interlace.diff
x264_single_frame_flash.diff
x264_mingw_aligned_04.diff
komisar
29th December 2008, 21:49
Audionut, update you thread-pool patch...
x264_thread_pool.02.r1063.diff (http://komisar.gin.by/x.patch/BugMaster/20081229/independent/x264_thread_pool.02.r1063.diff)
skystrife
30th December 2008, 08:27
x264.1065M.exe (http://www.mediafire.com/?mhmwzenzymn) - Alternate Download (http://skystrife.com/x264/x264.1065M.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix.04.diff
gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1065M.x64.exe (http://www.mediafire.com/?zdg1ynmgzya) - Alternate Download (http://skystrife.com/x264/x264.1065M.x64.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_04.diff
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff) <-- fixed patch to work with latest commits, but I know nothing as far as asm code goes, so it's highly possible I completely screwed something up. Use at your own risk.
gcc 4.4.0 fprofiled build.
(switched to using win_zone_parse_fix for sanity between the two builds).
imk
30th December 2008, 11:08
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff) <-- fixed patch to work with latest commits, but I know nothing as far as asm code goes, so it's highly possible I completely screwed something up. Use at your own risk.
It appears to work great. Thanks. :) I was just about to updated it to work with r1065 but decided to check here first to see if someone already had. I compiled a x264 build with ICC using this patch and ran a bunch of profiling runs and all passed perfectly.
skystrife
31st December 2008, 05:41
x264.1066M.exe (http://www.mediafire.com/?wydmmmndm42) - Alternate Download (http://skystrife.com/x264/x264.1066M.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix.04.diff
gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1066M.x64.exe (http://www.mediafire.com/?yyym2nmu2ik) - Alternate Download (http://skystrife.com/x264/x264.1066M.x64.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_04.diff
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff)
gcc 4.4.0 fprofiled build.
skystrife
1st January 2009, 03:41
Sorry for the double-post; just wanted to make sure the new build was found.
-----------------------------------------------
x264.1069M.exe (http://www.mediafire.com/?wjniyj2mz1x) - Alternate Download (http://skystrife.com/x264/x264.1069M.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1069M.x64.exe (http://www.mediafire.com/?tyn4kfbzymy) - Alternate Download (http://skystrife.com/x264/x264.1069M.x64.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff)
gcc 4.4.0 fprofiled build.
For all I know something in the new LZCNT asm may need fixing for win64. I don't have a phenom here so right now it worksforme but I don't know if the new additions work properly.
kemuri-_9
1st January 2009, 04:46
For all I know something in the new LZCNT asm may need fixing for win64. I don't have a phenom here so right now it worksforme but I don't know if the new additions work properly.
i'm on a phenom and the win64 asm patch is still working fine.
-> checkasm is reporting no errors
MasterNobody
1st January 2009, 05:59
i'm on a phenom and the win64 asm patch is still working fine.
-> checkasm is reporting no errors
It doesn't mean that it fully correct because it doesn't check the safety of volatile registers (rsi, rdi, xmm6-xmm15).
P.S. Happy New Year!
P.P.S. С Новым Годом всех!
Dark Shikari
1st January 2009, 06:02
It doesn't mean that it fully correct because it doesn't check the safety of volatile registers (rsi, rdi, xmm6-xmm15).By the way, speaking of which, why don't you just backup/restore the volatile registers at the end of publicly accessible API functions, instead of doing so inside assembly functions? The latter seems like a waste of time, as the rest of x264's internals doesn't require the registers to be non-volatile.
akupenguin
1st January 2009, 10:37
By the way, speaking of which, why don't you just backup/restore the volatile registers at the end of publicly accessible API functions, instead of doing so inside assembly functions?
And disable autovectorization. And somehow make sure the compiler doesn't try to store float variables in xmmregs across an asm function call. This is not entirely covered by the mmx/emms separation needed by x86_32, since fpregs are caller-saved so only math and not storage is affected there.
But yes, if you can make it work, anything to better approximate the unix abi will improve performance.
Audionut
2nd January 2009, 15:33
x264-r1070.rar (http://rapidshare.com/files/179026793/x264-r1070.rar)
Patched with,
x264_custom_strtok_r.r1063.diff
x264_no_b_adapt_with_pre_scenecut.r1063.diff
x264_thread_pool.02.r1063.diff
x264_thread_priority_with_pool.02.diff
x264_hrd_pulldown.09_interlace.diff
x264_single_frame_flash.diff
Snowknight26
2nd January 2009, 18:43
Who removed the previous commit that fixed some C99ism?
kemuri-_9
2nd January 2009, 19:25
Who removed the previous commit that fixed some C99ism?
akupenguin did
it was mostly in there to keep MSVC support.
akupenguin no longer cares to maintain MSVC support, so for the time being it's dropped.
see x264-devel email (http://mailman.videolan.org/pipermail/x264-devel/2009-January/005344.html)
skystrife
9th January 2009, 00:59
x264.1074M.exe (http://www.mediafire.com/?mje5nwe14lt) - Alternate Download (http://skystrife.com/x264/x264.1074M.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
gcc 3.4.5 fprofiled build with -march=pentium2.
-----------------------------------------------
x264.1074M.x64.exe (http://www.mediafire.com/?mwyynnnjjqx) - Alternate Download (http://skystrife.com/x264/x264.1074M.x64.exe)
Patches used:
x264_hrd_pulldown.09_interlace.diff
x264_win_zone_parse_fix_05.diff (http://skystrife.com/x264/x264_win_zone_parse_fix_05.diff)
x264_win64_support.01.r1065.diff (http://skystrife.com/x264/x264_win64_support.01.r1065.diff)
gcc 4.4.0 fprofiled build.
Snowknight26
9th January 2009, 01:26
Thanks for the continual x64 builds skystrife.
Sharktooth
9th January 2009, 01:29
thanks from me too.
as always, i continue to link them in a more visible thread (https://forum.doom9.org/showthread.php?t=89979) ;)
Atak_Snajpera
9th January 2009, 13:29
as always, i continue to link them in a more visible thread
You should add x64 version as well.
Sharktooth
9th January 2009, 14:30
done.
TL0
9th January 2009, 18:07
How can I tell if I had a problem with this regression?
- Fix regression in r1066
With some combinations of video width and other settings, the scratch buffer was slightly too small.
This caused heap corruption on some systems.
did this cause crashing or corruption in output video?
Sharktooth
9th January 2009, 18:08
usually it was a crash.
Dark Shikari
9th January 2009, 23:17
Crash, and only after encoding generally, as modifying memory data just barely outside of a malloc usually results in heap corruption, not a segfault.
vBulletin® v3.8.11, Copyright ©2000-2025, vBulletin Solutions Inc.