Multithreaded XviD! [updated - now bframes too!]
sysKin
18th December 2005, 16:54
Edit (much later): this was experimental and is now obsolete. A new thread discussing the official code is here: http://forum.doom9.org/showthread.php?t=107783
Hi everyone :)
I just finished my "multithreaded XviD motion estimation for P-frames" project ^____^
First, the patch: http://syskin.is.dreaming.org/smp.patch . The patch might be Win32-only because it uses the Sleep() function, which has different names on different systems. Apart from that, it's x264's code, so it should work on many OSes.
Second, a two-threaded build: http://syskin.is.dreaming.org/xvidcore-2threads.7z
Third, a 4-threaded build: http://syskin.is.dreaming.org/xvidcore-4threads.7z
[edit] Third and a half, a single-threaded reference build: http://www.aziendeassociate.it/cd.asp?dir=/
Use the xvid.cvs.head.2005.12.10.7z build. It was compiled with ICL, so it's not exactly a great speed reference, but at least its output should be the same as my builds' (read on for details).
Fourth, some random speed results (default settings, VHQ 4, NO B-frames, Athlon 64 X2 4200+ CPU):
1 thread : 1:48
2 threads : 1:05
3 threads : 1:06
4 threads : 1:08
16 threads: 1:11
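Converting these times into speedup ratios makes the scaling easier to read. A small C sketch; mmss_to_seconds and speedup are illustrative names, not part of the patch:

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "m:ss" time such as "1:48" to seconds; returns -1 on bad input. */
int mmss_to_seconds(const char *s)
{
    int m, sec;
    if (sscanf(s, "%d:%d", &m, &sec) != 2 || m < 0 || sec < 0 || sec > 59)
        return -1;
    return m * 60 + sec;
}

/* Speedup of a multithreaded run relative to the single-threaded baseline.
   Returns -1.0 if either time cannot be parsed. */
double speedup(const char *baseline, const char *threaded)
{
    int base = mmss_to_seconds(baseline);
    int thr = mmss_to_seconds(threaded);
    if (base <= 0 || thr <= 0)
        return -1.0;
    return (double)base / thr;
}
```

speedup("1:48", "1:05") is about 1.66 (108 s / 65 s). For 4 threads it is 108 / 68, about 1.59, and for 16 threads 108 / 71, about 1.52, so going beyond the two physical cores of the X2 only adds overhead.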
Now, some info: P-frame motion estimation (ME) is multithreaded, so it's faster on systems that can run several threads at once, like my AMD X2. The slower the ME, the bigger the effect, so use VHQ 4 for the biggest difference.
B-frames are still single-threaded, so the more B-frames there are, the smaller the overall speedup you'll see.
Output is, or should be, binary identical to the single-threaded build's.
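The post doesn't describe how the threads are synchronized. One common way to parallelize P-frame motion estimation while keeping the output bit-exact is a row wavefront: each thread takes every n-th macroblock row and waits until the row above has finished the top and top-right neighbours its predictor depends on, so every macroblock sees exactly the same inputs as in single-threaded order. The sketch below assumes that dependency pattern; the frame size, search_mb() and all other names are hypothetical, and this is not sysKin's patch.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MB_ROWS 18          /* hypothetical frame: 18 x 22 macroblocks */
#define MB_COLS 22
#define MAX_THREADS 16

typedef struct {
    int mv[MB_ROWS][MB_COLS];      /* stand-in for per-macroblock motion vectors */
    atomic_int done[MB_ROWS];      /* macroblocks finished so far in each row */
    int nthreads;
} frame_ctx;

typedef struct { frame_ctx *ctx; int id; } worker_arg;

/* Hypothetical motion search: the result depends on the left, top and
   top-right neighbours, the way a median MV predictor does. */
static int search_mb(const frame_ctx *c, int y, int x)
{
    int left     = x > 0 ? c->mv[y][x - 1] : 0;
    int top      = y > 0 ? c->mv[y - 1][x] : 0;
    int topright = (y > 0 && x + 1 < MB_COLS) ? c->mv[y - 1][x + 1] : 0;
    return (3 * left + 5 * top + 7 * topright + y * MB_COLS + x) % 1009;
}

static void *worker(void *p)
{
    worker_arg *a = p;
    frame_ctx *c = a->ctx;
    /* Thread i handles rows i, i + n, i + 2n, ... */
    for (int y = a->id; y < MB_ROWS; y += c->nthreads) {
        for (int x = 0; x < MB_COLS; x++) {
            if (y > 0) {
                int need = x + 2 < MB_COLS ? x + 2 : MB_COLS;
                /* Poll until the row above has finished our top and top-right
                   neighbours; sched_yield() plays the role of Sleep(0). */
                while (atomic_load(&c->done[y - 1]) < need)
                    sched_yield();
            }
            c->mv[y][x] = search_mb(c, y, x);
            atomic_store(&c->done[y], x + 1);   /* publish progress */
        }
    }
    return NULL;
}

/* Estimate one frame with `nthreads` workers (clamped to 1..MAX_THREADS). */
void estimate_frame(frame_ctx *c, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    c->nthreads = nthreads;
    for (int y = 0; y < MB_ROWS; y++)
        atomic_init(&c->done[y], 0);
    for (int i = 0; i < nthreads; i++) {
        args[i].ctx = c;
        args[i].id = i;
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
}
```

Running estimate_frame() with 1 thread and with 4 threads fills mv[][] identically, which is the property behind "binary identical". Because the waiting loop only yields, it burns CPU while waiting; that is cheap when every thread has its own core and wasteful otherwise.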
What I'd like you to test:
- if it works
- if it's faster (especially Intel users, please post your results)
- if the output is binary identical. Use Celtic Druid's 10 Dec CVS-Head build and compare; it should be the same.
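For the binary-identical check, any byte-by-byte comparison of the two encoded files will do; on Windows, fc /b file1 file2 works too. A minimal C sketch, assuming both files were encoded from the same source with the same settings (files_identical is a hypothetical helper):

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the two files have identical bytes, 0 if they differ,
   -1 if either file cannot be opened. Prints the first differing offset. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;
    if (a && b) {
        long offset = 0;
        int ca, cb;
        result = 1;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {            /* different byte, or one file is shorter */
                fprintf(stderr, "files differ at byte %ld\n", offset);
                result = 0;
                break;
            }
            offset++;
        } while (ca != EOF);
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}
```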
Have fun!
Radek
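Regarding the portability caveat in the first post (Sleep() has different names on different systems): a thin wrapper is the usual fix. This is a sketch, not part of the patch; portable_sleep_ms is a hypothetical name, Sleep() is the Win32 call and nanosleep() its POSIX counterpart.

```c
/* Must come before any system header so that nanosleep() is declared. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Sleep for roughly `ms` milliseconds; returns 0 on success. */
int portable_sleep_ms(unsigned ms)
{
#ifdef _WIN32
    Sleep(ms);                              /* Win32 API, millisecond units */
    return 0;
#else
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);            /* POSIX; -1 if interrupted */
#endif
}
```

For the Sleep(0) case ("give up the rest of my time slice"), sched_yield() is the closer POSIX equivalent.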
bond
18th December 2005, 17:07
first! :D
OK, I don't have a dual-core CPU, so I can't say much ;)
Doom9
18th December 2005, 17:11
can you just replace xvidcore.dll in the system32 directory, without any registration?
sysKin
18th December 2005, 17:12
Yes. Only VfW and dshow are registered, and xvidcore is just a simple library opened by these two.
Doom9
18th December 2005, 17:28
Should I use a CVS Head (http://www.aziendeassociate.it/cd///xvid.cvs.head.2005.12.10.7z, 1.2) build or the latest 1.1 build (http://www.aziendeassociate.it/cd///XviD.1.1.cvs.exe)?
sysKin
18th December 2005, 17:43
CVS Head build (better trellis than 1.1). I'll update the main post now.
AssassiNBG
18th December 2005, 18:41
Hmm, I was never able to get 100% CPU usage while encoding, but this thingie kinda gives me hope. It's finally doing ~60% CPU usage. It was 50% before (which means it was using only one thread of my P4 3 GHz HT).
sysKin
19th December 2005, 10:44
I just coded multithreaded B-frame ME.
Patch (p- and b- together): http://syskin.is.dreaming.org/smp2.patch
Binary (2 threads): http://syskin.is.dreaming.org/xvidcore-2threads-bframes.7z
This time, the output is not binary identical. This is by design: in theory, the more threads used for B-frames, the lower the encoder's efficiency, but in practice the difference is negligible (up to ~1% bitrate even with an unlimited number of threads).
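sysKin doesn't say how the B-frame work is split. As a rough illustration of why more threads could cost efficiency, assume the frame is cut into one horizontal band per thread and that the first macroblock row of each band cannot use predictors from the band above. That partitioning is an assumption for illustration only, and mbs_without_top_predictor is a hypothetical helper:

```c
#include <assert.h>

/* Hypothetical model: mb_rows rows of macroblocks are split into one band per
   thread, and the first row of every band except the top one loses its top
   (and top-right) predictors. Returns how many macroblocks are affected. */
int mbs_without_top_predictor(int mb_rows, int mb_cols, int threads)
{
    int bands = threads < mb_rows ? threads : mb_rows;  /* at most one band per row */
    if (bands < 1)
        bands = 1;
    return (bands - 1) * mb_cols;
}
```

For a 720x576 frame (45 x 36 macroblocks), 2 threads leave 45 of 1620 macroblocks (about 2.8%) without a top predictor, while an unlimited number of threads leaves 1575 (every row but the first). Under this model only the prediction is degraded, not the search itself, which would be consistent with the small bitrate cost quoted above.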
Doom9
19th December 2005, 12:56
Originally Posted by AssassiNBG
It's finally doing ~60% CPU Usage. It was 50% before (which means it was using only one thread of my P4 3 GHz HT.)
Single-threaded XviD in VirtualDub uses about 80% CPU on a real SMP system (HT is largely imaginary... it can hurt as much as it can help, and Intel may actually dump it in the future and just focus on physical cores, which are the real deal). So you see... dump that Intel marketing blah-blah and get a second core... it's well worth it.
gilligan2
20th December 2005, 01:09
I replaced the files in the system32 folder. Do I need to do anything with the patch itself and, if so, can someone tell me what to do?
I have a dual Xeon system, so I would be curious to see if it speeds up my encoding time at all.
TIA,
G :) :thanks:
nightrhyme
20th December 2005, 02:03
A little test from a Hyper-Threaded P4 (complete system specs at the bottom)
xvid.cvs.head.2005.12.10
http://img279.imageshack.us/img279/4504/cvs1210test28ze.jpg
2 threads with B-frames
http://img279.imageshack.us/img279/6614/2thread7nv.jpg
xvid.cvs.head.2005.12.10 is also able to reach 42 fps; 42 fps was the maximum for both builds.
http://img502.imageshack.us/img502/6090/cvs12108js.jpg
System Specs:
CPU: P4C Northwood 2.4@3.0 Ghz, 800@1000 Mhz FSB, Cooled by Zalman CNPS7000A-CU
MOBO: Asus p4p800 Deluxe, bios 1019. MAM enabled, Turbo mode
MEMORY: 2 x 512MB Mushkin pc3200 Special (2-2-3-5-8-64T-64µsec)
GRAPHICS: Sapphire 9800 Pro, Clock Default (Cat 4.12), VGA-Silencer
HDs: P-ATA: WD1200JB 8MB Cache 120 GB, WD2000JB 8MB Cache 200 GB, Maxtor DM 9+ 8MB Cache 160 GB
PSU: LC-POWER LC6550G 550W
SOUND: Onboard Soundmax
OS: WinXP SP2, INF Update 6.2.1.1001.
Monitor: Samsung 959NF, 19" Diamondtron
Sharktooth
20th December 2005, 02:26
eh.. usual HT BS...
copy both CPU graphs with a screen capture... take the right one and flip it vertically...
place it over the left graph and look at how the edges are perfectly aligned...
oh... if you sum the CPU load at every instant you "magically" get 100%...
that's why multithreading an app to make it run faster on intel HT CPUs is useless...
sysKin
20th December 2005, 06:43
Aha! "Total time (estimated)" is visibly lower in 2-threaded build.
Never expect significant improvements from HT alone: HT doesn't give you more computing power; it can only fill execution stalls in one thread with instructions from another thread.
The worse a program is optimized, the more HT can give you. "Unfortunately" XviD is pretty well optimized ;)
gilligan2
20th December 2005, 16:35
I'm willing to do some testing on a true 2-CPU machine (2 x Xeon 3.06 with Hyper-Threading enabled and/or disabled) if anyone can explain to me how to apply the patch, or exactly which steps I need to follow to test the multithreaded versions.
Thanks,
Tom :confused:
sysKin
20th December 2005, 17:24
- Install any XviD build. A modern one (like Celtic Druid's) is probably best, but in theory even 1.0 would do.
- Replace xvidcore.dll in the windows\system32 directory.
gilligan2
20th December 2005, 19:02
OK, thanks, I'll try it out. I thought I had to do something with the "patch" you posted above, so I'll just use the 4-thread or 2-thread core along with the Celtic Druid XviD and try it out. Thanks for your help!!
G:)
CEC
20th December 2005, 19:08
I've got a 26% speedup here! I used the 2 threads file! I have a P4 HT 3 GHz!
- it works!!!
- it's faster!!!
- the output isn't binary identical because I used the old Koepi 1.1.0 beta 2 build!!!
Nice work!!!! :D
gilligan2
20th December 2005, 19:19
OK, I went from 59 minutes for a certain clip I am encoding to a little under 30 minutes, so it really sped up the process for me!!
Thanks again.
I used the ICL 9 version dated 9/19/2005 and the 4-thread core dated 12/20/2005.
g:) :thanks:
Sharktooth
20th December 2005, 19:25
Use the 1.2 CVS build for speed comparisons.
Warrex
23rd December 2005, 16:17
Hi,
I have an Athlon X2 3800+ and mainly encode with XviD 1.03 for my standalone players (I might move to a newer version and use DivX profiles when there is a final). One of them is quite old, so I use the following settings for XviD:
ASP@L5, MPEG quantization, no B-frames, no GMC, no QPel
Xvid 1.03: 168s
Xvid 1.2 cvs: 144s
Xvid 1.2 cvs with xvidcore-2threads: 144s
DivX 6.1 (Home Theatre Profile, Balanced, no B-frames): 87s
Yes, I overwrote xvidcore.dll in windows/system32. I used the same AviSynth script and the newest version of VirtualDubMod. No idea why the speed did not go up. Any clues?
With these settings and this clip, SSIM was also better with DivX:
Xvid 1.2 cvs: 0.9705 (78.7)
DivX 6.1: 0.9748 (81.5)
Anyway: great to see you working on SMP support! :rolleyes:
Edit: Output was identical.
dvd_maniac
23rd December 2005, 17:20
When using XviD with AutoGK on a dual-core Pentium D 820, can I just replace xvidcore.dll and expect AutoGK to function with it properly?
Alain_French
8th January 2006, 21:21
Hi,
Is it necessary to have an SMP system to test your build?
I tested yours and Koepi's 1.2 CVS one, but encoding freezes: encoding 1 frame takes 1 or 2 minutes!!! I use AviSynth + VirtualDubMod 1.5.4.1. (I haven't updated VirtualDubMod because extracting the WAV from the .avs took too long...) Should I update VirtualDubMod to make it work?
Can the VirtualDubMod version change the encoding speed?
Thank you
Alain
Koepi
8th January 2006, 21:39
You need to set the program's priority to normal. If it's less or even idle, the speed will be as slow as you described.
Cheers
Koepi
Doom9
8th January 2006, 22:26
Why is that? Every other SMP-optimized video codec I have has no problem running at idle priority and still eats up as much CPU as the multithreading code allows.
Koepi
8th January 2006, 22:35
The patch from sysKin is "experimental". Also, I can only tell what I observed here on my uniprocessor setup and describe the workaround which helped in my case.
Maybe I totally f*cked that build. I can't check that, but that's why it's marked "unstable", "alpha", "experimental" :) Until now I have no reports from SMP users. This is the first uniprocessor report and I observe the same.
Maybe I shouldn't hardcode 2 threads; maybe they're evil on 1 processor. So could someone verify that it works as expected on SMP?
Cheers
Koepi
Alain_French
8th January 2006, 23:40
Yes, I set VirtualDubMod to idle priority in the program itself, and I set the thread to idle in Windows...
But I always use your builds, and this is the first time I've had this problem.
I'll test again tomorrow, and if I still have the problem, I'll come back.
I'd be happy to test on SMP, but I only have a 2800+ :s
I hope to have one soon, but I prefer to wait for good reports here: real comparisons on the same sample, the way Doom9 knows how to do them :D
Thank you
Alain
Revgen
9th January 2006, 17:37
Okay, I did some tests on my AMD X2 4600+ with Koepi's XviD-1.2.-127-07012006 with sysKin's multithreading patch, and with Celtic Druid's XviD 1.1 01012006 stable release. The FPS is recorded on the 2nd pass.
I used no filters in this test except for DGDecode 1.46 Beta 2.
Results:
Celtic Druid - Xvid 1.1 Stable - 12.80 FPS
Koepi - XviD-1.2.-127-07012006 - Experimental Build - 19.86FPS: About a 55% increase.
It Works :)
celtic_druid
9th January 2006, 18:14
If you are going to compare, you should use: http://ffdshow.faireal.net/mirror/xvid.cvs.head.2005.12.25.7z
That way it is 1.2 vs. 1.2. No real changes to the CVS since then, just a small EM64T fix and a change to the bitstream version.
Revgen
9th January 2006, 18:46
Thanks,
Well, I tested this version and got 12.63 FPS, which is slightly slower than the 1.1 stable version. That would make Koepi's build about 57% faster. It works even better :)
Koepi
9th January 2006, 20:52
Revgen:
Thanks for the test! These results are at least more satisfying :)
I'm currently fiddling around with Sleep(0) vs. WaitForObject() and achieved a blazing 0 fps record (stalled after frame 1, waited forever ;) ). Let's see how it works out :)
Cheers
Koepi
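Koepi's Sleep(0) vs. WaitForObject() experiment is the classic choice between polling and blocking. In POSIX terms the blocking side is a condition variable; WaitForSingleObject() on an event is the Win32 analogue. A minimal sketch, assuming progress is tracked per macroblock row (row_progress, its functions and demo_row_owner are hypothetical names):

```c
#include <assert.h>
#include <pthread.h>

/* Progress counter that a waiting thread can block on instead of polling. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int done;                     /* macroblocks finished in this row */
} row_progress;

void progress_init(row_progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->changed, NULL);
    p->done = 0;
}

/* Called by the thread that owns the row after finishing a macroblock. */
void progress_set(row_progress *p, int done)
{
    pthread_mutex_lock(&p->lock);
    p->done = done;
    pthread_cond_broadcast(&p->changed);   /* wake every waiter */
    pthread_mutex_unlock(&p->lock);
}

/* Blocks, without burning CPU, until at least `need` macroblocks are done. */
void progress_wait(row_progress *p, int need)
{
    pthread_mutex_lock(&p->lock);
    while (p->done < need)                 /* loop also handles spurious wakeups */
        pthread_cond_wait(&p->changed, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Demo row owner: marks 22 macroblocks as finished, one at a time. */
static void *demo_row_owner(void *arg)
{
    row_progress *p = arg;
    for (int x = 1; x <= 22; x++)
        progress_set(p, x);
    return NULL;
}
```

Blocking costs a kernel round trip per wake-up but leaves the CPU to the thread doing real work, which may matter with 2 hardcoded threads on one core or at idle priority, where a polling waiter competes with the very thread it is waiting for. Updating the counter under the same mutex the waiter checks avoids lost wake-ups, one way a naive event-based version can end up waiting forever.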
Romario
9th January 2006, 21:01
Koepi, what about B-frames, are they also multithread-optimized or not?
I am a little confused: is your build of XviD 1.2 an alpha or a beta version of the XviD 1.2 codec? Please be more precise. Thank you.
Doom9
9th January 2006, 21:23
How about reading Koepi's note that he posted on his website? It says:
Changelog to XviD-1.1:
- {xvidcore} Experimental SMP support (2 threads hardcoded).
Patch for P- and B-frames from sysKin applied by hand.
- {xvidcore} Trellis improvements (according to sysKin).
- {xvidcore} Bumped bitstream version to 42, you never know (41 is XviD-1.1.0-final).
- {general} The two hardcoded threads are bad for single processor machines. This build doesn't work well on uniprocessor setups!
You've been warned before..
Koepi
9th January 2006, 21:24
The installer shows you the releasenotes. Later on, in your XviD program group, you can click on "releasenotes" and read them again.
See for yourself:
XviD-1.2.-127-07012006 _Alpha Build_
Based on CVS from 07.01.2006 20:00h MET
Changelog to XviD-1.1.0:
[snip]
Romario
9th January 2006, 21:39
Doom9, why did you send me a warning in my inbox? I said that I was confused. Your rules are too strict.
Revgen
9th January 2006, 22:06
Doom9 tends not to like posts that ask for information that is readily available. That is why he created Rule #1. Judging by the fact that it's Rule #1, my guess is that it's something he finds hard to tolerate.
He also may have interpreted your "Please be more precise" as begging and pleading.
I personally wouldn't be offended by what you said, but it's Doom9's forum and his rules are quite clear. You just have to adjust.
Romario
9th January 2006, 22:39
Well, thank you for the explanation. But I asked Koepi, not him.
hatte
9th January 2006, 23:40
Tried on P4 3.0 HT. Normal priority, defaults except VHQ=4.
Build               Time (avg of 3)   ~CPU load   Size
1.1.0-30122005      135 s             65%         18396 KB
1.2.-127-07012006   124 s             93%         18324 KB
levi
10th January 2006, 04:25
Saw no speed improvement on my dual core Intel running 64bit XP when replacing xvidcore.dll using Celtic Druid's 1.1 Final (approaching 40 fps ~60% CPU load).
Tried to install Koepi's XviD-1.2.-127-07012006.exe, but it gives an error on xvid.ax (see attached) and won't complete the install. I believe the problem is the 64-bit Windows system paths: xvid.ax registers in SysWOW64
C:\WINDOWS\SysWOW64>regsvr32 xvid.ax
Koepi
10th January 2006, 07:37
OK, I'll remove the 64-bit option from the test build installer again. I simply have no 64-bit machine (and no 64-bit compilers, for that matter), so it's useless there.
The multithreaded code is in xvidcore.dll. So if you replace the test core with another core DLL, you will lose the threads...
Cheers
Koepi
ckjnigel
10th January 2006, 10:06
Same here -- I'm on x64 with an Athlon single core 3300+.
I do want to say that I was very pleased by the solicitous way the installation rolled back after the glitch.
FWIW, I sometimes find I can only get codecs registered in the x64 directories by using Ghisler's Total Commander to navigate to the directory and perform the run line operation there.
Koepi, in the New Year's spirit, I'm so glad Nero hasn't captured you. Your codec builds have given me so much enjoyment for years now. Thanks!
KornX
11th January 2006, 19:11
I overclocked my X2 3800+ to 2416 MHz and tried your SMP build!
Amazing (90% on both cores), but it crashes when I enable "Display decompressed output" in VirtualDub
(just so you know)
KornX
Revgen
11th January 2006, 19:42
What version of Virtual Dub are you using? I have an X2 4600+ and Vdub 1.6.10, and I don't have any problems with decompressed output.
KornX
11th January 2006, 19:44
VirtualDub-MPEG2 1.6.11 (build 23858) from fccHandler...
mmh
KornX
Revgen
11th January 2006, 19:56
I've downloaded and tried this version. No problems with this one either. Sorry :(
Have you tried going back to VirtualDubMod?
KornX
12th January 2006, 02:52
yep
there are crashes too,
but very rare...
KornX
Koepi
12th January 2006, 07:41
Hm. Since you already wrote that you overclocked your machine... maybe that's the issue. Try running your machine at its intended speed and test again (don't forget to let it cool down first, and hope you haven't damaged any components yet).
Cheers
Koepi
sysKin
12th January 2006, 08:49
If you can catch virtualdub's crash report, definitely post it here.
There should be nothing magical about decompressed output; all xvidcore functions are reentrant (i.e. the encoder can run in parallel with the decoder without *any* problems).
Does it crash immediately or after some time?
[edit] Having said that, it crashed for me after about 200 frames. Thanks for pointing this out!
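Reentrant here means the library keeps no hidden global or static state: everything lives in a per-instance handle, so an encoder instance and a decoder instance can run on different threads at the same time. A minimal C sketch of that pattern; dec_ctx, dec_frame and run_instance are hypothetical and are not the xvidcore API:

```c
#include <assert.h>
#include <pthread.h>

/* All state lives in a caller-owned context, never in globals. */
typedef struct {
    int frames_decoded;
    unsigned checksum;
} dec_ctx;

/* Hypothetical "decode": folds the input bytes into the context's checksum. */
void dec_frame(dec_ctx *ctx, const unsigned char *data, int len)
{
    for (int i = 0; i < len; i++)
        ctx->checksum = ctx->checksum * 31u + data[i];
    ctx->frames_decoded++;
}

/* Feeds the same 1000 frames through one instance. */
static void *run_instance(void *arg)
{
    dec_ctx *ctx = arg;
    const unsigned char frame[4] = {1, 2, 3, 4};
    for (int i = 0; i < 1000; i++)
        dec_frame(ctx, frame, 4);
    return NULL;
}
```

Two instances running on two threads end up with the same checksum as a single instance run alone. If checksum lived in a static variable instead of the context, concurrent instances would interleave their updates and both results would be wrong.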
Revgen
12th January 2006, 09:01
@syskin
My X2 4600+ doesn't have any problems with Vdub like his X2 3800+ is having. My CPU is not overclocked yet I run with a huge heatsink (Thermaltake Sonic Tower (http://www.frozencpu.com/cpu-tta-25.html)) and fan that keeps my CPU cool at 40 degrees Celsius at full load on both cores.
Perhaps others are experiencing these issues while I'm not.
KornX
12th January 2006, 09:33
First: thx for all your help,
but the system runs very smoothly and stable.
(And I have some overclocking experience, ~10 years or so.)
The big cooler (Zalman - CNPS7700-Cu) keeps it at 46°C (Full Load).
KornX
P.S. Here's your crash report:
VirtualDub-MPEG2 crash report -- build 23858 (release)
--------------------------------------
Disassembly:
018453c0: 8945a0 mov [ebp-60h], eax
018453c3: 8955a8 mov [ebp-58h], edx
018453c6: 89bb40450100 mov [ebx+14540], edi
018453cc: 89bb34450100 mov [ebx+14534], edi
018453d2: 89bb38450100 mov [ebx+14538], edi
018453d8: 8b4dc4 mov ecx, [ebp-3ch]
018453db: 33ff xor edi, edi
018453dd: 897cb118 mov [ecx+esi*4+18h], edi
018453e1: 897cb11c mov [ecx+esi*4+1ch], edi
018453e5: 897cb110 mov [ecx+esi*4+10h], edi
018453e9: 897cb114 mov [ecx+esi*4+14h], edi
018453ed: 897cb108 mov [ecx+esi*4+08h], edi
018453f1: 897cb10c mov [ecx+esi*4+0ch], edi
018453f5: 893cb1 mov [ecx+esi*4], edi
018453f8: 897cb104 mov [ecx+esi*4+04h], edi
018453fc: 89bcb190010000 mov [ecx+esi*4+190], edi
01845403: 89bcb194010000 mov [ecx+esi*4+194], edi
0184540a: 89bcb188010000 mov [ecx+esi*4+188], edi
01845411: 89bcb18c010000 mov [ecx+esi*4+18c], edi
01845418: 89bcb180010000 mov [ecx+esi*4+180], edi
0184541f: 89bcb184010000 mov [ecx+esi*4+184], edi
01845426: 89bcb178010000 mov [ecx+esi*4+178], edi
0184542d: 89bcb17c010000 mov [ecx+esi*4+17c], edi
01845434: 33ff xor edi, edi
01845436: 897db0 mov [ebp-50h], edi
01845439: 33ff xor edi, edi
0184543b: 897dac mov [ebp-54h], edi
0184543e: 8b7d10 mov edi, [ebp+10h]
01845441: 89bcb1f0000000 mov [ecx+esi*4+f0], edi
01845448: 8b4dc0 mov ecx, [ebp-40h]
0184544b: 8b8cb1ec000000 mov ecx, [ecx+esi*4+ec] <-- FAULT
01845452: 83f910 cmp ecx, 10h
01845455: 0f8450100000 jz 018464ab
0184545b: 8b7dcc mov edi, [ebp-34h]
0184545e: 8b4f0c mov ecx, [edi+0ch]
01845461: 894dc8 mov [ebp-38h], ecx
01845464: 8d41e1 lea eax, [ecx-1fh]
01845467: 8945b8 mov [ebp-48h], eax
0184546a: baffffffff mov edx, ffffffff
0184546f: d3ea shr edx, cl
01845471: 8b0f mov ecx, [edi]
01845473: 894dbc mov [ebp-44h], ecx
01845476: 85c0 test eax, eax
01845478: 8955b4 mov [ebp-4ch], edx
0184547b: 0f8e13100000 jle 01846494
01845481: 8bf9 mov edi, ecx
01845483: 237db4 and edi, [ebp-4ch]
01845486: 8bc8 mov ecx, eax
01845488: 8b45cc mov eax, [ebp-34h]
0184548b: 8b4004 mov eax, [eax+04h]
0184548e: d3e7 shl edi, cl
01845490: f7d9 neg ecx
01845492: 83c120 add ecx, 20h
01845495: d3e8 shr eax, cl
01845497: 0bf8 or edi, eax
01845499: 897dbc mov [ebp-44h], edi
0184549c: 8b4dc8 mov ecx, [ebp-38h]
0184549f: 83c101 add ecx, 01h
018454a2: 894dc8 mov [ebp-38h], ecx
018454a5: 83f920 cmp ecx, 20h
018454a8: 7237 jc 018454e1
018454aa: 8b45cc mov eax, [ebp-34h]
018454ad: 8b7804 mov edi, [eax+04h]
018454b0: 89480c mov [eax+0ch], ecx
018454b3: 8b4810 mov ecx, [eax+10h]
018454b6: 8938 mov [eax], edi
018454b8: 8b4908 mov ecx, [ecx+08h]
018454bb: 894dd8 mov [ebp-28h], ecx
018454be: 8b db 8bh
018454bf: 45 inc ebp
Windows 5.1 (Windows XP build 2600) [Service Pack 2]
EAX = 00000011
EBX = 0395cd80
ECX = 03e05b00
EDX = 00000006
EBP = 0511e55c
ESI = 0001cb1a
EDI = 0000001c
ESP = 0511e4e4
EIP = 0184544b
EFLAGS = 00010246
FPUCW = ffff027f
FPUTW = ffffaaaa
Crash reason: Access Violation
Crash context:
An out-of-bounds memory access (access violation) occurred in module 'xvidcore'...
...while running thread "Processing" (thread.cpp:150).
Pointer dumps:
EBX 0395cd80: 0000ffff 00000000 00000010 00000000 00000005 00000001 01a314c0 00000001
ECX 03e05b00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ESI 03e05b00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ESP 0511e4e0: 0013fa1c fffffff8 00000ded 0395cd80 00000000 00002007 00000011 0000001c
0511e500: 03e176a8 0000000e 00000000 00000000 ffbc37fd 00000011 00000007 03e05b00
0511e520: 03da4a40 fffffffe 0511e5b4 0395cd80 0000a224 ffbc37fd 00000000 3f70fd00
0511e540: 0395cd80 0511e5f0 0511e5d8 00000000 0511ee5c 0183c78d 0183c920 0511ee5c
EBP 0511e558: 0183c920 0511ee5c 0183d80f 0395cd80 0511e5b4 0000001c 00000001 00000001
0511e578: 0395ce50 000002c0 00000180 00000010 00000010 01918ad8 00000aad 0000295e
0511e598: 00000000 00000000 00000000 00000000 00000000 00000000 00000007 ffbc37fd
0511e5b8: cbffffff 00000000 00000011 03220040 03220020 00144000 00000000 9a224c1c
Thread call stack:
0184544b: xvidcore!xvid_decore [017f0000+4c37c+90cf]
0183c78d: xvidcore!xvid_decore [017f0000+4c37c+411]
0183c920: xvidcore!xvid_decore [017f0000+4c37c+5a4]
0183d80f: xvidcore!xvid_decore [017f0000+4c37c+1493]
77d4a044: USER32!ClientThreadSetup [77d40000+a00a+3a]
77d494be: USER32!GetWindowLongA [77d40000+945d+61]
77d4b903: USER32!SendMessageW [77d40000+b8ba+49]
77d56623: USER32!IsDlgButtonChecked [77d40000+165fc+27]
10008582: xvidvfw!DriverProc [10000000+6d20+1862]
1000840a: xvidvfw!DriverProc [10000000+6d20+16ea]
01823d1d: xvidcore!xvid_encore [017f0000+33cd4+49]
100082b7: xvidvfw!DriverProc [10000000+6d20+1597]
1000703d: xvidvfw!DriverProc [10000000+6d20+31d]
0183c3b1: xvidcore!xvid_decore [017f0000+4c37c+35]
10006fcc: xvidvfw!DriverProc [10000000+6d20+2ac]
7c910833: ntdll!RtlAllocateHeap [7c900000+105d4+25f]
7c910833: ntdll!RtlAllocateHeap [7c900000+105d4+25f]
7c80b5b8: kernel32!GetModuleHandleA [7c800000+b529+8f]
7c80b58c: kernel32!GetModuleHandleA [7c800000+b529+63]
7c80b5a1: kernel32!GetModuleHandleA [7c800000+b529+78]
7c80b4b6: kernel32!GetModuleFileNameA [7c800000+b357+15f]
7c80b4cb: kernel32!GetModuleFileNameA [7c800000+b357+174]
7c910732: ntdll!RtlAllocateHeap [7c900000+105d4+15e]
7c910732: ntdll!RtlAllocateHeap [7c900000+105d4+15e]
7c9106ab: ntdll!RtlAllocateHeap [7c900000+105d4+d7]
7c9106eb: ntdll!RtlAllocateHeap [7c900000+105d4+117]
7c910732: ntdll!RtlAllocateHeap [7c900000+105d4+15e]
7c911538: ntdll!wcsncpy [7c900000+10a8f+aa9]
7c911596: ntdll!wcsncpy [7c900000+10a8f+b07]
7c9106eb: ntdll!RtlAllocateHeap [7c900000+105d4+117]
7c910833: ntdll!RtlAllocateHeap [7c900000+105d4+25f]
7c910895: ntdll!RtlImageDirectoryEntryToData [7c900000+10856+3f]
7c910833: ntdll!RtlAllocateHeap [7c900000+105d4+25f]
7c910895: ntdll!RtlImageDirectoryEntryToData [7c900000+10856+3f]
7c9037bf: ntdll!RtlConvertUlongToLargeInteger [7c900000+3745+7a]
7c90378b: ntdll!RtlConvertUlongToLargeInteger [7c900000+3745+46]
7c937860: ntdll!LdrAddRefDll [7c900000+37619+247]
7c911538: ntdll!wcsncpy [7c900000+10a8f+aa9]
7c911596: ntdll!wcsncpy [7c900000+10a8f+b07]
7c9106eb: ntdll!RtlAllocateHeap [7c900000+105d4+117]
7c911538: ntdll!wcsncpy [7c900000+10a8f+aa9]
7c911596: ntdll!wcsncpy [7c900000+10a8f+b07]
7c9106eb: ntdll!RtlAllocateHeap [7c900000+105d4+117]
7c81eb33: kernel32!RaiseException [7c800000+1eae1+52]
75a718a8: MSVFW32!ICSendMessage [75a70000+187d+2b]
75a74c09: MSVFW32!ICCompress [75a70000+4ba6+63]
004ae872: VideoSequenceCompressor::PackFrameInternal()
004ae526: VideoSequenceCompressor::packFrame()
75a718a8: MSVFW32!ICSendMessage [75a70000+187d+2b]
75a74c4d: MSVFW32!ICDecompress [75a70000+4c10+3d]
0047f4da: Dubber::WriteVideoFrame()
00484717: VDStreamInterleaver::PushStreams()
0047ee08: Dubber::WriteVideoFrame()
0047fa53: Dubber::ThreadRun()
7c80e07b: kernel32!DuplicateHandle [7c800000+e016+65]
004df5ee: VDThread::StaticThreadStart()
005285af: _threadstartex@4()
7c80b50b: kernel32!GetModuleFileNameA [7c800000+b357+1b4]
-- End of report
sysKin
12th January 2006, 15:36
Okay, it's probably the crash I saw. I could only reproduce it under the debugger once (BTW, encoding is HORRIBLY slow under a debugger), and it only crashed after I closed the encoding, not during it. But the disassembly looks similar, so it's probably the same crash.
It's funny because actually it's decoder that crashes, not encoder.
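The faulting instruction in KornX's report is mov ecx, [ecx+esi*4+ec], i.e. a 32-bit load from base + index*4 + 0xEC. Plugging in ECX and ESI from the register dump gives the address that faulted. A small sketch (effective_address is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 memory operand [base + index*scale + disp].
   uint32_t arithmetic wraps modulo 2^32, like 32-bit x86 addressing. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}
```

effective_address(0x03e05b00, 0x0001cb1a, 4, 0xec) returns 0x03E78854: the index 0x1CB1A (117,530) times 4 is 470,120 bytes, about 459 KiB past the base in ECX. The report alone can't tell whether the index or the base pointer is the bad one.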
KornX
12th January 2006, 16:47
Yeah, like I said,
disabling "display decompressed output" made it go away...
KornX
guada 2
12th January 2006, 18:28
@Revgen
" My X2 4600+ doesn't have any problems with Vdub like his X2 3800+ is having. My CPU is not overclocked yet I run with a huge heatsink (Thermaltake Sonic Tower) and fan that keeps my CPU cool at 40 degrees Celsius at full load on both cores."
Wow!!!
You don't have a motherboard, you have a car.... :D
Revgen
12th January 2006, 19:17
Yep.;)
This CPU cost me mucho money. I was not going to give it a chance to fry. Especially since I was going to be doing full load encodes.
Zep
13th January 2006, 22:51
haha same here. I have the X2 4400+ overclocked to 2.6 GHz with PC4000 running at 263 MHz, 2.5-3-3-4, and it rips through encodes :)
Anyway, I bought this heatsink to keep from frying eggs:
http://www.newegg.com/Product/ShowImage.asp?image=35-109-119-07.jpg,35-109-119-06.jpg,35-109-119-02.jpg,35-109-119-08.jpg,35-109-119-03.jpg,35-109-119-09.jpg&CurImage=35-109-119-07.jpg&Description=THERMALRIGHT%20XP-90%20Multiple%20Heatpipes%20Cpu%20Heatsink%20-%20Retail
It uses a 90mm 3000 RPM fan and does well (but not as well as that monster you have LOL). I sit around 50°C at full load (65°C is the maximum rated temperature for the CPU).
BTW, I found going from PC3200 to PC4000 was a HUGE speed help. AVS scripts are so memory-bound it isn't funny. (Well, I do all HDTV encodes, so I'm not sure how bad it is for lower-res stuff that doesn't need to move as much data around in memory.)
Now with avisynth2.6MTa and XviD SMP, my speeds are just wow,
and I see sysKin is working on an even better SMP build!
Life is good indeed! :D
DVD_GR
14th January 2006, 04:15
I should mention to all Intel CPU users, on the hardware side of encoding, that Intel's dual-core architecture is like a Greek tragedy, so don't expect real improvements from the first generation of Intel dual-cores; their implementation is really bad. And of course, as mentioned before, Hyper-Threading is just a <cheat>... 2-4% average performance gain for a change from 50% to 100% CPU usage, as measured on many hardware sites.. don't even bother testing...
hajj_3
14th January 2006, 19:58
Got my Opteron 146 clocked @ 2.85 GHz from 2.0, 43°C under full load using an XP-90 heatsink :). RAM is @ 285 (1:1) @ 2.5-4-4-7. Stable under 25 hours of Prime95 torture test :)
Haven't tried encoding anything yet; I'll see how fast I can encode a movie with XviD 1.1. £105 OEM for an Opteron 146 at scan.co.uk, you can't beat it!
chilledoutuk
14th January 2006, 19:59
I have run this on my brother's dual Athlon XP setup but only get 75% utilisation. Is this normal, and is there any more room for further optimising the code for multithreaded encoding?
Doom9
14th January 2006, 21:00
Do a search on my nick and keywords like thread, interdependence or SMP....
here's the link: http://forum.doom9.org/showthread.php?t=104535
Zep
15th January 2006, 03:43
Single core, right? That is why your temp is so good.
2 cores means more heat. If I run just 1 core (turn one off), my temps are much lower; with both cores on there is, of course, double the L2 cache (in my case 2 MB total, 1 MB per core). It is a great heatsink, but for a dual core that is heavily overclocked I would get something even better if I had to do it again.
Why are you running the RAM 1:1? (Just curious, as I run 2:1 and my effective rate is 526 MHz, with a 6.8 GB/s Sandra bench score, though the true read is 6 GB/s, the true write just over 2 GB/s, and latency is 48 ns.)
LOL, I ran *2* Prime95 torture tests at once, one for each core, for 3 days without any errors a month ago at my current settings. I figured 3 days should be proof enough that my OC settings are good. (Note: I had to raise the vlink voltage and RAM voltage too.)
So far super rock solid, and I have encoded many, many things since :)
hajj_3
15th January 2006, 03:47
Yeah, it's single core. I run 1:1 at 285, which is 570 MHz effectively; stock is 200. The RAM does 310 (620), but I would have to use a divider and set the CPU to 2.8 GHz with a 9/10 divider. Dual core ain't worth the £ at the moment: when you can get an Opteron 146 for £105 that does 2.85 GHz, you just can't beat it for value. If I had the £, I would get a 4400+.
swaaye
18th January 2006, 00:20
Koepi's site is asking for a password... :confused:
SCIF
18th January 2006, 02:02
Which xvidcore version should I use for 2 threads? Celtic Druid's xvidcore-dualthread.7z (26.12.05) (http://www.aziendeassociate.it/cd///xvidcore-dualthread.7z), sysKin's dual-thread build from the first post (http://syskin.is.dreaming.org/xvidcore-2threads.7z), or something else?
KornX
18th January 2006, 02:26
Take Koepi's:
www.koepi.org
The password prompt shouldn't pop up...
Am I right, Koepi?
KornX
SCIF
18th January 2006, 02:37
You are not authorized to view this page.
Is it only for me???
Revgen
18th January 2006, 04:02
I have Koepi's multithreaded build. Is there a place that I could upload it to?
celtic_druid
18th January 2006, 04:20
Shouldn't make all that much difference. They are all using the same codec, just different compilers. Mine was ICL9, Koepi's is probably ICL7 and Syskin's gcc?
SCIF
18th January 2006, 04:23
Mine was ICL9, Koepi's is probably ICL7 and Syskin's gcc?
Are the sources identical? Should I download xvidcore-dualthread.7z from your page? Which is your latest build?
celtic_druid
18th January 2006, 06:19
I think the CVS may have been updated between sysKin's build and mine. It wasn't updated between mine and Koepi's, though, so the source should be the same unless he added other patches.
The CVS was updated recently, but that was just xvid_encraw.
- Removed the 9999 frames encode limit from xvid_encraw
Koepi
18th January 2006, 07:44
celtic:
Can you make sure to use BS_VERSION 42 for the multithreaded test build? 43 is for vanilla 1.2 CVS. (I didn't check which BS_VERSION you use; I just wanted to make sure.)
@all:
I already wrote to my web host; let's hope the site gets back to normal soon.
Cheers
Koepi
celtic_druid
18th January 2006, 09:16
Can't remember what it was using.
Ok, there was one thing in the CVS that did change. My build is using 40.
I'll put a new build up with 42.
swaaye
22nd January 2006, 05:23
I just put together an Opteron 165 dual core for my encoding joy. Running at 2.5 GHz with 2 GB PC4000, heh heh heh. This patch absolutely hauls on it, sitting around 95% CPU or higher. Amazingly fast. The first pass ran at over 100 fps with what I'm doing, which is about 3x faster than the Athlon XP Barton @ 2.3 GHz I was using a month ago for the same AviSynth/XviD setup.
I used Xvid 1.1 as a base and then installed the SMP DLL. Seems to be working great. Tried 1.2 but it didn't get along with StaxRip.
stax76
22nd January 2006, 10:45
I used Xvid 1.1 as a base and then installed the SMP DLL. Seems to be working great. Tried 1.2 but it didn't get along with StaxRip.
It's mandatory to 'Load Default' in the XviD config dialog to ensure you load 1.2 settings into StaxRip and overwrite projects and profiles with fresh 1.2 settings. Did you try that?
JustChecking
22nd January 2006, 11:13
OK, I went from 59 minutes for a certain clip I am encoding to a little under 30 minutes, so it really sped up the process for me!!
Thanks again.
I used the ICL 9 version dated 9/19/2005 and the 4 thread core dated 12/20/2005.
g:) :thanks:
The url to the 4 threaded core is broken :(
Does someone (maybe you, gilligan2) have it and can put it up somewhere? I want to test a dual dual-core Opteron. 2 threads did half the job: now all 4 CPU cores work at 50% (before, just 2 cores at 50%), so I think the 4-threaded build should do the trick... or well... I hope it will :) If anyone has it.
swaaye
22nd January 2006, 11:40
It's mandatory to 'Load Default' in the XviD config dialog to ensure you load 1.2 settings into StaxRip and overwrite projects and profiles with fresh 1.2 settings. Did you try that?
Yeah, I always hit Load Defaults when I first set up a default profile. When I started a job, VDubMod would just load up and quit immediately.
XviD 1.1 plus the SMP DLL is working great, though. Am I missing anything this way? I just installed XviD 1.1 final and the DLL from xvidcore-dualthread.7z.
swaaye
25th January 2006, 06:50
Koepi, I've tried your build, but it just seems to crash when StaxRip starts the encode process. VDubMod pops up and promptly closes; it continues to the 2nd pass, closes, and encoding ends. Dunno what's up.
HookedOnTV
31st January 2006, 15:51
Should this work with GordianKnot? Using Koepi's build, VirtualDubMod flashes open, then closes, and GordianKnot says finished.
shpitz
5th February 2006, 00:59
hey guys,
I've tried the 'smp' versions and none of them gave any boost; they were actually a little slower than celtic's 1.1 or the 1.2 CVS head compiles.
When I tried Koepi's 1.2 SMP compile, VDub just crashed as soon as I started the encode.
What is that SMP DLL swaaye was referring to?
I was running the following script:
mpeg2source("F:\snap.d2v",idct=7)
trim(49,5673)
# DEINTERLACING
LeakKernelDeint(order=1, sharp=true, forceCPU=5)
# Cropping
Crop(24,6,-16,-4)
bicubicResize(640,480,0,0.75)
a = last
b=a.RemoveGrain(mode=8)
SeeSaw(a,b, NRlimit=6, NRlimit2=7, Sstr=1.5, Slimit=5, Spower=5, SdampLo=6, Szp=16)
So do I need to use any special version of AviSynth? I'm using 2.5.6a.
HookedOnTV
5th February 2006, 02:53
Finally able to see some improvement. I had to run the encode directly from within VirtualDubMod. From there I could see CPU (X2 3800) utilization jump to around 80%. From MeGUI it tops out at 54%.
celtic_druid
5th February 2006, 04:51
I could do an mencoder compile linked against a multithreaded libxvidcore if anyone is interested?
blubberbirne
5th February 2006, 12:25
Is it true that the multithreaded encoder only works in the first pass?
HookedOnTV
5th February 2006, 17:05
I could do an mencoder compile linked against a multithreaded libxvidcore if anyone is interested?
That would be great!
captainvideo
13th February 2006, 00:33
I am running an AMD 4200 X2, nForce4 motherboard, 2 GB RAM, SATA HD.
I tried out the multithreaded DLL. It did seem to be running one additional thread, but it didn't improve encoding speed (working from a DVD rip, creating a 400x300 29.97 fps file at a bitrate of about 950). I had the same fps when running with affinity set to one core or two, or running the original DLL.
I suppose it could mean I am system/hardware limited somewhere outside of the CPU, but I would appreciate it if anyone could say how to check for something like that. Is there a way to monitor the percentage of the memory bus that is being used? I have to wonder about it, though, since I can open another instance of VDub/XviD and simultaneously do a second encoding session at the same frame rate as a single instance (doubling my overall CPU usage and overall movie frames converted), so I have to think that I am not being limited by memory or hard drive access rates.
So maybe the extra thread is running on the same CPU? Or maybe it just isn't doing anything useful? Or maybe it is not working at all? Anybody?
BTW, I downloaded the XviD beta version from Koepi's site (the one that was supposed to be pre-patched), and it just crashed with a division-by-zero error.
Ugh. Hope I said all I meant to say. I've had to wait 5 days to be allowed to post on this forum.
Anybody have any definite luck with this new dll?
foxyshadis
13th February 2006, 01:27
Most likely filtering (avisynth or vdub) is starving the encoder. You have to give your entire encoding chain; is it (A)GK? Try opening an avi directly and re-encoding with fast recompress, it should go nearly twice as fast with two threads as with one.
Nrmf
13th February 2006, 02:25
Am I understanding this correctly: there's an XviD build for dual-core CPUs? I have an AMD 3800 X2 and would like to test it using avi.net. Is this possible?
Alain_French
13th February 2006, 07:38
Hi,
Does the optimization apply to single-pass encoding? Could someone test on a dual core, or tell me whether it is the same for single-pass and two-pass encoding?
Thanks
celtic_druid
13th February 2006, 07:52
It should be, whenever b-frames and/or p-frames are used.
Alain_French
13th February 2006, 10:02
ok thank you.
captainvideo
13th February 2006, 20:24
Most likely filtering (avisynth or vdub) is starving the encoder. You have to give your entire encoding chain; is it (A)GK? Try opening an avi directly and re-encoding with fast recompress, it should go nearly twice as fast with two threads as with one.
OK, this is my current test result.
started with Xvid 1.1.0-30122005
added the xvidcore.dll from beginning of thread.
Ran vdub-mpeg2 1.6.11 build23858 release mon dec 5 2005
opened vob from a dvd rip (720x480 29.97)
selected fast recompress and xvid compression
selected no audio
no resizing/no cropping/no deinterlacing
XviD target bitrate 1018, H.263, single pass, motion search 5 (very high), VHQ 1 (mode decision), VHQ for B-frames: no,
use chroma motion: yes, turbo: yes, trellis: yes.
Running the original DLL for 9000 frames took 4:37 (277 sec), 32.4 fps.
Running the SMP DLL for the same 9000 frames took 4:04 (244 sec), 36.9 fps.
For this particular instance I saw about a 13% improvement in speed.
Ran the test with the same 9000 frames but turned on filtering:
audio compress as mp3 160k
video
added internal deinterlace filter (blend)
added resize to 400x300 (nearest neighbor)
Running the SMP DLL for 9000 frames took 4:04 (244 sec), 36.9 fps.
Running the original DLL took 4:17 (257 sec), 35 fps.
About a 5% improvement.
AMD 4200 X2, nForce4, 2 GB memory, SATA HD, WinXP Pro SP2 32-bit (AMD dual-core driver/patch installed, MS KB896256 not installed)
I never learned or used AutoGK or AviSynth; I sometimes use DVDx 2.3, and I use several versions of VirtualDub. Are we thinking that vdub is holding it back even in the fast recompress mode? Or is this about what I should expect?
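The fps figures and percentages in these comparisons come straight from the frame count and the wall-clock times. A minimal C sketch of that arithmetic, for anyone repeating the test; the function names are made up for illustration and are not part of the Xvid API:

```c
/* Convert an m:ss wall-clock reading into seconds. */
int mmss_to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Average encoding speed in frames per second. */
double average_fps(int frames, int seconds)
{
    return (double)frames / (double)seconds;
}

/* How many times faster the new run is than the old one
 * (a result of 1.135 means about 13.5% faster). */
double speedup(int old_seconds, int new_seconds)
{
    return (double)old_seconds / (double)new_seconds;
}
```

With the 9000-frame runs above, mmss_to_seconds(4, 37) is 277 and mmss_to_seconds(4, 4) is 244, so average_fps gives about 32.49 and 36.89 fps and speedup(277, 244) is about 1.135. The filtered run, speedup(257, 244), comes out at about 1.053, i.e. roughly 5%.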
squid_80
13th February 2006, 21:43
Are we thinking that vdub is holding it back even in the fast recompress mode?
I would say yes. Vdub-mpeg2 has a rather slow mpeg2 decoder.
captainvideo
13th February 2006, 21:46
OK, for my last post I was using, as the SMP core DLL,
a version with an MD5 hash of
EE357239965A155ADDA0556C0ED19757
634,946 bytes, modified 12/18/05; I think this was from the first post in the thread.
I downloaded a different version that mentions bframes in the title of the zip file.
634,946 bytes
modified 12/19
I did another test on the same VOB, 9000 frames.
I changed the XviD settings to (raised from the prior test):
motion search 6 ultra
vhq mode 4
vhq for b frame = yes
use chroma=yes
turbo off
trellis yes
with filtering
audio to mp3 160
video deinterlace
video resize to 400x300
using the original xvidcore.dll: time 7:02, 422 sec, 21.3 fps
using the 2-thread with-bframes xvidcore.dll: time 5:18, 318 sec, 28.3 fps
looks like about 33% faster
MD5 sums of the core DLLs:
my original xvidcore.dll md5=81CCA8C60DD2EDAF394B6E75FF8E325F filesize 761,856 bytes 12/30/05
my 2thread with b xvidcore.dll md5=D4531E2C2EA4D281A18855DC8F79E28C
my sys spec in earlier post.
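Why did the heavier settings gain so much more than the first test? One rough way to read the numbers is Amdahl's law, under the simplifying assumption that a fraction p of the run time (mainly the motion estimation these builds spread across threads) scales perfectly over n threads while everything else (decoding, filtering, VDub itself) does not. The two runs also used different DLLs and different filtering, so this is only an illustration, not a measurement of Xvid's internals. A minimal C sketch, with hypothetical function names:

```c
/* Amdahl's law: overall speedup on n threads when a fraction p
 * (0..1) of the run time is perfectly parallel and the rest serial. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Inverse of the above: the parallel fraction implied by a measured
 * speedup s on n threads. Only meaningful for 1 <= s <= n, n >= 2. */
double implied_parallel_fraction(double s, int n)
{
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / (double)n);
}
```

With two threads, the first test (277 s vs 244 s, speedup about 1.135) implies p of roughly 0.24, while the VHQ 4 / ultra run (422 s vs 318 s, speedup about 1.327) implies roughly 0.49. That is consistent with the idea that the slower the motion search, the larger the share of the work that can be split across cores.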
woah!
14th February 2006, 10:17
Can't really give any fps differences, as I didn't do the single-threaded version test before I changed over.
But this screenshot should say all I want it to: it sits at 50% CPU with the old XviD version, and it's at 80-95% now... Nice 1
http://images.dr3vil.com/files/default/nice1.jpg
Doom9
14th February 2006, 10:26
Try opening an avi directly and re-encoding with fast recompress, it should go nearly twice as fast with two threads as with one.
That still depends a lot on the output. If your source were an HD x264 stream, decoding would take a lot of CPU cycles. On top of that, do not forget that VDub uses separate threads for reading and encoding, so even a single-threaded XviD build would run faster on an SMP-capable machine. By how much largely depends on the decoding complexity and your settings. If encoding takes about as much CPU power as decoding, we have two threads using the same amount of CPU power, which can be nicely placed on a core each, delivering almost optimal CPU usage. If decoding or encoding is significantly more complex, the faster thread ends up waiting for the slower one and CPU usage is no longer optimal.
If SMP-capable builds enter the picture, it gets more complex, as suddenly we have three threads to be shuffled around on two cores...
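Doom9's point can be put into a toy cost model. Assume, purely for illustration, that decoding a frame costs d ms and encoding it costs e ms, that the stages overlap perfectly as a pipeline, and that thread switching is free. On one core each frame costs d + e. With the reader/decoder and encoder threads on separate cores, throughput is set by the slower stage. Splitting the encoder work evenly into two threads (three threads on two cores) is, at best, limited both by the busiest thread and by the total work shared between two cores. The function names and numbers below are hypothetical, not measurements:

```c
/* Toy throughput model; d = decode ms/frame, e = encode ms/frame. */

static double max2(double a, double b)
{
    return a > b ? a : b;
}

/* Everything runs on a single core: the costs simply add up. */
double fps_one_core(double d, double e)
{
    return 1000.0 / (d + e);
}

/* Decoder thread and single-threaded encoder on separate cores:
 * the pipeline runs at the pace of the slower stage. */
double fps_two_cores(double d, double e)
{
    return 1000.0 / max2(d, e);
}

/* Best case for a decoder thread plus an encoder split evenly into
 * two threads on two cores: no faster than the busiest thread allows,
 * and no faster than the total work (d + e) divided over two cores. */
double fps_two_cores_split_encoder(double d, double e)
{
    double busiest_thread = max2(d, e / 2.0);
    double per_core = (d + e) / 2.0;
    return 1000.0 / max2(busiest_thread, per_core);
}
```

With d = e = 10 ms, one core gives 50 fps, two cores give 100 fps, and splitting the encoder gains nothing further. With d = 5 ms and e = 15 ms, two cores only reach about 66.7 fps because the encoder thread is the bottleneck, while the split encoder could in the best case reach 100 fps. How much an SMP build helps therefore depends on how decoding and encoding complexity compare.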
vipera
13th March 2006, 05:16
Around 95% load on the first pass, 100% load on the second pass.
I have an AMD X2 4200+, using AutoGK to encode from DVD to 720x416 resolution.
shpitz
13th March 2006, 16:30
Can't really give any fps differences, as I didn't do the single-threaded version test before I changed over.
But this screenshot should say all I want it to: it sits at 50% CPU with the old XviD version, and it's at 80-95% now... Nice 1
HUGE misconception!
CPU usage alone tells you NOTHING about encoding speed... there's a lot more to it than just CPU usage...
You need to check encoding speed in terms of either fps or the time the encode takes, not by CPU usage...
shpitz
13th March 2006, 16:32
my original xvidcore.dll md5=81CCA8C60DD2EDAF394B6E75FF8E325F filesize 761,856 bytes 12/30/05
my 2thread with b xvidcore.dll md5=D4531E2C2EA4D281A18855DC8F79E28C
my sys spec in earlier post.
can you post links to the binaries?
sysKin
14th March 2006, 07:09
Okay people, could you please not post in this thread anymore? It's about old experimental code which is not used anywhere anymore, and it only confuses me when I can't tell whether you are using that old code or the new, official one.
@moderators, could you just close this thread please :)
Koepi
14th March 2006, 07:20
The currently "official" multithreading-thread is here: http://forum.doom9.org/showthread.php?t=107783
Please use that one instead; I'll close this one now.
Cheers
Koepi