Quad-Core on Xvid, just wrong on multi-threads?
lserlohn
13th September 2007, 14:18
Hi, I use Xvid for real-time video encoding. I have read some threads in this forum, especially this one: http://forum.doom9.org/showthread.php?t=128036
I find there are many people like me who complain about Xvid's poor performance on multi-core processors.
However, I have done another test: I opened four encoding applications and encoded four live 640x480 video streams in real time at the same time. With my E6600 (dual core), each encoder used 6%-8% CPU (30% in total). Under the same conditions, with my Q6600 (quad core), each encoder used 4%-7% (23% in total).
Note that since the quad core has twice as many cores as the dual core, it should be twice as fast. If the problem were only that Xvid does poorly at multi-threading, then with four applications (four separate processes) the CPU usage should have dropped to 16%. However, it didn't.
So, what is wrong?
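To make the comparison concrete, the totals can be converted into "cores' worth" of work. This is only a rough sketch: it assumes the reported percentages are of the whole machine's capacity (as Task Manager shows them) and that both CPUs have similar per-core speed. The helper name busy_cores is made up for illustration.

```python
def busy_cores(total_cpu_percent: float, core_count: int) -> float:
    """Convert a whole-machine CPU percentage into the equivalent number of fully busy cores."""
    return total_cpu_percent / 100.0 * core_count

dual = busy_cores(30.0, 2)   # E6600: 30% of 2 cores
quad = busy_cores(23.0, 4)   # Q6600: 23% of 4 cores
# If the same work were simply spread over twice the cores, the total would halve.
expected_quad_percent = 30.0 * 2 / 4

print(f"dual core: {dual:.2f} cores busy")
print(f"quad core: {quad:.2f} cores busy")
print(f"expected quad-core total: {expected_quad_percent:.0f}%")
```

Under those assumptions the four encoders kept about 0.6 cores busy on the dual core and about 0.9 on the quad core, against an expected total of roughly 15% (close to the 16% quoted above).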
squid_80
13th September 2007, 16:32
It's a combination of Xvid's threading model and Win32's poor scheduler. Xvid's threads use sleep(0) to synchronize, which is pretty bad because a) it still uses CPU time, hence b) the thread still holds the CPU and doesn't surrender its time-slice. On a single-core PC this can cause a livelock; on a multi-core machine it just performs poorly.
In short, more cores = more time burned synchronizing.
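A rough illustration of the pattern being described, written in Python rather than Xvid's actual C code: a waiter that polls a flag and calls sleep(0) between checks stays runnable the whole time it waits. The names ready, worker, and spins are made up for this sketch, and Python's time.sleep(0) is only an analogue of Win32 Sleep(0), not the same call.

```python
import threading
import time

ready = False

def worker() -> None:
    """Pretend to do 50 ms of work, then signal completion through a shared flag."""
    global ready
    time.sleep(0.05)
    ready = True

t = threading.Thread(target=worker)
t.start()

spins = 0
while not ready:
    time.sleep(0)   # yields, but the waiter is immediately runnable again
    spins += 1
t.join()

print(f"polled the flag {spins} times while waiting")
```

The count varies from run to run; every one of those polls is CPU time spent on waiting rather than encoding.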
henryho_hk
17th September 2007, 08:21
I came across this article (http://www.bluebytesoftware.com/blog/PermaLink,guid,1c013d42-c983-4102-9233-ca54b8f3d1a1.aspx) and found this code segment:
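uint loops = 0;
while (!cond) {
    // On a single-processor machine, or on every 100th pass, really give up the CPU.
    if (Environment.ProcessorCount == 1 || (++loops % 100) == 0) {
        Thread.Sleep(1);
    } else {
        // Otherwise spin briefly before re-checking cond.
        Thread.SpinWait(20);
    }
}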
It busy-waits for 20 clock cycles (I think) and, once every 100 such busy-waits, sleeps for (at least) 1 ms. I dunno if it works better than sleep(0), though.
squid_80
17th September 2007, 09:19
SpinWait is virtually the same as a bunch of sleep(0)'s. Sleep should not be used for synchronization, end of story.
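For contrast, a blocking primitive lets the waiting thread stay parked until it is signalled instead of polling. A minimal Python sketch using threading.Condition follows; this is not how Xvid is (or would be) implemented, and rows_done, producer, and wait_for_rows are hypothetical names.

```python
import threading

cond = threading.Condition()
rows_done = 0

def producer(total: int) -> None:
    """Finish rows one at a time, waking any waiter after each one."""
    global rows_done
    for _ in range(total):
        with cond:
            rows_done += 1
            cond.notify_all()

def wait_for_rows(needed: int) -> int:
    """Block without polling until at least `needed` rows are finished."""
    with cond:
        cond.wait_for(lambda: rows_done >= needed)
        return rows_done

t = threading.Thread(target=producer, args=(8,))
t.start()
seen = wait_for_rows(8)
t.join()
print(f"woken after {seen} rows")
```

The waiter only runs again when the producer notifies it, so no time-slices are spent re-checking the condition.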
henryho_hk
17th September 2007, 16:28
That means XviD's synchronization needs to be totally rewritten. :eek: These discussions should be put in the official thread anyway. :rolleyes:
Mutant_Fruit
18th September 2007, 17:19
squid_80 wrote: "It's a combination of Xvid's threading model and Win32's poor scheduler."
Meh, I wouldn't call it either, actually.
lserlohn wrote: "Hi, I use Xvid for real-time video encoding."
Are you dropping any frames? If the answer is no, then Xvid is performing perfectly. What you're basically complaining about is that Xvid is so fast that it can capture video in real time using only a few percent of your CPU. Is that such a bad thing?
One core on my 1.86 GHz Core Duo can transcode xvid->xvid at over 100 fps.
lserlohn
23rd September 2007, 12:22
I don't know what you are talking about. I am not complaining about the encoding speed. What I want to know is why there is no significant improvement in encoding after I upgraded my CPU from dual core to quad core.
Mutant_Fruit
23rd September 2007, 13:13
There can't possibly be an increase in performance when moving from dual core to quad core in this situation. It's just physically impossible. Let me break it down: you are trying to encode a real-time stream. That means you are encoding at exactly 25 fps (or whatever). A 2 GHz Pentium 4 can encode at 25 fps without breaking a sweat, so if you moved from a 2 GHz P4 to a 10 GHz quad core, you'd notice no difference.
lserlohn wrote: "since the quad core has twice as many cores as the dual core, it should be twice as fast. If the problem were only that Xvid does poorly at multi-threading, then with four applications (four separate processes) the CPU usage should have dropped to 16%"
Nothing will be accelerated, because the 'slowdown' in encoding comes from feeding it only 25 frames every second. If you want to see the benefits of dualcore->quadcore, don't do realtime encoding. Grab a 30 min video on disk and encode that. Then you'll be able to feed xvid the 200+ frames a second it'd need to come close to maxing out your CPU.
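The argument can be put as a small calculation. This is a sketch only, assuming one core can encode about 100 fps flat out (the transcoding figure mentioned earlier in the thread, used here purely as a stand-in) and that the capture delivers 25 fps; realtime_utilization is a made-up helper.

```python
def realtime_utilization(input_fps: float, max_fps_per_core: float, cores: int) -> float:
    """Fraction of the whole machine needed to keep up with a fixed-rate input."""
    cores_needed = input_fps / max_fps_per_core
    return min(cores_needed / cores, 1.0)

for cores in (1, 2, 4):
    share = realtime_utilization(25.0, 100.0, cores)
    print(f"{cores} core(s): {share:.1%} of the machine busy, still 25 fps out")
```

More cores shrink the percentage shown, but the output rate stays pinned to the input rate; only an input faster than the encoder can handle would push the figure to 100%.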
Sergey A. Sablin
23rd September 2007, 21:49
Obviously he should see lower CPU usage; that's the difference.
If the CPU usage while encoding at 25 fps on a dual core equals the CPU usage while encoding the same movie at 25 fps on a quad core, then something seems to be going wrong, or is at least not optimal.
Edit: oops, just re-read the OP. The CPU usage is lower on the quad core, though not 2 times lower. So it may just be non-ideal thread usage (maybe not only related to the way it was coded), or the CPU was busy with something else.
Inventive Software
24th September 2007, 12:28
That CPU usage for 4 cores is normal, because it takes more CPU overhead to run more cores.
akupenguin
24th September 2007, 16:21
If you want to run multiple realtime encodes on a multicore CPU (or any multiple encodes, realtime or not), don't enable threading in the encoder. Instead, just run multiple instances of single-threaded Xvid, each getting its own core.
If that still fails to provide a 2x improvement in performance, then I can only blame memory bandwidth. After all, the CPU isn't the only factor.
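One way to script this, sketched in Python. The encoder command line is deliberately left as a placeholder, since the exact tool and options depend on the setup; run_instances is a hypothetical helper, and the operating system's scheduler is assumed to spread the processes across the cores.

```python
import subprocess
import sys
from typing import List

def run_instances(commands: List[List[str]]) -> List[int]:
    """Start every command as its own process, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]

# Placeholder jobs: swap in one single-threaded encoder command per input.
jobs = [[sys.executable, "-c", "pass"] for _ in range(4)]
exit_codes = run_instances(jobs)
print(exit_codes)
```

All processes are started before any is waited on, so they run side by side rather than one after another.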
HarryM
26th September 2007, 05:48
XviD SMP is still alpha. Its utilization of two CPUs is generally very poor compared to, e.g., the x264 codec. The DivX codec has much better utilization than XviD, but you rarely get 100% on two CPUs with it either.
In my experience, the best way is to use multiple instances of XviD. I get 165-170% (i.e. a +65% to +70% speedup) of the single-CPU encoding speed (1 core = 30 fps, 2 cores = 2x25 fps).
With four instances on four CPUs, you get maybe 270% (1 core = 30 fps, 2 cores = 2x25 fps, 4 cores = 4x20 fps).
I think an ideal, powerful SMP optimization would be better than running multiple instances (possibly better memory usage, disk usage, ...). But that isn't the case with XviD 1.2.
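The percentages follow directly from the frame rates quoted above. A small sketch of the arithmetic (the helper name scaling is made up, and the four-instance figure was only given as a guess):

```python
def scaling(instances: int, fps_each: float, single_fps: float = 30.0):
    """Return (total speed relative to one core, efficiency per core)."""
    speedup = instances * fps_each / single_fps
    return speedup, speedup / instances

for n, fps in ((1, 30.0), (2, 25.0), (4, 20.0)):
    speedup, efficiency = scaling(n, fps)
    print(f"{n} instance(s): {speedup:.0%} of single-core speed, "
          f"{efficiency:.0%} efficiency per core")
```

That gives roughly 167% for two instances and 267% for four, in line with the 165-170% and 270% figures, with per-core efficiency falling as instances are added.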
Mutant_Fruit
26th September 2007, 18:50
If 1 instance of xvid can max out 1 CPU, then 4 instances can max out 4 CPUs (all other things being equal). A 'powerful SMP optimization' isn't as good* as running 4 completely separate instances, since 4 completely separate instances are 100% parallel.
*comparing raw CPU usage and not memory, etc.