Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264
28th May 2005, 11:31   #1
bond
Moderator
Join Date: Nov 2001
Posts: 9,780
x264 feature test: multithreaded encoding for dual CPUs

OK, akupenguin/pengvado has a new funky goodie for us:
multithreaded encoding, which makes full use of dual CPUs and hopefully results in a nice speed increase.

The only problem is that not everyone has a dual CPU, so this can't easily be tested extensively.
So I'm starting this thread to point people with such equipment to the patch that enables it, available here. Maybe someone can make a compile so people can test this?

If it's found to be useful and working, pengvado will commit it to the SVN.

Hehe, maybe it's even ~twice as fast as with a single CPU.

Enjoy!
__________________
Between the weak and the strong one it is the freedom which oppresses and the law that liberates (Jean Jacques Rousseau)
I know, that I know nothing (Socrates)

MPEG-4 ASP FAQ | AVC/H.264 FAQ | AAC FAQ | MP4 FAQ | MP4Menu stores DVD Menus in MP4 (guide)
Ogg Theora | Ogg Vorbis
use WM9 today and get Micro$oft controlling the A/V market tomorrow for free

Last edited by bond; 28th May 2005 at 19:16.
28th May 2005, 14:49   #2
Sharktooth
Mr. Sandman
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 10,818
Uhm... downloading.
I'll make an "experimental" build with this patch so people can download and test it...
28th May 2005, 15:26   #3
Doom9
clueless n00b
Join Date: Oct 2001
Location: somewhere over the rainbow
Posts: 10,275
Hmm... looking at the other diffs, it almost looks as if somebody is working on High Profile features, and the last RD diff is a month old :/
__________________
For the web's most comprehensive collection of DVD backup guides go to www.doom9.org
28th May 2005, 17:52   #4
Sharktooth
Mr. Sandman
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 10,818
ok, here's a multithreaded build (rev239):
http://www.webalice.it/f.corriga/x26..._239thr_mmx.7z
28th May 2005, 17:57   #5
superdump
Registered User
Join Date: May 2003
Posts: 392
RDO is going to be attempted again for the 8x8/4x4 transform decision in high profile. It proved fairly fruitless in aku's previous attempts but maybe it will be useful in this area.

The 8x8 patch currently (2005/05/28) forces the 8x8 transform, and as aku says, this will produce worse results than using the 4x4 transform alone (otherwise 8x8 would have been part of Main Profile). It's the ability to choose between the two that brings the extra compression.
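Why "the ability to choose" is what pays off can be illustrated with the generic Lagrangian rate-distortion cost J = D + λ·R, evaluated per macroblock. This is a minimal textbook sketch, not x264's actual 8x8/4x4 decision code: the distortion and bit figures, λ, and all names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distortion: float  # reconstruction error, e.g. SSD (hypothetical values)
    bits: float        # estimated bits to code the residual


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.bits


def choose_transform(cands: list[Candidate], lam: float) -> Candidate:
    """Pick the transform size with the lowest RD cost for one macroblock."""
    return min(cands, key=lambda c: rd_cost(c, lam))


# Made-up measurements for two macroblocks, each listed as [4x4, 8x8].
smooth = [Candidate("4x4", 900.0, 120.0), Candidate("8x8", 950.0, 80.0)]
detailed = [Candidate("4x4", 1500.0, 300.0), Candidate("8x8", 2400.0, 260.0)]
blocks = [smooth, detailed]
LAM = 10.0

forced_4x4 = sum(rd_cost(b[0], LAM) for b in blocks)
forced_8x8 = sum(rd_cost(b[1], LAM) for b in blocks)
adaptive = sum(rd_cost(choose_transform(b, LAM), LAM) for b in blocks)
print(forced_4x4, forced_8x8, adaptive)
```

With these invented numbers, forcing 8x8 everywhere (6750) comes out worse than 4x4 only (6600), while picking per block gives the lowest total (6250).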

Also note the ffh264_8x8 patch for libavcodec. It allows decoding of HP streams, though deblocking currently doesn't produce bit-exact output compared to x264 and JM, despite looking correct. It won't properly decode HP encodes that use CQMs (custom quantization matrices) either.

Still, plenty to play with. Good job aku!
__________________
superdump

Last edited by superdump; 28th May 2005 at 21:06.
28th May 2005, 18:02   #6
bond
Moderator
Join Date: Nov 2001
Posts: 9,780
Thanks a lot for this build!

BTW, it's also usable for people who don't have dual CPUs (like me). In that case x264 will simply write frames with multiple slices, but it won't be any faster when --threads is enabled.

Just in case someone notices problems decoding the resulting multislice streams with ffdshow, MPlayer (or any other libavcodec-based player): there seems to be a bug in libavcodec (not x264!):
1) using --threads 2 + CABAC together results in artefacts I also saw when decoding multislice Moonlight and VSS samples with libavcodec, so I assume libavcodec has a problem combining those two features
2) using --threads 2 + CABAC + B-frames + B-pyramid together shows the artefacts described in 1), but also crashes libavcodec after a short time (with B-pyramid disabled it's the same as 1), without the crash)

If you have problems, either disable CABAC or use another decoder, like Nero, to play the files until libavcodec is fixed.
28th May 2005, 20:17   #7
AlexB17
Registered User
Join Date: Mar 2005
Posts: 33
Why don't you include such useful options as zones and multithreading in the VFW version? Many people don't want to mess with CLI builds and prefer VDubMod.
28th May 2005, 20:31   #8
BBugsBunny
Registered User
Join Date: Nov 2003
Location: Austria
Posts: 65
I've got a dual Xeon 3.6 GHz.
I've already done some testing of SSE3 builds and could compare encoding speed very well, since I'd use the same test AVI.
Could someone make a VFW compile? I've never used the CLI; I prefer VFW + VirtualDub. Dual-CPU codecs work very well under VFW too, e.g. the MainConcept DV codec.

By the way, does anyone have a vote left for me for the Nero beta programme? I think my hardware setup could be an interesting addition to it.

Last edited by BBugsBunny; 28th May 2005 at 20:34.
28th May 2005, 20:40   #9
bond
Moderator
Join Date: Nov 2001
Posts: 9,780
You guys have a dual CPU but can't use the CLI? Is it really that hard to let VFW go?

See, the situation is as follows: if you want a big encoding speed increase, you have to drop VFW and use the CLI. It's as easy as that.
28th May 2005, 20:49   #10
BBugsBunny
Registered User
Join Date: Nov 2003
Location: Austria
Posts: 65
Well, I think I'll test the CLI version anyway; I've been using computers since the C64, so it won't be a big problem.
My personal opinion about the CLI is that it's like going back to MS-DOS.
Still, there are some good things about a command shell.
Using VirtualDub and VFW codecs is more comfortable, and I already have some SSE3 benchmark results to compare with.

PS: If anyone is interested in building a dual-Xeon system, here is my upgrade story:
http://forums.2cpu.com/showthread.php?threadid=58351
28th May 2005, 21:05   #11
superdump
Registered User
Join Date: May 2003
Posts: 392
Sorry, deblocking in ffh264 isn't disabled, it's just not producing results identical to x264 and JM. Original post edited.
28th May 2005, 21:19   #12
BBugsBunny
Registered User
Join Date: Nov 2003
Location: Austria
Posts: 65
OK, did a quick test:

J:\>x264 -p1 -ot.mp4 deintno.avs
avis [info]: 480x640 @ 25.00 fps (405 frames)
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
mp4 [info]: initial delay 0 (scale 25)
x264 [info]: slice I:2 Avg QP:23.00 Avg size: 43203 PSNR Mean Y:41.63 U:50.20 V:52.29 Avg:43.15 Global:43.15
x264 [info]: slice P:403 Avg QP:26.00 Avg size: 10429 PSNR Mean Y:37.89 U:48.11 V:50.33 Avg:39.48 Global:39.48
x264 [info]: slice I Avg I4x4:85.8% I16x16:14.3%
x264 [info]: slice P Avg I4x4:0.1% I16x16:0.5% P:66.5% P8x8:9.6% PSKIP:23.4%
x264 [info]: PSNR Mean Y:37.91 U:48.12 V:50.34 Avg:39.50 Global:39.49 kb/s:2118.2

encoded 405 frames, 9.86 fps, 2118.27 kb/s

With --threads 2:
J:\>x264 --threads 2 -p1 -ot.mp4 deintno.avs
avis [info]: 480x640 @ 25.00 fps (405 frames)
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
mp4 [info]: initial delay 0 (scale 25)
x264 [info]: slice I:2 Avg QP:23.00 Avg size: 43323 PSNR Mean Y:41.64 U:50.18 V:52.30 Avg:43.16 Global:43.16
x264 [info]: slice P:403 Avg QP:26.00 Avg size: 10458 PSNR Mean Y:37.89 U:48.11 V:50.32 Avg:39.48 Global:39.48
x264 [info]: slice I Avg I4x4:85.8% I16x16:14.2%
x264 [info]: slice P Avg I4x4:0.1% I16x16:0.5% P:66.8% P8x8:9.6% PSKIP:23.1%
x264 [info]: PSNR Mean Y:37.91 U:48.12 V:50.33 Avg:39.50 Global:39.49 kb/s:2124.1

encoded 405 frames, 13.12 fps, 2125.15 kb/s

AviSynth script deintno.avs:
AVISource("Capture.avi")
ConvertToYV12()
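A quick way to read the two runs: the speedup is the fps ratio, and since the QPs are unchanged (Avg QP 23/26 in both logs), the small bitrate rise is roughly the cost of the extra slices. A minimal sketch using only the figures from the logs; the helper names are hypothetical, and a 405-frame clip is short, so treat the results as rough.

```python
def speedup(fps_before: float, fps_after: float) -> float:
    """Ratio of encoding speeds (> 1 means faster)."""
    return fps_after / fps_before


def pct_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100.0


# Figures copied from the two runs (same source, same settings).
fps_1, fps_2 = 9.86, 13.12          # without / with --threads 2
kbps_1, kbps_2 = 2118.27, 2125.15   # final bitrates

print(f"speedup: {speedup(fps_1, fps_2):.2f}x")
print(f"bitrate: {pct_change(kbps_1, kbps_2):+.2f}%")
```

This works out to about 1.33x the speed for roughly +0.3% bitrate.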
28th May 2005, 21:56   #13
Doom9
clueless n00b
Join Date: Oct 2001
Location: somewhere over the rainbow
Posts: 10,275
405 frames is a bit tiny... I use 10,000 frames in my codec comparison, and I don't even consider that enough. It's merely a practical choice, since I can't stop using my PCs while they're encoding.
28th May 2005, 23:06   #14
APF_Gandalf
Registered User
Join Date: Mar 2003
Location: France
Posts: 47
Made a little test with this build on a 1001-frame file using AviSynth on my dual 3.0 GHz Xeon (Hyper-Threading enabled). Players used for playback:
MPC with ffdshow 2005-05-27
Osmo4 player 0.2.5-DEV
Nero ShowTime 2.0.0.26
Nero Media Player refuses to play anything (because of VobSub?); BTW, I never use it.

using a simple avs:

avisource("E:\Tsubasa\Tsubasa chronicle.avi")
addborders(0,66,0,66)
trim(0,1000)

common settings:

F:\x264>x264.exe --bframe 2 --ref 5 --pass 1 --stats "x264_stat.log" --qcomp 0.75 --ipratio 1.10 --pbratio 1.30 --analyse "all" --weightb --progress -o tsubasa.mp4 02.avs
avis [info]: 704x528 @ 23.98 fps (1001 frames)
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2

default threads (not enabled)
first pass
cpu usage ~27%
encoded 1001 frames, 11.73 fps, 767.80 kb/s
file plays fine in osmo4, MPC and nero
second pass
cpu usage ~27%
3.77 MB (3,962,176 bytes)
encoded 1001 frames, 10.90 fps, 397.23 kb/s
file plays fine in osmo4, MPC and nero

--threads 2
first pass
cpu usage ~41%
encoded 1001 frames, 12.13 fps, 785.49 kb/s
file crashes osmo4 and MPC; nero show time freezes on a frame with artefacts
second pass
cpu usage ~43%
encoded 1001 frames, 16.00 fps, 398.24 kb/s
file crashes osmo4 and MPC; nero show time freezes on a frame with artefacts

--threads 8
first pass
cpu usage ~60%
encoded 1001 frames, 12.13 fps, 785.49 kb/s
file crashes osmo4 and MPC; nero show time freezes on a frame with artefacts
second pass
cpu usage ~66%
encoded 1001 frames, 15.37 fps, 399.29 kb/s
file crashes osmo4 and MPC; nero show time freezes on a frame with artefacts

--threads 16
first pass
cpu usage ~60%
encoded 1001 frames, 11.99 fps, 785.76 kb/s
second pass
cpu usage ~66%
encoded 1001 frames, 15.58 fps, 399.33 kb/s
file crashes osmo4 and MPC; nero show time freezes on a frame with artefacts

First thing to say: the speed increase is really interesting, but I need a decoder that can handle these files.
I'll try a complete anime episode (36,000 frames) later.

Last edited by APF_Gandalf; 28th May 2005 at 23:29.
28th May 2005, 23:20   #15
Sharktooth
Mr. Sandman
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 10,818
8 and 16 are useless since you have 4 execution units (2 CPUs x 2 Hyper-Threading logical processors).
--threads 4 is the ideal choice.
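Sharktooth's rule of thumb (one thread per execution unit) as a tiny helper. A minimal sketch: the names are hypothetical, not part of x264, and the cap of 4 assumes the limit akupenguin describes for the current patch.

```python
import os

# Assumption: 4 is the thread cap of the patch discussed in this thread.
# MAX_THREADS and suggested_threads are hypothetical names, not x264 code.
MAX_THREADS = 4


def suggested_threads(physical_cpus: int, threads_per_core: int = 1) -> int:
    """One encoder thread per execution unit, clamped to 1..MAX_THREADS."""
    units = physical_cpus * threads_per_core
    return max(1, min(units, MAX_THREADS))


if __name__ == "__main__":
    print(suggested_threads(2, 2))  # dual Xeon with Hyper-Threading: 2 x 2 units
    print(suggested_threads(2))     # dual Opteron 240, no SMT: 2 units
    # os.cpu_count() reports logical processors and may return None.
    print(suggested_threads(os.cpu_count() or 1))
```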
29th May 2005, 00:15   #16
APF_Gandalf
Registered User
Join Date: Mar 2003
Location: France
Posts: 47
I tried 8 and 16 because:
- the CPU wasn't saturated yet
- I wanted to see the influence on how precisely the target bitrate/filesize is hit

If I understand it correctly, it's more or less like slicing the movie into "n" equal parts, encoding them separately (at the same time), and then joining them back together.
If I'm right, rate control may be less efficient, and the more threads you launch, the less efficient it will be.
29th May 2005, 00:37   #17
akupenguin
x264 author
Join Date: Sep 2004
Location: /dev/tty0
Posts: 2,327
No. Each frame is sliced into N equal parts, so rate control is not affected. The only detrimental effects are that CABAC contexts are reset at the start of each slice, and macroblocks on the top edge of a slice don't get to predict MVs from the row above (both slightly increase bitrate).
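To make "N equal parts within each frame" concrete: a frame of height H has ceil(H/16) macroblock rows, and each thread gets a contiguous band of them as its own slice. The top row of every slice after the first loses the row above for MV prediction, and each slice restarts CABAC. A minimal sketch; how leftover rows are distributed is an assumption, not necessarily what the patch does, and the function names are hypothetical.

```python
import math

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma pixels


def slice_rows(height: int, n_slices: int) -> list[tuple[int, int]]:
    """Split a frame's macroblock rows into n contiguous, near-equal slices.

    Returns inclusive (first_row, last_row) pairs. Giving the leftover rows
    to the first slices is an illustrative choice, not taken from the patch.
    """
    mb_rows = math.ceil(height / MB_SIZE)
    if not 1 <= n_slices <= mb_rows:
        raise ValueError("need between 1 and mb_rows slices")
    base, extra = divmod(mb_rows, n_slices)
    bands, start = [], 0
    for i in range(n_slices):
        count = base + (1 if i < extra else 0)
        bands.append((start, start + count - 1))
        start += count
    return bands


def rows_cut_off_from_above(bands: list[tuple[int, int]]) -> list[int]:
    """Top MB row of every slice after the first. The row above it belongs
    to another slice, so it can't be used for MV prediction."""
    return [first for first, _ in bands[1:]]


if __name__ == "__main__":
    bands = slice_rows(528, 2)  # e.g. the 704x528 source tested earlier
    print(bands, rows_cut_off_from_above(bands))
```

Because every slice covers part of every frame, rate control still sees whole frames, which is why it isn't affected.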

With the current patch, threads are capped at 4 even if you have more execution units. If your CPU isn't saturated, it's because not all of the encoding is easily parallelizable. Frame type decisions, deblocking, and half-pel interpolation are done single-threaded.
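Why the CPU can stay unsaturated: if part of each frame's work (here frame type decisions, deblocking and half-pel interpolation) stays on one thread, Amdahl's law bounds the overall speedup. A minimal sketch; the 70% parallel fraction is a made-up illustration, not a measured x264 figure.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the work
    scales across threads and the rest stays single-threaded."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    p = 0.7  # hypothetical: 70% of the work is split across slices/threads
    for n in (1, 2, 4):  # the current patch caps threads at 4
        print(n, round(amdahl_speedup(p, n), 2))
```

While the single-threaded parts run, the other CPUs sit idle, which shows up as CPU usage well below 100%.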

Note that lavc/ffdshow is known to be buggy with cabac+multislice. I'm investigating it.
29th May 2005, 08:27   #18
Joe Fenton
Registered User
Join Date: Jul 2003
Location: In a house.
Posts: 666
I applied the patch to my CVS checkout of x264. I tried it on a raw dump of the opening of Angelic Layer (720x480, 24 fps, 1402 frames of raw YUV 4:2:0). I used the same settings as APF_Gandalf simply because I'm not familiar with x264. Any suggestions on settings are appreciated.

I'm running Fedora Core 3 for AMD64 (64bit linux), 2.6.11 kernel, on an MSI Master2-FAR with two Opteron 240 CPUs, and 1 GByte of DDR333 memory.

one thread:
pass 1 - encoded 1402 frames, 6.91 fps, 1549.57 kb/s
pass 2 - encoded 1402 frames, 7.82 fps, 1549.57 kb/s
CPU usage is 50 - 55%

two threads:
pass 1 - encoded 1402 frames, 10.62 fps, 1571.93 kb/s
pass 2 - encoded 1402 frames, 12.38 fps, 1571.39 kb/s
CPU usage is 77 - 83%

I'd say this is a significant speedup for slower dual-CPU systems; I'm getting roughly 50% faster encoding. If I had this for XviD, I'd be a REALLY happy camper. Maybe a similar kind of optimization could be made in XviD. As it is, it really helps x264.
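Joe's figures can also be run backwards through Amdahl's law to estimate how much of the work effectively scaled across the two CPUs. This is a rough model: it lumps slice overhead, memory bandwidth and the single-threaded parts into one number. The function name is hypothetical.

```python
def implied_parallel_fraction(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: the fraction of work that would have to scale
    perfectly across `threads` to produce the observed speedup."""
    if threads < 2:
        raise ValueError("need at least 2 threads")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Dual Opteron 240, 1402 frames: fps with 1 thread vs 2 threads.
runs = {"pass 1": (6.91, 10.62), "pass 2": (7.82, 12.38)}

for name, (fps1, fps2) in runs.items():
    s = fps2 / fps1
    p = implied_parallel_fraction(s, 2)
    print(f"{name}: {s:.2f}x faster, ~{p:.0%} of the work effectively parallel")
```

That gives about 1.54x (pass 1) and 1.58x (pass 2), i.e. roughly 70-74% of the work behaving as if it were perfectly parallel, in line with the 77-83% CPU usage reported.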
29th May 2005, 15:53   #19
Sharktooth
Mr. Sandman
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 10,818
Above 50% here too, but it also depends on the AviSynth filters.
I'll do a more accurate test with YUV samples ASAP.
29th May 2005, 19:17   #20
Joe Fenton
Registered User
Join Date: Jul 2003
Location: In a house.
Posts: 666
Since most filters work on individual frames, it seems to me that you could create a couple of threads, each processing every other frame, to speed up filtering in AviSynth.
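Joe's idea works for spatial filters because each output frame depends only on its own input frame. Thread k can take frames k, k+n, k+2n, ..., and the results are put back in order. A minimal sketch with Python's concurrent.futures. The `invert` filter and the list-of-rows frame type are hypothetical stand-ins, not AviSynth's API. Temporal filters that read neighbouring frames would need more care. Also, in CPython the GIL keeps pure-Python filters like this from truly running on two CPUs at once; a real implementation would do this in native code.

```python
from concurrent.futures import ThreadPoolExecutor

Frame = list[list[int]]  # hypothetical: a frame as rows of 8-bit luma values


def invert(frame: Frame) -> Frame:
    """Stand-in per-frame filter: needs only the current frame."""
    return [[255 - px for px in row] for row in frame]


def filter_interleaved(frames: list[Frame], filt, n_threads: int = 2) -> list[Frame]:
    """Thread k filters frames k, k+n, k+2n, ...; results are restored in order."""
    def work(k: int):
        return [(i, filt(frames[i])) for i in range(k, len(frames), n_threads)]

    out: list = [None] * len(frames)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for chunk in pool.map(work, range(n_threads)):
            for i, result in chunk:
                out[i] = result
    return out


if __name__ == "__main__":
    clip = [[[i, i + 1]] for i in range(5)]  # five tiny 1x2 frames
    print(filter_interleaved(clip, invert))
```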