Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. Domains: forum.doom9.org / forum.doom9.net / forum.doom9.se |
|
|
#1 | Link |
|
Registered User
Join Date: Feb 2011
Posts: 70
|
x264 Encoding with MeGUI on a 32-Core System
I compared standard Avisynth with Avisynth 2.6 MT in MeGUI.
What I used:
* Dual AMD Opteron 6272 with 32 cores in total
* RAM: 16GB
* Video source: 1920x1080p, total duration 30 min 2 s
* MeGUI 2112
* x264 v2200
* x264 settings:
Code:
program --preset fast --pass 2 --bitrate 5000 --stats ".stats" --threads 16 --deblock -1:-1 --b-adapt 2 --ref 3 --weightp 2 --qpmin 10 --qpmax 51 --chroma-qp-offset -2 --rc-lookahead 60 --merange 32 --me umh --direct auto --subme 9 --trellis 2 --psy-rd 0.00:0 --no-fast-pskip --output "output" "input"
1. With standard Avisynth:
Avisynth Script
Code:
DirectShowSource("D:\Sample.645782-algrm.mkv", fps=23.976, audio=false, convertfps=true).AssumeFPS(24000,1001)
1st pass: 10.95 FPS
2nd pass: 14.97 FPS
Threads: 40
Encoding time: start 9:32:37, finish 11:26:33 (about 2 hours)
CPU usage, 1st pass: 5% - 10%
CPU usage, 2nd pass: 20% - 30%

2. With Avisynth 2.6 MT:
Avisynth Script
Code:
SetMemoryMax(15000)
SetMTMode(3, 8)
DirectShowSource("D:\Sample.645782-algrm.mkv", fps=23.976, audio=false, convertfps=true).AssumeFPS(24000,1001)
1st pass: 11.28 FPS
2nd pass: 15.28 FPS
Threads: 64
Encoding time: start 11:55:33, finish 01:54:14 (about 2 hours)
CPU usage, 1st pass: 7% - 13%
CPU usage, 2nd pass: 21% - 35%

No significant improvement. This is probably the most MT can do here. I tried modes 1 to 6; for me mode 3 is the best, and the others are even worse than not using MT. I also tried thread counts from 2 to 32: 8 is a good choice, and anything above it crashes. I don't know whether the problem is MeGUI, Avisynth 2.6 MT, or x264 itself.

3. Now see what happens when I split the video source into 10 parts and encode them simultaneously. This is without MT, just standard Avisynth.

Result:
1st pass: see screenshot
2nd pass: see screenshot
Threads: 40 per x264 process
Encoding time: start 10:17:16, finish 10:39:38 (about 20 minutes)
CPU usage, 1st pass: 60% - 80%
CPU usage, 2nd pass: 100%

It was awesome! Encoding is about 6x faster, and CPU usage in the 1st pass also increases. The more parts you use, the closer you can push CPU usage to 100%. This might be a good idea for the developers of MeGUI, Avisynth 2.6 MT, or x264, rather than playing with frames/threads.

Any opinions, guys?
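The post does not say how the source was cut into 10 parts. Here is a minimal sketch of one way to plan it, assuming the split is done by frame range. The helper name `split_frames` and the frame count are illustrative and not from the thread. Each range is printed in the form of x264's `--seek` (first frame) and `--frames` (number of frames) options.

```python
def split_frames(total_frames, parts):
    """Split [0, total_frames) into `parts` contiguous (start, count) ranges."""
    if parts < 1 or total_frames < parts:
        raise ValueError("need at least one frame per part")
    base, extra = divmod(total_frames, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts take one additional frame each.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


# Roughly 30 min 2 s at 24000/1001 fps, as in the post (approximate count).
total = round(1802 * 24000 / 1001)
chunks = split_frames(total, 10)
for start, count in chunks:
    print(f"--seek {start} --frames {count}")
```

Each printed pair would go on the command line of one x264 process. This is only safe when the source filter returns exactly the frames requested, which is the frame-accuracy concern raised later in the thread.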
|
|
|
|
|
#2 | Link |
|
Registered User
Join Date: Aug 2011
Posts: 98
|
From all of this I'd assume your decoder/demuxer is the bottleneck. Haali is obviously used for the demuxing, since you've got all its icons in the tray, but what's in the .mkv file and what's being used to decode it? Ten separate parts encoded at once being that much faster suggests the decoder isn't multi-threaded (or can only use a small number of threads).
|
|
|
|
|
|
#3 | Link |
|
Registered User
Join Date: Aug 2009
Posts: 463
|
If you don't use filters, you don't need SetMTMode(). And since you use MeGUI, try opening and indexing your file with FFMS2 instead of DirectShowSource().
Also, in MeGUI - Settings - External Program Configuration you can change FFMS Thread Count from 1 to 0 (FFMS will automatically choose the number of threads based on the number of CPU cores), or you can specify the number of threads you want. That way the decoder runs multithreaded.
|
|
|
|
|
#4 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
From my experience, splitting with DirectShowSource() is a terrible idea because DirectShowSource() itself is not frame accurate! You should really use ffms2 instead, with only one decoding thread! Multi-threaded decoding in ffms2 tends to break the decoder's frame accuracy!
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
|
|
|
|
|
#6 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
Most likely his video's resolution is too low for x264 to use all cores, so splitting seems to be a good idea for his beast of a machine.
Last edited by Atak_Snajpera; 12th June 2012 at 15:53.
|
|
|
|
|
#7 | Link | |||
|
Registered User
Join Date: Feb 2011
Posts: 70
|
Quote:
Quote:
Quote:
|
|||
|
|
|
|
|
#8 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
The x264 documentation says that the useful number of threads is
video height / 40
1080 / 40 = 27 threads
x264 picks 27 threads automatically if your CPU has 18 logical processors (automatic thread count = 1.5 x logical processors).
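The arithmetic above can be sketched in a few lines. This is an illustration only: the function names and the `rows_per_thread` parameter are made up here, the height / 40 figure is the rule of thumb quoted in the post, and the 1.5 x factor is x264's default for automatic frame threading.

```python
def auto_threads(logical_processors):
    """x264's automatic frame-thread count: 1.5 x logical processors."""
    return logical_processors * 3 // 2


def useful_threads(height, rows_per_thread=40):
    """Rule of thumb quoted in the post: one thread per 40 lines of height."""
    return height // rows_per_thread


height = 1080
cap = useful_threads(height)      # useful thread count at 1080p
requested = auto_threads(32)      # what a 32-core machine would ask for
print(f"useful: {cap}, auto on 32 cores: {requested}")
print(f"threads doing useful work under the rule: {min(requested, cap)}")
```

If the quoted rule holds, a 32-core machine asks for more threads than a single 1080p encode can use well, which fits the low CPU usage reported in the first post.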
Last edited by Atak_Snajpera; 12th June 2012 at 16:59.
|
|
|
|
|
#9 | Link |
|
契約者
Join Date: Jun 2008
Posts: 1,576
|
This was my guess too, as I answered in the other thread. 1920x1080 is not the maximum resolution, it's just the most widespread. But the point is that it may not be enough to keep more cores busy, due to x264's design. I believe there was yet another forum thread where something similar was discussed, and the solution for such a huge number of cores was to split the file into parts and encode them simultaneously.
|
|
|
|
|
|
#11 | Link |
|
Registered User
Join Date: Feb 2011
Posts: 70
|
Maybe the GUI needs an additional feature to split the video into parts and encode them simultaneously; after all the jobs are done, it would automatically re-merge all the parts.
This would be easier for users: they wouldn't need to install anything else like Avisynth 2.6 MT or modify a complicated Avisynth script. They also wouldn't need to install 64-bit x264, with which some filters are still not compatible. 32-bit x264 is stuck at around 2GB of memory, which is why I got x264 [error]: malloc of size 2190464 failed. No matter how much memory you have, 32-bit x264 can only handle 2GB. And no matter how many threads you set in MT, if x264 goes over 2GB before you can fully load all the cores, it will crash. Please correct me if I am wrong, I am very much a newbie here ^^
|
|
|
|
|
#12 | Link | |||
|
Registered User
Join Date: Aug 2009
Posts: 463
|
Quote:
And you don't need Avisynth 2.6 MT if you don't use filters for processing. If you just need multithreaded decoding, use a decoder that has internal multithreading.
Quote:
Quote:
And we still don't know whether the bottleneck is Avisynth, the decoder, or x264. Use AVSMeter to test the speed of your AVS script and see how many cores are used for decoding: just drag and drop your AVS file onto avsmeter.exe. Also, you could update MeGUI to the latest version, 2153, from the development server.
Last edited by detmek; 12th June 2012 at 19:21.
|||
|
|
|
|
|
#13 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
defalopii,
I'm planning to implement that in RipBot264's distributed encoding mode.
|
|
|
|
|
#14 | Link |
|
Registered User
Join Date: Aug 2009
Posts: 463
|
Isn't splitting the file suboptimal with 2-pass bitrate mode? The worst case would be if some parts contain mostly high-motion/high-detail scenes and other parts contain low-motion/low-detail scenes. All parts will be encoded at the same average bitrate: for low-complexity parts the bitrate will probably be overkill, and for high-complexity parts it will not be sufficient. CRF encoding will not suffer from splitting, but 2-pass encoding will, more or less.
|
|
|
|
|
|
#15 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
CQ mode is more popular these days. 2-pass is mainly used by people still living in the Xvid era (movies stored on 700MB CDs).
2-pass wastes time and space (you have to guess the bitrate).
|
|
|
|
|
#16 | Link |
|
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
2-pass splitting can be done efficiently if you do the following:
1. Run CRF first passes on every chunk.
2. Multiply all the bitrates from the first pass by a constant factor so that the overall bitrate is what you want.
3. Run the second pass for each chunk with the bitrate from step 2.
Forcing every chunk to the same bitrate is what hurts compression a lot.
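Step 2 can be sketched as follows. This is a minimal illustration, not anyone's actual implementation: the function name `scale_bitrates` and the sample numbers are made up, and it assumes "overall bitrate" means the duration-weighted average across chunks, which matters when the chunks are not all the same length.

```python
def scale_bitrates(chunks, target_kbps):
    """Scale per-chunk CRF bitrates by one constant factor.

    chunks: list of (duration_seconds, crf_pass_bitrate_kbps).
    Returns second-pass bitrates whose duration-weighted average
    equals target_kbps.
    """
    total_duration = sum(d for d, _ in chunks)
    total_kbits = sum(d * b for d, b in chunks)
    if total_duration <= 0 or total_kbits <= 0:
        raise ValueError("chunks must have positive duration and bitrate")
    overall_kbps = total_kbits / total_duration
    factor = target_kbps / overall_kbps   # the constant factor from step 2
    return [b * factor for _, b in chunks]


# Hypothetical first-pass results: three 60 s chunks of differing complexity.
first_pass = [(60, 2000.0), (60, 8000.0), (60, 5000.0)]
second_pass = scale_bitrates(first_pass, 4000.0)
print([round(b) for b in second_pass])
```

Because every chunk is multiplied by the same factor, complex chunks keep proportionally more bits than simple ones, which is the point of running the CRF passes first.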
|
|
|
|
|
#17 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
Something like this?
Code:
--crf 18 --pass 1 --stats chunk_X.log
Code:
--pass 2 --bitrate [ chunk_X_average_bitrate_from_first_pass / pass1_average_bitrate_from_all_chunks * selected_bitrate_for_whole_video ] --stats chunk_X.log //where X is the chunk number
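As a rough sketch, the two command lines above could be assembled per chunk like this. The file names (`chunk_X.avs`, `chunk_X.264`), the `NUL` first-pass output, and the helper names are assumptions for illustration. Only `--crf`, `--pass`, `--stats`, `--bitrate` and `--output` are x264 options. The bitrate is rounded because x264 expects an integer number of kbit/s.

```python
def pass2_bitrate(chunk_kbps, all_chunks_kbps, selected_kbps):
    """Formula from the post: chunk average / overall average * selected."""
    return round(chunk_kbps / all_chunks_kbps * selected_kbps)


def pass1_args(chunk, crf=18):
    """CRF first pass for one chunk; writes stats, discards the video."""
    return ["x264", "--crf", str(crf), "--pass", "1",
            "--stats", f"chunk_{chunk}.log",
            "--output", "NUL", f"chunk_{chunk}.avs"]


def pass2_args(chunk, bitrate_kbps):
    """Second pass for one chunk at its own scaled bitrate."""
    return ["x264", "--pass", "2", "--bitrate", str(bitrate_kbps),
            "--stats", f"chunk_{chunk}.log",
            "--output", f"chunk_{chunk}.264", f"chunk_{chunk}.avs"]


# Hypothetical numbers: chunk 3 averaged 8000 kbit/s in the CRF pass, all
# chunks together averaged 5000, and 4000 is wanted for the whole video.
print(" ".join(pass1_args(3)))
print(" ".join(pass2_args(3, pass2_bitrate(8000, 5000, 4000))))
```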
|
|
|
|
|
#19 | Link | ||
|
Registered User
Join Date: Feb 2011
Posts: 70
|
Quote:
Quote:
|
||
|
|
|
|
|
#20 | Link |
|
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,946
|
The new EncodingServer will listen on 8 different ports (1000, 2000, 3000 ...), so you will be able to maximize CPU usage without installing virtual machines.
You will just have to enter 127.0.0.1:1000, 127.0.0.1:2000, 127.0.0.1:3000 ...
|
|
|