Old 12th June 2012, 05:18   #1  |  Link
defalopii
Registered User
 
Join Date: Feb 2011
Posts: 70
x264 Encoding with MeGUI on a 32-Core System

I just ran a comparison between standard Avisynth and Avisynth 2.6 MT in MeGUI.

What I used:
* Dual AMD Opteron 6272, 32 cores total
* 16 GB RAM
* Video source: 1920x1080p, total duration 30 min 2 s
* MeGUI 2112
* x264 v2200
* x264 settings:
Code:
program --preset fast --pass 2 --bitrate 5000 --stats ".stats" --threads 16 --deblock -1:-1 --b-adapt 2 --ref 3 --weightp 2 --qpmin 10 --qpmax 51 --chroma-qp-offset -2 --rc-lookahead 60 --merange 32 --me umh --direct auto --subme 9 --trellis 2 --psy-rd 0.00:0 --no-fast-pskip --output "output" "input"
1. Standard Avisynth, no filters, a very basic script:
Code:
DirectShowSource("D:\Sample.645782-algrm.mkv", fps=23.976, audio=false, convertfps=true).AssumeFPS(24000,1001)
Result:
1st pass: 10.95 FPS
2nd pass: 14.97 FPS
Threads: 40
Encoding Time: Start 9:32:37, Finish 11:26:33, about 2 hours
CPU Usage 1st pass: 5% - 10%
CPU Usage 2nd pass: 20% - 30%

2. With Avisynth 2.6 MT:
Avisynth Script
Code:
SetMemoryMax(15000)
SetMTMode(3, 8)
DirectShowSource("D:\Sample.645782-algrm.mkv", fps=23.976, audio=false, convertfps=true).AssumeFPS(24000,1001)
Result:
1st pass: 11.28 FPS
2nd pass: 15.28 FPS
Threads: 64
Encoding Time: Start 11:55:33, Finish 01:54:14, about 2 hours
CPU Usage 1st pass: 7% - 13%
CPU Usage 2nd pass: 21% - 35%

No significant improvement. This is probably the most MT can do here. I tried modes 1 to 6; for me mode 3 is the best, and the others are even worse than not using MT. I also tried adjusting the thread count from 2 to 32. 8 is a good choice; values above it crash. I don't know whether the problem is MeGUI, Avisynth 2.6 MT, or x264 itself.

3. Now see what happens when I split the video source into 10 parts and then encode them simultaneously.
This is without MT, just standard Avisynth.





Result:
1st pass: See screenshot
2nd pass: See screenshot
Threads: 40 per x264 process
Encoding Time: Start 10:17:16, Finish 10:39:38, about 22 minutes
CPU Usage 1st pass: 60% - 80%
CPU Usage 2nd pass: 100%

It was awesome!
Encoding was about 5x faster (roughly 22 minutes instead of just under 2 hours).
CPU usage in the 1st pass also increased. The more parts you encode at once, the closer you can push CPU usage to 100%.
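
Working from the start and finish times posted above, here is a small Python sketch of the elapsed times and the resulting speedup. It assumes the 01:54:14 finish of the MT run is on a 12-hour clock (i.e. it wrapped past 12:59); the function name is just for illustration. By these numbers the split encode comes out at roughly 5x faster than the single encode.

```python
from datetime import datetime, timedelta


def elapsed(start: str, finish: str) -> timedelta:
    """Wall-clock time between two HH:MM:SS stamps, allowing one 12-hour wrap."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        # Assumption: 12-hour clock, e.g. 11:55 -> 01:54 is about two hours.
        delta += timedelta(hours=12)
    return delta


single = elapsed("09:32:37", "11:26:33")  # test 1, standard Avisynth
mt = elapsed("11:55:33", "01:54:14")      # test 2, Avisynth 2.6 MT
split = elapsed("10:17:16", "10:39:38")   # test 3, 10 parts at once

speedup = single / split  # timedelta / timedelta gives a float

print("single:", single)  # 1:53:56
print("MT:    ", mt)      # 1:58:41
print("split: ", split)   # 0:22:22
print(f"speedup: {speedup:.1f}x")  # 5.1x
```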

This may be a good idea for the developers of MeGUI, Avisynth 2.6 MT, or x264, rather than playing with frames/threads.

Any opinions, guys?
Old 12th June 2012, 07:32   #2  |  Link
golagoda
Registered User
 
Join Date: Aug 2011
Posts: 98
From all of this I'd assume your decoder/demuxer is the bottleneck. Obviously Haali is used for the demuxing, as you've got all the icons for it in the tray, but what's in the .mkv file and what's being used to decode it? Ten separate parts encoded at once being that much faster makes it seem as if the decoder isn't multi-threaded (or can only use a small number of threads).
Old 12th June 2012, 09:36   #3  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
If you don't use filters, you don't need SetMTMode(). And, since you use MeGUI, instead of DirectShowSource() try opening and indexing your file with FFMS2.
Also, in MeGUI - Settings - External Program Configuration you can change FFMS Thread Count from 1 to 0 (FFMS will automatically choose the number of threads based on the number of CPU cores), or you can specify the number of threads you want. That way the decoder will be multithreaded.
Old 12th June 2012, 10:17   #4  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
From my experience, splitting with DirectShowSource() is a terrible idea because DirectShowSource() itself is not frame-accurate! You should really use ffms2 instead, with only one decoding thread! Multi-threaded decoding in ffms2 tends to break frame accuracy in the decoder!
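
As a rough sketch of frame-based splitting along these lines, the Python below computes contiguous frame ranges and builds one Avisynth script line per chunk, using FFMS2's FFVideoSource with threads=1 plus Trim(). The frame count of 43205 is only an estimate from the stated 30 min 2 s at 24000/1001 fps, and the helper names and chunk file names are made up for illustration; the real frame count should come from the indexed clip.

```python
def chunk_ranges(total_frames: int, parts: int) -> list[tuple[int, int]]:
    """Split frames 0..total_frames-1 into contiguous inclusive (first, last) ranges."""
    # Require two frames per part so no range is (0, 0): Avisynth's Trim(0, 0)
    # means "the whole clip", not "frame 0 only".
    if parts < 1 or total_frames < 2 * parts:
        raise ValueError("need at least two frames per part")
    base, extra = divmod(total_frames, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def avs_script(source: str, first: int, last: int) -> str:
    """One-line Avisynth script for a chunk; Trim() is inclusive on both ends."""
    # threads=1 follows the single-decoding-thread advice for frame accuracy.
    return f'FFVideoSource("{source}", threads=1).Trim({first}, {last})'


# Estimated frame count (assumption): 1802 s * 24000 / 1001 is about 43205 frames.
source = r"D:\Sample.645782-algrm.mkv"
for n, (first, last) in enumerate(chunk_ranges(43205, 10), start=1):
    print(f"chunk_{n:02d}.avs:", avs_script(source, first, last))
```

Because every chunk reads the same indexed source and only differs in its Trim() range, no frame should be dropped or duplicated at the chunk boundaries, provided the decoder really is frame-accurate.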
Old 12th June 2012, 14:15   #5  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
If he gets a speed increase with multithreaded ffms2, he doesn't have to split the file.
Old 12th June 2012, 14:35   #6  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
Most likely his video has too low a resolution for x264 to use all cores, so splitting seems to be a good idea for his beast of a machine.

Last edited by Atak_Snajpera; 12th June 2012 at 15:53.
Old 12th June 2012, 16:45   #7  |  Link
defalopii
Registered User
 
Join Date: Feb 2011
Posts: 70
Quote:
Originally Posted by Atak_Snajpera View Post
Most likely his video has too low a resolution for x264 to use all cores, so splitting seems to be a good idea for his beast of a machine.
My source is 1920x1080; that's the maximum resolution today, isn't it?

Quote:
Originally Posted by detmek View Post
If he gets a speed increase with multithreaded ffms2, he doesn't have to split the file.
Quote:
Originally Posted by detmek View Post
If you don't use filters, you don't need SetMTMode(). And, since you use MeGUI, instead of DirectShowSource() try opening and indexing your file with FFMS2.
Also, in MeGUI - Settings - External Program Configuration you can change FFMS Thread Count from 1 to 0 (FFMS will automatically choose the number of threads based on the number of CPU cores), or you can specify the number of threads you want. That way the decoder will be multithreaded.
FFMS2 doesn't work either. An error occurred: x264 [error]: malloc of size 2190464 failed
Old 12th June 2012, 16:57   #8  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
The x264 documentation says that the useful number of threads is

video height / 40

1080 / 40 = 27 threads

27 threads will be used if your CPU has 18 logical processors.
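
To make the arithmetic concrete, here is a small Python sketch of the two numbers in play: the rough height / 40 rule of thumb, and the 1.5 threads per logical processor implied by the 18-to-27 example. The function names are mine, and both formulas are rough guides rather than exact limits.

```python
def rough_useful_threads(height: int, rows_per_thread: int = 40) -> int:
    """Rule-of-thumb upper bound on useful x264 frame threads for a given height."""
    return height // rows_per_thread


def auto_threads(logical_cpus: int) -> int:
    """Thread count implied by 1.5 threads per logical processor (18 -> 27)."""
    return logical_cpus * 3 // 2


print(rough_useful_threads(1080))  # 27
print(auto_threads(18))            # 27
print(auto_threads(32))            # 48, well above the rough 27 for 1080p
```

On a 32-core machine the automatic count lands well above the rule-of-thumb figure for 1080p, which fits the low CPU usage reported for a single encode.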

Last edited by Atak_Snajpera; 12th June 2012 at 16:59.
Old 12th June 2012, 17:10   #9  |  Link
Keiyakusha
契約者
 
Join Date: Jun 2008
Posts: 1,576
This was my guess too, as I answered in the other thread. 1920x1080 is not the maximum resolution; it's just the most widespread. But the point is that it may not be enough to use more cores, due to x264's design. I believe there was yet another forum thread where something similar was discussed, and the solution for such a huge number of cores was to split the file into parts and encode them simultaneously.
Old 12th June 2012, 17:47   #10  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
The "divide by 40/48/etc" is just a very rough formula, not exact. Additionally, using B-frames lets you exceed this limit a good bit.

You can utilize a few more cores with lookahead threads, at the very least.
Old 12th June 2012, 18:34   #11  |  Link
defalopii
Registered User
 
Join Date: Feb 2011
Posts: 70
Maybe the GUI needs an additional feature that splits the video into parts, encodes them simultaneously, and automatically re-merges all the parts once every job is done.
This would be easier for users: they wouldn't need to install anything else, such as Avisynth 2.6 MT, or modify a complicated Avisynth script. They also wouldn't need to install 64-bit x264, with which some filters are still not compatible. 32-bit x264 is stuck at around 2 GB of memory, which is why I got x264 [error]: malloc of size 2190464 failed. No matter how much memory you have, 32-bit x264 can only handle 2 GB. Likewise, no matter how many threads you set in MT, if x264 goes past 2 GB before you can max out all the cores, it will crash.
Please correct me if I am wrong, I am very much a newbie here ^^,
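
For scale, here is a quick Python check of the numbers involved. It assumes the usual 2 GiB user address space of a 32-bit Windows process, and that the Avisynth script is loaded inside that same 32-bit process; the names are only for illustration. The failed allocation was only about 2 MiB, which suggests the process had already run out of address space, and SetMemoryMax(15000) asks for far more than such a process could address.

```python
MIB = 1024 * 1024

# Assumption: default 2 GiB of user address space for a 32-bit Windows process.
USER_SPACE_32BIT = 2048 * MIB


def fits_in_32bit(request_mib: int) -> bool:
    """True if a request of this many MiB is below the assumed 2 GiB limit."""
    return request_mib * MIB < USER_SPACE_32BIT


failed_alloc_mib = 2190464 / MIB  # size from the x264 malloc error, in MiB

print(f"failed allocation: {failed_alloc_mib:.2f} MiB")         # 2.09 MiB
print("SetMemoryMax(15000) fits:", fits_in_32bit(15000))        # False
print("SetMemoryMax(1024) fits: ", fits_in_32bit(1024))         # True
```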
Old 12th June 2012, 18:50   #12  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
Quote:
Originally Posted by defalopii View Post
Maybe the GUI needs an additional feature that splits the video into parts, encodes them simultaneously, and automatically re-merges all the parts once every job is done.
This would be easier for users: they wouldn't need to install anything else, such as Avisynth 2.6 MT, or modify a complicated Avisynth script.
Well, it works for everyone else, so I guess nobody needs a feature that splits the file for maximum performance. Or very few people need it.
And you don't need Avisynth 2.6 MT if you don't use filters for processing. If you just need multithreaded decoding, use a decoder that has internal multithreading.
Quote:
Originally Posted by defalopii View Post
They also wouldn't need to install 64-bit x264, with which some filters are still not compatible. 32-bit x264 is stuck at around 2 GB of memory, which is why I got x264 [error]: malloc of size 2190464 failed. No matter how much memory you have, 32-bit x264 can only handle 2 GB.
If you have a 64-bit OS you can use 64-bit x264 even with 32-bit Avisynth filters. Just enable 64-bit x264 in MeGUI - Settings - External Program Configuration.
Quote:
Originally Posted by defalopii View Post
Likewise, no matter how many threads you set in MT, if x264 goes past 2 GB before you can max out all the cores, it will crash.
Please correct me if I am wrong, I am very much a newbie here ^^,
x264 uses its own threads, not the ones set for Avisynth.
And we still don't know whether the bottleneck is Avisynth, the decoder, or x264.
Use AVSMeter to test the speed of your AVS script and see how many cores are used for decoding. Just drag and drop your AVS file onto avsmeter.exe.
Also, you could update MeGUI to the latest version, 2153, from the development server.

Last edited by detmek; 12th June 2012 at 19:21.
Old 12th June 2012, 19:08   #13  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
defalopii
I'm planning to implement that in RipBot264's distributed encoding mode.
Old 12th June 2012, 19:16   #14  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
Isn't splitting the file for encoding suboptimal with 2-pass bitrate mode? The worst case would be if some parts contain mostly high-motion/high-detail scenes and other parts contain low-motion/low-detail scenes. All parts will be encoded at the same average bitrate. For low-complexity parts the bitrate will probably be overkill, and for high-complexity parts it will not be sufficient. CRF encoding will not suffer from splitting, but 2-pass encoding will, more or less.
Old 12th June 2012, 20:17   #15  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
CQ mode is more popular these days. 2-pass is mainly used by people still living in the Xvid era (movies stored on 700 MB CDs).

2-pass wastes time and space (you have to guess the bitrate).
Old 13th June 2012, 01:29   #16  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
2-pass splitting can be done efficiently if you do the following:

1. Run CRF first passes on every chunk.
2. Multiply all the bitrates from the first pass by a constant factor so that the overall bitrate is what you want.
3. Run the second pass for each chunk with the bitrate from 2).

Forcing every chunk to the same bitrate is what hurts compression a lot.
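
The three steps can be sketched in Python as below. The chunk durations and first-pass bitrates are made-up numbers and the function name is hypothetical. The overall first-pass bitrate is weighted by chunk duration, which is my assumption about how the "overall bitrate" should be computed when chunks differ in length.

```python
def second_pass_bitrates(chunks, target_kbps):
    """Scale per-chunk CRF first-pass bitrates so the whole video hits target_kbps.

    chunks: list of (duration_seconds, first_pass_kbps), one entry per chunk.
    Returns the second-pass bitrate (kbps) for each chunk, in the same order.
    """
    total_duration = sum(duration for duration, _ in chunks)
    total_kbits = sum(duration * kbps for duration, kbps in chunks)
    overall_kbps = total_kbits / total_duration  # duration-weighted average (step 1 result)
    factor = target_kbps / overall_kbps          # the constant factor from step 2
    return [kbps * factor for _, kbps in chunks]


# Hypothetical first-pass results: a complex chunk and a simple chunk of equal length.
chunks = [(100, 6000), (100, 2000)]
print(second_pass_bitrates(chunks, 5000))  # [7500.0, 2500.0]
```

The complex chunk keeps three times the bitrate of the simple one, instead of both being forced to 5000 kbps, while the overall average still matches the target.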
Old 13th June 2012, 09:34   #17  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
Something like this?
Code:
--crf 18 --pass 1 --stats chunk_X.log
Code:
--pass 2 --bitrate [ chunk_X_average_bitrate_from_first_pass / pass1_average_bitrate_from_all_chunks * selected_bitrate_for_whole_video ] --stats chunk_X.log  //where X is a number of chunk
Old 13th June 2012, 09:51   #18  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
Right. 1st pass CRF on every part. I assume we can use those stats files for second pass?
Edit:
I guess we can.
Old 16th June 2012, 07:10   #19  |  Link
defalopii
Registered User
 
Join Date: Feb 2011
Posts: 70
Quote:
Originally Posted by Atak_Snajpera View Post
defalopii
I'm planning to implement that in RipBot264's distributed encoding mode.
Quote:
Originally Posted by Atak_Snajpera View Post
Something like this?
Code:
--crf 18 --pass 1 --stats chunk_X.log
Code:
--pass 2 --bitrate [ chunk_X_average_bitrate_from_first_pass / pass1_average_bitrate_from_all_chunks * selected_bitrate_for_whole_video ] --stats chunk_X.log  //where X is a number of chunk
What I do for now is create 8 virtual machines on the server. Using RipBot264 Distributed Encoding, I spread the encoding across those 8 VMs. This way I can push all the cores to 95-100% CPU usage in the 1st pass. But creating 8 VMs and installing RipBot264 Distributed Encoding on each one is a lot of work and eats a huge amount of RAM. So it would be good if you could split the source using your method in Distributed Encoding, but without needing other machines for the encoding, just the available cores in one machine.
Old 16th June 2012, 11:45   #20  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,946
The new EncodingServer will listen on 8 different ports (1000, 2000, 3000, ...), so you will be able to maximize CPU usage without installing virtual machines.
You will just have to enter 127.0.0.1:1000, 127.0.0.1:2000, 127.0.0.1:3000, ...