Old 15th October 2008, 23:40   #1  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
FFMPEG, x264, multi-core

Hi,

I have a fresh build of FFMPEG and x264 that I built today. I have noticed 2 things (I don't think anything changed in my most recent build - I just wanted to mention that I am on the most recent code).

I am doing 2 pass encoding on a 2 core machine.

1) It seems that the first pass of my encode only uses 1 core regardless of what I set -threads to. Is this an FFMPEG issue? x264 issue? An issue with a setting I am using?

2) On the 2nd pass, in order to fully load the machine I need to use 4 threads on a 2 core machine. What is the "optimal" number of threads to use for encoding based on the number of cores in the machine?

Thanks
Old 16th October 2008, 02:57   #2  |  Link
Sharktooth
Mr. Sandman
 
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 11,768
number of cores * 1.5
However, it's highly probable you can't fill the cores in the first pass if you specified a high number of b-frames along with the b-adapt 2 option.
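
As a rough sketch of the rule of thumb above, the thread count can be derived from the core count in a wrapper script. The function name x264_threads is made up for illustration, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on Linux and macOS). Integer math rounds down, so 2 cores gives 3 threads.

```shell
#!/bin/sh
# Rule of thumb from this thread: x264 threads = cores * 1.5.
# Shell arithmetic is integer-only, so compute cores * 3 / 2 (rounds down).
x264_threads() {
    cores=$1
    echo $(( cores * 3 / 2 ))
}

# Assumption: getconf _NPROCESSORS_ONLN works on this system; fall back to 1 core.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
NUM_THREADS=$(x264_threads "$CORES")
echo "cores=$CORES threads=$NUM_THREADS"
```

The resulting NUM_THREADS could then be passed to -threads for the libx264 encode; it says nothing about decoder threads.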
Old 16th October 2008, 05:54   #3  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
I am using -bf 16 and -badapt 2. However I tried -badapt 1 and it still only used 1 core (though the FPS went way up). Is it the high b-frame # that causes the lack of parallelism? I don't understand what the encoder is doing and whether it could become more parallel. Now with -badapt 2, which I would like to make use of for better quality, it would be cool to be able to load the system during the first pass. Is this technically possible (just curious)?
Old 16th October 2008, 06:46   #4  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by MattKB View Post
I am using -bf 16 and -badapt 2. However I tried -badapt 1 and it still only used 1 core (though the FPS went way up).
The frametype decision part of the encoder isn't parallelized. This isn't a problem for b-adapt 1, which is really really fast. It is a problem for b-adapt 2, which, at least for really high --bframes values, isn't fast at all.
Quote:
Originally Posted by MattKB View Post
Now with -badapt 2, which I would like to make use of for better quality, it would be cool to be able to load the system during the first pass. Is this technically possible (just curious)?
Sure, just use --bframes 3 or so instead of 16.
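
One rough way to see why a non-parallel frametype decision keeps the second core idle is Amdahl's law: if a fraction s of the work is serial, n cores give a speedup of at most 1 / (s + (1 - s) / n). The sketch below only evaluates that formula. The function name amdahl_speedup and the serial fractions are hypothetical and were not measured from x264.

```shell
#!/bin/sh
# Amdahl's law: speedup = 1 / (s + (1 - s) / n)
#   s = fraction of the work that is serial, n = number of cores.
# The serial fractions used below are hypothetical, not measured from x264.
amdahl_speedup() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / (s + (1 - s) / n) }'
}

amdahl_speedup 0.10 2   # mostly parallel work on 2 cores
amdahl_speedup 0.80 2   # mostly serial work on 2 cores
```

With these made-up numbers the first call prints 1.82 and the second prints 1.11, which is the general shape of the problem: the slower the serial stage gets, the less the extra core can help.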
Old 16th October 2008, 17:01   #5  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
What is the quality tradeoff for bf16/b-adapt1 vs. bf3/b-adapt2?

Also, as I said, even when using bf16/b-adapt1, I only saw 1 core being used. Is this because my quality settings are low enough for the 1st pass that 16 b-frames becomes a parallelism blocker even though the decision mechanism is faster?
Old 16th October 2008, 17:11   #6  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Is x264 giving a warning similar to
x264 [warning]: not compiled with pthread support! ?
If it is, it means you didn't give it the ability to multithread when you compiled it,
which would explain why it is only working with 1 thread no matter what you put for --threads x.

Of course, this warning only appears if you used the x264 CLI, not ffmpeg with libx264...
I don't recall offhand what errors ffmpeg gives for attempting threading without thread support...

Last edited by kemuri-_9; 16th October 2008 at 17:16.
Old 16th October 2008, 17:15   #7  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
Quote:
Originally Posted by kemuri-_9 View Post
Is x264 giving a warning similar to
x264 [warning]: not compiled with pthread support! ?
If it is, it means you didn't give it the ability to multithread when you compiled it,
which would explain why it is only working with 1 thread no matter what you put for --threads x.
This is not the problem. I am able to load the cores in the 2nd pass as I said above.
Old 16th October 2008, 17:24   #8  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,800
Quote:
Originally Posted by MattKB View Post
What is the quality tradeoff for bf16/b-adapt1 vs. bf3/b-adapt2?
With "--b-adapt 1" and "--bframes 16" the encoder will not use more than 2 consecutive b-frames very often.
So "--b-adapt 2" with "--bframes 3" should produce a better result in fact. More than "--bframes 5" with "--b-adapt 2" is usually overkill.

That's because "--bframes" only limits the maximum number of consecutive b-frames.
It still depends on the b-frame decision method how many consecutive b-frames will actually be used.
The "old" method is fast, but tends to use very few b-frames. The "new" method is slow, but chooses the optimal number of b-frames.
Hence it's better to use the "new" method and use a sane "--bframes" value...
__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.



Last edited by LoRd_MuldeR; 16th October 2008 at 17:31.
Old 16th October 2008, 17:44   #9  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,151
As DS has mentioned previously, the old b-frame decision method was far from perfect, and setting it to 16 b-frames gave no benefit over a setting of 4. I think there was a bug at one stage which gave the illusion that it did! With the new method, allowing 5 b-frames is more efficient than allowing 16 b-frames with the old method. Higher values offer diminishing returns, and even a setting of 3 gives better b-frame utilisation in most circumstances, once you discount the aforementioned bug.
Old 16th October 2008, 19:23   #10  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
I changed my settings to use -bf 5 and -b_strategy 2, and noticed a few things:

1) I am still not really seeing much parallelism on the 1st pass. I'm going to go under the assumption that the 1st pass is mostly working on b-frame decision and as such there is not much parallelism possible.

2) Now I am seeing the following warning at the very end of the 2nd pass:

[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames
[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames
[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames

Note that this message is repeated based on the number of threads in use. In this case I am using 3 threads. If I use 1 thread I see the message once. Any ideas?
Old 16th October 2008, 19:29   #11  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by MattKB View Post
I changed my settings to use -bf 5 and -b_strategy 2, and noticed a few things:

1) I am still not really seeing much parallelism on the 1st pass. I'm going to go under the assumption that the 1st pass is mostly working on b-frame decision and as such there is not much parallelism possible.
-bf 5 -b_strategy 2 is still pretty damn slow. Try -bf 3 or something.
Quote:
Originally Posted by MattKB View Post
2) Now I am seeing the following warning at the very end of the 2nd pass:

[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames
[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames
[libx264 @ 0x17593170]2nd pass has more frames than 1st pass (726)
[libx264 @ 0x17593170]continuing anyway, at constant QP=18
[libx264 @ 0x17593170]disabling adaptive B-frames

Note that this message is repeated based on the number of threads in use. In this case I am using 3 threads. If I use 1 thread I see the message once. Any ideas?
Sounds like ffmpeg is broken and is dropping frames at the end of the first pass.
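
One way to check a suspected frame-count mismatch is to count the per-frame records in the first-pass stats file and compare that against the number of frames you expect from the source. This is only a sketch under assumptions: it assumes each per-frame record in the stats file starts with "in:" (and that header lines such as "#options:" do not), and the function name count_stats_frames and the log file name in the comment are hypothetical.

```shell
#!/bin/sh
# Count per-frame records in an x264 first-pass stats file.
# Assumption: each per-frame record starts with "in:" and other lines
# (such as the "#options:" header) do not.
count_stats_frames() {
    grep -c '^in:' "$1"
}

# Hypothetical usage; the log file name depends on your build and options:
#   count_stats_frames x264_2pass.log
```

If the count matches the number in the warning (726 here) but is lower than the number of frames the second pass encodes, that would be consistent with frames being dropped at the end of the first pass.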
Old 16th October 2008, 19:44   #12  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,800
Quote:
Originally Posted by Dark Shikari View Post
Sounds like ffmpeg is broken and is dropping frames at the end of the first pass.
Avidemux had a similar problem, until it was fixed recently:
http://avidemux.org/admForum/viewtop...d=29706#p29706
Old 16th October 2008, 19:57   #13  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
Apart from the FFMPEG issue, even if I set -bf 0 I still see barely any parallelism on the 1st pass (I can tell it uses the 2nd core, but very little). Here is my encode script if anyone has any further input on why I can't get parallelism on the 1st pass.

# Pass 1: video only (-an), fast analysis settings
nice -n 10 ffmpeg -t 30 -i "$1" -croptop "$5" -cropbottom "$6" -cropleft "$7" -cropright "$8" \
-s "$3" -y -an -pass 1 -vcodec libx264 -threads "$NUM_THREADS" \
-b "${BIT_RATE}k" -maxrate "${BIT_RATE}k" -bufsize "${BUF_SIZE}k" -rc_init_occupancy "${BUF_INIT}k" -flags +loop \
-cmp +chroma -partitions +parti4x4+partp8x8+partb8x8 -me_method epzs -subq 1 -trellis 0 \
-refs 1 -bf 0 -b_strategy 0 -coder 1 -me_range 16 -g 250 -keyint_min 25 -sc_threshold 40 \
-i_qfactor 0.71 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 "$2"

# Pass 2: audio plus the final-quality video settings
nice -n 10 ffmpeg -t 30 -i "$1" -croptop "$5" -cropbottom "$6" -cropleft "$7" -cropright "$8" -s "$3" -y \
-acodec libfaac -ab 96k -ar 48000 -pass 2 -vcodec libx264 -threads "$NUM_THREADS" \
-b "${BIT_RATE}k" -maxrate "${BIT_RATE}k" -bufsize "${BUF_SIZE}k" -rc_init_occupancy "${BUF_INIT}k" \
-flags +loop -cmp +chroma -partitions +parti8x8+parti4x4+partp8x8+partp4x4+partb8x8 \
-flags2 +dct8x8+wpred+bpyramid+mixed_refs -me_method umh -subq 8 -trellis 1 -refs 6 -bf 0 \
-directpred 3 -b_strategy 0 -bidir_refine 1 -coder 1 -me_range 16 -g 250 \
-keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.6 \
-qmin 10 -qmax 51 -qdiff 4 "$2"
Old 16th October 2008, 20:16   #14  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,800
Quote:
Originally Posted by MattKB
Apart from the FFMPEG issue, even if I set -bf 0 I still see barely any parallelism on the 1st pass
Are you sure that your source isn't the bottleneck here?

Also the "-rc_eq" option is obsolete, I think. The RC equation is hardcoded now.
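
To test whether the source really is the bottleneck, one option is to time the decode on its own, with no encoder attached. The sketch below assumes an ffmpeg build that supports the -benchmark option and the null muxer (-f null -). The FFMPEG variable and the decode_only function name are made up for illustration.

```shell
#!/bin/sh
# Decode-only run: read the input, discard the decoded frames, report timing.
# Assumptions: this ffmpeg build supports -benchmark and the "null" muxer.
# FFMPEG can be overridden to point at a specific binary.
FFMPEG=${FFMPEG:-ffmpeg}

decode_only() {
    # -t 30 matches the 30-second limit used in the encode script above;
    # -an skips audio so only the video decode is measured.
    "$FFMPEG" -benchmark -t 30 -i "$1" -an -f null -
}

# Hypothetical usage:
#   decode_only source.mp4
```

If the decode-only run takes nearly as long as the whole first pass, the decoder is the limiting stage and extra encoder threads will not help much.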

Last edited by LoRd_MuldeR; 16th October 2008 at 20:19.
Old 16th October 2008, 20:24   #15  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
The source could definitely be the issue. In this case it is 1080x720 H264. Honestly the threading model of FFMPEG confuses me. I think you can use the -threads option for both the decoder and the encoder. I am pretty sure that the H264 decoder included in FFMPEG is now multithreaded, however when I added -threads 3 to the input I saw little difference. I'm not sure if this is because the decoder is not efficient.

Should I also be using cores * 1.5 for the # of input threads?

Also in the 2nd pass does audio happen on its own thread?

You would think that FFMPEG would be able to determine the appropriate # of threads for the operation at hand automatically, but that is probably a question for a different forum.
Old 16th October 2008, 20:34   #16  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,800
AFAIK ffmpeg's H.264 decoder had at least two different methods of multi-threading in the past:
Slice-based multi-threading (works with sliced sources only) and CABAC-based multi-threading (works with all sources that use CABAC, but not very efficient).
I think Frame-based multi-threading (similar to x264's implementation) for ffmpeg's H.264 decoder is under development, but not ready yet...

(And I think the "threads = cores * 3/2" formula applies to x264 only)
Last edited by LoRd_MuldeR; 16th October 2008 at 20:38.
Old 16th October 2008, 20:37   #17  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by LoRd_MuldeR View Post
AFAIK ffmpeg's H.264 decoder had at least two different methods of multi-threading in the past:
Slice-based multi-threading (works with sliced sources only) and CABAC-based multi-threading (works with all sources that use CABAC, but not very efficient).

I think Frame-based multi-threading (similar to x264's implementation) for ffmpeg's H.264 decoder is under development, but not ready yet...
CABAC-based multithreading was never committed to ffmpeg trunk; it's only used in FFDshow.
Old 16th October 2008, 20:40   #18  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,800
Quote:
Originally Posted by Dark Shikari View Post
CABAC-based multithreading was never committed to ffmpeg trunk; it's only used in FFDshow.
I see. So unless MattKB's source is sliced or he is using an experimental ffmpeg build from MT-branch, there's no multi-threading at all for the source/decoder.
Old 16th October 2008, 20:54   #19  |  Link
MattKB
Registered User
 
Join Date: Aug 2008
Posts: 16
Quote:
Originally Posted by LoRd_MuldeR View Post
I see. So unless MattKB's source is sliced or he is using an experimental ffmpeg build from MT-branch, there's no multi-threading at all for the source/decoder.
I checked (using -debug 1) and the source IS sliced, and most of the frames do not use deblocking (which I read somewhere doesn't work with the MT support in the decoder), so I would think it could be done in parallel.

In either case you would think 1 core could be used for decoding and the other core could be used for encoding, with a shared memory buffer between them, but I have no idea how FFMPEG is implemented and this is more of an intellectual exercise, so I give up!
Old 16th October 2008, 20:59   #20  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by MattKB View Post
I checked (using -debug 1) and the source IS sliced, and most of the frames do not use deblocking (which I read somewhere doesn't work with the MT support in the decoder)
It's only deblocking between slices that breaks the multithreading, not deblocking in general. It's a flag set in the header.