Old 16th September 2008, 16:01   #21  |  Link
akupenguin
x264 developer
 
 
Join Date: Sep 2004
Posts: 2,393
It's possible to have hybrid xvid-threading and frame-threading, with the caveat that both kinds of threads use up the same spatial buffer, so you're only trading off scaling efficiency vs latency, not increasing the max number of threads. And the caveat that xvid-threading is incompatible with accurate RDO (actually I don't know how inaccurate it would be, only that it can't keep track of the bitstream state).
Slice-threading can't mix with frame-threading.
GOP-threading can mix with everything, but that's not what you want for latency.
Pipeline-threading can also mix with everything, but it reduces compression efficiency and scaling efficiency.

Last edited by akupenguin; 16th September 2008 at 16:05.
Old 16th September 2008, 18:19   #22  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
I wonder what pipeline threading is.
Old 17th September 2008, 01:59   #23  |  Link
akupenguin
x264 developer
 
 
Join Date: Sep 2004
Posts: 2,393
Quote:
Originally Posted by Manao View Post
I wonder what pipeline threading is.
doing different things in different threads; me, mode decision, entropy coding, hpel/deblock, ...
Old 17th September 2008, 05:40   #24  |  Link
foxyshadis
ангел смерти
 
 
Join Date: Nov 2004
Location: Lost
Posts: 9,174
Quote:
Originally Posted by akupenguin View Post
Slice-threading can't mix with frame-threading.
In that case, what about splitting slices into separate instances of the encoder, then combining and packaging the output as each NAL is output in the control program? Not necessarily x264, but it's an avenue to explore if anyone wants unique ways to lower latency.

Videoconferencing won't really live or die by the efficiency loss of using slices anyway; it's largely talking heads on static backgrounds.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. ~ Ed Howdershelt
Old 17th September 2008, 05:45   #25  |  Link
Shinigami-Sama
Solaris: burnt by the Sun
 
 
Join Date: Oct 2004
Location: /etc/default/moo
Posts: 1,923
Quote:
Originally Posted by akupenguin View Post
doing different things in different threads; me, mode decision, entropy coding, hpel/deblock, ...
I thought that's what was already done?

Seems like the best way to do it to me,
but I'm not a programmer...
__________________
Quote:
Originally Posted by benjust View Post
interlacing and telecining should have been but a memory long ago.. unfortunately still just another bizarre weapon in the industries war on image quality.
Old 17th September 2008, 05:57   #26  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
Currently, each thread encodes a whole frame (i.e., does ME, mode decision, entropy coding, hpel, deblock), so each thread has roughly the same amount of work. If you gave each thread a specific task instead, the thread doing hpel (for example) would finish its task faster than the thread doing ME, so the CPU load would be unbalanced, which makes the threading inefficient.
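
To put rough numbers on that imbalance, here is a minimal sketch. The per-stage costs are made up for illustration (not measured from x264), and the frame-threading figure is an idealised upper bound that ignores inter-frame dependencies.

```python
def frame_threading_fps(stage_ms, threads):
    """Each thread runs every stage on its own frame, so the load is balanced."""
    per_frame_ms = sum(stage_ms.values())
    return threads * 1000.0 / per_frame_ms


def pipeline_threading_fps(stage_ms):
    """One thread per stage: throughput is limited by the slowest stage."""
    return 1000.0 / max(stage_ms.values())


# Hypothetical per-frame stage costs in milliseconds (assumed values).
stage_ms = {"me": 20.0, "mode_decision": 12.0, "entropy": 5.0, "hpel_deblock": 3.0}

threads = len(stage_ms)  # same thread count for both schemes
frame_fps = frame_threading_fps(stage_ms, threads)
pipe_fps = pipeline_threading_fps(stage_ms)

print(f"frame-threading (ideal): {frame_fps:.1f} fps")
print(f"pipeline-threading:      {pipe_fps:.1f} fps")
print(f"pipeline efficiency:     {pipe_fps / frame_fps:.0%}")
```

With these assumed costs the ME thread is the bottleneck and the other three threads sit idle most of the time, which is the inefficiency described above.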
Old 17th September 2008, 08:28   #27  |  Link
akupenguin
x264 developer
 
 
Join Date: Sep 2004
Posts: 2,393
Quote:
Originally Posted by foxyshadis View Post
what about splitting slices into separate instances of the encoder
Then you can't have mvs span the middle of the frame (nor even near the boundary, i.e. edge mbs can't have small subpel mvs). This is exactly why slice-threading is incompatible with frame-threading.

Last edited by akupenguin; 17th September 2008 at 08:38.
Old 17th September 2008, 12:28   #28  |  Link
Quark.Fusion
Registered User
 
 
Join Date: Jun 2008
Posts: 177
Quote:
Originally Posted by akupenguin View Post
Pipeline-threading can also mix with everything, but it reduces compression efficiency and scaling efficiency.
Why does it reduce compression efficiency?
Old 17th September 2008, 12:46   #29  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
If you put decision and entropy coding in separate threads, RD costs during the decision will be slightly off, because the decision thread won't have the exact CABAC context.
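
A toy illustration of why a stale context skews the cost estimate. This is a simplified adaptive binary model, not the real CABAC state machine; the class name, the update rate and the bin sequence are all assumptions.

```python
import math


class AdaptiveBit:
    """Toy adaptive binary probability model (NOT the actual CABAC tables)."""

    def __init__(self, p_one=0.5, rate=0.1):
        self.p_one = p_one  # current estimate of P(bit == 1)
        self.rate = rate    # adaptation speed (assumed value)

    def cost_bits(self, bit):
        """Ideal code length of `bit` under the current estimate."""
        p = self.p_one if bit else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bit):
        """Move the estimate towards the bit that was just coded."""
        target = 1.0 if bit else 0.0
        self.p_one += self.rate * (target - self.p_one)


exact = AdaptiveBit()  # context as the entropy coder sees it
stale = AdaptiveBit()  # context as a lagging decision thread sees it

# The entropy thread has already coded these bins; the decision thread has not.
for coded_bin in [1] * 10:
    exact.update(coded_bin)

print(f"cost of a 1 with exact context: {exact.cost_bits(1):.3f} bits")
print(f"cost of a 1 with stale context: {stale.cost_bits(1):.3f} bits")
```

The decision thread overestimates the cost here, so it might reject a mode the real bitstream would have coded cheaply. How large the error is in practice depends on how far the two threads drift apart.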
Old 17th September 2008, 16:30   #30  |  Link
Quark.Fusion
Registered User
 
 
Join Date: Jun 2008
Posts: 177
Then maybe thread everything that doesn't hurt? Thread balancing is already broken by source decoding, b-adapt, 1.5 threads per core, etc. (Don't forget the other processes on the system, which also disturb x264's thread balancing.)
If you want perfect threading, you need multitask threads (or many single-task threads per core) and a higher-priority thread server that splits the source task into smaller parts and distributes them smartly to keep all cores busy.
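
The "thread server" idea is essentially a shared task queue that idle workers pull from. A minimal sketch of that shape follows; the function name and the dummy tasks are hypothetical, and in CPython the GIL means this shows the scheduling pattern rather than real CPU parallelism.

```python
import queue
import threading


def run_task_pool(tasks, num_workers):
    """Run callables from a shared queue; whichever worker is idle takes the next one."""
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = work.get_nowait()
            except queue.Empty:
                return  # no work left, worker exits
            results[index] = task()  # each slot is written by exactly one worker

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


# Hypothetical small tasks, e.g. one per macroblock row (dummy work here).
tasks = [lambda row=row: row * row for row in range(8)]
print(run_task_pool(tasks, num_workers=3))
```

Because workers pull tasks on demand, a worker slowed down by another process simply takes fewer tasks instead of stalling the rest. The price is queue synchronisation on every task.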
Old 17th September 2008, 16:47   #31  |  Link
Sagekilla
x264aholic
 
Join Date: Jul 2007
Location: New York
Posts: 1,752
So basically you're suggesting a client/server model where the server assigns the tasks and the clients do the grunt work. If they switched to that threading model, you could probably even spread it across multiple systems more easily.

Edit: If you could get it to work, you could get close to 35 fps (1080p frames) when saturating a gigabit network, or 85 fps if you're doing 720p frames.
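
A back-of-the-envelope check of those figures, assuming raw 8-bit 4:2:0 frames (1.5 bytes per pixel) and ignoring all protocol overhead. Both assumptions are mine, not stated in the post.

```python
def raw_fps_over_link(width, height, link_bits_per_s=1_000_000_000, bytes_per_pixel=1.5):
    """Upper bound on raw frames per second that fit through the link."""
    frame_bytes = width * height * bytes_per_pixel
    link_bytes_per_s = link_bits_per_s / 8
    return link_bytes_per_s / frame_bytes


print(f"1080p: {raw_fps_over_link(1920, 1080):.1f} fps")
print(f"720p:  {raw_fps_over_link(1280, 720):.1f} fps")
```

This gives a ceiling of roughly 40 fps for 1080p and 90 fps for 720p, so the 35 and 85 fps quoted above look plausible once some allowance is made for network overhead.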
__________________
You can't call your encoding speed slow until you start measuring in seconds per frame.

Last edited by Sagekilla; 17th September 2008 at 16:51.
Old 17th September 2008, 17:52   #32  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
Source decoding isn't part of x264 and will limit any threading scheme, so it can be ignored. b-adapt can be threaded without losing anything. 1.5 threads per core is needed because frames don't consistently take the same time to encode (and because motion vector restrictions sometimes make one thread wait for another, in which case it's nice to have a third thread that can run in the meantime); it's not quite the same thing as load balancing.

Also, don't forget that the more threads you have, the more overhead you get from context switching. So a thread server, though it looks nice on paper, can cost a lot.

In any case, without consideration for latency, slice-type decision is currently the only limiting factor, and it is fairly easy to thread it.
Old 17th September 2008, 19:57   #33  |  Link
Quark.Fusion
Registered User
 
 
Join Date: Jun 2008
Posts: 177
Things that aren't part of x264 can't be ignored, since x264 isn't the only thing running on the computer. It's pointless to split the work into equal parts per thread, because other processes will break the balance. Threads must be fed dynamically with small tasks from a task buffer.
A thread server doesn't need many context switches. But more than 1 thread per core makes threads compete for that core. There was a thread pool patch somewhere: did it provide better efficiency?
Old 17th September 2008, 20:25   #34  |  Link
Ranguvar
Registered User
 
 
Join Date: Feb 2007
Location: ::1
Posts: 1,236
It did for large numbers of threads (> 4) and worsened efficiency for fewer threads, IIRC.

http://komisar.gin.by/x.patch/BugMas...pool.r965.diff

Tags
latency, parallel, threading
