Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
16th September 2008, 16:01 | #21 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
It's possible to have hybrid xvid-threading and frame-threading, with the caveat that both kinds of threads use up the same spatial buffer, so you're only trading off scaling efficiency vs latency, not increasing the max number of threads. And the caveat that xvid-threading is incompatible with accurate RDO (actually I don't know how inaccurate it would be, only that it can't keep track of the bitstream state).
Slice-threading can't mix with frame-threading. GOP-threading can mix with everything, but that's not what you want for latency. Pipeline-threading can also mix with everything, but it reduces compression efficiency and scaling efficiency.
Last edited by akupenguin; 16th September 2008 at 16:05. |
16th September 2008, 18:19 | #22 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
I wonder what pipeline threading is.
__________________
|
17th September 2008, 05:40 | #24 | Link |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
In that case, what about splitting slices into separate instances of the encoder, then combining and packaging the output as each NAL is output in the control program? Not necessarily x264, but it's an avenue to explore if anyone wants unique ways to lower latency.
Videoconferencing won't really live or die from the efficiency loss of using slices anyway; it's largely talking heads on static backgrounds. |
17th September 2008, 05:45 | #25 | Link | |
Solaris: burnt by the Sun
Join Date: Oct 2004
Location: /etc/default/moo
Posts: 1,923
|
Quote:
Seems like the best way to do it to me, but I'm not a programmer... |
|
17th September 2008, 05:57 | #26 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Currently, each thread encodes a whole frame (i.e., does ME, mode decision, entropy coding, hpel, deblock), so each thread has roughly the same amount of work. If you were to give each thread a specific task instead, the thread doing hpel (for example) would finish its task faster than the thread doing the ME, so the CPU load would be unbalanced, which makes the threading inefficient.
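To put rough numbers on the imbalance argument, here is a minimal sketch in Python. The per-stage costs are made-up illustration values, not measurements from x264, and the model ignores frame dependencies and synchronization overhead entirely.

```python
# Hypothetical per-frame cost of each stage in milliseconds (made-up numbers,
# not measured from x264).
STAGE_COST_MS = {
    "me": 12.0,
    "mode_decision": 6.0,
    "entropy": 4.0,
    "hpel": 1.0,
    "deblock": 2.0,
}


def pipeline_fps(costs):
    """One thread per stage: the steady-state rate is set by the slowest stage."""
    return 1000.0 / max(costs.values())


def frame_threaded_fps(costs, n_threads):
    """Each thread encodes whole frames; ideal scaling, dependencies ignored."""
    return n_threads * 1000.0 / sum(costs.values())


def pipeline_utilization(costs):
    """Fraction of total thread time spent working rather than waiting."""
    return sum(costs.values()) / (len(costs) * max(costs.values()))


n = len(STAGE_COST_MS)  # same thread count for both schemes
print(f"pipeline:       {pipeline_fps(STAGE_COST_MS):.1f} fps")
print(f"frame-threaded: {frame_threaded_fps(STAGE_COST_MS, n):.1f} fps")
print(f"pipeline utilization: {pipeline_utilization(STAGE_COST_MS):.0%}")
```

With these assumed costs the hpel thread sits idle most of the time while the ME thread is the bottleneck, which is the inefficiency described above.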
__________________
|
17th September 2008, 08:28 | #27 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
Then you can't have mvs span the middle of the frame (nor even near the boundary, i.e. edge mbs can't have small subpel mvs). This is exactly why slice-threading is incompatible with frame-threading.
Last edited by akupenguin; 17th September 2008 at 08:38. |
17th September 2008, 12:46 | #29 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
If you thread decision and entropy, RD costs during the decision will be slightly off, because you won't have the exact cabac context.
__________________
|
17th September 2008, 16:30 | #30 | Link |
Registered User
Join Date: Jun 2008
Posts: 177
|
Maybe then thread everything that doesn't hurt? Thread balancing is already broken by source decoding, b-adapt, 1.5 threads per core, etc. (Don't forget other processes on the system that also disturb x264 thread balancing.)
If you want perfect threading you must make multitask threads (or many single-task threads for each core) and a thread server (with higher priority) that will split the source task into smaller parts and distribute them in a smart way to keep all cores busy. |
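A minimal sketch of the "small tasks from a shared buffer" idea, in Python. The function name `run_task_pool` and the fake row-band tasks are hypothetical, and this is not how x264 schedules work. Python threads also won't speed up CPU-bound code because of the interpreter lock, so this only illustrates the scheduling pattern: idle workers pull the next small task instead of being handed a fixed equal share up front.

```python
import queue
import threading


def run_task_pool(tasks, n_workers):
    """Run callables pulled from a shared queue; return results in task order."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker exits
            results[index] = task()  # each slot is written by exactly one worker

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


# Hypothetical small tasks: each one sums a 4-row band of a fake frame.
frame = [[x + y for x in range(64)] for y in range(32)]
bands = [frame[i:i + 4] for i in range(0, len(frame), 4)]
tasks = [lambda band=band: sum(map(sum, band)) for band in bands]
band_sums = run_task_pool(tasks, n_workers=3)
```

A worker that gets slowed down by another process simply takes fewer bands, so the remaining workers pick up the slack without any explicit rebalancing.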
17th September 2008, 16:47 | #31 | Link |
x264aholic
Join Date: Jul 2007
Location: New York
Posts: 1,752
|
So basically you're suggesting a client/server model where the server assigns the tasks and the clients do the grunt work. If they switched to that threading model, you could probably even span it across multiple systems more easily.
Edit: If you could get it to work, you could get close to 35 fps (1080p frames) running at the max of a gigabit network, or 85 fps if you're doing 720p frames.
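The post doesn't say how those figures were derived. As a rough cross-check, here is a sketch in Python that assumes uncompressed 8-bit YUV 4:2:0 frames (1.5 bytes per pixel) and the nominal 1 Gbit/s line rate with no protocol overhead. Both are assumptions, so the result is an upper bound rather than a reproduction of the numbers above.

```python
def max_raw_fps(width, height, link_bits_per_s=1_000_000_000, bytes_per_pixel=1.5):
    """Upper bound on raw frames per second that fit through the link.

    Assumes 8-bit YUV 4:2:0 (1.5 bytes per pixel) and zero protocol overhead.
    """
    frame_bits = width * height * bytes_per_pixel * 8
    return link_bits_per_s / frame_bits


print(f"1080p: {max_raw_fps(1920, 1080):.1f} fps")
print(f"720p:  {max_raw_fps(1280, 720):.1f} fps")
```

Under these assumptions the bound comes out at roughly 40 fps for 1080p and 90 fps for 720p, so the quoted 35 and 85 fps are consistent with leaving some margin for network overhead.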
__________________
You can't call your encoding speed slow until you start measuring in seconds per frame. Last edited by Sagekilla; 17th September 2008 at 16:51. |
17th September 2008, 17:52 | #32 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Source decoding isn't part of x264, and will limit any threading scheme, so it can be ignored. b-adapt can be threaded without losing anything. 1.5 threads per core is needed because frames don't consistently take the same time to encode (and because motion vector restrictions sometimes make one thread wait for another, so in such cases it's nice to have a third thread that can run during that time); it's not quite the same thing as load balancing.
Also, don't forget that the more threads, the more overhead from context switching. So a thread server, though it looks nice on paper, can cost a lot. In any case, without consideration for latency, slice-type decision is currently the only limiting factor, and it is fairly easy to thread it.
__________________
|
17th September 2008, 19:57 | #33 | Link |
Registered User
Join Date: Jun 2008
Posts: 177
|
Whatever isn't part of x264 can't be ignored, as x264 isn't the only thing running on a computer. It's pointless to split the task into equal parts per thread, as other processes will break the balance. Threads must be dynamically loaded with small tasks from a task buffer.
A thread server doesn't need many context switches. But more than 1 thread per core produces concurrency. There was a thread pool patch somewhere; did it provide better efficiency? |
17th September 2008, 20:25 | #34 | Link |
Registered User
Join Date: Feb 2007
Location: ::1
Posts: 1,236
|
It did for large numbers of threads (> 4) and worsened efficiency for fewer threads, IIRC.
http://komisar.gin.by/x.patch/BugMas...pool.r965.diff |
Tags |
latency, parallel, threading |