12th September 2008, 21:20 | #1 | Link |
Registered User
Join Date: Aug 2008
Posts: 4
|
Threading and latency in x264
I am interested in more fully understanding the current x264 threading mechanism and its expected roadmap. I was using an old (2 1/2 years old) version of x264 in a multi-CPU environment (>>1). In those days, threading was handled by slicing the frame. My understanding is that the current algorithm threads by frames instead. Unfortunately, if your goal is streaming compression, you have to worry about the latency this approach introduces.
Naively, if I could run a streaming compression on 15 processors in real time, that would introduce a 1/2 second latency right off the top in a 30fps environment (for example). Is my naive view of the x264 approach correct? Any thoughts on the best way to tackle this (other than waiting for faster processors)? I did some searching on this and other forums, but only found this as the most recent comment. My apologies if I've missed this being discussed at length here or elsewhere. Thanks. |
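The back-of-the-envelope estimate in this post can be written out as a small calculation. This is only a sketch under the post's own naive assumption that frame-based threading keeps one frame in flight per thread, so the added delay is roughly threads divided by frame rate. The function name is made up for illustration and is not part of x264.

```python
def frame_threading_latency_s(frame_threads: int, fps: float) -> float:
    """Naive added delay in seconds, assuming one frame in flight per thread."""
    if frame_threads < 1 or fps <= 0:
        raise ValueError("need at least one thread and a positive frame rate")
    return frame_threads / fps


if __name__ == "__main__":
    # The case from the post: 15 processors at 30 fps.
    print(frame_threading_latency_s(15, 30))  # 0.5
```

Under the same assumption, two or three threads at 30 fps would come out at well under a tenth of a second.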
12th September 2008, 23:45 | #2 | Link |
x264... Brilliant!
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
|
There have been vast changes in speed and quality over the last 2.5 years. It is true that x264 is frame-based now. OK... let me get some clarification from you.
1) Is 30fps what you are able to achieve right now with the old version? 2) 15 (processors) is not a very "computer-like" number. What is your setup? 3) What command line parameters are you using? I might not be the one to give you a good answer, but with the above answers, the community will have a little more info to help you out. |
13th September 2008, 01:02 | #3 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Yes, the current threading method has such latency, which could be problematic for applications that need extremely low latency, such as videoconferencing. However, x264 doesn't need more than one thread to do realtime SD encoding, so you can still get that realtime with near-zero latency. HD encoding can be done with just two or three threads on a top-end CPU, for just a couple frames of latency.
|
15th September 2008, 16:55 | #4 | Link |
Registered User
Join Date: Aug 2008
Posts: 4
|
Thanks for the prompt responses.
What I'm interested in is real-time 1080p/30 encoding. For HD applications I've primarily seen parallel encoding, but doing this in real time requires a number of processors. I'm in a low-power environment where you wouldn't choose a top-end CPU, but would opt instead for a lower clock rate and more processors. I haven't ported the most recent x264 code base to this kind of platform, but my results using someone else's port of the x264 code from 2006 required about 16 processors for real-time encoding of an HD stream. If the current code is approximately as fast, but uses frame-based threading, this would introduce a 1/2 second delay (unacceptable in some environments). Those results are for 720p/30 with qp=24, resulting in a 5Mb/s output for my data -- about as low a resolution as I can stand. For 1080p, it's worse of course. As you may have guessed, I'm not using any standard OTS platform and porting the most recent x264 to it will take some effort. I'm trying to evaluate whether it's worth it. If the current code is significantly faster, perhaps a reduction from 1/2 second to something below 1/4 second would work. I was also curious to understand why the choice was made to use frame-based rather than slice-based threading. (Ease of programming? Something more subtle?) I'm not sure I've clarified my questions at all, but I do appreciate the thoughts. Thanks again. |
15th September 2008, 17:58 | #5 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
Slice-based threading is intolerably inefficient; it caps out very quickly in terms of overall performance increase and does not effectively utilize large numbers of cores. |
|
15th September 2008, 17:58 | #6 | Link |
Turkey Machine
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
|
The "thread-pool" patch (search for it) might be of benefit in that case, but you'd need fast, minimal search settings, and I'm assuming a CRF mode, to do 1080p30 real-time.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld |
15th September 2008, 18:24 | #7 | Link | |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Quote:
Put another way, if you adapt subme settings according to CPU load, frame-based will give a better average subme quality than slice-based, but the worst case will be the same. Since low latency prevents you from buffering enough frames to adapt subme, you are forced to use a constant subme setting. This means that, in realtime, you gain nothing between slice- and frame-based (except a slightly better coding efficiency for frame-based).
|
|
15th September 2008, 18:27 | #8 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
From the benchmarks I have of realtime encoding with slice-based threading, the maximum performance increase from threads capped out at about 200%, a pathetic value. Frame-based can achieve more than double or triple that. And what really matters is what happens in practice, not some theoretical situation that doesn't actually exist. Also, real time does not inherently imply "low latency" at all. And if you need speedcontrol, the current speedcontrol patch probably works just fine with 30 or even 15 frames of buffer. Furthermore, since it acts on a thread level rather than frame level, it can re-use the buffer that already exists for threads to use; i.e. it shouldn't add any new delay.
Last edited by Dark Shikari; 15th September 2008 at 18:33. |
|
16th September 2008, 14:45 | #9 | Link | |
Registered User
Join Date: Aug 2008
Posts: 4
|
Quote:
I don't consider that pathetic. |
|
16th September 2008, 15:38 | #10 | Link |
x264aholic
Join Date: Jul 2007
Location: New York
Posts: 1,752
|
Would it theoretically be possible to have a hybrid frame-slice based threading? Something like: Frame 0 goes to thread group (0,1,2,3) and Frame 1 goes to thread group (4,5,6,7).
I'm guessing it's very difficult because of temporal dependencies but I was curious.
__________________
You can't call your encoding speed slow until you start measuring in seconds per frame. |
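The hybrid grouping proposed in this post (frame 0 to threads 0-3, frame 1 to threads 4-7) could be sketched as below. The post only gives the first two frames, so the round-robin continuation for later frames is an assumption, and the function name is hypothetical. The sketch shows only the assignment of frames to groups; it does not model the temporal dependencies between frames that the post worries about, and it is not how x264 schedules work.

```python
from typing import Tuple


def thread_group_for_frame(frame_index: int,
                           total_threads: int = 8,
                           group_size: int = 4) -> Tuple[int, ...]:
    """Return the thread ids that would slice-encode the given frame.

    Assumption: groups are handed out round-robin, one frame per group.
    """
    if group_size < 1 or total_threads % group_size != 0:
        raise ValueError("total_threads must be a positive multiple of group_size")
    group_count = total_threads // group_size
    first = (frame_index % group_count) * group_size
    return tuple(range(first, first + group_size))


if __name__ == "__main__":
    for frame in range(4):
        print(frame, thread_group_for_frame(frame))
```

With 8 threads in groups of 4, only two frames are in flight at once, so under the naive one-frame-per-group model the added delay would be about two frame periods instead of eight.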
15th September 2008, 18:42 | #11 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
I didn't say realtime == low latency. I said in a realtime & low latency environment, which is his case.
You propose a 15 frame buffer for speed control, which isn't low latency, so you admit low latency forces a constant subme. And the worst-case scenario, in both cases, is the whole frame taking a huge amount of time. Since subme is constant due to low latency, and since the worst-case scenario has the same speed for both slice- and frame-based, you end up with the same subme for both slice- and frame-based threading. And that is achieved at the same CPU usage (since both do the same amount of work).
|
15th September 2008, 18:46 | #12 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
Last edited by Dark Shikari; 15th September 2008 at 18:49. |
|
15th September 2008, 19:43 | #13 | Link | |
Registered User
Join Date: Aug 2008
Posts: 4
|
Thanks again for all the great insight.
Quote:
For telepresence applications, 15 frames is not low latency. You'd never stand for a 1/2 second delay on your cell phone, and that's why many telepresence applications are very awkward for the general user. Low latency is 100 ms, which means you need to hold <3 frames in a buffer @30fps. For a television broadcast, a 10 second delay is no big deal, and that's where the threading choice of x264 makes perfect sense. However, if we want x264 to be applicable to other problems, it needs to approach a single frame of latency, which naively lends itself to slice-based threading. |
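The numbers in this post can be checked with a couple of lines. This is a minimal sketch with made-up helper names; it only converts between a latency budget and a frame count, and ignores every other source of delay.

```python
def frame_intervals_in_budget(budget_ms: int, fps: int) -> int:
    """Whole frame intervals that fit in the budget (integer math, no rounding)."""
    return (budget_ms * fps) // 1000


def buffer_delay_ms(frames: int, fps: int) -> float:
    """Delay in milliseconds from holding the given number of frames."""
    return frames * 1000 / fps


if __name__ == "__main__":
    print(frame_intervals_in_budget(100, 30))  # 3
    print(buffer_delay_ms(15, 30))             # 500.0
```

At 30 fps a 100 ms budget is worth exactly three frame intervals, so the buffer has to stay under three frames as soon as anything else in the chain takes time. A 15 frame buffer is 500 ms, five times the budget.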
|
15th September 2008, 19:05 | #14 | Link | |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Quote:
|
|
15th September 2008, 20:04 | #15 | Link |
x264... Brilliant!
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
|
I like this thread even more! I'm involved in videoconferencing.
In the entire system, latency comes from many places: drivers, buffering, encoding, transmission, distance, routing, queuing, decoding, etc. Latency is inescapable! Theoretically, even talking to someone face-to-face has latency (distance apart / speed of sound). In technology, you need to determine the overall acceptable latency of a system, then budget for each part of the process. My definition of "realtime" encoding is being able to indefinitely sustain an encoding rate equal to or surpassing the input frame rate. Not including buffering, encoding in realtime at 30 fps can still add up to 33ms to the conversation. I believe that 150ms is generally considered the maximum one-way latency for a 2-way voice conversation. This can vary depending on the format of the conversation and the personalities of the people on each end. It should be approximately the same for videoconferencing. I guess an important question is: is the application 2-way videoconferencing? If not, the budget can be expanded greatly. Either way, we need to know how much latency we can afford for encoding/buffering. This will dictate the performance per processor, the number of processors, and the encoding options that are needed. |
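The budgeting idea in this post can be sketched roughly as follows. The 150 ms one-way limit and the 33 ms frame period at 30 fps come from the post; the per-stage values in the example are hypothetical placeholders, not measurements, and the speed of sound is taken as roughly 343 m/s (dry air at about 20 degrees C). All function names are invented for illustration.

```python
from typing import Dict

SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air at about 20 degrees C


def frame_period_ms(fps: float) -> float:
    """Time one frame occupies at the given frame rate."""
    return 1000.0 / fps


def remaining_budget_ms(total_ms: float, stages_ms: Dict[str, float]) -> float:
    """Budget left after subtracting every stage; negative means over budget."""
    return total_ms - sum(stages_ms.values())


def face_to_face_latency_ms(distance_m: float) -> float:
    """Acoustic delay between two people standing distance_m apart."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    stages = {
        "capture and drivers": 20.0,     # hypothetical placeholder
        "encode": frame_period_ms(30),   # one frame period, as in the post
        "network": 40.0,                 # hypothetical placeholder
        "decode and display": 30.0,      # hypothetical placeholder
    }
    print(round(remaining_budget_ms(150.0, stages), 1))
    print(round(face_to_face_latency_ms(1.0), 1))
```

Whatever is left after the fixed stages is what the encoder and its buffers are allowed to use, which is the figure that would decide the thread count.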
16th September 2008, 07:52 | #17 | Link | |
Registered User
Join Date: Dec 2005
Posts: 133
|
Quote:
HD videoconferencing should be available soon. People already have FullHD camcorders and FullHD TVs. Individuals don't have the bandwidth required to transfer realtime FullHD streams yet, but many businesses can afford it. I can't wait to have a fiber connection at home... |
|
16th September 2008, 11:58 | #18 | Link | |
x264... Brilliant!
Join Date: Mar 2005
Location: Rockville, MD
Posts: 167
|
Quote:
All these companies rely on hardware to encode their streams. I no longer have access to these machines, and when I did, I didn't try too hard to get an H.264 stream to analyze. I'm sure it would be interesting though. Many of these companies lock much of their hardware so that it can't set up calls faster than 1.5-2 Mbit/s unless you purchase key codes for greater bitrates. This is unfortunate, since we can all imagine what the quality of such an image encoded by a hasty low-latency encoder would look like. |
|
16th September 2008, 09:44 | #19 | Link |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
If you need HD videoconferencing, you pay for the CPU needed to minimize latency. Do note that there's a slice-based patch floating around, which combines both threading methods - I have no way of even testing the latency benefit of combining both, but who knows, it might work.
|
16th September 2008, 09:46 | #20 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
|
|