Question to x264 developers
colinhunt
10th June 2009, 10:27
NVIDIA CUDA and Toshiba SpursEngine (found on Canopus Firecoder Blu and Leadtek Winfast PxVC1100) are capable of speeding up H.264 encoding remarkably, as shown by TMPGEnc 4.0 Xpress w/ SpursEngine plug-in. See here: http://tmpgenc.pegasys-inc.com/en/product/te4xp_spurs.html
Will we ever see these two technologies supported in x264? What would it take for that to happen?
thanks,
:: colin
J_Darnley
10th June 2009, 11:30
They might be used when you or someone else provides a patch for it.
colinhunt
10th June 2009, 11:55
OK, but what's needed to produce such a patch? I'm definitely not a programmer, but I might be able to help acquire code, SDK etc.
Chengbin
10th June 2009, 13:27
No, because the quality from using a GPU to encode is really bad compared to x264. To get the quality anywhere near x264 would take a SERIOUS AMOUNT OF WORK.
LoRd_MuldeR
10th June 2009, 13:50
OK, but what's needed to produce such a patch? I'm definitely not a programmer, but I might be able to help acquire code, SDK etc.
The problem is not getting the SDK, because Nvidia offers the CUDA SDK (including documentation) as a free download. Also, CUDA programming isn't hard to understand if you have a basic idea of C. But writing fast and efficient CUDA code is the huge problem! What you need is a really deep understanding of CUDA and a deep understanding of x264. Then you may be able to identify parts of the encoder that are worth porting to the GPU. And then you may be able to implement these parts efficiently on the GPU, which often requires coming up with completely new algorithms! That's because the existing algorithms usually don't scale to hundreds of threads, as required for an efficient GPU implementation. Last but not least, the fact that all GPU/CUDA H.264 encoders available so far suck shows that it's definitely not an easy task to implement a good (and fast) encoder on the GPU...
(BTW: As far as I know there is an x264 CUDA project scheduled for Summer of Code)
colinhunt
10th June 2009, 14:10
Like I said I'm not a programmer, so bear with me guys. Chengbin said GPU encodes result in poor quality. But isn't the GPU only doing the calculations it has been told to do? Surely the GPU can't decide on its own what kind of quality it's going to produce? In my scenario, it would be the x264 parameters making those quality decisions, and both CUDA and SpursEngine would be used to do the heavy lifting, i.e. the number crunching.
Also, so far all the replies have concentrated on CUDA. Please take a look at SpursEngine. It's based on the Cell CPU developed for Sony PS3 and judging by what I've read, it's a completely different beast compared to CUDA. Canopus Firecoder Blu, for example, has been designed for AVC/H.264 encoding.
LoRd_MuldeR
10th June 2009, 14:16
Like I said I'm not a programmer, so bear with me guys. Chengbin said GPU encodes result in poor quality. But isn't the GPU only doing the calculations it has been told to do? Surely the GPU can't decide on its own what kind of quality it's going to produce? In my scenario, it would be the x264 parameters making those quality decisions, and both CUDA and SpursEngine would be used to do the heavy lifting, i.e. the number crunching.
All GPU encoders available so far produce bad quality! That's a fact, and it proves that despite all the marketing claims, GPUs are not optimal for video encoding. And yes, the GPU only does the calculations it is told to do. BUT: you cannot take your existing CPU code, throw it on the GPU and hope it will work. It won't work that way! Your way of thinking and programming must be completely different for the GPU! On the CPU you have 4 or 8, maybe 16 cores. But on the GPU you have ~216 cores you need to keep busy! Also, what the GPU cores can do is actually quite limited compared to a CPU core! All threads inside one "warp" must do the same calculation at the same time (or do nothing at all), so they are not independent. Synchronizing threads is only possible within one block. Accessing the graphics memory is really slow, because it's not cached. Floating point is only single-precision, except on the latest GPU generation. And so on...
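To make the warp point concrete, here is a toy cost model in C. It is a simplified sketch, not a description of any real GPU scheduler: it assumes NVIDIA's 32-thread warps, and the per-path cycle counts are made-up inputs. When the threads of one warp disagree on a branch, the warp executes both paths one after the other, with the threads that did not take the current path masked off and idle. The function name `divergent_branch_cycles` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define WARP_SIZE 32  /* threads per warp on NVIDIA GPUs */

/* Toy model of an if/else inside one warp: the "then" path is issued if
 * at least one thread takes it, the "else" path if at least one thread
 * does not. Threads on the inactive path are masked off but still wait,
 * so a divergent warp pays for both paths. Returns issue cycles spent. */
unsigned divergent_branch_cycles(const bool takes_then[WARP_SIZE],
                                 unsigned then_cycles,
                                 unsigned else_cycles)
{
    bool any_then = false, any_else = false;
    for (size_t lane = 0; lane < WARP_SIZE; lane++) {
        if (takes_then[lane])
            any_then = true;
        else
            any_else = true;
    }
    return (any_then ? then_cycles : 0) + (any_else ? else_cycles : 0);
}
```

With a 10-cycle "then" path and a 20-cycle "else" path, a warp where all 32 threads agree pays 10 or 20 cycles, but a single disagreeing thread makes the whole warp pay 30. This is why per-block decisions in an encoder (different partition or mode choices for neighboring blocks) map poorly onto warps.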
roozhou
10th June 2009, 14:24
Also, so far all the replies have concentrated on CUDA. Please take a look at SpursEngine. It's based on the Cell CPU developed for Sony PS3 and judging by what I've read, it's a completely different beast compared to CUDA. Canopus Firecoder Blu, for example, has been designed for AVC/H.264 encoding.
Actually, x264 does not rely on any "engine"; x264 itself is the "engine". I'd rather port x264 to PS3 than write a new encoder based on an external library.
burfadel
10th June 2009, 14:41
When using the term 'cores' on a GPU to refer to the number of pixel shaders, the ATI 48xx cards have 800, and the 58xx cards are rumoured to have 1000 or more. In terms of coding for CUDA, CUDA is only supported by Nvidia and will only ever be supported by Nvidia. There are cross-vendor alternatives that would be much more beneficial to use in the long term, such as OpenCL and DirectX 11 compute shaders. Nvidia would love nothing more than for popular open source software like x264 to use a proprietary system, so that people will buy an Nvidia card just to get a few fps faster encoding!
LoRd_MuldeR
10th June 2009, 14:50
When using the term 'cores' on a GPU to refer to the number of pixel shaders, the ATI 48xx cards have 800, and the 58xx cards are rumoured to have 1000 or more. In terms of coding for CUDA, CUDA is only supported by Nvidia and will only ever be supported by Nvidia. There are cross-vendor alternatives that would be much more beneficial to use in the long term, such as OpenCL and DirectX 11 compute shaders. Nvidia would love nothing more than for popular open source software like x264 to use a proprietary system, so that people will buy an Nvidia card just to get a few fps faster encoding!
All that was said about CUDA in this thread applies to OpenCL as well. Switching to OpenCL won't solve any of the fundamental GPU problems...
kemuri-_9
10th June 2009, 15:56
(BTW: As far as I know there is an x264 CUDA project scheduled for Summer of Code)
afaik, no one took that project either, so no go there.
that specific project outline can be found here (http://wiki.videolan.org/SoC_x264_2009#GPU_Motion_Estimation)
(SoC is underway atm and the students are working on their respective projects)
burfadel
10th June 2009, 15:59
OpenCL would be a better place to start :) Probably only good for motion estimation, though; I'd assume you could massively parallelize that.
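The intuition holds for the simplest form of motion estimation. In an exhaustive (full) search, sketched below in C, the SAD of every candidate vector is computed independently of every other candidate, so in principle each candidate could be one GPU thread, with only the final minimum acting as a reduction. This is an illustration under simplifying assumptions: 16x16 blocks, integer-pel vectors, SAD only with no motion-vector cost, and a search window assumed to stay inside the reference frame. The names `sad16x16`, `mv_result` and `full_search` are hypothetical. x264's faster searches (such as `--me hex` or `--me umh`) start from predictors derived from neighboring blocks' motion vectors, which creates dependencies between blocks that a full search does not have.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between the 16x16 block at `cur` and the
 * block of `ref` displaced by (dx, dy). Both planes share `stride`. */
static unsigned sad16x16(const uint8_t *cur, const uint8_t *ref,
                         int stride, int dx, int dy)
{
    unsigned sad = 0;
    const uint8_t *r = ref + dy * stride + dx;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (unsigned)abs(cur[y * stride + x] - r[y * stride + x]);
    return sad;
}

typedef struct { int dx, dy; unsigned sad; } mv_result;

/* Exhaustive integer-pel search over [-range, range] in both directions.
 * Each loop iteration is independent of the others (the data-parallel
 * part); keeping the best result is a simple min-reduction. */
mv_result full_search(const uint8_t *cur, const uint8_t *ref,
                      int stride, int range)
{
    mv_result best = { 0, 0, UINT_MAX };
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            unsigned sad = sad16x16(cur, ref, stride, dx, dy);
            if (sad < best.sad)
                best = (mv_result){ dx, dy, sad };
        }
    return best;
}
```

The cost is (2 * range + 1)^2 SADs per block, which is why a full search is attractive for massively parallel hardware yet wasteful on a CPU, where predictive searches reach similar vectors with far fewer SADs.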
LoRd_MuldeR
10th June 2009, 16:48
OpenCL would be a better place to start :) Probably only good for motion estimation, though; I'd assume you could massively parallelize that.
In theory. But did you ever try to implement something in CUDA (or Stream/OpenCL), something that isn't trivial? You will run into a bunch of problems you didn't even think about.
And that is before you actually start to optimize your code :D
colinhunt
10th June 2009, 16:58
OK, I get it, forget CUDA. So how about SpursEngine?
Dark Shikari
10th June 2009, 17:02
OK, I get it, forget CUDA. So how about SpursEngine?
All of these require a very significant rewrite of the x264 architecture, so unless someone writes a (very large) patch, it's not going to happen.
To begin with, switching to any architecture other than x86 means writing another ~450 assembly functions--that's hard work enough, and then add in the challenge of reorganizing most of the program to suit a different type of program flow... and then finally find that it isn't any faster because things like the SpursEngine are expensive and are crap at anything other than floating point math.
Also, at least half the posts in this thread are misinformed to some extent or another.
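To show why a new architecture means rewriting hundreds of functions: x264 selects its optimized routines at runtime from per-CPU function-pointer tables, with a portable C version behind each entry. The C sketch below imitates that pattern in miniature. All names (`pixel_functions`, `pixel_init`, `CPU_SSE2`, `CPU_NEON`) are hypothetical, and the "SSE2" routine is only a placeholder that calls the C code so the sketch compiles anywhere.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical CPU capability flags. */
enum { CPU_SSE2 = 1 << 0, CPU_NEON = 1 << 1 };

typedef int (*sad_fn)(const uint8_t *a, const uint8_t *b, int stride);

typedef struct {
    sad_fn sad_16x16;
    /* a real encoder has hundreds of entries like this one */
} pixel_functions;

/* Portable C reference implementation: always available, slow. */
static int sad_16x16_c(const uint8_t *a, const uint8_t *b, int stride)
{
    int sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Placeholder for a hand-written SSE2 assembly routine. */
static int sad_16x16_sse2(const uint8_t *a, const uint8_t *b, int stride)
{
    return sad_16x16_c(a, b, stride);
}

/* Fill the table: start from C, then override with whatever the
 * detected CPU supports. */
void pixel_init(unsigned cpu_flags, pixel_functions *pf)
{
    pf->sad_16x16 = sad_16x16_c;
    if (cpu_flags & CPU_SSE2)
        pf->sad_16x16 = sad_16x16_sse2;
    /* No CPU_NEON branch: a new architecture runs the slow C path for
     * every entry until someone writes its own optimized versions. */
}
```

The encoder always calls through the table, so porting is not blocked by correctness. Speed is the problem, because every hot entry has to be rewritten for the new instruction set.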
colinhunt
10th June 2009, 17:36
and then finally find that it isn't any faster because things like the SpursEngine are expensive and are crap at anything other than floating point math.
The cheapest SpursEngine card, Leadtek Winfast PxVC1100, is less than 170 euros. I don't think that's expensive, as long as the card works like it should. Pegasys Inc's test results on TMPGEnc appear to claim encoding times are hugely shortened using SpursEngine, and even more so when also using CUDA.
Dark Shikari
10th June 2009, 17:39
The cheapest SpursEngine card, Leadtek Winfast PxVC1100, is less than 170 euros. I don't think that's expensive, as long as the card works like it should.
A CPU is 0 euros, because every computer already has one. Since computers don't come with SpursEngines, you have to spend extra money to get one.
Pegasys Inc's test results on TMPGEnc appear to claim encoding times are hugely shortened using SpursEngine, and even more so when also using CUDA.
It's not my fault if their old encoder was a godawfully slow piece of crap.
colinhunt
10th June 2009, 18:05
A CPU is 0 euros, because every computer already has one. Since computers don't come with SpursEngines, you have to spend extra money to get one.
Honestly, what kind of an argument is that? So if you're burdened with a slow CPU and you want to encode faster, you definitely shouldn't spend any money to buy faster hardware. Hey, memory costs nothing either, as every computer has some, but forget about buying more if you need it. Yeah, makes perfect sense. Wait, no it doesn't. Every one of us spends loads of money every year upgrading our computers, and such an add-on card is simply one more upgrade.
It's not my fault if their old encoder was a godawfully slow piece of crap.
No, but being a bit of a d*** about it certainly is.
Sheesh, what a forum this is.
roozhou
10th June 2009, 18:07
The cheapest SpursEngine card, Leadtek Winfast PxVC1100, is less than 170 euros. I don't think that's expensive, as long as the card works like it should. Pegasys Inc's test results on TMPGEnc appear to claim encoding times are hugely shortened using SpursEngine, and even more so when also using CUDA.
Do not underestimate the power of modern CPUs. Have you ever tested how fast x264 can run on a 170-euro CPU?
LoRd_MuldeR
10th June 2009, 18:21
No, but being a bit of a d*** about it certainly is.
Sheesh, what a forum this is.
It seems you totally miss the point! When switching from some old and unoptimized encoder to dedicated encoder hardware, you may get a significant speed-up just because the old encoder was so slow on the CPU. But x264 is already highly optimized for modern CPUs and it scales very well on multi-core processors. So apart from all the work it would take to implement, switching from the existing highly optimized CPU implementation of x264 to dedicated encoder hardware would give far less speed-up, probably not worth the extra money for the special hardware...
Trahald
10th June 2009, 18:52
No, but being a bit of a d*** about it certainly is.
Sheesh, what a forum this is.
The problem is, he attacked software you have nothing to do with, then you personally attacked him. And you have the nerve to question the forum?
Shinigami-Sama
11th June 2009, 01:24
ITT: failure
a dev tells you that it's not worth it to switch
you point out a single specific instance where it helped
you're then told 90% of that instance is just hot air
hmm...
maybe doing a touch of research would have avoided all this?
Anyway, a while back Avail set out to port some of x264 to an FPGA card; they gave up after they found out they got a better boost from a bigger CPU and that porting wasn't worth the time/effort...
ajp_anton
11th June 2009, 04:43
Is there any hope for x264 + Larrabee?
aegisofrime
11th June 2009, 06:46
Recently I tried Cyberlink Mediashow Expresso. One of the highlights was using ATI Stream to accelerate H.264 encoding. For all intents and purposes, it is the same as encoding using CUDA or OpenCL, since it is encoding on the GPU after all.
And what a mess it is. Sure, it is faster. In fact, it's probably twice as fast. The thing is, I have very limited options with the encoder settings. I CANNOT change the bitrate (!!!), which is fixed at 6 Mbps (!!!!!! :eek:). And the funny thing is, even with such a high bitrate, the quality was a lot worse than the original video, which is probably about 1500 kbps in Xvid...
akupenguin
11th June 2009, 08:35
Is there any hope for x264 + Larrabee?
More hope than CUDA, since Larrabee is much closer to a conventional SIMD architecture. But it still involves rewriting a ton of asm and some architecture changes, and it's still not in progress.
Ajax_Undone
11th June 2009, 08:53
Now back to the OP...
Will we ever see these two technologies supported in x264? What would it take for that to happen?
Simple answer: No!!!
GPUs were, are, and should remain primarily for Graphics Processing in Games and 3D projects and Development.
~DarC
Chengbin
11th June 2009, 13:20
Recently I tried Cyberlink Mediashow Expresso. One of the highlights was using ATI Stream to accelerate H.264 encoding. For all intents and purposes, it is the same as encoding using CUDA or OpenCL, since it is encoding on the GPU after all.
And what a mess it is. Sure, it is faster. In fact, it's probably twice as fast. The thing is, I have very limited options with the encoder settings. I CANNOT change the bitrate (!!!), which is fixed at 6 Mbps (!!!!!! :eek:). And the funny thing is, even with such a high bitrate, the quality was a lot worse than the original video, which is probably about 1500 kbps in Xvid...
Well, you can't get better quality video than the original.
The ATI GPU encoding really really sucks. Badaboom beat it by a mile, and badaboom really sucks.
aegisofrime
11th June 2009, 15:09
Well, you can't get better quality video than the original.
The ATI GPU encoding really really sucks. Badaboom beat it by a mile, and badaboom really sucks.
I wasn't expecting to get better quality; where did I say that in my post? I know that when transcoding some generation loss is expected. The thing is, the quality of the new video is probably 50% of the original, whereas if you were to use x264 you can expect at least 90% of the quality, if not reasonably transparent.
The point is that the transcoded video shouldn't look A LOT worse.
LoRd_MuldeR
11th June 2009, 15:31
I wasn't expecting to get better quality; where did I say that in my post? I know that when transcoding some generation loss is expected. The thing is, the quality of the new video is probably 50% of the original, whereas if you were to use x264 you can expect at least 90% of the quality, if not reasonably transparent.
The point is that the transcoded video shouldn't look A LOT worse.
Any encoder (even the crappiest one) should be able to deliver transparent quality if you only throw enough bitrate at it ;)
What separates a good encoder from a bad one is the ability to retain good quality at a reasonably low bitrate and do this at a reasonable speed!
So you can't judge the "quality" of an encoder without considering "bitrate" and "speed" at the same time...
Sharktooth
11th June 2009, 15:51
Is there any hope for x264 + Larrabee?
Larrabee will be slower than the GPUs available at the time it comes out of the Intel fabs for production.
It will be even slower than some of today's GPUs.
LoRd_MuldeR
11th June 2009, 16:03
Larrabee will be slower than the GPUs available at the time it comes out of the Intel fabs for production.
It will be even slower than some of today's GPUs.
Maybe slower in terms of "theoretical" floating-point processing power. But it's actually pretty tough to write programs that run efficiently on a GPU. So even if your GPU offers a lot of processing power, you may only be able to utilize a small fraction of that theoretical processing power. Larrabee will be much more similar to the well-known x86 architecture, so it will probably be far easier to actually utilize its processing power. As a result, an encoder may benefit more from Larrabee than from a high-end GPU, even if the GPU offers more processing power in theory...
Sharktooth
11th June 2009, 17:19
The problems with GPUs are massive multithreading (read as: not all algos can be massively parallelized) and bus bottlenecks... The same things happen with Larrabee, plus it will have lower performance (http://news.cnet.com/8301-13512_3-10024280-23.html) than the competition at the time it comes out (1 (http://news.softpedia.com/news/Larrabee-Said-to-Perform-as-Fast-as-the-GTX-285-113325.shtml) and 2 (http://www.tomshardware.com/news/intel-larrabee-nvidia-geforce,7944.html)).
aegisofrime
11th June 2009, 17:43
Any encoder (even the crappiest one) should be able to deliver transparent quality if you only throw enough bitrate at it ;)
What separates a good encoder from a bad one is the ability to retain good quality at a reasonably low bitrate and do this at a reasonable speed!
So you can't judge the "quality" of an encoder without considering "bitrate" and "speed" at the same time...
Well, Mediashow Expresso's profile defaulted to 6 Mbps without any way for me to change it. It's fast, but gives about 50% image quality with 4 times the bit rate of the original (1500 kbps).
Dark Shikari
11th June 2009, 17:55
Larrabee will be slower than the GPUs available at the time it comes out of the Intel fabs for production.
It will be even slower than some of today's GPUs.
32 cores with 512-bit SIMD is not exactly what I'd call "slow" unless it turns out that SIMD has godawful throughput or something.
Cyber-Mav
11th June 2009, 22:39
would MIMD be any better?
Sagekilla
13th June 2009, 00:39
To add on what Dark said, Larrabee is just a 32-core Pentium 3 (with new SSE instructions and 512-bit vector units).
It's probably more similar to the stream processors in nvidia's GPUs than ATI's (~256 shaders vs 800 -- You tell me, which one is probably simpler?).
Relatively speaking, it's probably slower than a midrange GPU (9800 GT for example) but for stuff like video encoding, it'd be much easier to utilize the full processing power than with a GPU.
I have a dinky 8600M GT in my laptop, and it's good enough to play most games at medium settings, so Larrabee should be at least as fast as my GPU (or so I'd hope).
If you wanna measure raw performance the way Nvidia and ATI do: Larrabee can do 32 cores x 32 FLOPs per clock x 1-2 GHz = 1-2 TFLOPS. For comparison, the GeForce GTX 275 can do ~1 TFLOPS. That's single precision for either; halve that for double precision on Larrabee.
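The peak figure is cores x SIMD lanes x flops per lane per clock x clock speed. The C sketch below checks the arithmetic, assuming the 32 cores stated in the same post, 16 single-precision lanes per 512-bit vector unit (8 in double precision), and a fused multiply-add counted as two flops, which is where 32 flops per clock per core comes from. The `utilization` input of `effective_gflops` is a hypothetical figure illustrating the point made above: what an encoder can actually use depends on how much of the hardware it keeps busy, not on the marketing peak. Function names are illustrative only.

```c
/* Marketing-style peak throughput in GFLOPS:
 * cores x SIMD lanes x flops per lane per clock x clock in GHz. */
double peak_gflops(int cores, int simd_lanes, int flops_per_lane, double ghz)
{
    return cores * simd_lanes * flops_per_lane * ghz;
}

/* Throughput an application actually gets: the peak scaled by the
 * fraction of the hardware it manages to keep busy (0.0 to 1.0). */
double effective_gflops(double peak, double utilization)
{
    return peak * utilization;
}
```

For example, `peak_gflops(32, 16, 2, 1.0)` gives 1024 GFLOPS and the 2 GHz case gives 2048, matching the 1-2 TFLOPS estimate; with 8 double-precision lanes the result halves. A chip with a 1000 GFLOPS peak that an encoder can only keep 25% busy delivers 250 effective GFLOPS.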
ajp_anton
13th June 2009, 01:07
Relatively speaking, it's probably slower than a midrange GPU (9800 GT for example) but for stuff like video encoding, it'd be much easier to utilize the full processing power than with a GPU.
From what I've heard it's supposed to be like a GTX 285.
iwod
13th June 2009, 04:13
Why doesn't anyone make a 32-core ARM Cortex-A9? Surely the ARM architecture is very well known too?
Dark Shikari
13th June 2009, 04:17
Why doesn't anyone make a 32-core ARM Cortex-A9? Surely the ARM architecture is very well known too?
Because scaling up an architecture is inherently hard (caching issues, memory access issues, etc.).
Also, they need to improve other aspects of the A9 first; the SIMD was reduced from dual to single issue in the A9 because they ran out of transistors. :rolleyes:
raceviper13
1st September 2009, 22:59
I'm not a programmer, but I think this new SDK by AMD should make this at least feasible. Any thoughts? See this news article: Chip Wars: NVIDIA and ATI spar over OpenCL GPU support (http://www.tgdaily.com/content/view/43635/135/)
LoRd_MuldeR
1st September 2009, 23:17
I'm not a programmer, but I think this new SDK by AMD should make this at least feasible. Any thoughts? See this news article: Chip Wars: NVIDIA and ATI spar over OpenCL GPU support (http://www.tgdaily.com/content/view/43635/135/)
First of all OpenCL is not "by AMD", but a platform for GPGPU (that is: use the GPU as a co-processor for the CPU) that will be supported by ALL major GPU manufacturers, including NVidia and ATI.
It's going to replace the manufacturer-specific GPGPU platforms, such as NVidia's CUDA and AMD/ATI's Stream. But it's nothing new at all. Just another name/interface for the same thing.
Secondly GPGPU is far less suitable for video encoding than all the GPU manufacturers claim. Or why do you think there still is not one single GPU encoder on the market that can compete with x264?
All the "CUDA" encoders that are available on the market for quite some time now produce horrible quality. AFAIK GPU support for x264 was planned as a GSoC project (http://wiki.videolan.org/SoC_x264_2009#GPU_Motion_Estimation), but never really started.
Atak_Snajpera
2nd September 2009, 00:23
I think we will have to wait for Intel's GPU based on the x86 architecture.
lucassp
2nd September 2009, 08:17
We'll have to wait for that a little longer. Intel has been talking about Larrabee for years now, but no actual silicon or SDK has been seen yet.
7ekno
7th September 2009, 14:50
Pegasys Inc's test results on TMPGEnc appear to claim encoding times are hugely shortened using SpursEngine, and even more so when also using CUDA.
Try their software: MOST of that speed-up is due to the original DECODE of AVC, VC1 and, to a lesser extent, MPEG2 being offloaded to the GPU...
You can achieve a similar speedup outcome using x264 + a GPU supported decoder (like CoreAVC, NVTools DGSource, etc) ... by offloading decode to GPU you free up more CPU cycles for encoding, and it can make a significant difference when dealing with 1080p AVC sources ...
7ek
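This argument is essentially Amdahl's law: only the share of the original run time that moves to other hardware can shrink. A minimal C sketch follows; `p` is the fraction of run time being accelerated and `s` is how much faster that part becomes. The decode share used in the note below is a hypothetical figure, not a measurement, and the sketch assumes the offloaded decode overlaps fully with encoding without becoming the new bottleneck.

```c
/* Amdahl's law: if a fraction p (0..1) of the original run time becomes
 * s times faster, the whole job speeds up by 1 / ((1 - p) + p / s). */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

If decoding a 1080p AVC source hypothetically took 30% of the CPU time, moving it entirely off the CPU (`s` passed as `INFINITY` from `<math.h>`) gives at most `1 / 0.7`, about 1.43x. A large advertised speed-up can therefore say more about how expensive the old pipeline's decode and encode were than about the accelerator itself.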
Kurtnoise
10th September 2009, 09:36
I'm wondering how profile and level are determined during first-pass/multipass encodes?
1st pass:
x264 --preset veryslow --tune animation --pass 1 --bitrate 1253 --stats "D:\BBB2_01.stats" --thread-input --output NUL "F:\BBB2.avs"
avis [info]: 640x352 @ 29.98 fps (2001 frames)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
x264 [info]: profile Main, level 3.0
x264 [info]: frame I:24 Avg QP:12.37 size: 62196
x264 [info]: frame P:778 Avg QP:15.11 size: 8915
x264 [info]: frame B:1199 Avg QP:19.60 size: 1804
x264 [info]: consecutive B-frames: 5.7% 34.1% 33.8% 11.9% 4.6% 5.5% 1.1% 2.0% 1.4%
x264 [info]: mb I I16..4: 13.4% 0.0% 86.6%
x264 [info]: mb P I16..4: 5.3% 0.0% 0.0% P16..4: 55.3% 0.0% 0.0% 0.0% 0.0% skip:39.4%
x264 [info]: mb B I16..4: 0.8% 0.0% 0.0% B16..8: 11.2% 0.0% 0.0% direct: 5.1% skip:83.0% L0:27.1% L1:37.6% BI:35.3%
x264 [info]: final ratefactor: 15.55
x264 [info]: direct mvs spatial:97.4% temporal:2.6%
x264 [info]: coded y,uvDC,uvAC intra:83.0% 91.5% 70.3% inter:15.3% 19.0% 4.2%
x264 [info]: kb/s:1269.4
encoded 2001 frames, 39.26 fps, 1269.52 kb/s
2nd pass:
x264 --preset veryslow --tune animation --pass 2 --bitrate 1253 --stats "D:\BBB2_01.stats" --thread-input --output "D:\BBB2_01.mp4" "F:\BBB2.avs"
avis [info]: 640x352 @ 29.98 fps (2001 frames)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
x264 [info]: profile High, level 3.1
mp4 [info]: initial delay 524288 (scale 15716057)
x264 [info]: frame I:24 Avg QP:12.64 size: 68072
x264 [info]: frame P:778 Avg QP:17.09 size: 8352
x264 [info]: frame B:1199 Avg QP:21.64 size: 1962
x264 [info]: consecutive B-frames: 5.7% 34.1% 33.8% 11.9% 4.6% 5.5% 1.1% 2.0% 1.4%
x264 [info]: mb I I16..4: 5.7% 60.6% 33.7%
x264 [info]: mb P I16..4: 0.2% 3.0% 1.1% P16..4: 26.0% 7.8% 12.1% 1.6% 2.0% skip:46.2%
x264 [info]: mb B I16..4: 0.0% 0.7% 0.5% B16..8: 14.7% 1.6% 2.4% direct: 2.1% skip:78.0% L0:36.1% L1:44.9% BI:19.0%
x264 [info]: 8x8 transform intra:64.1% inter:38.7%
x264 [info]: direct mvs spatial:90.2% temporal:9.8%
x264 [info]: coded y,uvDC,uvAC intra:91.7% 95.4% 81.0% inter:13.5% 14.4% 6.2%
x264 [info]: ref P L0 72.8% 8.4% 4.9% 3.0% 2.7% 1.8% 1.2% 0.9% 0.8% 0.7% 0.6% 0.5% 0.4% 0.4% 0.5% 0.4%
x264 [info]: ref B L0 75.7% 7.9% 4.4% 2.6% 2.7% 1.7% 1.0% 0.8% 0.7% 0.7% 0.5% 0.3% 0.3% 0.2% 0.5%
x264 [info]: kb/s:1256.4
encoded 2001 frames, 16.84 fps, 1256.57 kb/s
I thought High Profile was the default?
my script:
ImageSource("F:\big buck bunny\big_buck_bunny_%05d.png", 3000, 5000, 29.976)
ConvertToYV12()
LanczosResize(640, 352)
Dark Shikari
10th September 2009, 09:42
Fast firstpass turns off 8x8dct...
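That explains the profile; the level change from 3.0 to 3.1 in the logs can be reproduced with a rough check against the H.264 level limits (Table A-1, DPB sizes converted to macroblocks). The C sketch below is not x264's code: the function and type names are hypothetical, the profile rule is simplified, levels below 2.1 and above 4.0 plus all bitrate/VBV limits are left out, and it assumes the decoded picture buffer must hold `refs` frames. It also assumes, as the missing "ref" statistics in the first-pass log suggest, that the fast first pass used one reference frame while the second pass used 16.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool cabac, bframes, transform_8x8; } coding_tools;

/* Simplified: the 8x8 transform needs High; CABAC or B-frames need at
 * least Main; otherwise Baseline suffices. Real encoders check more. */
const char *min_profile(coding_tools t)
{
    if (t.transform_8x8)
        return "High";
    if (t.cabac || t.bframes)
        return "Main";
    return "Baseline";
}

typedef struct {
    const char *name;
    long max_mbps;    /* macroblocks per second */
    long max_fs;      /* macroblocks per frame */
    long max_dpb_mbs; /* decoded picture buffer, in macroblocks */
} level_limits;

static const level_limits levels[] = {
    { "2.1",  19800,  792,  4752 },
    { "2.2",  20250, 1620,  8100 },
    { "3.0",  40500, 1620,  8100 },
    { "3.1", 108000, 3600, 18000 },
    { "3.2", 216000, 5120, 20480 },
    { "4.0", 245760, 8192, 32768 },
};

/* Lowest listed level whose frame size, macroblock rate and DPB limits
 * all hold; NULL if none of the listed levels fits. */
const char *min_level(int width, int height, double fps, int refs)
{
    long fs = (long)((width + 15) / 16) * ((height + 15) / 16);
    double mbps = fs * fps;
    long dpb = fs * refs;
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        const level_limits *l = &levels[i];
        if (fs <= l->max_fs && mbps <= l->max_mbps && dpb <= l->max_dpb_mbs)
            return l->name;
    }
    return NULL;
}
```

At 640x352 (40x22 = 880 macroblocks) and 29.98 fps, one reference frame fits Level 3.0. With 16 references, the DPB needs 14080 macroblocks, which exceeds Level 3.0's 8100 but fits Level 3.1's 18000, matching the "level 3.0" and "level 3.1" lines in the two logs.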
Kurtnoise
11th September 2009, 10:13
Right...forgot that.
Yet another question: is there a recent compiler speed comparison (gcc 3.4.x vs gcc 4.x.x vs icc) somewhere?
Conquerist
11th September 2009, 10:20
is there a recent compiler speed comparison (gcc 3.4.x vs gcc 4.x.x vs icc) somewhere?
I think this (http://forum.doom9.org/showthread.php?p=1317009#post1317009) is exactly what you're looking for. It's also got good info on the speed difference between 32-bit and 64-bit builds.
Kurtnoise
11th September 2009, 10:29
great...10x.
Dark Shikari
23rd October 2009, 21:15
Decoding of an H.264 stream with VDPAU (Video Decode and Presentation API for Unix) on an Nvidia GPU is of perfect quality when compared to any software decoder. Due to the noticeable quality difference, we can think of the GPU as a high-quality video processor, since it does not rely on any lossy "optimizations" in order to work in real time. Of course, decoding is not the same as encoding, but it also needs CABAC, ME, and all of this was already implemented (successfully) on a GPU.
Don't post about things you know nothing about.
All H.264 decoders are required to give identical output. Additionally, ME is not part of the decoding process. Finally, no GPU in existence performs video decoding; the process is done on a dedicated ASIC separate from the main processor.
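To illustrate why motion estimation is not part of decoding: the encoder searches for motion vectors, while the decoder only reads the vectors it was given from the bitstream and copies the referenced block (motion compensation), then adds the decoded residual. The C sketch below is a simplification with hypothetical names: it handles integer-pel vectors only, whereas H.264 uses quarter-pel luma vectors with sub-pixel interpolation, and it assumes the referenced block lies inside the frame.

```c
#include <stdint.h>

/* Decoder-side inter prediction, integer-pel only: (mvx, mvy) comes from
 * the bitstream, so there is no search. Copy the displaced reference
 * block, add the residual, clip to 8 bits. `residual` is bw x bh,
 * tightly packed; `dst` and `ref` share `stride`. */
void motion_compensate(uint8_t *dst, const uint8_t *ref,
                       const int16_t *residual, int stride,
                       int bw, int bh, int mvx, int mvy)
{
    const uint8_t *src = ref + mvy * stride + mvx;
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++) {
            int v = src[y * stride + x] + residual[y * bw + x];
            dst[y * stride + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
}
```

The work per block is fixed and small, which is one reason decoding can be handled by a compact dedicated block of hardware, while the expensive, open-ended search lives only in the encoder.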
qduaty
23rd October 2009, 21:35
I deleted the post when trying to edit it, sorry. But the most interesting part still exists.
Don't post about things you know nothing about.
All H.264 decoders are required to give identical output. Additionally, ME is not part of the decoding process. Finally, no GPU in existence performs video decoding; the process is done on a dedicated ASIC separate from the main processor.
Perhaps conformant H.264 decoders must give identical output, but since their outputs differ in quality, it seems some are not conformant. Regarding the ASIC, Nvidia claimed there is such a part of their GPU that does the video decode in DirectX, but their developers just say that VDPAU will not be released in a form that exposes internal functions to the user - which means there are such functions in VDPAU (and it is not just an interface to some sort of a specialized device) and the decoding is implemented programmatically.
Dark Shikari
23rd October 2009, 21:38
Perhaps conformant H.264 decoders must give identical output, but since their outputs differ in quality
No they don't.
Regarding the ASIC, Nvidia claimed there is such a part of their GPU that does the video decode in DirectX, but their developers just say that VDPAU will not be released in a form that exposes internal functions to the user - which means there are such functions in VDPAU (and it is not just an interface to some sort of a specialized device) and the decoding is implemented programmatically.
Or it just means they're not going to expose the proprietary features of their graphics cards.
qduaty
23rd October 2009, 21:53
No they don't.
I know about the loop filter that can be disabled in ffh264 to make it decode faster. Maybe what I've seen was decoding with such tricks enabled. Anyway, Nvidia doesn't seem to do it, and its AVC output looks better.
Or it just means they're not going to expose the proprietary features of their graphics cards.
I just don't believe they devote so many transistors to a function that cannot be used in a game. And since they have a fast DCT implementation freely available as a test for CUDA in their SDK, any other function that is independent of time and reasonably data-parallel (i.e. not an IIR filter) can be implemented this way.
Also, even if I don't distinguish between ME in an encoder and whatever function takes several reference frames to reproduce images in a decoder, these algorithms seem to be related. I may be wrong, but the authors of all those $500 CUDA encoders certainly aren't; they already have ME working. It's crap, but it works.
Dark Shikari
23rd October 2009, 21:58
I know about the loop filter that can be disabled in ffh264 to make it decode faster. Maybe what I've seen was decoding with such tricks enabled. Anyway, Nvidia doesn't seem to do it, and its AVC output looks better.
No decoder I know of does that unless you explicitly enable it as an option.
I just don't believe they devote so many transistors to a function that cannot be used in a game. And since they have a fast DCT implementation freely available as a test for CUDA in their SDK, any other function that is independent of time and reasonably data-parallel (i.e. not an IIR filter) can be implemented this way.
Many transistors? The iPhone 3GS can decode Level 4.1 H.264. It is not a lot of transistors.
And video decoding is extremely non-data-parallel; it would be physically impossible to decode a 50 megabit CABAC stream in realtime on a GPU.
Lyris
24th October 2009, 01:59
Dark Shikari - can you elaborate on differences between decoders? I was under the impression that the output was bit-identical.
Astrophizz
24th October 2009, 02:09
The output is bit-identical, otherwise it's a bug in the decoder.
Lyris
24th October 2009, 03:19
Sorry, I totally skim-read - apologies!