Old 14th December 2009, 16:52   #181  |  Link
Firebird
Registered User
 
Join Date: Mar 2008
Posts: 61
Quote:
Is CUDA getting closer to x264
No. It will never be as good as x264 is.
Firebird is offline   Reply With Quote
Old 14th December 2009, 17:01   #182  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Quote:
Is CUDA getting closer to x264
No. It will never be as good as x264 is.
That statement doesn't make sense. CUDA is a platform technology, while x264 is one specific piece of software. So you are comparing apples and oranges.

So the real question is: Will GPGPU-based (CUDA, Stream, OpenCL, etc.) H.264 encoders eventually beat CPU-only encoders performance-wise and quality-wise?

Well, currently it doesn't look like this will happen soon. But this may change with upcoming GPU generations...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊
LoRd_MuldeR is offline   Reply With Quote
Old 14th December 2009, 17:04   #183  |  Link
Cyber-Mav
Registered User
 
Join Date: Dec 2005
Posts: 244
CUDA is getting closer now in quality.
Cyber-Mav is offline   Reply With Quote
Old 14th December 2009, 17:12   #184  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Quote:
Originally Posted by LoRd_MuldeR View Post
That statement doesn't make sense. CUDA is a platform technology, while x264 is one specific piece of software. So you are comparing apples and oranges.

So the real question is: Will GPGPU-based (CUDA, Stream, OpenCL, etc.) H.264 encoders eventually beat CPU-only encoders performance-wise and quality-wise?

Well, currently it doesn't look like this will happen soon. But this may change with upcoming GPU generations...
Thanks LoRd_MuldeR, that's what I meant.

Quality-wise, what does the upcoming GPU generation have to do with it? It's just hardware; I could understand it if it were speed-wise. Please shed some light on this.

nakTT is offline   Reply With Quote
Old 14th December 2009, 17:29   #185  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
Quote:
Originally Posted by nakTT View Post
Thanks LoRd_MuldeR, that's what I meant.

Quality-wise, what does the upcoming GPU generation have to do with it? It's just hardware; I could understand it if it were speed-wise. Please shed some light on this.

There is no video encoding chip on any GPU, so the encoding quality has nothing to do with the GPU generation or CPU model. You get the same quality from a P3 and an i7 with x264 if you use the same settings.

Quality is only determined by the algorithm that the encoder uses. This applies to both CUDA-based encoders and x264.
roozhou is offline   Reply With Quote
Old 14th December 2009, 17:42   #186  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Quote:
Originally Posted by roozhou View Post
There is no video encoding chip on any GPU, so the encoding quality has nothing to do with the GPU generation or CPU model. You get the same quality from a P3 and an i7 with x264 if you use the same settings.

Quality is only determined by the algorithm that the encoder uses. This applies to both CUDA-based encoders and x264.
That is the understanding I was trying to share in my previous post. Perhaps we are right, or perhaps wrong?
nakTT is offline   Reply With Quote
Old 14th December 2009, 18:16   #187  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Thanks for your thoughts, Stephen. It didn't occur to me until you pointed it out. Thanks again.
nakTT is offline   Reply With Quote
Old 14th December 2009, 19:24   #188  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by roozhou View Post
There is no video encoding chip on any GPU...
Not yet. But there is already dedicated decoder hardware on every modern graphics card. Also, there are encoding solutions available that ship with dedicated encoder hardware/sticks.

Therefore it's not completely absurd to think about adding dedicated encoder chips to future GPU generations...

Quote:
Originally Posted by roozhou View Post
...so the encoding quality has nothing to do with the GPU generation or CPU model.
Well, the capabilities of the first GPGPU-enabled GPU generation were pretty limited. Since then, the GPU manufacturers have added new GPGPU-specific capabilities with each generation.

Certain encoding algorithms that cannot be implemented (efficiently) on the current GPU generation may be implementable on future ones.

So we may indeed see improved GPGPU encoders on future GPU generations! Especially since the development of GPUs is currently rapid, while the development of CPUs is slowing down.

Of course there is no guarantee. We'll have to wait and see whether future GPU generations will be more suitable for video encoding than the current GPUs.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 14th December 2009 at 19:33.
LoRd_MuldeR is offline   Reply With Quote
Old 15th December 2009, 03:45   #189  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Quote:
Originally Posted by LoRd_MuldeR View Post
Well, the capabilities of the first GPGPU-enabled GPU generation were pretty limited. Since then, the GPU manufacturers have added new GPGPU-specific capabilities with each generation.

Certain encoding algorithms that cannot be implemented (efficiently) on the current GPU generation may be implementable on future ones.

So we may indeed see improved GPGPU encoders on future GPU generations! Especially since the development of GPUs is currently rapid, while the development of CPUs is slowing down.

Of course there is no guarantee. We'll have to wait and see whether future GPU generations will be more suitable for video encoding than the current GPUs.
So is my understanding correct that CPU encoding gives the programmer a lot more room to maneuver, as opposed to GPGPU?

Thanks for a very newbie-friendly explanation. I have always liked the way you treat newbies. Keep it up.


nakTT is offline   Reply With Quote
Old 16th December 2009, 04:37   #190  |  Link
kidjan
Registered User
 
kidjan's Avatar
 
Join Date: Oct 2008
Posts: 39
Quote:
Originally Posted by St Devious View Post
ok, doing that.

In the meantime here's some more images

CUDA 6 Mbps


x264 6Mbps


Source
IMO, quality comparisons like this would be a lot more useful with SSIM (and possibly PSNR) measurements. My $.02, possibly wrong. It's a lot easier to encode the same video to equal bitrates and then see how it fares with an objective measurement than to post screenshots.
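
For what it's worth, PSNR at least is trivial to compute yourself from the decoded frames - it's just 10*log10(255^2/MSE) - and x264 can report both metrics with its --psnr and --ssim switches. A rough sketch of the PSNR part (illustration only, not taken from x264 or any other tool):

Code:
// PSNR of a decoded 8-bit plane against the source plane of the same size.
// Rough sketch for illustration; SSIM is more involved.
#include <cmath>
#include <cstddef>

double psnr(const unsigned char* ref, const unsigned char* dec, size_t n)
{
    double sse = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)dec[i];
        sse += d * d;
    }
    double mse = sse / (double)n;
    if (mse == 0.0) return 99.0;   // identical planes; clamp instead of infinity
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}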
kidjan is offline   Reply With Quote
Old 16th December 2009, 09:19   #191  |  Link
Puncakes
Registered User
 
Join Date: Aug 2009
Posts: 26
Quote:
Originally Posted by kidjan View Post
IMO, quality comparisons like this would be a lot more useful with SSIM (and possibly PSNR) measurements. My $.02, possibly wrong. It's a lot easier to encode the same video to equal bitrates and then see how it fares with an objective measurement than to post screenshots.
I don't know about you, but considering the fact that those measurements are useless for comparing actual visual quality, I think I'd rather have screenshots.
Puncakes is offline   Reply With Quote
Old 16th December 2009, 10:52   #192  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by nakTT View Post
So is my understanding correct that CPU encoding gives the programmer a lot more room to maneuver, as opposed to GPGPU?
You must think of the GPU as a massively parallel processor. So GPGPU (CUDA, Stream, OpenCL, etc.) gives the programmer access to a massively parallel co-processor.

And we are not talking about four or eight threads here. We are talking about hundreds or, even better, thousands of threads that need to run on the GPU!

So if you want to leverage the theoretical processing power of a GPU, your problem must be highly parallelizable, and new algorithms are needed that scale to hundreds or thousands of threads.

Therefore not every problem is suitable for the GPU. There are inherently sequential problems that don't fit on the GPU at all!
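
Just to give an idea of the scale, here is a toy CUDA sketch (made up for illustration, not from x264 or any real CUDA encoder): one thread per 16x16 block position of a 1920x1088 frame computes a plain SAD against the co-located reference block - and that alone is already more than 8000 threads for a single frame.

Code:
// Toy sketch (illustration only): one GPU thread per 16x16 block position
// computes the SAD against the co-located block of a reference frame.
__global__ void sad16x16(const unsigned char* cur, const unsigned char* ref,
                         int width, int height, int stride, unsigned int* out)
{
    int bx = blockIdx.x * blockDim.x + threadIdx.x;   // block column
    int by = blockIdx.y * blockDim.y + threadIdx.y;   // block row
    if (bx * 16 >= width || by * 16 >= height) return;

    unsigned int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int d = (int)cur[(by * 16 + y) * stride + bx * 16 + x]
                  - (int)ref[(by * 16 + y) * stride + bx * 16 + x];
            sad += (d < 0) ? -d : d;
        }
    out[by * (width / 16) + bx] = sad;
}

// Launch for a 1920x1088 frame: 120x68 = 8160 block positions, i.e. 8160 threads.
//   dim3 threads(16, 16);
//   dim3 grid((120 + 15) / 16, (68 + 15) / 16);
//   sad16x16<<<grid, threads>>>(d_cur, d_ref, 1920, 1088, 1920, d_sad);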


The GPU cores are many, but each of them is very limited. In particular, access to the (global) GPU memory is extremely slow, because it's not cached at all (except for texture memory).

Thus we must try to "hide" the slow memory accesses behind calculations, which means that we need many more GPU threads than we have GPU cores.

Well, each group/block of GPU cores has its own local "shared" memory that is fast, but the size of that per-block shared memory is small. Way too small for many things!

Also we can't sync the shared memories of different blocks, so whenever threads from different blocks need to "communicate", this needs to be done through the slow "global" memory.

Even organizing/synchronizing the threads within a block is a tough task, because "bad" memory access patterns can slow down your GPU program significantly!
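
As a (simplified, made-up) illustration of the shared-memory idea: each block stages a tile of the image in the fast on-chip memory once, synchronizes its own threads, and then works from that tile instead of hitting the slow global memory over and over.

Code:
// Simplified sketch (launched with 16x16 threads per block): stage a tile
// once in shared memory, then filter from fast on-chip memory.
__global__ void blur3x3(const unsigned char* src, unsigned char* dst,
                        int width, int height)
{
    __shared__ unsigned char tile[16][16];            // fast, but per-block and tiny

    int gx = blockIdx.x * 16 + threadIdx.x;
    int gy = blockIdx.y * 16 + threadIdx.y;

    if (gx < width && gy < height)
        tile[threadIdx.y][threadIdx.x] = src[gy * width + gx];

    __syncthreads();   // barrier for the threads of THIS block only;
                       // there is no such barrier across different blocks

    // Only interior threads of the block filter, so every neighbour they read
    // is already in the tile (a real kernel would also load an "apron" border).
    if (threadIdx.x > 0 && threadIdx.x < 15 && threadIdx.y > 0 && threadIdx.y < 15
        && gx < width - 1 && gy < height - 1) {
        int sum = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                sum += tile[threadIdx.y + dy][threadIdx.x + dx];
        dst[gy * width + gx] = (unsigned char)(sum / 9);
    }
}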


Last but not least, the GPU cannot access the main/host memory at all. Hence the CPU program needs to upload all input data to the graphics device first and later download all the results.

That "host <-> device" data transfer is a serious bottleneck and means that you cannot run "small" functions on the GPU, even if they are a lot faster there.

What is it worth to complete a calculation in 1 ms instead of 10 ms if it takes 20 ms to upload/download the data to/from the graphics device? Yes, it's completely useless!

So if we move parts of our program to the GPU, these must be significant parts with enough "work" to justify the communication delay. It's not trivial to find such parts in your software.

Remember: those parts must also be highly parallelizable, and efficient parallel algorithms for the individual problem must exist (or must be developed).
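
To make the "upload / compute / download" round trip concrete, this is roughly what every GPGPU call looks like on the host side (CUDA runtime API, error checking stripped, kernel name made up):

Code:
#include <cstddef>
#include <cuda_runtime.h>

// Every piece of work sent to the GPU pays for these two copies across PCIe.
// If the kernel only saves a few milliseconds, the transfers can eat the gain.
void process_frame_on_gpu(const unsigned char* host_in, unsigned char* host_out,
                          size_t bytes)
{
    unsigned char *d_in = 0, *d_out = 0;
    cudaMalloc((void**)&d_in,  bytes);
    cudaMalloc((void**)&d_out, bytes);

    cudaMemcpy(d_in, host_in, bytes, cudaMemcpyHostToDevice);    // upload

    // some_kernel<<<grid, block>>>(d_in, d_out);                // do the actual work
    cudaThreadSynchronize();                                     // wait for completion

    cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);  // download

    cudaFree(d_in);
    cudaFree(d_out);
}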


See also:
http://developer.download.nvidia.com..._Guide_2.0.pdf
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 16th December 2009 at 13:43.
LoRd_MuldeR is offline   Reply With Quote
Old 16th December 2009, 10:58   #193  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by LoRd_MuldeR View Post
And we are not talking about four or eight threads here. We are talking about hundreds or, even better, thousands of threads that run on the GPU.
More like 20,000.
Dark Shikari is offline   Reply With Quote
Old 16th December 2009, 11:30   #194  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Thanks again, LoRd_MuldeR, for your informative post. I really enjoyed reading it.


nakTT is offline   Reply With Quote
Old 16th December 2009, 14:53   #195  |  Link
Limit
Registered User
 
Join Date: Oct 2005
Location: .DE
Posts: 15
I wonder if the next-generation CPU/GPU combo chips like Llano/Fusion would make any difference. The latencies should be significantly lower, although they are still connected over PCIe. What do you think, would such an APU be useful for x264?
Limit is offline   Reply With Quote
Old 16th December 2009, 15:21   #196  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 415
Quote:
Originally Posted by Limit View Post
I wonder if the next-generation CPU/GPU combo chips like Llano/Fusion would make any difference. The latencies should be significantly lower, although they are still connected over PCIe. What do you think, would such an APU be useful for x264?
IMHO an integrated GPU (be it just in the same package or on the same silicon) will be nowhere near the power of a high-end discrete GPU. Please note that those kinds of CPU-with-GPU chips are targeted at notebooks and other lightly demanding graphics usage, like office PCs.
nakTT is offline   Reply With Quote
Old 16th December 2009, 15:30   #197  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Limit View Post
I wonder if the next-generation CPU/GPU combo chips like Llano/Fusion would make any difference. The latencies should be significantly lower, although they are still connected over PCIe. What do you think, would such an APU be useful for x264?
Well, it may make the bottleneck less critical, but it certainly doesn't remove it, as the basic architecture is still the same. Unless they use more PCIe lanes for the internal interconnect than they used for the "external" PCIe bus, there won't be much difference. And even if there is a difference, the way we call GPU kernels/programs is still the same: Upload input data from the host to the device, invoke the GPU kernel, wait for completion (while maybe doing other things on the CPU) and finally download the results from the device back to the host. Also I doubt that the combined CPU/GPU chip packages will contain very powerful GPUs. It will be more like what we have as "on board" graphics chips now. Not anywhere near high-end GPUs.

However, with NVidia's new "Fermi" GPU generation there will be significant improvements to the per-block "shared" GPU memory: it's now much larger, and it can (optionally) be used to cache accesses to the global GPU memory. This may (or may not) significantly help with specific problems. This is also one example of what I said before: future GPU generations may be more suitable for implementing video compression algorithms than the current generation. In the case of Fermi, I cannot tell you whether it helps video encoding or not. The codec gurus need to decide ^^
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 16th December 2009 at 15:48.
LoRd_MuldeR is offline   Reply With Quote
Old 16th December 2009, 16:14   #198  |  Link
Limit
Registered User
 
Join Date: Oct 2005
Location: .DE
Posts: 15
Quote:
Originally Posted by LoRd_MuldeR View Post
Well, it may make the bottleneck less critical, but it certainly doesn't remove it, as the basic architecture is still the same. Unless they use more PCIe lanes for the internal interconnect than they used for the "external" PCIe bus, there won't be much difference.
PCIe is a point-to-point link, so it should be possible to run the GPU's link at a much higher clock rate. Standard PCIe runs at 100 MHz. With the CPU, GPU and PCIe controller on the same die, a much higher frequency for the GPU's PCIe link should be only a small problem. For example, if you get it running at 1 GHz, you increase the bandwidth and decrease the latency by a factor of 10.

Quote:
Originally Posted by LoRd_MuldeR View Post
And even if there is a difference, the way we call GPU kernels/programs is still the same: Upload input data from the host to the device, call the kernel, wait for completion (while maybe doing other things on the CPU) and finally download the results from the device back to the host.
AFAIK the integrated GPUs have no memory of their own besides the small caches. So there is no need to copy data from host memory to device memory, because it is the same memory.

Quote:
Originally Posted by LoRd_MuldeR View Post
Also I doubt that the combined CPU/GPU chip packages will contain very powerful GPUs. It will be more like what we have as "on board" graphics chips now. Not anywhere near high-end GPUs.
That is clear. The last rumours I heard speak of 240 shader units for AMD/ATI's first-generation Fusion APU. That is far from high-end, but its computing power is still higher than any available CPU's.
Limit is offline   Reply With Quote
Old 16th December 2009, 16:28   #199  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Limit View Post
AFAIK the integrated GPUs have no memory of their own besides the small caches. So there is no need to copy data from host memory to device memory, because it is the same memory.
Well, it then "shares" the RAM modules with the CPU - not to be confused with the on-chip shared memory. But this doesn't mean that the GPU can directly access the same memory locations that the CPU uses. We don't know it yet, but I would assume they simply "lock" a certain range of the physical main memory address space for the GPU. So we'd still have to copy the input data from the "regular" memory area (used by the CPU) over to some place in the memory area reserved for GPU - and the results need to copied back the same way.

Also, we are talking about Intel CPUs here, the upcoming "Arrandale" to be precise. So far Intel doesn't offer any GPGPU API for their GPUs. Until Intel does so (probably by making their GPUs accessible through OpenCL), we cannot use those combined CPU/GPU chips for anything but graphics output or video decoding at all! And if you look at the OpenCL API, it is defined similarly to the CUDA API. In particular, there is "host" memory that OpenCL kernels explicitly cannot access! And there's the "global" (device) memory, which all OpenCL kernels can access.

Quote:
Originally Posted by Limit View Post
PCIe is a point-to-point link, so it should be possible to run the GPU's link at a much higher clock rate. Standard PCIe runs at 100 MHz. With the CPU, GPU and PCIe controller on the same die, a much higher frequency for the GPU's PCIe link should be only a small problem. For example, if you get it running at 1 GHz, you increase the bandwidth and decrease the latency by a factor of 10.
That sounds like pure speculation. Unless there are some facts, I will assume that the "internal" PCIe-based interconnect will be roughly at the same level as "external" PCIe 2.0 is nowadays...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 16th December 2009 at 16:58.
LoRd_MuldeR is offline   Reply With Quote
Old 16th December 2009, 16:39   #200  |  Link
ajp_anton
Registered User
 
ajp_anton's Avatar
 
Join Date: Aug 2006
Location: Stockholm/Helsinki
Posts: 805
Quote:
Originally Posted by LoRd_MuldeR View Post
Also, we are talking about Intel CPUs here, the upcoming "Arrandale" to be precise. So far Intel doesn't offer any GPGPU API for their GPUs. Until Intel does so (probably by making their GPUs accessible through OpenCL), we cannot use those combined CPU/GPU chips for anything but graphics output or video decoding at all!
Not to mention that when we say "the integrated GPUs aren't powerful", we mean the low end of AMD and Nvidia, which is still far ahead of what Intel has to offer =)
ajp_anton is offline   Reply With Quote