Welcome to Doom9's Forum, THE place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
14th December 2009, 17:01 | #182 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
So the real question is: Will GPGPU-based (CUDA, Stream, OpenCL, etc.) H.264 encoders eventually beat CPU-only encoders performance-wise and quality-wise? Well, currently it doesn't look like this will happen soon. But this may change with upcoming GPU generations...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ |
||
14th December 2009, 17:12 | #184 | Link | |
Registered User
Join Date: Dec 2008
Posts: 415
|
Quote:
Quality-wise, what does the upcoming GPU generation have to do with it? It's just hardware; I could understand it if it were speed-wise. Please shed some light on this. |
|
14th December 2009, 17:29 | #185 | Link | |
Registered User
Join Date: Apr 2008
Posts: 1,181
|
Quote:
Quality is only determined by the algorithm that the encoder uses. This applies to both CUDA and x264. |
|
14th December 2009, 17:42 | #186 | Link | |
Registered User
Join Date: Dec 2008
Posts: 415
|
Quote:
|
|
14th December 2009, 19:24 | #188 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Not yet. But there is already dedicated decoder hardware on any modern graphics card. There are also encoding solutions available that ship with dedicated encoder hardware (a stick).
Therefore it's not completely absurd to think about adding dedicated encoder chips to future GPU generations... Quote:
Certain encoding algorithms that cannot be implemented (efficiently) on the current GPU generation may become implementable on future GPU generations. So we may indeed see improved GPGPU encoders on future GPUs! Especially since GPU development is currently rapid, while CPU development is slowing down. Of course, there is nothing automatic about this. We'll have to wait and see whether future GPU generations will be more suitable for video encoding than the current ones.
Last edited by LoRd_MuldeR; 14th December 2009 at 19:33. |
|
15th December 2009, 03:45 | #189 | Link | |
Registered User
Join Date: Dec 2008
Posts: 415
|
Quote:
Thanks for a very newbie-friendly explanation. I have always liked the way you treat newbies. Keep it up. |
|
16th December 2009, 04:37 | #190 | Link |
Registered User
Join Date: Oct 2008
Posts: 39
|
IMO, quality comparisons like this would be a lot more useful with SSIM (and possibly PSNR) measurements. My $.02, possibly wrong. It's a lot easier to encode the same video to equal bitrates and then compare with an objective metric than to post screenshots.
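For what it's worth, PSNR is simple enough to compute yourself. A minimal pure-Python sketch (toy 4-pixel "frames", not a real video pipeline; SSIM is considerably more involved):

```python
import math

def psnr(ref, test, max_val=255):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(ref) != len(test):
        raise ValueError("frames must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames: no distortion
    return 10 * math.log10(max_val ** 2 / mse)

# Toy example: the "encoded" frame deviates slightly from the reference.
reference = [52, 120, 200, 33]
encoded = [50, 121, 198, 33]
print(f"PSNR: {psnr(reference, encoded):.2f} dB")
```

Real tools (x264 itself has --psnr and --ssim switches) work per-plane and per-frame, but the principle is the same: encode to equal bitrates, then compare the numbers.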
|
16th December 2009, 09:19 | #191 | Link | |
Registered User
Join Date: Aug 2009
Posts: 26
|
Quote:
|
|
16th December 2009, 10:52 | #192 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
And we are not talking about four or eight threads here. We are talking about hundreds or, better yet, thousands of threads that need to run on the GPU! So if you want to leverage the theoretical processing power of a GPU, your problem must be highly parallelizable, and new algorithms are needed that scale to hundreds or thousands of threads. Therefore not every problem is suitable for the GPU. There are inherently sequential problems that don't fit on the GPU at all!

The GPU cores are many, but they are very limited. In particular, access to the (global) GPU memory is extremely slow, because it's not cached at all (except for texture memory). Thus we must try to "hide" slow memory accesses behind calculations, which means we need many more GPU threads than we have GPU cores.

Each group/block of GPU cores does have its own local "shared" memory that is fast, but that per-block shared memory is small. Way too small for many things! Also, we can't sync the shared memories of different blocks, so whenever threads from different blocks need to "communicate", it has to go through the slow "global" memory. Even organizing/synchronizing the threads within a block is a tough task, because "bad" memory access patterns can slow down your GPU program significantly!

Last but not least, the GPU cannot access the main/host memory at all. Hence the CPU program needs to upload all input data to the graphics device first and later download all the results. That "host <-> device" data transfer is a serious bottleneck and means that you cannot run "small" functions on the GPU, even if they are a lot faster there. What good is it to complete a calculation in 1 ms instead of 10 ms if it takes 20 ms to upload/download the data to/from the graphics device? It's completely useless! So if we move parts of our program to the GPU, they must be significant parts with enough "work" to justify the communication delay.
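The 1 ms / 10 ms / 20 ms arithmetic above can be written down directly. A small sketch with made-up timings (all figures illustrative) showing when offloading pays off:

```python
def effective_speedup(cpu_ms, gpu_ms, transfer_ms):
    """Speedup of offloading a step to the GPU, once host<->device copies are counted."""
    return cpu_ms / (gpu_ms + transfer_ms)

# The numbers from the post: 1 ms on the GPU instead of 10 ms on the CPU,
# but 20 ms to move the data across the bus -- a net slowdown.
print(effective_speedup(10, 1, 20))

# A ten times larger piece of work amortizes the same transfer cost.
print(effective_speedup(100, 10, 20))
```

The first call comes out below 1.0 (offloading makes things slower); the second comes out above 3x, which is exactly the "enough work to justify the communication delay" point.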
It's not trivial to find such parts in your software. Remember: those parts must also be highly parallelizable, and efficient parallel algorithms for the individual problem must exist (or must be developed). See also: http://developer.download.nvidia.com..._Guide_2.0.pdf
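How much of the program must be parallelizable for the GPU to pay off can be estimated with Amdahl's law (the 95% figure below is illustrative, not a measurement of any real encoder):

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's law: overall speedup when only a fraction of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)

# Even with thousands of GPU threads, a 5% serial portion caps the speedup below 20x.
for threads in (8, 240, 1000):
    print(f"{threads:>5} threads -> {amdahl_speedup(0.95, threads):.1f}x")
```

This is why the inherently sequential parts matter so much: the thread count stops helping long before you reach "thousands".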
Last edited by LoRd_MuldeR; 16th December 2009 at 13:43. |
|
16th December 2009, 14:53 | #195 | Link |
Registered User
Join Date: Oct 2005
Location: .DE
Posts: 15
|
I wonder if the next-generation CPU/GPU combo chips like Llano/Fusion would make any difference. The latencies should be significantly lower, although they are still connected over PCIe. What do you think, would such an APU be useful for x264?
|
16th December 2009, 15:21 | #196 | Link |
Registered User
Join Date: Dec 2008
Posts: 415
|
IMHO, an integrated GPU (be it just on the same package or on the same silicon) will be nowhere near the power of a high-end discrete GPU. Please note that those kinds of CPU-with-GPU chips are targeted at notebooks and other lightly demanding graphics usage, like office PCs.
|
16th December 2009, 15:30 | #197 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
However, with NVidia's new "Fermi" GPU generation there will be significant improvements to the per-block "shared" GPU memory: it's now much larger, and it can (optionally) be used to cache accesses to the global GPU memory. This may (or may not) help significantly for specific problems. This is also one example of what I said before: future GPU generations may be more suitable for implementing video compression algorithms than the current one. In the case of Fermi, I cannot tell you whether it helps video encoding or not. The codec gurus need to decide ^^
Last edited by LoRd_MuldeR; 16th December 2009 at 15:48. |
|
16th December 2009, 16:14 | #198 | Link | ||
Registered User
Join Date: Oct 2005
Location: .DE
Posts: 15
|
Quote:
Quote:
That is clear. The last rumours I heard speak of 240 shader units for AMD/ATI's first-generation Fusion APU. That is far from high-end, but its computing power is still higher than any available CPU's. |
||
16th December 2009, 16:28 | #199 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
Also, we are talking about Intel CPUs here, the upcoming "Arrandale" to be precise. So far Intel doesn't offer any GPGPU API for their GPUs. Until Intel does (probably by making their GPUs accessible through OpenCL), we cannot use those combined CPU/GPU chips for anything but graphics output and video decoding! And if you look at the OpenCL API, it is defined similarly to the CUDA API. In particular, there is "host" memory that OpenCL kernels explicitly cannot access, and there is the "global" (device) memory, which all OpenCL kernels can access. Quote:
Last edited by LoRd_MuldeR; 16th December 2009 at 16:58. |
||
16th December 2009, 16:39 | #200 | Link | |
Registered User
Join Date: Aug 2006
Location: Stockholm/Helsinki
Posts: 805
|
Quote:
|
|
|
|