Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
25th October 2017, 08:47 | #1 | Link |
Registered User
Join Date: Sep 2016
Posts: 16
|
Why doesn't x265 use the GPU in encoding?
I know this isn't a new topic. But would it be conceivable for x265 to be supported by the immense computing power of modern graphics cards? I'm not a programmer and certainly not the first to come up with this idea, but I am interested to know why this avenue hasn't been tried.
|
25th October 2017, 11:35 | #2 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
Because the algorithms used in x265 are not suitable for OpenCL.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
25th October 2017, 19:40 | #4 | Link |
Registered User
Join Date: Dec 2008
Posts: 589
|
I'm wondering if it would be too hard to reserve 2-3 GB of video card memory, fill it up with video frames, and then do colorspace conversions, motion estimation and the like on multiple frames simultaneously (e.g. instead of splitting one frame over hundreds of "mini-cores", preload the card with 100 frames and have a few cores working on each frame simultaneously... unless the random memory reads would kill the performance).
Unless I'm wrong, x265 does a lot of "early skips", and the defaults aren't as exhaustive or high quality as they could be (in order to increase encoding speed)... provided the user is willing to buffer loads of frames and has enough memory for that, wouldn't using video cards be beneficial? |
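For scale, here is a back-of-the-envelope sketch (my own arithmetic, not from any encoder): buffering 100 uncompressed 4K frames does land in roughly that 1-3 GB range, depending on bit depth.

```python
# Memory needed to buffer N uncompressed 4K frames on the card.
# YUV 4:2:0 stores 1.5 samples per pixel; 10-bit needs 2 bytes per sample.

def frame_bytes(width, height, bits=8):
    bytes_per_sample = 1 if bits <= 8 else 2
    return int(width * height * 1.5 * bytes_per_sample)

GIB = 1024 ** 3
for bits in (8, 10):
    total = 100 * frame_bytes(3840, 2160, bits)
    print(f"100 frames of 4K {bits}-bit 4:2:0: {total / GIB:.2f} GiB")
```

So an 8-bit buffer is just over 1 GiB and a 10-bit one just over 2 GiB, before counting any scratch buffers the filters themselves would need.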
25th October 2017, 22:23 | #5 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
|
Quote:
The latest Intel CPUs are really incredible devices for encoding HEVC.
If anything, codecs are becoming less well suited for GPU versus CPU as they become more complex. MPEG-2 on GPU was pretty trivial, H.264 could fall short in quality at lower bitrates, and HEVC on GPU simply hasn't ever demonstrated high quality with high compression efficiency. I don't think GPUs are going to be viable in the foreseeable future. FPGA seems like the most viable alternative to CPU. |
|
25th October 2017, 23:24 | #6 | Link |
Registered User
Join Date: May 2009
Posts: 328
|
There is a fork that does exactly this: https://bitbucket.org/vovagubin/x265...r-cuda-encoder but people here who have compiled and tried it say it is no faster than using a CPU. I personally have no clue how to do such "magic", so I can't comment on that. I'm a 100% hardware person; asking me to compile something, you may as well be telling me to do rocket surgery.
|
25th October 2017, 23:31 | #7 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
|
x264 got some OpenCL features, but from what I remember they barely help speed at all, and a more complex codec like HEVC will likely benefit even less. So overall, what benwaggoner said: modern codecs are too complex for that. You can of course do any pre-filtering on the GPU, if required, but that's really outside of the actual encoding process anyway.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
26th October 2017, 05:31 | #8 | Link | |
Guest
Posts: n/a
|
Traditional "GPU computing" involves offloading tasks to the GPU. The GPU needs to complete these tasks faster than the CPU would, so that other tasks depending on the result don't have to wait. GPU cores (shaders, stream processors, EUs, or whatever you want to call them) are slower than CPU cores, but typically there are many times more of them. So single-threaded work doesn't accelerate well on a GPU. You need highly parallelizable functions, tasks that are sufficiently large (not small units of work), and tasks that don't have serial dependencies.

Video encoding involves LOTS of serial dependencies. The block of pixels I'm trying to encode now has to reference neighboring blocks above and to the left, and we are also searching for references in previously encoded frames. All of those blocks and frames must be finished encoding before we can start encoding the current block. The units of work involved in each task are relatively small. If I want to offload that work to a GPU sitting across a PCI bus, it takes 1 millisecond to send the work to the GPU, and 1 millisecond to get the result back (the latency of a PCI bus). 1 millisecond is an eternity... 2.5 million CPU clock cycles, typically.

When people talk about a "GPU encoder", they may really be talking about a fixed-function (hardware) encoder that is part of a graphics chip. But that's not software running on the GPU - that's a hardware encoder.

There are tricks we can use to work around some of these issues, but there are also many ways we can continue to accelerate x265, and we are looking at all of them. |
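The latency argument above can be put into numbers with a crude model (figures are illustrative, not measured from x265): the fixed round-trip cost sets a floor on how big an offloaded task must be before the GPU can possibly win.

```python
# Rough model: offloading only pays off when the CPU time saved
# exceeds the fixed transfer latency. Figures are illustrative.

PCI_LATENCY_S = 1e-3        # ~1 ms each way, as described above
CPU_CLOCK_HZ = 2.5e9        # a typical 2.5 GHz core

def cycles_lost_to_round_trip():
    """CPU cycles that elapse during one send plus one receive."""
    return 2 * PCI_LATENCY_S * CPU_CLOCK_HZ

def offload_worthwhile(cpu_task_s, gpu_task_s):
    """True if doing the task on the GPU (including the round trip)
    finishes sooner than just doing it on the CPU."""
    return gpu_task_s + 2 * PCI_LATENCY_S < cpu_task_s

print(f"{cycles_lost_to_round_trip():.0f} cycles per round trip")
# A 0.5 ms block-sized task is hopeless to offload, even on a free GPU:
print(offload_worthwhile(0.5e-3, 0.0))
# A 50 ms slice of work can win if the GPU does it in 10 ms:
print(offload_worthwhile(50e-3, 10e-3))
```

This is why per-block offload is a non-starter, while large batched tasks (whole-frame filtering, lookahead over many frames) are the only candidates worth considering.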
|
27th October 2017, 01:04 | #11 | Link | |
Registered User
Join Date: Apr 2015
Posts: 21
|
Do they use the PCI interface despite being on the same chip? If so, is the newly announced Ryzen+Vega APU, which uses Infinity Fabric to link CPU and GPU, going to improve things? Can this bring improvement to video encoding, or are the tasks still too small to benefit from the lower latency? |
|
27th October 2017, 09:42 | #13 | Link |
Registered User
Join Date: Dec 2008
Posts: 589
|
How about a scenario where, for example, the user wants to apply a filter like resize to some content...?
For example, the encoder could take in 4K footage and produce 1080p content. The encoder starts encoding on the CPU, busy at 5-10 fps, and while this happens it could read ahead a few hundred frames and "upload" 100 frames or so to the video card to have the resizing done there instead of on the processor. By the time the CPU-side encode reaches that offset, the video card should already be done and the resized frames should already be back down in regular system memory... then repeat the process with another batch of frames, or just upload frame by frame as frames are resized and transferred back into some "memory pool" / buffer.

I'm not sure there's any time saved, considering you have to "upload" a few GB to the card, wait for the card to process the frames, then "download" the frames... Would there be a noticeable quality difference between the resize algorithms done on the CPU and on the graphics card? Or maybe this should not happen in the encoder but rather in the frame server or renderer that passes the frames to the encoder... |
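The read-ahead idea can be sketched as a simple producer/consumer pipeline. This is a minimal illustration with a thread standing in for the GPU; all names here are hypothetical, and a real implementation would use CUDA/OpenCL uploads and downloads where `gpu_resize` is.

```python
import queue
import threading

def gpu_resize(frame):
    """Stand-in for an on-card resize; here it just tags the frame."""
    return f"resized({frame})"

def filter_worker(src, dst):
    # "Upload" frames ahead of the encoder, resize, "download" results.
    for frame in iter(src.get, None):
        dst.put(gpu_resize(frame))
    dst.put(None)  # signal end of stream

def encode(frames, lookahead=100):
    # dst is bounded so the filter never runs more than `lookahead`
    # frames ahead of the encoder (the buffer discussed above).
    src, dst = queue.Queue(), queue.Queue(maxsize=lookahead)
    threading.Thread(target=filter_worker, args=(src, dst), daemon=True).start()
    for f in frames:
        src.put(f)
    src.put(None)
    encoded = []
    for frame in iter(dst.get, None):
        encoded.append(f"encoded({frame})")  # CPU encode runs concurrently
    return encoded

print(encode(["f0", "f1", "f2"]))
```

The point of the bounded queue is exactly the overlap described in the post: the filtering stage works ahead while the encoder drains already-filtered frames, so transfer and filter time hide behind encode time instead of adding to it.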
27th October 2017, 09:47 | #14 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
|
We've already talked about pre-processing/pre-filtering frames a few posts above, which would include resizing. Those are things GPUs are good at. But it's not really the encoder's job; you can do that before the frame gets to the actual encoder just fine.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
27th October 2017, 19:11 | #16 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
|
Good old integer 8-bit bicubic scaling is pretty trivial now, certainly. |
|
27th October 2017, 21:48 | #17 | Link |
Guest
Posts: n/a
|
Yes, and we're working to develop such a hybrid, but right now it falls outside what we would push into x265 (as it involves a whole bunch of extra code to run the hardware encoder, get the analysis from it, and format that analysis so that x265 can use it as a starting point for a software encode). The trouble is that the quality of the analysis coming from hardware encoders is not great. That will improve over time.
|
30th October 2017, 08:18 | #18 | Link |
Registered User
Join Date: Apr 2002
Posts: 756
|
People will need to learn about Amdahl's law:
https://en.wikipedia.org/wiki/Amdahl%27s_law

To put it simply: speedup is limited by the total time needed for the sequential (serial) part of the program. For 10 hours of computing, if we can parallelize 9 hours and 1 hour cannot be parallelized, then our maximum speedup is limited to 10x.

And as explained, there isn't that much you can parallelize in a complex codec like HEVC. GPUs aren't great at complex linear processing; they aren't going to be, and possibly never will be. But they will continue to do more simple processing at the same time, which is why GPU performance scales linearly with more transistors (until it eventually hits a bottleneck elsewhere, like memory bandwidth).

If we're doing more parallelization in the encoder, I assume having more CPU cores will be better than doing it on the GPU. And you'll need to thank AMD for that. |
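The 10-hour example works out like this (standard Amdahl's law, nothing x265-specific):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Overall speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# 9 of 10 hours parallelizable -> parallel fraction 0.9.
for n in (4, 16, 1_000_000):
    print(f"{n:>9} workers: {amdahl_speedup(0.9, n):.2f}x")
# As n grows, speedup approaches 1 / 0.1 = 10x, exactly as stated above.
```

Note how quickly the returns diminish: even a million workers cannot beat 10x, because the 1-hour serial part always remains.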
4th November 2017, 03:23 | #19 | Link |
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
|
It still confuses me how SIMD on a CPU (AVX) is fine but GPU SIMD is bad; aren't they both intrinsically parallel in nature?
Would a general video encoding accelerator ever be possible, or would processes such as RDO and trellis be too specific to each codec to generalize for future-proofing? Somewhat like hardware-accelerated ray tracing, this seems to be an area with little commercial development going forward right now in the age of AI/inference accelerators. |
4th November 2017, 13:29 | #20 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
BTW, why does x265 not use OpenCL for the lookahead part like x264 does?
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
|