x264 OpenCL [Archive] - Doom9's Forum

swg

25th May 2010, 15:50

I was thinking of adding OpenCL support to x264. I’ve googled a bit and it seems that people would be interested, except that not many are willing to start on such a big project. I don’t know GCC compiler directives well enough to start converting it to OpenCL.
I could possibly work on the C code; however, it’s a bit confusing to go through. If someone were to add an OpenCL folder under the common folder, extend the configure script to allow compiling with OpenCL support, and put the C code of the important files in there (such as mc.c and encode.c; basically the C versions of all the assembly files), I could implement OpenCL from there.

Guest

25th May 2010, 16:44

What do you mean by "OpenCL support"?

swg

25th May 2010, 17:04

What do you mean by "OpenCL support"?

OpenCL hardware acceleration, essentially GPU acceleration.

swg

25th May 2010, 18:12

Basically, OpenCL would replace the assembly part. OpenCL allows massively parallel operations to run on the GPU, the CPU, or any other supported hardware, and the code would be optimized accordingly.

LoRd_MuldeR

25th May 2010, 18:44

Basically, OpenCL would replace the assembly part. OpenCL allows massively parallel operations to run on the GPU, the CPU, or any other supported hardware, and the code would be optimized accordingly.

It has been explained a dozen times why the naive idea that you only need to throw your existing code on the GPU to get a massive speed-up is more than wrong :rolleyes:

Summary: Writing code that actually runs fast on the GPU isn't trivial at all. Inventing new algorithms that are suitable for the GPU is even harder. Switching from the CPU to the GPU often requires finding completely new solutions for your problems - which needs a whole lot of work! And in some cases the problem is inherently sequential and thus will never run (efficiently) on the GPU. Last but not least, moving only small parts of a program to the GPU isn't reasonable (speed-wise), because transferring data between host memory and GPU memory has a huge delay...

(The fact that all the "GPU encoders" available on the market only reach fast encoding speeds by sacrificing quality shows that GPUs aren't that great for H.264 encoding)

:search:

swg

25th May 2010, 19:35

It has been explained a dozen times why the naive idea that you only need to throw your existing code on the GPU to get a massive speed-up is more than wrong :rolleyes:

Summary: Writing code that actually runs fast on the GPU isn't trivial at all. Inventing new algorithms that are suitable for the GPU is even harder. Switching from the CPU to the GPU often requires finding completely new solutions for your problems - which needs a whole lot of work! And in some cases the problem is inherently sequential and thus will never run (efficiently) on the GPU. Last but not least, moving only small parts of a program to the GPU isn't reasonable (speed-wise), because transferring data between host memory and GPU memory has a huge delay...

(The fact that all the "GPU encoders" available on the market only reach fast encoding speeds by sacrificing quality shows that GPUs aren't that great for H.264 encoding)

:search:

Yes, I'm aware of that; I was just offering my help in trying to achieve it. If you're not interested, that's fine, I'll just do it as a side project. Just don't bash. I just read these comments on http://x264dev.multimedia.cx/?p=332:
# Ashish Says:
March 25th, 2010 at 3:42 am

What about Motion Estimation using CUDA ? I see much activity 6 months ago. But, not much since then, is the project already complete ?

I have had past CUDA experience and would like to work on this, but I have minimal Video experience. I got some head start from Anton of Nvidia, and I am now exploring wiki pages regarding video compression. I hope I get selected.
# onitake Says:
March 26th, 2010 at 8:26 am

Ashish: gpu offloading would be much appreciated.
could you consider working with opencl instead of cuda though? it’s available in nvidia and amd drivers on both windows and linux now, and also on osx. cuda is constrained to nvidia hardware

LoRd_MuldeR

25th May 2010, 19:43

Yes I'm aware of that, I was just offering my help in trying to achieve it.

What you wrote just didn't sound like that ;)

If you're not interested, that's fine, I'll just do it as a side project.

I am interested. But be warned that it won't be easy. Various companies have tried to implement a GPU encoder for H.264, and so far all their results have been more than disappointing :rolleyes:

But if you really think you can do it better, then you should talk to the developers at irc://irc.freenode.net/x264dev

Just don't bash.

I only summarized the facts. If you think that is bashing, then you have a rather bizarre definition of bashing.

Don't ask, if you don't want to hear the answer...

What about Motion Estimation using CUDA ?

There already is a GSoC 2010 task scheduled for GPU Motion Estimation:
http://wiki.videolan.org/SoC_x264_2010#GPU_Motion_Estimation_2

Not sure whether that task has already been assigned to somebody. If not, you may want to volunteer...

swg

25th May 2010, 19:56

I only summarized the facts. If you think that is bashing, then you have a rather bizarre definition of bashing.

Don't ask, if you don't want to hear the answer...

There already is a GSoC project for that...
Sorry, I'm just currently being bashed on that channel for having suggested it. That statement was aimed at the people bashing the idea there. Basically, I believe it would help for high-definition pictures, while for low definition there would be little to no performance gain because of the latency between main memory and GPU memory.

Guest

25th May 2010, 19:59

Why wouldn't latency affect HD also?

Do you have any data to support your beliefs?

burfadel

25th May 2010, 20:44

Does motion estimation only work from one frame to the next, or can it be effective over several frames? Say you had a graphic at the bottom of the screen, and in the next frame it moves up. That, I know, is already handled by ME. If that graphic gets hidden and exposed again, I know that gets covered as well, as long as it's within a certain number of frames. What I am referring to is the case where the graphic gets covered and, when it is uncovered, it is now, say, to the left and up a bit. Wouldn't that require the motion estimation (say in UMH mode) to be calculated from the first frame to the second, the first to the third, the first to the fourth, and so on? Since the changes for each frame are then known, the only ME that would need to be applied between the second and third frame, the second and fourth frame, etc. would be the differences relative to the first and second frame, since the other information is already known. Over time, for that group of frames, you have a large known set of motion estimations, making each successive frame quicker and easier to process.

That's probably already done? And if not, is it a stupid idea or actually possible? I'm guessing it would be very CPU intensive if it isn't already done, and it's a possible scenario where GPU assistance could be handy? Good for animation especially?
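
As a rough illustration of the scenario above (a graphic that is hidden in the previous frame and reappears at a shifted position), here is a minimal sketch of block matching against more than one reference frame. It is not x264's code: it uses a plain exhaustive SAD search instead of UMH, and the frame layout, block size, and all function names are assumptions made up for the example.

```python
# Toy multi-reference block matching. Frames are lists of rows of luma values.
# Full search with SAD is swapped in for x264's real search modes (e.g. UMH).

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total

def full_search(cur, ref, cx, cy, size, search_range):
    """Return (motion_vector, cost) for the block at (cx, cy) in cur."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

def search_references(cur, refs, cx, cy, size, search_range):
    """Search every reference frame; return (ref_index, motion_vector, cost)."""
    best = None
    for idx, ref in enumerate(refs):
        mv, cost = full_search(cur, ref, cx, cy, size, search_range)
        if best is None or cost < best[2]:
            best = (idx, mv, cost)
    return best

def make_frame(width, height, obj=None):
    """Black frame with an optional textured 4x4 graphic at obj = (x, y)."""
    frame = [[0] * width for _ in range(height)]
    if obj is not None:
        ox, oy = obj
        for y in range(4):
            for x in range(4):
                frame[oy + y][ox + x] = 200 + 10 * y + x
    return frame

older = make_frame(16, 16, obj=(4, 8))    # graphic visible
previous = make_frame(16, 16)             # graphic covered
current = make_frame(16, 16, obj=(6, 5))  # graphic back, moved right 2 and up 3

ref_index, mv, cost = search_references(current, [older, previous], 6, 5, 4, 4)
print(ref_index, mv, cost)
```

Here the covered frame gives no good match, so the search falls back to the older frame, where the block is found displaced by (-2, 3) with zero residual. Each reference frame is searched independently in this sketch, which is the kind of work that could in principle run in parallel.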

swg

25th May 2010, 20:49

1. Latency != bandwidth. The latency I'm concerned about is not only the hardware latency but also the task scheduler latency in starting the job. Even for a fully uncompressed HD image at 1920 x 1080 (1 byte per band per pixel, 3 bands per pixel, so about 6.2 MB), the transfer from main memory to GPU memory takes less than 1/1000 of a second on a PCIe 2.x x16 bus at 8 GB/s. The difference in transfer overhead between HD and low def is minimal, because the constant latency is still there for HD, while the payoff would be greater.
2. I don't have any data to back up my beliefs.
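
The arithmetic in point 1 can be checked with a minimal sketch that models a transfer as a fixed start-up latency plus bytes divided by bandwidth. The 8 GB/s figure and the HD frame layout come from the post; the 720 x 480 "low def" size, the zero default latency, and the function names are assumptions added for illustration.

```python
# Back-of-the-envelope model: time = fixed latency + bytes / bandwidth.
# The SD resolution and the latency default are illustrative assumptions.

def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame (1 byte per band, 3 bands per pixel)."""
    return width * height * bytes_per_pixel

def transfer_seconds(num_bytes, bandwidth_bytes_per_s=8e9, latency_s=0.0):
    """Fixed start-up latency plus the time to move the bytes over the bus."""
    return latency_s + num_bytes / bandwidth_bytes_per_s

hd = frame_bytes(1920, 1080)  # 6,220,800 bytes
sd = frame_bytes(720, 480)    # 1,036,800 bytes (assumed low-def size)

hd_time = transfer_seconds(hd)
sd_time = transfer_seconds(sd)

print(f"HD frame: {hd} bytes, {hd_time * 1e3:.3f} ms")
print(f"SD frame: {sd} bytes, {sd_time * 1e3:.3f} ms")
```

With these numbers the HD frame takes roughly 0.78 ms at the theoretical bus rate, which agrees with the "less than 1/1000 of a second" estimate. Passing a non-zero `latency_s` adds the same constant to both cases, so it weighs relatively more on the smaller frame.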

From the type of questions you two are asking me and the responses I'm getting on IRC, I get the impression that the developers are not too interested in this. I am interested in the Motion Estimation task at http://wiki.videolan.org/SoC_x264_2010#Projects; however, I am going to take more time to fully explore the x264 code before committing myself to it. I've only been studying it for 2 weeks.

Dark Shikari

25th May 2010, 21:20

Reports from a tester in #x264dev are that the device latency is more like 1 microsecond.

swg

25th May 2010, 22:07

Interesting, I would have thought it would be higher. Oh well, I'll look at the motion estimation problem as well as continue to look at the rest of the code.

MfA

26th May 2010, 17:17

Whatever happened to me-prepass? A true-motion motion estimation pre-pass, with the main code only taking a pick between true motion and predicted MV and then doing local RD optimization, could be easily decoupled from the main codec.
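
One possible reading of that decoupling, as a minimal sketch only: the main encoder receives a motion vector from the pre-pass, compares it with its own predicted MV, starts from the cheaper of the two, and refines locally. The cost function here is a toy stand-in for a real rate-distortion cost, and `pick_and_refine`, `toy_cost`, and the refinement radius are all hypothetical names and choices, not anything taken from x264.

```python
# Sketch: choose between a pre-pass MV and a predicted MV, then refine locally.
# "cost" is any callable mapping a motion vector (dx, dy) to a number; a real
# encoder would use a rate-distortion cost, which is not modelled here.

def pick_and_refine(cost, prepass_mv, predicted_mv, radius=1):
    """Start from the cheaper candidate, then test its immediate neighbours."""
    start = min((prepass_mv, predicted_mv), key=cost)
    best_mv, best_cost = start, cost(start)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (start[0] + dx, start[1] + dy)
            c = cost(mv)
            if c < best_cost:
                best_mv, best_cost = mv, c
    return best_mv, best_cost

def toy_cost(mv):
    """Hypothetical cost surface with its minimum at (5, -2)."""
    return abs(mv[0] - 5) + abs(mv[1] + 2)

refined_mv, refined_cost = pick_and_refine(
    toy_cost, prepass_mv=(4, -2), predicted_mv=(0, 0)
)
print(refined_mv, refined_cost)
```

The point of the structure is that the expensive wide search lives entirely in the pre-pass, so the main loop only evaluates two candidates plus a small neighbourhood.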