Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Domains: forum.doom9.org / forum.doom9.net / forum.doom9.se
#1
Starting to learn
Join Date: Dec 2005
Location: Canada
Posts: 17
GPU Video Encoding - OpenCL
Hi,
I am an ITC student, and I have decided that my graduation project will be: "Utilizing GPU instead of CPU for video encoding". I am targeting the new OpenCL standard, using ATI's SDK, on my ATI HD 4670. My final goal is to do x.264 encodes on the GPU, either by building a new, simple program or by modifying an existing simple open-source program with ATI's OpenCL SDK. Though my coding experience is next to zero, it's something I really want to work on, and I know I'll enjoy it (except for the times when things don't work and I feel like committing suicide!).

I was already discussing my plan with Avery Lee on the VirtualDub forum (at first I was set on using VirtualDub), but he pointed out that I should use something simpler, as VirtualDub is too complicated given my programming background. I also think he suggested that optimizing a specific x.264 build to exploit massive parallelism matters more than the host program itself.

Can you suggest a specific simple program with clean code and good documentation to start with (i.e., to study the code and then modify it with OpenCL's SDK)? Or would you recommend writing a simple program from scratch? (In the latter case I would still need a suggestion for example code to study.)

Also, is x.264 already optimized enough to exploit the massive parallelism of GPU processors? Do I take it as it is, or do I have to optimize it further?

Finally, if you have any ideas, suggestions or hints, I would be happy with any assistance, no matter how small. I live in Saudi Arabia, and I don't know a single person in the whole country with even the slightest background in what I want to do or how it should be done. I am really on my own.

Thank you for your time. Please don't tell me how difficult this project will be for me; I already know. I want to learn, and my professor will credit me for hard work and trying, even if I fail to reach my goal in the end.

Last edited by Mk4ever; 11th November 2009 at 07:50.
#2
x264aholic
Join Date: Jul 2007
Location: New York
Posts: 1,752
The problem with doing video encoding on GPUs is that there are huge portions of non-trivial code that cannot run in parallel (CABAC, for example). Dark Shikari approached this previously, and one of the big issues was how difficult it was to accelerate the simple functions that run hundreds of times per frame, such as the SAD (sum of absolute differences) function.
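To see why adaptive entropy coding such as CABAC resists parallelization, here is a deliberately simplified sketch. It is not the real CABAC state machine: it assumes a single adaptive probability updated by an exponential moving average (the names `adaptive_cost`, `chunked_cost` and `ALPHA` are made up for illustration) and measures ideal code length instead of doing real arithmetic coding. The point is the loop-carried dependency: the probability used for each bin depends on every bin before it. Splitting the stream into chunks that each restart from a fresh model, roughly what happens at an H.264 slice boundary where CABAC contexts are re-initialized, makes the chunks independent but costs compression.

```python
import math

# Hypothetical adaptation rate; real CABAC uses a table-driven state machine instead.
ALPHA = 0.25


def adaptive_cost(bits, p_one=0.5):
    """Ideal code length (in bits) of `bits` under a probability model that adapts after every bin."""
    total = 0.0
    for b in bits:
        p = p_one if b else 1.0 - p_one
        total += -math.log2(p)        # the cost of this bin depends on all earlier bins...
        p_one += ALPHA * (b - p_one)  # ...because the model is updated serially, right here
    return total


def chunked_cost(bits, chunks):
    """Cost if the stream is split into `chunks` pieces coded independently (i.e. in parallel),
    each starting again from a fresh 50/50 model."""
    size = math.ceil(len(bits) / chunks)
    return sum(adaptive_cost(bits[i:i + size]) for i in range(0, len(bits), size))


if __name__ == "__main__":
    stream = [0] * 8
    print(f"serial:  {adaptive_cost(stream):.3f} bits")
    print(f"2 parts: {chunked_cost(stream, 2):.3f} bits")
```

On a highly predictable stream the chunked version comes out noticeably larger, because each chunk has to relearn the statistics from scratch.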
It's more or less a problem of optimizing a given function to use the maximum number of shaders on a GPU without leaving some of them idle. Anyone can write a SAD function, but writing one that fully uses the massive number of processing cores available is extremely tricky. My advice would be to try porting something small and optimizing it for speed as best you can. You can find Dark Shikari's earlier results with porting SAD with a simple Google search.
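For reference, here is what a SAD function and a brute-force motion search built on it look like. This is a plain-Python sketch, not x264's implementation, and the function names and tiny test frame are hypothetical. Each candidate offset's SAD is independent of the others, which is the part that maps naturally onto many GPU threads. The hard parts described above are keeping all of those threads busy and doing the final reduction to the best candidate efficiently.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between the size x size block of frame_a at (ax, ay)
    and the block of frame_b at (bx, by). Frames are lists of rows of 8-bit luma values."""
    total = 0
    for dy in range(size):
        row_a = frame_a[ay + dy]
        row_b = frame_b[by + dy]
        for dx in range(size):
            total += abs(row_a[ax + dx] - row_b[bx + dx])
    return total


def full_search(cur, ref, x, y, size, radius):
    """Exhaustive motion search for the block at (x, y) in `cur`, within +/-radius in `ref`.
    Every candidate SAD is independent, so on a GPU each could be one work-item followed
    by a parallel min-reduction; here they are simply evaluated one after another."""
    h, w = len(ref), len(ref[0])
    candidates = [
        (dx, dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if 0 <= x + dx <= w - size and 0 <= y + dy <= h - size  # stay inside the frame
    ]
    costs = [sad(cur, ref, x, y, x + dx, y + dy, size) for dx, dy in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return candidates[best], costs[best]


if __name__ == "__main__":
    # A 16x16 gradient "reference frame" and a current frame shifted by (2, 1).
    ref = [[(x * 7 + y * 13) % 256 for x in range(16)] for y in range(16)]
    cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
    print(full_search(cur, ref, 4, 4, 4, 3))  # finds motion vector (2, 1) with SAD 0
```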
#3
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
The problem with optimizing "something small" is that transferring data from the host (main memory) to the device (GPU memory) takes a relatively long time. If your calculation is only "small", this transfer becomes a serious bottleneck and kills your speed-up! Therefore the function you port to the GPU must not only scale extremely well to hundreds (if not thousands) of threads, its total running time must also be long enough to "hide" the transfer delay. There is no point in speeding up a calculation from 3 ms to 1 ms if it takes 5 ms to upload/download the input/output data to/from the device...
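The break-even argument can be written down directly. This is a minimal sketch that assumes the only GPU-side costs are the kernel time and the host/device copies. It ignores overlapping transfers with computation, which real code can sometimes exploit. The function names and the 3 GB/s bandwidth figure are placeholders, not measurements.

```python
def offload_speedup(cpu_ms, gpu_kernel_ms, transfer_ms):
    """Speed-up from offloading a step to the GPU; values below 1.0 mean it got slower."""
    return cpu_ms / (gpu_kernel_ms + transfer_ms)


def transfer_ms(num_bytes, bandwidth_gb_per_s):
    """Time to copy num_bytes at an assumed effective bus bandwidth (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


# The example from the post: 3 ms on the CPU, 1 ms on the GPU, 5 ms of copying.
print(offload_speedup(3.0, 1.0, 5.0))  # 0.5, i.e. twice as slow as staying on the CPU

# One 1920x1080 frame in 8-bit YUV 4:2:0 is 1920 * 1080 * 1.5 bytes.
frame_bytes = 1920 * 1080 * 3 // 2
# Hypothetical effective bandwidth of 3 GB/s; measure your own system instead.
print(f"{transfer_ms(frame_bytes, 3.0):.2f} ms to upload one frame")
```

The design point is that `gpu_kernel_ms` shrinking toward zero does not help once `transfer_ms` dominates the denominator.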
#4
Starting to learn
Join Date: Dec 2005
Location: Canada
Posts: 17
Thank you guys for your replies. Sorry for not replying sooner, but I got busy with exams.
If optimization is that difficult (whether in general or just for me because I'm a beginner), what do you think of implementing a simpler profile for encoding with x.264, one that does not support features that are hard to implement in parallel? A simpler profile would not use CABAC, for example. (I am definitely interested in implementing a full High Profile x.264 encoder with OpenCL at some point after my school project, but I know that for now I should admit my limited skills and start with something simple until I learn more.) Would that be easier? Would you recommend a specific profile that would make my project easier? Thank you for your time.
#5
x264 developer
Join Date: Sep 2005
Posts: 8,666
There is no magic way to get around the fact that practically every aspect of video encoding is hard to parallelize. Getting 15,000 threads out of x264 requires completely new algorithms.
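To put the 15,000-thread figure in perspective, here is a rough count. It assumes 16×16 macroblocks and a 1920×1080 frame, coded as 1920×1088 so the height is a multiple of 16. Even one thread per macroblock yields only about half that many threads, and that is before accounting for the dependencies between neighboring macroblocks. This suggests the parallelism has to come from finer-grained work or from different algorithms:

```latex
\frac{1920}{16} \times \frac{1088}{16} = 120 \times 68 = 8160 \ \text{macroblocks per frame} \; < \; 15\,000
```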
#6
Starting to learn
Join Date: Dec 2005
Location: Canada
Posts: 17
Unless I'm wrong, ATI has done it in their AVIVO package, using their GPUs to encode (or help encode) H.264, as has Nvidia with Badaboom, and both with CyberLink's Espresso. I know their encodes do not match the quality of encodes done the old-fashioned way, but they did it. There must be a way.

I am committed to doing my project, both for my graduation and to learn how to do it. If you know anything that might help, an idea, or a path I should take to solve this problem, please let me know. And Dark Shikari, thanks anyway for your post; I know you are just trying to tell me things I didn't know. I'd really appreciate it if you could point me to where I could learn more about which parts of x.264 are parallelizable and which are not, and whether there are ways around that. I know I have a LOT of reading to do. Thank you for your time. I appreciate your help.

Edit: Do you think contacting ATI for the source of their codecs would help?

Last edited by Mk4ever; 14th November 2009 at 08:41.
#7
x264 developer
Join Date: Sep 2005
Posts: 8,666
You can parallelize almost any task if you're willing to approximate all the data dependencies away and accept bad results. We've already planned out what algorithms we would use on a GPU, but every single person who has told us they would attempt to implement them (dozens!) has disappeared without a trace. We suspect it's because CUDA/OpenCL is like a tome of Eldritch lore: people go mad from attempting to use it. Also, there is no dot in "x264".
#8
Starting to learn
Join Date: Dec 2005
Location: Canada
Posts: 17
Thank you for your reply.
I know they did it badly, but even doing what they did is still a plus for me:
1- I finish my project and graduate.
2- I learn what's right and what's wrong. Right now I know NOTHING. At least when I start writing the code, I'll learn what wrong shortcuts ATI or Nvidia used to make the code easier to write, and when I'm done with the project, I'll know how to do it right later.
It still has a positive side the way I see it: a wrong attempt, but closer to the target, and a chance to learn from their implementation and yours at the same time.

As for the people who volunteered to write parts of the codec in OpenCL/CUDA/Stream, I am not volunteering at this stage. Even if I did, and I am serious about trying, you couldn't depend on me, as I am just starting to learn. Maybe after months of coding I might reach the level to volunteer, but definitely not now. I am simply and frankly asking you for guidance, from your experience, on the best path or what to do at a given stage.

As for seriousness, I've spent nearly 8 years of my life following advances in video encoding without really getting involved with the code and programming. I am smart, but a really lazy guy. I finally managed to force myself to learn and go deeper into this field by committing to this project.

As for the (no dot) in x264, I know. It was just a typo.

If you don't want to or can't help me, that's definitely up to you and no one can blame you, but I am trying to learn and discuss. The least you could do is be supportive. Thanks anyway.
#10
Registered User
Join Date: Jan 2002
Location: San Jose, CA
Posts: 216
It's inherently difficult to port serial algorithms to a massively parallel architecture, so an efficient GPU implementation would most likely require different algorithms from those currently used in x264 (thus not producing binary-identical output, but not necessarily worse output either).
#11
Starting to learn
Join Date: Dec 2005
Location: Canada
Posts: 17
Thank you guys for your posts
@ Dark Shikari, Thank you very much for your offer to help. I'll try to read some documentation about OpenCL and x264 first so I don't look completely stupid (yet it is still possible!). I'll try to catch you on IRC in the next few days. Thanks again guys. You really have no idea what it means to me to have any kind of help. Really appreciated. |
#12
Registered User
Join Date: Mar 2002
Posts: 1,075
Even RDO can be parallelized with a greedy iterative search. (Change MV/mode/coefficients/whatever, then encode the MB and the neighbors which use it for prediction, using the context from the previous frame so it can be parallelized.) It won't work for mode changes with large knock-on effects like dquants, but for the most part it should converge to something near optimal. A fuckton of work and experimentation though, even without the low-level GPU coding.
#13
x264 developer
Join Date: Sep 2005
Posts: 8,666