Old 2nd July 2008, 02:39   #1  |  Link
Chengbin
Registered User
 
Join Date: Oct 2007
Posts: 1,060
CUDA encoders for H.264, how's the quality?

I'm very excited about CUDA and its ability to encode videos to H.264 very quickly, but I'm wondering about quality. Do you think the quality would be worse, about the same, or (preferably) better? I don't care about speed (though that would be icing on the cake); quality comes absolutely first. Is there any information about the quality of the conversion? Could it be supported by GUIs such as AutoMKV (in a newer version edited to support CUDA, of course)? Most importantly, would it support my 8800GT?
Old 2nd July 2008, 03:32   #2  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
There is no publicly available CUDA-supporting encoder, if I recall correctly; there are two GPU-based solutions being promoted (though neither is released yet, I think). RapiHD is an intra-only solution, probably designed for high-speed, high-quality, high-bitrate recording of film footage and the like. Elemental is promoting its own (which has shown up on Anandtech and similar sites), but I'm somewhat underwhelmed by their results: speed-wise x264 can beat it, and I highly doubt they're using high-quality settings when promoting how fast they can go.

GPU encoding has a lot of potential, but it has a lot of weaknesses too. It's a bit like programming for a Cell or an FPGA, except exponentially more of a nightmare.
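To make that concrete: the data-parallel parts of an encoder, like the SAD cost used in motion search, are the bits that map onto CUDA at all. A toy sketch of mine (not code from any real encoder; one thread per macroblock row is just for brevity):

Code:
// Toy kernel: block x computes the SAD of one 16x16 macroblock against
// a single candidate position. A real motion search evaluates thousands
// of candidates per macroblock and reduces over them; this only shows
// the data-parallel "shape" of the problem.
__global__ void sad16x16(const unsigned char *cur, const unsigned char *ref,
                         int stride, int *out)   // out[] assumed zeroed
{
    int mb  = blockIdx.x;   // macroblock index along a row of macroblocks
    int row = threadIdx.x;  // launch with 16 threads: one per pixel row
    const unsigned char *c = cur + mb * 16 + row * stride;
    const unsigned char *r = ref + mb * 16 + row * stride;

    int sad = 0;
    for (int x = 0; x < 16; x++) {
        int d = (int)c[x] - (int)r[x];
        sad += d < 0 ? -d : d;
    }

    // Merge the 16 per-row sums; a shared-memory tree reduction would be
    // the idiomatic way, an atomic keeps the sketch short.
    atomicAdd(&out[mb], sad);
}

The trouble is everything in an encoder that isn't shaped like that.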
Old 2nd July 2008, 03:55   #3  |  Link
Inventive Software
Turkey Machine
 
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
I smell a potential final-year project coming on in the next year or 2 if nothing viable turns up in that time. Would be good to use the PS3's hardware properly for encoding, and I can see GPU encoding taking off if the feature sets are right.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld
Inventive Software is offline   Reply With Quote
Old 2nd July 2008, 08:25   #4  |  Link
tre31
Registered User
 
Join Date: Dec 2006
Posts: 69
Toshiba has already announced a laptop with a Cell DSP chip on it specifically for encoding/decoding tasks (similar to their previously announced FPGA/PCI-e solution, which I've heard nothing of since; it would be good if they just brought out a PCI-e solution, people would snap it up like hot cakes).

Announcements are all well and good, but it's actual products that we want... (yes, patience is a virtue... lol)
Old 2nd July 2008, 15:49   #5  |  Link
lexor
Registered User
 
Join Date: Jan 2004
Posts: 849
Quote:
Originally Posted by Inventive Software View Post
Would be good to use the PS3's hardware properly for encoding, and I can see GPU encoding taking off if the feature sets are right.
It would, but it won't happen to the full extent. If you don't shell out for a PS3 dev kit, you can't get at the GPU; it can only be used from the PS3's firmware. Linux doesn't get access to the GPU and has to render graphics on the Cell in software.
__________________
Geforce GTX 260
Windows 7, 64bit, Core i7
MPC-HC, Foobar2000
Old 2nd July 2008, 16:00   #6  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by lexor View Post
It would, but it won't happen to the full extent. If you don't shell out for a PS3 dev kit, you can't get at the GPU; it can only be used from the PS3's firmware. Linux doesn't get access to the GPU and has to render graphics on the Cell in software.
And apparently the Cell is not very impressive either. From what I heard, an IBM team wrote an encoder from the ground up for the Cell and optimized it the best they could, but still needed 2.5 out of 8 cores to do realtime 1080p on extremely fast and crappy settings.

x264 can do realtime 1080p24 on a single 3 GHz core of a Core 2 at its fastest and crappiest settings, so those numbers aren't really impressive.
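(For scale: 1080p is 120 × 68 = 8,160 macroblocks per frame, so realtime 1080p24 means encoding roughly 8,160 × 24 ≈ 196,000 macroblocks per second, whatever hardware you do it on.)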
Old 2nd July 2008, 16:07   #7  |  Link
audyovydeo
Registered User
 
audyovydeo's Avatar
 
Join Date: Apr 2007
Posts: 464
Quote:
Originally Posted by Dark Shikari View Post
From what I heard, an IBM team wrote an encoder from the ground up for the Cell and optimized it the best they could, but still needed 2.5 out of 8 cores to do realtime 1080p on extremely fast and crappy settings.
Are you talking about this?

"White Paper
In this paper we present an implementation of an H.264 video encoding algorithm on a Cell Broadband Engine (CBE), for the application of high-quality video surveillance. The proposed system aims to encode three channels of a standard-definition (720 × 480) video stream at 30 frames per second with a target bit-rate of 2 Mbps. The presented encoder is compliant with the main-profile of the H.264 standard, and uses a learning-theoretic mode selection algorithm as an alternative to brute-force rate-distortion optimized mode selection, enabling significantly reduced computational complexity. The CBE offers an aggregate of 204.8 GFlops of computing power, at 3.2 GHz, in 8 Synergistic Processor Elements (SPEs), each with 128-bit wide vector processing capability. The SPEs are under the control of a central Power Processor Element (PPE) which has its own 128-bit vector processing unit and all units are connected by an on-chip broadband bus with 25.6 GB/s bandwidth capacity and an I/O bus providing 50 GB/s. This combination of processing units and high-speed internal buses is ideally suited for the target application of multi-channel real-time H.264 video encoding. The proposed system employed only the standard tools provided with the CBE toolkit, without resorting to customized assembly level programming."

Revision Date: 31/10/05


source : http://www-01.ibm.com/chips/techlib/...2570AB00594459


or this : http://www-03.ibm.com/technology/cel...n_brief1.6.pdf ???
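(As a sanity check on the quoted figure: 204.8 GFlops is just the SPEs' peak single-precision rate. Each SPE can issue one 4-wide fused multiply-add per cycle, i.e. 8 flops/cycle, so 8 SPEs × 3.2 GHz × 8 flops = 204.8 GFlops.)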


cheers
audyovydeo

Last edited by audyovydeo; 2nd July 2008 at 16:11.
Old 2nd July 2008, 16:08   #8  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by audyovydeo View Post
Are you talking about this ?

...

standard-definition (720 × 480) video stream at 30 frames per second with a target bit-rate of 2 Mbps.
No, I'm not talking about that. I said 1080p, not SD
Old 2nd July 2008, 16:17   #9  |  Link
akupenguin
x264 developer
 
akupenguin's Avatar
 
Join Date: Sep 2004
Posts: 2,392
Quote:
Originally Posted by audyovydeo View Post
uses a learning-theoretic mode selection algorithm as an alternative to brute-force rate-distortion optimized mode selection, enabling significantly reduced computational complexity.
Translation: "We threw out all the structure of the problem and gave it to a black-box optimizer, and then we had to compare it against brute-force because it didn't fare well against anyone else's ad-hoc heuristics."
Old 2nd July 2008, 16:17   #10  |  Link
audyovydeo
Registered User
 
audyovydeo's Avatar
 
Join Date: Apr 2007
Posts: 464
Quote:
Originally Posted by Dark Shikari View Post
No, I'm not talking about that. I said 1080p, not SD
!! The linked PDF is actually dated 1998 !!!

cheers
audyovydeo
Old 2nd July 2008, 17:41   #11  |  Link
lexor
Registered User
 
Join Date: Jan 2004
Posts: 849
Quote:
Originally Posted by akupenguin View Post
Translation: "We threw out all the structure of the problem and gave it to a black-box optimizer, and then we had to compare it against brute-force because it didn't fare well against anyone else's ad-hoc heuristics."
I think you are being unfair, and this isn't the first time you've made that argument. How does x264, with all its great (on x86) algorithms, run on Cell? Like shit. That's because hand-written asm goes out of the window, and gcc fails oh-so-hard at generating code for Cell (more so than for anything else). You can't expect IBM, or any other research team for that matter, to start implementing heuristics to compare against. They just implemented the easy brute-force method so that when others implement their algorithms on Cell, they can compare against it too and know how their heuristics are doing relative to IBM's.

A similar point goes to DS: I doubt very much they've created a full-featured encoder. They did it within the terms (and time frame) of a research project; now compare that to how long x264 took to get to where it is, and it is still being improved. I think criticizing IBM's effort here for being weak is like criticizing midgets for being short.
__________________
Geforce GTX 260
Windows 7, 64bit, Core i7
MPC-HC, Foobar2000

Last edited by lexor; 2nd July 2008 at 17:50.
Old 2nd July 2008, 19:09   #12  |  Link
Gabriel_Bouvigne
L.A.M.E. developer
 
Gabriel_Bouvigne's Avatar
 
Join Date: Dec 2001
Location: Paris - France
Posts: 276
Quote:
Originally Posted by Dark Shikari View Post
From what I heard, an IBM team wrote an encoder from the ground up for the Cell and optimized it the best they could
I think that it was in fact written by Vanguard
Old 2nd July 2008, 21:50   #13  |  Link
Inventive Software
Turkey Machine
 
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
You know what? GPUs these days are pushing the Teraflop barrier, and we're struggling to use them to our advantage. Not so long ago, 100 billion operations per second (100 G-ops) was really something to shout about... http://www.youtube.com/watch?v=ldiYYJNnQUk
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld
Old 3rd July 2008, 11:55   #14  |  Link
708145
Professional Lemming
 
708145's Avatar
 
Join Date: Dec 2003
Location: Stuttgart, Germany
Posts: 359
Hi folks,

Quote:
Originally Posted by audyovydeo View Post
...
The proposed system employed only the standard tools provided with the CBE toolkit, without resorting to customized assembly level programming."
Well, the CBE toolkit does include assembly-optimized functions in a library; what they did not do was write additional assembly optimizations of their own.

And as already pointed out, the algorithms and heuristics have to be tuned for the Cell architecture. These heuristics will be very different from x86 optimizations!
The main problem is that the people optimizing and tuning for Cell are by no means video experts. But nonetheless, there is work going on towards good encoders on Cell.

A side note:
Despite the number 1998 in the filename (which is a counter, not the year), the PDF is from 2005:
[Jagmohan 05] A. Jagmohan, B. Paulovicks, V. Sheinin, H. Yeo, "H.264 Video Encoding Algorithm on Cell Processor," GSPx 2005.

bis besser,
T0B1A5
__________________
projects page: ELDER, SmoothD, etc.
Old 3rd July 2008, 17:23   #15  |  Link
7oby
Registered User
 
Join Date: Sep 2007
Posts: 27
Quote:
Originally Posted by audyovydeo View Post
The proposed system employed only the standard tools provided with the CBE toolkit, without resorting to customized assembly level programming."
I'd also like to add that jumping to assembly isn't the first optimization you want to do. Probably posted elsewhere a couple of times, but I think this

http://www.hotchips.org/archives/hc1.../HC17.S1T4.pdf

is a rather impressive demonstration of the profiling tools for Cell. Using a real H.264 encoder as its example, it shows how to balance all the constraints (DMA latency, number of registers, branch misses, ...) and how to use pipelining to keep the SPEs loaded with sufficient work.
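The CUDA counterpart of that double-buffered SPE DMA juggling, for what it's worth, is overlapping async copies with kernel work via streams. A minimal sketch (names and launch sizes invented for illustration; the host buffer must come from cudaMallocHost for the copies to actually overlap):

Code:
#include <cuda_runtime.h>

// Stand-in for real per-chunk encoding work.
__global__ void process_chunk(unsigned char *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = 255 - buf[i];
}

// Upload chunk i+1 while the kernel chews on chunk i: the CUDA analogue
// of an SPE ping-ponging between two DMA buffers in local store.
void encode_frame(const unsigned char *host,  // pinned (cudaMallocHost)
                  unsigned char *dev[2],      // two device buffers
                  int chunks, int chunk_bytes)
{
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    for (int i = 0; i < chunks; i++) {
        int b = i & 1;  // ping-pong buffer/stream pair
        cudaMemcpyAsync(dev[b], host + (size_t)i * chunk_bytes,
                        chunk_bytes, cudaMemcpyHostToDevice, s[b]);
        process_chunk<<<(chunk_bytes + 255) / 256, 256, 0, s[b]>>>(
            dev[b], chunk_bytes);
    }
    cudaStreamSynchronize(s[0]);
    cudaStreamSynchronize(s[1]);
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
}

Reusing a buffer only within its own stream means the new upload automatically waits for the previous kernel on that buffer, while the other stream keeps the GPU busy.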

The Cell development environment looks very mature compared to CUDA's; Cell should be less painful. A PlayStation 3 could be a very powerful Blu-ray transcoder, even if not exactly a legal one.

Last edited by 7oby; 3rd July 2008 at 17:26.
Old 3rd July 2008, 17:50   #16  |  Link
MfA
Registered User
 
Join Date: Mar 2002
Posts: 1,075
Cell is different from a GPU... a lot different. Good programmers can get good utilization out of it for nearly any type of algorithm (including branch-heavy stuff).
Old 5th July 2008, 14:20   #17  |  Link
CruNcher
Registered User
 
CruNcher's Avatar
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Hehe, the Elemental guys are trying their hand at advertising:
http://www.badaboomit.com/

Also, CyberLink's PowerDirector is going to support ATI's 4x00-series AVT: http://www.cyberlink.com/eng/press_room/view_1756.html

I wonder which H.264 SDK it's based on. I remember their old H.264 encoder was based on Moonlight's very old one, and later I think the MainConcept SDK was used. It wouldn't be bad if the CyberLink engineers added GPU support to that now, or maybe it comes from Elecard/MainConcept directly. Though they should first get decoding on the GPU working again, as it did in the last Elecard beta decoders (it worked before they joined MainConcept), instead of concentrating so much on GPU encoding now (just my 2 cents).

And last but not least, ATI recently updated their AViVo transcoder software core (I guess it's now at the level AMD's encoder showed in the MSU test, though maybe it's even a little better now), but it's still impossible to set any tools; everything is still preset.

It's hard to judge Elemental's quality and watt efficiency for encoding on the GPU versus the CPU from the reviews currently out, all of which were done with the 2-minute-restricted alpha press demo. It hasn't hit Doom9 or its community members yet (which is really surprising; I would guess Elemental fears the Doom9 judgment when compared against x264's optimizations and a well-balanced set of device profiles for it, hehe).
Also, no encoded stream has been released yet, so we can't analyze the tools used for the different device profiles Elemental decided on.
I think a full judgment and in-depth comparison vs. x264 can only be conducted once the Premiere plugin (with the whole settings feature set) gets released.

@Inventive Software
This voxel-space rendering is a pretty good use of the new teraflop power, if you ask me:
http://farm4.static.flickr.com/3148/...1219cf32_b.jpg
http://farm4.static.flickr.com/3085/...c42296fe_b.jpg
Think about broadcasting highly compressed 3D assets in the future instead of video.

Here's some AMD press material about AVT and all the new scaling stuff (most probably a Lanczos resize now, done directly via the driver/shaders). Careful: this is business-presentation material, not real technical information; it only gives clues, and exaggerating facts is common in such marketing. (Hmm, I never saw any information about ATI's Premiere encoding plugin before; interesting, it must have been a test of the API implementation for third-party developers.)

http://static.pcinpact.com/pdf/docs/...edia-Final.pdf
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 6th July 2008 at 04:22.
Old 19th July 2008, 20:30   #18  |  Link
metaxaos
Registered User
 
Join Date: May 2008
Posts: 3
So why not just switch x264 to use CUDA in the next version? Doesn't the community have NV cards?
It is really strange to have a method that could boost performance several times over and yet not use it.
I would even agree to pay for that improved x264.
Old 19th July 2008, 20:32   #19  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by metaxaos View Post
So why not just switch x264 to use CUDA in the next version? Doesn't the community have NV cards?
Want to rewrite the entire encoder completely using the most nightmarish API I've ever seen in my life? Patches are welcome.
Quote:
Originally Posted by metaxaos View Post
I would even agree to pay for that improved x264.
Given my experience so far in trying to port the motion search to CUDA, and Avail's hiring of a contractor to attempt to do so, I'd put the quote for porting the whole encoder somewhere on the level of a few million dollars... if you can even find people willing and able to do it.
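The syntax isn't the problem, to be clear; it's that nothing serial survives contact. Even a histogram, one line of C on a CPU, has to be rebuilt around thread blocks and atomics. A toy illustration of mine, nothing to do with x264's actual code (and it needs shared-memory atomics, so not the very oldest CUDA hardware):

Code:
// CPU version:  for (i = 0; i < n; i++) hist[px[i]]++;
// The literal CUDA translation races between threads, so even this
// trivial loop gets restructured. Launch with 256 threads per block.
__global__ void histogram256(const unsigned char *px, int n,
                             unsigned int *hist)
{
    __shared__ unsigned int local[256];   // per-block partial histogram
    local[threadIdx.x] = 0;               // blockDim.x == 256 assumed
    __syncthreads();

    // Grid-stride loop: each thread walks a strided subset of the pixels.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&local[px[i]], 1);

    __syncthreads();
    // Fold this block's partial counts into the global histogram.
    atomicAdd(&hist[threadIdx.x], local[threadIdx.x]);
}

Now imagine doing that to mode decision, rate control and CABAC, which don't even decompose this nicely. That's where the millions go.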
Old 19th July 2008, 20:45   #20  |  Link
metaxaos
Registered User
 
Join Date: May 2008
Posts: 3
But what exactly is so bad that you'd call it "nightmarish"? I'm not a specialist, but AFAIR CUDA is almost standard C plus a compiler. Is programming for GPUs really so much more of a pain than for multi-CPUs?