x264 & AMD Stream SDK (GPGPU, SMP, CUDA)
sacharja
15th November 2008, 02:19
AMD will release its answer to NVIDIA's CUDA in 3 weeks (news translated from German):
http://translate.google.com/translate?u=http%3A%2F%2Fwww.pcgameshardware.de%2Faid%2C666909%2FNews%2FAti_Stream_ab_Catalyst_812_mit_GPU-Video-Encoding_und_mehr%2F&hl=de&ie=UTF-8&sl=de&tl=en
And AMD's presentation is here:
http://www.pcgameshardware.de/&menu=browser&entity_id=154104&image_id=936621&article_id=666909&mode=article&overview=yes&page=2&show=original
The AMD Stream SDK will be released on 10th December with the new Catalyst 8.12 driver, free and open source to download; up to 17x acceleration is claimed for all HD 3xxx+ models. Will some x264 developers have a look, and might x264 support Stream?
Dark Shikari
15th November 2008, 03:27
Patches welcome, but if their performance claims are accurate, it's completely useless.
They claim 5x realtime encoding speed of 320x240 video.
I mean, seriously?
x264 can do 26.5x realtime on a brand new Nehalem 2.66GHz... with one core. If their marketing is actually true, this will be slower than Badaboom by a wide margin.
cogman
15th November 2008, 04:25
Patches welcome, but if their performance claims are accurate, it's completely useless.
They claim 5x realtime encoding speed of 320x240 video.
I mean, seriously?
x264 can do 26.5x realtime on a brand new Nehalem 2.66GHz... with one core. If their marketing is actually true, this will be slower than Badaboom by a wide margin.
Bu, bu, it's got the x in it. Anything that is 5x something must be fast :P.
On a serious note, AMD is trying to be OpenCL compliant, and I believe nVidia is already there. Could there be a speed boost realized in x264, and if so, where?
Dark Shikari
15th November 2008, 04:27
Could there be a speed boost realized in x264, and if so, where?
Sure, but it would require offloading a very large portion of the code to the GPU (latency is too high to offload only a small portion), so it would take a lot of effort. Patches welcome though.
If someone wanted to try something simple, they could port the hpel filter or lowres interpolation filter to the GPU.
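For reference, a minimal sketch of what such a port would have to compute. It applies the 6-tap kernel (1, -5, 20, 20, -5, 1) that H.264 defines for half-pel luma samples, horizontally only, in plain Python. The function names are made up for illustration and this is not x264's actual code, which also produces the vertical and diagonal planes. Each output sample depends only on six neighbouring inputs, which is why a filter like this is easy to spread over many GPU threads.

```python
# Horizontal half-pel interpolation with the H.264 6-tap kernel.
# Illustrative only: names are hypothetical, not taken from x264.
TAPS = (1, -5, 20, 20, -5, 1)  # coefficients sum to 32


def clip_u8(value):
    """Clamp a filtered value to the 8-bit pixel range."""
    return max(0, min(255, value))


def hpel_row(row):
    """Return the half-pel sample between row[i] and row[i + 1] for every i.

    Samples outside the row are replaced by the nearest edge pixel.
    """
    n = len(row)
    out = []
    for i in range(n):
        acc = 0
        for k, tap in enumerate(TAPS):
            j = min(max(i + k - 2, 0), n - 1)  # inputs i-2 .. i+3, edge-clamped
            acc += tap * row[j]
        out.append(clip_u8((acc + 16) >> 5))  # round, then divide by 32
    return out


print(hpel_row([0, 10, 20, 30, 40, 50, 60, 70]))
```

On a GPU the outer loop would become one thread per output sample; the latency concern raised above still applies, since the filtered planes have to travel back to the CPU.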
sacharja
15th November 2008, 10:14
I've corrected the links in the post above. Please have a look. They claim they're 1,5% faster than badaboom:
http://www.pcgameshardware.de/&menu=browser&entity_id=154104&image_id=936636&article_id=666909&mode=article&overview=yes&page=3&show=original. This seems realistic because AMD's shader performance is better than Nvidia's.
Here it says something about H.264 support, but I don't know what they mean (Avivo?).
http://www.pcgameshardware.de/&menu=browser&entity_id=154104&image_id=936635&article_id=666909&mode=article&overview=yes&page=2&show=original
BTW: the AMD cards surely have a DEcoder for h.264. As you can read in the presentation Cyberlink even claimed a magnificent performance boost with Stream in the upcoming WinDVD.
CruNcher
15th November 2008, 11:09
For now it's still questionable whether the GPU hype is worth it. Currently GPUs still lack power efficiency, IMHO, though if you compare a quad-core with a new-generation GPU the difference is slowly vanishing, and I guess sooner or later CPUs won't be able to keep up anymore; that will be the time GPU encoding really becomes interesting :) I made a small power consumption comparison (decoding for now, encoding is in the works) with my G92 (a rather old GPU vs. an old CPU):
Blu-Ray (Casino Royale Construction Run)
idle = 105W
CoreAVC = 160W
GPU = 113W
This was on an X2 with CnQ enabled, running at its highest multiplier @ 1.3 V, 2.4 GHz (load) and lowest multiplier @ 1.1 V, 1.1 GHz (idle).
A large share of the power draw comes from the GPU while idle (energy is simply wasted here: 3.2 V all the time, with no adaptation to usage).
I should mention that the decoding in this example doesn't run over CUDA but on the PV2 logic, a small ARM11 core that's inside Nvidia's GPU and runs @ 400 MHz (basically an SoC inside the GPU, independent of it). ATI uses the same approach, calling the chip logic UVD.
There are, though, other solutions in the form of specialized low-cost SoCs (the first big adoption of those is going to be seen with Microsoft's Sheed Transcoding on Win7) that have the same or even better efficiency than the decoding logic used inside GPUs and would allow even more power to be saved (those solutions are mostly used in the professional area but are slowly emerging in consumer transcoding as well). Especially in netbooks, those SoCs are going to get used a lot :)
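To put the wall readings above in perspective, subtracting the idle figure gives the extra draw of each decode path. A small sketch, assuming the 105 W idle baseline applies equally to both runs (variable names are illustrative):

```python
# Extra power drawn while decoding, relative to the idle reading.
# Figures are the wall measurements quoted in the post.
IDLE_W = 105
LOAD_W = {"CoreAVC (CPU decode)": 160, "GPU decode": 113}

extra = {name: watts - IDLE_W for name, watts in LOAD_W.items()}

for name, delta in extra.items():
    print(f"{name}: +{delta} W over idle")
```

Read this way, the decode itself costs far less on the GPU's video logic; the inefficiency described here is the high idle floor, not the draw under load.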
lexor
15th November 2008, 14:30
CruNcher, I don't follow your logic. You claim GPU is power inefficient, but then provide measurements that say the opposite.
sacharja
15th November 2008, 18:03
@CruNcher
You're comparing apples with pears. On simple tasks a GPU can be 200 times faster than a CPU, i.e. a CPU with 45W would need 9000W for the same performance. The disadvantage is that the application area of a GPU is VERY limited.
@lexor
I also have a HD2600XT ;) Are you also buying the successor to RV630, the RV730 (HD4670)? Mine is coming on Wednesday; I'm curious whether it really consumes just 3W more while being twice as fast (and has Stream, of course).
LoRd_MuldeR
15th November 2008, 18:10
@CruNcher
You're comparing apples with pears. On simple tasks a GPU can be 200 times faster than a CPU, i.e. a CPU with 45W would need 9000W for the same performance.
I'd say you are comparing apples with pears. What is the benefit of a GPU being able to do a certain calculation 200x faster than a CPU if there's no encoder that is actually able to make use of that theoretical speed-up? We are yet to see a GPU-based encoder that runs only 2x faster than the CPU-based one and provides equal quality at the same bitrate. So a speed-up of 200x seems to be completely unachievable in the near future. I'm pretty sure the CPU won't be unemployed that soon. It's more likely that particular GPU-friendly tasks will be moved to the GPU...
CruNcher
15th November 2008, 19:56
CruNcher, I don't follow your logic. You claim GPU is power inefficient, but then provide measurements that say the opposite.
No, I just want to say that it's not there yet in terms of state switching :) and shutting off unneeded logic to save power @ idle.
The power consumption @ work is OK, @ least for the task of video decoding (though in this case that is done not on the GPU itself but on SoC-like logic inside the GPU).
@sacharja
As LoRd_MuldeR said, show me the GPU that does 200x transcoding @ the same quality as current CPU encoders;
even Nvidia and ATI don't go that high and claim something like 17x, though both compare against QuickTime/iTunes on a Core2, and that is the real joke here :)
LoRd_MuldeR
15th November 2008, 20:13
even Nvidia and ATI don't go that high and claim something like 17x, though both compare against QuickTime/iTunes on a Core2, and that is the real joke here :)
Until they trust their own encoders enough to compare them against x264 running on Nehalem (or at least Core2 Quad), we are far away from GPU encoders taking over the scene ;)
sacharja
15th November 2008, 20:31
@CruNcher
As I wrote some posts before, GPUs can be the ultimate performance boost, but just for a very limited set of tasks (like folding proteins in medical applications). You can't compare them in terms of efficiency. Either they are up to 200 times more efficient, or they are totally inefficient because they can't run 99% of the programs you have on your computer.
CruNcher
15th November 2008, 20:47
The most efficient use of GPUs I currently see for the average consumer (and I only speak for the average consumer, as this is who Nvidia and ATI primarily market this stuff to), in broadly used software, so as not to waste this idle energy (especially on the older GPU generations), is to allow nice animated GUIs for OSes. At least there you use the power consumption for something, even if we can argue about the usefulness of that (though a lot of average consumers like it) :)
lexor
15th November 2008, 21:14
@lexor
I also have a HD2600XT ;) Are you also buying the successor to RV630, the RV730 (HD4670)? Mine is coming on Wednesday; I'm curious whether it really consumes just 3W more while being twice as fast (and has Stream, of course).
Nah, I'm still on the 7-year-old AGP system here, so no more upgrades for this little puppy. Come January/February I'll finally be buying a replacement, though it will probably be Nvidia (simply because I want to get in on the CUDA decoder neuron2 got going with DGAVCDecNV). ATI dropped the ball on this, and it's the same ball AMD dropped when competing with Intel's compiler division.
CruNcher
15th November 2008, 21:37
Why are you so sure ATI dropped the ball on giving average developers access to UVD? I wouldn't say so; see the recent Stream SDK announcement. Most probably there will also be a way to access UVD via it, like there was for PV2 and Nvidia :)
They are late, yes, but I don't think they are going to withhold UVD access from average developers, seeing how aggressively Nvidia is moving forward on this matter, like the recent Linux patch for all the important OSS video stuff http://www.phoronix.com/scan.php?page=article&item=nvidia_vdpau&num=1 http://mythtv.org/pipermail/mythtv-dev/2008-November/063677.html (mplayer, libavcodec, libavutil, and ffmpeg)
ATI has to react now, and the API is already implemented and waiting to be used :) see http://www.phoronix.com/scan.php?page=article&item=amd_xvmc_xvba&num=1 Only the tools to access it are missing, and those will most probably arrive with the Stream SDK on the 10th of December; maybe ATI, now under pressure, is even going to bring out its own OSS patches before the official SDK release :)
9876
15th November 2008, 23:07
I think if [AMD Stream/Nvidia CUDA] works, why not use them together, even if they are slow?
A simple example:
I can use the C2Q to encode my video vol 1,2,3 and use the GPU to encode my video vol 4...
If they complete at the same time, it will boost performance 25% (or a little less), right?
:)
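A rough check of that estimate, assuming (as in the example) that the CPU finishes three volumes in the same wall time the GPU needs for one, and that the two don't slow each other down. The helper name is hypothetical:

```python
from fractions import Fraction


def combined_gain(cpu_volumes, gpu_volumes):
    """Compare CPU-only encoding with CPU and GPU working side by side.

    Assumes the CPU encodes cpu_volumes in the same wall time the GPU
    needs for gpu_volumes, with no interference between the two.
    Returns (fraction of wall time saved, fractional throughput gain).
    """
    total = cpu_volumes + gpu_volumes
    time_saved = Fraction(gpu_volumes, total)              # 1 - cpu / total
    throughput_gain = Fraction(gpu_volumes, cpu_volumes)   # total / cpu - 1
    return time_saved, throughput_gain


saved, gain = combined_gain(3, 1)
print(f"wall time saved: {float(saved):.0%}, throughput gain: {float(gain):.0%}")
```

So the 25% is the time saved on the whole batch; measured as throughput the same setup is about a third faster. Both figures fall if the GPU encoder also loads the CPU.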
LoRd_MuldeR
15th November 2008, 23:23
I don't think the GPU can take over the entire encoding process, making the CPU free to perform another encode concurrently.
It's more likely that the GPU will assist the CPU as a Co-Processor by moving particular parts of the encoding process from the CPU to the GPU.
Also you can easily put 100% load on your C2Q by encoding one single video with x264, no need to encode several videos concurrently...
foxyshadis
15th November 2008, 23:48
Moving to hardware, discussion isn't really related to x264 (until someone writes some patches).
sacharja
16th November 2008, 00:01
Moving to hardware, discussion isn't really related to x264 (until someone writes some patches).
Patches? Stream SDK v1.3 is released on 10th december. But it would be interesting if someone plans writing some when it's been released. AMD says implementing Stream support will be easy, I think we just need to wait and see.
deekey777
16th November 2008, 00:39
I think if [AMD Stream/Nvidia CUDA] works, why not use them together, even if they are slow?
A simple example:
I can use the C2Q to encode my video vol 1,2,3 and use the GPU to encode my video vol 4...
If they complete at the same time, it will boost performance 25% (or a little less), right?
:)
With PowerDirector 7 and an HD4850 you will be able to encode four videos at once.
Here is a nice interview: ATI Stream Team Interview (http://www.rage3d.com/articles/stream/index.php?p=3)
WTF is ATV? -> http://ati.amd.com/products/firepro/Siggraph_2008_video_encode_final.pdf
It's always the same: x264 against the world, but what about x264 and e.g. MC as a part of TMPGEnc and CUDA filters?
LoRd_MuldeR
16th November 2008, 00:45
With any pure software encoder I'm able to encode as many videos concurrently as I like :p
The only question is: Does it make sense ???
Instead of encoding n videos at the same time, I could simply convert them one after another (batch encode) and each video will take only 1/n of the time...
Jawed
16th November 2008, 00:49
This PDF should prove to be rather sobering info for anyone who's still optimistic:
http://ati.amd.com/products/firepro/Siggraph_2008_video_encode_final.pdf
Note how the comparison is against a single core.
LOL, the thread moved on a bit while I was reading/being-distracted/posting.
Jawed
deekey777
16th November 2008, 00:59
With any pure software encoder I'm able to encode as many videos concurrently as I like :p
The only question is: Does it make sense ???
Instead of encoding n videos at the same time, I could simply convert them one after another (batch encode) and each video will take only 1/n of the time...
If converting a video for an iPhone underutilizes an HD4850, why not convert 2-4 videos at once, if it doesn't take (much) more time (example: one video -> 30 minutes, four videos -> 90 minutes)?
I've converted my Firefly DVDs for my iPhone and it took a lot of time (AutoMKV, 13 episodes*, ~ 40 minutes for every episode, [2-pass, X2 4200+], the quality is great!)
*Serenity (Pilot) was converted by Nero
LoRd_MuldeR
16th November 2008, 01:09
If the encode "underutilizes" the GPU, you could simply use 100% of its processing power instead of only a part of it and finish the one video faster. Then process the next video. And so on...
So if you can finish n videos in m minutes, then you can finish 1 video in m/n minutes. Or better: You should be able to do so!
The fact that they need to encode several videos concurrently to get 100% load on the hardware shows that they have a bottleneck in their encoder, their multi-threading code scales really badly, or both :rolleyes:
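The argument can be checked with the hypothetical figures from the earlier post (one video alone in 30 minutes, four at once in 90 minutes). A minimal sketch; the function name is made up for illustration:

```python
def single_job_utilization(solo_minutes, n_jobs, concurrent_minutes):
    """Estimate how much of the hardware one encode on its own keeps busy.

    If n_jobs finish together in concurrent_minutes, an encoder that
    scaled perfectly could finish one job in concurrent_minutes / n_jobs.
    Returns (ideal minutes for one job, fraction of throughput used solo).
    """
    ideal_solo = concurrent_minutes / n_jobs
    return ideal_solo, ideal_solo / solo_minutes


ideal, used = single_job_utilization(solo_minutes=30, n_jobs=4, concurrent_minutes=90)
print(f"ideal single encode: {ideal} min, solo run uses {used:.0%} of the hardware")
```

With those example numbers a single encode leaves a quarter of the available throughput unused, which is the scaling problem being pointed at.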
CruNcher
16th November 2008, 09:41
Yep, look @ the slides carefully: it will be the same situation as with Nvidia. For encoding the CPU will still be utilized; for Nvidia that utilization is somewhere @ 30%, and most likely for ATI it will be somewhere in that region too. It's not the same situation as with video decoding (which is done almost entirely on the GPU, though on its own optimized logic inside the GPU).
MfA
16th November 2008, 20:34
Non greedy R/D optimization of MB parameters suits the GPU well, since the search space is too great for trellis based techniques (presently most of the optimization above MB level is greedy, don't think even the quantizer is done with trellis search although if you stick to just optimizing that in isolation trellis is practical).
This is not the kind of thing the commercial world or NVIDIA/ATI is likely to produce though ...
cyberbeing
17th November 2008, 04:26
It sounds like AMD is going to be releasing a h.264 encoder along with the Stream SDK?
If so, does anybody know if AMD's encoder will be like Badaboom and only support Baseline Profile?
CruNcher
17th November 2008, 07:52
That's unfortunately one thing they left completely unanswered, and I guess they didn't even answer it for the audience; most probably no one asked either ;)
sacharja
17th November 2008, 22:20
Because the whole Stream is open source and meant to ensure a platform independent programming and to create some standards, we might even see a x264 implementation :D
Since this is now in the Hardware forum: I just got my Sapphire HD-4670 today for 77€ and it's almost twice as fast as my old HD-2600. The (passive) Accelero S2 cooler fits perfectly (even though that's not stated on the homepage). The board design is exactly the same as the Sapphire HD-2600XT; just the chip and RAM are different. I have a passive system (not one fan in the whole PC) and the card is at just about 60°C at full load and under 50°C at idle; if it had a fan it would run at 3% ;) Watching movies is subjectively faster, and the CPU load is 48% instead of the 90% with the old card. If ATI's postprocessing filters improve I might use the Stream decoder codecs in the near future. Haven't tested the HDMI 7.1 LPCM sound output though.
Sharktooth
23rd November 2008, 05:16
the good news is the encoder will be included in a future version of the catalyst drivers... ;)
sacharja
23rd November 2008, 13:17
the good news is the encoder will be included in a future version of the catalyst drivers... ;)
Do you know which encoder?
tomos
23rd November 2008, 14:35
isnt it just going to be avivo basically but this time it will actually use the GPU? comes out on the 10th or so of Dec iirc
Sharktooth
25th November 2008, 04:11
all i can say is it's not the old encoder running on the GPU.
sacharja
10th December 2008, 13:34
Stream SDK v1.3 Beta is released: http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=104500&enterthread=y
Brook+ is AMD’s modified Brook open source compiler. CAL is AMD's forward-compatible, cross-platform interface to the GPU. Both components are required to implement applications in the Stream environment. The download comes complete with a unified installer. Installation includes programming guides and specifications for Brook+ and CAL.
Sounds good to me ;)
BTW: I tried the Catalyst 8.12 from here http://www.sapphiretech.com/us/ , seems to be the RC3 instead of the final, but I couldn't resist. Nevertheless I didn't find any H.264 encoder. CCC's Avivo is just encoding MPEG-1, MPEG-2, WMV, DivX and Xvid (I tested the whole drop-down box, because it's not obvious in CCC which codec is used). I also got no video while converting a WMV-HD to iPod.
sacharja
10th December 2008, 16:21
I was able to encode a WMV-HD trailer with ATIXcoder (http://forums.guru3d.com/showthread.php?t=226993) and H264 (not the ipod crap).
About the quality, maybe you can see a difference between x264 and AVIVO :D :
http://rapidshare.com/files/172091601/AVIVO_with_GPGPU_vs._x264.rar.html
At least it's fast: a full HD movie is converted to H264 in 5 minutes instead of 30 minutes with x264. BTW: I wasn't able to test how the GPU is used because I currently have no tool that can show me GPU usage; ATITool is not working with my new HD-4670.
Dark Shikari
10th December 2008, 16:23
At least it's fast, a full HD movie is converted to H264 in 5 minutes instead of 30 minutes with x264.
Have you tried using equivalently fast settings with x264 and comparing quality?
Edit: god, what an awful comparison. If you're going for max quality, the x264 settings are awful, and if you're going for comparability to the AVIVO encode in terms of features used, they're bad as well...
x264 - core 65 r1016bm dbc5ef0 - H.264/MPEG-4 AVC codec - Copyleft 2003-2008 - http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 analyse=0x3:0x13 me=umh subme=7 psy_rd=0.0:0.0 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=2 thread_queue=2 nr=0 decimate=1 mbaff=0 bframes=0 keyint=250 keyint_min=25 scenecut=40(pre) rc=abr bitrate=670 ratetol=1.0 qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 ip_ratio=1.40 aq=1:1.00:0:10.00
And H.264 in AVI? Single-pass encoding? 8x8dct but no B-frames? Psy off? Threads 2, despite the fact that there is no CPU in the world on which threads 2 is optimal for performance? facepalm
(For comparison, AVIVO uses baseline profile...)
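For anyone who wants to check an encode the same way, here is a small sketch that splits an x264 info string like the one above into key/value pairs so the settings under discussion can be read off directly. The parser is a hypothetical helper written for this thread, not part of x264; the string is the one from the sample with its line wraps joined.

```python
# Split the "options:" part of an x264 info string into a dict.
# parse_x264_options is an illustrative helper, not an x264 API.
INFO = (
    "x264 - core 65 r1016bm dbc5ef0 - H.264/MPEG-4 AVC codec - Copyleft 2003-2008 - "
    "http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 "
    "analyse=0x3:0x13 me=umh subme=7 psy_rd=0.0:0.0 mixed_ref=0 me_range=16 "
    "chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 chroma_qp_offset=0 "
    "threads=2 thread_queue=2 nr=0 decimate=1 mbaff=0 bframes=0 keyint=250 "
    "keyint_min=25 scenecut=40(pre) rc=abr bitrate=670 ratetol=1.0 qcomp=0.60 "
    "qpmin=10 qpmax=51 qpstep=4 ip_ratio=1.40 aq=1:1.00:0:10.00"
)


def parse_x264_options(info):
    """Return the space-separated key=value options as a dict of strings."""
    _, _, opts = info.partition("options: ")
    settings = {}
    for token in opts.split():
        key, sep, value = token.partition("=")
        settings[key] = value if sep else ""
    return settings


settings = parse_x264_options(INFO)
for key in ("rc", "bframes", "8x8dct", "psy_rd", "threads"):
    print(f"{key} = {settings[key]}")
```

The five keys printed are the ones criticised above: single-pass ABR, no B-frames alongside 8x8dct, psy off, and threads=2.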
Edit again:
I am laughing hysterically. Thank you for the AVIVO encode. You have vindicated yourself. :D
Speaking of which, can I have a sane encode from AVIVO that isn't at near-CQP51? I can't reverse-engineer it with this kind of insanity ;)
sacharja
10th December 2008, 17:03
Just tried to use the same settings in AVIVO and x264 with the few settings I can change in ATIXcoder (in AVIVO directly you can't even change the resolution) while keeping the bitrate low to emphasize differences.
If you think you can make a better comparison go ahead ;) But AVIVO is known for its poor quality, thus the results seemed realistic to me.
I only did this comparison because I initially had the hope that AMD uses x264 instead of their cheap encoders.
Have you had a look at Stream v1.3 yet? Is there hope for a x264 with Stream support?
EDIT: Here's a detailed test of PCGH http://translate.google.com/translate?hl=de&u=http%3A%2F%2Fwww.pcgameshardware.de%2F%3Farticle_id%3D669991&sl=de&tl=en&swap=1
Like the first AVIVO, this seems to be just a software encoder that IS NOT using Stream.
deekey777
10th December 2008, 21:16
RE PCGH:
http://forum.beyond3d.com/showthread.php?p=1247436#post1247436
And here are ComputerBase' results:
http://www.computerbase.de/artikel/hardware/grafikkarten/2008/kurztest_ati_video_converter_benchmarks/3/#abschnitt_benchmarks
So, the GPUs are used.
And it's better to install ATi Tray Tools, then you get this:
http://img2.abload.de/img/unbenanntxynm.jpg
What really sucks is that AVIVO@GPU/AVT support is limited to the HD4000 series. Official statement: "HD4800 delivers 5x better integer performance". Great, do I care about this? I'm an owner of the HD3850 and here are its specs:
http://ati.amd.com/products/radeonhd3800/specs.html
ATI Avivo™ HD Video and Display Platform
* Dedicated unified video decoder (UVD) for H.264/AVC and VC-1 video formats
...
* MPEG-2, MPEG-4, DivX, WMV9, VC-1, and H.264/AVC encoding and transcoding
tomos
11th December 2008, 02:27
just tried the avivo thing after downloading it off ATis site since it wasnt included in the 8.12 package, and right there on the page it says:
Avivo Video Converter*
*Avivo Video Converter will only work
with X1000 series products
installed it anyway and it doesnt accept any type of file i have. VOBs, M2TS, matroska and doesnt take AVS scripts either. i dont have AVIs to test it with so it's pretty much useless to me now :(
Sharktooth
11th December 2008, 03:19
older cards will be probably supported in future versions.
tomos
11th December 2008, 11:10
i have a 4850 though.
Bitwarrior
11th December 2008, 16:59
just tried the avivo thing after downloading it off ATis site since it wasnt included in the 8.12 package, and right there on the page it says:
installed it anyway and it doesnt accept any type of file i have. VOBs, M2TS, matroska and doesnt take AVS scripts either. i dont have AVIs to test it with so it's pretty much useless to me now :(
Unfortunately ATI did not expose the encoder as a DirectShow device. I looked at newly installed video codecs but unfortunately nothing was there. Otherwise we could use Graphedit to use it for converting from almost all other formats.
Sharktooth
11th December 2008, 17:50
it does.
LoRd_MuldeR
11th December 2008, 18:18
it does.
He probably was looking for a VfW Encoder (e.g in VirtualDub or VirtualDubMOD). DirectShow encoders won't show up there, of course.
@Bitwarrior: Try to use GraphStudio or GraphEdit.
Bitwarrior
11th December 2008, 20:00
He probably was looking for a VfW Encoder (e.g in VirtualDub or VirtualDubMOD). DirectShow encoders won't show up there, of course.
@Bitwarrior: Try to use GraphStudio or GraphEdit.
I did. I will try again and report here (naming is probably not that obvious). It would be nice to be able to write an alternative user interface around it and to extend the input formats it likes.
LoRd_MuldeR
11th December 2008, 21:14
It would be nice to be able to write an alternative user interface around it and to extend the input formats it likes.
Like this?
http://forum.doom9.org/showpost.php?p=1222130&postcount=4
Bitwarrior
11th December 2008, 21:33
Like this?
http://forum.doom9.org/showpost.php?p=1222130&postcount=4
:thanks: I will check it and test whether it likes DV as input. However the remarks about quality are worrying. It might not be worth the effort after all.
qyqgpower
12th December 2008, 15:13
although this is kinda OT, I have several questions about ATI's directshow filters.
1. Is the hardware deinterlacer accessible via graphedit? I remember someone had said he could use it in his encoding. I have found a filter named "MMACE DeInterlace" which is from ATI's dll. But only two output pins in this filter and no input pin is available.
2. what's the directshow name of xcode.dll (which is responsible for the ATI converter)?
tre31
12th December 2008, 21:22
although this is kinda OT, I have several questions about ATI's directshow filters.
1. Is the hardware deinterlacer accessible via graphedit? I remember someone had said he could use it in his encoding. I have found a filter named "MMACE DeInterlace" which is from ATI's dll. But only two output pins in this filter and no input pin is available.
2. what's the directshow name of xcode.dll (which is responsible for the ATI converter)?
1. Yes it is, using the Cyberlink PowerDVD (PDVD7) decoder, then using a directshow encoder (if you want to do it in graphedit/graphstudio) or using something that supports input from directshow like this - http://forum.doom9.org/showthread.php?t=141441
Read more here - http://forum.doom9.org/showthread.php?t=143106
I tried to explain what I do there, however the person I explained it to is still asking the same question even after I explained what to do. There's just no helping some people, even after you give them the cleaned fish, fillet it, and even go as far as putting the oil in the pan ... lol
2. Not sure, I can't find a way as yet; the MPEG Encoder is available.
------
My test on ati hd2600xt using (http://www.radeon.ru/downloads/att/b/atixcoder.rar) :
input - 1920x1088 pal 16:9 m2v
output - 720x576 16:9 h264
gpu usage during encode - 18% constant
cpu usage on q6600 - 90-95% on 3 cores, the last was not used much
verdict on output - .avi (WTF ati? - this crashed explorer multiple times with cl264dec.ax being culprit when i even clicked on the file)
was baseline 3.0 - er .. why even bother
encoded at 100fps, but I'm sure x264 would do the same or perhaps better at such low settings.
gpu transcoding - well, it was using the gpu for something; whether that was de-interlacing or resizing or encoding I really don't know. As for the output, I didn't bother to check since it was crashing things, so why bother. In the end, at such low settings, I say atm = FAIL ... In the future, if they do Main & High Profile and allow for better configuration settings (let's face it, if you are going to use the gpu you want to use good encoding settings, not the bare minimum), then and only then will you be able to perform a good comparison against x264.
sacharja
22nd December 2008, 13:59
I found an article from Hexus.net (http://www.hexus.net/content/item.php?item=11725), where they were able to test GPGPU with a Radeon card and an Adobe Premiere plugin. The performance boost versus CPU only is reportedly up to 4x:
As it happens, many media-rich applications' code is wonderfully parallelisable, meaning that, if coded for, additional plug-ins could take the load off the CPU, place it on the broad, broad shoulders of the GPU, and execute in a more-timely fashion. As an example, during the CTO conference in Amsterdam, Holland, a high-definition clip was transcoded to MPEG2 HD. Running on a Radeon HD 3870 X2, with the Adobe Premier plug-in activated, the GPU ran at 4x a quad-core CPU's speed.
AMD's presentation about AVT (Accelerated video transcoding) says the following about supported programs:
http://www.freeimagehosting.net/uploads/th.23682aeb30.png (http://www.freeimagehosting.net/image.php?23682aeb30.png)
It seems that Cyberlinks PowerDirector 7 (updated next year) will be the first application that takes full advantage of the GPGPU. I wrote a mail to Cyberlink, because the GPGPU update should've been released in Q3 08. Here is the answer I got:
Dear Customer,
Thanks for your e-mail.
This is Cyberlink Customer Support.
Due to ATI driver release issue, so PowerDirector 7 will support this feature on 2009 Q1.
Sorry for all the inconvenience you've been caused.
We sincerely apologize for any inconvenience you have been caused.
If you have any further question, please feel free to contact us.
Customer Support Dept.
Cyberlink Corp.
LoRd_MuldeR
22nd December 2008, 16:19
I found an article from Hexus.net (http://www.hexus.net/content/item.php?item=11725), where they were able to test GPGPU with a Radeon card and an Adobe Premiere plugin. The performance boost versus CPU only is reportedly up to 4x:
"4x speed" says exactly nothing! :rolleyes:
Everybody can talk about speed-up numbers and show fancy diagrams. But how does the quality compare to x264 at the same bitrate ???
Unless they offer a proper quality comparison, we must assume they don't show any comparison, because the quality is crap...
sacharja
22nd December 2008, 18:05
"4x speed" says exactly nothing! :rolleyes:
Everybody can talk about speed-up numbers and show fancy diagrams. But how does the quality compare to x264 at the same bitrate ?
Sure, but according to your own post ("We are yet to see a GPU-based encoder that runs only 2x faster than the CPU-based one and provides equal quality at the same bitrate.") I assume that 4x is a realistic statement and not so ridiculous as the 19x of AVIVO which seems to be achieved just by crippling the quality. I'm no GPGPU developer and haven't tried it out, but according to this value I think that the quality of PowerDirector will be somewhat comparable to x264.
Nevertheless I think GPGPU is a good technique. If it's implemented correctly the quality can be even better without affecting speed. Imagine that not just the motion estimation can be offloaded but also color correction, deinterlacing etc. etc..
Dark Shikari
22nd December 2008, 19:00
Sure, but according to your own post ("We are yet to see a GPU-based encoder that runs only 2x faster than the CPU-based one and provides equal quality at the same bitrate.") I assume that 4x is a realistic statement and not so ridiculous as the 19x of AVIVO which seems to be achieved just by crippling the quality. I'm no GPGPU developer and haven't tried it out, but according to this value I think that the quality of PowerDirector will be somewhat comparable to x264.
"Comparable speed to x264" is not an extraordinary claim. "Comparable quality to x264" most definitely is.
LoRd_MuldeR
22nd December 2008, 20:34
Nevertheless I think GPGPU is a good technique. If it's implemented correctly the quality can be even better without affecting speed.
It would be enough if the quality of a GPGPU encoder were equal to the current state-of-the-art CPU encoders while the speed is significantly faster. It would also be enough if the quality (at a given bitrate) were significantly better, while speed is equal to the current CPU encoders. But the problem is: We are yet to see a GPGPU encoder that does achieve this...
deekey777
22nd December 2008, 21:56
Is Badaboom 1.1 (Main Profile and CABAC) so bad?
LoRd_MuldeR
22nd December 2008, 22:49
Is Badaboom 1.1 (Main Profile and CABAC) so bad?
Compare it to x264 at the same bitrate (2-Pass mode) and you'll know. And pick a mid-range bitrate. Not a high bitrate where both encoders look transparent ;)
CruNcher
23rd December 2008, 02:43
Yep, but as you say, it also depends heavily on your usage scenario and your system :) Older systems can get a big boost from GPU support by doing several things with it, but the higher you go, the smaller the advantage gets, and what remains is the energy efficiency of both; GPUs are still far away from being that energy efficient, except for their internal video decoding logic like VP2 and UVD2 currently :)
Though IMHO GPU parallelism, and even CPU parallelism, is still in its infancy, and people will get more and more advanced with it over time and create nice, clean, efficient applications with it like we have today running on single-core CPUs (and even there it looks dark for the new generations, as many don't concentrate on low-level optimization these days but more and more use stuff like .NET and Java; I see a big risk in that. Anyway, just my thoughts) :)
There are very interesting companies forming that are into GPU R&D as its own unique field of expertise; that market grows daily :)
MfA
26th December 2008, 12:54
GPU shaders are fine for what they are designed for, SPMD with coherently branching programs performing floating point computations (no 4x8 or 2x16 bit SIMD). Also with G80 and 48xx you get scatter/gather into local cache at much greater bandwidths than desktop CPUs.
GPUs have great strengths ... it's just that normal video encoding doesn't need any of them, and at what video coding does need they aren't strong.
The best immediate use for GPUs in encoding at the moment would be for a pre-pass motion estimation for x264 (the existing ME in x264 might not fit the GPU, but something like hierarchical ME can be easily implemented on GPUs and should be sufficient for a pre-pass). Improving x264 encoding is not something either NVIDIA or AMD are really looking to do (off line encoding is a niche market and it would piss off other market partners) so we get stuff like Badaboom and Avivo, which try to have the GPU do more than it's suited for.
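As a rough illustration of the hierarchical ME idea mentioned above, here is a minimal CPU-side sketch in plain Python. It is not x264 code and not a GPU kernel; the function names (`downsample`, `sad`, `search`, `hierarchical_me`), the block size, and the search radii are all assumptions chosen for readability. The point is only the structure: a small full search on a downsampled frame, then a +/-1 refinement at each finer level. Every block is independent, which is what would make this kind of pre-pass map well to a GPU.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, bs, mv):
    """Sum of absolute differences; None if the candidate block leaves the frame."""
    rx, ry = bx + mv[0], by + mv[1]
    if rx < 0 or ry < 0 or rx + bs > len(ref[0]) or ry + bs > len(ref):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(bs) for i in range(bs))


def search(cur, ref, bx, by, bs, center, radius):
    """Full search in a small window around `center`; returns (mv, cost)."""
    best_mv, best_cost = center, sad(cur, ref, bx, by, bs, center)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (center[0] + dx, center[1] + dy)
            cost = sad(cur, ref, bx, by, bs, mv)
            if cost is not None and (best_cost is None or cost < best_cost):
                best_mv, best_cost = mv, cost
    return best_mv, best_cost


def hierarchical_me(cur, ref, bx, by, bs=16, levels=2, coarse_radius=2):
    """Coarse-to-fine search for one block; bx, by, bs must be multiples of 2**levels."""
    pyramid = [(cur, ref)]
    for _ in range(levels):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv, cost = (0, 0), None
    for level in range(levels, -1, -1):
        c, r = pyramid[level]
        radius = coarse_radius if level == levels else 1
        mv, cost = search(c, r, bx >> level, by >> level, bs >> level, mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)  # scale the vector up to the next finer level
    return mv, cost
```

With two pyramid levels and a coarse radius of 2, this covers roughly +/-11 pixels at full resolution while evaluating only 25 + 9 + 9 candidates per block, most of them on small downsampled blocks. The resulting vectors would only be hints for the CPU encoder, not final decisions.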
An exhaustive search for semi-optimal encoding parameters among a fixed-size set of MVs/quantizers/etc. specified by the CPU is also reasonably well suited to the GPU (locally optimal given the coding parameters from the previous iteration of the frame, or even the previous iteration of the video if you want to take temporal effects into account; the first iteration would come from the CPU). That is a lot more work, though, and will end up with very slow encoding modes.
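The iterated exhaustive search described above could be sketched roughly like this. It is only a toy in plain Python: `exhaustive_select`, `refine`, and the `cost_given_prev` callback are hypothetical names, and a real cost function (some rate/distortion measure) is left entirely to the caller. The inner loop over a fixed candidate set is the part that would run in parallel on the GPU.

```python
from itertools import product


def exhaustive_select(mv_candidates, qp_candidates, cost):
    """Evaluate every (mv, qp) pair and return ((mv, qp), cost) for the cheapest."""
    best = None
    for mv, qp in product(mv_candidates, qp_candidates):
        c = cost(mv, qp)
        if best is None or c < best[1]:
            best = ((mv, qp), c)
    return best


def refine(mv_candidates, qp_candidates, cost_given_prev, first_guess, iterations=3):
    """Repeat the search; each pass is locally optimal given the previous choice.

    `first_guess` plays the role of the initial parameters supplied by the CPU.
    """
    choice = first_guess
    for _ in range(iterations):
        choice, _ = exhaustive_select(
            mv_candidates, qp_candidates,
            lambda mv, qp: cost_given_prev(mv, qp, choice))
    return choice
```

Because every pass re-evaluates the whole candidate set, the run time grows with candidates times iterations, which matches the remark that such modes would be very slow.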
CruNcher
26th December 2008, 21:08
You're absolutely right, though ETI in particular is putting a lot of effort (investment) into making encoding on the GPU as good as possible. Sure, this generation isn't 100% suited for it yet, but the GPU can enhance the whole process, especially in post-processing as a helper for the CPU. I'm thinking of things like realtime motion-adaptive deinterlacing, denoising and super resolution; there the GPU already does a nice job :)
Nvidia's deinterlacer is already more efficient than Yadif and much faster; with it, realtime HD deinterlacing to 60p is really an easy task :)
And combined with the UVD2/VP2 logic (decoding) in front of that shader power (post-processing), it's a really powerful, fast and, compared to software encoding, more energy-efficient way of encoding for the average consumer.
Even with 4 cores, doing this whole chain at the same quality in realtime for 1080i->p is currently not really thinkable in a software environment: on the deinterlacing side it would need something more advanced than Yadif (MCBob or something else using Needi2), which would automatically result in an immense slowdown.
This is my current encoding chain, and it works marvelously :) on my relatively old system (X2 Toledo), giving me nice results in much less time than an entire software chain would be capable of.
Decoding (GPU/DSP: VP2) -> Post Processing (GPU: Shader) -> Encoding (CPU: Dual Core)
The biggest time win comes from the shader processing and the almost zero CPU utilization during decoding; the rest is free for the encoding task.
A lot of CPU cycles are saved that can be invested fully into the encoding task :) (speed, quality, or a balance of both)
Sure, FPGA solutions and the like are more efficient, but don't forget we're talking about the end consumer. Soon, especially in netbooks, such transcoding tasks will increasingly be done in FPGA solutions, but for the average desktop that's not here yet, and the quality is still questionable (it depends on the encoder implementation and can't be influenced by the consumer) :)
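To make the decode -> post-process -> encode chain above a bit more concrete, here is a minimal sketch of a three-stage pipeline in plain Python. The `decode`, `postprocess` and `encode` callables are placeholders supplied by the caller, not real VP2/UVD2, shader or x264 APIs, and `run_chain` and `depth` are made-up names. It only shows the structure: each stage runs in its own thread, so the stages overlap in time and the encoder stage gets the CPU while the earlier stages (which would sit on the GPU in the setup described) do their part.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the frame stream


def stage(fn, inbox, outbox):
    """Apply `fn` to every item from `inbox` and pass the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_chain(frames, decode, postprocess, encode, depth=4):
    """Run decode -> postprocess -> encode as overlapping stages; returns outputs in order."""
    queues = [queue.Queue(maxsize=depth) for _ in range(4)]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_DONE)

    workers = [threading.Thread(target=feed)]
    for fn, src, dst in zip((decode, postprocess, encode), queues, queues[1:]):
        workers.append(threading.Thread(target=stage, args=(fn, src, dst)))
    for w in workers:
        w.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results
```

The bounded queues keep a fast stage from running far ahead of a slow one, so the throughput of the chain is set by its slowest stage, which in the chain above would be the CPU encoder.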
deekey777
10th January 2009, 15:32
CyberLink PowerDirector 7 Optimized for ATI Stream Technology from AMD (http://www.cyberlink.com/eng/press_room/view_1997.html)
The First Consumer Video Editing Software Optimized for ATI Stream Encoder
Taipei, Taiwan—January 10, 2009—CyberLink Corp. (5203.TW), innovative solution provider for the connected media lifestyle, launches today the latest PowerDirector 7 optimized for ATI Stream technology with accelerated MPEG2 and H.264 video encoding performance.
ATI Stream technology allows applications to take advantage of a graphic processing unit’s massive compute capability for demanding algorithms such as video editing. CyberLink PowerDirector 7 leverages ATI Stream technology to maximize transcoding performance when rendering MPEG2 and H.264 video files that users can output and view on the iPod®. PowerDirector 7 supports Stream-enabled ATI Radeon™ HD 4800 and ATI Radeon™ HD 4600 series graphic card solutions.
“PowerDirector 7 is a prime example of how ATI Stream technology can accelerate and streamline processing intensive PC tasks to improve the experience for consumers,” said Matt Skynner, vice president of product marketing, Graphics Products Group, AMD. “We are excited about the impressive results PowerDirector 7 is yielding with our innovative GPU technology and we applaud CyberLink’s commitment to deliver great HD video editing performance for video enthusiasts.”
"CyberLink is happy to collaborate with AMD, optimizing PowerDirector 7 as the first consumer video editing software with ATI Stream technology,” said Alice H. Chang, CEO of CyberLink. “With the growing popularity of video editing and AMD’s focus on developing advanced graphic and computing solutions, we felt it was a natural step to deliver accelerated software performance to allow users to spend more time on creating and sharing HD videos and less time waiting."
...
Product Availability
PowerDirector 7 optimized for ATI Stream technology will be available on CyberLink’s online store early February.
sacharja
11th January 2009, 12:18
Thanks, seems we have to wait till Feb
Product Availability
PowerDirector 7 optimized for ATI Stream technology will be available on CyberLink’s online store (http://www.cyberlink.com/prog/trial/index.do?locale=en_US) early February.
Ynatik
8th February 2009, 14:47
The new patch (build 2519) adds: "Now supports ATI Stream hardware video encoding"
http://www.cyberlink.com/multi/download/patches_4_en_US.html
sacharja
9th February 2009, 22:06
Thanks, I tried the trial. Unfortunately I wasn't able to select "hardware video encoder" with my HD 4670. But I think I wouldn't use it even if I could profit from the 215% speed increase with Stream. PowerDirector offers almost no settings, and you can only save the H.264 video as an MPEG-2 transport stream.
Hopefully there will be other programs with Stream support in the future.
tomos
9th February 2009, 22:15
so powerdirector is kinda pointless at the moment (unless you want to encode to mpeg2)?
CruNcher
10th February 2009, 00:00
Lol wtf, "Now Supports Nvidia CUDA hardware Encoding"? Ehh, wasn't that already in the January update (the CES release)? See my test of Nvidia's encoder. It seems it's real GPU encoding now; I hope so, gonna retest :)
@Tomos: read carefully what he wrote. It saves H.264 into a transport stream (.m2ts) because it's mainly meant for AVCHD creation. You have some settings available, but don't expect full control over the encoder :)
deekey777
10th February 2009, 01:38
Thanks, I tried the trial. Unfortunately I wasn't able to select "hardware video encoder" with my HD 4670. But I think I wouldn't use it even if I could profit from the 215% speed increase with Stream. PowerDirector offers almost no settings, and you can only save the H.264 video as an MPEG-2 transport stream.
Hopefully there will be other programs with Stream support in the future.
It looks like the trial version is older than the newest version, so it lacks AMD Stream support.
tomos
10th February 2009, 12:25
Lol wtf, "Now Supports Nvidia CUDA hardware Encoding"? Ehh, wasn't that already in the January update (the CES release)? See my test of Nvidia's encoder. It seems it's real GPU encoding now; I hope so, gonna retest :)
@Tomos: read carefully what he wrote. It saves H.264 into a transport stream (.m2ts) because it's mainly meant for AVCHD creation. You have some settings available, but don't expect full control over the encoder :)
oops, you're right. i'm a 'tard.
i'm just hoping ATi put some effort into this. I haven't heard anything about them helping someone the way nvidia have been helping neuron2 with any probs on his CUDA AVC/VC1 decoder.
doesn't seem that they are as serious about stream as nvidia are about CUDA. :(
Sharktooth
10th February 2009, 16:32
they are serious, they just lack human resources.
sacharja
10th February 2009, 18:18
It looks like the trial version is older than the newest version, so it lacks AMD Stream support.
Yes, I found an explanation:
A trial is not available yet, while Nvidia customers are already able to download and install a modified trial. But an ATI Stream demo will certainly follow in the next few days.
http://translate.google.com/translate?prev=hp&hl=de&u=http%3A%2F%2Fwww.computerbase.de%2Fnews%2Fhardware%2Fgrafikkarten%2F2009%2Ffebruar%2Fpowerdirector_ati_stream_cuda%2F&sl=de&tl=en&swap=1
deekey777
11th February 2009, 17:30
Note: in order to enjoy speed gains while transcoding you will need to install the ATI Avivo™ converter from the ATI website
It's a joke?
tomos
13th March 2009, 18:24
i just tried it again and the gpu assisted encoding option is greyed out - not installing avivo though. no way a program made years ago for the x1000 range is going to help with my 4850
next GPU upgrade for me will probably be nvidia unless ATi get their act together. as an ati user i cant even use neurons tools :(
Sharktooth
13th March 2009, 20:03
are you £$%&#! or what?
avivo is based on stream SDK and stream SDK works on 48xx series cards (too).
if it doesnt find a 48xx card or a supported one it reverts to software (CPU).
so, yes, avivo converter is GPU assisted thanks to stream SDK, and if a piece of software requires avivo converter, that means it will rely on it to get the accelerated encoding.
sacharja
17th March 2009, 18:15
@Sharktooth
I think tomos meant the greyed-out option in PowerDirector. I had the same experience. I would test it again, but I don't want to reinstall the necessary WMP.
BTW: Did you ever try Avivo with a HD card? During transcoding:
GPU load: 0%
CPU load: 100%
If you have watched the example movie I posted at the beginning of the thread, you know that AVIVO's speed is achieved simply by drastically reducing quality.
Sharktooth
18th March 2009, 02:23
it seems avivo GPU assisted encoding is broken with catalyst 9.2.
however it works correctly with catalyst 9.1.
Ajax_Undone
18th March 2009, 09:24
Until they trust their own encoders enough to compare them against x264 running on Nehalem (or at least Core2 Quad), we are far away from GPU encoders taking over the scene ;)
Just a question, not really aiming for much, but wouldn't it be favorable to start building codecs that can use GPU logic along with the CPU before they start getting good results? Or am I just out of my area of experience here and should shut up... :)
~DarC
Ajax_Undone
18th March 2009, 09:26
it seems avivo GPU assisted encoding is broken with catalyst 9.2.
however it works correctly with catalyst 9.1.
What's with that? I was wondering about that myself. :confused:
Sharktooth
18th March 2009, 14:18
probably a bug or just a transition phase to new APIs.