80% faster encoding? [hardware]
antdgar
18th August 2008, 02:56
Apparently the new software for ATI's 48** cards:
http://downloads.guru3d.com/ATI-Avivo-Xcode-pack-for-HD4800-series-download-1973.html
and the news story
http://xtreview.com/addcomment-id-5670-view-Four-threads-real-time-encoding-with-HD-4850.html
"ATI’s Avivo Video Converter can take a 30 minute recorded show, and convert it into a format playable by an iPod in less than 1 minutes. It can cut the conversion time by 80% or more. When compared to competitive solutions, the Avivo Video Converter easily beats them. This means you’ll be watching more, and waiting less."
They also claim to convert 1080p in real-time, "8x faster than the fastest intel core 2 duo"
"The most exciting of the new developments however has to be the implementation of faster than real time transcoding of 1080p videos. This leverages the parallel computing prowess of the 800 stream processing units on the GPU and is an additional AVT (Accelerated Video Transcoding) layer on top of ATI's Compute Abstraction Layer (CAL) that handles the GPGPU programming. Currently, only the latest CyberLink PowerDirector 7 is able to utilize the GPU for transcoding but ATI cites a 19x speed up compared to using the CPU for transcoding"
Has anybody with a new ATI card used this? And are their claims true about the H.264 encoding speeds?
My 4850 arrives on Tuesday :3
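The quoted figures mix two different ways of stating speed ("cut the conversion time by 80%" versus "8x faster" or "19x"), which are easy to confuse. As a rough sanity check, assuming "cut by 80%" means the job takes 20% of the original time, the claims can be put on one scale with a few lines of Python. The function names are made up for illustration.

```python
def speedup_from_time_cut(fraction_cut: float) -> float:
    """Convert 'conversion time cut by X' into an 'N times faster' factor."""
    if not 0.0 <= fraction_cut < 1.0:
        raise ValueError("fraction_cut must be in [0, 1)")
    return 1.0 / (1.0 - fraction_cut)


def realtime_factor(content_minutes: float, encode_minutes: float) -> float:
    """How many times faster than playback speed an encode ran."""
    if encode_minutes <= 0:
        raise ValueError("encode_minutes must be positive")
    return content_minutes / encode_minutes


if __name__ == "__main__":
    # "cut the conversion time by 80%"
    print(f"80% less time   -> {speedup_from_time_cut(0.80):.1f}x faster")
    # "a 30 minute recorded show ... in less than 1 minute"
    print(f"30 min in 1 min -> {realtime_factor(30, 1):.0f}x real time")
```

Under that reading, "80% less time" is only a 5x speed-up, well short of the 8x and 19x figures, and none of these numbers say which encoder or settings served as the baseline.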
wyti
18th August 2008, 03:09
It's easy to get high speed with an encoder designed for it.
What quality does this encoder give?
antdgar
18th August 2008, 03:11
I don't know. I have not received my 4850 graphics card yet.
Sharktooth
18th August 2008, 03:15
that version should use the GPU for encoding too. the previous version didn't. however, i really hope they improved the quality...
anyone with a 48x0 AND Vista for testing? i can't do that right now since i don't have Vista...
Dark Shikari
18th August 2008, 03:40
The last marketing claim of "8x faster than a Core 2" I saw was crap, since it was actually slower (Badaboom). They're probably comparing to a rather slow encoder.
But more to the point, given that AVIVO is often considered a benchmark for awful encoding (used for "vs H.264" comparisons to make H.264 look bad), I'm not expecting much...
antdgar
18th August 2008, 03:43
haha,
I'll test it out on Tuesday.
Sharktooth
18th August 2008, 03:45
yeah... they said 8x faster... than what? and with what encoder?
the 48x0 have huge processing power, but we need to see some test results and output quality to judge... otherwise it's all speculation...
antdgar
18th August 2008, 03:49
I believe it's similar technology to Nvidia's
http://www.youtube.com/watch?v=8C_Pj1Ep4nw
They showed that it was faster than the 'iTunes encoder'. They state the name in the video... I forgot its proper name.
Sharktooth
18th August 2008, 03:53
@antdgar: uhm... dunno. however 48x0s are waaaay more powerful than any nvidia GPUs for stream processing. the results may be wildly different.. but it will depend on the encoder implementation.
Dark Shikari
18th August 2008, 04:01
I believe it's similar technology to Nvidia's
http://www.youtube.com/watch?v=8C_Pj1Ep4nw
They showed that it was faster than the 'iTunes encoder'. They state the name in the video... I forgot its proper name.
The iTunes encoder is Apple's, which is widely known as one of the worst H.264 encoders in the entire world. :p
cogman
18th August 2008, 05:07
The iTunes encoder is Apple's, which is widely known as one of the worst H.264 encoders in the entire world. :p
Come now, Almost any encoder pales in comparison to x264. Ateme (sp?) and the Mainconcept encoder are the only ones that compare, and they are both commercial (This win goes to the GPL)
RunningSkittle
18th August 2008, 05:08
Downloading the CyberLink trial.
Will post results later.
What x264 options do you all want tested? Post command lines.
CruNcher
18th August 2008, 05:32
I doubt that this Xcodec release from June is accelerated. I have been using it since then; it's just an updated xcodec.dll (it also works on XP, you don't need Vista) (and it works with the OLD GUI on my Nvidia card). Also, nobody on any forum has ever said that it runs accelerated with his 4800 series card under Vista. I highly doubt ATI is going to release this for free without involving a third party. It will come as a free CyberLink PowerDirector 7 Ultra update very soon now. After the X2 release they put out another part, or the final version, of the 4800 Ruby raytracing (Otoy) tech demo video, which blasts Nvidia's Medusa away http://www.youtube.com/watch?v=Wi5s4yP1Kpg I'm sure they are getting ready for the big PR bang against Nvidia's recent PhysX pack release, and the CyberLink AVT update will be released as part of that ;).
@RunningSkittle
It won't work. When the update is released you will find a big press announcement on AMD's and CyberLink's pages and on every important video/GFX site on the net; just look out for the words "Accelerated Video Transcoding (AVT)". Avivo has nothing to do with this. Several press guys have already used it, though, but they are under NDA. (It's impressive to read that it can do 4 1080i streams at the same time, in parallel and in real time, and the "i" suggests they want to show off that it does the deinterlacing at the same time, I guess.)
Dark Shikari
18th August 2008, 06:50
Come now, Almost any encoder pales in comparison to x264. Ateme (sp?) and the Mainconcept encoder are the only ones that compare, and they are both commercial (This win goes to the GPL)
<cynic>Ateme and Mainconcept compare rather badly</cynic>
But more seriously, I'm not just saying in relation to x264; I'd rather use pretty much any encoder other than Apple's.
Snowknight26
18th August 2008, 06:56
I like how when you download it from Guru3D, the file name contains 8_6 in reference to Catalyst 8.6.. almost 2 months old now. :\
G_M_C
18th August 2008, 09:09
I think GPGPU in general, and specifically for compute-intensive tasks, is a very good development. But the encoders I see now, from both Nv and Ati, are just too limited in options and quality to seriously consider using. They might be faster, but that's about the only thing they have going for them atm.
In the same "genre" there is Intel's "Larrabee"; that could also be a very good development for compute-intensive tasks like encoding.
But in general I think that the encoders developed by Ati/Nv/Intel won't ever be as good as x264. There will of course be very specialized, and very expensive, commercial products.
But I think it will be up to the x264 community to port x264, or develop a hardware-assisted version, because that will be the only way we (as a community) will get a good-quality encoder on hardware.
Dark Eiri
18th August 2008, 10:42
We'll have to see the quality, though. That Badaboom thing is truly awful in this respect, and slower than most Core 2 Duos for SD content.
CruNcher
19th August 2008, 07:03
I think GPGPU in general, and specifically on compute-intensive tasks is a very good development. But the encoders i see now, from botn Nv and Ati, are just too limited in options and quality to seriously consider using. They might be faster, but thats about the only thing they have going for them atm.
In the same "genre" there is intels "Larrabee"; That could also be a very good development for compute-intensive tasks, like encoding.
But in general i think that the encoders that are developed by Ati/Nv/Intel wont ever be as good as X264. There will offcourse be very specialized, and very expesive, commercial products.
But I think it will be up to the X264 community to port X264, or develop a hardware-assisted version1, because that will be the only way we (as community) will get a good quality encoder on hardware.
No ATI GPU solution has been released yet, and I doubt the software encoder will have anything to do with the GPU encoder. The question remains, though, who is providing it: AMD Research, CyberLink's most-used SDK provider Mainconcept, or does it really come from CyberLink themselves?
If it really is CyberLink's own creation, then I somehow doubt that it's based on the Mainconcept SDK, so maybe they have written it completely from scratch with help from AMD/ATI. And this is also the problem: if you write an encoder from scratch, and for the GPU, you can't expect it to beat an encoder that has been in development much longer and has been optimized and fine-tuned for years (x264). From this viewpoint I think the ETI (startup) encoder isn't that bad, but I hope to see even better results from veterans like ATI/CyberLink (compared to ETI's encoder) :D
G_M_C
19th August 2008, 11:50
No ATI GPU Solution has been released yet i doubt the Software Encoder will have anything todo with the GPU Encoder. The Question remains tough who's providing it AMD Research, Cyberlinks most used SDK Provider Mainconcept or does it really come from Cyberlink themselves ?
If it really is Cyberlinks own creation then i somehow doubt that it's based on the Mainconcept SDK so they maybe have completely written it from Scratch with the help from AMD/ATI. And this is also the Problem if you write a Encoder from Scratch and for the GPU you can't expect it to beat a Encoder that has been much longer in Development and Optimized and Fine tuned since years (X264). I think under this viewpoint the ETI (Startup) Encoder isn't that Bad, but i hope to see even better results from Veterans like ATI/Cyberlink (compared to ETI's Encoder) :D
I think the best solution we all can hope for is that Nv & Ati decide to help out the Open source community; Like Anand said, in his article about Badaboom
"... but it's a sad day when a video enthusiast has to look to Cyberlink to save the day. What both AMD and NVIDIA need to do is help the open source community and existing codec developers include GPU acceleration in their software today."
See also Anand's review (where he actually thrashes Badaboom) here: http://www.anandtech.com/video/showdoc.aspx?i=3374
I can only agree with him. Hopefully the future brings something better, and hopefully the OS community doesn't get left out!
Sharktooth
19th August 2008, 13:08
AMD/ATI is usually more committed to the OS community however this is at the beginning, so im sure we will have to wait...
antdgar
21st August 2008, 03:16
Ahh, we'll have to wait until Cyberlink releases their software
Blue_MiSfit
21st August 2008, 03:40
Apple's encoder is bad, but it's gotten a bit better recently (at least the encoder you get with Compressor 3). I was pretty impressed; even though it's slower than hell, it didn't do nearly as badly as I remember.
~MiSfit
G_M_C
21st August 2008, 10:01
Ahh, we'll have to wait until Cyberlink releases their software
"....it's a sad day when a video enthusiast has to look to Cyberlink to save the day."
CruNcher
21st August 2008, 15:58
It actually means there are limits to OSS, like heavy investment. Remember what Dark said: it needs millions to do it. ETI has 7 of them now, and CyberLink has much more in the bank ;)
Though it seems ETI doesn't use CUDA for the accelerated decoding of the stream. They use the PureVideo decoder directly, which means they must have been granted access to it, and currently that requires not being OSS at all. They send the bitstream into the PV2 decoder and then pass the accelerated output to their CUDA encoder. No OSS is able to utilize such a PV2 framework yet (and most probably never will).
Sulik
21st August 2008, 18:17
Actually, I was just taking a look at the CUDA 2.0 Beta2 SDK, and it does provide VP decoding functionality in the API, and there is even an open-source sample decoder (the VideoDecode sample). I would think someone could write an AviSynth source that uses that (seems like it would be easy since the HW does all the work - the source seems fairly simple)
CruNcher
21st August 2008, 19:56
Sulik, that is indeed crazy good news. Finally it seems to be open then, so even MPlayer and VLC could utilize it now :)
bond
21st August 2008, 20:17
now someone only needs to patch ffmpeg for it ;)
edison
22nd August 2008, 10:21
The NVIDIA HW decoder may not support lossless H.264 video.
G_M_C
22nd August 2008, 11:50
The NVIDIA HW decoder maybe does not support lossless h264 video.
That's Nv's problem; Actually, this discussion is about the ENcoding of things.
But your reply points to one of the problems: the development of a HW-assisted ENcoder is made more difficult by the fact that Ati/AMD's hardware works in a totally different way from Nv's. It is made even more difficult because both are very secretive about their stuff.
Developing HW-assisted software in general would benefit from having some common programming language. There seems to be a slow movement towards such a language, but actual usability is still far off in the future.
My guess is that Intel will eventually "win out"; simply because they will focus on "general usability", but mainly because they are simply big enough and do not have to be as secretive about their hardware as Nv/Ati are now. I would not be surprised if, further in the future, Intel also brought Larrabee to us in a "many-core" solution; something like a 32-core Larrabee paired up with a 4-core Nehalem, all on 1 die.
All in all, Nv/Ati had better get their act together and devise some common C#-language that works on both their GPUs, or they will eventually get left out.
But that's my $0.02 on HW-assisted programming as it stands now, and why HW-assisted x264 will not come in the near future..
Sulik
22nd August 2008, 18:11
The NVIDIA HW decoder maybe does not support lossless h264 video.
I don't think that's likely to change, no matter if it's NV, ATI or any other HW, since lossless is not part of Main/High profiles.
I for one don't really give a crap about lossless support (might as well use uncompressed YUV)
lucassp
23rd August 2008, 11:57
Actually, I was just taking a look at the CUDA 2.0 Beta2 SDK, and it does provide VP decoding functionality in the API, and there is even an open-source sample decoder (the VideoDecode sample). I would think someone could write an AviSynth source that uses that (seems like it would be easy since the HW does all the work - the source seems fairly simple)
It would be really nice if neuron2 could add this feature to his great DGAVCIndex :)
kemuri-_9
23rd August 2008, 16:08
DGAVCIndex works from ffdshow tryouts' libavcodec, would just be easier to see it implemented there and he wouldn't have to worry about it for himself.
neuron2
24th August 2008, 18:57
DGAVCIndex works from ffdshow tryouts' libavcodec, would just be easier to see it implemented there and he wouldn't have to worry about it for himself.
I'm disappointed with libavcodec and am looking for an alternative decoding engine. With the length of time we've been waiting for PAFF fixes, I think CUDA support is unlikely to arrive in our lifetime. I'm going to look into this.
LoRd_MuldeR
24th August 2008, 21:38
I'm disappointed with libavcodec and am looking for an alternative decoding engine.
Is there any other OpenSource decoder for H.264 in existence? :confused:
I don't think you are going to implement your own ^^
neuron2
24th August 2008, 21:53
Is there any other OpenSource decoder for H.264 in existence? :confused:
We're talking about using the HW decoder on the video card via CUDA.
Dark Shikari
25th August 2008, 00:49
We're talking about using the HW decoder on the video card via CUDA.
You're welcome to try to write one, but given my experience with CUDA, I'll warn you that it's going to be much much much much harder than implementing an equivalent software decoder.
And given the amount of work that went into libavcodec, well...
Sulik
25th August 2008, 01:51
You're welcome to try to write one, but given my experience with CUDA, I'll warn you that its going to be much much much much harder than implementing an equivalent software decoder.
And given the amount of work that went into libavcodec, well...
CUDA 2.0 has a dedicated api that enables using the GPU's VP engine (not 3D) to perform MPEG-2 & H.264 decoding: no need to write complex cuda kernels, only need to use cuMemcpyDtoH to transfer the output back to system memory for regular CPU processing (with maybe some conversion, since it appears that the output of the HW decode is in NV12 format).
There is a "videodecode" sample in the SDK, and the whole decoding process is like 4-5 function calls per frame, followed by a cuda kernel to convert from NV12 to RGB (not needed if we want to keep everything in YUV).
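For anyone who wants to keep the decoded output in YUV: NV12 stores a full-resolution Y plane followed by a single half-resolution plane of interleaved U and V bytes, so turning it into planar I420 is only a de-interleave. The sketch below shows that step on the CPU in plain Python. It assumes a tightly packed frame (row pitch equal to width), which real decoder output often does not guarantee, and the function name is made up for illustration; it is not part of the CUDA SDK.

```python
def nv12_to_i420(frame: bytes, width: int, height: int) -> bytes:
    """De-interleave a tightly packed NV12 frame into planar I420.

    NV12: [Y plane: w*h bytes][UV plane: w*h/2 bytes, U0 V0 U1 V1 ...]
    I420: [Y plane: w*h bytes][U plane: w*h/4][V plane: w*h/4]
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 frames need even width and height")
    y_size = width * height
    uv_size = y_size // 2
    if len(frame) != y_size + uv_size:
        raise ValueError("unexpected frame size (is the pitch padded?)")

    y_plane = frame[:y_size]
    uv_plane = frame[y_size:]
    u_plane = uv_plane[0::2]  # even bytes are U samples
    v_plane = uv_plane[1::2]  # odd bytes are V samples
    return y_plane + u_plane + v_plane


if __name__ == "__main__":
    # Tiny 4x2 frame: 8 luma bytes, then U0 V0 U1 V1.
    nv12 = bytes(range(8)) + bytes([100, 200, 101, 201])
    print(list(nv12_to_i420(nv12, 4, 2)))
```

With a padded pitch, each row would have to be cropped to `width` bytes before this step.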
LoRd_MuldeR
25th August 2008, 02:01
CUDA 2.0 has a dedicated api that enables using the GPU's VP engine (not 3D) to perform MPEG-2 & H.264 decoding: no need to write complex cuda kernels, only need to use cuMemcpyDtoH to transfer the output back to system memory for regular CPU processing (with maybe some conversion, since it appears that the output of the HW decode is in NV12 format).
There is a "videodecode" sample in the SDK, and the whole decoding process is like 4-5 function calls per frame, followed by a cuda kernel to convert from NV12 to RGB (not needed if we want to keep everything in YUV).
But I assume it has the same limitations that all the "Hardware" decoder have. So it's not really an alternative to libavcodec ...
neuron2
25th August 2008, 02:07
Well, I have just run the sample successfully, breaking at the decoded picture callback just for fun. I've looked at the code and see that I can remove their parser and push bitstream into the decoder and receive decoded pictures back. It operates just the way I am using libavcodec! It looks pretty simple to me.
So will the naysayers please be a little more specific about your misgivings? LoRd_MuldeR, what limitations are you referring to? And why do you think it's not an alternative to libavcodec for my application?
Dark Shikari
25th August 2008, 02:12
Well, I have just run the sample successfully, breaking at the decoded picture callback just for fun. It operates just the way I am using libavcodec! I've looked at the code and see that I can remove their parser and push bitstream into the decoder and receive decoded pictures back. It looks pretty simple to me.
So will the naysayers please be a little more specific about your misgivings? LoRd_MuldeR, what limitations are you referring to? And why do you think it's not an alternative to libavcodec?
DXVA is rather restrictive in terms of its support for levels and profiles, but more importantly, last I heard it has some odd arbitrary restrictions involving numbers of B-frames/etc.
I assume that, since they use the same engine, CUDA's interface will have the same limitations as the DXVA acceleration.
LoRd_MuldeR
25th August 2008, 02:13
Well, I have just run the sample successfully, breaking at the decoded picture callback just for fun. It operates just the way I am using libavcodec! I've looked at the code and see that I can remove their parser and push bitstream into the decoder and receive decoded pictures back. It looks pretty simple to me.
So will the naysayers please be a little more specific about your misgivings? LoRd_MuldeR, what limitations are you referring to? And why do you think it's not an alternative to libavcodec?
Well, all those "Hardware Accelerated" H.264 decoders have pretty harsh limitations on the maximum number of reference frames and b-frames.
These use DXVA, I know. But since we talk about hardware limitations here, I have to assume that CUDA has the very same restrictions, unless they can be worked around somehow...
BTW: In case you switch to CUDA, could users without recent NVIDIA hardware still use your program in "Software" mode?
I assume that, since they use the same engine, CUDA's interface will have the same limitations as the DXVA acceleration.
That's pretty much what I thought. So CUDA can't serve as a replacement for libavcodec...
neuron2
25th August 2008, 02:21
I'm not talking about replacing libavcodec. I'm talking about offering an alternative engine for use in appropriate applications.
BTW: In case you switch to CUDA, could users without recent NVIDIA hardware still use your program in "Software" mode?
I'm just experimenting with it right now. The problem for me is that I am blocked in certain things by bugs in libavcodec. I don't have an eternity of time to get up to speed on libavcodec internals and try to fix the issues, and we've been waiting a long time and are still waiting just for correct decoding of legal streams. Sure, there have been some fixes, but most of my broken PAFF streams remain broken with the latest code, not to mention that regressions break some other things in DGAVCDec, and I'd have to embark on a lengthy debug for those.
So, if I need correct decoding of PAFF, why not use an existing decode engine if possible, and especially if it performs much better.
To answer your question, if a correctly functioning libavcodec is ever forthcoming, I will support it. If I also can achieve correct decoding in some applications by using a GPU solution, then what's not to like about it? You click an option to choose the CUDA decoding engine, and if it works better you keep it, if not, you select libavcodec.
LoRd_MuldeR
25th August 2008, 02:28
Still my questions:
1. Do H.264 decoders based on CUDA have the same Profile/Level/B-Frame/Ref-Frame restrictions as the existing DXVA decoders? (I have to assume: Yes)
2. Will programs based on the CUDA SDK fall back to "Software" mode when there is no recent NVIDIA hardware or will CUDA simply refuse to work?
neuron2
25th August 2008, 02:31
1. Do H.264 decoders based on CUDA have the same Profile/Level/B-Frame/Ref-Frame restrictions as the existing DXVA decoders? (I have to assume: Yes)
I don't know. I'll have to look into it. I do know that it played the samples I tried it with. Anyway, we all know that DivX will set the standards for all this. :) I assume they will have a profile for DXVA-like decoders.
2. Will programs based on the CUDA SDK fall back to "Software" mode when there is no recent NVIDIA hardware or will CUDA simply refuse to work?
That's an implementation decision. As I said, probably there will be an option to select the decode engine, and if you try to select CUDA it would be rejected if the appropriate hardware were not present.
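The two behaviours being discussed (reject a CUDA selection when the hardware is missing, or quietly fall back to software) can be sketched as follows. This is only an illustration of the design choice, not DGAVCDec code: the names `Engine`, `EngineUnavailable` and both functions are hypothetical, and hardware detection is reduced to a boolean passed in by the caller.

```python
from enum import Enum


class Engine(Enum):
    LIBAVCODEC = "libavcodec"  # software decoding
    CUDA = "cuda"              # hardware decoding via the video card


class EngineUnavailable(Exception):
    """Raised when the requested engine cannot run on this machine."""


def select_engine(requested: Engine, cuda_hw_present: bool) -> Engine:
    """Strict policy: reject CUDA if no suitable hardware is present."""
    if requested is Engine.CUDA and not cuda_hw_present:
        raise EngineUnavailable("CUDA engine selected but no suitable hardware found")
    return requested


def select_engine_with_fallback(requested: Engine, cuda_hw_present: bool) -> Engine:
    """Lenient policy: fall back to software decoding instead of failing."""
    try:
        return select_engine(requested, cuda_hw_present)
    except EngineUnavailable:
        return Engine.LIBAVCODEC


if __name__ == "__main__":
    print(select_engine_with_fallback(Engine.CUDA, cuda_hw_present=False))
```

The strict variant matches the "it would be rejected" behaviour described above; the fallback variant is what the question about "Software" mode is asking for.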
woah!
25th August 2008, 03:25
well i did a quick test using atixcoder and x264 using 1 pass, as that's what the xcoder is doing. from a 1080p clip down to 720x400.
here are the 2 encode results. xcoder did it in 13 secs and x264 in 43 secs.
http://tinyurl.com/49ah8u
x264 settings were : -B 2580 --nf --no-cabac --subme 1 --no-chroma-me --partitions none --me dia --merange 8 --threads auto --thread-input --progress --no-psnr --no-ssim --output
i set xcoder to 4000 and it came out at 2580 bitrate, so i used that in x264. for a really quick and dirty encode this thing is very fast...
Ranguvar
25th August 2008, 04:01
Can you see if you can hit the same speed with x264, so we can make a better visual quality comparison?
Use --no-b-adapt (--b-adapt 0), --non-deterministic, and/or --merange 4.
woah!
25th August 2008, 04:22
Can you see if you can hit the same speed with x264, so we can make a better visual quality comparison?
Use --no-b-adapt (--b-adapt 0), --non-deterministic, and/or --merange 4.
even with those settings added i can't gain a 3x speedup :(
i seem to be getting only 40% of my 4 cores tho? if i could get the other 60% going too then i think x264 can get close to the speed. but before anyone says anything, the atixcoder is only using 56% of my cores, which means it could go 2x faster as well.
i believe this is because of the 1080p source i am using, not dvd res stuff.
Ranguvar
25th August 2008, 05:01
Probably the reason (at least in x264) is the fact that frametype decision is not threaded yet (IIRC). This causes less-than-perfect CPU usage on very fast settings (fast first pass, for example).
Dark Shikari
25th August 2008, 05:07
It's probably because he's downscaling, so the bottleneck is in the scaler, not in encoding; x264 is not to blame here.
Also, he could get faster by using --scenecut=-1 and --no-dct-decimate.
woah!
25th August 2008, 05:28
ok now it gets interesting, at 1080p with no resizing, x264 walks all over atixcoder in speed. atixcoder only uses 60% cpu to get 36fps, while x264 uses 100% and gets 53fps.
--nf --no-cabac --non-deterministic --subme 1 --no-b-adapt --no-chroma-me --partitions none --me dia --merange 4 --threads auto --thread-input --progress --no-psnr --no-ssim --output
http://tinyurl.com/68e972
adding DS's suggestions above (--scenecut=-1 and --no-dct-decimate) gained 1.3fps :)
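Putting the two sets of numbers from this test side by side makes the swing clearer. The short Python sketch below only redoes the arithmetic on the figures posted above (13 s vs 43 s with downscaling; 36 fps at 60% CPU vs 53 fps at 100% CPU without). The CPU-normalised figure assumes throughput scales linearly with CPU usage, which is an assumption and rarely holds exactly; the function names are made up for illustration.

```python
def speed_ratio(slower_seconds: float, faster_seconds: float) -> float:
    """How many times faster the quicker encode finished."""
    return slower_seconds / faster_seconds


def fps_at_full_cpu(fps: float, cpu_fraction: float) -> float:
    """Naive extrapolation of fps to 100% CPU (assumes linear scaling)."""
    if not 0.0 < cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in (0, 1]")
    return fps / cpu_fraction


if __name__ == "__main__":
    # 1080p -> 720x400: xcoder 13 s, x264 43 s
    print(f"downscaled: xcoder {speed_ratio(43, 13):.2f}x faster than x264")
    # 1080p, no resize: xcoder 36 fps, x264 53 fps
    print(f"no resize:  x264 {53 / 36:.2f}x faster than xcoder")
    # xcoder only used about 60% CPU in the second run
    print(f"xcoder at full CPU (naive): {fps_at_full_cpu(36, 0.60):.0f} fps")
```

Neither run says anything about output quality, which is the comparison people in the thread keep asking for.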
CruNcher
25th August 2008, 05:39
Still my questions:
1. Do H.264 decoders based on CUDA have the same Profile/Level/B-Frame/Ref-Frame restrictions as the existing DXVA decoders? (I have to assume: Yes)
2. Will programs based on the CUDA SDK fall back to "Software" mode when there is no recent NVIDIA hardware or will CUDA simply refuse to work?
Nvidia balanced everything out for Blu-ray, HD DVD and broadcast playback (so did ATI). Where is the problem with that? I don't understand users' issues with this.
It's pretty stable, since all the filters and stuff have adapted to CyberLink's decoder, and Level 4.1 is really not that restrictive. See, DivX is going to use Level 4.0, and that is not much more restrictive either. Though if you get problems with B-pyramid, I guess x264 is to blame for that, not Nvidia, and if you get VBV problems it's the same.
And for sure the restrictions here won't ever cause a quality problem with your encodes. But doing stuff like --bframes 16 --ref 16 and more insane things is just crazy and calling for trouble. (Do you really trust x264's adaptive decisions that much over your own viewing experience? Hell, I don't even trust its partition decisions :P on a visual level; does that make me a fool or an outsider in your eyes?) I think you would be surprised how much speed or efficiency you throw out of the window for some sources, without any subjectively visible difference in the end. (But yeah, seeing some 100 pixels more in a frame-by-frame comparison makes you believe in this stuff.) That you might have invested a 50% speed loss for that small, negligible difference per frame isn't of interest, right?
Imho Psy RD and Psy Trellis are the real advantages (especially for film-like sources), but they come at a high price (RD without them is useless most of the time, especially at high bitrates). At lower complexity levels, though, x264 has only VAQ to push it forward visually, and those lower complexity levels are what DivX is targeting. There it will be a close call; with a better VBV implementation it could even end with a lot of people becoming upset with x264 :( (I know the DivX encoder isn't that far along yet and is still slower, though they are targeting this interoperability with hardware devices from the ground up. They are being really careful not to mess it up while still trying to guarantee very good visual quality even with these restrictions upon them, the same situation, btw, that Ateme has had to fight with for years.)
@neuron2
Hmm Donald, I guess you have no access over the API to Nvidia's shader-based motion-adaptive deinterlacer, right?
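To put a number on the Level 4.1 versus --ref 16 point above: the H.264 level limits cap the decoded picture buffer in macroblocks (MaxDpbMbs), and the number of frames that fit depends on resolution. The sketch below computes that cap for progressive frames, using MaxDpbMbs values from Table A-1 of the specification. Whether the DXVA or CUDA decode paths enforce exactly this limit is not established in this thread, so treat it as the spec-side constraint only; the function name is made up for illustration.

```python
# MaxDpbMbs per level, from Table A-1 of the H.264 specification (subset).
MAX_DPB_MBS = {
    "3.1": 18000,
    "4.0": 32768,
    "4.1": 32768,
    "5.1": 184320,
}


def max_ref_frames(level: str, width: int, height: int) -> int:
    """Maximum DPB size in frames for a progressive stream at this level."""
    mbs_wide = (width + 15) // 16   # macroblocks are 16x16 pixels
    mbs_high = (height + 15) // 16
    frame_mbs = mbs_wide * mbs_high
    return min(MAX_DPB_MBS[level] // frame_mbs, 16)


if __name__ == "__main__":
    for w, h in [(1920, 1080), (1280, 720), (720, 400)]:
        print(f"Level 4.1 @ {w}x{h}: up to {max_ref_frames('4.1', w, h)} frames")
```

So an encode made with --ref 16 at 1080p is well outside Level 4.1, while the same setting at 720x400 still fits.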
neuron2
25th August 2008, 06:13
@neuron2
Hmm Donald, I guess you have no access over the API to Nvidia's shader-based motion-adaptive deinterlacer, right?
Don't know. Do I?
G_M_C
25th August 2008, 09:59
[...] I think CUDA support is unlikely to arrive in our lifetime[...]
I concur, see also my post;
That's Nv's problem; Actually, this discussion is about the ENcoding of things.
But your reply points to one of the problems; The development of a HW-assisted ENcoder is made more difficult by the fact that Ati/AMD's hardware works in totally different way than Nv's. It is made even more difficult because both are very secretive about their stuff.
Developing HW-assisted software in general would benifit from having some common programming language. There seems to be a slow movement to get such a language, but actual usabillity is still far off into the future.
My guess will be that Intel will eventually "win out"; Simply because they will focus on "general usabillity", but mainly because they are simply big enough and do not have to be so secretive about their hardware and Nv/Ati are now. Iwould not be surprised when in the further future Intel might also just bring Larrabee to us in a "many-core" solution; something like a 32-core larrabee paired up with a 4-core Nehalem, but all on 1 die.
All-in-all, Nv/Ati had better get their acts together and device some common C#-language that works on both their GPU's, or they will eventually get left out.
But thats my 0,02$ on HW-assisted programming as it stands now; And why HW-assisted x264 will not come in the near future..
The big problem remains that it is virtually impossible to write any kind of HW-assisted software that is usable by everybody (the same thing LoRd_MuldeR is concerned about). Simply because there is no common programming language, no common ground in hardware, and no common way to compile these programs. So you need more than 1 version: 1 for every different kind of graphics card, and if you're "really in luck" (note: sarcasm) also 1 for every generation of graphics board ...
There really needs to be a common language/compiler etc. to make this whole venture more feasible imho.
CruNcher
25th August 2008, 10:36
@G_M_C
ATI is moving away from their CAL/METAL base to OpenCL, but I doubt Nvidia will drop their whole CUDA idea, at least not in the foreseeable future :D
Though imho GPU gets more successful each day; more video stuff is coming out which has GPU support, on the commercial side of things at least. Here is something very interesting with GPU support (http://www.motiondsp.com/products/IkenaReveal) (and it's easy to see for which GPU ;) )
@neuron2
lol Do you ? :)
Gusar
25th August 2008, 12:20
There really needs to be a common language/compiler etc. to make this whole venture more feasible imho.
Enter Intel Larrabee. It's x86. You can write plain old C code for it and compile it with your compiler of choice (MSVC, gcc, ...). CUDA is not the answer for hardware accelerated decoding/encoding (just read about Dark Shikari's adventures with it), Larrabee is - well, we'll see once it's released.
lucassp
25th August 2008, 12:53
You don't need large amounts of GPGPU/CUDA specific programming to access the VP2 Decoder. nVidia simply offered direct access to the VP2 Decoder through CUDA...and we can use CUDA freely.
Even though it's BluRay, HD DVD, DVB-S/T limited, it's still way faster than any software decoder.
G_M_C
25th August 2008, 12:59
Enter Intel Larrabee. It's x86. You can write plain old C code for it and compile it with your compiler of choice (MSVC, gcc, ...). CUDA is not the answer for hardware accelerated decoding/encoding (just read about Dark Shikari's adventures with it), Larrabee is - well, we'll see once it's released.
That's what I said & what I think ;)
click (http://forum.doom9.org/showthread.php?p=1173558#post1173558)
You don't need large amounts of GPGPU/CUDA specific programming to access the VP2 Decoder. nVidia simply offered direct access to the VP2 Decoder through CUDA...and we can use CUDA freely.
Even though it's BluRay, HD DVD, DVB-S/T limited, it's still way faster than any software decoder.
That's nV, but what if you have an Ati card ... or Intel onboard? That's the problem: interchangeability!
lucassp
25th August 2008, 13:11
You can write plain old C code for it and compile it with your compiler of choice (MSVC, gcc, ...)
False. You still need a custom compiler. It has been discussed before.
@G_M_C: ATi should do the same thing. They should offer access to UVD through the FireStream SDK.
I don't think Intel has H264 decoding capabilities yet.
Sharktooth
25th August 2008, 13:33
a compiler with larrabee support or a custom compiler.
that means, nothing is "automatic"...
G_M_C
25th August 2008, 13:57
False. You still need a custom compiler. It has been discussed before.
a compiler with larrabee support or a custom compiler.
that means, nothing is "automatic"...
And how is that different from having to use CUDA or some other "GPU-specific" compiler/language?
Larrabee might be at its best when used with a specific compiler, but it runs x86 code, and x86 code is much more general than CUDA, Brook, or whatever other GPU language. That's why Larrabee will probably be easier to use: there are simply many more people with x86 knowledge than with CUDA knowledge. And given Intel's track record of making very decent (if not very good) compilers, you can expect Larrabee to arrive together with its Larrabee-enabled compiler.
slavickas
25th August 2008, 14:16
err guys, do you have too much free time, speculating about what Laterbe will be? (ok you can ;) but imho it has nothing to do with this thread)
somewhat on topic: anybody recently checked IPP's H.264 decoder feature wise? maybe it could be substitute for lavc , hey even quite likely it will get laterbee support whenever it gets released
lucassp
25th August 2008, 14:35
And how is that different from having to use CUDA or some other "GPU-specific" compiler/language?
The VP2 is a stand alone part of the GPU silicon. The only way we (Open Source Community) can access it is by using the CUDA API. You have to use CUDA to send the stream to the GPU and to get the resulting image from there. You don't need to write any highly parallel/CUDA algorithm to do this.
G_M_C
25th August 2008, 14:44
The VP2 is a stand alone part of the GPU silicon. The only way we (Open Source Community) can access it is by using the CUDA API. You have to use CUDA to send the stream to the GPU and to get the resulting image from there. You don't need to write any highly parallel/CUDA algorithm to do this.
Yes, on nVidia hardware. What if I have an Ati/AMD HD4870? CUDA won't work on that ... That's the whole problem with GPU-assisted programs: you need to make more than 1 version.
neuron2
25th August 2008, 14:55
I don't have to make multiple versions. If someone wants to use my HW assisted Nv support, then they buy an Nv card. If not, they fall back to the SW decoding. I might *choose* to support other platforms, but I don't have to, just like I don't have to support Linux.
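The arrangement described above (use the hardware path when the card is there, otherwise fall back to software decoding) can be sketched roughly as follows. This is only an illustration: `probe_hardware_decoder`, `software_decode` and `pick_decoder` are hypothetical names, the probe is a stub that always reports "no hardware", and nothing here calls a real CUDA or libavcodec API.

```python
from typing import Callable, Optional

# A decoder is modelled as a function from a bitstream to some decoded result.
Decoder = Callable[[bytes], str]


def software_decode(bitstream: bytes) -> str:
    # Stand-in for a software decoder; it only reports what it was given.
    return f"sw:{len(bitstream)} bytes"


def probe_hardware_decoder() -> Optional[Decoder]:
    # Hypothetical probe. A real one would ask the driver whether a usable
    # hardware decoder is present; this stub always says there is none.
    return None


def pick_decoder(probe: Callable[[], Optional[Decoder]] = probe_hardware_decoder) -> Decoder:
    # Prefer the hardware path when the probe finds one, else fall back.
    hw = probe()
    return hw if hw is not None else software_decode


decode = pick_decoder()
print(decode(b"\x00\x00\x01"))  # sw:3 bytes
```

With this shape only one code path has to exist for every user (the software one); the hardware path is an optional extra for those who have the card.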
lucassp
25th August 2008, 15:06
Yes, on nVidia hardware. What if I have an Ati/AMD HD4870? CUDA won't work on that ... That's the whole problem with GPU-assisted programs: you need to make more than 1 version.
That's why I'm hoping ATi will include UVD access into its FireStream SDK or OpenCL.
For the moment even a CUDA/VP2-based AviSynth filter would be nice if it can get rid of libavcodec's problems.
Sergey A. Sablin
25th August 2008, 20:24
somewhat on topic: anybody recently checked IPP's H.264 decoder feature wise? maybe it could be substitute for lavc , hey even quite likely it will get laterbee support whenever it gets released
not sure about the new high444 profile, but everything else seems to work. paff should definitely work.
and there should be no doubt that it will support larrabee.
Dark Shikari
25th August 2008, 20:30
not sure about the new high444 profile, but everything else seems to work. paff should definitely work.
and there should be no doubt that it will support larrabee.
It sounds like you've tested the IPP decoder--out of curiosity, how fast is it?
Sharktooth
25th August 2008, 20:41
@G_M_C: even more ppl know C/C++... and still have problems with CUDA...
the x86 thing is all fuzz. basically larrabee is nothing more than a stream processor. most ppl will use C or other languages to access it thru the APIs...
Sergey A. Sablin
25th August 2008, 20:41
It sounds like you've tested the IPP decoder--out of curiosity, how fast is it?
last time was half a year ago... so just off the top of my head - maybe 15% slower than coreavc (on the content I've used). and that was the previous release of IPP (5.2 beta or something)
crypto
25th August 2008, 22:01
I don't have to make multiple versions. If someone wants to use my HW assisted Nv support, then they buy an Nv card. If not, they fall back to the SW decoding. I might *choose* to support other platforms, but I don't have to, just like I don't have to support Linux.
I like the idea. And it is always good to have an alternative, if someone runs into problems with libavcodec.
edison
27th August 2008, 07:19
Larrabee is not just x86: each core is "1 scalar x86 ALU + 1 SIMD16 ALU", and you need to learn the new ISA extension, LRBni, to achieve full performance. If you don't, you get only 1/17 (maybe even less) of Larrabee's performance.
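The 1/17 figure follows from simple lane counting. A small sketch, assuming (this is an assumption, not something Intel has stated here) that the scalar ALU and each of the 16 SIMD lanes can each retire one operation per cycle:

```python
from fractions import Fraction

SCALAR_LANES = 1   # the scalar x86 ALU
VECTOR_LANES = 16  # the SIMD16 ALU


def scalar_only_fraction(scalar: int = SCALAR_LANES, vector: int = VECTOR_LANES) -> Fraction:
    # Share of per-core peak throughput left when only the scalar ALU is used,
    # under the one-operation-per-lane-per-cycle assumption above.
    return Fraction(scalar, scalar + vector)


print(scalar_only_fraction())  # 1/17
```

Plain x86 code that never touches the vector unit uses 1 lane out of 17, which is where the "maybe even less" caveat starts from.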
Sulik
27th August 2008, 07:59
I have a feeling that Larrabee will end up much more similar to CUDA than to standard x86 development, and will have similar data flow restrictions and performance pitfalls.
Even though it does have the advantage of the familiar x86 ISA, most of the asm code developed on x86 today is there to take advantage of SIMD instructions, which will be quite different from SSEx on Larrabee (maybe that's why Intel/MS are trying to push everyone to use intrinsics rather than inline assembly, even though it actually makes the code less readable - to me at least).
Though the generalized L1 cache vs shared memory might avoid some of the severe performance problems of CUDA on the current generation of GPUs, no one really knows how much CUDA perf will improve in the next generation of GPUs available by the time Larrabee hits the shelves.
MfA
27th August 2008, 17:40
From what Intel has presented it seems Larrabee won't even have a fast SAD instruction (or any fast byte wide SIMD instructions at all for that matter, everything is upgraded to 32 bit before processing).
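For anyone unfamiliar with the operation: SAD (sum of absolute differences) is the block-matching cost that motion estimation evaluates over and over. A plain reference version, as a sketch only (the function name `sad` and the sample values are made up for illustration):

```python
def sad(block_a: bytes, block_b: bytes) -> int:
    # Sum of absolute differences between two equally sized blocks of
    # 8-bit samples. Iterating over bytes yields ordinary integers, so each
    # sample is effectively widened before the subtraction, much like the
    # byte-to-32-bit upgrade described above.
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of samples")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


current = bytes([10, 20, 30, 40])
reference = bytes([12, 18, 30, 45])
print(sad(current, reference))  # 9
```

A byte-wide SIMD instruction does this for many samples at once; if every sample has to be widened to 32 bits first, each vector operation covers fewer samples.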
canTsTop
29th August 2008, 20:33
Is the GPU really that much more powerful than the CPU? :devil:
http://www.youtube.com/watch?v=fKK933KK6Gg
CruNcher
30th August 2008, 06:32
imho it just shows the difference between single-threaded serial processing (shoot, wait, shoot, wait) and multi-threaded parallel processing (shoot all at once). A CPU can do that too, but it's correct that the GPU currently has many more cores to shoot with at once, though as Dark_Shikari said, with CUDA it's currently much harder to program this (and to make it efficient) :) see the Patio Problem ETI had to solve with their H.264 encoder.
It compares well with the size of the 2 devices here: the CPU is small and it has been known for years how it works, while the GPU is a big, heavy machine, new unknown land for many and harder to do maintenance work on. I guess next time, when those 2 guys are invited by Intel, things will look different again: then you have 2 of these big machines and other things count.
http://www.youtube.com/watch?v=lX24AGLyYU4
http://www.youtube.com/watch?v=U7R90A7ZqLU <- even more powerful than the current PSP (SD H.264) in video; Tegra does HD H.264 720p
http://www.youtube.com/watch?v=0UQVHJ3UFaA <- gaming as well
http://www.youtube.com/watch?v=XXYshhuJzh4 <- power draw is amazing (ARM11 core + GPU acceleration, currently running on Windows Mobile) :)
http://www.youtube.com/watch?v=DeupVAMnvFA <- the whole industry freaks out
http://www.youtube.com/watch?v=TVs6As1_ZT4 <- 3D GPS this year, mobile devices (like HD (3D) phones) next year, and HD PMPs at the end of this year :)
http://www.youtube.com/watch?v=OvBqtx43x90 <- I'm quite sure it's going to use Nvidia's APX 2500 platform, maybe with Android on top of it :)
neuron2
6th September 2008, 04:44
Hmm Donald i guess you have no access over the API to Nvidias Shader based Motion Adaptive Deinterlacer right ?
Actually, it turns out that I do. And to the 3:2 engine as well.
Ranguvar
6th September 2008, 04:56
Niiiice....