AviShader (Hardware assisted avisynth plugin)


Antitorgo
15th December 2004, 22:50
Hi folks, been a long time lurker.

I've been really interested in the prospect of doing a hardware assisted plugin for AviSynth, and have been playing around with the ATI Video Shader demo that uses the pixel shaders on modern graphics cards to do video manipulation. The demo has some pretty cool HLSL files that do various kinds of processing, and even has one that will perform an FFT/iFFT, and I was getting like 15fps on my laptop.

So, being that I'm a developer, and have a little bit of free time on my hands, I've started development on what I'm calling "AviShader". It is a plugin that I'm basing off of the SimpleSample plugin that Si put together (great job BTW), and the ATI Video Shader demo.

I'm wondering if anyone else has done any work in this area (I don't think so), and wanted some advice/input/thoughts from you folks on things you'd like to see, or things I should watch out for. I'm aware of the usual arguments that reading video memory is too slow etc, etc. However, I'm not sure how slow it really is. If I could get 5fps on something that does the same thing as IIP and uses very little of my CPU, that would be a huge improvement. Plus, with PCI-E, I think the video card slow read issue isn't going to apply much longer (look at nVidia's TurboCache that they announced today, it renders to system memory).

Another thought I had: since I'm an image/video processing amateur, could someone explain some of the more advanced filtering techniques so that I could try coding some of them in HLSL? Something like Dust (I know Steady isn't coming around much anymore, and Didée seems to have some idea of what it does), or Fizick's new very slow FFT denoiser, which looks cool, but the paper he mentioned in his post on Motion Picture Restoration seems to not be there anymore. If there is already a thread like this, I couldn't find it using search/etc.

Anyway, I'm hoping to have a beta of "AviShader" out soon, maybe by XMas or New Years? It would be cool to have some HLSL or FX files that do something useful by then.

Fizick
15th December 2004, 23:40
Links:

Paper:
http://www.mee.tcd.ie/~ack/papers/a4ackphd.ps.gz

GPU:
http://forum.doom9.org/showthread.php?s=&threadid=85680&highlight=GPU

http://forum.doom9.org/showthread.php?s=&threadid=76161&highlight=GPU

http://forum.doom9.org/showthread.php?s=&threadid=72137&highlight=GPU

etc...

morsa
16th December 2004, 04:55
If I'm not wrong I guess Shodan was into using GPU for something I don't remember :D

Leak
16th December 2004, 07:53
Originally posted by morsa
If I'm not wrong I guess Shodan was into using GPU for something I don't remember :D

Dunno about sh0dan, but Avery Lee is using the GPU in VirtualDub for Bicubic resizing:

http://www.virtualdub.org/oldnews

Kurosu
16th December 2004, 11:33
I think TheJam79, whose page can't be accessed anymore, had written such a tool, called 'GPU'. He had implemented Convolution3D, a temporal smoother and some colorspace transform AFAIK. It was using DX9 and needed at least a Radeon 9500 IIRC.

Maybe I can dig up the sources, but it had never worked on my 9800, even when recompiled.

I remember a syntax of the like:
GPU_Start()
GPU_<function>
GPU_End()

Probably you could put several GPU functions between Start() and End().

bill_baroud
16th December 2004, 14:09
I'm wondering if anyone else has done any work in this area (I don't think so)

well, i had the same thought as you some time ago and tried some things, but i never got something really working and my free time disappeared :scared:

i'm looking forward to your progress ;)

(and btw, have a look at the nvidia Cg toolkit ... that's much nicer than asm to start playing with shaders ;))

sh0dan
16th December 2004, 14:54
I've been toying with GPU-based video-processing earlier. It seems promising, but I don't see any good way of integrating it into AviSynth, so I basically started from scratch on my tests.

For now I've tested upload and download speeds over the bus, and most performance issues arise from this. I've used DirectX9 as framework and Cg for pixel shaders. DX9 is really great and does a good job. Cg is a bit more flaky, with several compiler bugs already peeping out (though v1.3 fixed most of them).

* Pixel Shader 2.X (DX9) is needed for this to be useful.
* For now I use FP16 4:4:4 YUV.
* Uploading YV12 as three greyscale textures works quite nicely.
* Downloading frames is a bit more tricky, as it will lock the CPU while the GPU is producing the image. This will also make SLI-supported processing harder. Buffering might be able to make this better.
* Interlaced processing is tricky, but possible with PS 2.X.
* Making framebased decisions (like telecide for instance) is a rather big problem.
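
The "three greyscale textures" point can be pictured as plain buffer slicing. This is only a rough CPU-side sketch in Python, not the DX9 code from these tests: `split_yv12` is a made-up name, and it assumes a tightly packed frame with no row padding (real AviSynth frames carry a pitch that can be larger than the width).

```python
def split_yv12(buf, width, height):
    """Split a tightly packed YV12 frame into its Y, V and U planes.

    YV12 stores a full-resolution Y plane followed by quarter-size V and U
    planes (V first). Assumption: no row padding, so pitch == width.
    """
    if width % 2 or height % 2:
        raise ValueError("YV12 needs even width and height")
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    if len(buf) != y_size + 2 * c_size:
        raise ValueError("buffer size does not match a packed YV12 frame")
    y_plane = buf[:y_size]
    v_plane = buf[y_size:y_size + c_size]
    u_plane = buf[y_size + c_size:]
    return y_plane, v_plane, u_plane
```

Each returned plane could then be uploaded as its own single-channel texture: one at full size and two at half width and half height.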

I've got a basic framework working, but nothing big this far. A bit more info on my blog (http://sh0dan.blogspot.com/).

Leak
16th December 2004, 17:20
Originally posted by sh0dan
* Making framebased decisions (like telecide for instance) is a rather big problem.


How about just calculating the metrics that Telecide and Decomb (and TIVTC's equivalents) use on the GPU and reading those back? One metric per 32x32 block is a lot smaller than a full image and doing the final decision on the CPU is almost nothing compared to the metrics calculation...
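
To put a shape on that, here is a small Python sketch of reducing a luma plane to one number per 32x32 block. The metric itself (sum of absolute differences between vertically adjacent lines) is just a stand-in I picked for illustration; it is not the actual Telecide/Decomb/TIVTC metric, and `block_metrics` is a made-up name.

```python
def block_metrics(luma, width, height, block=32):
    """Reduce a luma plane (flat sequence, row-major) to one value per block.

    Stand-in metric: sum of |line(y) - line(y+1)| over the pixels of each
    block, which grows when adjacent lines disagree (as with combing).
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    out = [[0] * cols for _ in range(rows)]
    for y in range(height - 1):
        row_a = y * width
        row_b = row_a + width
        for x in range(width):
            diff = abs(luma[row_a + x] - luma[row_b + x])
            out[y // block][x // block] += diff
    return out
```

For a 720x480 frame that is 23 x 15 = 345 values to read back instead of 345,600 luma samples, which is the whole appeal of the idea.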

Just braindumping here, though. :)

np: Autechre - Remix Of Spangle By Seefeel

Prettz
16th December 2004, 20:50
I have yet to learn anything about pixel shaders (been meaning to one of these days), but would PS 3.0 bring any further benefits, at least ones which would justify writing separate routines for 3.0?

Soulhunter
16th December 2004, 21:56
A GPU accelerated ssxsharpen would be cool... :)

Guess the supersampling could be done much faster this way !?!


Bye

Antitorgo
16th December 2004, 23:01
Yes, a hardware accelerated ssxsharpen would be cool, but seeing as how I don't know all the specifics of how it works, I can only guess on it supersampling and doing maybe an unsharp mask? I dunno. That was one of the points to me starting this thread.

I have coded up an HLSL function that does a so called "smart" sharpen by doing an unsharp mask on sobelized output so that only the edges get sharpened. I thought it would be kinda interesting to apply the sharpening to the edges, and leave the gaussian blur on the rest. It was kinda cool when I was testing, I left the sobelized edges in there instead of the unsharp mask because it turned into a cartoonish type medium (would be cool for someone wanting to do non-photorealistic type stuff).
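
For anyone who wants to follow the idea without reading HLSL, here is a rough CPU-side sketch in Python of the approach as described above: Sobel edge detection, unsharp mask on edge pixels, Gaussian blur everywhere else. It is not the actual shader; the 3x3 kernels, the `threshold` and `amount` parameters and the function name are all assumptions made for illustration.

```python
GAUSS = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # 3x3 kernel, weights sum to 16


def smart_sharpen(img, threshold=0.5, amount=1.0):
    """img: 2D list of greyscale values in 0..1. Returns a new 2D list.

    Pixels whose Sobel magnitude exceeds `threshold` get an unsharp mask;
    all other pixels are replaced by their 3x3 Gaussian blur.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # 3x3 neighbourhood with coordinates clamped at the borders
            n = [[img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                  for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
            blur = sum(n[j][i] * GAUSS[j][i]
                       for j in range(3) for i in range(3)) / 16.0
            gx = (n[0][2] + 2 * n[1][2] + n[2][2]) - (n[0][0] + 2 * n[1][0] + n[2][0])
            gy = (n[2][0] + 2 * n[2][1] + n[2][2]) - (n[0][0] + 2 * n[0][1] + n[0][2])
            edge = min(1.0, (gx * gx + gy * gy) ** 0.5)
            if edge > threshold:
                # unsharp mask: push the pixel away from its blurred value
                out[y][x] = min(1.0, max(0.0, n[1][1] + amount * (n[1][1] - blur)))
            else:
                out[y][x] = blur
    return out
```

On a hard vertical step (0 on the left, 1 on the right) the two columns touching the step stay at 0 and 1, where a plain Gaussian would have pulled them to 0.25 and 0.75.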

Anyway, as far as the comments on using Cg: HLSL isn't the asm pixel shader stuff, I am more familiar with it, and I think Cg is an nVidia thing and I have ATI... I might see about using FX, which lets you specify some more rules around the HLSL you use in it, and I think it would be "free" in the sense that MS makes it as easy to compile the FX file into a shader as it does the HLSL.

Beyond that, I was going to post links and maybe some screen caps of my progress, but I'll do that tonite when I have more time.

Soulhunter
16th December 2004, 23:29
Originally posted by Antitorgo
Yes, a hardware accelerated ssxsharpen would be cool, but seeing as how I don't know all the specifics of how it works, I can only guess on it supersampling and doing maybe an unsharp mask? I dunno. That was one of the points to me starting this thread.


Afaik it works like this...

LanczosResize(ox*4,oy*4).XSharpen(255,255).LanczosResize(ox,oy)


Maybe have a look here (http://www.interpolatethis.com/phpBB2/viewtopic.php?t=170), here (http://www.neuron2.net/xsharp.html), here (http://mf.creations.nl/avs/functions/SharpTools-v0.3.avs) and here (http://www.geocities.co.jp/SiliconValley-PaloAlto/2382/warpsharp_2003_1103.cab) !!!


Bye

AS
17th December 2004, 00:20
Actually the whole point of ssxsharpen is to sharpen everything, both edges and areas :)

Antitorgo
17th December 2004, 02:54
I've coded up XSharpen as a shader with hardcoded 255,255 values and straight comparison vs. luma comparison (I was trying to calculate the luma inline and saturated the registers, so to do true luma I'd have to do two separate shaders with one rendering to a different texture for lookups, and I was too lazy so I settled on a straight comparison as a quick compromise). The results look about the same as XSharpen in the static images, but I will compare actuals when I fix the screen capturing code. BTW, XSharpen on Shrek2 w/o supersampling looks like absolute crap. :p

So I just need to add supersampling code and see what happens... As my only way to test the HLSL right now is the super buggy Video Shader (which I'm trying to debuggify a bit), the built in screen caps are resizing to some odd size (the video being displayed was off as well, but I fixed that particular bug).

Anyway, with luck, I'll get some of the supersampling coded and have some results later tonite.

Soulhunter
17th December 2004, 04:28
Originally posted by Antitorgo

Anyway, with luck, I'll get some of the supersampling coded and have some results later tonite.

Nice... :)

Btw, what kind of resizing you gonna use ???


Bye

Antitorgo
17th December 2004, 19:32
Ugh... I need to work more on learning DX9 and dusting off my C++ skills, I've been pampered too much by .NET. Plus, this is a long rambling type post, so bear with me...

Anyway, I think I'm going to concentrate right now on getting a basic framework in place for the plugin, and then I can work more on the shader algorithms. I have some screencaps of Shrek2 with XSharpen and my modified smart sharpen (with my Gaussian on non-edges and the edge detection cranked up to a high threshold of .9 (90%)). The smart sharpen is actually damn impressive looking, and of course XSharpen looks like crap because either I implemented something wrong (possible), or it needs supersampling really bad. It seems that XSharpen on animation just introduces a crapload of aliasing, maybe it does better on natural images? Or maybe it is because I have it hardcoded to (255,255) type parameters (not really hardcoded, but left off for simplicity at the time and 255,255 is the "default" behavior).

Oh, and I thought of a good way to calculate luma using fewer instructions (it is a dot product of two vectors, so I would imagine that it calculates luma a hell of a lot faster too, being that the hardware likes doing dot products; I wouldn't know offhand because it is so damn fast anyway). So the XSharpen uses true luma for all its calculations now.
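
The dot-product form of luma is easy to show outside of a shader. A minimal Python sketch follows; the Rec.601 weights used as the default are an assumption, since the post doesn't say which constants ended up in the shader.

```python
REC601 = (0.299, 0.587, 0.114)     # assumed default; SD-video style weights
REC709 = (0.2126, 0.7152, 0.0722)  # HD-style alternative


def luma(rgb, weights=REC601):
    """Luma as a dot product of an (r, g, b) triple with a weight vector."""
    return sum(c * w for c, w in zip(rgb, weights))
```

In HLSL the same thing is a single `dot()` of the colour with a constant vector, which is why it is so cheap on the GPU. Swapping the weight vector is the only difference between the competing definitions of luma.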

I would upload the screenshots to my web site, but my hosting provider apparently screwed something up, because I can't upload files via FTP. So I guess I'll post later when I get some files uploaded.

As far as the resizing (interpolation), I have no idea how lanczos works, I sorta know how bicubic and bilinear work, but then again, I could just let the video card handle it (which was my initial thought). But then I did a search on interpolation algorithms to see how hard it would be to maybe do lanczos, and ran across a triangular interpolation algorithm which is custom tailored for 3D video cards, in that you could map the triangles to a mesh and resize like crazy and it would be super incredibly fast. Imagine almost free resizing to do some crazy supersampling (of course, memory becomes a concern here). One of the amazing things about it is that setting up the mesh is pretty fast, and should be comparable to bicubic in terms of speed (I think; this is all theoretical on my part, and I could be totally wrong). Anyway, I will leave that for later I think, and just concentrate on the basics and go from there.

For now though, I am thinking that I will implement first with HLSL for the shader language, and then probably graduate to Effect (.FX) files so you could define multi-pass and multi-shader scenarios in one file (which would be really friggin awesome). The major hurdle though is getting the framework in place, and testing the speed of reads from the graphics card (I wish I had PCI-E).

AS
17th December 2004, 20:15
Originally posted by Antitorgo
xSharpen looks like crap because either I implemented something wrong (possible), or it needs supersampling really bad.

No, you probably have done it right and are getting the correct behaviour. XSharpen(255,255) does need supersampling badly, which is why we use supersampling for it, to disguise the aliasing caused by XSharpen().

As for Lanczosresize(), it's a native avisynth filter, which you can get the source within avisynth, I believe.

Antitorgo
17th December 2004, 20:45
Yeah, I know lanczos is in AviSynth and I could go dig through the source code to find it. I hate to do that because it is like deciphering Greek and translating it into Babylonian cuneiform. I'd rather do English to Babylonian cuneiform (heh).

For example, looking at the XSharpen code, it took me a minute to figure out that he was bitshifting things around to get the RGB and store the luma in the A channel, and then to decipher the fact that he used an unrolled loop to sample the 3x3 matrix. A description like "take the RGB values in a 3x3 matrix centered on the target, find the max/min luma and use the luma closest to the center pixel as the new value" would have been easier for me to code from. Actually, I spent a couple of hours trying to calculate the luma correctly, and everyone seems to have slightly different opinions on the constants to multiply your RGB values by in order to get the luma. But I digress...
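
That one-sentence description translates almost directly into code. Here is a minimal Python sketch of it, assuming the "255,255" behaviour (always replace, full strength), Rec.601 luma weights, clamped borders, and ties going to the brighter pixel; those choices and the function names are mine, not taken from the real XSharpen source.

```python
def luma(pixel):
    """Rec.601-style luma of an (r, g, b) pixel (assumed weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def xsharpen(img):
    """img: 2D list of (r, g, b) tuples. Returns a new 2D list.

    Each pixel is replaced by whichever of the darkest/brightest pixels in
    its 3x3 neighbourhood is closer to it in luma.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            hood = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            darkest = min(hood, key=luma)
            brightest = max(hood, key=luma)
            centre = luma(img[y][x])
            if centre - luma(darkest) < luma(brightest) - centre:
                row.append(darkest)
            else:
                row.append(brightest)
        out.append(row)
    return out
```

Because every pixel snaps to a local extreme, soft gradients turn into hard steps, which fits the aliasing people see without supersampling.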

I think what I want to do is see if I can't get the D3D environment set up, render a frame to it, and read it back, all in an AviSynth plugin. I think that getting to that point is going to be the biggest challenge. One of the difficulties is that the ATI Video Shader demo uses VMR9 to do all the decoding and stream serving, so I need to replace that, strip out all the code they have in there for setting up cube maps, textures etc etc for some of the more "advanced" features, and to be honest, I think I'm better off starting w/o that baggage and just using it for reference as needed. Kinda going to the KISS principle.

Soulhunter
17th December 2004, 21:51
Originally posted by Antitorgo

Yeah, I know lanczos is in AviSynth and I could go dig through the source code to find it. I hate to do that because it is like deciphering Greek and translating it into Babylonian cuneiform...

Not sure, but maybe this (http://www.binbooks.com/books/photo/i/l/57186AF8DE) could help ya... :)

EDIT: Or this (http://gcc.gnu.org/ml/gcc-bugs/1998-09/msg00330/lanczos.cpp.cc.gz) here ???


Bye

Antitorgo
18th December 2004, 07:46
Okay, finally was able to upload to my web server. Here's some screen caps of the raw, "smart" sharpen, and XSharpen (w/o supersampling). It isn't exactly the same frame, because Video Shader doesn't have frame level control, but I got them as close to the same as possible. (Sorry, I'll hopefully have better tonite).

I wanted to point out that Default Shader does nothing, so it is the raw image

http://www.blosser.org/d9/VideoShaderSnapShot013.png
http://www.blosser.org/d9/VideoShaderSnapShot014.png
http://www.blosser.org/d9/VideoShaderSnapShot015.png

I'd be interested to see what comments anyone has. I think the smart sharpen looks a heck of a lot better than the original, and you can tell by the file sizes that it would seem it compresses quite a bit less than the original.

As far as the supersampling for XSharpen... I've been thinking about how to do this, and the biggest issue that I'm running into is that I don't think I can cheat and get a "quick" example by using the hardware to resize in one pass; I'd have to do multiple passes on it, feeding the textures back in. Maybe I can size it up, take a screen cap, and then resize in Photoshop as a quick example? Anyone actually interested? I'm kinda curious just to see what happens to performance if I resize 4x... So curious, I just went and hacked it in, and a 4x XSharpen runs at ~8fps, the 4x SmartSharpen ran at ~6fps... Note that this isn't doing any copying back to system memory yet, so speed might be quite a bit slower. This makes me think that if I code the resize as a "mapped" function within HLSL, it would be the more optimal solution... I'll have to think about it...

I've written up a quick little app in C# to figure out how to set up all the D3D rendering, and have it all working with the exception of getting the rendered image back to system memory so I can save it off to a file or whatever. With any luck, I'll have that done tonight, and I'll post the source code up on my server, along with the smart sharpen and xsharpen HLSL shaders so people can play around with it and see if it'll work on their hardware or not. Oddly enough, I was writing part of it on my work computer, which has an onboard Intel video solution, and it says it supported pixel shader 2_0 (PS_2_0) and it did actually render everything correctly (I'm not sure what the speed situation was though, since it is limited to a static image for now).

Anyway, next step will be to take what I've learned on the D3D stuff, and write it in C++ and write it as an AviSynth plugin. Overall, it is pretty cool.

Oh, and lastly, I've thought more about my triangular resize using a mesh, and I'm liking it even more and more. I think I'll try coding something up to see if it works like I think it will, and then I'll make a post with neat graphics on how it works, and why it is better than other interpolations...

Antitorgo
18th December 2004, 09:24
Wow... the screen cap stuff built into the Video Shader was way off... Maybe it was resizing or something. Anyway, I've finished my prototype of the D3D shader renderer in C#. For those that want to play with it, here's the link CShader (http://www.blosser.org/d9/CShader.zip). There's lots of hardcoded stuff in there, because I wanted to just test functionality. I've also included my HLSL for the SmartShader and XSharpen. It'll load img.png and shader.hlsl from the current directory, and size the window to the image size. Press the 'S' key to apply the shader, and the 'D' key to dump it to a file named CShader.png; 'Esc' quits. The url for the source is here (http://www.blosser.org/d9/CShaderSrc.zip).

And now for the new improved screen shots (note, the white spots are artifacts of the way I did the screen dump apparently, they don't appear onscreen).

Original
http://www.blosser.org/d9/ShrekFrame60vob2.png


SmartSharpen (notice the ringing along the neckline, maybe it needs to be tuned down a little bit)
http://www.blosser.org/d9/ShrekFrame60vob2Smartsharp.png


XSharpen
http://www.blosser.org/d9/ShrekFrame60vob2XSharpen.png

Let me know if you try CShader and it works on your hardware with the default shader.hlsl (That is the smart sharpen)...

I'm kinda tired, so hopefully everything will work and my post makes some sense... I have a burst of energy and desperately want to go code the AviSynth plugin now! Maybe it'll be tonight? Nevermind, it is too late to be thinking about D3D in C++, too much extra crap to code for it.

AS
18th December 2004, 12:29
It failed for me... Radeon 9600 Pro with Catalyst driver 3.6, DX 9b

Message:

The application failed to initialize properly (0xc0000135). Click on OK to terminate.

Antitorgo
18th December 2004, 16:33
Hmm... I'm sorry, I didn't realize that there were dependencies that aren't even in the latest d/l for DX9.0c. It is from the December DX9SDK update. I should mention that folks need DX9, and .NET 1.1 framework installed to run this (in case you didn't realize that).

Anyway, if you could do me a favor and try to redownload and run it again, I've included the DLLs it needs. If that doesn't work, then just d/l this redist of the managed directx stuff from my site located here (http://www.blosser.org/d9/mdxredist.msi).

Please let me know if it doesn't work for you. Thx.

Soulhunter
18th December 2004, 18:50
Hmm, I'm not even able to decompress the zip !!!

Downloaded it 5 times or so... :(

Seems it's b0rked !?!


Bye

Antitorgo
18th December 2004, 22:01
Hmm... I've tried downloading it a few times now, and I've had no problem. Anyone else having a problem?

bill_baroud
18th December 2004, 22:44
Source file is ok (which was interesting me ;)). Binary file is corrupted here, i can't extract the last 4 files of the zip.

Antitorgo
19th December 2004, 00:37
Recompressed and re-upped. Please try it again. I used WinRar the first time, and added some files after the fact, so maybe that was it. I used the Windows compression this time.

Soulhunter
19th December 2004, 22:32
Seems it's still b0rked... :(


Bye

Antitorgo
19th December 2004, 23:34
Ah... weird, I think my ISP is munging zips or something. I cleared out my cache, and reproduced the problem on my box. Then, I renamed the file to .ZAP on the server, and downloaded it, renamed it back to .Zip and it decompressed fine... Anyway, I decided to just screw ZIP compatibility and uploaded it as a RAR. I tested this one, and hopefully it works for everyone else! Link here (http://www.blosser.org/d9/CShader.rar).

Sorry about all that. I tried d/ling it with no problem until I realized that it must have been in my cache still...

My kid walked up and hit a key that submitted it while I was typing.

Soulhunter
20th December 2004, 03:06
Nice, finally it works !!!

But it produces some sort of blocking/smoothing with custom pics...

Source
http://img88.exs.cx/img88/3787/img5js.png

CShader
http://img88.exs.cx/img88/4725/feed2ql.png

With the original picture everything looks fine !!!

Btw, I'm using a GF6600GT... ;)


Bye

Antitorgo
20th December 2004, 03:29
I'm guessing the second pic is without you hitting 'S' to turn on the shader? Because the 'default' shader will smooth non-edges and sharpen edges. I think the latest version I posted I had tuned the sobelized edges algorithm down to like 1/2 what it was... on my laptop's screen, I can't tell much by eye, so I loaded them into PS and did a difference...

Three thoughts come to mind as to why there is a difference:
1) Antialiasing on the card (I'll have to load this on my ATI and see what happens, because nVidia does "real" antialiasing, and ATI does "edge" antialiasing)... Okay, I did it on my ATI card, and there are differences still, but different differences, if that makes sense...

2) The way this gets rendered is that the image is textured onto a flat plane, and then it gets rendered. So it is possible that the geometry is a little bit off somewhere. Possibly anisotropic filtering? It shouldn't be, but alas...

3) Something is off with the screen capture. I'm not sure what offhand, I'm thinking maybe colorspaces or something weird like that...

I'll have to play around with it some more.

I've gotten the basics down for my mesh resizing engine. I'll maybe have something later tonite.

Soulhunter
20th December 2004, 08:04
Originally posted by Antitorgo

I'm guessing the second pic is without you hitting 'S' to turn on the shader? Because the 'default' shader will smooth non-edges and sharpen edges...

You're right, I haven't applied the shader @ the sample pic... ;)

But when I turn it on, I get exactly the same kind of artefacts !!!


Originally posted by Antitorgo

1) Antialiasing on the card (I'll have to load this on my ATI and see what happens, because nVidia does "real" antialiasing, and ATI does "edge" antialiasing)... Okay, I did it on my ATI card, and there are differences still, but different differences, if that makes sense...

As ATI n' nVidia cards render stuff differently (in games), it makes sense !!!

Maybe someone else with a nVidia card could confirm the results I get...


Originally posted by Antitorgo

2) The way this gets rendered is that the image is textured onto a flat plane, and then it gets rendered. So it is possible that the geometry is a little bit off somewhere. Possibly anisotropic filtering? It shouldn't be, but alas...

Should I try to change the settings of my GFX ???

Maybe disabling anisotropic filtering or so...


Tia n' Bye

Antitorgo
20th December 2004, 10:37
Hmm... I uploaded a new version that seems to have fixed the problem. I turned off antialiasing, auto mipmap generation and fixed the pixel formats so that should all be the same... I'm too tired to go and figure out exactly which one was the problem... But please give it a try on your box and let me know the results, I want the image to be pure before shaders get run on it.

I will say that hopefully when I get my mesh generating scheme up and running, it should eliminate any of the texture compression/filtering type issues, but it is kinda slow in that it generates a huge mesh (1 vertex for each pixel). Overall, the whole thing is quite annoying, in that the way it is all rendered is overkill. I just want to run a pixel shader on a 2D object, but to do it, you've got to set up a whole 3D environment to get it done.

Good news on the speed tests:
Just from a loop of raw data, not reloading a texture etc etc...
A 720x480 frame looks like it is copying back into system memory in ~0.02s-0.03s (30-50fps) on my laptop (1.7GHz Pentium M with ATI Radeon 9600 mobility), with occasional hiccups up to 0.50s when I think the GC runs in .NET (this obviously wouldn't happen in the C++ code). I'd imagine a PCI-E card would be faster. In any case, I think it would be reasonable to expect pretty good results on the AviSynth plugin.
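
To put those timings in perspective, here is the arithmetic as a tiny Python helper. It assumes 4 bytes per pixel (a 32-bit RGB surface), which is a guess about the test setup rather than something stated above; `readback_stats` is just an illustrative name.

```python
def readback_stats(width, height, seconds_per_frame, bytes_per_pixel=4):
    """Return (bytes per frame, frames per second, MB/s) for one readback.

    Assumption: 4 bytes per pixel, i.e. a 32-bit RGB surface.
    """
    frame_bytes = width * height * bytes_per_pixel
    fps = 1.0 / seconds_per_frame
    mb_per_s = frame_bytes * fps / 1e6
    return frame_bytes, fps, mb_per_s
```

With `readback_stats(720, 480, 0.02)` a frame is 1,382,400 bytes, so 50fps works out to roughly 69 MB/s coming back over the bus, and 0.03s per frame to roughly 46 MB/s.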

Soulhunter
20th December 2004, 12:24
Seems it still produces the same effect... :confused:


Bye

Antitorgo
22nd December 2004, 01:47
Wow... I totally reworked the entire codebase. It is MUCH cooler now, and I am hoping it fixed your problem (I'm not getting it on my Intel whatever chipset at work, or my 9600mobile).

New version being posted as soon as I get home tonight, because for some reason I can't upload to my FTP site from the office.

I've reworked it so it uses Effects (.fx) files versus plain old HLSL, and I've included a bunch of sample filters that I found on this site here (http://www.facewound.com/tutorials/shader1/) (I wanna play that game now actually) There's more cool info here (http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=64217) .

Also, I added some code to test out the framerate. It samples capturing 50 frames to system memory and averages it all out over time. The larger the size, the longer it takes obviously... Some tests from my setup are 720x480 is ~50fps, at fullscreen ~10fps. I'd be interested in what other people get (especially if you have PCI-E!!!).
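
The measurement itself is nothing fancy. In Python terms it amounts to something like the sketch below, where `capture` stands in for whatever copies one rendered frame back to system memory; the function name and the injectable `clock` parameter are illustrative, not taken from the CShader source.

```python
import time


def average_fps(capture, frames=50, clock=time.perf_counter):
    """Call capture() `frames` times and return the average frames per second."""
    start = clock()
    for _ in range(frames):
        capture()
    elapsed = clock() - start
    return frames / elapsed if elapsed > 0 else float("inf")
```

Averaging over 50 frames smooths out one-off stalls (like a garbage collection pause), but it also hides them, so a single long hiccup shows up only as a slightly lower average.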

Antitorgo
22nd December 2004, 05:15
Okay. CShader version 0.1 is here: http://www.blosser.org/d9/CShader01.rar

Next I'm adding support for AVI/MPEG/etc...

Soulhunter
22nd December 2004, 08:07
Internal Server Error...


Bye

Antitorgo
22nd December 2004, 09:01
I don't know if it is that PHP virus going around or what, but my hosting company has been kinda sucky lately... I've figured out that for some reason, all the files named CShader*.* were throwing the 500 error. I renamed them to dlCShader*.* and it all is working now... Was frustrating because CShader.rar was working fine before today...

New links for everything:
http://www.blosser.org/d9/dlCShader.rar (Version 0.0)
http://www.blosser.org/d9/dlCShaderSrc.zip (Version 0.0 Source)
http://www.blosser.org/d9/dlCShader01.rar (Version 0.1)

Video rendering coding is going pretty good, I have the code in, and am currently debugging. It is too buggy for me to put up for d/l anywhere yet. Good news is that on the simple shaders I've included on this latest release, the frame rate for capture is well above the 30fps that the test video played at.

After this, I'll work on putting this code into an AviSynth plugin, so we can do some "real" testing and get down to the real work of writing pixel shaders to do cool stuff that will run really fast in AviSynth. :D

Soulhunter
22nd December 2004, 09:32
Nice work... :)

The blocking/smoothing is gone !!!


Bye

easyfab
26th December 2004, 23:16
My system:
amd barton 2500+
asus a7n8x deluxe
ati 9200 se (directx 8)
windows xp sp2
directx 9.0c
.net 1.1

It works :) 34.04 fps max

but i can't see any effects in the list (only default) even if i open effect.fx.
If you perform the frame rate test a few times, the memory usage goes over 256 MB.

I also see some tests of this gpu : XGI Volari V3 XT

http://www.club-3d.com/distri/productinfo/spec_vga.php?ordercode=CGX-V38TVD
This GPU supports DX 9, Pixel Shaders 2.0 and some hardware acceleration (Cipher™ Video Processor) for a price under 50€.

This GPU interests me, but can it support future hardware encoding acceleration and your future AviShader? Will only ATI and nVidia support it, or all the DX9 GPUs?

Thanks for your work and sorry for my poor english

Soulhunter
27th December 2004, 00:23
Originally posted by easyfab

It works :) 34.04 fps max


I am able to get more than 100fps... :D

But yes, it eats a huge amount of my RAM !!!

http://img77.exs.cx/img77/2374/0011bz.png


Bye

easyfab
27th December 2004, 12:22
For my 34.04 fps, i think it's because my GPU (ati 9200 se) is only a directx 8 card.
I will try with alyssa images, my ati will surely be more motivated. :)

Soulhunter
27th December 2004, 15:23
Originally posted by easyfab

I will try with alyssa images, my ati will surely be more motivated. :)

I hope with "my ati" you meant your graphics card... :D

Seriously, I think Antitorgo should replace the old picture with Alyssa's (http://img158.exs.cx/img158/4573/sample0x10xd.png) !!!


Bye

Antitorgo
27th December 2004, 22:51
Hey all, I am back from my holiday vacation where I had zero internet access (scary).

Yeah, I just included the Lena pic because it is used a lot in the image processing world since it is a pretty crappy scan and is good to test denoising and all...

The memory allocation thing is because it is just allocating everything in .NET and the GC (garbage collection) has to run to reclaim all the memory. I didn't bother optimizing the memory stuff there because it is just a quick and dirty test to check performance of transfers to system memory.

Beyond that, I'm not sure why someone isn't seeing all the filters in the .FX file... Maybe post your version of DX9?

For video cards... It just needs to support pixel shader 1.0 minimum for very, very basic shaders. Better would be PS 2.0 support or of course, PS 3.0 support (nVidia is the only vendor with PS 3.0 support AFAIK). Anyway, my built-in video on my Intel motherboard that I have at work apparently has PS 2.0 support, so I'd imagine that other people should give it a try and see what happens... I suppose I could whip up some program to tell you which version of shader your card supports real quick...

I'm hoping to get back to work on some stuff tonite. Hopefully have a real plug-in soon.

Antitorgo
28th December 2004, 10:56
Merry X-Mas everyone:

http://www.blosser.org/d9/dlAviShader01.rar

version 0.1 Alpha

Usage:
clip.ConvertToRGB32().AviShader("effect filename", "technique to use")
Ex:
clip.ConvertToRGB32().AviShader("c:\effect.fx", "Default")

Some limitations are:
- Multipass isn't working yet (It wasn't in CShader either, but I'm 99.9% sure I know the reason)
- You need to convert to RGB32. I'm working on YV12 support, because I don't want to hurt performance. If you leave it in YV12 colorspace, it'll look really funky.
- No temporal support yet, I know how I want to do it though.
- No width/height constant support in the FX files yet, I want to add this so that the filter kernels are more accurate. (Ex: The sharpen kernel is just a guess)
- 900 million other things.

Please, please leave feedback on speeds etc. Plus, I need more ideas on cool filters to write!

Soulhunter
28th December 2004, 11:33
EDIT: After the silent update, everything works perfectly... :)


Bye

State of Mind
2nd January 2005, 10:45
I am speechless and simply dumbfounded (right word?) at the effort you have put into this (just by reading the thread) and how much of a passion you have for coding it. Anyway, is this effective on DV? What will it actually do to the video if I use the filter on it?

Dell Pentium 4, 3.0 GHz
512 DDR PC3200 333 MHz RAM
ATI Radeon 9800 Pro 128 MB
XP Home SP2 (so of course DX9.0C)

I eagerly await your response because, judging by the screens of Angelina Jolie and remembering some frames from my DV, the artifacts (horizontal "unmatched lines" is the best way I can put it) seem quite similar. Am I on home base or way out in left field? :confused:

State of Mind
4th January 2005, 06:09
Response? :rolleyes:

Antitorgo
4th January 2005, 07:26
Sorry, I've been busy working on my real job and putting in YV12 support...

I'm not sure what filtering you are talking about on Angelina Jolie, I think it was a side-effect of some bilinear resizing happening with my old architecture. So if you have a similar source, try doing a bilinear upsize/downsize or something and see if it softens your source pic up a bit. ;)

Anyway, this plugin doesn't do one particular filter; it can run any number of filters that can be performed on the graphics card hardware instead of on the CPU. I am working on some personal filter ideas that I thought would be cool, that sharpen/blur edges (smart sharpen and smart blur), but none of those are ported to my new architecture yet (need to convert from HLSL to .FX).

For example, right now, I am able to do a 4x SSXSharpen process (using linear interpolation right now unfortunately), at 9fps versus the < 2fps you'd get using the SSXSharpen script. All using 50% CPU usage... This is also using sub-optimal code, I expect it to get closer to 4x at realtime when I finish my YV12 enhancements...

One of the things I'm trying to accomplish is to allow for more advanced filters to be performed faster by utilizing modern graphics cards. The number of filters that can be done is pretty much limitless, and I would love it if any filter devs wanted to contact me and help me port some existing filters to use gfx hardware.

There are many advantages to doing things on the graphics card. Things like huge lookup tables are less of an issue because texture memory is really fast. Also, graphics cards are much more capable of doing per-pixel processing, and I think it would be entirely feasible to do things like better-than-realtime bicubic/lanczos upsizing/resizing all on the graphics card (by using multitexturing and texture blending).
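
As a rough illustration of the kind of table a Lanczos resize needs, here is a Python sketch of the standard Lanczos kernel, sinc(x) * sinc(x/a), and the normalised weights for one output sample. It is not taken from AviShader; the function names are hypothetical, and the idea of uploading such weights as a lookup texture is only an assumption about how a shader version might be fed.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights(frac, a=3):
    """Normalised weights for the 2*a source pixels around a sample point.

    `frac` is the sample's fractional position (0 <= frac < 1) past the
    source pixel immediately to its left.
    """
    taps = range(-a + 1, a + 1)  # pixel offsets relative to the left neighbour
    raw = [lanczos(frac - t, a) for t in taps]
    total = sum(raw)
    return [w / total for w in raw]
```

With a=3 (Lanczos3) each output pixel is a weighted sum of 6 source pixels per axis, which is where the multitexturing and blending cost comes from.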

One thing that most likely would never make it into the AviSynth plug-in, but could possibly be in a stand-alone player, would be utilizing the antialiasing hardware. This is because anti-aliasing only occurs on the front buffer of the graphics card, and access to the front buffer a) requires that it be rendered to screen, and b) is reeeeeeaaaalllly slow.

I'm hoping that I'll release a new version when I finish my YV12 optimization. I was going to wait until I got some kernel-based interpolation going (bicubic/lanczos on the card), but my brain doesn't want to work on that stuff right now.

Note: I do have the DDT image mesh algorithm semi-working, but it is really slow. The results look REALLY nice though. See:
2x DDT Resize (http://www.blosser.org/d9/CMesh2x.png)
4x DDT Resize (http://www.blosser.org/d9/CMesh4x.png)
8x DDT Resize (http://www.blosser.org/d9/CMesh8x.png)
For Comparison
2x Lanczos Resize (http://www.blosser.org/d9/aj2xlanczos.png)
4x Lanczos Resize (http://www.blosser.org/d9/aj4xlanczos.png)
8x Lanczos Resize (http://www.blosser.org/d9/aj8xlanczos.png)

One nice thing about the DDT resize is that 8x doesn't take any longer than 2x. As far as the graphics card is concerned, it is just a bunch of triangles, it doesn't care how big those triangles get. ;)

State of Mind
4th January 2005, 08:38
Brilliant ideas. The ability to do everything in AviSynth with your graphics card. Once you get a better/newer card, likely better filtering results, eh?
I compared the images and had to put my eyes an inch from the screen and look verrry closely to see the oh-so-smallest difference, but the difference was sharper quality with the Lanczos resizes, though you have to be looking as close as I did to notice it.
My video card is an ATI Radeon 9800 Pro. I look forward to the progression of the development of your filter. Of, of, sheesh...

Cheers,
Jeremy

Antitorgo
4th January 2005, 10:24
One thing I really like about the DDT algorithm is that it helps on sharp angled edges. Her eyelashes start getting blocky w/ lanczos; with the DDT it keeps the edge. (Although this DDT only does 45 degree angles; a more advanced algorithm would use multiple angles and make things even less blocky.) [Edit: Also, I'd like to some day do the DDT with something besides linear interpolation on the triangles. Doing lanczos on the triangles would be interesting; perhaps when I finish a lanczos pixel shader I can implement this and we can compare again.]
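
For readers unfamiliar with DDT (data-dependent triangulation), the core decision can be sketched as follows. This is the textbook form of the idea, assumed here for illustration and not taken from the AviShader source: each 2x2 cell of pixels is split into two triangles along whichever diagonal joins the more similar pair of corners, which is also why only 45 degree edges are followed. The function name is hypothetical.

```python
def choose_diagonal(a, b, c, d):
    """Pick the split diagonal for one 2x2 pixel cell laid out as

        a b
        c d

    Returns "ad" when the top-left/bottom-right pair is more alike,
    otherwise "bc", so the shared triangle edge runs along the likely
    image edge rather than across it.
    """
    return "ad" if abs(a - d) <= abs(b - c) else "bc"
```

Once the diagonal is chosen, the two triangles are just geometry with a colour per vertex, so the card's ordinary linear interpolation fills them in at any scale.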

It actually has me thinking on a new way to do compression that is inherently non-blocky (it's trianguly). But then again, I am half asleep as I type this so I could be totally, totally out in left field...

State of Mind
4th January 2005, 13:08
Your left field probably has greener grass than mine does. Hehehe.

708145
10th February 2005, 22:11
Great idea this HW assisted filtering!

It would be even greater if it worked for me but I posted that in the usage thread already.

On the development side, I just want to post a list of filters I usually use that would profit from HW acceleration:

fft3dfilter
limitedsharpen
mftoon
warpsharp
lanczos4resize


Additionally a kind of spp filter would be great.

Don't get me wrong. These are just suggestions in case you don't know what to implement after SSX_sharpen :sly:

I'll watch the progress closely and am already considering a GPU upgrade.

bis besser,
Tobias

Antitorgo
10th February 2005, 22:31
I've already started work on some of these. Since ssxsharpen requires me to write lanczos as a filter, it is holding me up a bit...

fft3dfilter - I'm not too sure how feasible this is, but would be waaaay down on my list since it involves FFTs and my brain doesn't want to think about FFTs for a while (although someone could probably look at using some of the FFT stuff from the Mersenne folks www.mersenne.org who have a super-high performance hand tuned FFT that seems to kick major ass over all the other FFT algorithms out there).

LimitedSharpen - Already done and working (with unsharp mask), it really is waiting for me to fix some other bugs and add speed improvements (Especially the YV12 stuff).

mftoon - would be fairly easy to implement.

warpsharp - I dunno, is there source for this?

Lanczos - this is what I'm currently working on (for the ssx stuff), I have bicubic and Lanczos3 working right now, but there seems to be a major bug on ATI hardware (since someone tested it on their nVidia 6800 and that bug wasn't there). So until I write up a test program to send to ATI, I'm holding off a bit on any sort of release (plus it required some rejiggering that I forked off into a separate filter for testing).

tsp
10th February 2005, 22:51
Well, I'm nearly done with a GPU version of fft3dfilter. It's working somewhat now (only bt=1 is implemented at the moment and there is still some bug hunting left). I'm getting about the same speed as the CPU version with an Asus GeForce 6800GT (this card has slower memory than an ordinary GT) and an Athlon XP 2400 MHz, but I will try to increase the speed when I have a working version without too many bugs. The filter will require hardware DX 9.0 support.
So it seems as if most of the filters you are requesting are in the making.

708145
10th February 2005, 22:57
@tsp: Great :D

@Antitorgo:
Good to hear that lanczos4, mftoon and Limitedsharpen are no big deal.

Can you do DCT in a shader? That would enable spp and SmoothD.

bis besser,
Tobias

tsp
10th February 2005, 23:04
If you can do an FFT in a shader, you can also do a DCT.
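
One way to see this, sketched in Python rather than shader code: an unnormalised DCT-II of N samples can be read off a single FFT of the length-2N sequence formed by appending the mirrored input. The helper names below are made up for this example, and the plain recursive FFT is only there to keep the sketch self-contained; a GPU version would reuse whatever FFT passes already exist.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dct2_via_fft(x):
    """Unnormalised DCT-II of x using one FFT of length 2 * len(x)."""
    n = len(x)
    mirrored = list(x) + list(reversed(x))
    spectrum = fft(mirrored)
    # A half-sample phase shift turns the mirrored spectrum into cosine sums.
    return [0.5 * (cmath.exp(-1j * cmath.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


def dct2_direct(x):
    """Reference DCT-II straight from the definition, for comparison."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]
```

The mirroring doubles the transform length, so this is the simple route rather than the fastest one.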

Antitorgo
10th February 2005, 23:08
DCT is on the same order of magnitude as FFT... These are possible, but I'm not sure about them on two levels.

1) The time required
2) If any of the filters require precision, there could be a problem.

Actually, from the FFT front, ATI has a demo of an FFT/iFFT being done in a shader and it runs pretty fast (I want to say 15-20fps, but I could be way off).

TSP: You're using Brook aren't you? I'm a little curious as to how that is working out.

On another note, I think I've finally decided to do my YV12 as uploading 4 frames in a texture at once using one color channel per frame. I think this will be infinitely easier than me packing/unpacking bits, that just added too much complexity... Once this is done, I expect to see a decent speedup in framerate (although non-realtime filters will give a "jumpy" framerate). Plus, it gives the CPU things to do while the GPU is rendering and adds to my CPU/GPU parallelism.
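
The one-channel-per-frame idea can be pictured with a small CPU-side sketch in Python. This only illustrates the data layout (four 8-bit luma planes interleaved into RGBA texels); it is not the plugin's upload code, and the function names are hypothetical.

```python
def pack_luma_frames(frames):
    """Pack four equally sized 8-bit luma planes into one list of RGBA texels.

    Each frame is a flat list of luma bytes; texel i holds pixel i of
    frame 0 in R, frame 1 in G, frame 2 in B and frame 3 in A.
    """
    if len(frames) != 4:
        raise ValueError("expected exactly four frames")
    if len({len(f) for f in frames}) != 1:
        raise ValueError("all frames must have the same size")
    return list(zip(*frames))


def unpack_luma_frames(texels):
    """Split RGBA texels back into four luma planes."""
    return [list(channel) for channel in zip(*texels)]
```

A shader that applies the same operation to every channel then processes four frames per pass, which is also why frames would come back in bursts of four.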

tsp
10th February 2005, 23:22
Antitorgo: Yes I'm using brook and it works (sort of) well. It's easier to code but hard to debug. Also my FFT algorithm is fast enough to do ~32000 16x16 FFT and iFFT per sec.

tsp
11th February 2005, 11:06
Would someone with an ATI Radeon 9500 or better please check whether this (http://www.tsp.person.dk/fft3dgpu.zip) version of fft3dgpu produces strange artifacts? I suspect that they could be caused by an NVIDIA driver bug. Usage:
fft3dGPU(float sigma,float beta,int bw,int bh)
bw and bh should be a power of 2 (8,16,32,64,128,256 etc)
sigma and beta have the same meaning as in fft3dfilter
also this version defaults to bt=1.
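
The power-of-2 requirement on bw and bh is easy to check before calling the filter. A minimal Python sketch of such a check follows; the helper names are invented here and nothing like this is claimed to exist inside fft3dGPU.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def check_block_size(bw, bh):
    """Raise ValueError unless both block dimensions are powers of two."""
    for name, value in (("bw", bw), ("bh", bh)):
        if not is_power_of_two(value):
            raise ValueError(f"{name}={value} is not a power of 2")
```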

TheJudge
11th February 2005, 15:54
This produces some kind of strange "line-displacement" on a radeon 9800.

used line: fft3dGPU(2,1,32,32)

image removed

JD

Antitorgo
11th February 2005, 16:29
Hmm... if I set bh=8 it all looks fine, anything else gives me the same banding that TheJudge is getting.

Also, I plugged it into my avs file I had set up for debugging AviShader because it was conveniently open, and I noticed that D3D9 was throwing some warnings like:
Direct3D9: (WARN) :Can not render to a render target that is also used as a texture. A render target was detected as bound, but couldn't detect if texture was actually used in rendering.
and:
Direct3D9: (WARN) :Device that was created without D3DCREATE_MULTITHREADED is being used by a thread other than the creation thread.

I've encountered both of these before. The first means that the drivers may not support rendering to a render target that is also bound as a source texture. I'm guessing that you don't control this and it is something in Brook...

The second is that you either need to create your D3D device as D3DCREATE_MULTITHREADED or you need to do your initialization in the first GetFrame() call vs. in the constructor since AviSynth apparently calls the constructor w/ one thread and GetFrame in a totally different one...
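
The second option (initialising in the first GetFrame() call) is just lazy initialisation. Here is a language-neutral sketch in Python with a stand-in device class; `FakeDevice` and `LazyInitFilter` are hypothetical names and no real Direct3D or AviSynth API is used.

```python
import threading


class FakeDevice:
    """Stand-in for a D3D device; it only remembers its creating thread."""

    def __init__(self):
        self.creator_thread = threading.get_ident()


class LazyInitFilter:
    """Creates its device on the first get_frame() call, not in __init__."""

    def __init__(self):
        self._device = None  # deliberately not created here

    def get_frame(self, n):
        if self._device is None:
            self._device = FakeDevice()  # runs on the frame-serving thread
        # A real filter would render frame n here.
        return n, self._device.creator_thread == threading.get_ident()
```

This only helps if every frame request arrives on the same thread; if requests can come from several threads, the multithreaded creation flag (or a lock) would still be needed.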

Anyways, neither of those seems to account for the banding problem. I think that is more likely to be a rounding error (I commonly get the rounding problem on non-power-of-2 (NP2) textures) or you have your array index being calculated wrong or something...

tsp
11th February 2005, 17:15
Thanks Antitorgo, you just solved the problem. After adding D3DCREATE_MULTITHREADED in Brook and changing the shader that was rendering to the source texture, the lines disappeared. I have updated the file above, so please try it and tell me about the speed compared to fft3dfilter with bt=1 and the same bh and bw.

Antitorgo
11th February 2005, 17:55
Okay, tried it again and everything worked! Hooray.

Speed comparison was using:
fft3dFilter(2,1,0,32,32,1,2,0,true) - 7.0fps @ 100% CPU
fft3dGPU(2,1,32,32) - 6.0 fps @ 100% CPU

I noticed that fft3dGPU was still throwing off a bunch of render target warnings, but again, that might not affect anything...

I'm not sure if you are doing anything like pre-loading frames or anything like that, but you might want to look into it...

Also, because the graphics card doesn't internally support rendering to 8-bit targets, I think Brook does packing/unpacking, which slows things down. You might want to look at using my trick of loading 4 luma frames into a texture and letting the shader operate across all 4 at once (in a 32-bit texture). That should save the packing/unpacking operation, with the added benefit of parallelizing the CPU/GPU a bit more, since the CPU can load/unload textures while things are rendering on the GPU.

tsp
11th February 2005, 18:22
Antitorgo, I fixed the last render target warnings. Also, the filter does use all 4 channels. It works like this:
The CPU mirrors the border so that the whole frame is processed (instead of only the center, as in fft3dfilter), then the 1-channel 8-bit image is uploaded to the GPU. The GPU then creates a 4-channel floating point texture containing both the image and the shifted image, multiplied by a factor to avoid border artifacts. This texture is then FFT'ed (so both the shifted and non-shifted images are transformed in the same passes). To calculate this FFT the filter does 2 bit-reverse passes, log2(bw)+log2(bh) butterfly passes which render to two textures, and log2(bw)+log2(bh) collect passes that combine the 2 textures into 1. Between the horizontal FFT and the vertical FFT there is 1 pass to convert the complex FFT to a real FFT.
Then the resulting texture is filtered in 1 pass and then iFFT'ed, which requires the same number of passes as the FFT. The last step is to add the two images in the texture into one 4-channel 8-bit texture that is downloaded to the CPU. I think it would be possible to move the combine stage into the butterfly stage, which might increase the speed somewhat. But for now I need to add a 3D FFT to the filter to support bt=2 and 3.
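
To get a feel for how many render passes that adds up to, the counts in the description can be tallied in a few lines of Python. This is one reading of the post, not a measurement: it assumes bw and bh are powers of 2, leaves out the initial pass that builds the floating point texture, and the function names are made up.

```python
def fft_pass_count(bw, bh):
    """Passes for one forward FFT of bw x bh blocks, as listed in the post."""
    # For a power of two, bit_length() - 1 equals log2.
    stages = (bw.bit_length() - 1) + (bh.bit_length() - 1)
    bit_reverse = 2
    butterfly = stages
    collect = stages
    complex_to_real = 1
    return bit_reverse + butterfly + collect + complex_to_real


def total_pass_count(bw, bh):
    """Forward FFT + one filter pass + inverse FFT + one final combine pass."""
    return 2 * fft_pass_count(bw, bh) + 2
```

Under this reading, 32x32 blocks (as in fft3dGPU(2,1,32,32)) need 23 passes per transform and 48 in total, so folding the collect passes into the butterfly passes would remove a large share of them.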

What should I look for in the Brook source code if I want to place some sleep commands to reduce CPU load? Is it before the textures are downloaded or after the shaders are done?

modsoul
25th May 2006, 14:19
Hi all.
I have a question. I read this in this thread:
Brilliant ideas. The ability to do everything in AviSynth with your graphics card. Once you get a better/newer card, likely better filtering results, eh?
I compared the images and had to put my eyes an inch from the screen and look verrry closely to see the oh-so-smallest difference, but the difference was sharper quality with the Lanczos resizes, though you have to be looking as close as I did to notice it.
My video card is an ATI Radeon 9800 Pro. I look forward to the progression of the development of your filter. Of, of, sheesh...

that with this plugin we can do anything AviSynth does on the GPU. Now my question is: can I do de-interlacing using my GPU? This is the script I normally use to de-interlace:
directshowSource("D:\organised\anime\fate stay night\mgs_complete\mgs2\MGS Sons Of Liberty CD2.avi", fps=25)
ConvertToYV12()
#interlacing
edeintted =last.AssumeBFF().SeparateFields().SelectEven().EEDI2(field=-1)
TDeint(order=0,full=false,edeint=edeintted)
Trim (500,0)

Now, I am not too good with AviSynth. Please tell me how I can do the de-interlacing on my GPU, cuz currently it's working at <10 fps.

i'll be happy to post my results..simply as a way of contributing back.
[edit]
It seems Google has a nasty habit of digging up old threads.
Is work still active on this? If so, where? If not, then OK.

tsp
25th May 2006, 23:04
Currently there is no way to run TDeint on the GPU. Most graphics cards can do hardware accelerated deinterlacing (using DirectX Video Acceleration), but I don't know if there are any AviSynth filters that take advantage of this yet.

Fizick
25th May 2006, 23:15
Do you really think, that you can do EVERYTHING with this plugin? Funny man. :)

Your current script is a result of advanced algorithms by plugin writers (tritical); it cannot be effectively ported to the GPU.

LigH
16th February 2011, 15:41
What a pity I missed this thread for years... so now:

http://www.blosser.org/d9/dlAviShader01.rar => HTTP Error 404.0 - Not Found

I hope the avishader_25_dll_20041228.zip hosted at WarpEnterprises (http://avisynth.org/warpenterprises/) is about the same?

Wilbert
16th February 2011, 20:01
Looking at the dates they are the same. It's a pity that the source is not included.

naoan
16th February 2011, 20:27
Found a seemingly newer version of avishader (0.42) here: http://www.avisynth.info/?%A5%A2%A1%BC%A5%AB%A5%A4%A5%D6

It's in Japanese, but you can search for "avishader" and it'll be a direct link. :)

leeperry
16th February 2011, 20:27
that's the latest build I have(1/12/2005): http://www.mediafire.com/?i8ltgj0dt11si9j

it's a CPU hog, but it works wonders w/ this script: http://www.avsforum.com/avs-vb/showthread.php?t=912720

SubPixie
22nd November 2011, 19:03
Speaking about using external shaders, does anyone know how -if possible- I could adapt MAME CRT emulation HLSL Shader (http://forums.bannister.org/ubbthreads.php?ubb=showflat&Number=73673&page=1) to be used with AviShader ?

http://i44.tinypic.com/24osy75.png

I already tried something like :
AviShader("D:\Emu\MAME\hlsl\post.fx", "ScanMaskTechnique")

It works... I mean something is going on and the image is actually modified; it looks like a "burned out" image with a strong contrast... But I wonder how I could apply scanlines and shadow mask like in the above example... I quickly gave a look at the source code this afternoon at work, but I don't really know the HLSL language...

If anyone can help me, maybe...