
CUDASynth Filters



eac3to_mod
7th February 2024, 17:36
Thread split from KNLMeansCL: OpenCL NLMeans de-noising algorithm [2018-01-29]:-https://forum.doom9.org/showthread.php?p=1997163#post1997163

DGDenoise from DGToolsNV is actually a NLMeans denoiser.
That's true but a major update is coming shortly with much better performance in both speed and quality, so I would wait for that.

kedautinh12
7th February 2024, 19:03
That's true but a major update is coming shortly with much better performance in both speed and quality, so I would wait for that.

Nice, thanks

ChaosKing
8th February 2024, 11:45
DGDenoise from DGToolsNV is actually a NLMeans denoiser.

It has no temporal denoising, only a spatial mode.

tormento
8th February 2024, 13:25
It has no temporal denoising, only a spatial mode.


The author has expressed the intention to add temporal mode too. For more info come to DGTools support forum.

ChaosKing
8th February 2024, 15:50
The author has expressed the intention to add temporal mode too. For more info come to DGTools support forum.

I asked about this like 5 years ago. Don't think it will happen soon.

eac3to_mod
8th February 2024, 15:51
Don't bet the farm on that. A lot happens in 5 years, or one day.

ChaosKing
8th February 2024, 16:52
Maybe 2024 will be the year, fingers crossed :)

eac3to_mod
8th February 2024, 16:58
Is there a temporal denoiser whose performance impresses you? We're deciding on an appropriate algorithm. We will do it within DGSource() and not require an extra filter as BM3D does.

ChaosKing
8th February 2024, 17:35
I would say I had the best results with:
1. MVTools in combination with a prefilter (prefilter can also be mvtools)
2. knlmeans with ref parameter
3. bm3d is a bit harder to use, but can give impressive results when used in combination with mvtools

eac3to_mod
8th February 2024, 17:43
Please, what is "with ref parameter"? I don't see such an option for KNLMeansCL.

ChaosKing
8th February 2024, 17:48
Sorry it's called rclip aka reference clip https://github.com/Khanattila/KNLMeansCL/wiki/Filter-description#advanced

eac3to_mod
8th February 2024, 18:00
Thank you. And what would the reference clip be?

ChaosKing
8th February 2024, 18:17
A filtered clip. Can also be overfiltered.

Depends on how good/bad the source is. wref is also interesting:

float wref [default: 1.0]
Amount of original pixel to contribute to the filter output, relative to the weight of the most similar pixel found.

clip rclip [default: not set]
Extra reference clip option to do weighting calculation.
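To make the two options quoted above concrete, here is a toy 1-D spatial non-local means in plain Python. It is only a sketch under assumptions, not KNLMeansCL's actual kernel: the function name, the `radius`, `patch` and `h` parameters, and the `exp(-d / h^2)` weight are illustrative choices. What it tries to show is the idea from the wiki text: similarity weights are computed on the reference clip (`ref`, standing in for rclip), the averaged values still come from the source, and the centre pixel contributes `wref` times the weight of its best match.

```python
import math

def nlmeans_1d(src, ref=None, radius=3, patch=1, h=10.0, wref=1.0):
    """Toy 1-D spatial non-local means (illustrative, not KNLMeansCL code).

    Weights come from patch similarity in `ref` (falls back to `src`),
    but the averaged values are always taken from `src`.
    """
    if ref is None:
        ref = src
    n = len(src)

    def patch_dist(i, j):
        # Mean squared difference between the patches around i and j,
        # with indices clamped at the borders.
        total = 0.0
        for k in range(-patch, patch + 1):
            a = ref[min(max(i + k, 0), n - 1)]
            b = ref[min(max(j + k, 0), n - 1)]
            total += (a - b) ** 2
        return total / (2 * patch + 1)

    out = []
    for i in range(n):
        acc = 0.0    # weighted sum of source samples
        wsum = 0.0   # sum of weights
        wmax = 0.0   # weight of the most similar neighbour
        for j in range(max(i - radius, 0), min(i + radius, n - 1) + 1):
            if j == i:
                continue
            w = math.exp(-patch_dist(i, j) / (h * h))
            acc += w * src[j]
            wsum += w
            wmax = max(wmax, w)
        # The centre sample gets wref times the weight of its best match.
        wc = wref * wmax
        acc += wc * src[i]
        wsum += wc
        # If no neighbour was similar at all, pass the sample through.
        out.append(acc / wsum if wsum > 0 else src[i])
    return out
```

With a noisy step as `src` and a clean (or overfiltered) step as `ref`, the samples on each side of the edge are averaged only with their own side, because the weights are decided by the reference. Setting `wref=0.0` drops the original sample from the average entirely.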

LigH
9th February 2024, 20:37
:thanks:

pinterf and eac3to_mod and everyone keeping free software competitive.

LigH
9th February 2024, 21:00
Sounds like if(a = b) instead of if(a == b) ... happened so many times to me :rolleyes:

tormento
9th February 2024, 21:07
pinterf and eac3to_mod and everyone keeping free software competitive.
There are so many to thank.

ChaosKing
10th February 2024, 00:50
It crashes in vapoursynth / vsedit and avisynth also does not work for me - illegal instruction. I have a Ryzen 5600G + 3070TI

StainlessS
10th February 2024, 03:31
Blankclip
Info

shows the capabilities that Avisynth detected
https://i.postimg.cc/QKDcrBQh/Flags.jpg (https://postimg.cc/QKDcrBQh)

eac3to_mod
10th February 2024, 03:41
Good idea. I will compare to Info() source code.

StainlessS
10th February 2024, 03:43
Prev image was for i7-8700, is similar for i7-12700T,
BUT, for
i5-11400T
https://i.postimg.cc/18zwPndr/flags-i5-11400-T.jpg (https://postimg.cc/18zwPndr)

EDIT: There is supposed to be some kind of hack available to switch on AVX512 for 12th gen i7/i9 CPUs; it may require switching off the E cores in the BIOS firmware.
Switched off E cores in my firmware, no dice, flags same as for i7-8700 again.
EDIT: The AVX512 hack is permanently disabled for 12th gen i7/i9 by Intel.

ChaosKing
10th February 2024, 09:14
I re-downloaded it. Same error in Vdub2:
Avisynth open failure:
System exception - Illegal Instruction

Even if I simply use dgsource() without any path.

tormento
10th February 2024, 09:53
There is supposed to be some kind of hack available to switch on AVX512 for 12 gen i7/i9 CPU, may need to switch off E cores in BIOS firmware
Just for the initial batches, later they laser cut the capability.

guest
10th February 2024, 11:28
Just a thought...

Would it be easier to create one for AVX/AVX2, and a separate one for AVX512 ??

There are others out there that do this.

ChaosKing
10th February 2024, 12:00
Why not check how other plugins solved it? Like here https://github.com/sekrit-twc/zimg/blob/71431815950664f1e11b9ee4e5d4ba23d6d997f1/src/testapp/cpuinfoapp.cpp#L32 (I just randomly searched for avx2)
Or a simpler one here which uses some parts from zimg https://github.com/Irrational-Encoding-Wizardry/descale/blob/8c53f5d1297dee286e5a854ae5731103614a0583/src/x86/cpuinfo_x86.h#L26
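As a rough illustration of what such a CPU check has to decide, here is a small Python sketch. Python cannot execute CPUID itself, so the function takes the raw register values as arguments; in a C/C++ plugin these would typically come from the compiler's CPUID and XGETBV intrinsics. The function name and the tier names are made up for this example, and this is not the code from zimg or descale. The bit positions are the documented ones. The point is that the CPU flag alone is not enough: the OS must also have enabled the YMM/ZMM register state in XCR0.

```python
# Bit positions from the Intel/AMD CPUID documentation.
ECX1_OSXSAVE = 1 << 27   # leaf 1, ECX: OS has enabled XSAVE/XGETBV
ECX1_AVX     = 1 << 28   # leaf 1, ECX: AVX
EBX7_AVX2    = 1 << 5    # leaf 7 subleaf 0, EBX: AVX2
EBX7_AVX512F = 1 << 16   # leaf 7 subleaf 0, EBX: AVX-512 Foundation

XCR0_SSE_AVX = 0x06      # XMM + YMM state enabled by the OS
XCR0_AVX512  = 0xE0      # opmask + ZMM_Hi256 + Hi16_ZMM state

def simd_tier(leaf1_ecx, leaf7_ebx, xcr0):
    """Return 'avx512', 'avx2' or 'baseline' from raw register values.

    Hypothetical helper: the tier names are assumptions for this sketch.
    """
    if not (leaf1_ecx & ECX1_OSXSAVE):
        return "baseline"          # XGETBV not usable, so XCR0 is unknown
    os_ymm = (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX
    has_avx = bool(leaf1_ecx & ECX1_AVX) and os_ymm
    if not (has_avx and leaf7_ebx & EBX7_AVX2):
        return "baseline"
    os_zmm = (xcr0 & XCR0_AVX512) == XCR0_AVX512
    if leaf7_ebx & EBX7_AVX512F and os_zmm:
        return "avx512"
    return "avx2"
```

A plugin could run a check like this once at load time and either pick a code path or fail with a readable message instead of an illegal-instruction exception.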

tormento
10th February 2024, 12:00
To be fair, AVX512 is supported on latest AMD and on Xeon processors only, making it a niche instruction set.

I love Intel compiler but it has a steeper curve than MSVC. I have the best performance results with builds from it.

I would be sad to see AVX2 only, as older processors are the ones that benefit most from GPU offloading.

If you can't cope with the challenge, I will get over it.

Unfortunately that will prevent a lot of people not only from using the newer CUDA filtering (as it was before) but also from updating even the decode part of DGSource, and that would be a pure evil decision.

guest
10th February 2024, 12:03
Sure, I could make separate DLLs but it's just not something I want to do.

My reasoning is that to get the best out of DGDecNV you're going to have a good modern GPU together with a good modern CPU. AVX2 has been around since 2013 and I don't feel like I need to help out luddites that stick to things like Win XP. The luddites can stick with DGDecNV build 251.

I do plan to move this intrinsics code to the GPU at some point but for now if y'all want anything it's going to require AVX2. I'll just wait for any more people to weigh in before I give an AVX2 test build.

EDIT: Ah I see you were more wanting to get the AVX512 support. I thought about that but haven't decided. Once I go down the route of multiple DLLs, I might as well include no-AVX/AVX2. But I prefer not to go there. I'd rather switch to the Intel compiler and do things right.

I realise that AVX/AVX2 has been around for ages, but to put it into a bit more perspective...

I have several older Ryzens (3950X, 5900X, 5950X) that only support AVX2 at best, and I also have a 13th gen 13900KF CPU which is AVX2 only (changed after Gen 12); however, I have a Ryzen 7950X, which does support AVX512.

So for me, it would be a no brainer to have 2 variants.

Just out of curiosity, what is your preferred OS ??

I actually had to Google "luddite's", I like it.

I need to mention, that whatever you do, I probably won't be able to use it with my preferred encoding app (tried it in the past, was too much work, and there were other issues that are out of my hands, it's just the way the app works), and I didn't use KNLMeansCL all that much either.

eac3to_mod
10th February 2024, 12:32
Win 10.

What is the encoding app and why is it too much work? What issues were out of your hands?

tormento
10th February 2024, 12:34
Win 10.
Could you please reup test1? It was working just fine.

tormento
10th February 2024, 12:41
I don't have a build for that as it has evolved into test 2, and I didn't keep the zip file.
Test2 isn't working :D

tormento
10th February 2024, 12:43
Use test 1 if you just want the 33% speed-up.


That is the reason I was asking you.

@all somebody kept that?

guest
10th February 2024, 12:44
Win 10.

What is the encoding app and why is it too much work? What issues were out of your hands?

Good to know, I'm on W11 23H2, I personally just don't understand why so many ppl are still on W7....

Read my "sig"...

It's a L-O-N-G story, but basically, you had to import a new job, then do a fair bit of manual editing, adding & redoing things to get it to encode, but then the best function of the app did not work, so it's just a big waste of time :(

eac3to_mod
10th February 2024, 12:52
It's a L-O-N-G story, but basically, you had to import a new job, then do a fair bit of manual editing, adding & redoing things to get it to encode, but then the best function of the app did not work, so it's just a big waste of time

Well, golly gee, I'd be happy to work with the author to make things integrate more smoothly. It's just a source filter like all the rest, after all. Do you have any links describing the issues? What is the "best function" that did not work?

guest
10th February 2024, 13:16
Well, golly gee, I'd be happy to work with the author to make things integrate more smoothly. It's just a source filter like all the rest, after all. Do you have any links describing the issues? What is the "best function" that did not work?

Well, Curly, the author is, IMO, a "piece of work", he is very set in his ways, and is VERY reluctant to add options that are asked about, repeatedly.

I was involved in a customised build of the app that added many more filtering options and kept it right up to date, but the main behind-the-scenes "guts" of the app are untouchable; it's written on a pretty old platform (can't recall atm).

So if I recall, you had to import a job, then also run that same job thru DGIndex (or whatever you use to get the .dgi file), then manually edit some commands, and hope for the best.

But, the best function of the app is Distributed Encoding, and I'm pretty sure it would not initiate that, so it was just a single PC encode, instead of multiple PC's helping.

I do recall someone, saying that the nature of DGDecNV didn't lend itself to the way that DE worked, it might have been someone from your neck of the woods, DG, that is.

So you'd have to play around with the app for yourself, I'd say.

Like I said, it's a LONG sad story.

eac3to_mod
10th February 2024, 13:34
Thank you for the explanation. With the latest DGDecNV the index is automatically generated so at least that hassle should be avoidable. But if the author is unwilling to do anything to move into the modern era we'll just let it go.

tebasuna51
10th February 2024, 13:40
test3 on a CPU with only AVX2 crashes with:

cuMemAlloc() failed

with:
C:\Portable\Avs\AVSMeter64 test.avs -gpu

test.avs only:

dgsource("test.dgi")

which works without problems with other DGDecodeNV.dll versions

guest
10th February 2024, 13:44
Thank you for the explanation. With the latest DGDecNV the index is automatically generated so at least that hassle should be avoidable. But if the author is unwilling to do anything to move into the modern era we'll just let it go.

Sad, but he's the epitome of a "luddite", anything other than W7, is no good....

tormento
10th February 2024, 14:16
Test3 gives me

System exception - Illegal instruction

again.

eac3to_mod
10th February 2024, 14:36
@tebasuna51

What is your GPU and how much memory does it have?
What is the resolution of the video?

Also your post is not clear. Does it fail with AVSMeter64 without the -gpu option? Does it fail with the -gpu option? Does it fail in VirtualDub?

Hey, we're making progress. It's not crashing with illegal instructions, except for luddites.

BTW, I was able to make a build configuration for AVX512, so management of multiple DLLs will be possible without too much pain. We already do it for 64 versus 32 bit.

tebasuna51
10th February 2024, 14:42
Only a cheap NVIDIA GeForce GT 1030 with 2 GB GDDR5
Video resolution 1920x1040

StainlessS
10th February 2024, 14:46
Quote from recent post in another D9 forum, a little related and maybe of interest.


Now for Intel it's more drastic. Their little cores for Alder Lake were upgraded to support AVX2 and new instructions were added because they couldn't do AVX512 anymore on the whole system. Their large P cores could, but Intel disabled it there.
People wondered when it would return; well, Intel wants to rework AVX for hybrid processors, but it won't support AVX512 for a long time and probably won't support current CPUs.

I've got a Lenovo ThinkCentre M70q Gen 2 Tiny, with i5-11400T (35W TDP) that claims AVX512 (at least partial).
Also, a Dell Optiplex 7000 Micro, with i7-12700T (35W TDP) that professes not to support AVX512 [8 P cores {+ 8 hyperthreaded}, 4 E cores].

I find below a little interesting but have not as yet investigated it.

Software support

Alder Lake requires special support from the operating system due to its relatively unusual-for-x86 hybrid nature. For software unable to be upgraded, a UEFI-provided compatibility mode may be used to disable the E cores; it is enabled by the user turning on scroll lock.[29]

This problem has been fixed in a microcode update. The P and E cores now return the same CPUID when both are enabled. A different CPUID is reported when E cores are disabled and only P cores are enabled. The AVX-512 instruction set extension is implemented in the P cores but disabled due to incompatibility with the E cores.[32] Hackers have shown that it is possible to enable the AVX-512 instructions on the P cores when the E cores are disabled and an old microcode version is used.[33]
https://en.wikipedia.org/wiki/Alder_Lake#Dies

When using MeGUI x64 to encode an Avisynth script, I'm finding that the i7-12700T machine is basically using only the 4 E cores to encode
(well, a few % of the 8 P cores [+ 8 hyperthreaded logical cores] is also used); that's got me a little perplexed.
I'll havta look into it a little more next time I use the i7-12700T machine [I can turn off any number of E cores or P cores in BIOS UEFI Firmware].

EDIT: I'm using Win10 on all machines [I really dont want W11].
EDIT: Long ago, I used to use a program called something like "Prio" to set default Processor Affinity & Priority for certain applications [Addon TAB for TaskManager].
EDIT: This seems to be it, not tested in W10. [EDIT: Supposed to be W7+ compatible].:- https://www.prnwatch.com/prio/
EDIT: This seems to be a similar app to prio, Process Lasso {There is a limited free version}:- https://bitsum.com/
EDIT: 6 Tools to Permanently Set Process Priority in Windows:- https://whatsoftware.com/permanently-set-process-priority-in-windows-task-manager-with-prio/

EDIT: Flags determined by Avisynth for i7-12700T with E cores disabled in BIOS(same as for i7-8700) : No change to flags whether E cores/Scroll-Lock enabled or disabled.

Blankclip
info
[clickMe]
https://i.postimg.cc/5HmN7n44/Flags-i7-12700-T.jpg (https://postimg.cc/5HmN7n44)

And for i5-11400T [Has Avx512 flags]
https://i.postimg.cc/18zwPndr/flags-i5-11400-T.jpg (https://postimg.cc/18zwPndr)

EDIT: i5-12500T does not have AVX512 either, and as it does not have E cores, there does not seem to be any reason for it not to support AVX512 (unless it's to stop users buying an i5 rather than an i7/i9 to get AVX512 without the P/E core hybrid incompatibility problem).

guest
10th February 2024, 14:49
Poor old Curly... he's bending over backwards to get this to work, which it must be for him, on his system, then whenever others use it, it errors due to different OSes and dodgy cheap GPUs, etc.

eac3to_mod
10th February 2024, 15:08
@teba

That GPU should be fine. Please grab the download again as I rebuilt it completely just to be sure.

Fingers crossed!

tormento
10th February 2024, 15:16
I clearly said it requires AVX2. I may support AVX later.
Somewhere I read AVX2 and below and it stuck in my mind.
I found test1 on backup and re-uploaded it.

:thanks:

eac3to_mod
10th February 2024, 15:19
Poor old Curly... he's bending over backwards to get this to work, which it must be for him, on his system, then whenever others use it, it errors due to different OSes and dodgy cheap GPUs, etc.

That's the thing, it works great for me, especially the AVX512 build. i7-11700K

I wouldn't call the 1030 dodgy but I get your point.

StainlessS
10th February 2024, 15:29
I did try on my Lenovo Tiny (i5-11400T) but got platform code 126 (missing requirement),
then realized that none of my 35W TDP machines have discrete GPU, duh.
Works in i7-8700 without probs.

eac3to_mod
10th February 2024, 15:34
Ooh, finally it works for someone else. Happy Day! :p
You scared the poop out of me with your first sentence. Did you do that on purpose? ;)

And thank you for the info you posted previously.

OK, I'll go ahead and make the per-CPU builds and make them available.

ChaosKing
10th February 2024, 15:48
Test3 works for me too. With dn_enable=2 I get a message box: avx512

StainlessS
10th February 2024, 15:54
Did you do that on purpose?
Nope, did not think of it.

How many people will hate me if I require AVX2?
If X people hate you, and Y people will hate you for AVX2 requirement, then would be X + Y people. :)
{EDIT: Assumes that X and Y are distinct sets}
Its your plug, do whatever you feel happiest with, improve as time permits.
The multitude of different instruction sets is a nightmare, don't forget MMX and same era instruction sets, horrible!

EDIT: The "platform code 126 (missing requirement)" thingy might be something else,
I'm not sure that the AVS setup on that machine is fully functional, not used it in a while and changed things about.
It though does not have GPU and so would fail anyways.
The 35W TDP machines are only babies, ~ 7" x 7" x 1.5", are low power, quiet, and with a bunch of them, can just let them go encode till they're done, and main i7-8700 machine free to do other stuff.

EDIT:
Test3 works for me too. With dn_enable=2 I get a message box: avx512
Snap, me gets dat too for i7-8700 {with GTX1070+8GB} without AVX512.

eac3to_mod
10th February 2024, 16:24
@ChaosKing

dn_enable=3, dn_quality="best" works pretty good with the sample you gave me back in early 2021, methinks. I'm a pack rat, I keep everything.

eac3to_mod
10th February 2024, 16:28
@teba

For the avx512, there are lots of different subflags, and different CPUs have different combos of them. I'll have to check specifically for the ones I need. Give me a while. For now grab the AVX2 DLL.
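A small Python sketch of what checking for specific AVX-512 subflags can look like. The bit positions are the documented ones for CPUID leaf 7, subleaf 0, register EBX, but only a handful of subsets are listed here; the helper names and the `required` set in the usage note are assumptions for illustration, since the thread does not say which subsets the filter actually needs.

```python
# CPUID leaf 7, subleaf 0, EBX bit positions for some AVX-512 subsets.
# (Other subsets, e.g. VBMI or VNNI, are reported in ECX and omitted here.)
AVX512_EBX_BITS = {
    "F": 16,     # Foundation
    "DQ": 17,    # Doubleword and Quadword instructions
    "IFMA": 21,  # Integer fused multiply-add
    "CD": 28,    # Conflict detection
    "BW": 30,    # Byte and Word instructions
    "VL": 31,    # Vector length extensions (128/256-bit forms)
}

def avx512_subsets(leaf7_ebx):
    """Return the set of subset names whose bit is set in the EBX value."""
    return {name for name, bit in AVX512_EBX_BITS.items()
            if (leaf7_ebx >> bit) & 1}

def missing_subsets(leaf7_ebx, required):
    """Return the required subsets this CPU does not report, sorted by name."""
    return sorted(set(required) - avx512_subsets(leaf7_ebx))
```

For example, if a build hypothetically needed F, BW and VL, `missing_subsets(ebx, ["F", "BW", "VL"])` returning an empty list would mean the AVX-512 DLL is safe to load on that CPU (provided the OS has also enabled the ZMM register state); anything else means falling back to the AVX2 DLL.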

Your benchmarks look fishy. I'll try here. Also, bench with dn=0 just to see raw frame serving.

EDIT: Hey, where'd the benchmarks go?