Old 28th November 2023, 22:26   #21  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
Quote:
Originally Posted by Selur View Post
Nice! Thanks!
Did some quick tests, seems to work fine in Vapoursynth.
About color space support:
Would be cool if you could also add 10, 16, 32bit support, if possible.
(in detail: RGBS, RGBH, RGB48,RGB30, YUV420P10, YUV420P16, YUV420PS, YUV420PH, YUV444PH, YUV444PS, YUV444P10, YUV444P16)


Cu Selur
Please redownload the 1.11 release. I've fixed it a little: it contained a wrong check in the AviSynth version.
Next I will add RGB32 support.
Old 29th November 2023, 11:49   #22  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,277
Will do. Thanks!
__________________
Hybrid here in the forum, homepage
Old 2nd February 2024, 00:19   #23  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
A new update:
Added a new mode 1 (the other mode numbers have shifted).
Added RGB32, but only for AviSynth (VapourSynth, strangely, does not support RGB32).
Old 3rd February 2024, 20:36   #24  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,277
Using the latest version in Vapoursynth, when using:
Code:
# adjusting color space from YUV420P8 to RGB24 for Softlight
clip = core.resize.Bicubic(clip=clip, format=vs.RGB24, matrix_in_s="470bg", range_s="limited")
# color adjustment using Softlight
clip = core.Argaricolm.Softlight(clip)
I get just a black output, while using:
Code:
# adjusting color space from YUV420P8 to YUV444P8 for Softlight
clip = core.resize.Bicubic(clip=clip, format=vs.YUV444P8, matrix_in_s="470bg", range_s="limited")
# color adjustment using Softlight
clip = core.Argaricolm.Softlight(clip)
works as expected.

Doesn't matter whether I use "CUDA 12.3/SoftLight.dll" or "CUDA 11.8/SoftLight.dll".
=> seems like v1.12 broke RGB24 support for Vapoursynth.


Cu Selur

P.S.: Would you prefer that I post such things here or on GitHub?
Old 14th February 2024, 10:58   #25  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
"Added RGB32. But only for avisynth (vapoursynth strangely does not support RGB32)."

There is no RGB24 support (so far).
And, strangely, VapourSynth does not support RGB32, which should be faster in memory because of 4-byte addressing.
Old 14th February 2024, 20:08   #26  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,277
AFAIK, VapourSynth handles alpha channels in a separate 'stream/clip'.
Old 18th February 2024, 00:59   #27  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
Quote:
Originally Posted by Selur View Post
AFAIK, VapourSynth handles alpha channels in a separate 'stream/clip'.
Well, for RGB32 I just use the R, G and B bytes and skip the alpha one.
So it's just as if I were using RGB24, but with RGB32 addressing.
In AviSynth memory the 4th byte is automatically set to FF (255) when 24-bit (8*3) content is converted to RGB32.

So it's not really correct RGB32 support.
Maybe I need to change it to RGB24.
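
The "RGB24 in RGB32 addressing" idea described above can be sketched in a few lines. This is only an illustration in plain Python, not the plugin's CUDA code: the function name `apply_to_colour_bytes` and the per-byte callback are hypothetical, and the inversion used in the usage line is a placeholder rather than the SoftLight adjustment. Because the same function is applied to all three colour bytes, the byte order inside a pixel does not matter here.

```python
def apply_to_colour_bytes(buffer, func):
    """Apply func to the three colour bytes of every packed 4-byte pixel.

    buffer is a mutable byte sequence (e.g. bytearray) of packed 32-bit
    pixels. The 4th byte of each pixel (alpha) is skipped, so it keeps
    whatever value it already had (255 after a 24-bit to RGB32 conversion).
    """
    if len(buffer) % 4 != 0:
        raise ValueError("packed 32-bit buffer length must be a multiple of 4")
    for offset in range(0, len(buffer), 4):
        for channel in range(3):  # bytes 0..2 hold the colour channels
            value = func(buffer[offset + channel])
            buffer[offset + channel] = value & 0xFF  # keep result in one byte
        # byte 3 (alpha) is deliberately left untouched
    return buffer


# Usage: two pixels, processed with a placeholder per-byte operation.
pixels = bytearray([10, 20, 30, 255, 0, 128, 255, 255])
apply_to_colour_bytes(pixels, lambda v: 255 - v)
```

The stride of 4 is what gives the aligned addressing mentioned earlier in the thread, while the work done per pixel is the same as for 24-bit data.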
Old 18th February 2024, 08:41   #28  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,277
Yeah, sounds like it should be RGB24, not RGB32, if the alpha channel isn't used.
Old 15th March 2024, 22:05   #29  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
Quote:
Originally Posted by Selur View Post
Yeah, sounds like it should be RGB24, not RGB32, if the alpha channel isn't used.
New version 1.13.
RGB now works in VapourSynth. It is planar RGB24.
I've also updated the CUDA toolkit to version 12.4.
Old 16th March 2024, 13:48   #30  |  Link
wonkey_monkey
Formerly davidh*****
 
Join Date: Jan 2004
Posts: 2,496
Am I missing something?

Code:
colorbars(pixel_type="rgb24", width = 3840, height = 2160).softlight
doesn't seem to do anything (same with a real image source). AvsMeter can't time the script because it's too fast, which kind of suggests the filter isn't doing anything. I tried it on two different computers (laptop and desktop) with Nvidia cards.

Does it just pass through the original clip if there is a CUDA issue?

---

Edit: RGB24 interleaved doesn't work; RGB32, YV12 and presumably others do.

Further edit: doesn't seem to work at all on my desktop computer, just returns unaltered clip...

Edit: Having looked at the code there is zero error handling/reporting, even for CUDA failures. You might want to add some!
__________________
My AviSynth filters / I'm the Doctor

Last edited by wonkey_monkey; 16th March 2024 at 16:27.
Old 16th March 2024, 14:12   #31  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,277
RGB24 works in Vapoursynth here.
Old 16th March 2024, 20:12   #32  |  Link
Argaricolm
Registered User
 
Join Date: Apr 2018
Posts: 23
Quote:
Originally Posted by wonkey_monkey View Post
Does it just pass through the original clip if there is a CUDA issue?
It does nothing if CUDA is not supported or the input format is not supported.
Old 16th March 2024, 23:27   #33  |  Link
wonkey_monkey
Formerly davidh*****
 
Join Date: Jan 2004
Posts: 2,496
Quote:
Originally Posted by Argaricolm View Post
It does nothing if CUDA is not supported or the input format is not supported.
Error throwing would be very helpful to avoid confusion, e.g.:

Code:
	if (cudaStatus == cudaSuccess) {
		...
	} else {
		env->ThrowError("SoftLight: CUDA failed");
	}
and similar when none of the conditions in GetFrame are met (although testing should be in the constructor, ideally).
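
The "check in the constructor, fail loudly at runtime" pattern suggested here is language-neutral, so a minimal sketch in Python (used only to keep it short) may help. Everything in it is hypothetical: the class `SoftLightSketch`, the exception `FilterError`, the `SUPPORTED_FORMATS` set and the `backend_available` flag are illustrative names, not part of the plugin, AviSynth or CUDA.

```python
# Hypothetical list of accepted formats, for illustration only.
SUPPORTED_FORMATS = {"YV12", "YV24", "RGB32"}


class FilterError(Exception):
    """Raised instead of silently passing the clip through."""


class SoftLightSketch:
    def __init__(self, pixel_format, backend_available):
        # Validate once, when the filter is created, not on every frame.
        if pixel_format not in SUPPORTED_FORMATS:
            raise FilterError(f"SoftLight: unsupported input format {pixel_format}")
        if not backend_available:
            raise FilterError("SoftLight: no usable GPU backend found")
        self.pixel_format = pixel_format

    def get_frame(self, frame, process):
        # process stands in for the GPU call; any failure is reported
        # to the user rather than hidden behind an unaltered frame.
        try:
            return process(frame)
        except RuntimeError as exc:
            raise FilterError(f"SoftLight: processing failed: {exc}") from exc
```

With this shape, a script using an unsupported format stops with a readable message as soon as the filter is constructed, instead of returning the source clip unchanged.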

Going back in time a little:

Quote:
Originally Posted by Argaricolm View Post
Possible on the CPU too, but it will run at only a few fps, while the CUDA version is 10x faster.
I've investigated my scepticism and, although it will obviously vary depending on CPU, GPU, colourspace, etc., for a YV12 input I found the CUDA filter to be only 1.4x-1.7x faster than CPU AVX code implementing mode=3 (pegtop).

For interleaved RGB input, AVX code was 1.3x faster than CUDA, even with AviSynth+ colourspace conversion overheads. Multithreading might give another 25%-50% boost.
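
For readers unfamiliar with the name: "pegtop" usually refers to the Pegtop soft-light blend formula, f(a, b) = (1 - 2b)a² + 2ba, for base value a and blend value b normalized to [0, 1]. The sketch below shows that commonly cited formula only; whether the plugin's mode=3 applies it in exactly this form, and what it uses as the blend value, is an assumption not confirmed in this thread.

```python
def softlight_pegtop(a, b):
    """Pegtop soft light for normalized values in [0, 1].

    a is the base value, b is the blend value. b = 0.5 leaves a unchanged,
    b = 0 darkens to a*a, and b = 1 brightens to 1 - (1 - a)**2.
    """
    return (1.0 - 2.0 * b) * a * a + 2.0 * b * a


def softlight_pegtop_8bit(base, blend):
    """Same formula for 8-bit samples (0..255), rounded to the nearest integer."""
    result = softlight_pegtop(base / 255.0, blend / 255.0)
    return min(255, max(0, round(result * 255.0)))
```

The per-sample cost is a handful of multiplies and adds, which is consistent with the point above that a vectorized CPU version can stay close to the GPU once transfer overheads are included.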

Last edited by wonkey_monkey; 16th March 2024 at 23:37.
Old 17th March 2024, 07:08   #34  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,060
Quote:
Originally Posted by Argaricolm View Post
Because summing each frame on CPU is very slow.
There are several possible ways to sum all samples of a frame with SIMD:

1. Sum as integers. This requires unpacking the 8..16-bit samples to 32 bits and then using standard full-width SIMD additions, multiplied by the superscalar factor of the add dispatch ports, first over all samples of a line.

It looks like a 32-bit integer cannot hold the sum of a UHD frame (number of samples * sample values of 256..65535) without overflow, so it is possible to divide the intermediate sum of each line and accumulate the normalized sums of all lines of the frame. This is more complex to program than float32 processing, but may be noticeably faster for SD 8-bit and some HD frame sizes.

2. Unpack and convert to float32, and perform everything from method 1 in the float32 domain.
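
The per-line normalization from method 1 can be sketched in scalar form as follows. This is a plain Python illustration of the accumulation order only, not SIMD code: `frame_mean_per_line` is a hypothetical name, the built-in `sum` stands in for the vectorized integer sum of one line, and rows are assumed to have equal width.

```python
UINT32_MAX = 2**32 - 1


def frame_mean_per_line(rows):
    """Average sample value, accumulating one normalized sum per line.

    rows is a list of equal-length lists of integer samples.
    """
    if not rows or not rows[0]:
        raise ValueError("frame must contain at least one sample")
    line_means_total = 0.0
    for row in rows:
        line_sum = sum(row)  # stands in for the SIMD integer sum of one line
        # Only a single line has to fit into the 32-bit integer accumulator.
        if line_sum > UINT32_MAX:
            raise OverflowError("line sum exceeds 32 bits")
        line_means_total += line_sum / len(row)  # normalize before accumulating
    return line_means_total / len(rows)
```

Even a 3840-sample line of 16-bit maxima sums to 251,654,400, far below the 32-bit limit, so the integer part never overflows and only one division per line is needed.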

So the best-performing implementation can contain different processing engines for different frame sizes. At least 1920x1080 at 8 bits can still be processed with full-frame integer summing without overflowing a 32-bit accumulator. Also, with SIMD-word summing the programmer ends up with partial sums in the final SIMD word anyway, ready for partial normalization, which gives some extra overflow protection (an AVX2 SIMD word of eight 32-bit integers provides an additional 3 bits before overflow, so the total capacity is 32+3=35 bits) without significant precision loss.
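
The headroom figures can be checked with a little integer arithmetic. The sketch below computes the worst-case sum of one plane (every sample at its maximum) and compares its bit length with the accumulator capacity; the function names are hypothetical, and the lane model assumes samples are spread evenly over a power-of-two number of lanes.

```python
def worst_case_sum(width, height, bit_depth):
    """Largest possible sum of one plane: every sample at its maximum value."""
    return width * height * (2**bit_depth - 1)


def fits(width, height, bit_depth, accumulator_bits=32, lanes=1):
    """True if the worst-case sum fits when split evenly over `lanes` accumulators.

    Each doubling of the lane count adds one bit of total capacity,
    so 8 lanes of 32 bits give 32 + 3 = 35 bits.
    """
    extra_bits = (lanes - 1).bit_length()  # log2(lanes) for powers of two
    needed_bits = worst_case_sum(width, height, bit_depth).bit_length()
    return needed_bits <= accumulator_bits + extra_bits
```

Under these assumptions, 1920x1080 at 8 bits needs 29 bits and fits a single 32-bit accumulator. 3840x2160 at 10 bits needs 33 bits, so it overflows one accumulator but fits the 35 bits of eight lanes. 3840x2160 at 16 bits needs 39 bits and requires the per-line normalization or float32. By the same arithmetic a single 8-bit 3840x2160 plane needs 31 bits, so the overflow concern mainly applies to higher bit depths.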

Method 2 can process any frame size in a single engine, but is expected to be slower at non-UHD frame sizes.

CPU SIMD is not very slow, but the algorithm requires at least 2 full-frame passes: first an analysis pass for the sum, then a correction pass for the adjustment. Performance will therefore depend on the frame fitting into the available CPU caches (our lovely Xeon MAX with onboard HBM would be a nice performer here).

Last edited by DTL; 17th March 2024 at 07:12.