AVS Softlight [Archive] - Doom9's Forum

Argaricolm

25th June 2023, 00:21

Example (https://imgsli.com/MjIyMzA1)
Brightness example (https://imgsli.com/MjM3MTEx)

A CUDA implementation of the softlight negative-average blend.

Plugin is x64 (CUDA toolkit 12.8 & 11.8)

You may have seen YouTube videos about removing a color cast using Photoshop's softlight blend with the negative average. This is a CUDA implementation of that technique that processes every frame.
Input should be in PC color range (output will be too)! Use modes 8 & 9 to convert to full range and back.
I also suggest removing noise from the input before processing.

Parameters:

Softlight(mode, formula, skipblack, yuvin, yuvout, rangemin, rangemax, changerange)

All parameters are optional.

mode = 0-12 (0 is default)

Can be used like this: Softlight() same as Softlight(0)

Mode 0 (default):

YUV->RGB conversion
Calculates the sum of all pixels in each of the R, G, B planes.
Gets the average from these sums (sum / number of pixels).
Gets the negative of this average (255 - average).
Applies a softlight blend of each plane with the negative above. After this step the result is the same as what Photoshop produces, but the brightness of the frame has changed. To keep brightness intact we need to restore the original; that is what the remaining steps do.
We get the HSV planes: the V plane from the original image (RGB => V), and H and S from the softlight result. Then we do HS(changed) + V(original) -> RGB -> YUV.
So the first mode neutralizes only the colors (hue + saturation) of the frame and not the brightness (value).
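The mode 0 steps above can be sketched on the CPU in a few lines of Python. This is only an illustration, not the plugin's CUDA code: it assumes pixels are already converted to RGB and normalized to 0..1, uses the pegtop formula, and relies on the standard `colorsys` module for HSV. The names `pegtop` and `mode0_sketch` are hypothetical.

```python
import colorsys

def pegtop(a, b):
    """Pegtop soft light; a = base pixel, b = blend layer, both in 0..1."""
    return (1.0 - 2.0 * b) * a * a + 2.0 * b * a

def mode0_sketch(pixels):
    """pixels: list of (r, g, b) floats in 0..1 (assumed already converted from YUV)."""
    n = len(pixels)
    # Per-channel average over the frame, then its negative (1 - avg in 0..1 terms).
    neg = [1.0 - sum(p[c] for p in pixels) / n for c in range(3)]
    out = []
    for p in pixels:
        # Softlight each channel with the negative average of that channel.
        blended = [pegtop(p[c], neg[c]) for c in range(3)]
        # Keep H and S of the blended pixel, restore V from the original pixel.
        h, s, _ = colorsys.rgb_to_hsv(*blended)
        v = max(p)  # HSV value of the original pixel
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out
```

On a frame with a red cast, for example `[(0.8, 0.4, 0.4), (0.6, 0.2, 0.2)]`, the red channel keeps its value while green and blue are pulled up, so the cast is reduced without changing brightness. A neutral gray frame passes through unchanged.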

Also keep in mind that it is better to remove any black bars from the video for correct processing; otherwise they will affect the average.

1 mode: Same as mode 0 but planes S & V restored to their original values. So this mode only normalizes lightness/brightness and does not change colors.

2 mode: Same as mode 0, but plane S is also boosted (softlight is done for each pixel with itself). So it neutralizes colors and boosts contrast.

3 mode: Same as mode 0 but without brightness restoration. Use it if you also want the brightness averaged (makes dark frames brighter).

4 mode: Same as mode 3, but each of the RGB planes is boosted using softlight (contrast boost).

5 mode: YUV->RGB->softlight each RGB plane with itself->YUV (color/contrast boost).

6 mode: YUV->RGB->HSV->boost S->RGB->YUV (boost saturation).

7 mode: Limited color range clamping. Some videos with limited color range contain values < 16 and > 235. This mode changes them to 16 & 235. This mode is not needed after mode 9.

8 mode: TV to PC color range conversion (use it on videos where you see no true black, only grays). You can also check the video using the ShowChannels plugin (if the minimum in Y is 16 or 15, your source is in limited range). You can change the input levels using the rangemin & rangemax params. They are used only in this mode. If they are not specified or are wrong (rangemin>=rangemax), the defaults will be used.

9 mode: PC to TV color range conversion

10 mode: Grayscale.

For RGB32 - this mode uses RGB -> YUV444 -> RGB cuda conversion. U & V planes are set to 128 (and 512 on 10 bit).

For YUV - just U & V planes are set to 128 (or 512 for 10bit) without cuda.

11 mode: OETF function is applied to each pixel.

12 mode: EOTF function is applied to each pixel.

You can use 3 different softlight formulas:

formula = 0,1,2

0 - pegtop

1 - illusions.hu

2 - W3C

In my opinion the pegtop formula is the best.

Also mode 1 & mode 3 are my favourite.

Photoshop formula was removed because of discontinuity of local contrast.

Formulas are explained here: https://en.wikipedia.org/wiki/Blend_modes
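For reference, the three formulas listed above can be written out as follows, following the definitions on the linked Wikipedia page. This is a sketch on normalized 0..1 values, with `a` as the base (the frame pixel) and `b` as the blend layer (here, the negative average); the function names are made up for illustration and are not the plugin's.

```python
import math

def softlight_pegtop(a, b):
    # Pegtop: smooth for all b, no branch needed.
    return (1 - 2 * b) * a * a + 2 * b * a

def softlight_illusions(a, b):
    # illusions.hu: gamma-style curve, exponent 2^(2*(0.5 - b)).
    return a ** (2 ** (2 * (0.5 - b)))

def softlight_w3c(a, b):
    # W3C compositing spec: piecewise definition.
    if b <= 0.5:
        return a - (1 - 2 * b) * a * (1 - a)
    g = ((16 * a - 12) * a + 4) * a if a <= 0.25 else math.sqrt(a)
    return a + (2 * b - 1) * (g - a)
```

All three leave the base unchanged when the blend layer is mid-gray (`b = 0.5`), darken it for `b < 0.5`, and brighten it for `b > 0.5`. This is why a frame whose channel average is already 0.5 is not shifted.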

rangemin & rangemax: These parameters are used for TV2PC color range conversion. If not specified, the defaults 16-235 (8 bit) and 64-963 (10 bit) will be used.
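A minimal sketch of what such a range conversion does, assuming a plain linear stretch on 8-bit luma values with clamping. The helper names `tv_to_pc` and `pc_to_tv` are hypothetical, and the plugin's exact rounding and chroma handling may differ.

```python
def tv_to_pc(value, rangemin=16, rangemax=235, peak=255):
    """Stretch rangemin..rangemax to 0..peak (the mode 8 idea), clamping outliers."""
    if rangemin >= rangemax:
        # Invalid pair: fall back to the defaults, as the post describes.
        rangemin, rangemax = 16, 235
    scaled = (value - rangemin) * peak / (rangemax - rangemin)
    return min(max(round(scaled), 0), peak)

def pc_to_tv(value, rangemin=16, rangemax=235, peak=255):
    """Compress 0..peak back into rangemin..rangemax (the mode 9 idea)."""
    return round(rangemin + value * (rangemax - rangemin) / peak)
```

With the defaults, 16 maps to 0 and 235 maps to 255; values below 16 or above 235 are clamped to the ends of the full range.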

changerange: Previously named "fullrange". Not only the name was changed; it now works differently. By default it is 0. When 0, it treats YUV as limited range and RGB as full range. This means that YUV is reranged to full before processing and RGB is not reranged. If it is 1, then YUV will not be reranged and RGB will be reranged. So, for example, if your source is RGB but in limited range (which is not normal) you should do:

softlight(3,changerange=1).

This will rerange RGB to full range, process it, and rerange it back to limited. But if your RGB source is normal (full range), then

softlight(3)

will not rerange anything.
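The changerange rule described above boils down to a small truth table. The sketch below only restates that rule; the function name is hypothetical, and treating every non-zero value like 1 is an assumption.

```python
def reranges_internally(is_yuv, changerange=0):
    """True if the filter expands to full range before processing
    and compresses back to limited afterwards."""
    if changerange == 0:
        return is_yuv       # default: YUV is assumed limited, RGB is assumed full
    return not is_yuv       # changerange=1: the assumption is flipped

# Print the four combinations as a small table.
for is_yuv in (True, False):
    for cr in (0, 1):
        kind = "YUV" if is_yuv else "RGB"
        print(kind, "changerange=%d" % cr, "->", reranges_internally(is_yuv, cr))
```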

Example for range conversion outside for YUV source:

softlight(8) - we change YUV to full range

softlight(3,changerange=1) - we process "not normal" YUV without reranging it

softlight(9) - we make it back to "normal" limited range YUV

This will be slower than when the conversion is done inside, because the data will go back and forth between RAM and VRAM with each mode call.

Usage in AviSynth:

Softlight() same as SoftLight(0,0,0) same as SoftLight(mode=0,formula=0,skipblack=0,yuvin=0,yuvout=0)

Usage in VapourSynth:

video = core.Argaricolm.Softlight(video) or core.Argaricolm.Softlight(video,mode=0,formula=0,skipblack=0)

The skipblack option is a new enhancement for the average calculation. By default skipblack = 0, which means it is activated.

To disable it, set it to any non-zero value (like 1).

It counts how many plane (channel) elements are zero; these are then excluded from the average calculation.

Example:

Original: (1 + 2 + 0) / 3 = 1 average

With skipblack enabled: (1 + 2 + 0) / 2 = 1.5 average
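The same arithmetic as a small Python sketch. The function name is hypothetical; only the `skipblack` semantics (0 = enabled) are taken from the description above, and returning 0.0 for an all-black plane is an assumption.

```python
def plane_average(samples, skipblack=0):
    """Average of one plane; with skipblack == 0 (the default) zero samples
    are left out of the divisor."""
    if skipblack == 0:
        count = sum(1 for s in samples if s != 0)
    else:
        count = len(samples)
    # Guard against an all-black plane, where nothing is counted.
    return sum(samples) / count if count else 0.0

print(plane_average([1, 2, 0]))               # skipblack active
print(plane_average([1, 2, 0], skipblack=1))  # skipblack disabled
```

This is why black bars matter less with skipblack active: pure-zero samples no longer drag the average down.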

Color modes supported so far:

Avisynth:

Planar YUV 420 8 bit and 10 bit (YUV420P8, YUV420P10)
Planar YUV 444 8 bit and 10 bit (YUV444P8, YUV444P10)
Non-planar RGB32 (BGR32) - this is the default you get by using ConvertToRGB() or ConvertToRGB32()
Planar RGB 8 bit and 10 bit (you get it by using ConvertToPlanarRGB())
Same for VapourSynth except BGR32 (Fredrik "asked" not to implement it in VapourSynth plugins)

The yuvin & yuvout options are used for modes where yuv <-> rgb conversion is used; they define the formula used for decode and encode.

0 = Default is Rec.709.

Or you can select 601, 709, 2020. Like Softlight(yuvin=601,yuvout=601)

About OETF & EOTF functions.

They were added just to play with. EOTF is the inverse of OETF.

Try the OETF function when your source has been converted to PC range. To convert to PC range use Softlight(8). If the result after OETF lacks contrast, then try raising the black level above 16, like so:

Softlight(8, rangemin=16, rangemax=235)
Softlight(11)

Download at github (https://github.com/ArturAlekseev/AVS_SoftLight/releases)

StainlessS

25th June 2023, 10:29

Would not do any harm to post a few Mode before/after example images.

Postimages.org allows to embed images in your post, without needing Postimages.org account
(and dont need to wait for mods approval)

Postimages.org:- https://postimages.org/
Use, "thumbnail" or "image" for forum, modes. [copies url to clipboard, just paste in your post]

EDIT: If you do post images, I'll try remember to delete this post.

Selur

25th June 2023, 14:12

@StainlessS: here's an example: https://imgsli.com/MTg4MTEz
script used:
ClearAutoloadDirs()
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)
LoadPlugin("F:\Hybrid\64bit\Avisynth\avisynthPlugins\LSMASHSource.dll")
Import("F:\Hybrid\64bit\Avisynth\avisynthPlugins\mtmodes.avsi")
LoadPlugin("c:\Users\Selur\Desktop\Softlight.dll")
# loading source: G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4
# color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
LWLibavVideoSource("G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4",cache=false,format="YUV420P8", prefer_hw=0)

org=last
Softlight(mode=X)

Interleave(org.Subtitle("Original"), last.Subtitle("Softlight(mode=X)"))
# current resolution: 1920x1080
PreFetch(16)
# output: color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
return last

@Argaricolm: Any plans for a Vapoursynth version?
Any plans to also allow RGB input and high bit depth support?

Cu Selur

StainlessS

25th June 2023, 18:33

Cheers Selur, nice comparison method.

Argaricolm

25th June 2023, 19:08

@Argaricolm: Any plans for a Vapoursynth version?
Any plans to also allow RGB input and high bit depth support?

Cu Selur

RGB input is easy. I can do it fast.
Vapoursynth - never compiled for it. If much needed I can do it.
For high bit depth I'm not sure. If I am able to change the softlight code for it, then it is possible.

Selur

25th June 2023, 19:36

More supported color spaces are always better, since it gives more freedom.
Vapoursynth would be great, since I mainly use Vapoursynth. https://forum.doom9.org/showthread.php?t=182961 might help with supporting both Avisynth and Vapoursynth.

Cu Selur

tormento

29th June 2023, 18:43

Am I the only one who doesn't understand what it does? :)

Argaricolm

29th June 2023, 23:03

Am I the only one who doesn't understand what it does? :)

It does this (https://www.youtube.com/watch?v=m5V2zuhGr4U). But it does not change brightness.

Frank62

30th June 2023, 11:24

Interesting. Has this something to do with what they did with the colours on "Moby Dick"? Or later with the BluRay version of "French Connection"?

Argaricolm

1st July 2023, 21:23

Interesting. Has this something to do with what they did with the colours on "Moby Dick"? Or later with the BluRay version of "French Connection"?

Don't know. I've had the idea to do it with video. Possibly someone else had the same idea. But it's possible only using CUDA/GPU. Because summing each frame on CPU is very slow.

wonkey_monkey

2nd July 2023, 16:26

But it's possible only using CUDA/GPU. Because summing each frame on CPU is very slow.

How are you defining "possible" and "very slow"?

Argaricolm

3rd July 2023, 20:11

How are you defining "possible" and "very slow"?

Possible on CPU too. But it will be around some fps. While cuda version 10x more faster.

StainlessS

3rd July 2023, 23:27

Because summing each frame on CPU is very slow.
Any good for plain C version ? [should be ok for 8K+, req slight mods for 16 bit, probably dont need altscan, or crop coords]

double __cdecl PVF_AverageLuma_Planar(const PVideoFrame &src,const int xx,const int yy,const int ww,const int hh,const bool altscan) {
    const int ystep = (altscan) ? 2:1;
    const int pitch = src->GetPitch(PLANAR_Y);
    const int ystride = pitch*ystep;
    const BYTE *srcp = src->GetReadPtr(PLANAR_Y) + (yy * pitch) + xx;
    __int64 acc = 0;
    unsigned int sum = 0;
    const int yhit = (altscan) ? (hh +1)>>1 : hh;
    const unsigned int Pixels = (ww * yhit);

    if(ww == 1) { // Special case for single pixel width
        for(int y=yhit ; --y>=0;) {
            sum += srcp[0];
            srcp+= ystride;
        }
    } else {
        const int eodd = (ww & 0x0F);
        const int wm16 = ww - eodd;
        for(int y=yhit; --y>=0 ;) {
            switch(eodd) { // leftover pixels at end of row, cases fall through on purpose
            case 15: sum += srcp[wm16+14];
            case 14: sum += srcp[wm16+13];
            case 13: sum += srcp[wm16+12];
            case 12: sum += srcp[wm16+11];
            case 11: sum += srcp[wm16+10];
            case 10: sum += srcp[wm16+9];
            case 9: sum += srcp[wm16+8];
            case 8: sum += srcp[wm16+7];
            case 7: sum += srcp[wm16+6];
            case 6: sum += srcp[wm16+5];
            case 5: sum += srcp[wm16+4];
            case 4: sum += srcp[wm16+3];
            case 3: sum += srcp[wm16+2];
            case 2: sum += srcp[wm16+1];
            case 1: sum += srcp[wm16+0];
            case 0: ;
            }
            for(int x=wm16; (x-=16)>=0 ; ) {
                sum += (
                    srcp[x+15] +
                    srcp[x+14] +
                    srcp[x+13] +
                    srcp[x+12] +
                    srcp[x+11] +
                    srcp[x+10] +
                    srcp[x+ 9] +
                    srcp[x+ 8] +
                    srcp[x+ 7] +
                    srcp[x+ 6] +
                    srcp[x+ 5] +
                    srcp[x+ 4] +
                    srcp[x+ 3] +
                    srcp[x+ 2] +
                    srcp[x+ 1] +
                    srcp[x+ 0]
                );
            }
            if(sum & 0x80000000) {acc += sum;sum=0;} // avoid possibility of overflow
            srcp += ystride;
        }
    }

    acc += sum;
    double dacc = double(acc);
    return dacc / Pixels;
}

EDIT: From RT_Stats, v2.0 Beta 13 [8 bit CS only].
Not that slow really. Similar method for other colorspace in "PVF_ ... " files.

EDIT: The switch stuff only accounts for ww pixel width, does not take srcp memory alignment stuff into account, so could be improved
to better use compiler vectorization type stuff {probably require additional switch thingy for end cases}.
If always full frame {no coords}, then could take some shortcuts. {Avisynth+ frames LHS always aligned, not so for Avs standard 'in place' cropping}

EDIT: Might be handy, [from here:- https://forum.doom9.org/showthread.php?p=1935661#post1935661 ]

Function PitchTortureTest(clip c) { # IanB:- https://forum.doom9.org/showthread.php?p=1628159#post1628159
    c
    A=SelectEvery(4, 0)
    B=SelectEvery(4, 1).AddBorders(0,0,8,0).Crop(0,0,-8,0)
    C=SelectEvery(4, 2).AddBorders(0,0,16,0).Crop(0,0,-16,0)
    D=SelectEvery(4, 3).AddBorders(2,0,22,0).Crop(2,0,-22,0)
    Interleave(A,B,C,D)
}

Could probably be improved if modified to take cropping granularity of colorspace into account.

EDIT: from OP,
1. YUV->RGB conversion
2. Calculates sums of all pixels in R,G,B planes (for each).
3. Get average from these sums (sum / number of pixels).
4. Get negative from this sum (255 - sum)
Check intent of -ve method.

From posted link for IanB thingy thread, here:- https://forum.doom9.org/showthread.php?p=1935616#post1935616

MyAverage, v2.6+

A simple average filter for Avisynth v2.60 standard colorspaces, only.

Returns a clip where each return frame is a single color average of input frame, same size and colorspace as input.
Does an invert on result if Bool Invert==true.

ColorSpace, YV12, YV16, YV24, YV411, Y8, YUY2, RGB24, RGB32, only.

Return clip Y, U and V, or R, G and B, will be channel averages, unless Invert==True, where channels averages will be inverted.

MyAverage(clip c, Bool "Invert"=false,Bool "TV_YUV"=False,Bool "MyYV24"=False)

Invert, Default false == sampled average. Otherwise Inverted average.
TV_YUV, Default false, If True(And YUV), then photo negative invert around TV levels mid Y(125.5), rather than 127.5.
MyYV24, If true and YV24 (only YV24), then process Y,U,V, together, else by planes.

Returns clip same colorspace and size as input.

if(invert) { // invert ?
    if(tvy) { // tv levels invert ? [ TV levels center is 125.5 not 127.5, ie (16 + 235)/2 ]
        ave = int(-(ave_D - 125.5) + 125.5 + 0.5); // TV_YUV Y mid = 125.5, invert, and Round
    } else {
        ave = int(ave_D + 0.5) ^ 0xFF; // PC_YUV Y mid = 127.5, symmetrical about 127.5 [EDIT: ADDED, or ave = 255 - int(ave_D + 0.5)]
    }
    aveU ^= 0xFF ;
    aveV ^= 0xFF;
} else {

Also
In this snippet from posted source,

if(invert) { // invert ?
    if(tvy) { // tv levels invert ? [ TV levels center is 125.5 not 127.5, ie (16 + 235)/2 ]
        ave = int(-(ave_D - 125.5) + 125.5 + 0.5); // TV_YUV Y mid = 125.5, invert, and Round
    } else {
        ave = int(ave_D + 0.5) ^ 0xFF; // PC_YUV Y mid = 127.5, symmetrical about 127.5
    }
    aveU ^= 0xFF ;
    aveV ^= 0xFF;
} else {
    ave = int(ave_D + 0.5);
}
ave = max( min( ave, 255) ,0);

Stuff in BLUE ain't exactly correct, xor with $FF would invert $80 U,V center to $7F,
but avoids problem where source U,V == 0 would invert to $100, we invert to $FF in 8 bit range,
also method adopted will arrive back to original source value if inverted twice.
As it is, it aint quite right but method I chose. Maybe I should invert and clip, there should be no source of $00 anyway.
EDIT: Source $00 equivalent to center - 128, and source $FF equiv to center + 127, ie not symmetrical about center 128.
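The trade-off described above can be checked numerically. The sketch below compares the XOR inversion with a strict mirror about the chroma center 128, and the TV-levels luma inversion about 125.5. The function names are made up for illustration; only the formulas come from the snippet.

```python
def invert_xor(v):
    """8-bit inversion via XOR, identical to 255 - v (mirror about 127.5)."""
    return v ^ 0xFF

def invert_about_128(v):
    """Strict mirror about the chroma center 128, clipped to the 8-bit range."""
    return min(max(256 - v, 0), 255)

def invert_tv_luma(ave):
    """TV-levels luma inversion about 125.5, rounded as in the snippet."""
    return int(-(ave - 125.5) + 125.5 + 0.5)

# XOR moves the neutral chroma value 128 to 127, but always round-trips.
print(invert_xor(128), invert_xor(invert_xor(0)))
# The strict mirror keeps 128 fixed, but 0 -> 255 -> 1 does not round-trip.
print(invert_about_128(128), invert_about_128(invert_about_128(0)))
# TV luma: 16 and 235 swap places.
print(invert_tv_luma(16), invert_tv_luma(235))
```

So the XOR variant is off by one at the chroma center but is lossless when applied twice, while the "invert and clip" variant is exact at the center and lossy only for a source value of 0.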

StainlessS

4th July 2023, 03:45

Further to above,

W = 3 * 1280 # 3840
H = 3 * 720 # 2160
STATICFRAMES = False
SECONDS = 5 * 60
###
FRAMES = Round(29.97 * SECONDS)

# SECONDS seconds @ 29.97 FPS. STATICFRAMES, If set to false, generate all frames. Default true (one static frame is served)
Colorbars(Width=W,Height=H,pixel_type="YV12",staticframes=STATICFRAMES).Trim(0,-FRAMES)

# Comment one of below out
Return Scriptclip("AverageLuma() return last") # Avs+ builtin. Always full frame.
#Return Scriptclip("RT_AverageLuma() return last") # RT_Stats, RT_AverageLuma. Used to be faster than AVS 2.60 Standard.

4K : AVS+ AverageLuma : STATICFRAMES = FALSE

c:\Z>avsmeter64 test.avs

AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)

Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985

Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 554.7 | 890.6 | 843.1
Process memory usage (max): 104 MiB
Thread count: 16
CPU usage (average): 7.9%

Time (elapsed): 00:00:10.664

4K : RT_AverageLuma : STATICFRAMES = FALSE

c:\Z>avsmeter64 test.avs

AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)

Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985

Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 220.2 | 382.7 | 329.0
Process memory usage (max): 104 MiB
Thread count: 14
CPU usage (average): 8.0%

Time (elapsed): 00:00:27.328

4K : AVS+ AverageLuma : STATICFRAMES = TRUE

c:\Z>avsmeter64 test.avs

AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)

Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985

Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 2335 | 3244 | 3082
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 6.9%

Time (elapsed): 00:00:02.917

4K : RT_AverageLuma : STATICFRAMES = TRUE

c:\Z>avsmeter64 test.avs

AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)

Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985

Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 511.8 | 859.5 | 801.5
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 7.9%

Time (elapsed): 00:00:11.218

Clearly, Pinterf did some improvements to Avs+ AverageLuma,
Suggest steal some of his code [I will not tell him].

Also note, Scriptclip would slow it quite a bit compared with GetFrame() in plugin.
We did not assign AverageLuma to variable in scriptclip, as that would likely greatly affect results.

However, FPS for the RT_AverageLuma aint so very bad for 4K, and you would likely want pure C version anyway.

EDIT: Numbers above on i7-8700.
[No Prefetch {also Scriptclip single thread}, so I assume fully single core numbers]
EDIT: Yep, Resource meter seems to show single core in use.

Selur

13th July 2023, 13:49

Vapoursynth - never compiled for it. If much needed I can do it.
:) A Vapoursynth version would be really nice.

Selur

10th September 2023, 14:25

btw. since yuv->rgb is used internally: How about also supporting RGB input?

Argaricolm

25th November 2023, 03:36

btw. since yuv->rgb is used internally: How about also supporting RGB input?

I'm planning to add vapoursynth, YUV444, RGB support soon.
So far a new release.

Selur

25th November 2023, 08:59

Thanks! Looking forward to it!

Argaricolm

28th November 2023, 00:42

Added YUV 444 support and VapourSynth support. And a small bugfix.

Selur

28th November 2023, 19:28

Nice! Thanks!
Did some quick tests, seems to work fine in Vapoursynth.
About color space support:
Would be cool if you could also add 10, 16, 32bit support, if possible. :D
(in detail: RGBS, RGBH, RGB48,RGB30, YUV420P10, YUV420P16, YUV420PS, YUV420PH, YUV444PH, YUV444PS, YUV444P10, YUV444P16)

Cu Selur

Argaricolm

28th November 2023, 22:26

Nice! Thanks!
Did some quick tests, seems to work fine in Vapoursynth.
About color space support:
Would be cool if you could also add 10, 16, 32bit support, if possible. :D
(in detail: RGBS, RGBH, RGB48,RGB30, YUV420P10, YUV420P16, YUV420PS, YUV420PH, YUV444PH, YUV444PS, YUV444P10, YUV444P16)

Cu Selur

Redownload the 1.11 release. I've fixed it a little. It contained a wrong check in the avisynth version.
Next I will add RGB32 support.

Selur

29th November 2023, 11:49

Will do. Thanks!

Argaricolm

2nd February 2024, 00:19

A new update.
Added new mode 1 (other numbers changed).
Added RGB32. But only for avisynth (vapoursynth strangely does not support RGB32).

Selur

3rd February 2024, 20:36

Using the latest version in Vapoursynth, when using:
# adjusting color space from YUV420P8 to RGB24 for Softlight
clip = core.resize.Bicubic(clip=clip, format=vs.RGB24, matrix_in_s="470bg", range_s="limited")
# color adjustment using Softlight
clip = core.Argaricolm.Softlight(clip)
I get just a black output, while using:
# adjusting color space from YUV420P8 to YUV444P8 for Softlight
clip = core.resize.Bicubic(clip=clip, format=vs.YUV444P8, matrix_in_s="470bg", range_s="limited")
# color adjustment using Softlight
clip = core.Argaricolm.Softlight(clip)
works as expected.

Doesn't matter whether I use "CUDA 12.3/SoftLight.dll" or "CUDA 11.8/SoftLight.dll".
=> seems like v1.12 broke RGB24 support for Vapoursynth.

Cu Selur

Ps.: Do you prefer if I post such stuff here or on github?

Argaricolm

14th February 2024, 10:58

"Added RGB32. But only for avisynth (vapoursynth strangely does not support RGB32)."

There is no RGB24 support (so far).
And strangely vapoursynth does not support RGB32. That should be faster in memory because of 4 bytes addressing.

Selur

14th February 2024, 20:08

afaik. Vapoursynth handles alpha channels in a separate 'stream/clip'.

Argaricolm

18th February 2024, 00:59

afaik. Vapoursynth handles alpha channels in a separate 'stream/clip'.

Well, for RGB32 I just use the R,G,B bytes and skip the alpha one.
So it's just like using RGB24 but with RGB32 addressing.
In avisynth memory the 4th byte is automatically set to FF (255) when 24bit (8*3) content is converted to RGB32.

So it's not really correct RGB32 support.
Maybe I need to change it to RGB24.

Selur

18th February 2024, 08:41

Yeah, sound like it should be RGB24 not RGB32 if the alpha channel isn't used.

Argaricolm

15th March 2024, 22:05

Yeah, sound like it should be RGB24 not RGB32 if the alpha channel isn't used.

New version 1.13 (https://github.com/ArturAlekseev/AVS_SoftLight/releases/tag/v1.13).
Now RGB will work in Vapoursynth. It is RGB24 planar.
Also I've updated the CUDA toolkit to version 12.4.

wonkey_monkey

16th March 2024, 13:48

Am I missing something?

colorbars(pixel_type="rgb24", width = 3840, height = 2160).softlight

doesn't seem to do anything (same with a real image source). AvsMeter can't time the script because it's too fast, which kind of suggests the filter isn't doing anything. I tried it on two different computers (laptop and desktop) with Nvidia cards.

Does it just pass through the original clip if there is a CUDA issue?

---

Edit: RGB24 interleaved doesn't work, RGB32/YV12/presumably others does

Further edit: doesn't seem to work at all on my desktop computer, just returns unaltered clip...

Edit: Having looked at the code there is zero error handling/reporting, even for CUDA failures. You might want to add some!

Selur

16th March 2024, 14:12

RGB24 works in Vapoursynth here.

Argaricolm

16th March 2024, 20:12

Does it just pass through the original clip if there is a CUDA issue?

It does nothing if CUDA is not supported or not supported input format.

wonkey_monkey

16th March 2024, 23:27

It does nothing if CUDA is not supported or not supported input format.

Error throwing would be very helpful to avoid confusion, e.g.:

if (cudaStatus == cudaSuccess) {
    ...
} else {
    env->ThrowError("SoftLight: CUDA failed");
}

and similar when none of the conditions in GetFrame are met (although testing should be in the constructor, ideally).

Going back in time a little:

Possible on CPU too. But it will be around some fps. While cuda version 10x more faster.

I've investigated my scepticism and although it will obviously vary depending on CPU, GPU, colourspace etc, for a YV12 input I found only a 1.4x-1.7x speed increase over CPU AVX code implementing mode=3 (pegtop).

For interleaved RGB input, AVX code was 1.3x faster than CUDA, even with AviSynth+ colourspace conversion overheads. Multithreading might give another 25%-50% boost.

DTL

17th March 2024, 07:08

Because summing each frame on CPU is very slow.

There are several possible ways to sum all samples of a frame with SIMD:

1. Sum in integer: this requires unpacking the 8..16 bit samples to 32 bits, then summing all samples of a line using full-width SIMD adds, spread across as many add dispatch ports as the superscalar core offers.

Because it looks like a 32bit integer cannot hold the sum of a UHD frame's sample count * 256..65535 sample values without overflow, it is possible to divide the intermediate sum of each line and accumulate the normalized sums of all lines of a frame. It is more complex to program compared with float32 processing but may be visibly faster for SD 8bit and some HD frame sizes.

2. Unpack and convert to float32 and perform all of 1 in the float32 domain.

So the best-performing implementation can have different processing engines inside for different frame sizes. At least 1920x1080 at 8bit can still be processed with integer full-frame summing without 32bit accumulator overflow. Also, with SIMD word summing the programmer has partial sums in the final SIMD word anyway, ready for partial normalizing, with some more overflow protection (an AVX2 SIMD word of 8 32bit integers provides an additional +3 bits before overflow, so total capacity is 32+3=35 bits) and without significant precision loss.

Method 2 can process any frame size in a single engine but is expected to be slower at non-UHD frame sizes.

CPU SIMD is not very slow, but the algorithm requires at least 2 full-frame passes: first an analysis pass for the sum and second a correction pass for the adjustment, so performance will depend on the frame fitting in the available CPU caches (our lovely Xeon MAX with HBM onboard will be a nice performer here).
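The accumulator limits discussed above are easy to check with plain arithmetic. The sketch below computes the worst-case sum of a single plane (every sample at its maximum) and compares it with what an accumulator of a given width can hold. The helper names are hypothetical.

```python
def max_plane_sum(width, height, bits):
    """Worst-case sum of one plane: every sample at its maximum value."""
    return width * height * ((1 << bits) - 1)

def fits(width, height, bits, accumulator_bits=32, signed=False):
    """True if the worst-case plane sum fits in the given accumulator."""
    limit = (1 << (accumulator_bits - (1 if signed else 0))) - 1
    return max_plane_sum(width, height, bits) <= limit

for w, h in ((1920, 1080), (3840, 2160)):
    for bits in (8, 10, 16):
        print(w, h, bits, max_plane_sum(w, h, bits), fits(w, h, bits))
```

By this arithmetic an 8-bit plane still fits in 32 bits even at 3840x2160 (worst case 2,115,072,000), while 10-bit and 16-bit UHD planes do not. Those cases need per-line normalizing, a wider accumulator, or float, as described above.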

Argaricolm

4th May 2024, 17:17

A new release - 1.14 (https://github.com/ArturAlekseev/AVS_SoftLight/releases/tag/v1.14-release).

Also a question for video gurus here:
As I see nearly ALL content that is released now on blurays or streamed through streaming services are encoded in limited color range (16-235). The question is why it is so?
Old TVs that had such limitations are already all in junk. And new TV's can't determine automatically that source is limited color range.
This results that we watch limited color range without convertion to full range. But we should watch limited range converted to full.
I understand that streaming services long ago could have done this to make streams smaller (limited color range takes less space).
But now, when we have fast internet speeds nearly everywhere it is just ridiculous.
And for blurays I don't understand it at all.

It looks like some conspiracy to mock people's eyes.

DTL

4th May 2024, 17:35

" The question is why it is so?"

It is industry standard to keep more quality with limited number of bits (until possible changing to float32 or at least float16 samples values encoding). But 8bit-narrow (limited) works good enough so it is unlikely industry will change to float16/32 any fast.

"This results that we watch limited color range without convertion to full range."

Physical display converts 16..255 Y code values to 0..PHYmax brightness values. So you do not lose the 236..255 code values encoded in 8bit narrow range. You can test it with a 16..255 Y values test pattern. If the display clips 236..255 to PHYmax, it is broken and needs repair or adjustment.

The 235 code value only marks the position of nominal white, not max PHY white. Display hardware may treat the 236..255 range very differently (depending on the processor cost and the AI algorithms included): either continuing to track the system transfer function or doing HDR expansion of some type.

Selur

4th May 2024, 18:29

@Argaricolm: posted in the issue tracker over at GitHub, 10bit does not work in Vapoursynth

Argaricolm

4th May 2024, 22:20

" The question is why it is so?"

It is industry standard to keep more quality with limited number of bits (until possible changing to float32 or at least float16 samples values encoding). But 8bit-narrow (limited) works good enough so it is unlikely industry will change to float16/32 any fast.

8 bit is not about limited color range.

Limited color range is 8 bit 16-235 levels of brightness (220 from 256).

When you view it on tv as it is - nothing is converted to full range.
You see incorrect colors and contrast.
But you have seen it for years now. So you think that it is "normal".
Here is example (https://imgsli.com/MjYxMzA0).
In limited color range there is no 0 and thats why in frames with a lot of black/dark you dont see black. You see only nearly black. This results in lower contrast and incorrect colors (because of the contrast). For example, you should see red, but you will see light red.

Physical display converts 16..255 Y code values to 0..PHYmax brightness values.

It should do so. But how can it find out that it should?
For example, that Batman 2022 video from above is surely limited color range, but it does not carry any info about limited color range. And I've checked pixels. You can find values 0-15 inside any limited color range video. It's just that there are fewer of them than there should be. So I don't see any easy way for a TV to determine whether a video has limited color range or not.

DTL

4th May 2024, 22:56

"You can find values 0-15 inside any limited color range video"

It is also correct - footroom in narrow range mapping is to hold filter undershoots to display better sharpness (visible in the PHY range above zero). See https://forum.doom9.org/showthread.php?p=2000687#post2000687

wonkey_monkey

4th May 2024, 23:38

When you view it on tv as it is - nothing is converted to full range.
You see incorrect colors and contrast.
But you have seen it for years now. So you think that it is "normal".
Here is example.
In limited color range there is no 0 and thats why in frames with a lot of black/dark you dont see black. You see only nearly black.

Are you saying all TVs have been getting it wrong since forever?

Because I don't think that is the case.

DTL

5th May 2024, 00:40

"Here is example."

Computer displays and OS (bitmap processing) were designed for RGB full range mapping, typically for static imaging like photos. So to watch industry standard encoded moving pictures with narrow range mapping you need special software (or software + driver for video card to support all required conversions and levels re-mapping) and you will get all your blacks correct and some unclipped, not very bad super-whites. It is a topic for the '(software) video players' section of the forum - https://forum.doom9.org/forumdisplay.php?f=15

Julek

5th May 2024, 05:15

When you view it on tv as it is - nothing is converted to full range.

That's just wrong.
There is metadata for this, check a WEBDL with mediainfo for example.

And if you can't see 100% black on your TV, maybe it's because your TV isn't OLED, in which case it's physically incapable of making true black.

wonkey_monkey

5th May 2024, 12:44

So to watch industry standard encoded moving pictures with narrow range mapping you need special software

I wouldn't say you need special software. I've never seen a video player that doesn't expand from limited range by default.

DTL

5th May 2024, 13:11

"And new TV's can't determine automatically that source is limited color range."

It may be broken TV or badly configured from defaults. After RCA/SCART analog cunsumers connections between Disk Players and Display devices new standard is HDMI. And typical display should expect HDMI data in narrow range by default. Some displays have control how to treat HDMI data - narrow or full (for the case of Computer connection via HDMI). Also HDMI may have some metadata signalling on range mapping used (is some version and depends on transmitter and receiver compatibility ?).

So if you use a standard consumer disk player and a standard consumer media display device connected via RCA/SCART or HDMI, everything is expected to run fine, with correct blacks and super-whites.

If you are trying to use a media display device for media file playback, there are many points of failure, like a badly created file rip or wrong display playback firmware (or firmware incompatible with some hand-crafted file rip, etc.). If you try to use a general-purpose consumer computer to play back a file rip, there are even more places to fail.

If you use network streaming, it also may or may not be decoded correctly in the playback device (depending on codec/protocol/etc. settings at the source side and on the firmware of the playback device). So possible errors in range mapping treatment are something to report to the streaming provider or the manufacturer of the playback device.

With AVS it is possible to change the levels mapping using Levels, so as to simulate the computer playback transform of narrow-range RGB into computer full-range RGB 0..255, for example with Levels(16,255,1,0,255) (applied in the system transform domain or in linear (?)). Yes, in 8 bit it will add some quantization noise (banding), so it may be good to add some dithering after this range mapping (expansion) if the source's natural noise level is too low for self-dithering.

This is not the one and always correct range remapping - it is only an example that keeps super-whites unclipped. To get more contrast, at the cost of clipping possible super-whites, you can use Levels(16,235,1,0,255), or you can go into RGBPS and apply some LUT or AI/NN plugin to do a nice super-white expansion to HDR, then convert the result back into a standard HDR transfer domain to feed an HDR-capable display device.
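To make the difference between the two Levels calls concrete, here is a rough Python sketch of a per-sample levels remap. It is only an illustration under assumptions: the function name `levels`, the rounding and the clamping are mine, and it ignores whatever AviSynth's Levels does with coring, dithering and chroma. The argument order mirrors Levels(input_low, gamma, input_high, output_low, output_high).

```python
def levels(value, in_low, gamma, in_high, out_low, out_high):
    """Simplified levels remap of one 8-bit sample (illustrative, not AviSynth's exact code)."""
    # Normalise to the input range; clamp below zero so the gamma power stays real.
    norm = max(0.0, (value - in_low) / (in_high - in_low))
    # Apply gamma, scale to the output range, round to nearest (rounding is an assumption).
    scaled = norm ** (1.0 / gamma) * (out_high - out_low) + out_low
    # Clamp to the 8-bit range.
    return max(0, min(255, int(scaled + 0.5)))


# Levels(16,255,1,0,255): black goes to 0, super-whites above 235 stay distinguishable.
keep_superwhite = [levels(v, 16, 1, 255, 0, 255) for v in (16, 235, 255)]

# Levels(16,235,1,0,255): more contrast, but everything above 235 clips to 255.
clip_superwhite = [levels(v, 16, 1, 235, 0, 255) for v in (16, 235, 255)]
```

In this sketch the first mapping sends 235 to a value a little below 255 and only 255 to 255, while the second sends both 235 and 255 to 255, which is the clipping trade-off described above.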

Argaricolm

9th May 2024, 18:00

Well, I have a cheap Skyworth TV. It does not have any switch between TV / full color range, and it looks like it shows everything in full range.
Yes, it has a feature called "adaptive brightness control", but that is adaptive. It is not a static conversion from TV to full range.
I also watch some content from a Beelink TV box that is flashed with custom Android TV firmware. It also does not have any switch for TV / full range. It only has settings to choose between YUV444, YUV422 and RGB, but this switch does not affect color range.
Also, I'm not some guru, so as a normal user, when I see a setting about full range or limited range I think that "full" is better than "limited". And I think most other normal TV users think the same.
As a result we see limited color range without conversion. So we watch data in its intermediate state, which was designed to be converted to full range before output. And I think it's a strange design for our days, when we have fast transfer speeds and plenty of space - there is no need for limited color range anymore.

And here is an example (https://imgsli.com/MjYyNTYz) of what we see and what we should see.

And it is especially strange to see such content on YouTube. For example, most TV recordings are published there without conversion to full range. And we watch them on our tablets/computers, also without any conversion. And we think that it is "normal".

Julek

9th May 2024, 19:24

And here is an example (https://imgsli.com/MjYyNTYz) of what we see and what we should see.

Can you post the script used to convert limited->full? You seem to be doing it the wrong way: when you convert the YUV video to RGB it is already adjusted to full, so there should be no difference, and your “full” is clipping dark areas.

wonkey_monkey

10th May 2024, 23:05

It might be worth pointing out here that humans have a tendency to automatically associate brighter/louder/higher contrast/higher saturation with "better". That doesn't mean "full" is always the proper choice.

Argaricolm

10th May 2024, 23:28

Can you post the script used to convert limited->full? You seem to be doing it the wrong way: when you convert the YUV video to RGB it is already adjusted to full, so there should be no difference, and your “full” is clipping dark areas.

Before, I was using the YUV <-> RGB conversion formula from here (https://learn.microsoft.com/en-us/windows/win32/medfound/recommended-8-bit-yuv-formats-for-video-rendering),
but the result was always YUV with limited range.

Now I use the formula from here (https://www.mikekohn.net/file_formats/yuv_rgb_converter.php).

It does not care about the color range, so the result has the same range as the input.
Example: RGB(5,5,5) <=> YUV(5,128,128)
So from limited YUV I get limited RGB and then limited YUV back (or from full I get full).
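The difference between a range-expanding and a range-preserving conversion can be sketched in Python like this. This is only an illustration under assumptions: the integer constants are the widely used BT.601 studio-range approximation (I believe the Microsoft page lists the same ones, but treat that as unverified), and the floating-point coefficients are the common JFIF-style full-range BT.601 ones, which may not be exactly what the second linked page or the plugin uses. All function names are made up for the example.

```python
def clip8(x):
    """Clamp a value to the 8-bit range 0..255."""
    return max(0, min(255, x))


def yuv_to_rgb_expanding(y, u, v):
    """Integer BT.601 approximation: limited-range YUV in, full-range RGB out."""
    c, d, e = y - 16, u - 128, v - 128
    r = clip8((298 * c + 409 * e + 128) >> 8)
    g = clip8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clip8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


def rgb_to_yuv_preserving(r, g, b):
    """JFIF-style BT.601: no offset or scaling on luma, so the range is kept as-is."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(clip8(int(x + 0.5)) for x in (y, u, v))


def yuv_to_rgb_preserving(y, u, v):
    """Inverse of rgb_to_yuv_preserving (same assumed coefficients)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(clip8(int(x + 0.5)) for x in (r, g, b))


gray_yuv = rgb_to_yuv_preserving(5, 5, 5)          # gray keeps its level in Y
limited_black = yuv_to_rgb_preserving(16, 128, 128)  # limited black stays at 16
expanded_black = yuv_to_rgb_expanding(16, 128, 128)  # limited black becomes 0
```

With the range-preserving pair, a gray pixel keeps the same number in Y as in R, G and B, so limited in gives limited out. With the expanding formula, Y=16 lands on RGB 0 and Y=235 on RGB 255.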

Softlight(8) reranges limited range to full this way:

1. YUV is converted to RGB (if input is YUV)
2. RGB is reranged
3. Back to YUV.

Rerange is done this way:

(R - 16) / 220 * 255 + 0.5

So the levels from 16 to 235 are mapped to 0 to 254.
When it is converted to YUV, I get full-range YUV.
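The rerange formula above can be checked with a few lines of Python. This is just a sketch of the arithmetic, not the plugin's CUDA code: the function name `rerange`, the truncation to integer and the clamp to 0..255 are my assumptions. The divisor is a parameter so the posted value 220 can be compared with 219 (= 235 - 16).

```python
def rerange(value, divisor=220):
    """Map one limited-range 8-bit sample to full range: (v - 16) / divisor * 255 + 0.5."""
    out = int((value - 16) / divisor * 255 + 0.5)  # truncation after +0.5 rounds to nearest
    return max(0, min(255, out))  # clamp is an assumption; the post does not mention it


black = rerange(16)            # lowest limited-range level
white_220 = rerange(235)       # divisor 220, as in the formula above
white_219 = rerange(235, 219)  # divisor 219, for comparison
```

With divisor 220, level 235 comes out as 254, matching the range stated above. With divisor 219 the same level would reach 255.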

So far the best combination I use for myself is:
ConvertToRGBNV() # this is from the ImageSourceNV plugin
softlight(8)
softlight(3)
ConvertToYUVNV()

Here I use BGR32 input because I use one plugin between softlight(8) and softlight(3) that requires BGR32 input.

If you use only a combination of softlight calls, it's better to convert to planar RGB input first. That way the softlight functions will not convert YUV <-> RGB each time.
In my example above they convert BGR32 <-> planar RGB in each softlight call (but using CUDA).

I think I will just add conversion functions in the next release.

Argaricolm

10th May 2024, 23:34

It might be worth pointing out here that humans have a tendency to automatically associate brighter/louder/higher contrast/higher saturation with "better". That doesn't mean "full" is always the proper choice.

Better choice to view surely.
Looks more 3D and with higher saturation.
Yes, you see fewer details in dark areas. But I think it's a small tradeoff not to see some dark details for deeper 3D and not brightened colors (and that's how you should view it anyway - if your hardware/software correctly identifies the source as limited).

wonkey_monkey

10th May 2024, 23:44

Better choice to view surely.
Looks more 3D and with higher saturation.
Yes, you see fewer details in dark areas. But I think it's a small tradeoff not to see some dark details for deeper 3D and not brightened colors.

That's your personal choice. It doesn't mean your TV is doing something wrong just because you can override it to a setting that you think looks better.