#1 | Link |
|
Registered User
Join Date: Apr 2018
Posts: 58
|
AVS Softlight
Example
Brightness example

A CUDA implementation of the softlight negative-average blend. The plugin is x64 (CUDA Toolkit 12.8 & 11.8).

You may have seen YouTube videos about removing a color cast using Photoshop's softlight blend with the negative average. This is a CUDA implementation of that technique which processes every frame.

Input should be in PC color range (output will be too)! Use modes 8 & 9 to convert to full range and back. I also suggest removing noise from the input before processing.

Parameters:
Softlight(mode, formula, skipblack, yuvin, yuvout, rangemin, rangemax, changerange)
All parameters are optional.

mode = 0-12 (0 is the default)
Can be used like this: Softlight() is the same as Softlight(0)

Mode 0 (default):
YUV->RGB conversion.
Calculates the sum of all pixels in each of the R, G, B planes.
Gets the average from these sums (sum / number of pixels).
Gets the negative of this average (255 - average).
Applies a softlight blend of each plane with the above negative.
After this step we have the same result as Photoshop. But the brightness of the frame will have changed. To keep the brightness intact we need to restore it to the original; that is what the remaining steps do.
We get the HSV planes: the V plane from the original image (RGB => V), and H and S from the result after softlight.
Then we do HS(changed) + V(original) -> RGB -> YUV.
So this mode neutralizes only the colors (hue + saturation) in the frame and not the brightness (value).
Also keep in mind that you had better remove black bars from the video (if there are any) for correct processing. Otherwise they will affect the average.

Mode 1: Same as mode 0 but planes S & V are restored to their original values. So this mode only normalizes lightness/brightness and does not change colors.
Mode 2: Same as mode 0, but plane S is also boosted (softlight is done for each pixel with itself). So it neutralizes colors and boosts contrast.
Mode 3: Same as mode 0 but without brightness restoration. Use it if you also want to make the brightness average (makes dark frames brighter).
Mode 4: Same as mode 3 but each of the RGB planes is boosted using softlight (contrast boost).
Mode 5: YUV->RGB->softlight of each RGB plane with itself->YUV (color/contrast boost).
Mode 6: YUV->RGB->HSV->boost S->RGB->YUV (saturation boost).
Mode 7: Limited color range clamping. Some videos with limited color range contain values < 16 and > 235. This mode changes them to 16 & 235. This mode is not needed after mode 9.
Mode 8: TV to PC color range conversion (use it on videos where you see no total black and only grays). Or check the video using the ShowChannels plugin (if the minimum in Y is 16 or 15, then your source is in limited range). You can change the input levels using the rangemin & rangemax params. They are used only in this mode. If they are not specified or are wrong (rangemin >= rangemax), the defaults will be used.
Mode 9: PC to TV color range conversion.
Mode 10: Grayscale. For RGB32, this mode uses an RGB -> YUV444 -> RGB CUDA conversion, with the U & V planes set to 128 (512 for 10 bit). For YUV, the U & V planes are simply set to 128 (or 512 for 10 bit) without CUDA.
Mode 11: The OETF function is applied to each pixel.
Mode 12: The EOTF function is applied to each pixel.

You can use 3 different softlight formulas:
formula = 0,1,2
0 - pegtop
1 - illusions.hu
2 - W3C
In my opinion the pegtop formula is the best. Also, mode 1 & mode 3 are my favourites. The Photoshop formula was removed because of its discontinuity of local contrast.
Formulas are explained here: https://en.wikipedia.org/wiki/Blend_modes

rangemin & rangemax
These parameters are used for the TV to PC color range conversion. If not specified, the defaults 16-235 (8 bit) and 64-963 (10 bit) will be used.

changerange
Previously named "fullrange". But not only the name was changed; it now works differently. By default it is 0. When it is 0, YUV is treated as limited range and RGB as full range. This means that YUV will be reranged to full before processing and RGB will not be reranged. If it is 1, then YUV will not be reranged and RGB will be reranged.
So, for example, if your source is RGB but in limited range (which is not normal) you should do: softlight(3,changerange=1). This will rerange RGB to full range, process it, and rerange it back to limited. But if your RGB source is normal (full range), then softlight(3) will not rerange anything.

Example of doing the range conversion outside, for a YUV source:
softlight(8) - we change YUV to full range
softlight(3,changerange=1) - we process the "not normal" YUV without reranging it
softlight(9) - we convert it back to "normal" limited range YUV
This will be slower than when the conversion is done inside, because the data will go back and forth between RAM and VRAM with each mode call.

Usage in AviSynth:
Softlight()
same as
SoftLight(0,0,0)
same as
SoftLight(mode=0,formula=0,skipblack=0,yuvin=0,yuvout=0)

Usage in VapourSynth:
video = core.Argaricolm.Softlight(video)
or
core.Argaricolm.Softlight(video,mode=0,formula=0,skipblack=0)

The skipblack option is a new enhancement for the average calculation. By default skipblack = 0, which means it is activated. To disable it, set it to anything non-zero (like 1). It counts how many plane (channel) elements are zero; these are then not counted in the average calculation.
Example:
Original: (1 + 2 + 0) / 3 = 1 average
With skipblack enabled: (1 + 2 + 0) / 2 = 1.5 average

Color modes supported so far:
AviSynth:
Planar YUV 420 8 bit and 10 bit (YUV420P8, YUV420P10)
Planar YUV 444 8 bit and 10 bit (YUV444P8, YUV444P10)
Non-planar RGB32 (BGR32) - this is the default you get by using ConvertToRGB() or ConvertToRGB32()
Planar RGB 8 bit and 10 bit (you get it by using ConvertToPlanarRGB())
The same for VapourSynth except BGR32 (Fredrik "asked" not to implement it in VapourSynth plugins).

The yuvin & yuvout options are used for modes where a YUV <-> RGB conversion is used; they define the formula used for decoding and encoding.
0 = default, which is Rec.709. Or you can select 601, 709 or 2020, like Softlight(yuvin=601,yuvout=601).

About OETF & EOTF functions.
They are added just to play with. EOTF is the reverse of OETF. Try the OETF function when your source is converted to PC range. To convert to PC range use Softlight(8). If the result after OETF lacks contrast, then try raising the black level above 16, like so:
Softlight(8, rangemin=16, rangemax=235)
Softlight(11)

Download at github

Last edited by Argaricolm; 14th February 2025 at 14:25. Reason: New version
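The core of mode 3 as described in the post (per-plane average, its negative, then a softlight blend of the plane with that negative) can be sketched in plain Python. This is only an illustration and not the plugin's CUDA code: the function names are hypothetical, a plane is a flat list of 8-bit samples, and the pegtop formula is the one given on the Wikipedia blend modes page linked in the post. The skipblack behaviour follows the post's own example.

```python
def pegtop_softlight(a, b):
    """Pegtop soft light of base a with blend value b, both normalized to 0..1."""
    return (1.0 - 2.0 * b) * a * a + 2.0 * b * a


def plane_average(plane, skipblack=True):
    """Average of one plane. With skipblack on, zero samples are not counted."""
    total = sum(plane)
    if skipblack:
        count = sum(1 for v in plane if v != 0)
    else:
        count = len(plane)
    return total / count if count else 0.0


def neutralize_plane(plane, skipblack=True, maxval=255):
    """Blend every sample of a plane with the negative of the plane average."""
    avg = plane_average(plane, skipblack)
    negative = (maxval - avg) / maxval  # hypothetical normalization to 0..1
    return [round(pegtop_softlight(v / maxval, negative) * maxval) for v in plane]
```

With this sketch, a plane whose average is already mid-gray is left unchanged, while a plane with a high average is pulled darker, which is the cast-removing effect when applied to R, G and B separately.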
#2 | Link |
|
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,394
|
Would not do any harm to post a few Mode before/after example images.
Postimages.org lets you embed images in your post without needing a Postimages.org account (and you don't need to wait for mod approval).
Postimages.org:- https://postimages.org/
Use "thumbnail" or "image" for forum modes. [copies the url to the clipboard, just paste it in your post]
EDIT: If you do post images, I'll try to remember to delete this post.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 25th June 2023 at 10:32. |
#3 | Link |
|
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,843
|
@StainlessS: here's an example: https://imgsli.com/MTg4MTEz
script used: Code:
ClearAutoloadDirs()
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)
LoadPlugin("F:\Hybrid\64bit\Avisynth\avisynthPlugins\LSMASHSource.dll")
Import("F:\Hybrid\64bit\Avisynth\avisynthPlugins\mtmodes.avsi")
LoadPlugin("c:\Users\Selur\Desktop\Softlight.dll")
# loading source: G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4
# color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
LWLibavVideoSource("G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4",cache=false,format="YUV420P8", prefer_hw=0)
org=last
Softlight(mode=X)
Interleave(org.Subtitle("Original"), last.Subtitle("Softlight(mode=X)"))
# current resolution: 1920x1080
PreFetch(16)
# output: color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
return last
Any plans to also allow RGB input and high bit depth support?

Cu Selur

Last edited by Selur; 25th June 2023 at 14:21.
#4 | Link | |
|
Registered User
Join Date: Apr 2018
Posts: 58
|
Quote:
VapourSynth: I have never compiled for it. If it is really needed I can do it. For high bit depth I'm not sure. If I am able to change the softlight code for it, then it is possible.
#5 | Link |
|
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,394
|
Cheers Selur, nice comparison method.
#6 | Link |
|
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,843
|
More supported color spaces are always better, since that gives more freedom.
VapourSynth would be great, since I mainly use VapourSynth. https://forum.doom9.org/showthread.php?t=182961 might help with supporting both AviSynth and VapourSynth.

Cu Selur
#11 | Link |
|
Registered User
Join Date: Jul 2018
Posts: 1,469
|
There are several possible ways to sum all samples of a frame with SIMD:
1. Sum as integers. This requires unpacking the 8..16-bit samples to 32 bits and summing with the standard full SIMD width * the superscalar factor of the sum dispatch ports, first over all samples of a line. Because it looks like a 32-bit integer cannot always hold the sum of all samples of a UHD frame (sample count * sample values of 256..65535) without overflow, it is possible to divide the intermediate sum of each line and accumulate the normalized sums of all lines of the frame. This is more complex to program compared with float32 processing, but may be visibly faster for SD 8-bit and some HD frame sizes.
2. Unpack, convert to float32 and perform all of 1 in the float32 domain.
So the best-performing implementation can have different processing engines inside for different frame sizes. At least 1920x1080 at 8 bit can still be processed with integer full-frame summing without overflowing a 32-bit accumulator. Also, with SIMD word summing the programmer has partial sums in the final SIMD word anyway, ready for partial normalizing, which gives some more overflow protection (an AVX2 SIMD word of 8 32-bit integers provides an additional +3 bits before overflow, so the total capacity is 32+3=35 bits) without significant precision loss. Method 2 can process any frame size in a single engine but is expected to be slower at non-UHD frame sizes.
The CPU SIMD is not very slow, but the algorithm requires at least 2 full-frame passes: a first analysis pass for the sum and a second correction pass for the adjustment, so performance will depend on the frame fitting in the available CPU caches (our lovely Xeon MAX with HBM onboard will be a nice performer here).

Last edited by DTL; 17th March 2024 at 07:12.
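The overflow concern above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: an unsigned accumulator, a single plane at full frame size, and the worst case where every sample has the maximum value for its bit depth. The helper names are hypothetical.

```python
def max_plane_sum(width, height, bits):
    """Largest possible sum of all samples in one plane at the given bit depth."""
    return width * height * ((1 << bits) - 1)


def fits_accumulator(width, height, bits, acc_bits=32):
    """True if the worst-case plane sum fits an unsigned accumulator of acc_bits bits."""
    return max_plane_sum(width, height, bits) < (1 << acc_bits)
```

Under these assumptions, 8-bit 1920x1080 and even 8-bit 3840x2160 (worst case 2,115,072,000) fit an unsigned 32-bit accumulator, while 10-bit 3840x2160 does not fit 32 bits but would fit the 35-bit capacity mentioned for 8 AVX2 lanes. A signed accumulator, or per-lane sums, shifts these limits.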
#14 | Link | ||||
|
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,394
|
Quote:
Code:
// Average luma of the rectangle (xx,yy,ww,hh) of a planar frame.
// altscan = true samples only every other line.
double __cdecl PVF_AverageLuma_Planar(const PVideoFrame &src,const int xx,const int yy,const int ww,const int hh,const bool altscan) {
    const int ystep = (altscan) ? 2:1;
    const int pitch = src->GetPitch(PLANAR_Y);
    const int ystride = pitch*ystep;
    const BYTE *srcp = src->GetReadPtr(PLANAR_Y) + (yy * pitch) + xx;
    __int64 acc = 0;                                // 64 bit total
    unsigned int sum = 0;                           // 32 bit running sum, flushed into acc
    const int yhit = (altscan) ? (hh +1)>>1 : hh;   // number of lines actually sampled
    const unsigned int Pixels = (ww * yhit);
    if(ww == 1) { // Special case for single pixel width
        for(int y=yhit ; --y>=0;) {
            sum += srcp[0];
            srcp+= ystride;
        }
    } else {
        const int eodd = (ww & 0x0F);               // leftover pixels after the 16 pixel blocks
        const int wm16 = ww - eodd;                 // width rounded down to a multiple of 16
        for(int y=yhit; --y>=0 ;) {
            switch(eodd) {                          // cases fall through intentionally
            case 15: sum += srcp[wm16+14];
            case 14: sum += srcp[wm16+13];
            case 13: sum += srcp[wm16+12];
            case 12: sum += srcp[wm16+11];
            case 11: sum += srcp[wm16+10];
            case 10: sum += srcp[wm16+9];
            case 9: sum += srcp[wm16+8];
            case 8: sum += srcp[wm16+7];
            case 7: sum += srcp[wm16+6];
            case 6: sum += srcp[wm16+5];
            case 5: sum += srcp[wm16+4];
            case 4: sum += srcp[wm16+3];
            case 3: sum += srcp[wm16+2];
            case 2: sum += srcp[wm16+1];
            case 1: sum += srcp[wm16+0];
            case 0: ;
            }
            for(int x=wm16; (x-=16)>=0 ; ) {        // 16 pixels per iteration
                sum += (
                    srcp[x+15] +
                    srcp[x+14] +
                    srcp[x+13] +
                    srcp[x+12] +
                    srcp[x+11] +
                    srcp[x+10] +
                    srcp[x+ 9] +
                    srcp[x+ 8] +
                    srcp[x+ 7] +
                    srcp[x+ 6] +
                    srcp[x+ 5] +
                    srcp[x+ 4] +
                    srcp[x+ 3] +
                    srcp[x+ 2] +
                    srcp[x+ 1] +
                    srcp[x+ 0]
                );
            }
            if(sum & 0x80000000) {acc += sum;sum=0;} // avoid possibility of overflow
            srcp += ystride;
        }
    }
    acc += sum;
    double dacc = double(acc);
    return dacc / Pixels;
}
Not that slow really. Similar method for other colorspace in "PVF_ ... " files.
EDIT: The switch stuff only accounts for ww pixel width, does not take srcp memory alignment stuff into account, so could be improved to better use compiler vectorization type stuff {probably require additional switch thingy for end cases}. If always full frame {no coords}, then could take some shortcuts. {Avisynth+ frames LHS always aligned, not so for Avs standard 'in place' cropping}
EDIT: Might be handy, [from here:- https://forum.doom9.org/showthread.p...61#post1935661 ]
Code:
Function PitchTortureTest(clip c) { # IanB:- https://forum.doom9.org/showthread.php?p=1628159#post1628159
c
A=SelectEvery(4, 0)
B=SelectEvery(4, 1).AddBorders(0,0,8,0).Crop(0,0,-8,0)
C=SelectEvery(4, 2).AddBorders(0,0,16,0).Crop(0,0,-16,0)
D=SelectEvery(4, 3).AddBorders(2,0,22,0).Crop(2,0,-22,0)
Interleave(A,B,C,D)
}
EDIT: from OP, Quote:
From posted link for IanB thingy thread, here:- https://forum.doom9.org/showthread.p...16#post1935616 Quote:
Code:
if(invert) { // invert ?
if(tvy) { // tv levels invert ? [ TV levels center is 125.5 not 127.5, ie (16 + 235)/2 ]
ave = int(-(ave_D - 125.5) + 125.5 + 0.5); // TV_YUV Y mid = 125.5, invert, and Round
} else {
ave = int(ave_D + 0.5) ^ 0xFF; // PC_YUV Y mid = 127.5, symmetrical about 127.5 [EDIT: ADDED, or ave = 255 - int(ave_D + 0.5)]
}
aveU ^= 0xFF ;
aveV ^= 0xFF;
} else {
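The rounding in the luma part of the fragment above can be checked with a small Python sketch. The function name is hypothetical and only the Y average is covered; Python's int() truncates toward zero like the C cast, and XOR with 0xFF on an 8-bit value is the same as 255 minus that value.

```python
def invert_luma_average(ave, tv_levels):
    """Mirror an 8-bit luma average about the mid-point of its range."""
    if tv_levels:
        # TV range 16..235: mid-point is (16 + 235) / 2 = 125.5
        return int(-(ave - 125.5) + 125.5 + 0.5)
    # PC range 0..255: mid-point is 127.5, so invert with XOR 0xFF
    return int(ave + 0.5) ^ 0xFF
```

So at TV levels an average of 16 maps to 235 and 235 maps back to 16, while at PC levels 0 maps to 255.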
Quote:
Last edited by StainlessS; 4th July 2023 at 20:12.
#15 | Link |
|
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,394
|
Further to above,
Code:
W = 3 * 1280 # 3840
H = 3 * 720 # 2160
STATICFRAMES = False
SECONDS = 5 * 60
###
FRAMES = Round(29.97 * SECONDS)
# SECONDS seconds @ 29.97 FPS. STATICFRAMES, If set to false, generate all frames. Default true (one static frame is served)
Colorbars(Width=W,Height=H,pixel_type="YV12",staticframes=STATICFRAMES).Trim(0,-FRAMES)
# Comment one of below out
Return Scriptclip("AverageLuma() return last") # Avs+ builtin. Always full frame.
#Return Scriptclip("RT_AverageLuma() return last") # RT_Stats, RT_AverageLuma. Used to be faster than AVS 2.60 Standard.
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 554.7 | 890.6 | 843.1
Process memory usage (max): 104 MiB
Thread count: 16
CPU usage (average): 7.9%
Time (elapsed): 00:00:10.664
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 220.2 | 382.7 | 329.0
Process memory usage (max): 104 MiB
Thread count: 14
CPU usage (average): 8.0%
Time (elapsed): 00:00:27.328
4K : AVS+ AverageLuma : STATICFRAMES = TRUE
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 2335 | 3244 | 3082
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 6.9%
Time (elapsed): 00:00:02.917
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 511.8 | 859.5 | 801.5
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 7.9%
Time (elapsed): 00:00:11.218
Suggest steal some of his code [I will not tell him].
Also note, Scriptclip would slow it quite a bit compared with GetFrame() in plugin. We did not assign AverageLuma to variable in scriptclip, as that would likely greatly affect results.
However, FPS for the RT_AverageLuma aint so very bad for 4K, and you would likely want pure C version anyway.
EDIT: Numbers above on i7-8700. [No Prefetch {also Scriptclip single thread}, so I assume fully single core numbers]
EDIT: Yep, Resource meter seems to show single core in use.
Last edited by StainlessS; 4th July 2023 at 04:12.