Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. Domains: forum.doom9.org / forum.doom9.net / forum.doom9.se |
#1641 | Link |
|
...?
Join Date: Nov 2005
Location: Florida
Posts: 1,501
|
[Some? Most?] of the filter sources that have x86 intrinsics in them are in the process of being de-duplicated. It would seem there's a regression introduced in there somewhere, since a build with the intrinsics disabled works as expected:
Code:
$ mpv test.avs
mpv: ../avs_core/filters/resample.cpp:424: FilteredResizeH::FilteredResizeH(PClip, double, double, int, ResamplingFunction*, IScriptEnvironment*): Assertion `0' failed.
Aborted (core dumped)
Code:
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737101604416)
at pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737101604416)
at pthread_kill.c:80
#2 __GI___pthread_kill (threadid=140737101604416, signo=signo@entry=6)
at pthread_kill.c:91
#3 0x00007ffff5e5f476 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/posix/raise.c:26
#4 0x00007ffff5e457b7 in __GI_abort () at abort.c:79
#5 0x00007ffff5e456db in __assert_fail_base
(fmt=0x7ffff5ff9770 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fffd29785a1 "0", file=0x7fffd2978580 "../avs_core/filters/resample.cpp", line=424, function=<optimized out>) at assert.c:92
#6 0x00007ffff5e56e26 in __GI___assert_fail
(assertion=0x7fffd29785a1 "0", file=0x7fffd2978580 "../avs_core/filters/resample.cpp", line=424, function=0x7fffd2978518 "FilteredResizeH::FilteredResizeH(PClip, double, double, int, ResamplingFunction*, IScriptEnvironment*)")
at assert.c:101
#7 0x00007fffd25366c3 in FilteredResizeH::FilteredResizeH(PClip, double, double, int, ResamplingFunction*, IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#8 0x00007fffd25397e7 in FilteredResize::CreateResizeH(PClip, double, double, int, ResamplingFunction*, IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#9 0x00007fffd2539d9f in FilteredResize::CreateResize(PClip, int, int, AVSValue const*, ResamplingFunction*, IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#10 0x00007fffd253a182 in FilteredResize::Create_BilinearResize(AVSValue, void*, IScriptEnvironment*) () at /usr/local/lib/libavisynth.so
#11 0x00007fffd223c19b in FilterConstructor::InstantiateFilter() const ()
at /usr/local/lib/libavisynth.so
#12 0x00007fffd2276b73 in ScriptEnvironment::Invoke_(AVSValue*, AVSValue const&, char const*, Function const*, AVSValue const&, char const* const*, InternalEnvironment*, bool) () at /usr/local/lib/libavisynth.so
#13 0x00007fffd22822a2 in ThreadScriptEnvironment::Invoke_(AVSValue*, AVSValue const&, char const*, Function const*, AVSValue const&, char const* const*) ()
at /usr/local/lib/libavisynth.so
#14 0x00007fffd22d88a0 in ExpFunctionCall::Evaluate(IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#15 0x00007fffd22d5037 in ExpExceptionTranslator::Evaluate(IScriptEnvironment*)
() at /usr/local/lib/libavisynth.so
#16 0x00007fffd22d53cf in ExpLine::Evaluate(IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#17 0x00007fffd22d4f8e in ExpSequence::Evaluate(IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#18 0x00007fffd22d4e25 in ExpRootBlock::Evaluate(IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#19 0x00007fffd22dbe15 in Eval(AVSValue, void*, IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#20 0x00007fffd223c19b in FilterConstructor::InstantiateFilter() const ()
at /usr/local/lib/libavisynth.so
#21 0x00007fffd2276ae5 in ScriptEnvironment::Invoke_(AVSValue*, AVSValue const&, char const*, Function const*, AVSValue const&, char const* const*, InternalEnvironment*, bool) () at /usr/local/lib/libavisynth.so
#22 0x00007fffd2281c03 in ThreadScriptEnvironment::Invoke(char const*, AVSValue, char const* const*) () at /usr/local/lib/libavisynth.so
#23 0x00007fffd22dcec9 in Import(AVSValue, void*, IScriptEnvironment*) ()
at /usr/local/lib/libavisynth.so
#24 0x00007fffd223c19b in FilterConstructor::InstantiateFilter() const ()
at /usr/local/lib/libavisynth.so
#25 0x00007fffd2276ae5 in ScriptEnvironment::Invoke_(AVSValue*, AVSValue const&, char const*, Function const*, AVSValue const&, char const* const*, InternalEnvironment*, bool) () at /usr/local/lib/libavisynth.so
#26 0x00007fffd2281c03 in ThreadScriptEnvironment::Invoke(char const*, AVSValue, char const* const*) () at /usr/local/lib/libavisynth.so
#27 0x00007fffd22aa949 in avs_invoke () at /usr/local/lib/libavisynth.so
#28 0x000055555574e4ba in ()
#29 0x0000555555c80201 in ()
#30 0x0000555555815984 in ()
#31 0x000055555580dc0d in ()
#32 0x000055555580e49a in ()
#33 0x000055555586f716 in ()
#34 0x00007ffff5eb1927 in start_thread (arg=<optimized out>)
at pthread_create.c:435
#35 0x00007ffff5f419e4 in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
#1642 | Link |
|
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 3,075
|
I am having crashes too with some scripts and not others.
The following
SetMemoryMax()
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
DGSource("F:\In\1_58 Ieri oggi domani\ieri.dgi",ct=132,cb=132,cl=0,cr=0)
CompTest24(1)
ConvertBits(16)
SMDegrain (tr=3, thSAD=300, refinemotion=true, contrasharp=false, PreFilter=6, plane=4, chroma=true)
fmtc_bitdepth (bits=8,dmode=8)
Prefetch(6)
gives me
System exception - Access Violation
(D:/Programmi/Media/AviSynth+/plugins64/SMDegrain-3.3.9d~Dogway.avsi, line 901)
(D:/Programmi/Media/AviSynth+/plugins64/SMDegrain-3.3.9d~Dogway.avsi, line 228)
__________________
@turment on Telegram |
#1643 | Link | |
|
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,379
|
Quote:
Code:
ColorBars(848, 480, pixel_type="YV12")
BilinearResize(1920, 1080)
Even worse, getting rid of BilinearResize() makes it crash just as well, for instance:
Code:
ColorBars(848, 480, pixel_type="YV12")
Trying with:
Code:
SetMaxCPU("none")
ColorBars(848, 480, pixel_type="YV12")
[image]
The same goes for:
Code:
SetMaxCPU("none")
ColorBars(848, 480, pixel_type="YV12")
Spline64Resize(1280, 720)
[image]
and even more complicated scripts like:
Code:
SetMaxCPU("none")
FFVideoSource("\\mibctvan000.avid.mi.bc.sky.it\Ingest\MEDIA\temp\SIC_Preview_REC709_20211124_de.mov")
DeBilinearResizeMT(720, 480)
t = QTGMC( Preset="Slower", InputType=2, ProgSADMask=1.0, ShutterBlur=3)
b = QTGMC( Preset="Slower", InputType=3, PrevGlobals="Reuse" )
Repair( t, b, 1 )
mt_convolution("1","1 2 1",chroma="process")
Blur(0.0, 1.58).Blur(0.0, 1.58).Blur(0.0, 1.58).Blur(0.0, 1.58)
dfttest(sigma=64, tbsize=1, lsb_in=false, lsb=false, Y=true, U=true, V=true, opt=0, dither=0)
dfttest(sigma=64, tbsize=1, lsb_in=false, lsb=false, Y=true, U=true, V=true, opt=0, dither=0)
super = MSuper(pel=2, sharp=1)
bv1 = MAnalyse(super, isb = true, delta = 1, overlap=4)
fv1 = MAnalyse(super, isb = false, delta = 1, overlap=4)
bv2 = MAnalyse(super, isb = true, delta = 2, overlap=4)
fv2 = MAnalyse(super, isb = false, delta = 2, overlap=4)
MDegrain2(super,bv1,fv1,bv2,fv2,thSADC=800, thSAD=800)
Spline64ResizeMT(2048, 858)
[image]
Hence confirming Stephen's theory about the intrinsics being the cause of the crashes. I'm gonna stick with Test 32 on all my servers.
Last edited by FranceBB; 8th December 2021 at 13:11.
#1644 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Yes, no problem, I was just confirming this was known. I installed test33 only to update all the filters that used pixel addressing and could benefit also from scale_inputs.
__________________
i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread |
#1646 | Link | |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Quote:
BTW, do you know if CombinePlanes can be optimized? Currently the old YtoUV() is faster by some 15%, and it's probably the same with mergechroma() and mergeluma(). Also, StainlessS stated that using 'lut' with scaled_inputs (i.e. lut over a scaled down to 8-bit expression) would be slower than realtime.
#1648 | Link | ||
|
Registered User
Join Date: Jan 2014
Posts: 2,533
|
Quote:
Maybe. The in-clip plane shuffles are optimized, if I remember correctly. The other cases make a new empty frame and copy the source planes into it. In YtoUV the original Y plane could be kept and only U and V are actually copied.
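To put rough numbers on the "new empty frame and copy everything" case versus the "keep Y, copy only U and V" case, here is a minimal Python sketch. It is only a model and not AviSynth+ code: the function names are made up for illustration, and it assumes 4:2:0 subsampling with no row padding or alignment.

```python
def plane_sizes_420(width, height, bytes_per_sample=1):
    """Return (Y, U, V) plane sizes in bytes for a 4:2:0 frame (no row padding)."""
    luma = width * height * bytes_per_sample
    chroma = (width // 2) * (height // 2) * bytes_per_sample
    return luma, chroma, chroma


def bytes_copied(width, height, keep_y, bytes_per_sample=1):
    """Bytes copied while building the output frame.

    keep_y=False: allocate a new empty frame and copy Y, U and V into it.
    keep_y=True:  reuse the frame that already holds Y and copy only U and V.
    """
    y, u, v = plane_sizes_420(width, height, bytes_per_sample)
    return (u + v) if keep_y else (y + u + v)


# 8-bit 1080p example (hypothetical numbers, chosen only for illustration)
full = bytes_copied(1920, 1080, keep_y=False)
chroma_only = bytes_copied(1920, 1080, keep_y=True)
print(full, chroma_only, full / chroma_only)
```

Under these assumptions, keeping Y moves one third of the bytes per frame. Whether that shows up as a measurable fps difference depends on everything else in the script.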
#1649 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Yes, actually I was testing with the following:
Code:
Y = ExtractY()
U = ExtractU()
V = ExtractV()
# some per plane filtering
YtoUV(U,V,Y)
# CombinePlanes(Y,U,V,planes="YUV",sample_clip=a)
Thanks for looking into that.
#1650 | Link |
|
Video damager
Join Date: Sep 2008
Posts: 1,260
|
Can CombinePlanes combine directly to YV12 when U and V clips are half resolution of Y?
__________________
InpaintDelogo, DoomDelogo, JerkyWEB Fixer, Standalone Faster-Whisper - AI subtitling |
#1651 | Link | |
|
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,407
|
Quote:
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? |
#1652 | Link | ||
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Quote:
Actually even Expr() can combine planes if you declare the "format" type, but it might use the slow CombinePlanes() code path, so maybe pinterf can also look into that.
#1653 | Link | |
|
Registered User
Join Date: Jan 2014
Posts: 2,533
|
Quote:
EDIT: the provided sample was not perfect: you don't need to specify a third Y parameter, because it is then copied as well and the whole thing is perfectly identical to the present CombinePlanes.
Code:
Colorbars(pixel_type="YV12")
Y = ExtractY()
U = ExtractU()
V = ExtractV()
# some per plane filtering
YtoUV(U,V) # instead of YtoUV(U,V,Y)
EDIT2: The speed difference was because I was stupid and omitted the Y parameter from the YtoUV example. Finally: these give identical results, with identical speed. They work the same way internally: they create a new empty frame, then copy the source planes' bytes one by one.
Code:
Colorbars(pixel_type="YV12")
a=last
Y = ExtractY()
U = ExtractU()
V = ExtractV()
# some per plane filtering
#YtoUV(U,V,Y)
CombinePlanes(Y,U,V,planes="YUV", source_planes="YYY",sample_clip=a)
Last edited by pinterf; 9th December 2021 at 17:33.
#1654 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Yes, sorry for the delay. Indeed, with synthetic or even staged tests I get almost the same speed (with a slight edge for YtoUV). In any case, my tests were with the rework of ex_gaussianblur(), which uses mergeluma() by default (UV=2). Here is an almost finished version of the filter.
Testing with 1080p:
Code:
setmemorymax()
DGSource("1080psource.dgi")
ConvertBits(16)
ex_gaussianblur(6) # 400fps (340fps with CombinePlanes)
Prefetch(4) # This seems ideal value for scalers on a 4/8 CPU
EDIT: Ok, this is a simpler script, but it still doesn't show the 15% speed difference:
Code:
Y = ExtractY().BicubicResize(round(width()/1.5),round(height()/1.5))
Y = Y.BicubicResize(width(),height())
mergeluma(y)
EDIT2: Ok, here's a stripped-down test that starts to show the issue (still not quite a 15% difference); probably cropping in between makes things worse:
Code:
a=last
ExtractY().BilinearResize(344,204)
GaussResize(a.width(),a.height(),p=9)
#mergeluma(a,last)
CombinePlanes(last,a,planes="YUV",sample_clip=a)
Last edited by Dogway; 9th December 2021 at 20:14.
#1655 | Link |
|
Video damager
Join Date: Sep 2008
Posts: 1,260
|
Tested and got +11% speed with CombinePlanes, instead of upsizing chroma then downsizing it when using MergeLuma.
#1656 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
This shows a +6% speed for mergeluma():
Code:
setmemorymax()
DGSource("1080psource.dgi")
ConvertBits(16)
a=last
w0=width()
h0=height()
p=64
ExtractY().BilinearResize(344,204, src_left=-p, src_top=-p, src_width=w0+p+p, src_height=h0+p+p)
GaussResize(w0+p+p,h0+p+p,p=9)
crop (p, p, -p, -p)
mergeluma(a,last) # 384
#CombinePlanes(last,a,planes="YUV",sample_clip=a) # 363
Prefetch(4)
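For reference, the +6% can be reproduced from the two numbers in the script comments. This is a small Python sketch and it assumes the trailing "# 384" and "# 363" are fps readings (the post does not label them); the function names are only illustrative. Note the figure depends on which result is taken as the base.

```python
def gain_over_slower(fast_fps, slow_fps):
    """Percent by which the faster result exceeds the slower one."""
    return (fast_fps / slow_fps - 1.0) * 100.0


def loss_from_faster(fast_fps, slow_fps):
    """Percent by which the slower result falls short of the faster one."""
    return (1.0 - slow_fps / fast_fps) * 100.0


mergeluma_fps = 384      # assumed fps, from the "# 384" comment
combineplanes_fps = 363  # assumed fps, from the "# 363" comment

gain = gain_over_slower(mergeluma_fps, combineplanes_fps)
loss = loss_from_faster(mergeluma_fps, combineplanes_fps)
print(f"{gain:.1f}% faster, or {loss:.1f}% slower the other way round")
```

So the same pair of readings gives slightly different percentages depending on the base, which is worth keeping in mind when comparing figures quoted by different people.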
#1657 | Link |
|
Video damager
Join Date: Sep 2008
Posts: 1,260
|
What if you remove the decoding "bottleneck" with BlankClip(last) after DGSource? I didn't use Prefetch.
#1658 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
With BlankClip(last) it's the same speed using Prefetch(4), but I always test in context with real-world material. Maybe CombinePlanes() is superfast on its own, but has a harder time when the frame is served in a specific manner.
To note, I also experienced these slowdowns in TransformsPack, which is a different monster: there I don't crop or resize, I simply do per-plane matrix operations with Expr() and then Combine.
Code:
YUV444 source # so as not to add more YUV -> RGB overhead
m=RGB_to_XYZ("sRGB",true)
MatrixClip2(m)
Prefetch(4)
function MatrixClip2 ( clip clp, float_array mat, string "fmt_o") {
rgb = isRGB(clp)
fmt_o = Default(fmt_o, rgb ? "RGB" : "YUV")
CLPa = ExtractClip(clp)
# clip · 3x3
C = DotClipA(CLPa,[mat[0],mat[3],mat[6]])
L = DotClipA(CLPa,[mat[1],mat[4],mat[7]])
P = DotClipA(CLPa,[mat[2],mat[5],mat[8]])
# YtoUV(L, P, C) } # 141
CombinePlanes(C, L, P, planes=fmt_o) } # 139
With Prefetch(6) this doesn't happen: they are the same speed, but both are slower overall. The optimal Prefetch here is again 4, at least for my CPU, where both methods speed up, albeit one more than the other.
On another note, in AddBorders() I was about to add "color" to the alpha plane of RGB; there's no such option, right?
Last edited by Dogway; 10th December 2021 at 00:29.
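As a rough illustration of the per-plane matrix step in MatrixClip2, here is a Python sketch. It is not TransformsPack code: it assumes DotClipA computes a weighted sum of the three input planes, models planes as flat lists of samples, and only reuses the index pattern (0,3,6 / 1,4,7 / 2,5,8) from the function above. The helper names and test matrices are made up.

```python
def dot_planes(planes, coeffs):
    """Weighted sum of three equally sized planes, sample by sample."""
    p0, p1, p2 = planes
    c0, c1, c2 = coeffs
    return [c0 * a + c1 * b + c2 * c for a, b, c in zip(p0, p1, p2)]


def matrix_clip(planes, mat):
    """Apply a flat 9-element matrix with the index pattern used in MatrixClip2."""
    first = dot_planes(planes, [mat[0], mat[3], mat[6]])
    second = dot_planes(planes, [mat[1], mat[4], mat[7]])
    third = dot_planes(planes, [mat[2], mat[5], mat[8]])
    return first, second, third


planes = ([10, 20], [30, 40], [50, 60])   # three tiny 2-sample planes

identity = [1, 0, 0,
            0, 1, 0,
            0, 0, 1]
swap_first_two = [0, 1, 0,
                  1, 0, 0,
                  0, 0, 1]

unchanged = matrix_clip(planes, identity)
swapped = matrix_clip(planes, swap_first_two)
print(unchanged)
print(swapped)
```

Each output plane reads all three input planes, so the three results are always new data that then has to be assembled into one frame. That last step is where YtoUV() and CombinePlanes() are being compared.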
#1659 | Link | |
|
Registered User
Join Date: Jan 2014
Posts: 2,533
|
Quote:
When the input clip is not referenced by other clips in the filter chain (there is exactly one reference to it, technically "IsWritable"), MergeLuma can obtain write permission directly, sparing the need to copy the Y plane. I'm gonna check this in CombinePlanes.
EDIT: MergeLuma is called w/o passing weight, so weight is 1.0. This means the luma of the second clip (last) is kept 100%. This also means that only the smaller U and V planes (4:2:0) need to be copied, saving time. But this is not the case here, since 'last' is a luma-only Y clip and cannot accept a U+V copy. In this MergeLuma example all three planes are copied to a brand new empty frame, which is the worst case.
Last edited by pinterf; 10th December 2021 at 12:57.
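The "exactly one reference means writable" idea can be modelled in a few lines of Python. This is a toy sketch of the copy-on-write principle only, not the AviSynth+ implementation: the Frame class, the refcount field and make_writable() here are hypothetical stand-ins.

```python
class Frame:
    """Toy frame: a dict of planes plus a count of holders referencing it."""

    def __init__(self, planes):
        self.planes = planes
        self.refcount = 1

    def is_writable(self):
        # Safe to modify in place only when nobody else holds the frame.
        return self.refcount == 1


def make_writable(frame, stats):
    """Return a frame that may be modified in place.

    A sole holder gets the same frame back. Otherwise every plane is copied
    into a new frame and the copies are counted in stats["planes_copied"].
    """
    if frame.is_writable():
        return frame
    frame.refcount -= 1  # the caller gives up its reference to the shared frame
    stats["planes_copied"] += len(frame.planes)
    return Frame({name: list(data) for name, data in frame.planes.items()})


stats = {"planes_copied": 0}

solo = Frame({"Y": [1, 2], "U": [3], "V": [4]})
same = make_writable(solo, stats)        # no copy needed

shared = Frame({"Y": [1, 2], "U": [3], "V": [4]})
shared.refcount = 2                      # pretend another filter also holds it
copy = make_writable(shared, stats)      # forces a copy of all three planes

print(same is solo, copy is shared, stats["planes_copied"])
```

In this model the second case is the expensive one, which matches the worst case described above where all planes land in a brand new frame.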
#1660 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 2,375
|
Thanks for looking into it. Not sure what that means: that MergeLuma (CombinePlanes regardless) can work even faster? In any case, good that you could spot it, because a 15% speed difference isn't normal in the ex_gaussianblur() example.
I think I found another bug, masktools2 related though, while trying to match my ex_lutspa() version:
Code:
mt_lutspa(mode="relative", expr="x range_max *",U=128,V=128)
And some issues with internal filters that take color arguments, like BlankClip, Blackness, Letterbox, AddBorders and FadeXXX. If you specify white (color_yuv=$ffffff or $ff8080), the output is 65280 for 16-bit, which is not suitable for masks. I think a solution would be to automatically map the 0~15 and 236~255 values to full scale if a fulld argument is not desired.
Last edited by Dogway; 11th December 2021 at 18:25.
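The 65280 figure is what you get if an 8-bit 255 is shifted up by 8 bits instead of being stretched to the 16-bit maximum. A small Python sketch of the arithmetic follows; that the internal filters use a plain shift is an assumption based only on the reported value, and the function names are illustrative.

```python
def scale_by_shift(value_8bit, bits=16):
    """Bit-shift scaling: 255 becomes 255 * 256 at 16-bit."""
    return value_8bit << (bits - 8)


def scale_full_range(value_8bit, bits=16):
    """Full-scale stretch with rounding: 255 maps to the maximum code value."""
    max_out = (1 << bits) - 1
    return (value_8bit * max_out + 127) // 255


white_shifted = scale_by_shift(255)
white_full = scale_full_range(255)
print(white_shifted, white_full)
```

For a mask, the difference matters because a "white" of 65280 is not the maximum 16-bit value of 65535.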