Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. Domains: forum.doom9.org / forum.doom9.net / forum.doom9.se |
#3601 | Link | |
|
Registered User
Join Date: Jan 2014
Posts: 2,527
|
Quote:
The next rstdoc update will include the fix.
#3602 | Link | |
|
Registered User
Join Date: Dec 2008
Posts: 2,399
|
I'd like to clarify the GetFrame method.
Older VSFilter builds didn't throw exceptions in GetFrame; with an unsupported format the filter simply didn't work. The newer xy-VSFilter calls ThrowError if it doesn't support the video format, which makes many (FFmpeg-based) applications crash. Example script. Quote:
Is this an application issue or an xy-VSFilter issue?
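To make the question concrete, here is a minimal sketch of the host-side pattern under discussion: the filter throws from GetFrame and the calling application converts the exception into a reportable error instead of letting it escape. All types here (FilterError, Frame, SubtitleFilter, SafeGetFrame) are hypothetical stand-ins and are not the real avisynth.h or xy-VSFilter declarations; how FFmpeg-based hosts actually handle this is not shown in the thread.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the error a filter raises; NOT the real avisynth.h type.
struct FilterError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for a video frame.
struct Frame {
    int number;
};

// Hypothetical subtitle filter that rejects formats it cannot draw onto.
struct SubtitleFilter {
    bool format_supported;

    Frame GetFrame(int n) const {
        if (!format_supported)
            throw FilterError("TextSub: unsupported video format");
        return Frame{n};
    }
};

// Result the host can inspect without touching exceptions.
struct FrameResult {
    bool ok;
    Frame frame;
    std::string error;  // empty on success
};

// Host-side wrapper: an uncaught exception here would terminate the
// application, so it is translated into an error string instead.
FrameResult SafeGetFrame(const SubtitleFilter& filter, int n) {
    try {
        return {true, filter.GetFrame(n), ""};
    } catch (const FilterError& e) {
        return {false, Frame{-1}, e.what()};
    }
}
```

With a wrapper like this, an unsupported format shows up as a failed result carrying the filter's message, and the host decides whether to abort or continue.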
__________________
MPC-BE 1.8.9 and Nightly builds | VideoRenderer | ImageSource | ScriptSource | BassAudioSource |
#3604 | Link | ||
|
Registered User
Join Date: Dec 2008
Posts: 2,399
|
Quote:
But this does not work with avformat_open_input. Added: Quote:
__________________
MPC-BE 1.8.9 and Nightly builds | VideoRenderer | ImageSource | ScriptSource | BassAudioSource Last edited by v0lt; 1st February 2026 at 08:41. |
#3605 | Link |
|
Registered User
Join Date: Jan 2014
Posts: 2,527
|
New build: Avisynth r4483
https://github.com/pinterf/AviSynthP...3.7.6pre-r4483
Code:
20260203 3.7.5.r4483 (pre 3.7.6)
--------------------------------
* rst documentation update: RGBAdjust
  https://avisynthplus.readthedocs.io/...rs/adjust.html
* rst documentation update: ColorYUV
  https://avisynthplus.readthedocs.io/.../coloryuv.html
* optimization: add AVX2 TurnLeft/TurnRight/Turn180 (R/L: 1,5-3x speed).
* optimization: ConvertBits AVX2 integer->float
* optimization: ConvertToPlanarRGB(A): YUV->RGB add AVX2 (2-3x speed)
* optimization: ConvertToPlanarRGB(A): YUV->RGB 16 bit: a quicker way (1,5x)
* Fix: C version of 32-bit ConvertToPlanarRGB YUV->RGB to not clamp output RGB values.
* ConvertToPlanarRGB(A): add bits parameter to alter target bit-depth.
* ConvertToPlanarRGB(A): from YUV->RGB full range output: optimized in-process when bits=32,
  other cases call ConvertBits internally.
* Fix: Packed RGB conversions altering the bit-depth (e.g. rgb32->ConvertToRGB64() worked always in full range.
* Add more AVX512 resampler code. (WIP)
* Add more AVX512_BASE code paths (Resamplers)
* Build: add _avx512b.cpp/hpp pattern in CMake to detect source to compile with base (F,CD,BW,DQ,VL) flags.
  However AVX512_BASE itself is set only when AVX512_FAST found.
  For pre-Ice Lake (older AVX512) systems you can enable it with SetMaxCPU("avx512base+")
  and get the optimized AVX512_BASE functions.
* Build: add new architecture z/Architecture
#3607 | Link |
|
Registered User
Join Date: Jul 2018
Posts: 1,476
|
It looks like it is for memory management (and filter-graph interconnection?) for CUDA-based plugins (there are no internal CUDA filters currently?). It seems that in the old days, when there were many more programmers, there was an attempt to make a CUDA-computing AviSynth with script-based filter interconnection inside the CUDA accelerator, and/or even a mix of on-CPU and on-CUDA filters. But that development appears to have stopped a long time ago. Now all filters run on the CPU, and if a filter needs external acceleration it does all of its memory management internally. This costs some performance for a single filter and more for a filter chain, but it is simpler to support for the current limited number of developers.
There is an idea for an extended filter interconnection interface for on-accelerator filters, something like GetFrameToBuffer(buffer_description), so that accelerator-based filters can ask for a frame to be stored directly into an allocated upload buffer in the global virtual memory address space. Currently AI/NN filters like RIFE and avs-mlrt can only get frames through the standard GetFrame() method and then make an additional copy into the buffers allocated for upload to the accelerator. With the largest format, RGBPS, this puts a heavy load on the memory subsystem (though since AI/NN filters are not very fast, it may not be very visible). As I understand it, the idea of the CUDA-based filter graph was to avoid uploading/downloading frame resources to/from host RAM when several CUDA-based filters are connected in a graph inside an external accelerator. The current AVS filter interconnection (typically via the software Cache() for frame-based MT) requires writing frames to host RAM cache buffers and reading them back from the cache buffers (though a DO_NOT_CACHE_ME mode exists, and filters can attempt a direct connection via GetFrame()?).
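The GetFrameToBuffer idea above can be sketched roughly as follows. This is only an illustration of the proposal, not an existing AviSynth+ interface: BufferDescription, Source, GetFrameToBuffer and both Upload helpers are hypothetical names, and a plain std::vector stands in for a real upload (pinned) buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of a caller-owned destination buffer.
struct BufferDescription {
    float* data;
    std::size_t count;  // capacity in float samples
};

// Hypothetical upstream filter producing a fixed number of samples per frame.
class Source {
public:
    explicit Source(std::size_t samples) : samples_(samples) {}

    // Classic path: the frame is produced in storage owned by the filter graph.
    std::vector<float> GetFrame(int n) const {
        std::vector<float> frame(samples_);
        Fill(frame.data(), n);
        return frame;
    }

    // Proposed path: the frame is produced straight into the caller's buffer.
    bool GetFrameToBuffer(int n, const BufferDescription& dst) const {
        if (dst.data == nullptr || dst.count < samples_) return false;
        Fill(dst.data, n);
        return true;
    }

private:
    // Dummy frame content: every sample holds the frame number.
    void Fill(float* out, int n) const {
        for (std::size_t i = 0; i < samples_; ++i) out[i] = static_cast<float>(n);
    }
    std::size_t samples_;
};

// Consumer on the classic path. Returns how many samples had to be copied
// a second time on the host (0 if the upload buffer is too small).
std::size_t UploadViaGetFrame(const Source& src, int n, std::vector<float>& upload) {
    std::vector<float> frame = src.GetFrame(n);
    if (upload.size() < frame.size()) return 0;
    std::memcpy(upload.data(), frame.data(), frame.size() * sizeof(float));
    return frame.size();
}

// Consumer on the proposed path: no second host-side copy is needed.
bool UploadDirect(const Source& src, int n, std::vector<float>& upload) {
    BufferDescription dst{upload.data(), upload.size()};
    return src.GetFrameToBuffer(n, dst);
}
```

The difference is the extra memcpy per frame in the classic path, which is the memory-subsystem load described above for large float formats.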
#3609 | Link |
|
Registered User
Join Date: Jul 2018
Posts: 1,476
|
CUDA-based filters may interact with other CUDA-based filters without downloading input/output frames to host RAM.
An important update for use with ML/AI/NN filters: ConvertToPlanarRGB(A) adds a bits parameter to alter the target bit depth. This means that instead of the long and slow sequence ConvertToPlanarRGB() ConvertBits(32) before RIFE/avs-mlrt filters, from r4483 on a single (and faster) filter can be used: ConvertToPlanarRGB(bits=32). It is expected to end up faster than avsresize Z_ConvertFormat(pixel_type="RGBPS") when converting from YUV. The uint8..16 to float32 precision may also become a bit better, because the new function uses a direct int32 intermediate to float32 conversion without an intermediate integer stage. It may be close in precision to the possible (but slower) ConvertBits(32) ConvertToPlanarRGB() sequence.
I am also waiting for a new test release from pinterf: it will finally have ColorBarsHD fixed for any integer precision and for float, and the matrix/dematrix part fixed again for better YUV<->RGB conversions, so the precision of the new functions can be tested. Currently ColorBarsHD uses only an 8-bit internal table plus bit-depth upscaling and cannot be used as a good source generator for precise matrix testing at different bit depths. The precision of 10-bit YUV to 8-bit RGB conversion is expected to be very close or equal to avsresize in the new test release. Last edited by DTL; 6th February 2026 at 16:01.
#3610 | Link | |
|
Registered User
Join Date: Jan 2014
Posts: 2,527
|
Quote:
However, at 16-bit, a possible +/-1 lsb error can be considered "good enough". When using full range, it's effectively the same as working in 32-bit float; for complex transformations (like full-range bit-depth conversion), my version uses float calculations internally as well. Avisynth hasn't previously had chained (fused) filter options where a matrix was involved, so this bits= parameter is a first. It immediately became clear that while optimizing the simplest conversion was easy, the code bloats exponentially when you try to optimize every specific sub-case. Luckily, the more complex cases ended up needing float internally and could be unified. Combining the bit-depth conversion is not only much faster but also more accurate than simply adding ConvertBits after the YUV-RGB conversion, so it was a very good and useful idea from DTL.
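The accuracy point can be illustrated with a small sketch that compares a two-stage path (YUV to 8-bit RGB, then to float) with a fused path (YUV to float in one step). This is not the AviSynth+ implementation: the function names are hypothetical, only the red channel is computed, and the standard BT.709 limited-range coefficients are assumed.

```cpp
#include <cmath>

// Red channel on a 0..255 scale from 8-bit limited-range BT.709 Y and V.
// Assumed math: R = 255 * ((Y - 16) / 219 + 1.5748 * (V - 128) / 224).
double Red255(int y, int v) {
    return (255.0 / 219.0) * (y - 16) + 1.5748 * (255.0 / 224.0) * (v - 128);
}

// Two-stage path: round and clamp to 8-bit integer RGB first, then scale to
// full-range float. The rounding error of up to 0.5 lsb is baked into the result.
float RedTwoStage(int y, int v) {
    long r8 = std::lround(Red255(y, v));
    if (r8 < 0) r8 = 0;
    if (r8 > 255) r8 = 255;
    return static_cast<float>(r8) / 255.0f;
}

// Fused path: scale to float directly, with no intermediate integer rounding
// and no clamping of the float output.
float RedFused(int y, int v) {
    return static_cast<float>(Red255(y, v) / 255.0);
}
```

For in-range inputs the two results differ by at most half an 8-bit step (0.5/255), which is the precision the fused path keeps and the two-stage path throws away.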
#3611 | Link | |
|
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 3,075
|
Quote:
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\DehaloAlpha\Dehalo_alpha.avsi")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\Dither\mt_xxpand_multi.avsi")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\FineDehalo\FineDehalo.avsi")
DGSource("M:\In\The promised neverland S1 ~Dynit\01-1.dgi")
z_ConvertFormat(resample_filter="Spline64", pixel_type="yuv420p16")
DeBilinearResizeMT(1280, 720, threads=2, prefetch=2, accuracy=2)
z_ConvertFormat(resample_filter="Spline64", pixel_type="yuv444ps")
BM3D_CUDA(sigma=12, radius=4, chroma=true, block_step=6, bm_range=12, ps_range=6, fast=false)
BM3D_VAggregate(radius=4)
z_ConvertFormat(resample_filter="spline64",dither_type="error_diffusion",pixel_type="yuv420p16")
FineDehalo(rx=2, ry=2, thmi=80, thma=128, thlimi=50, thlima=100, darkstr=0.3, brightstr=1.0, showmask=0, contra=0.0, excl=true)
libplacebo_Deband(radius=12, iterations=4, temporal=false, planes=[3,3,3])
fmtc_bitdepth (bits=10,dmode=8)
Prefetch(2,6)
How could I change it to have some performance increase, according to the last commits?
__________________
@turment on Telegram |
#3612 | Link |
|
Registered User
Join Date: Jul 2018
Posts: 1,476
|
Currently it looks like you can't. In the future some performance increase is expected for 420<->444 conversion in the 2x UV upsize/downsize, but it has not yet been ported to the AVS+ core, so no test release exists. It is also an internal Resize() optimization and will not require script changes.
As a test, it may be worth trying 14-bit integers instead of 16-bit. The 16-bit format was found to perform poorly in many AVS filters, where it is processed by different functions than 10-14 bits. If you use external filters, though, it may depend on their implementations. The precision may not differ greatly between 14 and 16 bits. Last edited by DTL; 6th February 2026 at 16:23.
#3613 | Link | ||||
|
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,376
|
Quote:
TextSub("whatever.ass") would literally pass the frames through without returning any error or overlaying any subtitles. Quote:
Quote:
Also, when it comes to distributed farms, as in the case of FFAStrans running on-prem, the servers it runs on may not all be exactly the same: they may not have the very same dedicated GPU with the same drivers, and the same goes for the CPU cores, instruction sets (assembly optimization) and so on. So writing the scripts and letting the filters "figure it out" automatically on the CPU is actually very helpful. Even in a cloud environment, where you typically have 1 machine = 1 job with a workflow running end to end, you really only have CPU-only EC2 instances, like the Elastic c6i.4xlarge and Elastic c6i.2xlarge that we're using, as only the ones without a GPU would scale up and down by being created dynamically according to the number of jobs. Sure, one could set it up in a similar way with GPU-powered EC2 instances, but the likelihood of them not being created because no resources are available for the region rises significantly, which makes the CPU-only option far more appealing. TL;DR: I would rather pick something slower that works and is available all the time than something slightly faster that might not be available. Quote: