29th January 2026, 13:40   #3601
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,527
Quote:
Originally Posted by DTL View Post
It looks like the documentation for ConvertToPlanarRGB is missing the chromaresample params:
https://avisynthplus.readthedocs.io/...s/convert.html
RGB planar

ConvertToPlanarRGB(clip, [ string matrix, bool interlaced,
string ChromaInPlacement,
string chromaresample ] )
ConvertToPlanarRGBA(clip, [ string matrix, bool interlaced,
string ChromaInPlacement,
string chromaresample,
float param1, float param2, float param3 ] )

But the sources for ConvertToPlanarRGB also list param1, param2 and param3.
Thanks, obviously I missed it during a copy-paste session. Also, http://avisynth.nl/index.php/Convert is O.K.
The next rst doc update will include the fix.
31st January 2026, 18:31   #3602
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 2,399
I'd like to clarify the GetFrame method.
Older VSFilters didn't throw exceptions in GetFrame, and it simply didn't work. The newer xy-VSFilter calls ThrowError when it doesn't support the video format, which causes many (FFmpeg-based) applications to crash.
Example script:
Quote:
LoadPlugin("c:\temp\XySubFilter\VSFilter.dll")

Colorbars(1280, 720)
ConvertToPlanarRGB()

TextSub("c:\temp\test_subtitles.srt")
In my ScriptSourceFilter, the call to PClip::GetFrame is inside a try-catch block, but that doesn't help (or am I doing something wrong?).

Is this an application issue or an xy-VSFilter issue?
31st January 2026, 23:37   #3603
wonkey_monkey
Formerly davidh*****
 
Join Date: Jan 2004
Posts: 2,812
AvisynthError isn't derived from std::exception, so your code won't catch it (I think).

(Not sure why xy-VSFilter isn't checking the format in its constructor...)
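
A minimal C++ sketch of the two points above, under stated assumptions: `AvisynthErrorLike` is a stand-in written for this example (a plain class with a `msg` member that does not derive from std::exception), and `SubtitleFilterSketch` and `TryCreate` are hypothetical names, not xy-VSFilter or AviSynth code. It shows that a handler for std::exception does not fire for such an error, so a separate handler is needed, and that a format check in the constructor surfaces the problem when the filter is created rather than on the first GetFrame call.

```cpp
#include <exception>
#include <string>

// Stand-in for an AviSynth-style error: a plain class carrying a message,
// deliberately NOT derived from std::exception (assumption for illustration).
struct AvisynthErrorLike {
    const char* msg;
    explicit AvisynthErrorLike(const char* m) : msg(m) {}
};

// Hypothetical subtitle filter that rejects an unsupported format up front,
// in its constructor, instead of failing later inside GetFrame().
class SubtitleFilterSketch {
public:
    explicit SubtitleFilterSketch(bool formatSupported) {
        if (!formatSupported)
            throw AvisynthErrorLike("TextSub: unsupported video format");
    }
    int GetFrame(int n) const { return n; }  // placeholder for real rendering
};

// Host-side helper: reports which handler caught the failure.
std::string TryCreate(bool formatSupported) {
    try {
        SubtitleFilterSketch filter(formatSupported);
        filter.GetFrame(0);
        return "ok";
    } catch (const std::exception& e) {
        // Never reached for AvisynthErrorLike: the types are unrelated.
        return std::string("std::exception: ") + e.what();
    } catch (const AvisynthErrorLike& e) {
        return std::string("script error: ") + e.msg;
    } catch (...) {
        return "unknown error";
    }
}
```

Calling `TryCreate(false)` takes the second handler. With only the std::exception handler present, the error would propagate out of the function.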
__________________
My AviSynth filters / I'm the Doctor
1st February 2026, 06:45   #3604
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 2,399
Quote:
Originally Posted by wonkey_monkey View Post
AvisynthError isn't derived from std::exception, so your code won't catch it (I think).
Thanks. I forgot about that again. I fixed it.

But this does not work with avformat_open_input.

Added:
Quote:
Originally Posted by wonkey_monkey View Post
(Not sure why xy-VSFilter isn't checking the format in its constructor...)
Thanks. It works!

Last edited by v0lt; 1st February 2026 at 08:41.
3rd February 2026, 09:15   #3605
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,527
New build: Avisynth r4483
https://github.com/pinterf/AviSynthP...3.7.6pre-r4483

Code:
20260203 3.7.5.r4483 (pre 3.7.6)
--------------------------------
* rst documentation update: RGBAdjust https://avisynthplus.readthedocs.io/...rs/adjust.html
* rst documentation update: ColorYUV https://avisynthplus.readthedocs.io/.../coloryuv.html
* optimization: add AVX2 TurnLeft/TurnRight/Turn180 (R/L: 1.5-3x speed).
* optimization: ConvertBits AVX2 integer->float
* optimization: ConvertToPlanarRGB(A): YUV->RGB add AVX2 (2-3x speed)
* optimization: ConvertToPlanarRGB(A): YUV->RGB 16 bit: a quicker way (1.5x)
* Fix: C version of 32-bit ConvertToPlanarRGB YUV->RGB to not clamp output RGB values.
* ConvertToPlanarRGB(A): add bits parameter to alter target bit-depth.
* ConvertToPlanarRGB(A): from YUV->RGB full range output: optimized in-process when bits=32, other cases call ConvertBits internally.
* Fix: Packed RGB conversions altering the bit-depth (e.g. rgb32->ConvertToRGB64()) worked always in full range.
* Add more AVX512 resampler code. (WIP)
* Add more AVX512_BASE code paths (Resamplers)
* Build: add _avx512b.cpp/hpp pattern in CMake to detect source to compile with base (F,CD,BW,DQ,VL) flags.
  However AVX512_BASE itself is set only when AVX512_FAST found.
  For pre-Ice Lake (older AVX512) systems you can enable it with SetMaxCPU("avx512base+") and get the optimized AVX512_BASE functions.
* Build: add new architecture z/Architecture
4th February 2026, 06:39   #3606
Kurt.noise
Registered User
 
Join Date: Nov 2022
Location: Aix en Provence, France
Posts: 163
Hi,

I didn't look closely, but I've seen CUDA parameters in the compilation settings. What are they for exactly?
4th February 2026, 08:11   #3607
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,476
It looks like it is for memory management (and filtergraph filter interconnection?) for CUDA-based plugins (there are no internal CUDA filters currently?). It looks like in old times, when the number of programmers was big, there was an attempt to make CUDA-computing avisynth with script-based filter interconnection inside the CUDA accelerator and/or even a mix of onCPU and onCUDA filters. But development of this looks to have stopped a long time ago. Now all filters work as onCPU, and if a filter needs some external acceleration it does all memory management inside itself. This causes some performance loss in the case of a single filter and more in the case of a filterchain, but it is simpler to support for the current limited number of developers.

There is an idea for an extended filter interconnection interface for onAccelerator filters, like GetFrameToBuffer(buffer_description), so that onAccelerator-based filters can ask for a frame to be stored directly into an allocated upload buffer in the global virtual memory address space. Currently AI/NN filters like RIFE and avs-mlrt can only get frames via the standard GetFrame() method and make an additional copy into the buffers allocated for uploading to the accelerator. With the largest RGBPS format this puts a heavy load on the memory subsystem (though since AI/NN filters are not very fast, it may not be very visible).

As I understand, the idea of the CUDA-based filtergraph was to avoid uploading/downloading frame resources to/from host RAM when several CUDA-based filters are connected in a graph inside an external accelerator. The current AVS filter interconnection (typically via software Cache() for frame-based MT) requires writing frames to host RAM cache buffers and reading them back from the cache buffers (though some DO_NOT_CACHE_ME mode exists and filters can attempt a direct connection via GetFrame()?).
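
A rough C++ sketch of the extra host-side copy described above, assuming a toy producer. All names here (`ProducerSketch`, `Render`, `GetFrameThenCopy`) are hypothetical and written for this example; `GetFrameToBuffer` only borrows the name of the proposed idea and is not an existing AviSynth interface. The counter tracks how many bytes get written to host memory per frame on each path.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical frame producer; nothing here is AviSynth API.
struct ProducerSketch {
    std::size_t frameBytes;
    std::size_t bytesWritten;  // counts every byte stored to host memory

    explicit ProducerSketch(std::size_t bytes)
        : frameBytes(bytes), bytesWritten(0) {}

    // Fills dst with frame n's pixels (a constant byte stands in for real data).
    void Render(int n, unsigned char* dst) {
        std::memset(dst, n & 0xFF, frameBytes);
        bytesWritten += frameBytes;
    }

    // Classic path: the producer renders into its own frame, then the consumer
    // copies that frame into its upload buffer.
    void GetFrameThenCopy(int n, unsigned char* uploadBuf) {
        std::vector<unsigned char> frame(frameBytes);
        Render(n, frame.data());
        std::memcpy(uploadBuf, frame.data(), frameBytes);
        bytesWritten += frameBytes;
    }

    // Proposed-style path: the producer renders straight into the caller's buffer.
    void GetFrameToBuffer(int n, unsigned char* uploadBuf) {
        Render(n, uploadBuf);
    }
};
```

In this toy model the classic path writes every frame twice and the direct path writes it once. Real savings would depend on caching, frame pitch and how the upload buffer is allocated.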
4th February 2026, 11:29   #3608
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 3,075
Quote:
Originally Posted by DTL View Post
As I understand, the idea of the CUDA-based filtergraph was to avoid uploading/downloading frame resources to/from host RAM
So, could there be an implementation for my beloved BM3DCUDA, where the temporal part is CPU bound?
__________________
@turment on Telegram
6th February 2026, 08:51   #3609
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,476
CUDA-based filters may interact with other CUDA-based filters without downloading input/output frames to host RAM.

Important update for use with ML/AI/NN filters:
ConvertToPlanarRGB(A): add bits parameter to alter target bit-depth.

It means that instead of the long and slow sequence

ConvertToPlanarRGB()
ConvertBits(32)

before RIFE/avs-mlrt filters, a single (and faster) filter can be used after r4483:

ConvertToPlanarRGB(bits=32)

It is expected to end up faster than avsresize Z_ConvertFormat(pixel_type="RGBPS") when converting from YUV.

Also, uint8..16 to float32 precision may become a bit better because the new function converts the int32 intermediate directly to float32, without an intermediate integer stage.
It may be close in precision to the also possible (but slower)
ConvertBits(32)
ConvertToPlanarRGB()
sequence.

Also waiting for a new test release from pinterf: it will finally have ColorBarsHD fixed for any integer precision and float, and the matrix/dematrix part fixed again for better YUV<->RGB conversions, to test the precision of the new functions. Currently ColorBarsHD uses only an 8-bit internal table plus bit depth upscale and cannot be used as a good source generator for precise matrix testing at different bit depths.
Precision of 10-bit YUV to 8-bit RGB conversion is expected to be very close or equal to avsresize in the new test release.

Last edited by DTL; 6th February 2026 at 16:01.
6th February 2026, 13:03   #3610
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,527
Quote:
Originally Posted by DTL View Post
CUDA-based filters may interact with other CUDA-based filters without downloading input/output frames to host RAM.

Important update for use with ML/AI/NN filters:
ConvertToPlanarRGB(A): add bits parameter to alter target bit-depth.

It means that instead of the long and slow sequence

ConvertToPlanarRGB()
ConvertBits(32)

before RIFE/avs-mlrt filters, a single (and faster) filter can be used after r4483:

ConvertToPlanarRGB(bits=32)

It is expected to end up faster than avsresize Z_ConvertFormat(pixel_type="RGBPS") when converting from YUV.

Also, uint8..16 to float32 precision may become a bit better because the new function converts the int32 intermediate directly to float32, without an intermediate integer stage.
It may be close in precision to the also possible (but slower)
ConvertBits(32)
ConvertToPlanarRGB()
sequence.

Also waiting for a new test release from pinterf: it will finally have ColorBarsHD fixed for any integer precision and float, and the matrix/dematrix part fixed again for better YUV<->RGB conversions, to test the precision of the new functions. Currently ColorBarsHD uses only an 8-bit internal table plus bit depth upscale and cannot be used as a good source generator for precise matrix testing at different bit depths.
Precision of 10-bit YUV to 8-bit RGB conversion is expected to be very close or equal to avsresize in the new test release.
Yep, returning to this matrix conversion area during development caused a "bit" more work, testing, and experimentation than I had initially expected. I can't quite compete with the accuracy of avsresize, as it performs all matrix operations internally in 32 bit float.

However, at 16-bit, a possible +/-1 lsb error can be considered as "good enough".

When using full range, it's effectively the same as working in 32 bit float; for complex transformations (like full range bit depth conversion), my version uses float calculations internally as well.

Avisynth hasn't previously had chained (fused) filter options where a matrix was involved, so this bits= parameter is a first. It immediately became clear that while optimizing the simplest conversion was easy, the code bloats exponentially when you try to optimize every specific sub-case. Luckily, the more complex cases ended up needing float internally anyway, so they could be unified.

Combining the bit depth conversion is not only much faster but also more accurate than simply adding ConvertBits after the YUV-RGB conversion, so it was a very good and useful idea from DTL.
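
To illustrate why the fused path can be more accurate, here is a small C++ sketch. It is not AviSynth's implementation: it only computes the R channel from 8-bit limited-range BT.709 Y and V using the usual rounded coefficients, and the function names are made up for this example. The two-step path rounds to an 8-bit integer before going to float, so it can be off by up to half an 8-bit step. The fused path keeps the unrounded value and, like the 32-bit float output mentioned in the changelog, does not clamp.

```cpp
#include <algorithm>
#include <cmath>

// R on an 8-bit scale from limited-range BT.709 Y and V:
// R = 1.164383 * (Y - 16) + 1.792741 * (V - 128)
inline double Red8bitScale(int y, int v) {
    return 1.164383 * (y - 16) + 1.792741 * (v - 128);
}

// Two-step path: clamp and round to an 8-bit integer first, then go to float.
inline float RedTwoStep(int y, int v) {
    const double r = Red8bitScale(y, v);
    const double clamped = std::min(255.0, std::max(0.0, r));
    const long r8 = std::lround(clamped);
    return static_cast<float>(r8 / 255.0);
}

// Fused path: scale the unrounded result straight to float, no integer stage.
inline float RedFused(int y, int v) {
    return static_cast<float>(Red8bitScale(y, v) / 255.0);
}
```

For Y=100, V=140 the unrounded value is about 119.32 on the 8-bit scale, so the two-step path returns 119/255 and drops the fractional part that the fused path keeps.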
6th February 2026, 13:18   #3611
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 3,075
Quote:
Originally Posted by DTL View Post
CUDA-based filters may interact with other CUDA-based filters without downloading input/output frames to host RAM.
My usual script is something like:

SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\DehaloAlpha\Dehalo_alpha.avsi")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\Dither\mt_xxpand_multi.avsi")
Import("D:\Eseguibili\Media\StaxRip\Apps\Plugins\AVS\FineDehalo\FineDehalo.avsi")
DGSource("M:\In\The promised neverland S1 ~Dynit\01-1.dgi")
z_ConvertFormat(resample_filter="Spline64", pixel_type="yuv420p16")
DeBilinearResizeMT(1280, 720, threads=2, prefetch=2, accuracy=2)
z_ConvertFormat(resample_filter="Spline64", pixel_type="yuv444ps")
BM3D_CUDA(sigma=12, radius=4, chroma=true, block_step=6, bm_range=12, ps_range=6, fast=false)
BM3D_VAggregate(radius=4)
z_ConvertFormat(resample_filter="spline64",dither_type="error_diffusion",pixel_type="yuv420p16")
FineDehalo(rx=2, ry=2, thmi=80, thma=128, thlimi=50, thlima=100, darkstr=0.3, brightstr=1.0, showmask=0, contra=0.0, excl=true)
libplacebo_Deband(radius=12, iterations=4, temporal=false, planes=[3,3,3])
fmtc_bitdepth (bits=10,dmode=8)
Prefetch(2,6)


How could I change it to get some performance increase, given the latest commits?
6th February 2026, 16:17   #3612
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,476
Currently it looks like no. In the future some performance increase is expected for 420<->444 conversion, in the UV 2x upsize/downsize, but it is not yet ported to the AVS+ core, so no test release exists. Also, it is an internal Resize() optimization and requires no script changes.

As a test, it may be worth trying 14-bit integers instead of 16-bit. It was found that the 16-bit format is not good for performance in many AVS filters, as it is processed by different functions than 10-14 bits. But if you use external filters, it may depend on their implementations. The precision may not differ greatly between 14 and 16 bits.

Last edited by DTL; 6th February 2026 at 16:23.
Yesterday, 23:24   #3613
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,376
Quote:
Originally Posted by v0lt View Post
Older VSFilters didn't throw exceptions in GetFrame, and it simply didn't work.
I remember it well. Sometimes I forgot to convert things at the end and I ended up wasting encoding hours only to find out that the output wasn't hardsubbed.

TextSub("whatever.ass") would literally pass the frames through without returning any error or overlaying any subtitles.

Quote:
Originally Posted by DTL View Post
there was an attempt to make CUDA-computing avisynth with script-based filter interconnection inside the CUDA accelerator and/or even a mix of onCPU and onCUDA filters.
Yes, however it's always better to avoid going back and forth between CPU and GPU (so onCPU and onCUDA) because it means copying a lot of data and it might not be worth it. One classic example of this is Cube() and DGCube() from Donald Graft. Although, on paper, DGCube() performs the entire tetrahedral interpolation and application of the 65x65x65 LUT on the GPU and is computationally much faster than doing it on the CPU with AVX512, copying huge frames from RAM (DDR) to VRAM (GDDR) via the motherboard lanes to perform the process, and then back, causes a significant slowdown and only really makes DGCube() a fraction faster than Cube(), so it loses almost all of its advantage. We're talking about 16-bit planar RGB UHD frames here for normal content (or potentially 6K for post-production stuff shot in log).
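
For a sense of scale, a small C++ sketch of the arithmetic behind that transfer cost. The frame geometry (3840x2160, three planes, 2 bytes per sample) follows from the 16-bit planar RGB UHD case above, ignoring any row padding. The bandwidth passed to `RoundTripMs` is purely an assumption for illustration, not a measured figure for any system.

```cpp
#include <cstdint>

// One 16-bit planar RGB UHD frame, with no row padding assumed.
constexpr std::uint64_t kWidth = 3840;
constexpr std::uint64_t kHeight = 2160;
constexpr std::uint64_t kPlanes = 3;          // R, G, B
constexpr std::uint64_t kBytesPerSample = 2;  // 16-bit samples
constexpr std::uint64_t kFrameBytes =
    kWidth * kHeight * kPlanes * kBytesPerSample;  // 49,766,400 bytes

// Milliseconds needed to move one frame to the GPU and back again at a
// sustained transfer rate given in bytes per second (caller's assumption).
constexpr double RoundTripMs(std::uint64_t frameBytes, double bytesPerSecond) {
    return 2.0 * static_cast<double>(frameBytes) / bytesPerSecond * 1000.0;
}
```

With an assumed sustained rate of 12e9 bytes per second, `RoundTripMs(kFrameBytes, 12e9)` comes to roughly 8.3 ms per frame for the copies alone, before any GPU work is done. Whether that dominates depends on how fast the CPU path already is.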

Quote:
Originally Posted by DTL View Post
Now all filters work as onCPU, and if a filter needs some external acceleration it does all memory management inside itself. This causes some performance loss in the case of a single filter and more in the case of a filterchain, but it is simpler to support for the current limited number of developers.
To be fair, this is very easy to support, and it's actually very simple from the user's perspective as well: there's no need to mess with MT modes and Prefetch() either, as the thread pool is created internally by the filter itself, so as a user you know that all you have to do is write the script and it will run everywhere.

Also, when it comes to distributed farms like in the case of FFAStrans running on prem, the servers it's running on may not be all exactly the same and have the very same dedicated GPU with the same drivers etc and the same goes for the CPU cores, instruction sets (assembly optimization) etc so writing the scripts and letting the filters "figure it out" automatically on the CPU is actually very helpful.

Even in a cloud environment in which you typically have 1 machine = 1 job with a workflow running end to end, you only really have CPU-only EC2 instances, like the Elastic c6i.4xlarge and Elastic c6i.2xlarge that we're using, as only the ones without a GPU scale up and down by being created dynamically according to the number of jobs. Sure, one could set it up in a similar way with GPU-powered EC2 instances, but the likelihood of them not being created because no resources are available in the region rises significantly, thus making the CPU-only option way more appealing. TL;DR: I would rather pick something slower that works and is available all the time than something slightly faster that might not be available.

Quote:
Originally Posted by DTL View Post
Also waiting for a new test release from pinterf: it will finally have ColorBarsHD fixed for any integer precision and float, and the matrix/dematrix part fixed again for better YUV<->RGB conversions, to test the precision of the new functions. Currently ColorBarsHD uses only an 8-bit internal table plus bit depth upscale and cannot be used as a good source generator for precise matrix testing at different bit depths.
Precision of 10-bit YUV to 8-bit RGB conversion is expected to be very close or equal to avsresize in the new test release.
That's actually gonna be very good as I routinely use Avisynth's colorbars to test stuff, so thank you for looking into this guys.
All times are GMT +1. The time now is 02:53.

