Old 5th December 2020, 11:44   #141  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 61
If the whole source plane fits in the CPU cache, splitting it between threads may work better than one thread per frame (as MT Avisynth does). If it does not fit, the threads create as many memory read streams as there are threads, and the user will see a significant speed drop once the input plane is larger than the CPU cache.

It looks like the whole point of ResampleMT's multithreading, compared with Avisynth-level MT, is to keep the pieces processed by different threads close to each other in the memory address space. The closest possible arrangement is processing neighbouring samples.

"splitting image horizontally."

Maybe an idea: at thread-planning time, compare the plane size with the CPU cache size. If plane size < CPU cache size, the thread planner can simply split the image into N parts, one per thread. If the input plane is larger than the CPU cache, first split it into roughly plane_size / cache_size parts, then process each part with N threads, one part after another.
Since it may be hard to determine the CPU cache size via an API, this could be a separate user-defined parameter such as CPU_cache_size (or maybe better MAX_BLOCK_SIZE_FOR_MT). Because memory managers differ between CPUs, the optimal MT block size may be significantly smaller than the CPU cache size, so the user can find the optimal size by testing on his own hardware.
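A minimal sketch of the cache-sized split described above, in Python for brevity. The function name `split_plane_for_cache` and its parameters are made up for illustration and are not part of ResampleMT. It also ignores the extra rows a kernel needs above and below each part.

```python
def split_plane_for_cache(height, row_bytes, cache_bytes):
    """Split a plane into consecutive row ranges whose size stays within cache_bytes.

    Hypothetical helper: cache_bytes plays the role of the user-defined
    CPU_cache_size / MAX_BLOCK_SIZE_FOR_MT parameter suggested in the post.
    """
    if height <= 0 or row_bytes <= 0 or cache_bytes <= 0:
        raise ValueError("all arguments must be positive")
    # Largest whole number of rows that fits the budget (at least one row).
    rows_per_part = max(1, cache_bytes // row_bytes)
    return [(start, min(start + rows_per_part, height))
            for start in range(0, height, rows_per_part)]


# 1920x1080 8-bit plane with a 1 MiB budget: two parts, each processed by all N threads in turn.
print(split_plane_for_cache(1080, 1920, 1 << 20))
```

With a budget larger than the plane, the function returns a single range, which matches the "plane size < CPU cache size" case.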

If the user sets the MT block size too small, there will be a performance penalty from starting, stopping and re-assigning threads too often. When the image buffer is split horizontally, each thread's work unit is measured in rows, so maybe let the user set a max_thread_rows count. If it is undefined, the thread planner works as now and calculates each_thread_rows = height/num_threads (I suppose). If it is defined, the planner limits each_thread_rows to max_thread_rows and, after (all?) the initial threads finish, starts new threads on the remaining blocks of rows. Waiting for _all_ initial threads to finish before starting threads on new row blocks may be needed to keep the currently cached data unchanged (because new row blocks require new memory reads). So this could also be a user-defined parameter: wait_for_all_threads_finish=yes/no.
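The row planning above could look roughly like this. It is only a sketch: `plan_waves` is a hypothetical name, and grouping blocks into "waves" corresponds to the wait_for_all_threads_finish=yes behaviour, where a new group starts only after the previous one is done.

```python
def plan_waves(height, num_threads, max_thread_rows=None):
    """Return a list of waves; each wave is a list of (start_row, end_row) blocks.

    Without max_thread_rows each thread gets height/num_threads rows (rounded up),
    which gives a single wave. With it, blocks are capped and spill into later waves.
    """
    if height <= 0 or num_threads <= 0:
        raise ValueError("height and num_threads must be positive")
    rows = -(-height // num_threads)  # ceiling division
    if max_thread_rows is not None:
        rows = max(1, min(rows, max_thread_rows))
    blocks = [(s, min(s + rows, height)) for s in range(0, height, rows)]
    # One wave = at most num_threads blocks running at the same time.
    return [blocks[i:i + num_threads] for i in range(0, len(blocks), num_threads)]


for wave in plan_waves(1080, 4, max_thread_rows=100):
    print(wave)
```

A very small max_thread_rows produces many waves, which is the thread start/stop overhead mentioned above.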

A full 2D convolution needs many reads of the input samples (I think kernel_size^2 reads per output sample), so it will definitely benefit from cached reads. Its writes can bypass the cache, because output samples are never needed for processing again (unlike Avisynth's resize, which runs a 1D pass twice and has to read the intermediate result back). So, if possible, the ASM 2D convolution routines could be modified to use non-temporal (cache-bypassing) store instructions, and it may be worth looking into the WinAPI for the caching properties of memory pages.
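To put rough numbers on the read counts, here is a small worked example. It assumes a kernel with `kernel_size` taps per axis and, for the V+H case, simply counts one 1D pass per axis while ignoring that the intermediate image has a different size. The function name is only illustrative.

```python
def reads_per_output_sample(kernel_size):
    """Input reads needed for one output sample with kernel_size taps per axis."""
    return {
        "full_2d": kernel_size * kernel_size,   # every tap of the square kernel
        "separable_v_plus_h": 2 * kernel_size,  # one 1D pass per axis
    }


for k in (4, 8, 16):
    print(k, reads_per_output_sample(k))
```

The gap grows linearly with the kernel size, which is why cached reads matter more for the 2D case.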

Last edited by DTL; 5th December 2020 at 12:11.
Old 6th December 2020, 07:06   #142  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 61
Quote:
Originally Posted by real.finder View Post
edit: there are also JincResize maybe worth adding too
Some more 'strategic' ideas:

Actually, 'linear math' resizers consist of 3 separate but interconnected parts:

kernel_base x weighting/windowing x resampler

Typical kernel_base functions are sinc, gauss, linear, cubic, etc. Jinc is also just a kernel_base function, defined in 1D as f(x)=BesselJ1(x)/x. Sinc is equal to the spherical Bessel function j0(x), as I see from the wiki.
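For reference, the two kernel_base functions can be written down directly. This is a sketch in Python, whose standard library has no Bessel functions, so J1 is computed here from Bessel's integral with the trapezoid rule. The `steps` parameter is an arbitrary choice that is accurate for the small arguments a resize kernel uses, and some texts scale jinc differently (for example 2*J1(x)/x).

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, which equals the spherical Bessel function j0(x)."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def bessel_j1(x, steps=256):
    """J1(x) = (1/pi) * integral from 0 to pi of cos(t - x*sin(t)) dt (trapezoid rule)."""
    def f(t):
        return math.cos(t - x * math.sin(t))

    h = math.pi / steps
    total = 0.5 * (f(0.0) + f(math.pi))
    for i in range(1, steps):
        total += f(i * h)
    return total * h / math.pi


def jinc(x):
    """Jinc as defined in the post: J1(x)/x, with the limit value 1/2 at x = 0."""
    return 0.5 if x == 0.0 else bessel_j1(x) / x


print(sinc(1.0), jinc(1.0))
```

The first zero of jinc is near x = 3.8317, compared with x = pi for sinc.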

The typical weighting/windowing is 'none', that is, a 'rectangular' window equal to 1 that only limits the 'width/size' of the kernel. There are many more, such as one lobe of sinc or jinc, linear, trapezoidal, or any other function that is typically =1 at zero (the start) and fades to =0 at the end of the window.

The most typical resampler is separate vertical + horizontal processing (a 1D convolution of the weighted kernel with the source, applied twice), which is fast and gives acceptable quality for 2D images. A resampler that may suit 2D images better is a direct 'full' 2D convolution, but it needs more CPU operations and is rarely used: it was too slow on old CPUs and is still not very fast even on 2020 CPUs. It gives only 'a bit better' quality (maybe +20..30% by my taste in some (extreme) tests) compared with the much faster V+H processing. The main difference is that a 2D-convolution resampler handles spatial frequencies at all angles equally well, while a V+H resampler handles only vertical and horizontal ones well. On many real-world (usually not very sharp) images the difference may be barely visible.
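A minimal sketch of the V+H resampler idea, not the plugin's implementation. The names `resample_1d` and `resize_vh` are hypothetical, the tent kernel is just the simplest example, and border clamping plus weight normalization are my assumptions.

```python
import math


def triangle(x):
    """Linear (tent) kernel with support 1, used here only as a simple example."""
    return max(0.0, 1.0 - abs(x))


def resample_1d(src, dst_len, kernel=triangle, support=1.0):
    """Resample a 1D list to dst_len samples using normalized weighted sums."""
    scale = len(src) / dst_len
    stretch = max(1.0, scale)  # widen the kernel when downscaling (low-pass)
    radius = support * stretch
    out = []
    for i in range(dst_len):
        center = (i + 0.5) * scale - 0.5  # output sample i in source coordinates
        acc = wsum = 0.0
        for j in range(math.floor(center - radius), math.ceil(center + radius) + 1):
            w = kernel((j - center) / stretch)
            jj = min(max(j, 0), len(src) - 1)  # clamp at the borders
            acc += w * src[jj]
            wsum += w
        out.append(acc / wsum)
    return out


def resize_vh(img, dst_w, dst_h, kernel=triangle, support=1.0):
    """Separable resize: horizontal pass on every row, then vertical pass on every column."""
    rows = [resample_1d(r, dst_w, kernel, support) for r in img]
    cols = [resample_1d([r[x] for r in rows], dst_h, kernel, support) for x in range(dst_w)]
    return [[cols[x][y] for x in range(dst_w)] for y in range(dst_h)]


print(resample_1d([0.0, 10.0], 4))
```

A full 2D resampler would replace the two 1D passes with a single loop over a 2D neighbourhood for every output sample.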

So typically the 'named' 'linear math' resizers are just names for some combination of kernel_base and weighting, like:
Lanczos = sinc kernel weighted by sinc
Gauss = gauss kernel, non-weighted (rect weighted, with good self-weighting properties)
Sinc = sinc, non-weighted
SinPow = sort of 'self-weighted'
And they are typically processed by the V+H resampler.

Jinc in that plugin looks like jinc weighted by jinc and processed by a 2D-convolution resampler. That is because Jinc benefits most from a 2D-convolution resampler, though it can also be the kernel for a V+H resampler (and then produce results close to the sinc-family resizers, I think, especially with a small taps value).

So a 'linear' resizer library could use just 'one' resize function (like the one in z.lib) and take all 3 components of the resize (kernel + weighting + resampler) as arguments. For user-friendliness it can also provide named functions like the typical Bilinear/Lanczos/etc. Since it is hard to give a name to every possible combination of the 3 parts, a single parametrized one-for-all function is shorter for advanced users.
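The kernel_base x weighting part of such a one-for-all function could be composed like this. It is a sketch with made-up names (`make_kernel`, `sinc_lobe_window`, `rect_window`), and the resampler argument is left out. Lanczos with 3 taps then falls out as one named preset.

```python
import math


def sinc_pi(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with zeros at the non-zero integers."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def rect_window(x, support):
    """'No weighting': 1 inside the support, 0 outside."""
    return 1.0 if abs(x) < support else 0.0


def sinc_lobe_window(x, support):
    """Central lobe of a sinc, stretched to reach zero at |x| == support."""
    return sinc_pi(x / support) if abs(x) < support else 0.0


def make_kernel(base, window, support):
    """Return k(x) = base(x) * window(x, support)."""
    return lambda x: base(x) * window(x, support)


# Named presets are just fixed argument combinations.
lanczos3 = make_kernel(sinc_pi, sinc_lobe_window, 3)
sinc3 = make_kernel(sinc_pi, rect_window, 3)

print(lanczos3(0.5), sinc3(0.5))
```

Adding a new named resizer then means adding one line, while advanced users call `make_kernel` directly.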

So currently, to add Jinc to the ResampleMT plugin, the 2D-convolution resampler must be added, which is the hardest part, plus some kernel functions: jinc as kernel base and jinc as weighting (maybe jinc weighted by jinc as a single function to begin with). The kernel+weighting is the simplest part, because today's C math libraries commonly provide both sin(x) and Bessel functions (for example j0()/j1() in POSIX math.h).

As I see it, image-processing software (advanced enough, for 'geeks') already uses command-line syntax for manually passing kernel_base and weighting functions to the resampler. I have seen this in the ImageMagick forums.

It would be good for ResampleMT to also support user-defined function calls for the required combination of kernel + resampler, or even kernel_base + weighting + resampler, because most kernel functions work well with both V+H and 2D-convolution resamplers, perhaps with small parameter tweaks. A 2D kernel for 2D convolution is, I think, usually a 1D kernel rotated around the center point, i.e. taking the radius/distance as its argument: kernel_2D(x,y) = kernel_1D(sqrt(x^2+y^2)).
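The difference between the rotated (radial) 2D kernel and the product kernel that V+H processing effectively applies can be checked numerically. This is a sketch with illustrative helper names and a tent function as the 1D kernel.

```python
import math


def tent(x):
    """Simple 1D example kernel with support 1."""
    return max(0.0, 1.0 - abs(x))


def radial_kernel_2d(kernel_1d):
    """kernel_2D(x, y) = kernel_1D(sqrt(x^2 + y^2)): the 1D kernel rotated around the center."""
    return lambda x, y: kernel_1d(math.hypot(x, y))


def separable_kernel_2d(kernel_1d):
    """kernel_2D(x, y) = kernel_1D(x) * kernel_1D(y): what a V+H resampler effectively uses."""
    return lambda x, y: kernel_1d(x) * kernel_1d(y)


radial = radial_kernel_2d(tent)
separable = separable_kernel_2d(tent)

# Two points at the same distance (0.6) from the center, in different directions.
print(radial(0.6, 0.0), radial(0.36, 0.48))
print(separable(0.6, 0.0), separable(0.36, 0.48))
```

The radial kernel gives the same weight in both directions, while the product kernel does not. A Gaussian is the special case where both forms coincide, since exp(-x^2) * exp(-y^2) = exp(-(x^2 + y^2)).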

Last edited by DTL; 6th December 2020 at 08:09.
Old 14th December 2020, 10:16   #143  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 61
Quote:
Originally Posted by real.finder View Post
There is a sample build of this plugin with internal multithreading via OpenMP: https://forum.doom9.org/showthread.p...87#post1930687 . It multithreads only the main frame-processing loop, so the startup time for preparing the large full-frame coefficient array is unchanged. It also uses fully automatic threading, without manual control.

In my test it actually uses a bit less of a multi-core CPU (about 90%) than Avisynth+ MT (100% CPU) and runs a bit slower, so there is no great speed gain over newer MT Avisynth builds. But it uses only one large coefficient array for all threads, so it needs much less memory than Avisynth's frame-based MT. I believe someday it will be rewritten to use a small, fully cacheable LUT-based additive convolution instead of the full-frame coefficient array with mul+add, at least for integer scaling ratios. That would also remove the large memory requirement of each frame-processing thread.
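The post does not say how the LUT would be laid out, so the following shows only one related property under my own assumptions, not the planned rewrite. For an integer upscale ratio the kernel weights repeat with the output phase, so a table of `ratio` small rows can replace per-position coefficients along one axis. `phase_table` and the tent kernel are illustrative names.

```python
import math


def triangle(x):
    """Tent kernel with support 1, used only as an example."""
    return max(0.0, 1.0 - abs(x))


def phase_table(ratio, kernel=triangle, support=1.0):
    """For an integer upscale ratio, return one row of (source_offset, weight) per output phase.

    Output sample i = q * ratio + phase is then sum(weight * src[q + offset]).
    """
    table = []
    for phase in range(ratio):
        center = (phase + 0.5) / ratio - 0.5  # position relative to source sample q
        lo, hi = math.floor(center - support), math.ceil(center + support)
        taps = [(j, kernel(j - center)) for j in range(lo, hi + 1)]
        taps = [(j, w) for j, w in taps if w != 0.0]
        total = sum(w for _, w in taps)
        table.append([(j, w / total) for j, w in taps])
    return table


for row in phase_table(2):
    print(row)
```

The table size depends only on the ratio and the kernel support, not on the frame size, so it stays small enough to remain in cache.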
Old 14th December 2020, 10:52   #144  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,000
Quote:
Originally Posted by DTL View Post
There is a sample build of this plugin with internal multithreading via OpenMP: https://forum.doom9.org/showthread.p...87#post1930687 . It multithreads only the main frame-processing loop, so the startup time for preparing the large full-frame coefficient array is unchanged. It also uses fully automatic threading, without manual control.

In my test it actually uses a bit less of a multi-core CPU (about 90%) than Avisynth+ MT (100% CPU) and runs a bit slower, so there is no great speed gain over newer MT Avisynth builds. But it uses only one large coefficient array for all threads, so it needs much less memory than Avisynth's frame-based MT. I believe someday it will be rewritten to use a small, fully cacheable LUT-based additive convolution instead of the full-frame coefficient array with mul+add, at least for integer scaling ratios. That would also remove the large memory requirement of each frame-processing thread.
Maybe jpsdr can make it MT in another way, like frame division ("not just splitting image horizontally") based on the resize samples and taps, with threads?
__________________
See My Avisynth Stuff
Old 14th December 2020, 20:56   #145  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 1,859
I don't know if I'll be working on this... For now, I've put an alt to coding in my spare time for other personal projects.
Isn't "splitting image horizontally" a "frame division"?
__________________
My github.
Old 14th December 2020, 23:11   #146  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,000
Quote:
Originally Posted by jpsdr View Post
I don't know if I'll be working on this... For now, I've put an alt to coding in my spare time for other personal projects.
Isn't "splitting image horizontally" a "frame division"?
It is, but I meant blocks. I don't know if splitting the image horizontally will work with 2D resizers.
__________________
See My Avisynth Stuff
Old 15th December 2020, 19:44   #147  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 1,859
As long as there is no feedback (where you need some previous results to compute the next result), you can divide the work however you want and process it in whatever order you want. Of course, a totally random order will not be the most efficient/fastest way. But I see no reason why splitting horizontally shouldn't work.
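A tiny check of that argument, as a sketch only: the 3-tap vertical box filter stands in for any filter that reads only from the source, and the function names are made up. Processing the same row blocks in a different order gives an identical result.

```python
def blur_row(src, y):
    """Output row y of a vertical 3-tap box filter; reads only from src, never from the output."""
    h = len(src)
    above, below = src[max(y - 1, 0)], src[min(y + 1, h - 1)]
    return [(a + b + c) / 3.0 for a, b, c in zip(above, src[y], below)]


def process_in_blocks(src, blocks):
    """Fill the output block by block, in whatever order the blocks are given."""
    out = [None] * len(src)
    for start, end in blocks:
        for y in range(start, end):
            out[y] = blur_row(src, y)
    return out


src = [[float(x + 10 * y) for x in range(4)] for y in range(6)]
in_order = process_in_blocks(src, [(0, 2), (2, 4), (4, 6)])
shuffled = process_in_blocks(src, [(4, 6), (0, 2), (2, 4)])
print(in_order == shuffled)
```

The same holds for 2D kernels, since each block only needs a few extra source rows around it. The block order affects cache behaviour, not correctness.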

BTW, I meant "I've put a halt"...
__________________
My github.

Last edited by jpsdr; 15th December 2020 at 19:48.
All times are GMT +1.