Doom9's Forum > Capturing and Editing Video > Avisynth Development
Old 19th April 2024, 05:35   #281  |  Link
TDS
Formally known as .......
Join Date: Sep 2021
Location: Down Under.
Posts: 1,086
Quote:
Originally Posted by DTL View Post
It is safer to use the _e.XX builds for SMDegrain. The latest is https://github.com/DTL2020/mvtools/r.../r.2.7.46-e.03 . If that also does not work on AVX, the only way is to fall back to pinterf's 2.7.45 builds.
Hi DTL,

Another day, another lot of tests to figure out what's going on here...

So I thought I'd do it all on the AVX system (dual E5-2697 v2) and go straight to the source, so to speak, and I've uncovered several problems.

I was using AVX2 builds of HDRTools & plugins_jpsdr that were causing unexpected problems, so I have fixed that. Then I tried your latest compile of mvtools: I used the AVX DX12 build (I have DX installed on this particular PC) and had no issues, then tried noDX12, and that worked too. (Any reason why you've done DX variants?)

I have not got an AVX PC without DX installed, so I can't check that.

So after all that fiddling, I think I've got it sorted. The next test is to check that it behaves itself using the Distributed Encode function from another AVX2 PC (one of the Ryzens), for which I will use the DX12 AVX compile.
__________________
Long term RipBot264 user.

RipBot264 modded builds..
*new* x264 & x265 addon packs..
Old 19th April 2024, 13:27   #282  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
"any reason why you've done DX variants ?"

They are for hosts where DX12 is installed and users want to use DX12 features. If DX12 is not installed, they will not load at all. So for Win7 and older, only the noDX12 builds are applicable.
Old 19th April 2024, 13:35   #283  |  Link
TDS
All my PC's are running Windows 11.

So noDX12 ?

I do have DX installed, and both variants worked.
Old 19th April 2024, 13:44   #284  |  Link
DTL
Win10 and later appear to have DX12 included from setup, so you can use both the DX12 and noDX12 builds.

If you have different CPUs in the farm, can you test on-CPU MAnalyse performance using this script:

Code:
LoadPlugin("mvtools2.dll")

ColorBars(1920,1080, pixel_type="YV12")

super = MSuper(mt=false, pel=4, chroma=true)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MStoreVect(forward_vec1)

Prefetch(...)
with AVSmeter? I hope this can run with any build of mvtools. Its output is the raw ME (motion estimation) performance in frame pairs per second (using Expanding search, which is close to Exhaustive).

Expanding search from the center (zero dx, dy) with pnew > 0 (best adjusted using MShow or other methods for the current noise in the source) is expected to be more noise-resistant in static areas: after the center (the not-moving position) has been checked, the other, definitely false positions are penalized by the pnew setting. However, too high a pnew will cause real motion to be skipped and blur in the moving areas.

The Prefetch() argument should be the number of real (physical) CPU cores of the host.
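
For a farm with several different CPUs it may be easier to generate one copy of the test script per Prefetch value than to edit it by hand each time. Below is a minimal Python sketch of that idea. The script text is copied from the post above; the function name make_script, the helper write_scripts and the file names test_prefetch_N.avs are made up for illustration and are not part of mvtools or AVSMeter.

```python
from pathlib import Path

# Benchmark script from the post above; only the Prefetch() argument varies.
TEMPLATE = """LoadPlugin("mvtools2.dll")

ColorBars(1920,1080, pixel_type="YV12")

super = MSuper(mt=false, pel=4, chroma=true)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MStoreVect(forward_vec1)

Prefetch({threads})
"""


def make_script(threads: int) -> str:
    """Return the test script text with the given Prefetch thread count."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return TEMPLATE.format(threads=threads)


def write_scripts(folder, thread_counts):
    """Write one .avs file per thread count (hypothetical file naming)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in thread_counts:
        path = folder / f"test_prefetch_{n}.avs"
        path.write_text(make_script(n))
        paths.append(path)
    return paths
```

For a 12-core CPU with HyperThreading, something like write_scripts("c:/test", [6, 12, 24]) would cover half the physical cores, all physical cores and all logical cores.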

Last edited by DTL; 19th April 2024 at 15:06.
Old 19th April 2024, 14:37   #285  |  Link
TDS
I have to say that I have no idea how to "run" that script with AVSMeter

I am only familiar with how scripts are written for RipBot, so changing that script for RipBot is way over my head, sorry.

Here is a sample of a script, from RipBot:-

Code:
#After_Prefetch_Custom
LoadPlugin("%AVISYNTHPLUGINS%\Plugins_JPSDR\Plugins_JPSDR.dll")
LevelLimit=(video.BitsPerComponent==8) ? 255 : 1023
IntensityMask=ConvertToY(video).Levels(0,2,LevelLimit,0,LevelLimit,coring=false)
EdgeMask=aSobel(IntensityMask,chroma=0,thresh=255,SetAffinity=false).invert.Levels(0,2,LevelLimit,0,LevelLimit,coring=false).Blur(1)
SharpMask=Overlay(IntensityMask,EdgeMask,mode="Multiply",opacity=1.0)
SharpenedVideo=Sharpen(video,1)
video=Overlay(video,SharpenedVideo,mask=SharpMask,opacity=1.0)
Old 19th April 2024, 15:18   #286  |  Link
DTL
"I have to say that I have no idea how to "run" that script with AVSMeter"

1. Enter the required number of threads in Prefetch(arg) and save the script to an .avs file such as test.avs (you can use the Notepad text editor from Windows).

For a 2-core CPU (Core2Duo E7500):
Code:
LoadPlugin("mvtools2.dll")

ColorBars(1920,1080, pixel_type="YV12")

super = MSuper(mt=false, pel=4, chroma=true)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MStoreVect(forward_vec1)

Prefetch(2)
2. Put all required files in a folder such as c:\test :
c:\test\test.avs
c:\test\mvtools2.dll
c:\test\AVSmeter64.exe

3. Run from the command prompt, with c:\test as the current directory (test.avs is given as a relative path):

c:\test\AVSmeter64.exe test.avs

It will show performance metering like
Code:
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3982, 3.7, x86_64) (3.7.3.0)

Number of frames:                   107892
Length (hh:mm:ss.ms):         00:59:59.996
Frame width:                           720
Frame height:                          180
Framerate:                          29.970 (30000/1001)
Colorspace:                          RGB32

Frame (current | last):             2416 | 107891
FPS (cur | min | max | avg):        6.130 | 3.900 | 317501 | 19.94
Process memory usage:               346 MiB
Thread count:                       8
CPU usage (current | average):      100.0% | 93.9%
It is better to wait some time for the FPS avg value to become stable.

The only line required is
FPS (cur | min | max | avg): 6.130 | 3.900 | 317501 | 19.94

If the Prefetch() argument is set correctly (not too low for the number of CPU cores present), AVSmeter should display CPU usage close to 100%.

By placing different mvtools2.dll builds in the folder with the script you can also check whether there is any performance difference between the SSE2/AVX/AVX2 builds on your host. If you need the best performance with the most common scripts (QTGMC/SMDegrain) without modification, I think the -e.XX builds will be faster. For the -a.XX builds it is recommended to at least add optSearchOption=1 to all MAnalyse calls to enable some SIMD optimizations.
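
When comparing several builds and Prefetch values, copying the FPS line by hand gets tedious. A minimal Python sketch for pulling the four numbers out of saved AVSMeter text could look like this. It assumes only the line format shown in the sample output above; how the text gets saved (copied from the console or otherwise) is left open, and the name parse_fps is made up for illustration.

```python
import re

# Matches the "FPS (cur | min | max | avg):" line as it appears in the sample output.
FPS_LINE = re.compile(
    r"FPS \(cur \| min \| max \| avg\):\s*"
    r"([\d.]+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)"
)


def parse_fps(text):
    """Return the last FPS line found in text as a dict of floats, or None."""
    matches = FPS_LINE.findall(text)
    if not matches:
        return None
    cur, lo, hi, avg = (float(v) for v in matches[-1])
    return {"cur": cur, "min": lo, "max": hi, "avg": avg}


sample = "FPS (cur | min | max | avg):        6.130 | 3.900 | 317501 | 19.94"
print(parse_fps(sample)["avg"])  # prints 19.94
```

The last match is used because the FPS line is refreshed while the test runs, so captured text may contain more than one copy, and the latest avg value is the most stable one.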

Last edited by DTL; 19th April 2024 at 15:28.
Old 19th April 2024, 15:36   #287  |  Link
TDS
OK, thanks for that, I will attempt to do this in the next few days.

Most of my CPU's are 12 or 16 core Ryzens.

The AVX CPU's are Xeon E5 2697 v2 (12c dual)
Old 21st April 2024, 06:45   #288  |  Link
TDS
So I had the opportunity to do some tests today, and they are interesting, almost like a benchmark test for Prefetch.

One thing that was strange: I did many tests with different mvtools DLLs and Prefetch values, and then, just to double check, I repeated a couple that had good results on the 1st run, but on the 2nd run they were quite different, faster...

Anyway, these were all done on an AVX CPU system (E5-2697 v2).

Is there any reason to test on an AVX2 PC?

Anyway, here are the results:-

https://www.mediafire.com/file/1igo0...sults.txt/file

I did some quick tests on the Ryzen 7950X

https://www.mediafire.com/file/m6q07...7950X.txt/file

Last edited by TDS; 21st April 2024 at 08:03.
Old 21st April 2024, 07:30   #289  |  Link
TDS
Quote:
Originally Posted by DTL View Post
Code:
LoadPlugin("mvtools2.dll")

ColorBars(1920,1080, pixel_type="YV12")

super = MSuper(mt=false, pel=4, chroma=true)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MStoreVect(forward_vec1)

Prefetch(2)
This is probably a bit off topic, but would there be some way to use a similar "test" to benchmark x265 @ 4K, using different Prefetch settings?
Old 21st April 2024, 12:23   #290  |  Link
DTL
x265 is a completely separate process from the AVS+ process, so it cannot be controlled by AVS+ threading settings. x265 uses its own multithreading methods and provides separate controls (limited ?).
Old 21st April 2024, 14:08   #291  |  Link
TDS
So no comments on the mvtools tests ????
Old 21st April 2024, 15:03   #292  |  Link
DTL
I missed that previous post. I will look now, or later when I have time.

Quick answer:

"Is there any reason to test on an AVX2 PC ??"

Yes: the main steps in SIMD design are SSE2 (or the latest, 4.2), and the next is AVX2 for full-fledged integer processing. AVX is only the beginning of 256-bit SIMD and is mostly limited to rarely used float calculations. The next step, to 512 bits, is AVX512.

So I typically make SSE2/AVX2/(AVX512) builds.

Last edited by DTL; 21st April 2024 at 15:13.
Old 23rd April 2024, 15:46   #293  |  Link
DTL
Finally some time to type the long post:

Thank you for the provided test results. One valuable result is that even top end-user desktop CPUs are still significantly slower at ME than 10-year-old MPEG encoder chips. ME also takes close to 100% of CPU time, leaving nothing for MDegrainN and MPEG encoding (x264/x265).

First, the recommended number of Prefetch threads is the number of physical cores, not the total number of logical cores if HyperThreading is enabled. Though it may depend greatly on each CPU architecture, especially for massively multicore chips with few RAM channels and a big enough cache.

I also ran some tests with that script on CPUs and NVIDIA chips.

It looks like performance is heavily limited by RAM performance, so on some CPUs the performance with all cores and with 50% of the cores is very close. This means that for a real transcoding process it may be better to fine-tune the number of threads for the AVS script used, so that more CPU time is left for the MPEG encoder. In complex scripts that use mvtools, this can be tested with a different Prefetch() setting for each filter (or group of filters ?)

Like

_myDegrain_with_mvtools_script(params).Prefetch(K)

other_plugins_calls(params).Prefetch(N)

where K < N.

Then adjust N and K for the best total transcoding performance on a given host.

i5-9600K (6 cores without HT) example:
Prefetch(6) - 120 fps (97% total CPU load)
Prefetch(8) - 118 fps
Prefetch(4) - 120 fps (66% total CPU load)
Prefetch(3) - 112 fps (50% total CPU load)

The on-chip cache size looks like it wins the game for the AMD 7950X (64 MB cache), even in comparison with the Xeon Gold 6134 (8c/16t, 32 MB cache) with its 6 (?) RAM channels.
Xeon Gold 6134 results vs Prefetch number:
Prefetch(8) - 220 fps (49% CPU load)
Prefetch(12) - 248 fps (74% CPU load)
Prefetch(16) - 258 fps (91% CPU load)
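
The point of these two tables is that past some thread count the fps barely moves, so the lowest Prefetch value that still gets close to the best fps leaves the most CPU time for the encoder. A minimal Python sketch of that selection rule, using the numbers above, could be the following. The function name smallest_good_prefetch and the 2% and 5% tolerances are arbitrary assumptions for illustration, not something mvtools or AVSMeter provides.

```python
def smallest_good_prefetch(results, tolerance=0.02):
    """Pick the lowest thread count whose fps is within `tolerance` of the best.

    results maps a Prefetch thread count to the measured average fps.
    """
    if not results:
        raise ValueError("no results")
    best = max(results.values())
    threshold = best * (1.0 - tolerance)
    return min(n for n, fps in results.items() if fps >= threshold)


# Figures from the post above.
i5_9600k = {6: 120, 8: 118, 4: 120, 3: 112}
xeon_6134 = {8: 220, 12: 248, 16: 258}

print(smallest_good_prefetch(i5_9600k))         # prints 4
print(smallest_good_prefetch(xeon_6134, 0.05))  # prints 12
```

On the i5-9600K this picks Prefetch(4), which matched Prefetch(6) at 66% total CPU load. On the Xeon the answer depends on how much fps one is willing to give up: with a 2% tolerance it stays at 16.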

With a fixed AVSTP plugin you can also test the internal MT in mvtools, and it may give more performance benefit because of lower RAM usage and better on-chip cache efficiency. Unfortunately the -a.XX builds have not been tested with AVSTP for a long time, and it looks like they have become very unstable in that mode.

Testing the different build branches (-a.XX and -e.XX) against each other is not completely fair: as I remember, a.XX always has a simple logical optimization enabled that skips already-checked predictors, and this can give a visible performance benefit with static noise-free sources like ColorBars(). Adding noise with AddGrain() would take some CPU time and lower the results. The 2.7.45 build and the -e.03 build do not skip any predictor check and do some redundant work, which causes a more or less visible performance penalty (it may depend significantly on the source).

As I see, the AMD 7950X also has an integrated graphics core. You can also test its performance in hardware ME mode (using the DX12 builds of a.XX and putting Compute.cso in the same folder), if it provides this API:
Code:
LoadPlugin("mvtools2.dll")

ColorBars(1920,1080, pixel_type="YV12")

super = MSuper(mt=false, pel=4, chroma=true, levels=1, pelrefine=false)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, chroma=true, mt=false, optSearchOption=5, levels=1)
MStoreVect(forward_vec1)

Prefetch(..)
Also found during the DX12 tests: setting MT mode to 3 greatly limits DX12 performance, while the default MT mode does not.

Tested Maxwell 2nd gen: it does not provide the DX12-ME API. So it looks like everything usable starts only from the Pascal series.

Turing cards with a single NVENC are only slightly faster than Pascal with 1 NVENC (about 700 vs 600 fpps).

I tried to find any information on possibly applicable AMD cards, but found only a Wiki about the different generations of MPEG hardware encoding ASICs from 199x to 202x on ATI/AMD cards. It looks like only a modern VCE may be usable ?

The next release will have a MAverage on-CPU script function for performing script-based AreaMode simulation. First tests show that with pel=4 precision it runs about 4 times faster (i5-9600K and GTX1060) than on-CPU processing at AreaMode=1 AMflags=1, which searches more (+4 additional search positions per block), and it still uses only 45% of a single NVENC. So with full DX12 onboard frame shifts it is expected to be about 2 times faster, but DX12 onboard frame shifts require much more DX12 programming.

The quality is still not as perfect as expected and needs more debugging. Currently SAD (DM) is not updated in the simple MAverage function (a DM update requires much more programming, including import of the super clip). But if IntOvlp=1 or 3 is used, the SAD must be updated in MDegrainN after MV interpolation anyway, so a precheck of SAD in MAverage may be redundant and cause a performance penalty. I need to check what is wrong with the current sources (some bad blocks are passed to the output, as if SAD were not updated for the resulting MVs, or maybe other bugs still exist).
Old 24th April 2024, 03:19   #294  |  Link
TDS
Quote:
Originally Posted by DTL View Post
Finally some time to type the long post:
Hi DTL,

You are quite good at long posts

So most of that doesn't mean too much to me, but if you got some of that from my tests, then it was worth it.

I might do the same tests on my 13900KF (no iGPU), and see what that shows with the P & E cores.

The 7950X does have an iGPU, but it's so "weak" I don't enable it, or install the appropriate drivers.

Regards
Old 24th April 2024, 12:35   #295  |  Link
DTL
"The 7950X does have an iGPU, but it's so "weak" I don't enable it, or install the appropriate drivers."

In the reviews it is described as RDNA2, able to encode H.264/H.265, and DX12_1 (or even DX12_2) compatible, so you can try enabling it and check its performance, if it provides the DX12-ME API too.
Old 24th April 2024, 12:55   #296  |  Link
TDS
There's probably not much point in testing it, as I need to compensate for the lesser CPU's used in the encoding pool.