Old 18th May 2023, 11:13   #2541  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,367
Well, lately I've only been developing, not really encoding or rendering. Even then, I switched to the Redshift render engine, which is GPU-based. And for the next 2 years I'll be building some apps in Dart.

I will wait until the market relaxes a bit, since today's prices are inflated, and then I'll move to the next Xeon, even if it's the bottom tier. I'm not really digging this performance/efficiency core system, with no AVX512 on top of that. I'm also looking forward to AVX-1024, AMX2, TSX, DDR6 and PCIe Gen6, and an RTX5000 with path tracing for the GPU, with some meaningful RAM and CUDA core count.
__________________
[i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread
Old 18th May 2023, 11:20   #2542  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,207
"DDR6"

Now that Xeon Max, with at least 64 GB of integrated HBM, starts shipping in 2023, it looks like DDR-SDRAM is a thing of the past for high-performance computing. I hope that for the poor and dying end-user home PC market, Intel (or maybe AMD) can provide an HBM-based CPU compute platform in the coming years. Some hope still exists.

Also, the overall compute platform architecture is finally being redesigned again: from slow host RAM with a fast CPU and non-x86/64-compatible compute accelerator(s) attached via the PCIe bus, Intel is finally moving to a high-performance, factory-assembled CPU+RAM module with 20x faster RAM. It is compatible with all x86/64 software and can use standard peripherals as well as graphics/DCA PCIe adapters if needed.

It could also be designed for poor-people, low-end LGA socket motherboards with 2ch DDR-SDRAM, if expansion is required.

Last edited by DTL; 18th May 2023 at 11:25.
Old 18th May 2023, 11:29   #2543  |  Link
Guest
Guest
 
Posts: n/a
Quote:
Originally Posted by Dogway View Post
Well, lately I've only been developing, not really encoding or rendering. Even then, I switched to the Redshift render engine, which is GPU-based. And for the next 2 years I'll be building some apps in Dart.

I will wait until the market relaxes a bit, since today's prices are inflated, and then I'll move to the next Xeon, even if it's the bottom tier. I'm not really digging this performance/efficiency core system, with no AVX512 on top of that. I'm also looking forward to AVX-1024, AMX2, TSX, DDR6 and PCIe Gen6, and an RTX5000 with path tracing for the GPU, with some meaningful RAM and CUDA core count.
It's a shame that 13th Gen Intel CPUs dropped AVX512 support, and the P-cores and E-cores complicate the situation a little.

However, the 7950X does have AVX512 support, and ALL cores do the job equally (kind of).

In my experience, the 7950X is just a little better than a 13900 (except maybe the 13900KS) when encoding.

But things just keep getting too expensive.
Old 18th May 2023, 11:29   #2544  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,207
Quote:
Originally Posted by LeXXuz View Post
How can a 6C/12T i7-8700 be anywhere close to a 16C/32T 3950X
The processing may simply be RAM-speed limited, because AVS keeps most of its data in main host RAM.

A simple 6-core Xeon with 6ch DDR4 may run at 1800+ fps, while a 2ch DDR4/5 CPU with any number of cores may be limited to about 1000 fps - https://forum.doom9.org/showthread.p...62#post1987162

Last edited by DTL; 18th May 2023 at 11:34.
Old 18th May 2023, 11:31   #2545  |  Link
Guest
Guest
 
Posts: n/a
Quote:
Originally Posted by DTL View Post
"DDR6"

Now that Xeon Max, with at least 64 GB of integrated HBM, starts shipping in 2023, it looks like DDR-SDRAM is a thing of the past for high-performance computing. I hope that for the poor and dying end-user home PC market, Intel (or maybe AMD) can provide an HBM-based CPU compute platform in the coming years. Some hope still exists.

Also, the overall compute platform architecture is finally being redesigned again: from slow host RAM with a fast CPU and non-x86/64-compatible compute accelerator(s) attached via the PCIe bus, Intel is finally moving to a high-performance, factory-assembled CPU+RAM module with 20x faster RAM. It is compatible with all x86/64 software and can use standard peripherals as well as graphics/DCA PCIe adapters if needed.

It could also be designed for poor-people, low-end LGA socket motherboards with 2ch DDR-SDRAM, if expansion is required.
I don't think saying "poor people" is very polite....
Old 18th May 2023, 11:40   #2546  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,207
"It's a shame that the 13th Gen Intel's dropped AVX512 support"

Intel optimizes its general-public home PC products (a really shrinking and dying market) for the standard performance tests, so that they compare well with AMD products of the same season. Since general-public software gets close to no benefit from AVX512 and fast RAM, the marketing products will continue to fight for the best fps in Microsoft Word. So neither AVX512 nor HBM RAM may reach general-public CPUs any more.

For Intel it is better to add E-cores with simple AVX2. Two AVX2 cores can already saturate a simple 2ch DDR4/DDR5 RAM configuration, so there is no need to add one AVX512 core with about 4x more performance. That way the chips stay comparable in MS Word, 7-Zip and all the other standard marketing tests, to make money each season.

If you look at the yearly home PC tests, Intel and AMD differ by only about ±10..30% in performance. It is simply making money every year, not making high-performance compute platforms for smartphone users.

Real HPC in 2023 is datacenter/enterprise, and that is Xeon Max now.

Quote:
Originally Posted by Dogway View Post
For 8 cores it is about 40% faster, but I don't know if AviSynth can really leverage 8 cores.
AVS can start as many threads as the user requests with Prefetch(), and they will show up in Task Manager as busy CPU cores. But each thread also runs a chain of filters, and each filter is surrounded by a large software cache located in host RAM. So the busy CPU cores typically spin in idle loops, waiting for RAM. This is not a thread-level idle state, so the OS Task Manager does not show the core as idle. It can be seen in the hardware performance counters, in VTune for example.

Last edited by DTL; 18th May 2023 at 11:51.
Old 18th May 2023, 11:52   #2547  |  Link
kedautinh12
Registered User
 
Join Date: Jan 2018
Posts: 2,168
Quote:
Originally Posted by FTLOY View Post
It's a shame that you only have access to a fairly old CPU (might have been a powerhouse in its day, but not so much, now).

I think it would make a significant difference to your settings, if you were able to upgrade.

And if you think a 5950X is a "monster", a 7950X shames it!!!
The old man said: "The 7950X is no different from the 5950X"
Old 18th May 2023, 12:03   #2548  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,803
Quote:
Originally Posted by Dogway View Post
Thanks for the tests!
I read that for some people, maybe with more complex scripts, anything higher than Prefetch(4) or 6 didn't bring more performance. Anyway, I don't use anything other than my signature rig, so I set the 8700 as the cutoff since it was a considerable performance step up compared to previous gens.

The 5950X is quite a monster, so I expected somewhat better numbers than those, especially for such a simple script. I will keep the setting as 'false' for a while, I think.

Will update TransformsPack as soon as I can, today if I have some time. Really excited for the new feature.
The thing with Prefetch is that you have to control the number of frames or the performance will suffer (because the default is threads x 2, I think). Threads can always be set to the maximum number available, IMO. For example, I use Prefetch(threads=32, frames=12) on my 5950X. Lowering the number of prefetched frames also has a positive effect on memory usage, which is often an issue with 4K material, the Avisynth cache being what it is.
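
A minimal sketch of how that call sits in a script. Only the Prefetch(threads=32, frames=12) line is taken from the post; the file name is hypothetical, and LWLibavVideoSource (from L-SMASH Works) is just one possible source filter, assumed to be installed:

```avisynth
# Hypothetical input file; LWLibavVideoSource (L-SMASH Works) is an assumed source filter.
LWLibavVideoSource("input_uhd.mkv")

# ... your usual filter chain goes here ...

# Values from the post: 32 threads, but only 12 prefetched frames.
# Prefetch is typically placed at the end of the script.
Prefetch(threads=32, frames=12)
```

The best frames value probably depends on the script and the encoder, so it is worth trying a few values while the real encode is running.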
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 18th May 2023, 12:06   #2549  |  Link
Guest
Guest
 
Posts: n/a
Quote:
Originally Posted by kedautinh12 View Post
The old man said: "The 7950X is no different from the 5950X"
Who is this "old man" you keep referring to?

A 7950X with the right settings is WAY better than a 5950X!!!
Old 18th May 2023, 12:12   #2550  |  Link
kedautinh12
Registered User
 
Join Date: Jan 2018
Posts: 2,168
Quote:
Originally Posted by FTLOY View Post
Who is this "old man" you keep referring to?

A 7950X with the right settings is WAY better than a 5950X!!!
He is
https://forum.doom9.org/member.php?u=244197
Old 18th May 2023, 15:25   #2551  |  Link
LeXXuz
21 years and counting...
 
LeXXuz's Avatar
 
Join Date: Oct 2002
Location: Germany
Posts: 716
Quote:
Originally Posted by DTL View Post
So the busy CPU cores typically spin in idle loops, waiting for RAM. This is not a thread-level idle state, so the OS Task Manager does not show the core as idle.
Would you mind linking a source for that?

I have not encountered this under Win11 so far. Quite the contrary: other tools report CPU usage where Task Manager does not.
That happens when starting scripts with heavy GPU load and memory usage. HWI reports quite high CPU usage while Win11 TM doesn't.

Quote:
Originally Posted by Boulder View Post
For example, I use Prefetch(threads=32, frames=12) on my 5950X.
Why 12 frames exactly? In any relation to the filters you use?

Well, if you guys think that was way too slow, I'm happy to test with different settings for comparison.

Last edited by LeXXuz; 18th May 2023 at 15:35.
Old 18th May 2023, 16:19   #2552  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,803
Quote:
Originally Posted by LeXXuz View Post
Why 12 frames exactly? In any relation to the filters you use?
I've tested different combinations, and that one produced the best performance over multiple kinds of sources with threads=32. I mostly use the same framework script for every source and just tune the parameters, so the load on the CPU is pretty much the same anyway. It's particularly important to test by running the encoder with your normal settings, since the encoder greatly affects CPU scheduling.
Old 18th May 2023, 19:12   #2553  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,207
"Would you mind linking a source for that?"

You can run any software profiler and look at the timestamps around load-from-memory operations (in the disassembly view). A freeware option is AMD uProf. Loading from memory is not a compute operation; the CPU core simply stalls because it has no data to continue computing. But the thread is still occupying CPU time, so Task Manager shows it as core load (it does not show real computing load, only the % of time a thread is using the core).

A thread waiting for data cannot free the CPU core's resources, because thread switching is a very costly operation. The only exception is 'hyperthreading' mode: when 2 threads share 1 core's resources, a thread stalled waiting for memory can be switched out for the other thread if that one has data ready to compute. So 'hyperthreading' sometimes really does bring some overall performance benefit.

The more advanced Intel VTune will also show you lots of performance counters for cache misses and so on. VTune can also give some hints about how memory-bound the application is (after analysing the memory-performance counters).

A practical indirect way to estimate compute load is to look at CPU power usage: any real computing involves transistor switching, and in CMOS that takes power. Idle, stalled parts in CMOS draw almost no power and produce almost no heat, so the CPU puts parts into an idle, near-zero-power state when there is nothing to compute. So the more power the CPU draws and the more heat it produces, the more real useful computing is being performed. Or at least some useful shuffling of data between the many caches (L1, L2, (L3, L4)). AVX2 and AVX512 compute units running under good load draw lots of power and produce lots of heat (so the CPU clock is typically throttled to prevent overheating).

CPU temperature can be used to estimate how well compute software is optimized for the current chip: with fixed cooling, CPU temperature rises as the software does more compute switching in a given time. That does not mean the computing is useful, but at least the core is not stalled waiting for something.

"I have not encountered this under Win11 so far"

I have not seen Win11 yet. Maybe Microsoft redesigned the performance measurement tools in that version, provided all supported CPU vendors offer the required hardware performance counters. In the old days the OS could only measure thread time on a core (the OS knows the start and end time of a thread on a core and can measure 'physical time' using the common RDTSC instruction).

Last edited by DTL; 18th May 2023 at 19:41.
Old 18th May 2023, 19:52   #2554  |  Link
LeXXuz
21 years and counting...
 
LeXXuz's Avatar
 
Join Date: Oct 2002
Location: Germany
Posts: 716
Quote:
Originally Posted by DTL View Post
A practical indirect way to estimate compute load is to look at CPU power usage: any real computing involves transistor switching, and in CMOS that takes power. Idle, stalled parts in CMOS draw almost no power and produce almost no heat, so the CPU puts parts into an idle, near-zero-power state when there is nothing to compute. So the more power the CPU draws and the more heat it produces, the more real useful computing is being performed. Or at least some useful shuffling of data between the many caches (L1, L2, (L3, L4)).
Well, looking at the temps and reported values for TDC, EDC and PPT, I'd say the CPU pretty much has its hands full and isn't really idling very much.

And between Prefetch 2, 4, 8, 16 and 18...32, scaling is quite linear from 2 to 16, with double the prefetch giving almost double the performance. After that the gradient drops with increasing prefetch, but performance still increases and tops out at ~32. So I see no reason not to use 32 threads on a 16C/32T CPU.
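
A sketch of a script for repeating this kind of scaling test. The variable name n, the file name and the source filter are assumptions made for illustration; the Prefetch values are the ones listed above:

```avisynth
# Change n between runs (2, 4, 8, 16, 32) and compare the resulting fps.
n = 16

# Hypothetical input file; LWLibavVideoSource (L-SMASH Works) is an assumed source filter.
LWLibavVideoSource("sample_uhd.mkv")

# ... identical filter chain for every run ...

Prefetch(n)
```

Keep everything except n identical between runs, and measure with the encoder you normally use, since it competes for the same cores.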

I'll post a speed test between 'untouched', UHDHalf=true and UHDHalf=false with the 7950x soon for another comparison.

Last edited by LeXXuz; 18th May 2023 at 19:55.
Old 18th May 2023, 20:12   #2555  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,367
@LeXXuz: I forgot to ask, are you using DGSource() or a CPU-bound loader? It makes a substantial difference.

I think Xeon Max is enterprise tier. I'm looking forward to Diamond Rapids w3 or w5, whichever delivers 8c/16t at least. And the thing with AVX512 is that not all implementations are created equal; Intel now says they are going to bring it to the mainstream line, but does that include AVX512-VNNI?
Old 18th May 2023, 21:50   #2556  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,207
I hope Xeon Max (already renamed within a few months from 'Sapphire Rapids HBM') is only one of many new family names in the future. Maybe the end-user chip will be named i7-15xxx or something with more digits. Or i9-15xxx, i11-xxxx and so on.

"8c/16t at least."

Hyperthreading is typically good only when it is a close-to-free addition, and only after all other hardware resources are fully populated (the maximum possible number of RAM channels, the maximum possible number of cache ways, and so on). If, at the same price, a chip has fewer cores without HT but offers more DDR RAM channels, it may be faster with well-optimized software. Less well-optimized software may also benefit somewhat from more cache ways (Xeons may have 12..24 ways and low-end chips 8..12 ways at L2/L3), provided the working datasets fit in the L2/L3 caches.

"AVX512-VNNI?"

The only known real benefit of NN may be deinterlacing, to fight the aliasing specific to separated fields. And interlaced content is becoming a thing of the past now. I do not know of any other processing with a real benefit from NN on the CPU. Maybe, if civilization does not die too quickly, we will see 4:2:0 to 4:4:4 decoding of the same high quality as deinterlacing, but it may be visible to die-hard perfectionists only.

The general benefit of AVX512 is a register file 4x larger than AVX2's, 2x wider dispatch ports, and lots of new, faster instructions for simple 8/16-bit integer operations, typically found in the older F/BW/DQ/VL/VBMI subsets. The 202x years were expected to be the season of most AVX512 subsets in all CPU chips on the market, but something did not go as nicely as expected. Also, software is still very poorly optimized even for AVX2, and the number of programmers who understand SIMD seems to be shrinking fast too. So even if we had AVX512 everywhere, there is already close to nobody who knows how to program it. The 512-byte AVX2 register file is already not easy to keep in mind when placing data, and AVX512's is 4x larger. Only really great (typically young) brains can write nice, fast handcrafted software for a 2048-byte data array with complex computing (not a simple 2x expansion of old, poorly designed AVX2 software). Or, with the degrading and dying of real humans, we need some NN robots to design nice AVX512 programs, which may not exist yet.

Last edited by DTL; 18th May 2023 at 22:12.
Old 19th May 2023, 00:23   #2557  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,367
I think we need general NN support at the CPU level as well since, as we know, the future is AI-based, and it frees us up from those time-consuming, low-reward tasks.
My only real concern with future hardware is whether they will go back to the efficiency path, as lately everything consumes and heats way beyond what I consider stable.

By the way, I updated TransformsPack with a new gamut compression function ported from Jed Smith:

Code:
ConvertFormat(cs_in="ACEScg",cs_out="709",EOTFi="",GC=true)

TM_Hable(mode="Dark",filmic=false)  
CCTF("1886",false, tv_out=false)

ConvertBits(8,dither=1)

[Image: ACEScg - Linear]

[Image: 709 - 1886]

[Image: 709 - 1886 (+Gamut Compression)]

Since all these samples are probably scene-referred (radiance-based), they look bland, lacking the filmic punch.
You can try setting 'filmic' to true for either the Dark or the Bright 'mode'. But I prefer the following.
Here's an example using LMT_DCP() (DCP Tone Curve OOTF):
Code:
TM_Hable(mode="Dark",filmic=false)  
LMT_DCP()
CCTF("1886",false,tv_out=false)

[Image: 709 - 1886+DCP OOTF (+Gamut Compression)]

And here using LMT_EMoR() instead, a typical Camera Response Function fit. It's an OOTF+EOTFi so we suppress the 1886 inverse EOTF:
Code:
TM_Hable(mode="Dark",filmic=false)  
LMTi_EMoR(tv_range=false)

[Image: 709 - EMoR OOTF/EOTFi (+Gamut Compression)]

EDIT: Same as above with LMT_EMoR(), but with an updated GamutCompression that uses non-linear integration of negative values.

[Image: 709 - EMoR OOTF/EOTFi (+Gamut Compression Updated)]

Last edited by Dogway; 24th May 2023 at 05:10.
Old 19th May 2023, 01:04   #2558  |  Link
LeXXuz
21 years and counting...
 
LeXXuz's Avatar
 
Join Date: Oct 2002
Location: Germany
Posts: 716
Quote:
Originally Posted by Dogway View Post
@LeXXuz: I forgot to ask, are you using DGSource() or a CPU-bound loader? It makes a substantial difference.
No. I've used LSMASH, on purpose. The 7950X machine has no Nvidia GPU that can decode HEVC, as I have no use for that.
Otherwise it wouldn't be a fair comparison between both systems.

The GT 710 is fine and fast enough for frameserving MPEG-2 and AVC. And the Intel ARC is great for OpenCL filters but doesn't support CUDA of course.
I usually don't do UHD content. For various reasons. This one was just for testing out of curiosity.

Anyhow, here is the same content encoded with a Ryzen 7950x system:



Again, top-left to bottom-right:

tl: x265 medium preset CRF20, UHDHalf=true, Prefetch(16)
tm: x265 medium preset CRF20, UHDHalf=false, Prefetch(16)
tr: x265 medium preset CRF20, no filtering in Avisynth

bl: x265 medium preset CRF20, UHDHalf=true, Prefetch(32)
bm: x265 medium preset CRF20, UHDHalf=false, Prefetch(32)
br: x265 medium preset CRF20 AVX512, no filtering in Avisynth

UHDHalf=false benefits a little from Prefetch(32) over Prefetch(16), while UHDHalf=true is about the same. I already noticed this with more demanding scripts for 1080p too.

Lastly, a comparison between default AVX2 and allowing AVX512: the difference is negligible, as I expected. But mileage may vary with different content.

Anyhow, I'm fine with these numbers.

Last edited by LeXXuz; 19th May 2023 at 01:06.
Old 19th May 2023, 10:11   #2559  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,367
Thanks for the tests. The 7950X is a monster CPU, so I think the numbers are fine. UHDHalf=true gives almost a 63% performance increase, so I think it's fine, given that people would resort to more convoluted scripts and filtering.
I don't know which filters make use of AVX512 in AviSynth, so I can't tell. My interest in the instruction set is more for 3D DCC and games (if I ever find time for that).
Old 19th May 2023, 16:33   #2560  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 716
My UHD footage is very sharp and finely detailed, I guess, and when using the temporalsoften mode with UHDHalf=true it is considerably blurrier than with UHDHalf=false. The difference is not nearly as small as in LeXXuz's comparison examples. Is there a way to regain the detail/sharpness through pel or something, and still keep some of the speed gain from mscalevector?
Tags
avisynth, dogway, filters, hbd, packs
