10th January 2022, 08:02 | #1 | Link |
Registered User
Join Date: May 2005
Posts: 1,462
|
Vapoursynth CPU saturation
(moved out of main VS thread)
Is R57 broken somehow? At first I blamed Windows 11's new terminal for the very slow throughput (slow pipe transfer?), but no: even using ffmpeg with -vapoursynth, the process is extremely slow, using only about 25% of the CPU. Both QTGMC and MCTemporalDenoise seem to grind to a near halt. All on my new i9 12900K. This used to go blisteringly fast, even on my old 6700K. Here's what I do (see below). It's almost as if multi-threading is broken for these two functions (it isn't, but it appears to work exceedingly inefficiently). This is 4K material, btw. Code:
import vapoursynth as vs
import havsfunc as haf

core = vs.core
core.max_cache_size = 65535

vid = core.dgdecodenv.DGSource(r'c:\jobs\am.dgi', ct=44, cb=44, cl=0, cr=0)
vid = haf.QTGMC(vid, InputType=1, Preset="Very Slow", TR2=3, EdiQual=2, EZDenoise=0.5, NoisePreset="Slower", TFF=True, Denoiser="KNLMeansCL")
vid = haf.MCTemporalDenoise(vid, settings="very low", stabilize=True)
vid = core.neo_f3kdb.Deband(vid, preset="veryhigh", dither_algo=2)
vid = core.std.AddBorders(clip=vid, left=0, right=0, top=44, bottom=44)
vid.set_output()

[Image: CPU saturation]
N.B. I had the same issue on my previous i7 11700K, btw.
P.S. Does it matter my plugins folder has 148 plugins in it? (all 64-bit recent vapoursynth plugins someone posted here).
__________________
Gorgeous, delicious, deculture! |
10th January 2022, 12:32 | #2 | Link | |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
|
Quote:
Autoloading every dll on the planet is a bad idea. I don't approve of this.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet Last edited by Myrsloik; 10th January 2022 at 12:38. |
|
10th January 2022, 13:44 | #3 | Link | |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
Quote:
It should only slow down the initial load, right? (my nvme is fast!!!111)
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database Last edited by ChaosKing; 10th January 2022 at 13:47. |
|
10th January 2022, 20:21 | #4 | Link | ||
Registered User
Join Date: May 2005
Posts: 1,462
|
Quote:
Didn't even know you could use this timing function. Here are the results (of 'VSPipe --filter-time -c y4m "f:\jobs\test.vpy" NUL'). Still dismal at 1.49 fps. Code:
Output 7215 frames in 4836.29 seconds (1.49 fps)

Filtername        Filter mode Time (%)   Time (s)
DFTTest           parallel      236.04   11415.38
Degrain3          parallel      153.59    7428.12
Analyse           parallel      140.13    6776.88
Analyse           parallel      139.92    6766.79
Analyse           parallel      135.95    6574.87
Analyse           parallel      134.19    6489.61
Analyse           parallel      133.42    6452.38
Analyse           parallel      131.75    6371.89
Degrain1          parallel       76.81    3714.79
Degrain1          parallel       76.38    3694.15
Analyse           parallel       59.43    2874.38
Analyse           parallel       59.07    2856.85
Degrain1          parallel       42.95    2077.16
Super             parallel       42.45    2053.03
Compensate        parallel       37.70    1823.22
Compensate        parallel       37.16    1797.00
KNLMeansCL        parreq         36.76    1777.75
TemporalSoften2   parallel       34.00    1644.15
Super             parallel       31.56    1526.36
Super             parallel       30.51    1475.59
Super             parallel       25.62    1239.13
Compensate        parallel       24.76    1197.31
Compensate        parallel       24.47    1183.26
Compensate        parallel       24.47    1183.21
Compensate        parallel       24.18    1169.34
TemporalSoften2   parallel       22.21    1073.95
resample          parallel       18.60     899.78
TTempSmooth       parallel       18.35     887.22
resample          parallel       17.68     854.87
Deband            parallel       15.32     740.92
Super             parallel       12.12     585.95
Expr              parallel        9.82     474.90
Expr              parallel        8.71     421.06
Expr              parallel        8.30     401.20
Expr              parallel        7.56     365.69
Expr              parallel        7.47     361.42
Point             parallel        7.30     352.99
Expr              parallel        7.16     346.42
Expr              parallel        7.13     344.99
Expr              parallel        7.06     341.45
Expr              parallel        7.01     339.19
Expr              parallel        6.79     328.22
Expr              parallel        6.78     327.71
Merge             parallel        6.73     325.45
Expr              parallel        6.73     325.34
Expr              parallel        6.69     323.71
Merge             parallel        6.36     307.75
MakeDiff          parallel        6.29     303.96
Merge             parallel        6.26     302.53
Merge             parallel        6.23     301.09
MergeDiff         parallel        6.21     300.25
Merge             parallel        6.19     299.49
Convolution       parallel        6.18     298.65
MergeDiff         parallel        6.10     295.05
MakeDiff          parallel        6.02     291.36
Convolution       parallel        5.97     288.95
MaskedMerge       parallel        5.85     282.99
Expr              parallel        5.84     282.41
Inflate           parallel        5.70     275.67
Deflate           parallel        5.61     271.28
Merge             parallel        5.51     266.49
Inflate           parallel        5.36     259.19
Deflate           parallel        5.28     255.32
Expr              parallel        5.25     254.09
DGSource          unordered       5.18     250.73
Expr              parallel        4.98     241.04
Minimum           parallel        4.63     224.13
Maximum           parallel        4.63     223.72
Minimum           parallel        4.60     222.57
Minimum           parallel        4.59     221.93
Maximum           parallel        4.56     220.74
Maximum           parallel        4.56     220.34
Minimum           parallel        4.55     220.16
Minimum           parallel        4.51     217.99
Minimum           parallel        4.51     217.94
Super             parallel        4.49     217.20
Maximum           parallel        4.46     215.60
Minimum           parallel        4.41     213.16
Maximum           parallel        4.37     211.56
Maximum           parallel        4.32     208.95
Maximum           parallel        4.29     207.35
Minimum           parallel        4.27     206.58
Maximum           parallel        4.22     204.18
MakeDiff          parallel        4.20     203.26
MakeDiff          parallel        4.19     202.86
Minimum           parallel        4.16     201.38
Maximum           parallel        4.13     199.86
Crop              parallel        4.13     199.63
MakeDiff          parallel        4.06     196.45
Median            parallel        3.98     192.53
MakeDiff          parallel        3.96     191.64
Convolution       parallel        3.95     191.13
Inflate           parallel        3.95     190.91
Convolution       parallel        3.94     190.69
MakeDiff          parallel        3.93     190.28
Convolution       parallel        3.93     189.94
bitdepth          parallel        3.89     188.22
Expr              parallel        3.86     186.72
bitdepth          parallel        3.85     186.14
Convolution       parallel        3.68     177.99
Convolution       parallel        3.53     170.95
Convolution       parallel        3.50     169.11
Lut               parallel        3.46     167.12
AddBorders        parallel        3.32     160.49
PlaneStats        parallel        2.84     137.48
Interleave        parallel        0.04       1.89
SetFieldBased     parallel        0.01       0.44
SCDetect          parallel        0.00       0.22
SelectEvery       parallel        0.00       0.13
ShufflePlanes     parallel        0.00       0.04
Trim              parallel        0.00       0.01

Quote:
__________________
Gorgeous, delicious, deculture! Last edited by asarian; 10th January 2022 at 22:01. Reason: Updated data |
||
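The --filter-time table in post #4 lists one row per filter instance, so the same filter name (Analyse, Expr, Super, ...) shows up many times. A small sketch like the one below can total the "Time (s)" column per filter name to make the heavy hitters easier to spot. It assumes the row layout shown in that post (name, mode, percent, seconds, separated by whitespace); the `summarize` helper and the `sample` excerpt are illustrative, not part of VSPipe. The percentages in the post look like accumulated filter time divided by the 4836.29 s wall-clock time, which would explain values above 100% on a multi-threaded run; that reading is inferred from the numbers, not from documentation.

```python
import re
from collections import defaultdict

# One table row: filter name, filter mode, time in percent, time in seconds.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+\.\d+)\s+(\d+\.\d+)$")


def summarize(report: str) -> dict[str, float]:
    """Sum the 'Time (s)' column per filter name (hypothetical helper)."""
    totals: dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:  # the header line has no numeric columns, so it is skipped
            name, _mode, _percent, seconds = match.groups()
            totals[name] += float(seconds)
    return dict(totals)


# Short excerpt of the table from post #4.
sample = """\
Filtername Filter mode Time (%) Time (s)
DFTTest parallel 236.04 11415.38
Degrain3 parallel 153.59 7428.12
Analyse parallel 140.13 6776.88
Analyse parallel 139.92 6766.79
"""

totals = summarize(sample)
wall_seconds = 4836.29  # from the "Output 7215 frames in ..." line

# Print the filters sorted by accumulated time, largest first.
for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    percent = 100 * seconds / wall_seconds
    print(f"{name:10s} {seconds:10.2f} s {percent:8.2f} %")
```

Feeding the whole table through `summarize` instead of the excerpt would show how much of the total is spent in the motion-analysis filters versus DFTTest.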
12th January 2022, 12:04 | #5 | Link |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
|
The first thing I'd try is removing all GPU filters like knlmeanscl and see if those are bottlenecking things. Their resource usage doesn't show up in the filter times.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet |
15th January 2022, 16:34 | #6 | Link | |
Registered User
Join Date: May 2005
Posts: 1,462
|
Quote:
It may also simply be a memory issue. I bought G.Skill memory nearly twice as fast as what I had, and that makes the entire process run nearly twice as fast too (currently 42666 MHz).
__________________
Gorgeous, delicious, deculture! |
|
16th January 2022, 13:23 | #7 | Link | |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
|
Quote:
For example, a Threadripper 1950X can be considerably faster than a shiny new 5950X due to the extra memory channels.
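A back-of-the-envelope sketch of that point: theoretical peak DRAM bandwidth scales with the number of channels times the transfer rate times the 8-byte width of a DDR4 channel. The memory speeds below (DDR4-2667 for the quad-channel 1950X, DDR4-3200 for the dual-channel 5950X) are assumed stock values for illustration, and the helper name is made up; real-world throughput will be lower.

```python
def peak_bandwidth_gbs(channels: int, megatransfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * bus_bytes * megatransfers_per_s * 1e6 / 1e9


# ASSUMPTION: stock memory speeds; overclocked kits change these numbers.
threadripper_1950x = peak_bandwidth_gbs(channels=4, megatransfers_per_s=2667)
ryzen_5950x = peak_bandwidth_gbs(channels=2, megatransfers_per_s=3200)

cores = 16  # both CPUs have 16 cores
print(f"1950X: {threadripper_1950x:.1f} GB/s total, {threadripper_1950x / cores:.2f} GB/s per core")
print(f"5950X: {ryzen_5950x:.1f} GB/s total, {ryzen_5950x / cores:.2f} GB/s per core")
```

With the same core count on both chips, the older part has noticeably more bandwidth per core under these assumptions, which matters for filters that stream whole 4K frames through memory.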
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet |
|
17th January 2022, 02:35 | #8 | Link | |
Registered User
Join Date: May 2005
Posts: 1,462
|
Quote:
Just regular 10-bit 4K material. Not super-high. The weird thing, though, is that when I split the 4K job into 4 parts (each 1080p plus a reasonable overscan), I get full saturation on all cores.** You'd think QTGMC and such, repeatedly working on a full 4K frame, just aren't fast enough to produce enough throughput for x265, right? But that would, if anything, make the CPU work in overdrive rather than lazily sitting at 40% or less.
** Sometimes, even with overscan, grainy sources still won't be seamless afterwards.
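For anyone wanting to try the four-way split, here is a minimal sketch of the crop arithmetic only. It assumes a 3840x2160 frame and a 32-pixel overscan, both hypothetical values (the post does not say which were used), and `quadrant_crops` is an illustrative helper, not an existing function. Each tuple is (left, top, width, height) for one overlapping quadrant.

```python
def quadrant_crops(width: int, height: int, overscan: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, crop_width, crop_height) for four overlapping quadrants."""
    half_w, half_h = width // 2, height // 2
    crops = []
    for top in (0, half_h - overscan):        # upper row, then lower row
        for left in (0, half_w - overscan):   # left column, then right column
            crops.append((left, top, half_w + overscan, half_h + overscan))
    return crops


# ASSUMPTION: full UHD frame and a 32-pixel overscan, chosen only for illustration.
for left, top, w, h in quadrant_crops(3840, 2160, overscan=32):
    print(f"left={left:4d} top={top:4d} width={w} height={h}")
```

Keeping the overscan even keeps the crop offsets aligned with 4:2:0 chroma; after filtering, the overscan would be trimmed from the inner edges before the parts are stacked back together.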
__________________
Gorgeous, delicious, deculture! |
|
26th January 2022, 04:38 | #9 | Link |
Registered User
Join Date: May 2005
Posts: 1,462
|
Well, the matter is resolved. Looks like it was the E-cores, after all. I found a very useful option in the BIOS that lets you disable the E-cores by pressing Scroll Lock while in Windows 11 (it doesn't actually disable them, just marks them all as 'parked'). Now I get blisteringly fast, sustained 100% CPU saturation again on all P-cores.
Although the E-cores had appeared to be hardly used at all until now, they were the source of the (significant) hold-up.
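A different route to a similar result, not something tried in this thread, would be to pin the process to the P-cores with a CPU affinity mask, for example via the hexadecimal mask that cmd.exe's `start /affinity` accepts. The sketch below only computes such a mask. It assumes that logical processors 0-15 are the hyper-threaded P-core threads on a 12900K and 16-23 are the E-cores; that ordering is an assumption and should be checked on the actual machine.

```python
def affinity_mask(cpu_indices) -> int:
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for index in cpu_indices:
        mask |= 1 << index
    return mask


# ASSUMPTION: logical CPUs 0-15 are the P-core threads, 16-23 the E-cores.
p_core_threads = range(16)
mask_hex = format(affinity_mask(p_core_threads), "X")
print(mask_hex)  # hexadecimal mask covering the assumed P-core threads
```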
__________________
Gorgeous, delicious, deculture! Last edited by asarian; 26th January 2022 at 05:22. |
Tags |
speed qtgmc |