Help diagnosing a performance issue


Q3CPMA
24th August 2024, 09:14
Hello people,

I tried filtering and encoding some DVD content last night, but when I came back this morning, the script's performance (initially pretty good) had cratered to a completely unrealistic level: only 5~6 minutes of video were done, I was getting something like one frame per minute, and each vspipe process was consuming 6~8 GB of RAM.

Here's my initial command line (I launched 4 of these in parallel):

$ vspipe -a INFILE=A1_t00.mkv script.vpy -c y4m - |
ffmpeg -f yuv4mpegpipe -i - -c:v libx265 -profile:v main10 -pix_fmt yuv420p10le -preset:v slow -crf 18 \
-x265-params bframes=8:ref=6:deblock=-1,-1:limit-sao=1:psy-rd=1.5:psy-rdoq=2:rdoq-level=1:aq-mode=3 \
-y out_00.mkv


And here's the script:

from vapoursynth import core
import vsdehalo
import havsfunc as haf
import muvsfunc as muf

vid = core.bs.VideoSource(source=INFILE)

# Bob then deblend
vid = haf.QTGMC(vid, TFF=True, Preset="slower")
vid = muf.srestore(vid, frate=24000/1001, dclip=core.std.Crop(vid, 24, 24, 24, 24))

# Square pixel conversion
vid = core.resize.Bicubic(vid, width=720, height=540)

# Crop black bars
cropargs = {"A1_t00.mkv": {"top": 0, "left": 6, "right": 10, "bottom": 2},
"A2_t01.mkv": {"top": 0, "left": 6, "right": 12, "bottom": 2},
"A3_t02.mkv": {"top": 0, "left": 8, "right": 8, "bottom": 2},
"A4_t03.mkv": {"top": 0, "left": 14, "right": 6, "bottom": 2}}
vid = core.std.Crop(vid, **cropargs[INFILE])

# Restore levels
vid = haf.SmoothLevels(vid, input_low=18, input_high=255 - 18)

# Dehalo
vid = vsdehalo.fine_dehalo(vid)

vid.set_output()


This morning, I tried investigating performance by filtering 30000 output frames (~21 min) using:

$ vspipe --filter-time --progress -e 30000 -a INFILE=A1_t00.mkv script.vpy -c y4m out.y4m 2>&1 | tr '\r' '\n' >progress

to get a better view of the problem. The fps went from 75 to 42 while CPU usage stayed at a very stable 1000% (10 cores) and RAM stayed near 2.5 GB. The two top filters are Lanczos and FrameEval at 3~3.5% each, and the sum of all filter times reaches 57% of total time.
Interestingly, using a cachesize=10000 parameter for BestSource (default: 1000) made it 15% faster, while using 13 GB of RAM and a bit more CPU.
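
For anyone who wants to see where the fps starts dropping, here is a minimal sketch that reads the "progress" file produced by the command above and averages the reported fps per block of output frames. It assumes each progress line looks like "Frame: 1200/30000 (74.85 fps)"; that shape is an assumption, so adjust the regular expression if your vspipe prints something else. The names parse_progress and fps_by_bucket are made up for this example.

```python
import re
from statistics import mean

# Assumed line shape, e.g. "Frame: 1200/30000 (74.85 fps)".
# This is an assumption about vspipe's --progress output; adjust if needed.
PROGRESS_RE = re.compile(r"Frame:\s*(\d+)/(\d+)\s*\(([\d.]+)\s*fps\)")


def parse_progress(lines):
    """Yield (frame_number, fps) for every line matching the assumed format."""
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            yield int(match.group(1)), float(match.group(3))


def fps_by_bucket(samples, bucket=5000):
    """Average the reported fps over blocks of `bucket` output frames."""
    buckets = {}
    for frame, fps in samples:
        # Key each sample by the first frame number of its block.
        buckets.setdefault(frame // bucket * bucket, []).append(fps)
    return {start: mean(values) for start, values in sorted(buckets.items())}
```

Calling fps_by_bucket(parse_progress(open("progress"))) returns a dict mapping the first frame of each block to the average fps reported within it, which makes a gradual slowdown easier to tell apart from a sudden one.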

Before I start playing with Python profiling tools, does anyone have an idea what is going on? It could have something to do with BestSource's cache and FrameEval being in "unordered" mode (seems like srestore could be it: https://github.com/WolframRhodium/muvsfunc/blob/master/muvsfunc.py#L8856)?


PS: System is Gentoo on an AMD 5900X with 64 GB of DDR4, using up-to-date packages from https://github.com/4re/vapoursynth-portage (including latest git for havsfunc and muvsfunc). Input is 30 min of "mpeg2video (Main), yuv420p(tv, top first), 720x480 [SAR 8:9 DAR 4:3], 29.97 fps, 29.97 tbr".

Myrsloik
24th August 2024, 10:12
What's the format of INFILE?

Q3CPMA
24th August 2024, 10:26
MPEG2 in mkv (generated by makemkv).

PS: in the end, doing it without piping into x265 took 18 min (at ~40 fps) for the complete file with cachesize=30000. ffmpeg might do something when using pipe input that slows it down to a crawl (and thus fills the pipe).
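
One way to check the pipe hypothesis would be to put a small pass-through between vspipe and ffmpeg that reports how many bytes per second actually flow through. The sketch below is only an illustration: copy_with_rate is a made-up helper, not part of vspipe or ffmpeg, and the chunk size and reporting interval are arbitrary choices.

```python
import sys
import time


def copy_with_rate(src, dst, report, chunk_size=1 << 20, interval=5.0,
                   clock=time.monotonic):
    """Copy src to dst, calling report(total_bytes, bytes_per_second)
    roughly every `interval` seconds. Returns the total bytes copied."""
    total = 0
    window_bytes = 0
    window_start = clock()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)  # blocks while the consumer is not reading
        total += len(chunk)
        window_bytes += len(chunk)
        now = clock()
        if now - window_start >= interval:
            report(total, window_bytes / (now - window_start))
            window_bytes = 0
            window_start = now
    dst.flush()
    return total


# Hypothetical use as a filter between the two commands:
#   copy_with_rate(sys.stdin.buffer, sys.stdout.buffer,
#                  lambda t, r: print(f"{t} bytes, {r / 1e6:.1f} MB/s",
#                                     file=sys.stderr))
```

If the reported rate collapses over time while vspipe writing to a file keeps its speed, that would point at the consumer side of the pipe; if both collapse, the script itself is the more likely culprit.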

Myrsloik
24th August 2024, 10:31
Then I have no idea what's causing it. Most types of bugs on the VS side would slow it down in both cases. You could try with ffms2 just for fun and see if it happens with it as well I guess.

It's time to randomly do things until a pattern emerges!

Q3CPMA
24th August 2024, 10:34
I did try ffms2 too. It was a bit faster than BS with the default cachesize, but not by as much once cachesize was increased. Thanks for reading anyway! And for dedicating your time to VS, it's a really cool tool.

edcrfv94
26th August 2024, 08:32
You can try VapourSynth R45.