Q3CPMA
24th August 2024, 09:14
Hello people,
I tried filtering and then encoding some DVD content last night, but when I came back this morning, the script's performance (initially pretty good) had cratered to a completely unrealistic level: only 5~6 minutes of video were done, I was getting something like one frame per minute, and each vspipe process was consuming around 6~8 GB of RAM.
Here's my initial command line (I launched 4 of these in parallel):
$ vspipe -a INFILE=A1_t00.mkv script.vpy -c y4m - |
ffmpeg -f yuv4mpegpipe -i - -c:v libx265 -profile:v main10 -pix_fmt yuv420p10le -preset:v slow -crf 18 \
-x265-params bframes=8:ref=6:deblock=-1,-1:limit-sao=1:psy-rd=1.5:psy-rdoq=2:rdoq-level=1:aq-mode=3 \
-y out_00.mkv
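For reference, here is a minimal Python sketch of the same pipeline, in case it helps anyone reproduce the four parallel jobs. It only rebuilds the command line above as argument lists and connects vspipe's stdout to ffmpeg's stdin with `subprocess`. The helper names (`build_commands`, `start_pipeline`) are made up for illustration, and nothing here changes the filtering or encoding settings.

```python
import subprocess

# x265 settings copied verbatim from the command line above.
X265_PARAMS = ("bframes=8:ref=6:deblock=-1,-1:limit-sao=1:psy-rd=1.5:"
               "psy-rdoq=2:rdoq-level=1:aq-mode=3")


def build_commands(infile: str, outfile: str, script: str = "script.vpy"):
    """Return (vspipe_args, ffmpeg_args) for one vspipe | ffmpeg pipeline."""
    vspipe = ["vspipe", "-a", f"INFILE={infile}", script, "-c", "y4m", "-"]
    ffmpeg = ["ffmpeg", "-f", "yuv4mpegpipe", "-i", "-",
              "-c:v", "libx265", "-profile:v", "main10",
              "-pix_fmt", "yuv420p10le", "-preset:v", "slow", "-crf", "18",
              "-x265-params", X265_PARAMS, "-y", outfile]
    return vspipe, ffmpeg


def start_pipeline(infile: str, outfile: str):
    """Start vspipe and ffmpeg connected by a pipe; return both Popen objects."""
    vspipe_args, ffmpeg_args = build_commands(infile, outfile)
    producer = subprocess.Popen(vspipe_args, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(ffmpeg_args, stdin=producer.stdout)
    # Close our copy so vspipe gets SIGPIPE if ffmpeg exits early.
    producer.stdout.close()
    return producer, consumer
```

Calling `start_pipeline("A1_t00.mkv", "out_00.mkv")` once per title and then calling `wait()` on each returned process is equivalent to launching the shell pipelines by hand.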
And here's the script:
from vapoursynth import core
import vsdehalo
import havsfunc as haf
import muvsfunc as muf
vid = core.bs.VideoSource(source=INFILE)
# Bob then deblend
vid = haf.QTGMC(vid, TFF=True, Preset="slower")
vid = muf.srestore(vid, frate=24000/1001, dclip=core.std.Crop(vid, 24, 24, 24, 24))
# Square pixel conversion
vid = core.resize.Bicubic(vid, width=720, height=540)
# Crop black bars
cropargs = {"A1_t00.mkv": {"top": 0, "left": 6, "right": 10, "bottom": 2},
"A2_t01.mkv": {"top": 0, "left": 6, "right": 12, "bottom": 2},
"A3_t02.mkv": {"top": 0, "left": 8, "right": 8, "bottom": 2},
"A4_t03.mkv": {"top": 0, "left": 14, "right": 6, "bottom": 2}}
vid = core.std.Crop(vid, **cropargs[INFILE])
# Restore levels
vid = haf.SmoothLevels(vid, input_low=18, input_high=255 - 18)
# Dehalo
vid = vsdehalo.fine_dehalo(vid)
vid.set_output()
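As a sanity check on the resize and crop steps, here is a small sketch of the arithmetic. With SAR 8:9, a 720x480 frame displays at 4:3, so 720x540 is the square-pixel equivalent. The crops then have to be even on every side, because the clip is presumably still 4:2:0 at that point (nothing earlier in the script converts the format). The function names are illustrative only.

```python
from fractions import Fraction

# Source properties as reported by ffmpeg (720x480, SAR 8:9).
SRC_W, SRC_H = 720, 480
SAR = Fraction(8, 9)

# Crop values copied from the script; applied after the 720x540 resize.
CROPARGS = {
    "A1_t00.mkv": {"top": 0, "left": 6, "right": 10, "bottom": 2},
    "A2_t01.mkv": {"top": 0, "left": 6, "right": 12, "bottom": 2},
    "A3_t02.mkv": {"top": 0, "left": 8, "right": 8, "bottom": 2},
    "A4_t03.mkv": {"top": 0, "left": 14, "right": 6, "bottom": 2},
}


def display_aspect(width: int, height: int, sar: Fraction) -> Fraction:
    """Display aspect ratio: DAR = (width / height) * SAR."""
    return Fraction(width, height) * sar


def cropped_size(width: int, height: int, crop: dict,
                 ssw: int = 1, ssh: int = 1) -> tuple:
    """Apply a crop after checking it respects chroma subsampling.

    ssw/ssh are log2 subsampling factors as in VapourSynth formats
    (4:2:0 -> ssw=1, ssh=1): horizontal crops must be multiples of
    2**ssw, vertical crops multiples of 2**ssh.
    """
    for side, step in (("left", 1 << ssw), ("right", 1 << ssw),
                       ("top", 1 << ssh), ("bottom", 1 << ssh)):
        if crop[side] % step:
            raise ValueError(f"{side}={crop[side]} is not a multiple of {step}")
    return (width - crop["left"] - crop["right"],
            height - crop["top"] - crop["bottom"])


# Print the output size for each title.
for name, crop in CROPARGS.items():
    print(name, cropped_size(720, 540, crop))
```

All four titles end up with even dimensions (704, 702, 704 and 700 wide, 538 high), as 4:2:0 output requires.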
This morning, I tried investigating performance by filtering 30000 output frames (~21 min) using:
$ vspipe --filter-time --progress -e 30000 -a INFILE=A1_t00.mkv script.vpy -c y4m out.y4m 2>&1 | tr '\r' '\n' >progress
to get a better view of the problem, and watched the fps drop from 75 to 42 while CPU usage stayed at a very stable 1000% (10 cores) and RAM stayed near 2.5 GB. The two top filters were Lanczos and FrameEval at 3~3.5% each, and the filter times summed to 57% of the total time.
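To turn the `progress` file into something comparable across runs, here is a rough Python sketch. It assumes each line looks like `Frame: 1200/30000 (74.50 fps)`; if your vspipe build prints a different format, the regex needs adjusting. It also shows where the ~21 min figure comes from: 30000 frames at 24000/1001 fps is about 20.85 min.

```python
import re
from fractions import Fraction

# ASSUMED vspipe --progress line format, e.g. "Frame: 1200/30000 (74.50 fps)".
PROGRESS_RE = re.compile(r"Frame:\s*(\d+)/(\d+)\s*\(([\d.]+)\s*fps\)")


def parse_progress(lines):
    """Yield (frame, total, fps) tuples from lines of the 'progress' file."""
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), float(match.group(3))


def fps_trend(samples, buckets=10):
    """Average the reported fps over consecutive slices (about `buckets` of them)."""
    samples = list(samples)
    if not samples:
        return []
    size = max(1, len(samples) // buckets)
    return [sum(s[2] for s in samples[i:i + size]) / len(samples[i:i + size])
            for i in range(0, len(samples), size)]


def clip_minutes(frames: int, fps: Fraction = Fraction(24000, 1001)) -> float:
    """Duration in minutes of `frames` frames at `fps` (default: srestore's 24000/1001)."""
    return float(frames / fps) / 60
```

Something like `fps_trend(parse_progress(open("progress")))` then gives a short list showing where in the run the fps starts to sag.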
Interestingly, passing cachesize=10000 to BestSource (default: 1000) made it 15% faster, at the cost of 13 GB of RAM and slightly more CPU.
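The overnight jobs ended up at 6~8 GB each while the short test stayed near 2.5 GB, so it may be worth logging memory alongside fps. Here is a Linux-only sketch that reads `VmRSS` from `/proc/<pid>/status`. The PID would come from `pgrep vspipe` or similar, and the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS (resident memory, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t 2621440 kB" (the kernel's "kB" is KiB).
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def log_rss(pid: int, interval: float = 60.0, samples: int = 10) -> None:
    """Print the resident memory of `pid` in GiB every `interval` seconds (Linux only)."""
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        gib = parse_vmrss_kib(text) / 2**20
        print(f"{time.strftime('%H:%M:%S')} pid={pid} rss={gib:.2f} GiB")
        time.sleep(interval)
```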
Before I start playing with Python profiling tools, does anyone have an idea what's going on? Could it be related to BestSource's cache and FrameEval being in "unordered" mode (srestore looks like a possible culprit: https://github.com/WolframRhodium/muvsfunc/blob/master/muvsfunc.py#L8856)?
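To make the "unordered" hypothesis concrete, here is a toy model. It is not srestore's actual code. It models a callback that carries state from frame n-1: the callback is cheap when frames arrive in order, but has to rebuild that state when they don't. Here, rebuilding means replaying up to `lookback` earlier frames, which is an arbitrary assumption. On its own, this would explain a constant overhead rather than a gradual slowdown, so it is at best one piece of the puzzle.

```python
class SequentialStateFilter:
    """Toy stand-in for a stateful, order-dependent FrameEval callback.

    HYPOTHETICAL model, not srestore's real logic: frame n costs one unit
    of work if frame n - 1 was the previous request; otherwise the state
    is rebuilt by replaying up to `lookback` earlier frames first.
    """

    def __init__(self, lookback: int = 5):
        self.lookback = lookback
        self.last = None  # index of the previously requested frame
        self.work = 0     # total units of work spent so far

    def get_frame(self, n: int) -> None:
        if self.last is not None and n == self.last + 1:
            self.work += 1                          # state still valid
        else:
            self.work += 1 + min(n, self.lookback)  # rebuild, then evaluate
        self.last = n


def total_work(order, lookback: int = 5) -> int:
    """Work needed to serve frame requests arriving in `order`."""
    filt = SequentialStateFilter(lookback)
    for n in order:
        filt.get_frame(n)
    return filt.work


in_order = range(100)
# Pairwise-swapped requests (1, 0, 3, 2, ...) as a crude stand-in for
# several threads asking for neighbouring frames at the same time.
swapped = [i ^ 1 for i in range(100)]
print(total_work(in_order), total_work(swapped))
```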
PS: System is Gentoo on an AMD 5900X with 64 GB of DDR4, using up-to-date packages from https://github.com/4re/vapoursynth-portage (including latest git for havsfunc and muvsfunc). Input is 30 min of "mpeg2video (Main), yuv420p(tv, top first), 720x480 [SAR 8:9 DAR 4:3], 29.97 fps, 29.97 tbr".