View Full Version : Question about std.cache


ChaosKing
16th November 2017, 14:54
http://www.vapoursynth.com/doc/functions/cache.html

My script:

# VapourSynth R40
clip = predenoise(...)
#clip = clip.std.Cache(size = 100, fixed=False, make_linear=False)
clip = haf.Stab(clip, range=30, mirror=15, dxmax=3, dymax=3) # 30 frames

If I add std.Cache, the script is nearly 6x faster. RAM usage with std.Cache is even slightly lower (1.7 GB without vs. 1.6 GB with). I have 16 GB of RAM.

So my question is: shouldn't VapourSynth handle the caching for me? The doc mentions Python modules. Does Python module = Python script?

Does this mean I should use std.Cache for compiled filters for best speed? Or rather, when should I use the cache?



Here's the Stab function:
def Stab(clp, range=1, dxmax=4, dymax=4, mirror=0):
    core = vs.get_core()

    if not isinstance(clp, vs.VideoNode):
        raise TypeError('Stab: This is not a clip')

    temp = AverageFrames(clp, weights=[1] * 15, scenechange=25 / 255)
    inter = core.std.Interleave([core.rgvs.Repair(temp, AverageFrames(clp, weights=[1] * 3, scenechange=25 / 255), 1), clp])
    mdata = core.depan.DePanEstimate(inter, range=range, trust=0, dxmax=dxmax, dymax=dymax)
    last = core.depan.DePan(inter, data=mdata, offset=-1, mirror=mirror)
    return core.std.SelectEvery(last, 2, [0])

Myrsloik
16th November 2017, 15:09
How many frames did you test it on? Your script has quite a large temporal radius, which means the starting cache size is probably slightly too small. The fun part is that even a slightly too small cache can have a very big impact if it causes seeking in the source filter. Let it run for a few thousand frames and it should automatically grow to a suitable size.

Obviously, adding an extra cache with the proper (or, more exactly, oversized) starting size will skip the adaptation step.

And a bonus question: how many threads?
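
To illustrate why a slightly too small cache can hurt so much, here is a minimal sketch in plain Python. It is not VapourSynth's cache implementation. It assumes a plain LRU cache, a single-threaded consumer that walks the clip in order, and a hypothetical temporal filter that requests `radius` frames on each side of the current one. The function name and its parameters are made up for this illustration.

```python
from collections import OrderedDict


def count_source_fetches(num_frames, radius, cache_size):
    """Count how often the (simulated) source must produce a frame.

    Assumption: the consumer is a temporal filter that, for output frame n,
    requests frames n-radius .. n+radius in ascending order, and the cache in
    front of the source is a plain LRU cache holding cache_size frames.
    """
    cache = OrderedDict()  # keys are frame numbers, oldest entry first
    fetches = 0
    for n in range(num_frames):
        first = max(0, n - radius)
        last = min(num_frames - 1, n + radius)
        for f in range(first, last + 1):
            if f in cache:
                cache.move_to_end(f)  # cache hit: mark as most recently used
            else:
                fetches += 1  # cache miss: the source has to decode this frame
                cache[f] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used
    return fetches


# Compare a cache that covers the whole window with slightly smaller ones.
for size in (31, 30, 29):
    print(size, count_source_fetches(200, 15, size))
```

In this simplified model a cache that covers the temporal window fetches every source frame exactly once, while a cache only two frames smaller than the window misses on almost every request. With several threads requesting frames out of order, the size actually needed will be larger than in this sketch.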

ChaosKing
16th November 2017, 15:38
But what if my 1000 frames are slooooow? :D

620 frames:
Time elapsed: 10:06.405 - 1.03 FPS; estimated time to finish: 7:50:34.106
Time elapsed: 01:44.409 - 5.97 FPS; estimated time to finish: 1:21:33.162
(guess what uses the extra cache ;-)

My CPU: 3570K @ 4 GHz with 4 threads.
I would say CPU usage was a bit higher without the extra cache: over 90% without it, mostly below 90% with it.

Myrsloik
16th November 2017, 16:42
Congratulations! You found an interesting edge case!

I'm going to save it in my collection and see if I can improve the cache size adjustment.

ChaosKing
16th November 2017, 16:46
My 2000-frame test just finished. Interestingly, the version without the extra cache got even slower, while the version with the extra cache got a bit faster.

Time elapsed: 38:04.504 - 0.86 FPS
Time elapsed: 04:59.019 - 6.7 FPS

ChaosKing
16th November 2017, 16:59
Now it gets even more interesting. I removed my temporal denoiser, so now it's just FFMS2 + convertto16bit + Stab().

Time elapsed: 0:32.985 - 15.19 FPS no extra cache
Time elapsed: 0:33.673 - 14.89 FPS cache size 40
Time elapsed: 0:48.488 - 10.33 FPS cache size 100-1000

And now the "auto cache" is winning again.

//Edit
If I remove Stab() and keep the denoiser, I get
Time elapsed: 0:11.888 - 42.14 FPS no extra cache
Also, the extra cache doesn't seem to affect the speed here. I also tested large cache sizes of 1000.

Myrsloik
17th November 2017, 13:51
You just confirmed what I suspected: the initial cache size is too small and grows too slowly. I'll investigate whether I can make it less bad. Note the "less bad" part: having the ideal size from the start will obviously always win, but if you have multiple 1000-frame caches you run out of memory insanely fast.
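
For a rough feel of the memory side, here is a back-of-the-envelope sketch in Python. The thread does not state the clip's resolution or format, so 1920x1080 with 16-bit samples and 4:2:0 chroma subsampling is purely an assumption, and the helper names are hypothetical. Stride and alignment padding are ignored.

```python
def frame_bytes(width, height, bytes_per_sample=2, subsampling_w=1, subsampling_h=1):
    """Approximate size of one planar YUV frame, ignoring padding.

    Assumption: three planes, with each chroma plane reduced by the given
    power-of-two subsampling factors (1, 1 means 4:2:0).
    """
    luma = width * height
    chroma = (width >> subsampling_w) * (height >> subsampling_h)
    return (luma + 2 * chroma) * bytes_per_sample


def cache_bytes(num_frames, width, height, **fmt):
    """Memory needed to keep num_frames such frames in one cache."""
    return num_frames * frame_bytes(width, height, **fmt)


per_cache = cache_bytes(1000, 1920, 1080)
print(f"{per_cache / 2**30:.1f} GiB per 1000-frame cache")
```

Under these assumptions a single full 1000-frame cache comes to roughly 5.8 GiB, so a handful of them would not fit into the 16 GB mentioned at the start of the thread.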