In Place Processing - What For? [Archive]

martin53

7th December 2013, 22:14

When I saw that a SimpleSample version for AviSynth 2.6 on the Wiki (http://avisynth.nl/index.php/Filter_SDK/SimpleSample) uses 'in place' processing, I was keen to test it because I expected a speed advantage.
After I merged that example code with the 1.7 version of SimpleSample, I added a bool parameter to switch between the frame-copying and in-place modes (posted here (http://forum.doom9.org/showthread.php?p=1656962#post1656962)).

Finally, I also ran a few benchmarks with VirtualDub's video analysis pass (the script is also in the post linked above).
RGB24 with frame copying yields ~1940fps, which is the fastest configuration I found.
RGB24 with in place processing of a 100 pixels wide square in the 640x480 colorbars frame yields ~1720fps.

Maybe I didn't set up the benchmark correctly. In any case, I'm curious whether anyone has an explanation, and whether there is another reason besides speed to use it.

I also tested
c=BlankClip()
c.SimpleSample(inplace=true)
StackHorizontal(c,last)
to check whether c is also modified. It is not, which is not what I expected.

Groucho2004

7th December 2013, 22:47

Finally, I did also a few benchmarks with VirtualDub's video analysis pass (script is also in above post).
RGB24 with frame copying yields ~1940fps, which is the fastest configuration I found.
RGB24 with in place processing of a 100 pixels wide square in the 640x480 colorbars frame yields ~1720fps.
Using VDub for benchmarking is a bad idea. It uses the ancient VfW interface, which has a huge overhead.
This is no problem with real-life scripts that run at 50-something FPS, but your test at 1000 to 2000 FPS is possibly already inaccurate.
You should use a program that uses the AviSynth API directly, like this one (http://forum.doom9.org/showthread.php?t=165528).

Wilbert

7th December 2013, 23:35

I was keen to test it because I expected a speed advantage.
Although it should be faster, you are still copying the frame (dst=src, somewhere in your code).

TurboPascal7

7th December 2013, 23:41

When you call env->MakeWritable, AviSynth checks whether anyone else holds a reference to this frame. If such references exist (which is pretty much always the case?), a full copy of the frame is made and returned to you.
In-place processing will never alter the past. So your "c" clip will not be modified.

This implementation might be useful if you have a simple filter that works on one input clip and actually uses its values, for example invert or mt_lut. With an in-place implementation you only have to keep one frame around (since the destination is the same as the source), which is better for CPU caching.

In the ideal situation, when you're the only entity holding a reference to this video frame, it is returned to you without a copy being made, giving you the mentioned benefit at zero cost. I don't think I've seen this happening; a full copy is made pretty much always.

So when should you use it? Just try it and see whether it helps in real scripts, but don't expect any significant performance benefit.
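
To make the copy-on-write behaviour described above concrete, here is a minimal sketch. It is a toy model, not AviSynth code: `Buffer`, `FramePtr`, `make_writable` and `invert_in_place` are hypothetical names, and `std::shared_ptr`'s use count merely stands in for whatever reference tracking AviSynth really does.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for a frame buffer. This is NOT the AviSynth VideoFrame class.
using Buffer = std::vector<std::uint8_t>;
using FramePtr = std::shared_ptr<Buffer>;

// Hypothetical analogue of env->MakeWritable(): if anyone else holds a
// reference to the buffer, swap our handle for a private copy.
// Returns true when a copy had to be made.
bool make_writable(FramePtr& frame) {
    assert(frame != nullptr);
    if (frame.use_count() == 1) {
        return false;  // sole owner: safe to write in place, no copy
    }
    frame = std::make_shared<Buffer>(*frame);  // shared: full copy
    return true;
}

// A trivial "invert" filter that always writes through its own handle.
// Other holders of the original buffer never see the change.
bool invert_in_place(FramePtr& frame) {
    const bool copied = make_writable(frame);
    for (auto& px : *frame) {
        px = static_cast<std::uint8_t>(255 - px);
    }
    return copied;
}
```

If a second `FramePtr` (playing the role of the cache, or of the clip `c` in the test script) still points at the buffer, `invert_in_place` returns true and the other holder keeps the unmodified pixels. With a single owner it returns false and the same buffer is reused.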

Gavino

8th December 2013, 00:47

In ideal situation, when you're the only entity that has a reference to this video frame, it will be returned to you without making a copy, giving you the mentioned benefit at zero cost. I don't think I've seen this happening, a full copy is made pretty much always.
A copy always has to be made if the input is BlankClip(), since BlankClip() always returns the same frame (which obviously it keeps a reference to). But even for other inputs, it is usually the case that the cache holds a reference to the frame you have just received.

ultim

8th December 2013, 01:54

Yep, the cache will most often hold a reference to the frame, so even if you do in-place processing, you'll end up with a copy. There are some rare cases, though, where there is no cache (or it is turned off) and you might benefit from an in-place filter. So if you can, do the processing in place for maximum performance: in some rare cases it will avoid a copy, though usually it won't. Copies are also cheap compared to many filters, so even if you can occasionally avoid a copy, the speed difference might be less than you imagined.
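
For a rough sense of scale on what one copy costs, the arithmetic can be sketched as below. The helper names are hypothetical, and the calculation assumes a tightly packed frame with no row padding, which a real frame's pitch may not satisfy.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one packed frame, ignoring any row padding (pitch) that a real
// frame buffer may carry. Assumption: width * bytes_per_pixel is the row size.
std::size_t frame_bytes(std::size_t width, std::size_t height,
                        std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Bytes per second that have to be written if every frame is copied once.
double copy_bandwidth(std::size_t bytes_per_frame, double fps) {
    assert(fps >= 0.0);
    return static_cast<double>(bytes_per_frame) * fps;
}
```

For the 640x480 RGB24 case from the first post, `frame_bytes(640, 480, 3)` is 921,600 bytes. Copying that once per frame at ~1940 fps works out to roughly 1.8 GB/s, while at a more typical 50 fps it is about 46 MB/s.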

martin53

12th December 2013, 07:55

I found out at least one reason: fewer pointers are needed to address the target pixels if they are the same as the source pixels. That can dramatically boost speed.
I was able to tune a color adjustment plugin (RGB contrast+brightness for planar YVxx) from ~44fps @640x480 = 220 clocks/pixel to ~440fps = 22 clocks/pixel that way - together with other optimizations.
(Measured with AVSMeter)
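
The "fewer pointers" point can be illustrated with a sketch like the following. This is not the plugin mentioned above; `adjust`, `adjust_copy` and `adjust_in_place` are hypothetical names, and the contrast/brightness formula is only an assumed example. It shows the structural difference between the two loops, not a measured speedup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed per-sample adjustment: scale by contrast, add brightness,
// clamp to the 8-bit range.
inline std::uint8_t adjust(std::uint8_t v, float contrast, int brightness) {
    const int out = static_cast<int>(v * contrast) + brightness;
    return static_cast<std::uint8_t>(std::min(255, std::max(0, out)));
}

// Separate source and destination: two pointers and two pitches to track.
void adjust_copy(const std::uint8_t* src, int src_pitch,
                 std::uint8_t* dst, int dst_pitch,
                 int width, int height, float contrast, int brightness) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            dst[x] = adjust(src[x], contrast, brightness);
        }
        src += src_pitch;
        dst += dst_pitch;
    }
}

// In place: a single pointer and pitch address both input and output.
void adjust_in_place(std::uint8_t* p, int pitch,
                     int width, int height, float contrast, int brightness) {
    assert(pitch >= width);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            p[x] = adjust(p[x], contrast, brightness);
        }
        p += pitch;
    }
}
```

Both functions produce the same pixels. The in-place variant is only valid because each output sample depends on nothing but the input sample at the same position.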

wonkey_monkey

12th December 2013, 12:25

Is in-place processing new to 2.6?

ARDA

12th December 2013, 15:10

Déjà vu

I don't think I've seen this happening, a full copy is made pretty much always.

Just try and see if it helps in real scripts but don't expect any kind of significant performance benefit

Yep, cache will most often hold a reference to the frame
Copies are also cheap compared to many filters, so even if occasionally you can avoid a copy,
the speed difference might be less than what you have imagined.

In-place updating, where it can be achieved, can indeed be very fast. However, one must be mindful of the overhead
if the source frame is not in fact writable. In that case a call to MakeWritable will copy the frame to a new VFB,
so you must add the time of a full frame blit to the improved time of doing an in-place update.

In simple linear scripts, in-place algorithms will mostly win hands down.

In complicated scripts with multiple cross dependencies and temporal demands, the added overhead of MakeWritable
will surely negate any advantage.

So, how and when can we apply an algorithm in place?

// pseudocode
if (src->IsWritable() && framesize < biggest_available_cache) {
    // Yes: read the source, apply the algorithm and write in place (into the source frame).
    return src;
}
else
{
    // No: create a new frame (dst), read the source, apply the algorithm and write into the dst frame.
    // If framesize < 40% of the biggest available cache, use temporal stores to the new frame;
    // otherwise use non-temporal stores to the new frame.
    return dst;
}
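
The decision in the pseudocode above could be sketched as a small helper like this. `WriteStrategy` and `choose_strategy` are hypothetical names, the cache size is assumed to be supplied by the caller, and the 40% threshold is simply the rough figure from the pseudocode, not a tuned value. The actual temporal or non-temporal store loops are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the three branches of the pseudocode.
enum class WriteStrategy { InPlace, NewFrameTemporal, NewFrameNonTemporal };

// Picks a strategy from whether the source frame is writable and how the
// frame size compares with the largest available cache.
WriteStrategy choose_strategy(bool src_writable, std::size_t frame_bytes,
                              std::size_t largest_cache_bytes) {
    assert(largest_cache_bytes > 0);
    if (src_writable && frame_bytes < largest_cache_bytes) {
        return WriteStrategy::InPlace;
    }
    // frame_bytes < 40% of the cache, compared in integers to avoid rounding.
    if (frame_bytes * 10 < largest_cache_bytes * 4) {
        return WriteStrategy::NewFrameTemporal;
    }
    return WriteStrategy::NewFrameNonTemporal;
}
```

For example, with an assumed 8 MiB cache, a writable 1 MiB frame selects `InPlace`, a non-writable 1 MiB frame selects `NewFrameTemporal`, and a non-writable 4 MiB frame selects `NewFrameNonTemporal`.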

Caution: the thresholds for bypassing the cache are rough estimates, just an example; they depend on the computational cost
between load and store and on the size of the largest available cache. Of course these thresholds are always a trade-off:
the further the frame size is from the threshold, the clearer the performance gain will be.
Last but not least, not all algorithms can be applied in place; it depends on the kind of math that is used.
To measure speed I would rather use something like this:

MPEG2Source("your source")
# or any source you like; always use the same one, and always test the same frames, to get slightly more accurate benchmarks.
# 9000 frames is a good quantity for this script.

#TemporalSoften(4,8,8,15,2) # uncomment this line to force a non-writable src and test whether or not
# your filter creates a new video frame. It is just an example.

AvsTimer(frames=1000, name="ANYONE",type=3, frequency=2000, total=false, quiet=true)# use your cpu frequency

# Put your filter here: either one that works in place,
# or one that creates a new video frame, or create a branch in the previous one.

AvsTimer(frames=1500 ,name="ANYONE",type=3, frequency=2000, difference=1, total=false)# use your cpu frequency

Open the script in VirtualDub, set direct stream copy, and set a start frame and an end frame.
Open DebugView (search for it on Google) and set a highlight filter in DebugView, in this example *ANYONE*.
Go back to VirtualDub and run the video analysis pass. You will see the results in the DebugView window every 1500 frames.

As always, I hope this is useful.
Arda

StainlessS

13th December 2013, 20:46

Is in-place processing new to 2.6?

I believe all that is meant by that is,

(child->GetFrame, env->MakeWritable) combo rather than env->NewVideoFrame.