View Full Version : How can I speed up AVIsynth processing?


jrodefeld
17th March 2015, 06:58
Hello,

I've been using Avisynth scripts for video projects for a while now. It has been an invaluable tool but I'm becoming increasingly frustrated with the time it takes to process video.

I've got an Intel Core i7 4930K overclocked to 4.4 GHz. That is a six-core CPU. I've got 16 GB of RAM and an Nvidia GeForce 780 graphics card.

This is clearly a pretty fast system. Frankly, I'm sort of getting sick of having to wait 26 hours to properly deinterlace and filter a two hour video clip. Transcoding and encoding in Premiere takes fractions of that time.

I'd really love to get Avisynth to properly take advantage of ALL cores/threads on my CPU so I could speed up the process. However, I haven't been able to get the MT version of Avisynth to work at all. I get crashes every time I try to load a script.

Is there any way to get a properly working multithreaded avisynth to work fast and reliably?

Is there a specific version of avisynth I should be using? Or maybe some other tools or script examples could come in handy?

I'd appreciate any tips you could give me.

Thanks.

colours
17th March 2015, 08:29
Don't use MT (http://mod16.org/hurfdurf/?p=234), at least until it's more mature in Avisynth+. (Or you can already try multithreading in Avisynth+ now and see if it helps with the crashes compared to SEt's builds.)

Other than inbuilt multithreading, what you can do to more efficiently utilise your MOAR COARS rig is to split up the script into, say, 20000-frame segments, run all of them simultaneously, then concatenate the results.
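
As a rough illustration of the segment idea, here is a minimal Python sketch that lists the inclusive frame ranges for fixed-length segments. The function name and the 172800-frame clip length (two hours at 24 fps) are assumptions made up for the example; only the 20000-frame segment size comes from the post.

```python
def fixed_size_segments(frame_count, segment_length=20000):
    """Return inclusive (first, last) frame ranges covering 0..frame_count-1."""
    if frame_count <= 0 or segment_length <= 0:
        raise ValueError("frame_count and segment_length must be positive")
    return [
        (start, min(start + segment_length, frame_count) - 1)
        for start in range(0, frame_count, segment_length)
    ]


# Hypothetical clip: 2 hours at 24 fps = 172800 frames.
segments = fixed_size_segments(172800)
for first, last in segments:
    # Each range would become one Trim(first, last) in its own script instance.
    print(f"Trim({first}, {last})")
```

Each range can then be rendered by a separate process and the results joined in order.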

Alternatively, and this might be somewhat obvious, just don't use slow filters if you don't want filtering to be slow.

GMJCZP
17th March 2015, 16:14
It would help if you posted the script you are working with, so we can see which filters might be causing problems with SEt's Avisynth MT.

You could also use the Preroll() command. As a baseline test, try Preroll() with the frame count set to the video's frame rate.

goorawin
19th March 2015, 04:10
It took me a long time to get MT to work and I almost gave up. I have found several things:
You need Avisynth 2.6 installed and working before installing the MT version of avisynth.dll.
Certain plugins do not work in MT mode, so you have to write the script accordingly.
Setting the memory size is very important. It can be too big or too small, and if it's not right it will crash on the render. You just have to do a lot of testing with your scripts on your system.
My computer is also a Core i7 with 16 GB of RAM. I have also installed it on 2 other systems and they seem to work.
The following script is a very complex one which includes image stabilisation, noise reduction, detail enhancement, chroma shift, auto levels, SmoothLevels, and adding grain, all done in one pass.
MTClipEnhance is where all this image processing is done. It also has various SetMTMode levels within the script, right down to SetMTMode(1).

It took me almost a week to get this all to work under MT; it just kept crashing. It looked good in AvsPmod, but trying to render it was another thing altogether.
MT has halved the render time.
Mind you, this script still only uses 45 to 60% of the CPU resources, but that beats 18 to 25% without MT.



SetMemoryMax(512)
Setmtmode(5,4)
LoadVirtualDubPlugin ("C:\Virtual Dub\plugins32\Deshaker.vdf", "deshaker", preroll=0)
Clip="G:\Video\Raw\00050.MTS"
V1=avss_26_DSS2(Clip, fps = 50)
A1=FFAudioSource(Clip,track = -1)
AudioDub(V1, A1).AdvancedMultiTrim("0,-25")
Setmtmode(2)
SCRIPT_1=ConvertToRGB32(matrix="Rec709").Deshaker("19|1|30|4|1|0|1|0|640|480|1|2|1000|1000|1000|1000|4|1|3|2|8|30|300|4|C:\\Users\\Peter\\AppData\\Local\\Deshaker\\Deshaker.log|0|0|0|0|0|0|0|0|0|0|0|0|0|1|10|10|7|10|0|0|30|30|0|0|1|0|1|1|0|10|1000|1|90|1|1|20|5000|100|20|1|0|ff00ff")
ForceProcess(SCRIPT_1)
SCRIPT_1 = 0
Setmtmode(2)
SCRIPT_2=ConvertToRGB32(matrix="Rec709").Deshaker("19|2|30|4|1|0|1|0|640|480|1|2|1000|1000|1000|1000|4|1|3|2|8|30|300|4|C:\\Users\\Peter\\AppData\\Local\\Deshaker\\Deshaker.log|0|0|0|0|0|0|0|0|0|0|0|0|0|1|10|10|7|10|0|0|30|30|0|0|1|0|1|1|0|10|1000|1|90|1|1|20|5000|100|20|1|0|ff00ff")
\.ConvertToYV12()
\.MTClipEnhance()

return SCRIPT_2

Function ForceProcess(clip c) {
# Force process c clip
RT_DebugF("Starting Force process of clip") # EDIT: Output to DebugView (google)
GScript("""
for(i=0,c.FrameCount-1) {
RT_YankChain(c,i)
}
""")
}

Don't give up yet!!!!!!!!

johnmeyer
19th March 2015, 05:42
I have almost 100 AVISynth scripts and I use SetMTMode in most of them. As already mentioned, there are some filters that don't work with SetMTMode. Also, you can sometimes have stability problems, and there are also some really nasty race conditions where frames get out of order. However, usually everything works just fine.

The primary requirement for getting it to work is the version of AVISynth you are using. If you use the version() statement, what exactly does AVISynth report?

I agree with goorawin that it is definitely worth spending more time trying to get it to work. I generally get a 3x - 6x improvement in speed, which obviously changes everything when it comes to processing hours of video. Most of my (fairly simple) scripts run faster than real time, but only when using SetMTMode(). My adaptation of VideoFred's somewhat complex film restoration script achieved a 4x improvement in speed, even though several of the plugins do not benefit from multi-threading.

MysteryX
20th March 2015, 16:57
While working with AviSynth, I didn't have any issues or configuration complications using the MT version; perhaps I was lucky. With a quad-core i7 processor, in my case, it still runs at 100% CPU with or without MT, but MT is still a bit faster.

I found AviSynth+ to be faster than AviSynth 2.6, but AviSynth 2.6 MT to be faster than AviSynth+. I did some tests with an MT build of AviSynth+ and it was buggy as hell, so I'd stay away from that build.

A new build of AviSynth 2.6 MT was very recently released so I'd stick to that build.

Again, for people here to help you with specific details, you'd have to post your script. By altering the script and changing the sequence of operations, you can probably gain a massive performance boost with minor changes to quality.

creaothceann
20th March 2015, 18:33
Why are the MT versions so buggy anyway? Are the filters just not designed to be used that way?

johnmeyer
20th March 2015, 18:49
Why are the MT versions so buggy anyway? Are the filters just not designed to be used that way?
Yes. That is exactly the reason.

StainlessS
20th March 2015, 22:04
I don't use MT. I go to sleep, when it's done, it's done, not a prob.

Groucho2004
20th March 2015, 22:56
I go to sleep, when it's done, it's done
Wise words.

I just let another computer do the encoding, usually more than one encode at a time so there won't be any lazy CPU cycles loitering around.

johnmeyer
20th March 2015, 23:15
But why wait when you don't have to? I can get four times as much video done by using it, and with many of my larger projects, overnight isn't long enough.

Therefore, I think it is worth continuing to try to get it to work.

I use AVISynth 2.60, build: "Mar 9 2013 [13:28:27] (c) 2000-2011 Ben Rudiak-Gould, et al." I usually get a 3-4 times speed improvement when I enable it. I always test without it first so I can make sure no problems are being introduced, and then enable it and do the encode.

GMJCZP
21st March 2015, 02:08
If this thread is about how to increase speed, it makes no sense not to exhaust every possibility for using MT.

goorawin, you could try the following (I don't know which Core i7 model you have):

LoadPlugins ...
SetMemoryMax (512) # Some recommend that this value is 1/4 of the total RAM

SetMTMode (5,4) # For jrodefeld is (5,6),
SourceFilters ...
Preroll (int(fps video)) # As I indicated above. This may help stabilize the process.
SetMTMode (2,4) # For jrodefeld is (2,6)
.
.
.

goorawin
22nd March 2015, 04:38
I am working with a quad-core i7 running at about 3.7 GHz.
Thank you GMJCZP for your input. I had tried preroll before on this script without much luck. I have just tried it again from what you sent and guess what, it crashed out at the end of the render. At least the one I have works well, and has rendered many hundreds of clips without crashing. So I am not about to change it.

Stereodude
22nd March 2015, 12:28
I played with MT Avisynth a few years ago and found it to be incredibly unstable. I gave up on it. I found MP_Pipeline (http://forum.doom9.org/showthread.php?t=163281) to be a much better way to speed up a script (presuming you have several high CPU usage steps in your script). I also moved to using intermediate lossless AVI files so I could process the source in several chunks simultaneously.

StainlessS
22nd March 2015, 13:51
Here is a script to split an AVS script into parts for rendering to lossless; set PARTS=x as appropriate.


# MakeMultiPartScripts.avs
#######
PARTS=4 # Edit as appropriate
#######
myName="MakeMultiPartScripts: "
Assert(PARTS>1,RT_String("MakeMultiPartScripts.avs: PARTS MUST be greater than 1 (%d)",PARTS))
IsAvsPlus=(FindStr(VersionString, "AviSynth+")>0 || FindStr(VersionString, " Neo")>0)
HasGScript=RT_FunctionExist("GScript")
Assert(IsAvsPlus || HasGScript,RT_String("%sNeed either GScript or AVS+",myName))
#######
FSEL_TITLE="Select AVS files"
FSEL_DIR="."
FSEL_FILT="AVS files|*.AVS"
FSEL_MULTI=True
FILE_LIST = RT_FSelOpen(title=FSEL_TITLE,dir=FSEL_DIR,filt=FSEL_FILT,multi=FSEL_MULTI)
Assert(!(FILE_LIST.IsInt && FILE_LIST==0),RT_String("%sUser Cancelled in FileSelector - No action taken",myName))
Assert(FILE_LIST.IsString,RT_String("%sRT_FSelOpen Error=%s",myName,String(FILE_LIST)))

SCRIPT_TMPT = """
PART = %d
PARTS = %d
Import("%s")
FC = FrameCount
Start = ((part==0 ) ? -1 : part * (FC-1) / PARTS) + 1
End = ((part==PARTS-1) ? FC-1 : (part+1) * (FC-1) / PARTS)
End = (End == Start) ? -1 : End
Trim(Start,End)
# Subtitle("Trim("+String(Start)+","+String(End)+")")
Return Last
"""

GS="""
NLINES = RT_TxtQueryLines(FILE_LIST)
for(i=0,NLINES-1) {
PARTSNAME = RT_TxtGetLine(FILE_LIST,i)
for(part=0,PARTS-1) {
PartName = RT_String("%s_PART_%d.avs",PARTSNAME,part+1)
ScriptInst=RT_String(SCRIPT_TMPT,part,PARTS,PARTSNAME)
RT_WriteFile(PartName,"%s",ScriptInst)
}
}
"""
HasGScript ? GScript(GS) : Eval(GS) # Use GSCript if installed (loaded plugs override builtin)

MessageClip(RT_String("ALL DONE, Processed %d avs files of %d parts each",NLINES,PARTS))

EDITED:

Puts up a file selector, choose script to split, and et voila done.
Will probably play up in AVSPMod (somehow captures and changes fileselector), can just right click and open with eg Vdub or
almost any player (not VLC).

Despite some others suggesting that an overlap is necessary, I don't think it is [EDIT: it is NOT necessary]. If you find otherwise, then post somewhere.

Requires RT_Stats(). EDIT: Oops, and GScript().
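
To see what ranges the Start/End arithmetic in the generated part scripts produces, here is a small Python mirror of the same integer formulas. The function name is made up for illustration. The sketch leaves out the template's final `End == Start` substitution of -1, which presumably exists because Trim treats a last frame of 0 as "to the end of the clip".

```python
def part_range(part, parts, frame_count):
    """Mirror the Start/End integer arithmetic used in the part-script template."""
    last = frame_count - 1
    # Start = ((part==0) ? -1 : part * (FC-1) / PARTS) + 1
    start = (-1 if part == 0 else part * last // parts) + 1
    # End = ((part==PARTS-1) ? FC-1 : (part+1) * (FC-1) / PARTS)
    end = last if part == parts - 1 else (part + 1) * last // parts
    return start, end


ranges = [part_range(p, 4, 100) for p in range(4)]
print(ranges)
```

For a 100-frame clip split four ways, the ranges are contiguous and do not overlap, so splicing the rendered parts in order restores every frame exactly once.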

Stereodude
22nd March 2015, 18:15
Despite some others suggesting it be necessary to have an overlap, dont think is necessary, if you find different, then post somewhere.
It all depends on the filters you're using. If they're temporal in nature, you will want enough frames of overlap to cover the temporal range. That way the first used frame (in the final output) of the 2nd and later clips has the benefit of the temporal information from the frames before it, as opposed to having none.

I could see some visible differences at the splice points when using MCTD without overlap vs. continuous (or non split contiguous).

StainlessS
22nd March 2015, 19:09
I could see some visible differences at the splice points when using MCTD without overlap vs. continuous (or non split contiguous).

Don't see how that is possible if the trim is done AFTER temporal processing (later in the script). If chopping out and returning a single frame, the same input
frames will be used to construct that frame unless there is a problem in either Avisynth or the plugins in use. Are you sure you got the splice points correct,
or did you trim before the temporal stuff?

EDIT: I guess if using MT, anything could happen.

EDIT: In the multipart script above, the trim is done after any temporal processing, in fact not even in the same script. The temporal processing part in the original script has full access to every single frame in the source clip; it is only the synthesized script that has no access after the trim
filter.

Stereodude
22nd March 2015, 21:11
Dont see how that is possible if trim is done AFTER temporal processing (later in script), if chopping out and returning a single frame, the same input
frames will be used to construct that frame unless there is a problem in either Avisynth or the plugins in use, you sure you got the splice points correct,
or did you trim before the temporal stuff ?

EDIT: I guess if using MT, anything could happen.

EDIT: In the above Multipart script thing, trim done after any temporal processing, in fact not even in same script, the temporal processing part in original script has full access to every single frame in the source clip, it is only the synthesized script that has no access after the trim
filter.
I wasn't speaking to your script. I'm speaking in general. If I have a 10000 frame clip and I process it in parallel in four discrete 2500 frame chunks, frames 2500, 5000, and 7500 all have no access to the frames before them for temporal processing and the result is visible in the output (at least in the case of MCTD).

When I split up a clip like that for parallel processing, I overlap and recombine like this:

((AVI 0-2510).trim(0,2500) + \
(AVI 2490-5010).trim(11,2510) + \
(AVI 4990-7510).trim(11,2510) + \
(AVI 7490-9999).trim(11,-0))
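
The overlap-and-recombine bookkeeping can be sketched in Python as below. This is an illustration under assumptions, not the poster's actual tooling: the function name is invented, and the chunk boundaries are even quarters (0-2499, 2500-4999, ...), so the numbers differ slightly from the hand-written ranges above.

```python
def overlapped_chunks(frame_count, parts, overlap):
    """For each part, return the padded range to render and the trim that
    removes the padding again (all frame numbers inclusive)."""
    last = frame_count - 1
    chunks = []
    for part in range(parts):
        keep_first = part * frame_count // parts
        keep_last = (part + 1) * frame_count // parts - 1
        # Pad the rendered range so temporal filters see neighbouring frames.
        render_first = max(0, keep_first - overlap)
        render_last = min(last, keep_last + overlap)
        chunks.append({
            "render": (render_first, render_last),
            # Trim positions are relative to the start of the rendered file.
            "trim": (keep_first - render_first, keep_last - render_first),
        })
    return chunks


chunks = overlapped_chunks(10000, 4, 10)
for chunk in chunks:
    print(chunk)
```

The kept ranges add up to exactly 10000 frames, so no frame is dropped or duplicated at the splice points.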

StainlessS
22nd March 2015, 21:13
OK, I finally got MCTemporalDenoise set up with required plugins and tried these two test scripts.

Test clip creator

Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10000,-100)

MCTemporalDenoise(settings="very high")

A=Last # Untrimmed test clip 100 frames, Save lossless as A.AVI
B=Trim(0,49) # trimmed part 1 50 frames, Save lossless as B.AVI
C=Trim(50,99) # trimmed part 2 50 frames, Save lossless as C.AVI

A # Save clip A.AVI


Comparison

A=Avisource("A.avi") # 100 frames
B=Avisource("B.avi") # part 1 50 frames
C=Avisource("C.avi") # part 2 50 frames
Z=B++C

COUNT = 0
SCRIPT="""
n=current_frame
COUNT = COUNT + RT_LumaPixelsDifferentCount(Last,Z,n=n,n2=n)
RT_Subtitle("Total pixels different count = %d",COUNT)
Return Last
"""
return A.ScriptClip(SCRIPT)


Not a single pixel different. Don't know what settings you may have used in MCTD, but I chose 'very high'.
EDIT: Of course that did not check chroma, but I can say that it's likely the exact same result.

EDIT: Of course, if you trim clips in your initial script BEFORE the temporal processing then you will definitely need the overlap;
you would not if it is done right.

Stereodude
22nd March 2015, 21:23
OK, I finally got MCTemporalDenoise set up with required plugins and tried these two test scripts.

Test clip creator

Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10000,-100)

MCTemporalDenoise(settings="very high")

A=Last # Untrimmed test clip 100 frames, Save lossless as A.AVI
B=Trim(0,49) # trimmed part 1 50 frames, Save lossless as B.AVI
C=Trim(50,99) # trimmed part 2 50 frames, Save lossless as C.AVI

A # Save clip A.AVI


Comparison

A=Avisource("A.avi") # 100 frames
B=Avisource("B.avi") # part 1 50 frames
C=Avisource("C.avi") # part 2 50 frames
Z=B++C

COUNT = 0
SCRIPT="""
n=current_frame
COUNT = COUNT + RT_LumaPixelsDifferentCount(Last,Z,n=n,n2=n)
RT_Subtitle("Total pixels different count = %d",COUNT)
Return Last
"""
return A.ScriptClip(SCRIPT)


Not a single pixel different. Dont know what setting you may have used in MCTD, but I chose 'very high'.
EDIT: Of course that did not check chroma, but I can say that its likely the exact same result.
I wouldn't expect any differences. You didn't run separate instances of MCTD on multiple split segments in parallel (I think).

If you run 3 scripts

A: -> A.AVI
Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10000,-100)

MCTemporalDenoise(settings="very high")

B: -> B.AVI
Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10000,-50)

MCTemporalDenoise(settings="very high")

C: -> C.AVI
Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10050,-50)

MCTemporalDenoise(settings="very high")
B+C will differ from A for a few frames around the end of B and the start of C.

Edit: Now I see what you're saying to do. Trim the output not the input and run those scripts in parallel.
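
The difference between trimming the input and trimming the output can be shown with a toy temporal filter in Python. The moving average below is only a stand-in and is not how MCTemporalDenoise works. The radius of 3 and the synthetic "frames" (one number per frame) are assumptions chosen for the example.

```python
def temporal_average(frames, radius):
    """Toy temporal filter: average each frame with up to `radius` neighbours
    on either side, using only frames that exist in the clip it is given."""
    out = []
    for i in range(len(frames)):
        lo = max(0, i - radius)
        hi = min(len(frames), i + radius + 1)
        window = frames[lo:hi]
        out.append(sum(window) / len(window))
    return out


source = [float(n * n % 17) for n in range(100)]  # synthetic per-frame values
radius = 3

# Reference: filter the whole clip.
full = temporal_average(source, radius)

# Trim AFTER filtering: each part is cut from the fully filtered clip.
trim_after = full[:50] + full[50:]

# Trim BEFORE filtering: each part only ever sees its own 50 frames.
trim_before = temporal_average(source[:50], radius) + temporal_average(source[50:], radius)

differing = [i for i in range(100) if trim_before[i] != full[i]]
print(differing)
```

Only the frames within the filter radius of the cut differ when trimming first, while trimming last reproduces the reference exactly.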

StainlessS
22nd March 2015, 21:33
You didn't run the instances of MCTD multiple split segments in parallel

Would make no difference.

Another test script

Import("MCTemporalDenoise.avs")
Avisource("D:\DBSC\AP10_out.avi").trim(10000,-100)

SCRIPT="""
RT_DebugF("%d ] Frame Requested",current_frame)
Return Last
"""

ScriptClip(SCRIPT)
MCTemporalDenoise(settings="very high")
Trim(100,-1) # Return ONLY a single frame, frame 100 # Should have requested frame 50 here


Result

00000003 6.24568701 RT_DebugF: 96 ] Frame Requested
00000004 6.29375124 RT_DebugF: 94 ] Frame Requested
00000005 6.33775997 RT_DebugF: 95 ] Frame Requested
00000006 6.36865711 RT_DebugF: 97 ] Frame Requested
00000007 6.50268555 RT_DebugF: 94 ] Frame Requested
00000008 6.59054947 RT_DebugF: 93 ] Frame Requested
00000009 6.62691736 RT_DebugF: 91 ] Frame Requested
00000010 6.66711283 RT_DebugF: 92 ] Frame Requested
00000011 7.28551531 RT_DebugF: 98 ] Frame Requested
00000012 7.50896025 RT_DebugF: 99 ] Frame Requested
00000013 8.36514378 RT_DebugF: 96 ] Frame Requested
00000014 8.42717552 RT_DebugF: 90 ] Frame Requested
00000015 8.46586323 RT_DebugF: 88 ] Frame Requested
00000016 8.50676250 RT_DebugF: 89 ] Frame Requested


And that is not including cache interceptions.

EDIT: Repeat, you have to trim at the END of the script.

EDIT:

Edit: Now I see what you're saying to do. Trim the output not the input and run those scripts in parallel.
Yep, you got it OK :)

EDIT: Actually I stuffed up this test script: I requested frame 100 and the last frame in the clip is frame 99. I meant to request the middle frame,
so a request for frame 100 would actually be limited to frame 99 within avisynth and all plugins. :(
Anyway, it does show that frame access is not restricted to the trimmed frames, which is what it was intended to show.

Here is the test repeated, but requesting only the single frame 50:

00000003 6.48292446 RT_DebugF: 47 ] Frame Requested
00000004 6.54360437 RT_DebugF: 45 ] Frame Requested
00000005 6.58776665 RT_DebugF: 46 ] Frame Requested
00000006 6.61964703 RT_DebugF: 48 ] Frame Requested
00000007 6.75685120 RT_DebugF: 45 ] Frame Requested
00000008 6.86953878 RT_DebugF: 44 ] Frame Requested
00000009 6.91244841 RT_DebugF: 42 ] Frame Requested
00000010 6.94972992 RT_DebugF: 43 ] Frame Requested
00000011 7.58406591 RT_DebugF: 49 ] Frame Requested
00000012 7.80722141 RT_DebugF: 50 ] Frame Requested
00000013 8.01869965 RT_DebugF: 51 ] Frame Requested
00000014 8.72660160 RT_DebugF: 47 ] Frame Requested
00000015 8.78894424 RT_DebugF: 41 ] Frame Requested
00000016 8.82385731 RT_DebugF: 39 ] Frame Requested
00000017 8.86326408 RT_DebugF: 40 ] Frame Requested
00000018 10.73035145 RT_DebugF: 52 ] Frame Requested
00000019 11.22004700 RT_DebugF: 53 ] Frame Requested
00000020 11.66969299 RT_DebugF: 54 ] Frame Requested
00000021 12.61259842 RT_DebugF: 55 ] Frame Requested
00000022 13.15336418 RT_DebugF: 56 ] Frame Requested
00000023 13.68243790 RT_DebugF: 57 ] Frame Requested
00000024 14.21497536 RT_DebugF: 58 ] Frame Requested
00000025 14.74297905 RT_DebugF: 59 ] Frame Requested
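
A short Python sketch can pull the span of requested source frames out of a log like the one above. The regular expression and function name are assumptions. The sample is an excerpt of the lines shown here, and the result only describes this one frame request with these settings, not a general bound for the filter.

```python
import re

# Excerpt of the DebugView output above (request for output frame 50).
LOG = """\
00000003 6.48292446 RT_DebugF: 47 ] Frame Requested
00000004 6.54360437 RT_DebugF: 45 ] Frame Requested
00000012 7.80722141 RT_DebugF: 50 ] Frame Requested
00000016 8.82385731 RT_DebugF: 39 ] Frame Requested
00000025 14.74297905 RT_DebugF: 59 ] Frame Requested
"""

PATTERN = re.compile(r"RT_DebugF:\s*(\d+)\s*\]\s*Frame Requested")


def requested_span(log_text):
    """Return (lowest, highest) source frame number found in the log text."""
    frames = [int(m.group(1)) for m in PATTERN.finditer(log_text)]
    if not frames:
        raise ValueError("no frame requests found")
    return min(frames), max(frames)


lowest, highest = requested_span(LOG)
target = 50
print(f"frames {lowest}..{highest}: {target - lowest} back, {highest - target} forward")
```

The span gives a feel for how many neighbouring frames a temporal filter reaches for, which is the figure an overlap would have to cover if the trim were done before filtering.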

markanini
23rd November 2015, 12:23
Here is script to split an AVS script into parts for render to lossless, set PARTS=x as appropriate.


# MakeMultiPartScripts.avs

PARTS=4
#######
Assert(PARTS>1,RT_String("MakeMultiPartScripts.avs: PARTS MUST be greater than 1 (%d)",PARTS))
FSEL_TITLE="Select AVS files"
FSEL_DIR="."
FSEL_FILT="AVS files|*.AVS"
FSEL_MULTI=True
FILE_LIST = RT_FSelOpen(title=FSEL_TITLE,dir=FSEL_DIR,filt=FSEL_FILT,multi=FSEL_MULTI)
Assert(FILE_LIST.IsString,"MakeMultiPartScripts.avs: RT_FSelOpen Error="+String(FILE_LIST))

SCRIPT_TMPT = """
PART = %d
PARTS = %d
Import("%s")
FC = FrameCount
Start = ((part==0 ) ? -1 : part * (FC-1) / PARTS) + 1
End = ((part==PARTS-1) ? FC-1 : (part+1) * (FC-1) / PARTS)
End = (End == Start) ? -1 : End
Trim(Start,End)
# Subtitle("Trim("+String(Start)+","+String(End)+")")
Return Last
"""

GSCript("""
NLINES = RT_TxtQueryLines(FILE_LIST)
for(i=0,NLINES-1) {
PARTSNAME = RT_TxtGetLine(FILE_LIST,i)
for(part=0,PARTS-1) {
PartName = RT_String("%s_PART_%d.avs",PARTSNAME,part+1)
ScriptInst=RT_String(SCRIPT_TMPT,part,PARTS,PARTSNAME)
RT_WriteFile(PartName,"%s",ScriptInst)
}
}
""")
MessageClip(RT_String("ALL DONE, Processed %d avs files of %d parts each",NLINES,PARTS))


Puts up a file selector, choose script to split, and et voila done.
Will probably play up in AVSPMod (somehow captures and changes fileselector), can just right click and open with eg Vdub or
almost any player (not VLC).

Despite some others suggesting it be necessary to have an overlap, dont think is necessary, if you find different, then post somewhere.

Requires RT_Stats(). EDIT: Oops, and GScript().

What do I do with these?

[Original].avs_PART_1.avs
[Original].avs_PART_2.avs
[Original].avs_PART_3.avs
[Original].avs_PART_4.avs

StainlessS
23rd November 2015, 16:59
The purpose of the script is to automatically split the output of a source script into x equal parts, so that each part
can be processed individually via e.g. VirtualDub (written to a lossless file).
For your [Original].avs_PART_1.avs type files above, you would start each of them in a separate instance of VDub and write to separate AVI files.
[Original].avs_PART_1.avs would process the first quarter of the frames, [Original].avs_PART_2.avs the second, etc.

EDIT: The 4 resulting lossless AVIs would later be spliced together.
This allows you to process in parallel without any MT-associated problems.
Read the thread from the start for more on why the script exists.
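
For the final splice, a few lines of Python can generate the joining script. This is a sketch under assumptions: it supposes each part was saved as an AVI named after its part script (for example `Original.avs_PART_1.avi`), which the thread does not specify, and it joins the parts with `++` as in the comparison script earlier in the thread.

```python
def splice_script(base_name, parts, extension=".avi"):
    """Build a one-line AviSynth script that joins the rendered parts in order.
    The file naming scheme is an assumption, not something the thread defines."""
    sources = [
        f'AviSource("{base_name}_PART_{n}{extension}")'
        for n in range(1, parts + 1)
    ]
    return " ++ ".join(sources) + "\n"


script = splice_script("Original.avs", 4)
print(script)
```

The printed line can be saved as its own .avs file and opened like any other script once all four parts have finished rendering.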

markanini
23rd November 2015, 20:59
I did a quick test with four instances of VirtualDub, setting CPU affinity in Windows Task Manager. On a CPU-heavy script I went from approximately 3 fps single-threaded to 4 fps combined. The downside is that the system became unresponsive. My CPU is an i5-2500K.

creaothceann
23rd November 2015, 22:55
The downside is that the system became unresponsive.
Set each encoder instance's priority to "Low" (Task Manager) / "Idle" (Process Explorer (https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx)) / "LOW" (cmd.exe's "start" command when using x264 etc.).
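
The command-line variant can be scripted. The Python sketch below only builds the cmd.exe lines and does not run anything. The part-script names, the .mkv output naming, and the choice of x264 arguments are assumptions for illustration, and feeding an .avs file to x264 assumes a build with AviSynth input support.

```python
def low_priority_commands(part_scripts, encoder="x264"):
    """Build one cmd.exe 'start /LOW' line per part script.
    Output names (<script>.mkv) are an assumption made for this example."""
    commands = []
    for script in part_scripts:
        output = script + ".mkv"
        # The empty "" is the window title that 'start' expects before options.
        commands.append(f'start "" /LOW {encoder} --output "{output}" "{script}"')
    return commands


parts = [f"clip.avs_PART_{n}.avs" for n in range(1, 5)]
commands = low_priority_commands(parts)
for line in commands:
    print(line)
```

Pasting the printed lines into a batch file would launch all four encodes at low priority, which should leave the desktop responsive while they run.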