View Full Version : Sora's avs multi-process/multi-thread plugin package (2012-02-20)
leiming2006
10th February 2012, 09:16
Hello, I have made two AviSynth plugins to solve some problems.
The first one is SoraThread.
It uses multiple threads to pipeline the filters.
It aims to make full use of multi-core CPUs.
The other is SoraSMSource/SoraSMServer.
It uses multiple processes to run the filters.
I heard that some filters use a huge amount of memory, so the process can crash while running the filter chain.
Shared memory is also the fastest form of inter-process communication, so I gave this approach a try.
This one aims to break the memory limit of a single 32-bit process.
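The shared-memory idea can be sketched in Python. This is only an illustration and not the plugin's code: the helper name `roundtrip` is made up, and both handles live in one process here, whereas a real server and client would be two separate processes that agree on a segment name.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload through one handle and read it back through a second one."""
    # The "server" side creates a named segment big enough for one frame.
    server = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # The "client" side attaches by name only (hypothetical stand-in for
        # a second process; here it is simply a second handle).
        client = shared_memory.SharedMemory(name=server.name)
        try:
            server.buf[:len(payload)] = payload        # server publishes a frame
            return bytes(client.buf[:len(payload)])    # client copies it out
        finally:
            client.close()
    finally:
        server.close()
        server.unlink()  # free the segment once both sides are done


print(roundtrip(b"frame-0001"))
```

Because each process has its own address space, the filters on the server side and the encoder on the client side each get their own 32-bit memory budget; only the frame being handed over has to be visible to both.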
This is my first attempt at writing an AviSynth plugin, and my English is not that good. Two pictures explaining what I am doing are in the 7z.
I hope my work is of use.
Please feel free to post comments and questions.
Source code (under the BSD license) is also in the 7z.
http://173.224.214.51/ftp/sora_mtmp_package_20120220.7z <- this version contains some experimental code
old versions:
http://173.224.214.51/ftp/sora_mtmp_package_20120219.7z <- so you may also want this one.
http://173.224.214.51/ftp/sora_mtmp_package_20120213.7z
http://173.224.214.51/ftp/sora_mtmp_package_20120211.7z
http://173.224.214.51/ftp/sora_mtmp_package_20120210.7z
ChaosKing
11th February 2012, 00:38
Interesting plugin... just played a bit with it and got a speed gain of ~40% CPU from 25% -> 60%, :)
But I don't understand how to use it properly, perhaps you can give some examples?
Can or should I use multiple SoraSMSource() ?
I tried this little script
1. called: sorasmserver aa myscript.avs
SoraSMSource("aa")
directshowsource("P4_02.m2ts") # FullHD Source
SoraThread(30,60)
fft3dfilter()
SoraThread(30,60)
fft3dfilter()
TheProfileth
11th February 2012, 04:42
Hmmm, will definitely give this a try
leiming2006
11th February 2012, 06:36
2012-2-11
+ sorathread: custom prefetch algorithm is supported via lua script
f sorasmserver: changed how the size of a single frame is retrieved. The previous method caused an overflow and crashes. (critical!)
now, examples are also included
Interesting plugin... just played a bit with it and got a speed gain of ~40% CPU from 25% -> 60%, :)
But I don't understand how to use it properly, perhaps you can give some examples?
Can or should I use multiple SoraSMSource() ?
examples are now included in the 7z.
I tried this little script
1. called: sorasmserver aa myscript.avs
SoraSMSource("aa")
directshowsource("P4_02.m2ts") # FullHD Source
SoraThread(30,60)
fft3dfilter()
SoraThread(30,60)
fft3dfilter()
1. SoraSMSource is a source filter, so the DirectShowSource call that follows discards its output.
If you are not hitting the memory limit of a single 32-bit process, I think you don't need SoraSMSource/SoraSMServer.
2. Be careful: SoraThread with parameters 30,60 will hold 90 frames in memory.
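To put the 90 frames into numbers, here is a back-of-the-envelope estimate in Python. It assumes YV12 video (12 bits per pixel) at 1920x1080 and ignores row padding; the thread only says "FullHD Source", so the colour format is an assumption, and the function names are made up.

```python
def yv12_frame_bytes(width: int, height: int) -> int:
    # YV12: one full-resolution luma plane plus two quarter-size chroma planes,
    # i.e. 1.5 bytes per pixel (row padding ignored, which is an assumption).
    return width * height * 3 // 2


def buffered_bytes(width: int, height: int, frames: int) -> int:
    return yv12_frame_bytes(width, height) * frames


total = buffered_bytes(1920, 1080, 30 + 60)   # the two SoraThread(30,60) values added up
print(f"{total / 2**20:.0f} MiB")             # 267 MiB
```

With two such calls in one script, as in the quoted example, that figure doubles, which is a noticeable share of what a 32-bit process can address.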
mastrboy
11th February 2012, 11:44
is SoraThread based on ThreadRequest? (http://forum.doom9.org/showthread.php?p=1405205)
leiming2006
11th February 2012, 13:55
No, I wrote it as a new plugin.
In fact, I didn't know there was a plugin called ThreadRequest before...
I had already made it work, so I posted it on the forum to see what people think of it.
ChaosKing
11th February 2012, 16:51
If you are not hitting the memory limit of a single 32-bit process, I think you don't need SoraSMSource/SoraSMServer.
Ah okay, so basically SoraSMSource/SoraSMServer and SoraThread do the same thing, and I only need SoraSMSource/SoraSMServer if I want to use more than 4 GB of RAM with 32-bit software?
leiming2006
13th February 2012, 05:40
updated.
2012-2-13
binaries are compiled with /LARGEADDRESSAWARE
f sorathread: overuse of buffer and cache
f sorasmserver: an avs script without an empty line at the end
of the file is now read correctly.
Also, a new sorasmserver/sorasmsource with higher performance is planned.
Ah okay, so basically SoraSMSource/SoraSMServer and SoraThread do the same thing, and I only need SoraSMSource/SoraSMServer if I want to use more than 4 GB of RAM with 32-bit software?
Inter-process communication is expensive, so SoraThread should perform better than SoraSMServer/Source.
Also, SoraSMServer/Source does not use a prefetch buffer. I expected a script running through SoraSMServer/Source to be slower than running in a single thread and a single process, but in my test SoraSMServer/Source was slightly faster.
Hiritsuki
15th February 2012, 04:20
I have used this plugin for my project, and it is more stable than TCPServer/Source.
See this picture:
http://xtupload.com/image-E727_4F3A8313.jpg
aegisofrime
16th February 2012, 08:35
Not bad, I tried this yesterday and finished an encode that will normally crash with SetMTMode(2) :)
However I'm confused as to how to get more threads. Do I just call SoraThread multiple times, like for example:
Sorathread(15,30)
Sorathread(15,30)
QTGMC()
leiming2006
16th February 2012, 09:09
SoraThread pipelines the processing.
It uses a separate thread to run the filters that come before it.
It helps when placed between filters,
and (should) have no effect or a negative effect when several calls are placed directly in a row.
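A minimal Python sketch of this kind of pipelining, assuming a simple linear, front-to-back frame order (AviSynth itself pulls frames on request, so this is an analogy and not the plugin's algorithm). The names `prefetch`, `decode` and `resize` are made up for the example.

```python
import queue
import threading


def prefetch(upstream, buffer_frames):
    """Yield upstream's frames, produced ahead of time by a worker thread."""
    slots = queue.Queue(maxsize=buffer_frames)  # bounded buffer between the two stages
    done = object()                             # sentinel marking the end of the clip

    def worker():
        for frame in upstream:                  # runs everything *before* this point
            slots.put(frame)                    # blocks while the buffer is full
        slots.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        frame = slots.get()                     # blocks while the buffer is empty
        if frame is done:
            return
        yield frame


def decode(count):          # stand-in for a source filter
    for n in range(count):
        yield n


def resize(frames):         # stand-in for a filter after the split point
    for frame in frames:
        yield frame * 2


print(list(resize(prefetch(decode(5), buffer_frames=2))))   # [0, 2, 4, 6, 8]
```

Writing `prefetch(prefetch(decode(5), 2), 2)` still gives the same frames, but the inner thread only hands frames from one buffer to the next, which is why stacking calls directly in a row adds overhead without adding parallel work.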
ChaosKing
18th February 2012, 17:00
I get random crashes with the newest version 20120213. No Problems with the previous version.
And maybe it is a bug, I don't know... If I set cache to 20, for example, the encode slows down and after a while is even slower than without sorathread.
I'm now using sorathread(2,1) and it seems to give the best performance on my encode.
leiming2006
19th February 2012, 06:18
updated.
2012-2-19
avisynth.h is now from 2.58
+ sorathread: a totally cache-disabled version is offered (with cache=0)
c sorathread: the parameter "cache" now defaults to 0
f sorasmserver: error message is given when avisynth script has errors.
c sorathread (internal): some of the mutex and condition variable code is rolled back to version 20120211
I get random crashes with the newest version 20120213. No Problems with the previous version.
And maybe it is a bug, I don't know... If I set cache to 20, for example, the encode slows down and after a while is even slower than without sorathread.
I'm now using sorathread(2,1) and it seems to give the best performance on my encode.
I'm not sure whether the problem comes from running under a multi-threaded AviSynth. My test environment is the official AviSynth 2.58 build.
I have re-checked the source code, and the problem does not appear on my computer, so I can't find out why it crashes. For now I have rolled the inter-thread synchronization code back to the previous version.
Also, after thinking it over, I wonder whether the cache is necessary. Filters that need to refer to previous frames are likely to maintain their own cache. So the cache is now disabled by default (the cache algorithm will not run and the mutex will not be created).
zerowalker
20th February 2012, 03:50
Does this have negative effects?
Like filters may not work properly?
Or does it simply either work or crash?
leiming2006
20th February 2012, 04:35
Maintaining the buffer certainly has some overhead.
I checked the code and tested the program. Neither deadlocks nor crashes happen on my computer, but it seems they sometimes happen on other people's computers. I don't know why. I'll try different ways to fix it, or at least work around it if I can.
leiming2006
20th February 2012, 15:27
Someone on a Chinese forum said that sorathread hangs, or simply eats a whole CPU core while doing nothing.
Sometimes it even crashes.
I can't reproduce the bug on my computer.
Even with a minidump I can't find what is wrong.
So I have to try some ways to work around the problem...
A new mode called copy mode has been added. (default: off)
In this mode it copies the prefetched frames' data into its own memory.
This mode is significantly slower.
I wonder whether it can prevent the crash.
Also, when the AviSynth environment changes (I think this will not happen),
copy mode is enabled automatically.
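The difference between keeping a reference to a frame and copying its data can be shown with a small Python analogy. The function names are made up, and a `bytearray` stands in for a frame's pixel data; whether shared frame storage was really behind the crashes is not established in this thread.

```python
def store_reference(buffer: list, frame: bytearray) -> None:
    buffer.append(frame)          # cheap: the buffer shares the producer's data


def store_copy(buffer: list, frame: bytearray) -> None:
    buffer.append(bytes(frame))   # slower: the buffer owns an independent copy


frame = bytearray(b"\x10\x10\x10\x10")    # stand-in for one frame's pixels
by_reference, by_copy = [], []
store_reference(by_reference, frame)
store_copy(by_copy, frame)

frame[0] = 0xFF                           # the producer later reuses its storage

print(by_reference[0][0], by_copy[0][0])  # 255 16
```

The copy costs one extra pass over every frame, which is the price paid for the buffered data no longer depending on anything outside the buffer.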
Also, when waiting for a frame, there is now a timeout of 2 seconds.
One person told me that sorathread hangs on his computer.
But I can neither reproduce it on my computer nor find what is wrong in my code.
So... sorry, in the end I came up with this idea......
2012-2-20
+ sorathread: copy mode added. Frame data is copied into the buffer instead of just saving the smart pointer.
+ sorathread: timeout added when waiting for frames or for an empty buffer slot. Trying to help my poor algorithm escape from deadlock.
f sorathread: frames that belong to a different IScriptEnvironment are now transferred by copying their data
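A timed wait of this kind might look like the following Python sketch. The class `FrameSlots` and its methods are hypothetical and not taken from the plugin source; only the 2-second default mirrors the value mentioned above.

```python
import threading


class FrameSlots:
    """Bounded frame buffer whose waits give up after a timeout."""

    def __init__(self, capacity: int, timeout: float = 2.0):
        self._frames = []
        self._capacity = capacity
        self._timeout = timeout
        self._cond = threading.Condition()

    def put(self, frame) -> bool:
        with self._cond:
            # Wait for an empty slot, but never longer than the timeout.
            has_room = lambda: len(self._frames) < self._capacity
            if not self._cond.wait_for(has_room, self._timeout):
                return False                      # caller decides how to recover
            self._frames.append(frame)
            self._cond.notify_all()
            return True

    def get(self):
        with self._cond:
            # Wait for a frame, but never longer than the timeout.
            if not self._cond.wait_for(lambda: self._frames, self._timeout):
                return None                       # timed out instead of hanging
            frame = self._frames.pop(0)
            self._cond.notify_all()
            return frame


slots = FrameSlots(capacity=1, timeout=0.05)
print(slots.get())        # None: nothing arrived within the timeout
print(slots.put("f0"))    # True
print(slots.put("f1"))    # False: the single slot stayed full
```

A timeout does not remove a deadlock; it only turns an endless wait into a condition the caller can notice and retry or report.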
ChaosKing
20th February 2012, 16:53
Tested your newest version 2012-2-20
My System:
Win x64
CPU Q9550
Avisynth 2.6 build May 25 2011
My test script:
deen("a3d", 3,2,3,3)
fastlinedarkenmod(72, thinning=0)
awarpsharp2(62)
sorathread() # 2,1# 2,0# datacopy=true
SantiagMod(strh=2,strv=2)# Antialiasing Filter with NNEDI3
New Version doesn't crash anymore, Speed seems to be the same as in previous versions with my script: ~4.5 fps (without sorathread) -> ~6.9 fps (with sorathread)
With DataCopy= true it's slightly slower ~6.8 fps
I also tested sorathread on "light" filters like deen or fastlinedarken. Here it is just not worth running sorathread; worst case, your encode becomes slower.
I love your filter, very easy to use with nice speed improvements! :cool:
zerowalker
20th February 2012, 17:53
I still don't get it, is it better than SetMTmode;S?
;D!
ChaosKing
20th February 2012, 18:10
Well, I tested SetMTMode a long time ago. It was OK, but it had problems with some filters, and you need a modded avisynth dll for it.
SoraThread can be used with the "official" AviSynth versions, without modding anything.
I don't think it's better or worse than SetMTmode, it's just another way of using more CPU cores in avisynth.
zerowalker
20th February 2012, 18:12
Okay, well i use the Avisynth 2.6 version that is posted in this forum with this;O
But am i supposed to write SoraThread(X,X) before every filter?
Thanks
leiming2006
21st February 2012, 05:03
New Version doesn't crash anymore, Speed seems to be the same as in previous versions with my script: ~4.5 fps (without sorathread) -> ~6.9 fps (with sorathread)
With DataCopy= true it's slightly slower ~6.8 fps
I also tested sorathread on "light" filters like deen or fastlinedarken. Here it is just not worth running sorathread; worst case, your encode becomes slower.
I love your filter, very easy to use with nice speed improvements! :cool:
I'm glad to see the problem disappeared.
The CPU of my computer is T2390 (2 cores),
so some problems may occur more rarely on a CPU with 2 cores than on one with 4 or more cores.
I'm happy that you like it :p
Okay, well i use the Avisynth 2.6 version that is posted in this forum with this;O
But am i supposed to write SoraThread(X,X) before every filter?
Thanks
You may insert SoraThread() between filters to separate them into different threads.
For example,
AVISource(...)
SoraThread(...)
Crop(...)
Lanczosresize(...)
Then, the [decoding] and [cropping and resizing] will work in different threads.
AVISource(...)
Crop(...)
SoraThread(...)
Lanczosresize(...)
Then, the [decoding and cropping] and [resizing] will work in different threads.
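Where the split is placed matters because a pipeline runs at the pace of its slowest stage. The following Python model is a deliberate simplification with made-up per-frame timings and a made-up hand-off cost; none of the numbers are measurements from this thread.

```python
def serial_fps(stage_seconds):
    # One thread: every frame pays for all stages, one after another.
    return 1.0 / sum(stage_seconds)


def pipelined_fps(stage_seconds, handoff_seconds=0.0):
    # One thread per stage: the slowest stage sets the pace, plus the
    # (assumed) cost of passing each frame through the buffer.
    return 1.0 / (max(stage_seconds) + handoff_seconds)


heavy = [0.100, 0.120]   # hypothetical: two expensive filter groups
light = [0.002, 0.120]   # hypothetical: a cheap filter before an expensive one
handoff = 0.005          # hypothetical buffer/locking overhead per frame

for name, stages in (("heavy + heavy", heavy), ("light + heavy", light)):
    print(f"{name}: {serial_fps(stages):.2f} fps serial, "
          f"{pipelined_fps(stages, handoff):.2f} fps pipelined")
```

In this model the first split is a clear win, while the second comes out slightly slower than running everything in one thread, which matches the reports above that putting SoraThread around "light" filters is not worth it.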
tormento
21st February 2012, 07:05
It's a nice plugin but there is a great problem: it doesn't multithread avsi scripts as SetMTMode does. Am I wrong?
zerowalker
21st February 2012, 19:05
how can i make QTGMC() alone work in 4 threads then;O?
mastrboy
27th February 2012, 18:55
@zerowalker, you have to modify QTGMC.avsi by adding sorathread calls in various places inside the function definition.
I finally got around to testing sorathread today. Adding a SoraThread() after a call to nnedi3 increased the fps by 30%; adding more SoraThread() calls in other places did not increase speed any further.
@leiming2006, how does one decide what the appropriate buffer value is for a script?
turbojet
28th February 2012, 06:53
I gained about 25% by adding 3 sorathread(1) between source, tfm, tdecimate and crop/resize, thanks leiming2006. Threadrequest was about a 15% gain, MT and SetMTMode slowed down the script considerably.
1 frame buffer was much faster than the default of 3 in my case.
leiming2006
28th February 2012, 15:17
In my friends' tests, concurrently reading frames from the same filter instance causes a crash.
Here is a meaningless example (which should not have crashed, though):
(this script was provided by "304"; I have simplified it)
ffvideosource(...)
assumeframebased()
separatefields()
selectevery(8, 0, 1, 0, 3, 2, 5, 4, 5, 6, 7)
weave()
w=width
h=height
TFM(mode=5, pp=0, slow=2).TDecimate(1)
sorathread()
src=last
c=last.spline64resize(w,h,2,2,-2,-1)
clip1=c.sorathread()
msk=c.sorathread()
clip2=src.sorathread()
mt_merge(clip1, clip2, msk, luma=false, u=3, v=3).sorathread()
I wonder whether it's because the IScriptEnvironment object that comes from the AviSynth engine is not thread-safe.
Next time I'll try locking the environment object during processing, to see whether the crash can be avoided.
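The locking idea can be illustrated in Python. `SharedEnvironment` is a made-up stand-in for an object that is not safe to call from several threads at once; it is not the AviSynth `IScriptEnvironment` interface.

```python
import threading


class SharedEnvironment:
    """Hypothetical object whose methods must not run concurrently."""

    def __init__(self):
        self.frames_allocated = 0

    def new_frame(self):
        current = self.frames_allocated        # read ...
        self.frames_allocated = current + 1    # ... modify, write: not atomic


def run_workers(thread_count: int, calls_per_thread: int) -> int:
    env = SharedEnvironment()
    env_lock = threading.Lock()                # one lock guards every call into env

    def worker():
        for _ in range(calls_per_thread):
            with env_lock:                     # only one thread is inside env at a time
                env.new_frame()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return env.frames_allocated


print(run_workers(4, 1000))   # 4000
```

The trade-off is that every call into the shared object is serialized, so heavy use of it from several threads reduces the parallelism the threads were meant to provide.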
@leiming2006, how does one decide what the appropriate buffer value is for a script?
I'm sorry I don't know.
In fact the default value 5 is just my "guess".
I simply thought "maybe... 5 is enough".
By design, a higher value causes more buffer-maintenance overhead and more lock contention (threads waiting for each other), while a value that is too low makes threads wait for data more often.
Pat357
18th March 2012, 17:13
It's a pity that we don't see any new versions of this nice tool...
Why is this ?
Has the developer (Sora) lost his interest ?
GEfS
18th March 2012, 18:31
ffms2_FFVideoSource("bluray remux source.mkv")
SetMTMode(2,0)
Spline36Resize(1280,534,0,140,0,-140)
LSFmod(ss_x=1.0,ss_y=1.0,Smode=3,Smethod=3,Lmode=3,strength=60,overshoot=2,undershoot=1)
I'm using an avs script like that.
Can you give me a "Sora" script for this? :thanks:
My x264 setting:
--level 4.1 --deblock -3:-3 --bframes 8 --b-adapt 2 --ref 12 --aq-mode 2 --aq-strength 0.8
--cqmfile "C:\Users\Administrator\Desktop\eqm_avc_hr.cfg" --merange 32 --me umh --direct auto --subme 11
--partitions all --trellis 2 --psy-rd 1.15:0.45 --no-dct-decimate --no-fast-pskip
Pass 1: ~17.5 fps
Pass 2: ~5 fps
ChaosKing
18th March 2012, 21:31
Can you give me a "Sora" script for this? :thanks:
Just add sorathread() before LSFmod()
Spline36Resize(1280,534,0,140,0,-140)
sorathread()
LSFmod(ss_x=1.0,ss_y=1.0,Smode=3,Smethod=3,Lmode=3,strength=60,overshoot=2,undershoot=1)
I don't know how SetMTMode would affect sorathread, so maybe you need to remove it from your script.
Pat357
19th March 2012, 01:29
ffms2_FFVideoSource("bluray remux source.mkv")
SetMTMode(2,0)
Spline36Resize(1280,534,0,140,0,-140)
LSFmod(ss_x=1.0,ss_y=1.0,Smode=3,Smethod=3,Lmode=3,strength=60,overshoot=2,undershoot=1)
I'm using a avs script like that.
Can you give me a "Sora" script for this? :thanks:
My x264 setting:
--level 4.1 --deblock -3:-3 --bframes 8 --b-adapt 2 --ref 12 --aq-mode 2 --aq-strength 0.8
--cqmfile "C:\Users\Administrator\Desktop\eqm_avc_hr.cfg" --merange 32 --me umh --direct auto --subme 11
--partitions all --trellis 2 --psy-rd 1.15:0.45 --no-dct-decimate --no-fast-pskip
Pass 1: ~17.5 fps
Pass 2: ~5 fps
What CPU do you have ?
Keep in mind that the script and x264 run at the same time.
Speeding up your script only makes sense if the script runs slower than x264, which is probably not the case here.
Measure the speed of your script (AVSMeter, ...) and speed up the slowest part of the chain.
Further speed-up is only possible if your CPU is not already fully loaded.
Looking at the x264 parameters, you're going to need a heavy multi-CPU system if you want the whole chain to run faster by speeding up the script using Sora's multi-threading :D
An alternative is to separate (decoding + filters) and encoding by using a "temporary file".
(decoding + filters) -> lossless encoding --> temp file
temp file --> lossless decoding --> x264 encoding.
BTW: have you tried to run your script? There's a good chance it will get stuck on line 1...
GEfS
19th March 2012, 02:51
I'm using Xeon X3430 :D
cybersharky
15th April 2012, 09:07
How exactly do you use sorathread/sorasmsource? Where must I have the .dll's? As far as I can see the examples have steps left out?