What FPS can TempGaussMC reach on SD?
Gargalash
7th May 2010, 18:59
Hi all,
I am using this deinterlacer regularly. Of course it is really slow, but that's fine with me... although I wonder how fast it can run on a newer machine. I've been stuck with my old machine for over 4 years, so I have no idea what speeds heavy filters reach on a new machine.
Using this:
MT("TempGaussMC_beta2().SelectEven()", overlap=4)
I get around 2.17 FPS when deinterlacing a DVD. My machine has an Athlon 64 X2 4400+ at 2.2 GHz.
What FPS are you getting or would you expect to get on something like an affordable i7-920 or the newer i7-980X (http://ark.intel.com/Product.aspx?id=47932)?
Thanks for sharing!
Audionut
8th May 2010, 00:32
Using the line you posted I get about 6fps.
Using, mt("TempGaussMC_beta2().selecteven()",4), I get about 10fps
Using just, mt("TempGaussMC_beta2()",4), I get about 18.5fps
Intel Q6600 at 3.4 GHz
Blue_MiSfit
8th May 2010, 20:01
Are those speeds for the script only, or do they include encoding as well?
Avisynth multithreading is pretty good, but can be problematic once you start using around 8 threads, as you must to fully utilize an i7.
I think you should of course keep in mind the usual advantages of rendering to a lossless or near-lossless file when using sloooow scripts :)
~MiSfit
LoRd_MuldeR
10th May 2010, 15:59
I got a different question: How much overlap does TempGaussMC() need? Is overlap=4 sufficient/safe to use?
Audionut
10th May 2010, 23:54
Are those speeds for the script only, or do they include encoding as well?
Encoding to Huffyuv.
Gargalash
11th May 2010, 16:25
I got a different question: How much overlap does TempGaussMC() need? Is overlap=4 sufficient/safe to use?
I'm absolutely unable to answer this. I ended up using overlap=4 because I had a bogus horizontal line in the image that went away with it. What's your opinion on this?
The FPS figures I'm reporting come from simply letting the video play in VirtualDub. I always encode to Lagarith.
According to the numbers posted, it looks like a new machine could improve my workflow.
Thanks all for your replies!
LoRd_MuldeR
11th May 2010, 21:17
I'm absolutely unable to answer this. I ended up using overlap=4 because I had a bogus horizontal line in the image that went away with it. What's your opinion on this?
My "opinion" is that I can't see any obvious issues with overlap=4, but that doesn't guarantee it really is safe to use.
So I was hoping that somebody could confirm overlap=4 is okay for TempGaussMC, or tell me the correct value :)
About speed: both speed and CPU usage scale almost linearly with the number of threads when putting MT() around TempGaussMC_beta2().
That's on my Q6600, and it saturates at 4 threads. So I get up to 8 fps instead of 2 fps, at least ;)
Didée
11th May 2010, 22:03
I can't imagine there should be any "correct" or "required" overlap value for MT()'ing the TGMC function. You need *some* overlap to avoid visible bordering artifacts. In any case, MT() will potentially degrade the result because of the restricted motion search area (it can't find or compensate motion that crosses slice boundaries), so the safest overlap value is [frame size]-[slice size], or in other words, not to use MT() at all. Anything between overlap=0 and overlap=framesize is possible; it's just a user choice about the speed/quality tradeoff.
Though, purely from the guts, I'd give overlap=16 and splitvertical=true a try when using 2 slices. For 4 slices, perhaps try MT(MT(vertical),horizontal). But really, I don't know, it's just guesswork.
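To make the [frame size]-[slice size] remark concrete, here is a small Python model. It is an illustration only, not MT()'s actual code: it assumes each slice is widened by `overlap` lines on both sides of each cut and clamped at the frame edges, and the helper name `slice_ranges` is hypothetical. Under that assumption, every slice covers the whole frame once overlap reaches [frame size]-[slice size], which amounts to not splitting at all.

```python
def slice_ranges(size, slices, overlap):
    """Hypothetical model (not MT()'s code): split `size` lines into `slices`
    equal parts and widen each part by `overlap` lines on both sides,
    clamped to the frame. Returns (start, end) pairs, `end` exclusive."""
    base = size // slices
    ranges = []
    for i in range(slices):
        start = i * base
        # the last slice absorbs any remainder so the whole frame is covered
        end = size if i == slices - 1 else (i + 1) * base
        ranges.append((max(0, start - overlap), min(size, end + overlap)))
    return ranges


# 576 lines of PAL video, 2 slices
print(slice_ranges(576, 2, 0))    # [(0, 288), (288, 576)]
print(slice_ranges(576, 2, 16))   # [(0, 304), (272, 576)]
print(slice_ranges(576, 2, 288))  # [(0, 576), (0, 576)], i.e. each slice sees the full frame
```

More overlap means more lines processed twice, so the speed gain shrinks as the motion search area grows.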
___
Edit - Oh, and my bid for the record-of-poorness on 720x576i is
TGMC(NNEDI2) => 0.8 fps (1Core Celeron @ 2.6GHz)
:D
LoRd_MuldeR
11th May 2010, 22:31
I'd give overlap=16 and splitvertical=true a try when using 2 slices. For 4 slices, perhaps try MT(MT(vertical),horizontal).
Interesting. I would have assumed that splitting vertically is worse (for motion compensation) than splitting horizontally, because most of the motion is in the vertical direction...
Didée
11th May 2010, 22:56
No, what, why? Dominant motion usually is horizontal like
x------>
and splitvertical splits like
xxxxxxxxxxx
xxxxxxxxxxx
oooooooooo
oooooooooo
I hope it isn't the other way round? (See, I haven't used it in a long time...)
LoRd_MuldeR
11th May 2010, 23:07
No, what, why? Dominant motion usually is horizontal like
x------>
Yes, that's what I would say too.
and splitvertical splits like
xxxxxxxxxxx
xxxxxxxxxxx
oooooooooo
oooooooooo
I don't think so. What you show is a horizontal split, isn't it? :confused:
At least with overlap=0 and splitvertical=false (default) I get visible horizontal lines (i.e. left to right) at the slice border...
Gavino
11th May 2010, 23:30
At least with overlap=0 and splitvertical=false (default) I get visible horizontal lines (i.e. left to right) at the slice border...
The terminology is slightly ambiguous and hence confusing, but splitvertical=true means frames are cut vertically, and so the segments are split/joined horizontally (ie LoRd_MuldeR is right).
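Gavino's description of the orientation, written as a small Python model. The helper `slice_rects` is hypothetical, ignores overlap, and encodes only what the posts here say: with splitvertical=false (the default) the slice borders run left to right and the slices are stacked; with splitvertical=true the frame is cut top to bottom and the pieces sit side by side.

```python
def slice_rects(width, height, slices, splitvertical=False):
    """Hypothetical model of MT()'s slice orientation (no overlap).
    Returns (x, y, w, h) for each slice."""
    rects = []
    for i in range(slices):
        if splitvertical:
            # vertical cuts: slices are placed side by side
            x0 = i * width // slices
            x1 = (i + 1) * width // slices
            rects.append((x0, 0, x1 - x0, height))
        else:
            # horizontal cuts (default): slices are stacked top to bottom
            y0 = i * height // slices
            y1 = (i + 1) * height // slices
            rects.append((0, y0, width, y1 - y0))
    return rects


print(slice_rects(720, 576, 2))                      # [(0, 0, 720, 288), (0, 288, 720, 288)]
print(slice_rects(720, 576, 2, splitvertical=True))  # [(0, 0, 360, 576), (360, 0, 360, 576)]
```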
Didée
12th May 2010, 10:47
Ah okay, then MT() really has it the other way round... I take back my guesstimate from above. For 2 slices, it should be splitvertical=false, then.
However, MT's terminology is a bit unfortunate, IMHO. Compare it e.g. to "StackVertical" in Avisynth. To me it is more intuitive to think about how the resulting frame slices are aligned, not about which direction a virtual knife is sliding in.
henryho_hk
14th May 2010, 07:04
I always use manual MT in pass-0: four concurrent TGMC_b2(NNEDI2-threads=1,tr3=3) jobs, split with the lovely trim().
Q6600@3.2GHz... every job runs at 1.9~2 fps (XviD @ EQM EHR, ME6, VHQ1, CQ2)
Edit:
1) That is 720x480 material, and the call is TGMC_b2(NNEDI2-threads=1,tr3=3).selecteven()
2) 1920x1080 is painfully slow... @ <0.1fps each job
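henryho_hk's manual split can be planned with a few lines of Python that compute the Trim() ranges for N equal jobs. This is a sketch with a hypothetical helper name; it relies on AviSynth's Trim(first_frame, last_frame) taking an inclusive last frame, where last_frame=0 means "to the end of the clip". It adds no padding at the cut points, so each job's first and last few frames lose their temporal neighbours, which a motion-compensated filter like TGMC may notice.

```python
def trim_ranges(total_frames, parts):
    """Split frames 0..total_frames-1 into `parts` contiguous, nearly equal
    chunks. Returns (first, last) pairs for Trim(first, last), `last` inclusive."""
    if total_frames < 2 * parts:
        # Trim(0, 0) would mean "the whole clip", so keep every chunk
        # at least two frames long to stay clear of that special case
        raise ValueError("too few frames for this many parts")
    ranges = []
    for i in range(parts):
        first = i * total_frames // parts
        last = (i + 1) * total_frames // parts - 1
        ranges.append((first, last))
    return ranges


# one line per job script, e.g. four jobs over a 10000-frame source
for first, last in trim_ranges(10000, 4):
    print(f"Trim({first}, {last})")
```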
Atak_Snajpera
16th May 2010, 00:31
And with 1920x1080 (AVCHD camcorder)? How about SetMTMode?
UPDATE: this script is unusable for 1920x1080i. My Q6600@3GHz is waaay too slooow. I suppose I would need at least a 16-core i7 :)
Didée
27th May 2010, 03:12
Oh, and my bid for the record-of-poorness on 720x576i is
TGMC(NNEDI2) => 0.8 fps (1Core Celeron @ 2.6GHz)
Yesterday's pain, tomorrow's joy.
Now, freshly landed on planet multicore, riding the i7-860 rover on its 8GB wheels and exploring the land of sixty-four, things look a bit different ...
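# (everything is x64 here)
SetMTMode(5,12)               # 12 threads; mode 5 (slowest, safest) for the source filter
SetMemoryMax(512*3)           # 1536 MB
mpeg2source("interlaced.d2v")
SetMTMode(2)                  # switch to the faster mode 2 for the filters that follow
TGMC_b2()                     # exactly default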
Plain screen rendering in VeeDub:
576i : 30.6 fps
480i : 36.7 fps
Relaxed to (1,1,1,Edi="nope"), TGMC crunches PAL at >60fps.
(and funnily, encoding to Huffyuv without preview is even slightly faster)
___
It's a different world indeed, I think I like it. :)
However, it's a bit lonesome in 64bit land. A good friend is missing. NNEDI2, where art thou? :(
Terka
27th May 2010, 10:20
576i : 30.6 fps
When riding this fast, now it's time to improve TGMC quality even more ;)
Gargalash
27th May 2010, 17:02
Plain screen rendering in VeeDub:
576i : 30.6 fps
480i : 36.7 fps
Relaxed to (1,1,1,Edi="nope"), TGMC crunches PAL at >60fps.
Holy [insert swear]!
Thanks for sharing these results, Didée! Makes me feel like my computer is getting older than I thought... But it's encouraging! That will make a world of difference when I get the new machine! (No idea when, but let's dream a bit!)
I get around 2fps on my Core Duo 2.4 GHz with no MT and TGMC_b2(EdiMode="EEDI3+NNEDI3"). Is there an MT version of beta2?
ficofico
27th May 2010, 17:21
(EdiMode="EEDI3+NNEDI3"
??????
TGMC_b2(EdiMode="EEDI3+NNEDI3")??????
Add the following line of code to TempGaussMC_beta2u (http://forum.doom9.org/showthread.php?p=1400371#post1400371):
\ : (EdiMode=="EEDI3") ? clp.eedi3(field=-2, sclip=clp.nnedi3(field=-2,nsize=nsize, nns=nns, qual=qual))
Add the following line of code to TempGaussMC_beta2u (http://forum.doom9.org/showthread.php?p=1400371#post1400371):
\ : (EdiMode=="EEDI3") ? clp.eedi3(field=-2, sclip=clp.nnedi3(field=-2,nsize=nsize, nns=nns, qual=qual))
Excellent! People do read my posts. Perhaps the edited script could be submitted, maybe with an option that uses only EEDI3. TempGaussMC_beta2v?
ficofico
27th May 2010, 21:57
OK, but do I have to write EdiMode="EEDI3" or "EEDI3+NNEDI3"... I think EEDI3 alone, right?
OK, but do I have to write EdiMode="EEDI3" or "EEDI3+NNEDI3"... I think EEDI3 alone, right?
(EdiMode=="EEDI3") defines the syntax, so feel free to change it to your liking. But yes, according to that, EdiMode="EEDI3" is correct. I chose EEDI3 simply because it was short, and I didn't include an option to use only EEDI3. Perhaps EEnnedi3 would be a more descriptive choice of syntax.
dbmaxpayne
2nd June 2010, 09:31
576i : 30.6 fps
480i : 36.7 fps
:eek::eek::eek:
What the hell, how do you get that much fps???
I use the following script:
AVISource("C:\DV_type2.avi", audio=true)
AssumeBFF()
LoadPlugin("C:\TGMC\mvtools\mvtools2.dll")
LoadPlugin("C:\TGMC\RemoveGrain+Repair\RepairSSE3.dll")
LoadPlugin("C:\TGMC\RemoveGrain+Repair\RemoveGrainSSE3.dll")
LoadPlugin("C:\TGMC\VerticalCleaner\VerticalCleanerSSE3.dll")
LoadPlugin("C:\TGMC\MaskTools\mt_masktools-25.dll")
#LoadPlugin("C:\TGMC\eedi2_imp.dll")
LoadPlugin("C:\TGMC\nnedi3\nnedi3.dll")
import("C:\TGMC\TempGaussMC_beta2u.avsi")
SetMtMode(2)
TempGaussMC_beta2u(EdiMode="nnedi3")
I'm currently using Avisynth 2.6 Alpha 2 for testing.
At first it complained about a missing SetMtMode function, so I used a recompiled version from September 2009.
I'm only getting ~4fps on an 8core machine :confused::confused:
What's wrong here?
BTW: Is there a newer version of TGMC than TempGaussMC_beta2u?
Mark
Didée
2nd June 2010, 12:31
Read more closely what I posted.
1) I made the test in a 64-bit environment: Avisynth x64, and all filters x64.
2) NNEDI is not available as a 64-bit plugin, so I used EEDI2 instead.
3) You're using NNEDI3, which is even slower than NNEDI2.
4) I used 12 threads on a 4-core HT processor. It seems one needs quite some "overloading" to fully load the CPU.
Edit:
Regarding NNEDI3, I'm not sure ATM whether it is "internally multithreaded" the same way NNEDI2 is. If so, it might be a good idea to set it up to use only one thread. (Imagine SetMTMode spawns 12 threads, and *each* NNEDI within *each* thread spawns 8 threads on your 8-core... 12*8 = 96 threads for NNEDI. Most probably not good.)
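Didée's thread count is plain worst-case multiplication. The sketch below assumes every SetMTMode thread runs its own NNEDI instance and that each instance spawns its full thread pool; whether NNEDI3 actually behaves that way is exactly the open question in the post. The helper names are hypothetical.

```python
def total_threads(setmt_threads, inner_threads):
    """Worst case: every SetMTMode thread owns an internally
    multithreaded filter instance with `inner_threads` workers."""
    return setmt_threads * inner_threads


def oversubscription(threads, logical_cores):
    """Threads per logical core (1.0 = exactly one per core)."""
    return threads / logical_cores


# 12 SetMTMode threads on an 8-core machine
print(total_threads(12, 8))                        # 96
print(oversubscription(total_threads(12, 8), 8))   # 12.0
print(oversubscription(total_threads(12, 1), 8))   # 1.5
```

With the inner filter pinned to one thread, the mild 1.5x "overloading" is what is left, which matches the 12-threads-on-8-logical-cores setup described above.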
henryho_hk
2nd June 2010, 12:49
To fully load the CPU, we can also split the avs footage into four/eight equal parts (by trim) and run them in parallel. Didée, would you do a rough benchmark on your 4-core HT machine:
a) one 12-threaded (MT) task
b) four parallel single-thread task
c) eight parallel single-thread task
That would give an interesting look at MT's overhead.
Didée
3rd June 2010, 02:25
Numb3rs.
Evaluating is not exactly trivial.
For one thing, splitting into snippets and running them in different processes means that some processes will finish earlier and some later. So what is the exact performance? (In the table below, I listed the processing times of both the slowest and the fastest process. The FPS and percentage figures are based on the slowest process, since the job is only finished when everything is finished.)
Then again, the behaviour of SetMTmode depends on the script's characteristics. When the temporal complexity of the filter chain is low (or zero), SetMTmode scales much like processing sections in parallel. When temporal complexity is high, however, SetMTmode scores subpar with a "low number" of threads (two to four). It seems the threads spend much time waiting for Godot. With 6 threads performance becomes much better, and it peaks at 8 to 12 threads.
Also, SetMTmode needs much less RAM. Even with 12 threads, it was possible to stay (slightly) below 2GB without any performance impact. With section splitting, each process occupies the resources of a full process environment. I didn't manage to make a comparison with 12 processes: trying to open the 10th process raised an exception. (I don't quite understand why, since enough RAM was still available.)
Altogether, I definitely prefer SetMTmode. Throw enough threads at it to keep the lines filled, and watch it run. No hassle at all.
Setting up the section splitting was definitely more of a hassle, ate much more RAM, and the speed is not much higher than that of a (balanced) SetMTmode. Also, what about (possibly) encoding directly to x264 ...
Table:
i7-860, 4 cores + HT + ('turbotech' active), 8GB RAM
====================================================
Avisynth x64 MT
===============
source: PAL 576i (DVD "Lord of the Dance", chapters 1 + 2, 7 min)

TempGaussMC_beta2()
-------------------
complexity: spatial=high, temporal=high

processes   render time      avg.    rel.
x threads   (max)..(min)     fps     speed
1 x 1       37:43             8.40   100%
-------------------------------------------
1 x 2       35:27             8.93   106%
2 x 1       20:48..20:17     15.22   181%
1 x 4       21:48            14.53   173%
4 x 1       14:37..14:10     21.66   258%
1 x 6       12:10            26.03   310%
6 x 1       10:55..10:05     29.01   345%
1 x 8       10:04            31.45   374%
8 x 1        9:52..9:01      32.09   382%
1 x 10       9:24            33.64   400%
1 x 12       9:41            32.70   389%
12 x 1      (exception upon starting 10th process)
===========================================

SeparateFields().EEDI2()
------------------------
complexity: spatial=high, temporal=zero

processes   render time      avg.    rel.
x threads                    fps     speed
1 x 1       16:54            18.74   100%
-------------------------------------------
1 x 2        9:27            33.51   179%
1 x 4        6:37            47.86   255%
1 x 6        5:04            62.50   333%
1 x 8        4:26            71.43   381%
1 x 10       4:22            72.40   386%
1 x 12       4:38            68.35   365%
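The relative-speed column can be reproduced from the render times. A minimal Python sketch, following the rule stated above that a split job only counts as finished when its slowest process is done; the helper names are hypothetical. Because the table's percentages appear to be derived from the rounded fps column, the time-based ratio can differ by about one point on some rows (e.g. 1 x 8 gives 375% here vs. 374% in the table).

```python
def seconds(mmss):
    """Convert a render time like '20:48' to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def relative_speed(baseline, process_times):
    """Speed relative to the 1 x 1 baseline, in percent. A split job is
    only done when its slowest process is, so the longest time counts."""
    slowest = max(seconds(t) for t in process_times)
    return round(100 * seconds(baseline) / slowest)


baseline = "37:43"  # TGMC, 1 process x 1 thread
print(relative_speed(baseline, ["20:48", "20:17"]))  # 2 x 1: 181
print(relative_speed(baseline, ["14:37", "14:10"]))  # 4 x 1: 258
print(relative_speed(baseline, ["35:27"]))           # 1 x 2: 106
```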
Terka
3rd June 2010, 09:03
interesting.
I'd like to see the results for the i5 and i3. Does anyone have such a machine?
henryho_hk
3rd June 2010, 11:36
Didée, which program did you use for benchmark? avs2avi or virtualdub? I wanna try something similar with my slow Q6600.
Didée
3rd June 2010, 19:55
Actually I had used Virtualdub64, but that shouldn't matter. Avs2avi will give pretty much the same results - perhaps marginally faster because of less overhead. Say +1%?
I've also added one more line to the table above. The "sweet spot" for SetMTmode(TGMC) was threads=10, which I had missed. At that point, the performance is 400% of the single-threaded performance. Which is more or less "the goal" on a 4-core CPU, isn't it?
henryho_hk
4th June 2010, 07:11
Then I need to test for the "sweet spot" of my non-HT Q6600 too.
I'm a pure command-line geek. I want everything reproducible across machines on the command line. Even avs2avi involves some GUI when setting codec parameters. Is there any win64 build of mencoder, xvid_encraws, etc. that produces >2GB AVIs correctly?