Old 24th July 2011, 15:55   #1  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Extremely Poor Performance on HD Material

I use the following script (in either avs2pipe or the various mod or 2.6 versions with an MT avisynth.dll):
Quote:
SetMTMode(5,0)
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)
SetMTMode(2)
super = MSuper(planar=true)
bv1 = MAnalyse(super, isb = true, delta = 1, overlap=4)
fv1 = MAnalyse(super, isb = false, delta = 1, overlap=4)
bv2 = MAnalyse(super, isb = true, delta = 2, overlap=4)
fv2 = MAnalyse(super, isb = false, delta = 2, overlap=4)
bv3 = MAnalyse(super, isb = true, delta = 3, overlap=4)
fv3 = MAnalyse(super, isb = false, delta = 3, overlap=4)
MDegrain3(super,bv1,fv1,bv2,fv2,bv3,fv3,thSAD=400,planar=true)
Distributor()
The output is then fed to x264 with options like this:
Quote:
x264 --demuxer y4m - --tune film --preset veryslow --crf 22.0 --non-deterministic --profile high --level 4.0 --sar 1:1 --fps 24000/1001 --output "{target}.mp4"
The hardware is an Intel Core i7-970 CPU (6-cores/12-threads) with 16 GByte of memory and a large fast SSD for the source material.

Even though I realize that this script and these x264 settings are demanding, the hardware is able to handle them. Standard definition sources (i.e., my DVD collection) encode at around real-time speed (i.e., 20 to 30 fps) with reported 100% CPU usage.

The problem is high-definition sources (typically 1080p24 telecined to 1080i30). Of course I expect that processing such material will be much slower. What surprises me is the degree of slowdown. Rather than the encoding speed dropping by a factor of 6 (as you would expect from the increase in image size) to maybe 4 or 5 fps, I actually get only a little over 1 fps. What's more, there appears to be some sort of bottleneck, as total reported CPU usage, while highly variable, is typically only in the 25-35% range.
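
As a rough check on the factor of 6, here is a minimal Python sketch. The frame sizes are assumptions not stated in the post (720x480 for an NTSC DVD, 1920x1080 for HD), and the model that speed scales inversely with pixel count is only a first approximation.

```python
# Rough per-frame cost comparison between SD and HD material.
# Frame sizes are assumptions: 720x480 (NTSC DVD) and 1920x1080 (HD).
SD_WIDTH, SD_HEIGHT = 720, 480
HD_WIDTH, HD_HEIGHT = 1920, 1080


def pixel_ratio(w_small, h_small, w_large, h_large):
    """Return how many times more pixels the larger frame has."""
    return (w_large * h_large) / (w_small * h_small)


def expected_hd_fps(sd_fps, ratio):
    """Scale an SD encoding speed down by the pixel ratio (naive linear model)."""
    return sd_fps / ratio


ratio = pixel_ratio(SD_WIDTH, SD_HEIGHT, HD_WIDTH, HD_HEIGHT)
low, high = expected_hd_fps(20, ratio), expected_hd_fps(30, ratio)
print(f"pixel ratio: {ratio:.1f}x, expected HD speed: {low:.1f} to {high:.1f} fps")
```

Under these assumptions the naive estimate lands at roughly 3.3 to 5 fps, so a speed of a little over 1 fps is well below what frame size alone would explain.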

Any suggestions on how to fix this? Any reason that the CPU usage should be so low?
Old 24th July 2011, 16:21   #2  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Immediately after my original post, I tried one more experiment: rather than running dgindex with honored pull-down flags and letting the avisynth script do the teleciding, I forced film in dgindex and removed this line from the avisynth script:
Quote:
tfm().tdecimate(hybrid=1)
That made the problem go away. High-def encodes are still slow, but appropriately slow with 100% CPU usage.

Still, I would generally prefer to let the avisynth script make the teleciding decision on an as-you-go basis, so the result here may be worse than it would otherwise be. Is there any way to make that excised line work well on HD material with multithreading?
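
For reference, the frame-rate arithmetic behind this step can be sketched in Python. It assumes that decimation drops one frame in every five, the usual pattern for 3:2 pulldown; the function name and its parameters are illustrative only and are not filter arguments.

```python
from fractions import Fraction

# Frame rate of the telecined source when pulldown flags are honored.
TELECINED_FPS = Fraction(30000, 1001)


def decimated_fps(source_fps, cycle=5, removed=1):
    """Frame rate after dropping `removed` frames out of every `cycle` frames."""
    return source_fps * Fraction(cycle - removed, cycle)


film_fps = decimated_fps(TELECINED_FPS)
print(film_fps)         # exact rational frame rate
print(float(film_fps))  # approximate decimal value
```

The exact result, 24000/1001, is the same value passed to x264 with --fps in the first post.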
Old 24th July 2011, 18:45   #3  |  Link
Lyris
Registered User
 
Join Date: Sep 2007
Location: Europe
Posts: 602
Isn't it the combined effects of HD resolution + the many motion vector calculations being done by the script?

Try running the script in VirtualDub, hit Enter and view the stats - what sort of playback FPS do you get?

FWIW, on my ~3.8 GHz i7 machine, encoding a grainy BD source (from a file captured from HDCAM SR tape) chugs along at 0.6 fps during the very grainy scenes, on the veryslow preset. The quality makes it all worthwhile, though.
Old 24th July 2011, 19:07   #4  |  Link
tritical
Registered User
 
Join Date: Dec 2003
Location: MO, US
Posts: 999
Out of curiosity, how fast does each of the following run:

Code:
####### script 1
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)


####### script 2
SetMTMode(5,0)
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
Distributor() 

####### script 3
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)

####### script 4
SetMTMode(5,0)
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)
Distributor()
Old 24th July 2011, 19:21   #5  |  Link
Didée
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 5,391
For one, it might be worth using fewer threads in Avisynth. SetMTmode(5,0) will spawn 12 threads on that 6C12T CPU. That's problematic mainly because of the resource requirements. Additionally, x264 wants to encode at the same time, and will use 18 threads (by default) on its own. That's 30 threads altogether, but there are only 6 physical cores, and each thread calls for resources.
Something like SetMTmode(5,6) for Avisynth, with --threads 12 for x264, should be more graceful. (Though it's no hard science, and the "ideal" number of threads may vary.)
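
The thread arithmetic can be written out as a small Python sketch. The thread counts are the ones quoted in this post; the helper name is made up for illustration, and threads per core is only a crude indicator of contention.

```python
# Thread oversubscription estimate using the figures quoted in the post.
PHYSICAL_CORES = 6


def threads_per_core(avisynth_threads, x264_threads, cores=PHYSICAL_CORES):
    """Return (total threads, threads per physical core)."""
    total = avisynth_threads + x264_threads
    return total, total / cores


# SetMTMode(5,0) on a 6C/12T CPU plus x264's default thread count.
default_total, default_load = threads_per_core(12, 18)
# Suggested alternative: SetMTMode(5,6) with --threads 12.
tuned_total, tuned_load = threads_per_core(6, 12)

print(default_total, default_load)
print(tuned_total, tuned_load)
```

With the suggested settings the total drops from 30 threads to 18, that is, from 5 to 3 threads per physical core.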


What I discovered just today, while trying to reproduce the scenario:

No problems whatsoever when not using TFM.TDecimate. - But with TFM.TDecimate, I get crashes, crashes, crashes ... always "out-of-bounds memory access", and the modules appearing in Vdub's crash log always are MVTools2.dll (2.5.11.2) and TIVTC.dll (1.0.5). Again, no problem at all when without TFM.TDecimate ...
Script exactly like posted by CarlEdman, just without Colormatrix (irrelevant for evaluation) and without Distributor (running Vdub).
__________________
- We´re at the beginning of the end of mankind´s childhood -

My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!)
Old 25th July 2011, 03:21   #6  |  Link
Motenai Yoda
Registered User
 
Join Date: Jan 2010
Posts: 709
also try

Code:
function Degrain(clip c){
    o = c
    super = o.MSuper(planar=true)
    bv3 = super.MAnalyse(isb = true, delta = 3, overlap=4)
    bv2 = super.MAnalyse(isb = true, delta = 2, overlap=4)
    bv1 = super.MAnalyse(isb = true, delta = 1, overlap=4)
    fv1 = super.MAnalyse(isb = false, delta = 1, overlap=4)
    fv2 = super.MAnalyse(isb = false, delta = 2, overlap=4)
    fv3 = super.MAnalyse(isb = false, delta = 3, overlap=4)
    return o.MDegrain3(super,bv1,fv1,bv2,fv2,bv3,fv3,thSAD=400,planar=true)
}

DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)
mt("Degrain()",4,8)

Last edited by Motenai Yoda; 25th July 2011 at 03:25.
Old 25th July 2011, 17:50   #7  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Thanks to everybody for their thoughtful suggestions and ideas!

I do not have VirtualDub, so I was not able to try this, but I am highly confident that it was the tfm.tdecimate(hybrid=1) which caused the severe slowdown, not the inherently large demands of the task.

I have now tried several 1080i sources (all of which report 90-100% FILM in dgindex), either by (1) having dgindex honor pulldown flags and using tfm.tdecimate or (2) having dgindex force film and not using tfm.tdecimate.

In every single case, using (1), I observed encoding fps well below 1 fps, CPU load wildly fluctuating but hovering around 30%, and frequent random crashes (i.e., crashes which are not reproduced even when running exactly the same script on the same material again) similar to those reported by Didée.

In every single case, using (2), I observed encoding fps of about 4 fps, CPU load pretty much stuck at 100% (which is fine as the processes are all run at below normal priority, so they don't interfere with my other low-intensity uses of the computer), and no crashes.

So, if I found a workable workaround, why am I still complaining? Three reasons:
(1) Dgindex force-film is pretty good, but at least on material that is not quite 100% FILM, there are noticeable interlacing artifacts, which I've never observed with tfm.tdecimate(hybrid=1), which seems to do a smarter form of teleciding. In some frames with motion, force-film just clearly matches incorrect fields, while tfm.tdecimate(hybrid=1) either never seems to do that or hides these artifacts much better.

(2) Possibly related to (1), these interlacing artifacts from dgindex force-film have a pretty substantial impact on encoded size. Because tfm.tdecimate does not work, I've not been able to do a clean one-to-one comparison, but it appears that the encode size (with the same crf setting) is perhaps 20%-30% larger than I've come to expect from material of this type with these settings.

(3) Tfm.tdecimate still seems to work well on standard-definition material (with only relatively rare, random crashes) and produces very nice results, so I hate to give up on it.

Motenai Yoda: I will try the experiment with using MT, rather than SetMTMode, at the next opportunity and report results. It looks a little less elegant, but if it works, I'll be happy.
Old 25th July 2011, 20:10   #8  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Update on Motenai Yoda's proposed solution:
I tried what you suggested, but received instant crashes using avs2pipemod26. Then I tried avs2pipe (2.5.8) and feeding the avs script straight to the latest 64-bit x264. Those methods appeared to work reasonably (i.e., not crashing immediately and getting to about 3 fps with about 70-75% CPU utilization). Unfortunately, those attempts eventually crashed after encoding about 10,000 frames (while dgindex force-film without tfm.tdecimate never crashes).

Also, based on Avisynth's SetMTMode/MT description, I am a little concerned about the way MT works. While SetMTMode divides up the work between the processors one frame at a time, MT subdivides each frame. As I use MT to do the motion vector calculation, I wonder if that process can really be effective if each thread only looks at part of each frame, in particular if you have a lot of threads and slices, as appears necessary for MT to max out the CPU.
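
To make the concern concrete, here is a hypothetical Python model of slice-based splitting. It is not taken from MT's source code: it assumes that the 4 and 8 in mt("Degrain()",4,8) are the thread count and the overlap in pixels, that the frame is cut into equal horizontal bands, and that each band is padded by the overlap on both sides.

```python
def slice_rows(height, slices, overlap):
    """Split `height` rows into `slices` bands, each padded by `overlap` rows.

    Returns a list of (first_row, end_row) pairs with end_row exclusive.
    Hypothetical model, not taken from MT's source code.
    """
    base = height // slices
    bands = []
    for i in range(slices):
        start = i * base
        # The last band absorbs any remainder rows.
        end = height if i == slices - 1 else (i + 1) * base
        bands.append((max(0, start - overlap), min(height, end + overlap)))
    return bands


# The call mt("Degrain()",4,8) from the earlier post, applied to 1080 rows.
for first, end in slice_rows(1080, 4, 8):
    print(first, end, end - first)
```

In this model each thread sees fewer than 300 of the 1080 rows, so a motion search cannot follow an object across a band boundary by more than the overlap. More slices would make each band narrower still.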

I am starting to think that I need something as good as tfm.tdecimate, but more stable in a multiprocessing/high-def environment. QTGMC, according to testimonials and the clips I've seen, appears great at deinterlacing/denoising and can handle both interlaced and progressive input. But can it telecide accurately?
Old 25th July 2011, 20:53   #9  |  Link
tritical
Registered User
 
Join Date: Dec 2003
Location: MO, US
Posts: 999
Quote:
No problems whatsoever when not using TFM.TDecimate. - But with TFM.TDecimate, I get crashes, crashes, crashes ... always "out-of-bounds memory access", and the modules appearing in Vdub's crash log always are MVTools2.dll (2.5.11.2) and TIVTC.dll (1.0.5). Again, no problem at all when without TFM.TDecimate ...
Script exactly like posted by CarlEdman, just without Colormatrix (irrelevant for evaluation) and without Distributor (running Vdub).
Is this when not using multithreading as well, or only when using setmtmode? What if you just use tfm()?

I personally would not recommend using any mode of setmtmode with tdecimate... it can only make things slower and use a lot more memory.
Old 25th July 2011, 21:50   #10  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by Didée
without Distributor (running Vdub).
FYI - Vdub will apply "Distributor()" if it hasn't been called in the script.
Old 25th July 2011, 22:42   #11  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Quote:
Originally Posted by Groucho2004
Vdub will apply "Distributor()" if it hasn't been called in the script.
Is it smart enough not to apply it if it has been called in the script?
More likely, it always calls it if you are running MT Avisynth with SetMTMode switched on, as that is an easier condition to detect. In which case it would be called twice, with undesirable consequences.
__________________
GScript and GRunT - complex Avisynth scripting made easier
Old 25th July 2011, 23:01   #12  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by Gavino
Is it smart enough not to apply it if it has been called in the script?
More likely, it always calls it if you are running MT Avisynth with SetMTMode switched on, as that is an easier condition to detect. In which case it would be called twice, with undesirable consequences.
I tested this once and the number of created threads was the same with or without "Distributor()" in the script. It's probably the same code that x264 uses to determine if the call is necessary.
Old 26th July 2011, 00:57   #13  |  Link
Didée
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 5,391
Quote:
Originally Posted by Groucho2004
FYI - Vdub will apply "Distributor()" if it hasn't been called in the script.
I know that Distributor is applied automatically when VfW is used. That's why I did not use it in the script.

Also,
Quote:
Originally Posted by Groucho2004
I tested this once and the number of created threads was the same with or without "Distributor()" in the script.
this I cannot confirm. Script like posted below, without Distributor, opened in Vdub: Taskmanager reports 12 Threads by Vdub. Same script with Distributor: 48 Threads by Vdub. Big difference!


(@tritical)
Today, first I was struggling to reproduce yesterday's crashmania: everything rock stable...
But then I could narrow it down to a small syntax detail ...

This works:
Code:
setmtmode(5,6)
SetMemoryMax(1000)

mpeg2source("1080i.d2v")

tfm().tdecimate()

SetMTMode(2)
super = MSuper(planar=true)

... rest of MDegrain3 stuff ...

#distributor()
return(last)

In contrast, this crashes very early: (often immediately)
Code:
setmtmode(5,6)
SetMemoryMax(1000)

mpeg2source("1080i.d2v")

tfm.tdecimate()

SetMTMode(2)
super = MSuper(planar=true)

... rest of MDegrain3 stuff ...

#distributor()
return(last)
HARRRGH! I thought the empty-parentheses-thingy had been sorted long time ago!?!

This was with SEt's 2.60.MT (Aug 13 2009) version, btw. Didn't try the recent one, yet.


Bottom line: Problem solved. Just be a good boy and always write the parentheses.
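
A quick way to catch this in existing scripts might be a small check like the Python sketch below. It is a hypothetical helper, not an existing tool: the list of filter names has to be supplied by the caller, and it does not parse Avisynth syntax properly (string literals are not handled).

```python
import re


def find_bracketless_calls(script, filter_names):
    """Return the filter names used without parentheses, in order of appearance.

    `filter_names` is a caller-supplied list; this helper is hypothetical and
    only does a line-by-line text match.
    """
    if not filter_names:
        return []
    names = "|".join(re.escape(name) for name in filter_names)
    # A known filter name that is not followed by an opening parenthesis.
    pattern = re.compile(rf"\b({names})\b(?!\s*\()", re.IGNORECASE)
    hits = []
    for line in script.splitlines():
        code = line.split("#", 1)[0]  # drop Avisynth comments
        hits.extend(match.group(1) for match in pattern.finditer(code))
    return hits


FILTERS = ["tfm", "tdecimate"]
print(find_bracketless_calls("tfm().tdecimate()", FILTERS))
print(find_bracketless_calls("tfm.tdecimate()", FILTERS))
```

The first call finds nothing, while the second reports tfm, which is the spelling that crashed here.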
Old 26th July 2011, 01:05   #14  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by Didée
Also,
this I cannot confirm. Script like posted below, without Distributor, opened in Vdub: Taskmanager reports 12 Threads by Vdub. Same script with Distributor: 48 Threads by Vdub.
Odd. I'll have to try this again.
Old 26th July 2011, 08:51   #15  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by Gavino
Is it smart enough not to apply it if it has been called in the script?
More likely, it always calls it if you are running MT Avisynth with SetMTMode switched on, as that is an easier condition to detect. In which case it would be called twice, with undesirable consequences.
Ok, I was talking rubbish earlier. There doesn't seem to be a way in the Avisynth API to determine if the "Distributor" call is necessary other than calling it when an MT mode is active.
As you already pointed out in a post in the Avisynth development thread, it would be necessary to parse and evaluate the script itself in order to do this.
I wonder how many people are aware of this when they are encoding with Avisynth MT and x264.
Old 26th July 2011, 09:32   #16  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Quote:
Originally Posted by Didée
This works:
...
tfm().tdecimate()
...
In contrast, this crashes very early: (often immediately)
...
tfm.tdecimate()
...
HARRRGH! I thought the empty-parentheses-thingy had been sorted long time ago!?!

This was with SEt's 2.60.MT (Aug 13 2009) version, btw. Didn't try the recent one, yet.
The original empty-parenthesis problem (long ago) was that no cache was created for a function call without parentheses, slowing performance. This was fixed (coincidentally, by tritical) in Jan 2006, by this code in expression.cpp:
Code:
  // Add cache to Bracketless call of argless function
  if (result.IsClip()) { // Tritical Jan 2006
    return Cache::Create_Cache(result, 0, env);
  }
However, the fix was commented out in the original MT Avisynth (since a different form of cache would be needed), but without adding a suitable replacement. It is still that way in SEt's current source.

Quote:
Originally Posted by Groucho2004
I wonder how many people are aware of this when they are encoding with Avisynth MT and x264.
It's not a problem with x264, as it detects use of MT and adds Distributor(). The only time a user should put Distributor() in the script is when using a decoder that
a) uses Avisynth directly, rather than through VfW, and
b) is not 'MT-aware'.
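
The rule can be summarized as a small decision function. This is a Python sketch with made-up parameter names; the extra condition that SetMTMode must be active at all is an assumption added for completeness.

```python
def needs_distributor(mt_active, uses_vfw, client_is_mt_aware):
    """Return True if the script itself should end with Distributor().

    Parameter names are hypothetical; the rule is the one stated above.
    """
    if not mt_active:
        return False  # assumption: without SetMTMode there is nothing to distribute
    if uses_vfw:
        return False  # VfW clients such as VirtualDub get it automatically
    return not client_is_mt_aware  # e.g. x264 adds it on its own


# VirtualDub (VfW), x264 (direct and MT-aware), and a direct client that is not MT-aware.
print(needs_distributor(True, uses_vfw=True, client_is_mt_aware=False))
print(needs_distributor(True, uses_vfw=False, client_is_mt_aware=True))
print(needs_distributor(True, uses_vfw=False, client_is_mt_aware=False))
```

Only the last case returns True, which is the one situation where the call belongs in the script.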
Old 26th July 2011, 10:13   #17  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by Gavino
It's not a problem with x264, as it detects use of MT and adds Distributor(). The only time a user should put Distributor() in the script is when using a decoder that
a) uses Avisynth directly, rather than through VfW, and
b) is not 'MT-aware'.
Yes, but I've seen script examples here where the last line is
Code:
Distributor() #for multithreading
It seems that the creator of that script thinks that the call in the script is necessary no matter what the processing chain is.

Anyway, RTFM is recommended as usual.