Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
24th July 2011, 15:55 | #1 | Link | ||
Registered User
Join Date: Jan 2008
Posts: 185
|
Extremely Poor Performance on HD Material
I use the following script (in either avs2pipe or the various mod or 2.6 versions with an MT avisynth.dll):
Quote:
Quote:
Even though I realize that this script and these x264 settings are demanding, the hardware is able to handle them. Standard-definition sources (i.e., my DVD collection) encode at around real-time speed (i.e., 20 to 30 fps) with reported 100% CPU usage.
The problem is high-definition sources (typically 1080p24 telecined to 1080i30). Of course I expect that processing such material will be much slower. What surprises me is the degree of slowdown. Rather than the encoding fps dropping by a factor of 6 (as you would expect from the increase in image size) to maybe 4 or 5 fps, I actually get only a little over 1 fps. What's more, there appears to be some sort of bottleneck, as total reported CPU usage, while highly variable, is typically only in the 25-35% range.
Any suggestions on how to fix this? Any reason that the CPU usage should be so low? |
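As a rough check on the "factor of 6" expectation above, the following Python sketch works out the pixel-count ratio and the frame rate that would follow if encoding speed scaled linearly with frame area. The frame sizes (720x480 for an NTSC DVD, 1920x1080 for HD) are assumptions, since the post does not state them, and linear scaling is itself only a first approximation.

```python
# Assumed frame sizes; the post only says "standard definition" and "1080".
SD_WIDTH, SD_HEIGHT = 720, 480
HD_WIDTH, HD_HEIGHT = 1920, 1080


def pixel_ratio(w1, h1, w2, h2):
    """How many times more pixels the second frame size has than the first."""
    return (w2 * h2) / (w1 * h1)


def expected_fps(sd_fps, ratio):
    """Frame rate expected if speed were inversely proportional to pixel count."""
    return sd_fps / ratio


ratio = pixel_ratio(SD_WIDTH, SD_HEIGHT, HD_WIDTH, HD_HEIGHT)
low = expected_fps(20, ratio)   # from the 20 fps end of the reported SD range
high = expected_fps(30, ratio)  # from the 30 fps end of the reported SD range

print(f"pixel ratio: {ratio:.1f}")
print(f"expected HD speed: {low:.1f} to {high:.1f} fps")
```

Under these assumptions the estimate comes out at roughly 3 to 5 fps, so the reported "little over 1 fps" is several times slower than frame size alone would explain.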
||
24th July 2011, 16:21 | #2 | Link | |
Registered User
Join Date: Jan 2008
Posts: 185
|
Immediately after my original post, I tried one more experiment: Instead of running dgindex with honored pull-down flags and letting the avisynth script do the teleciding, I instead forced film in dgindex and removed this line from the avisynth script:
Quote:
Still, I generally would prefer to let the avisynth script make the teleciding decision on an as-you-go basis, so the result here may be less good than it would otherwise be. Is there any way to make that excised line work well on HD material with multithreading? |
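For readers unfamiliar with what the field-matching and decimation steps are undoing, here is a toy Python sketch of 2:3 pulldown and its inverse. It is an illustration only: frames are single letters, fields are compared by label rather than by image content, and all function names are made up for this example. It does not show how TIVTC or dgindex are actually implemented.

```python
def telecine_2_3(film):
    """2:3 pulldown: alternate 2 and 3 fields per film frame, then pair the
    fields into interlaced video frames as (top, bottom) tuples."""
    fields = []
    for i, frame in enumerate(film):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def match_fields(video):
    """Toy field matching: rebuild a progressive frame for each video frame by
    pairing its top field with the current or the previous bottom field."""
    matched = []
    prev_bottom = None
    for top, bottom in video:
        if top == bottom or top == prev_bottom:
            matched.append(top)      # a clean match exists
        else:
            matched.append(None)     # a real filter would have to deinterlace
        prev_bottom = bottom
    return matched


def decimate(frames, cycle=5):
    """Toy decimation: drop the first repeated frame in every cycle of 5."""
    out = []
    for start in range(0, len(frames), cycle):
        prev, dropped = None, False
        for frame in frames[start:start + cycle]:
            if not dropped and frame == prev:
                dropped = True       # this is the duplicate; skip it
            else:
                out.append(frame)
            prev = frame
    return out


film = list("ABCDEFGH")
video = telecine_2_3(film)                 # 8 film frames -> 10 video frames
restored = decimate(match_fields(video))   # back to 8 progressive frames
print(video)
print(restored)
```

In this model, forcing film at the indexing stage amounts to trusting the stream's pulldown flags, while field matching plus decimation re-derives the pattern from the fields themselves, which is why the latter can cope better with material that is not 100% film.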
|
24th July 2011, 18:45 | #3 | Link |
Registered User
Join Date: Sep 2007
Location: Europe
Posts: 602
|
Isn't it the combined effects of HD resolution + the many motion vector calculations being done by the script?
Try running the script in VirtualDub, hit Enter and view the stats - what sort of playback FPS do you get? FWIW, on my ~3.8 GHz i7 machine, encoding a grainy BD source (from a file captured from HDCAM SR tape) chugs along at 0.6 fps during the very grainy scenes, on the veryslow preset. The quality makes it all worthwhile though |
24th July 2011, 19:07 | #4 | Link |
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
|
Out of curiosity, how fast do each of the following run:
Code:
####### script 1
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)

####### script 2
SetMTMode(5,0)
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
Distributor()

####### script 3
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)

####### script 4
SetMTMode(5,0)
DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)
Distributor()
|
24th July 2011, 19:21 | #5 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
For one, it might be worth using fewer threads in Avisynth. SetMTMode(5,0) will spawn 12 threads on that 6C12T CPU. That's problematic mainly because of the resource requirements. Additionally, x264 wants to encode at the same time, and will use 18 threads (by default) on its own. That's 30 threads altogether, but there are only 6 physical cores, and each thread needs resources.
Something like SetMTMode(5,6) for Avisynth, with --threads 12 for x264, should be more graceful. (Though it's no hard science, and the "ideal" number of threads may vary.)
What I discovered just today, while trying to reproduce the scenario: no problems whatsoever when not using TFM.TDecimate. But with TFM.TDecimate, I get crashes, crashes, crashes ... always "out-of-bounds memory access", and the modules appearing in VDub's crash log are always MVTools2.dll (2.5.11.2) and TIVTC.dll (1.0.5). Again, no problem at all without TFM.TDecimate ... Script exactly as posted by CarlEdman, just without ColorMatrix (irrelevant for evaluation) and without Distributor (running VDub).
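The thread arithmetic in this post can be written out as a small Python sketch. The helper names are hypothetical. The rule that x264 picks 1.5 threads per logical CPU when --threads is left unset is assumed here because it reproduces the figure of 18 quoted in the post, and a thread count of 0 in SetMTMode is taken to mean one thread per logical CPU, as the post describes.

```python
PHYSICAL_CORES = 6   # the "6C" of the 6C12T CPU mentioned in the post
LOGICAL_CPUS = 12    # the "12T"


def avisynth_threads(requested, logical_cpus=LOGICAL_CPUS):
    """Thread count for SetMTMode(mode, requested); 0 is assumed to mean
    one thread per logical CPU."""
    return logical_cpus if requested == 0 else requested


def x264_threads(requested=None, logical_cpus=LOGICAL_CPUS):
    """Thread count for x264; None models leaving --threads unset
    (assumed default: 1.5 threads per logical CPU)."""
    return (logical_cpus * 3) // 2 if requested is None else requested


def total_threads(avs_requested, x264_requested=None):
    """Combined thread count of the Avisynth script and the encoder."""
    return avisynth_threads(avs_requested) + x264_threads(x264_requested)


default_total = total_threads(0)          # SetMTMode(5,0), x264 defaults
suggested_total = total_threads(6, 12)    # SetMTMode(5,6), --threads 12

print(default_total, default_total / PHYSICAL_CORES)
print(suggested_total, suggested_total / PHYSICAL_CORES)
```

The point of the comparison is the ratio of threads to physical cores rather than the totals themselves; as the post says, the ideal numbers are not hard science and may vary.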
__________________
- We´re at the beginning of the end of mankind´s childhood - My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!) |
25th July 2011, 03:21 | #6 | Link |
Registered User
Join Date: Jan 2010
Posts: 709
|
also try
Code:
function Degrain(clip c){
  o = c
  super = o.MSuper(planar=true)
  bv3 = super.MAnalyse(isb = true, delta = 3, overlap=4)
  bv2 = super.MAnalyse(isb = true, delta = 2, overlap=4)
  bv1 = super.MAnalyse(isb = true, delta = 1, overlap=4)
  fv1 = super.MAnalyse(isb = false, delta = 1, overlap=4)
  fv2 = super.MAnalyse(isb = false, delta = 2, overlap=4)
  fv3 = super.MAnalyse(isb = false, delta = 3, overlap=4)
  return o.MDegrain3(super,bv1,fv1,bv2,fv2,bv3,fv3,thSAD=400,planar=true)
}

DGDecode_mpeg2source("{source}.d2v", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{source}.d2v",interlaced=true)
tfm().tdecimate(hybrid=1)
mt("Degrain()",4,8)
Last edited by Motenai Yoda; 25th July 2011 at 03:25. |
25th July 2011, 17:50 | #7 | Link |
Registered User
Join Date: Jan 2008
Posts: 185
|
Thanks to everybody for their thoughtful suggestions and ideas!
I do not have VirtualDub, so I was not able to try this, but I am highly confident that it was the tfm.tdecimate(hybrid=1) which caused the severe slowdown, not the inherently large demands of the task. I have now tried several 1080i sources (all of which report 90-100% FILM in dgindex), either by (1) having dgindex honor pulldown flags and using tfm.tdecimate or by (2) having dgindex force film and not using tfm.tdecimate.
In every single case, using (1), I observed encoding speeds well below 1 fps, CPU load wildly fluctuating but hovering around 30%, and frequent random crashes (i.e., crashes which are not reproduced even when running exactly the same script on the same material again) similar to those reported by Didée.
In every single case, using (2), I observed encoding speeds of about 4 fps, CPU load pretty much stuck at 100% (which is fine as the processes are all run at below-normal priority, so they don't interfere with my other low-intensity uses of the computer), and no crashes.
So, if I found a workable workaround, why am I still complaining? Three reasons:
(1) Dgindex force-film is pretty good, but at least on material that is not quite 100% FILM, there are noticeable interlacing artifacts, which I've never observed with tfm.tdecimate(hybrid=1), which seems to do a smarter form of teleciding. In some frames with motion, force-film just clearly matches incorrect fields, while tfm.tdecimate(hybrid=1) never seems to do so, or hides these artifacts much better.
(2) Possibly related to (1), these interlacing artifacts from dgindex force-film have a pretty substantial impact on encoded size. Because tfm.tdecimate does not work, I've not been able to do a clean one-to-one comparison, but it appears that the encode size (with the same crf setting) is perhaps 20%-30% larger than I've come to expect from material of this type and these settings.
(3) Tfm.tdecimate still seems to work well (with only relatively rare, random crashes) on standard-definition material and produces very nice results, so I hate to give up on it.
Motenai Yoda: I will try the experiment with using MT, rather than SetMTMode, at the next opportunity and report results. It looks a little less elegant, but if it works, I'll be happy. |
25th July 2011, 20:10 | #8 | Link |
Registered User
Join Date: Jan 2008
Posts: 185
|
Update on Motenai Yoda's proposed solution:
I tried what you suggested, but received instant crashes using avs2pipemod26. Then I tried avs2pipe (2.5.8) and feeding the avs script straight to the latest 64-bit x264. Those methods appeared to work reasonably (i.e., not crashing immediately and getting to about 3 fps with about 70-75% CPU utilization). Unfortunately, those attempts eventually crashed after encoding about 10,000 frames (while dgindex force-film without tfm.tdecimate never crashes).
Also, based on Avisynth's description of SetMTMode/MT, I am a little concerned about the way MT works. While SetMTMode divides up the work between the processors one frame at a time, MT subdivides each frame. As I use MT to do the motion vector calculation, I wonder if that process can really be effective if each thread only looks at part of each frame, in particular if you have a lot of threads and slices, as appears necessary for MT to max out the CPU.
I am starting to think that I need something as good as tfm.tdecimate, but more stable in a multiprocessing/high-def environment. QTGMC, according to testimonials and the clips I've seen, appears great at deinterlacing/denoising and can handle both interlaced and progressive input. But can it telecide accurately? |
25th July 2011, 20:53 | #9 | Link | |
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
|
Quote:
I personally would not recommend using any mode of setmtmode with tdecimate... it can only make things slower and use a lot more memory. |
|
25th July 2011, 22:42 | #11 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
More likely, it always calls it if you are running MT Avisynth with SetMTMode switched on, as that is an easier condition to detect. In which case it would be called twice, with undesirable consequences. |
|
25th July 2011, 23:01 | #12 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
|
|
26th July 2011, 00:57 | #13 | Link | ||
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
Quote:
Also, Quote:
(@tritical) Today, at first I was struggling to reproduce yesterday's crashmania: everything rock stable... But then I could narrow it down to a small syntax detail ...
This works:
Code:
setmtmode(5,6)
SetMemoryMax(1000)
mpeg2source("1080i.d2v")
tfm().tdecimate()
SetMTMode(2)
super = MSuper(planar=true)
... rest of MDegrain3 stuff ...
#distributor()
return(last)
In contrast, this crashes very early (often immediately):
Code:
setmtmode(5,6)
SetMemoryMax(1000)
mpeg2source("1080i.d2v")
tfm.tdecimate()
SetMTMode(2)
super = MSuper(planar=true)
... rest of MDegrain3 stuff ...
#distributor()
return(last)
This was with SEt's 2.60.MT (Aug 13 2009) version, btw. Didn't try the recent one yet.
Bottom line: Problem solved. Just be a good boy and always write the parentheses.
|
||
26th July 2011, 08:51 | #15 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
As you already pointed out in a post in the Avisynth development thread, it would be necessary to parse and evaluate the script itself in order to do this. I wonder how many people are aware of this when they are encoding with Avisynth MT and x264. |
|
26th July 2011, 09:32 | #16 | Link | ||
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
Code:
// Add cache to Bracketless call of argless function
if (result.IsClip()) { // Tritical Jan 2006
  return Cache::Create_Cache(result, 0, env);
}
Quote:
a) uses Avisynth directly, rather than through VfW, and b) is not 'MT-aware'. |
||
26th July 2011, 10:13 | #17 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
Code:
Distributor() #for multithreading
Anyway, RTFM is recommended as usual. |
|
|
|