View Full Version : TemporalDegrain problem with SetMTMode


Reginald0
22nd September 2013, 07:21
Hello, folks!

First, I would like to say that this post is a bit longer than usual, but the reason is to provide as much information as possible in order to help you help me :-).

I would like to fix some of the BluRay titles I acquired recently, since this kind of media seems to have a tendency toward heavy grain.
My goal is to back up all of my BluRay discs to local storage as MKV files, encoding the video to AVC with x264 at CRF 20, all other settings default.

I have been testing many methods of degraining, including high quality ones like MCTemporalDenoiser and MC_Spuds, but in my personal opinion, and for my specific needs, TemporalDegrain is the hands down winner.

As the docs state, this script function is very slow, so I'm trying to make it run a bit faster. The problem is that I couldn't find a way to make it work with the SetMTMode instruction: every attempt I made crashed the process as soon as the script was opened, and the error is always related to FFTW3.dll.
There are absolutely no errors when using this amazing function without SetMTMode, but the speed is barely above 1 fps.

All the details of my issue follow below:

Script sample:
SetMTMode(3, 0) # Also tried many different values for (mode, threads) here
DSS2("BluRay Title.m2ts") # Also tried with FFVideoSource (FFmpeg) and AVCSource (DGDecode)
SetMTMode(2)
TemporalDegrain()

VirtualDub error when trying to open the script:
An out-of-bounds memory access (access violation) occurred in module 'fftw3' writing address 00000020.

Media Player Classic error when trying to open the script:
Problem Event Name: APPCRASH / Fault Module Name: fftw3.dll

AviSynth versions tried:
2.5.8 MT and 2.6.0 MT with the most recent avisynth.dll, both 32-bit.

FFTW3.dll versions tried in the SysWOW64 folder:
old 30.01.2004 (available at avisynth.org site) and newest 25.11.2012 (available at fftw.org site)

Relevant machine specs:
Intel i7 3770 3.4 GHz (quad-core, hyper-threading), 16 GB RAM DDR3 1600 MHz

Operating System:
Windows 7 SP1 x64

Considering my machine is fairly good, I would like to get at least 3 fps, but when I tried the TemporalDegrain_MT wrapper script with AviSynth 2.5.8 (which uses the MT instruction instead of SetMTMode), I only got around 1.7 fps, which is far from my expectations.

I know I can gain speed by increasing the blksize value or lowering the degrain parameters, but in my tests this causes considerable quality loss, so I want to avoid it.

If you can help me solve this issue and need more details about it, feel free to ask. I would be very grateful.

Thanks in advance,

RegiOween

Didée
22nd September 2013, 19:57
Have you tried setting "HQ=0"? Not sure if it's related to your problem, however ... HQ>0 does use the HQDN3D() filter, and that one cannot work correctly with SetMTmode, because it uses internal recursion, which necessarily fails under SetMTmode. Perhaps it also causes false cachehints, or something.

Moreover, when using Avisynth'MT, I suggest to *always* set a SetMemoryMax(xxx) in the 1st script line. At least some versions of Avisynth'MT have a bug that disables the automatic memory limiting.

Reginald0
23rd September 2013, 06:57
Hello, Didée!

I would like to say that I'm really proud to have my post answered first by one of the most important members of this forum, the living legend who has his name carved in such amazing functions as TemporalDegrain, LimitedSharpen and TempGaussMC. Your contributions to this community will last forever...

About your wise suggestions, I tried both: first adding the instruction SetMemoryMax(4096), since I know 32-bit applications can't address memory above 4 GB, and then also hq=0. The FFTW3.dll problem persists, but at least there is some news:

I have updated AviSynth to the newest official version (2.6.0 alpha 5 - 2013.09.18), updated the MT DLL to the newest version (2013.03.09), and tried different versions of FFTW3.DLL in the C:\Windows\SysWOW64 folder. The error message changes depending on the version used:

FFTW3.DLL (2004.01.30):
Avisynth open failure: access violation at 0x0004B0BF in C:\Windows\System32\fftw3.dll, attempting to write to 0x00000000.

FFTW3.DLL (2009.07.18):
Avisynth open failure: access violation at 0x0004232C in C:\Windows\System32\fftw3.dll, attempting to write to 0x00000020.

FFTW3.DLL (2012.11.25):
Avisynth open failure: access violation at 0x0004B0BF in C:\Windows\System32\fftw3.dll, attempting to write to 0x00000020.

Note that the error message points to a file in the System32 folder, but the actual FFTW3.DLL file resides in the SysWOW64 folder. The proof is that if I rename this file, the error message changes:
AviSynth open failure: FFT3DFilter: Can not load FFTW3.DLL!

I'm just about to go crazy with this error, and I suspect it is something related to my Windows installation, so I'm going to try the bazooka alternative: do a clean install of Windows on another hard drive with just AviSynth and the files TemporalDegrain requires, and see what happens. I'll post here later when I have some conclusion about it.

Meanwhile, many thanks for your attention.

Didée
23rd September 2013, 08:38
Try a much lower value like SetMemoryMax(600). The barrier to avoid is 2GB, because that is the usual limit for 32bit apps (leaving out LAA hacks/tricks).

Problem is that
a) TemporalDegrain will grab a huge amount of resources (because it uses a 3+3 MDegrain on an already 3+3 MDegrain'ed clip). Temporal chaining of MDegrain eats resources for breakfast! Faster than you can say "d'oh"!

b) The resources consumed by a) will multiply when the source is full-HD.

c) The resources multiplied by b) will multiply yet again when SetMTmode is used.

Most probably, everything still fits (more or less barely) into 2GB memory when running standard single-threading, but when multithreading is used, the 2GB memory limit is exceeded before Avisynth manages to deliver the first frame.
Probably this can be avoided by using a SetMemoryMax value that is small (small!) enough to keep the whole processing chain under 2GB. BUT, it is quite likely that this will make the processing chain inefficient enough (cache misses > re-calculations, and so on) that the whole thing won't run much faster than standard single-threading.
In any case, you should manually specify a lower thread count in the 1st SetMTMode call - 2, 3 or 4 threads. If you have e.g. a quadcore with HT (8 logical cores), the "0" default will create 8 MT threads, which is insane for the case of [FullHD]+[ComplexScript].
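
Put together, the suggestions so far might look like the sketch below. It is untested and only an assumption about how the pieces combine: the value 600 and the thread count 4 are simply picked from the ranges mentioned above, the file name and source filter are taken from the first post, and nothing here guarantees that the fftw3 crash goes away.

```avisynth
SetMemoryMax(600)             # first line of the script; lower it further if the crash persists
SetMTMode(3, 4)               # explicit thread count (2, 3 or 4) instead of the "0" default
DSS2("BluRay Title.m2ts")     # source filter and file name as in the first post
SetMTMode(2)
TemporalDegrain(hq=0)         # HQ=0 avoids the HQDN3D() stage mentioned earlier in the thread
```

If this still fails to deliver the first frame, the more promising knobs are probably a smaller SetMemoryMax value or fewer threads, at the cost of speed.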

If it absolutely has to be done, then the usual workaround is to run fully separate processes - either splitting the frames spatially, or splitting the source into segments ... then run as many pieces in parallel as system memory allows, save all the pieces to temporary files (lossless or quasi-lossless), and afterwards serve all pieces together and make the final encoding.
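
A rough sketch of the segment variant, under stated assumptions: the three-way split, the script and AVI file names, and the variable `third` are all made up for illustration, and each segment is assumed to be saved to an AVI with a lossless codec of your choice. Trim comes after TemporalDegrain, so the temporal filter can still request frames beyond the segment boundary.

```avisynth
# Three segment scripts plus one final script, shown in one listing.
# As written, this listing is segment_1.avs; the other files are commented out.

# --- segment_1.avs ---
SetMemoryMax(600)
DSS2("BluRay Title.m2ts")
TemporalDegrain()
third = FrameCount / 3        # integer division on the filtered clip
Trim(0, third - 1)            # first third of the video

# --- segment_2.avs: same as above, except for the last line ---
# Trim(third, 2 * third - 1)

# --- segment_3.avs: same as above, except for the last line ---
# Trim(2 * third, 0)          # last_frame = 0 means "up to the end"

# --- final.avs: serves the saved pieces as one clip for the final encode ---
# AviSource("segment_1.avi") ++ AviSource("segment_2.avi") ++ AviSource("segment_3.avi")
```

Each segment script runs in its own process, so each one gets its own 32-bit address space; how many can run side by side depends on system memory.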

Big hassle? Yes it is. But it just can't be helped. Bottom line is that TemporalDegrain, especially with full 3+3 frame searching, is just too complex to be used on full-HD clips.

Reginald0
23rd September 2013, 23:56
Hello, Didée!

Thank you very much for your clear and valuable explanation about the whole TemporalDegrain process, so now I can understand the issues and limitations in multi-threading such a complex function like yours.

All things considered, I think I will give up on multi-threading and focus on multi-tasking instead. I have to run some tests to check whether my little baby can handle maybe four or five x264 sessions at the same time without becoming unstable. This wouldn't get each job done faster, as I would like, but at least it will use the resources in a more productive way.

Once again, many thanks for your attention and knowledge, you're a real pro, and congratulations on your awesome contributions to this community, keep it up!

RegiOween

Reginald0
26th September 2013, 21:17
Hello, folks!

Just to update the topic with more relevant info: after many hours of reading, testing, comparing, etc., I finally managed to get a satisfying result, combining good performance with high quality output.
As I said before, until this multi-threading thing gets more mature in AviSynth, I prefer to focus on multi-tasking to get the best out of my machine's resources.

The method I chose is the following:

# Script_Part_1.avs

DSS2("BluRay Title.mts") # DSS2 is very fast and reliable for me, unlike native DirectShowSource
TemporalDegrain() # Defaults are pretty much ok for me, but if I want to tweak, I would
# play with degrain, sad1 and sad2 parameters.
LSFmod(defaults="slow") # A limited sharpening after degraining looks good for me, and
# the "slow" preset gives much better quality
Trim(0, FrameCount/2) # Get the first half of the video

# Script_Part_2.avs

# The same script above, except for the last line:
Trim((FrameCount/2)+1, 0) # Get the second half of the video

Then I enqueue both scripts in my favourite front-end (MeGUI), which can run both serial and parallel jobs, and start a parallel encode using x264 with CRF 20 (a very good balance between quality and file size), preset "very slow" (much slower indeed, but much better compression), and all other settings default.

The two jobs run at about 4.8 fps combined (2.4 fps each), so in the end I get much more than what I wanted to achieve in the first place (3 fps). The 4 cores of my CPU are running at full load, so I see no reason to start more than 2 jobs. When the jobs finish, I just merge both parts together with the audio track in an MKV container, and it's done.

Note 1: To keep my machine running at full load for many hours without becoming unstable, I had to disable hyper-threading in my BIOS setup; otherwise x264 crashed after some time, and on a few occasions I even got a BSOD.

Note 2: I often use the term "for me" in my sentences on purpose, so opinions may vary.

I hope this little experiment can help other users. If you have any suggestions or advice, or just want to comment on something, you're very welcome.

RegiOween