ISO Good 64-bit MT-compatible DeInterlacers and Inverse Teleciners [Archive]

CarlEdman

26th July 2011, 22:22

I am trying to integrate my transcoding of both SD and HD sources into the current 64-bit x264 in such a way that both the full power of the CPU (i7-970, 6 cores, 12 threads) and the available memory (24 GByte) are fully exploited. Piping 32-bit avs filters into x264 does not seem to be the best solution for these purposes.

I'm happy with my current source filter (dgdecode run under MTmode 5) and degrainer (MDegrain3 under MTMode 2), but almost everything I've tried for deinterlacing or inverse telecining (as appropriate for the source material) seems to come up short.

What I've tried so far is:

Teleciders:
(*) Tfm.tdecimate(hybrid=1): For years, this was my preferred telecider because it produced a nearly artifact-free progressive stream for virtually any hard or soft telecined video I could throw at it. Unfortunately, I have recently found it just too unstable in a 64-bit or multithreaded avisynth environment, even if it is run under a conservative MTMode. Virtually all multithreaded HD encodes are crashed or slowed to a crawl by tfm.tdecimate, and non-multithreaded HD encodes are just not a viable option speed-wise, in particular if you use a demanding (and multi-threading-happy) degrainer like MDegrain3.

(*) Force-Film in DGIndex. This works, but leaves rather awful interlacing artifacts in the output if even a few percent of the frames are interlaced. In contrast to tfm.tdecimate(hybrid=1), force film just does not seem to work very well unless the source material is perfectly 100% telecined, which unfortunately only a relatively small fraction is.

Deinterlacers:
(*) TomsMoComp. For years, I was pretty happy with the results, but it is fair to say that it loses too much detail compared to better deinterlacers (while producing very few artifacts). But there is no 64-bit version of it (as far as I know?), so it is out.

(*) QTGMC. Seems to produce gorgeous output on some test clips but (a) I've not been able to make all the plugins work under 64-bit avisynth and (b) it seems to be unnecessarily slow, in particular if you want it to perform an MDegrain3.

(*) LeakKernelDeint(). I gave up on this one once I realized that I had to specify parity explicitly for it and it refuses to just trust the avisynth settings.

(*) TDeint(mode=2, type=2 (or 3 for animation)). I am experimenting with this one right now and it seems to work pretty well.

So if anybody has any better suggestions for 64-bit deinterlacers and inverse teleciners (which are either actually capable of multi-threading or at least run in a reasonable amount of time while single-threaded together with other filters run in multi-threaded mode), I'd love to hear them.

SSH4

26th July 2011, 23:19

Heh, another 64bit victim.

It looks like 64-bit is useless for Avisynth as long as the Avisynth developers have not upgraded their systems to something new with more than 4 GB of RAM and newer CPUs.

but you can try this for ivtc
TFM(clip2=nnedi3(pscrn=false),cthresh=10,mi=80).tdecimate()
TFM(clip2=QTGMC(Preset="Very Slow",sharpness=0,ShowSettings=false,sourcematch=3,Lossless=2).selecteven(),cthresh=10,mi=80).tdecimate()
for example.
this will use nnedi or qtgmc only on combed frames
also you can tune cthresh/mi and other TFM settings.
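
As a rough mental model of what clip2 does in the two lines above, the sketch below (Python, not AviSynth) takes each output frame from the field-matched clip unless that frame is flagged as combed, in which case it takes the same frame from the fallback clip (nnedi3 or QTGMC output). The function name and the per-frame combed flags are hypothetical stand-ins for illustration; TFM's real combing detection, which cthresh and mi tune, is not modelled here.

```python
def replace_combed_frames(matched, fallback, combed):
    """Conceptual model only: take frame i from `fallback` when combed[i] is true."""
    if not (len(matched) == len(fallback) == len(combed)):
        raise ValueError("all three sequences must have the same length")
    return [fb if is_combed else m
            for m, fb, is_combed in zip(matched, fallback, combed)]


# Frames are represented by labels; only frame 2 is flagged as combed here.
matched = ["m0", "m1", "m2", "m3"]      # field-matched frames
fallback = ["d0", "d1", "d2", "d3"]     # frames from the deinterlaced clip
combed = [False, False, True, False]    # hypothetical detection result
print(replace_combed_frames(matched, fallback, combed))
```

The point of the arrangement is that the expensive deinterlacer only contributes the frames that field matching could not clean up.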

Didée

27th July 2011, 01:31

@Carl - So you're explicitly speaking about 64-bit Avisynth? And do the problems with TFM/TDecimate also refer to 64-bit?

As of now, the consensus on 64-bit Avisynth is this: it runs as long as it runs, and it crashes when it crashes. That's how it is, period.

Blue_MiSfit

27th July 2011, 09:56

If you want to effectively use all your computing resources, you need to do a lot of encodes in parallel. If you have a lot of content to process, just do a lot at once :)

If you're trying to do one encode as quickly as possible, split it up into several pieces (via multiple AVS scripts), and encode into a fast decoding lossless codec. Then, when you're done, stitch these together in a new avisynth script, and pipe this into x264.

Derek
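
The splitting step described above could be scripted along these lines. This is a minimal sketch, assuming the pieces are cut at arbitrary frame numbers of the script's output (not at chosen key frames) by appending a Trim() call to a copy of the common script; the function names are made up for illustration.

```python
def split_frame_ranges(total_frames, pieces):
    """Return `pieces` contiguous (first, last) inclusive frame ranges."""
    if pieces < 1 or total_frames < pieces:
        raise ValueError("need at least one frame per piece")
    base, extra = divmod(total_frames, pieces)
    ranges = []
    first = 0
    for i in range(pieces):
        # The first `extra` pieces get one additional frame each.
        length = base + (1 if i < extra else 0)
        ranges.append((first, first + length - 1))
        first += length
    return ranges


def segment_script(common_script, first, last):
    """Append a Trim() call so the script only renders frames first..last."""
    # AviSynth's Trim treats last_frame=0 as "through the end of the clip",
    # so a one-frame segment at frame 0 cannot be written as Trim(0, 0).
    if last == 0:
        raise ValueError("Trim(0, 0) would select the whole clip")
    return f"{common_script.rstrip()}\nTrim({first}, {last})\n"


# Example: an output clip of 90,000 frames split into 6 pieces.
for first, last in split_frame_ranges(90000, 6):
    print(first, last)
```

Each generated script can then be encoded to a lossless intermediate by its own process, and the intermediates joined in a final script that is piped into x264.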

TheFluff

27th July 2011, 13:40

Pretty much all filters are unstable under MT Avisynth. If you want stability, don't use it.

CarlEdman

27th July 2011, 13:42

@Didée: Unfortunately not. I get these constant, random tfm.tdecimate crashes on HD material (and more rarely on SD material) regardless of whether I run them within a 64-bit environment or use a 32-bit front-end filter like avs2pipe(mod)?(26)?.

The triggering factor seems to be heavy multi-threading, with high-definition as an aggravating factor. On my old quad-core, non-HT (i.e., 4 cores, 4 threads) CPU, these problems hardly ever appeared, though MDegrain3 on high-def material was slow as hell. My new i7-970 CPU (i.e., 6 cores, 12 threads, updated instruction set), for which I paid a premium principally in order to speed up x264 --veryslow encodes with MDegrain3, does appear to be at least 3 times faster on average. But I get all these crashes from my teleciders and deinterlacers now, unless I use settings that basically cripple my new CPU back to the speed of my old CPU.

@Blue_MiSfit: Thanks for the suggestion. I think it would work, but doing it right (e.g., first finding the best key-frames to split the file at), and building the infrastructure to permit running a large number of worker threads in parallel into my current custom build system (about 1,000 lines of Python which make conversion of DVDs, mpgs, VOBs, and mkvs into tagged/cover-arted/h264/AAC mp4s almost totally automatically) would be quite a bit of work. But that may just be what it comes to.

CarlEdman

27th July 2011, 15:25

Pretty much all filters are unstable under MT Avisynth. If you want stability, don't use it.

Not quite. My principal initial filters, DGDecode, ColorMatrix, and crop/autocrop work very reliably in 64-bit MT, as long as they are run under a conservative SetMTMode(5), which is fine as they are fast enough not to create a bottleneck even in that mode. And my principal post-processing filters, MDegrainX and related filters (under both 32-bit and 64-bit), work like a dream with respect to quality, speed, and stability under an aggressive SetMTMode(2).

It is just the deinterlacer/teleciders which are having problems.

TheFluff

27th July 2011, 15:51

Not quite. My principal initial filters, DGDecode, ColorMatrix, and crop/autocrop work very reliably in 64-bit MT, as long as they are run under a conservative SetMTMode(5), which is fine as they are fast enough not to create a bottleneck even in that mode. And my principal post-processing filters, MDegrainX and related filters (under both 32-bit and 64-bit), work like a dream with respect to quality, speed, and stability under an aggressive SetMTMode(2).

It is just the deinterlacer/teleciders which are having problems.

You do know that MT mode 5 is slower than not using MT at all, right? (It's because mode 5 does not, in fact, multithread anything. It's a single threaded compatibility mode used for filters that WILL always crash or do funny things if you attempt to multithread them.)

The vast majority of Avisynth plugins are extremely un-threadsafe, and Avs-MT itself is not exactly stable and well tested code. It's an ugly hack, really. If you want multithreaded Avisynth filtering, what Didée suggested is by far the simplest and most stable solution.

Didée

27th July 2011, 15:52

@Carl - I'm currently running a test with (basically) your script from here (http://forum.doom9.org/showthread.php?p=1515475#post1515475), but including my suggestion (http://forum.doom9.org/showthread.php?p=1515532#post1515532) of using fewer threads.

This script
setmtmode(5,6)
SetMemoryMax(1000)

mpeg2source("1080i.d2v")

tfm().tdecimate()

bs = 8 # Blocksize for MAnalyse

SetMTMode(2)
super = MSuper(planar=true)
bv1 = MAnalyse(super, isb = true, delta = 1, blksize=bs, overlap=bs/2)
fv1 = MAnalyse(super, isb = false, delta = 1, blksize=bs, overlap=bs/2)
bv2 = MAnalyse(super, isb = true, delta = 2, blksize=bs, overlap=bs/2)
fv2 = MAnalyse(super, isb = false, delta = 2, blksize=bs, overlap=bs/2)
bv3 = MAnalyse(super, isb = true, delta = 3, blksize=bs, overlap=bs/2)
fv3 = MAnalyse(super, isb = false, delta = 3, blksize=bs, overlap=bs/2)
MDegrain3(super,bv1,fv1,bv2,fv2,bv3,fv3,thSAD=400,planar=true)

return(last)

has now been running for 3 hours, and there's no sign of instability or slowdown so far, rendering at ~3.15 fps on an i7-860. In particular, the memory allocation is rock stable at ~1400 MB.

I suspect that you simply have memory issues from using too many threads for 1080-HD. Sometimes, less is more. ;)

Moreover, piping the script to x264 is definitely recommended, so that x264 memory allocation is independent from Avisynth. One single 2GB block for both of them can easily become too small, for such a script on fullHD.

All of this refers to 32-bit Avisynth, obviously. I've given up on making any "stability investigations" with 64-bit Avisynth.
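
To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. It assumes plain 8-bit YV12 frames with no row padding, takes 720x480 as a typical SD frame size, and treats the SetMemoryMax(1000) value as 1000 MiB; it ignores the extra MSuper data and whatever each thread keeps for itself, so real headroom is smaller.

```python
def yv12_frame_bytes(width, height):
    """Bytes in one 8-bit 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def frames_in_budget(budget_mib, width, height):
    """How many whole frames of this size fit in `budget_mib` mebibytes."""
    return (budget_mib * 1024 * 1024) // yv12_frame_bytes(width, height)


hd = yv12_frame_bytes(1920, 1080)   # one 1080-line HD frame
sd = yv12_frame_bytes(720, 480)     # assumed SD frame size
print("HD frame bytes:", hd)
print("SD frame bytes:", sd)
print("HD/SD size ratio:", hd / sd)
print("HD frames in a 1000 MiB budget:", frames_in_budget(1000, 1920, 1080))
```

An HD frame is several times the size of an SD frame, so a thread count that is harmless for SD can exhaust a 32-bit address space much sooner on 1080 material.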

BTW, unless you have a very specific reason to use a blocksize of 8, I'd really recommend using blocksize 16 instead. For 1080-ish content, 8 usually is too small (danger of "compensating the noise"), 16 usually fits better to such big framesizes. Also, the speed is much better: instead of ~3.15 fps, I get ~8.4 fps with blocksize 16. That's roughly 250% speed, along with more effective denoising.
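
One way to see where the speed difference comes from is to count blocks. The sketch below assumes the frame is tiled with overlapping blocks at a step of blksize minus overlap, which is my understanding of how MVTools lays them out; treat the exact formula as an assumption and the result as an estimate.

```python
def block_count(width, height, blksize, overlap):
    """Approximate number of overlapped blocks covering a frame."""
    step = blksize - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than blksize")
    blocks_x = (width - overlap) // step
    blocks_y = (height - overlap) // step
    return blocks_x * blocks_y


small = block_count(1920, 1080, blksize=8, overlap=4)
large = block_count(1920, 1080, blksize=16, overlap=8)
print("blocks at blksize 8:", small)
print("blocks at blksize 16:", large)
print("block count ratio:", round(small / large, 2))
print("measured fps ratio:", round(8.4 / 3.15, 2))
```

Blocksize 16 means about four times fewer blocks to search per frame, while the measured gain is smaller, presumably because the rest of the script (decoding, field matching, the per-block work itself) does not shrink in the same proportion.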

CarlEdman

27th July 2011, 17:21

You do know that MT mode 5 is slower than not using MT at all, right? (It's because mode 5 does not, in fact, multithread anything. It's a single threaded compatibility mode used for filters that WILL always crash or do funny things if you attempt to multithread them.)

I do know that. Fortunately, dgdecode, ColorMatrix, and crop are so fast (compared to MDegrain and x264 --veryslow) even running in a single thread that they never become bottlenecks.

If I could just find a reasonably good and fast telecider (or deinterlacer) which is stable when (a) the telecider (or deinterlacer) operates under MTMode 5, but (b) other parts of the avisynth script (e.g., MDegrain3) is running under MTMode 2, I'd be perfectly happy.

CarlEdman

27th July 2011, 17:28

I suspect that you simply have memory issues from using too many threads for 1080-HD. Sometimes, less is more. ;)

I have 24 GByte of memory and am (as far as the filters are available) running everything in 64-bit mode, making all of that RAM available to any single process, so I would hope that I am not running into memory problems.

BTW, unless you have a very specific reason to use a blocksize of 8, I'd really recommend using blocksize 16 instead. For 1080-ish content, 8 usually is too small (danger of "compensating the noise"), 16 usually fits better to such big framesizes. Also, the speed is much better: instead of ~3.15 fps, I get ~8.4 fps with blocksize 16. That's roughly 250% speed, along with more effective denoising.

Thank you! That sounds like excellent advice and I will adapt my build process to use blocksize 16 in MDegrain for high-definition content.

Didée

27th July 2011, 17:47

am (as far as the filters are available) running everything in 64-bit mode,

Which part of "64bit Avisynth is not stable" is so hard to understand?

CarlEdman

27th July 2011, 18:07

Which part of "64bit Avisynth is not stable" is so hard to understand?

The part where 64-bit avisynth works just fine, completely stable, with my usual filters for standard definition material.

johnmeyer

27th July 2011, 18:36

Have you done any tests to determine whether this quest is worth the time and cost? In particular, have you verified -- using short scripts that are stable -- that you are going to get a big enough boost in performance going from 32-bit to 64-bit to make the effort worthwhile? Using MT in a 32-bit environment can make huge differences in performance. By contrast, I haven't seen much evidence (although I haven't looked very hard either) that 64-bit AVISynth is going to yield huge performance gains compared to running the same (or similar) scripts using the 32-bit version.

And, as Didée has pointed out, lack of stability makes the whole question pretty easy to answer (at least for me): stay with 32-bit and get on with the next project.

CarlEdman

27th July 2011, 20:11

Well, Didée may be right in the end. I tried various test encodes using small clips, but 32-bit avs2pipemod to 64-bit x264 seems to be the way to go for now. According to these tests, he also got the thread number (6) right. I tried 4, which was a little slower, and 8, which was a little faster, but was unstable unless accompanied by SetMemoryMax(1000).

So right now, I'm encoding a full hour-length 1080i30 (really 95% or so telecined 24p) show using pretty much his settings (32-bit avisynth 2.58 MT, mode 5 with threads=6 up front, mode 2 for the MDegrain). After 3000 frames, so far, so good, though the CPU usage is only about 90% and frequently dips a fair bit lower. Will post the results when the entire show (90,000 frames) is done.

Blue_MiSfit

27th July 2011, 22:40

Interesting results! So, 32 bit avisynth with conservative MT, plus piping into 64 bit x264 for process separation (so avisynth can use all 2gb without competing with x264) seems reasonably stable eh?

Me gusta :)

Definitely let us know how that goes, and if you have issues with it in the future!

CarlEdman

29th July 2011, 13:53

Ok, here is the promised update. I have now transcoded most of the Game of Thrones first season (captured off my TiVo, to be archived in a more efficient format on my NAS) on my i7-970 (6 cores, 12 threads) with 24 GBytes of RAM running Windows 7.

I used the following AVS script:
SetMTMode(5,6)
DGDecode_mpeg2source("{d2vfile}", info=0, idct=4, cpu=3)
ColorMatrix(d2v="{d2vfile}",interlaced=true)
tfm().tdecimate(hybrid=1)
SetMTMode(2)
super = MSuper(planar=true)
bv1 = MAnalyse(super, isb = true, delta = 1, blksize=16, overlap=8)
fv1 = MAnalyse(super, isb = false, delta = 1, blksize=16, overlap=8)
bv2 = MAnalyse(super, isb = true, delta = 2, blksize=16, overlap=8)
fv2 = MAnalyse(super, isb = false, delta = 2, blksize=16, overlap=8)
bv3 = MAnalyse(super, isb = true, delta = 3, blksize=16, overlap=8)
fv3 = MAnalyse(super, isb = false, delta = 3, blksize=16, overlap=8)
MDegrain3(super,bv1,fv1,bv2,fv2,bv3,fv3,thSAD=400,planar=true)
Distributor()
The command line to do the transcode used was:
avs2pipemod -y4mp {avsfile} | x264[latest 64-bit version] --demuxer y4m - --tune film --preset veryslow --crf 22.0 --non-deterministic --profile high --level 4.0 --sar 1:1 --fps 24000/1001 --output {mp4file}
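
For anyone driving this from a script, the same pipeline could be launched from Python roughly as follows. This is a sketch, not the build system mentioned earlier in the thread: the function names and the x264_exe parameter are hypothetical, and the flags are simply copied from the command line above.

```python
import subprocess


def build_commands(avsfile, mp4file, x264_exe="x264"):
    """Return (producer_args, encoder_args) for avs2pipemod piped into x264."""
    producer = ["avs2pipemod", "-y4mp", avsfile]
    encoder = [
        x264_exe, "--demuxer", "y4m", "-",
        "--tune", "film", "--preset", "veryslow", "--crf", "22.0",
        "--non-deterministic", "--profile", "high", "--level", "4.0",
        "--sar", "1:1", "--fps", "24000/1001",
        "--output", mp4file,
    ]
    return producer, encoder


def run_pipeline(avsfile, mp4file, x264_exe="x264"):
    """Run both processes, connecting avs2pipemod's stdout to x264's stdin."""
    producer_args, encoder_args = build_commands(avsfile, mp4file, x264_exe)
    producer = subprocess.Popen(producer_args, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(encoder_args, stdin=producer.stdout)
    # Close our copy of the pipe so avs2pipemod notices if x264 exits early.
    producer.stdout.close()
    encoder_rc = encoder.wait()
    producer_rc = producer.wait()
    return producer_rc, encoder_rc
```

Because the two programs run as separate processes, the 32-bit Avisynth side and the 64-bit x264 side each get their own address space, which is the process separation discussed earlier in the thread.
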
The Good:
(*) Stability: Not a single crash so far.
(*) Quality: Almost no interlacing artifacts, even though the source material is only around 95% film. tfm.tdecimate(hybrid=1) does as good a job as ever.

The Meh:
(*) Speed: The script just doesn't use the full power of the machine. For some extended periods, CPU usage reaches 90% and encoding runs at 6 fps. But for other extended periods, CPU usage drops to 20% and encoding to about 2 fps. Clearly there is a single-thread bottleneck, probably tfm.tdecimate.
(*) Size: I have had to increase the crf from 20 (which I usually use for high-def material) to 22 and the video bit rate still is about 2,500 to 3,000 kbps, a bit more than I usually see for high-def material with crf 20. But that is most likely, if anything, an x264 issue or just the more demanding content. The results are still acceptably small and of excellent quality.
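
To translate those frame rates into wall-clock time, here is a quick estimate for the roughly 90,000 frames per episode mentioned earlier. It assumes a constant rate, which the encode clearly does not have, so the two figures only bracket the real duration.

```python
def encode_hours(frames, fps):
    """Hours needed to encode `frames` frames at a constant `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps / 3600.0


for fps in (6, 2):
    print(f"{fps} fps: {encode_hours(90000, fps):.1f} hours")
```

So an episode takes somewhere between about 4 and 12.5 hours, depending on how much of it runs in the slow phase.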

In sum, I do wish we had a nice, stable, multi-threading friendly, ideally 64-bit version of tfm.tdecimate, but this process will do for now.