MVTools, Depan, DepanEstimate for VapourSynth [Archive]

jackoneill

21st September 2014, 21:36

So I thought, "Why not port MVTools?"

https://github.com/dubhater/vapoursynth-mvtools/releases

readme.rst (https://github.com/dubhater/vapoursynth-mvtools/blob/master/readme.rst)

Now includes ports of Depan and DepanEstimate.

Crashes or corrupted output may happen, as I haven't done very extensive testing.

Reel.Deel

21st September 2014, 21:49

Awesome! :)

Thanks for your efforts.

Mystery Keeper

22nd September 2014, 00:22

feisty2

22nd September 2014, 03:33

dope work, thx for ur support to vapoursynth, I'm tryna have a taste of new vapoursynth lately

Myrsloik

22nd September 2014, 13:32

Myrsloik was against it, but since we've got no better alternative, THANK YOU! With native motion compensation VapourSynth can finally have all the best filters without the need to use AviSynth plugins!

I was mostly against touching that code myself and trying to clean it up. It's obvious that jackoneill can tolerate worse code than me.

Now it's time for some VapourSynth world domination!

jackoneill

22nd September 2014, 19:41

v1.1 fixes a crash and brings back the "dct" parameter to Analyse and Recalculate. Now you can make your scripts really slow.

lansing

23rd September 2014, 03:18

great news, have been waiting for this who knows how long.

I did some benchmarks with the sample script for mdegrain2 in the documentation,
AVISource("c:\test.avi") # or MPEG2Source, DirectShowSource, some previous filter, etc
super = MSuper(pel=2, sharp=1)
backward_vec2 = MAnalyse(super, isb = true, delta = 2, overlap=4)
backward_vec1 = MAnalyse(super, isb = true, delta = 1, overlap=4)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, overlap=4)
forward_vec2 = MAnalyse(super, isb = false, delta = 2, overlap=4)
MDegrain2(super, backward_vec1,forward_vec1,backward_vec2,forward_vec2,thSAD=400)

# VapourSynth counterpart; src is the already-loaded source clip
super = core.mv.Super(src)
mvbw2 = core.mv.Analyse(super, isb=True, delta=2, overlap=4)
mvbw = core.mv.Analyse(super, isb=True, delta=1, overlap=4)
mvfw = core.mv.Analyse(super, isb=False, delta=1, overlap=4)
mvfw2 = core.mv.Analyse(super, isb=False, delta=2, overlap=4)

out = core.mv.Degrain2(clip=src, super=super, mvbw=mvbw, mvfw=mvfw, mvbw2=mvbw2, mvfw2=mvfw2, thsad=400)

CPU can be loaded to 100%. The 64 bit version is about 20% faster than the 32 bit version: with a 720x480 source, 64 bit gave me 60fps while 32 bit gave me 50fps.
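
A minimal sketch of the arithmetic behind the "about 20% faster" figure, using the 60fps and 50fps numbers quoted above. The function name speedup_percent is made up for illustration and is not part of any tool mentioned in this thread.

```python
def speedup_percent(fps_new, fps_base):
    """Return how much faster fps_new is than fps_base, in percent."""
    if fps_base <= 0:
        raise ValueError("fps_base must be positive")
    return (fps_new / fps_base - 1.0) * 100.0


# 64 bit (60fps) compared with 32 bit (50fps), figures from the post above.
print(round(speedup_percent(60, 50), 1))
```

The same helper can be pointed at any pair of fps figures from the later benchmark posts.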

Keiyakusha

23rd September 2014, 07:54

On i7 8 threads, the same script as above looks something like this for me:

Vapoursynth x86, d2vsource, vapoursynth-mvtools:
threads=4 -> 27fps, 50% CPU
threads=8 -> 40fps, 100% CPU

AVS+ x86 MT, mpeg2source, mvtools SVP:
Prefetch(4) -> 42fps, 50% CPU
Prefetch(8) -> 57fps, 100% CPU

Edit:
AVS 2.6 x86 MT, mpeg2source, mvtools SVP:
4 threads -> 40fps, 42% CPU
8 threads -> 55fps, 93% CPU

Edit2:
Vapoursynth x86, d2vsource, mvtools SVP (avs plugin):
threads=4 -> 41fps, 50% CPU
threads=8 -> 42fps, 60% CPU (wow, this one sucks >__<)

Anyway, to sum this up, for me the vapoursynth-mvtools port is way slower than any other solution.

Are_

23rd September 2014, 11:54

This is on an AMD cpu 8 cores.
MPEG2Source was used on avisynth, and d2vsource for vapoursynth, source was a VOB file from a DVD and used the sample code from lansing:

# AVISYNTH 2.6 MT - MVTOOLS SVP
Frames processed: 2501 (0 - 2500)
FPS (min | max | average): 34.21 | 82.74 | 51.78
CPU usage (average): 94%
Thread count: 10
Physical Memory usage (peak): 585 MB
Virtual Memory usage (peak): 733 MB
Time (elapsed): 000:00:48.304

# VAPOURSYNTH WIN32
Output 2500 frames in 67.32 seconds (37.13 fps)

# VAPOURSYNTH LINUX x86_64 (-march=native -O2)
Output 2500 frames in 55.85 seconds (44.76 fps)

# VAPOURSYNTH LINUX x86_64 (-march=native -Ofast -lto)
Output 2500 frames in 54.89 seconds (45.55 fps)

# VAPOURSYNTH LINUX x86_64 (retarded compiler flags optimizations)
Output 2500 frames in 53.39 seconds (46.83 fps)

All vapoursynth tests were topping the cpu at 100%.

I don't know why avisynth scored so poorly for me compared to Keiyakusha. :/

DarkSpace

23rd September 2014, 12:15

I guess that the difference comes from the fact that Keiyakusha used the SVP MVTools, while the Dither MVTools are probably (does anyone know for certain?) based on the original MVTools2 rather than the modified SVP MVTools.

Keiyakusha

23rd September 2014, 12:16

I don't know why avisynth scored so poorly for me compared to Keiyakusha. :/
Looks like you used MT incorrectly in Avisynth. It reports threadcount 8 and cpu load 33%, but this threadcount is a total across all the processes involved, which means the mvtools processing was probably using far fewer threads.
Edit: well anyway, maybe the avsmeter author can explain what this value really shows, but in my case the number of threads was about twice what you have, even when avisynth was configured for 4 threads.
Also, in any multithreaded environment, mvtools svp should be noticeably faster than the one from the dither package (assuming the latter runs without avstp). However, in a singlethreaded environment SVP is slower than the alternatives.

Edit: in case of AVS+ with the script above, my threadcount reads 20 for Prefetch4 and 24 for Prefetch8. Have no time to install AVS2.6 again but I expect it to be similar.

Are_

23rd September 2014, 12:40

Ok my bad, I didn't know that setting mtmode halfway through the script does not work.

Updated results in the original post.

TurboPascal7

23rd September 2014, 12:51

Edit: in case of AVS+ with the script above, my threadcount reads 20 for Prefetch4 and 24 for Prefetch8. Have no time to install AVS2.6 again but I expect it to be similar.
More info on this here (http://forum.doom9.org/showpost.php?p=1668266&postcount=629). It has nothing to do with avsmeter, just avs+ being buggy.

Keiyakusha

23rd September 2014, 13:02

More info on this here (http://forum.doom9.org/showpost.php?p=1668266&postcount=629). It has nothing to do with avsmeter, just avs+ being buggy.

Even with avs 2.6 MT avsmeter displays more threads than the number we set in "setmtmode". For whatever reason.
But yeah, I wondered why I have the whole 20+ threads. This explains it!

jackoneill

23rd September 2014, 13:41

Anyone want to compare this version's speed when using a single thread and the original Avisynth plugin (2.5.11.3) running in a non-MT Avisynth?

Myrsloik

23rd September 2014, 13:48

Anyone want to compare this version's speed when using a single thread and the original Avisynth plugin (2.5.11.3) running in a non-MT Avisynth?

Original meaning the dither tools version of mvtools, right?

jackoneill

23rd September 2014, 14:20

Original meaning the dither tools version of mvtools, right?

No, original is this: http://avisynth.org.ru/mvtools/mvtools-v2.5.11.3.zip. 8 bit and no internal multithreading.
The one from Dither is a fork of an earlier version of this original.

foxyshadis

23rd September 2014, 19:51

Single threaded performance on my Haswell laptop, 720x480 video, mvdegrain2 script above:
mvtools 2.5: 9.7fps
mvtools 2.6: 20.6fps
vsmvtools 32: 21.0fps
vsmvtools 64: 23.9fps

8-threaded performance:
vsmvtools 32: 36.6
vsmvtools 64: 41.2

Will edit in with avisynth threaded performance when I get hold of it again.

lansing

23rd September 2014, 22:44

720x480 source,

non-mt avisynth mvtools2 original: 12.89fps

single thread vapoursynth:
32 bit: 11.77fps
64 bit: 15fps

Keiyakusha

24th September 2014, 08:58

720x480 source. All used plugins are the latest versions.

AVS 2.6 Alpha5, mpeg2source:
vanilla-mvtools: CPU ~15%; 10 fps
svp-mvtools: CPU ~15%; 12.3 fps
dither-mvtools: CPU ~15%; 9.3 fps (without avstp)

Vapoursynth x86, d2vsource:
vapour-mvtools: CPU ~15%; 8.75 fps (threads=1)

Edit: also it might be useful to note that without any processing, d2vsource is 2 times faster than mpeg2source (1000+ fps) so it can't cause any fps drop.

Reel.Deel

24th September 2014, 14:58

A while back I did some speed tests with different versions of MVTools. Results were similar to Keiyakusha's; MVTools from SVP (http://svp-team.com/wiki/Plugins:_MVTools2) is faster than the rest. On their website they claim MAnalyse to be faster. Maybe jackoneill can include this difference?

MAnalyse
Can be faster than original version (with chroma=true) by 20-40%, look at PlaneOfBlocks.h for changes.

Bloax

24th September 2014, 15:55

Mmm, sounds like I'll have to replace Avisynth soon.
Great news!

Are_

24th September 2014, 21:58

Some more tests, mvdegrain2 script above:

### 1 thread ###
## 720x480p ##
# AVISYNTH 2.6 Beta 5 - vanilla mvtools :: 7.21 fps
# VAPOURSYNTH WINDOWS 32bit :: 6.46 fps
# VAPOURSYNTH WINDOWS 64bit :: 7.99 fps
# VAPOURSYNTH LINUX 64bit :: 7.92 fps

## 1920x1080p ##
# AVISYNTH 2.6 Beta 5 - vanilla mvtools :: 1.09 fps
# VAPOURSYNTH WINDOWS 32bit :: 0.93 fps
# VAPOURSYNTH WINDOWS 64bit :: 1.09 fps
# VAPOURSYNTH LINUX 64bit :: 1.18 fps

### 8 threads ###
## 720x480p ##
# AVISYNTH 2.6 MT - vanilla mvtools :: 40.38 fps
# VAPOURSYNTH WINDOWS 32bit :: 37.24 fps
# VAPOURSYNTH WINDOWS 64bit :: 45.66 fps
# VAPOURSYNTH LINUX 64bit :: 47.06 fps

## 1920x1080p ##
# AVISYNTH 2.6 Beta 5 - vanilla mvtools :: 6.08 fps
# VAPOURSYNTH WINDOWS 32bit :: 4.95 fps
# VAPOURSYNTH WINDOWS 64bit :: 5.78 fps
# VAPOURSYNTH LINUX 64bit :: 6.53 fps

Groucho2004

25th September 2014, 08:51

Even with avs 2.6 MT avsmeter displays more threads than the number we set in "setmtmode". For whatever reason.
But yeah, I wondered why I have the whole 20+ threads. This explains it!
AVSMeter displays the number of threads spawned by avisynth and all loaded modules (DLLs). If a module spawns multiple threads itself (in combination with AVSTP, for example), they will of course be added to the thread count.

Mystery Keeper

25th September 2014, 11:05

Concerning the speed: building it with MSVC would probably help.

jackoneill

28th September 2014, 12:31

v2.0 is out (https://github.com/dubhater/vapoursynth-mvtools/releases).

The last two filters used by QTGMC are now available. I'm curious how these two perform compared to the original Avisynth plugin, because I replaced some inline asm with C code.

If you're feeling grateful for this and other VapourSynth ports, maybe buy me an ebook (https://gist.github.com/dubhater/12a6af383dd006999ba3).

Mystery Keeper

28th September 2014, 13:39

What a glorious day! Thank you very much for your hard work!

Reel.Deel

28th September 2014, 15:48

Good day indeed :). Is there an accurate way to measure speed/performance in VS?

Keiyakusha

28th September 2014, 16:01

Good day indeed :). Is there an accurate way to measure speed/performance in VS?

vspipe?
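
For timing from inside a Python script rather than through vspipe, a rough sketch could look like the following. It assumes only that frames can be requested one at a time through a callable; measure_fps is a made-up helper name, and passing a clip's get_frame method is shown as hypothetical usage in the comment.

```python
import time


def measure_fps(get_frame, num_frames):
    """Request frames 0..num_frames-1 in order and return frames per second."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    start = time.perf_counter()
    for n in range(num_frames):
        get_frame(n)  # blocks until frame n has been produced
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the timer on trivially fast callables.
    return num_frames / elapsed if elapsed > 0 else float("inf")


# Hypothetical usage with a VapourSynth clip named `out`:
#   fps = measure_fps(out.get_frame, 2500)
```

Because this requests one frame at a time, it will probably report lower figures than vspipe for multithreaded scripts, so it is better suited to single thread comparisons.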

---

BTW, with the script posted on the 1st page, speed is about the same for the 2.0 build. But it doesn't use the "last two filters". Still, QTGMC overall was slower with vapour-mvtools, so with these filters the speed will probably be the same as before at best. This means we're still somewhat dependent on the avisynth plugin in the case of x86 architecture.

Are_

28th September 2014, 22:37

Here we go.
Source was 720x480p. MPEG2Source for avisynth, d2vsource for vapoursynth.

super = core.mv.Super(src)
mvbw = core.mv.Analyse(super, isb=True, delta=1, overlap=4)
mvfw = core.mv.Analyse(super, isb=False, delta=1, overlap=4)
out = core.mv.FlowBlur(clip=src, super=super, mvbw=mvbw, mvfw=mvfw, blur=100)

1 thread
# AVISYNTH 2.6 Beta 5 - vanilla mvtools
11.60 fps

# VAPOURSYNTH WINDOWS 32bit
9.92 fps

# VAPOURSYNTH WINDOWS 64bit
11.80 fps

# VAPOURSYNTH LINUX 64bit
12.00 fps

8 threads
# AVISYNTH 2.6 MT - vanilla mvtools
57.76 fps

# VAPOURSYNTH WINDOWS 32bit
55.96 fps

# VAPOURSYNTH WINDOWS 64bit
65.95 fps

# VAPOURSYNTH LINUX 64bit
69.11 fps

super = core.mv.Super(src)
vectors = core.mv.Analyse(super, isb=False, delta=1, overlap=4)
out = core.mv.Mask(src, vectors)

1 thread
# AVISYNTH 2.6 Beta 5 - vanilla mvtools
27.81 fps

# VAPOURSYNTH WINDOWS 32bit
24.22 fps

# VAPOURSYNTH WINDOWS 64bit
28.46 fps

# VAPOURSYNTH LINUX 64bit
31.01 fps

8 threads
# AVISYNTH 2.6 MT - vanilla mvtools
108.77 fps

# VAPOURSYNTH WINDOWS 32bit
130.80 fps

# VAPOURSYNTH WINDOWS 64bit
151.90 fps

# VAPOURSYNTH LINUX 64bit
168.53 fps

Revgen

28th September 2014, 23:20

Thank you jackoneill!

jackoneill

1st October 2014, 16:53

:goodpost:

May I request that MFlowInter be ported as well, which is used by Firesledge's ivtc_txt60mc (http://forum.doom9.org/showthread.php?p=1466105#post1466105) function. :thanks:

ivtc_txt60mc is a useful function.

jackoneill

3rd October 2014, 15:11

v3 is out (https://github.com/dubhater/vapoursynth-mvtools/releases). It includes FlowInter and fixes for two problems.

I realised that a single version number is sufficient.

feisty2

3rd October 2014, 16:47

add mdegrainN and multivectors functions maybe?
large time radius could be kinda useful along with dct=1 to those "shivering" kinda clips

jackoneill

3rd October 2014, 17:48

add mdegrainN and multivectors functions maybe?
large time radius could be kinda useful along with dct=1 to those "shivering" kinda clips

Maybe. What are these multivector functions?

feisty2

3rd October 2014, 18:08

they are from the modified version in ditherpackage
with "multi=true" in manalyse, a common vector clip will turn into multivectors clips
and the special multivectors clips can be passed to special functions like mdegrainN or mcompensate via "tr" parameter

feisty2

3rd October 2014, 18:17

vmulti=super.manalyse (delta=6,multi=true) is simply identical to
bv6=super.manalyse (delta=6,isb=true)
bv5
bv4
...
fv4
fv5
fv6
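
The expansion described above can be sketched in Python for the VapourSynth port. This is only an illustration: analyse_multi is a hypothetical helper, not part of any plugin, and the ordering simply follows the list in the post. The analyse argument is assumed to be a callable that accepts the isb and delta keywords used earlier in the thread, such as core.mv.Analyse.

```python
def analyse_multi(analyse, super_clip, tr, **kwargs):
    """Build one vector clip per direction and delta, for deltas 1..tr.

    Returns [bv_tr, ..., bv_1, fv_1, ..., fv_tr], matching the order
    listed in the post above.
    """
    if tr < 1:
        raise ValueError("tr must be at least 1")
    backward = [analyse(super_clip, isb=True, delta=d, **kwargs)
                for d in range(tr, 0, -1)]
    forward = [analyse(super_clip, isb=False, delta=d, **kwargs)
               for d in range(1, tr + 1)]
    return backward + forward


# Hypothetical usage, with `core` and `sup` set up as in the scripts above:
#   vectors = analyse_multi(core.mv.Analyse, sup, tr=6, overlap=4)
```

This yields a plain Python list of ordinary vector clips rather than a special multivector clip, so any filter consuming it has to accept the clips individually.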

Mystery Keeper

3rd October 2014, 18:22

feisty2, here (https://bitbucket.org/mystery_keeper/templinearapproximate-vapoursynth/src/5bbf9c72d57508dd1aef7ec3cc1efb1420444418/MCDenoise.py?at=master#cl-87) is how I do it.

feisty2

3rd October 2014, 18:30

wow, cool, thank you, Mystery Keeper :)

feisty2

3rd October 2014, 18:58

little pickle here, Mystery Keeper, what should I do if I want mdegrainn (tr=6) instead of mcompensate :(

spawnbsd

3rd October 2014, 20:50

Great plugin, but any idea when we'll see 16bit support ?

jackoneill

3rd October 2014, 21:06

Great plugin, but any idea when we'll see 16bit support ?

Dunno. This year, if I'm not too lazy.

On that topic, how does everyone feel about adding support for 16 bit input to Analyse by simply shifting it to 8 bit? Filters like the Degrains and Compensate will work with 16 bit input directly, of course.

Mystery Keeper

3rd October 2014, 21:44

little pickle here, Mystery Keeper, what should I do if I want mdegrainn (tr=6) instead of mcompensate :(

Then, of course, you would need to write the function that takes array of clips. But I think that's better than generating special multivector clips.

Mystery Keeper

3rd October 2014, 21:45

On that topic, how does everyone feel about adding support for 16 bit input to Analyse by simply shifting it to 8 bit? Filters like the Degrains and Compensate will work with 16 bit input directly, of course.

Should be alright. At least that would be a start. Personally, I'm looking forward to DCT.

Are_

3rd October 2014, 21:50

foxyshadis

3rd October 2014, 22:09

On that topic, how does everyone feel about adding support for 16 bit input to Analyse by simply shifting it to 8 bit? Filters like the Degrains and Compensate will work with 16 bit input directly, of course.

Heck, you could probably get away with 4-bit for the most part. Obviously, dithering down would just introduce unwanted noise, but truncating should work fine. I look forward to it, someday!
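
Truncation by shifting, as opposed to dithering, can be sketched on plain integer samples like this. truncate_depth is a hypothetical helper written for illustration; it says nothing about how the plugin does or will handle high bit depth input.

```python
def truncate_depth(sample, from_bits=16, to_bits=8):
    """Drop the low-order bits of one sample; no rounding, no dithering."""
    if not 0 < to_bits <= from_bits:
        raise ValueError("need 0 < to_bits <= from_bits")
    if not 0 <= sample < (1 << from_bits):
        raise ValueError("sample out of range for from_bits")
    return sample >> (from_bits - to_bits)


# 16 bit samples reduced to 8 bit, then to 4 bit.
samples = [0, 255, 256, 32768, 65535]
print([truncate_depth(s) for s in samples])
print([truncate_depth(s, to_bits=4) for s in samples])
```

Since the shift only discards bits, the result is deterministic and adds no noise, which is the property the post above is relying on.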

feisty2

4th October 2014, 05:26

Mystery Keeper, I mean, the reason a large time radius mcompensate can be done this way is that mcompensate only takes one vector clip at a time, so if you have n vector clips, there would be n mcompensates. But mdegrain takes a lot of vector clips at once, based on the time radius, so I dunno how to do the same thing for mdegrain as for mcompensate

cretindesalpes

4th October 2014, 15:05

how does everyone feel about adding support for 16 bit input to Analyse by simply shifting it to 8 bit? Filters like the Degrains and Compensate will work with 16 bit input directly, of course.
Don’t waste your time at this, moreover it would be misleading. We can explicitly convert the high-bitdepth clips to 8 bits for analysis. Anyway, I think there is a definite benefit to run the analysis on 10–12 bits. I often remap the luma channel to increase the contrast in some specific ranges (generally the dark parts), and keeping 8 bits crunches other ranges, reducing the accuracy of the analysis on fine textures.
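
A small sketch of the effect described above, under loud assumptions: the square-root curve stands in for "a remap that boosts the dark parts" and is not the remap actually used, and the helper names are made up. It counts how many distinct output levels survive in the bright range when the remapped result is stored at 8 bits versus 10 bits.

```python
import math


def remap_dark_boost(x, in_max, out_max):
    """Assumed curve: stretches dark values and compresses bright ones."""
    return int(out_max * math.sqrt(x / in_max) + 0.5)


def distinct_levels(inputs, in_max, out_bits):
    """Count distinct output codes after remapping to out_bits of depth."""
    out_max = (1 << out_bits) - 1
    return len({remap_dark_boost(x, in_max, out_max) for x in inputs})


bright = range(192, 256)  # 64 bright input levels of an 8 bit source
levels_8 = distinct_levels(bright, 255, 8)
levels_10 = distinct_levels(bright, 255, 10)
print(levels_8, levels_10)
```

With this curve, several bright input levels collapse onto the same 8 bit code, while the 10 bit output keeps all 64 apart, which is the kind of lost texture detail the post refers to.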

jackoneill

4th October 2014, 21:14

Don’t waste your time at this, moreover it would be misleading. We can explicitly convert the high-bitdepth clips to 8 bits for analysis. Anyway, I think there is a definite benefit to run the analysis on 10–12 bits. I often remap the luma channel to increase the contrast in some specific ranges (generally the dark parts), and keeping 8 bits crunches other ranges, reducing the accuracy of the analysis on fine textures.

Damn. No easy way out of it, huh.

chainik_svp

5th October 2014, 22:22

jackoneill

I really don't want to suggest anything, but the code in SVPflow is really cleaned up compared to the original MVTools ;)
Just compare a few numbers - as you already know all the magic is in "PlaneOfBlocks" cpp/h, and they're ~80 KB of code(*) in MVTools (and in your build too) BUT only 42 KB in SVPflow.

(*) huge commented blocks are also included

Also, the original MVTools is losing >= 20% of performance for nothing...