
View Full Version : single precision MVTools plugin (stable)



feisty2
27th August 2015, 10:01
binary (x64 for winnt): https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/r10_pre

source code: https://github.com/IFeelBloated/MVTools_SF/tree/master

I'm feeling super duper awesometastic cuz, yeah, I'm insane enough to learn C++ by hacking this big fat monster plugin :)

namespace: mvsf.xxx

anyways, a few things I gotta say here:
1. currently available functions: Super, Analyze, Recalculate, Compensate and Degrain1/2/3, didn't add flow functions yet cuz I got some doubts about truemotion
2. SAD, SCD stuff are floats now (default: thSAD=400.0 (200.0 for Recalculate and 10000.0 for Compensate), thSCD1=400.0, thSCD2=130.0)
3. "limit" in Degrain is float now, and with a range of 0.0 - 1.0, 0.0=no filtering, 1.0=no limit
4. "isse" is removed cuz, well, I mean over 80% of the sse code won't even work on uint16_t clips...
5. I ain't figured out how the hell that dct stuff actually works, so please don't use it (keep dct=0) for now
6. "Analyse" got a new name now! and it's called "Analyze" :D , I'm American and "analyse" always gets autocorrected and that's not nice, now it's "Analyze" so good news to Americans and Canadians, but the old "Analyse" still works for compatibility reasons and British blokes ;) , so you got the freedom to choose "mvsf.Analyze" or "mvsf.Analyse" and both will work

EDIT:
test2
1. Added SATD support (dct=5 works now)
2. Added mvsf.Finest (flow functions will be ready soon)

EDIT2:
test3
1. Fixed possible overflow in SATD
2. Removed SATD for 8x4 16x8 8x16 blocks, no one uses them anyways
3. Added SATD support to 32x32 blocks
4. SATD for 16x16 blocks is corrupted in the original VapourSynth port, fixed now

EDIT3:
test4
1. Added mvsf.FlowBlur (you got a full floating point QTGMC now, if you want to)
2. better SATD precision

EDIT4:
test5
1. Added mvsf.BlockFPS (someone, maybe, will ever use this thing?)

EDIT5:
test6
1. Fixed the crash of BlockFPS on GRAY clips

EDIT6:
test7
1. all 10 modes of dct are working now, "libfftw3f-3.dll" needs to be placed in the same folder as mvtools

EDIT7:
test8
1. Added mvsf.FlowFPS, wanna do some really fancy floating point precision slo-mo stuff? try it!
2. Added mvsf.FlowInter

EDIT8:
test9
1. Added mvsf.SCDetection

EDIT9:
test10
1. Added mvsf.Degrain4/5/6, these are the strict straight extensions of Degrain1/2/3, not like the approximate copycat python script

EDIT10:
test11
1. Binary Part: Extended Degrain to Degrain24 (24, it's my lucky number!)
2. Resurrected vmulti features from MVTools 2.6.0.5, implemented via a python module, "tr" works up to 24, guess no one will ever use a time radius > 24.... maybe?
3. Resurrected StoreVect and RestoreVect from MVTools 2.6.0.5, implemented via a python module

vmulti demos:
1. DegrainN

import vapoursynth as vs
import mvmulti
core = vs.get_core()
clp = xxx
sup = core.mvsf.Super(clp)
vec = mvmulti.Analyze(sup,tr=6,blksize=8,overlap=4)
vec = mvmulti.Recalculate(sup,vec,tr=6,blksize=4,overlap=2)
clp = mvmulti.DegrainN(clp, sup, vec, tr=6)
clp.set_output()


2. Compensate/Flow

import vapoursynth as vs
import mvmulti
core = vs.get_core()
clp = xxx
sup = core.mvsf.Super(clp)
vec = mvmulti.Analyze(sup,tr=6,blksize=8,overlap=4)
vec = mvmulti.Recalculate(sup,vec,tr=6,blksize=4,overlap=2)
clp = mvmulti.Compensate(clp, sup, vec, tr=6)  # or: mvmulti.Flow(clp, sup, vec, tr=6)
clp.set_output()


3. StoreVect (returns a vector clip that can be encoded by vspipe)
vec.vpy

import vapoursynth as vs
import mvmulti
core = vs.get_core()
clp = xxx
sup = core.mvsf.Super(clp)
vec = mvmulti.Analyze(sup,tr=6,blksize=8,overlap=4)
vec = mvmulti.Recalculate(sup,vec,tr=6,blksize=4,overlap=2)
vec = mvmulti.StoreVect(vec,"D:/vec.txt")
vec.set_output()

vspipe.exe vec.vpy D:\vec.rgb

4. RestoreVect (restores the encoded vector clip back to a standard vector clip)

import vapoursynth as vs
import mvmulti
core = vs.get_core()
vec = mvmulti.RestoreVect("D:/vec.rgb","D:/vec.txt")


EDIT11:
test12
this one is, like, kinda free from runtime problems, the binary works without msvcr dlls, and silenced a warning in Overlap.cpp

EDIT12:
test13
precision boost
1. SAD (float -> double)
2. SATD (int16_t -> double)
3. DCT (uint8_t -> even more precise than float)
and also features some cosmetic changes from @jackoneill

EDIT13:
test14
A. Full Precision Boost
1. DCT (float -> double)
2. Motion Analysis (float -> double)
3. Super (float -> double)
4. Overlap (float -> double)
5. Variance (float -> double)
6. Degrain (float -> double)
and more...
basically everything works at double precision, rounded to single precision only at the final output stage
binary compiled with the strict floating point model (calculations follow the IEEE 754 rules exactly)
B. Bug Fixes
fixed a bug inherited from the avisynth plugin (bit shift operation on negative values, reported by @Are_ via runtime debugging)

libfftw3-3.dll (not libfftw3f-3.dll) needs to be placed in the same folder as the plugin!!!

EDIT14:
test15
A. Colorspace
all floating point colorspaces are supported now, GrayS, RGBS and YUV4xxPS (note that dct 1-4 on YUV clips might be kind of buggy, as chroma features a different range from luma, will be fixed in the next release)
B. Degrain
the stupid "thsadc" and "limitc" parameters got their asses canceled, "thsad" and "limit" have been made arrays
"plane" parameter has no effect on RGB and GRAY clips, all planes will be processed
C. stuff, here and there..
bug fixes shamelessly copied from @jackoneill

EDIT15:
r1
first stable release!!!
A. sanity check.
will raise an error if the input is not single precision fp or features varying dimensions
B. DCT
fixed dct stuff on YUV input

EDIT16:
r2
merged bug fixes from jackoneill's branch since his last release
currently no plan to add depanning stuff

EDIT17:
r3
A. BlockFPS
1. added support to overlap (merged from Fizick's master branch)
2. new modes, mode 6-8, occlusion mask weighted on SAD (merged from Fizick's master branch)
B. Compensate
1. new parameter "time", use it to do partial time compensation (merged from Fizick's master branch)
C. Bug Fixes
1. vector length was clamped to 127/pel on motion flow functions, now it's 2147483647/pel, practically unlimited (Fizick relaxed it to 32767/pel, I decided to do it more thoroughly)
D. Precision Boost
1. internal masking for motion flow (uint8_t -> double)
2. simpleresize for masks (uint8_t -> double)
floating point precision MMask should be super easy to implement now (all internal stuff are double already), but I didn't do it anyways like, yeah, I'm all fucked up lazy
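
For what it's worth, the three vector length limits in C.1 line up with the largest values of signed 8-, 16- and 32-bit integers. That suggests the clamp simply followed the width of the integer storing each vector component, though that reading is an assumption and the thread doesn't confirm it. A tiny Python check, with `signed_max` and `max_pixels` as made-up helper names:

```python
def signed_max(bits):
    """Largest value a signed two's-complement integer of this width can hold."""
    return 2 ** (bits - 1) - 1


def max_pixels(bits, pel):
    """Longest vector in whole pixels, assuming vectors are stored in 1/pel units."""
    return signed_max(bits) / pel


# 8, 16 and 32 bits correspond to the old clamp, Fizick's clamp and the r3 clamp.
for bits in (8, 16, 32):
    print(bits, signed_max(bits), max_pixels(bits, pel=2))
```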

EDIT18:
r4
New Filter
binary: added mvsf.Flow
mvmulti: added mvmulti.Flow

EDIT19:
r5
Bug Fixes
1. truemotion was corrupted (bug inherited from jackoneill's branch), fixed.
2. the SATD implementation was completely incorrect, did some research and rewrote that from the beginning, SATD works correctly now

EDIT20:
r6:
new block sizes: 2x2, 64x64, 64x32, 128x128, 128x64, 256x256, 256x128
switched to fftw3.3.5

EDIT21:
r7
Bug Fix
fixed a clip length calculation bug in BlockFPS, reported by groucho86 (http://forum.doom9.org/showthread.php?p=1785268#post1785268)
New Feature
extended SATD to 64x64 128x128 and 256x256 blocks
Uncategorized
replaced Hadamard ordered SATD with the Sequency ordered variant, levels faster..
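
A rough sketch of why swapping the ordering is safe for the result: sequency ordering only reorders the rows of the Hadamard matrix, so the transform produces the same coefficients in a different order and the sum of their absolute values doesn't change. This is plain Python written for illustration, not the plugin's code. It is unnormalised, only exercised on 4x4 blocks, and every function name is made up here.

```python
def hadamard(n):
    """Natural (Sylvester) ordered Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sequency_ordered(h):
    """Same rows, sorted by their number of sign changes (Walsh ordering)."""
    def sign_changes(row):
        return sum(1 for a, b in zip(row, row[1:]) if a != b)
    return sorted(h, key=sign_changes)


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def satd(cur, ref, basis):
    """Sum of absolute transformed differences: sum |B * (cur - ref) * B^T|."""
    diff = [[c - r for c, r in zip(crow, rrow)] for crow, rrow in zip(cur, ref)]
    transposed = [list(col) for col in zip(*basis)]
    coeffs = matmul(matmul(basis, diff), transposed)
    return sum(abs(c) for row in coeffs for c in row)


# Two small float blocks (values chosen as exact binary fractions).
cur = [[0.5, 0.25, 0.0, 0.75], [1.0, 0.5, 0.25, 0.0],
       [0.125, 0.5, 0.75, 1.0], [0.0, 0.25, 0.5, 0.5]]
ref = [[0.25, 0.25, 0.5, 0.75], [0.5, 0.5, 0.0, 0.25],
       [0.125, 0.75, 0.75, 0.5], [0.25, 0.0, 0.5, 1.0]]

h4 = hadamard(4)
print(satd(cur, ref, h4), satd(cur, ref, sequency_ordered(h4)))
```

Both calls print the same number. Any speed difference would come from how the transform is computed, which this sketch doesn't model.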

EDIT22:
r8
New Feature
mvsf.Mask
Uncategorized
converted some ugly C89 style code to C++14 style

EDIT22:
r9
fixed an ancient memory leak in mvsf.Super
converted some weird C++98 code to C++14

EDIT23:
major update:
- the mvmulti python module is now deprecated, all mvmulti stuff has been embedded into the C++ plugin.
- mvsf.Degrain can now handle arbitrary radius (not limited to 24)
- mvsf.Degrain1/Degrain2/.../Degrain24 are removed, the only MDegrain function is now mvsf.Degrain, which works for any radius.
- mvsf.Analyse is removed, type "Analyze" instead
- new parameter "radius" for mvsf.Analyze, when specified, mvsf.Analyze generates a compound vector clip that works for mvsf.Degrain/Compensate/Flow/Recalculate
- mvsf.Compensate/Flow/Recalculate automatically output compound results when provided a compound vector clip
- mvsf.Degrain automatically deduces the radius from the compound vector clip, you don't need to specify the radius
- when "radius" is specified for mvsf.Analyze, "isb" and "delta" are ignored.
- new parameter "cclip" for mvsf.Compensate/Flow, same as in the mvmulti python module, only takes effect for compound outputs.


#MDegrainN
sup = core.mvsf.Super(clip)
vec = core.mvsf.Analyze(sup, radius=6, overlap=4)
vec = core.mvsf.Recalculate(sup, vec, blksize=4, overlap=2)
clip = core.mvsf.Degrain(clip, sup, vec, thsad=400)

#motion compensated dfttest
sup = core.mvsf.Super(clip)
vec = core.mvsf.Analyze(sup, radius=6, overlap=4)
vec = core.mvsf.Recalculate(sup, vec, blksize=4, overlap=2)
clip = core.mvsf.Compensate(clip, sup, vec)
clip = core.dfttest.DFTTest(clip, tbsize=2*6+1, tmode=0)
clip = core.std.SelectEvery(clip, 2*6+1, 6)
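
The index arithmetic in the dfttest script above can be sketched in plain Python. It assumes, going only by the `tbsize=2*6+1` and `SelectEvery(clip, 2*6+1, 6)` calls, that compound Compensate output holds 2*radius+1 frames per source frame with the source-aligned frame in the middle of each group. The helper names are made up for this sketch.

```python
def group_indices(n, radius):
    """Frames of the interleaved clip that belong to source frame n (assumed layout)."""
    size = 2 * radius + 1
    start = n * size
    return list(range(start, start + size))


def centre_index(n, radius):
    """The frame SelectEvery(clip, 2*radius+1, radius) keeps for source frame n."""
    return n * (2 * radius + 1) + radius


def temporal_window(index, tbsize):
    """Frames a temporal filter of odd size tbsize reads around `index`."""
    half = tbsize // 2
    return list(range(index - half, index + half + 1))


radius = 6
for n in range(3):
    centre = centre_index(n, radius)
    # The window around the centre frame covers exactly that frame's own group.
    print(n, centre, temporal_window(centre, 2 * radius + 1) == group_indices(n, radius))
```

Under that assumption, only the centre frame of each group is filtered using frames compensated toward the same source frame, which is why the other outputs get thrown away.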


you need a C++20 compatible compiler and vsFilterScript (https://forum.doom9.org/showthread.php?t=181027) to build the binary.
the windows binary is currently unavailable because msvc does not support tons of C++20 core language features.

EDIT24:
feature update:
- new parameter "thsad2" for mvsf.Degrain
- new parameter "thsad2" for mvsf.Compensate, only takes effect for compound output

"thsad2" enables cosine annealing along time dimension for MDegrain and MCompensate, I'm sure many of you have been longing for this feature from the avs MVTools, there you have it!
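
To give an idea of what a cosine annealed threshold looks like, here's a minimal sketch. The exact formula the plugin uses isn't stated in this thread, so the half-cosine schedule below (thsad for the nearest frames, thsad2 for the farthest) is an assumption, and `annealed_thsad` is a made-up name.

```python
import math


def annealed_thsad(thsad, thsad2, radius):
    """Per-distance SAD thresholds for temporal distances 1..radius.

    Assumed schedule (not taken from the plugin source): a half-cosine that
    starts at thsad for the nearest frames and ends at thsad2 for the farthest.
    """
    thresholds = []
    for distance in range(1, radius + 1):
        # progress runs from 0.0 (nearest frame) to 1.0 (farthest frame)
        progress = 0.0 if radius == 1 else (distance - 1) / (radius - 1)
        weight = 0.5 * (1.0 + math.cos(math.pi * progress))
        thresholds.append(thsad2 + (thsad - thsad2) * weight)
    return thresholds


print(annealed_thsad(400.0, 200.0, radius=6))
```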

EDIT25:
cumulative update:
- I merged every single bug fix from jackoneill's branch for the last 4 years.
- VectorStructure::sad has been promoted to double.

EDIT26:
trivial update:
- the "limit" parameter of mvsf.Degrain now defaults to infinity. it still follows the [0.0, 1.0] range, however out-of-range samples are allowed for floating point clips, which makes infinity the only true "unlimited" bound.
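
A small sketch of why 1.0 stops being "unlimited" once samples leave [0.0, 1.0]. It assumes "limit" caps how far a filtered sample may move away from the source sample, which is how the parameter is usually described for MVTools; `apply_limit` is a made-up helper, not the plugin's function.

```python
import math


def apply_limit(src, filtered, limit=math.inf):
    """Keep the filtered sample within +/- limit of the source sample (assumed behaviour)."""
    return min(max(filtered, src - limit), src + limit)


# In-range samples: a change can never exceed 1.0, so limit=1.0 acts as "no limit".
print(apply_limit(0.25, 0.75, limit=1.0))
# limit=0.0 means no filtering at all.
print(apply_limit(0.25, 0.75, limit=0.0))
# Out-of-range samples (e.g. HDR): the change is 2.25, so limit=1.0 would clip it...
print(apply_limit(-0.5, 1.75, limit=1.0))
# ...and only an infinite limit leaves the filtered value untouched.
print(apply_limit(-0.5, 1.75))
```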

Groucho2004
27th August 2015, 12:33
I'm American and "analyse" always gets autocorrected and that's not nice
Do you write scripts on your phone?

feisty2
27th August 2015, 15:06
Do you write scripts on your phone?

http://i.imgur.com/uwRDrGc.png
"autocorrect" is available on PC also and besides, I just don't like "analyse", it looks creepy to me, "analyze" is way better

feisty2
28th August 2015, 16:59
test2: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test2
1. Added SATD support (dct=5 works now)
2. Added mvsf.Finest (flow functions will be ready soon)

feisty2
29th August 2015, 09:20
test3: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test3
1. Fixed possible overflow in SATD
2. Removed SATD for 8x4 16x8 8x16 blocks, no one uses them anyways
3. Added SATD support to 32x32 blocks
4. SATD for 16x16 blocks is corrupted in the original VapourSynth port, fixed now

feisty2
3rd September 2015, 16:25
test4: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test4
1. Added mvsf.FlowBlur (you got a full floating point QTGMC now, if you want to)
2. better SATD precision

feisty2
4th September 2015, 08:11
test5: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test5
1. Added mvsf.BlockFPS (someone, maybe, will ever use this thing?)

feisty2
4th September 2015, 10:01
test6: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test6
1. Fixed the crash of BlockFPS on GRAY clips

feisty2
5th September 2015, 15:22
test7: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test7
1. all 10 modes of dct are working now, "libfftw3f-3.dll" needs to be placed in the same folder as mvtools

feisty2
6th September 2015, 16:33
test8: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test8
1. Added mvsf.FlowFPS, wanna do some really fancy floating point precision slo-mo stuff? try it!
2. Added mvsf.FlowInter

feisty2
7th September 2015, 10:18
test9: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test9
1. Added mvsf.SCDetection

feisty2
13th September 2015, 14:50
test10: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test10
1. Added mvsf.Degrain4/5/6, these are the strict straight extensions of Degrain1/2/3, not like the approximate copycat python script

feisty2
15th September 2015, 08:28
test11: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test11
1. Binary Part: Extended Degrain to Degrain24 (24, it's my lucky number!)
2. Resurrected vmulti features from MVTools 2.6.0.5, implemented via a python module, "tr" works up to 24, guess no one will ever use a time radius > 24.... maybe?
3. Resurrected StoreVect and RestoreVect from MVTools 2.6.0.5, implemented via a python module

Boulder
21st September 2015, 04:19
Now that DGDecodeNV has a 64-bit build to test, I also tried your plugin. I cannot get it to work, it only produces garbage on the screen.

https://drive.google.com/file/d/0BzeF_1syecQwdkFEN2t1YmswYXM/view?usp=sharing

import vapoursynth as vs
import havsfunc as has

core = vs.get_core()

clp = core.dgdecodenv.DGSource('c:/x265/hotfuzz.dgi')

clp = core.fmtc.bitdepth(clp, bits=16)

feed = has.DitherLumaRebuild(clp)

superanalyse = core.mvsf.Super(feed,pel=2)
supermdg = core.mvsf.Super(clp,pel=2)

bv1 = core.mvsf.Analyse(superanalyse, dct=0, blksize=16, overlap=8, isb=True)
fv1 = core.mvsf.Analyse(superanalyse, dct=0, blksize=16, overlap=8)

finalclip = core.mvsf.Degrain1(clp, supermdg, bv1, fv1, thsad=300, thsadc=300, limit=1.0, limitc=1.0)

finalclip.set_output()

feisty2
21st September 2015, 04:22
It works at 32bits float (aka Single Precision) ONLY

Boulder
21st September 2015, 04:23
Why not make a sanity check then?

feisty2
21st September 2015, 04:27
Cuz I already said "single precision" at the title of the thread:)
And I'm lazy like hell
Anyways, it's an addition not a replacement to jackoneill's mvtools

feisty2
24th September 2015, 14:49
test12: https://github.com/IFeelBloated/MVTools_SF/releases/tag/test12
this one is, like, kinda free from runtime problems, the binary works without msvcr dlls, and silenced a warning in Overlap.cpp

sl1pkn07
11th October 2015, 10:55
Can you add a "configure" script for building the plugin?

greetings

feisty2
11th October 2015, 15:05
Can you add a "configure" script for building the plugin?

greetings

wish I could, but I'm just no good at gnu stuff...
guess the configure files from jackoneill's version will just work, but no guarantee about that..
I'd be happy to add those configure files to the master branch if you'd like to make them..

feisty2
13th October 2015, 14:42
@sl1pkn07
thanks to Are_'s help, the compiling files are available now

sl1pkn07
13th October 2015, 15:00
zankius!

feisty2
21st January 2016, 15:43
https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/test13
test13
precision boost
1. SAD (float -> double)
2. SATD (int16_t -> double)
3. DCT (uint8_t -> even more precise than float)
and also features some cosmetic changes from @jackoneill

feisty2
5th February 2016, 17:24
https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/test14
test14
A. Full Precision Boost
1. DCT (float -> double)
2. Motion Analysis (float -> double)
3. Super (float -> double)
4. Overlap (float -> double)
5. Variance (float -> double)
6. Degrain (float -> double)
and more...
basically everything works at double precision, rounded to single precision only at the final output stage
B. Bug Fixes
fixed a bug inherited from the avisynth plugin (bit shift operation on negative values, reported by @Are_ via runtime debugging)

Myrsloik
5th February 2016, 17:38
https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/test14
test14
A. Full Precision Boost
1. DCT (float -> double)
2. Motion Analysis (float -> double)
3. Super (float -> double)
4. Overlap (float -> double)
5. Variance (float -> double)
6. Degrain (float -> double)
and more...
basically everything works at double precision, rounded to single precision only at the final output stage
B. Bug Fixes
fixed a bug inherited from the avisynth plugin (bit shift operation on negative values, reported by @Are_ via runtime debugging)

I'm curious now. Did you compare the speed between your single and double precision versions? I suspect you've made it even slower now.

feisty2
5th February 2016, 17:51
I'm curious now. Did you compare the speed between your single and double precision versions? I suspect you've made it even slower now.

it's only a little bit slower than the single precision version as long as FFTW stays out of the picture, but it drops heavily (~2x slower on my i7-4790k) if FFTW gets involved (dct=1-4)

MonoS
7th February 2016, 13:04
Probably you'll see some speed difference if you enable auto vectorization [on GCC -O3 and -march=corei7-avx]. Right now it isn't any slower because latency between float and double is the same [5 cycles on the latest intel generation], the little speed difference is probably due to memory access, fftw is 2x slower due to loss in vectorization [in a 128bit vector register can fit 4 floats but only 2 doubles].

I'll be more interested to know if this precision boost is really useful

feisty2
8th February 2016, 08:30
I'll be more interested to know if this precision boost is really useful

floats lose precision as the calculation goes on, and a giant plugin like mvtools comes with a hell of a lot of calculations, so double precision is required for the intermediate stuff if you want the final output to carry real single precision, otherwise it's just low precision data in a single precision format..
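
A toy illustration of that point in plain Python (nothing to do with the actual mvtools code paths): add the same single precision value a million times, once rounding every intermediate sum to single precision and once accumulating in double and rounding only the final result. `to_single` emulates float32 rounding through the `struct` module. All names are made up for this sketch.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_single(value, count):
    """Every intermediate sum is rounded to single precision."""
    total = 0.0
    for _ in range(count):
        total = to_single(total + value)
    return total


def accumulate_double(value, count):
    """Intermediates stay in double; only the final result is rounded to single."""
    total = 0.0
    for _ in range(count):
        total += value
    return to_single(total)


COUNT = 1_000_000
value = to_single(0.1)
exact = value * COUNT  # reference result, computed in double

single_sum = accumulate_single(value, COUNT)
double_sum = accumulate_double(value, COUNT)
print("single intermediates error:", abs(single_sum - exact))
print("double intermediates error:", abs(double_sum - exact))
```

The first error comes out several orders of magnitude larger than the second. Whether real mvtools output differs visibly is a separate question that this toy doesn't answer.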

MonoS
12th February 2016, 23:43
floats lose precision as the calculation goes on, and a giant plugin like mvtools comes with a hell of a lot of calculations, so double precision is required for the intermediate stuff if you want the final output to carry real single precision, otherwise it's just low precision data in a single precision format..

Sorry to ask, but I'd like to see a real-world example.

While I understand using single precision for this kind of calculation, I think double is rather overkill.
To be clear, this statement of mine isn't backed by actual facts, and I'm willing to change my mind [but please, try to come up with a better example than the one you gave in the fmtc thread]

feisty2
8th April 2016, 17:09
https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/test15
test15
A. Colorspace
all floating point colorspaces are supported now, GrayS, RGBS and YUV4xxPS (note that dct 1-4 on YUV clips might be kind of buggy, as chroma features a different range from luma, will be fixed in the next release)
B. Degrain
the stupid "thsadc" and "limitc" parameters got their asses canceled, "thsad" and "limit" have been made arrays
"plane" parameter has no effect on RGB and GRAY clips, all planes will be processed
C. stuff, here and there..
bug fixes shamelessly copied from @jackoneill

feisty2
9th April 2016, 09:40
so I've been testing, RGB is supported now but,
I found that motion estimation in RGB is not a nice thing to do, I mean, it works, but the quality sucks compared to just doing it in YUV 4:4:4
convert your RGB input to YUV before applying mvtools to it, otherwise mvtools will happily take the RGB input and feed you back some shitty motion estimation

feisty2
16th April 2016, 13:24
https://github.com/IFeelBloated/vapoursynth-mvtools-sf/releases/tag/r1
r1
first stable release!!!
A. sanity check.
will raise an error if the input is not single precision fp or features varying dimensions
B. DCT
fixed dct stuff on YUV input

Mystery Keeper
16th April 2016, 19:37
Works awesomely. Thank you!

feisty2
6th June 2016, 13:53
r2
merged bug fixes from jackoneill's branch since his last release
currently no plan to add depanning stuff

feisty2
24th June 2016, 14:26
r3
A. BlockFPS
1. added support to overlap (merged from Fizick's master branch)
2. new modes, mode 6-8, occlusion mask weighted on SAD (merged from Fizick's master branch)
B. Compensate
1. new parameter "time", use it to do partial time compensation (merged from Fizick's master branch)
C. Bug Fixes
1. vector length was clamped to 127/pel on motion flow functions, now it's 2147483647/pel, practically unlimited (Fizick relaxed it to 32767/pel, I decided to do it more thoroughly)
D. Precision Boost
1. internal masking for motion flow (uint8_t -> double)
2. simpleresize for masks (uint8_t -> double)
floating point precision MMask should be super easy to implement now (all internal stuff are double already), but I didn't do it anyways like, yeah, I'm all fucked up lazy

Mystery Keeper
25th June 2016, 00:16
You say it is possible to write QTGMC analog for floating point. Can you please give a code to replace TemporalSoften?

feisty2
5th July 2016, 09:23
r4
New Filter
binary: added mvsf.Flow
mvmulti: added mvmulti.Flow

~SimpleX~
30th July 2016, 13:11
VS Editor crashes on this simple script:

import vapoursynth as vs
core = vs.get_core()

src = core.d2v.Source(...)
last = core.fmtc.bitdepth(src, bits=32, fulls=False, fulld=True)
super = core.mvsf.Super(last)
last.set_output()


Latest test version of VS r33, mvsf r4.

feisty2
30th July 2016, 13:42
VS Editor crashes on this simple script:

import vapoursynth as vs
core = vs.get_core()

src = core.d2v.Source(...)
last = core.fmtc.bitdepth(src, bits=32, fulls=False, fulld=True)
super = core.mvsf.Super(last)
last.set_output()


Latest test version of VS r33, mvsf r4.

does your CPU feature the AVX2 extension?

~SimpleX~
30th July 2016, 13:49
does your CPU feature the AVX2 extension?

Oh, well, it doesn't. I have i7 2600k, which is Sandy Bridge. :(

feisty2
30th July 2016, 13:52
Oh, well, it doesn't. I have i7 2600k, which is Sandy Bridge. :(

then you have to compile a compatible binary yourself, auto compiling files for GCC are provided with the source code

kolak
30th July 2016, 21:15
Is there a real gain in quality due to such a high precision implementation?

feisty2
31st July 2016, 07:46
Is there a real gain in quality due to such a high precision implementation?

guess there's some quality gain, but it comes mainly from the floating point format being free of clamping (more mathematically correct) rather than from the higher precision

kolak
31st July 2016, 12:29
I've tried it and have not seen any "real" difference.

feisty2
31st July 2016, 20:10
I've tried it and have not seen any "real" difference.

you might not see any "real" difference between int8 and int16 most of the time, not to even mention int16 and float32

I started this for my OCD about placebo precision, wanted it to be as mathematically correct as possible, but the actual reason that makes this useful is there're HDR videos out there, HDR videos contain colors even brighter than white or even darker than black, and integers couldn't handle stuff like that, and floats could.
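
A tiny plain Python sketch of the clamping point, not plugin code: an intermediate step pushes a sample above "white" and a later step brings it back. A pipeline that clamps to the nominal [0.0, 1.0] range after every step, the way fixed-range integer formats effectively do, loses the overshoot. A float pipeline keeps it. The function names and the gain value are made up for illustration.

```python
def clamp01(x):
    """Clamp to the nominal range, as a fixed-range integer format would."""
    return min(max(x, 0.0), 1.0)


def clamped_pipeline(sample, gain):
    """Boost, then undo the boost, clamping after every step."""
    boosted = clamp01(sample * gain)
    return clamp01(boosted / gain)


def float_pipeline(sample, gain):
    """Same two steps, but the intermediate is allowed to exceed 1.0."""
    boosted = sample * gain
    return boosted / gain


print(clamped_pipeline(0.75, 2.0))  # the overshoot to 1.5 was cut off at 1.0
print(float_pipeline(0.75, 2.0))    # the original sample survives the round trip
```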

kolak
1st August 2016, 12:15
I'm not trying to say your work is worthless.
Nothing wrong with having options as there may be cases when it may be useful (as you already mentioned).

feisty2
3rd August 2016, 16:18
r5
Bug Fixes
1. truemotion was corrupted (bug inherited from jackoneill's branch), fixed.
2. the SATD implementation was completely incorrect, did some research and rewrote that from the beginning, SATD works correctly now

feisty2
3rd August 2016, 16:25
@~SimpleX~
r5 features no SIMD extension, try it and it will probably not crash this time

~SimpleX~
4th August 2016, 00:13
I've made some stupid compatibility layer for mv and mvsf (https://gist.github.com/SX91/9e65814384a9f23535077366996097ea) (mvmulti, 8..32 bit support, incomplete function set). Maybe it would be useful for someone.

Myrsloik
4th August 2016, 00:40
I've made some stupid compatibility layer for mv and mvsf (https://gist.github.com/SX91/9e65814384a9f23535077366996097ea) (mvmulti, 8..32 bit support, incomplete function set). Maybe it would be useful for someone.

You should check sample_type, not the number of bits
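
A hedged sketch of what that check could look like. In a real script you would compare `clip.format.sample_type` against `vs.FLOAT` / `vs.INTEGER`. The `SampleType` enum and the `SimpleNamespace` formats below are stand-ins so the example runs without VapourSynth installed, and `pick_namespace` is a made-up helper, not part of the linked gist. The 8..16 bit integer range for the "mv" plugin is also an assumption.

```python
from enum import Enum
from types import SimpleNamespace


class SampleType(Enum):
    """Stand-in for vs.INTEGER / vs.FLOAT."""
    INTEGER = 0
    FLOAT = 1


def pick_namespace(fmt):
    """Choose between the integer plugin ("mv") and the float plugin ("mvsf")."""
    if fmt.sample_type is SampleType.FLOAT:
        if fmt.bits_per_sample == 32:
            return "mvsf"
        raise ValueError("only single precision float is supported")
    if 8 <= fmt.bits_per_sample <= 16:
        return "mv"
    raise ValueError("unsupported integer bit depth")


int16 = SimpleNamespace(sample_type=SampleType.INTEGER, bits_per_sample=16)
half = SimpleNamespace(sample_type=SampleType.FLOAT, bits_per_sample=16)
single = SimpleNamespace(sample_type=SampleType.FLOAT, bits_per_sample=32)

print(pick_namespace(int16), pick_namespace(single))
# int16 and half both report 16 bits, so a bits-only check could not tell them apart.
```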