nnedi3 plugin for VapourSynth


lansing
30th April 2015, 01:33
The separate link for the nnedi3_weights.bin file on the download page is broken.

jackoneill
30th April 2015, 10:47
Thanks. Fixed it.

lansing
30th April 2015, 20:07
It works now, but the underscore is missing in the downloaded bin file name, and I'm also getting a "libwinpthread-1.dll is missing" error message.

jackoneill
30th April 2015, 23:02
The link is in the v3 release notes, so I made it point to the version of the file that works with v3, which has a space instead of underscore. I'll add links to the v4 and v5 release notes too.

I uploaded new builds of v5. The error message is no more.

lansing
10th May 2015, 00:09
I've run into a problem again. I get a "no attribute with the name nnedi3 exists" error with v5. With v4, the video loads fine but crashes on seek.

Are_
10th May 2015, 00:39
Try with vapoursynth R27.

lansing
10th May 2015, 01:47
I installed VapourSynth R27 rc1 and it gave me a series of plugin errors on startup:

VapourSynth plugins manager: Failed to get pointer to the plugin Plugin1!
VapourSynth plugins manager: Failed to get pointer to the plugin Plugin10!
...

jackoneill
10th May 2015, 11:28
That comes from the editor. Please report it here: http://forum.doom9.org/showthread.php?t=170965&page=5

Does it still crash when seeking?

jeremy33
15th May 2015, 11:59
Hello,

I'm trying to use nnedi3 with VapourSynth on Linux and this is what I get:

http://img11.hostingpics.net/thumbs/mini_464396screenshot.jpg (http://www.hostingpics.net/viewer.php?id=464396screenshot.jpg)

What could cause this?
I'm on Netrunner 14 x64 (Kubuntu derivative) with nvidia 349.16 drivers, MPV 0.91 and VapourSynth 27 from this ppa https://launchpad.net/~djcj/+archive/ubuntu/vapoursynth

jackoneill
15th May 2015, 12:08
Pretty. Please post your VapourSynth script. Also, what version of nnedi3 is that? Does it happen with opt=False?

jeremy33
15th May 2015, 12:20
This is my test script :
import vapoursynth as vs
core = vs.get_core()
clip = video_in
clip = core.nnedi3.nnedi3_rpow2(clip, 2)
clip.set_output()
It happens with opt=False.
It's nnedi3 from this ppa https://launchpad.net/~djcj/+archive/ubuntu/vapoursynth?field.series_filter=trusty

Are_
15th May 2015, 13:28
This is not happening here with nvidia 349.16 and mpv, nnedi3 and vapoursynth from git. Does this happen outside of mpv?

jeremy33
15th May 2015, 13:37
What do you mean by "Does this happen outside of mpv"?

sl1pkn07
15th May 2015, 14:25
With vapoursynth-editor, or directly with a pipe to x264.

jeremy33
15th May 2015, 14:35
I tried with vspipe, but I can't figure out how to use a file as a clip.

jackoneill
15th May 2015, 14:36
Okay, more info, please:
- exact resolution of the input video
- YUV? RGB? Something else?
- subsampling
- bit depth

text.ClipInfo() displays all of this.

jeremy33
15th May 2015, 14:47
This is the output of text.ClipInfo()

VideoNode
Format: YUV420P8
Width: 1280
Height: 544
Num Frames: 134217727
FPS Num: dynamic
FPS Den: dynamic
Flags: Is Cache, No Cache

jackoneill
15th May 2015, 16:17
Try this script:

def nnedi3_rpow2(src):
    clip = src
    clip = c.nnedi3.nnedi3(clip, field=1, dh=1, nsize=0, nns=3)
    clip = c.std.Transpose(clip)
    clip = c.nnedi3.nnedi3(clip, field=1, dh=1, nsize=0, nns=3)
    clip = c.std.Transpose(clip)

    if src.format.subsampling_h == 1:
        clip = c.fmtc.resample(clip, kernel="spline36", sy=-0.5, planes=[2, 3, 3])

    # if correct_shift:
    clip = c.fmtc.resample(clip, kernel="spline36", sx=-0.5, sy=-0.5)

    clip = c.fmtc.bitdepth(clip, csp=src.format.id)

    return clip

import vapoursynth as vs
core = vs.get_core()
c = core  # nnedi3_rpow2() above refers to the core as "c"
clip = video_in
#clip = core.nnedi3.nnedi3_rpow2(clip, 2)
clip = nnedi3_rpow2(clip)
clip.set_output()


The output should be just as broken as before. After you confirm that it's still broken, comment out the fmtc lines, one by one, starting with the last one, to see if it's any of them causing the problem.

jeremy33
15th May 2015, 16:50
Ok I tried and it is these 2 lines that cause the problem :

clip = c.fmtc.resample(clip, kernel="spline36", sy=-0.5, planes=[2, 3, 3])
clip = c.fmtc.resample(clip, kernel="spline36", sx=-0.5, sy=-0.5)

If I remove them, it works. If I only remove sy=-0.5 from the first line and sx=-0.5, sy=-0.5 from the second, it works too.

jackoneill
15th May 2015, 16:58
Excellent. It appears the bug is in fmtconv, not nnedi3. Please complain here: http://forum.doom9.org/showthread.php?t=166504&page=6

jeremy33
15th May 2015, 17:01
Ok thanks ;)

jackoneill
28th May 2015, 12:15
NNEDI3 now has NEON optimisations!

I was curious about NEON (and somewhat bored), so I translated the "hottest" SSE2 functions into NEON intrinsics. Testing on an ODROID-U2 shows that it is ~3.6 times faster with 8 bit input and default parameters, and ~2.1 times faster with 16 bit input and default parameters, compared to plain C code.

captainadamo
28th May 2015, 18:58
Cool. Did you have a specific script you were using for benchmarking? I'm curious to see what profiling it on my Air 2 looks like. Also, there might be some ARMv8 instructions that could speed things up even more, such as the intrinsics for SADDLV/UADDLV in AArch64, which let you sum all the values across a vector in a single instruction. I noticed that gave a decent performance gain over doing it the ARMv7 way.

jackoneill
28th May 2015, 19:35
No particular script, just pass opt=False when you need to disable the new code.

Did you try to use these ARMv8 instructions in nnedi3?

captainadamo
28th May 2015, 19:48
No particular script, just pass opt=False when you need to disable the new code.

Ok.

Did you try to use these ARMv8 instructions in nnedi3?

Not yet, pulling down your latest code now and will see what benefit I can get.

feisty2
11th June 2015, 16:23
so most internal functions work in floating point according to the previous posts
any chance to make nnedi3 work with single precision floating point input too?

jackoneill
11th June 2015, 20:45
There is a pretty good chance.

feisty2
12th June 2015, 07:01
okay... gonna happen anytime soon?

jackoneill
12th June 2015, 11:32
I don't know. I'm working on something else at the moment. How urgent is it? :wink wink:

feisty2
12th June 2015, 15:41
take your time please, but sooner the better. :)

feisty2
24th June 2015, 07:49
I was float-izing nnedi3 and ran into trouble: I'm getting weird, corrupted results
my guess is I did something wrong in "extract_m8_C", which I modified like this
http://i.imgur.com/VtS2gd4.png
http://i.imgur.com/D1rlcqH.png
some tips to fix things plz?

jackoneill
24th June 2015, 11:21
Instead of screenshots of the code, please post the relevant part(s) from the output of "git diff". Actually, just commit the changes you have now and push them to your fork. I'll look at them there.

Also, screenshots from the source and from the corrupted output may help.

feisty2
24th June 2015, 14:00
I stripped out lots of stuff (asm, 8-bit support, nnedi3_rpow2 and other irrelevant things) to trace the problem
stripped working version (only 16-bit support remains) https://github.com/IFeelBloated/nnedi3float/blob/master/working.cpp
broken floating point version (with a diff against the stripped working version) https://github.com/IFeelBloated/nnedi3float/commit/5a52880df627c9aff5bb55134578e3d72f43de72

screenshots
source
http://i.imgur.com/I7PnWle.png

import vapoursynth as vs
core = vs.get_core()
clp = core.raws.Source ("Y.rgb",736,480,src_fmt="GRAYS")
clp = core.nnedi3.nnedi3 (clp,0,True)
clp.set_output ()

http://i.imgur.com/Mmr5YBI.png

edit: I just found an error minutes ago at line 213, corrected, I'll test it again and report back

feisty2
24th June 2015, 15:04
okay, still no luck with line 213 corrected. The scan line kinda crap is gone now, but things run really fast, even faster than the original version with asm opt, and that's gotta be wrong. So I set pscrn to 0 and the "scan lines" came back instantly. That shows the "edi" interpolation is not working at all, it's just plain bicubic interpolation. I'm 80% sure the problem comes from "extract_m8_C"

cretindesalpes
24th June 2015, 16:35
I just had a quick glance at the code, but are you sure the ranges of the mstd[] values are correct, or correctly scaled? They feed some non-linear stuff using exp, if the values are not in the expected range, you’ll run into massive overflows or null results. Checking the differences between 8- and 16-bit code may help.
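
To illustrate the point about ranges, here is a minimal Python sketch. It is not nnedi3's code; safe_exp and the sample values are made up for the example. It only shows what happens when values meant to sit around the 0..1 range reach exp() still carrying a 16-bit scale: positive inputs overflow and negative ones collapse to zero.

```python
import math

def safe_exp(x):
    """math.exp, but report overflow as infinity instead of raising."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

# Hypothetical inputs in the range the non-linear stage expects.
in_range = [0.25, -0.5, 1.0]
# The same inputs with a forgotten 16-bit scale factor still applied.
scaled_up = [v * 65535 for v in in_range]

print([safe_exp(v) for v in in_range])   # ordinary finite values
print([safe_exp(v) for v in scaled_up])  # infinities and zeros only
```

Single precision overflows much earlier than the doubles Python uses here (exp() already exceeds the float range a little above 88), so the real code has even less headroom.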

feisty2
24th June 2015, 17:04
not sure... I'm done with direct methods, I'll turn float values to uint16_t within extract_m8_C to get correct mstd values

feisty2
24th June 2015, 17:45
okay, I did those 2 100.f changes, chroma clamping can wait for now, and it's not working
I tried to convert float to uint16_t within extract_m8_C to get correct mstd values
https://github.com/IFeelBloated/nnedi3float/commit/d9ebc0c2b98d08a6053d59578c89f877ef219a7f
not working still, anyways, I'm going out cuz it's like around 10AM, I'll get back to this when I get home

feisty2
25th June 2015, 13:53
still can't get it to work, I did another even more aggressive strip-down (removed pscrn and fapprox) to trace errors
"computeNetwork0", prescreener, removed
"processLine0", bicubic, removed
"elliott_C", prescreener helper, removed
"pixel2float48_C", prescreener helper, removed
"e0_m16_C", fapprox, removed
"e1_m16_C", fapprox, removed

now it's just the pure and raw actual "nnedi" core, all irrelevant stuff (opt... whatever) got kicked out
lite 16bits version (working) https://github.com/IFeelBloated/nnedi3float/blob/master/NNEDI3int16.cpp
lite 8bits version (working, with compares to 16bits version) https://github.com/IFeelBloated/nnedi3float/commit/fea017531c691da63db2c348b41034a9c439b381
lite float version (not working, with compares to 16bits version) https://github.com/IFeelBloated/nnedi3float/commit/e1d470a3968c1da95ba60fbed302e4185ec50886

@cretindesalpes, why doesn't that range stuff affect the 8bits version at all? I modified the float version just like the 8bits one, but it just gives me shit
ah, everything I'm trying, fails
I'm freaking exhausted... :(

feisty2
20th August 2015, 05:46
yeah... I'm still on the mission of floatizing nnedi3
and I noticed "evalFunc_0" is the troubled stuff here
it should insert a white blank line between every row of the source clip cuz I removed "pscrn", but instead it actually inserts black blank lines (so evalFunc_1, the actual nnedi3, is effectively disabled)


template <typename PixelType>
void evalFunc_0(void **instanceData, FrameData *frameData)
{
    ntestData *d = (ntestData*)* instanceData;

    float *input = frameData->input;
    const float *weights0 = d->weights0;
    float *temp = frameData->temp;
    uint8_t *tempu = (uint8_t*)temp;

    // And now the actual work.
    for (int b = 0; b < d->vi.format->numPlanes; ++b)
    {
        if ((b == 0 && !d->Y) ||
            (b == 1 && !d->U) ||
            (b == 2 && !d->V))
            continue;

        const PixelType *srcp = (const PixelType *)frameData->paddedp[b];
        const int src_stride = frameData->padded_stride[b] / sizeof(PixelType);

        const int width = frameData->padded_width[b];
        const int height = frameData->padded_height[b];

        PixelType *dstp = (PixelType *)frameData->dstp[b];
        const int dst_stride = frameData->dst_stride[b] / sizeof(PixelType);

        for (int y = 1 - frameData->field[b]; y < height - 12; y += 2) {
            memcpy(dstp + y*dst_stride,
                   srcp + 32 + (6 + y)*src_stride,
                   (width - 64) * sizeof(PixelType));
        }

        const int ystart = 6 + frameData->field[b];
        const int ystop = height - 6;
        srcp += ystart*src_stride;
        dstp += (ystart - 6)*dst_stride - 32;
        const PixelType *src3p = srcp - src_stride * 3;
        int32_t *lcount = frameData->lcount[b] - 6;
        for (int y = ystart; y<ystop; y += 2)
        {
            memset(dstp + 32, 255, (width - 64) * sizeof(PixelType)); // <-- the memset in question
            lcount[y] += width - 64;
            dstp += dst_stride * 2;
        }
    }
}


guess the "memset" stuff is just wrong for floats, 1.f is not FFFFFFFF obviously...
so how should I make it work?

EDIT:
so I removed the "pscrn" switch in evalFunc_1

template <typename PixelType>
void evalFunc_1(void **instanceData, FrameData *frameData)
{
    ntestData *d = (ntestData*)* instanceData;

    float *input = frameData->input;
    float *temp = frameData->temp;
    float **weights1 = d->weights1;
    const int qual = d->qual;
    const int asize = d->asize;
    const int nns = d->nns;
    const int xdia = d->xdia;
    const int xdiad2m1 = (xdia / 2) - 1;
    const int ydia = d->ydia;
    const float scale = 1.0f / (float)qual;

    for (int b = 0; b < d->vi.format->numPlanes; ++b)
    {
        if ((b == 0 && !d->Y) ||
            (b == 1 && !d->U) ||
            (b == 2 && !d->V))
            continue;

        const PixelType *srcp = (const PixelType *)frameData->paddedp[b];
        const int src_stride = frameData->padded_stride[b] / sizeof(PixelType);

        const int width = frameData->padded_width[b];
        const int height = frameData->padded_height[b];

        PixelType *dstp = (PixelType *)frameData->dstp[b];
        const int dst_stride = frameData->dst_stride[b] / sizeof(PixelType);

        const int ystart = frameData->field[b];
        const int ystop = height - 12;

        srcp += (ystart + 6)*src_stride;
        dstp += ystart*dst_stride - 32;
        const PixelType *srcpp = srcp - (ydia - 1)*src_stride - xdiad2m1;

        for (int y = ystart; y<ystop; y += 2)
        {
            for (int x = 32; x<width - 32; ++x)
            {
                //if (dstp[x] != 1.f)
                //continue;

                float mstd[4];
                d->extract((const uint8_t *)(srcpp + x), src_stride, xdia, ydia, mstd, input);
                for (int i = 0; i<qual; ++i)
                {
                    d->dotProd(input, weights1[i], temp, nns * 2, asize, mstd + 2);
                    d->expfunc(temp, nns);
                    d->wae5(temp, nns, mstd);
                }
                dstp[x] = VSMIN(VSMAX((mstd[3] * scale), 0.f), 1.f);
            }
            srcpp += src_stride * 2;
            dstp += dst_stride * 2;
        }
    }
}


now I got a working floating point nnedi3, nice!
more details at https://github.com/IFeelBloated/nnedi3float/blob/master/nnedi3sp.cpp
but I lose the compatibility of pscrn by doing that and it works reeaally slow
I want pscrn back later so I want an answer to that memset problem :)

jackoneill
20th August 2015, 08:30
If you're using the highlighted memset line, it's not setting the pixels to 1.0f, but to 0xffffffff, which apparently is NaN (not a number). You'll want to use vs_memset_float to actually set them to 1.0f: https://github.com/vapoursynth/vapoursynth/blob/master/src/core/filtershared.h#L46

foxyshadis
20th August 2015, 11:19
1.f is 0x3f800000, btw. If you're going to hack around with bytes like that you should use a hack like *(int*)(dstp+x) to read it back. (Or define a union.) But yeah, vs_memset_float already does what you want.
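
The two bit patterns mentioned above can be checked with a few lines of Python. This is only a sketch using the standard struct module; the helper names are made up for the example and have nothing to do with the plugin's code. It reinterprets four 0xFF bytes (what memset with 255 writes into each float pixel) as a single-precision float and prints the bit pattern of 1.0f.

```python
import math
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bits_from_float32(value):
    """Return the 32-bit pattern of a value stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

# memset(dstp, 255, n) fills every byte with 0xFF, so each float pixel reads back as 0xFFFFFFFF.
memset_pixel = struct.unpack("<f", b"\xff" * 4)[0]

print(math.isnan(memset_pixel))     # True
print(hex(bits_from_float32(1.0)))  # 0x3f800000
print(memset_pixel != 1.0)          # True: NaN never compares equal to anything
```

Since NaN compares unequal to everything, a check like dstp[x] != 1.f is true for every pixel marked this way, which would fit the symptom described earlier where the second pass never touched the marked lines.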

feisty2
20th August 2015, 15:02
works, thx for all your help :)

feisty2
21st August 2015, 06:13
G, nnedi3 without asm works outrageously slow, I decide to bring SSE2 "machine code" (like, technically) back to those untouched functions (computeNetwork0, weightedAvgElliottMul5_m16, e0_m16, e1_m16, e2_m16, dotProd)
now I got these:
https://github.com/IFeelBloated/NNEDI3SF/blob/master/NNEDI3SFSSE2.cpp
https://github.com/IFeelBloated/NNEDI3SF/blob/master/nnedi3.asm
https://github.com/IFeelBloated/NNEDI3SF/blob/master/x86inc.asm
but how can I connect them all together, tried lots of things in MSVC and all failed...

jackoneill
21st August 2015, 11:33
Can't you just incorporate your changes into nnedi3, without removing anything?

feisty2
21st August 2015, 12:00
wish I could, but.. I just don't have that much programming skill to do that, I'll simply strip the filter down piece by piece till the core function is the only thing left, then I'll modify it as how I want it to work and fix errors till it actually works..
it's sort of hard to get those removed back and make sure they will be okay along with the modified functions :(

feisty2
21st August 2015, 13:25
well, I couldn't get your asm file to work but this (https://github.com/jpsdr/NNEDI3/blob/master/nnedi3/nnedi3_asm_x64.asm) works perfectly...
so I did a (partial) SSE2 opt

jackoneill
8th September 2015, 11:29
v6 is here. (https://github.com/dubhater/vapoursynth-nnedi3/releases/tag/v6)


* Normalise the frame rate when doubling it.
* Add support for 32 bit floating point images.
* Deprecate the 'Y', 'U', and 'V' parameters in favour of the standard 'planes' parameter.
* Only allocate temporary buffers for the planes that are actually processed.
* Remove the nnedi3_rpow2 filter.

And a change that affects only (some) users of ARM CPUs:
* Add NEON intrinsics (translated from the x86 ASM). They make it go fast(er).


feisty2: Thanks for finding all the places in the code that needed changes to support floats. It made my job easier, even if I didn't use any of your code.

The Y, U, and V parameters still work, so your existing scripts won't break, but please don't use them in new scripts.

I removed nnedi3_rpow2 because it really should be a Python script, rather than a filter. And also to make it someone else's problem.
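
As a rough illustration of the first item in the changelog, here is a small Python sketch of doubling a frame rate and reducing it to lowest terms. Reading "normalise" as "reduce the fraction" is an assumption, double_frame_rate is a made-up helper, and nothing here is taken from the plugin's source.

```python
from fractions import Fraction

def double_frame_rate(fps_num, fps_den):
    """Return (num, den) for twice the given frame rate, reduced to lowest terms."""
    doubled = Fraction(fps_num, fps_den) * 2
    return doubled.numerator, doubled.denominator

print(double_frame_rate(30000, 1001))  # (60000, 1001)
print(double_frame_rate(25, 2))        # (25, 1)
```

Without the reduction, doubling 25/2 by multiplying only the numerator would give 50/2, which is the same rate written in a non-normalised form.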

feisty2
8th September 2015, 11:46
wow! cool, I can throw my rough nnedi3 in the trash can now. Will you add floats to mvtools too, or should I keep my own floating point mvtools?

jackoneill
8th September 2015, 12:56
MVTools is much more awful, so no.

feisty2
8th September 2015, 13:05
http://i.imgur.com/5Xmeovw.png

import vapoursynth as vs
core = vs.get_core()
clp = xxx
clp = core.fmtc.bitdepth(clp, fulls=False, fulld=True, bits=32)
clp = core.std.ShufflePlanes(clp, planes=0, colorfamily=vs.GRAY)
clp = core.nnedi3.nnedi3(clp, 0, True)
clp.set_output ()

okay, I just replaced all "YUV" stuff with "planes"
but this new version is... somehow buggy
it gives me things like that image above at both uint16_t and float