Jinc Resizer - Avisynth Plugin [v0.1.1] [Archive]

innocenat

26th November 2013, 07:32

So I know a lot of people (including me) want the new Jinc resizer from madVR in Avisynth. So after some research and some help from madshi, I present the Jinc Resizer.

This plugin requires Avisynth 2.6.

Download (https://github.com/innocenat/jinc-resize/releases)
GitHub repo (https://github.com/innocenat/jinc-resize)

IMPORTANT: Downscaling isn't currently implemented.

Changelog

v0.2
- Core rewritten to use a quantized lookup table for coefficients
- Support SSE2, SSE3, AVX2 and FMA3.
- Basically, MUCH faster.

v0.1.1
- Binary now compiled with ICC14
- More optimized code, should run much faster.
- Thanks to tp7 and others for optimization tips.

v0.1
- Initial release

Documentation
The plugin currently exposes 4 functions:

Jinc36Resize(clip, int width, int height)
Jinc64Resize(clip, int width, int height)
Jinc144Resize(clip, int width, int height)
Jinc256Resize(clip, int width, int height)

Crop-style syntax like in the internal resizers also works. You can control the sub-pixel quantization with the quant_x and quant_y parameters; the default is 256 for both.

Note

It should be fast enough for encoding now, especially if you have a Haswell CPU or newer. The quality without an anti-ringing filter is debatable, though.

Roadmap

Add anti-ringing filter
Possibly add OpenCL version

burfadel

26th November 2013, 07:51

Looks interesting :) Will be looking forward to when the stuff in the roadmap is completed.

If making an OpenCL version, it would probably be worth adding related things like sharpen and deband of some description. I know Madshi is currently implementing debanding in MadVR. I suggest this because a small amount of sharpening and debanding can be a good thing when resizing, and might as well get that done on the graphics side of things.

PetitDragon

26th November 2013, 09:01

OMG! This is just f*cking great!:thanks:

innocenat

26th November 2013, 10:53

v0.1.1 released, with more optimization; it should run at least two times faster.

Gavino

26th November 2013, 11:28

Thanks for this, innocenat - a useful addition.

I notice that only planar YUV clips are currently supported.
Will support for YUY2 and RGB be included eventually?

Also worth noting it is a 2.60 plugin only.
Any reason why it couldn't be made to work on 2.58 too?

innocenat

26th November 2013, 11:37

I notice that only planar YUV clips are currently supported.
Will support for YUY2 and RGB be included eventually?

I don't think I will add support for YUY2 and RGB data soon. The problem is that interleaved formats are hard to load quickly in SIMD. And the filter is already slow as it is.

Also worth noting it is a 2.60 plugin only.
Any reason why it couldn't be made to work on 2.58 too?

First, a lot of things I use differ between the 2.5 header and the 2.6 header, such as accepting floating point parameters, getting subsampling details, etc. Not that it couldn't be done, but unless there are actually people who want it, I don't think it's worth the effort. Also, there might be AVX optimization in the future, and you'd require Avs+ or post-a5 Avisynth 2.6 anyway.

Second, I think most users of Jinc Resizer will be hardcore users and will probably be using 2.6 already anyway.

Keiyakusha

26th November 2013, 11:42

After putting it into plugins directory, I'm getting this, even if JincResizer is not used:

The program can't start because libmmd.dll is missing from your computer. Try reinstalling the program to fix this problem.

innocenat

26th November 2013, 11:52

After putting it into plugins directory, I'm getting this, even if JincResizer is not used:

Sorry, binaries updated.

Keiyakusha

26th November 2013, 12:14

Well, for me on core i7, when doing 720x480 -> 1920x1080, Jinc36 is 2-3 times (depending on content complexity due to prescreener) slower than icl-compiled nnedi3 with fturn (4x upscale + bicubic downscale) and the visual quality is obviously worse.
Edit: nnedi was set to use only 1 thread. I hope I'm not sleeping again, cause as of late I'm getting weird results when trying new avisynth resamplers ^^

turbojet

26th November 2013, 12:24

Thanks for this, do you plan on implementing downsizing?

innocenat

26th November 2013, 13:34

@turbojet

Downscaling will be added later, that's for sure.

@Keiyakusha

Yes, nnedi3_rpow2 should be a much better choice right now. Maybe until I implement an AR filter and add a prescreener and/or OpenCL version.

Keiyakusha

26th November 2013, 14:05

Yes, nnedi3_rpow2 should be a much better choice right now. Maybe until I implement an AR filter and add a prescreener and/or OpenCL version.
Mmm... but if this is exactly the same stuff as in madvr, it won't be able to beat nnedi quality-wise even with AR. It simply doesn't connect lines as well. And judging from the feedback in the madvr thread, Jinc downscale sux and sticking to something like spline36 is a better idea. Also, now that we have OpenCL nnedi3, do you think Jinc will still be competitive? So far Jinc performs exactly as I was afraid of...

innocenat

26th November 2013, 14:10

Mmm... but if this is exactly the same stuff as in madvr, it won't be able to beat nnedi quality-wise even with AR. It simply doesn't connect lines as well. And judging from the feedback in the madvr thread, Jinc downscale sux and sticking to something like spline36 is a better idea. Also, now that we have OpenCL nnedi3, do you think Jinc will still be competitive? So far Jinc performs exactly as I was afraid of...

It's nice to have options, after all.

Personally, I don't care if people will use it over nnedi3ocl; coding this is fun, and helps me learn a lot of things.

mandarinka

26th November 2013, 20:30

Oh, this could be very interesting.

Correct me if I am wrong, but this really shouldn't be that slow, no? Since it is still just a resampler, not a complex interpolator like nnedi variants and other EDI methods, I would expect it to be faster.

madshi

26th November 2013, 20:47

Jinc has to be slower than other linear resamplers: because of how it works (it can't be split into 2 separate passes), it has to read and process more source pixels. E.g. compared to Lanczos3, Jinc3 has to read and process 4x as many source pixels.

One thing to keep in mind is that NNEDI3 can only do exact 2x enlargements while Jinc can handle any up- and downscale factor you want.

innocenat

27th November 2013, 03:18

In addition to what madshi said, other traditional resamplers can also be normalized to integer operations and have their coefficients cached prior to actual resampling.

Jinc and other 2D resamplers unfortunately can't, and have to calculate everything in floating point without any coefficient cache.

cretindesalpes

27th November 2013, 09:57

BlankClip (width=640, height=480, pixel_type="YV12")
ShowFrameNumber ()
Jinc64Resize (1920, 1080)

The result is often corrupted:
http://s29.postimg.org/reh24qtcz/jincresize.jpg (http://postimg.org/image/reh24qtcz/)
Sometimes it works, but there is no obvious pattern.

I quickly hacked a version with precalculated coefficients (https://ldesoras.fr/src/avs/JincResize-0.1.1-quick-hack.zip), compiled with MSVC 2012. It’s faster but very memory hungry (don’t even think about Jinc256 on a HD frame). Enable the precalculation with table=true. However the memory usage can be easily reduced when the scale is a rational fraction or by accepting a small phase error (sub-pixel shift), which would allow using the same set of coefficients for different locations.

EDIT: Second attempt (https://ldesoras.fr/src/avs/JincResize-0.1.1-quick-hack2.zip) with position quantization (1/256). Much more memory-friendly and faster. Use pquant=true to enable the quantization.
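The position quantization idea can be sketched roughly as below. This is not the code from the linked archive: the half-pixel center mapping, the truncation to a multiple of 1/quant and the function names are all assumptions made for illustration. The point is only that output pixels sharing a quantized phase can share one set of coefficients.

```python
import math

def quantized_phase(dst_index, scale, quant=256):
    """Map one output sample to (integer source pixel, phase index) on one axis.

    Assumptions of this sketch: pixel centers sit at half-integer positions,
    and the fractional source position is truncated to a multiple of 1/quant.
    """
    src = (dst_index + 0.5) / scale - 0.5   # source coordinate of the output sample
    base = math.floor(src)                   # nearest source pixel at or below it
    phase = int((src - base) * quant)        # quantized sub-pixel offset
    return base, min(phase, quant - 1)

def distinct_phases(dst_size, scale, quant=256):
    """Set of phase indices actually used along one axis."""
    return {quantized_phase(i, scale, quant)[1] for i in range(dst_size)}

if __name__ == "__main__":
    # Rational scale (640 -> 1920): very few phases repeat over the whole axis.
    print(sorted(distinct_phases(1920, 3.0)))
    # Arbitrary scale: the count is still capped by quant.
    print(len(distinct_phases(1920, 1920 / 720)))
```

With an exact 3x scale only three phases occur per axis, which is the rational-fraction case mentioned above; for other ratios the number of coefficient sets per axis is bounded by the quantization step rather than by the frame size.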

innocenat

27th November 2013, 10:27

Yes, a Jinc36Resize on a 1920x1080 frame would produce roughly 400MB worth of coefficients (49 coeff * 4 bytes (float) * 1920x1080 pixels). That's why I don't even think about doing that. I guess I could add an option for that. The precalculated coeff version could be simplified further by using integer operations instead of floating point (accepting a small rounding error).
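Multiplying out the factors quoted above shows the size of a per-pixel coefficient table for a single full-resolution plane. The comparison with a phase-quantized table is an assumption of this sketch (one coefficient set per quant_x * quant_y phase pair); the plugin's actual table layout may differ.

```python
def coeff_table_bytes(sets, coeffs_per_set, bytes_per_coeff=4):
    """Bytes needed to store `sets` coefficient sets of float32 weights."""
    return sets * coeffs_per_set * bytes_per_coeff

# One coefficient set per output pixel of a 1920x1080 plane, 49 floats each.
per_pixel = coeff_table_bytes(1920 * 1080, 49)

# Hypothetical alternative: one set per quantized (x, y) phase pair, 256 steps per axis.
per_phase = coeff_table_bytes(256 * 256, 49)

if __name__ == "__main__":
    print(f"per-pixel table: {per_pixel / 1e6:.1f} MB")
    print(f"per-phase table: {per_phase / 1e6:.1f} MB")
```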

Regarding the error, I guess it's because of some reckless assumption I made concerning memory layout. Will have to investigate further.

Nevilne

27th November 2013, 10:44

Anti-ringing filter:

Function nrJinc36Resize(clip input, int "target_width", int "target_height", float "src_left", float "src_top", float "src_width", float "src_height"){

Assert( input.IsPlanar(), "nrJinc36Resize: only planar color spaces are supported!" )

target_width = Default( target_width, width(input) )
target_height = Default( target_height, height(input) )
src_left = Default( src_left, 0 )
src_top = Default( src_top, 0 )
src_width = Default( src_width, 0 )
src_height = Default( src_height, 0 )

return input.Jinc36Resize(target_width, target_height, src_left, src_top, src_width, src_height)
\ .Repair(input.GaussResize(target_width, target_height, src_left, src_top, src_width, src_height, p=100), 1)

}

Gavino

27th November 2013, 11:17

src_width = Default( src_width, 0 )
src_height = Default( src_height, 0 )

The built-in defaults for src_width and src_height are not zero, they are respectively width(input) and height(input). In the case where src_left or src_top is non-zero, this gives a different result.

If you want your function to have the same defaults as the actual resizers, the simplest way is just to pass on the function parameters src_left, src_top, src_width and src_height unchanged. That way, you don't even need to know what the actual default is.

NicolasRobidoux

9th January 2014, 21:06

...
And judging from the feedback in madvr thread, Jinc downscale sux and sticking to something like spline36 is a better idea.
...
I don't know, specifically, how it performs with video content, but fully implemented (possibly slightly deblurred, as it is in madVR---where it only upsamples---and by the EWA LanczosSharp method of ImageMagick---where it can do everything) Jinc-windowed Jinc 3 lobe is an amazing downsampler with digital photographs, esp. if used through linear light.

NicolasRobidoux

9th January 2014, 21:12

I know I am preaching for my own church of one, but a cheaper EWA filter that works fairly well, at least for downsampling, is the Robidoux filter, which is the EWA default (chosen by Anthony Thyssen, the dev. responsible for resampling, not by me) in ImageMagick. Given that it uses a disk of radius 2 (instead of a bit more than 3 like most EWA Lanczos variants) and that the coefficients are computed with a Keys spline, it should run quite a bit faster.
Don't get me wrong: With good quality input content (and output not overly compressed destructively), EWA Lanczos and close relatives give better results with the types of images I work with (not video). But EWA Robidoux may give good bang for the buck.

madshi

9th January 2014, 21:14

Jinc downscaling is just not (yet) implemented in madVR. It might be in the future. I have no specific opinion about how it performs yet. I'll cross that bridge when I come to it.

NicolasRobidoux

9th January 2014, 21:15

If you base your look up table on the square of the distance, you should be able to live with a 1D array.
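A minimal sketch of that suggestion, assuming a radially symmetric kernel: tabulate the weight at evenly spaced values of r^2, so that the lookup for an offset (dx, dy) needs only dx*dx + dy*dy and no square root. The cone kernel here is a placeholder chosen to keep the example short, not the Jinc function, and the names and table size are made up for illustration.

```python
import math

def build_lut(kernel, radius, samples=1024):
    """Tabulate kernel(r) at evenly spaced r^2 in [0, radius^2].

    Returns the table and the factor that converts r^2 into a table index.
    """
    r2_max = radius * radius
    table = [kernel(math.sqrt(r2_max * i / (samples - 1))) for i in range(samples)]
    return table, (samples - 1) / r2_max

def lookup(table, index_scale, dx, dy):
    """Weight for the offset (dx, dy); zero outside the tabulated disk."""
    r2 = dx * dx + dy * dy                 # squared distance, no sqrt needed
    idx = int(r2 * index_scale + 0.5)      # nearest table entry
    return table[idx] if idx < len(table) else 0.0

def cone(r, radius=3.0):
    """Placeholder radial kernel (linear falloff), used only for this sketch."""
    return max(0.0, 1.0 - r / radius)

if __name__ == "__main__":
    table, index_scale = build_lut(cone, 3.0)
    print(lookup(table, index_scale, 1.0, 1.0), cone(math.sqrt(2.0)))
```

One side effect of indexing by r^2 is that the table is sampled more densely in r near the edge of the disk than near the center, which may or may not suit a given kernel.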

NicolasRobidoux

9th January 2014, 21:21

It should be possible to base a pretty nice AR (applied to Jinc-windowed Jinc 3-lobe, deblurred or not) on the key components of the bicubic interpolator LBB (Locally Bounded Bicubic) built into the NoHalo method (found in the open source GEGL and VIPS libraries).

innocenat

11th January 2014, 05:28

You should really consider editing your post instead of adding new reply.

Downscale isn't implemented because it requires a dynamic disk size. The larger the downscale ratio, the larger the disk needs to be. Compared to upscale, where the disk size is constant, this poses more problems for optimization. The performance, at least for this filter, is directly proportional to disk size, because the coefficients are calculated with a LUT.
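To put rough numbers on the disk-size point, here is a sketch under two assumptions that are not taken from the plugin source: the kernel radius is stretched by 1/scale when downscaling (the usual convention for resampling filters), and the samples are counted over the square window that bounds the disk.

```python
import math

def ewa_support(base_radius, scale):
    """Return (radius in source pixels, window side, samples per output pixel).

    scale > 1 is an upscale, scale < 1 a downscale.
    """
    radius = base_radius * max(1.0, 1.0 / scale)  # constant when upscaling
    side = 2 * math.ceil(radius)                  # bounding window edge length
    return radius, side, side * side

if __name__ == "__main__":
    for scale in (2.0, 1.0, 0.5, 0.25):
        radius, side, samples = ewa_support(3.0, scale)
        print(f"scale {scale}: radius {radius}, window {side}x{side}, {samples} samples")
```

Under these assumptions a 3-lobe kernel reads 36 samples per output pixel for any upscale, 144 at a 2x downscale and 576 at a 4x downscale, so the per-pixel cost grows with the square of the downscale ratio.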

What I am currently working on:
- Integrating the quantized-table-lookup approach from cretindesalpes, without using STL containers like he does.
- Figuring out the deringing algorithm.
- Figuring out what causes the non-deterministic behaviour (sometimes it produces wrong results)

I will consider other EWA-based scaling too; the code is written in such a way that it's easy to add other resizing kernels.

The reason I have been so slow on this is real life, and I have been spending my free time optimizing the Avisynth+ internal resizers.

NicolasRobidoux

11th January 2014, 15:48

Downscale isn't implemented because it requires a dynamic disk size. The larger the downscale ratio, the larger the disk needs to be. Compared to upscale, where the disk size is constant, this poses more problems for optimization. The performance, at least for this filter, is directly proportional to disk size, because the coefficients are calculated with a LUT.
I am fully aware that the main bottleneck is memory traffic when downsampling. A reasonably elegant solution, based on power of 2 box filtered mipmaps, is built into the GEGL EWA components of the filters NoHalo and LoHalo. Whether this approach could pay off within Avisynth, I have no idea. It obviously depends on whether (local) mipmaps can be created and used efficiently.

NicolasRobidoux

11th January 2014, 16:02

...
- Figuring out the deringing algorithm.
...
It is my hunch that anti-ringing is much less useful when downsampling. At least, much less useful than going through linear light.

Seedmanc

12th January 2014, 13:53

Is it better than spline144resize?

Asmodian

15th January 2014, 05:50

Depends of course. I would say yes but you need to say what you mean by "better". Jinc is not sharper but it has lower aliasing and ringing.

innocenat

15th January 2014, 05:53

Jinc is not sharper but it has lower aliasing and ringing.

lolno. Jinc (and all lanczos-based resizers) have hallelujah ringing. But less aliasing, yes.

Asmodian

15th January 2014, 19:45

Ah it must have been madshi's anti-ringing filter which came out at a similar time that gave me the impression Jinc had less ringing. :o

innocenat

27th July 2014, 16:43

Long time no update. I present v0.2. Thanks to cretindesalpes' idea, it now runs much faster.

I have rewritten the core to use only quantized, table-lookup coefficients. Even though this is very slightly less accurate, it's not noticeable at all and it runs MUCH faster: on my Haswell laptop the 360p->720p upscale with Jinc36Resize runs at nearly 40fps. It now has acceleration for SSE2, SSE3, AVX2 and FMA3. While FMA3 offers a very slight to unnoticeable advantage over AVX2, AVX2 is a huge leap over SSE3, which is also a huge leap over SSE2. The non-deterministic behaviour should also have disappeared.

Sorry, anti-ringing filter is not yet implemented.

I have also added Jinc144Resize, which is a 6-tap filter. You can control quantization with the quant_x and quant_y options. The default is 256 for both values.

zerowalker

30th July 2014, 20:26

What's the main difference between this and Lanczos / Spline?

I haven't used MadVR much so haven't got any experience with the options there.

DarkSpace

30th July 2014, 20:59

I think the answer you're looking for is "Lanczos / Spline scale width and height separately, while Jinc uses some sort of elliptical weighted averaging, which means it scales both width and height in one step" (it's also called EWA Lanczos with certain parameters).
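A small numeric sketch of that difference, using unwindowed sinc and jinc for brevity (real resizers window both kernels, and none of this is taken from the plugin). A separable filter weights a source pixel by the product of two 1D weights, while a radial (EWA-style) filter weights it by its distance alone. J1 is evaluated from its integral form so that only the standard library is needed.

```python
import math

def bessel_j1(x, steps=200):
    """J1(x) = (1/pi) * integral over [0, pi] of cos(t - x*sin(t)) dt, trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * math.cos(t - x * math.sin(t))
    return total / steps                            # equals total * h / pi

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def jinc(r):
    """Normalized so that jinc(0) == 1."""
    return 1.0 if r == 0 else 2.0 * bessel_j1(math.pi * r) / (math.pi * r)

def separable_weight(dx, dy):
    return sinc(dx) * sinc(dy)          # one 1D weight per axis

def radial_weight(dx, dy):
    return jinc(math.hypot(dx, dy))     # depends on distance only

if __name__ == "__main__":
    s = math.sqrt(0.5)  # (s, s) lies at distance 1, the same as (1, 0)
    print("separable:", separable_weight(1.0, 0.0), separable_weight(s, s))
    print("radial:   ", radial_weight(1.0, 0.0), radial_weight(s, s))
```

Two source pixels at the same distance from the output sample get the same radial weight, whereas the separable product treats the axis-aligned and diagonal neighbours differently. The price is that the radial form cannot be split into a horizontal and a vertical pass.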

zerowalker

30th July 2014, 23:13

And, how does that change the result?
I mean, what is the point of this?

innocenat

31st July 2014, 01:42

And, how does that change the result?
I mean, what is the point of this?

Please read the thread. It has already been answered.

zerowalker

31st July 2014, 02:34

Guess i am blind or assuming a different kind of answer, as i can't actually find anything.
I just see the differences in how it calculates, but not the actual difference in the results qualitywise.

Unless perhaps this is the answer:
lolno. Jinc (and all lanczos-based resizers) have hallelujah ringing. But less aliasing, yes

madshi

31st July 2014, 07:05

Jinc is a bit softer than Lanczos, has a bit less ringing than Lanczos (but still some), but has noticeably less aliasing. Jinc has a more "analog" look to it. Even the ringing looks more natural compared to Lanczos. But some people prefer Lanczos because it's a bit sharper. Also Jinc is much slower than Lanczos because Jinc can't separate X and Y scaling operations.

Here's a comparison image from madVR, upscaling Monsters AG to 4K:

http://madshi.net/madVR/monsters.png

jpsdr

31st July 2014, 07:09

After that, it's a matter of personal choice, but mine goes to NNEDI3.

madshi

31st July 2014, 07:28

NNEDI3 is quite great, IMHO. It does have disadvantages, too, though. E.g. in some image areas (trees, grass, leaves) it can produce fractal like artifacts. That doesn't happen with Jinc. Also NNEDI3 is a lot slower than Jinc, when using madVR. I'm not sure how speed compares in AviSynth, though. Finally, NNEDI3 can only upscale by exactly 2.0x, while Jinc can up and downscale with any factor you like. So although NNEDI3 is great, there's a place for Jinc, too, IMHO.

zerowalker

31st July 2014, 18:03

Hmm, well guess i say the same, Lanczos looks similar, but has more artifacts.
Jinc looks less detailed, so softer indeed.

But i think it depends on content, some things look better in a soft detailed scale, and others need that sharp edge.

innocenat

31st July 2014, 18:33

Also NNEDI3 is a lot slower than Jinc, when using madVR. I'm not sure how speed compares in AviSynth, though.

In my test (on Mobile Haswell, which uses AVX2 that is ~25% faster than SSE3 on the same CPU), Jinc36 720p->1080p runs at ~20fps. nnedi3 doubling followed by spline64 to 1080p averages 3-5fps with the prescreener on, <1fps with it off. I can't test the OpenCL version because I can't force it to use my external graphics over the integrated one (I'm on an Optimus setup).

So yeah, right now performance is MUCH better. But tbh, because of the current optimization I don't think it will be nearly as fast on CPU when the anti-ringing filter is implemented, due to the branchy nature of the filter. I am looking into OpenCL right now, but no promises since I don't have much free time nowadays.

Reel.Deel

1st August 2014, 14:13

Hi innocenat,

Thanks for the update. I was wondering what's the purpose of the version parameter? When I set it to true it gives this message.
[Jinc Resizer] [7] Compiled Instruction Set: FMA3 AVX2 SSE3 SSE2 x86

innocenat

1st August 2014, 14:32

It currently shows Jinc's internal CPU flag (the [7]) and the instruction sets it was compiled with. I was not sure at first if AVX2 would really be faster, so I made this mechanism to tell which version of the plugin you have. The internal CPU flag exists because Avisynth and Avisynth+ cannot currently detect FMA3 and AVX2. Granted, the parameter's name is a misnomer.

Reel.Deel

1st August 2014, 14:36

Thanks for the information. One more question if you don't mind. What's the license for JincResize? Apache 2.0 license? The reason I'm asking is because I want to add JincResize to the wiki.

innocenat

1st August 2014, 14:44

Put it as Apache 2.0 I guess.

The Jinc function calculation (JincFilter.cpp) is Apache 2.0 since it is from ImageMagick. The main resampling code I wrote (EWAResizer.h, FilteredEWAResize.cpp, etc.) is also under MIT license. But the combination (i.e. the project itself) is under Apache 2.0. I guess I should put a LICENSE file in the repository.

On a side note, you might encounter line artefacts with large upscaling factors. They can be fixed by increasing the quant_(x|y) option, depending on which direction the line runs. I am still not sure if this is a bug in my code or a limitation of quantization. I think it's the former, but I can't pinpoint it yet.

The code on GitHub actually supports downscaling now, but I haven't thoroughly checked/tested it for correctness yet.

Groucho2004

1st August 2014, 23:16

I decided to play around with this a little which usually includes getting the code and compiling it myself. I used VC10/ICL13 with PGO to build the DLL.

Test script:
SetMemoryMax(1700)
LoadPlugin("JincResize.dll")

w = 1280
h = 720
colorbars(width = w, height = h, pixel_type = "yv12").killaudio().assumefps(24000, 1001)
trim(0,99)
fadeio(49)
trim(0,99)
v = last

a = v.Jinc36Resize(1920, 1080)
b = v.Jinc64Resize(1920, 1080)
c = v.Jinc144Resize(1920, 1080)
d = v.Jinc256Resize(1920, 1080)

return a++b++c++d

Results with innocenat's DLL:
Frames processed: 400 (0 - 399)
FPS (min | max | average): 3.965 | 18.09 | 7.212
CPU usage (average): 25%
Thread count: 1
Physical Memory usage (peak): 1328 MB
Virtual Memory usage (peak): 1327 MB
Time (elapsed): 000:00:55.463

Results with my DLL:
Frames processed: 400 (0 - 399)
FPS (min | max | average): 3.998 | 18.29 | 7.263
CPU usage (average): 25%
Thread count: 1
Physical Memory usage (peak): 644 MB
Virtual Memory usage (peak): 643 MB
Time (elapsed): 000:00:55.075

This was tested on an i5 2500K @ 4GHz (on XP, so AVX was not used.)

The speed is more or less the same but the memory usage is less than half with the DLL I built, no idea why.

FYI, Here is the makefile I used to build the DLL:
CPP=@icl.exe
CPP_FLAGS=/MT /EHa /W0 /O3 /Qipo /arch:IA32 /Qprof-use /D "NDEBUG" /nologo

LINK=@xilink.exe
LINK_FLAGS=/dll /nologo

JincResize.dll: JincFilter.obj AvisynthEntry.obj cpuid.obj FilteredEWAResize.obj
	$(LINK) $(LINK_FLAGS) JincFilter.obj AvisynthEntry.obj cpuid.obj FilteredEWAResize.obj /out:JincResize.dll

JincFilter.obj: JincFilter.cpp
	$(CPP) $(CPP_FLAGS) JincFilter.cpp -c

AvisynthEntry.obj: AvisynthEntry.cpp
	$(CPP) $(CPP_FLAGS) AvisynthEntry.cpp -c

cpuid.obj: cpuid.cpp
	$(CPP) $(CPP_FLAGS) cpuid.cpp -c

FilteredEWAResize.obj: FilteredEWAResize.cpp
	$(CPP) $(CPP_FLAGS) FilteredEWAResize.cpp -c

innocenat

2nd August 2014, 01:22

I decided to play around with this a little which usually includes getting the code and compiling it myself. I used VC10/ICL13 with PGO to build the DLL.

I use VC12/ICL14 right now. I am surprised it works on WinXP, though, since I did not select vc120_xp as a base platform, though it is statically compiled.

This was tested on a i5 2500K @ 4GHz (on XP, so AVX was not used.)

There is no AVX code anymore, it's AVX2 only now so you require Haswell. I might try to see if SSE2 integer pack/unpack and AVX processing is faster than pure SSE3, but I doubt that.

The speed is more or less the same but the memory usage is less than half with the DLL I built, no idea why.

This commit (https://github.com/AviSynth/jinc-resize/commit/ef39dab27340a82e6d8762a5712af3842f8ef811) is not in the release build yet.

EDIT: Also, FYI my official builds are built with /arch:SSE, but important functions are set by #pragma to a specific instruction set anyway (which is SSE minimum)

Groucho2004

2nd August 2014, 10:11

This commit (https://github.com/AviSynth/jinc-resize/commit/ef39dab27340a82e6d8762a5712af3842f8ef811) is not in the release build yet.
I see, that might explain the difference.

Also, FYI my official builds are built with /arch:SSE, but important functions are set by #pragma to a specific instruction set anyway (which is SSE minimum)
Just checked the ICL13 documentation, "arch:IA32" is the same as "arch:SSE".