View Full Version : Could Jensen-Data Dependent resizing algorithms be implemented in Avisynth?


Chainmax
10th May 2007, 01:13
A couple of days ago, I asked a very knowledgeable person about worthwhile interpolation algorithms for use in video editing. He said that the best compromise between speed and quality would be J-DDL3 (i.e., Jensen-Data Dependent Lanczos3), since more complex algorithms like Jensen-Zhao-Xin-Li and Modified B-Spline are too slow and won't bring much extra benefit: DVDs don't provide good enough image quality to take advantage of better methods. So I was wondering whether there are publicly available papers or algorithms for Jensen-Data Dependent interpolation that the devs here could use to implement J-DDL4 or J-DDLSP36 (i.e., with Spline36).

*.mp4 guy
11th May 2007, 04:17
My 2 cents: if someone can implement Jensen-****, you would probably want to use some basic masking to restrict it to high-contrast areas, since it can screw over fine texture really badly.

Chainmax
12th May 2007, 02:26
The hardest part will probably be finding papers on it; I couldn't find anything so far :(. In any case, could TMMv1's masking be used for what you're describing?

*.mp4 guy
12th May 2007, 04:44
The masking will be really, really easy. I was thinking of a range edge mask via MaskTools plus two inflates, which should be more than good enough. The only difficult thing will be finding someone who can (and wants to) write the actual plugin.
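
For readers who want to see what "a range edge mask plus two inflates" amounts to, here is a rough sketch in plain Python. It assumes that "range" means the local maximum minus the local minimum in a 3x3 window and that an inflate replaces a pixel with the average of its neighbours when that average is higher. The names range_mask, inflate and edge_mask are made up for this illustration, and the real MaskTools filters may differ in border handling and rounding.

```python
def range_mask(img):
    """3x3 range (local max minus local min) of a 2-D list of 8-bit values."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The window is simply cropped at the image borders.
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = max(window) - min(window)
    return out


def inflate(mask):
    """Raise each pixel to the mean of its neighbours when that mean is higher."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            neigh = [mask[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))
                     if (j, i) != (y, x)]
            out[y][x] = max(mask[y][x], sum(neigh) // len(neigh))
    return out


def edge_mask(img, passes=2):
    """Range mask followed by `passes` inflates (two, as suggested in the post)."""
    mask = range_mask(img)
    for _ in range(passes):
        mask = inflate(mask)
    return mask
```

On a hard vertical step the range mask marks only the two columns touching the edge; the inflates then spread a softer falloff outwards, so the edge-directed resizer would blend in gradually rather than switch on abruptly.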

Chainmax
14th May 2007, 22:07
I found the paper. Hopefully some of you devs will want to take this on :).

[link removed, didn't work]

IanB
15th May 2007, 01:51
http://www.bestsharing.com/files/imU8CTQ276607/subpixel%20edge%20localization%20and%20interpolation.pdf.html

404 Not Found Error

Chainmax
15th May 2007, 03:08
That's weird, I didn't get any error message when uploading it there.

I put it on some free webspace I have available, you can download it [link removed]. Please let me know when you get it so I can delete it and free up the space.

[edit]
IanB, foxyshadis: please mirror the PDF somewhere so that others can take a look at it.

IanB
15th May 2007, 08:20
Got it.

Now we need a paper on how to implement it fast. On first pass this looks like it relies on a stack of trig calcs to do its thing.

foxyshadis
15th May 2007, 12:34
I was going to point towards an image interpolation forum aruzinsky would post at, but it seems to have disappeared from the web (or at least from Google). He often posts small implementation details, though he carefully avoids any details that could lead someone to compete with SAR. Much of what's on dpreview is just flamewars (he's not the most tactful guy...).

Chainmax
15th May 2007, 22:45
IanB: how slow would you estimate it to be? Would a preliminary vanilla implementation be much slower than, say, MCBob?

MfA
16th May 2007, 00:16
I don't see why modified B-spline would have to be slow; it's just linear ... so it's only going to be so-so.

IanB
16th May 2007, 00:42
How slow? No idea, but even one sin and one cos per group of 3x3 pixels could be unpleasant. I need to see a programmatic expression of the algorithm.

A further problem might be the stability of the edge detection from frame to frame. If it is not stable then you will get shimmering.

jmac698
16th May 2007, 01:36
Comparison of all popular methods: http://www.general-cathexis.com/interpolation.html

I like Xin-Li the best, otherwise known as NEDI (New Edge Directed Interpolation, http://www.csee.wvu.edu/%7Exinl/papers/NEDI.pdf). This has already been implemented in EDIUpsizer (http://web.missouri.edu/~kes25c/#c1).
I find the others either have aliasing/ringing or a painted brush-stroke look. I also find that the common quality measures are unrelated to subjective quality.

MfA
16th May 2007, 02:28
Mr. Aruzinsky didn't specify, but if the modified b-spline implementation is the one from here :
http://www.ee.bilkent.edu.tr/~signal/BCSP/gotchev.pdf

Then the implementation comes down to applying, in each dimension, two 2-tap IIR filters and a bog-standard 4-tap interpolation filter. That's not going to break the bank.

It's a variation on Michael Unser's b-spline interpolation, for which a good description can be found here :
http://bigwww.epfl.ch/publications/thevenaz9901.pdf

And code here :
http://bigwww.epfl.ch/thevenaz/interpolation/

The only problem is that the IIR filters depend on the interpolation filter, and the paper that contains the math to find them is illegible to me.
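
To make the "two small IIR filters plus a 4-tap filter" structure concrete, here is a 1-D sketch of the standard cubic B-spline scheme (the Unser-style one, not the modified variant, whose prefilter coefficients are exactly what is unknown here). It assumes mirror boundary conditions; the names Z1, prefilter, mirror and interpolate are illustrative and not taken from the linked code.

```python
import math

Z1 = math.sqrt(3.0) - 2.0  # pole of the cubic B-spline prefilter


def prefilter(samples):
    """Convert samples to cubic B-spline coefficients (mirror boundaries)."""
    n = len(samples)  # needs n >= 2
    c = [6.0 * s for s in samples]  # overall gain (1 - z)(1 - 1/z) = 6
    # Exact start value for the causal pass over the mirror-extended signal.
    start = c[0] + Z1 ** (n - 1) * c[n - 1]
    for k in range(1, n - 1):
        start += (Z1 ** k + Z1 ** (2 * n - 2 - k)) * c[k]
    c[0] = start / (1.0 - Z1 ** (2 * n - 2))
    for k in range(1, n):                # causal recursive pass
        c[k] += Z1 * c[k - 1]
    c[n - 1] = (Z1 / (Z1 * Z1 - 1.0)) * (Z1 * c[n - 2] + c[n - 1])
    for k in range(n - 2, -1, -1):       # anti-causal recursive pass
        c[k] = Z1 * (c[k + 1] - c[k])
    return c


def mirror(i, n):
    """Fold an out-of-range index back into 0..n-1 by mirroring."""
    period = 2 * n - 2
    i = abs(i) % period
    return period - i if i >= n else i


def interpolate(coeffs, x):
    """Evaluate the spline at position x with the 4-tap cubic B-spline kernel."""
    i = math.floor(x)
    t = x - i
    weights = (
        (1.0 - t) ** 3 / 6.0,
        (3.0 * t ** 3 - 6.0 * t ** 2 + 4.0) / 6.0,
        (-3.0 * t ** 3 + 3.0 * t ** 2 + 3.0 * t + 1.0) / 6.0,
        t ** 3 / 6.0,
    )
    n = len(coeffs)
    return sum(w * coeffs[mirror(i - 1 + k, n)] for k, w in enumerate(weights))
```

Each recursive pass costs one multiply and one add per sample, and the evaluation is a plain 4-tap weighted sum, which is why this class of method should be cheap. A 2-D resize would run the same passes along rows and then columns.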

I find the method described in the following paper more interesting though (same complexity) :
http://bigwww.epfl.ch/publications/condat0501.pdf

tritical
16th May 2007, 10:15
Here is what I gathered about complexity from reading the paper Chainmax linked to. For each pixel of the low-resolution image, it processes the surrounding 3x3 window. The window is convolved with 5 fixed 3x3 masks to calculate the variables lambda_0 to lambda_4. These are then used to calculate the angle of the edge, the distance to the edge, and the intensity values on either side of the edge, along with an error measure for the quality of fit. The calculations involve 1 arctan, 1 arccos, 2 cos, and 3 sin evaluations, along with some multiplications, additions, and subtractions. Once that is done you have the low-resolution edge map (the results from overlapping 3x3 windows are to be combined somehow). Resampling to another resolution then involves finding a weighted average of the previously calculated intensity values, based on whether the desired point lies completely on one side of an edge or is intersected by an edge. The weighted average is a function of distance from the edge and has a parameter that can be varied to control sharpening. That description leaves out some implementation details they mention, such as checking the center pixel's value against the calculated intensity values, requiring a minimum step magnitude for edges, etc. I think this could be pretty fast, assuming fast approximations to the trig functions are used.
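
As an illustration of what a "fast approximation to the trig functions" could look like, here is a sketch of a cheap arctangent. It uses a common low-order polynomial fit for atan on [-1, 1] and folds other inputs into that range. This is a generic technique chosen for the example, not something taken from the paper, and fast_atan and fast_atan2 are illustrative names.

```python
import math


def fast_atan(x):
    """Polynomial approximation of atan(x), intended for -1 <= x <= 1."""
    return (math.pi / 4) * x + 0.273 * x * (1.0 - abs(x))


def fast_atan2(y, x):
    """atan2 built on fast_atan by folding the ratio into [-1, 1]."""
    if x == 0.0 and y == 0.0:
        return 0.0
    if abs(x) >= abs(y):
        a = fast_atan(y / x)
        if x < 0.0:
            # Left half-plane: shift by pi towards the correct quadrant.
            a += math.pi if y >= 0.0 else -math.pi
        return a
    # |y| > |x|: use atan(y/x) = +-pi/2 - atan(x/y).
    return (math.pi / 2 if y > 0.0 else -math.pi / 2) - fast_atan(x / y)
```

The error of this fit stays within a few thousandths of a radian, which is likely far finer than the edge angle a noisy 3x3 window can deliver anyway. In a real plugin a lookup table indexed by the quantised gradient ratio would be another option.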

My question is just how much better this method is than bicubic or Lanczos. Specifically, their edge direction calculation (arctan of dy/dx, where dy and dx are computed using a simple 3x3 gradient operator that is somewhere between Prewitt and Sobel) seems very error-prone, especially for those edges that would benefit the most from non-linear resampling. Not to mention that a 3x3 window isn't large enough to fully capture many edges (in terms of accurate direction calculation). There isn't much in the way of results reported in the paper.
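
A small sketch of the kind of direction estimate being questioned here: the angle is the arctan of dy/dx from a 3x3 gradient operator whose centre weight w sits between Prewitt (w = 1) and Sobel (w = 2). The default w = sqrt(2) is only an assumption for the example, since the exact weights in the paper are not quoted in this thread, and edge_angle is an illustrative name.

```python
import math


def edge_angle(window, w=math.sqrt(2.0)):
    """Return (angle in radians, magnitude) of the gradient of a 3x3 window.

    `window` is three rows of three values; w is the centre weight of the
    operator (1.0 gives Prewitt, 2.0 gives Sobel).
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    dx = (c - a) + w * (f - d) + (i - g)   # right column minus left column
    dy = (g - a) + w * (h - b) + (i - c)   # bottom row minus top row
    return math.atan2(dy, dx), math.hypot(dx, dy)
```

With only nine samples, a change of a few grey levels in one corner pixel moves dx or dy noticeably on a weak edge, which is one way to see why the estimate could be unstable from frame to frame.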

Chainmax
16th May 2007, 12:43
That's probably what *.mp4 guy was referring to about it possibly screwing up fine textures. He suggested some masking, maybe TMMv1's code can be reused for that?

jmac698: of course there are better methods, like Jensen-Zhao-Xin-Li, but Mr. Ruzinsky told me anything more complicated than the Jensen Data Dependent methods would be overkill, as DVD sources just don't have enough image detail/quality to take advantage of them.

MfA: do you think Modified B-Spline would be implementable if we could find out how to implement the IIR filters (what's that?)?

jmac698
16th May 2007, 14:00
chain: see my comparison link and judge for yourself. I think the Jensen technique is inferior and that NEDI is best, and any statement about DVD not being good enough makes no sense. It's all relative; the exact resolution makes no difference, so you can't make any such statement about DVD. Sometimes people write these papers and claim the new technique is best because of measurements, but clearly the measurements are not reliable, and the result is a thick-ink-lines look that loses textures.

MfA
16th May 2007, 16:02
[quote=Chainmax]MfA: do you think Modified B-Spline would be implementable if we could find out how to implement the IIR filters (what's that?)?[/quote]
It's the same complexity as "standard" b-spline interpolation, shouldn't be too hard to do in real time.

The problem with NEDI is that it needs hack upon hacky hack to work robustly.

Chainmax
16th May 2007, 22:50
[quote=jmac698]chain: see my comparison link and judge for yourself. I think the Jensen technique is inferior and that NEDI is best, and any statement about DVD not being good enough makes no sense. It's all relative; the exact resolution makes no difference, so you can't make any such statement about DVD. Sometimes people write these papers and claim the new technique is best because of measurements, but clearly the measurements are not reliable, and the result is a thick-ink-lines look that loses textures.[/quote]

Well, what I posted is what Mr. Ruzinsky himself told me when I asked him about JZXL vs. JDDL. The differences in that comparison are visible because the source image is of extremely high quality (48-bit PNG). He also said that the best algorithms in SAR are really only worthwhile for Foveon sensor images.


MfA: so no B-Spline then. That's too bad.

*.mp4 guy
22nd May 2007, 02:20
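# 2x NEDI upsize via EDIUpsizer, followed by a half-pixel shift.
Function NEDIC(Clip Clp){
    NEDI = clp.EDIupsizer(method=9, mvar=0, constraint=0, threshold4=1024, threshold8=2048, threshold16=-1)#.NEDI()
    W = NEDI.Width()
    H = NEDI.Height()
    # Same output size, source window offset by -0.5 in x and y (presumably to re-centre the upsized image).
    NEDIC = NEDI.Spline36Resize(W, H, -0.5, -0.5, W, H)
    return (NEDIC)
}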


# 2x upsize that switches per pixel between NEDI and Lanczos at various tap counts.
function Lnedi(clip clp){

c = clp

dx = c.width*2
dy = c.height*2
#dx = default(dx, c.width*2)
#dy = default(dy, c.height*2)

# Candidate 2x upsizes: NEDI (n1) and Lanczos with 1 to 11 taps (t1..t11).
n1 = NEDIC(clp)
t1 = lanczosresize(clp, dx, dy, taps=1)
t2 = lanczosresize(clp, dx, dy, taps=2)
t3 = lanczosresize(clp, dx, dy, taps=3)
t4 = lanczosresize(clp, dx, dy, taps=4)
t5 = lanczosresize(clp, dx, dy, taps=5)
t6 = lanczosresize(clp, dx, dy, taps=6)
t7 = lanczosresize(clp, dx, dy, taps=7)
t8 = lanczosresize(clp, dx, dy, taps=8)
t9 = lanczosresize(clp, dx, dy, taps=9)
t10 = lanczosresize(clp, dx, dy, taps=10)
t11 = lanczosresize(clp, dx, dy, taps=11)

# Error maps: downsize each candidate back to source size and take |source - result| per pixel.
nm1 = yv12lutxy(clp, lanczosresize(n1, clp.width, clp.height, taps=5), "x y - abs", u=3, v=3)
m1 = yv12lutxy(clp, lanczosresize(t1, clp.width, clp.height, taps=5), "x y - abs", u=3, v=3)
m2 = yv12lutxy(clp, lanczosresize(t2, clp.width, clp.height, taps=4), "x y - abs", u=3, v=3)
m3 = yv12lutxy(clp, lanczosresize(t3, clp.width, clp.height, taps=3), "x y - abs", u=3, v=3)
m4 = yv12lutxy(clp, lanczosresize(t4, clp.width, clp.height, taps=2), "x y - abs", u=3, v=3)
m5 = yv12lutxy(clp, lanczosresize(t5, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m6 = yv12lutxy(clp, lanczosresize(t6, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m7 = yv12lutxy(clp, lanczosresize(t7, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m8 = yv12lutxy(clp, lanczosresize(t8, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m9 = yv12lutxy(clp, lanczosresize(t9, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m10 = yv12lutxy(clp, lanczosresize(t10, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)
m11 = yv12lutxy(clp, lanczosresize(t11, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

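# Merge steps: "x y - 256 *" is 255 wherever the first error map exceeds the second and 0 elsewhere
# (LUT output is clamped to 0..255); maskedmerge then takes its second clip where the mask is set.
cp1 = maskedmerge(t2, n1, lanczosresize(yv12lutxy(m1, m2, "x y - 256 *", u=3, v=3), dx, dy, taps=2))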
m12 = yv12lutxy(clp, lanczosresize(cp1, clp.width, clp.height, taps=5), "x y - abs", u=3, v=3)

cp2 = maskedmerge(cp1, t3, lanczosresize(yv12lutxy(m12, m3, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m13 = yv12lutxy(clp, lanczosresize(cp2, clp.width, clp.height, taps=4), "x y - abs", u=3, v=3)

cp3 = maskedmerge(cp2, n1, lanczosresize(yv12lutxy(m13, m4, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m14 = yv12lutxy(clp, lanczosresize(cp3, clp.width, clp.height, taps=3), "x y - abs", u=3, v=3)

cp4 = maskedmerge(cp3, t5, lanczosresize(yv12lutxy(m14, m5, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m15 = yv12lutxy(clp, lanczosresize(cp4, clp.width, clp.height, taps=2), "x y - abs", u=3, v=3)

cp5 = maskedmerge(cp4, n1, lanczosresize(yv12lutxy(m15, m6, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m16 = yv12lutxy(clp, lanczosresize(cp5, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

cp6 = maskedmerge(cp5, t7, lanczosresize(yv12lutxy(m16, m7, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m17 = yv12lutxy(clp, lanczosresize(cp6, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

cp7 = maskedmerge(cp6, n1, lanczosresize(yv12lutxy(m17, m8, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m18 = yv12lutxy(clp, lanczosresize(cp7, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

cp8 = maskedmerge(cp7, t9, lanczosresize(yv12lutxy(m18, m9, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m19 = yv12lutxy(clp, lanczosresize(cp8, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

cp9 = maskedmerge(cp8, n1, lanczosresize(yv12lutxy(m19, m10, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m20 = yv12lutxy(clp, lanczosresize(cp9, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

cp10 = maskedmerge(cp9, n1, lanczosresize(yv12lutxy(m20, m11, "x y - 256 *", u=3, v=3), dx, dy, taps=2))
m21 = yv12lutxy(clp, lanczosresize(cp10, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

#cp11 = maskedmerge(cp10, n1, lanczosresize(yv12lutxy(m21, m1, "x y - 256 *", u=3, v=3), dx, dy, taps=5))
#m22 = yv12lutxy(clp, lanczosresize(cp11, clp.width, clp.height, taps=1), "x y - abs", u=3, v=3)

#cp12 = maskedmerge(cp11, t3, lanczosresize(yv12lutxy(m22, m1, "x y - 256 *", u=3, v=3), dx, dy, taps=5))
#m23 = yv12lutxy(clp, lanczosresize(cp12, clp.width, clp.height, taps=3), "x y - abs", u=3, v=3)

#cp13 = maskedmerge(cp12, t2, lanczosresize(yv12lutxy(m23, m1, "x y - 256 *", u=3, v=3), dx, dy, taps=5))
#m24 = yv12lutxy(clp, lanczosresize(cp13, clp.width, clp.height, taps=3), "x y - abs", u=3, v=3)

return(mergechroma(cp10, spline36resize(clp, dx, dy), 1))}

A hacky hack of a script: it attempts to intelligently switch between low/medium/high-tap Lanczos and NEDI upsizing by measuring the differences between each upsizing method's result (downsized again) and the original image, and using the interpolator that causes the least measurable distortion. It's not much slower than plain NEDI, with line quality comparable to J-DDL and NEDI and texture quality comparable to Lanczos3.
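
The selection idea described above can be boiled down to a 1-D sketch: upsize with each candidate, downsize the result again, and keep, per source sample, the candidate whose round trip deviates least from the source. Note the assumptions: the script itself chains pairwise binary masks rather than taking a single minimum, and the two toy upsizers and the [1 2 1]/4 downsizer below merely stand in for Lanczos and NEDI. All function names are invented for the example.

```python
def up_nearest(s):
    """2x upsize by sample duplication."""
    return [v for v in s for _ in range(2)]


def up_linear(s):
    """2x upsize by linear interpolation (last sample is repeated)."""
    out = []
    for k, v in enumerate(s):
        nxt = s[min(k + 1, len(s) - 1)]
        out += [v, (v + nxt) / 2]
    return out


def downsize(up):
    """2x downsize with a [1 2 1]/4 filter centred on the even samples."""
    n = len(up)
    return [(up[max(0, 2 * k - 1)] + 2 * up[2 * k] + up[min(n - 1, 2 * k + 1)]) / 4
            for k in range(n // 2)]


def select_by_roundtrip(source, candidates):
    """Assemble a 2x output from the candidate with the least round-trip error."""
    errors = [[abs(s - d) for s, d in zip(source, downsize(c))] for c in candidates]
    out = []
    for k in range(len(source)):
        best = min(range(len(candidates)), key=lambda i: errors[i][k])
        out.extend(candidates[best][2 * k: 2 * k + 2])
    return out
```

For the step [0, 0, 100, 100], the duplicated version wins on the dark side of the edge and the linear one on the bright side, so the output is stitched together from both candidates.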

This script was just an experiment that turned out well, I don't plan to do anything else with it, but it works well.

IanB
22nd May 2007, 03:11
What's with the taps=1 and taps=2? Values less than 3 are not recommended.

Taps=1 gives a "bent" bilinear.

Taps=2 gives an ill-formed bicubic.

*.mp4 guy
22nd May 2007, 04:13
[quote=IanB]What's with the taps=1 and taps=2? Values less than 3 are not recommended.

Taps=1 gives a "bent" bilinear.

Taps=2 gives an ill-formed bicubic.[/quote]

Because for some parts of the image, taps=1 or taps=2 may give the best results (taps=2 doesn't ring, it only halos; taps=1 doesn't ring or halo, so they are useful for some parts of the image). That said, I don't use 1-tap upsizing in the output, only for generating metrics, because it causes mosaic effects even in low-contrast areas. I use 2 taps for upsizing the masks because it doesn't cause mosquito noise, only blurs and halos, and it is sharper and has less aliasing than bilinear, along with being fast. Do you get better results with different internal parameters?
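
The ring/halo distinction can be read off the kernel itself. The sketch below assumes the textbook Lanczos definition, sinc(x) * sinc(x / taps) for |x| < taps, and reports the sign of each unit-wide lobe. The names lanczos and lobe_signs are illustrative, and the actual resizer may differ in details such as normalisation.

```python
import math


def lanczos(x, taps):
    """Textbook Lanczos kernel: sinc(x) * sinc(x / taps) inside |x| < taps."""
    if x == 0.0:
        return 1.0
    if abs(x) >= taps:
        return 0.0
    px = math.pi * x
    return taps * math.sin(px) * math.sin(px / taps) / (px * px)


def lobe_signs(taps):
    """Sign of the kernel at the middle of each unit interval on one side."""
    return [1 if lanczos(k + 0.5, taps) > 0 else -1 for k in range(taps)]
```

With taps=1 the kernel never goes negative, so there is no overshoot at all. With taps=2 there is a single negative lobe, which gives one overshoot band (a halo) next to an edge. From taps=3 on, the lobes alternate in sign, and that alternation is what shows up as ringing.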

Chainmax
22nd May 2007, 23:38
Sounds great, I'll try it soon. It should get its own thread though, and a comparison between it, MultiSWAR and LanczosPlus would be great to see.

*.mp4 guy
23rd May 2007, 00:06
Well, it's just a different approach to hacking NEDI to be more useful in real situations; I don't think it's interesting enough to get its own thread.

As far as comparisons go, NEDI has the smoothest lines, LanczosPlus has the sharpest lines, and MultiSWAR has the sharpest texture.

Chainmax
3rd July 2007, 23:07
So no one is interested/has time to dabble in it? That's a shame :(.

*.mp4 guy
6th July 2007, 04:14
I'm interested, but I don't have enough math knowledge (or programming experience) to do anything useful.