Mathematical correctness is apparently a bad thing?!?


Katie Boundary
10th February 2017, 02:51
Today I shrank a clip, ran it through various upscaling algorithms, and looked at some PSNR and other numbers. Despite the rhetoric about how b + 2c is supposed to equal 1, the best cubics actually occurred at about c=1 and performed about as well as lanczos-3. Anti-aliasing and affine gradients can apparently go screw themselves because sharpness is king.

But that's not the part that surprised me.

After those tests, I tried a bunch of different downscaling methods, each of which was then upscaled back with lanczos-4. The results were the opposite of what I expected. Arearesize did the worst. Bilinear did the second-worst. Hermite and Simpleresize came roughly in the middle of the pack. Catrom did third-best, c=1 cubic did second-best, and lanczos-3 gave the best shrinks (I did not attempt to shrink with lanczos-4).

WTF?!?

I can understand why fake sharpness can be a good thing when upscaling. But why is it apparently a good thing when downscaling too?!?
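The post does not say which tool produced the PSNR figures. As a rough sketch of what such a number measures, assuming 8-bit samples and a plain mean-squared-error definition, the comparison between an original and its down-then-up round trip could look like this (the function name and the sample values are made up for illustration):

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)

# Hypothetical data: one row of an original frame and the same row after
# being shrunk and enlarged again.
original = [10, 50, 200, 220, 30, 15, 90, 130]
round_trip = [12, 48, 195, 224, 33, 14, 88, 131]
print(f"{psnr(original, round_trip):.2f} dB")
```

A higher figure only says the round trip landed closer to the original sample values; it does not say which of the two resizes contributed which error.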

feisty2
10th February 2017, 05:20
The following kernels work pretty well with downscaling but not upscaling:
bicubic (b=-1, c=0)
gauss (p=100)

Since aliasing is much less of a problem when downscaling and those 2 kernels are sharp and ringing-free

Katie Boundary
10th February 2017, 05:52
...did you even read the post?

Also, cubic filters can't have negative B values.

feisty2
10th February 2017, 07:13
cubic filters CAN have negative b values, it's not recommended for upscaling but it's indeed valid.

Katie Boundary
10th February 2017, 10:25
Quoth the AVIsynth documentation:

From c > 0.6 the filter starts to "ring". You won't get real sharpness, what you'll get is crispening like with a TV set sharpness control. Negative values are not allowed for b, use b = 0 for values of c > 0.5.

But that's really a distraction from the real subject of this thread, which is "why is mathematical correctness a bad thing, both when violating the b+2c=1 rule for cubics and when downscaling images in general?"

Sharc
10th February 2017, 12:21
Did you do the quality comparison after re-scaling the picture to its original resolution? I have some doubts that a quality comparison after re-scaling (up-down or down-up) is valid for concluding on the quality of the resizer which was used for the first step (up or down scaling), as you have two scaling processes in series, each performing some lowpass filtering on the source.

feisty2
10th February 2017, 14:22
Quoth the AVIsynth documentation:



But that's really a distraction from the real subject of this thread, which is "why is mathematical correctness a bad thing, both when violating the b+2c=1 rule for cubics and when downscaling images in general?"

Quoth the AVIsynth documentation:
http://avisynth.nl/index.php/BicubicResize#BicubicResize

As c exceeds 0.6, the filter starts to "ring" or overshoot. You won't get true sharpness – what you'll get is exaggerated edges. Negative values for b (although allowed) give undesirable results, so use b=0 for values of c > 0.5.



BicubicResize may be the most visually pleasing of the Resize filters for downsizing to half-size or less. [doom9]
Try the default setting, (b=0, c=0.75) as above, or (b= -0.5, c=0.25).

Katie Boundary
10th February 2017, 19:00
Did you do the quality comparison after re-scaling the picture to its original resolution?

I REALLY wish that people would read my posts before responding to them. I said "I tried a bunch of different downscaling methods, each of which was then upscaled back with lanczos-4"

I have some doubts that a quality comparison after re-scaling (up-down or down-up) is valid for concluding on the quality of the resizer which was used for the first step (up or down scaling), as you have two scaling processes in series, each performing some lowpass filtering on the source.

But why? Shouldn't a more garbled first pass create a more garbled second pass?

Quoth the AVIsynth documentation:
http://avisynth.nl/index.php/BicubicResize#BicubicResize

Silly me, reading the documentation that actually comes with the software instead of trusting a wiki... :rolleyes:

feisty2
10th February 2017, 19:18
my personal thoughts about your question.
aliasing, blurring and ringing: you can only rule out one and trade off between the other two (an ideal sinc could rule out all 3, but it's not practical as it requires infinite taps)
the "b + 2c = 1" formula makes sure you get the best possible (least artifacts in total) trade-off between blurring and ringing with aliasing ruled out (as already explained here, https://forum.doom9.org/showthread.php?p=1711734#post1711734)
but "least artifacts in total" does not equal best possible quality; depending on how you define "quality", it will fail quality tests that favor sharpness over having less ringing.
I personally think aliasing is not that annoying when downscaling so no "b + 2c = 1" for me cuz that's not how I define "downscaling quality", I always use negative b and c = 0 for downscaling cuz it fits my definition of "downscaling quality": sharp, ringing-free and less extreme than nearest neighbor.
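The blur/ringing trade-off can be seen on a single step edge. The sketch below is an illustration only: it uses the standard two-parameter cubic (the b and c of BicubicResize), a hypothetical helper `upscale_1d`, integer scale factors, and sample positions aligned to the source grid, which is not necessarily how AviSynth aligns them.

```python
def cubic_kernel(x, b, c):
    """Two-parameter (b, c) cubic filter; non-zero only for |x| < 2."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9*b - 6*c) * x**3 + (-18 + 12*b + 6*c) * x**2 + (6 - 2*b)) / 6.0
    if x < 2.0:
        return ((-b - 6*c) * x**3 + (6*b + 30*c) * x**2
                + (-12*b - 48*c) * x + (8*b + 24*c)) / 6.0
    return 0.0

def upscale_1d(samples, factor, b, c):
    """Upscale by an integer factor; out-of-range taps reuse the edge sample."""
    n = len(samples)
    out = []
    for i in range(n * factor):
        pos = i / factor      # output position in source coordinates
        base = int(pos)       # pos is never negative, so this is floor(pos)
        value = 0.0
        for k in range(base - 1, base + 3):   # the four taps within |x| < 2
            value += samples[min(max(k, 0), n - 1)] * cubic_kernel(pos - k, b, c)
        out.append(value)
    return out

# A hard edge from 0 to 100.
step = [0.0] * 4 + [100.0] * 4
settings = {"B-spline (1, 0)": (1.0, 0.0),
            "Catmull-Rom (0, 0.5)": (0.0, 0.5),
            "c=1 (0, 1)": (0.0, 1.0)}
for name, (b, c) in settings.items():
    up = upscale_1d(step, 4, b, c)
    print(f"{name:22s} min={min(up):8.3f}  max={max(up):8.3f}")
```

Any value below 0 or above 100 is overshoot next to the edge. The B-spline stays inside the original range (and blurs), while the c=1 setting overshoots further than Catmull-Rom, which is the "crispening" the documentation warns about.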

Sharc
10th February 2017, 21:21
But why? Shouldn't a more garbled first pass create a more garbled second pass?

Yes, but you can't tell for sure which impairment comes from which pass. You only see the combined result of both passes. Just a thought.

Katie Boundary
12th February 2017, 06:34
I figured it out.

All of these algorithms, when upscaling, rely on the pretense that each pixel represents the color of an image at an infinitely small point, rather than the average color of a given rectangular area of the image.

When downscaling, most of them rely on the same pretense: instead of assigning each new pixel a value equal to the average area of each rectangular chunk of the image, they assign values according to what they think the color of the image would be at any particular infinitely small point.

In other words, the pretense that the upscaling algorithms relied on became true as a result of how most of the downscaling algorithms worked. Most notably, this explains why unfiltered two-tap linear interpolation (simpleresize) significantly outperformed a triangle filter (avisynth's bilinear).
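The two readings of a pixel described above can be put side by side in one dimension. This is only a toy sketch with made-up function names; it does not reproduce what any particular AviSynth resizer does internally.

```python
def downscale_point(samples, factor):
    """Keep every factor-th sample: each output is the value at one point."""
    return samples[::factor]

def downscale_box(samples, factor):
    """Average each run of `factor` samples: each output is an area mean."""
    usable = len(samples) - len(samples) % factor  # drop an incomplete last run
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

# One-sample-wide stripes, the worst case for a 2:1 shrink.
stripes = [0, 255] * 4
print(downscale_point(stripes, 2))  # [0, 0, 0, 0]
print(downscale_box(stripes, 2))    # [127.5, 127.5, 127.5, 127.5]
```

Point sampling keeps full contrast but loses half of the pattern entirely; the area mean keeps the average brightness and loses the contrast.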

With that being established, are there any upscaling algorithms which rely on the idea that each pixel represents the average color of a rectangular portion of the image?

LoRd_MuldeR
12th February 2017, 13:41
When downscaling, most of them rely on the same pretense: instead of assigning each new pixel a value equal to the average area of each rectangular chunk of the image, they assign values according to what they think the color of the image would be at any particular infinitely small point.

Yup, because that's how sampled data works.

Sampled audio data (PCM) stores signal values at specific "infinitely small" points in time (i.e. points in the mathematical sense). Sampled images, or video frames, store color values at "infinitely small" points in space. And so on...

Data in between the sample points simply is not stored - it is undefined. But the data in between the sample points may be reconstructed or interpolated.

Therefore, thinking of sampled color values in a video or image of having an "area" is as wrong as thinking of sampled audio data to look like a "stairstep" pattern. Or, more specifically, you would get that (e.g., samples in an image having an "area", or samples in an audio having a "duration"), if and only if you interpolate the data in between the sample points using "nearest neighbor" method - which is only one out of many possible interpolation methods, and generally by far the worst one.

https://i.imgur.com/RfQN7DU.gif
Source: Xiph.org

There is a good explanation of how sampled data works here (see the "stairsteps" chapter, especially the part starting at 6:00):
https://www.xiph.org/video/vid2.shtml

And here is an illustration of how different interpolation methods work, in 1D space (e.g. audio) and 2D space (e.g. image or video):
https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/Comparison_of_1D_and_2D_interpolation.svg/1000px-Comparison_of_1D_and_2D_interpolation.svg.png

(As you can see, you always start with "infinitely small" sample points and then interpolate/reconstruct the signal course in between those points)
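A minimal 1D sketch of that point, with hypothetical function names: the same stored samples give a "stairstep" only when the gaps are filled by nearest neighbor, and a different curve as soon as another interpolation method is used.

```python
def reconstruct_nearest(samples, t):
    """Value at continuous position t taken from the nearest sample (stairstep)."""
    index = min(max(int(t + 0.5), 0), len(samples) - 1)
    return samples[index]

def reconstruct_linear(samples, t):
    """Value at continuous position t by linear interpolation (needs 2+ samples)."""
    t = min(max(t, 0.0), len(samples) - 1.0)   # clamp to the sampled range
    left = min(int(t), len(samples) - 2)
    frac = t - left
    return samples[left] * (1.0 - frac) + samples[left + 1] * frac

samples = [0, 10, 20]
for t in (0.0, 0.4, 0.6, 1.5):
    print(t, reconstruct_nearest(samples, t), reconstruct_linear(samples, t))
```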

Sharc
12th February 2017, 15:54
Yup, because that's how sampled data works.....

Therefore, thinking of sampled color values in a video or image of having an "area" is as wrong as thinking of sampled audio data to look like a "stairstep" pattern.
....
The "stairstep" is for example the practical implementation of the "sample-and-hold" principle with sinc() lowpass characteristic.

Katie Boundary
14th February 2017, 13:43
(stuff I read on Wikipedia 2 years ago)

:rolleyes:

I'm about 50 steps ahead of where you think I am. You are unfortunately still stuck in your strange fantasy world where images are waveforms and digital data cannot be produced by any method except sampling. Go re-read the discussion that we already had about this. I'm not interested in rehashing it, though MP4 Guy might:

Things only get messy when you need to actually do things and therefore are required to make assumptions about the image you are working with. Sampling theory is one such set of assumptions, it has the impressive quality of allowing perfect reconstruction of a down-sampled signal if you satisfy the version of reality it requires...

A pixel may be a square, or a point. Or it might be something else. You just have to find the least-bad assumption and use it.

https://forum.doom9.org/showpost.php?p=1711511&postcount=12

johnmeyer
14th February 2017, 17:59
If you are 50 steps ahead of everyone, then why do you need to ask any questions?

feisty2
14th February 2017, 20:35
If you are 50 steps ahead of everyone, then why do you need to ask any questions?

Certain kinds of abstractions are required to simplify things that exist in reality, so that it becomes possible to predict/calculate/manipulate them in a scientific way (using specialized mathematical formulas/equations)

"Something is moving!": a particle with velocity (particle is the physical abstraction of that something)
"Something is moving and spinning!": a rigid body with linear and angular velocity (now rigid body is the new physical abstraction of that something, because that thing is spinning)
"Something is moving VERY fast (close to light speed)": a particle with velocity and varying mass, mass is determined by the instantaneous velocity and invariant mass (the abstraction is different again, the particle mentioned earlier has constant mass, now it's variable)
"Something is moving, and I think that something is an electron": a particle with partially unpredictable state, velocity and position can no longer be predicted simultaneously (which can be predicted simultaneously in the previous particle abstractions)

"Something" is just, something, an existence in the reality, there're a lot of different abstractions in physics for that "something" based on what "something" is or how "something" is like.

Now unlike physics which has various abstractions about a piece of matter, sampling theory is the one and only abstraction for videos (so far), much less fun, uhmmm?

Clearly Katie has been bugged by that boring fact and unfortunately she cannot come up with a new abstraction, which is why she's asking all about it.

LoRd_MuldeR
14th February 2017, 20:58
:rolleyes:

I'm about 50 steps ahead of where you think I am.

https://i.imgur.com/nilqqUr.gif

johnmeyer
14th February 2017, 21:14
Feisty,

Nice review of Newton, Lorentz, Einstein, and Heisenberg. Yes, models that work well for some things (Newton for non-relativistic motion) break down in other situations.

However ...

... there is a difference between those who fully understand the subject matter and know when a given model is breaking down and therefore need to look for a different paradigm, and those who only half-grasp any of the theory and therefore are always looking for the exceptions.

I am reminded of my visit to the clinic at a university back in the 70s. I am sure that I was seen by a resident, probably an intern. My complaint: swollen hands. In retrospect, my problem was simply the result of a change in diet in my new surroundings (I had moved from the west to the east coast). However, this person, who lacked real-world experience, but was filled to the brim with all sorts of book knowledge, went immediately to the most esoteric diagnosis, completely ignoring all the basic, obvious causes. He asked me if my head and hands had always been so large, and then ordered a head X-ray!!! He thought I had an extremely rare disorder called Acromegaly. I do not have this, and my head, while large, exhibits none of the other traits of this nasty disease.

I still can't believe I let this guy take an X-ray of my head!

So, some people are always searching for mythical creatures, uncommon causes, and once-in-a-lifetime events, while ignoring less exciting but more practical ways of looking at things and diagnosing problems.

Having said all that, perhaps that junior doctor many years ago eventually found a case of Acromegaly and then wrote a paper about it and then became famous.

feisty2
15th February 2017, 05:49
sampling theory breaks down with CG stuff (resolution-independent), and there're other abstractions like vector representation or fractal representation for those things, but none works on photographic stuff except sampling theory, so I have reason to believe that sampling theory is a nice (at least working) abstraction, but Katie is obviously not buying any of this, and there's nothing we can do to change her mind, we can always wait and see if she finally gets her theory working one day... (might be a bit challenging to you cuz you're OOOOOLD)

that doctor who thought you had acromegaly, his name has to be "Gregory House", right?

johnmeyer
15th February 2017, 07:06
(might be a bit challenging to you cuz you're OOOOOLD)

that doctor who thought you had acromegaly, his name has to be "Gregory House", right?

I'm not that old; no diapers yet.

Gregory House looks like he might have that disease. What a strange-looking dude.

StainlessS
15th February 2017, 17:09
Katie, here is something I knocked up some time ago after a user had his house burgled, attempt at revealing number plate of burglars car.
It is really slow (really really).



Rgb_Test(clip c, Int "SZ",float "Rr"=1.0,Float "Rg"=0.0,Float "Rb"=0.0,float "Gr"=0.0,Float "Gg"=1.0,Float "Gb"=0.0,
\ float "Br"=0.0,Float "Bg"=0.0,Float "Bb"=1.0)

Filter to both upsize clip and modify R,G,B, RGB24 ONLY, null audio.

Output frame is c.Width*SZ x c.Height*SZ pixels, RGB24.
Views source frame as a c.Width*SZ x c.Height*SZ frame of micropixels where each source pixel is SZxSZ micro pixels.
Each output pixel is result of summing SZxSZ source micropixels.
The R,G and B channels are modified via the other arguments,
eg red component comes from Int((rSum * Rr + gSum * Rg + bSum * Rb) / (SZ*SZ) + 0.5) # Range 0 -> 255.0 if Rr+Rg+Rb sum to 1.0)
The result pixel Red channel is then limited to range 0->255.
Same for the other channels.

Defaults:

SZ = 5, output clip 5x5 size of input clip. Range ODD ONLY, 1 -> 9. (1=No resize)

Rr = 1.0 Rg = 0.0 Rb = 0.0 Should sum to 1.0 (valid -3.0 to 3.0)
Gr = 0.0 Gg = 1.0 Gb = 0.0 Should sum to 1.0 (valid -3.0 to 3.0)
Br = 0.0 Bg = 0.0 Bb = 1.0 Should sum to 1.0 (valid -3.0 to 3.0)
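The per-channel arithmetic in that description can be sketched as follows. This is an illustration of the documented formula only, not the plugin's source: the function name is made up, and how the plugin gathers each SZxSZ window of micropixels (centering, frame edges) is not covered here.

```python
def mix_channel(r_sum, g_sum, b_sum, wr, wg, wb, sz):
    """One output channel from SZxSZ micropixel sums, rounded and limited to 0..255."""
    value = int((r_sum * wr + g_sum * wg + b_sum * wb) / (sz * sz) + 0.5)
    return min(max(value, 0), 255)

# Hypothetical window: all 25 micropixels of a 5x5 window are (R, G, B) = (200, 100, 50).
SZ = 5
r_sum, g_sum, b_sum = 200 * SZ * SZ, 100 * SZ * SZ, 50 * SZ * SZ

# Default weights leave the colour unchanged.
print(mix_channel(r_sum, g_sum, b_sum, 1.0, 0.0, 0.0, SZ))    # 200
# Weights that do not sum to 1.0 can leave the 0..255 range and get limited.
print(mix_channel(r_sum, g_sum, b_sum, -0.5, -0.5, 1.0, SZ))  # 0
```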



http://www.mediafire.com/file/1d8383ginz38ek5/Rgb_Test_v0.0_dll_20160329.zip

EDIT: Effect is slightly blurred and anti-aliased.
I implemented something similar for hardcopy poster output on Sinclair QL, Atari ST, and Commodore Amiga, some years ago.
EDIT: File size is about 3.5MB, contains short clip of burglars number plate.

Source
https://s20.postimg.org/hgpgtp471/test_zpsbxs070h8.png (https://postimg.org/image/7jeg0mwl5/)

Result
https://s20.postimg.org/rfafg6dml/test_zpscjexuhcb.png (https://postimg.org/image/qprn3td2x/)

EDIT: And script used for above

Avisource("Plate.avi").ConvertToRGB24
W=Width H=Height
SZ=5
/* Defaults (should really sum to 1.0, legal -3.0 -> 3.0)
Rr = 1.0 Rg = 0.0 Rb = 0.0
Gr = 0.0 Gg = 1.0 Gb = 0.0
Br = 0.0 Bg = 0.0 Bb = 1.0
*/

Rr = 1.0 Rg = -0.5 Rb = 2.0
Gr = -0.5 Gg = 2.0 Gb = -0.25
Br = -0.5 Bg = -0.5 Bb = 1.0

n=NNedi3_Rpow2(SZ<=2?2:SZ<=4?4:SZ<=8?8:16).BicubicResize(W*SZ,Height*SZ)
r=Rgb_Test(Sz=SZ) # No R,G,B modify
m=Rgb_Test(Sz=SZ,Rr=Rr,Rg=Rg,Rb=Rb,Gr=Gr,Gg=Gg,Gb=Gb,Br=Br,Bg=Bg,Bb=Bb) # R,G,B Modify
StackVertical(n,r,m)
ClipBlend
ConvertToYV24
Coloryuv(autogain=true)
Sharpen(1.0)
StackHorizontal(GrayScale)
Return ConvertToRGB32


https://s20.postimg.org/l2va6cakd/Result0_zpsjbzrpofz.png (https://postimg.org/image/uanin1hmh/)

EDIT: Original thread was deleted, they caught the guy :) (but not via number plate) :(

EDIT: Another attempt, not any better
https://s20.postimg.org/f37j2oprx/test_zpsfq5gvsik.png (https://postimg.org/image/f37j2oprt/)


/*
SeeSawMulti(), Calls SeeSaw() multiple times. Requires SeeSaw script + SeeSaw requirements.
Inspired by InGoldie. Here http://forum.doom9.org/showthread.php?p=1748707#post1748707 (as usual, thread deleted)
Also requires RT_Stats if Avisynth VersionNumber below v2.6.

Times, Number of iterations.
IG, smoother/soother, can be:-
UnDefined (or RT_Undefined if < v2.6 [req RT_Stats]), uses SeeSaw builtin smoothing at every iteration.
Denoised Clip, Used at every iteration.
Function String, Denoise applied to source, or result of previous iteration.
Remaining args as for SeeSaw().
*/
GSCript("""
Function SeeSawMulti(clip c, Int "Times",Val "IG",
\ int "NRlimit",int "NRlimit2",
\ float "Sstr", int "Slimit", float "Spower", float "SdampLo", float "SdampHi", float "Szp",
\ float "bias", int "Smode", int "sootheT", int "sootheS", float "ssx", float "ssy",
\ Float "BlurVal",Float "BlurMult") {
Function SeeSawMulti_LO(clip c, Int Times,Val "IG",
\ int "NRlimit",int "NRlimit2",
\ float "Sstr", int "Slimit", float "Spower", float "SdampLo", float "SdampHi", float "Szp",
\ float "bias", int "Smode", int "sootheT", int "sootheS", float "ssx", float "ssy",
\ Float "BlurVal",Float "BlurMult"
\ ) {
c
dn = (IG.IsClip)?IG:(IG.IsString)?Eval(IG):(VersionNumber>=2.6)?UnDefined:RT_Undefined
BlurVal=Default(BlurVal,0.0) BlurMult=DefaulT(BlurMult,1.0) BlurVal=Min(BlurVal,1.58)
SeeSaw(Last,dn,NRlimit,NRlimit2,Sstr,Slimit,Spower,SdampLo,SdampHi,Szp,bias,Smode,sootheT,sootheS,ssx,ssy)
DoBlur = (0<Times && Blurval>0.0)
if(Doblur) {
Blur(BlurVal)
}
Return (Times<1)
\ ?Last
\ :Last.SeeSawMulti_Lo(Times-1,IG,
\ NRlimit,NRlimit2,Sstr,Slimit,Spower,SdampLo,SdampHi,Szp,bias,Smode,sootheT,sootheS,ssx,ssy,BlurVal*BlurMult,BlurMult)
}
c Times=Default(Times,1)
return (Times<1)
\?Last
\:Last.SeeSawMulti_Lo(Times-1,IG,NRlimit,NRlimit2,Sstr,Slimit,Spower,SdampLo,SdampHi,Szp,bias,Smode,sootheT,sootheS,ssx,ssy,BlurVal,BlurMult)
}
""")

Avisource("Plate.avi").ConvertToRGB24
W=Width H=Height
SZ=5
/* Defaults (should really sum to 1.0, legal -3.0 -> 3.0)
Rr = 1.0 Rg = 0.0 Rb = 0.0
Gr = 0.0 Gg = 1.0 Gb = 0.0
Br = 0.0 Bg = 0.0 Bb = 1.0
*/

Rr = 1.0 Rg = -0.5 Rb = 2.0
Gr = -0.5 Gg = 2.0 Gb = -0.25
Br = -0.5 Bg = -0.5 Bb = 1.0

n=NNedi3_Rpow2(SZ<=2?2:SZ<=4?4:SZ<=8?8:16).BicubicResize(W*SZ,Height*SZ)
r=Rgb_Test(Sz=SZ) # No R,G,B modify
m=Rgb_Test(Sz=SZ,Rr=Rr,Rg=Rg,Rb=Rb,Gr=Gr,Gg=Gg,Gb=Gb,Br=Br,Bg=Bg,Bb=Bb) # R,G,B Modify
StackVertical(n.Fn,r.Fn,m.Fn)
StackHorizontal(GrayScale)
Return ConvertToRGB32

Function Fn(clip c) {
c
ConvertToYV24
Coloryuv(autogain=true)
IG=ClipBlend.Trim(FrameCount-1,-1).Loop(Framecount,0,0) # Long Time, complete clip scan to last frame.
SeeSawMulti(times=10,IG=IG)
ClipBlend
}

Katie Boundary
17th February 2017, 01:38
we can always wait and see if she finally gets her theory working one day...

What theory?

Katie, here is something I knocked up some time ago after a user had his house burgled, attempt at revealing number plate of burglars car.

Thanks but what does that have to do with the original post?

StainlessS
17th February 2017, 02:01
I guess I got confused with which thread this was, was intended for the thread where you were on about sampling of area of pixels,
instead of infinitely small points.

Katie Boundary
17th February 2017, 02:26
It would be equally off-topic there

StainlessS
17th February 2017, 02:38
Not really cos that is exactly what the plugin does.

johnmeyer
17th February 2017, 05:12
StainlessS,

Your good-hearted attempt to help, and her response prove the old adage: "no good deed goes unpunished." I can't believe her total lack of courtesy towards someone who took time out of his day to produce custom code. Even if it wasn't what she wanted, a person with any grace or manners would have at least thought to say, "thank you."

Fortunately, her posts are garnering fewer and fewer responses which will hopefully discourage this uncongenial behavior.

StainlessS
17th February 2017, 05:48
Yep, I guess I expected just too much :)

Katie Boundary
17th February 2017, 08:23
Not really cos that is exactly what the plugin does.

What is?

I can't believe her total lack of courtesy towards someone who took time out of his day to produce custom code. Even if it wasn't what she wanted, a person with any grace or manners would have at least thought to say, "thank you."

Why in God's name would I be interested in, or grateful for, code designed to make license plates readable? I STILL haven't heard an explanation for how that's relevant to anything else in this thread.

smok3
17th February 2017, 09:22
Nah, imho one does code because it is a fun problem, not because of some silly person.

ndjamena
17th February 2017, 12:09
I vaguely remember a thread where someone modified some resize filters to work under the "squares" theory... I can't find it though...

raffriff42
19th February 2017, 03:31
But that's really a distraction from the real subject of this thread, which is "why is mathematical correctness a bad thing, both when violating the b+2c=1 rule for cubics and when downscaling images in general?"

No one has claimed "correctness" in a resizer before now AFAIK. It's a subjective issue, always has been.
http://www.mentallandscape.com/Papers_siggraph88.pdf
https://en.wikipedia.org/wiki/Reconstruction_filter#cite_note-mitchell-3

Reconstruction Filters in Computer Graphics
Don P. Mitchell
Arun N. Netravali
AT&T Bell Laboratories
August 1988

(page 8)
When 2C + B = 1, ... quadratic convergence of fit is achieved. This line contains the cubic B-spline and the Catmull-Rom spline (which actually has cubic convergence). Within the interval of B = 5/3 to B = 0, good subjective behavior is found with a simple trade-off between blurring and ringing. Outside this interval, k(x) becomes bimodal or exhibits extreme ringing. The filter (1/3, 1/3) ... is recommended by the authors, but other observers may prefer more or less ringing.
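For reference, the k(x) in that excerpt is the two-parameter cubic family, usually written as below. The formula is not part of the excerpt quoted here, so treat it as a transcription to check against the PDF rather than as a quotation.

```latex
k(x) = \frac{1}{6}
\begin{cases}
(12 - 9B - 6C)\,|x|^3 + (-18 + 12B + 6C)\,|x|^2 + (6 - 2B) & \text{if } |x| < 1 \\
(-B - 6C)\,|x|^3 + (6B + 30C)\,|x|^2 + (-12B - 48C)\,|x| + (8B + 24C) & \text{if } 1 \le |x| < 2 \\
0 & \text{otherwise}
\end{cases}
```

With this form, (B, C) = (1, 0) is the cubic B-spline, (0, 1/2) is the Catmull-Rom spline and (1/3, 1/3) is the authors' recommended filter; all three satisfy 2C + B = 1, while a setting such as (0, 1) does not.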

Wilbert
20th February 2017, 22:39
Removed all non-relevant posts.