I propose a new resizing algorithm [Archive]

Katie Boundary

6th February 2015, 03:38

I admit, I know very little about how resizing filters work. However, from what I've been able to gather, the most common resizing methods (nearest neighbor, bilinear, bicubic, lanczos) all have the same fundamental flaw: they treat pixels as points, rather than as cells. I'd like to propose a new way of doing things, which I am tentatively calling Grid Overlay resizing. It works like this: first, a grid is drawn with dimensions equal to the new resolution. Then, the original bitmap is painted over it, stretching and squooshing the pixels as necessary. Then, each cell in the grid is assigned a color value equal to the average of the colors in it, weighted by the percentage of that color's share of the cell's area. Then each cell becomes a pixel of that color in the new bitmap.
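A minimal Python sketch of the procedure described above, for a single-channel image stored as a list of rows. The names `cell_overlaps` and `grid_overlay_resize` are made up for illustration and do not come from any existing library. Exact fractions are used only so the area weights carry no rounding error.

```python
from fractions import Fraction


def cell_overlaps(src_len, dst_len):
    """For each new cell along one axis, list (source_index, weight) pairs.

    The weight is the share of the new cell's length covered by that
    source pixel, so the weights of each cell sum to 1.
    """
    scale = Fraction(src_len, dst_len)  # source pixels per new cell
    cells = []
    for d in range(dst_len):
        lo, hi = d * scale, (d + 1) * scale
        pairs = []
        s = int(lo)  # first source pixel touching the cell
        while s < src_len and s < hi:
            overlap = min(hi, s + 1) - max(lo, s)
            if overlap > 0:
                pairs.append((s, overlap / scale))
            s += 1
        cells.append(pairs)
    return cells


def grid_overlay_resize(img, new_w, new_h):
    """Area-weighted resize: each new pixel is the weighted mean of the
    source pixels that fall inside its cell."""
    cols = cell_overlaps(len(img[0]), new_w)
    rows = cell_overlaps(len(img), new_h)
    out = []
    for row_pairs in rows:
        out_row = []
        for col_pairs in cols:
            total = sum(wy * wx * img[sy][sx]
                        for sy, wy in row_pairs
                        for sx, wx in col_pairs)
            out_row.append(float(total))
        out.append(out_row)
    return out


if __name__ == "__main__":
    for row in grid_overlay_resize([[10, 20], [30, 40]], 3, 2):
        print(row)
```

With whole-number upscaling factors every new cell lies entirely inside one source pixel, so the output simply repeats source values.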

Example: let's say I want to resize a 6x4 image to 4x3.

Here's the original 6x4 image:

http://img.photobucket.com/albums/v415/DWJohnson/6x4_zpsvae0ikvv.jpg

Here's a 4x3 grid:

http://img.photobucket.com/albums/v415/DWJohnson/4x3grid_zpswmeiwsna.jpg

Here's the image painted onto the grid:

http://img.photobucket.com/albums/v415/DWJohnson/4x3overlay_zpsbqjfvymn.jpg

And then new pixels would be generated according to the average color within each cell. In the upper left cell, for example, total blackness (r0b0g0) accounts for 50% of its area, r51b0g0 accounts for 25% of its area, r0b0g85 accounts for 16.67% of its area, and r51b0g85 accounts for 8.33%, so the uppermost leftmost pixel in the new 4x3 image would be r17b0g21.
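The arithmetic for that upper-left cell can be checked with a few lines of Python. This is only a sketch: the assignment of colours to positions (red to the right, green below) is inferred from the quoted percentages, and the colours are written as (R, G, B) tuples.

```python
from fractions import Fraction

# Source pixels that overlap the upper-left cell, keyed by (x, y).
# The positions are an assumption inferred from the stated area shares.
colours = {
    (0, 0): (0, 0, 0),
    (1, 0): (51, 0, 0),
    (0, 1): (0, 85, 0),
    (1, 1): (51, 85, 0),
}

# A new cell spans 6/4 source pixels horizontally and 4/3 vertically.
cell_w = Fraction(6, 4)
cell_h = Fraction(4, 3)

# Length of each source pixel that lies inside the cell, per axis.
x_overlap = {0: Fraction(1), 1: cell_w - 1}
y_overlap = {0: Fraction(1), 1: cell_h - 1}

cell_area = cell_w * cell_h
weights = {(x, y): x_overlap[x] * y_overlap[y] / cell_area
           for (x, y) in colours}

# Area-weighted average per channel.
average = tuple(sum(weights[p] * colours[p][c] for p in colours)
                for c in range(3))

print({p: str(w) for p, w in weights.items()})
print([float(v) for v in average])
```

The weights come out as 1/2, 1/4, 1/6 and 1/12, and the average as R = 17, G = 21.25, B = 0 before rounding.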

Does something like this exist? If not, who would I talk to about making it a reality?

Asmodian

6th February 2015, 03:54

If I understand it correctly this method is mathematically equivalent to bilinear (http://en.wikipedia.org/wiki/Bilinear_interpolation).

feisty2

6th February 2015, 16:08

what you are saying are basically the common steps of any typical resizer, the only difference between bilinear/cubic/sinc family/spline... is the convolution step (how you average the colors within one cell)
EDIT: there're a lot of alternative methods besides this, like vectorize (for CG contents), fractal, EDI (edge directed interpolation), AI (artificial intelligence, try to fill in imaginary details based on a trained neural network when upscaling, still a legend yet :))... and so many others, but most of them are not so practical, they are there like some theories, you got the idea, but you got no idea how to bring it to real life

LoRd_MuldeR

6th February 2015, 21:01

With all these considerations, you need to keep in mind that we are dealing with sampled signals here!

So while your screen shows "square" pixels and fills the entire pixel area with the same color, as in your first picture, each color value (sample) actually only applies to an infinitely small point, located (e.g.) in the center of the pixel.

There is a very good explanation of "sampled" signals in this video, in the stairsteps chapter (he starts by talking about audio, but moves on to images soon):
http://xiph.org/video/vid2.shtml

Nonetheless, what your "new resizing algorithm" actually does is:

It takes the existing sample points that are adjacent to the "new" sample point and weights them according to their distance from the "new" sample point. That's simple linear interpolation (http://en.wikipedia.org/wiki/Bilinear_interpolation) in the 2D space. And it's probably not that new ;)
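For comparison, here is a rough Python sketch of bilinear interpolation as described in this post: the value at a new sample point is a distance-weighted blend of the four surrounding samples. It assumes samples sit at integer coordinates and that positions outside the image are clamped to the edge. `bilinear_sample` is an illustrative name, not an existing API.

```python
import math


def bilinear_sample(img, x, y):
    """Interpolate a single-channel image (list of rows) at point (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp the query point to the area spanned by the samples.
    x = min(max(x, 0.0), w - 1)
    y = min(max(y, 0.0), h - 1)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = (1 - tx) * img[y0][x0] + tx * img[y0][x1]
    bottom = (1 - tx) * img[y1][x0] + tx * img[y1][x1]
    return (1 - ty) * top + ty * bottom


if __name__ == "__main__":
    grid = [[0, 100], [200, 300]]
    print(bilinear_sample(grid, 0.5, 0.5))
```

Note that the weights here depend on the distance to each sample, not on how much of a cell each source pixel covers.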

Katie Boundary

7th February 2015, 00:06

If I understand it correctly this method is mathematically equivalent to bilinear (http://en.wikipedia.org/wiki/Bilinear_interpolation).

Then you're not understanding it correctly.

It takes the existing sample points that are adjacent to the "new" sample point and weights them according to their distance from the "new" sample point. That's simple linear interpolation (http://en.wikipedia.org/wiki/Bilinear_interpolation) in the 2D space. And it's probably not that new ;)

Nope. Imagine taking the 6x4 image and blowing it up to 60x40. The result would look a lot more like nearest neighbor resizing than bilinear.

LoRd_MuldeR

7th February 2015, 00:14

Nope. Imagine taking the 6x4 image and blowing it up to 60x40. The result would look a lot more like nearest neighbor resizing than bilinear.

I don't think so.

If we stick with your way of thinking of pixels as something that has an area (which is not quite correct, as it's actually a sampled signal), then the suggested "averaging" within each "cell" is equivalent to a linear interpolation in 2D space.

Conversely, in order to get the equivalent to a "nearest neighbor" method, you would exclusively pick the color that has the biggest area within each "cell" - and totally ignore all other colors that may also appear in that "cell".

Katie Boundary

7th February 2015, 00:21

Okay, let's say we wanted to take a 2x2 image and resize it to 4x3.

Original image:

http://img.photobucket.com/albums/v415/DWJohnson/2x2_zpszdogbeqv.jpg

Painted onto a 4x3 grid:

http://img.photobucket.com/albums/v415/DWJohnson/4x3gridupfirst_zpskudtx8pb.jpg

Final 4x3 image:

http://img.photobucket.com/albums/v415/DWJohnson/4x3gridupfinal_zpsqvcpv0r1.jpg

Obviously not bilinear.
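The difference can be sketched in one dimension, since both methods work on each axis independently. The Python below compares area averaging with linear interpolation between point samples, under the assumption that samples sit at pixel centres and edges are clamped (other implementations may align differently). Both function names are hypothetical.

```python
from fractions import Fraction
import math


def area_resize_1d(src, n):
    """Grid Overlay along one axis: weight source pixels by overlap length."""
    scale = Fraction(len(src), n)
    out = []
    for d in range(n):
        lo, hi = d * scale, (d + 1) * scale
        total = Fraction(0)
        for s in range(len(src)):
            overlap = min(hi, s + 1) - max(lo, s)
            if overlap > 0:
                total += overlap * src[s]
        out.append(float(total / scale))
    return out


def linear_resize_1d(src, n):
    """Linear interpolation between point samples at pixel centres."""
    scale = Fraction(len(src), n)
    out = []
    for d in range(n):
        # Centre of new pixel d, expressed in source sample coordinates.
        x = (d + Fraction(1, 2)) * scale - Fraction(1, 2)
        x = min(max(x, 0), len(src) - 1)  # clamp at the edges
        i0 = math.floor(x)
        i1 = min(i0 + 1, len(src) - 1)
        t = x - i0
        out.append(float((1 - t) * src[i0] + t * src[i1]))
    return out


if __name__ == "__main__":
    line = [0, 255]
    print(area_resize_1d(line, 4), linear_resize_1d(line, 4))
    print(area_resize_1d(line, 3), linear_resize_1d(line, 3))
```

Going from 2 to 4 pixels, area averaging gives 0, 0, 255, 255 while this linear interpolation gives 0, 63.75, 191.25, 255. Going from 2 to 3 pixels, both give 0, 127.5, 255, so under these assumptions the two methods disagree along the 4-pixel axis only.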

LoRd_MuldeR

7th February 2015, 00:44

Obviously not bilinear.

Not exactly. But only because you are still ignoring that Bitmap images actually are a sampled signal.

Actually, each color value (sample) applies only to an infinitely small point in the center of the pixel (regardless of how big or small you show the pixel) - not to the whole pixel area. See the video I linked above for details.

So the "blow up" that you are showing like this...
http://i.imgur.com/JsUYjIX.png

...actually it needs to look more like this - where the "white" area is not actually white but means "color not stored/known":
http://i.imgur.com/w04NBC4.png

(Yes, the second image is still not entirely correct. That's because the sample points actually would have to be infinitely small and thus would not actually be visible at all. But I think you get the idea ^^)

vivan

7th February 2015, 00:46

I think it's equivalent to upscaling with nearest neighbor by a factor of 2, repeatedly, until the resolution is no lower than the target, then downscaling with bilinear. Which is even worse than bilinear.

Anyway, consider upscaling a diagonal line, and you'll see why your method is bad.

Katie Boundary

7th February 2015, 00:48

you are still ignoring that Bitmap images actually are a sampled signal.

I'm ignoring it because it's neither relevant nor always true.

Anyway, consider upscaling a diagonal line, and you'll see why your method is bad.

Okay, I'm considering it, and I'm not seeing why it's bad.

LoRd_MuldeR

7th February 2015, 01:05

I'm ignoring it because it's neither relevant nor always true.

It's in the nature of Bitmap images. So, in that sense, it is "always" true.

"Resizing" your original 2×2 image to 4×3 means that, originally, what we have are samples at the red locations. And what we are interpolating (because this information is not present in the original image) are the samples at the green locations:

http://i.imgur.com/kWx5NPZ.png

vivan

7th February 2015, 01:08

Okay, I'm considering it, and I'm not seeing why it's bad.

http://i.imgur.com/Y9Nkcjq.png source
http://i.imgur.com/umzr90V.png nn to 80x40, then bilinear to 60x40. "your method"
http://i.imgur.com/TEjlfyp.png bilinear

Katie Boundary

7th February 2015, 01:22

It's in the nature of Bitmap images. So, in that sense, it is "always" true.

No, it isn't. Sampling is a method for producing digital data, NOT storing it, and bitmaps aren't waveforms, which is why thinking of pixels as samples is stupid and wrong. Pixels can represent samples/points, or they can represent areas, or they can represent LEGO blocks or anything else. But regardless of where pixels come from, once they exist, they are squares, which is why I'm treating them as such.

By the way, this would be a good time to admit that you were wrong about Grid Overlay being functionally the same as bilinear.

http://i.imgur.com/Y9Nkcjq.png source
http://i.imgur.com/umzr90V.png nn to 80x40, then bilinear to 60x40. "your method"
http://i.imgur.com/TEjlfyp.png bilinear

That's not my method, so your point is invalid.

LoRd_MuldeR

7th February 2015, 01:34

Sampling is a method for producing digital data

the method for producing digital data.

NOT storing it

Of course samples are not a method of storing data (and I never said that). They are the data that is stored.

and bitmaps aren't waveforms

Correct. But again, I never said that.

which is why thinking of pixels as samples is stupid and wrong.

Digital (raster) images are a sampled signal. Just like digital audio is a sampled signal too. There are different types of samples, my friend ;)

Heck, did you at least watch the video by Xiph.org that I linked above? It explains those fundamental things quite nicely, so I think it would be a good starting point.

A pixel, on the other hand, is the smallest area on your screen that can be filled with a distinct color.

It is important to understand that while a pixel does have an area, a sample does not! Assuming that a sample's color value applies to the entire pixel area (regardless of how small or large that pixel is) would be "stupid and wrong".

Groucho2004

7th February 2015, 01:40

Just like digital audio is a sampled signal too.
Digital audio can just as well be synthesized.

Katie Boundary

7th February 2015, 01:41

the method for producing digital data.

ummm no. That's just wrong. When I write in a .txt file, that's not "sampling", but it's still digital data.

...which is irrelevant to the fact that you were wrong about bilinear resizing.

Of course samples are not a method of storing data (and I never said that). They are the data that is stored.

Only if the data was produced by sampling in the first place, which isn't always true...

...which is irrelevant to the fact that you were wrong about bilinear resizing.

Digital (raster) images are a sampled signal. Just like digital audio is a sampled signal too.

Again, not always. MIDI files, for example, aren't sampled :)

...which is irrelevant to the fact that you were wrong about bilinear resizing.

Heck, did you at least watch the video by Xiph.org that I linked above?

Yes, and he's wrong too.

Assuming that a sample's color value applies to the entire pixel area (regardless of how small or large that pixel is) would be "stupid and wrong".

It certainly would be if we knew for sure that we were dealing with infinitely small sample points. In the real world, however, nothing is infinitely small... not even film grains or the little color-detectors in a digital camera.

...which is irrelevant to the fact that you were wrong about bilinear resizing.

vivan

7th February 2015, 02:12

That's not my method, so your point is invalid.

It is.
1) For 2x upscaling factors it's nearest neighbor.
2) Then you average color by area, which is the definition of bilinear.

LoRd_MuldeR

7th February 2015, 02:13

ummm no. That's just wrong. When I write in a .txt file, that's not "sampling", but it's still digital data.

We were talking about multi-media data here and now you bring in something completely unrelated.

Anyway, text can be understood as a sampled signal just as well, because text is a sequence of discrete characters - just like audio is a sequence of discrete signal values and (raster) images are a sequence of discrete color values.

Only if the data was produced by sampling in the first place, which isn't always true...

Nope.

Again, not always. MIDI files, for example, aren't sampled :)

Sure it is! MIDI is a sequence of discrete notes, where each note (sample) has a distinct frequency value. And between two notes (samples) there is no information.

Is MIDI necessarily regularly sampled? Probably not, because the duration of notes may vary. But who said it is?

Yes, and he's wrong too.

So, Monty from Xiph.org, recognized Codec developer for many years, is wrong. But of course you are right. Nice try Troll :D

It certainly would be if we knew for sure that we were dealing with infinitely small sample points. In the real world, however, nothing is infinitely small... not even film grains or the little color-detectors in a digital camera.

Just like audio samples are discrete in time and thus have no duration (i.e. they apply to an infinitely short time interval), image samples are discrete in location and thus have no area (i.e. they apply to infinitely small points). Unless you understand and accept this simple fact, all further discussion is pointless...

Pixels, be it on your screen or in the sensor of your camera, do have an area, yes! Still, the color value that is used to fill the pixel area is valid only in an infinitely small point, located usually (but not necessarily) in the center of that pixel.

Katie Boundary

7th February 2015, 02:25

It is.
1) For 2x upscaling factors it's nearest neighbor.
2) Then you average color by area, which is the definition of bilinear.

LOL nope. Go re-read the original description until you understand it :)

We were talking about multi-media data here and now you bring in something completely unrelated.

Does it matter whether I'm dicking around in Notepad or MS Paint? Either way, nothing is being sampled.

...which is irrelevant to the fact that you were wrong about bilinear resizing.

Anyway, text can be understood as a sampled signal just as well

...

Sure it is! MIDI is a sequence of discrete notes, where each note (sample) has a distinct frequency value. And between two notes (samples) there is no information.

...

Just like audio samples are discrete in time and thus have no duration (i.e they apply to an infinitely short time interval), image samples are discrete in location and thus have no area

Okay, I don't know what you think a "sample" is, but you're clearly not using the word the same way the rest of the English-speaking world does.

...which is irrelevant to the fact that you were wrong about bilinear resizing.

vivan

7th February 2015, 02:32

LOL nope. Go re-read the original description until you understand it :)

1) show how scaling by 2x factor is done. That means from W x H to 2W x H. Then try to find any difference from nearest neighbor.
2) show how scaling from 2W x H to 1.5W x H is done. Then try to find any difference from (bi)linear.

Katie Boundary

7th February 2015, 02:42

1) show how scaling by 2x factor is done. That means from W x H to 2W x H. Then try to find any difference from nearest neighbor.
2) show how scaling from 2W x H to 1.5W x H is done. Then try to find any difference from (bi)linear.

Why should I bother with either of those things? Grid Overlay DOES. NOT. WORK. THAT. WAY.

LoRd_MuldeR

7th February 2015, 02:44

Okay, I don't know what you think a "sample" is, but you're clearly not using the word the same way the rest of the English-speaking world does.

So you are the person who defines what the "English-speaking world" does? (I'm glad that this is actually not the case)

Does it matter whether I'm dicking around in Notepad or MS Paint? Either way, nothing is being sampled.

MS Paint exclusively works with raster images, so it's all sampled data, obviously!

Anyway, since I'm getting tired of correcting all your pointless claims and factoids, I'm just switching to "don't feed the troll" mode for now ;)

vivan

7th February 2015, 02:49

Why should I bother with either of those things? Grid Overlay DOES. NOT. WORK. THAT. WAY.

So, you don't even want to understand how your own method works? That looks more like trolling than trying to understand anything.

Katie Boundary

7th February 2015, 03:07

So you are the person who defines what the "English-speaking world" does?

That's not even remotely close to what I said. You really LOVE strawman arguments, don't you?

https://yourlogicalfallacyis.com/strawman

MS Paint exclusively works with raster images, so it's all sampled data, obviously!

Wrong. Sampling is not the only way to get raster data!

I'm getting tired of correcting all your pointless claims and factoids

You haven't done that even once yet. Actually, it's your own pointless claims and factoids that have been getting corrected.

So, you don't even want to understand how your own method works?

I understand quite well how it works. You're the one who apparently doesn't want to understand it.

Here's a hint: if you were to actually resize a 20x20 image to 60x40 using Grid Overlay, the results would be the same as straight-up nearest neighbor resizing.

Asmodian

7th February 2015, 03:35

Here's a hint: if you were to actually resize a 20x20 image to 60x40 using Grid Overlay, the results would be the same as straight-up nearest neighbor resizing.

So point resize 1x, 2x, 3x, etc. in either dimension as needed to get to or above the target resolution and then bilinear resize down to the target if needed.

That is all it is.

vivan

7th February 2015, 03:36

I understand quite well how it works. You're the one who apparently doesn't want to understand it.

Here's a hint: if you were to actually resize a 20x20 image to 60x40 using Grid Overlay, the results would be the same as straight-up nearest neighbor resizing.

Yeap. And so will "upscale to 80x40 using nn and then downscale to 60x40 using bilinear".
Time to shit bricks? ;)

Asmodian

7th February 2015, 03:41

Yeap. And so will "upscale to 80x40 using nn and then downscale to 60x40 using bilinear".
Time to shit bricks? ;)

It is actually just nn 20x20 to 60x40, no bilinear needed. It looks bad too but that is a different argument.

StainlessS

7th February 2015, 03:53

Katie Boundary,
Methinks tis time for you to take to the compiler and show those so-called-experts how to do it, it may take you some time, but twould be worth it,
and just think of the benefit to humankind, people would write odes about you and your quest to inform.
Just take care, there be big nasty dragons in them there woods, and poor little raptors could get gobbled right up.

vivan

7th February 2015, 03:54

It is actually just nn 20x20 to 60x40, no bilinear needed.

Well... actually I'm thinking that for integer scaling factors those 2 ways might be equivalent (for the same reason the grids align: bilinear will only read samples produced from 1 source pixel). I'm trying to find a counter-example but I can't :rolleyes:

Katie Boundary

7th February 2015, 04:07

So point resize 1x, 2x, 3x, etc. in either dimension as needed to get to or above the target resolution and then bilinear resize down to the target if needed.

That is all it is.

I ran a quick test - taking a 2x2 image, point-resizing up to 6x6, and then bilinear resizing down to 5x5 - and the result was the same as Grid Overlay would have given... so it looks as though what you say might be true.
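A one-dimensional version of that quick test can be written in Python (the 2-D case applies the same step to each axis in turn). This is a sketch under assumptions: the "bilinear" step is modelled both as an area average and as a two-tap linear sampler at pixel centres with clamped edges, and all function names are invented for the example.

```python
from fractions import Fraction
import math


def area_resize_1d(src, n):
    """Grid Overlay along one axis: weight source pixels by overlap length."""
    scale = Fraction(len(src), n)
    out = []
    for d in range(n):
        lo, hi = d * scale, (d + 1) * scale
        total = sum(((min(hi, s + 1) - max(lo, s)) * src[s]
                     for s in range(len(src))
                     if min(hi, s + 1) > max(lo, s)), Fraction(0))
        out.append(total / scale)
    return out


def linear_resize_1d(src, n):
    """Two-tap linear interpolation between samples at pixel centres."""
    scale = Fraction(len(src), n)
    out = []
    for d in range(n):
        x = (d + Fraction(1, 2)) * scale - Fraction(1, 2)
        x = min(max(x, 0), len(src) - 1)
        i0 = math.floor(x)
        i1 = min(i0 + 1, len(src) - 1)
        t = x - i0
        out.append((1 - t) * src[i0] + t * src[i1])
    return out


def point_upscale_1d(src, factor):
    """Nearest neighbour by a whole-number factor: repeat each pixel."""
    return [v for v in src for _ in range(factor)]


if __name__ == "__main__":
    row = [10, 20]
    direct = area_resize_1d(row, 5)                         # 2 -> 5
    via_area = area_resize_1d(point_upscale_1d(row, 3), 5)  # 2 -> 6 -> 5
    via_linear = linear_resize_1d(point_upscale_1d(row, 3), 5)
    print([float(v) for v in direct])
    print([float(v) for v in via_area])
    print([float(v) for v in via_linear])
```

For this input all three routes give 10, 10, 15, 20, 20. The agreement between the two area-average routes holds in general, because point upscaling by a whole number does not change the painted picture. The two-tap sampler agrees here but need not agree on other inputs.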

It is actually just nn 20x20 to 60x40, no bilinear needed.

Didn't I already say that?

It looks bad too but that is a different argument.

Your opinion of how it looks is irrelevant. It is the most mathematically correct representation of the original data. That's actually the one and only good thing about nearest neighbor resizing - it gives the most mathematically correct results when you're just multiplying each dimension by a whole number.

EDIT: oh wait, now I see what vivan was trying to say. And yes, there are some cases in which that gives the same results. However, it's still not quite the same. Extreme downsizing, for example, should look quite a bit different because bilinear is still only interpolating between four pixels whereas overlay would be averaging out much larger areas to generate each new pixel.

feisty2

7th February 2015, 04:19

Is dat so, y not just stick 2 point resize den, point resize is ALWAYS lossless wen upscaling
It does not suffer from inter factor limit at all

vivan

7th February 2015, 04:19

That's actually the one and only good thing about nearest neighbor resizing - it gives the most mathematically correct results when you're just multiplying each dimension by a whole number.

Now take your unfounded belief about "mathematically correct results" away and you end up with the worst possible scaler.

Katie Boundary

7th February 2015, 04:25

Is dat so, y not just stick 2 point resize den

Because I'm not always upscaling by whole numbers, herp derp :rolleyes:

feisty2

7th February 2015, 04:29

Well... U actually can upscale decimal factors with point resize, try it urself

Katie Boundary

7th February 2015, 04:40

Well... U actually can upscale decimal factors with point resize, try it urself

You can, but then the results look like ass.

feisty2

7th February 2015, 04:43

gonna look like ass anyways if stays lossless...
what's the point

Katie Boundary

7th February 2015, 04:46

gonna look like ass anyways

Not if you multiply by a whole number

Asmodian

7th February 2015, 04:47

Your opinion of how it looks is irrelevant. It is the most mathematically correct representation of the original data. That's actually the one and only good thing about nearest neighbor resizing - it gives the most mathematically correct results when you're just multiplying each dimension by a whole number.

Are we starting this argument again? :rolleyes:

Try this thought experiment: You take a 1280x960 image, resize it to 320x240, then resize it back to 1280x960. The most mathematically correct method would be the one that looks the most like the original 1280x960 image, wouldn't it?
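The thought experiment can be tried on a toy one-dimensional signal in Python. This is only an illustration under stated assumptions: the downscale is a plain 4-sample average, the linear upscale interpolates between samples at pixel centres with clamped edges, and the test signal is a smooth ramp. Results on real images depend on the content.

```python
import math


def box_downscale_1d(src, k):
    """Average each run of k samples (len(src) must be a multiple of k)."""
    return [sum(src[i:i + k]) / k for i in range(0, len(src), k)]


def point_upscale_1d(src, k):
    """Nearest neighbour by a whole-number factor: repeat each sample."""
    return [v for v in src for _ in range(k)]


def linear_upscale_1d(src, k):
    """Linear interpolation between samples at pixel centres."""
    out = []
    for d in range(len(src) * k):
        x = (d + 0.5) / k - 0.5
        x = min(max(x, 0.0), len(src) - 1)  # clamp at the edges
        i0 = math.floor(x)
        i1 = min(i0 + 1, len(src) - 1)
        t = x - i0
        out.append((1 - t) * src[i0] + t * src[i1])
    return out


def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = [float(i) for i in range(16)]  # a smooth ramp
    small = box_downscale_1d(original, 4)
    print("point :", mse(original, point_upscale_1d(small, 4)))
    print("linear:", mse(original, linear_upscale_1d(small, 4)))
```

For this ramp the linear round trip has the lower error (0.3125 against 1.25). A signal made of flat blocks aligned to the 4-sample grid would survive the point round trip exactly, so the comparison depends on what the source looks like.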

feisty2

7th February 2015, 04:55

Your method and pointresize result da same wen upscaling inter factors like u said earlier
And still looks like ass...

Katie Boundary

7th February 2015, 05:03

Are we starting this argument again? :rolleyes:

Try this thought experiment: You take a 1280x960 image, resize it to 320x240, then resize it back to 1280x960. The most mathematically correct method would be the one that looks the most like the original 1280x960 image, wouldn't it?

If by "looks the most like" you mean "deviates the least from the information of"...

Your method and pointresize result da same wen upscaling inter factors like u said earlier

What the hell does "upscaling inter factors" mean? Use English, please.

Asmodian

7th February 2015, 05:04

If by "looks the most like" you mean "deviates the least from the information of"...

Exactly. Hint: it isn't NN.

Katie Boundary

7th February 2015, 05:07

Exactly. Hint: it isn't NN.

Well anyone who uses nn for downsizing is an idiot. What's your point?

feisty2

7th February 2015, 05:08

Wow, partys getting a little bit tense now
I mean, "upscale by whole numbers" in ur word, like 2x 3x 4x...

Katie Boundary

7th February 2015, 05:10

Wow, partys getting a little bit tense now
I mean, "upscale by whole numbers" in ur word, like 2x 3x 4x...

Ah. Yes, in such cases, grid overlay and nearest neighbor both produce mathematically correct results. What's your point?

Asmodian

7th February 2015, 05:11

Ok, you use Grid Overlay (bilinear) to downscale. Now pick an upscaling method that gives an image most mathematically similar to the original. It isn't NN.

vivan

7th February 2015, 05:14

Extreme downsizing, for example, should look quite a bit different because bilinear is still only interpolating between four pixels whereas overlay would be averaging out much larger areas to generate each new pixel.

That's how wrong downscaling works. With any scaler you should either increase the number of taps (the radius, i.e. the number of samples it reads) to cover the size of the resulting pixel, or run it several times (google "mipmap"), each time with a downscaling factor of 2, until the remaining factor is 2 or less. Of course proper implementations do that for you.
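A small Python sketch of why a fixed two-tap filter is the wrong tool for large downscaling factors. It assumes a 1-D signal, a linear sampler that interpolates between the two samples nearest each new pixel centre, and a 4x reduction. The function names are hypothetical.

```python
import math


def two_tap_downscale_1d(src, n):
    """Linear interpolation that reads only 2 source samples per output."""
    scale = len(src) / n
    out = []
    for d in range(n):
        x = (d + 0.5) * scale - 0.5  # new pixel centre in source coordinates
        x = min(max(x, 0.0), len(src) - 1)
        i0 = math.floor(x)
        i1 = min(i0 + 1, len(src) - 1)
        t = x - i0
        out.append((1 - t) * src[i0] + t * src[i1])
    return out


def box_downscale_1d(src, k):
    """Average every run of k samples, so each source sample contributes."""
    return [sum(src[i:i + k]) / k for i in range(0, len(src), k)]


if __name__ == "__main__":
    signal = [0.0] * 16
    signal[4] = 255.0  # a single bright sample
    print(two_tap_downscale_1d(signal, 4))
    print(box_downscale_1d(signal, 4))
```

At a factor of 4 each output of the two-tap sampler reads source samples 4d+1 and 4d+2 only, so the bright sample at index 4 vanishes from its output, while the wider average still records it as 63.75.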

feisty2

7th February 2015, 05:19

Let's get this thing str8
First, u wanna lossless upscale algorithm
I said, pointresize is always lossless, upscale by whole numbers or not
You said the result will look like ass if upscale by decimal numbers
I said ur algorithm or pointresize, both gonna look like ass anyways
You said the result wouldn't look like ass if upscale by whole numbers
I said ur algorithm is equal to pointresize when upscale by whole numbers

Conclusion: why not just pick pointresize and everything would be alright

Katie Boundary

7th February 2015, 05:20

Ok, you use Grid Overlay (bilinear) to downscale.

LOL no, if you shrink both sides to 1/4 of their original length, then bilinear and grid overlay will give you different results. I already pointed that out earlier in the thread. Please try to pay attention! Grid overlay would use SIXTEEN pixels in the original image to calculate the value of each pixel in the reduced image. Bilinear would only take four into account.

But if you ran the original image through virtualdub's "2x2 non-overlapping matrix" twice, that WOULD give the same results as Grid Overlay. So let's do that.
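The claim that two passes of a 2x2 non-overlapping average match a single 4x4 average can be checked with a short Python sketch. `box_downscale` is an illustrative stand-in, not the VirtualDub filter itself, and the test image is arbitrary.

```python
from fractions import Fraction


def box_downscale(img, k):
    """Replace each non-overlapping k x k block with its mean.

    Image dimensions are assumed to be multiples of k.
    """
    h, w = len(img), len(img[0])
    return [[Fraction(1, k * k) * sum(img[y + dy][x + dx]
                                      for dy in range(k)
                                      for dx in range(k))
             for x in range(0, w, k)]
            for y in range(0, h, k)]


if __name__ == "__main__":
    # An arbitrary 8x8 test pattern.
    image = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(8)]
    twice = box_downscale(box_downscale(image, 2), 2)
    once = box_downscale(image, 4)
    print(twice == once)
```

The two results are identical because the mean of four equally sized 2x2 block means is the mean of the 16 pixels they cover.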

Now pick an upscaling method that gives an image most mathematically similar to the original. It isn't NN.

I'll believe that when I see the signal-to-noise data.

That's how wrong downscaling works.

I think that was kind of my point.

str8

ENGLISH! DO YOU SPEAK IT?

feisty2

7th February 2015, 05:28

Str8=straight
I speak English indeed...

vivan

7th February 2015, 05:49

But if you ran the original image through virtualdub's "2x2 non-overlapping matrix" twice, that WOULD give the same results as Grid Overlay. So let's do that.

Which is how proper bilinear downscaling works, so we are back to bilinear, second-to-worst scaler.

I'll believe that when I see the signal-to-noise data.

You're either reading it wrong or doing it wrong. On any frame any scaler (even bilinear!) gives higher PSNR than PointResize. It's really easy to check using avisynth:

s = FFVideoSource("whatever.mp4")         # source clip, assumed to be 1920x1080
d = s.BilinearResize(960, 540)            # downscale to half size
s1 = d.Lanczos4Resize(1920, 1080)         # upscale back with Lanczos4
s2 = d.PointResize(1920, 1080)            # upscale back with nearest neighbor
r1 = Compare(s, s1).Crop(0, 0, 0, 200)    # PSNR overlay for Lanczos4, top 200 rows
r2 = Compare(s, s2).Crop(0, 0, 0, 200)    # PSNR overlay for PointResize, top 200 rows
StackVertical(r1, r2)

You can use any scaler for downscaling; I chose bilinear because you like it. With a better downscaler you'll get higher PSNR too.
It should be noted that PSNR is a bad metric, SSIM is a better one and the best one is human eye (that's why psnr/ssim optimised encoders are terrible, hello vp*).
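For reference, PSNR is a simple function of the mean squared error. The Python sketch below assumes 8-bit samples (peak value 255) and flat lists of sample values. The function name `psnr` is just for this example.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long signals."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    print(psnr(original, [10, 20, 30, 40]))
    print(psnr(original, [12, 18, 33, 37]))
```

A higher value means the test signal deviates less from the reference. As the post says, it is a crude measure that does not necessarily track what the eye sees.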

ENGLISH! DO YOU SPEAK IT?

He can't.