Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Old 25th March 2017, 22:50   #1  |  Link
luquinhas0021
The image enthusiast
 
Join Date: Mar 2015
Location: Brazil
Posts: 270
How does downscaling work, mathematically?

What I'm talking about here is turning an image into a polynomial function f(x,y) = z.
For upscaling, all you have to do is plug the coordinates of the interpolated pixels into the function.
What about downscaling: is there something that can be done with the polynomial that represents the image?
__________________
Searching for great solutions
Old 10th April 2017, 08:29   #2  |  Link
Katie Boundary
Registered User
 
Katie Boundary's Avatar
 
Join Date: Jan 2015
Posts: 1,056
http://entropymine.com/imageworsener/resample

^this explains things in simple English with lots of pretty pictures. It's a good place to start.
__________________
I ask unusual questions but always give proper thanks to those who give correct and useful answers.
Old 10th April 2017, 18:32   #3  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by luquinhas0021 View Post
What I'm talking about here is turning an image into a polynomial function f(x,y) = z.
For upscaling, all you have to do is plug the coordinates of the interpolated pixels into the function.
What about downscaling: is there something that can be done with the polynomial that represents the image?
As always, the input image consists of sampling points, i.e. the image value (brightness or color) is only "known" at certain discrete points.

In between those sampling points the image value is not known, so a 2D interpolation (e.g. BiLinear or BiCubic) is used to approximate the image data in between the original sample points:
https://upload.wikimedia.org/wikiped...lation.svg.png

Finally, new sampling points are taken from the interpolated curve (function), which then form the output image. Again the output image will only contain discrete sampling points.

The difference between "upscaling" and "downscaling" is that the former creates an output image with more (denser) sampling points, while the latter creates one with fewer (sparser) sampling points - compared to the input image.

(BTW: Why does "downscaling" require interpolation at all? Why not simply discard a certain fraction of the sampling points from the original image? The reason is that, depending on the scaling factor, the remaining sampling points of the "downscaled" image may be located at a position where there is no sampling point in the original image! So the image value at the location of the "new" sampling point needs to be interpolated/approximated from the original sampling points)
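The whole pipeline - map each output sample position back into the source grid, interpolate, take the new sample - can be sketched in a few lines of Python (plain lists, grayscale, "align corners" coordinate mapping; the names are illustrative, not from any particular library):

```python
def resize(img, new_h, new_w):
    """Resample a grayscale image (list of rows) to new_h x new_w
    using bilinear interpolation between the original sample points."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output sample position back into source coordinates
        # ("align corners" convention: first/last samples coincide).
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = min(int(y), old_h - 2)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = min(int(x), old_w - 2)
            fx = x - x0
            # Weighted average of the 4 surrounding original samples.
            v = (img[y0][x0]         * (1 - fy) * (1 - fx) +
                 img[y0][x0 + 1]     * (1 - fy) * fx +
                 img[y0 + 1][x0]     * fy       * (1 - fx) +
                 img[y0 + 1][x0 + 1] * fy       * fx)
            row.append(v)
        out.append(row)
    return out

src = [[0.0, 2.0],
       [4.0, 6.0]]
up = resize(src, 3, 3)    # upscaling: denser sampling grid
down = resize(up, 2, 2)   # downscaling: sparser sampling grid
```

The same routine performs both upscaling and downscaling; only the density of the output grid changes. Note, though, that for large downscaling factors a plain interpolator like this reads only the few input samples nearest each output point - which is exactly the problem raised in the following replies.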
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 10th April 2017 at 19:18.
Old 10th April 2017, 20:26   #4  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
Quote:
Originally Posted by LoRd_MuldeR View Post
(BTW: Why does "downscaling" require interpolation at all? Why not simply discard a certain fraction of the sampling points from the original image? The reason is that, depending on the scaling factor, the remaining sampling points of the "downscaled" image may be located at a position where there is no sampling point in the original image! So the image value at the location of the "new" sampling point needs to be interpolated/approximated from the original sampling points)
That's not the only reason for proper downscaling interpolation - even if you were to hit a point exactly at all times, you might still drop a lot of information from the original image without taking it into account at all. Hit one bright pixel in an otherwise mostly dark image, and suddenly a much larger area might be bright. That's why downscaling typically averages areas, to account for all the information and not distort the image by random chance.
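A toy example of the effect (hypothetical 4x4 grayscale image, 2x downscale, plain Python; the function names are mine):

```python
def point_downscale_2x(img):
    """Keep every second sample, discard the rest."""
    return [[img[2 * i][2 * j] for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]

def box_downscale_2x(img):
    """Average each 2x2 block, so every input sample contributes."""
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1] +
              img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]

# Mostly dark image with a single bright pixel at (1, 1).
img = [[0, 0, 0, 0],
       [0, 255, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]

print(point_downscale_2x(img))  # the bright pixel vanishes entirely
print(box_downscale_2x(img))    # its energy survives as 255/4 = 63.75
```

Had the bright pixel sat at (0, 0) instead, point sampling would have made a quarter of the output image bright - exactly the "random chance" distortion described above.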
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 13th April 2017, 00:58   #5  |  Link
luquinhas0021
The image enthusiast
 
Join Date: Mar 2015
Location: Brazil
Posts: 270
Quote:
Originally Posted by nevcairiel View Post
That's not the only reason for proper downscaling interpolation - even if you were to hit a point exactly at all times, you might still drop a lot of information of the original image without taking it into account at all. Hit one bright pixel in an otherwise mostly dark image, and suddenly a much larger area might be bright. That's why downscaling typically averages areas to account for all information and not distort by random chance.
There's also the aliasing problem: simply discarding pixels, like Point Downscaling does, causes edge discontinuities - unlike, for example, SSIM Downscaler, which attempts to reconstruct the edges.
__________________
Searching for great solutions
Old 13th April 2017, 01:27   #6  |  Link
Katie Boundary
Registered User
 
Katie Boundary's Avatar
 
Join Date: Jan 2015
Posts: 1,056
Quote:
Originally Posted by LoRd_MuldeR View Post
As always, the input image consists of sampling points, i.e. the image value (brightness or color) is only "known" at certain discrete points.

In between those sampling points the image value is not known, so a 2D interpolation (e.g. BiLinear or BiCubic) is used to approximate the image data in between the original sample points:
This discussion has already been had. The short version is that Mulder's explanation is wrong because it pretends that images are waveforms.

More information was provided by MP4 Guy here
__________________
I ask unusual questions but always give proper thanks to those who give correct and useful answers.

Last edited by Katie Boundary; 13th April 2017 at 01:47.
Old 13th April 2017, 09:02   #7  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Katie Boundary View Post
This discussion has already been had. The short version is that Mulder's explanation is wrong because it pretends that images are waveforms.
Nope!

Just like an audio signal is a continuous curve in 1D time-space (aka a "waveform"), an image is a continuous function in 2D space. And both are sampled for digital storage, i.e. the signal value is only captured/stored at discrete (infinitely small) points. The only difference is that for an audio signal we store the signal value at discrete points in 1D time, while for an image we store the image value at discrete points in 2D space.

Neither are there any "stairsteps" in a digital/sampled audio signal, nor do the "pixels" in a digital/sampled image have any "area". Representing pixels (i.e. samples of a digital image) as solid-color squares that actually take up an area is nothing but a "workaround" used by image editors - because you wouldn't be able to see an infinitely small sample point. It is still not an accurate representation (not at all) of what the sampled image data actually is.

Watch this video, especially the "there are no stairsteps" chapter:
https://xiph.org/video/vid2.shtml
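The "no stairsteps" point can be checked numerically: a band-limited signal is fully determined between its samples by the Whittaker-Shannon (sinc) interpolation formula, to high accuracy away from the edges of a finite sample window. A quick sketch with assumed parameters (a 3 Hz tone sampled at 100 Hz):

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

fs = 100.0   # sampling rate (Hz), well above Nyquist for a 3 Hz tone
f = 3.0      # tone frequency (Hz)
T = 1.0 / fs
samples = [math.sin(2 * math.pi * f * n * T) for n in range(1000)]

def reconstruct(t):
    """Whittaker-Shannon interpolation: a sum of shifted sincs."""
    return sum(x * sinc((t - n * T) / T) for n, x in enumerate(samples))

t = 5.005    # exactly halfway between two samples, mid-window
err = abs(reconstruct(t) - math.sin(2 * math.pi * f * t))
print(err)   # tiny: the value "between the stairsteps" was never lost
```

(The truncation error grows near the ends of the sample window; in the middle it is negligible.)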


__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 13th April 2017 at 09:10.
Old 13th April 2017, 09:35   #8  |  Link
Sharc
Registered User
 
Join Date: May 2006
Posts: 3,997
Just a thought on the "stairsteps" (the temporal aspect maybe a bit off topic though):
In a movie, each picture (or frame, in digital video) is displayed for 1/24 s = 41.7 ms. During this time the picture does not change, meaning our eyes are fed a sequence of pictures of 41.7 ms duration each, which is by definition a "stairstep" function. The interpolation filter is the human apparatus (eyes, brain ...). What is this human filter like? Some kind of lowpass, I assume ... (btw, certain animals have "interpolation filters" totally different from humans', which is why they don't go to the cinema, I presume).

Last edited by Sharc; 13th April 2017 at 09:57.
Old 13th April 2017, 19:14   #9  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
Quote:
Originally Posted by LoRd_MuldeR View Post
Neither are there any "stairsteps" in a digital/sampled audio signal, nor do the "pixels" in a digital/sampled image have any "area".
While I agree this is true for audio, for video it's not as simple. Digital sensors in a camera aren't infinitely small dots that capture one infinitely small sample at given distances; the pixels in a sensor are an area of a certain size, so the pixel-area representation is closer to the actual sampled signal than an infinitely small dot in the center of the pixel.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 13th April 2017, 19:42   #10  |  Link
Sharc
Registered User
 
Join Date: May 2006
Posts: 3,997
Quote:
Originally Posted by nevcairiel View Post
While I agree this is true for audio, ....
Nit-picking a bit: it's not even fully true for audio, as real A/D converters and sampling devices have a non-zero aperture time. The effect is normally negligible, though, and one can assume "sharp" samples.
Old 14th April 2017, 00:34   #11  |  Link
wonkey_monkey
Formerly davidh*****
 
wonkey_monkey's Avatar
 
Join Date: Jan 2004
Posts: 2,496
Quote:
Originally Posted by Katie Boundary View Post
This discussion has already been had. The short version is that Mulder's explanation is wrong because it pretends that images are waveforms.

More information was provided by MP4 Guy here
I don't see how MP4 Guy's post contradicts LoRd_MuldeR's perfectly well-informed explanation.

Quote:
Originally Posted by Sharc
The interpolation filter is the human apparatus (eyes, brain ....). What is this human filter like?
I don't think there is any such temporal filtering. That's why movies seem to "judder" and have a "filmic" effect.

Quote:
Originally Posted by Sharc View Post
Being nit-picking it's not even fully true for audio, as real A/D converters and sampling devices have a non-zero aperture time. The effect is normally negligible though and one can assume "sharp" samples.
Isn't audio usually passed through a high-pass filter before sampling for exactly this reason? To avoid "sharp" samples?
__________________
My AviSynth filters / I'm the Doctor

Last edited by wonkey_monkey; 14th April 2017 at 00:39.
Old 14th April 2017, 07:27   #12  |  Link
Katie Boundary
Registered User
 
Katie Boundary's Avatar
 
Join Date: Jan 2015
Posts: 1,056
Quote:
Originally Posted by nevcairiel View Post
While I agree this is true for audio, for video it's not as simple. Digital sensors in a camera aren't infinitely small dots that capture one infinitely small sample at given distances; the pixels in a sensor are an area of a certain size, so the pixel-area representation is closer to the actual sampled signal than an infinitely small dot in the center of the pixel.
Yes, there's that. There's also the fact that all bets are off as soon as any resizing is done. If a 1920x1080 image is resized to 720x480 using Area Averaging, then suddenly each pixel represents the average color of a rectangular piece of the image. And if you're dealing with CGI/machinima, then you're at the mercy of whatever algorithm is used to render images...

In other words, sampling is just one of many models of reality, and presenting it as the only model is wrong.
__________________
I ask unusual questions but always give proper thanks to those who give correct and useful answers.
Old 14th April 2017, 07:49   #13  |  Link
Sharc
Registered User
 
Join Date: May 2006
Posts: 3,997
Quote:
Originally Posted by davidhorman View Post
I don't think there is any such temporal filtering. That's why movies seem to "judder" and have a "filmic" effect.
I think there must be some kind of "human interpolation filter" (a temporal resolution limit) in order for us to perceive the individual pictures as more-or-less fluent motion. 24 fps is a cost-driven lower limit (cost of the film material = storage cost) which is bearable for most people without causing nausea. That does not mean, however, that 24 pictures per second is fast enough to represent fast-changing natural processes undistorted. An example is backward-turning wheels, caused by a picture sampling rate that is too low, producing aliasing.
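The backward-turning wheel is temporal aliasing, and it reduces to simple arithmetic. Suppose (my numbers, just for illustration) the wheel's spoke pattern repeats every 30 degrees: the camera cannot distinguish rotations that differ by a multiple of 30 degrees, so the perceived motion is the true per-frame rotation folded into the range of +/- 15 degrees:

```python
def apparent_rotation(deg_per_frame, spoke_period_deg=30.0):
    """Fold the true per-frame rotation into the nearest equivalent
    within +/- half the spoke period - what the eye actually infers."""
    half = spoke_period_deg / 2
    return ((deg_per_frame + half) % spoke_period_deg) - half

print(apparent_rotation(2.0))   # slow wheel: seen correctly, +2 deg/frame
print(apparent_rotation(29.0))  # fast wheel: appears to turn BACKWARD
print(apparent_rotation(30.0))  # exact spoke-period multiple: appears frozen
```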

Quote:
Isn't audio usually passed through a high-pass filter before sampling for exactly this reason? To avoid "sharp" samples?
Hmmm... don't you mean a low-pass filter, i.e. remove all frequencies above half the sampling frequency in order to avoid aliasing (= folding the high frequencies back into the useful frequency range, causing signal distortion)?
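The fold-back can be demonstrated directly: sampled at 48 kHz, a 30 kHz tone produces exactly the same samples as an 18 kHz tone (with inverted sign), so without a low-pass filter in front of the ADC the two are indistinguishable afterwards. A quick check (the frequencies are just an example):

```python
import math

fs = 48000.0          # sampling rate
f_in = 30000.0        # input tone ABOVE Nyquist (fs/2 = 24 kHz)
f_alias = fs - f_in   # folds back to 18 kHz

for n in range(64):
    t = n / fs
    hi = math.sin(2 * math.pi * f_in * t)
    lo = -math.sin(2 * math.pi * f_alias * t)  # same samples, sign flipped
    assert abs(hi - lo) < 1e-9
print("30 kHz and 18 kHz tones are sample-identical at fs = 48 kHz")
```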
Old 14th April 2017, 08:30   #14  |  Link
StainlessS
HeartlessS Usurer
 
StainlessS's Avatar
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Quote:
24fps is a cost driven lower limit (cost of the film material = storage cost) which is for most people bearable without causing nausea.
Do I not recall that 24 FPS projectors display each frame twice, taking it up to 48 FPS and within reasonable range of the more natural lower limit of about 50 Hz? (Otherwise flicker is apparent @ 24.)
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 14th April 2017 at 08:32.
Old 14th April 2017, 08:32   #15  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,829
Quote:
Originally Posted by nevcairiel View Post
While I agree this is true for audio, for video its not as simple. Digital sensors in a camera aren't infinitely small dots that capture one infinitely small sample at given distances, the pixels in a sensor are an area of a certain size, so the pixel-area representation is closer to their actual sampled signal then an infinitely small dot in the center of the pixel.
Yeah, but the "pixels" capture the three primary colours individually, so you could argue a pixel is a combination of samples. The final colour for each pixel is computed using information from neighbouring pixels (in a way I don't understand) to increase the resolution, ignoring the gaps between "light cavities"... Each pixel could be seen as something the camera confabulated from its sensor; it doesn't really represent a particular "area", but rather is a "sample" at a particular point in the image.
http://www.cambridgeincolour.com/tut...ra-sensors.htm
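For the curious, the neighbour-based colour reconstruction mentioned above is called demosaicing. A deliberately simplified bilinear version for an RGGB Bayer mosaic (interior pixels only; real cameras use far more sophisticated algorithms, and all the names here are mine):

```python
def demosaic_bilinear(mosaic, y, x):
    """Return (R, G, B) at interior pixel (y, x) of an RGGB Bayer mosaic.
    Each photosite records one colour; the other two are averaged from
    the nearest neighbours that did record them."""
    edge = lambda: (mosaic[y - 1][x] + mosaic[y + 1][x] +
                    mosaic[y][x - 1] + mosaic[y][x + 1]) / 4
    diag = lambda: (mosaic[y - 1][x - 1] + mosaic[y - 1][x + 1] +
                    mosaic[y + 1][x - 1] + mosaic[y + 1][x + 1]) / 4
    horiz = lambda: (mosaic[y][x - 1] + mosaic[y][x + 1]) / 2
    vert = lambda: (mosaic[y - 1][x] + mosaic[y + 1][x]) / 2
    v = mosaic[y][x]
    if y % 2 == 0 and x % 2 == 0:   # red site
        return (v, edge(), diag())
    if y % 2 == 1 and x % 2 == 1:   # blue site
        return (diag(), edge(), v)
    if y % 2 == 0:                  # green site on a red row
        return (horiz(), v, vert())
    return (vert(), v, horiz())     # green site on a blue row

# A flat grey scene: every photosite reads 100.0 whatever its colour
# filter, so demosaicing should return neutral grey everywhere.
flat = [[100.0] * 4 for _ in range(4)]
print(demosaic_bilinear(flat, 1, 1))  # (100.0, 100.0, 100.0)
```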

Maybe another way of looking at it.....

I don't know what Katie imagines a pixel could be other than a sample. That makes no sense. If you resize down, each new pixel might represent some sort of average of pixels from the original image, but that doesn't mean a new "sample" wasn't created.
Old 14th April 2017, 08:50   #16  |  Link
Sharc
Registered User
 
Join Date: May 2006
Posts: 3,997
Quote:
Originally Posted by StainlessS View Post
Do I not recall that 24 FPS projectors display each frame twice, taking it up to 48 FPS and within reasonable range of the more natural lower limit of about 50 Hz ? (otherwise flicker is apparent @ 24).
Possibly, yes. However, I would assume that displaying the same picture twice does not improve the time resolution of the picture sequence, as the same picture is displayed for 2 * 1/48 = 1/24 second. So there must probably be another reason for the 48 fps projector playback - some "psychovisual" or "anti-flicker" effect (?)
This is different for video when we bob (instead of single-rate deinterlace) it. The time resolution of the single-rate deinterlaced video is 30 fps, while the bobbed sequence produces 60 different pictures per second (for action or panning scenes).
Old 14th April 2017, 09:08   #17  |  Link
StainlessS
HeartlessS Usurer
 
StainlessS's Avatar
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Quote:
Originally Posted by Sharc View Post
So I think there must probably be another reason for the 48fps projector playback, or some "psychovisual" or "anti-flicker" effect (?)
Yes, anti-flicker. (same for PAL 50 fields/sec or NTSC 60 fields/sec, 25/30 Frames/sec would flicker).
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 14th April 2017 at 09:42.
Old 14th April 2017, 09:30   #18  |  Link
raffriff42
Retried Guesser
 
raffriff42's Avatar
 
Join Date: Jun 2012
Posts: 1,373
Quote:
Originally Posted by davidhorman View Post
Isn't audio usually passed through a high-pass filter before sampling for exactly this reason? To avoid "sharp" samples?
Yes, if you mean low pass - and the same applies to video:
wikipedia/Anti-aliasing_filter/Optical_applications
Old 14th April 2017, 17:31   #19  |  Link
Midzuki
Unavailable
 
Midzuki's Avatar
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
Quote:
Originally Posted by StainlessS View Post
Yes, anti-flicker. (same for PAL 50 fields/sec or NTSC 60 fields/sec, 25/30 Frames/sec would flicker).
30fps does flicker. In 2008 I created a short interlaced MPEG-2 clip which emulated a 29.97 f*s-per-second sequence, authored a DVD disc with it, and watched it on a 29-inch analog TV set.

What an annoying, disgusting, terrible experience,
is all I can say
Old 14th April 2017, 18:41   #20  |  Link
Katie Boundary
Registered User
 
Katie Boundary's Avatar
 
Join Date: Jan 2015
Posts: 1,056
That's funny. None of my old CRT televisions ever flickered. Maybe it was just your TV and had nothing to do with the frame rate.
__________________
I ask unusual questions but always give proper thanks to those who give correct and useful answers.