Detail reconstruction using difference...


lnatan25
25th June 2009, 12:43
Hey guys!

I have an idea, and want to see if it is even feasible. I have this AVS code:

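# Full-resolution reference clip
original=directshowsource("1080p source").trim(10023,11455)
# Low-resolution encode, upscaled back to the reference size and sharpened
encode=directshowsource("320x136 encode").Lanczos4Resize(1920,800).trim(10023,11455).sharpen(1.0)
# Alternative: simulate the small encode by downscaling and re-upscaling the original
#encode=original.Lanczos4Resize(320,136).Lanczos4Resize(1920,800).trim(10023,10555)
# Difference layer: original minus the upscaled encode
difference=overlay(original, encode, mode="Subtract", pc_range=true)
# Reconstruction: upscaled encode plus the difference layer
reconstruction=overlay(encode,difference,mode="Add", pc_range=true)

return reconstruction.ConvertToYV12()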

The difference that comes out is very compressible, and the 320x136 encode is tiny (low bitrate). Can some detail reconstruction be achieved this way? The idea is to do a lot of work at the start (encoding, generating the difference encode) and at the end (adding the encode and the difference), but end up with something very small (the encode and difference files). Data loss and unwanted artifacts will probably occur, but the question is whether they will stay at an acceptable level.

If this is not possible, please provide an explanation.

Thanks in advance,
Leo

Audionut
25th June 2009, 12:45
What does this have to do with H.264?

lnatan25
25th June 2009, 12:57
My bad, didn't know where to put it. This is not a question about AviSynth, but more on video encoding, I think. Please move this to the correct forum.

Sharktooth
25th June 2009, 13:05
your method is sadly useless. quantization already takes care of your idea (in a different way though). when encoding, some details are "dropped" due to quantization. the higher the quantizer, the more details are dropped. how much gets dropped depends on the user settings (usually bitrate or qp value).
to reconstruct the missing details you have to lower the qp or raise the bitrate. the differences you extracted with overlay() will take the same bits when encoded in the video due to the nature of the encoder (if we're speaking of mpeg encoders). the only difference is you encoded your "reconstruction buffer" at 320*xxx and that's not a good idea since you're trying to keep details.
the idea may work in other contexts though and, IIRC someone already uses it.
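The relationship between quantizer and lost detail can be illustrated with a toy quantizer. This is only a minimal sketch and not how H.264 quantizes: the coefficient values, the step sizes, and the helper names `quantize`, `dequantize`, `surviving` and `max_error` are all made up for illustration, and a plain truncating division stands in for the real process.

```python
# Toy illustration only: hypothetical coefficients and step sizes,
# plain truncating division instead of a real codec's quantizer.

def quantize(coeffs, step):
    """Map each coefficient to an integer level (truncate toward zero)."""
    return [int(c / step) for c in coeffs]

def dequantize(levels, step):
    """Approximate the original coefficients from the stored levels."""
    return [lv * step for lv in levels]

# Hypothetical transform coefficients: one large low-frequency term
# followed by progressively smaller "detail" terms.
coeffs = [52, -13, 7, -3, 2, 1, -1, 0]

def surviving(step):
    """Count the coefficients that are still non-zero after quantization."""
    return sum(1 for lv in quantize(coeffs, step) if lv != 0)

def max_error(step):
    """Largest absolute reconstruction error for a given step size."""
    rebuilt = dequantize(quantize(coeffs, step), step)
    return max(abs(c - r) for c, r in zip(coeffs, rebuilt))

for step in (1, 4, 16):
    print(step, quantize(coeffs, step), surviving(step), max_error(step))
```

With a larger step the small detail terms collapse to zero first, so fewer values need to be stored and the reconstruction error grows. Getting those details back means spending the bits again with a smaller step.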

lnatan25
25th June 2009, 13:11
Yes, currently both source and encode are using h264.

My idea was to use a small encode (which can, for example, be watched on a PDA), and have a highly compressible difference layer that can be added to the encode to reconstruct some details from the source.

Sharktooth
25th June 2009, 13:51
the problem is details are destroyed while resizing the difference to such a lower resolution.
the upscaling won't restore those lost details...

reepa
25th June 2009, 20:15
Sharktooth wrote: "the problem is details are destroyed while resizing the difference to such a lower resolution."

If you don't lowpass filter, the high-frequency detail gets preserved when downsampling if there are no low-frequency details (the high frequencies alias to low frequencies). You can reconstruct the high-frequency details by upsampling and then filtering out the low-frequency alias, keeping the high-frequency image. Wavelets sort of work like this (two downsampled copies of the original signal, one containing the low frequencies and one the high frequencies, applied recursively).

I don't see the point though - much better results by just encoding the original video.

Sharktooth
25th June 2009, 22:54
no way to reconstruct details with upscaling. by downsizing pixels get discarded. you can do nothing to avoid that.
since pixels will be brutally deleted the best thing you can do is replace those pixels by interpolating the nearest ones (upscaling) with some complex interpolation function but subpixel details are gone forever. there is NO way to have them back.

2Bdecided
26th June 2009, 15:56
lnatan25 wrote: "The difference that comes out is very compressible, and the 320x136 encode is tiny (low bitrate)."
I think you've made a mistake somewhere. If you do it correctly, then either x264.encode + x264.difference looks worse than x264.original, or requires a much higher bitrate.

The most efficient way of encoding it is to encode the original as-is. Splitting the original into a small base image and high-frequency remnants gives the encoder a really hard time.

If it appeared to work, you haven't done it properly!


What you've suggested is already implemented as the spatial scalable mode in various standards. It gives the advantage of having a separate small video available, but if you want to reconstruct the large video, it isn't as efficient as encoding the large video as-is.
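The base-plus-difference structure can be sketched on a single row of samples. This is a hedged illustration, not any standard's scalable mode: the sample values are invented, pair averaging and sample repetition stand in for real resizers, and the helper `clamp8` models an 8-bit subtract or add that clips to 0..255. Whether a particular filter clips this way is an assumption here, not something the thread establishes.

```python
def clamp8(v):
    """Clip a value to the 8-bit range 0..255 (assumed saturating behaviour)."""
    return max(0, min(255, v))

# Hypothetical 8-bit samples from one row of the full-size picture.
original = [100, 180, 90, 200, 120, 60, 250, 10]

# Base layer: average each pair (crude 2:1 downscale).
base = [(original[i] + original[i + 1]) // 2 for i in range(0, len(original), 2)]

# Upscale the base layer by repeating each sample (crude 1:2 upscale).
upscaled = [b for b in base for _ in range(2)]

# Enhancement layer kept as a signed difference: exact reconstruction.
residual = [o - u for o, u in zip(original, upscaled)]
rebuilt = [u + r for u, r in zip(upscaled, residual)]

# Same idea, but with every intermediate result clipped to 0..255.
clamped_residual = [clamp8(o - u) for o, u in zip(original, upscaled)]
rebuilt_clamped = [clamp8(u + r) for u, r in zip(upscaled, clamped_residual)]

print(residual)         # signed values, one per original sample
print(rebuilt)          # identical to original
print(rebuilt_clamped)  # only the brighter-than-base samples are restored
```

Two things stand out in this sketch. The difference layer has as many samples as the original, so it is a full-resolution signal in its own right. And if the subtraction clips at zero, every sample where the original is darker than the upscaled base loses its correction, which makes the difference look far more compressible than it really is.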

Cheers,
David.

reepa
27th June 2009, 11:00
Sharktooth wrote: "no way to reconstruct details with upscaling. by downsizing pixels get discarded. you can do nothing to avoid that.
since pixels will be brutally deleted the best thing you can do is replace those pixels by interpolating the nearest ones (upscaling) with some complex interpolation function but subpixel details are gone forever. there is NO way to have them back."

If the high-resolution image doesn't have low-frequency details (say, if it's high-pass filtered), it can be downsampled without losing the high-frequency details. The details alias to low frequencies when downsampling. When upsampling later, the low frequencies "image" to high frequencies. Filtering out the low-frequency aliased copy then leaves the "imaged" details, i.e. the original ones.
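A small one-dimensional experiment can make this argument concrete. The sketch below is an assumption-laden toy: a 16-sample periodic signal, a naive DFT, zero-insertion upsampling, and an ideal frequency-domain mask that keeps only the upper half of the spectrum. The names `dft`, `idft`, `roundtrip` and `worst_error` are illustrative helpers, not part of any tool discussed in the thread.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def roundtrip(sig):
    """Drop every other sample, then rebuild using only the upper half-band."""
    n = len(sig)
    decimated = sig[::2]                      # no low-pass before decimation
    upsampled = [decimated[t // 2] if t % 2 == 0 else 0.0 for t in range(n)]
    spectrum = dft(upsampled)
    # Keep bins strictly between n/4 and 3n/4 (the high band), gain 2.
    kept = [2 * spectrum[k] if n // 4 < k < 3 * n // 4 else 0j for k in range(n)]
    return idft(kept)

def worst_error(sig):
    """Largest absolute difference between a signal and its round trip."""
    return max(abs(a - b) for a, b in zip(sig, roundtrip(sig)))

N = 16
# Contains only high frequencies (bins 5 and 7 of 16).
high_only = [math.cos(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
             for t in range(N)]
# Same signal plus a low-frequency component (bin 3).
mixed = [h + math.cos(2 * math.pi * 3 * t / N) for t, h in enumerate(high_only)]

print(worst_error(high_only))  # tiny: the high-only signal survives
print(worst_error(mixed))      # large: low and high content collide
```

The high-only signal comes back to within floating-point error from half the samples, while the mixed signal does not. That matches both sides of the argument: detail can survive decimation when the signal is restricted to one band, but a picture with both low and high frequencies cannot be squeezed into half the samples.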

*.mp4 guy
27th June 2009, 12:08
What you are saying is correct, but it will only allow you to keep the total number of stored pixels (low resolution plus high-frequency difference) equal to the total number of input pixels. You cannot reduce the number of stored pixels without losing information (unless you store that information somewhere else, i.e. by upping the bit depth or something). Also, without careful construction, such a system will still lose information due to the limits of our frequency filtering ability; the only way around this is to use a carefully constructed mathematical system such as the one used in lossless j2k.

Sharktooth
27th June 2009, 14:46
exactly. so, in conclusion, reducing the resolution will IN ANY CASE kill details. you may speak about frequencies and all... but you can't keep the pixel if it gets deleted.
an example (black and white, 0 = black, 1 = white):
0 0 0 0
0 1 0 0   resize   0 0
0 0 0 0   ----->   0 0
0 0 0 0
as you can see a 4x4 image gets resized to a 2x2 image.
once you resize you get all zeroes (black) and the white pixel is lost.
depending on the resizer you may also get a result like this:
1 0
0 0
that is even worse since it will introduce a bigger error...
However, focusing on the first case, there is nothing in heaven and hell that can reconstruct that white pixel and when you upscale the image you will get this:
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
in the second case you will get:
1 1 0 0
1 1 0 0
0 0 0 0
0 0 0 0
as you can see there is no way back to the original image... and none of the two cases is an acceptable result.
using a higher bitdepth the results are similar. you lose color and/or luminance info of that pixel...
PRINT IT IN YOUR MIND: UPSCALING IS USELESS.
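The 4x4 example above can be reproduced with two toy resizers. This is a minimal sketch under stated assumptions: `box_downscale` averages each 2x2 block and thresholds at one half, `point_downscale` simply picks one pixel per block, and `nearest_upscale` repeats pixels. These helper names and behaviours are illustrative and do not correspond to any specific resizer.

```python
def box_downscale(img):
    """Halve each dimension: average every 2x2 block, threshold to 0 or 1."""
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[0]), 2):
            total = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            row.append(1 if total >= 2 else 0)   # mean of at least 0.5 becomes white
        out.append(row)
    return out

def point_downscale(img, dy, dx):
    """Halve each dimension by keeping the pixel at offset (dy, dx) of each block."""
    return [[img[y + dy][x + dx] for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def nearest_upscale(img):
    """Double each dimension by repeating every pixel as a 2x2 block."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

first_case = box_downscale(image)            # the white pixel averages away
second_case = point_downscale(image, 1, 1)   # the white pixel happens to be sampled

for small in (first_case, second_case):
    print(small, nearest_upscale(small))
```

Neither upscaled result equals the original 4x4 image, which is the point being made: the small picture on its own does not carry enough information to put the white pixel back where it was.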

*.mp4 guy
27th June 2009, 16:00
It is provable that the highest frequencies of a given array can be stored in fewer sample points than the total number of sample points in the array; this is one of the operative principles at work in linear transform coding. You can, for example, take a 720*480 array and losslessly separate it into 4 360*240 arrays, 3 of them being representations of high frequencies and one of them being the equivalent of a properly lowpassed and downscaled image. In this case the high frequencies of the array are stored in less space than the total input array, but no compression has yet been achieved: the total output data has only been reorganized, and still takes the same amount of space as the input (likely more before entropy coding, and less after, but these are implementation considerations).
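The four-way lossless split can be sketched with a reversible integer Haar-style transform applied to each 2x2 block. This is a swapped-in illustration, not the filter used by lossless JPEG 2000: the low band here is a plain block average (the simplest possible low-pass), and `fwd_pair`, `inv_pair`, `split` and `merge` are hypothetical helper names.

```python
def fwd_pair(a, b):
    """Reversible integer step: floor average and difference of two samples."""
    return (a + b) >> 1, a - b

def inv_pair(s, d):
    """Exact inverse of fwd_pair."""
    a = s + ((d + 1) >> 1)
    return a, a - d

def split(img):
    """Split an image with even dimensions into four half-size bands."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0] * w for _ in range(h)]
    lh = [[0] * w for _ in range(h)]
    hl = [[0] * w for _ in range(h)]
    hh = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            top_s, top_d = fwd_pair(img[2 * y][2 * x], img[2 * y][2 * x + 1])
            bot_s, bot_d = fwd_pair(img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1])
            ll[y][x], lh[y][x] = fwd_pair(top_s, bot_s)   # low band, vertical detail
            hl[y][x], hh[y][x] = fwd_pair(top_d, bot_d)   # horizontal, diagonal detail
    return ll, lh, hl, hh

def merge(ll, lh, hl, hh):
    """Rebuild the full-size image from the four bands."""
    h, w = len(ll), len(ll[0])
    img = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            top_s, bot_s = inv_pair(ll[y][x], lh[y][x])
            top_d, bot_d = inv_pair(hl[y][x], hh[y][x])
            img[2 * y][2 * x], img[2 * y][2 * x + 1] = inv_pair(top_s, top_d)
            img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1] = inv_pair(bot_s, bot_d)
    return img

# The single-white-pixel picture from earlier in the thread.
image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
bands = split(image)
print(bands[0])                  # low band: the small picture
print(merge(*bands) == image)    # all four bands together are lossless
```

In this sketch the low band of the single-pixel picture is all black, just like the downscaled result earlier in the thread, and the white pixel lives entirely in the detail bands. Reconstruction is exact only because all four bands are kept, so the sample count is unchanged, and the detail bands need a wider signed range than the input.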

However this debate is not really relevant to the thread.

Sharktooth
27th June 2009, 16:12
yup, but that supposes you're storing that info in a different way. the original idea is to encode the resized difference between original and encoded file.
that won't work because, first, it's downsized (see the post above) and, second, it's not a good way to store that info. IMHO that info can be processed and stored more efficiently by the encoder just by lowering the quantizer or using adaptive quantization or other techniques.