Old 11th October 2014, 22:27   #1  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
lossless encoding and file size

Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
Old 12th October 2014, 02:04   #2  |  Link
ChiDragon
Registered User
 
Join Date: Sep 2005
Location: Vancouver
Posts: 600
Quote:
Originally Posted by tarasdi View Post
I read on this thread that a lossless codec will always result in a larger file size.
... when the source is a lossy-encoded input.

It's because your source is already highly compressed. To recompress it with a lossless codec, it first needs to be decompressed back into raw video, which is also what happens when you play the file back.

Small lossy video file -> enormous uncompressed video -> giant lossless video file
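
To put rough numbers on that chain (all assumptions for illustration: 5 minutes of 1080p, 8-bit 4:2:0, 24 fps, a ~10 Mbit/s lossy encode, and a ~2:1 lossless codec ratio), a quick back-of-the-envelope sketch in Python:

Code:
# Back-of-the-envelope sizes; every number here is an illustrative assumption.
width, height   = 1920, 1080
bytes_per_pixel = 1.5                       # 8-bit 4:2:0
fps, seconds    = 24, 5 * 60

raw_bytes      = width * height * bytes_per_pixel * fps * seconds
lossy_bytes    = 10_000_000 / 8 * seconds   # assumed ~10 Mbit/s lossy encode
lossless_bytes = raw_bytes / 2              # assumed ~2:1 lossless compression

for name, size in (("lossy", lossy_bytes), ("raw", raw_bytes), ("lossless", lossless_bytes)):
    print(f"{name:9s} ~{size / 1e9:.1f} GB")
# prints roughly: lossy ~0.4 GB, raw ~22.4 GB, lossless ~11.2 GB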
Old 23rd October 2014, 12:09   #3  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
I had to think a bit about this.

I thought that lossy codecs achieved greater compression because they removed information from the source. Decompressing a lossy file does not 'add' that information back, so couldn't a lossless compression of video containing only 'x' bits of information potentially produce a file about as small as the lossy file that also holds 'x' bits of information?

(I agree that given the same source file, a lossy codec should produce a smaller file than a lossless one, but that's not exactly what is happening here.)
Old 23rd October 2014, 22:01   #4  |  Link
raffriff42
Retried Guesser
 
Join Date: Jun 2012
Posts: 1,373
Learn 2 entropy.
Old 23rd October 2014, 22:21   #5  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,407
Lossy codecs do achieve their greater compression efficiency by removing information from the source, but it isn't quite as simple as that either.

The decompression step for video generates exactly as much new information as was removed during the lossy compression. The goal is to make the generated image look as much like the original (to us) as possible while needing as little stored information as possible.

Lossy video compression builds a mathematical model that generates a video frame. None of the actual raw video data is stored, only coefficients for equations which, when decoded, generate a similar-looking image.

It is like the difference between saying 'red if X² + Y² = 100²' and saying 'pixel 1 = black, pixel 2 = black ... pixel 50 = red, pixel 51 = black', and so on across 10,000 pixels, to describe a red circle of 100-pixel radius. Except the equations only approximate the image; they do not capture any part of it exactly.
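
A toy version of that circle example in Python (scaled to a 100x100 grid, i.e. still 10,000 pixels, with an assumed 50-pixel radius so the disc fits), just to show how few numbers the 'model' description needs compared with listing every pixel; a real codec stores transform coefficients and motion vectors rather than circle equations, and unlike this toy it only approximates the picture:

Code:
import numpy as np

SIZE, RADIUS = 100, 50                      # 100x100 grid = 10,000 pixels

yy, xx = np.mgrid[0:SIZE, 0:SIZE]
# "Raw" description: one value per pixel (10,000 values).
raster = (xx - SIZE // 2) ** 2 + (yy - SIZE // 2) ** 2 <= RADIUS ** 2

# "Model" description: three numbers regenerate exactly the same disc.
cx, cy, r   = SIZE // 2, SIZE // 2, RADIUS
regenerated = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2

assert (raster == regenerated).all()
print(raster.size, "pixel values vs 3 model parameters")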
Old 24th October 2014, 08:08   #6  |  Link
fvisagie
Registered User
 
Join Date: Aug 2008
Location: Isle of Man
Posts: 588
Quote:
Originally Posted by tarasdi View Post
Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
In addition to the answers already provided, note that in the first case (losslessly re-encoding a lossy video) the new losslessly compressed file is being compared to a lossily compressed one, while in the second case (WinZip) the new losslessly compressed file is being compared to the original, uncompressed file.
Old 20th December 2014, 12:43   #7  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
raffriff42, what part of entropy specifically?

Asmodian, I don't think that decompressing would generate as much new information (otherwise the codec would be lossless).

Where I think I went wrong is that I was starting from the idea of losslessly compressing the _already compressed_ file.

After reading a bit about entropy, I actually think it would be possible to losslessly compress an already lossy-compressed file, as I doubt that a lossy compression would reach the theoretical maximum of information per bit given by Shannon's theorem.

However, I don't think that using a lossless codec on this compressed file (if this could even be done - transcoding involves decompressing the lossy file first to 'raw' format) would work. The lossless codec is designed to work with the properties of a 'raw' video input... trying to apply the same codec to a lossy-compressed video wouldn't give good results, as the characteristics of the data (the lossy-compressed video) are different from what the lossless codec was designed for.
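
You can see the general-purpose version of both points with zlib (a sketch; exact byte counts will vary): redundant data shrinks a lot, but running the same compressor again over its own output gains essentially nothing, because the first pass already removed the redundancy it knows how to find.

Code:
import zlib

text  = b"the quick brown fox jumps over the lazy dog " * 1000

once  = zlib.compress(text, 9)
twice = zlib.compress(once, 9)

print(len(text), "->", len(once), "->", len(twice))
# the second pass is no smaller (usually a few bytes larger, from header overhead)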
Old 21st December 2014, 04:24   #8  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,407
Quote:
Originally Posted by tarasdi View Post
Asmodian, I don't think that decompressing would generate as much new information (otherwise the codec would be lossless).
It generates the same amount of information; the information has simply been changed from the source. Lossless means there were mathematically no changes at all. Lossy codecs simply generate video data that is similar enough that we don't mind too much what was changed.

It is very easy to know how much information raw video data contains: it is simply the number of pixels, the color depth, and the frame rate.

Quote:
Originally Posted by tarasdi View Post
However, I don't think that using a lossless codec on this compressed file (if this could even be done - transcoding involves decompressing the lossy file first to 'raw' format) would work. The lossless codec is designed to work with the properties of a 'raw' video input... trying to apply the same codec to a lossy-compressed video wouldn't give good results, as the characteristics of the data (the lossy-compressed video) are different from what the lossless codec was designed for.
This is a very good way to understand it (but you are correct that without the decompression step the data format is incompatible with any lossless codec). Also, lossy video usually uses some of the same lossless information-coding techniques (e.g. CABAC) alongside the lossy ones, so trying to further compress it using the same methodology doesn't gain anything. Like RARing a zip file.
Old 21st December 2014, 12:37   #9  |  Link
foxyshadis
Angel of Night
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
Quote:
Originally Posted by tarasdi View Post
After reading a bit about entropy, I actually think it would be possible to losslessly compress an already lossy-compressed file, as I doubt that a lossy compression would reach the theoretical maximum of information per bit given by Shannon's theorem.
It certainly is possible -- JPEG re-compressors like StuffIt and WinZip 12 remove the Huffman coding, re-organize the data, and use prediction and better coding to remove redundancy and shrink it significantly (20-40%). You can do something similar with older MPEG standards (up to MPEG-4 Part 2, aka DivX/Xvid), but each generation gets a little better, so there's a little less to improve. I'm not sure if it would even be possible to losslessly improve H.264 with CABAC. Maybe through the introduction of HEVC's wavefront, which can actually sometimes reduce size by 1% or so, and playing with CABAC contexts. Maybe you could do something really exotic with long-term references.
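
A deliberately extreme toy example of why swapping Huffman coding for an arithmetic-style coder can win (a made-up binary source, not a JPEG measurement): Huffman cannot spend less than one whole bit per symbol, while the Shannon entropy of a heavily skewed source can be far below that, and arithmetic/range coders such as CABAC can get close to it.

Code:
import math

p = 0.9                      # assumed probability of the common symbol
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(f"entropy : {entropy:.3f} bits/symbol")             # ~0.469
print("huffman : 1.000 bits/symbol (minimum one bit per symbol)")
print(f"overhead: {(1 / entropy - 1) * 100:.0f}%")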

Compatibility is so much more important that most of these techniques never get any traction, though; people just update to newer standards as they appear.
Old 24th December 2014, 12:17   #10  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by Asmodian View Post
It is very easy to know how much information raw video data is. Simply the number of pixels, the color depth, and the frame rate.
I'd guess this depends on how you define 'information'. If it's pure bytes, then yes, I agree with you. But if you use an entropy-based definition of information, I'd imagine that it would be irrevocably reduced by compression with a lossy codec.

In an extreme case, if the codec 'compressed' the video stream by replacing each pixel with a black pixel, then the information content would be zero (even though the number of pixels is the same as the raw video).
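
That extreme case is easy to demonstrate with any lossless compressor (a sketch; the exact output depends on the tool): a constant all-black frame has essentially no entropy, so it collapses to almost nothing.

Code:
import zlib

frame = bytes(1920 * 1080)     # one all-black 8-bit luma plane, about 2 MB of zeros
print(len(frame), "->", len(zlib.compress(frame, 9)), "bytes")
# roughly two million bytes of pixels shrink to a couple of kilobytes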
Old 1st January 2015, 11:40   #11  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by foxyshadis View Post
I'm not sure if it would even be possible to losslessly improve H.264 with CABAC. Maybe through the introduction of HEVC's wavefront, which can actually sometimes reduce size by 1% or so, and playing with CABAC contexts. Maybe you could do something really exotic with long-term references.
So just to confirm, it's not really possible to further compress, say, H.264 by using more processing power?

My video camera spits out H.264 in real time. The mental picture I have is that the encoding configuration on the video camera gives an acceptable quality-to-file-size ratio, with the constraint that the encoding must be done in *real time*. If the real-time constraint was lifted, then presumably you'd be able to use more processing power to bring down the file size without any loss of quality (kind of like the speed-versus-compression options in a zip utility), up to some maximum compression ratio (bounded by Shannon's theorem).
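
The zip analogy is easy to reproduce with zlib's compression levels (a sketch using made-up stand-in data; real encoder presets trade speed for size in the same spirit): higher levels spend more CPU searching for matches and usually produce a somewhat smaller file, up to a point where extra effort stops helping.

Code:
import random, time, zlib

random.seed(0)
# Stand-in data with some repetitive structure (purely illustrative).
words = [b"motion ", b"vector ", b"block ", b"residual "]
data  = b"".join(random.choice(words) for _ in range(300_000))

for level in (1, 6, 9):
    t0  = time.perf_counter()
    out = zlib.compress(data, level)
    print(f"level {level}: {len(out):>8d} bytes in {time.perf_counter() - t0:.2f} s")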

However, we don't have the original raw data in order to re-run the compression with more processing time.

What might be useful is a codec that takes in already-compressed input and further (losslessly) compresses it, with the end result being the same as what you would have obtained had you thrown more processing power (with the same codec parameters) at the raw data. Not sure if there are any theoretical limits to implementing something like this... Given the continuing fall in the cost of storage, I don't think there's much of an incentive to implement it either. It just irks me that a 5 minute video takes up 450MB of storage...
Old 2nd January 2015, 08:32   #12  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,829
Quote:
Originally Posted by tarasdi View Post
So just to confirm, it's not really possible to further compress, say, H.264 by using more processing power?

My video camera spits out H.264 in real time. The mental picture I have is that the encoding configuration on the video camera gives an acceptable quality-to-file-size ratio, with the constraint that the encoding must be done in *real time*. If the real-time constraint was lifted, then presumably you'd be able to use more processing power to bring down the file size without any loss of quality (kind of like the speed-versus-compression options in a zip utility), up to some maximum compression ratio (bounded by Shannon's theorem).
Wouldn't more processing power enable better compression at a given speed, i.e. "real time"? If the real-time constraint was lifted, couldn't you bring down the file size without any loss of quality using the same processing power?
Old 2nd January 2015, 09:13   #13  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by hello_hello View Post
Wouldn't more processing power enable better compression at a given speed, i.e. "real time"? If the real-time constraint was lifted, couldn't you bring down the file size without any loss of quality using the same processing power?
Yes, but I believe you need the original raw source, not the compressed version (with the tools available...).
Old 5th January 2015, 01:14   #14  |  Link
foxyshadis
Angel of Night
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
Quote:
Originally Posted by tarasdi View Post
What might be useful is a codec that takes in already-compressed input and further (losslessly) compresses it, with the end result being the same as what you would have obtained had you thrown more processing power (with the same codec parameters) at the raw data. Not sure if there are any theoretical limits to implementing something like this... Given the continuing fall in the cost of storage, I don't think there's much of an incentive to implement it either. It just irks me that a 5 minute video takes up 450MB of storage...
By far the biggest sticking point is lossless, because even if you can find a significantly better reference vector for a block, you still have to create a new texture block that has to match exactly when unquantized -- otherwise it's quite likely to end up taking up more total bits than before. The processing power to create that residual is likely to far exceed re-encoding, because it's like breaking a hashed password: You keep guessing and checking until you get a match. Thanks to the in-loop deblocker, you can't just match the block textures either, you have to run through the loop filter in relation to its neighbors too, and since you're also tweaking the neighbors, it gets exponentially more difficult...
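
The password analogy, as a toy guess-and-check loop (nothing codec-specific, just the shape of the search): each extra position multiplies the number of candidates, which is why demanding an exact match, and then re-checking neighbours through the loop filter, blows up so quickly.

Code:
import hashlib
from itertools import product

target = hashlib.md5(b"cat").hexdigest()    # stand-in for the exact result we must reproduce

tried = 0
for guess in product(b"abcdefghijklmnopqrstuvwxyz", repeat=3):
    tried += 1
    if hashlib.md5(bytes(guess)).hexdigest() == target:
        print("match after", tried, "guesses:", bytes(guess))
        break
# 3 positions -> at most 26**3 guesses; every extra position multiplies the work by 26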

Now, you can probably accept near-lossless for blocks that are never referenced in the future, which would help a lot, since being referenced means any tiny deviations start to stack up. To cut size and processing time, maybe you're willing to accept barely-perceptible stacked distortions, at which point you're just re-encoding anyway.

It's a hard problem for very small gain, although not an impossible one. It's more research-paper material than practical, since near-lossless re-encoding is good enough for almost everyone.
Old 5th January 2015, 10:56   #15  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by foxyshadis View Post
By far the biggest sticking point is lossless, because even if you can find a significantly better reference vector for a block, you still have to create a new texture block that has to match exactly when unquantized -- otherwise it's quite likely to end up taking up more total bits than before. The processing power to create that residual is likely to far exceed re-encoding, because it's like breaking a hashed password: You keep guessing and checking until you get a match. Thanks to the in-loop deblocker, you can't just match the block textures either, you have to run through the loop filter in relation to its neighbors too, and since you're also tweaking the neighbors, it gets exponentially more difficult...

Now, you can probably accept near-lossless for blocks that are never referenced in the future, which would help a lot, since being referenced means any tiny deviations start to stack up. To cut size and processing time, maybe you're willing to accept barely-perceptible stacked distortions, at which point you're just re-encoding anyway.

It's a hard problem for very small gain, although not an impossible one. It's more research-paper material than practical, since near-lossless re-encoding is good enough for almost everyone.
Thanks for your response. Clearly your knowledge in this area is much further along than my own, but I think I get the general idea of what your post is saying. As I mentioned, 400MB for a 4-minute video is a bit expensive (even in this age of cheap storage), so I might experiment a bit to see if near-lossless re-encoding can give good results. It sounds from your post like if you're willing to accept a near-imperceptible loss of quality, you can make gains in the amount of compression.
Old 16th January 2015, 13:04   #16  |  Link
movmasty
Registered User
 
Join Date: Feb 2002
Posts: 970
Quote:
Originally Posted by tarasdi View Post
Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
The lossless compression we are talking about here is different from zip compression.

It is not meant for storing files, but for creating intermediate filtered videos to feed to a compressor.

In effect it is a lossless decompression, just smaller than a pure RGB decompression.
Old 16th January 2015, 13:30   #17  |  Link
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,769
Quote:
Originally Posted by tarasdi View Post
I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
Quote:
Originally Posted by ChiDragon View Post
... when the source is a lossy-encoded input.
Actually, yes and no.
Starting from a single source, a lossless codec will always require more space than a lossy one, as the lossy one can achieve its level of compression only by throwing away some of the original information.
With a compressed source, the same applies: the decompressed file will again be compressed less by a lossless codec than by a lossy one. Because a lossless codec does not try to throw away any information, the resulting compressed file may be slightly smaller (compared to the file you would get by losslessly compressing the original) but definitely larger than any of the lossily compressed files.

Therefore in both cases a losslessly compressed file would be larger.
__________________
Born in the USB (not USA)