Doom9's Forum > General > Newbies
Old 11th October 2014, 23:27   #1  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
lossless encoding and file size

Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

WinZip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
tarasdi is offline   Reply With Quote
Old 12th October 2014, 03:04   #2  |  Link
ChiDragon
Registered User
 
ChiDragon's Avatar
 
Join Date: Sep 2005
Location: Vancouver
Posts: 610
Quote:
Originally Posted by tarasdi View Post
I read on this thread that a lossless codec will always result in a larger file size.
... when the source is a lossy-encoded input.

It's because your source is already highly compressed. To recompress it with a lossless codec, it first needs to be decompressed into raw video, which is also what happens when you play back the file.

Small lossy video file -> enormous uncompressed video -> giant lossless video file
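And zip-style tools won't rescue you either. A quick Python sketch, with random bytes standing in for already-compressed data (a toy illustration, not a real video pipeline):

```python
import os
import zlib

# Random bytes stand in for already-compressed data: near-maximum entropy.
raw = os.urandom(100_000)
packed = zlib.compress(raw, 9)

assert zlib.decompress(packed) == raw   # lossless: round-trip is bit-exact
print(len(raw), len(packed))            # packed is not smaller than raw
```

A lossless compressor can never discard data; on input with no redundancy left, it can only add a little container overhead.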
ChiDragon is offline   Reply With Quote
Old 23rd October 2014, 13:09   #3  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
I had to think a bit about this.

I thought that lossy codecs achieved greater compression because they remove information from the source. Decompressing a lossy file does not 'add' information back, so couldn't a lossy compression holding 'x' bits of information potentially be a smaller file than a lossless compression that also holds 'x' bits of information?

(I agree that, given the same source, the lossy codec should produce a smaller file than a lossless codec, but that's not exactly what is happening here.)
tarasdi is offline   Reply With Quote
Old 23rd October 2014, 23:01   #4  |  Link
raffriff42
Retried Guesser
 
raffriff42's Avatar
 
Join Date: Jun 2012
Posts: 1,377
Learn 2 entropy.
raffriff42 is offline   Reply With Quote
Old 23rd October 2014, 23:21   #5  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 3,774
Lossy codecs do achieve their greater compression efficiency by removing information from the source, but it isn't quite as simple as that either.

The decompression step for video generates exactly as much new information as was removed during the lossy compression. The goal is to get the generated image to look as much like the original (to us) as possible while needing as little information as possible.

Lossy video compression builds a mathematical model that generates a video frame. None of the actual raw video data is stored, only matrix coefficients for equations which, when decoded, generate a similar-looking image.

It is like the difference between saying 'red if X^2 + Y^2 <= 100^2' and saying 'pixel 1 = black, pixel 2 = black ... pixel 50 = red, pixel 51 = black', and so on across 10,000 pixels, to describe a red circle of 100 pixel radius. Except the equations only approximate the image; they do not capture any part of it exactly.
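To put rough numbers on that circle example, a toy Python sketch (hypothetical byte counts, nothing like a real codec's bitstream):

```python
# A red circle of radius 100 in a 200x200 RGB frame.
W, H, R = 200, 200, 100
cx, cy = W // 2, H // 2

# The per-pixel description: every sample written out, as raw video would be.
pixels = [
    (255, 0, 0) if (x - cx) ** 2 + (y - cy) ** 2 <= R ** 2 else (0, 0, 0)
    for y in range(H)
    for x in range(W)
]
raw_size = len(pixels) * 3    # 120000 bytes of raw RGB
rule_size = 5                 # centre (2), radius (1), two colour ids (2)
print(raw_size, rule_size)
```

The rule generates the whole frame from a handful of parameters; that is the shape of the saving, even though real codecs store transform coefficients rather than geometry.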
Asmodian is offline   Reply With Quote
Old 24th October 2014, 09:08   #6  |  Link
fvisagie
Registered User
 
Join Date: Aug 2008
Location: Isle of Man
Posts: 588
Quote:
Originally Posted by tarasdi View Post
Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
In addition to the answers already provided, note that in the first case the new losslessly compressed file is being compared to a lossily compressed one; in the second case it is being compared to the original, uncompressed file.
fvisagie is offline   Reply With Quote
Old 20th December 2014, 12:43   #7  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
raffriff42, what part of entropy specifically?

Asmodian, I don't think that decompressing generates as much new information (otherwise the codec would be lossless).

Where I think I have gone wrong in my thinking is that I came from the basis of losslessly compressing the _already compressed_ file.

After reading a bit about entropy, I actually think that it would be possible to losslessly compress an already lossy-compressed file, as I doubt that a lossy compression would reach the theoretical maximum of information per bit given by Shannon's source coding theorem.

However, I don't think that using a lossless codec on this compressed file would work (if it could even be done - transcoding involves decompressing the lossy file to 'raw' format first). A lossless codec is designed around the properties of 'raw' video input... applying the same codec to lossy-compressed video wouldn't give good results, as the characteristics of that data are different from what the lossless codec was designed for.
tarasdi is offline   Reply With Quote
Old 21st December 2014, 04:24   #8  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 3,774
Quote:
Originally Posted by tarasdi View Post
Asmodian, I don't think that decompressing would generate as much new information (otherwise the codec would be lossless).
It generates the same amount of information; the information has simply been changed from the source. Lossless means that, mathematically, there were no changes at all. Lossy codecs simply generate video data similar enough that we don't mind too much what was changed.

It is very easy to know how much information raw video data contains: simply the number of pixels, the color depth, and the frame rate.
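For example, a quick sketch (the 3 bytes per pixel assumes 8-bit RGB; real cameras usually use subsampled YUV, which is smaller):

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    # Raw video size is fixed by resolution, color depth and frame rate.
    return width * height * bytes_per_pixel * fps * seconds

# One minute of 1080p at 30 fps, 8 bits per RGB channel:
size = raw_video_bytes(1920, 1080, 3, 30, 60)
print(size)  # 11197440000 bytes, roughly 11 GB per uncompressed minute
```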

Quote:
Originally Posted by tarasdi View Post
However, I don't think that using a lossless codec on this compressed file (if this could even be done - transcoding involves decompressing the lossy file first to 'raw' format) would work. The lossless codec is designed to work with the properties of a 'raw' video input... trying to apply the same codec to a lossy compressed video wouldn't provide good results as the characteristics of the data (the lossy compressed video) is different to what the lossless codec was designed for.
This is a very good way to understand it (and you are correct that without the decompression step the data format is incompatible with any lossless codec). Also, lossy video usually uses some of the same lossless information-coding techniques along with the lossy ones (e.g. CABAC), so trying to further compress it using the same methodology doesn't gain anything. Like RARing a ZIP file.
Asmodian is offline   Reply With Quote
Old 21st December 2014, 12:37   #9  |  Link
foxyshadis
ангел смерти
 
foxyshadis's Avatar
 
Join Date: Nov 2004
Location: Lost
Posts: 9,436
Quote:
Originally Posted by tarasdi View Post
After reading a bit about entropy, I actually think that it would be possible to losslessly compress an already lossy compressed filed, as I doubt that a lossy compression would reach the theoretical maximum of information per bit as provided by Shannon's theorem.
It certainly is possible -- JPEG re-compressors like StuffIt and WinZip 12 remove the Huffman coding, re-organize the data, and use prediction and better coding to remove redundancy and shrink it significantly (20-40%). You can do something similar with older MPEG standards (up to MPEG-4 Part 2, aka DivX/Xvid), but each generation gets a little better, so there's a little less to improve. I'm not sure it would even be possible to losslessly improve H.264 with CABAC. Maybe through the introduction of HEVC's wavefront, which can actually sometimes reduce size by 1% or so, and by playing with CABAC contexts. Maybe you could do something really exotic with long-term references.

Compatibility is so much more important that most of these techniques never get any traction, though; people just update to newer standards as they appear.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
foxyshadis is offline   Reply With Quote
Old 24th December 2014, 12:17   #10  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by Asmodian View Post
It is very easy to know how much information raw video data contains: simply the number of pixels, the color depth, and the frame rate.
I'd guess this depends on how you define 'information'. If it's pure bytes, then yes, I agree with you. But with an entropy-based definition of information, I'd imagine it is irrevocably reduced by compression with a lossy codec.

In an extreme case, if the codec 'compressed' the video stream by replacing every pixel with a black pixel, the information content would be zero (even though the number of pixels is the same as in the raw video).
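To illustrate with a toy byte-level calculation (just a sketch of Shannon entropy over a byte histogram, not anything a real codec computes):

```python
import math
from collections import Counter

def entropy_bits(data):
    # Shannon entropy of the byte distribution: sum of p * log2(1/p).
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())

all_black = bytes(10000)          # every sample is 0
uniform = bytes(range(256)) * 40  # every byte value equally likely

print(entropy_bits(all_black))  # 0.0 bits/byte: nothing to store
print(entropy_bits(uniform))    # 8.0 bits/byte: incompressible
```

The all-black stream has the same number of "pixels" as the varied one, but zero entropy, matching the extreme case above.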
tarasdi is offline   Reply With Quote
Old 1st January 2015, 11:40   #11  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by foxyshadis View Post
I'm not sure if it would even be possible to losslessly improve H.264 with CABAC. Maybe through the introduction of HEVC's wavefront, which can actually sometimes reduce size by 1% or so, and playing with CABAC contexts. Maybe you could do something really exotic with long-term references.
So just to confirm: it's not really possible to further compress, say, H.264 by using more processing power?

My video camera spits out H.264 in real time. The mental picture I have is that the encoding configuration on the camera gives an acceptable quality-to-file-size ratio, under the constraint that the encoding must be done in *real time*. If the real-time constraint were lifted, then presumably you could use more processing time to bring down the file size without any loss of quality (kind of like the speed/compression options in a zip utility), up to some maximum compression ratio (bounded by Shannon's theorem).
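By the zip analogy I mean something like this (a Python sketch using zlib's effort levels; repetitive text stands in for the video data):

```python
import zlib

# Same lossless algorithm, different effort levels -> different sizes.
data = b"the quick brown fox jumps over the lazy dog " * 5000
fast = zlib.compress(data, 1)   # least effort, like a real-time preset
best = zlib.compress(data, 9)   # most effort, no real-time constraint

assert zlib.decompress(fast) == zlib.decompress(best) == data  # both lossless
print(len(fast), len(best))     # level 9 output is no larger than level 1's
```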

However, we don't have the original raw data in order to re-run the compression with more processing time.

What might be useful is a codec that takes in already-compressed input and further (losslessly) compresses it, with the end result being the same as what you would have obtained had you thrown more processing power (with the same codec parameters) at the raw data. I'm not sure if there are theoretical limits to implementing something like this... Given the continuing fall in the cost of storage, I don't think there's much incentive to implement it either. It just irks me that a 5 minute video takes up 450MB of storage...
tarasdi is offline   Reply With Quote
Old 2nd January 2015, 08:32   #12  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,041
Quote:
Originally Posted by tarasdi View Post
So just to confirm, it's not really possible to further compress say H.264 by using more processing power?

My video camera spits out h.264 in real time. The mental picture I have is that the encoding configuration on the video camera results in an acceptable quality to file size ratio, with the constraint that the encoding must be done in *real time*. If the real time constraint was lifted, then presumably you'd be able to use more processing power to bring down the file size without any loss of quality (kind of like the speed of compression options in the zip utility), up to some maximum compression ratio (bounded by Shannon's theorem).
Wouldn't more processing power enable better compression at a given speed, i.e. "real time"? If the real-time constraint was lifted, couldn't you bring down the file size without any loss of quality using the same processing power?
hello_hello is offline   Reply With Quote
Old 2nd January 2015, 09:13   #13  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by hello_hello View Post
Wouldn't more processing power enable better compression at a given speed, ie "real time"? If the real time constraint was lifted couldn't you bring down the file size without any loss of quality using the same processing power?
Yes, but I believe you need the original raw source, not the compressed version (with the tools available...).
tarasdi is offline   Reply With Quote
Old 5th January 2015, 01:14   #14  |  Link
foxyshadis
ангел смерти
 
foxyshadis's Avatar
 
Join Date: Nov 2004
Location: Lost
Posts: 9,436
Quote:
Originally Posted by tarasdi View Post
What may be useful could be a codec that takes in already compressed input, and further (losslessly) compresses it, with the end result being the same as what you would have obtained had you thrown more processing power (with the same codec parameters) at the raw data. Not sure if there's any theoretical limits to implementing something like this... Given the continuing fall in cost of storage, I don't think there's much of an incentive to implement either. It just irks me that a 5 minute video takes up 450MB of storage...
By far the biggest sticking point is lossless, because even if you can find a significantly better reference vector for a block, you still have to create a new texture block that has to match exactly when unquantized -- otherwise it's quite likely to end up taking up more total bits than before. The processing power to create that residual is likely to far exceed re-encoding, because it's like breaking a hashed password: You keep guessing and checking until you get a match. Thanks to the in-loop deblocker, you can't just match the block textures either, you have to run through the loop filter in relation to its neighbors too, and since you're also tweaking the neighbors, it gets exponentially more difficult...

Now, you can probably accept near-lossless for blocks that are never referenced in the future, which would help a lot, since being referenced means any tiny deviations start to stack up. To cut size and processing time, maybe you're willing to accept barely-perceptible stacked distortions, at which point you're just re-encoding anyway.

It's a hard problem for very small gain, although not an impossible one. It's more research-paper material than practical, since near-lossless re-encoding is good enough for almost everyone.
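The guess-and-check cost has the same shape as a brute-force preimage search; a miniature Python sketch (two unknown bytes, so it finishes quickly, but the candidate count grows as 256^n):

```python
import hashlib
from itertools import product

# Guess-and-check until the hash matches: cost grows exponentially
# with the number of unknown bytes (256**n candidates).
target = hashlib.sha256(b"ok").digest()
match = next(
    bytes(cand)
    for cand in product(range(256), repeat=2)
    if hashlib.sha256(bytes(cand)).digest() == target
)
print(match)  # b'ok'
```

Finding a residual that reproduces a block exactly is the same kind of search, which is why it can cost far more than simply re-encoding.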
foxyshadis is offline   Reply With Quote
Old 5th January 2015, 10:56   #15  |  Link
tarasdi
Registered User
 
Join Date: Oct 2014
Posts: 10
Quote:
Originally Posted by foxyshadis View Post
By far the biggest sticking point is lossless, because even if you can find a significantly better reference vector for a block, you still have to create a new texture block that has to match exactly when unquantized -- otherwise it's quite likely to end up taking up more total bits than before. The processing power to create that residual is likely to far exceed re-encoding, because it's like breaking a hashed password: You keep guessing and checking until you get a match. Thanks to the in-loop deblocker, you can't just match the block textures either, you have to run through the loop filter in relation to its neighbors too, and since you're also tweaking the neighbors, it gets exponentially more difficult...

Now, you can probably accept near-lossless for blocks that are never referenced in the future, which would help a lot, since being referenced means any tiny deviations start to stack up. To cut size and processing time, maybe you're willing to accept barely-perceptible stacked distortions, at which point you're just re-encoding anyway.

It's a hard problem for very small gain, although not an impossible one. It's more research-paper material than practical, since near-lossless re-encoding is good enough for almost everyone.
Thanks for your response. Clearly your knowledge in this area is much further along than my own, but I think I get the general idea of your post. As I mentioned, 450MB for a 5 minute video is a bit expensive (even in this age of cheap storage), so I might experiment a bit to see if near-lossless re-encoding can give good results. It sounds from your post like you can make real compression gains if you're willing to accept a near-imperceptible loss of quality.
tarasdi is offline   Reply With Quote
Old 16th January 2015, 13:04   #16  |  Link
movmasty
Registered User
 
Join Date: Feb 2002
Posts: 970
Quote:
Originally Posted by tarasdi View Post
Hi,

I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
The lossless compression we are talking about here is different from zip compression.

It isn't meant for storing files, but for creating intermediate filtered videos to feed to a compressor.

In effect it is a lossless form of the decompressed video, just smaller than pure RGB.
movmasty is offline   Reply With Quote
Old 16th January 2015, 13:30   #17  |  Link
Ghitulescu
Registered User
 
Ghitulescu's Avatar
 
Join Date: Mar 2009
Location: Germany
Posts: 5,635
Quote:
Originally Posted by tarasdi View Post
I read on this thread that a lossless codec will always result in a larger file size.

I don't understand this statement. The purpose of a codec is to compress data - how would throwing computing power at a compression problem _worsen_ the situation?

Winzip achieves lossless compression, and I don't think I've ever seen it produce a larger file size.
Quote:
Originally Posted by ChiDragon View Post
... when the source is a lossy-encoded input.
Actually, yes and no.
Starting from a single source, a lossless codec will always require more space than a lossy one, as the lossy one achieves its level of compression only by throwing away some of the original information.
With a compressed source, the same applies. Decompressing the file yields input that a lossless codec will again compress less than a lossy one would. Because a lossless codec does not look to throw away any information, the resulting file may be slightly smaller (compared to what you would get by compressing the original file) but definitely larger than any lossy-compressed file.

Therefore, in both cases the losslessly compressed file will be larger.
__________________
Born in the USB (not USA)
Ghitulescu is offline   Reply With Quote