View Full Version : Development resumed
General Lee D. Mented
12th September 2004, 20:43
I've resumed development on WARP if anyone's interested.
I managed to get things working for RGB at least, though there's still alot of bugs. Virtualdub 1.6 doesn't want to work with it at all, I'll try and trace through that soon. Earlier versions seem to work ok, and it plays back in media players.
Lossy mode is still broken so don't use non-zero bitrates, though if you do it won't crash horribly or anything, it just won't give you anything interesting for output. I'll work on that after I debug what's going on with vdub 1.6 and colorspaces.
I could really use some help with this thing. I'll try and update the CVS and put out a new binary later tonight if I can.
virus
12th September 2004, 21:08
just 3 or 4 days ago, I dug up the original thread in which you described the algorithm itself, thinking "what happened to GLDM and his project? haven't heard anything recently from him". I'm happy to read you again and I'd also like to try some binary (don't have VC to compile it myself).
I read that you were depressed by the lossless results... are they so bad? You said in your video that you already had some idea to improve it after you've made it work flawlessly (example: a better wavelet kernel IIRC). I really hope you don't give up on it, it's a brand-new project and such new stuff needs a lot of refinement and improvements before being competitive. You seem a guy so full of fresh ideas... did you worked out some new features to improve it? What about the entropy coder you were designing from scratch?
virus
General Lee D. Mented
13th September 2004, 05:05
Well the lossless results haven't really come up with anything much better than huffyuv yet, which is depressing. Maybe with the right entropy coder and YUV colorspace that'll change a bit. I'm also having some kind of weird arithmetic overflow problem that shouldn't happen, and it's causing bright flecks to appear in black areas of the video where the value has flipped from negative to positive in the transform. I've been over the code and the math a dozen times and there shouldn't be a way for this to happen, even accounting for roundoff errors when converting back to 8bit data. If I use 16bit or 32bit mode the problem goes away, so I'm wondering if I've got some kind of incorrectly written rounding somewhere. I need some serious debugging help on this one.
As for the entropy coder, the new algorithm is an exhaustive substring bittree, which is then used to build a probability table and a range coded output string. Unlike a normal range or arithmetic coder, the table needs to be stored, which will hamper compression versus adaptive compressors that generate their data on the fly. On the plus side, this algorithm doesn't predict the probability of upcoming values, it KNOWS the probability of every possible string, via the gigantic tree which will bring systems crashing to their knees with memory overhead.
Basicly encode side will take eternity as the system computes the number of occurances of every possible sequence of bits in the set of data. Somewhere on the order of 16*(n(n-1)/2) bytes of data, where n is the number of bits you're compressing. So 1KB would be 8192 bits, which works out to 536,805,376 bytes of working space in memory. In a 64bit system, double that if you assume all node values are 64bit vars. That's the worst case, it probably won't be quite that bad but no way to tell where it will fall yet. You computer science theory people out there will notice this is an O(N^2) algorithm, which is considered A Bad Thing(TM). But you'll see why I want to put up with it.
After we get this really memory expensive tree, we know from the values in the nodes how many times each possible string of bits repeats. Given that and their length (which is the height in the tree at any point), we know how many times we could substitute a symbol for that sequence of bits. This isn't a guess like in other compressors, it's a known exact quantity. We can then prioritize the largest strings of bits and highest frequencies, and put every substitution we deem "worth it" into a table with frequency*length as a weight. As we do this, removing and decrementing the nodes in the tree for that sequence removes it and updates our model. Once we run out of worthwile values, we collapse the tree into counts in the two level 1 branches (single bit 1 and single bit 0) which are always manditory to store. Then we've got this table of strings of bits, their lengths, and their KNOWN repeat counts in the data. This will get stored for transmission (possibly packed some way but I'm not sure on that yet). Then all we have to do is take our table, and plug it into the data as each sequence comes up. We can find the least number of bits to represent this by a range coder (similar to an arithmetic coder) where our probability model is reversed. Instead of building it from the data, we dismantle it as we go along removing sequences of bits until it's empty. Then the resulting string gets packed along with the table as our final output. So we don't actually have a symbol table per say, we have a range table where sequences of bits are aligned to given probability ranges. We know these probability ranges exactly because we know their exact repeat count over the size of the set, so the probability of any string being found at any one point in the data is the current remaining repeat count / the total of all the string repeat counts. Anyone who's written a range coder or arithmetic coder is probably having a fit right now and will insist either I can't do this, it's impossible, it's ridiculous, or whatever. We'll see.
Decode side, this has some significant advantages. We'll be taking a pretty significant compression hit by storing that table, which most modern entropy coders don't do. However, I believe that having a "perfect" probability model will give better results than building a predicter from data on the fly. To decode things, we just take out the symbol table and the range coded string. Then feed the string in and use it to select the order to insert the strings in the table. Anyone who's actually paying attention this far will proably have come to the realization that decode is a simple process of building a string from a lookup table. You just read the range code values, find out what sequence you fall on in the table, paste it inline into the data buffer. Then decrement the repeat count, which narrows its probability in the model automaticly. If we put the highest frequencies at the end of the table, we don't even need to re-shift most of the values most of the time. So decompression should be insanely fast, probably even faster than simple algorithms like zip or lzo. As for memory use, well if we assume we will actually compress instead of expand, both the table and the range string must be smaller combined than the original data. So, aside from a buffer the size of the original 8192 bit string, we'd need a maximum of 8192 bits for the table, and a maximum of 8192 bits for the range codes. Therefore the maximum memory overhead is 2n, or in this case 2KB.
The main challenge in building this entropy coder will probably be figuring out how many bits need to repeat how many times to be "worth" a table entry. Consider that we have to factor in the size of the table itself as reducing compression. There's probably some kind of heuristic algorithm to do the balancing but I haven't really sat down and thought on it yet. Some input would be appreciated on that. Also, to really make things efficient, we can't be storing 32bit ints as frequency counts in the table, so there probably needs to be some way to dynamicly adjust the size of those values to the minimum possible for storage. That I'm really gonna need help on or it'll be ugly as hell.
*.mp4 guy
13th September 2004, 05:50
Uuuuuhhhhhhmmmmmmmmmm...
Well most of this is beyond me :D
From what i understand tho your entropy encoder seems pretty amazing, I really dont care how time consuming it is. Also cant wait to test the lossless mode with it. oh and just something im sure youve thought of but ill say it anyway: the size of the table should take into account the size of the output as a bigger output will be able to use a bigger table...very obvious i know:D
Anyway keep up the good work and excuse my un-understanding:confused:
Asmodeus
13th September 2004, 09:26
Great to hear that you keep up with this project :D
I'm no programer of any kind, so I couldn't help in any way. I was unsatisfied when there was no news for long time about WARP, so don't keep us with tension once again ;)
If you need testers for this... :sly:
virus
13th September 2004, 14:52
@GLDM
just some random thoughts on your ideas :)
First of all: are you sure that the problem for lossless was the entropy coder itself? And why? IIRC the transform used was user-configurable, have you tried using something like 32x32x1 (that is, disabling any temporal compression) and compare it with HuffYUV/FFV1 to see what happens? Your idea was originally: "I will find a bunch of string of zeros, which are very compressable". This is _not_ the case in lossless, since you cannot discard anything to make it zero (especially high frequencies).
Are you sure the entropy coder doesn't simply go astray in such situations (either the old and the new one)? MPEG coupled RLE with Huffman to work well at all rates, because "pure" RLE sucks at high rates (up to and including lossless). So it may be more a problem of choice than of "LZ is good but I need more". Maybe it won't simply work well for high rates, no matter how much effort you put into it.
About the "perfect model" vs "building on the fly": well, such a huge model is useless if you don't compute it _well_ (how do you plan to "compute" the values? training from real-life data would not be better?). The more an entropy coder is efficient, the more it is sensitive to the probability model used. If you assign sub-optimal probabilities, performance-wise it will b0rk badly. Are you sure you can evaluate _precisely_ a model for all these entries? I doubt so. Also, this way you're forced to work with first order stats and it's no good imho (higher order probabilities - via context modeling - would be better, if you can find a way to introduce them in your scheme).
Also: are you sure all the memory/computational requirements are worth it? Personally I would have started by replacing LZ with something _simpler_, just to evaluate how much the loss actually is and see the real impact of the coder on your overall scheme, before adding a very difficult task to a still unfinished codec.
Last note: when you have a frequency count table holding values that may overflow, a well-established trick is to halve all the entries in the table when one on them reaches the limit. An even better one is to halve only the entry which overflows. This apparently b0rks the stats, but in fact this may be acceptable, depending on your distributions.
General Lee D. Mented
13th September 2004, 23:44
Originally posted by virus
@GLDM
just some random thoughts on your ideas :)
First of all: are you sure that the problem for lossless was the entropy coder itself? And why? IIRC the transform used was user-configurable, have you tried using something like 32x32x1 (that is, disabling any temporal compression) and compare it with HuffYUV/FFV1 to see what happens? Your idea was originally: "I will find a bunch of string of zeros, which are very compressable". This is _not_ the case in lossless, since you cannot discard anything to make it zero (especially high frequencies).
Are you sure the entropy coder doesn't simply go astray in such situations (either the old and the new one)? MPEG coupled RLE with Huffman to work well at all rates, because "pure" RLE sucks at high rates (up to and including lossless). So it may be more a problem of choice than of "LZ is good but I need more". Maybe it won't simply work well for high rates, no matter how much effort you put into it.
About the "perfect model" vs "building on the fly": well, such a huge model is useless if you don't compute it _well_ (how do you plan to "compute" the values? training from real-life data would not be better?). The more an entropy coder is efficient, the more it is sensitive to the probability model used. If you assign sub-optimal probabilities, performance-wise it will b0rk badly. Are you sure you can evaluate _precisely_ a model for all these entries? I doubt so. Also, this way you're forced to work with first order stats and it's no good imho (higher order probabilities - via context modeling - would be better, if you can find a way to introduce them in your scheme).
Also: are you sure all the memory/computational requirements are worth it? Personally I would have started by replacing LZ with something _simpler_, just to evaluate how much the loss actually is and see the real impact of the coder on your overall scheme, before adding a very difficult task to a still unfinished codec.
Last note: when you have a frequency count table holding values that may overflow, a well-established trick is to halve all the entries in the table when one on them reaches the limit. An even better one is to halve only the entry which overflows. This apparently b0rks the stats, but in fact this may be acceptable, depending on your distributions.
Well I haven't tried 32x32x1, maybe that's worth looking into.
As for the model, what are you talking about by training? The model is constructed from whatever data is being compressed. As for computing it well, it's computed completely. Every possible substring of the data is mapped to a tree and then we have the real probability of all of them. There's no approximation like a standard statistical prediction model. We're not reading in some data and then guessing what's coming next. We're reading ALL the data, and analyizing every possible subset of it, and then noting how many times that repeats over the entire set of data. There's no "sub optimal probabilities" because all probabilities are known, not guesses from some kind of statistics. There's no need for any kind of context because we're not trying to do prediction.
As for complexity, I think it will be worth it. The downside of most intensive entropy coders like burrows wheeler (bzip) are the symmetric performance on decompression. It's slow on both sides. Prediction table compressors like LZH range from about the same to significantly faster on compress side, and are typically much faster on decompression. In my sytem, the compress side will be extremely slow and intensive. However the decompress side will be extremely fast and light. Since people watch video more often than they encode it (most DVDs are done once and then pressed into millions) it's really not relevant how slow the encode side is, and as such I have not prioritized it. I don't care about realtime compression because it's too much of a headache and probably not worth it. I care about best efficiency, and if that means servers doing month long encodes I really don't care, as long as the compression is significantly better, which is what I'm hoping for.
I'm not sure what you're talking about with frequency count tables that can overflow. My table can't overflow because its size isn't determined until after we have all the data in the tree. The table is built by analysis of the frequencies in the string tree, and once we have that we encode the data set as a range code mapped to that table. The frequencies in the table DECREASE as we go along the data. On decompress side, we have the table to start with so we just remap the range code to the table and again it decreases as you go along. When you've decompressed all the data, the table is empty and the range code string ends.
virus
14th September 2004, 00:16
Originally posted by General Lee D. Mented
As for the model, what are you talking about by training? The model is constructed from whatever data is being compressed. As for computing it well, it's computed completely. Every possible substring of the data is mapped to a tree and then we have the real probability of all of them.
ok, so you plan to read in data, and analyze and store every subset of it. you're replacing a statistical model with a straightforward, brute force frequency count, no? are you sure you cannot reduce this huge amount of data to a simpler model (again)? the losses will be probably minimal I guess. but maybe the memory/computational savings will be huge.
Also, what happens if you change the size of your transform? Example: raising it to something like 128x128x32, will it make your coder explode or maybe there's no relation between the transform size and the entropy coder itself?
There's no need for any kind of context because we're not trying to do prediction.
a context is used for entropy coding (modeling high order correlation between the data), not for prediction... if your codec does not work well, it may be due to the lack of MC which prevent you from removing all the temporal correlation. In such cases context-based entropy coding _can_ improve things. Anyway, that's not important.
I'm not sure what you're talking about with frequency count tables that can overflow. My table can't overflow because its size isn't determined until after we have all the data in the tree.
well, then you know the size of every entry in the table, so where is your problem with storing it? if 32 bits ints are too much, just use a VLC code then. (but maybe I didn't understand what you problem was at all)
Anyway: I've tried d/l some stuff from your old links and it dates back February. Do you have an updated package with the source and all the needed binaries (and some instructions to install them manually, too, since I don't think you have an installer ready right now ;))?
How much RAM do I need to make this work? Maybe you've got the good old version with LZMA and maybe I can also configure the dictionary size to be something smaller than 256 MB? (please say "yes" :))
*.mp4 guy
14th September 2004, 00:39
I would also like to help test if possible, and i only have 512mb of ram.
General Lee D. Mented
14th September 2004, 03:52
Originally posted by virus
ok, so you plan to read in data, and analyze and store every subset of it. you're replacing a statistical model with a straightforward, brute force frequency count, no? are you sure you cannot reduce this huge amount of data to a simpler model (again)? the losses will be probably minimal I guess. but maybe the memory/computational savings will be huge.
Also, what happens if you change the size of your transform? Example: raising it to something like 128x128x32, will it make your coder explode or maybe there's no relation between the transform size and the entropy coder itself?
You're right, it's an absolute brute force approach. If there's a way of doing something like an N^2->Nlog(N) optimization, I'd love to hear it. I can't really think of any kind of major shortcut for it.
As for transform size, it should work fine, provided you don't exceed 2^31 samples or wind up allocating more memory than the system can adress. If I port to 64bit, the transform size is unlikely to matter. Larger transforms, especially in temporal dimension, should produce more compression efficiency at the cost of speed and memory use. Also, I don't have any exception handling in yet, so it will blow up without warning if you blow the 4GB limit.
a context is used for entropy coding (modeling high order correlation between the data), not for prediction... if your codec does not work well, it may be due to the lack of MC which prevent you from removing all the temporal correlation. In such cases context-based entropy coding _can_ improve things. Anyway, that's not important.
I've got a MC design for later use. It will not be block based like mpeg, and will likely be another brute force on encode only design.
well, then you know the size of every entry in the table, so where is your problem with storing it? if 32 bits ints are too much, just use a VLC code then. (but maybe I didn't understand what you problem was at all)
The problem is, when I figure out how many entries are going to be in the table and what size they are, I have to weigh this against the fact that I'll have to store this table along with the string. So a huge table may not be the best option. A small one may not compress enough. It'll take some thinking and/or testing.
Anyway: I've tried d/l some stuff from your old links and it dates back February. Do you have an updated package with the source and all the needed binaries (and some instructions to install them manually, too, since I don't think you have an installer ready right now ;))?
How much RAM do I need to make this work? Maybe you've got the good old version with LZMA and maybe I can also configure the dictionary size to be something smaller than 256 MB? (please say "yes" :))
I'm working on figuring out how CVS works again so I can update it. If you want, I can email you a file with binaries and sources. Right now there isn't any entropy coder in the program, I've been relying on external ones. Basicly encoding the transformed and RLE'd AVI, and then running 7zip or bzip or whatever on it, with different parameters.
callmeace
14th September 2004, 06:17
Hello all :)
I am not a programmer so I didn't understand a lot of that, are you guys working on a lossless video codec, such as an alternative to Huffyuv?
If so, then as a user I would value an end smaller video file size over speed and resource requirement during capture (CPU and memory usage). I am quiet keen to archive my videos in a lossless digital format, and with Huffyuv I get like 30GB+ per hour of 720x576 25fps PAL video. I know that storage media with greater capacity like Blu-Ray are coming out, but at the moment I am forced to encode to lossy which I don't want to keep doing, so anyway, I am really looking out for a new Lossless video capture codec that can achive compression of about <24GB per hour of 720x576 25fps. I can always upgrade my hardware if needed as it would be worth it to me.
:)
virus
14th September 2004, 13:02
Originally posted by General Lee D. Mented
You're right, it's an absolute brute force approach. If there's a way of doing something like an N^2->Nlog(N) optimization, I'd love to hear it. I can't really think of any kind of major shortcut for it.
It depends on how much the improvement will be compared to LZMA, but if it's not-so-high you may try to encode a bunch of videos as training and maybe reduce the brute force approach to something simpler. Maybe you'll find that certain sequences always use the same number of bits. That may open the door to some shortcuts (not in the algorithm complexity itself, but in the computed table - maybe you don't need to analyze everything for each input). Maybe you can reduce heavily memory and power required just by sacrificing a 0.1% efficiency. Anyway, this is something that can be done only at a later stage, right now it's just speculation ;)
I've got a MC design for later use. It will not be block based like mpeg, and will likely be another brute force on encode only design.
This approach is perfectly fine for experimenting. It's quite simple, too. There will be time to reduce its complexity at a later stage. (you'll probably want to introduce some "low-complexity modes" I think, to make the codec more flexible)
Anyway, I'll PM you my address soon, so I can finally try this stuff :)
vBulletin® v3.8.5, Copyright ©2000-2012, Jelsoft Enterprises Ltd.