Old 3rd May 2024, 06:30   #1  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Digital Subband Video 1 - Wavelet Codec in Less than 5000 Lines of C

Hi Doom9, I recently developed this:

https://github.com/LMP88959/Digital-Subband-Video-1

DSV1 is comparable to MPEG-1 / MPEG-2 but uses wavelets instead of the DCT.

[Comparison images omitted.]

I intend to use it as a video codec for my game, but I believe it might be of interest to some of you here.

Here is a little slideshow presentation I made explaining the codec and some of my thoughts on video compression:
https://www.youtube.com/watch?v=zDm4GN-znBo

Please understand that I developed this codec out of passion and interest in the field; I didn't develop it for serious/professional use.

Here is a quick description:

- uses multiresolution subband analysis (aka wavelet decomposition) instead of the DCT
- half-pixel motion compensation
- supports 4:1:1, 4:2:0, 4:2:2, and 4:4:4 chroma subsampling formats
- adaptive quantization
- closed GOP with intra and inter frames (no B frames, only P frames, for simplicity)
- no complex entropy coding (only interleaved exponential-Golomb coding; a rough sketch of the plain variant follows after this list)
- support for much lower bitrates than MPEG-1 / MPEG-2
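
For anyone unfamiliar with Exp-Golomb codes, here is a minimal sketch of the plain order-0 variant in C (DSV1's interleaved variant lays its bits out differently, and this console bit writer is only a stand-in for a real one):

Code:
#include <stdio.h>

/* stand-in bit writer for illustration; a real codec packs bits into a buffer */
static void put_bit(int b) { putchar(b ? '1' : '0'); }

/* order-0 Exp-Golomb: send floor(log2(v+1)) zeros, then v+1 in binary (MSB first) */
static void eg0_encode(unsigned v)
{
    unsigned x = v + 1, t;
    int nbits = 0, i;
    for (t = x; t > 1; t >>= 1) nbits++;                /* nbits = floor(log2(x)) */
    for (i = 0; i < nbits; i++) put_bit(0);             /* zero prefix */
    for (i = nbits; i >= 0; i--) put_bit((x >> i) & 1); /* binary digits of x */
}

int main(void)
{
    unsigned v;
    for (v = 0; v < 5; v++) { eg0_encode(v); putchar('\n'); } /* 1 010 011 00100 00101 */
    return 0;
}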

The README and the PDF files in the GitHub repo have more information if you're curious.

I've seen CineForm's codec, Dirac, and Snow, but none of these were made to be comparable to the first generation of MPEG video, and Snow ended up being abandoned. DSV1 is the first "complete" wavelet video codec I am aware of that is comparable to MPEG-1 / MPEG-2 and not trying to compete with H.264 or MPEG-4 Part 2.


Please let me know your thoughts, positive and negative. If you find any bugs, please let me know as well.

- EMMIR

Last edited by LMP88959; 3rd May 2024 at 17:18. Reason: Updated image links to a better image hosting site
Old 3rd May 2024, 22:55   #2  |  Link
modus-ms325c
Registered User
 
Join Date: Dec 2023
Posts: 18
not sure i really need another video codec at this point, but i'm interested.
my superficial impression is that DSV1 maintains a lot of detail at the expense of blurring out other parts of the frame, or worse.

Last edited by modus-ms325c; 3rd May 2024 at 22:55. Reason: image -> frame
Old 3rd May 2024, 23:01   #3  |  Link
MoSal
Registered User
 
Join Date: Jun 2013
Posts: 103
Quote:
comparable to MPEG-1 / MPEG-2
You mention this multiple times. Why is it a highly relevant/desirable property for you?
__________________
https://github.com/MoSal
Old 3rd May 2024, 23:29   #4  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by modus-ms325c View Post
not sure i really need another video codec at this point, but i'm interested.
my superficial impression is that DSV1 maintains a lot of detail at the expense of blurring out other parts of the frame, or worse.
Thank you!

Quote:
Originally Posted by MoSal View Post
You mention this multiple times. Why is it a highly relevant/desirable property for you?
Thank you for your question. I mention it for a few reasons:
1. To let people know DSV1 isn't competitive with codecs newer than MPEG-1 / MPEG-2.
2. To help group it into the same era of video codecs.
3. To invite others to compare it with those video coding standards.

I hope that clarifies things for you.
Old 4th May 2024, 06:26   #5  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
This is great.

You probably won't be able to start a business based on that, but I am sure you learned a lot.

As you are using Haar for most things, have you tried partitioning the picture into macroblocks like MPEG-1/2? This might make things more manageable, and Haar does not have funny edge effects.

And for inter coding, I know you want short wavelets; have you tried Daubechies 3? I remember that one fondly for inter, even when used with somewhat small pixel blocks.

If you want to shave off bits from your coefficient coding I recommend range coding; maybe look up the 1979 IBM paper by G. N. N. Martin. I think AV1 uses a flavor of that one too. This will make things more complicated and slower though, so if speed and simplicity are your goal, just pass.

As for there being no wavelet-based video codecs you know of that are comparable to MPEG-1/2: please keep in mind that lots of people have made wavelet-based video codecs, even block-based hybrid ones, but almost none made it into the spotlight because they just don't perform as well as macroblock-based DCT codecs. Wavelet codecs are mostly an academic thing.

Another thing:

I stalked your GitHub; what do you think of the following 68K texture span inner loop (texture stride = 255)?

Code:
move.b (%a1,%d2.w),%d6 ; load texel
move.b %d6,(%a0)+      ; write texel
add.w %a3,%d4          ; vfrac inc
scs.b %d7              ; vfrac overflowed ?
add.w %a2,%d5          ; ufrac inc
addx.w %d3,%d2         ; combined u int + v int + ufrac overflow texel step
add.w %d7,%d2          ; additional v texel step if vfrac overflowed

PS: DSV1_spec.pdf, Page 16, 2nd MIN() -> MAX
Old 4th May 2024, 15:34   #6  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
This is great.

You probably won't be able to start a business based on that, but I am sure you learned a lot.

As you are using Haar for most things, have you tried partitioning the picture into macroblocks like MPEG-1/2? This might make things more manageable, and Haar does not have funny edge effects.
Hmm, doing a Haar transform isolated to a macroblock? It's definitely something to experiment with. What are you referring to when you say 'more manageable'?

Quote:
Originally Posted by rwill View Post
And for inter coding, I know you want short wavelets; have you tried Daubechies 3? I remember that one fondly for inter, even when used with somewhat small pixel blocks.
I tried a CDF 2.2 wavelet and the edge artifacts around intra blocks looked awful.

Quote:
Originally Posted by rwill View Post
If you want to shave off bits from your coefficient coding I recommend range coding; maybe look up the 1979 IBM paper by G. N. N. Martin. I think AV1 uses a flavor of that one too. This will make things more complicated and slower though, so if speed and simplicity are your goal, just pass.
I had heard about, and seen some implementations of, range coding, but since I was prioritizing speed and simplicity, like you said, I just stuck with EG coding and RLE.

Quote:
Originally Posted by rwill View Post
As for there being no wavelet-based video codecs you know of that are comparable to MPEG-1/2: please keep in mind that lots of people have made wavelet-based video codecs, even block-based hybrid ones, but almost none made it into the spotlight because they just don't perform as well as macroblock-based DCT codecs. Wavelet codecs are mostly an academic thing.
Yeah, I have no doubt others have written similar codecs; I'm sorry if I made it sound like I've done something novel. I'm mostly referring to DSV1 being publicly available, free, unencumbered, and in a stable 'finalized' state. I personally can't recall seeing another codec like DSV1 that checks those boxes, but if you know of any, please let me know; I am always interested to learn about others' projects and efforts.

Quote:
Originally Posted by rwill View Post
Another thing:

I stalked your GitHub; what do you think of the following 68K texture span inner loop (texture stride = 255)?

Code:
move.b (%a1,%d2.w),%d6 ; load texel
move.b %d6,(%a0)+      ; write texel
add.w %a3,%d4          ; vfrac inc
scs.b %d7              ; vfrac overflowed ?
add.w %a2,%d5          ; ufrac inc
addx.w %d3,%d2         ; combined u int + v int + ufrac overflow texel step
add.w %d7,%d2          ; additional v texel step if vfrac overflowed
Haha, are you a graphics enthusiast too? I'm an assembly novice, but your texture fill looks good. I'm assuming a paletted texture, given the 8-bit texels?

Quote:
Originally Posted by rwill View Post
PS: DSV1_spec.pdf, Page 16, 2nd MIN() -> MAX
Thanks for catching that! I just fixed it in the repo.
Old 5th May 2024, 03:42   #7  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
Quote:
Hmm, doing a Haar transform isolated to a macroblock? It's definitely something to experiment with. What are you referring to when you say 'more manageable'?
...
I tried a CDF 2.2 wavelet and the edge artifacts around intra blocks looked awful.
With 'more manageable' I mean having some sort of unit that can be encoded somewhat independently. You would lose your global coefficient approach, but you could then do things like rate-distortion optimization better. That is, it becomes easier to weigh the distortion an encode decision generates against the bits it will cost. Rate-distortion optimization only started to be used in the early 2000s, so most MPEG-1/2 encoders did not do it. Encoding in blocks also confines the effects of the transform to the block area. It will reduce wavelet transform efficiency, though.
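
The core of that weighting is just a Lagrangian cost comparison. Here is a minimal sketch in C; the names and the per-decision candidate list are hypothetical, not taken from any particular encoder:

Code:
/* rate-distortion decision: pick the candidate minimizing J = D + lambda * R */
typedef struct {
    double distortion; /* e.g. sum of squared differences after reconstruction */
    int    bits;       /* bits this decision would cost to signal */
} Candidate;

static int rd_pick(const Candidate *c, int n, double lambda)
{
    int i, best = 0;
    double best_j = c[0].distortion + lambda * c[0].bits;
    for (i = 1; i < n; i++) {
        double j = c[i].distortion + lambda * c[i].bits;
        if (j < best_j) { best_j = j; best = i; }
    }
    return best; /* index of the cheapest decision overall */
}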

Quote:
I had heard about, and seen some implementations of, range coding, but since I was prioritizing speed and simplicity, like you said, I just stuck with EG coding and RLE.
Your plane approach might also be suited for something like Huffman variable-length codes with a frame-optimized table... I am just throwing ideas around, though. I vaguely recall that the Theora video codec transmitted its coefficients as a huge, size-optimized run/level blob too.

Quote:
Yeah, I have no doubt others have written similar codecs; I'm sorry if I made it sound like I've done something novel. I'm mostly referring to DSV1 being publicly available, free, unencumbered, and in a stable 'finalized' state. I personally can't recall seeing another codec like DSV1 that checks those boxes, but if you know of any, please let me know; I am always interested to learn about others' projects and efforts.
When I was doing my video codec experiments 20+ years ago there were a couple of open source implementations available. But I never saw one that came with a specification or was as polished as yours. Those were different times with different tools, though.


Quote:
Haha, are you a graphics enthusiast too? I'm an assembly novice, but your texture fill looks good. I'm assuming a paletted texture, given the 8-bit texels?
Yeah, somewhat. From time to time I do some graphics stuff on a TI-89 calculator, which has a Motorola 68k CPU and 2-bit grayscale graphics.
For example, I have done this: https://www.youtube.com/watch?v=3WvSVB4nqjw
Old 5th May 2024, 05:01   #8  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
With 'more manageable' I mean having some sort of unit that can be encoded somewhat independently. You would lose your global coefficient approach, but you could then do things like rate-distortion optimization better. That is, it becomes easier to weigh the distortion an encode decision generates against the bits it will cost. Rate-distortion optimization only started to be used in the early 2000s, so most MPEG-1/2 encoders did not do it. Encoding in blocks also confines the effects of the transform to the block area. It will reduce wavelet transform efficiency, though.
Ah okay, that makes sense. Maybe some more error resilience in that case too? (Even though I didn't design DSV1 for streaming/network transmission.) If I ever decide to create a DSV2, I will definitely try this out.

Quote:
Originally Posted by rwill View Post
Your plane approach might also be suited for something like Huffman variable-length codes with a frame-optimized table... I am just throwing ideas around, though. I vaguely recall that the Theora video codec transmitted its coefficients as a huge, size-optimized run/level blob too.
I used EG codes because the magnitudes of the coefficients can be arbitrarily large, which can be annoying to deal with. A Huffman table would require two passes over the coefficients, right?

Quote:
Originally Posted by rwill View Post
When I was doing my video codec experiments 20+ years ago there were a couple of open source implementations available. But I never saw one that came with a specification or was as polished as yours. Those were different times with different tools, though.
Oh wow, you're a veteran then. I began my image compression journey in 2022 and started video compression in September 2023. I couldn't imagine developing video codecs on the average PC of 20 or 30 years ago, especially with the limited amount of memory.


Quote:
Originally Posted by rwill View Post
Yeah, somewhat. From time to time I do some graphics stuff on a TI-89 calculator, which has a Motorola 68k CPU and 2-bit grayscale graphics.
For example, I have done this: https://www.youtube.com/watch?v=3WvSVB4nqjw
That's awesome! I had a newer TI calculator that supported, IIRC, ~16 colors, but I remember seeing a 2-bit grayscale 3D engine project for the TI-89. It got me somewhat interested in embedded software and C. I never really got around to writing anything more than some basic linear algebra programs on my calculator, though; I did most of my development on my desktop computer.
Old 5th May 2024, 11:25   #9  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
Quote:
Ah okay, that makes sense. Maybe some more error resilience in that case too? (Even though I didn't design DSV1 for streaming/network transmission.) If I ever decide to create a DSV2, I will definitely try this out.
Regarding error resilience, there is the concept of 'slices'. For example, MPEG-2 had macroblocks in slices that contained at most one macroblock line of the picture, down to a couple of macroblocks per slice, resulting in lots of independent slices per picture macroblock line. In the case of a bitstream error, which causes the bitstream parser to go off the happy path, a decoder is able to resync at the next slice. This was great for broadcast, where there is no real packet loss but there are bitstream errors. As codecs and data transmission got more modern this became less and less of an issue. For network packet based transmission, one could put a slice in each packet, so when a packet gets lost one only loses that single slice. This only matters with UDP (as when doing RTP) and not with reliable transport layers like TCP, which just resend the packet. (Independent) slices, which break prediction from neighboring units, also cost quite a lot of compression efficiency, so they are not desirable for every use case today. Most videos shared over the internet today have only a single slice per picture.
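
The resync logic itself is tiny. A hypothetical sketch for an MPEG-2-style 00 00 01 start-code prefix (not DSV1's bitstream):

Code:
/* after a bitstream error, scan forward for the next 00 00 01 start-code
   prefix (MPEG-2 style) and resume parsing from there */
static long resync(const unsigned char *buf, long pos, long len)
{
    for (; pos + 2 < len; pos++) {
        if (buf[pos] == 0x00 && buf[pos + 1] == 0x00 && buf[pos + 2] == 0x01)
            return pos; /* next slice or higher-level unit starts here */
    }
    return -1; /* no sync point left in this buffer */
}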

Quote:
I used EG codes because the magnitudes of the coefficients can be arbitrarily large, which can be annoying to deal with. A Huffman table would require two passes over the coefficients, right?
Yes, one pass to count the different symbols to generate the globally optimal Huffman tables, then write out using the generated variable-length codes. You can even mix Huffman with your EG codes: given N Huffman symbols, you can code coefficients up to N-1 using Huffman codes, use code N as an escape symbol, and then encode the coefficient as EG as a special case.
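
A sketch of that escape scheme in C; huff_put() and eg_put() are assumed stand-ins for real bit writers, and the table size is arbitrary:

Code:
#define ESCAPE 16 /* N: symbols 0..15 live in the Huffman table */

extern void huff_put(unsigned sym); /* hypothetical table-driven bit writer */
extern void eg_put(unsigned v);     /* hypothetical Exp-Golomb bit writer  */

/* values below ESCAPE get a short table code; larger ones send the
   escape symbol followed by the remainder as Exp-Golomb */
void put_coeff(unsigned v)
{
    if (v < ESCAPE) {
        huff_put(v);
    } else {
        huff_put(ESCAPE);
        eg_put(v - ESCAPE);
    }
}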

Quote:
Oh wow, you're a veteran then. I began my image compression journey in 2022 and started video compression in September 2023. I couldn't imagine developing video codecs on the average PC of 20 or 30 years ago, especially with the limited amount of memory.
Well, resolutions were smaller and algorithms were simpler. I still do not know what to do with all the compute power available today except for almost brute-forcing my way through. That's also why I sometimes do something on small devices just for fun, to avoid getting too used to the 'would have been a supercomputer 15 years ago' systems available today.
Old 5th May 2024, 19:18   #10  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
Regarding error resilience, there is the concept of 'slices'. For example, MPEG-2 had macroblocks in slices that contained at most one macroblock line of the picture, down to a couple of macroblocks per slice, resulting in lots of independent slices per picture macroblock line. In the case of a bitstream error, which causes the bitstream parser to go off the happy path, a decoder is able to resync at the next slice. This was great for broadcast, where there is no real packet loss but there are bitstream errors. As codecs and data transmission got more modern this became less and less of an issue. For network packet based transmission, one could put a slice in each packet, so when a packet gets lost one only loses that single slice. This only matters with UDP (as when doing RTP) and not with reliable transport layers like TCP, which just resend the packet. (Independent) slices, which break prediction from neighboring units, also cost quite a lot of compression efficiency, so they are not desirable for every use case today. Most videos shared over the internet today have only a single slice per picture.
Ah okay, good to know. Thanks for the explanation.

Quote:
Originally Posted by rwill View Post
Yes, one pass to count the different symbols to generate the globally optimal Huffman tables, then write out using the generated variable-length codes. You can even mix Huffman with your EG codes: given N Huffman symbols, you can code coefficients up to N-1 using Huffman codes, use code N as an escape symbol, and then encode the coefficient as EG as a special case.
I can't imagine Huffman being too much better than EG for low numbers? EG already does a good job of making sure small numbers take a small number of bits. A second pass over the data would be too heavy, I think, so maybe a fixed set of Huffman codes for values < N?

Quote:
Originally Posted by rwill View Post
Well, resolutions were smaller and algorithms were simpler. I still do not know what to do with all the compute power available today except for almost brute-forcing my way through. That's also why I sometimes do something on small devices just for fun, to avoid getting too used to the 'would have been a supercomputer 15 years ago' systems available today.
Yeah, I totally understand what you mean; that's why I impose so many restrictions on myself with my game (King's Crook) and the other software I write.
Old 6th May 2024, 18:16   #11  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
Quote:
Originally Posted by LMP88959 View Post
I can't imagine Huffman being too much better than EG for low numbers? EG already does a good job of making sure small numbers take a small number of bits. A second pass over the data would be too heavy, I think, so maybe a fixed set of Huffman codes for values < N?
Well, I guess the more you quantize, given the wavelet transform, the smaller the numbers you get, and Exp-Golomb might become more and more optimal with rising quantization. For higher-level subbands with larger coefficients, or lower-level subbands at low quantization, EG might not be that optimal. One would have to do some empirical analysis of where the bits are spent and how much better other approaches are...
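
A hypothetical helper for that kind of analysis: count the order-0 Exp-Golomb bit cost of one subband, with the sign folded into the low bit:

Code:
/* bits order-0 Exp-Golomb spends on an unsigned value v */
static int eg0_bits(unsigned v)
{
    int n = 0;
    unsigned t;
    for (t = v + 1; t > 1; t >>= 1) n++; /* n = floor(log2(v+1)) */
    return 2 * n + 1;
}

/* total EG cost of one subband; compare per subband against gzip etc. */
long subband_eg_cost(const int *coef, long n)
{
    long i, bits = 0;
    for (i = 0; i < n; i++) {
        unsigned mag = (unsigned)(coef[i] < 0 ? -coef[i] : coef[i]);
        bits += eg0_bits((mag << 1) | (unsigned)(coef[i] < 0)); /* sign in bit 0 */
    }
    return bits;
}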
Old 7th May 2024, 17:58   #12  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
Well, I guess the more you quantize, given the wavelet transform, the smaller the numbers you get, and Exp-Golomb might become more and more optimal with rising quantization. For higher-level subbands with larger coefficients, or lower-level subbands at low quantization, EG might not be that optimal. One would have to do some empirical analysis of where the bits are spent and how much better other approaches are...
Good point. Maybe switching it up per subband would be better? The highest-frequency subband holds 3/4 of the coefficients but is also mostly small numbers. Perhaps everything besides the highest-frequency subband would be better served by some sort of statistical coding rather than a universal code.
Old 7th May 2024, 20:19   #13  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
Well, for a quick test, have you tried writing a couple of subbands to files and then gzip'ing or 7zip'ing them? This might give some sort of hint as to what is possible.

Keep in mind that if you implement some sort of entropy coder and keep things modular, you can use it elsewhere too. Once you have written a couple of different modules, you can just glue some together to get things done in another fun project, like asset compression. No need to use other people's libraries... and no need to license the Unreal Engine.
Old 8th May 2024, 17:39   #14  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
Well, for a quick test, have you tried writing a couple of subbands to files and then gzip'ing or 7zip'ing them? This might give some sort of hint as to what is possible.

Keep in mind that if you implement some sort of entropy coder and keep things modular, you can use it elsewhere too. Once you have written a couple of different modules, you can just glue some together to get things done in another fun project, like asset compression. No need to use other people's libraries... and no need to license the Unreal Engine.
Hmm, how would I dump the subbands to files in a way that other general compressors can work with, though? Zip formats use byte-oriented compression (AFAIK), but subband coefficients can range from 1 byte to n bytes.

I have a Huffman coder in my 'toolbox', plus a simple byte-oriented RLE, and that suffices for most of the data that needs compression in my game. Realistically this compression is highly unnecessary in the modern era, since the game itself is around 20 MB, most of which is audio (already ADPCM + Huffman), so there are no AAA-sized assets.

Adding cutscenes/video playback to the game would be quite a burden in terms of storage size though, which is why I think DSV1 is actually the only truly 'necessary' compression I have in my arsenal.
Old 9th May 2024, 00:36   #15  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 404
Quote:
Originally Posted by LMP88959 View Post
Hmm, how would I dump the subbands to files in a way that other general compressors can work with, though? Zip formats use byte-oriented compression (AFAIK), but subband coefficients can range from 1 byte to n bytes.
Well, to get a rough size estimate from something like gzip or similar, I think it would be sufficient to do something like this:
Code:
sign = coefficient < 0 ? 1 : 0;
out = ( abs( coefficient ) << 1 ) | sign;
And then write the result out as uint16_t to a file. Don't worry about gzip operating on bytes while you have 16-bit values: the gzip size can be seen as an acceptable upper bound, and should you roll your own approach you should be able to match it or (likely) do a bit better. Maybe your coefficients cannot get larger than 9 bits when you have 8-bit input? I have not looked at your transform yet, but I vaguely remember from my wavelet work back then that I kept the coefficients at around the input depth plus a sign bit. So if the input is 8 bits, the coefficient value range should never exceed -255 to 255?
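
In C, the dump could look something like this; a minimal sketch under those assumptions (the function name and the little-endian layout are arbitrary):

Code:
#include <stdio.h>
#include <stdlib.h>

/* fold the sign into bit 0 and write each coefficient as a
   little-endian uint16_t, ready for gzip/7zip size experiments */
void dump_subband(const int *coef, size_t n, FILE *f)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned out = ((unsigned)abs(coef[i]) << 1) | (coef[i] < 0 ? 1u : 0u);
        fputc((int)(out & 0xff), f);        /* low byte  */
        fputc((int)((out >> 8) & 0xff), f); /* high byte */
    }
}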

I mean, wasn't it like this for (1D) Haar, given the two pels a and b?:
Code:
H = b - a
L = a + ( H / 2 )
And similar for other wavelets if one uses integer lifting transforms?

It says so at bearcave.com/misl/misl_tech/wavelets/lifting/basiclift.html too, so I seem to recall at least the Haar lifting transform correctly.
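
As integer C code, the forward step and its exact inverse would look like this; a sketch following the quoted form, where exactness only requires both sides to divide the same way:

Code:
/* forward integer Haar lifting on a pel pair */
void haar_fwd(int a, int b, int *L, int *H)
{
    *H = b - a;        /* predict: high band is the difference */
    *L = a + *H / 2;   /* update: low band approximates the average */
}

/* exact inverse: recovers a and b bit for bit */
void haar_inv(int L, int H, int *a, int *b)
{
    *a = L - H / 2;    /* undo the update */
    *b = *a + H;       /* undo the predict */
}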

I just googled around and there even seems to be software available that can automagically generate lifting schemes and such from wavelet definitions; we did not have that back then, I think! I am getting old...
Old 9th May 2024, 03:01   #16  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 14
Quote:
Originally Posted by rwill View Post
Well, to get a rough size estimate from something like gzip or similar, I think it would be sufficient to do something like this:
Code:
sign = coefficient < 0 ? 1 : 0;
out = ( abs( coefficient ) << 1 ) | sign;
And then write the result out as uint16_t to a file. Don't worry about gzip operating on bytes while you have 16-bit values: the gzip size can be seen as an acceptable upper bound, and should you roll your own approach you should be able to match it or (likely) do a bit better. Maybe your coefficients cannot get larger than 9 bits when you have 8-bit input? I have not looked at your transform yet, but I vaguely remember from my wavelet work back then that I kept the coefficients at around the input depth plus a sign bit. So if the input is 8 bits, the coefficient value range should never exceed -255 to 255?

I mean, wasn't it like this for (1D) Haar, given the two pels a and b?:
Code:
H = b - a
L = a + ( H / 2 )
And similar for other wavelets if one uses integer lifting transforms?

It says so at bearcave.com/misl/misl_tech/wavelets/lifting/basiclift.html too, so I seem to recall at least the Haar lifting transform correctly.

I just googled around and there even seems to be software available that can automagically generate lifting schemes and such from wavelet definitions; we did not have that back then, I think! I am getting old...
Ah, that makes sense. The first few subbands (from highest to lowest frequency) should easily fit in 16 bits. I don't do (much, sometimes any) magnitude reduction in the LL band as I decompose the signal, so the final LL band has REALLY large values. This is so that, under heavy compression, I can still get back at least something that looks like a low-res version of the frame.

These are all great ideas, and this has been a nice back-and-forth.
It'll be fun to try them out when I get back to messing around with compression.