Old 4th October 2024, 16:06   #1  |  Link
LMP88959
Registered User
 
Join Date: Apr 2024
Posts: 26
Digital Subband Video 2 - Open Source Wavelet Codec

Hello Doom9, I've released the second iteration of my video codec.

DSV1 wasn't giving me the performance I wanted, so I continued work on it and created DSV2.

DSV2 is to DSV1 what H.264 was to MPEG-2: a similar 'generational leap' in efficiency.

As of June 20, 2025 the DSV2 bitstream is frozen.
It is totally free for you to use however you'd like.

Please refer to the GitHub repository for code, examples, and a single-header decoder implementation.

https://github.com/LMP88959/Digital-Subband-Video-2

Description taken from GitHub README:

DSV2 Features
  • compression using multiresolution subband analysis (i.e. a wavelet transform) instead of the DCT
  • up to quarter-pixel motion compensation
  • 4:1:0, 4:1:1, 4:2:0, 4:2:2 (+ UYVY) and 4:4:4 chroma subsampling formats
  • adaptive quantization
  • in-loop filtering
  • intra and inter frames with variable length closed GOP
  • no bidirectional prediction (also known as B-frames). Only forward prediction with previous picture as reference
Improvements and New Features since DSV1
  • in-loop filtering after motion compensation
  • more adaptive quantization potential
  • skip blocks for temporal stability
  • new subband filters + support for adaptive subband filtering
  • better motion compensation through Expanded Prediction Range Mode (EPRM)
  • quarter pixel compensation
  • psychovisual optimizations in the codec and encoder design

Last edited by LMP88959; 26th June 2025 at 17:55. Reason: Updated to reflect current codec
Old 4th October 2024, 21:07   #2  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,122
Can you give an architectural overview of how you're handling inter prediction in wavelets? Wavelets were always pretty good for intraframe compression, but weren't able to match block-based motion compensation efficiency.

I've long thought that this was due to the better symmetry between inter and intra in block-based coding, which minimized cumulative loss from using visibly compressed blocks to predict from.

Being able to do more wavelet-like inter prediction always seemed like the holy grail here, but I at least never had a good idea how that would work in practice.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 4th October 2024, 21:31   #3  |  Link
LMP88959
Hi Ben,

DSV2 does block-based motion compensation as well using the 'traditional' techniques of subpel motion and in-loop 'deblocking' filtering.
The wavelets are only used for intra frame and inter (residual) frame compression.
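As a rough sketch of that hybrid structure (illustrative block size, function names, and frame layout; this is not DSV2's actual code):

```c
/* Hypothetical sketch of the hybrid loop described above: block-based
 * motion compensation produces a full-frame residual, which would then
 * be coded with a frame-wide wavelet transform rather than per-block
 * DCTs. Names and the 16x16 block size are illustrative only. */

#define BLK 16  /* illustrative block size */

/* copy the best-matching reference block for motion vector (mvx, mvy) */
static void predict_block(const unsigned char *ref, int stride,
                          int bx, int by, int mvx, int mvy,
                          unsigned char *pred)
{
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            pred[y * BLK + x] = ref[(by + y + mvy) * stride + (bx + x + mvx)];
}

/* inter frame: residual = source - motion-compensated prediction,
 * computed block by block but stored as one full frame */
static void build_residual(const unsigned char *src, const unsigned char *ref,
                           int w, int h, const int *mvs, short *residual)
{
    unsigned char pred[BLK * BLK];
    for (int by = 0; by < h; by += BLK) {
        for (int bx = 0; bx < w; bx += BLK) {
            int i = (by / BLK) * (w / BLK) + (bx / BLK);
            predict_block(ref, w, bx, by, mvs[2 * i], mvs[2 * i + 1], pred);
            for (int y = 0; y < BLK; y++)
                for (int x = 0; x < BLK; x++)
                    residual[(by + y) * w + (bx + x)] =
                        (short)src[(by + y) * w + (bx + x)] - pred[y * BLK + x];
        }
    }
    /* the whole residual frame would now go through the wavelet
     * transform + quantization as a single signal */
}
```

The key point is that the prediction is block-based while the residual is transformed as one whole frame, so any block-boundary discontinuities end up in the wavelet's input.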

Did I understand your question correctly?
Old 9th October 2024, 17:01   #4  |  Link
benwaggoner
Quote:
Originally Posted by LMP88959 View Post
Hi Ben,

DSV2 does block-based motion compensation as well using the 'traditional' techniques of subpel motion and in-loop 'deblocking' filtering.
The wavelets are only used for intra frame and inter (residual) frame compression.

Did I understand your question correctly?
Yes, perfectly.

Is the deblocking to handle artifacts introduced by the pixel-based motion estimation?

Do you find mixing wavelet-based and pixel-based tools is a challenge for efficient motion prediction? My sense is this has been a big reason that wavelet codecs haven't been competitive for interframe encoding to date.

Many years ago I idly imagined a way to use 3D wavelets to extend into the temporal dimension to get around this. But my approach was so obviously awful I didn't get very far with it.

The Daala experimental approach of keeping everything in the frequency domain for prediction and only rasterizing for display seemed like it might be more promising for wavelets than it proved to be for a block-based transform.

Wavelets are a fascinating topic! And get used in things like digital projectors in movie theaters and the beloved Cineform codec. The intrinsic spatially scalable properties seem like they'd be well suited to a lot of internet applications if temporal prediction bitrate scaled with the subbands...
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 10th October 2024, 03:29   #5  |  Link
LMP88959
Quote:
Originally Posted by benwaggoner View Post
Yes, perfectly.

Is the deblocking to handle artifacts introduced by the pixel-based motion estimation?

Do you find mixing wavelet-based and pixel-based tools is a challenge for efficient motion prediction? My sense is this has been a big reason that wavelet codecs haven't been competitive for interframe encoding to date.
Yes, the deblocking filter is to handle sharp discontinuities introduced by the motion compensation.

From what I've learned, a full-frame wavelet codec like DSV2 is unable to realistically test the entropy of potential predictions since it would be very unreasonable to subtract the prediction for a block, transform+quantize the whole frame, and then estimate the number of bits (or some other metric) that prediction would result in.

DCTs and wavelets are essentially opposites of each other. A DCT coefficient represents a pattern over a certain 2D area whereas a wavelet coefficient represents a 'feature' at a certain point in an area. As a result, what traditional DCT codecs are good at coding may be nightmarish for a wavelet and vice versa.
I say all this to bring attention to the fact that these differences must be taken into consideration when doing ME and mode decision.

Some might call this opinion a stretch; however, I believe H.264's concept of intra prediction is as effective as it is because it essentially combined wavelets and the DCT. Of course, wavelets weren't being used directly, but the removal of a gradient in a block is similar to a wavelet decomposition, where you'd have one coefficient in a lower-level subband representing a gradient in the horizontal/vertical/diagonal direction over a certain part of the image.
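As a toy illustration of that analogy (an 8x8 block and a simple least-squares plane fit; purely hypothetical, not H.264's or DSV2's actual predictors), removing a fitted plane from a block strips out the DC and gradient content that the coarsest subbands would otherwise carry:

```c
/* Toy illustration: fit and remove a plane (DC + horizontal gradient +
 * vertical gradient) from an 8x8 block, much like a planar-style intra
 * predictor. The removed plane carries roughly the same information a
 * coarse wavelet subband coefficient would. Not real codec code. */

#define N 8

static void remove_plane(const unsigned char *blk, short *res)
{
    double dc = 0, gx = 0, gy = 0;
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++) {
            double p = blk[y * N + x];
            dc += p;                          /* mean accumulator        */
            gx += p * (x - (N - 1) / 2.0);    /* horizontal slope moment */
            gy += p * (y - (N - 1) / 2.0);    /* vertical slope moment   */
        }
    dc /= N * N;
    double norm = 0;                          /* sum of (x - center)^2   */
    for (int x = 0; x < N; x++)
        norm += (x - (N - 1) / 2.0) * (x - (N - 1) / 2.0);
    gx /= norm * N;
    gy /= norm * N;

    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++) {
            double plane = dc + gx * (x - (N - 1) / 2.0)
                              + gy * (y - (N - 1) / 2.0);
            double d = blk[y * N + x] - plane;
            res[y * N + x] = (short)(d < 0 ? d - 0.5 : d + 0.5); /* round */
        }
}
```

For a block that is exactly a gradient, the residual comes out as all zeros, which is the sense in which the gradient "lives" in one coarse coefficient.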

Quote:
Originally Posted by benwaggoner View Post
Many years ago I idly imagined a way to use 3D wavelets to extend into the temporal dimension to get around this. But my approach was so obviously awful I didn't get very far with it.
Interestingly enough, I began the DSV project last year in an attempt to make a decent wavelet based video codec after seeing how virtually every codec was either using the DCT or (in the old days) VQ.
When I started the project I originally tried that 3D wavelet idea.
It had a few big issues:
1. Very slow.
2. High memory requirements (relatively).
3. Ineffective temporal prediction.

Temporally, a Haar wavelet would be best because it introduces very little ringing, but at the same time it's essentially the same as simple frame differencing. Any other wavelet would produce weird ghosting and blurring in the temporal domain.
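That equivalence is easy to see in code. A one-step temporal Haar over a frame pair (integer version; the scaling here is chosen for illustration rather than the orthonormal 1/sqrt(2) normalization, and the truncating division is not perfectly reversible; not DSV2 code) is just an average band and a difference band:

```c
/* A temporal Haar step over two frames: the low band is the frame
 * average and the high band is (half) the frame difference, which is
 * why a temporal Haar is essentially plain frame differencing. */
static void temporal_haar(const unsigned char *f0, const unsigned char *f1,
                          int npix, short *low, short *high)
{
    for (int i = 0; i < npix; i++) {
        low[i]  = (short)((f0[i] + f1[i]) / 2); /* temporal average   */
        high[i] = (short)((f1[i] - f0[i]) / 2); /* scaled difference  */
    }
}
```

On a static scene the high band is identically zero, and on motion it is exactly the (scaled) frame difference, so there is nothing a longer temporal filter could add without smearing across time.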


Quote:
Originally Posted by benwaggoner View Post
The Daala experimental approach of keeping everything in the frequency domain for prediction and only rasterizing for display seemed like it might be more promising for wavelets than it proved to be for a block-based transform.

Wavelets are a fascinating topic! And get used in things like digital projectors in movie theaters and the beloved Cineform codec. The intrinsic spatially scalable properties seem like they'd be well suited to a lot of internet applications if temporal prediction bitrate scaled with the subbands...
The idea of doing motion estimation using the wavelet decompositions of both the source and reference frames instead of the pixels themselves sounds interesting, I wonder how well it would work in practice...

Thank you for the great comments and questions!
Old 11th October 2024, 20:08   #6  |  Link
benwaggoner
Thanks for your feedback! I love hearing about novel codec designs. I hadn't considered wavelets and DCT as inverses of each other. I'll need to spend some shower time ruminating on that.

I had tangential involvement with JPEG XR when I was at Microsoft. It had alternating DCT and wavelet spatial enhancement layers, which I thought was a fascinating approach. A fundamentally ineffective one, though. Some smart people spent a long time trying to make an encoder that would meaningfully outperform JPEG perceptually, which in theory was easy, but in practice wasn't accomplished. If that had worked, doing block-based motion estimation on the DCT layer and only using the wavelets for spatial enhancement could have been an interesting possibility.

Alas, so many good ideas just don't work in the end. The Daala development notes are a fascinating journey through trying lots of novel, clever ideas and having most of them prove impractical.
Old 13th October 2024, 07:06   #7  |  Link
LMP88959
Of course, and thank you for your curiosity!
With the little experience I have, I can naively say there is a good amount of potential in mixing wavelets and DCTs in a codec.
And I agree, the Daala development pages are very interesting and document the various novel techniques quite well. They also often describe the fundamental problems they're trying to solve with each technique which helps newbies like myself understand what a codec needs to do.
Old 18th November 2024, 20:19   #8  |  Link
LMP88959
Since last time I posted, I've made a few improvements to DSV2.

- Motion estimation uses RMSE instead of SAD
- Better mode decision
- Added a new psychovisual metric, found in hme.c, block_has_vis_edge(). (May be interesting to a curious reader or other encoder developers out there.)
- Replaced the high-frequency filter with something interesting and, as far as I know, relatively unexplored since ~1990: an asymmetric subband filter.
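For the curious, the difference between the two matching metrics can be sketched like this (illustrative 16x16 block costs; this is not the actual hme.c code). Since RMSE is a monotonic function of the summed squared error, an encoder can rank candidate motion vectors by SSE alone:

```c
/* Illustrative block-matching costs: SAD vs. squared error. */
static unsigned sad_16x16(const unsigned char *a, const unsigned char *b,
                          int stride)
{
    unsigned s = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int d = a[y * stride + x] - b[y * stride + x];
            s += (unsigned)(d < 0 ? -d : d);   /* absolute difference */
        }
    return s;
}

static unsigned sse_16x16(const unsigned char *a, const unsigned char *b,
                          int stride)
{
    unsigned s = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int d = a[y * stride + x] - b[y * stride + x];
            s += (unsigned)(d * d);  /* squaring penalizes large errors more */
        }
    return s;
}
```

The design trade-off: squaring weights a few large errors more heavily than many small ones, which tends to suppress isolated outlier pixels in the prediction at the cost of a multiply per pixel.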

In my opinion, the most fascinating addition by far is the asymmetric subband filtering.
Asymmetric in this context means the encoding is considerably slower than the decoding. It is lossy, but such asymmetry is rare in transform-based coding, where the inverse transform is typically just as complex as the forward transform.


Check out the pull request here:

https://github.com/LMP88959/Digital-...Video-2/pull/9

Quote from the pull request:
Quote:
Adding a new subband filter (which I'm calling ASF93 for reasons explained below) for increased intra frame decoding performance.
...
This new filter is only applied to level 1 (to create the highest frequency subbands). It is asymmetrical and does not allow for perfect reconstruction.

Subband filtering consists of an FIR filter bank that splits a signal into different subbands. Wavelets, including the Haar transform, are special cases of subband filters.
In subband filtering, the forward transform is called 'analysis' and the inverse transform is called 'synthesis.'

With ASF93, the analysis filters are 9-tap filters. The FIR coefficients were tuned to my personal psychovisual tastes (slightly emphasized low pass features, noisy/ringy but localized high pass features). The synthesis FIR filter coefficients are the standard 3-tap filters used elsewhere in DSV2. The name ASF93 stands for "asymmetric subband filter with 9-tap analysis and 3-tap synthesis."

Reference for the idea:
E. H. Adelson and E. P. Simoncelli, "Subband image coding with three-tap pyramids" (1990)

Strangely, I haven't been able to find anything else regarding this concept of asymmetrical subband filtering besides this paper, which is unfortunate. I believe there is some untapped potential here.
I hope this update to DSV2 will help push this effort forward as well as provide a modern use-case.
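To make the asymmetry concrete, here is a lifting-style sketch of the general idea: the analysis side predicts odd samples with a wider filter than the synthesis side uses, so reconstruction is intentionally imperfect. All coefficients here are illustrative placeholders, not the tuned ASF93 taps, and the structure is simplified compared to the real 9-tap/3-tap filter bank:

```c
/* Lifting-style illustration of asymmetric subband filtering: the
 * encoder forms residuals against a wide 4-tap predictor, while the
 * decoder reconstructs with a cheap 2-tap average. Because the two
 * predictors differ, perfect reconstruction is deliberately given up.
 * Filter coefficients are illustrative, NOT the actual ASF93 taps. */

static double at(const double *s, int n, int i) /* clamp at borders */
{
    if (i < 0) i = 0;
    if (i >= n) i = n - 1;
    return s[i];
}

/* analysis: even samples pass through as the low band; odd samples are
 * coded as residuals against a wide predictor built from even samples */
static void asym_analyze(const double *sig, int n, double *low, double *high)
{
    for (int i = 0; i < n / 2; i++) {
        int p = 2 * i + 1;                  /* odd sample being predicted */
        double pred = -0.1 * at(sig, n, p - 3) + 0.6 * at(sig, n, p - 1)
                    +  0.6 * at(sig, n, p + 1) - 0.1 * at(sig, n, p + 3);
        low[i]  = sig[2 * i];
        high[i] = sig[p] - pred;            /* residual vs. wide predictor */
    }
}

/* synthesis: cheap 2-tap average predictor -> imperfect reconstruction */
static void asym_synthesize(const double *low, const double *high, int n,
                            double *rec)
{
    for (int i = 0; i < n / 2; i++) {
        double pred = 0.5 * at(low, n / 2, i) + 0.5 * at(low, n / 2, i + 1);
        rec[2 * i]     = low[i];
        rec[2 * i + 1] = high[i] + pred;
    }
}
```

On smooth signals the two predictors nearly agree, so the mismatch shows up mainly where the wide filter's extra taps matter (edges, borders), which is where the encoder's tuning can shape what the error looks like.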
Old 16th May 2025, 17:02   #9  |  Link
LMP88959
A new version of this codec has just been released on GitHub!
It consists of about half a year of research and development and brings a plethora of performance and quality improvements!

Even if you have tried the codec before, please give it another try. It's nothing groundbreaking or cutting edge, but it's different.
Old 16th May 2025, 19:30   #10  |  Link
benwaggoner
Quote:
Originally Posted by LMP88959 View Post
A new version of this codec has just been released on GitHub!
It consists of about half a year of research and development and brings a plethora of performance and quality improvements!

Even if you have tried the codec before please give it another try, it's nothing groundbreaking or cutting edge but it's different
It's always great to see people taking fundamentally different approaches. We've got the traditional block-based algorithms so refined that most development is just more refinement of them, while we're not exploring what we could build with equal effort on top of fundamentally different technology.

The industry spent so many years delivering different H.263 refinements. VVC is basically HEVC++.

We don't need to see something immediately competitive out of a different approach for it to be worthy or promising. I wish there was a way to compare different foundation approaches with a similar degree of refinement. It's not that informative to compare something with the latest x265 to see if the fundamental transforms are better. Comparing to an early build of reference encoders that had similar degree of refinement could be an interesting approach. So don't test rate control or frame type selection in a comparison when a novel codec doesn't have those yet.
Old 17th May 2025, 16:36   #11  |  Link
LMP88959
Quote:
Originally Posted by benwaggoner View Post
It's always great to see people taking fundamentally different approaches. We've got the traditional block-based algorithms so refined that most development is just more refinement of them, while we're not exploring what we could build with equal effort on top of fundamentally different technology.

The industry spent so many years delivering different H.263 refinements. VVC is basically HEVC++.

We don't need to see something immediately competitive out of a different approach for it to be worthy or promising. I wish there was a way to compare different foundation approaches with a similar degree of refinement. It's not that informative to compare something with the latest x265 to see if the fundamental transforms are better. Comparing to an early build of reference encoders that had similar degree of refinement could be an interesting approach. So don't test rate control or frame type selection in a comparison when a novel codec doesn't have those yet.
Thank you! I agree with you.

DCT-based codecs were first to market in the early 90s (and, from what I've read, easier to implement in hardware?), so it makes sense that people took the DCT and expanded on it since it was already established.

Wavelets, on the other hand, seemed to be a topic of interest mainly for researchers from the mid-90s through the late 2000s, explored mostly in theory with few practical, fleshed-out implementations. By the late 2000s, any new wavelet codec (e.g. Snow or Dirac) had to compete with x264, which, like you said, is a very difficult (and unfair) thing to do given how new both the technology and the encoders were.

This, and the fact that DSV has been a one-man project, is why on the DSV page I compare to minih264 (a relatively simple H.264 encoder).

I really appreciate your interest in the project! If you do end up trying it out, I'd love to hear your thoughts on it
Old 17th May 2025, 17:41   #12  |  Link
nhw_pulsar
Registered User
 
Join Date: Apr 2017
Posts: 43
Quote:
Originally Posted by LMP88959 View Post
Thank you! I agree with you.

DCT based codecs were first to the market in the early 90s (and from what I read, easier to write hardware implementations of?) so it makes sense that people took it and expanded on it since it was already established.

Wavelets, on the other hand, seemed to be a topic of interest mainly for researchers from the mid-90s through the late 2000s, explored mostly in theory with few practical, fleshed-out implementations. By the late 2000s, any new wavelet codec (e.g. Snow or Dirac) had to compete with x264, which, like you said, is a very difficult (and unfair) thing to do given how new both the technology and the encoders were.

This, and the fact that DSV has been a one-man project, is why on the DSV page I compare to minih264 (a relatively simple H.264 encoder).

I really appreciate your interest in the project! If you do end up trying it out, I'd love to hear your thoughts on it
Hello,

I was never able to try your very interesting DSV project, sorry, nor Dirac or Snow, because I don't test video compression; I am more of an image compression guy. But I still wanted to note very quickly that in April 2008 the excellent Rududu Image Compression codec was published. At that time I compared it for still images against the best codec of the day, x264 intra (the codec was named the UCI codec), and for me RIC was as good as x264 intra (and RIC was notably faster to encode). If I remember correctly, RIC and x264 intra had the same PSNR, and visually it was 50/50 on my test images (I did not test extreme compression).

So sorry for the off-topic; Rududu Image Compression was really a very good wavelet image codec for 2008, and it's sad that its author could not continue his work on it... Anyway, Rududu uses the big trend in wavelets, zerotree/SPIHT coding; it seems that you don't use that scheme in DSV (nor do I in my codec NHW)?

Cheers,
Raphael
Old 17th May 2025, 23:59   #13  |  Link
LMP88959
Hello! I never tried Rududu, but I'll see if I can find it and compile it. I didn't design DSV2 as a still-image codec, so I'm sure it will not perform as well. I don't use SPIHT/zerotrees, as those are very slow and involve multiple passes over the image. I do simple value coding for speed, and it performs decently from what I've seen.
Old 18th May 2025, 09:40   #14  |  Link
nhw_pulsar
Quote:
Originally Posted by LMP88959 View Post
I don't use SPIHT/zerotrees as those are very slow and involve multiple passes over the image.
Hello!

I don't know the zerotree/SPIHT techniques, but yes, it does seem "a little" slow to decode...

Quote:
Originally Posted by LMP88959 View Post
I do simple value-coding for speed and it performs decently from what I've seen.
Just great!

I really appreciate your new approach of making the "classic" and advanced ME/MC techniques work with wavelets.

Cheers,
Raphael
Old 18th May 2025, 19:39   #15  |  Link
LMP88959
Quote:
Originally Posted by nhw_pulsar View Post
I don't know zerotree/SPIHT techniques, but yes it notably seems "a little" slow to decode....
Ah, yes they (the normal versions) operate on bit planes rather than on individual coefficients which not only requires extra computation/iterations but also results in poor cache usage. They also require storing lists and iterating multiple times over the bits to determine what is and isn't significant. They are more or less symmetrical operations for encoding/decoding so it's not like this extra time will get you faster decoding or anything. Plus, these are PSNR optimized algorithms which squeeze the entropy quite well out of the wavelet tree but at lower bit rates they look awfully blurry due to the lack of psychovisual consideration.

Quote:
Originally Posted by nhw_pulsar View Post
Just great!

I really appreciate your new approach of making the "classic" and advanced ME/MC techniques work with wavelets.
Thank you so much! I took a look at your NHW project and it looks great as well.
Old 18th May 2025, 20:55   #16  |  Link
nhw_pulsar
Quote:
Originally Posted by LMP88959 View Post
Ah, yes they (the normal versions) operate on bit planes rather than on individual coefficients which not only requires extra computation/iterations but also results in poor cache usage. They also require storing lists and iterating multiple times over the bits to determine what is and isn't significant. They are more or less symmetrical operations for encoding/decoding so it's not like this extra time will get you faster decoding or anything. Plus, these are PSNR optimized algorithms which squeeze the entropy quite well out of the wavelet tree but at lower bit rates they look awfully blurry due to the lack of psychovisual consideration.

Thank you for the explanation. I have also read that to squeeze the entropy out of zerotree/SPIHT partitioning as well as possible, you need to couple it with context modeling (and arithmetic coding), and context modeling is quite slow...

Personally I am not an expert, but I like your approach of making a _fast_ but psychovisually optimized wavelet video codec.

Quote:
Originally Posted by LMP88959 View Post
Thank you so much! I took a look at your NHW project and it looks great as well.
Thank you very much for taking a look at NHW.


Keep up your great work,
Cheers,
Raphael

Last edited by nhw_pulsar; 18th May 2025 at 20:59.
Old 19th May 2025, 21:44   #17  |  Link
LMP88959
Thank you Raphael.

@benwaggoner I added lossless coding support as you suggested
https://github.com/LMP88959/Digital-...ideo-2/pull/13
Old 30th May 2025, 22:17   #18  |  Link
benwaggoner
Quote:
Originally Posted by LMP88959 View Post
Ah, yes they (the normal versions) operate on bit planes rather than on individual coefficients which not only requires extra computation/iterations but also results in poor cache usage. They also require storing lists and iterating multiple times over the bits to determine what is and isn't significant. They are more or less symmetrical operations for encoding/decoding so it's not like this extra time will get you faster decoding or anything. Plus, these are PSNR optimized algorithms which squeeze the entropy quite well out of the wavelet tree but at lower bit rates they look awfully blurry due to the lack of psychovisual consideration.
Yeah, PSNR is very much a first-order quality metric!

As such, it's fine to use in early codec development. But too much focus on PSNR-at-QP optimization can miss opportunities to make a bitstream that is highly efficient with adaptive quantization and other psychovisual improvements.

I don't get it when codec developers apologize: "we got 40% subjective improvement, but only 25% PSNR." Subjective improvement is the only thing we care about in distribution! I'd take that over 35% in both PSNR and subjective improvement.

This blind spot wound up being a big issue with VC-1, due to our inefficient RLE differential bitmask approach to adaptive quantization. At internet bitrates, the overhead often ate more bits than the psychovisual improvements could save.

There were some tricks to get around that to a worthwhile degree (like using it only for I-frames and selected P-frames), but they never became the default in anything.
Old 30th May 2025, 22:45   #19  |  Link
LMP88959
Definitely. I've been using XPSNR as a rough guide during tuning, but I always verify with my eyes.

Wavelets got a bad reputation for image/video coding (in my opinion) specifically because they were being researched mainly during the era when PSNR was almost exclusively the target metric. People saw good numbers but bad visual quality and, for some strange reason, decided the problem was wavelets rather than the metric itself.

I really enjoy learning about psychovisual phenomena, which is why I designed DSV2 with psychovisual considerations baked in, rather than building something that optimized for PSNR first and had psy optimizations added as an afterthought.

Did you work on developing VC-1? I was curious why the designers chose a cubic half-pel filter, (-1, 9, 9, -1); do you happen to know why that was chosen?
I originally used that filter, but it made subpixel motion extremely blurry (and I wanted to keep the filters at 4 taps max), so I did some R&D on my end and settled on 'smart' temporal switching between two sharper cubic filters, which in my testing significantly outperformed the (-1, 9, 9, -1) filter.
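For reference, the (-1, 9, 9, -1) half-pel filter discussed above looks like this when applied horizontally (a minimal sketch with rounding and clamping; DSV2's sharper switched filters are not reproduced here):

```c
/* 4-tap cubic half-pel interpolation, normalized by 16:
 * out = round((-ref[x-1] + 9*ref[x] + 9*ref[x+1] - ref[x+2]) / 16).
 * The caller must ensure x-1 .. x+2 are within the padded reference. */
static unsigned char halfpel_h(const unsigned char *ref, int x)
{
    int v = -ref[x - 1] + 9 * ref[x] + 9 * ref[x + 1] - ref[x + 2] + 8;
    if (v < 0)
        v = 0;          /* negative overshoot: clamp to 0   */
    else
        v >>= 4;        /* divide by 16 (the +8 rounds)     */
    if (v > 255)
        v = 255;        /* positive overshoot: clamp to 255 */
    return (unsigned char)v;
}
```

The negative outer taps are what give the mild sharpening over plain bilinear averaging; the blurriness complaint above is about how mild that sharpening is.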

Thank you!
Old 21st June 2025, 03:59   #20  |  Link
LMP88959
Small but significant update(s):

1. DSV2 bitstream is now frozen
2. Slightly improved wavelet coefficient entropy coding (generally 1-5% smaller videos for the same quality)
3. Improved adaptive quantization
Tags
codec, open source, subband, wavelet
