Doom9's Forum |
4th October 2024, 16:06 | #1 | Link |
Registered User
Join Date: Apr 2024
Posts: 14
|
Digital Subband Video 2 - Open Source Wavelet Codec
Hello Doom9, I've released the second iteration of my video codec.
https://github.com/LMP88959/Digital-Subband-Video-2
DSV1 wasn't giving me the performance I wanted so I continued work on it and created DSV2. DSV2 is to DSV1 what MPEG-4 Part 2 was to MPEG-2. A similar 'generational leap' in terms of efficiency.
Comparison video (you can see a couple more comparisons in the README of the repository):
minih264: https://github.com/user-attachments/...7-ba6b66a98d59
DSV2: https://github.com/user-attachments/...f-74cb59e05333
It is totally free for you to use however you'd like.
Description taken from GitHub README: DSV2 Features
|
4th October 2024, 21:07 | #2 | Link |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,893
|
Can you give an architectural overview of how you're handling inter prediction in wavelets? Wavelets were always pretty good for intraframe compression, but weren't able to match block-based motion compensation efficiency.
I've long thought that this was due to the better symmetry between inter and intra in block-based coding, which minimized cumulative loss from using visibly compressed blocks to predict from. Being able to do more wavelet-like interprediction always seemed like the holy grail here, but I at least never had a good idea how that would work in practice. |
4th October 2024, 21:31 | #3 | Link |
Registered User
Join Date: Apr 2024
Posts: 14
|
Hi Ben,
DSV2 does block-based motion compensation as well using the 'traditional' techniques of subpel motion and in-loop 'deblocking' filtering. The wavelets are only used for intra frame and inter (residual) frame compression. Did I understand your question correctly? |
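A rough sketch of the split described above, where blocks handle motion compensation and the transform only sees the residual. This is not DSV2's code: the function names, the integer-only motion vectors, and the edge clamping are all assumptions made for illustration (DSV2 itself uses subpel motion and an in-loop filter, both omitted here).

```python
def predict_block(ref, x, y, mv, size):
    """Copy a size x size block from the reference frame, displaced by an
    integer motion vector mv = (dx, dy). Source coordinates are clamped
    to the frame (an assumption of this sketch)."""
    h, w = len(ref), len(ref[0])
    dx, dy = mv
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            sy = min(max(y + j + dy, 0), h - 1)
            sx = min(max(x + i + dx, 0), w - 1)
            row.append(ref[sy][sx])
        block.append(row)
    return block


def residual_frame(cur, ref, mvs, size):
    """Build the full-frame residual: current frame minus the
    motion-compensated prediction. mvs maps each block origin (x, y)
    to its motion vector."""
    h, w = len(cur), len(cur[0])
    res = [[0] * w for _ in range(h)]
    for (x, y), mv in mvs.items():
        pred = predict_block(ref, x, y, mv, size)
        for j in range(size):
            for i in range(size):
                res[y + j][x + i] = cur[y + j][x + i] - pred[j][i]
    return res


# Toy frames: the current frame is the reference shifted right by one pixel.
ref = [[10 * y + x for x in range(8)] for y in range(4)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(4)]

good_mvs = {(0, 0): (-1, 0), (4, 0): (-1, 0)}   # matches the true motion
zero_mvs = {(0, 0): (0, 0), (4, 0): (0, 0)}     # no motion compensation

res_good = residual_frame(cur, ref, good_mvs, 4)
res_zero = residual_frame(cur, ref, zero_mvs, 4)
```

In a codec of this shape, the whole residual frame (here `res_good`) would then go through the wavelet transform and quantization as one image, rather than block by block.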
9th October 2024, 17:01 | #4 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,893
|
Quote:
Is the deblocking to handle artifacts introduced by the pixel-based motion estimation? Do you find mixing wavelet-based and pixel-based tools is a challenge for efficient motion prediction? My sense is this has been a big reason that wavelet codecs haven't been competitive for interframe encoding to date.
Many years ago I idly imagined a way to use 3D wavelets to extend into the temporal dimension to get around this. But my approach was so obviously awful I didn't get very far with it. The Daala experimental approach of keeping everything in the frequency domain for prediction and only rasterizing for display seemed like it might be more promising for wavelets than it proved to be for a block-based transform.
Wavelets are a fascinating topic! They get used in things like digital projectors in movie theaters and the beloved Cineform codec. The intrinsic spatially scalable properties seem like they'd be well suited to a lot of internet applications if temporal prediction bitrate scaled with the subbands... |
|
10th October 2024, 03:29 | #5 | Link | |||
Registered User
Join Date: Apr 2024
Posts: 14
|
Quote:
From what I've learned, a full-frame wavelet codec like DSV2 is unable to realistically test the entropy of potential predictions, since it would be very unreasonable to subtract the prediction for a block, transform+quantize the whole frame, and then estimate the number of bits (or some other metric) that prediction would result in.
DCTs and wavelets are essentially opposites of each other. A DCT coefficient represents a pattern over a certain 2D area whereas a wavelet coefficient represents a 'feature' at a certain point in an area. As a result, what traditional DCT codecs are good at coding may be nightmarish for a wavelet and vice versa. I say all this to bring attention to the fact that these differences must be taken into consideration when doing ME and mode decision.
Some might call this opinion a stretch; however, I believe H.264's concept of intra prediction is as effective as it is because it essentially combined wavelets and the DCT. Of course, wavelets weren't being directly used, but the removal of a gradient in a block is similar to a wavelet decomposition where you'd have one coefficient in a lower level subband which represents a gradient in the horizontal/vertical/diagonal direction over a certain part of an image.
Quote:
When I started the project I originally tried that 3D wavelet idea. It had a few big issues:
1. Very slow.
2. High memory requirement (relatively)
3. Ineffective at temporal prediction.
Temporally, a Haar wavelet would be the best because it does not introduce much ringing at all but at the same time it's essentially the same as simple frame differencing. Any other wavelet would result in weird ghosting and blurring in the temporal domain.
Quote:
Thank you for the great comments and questions! |
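To make the "pattern over an area" versus "feature at a point" contrast from this post concrete, here is a small 1-D sketch. It is an illustration only and has nothing to do with DSV2's actual filters: it uses a textbook orthonormal DCT-II and a plain Haar decomposition, and the helper names are made up for this example.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def haar(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two.
    Returns [approximation] followed by details from coarsest to finest."""
    x = list(x)
    details = []
    while len(x) > 1:
        avg = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
        dif = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
        details = dif + details
        x = avg
    return x + details


def count_nonzero(coeffs, eps=1e-9):
    """Number of coefficients whose magnitude exceeds eps."""
    return sum(1 for c in coeffs if abs(c) > eps)


# A localized 'feature': a single impulse.
impulse = [0, 0, 1, 0, 0, 0, 0, 0]
# A 'pattern over the whole area': one DCT basis function (k = 3).
pattern = [math.cos(math.pi * (i + 0.5) * 3 / 8) for i in range(8)]

impulse_dct = count_nonzero(dct2(impulse))   # 8 of 8 coefficients
impulse_haar = count_nonzero(haar(impulse))  # 4 of 8 coefficients
pattern_dct = count_nonzero(dct2(pattern))   # 1 of 8 coefficients
pattern_haar = count_nonzero(haar(pattern))  # more than the DCT needs
```

The impulse spreads across every DCT coefficient but touches only one Haar coefficient per level, while the cosine pattern collapses to a single DCT coefficient and scatters across the Haar bands.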
|||
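The remark above that a temporal Haar wavelet is "essentially the same as simple frame differencing" can be checked with a few lines. This is a minimal sketch using an unnormalized Haar pair (average and difference); the scaling convention and function names are assumptions, not anything taken from DSV or DSV2.

```python
def temporal_haar(frame_a, frame_b):
    """One-level temporal Haar over a pair of frames (unnormalized).
    low is the per-pixel average, high is the per-pixel difference."""
    low = [[(a + b) / 2 for a, b in zip(ra, rb)]
           for ra, rb in zip(frame_a, frame_b)]
    high = [[a - b for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]
    return low, high


def inverse_temporal_haar(low, high):
    """Exact inverse of temporal_haar."""
    frame_a = [[l + h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    frame_b = [[l - h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    return frame_a, frame_b


def frame_difference(frame_a, frame_b):
    """Plain frame differencing, for comparison."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]


frame_a = [[10, 20], [30, 40]]
frame_b = [[10, 22], [30, 40]]

low, high = temporal_haar(frame_a, frame_b)
rec_a, rec_b = inverse_temporal_haar(low, high)
```

The high band is exactly the frame difference, with no motion model involved, which is consistent with the point that this approach was ineffective at temporal prediction.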
11th October 2024, 20:08 | #6 | Link |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,893
|
Thanks for your feedback! I love hearing about novel codec designs. I hadn't considered wavelets and DCT as inverses of each other. I'll need to spend some shower time ruminating on that.
I had tangential involvement with JPEG XR when I was at Microsoft. It had alternating DCT and wavelet spatial enhancement layers, which I thought was a fascinating approach. A fundamentally ineffective one, though. Some smart people spent a long time trying to make an encoder that would meaningfully outperform JPEG perceptually, which in theory was easy, but in practice wasn't accomplished. If that had worked, doing block-based motion estimation on the DCT layer and only using the wavelets for spatial enhancement could have been an interesting possibility. Alas, so many good ideas just don't work in the end. The Daala development notes are a fascinating journey through trying lots of novel, clever ideas and having most of them prove impractical. |
13th October 2024, 07:06 | #7 | Link |
Registered User
Join Date: Apr 2024
Posts: 14
|
Of course, and thank you for your curiosity!
With the little experience I have, I can naively say there is a good amount of potential in mixing wavelets and DCTs in a codec. And I agree, the Daala development pages are very interesting and document the various novel techniques quite well. They also often describe the fundamental problems they're trying to solve with each technique which helps newbies like myself understand what a codec needs to do. |
18th November 2024, 20:19 | #8 | Link | |
Registered User
Join Date: Apr 2024
Posts: 14
|
Since I last posted, I've made a few improvements to DSV2.
- Motion estimation uses RMSE instead of SAD
- Better mode decision
- Added a new psychovisual metric, found in hme.c, block_has_vis_edge() (may be interesting to a curious reader or other encoder developer out there)
- Replaced the high freq filter with something interesting and relatively unexplored (as far as I know) since ~1990, an asymmetric subband filter.
In my opinion, the most fascinating addition by far is the asymmetric subband filtering. Asymmetric in this context means the encoding is considerably slower than decoding. It is lossy, but such asymmetry is a rare thing in transform based coding where the inverse transform is typically just as complex as the forward transform.
Check out the pull request here: https://github.com/LMP88959/Digital-...Video-2/pull/9
Quote from the pull request:
Quote:
|
|
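For readers unfamiliar with the two block-matching metrics named in the changelog above, here is a small sketch of how SAD and RMSE can rank the same candidates differently. The thread does not say why DSV2 switched, so this only illustrates the general difference between the metrics; the block values and function names are invented for the example.

```python
import math


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def rmse(a, b):
    """Root mean squared error between two equally sized blocks."""
    n = sum(len(row) for row in a)
    sq = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return math.sqrt(sq / n)


target = [[100, 100], [100, 100]]
# Candidate A: a small error spread evenly over the block.
cand_a = [[104, 104], [104, 104]]
# Candidate B: three exact pixels and one large outlier.
cand_b = [[100, 100], [100, 114]]

sad_a, sad_b = sad(target, cand_a), sad(target, cand_b)      # 16 vs 14
rmse_a, rmse_b = rmse(target, cand_a), rmse(target, cand_b)  # 4.0 vs 7.0
```

SAD picks candidate B because its total error is smaller, while RMSE picks candidate A because squaring penalizes the single large outlier more heavily.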
Tags |
codec, open source, subband, wavelet |