#1 | Link |
|
Artem S. Tashkinov
Join Date: Dec 2006
Posts: 443
|
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
Pretrained latent diffusion models have shown strong potential for lossy image compression, owing to their powerful generative priors. Most existing diffusion-based methods reconstruct images by iteratively denoising from random noise, guided by compressed latent representations. While these approaches have achieved high reconstruction quality, their multi-step sampling process incurs substantial computational overhead. Moreover, they typically require training separate models for different compression bit-rates, leading to significant training and storage costs. To address these challenges, we propose a one-step diffusion codec across multiple bit-rates, termed OSCAR. Specifically, our method views compressed latents as noisy variants of the original latents, where the level of distortion depends on the bit-rate. This perspective allows them to be modeled as intermediate states along a diffusion trajectory. By establishing a mapping from the compression bit-rate to a pseudo diffusion timestep, we condition a single generative model to support reconstructions at multiple bit-rates. Meanwhile, we argue that the compressed latents retain rich structural information, thereby making one-step denoising feasible. Thus, OSCAR replaces iterative sampling with a single denoising pass, significantly improving inference efficiency. Extensive experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics. The code and models are available at https://github.com/jp-guo/OSCAR.
Paper
At least on the sample pictures, it looks exceptional.
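To make the "bit-rate to pseudo diffusion timestep" idea from the abstract more concrete, here is a minimal sketch in Python. It is not the OSCAR implementation. The linear-beta schedule, the SNR-matching rule in `pseudo_timestep`, and the uniform quantizer standing in for a real codec are all assumptions chosen for illustration. It only shows that a coarser (lower bit-rate) latent lands on a later, noisier timestep, and what a single denoising step looks like algebraically.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction of a linear-beta DDPM schedule (assumed, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def pseudo_timestep(z0, z_hat, alpha_bar):
    """Hypothetical mapping: pick the timestep whose nominal SNR is closest to the
    SNR of the compressed latent. A real codec would calibrate this offline per
    bit-rate, since the decoder never sees z0."""
    signal = np.mean(z0 ** 2)
    noise = max(np.mean((z_hat - z0) ** 2), 1e-12)
    snr_schedule = alpha_bar / (1.0 - alpha_bar)
    return int(np.argmin(np.abs(np.log(snr_schedule) - np.log(signal / noise))))

def one_step_denoise(z_t, eps_pred, t, alpha_bar):
    """Single-pass estimate of the clean latent from z_t and a predicted noise term."""
    a = alpha_bar[t]
    return (z_t - np.sqrt(1.0 - a) * eps_pred) / np.sqrt(a)

def quantize(z, step):
    """Uniform quantizer standing in for the compression stage (larger step = lower bit-rate)."""
    return np.round(z / step) * step

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 32, 32))   # toy "original latent"
alpha_bar = make_alpha_bar()

t_high_rate = pseudo_timestep(z0, quantize(z0, 0.1), alpha_bar)  # fine quantization
t_low_rate = pseudo_timestep(z0, quantize(z0, 1.0), alpha_bar)   # coarse quantization
print("high bit-rate -> t =", t_high_rate)
print("low bit-rate  -> t =", t_low_rate)
```

In a real system `eps_pred` would come from the timestep-conditioned network; here the function only shows the closed-form step that a single pass has to perform.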
#2 | Link |
|
Registered User
Join Date: Apr 2017
Posts: 60
|
I am a total newbie in AI/ML technologies, but I have read in the C3 codec paper that C3 needs a synthesis network to reach good image quality. Normally, when you upscale the latents, the resulting upscaled image does not have "optimal" quality, so in C3 they apply a synthesis network to the upscaled image to restore good quality.
I haven't read the OSCAR paper in depth, but when I look at their algorithm diagram, they show a VAE decoder as the final decoding step, after the latents are recovered. I guess that such a VAE decoder upscales the latents, but surely a VAE decoder also contains some kind of learned synthesis network that "enhances" the upscaled image?
#3 | Link |
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,147
|
I understand the romance of one-step all-AI codecs. But so much domain knowledge gets explicitly expressed in real-world codecs, which is ignored to the peril and detriment of those who think it isn't required.
Integrating AI techniques with domain expertise should outperform any solution based on just one of the two. There's been some interesting work leveraging ML in AV2 that's not making it into the initial profile but hopefully into a followup. Using GenAI in a big way will have a lot of challenges, because decoding needs to be extremely deterministic, which really isn't the strong suit of ML techniques, or anything based on floating point (as MPEG-2 showed us). And the more advanced the GenAI is for compression efficiency, the more plausible its generated hallucinations will be. It would cause all kinds of problems if a movie could have the text on a newspaper subtly altered to mean something different.
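As a small illustration of why floating point and strict determinism sit uneasily together, the following Python sketch sums the same three numbers in two different orders. The helper names and the fixed-point scale of 1000 are made up for this example. Floating-point addition is not associative, so a decoder whose reduction order differs between devices can produce slightly different values, whereas integer (fixed-point) arithmetic gives the same answer in any order.

```python
def sum_forward(values):
    """Accumulate left to right."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_reverse(values):
    """Accumulate right to left, as a differently scheduled reduction might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total

values = [0.1, 0.2, 0.3]
print(sum_forward(values))   # 0.6000000000000001
print(sum_reverse(values))   # 0.6

# Fixed-point version: the same quantities scaled to integers (scale chosen for illustration).
fixed = [100, 200, 300]
print(sum(fixed) == sum(reversed(fixed)))   # True, integer addition is order-independent
```

The difference here is a single unit in the last place, but in a codec with prediction loops such mismatches can accumulate from frame to frame.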
| Tags |
| artificial intelligence, diffusion, neural networks |