View Full Version : vs-basicvsr
PatchWorKs
16th November 2024, 11:57
@Selur how do you judge MIA-VSR?
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Recently, Vision Transformer has achieved great success in recovering missing details in low-resolution sequences, i.e., the video super-resolution (VSR) task. Despite its superiority in VSR accuracy, the heavy computational burden as well as the large memory footprint hinder the deployment of Transformer-based VSR models on constrained devices. In this paper, we address the above issue by proposing a novel feature-level masked processing framework: VSR with Masked Intra and inter-frame Attention (MIA-VSR). The core of MIA-VSR is leveraging feature-level temporal continuity between adjacent frames to reduce redundant computations and make more rational use of previously enhanced SR features. Concretely, we propose an intra-frame and inter-frame attention block which takes the respective roles of past features and input features into consideration and only exploits previously enhanced features to provide supplementary information. In addition, an adaptive block-wise mask prediction module is developed to skip unimportant computations according to feature similarity between adjacent frames. We conduct detailed ablation studies to validate our contributions and compare the proposed method with recent state-of-the-art VSR approaches. The experimental results demonstrate that MIA-VSR improves the memory and computation efficiency over state-of-the-art methods, without trading off PSNR accuracy.
https://raw.githubusercontent.com/LabShuHangGU/MIA-VSR/refs/heads/main/assets/Results.png
Git: https://github.com/LabShuHangGU/MIA-VSR
Paper: https://arxiv.org/abs/2401.06312
Selur
16th November 2024, 12:28
Haven't tried it. Since it's flownet based, maybe styler00dollar or HolyWu will write a Vapoursynth wrapper for it in the future.
poisondeathray
16th November 2024, 15:33
Maybe styler00dollar or HolyWu can port evtexture, currently the best 4x video super-resolution by PSNR: 32.93 on REDS4, 29.78 on Vid4.
https://github.com/dachunkai/evtexture
https://arxiv.org/abs/2406.13457
I can't get it to run on a custom video data set, because I can't get the event voxel flow grid generation step modified correctly. The author is supposed to provide an inference script for users' own video data, but has not posted it yet.
ReinerSchweinlin
17th November 2024, 15:11
This looks promising - at least from the example videos they show...
PatchWorKs
28th November 2024, 09:47
Some other upcoming interesting video-enhancing papers in CVPR-2024 (https://github.com/liuzhen03/awesome-video-enhancement?tab=readme-ov-file#cvpr-2024) and ECCV-2024 (https://github.com/liuzhen03/awesome-video-enhancement?tab=readme-ov-file#eccv-2024)...
poisondeathray
28th November 2024, 20:16
Thx for the heads up
I got FMA-Net from CVPR-2024 to work. The only pretrained model provided was trained on REDS.
https://github.com/KAIST-VICLab/FMA-Net
FWIW here is suzie in FFV1 in RGB (bgr0)
https://www.mediafire.com/file/f47ytaqkfx9cw6v/suzie_FMA-Net_Reds_ffv1.mkv/file
FMA-Net is significantly faster than something like basicvsr++ or vrt/rvrt, but it shows more aliasing and temporal flickering. (Many of the commonly used metrics, like PSNR/SSIM, are computed per frame and don't measure temporal characteristics such as temporal consistency artifacts.)
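To illustrate why a per-frame metric can miss flicker, here is a minimal sketch in plain Python. The `temporal_flicker` function is a hypothetical toy measure made up for this example (mean absolute change of the error between consecutive frames), not a standard metric and not something any of the projects above use. Frames are assumed to be flat lists of 8-bit pixel values.

```python
import math

def psnr(ref, test, peak=255.0):
    """Per-frame PSNR in dB between two flat lists of pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def temporal_flicker(ref_frames, test_frames):
    """Hypothetical toy measure: mean absolute change of the per-pixel
    error (test - ref) from one frame to the next. 0 means the error is
    stable over time."""
    errors = [[t - r for r, t in zip(rf, tf)]
              for rf, tf in zip(ref_frames, test_frames)]
    total, count = 0.0, 0
    for prev, cur in zip(errors, errors[1:]):
        total += sum(abs(c - p) for p, c in zip(prev, cur))
        count += len(cur)
    return total / count

# Four flat grey reference frames of 16 pixels each.
ref = [[100.0] * 16 for _ in range(4)]
# Output A: constant +2 offset on every frame (stable error).
steady = [[102.0] * 16 for _ in range(4)]
# Output B: offset alternates +2 / -2 from frame to frame (flicker).
flickering = [[100.0 + (2.0 if i % 2 == 0 else -2.0)] * 16 for i in range(4)]

psnr_steady = [psnr(r, t) for r, t in zip(ref, steady)]
psnr_flicker = [psnr(r, t) for r, t in zip(ref, flickering)]

print(psnr_steady == psnr_flicker)          # identical per-frame PSNR
print(temporal_flicker(ref, steady))        # stable error over time
print(temporal_flicker(ref, flickering))    # error changes every frame
```

Both outputs have the same squared error on every frame, so per-frame PSNR cannot tell them apart, while the frame-to-frame error change differs.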
One quirk I can't figure out is you lose the first and last frames
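A possible workaround, sketched in plain Python and untested against FMA-Net itself: duplicate the first and last frame before inference so that the frames that get dropped are the duplicates. This assumes the model loses exactly one frame at each end; `drops_edges` is a hypothetical stand-in for that behaviour, not FMA-Net code.

```python
def pad_edges(frames, n=1):
    """Return a new list with the first and last frame repeated n times."""
    if not frames:
        return []
    return [frames[0]] * n + list(frames) + [frames[-1]] * n

def drops_edges(frames, n=1):
    """Hypothetical stand-in for a model that loses n frames at each end."""
    return frames[n:len(frames) - n]

frames = ["f0", "f1", "f2", "f3"]

print(len(drops_edges(frames)))                 # two frames are lost
print(drops_edges(pad_edges(frames)) == frames) # padding restores the count
```

The padded edge frames carry no new temporal information, so the first and last output frames may still look weaker than the rest.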
I couldn't get the other ones that have published code to work on custom datasets.
PatchWorKs
11th December 2024, 11:41
More VSR, more fun !
StableVSR (https://github.com/claudiom4sir/StableVSR#readme) - Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models (ECCV 2024)
PatchWorKs
17th December 2024, 15:10
[I've deleted my previous reply with some other links]
OK, since I don't want to keep using this thread "as a notepad", I listed the VSR projects cited here (and some others) under VIDEO (https://github.com/FORARTfe/HyMPS#-1) \ AI-based page (https://github.com/FORARTfe/HyMPS/blob/main/Video/AI-based.md#--) \ Upscalers (https://github.com/FORARTfe/HyMPS/blob/main/Video/AI-based.md#upscalers-): feel free to add to it or fork it.
Some "real use" (video) shootout would be cool... :thanks:
Selur
17th January 2025, 16:51
If anyone tries Distance Ratio Based Adjuster for Animeinter (DRBA) (https://github.com/routineLife1/DRBA) through ccvfi (https://github.com/TensoRaws/ccvfi?tab=readme-ov-file), let me know how it compares to newer RIFE models.
Thanks!
Cu Selur
Z2697
17th January 2025, 19:57
DRBA is "a control mechanism for Video Frame Interpolation (VFI) networks specifically tailored for anime"; it's meant to be used with RIFE or other VFI nets.
Judging by the demo video on GitHub, it's not only tailored for anime, but for a quite specific type of anime scene: the background is moving and the foreground character is "semi-static" (she's changing pose but not actually moving, which is the typical kind of shot in anime that has duplicated "frames").
Selur
18th January 2025, 05:14
Thanks for clearing that up.
PatchWorKs
20th February 2025, 11:52
Better late than never: Happy New Year !
Just discovered that Intel has its "own" (server-oriented?) open source Video Super Resolution library:
Intel Library for Video Super Resolution consists of a few different algorithms, including machine learning and deep learning implementations, to offer a balance between quality and performance.
We have enhanced the public RAISR (Rapid and Accurate Image Super Resolution), an AI based Super Resolution algorithm https://arxiv.org/pdf/1606.01299.pdf, to achieve better visual quality and beyond real-time performance for 2x and 1.5x upscaling on Intel® Xeon® platforms and Intel® GPUs. Enhanced RAISR provides better quality results than standard (bicubic) algorithms and a good performance vs quality trade-off as compared to compute intensive DL-based algorithms.
Enhanced RAISR is provided as an FFmpeg plugin inside of a Docker container (Docker container only for CPU) to help ease testing and deployment burdens. This project is developed using C++ and takes advantage of Intel® Advanced Vector Extension 512 (Intel® AVX-512) on Intel® Xeon® Scalable Processor family and OpenCL support on Intel® GPUs.
Check out this interesting doc about performance/usage (https://github.com/OpenVisualCloud/Video-Super-Resolution-Library/blob/main/docs/performance.md) too.
Git: https://github.com/OpenVisualCloud/Video-Super-Resolution-Library#readme
PatchWorKs
29th March 2025, 21:40
Hi everyone, just discovered (and tested on a single image) S3Diff that looks VERY promising:
Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors. However, these methods still face two challenges: the requirement for dozens of sampling steps to achieve satisfactory results, which limits efficiency in real scenarios, and the neglect of degradation models, which are critical auxiliary information in solving the SR problem. In this work, we introduced a novel one-step SR model, which significantly addresses the efficiency issue of diffusion-based SR methods. Unlike existing fine-tuning strategies, we designed a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR, which corrects the model parameters based on the pre-estimated degradation information from low-resolution images. This module not only facilitates a powerful data-dependent or degradation-dependent SR model but also preserves the generative prior of the pre-trained diffusion model as much as possible. Furthermore, we tailor a novel training pipeline by introducing an online negative sample generation strategy. Combined with the classifier-free guidance strategy during inference, it largely improves the perceptual quality of the super-resolution results. Extensive experiments have demonstrated the superior efficiency and effectiveness of the proposed model compared to recent state-of-the-art methods.
Demo results: https://github.com/ArcticHare105/S3Diff#visual_comparison
Git: https://github.com/ArcticHare105/S3Diff#readme
Selur
30th March 2025, 06:00
Since it's aimed at images, not video, maybe we will end up with an onnx mask. :)