
View Full Version : Enhance! RAISR Sharp Images with Machine Learning



huhn
16th November 2017, 14:02
I was thinking of doing the extra work of taking 24-48 frames from a video file, scaling them all with this scaler, and turning them into a lossless RGB video file to show it in motion, but the rotation images show the issue more than enough.

madshi
16th November 2017, 14:51
Here is another example with significant artifacts.
How many neurons did you use for NNEDI3? I'm quite surprised by how much better NGU-AA (medium!!) quality is compared to NNEDI3 with this test image. I find NGU-AA to be not only more detailed, but it also has less ringing artifacts and looks more natural at the same time! I guess I should ask (in the madVR thread) once more if NNEDI3 is still needed/useful.

But for video, I think this multi-frame approach (Detail-revealing Deep Video Super-resolution) gives better results than any Single Image Super-Resolution
https://github.com/jiangsutx/SPMC_VideoSR :D:D
Yes, it looks good. Would be interesting to see how it would handle typical movie sources. Usually these types of algos are too slow for real time use, though.

I was thinking of doing the extra work of taking 24-48 frames from a video file, scaling them all with this scaler, and turning them into a lossless RGB video file to show it in motion, but the rotation images show the issue more than enough.
The RGB video would have been quite interesting, but it would have required the help of 5-10 users because the website limits the number of upscales for each user to max 5 images.
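
A rough sketch of that workflow and of the "5-10 users" estimate. The file names, the `frame_%04d.png` pattern and the choice of FFV1 as the lossless codec are assumptions for illustration only; nobody in the thread specified them. The functions only build ffmpeg command lines, they do not run anything.

```python
import math

IMAGES_PER_USER = 5  # per-user upscale limit of the website, as stated above


def users_needed(frame_count, per_user=IMAGES_PER_USER):
    """Number of accounts needed to upscale frame_count frames."""
    return math.ceil(frame_count / per_user)


def extract_frames_cmd(video, frame_count, pattern="frame_%04d.png"):
    """ffmpeg command that dumps the first frame_count frames as PNG files."""
    return ["ffmpeg", "-i", video, "-frames:v", str(frame_count), pattern]


def assemble_lossless_cmd(pattern, fps, output):
    """ffmpeg command that packs the upscaled PNGs into a lossless video.

    FFV1 is an assumed choice; any lossless codec that keeps RGB would do.
    """
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern,
            "-c:v", "ffv1", output]


if __name__ == "__main__":
    for n in (24, 48):
        print(n, "frames ->", users_needed(n), "users")
    print(" ".join(extract_frames_cmd("source.mkv", 48)))
    print(" ".join(assemble_lossless_cmd("upscaled_%04d.png", 24, "upscaled_rgb.mkv")))
```

With a limit of 5 images per user, 24 frames need 5 users and 48 frames need 10, which matches the estimate above.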

zub35
16th November 2017, 14:55
HolyWu, madshi

crowd_run in motion
original 1 (http://images2.imagebam.com/6b/3d/4e/e2aba5658688653.png) 2 (http://images2.imagebam.com/94/fc/3e/26da5c658688673.png) 3 (http://images2.imagebam.com/41/97/d3/49f4f4658688693.png) 4 (http://images2.imagebam.com/c1/ba/79/bf6a0a658688703.png) 5 (http://images2.imagebam.com/85/b9/34/7f2e43658688733.png) (downscale lanczos)
waifu2x 1 (http://images2.imagebam.com/11/fd/fd/1385fe658684953.png) 2 (http://images2.imagebam.com/12/88/6f/8f17d7658685003.png) 3 (http://images2.imagebam.com/e9/87/e6/38d0ec658687853.png) 4 (http://images2.imagebam.com/da/b1/c8/730f40658685103.png) 5 (http://images2.imagebam.com/56/af/d9/04e450658685153.png)
letsenhance 1 (http://images2.imagebam.com/2e/24/84/adb0c1658675553.png) 2 (http://images2.imagebam.com/08/86/cb/ecf989658675573.png) 3 (http://images2.imagebam.com/fb/21/34/2eb5f3658675613.png) 4 (http://images2.imagebam.com/da/4c/0d/c9b8b0658675653.png) 5 (http://images2.imagebam.com/f7/8a/f9/e1f876658675743.png)
If the system detects that you uploaded a compressed JPEG, it automatically applies an anti-JPEG neural network. While it is highly efficient in removing JPEG artifacts, it can also lead to some blurring. This is anticipated.
If you would like to avoid it, you can save your JPEG as PNG-24. This will skip anti-JPEGing, as well as preserve JPEG noise. If you are sure that this is something you need, we advise doing the PNG-24 thing.

madshi
16th November 2017, 15:17
@zub35, are you involved in that website? Or how did you manage to create more than 5 images? Would you mind sharing the original crowd run video? I think I only have an image, but not the video.

On a positive note, the texture in the big tree in the middle is relatively stable in motion. It's not completely stable, but it's better than I expected. However, the crowd in the very (right) background switches between blurry and sharp from one frame to the next, and some of the faces in the crowd are really scary looking, when enhanced with the hallucinated textures. Overall, I still think this algo won't work well for video.
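
The "blurry in one frame, sharp in the next" effect could be put into numbers with a simple per-frame sharpness measure. The sketch below uses the variance of a 4-neighbour Laplacian, which is a common generic sharpness proxy and not anything the website or madVR is known to use; the function names are made up for this example, and frames are assumed to be 2D grayscale arrays cropped to the region of interest (e.g. the crowd in the background).

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; higher means sharper."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def sharpness_jumps(frames):
    """Relative change in sharpness between consecutive frames."""
    s = np.array([laplacian_variance(f) for f in frames])
    return np.abs(np.diff(s)) / np.maximum(s[:-1], 1e-12)
```

Large values from `sharpness_jumps` on consecutive frames of the same region would indicate the kind of switching described above; a temporally stable scaler should keep them small.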

zub35
16th November 2017, 15:23
madshi, no, I am just an observer like you. As for uploading more than 5: as many as your number of email addresses and your conscience allow ;)
crowd_run and more test sample: https://media.xiph.org/video/derf/

madshi
16th November 2017, 15:33
Ah cool, thanks for the link. Some of those videos look familiar to me.

zub35
16th November 2017, 16:16
crowd_run in motion
original 1 (http://images2.imagebam.com/6b/3d/4e/e2aba5658688653.png) 2 (http://images2.imagebam.com/94/fc/3e/26da5c658688673.png) 3 (http://images2.imagebam.com/41/97/d3/49f4f4658688693.png) 4 (http://images2.imagebam.com/c1/ba/79/bf6a0a658688703.png) 5 (http://images2.imagebam.com/85/b9/34/7f2e43658688733.png) (downscale lanczos)
waifu2x 1 (http://images2.imagebam.com/11/fd/fd/1385fe658684953.png) 2 (http://images2.imagebam.com/12/88/6f/8f17d7658685003.png) 3 (http://images2.imagebam.com/e9/87/e6/38d0ec658687853.png) 4 (http://images2.imagebam.com/da/b1/c8/730f40658685103.png) 5 (http://images2.imagebam.com/56/af/d9/04e450658685153.png)
letsenhance 1 (http://images2.imagebam.com/2e/24/84/adb0c1658675553.png) 2 (http://images2.imagebam.com/08/86/cb/ecf989658675573.png) 3 (http://images2.imagebam.com/fb/21/34/2eb5f3658675613.png) 4 (http://images2.imagebam.com/da/4c/0d/c9b8b0658675653.png) 5 (http://images2.imagebam.com/f7/8a/f9/e1f876658675743.png)

Encode x264 medium crf=25 (https://cloud.mail.ru/public/Eh8g/XaN9eKaCH)
original 1 (http://images2.imagebam.com/28/ec/88/7dc5d2658744163.png) 2 (http://images2.imagebam.com/5b/6c/4b/a4b208658744173.png) 3 (http://images2.imagebam.com/ba/36/c6/80e99f658744213.png) 4 (http://images2.imagebam.com/09/4b/db/6c10d3658744233.png) 5 (http://images2.imagebam.com/d7/0e/50/bc67e1658744253.png)
lanczos4 1 (http://images2.imagebam.com/d3/7f/d8/e4b10f658764523.png) 2 (http://images2.imagebam.com/94/78/b9/8b6cc2658764603.png) 3 (http://images2.imagebam.com/3a/bd/0f/f2afba658764683.png) 4 (http://images2.imagebam.com/e8/05/6c/73b78c658764713.png) 5 (http://images2.imagebam.com/b0/64/47/ee7a7d658764743.png)
letsenhance PNG 1 (http://images2.imagebam.com/75/e1/ff/3a235e658748653.png) 2 (http://images2.imagebam.com/31/4d/05/edc5f9658748683.png) 3 (http://images2.imagebam.com/01/84/e0/664959658748723.png) 4 (http://images2.imagebam.com/b0/44/7f/824a8d658748753.png) 5 (http://images2.imagebam.com/7e/05/18/985314658748793.png)
letsenhance JPG(q100) 1 (http://images2.imagebam.com/a0/7a/ae/1d94e4658751503.png) 2 (http://images2.imagebam.com/10/8d/97/9ca700658751523.png) 3 (http://images2.imagebam.com/95/ff/71/e355c7658751563.png) 4 (http://images2.imagebam.com/8b/64/fc/30f8cf658751593.png) 5 (http://images2.imagebam.com/a8/3e/ad/f127d3658751613.png)

HolyWu, maybe you used the 1080 sample. Alas, it is a poor-quality downscale. Use the 2160 sample ;)

madshi
16th November 2017, 16:27
How does it compare to madVR's "remove compression artifacts"?

zub35
16th November 2017, 16:42
madshi, try it yourself; I might make mistakes in choosing the filters. The file to download is in the post above.
p.s. Private Messages ;)

UPD: From a technical standpoint, it would be better to train a new neural network for restoration / artifact removal by comparing, frame by frame, the source file with its compressed versions (several thousand variations, to cover the different codecs and settings).
And then use this neural network alongside the upscaler, or combine both into one big neural network.
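
As an illustration of how such a set of compressed variations could be enumerated, here is a minimal sketch that builds ffmpeg command lines over a small settings grid. The grid itself (encoders, presets, CRF range) and the output naming are hypothetical; the post only says "several thousand variations". Each encoded frame would then be paired with the matching source frame as (input, target) for training.

```python
from itertools import product

# Hypothetical grid, chosen only for illustration.
ENCODERS = ("libx264", "libx265")
PRESETS = ("fast", "medium", "slow")
CRF_VALUES = range(18, 35, 2)  # 18, 20, ..., 34


def degraded_variants(source):
    """Yield (output_name, ffmpeg_cmd) for every setting in the grid."""
    for enc, preset, crf in product(ENCODERS, PRESETS, CRF_VALUES):
        out = f"{enc}_{preset}_crf{crf}.mkv"
        cmd = ["ffmpeg", "-i", source, "-c:v", enc,
               "-preset", preset, "-crf", str(crf), out]
        yield out, cmd


if __name__ == "__main__":
    variants = list(degraded_variants("source.y4m"))
    print(len(variants), "variants, e.g.:", " ".join(variants[0][1]))
```

This small grid gives 2 x 3 x 9 = 54 variants per source clip; reaching thousands would mean more codecs, more settings, or more clips.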

Based on the above, and provided the algorithm is stable: add to the video file's container (mkv), or to a separate *.neural file, the minimum data needed for the neural network to run fast / in real time in players.
That way there is no need to even create a new video standard; it can be applied to existing AVC or HEVC. Accordingly, the video stays playable on devices that lack the power to run the neural network.

ABDO
16th November 2017, 22:34
I find NGU-AA to be not only more detailed, but it also has less ringing artifacts and looks more natural at the same time! I guess I should ask (in the madVR thread) once more if NNEDI3 is still needed/useful.
For me, I have never used NNEDI3 since NGU-AA came to madVR. NGU-AA looks more natural, as you said, and is much faster at the same time :D

Would be interesting to see how it would handle typical movie sources. Usually these types of algos are too slow for real time use, though.
Yeah, it is too slow for real-time use. I will train the network soon if the author does not release the pretrained model, and I hope it gives good results on typical movie sources.

How does it compare to madVR's "remove compression artifacts"?
It is equal to madVR RCA at a value of 6, but madVR is also much, much faster. I will upload some comparison images soon.
edit:
All images upscaled with NGU Sharp VH

jpg Source
https://postimg.org/image/pcaefvqfd/

madvr RCA6-high 1080p
https://postimg.org/image/qtvurl0cp/

letsenhance-anti-jpeg source
https://postimg.org/image/qv5skkewp/

letsenhance-anti-jpeg 1080p
https://postimg.org/image/ay70o19pl/

I think RCA at 6 treats JPEG artifacts as effectively as letsenhance-anti-jpeg, but it is clear that madVR RCA excels at preserving more sharpness and texture detail. Note also that madVR RCA gives this result in real time. For now, madVR RCA definitely does a better job than letsenhance-anti-jpeg, since we do not have any control over the letsenhance-anti-jpeg strength. (Sorry if I misunderstood your question.)

zub35
17th November 2017, 14:43
ABDO, he meant: apply artifact-removal filters to the compressed video from my example (crf=25), upload the results as PNG and JPEG, and compare them with the results without artifact removal.

But I think (not tested) that removing the artifacts will prevent the algorithm from recovering the details, since the compression artifacts carry information about the original content.
The presence of an artifact indicates that something different was at that spot, unlike places where there are no artifacts.

ABDO
17th November 2017, 16:45
ABDO, he meant: apply artifact-removal filters to the compressed video from my example (crf=25), upload the results as PNG and JPEG, and compare them with the results without artifact removal.
I am sorry, I really did not understand the question correctly.

But I think (not tested) that removing the artifacts will prevent the algorithm from recovering the details, since the compression artifacts carry information about the original content.
The presence of an artifact indicates that something different was at that spot, unlike places where there are no artifacts.
Yeah, unfortunately I am not sure, as I am not a technical person or programmer, but as you said, the technology is extremely promising, so I think the future will bring improvements to it.

feisty2
22nd November 2017, 16:41
Very good, thank you! I've de-rotated/de-mirrored your images and here they are for easy comparison:

no modification (http://madVR.com/doom9/gan/clownGanOrg.png) - | - rotated left (http://madVR.com/doom9/gan/clownGanRotLeft.png) - | - rotated right (http://madVR.com/doom9/gan/clownGanRotRight.png) - | - mirrored horizontally (http://madVR.com/doom9/gan/clownGanMirrorHorz.png)

If you compare these images, you'll see that the texture changes a lot in all 4 images, it's completely different in each frame. It still has an overall similar look to it, but the changes are still much bigger than any dithering, so in motion this will look extremely noisy/unstable.

For still images it might not matter too much, but for video this type of "texture hallucination" is IMHO currently not feasible in motion, because it is not stable when the image content changes slightly. I'm not sure if the algorithm could be changed to fix this problem. I kind of doubt it because the algo by design doesn't even try to restore the original texture (which is technically impossible, anyway), it just tries to hallucinate a texture which hopefully has a similar look to the texture the original hi-res image had before downscaling. So the algo is by design not able to maintain a stable "position" of the texture in motion.

Even worse, if you look at the very bottom of the image, the asphalt texture is changing its brightness very strongly from frame to frame, this will actually produce visible flickering in motion. That said, these brightness fluctuations should be fixable with better neural network training.
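
The rotate / upscale / de-rotate comparison and the brightness fluctuation described in the quote above can both be expressed as simple numeric checks. This is only a sketch: `upscale` is a placeholder for whatever scaler is being tested, images are assumed to be square 2D arrays, and `nearest_2x` is a stand-in (plain pixel doubling) included just so the check can be exercised.

```python
import numpy as np


def rotation_inconsistency(upscale, image):
    """Mean absolute difference between upscale(image) and the
    de-rotated result of upscaling a 90-degree rotated copy."""
    direct = upscale(image)
    via_rotation = np.rot90(upscale(np.rot90(image, k=1)), k=-1)
    return float(np.mean(np.abs(direct - via_rotation)))


def brightness_fluctuation(frames):
    """Standard deviation of the mean brightness across frames of one region."""
    means = np.array([np.mean(f) for f in frames])
    return float(means.std())


def nearest_2x(image):
    """Stand-in upscaler (pixel doubling), used only to exercise the check."""
    return np.kron(image, np.ones((2, 2)))


if __name__ == "__main__":
    img = np.arange(16, dtype=np.float64).reshape(4, 4)
    print(rotation_inconsistency(nearest_2x, img))
```

Pixel doubling gives exactly 0 here because it treats every orientation the same way. A scaler that hallucinates different texture for each orientation would give a clearly non-zero value, and the asphalt flicker would show up as a large `brightness_fluctuation` over the same crop in consecutive frames.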

The loss function defined in SRGAN has 2 parts: content loss and adversarial loss. The content loss is defined as a perceptual loss, which is a high-level VGG feature loss rather than a pixel loss (SAD/MSE). The adversarial loss tries to make the reconstructed image look as close as possible to a native high-res image in general, and probably 80% of the magic comes from this part, so this part stays put. A different content loss function would not affect the "hi-res" magic much, but it would determine the level of "richness" of the details in the generated image. So here you could remove the perceptual loss function and replace it with MSE: the perceptual loss gives rich but unstable details, while MSE gives blurry but stable details. Adversarial loss paired with MSE would give you a slightly blurry and stable result that still looks native high resolution in general. I guess you could try this and see if it works out alright.
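
A minimal numeric sketch of that loss structure, assuming NumPy arrays for the images and discriminator outputs given as probabilities. The 1e-3 weight on the adversarial term is the one used in the SRGAN paper; everything else (function names, the `features` callable standing in for a VGG feature extractor, averaging instead of summing over the batch) is an assumption for illustration, not the paper's code.

```python
import numpy as np

ADVERSARIAL_WEIGHT = 1e-3  # weighting of the adversarial term in the SRGAN paper


def mse(a, b):
    """Mean squared error between two arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def adversarial_loss(d_fake):
    """Generator-side adversarial term: -log D(G(lr)), averaged over the batch.

    d_fake holds the discriminator's probabilities that the outputs are real.
    """
    p = np.clip(np.asarray(d_fake, dtype=np.float64), 1e-12, 1.0)
    return float(np.mean(-np.log(p)))


def generator_loss(sr, hr, d_fake, features=None):
    """Content loss plus weighted adversarial loss.

    features=None -> pixel MSE content loss (the MSE + adversarial variant).
    features=fn   -> MSE between fn(sr) and fn(hr), i.e. a perceptual loss;
                     fn is a placeholder for a VGG feature extractor.
    """
    if features is None:
        content = mse(sr, hr)
    else:
        content = mse(features(sr), features(hr))
    return content + ADVERSARIAL_WEIGHT * adversarial_loss(d_fake)
```

Swapping the content loss is then just a matter of passing or omitting `features`; the adversarial part stays unchanged, which is the point made above.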

madshi
22nd November 2017, 16:49
Thanks. Do you happen to know any papers/projects which do it that way? Would like to see if the resulting images look pleasing enough to make it worth spending the time.

feisty2
22nd November 2017, 17:31
Thanks. Do you happen to know any papers/projects which do it that way? Would like to see if the resulting images look pleasing enough to make it worth spending the time.

See page 8 of the original SRGAN paper (https://arxiv.org/pdf/1609.04802.pdf), where the authors compare the results of different loss functions:
https://s7.postimg.org/atgl1sxp7/Untitled.png
SRResNet: MSE
SRGAN-MSE: adversarial loss + MSE (what I suggested)
SRGAN-VGG22: adversarial loss + lower level of perceptual loss
SRGAN-VGG54: adversarial loss + higher level of perceptual loss

madshi
22nd November 2017, 17:54
Thanks, doesn't look too bad. Might be worth a try. It does seem to do some sort of texture sharpening, though. I wonder what that will do to video sources with compression artifacts... :scared:

zub35
11th May 2018, 19:21
Neural network used in the restoration of old films
(rus) https://yandex.ru/blog/company/oldfilms
example: 1 (https://www.kinopoisk.ru/film/raduga-1943-42037/#!watch-film/4ebe422d3c94cb48a0d1a9360b0e905b/kp) 2 (https://www.kinopoisk.ru/film/letyat-zhuravli-1957-7724/#!watch-film/45e3a48fd611065491ffef684afd26c7/kp) 3 (https://www.kinopoisk.ru/film/dorogoy-moy-chelovek-1958-44291/#!watch-film/449cb06050ed36018c823eeb42171eed/kp) 4 (https://www.kinopoisk.ru/film/sudba-cheloveka-1959-44027/#!watch-film/408967ac7a2ce635bbe5f6252bbadebd/kp)

poisondeathray
13th May 2018, 02:07
Neural network used in the restoration of old films
(rus) https://yandex.ru/blog/company/oldfilms
example: 1 (https://www.kinopoisk.ru/film/raduga-1943-42037/#!watch-film/4ebe422d3c94cb48a0d1a9360b0e905b/kp) 2 (https://www.kinopoisk.ru/film/letyat-zhuravli-1957-7724/#!watch-film/45e3a48fd611065491ffef684afd26c7/kp) 3 (https://www.kinopoisk.ru/film/dorogoy-moy-chelovek-1958-44291/#!watch-film/449cb06050ed36018c823eeb42171eed/kp) 4 (https://www.kinopoisk.ru/film/sudba-cheloveka-1959-44027/#!watch-film/408967ac7a2ce635bbe5f6252bbadebd/kp)

This is cool, thanks

Any more info on what process was used? Some things were lost in the Google translation, too.