13th November 2017, 23:18 | #41 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
Ok, let's wait and see what the future brings. The current situation is that GPUs would probably have to be about 20-50x faster than they are now to apply waifu2x-sized neural networks to video in real time. And that's just single-frame processing, not yet taking temporal information into account.
Of course, for offline processing, real-time performance is not needed.
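For a rough sense of where a 20-50x figure can come from, here's a back-of-envelope estimate in Python. The layer widths are an assumed waifu2x-like stack of 3x3 convolutions (illustrative, not the model's exact definition), and the 3 TFLOPS figure is an assumed sustained throughput for a 2017-era GPU, not a measurement:
Code:

# Back-of-envelope: cost of a waifu2x-like CNN on video, per second.
# Assumed 7-layer, 3x3-convolution stack (illustrative, not the real model):
channels = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128), (128, 3)]

# A 3x3 convolution costs ~2 * 9 * c_in * c_out FLOPs per output pixel.
flops_per_pixel = sum(2 * 9 * c_in * c_out for c_in, c_out in channels)

width, height, fps = 3840, 2160, 24  # e.g. 2x upscale of 1080p film content
flops_per_second = flops_per_pixel * width * height * fps

sustained_tflops = 3.0  # assumed realistic sustained rate for a 2017-era GPU
shortfall = flops_per_second / (sustained_tflops * 1e12)
print(f"{flops_per_pixel / 1e3:.0f} kFLOPs/pixel, "
      f"{flops_per_second / 1e12:.0f} TFLOPS needed, ~{shortfall:.0f}x short")
With these assumptions the shortfall comes out around 38x, squarely in that 20-50x ballpark.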
14th November 2017, 21:47 | #44 | Link | |
Registered User
Join Date: Mar 2016
Posts: 160
Quote:
__________________
I'm infected with poor sources.
16th November 2017, 10:23 | #45 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
One problem with hallucinated texture detail is that it's most probably not "stable" in motion. This means that if there's only a small change in the video image, the hallucinated texture detail may shift its position while still having a similar "look" to it. In motion this will probably look like very strong dithering noise. In extreme cases it could even turn into flickering.
Unfortunately I've already spent my 5 allowed images on the website, so I can't test that. Maybe someone else can try upscaling the clown image shifted by 1 pixel? Or window boxed by 1 pixel? Or mirrored or rotated? That would be a quick way to check how "stable" the hallucinated textures are.
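For anyone with a local super-resolution model, the shift/mirror test can be scripted. The letsenhance.io site has no public API, so upscale() below is a stand-in (a trivially stable nearest-neighbour 4x placeholder) to be swapped for the real upscaler, and clown.png is a hypothetical filename:
Code:

import numpy as np
from PIL import Image

def upscale(img: np.ndarray) -> np.ndarray:
    # Placeholder: nearest-neighbour 4x, which is perfectly shift-stable
    # and should report differences of ~0. Swap in the real model here.
    return np.repeat(np.repeat(img, 4, axis=0), 4, axis=1)

def mad(a: np.ndarray, b: np.ndarray) -> float:
    # Mean absolute difference between two same-sized images.
    return float(np.mean(np.abs(a.astype(float) - b.astype(float))))

src = np.asarray(Image.open("clown.png").convert("RGB"))
base = upscale(src)

# Shift the input right by 1 pixel (circularly), upscale, then shift the
# result back by 4 pixels so both outputs are aligned for comparison.
shifted = upscale(np.roll(src, 1, axis=1))
print("shift :", mad(base, np.roll(shifted, -4, axis=1)))

# Mirror the input, upscale, mirror the result back.
mirrored = upscale(src[:, ::-1])
print("mirror:", mad(base, mirrored[:, ::-1]))
A positionally stable upscaler gives values near zero; large values mean the hallucinated texture jumped around.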
16th November 2017, 11:43 | #46 | Link | |
Moderator
Join Date: Feb 2005
Location: Spain
Posts: 6,915
Quote:
To see in movement, the order of the letsenhance.io images was c5-0p.png
__________________
BeHappy, AviSynth audio transcoder.
16th November 2017, 12:06 | #47 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Thank you! So it seems simple pixel shifting does no harm. Well, I suppose that makes sense, because convolutional neural networks aren't really position dependent. So window boxing wouldn't make any difference, either. Argh, I should have thought of that! Maybe adding a very small amount of noise would be interesting, or changing image brightness ever so slightly, or rotating by 1 degree, or something like that?
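That position independence can be checked directly: convolving a shifted image gives the shifted convolution of the original (exactly so with wrap-around borders, and away from the edges otherwise). A toy demonstration, with a single random filter standing in for a network layer:
Code:

import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
img = rng.random((64, 64))   # stand-in image
kernel = rng.random((3, 3))  # stand-in 3x3 convolution filter

# Convolving a shifted image...
a = convolve(np.roll(img, 1, axis=1), kernel, mode="wrap")
# ...equals shifting the convolved image.
b = np.roll(convolve(img, kernel, mode="wrap"), 1, axis=1)

print(np.allclose(a, b))  # True: convolution commutes with translation
This is why the shift and window-box tests were bound to come out clean: a pure stack of convolutions can't anchor its output to absolute pixel positions. Noise, brightness, and rotation changes are not covered by that symmetry, so those tests are more telling.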
16th November 2017, 12:57 | #49 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Very good, thank you! I've de-rotated/de-mirrored your images and here they are for easy comparison:
no modification - | - rotated left - | - rotated right - | - mirrored horizontally

If you compare these images, you'll see that the texture changes a lot across all 4 images; it's completely different in each frame. It still has an overall similar look, but the changes are much bigger than any dithering, so in motion this will look extremely noisy/unstable. For still images it might not matter too much, but for video this type of "texture hallucination" is IMHO currently not feasible, because it is not stable when the image content changes slightly.

I'm not sure if the algorithm could be changed to fix this problem. I kind of doubt it, because the algo by design doesn't even try to restore the original texture (which is technically impossible, anyway); it just tries to hallucinate a texture which hopefully has a similar look to the texture the original hi-res image had before downscaling. So the algo is by design not able to maintain a stable "position" of the texture in motion.

Even worse, if you look at the very bottom of the image, the asphalt texture changes its brightness very strongly from frame to frame; this will actually produce visible flickering in motion. That said, these brightness fluctuations should be fixable with better neural network training.
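The asphalt flicker can also be quantified instead of eyeballed: track the mean brightness of the bottom strip across the four de-rotated frames and look at the frame-to-frame jumps. The filenames below are hypothetical placeholders for the de-rotated images, and the bottom-10% crop is an arbitrary choice of region:
Code:

import numpy as np
from PIL import Image

# Hypothetical filenames for the de-rotated/de-mirrored upscales
frames = ["clown_none.png", "clown_rotl.png", "clown_rotr.png", "clown_mirror.png"]

means = []
for name in frames:
    img = np.asarray(Image.open(name).convert("L"), dtype=float)
    bottom = img[int(img.shape[0] * 0.9):, :]  # bottom 10%: the asphalt strip
    means.append(bottom.mean())

# Frame-to-frame brightness jumps; a few gray levels or more on a 0-255
# scale would be clearly visible as flicker in motion.
deltas = np.abs(np.diff(means))
print("mean brightness:", [f"{m:.1f}" for m in means])
print("max jump:", f"{deltas.max():.1f}")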
16th November 2017, 13:59 | #50 | Link | |
Registered User
Join Date: Dec 2016
Posts: 65
Quote:
For still images it really does do very well. I tried most of these single-image super-resolution methods, and most of them give results similar to the "boring" version; only SRGAN can give a sharp result very similar to the "magic" version, but with less accuracy in the hallucinated texture detail. For video, though, I think this multi-frame approach (Detail-revealing Deep Video Super-Resolution) gives better results than any single-image super-resolution: https://github.com/jiangsutx/SPMC_VideoSR
Last edited by ABDO; 16th November 2017 at 14:04.
16th November 2017, 14:02 | #51 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,923
I was thinking of doing the extra work of taking 24-48 frames from a video file, scaling them all with this scaler, and turning them into a lossless RGB video file to show it in motion, but the rotation images show the issue more than enough.
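For anyone who does want to build such a motion test, the mechanical part is straightforward with ffmpeg (driven from Python here; the filenames are placeholders, the frames/ and upscaled/ directories are assumed to exist, and the middle step of running each PNG through the web upscaler remains manual):
Code:

import subprocess

# 1) Dump the first 48 frames of the source as PNGs.
subprocess.run(["ffmpeg", "-i", "source.mkv", "-vframes", "48",
                "frames/%03d.png"], check=True)

# 2) Manual step: upscale each PNG with the web scaler into upscaled/.

# 3) Reassemble the upscaled PNGs into a lossless RGB video;
#    libx264rgb at -qp 0 is mathematically lossless in RGB.
subprocess.run(["ffmpeg", "-framerate", "24", "-i", "upscaled/%03d.png",
                "-c:v", "libx264rgb", "-qp", "0", "motion_test.mkv"], check=True)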
16th November 2017, 14:51 | #52 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
How many neurons did you use for NNEDI3? I'm quite surprised by how much better NGU-AA (medium!!) quality is compared to NNEDI3 with this test image. I find NGU-AA to be not only more detailed, but it also has fewer ringing artifacts and looks more natural at the same time! I guess I should ask (in the madVR thread) once more whether NNEDI3 is still needed/useful.
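For reference, in the VapourSynth world the NNEDI3 neuron count is a single parameter, so the comparison is easy to reproduce outside madVR. A minimal image-doubling sketch, assuming the nnedi3 and imwri plugins are installed and using a hypothetical test image (this is the community plugin, not madVR's internal implementation):
Code:

import vapoursynth as vs
core = vs.core

clip = core.imwri.Read("clown.png")  # hypothetical test image

def double(c):
    # nns selects the predictor size: 0=16, 1=32, 2=64, 3=128, 4=256 neurons
    c = core.nnedi3.nnedi3(c, field=1, dh=True, nns=4)  # double height
    c = core.std.Transpose(c)
    c = core.nnedi3.nnedi3(c, field=1, dh=True, nns=4)  # double width
    return core.std.Transpose(c)

up = double(clip)  # note: dh=True introduces a half-pixel shift that a
                   # final resize step would normally compensate for
up.set_output()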
Quote:
The RGB video would have been quite interesting, but it would have required the help of 5-10 users, because the website limits the number of upscales for each user to a maximum of 5 images.
16th November 2017, 14:55 | #53 | Link | |
Registered User
Join Date: Oct 2016
Posts: 56
HolyWu, madshi
crowd_run in motion:
original 1 2 3 4 5 (downscale lanczos)
waifu2x 1 2 3 4 5
letsenhance 1 2 3 4 5
Quote:
Last edited by zub35; 16th November 2017 at 15:17.
16th November 2017, 15:17 | #54 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
@zub35, are you involved in that website? Or how did you manage to create more than 5 images? Would you mind sharing the original crowd run video? I think I only have an image, but not the video.
On a positive note, the texture in the big tree in the middle is relatively stable in motion. It's not completely stable, but it's better than I expected. However, the crowd in the far background (on the right) switches between blurry and sharp from one frame to the next, and some of the faces in the crowd look really scary when enhanced with hallucinated textures. Overall, I still think this algo won't work well for video.
16th November 2017, 15:23 | #55 | Link |
Registered User
Join Date: Oct 2016
Posts: 56
madshi, no, I'm just an observer like you. As for uploading more than 5: you can do that as far as the number of your email addresses and your conscience will allow.
crowd_run and more test samples: https://media.xiph.org/video/derf/
Last edited by zub35; 16th November 2017 at 15:35.
16th November 2017, 16:16 | #57 | Link | |
Registered User
Join Date: Oct 2016
Posts: 56
Quote:
original 1 2 3 4 5
lanczos4 1 2 3 4 5
letsenhance PNG 1 2 3 4 5
letsenhance JPG (q100) 1 2 3 4 5
HolyWu: maybe you used the 1080p sample. Alas, its downscale is of poor quality. Use the 2160p sample.
Last edited by zub35; 16th November 2017 at 16:39.
16th November 2017, 16:42 | #59 | Link |
Registered User
Join Date: Oct 2016
Posts: 56
madshi, try it; I can make mistakes with the choice of filters. The file to download is in the post above.
p.s. see Private Messages.
UPD: From a technical standpoint, it would be better to create a new neural network trained by comparing, frame by frame, the source file against compressed versions (several thousand variations) for restoration/removal of artifacts (covering different codecs and settings), and then apply this neural network before upscaling. Or combine them into one big neural network.
Building on the above, and provided the algorithm is stable: add to the video file's container (mkv), or to a separate *.neural file, the minimal data needed for fast/realtime operation of the neural network in players. That way you wouldn't even have to create a new video standard; it could be applied to existing AVC or HEVC streams, while retaining the ability to play the video on machines without the processing power for the neural network.
Last edited by zub35; 16th November 2017 at 18:32.
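A minimal sketch of the training-data half of that idea: generate (degraded, pristine) pairs by re-encoding a source across codecs and quality settings. The filenames and the settings grid are illustrative assumptions:
Code:

import itertools
import subprocess

SRC = "pristine.mkv"  # hypothetical clean source

# Re-encode the source across a grid of codecs and quality settings to
# build (degraded, original) pairs for training an artifact-removal net.
for codec, crf in itertools.product(["libx264", "libx265"], [18, 23, 28, 33, 38]):
    out = f"degraded_{codec}_crf{crf}.mkv"
    subprocess.run(["ffmpeg", "-y", "-i", SRC, "-c:v", codec,
                    "-crf", str(crf), "-an", out], check=True)

# A restoration network would then be trained on decoded frames from each
# degraded file against the matching frames of the pristine source.
A production version would of course need far more degradation variety (scaling, different encoders and rate-control modes), which is the "several thousand variations" part of the idea.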
16th November 2017, 22:34 | #60 | Link |
Registered User
Join Date: Dec 2016
Posts: 65
Quote:
Quote:
Quote:
edit: All images upscaled with NGU Sharp VH.
jpg source: https://postimg.org/image/pcaefvqfd/
madVR RCA 6-high 1080p: https://postimg.org/image/qtvurl0cp/
letsenhance anti-jpeg source: https://postimg.org/image/qv5skkewp/
letsenhance anti-jpeg 1080p: https://postimg.org/image/ay70o19pl/
I think that while RCA (x6) treats jpeg artifacts as effectively as letsenhance's anti-jpeg, it is clear that madVR's RCA excels at preserving more sharpness and texture detail. Note that madVR's RCA gives this result in real time. For now, madVR's RCA definitely does a better job than letsenhance anti-jpeg, since we don't have any control over the anti-jpeg strength. (Sorry if I misunderstood your question.)
Last edited by ABDO; 17th November 2017 at 03:00.