Welcome to Doom9's Forum, THE place to be for everyone interested in DVD conversion. Before you start posting, please read the forum rules. By posting to this forum you agree to abide by them.
15th November 2016, 21:09 | #1 | Link |
Anime addict
Join Date: Feb 2009
Location: Spain
Posts: 673
|
Enhance! RAISR Sharp Images with Machine Learning
__________________
Intel i7-6700K + Noctua NH-D15 + Z170A XPower G. Titanium + Kingston HyperX Savage DDR4 2x8GB + Radeon RX580 8GB DDR5 + ADATA SX8200 Pro 1 TB + Antec EDG750 80 Plus Gold Mod + Corsair 780T Graphite |
15th November 2016, 23:59 | #2 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
[Comparison images (stripped): Original, Spline36Resize (4x), NNEDI3 (4x), NNEDI3 (4x) + Sharpen, Theirs]
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 16th November 2016 at 00:04. |
16th November 2016, 12:50 | #4 | Link | |
brontosaurusrex
Join Date: Oct 2001
Posts: 2,392
|
from https://arxiv.org/abs/1606.01299
16th November 2016, 13:58 | #5 | Link |
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
|
I'm a bit disappointed, because neural networks don't seem to work so well IMHO (at least for now)...
Sure, it won't be possible to truly "recover" the image at a higher resolution, but the guesswork, which is what a neural network does, is also not good enough to fool my eyes.
16th November 2016, 19:31 | #6 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
The most common image scaling methods, like bilinear, bicubic or Lanczos interpolation, are also "single image", i.e. they process each frame as an independent picture. This applies to plain NNEDI3 just as well. Whether their method is fast enough to be useful for (real-time) video upscaling is a whole different question, though...
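To illustrate what "single image" processing means here (this is only a sketch, not the paper's code; a naive bilinear 2x upscale stands in for any of the mentioned filters), the upscaler is a pure function of one frame, so a video is handled frame by frame with no temporal dependency:

```python
import numpy as np

def upscale2x_bilinear(frame: np.ndarray) -> np.ndarray:
    """Naive 2x bilinear upscale of a single 2-D (grayscale) frame."""
    h, w = frame.shape
    # source-grid coordinates for each target pixel (pixel-center aligned)
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None]
    wx = np.clip(xs - x0, 0, 1)[None, :]
    # blend the four neighbouring source pixels
    top = frame[y0][:, x0] * (1 - wx) + frame[y0][:, x1] * wx
    bot = frame[y1][:, x0] * (1 - wx) + frame[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def upscale_video(frames):
    # "single image" processing: the output for frame i depends
    # only on input frame i, never on neighbouring frames
    return [upscale2x_bilinear(f) for f in frames]

frames = [np.full((4, 4), 7.0), np.zeros((4, 4))]
upscaled = upscale_video(frames)
```

A temporally-aware upscaler would instead look at several neighbouring frames at once; none of the filters discussed above do that.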
Last edited by LoRd_MuldeR; 16th November 2016 at 19:45.
16th November 2016, 20:53 | #7 | Link |
brontosaurusrex
Join Date: Oct 2001
Posts: 2,392
|
LoRd_MuldeR: What I meant was that two slightly different frames could end up using completely different resizing strategies, due to the brute-force methodology used here (which may look weird when animated). But thanks for that explanation anyway, especially the part about "video is nothing but a sequence of single images"; that got me giggling.
|
16th November 2016, 21:23 | #8 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
You start with an initial (random) network and then "train" it with pairs of input data and corresponding (optimal) output data, aka the "training set" – in this case, pairs of low-resolution images and the corresponding high-resolution images. In the end, you get a network that (hopefully) produces "good" results even for unknown inputs – in this case, a network that approximates a "sharp and natural-looking" high-resolution image from an (unknown) low-resolution image. This is more or less exactly how NNEDI3 (and its predecessors) were created!

I don't think there is reason to assume that this approach will necessarily create a highly discontinuous filter, i.e. a filter that produces completely different outputs even for very small deviations of the input. At least NNEDI3 shows the opposite: it works pretty well for (progressive) video upscaling, doesn't it?

(Note: The reason why NNEDI3 alone is not a good double-rate deinterlacer and produces a notable "bobbing" effect is a different one – in that scenario, the images are alternately approximated from the "odd" and the "even" lines.)
Last edited by LoRd_MuldeR; 16th November 2016 at 21:33.
17th November 2016, 17:12 | #10 | Link |
Registered User
Join Date: Apr 2008
Posts: 418
|
Theirs:
SuperRes(2, 1, 0, """nnedi3_rpow2(rfactor=4, nns=4, cshift="Spline16Resize")""")
2xSuperResXBR(2, .6, xbrStr=2.3, xbrSharp=1.2)
2xSuperResXBR(1, .7, xbrStr=.1, xbrSharp=.7)

Last edited by Gser; 17th November 2016 at 17:15.
17th November 2016, 19:39 | #11 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
You present your current network with an input/output pair, the training sample. In this case, the "input" would be a low-resolution version of the original image, and the "output" is the original high-resolution image.

You let your network create its own (high-res) output image from the given (low-res) "input" image, and then you compare that against the given optimal "output" image (the original). Of course, there will be some difference between the network's actual output and the optimal (desired) output – especially at the beginning of the training phase. This difference, or "error", will be used to update (improve) the network, so that the error is reduced. For example, one approach is to let the "error" propagate through the network in the backwards direction and adjust the individual weights accordingly.

You repeat this process with many training samples (input/output pairs). In the end, you get a network that (hopefully) produces good results, even for unknown inputs.
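That loop can be sketched in a few lines of numpy (a hypothetical toy model, not NNEDI3's or RAISR's actual code): here the whole "network" is a single linear filter mapping a 3x3 low-res patch to one high-res pixel, and the backwards error propagation collapses to one gradient step; deeper networks follow the same forward/compare/update pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the training procedure described above. The "network"
# is one linear filter: 3x3 low-res patch (9 features) -> 1 high-res pixel.
n_features, n_samples = 9, 2000
true_w = rng.normal(size=n_features)          # pretend "ideal" filter
X = rng.normal(size=(n_samples, n_features))  # low-res input patches
y = X @ true_w                                # desired high-res outputs

w = np.zeros(n_features)   # initial (untrained) network
lr = 0.1                   # learning rate
for step in range(300):
    pred = X @ w                   # network's actual output
    err = pred - y                 # difference to the desired output
    grad = X.T @ err / n_samples   # the error, propagated backwards
    w -= lr * grad                 # adjust weights to reduce the error

# after training, the learned filter is close to the "ideal" one
final_err = float(np.max(np.abs(w - true_w)))
```

The same structure applies whether the error is driven through one layer (as here) or through many layers via backpropagation.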
Last edited by LoRd_MuldeR; 17th November 2016 at 19:46.
17th November 2016, 19:48 | #12 | Link | ||
Anime addict
Join Date: Feb 2009
Location: Spain
Posts: 673
|
19th November 2016, 16:56 | #13 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
State-of-the-art comparison of the different test cases from the dataset: the building reconstruction result is pretty good, as are all the text (letter) parts and the faceted-eye pattern reconstruction. Very impressive high-frequency preservation in all of those results by default.
No one here has achieved such sharp results without haloing in the post-processing so far. The human face sample is still far away from those reconstruction results, especially the fine hair structure (also shown in the cat test case). https://drive.google.com/file/d/0BzC...VFZGJ4OWc/view Tritical's NNEDI3 has, at first sight, been beaten by Google R&D. We need the Clown and Lighthouse tests with Google's trained algorithm.
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is to Late ! http://forum.doom9.org/showthread.php?t=168004 Last edited by CruNcher; 19th November 2016 at 20:06. |
17th March 2017, 22:20 | #16 | Link |
Anime addict
Join Date: Feb 2009
Location: Spain
Posts: 673
|
18th March 2017, 01:37 | #17 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
21st March 2017, 13:15 | #18 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
FWIW, the blown-up "original" image doesn't match the small sized image posted by LoRd_MuldeR. The "original" image used by RAISR appears to be higher quality. So all the comparison images with "our" algorithms seem to be at a disadvantage here.
It's a pity Google didn't find it necessary to make all the images from their PDF available as PNG (or even JPG). But maybe they had a good reason for that...

The biggest problem with all these neural network algorithms is that the guys publishing such "scientific" papers usually use *one* specific algorithm to downscale their images, and then their neural network just learns how to revert that particular downscaling. Obviously you can get pretty good results that way. But as soon as you feed those neural networks images downscaled with a different algorithm (e.g. Lanczos instead of Catrom, or bilinear, or box), the algos quickly fall apart. The only way to properly evaluate the quality of an upscaling algorithm is to test it with multiple different images, downscaled with different algorithms.
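That evaluation protocol can be sketched quickly (illustrative only: a random array stands in for a real test photograph, and a nearest-neighbour repeat stands in for whatever upscaler is under test). Downscale the same reference with different algorithms, run the upscaler on each result, and compare the scores; an upscaler overfitted to one downscaler will show a suspicious gap between the rows:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between two images in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def down_box(img):
    # 2x box-filter downscale: average each 2x2 block
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def down_point(img):
    # 2x point-sampling downscale: keep every second pixel
    return img[::2, ::2]

def up_nearest(img):
    # the upscaler under test (stand-in for any real algorithm)
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(1)
hires = rng.random((64, 64))  # stand-in for a real reference photograph

# same upscaler, differently-downscaled inputs, scored against the original
scores = {name: psnr(up_nearest(down(hires)), hires)
          for name, down in [("box", down_box), ("point", down_point)]}
```

In practice you would of course run this over many real photographs and several real resamplers (Lanczos, Catrom, bilinear, box), not one noise image and two toy kernels.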
22nd March 2017, 13:38 | #19 | Link |
The image enthusyast
Join Date: Mar 2015
Location: Brazil
Posts: 270
|
Your thoughts make complete sense, Madshi.
I thought the correct approach for upscaling was simply to take an image that must be size-increased, just that – not a high-resolution image that was downscaled, where the upscaler then tries to reverse the downscaling.
__________________
Searching for great solutions Last edited by luquinhas0021; 27th November 2017 at 16:25. |
22nd March 2017, 14:55 | #20 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
My point was that we have to take any results from such "scientific PDFs" with a pinch of salt, because they often train for just one specific downscaling algorithm. So it's important that we can double-check the upscaling results with test images we created ourselves. Otherwise it's hard to judge how good such an algo really is.