Enhance! RAISR Sharp Images with Machine Learning [Archive]

smok3

16th November 2016, 12:50

from https://arxiv.org/abs/1606.01299
Given an image, we wish to produce an image of larger size with significantly more pixels and higher image quality. This is generally known as the Single Image Super-Resolution (SISR) problem. The idea is that with sufficient training data (corresponding pairs of low and high resolution images) we can learn set of filters (i.e. a mapping) that when applied to given image that is not in the training set, will produce a higher resolution version of it, where the learning is preferably low complexity. In our proposed approach, the run-time is more than one to two orders of magnitude faster than the best competing methods currently available, while producing results comparable or better than state-of-the-art.
So this will basically be problematic for video/video compression? Only useful for single images?

feisty2

16th November 2016, 13:58

I'm a bit disappointed cuz nn doesn't seem to work so well imho(at least for now)..
sure it won't be possible to "recover" the image to higher resolution, but the guess work, which is what nn does, is also not good enough to fool my eyes

LoRd_MuldeR

16th November 2016, 19:31

Is the original picture available somewhere or did you just point resize that one? Would like to try some other things.

What they call "original" seems to be a 4x upscale (using PointResize?) of the actual original. So, yes, reduced it to 1/4 via PointResize before applying NNEDI3_rpow2.

So this will basically be problematic for video/video compression? Only useful for single images?

Any scaling method that works on "single" images can trivially be extended to work on video, because a video is nothing but a sequence of "single" images.

The most common image scaling methods, like BiLinear, BiCubic or Lanczos interpolation, are also "single image" methods, i.e. they process each frame like an independent picture. This applies to plain NNEDI3 just as well.

Whether their method is fast enough to be useful for (real-time) video upscaling, that's a whole different question though...
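To illustrate the point that any single-image scaler extends to video frame by frame, here is a minimal Python sketch. The function names `point_resize` and `upscale_video` are made up for this example, and the nearest-neighbour scaler only stands in for whatever scaler is actually used; it is not the AviSynth PointResize implementation.

```python
def point_resize(frame, factor):
    """Nearest-neighbour upscale of a 2D list of pixel values by an integer factor."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in frame
        for _ in range(factor)  # repeat every source row `factor` times
    ]


def upscale_video(frames, scaler, factor):
    """Apply a single-image scaler to each frame independently."""
    return [scaler(frame, factor) for frame in frames]


# A toy "video" made of two 2x2 frames.
video = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
upscaled = upscale_video(video, point_resize, 2)
print(upscaled[0])
```

Because each frame is processed without looking at its neighbours, nothing here guarantees temporal consistency; that depends entirely on how smoothly the scaler reacts to small input changes.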

smok3

16th November 2016, 20:53

LoRd_MuldeR: :), What I meant was that two slightly different frames could use completely different resizing methods due to the brute-force methodology used here (Which may look weird when 'animated'). But thanks for that explanation anyway, especially the part about "video is nothing but a sequence of single images" < that got me giggling.

LoRd_MuldeR

16th November 2016, 21:23

What I meant was that two slightly different frames could use completely different resizing methods due to the brute-force methodology used here (Which may look weird when 'animated').

It's not "brute force", it's how neural networks work:

You start with an initial (random) network and then you "train" the network with pairs of input data and corresponding (optimal) output data, aka the "training set" – in this case pairs of low-resolution images and corresponding high-resolution images. In the end, you get a network that (hopefully) produces "good" results, even for unknown inputs – in this case a network that will approximate a "sharp and natural-looking" high-resolution image from an (unknown) low-resolution image.

This is more or less exactly how NNEDI3 (and its predecessors) were created! I don't think there is any reason to assume that this approach will necessarily create a highly discontinuous filter, i.e. a filter that would produce completely different outputs even for very small deviations of the input. If anything, NNEDI3 shows the opposite. It works pretty well for (progressive) video upscaling, doesn't it?

(Note: The reason why NNEDI3 alone is not a good double-rate deinterlacer and produces a noticeable "bobbing" effect is a different one - it is because the images are alternately approximated from "odd" and "even" lines in that scenario)

smok3

17th November 2016, 09:10

Define "train". Yeah, it is not brute force once the user gets it; it's rather a cache of brute force.

Gser

17th November 2016, 17:12

Theirs
http://i.imgur.com/3YdjBri.png
SuperRes(2, 1, 0, """nnedi3_rpow2(rfactor=4, nns=4, cshift="Spline16Resize")""")
http://imgur.com/Fk4h2X3.jpg
2xSuperResXBR(2, .6, xbrStr=2.3, xbrSharp=1.2)
http://imgur.com/nA2qQBN.jpg
2xSuperResXBR(1, .7, xbrStr=.1, xbrSharp=.7)
http://imgur.com/NzU6qMP.jpg

LoRd_MuldeR

17th November 2016, 19:39

Define "train".

You present your current network with an input/output pair, the training sample. In this case, the "input" would be a low-resolution version of the original image, and the "output" is the original high-resolution image. You let your network create its own (high-res) output image from the given (low-res) "input" image. And then you compare that against the given optimal "output" image (original). Of course, there will be some difference between the network's actual output and the optimal (desired) output - especially at the beginning of the training phase. This difference, or "error", will be used to update (improve) the network, so that the error is reduced. For example, one approach is to let the "error" propagate through the network in backwards direction and adjust the individual weights accordingly. You repeat this process with many training samples (input/output pairs). In the end, you get a network that (hopefully) produces good results, even for unknown inputs.
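The training loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how NNEDI3 or RAISR were actually trained: the "network" is a single linear filter with three weights, the hypothetical `true_weights` filter plays the role of the unknown ideal mapping, and the update rule is plain stochastic gradient descent on the squared error.

```python
import random

random.seed(0)

# Hypothetical "ideal" filter that the training should rediscover.
true_weights = [0.25, 0.5, 0.25]


def predict(weights, inputs):
    """Output of a one-layer linear 'network': a weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


# Training set: pairs of (input, desired output).
samples = []
for _ in range(200):
    inputs = [random.random() for _ in range(3)]
    samples.append((inputs, predict(true_weights, inputs)))

# Start from a random network.
weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
learning_rate = 0.1

for epoch in range(200):
    for inputs, target in samples:
        # Difference between the network's actual output and the desired output.
        error = predict(weights, inputs) - target
        # Gradient of 0.5 * error**2 with respect to each weight is error * input.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]

print([round(w, 3) for w in weights])
```

A real network has many layers, so the error has to be propagated backwards through all of them, but the principle of nudging weights to reduce the error on each training pair is the same.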

Overdrive80

17th November 2016, 19:48

Define "train". Yeah, it is not brute force once the user gets it; it's rather a cache of brute force.

Yaniv Romano, John Isidoro, Peyman Milanfar
(Submitted on 3 Jun 2016 (v1), last revised 4 Oct 2016 (this version, v3))

Given an image, we wish to produce an image of larger size with significantly more pixels and higher image quality. This is generally known as the Single Image Super-Resolution (SISR) problem. The idea is that with sufficient training data (corresponding pairs of low and high resolution images) we can learn set of filters (i.e. a mapping) that when applied to given image that is not in the training set, will produce a higher resolution version of it, where the learning is preferably low complexity. In our proposed approach, the run-time is more than one to two orders of magnitude faster than the best competing methods currently available, while producing results comparable or better than state-of-the-art.
A closely related topic is image sharpening and contrast enhancement, i.e., improving the visual quality of a blurry image by amplifying the underlying details (a wide range of frequencies). Our approach additionally includes an extremely efficient way to produce an image that is significantly sharper than the input blurry one, without introducing artifacts such as halos and noise amplification. We illustrate how this effective sharpening algorithm, in addition to being of independent interest, can be used as a pre-processing step to induce the learning of more effective upscaling filters with built-in sharpening and contrast enhancement effect.

https://arxiv.org/abs/1606.01299

CruNcher

19th November 2016, 16:56

State-of-the-art comparison of different test cases from the dataset: the building reconstruction result is pretty good, and so are all the text (letter) parts and the faceted eye pattern reconstruction.

very impressive high frequency preservation in all of those results by default :)

No one here has achieved such sharp results without haloing so far. What we get in post-processing with the human eyes/face sample is still far away from those reconstruction results, especially in the fine hair structure (also shown in the cat test case).

https://drive.google.com/file/d/0BzCe024Ewz8ab2RKUFVFZGJ4OWc/view

At first sight, tritical's NNEDI3 got beaten by Google R&D.

We need the Clown and Lighthouse tests with Google's trained algorithm :)

plasma

21st November 2016, 22:05

using neural enhance+denoise
http://i.imgur.com/Vayd1PO.png

luquinhas0021

4th February 2017, 21:58

Where can I get some stable and updated version of this algorithm for personal use? RAISR outperformed Waifu2x, in my tests.

Overdrive80

17th March 2017, 22:20

https://github.com/google/guetzli/

LoRd_MuldeR

18th March 2017, 01:37

https://github.com/google/guetzli/

We have a Guetzli thread already:
https://forum.doom9.org/showthread.php?t=174428

madshi

21st March 2017, 13:15

FWIW, the blown-up "original" image doesn't match the small sized image posted by LoRd_MuldeR. The "original" image used by RAISR appears to be higher quality. So all the comparison images with "our" algorithms seem to be at a disadvantage here.

It's a pity Google didn't find it necessary to make all the images from their PDF available as PNG (or even JPG). But maybe they had a good reason for that... :sly:

The biggest problem with all these neural network algorithms is that usually the guys publishing such "scientific" papers use *one* specific algorithm to downscale their images, and then their neural network just learns how to revert the downscaling. Obviously you can get pretty good results that way. But as soon as you feed those neural networks with images downscaled with a different algorithm (e.g. Lanczos instead of Catrom, or Bilinear or Box), the algos quickly fall apart. The only way to properly evaluate the quality of an upscaling algorithm is to test it with multiple different images, which were downscaled with different algorithms.
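As a rough illustration of why the downscaling method matters when scoring an upscaler, here is a small Python sketch on a 1D signal. Everything in it is an assumption made for the example: the two toy downscalers, the trivial `repeat_upscale` standing in for the upscaler under test, and the hand-picked test signal.

```python
def box_downscale(signal):
    """Halve the length by averaging neighbouring pairs (box filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def point_downscale(signal):
    """Halve the length by keeping every second sample (point sampling)."""
    return signal[::2]


def repeat_upscale(signal):
    """Stand-in for the upscaler under test: repeat every sample twice."""
    return [value for value in signal for _ in range(2)]


def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


original = [0, 0, 10, 10, 0, 10, 0, 10]
downscalers = {"box": box_downscale, "point": point_downscale}

# Score the same upscaler against inputs produced by different downscalers.
errors = {
    name: mse(repeat_upscale(downscale(original)), original)
    for name, downscale in downscalers.items()
}
print(errors)
```

The same upscaler gets a different error for each downscaler, which is why a score measured against a single downscaling kernel says little about general quality.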

luquinhas0021

22nd March 2017, 13:38

Your thoughts make complete sense, madshi.

I thought the correct approach to upscaling is simply to treat the input as an image that must be increased in size... Just that. Not as a high-res image that was downscaled, where the upscaler tries to reverse the downscaling.

madshi

22nd March 2017, 14:55

I thought the correct approach to upscaling is simply to treat the input as an image that must be increased in size... Just that. Not as a high-res image that was downscaled, where the upscaler tries to reverse the downscaling.
Yes. However, at doom9 we're talking about DVDs and Blu-Rays which were usually downscaled from higher resolution masters. So learning how to revert downscaling is not generally a bad idea, IMHO. For our purposes here it could even be a great idea. But the important thing is that an upscaling algorithm, which works by trying to revert a prior downscaling operation, should not be limited to work with only one specific downscaling algorithm. But it should work reasonably well for all common downscaling algorithms. If it does, it might also work well enough for images that weren't downscaled at all.

My point was that we have to take any results from such "scientific PDFs" with a pinch of salt, because they often train for just one specific downscaling algorithm. So it's important that we're able to double check the upscaling results ourselves, with test images we created ourselves. Otherwise it's hard to judge how good such an algo really is.

burfadel

22nd March 2017, 14:59

But as soon as you feed those neural networks with images downscaled with a different algorithm (e.g. Lanczos instead of Catrom, or Bilinear or Box), the algos quickly fall apart. The only way to properly evaluate the quality of an upscaling algorithm is to test it with multiple different images, which were downscaled with different algorithms.

Between the downscaling and upscaling, the image wasn't lossily compressed; that would presumably throw the results off even further.

luquinhas0021

22nd March 2017, 17:57

But the important thing is that an upscaling algorithm, which works by trying to revert a prior downscaling operation, should not be limited to work with only one specific downscaling algorithm. But it should work reasonably well for all common downscaling algorithms. If it does, it might also work well enough for images that weren't downscaled at all.

SAR Image Processor 5.2 tries to do this with its Pseudo-Inverse algorithm.
Trying to reverse the downscaling is not required when the downscaler is detail-preserving.

Another way to attempt the inverse of downscaling is to upscale the image (without assuming any prior downscaling) and then apply a good deblur operator, especially in the frequency domain.

Between the downscaling and upscaling, the image wasn't lossily compressed; that would presumably throw the results off even further.

The problem is that most images are lossily compressed, to a greater or lesser degree. One more reason why upscaling based on a prior downscaling is not convincing at all, at least theoretically.

madshi

22nd March 2017, 18:11

SAR Image Processor 5.2 tries to do this with its Pseudo-Inverse algorithm.
Yes, but it doesn't look very good to me.

luquinhas0021

25th March 2017, 23:45

The biggest problem with all these neural network algorithms is that usually the guys publishing such "scientific" papers use *one* specific algorithm to downscale their images, and then their neural network just learns how to revert the downscaling. Obviously you can get pretty good results that way. But as soon as you feed those neural networks with images downscaled with a different algorithm (e.g. Lanczos instead of Catmull-Rom, or Bilinear or Box), the algos quickly fall apart. The only way to properly evaluate the quality of an upscaling algorithm is to test it with multiple different images, which were downscaled with different algorithms.

Complementing what you said, madshi, another big problem with that neural network comes from the mathematical side (remember that RAISR uses neural networks to learn the difference between a cheaply upscaled downsized image and the ground-truth one, and it uses a 4 x 4 matrix in both cases): for each pixel, considering just one channel, there are 2 ^ k possible color values. Since a value can be repeated in other pixels, the total number of possible 4 x 4 image matrices is 2 ^ (16k). For an 8 bit channel, the number of possibilities is 256 ^ 16, which is a number with 39 digits. A training set of that size is impossible to collect, at least these days.
Google said it was trained on 10000 images. But what are 10000 images in comparison with the number I wrote?
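The patch count above is easy to double check with Python's arbitrary-precision integers. The variable names are only illustrative.

```python
bits_per_channel = 8   # k in the formula above
pixels_per_patch = 4 * 4

# Every pixel can independently take 2**k values, so a patch has 2**(16k) states.
possible_patches = 2 ** (bits_per_channel * pixels_per_patch)

same_as_256_pow_16 = possible_patches == 256 ** 16
digit_count = len(str(possible_patches))

print(possible_patches)
print(same_as_256_pow_16, digit_count)  # True 39
```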

madshi

26th March 2017, 12:00

feisty2

26th March 2017, 13:17

Right, the first layer is pretty much an edge detector, but you could always do a neural net with like probably 100 layers and hopefully it might be able to predict stuff more complex than just edges, there's like 1002-layer neural net out there last time I checked

madshi

26th March 2017, 14:10

Yep. However, training of a lot of layers can be very difficult.

feisty2

26th March 2017, 14:22

It's hard for the traditional BPNN, but I think it's practical if you're using a deep learning model.

Wilbert

26th March 2017, 15:32

I don't know anything about neural networks. Is it possible (mathematically) to show that the first layer is an edge detector in some way or form?

I guess it should also be possible to incorporate edge detection in standard resizing routines? That is, you neglect an edge (and every source pixel up to that edge) that would normally contribute to a target pixel (and normalize the remaining source pixels that do contribute). Perhaps people tried things like that?

madshi

26th March 2017, 15:44

One convolutional neural network layer basically consists of a number of rectangular "filters". With the right tools, you can convert these filters into bitmap images, which lets you see what the filters do. Doing that you can see that the filters in the first layer are usually mostly edge detection filters, plus at least one "smooth area" detection filter, plus a couple of "weird" filters. It's really interesting to look at what a learned neural network does. I've read about this in some PDF some time ago, but I don't have a link at hand right now. The most interesting thing is that all of these filters are created automatically by machine learning, with no human intervention or guiding.
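To give an idea of what such a first-layer filter does, here is a small Python sketch that slides a 3x3 kernel over a tiny image. Note the assumption: the kernel is the classic hand-written Sobel operator, used only as a stand-in; filters learned by a real network look similar but are not identical to it.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a 2D list (no padding, no kernel flipping)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for y in range(len(image) - kernel_h + 1):
        row = []
        for x in range(len(image[0]) - kernel_w + 1):
            row.append(sum(
                kernel[j][i] * image[y + j][x + i]
                for j in range(kernel_h)
                for i in range(kernel_w)
            ))
        output.append(row)
    return output


# Hand-written horizontal-gradient (Sobel) kernel: responds to vertical edges.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Flat dark area on the left, flat bright area on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = convolve2d(image, sobel_x)
for row in response:
    print(row)
```

The response is zero inside the flat areas and large where the window straddles the edge, which is the behaviour one sees when visualising first-layer filters.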

Yes, you could try to tweak standard linear filter resizers to look for edges and treat them differently. Many people have tried that, including myself, but it's hard to make that work really well. Often if you try to interpret gradient/line angles, you end up adding directional artifacts into the interpolated image. E.g. see here for an example:

http://www.general-cathexis.com/interpolation/clownDDL3_4X.jpg (Data-Dependent Lanczos 3)

Pretty ugly, if you ask me. That's an extreme example, of course. It's possible to get better results than that.

CruNcher

27th March 2017, 09:26

Nvidia is opening up their Deep Learning Super Resolution R&D a tad more to the public, in the context of VR and game development use cases

https://developer.nvidia.com/deep-learning-materials-texture

Gravitator

27th March 2017, 10:08

Pixel Recursive Super Resolution
https://arxiv.org/abs/1702.00783

luquinhas0021

28th March 2017, 04:47

If you analyze what those neural networks learn, the first layer actually ends up to be an edge detector for different angles and detectors for smooth areas, and then based on this first layer, the neural networks apply their learned upscaling (or rather downscaling inversion). It's pretty cool to see that you just throw images at the network training, and it ends up using edge detection filters. Which is not a decision that a human made, but the result of the neural network learning. This also shows that for good quality upscaling, you *do* need to detect edges and treat them differently. Which is why simple linear algorithms like spline, bicubic, lanczos, jinc or sinc are not really good upscalers, because they don't detect edges at all. They just naively apply their weights on every pixel, regardless of whether it's an edge or a smooth area.

Anyhow, a couple of million data samples may have to be passed into the neural network.

Yes, you could try to tweak standard linear filter resizers to look for edges and treat them differently. Many people have tried that, including myself, but it's hard to make that work really well. Often if you try to interpret gradient/line angles, you end up adding directional artifacts into the interpolated image. See here, for an example, Data-Dependent Lanczos 3...

You and I know that periodic-based image interpolation doesn't usually give us optimal results, due to its periodicity artifacts.
Data-dependent polynomial-based interpolation would give us better results, due to its flexibility when building the linear system, i.e. we can include up to the n-th derivative and whatever else we want, as long as it fits in the system.

zub35

13th November 2017, 19:20

online upscaler: https://letsenhance.io

Original (http://madvr.com/doom9/clown/clown.png)
Up x4:
- Lanczos4 (http://images2.imagebam.com/86/09/ad/b6af8b655838283.png)
- waifu2x UpPhoto noise_scale Level1 x4 (http://images2.imagebam.com/1c/78/fc/efce1b655838323.png)
- letsenhance io (http://images2.imagebam.com/eb/06/61/8389fc655838343.png) !

Original (http://images2.imagebam.com/d3/86/5d/78b00c655864713.png)
downscale (http://images2.imagebam.com/35/ac/3b/a4b297655864763.png)
Up x4:
- Lanczos4 (http://images2.imagebam.com/64/b0/cb/ffb030655864813.png)
- waifu2x UpPhoto noise_scale Level0 (http://images2.imagebam.com/ef/82/51/b85d2f655864863.png)
- letsenhance io (http://images2.imagebam.com/62/1c/11/6b8933655864923.png) !

madshi

13th November 2017, 20:34

Hmmm... That website offers a "boring" and "magic" version. The boring one looks very similar to waifu2x. The "magic" one most probably tries to hallucinate texture detail, based on some of the recent scientific PDFs suggesting that approach. And it works surprisingly well (even magical) on many image areas, especially on trees and stuff. But it also adds a shitload of weird artifacts, sometimes does truly scary things (like mutating an image element into something completely different), and this is when testing with losslessly compressed images. Testing with lossily compressed images, artifacts become even worse. E.g. any sort of Mosquito or Block artifacts are very strongly enhanced and reinterpreted as being weird image detail.

It's a very interesting technology, but I don't think it's a good match for upscaling lossily compressed video content. Just my 2 cents, of course.

zub35

13th November 2017, 21:22

So far all of this is only demonstrated on individual images. The technology can be extended to take temporal information into account,
thereby not only improving quality but also minimizing artifacts. Besides, a good database of samples for the neural network affects quality even more.

Going even further: a next-gen video standard could embed neural data in each GOP for fast, high-quality restoration, thus allowing 6K-8K to be compressed at the current bitrates of 2K

madshi

13th November 2017, 21:41

You do know that neural networks of this size don't process in real time for video, right? And that's when talking about single images. Adding temporal processing would help reduce artifacts (but most probably not remove them completely), but it would also slow things down even further.

zub35

13th November 2017, 21:49

Of course. Real-time processing requires an appropriate hardware implementation of the instruction set, for example as a separate unit in the GPU. That's why I mentioned the next-gen and the adoption of appropriate standards.
Maybe even a next-next-gen standard. But you will agree, the technology is extremely promising and has a chance at hardware implementation ;)

madshi

13th November 2017, 22:47

When even car manufacturers like Tesla use GPUs to run neural networks then I don't know how much more efficient "hardware implementations" would be. Maybe they would achieve somewhat better performance per watt than a GPU, but I wouldn't expect miracles. It's not like e.g. video decoding, where the algorithms are relatively complicated and hardware implementations can achieve dramatic improvements over a general purpose CPU. Processing neural networks is extremely simple math, but requires very high GFLOPS. It's mostly just lots and lots and lots of matrix multiplications, and GPUs are already very good at doing that.
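To show how simple the math per layer is, here is a minimal Python sketch of one fully connected layer: a matrix-vector multiplication, a bias add, and a ReLU. The sizes and values are made up for the example, and real networks for image scaling mostly use convolutional layers, which boil down to the same multiply-add pattern.

```python
def dense_layer(weights, bias, inputs):
    """One fully connected layer: matrix-vector product, bias add, ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


weights = [
    [1.0, -1.0],
    [0.5, 0.5],
]
bias = [0.0, -1.0]
inputs = [2.0, 3.0]

outputs = dense_layer(weights, bias, inputs)
multiply_adds = len(weights) * len(inputs)

print(outputs)        # [0.0, 1.5]
print(multiply_adds)  # 4
```

The multiply-add count grows with the product of layer sizes, and for images it is repeated at every pixel position, which is where the throughput requirement comes from.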

nevcairiel

13th November 2017, 22:58

NVIDIA is already putting Tensor cores into their Volta-based Datacenter Deep Learning GPUs, which helps Neural Network inference quite a bit. And some other vendors are working on special chips for NN inference. So in the consumer space, there is still plenty of performance to be gained with special Neural Network hardware.

madshi

13th November 2017, 23:18

Ok, let's wait and see what the future will bring. Current situation is that GPUs would probably have to be about 20-50x faster than they are now to be able to apply waifu2x sized neural networks on video in real time. And that's just single frame processing, not yet taking temporal information into account.

Of course, for offline processing real time performance is not needed.

feisty2

14th November 2017, 09:02

looks like GAN related model to me, judging from all that generated fake details

ABDO

14th November 2017, 15:46

the author says
“We took few state-of-art approaches, hacked around and rolled them into production-ready system. Basically we were inspired by SRGAN and EDSR papers.”

cork_OS

14th November 2017, 21:47

madshi

16th November 2017, 10:23

One problem with hallucinated texture detail is that it's most probably not "stable" in motion. Which means if there's only a small change in the video image, the hallucinated texture detail may shift its position, although still having a similar "look" to it. In motion this will probably look like very strong dithering noise. In extreme cases it could even turn into flickering.

Unfortunately I've already spent my 5 allowed images on the website, so I can't test that. Maybe someone else can try upscaling the clown image shifted by 1 pixel? Or window boxed by 1 pixel? Or mirrored or rotated? That would be a quick way to check how "stable" the hallucinated textures are.

tebasuna51

16th November 2017, 11:43

Maybe someone else can try upscaling the clown image shifted by 1 pixel? Or window boxed by 1 pixel?

I spent my 5 images with 1,2,3,4,5 pixels shifted of original clown.png (http://madvr.com/doom9/clown/clown.png) and magic 4X: https://www.sendspace.com/file/8mifbt

To see in movement order the letsenhance io (http://images2.imagebam.com/eb/06/61/8389fc655838343.png) was c5-0p.png

madshi

16th November 2017, 12:06

Thank you! So it seems simple pixel shifting doesn't harm. Well, I suppose it makes sense because convolutional neural networks aren't really position dependent. So window boxing wouldn't make any difference, either. Argh, should have thought of that! Maybe adding a very small amount of noise would be interesting, or changing image brightness ever so slightly, or rotating by 1 degree, or something like that?
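The reason pixel shifting doesn't matter can be shown with a small Python sketch: shifting the input and then filtering gives the same result as filtering and then shifting. The sketch makes simplifying assumptions: a 1D signal, a single hand-picked kernel, and wrap-around borders so that the equality is exact. A real network with padding, strides and several layers only has this property approximately, and the argument says nothing about rotation or mirroring.

```python
def circular_conv(signal, kernel):
    """Apply a kernel at every position, wrapping around at the borders."""
    n = len(signal)
    return [
        sum(k * signal[(i + j) % n] for j, k in enumerate(kernel))
        for i in range(n)
    ]


def shift(signal, amount):
    """Shift a list to the right by `amount` positions with wrap-around."""
    amount %= len(signal)
    return signal[-amount:] + signal[:-amount]


signal = [3, 1, 4, 1, 5, 9, 2, 6]
kernel = [1, -2, 1]

shift_then_filter = circular_conv(shift(signal, 1), kernel)
filter_then_shift = shift(circular_conv(signal, kernel), 1)

print(shift_then_filter == filter_then_shift)  # True
```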

ABDO

16th November 2017, 12:24

Maybe someone else can try upscaling the clown image shifted by 1 pixel? Or window boxed by 1 pixel? Or mirrored or rotated?

Mirrored
https://s20.postimg.org/m51kjv0m1/clown1-magic.png (https://postimg.org/image/m51kjv0m1/)

Rotated1
https://s20.postimg.org/n8lovug49/clown2-magic.png (https://postimg.org/image/n8lovug49/)

Rotated2
https://s20.postimg.org/dbao35iwp/clown3-magic.png (https://postimg.org/image/dbao35iwp/)

madshi

16th November 2017, 12:57

Very good, thank you! I've de-rotated/de-mirrored your images and here they are for easy comparison:

no modification (http://madVR.com/doom9/gan/clownGanOrg.png) - | - rotated left (http://madVR.com/doom9/gan/clownGanRotLeft.png) - | - rotated right (http://madVR.com/doom9/gan/clownGanRotRight.png) - | - mirrored horizontally (http://madVR.com/doom9/gan/clownGanMirrorHorz.png)

If you compare these images, you'll see that the texture changes a lot in all 4 images, it's completely different in each frame. It still has an overall similar look to it, but the changes are still much bigger than any dithering, so in motion this will look extremely noisy/unstable.

For still images it might not matter too much, but for video this type of "texture hallucination" is IMHO currently not feasible in motion, because it is not stable when the image content changes slightly. I'm not sure if the algorithm could be changed to fix this problem. I kind of doubt it because the algo by design doesn't even try to restore the original texture (which is technically impossible, anyway), it just tries to hallucinate a texture which hopefully has a similar look to the texture the original hi-res image had before downscaling. So the algo is by design not able to maintain a stable "position" of the texture in motion.

Even worse, if you look at the very bottom of the image, the asphalt texture is changing its brightness very strongly from frame to frame, this will actually produce visible flickering in motion. That said, these brightness fluctuations should be fixable with better neural network training.

ABDO

16th November 2017, 13:59

but the changes are still much bigger than any dithering, so in motion this will look extremely noisy/unstable.

that is very sad news :D

For still images it really does very well. I tried most of these Single Image Super-Resolution methods, and most of them give results similar to the "boring" version; only SRGAN gives sharp results very similar to the "magic" version, but with less accuracy in the hallucinated texture detail.

But for video I think this multi-frame method (Detail-revealing Deep Video Super-resolution) gives better results than any Single Image Super-Resolution
https://github.com/jiangsutx/SPMC_VideoSR :D:D