waifu2x - Page 2

ReinerSchweinlin · 11th May 2020, 12:35

Yes, Waifu2x is fine for anime and the like, but it is in no way suitable for real-life footage.

As mentioned above, take a look at "Hybrid" from Selur: he already has a working implementation of the VapourSynth ports with Vulkan etc. No need to reinvent the wheel for AviSynth with CPU-based OpenCL.

tormento · 11th May 2020, 14:40

Quote:

Originally Posted by MeteorRain

I don't think opencl on CPU is well supported. No?

Yes, it is on integrated GPUs, and I think it's easily portable from existing C++ code with the appropriate tools.

tormento · 11th May 2020, 14:45

Quote:

Originally Posted by FranceBB

Yes, that would definitely be welcomed. At work we have servers with NVIDIA Quadro GPUs that are currently not used other than for x264 OpenCL accelerated encoding, which is a total waste, honestly. I would love to see plugins supporting both CPU and GPU acceleration.

Whenever your company decides to buy the next generation and dump the current ones, please send me a PM, seriously.

Quote:

Originally Posted by FranceBB

This is to avoid having two separate versions of plugins to maintain.

AFAIK, at least with CUDA and perhaps OpenCL, it's really easy to port from C++ to GPU acceleration. We should ask Khanattila how he ported NLMeans to KNLMeansCL.

Quote:

Originally Posted by FranceBB

so for me CUDA is fine; heck, it's even better than OpenCL in my case 'cause OpenCL runs slower on NVIDI

CUDA has many many tools and resources to help programmers. AMD has great hardware but really poor support and drivers.

feisty2 · 11th May 2020, 17:58

NTSC master

scaled to 2K by ESRGAN

feisty2 · 11th May 2020, 18:01

ur welcome to challenge the results with waifu2x, nnedi3, traditional resizers, anything you like, and see if the quality could be nearly as good

feisty2 · 11th May 2020, 18:45

note that ESRGAN is already available to VapourSynth thru vs_mxnet, you can turn your SD masters to native 2k anytime you like, the quality is levels above waifu2x or nnedi3, I therefore find all this waifu2x discussion totally meaningless

poisondeathray · 11th May 2020, 18:56

Quote:

Originally Posted by feisty2

note that ESRGAN is already available to VapourSynth thru vs_mxnet, you can turn your SD masters to native 2k anytime you like, the quality is levels above waifu2x or nnedi3, I therefore find all this waifu2x discussion totally meaningless

ESRGAN is also available through vsgan
https://github.com/rlaPHOENiX/VSGAN

But that image looks nothing like a "native 2K image". It's a huge overstatement

But waifu2x is meant for anime sources, and was trained on anime sources - how is it "meaningless"?

Or, do you have an ESRGAN model that gives similar or better results than waifu2x on typical anime sources?

It's all about the training used and the specific model. I tried some publicly distributed anime-trained models for ESRGAN, and the results are significantly worse

feisty2 · 11th May 2020, 19:06

the problem is not about anime or photos, waifu2x is a really "early" project, its models are too small and do not have enough capacity compared to state-of-the-art models in academia. if someone's gonna port neural net based resizers to avisynth, I think it makes more sense to directly go for the modern models.

feisty2 · 11th May 2020, 19:12

Quote:

Originally Posted by poisondeathray

But that image looks nothing like a "native 2K image". It's a huge overstatement

still, small models like waifu2x and nnedi3 can never get close, I have posted the original SD images, try for yourself.

Quote:

Originally Posted by poisondeathray

I tried some publicly distributed anime-trained models for ESRGAN, and the results are significantly worse

I am 99.9% sure that, given the same training set, anime or not, ESRGAN will produce much better results

poisondeathray · 11th May 2020, 19:25

Quote:

Originally Posted by feisty2

the problem is not about anime or photos, waifu2x is a really "early" project, its models are too small and do not have enough capacity compared to state-of-the-art models in academia. if someone's gonna port neural net based resizers to avisynth, I think it makes more sense to directly go for the modern models.

I agree

Quote:

Originally Posted by feisty2

still, small models like waifu2x and nnedi3 can never get close, I have posted the original SD images, try for yourself.

Of course, it's just stupid to even try waifu2x on non-anime sources, and nnedi3 is known for coarse lines; it's a very old neural network (but Tritical was years ahead of almost anyone)

Quote:

I am 99.9% sure that, given the same training set, anime or not, ESRGAN will produce much better results

Probably, but it takes a lot of time and resources to train.

Waifu2x has many different GUIs, several types of pretrained models available, some with Lua scripting in other programs (AviUtl), and many different implementations (supporting AMD, Nvidia) - it just seems so mature, and it should be easier to port to avisynth.

Many of the newer research projects are Python based (PyTorch, etc.), so they should work in vapoursynth much more easily than in something like avisynth

feisty2 · 11th May 2020, 19:30

Quote:

Originally Posted by poisondeathray

Of course, it's just stupid to even try waifu2x on non-anime sources

there're waifu2x models optimized for photos iirc, have a try and compare the results with ESRGAN if you can find them

edit: here

poisondeathray · 11th May 2020, 19:37

Quote:

Originally Posted by feisty2

there're waifu2x models optimized for photos iirc, have a try and compare the results with ESRGAN if you can find them

edit: here

I tested the waifu2x photo models about a year ago - I can tell you now, it won't even be close. The waifu2x "strength" is only typical anime content.

Which ESRGAN model did you use above?

feisty2 · 11th May 2020, 19:59

the official one

also the results above pass for 2k to me, it surely ain't high quality 2k, it looks a bit like the quality of 16mm film. native 2k could be u know, slightly out of focus and stuff and not necessarily very sharp or with extremely delicate details.

screenshot taken directly from a 2k master (filmed in 2018), does it also look "fake" to you?

zorr · 11th May 2020, 20:39

As good as ESRGAN is, it's still meant for still images. When applied to video it may not be temporally coherent. Has anyone thought about porting GANs which are meant for video? Such as TecoGAN (and I'm sure there are many more).
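One rough way to put a number on the temporal-coherence concern is to measure how much the upscaled output changes from frame to frame on content that should be static. The following is only a minimal sketch in pure Python: frames are assumed to be flat lists of luma values, the names `frame_diff` and `flicker_score` are hypothetical, and this is not how TecoGAN or any existing plugin evaluates coherence.

```python
from statistics import mean

def frame_diff(a, b):
    """Mean absolute difference between two frames (flat lists of luma values)."""
    return mean(abs(x - y) for x, y in zip(a, b))

def flicker_score(frames):
    """Average frame-to-frame change over a sequence of frames."""
    return mean(frame_diff(p, q) for p, q in zip(frames, frames[1:]))

# Toy data: a static scene. A temporally stable upscaler would return the same
# pixels every frame; a per-frame model may invent slightly different detail
# each time, simulated here by alternating -2 / +2 offsets.
static = [[100, 120, 140, 160]] * 4
flickery = [[v + (2 if i % 2 else -2) for v in static[0]] for i in range(4)]

stable_score = flicker_score(static)      # identical frames, so no change
unstable_score = flicker_score(flickery)  # every pixel jumps between frames
```

On real footage the scene itself moves, so a plain frame difference mixes motion with flicker; a fairer comparison would need motion compensation or a static test clip.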

poisondeathray · 11th May 2020, 22:11

Quote:

Originally Posted by feisty2

also the results above pass for 2k to me, it surely ain't high quality 2k

Not for me; it screams "upscale" to me. It might be "less bad" than other algorithms, but there is 0% chance anybody that works with video or image would mistake that for native 2K.

Quote:

screenshot taken directly from a 2k master (filmed in 2018), does it also look "fake" to you?

Not "fake" per se, but that shot has different characteristics, with the motion and shutter blur. It's also framed differently, at a distance.

There is no way that a comparable shot up close would have the coarse detail characteristics and image artifacts of that Britney upscale

feisty2 · 11th May 2020, 22:44

Quote:

Originally Posted by poisondeathray

There is no way that a comparable shot up close would have the coarse detail characteristics and image artifacts of that Britney upscale

close-up shots can have coarse detail characteristics, I don't wanna pollute this thread with any more huge screenshots, gonna use a link instead.

Quote:

Originally Posted by poisondeathray

Not for me; it screams "upscale" to me. It might be "less bad" than other algorithms, but there is 0% chance anybody that works with video or image would mistake that for native 2K.

have you tried scaling the original image with, say, spline64? that's what upscale normally looks like, then there're nnedi3 and waifu2x which produce sharper edges but the result becomes "oil-painted", neither looks like the result of ESRGAN. The "eye-lash" or the whole "eye" region looks very native hi-res to me.

poisondeathray · 11th May 2020, 23:13

Quote:

Originally Posted by feisty2

close-up shots can have coarse detail characteristics, I don't wanna pollute this thread with any more huge screenshots, gonna use a link instead.

Yes, but that looks like native 2K. There are fine details, like fine grain and the lace patterns in the eye patch. You can see strands of hair clearly, not edge clumps that look upscaled. It fits together coherently - there is nothing "off" like the compression artifacts and ringing that are dead giveaways in the Britney upscale. Brit has some parts that are upscaled nicely, but others that are not - it's a hodgepodge, that's why your brain (or my brain) screams upscale

Nobody would think this was an upscale, almost everyone (that has any experience) would think Britney was. You might be able to pass Britney off to the uninitiated or lay person, not to anyone that works in image or video industry.

Quote:

have you tried scaling the original image with, say, spline64? that's what upscale normally looks like, then there're nnedi3 and waifu2x which produce sharper edges but the result becomes "oil-painted", neither looks like the result of ESRGAN. The "eye-lash" or the whole "eye" region looks very native hi-res to me.

Of course, but they all look non native 2K.

ESRGAN is "less bad", and it does a generally better job with finer details. But there are telltale signs that make it look like an upscale and not native 2K.

If you have used the original models much, you'll know certain texture patterns are prone to serious errors as well. Certain frames are unusable because of this

The flicker or temporal inconsistency was mentioned earlier too; it's not that great on video because of those issues

feisty2 · 12th May 2020, 14:19

I'm not sure what you mean by "edge clumps", I found a native HD close-up shot of britney in another video and the hair also looks "clumpy", probably just the texture of her hair.

all these were shot in circa 2010, when music video production began shifting towards 1080p. a lot of HD videos shot around that time or earlier have this "coarse" look, you can tell it's HD, but it's not high quality HD, it definitely does not squeeze out the full potential of 2k resolution. you just know the image could still be much "finer" and more "delicate" at 2k. the gap between SD and HD wasn't all that huge since SD videos at that time usually look much "finer" at their native resolution.

more recent shots have managed to get rid of this "coarse" look, they look more "HD" than them shots in 2010, the image looks much more "delicate" and you can tell instantly that it has exploited every last bit of 2k resolution potential. this is the "real" HD quality I'm talking about. I'm not quite sure what changed in the production process of recent years that led to such quality boost.

FranceBB · 12th May 2020, 15:57

Quote:

Originally Posted by feisty2

I'm not quite sure what changed in the production process of recent years that led to such quality boost.

Lenses, cameras, high bit depth used all the time (from recording to post-processing) and logarithmic curves to keep as many details as possible in terms of light captured (more stops), even if the final file was then going to be graded down to linear BT.709 100 nits.
Not that cameras of the past were bad, but we made huge steps forward compared to years ago as technology became so much better. And this was all years before HDR was even a thing! You can see how cameras became much more advanced, recording more than what the official standardized specs allowed broadcasters to air... We squeezed everything we could for years out of the FULL HD standard...
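To illustrate why a log curve preserves more stops at a fixed bit depth, here is a small sketch counting how many 10-bit code values land in each stop under linear versus logarithmic encoding. It assumes a generic pure log2 curve over 12 stops; that curve, the 12-stop range and all function names are illustrative assumptions, not any real camera's transfer function.

```python
import math

BITS = 10
MAX_CODE = 2**BITS - 1   # 1023
STOPS = 12               # assumed dynamic range of a hypothetical sensor

def linear_encode(x):
    """Scene-linear value in [0, 1] -> integer code value."""
    return round(x * MAX_CODE)

def log_encode(x):
    """Generic log2 curve (illustrative only, not a real camera curve)."""
    x = max(x, 2.0**-STOPS)  # clamp below the darkest representable stop
    return round(MAX_CODE * (math.log2(x) + STOPS) / STOPS)

def codes_in_stop(encode, k):
    """Number of code values spanning the k-th stop below peak white."""
    return encode(2.0**-k) - encode(2.0**-(k + 1))

# Compare the brightest stop, a mid stop and a deep shadow stop.
for k in (0, 5, 10):
    print(k, codes_in_stop(linear_encode, k), codes_in_stop(log_encode, k))
```

With linear encoding the brightest stop takes about half of all code values and the deep shadows are left with almost none, while the log curve spreads them roughly evenly (about 1023 / 12 per stop). That even spread is what lets shadow and highlight detail survive until the final grade.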

feisty2 · 12th May 2020, 16:16

I think it probably has more to do with lenses and cameras. older cameras can record 1080p videos but that's the highest resolution they can handle, while modern cameras can record 6k or 8k videos and when downscaled to 2k, it should look much "finer" than footage shot by 2k cameras.
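One part of the oversampling argument can be sketched numerically: averaging several noisy samples into one output pixel lowers the noise. The toy below uses a 1-D "scanline" of a flat grey patch and assumes the same per-sample noise for both captures, which is a simplification (real sensors differ, and this says nothing about lens resolving power). All names and values are hypothetical.

```python
import random
from statistics import mean, pstdev

def box_downscale(row, factor):
    """Average each run of `factor` samples into one output sample."""
    return [mean(row[i:i + factor])
            for i in range(0, len(row) - factor + 1, factor)]

random.seed(0)
TRUE_LEVEL = 100.0   # flat grey patch
NOISE_SIGMA = 8.0    # assumed per-sample noise, identical for both captures

# 1-D stand-in for one scanline: the "8K" capture has 4x the samples of "2K".
capture_8k = [random.gauss(TRUE_LEVEL, NOISE_SIGMA) for _ in range(8000)]
capture_2k = [random.gauss(TRUE_LEVEL, NOISE_SIGMA) for _ in range(2000)]

downscaled = box_downscale(capture_8k, 4)

noise_native = pstdev(capture_2k)       # close to NOISE_SIGMA
noise_oversampled = pstdev(downscaled)  # roughly NOISE_SIGMA / sqrt(4)
```

Averaging N independent samples cuts the noise standard deviation by about sqrt(N), so in two dimensions a 4x4 block would give roughly a fourfold reduction under the same assumptions.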
