Old 7th April 2011, 07:15   #61  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Gavino View Post
Actually, there is more than one way of doing it. In the Avisynth implementation, your line [...] is effectively replaced by [...] which is marginally faster (fewer multiplies).
Haha - you're right...

That said, the weights are usually only calculated once and then stored in some array. So it's not a few saved multiplies per pixel, not even per frame.
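To make the "calculated once and stored in some array" point concrete, here is a minimal Python sketch. It is not taken from AviSynth or ResampleHQ: `build_weight_table` and `resample_row` are hypothetical names, and the triangle kernel is only a stand-in for whichever kernel is in use. The table depends only on the source and destination sizes, so it can be built once and reused for every row of every frame.

```python
import math

def triangle(x):
    """Bilinear (triangle) kernel with support 1; stand-in for any kernel."""
    return max(0.0, 1.0 - abs(x))

def build_weight_table(src_len, dst_len, kernel=triangle, support=1.0):
    """Return one (first_source_index, normalized_weights) pair per output sample."""
    scale = src_len / dst_len
    stretch = max(scale, 1.0)          # widen the kernel when downscaling
    radius = support * stretch
    table = []
    for out in range(dst_len):
        center = (out + 0.5) * scale - 0.5   # output position in source coordinates
        first = math.ceil(center - radius)
        last = math.floor(center + radius)
        weights = [kernel((i - center) / stretch) for i in range(first, last + 1)]
        total = sum(weights)
        table.append((first, [w / total for w in weights]))
    return table

def resample_row(row, table):
    """Apply a precomputed table to one row; no kernel evaluation happens here."""
    n = len(row)
    result = []
    for first, weights in table:
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w * row[min(max(first + k, 0), n - 1)]  # clamp at the edges
        result.append(acc)
    return result

table = build_weight_table(8, 4)          # built once
halved = resample_row([5.0] * 8, table)   # reused per row, per frame
```

With this layout, the cost of evaluating the kernel is paid only when the table is built, which is why saving a few multiplies in the kernel formula barely matters.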
Old 7th April 2011, 09:10   #62  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Here's version 5. Hopefully the last of the chroma bugs. Been hitting F2 so much in VirtualDub that I've started hitting F2 instead of F5 in browsers, lol.

Download for 32-bit and 64-bit:
http://sourceforge.net/projects/int6...5.zip/download

Full documentation:
http://svn.int64.org/viewvc/int64/re...l?revision=268

Changelog:
  • Catmull–Rom, Gaussian, Hermite, Mitchell–Netraveli, Robidoux, Sinc, and SoftCubic kernels.
  • SMPTE 240M and FCC matrices.
  • Customization of kernel support scale.
  • Customization of chroma kernel.
  • Support for SetMTMode(2).
  • More SSE versions of colorspace conversions.
  • Bug fix: make SSE paths work with unaligned sources.
  • Bug fix: scale chroma correctly.
Old 7th April 2011, 09:12   #63  |  Link
SubJunk
Registered User
 
Join Date: Jun 2010
Posts: 442
Thanks a lot
Old 7th April 2011, 10:09   #64  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,375
Thank you!
I can't make RGB work, either input or output. I get a crash (my fault?):
crashinfo.txt

edit: that was in avs 2.57 MT winXP SP3
edit2: It worked if dither=false, so it's a dither thing in RGB.
__________________
i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread

Last edited by Dogway; 8th April 2011 at 06:33.
Old 7th April 2011, 13:00   #65  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Quote:
Originally Posted by Dogway View Post
Thank you!
I can't make RGB work, either input or output. I get a crash (my fault?):
crashinfo.txt
i can't download that file, for some reason mediafire gives me an invalid url. do you have a script that reproduces it?

edit: nevermind, i got it through another browser.

Last edited by PhrostByte; 7th April 2011 at 13:03.
Old 7th April 2011, 13:28   #66  |  Link
Yellow_
Registered User
 
Join Date: Sep 2009
Posts: 378
Is there anything to gain from implementing yesgrey's high-precision 3D LUT (yCMS) for colourspace conversions, used in conjunction with the necessary bits of tritical's t3dlut plugin?

http://forum.doom9.org/showthread.php?t=154719
Old 7th April 2011, 14:22   #67  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Quote:
Originally Posted by Yellow_ View Post
Is there anything to gain from implementing yesgrey's high-precision 3D LUT (yCMS) for colourspace conversions, used in conjunction with the necessary bits of tritical's t3dlut plugin?

http://forum.doom9.org/showthread.php?t=154719
Not sure, I'll check it out.

Actual colorspace conversion is the cheapest part of ResampleHQ right now -- resampling takes up a huge amount of time compared to it.

I'm currently experimenting with OpenCL, which should give a pretty nice speedup as is. It can also use 3D LUTs, so those could help too, assuming you've got enough VRAM.
Old 7th April 2011, 14:39   #68  |  Link
Archimedes
Registered User
 
Join Date: Apr 2005
Posts: 213
Thanks for the update. Hermite and Robidoux do not work (unsupported kernels).
Old 7th April 2011, 14:45   #69  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
Also "Spline32" (mentioned in the docs) does not work, but "Spline36" (undocumented) does
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊
Old 7th April 2011, 15:15   #70  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Quote:
Originally Posted by Archimedes View Post
Thanks for the update. Hermite and Robidoux do not work (unsupported kernels).
Fixed. You can use Bicubic with b/c set manually until I get a new version out.
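For anyone wanting to try the manual b/c workaround, the sketch below shows the general Mitchell-Netravali (BC-spline) cubic and the commonly cited b/c pairs for the named kernels. It is an illustration in Python, not ResampleHQ's code: `bc_kernel` and `PRESETS` are hypothetical names, and the Robidoux values follow ImageMagick's definition, so check the plugin documentation for the values it actually uses.

```python
import math

def bc_kernel(x, b, c):
    """Mitchell-Netravali cubic with parameters b and c; support is 2."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6.0
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6.0
    return 0.0

# Commonly cited (b, c) pairs. The Robidoux pair is the ImageMagick definition
# (assumption: the plugin may use slightly different constants).
PRESETS = {
    "Hermite": (0.0, 0.0),
    "Catmull-Rom": (0.0, 0.5),
    "Mitchell-Netravali": (1.0 / 3.0, 1.0 / 3.0),
    "Robidoux": (12.0 / (19.0 + 9.0 * math.sqrt(2.0)),
                 113.0 / (58.0 + 216.0 * math.sqrt(2.0))),
}

for name, (b, c) in PRESETS.items():
    print(f"{name}: b={b:.4f} c={c:.4f} k(0)={bc_kernel(0.0, b, c):.4f}")
```

Hermite is the b=0, c=0 case, so setting both to zero in a bicubic resizer should give the same kernel shape; Robidoux works out to roughly b=0.3782, c=0.3109.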

Quote:
Originally Posted by LoRd_MuldeR View Post
Also "Spline32" (mentioned in the docs) does not work, but "Spline36" (undocumented) does
Doh!

Quote:
Originally Posted by Dogway View Post
Thank you!
I can't make RGB work, either input or output. I get a crash (my fault?):
crashinfo.txt

edit: that was in avs 2.57 MT winXP SP3
Still trying to reproduce this. Strange crashdump, it includes instructions that don't exist in my filter, like movups.
Old 7th April 2011, 23:04   #71  |  Link
ganymede
Registered User
 
Join Date: Aug 2010
Location: Paris
Posts: 52
Quote:
Originally Posted by PhrostByte View Post
Mitchell–Netraveli
It's a typo: the name of the filter should be Mitchell–Netravali (from the names of Don P. Mitchell and Arun N. Netravali).
Old 9th April 2011, 00:11   #72  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Version 6. Fixes Dogway's crash and some other things.

Download for 32-bit and 64-bit:
http://sourceforge.net/projects/int6...6.zip/download

Full documentation:
http://svn.int64.org/viewvc/int64/re...l?revision=272

Changelog:
  • SSE Y'CbCr output conversions. All conversions now have SSE implementations.
  • Bug fix: correct rounding in SSE output conversions.
  • Bug fix: allocate dithering error buffers.
  • Bug fix: Hermite and Robidoux kernels are now enabled.
  • Bug fix: correct spelling of Spline36 and Mitchell–Netravali.
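As background on why a dithering filter needs error buffers at all, here is a small Python sketch of Floyd-Steinberg error diffusion. This is an assumption for illustration only: the thread does not say which dithering algorithm ResampleHQ uses, and `dither_to_8bit` is a hypothetical name. The point is that the quantization error of each pixel is stored and added to pixels processed later, so the buffers must exist before the first pixel is written.

```python
import math

def dither_to_8bit(image):
    """Quantize rows of floats in [0, 1] to ints in [0, 255] with error diffusion."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Two error buffers (current row and next row), padded by one element on
    # each side so that edge pixels need no special casing.
    cur_err = [0.0] * (width + 2)
    next_err = [0.0] * (width + 2)
    out = []
    for y in range(height):
        row_out = []
        for x in range(width):
            value = image[y][x] * 255.0 + cur_err[x + 1]
            q = min(max(math.floor(value + 0.5), 0), 255)
            err = value - q
            cur_err[x + 2] += err * 7 / 16    # pixel to the right
            next_err[x] += err * 3 / 16       # below left
            next_err[x + 1] += err * 5 / 16   # directly below
            next_err[x + 2] += err * 1 / 16   # below right
            row_out.append(q)
        out.append(row_out)
        cur_err, next_err = next_err, [0.0] * (width + 2)
    return out

flat = [[0.5] * 64 for _ in range(4)]   # 127.5 is not representable in 8 bits
dithered = dither_to_8bit(flat)         # mixes 127 and 128 to preserve the mean
```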
Old 9th April 2011, 00:36   #73  |  Link
SubJunk
Registered User
 
Join Date: Jun 2010
Posts: 442
Nice one, thanks
Old 10th April 2011, 18:43   #74  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Anyone have a shiny new Sandy Bridge CPU willing to run a lengthy benchmark? I wrote a bunch of AVX stuff but have no way of testing if it is actually any faster.
Old 13th April 2011, 08:07   #75  |  Link
Yellow_
Registered User
 
Join Date: Sep 2009
Posts: 378
I'm enjoying using your plugin and have a query rather than a bug report. When converting to RGB I see faint red/blue rectangular blocks along the edges of content in any chosen frame. The source is H.264/AVC from a Canon DSLR. I can put up a sample, but I guess the description is enough. I assume this is a result of chroma subsampling, moire, or line skipping. The blocks also appear to be non-square even though it's a square-pixel source.

So the query is: is there anything that can be done in the interpolation of chroma (I assume it's going 4:2:0 to 4:2:2 to RGB) that could subdue the alternating red/blue blocks? It looks like a chequerboard.

I notice this with other conversion methods to RGB too, including yCMS 3D LUT + tritical's t3dlut plugin.

Last edited by Yellow_; 13th April 2011 at 08:11.
Old 13th April 2011, 12:26   #76  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,827
Quote:
Anyone have a shiny new Sandy Bridge CPU willing to run a lengthy benchmark? I wrote a bunch of AVX stuff but have no way of testing if it is actually any faster.
I can test it. Have a SB 2500 CPU
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth
VapourSynth Portable FATPACK || VapourSynth Database
Old 13th April 2011, 15:39   #77  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Quote:
Originally Posted by Yellow_ View Post
I'm enjoying using your plugin and have a query rather than a bug report. When converting to RGB I see faint red/blue rectangular blocks along the edges of content in any chosen frame. The source is H.264/AVC from a Canon DSLR. I can put up a sample, but I guess the description is enough. I assume this is a result of chroma subsampling, moire, or line skipping. The blocks also appear to be non-square even though it's a square-pixel source.

So the query is: is there anything that can be done in the interpolation of chroma (I assume it's going 4:2:0 to 4:2:2 to RGB) that could subdue the alternating red/blue blocks? It looks like a chequerboard.

I notice this with other conversion methods to RGB too, including yCMS 3D LUT + tritical's t3dlut plugin.
I suspect it's caused by chroma bleeding into luma because of subsampling. Some colors right next to each other can react particularly badly. If you post an example image I can take a look.

Quote:
Originally Posted by ChaosKing View Post
I can test it. Have a SB 2500 CPU
Here it is. It'll create a "benchmark results.txt" once it's done, you can just paste the results here.

http://sourceforge.net/projects/int6...h.zip/download
Old 13th April 2011, 16:01   #78  |  Link
Dogway
Registered User
 
Join Date: Nov 2009
Posts: 2,375
I made some tests. I use an image I used once before because it has good features to test on like small details, big details, and some well defined letters.
From what I observed most of the scaling kernels show the ringing I described a few weeks back. So this is how I classify the different algorithms:

blurry: principally because of this
bilinear
robidoux

ringing:
lanczos4
spline64
spline36

good balance of both of them:
lanczos (best in my opinion)
blackman (2nd best)
spline16
catmull

I also included automttap3 for comparison. It was the one that retained the most detail in my last test, but I wouldn't use it for graphics or cartoons because it also creates some ugly ringing repetition. Also added spline64.
To tell you the truth, I was a bit disappointed concerning the ringing thing, even if that was what it was supposed to do :/
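The blurry-versus-ringing split can be reproduced numerically. The Python sketch below is illustrative only and is not the plugin's resampler: `upsample_step` is a hypothetical helper that enlarges a hard 0-to-1 edge. Kernels with negative lobes (Lanczos here) overshoot past the original range on both sides of the edge, which shows up as ringing, while a non-negative kernel (triangle/bilinear) cannot overshoot and only softens the edge.

```python
import math

def lanczos(x, taps=3):
    """Lanczos kernel: sinc(x) * sinc(x / taps) for |x| < taps."""
    if x == 0.0:
        return 1.0
    if abs(x) >= taps:
        return 0.0
    px = math.pi * x
    return taps * math.sin(px) * math.sin(px / taps) / (px * px)

def triangle(x):
    """Bilinear (triangle) kernel; never negative."""
    return max(0.0, 1.0 - abs(x))

def upsample_step(kernel, support, factor=4, n=16):
    """Upsample a 0-to-1 step edge of n samples by `factor` using `kernel`."""
    src = [0.0] * (n // 2) + [1.0] * (n // 2)
    out = []
    for j in range(n * factor):
        center = (j + 0.5) / factor - 0.5
        acc = wsum = 0.0
        for i in range(math.ceil(center - support), math.floor(center + support) + 1):
            w = kernel(i - center)
            acc += w * src[min(max(i, 0), n - 1)]  # clamp at the edges
            wsum += w
        out.append(acc / wsum)
    return out

ringing = upsample_step(lanczos, 3)
soft = upsample_step(triangle, 1)
print("lanczos3 range:", min(ringing), max(ringing))   # dips below 0 and exceeds 1
print("triangle range:", min(soft), max(soft))         # stays within 0..1
```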

Pack with the tests, .psd file and separated .png files
35Mb rar:
http://www.mediafire.com/?16a31o7v0im652o
Old 13th April 2011, 16:24   #79  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,827
Benchmark results
------------------

The AVX version seems to be very slow :/

Code:
RGB24 -> linear RGB C: (fastest/slowest/average) wall time speed is 6.59281/6.10231/6.51858 runs/second.
RGB24 -> linear RGB C relative speed: 1.0x
RGB24 -> linear RGB SSSE3: (fastest/slowest/average) wall time speed is 37.5498/33.94/36.5475 runs/second.
RGB24 -> linear RGB SSSE3 relative speed: 5.69557x
RGB24 -> linear RGB AVX: (fastest/slowest/average) wall time speed is 3.00409/2.71127/2.94455 runs/second.
RGB24 -> linear RGB AVX relative speed: 0.455661x
RGB32 -> linear RGB C: (fastest/slowest/average) wall time speed is 4.67475/3.83495/4.55645 runs/second.
RGB32 -> linear RGB C relative speed: 1.0x
RGB32 -> linear RGB SSE2: (fastest/slowest/average) wall time speed is 37.7303/35.7484/37.0366 runs/second.
RGB32 -> linear RGB SSE2 relative speed: 8.07107x
RGB32 -> linear RGB AVX: (fastest/slowest/average) wall time speed is 2.76705/2.44912/2.72385 runs/second.
RGB32 -> linear RGB AVX relative speed: 0.591913x
RGB32 -> linear RGBA C: (fastest/slowest/average) wall time speed is 4.55221/4.31145/4.49747 runs/second.
RGB32 -> linear RGBA C relative speed: 1.0x
RGB32 -> linear RGBA SSE2: (fastest/slowest/average) wall time speed is 36.1119/33.9868/35.8453 runs/second.
RGB32 -> linear RGBA SSE2 relative speed: 7.93283x
RGB32 -> linear RGBA AVX: (fastest/slowest/average) wall time speed is 2.85771/2.20617/2.79098 runs/second.
RGB32 -> linear RGBA AVX relative speed: 0.627764x
YV12 -> YUV C: (fastest/slowest/average) wall time speed is 165.108/157.79/163.724 runs/second.
YV12 -> YUV C relative speed: 1.0x
YV12 -> YUV SSE2: (fastest/slowest/average) wall time speed is 300.263/286.618/295.344 runs/second.
YV12 -> YUV SSE2 relative speed: 1.81859x
YV12 -> YUV AVX: (fastest/slowest/average) wall time speed is 19.5201/18.1788/19.3726 runs/second.
YV12 -> YUV AVX relative speed: 0.118226x
YUY2 -> YUV C: (fastest/slowest/average) wall time speed is 204.879/178.597/199.624 runs/second.
YUY2 -> YUV C relative speed: 1.0x
YUY2 -> YUV SSE2: (fastest/slowest/average) wall time speed is 424.132/361.432/417.757 runs/second.
YUY2 -> YUV SSE2 relative speed: 2.07016x
YUY2 -> YUV AVX: (fastest/slowest/average) wall time speed is 32.1951/28.7248/31.7467 runs/second.
YUY2 -> YUV AVX relative speed: 0.157142x
YUV -> linear RGB C: (fastest/slowest/average) wall time speed is 16.9974/12.2325/16.4998 runs/second.
YUV -> linear RGB C relative speed: 1.0x
YUV -> linear RGB SSE2: (fastest/slowest/average) wall time speed is 37.3566/35.0877/36.9554 runs/second.
YUV -> linear RGB SSE2 relative speed: 2.19778x
YUV -> linear RGB AVX: (fastest/slowest/average) wall time speed is 3.72789/3.47262/3.68511 runs/second.
YUV -> linear RGB AVX relative speed: 0.219321x
linear RGB -> RGB24 C: (fastest/slowest/average) wall time speed is 6.25605/5.79279/6.17731 runs/second.
linear RGB -> RGB24 C relative speed: 1.0x
linear RGB -> RGB24 SSSE3: (fastest/slowest/average) wall time speed is 40.4334/33.1002/39.0018 runs/second.
linear RGB -> RGB24 SSSE3 relative speed: 6.46308x
linear RGB -> RGB24 AVX: (fastest/slowest/average) wall time speed is 3.30754/2.9804/3.24551 runs/second.
linear RGB -> RGB24 AVX relative speed: 0.528695x
linear RGB -> RGB32 C: (fastest/slowest/average) wall time speed is 4.37648/4.11399/4.33744 runs/second.
linear RGB -> RGB32 C relative speed: 1.0x
linear RGB -> RGB32 SSE2: (fastest/slowest/average) wall time speed is 41.4194/40.0089/41.09 runs/second.
linear RGB -> RGB32 SSE2 relative speed: 9.46409x
linear RGB -> RGB32 AVX: (fastest/slowest/average) wall time speed is 3.57475/3.04053/3.43611 runs/second.
linear RGB -> RGB32 AVX relative speed: 0.816809x
linear RGBA -> RGB32 C: (fastest/slowest/average) wall time speed is 4.44825/4.36037/4.42751 runs/second.
linear RGBA -> RGB32 C relative speed: 1.0x
linear RGBA -> RGB32 SSE2: (fastest/slowest/average) wall time speed is 39.6152/36.8389/39.01 runs/second.
linear RGBA -> RGB32 SSE2 relative speed: 8.90578x
linear RGBA -> RGB32 AVX: (fastest/slowest/average) wall time speed is 3.31775/2.84886/3.2405 runs/second.
linear RGBA -> RGB32 AVX relative speed: 0.745854x
YUV -> YV12 C: (fastest/slowest/average) wall time speed is 155.964/138.181/153.442 runs/second.
YUV -> YV12 C relative speed: 1.0x
YUV -> YV12 SSE2: (fastest/slowest/average) wall time speed is 450.69/430.569/445.116 runs/second.
YUV -> YV12 SSE2 relative speed: 2.88972x
YUV -> YV12 AVX: (fastest/slowest/average) wall time speed is 15.0684/14.3686/14.9177 runs/second.
YUV -> YV12 AVX relative speed: 0.0966147x
YUV -> YUY2 C: (fastest/slowest/average) wall time speed is 201.52/167.831/199.426 runs/second.
YUV -> YUY2 C relative speed: 1.0x
YUV -> YUY2 SSE2: (fastest/slowest/average) wall time speed is 659.215/645.618/654.092 runs/second.
YUV -> YUY2 SSE2 relative speed: 3.2712x
YUV -> YUY2 AVX: (fastest/slowest/average) wall time speed is 21.4287/20.0173/21.215 runs/second.
YUV -> YUY2 AVX relative speed: 0.106335x
linear RGB -> YUV C: (fastest/slowest/average) wall time speed is 8.66521/8.32953/8.50003 runs/second.
linear RGB -> YUV C relative speed: 1.0x
linear RGB -> YUV SSE2: (fastest/slowest/average) wall time speed is 38.0747/33.7847/37.6462 runs/second.
linear RGB -> YUV SSE2 relative speed: 4.39398x
linear RGB -> YUV AVX: (fastest/slowest/average) wall time speed is 3.74787/3.71133/3.74012 runs/second.
linear RGB -> YUV AVX relative speed: 0.432519x
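For anyone wanting to post-process a results file like the one above, here is a small Python sketch. The parser and the names `LINE` and `relative_speeds` are hypothetical and not part of the benchmark tool. Judging from the numbers, the "relative speed" lines appear to be the ratio of each variant's fastest run to the fastest C run (for example 37.5498 / 6.59281 ≈ 5.6956), and the sketch recomputes them on that assumption.

```python
import re

SAMPLE = """\
RGB24 -> linear RGB C: (fastest/slowest/average) wall time speed is 6.59281/6.10231/6.51858 runs/second.
RGB24 -> linear RGB SSSE3: (fastest/slowest/average) wall time speed is 37.5498/33.94/36.5475 runs/second.
RGB24 -> linear RGB AVX: (fastest/slowest/average) wall time speed is 3.00409/2.71127/2.94455 runs/second.
"""

# One timing line: "<conversion> <variant>: (fastest/slowest/average) ... a/b/c runs/second."
LINE = re.compile(
    r"^(?P<name>.+) (?P<variant>\S+): \(fastest/slowest/average\) wall time speed is "
    r"(?P<fast>[\d.]+)/(?P<slow>[\d.]+)/(?P<avg>[\d.]+) runs/second\.$"
)

def relative_speeds(text):
    """Map conversion name -> {variant: fastest runs/second relative to the C path}."""
    fastest = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            fastest.setdefault(m["name"], {})[m["variant"]] = float(m["fast"])
    return {
        name: {variant: speed / runs["C"] for variant, speed in runs.items()}
        for name, runs in fastest.items()
        if "C" in runs
    }

for name, variants in relative_speeds(SAMPLE).items():
    for variant, ratio in variants.items():
        print(f"{name} {variant}: {ratio:.5f}x")
```

Read this way, every AVX path in the listing runs slower than the plain C path, while the SSE2/SSSE3 paths are roughly 2x to 9x faster.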
Old 13th April 2011, 17:39   #80  |  Link
PhrostByte
Grand Fruitioner
 
PhrostByte's Avatar
 
Join Date: Mar 2004
Location: Chicago, IL
Posts: 115
Quote:
Originally Posted by Dogway View Post
I made some tests. I use an image I used once before because it has good features to test on like small details, big details, and some well defined letters.
From what I observed most of the scaling kernels show the ringing I described a few weeks back. So this is how I classify the different algorithms:
Thanks for the analysis. My own eyeballed tests tend to agree: I've seen the best downsizing results with Lanczos2/3 and Spline16.

Quote:
Originally Posted by ChaosKing View Post
Benchmark results
------------------

The AVX version seems to be very slow :/
Very bizarre results indeed. I'll have to look things over closely, because those can't be correct.