Old 20th November 2016, 12:01   #101  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 2,437
Now we need a revised masktools
__________________
@turment on Telegram
Old 20th November 2016, 12:16   #102  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,582
Quote:
Originally Posted by pinterf View Post
New release:
  • MvTools2 2.7.5.22 (20161119)
    General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
    with new MDegrain4 and MDegrain5 filters.
  • Depan: 2.13.1 (20161119)
  • DepanEstimate: 2.10 (20161119)
    General support of 10-16 bit formats with Avisynth Plus
    Compiled for 64 bit for the first time
    Supporting YV16 and YV24 for 8 bit and their 10+ bits equivalents
    Requires 2.6 interface

Download:
MvTools 2.7.5.22 + Depan + DepanEstimate for 32 and 64 bits

For general info see readme or the first page of the thread.
Have fun.

Todo:
  • greyscale support (not tested)
  • Port remaining inline assembly to SIMD intrinsics
  • satd 16 bit to SIMD
  • fix reported bugs
Energetic as always

Does this Depan release fix this bug (it turns the video green)?
__________________
See My Avisynth Stuff
Old 20th November 2016, 14:47   #103  |  Link
Reel.Deel
Registered User
 
Join Date: Mar 2012
Location: Texas
Posts: 1,655
Quote:
Originally Posted by pinterf View Post
New release:
  • MvTools2 2.7.5.22 (20161119)
    General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
    with new MDegrain4 and MDegrain5 filters.
  • Depan: 2.13.1 (20161119)
  • DepanEstimate: 2.10 (20161119)
    General support of 10-16 bit formats with Avisynth Plus
    Compiled for 64 bit for the first time
    Supporting YV16 and YV24 for 8 bit and their 10+ bits equivalents
    Requires 2.6 interface

Download:
MvTools 2.7.5.22 + Depan + DepanEstimate for 32 and 64 bits

For general info see readme or the first page of the thread.
Have fun.

Todo:
  • greyscale support (not tested)
  • Port remaining inline assembly to SIMD intrinsics
  • satd 16 bit to SIMD
  • fix reported bugs
Awesome work as always, thanks pinterf!

Quote:
Originally Posted by real.finder View Post
Does this Depan release fix this bug (it turns the video green)?
I think Fizick fixed it in the latest release, see here: http://forum.doom9.org/showthread.ph...63#post1763563

Quote:
Originally Posted by tormento View Post
Now we need a revised masktools
A while back tp7 started working on a 16-bit MaskTools, unfortunately it was not finished. See here: https://github.com/tp7/masktools/commits/16bit

Maybe someone will come along and finish it. Another route is porting VapourSynth's Expr and co.
Old 20th November 2016, 15:38   #104  |  Link
feisty2
I'm Siri
 
 
Join Date: Oct 2012
Location: void
Posts: 2,633
Someday, maybe there will be a complete version of MVTools that merges all the MVTools variations into one:
a single binary that serves as both a VapourSynth plugin and an AviSynth plugin, supporting bit depths from 8 to 32 and an arbitrary temporal radius.
Old 20th November 2016, 16:43   #105  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,582
Quote:
Originally Posted by Reel.Deel View Post


I think Fizick fixed it in the latest release, see here: http://forum.doom9.org/showthread.ph...63#post1763563
I missed this update...
__________________
See My Avisynth Stuff
Old 30th November 2016, 14:39   #106  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,582
MvTools2 2.7.5.22d is slower than mvtools2 2.7.0.22d

~1.7 fps vs ~1.5 fps with the same complex script

Is this because it was built without ICC?
__________________
See My Avisynth Stuff
Old 30th November 2016, 15:04   #107  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by real.finder View Post
MvTools2 2.7.5.22d is slower than mvtools2 2.7.0.22d

~1.7 fps vs ~1.5 fps with the same complex script

Is this because it was built without ICC?
No. It's because ill-tempered fairies have invaded your computer and are eating CPU cycles.

If you post the script that causes this behaviour, and mention what CPU you have, you'll probably get a better answer.
__________________
Groucho's Avisynth Stuff
Old 30th November 2016, 17:14   #108  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,266
Quote:
Originally Posted by real.finder View Post
MvTools2 2.7.5.22d is slower than mvtools2 2.7.0.22d

~1.7 fps vs ~1.5 fps with the same complex script

Is this because it was built without ICC?
Yes.
Since then I have found some bottlenecks where using __forceinline helped poor VS2015.

Testing on a typical MDegrain3 script:
2.7.5.22: 4.13 fps (VS2015)
2.7.0.22d: 4.69 fps
2.7.futu.re: 4.62 fps (VS2015)

Promising.
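
For readers unfamiliar with the technique, here is a minimal sketch of forcing a small hot helper to be inlined. It is not taken from the MvTools source: the macro name MV_FORCEINLINE and the functions abs_diff and sad_row are hypothetical, chosen only for illustration. __forceinline is the MSVC/ICL keyword; GCC and Clang use the always_inline attribute instead.

```cpp
#include <cassert>
#include <cstdint>

// Portable "force inline" macro (hypothetical name).
// MSVC and ICL on Windows understand __forceinline; GCC/Clang use an attribute.
#if defined(_MSC_VER)
#  define MV_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define MV_FORCEINLINE inline __attribute__((always_inline))
#else
#  define MV_FORCEINLINE inline
#endif

// Hypothetical hot helper: absolute difference of two 8-bit pixels.
// It is called once per pixel, so call overhead would dominate if the
// compiler decided not to inline it.
static MV_FORCEINLINE unsigned abs_diff(std::uint8_t a, std::uint8_t b)
{
    return a > b ? unsigned(a - b) : unsigned(b - a);
}

// Hypothetical caller: sum of absolute differences over one row of pixels.
unsigned sad_row(const std::uint8_t* src, const std::uint8_t* ref, int width)
{
    assert(width >= 0);
    unsigned sum = 0;
    for (int x = 0; x < width; ++x)
        sum += abs_diff(src[x], ref[x]);
    return sum;
}
```

Whether forcing the inline actually helps depends on the compiler and the call site, so it is worth confirming with a profiler before and after.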
Old 30th November 2016, 17:26   #109  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by pinterf View Post
Testing on a typical MDegrain3 script:
2.7.5.22: 4.13 fps (VS2015)
2.7.0.22d: 4.69 fps
What CPU do you use for these tests? Also, what switches for ICC?
__________________
Groucho's Avisynth Stuff
Old 30th November 2016, 17:39   #110  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,266
i7 Ivy Bridge.
The ICC build targeted SSE2 with optional SSE4.1 code paths.
Old 30th November 2016, 17:51   #111  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,582
Quote:
Originally Posted by pinterf View Post
Yes.
Since then I have found some bottlenecks where using __forceinline helped poor VS2015.

Testing on a typical MDegrain3 script:
2.7.5.22: 4.13 fps (VS2015)
2.7.0.22d: 4.69 fps
2.7.futu.re: 4.62 fps (VS2015)

Promising.
Why did you stop using ICL in the latest version? For AMD users?
__________________
See My Avisynth Stuff
Old 30th November 2016, 18:06   #112  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by pinterf View Post
Icc build was for sse2 with optional sse4.1 code paths
Well, that's just one switch (QaxNNN). From my experience with Intel compilers, there are at least 5-7 other switches that can affect performance. Just a little selection:
Code:
/O3    optimize for maximum speed and enable more aggressive optimizations

/Qipo[n]  enable multi-file IP optimization between files

/Qunroll-aggressive

/Qopt-ra-region-strategy[:<keyword>]
          select the method that the register allocator uses to partition each
          routine into regions
            routine - one region per routine
            block   - one region per block
            trace   - one region per trace
            loop    - one region per loop
            default - compiler selects best option

/Qprof  profiling
__________________
Groucho's Avisynth Stuff
Old 30th November 2016, 18:08   #113  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,266
I'll check it later; I'm on mobile right now.
Old 30th November 2016, 18:23   #114  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by pinterf View Post
Icc build was for sse2 with optional sse4.1 code paths
One more thing about the automatic CPU dispatcher (enabled with QaX...) in the Intel compiler: this could actually have an impact on AMD CPUs, since I suspect that even with the latest incarnation of the compiler it may choose sub-optimal code paths for those.
If I build for specific instruction sets, I always "hard-code" them by using "Qx..." instead of "QaX...". This way all CPUs use the same code path. That of course means building binaries for each instruction set.
I also recommend testing if the SSE4.x or even AVX options really make a difference. More often than not, they don't. As usual, it all depends on the code.
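
To make the distinction concrete, here is a rough sketch of what runtime dispatch means when done by hand, as opposed to a binary that always runs one fixed code path. It does not reproduce the Intel dispatcher. All names (cpu_has_sse41, sum_baseline, sum_sse41_standin, select_sum) are hypothetical, and both kernels are plain C++ stand-ins rather than real SIMD code; only the selection mechanism is the point. The feature test reads CPUID leaf 1, ECX bit 19 (SSE4.1), regardless of CPU vendor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>
#endif

// Returns true if the CPU reports SSE4.1 support, independent of vendor.
bool cpu_has_sse41()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 1);                       // leaf 1: feature flags
    return (regs[2] & (1 << 19)) != 0;      // ECX bit 19 = SSE4.1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;                           // unknown platform: baseline path
#endif
}

using SumFn = std::uint32_t (*)(const std::uint8_t*, std::size_t);

// Baseline kernel (stand-in for an SSE2 code path).
std::uint32_t sum_baseline(const std::uint8_t* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}

// Stand-in for an SSE4.1 code path: unrolled by four, same result.
std::uint32_t sum_sse41_standin(const std::uint8_t* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    std::uint32_t s = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; ++i)                      // leftover elements
        s += p[i];
    return s;
}

// Pick the kernel once; callers then go through the function pointer.
SumFn select_sum(bool has_sse41)
{
    return has_sse41 ? sum_sse41_standin : sum_baseline;
}
```

A typical use would be to call select_sum(cpu_has_sse41()) once at filter creation and keep the returned pointer. A "hard-coded" build skips the selection entirely and simply requires the instruction set it was compiled for.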
__________________
Groucho's Avisynth Stuff
Old 1st December 2016, 11:57   #115  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,261
Quote:
Originally Posted by Groucho2004 View Post
If I build for specific instruction sets, I always "hard-code" them by using "Qx..." instead of "QaX...". This way all CPUs use the same code path. That of course means building binaries for each instruction set.
Oh... You too.

I didn't know some of these options:
Code:
/Qunroll-aggressive
/Qopt-ra-region-strategy[:<keyword>]
I'll have to check them, even if... "aggressive". Just seeing the word makes me a little suspicious.
Old 1st December 2016, 12:15   #116  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by jpsdr View Post
Oh... You too.

Didn't know some of the options :
Code:
/Qunroll-aggressive
/Qopt-ra-region-strategy[:<keyword>]
I'll have to check them, even if... "aggressive". Just seeing the word makes me a little suspicious.
Don't worry.

I suppose my way of building binaries differs a lot from what everyone else does. I don't use the IDE, I create makefiles for my projects and build from the command line with batch files. The makefiles have easily accessible compiler and linker options so I can quickly change them, rebuild and test.

As for the full list of compiler options: run "icl -help" and redirect the output into a text file. And there are of course the Intel docs that come with the compiler, which have lots of material on optimization (which almost nobody reads, I guess).
__________________
Groucho's Avisynth Stuff
Old 1st December 2016, 14:44   #117  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,266
Quote:
Originally Posted by Groucho2004 View Post
One more thing about the automatic CPU dispatcher (enabled with QaX...) in the Intel compiler: this could actually have an impact on AMD CPUs, since I suspect that even with the latest incarnation of the compiler it may choose sub-optimal code paths for those.
If I build for specific instruction sets, I always "hard-code" them by using "Qx..." instead of "QaX...". This way all CPUs use the same code path. That of course means building binaries for each instruction set.
I also recommend testing if the SSE4.x or even AVX options really make a difference. More often than not, they don't. As usual, it all depends on the code.
Not to mention the alignment hints, hints for typical loop size, etc.

In the end I couldn't find my old ICC settings. Loop unrolling was certainly left at its default, so it was not fine-tuned, but I've seen unrolled loops in the asm output (I sometimes check the asm code that compilers generate). Global optimization was on, and so was maximum optimization.

To win back much of the lost speed, I used VS2015's performance profiler, which showed me where the code spends most of its time. Then I forced those functions to be inlined.

I have made other optimizations as well, so an ICC build of the current codebase might also be faster.

An interesting article on the optimizer changes that came with VS2015 Update 3:
https://blogs.msdn.microsoft.com/vcb...ode-optimizer/
Old 1st December 2016, 17:42   #118  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
There are also oddball cases where the compiler options for max. speed have the opposite effect. I have a couple of programs where turning off automatic inlining or even using "O1" instead of "O2/O3" results in faster binaries. Always test, if possible on Intel and AMD CPUs.
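
A minimal timing harness for this kind of A/B test could look like the sketch below. It is only an illustration: best_time_ms and frames_per_second are hypothetical helpers, and taking the best of several runs is one common convention, not the only valid one.

```cpp
#include <cassert>
#include <chrono>

// Runs `fn` `runs` times and returns the fastest wall-clock time in
// milliseconds. Taking the minimum reduces noise from other processes.
template <typename Fn>
double best_time_ms(Fn&& fn, int runs)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        fn();                                   // the workload under test
        const auto t1 = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

// Converts a frame count and elapsed time into frames per second.
double frames_per_second(int frames, double elapsed_ms)
{
    return elapsed_ms > 0.0 ? frames * 1000.0 / elapsed_ms : 0.0;
}
```

Build the same workload once per set of compiler options, run each binary through the same harness on every CPU you care about, and compare the resulting fps figures.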
__________________
Groucho's Avisynth Stuff
Old 4th December 2016, 11:06   #119  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,261
Quote:
Originally Posted by Groucho2004 View Post
And there are of course the Intel docs that come with the compiler which have lots of stuff about optimization (which almost nobody reads, I guess).
Yes, I've read it... a long time ago.
Old 4th December 2016, 21:01   #120  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,266
New version: 2.7.6.22
Depan and DepanEstimate are unchanged.

https://github.com/pinterf/mvtools/r...s/tag/2.7.6.22

Change log
2.7.6.22 (20161204) - fixes and speedup
  • fix: sumLumaChange underflow (used for dct=2,6,9) (regression during 16 bit support)
  • fix: MeanLumaChange scale for 10-16 bits (used for dct=2,6,9)
  • fix: Mask fix: 8 bit mask resizer bug in SIMD intrinsics - Thx real.finder
    (regression on inline asm -> SIMD transition)
  • Fix: dctmode=1,2: pixel distance was not corrected for 16 bit pixel sizes
  • speed: Let's help VS2015 with some __forceinline directives to recognize the truth.
  • speed: Misc optimizations throughout the code (bit shifts instead of div or mul)
  • speed: FFTW DCT: C code replaced with SIMD SSE2/SSE4 (FloatToBytes, BytesToFloat)
  • speed: 16 bit SAD: a few optimizations in SSE2, AVX-coded SSE2 and AVX2 codepath
  • VS2015 compiler: /MT -> /MD (from static to dynamic dlls - now it really needs VS2015 redistributables)
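
As an illustration of the "bit shifts instead of div or mul" item, here is a small sketch of the general idea. It is not code from the MvTools source; the helper names are hypothetical. For unsigned values a shift by k is the same as multiplying or dividing by 2^k. For signed values a plain right shift rounds toward negative infinity while / rounds toward zero, so a bias is needed to get identical results.

```cpp
#include <cassert>
#include <cstdint>

// This sketch assumes an arithmetic right shift for negative values
// (guaranteed from C++20, and the behaviour of mainstream compilers before).
static_assert((-2 >> 1) == -1, "arithmetic right shift expected");

// Unsigned: v / 2^k and v * 2^k are plain shifts.
constexpr std::uint32_t div_pow2(std::uint32_t v, unsigned k) { return v >> k; }
constexpr std::uint32_t mul_pow2(std::uint32_t v, unsigned k) { return v << k; }

// Signed: add (2^k - 1) to negative inputs first so that the shift
// rounds toward zero, matching the result of `v / (1 << k)`.
constexpr std::int32_t sdiv_pow2(std::int32_t v, unsigned k)
{
    return (v + ((v < 0) ? ((std::int32_t(1) << k) - 1) : 0)) >> k;
}

// Hypothetical helper: scale a value defined for 8-bit video to a higher
// bit depth (10 to 16 bits) by shifting instead of multiplying.
inline std::uint32_t scale_from_8bit(std::uint32_t value8, int bits_per_pixel)
{
    assert(bits_per_pixel >= 8 && bits_per_pixel <= 16);
    return value8 << (bits_per_pixel - 8);
}
```

Modern compilers usually perform this transformation themselves when the divisor is a compile-time constant, so the manual version mainly pays off when the shift count is only known at run time.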

Last edited by pinterf; 4th December 2016 at 21:03.