Old 27th February 2010, 05:00   #41  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
Thanks. mt_edge defaults work now, and every example usage of masktools I have seems to perform as expected. Please look into fixing the temporalsoften bug, as temporalsoften is (ab)used fairly often in scripts.
Old 27th February 2010, 07:32   #42  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Alright, I think I've got that temporal soften bug squashed. Test it and let me know if there are any issues.

I have TempGaussMC running . . . I'm showing some significant speed differences. Anyone care to try their hand at it? I've got a four-core machine, and telling AviSynth to use ~8 threads seems to be the sweet spot for processor utilization for me.

Last edited by JoshyD; 27th February 2010 at 08:53.
Old 27th February 2010, 11:34   #43  |  Link
aegisofrime
Registered User
 
Join Date: Apr 2009
Posts: 452
Quote:
Originally Posted by JoshyD
Alright, I think I've got that temporal soften bug squashed. Test it and let me know if there are any issues.

I have TempGaussMC running . . . I'm showing some significant speed differences. Anyone care to try their hand at it? I've got a four-core machine, and telling AviSynth to use ~8 threads seems to be the sweet spot for processor utilization for me.
I want to try it! Which files are needed? I'm clueless about running 64-bit Avisynth.

JoshyD I take it you have a Core i7 machine? I wanna see what my Phenom II 955 gets.

Last edited by aegisofrime; 27th February 2010 at 11:37.
Old 27th February 2010, 16:37   #44  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
I can confirm that TGMC/EEDI2 works now.

@aegis: You want avs64, eedi2_64, mvtools2_64, mt_masktools64, removegrain64, and repair64.

Things left to do:
1) Port changes to avs 2.6
2) Implement RGB code
3) Port more plugins
4) Clean up code

Last edited by Stephen R. Savage; 27th February 2010 at 16:41.
Old 27th February 2010, 18:46   #45  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage
I can confirm that TGMC/EEDI2 works now.

@aegis: You want avs64, eedi2_64, mvtools2_64, mt_masktools64, removegrain64, and repair64.

Things left to do:
1) Port changes to avs 2.6
2) Implement RGB code
3) Port more plugins
4) Clean up code
@aegis
Stephen's spot on about what to get. The directions for installing the 64-bit DLL are pretty straightforward and are in the readme that accompanies the Avisynth64 archive. If you hit any headaches, come back and we (myself and whoever else cares to pitch in; I bet Stephen wouldn't mind) will be as helpful as possible.

@Stephen
Your list of todo's is quite well put. I wanted to ask you if you saw any speed improvement under the 64bit environment. I went from encoding ~10fps on a 720x480 interlaced dv source to ~20fps. I feel like that's too big of a leap and I may have screwed something up along the line. If your numbers are anything similar, I can rest a little easier.




Code clean up is a big concern . . . it's kind of a mess in there, especially the hack and slash that was done to get the inline assembler in Avisynth to compile. I would really like to rip the assembly out to standalone ASM files and optimize it specifically for x64 to truly take advantage of the extra registers all around. I think there's some extra performance to eke out if that can be done. None of the current inline ASM uses the extra registers at our disposal. In addition, a lot of it uses the MMX register set when we could easily use the SSE registers, which are twice as large and twice as numerous (8 64-bit MMX registers vs. 16 128-bit XMM (SSE) registers).
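To put rough numbers on the register comparison, here is a small back-of-the-envelope sketch. The register counts and widths are the ones quoted in the paragraph; the 720-byte row (one 8-bit luma row of the 720x480 DV source) and the idea of counting one register-sized chunk per load are simplifying assumptions for illustration, not a model of the actual filter code.

```python
# Back-of-the-envelope comparison of the MMX and XMM (SSE) register files on x64.
# Counts and widths are taken from the post; everything else is an assumption.
MMX = {"count": 8, "bits": 64}
XMM = {"count": 16, "bits": 128}


def register_file_bytes(regs):
    """Total number of bytes the whole register file can hold at once."""
    return regs["count"] * regs["bits"] // 8


def chunks_per_row(regs, row_bytes):
    """Register-sized chunks needed to cover one row, rounded up."""
    bytes_per_register = regs["bits"] // 8
    return -(-row_bytes // bytes_per_register)  # ceiling division


ROW_BYTES = 720  # assumption: one row of 8-bit luma from a 720x480 frame

for name, regs in (("MMX", MMX), ("XMM", XMM)):
    print(
        f"{name}: {register_file_bytes(regs)} bytes of register space, "
        f"{chunks_per_row(regs, ROW_BYTES)} chunks per {ROW_BYTES}-byte row"
    )
```

Under these assumptions the XMM set holds four times as much data in registers and needs half as many chunks per row, which is the headroom the standalone x64 assembly would be trying to use.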

Some of the existing code can be directly dropped in to what's in Avisynth 2.6, while other portions of the project have gone to Softwire for dynamic assembly generation. That's just a pain to deal with, I would have rather seen assembly generated dynamically via macros and such as Masktools did. This would mean generating every function needed before runtime, leading to possibly larger code, but removes the dependence on Softwire, which is no longer officially maintained.

Seriously, I'm really impressed with the way the Masktools authors implemented their assembly routines. It obscures a lot of the function generation via a TON of macros, but with some architecture specific aliasing of registers/variables, the codebase can be compiled for both x86 and x64 without too much of a headache. That's almost the state I have it in now, an hour of coding and the source would compile for either architecture. It's also compiler agnostic (for the most part, there were some pieces of C++ that ICC and GCC don't agree with, which were coded around in my release-->most compilers like their function templates outside of *.cpp source files). If any Masktools authors care, I can give you the source I made, and point you to where the more strict compilers complain.

The RGB code should be back in the main DLL without too much work, I think I've almost got it taken care of, hopefully we'll see it back by the end of the week. It just wasn't on the top of my priority list, I wanted to get TempGaussMC up and going before anything else.

The script produces awesome results for restoring my old home movies, but was painfully slow compared to a quick noise removal / deinterlace with some of the other plugins (namely a quick run through FFT3DFilter and MVDegrain coupled with yadifmod/nnedi2). The x64 results appear visually identical, so I don't think any steps are accidentally skipped in the x64 code path. I'm racking my brain for differences outside of architecture, but can't think of any as of yet.

That being said, working with the same samples, I can literally encode almost twice as fast when working under a pure 64bit environment. This makes re-encoding 24 hours of raw dv video (thank you HDD's in the TB+ range) seem like a practical batch job.
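As a rough illustration of why the speedup matters for a batch job of that size, the sketch below estimates wall-clock time for 24 hours of footage at the ~10 fps and ~20 fps figures mentioned earlier in this post. It assumes NTSC DV at about 29.97 frames per second and that the fps figures count source frames; if they count bobbed (double-rate) output frames instead, the totals double.

```python
# Rough wall-clock estimate for a batch encode at a given processing speed.
SOURCE_FPS = 30000 / 1001  # assumption: NTSC DV, about 29.97 frames per second
FOOTAGE_HOURS = 24         # amount of raw DV video mentioned in the post


def encode_hours(footage_hours, source_fps, encode_fps):
    """Hours needed to process the footage at encode_fps frames per second."""
    total_frames = footage_hours * 3600 * source_fps
    return total_frames / encode_fps / 3600


for label, fps in (("32-bit, ~10 fps", 10.0), ("64-bit, ~20 fps", 20.0)):
    hours = encode_hours(FOOTAGE_HOURS, SOURCE_FPS, fps)
    print(f"{label}: about {hours:.1f} hours")
```

With these assumptions the job drops from roughly 72 hours to roughly 36 hours.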

I've grabbed the 2.6 branch from the CVS periodically, and it's not stable for me in x86 land as of yet. The code compiles, and starts to serve the video stream, but output will just drop to zero a few hundred frames into an AVI encode. When the project matures to a point that the authors feel comfortable calling it a beta release, I want to go to work. A lot of the code can just be "dropped in" while I'd like to create a more elegant solution to replace Softwire generated assembly. This would make the whole package a bit more future proof. We're going to see x64 stick around for a while, it makes sense to start to use the features it offers. I personally have 4 x64 machines floating around, while clinging to an old x86 P4 (northwood 2.6ghz) that I use to mess around with various linux distros.

Final note: Both GCC and ICC support iterative compile-->runtime profiling of code-->profile based optimization. None of that has been done for any of these releases. When it's all a bit more stable, I'd like to allow some time for this. The guys at doom10 recommend this approach when doing your own x264 compiles.
Old 27th February 2010, 19:36   #46  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by JoshyD
Final note: Both GCC and ICC support iterative compile-->runtime profiling of code-->profile based optimization. None of that has been done for any of these releases. When it's all a bit more stable, I'd like to allow some time for this. The guys at doom10 recommend this approach when doing your own x264 compiles.
this seems like you're trying to imply that MSVC does not have a profiling and optimization feature like ICL/ICC/GCC,
which is untrue as MSVC does have PGO (Profile Guided Optimization)...
i recall the feature only being in the professional/team versions and not the express/free ones though.

x264 is easier to profile as it's an executable, so you only need to provide a file to work with and it can profile.

avisynth profiling would be more complex as you need to supply both a script and a program to profile avisynth with...
not to mention going through all the code paths would also be a pain.
Old 27th February 2010, 19:37   #47  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
I think a key problem is a lack of ways to benchmark AVS64. I use avs2avi for standard AVS, but there is no analogue for AVS64. For example, if you benchmark against x264, the results are skewed by the 64-bit advantage for x264. If you normalize by using avs2yuv for the 32-bit Avisynth, you still have to factor in the speedup by cutting out the piping overhead.

That said, here are some benchmarks. All results were taken using x264 64-bit in q=51, preset=ultrafast. 32-bit AVS was fed using avs2yuv, which should have negligible performance cost when single-threading.

Edit: Performance benchmarks have been redone using avs2avi.

TempGaussMC/EEDI2
32-bit: 2.83 fps
64-bit: 3.01 fps

MDeGrain3
32-bit: 6.20 fps
64-bit: 6.90 fps

AAA (mt_masktools, EEDI2)
32-bit: 1.92 fps
64-bit: 0.63 fps (SetMTMode: 1.94 fps)

Didee's Edge Mask
32-bit: 80.16 fps
64-bit: 89.02 fps

EEDI2 Resize2x
32-bit: 5.59 fps
64-bit: 5.67 fps

None of these cases came out bit-exact. I am a bit confused as to why AAA() comes out so much slower when all the components are faster.
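For easier comparison, the figures above can be turned into relative changes. This is only a convenience sketch over the numbers already listed; the names `BENCHMARKS` and `relative_change` are made up for illustration, and the SetMTMode result for AAA is left out.

```python
# Frames-per-second figures copied from the benchmark list above:
# (32-bit fps, 64-bit fps) per test case.
BENCHMARKS = {
    "TempGaussMC/EEDI2": (2.83, 3.01),
    "MDeGrain3": (6.20, 6.90),
    "AAA (mt_masktools, EEDI2)": (1.92, 0.63),
    "Didee's Edge Mask": (80.16, 89.02),
    "EEDI2 Resize2x": (5.59, 5.67),
}


def relative_change(fps32, fps64):
    """Percent change of the 64-bit result relative to the 32-bit one."""
    return (fps64 - fps32) / fps32 * 100.0


changes = {name: relative_change(a, b) for name, (a, b) in BENCHMARKS.items()}

for name, percent in changes.items():
    print(f"{name:28s} {percent:+6.1f}%")
```

This works out to roughly +6%, +11%, -67%, +11% and +1% respectively, which makes the AAA result stand out as the only regression.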

Edit: I have traced the AAA slowdown to the following code fragment:

Code:
input = DirectShowSource("640x480p30.xvid.avi")

ox = width(input)
oy = height(input)

aa = TurnRight(input).EEDI2(field=1).TurnLeft().EEDI2(field=1)

edge = mt_logic(mt_edge(aa, "5 10 5 0 0 0 -5 -10 -5 4", 0, 255, 0, 255),
	\ mt_edge(aa, "5 0 -5 10 0 -10 5 0 -5 4", 0, 255, 0, 255), "max").Greyscale().
	\ Levels(0, 0.8, 128, 0, 255, false).Spline36Resize(ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
ds = Spline36Resize(aa, ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
maskmerge = mt_merge(input, ds, edge, U=1, V=1)
MergeChroma(ds)
I think it is a cache-related bug, because none of the individual pieces is slower.

Last edited by Stephen R. Savage; 3rd March 2010 at 00:49. Reason: Update benchmarks
Old 27th February 2010, 21:47   #48  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by kemuri-_9
this seems like you're trying to imply that MSVC does not have a profiling and optimization feature like ICL/ICC/GCC,
which is untrue as MSVC does have PGO (Profile Guided Optimization)...
i recall the feature only being in the professional/team versions and not the express/free ones though.

x264 is easier to profile as it's an executable, so you only need to provide a file to work with and it can profile.

avisynth profiling would be more complex as you need to supply both a script and a program to profile avisynth with...
not to mention going through all the code paths would also be a pain.
Oh, I know that MSVC does profile, but GCC and ICC are just "better" compilers. If anything, at least give GCC credit for being free, cross platform, having auto parallelization and auto vectorization, and always being up to date. That's a lot for a community driven project. It consistently produces code similar in quality to the much more expensive and commercially developed ICC, and VS2008's compiler is pretty dated, I believe. It will still compile some really questionable coding practices without so much as a warning. I haven't looked at the release candidate for 2010 yet though.

Note:
Compilers are stupid, I've written them in the past. Consider them akin to something like Google's translator. The more straightforward you are with the language, the easier it is for a compiler to understand. Hence, I don't like VS2008's lax rules for C++ syntax. Assembly is pretty much as straightforward as it comes, but inlining it into C++ functions is generally a bad idea. You tell the compiler "make me some machine code from this high level language" and you give it machine code wrapped in high level language? Why translate the same sentence twice??

And that ends my rant . . . sorry, back to DLL optimization.

A dll is essentially an executable as well, it just can be dynamically tagged on to whatever program needs it at the time. It contains a set of functions that are then made available to the process that utilizes it. As avisynth is a frame server, it has one purpose, to provide video and audio to the calling program. The entry point into avisynth's dll and subsequent function calls internally shouldn't depend on the program.

The drawback to dll profiling is that you can only profile functions in Avisynth that are called internally by Avisynth. The compiler doesn't know what the external program is doing to the registers, memory, etc, so no profile can be created for any externally called functions. Things like internal filters will behave the same each time they are called, because external programs cannot call something like a resize function directly. It is my understanding that useful profiles can be generated for these. The machine state is going to be known prior to each internal function call.

It would DEFINITELY be painful to try and get every code path to execute on avisynth, while also providing a number of test videos to stress them.

Just feeding working scripts with strings of useful functions (pseudo-random perl avs generation anyone?) into x264 repeatedly at the command line would hit a lot of the code paths. Random test vector generation has been useful for me in the past, it may fail miserably here, but if I get the time I may try it out. Hey, it's my own time to waste, I'm not suggesting anyone even think about it.
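The pseudo-random script idea could be sketched along these lines, in Python rather than Perl. Everything specific here is an assumption for illustration: the filter pool is a small hand-picked list of built-in AviSynth filters, `test.avi` is a placeholder source, and the helper names are made up. Each generated line relies on AviSynth's implicit `last` clip, and some combinations may still be rejected for a given colorspace, which would itself be worth logging.

```python
import random

# Assumption: a small, hand-picked pool of built-in AviSynth filter calls.
FILTER_POOL = [
    "Blur(1.0)",
    "Sharpen(0.5)",
    "FlipVertical()",
    "FlipHorizontal()",
    "TemporalSoften(2, 4, 8)",
    "BilinearResize(640, 480)",
    "Spline36Resize(720, 480)",
]


def random_script(rng, source='AVISource("test.avi")', length=4):
    """Return AviSynth script text: a source line followed by random filters."""
    lines = [source] + [rng.choice(FILTER_POOL) for _ in range(length)]
    return "\n".join(lines) + "\n"


def generate_scripts(count, seed=0, length=4):
    """Generate `count` scripts; a fixed seed makes a failing case reproducible."""
    rng = random.Random(seed)
    return [random_script(rng, length=length) for _ in range(count)]


for index, text in enumerate(generate_scripts(3, seed=2010)):
    print(f"# --- script {index} ---")
    print(text)
```

Seeding the generator matters more than the filter list: if a generated script crashes or produces different output between builds, the same seed recreates it exactly.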
Old 27th February 2010, 22:00   #49  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage
I think it is a cache-related bug, because none of the individual pieces is slower.
Time to go bug hunting!

Just converted and restored RGB functionality, it's up and in the latest release on the front page.
Old 27th February 2010, 22:04   #50  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by Stephen R. Savage
I think a key problem is a lack of ways to benchmark AVS64. I use avs2avi for standard AVS, but there is no analogue for AVS64. For example, if you benchmark against x264, the results are skewed by the 64-bit advantage for x264. If you normalize by using avs2yuv for the 32-bit Avisynth, you still have to factor in the speedup by cutting out the piping overhead.
A) compile avs2yuv for 64bit
B) use vdub, it has x86 and x64 versions and both work with the corresponding avisynth directly.

---

I would be in agreement on extracting out the inline asm to separate files, utilizing x264's asm abstraction layer as indicated in
http://x264dev.multimedia.cx/?p=191
would likely make things much easier.
Old 27th February 2010, 22:53   #51  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
Quote:
Originally Posted by JoshyD
Time to go bug hunting!

Just converted and restored RGB functionality, it's up and in the latest release on the front page.
ConvertToRGB seems to work fine, as does ConvertToYV12. Is it a problem that your ported filters are not bit-exact to the originals?

Also, perhaps you could recompile avs2avi, avs2yuv, and wavi against the 64-bit Avisynth. These should be trivial to build. In particular, avs2avi is useful for benchmarking because it has a null output mode and reports the average FPS.

Last edited by Stephen R. Savage; 27th February 2010 at 22:56.
Old 28th February 2010, 00:46   #52  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@kemuri-_9

I like the way that x264 has done their asm coding, a lot. MvTools2 uses some portions of x264, and the fact that they have a nice and easy abstraction layer made porting those parts of the code seamless.

@Stephen
The issue with the AAA script you wrote is DEFINITELY cache related. Try SetMTMode(1,1) and it magically goes the speed it should when single threaded. It's ever so slightly faster in x64 mode, when you add threading to the mix, there are definite speed gains to be had. I have no idea why the normal caching mechanism is fussy, I didn't re-write any of that code.

I've been using VDub for benchmarking, but can do a quick recompile of those tools in a bit. I just write out raw streams taking any compressor out of the mix.

As for the subtle pixel differences, they're quite odd, and I can't explain them as of yet.

If comparing with ImageMagick, the average error of 24 bit bitmap captures for each channel is ~0.005%. I'm going to see if I can find what's causing the difference, there may be some rounding differences when working with 64bit integers that are playing a role. I'm going to fine tooth comb the code and see what I can see.
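To make a per-channel figure like that concrete, here is a minimal sketch of one way such a number can be computed: the mean absolute difference per channel, expressed as a percentage of full scale (255). Whether this matches the metric ImageMagick reported is an assumption, and the two tiny "frames" are made-up data, not captures from the builds.

```python
def channel_error_percent(frame_a, frame_b):
    """Mean absolute difference per RGB channel, as a percentage of 255."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    totals = [0, 0, 0]
    for pixel_a, pixel_b in zip(frame_a, frame_b):
        for channel in range(3):
            totals[channel] += abs(pixel_a[channel] - pixel_b[channel])
    pixel_count = len(frame_a)
    return tuple(total / (pixel_count * 255) * 100.0 for total in totals)


# Made-up example: a four-pixel frame where one red value is off by one level.
reference = [(10, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)]
candidate = [(11, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)]

red, green, blue = channel_error_percent(reference, candidate)
print(f"R {red:.4f}%  G {green:.4f}%  B {blue:.4f}%")
```

If the reported figure is defined the same way, an average of 0.005% would correspond to roughly one channel sample in 78 being off by a single level, which is consistent with a rounding difference rather than a skipped processing step.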

Edit: I just remembered that the normal version of avisynth goes through Softwire for (I think) any planar to non-planar conversion, so that could very well be where the difference lies.

Last edited by JoshyD; 28th February 2010 at 00:59.
Old 28th February 2010, 01:26   #53  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
I can confirm that SetMTMode(1, 1) brings performance back to about the same levels. I hope that the bug behind this can be squashed.

If you say that the image errors are trivial, then I will take your word for it.

Edit: Didee's new TGMC requires VerticalCleaner by kassandro. Ain't the hamster wheel of upgrades fun?

Last edited by Stephen R. Savage; 28th February 2010 at 04:25.
Old 1st March 2010, 03:40   #54  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
A rough Vertical Cleaner is up on the first page, it's just the C-Code compiled, but ICC should be doing some of the SSE optimizations for us. TempGaussMC Beta 2 is at least as fast in most cases with this version of the plugin, sometimes still faster. This little filter is far from the limiting factor of the overall script . . .

Edit: The C version is faster than the ASM, odd huh?

Last edited by JoshyD; 1st March 2010 at 04:17.
Old 1st March 2010, 04:11   #55  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
Bravo, it works. It's a shame that you're the only one working on Avs64. Is there anything that those of us who can't write significant amounts of code can do?
Old 1st March 2010, 22:40   #56  |  Link
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
I see directshowsource source in the sources package but any chance of including directshowsource.dll in the binary package?

I've never been able to get squid80's directshowsource.dll working, keeps giving no function errors.

Also, if you ever plan on an x64 build of 2.60, I look forward to it, considering a resize-only script is about 10% faster with 2.60 than with 2.58 and about 15% faster than 2.58 MT.
Old 1st March 2010, 23:06   #57  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
DSS was included in one of the earlier copies of the Avs64 build posted here. It seems JoshyD has forgotten to include it in the latest one, so I have uploaded it here: http://www.sendspace.com/file/tm7pcf
Old 1st March 2010, 23:47   #58  |  Link
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
Thanks for the upload but unfortunately that doesn't work either. I tried both loadplugin and auto load from plugins directory and this is the result:

avs [error]: Script error: there is no function named "directshowsource"

avisource worked in avisynth64 and graphstudio64 handles the input files correctly.

Has anyone been able to use directshowsource?
Old 2nd March 2010, 02:45   #59  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 318
DSS64 works fine for me, and I used it in all my performance tests above.
Old 2nd March 2010, 07:49   #60  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Whoops, sorry guys, updating the archive with direct show source, and it definitely is working for me. Don't know what other idiosyncrasies would be giving you grief when loading the plugin.

@Stephen:
I compiled a 64bit avs2avi for benchmarking and such, works fine for me, hope it does for you as well.

Last edited by JoshyD; 2nd March 2010 at 09:50.