Old 19th April 2010, 19:57   #361  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
Hey guys

Anyone care to test the x64 version of masktools in my sig? If it doesn't crash with mt_invert and mt_edge, then all the asm should work in 64-bit.
Old 19th April 2010, 20:20   #362  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
I receive an unrecognized exception with both mt_edge() and mt_invert(), Manao. I tested both the -25 and -26 variants.
Old 19th April 2010, 20:36   #363  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
I was afraid of that

Can you test mt_lut("x", chroma="process")? (That one doesn't have asm.)

Oh, and just to be sure, you tested the -25-x64 and -26-x64 variants, didn't you?
Old 20th April 2010, 00:11   #364  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
I still receive an exception with your mt_lut("x", chroma="process") command. I tried mt_masktools-25-x64.dll and mt_masktools-26-x64.dll.
Old 20th April 2010, 02:09   #365  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
For what it's worth, here's the hack job that I did on MaskTools 2.0a36.

It's far from pretty, but maybe can provide some basis or at least an idea or two.

FYI, there were some templates for functions that got shuffled around because ICC is a bit more strict about where you declare templates, and some other junk. I don't think I mangled the names of any of the functions, and I did some decent editing on the stack / heap management macros as well as the generic computation macros. There are just so many differences in the code that it's hard to take a straight directory diff and merge the two sources selectively, which is one of the main reasons I haven't attempted this yet. If we're getting an officially maintained x64 version, that would be very cool.

Last edited by JoshyD; 20th April 2010 at 02:13.
Old 20th April 2010, 06:58   #366  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
You'll be interested in the modifications I made. The assembly rewrite was twofold: lots of cosmetics, and use of the new stack macros, which (ought to) support win32, lin64 and win64. I hardly changed anything else. Now if it could only work...

I'll see tonight if I have some ideas.

Meanwhile, if you want to try and build it yourself, just be advised you'll need yasm 1.0 in order to do so.
Old 20th April 2010, 11:25   #367  |  Link
trevaaar
Registered User
 
Join Date: Feb 2009
Location: Australia
Posts: 24
Just tried Dehalo_alpha with the 4/12 build and output is seriously broken. The crashing with MCTemporalDenoise that I mentioned earlier in the thread is gone now though.
Old 20th April 2010, 11:45   #368  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Quote:
Originally Posted by trevaaar View Post
Just tried Dehalo_alpha with the 4/12 build and output is seriously broken. The crashing with MCTemporalDenoise that I mentioned earlier in the thread is gone now though.
I see no problems with DeHalo_alpha. I assume you are using a version modified for MaskTools2. In case you aren't:

Code:
function DeHalo_alpha(clip clp, float "rx", float "ry", float "darkstr", float "brightstr", float "lowsens", float "highsens", float "ss")
{
rx        = default( rx,        2.0 )
ry        = default( ry,        2.0 )
darkstr   = default( darkstr,   1.0 )
brightstr = default( brightstr, 1.0 )
lowsens   = default( lowsens,    50 )
highsens  = default( highsens,   50 )
ss        = default( ss,        1.5 )

LOS = string(lowsens)
HIS = string(highsens/100.0)
DRK = string(darkstr)
BRT = string(brightstr)
ox  = clp.width()
oy  = clp.height()

x = ox/rx
y = oy/ry
m4x = x<16?16:int(round(x/4.0)*4)
m4y = y<16?16:int(round(y/4.0)*4)
ssx = ox*ss
ssy = oy*ss
m4ssx = ssx<16?16:int(round(ssx/4.0)*4)
m4ssy = ssy<16?16:int(round(ssy/4.0)*4)

halos  = clp.bicubicresize(m4x,m4y).bicubicresize(ox,oy,1,0)
are    = mt_lutxy(clp.mt_expand(),clp.mt_inpand(),"x y -")
ugly   = mt_lutxy(halos.mt_expand(),halos.mt_inpand(),"x y -")
so     = mt_lutxy( ugly, are, "x", "y x - y 0.001 + / 255 * "+LOS+" - y 256 + 512 / "+HIS+" + *" )
lets   = mt_merge(halos,clp,so)
remove = (ss==1.0) ? clp.repair(lets,1,0) 
          \        : clp.spline36resize(m4ssx,m4ssy)
          \             .mt_logic(lets.mt_expand().bicubicresize(m4ssx,m4ssy),"min")
          \             .mt_logic(lets.mt_inpand().bicubicresize(m4ssx,m4ssy),"max")
          \             .spline36resize(ox,oy)
them   = mt_lutxy(clp,remove,"x","x y < x x y - "+DRK+" * - x x y - "+BRT+" * - ?",U=2,V=2)

return( them )
}
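
For readers unfamiliar with the reverse-Polish strings passed to mt_lutxy in the script above, here is a rough Python sketch of how such an expression can be evaluated for one pixel pair. It is not MaskTools code. The function name eval_rpn is made up, only the operators that appear in the script are handled, and two details are assumptions rather than documented behaviour: that "?" takes its operands in the order condition, value-if-true, value-if-false, and that the result is rounded and clamped to 0..255.

```python
# Illustrative sketch only: a tiny evaluator for reverse-Polish strings
# such as "x y -". This is NOT MaskTools source code.
import operator

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "<": lambda a, b: 1.0 if a < b else 0.0,
}


def eval_rpn(expr, x, y):
    """Evaluate a reverse-Polish expression for one pixel pair (x, y)."""
    stack = []
    for token in expr.split():
        if token == "x":
            stack.append(float(x))
        elif token == "y":
            stack.append(float(y))
        elif token in BINARY_OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[token](a, b))
        elif token == "?":
            # Assumed operand order: condition, value-if-true, value-if-false.
            if_false = stack.pop()
            if_true = stack.pop()
            cond = stack.pop()
            stack.append(if_true if cond != 0 else if_false)
        else:
            # Anything else is treated as a numeric constant.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression: " + expr)
    # Assumption: the result is rounded and clamped to the 8-bit range.
    return max(0, min(255, int(round(stack[0]))))
```

With this sketch, eval_rpn("x y -", 200, 50) gives 150, and the final expression of the script with darkstr set to 0.5 and brightstr set to 1.0 moves a pixel of value 100 halfway towards a repaired value of 120.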

Last edited by Stephen R. Savage; 20th April 2010 at 11:56.
Old 22nd April 2010, 10:19   #369  |  Link
trevaaar
Registered User
 
Join Date: Feb 2009
Location: Australia
Posts: 24
Yep, that's the script I'm using. It's not an error running the script, it's garbled output. On further inspection, it only appears to do so if width is not mod8. Alignment problem somewhere?
Old 22nd April 2010, 11:25   #370  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,637
Quote:
Originally Posted by JoshyD View Post
Here's the source I'm building from. If you do a wholesale diff on the directory with something like WinMerge, it's pretty easy to see where it's been hacked apart to work with x64, just look for #ifndef _AMD64_ for most of the 64bit only mods.
I'm totally lost... I'm not at all used to the way the asm files are written, and I understand nothing...
I don't have (and will not install) things like WinMerge.
Can you tell me which files I need to look at for the IEEE 1180 reference?
I've already identified the idctref.cpp file, but it can't be the only one...
Old 22nd April 2010, 13:54   #371  |  Link
moviefan
Registered User
 
Join Date: Jul 2005
Posts: 438
I set up a fresh Windows 7 x64, SEt's Avisynth 2.5.8 MT and the required x64 plugins for GradFun2DBMod and LSFMod (RemoveGrain, AddGrainC, Repair, gradfun2db, masktools) + ffms2, all taken from the first page of this thread. I started encoding with the x264.1542.x64 build from x264.nl, but the encode always crashes after 1000-1500 frames. I removed the filters step by step and it still crashes when only loading the video with FFVideoSource without any filtering afterwards (this time after 100 frames). Am I doing something wrong or is there a severe bug?
Old 22nd April 2010, 20:29   #372  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Quote:
Originally Posted by trevaaar View Post
Yep, that's the script I'm using. It's not an error running the script, it's garbled output. On further inspection, it only appears to do so if width is not mod8. Alignment problem somewhere?
@ JoshyD
I have traced the problem to mt_merge(). It does not correctly process clips if they are not aligned to mod8. All other mt functions used provided correct output. You can verify the problem with the following script:

Code:
ColorBars()
ConvertToYV12()
mod4 = Crop(4, 0, 0, 0).mt_test().AddBorders(4, 0, 0, 0).Subtitle("mod4")
mod8 = Crop(8, 0, 0, 0).mt_test().AddBorders(8, 0, 0, 0).Subtitle("mod8")
Interleave(mod4, mod8)

function mt_test(clip input)
{
    a = Invert(input)
    b = FlipVertical(input)
    return mt_merge(input, a, b)
}
Incidentally, I wonder why most programming languages don't let you set variables equal to the names of functions and data types like Avisynth does.

Last edited by Stephen R. Savage; 25th April 2010 at 16:53.
Old 23rd April 2010, 02:25   #373  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
I've looked into the non-monotonic PTS warnings that x264 generates when SetMTMode is off and a resizer is used (I was using a 1920x1080 source -> Lanczos4Resize(1440,1080)).
xmm6 through xmm14 are not zero, which causes a condition on a double that should be false to be flagged as true, invalidating the input PTS.

This indicates that the resizer asm code is violating the win64 calling convention by not preserving registers properly.
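
To make the register rule concrete, here is a small Python sketch of the bookkeeping involved. The register sets reflect the published conventions as I understand them (Win64 treats xmm6 through xmm15 as callee-saved, while the System V AMD64 convention used on Linux treats every xmm register as caller-saved). The helper name xmm_to_preserve and the example register list are made up for illustration and are not taken from the resizer code.

```python
# Sketch: which xmm registers a hand-written routine must save and restore.
# The helper and its name are hypothetical; only the register sets come
# from the calling conventions themselves.
WIN64_NONVOLATILE_XMM = {"xmm%d" % i for i in range(6, 16)}  # xmm6..xmm15
SYSV_NONVOLATILE_XMM = set()  # System V AMD64: no callee-saved xmm registers


def xmm_to_preserve(written, abi="win64"):
    """Return the xmm registers in `written` that the callee must preserve."""
    nonvolatile = {
        "win64": WIN64_NONVOLATILE_XMM,
        "sysv": SYSV_NONVOLATILE_XMM,
    }[abi]
    # Sort numerically so that xmm14 comes after xmm6.
    return sorted(set(written) & nonvolatile, key=lambda name: int(name[3:]))
```

For a routine that writes xmm0, xmm5, xmm6 and xmm14, the sketch reports xmm6 and xmm14 as needing a save/restore under Win64 and nothing under System V, which is why the same asm can behave on one platform and corrupt a caller's state on the other.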
Old 24th April 2010, 04:00   #374  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@kemuri-_9
You've got my number on that one.

Both MS and Intel compilers never ever conform to the published ABI originally set forth by AMD. Floats should be passed in those xmm0-xmm3 registers, but, in my experience I've never seen either compiler put this into play. So, I was sitting around watching xmm8-xmm15 being completely unused in any C++ code. I got into the habit of only coding around what the compiler I was using was going to produce. It's easy to change, it just makes the heuristics of the previous testing I've done invalid. End result, very very foolish.

I was just blown away by the fact that the compiler wasn't following spec, and as most of the interactions with Avisynth64 are from code compiled by either MS or Intel's compiler, I was greedy with my registers, not thinking that I'd even see this problem until ICC (insert ridiculously far off build) or something of the like became prevalent. Intel's all about having a great compiler to support their "awesome" silicon, and they were still shipping a product that minimally made use of x64 . . . and that can come back on them (and me).

I tucked away the code that follows spec in expectation of the day that a commercial compiler would simply comply to spec. I tested a number of cases to make sure that the registers weren't in use when the function call was made before committing to using the extra lot. My ABI compliant source is a bit out of sync with my current build, so I'll need some cycles to iterate on what's going to produce the best performance, but hopefully will have a nice little fix for the weekend. Register usage should be relatively simple to cut, some constants are kept in the registers at the top range (or lazily in xmm8 in earlier code that needs updating anyhow), a memory reference here and there shouldn't be too painful.

If you hadn't researched this, it would have taken some time to realize the root cause, as I personally am not familiar enough with the build practices of GCC (and derivatives like MinGW) nor the x264 source itself. It's interesting from an academic standpoint. My familiarity with the x264 source stops at what is integrated into MVTools2, and I had never seen the devs go beyond 8 XMM registers (in the functions borrowed in MVTools2) at the asm level until r1531 apparently. Is the double from the compiler or assembly?

As an aside, I've never broken spec on the other ASM. The performance impact of not explicitly defining the extra register use at the machine code level will likely be mitigated by the out-of-order execution engine on silicon anyhow (unless you're encoding on Atom, which is just silly) as well as the fairly large / efficient caches on modern architectures. I just want to "get it right" before packaging it up all nice and pretty.

@Stephen
The mod 8 problem is most likely a problem with a direct conversion of the routines that relied on a parameter being mod 4, but when you move to x64, it gets changed to 8 . . . and you lose granularity. Manao's code is REALLY flexible, and was written with an incredible amount of reuse in mind. Unfortunately, there are going to be corner cases that don't get seen for a bit. If he's going forth with an official x64 build, he may have a better idea of where the underlying problem is, and have a particular way he'd like to solve it. I'm willing to help at any turn though.
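
As a purely hypothetical illustration of that kind of granularity loss (this is not MaskTools code, and the real cause had not been confirmed at this point in the thread), the Python sketch below processes a row in fixed-size blocks. The function names and the invert operation are made up for the example. A 12-byte row is fully covered by 4-byte blocks, but with 8-byte blocks the last 4 bytes are skipped unless a tail loop picks them up.

```python
# Hypothetical sketch of block-wise row processing with and without a tail.
def process_row_blocks_only(row, block, op):
    """Apply op to whole blocks only; trailing bytes are left untouched."""
    out = bytearray(row)
    whole = len(row) - len(row) % block  # length covered by whole blocks
    for start in range(0, whole, block):
        chunk = row[start:start + block]
        out[start:start + block] = bytes(op(v) for v in chunk)
    return bytes(out)


def process_row_with_tail(row, block, op):
    """Same as above, plus a per-byte loop for the leftover bytes."""
    out = bytearray(process_row_blocks_only(row, block, op))
    whole = len(row) - len(row) % block
    for i in range(whole, len(row)):
        out[i] = op(row[i])
    return bytes(out)


def invert(v):
    """Example per-pixel operation on an 8-bit value."""
    return 255 - v
```

With row = bytes(range(12)), the block-only version with block=4 inverts everything, while block=8 leaves the last four bytes as they were. The tail version handles both block sizes correctly.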

Oh, and the compiler rant above happens to coincide with your question of setting the function name the same as a variable. Compilers have to take what is known about the language syntax, assign variables in their workspace to various logic functions, do logic minimization, see what resources can be shared, where there will be constraints, etc.

Now, compilers are written by humans. I guess the main goal would be efficiency. If you instantiate your class "Cat" as "Cat", it introduces confusion and overhead into the process. Consider naming your cat "cat" in the real world: when speaking to other humans (the English language being our syntax in this case), you're going to run into cases where, if they did not know a priori that you had a pet cat named "cat", they would have no idea how to distinguish the two in conversation.

The difference being that I can learn that your cat is named cat, compilers have no memory, and lack a good method of learning. We can build in some heuristics and some neat tricks, but we scorn the programmer who wants us to complicate our compiler to understand the difference between variable cat and function/class cat. We can mangle the names at compile time, but that's just unneeded overhead. Just name your variable "instance_of_cat," or similar.
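
A small Python sketch of the name clash, for what it's worth. Python is used only because it happens to allow this kind of shadowing; the class and function names are invented for the example, and nothing here says anything about how Avisynth or a C++ compiler resolves names.

```python
class Cat:
    """Stand-in for any class or function whose name gets reused."""

    def __init__(self, name):
        self.name = name


def distinct_names():
    # The variable has its own name, so the class stays reachable.
    instance_of_cat = Cat("Tom")
    another = Cat("Felix")
    return instance_of_cat.name, another.name


def shadowed(Cat):
    # Inside this function the parameter `Cat` hides the class `Cat`.
    try:
        Cat("Felix")  # an instance is not callable
    except TypeError:
        return "Cat no longer names the class here"
    return "Cat is still the class"
```

Calling shadowed(Cat("Tom")) returns the first message: once the name refers to the instance, the class can no longer be reached through it in that scope, which is the ambiguity the cat analogy is getting at.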

Compilers really are quite dumb, amazingly so in some instances. The more explicitly you code in a higher level language, the easier it is to produce a quality executable. Assembly becomes attractive when the compiler balks at the code. There's a good ~3 post discussion of mt_inpand/mt_expand (or similar function) in the current masktools thread. They're written in straight up C++. When you execute TGMC, these two make up ~90% of the CPU time spent in MaskTools2, which accounts for ~30% of the CPU time of all processes combined. That's when you say "yes please" to some good ol' assembly.

@jpsdr
I actually fixed the IEEE 1180 about two weeks ago, and forgot to post the build. I hope that's the right one; I will double check and update the first post accordingly.

It's been a long week, cheers to everyone keeping an active interest in the project.

Last edited by JoshyD; 24th April 2010 at 04:32.
Old 24th April 2010, 04:04   #375  |  Link
Mr VacBob
Registered User
 
Join Date: Feb 2005
Posts: 140
Windows doesn't use the x86-64 ABI, so compilers shouldn't be expected to respect it.
Old 24th April 2010, 04:28   #376  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
@JoshyD: Do you think rewriting your code to conform to the Win64 ABI will cost performance? I had heard D_S mention in other threads that the Win64 ABI seriously limits the effectiveness of the extra AMD64 registers.
Old 24th April 2010, 04:35   #377  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by JoshyD View Post
If you hadn't researched this, it would have taken some time to realize the root cause, as I personally am not familiar enough with the build practices of GCC (and derivatives like MinGW) nor the x264 source itself. It's interesting from an academic standpoint. My familiarity with the x264 source stops at what is integrated into MVTools2, and I had never seen the devs go beyond 8 XMM registers (in the functions borrowed in MVTools2) at the asm level until r1531 apparently. Is the double from the compiler or assembly?
line 1475 from x264.c (r1563) is the aforementioned double that was exhibiting the problem, so this would be GCC/MinGW.

I haven't been maintaining my ICL patch to x264 lately, so i haven't been able to see if it similarly shows issues.

Quote:
As an aside, I've never broken spec on the other ASM. The performance impact of not explicitly defining the extra register use at the machine code level will likely be mitigated by the out-of-order execution engine on silicon anyhow (unless you're encoding on Atom, which is just silly) as well as the fairly large / efficient caches on modern architectures. I just want to "get it right" before packaging it up all nice and pretty.
Don't feel too bad about it, as the problem was found and you are willing to fix it, unlike the situation with ffmpeg, where they know they violate the win64 ABI and no one really wants to fix the problem, it seems.

Last edited by kemuri-_9; 24th April 2010 at 04:49.
Old 24th April 2010, 11:11   #378  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
For the mod8 + mt_merge problem, it might actually be my fault. Try the same script with masktools 2.0a36 win32; you should (I think) get the same garbled output. It should be fixed in the win32 dlls contained here:

http://manao4.free.fr/masktools-v2.0a39.zip

Now, this package also contains another attempt at a 64-bit masktools dll. If somebody could launch DebugView on their computer, then try mt_merge, mt_invert, mt_lut, mt_edge and mt_logic with this dll, and make the resulting log file available to me, it would be great. Alternatively, if somebody could give me RDP access to a win64 machine with avisynth 64, vdub 64 and DebugView installed, it would be even better.


Old 24th April 2010, 12:45   #379  |  Link
noee
Registered User
 
Join Date: Jan 2007
Posts: 530
Manao:
Testing your latest x64 masktools with veedub64 (AMD1.9.9), I get an unrecognized exception on line 493 in LSFMod() (appears to be the mt_lutxy call). My script is:

loadCplugin("C:\Program Files (x86)\AviSynth 2.5\plugins64\ffms2.dll")
ffvideosource("E:\output\test.mkv")
lsfmod()

The mkv contains an SD AVC source. It works fine with the mt_masktools posted on the first page. I have a DebugView log; where could I send it? I would gladly allow RDP, but I'm on dial-up.
Old 24th April 2010, 13:24   #380  |  Link
manoj4986
Registered User
 
Join Date: Sep 2006
Posts: 8
Quote:
Originally Posted by manoj4986 View Post
Guys, we also need the following 64-bit plugins:
1) HDRAGC
2) dctfilter
3) medianblur
Is anyone working on 64-bit versions of these 3 filters?

Regards,
manoj