#361 | Link |
|
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Hey guys
Would anyone care to test the x64 version of masktools in my sig? If it doesn't crash with mt_invert and mt_edge, then all the asm should work in 64-bit.
#363 | Link |
|
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
I was afraid of that.
Can you test mt_lut("x", chroma="process")? (That one doesn't have asm.) Oh, and just to be sure, you tested both the -25-x64 and -26-x64 variants, didn't you?
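As a rough illustration of why mt_lut("x") makes a good asm-free sanity check: a LUT filter precomputes a 256-entry table from the expression and then only does table lookups, so the expression "x" is the identity. The sketch below is Python, not masktools code; the helper names `eval_rpn`, `build_lut` and `apply_lut` are hypothetical, only four arithmetic operators are handled, and the round-then-clamp step is an assumption about how results are stored in 8 bits.

```python
import math


def eval_rpn(expr, x):
    """Evaluate a reverse-Polish expression for a single pixel value x."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expr.split():
        if token == "x":
            stack.append(float(x))
        elif token in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))  # numeric literal
    return stack.pop()


def build_lut(expr):
    """Precompute the table for all 256 inputs (rounding/clamping assumed)."""
    table = []
    for x in range(256):
        value = math.floor(eval_rpn(expr, x) + 0.5)
        table.append(min(255, max(0, value)))
    return table


def apply_lut(plane, table):
    """Apply the table to a plane given as a list of rows of 8-bit values."""
    return [[table[p] for p in row] for row in plane]
```

With the expression "x" the table maps every value to itself, so any difference between input and output would point at the plumbing around the lookup rather than at the expression.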
#365 | Link |
|
Registered User
Join Date: Feb 2010
Posts: 84
|
For what it's worth, here's the hack job that I did on MaskTools 2.0a36.
It's far from pretty, but maybe it can provide some basis, or at least an idea or two. FYI, some function templates got shuffled around because ICC is a bit stricter about where you declare templates, along with some other junk. I don't think I mangled the names of any of the functions, and I did some decent editing on the stack / heap management macros as well as the generic computation macros. There are just so many differences in the code that it's hard to take a straight directory diff and merge the two sources selectively, which is one of the main reasons I haven't attempted this yet. If we're getting an officially maintained x64 version, that would be very cool.
Last edited by JoshyD; 20th April 2010 at 02:13.
#366 | Link |
|
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
You'll be interested in the modifications I made. The assembly rewrite was twofold: lots of cosmetics, and use of the new stack macros, which (ought to) support win32, lin64 and win64. I hardly changed anything else. Now if it could only work...
I'll see tonight if I have some ideas. Meanwhile, if you want to try and build it yourself, be advised that you'll need yasm 1.0 to do so.
#368 | Link | |
|
Registered User
Join Date: Nov 2009
Posts: 327
|
Quote:
Code:
function DeHalo_alpha(clip clp, float "rx", float "ry", float "darkstr", float "brightstr", float "lowsens", float "highsens", float "ss")
{
    # Defaults for the optional parameters
    rx = default( rx, 2.0 )
    ry = default( ry, 2.0 )
    darkstr = default( darkstr, 1.0 )
    brightstr = default( brightstr, 1.0 )
    lowsens = default( lowsens, 50 )
    highsens = default( highsens, 50 )
    ss = default( ss, 1.5 )
    # String forms, spliced into the mt_lutxy expressions below
    LOS = string(lowsens)
    HIS = string(highsens/100.0)
    DRK = string(darkstr)
    BRT = string(brightstr)
    # Source size, plus blur and supersampling sizes rounded to mod4 (minimum 16)
    ox = clp.width()
    oy = clp.height()
    x = ox/rx
    y = oy/ry
    m4x = x<16?16:int(round(x/4.0)*4)
    m4y = y<16?16:int(round(y/4.0)*4)
    ssx = ox*ss
    ssy = oy*ss
    m4ssx = ssx<16?16:int(round(ssx/4.0)*4)
    m4ssy = ssy<16?16:int(round(ssy/4.0)*4)
    # Blurred clip, then local contrast (expand minus inpand) of source and blurred clip
    halos = clp.bicubicresize(m4x,m4y).bicubicresize(ox,oy,1,0)
    are = mt_lutxy(clp.mt_expand(),clp.mt_inpand(),"x y -")
    ugly = mt_lutxy(halos.mt_expand(),halos.mt_inpand(),"x y -")
    # Mask from the two contrast clips, used to merge the blurred clip with the source
    so = mt_lutxy( ugly, are, "x", "y x - y 0.001 + / 255 * "+LOS+" - y 256 + 512 / "+HIS+" + *" )
    lets = mt_merge(halos,clp,so)
    # Limit the source to the local min/max of the merged clip (supersampled unless ss == 1.0)
    remove = (ss==1.0) ? clp.repair(lets,1,0)
    \ : clp.spline36resize(m4ssx,m4ssy)
    \ .mt_logic(lets.mt_expand().bicubicresize(m4ssx,m4ssy),"min")
    \ .mt_logic(lets.mt_inpand().bicubicresize(m4ssx,m4ssy),"max")
    \ .spline36resize(ox,oy)
    # darkstr scales the change where the source is darker than the processed clip, brightstr otherwise
    them = mt_lutxy(clp,remove,"x","x y < x x y - "+DRK+" * - x x y - "+BRT+" * - ?",U=2,V=2)
    return( them )
}
Last edited by Stephen R. Savage; 20th April 2010 at 11:56.
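The size arithmetic in the script above (divide or multiply the frame size, then round to a multiple of 4 with a floor of 16) can be checked outside AviSynth. This is a small Python sketch; `m4` and `dehalo_sizes` are hypothetical helper names, and rounding halves upward is an assumption about how AviSynth's round() behaves for the positive values that occur here.

```python
import math


def m4(value):
    """Mirror `v<16?16:int(round(v/4.0)*4)` from the script."""
    if value < 16:
        return 16
    # Assumption: round() rounds halves upward for positive inputs.
    return int(math.floor(value / 4.0 + 0.5) * 4)


def dehalo_sizes(ox, oy, rx=2.0, ry=2.0, ss=1.5):
    """Return the intermediate blur size and the supersampling size."""
    return {
        "blur": (m4(ox / rx), m4(oy / ry)),
        "supersample": (m4(ox * ss), m4(oy * ss)),
    }
```

For a 720x480 source with the default parameters, the intermediate sizes come out as 360x240 for the blur step and 1080x720 for the supersampled step.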
#370 | Link | |
|
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,637
|
Quote:
I don't have (and will not install) things like WinMerge. Can you tell me which files I need to look at for the IEEE 1180 reference? I've already identified the idctref.cpp file, but it can't be the only one...
#371 | Link |
|
Registered User
Join Date: Jul 2005
Posts: 438
|
I set up a fresh Windows 7 x64 with SEt's Avisynth 2.5.8 MT and the required x64 plugins for GradFun2DBMod and LSFMod (RemoveGrain, AddGrainC, Repair, gradfun2db, masktools) + ffms2, all taken from the first page of this thread. I started encoding with the x264.1542.x64 build from x264.nl, but the encode always crashes after 1000-1500 frames. I removed the filters step by step, and it still crashes when only loading the video with FFVideoSource without any filtering afterwards (this time after 100 frames). Am I doing something wrong, or is there a severe bug?
#372 | Link | |
|
Registered User
Join Date: Nov 2009
Posts: 327
|
Quote:
I have traced the problem to mt_merge(). It does not correctly process clips that are not aligned to mod8. All the other mt functions used produced correct output. You can verify the problem with the following script:
Code:
ColorBars()
ConvertToYV12()
mod4 = Crop(4, 0, 0, 0).mt_test().AddBorders(4, 0, 0, 0).Subtitle("mod4")
mod8 = Crop(8, 0, 0, 0).mt_test().AddBorders(8, 0, 0, 0).Subtitle("mod8")
Interleave(mod4, mod8)
function mt_test(clip input)
{
    a = Invert(input)
    b = FlipVertical(input)
    return mt_merge(input, a, b)
}
Last edited by Stephen R. Savage; 25th April 2010 at 16:53.
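One common way a filter ends up working on mod8 widths only is a loop that handles 8 pixels per iteration and never deals with the leftover columns. The Python sketch below shows that pattern on a single row; it is not taken from the masktools source and is not confirmed to be the cause of this bug. The function names are hypothetical, and the blend formula in `merge_pixel` is an assumption about how a mask merge is computed.

```python
def merge_pixel(a, b, m):
    """Blend a towards b by mask m (assumed formula, 8-bit values)."""
    return ((256 - m) * a + m * b + 128) // 256


def merge_row_blocks_only(a, b, mask, block=8):
    """Buggy pattern: only whole blocks are processed, the tail keeps a."""
    out = list(a)
    full = len(a) - len(a) % block
    for start in range(0, full, block):
        for j in range(start, start + block):
            out[j] = merge_pixel(a[j], b[j], mask[j])
    return out


def merge_row_with_tail(a, b, mask, block=8):
    """Same loop, followed by a scalar pass over the leftover pixels."""
    out = merge_row_blocks_only(a, b, mask, block)
    for j in range(len(a) - len(a) % block, len(a)):
        out[j] = merge_pixel(a[j], b[j], mask[j])
    return out
```

The script above crops ColorBars (640 wide by default) to 636 and 632 pixels. In this sketch a 636-pixel row leaves its last 4 pixels unmerged in the blocks-only version, while a 632-pixel row comes out identical in both versions.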
#373 | Link |
|
Compiling Encoder
Join Date: Jan 2007
Posts: 1,348
|
I've looked into the non-monotonic PTS warnings that x264 generates when SetMTMode is off and a resizer is used (I was using a 1920x1080 source -> Lanczos4Resize(1440,1080)).
xmm6 through xmm14 are not zero, which causes a condition on a double that should be false to be flagged as true, invalidating the input PTS. This indicates that the resizer asm code is violating the win64 calling convention by not preserving registers properly.
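For reference, the Win64 calling convention treats xmm6 through xmm15 as callee-saved, whereas the System V AMD64 convention used on Linux treats every xmm register as caller-saved. The Python sketch below just encodes those two register lists, as documented by Microsoft and in the System V ABI, and reports which of the registers a routine writes to would have to be saved and restored. The table layout and the name `registers_to_preserve` are hypothetical.

```python
# Registers a callee must restore before returning, per calling convention.
NONVOLATILE = {
    "win64": {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"}
    | {"xmm%d" % i for i in range(6, 16)},
    "sysv": {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"},
}


def registers_to_preserve(clobbered, abi="win64"):
    """Return the clobbered registers that the callee has to save and restore."""
    return [reg for reg in clobbered if reg.lower() in NONVOLATILE[abi]]
```

A routine that writes to xmm0, xmm5, xmm6 and xmm14 therefore has to preserve xmm6 and xmm14 on Win64, while the same routine needs no xmm saves under the System V convention. That difference is one reason asm that behaves on lin64 can still corrupt a Win64 caller's state.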
#374 | Link |
|
Registered User
Join Date: Feb 2010
Posts: 84
|
@kemuri-_9
You've got my number on that one. Both MS and Intel compilers never ever conform to the published ABI originally set forth by AMD. Floats should be passed in those xmm0-xmm3 registers but, in my experience, I've never seen either compiler put this into play. So, I was sitting around watching xmm8-xmm15 being completely unused in any C++ code. I got into the habit of only coding around what the compiler I was using was going to produce. It's easy to change; it just makes the heuristics of the previous testing I've done invalid. End result: very, very foolish.

I was just blown away by the fact that the compiler wasn't following spec, and as most of the interactions with Avisynth64 are from code compiled by either MS's or Intel's compiler, I was greedy with my registers, not thinking that I'd even see this problem until ICC (insert ridiculously far-off build) or something of the like became prevalent. Intel's all about having a great compiler to support their "awesome" silicon, and they were still shipping a product that made minimal use of x64 . . . and that can come back on them (and me). I tucked away the code that follows spec in expectation of the day that a commercial compiler would simply comply with spec. I tested a number of cases to make sure that the registers weren't in use when the function call was made before committing to using the extra lot.

My ABI-compliant source is a bit out of sync with my current build, so I'll need some cycles to iterate on what's going to produce the best performance, but hopefully I will have a nice little fix for the weekend. Register usage should be relatively simple to cut: some constants are kept in the registers at the top of the range (or lazily in xmm8 in earlier code that needs updating anyhow), and a memory reference here and there shouldn't be too painful.
If you hadn't researched this, it would have taken me some time to find the root cause: I'm not familiar enough with the build practices of GCC (and derivatives like MinGW) or with the x264 source itself. It's interesting from an academic standpoint. My familiarity with the x264 source stops at what is integrated into MVTools2, and I had never seen the devs go beyond 8 XMM registers (in the functions borrowed in MVTools2) at the asm level, until r1531 apparently. Is the double from the compiler or from assembly?

As an aside, I've never broken spec on the other ASM. The performance impact of not explicitly using the extra registers at the machine-code level will likely be mitigated by the out-of-order execution engine on the silicon anyhow (unless you're encoding on Atom, which is just silly), as well as by the fairly large, efficient caches on modern architectures. I just want to "get it right" before packaging it up all nice and pretty.

@Stephen
The mod8 problem is most likely caused by a direct conversion of routines that relied on a parameter being mod4; when you move to x64, it gets changed to 8 . . . and you lose granularity. Manao's code is REALLY flexible, and was written with an incredible amount of reuse in mind. Unfortunately, there are going to be corner cases that don't get seen for a bit. If he's going forth with an official x64 build, he may have a better idea of where the underlying problem lies, and may have a particular way he'd like to solve it. I'm willing to help at any turn though.

Oh, and the compiler rant above happens to coincide with your question about giving a variable the same name as a function. Compilers have to take what is known about the language syntax, assign variables in their workspace to various logic functions, do logic minimization, see what resources can be shared, where there will be constraints, etc. Now, compilers are written by humans. I guess the main goal would be efficiency.
If you instantiate your class "Cat" as "Cat", it introduces confusion and overhead into the process. Consider naming your cat "cat" in the real world: when speaking to other humans (the English language being our syntax in this case), you're going to run into cases where, if they didn't know a priori that you had a pet cat named "cat", they would have no idea how to distinguish the two in conversation. The difference is that I can learn that your cat is named cat; compilers have no memory, and lack a good method of learning. We can build in some heuristics and some neat tricks, but we scorn the programmer who wants us to complicate our compiler to understand the difference between variable cat and function/class cat. We can mangle the names at compile time, but that's just unneeded overhead. Just name your variable "instance_of_cat," or similar.

Compilers really are quite dumb, amazingly so in some instances. The more explicitly you code in a higher-level language, the easier it is to produce a quality executable. Assembly becomes attractive when the compiler balks at the code. There's a good ~3 post discussion of mt_inpand/mt_expand (or a similar function) in the current masktools thread. They're written in straight-up C++. When you execute TGMC, these two make up ~90% of the CPU time spent in MaskTools2, which accounts for ~30% of the CPU time of all processes combined. That's when you say "yes please" to some good ol' assembly.

@jpsdr
I actually fixed the IEEE 1180 about two weeks ago, and forgot to post the build. I hope that's the right one; I will double check and update the first post accordingly.

It's been a long week, cheers to everyone keeping an active interest in the project.
Last edited by JoshyD; 24th April 2010 at 04:32.
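To give a feel for the per-pixel work involved, here is a plain Python sketch of a 3x3 expand (local maximum) and inpand (local minimum). It illustrates the general technique only and is not the MaskTools2 implementation: the function names are hypothetical, planes are plain lists of rows, and clamping the window at the frame borders is an assumption.

```python
def _window(plane, y, x):
    """Values in the 3x3 window around (y, x), clipped at the frame edges."""
    height, width = len(plane), len(plane[0])
    return [
        plane[j][i]
        for j in range(max(0, y - 1), min(height, y + 2))
        for i in range(max(0, x - 1), min(width, x + 2))
    ]


def expand3x3(plane):
    """Replace every pixel with the maximum of its 3x3 window."""
    return [
        [max(_window(plane, y, x)) for x in range(len(plane[0]))]
        for y in range(len(plane))
    ]


def inpand3x3(plane):
    """Replace every pixel with the minimum of its 3x3 window."""
    return [
        [min(_window(plane, y, x)) for x in range(len(plane[0]))]
        for y in range(len(plane))
    ]
```

Each output pixel looks at up to nine inputs, which is the kind of inner loop that hand-written SIMD speeds up. Taking the figures quoted above at face value, 90% of 30% means these two filters account for roughly 27% of the total CPU time.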
#376 | Link |
|
Registered User
Join Date: Nov 2009
Posts: 327
|
@JoshyD: Do you think rewriting your code to conform to the Win64 ABI will cost performance? I had heard D_S mention in other threads that the Win64 ABI seriously limits the effectiveness of the extra AMD64 registers.
#377 | Link | ||
|
Compiling Encoder
Join Date: Jan 2007
Posts: 1,348
|
Quote:
I haven't been maintaining my ICL patch to x264 lately, so I haven't been able to see whether it shows similar issues.
Quote:
Last edited by kemuri-_9; 24th April 2010 at 04:49.
#378 | Link |
|
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
For the problem regarding mod8 + mt_merge, it might actually be my fault. Try the same script with masktools 2.0a36 win32; you should (I think) get the same garbled output. It should be fixed in the win32 dlls contained here:
http://manao4.free.fr/masktools-v2.0a39.zip
Now, this package also contains another attempt at a 64-bit masktools dll. If somebody could launch DebugView on their computer, then try mt_merge, mt_invert, mt_lut, mt_edge and mt_logic with this dll, and make the resulting log file available to me, that would be great. Alternatively, if somebody could give me RDP access to a win64 machine with avisynth 64, vdub 64 and DebugView installed, that would be even better.
#379 | Link |
|
Registered User
Join Date: Jan 2007
Posts: 530
|
Manao:
Testing your latest x64 masktools with veedub64 (AMD1.9.9), I get an unrecognized exception on line 493 in LSFMod() (it appears to be the mt_lutxy call). My script is:
loadCplugin("C:\Program Files (x86)\AviSynth 2.5\plugins64\ffms2.dll")
ffvideosource("E:\output\test.mkv")
lsfmod()
The mkv contains an SD AVC source. It works fine with the mt_masktools posted on the first page. I have a DebugView log; where could I send it? I would gladly allow RDP, but I'm on dial-up.
|