SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010 - Page 11

VincAlastor · 17th March 2010, 20:39

Quote:

Originally Posted by neuron2

I probably don't need a 64 bit OS to make the build. But I do need someone with 64-bit expertise to tell me how to make such builds. I have never made a 64-bit anything.

I hope there is an expert. Don't be angry with me; I think you can understand why I'd like an x64 version of your modern source filters.

I'm not a programmer, but I'm a good tester.

Guest · 17th March 2010, 20:40

I'm not angry! I'll be happy to make a 64-bit build if someone is willing to educate me about it.

JoshyD · 17th March 2010, 21:18

@neuron2
You can compile 64-bit binaries without a 64-bit operating system; you just can't test them without a 64-bit computer/OS.

How easy the port is will depend on how much the code relies on assembly language, whether it assumes pointers are the same size as an int, etc. If you want to try, I can find you some good reading material.

Also, aren't the DG NV tools written in CUDA? I'm not super-familiar with compiling it; is it done via a custom NVIDIA compiler? If that's the case, and it only uses NVIDIA-specific language constructs (I know CUDA is C-like, or something), then it may be as easy as changing your compile target architecture to x64.

A little background info on how it's written, and I think I can get you going . . .

@Audionut
That error (the MT plugin was originally written in Russian, I think?) usually happens when something goes wrong in the script you're trying to invoke: a syntax error, a filter that isn't loaded, something little missing. That's really the only place in the code I see that error, so I don't know what to tell you. Something in your chain is throwing an exception.

@osgZach
Thread deadlock should be resolved by tonight. I'm pretty sure it's a result of me giving the compiler free rein to add parallelism where it *thinks* it is safe. As I think I've mentioned in an earlier post, compilers are stupid; I shouldn't have trusted it, but oh well. I can thread it all by hand eventually, but that's not my main concern. I'm working on some performance tweaks here and there that should make everyone smile.

Guest · 17th March 2010, 21:45

Quote:

Originally Posted by JoshyD

If you want to try, I can find you some good reading material.

Sure, please do. There is no assembler in the DLLs and just some basic stuff in DGIndexNV for chroma upsampling.

Quote:

Also, isn't dg nv tools written in CUDA?

No. It uses the CUVID API.

Quote:

A little background info on how it's written, and I think I can get you going . . .

See above.

osgZach · 17th March 2010, 23:13

Joshy, cool, I'll look forward to testing it when it's released.

Also, I just wanted to say: I know we all have problems with the build, but don't let our whining control when you release updates or how often you work on it. I know how easy it is to get burned out on something you started doing because you wanted to. I'd rather download an update a week later than a hastily put-together crowd-pleaser.

In other news, I've gone back to using MT("filter"), as I appear to have solved the overlap issue by cranking it up to 12. I was initially afraid to do that because I thought it would overlap the sections to the point where actual data started disappearing. It seems to work, though; I will be comparing it against other encodes made without MT("filter").

It is interesting to note that I can actually use my PC for web browsing and other tasks when using MT("filter"), as opposed to SetMTMode, which slows the whole machine to a crawl, even though performance is roughly the same (MT("filter") actually seems a bit faster). Any reason why that is? And would it be reasonable to expect SetMTMode to stop bogging down the PC completely in the future, even when using all cores?

levi · 18th March 2010, 04:55

SetMTMode doesn't bog down my PC. If you are getting weird behavior, I suggest you look into:

limit the number of threads: SetMTMode(2,x), where x = number of threads
limit max memory: SetMemoryMax(512)
try regulator()
post your script

paulvdb · 18th March 2010, 14:44

Quote:

Originally Posted by neuron2

Sure, please do. There is no assembler in the DLLs and just some basic stuff in DGIndexNV for chroma upsampling.

Only the DLLs have to be 64-bit, because they are loaded by 64-bit AviSynth. You can still use the 32-bit indexer. At least that's how squid_80 did it with DGMPGDec.

osgZach · 18th March 2010, 15:47

Levi, I will if I have problems in the future. But for now I'm content that it's working the way I have it.

The only time SetMTMode didn't bog down the PC was when trying out the 32-bit builds with MT. And performance was mostly not much better than single-threaded, although I saw occasional spikes past 50% in Task Manager. Maybe I just have a goofy install of Windows going on, or something like that.

But if it's working under MT("filter"), then that is OK for now.

Re: DGIndex/NV, the indexer itself runs fine as a 32-bit app. It's the decoder DLLs that have to be 64-bit, and they will have no problem loading the index file, as it's just a basic text file after all. Squid's x64 release is just DGDecode.dll compiled for x64.

Hiritsuki · 18th March 2010, 17:52

I'm just waiting for an x64 version of TNLMeans. Once it's released, I'll switch from x86 MT to x64 MT.

ifb · 19th March 2010, 18:21

I have a 1080i MPEG2 source that I deinterlace and resize to 512x288.

Vertically resizing to anything less than 720 causes crashes. BilinearResize() did not crash, but every other resizer did.

Using the latest Intel build from the top post on an i7 920.

Code:

SetMemoryMax(512)
Global NewHeight = 288
Global NewWidth = 512
SetMTMode(5)

Vid = MPEG2Source("file.d2v")
Aud = wavSource("file.wav")
SetMtMode(2)
AudioDub(Vid,Aud)
Spline36Resize(NewWidth,NewHeight)

osgZach · 19th March 2010, 19:52

It doesn't look like your initial SetMTMode call specified the number of threads; try (5,0). You might even be able to get away with mode 2. There seem to be conflicting opinions about whether it works properly in any mode below 5.

Is your input clip progressive? Also, you shouldn't be resizing before deinterlacing, and I don't know whether resizing an interlaced clip would mess up MT either.

You could also try resizing before your AudioDub call.

ifb · 19th March 2010, 20:22

Quote:

Originally Posted by osgZach

It doesn't look like your initial SetMTMode call specified the number of threads, try (5,0).

Doesn't change anything. I've never had to specify the number of threads, even with vanilla AviSynth.

Quote:

You might even be able to get away with Mode 2.. There seems to be conflicting opinions about whether it works properly under any mode less than 5.

Not setting mode 5 before DGSource reduces speed dramatically, even when changing to mode 2 later in the script.

Quote:

Is your input clip progressive? You shouldn't be resizing before a deinterlace also, and I don't know if trying to resize an interlaced clip would mess up MT either ?

It's 1080i, like I said. I removed the deinterlacer for simplicity. I can resize horizontally all I want, just not vertically to a value less than 720.

Normally (32-bit) the script would be:

Code:

Spline36Resize(NewWidth,height,8,0,-8,0)
Yadif()
Spline36Resize(width,NewHeight,0,-4,0,-4)

Leaving SetMTMode() out completely causes VirtualDub to show green frames instead of crashing completely on open.

osgZach · 19th March 2010, 20:49

Are you using the newest compile of Dgdecode? (1.5.8)

Don't know if it matters, but not all the iDCT modes work, could that be causing an issue?

That's about all I can think of really. I don't have any HD source files to play with.

ifb · 19th March 2010, 21:06

Quote:

Originally Posted by osgZach

Are you using the newest compile of Dgdecode? (1.5.8)

Downloaded today.

Quote:

Don't know if it matters, but not all the iDCT modes work, could that be causing an issue?

No, because then it wouldn't decode anything (or the output would be trash).

It's very simple: resizing to a height below 720 with Lanczos, Spline, or Bicubic crashes VirtualDub and avs2avi.

OS is Win2k8 Server 64-bit.

I should be thorough and try an older build (prior to the vertical resizer using SSSE3).

Edit: The 3-1-2010 build works fine.

JoshyD · 19th March 2010, 23:53

I think your resize problem is really my fault . . . I noticed I rolled back to an older version of the resample function source when releasing that build. If you could try my latest compile, I'd appreciate it. Let me know if it's still crashing so I know if I need to delve deeper into the matter.

@neuron2
This details the differences between x86 and x64. The main thing to watch out for is any assembler that assumes a particular calling convention. In x64, instead of everything being passed on the stack, the first four "integer" arguments (pointers, shorts, whatever) go in rcx, rdx, r8, and r9. Floating-point arguments in those first four positions go in xmm0-xmm3 instead (I think). Shadow space is reserved on the stack where you would normally find these arguments, but don't go looking there for them: it's just garbage memory to start with, unless you explicitly store a parameter there for later use. Also, every argument on the stack occupies an 8-byte (64-bit) slot. So even though an int should take 4 bytes on the stack, 8 are actually allocated. When reading one back from memory, though, you can't read all 8 bytes; only the low 4 contain useful data.

Code:

;=============================================================================
;void mmx_merge_luma( unsigned int *src, unsigned int *luma, int pitch, int luma_pitch,int width, int height )
;=============================================================================
; parameter 1(src): rcx
; parameter 2(luma): rdx
; parameter 3(pitch): r8d
; parameter 4(luma_pitch): r9d 
; parameter 5(width): rsp + 40
; parameter 6(height): rsp + 48

In this example, reading width from memory would look like:

Code:

mov eax, DWORD [rsp+40]

You can't do this:

Code:

mov rax, QWORD [rsp+40]

Because bytes 44-47 are garbage. It's the little things you have to get used to.

There are some other oddities, like whenever an extended register is used, a REX prefix accompanies the opcode. If you want to use an extended register (r8-r15) as a counter, dec and inc always treat them as 64bit values, even when specifying:

Code:

dec r8d
or
inc r9d

You still get a 64 bit add, which is slower.

Volatile registers:
rax
rcx
rdx
r8
r9
r10
r11

Non volatiles:
rbx
rbp
rdi
rsi
r12
r13
r14
r15

XMM0-XMM5 are volatile, XMM6-XMM15 are non-volatile.

If your compiler supports 64-bit compilation, then you shouldn't have a problem just taking the source and compiling it as is.

Watch out for Visual Studio's compiler: it doesn't support inline asm when compiling for x64.

squid_80 · 20th March 2010, 04:47

64-bit adds/subs aren't slower, that's the whole point of having a processor that uses 64-bit registers.

Stephen R. Savage · 20th March 2010, 06:55

I can confirm that I experience "Avisynth: unknown exception" when resizing 1920x1080 to 704x396 with Spline36Resize, using the latest Intel build. 1920x1080 -> 1280x720 -> 704x396 works.

JoshyD · 20th March 2010, 15:43

My bad on the inc/dec instructions. I thought I had read something in the Intel docs about them; it was actually that some forms of them aren't supported in 64-bit mode. I think the instruction encoding may be longer when using the extended registers, regardless of which portion of the register you want to inc or dec.

Quote:

The INC and DEC instructions are supported in 64-bit mode. However, some forms of INC and DEC (the register operand being encoded using register extension field in the MOD R/M byte) are not encodable in 64-bit mode because the opcodes are treated as REX prefixes.

There's also something about register dependency breaking (the whole register-renaming-in-an-out-of-order-core thing) if you modify all the flags with an ADD or SUB rather than just an INC or DEC; I don't remember exactly what it is.
Edit: found it, from Intel's optimization guidelines:

Quote:

Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

I'll look into why the resize is throwing an error. What color space is it giving the error in? I can't seem to get the crash in any color space . . .

yo4kazu · 21st March 2010, 02:21

Could these be given 64-bit support?

_GPU25.dll(source)
http://www.mediafire.com/?jdtmm2cjvrm

HLSLfile for HD48xx
http://www.mediafire.com/?jdtmm2cjvrm

original author:thejam79
GPU_001(binary&source)
http://www.avisynth.info/?plugin=att...le=GPU_001.zip
custom author:gpu25clone
_GPU25(binary)
http://www.avisynth.info/?plugin=att...nary_Rev53.rar

Filters included, implemented as GPU shaders:
(GPU_001)BilinearResize,ColorYUY2,Convolution3d,IT,LanczosResize,TemporalSmoother,WNR
(_GPU25)2DClean,2DCleanFake,Wavelet,HardwareResize,Blur,Sharp,HFlip,VFlip

turbojet · 21st March 2010, 02:29

JoshyD a few questions about the separate builds.

Does the AMD build work on Intel CPUs? If so, is there a significant speedup from the Intel build over the AMD build on Intel CPUs? If not, is there any reason for separate builds? If so, do you want me to improve the install script to use the AMD DLL for AMD CPUs and the Intel DLL for Intel CPUs? If so, what will the names of the two DLLs be?

Also, I could make a self-contained exe installer or use a typical installer, but I'd have to inject new DLLs every time, and typical installers are invasive in the registry. Maybe when development slows down I'll look into it, if you want.
