Doom9's Forum - View Single Post - SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010

JoshyD · 19th March 2010, 23:53

I think your resize problem is really my fault . . . I noticed I rolled back to an older version of the resample function source when releasing that build. If you could try my latest compile, I'd appreciate it. Let me know if it's still crashing so I know if I need to delve deeper into the matter.

@neuron2
This details the differences between x86 and x64. The main thing to watch out for is any assembler that assumes a certain calling convention. On x64 Windows, instead of passing parameters on the stack, the first four "integer" arguments (pointers, shorts, whatever) arrive in rcx, rdx, r8, and r9. Floating point arguments among the first four are passed in xmm0-xmm3 instead; the register is picked by argument position, so a float in the third position uses xmm2. There is shadow space reserved on the stack where you would normally find these parameters, but don't go looking there for them. It's just garbage memory to start with, unless you explicitly store the parameter there for later use. Also, every argument slot on the stack is 8 bytes (64 bits) wide. So, even though an int should take 4 bytes on your stack, 8 are actually allocated. However, when reading one back from memory, you can't read all 8 bytes; only the first 4 contain useful data.
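To make the slot arithmetic concrete, here is a small C sketch of where each parameter's 8-byte slot sits relative to rsp at function entry: the return address occupies [rsp], the four shadow-space slots follow, and the stack-passed parameters come after that. The helper names are made up for illustration, and the formula assumes nothing has been pushed and rsp has not been adjusted yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset from rsp, at function entry, of the
   8-byte slot belonging to parameter number `index` (1-based).
   Assumed layout: [rsp] holds the return address (8 bytes), followed by
   one 8-byte slot per parameter. Slots 1-4 are the shadow space (their
   contents are undefined unless the callee stores rcx/rdx/r8/r9 there);
   slots 5 and up hold the arguments the caller wrote to the stack. */
static size_t win64_arg_offset(unsigned index)
{
    assert(index >= 1);
    return 8u * (size_t)index;   /* 8 for the return address + 8 * (index - 1) */
}

/* Nonzero when the parameter arrives in a register rather than on the stack. */
static int win64_arg_in_register(unsigned index)
{
    return index >= 1 && index <= 4;
}
```

For a six-parameter function this puts the fifth parameter at rsp+40 and the sixth at rsp+48. Once the function pushes a register or subtracts from rsp, every offset shifts by the same amount.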

Code:

;=============================================================================
;void mmx_merge_luma( unsigned int *src, unsigned int *luma, int pitch, int luma_pitch,int width, int height )
;=============================================================================
; parameter 1(src): rcx
; parameter 2(luma): rdx
; parameter 3(pitch): r8d
; parameter 4(luma_pitch): r9d 
; parameter 5(width): [rsp + 40] (at entry, before any push)
; parameter 6(height): [rsp + 48] (at entry, before any push)

In this example, reading width from memory looks like:

Code:

mov eax, DWORD [rsp+40]

You can't do this:

Code:

mov rax, QWORD [rsp+40]

Because the bytes at rsp+44 through rsp+47 are garbage. It's the little things you have to get used to.
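The same effect can be imitated in plain C. This is only a simulation, not real stack access: an 8-byte buffer stands in for the argument slot, the 0xCC fill is an arbitrary stand-in for stale stack contents, and the function names are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated 8-byte stack slot for an int argument. Only the first
   4 bytes are written with the value; the rest keeps a junk pattern
   that stands in for whatever was on the stack before. */
static void fill_int_slot(unsigned char slot[8], int32_t value)
{
    assert(slot != NULL);
    memset(slot, 0xCC, 8);                /* pretend leftover stack data */
    memcpy(slot, &value, sizeof value);   /* caller stores just the 32-bit int */
}

/* Counterpart of: mov eax, DWORD [rsp+40] */
static int32_t read_slot_dword(const unsigned char slot[8])
{
    int32_t v;
    memcpy(&v, slot, sizeof v);
    return v;
}

/* Counterpart of: mov rax, QWORD [rsp+40] */
static int64_t read_slot_qword(const unsigned char slot[8])
{
    int64_t v;
    memcpy(&v, slot, sizeof v);
    return v;
}
```

Reading the slot as a dword gives back the value that was stored, while the qword read drags the junk bytes along. If a full 64-bit register is needed, a sign-extending load such as movsxd rax, DWORD [rsp+40] reads only the valid 4 bytes.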

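There are some other oddities, like whenever an extended register (r8-r15) is used, a REX prefix accompanies the opcode. The one-byte inc and dec encodings from x86 are also gone in 64-bit mode, because those opcode bytes became the REX prefixes. So if you use an extended register as a counter:

Code:

dec r8d
or
inc r9d

The operation is still 32 bits wide (and writing r8d zeroes the upper half of r8), but the instruction takes the two-byte inc/dec form plus a REX prefix, so it encodes longer than the x86 equivalent.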

Volatile registers:
rax
rcx
rdx
r8
r9
r10
r11

Non-volatile registers:
rbx
rbp
rdi
rsi
r12
r13
r14
r15

XMM0-XMM5 are volatile, XMM6-XMM15 are non-volatile.

If your compiler supports 64-bit compilation, then you shouldn't have a problem just taking the source and compiling it as is.

Watch out for Microsoft's compiler (MSVC): it doesn't support inline asm when compiling for x64.
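With inline asm unavailable there, the usual options are moving the routine into a separate assembler file or rewriting it with compiler intrinsics. Below is a minimal sketch of the second option. It assumes a YUY2-style layout where luma sits in the even bytes and a merge that simply replaces the luma of src with the luma of the other clip. The function name, the one-row signature, and those merge semantics are illustrative assumptions, not the Avisynth source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(__x86_64__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical stand-in for one row of an asm merge routine:
   keep chroma (odd bytes) from src, take luma (even bytes) from luma.
   `bytes` is the row width in bytes. */
static void merge_luma_row(uint8_t *src, const uint8_t *luma, size_t bytes)
{
    size_t i = 0;
    assert(src != NULL && luma != NULL);

#if HAVE_SSE2
    {
        /* 0x00FF in every 16-bit lane selects the even (luma) bytes. */
        const __m128i luma_mask = _mm_set1_epi16(0x00FF);
        for (; i + 16 <= bytes; i += 16) {
            __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
            __m128i l = _mm_loadu_si128((const __m128i *)(luma + i));
            __m128i chroma = _mm_andnot_si128(luma_mask, s);  /* ~mask & src  */
            __m128i y      = _mm_and_si128(luma_mask, l);     /*  mask & luma */
            _mm_storeu_si128((__m128i *)(src + i), _mm_or_si128(chroma, y));
        }
    }
#endif

    /* Scalar tail, and the whole row on targets without SSE2. */
    for (; i < bytes; ++i) {
        if ((i & 1u) == 0)
            src[i] = luma[i];
    }
}
```

Because the compiler handles register allocation and the calling convention, the same source builds for both x86 and x64 without any of the stack-offset bookkeeping described above.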

Post #215. Last edited by JoshyD; 20th March 2010 at 00:20.