Doom9's Forum > Capturing and Editing Video > Avisynth Development

17th March 2010, 20:39   #201
VincAlastor
Registered User
 
Join Date: Sep 2009
Location: Berlin
Posts: 137
Quote:
Originally Posted by neuron2
I probably don't need a 64 bit OS to make the build. But I do need someone with 64-bit expertise to tell me how to make such builds. I have never made a 64-bit anything.
I hope there is an expert. Don't be angry with me; I think you can understand that I wish for an x64 version of your modern source filters.

I'm not a programmer, but I'm a good tester.
17th March 2010, 20:40   #202
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,924
I'm not angry! I'll be happy to make a 64-bit build if someone is willing to educate me about it.
17th March 2010, 21:18   #203
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@neuron2
You can compile 64-bit binaries without a 64-bit operating system; you just can't test them without a 64-bit computer/OS. The ease of porting the build will depend on how much it relies on assembly language, on code that always treats pointers as the same length as an int, etc. If you want to try, I can find you some good reading material.
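To illustrate the pointer-versus-int point, here is a minimal sketch of the kind of code that survives a 64-bit rebuild. The helper names are made up for the example and are not taken from any of the plugins discussed here; the only assumption is a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns how many bytes `p` must advance to reach the next multiple of
// `alignment`, which must be a power of two.
// The pointer is converted to std::uintptr_t, which is wide enough to hold it
// on both 32-bit and 64-bit targets. Casting to int or unsigned int instead
// would drop the upper 32 bits in a 64-bit build.
inline std::size_t bytes_to_alignment(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return static_cast<std::size_t>((alignment - (address & mask)) & mask);
}

// Byte distance between two rows of the same buffer. Pointer differences have
// type std::ptrdiff_t, which is 64 bits wide in a 64-bit build, so they should
// not be stored in an int.
inline std::ptrdiff_t row_distance(const unsigned char* first,
                                   const unsigned char* second) {
    return second - first;
}
```

Code that already uses uintptr_t, size_t and ptrdiff_t for addresses, sizes and pitches usually recompiles for x64 without changes; code that stores a pointer in an int is what needs attention.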

Also, isn't dg nv tools written in CUDA? I'm not super-familiar with compiling it; is it done via a custom NVIDIA compiler? If that's the case, and it only uses NVIDIA-specific language references (I know CUDA is C-like, or something), then it may be as easy as changing your compile target architecture to x64.

A little background info on how it's written, and I think I can get you going . . .

@Audionut
That error (the MT plugin was originally written in Russian, I think?) usually happens when something goes wrong in the script you're trying to invoke: a syntax error, a filter not loaded, something little missing. That's really the only place in the code I see that error, so I don't know what else to tell you. Something's throwing an exception in your chain.

@osgZach
Thread deadlock should be resolved by tonight. I'm pretty sure it's a result of me giving the compiler free rein to add parallelism where it *thinks* it is safe. I think I've mentioned this in an earlier post: compilers are stupid and I shouldn't have trusted it, but oh well. I can thread it all by hand eventually, but that's not my main concern. I'm working on some performance tweaks here and there that should make everyone smile.

Last edited by JoshyD; 17th March 2010 at 21:25.
17th March 2010, 21:45   #204
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,924
Quote:
Originally Posted by JoshyD
If you want to try, I can find you some good reading material.
Sure, please do. There is no assembler in the DLLs and just some basic stuff in DGIndexNV for chroma upsampling.

Quote:
Also, isn't dg nv tools written in CUDA?
No. It uses the CUVID API.

Quote:
A little background info on how it's written, and I think I can get you going . . .
See above.

Last edited by Guest; 17th March 2010 at 21:48.
17th March 2010, 23:13   #205
osgZach
Registered User
 
Join Date: Feb 2009
Location: Waterloo, WI - USA
Posts: 649
Joshy, cool, I will look forward to testing when it's released.

Also, just wanted to say, I know we all have problems with the build, but don't let our whining control when you release updates or how often you work on it. I know how easy it is to get burned out on something you started doing because you wanted to. I'd rather download an update a week later than a hastily put together crowd pleaser.

In other news, I've gone back to using MT("filter"), as I appear to have solved the overlap issue by cranking it up to 12. I was initially afraid to do that because I thought it would overlap the sections to the point where actual data started disappearing. It seems to work, though; I will be comparing it against other encodes made without MT("filter").

It is interesting to note that I can actually use my PC for web browsing and other tasks when using MT("filter"), as opposed to SetMTMode, which makes the entire thing slow to a crawl, even though performance is roughly the same (although MT("filter") seems a bit faster). Any reason why that is? And would it be reasonable to expect SetMTMode in the future not to bog down the PC completely, even when using all cores?
18th March 2010, 04:55   #206
levi
Registered User
 
Join Date: Mar 2003
Posts: 116
SetMTMode doesn't bog down my PC. If you are getting weird behavior, I suggest you look into:

limit # of threads - SetMTMode(2,x) - x = threads
limit max memory - SetMemoryMax(512)
try regulator()
post your script

18th March 2010, 14:44   #207
paulvdb
Registered User
 
Join Date: Jan 2005
Posts: 33
Quote:
Originally Posted by neuron2
Sure, please do. There is no assembler in the DLLs and just some basic stuff in DGIndexNV for chroma upsampling.
Only the DLLs have to be 64-bit because they have to be loaded by 64-bit avisynth. You can still use the 32-bit indexer. At least that's how squid80 did it with DGMPGDec.
18th March 2010, 15:47   #208
osgZach
Registered User
 
Join Date: Feb 2009
Location: Waterloo, WI - USA
Posts: 649
Levi, I will if I have problems in the future. But for now I'm content that it's working the way I have it.

The only time SetMTMode didn't bog down the PC was when trying out the 32-bit builds w/MT. And performance was not really better than single-threaded for the most part, although I saw occasional spikes past 50 under Task Manager. Maybe I just have a goofy install of Windows going on, or something like that.

But if it's working under MT("filter") then that is OK for now.


re: DGIndex/NV, the indexer itself runs fine as a 32-bit app; it's the decoder DLLs that have to be 64-bit, and they will have no problems loading the index file, as it's just a basic text file after all. Squid's x64 release is just the DGDecode.dll compiled for x64.

Last edited by osgZach; 18th March 2010 at 15:49.
18th March 2010, 17:52   #209
Hiritsuki
Novice of AVS
 
Join Date: Oct 2009
Posts: 156
I'm just waiting for an x64 version of TNLMeans.
Once it's released, I'll switch from x86 MT to x64 MT.
19th March 2010, 18:21   #210
ifb
Registered User
 
Join Date: Dec 2009
Posts: 40
I have a 1080i MPEG2 source that I deinterlace and resize to 512x288.

Vertically resizing to anything less than 720 causes crashes. BilinearResize() did not crash, but every other resizer did.

Using the latest Intel build from the top post on an i7 920.

Code:
SetMemoryMax(512)
Global NewHeight = 288
Global NewWidth = 512
SetMTMode(5)

Vid = MPEG2Source("file.d2v")
Aud = wavSource("file.wav")
SetMtMode(2)
AudioDub(Vid,Aud)
Spline36Resize(NewWidth,NewHeight)
19th March 2010, 19:52   #211
osgZach
Registered User
 
Join Date: Feb 2009
Location: Waterloo, WI - USA
Posts: 649
It doesn't look like your initial SetMTMode call specified the number of threads; try (5,0). You might even be able to get away with mode 2. There seem to be conflicting opinions about whether it works properly under any mode less than 5.

Is your input clip progressive? Also, you shouldn't be resizing before a deinterlace, and I don't know whether resizing an interlaced clip would mess up MT.

You could also try resizing before your dub operation.

Last edited by osgZach; 19th March 2010 at 20:05.
19th March 2010, 20:22   #212
ifb
Registered User
 
Join Date: Dec 2009
Posts: 40
Quote:
Originally Posted by osgZach
It doesn't look like your initial SetMTMode call specified the number of threads, try (5,0).
Doesn't change anything. I've never had to specify thread number, even with vanilla avisynth.
Quote:
You might even be able to get away with Mode 2.. There seems to be conflicting opinions about whether it works properly under any mode less than 5.
Not setting mode 5 before DGSource reduces speed dramatically, even when changing to mode 2 later in the script.
Quote:
Is your input clip progressive? You shouldn't be resizing before a deinterlace also, and I don't know if trying to resize an interlaced clip would mess up MT either ?
It's 1080i, like I said. I removed the deinterlacer for simplicity. I can resize horizontally all I want, just not vertically to a value less than 720.

Normally (32-bit) the script would be:
Code:
Spline36Resize(NewWidth,height,8,0,-8,0)
Yadif()
Spline36Resize(width,NewHeight,0,-4,0,-4)
Leaving SetMTMode() out completely causes VirtualDub to get green frames instead of crashing completely on open.
19th March 2010, 20:49   #213
osgZach
Registered User
 
Join Date: Feb 2009
Location: Waterloo, WI - USA
Posts: 649
Are you using the newest compile of Dgdecode? (1.5.8)

Don't know if it matters, but not all the iDCT modes work, could that be causing an issue?

That's about all I can think of really. I don't have any HD source files to play with.
19th March 2010, 21:06   #214
ifb
Registered User
 
Join Date: Dec 2009
Posts: 40
Quote:
Originally Posted by osgZach
Are you using the newest compile of Dgdecode? (1.5.8)
Downloaded today.
Quote:
Don't know if it matters, but not all the iDCT modes work, could that be causing an issue?
No, because then it wouldn't decode anything (or the output would be trash).

It's very simple. Resize height<720 using lanczos, spline, or bicubic -> crashes VirtualDub and avs2avi.

OS is Win2k8 Server 64-bit.

I should be thorough and try an older build (prior to the vertical resizer using SSSE3).

<edit>
The 3-1-2010 build works fine.

Last edited by ifb; 19th March 2010 at 21:51.
19th March 2010, 23:53   #215
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
I think your resize problem is really my fault . . . I noticed I rolled back to an older version of the resample function source when releasing that build. If you could try my latest compile, I'd appreciate it. Let me know if it's still crashing so I know if I need to delve deeper into the matter.

@neuron2
This details the differences between x86 and x64. The main thing to watch out for is any assembler that assumes a certain calling convention. In x64 (on Windows), instead of parameter passing on the stack, you get the first four "integer" arguments (pointers, shorts, whatever) in rcx, rdx, r8, and r9. The first four floating-point arguments are passed in xmm0-xmm3 (I think). Shadow space is reserved on the stack where you would normally find these parameters, but don't go looking there for them. It's just garbage memory to start with, unless you explicitly store the parameter there for later use. Also, every argument on the stack gets a 64-bit slot. So, even though an int should take 4 bytes on your stack, 8 are actually allocated. However, when reading one back from memory, you can't read all 8 bytes; only the first 4 contain useful data.
Code:
;=============================================================================
;void mmx_merge_luma( unsigned int *src, unsigned int *luma, int pitch, int luma_pitch,int width, int height )
;=============================================================================
; parameter 1(src): rcx
; parameter 2(luma): rdx
; parameter 3(pitch): r8d
; parameter 4(luma_pitch): r9d 
; parameter 5(width): rsp + 40
; parameter 6(height): rsp + 48
In this example, getting width from memory would look like:
Code:
mov eax, DWORD [rsp+40]
You can't do this:
Code:
mov rax, QWORD [rsp+40]
Because bytes 44-47 are garbage. It's the little things you have to get used to.
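As a rough cross-check of the offsets above, the rule can be written down as a tiny C++ helper. This is only a sketch of the Windows x64 convention as described in this post: argument_location is a made-up name, offsets are relative to rsp at function entry (before the callee pushes anything or adjusts rsp), and structs and varargs are ignored.

```cpp
#include <cassert>
#include <string>

// Where argument `index` (0-based) lives at function entry under the
// Windows x64 calling convention. Hypothetical helper for illustration only.
inline std::string argument_location(int index, bool is_floating_point) {
    assert(index >= 0);
    static const char* const integer_regs[4] = {"rcx", "rdx", "r8", "r9"};
    static const char* const float_regs[4] = {"xmm0", "xmm1", "xmm2", "xmm3"};
    if (index < 4) {
        // Register slots are positional: argument 2 uses r8 or xmm2, never both.
        return is_floating_point ? float_regs[index] : integer_regs[index];
    }
    // [rsp] holds the return address, [rsp+8]..[rsp+39] is the 32-byte shadow
    // space, and every further argument takes one 8-byte slot after that.
    const int offset = 8 + 8 * index;
    return "[rsp+" + std::to_string(offset) + "]";
}
```

For mmx_merge_luma, argument_location(4, false) gives [rsp+40] for width and argument_location(5, false) gives [rsp+48] for height, matching the comment block. Those offsets move as soon as the function pushes registers or reserves stack space.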

There are some other oddities, like whenever an extended register is used, a REX prefix accompanies the opcode. If you want to use an extended register (r8-r15) as a counter, dec and inc always treat them as 64bit values, even when specifying:

Code:
dec r8d
or
inc r9d
You still get a 64 bit add, which is slower.

Volatile registers:
rax
rcx
rdx
r8
r9
r10
r11

Non volatiles:
rbx
rbp
rdi
rsi
r12
r13
r14
r15

XMM0-XMM5 are volatile, XMM6-XMM15 are non-volatile.

If your compiler supports 64-bit binary compilation, then you shouldn't have a problem just taking the source and compiling it as is.

Watch out for the MSVC compiler: it does not support inline asm when compiling for x64.
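One common way around the missing inline asm is to use compiler intrinsics, which build for both x86 and x64. Below is a minimal sketch, assuming SSE2 is available (it always is on x64). average_rows is a made-up function for the example and is not code from Avisynth or the DG tools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Rounded byte-wise average of two rows: dst[i] = (a[i] + b[i] + 1) / 2.
// Hypothetical example function, written with intrinsics instead of inline asm.
inline void average_rows(std::uint8_t* dst, const std::uint8_t* a,
                         const std::uint8_t* b, std::size_t count) {
    assert(dst != nullptr && a != nullptr && b != nullptr);
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    // 16 bytes per iteration; unaligned loads and stores keep the sketch simple.
    for (; i + 16 <= count; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    // Scalar tail, and the whole row on targets without SSE2.
    for (; i < count; ++i) {
        dst[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) >> 1);
    }
}
```

The compiler handles register allocation and the calling convention, so the same source builds for both targets. The other usual route is to move the assembly into separate .asm files and assemble them with an external assembler.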

Last edited by JoshyD; 20th March 2010 at 00:20.
20th March 2010, 04:47   #216
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
64-bit adds/subs aren't slower, that's the whole point of having a processor that uses 64-bit registers.
20th March 2010, 06:55   #217
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 313
I can confirm that I experience "Avisynth: unknown exception" when resizing 1920x1080 to 704x396 with Spline36Resize, using the latest Intel build. 1920x1080 -> 1280x720 -> 704x396 works.
20th March 2010, 15:43   #218
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
My bad on the inc, dec instructions. I thought I had read something in the Intel docs about them; it was actually that some forms of them aren't supported. I think the instruction encoding may be longer when using the extended registers, regardless of which portion of the register you want to inc or dec.

Quote:
The INC and DEC instructions are supported in 64-bit mode. However, some forms of INC and DEC (the register operand being encoded using register extension field in the MOD R/M byte) are not encodable in 64-bit mode because the opcodes are treated as REX prefixes.
There's also something about register dependency breaking (the whole register renaming in an out-of-order core thing) if you modify all the flags with an ADD or SUB rather than just an INC or DEC; I don't remember exactly what it is.
Edit: found it, from Intel's optimization guidelines:
Quote:
Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

I'll look into why the resize is throwing an error. What color space is it giving the error in? I can't seem to get the crash in any color space . . .

Last edited by JoshyD; 20th March 2010 at 21:15.
21st March 2010, 02:21   #219
yo4kazu
Registered User
 
Join Date: Mar 2010
Posts: 8
Could you add 64-bit support for these?

_GPU25.dll(source)
http://www.mediafire.com/?jdtmm2cjvrm

HLSLfile for HD48xx
http://www.mediafire.com/?jdtmm2cjvrm

original author:thejam79
GPU_001(binary&source)
http://www.avisynth.info/?plugin=att...le=GPU_001.zip
custom author:gpu25clone
_GPU25(binary)
http://www.avisynth.info/?plugin=att...nary_Rev53.rar

Filters included (running on GPU shaders):
(GPU_001)BilinearResize,ColorYUY2,Convolution3d,IT,LanczosResize,TemporalSmoother,WNR
(_GPU25)2DClean,2DCleanFake,Wavelet,HardwareResize,Blur,Sharp,HFlip,VFlip

Last edited by Guest; 1st June 2010 at 13:43.
21st March 2010, 02:29   #220
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
JoshyD, a few questions about the separate builds.

Does the AMD build work on Intel CPUs? If so, is there a significant speedup of the Intel build over the AMD build on Intel CPUs? If not, any reason for separate builds? If so, do you want me to improve the install script to use the AMD DLL for AMD CPUs and the Intel DLL for Intel CPUs? If so, what will be the names of the two DLLs?

Also, I could make a self-contained exe to install, or use a typical installer, but I'd have to inject new DLLs every time, and typical installers are invasive in the registry. Maybe when the development slows down I'll look into it, if you want.
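For the vendor check itself, something along these lines might do. It is only a sketch: both DLL file names are placeholders (the real names are exactly what I'm asking about), and treating every non-AMD CPU as Intel is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rebuilds the 12-character vendor string from the registers returned by
// CPUID leaf 0. The characters are stored in EBX, EDX, ECX, in that order,
// lowest byte first.
inline std::string vendor_from_cpuid(std::uint32_t ebx, std::uint32_t edx,
                                     std::uint32_t ecx) {
    const std::uint32_t regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (std::uint32_t reg : regs) {
        for (int shift = 0; shift < 32; shift += 8) {
            vendor.push_back(static_cast<char>((reg >> shift) & 0xFF));
        }
    }
    assert(vendor.size() == 12);
    return vendor;
}

// Picks which build to install. Both file names are placeholders, and the
// fallback to the Intel build for unknown vendors is an arbitrary assumption.
inline std::string dll_for_vendor(const std::string& vendor) {
    return vendor == "AuthenticAMD" ? "avisynth_amd.dll" : "avisynth_intel.dll";
}
```

The three register values would come from __get_cpuid(0, &eax, &ebx, &ecx, &edx) in <cpuid.h> on GCC/Clang, or from __cpuid(info, 0) in <intrin.h> on MSVC.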

Last edited by turbojet; 21st March 2010 at 02:33.
All times are GMT +1.