17th March 2010, 20:39 | #201 | Link | |
Registered User
Join Date: Sep 2009
Location: Berlin
Posts: 173
|
Quote:
i'm not a programmer, but i'm a good tester |
|
17th March 2010, 21:18 | #203 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
@neuron2
You can compile 64-bit binaries without a 64-bit operating system; you just can't test them without a 64-bit computer/OS. The ease of porting the build will depend on how much it relies on assembly language, on code that always treats pointers as the same length as an int, etc. If you want to try, I can find you some good reading material. Also, aren't the DG NV tools written in CUDA? I'm not super familiar with compiling it; is it done via a custom NVIDIA compiler? If that's the case, and it only uses NVIDIA-specific language features (I know CUDA is C-like, or something), then it may be as easy as changing your compile target architecture to x64. A little background info on how it's written, and I think I can get you going . . .

@Audionut
That error (the MT plugin was originally written in Russian, I think?) usually happens when something goes wrong in the script you're trying to invoke: a syntax error, a filter not loaded, something little missing. That's really the only place in the code where I see that error, so I don't know what else to tell you. Something's throwing an exception in your chain.

@osgZach
Thread deadlock should be resolved by tonight. I'm pretty sure it's a result of me giving the compiler free rein to add parallelism where it *thinks* it is safe. I think I've mentioned this in an earlier post: compilers are stupid and I shouldn't have trusted it, but oh well. I can thread it all by hand eventually, but that's not my main concern. I'm working on some performance tweaks here and there that should make everyone smile.
Last edited by JoshyD; 17th March 2010 at 21:25. |
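To illustrate the "pointers the same length as an int" trap mentioned above, here is a minimal C sketch. The function names are hypothetical and are not taken from the Avisynth or DG tools source; they only show the pattern to look for when porting. Storing an address in a 32-bit integer keeps just the low 32 bits on an x64 target, while uintptr_t is the type meant for this job.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit-era habit: stash a pointer in a 32-bit integer.
   On an x64 target this keeps only the low 32 bits of the address. */
static uint32_t pointer_as_u32(const void *p)
{
    return (uint32_t)(uintptr_t)p;
}

/* Portable alternative: uintptr_t is wide enough to hold an object pointer. */
static uintptr_t pointer_as_uintptr(const void *p)
{
    return (uintptr_t)p;
}

/* Returns 1 when the pointer survives a round trip through uintptr_t. */
static int survives_round_trip(const void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    return (const void *)pointer_as_uintptr(p) == p;
}

/* Returns 1 when a 32-bit integer is too narrow for a pointer on this target. */
static int pointer_wider_than_u32(void)
{
    return sizeof(void *) > sizeof(uint32_t);
}
```

On a 32-bit build both conversions behave the same, which is why this kind of code tends to go unnoticed until the target changes. Searching a code base for casts between pointers and int/unsigned int is a reasonable first pass before a 64-bit compile.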
17th March 2010, 21:45 | #204 | Link | ||
Guest
Join Date: Jan 2002
Posts: 21,901
|
Sure, please do. There is no assembler in the DLLs and just some basic stuff in DGIndexNV for chroma upsampling.
Quote:
Quote:
Last edited by Guest; 17th March 2010 at 21:48. |
||
17th March 2010, 23:13 | #205 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
Joshy, cool will look forward to testing when it's released.
Also, just wanted to say: I know we all have problems with the build, but don't let our whining control when you release updates or how often you work on it. I know how easy it is to get burned out on something you started doing because you wanted to. I'd rather download an update a week later than a hastily put together crowd pleaser.

In other news, I've gone back to using MT("filter"), as I appear to have solved the overlap issue by cranking it up to 12. I was initially afraid to do that because I thought it would overlap the sections to the point where actual data started disappearing. It seems to work, though; I will be comparing it against other encodes made without MT("filter").

It is interesting to note that I can actually use my PC for web browsing and other tasks when using MT("filter"), as opposed to SetMTMode, which makes the entire thing slow to a crawl, even though performance is roughly the same (although MT("filter") seems a bit faster). Any reason why that is? And would it be reasonable to expect SetMTMode in the future to not bog down the PC completely, even when using all cores? |
18th March 2010, 15:47 | #208 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
Levi, I will if I have problems in the future. But for now I'm content that it's working the way I have it.
The only time SetMTMode didn't bog down the PC was when trying out the x32 builds w/MT. And performance was not really better than single-threaded either for the most part, although I saw occasional spikes past 50 in Task Manager. Maybe I just have a goofy install of Windows going on, or something like that. But if it's working under MT("filter"), then that is OK for now.

re: DGIndex/NV: the indexer itself runs fine as a 32-bit app; it's the decoder DLLs that have to be 64-bit, and they will have no problems loading the index file, as it's just a basic text file after all. Squid's x64 release is just the Dgdecode.dll compiled for x64.
Last edited by osgZach; 18th March 2010 at 15:49. |
19th March 2010, 18:21 | #210 | Link |
Registered User
Join Date: Dec 2009
Posts: 72
|
I have a 1080i MPEG2 source that I deinterlace and resize to 512x288.
Vertically resizing to anything less than 720 causes crashes. BilinearResize() did not crash, but every other resizer did. Using the latest Intel build from the top post on an i7 920. Code:
SetMemoryMax(512)
Global NewHeight = 288
Global NewWidth = 512
SetMTMode(5)
Vid = MPEG2Source("file.d2v")
Aud = wavSource("file.wav")
SetMtMode(2)
AudioDub(Vid,Aud)
Spline36Resize(NewWidth,NewHeight)
|
19th March 2010, 19:52 | #211 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
It doesn't look like your initial SetMTMode call specified the number of threads; try (5,0). You might even be able to get away with Mode 2. There seem to be conflicting opinions about whether it works properly under any mode less than 5.
Is your input clip progressive? You also shouldn't be resizing before a deinterlace, and I don't know whether trying to resize an interlaced clip would mess up MT either. You could also try resizing before your Dub operation. Last edited by osgZach; 19th March 2010 at 20:05. |
19th March 2010, 20:22 | #212 | Link | |||
Registered User
Join Date: Dec 2009
Posts: 72
|
Quote:
Quote:
Quote:
Normally (32-bit) the script would be: Code:
Spline36Resize(NewWidth,height,8,0,-8,0)
Yadif()
Spline36Resize(width,NewHeight,0,-4,0,-4)
|
|||
19th March 2010, 20:49 | #213 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
Are you using the newest compile of Dgdecode? (1.5.8)
Don't know if it matters, but not all the iDCT modes work, could that be causing an issue? That's about all I can think of really. I don't have any HD source files to play with. |
19th March 2010, 21:06 | #214 | Link | |
Registered User
Join Date: Dec 2009
Posts: 72
|
Downloaded today.
Quote:
It's very simple. Resize height<720 using lanczos, spline, or bicubic -> crashes virtualdub and avs2avi. OS is Win2k8 Server 64-bit. I should be thorough and try an older build (prior to the vertical resizer using SSSE3). <edit> The 3-1-2010 build works fine. Last edited by ifb; 19th March 2010 at 21:51. |
|
19th March 2010, 23:53 | #215 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
I think your resize problem is really my fault . . . I noticed I rolled back to an older version of the resample function source when releasing that build. If you could try my latest compile, I'd appreciate it. Let me know if it's still crashing so I know if I need to delve deeper into the matter.
@neuron2
This details the differences between x86 and x64. The main thing to watch out for is any assembler that assumes a certain calling convention. In x64, instead of passing parameters on the stack, you get the first four "integer" arguments (pointers, shorts, whatever) in rcx, rdx, r8, and r9. The first four floating-point arguments are passed in xmm0-xmm3 (I think). Shadow space is reserved on the stack where you would normally find these variables, but don't go looking there for them. It's just garbage memory to start with, unless you explicitly store the parameter there for later use.

Also, all variables on the stack are aligned to 64 bits. So even though int types should take 4 bytes on your stack, 8 are actually allocated. However, when reading these from memory, you shouldn't read all 8 bytes; only the first 4 contain useful data.
Code:
;=============================================================================
;void mmx_merge_luma( unsigned int *src, unsigned int *luma, int pitch, int luma_pitch,int width, int height )
;=============================================================================
; parameter 1(src): rcx
; parameter 2(luma): rdx
; parameter 3(pitch): r8d
; parameter 4(luma_pitch): r9d
; parameter 5(width): rsp + 40
; parameter 6(height): rsp + 48
Code:
mov eax, DWORD [rsp+40]
Code:
mov rax, QWORD [rsp+40]
There are some other oddities: whenever an extended register is used, a REX prefix accompanies the opcode. If you want to use an extended register (r8-r15) as a counter, dec and inc always treat them as 64-bit values, even when specifying:
Code:
dec r8d or inc r9d
Volatile registers: rax rcx rdx r8 r9 r10 r11
Non-volatile registers: rbx rbp rdi rsi r12 r13 r14 r15
XMM0-XMM5 are volatile; XMM6-XMM15 are non-volatile.

If your compiler supports 64-bit compilation, then you shouldn't have a problem just taking the source and compiling it as is. Watch out for MSVS's compiler: it takes away inline asm when compiling for x64.
Last edited by JoshyD; 20th March 2010 at 00:20. |
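For anyone who would rather see the 4-byte versus 8-byte stack read in C than in assembler, here is a rough model of the point made in the post above. It is only a sketch: stack_slot and the helper names are made up for illustration, and the 0xCC fill stands in for whatever stale data happens to sit in the upper half of the slot. The int argument occupies an 8-byte slot, the caller writes only the first 4 bytes, so a DWORD-sized read returns the value and a QWORD-sized read drags in the leftover bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of one 8-byte stack slot holding a 4-byte int argument. */
typedef struct {
    unsigned char bytes[8];
} stack_slot;

/* Hypothetical caller side: store the int, leave stale data in the other half. */
static stack_slot store_int_arg(int32_t value)
{
    stack_slot slot;
    memset(slot.bytes, 0xCC, sizeof slot.bytes);   /* pretend stale stack contents */
    memcpy(slot.bytes, &value, sizeof value);      /* only 4 bytes are written */
    return slot;
}

/* Counterpart of: mov eax, DWORD [rsp+40] (reads only the 4 meaningful bytes). */
static int32_t read_dword(const stack_slot *slot)
{
    int32_t v;
    memcpy(&v, slot->bytes, sizeof v);
    return v;
}

/* Counterpart of: mov rax, QWORD [rsp+40] (also pulls in the 4 stale bytes). */
static int64_t read_qword(const stack_slot *slot)
{
    int64_t v;
    memcpy(&v, slot->bytes, sizeof v);
    return v;
}
```

With a width of 720 stored in the slot, read_dword gives back 720 while read_qword gives a value polluted by the stale bytes. In the assembler above that corresponds to loading width through eax rather than rax.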
20th March 2010, 15:43 | #218 | Link | ||
Registered User
Join Date: Feb 2010
Posts: 84
|
My bad on the inc/dec instructions. I thought I had read something in the Intel docs about them; it was actually that some forms of them aren't supported. I think the instruction encoding may be longer when using the extended registers, regardless of which portion of the register you want to inc or dec.
Quote:
Edit: found it, from intel's optimization guidelines Quote:
I'll look into why the resize is throwing an error. What color space is it giving the error in? I can't seem to get the crash in any color space . . . Last edited by JoshyD; 20th March 2010 at 21:15. |
||
21st March 2010, 02:21 | #219 | Link |
Registered User
Join Date: Mar 2010
Posts: 8
|
Would it be possible to add 64-bit support for these?
_GPU25.dll (source): http://www.mediafire.com/?jdtmm2cjvrm
HLSL file for HD48xx: http://www.mediafire.com/?jdtmm2cjvrm
Original author: thejam79
GPU_001 (binary & source): http://www.avisynth.info/?plugin=att...le=GPU_001.zip
Custom author: gpu25clone
_GPU25 (binary): http://www.avisynth.info/?plugin=att...nary_Rev53.rar
Filters that run on GPU shaders:
(GPU_001) BilinearResize, ColorYUY2, Convolution3d, IT, LanczosResize, TemporalSmoother, WNR
(_GPU25) 2DClean, 2DCleanFake, Wavelet, HardwareResize, Blur, Sharp, HFlip, VFlip
Last edited by Guest; 1st June 2010 at 13:43. |
21st March 2010, 02:29 | #220 | Link |
Registered User
Join Date: May 2008
Posts: 1,840
|
JoshyD, a few questions about the separate builds.
Does the AMD build work on Intel CPUs? If so, is there a significant speedup of the Intel build over the AMD build on Intel CPUs? If not, any reason for separate builds? If so, do you want me to improve the install script to use the AMD DLL for AMD CPUs and the Intel DLL for Intel CPUs? If so, what will be the names of the two DLLs? Also, I could make a self-contained exe to install, or use a typical installer, but I'd have to inject new DLLs every time, and typical installers are invasive in the registry. Maybe when development slows down I'll look into it if you want. Last edited by turbojet; 21st March 2010 at 02:33. |
|
|