Doom9's Forum > Capturing and Editing Video > Avisynth Development
Old 13th March 2010, 18:59   #141  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by turbojet View Post
JoshyD: The SSE2 build works. I don't think SSE2 is what AviSynth 2.60 is using during resize judging by these results

2.58-x86: 51.51
2.60a2-x86: 57.77
+12%

3-1-10-x64: 49.25
SSE2-x64: 51.07
+3%

If you plan on suggesting the SSE2 build for AMD CPU's for now you might want to link libiomp5md.dll if you can, it's not an easy dll to find.
It isn't . . . it uses the same opcodes my resizers use, which is why I'm at a loss as to why my resizers cause your computer to error.

Avisynth 2.6 uses SSE3 as long as the FIR filter size is below 8, which is a pretty decent cutoff. Resizers that need larger filter sizes are normally pretty rare. TGMC beta 2 goes up to size 9 for my test vectors.
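To make the "filter size" cutoff concrete, here is a rough sketch of how the tap count of a resize kernel can grow when downscaling. It assumes the usual windowed-kernel rule (taps = 2 x support, widened by the downscale ratio); that rule, the function names, and the strict "below 8" comparison are assumptions for illustration, not taken from the AviSynth source.

```python
def fir_filter_size(support: int, src_size: int, dst_size: int) -> int:
    """Estimated number of FIR taps per output pixel.

    Assumption: the kernel covers 2*support source pixels when upscaling
    and is stretched by src/dst when downscaling.
    """
    if dst_size >= src_size:
        return 2 * support
    # Integer ceiling of 2*support*src/dst, avoiding float rounding.
    return -(-2 * support * src_size // dst_size)


def below_cutoff(filter_size: int, cutoff: int = 8) -> bool:
    """True if the filter is small enough for the (assumed) fast path."""
    return filter_size < cutoff


if __name__ == "__main__":
    # support=3 is the default Lanczos kernel, support=4 is Lanczos4.
    for name, support, src, dst in [
        ("Lanczos 1080->720", 3, 1080, 720),
        ("Lanczos4 480->960", 4, 480, 960),
        ("Lanczos 720->720", 3, 720, 720),
    ]:
        size = fir_filter_size(support, src, dst)
        print(name, size, "fast path" if below_cutoff(size) else "fallback")
```

Under these assumptions, a heavy downscale is what pushes a resizer past the cutoff, which would fit larger sizes being fairly rare.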

Also, my compiler settings shouldn't send your Athlon down any Intel code paths. The Intel-specific code paths, from what I can gather by reading icc vs gcc posts, are only executed when an Intel-family processor is detected. Any of Intel's "special" code should look at your AMD processor, read the vendor ID, and send it down a generic code path.

Also, aegisofrime seems to indicate that he has it running on his Phenom II x4 in the post above. kemuri-_9's system specs indicate he's running a Phenom II x4 as well, and he hasn't voiced any resize-specific problems as of yet. Your Athlon II is a stripped-down Phenom II core, I believe. I'm a bit baffled.

I usually link OpenMP statically; that SSE2-only build is really old, and I probably omitted the compiler flag for it. It has to be entered manually, because OpenMP will error when multiple plugins that use it are all statically linked. For this reason, I haven't built any of the plugins with OpenMP directives. EEDI2 in particular enjoys a massive speed gain if you let it run with multiple threads.

Quote:
you seem to have simply ported over the stack corruption checking that's supposed to check for the plugin being stdcall (and following that convention)
which you had already pointed out that win x64 does not follow this convention so what's going on here?
You are correct: I didn't touch any of these routines, and they'll definitely need to be changed. I hadn't even thought about loading C plugins with this dll until you expressed interest in getting a 64-bit port of FFMS2 working. I was updating on an "as needed" basis; it's a lot of code to sift through, and I can't get it all in one shot. I grabbed the FFMS2 source. Are you building it with MinGW? There's an MSVS project in the svn checkout, but even with C99 support I'm missing some headers, and I wanted to be consistent with your build environment for testing/debugging and such. A compiled x64 dll would let me run through my source to tie up any loose ends. If you could either a) just link an x64 dll or b) fill me in on how you're compiling, I'd greatly appreciate it, and I can get the avisynth core changed ASAP.
Old 13th March 2010, 19:33   #142  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by JoshyD View Post
Also, aegisofrime seems to indicate that he has it running on his Phenom II x4 in the post above. kemuri-_9's system specs indicate he's running a Phenom II x4 as well, and he hasn't voiced any resize specific problems as of yet. Your Athlon II is a stripped down phenom II core, I believe. I'm a bit baffled.
I've been mostly sitting on the sidelines on this and not actively testing, so don't pull me into any arguments as proof of something!
(I only got the binary and started working with it to try the ffms2 plugin just yesterday!)
I have 3 PCs, a Phenom II x4, a Phenom x4, and an Athlon64 x2, all running x64 versions of Windows, so I can do testing from the AMD side of things if the need arises...
(I usually do this for x264, as the other x264 devs mostly use Intel and/or Linux)

Quote:
You are correct, I didn't touch any of these routines, they'll definitely need to be changed. I hadn't even thought about loading C plugins with this dll until you expressed interest in getting a 64bit port of FFMS2 working. I was updating on an "as needed" basis, it's just a lot of code to sift through, I can't get it all in one shot. I grabbed the FFMS2 source, are you building it with MinGW? There's a MSVS project in the svn checkout, but even with C99 support, I'm missing some headers, and wanted to be consistent with your build environment, for testing/debugging and such. A compiled x64 dll would let me run through my source to tie any loose ends up, if you could either a) just link an x64 dll or b) fill me in on how you're compiling, I'd greatly appreciate it, and can get the avisynth core changed ASAP.
Yes, I'm building entirely with MinGW, as per the reasoning of http://doom10.org/index.php?topic=25.msg1730#msg1730
ffms2.dll: x64 testing binary
aforementioned LoadCPlugin plugin: x64 binary src
__________________
custom x264 builds & patches | F@H | My Specs
Old 13th March 2010, 21:28   #143  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@kemuri-_9
Being the king of AMD PCs that you are, would you mind running a quick vertical resize test on any source to see if it's broken across the board for AMD users? If you could pull the latest binary (I uploaded a new one today), it'd be helpful to see if something funny is happening on the AMD side of things. You'll need it to properly test ffms2 anyway.

Giving a quick look at your LoadCPlugin vs the current one that checks stack corruption, I decided to just drop your LoadCPlugin function in as a replacement. As a result, I've got your ffms2.dll loading, and the few tests I've run (various sources with some post-processing effects, etc.) have it running, with some oddities. Running a debug build of avisynth through MSVS's debugger points to some code in ffms2.dll that is making illegal memory accesses. However, frames come through and appear correctly when running a release build of avisynth, and no memory access violations are thrown.

The oddity is that the 1st frame (frame 0) comes through as garbage. Moving around the source a bit, and then coming back to frame 0 has the frame rendering correctly. I figured I'd let you look it over, it may be my avisynth not initializing your plugin correctly. As you've probably gathered, I'm not too familiar with the plugin loading code of avisynth. My main focus has been on optimizing the calculation and memory heavy routines, which aren't usually hanging out in the core code.

Oh, and in case you needed to see what I was building: a quick snapshot of my source. The C plugin routines still check for/specify fastcall and stdcall in places, but icc ignores these types when compiling 64-bit. I thought it would be safe to leave them in if nothing was breaking as a result.

It is very cool to have ffms2 working somewhat correctly though, it can't be far off from having full blown functionality.

Last edited by JoshyD; 13th March 2010 at 21:38.
Old 14th March 2010, 02:15   #144  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by JoshyD View Post
@kemuri-_9
Being the king of AMD PC's that you are, would you mind pulling a quick vertical resize test on any source to see if it's broken across the board for AMD users? If you could pull the latest binary (I uploaded a new one today), it'd be helpful to see if something funny is happening on the AMD side of things. You'll need it to properly test ffms2 anyway .
Har har, I have SWScale to resize with instead!
But yes, trying something like Lanczos4Resize(Width(last),Height(last)*2) throws
"Avisynth Unknown exceptions" errors here on my Phenom II and Athlon64 machines with this new build.

Quote:
Giving a quick look at your loadCplugin vs the current one that checks stack corruption, I decided to just drop your LoadCPlugin function in as a replacement. As a result, I've got your ffms2.dll loading and the few tests I've run (various sources with some post processing effects, etc) have it running, with some oddities. Running a debug build of avisynth through through MSVS's debugger points to some code in ffms2.dll that is making illegal memory accesses. However, frames come through and appear correctly when running a release build of avisynth. There are no memory access violations thrown.

The oddity is that the 1st frame (frame 0) comes through as garbage. Moving around the source a bit, and then coming back to frame 0 has the frame rendering correctly. I figured I'd let you look it over, it may be my avisynth not initializing your plugin correctly. As you've probably gathered, I'm not too familiar with the plugin loading code of avisynth. My main focus has been on optimizing the calculation and memory heavy routines, which aren't usually hanging out in the core code.
I'm not experiencing either of these, though I can't manage to compile the source code you provided into a debug build (I do have ICL 11.1.051) because convert_a64.asm is missing.

Last edited by kemuri-_9; 14th March 2010 at 02:20.
Old 14th March 2010, 03:12   #145  |  Link
aegisofrime
Registered User
 
Join Date: Apr 2009
Posts: 455
JoshyD, sorry for any confusion but the tests were on my Core 2 Duo machine, not my Phenom II machine. I do intend to test it on my Phenom II rig after my current encoding run is completed.
Old 14th March 2010, 04:09   #146  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@kemuri-_9
Whoops, I keep a CVS server running for my own code and forgot to add it to the tree. Here it is on its own. Here's the source again, just to be on the safe side.

If you can point me to the code making AMD processors so unhappy, I'd really really appreciate it.
Old 14th March 2010, 06:19   #147  |  Link
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
Thanks levi for pointing to a newer RePAL version that outputs 25fps. About a year ago I had a blended PAL->NTSC DVD that RePAL didn't handle all that well, and I tried SRestore only to find worse results, but maybe SRestore has improved since then. I agree RePAL should be low priority considering the low percentage of PAL->NTSC sources.

Another filter I've started using lately is autocrop, which would be nice to have for avisynth64 and is, imo, a must for x264 input; without it you have to depend on external programs to get crop values. Maybe something like --autocrop:width(none = no resize):resizer(lanczos default):mod(default 16). Some examples with a 1920x1080 2.35:1 source:
--autocrop:1280 = lanczosresize(1280,544,0,132,0,-132)
--autocrop:1920 = --autocrop = crop(0,132,0,-132) (this should undercrop to the mod set, since it's still higher quality than resizing)
If the source is 1920x1080 1.78:1, --autocrop:1920 = 1920x1080 with no crop/resize (mod8 input with no crop/resize would always be mod8 output, same for mod4/2?)
Though something like --crop and --resize would be helpful in cases where autocrop doesn't work (which I haven't run into yet). I really don't understand the need for the --cli-filter prefix, however.
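The arithmetic behind those two examples can be sketched in a few lines of Python. This is only an illustration of the proposal: the function name and the idea of passing in already-detected bar sizes are assumptions, and no such --autocrop option is claimed to exist in x264.

```python
def autocrop_plan(src_w, src_h, top, bottom, out_w=None, mod=16):
    """Return the AviSynth call implied by the proposed --autocrop option.

    top/bottom are the detected black-bar heights in pixels (hypothetical
    input; a real autocrop filter would measure them from the frames).
    Returns None when neither a crop nor a resize is needed.
    """
    resize = out_w is not None and out_w != src_w
    if not resize:
        if top == 0 and bottom == 0:
            return None  # e.g. a 1.78:1 source at its native width
        return f"crop(0,{top},0,-{bottom})"
    active_h = src_h - top - bottom      # picture height without the bars
    out_h = active_h * out_w // src_w    # keep the aspect ratio
    out_h -= out_h % mod                 # round down to the requested mod
    return f"lanczosresize({out_w},{out_h},0,{top},0,-{bottom})"


if __name__ == "__main__":
    print(autocrop_plan(1920, 1080, 132, 132, out_w=1280))
    print(autocrop_plan(1920, 1080, 132, 132, out_w=1920))
    print(autocrop_plan(1920, 1080, 0, 0, out_w=1920))
```

With 132-pixel bars the active picture is 816 lines, which scales to exactly 544 at 1280 wide, so both numbers in the examples are already mod16.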

Thanks JoshyD for tivtc I'll try it on some sources next week.

Last edited by turbojet; 14th March 2010 at 06:31.
Old 14th March 2010, 07:15   #148  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 341
I just tried the new plugins, and I can say that all of them passed a quick test without issues. However, I do have a question regarding the "threads" parameter to some of these filters. Does the threads parameter get ignored on your Avs64 build, JoshyD?

Now if only we had nnedi2

Last edited by Stephen R. Savage; 14th March 2010 at 07:17.
Old 14th March 2010, 08:01   #149  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Nope, they'll thread themselves independently of the main avisynth dll. It's a bit of a balancing act that the user has to do to make all processors stay busy, while not choking them with too many threads. NNEDI2 would be nice, but I don't think we're going to see that anytime soon.

For now, I really want to figure out why AMD processors can't run the resizers in the current binary . . . it has to be something really simple that I'm missing.

Last edited by JoshyD; 14th March 2010 at 08:07.
Old 14th March 2010, 09:01   #150  |  Link
mavinashbabu
Registered User
 
Join Date: Mar 2007
Posts: 35
Hi,


The thread title says x86_64, so can I assume that it works on 32-bit versions of AviSynth as well? Can anyone confirm, please?

Thanks,
Old 14th March 2010, 09:16   #151  |  Link
aegisofrime
Registered User
 
Join Date: Apr 2009
Posts: 455
Quote:
Originally Posted by mavinashbabu View Post
Hi,


thread title says x86_64 so can i assume that it works on 32 bit versions of avisynth as well. can anyone confirm please.

Thanks,
Nope, it doesn't. You can't mix 64-bit filters with 32-bit filters, AFAIK.
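If you're not sure which kind of DLL you have, the PE header records the target machine: 0x014C for 32-bit x86 and 0x8664 for x64. A minimal Python sketch that reads that field follows; the function names and the example path are made up for illustration.

```python
import struct

# Machine field values from the PE/COFF file header.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # Offset 0x3C of the DOS header holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    # The 16-bit Machine field follows the 4-byte PE signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


def dll_machine(path: str) -> str:
    """Convenience wrapper that reads the DLL from disk."""
    with open(path, "rb") as f:
        return pe_machine(f.read())
```

Usage would be something like dll_machine(r"C:\plugins64\tivtc.dll") (hypothetical path); a plugin that reports x86 can't be loaded by the 64-bit avisynth.dll, and vice versa.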
Old 14th March 2010, 10:36   #152  |  Link
osgZach
Registered User
 
Join Date: Feb 2009
Location: USA
Posts: 658
I did a clean download last night and grabbed everything again, to make sure I had all the recent updates.

Did a test last night. It took 2h:37m, and no crash. Let's hope it stays that way.

The problem did not appear to be source access either; I think it was something like Trim or Decimate. I'm guessing Trim. I set Mode 3 after TGMC.

Code:
LoadPlugin("C:\yatta\plugins64\decomb.dll")
LoadPlugin("C:\yatta\plugins64\dgdecode.dll")
LoadPlugin("C:\yatta\plugins64\telecidehints.dll")
LoadPlugin("C:\yatta\plugins64\fieldhint.dll")


function Preset0(clip c) {
#Name: Default
c
return last
}
SetMTMode(2,0)
DGDecode_Mpeg2Source("L:\Ep 01\VTS_01_1.d2v")



FieldHint(ovr="L:\Ep 01\VTS_01_1.d2v.fh.txt")

#MT("TempGaussMC_beta2().SelectEven()",threads=2,overlap=4)
TempGaussMC_beta2().SelectEven()
SetMTMode(3)

PresetClip0=Preset0()

PresetClip0.Trim(0,41023)


DClip = Decimate(cycle=5,quality=3,ovr="L:\Ep 01\VTS_01_1.d2v.dec.txt").assumefps(last.framerate)
Old 14th March 2010, 11:47   #153  |  Link
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
Are you perhaps confusing SSSE3 instructions with SSE3? I think you'll find the pmulhrsw instructions are the issue.
Old 14th March 2010, 15:29   #154  |  Link
levi
Registered User
 
Join Date: Mar 2003
Posts: 116
Xeon quad core E5530 2.40 GHz w/ turbo (hyperthreading)

x264(x86) + set avisynth 2.6(x86)
First Pass Output to null = 27.18

x264(x64) + JoshyD avisynth 3-13-10(x64)
First Pass Output to null = 29.54

I've seen an 8% speed improvement

Code:
SetMTmode(3,3)
mpeg2source("my.d2v")
SetMTmode(2,3)
tdeint()
crop(4,4,1916,1076)
LanczosResize(1280,720)
Old 14th March 2010, 17:29   #155  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@squid_80
You've got it. It's pshufb that throws the first error on my (really old) Athlon64. Then those pmulhrsw's probably cause an issue as well. Reading some programming message boards, apparently pshufb has been giving AMD developers trouble. I'd guess the subtle difference between SSE3 and SSSE3 does as well. Back to the drawing board for AMD people. Thanks for taking a look; being the only eyes looking over your own source can drive a person a bit crazy.
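For reference, SSE3 and SSSE3 are reported by different CPUID feature bits (leaf 1, ECX bit 0 for SSE3 and bit 9 for SSSE3), and both pshufb and pmulhrsw belong to SSSE3. A minimal sketch of the kind of check a dispatcher needs is below; the path names and the sample ECX values are hypothetical, and real code would obtain ECX from the cpuid instruction rather than a literal.

```python
SSE3_BIT = 0    # CPUID leaf 1, ECX bit 0
SSSE3_BIT = 9   # CPUID leaf 1, ECX bit 9 (pshufb, pmulhrsw live here)


def has_feature(ecx: int, bit: int) -> bool:
    """Test a single feature bit in the ECX value returned by CPUID leaf 1."""
    return bool((ecx >> bit) & 1)


def pick_resizer_path(ecx: int) -> str:
    """Choose a code path; 'ssse3' and 'generic' are illustrative names."""
    if has_feature(ecx, SSSE3_BIT):
        return "ssse3"
    # SSE3 alone is not enough for pshufb/pmulhrsw, so fall back.
    return "generic"


if __name__ == "__main__":
    sse3_only = 1 << SSE3_BIT                        # hypothetical sample value
    sse3_and_ssse3 = (1 << SSE3_BIT) | (1 << SSSE3_BIT)
    print(pick_resizer_path(sse3_only))
    print(pick_resizer_path(sse3_and_ssse3))
```

Gating on the SSE3 bit instead of the SSSE3 bit would send a CPU with only SSE3 down the pshufb path, which would produce exactly this kind of illegal-instruction error.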

@levi
Why only 3 threads? Are you accounting for x264 also taking up some cores? I think x264 defaults to creating 1.5x the number of threads it detects your system can run. I was hoping for some more speed gains. Any chance of seeing some straight single-threaded tests on the same system? I want to rebuild tdeint to see if I can't eke out some more performance as well.
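To put rough numbers on that: assuming the 1.5x rule of thumb holds, and that a quad-core E5530 with hyperthreading exposes 8 logical processors, the thread budget can be sketched like this. The function names are made up for illustration and are not part of x264 or AviSynth.

```python
def x264_auto_threads(logical_cpus: int) -> int:
    """Encoder thread count under the assumed 1.5x-logical-CPUs rule."""
    return logical_cpus * 3 // 2


def oversubscription(logical_cpus: int, avisynth_threads: int) -> float:
    """Ratio of total worker threads (AviSynth + x264) to logical CPUs."""
    total = avisynth_threads + x264_auto_threads(logical_cpus)
    return total / logical_cpus


if __name__ == "__main__":
    for avs_threads in (3, 8):
        ratio = oversubscription(8, avs_threads)
        print(f"SetMTMode threads={avs_threads}: {ratio:.2f} threads per logical CPU")
```

It only counts threads, not how busy each one is, so treat the ratio as a starting point for the balancing act rather than a prediction.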

@osgZach
It's great that there aren't any crashes anymore! I don't think trimming frames from a source should kill multithreading, but decimate has an unusual access pattern. I guess you can't win them all.

Last edited by JoshyD; 14th March 2010 at 17:36.
Old 14th March 2010, 17:37   #156  |  Link
osgZach
Registered User
 
Join Date: Feb 2009
Location: USA
Posts: 658
Crap keyboard... lost my post to ill-placed Forward/Backward keys...

Anyway, it's technically 2 different sources being mixed in, so whatever the case, the problem probably lies somewhere in there, with one or both of them. I know Mode 3 was recommended for Trim at the very least.

Any chance of grabbing the latest DGIndex/DGDecode source (1.5.8) and seeing if it will compile for x64? I'd give it a shot, but if it needed changes I'd be clueless about that stuff.

It would certainly make managing Yatta a lot easier (right now I have to set up two copies, using older x86 source to make projects, since the x64 DLL we have is only 1.4.6).
If anyone who has time could look into it, for that matter, it'd be great.

Last edited by osgZach; 14th March 2010 at 17:40.
Old 14th March 2010, 18:08   #157  |  Link
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
I just had a look on my HDD and it seems at some point I did make an x64 build of DGDecode 1.5.4: http://www.mediafire.com/?dl4fc2yyyzz

It seems to work but I have no idea if it's faster than the old build or if anything apart from the .d2v identifier was changed. Too bad the author has no regard for backwards compatibility.
Old 14th March 2010, 18:43   #158  |  Link
noee
Registered User
 
Join Date: Jan 2007
Posts: 530
FWIW, just did a test with an SD MPEG2 source, encoding with x264, using the 3/13 AVISynth64:

AVS:
Code:
#LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\mvtools2.dll")
LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\DGDecode.dll")
LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\tivtc.dll")
SetMTmode(2,8)
Mpeg2Source("lotr.d2v")
#Insert Deinterlacer
tfm(last,d2v="lotr.d2v").tdecimate()
#Applying Resizing
LanczosResize(720,352,0,62,-0,-66)
x264 v1471
64: x264-64bit.exe --crf 20 --preset medium --threads auto --tune film --sar 32:27 --output "C:\Temp\lotr64.mkv" lotr64.avs
32: x264.exe --crf 20 --preset medium --threads auto --tune film --sar 32:27 --output "C:\Temp\lotr.mkv" lotr.avs

64bit chain: encoded 16356 frames, 79.40 fps, 1345.89 kb/s
32bit chain: encoded 16356 frames, 70.57 fps, 1345.89 kb/s

Using Athlon II (620) O/C'd to 3.5 GHz
Win7 Ult x64
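For comparing the fps figures posted in this thread, the relative gain is (new / baseline - 1) x 100. A small Python sketch using only numbers already reported above; the labels are just shorthand for who posted them.

```python
def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative speed gain of new_fps over baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


# (label, baseline fps, new fps), all taken from posts in this thread.
RESULTS = [
    ("turbojet, 2.58-x86 vs 2.60a2-x86", 51.51, 57.77),
    ("turbojet, 3-1-10-x64 vs SSE2-x64", 49.25, 51.07),
    ("levi, x86 chain vs x64 chain", 27.18, 29.54),
    ("noee, 32bit chain vs 64bit chain", 70.57, 79.40),
]

if __name__ == "__main__":
    for label, base, new in RESULTS:
        print(f"{label}: {speedup_percent(base, new):+.1f}%")
```

Single runs like these don't say anything about run-to-run variance, so small differences shouldn't be over-read.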
Old 14th March 2010, 19:14   #159  |  Link
osgZach
Registered User
 
Join Date: Feb 2009
Location: USA
Posts: 658
Thanks Squid, I suppose it's better than nothing

I don't have any issues with using old versions, as long as nothing major has changed since then. But when I open the D2V files from two different versions and they look different, it kind of makes me nervous that I'm not getting the best indexing of my source.
Old 14th March 2010, 19:47   #160  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@noee
I'm guessing this means that you've gotten tivtc working? If so, that's great news. Those results seem about in line with what I expected: somewhere between 10-20% faster when using x64 code.

@squid_80
You've got a DGDecode listed on your webpage along with source, but checking the version info indicates it's 1.4.6. Any chance you have the 1.5.6 source on hand and I could take a peek at it? Also, may I add that to the first post?

@kemuri-_9
Can I link your FFMS2.dll on the first post? I can move it over to mediafire if you don't want to waste bandwidth on hosting it locally.

@turbojet
Autocrop is built and working for me. Link is on the first post.

Last edited by JoshyD; 14th March 2010 at 20:20.