Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th March 2010, 19:59 | #141 | Link | ||
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
Avisynth 2.6 uses SSE3 as long as the FIR filter size is below 8, which is a pretty decent cutoff. Resizers that need larger filter sizes are normally pretty rare. TGMC beta 2 goes up to size 9 for my test vectors.
Also, my compiler settings shouldn't send your Athlon down any Intel code paths. From what I can gather by reading icc vs gcc posts, the Intel-specific code paths are only executed when an Intel family processor is detected. Any of Intel's "special" code should look at your AMD processor, read the vendor ID, and send it down a generic code path.
Also, aegisofrime seems to indicate that he has it running on his Phenom II x4 in the post above. kemuri-_9's system specs indicate he's running a Phenom II x4 as well, and he hasn't voiced any resize-specific problems as of yet. Your Athlon II is a stripped-down Phenom II core, I believe. I'm a bit baffled.
I usually link OpenMP statically; that SSE2-only build is really old and I probably omitted the compiler flag for it. It has to be manually entered, because OpenMP, when used by multiple plugins, will error if they're all statically linked. For this reason, I haven't built any of the plugins with OpenMP directives. EEDI2 in particular enjoys a massive speed gain if you let it run with multiple threads.
|
||
13th March 2010, 20:33 | #142 | Link | ||
Compiling Encoder
Join Date: Jan 2007
Posts: 1,348
|
Quote:
(I only got the binary and started working with it just yesterday, to try the ffms2 plugin!)
I have 3 PCs, a Phenom II x4, a Phenom x4, and an Athlon 64 x2, all running x64 versions of Windows, so I can do testing from the AMD side of things if the need arises. (I usually do this for x264, as the other x264 devs mostly use Intel and/or Linux.)
ffms2.dll: x64 testing binary
aforementioned LoadCPlugin plugin: x64 binary, src
||
13th March 2010, 22:28 | #143 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
@kemuri-_9
Being the king of AMD PCs that you are, would you mind running a quick vertical resize test on any source to see if it's broken across the board for AMD users? If you could pull the latest binary (I uploaded a new one today), it'd be helpful to see if something funny is happening on the AMD side of things. You'll need it to properly test ffms2 anyway.
After a quick look at your LoadCPlugin versus the current one that checks for stack corruption, I decided to just drop your LoadCPlugin function in as a replacement. As a result, I've got your ffms2.dll loading, and the few tests I've run (various sources with some post-processing effects, etc.) have it running, with some oddities.
Running a debug build of avisynth through MSVS's debugger points to some code in ffms2.dll that is making illegal memory accesses. However, frames come through and appear correctly when running a release build of avisynth, and no memory access violations are thrown. The oddity is that the 1st frame (frame 0) comes through as garbage. Moving around the source a bit and then coming back to frame 0 has the frame rendering correctly.
I figured I'd let you look it over; it may be my avisynth not initializing your plugin correctly. As you've probably gathered, I'm not too familiar with the plugin loading code of avisynth. My main focus has been on optimizing the calculation- and memory-heavy routines, which aren't usually hanging out in the core code.
Oh, and in case you needed to see what I was building: a quick snapshot of my source. The C plugin routines still check for/specify fastcall and stdcall in places, but icc ignores these types when compiling 64-bit. I thought it would be safe to leave them in if nothing was breaking as a result.
It is very cool to have ffms2 working somewhat correctly though; it can't be far off from having full-blown functionality.
Last edited by JoshyD; 13th March 2010 at 22:38. |
14th March 2010, 03:15 | #144 | Link | ||
Compiling Encoder
Join Date: Jan 2007
Posts: 1,348
|
Quote:
but yes, trying something like Lanczos4Resize(Width(last),Height(last)*2) throws "Avisynth Unknown exceptions" errors here on my Phenom II and Athlon 64 machines with this new build.
Last edited by kemuri-_9; 14th March 2010 at 03:20. |
||
14th March 2010, 05:09 | #146 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
kemuri-_9
Whoops, I keep a CVS server running for my own code and forgot to add it to the tree. Here it is on its own. Here's the source again, just to be on the safe side.
If you can point me to the code making AMD processors so unhappy, I'd really, really appreciate it. |
14th March 2010, 07:19 | #147 | Link |
Registered User
Join Date: May 2008
Posts: 1,840
|
Thanks levi for pointing to a newer RePAL version that outputs 25fps. About a year ago I had a blended PAL->NTSC DVD that RePAL didn't handle all that well, and I tried SRestore only to get worse results, but maybe SRestore has improved since then. I agree RePAL should be low priority considering the low percentage of PAL->NTSC sources.
Another filter that I've started to use lately is autocrop, which would be nice for avisynth64 and, imo, required for x264 input; without it you need to depend on external programs to get crop values. Maybe something like
--autocrop:width(none = no resize):resizer(lanczos default):mod(default 16)
Some examples with a 1920x1080 2.35:1 source:
--autocrop:1280 = lanczosresize(1280,544,0,132,0,-132)
--autocrop:1920 = --autocrop = crop(0,132,0,-132) (this should undercrop to the mod set, since it's still higher quality than resizing)
If the source is 1920x1080 1.78:1:
--autocrop:1920 = 1920x1080 with no crop/resize (mod8 input with no crop/resize would always be mod8 output, same for mod4/2?)
Something like --crop and --resize would still be helpful in cases where autocrop doesn't work (which I haven't run into yet). I really don't understand the need for the --cli-filter prefix, however.
Thanks JoshyD for tivtc, I'll try it on some sources next week.
Last edited by turbojet; 14th March 2010 at 07:31. |
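The crop and resize figures in the autocrop examples can be reproduced with a little arithmetic. What follows is only a sketch in Python, not an existing x264 or autocrop feature: `autocrop_call`, its parameters, and the round-to-nearest-mod rule are assumptions, chosen because they give the same 132 and 544 values as the post. It also assumes the picture is centred and that about 817 active rows were detected (1920 / 2.35 ≈ 817).

```python
def round_to_mod(n, mod):
    """Round a positive integer to the nearest multiple of mod (at least mod)."""
    return max(mod, (2 * n + mod) // (2 * mod) * mod)


def autocrop_call(src_w, src_h, active_h, out_w=None, mod=16):
    """Build an AviSynth-style call string for a centred, letterboxed picture.

    active_h is the detected picture height in rows. It is a hypothetical
    input here; a real autocrop filter would measure it from the frames.
    Returns an empty string when there is nothing to crop.
    """
    if active_h >= src_h:
        # No bars found: leave the frame untouched, whatever its mod is.
        return ""

    # Snap the kept height to the requested mod, never exceeding the source.
    cropped_h = min(round_to_mod(active_h, mod), src_h // mod * mod)

    # Rows removed from the top and again from the bottom
    # (assumes the difference is even, as it is for 1080 -> 816).
    bar = (src_h - cropped_h) // 2

    if out_w is None or out_w == src_w:
        return f"crop(0,{bar},0,-{bar})"

    # Scale the height by the same factor as the width, then snap to mod.
    out_h = round_to_mod(round(out_w * cropped_h / src_w), mod)
    return f"lanczosresize({out_w},{out_h},0,{bar},0,-{bar})"


if __name__ == "__main__":
    active = round(1920 / 2.35)  # about 817 rows of picture
    print(autocrop_call(1920, 1080, active, out_w=1280))
    print(autocrop_call(1920, 1080, active))
```

Rounding to the nearest multiple removes about one row of picture in this case; always rounding the kept height up instead would leave a sliver of border, so which way to round is a design choice the sketch does not settle.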
14th March 2010, 08:15 | #148 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
I just tried the new plugins, and I can say that all of them passed a quick test without issues. However, I do have a question regarding the "threads" parameter to some of these filters. Does the threads parameter get ignored on your Avs64 build, JoshyD?
Now if only we had nnedi2.
Last edited by Stephen R. Savage; 14th March 2010 at 08:17. |
14th March 2010, 09:01 | #149 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
Nope, they'll thread themselves independently of the main avisynth dll. It's a bit of a balancing act that the user has to do to keep all processors busy without choking them with too many threads. NNEDI2 would be nice, but I don't think we're going to see that anytime soon.
For now, I really want to figure out why AMD processors can't run the resizers in the current binary . . . it has to be something really simple that I'm missing.
Last edited by JoshyD; 14th March 2010 at 09:07. |
14th March 2010, 11:36 | #152 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
I did a clean download last night and grabbed everything again, to make sure I had all the recent updates.
Did a test last night. It took 2h:37m, and no crash. Let's hope it stays that way. The problem did not appear to be source access either; I think it was something like Trim or Decimate. I'm guessing Trim. I set Mode 3 after TGMC.
Code:
    LoadPlugin("C:\yatta\plugins64\decomb.dll")
    LoadPlugin("C:\yatta\plugins64\dgdecode.dll")
    LoadPlugin("C:\yatta\plugins64\telecidehints.dll")
    LoadPlugin("C:\yatta\plugins64\fieldhint.dll")

    function Preset0(clip c) {
      #Name: Default
      c
      return last
    }

    SetMTMode(2,0)
    DGDecode_Mpeg2Source("L:\Ep 01\VTS_01_1.d2v")
    FieldHint(ovr="L:\Ep 01\VTS_01_1.d2v.fh.txt")
    #MT("TempGaussMC_beta2().SelectEven()",threads=2,overlap=4)
    TempGaussMC_beta2().SelectEven()
    SetMTMode(3)
    PresetClip0=Preset0()
    PresetClip0.Trim(0,41023)
    DClip = Decimate(cycle=5,quality=3,ovr="L:\Ep 01\VTS_01_1.d2v.dec.txt").assumefps(last.framerate)
14th March 2010, 16:29 | #154 | Link |
Registered User
Join Date: Mar 2003
Posts: 116
|
Xeon quad core E5530 2.40 ghz w/ turbo(hyperthreading)
x264(x86) + set avisynth 2.6(x86), First Pass Output to null = 27.18
x264(x64) + JoshyD avisynth 3-13-10(x64), First Pass Output to null = 29.54
I've seen an 8% speed improvement.
Code:
    SetMTmode(3,3)
    mpeg2source("my.d2v")
    SetMTmode(2,3)
    tdeint()
    crop(4,4,1916,1076)
    LanczosResize(1280,720)
14th March 2010, 18:29 | #155 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
@Squid_80
You've got it. It's pshufb that throws the first error on my (really old) Athlon64. Then those pmulhrsw's probably cause an issue as well. Reading some programming message boards, apparently pshufb has been giving AMD developers trouble. I'd guess the subtleties between SSE3 and SSSE3 do as well. Back to the drawing board for AMD people. Thanks for taking a look; being the only eyes looking over your own source can drive a person a bit crazy.
@levi
Why only 3 threads? Are you accounting for x264 also taking up some cores? I think x264 defaults to creating 1.5x as many threads as it detects your system can run. I was hoping for some more speed gains. Any chance of seeing some straight single-threaded tests on the same system? I want to rebuild tdeint to see if I can eke out some more performance as well.
@osgZach
It's great that there aren't any crashes anymore! I don't think trimming frames from a source should kill multithreading, but decimate has an unusual access pattern. I guess you can't win them all.
Last edited by JoshyD; 14th March 2010 at 18:36. |
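To make the SSE3 versus SSSE3 distinction concrete: pshufb belongs to SSSE3, which has its own CPUID feature bit (leaf 1, ECX bit 9), separate from the SSE3 bit (leaf 1, ECX bit 0). A CPU can report the first without the second, so a code path using pshufb has to be gated on the SSSE3 bit itself rather than on SSE3 or on the vendor ID. The sketch below, in Python for readability, decodes those two bits from an ECX value and picks a path. `pick_resizer_path` and the path names are hypothetical and are not taken from the Avisynth source, and the ECX values are built from the bit positions rather than read from real hardware.

```python
# Feature bits in ECX after executing CPUID with EAX = 1.
SSE3_BIT = 0   # SSE3
SSSE3_BIT = 9  # Supplemental SSE3, the extension that adds pshufb


def has_bit(reg, bit):
    """True if the given bit is set in the register value."""
    return bool((reg >> bit) & 1)


def pick_resizer_path(ecx):
    """Choose a resizer code path from a CPUID.1:ECX value.

    The path names are made up for illustration. The point is that the
    decision depends only on the SSSE3 bit, not on SSE3 or the vendor.
    """
    if has_bit(ecx, SSSE3_BIT):
        return "ssse3"
    return "sse2"


if __name__ == "__main__":
    sse3_only = 1 << SSE3_BIT                         # SSE3 but no SSSE3
    sse3_and_ssse3 = (1 << SSE3_BIT) | (1 << SSSE3_BIT)
    print(pick_resizer_path(sse3_only))
    print(pick_resizer_path(sse3_and_ssse3))
```

In a native build the same test would sit in front of the dispatch to the assembly routines, with the ECX value coming from the CPUID instruction.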
14th March 2010, 18:37 | #156 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
Crap keyboard.. lost my post to ill-placed Forward/Backward keys...
Anyway. It's technically 2 different sources being mixed in, so whatever the case, the problem probably lies somewhere in there, with one or both of them. I know Mode 3 was recommended for Trim at the very least.
Any chance of grabbing the latest DGIndex/DGDecode source (1.5.8) and seeing if it will compile for x64? I'd give it a shot, but if it needed changes I'd be clueless about that stuff. It would certainly make managing Yatta a lot easier (right now I have to set up two copies, using older x86 source to make projects, since the x64 DLL we have is only 1.4.6). If anyone who has time could look into it, for that matter, it'd be great.
Last edited by osgZach; 14th March 2010 at 18:40. |
14th March 2010, 19:08 | #157 | Link |
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
|
I just had a look on my HDD and it seems at some point I did make an x64 build of DGDecode 1.5.4: http://www.mediafire.com/?dl4fc2yyyzz
It seems to work but I have no idea if it's faster than the old build or if anything apart from the .d2v identifier was changed. Too bad the author has no regard for backwards compatibility. |
14th March 2010, 19:43 | #158 | Link |
Registered User
Join Date: Jan 2007
Posts: 530
|
FWIW, just did a test with an SD MPEG2 source, encoding with x264, using the 3/13 AVISynth64:
AVS: Code:
    #LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\mvtools2.dll")
    LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\DGDecode.dll")
    LoadPlugin("C:\Program Files (x86)\AviSynth 2.5\plugins\tivtc.dll")
    SetMTmode(2,8)
    Mpeg2Source("lotr.d2v")
    #Insert Deinterlacer
    tfm(last,d2v="lotr.d2v").tdecimate()
    #Applying Resizing
    LanczosResize(720,352,0,62,-0,-66)

64:
    x264-64bit.exe --crf 20 --preset medium --threads auto --tune film --sar 32:27 --output "C:\Temp\lotr64.mkv" lotr64.avs
32:
    x264.exe --crf 20 --preset medium --threads auto --tune film --sar 32:27 --output "C:\Temp\lotr.mkv" lotr.avs

64bit chain: encoded 16356 frames, 79.40 fps, 1345.89 kb/s
32bit chain: encoded 16356 frames, 70.57 fps, 1345.89 kb/s

Using Athlon II (620) O/C'd to 3.5GHz
Win7 Ult x64 |
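For anyone comparing the numbers posted in this thread, the relative gain is just the ratio of the two frame rates. A minimal sketch in Python; the function name is made up for illustration, and it assumes the two figures levi posted (27.18 and 29.54) are also frames per second:

```python
def speedup_percent(baseline_fps, new_fps):
    """Percentage gain of new_fps over baseline_fps."""
    return (new_fps / baseline_fps - 1.0) * 100.0


if __name__ == "__main__":
    # 32-bit chain vs 64-bit chain from the Athlon II test above.
    print(f"{speedup_percent(70.57, 79.40):.1f}%")
    # x86 vs x64 first-pass figures from the Xeon E5530 test.
    print(f"{speedup_percent(27.18, 29.54):.1f}%")
```

This works out to roughly 12.5% for the Athlon II run and a little under 9% for the Xeon run.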
14th March 2010, 20:14 | #159 | Link |
Registered User
Join Date: Feb 2009
Location: USA
Posts: 676
|
Thanks Squid, I suppose it's better than nothing.
I don't have any issues with using old versions, as long as nothing major has changed since then. But when I open the D2V files from two different versions and they look different, it kind of makes me nervous that I'm not getting the best indexing of my source. |
14th March 2010, 20:47 | #160 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
@noee
I'm guessing this means that you've gotten tivtc working? If so, that's great news. Those results seem about in line with what I expected: somewhere between 10-20% faster when using x64 code.
@Squid_80
You've got a DGDecode listed on your webpage along with source, but checking the version info indicates it's 1.4.6. Any chance you have the 1.5.6 source on hand and I could take a peek at it? Also, may I add that to the first post?
@kemuri-_9
Can I link your FFMS2.dll on the first post? I can move it over to mediafire if you don't want to waste bandwidth on hosting it locally.
@turbojet
Autocrop is built and working for me. Link is on the first post.
Last edited by JoshyD; 14th March 2010 at 21:20. |