
32 bit to 64 bit Xvid performance impact (benchmarks)


fumigator
9th February 2005, 19:37
I have posted a test in the VirtualDub forum: I tested a 64-bit Xvid compile with VirtualDub 1.6.3 64-bit.

The post goes as follows:

Hi everyone! I wanted to share my first test of the speed difference between 32-bit and 64-bit VirtualDub, running 32-bit and 64-bit Xvid respectively. Here are the details:


TEST:

Recompression of an Xvid file, length approx. 3:10.


Hardware:

Asus K8V-X (Socket 754) + Athlon 64 3000+ + Kingston 512MB DDR400, CAS latency 3

(a similar performance impact should be seen with an EM64T-enabled Intel processor)


Software:

VirtualDub 1.6.3 64-bit + Xvid codec 64-bit (under Windows XP x64, latest RC)

VS

VirtualDub 1.5.1 32-bit + Xvid 32-bit (under Windows 2000 32-bit)


Results:

Second pass full recompress:



http://img202.exs.cx/img202/4884/pasada232bitxvid4ld.gif http://img202.exs.cx/img202/5711/pasada264bitxvid6ru.gif

Decoder configuration 32, 64:

http://img202.exs.cx/img202/7464/decoderconfiguration32bit1vq.gif http://img202.exs.cx/img202/5411/decoderconfiguration64bit4ug.gif

Motion configuration 32, 64:

http://img202.exs.cx/img202/4950/encoderconfigurationmotion32bi.gif http://img202.exs.cx/img202/9809/encoderconfigurationmotion64bi.gif

Profile 32, 64:

http://img103.exs.cx/img103/4347/encoderconfigurationprofile32b.gif http://img103.exs.cx/img103/2294/encoderconfigurationprofile64b.gif

Doubts:

- I don't know whether closed GOV has a significant effect on performance. I didn't have enough time, and I only noticed it when looking at the screenshot.
- As far as I know, different versions of VirtualDub don't differ in performance; at least, version 1.5.1 is no faster or slower than 1.6.
- It seems the 64-bit Xvid I've got was compiled with Intel's 64-bit compiler, which makes programs run slower on AMD64 processors (quite logical). I don't know which compiler was used for 64-bit VirtualDub, but as far as I know Microsoft has a preview 64-bit version of Visual Studio .NET 2005, so people who want to make some noise may want to download that for free.
- Deringing was applied in the 64-bit test; I don't know whether this affects performance.


Conclusions

The 64-bit test is 28 seconds faster than the 32-bit test. That's about 20% faster. Imagine a 10-hour Xvid job: under 64-bit it would take 8 hours. That's quite interesting, given that everything, including Windows, is still beta and hardly optimized for 64-bit, just compiled to run. I hope to see better versions of 64-bit VDub and Xvid. Does anyone know if DivX is going to release a 64-bit version? A 64-bit Avisynth is going to be released, it seems... Hope you like this test.
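The arithmetic behind "28 seconds is about 20%" and "10 hours becomes 8" can be sketched as below. The two timings are hypothetical, picked only so that a 28-second gap equals 20%; the real figures are in the screenshots. The sketch also shows that 20% less time is the same thing as 25% more throughput, which is worth keeping apart when quoting "X% faster".

```python
def time_saved_fraction(t_old, t_new):
    """Fraction of the original running time that was saved."""
    return (t_old - t_new) / t_old


def throughput_gain(t_old, t_new):
    """How much more work per unit of time the faster run gets done."""
    return t_old / t_new - 1.0


# Hypothetical timings in seconds (NOT taken from the screenshots):
# chosen only so that a 28 s difference corresponds to 20%.
t32 = 140.0
t64 = 112.0

saved = time_saved_fraction(t32, t64)
gain = throughput_gain(t32, t64)
ten_hour_job = 10.0 * (1.0 - saved)

print(f"time saved: {saved:.0%}")
print(f"throughput gain: {gain:.0%}")
print(f"a 10 h job would take about {ten_hour_job:.1f} h")
```

The 10-hour projection only holds if the saving scales linearly with the length of the encode, which a single short clip cannot confirm.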

Sharktooth
9th February 2005, 20:11
Just a note: EM64T won't have the same performance boost. Intel just added some stuff, but it has no dedicated units for 64-bit execution. The performance will be poor.

EDIT: Closed GOV does not change the encoding speed.

fumigator
9th February 2005, 20:41
Uhmm, I was using an old Xvid for the 32-bit test (XviD-1.0.3-20122004 _Final Release_), while I actually don't know which 64-bit version I've got (it only came as a zip with 2 DLL files and an inf...). I tested VDub 32-bit v1.6.3 with the latest Koepi's Xvid, quite an improvement over the old Xvid... :o

http://img52.exs.cx/img52/5760/newxvid111dj.gif

Well... the 32-bit version is still slower. But at least we can see the difference between the new and old Xvid... :(
Anyway, I still have doubts about the 64-bit version I've got. If someone knows of a compiled 64-bit version of Xvid (that explicitly says 1.1b), please tell me (a direct link would be good, thanks) :rolleyes:

Sharktooth
9th February 2005, 20:49
Maybe you should do a CVS checkout and compile it yourself.
BTW, I don't know if the x86-64 asm code has been optimized for speed yet.

fumigator
9th February 2005, 20:53
Most likely not... many 32-bit programs out there run quite a bit slower than their 64-bit versions... trying to test 64-bit capabilities with VirtualDub 64 and Xvid 64 was quite a mistake...

Sharktooth
9th February 2005, 20:55
If 64-bit apps are not optimized, don't use the additional registers, or are simply recompiled with a different flag... there's almost no speedup.
Porting to 64 bits is not so "easy".
Part of the code has to be changed or even heavily reworked.

fumigator
9th February 2005, 21:16
Look at this: part of the speedup was thanks to the new version of VDub. Here's version 1.5.1 with the new Xvid 1.1b:

http://img28.exs.cx/img28/5929/oldvirtualdub3md.gif

Compared to the one in the post above, that's 2:38 against 2:28... combining the new Xvid with the new VirtualDub gives quite a good performance improvement.
Maybe we'll have to wait a good while until they optimize x64 for speed... :(

ChronoCross
9th February 2005, 21:22
It would also help to know if Avisynth/filters were used. On a machine like that it seems a bit slow: on my P3 1.1GHz machine I can get almost 13fps with a heavy script, and almost 26fps simply by loading, cropping and resizing... Compared to your Athlon64 3000, I make your encoding look weak... so is it filters that are slowing you down, or what?

squid_80
9th February 2005, 21:53
I'm guessing that's my 64-bit build, unless someone else has done one and I've missed seeing it. There's a thread in this very forum about it, but I think it's been pushed back somewhere around page 3 by now.
In vdub64's select compression box, select xvid and click on About and it should say it's xvid-1.1.0-beta1, windows x64 build, the listed date should be somewhere around 26th of jan I think.
It wasn't compiled with the Intel compiler, I used the windows ddk and visual studio 2003. There certainly shouldn't be any performance loss due to using an AMD vs using an Intel.
There's still a LOT of optimization to be done, I've only done a few little tweaks here and there. I want to be sure there's no bugs first.

fumigator
9th February 2005, 23:37
Yeah, it seems there are lots of optimizations to be done.

ChronoCross: Look at the duration of the encode rather than the frame rate, because in the piece of movie I compressed the frame rate goes up and down like hell (I did it on purpose; it was a very shaky scene with slowdowns). Actually the max frame rate I achieved was about 75 fps, and at the end it goes down to 28 fps. Also, the AVI file is already encoded in Xvid, which means that the app both decodes and encodes, and its resolution is 720x352 (no resize applied during encoding).
Try to re-encode a 720x352 file yourself in Xvid and your Pentium will die trying (I also had a Pentium 3; this Athlon 64 is a monster :devil: ... I felt robbed by Intel all the time and will never go back)
ALSO: I didn't use Avisynth (which I know speeds things up), so moreover, try full processing mode in VirtualDub, then show me your bench; it should be about 15 fps max, down to 8 or 7 fps min. :sly:

fumigator
9th February 2005, 23:51
Squid_80: If your Xvid wasn't compiled with the Intel compiler then VDub was, because I remember one of those was compiled with the Intel compiler. However, I heard the Microsoft compiler doesn't have support for MMX, SSE1 & 2, 3DNow etc. Is that true?? :confused:

I haven't booted into Windows x64 today, but I think it is your build; if it was posted in a forum, most likely it's your build. If you're planning to make more optimizations I would be glad to be a tester (I suppose you already have an x64 processor). Just so I know, maybe there's a way to see from the resulting AVI file whether there's a bug in your build. Here's my question: should an AVI file encoded with Xvid 1.1b and the same one encoded with 64-bit Xvid 1.1b be binary identical??? If so, I can use a file comparison to check the results. I remember comparing the AVI files produced by DivX 5.05 on different platforms:

1)Athlon XP 1700
2)Athlon 950Mhz
3)Pentium 2 400Mhz

The AVIs made on systems 1 & 2 were binary identical; the one made on 3 was almost identical but with slight differences throughout the AVI file. That's a mystery to me. I know almost nothing about low-level programming (only Pascal and C for me), but the only scientific conclusion I could think of is that Athlons have 3DNow instructions... What can you tell me about that one...
:confused: :scared:
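A minimal sketch of the file comparison described above: it reads two files in chunks and reports whether they are identical, where the first difference is, and how many bytes match. The function name and the two tiny demo files are made up for illustration; they only stand in for the two AVIs.

```python
import os
import tempfile


def compare_files(path_a, path_b, chunk_size=1 << 16):
    """Return (identical, first_diff_offset, matching_bytes, compared_bytes)."""
    first_diff = None
    matching = 0
    compared = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            n = min(len(a), len(b))
            for i in range(n):
                if a[i] == b[i]:
                    matching += 1
                elif first_diff is None:
                    first_diff = compared + i
            # A length mismatch counts as a difference at the shorter file's end.
            if len(a) != len(b) and first_diff is None:
                first_diff = compared + n
            compared += max(len(a), len(b))
    return first_diff is None, first_diff, matching, compared


# Demo on two small hypothetical files that differ in a single byte.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.avi")
    b_path = os.path.join(tmp, "b.avi")
    with open(a_path, "wb") as f:
        f.write(b"RIFF0000AVI ")
    with open(b_path, "wb") as f:
        f.write(b"RIFF0001AVI ")
    result = compare_files(a_path, b_path)

print(result)
```

A byte-for-byte match proves the two encoders behaved identically on that input. A mismatch alone does not prove a bug, since different SIMD code paths may round differently; the decoded picture would still need a look.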

squid_80
10th February 2005, 00:36
I doubt that virtualdub for AMD64 was compiled with an intel compiler. In fact I remember seeing on virtualdub's news page that the first AMD64 build was done with the windows DDK and I've got a feeling it's still done that way.

The Microsoft compiler doesn't support inline assembly for x64, so assembly programming has to be done with intrinsics. Only SSE/SSE2 intrinsics that use the XMM registers (not MMX) are supported. So you're half right - the Microsoft compiler doesn't support MMX or 3DNow. But it will generate code that uses SSE/SSE2 instructions.
There's an easy way around this though - use something else to compile the assembly and then link it in. I used yasm to compile all the assembly code for xvid.

There are huge differences (at binary level) between files created with my xvid and a 32-bit version. I'm still investigating if these differences are worth worrying about.

EDIT: Was it the planetamd64 forum that you got it from?

fumigator
10th February 2005, 02:20
LOL, my brother just reminded me that the AVI generated on the Pentium 2 400MHz was 50% binary identical and 50% absolutely messy. That may mean that it's just that way (and is not worth worrying about). I would care about the look of the result and the decoding speed: take a frame from the AVIs generated by the 2 different versions and compare them zoomed in, or something.

About where I got your Xvid: I don't remember :( sorry... I was in a rush that day. I just remember I typed Xvid x64 (or something like that) into Yahoo and opened 4 or 5 pages of results... I think I found it on the first or second page. Maybe it was from Planet 64, most likely, since I haven't visited the Doom9 forum in a long time.

I encourage you to continue developing Xvid x64... I mean, if you can make some speed mods, that would be good, even if it's an alpha release. My thoughts are based on results: it does its job faster, I like it.

You didn't tell me what your system specs are; just curious what you run inside the grey box.

sysKin
10th February 2005, 03:51
The 64-bit version will have no assembler optimizations at all. Our current 64-bit assembler code turned out to be linux-only, because for some weird reason windows uses a different calling convention and doesn't support MMX (only SSE).

fumigator
10th February 2005, 04:06
Does that mean that the Linux version will probably be faster, since it's possible to make asm optimizations there? And if not, why not do the optimizations? :D Are you lazy or what? :p

ChronoReverse
10th February 2005, 08:28
Originally posted by sysKin
The 64-bit version will have no assembler optimizations at all. Our current 64-bit assembler code turned out to be linux-only, because for some weird reason windows uses a different calling convention and doesn't support MMX (only SSE).

From what I've heard, MS wants to get rid of MMX and x87 (replace with SSE2 vector and scalar) and they are deprecated.

I've never heard of MMX making a huge difference anyways, so I don't suppose it's a bad thing to go to SSE, but if they've completely gotten rid of MMX already that'd be a really odd thing to do.

squid_80
10th February 2005, 09:33
Originally posted by ChronoReverse
From what I've heard, MS wants to get rid of MMX and x87 (replace with SSE2 vector and scalar) and they are deprecated.

I've never heard of MMX making a huge difference anyways, so I don't suppose it's a bad thing to go to SSE, but if they've completely gotten rid of MMX already that'd be a really odd thing to do.

Not sure if sysKin is talking about an official windows build that's planned for the future or what but my build most certainly does have assembly optimizations, in fact more than what you'd get if you made a linux build based on current cvs.

Yes, microsoft have said not to use MMX because it's deprecated and to use SSE instead. What they don't seem to realise is that until we get CPUs that implement SSE ops in true 128-bit instead of two 64-bit halves, MMX is still faster in a lot of cases. At any rate, MMX is still working in windows x64 and I've warned that there are MMX instructions in this build. If ms decides to stop saving the MMX registers by the operating system before I get around to converting it all to SSE, it will stop working.

As for MS using different calling conventions, I have cursed them out loud many, many times for doing so. What the hell's the point of having twice as many registers if all the extra ones are non-volatile? If the linux version is faster, it's these ABI differences that will be responsible and you can blame ms for their short-sightedness (once again).
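To make the ABI complaint concrete, here is a rough sketch that tabulates which of the registers added by x86-64 a function may clobber freely under each convention. The register sets are written down from the commonly documented Microsoft x64 and System V AMD64 calling conventions; treat them as a summary to check against the official documents, not as a quote from either.

```python
# Registers that x86-64 adds on top of the eight integer and eight XMM
# registers already available in 32-bit mode.
NEW_GPRS = {f"r{i}" for i in range(8, 16)}
NEW_XMM = {f"xmm{i}" for i in range(8, 16)}

# Registers the callee must preserve (non-volatile), per the commonly
# documented conventions (assumed here, not taken from this thread).
WIN64_CALLEE_SAVED = (
    {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"}
    | {f"xmm{i}" for i in range(6, 16)}
)
SYSV_CALLEE_SAVED = {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"}


def free_scratch(new_regs, callee_saved):
    """New registers a function can use without saving and restoring them."""
    return new_regs - callee_saved


win_gpr = free_scratch(NEW_GPRS, WIN64_CALLEE_SAVED)
win_xmm = free_scratch(NEW_XMM, WIN64_CALLEE_SAVED)
sysv_gpr = free_scratch(NEW_GPRS, SYSV_CALLEE_SAVED)
sysv_xmm = free_scratch(NEW_XMM, SYSV_CALLEE_SAVED)

print("Win64  new scratch GPRs:", len(win_gpr), " new scratch XMM:", len(win_xmm))
print("SysV   new scratch GPRs:", len(sysv_gpr), " new scratch XMM:", len(sysv_xmm))
```

Under these assumptions the difference is entirely in the XMM registers: hand-written SIMD code on Windows x64 has to save and restore any of xmm6 to xmm15 it touches, while on Linux it does not.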

fumigator: I haven't done much work on xvid lately since I want to do longer tests, ideally with MPEG-2 source material. So I'm working on something that will let vdub64 read MPEG-2 streams (.vob, .ts, .m2v etc). Once I get that sorted I'll go back to optimizing. And for the record I've got a Athlon64 3000+ (754 pin) in a gigabyte K8NS.

N.B. If anyone wants to argue about the speed of MMX vs. SSE, please don't bother - unless you personally want to volunteer to produce faster, SSE2 versions of xvid's MMX code. Anyone who says "Use SSE, it operates on twice the amount of data so it should be twice as fast" obviously hasn't tried doing it.

LiFe
10th February 2005, 12:15
Some elements of MS might want to deprecate MMX, but it sounds like they have a case of the HP's:
Left Hand: "Helloooooo over there!"
Right Hand: "Cooooeeee, can anyone hear me?"

Windows XP still runs Windows 95 programs (maybe even Win 3.11 proggies?). There are millions of programs with MMX optimisation, and MS will continue supporting them till they fall over and die from code bloat due to trying to support every previous iteration of everything ever in existence.

Dropping support of mmx won't happen for years. They may just be encouraging it a little.

fumigator
10th February 2005, 21:50
Squid 80: Cool idea, I hadn't thought about something to let VirtualDub 64 read MPEG-2 streams. Since Avisynth and its plugins have to be rebuilt to work in 64-bit, I think if you achieve that you'll be one step ahead. Hey, don't you want a web site to publish these releases? I can make that if you like. I mean, I can't optimize or help in programming that stuff, but at least I can build a nice website.

squid_80
10th February 2005, 23:07
Originally posted by fumigator
Squid 80: Cool idea, I hadn't thought about something to let VirtualDub 64 read MPEG-2 streams. Since Avisynth and its plugins have to be rebuilt to work in 64-bit, I think if you achieve that you'll be one step ahead. Hey, don't you want a web site to publish these releases? I can make that if you like. I mean, I can't optimize or help in programming that stuff, but at least I can build a nice website.

It would be good to have a single page with these things on it (xvid64, huffyuv64, and the in progress mpg2 reader), but I haven't done it since I'm almost certain there's still nasty bugs waiting to be found. Plus I want to finish replacing all the MMX with SSE so I'm not breaking microsoft's rules, even though I think LiFe is right and MMX will work fine for a long time (winnt.h in the winddk headers seems to agree with me).
Probably after win x64 is finally released (or when my ISP tells me the webspace for my dial-up account is using too much bandwidth) and more people start wanting these things I'll set up a permanent home. Or just pass them back to the original developers and leave it up to them.

Sulik
10th February 2005, 23:14
The "No MMX" 64-bit rule really only applies to device drivers running in the system process context. The same limitation also exists in 32-bit as well, though there are hacks to get around it.

fumigator
10th February 2005, 23:35
Squid 80: Most likely the original developers have webspace for your port, since every time I make a website I have to host it on the free version of GeoCities. If you still need someone to design a simple page just send me a message. :cool:

squid_80
16th February 2005, 09:48
For anyone interested in further benchmarks, I thought I would post the results from the 32-bit and 64-bit versions of xvid_bench. Note that the 64-bit version has a lot of redundant functions removed, since any 64-bit processor will have SSE/SSE2. Also feel free to correct my comments if you think I've got something wrong.

===== test fdct/idct ===== 32-bit
PLAINC - 1.300 usec PSNR=13.291 MSE=3.000 (fdct=plainc/idct=plainc)
MMX - 0.350 usec PSNR=9.611 MSE=7.000 (fdct=mmx/idct=mmx)
MMXEXT - 0.344 usec PSNR=9.611 MSE=7.000 (fdct=mmxext/idct=mmxext)
SSE2 - 0.350 usec PSNR=9.611 MSE=7.000 (fdct=SSE2/idct=mmx)

===== test fdct/idct ===== 64-bit
PLAINC - 0.872 usec PSNR=13.291 MSE=3.000 (fdct=plainc/idct=plainc)
MMXEXT - 0.522 usec PSNR=13.291 MSE=3.000 (fdct=plainc/idct=mmxext)
SSE2 - 0.528 usec PSNR=9.611 MSE=7.000 (fdct=sse2/idct=plainc)

This test is a bit misleading since it tests both fdct and idct together. So I added the comments at the end of the lines to help clear things up a bit. In ordinary circumstances the 64-bit codec uses fdct=sse2/idct=3dnowext (not shown here because xvid_bench doesn't test it properly) or fdct=sse2/idct=mmxext if the CPU is Intel based. So not to worry, it's not using plainc in general use.
I think the dct functions could make good use of the extra xmm registers but unfortunately I can't wrap my head around skal's algorithm. Any volunteers? You'd be doing lots of people a favour because this fdct seems to be used all over the place...

=== test block motion === 32-bit
PLAINC - interp- h-round0 0.092 usec crc32=0x115381ba
PLAINC - round1 0.104 usec crc32=0x2b1f528f
PLAINC - interp- v-round0 0.078 usec crc32=0x423cdcc7
PLAINC - round1 0.091 usec crc32=0x42202efe
PLAINC - interp-hv-round0 0.156 usec crc32=0xd198d387
PLAINC - round1 0.157 usec crc32=0x9ecfd921
---
MMX - interp- h-round0 0.026 usec crc32=0x115381ba
MMX - round1 0.039 usec crc32=0x2b1f528f
MMX - interp- v-round0 0.039 usec crc32=0x423cdcc7
MMX - round1 0.039 usec crc32=0x42202efe
MMX - interp-hv-round0 0.052 usec crc32=0xd198d387
MMX - round1 0.065 usec crc32=0x9ecfd921
---
MMXEXT - interp- h-round0 0.013 usec crc32=0x115381ba
MMXEXT - round1 0.026 usec crc32=0x2b1f528f
MMXEXT - interp- v-round0 0.013 usec crc32=0x423cdcc7
MMXEXT - round1 0.013 usec crc32=0x42202efe
MMXEXT - interp-hv-round0 0.026 usec crc32=0xd198d387
MMXEXT - round1 0.039 usec crc32=0x9ecfd921
---
3DNOW - interp- h-round0 0.026 usec crc32=0x115381ba
3DNOW - round1 0.013 usec crc32=0x2b1f528f
3DNOW - interp- v-round0 0.013 usec crc32=0x423cdcc7
3DNOW - round1 0.013 usec crc32=0x42202efe
3DNOW - interp-hv-round0 0.039 usec crc32=0xd198d387
3DNOW - round1 0.026 usec crc32=0x9ecfd921
---
3DNOWE - interp- h-round0 0.027 usec crc32=0x115381ba
3DNOWE - round1 0.013 usec crc32=0x2b1f528f
3DNOWE - interp- v-round0 0.013 usec crc32=0x423cdcc7
3DNOWE - round1 0.013 usec crc32=0x42202efe
3DNOWE - interp-hv-round0 0.038 usec crc32=0xd198d387
3DNOWE - round1 0.027 usec crc32=0x9ecfd921
---

=== test block motion === 64-bit
PLAINC - interp- h-round0 0.078 usec crc32=0x115381ba
PLAINC - round1 0.092 usec crc32=0x2b1f528f
PLAINC - interp- v-round0 0.091 usec crc32=0x423cdcc7
PLAINC - round1 0.078 usec crc32=0x42202efe
PLAINC - interp-hv-round0 0.143 usec crc32=0xd198d387
PLAINC - round1 0.130 usec crc32=0x9ecfd921
---
SSE2 - interp- h-round0 0.013 usec crc32=0x115381ba
SSE2 - round1 0.026 usec crc32=0x2b1f528f
SSE2 - interp- v-round0 0.013 usec crc32=0x423cdcc7
SSE2 - round1 0.026 usec crc32=0x42202efe
SSE2 - interp-hv-round0 0.039 usec crc32=0xd198d387
SSE2 - round1 0.052 usec crc32=0x9ecfd921
---

I rewrote these functions in SSE2 rather hurriedly. They need to be redone properly, especially hv by the looks of things.

====== test SAD ====== 32-bit
PLAINC - sad8 0.117 usec sad=3776
PLAINC - sad16 0.443 usec sad=27214
PLAINC - sad16bi 1.198 usec sad=26274
PLAINC - dev16 0.586 usec sad=3344
---
MMX - sad8 0.026 usec sad=3776
MMX - sad16 0.091 usec sad=27214
MMX - sad16bi 0.222 usec sad=26274
MMX - dev16 0.169 usec sad=3344
---
MMXEXT - sad8 0.013 usec sad=3776
MMXEXT - sad16 0.039 usec sad=27214
MMXEXT - sad16bi 0.052 usec sad=26274
MMXEXT - dev16 0.065 usec sad=3344
---
SSE2 - sad16 0.052 usec sad=27214
SSE2 - dev16 0.078 usec sad=3344
---
3DNOW - sad16bi 0.104 usec sad=26274
---
3DNOWE - sad8 0.130 usec sad=3776
3DNOWE - sad16 0.443 usec sad=27214
3DNOWE - sad16bi 0.092 usec sad=26274
3DNOWE - dev16 0.598 usec sad=3344
---

====== test SAD ====== 64-bit
PLAINC - sad8 0.117 usec sad=3776
PLAINC - sad16 0.430 usec sad=27214
PLAINC - sad16bi 1.120 usec sad=26274
PLAINC - dev16 1.028 usec sad=3344
---
SSE2 - sad8 0.013 usec sad=3776
SSE2 - sad16 0.039 usec sad=27214
SSE2 - sad16bi 0.078 usec sad=26274
SSE2 - dev16 0.065 usec sad=3344
---

The 3DNOWE results aren't accurate because xvid_bench doesn't enable MMXEXT when testing 3DNOWEXT functions - it's actually using the plainc functions (except sad16bi, which is plain 3dnow). Anyway, some of these were redone in SSE2 including a few not tested by xvid_bench, same story as the interpolate functions, I need to redo them properly.
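For readers unfamiliar with what these functions compute, here is a plain reference sketch of a SAD (sum of absolute differences between two blocks) and of a dev-style measure (sum of absolute deviations from the block mean). It is written from the usual definitions, not copied from xvid's source; the integer mean in `dev` and the tiny four-pixel "blocks" are assumptions made to keep the example short.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(cur) != len(ref):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(c - r) for c, r in zip(cur, ref))


def dev(block):
    """Sum of absolute deviations from the (integer) mean of one block."""
    mean = sum(block) // len(block)
    return sum(abs(p - mean) for p in block)


# Toy blocks (a real sad8 works on 8x8 = 64 pixels, sad16 on 16x16 = 256).
cur = [10, 12, 14, 16]
ref = [11, 12, 10, 20]

sad_value = sad(cur, ref)
dev_value = dev(cur)
print(sad_value, dev_value)
```

The SIMD versions in the tables compute the same kind of sums, just many pixels per instruction, which is why the sad= column is identical across all variants.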

=== test transfer === 32-bit
PLAINC - 8to16 0.052 usec crc32=0x115814bb
PLAINC - 16to8 0.247 usec crc32=0xee7ccbb4
PLAINC - 8to8 0.013 usec crc32=0xd37b3295
PLAINC - 16to8add 0.215 usec crc32=0xdd817bf4
PLAINC - 8to16sub 0.117 usec crc32(1)=0xa1e07163 crc32(2)=0xd86c5d23
PLAINC - 8to16sub2 0.325 usec crc32=0x99b6c4c7
---
MMX - 8to16 0.013 usec crc32=0x115814bb
MMX - 16to8 0.006 usec crc32=0xee7ccbb4
MMX - 8to8 -0.000 usec crc32=0xd37b3295
MMX - 16to8add 0.013 usec crc32=0xdd817bf4
MMX - 8to16sub 0.033 usec crc32(1)=0xa1e07163 crc32(2)=0xd86c5d23
MMX - 8to16sub2 0.058 usec crc32=0x99b6c4c7
---
MMXEXT - 8to16sub2 0.033 usec crc32=0x99b6c4c7
---
3DNOWE - 8to16 0.020 usec crc32=0x115814bb
3DNOWE - 16to8 0.007 usec crc32=0xee7ccbb4
3DNOWE - 8to8 0.013 usec crc32=0xd37b3295
3DNOWE - 16to8add 0.020 usec crc32=0xdd817bf4
3DNOWE - 8to16sub 0.013 usec crc32(1)=0xa1e07163 crc32(2)=0xd86c5d23
3DNOWE - 8to16sub2 0.332 usec crc32=0x99b6c4c7
---

=== test transfer === 64-bit
PLAINC - 8to16 0.124 usec crc32=0x115814bb
PLAINC - 16to8 0.397 usec crc32=0xee7ccbb4
PLAINC - 8to8 0.013 usec crc32=0xd37b3295
PLAINC - 16to8add 0.163 usec crc32=0xdd817bf4
PLAINC - 8to16sub 0.287 usec crc32(1)=0xa1e07163 crc32(2)=0xd86c5d23
PLAINC - 8to16sub2 0.169 usec crc32=0x99b6c4c7
---
SSE2 - 8to16 0.019 usec crc32=0x115814bb
SSE2 - 16to8 0.143 usec crc32=0xee7ccbb4
SSE2 - 8to8 0.007 usec crc32=0xd37b3295
SSE2 - 16to8add -0.104 usec crc32=0xdd817bf4
SSE2 - 8to16sub 0.163 usec crc32(1)=0xa1e07163 crc32(2)=0xd86c5d23
SSE2 - 8to16sub2 -0.091 usec crc32=0x99b6c4c7
---

More new SSE2 functions, not really sure of the validity of these results. Obviously the negative values aren't right but the others don't seem to make sense either.
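One way negative timings like these can arise is sketched below. It assumes the benchmark times a loop of calls and then subtracts a separately measured loop overhead; that is an assumption about xvid_bench, not something checked against its source. All numbers are hypothetical.

```python
def per_call_usec(total_usec, overhead_usec, calls):
    """Per-call time after subtracting a separately measured loop overhead."""
    return (total_usec - overhead_usec) / calls


# Hypothetical measurements: the function is so quick that timer noise makes
# the timed loop look shorter than the empty loop measured earlier.
fast_function = per_call_usec(total_usec=900.0, overhead_usec=1000.0, calls=1000)

# For a slower function the same noise is negligible.
slow_function = per_call_usec(total_usec=2000.0, overhead_usec=1000.0, calls=1000)

print(fast_function, slow_function)
```

If that is what is happening, values this close to the timer resolution say little either way, and running more iterations per measurement would be the first thing to try.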

===== test quant ===== 32-bit
PLAINC - quant_mpeg_intra 50.553 usec crc32=0xfd6a21a4
PLAINC - quant_mpeg_inter 48.932 usec crc32=0xf6de7757
PLAINC - dequant_mpeg_intra 13.021 usec crc32=0x2def7bc7
PLAINC - dequant_mpeg_inter 17.292 usec crc32=0xd878c722
PLAINC - quant_h263_intra 13.223 usec crc32=0x2eba9d43
PLAINC - quant_h263_inter 16.484 usec crc32=0xbd315a7e
PLAINC - dequant_h263_intra 12.507 usec crc32=0x9841212a
PLAINC - dequant_h263_inter 13.229 usec crc32=0xe7df8fba
---
MMX - quant_mpeg_intra 3.457 usec crc32=0xdacabdb6 | ERROR
MMX - quant_mpeg_inter 2.747 usec crc32=0x72883ab6 | ERROR
MMX - dequant_mpeg_intra 3.053 usec crc32=0x2def7bc7
MMX - dequant_mpeg_inter 3.665 usec crc32=0xd878c722
MMX - quant_h263_intra 1.829 usec crc32=0x2eba9d43
MMX - quant_h263_inter 1.725 usec crc32=0xbd315a7e
MMX - dequant_h263_intra 2.344 usec crc32=0x9841212a
MMX - dequant_h263_inter 2.135 usec crc32=0xe7df8fba
---
MMXEXT - quant_mpeg_intra 3.464 usec crc32=0xfd6a21a4
MMXEXT - quant_mpeg_inter 3.457 usec crc32=0xf6de7757
MMXEXT - dequant_h263_intra 2.240 usec crc32=0x9841212a
MMXEXT - dequant_h263_inter 1.829 usec crc32=0xe7df8fba
---
SSE2 - quant_h263_intra 1.523 usec crc32=0x2eba9d43
SSE2 - quant_h263_inter 1.732 usec crc32=0xbd315a7e
SSE2 - dequant_h263_intra 2.031 usec crc32=0x9841212a
SSE2 - dequant_h263_inter 1.829 usec crc32=0xe7df8fba
---
3DNOWE - dequant_mpeg_intra 13.021 usec crc32=0x2def7bc7
3DNOWE - dequant_mpeg_inter 17.298 usec crc32=0xd878c722
3DNOWE - quant_h263_intra 13.223 usec crc32=0x2eba9d43
3DNOWE - quant_h263_inter 16.478 usec crc32=0xbd315a7e
3DNOWE - dequant_h263_intra 12.409 usec crc32=0x9841212a
3DNOWE - dequant_h263_inter 13.327 usec crc32=0xe7df8fba
---

===== test quant ===== 64-bit
PLAINC - quant_mpeg_intra 48.014 usec crc32=0xfd6a21a4
PLAINC - quant_mpeg_inter 47.402 usec crc32=0xf6de7757
PLAINC - dequant_mpeg_intra 12.005 usec crc32=0x2def7bc7
PLAINC - dequant_mpeg_inter 18.307 usec crc32=0xd878c722
PLAINC - quant_h263_intra 13.633 usec crc32=0x2eba9d43
PLAINC - quant_h263_inter 14.447 usec crc32=0xbd315a7e
PLAINC - dequant_h263_intra 11.595 usec crc32=0x9841212a
PLAINC - dequant_h263_inter 12.917 usec crc32=0xe7df8fba
---
MMX - dequant_mpeg_intra 2.852 usec crc32=0x2def7bc7
MMX - dequant_mpeg_inter 3.659 usec crc32=0xd878c722
---
MMXEXT - quant_mpeg_intra 3.359 usec crc32=0xfd6a21a4
MMXEXT - quant_mpeg_inter 3.561 usec crc32=0xf6de7757
---
SSE2 - quant_h263_intra 1.934 usec crc32=0x2eba9d43
SSE2 - quant_h263_inter 1.628 usec crc32=0xbd315a7e
SSE2 - dequant_h263_intra 1.934 usec crc32=0x9841212a
SSE2 - dequant_h263_inter 1.934 usec crc32=0xe7df8fba
---

Not much difference here except that I got rid of the unneeded versions.

===== test cbp ===== 32-bit
PLAINC - calc_cbp#1 0.039 usec cbp=0x15
PLAINC - calc_cbp#2 0.039 usec cbp=0x38
PLAINC - calc_cbp#3 0.037 usec cbp=0x0f
PLAINC - calc_cbp#4 0.070 usec cbp=0x05
---
MMX - calc_cbp#1 0.052 usec cbp=0x15
MMX - calc_cbp#2 0.052 usec cbp=0x38
MMX - calc_cbp#3 0.055 usec cbp=0x0f
MMX - calc_cbp#4 0.050 usec cbp=0x05
---
SSE2 - calc_cbp#1 0.052 usec cbp=0x15
SSE2 - calc_cbp#2 0.050 usec cbp=0x38
SSE2 - calc_cbp#3 0.052 usec cbp=0x0f
SSE2 - calc_cbp#4 0.052 usec cbp=0x05
---
3DNOWE - calc_cbp#1 0.047 usec cbp=0x15
3DNOWE - calc_cbp#2 0.039 usec cbp=0x38
3DNOWE - calc_cbp#3 0.039 usec cbp=0x0f
3DNOWE - calc_cbp#4 0.070 usec cbp=0x05
---

===== test cbp ===== 64-bit
PLAINC - calc_cbp#1 0.023 usec cbp=0x15
PLAINC - calc_cbp#2 0.024 usec cbp=0x38
PLAINC - calc_cbp#3 0.023 usec cbp=0x0f
PLAINC - calc_cbp#4 0.052 usec cbp=0x05
---

This one's interesting - the plainc version has early termination whereas the simd versions process the whole block regardless. The C version is written to use int64s, that's why it's significantly faster in 64-bit mode. So I dropped the assembly variants completely. A faster version could probably be done in assembly using repnz cmpsq.... maybe.
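The early-termination idea can be sketched like this: each of the six blocks is scanned four coefficients at a time (the way one 64-bit load covers four 16-bit values), and the scan of a block stops at the first non-zero group. Skipping the DC coefficient and putting the first block in the highest bit are assumptions made for this illustration, not details confirmed by the post.

```python
def calc_cbp(blocks):
    """blocks: six lists of 64 coefficients. Returns a 6-bit coded block pattern."""
    cbp = 0
    for block in blocks:
        cbp <<= 1  # assumption: the first block ends up in the highest bit
        for i in range(0, 64, 4):
            group = block[i:i + 4]
            if i == 0:
                group = group[1:]  # assumption: the DC coefficient is ignored
            if any(group):
                cbp |= 1
                break  # early termination: no need to scan the rest
    return cbp


# Toy input: blocks 1, 3 and 5 have a non-zero AC coefficient,
# block 0 has only a DC value, blocks 2 and 4 are empty.
blocks = [[0] * 64 for _ in range(6)]
blocks[0][0] = 100
blocks[1][5] = 3
blocks[3][63] = -1
blocks[5][1] = 7

pattern = calc_cbp(blocks)
print(hex(pattern))
```

With a native 64-bit register each group test is one compare instead of two, which fits the speedup seen for the plain C version in 64-bit mode.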

===== test sse ===== 32-bit
PLAINC - sse8_16bit#1 0.136 usec sse=182013834
PLAINC - sse8_16bit#2 0.138 usec sse=142545203
PLAINC - sse8_16bit#3 0.136 usec sse=146340935
PLAINC - sse8_16bit#4 0.136 usec sse=130136661
PLAINC - sse8_16bit#5 0.136 usec sse=136870353
PLAINC - sse8_16bit#6 0.135 usec sse=164107772
PLAINC - sse8_8bit#1 0.141 usec sse=1356423
PLAINC - sse8_8bit#2 0.139 usec sse=1173074
PLAINC - sse8_8bit#3 0.140 usec sse=1092357
PLAINC - sse8_8bit#4 0.139 usec sse=1360239
PLAINC - sse8_8bit#5 0.139 usec sse=1208414
PLAINC - sse8_8bit#6 0.139 usec sse=1099285
---
MMX - sse8_16bit#1 0.026 usec sse=182013834
MMX - sse8_16bit#2 0.031 usec sse=142545203
MMX - sse8_16bit#3 0.026 usec sse=146340935
MMX - sse8_16bit#4 0.027 usec sse=130136661
MMX - sse8_16bit#5 0.027 usec sse=136870353
MMX - sse8_16bit#6 0.027 usec sse=164107772
MMX - sse8_8bit#1 0.042 usec sse=1356423
MMX - sse8_8bit#2 0.038 usec sse=1173074
MMX - sse8_8bit#3 0.044 usec sse=1092357
MMX - sse8_8bit#4 0.038 usec sse=1360239
MMX - sse8_8bit#5 0.038 usec sse=1208414
MMX - sse8_8bit#6 0.038 usec sse=1099285
---

===== test sse ===== 64-bit
PLAINC - sse8_16bit#1 0.132 usec sse=182013834
PLAINC - sse8_16bit#2 0.133 usec sse=142545203
PLAINC - sse8_16bit#3 0.132 usec sse=146340935
PLAINC - sse8_16bit#4 0.132 usec sse=130136661
PLAINC - sse8_16bit#5 0.133 usec sse=136870353
PLAINC - sse8_16bit#6 0.132 usec sse=164107772
PLAINC - sse8_8bit#1 0.133 usec sse=1356423
PLAINC - sse8_8bit#2 0.138 usec sse=1173074
PLAINC - sse8_8bit#3 0.133 usec sse=1092357
PLAINC - sse8_8bit#4 0.134 usec sse=1360239
PLAINC - sse8_8bit#5 0.134 usec sse=1208414
PLAINC - sse8_8bit#6 0.133 usec sse=1099285
---
MMX - sse8_8bit#1 0.037 usec sse=1356423
MMX - sse8_8bit#2 0.044 usec sse=1173074
MMX - sse8_8bit#3 0.037 usec sse=1092357
MMX - sse8_8bit#4 0.038 usec sse=1360239
MMX - sse8_8bit#5 0.038 usec sse=1208414
MMX - sse8_8bit#6 0.037 usec sse=1099285
---
SSE2 - sse8_16bit#1 0.023 usec sse=182013834
SSE2 - sse8_16bit#2 0.022 usec sse=142545203
SSE2 - sse8_16bit#3 0.023 usec sse=146340935
SSE2 - sse8_16bit#4 0.023 usec sse=130136661
SSE2 - sse8_16bit#5 0.021 usec sse=136870353
SSE2 - sse8_16bit#6 0.023 usec sse=164107772
---

Another function redone in SSE2, but properly this time. At least, this was the quickest I could make it. The pmaddwd instructions tend to cause bottlenecks in the fpu mul pipeline.
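As a rough illustration of why pmaddwd is the natural fit here: the instruction multiplies packed signed 16-bit values element by element and adds each adjacent pair of products, so feeding it the difference vector twice yields partial sums of squared differences. The sketch below emulates that on Python lists; the function names are made up, and it ignores the 16-bit and 32-bit overflow limits that the real instruction has.

```python
def pmaddwd(a, b):
    """Emulate pmaddwd: element-wise multiply, then add adjacent product pairs."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same even length")
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def sse_16bit(b1, b2):
    """Sum of squared differences between two blocks of 16-bit values."""
    diff = [x - y for x, y in zip(b1, b2)]
    return sum(pmaddwd(diff, diff))


pairs = pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])
total = sse_16bit([1, 2, 3, 4], [0, 0, 0, 0])
print(pairs, total)
```

Every group of coefficients needs one such multiply, so a loop like this is bound by how fast the multiplier can accept work, which matches the bottleneck mentioned above.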

Comments and suggestions welcome.

fumigator
16th February 2005, 19:06
You seem to be fast with the clock (just kidding). How did you calculate the time, by the way? :confused: Anyway, we can see that in many parts it's good to be 64-bit. Hopefully we will see this in real apps. :) Look at this, for example:

===== test cbp ===== 32-bit
PLAINC - calc_cbp#1 0.039 usec cbp=0x15

===== test cbp ===== 64-bit
PLAINC - calc_cbp#1 0.023 usec cbp=0x15

That difference of 0.016 is actually about 40%, meaning that an app that uses that function a lot may be significantly faster in 64-bit. Of course I picked the best benchmark, but just see what squid said: it seems that "The C version is written to use int64s" :sly:

Sharktooth
17th February 2005, 15:25
Can someone test it with this please?
winxp pro x64 RC2: http://download.microsoft.com/download/f/3/0/f30500f0-0eef-48fc-884b-3b0902885ff1/w2k3sp1_1433_usa_x64fre_pro.iso
Just use the RC1 key to install.

fumigator
17th February 2005, 18:29
I will, just let me download and install that. :-)

squid_80
18th February 2005, 06:52
Erm, I know it's free anyway but I don't think it's actually legal to post a direct link to the iso like that.

Sharktooth
18th February 2005, 13:48
Well, you still have to go to the MS website to get a working serial, and anyway that's a trial version.

Joe Fenton
18th February 2005, 22:23
squid_80: Just as MS doesn't want people using MMX and X87, you also shouldn't be using 3DNow or 3DNow Ext. Both of those are also deprecated for XP64. They use the same resources as MMX and X87 and so will also eventually not work as well. The ONLY mm extension to be supported in the future will be SSEx.

squid_80
19th February 2005, 01:46
I know. Where'd you get the idea that I was replacing only the mmx code and leaving the 3dnow stuff in?

LigH
9th May 2005, 20:04
Are there any newer Athlon-64 builds of XviD than celtic-druid's GCC builds (http://www.aziendeassociate.it/cd.asp?dir=/gcc) available at the moment?

Sharktooth
9th May 2005, 20:34
The celtic druid builds are 32bit compiles... even the ones for athlon-64.

Selur
9th May 2005, 22:05
could someone post a link to a 'real' 64bit compile ?

squid_80
9th May 2005, 23:42
For win x64? http://home.iprimus.com.au/ajdunstan/xvid64.zip

Unzip, right click the .inf file and choose install. Keep in mind that it will only work with 64-bit apps (virtualdub is the only 64-bit video app I know of at the moment) and that includes avisynth.
I'm updating the build fairly regularly, I just don't post about it because people start going "It's only a tiny bit faster! Waaah!"

LigH
9th May 2005, 23:54
Indeed, we should not expect miracles - or does anyone know (via "profiling") which part makes the most remarkable difference between 32 and 64 bit applications? The memory access maybe? Well - encoding calculations and disk activities should be the slowest parts, and should not gain much from the 64-bit architecture, in my opinion.

squid_80
10th May 2005, 00:31
Any encoder/decoder function that uses the XMM registers (SSE2 code in xvid's case) can potentially benefit from having 8 more registers to play with. Unfortunately MS have made it rather hard to use them if you're coding in pure assembly. I might try rewriting some of the large asm functions (dct and (de)quant) with intrinsics and see how well the compiler optimizes them. I doubt it would do better, but you never know.

I used to profile with AMD's CodeAnalyst, but something I did to the xvid code broke it - it was telling me all the time was being spent in the brightness postprocessing function. :confused:

Selur
10th May 2005, 10:30
thx for the build :)

IgorC
11th May 2005, 05:26
Here is some information about the performance of the AMD X2 (64-bit) with DivX and XviD.
However, they used Win XP (not 64 bit) and the standard xvid 1.0.3 compilation. I sent an email to the author of the test and asked him to use VDub x64 instead of Xmpeg, and an XviD x64 compilation running on WinXP64. The test is in progress.



http://www.fcenter.ru/img/article/CPU/Athlon64_x2/charts/62378.png

http://www.fcenter.ru/img/article/CPU/Athlon64_x2/charts/62411.png

You can use a translator for this: http://www.fcenter.ru/online.shtml?articles/hardware/processors/13155

slavickas
11th May 2005, 18:12
Originally posted by IgorC

You can use a translator for this: http://www.fcenter.ru/online.shtml?articles/hardware/processors/13155

I think the author's English version is here: http://www.xbitlabs.com/articles/cpu/display/athlon64-x2.html

squid_80
22nd May 2005, 08:07
For those that are interested, I did some tests today using xvid_encraw to compare Koepi's 32-bit build with my 64-bit build. The tests were done on an AMD64 3000+ running windows x64 RC2, test.yuv contains 3500 raw 704x576 frames in I420 format (2,128,896,000 bytes).
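As a quick sanity check on the file size quoted above (a sketch, not part of the original test): I420 stores one full-resolution luma plane plus two chroma planes subsampled by 2 in each direction, so a frame takes width x height x 1.5 bytes. The function name is made up for illustration.

```python
def i420_frame_bytes(width, height):
    """Bytes in one raw 8-bit I420 (YUV 4:2:0) frame with even dimensions."""
    luma = width * height                      # one Y sample per pixel
    chroma = 2 * (width // 2) * (height // 2)  # U and V planes, quarter size each
    return luma + chroma

frame = i420_frame_bytes(704, 576)
total = frame * 3500  # 3500 frames, as stated in the post
print(frame, total)
```

This gives 608,256 bytes per frame and 2,128,896,000 bytes in total, which matches the size given for test.yuv.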

Command Line: xvid_encraw -i i:\test.yuv -type 0 -w 704 -h 576 -packed -closed_gop -asm -max_bframes 2 -qpel -gmc -single

32-bit:
Tot: enctime(ms) =347540.00, length(bytes) = 15779684
Avg: enctime(ms) = 99.24, fps = 10.08, length(bytes) = 4505

64-bit:
Tot: enctime(ms) =271306.00, length(bytes) = 15779684
Avg: enctime(ms) = 77.47, fps = 12.91, length(bytes) = 4505

I tested 2-Pass as well.
Command Line: xvid_encraw -i i:\test.yuv -type 0 -w 704 -h 576 -packed -closed_gop -asm -max_bframes 2 -qpel -gmc -pass1 video.pass

32-bit:
Tot: enctime(ms) =72889.00, length(bytes) = 66803501
Avg: enctime(ms) = 20.81, fps = 48.05, length(bytes) = 19075

64-bit:
Tot: enctime(ms) =61848.00, length(bytes) = 66803501
Avg: enctime(ms) = 17.66, fps = 56.62, length(bytes) = 19075

Command Line: xvid_encraw -i i:\test.yuv -type 0 -w 704 -h 576 -packed -closed_gop -asm -max_bframes 2 -qpel -gmc -pass2 video.pass -bitrate 1000000

32-bit:
Tot: enctime(ms) =337446.00, length(bytes) = 17440545
Avg: enctime(ms) = 96.36, fps = 10.38, length(bytes) = 4980

64-bit:
Tot: enctime(ms) =267055.00, length(bytes) = 17440545
Avg: enctime(ms) = 76.26, fps = 13.11, length(bytes) = 4980

So there you have it. Approximately 20% faster (1st pass only about 15%) and the output is binary identical (yes I checked).
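To make the "approximately 20%" concrete, here is a small sketch that recomputes the gain from the total encode times reported above. It states the result both ways, since "time saved" and "fps gained" give different percentages for the same pair of runs. The helper names are illustrative only.

```python
# Total encode times in ms, copied from the xvid_encraw output above:
# (32-bit build, 64-bit build)
RESULTS = {
    "single pass": (347540.0, 271306.0),
    "2-pass, 1st": (72889.0, 61848.0),
    "2-pass, 2nd": (337446.0, 267055.0),
}

def time_saved_pct(t32, t64):
    """Percentage of the 32-bit encode time that the 64-bit build saves."""
    return (t32 - t64) / t32 * 100.0

def throughput_gain_pct(t32, t64):
    """Percentage increase in frames per second for the same frame count."""
    return (t32 / t64 - 1.0) * 100.0

for name, (t32, t64) in RESULTS.items():
    print(f"{name}: {time_saved_pct(t32, t64):.1f}% less time, "
          f"{throughput_gain_pct(t32, t64):.1f}% more fps")
```

By these numbers the 64-bit build needs roughly 21-22% less time in the single-pass and second-pass runs and about 15% less in the first pass. Expressed as fps the gain looks larger (roughly 26-28%, and about 18% for the first pass), so it is worth saying which of the two is meant when quoting a figure.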

MacAddict
22nd May 2005, 14:18
squid_80,

That's great news indeed! Perhaps it might be almost time for me to switch over to Win64 :)

Blue_MiSfit
24th May 2005, 04:33
NICE! :) I will have to download and install the new build for my x64 partition when I get home!

As far as getting MPEG2 decoded (for vdub), can this be done in x64 yet?


-Misfit

squid_80
24th May 2005, 14:07
Yes, with the x64 versions of avisynth and dgdecode.
http://home.iprimus.com.au/ajdunstan/avisynth64.zip
http://home.iprimus.com.au/ajdunstan/dgdecode64.zip

I haven't optimized these as much as xvid64 yet.

Blue_MiSfit
26th May 2005, 00:19
:D :D :D :D :D :D :D :D

you compiled avisynth and dgdecode for x64???

You are now my new best friend. An all 64 bit workflow.

:D :D :D :D :D :D :D :D

On a more serious note, what are the limitations? It appears that all plugins will have to be recompiled for x64 as well (considering that you compiled dgdecode), or am I misunderstanding something here?

-Misfit

Blue_MiSfit
26th May 2005, 04:51
Did some tests with my system and the latest releases from squid_80

AVS as follows:

LoadPlugin("F:\documents and settings\administrator\desktop\vdub x64\dgdecode.dll")
*note this points to a different version of dgdecode (obviously) in win32.

mpeg2source("D:\Pulp\Pulp.d2v", idct=6)

Lanczos4Resize(1024,688)


I know 1024x688 is an oddball resolution, but the way I see it - on playback it is displayed at its native resolution (Since my crappy 17" CRT will only do > 60Hz at 1024x768 or less).

codec settings as follows:
*MSP6, VHQ4 (w/bvops), Chroma Motion, Max I-Frame 240
*All quants 2-31 w/trellis
*CQM: 6of9HVS
*Adaptive Quant, Qpel, Bvops @ 2/1.5/1
*Chroma Optimizer on all frames
*Constant Q3

Source was Pulp Fiction - frames 40,000 - 42,000

Oddly enough my resulting AVI files were NOT identical - the win32 created file was 17,354,752 bytes, and the x64 created file was 17,770,392 bytes. Perhaps different versions of dgdecode are to blame?

HOWEVER, the time was significantly faster:
Win32: 9:00 (540 seconds)
x64: 8:13 (493 seconds)

Marvelous! Almost 10% faster is cool with me!

-Misfit

squid_80
26th May 2005, 13:52
Yep, plugins need to be recompiled to work with avisynth64. If you need one done (and the source is available) I'm open to suggestions.

The filesize difference could be a few things:
a) The 64-bit version of dgdecode uses an SSE2 version of skal's idct (that's what idct=6 is, correct?) as opposed to the 32-bit version which uses MMX. However I think they produce the same output so this probably isn't the culprit.
b) 64-bit avisynth's resizers might not be perfect since I had to rewrite a lot of stuff. I am WAY behind on avisynth64 work.
c) I haven't checked xvid64 for binary compatibility when using mpeg/custom matrices.
d) This is a long shot - there might be some differences in the color conversions being done by vdub.

Either way you've given me more than enough information to test and figure out what the answer is (I even have the same source available) so I'll see what turns up.

lexor
27th May 2005, 17:31
I hate to point out the obvious but nobody linked to the x64 xvid compiles :rolleyes:

so, yeah... gimmi gimmi gimmi:D

squid_80
27th May 2005, 17:56
I hate to point out the obvious too, try reading the previous page of this thread or use search.

squid_80
27th May 2005, 23:55
@Blue_MiSfit: Looks like the culprit is b): there's a bug in avisynth64's resizing algorithm. Without the resize everything seems to come out identical so mpeg/custom matrices and skal's idct/dgdecode aren't to blame (phew!). I have 2 weeks holiday from work so I should have plenty of time to improve/fix avisynth64 very soon.
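For anyone wanting to repeat this kind of check (encode the same script with the 32-bit and 64-bit chains, with and without the resize, and compare the results), here is a minimal sketch of a byte-for-byte comparison that also reports where two files first diverge. The function name is a placeholder and has nothing to do with the actual builds.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files ended at the same point
                return None
            offset += len(a)
```

Calling it on the two AVI files returns None when they are binary identical. An early offset usually points at header differences, while a later one means the encoded stream itself differs.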

Blue_MiSfit
29th May 2005, 03:06
Awesome! squid_80 is on it folks!

As far as plugins are concerned, here are some suggestions of plugins that I use a lot:

RemoveGrainSSE2, Unfilter, Dustv5 (which depends on loadpluginex), WarpSharp, Convolution3d

Not to demand or anything, but if you have free time and want to work on making 64 bit plugins I'm sure many would appreciate it (myself included).

-Misfit

pogo stick
1st June 2005, 19:56
Thanks for your builds, Squid!

I just quickly tried the whole 64 bit encoding chain: AviSynth with DGDecode, VirtualDub and XviD. If I did everything right, it seems about 20% faster than 32 bit on second pass!

By the way, are you interested in x264 encoding? ;)

squid_80
3rd June 2005, 07:20
Originally posted by pogo stick

Thanks for your builds, Squid!
By the way, are you interested in x264 encoding? ;)

Good to hear you got it all working :)
I've never used x264 myself, but I have heard that it's a bit on the slow side. I'll try and get a copy of the source and see how it looks... If there's not too much assembly (especially using inline __asm) it might be as simple as doing a recompile. Not sure how I'd go keeping it up to date though, the good thing about xvid is that the project has been managed very well and these days any changes are pretty small and easy to incorporate. I might just wait and see what happens with the next version of virtualdub; reading Avery Lee's latest blog entry (www.virtualdub.org) I think he's well on the way to getting 32-bit codecs to work with 64-bit vdub. If that's the case it would probably be best for me to leave 64-bit development up to the original codec developers, rather than releasing my own builds like I have been doing. (Any devs who would like assistance with 64-bit stuff, feel free to ask!)

pogo stick
6th June 2005, 19:39
There is some 64 bit code in x264 already and it seems like asm (don't know what it means :o). But it's *nix only. :( Here (http://forum.doom9.org/showthread.php?t=93716).

Sharktooth
6th June 2005, 20:06
x264 contains a lot of MMX code... it needs to be ported to SSE to make it work on win-xp x64.

Joe Fenton
6th June 2005, 22:23
Originally posted by Sharktooth

x264 contains a lot of MMX code... it needs to be ported to SSE to make it work on win-xp x64.

It SHOULD but apparently XP64 does allow 64bit programs to still use MMX. The guy working on 64bit xvid just kept the MMX code in xvid and it seems to work fine in XP64.

bill_baroud
7th June 2005, 16:11
Originally posted by Sharktooth

x264 contains a lot of MMX code... it needs to be ported to SSE to make it work on win-xp x64.

AFAIK, XP64 doesn't support MMX in their compiler _intrinsics_.
That doesn't mean XP64 doesn't support MMX anymore (well, at least that's what I understood).

squid_80
8th June 2005, 07:20
We've been through this. MMX works fine in xp64. But using SSE2 is a better choice if possible since you can use intrinsics and the compiler should be smart enough to make the most of the extra registers. This is easier than writing external assembly and is more portable. The main disadvantage of using intrinsics was that the compiler produced crap code but this has been improved dramatically (both the compilers in the latest platform SDK and VS2005 seem pretty capable).
Having said all that, the easiest way to get a working 64-bit version of x264 would be to do what I did with xvid and just change the assembly code from x86 to x86-64 and make sure it matches the ABI. If there is linux x86-64 assembly code already then it should require very little work. Like I said I don't really have the time or experience with x264 to know if it is working as expected but if someone else wants to do it (sharktooth, weren't you attempting that once upon a time? (http://forum.doom9.org/showthread.php?p=624236#post624236)) feel free to ask for help.

Sharktooth
8th June 2005, 12:19
well, i tried and i failed :D
however linux doesn't have the same "limitations" as windows. gcc is able to compile 64bit code with MMX, SSEx and FP alike.

squid_80
9th June 2005, 10:01
Ok, I've had a look at avisynth64's resizing discrepancies and here's the lowdown:

I'd missed a rounding calculation in the vertical resizing, this has been fixed (also vertical resizing should work for all colorspaces, not just YV12).
There was a bug in horizontal resizing converting dwords to words. Also fixed.
Now PointResize and BilinearResize produce the same output as a 32-bit build. But Bicubic, Lanczos and Lanczos4 can still produce values that vary by a range of about 4. Since the coefficients used for resizing are produced using lots of floating point divisions (lanczos uses sine as well), I'm guessing there are small accuracy differences due to the compiler using different methods. This is probably something that is more or less unavoidable. The question is which has better accuracy, 32 or 64-bit? Or will anyone be able to see the difference? ;)
New avisynth64 is in the usual place: http://home.iprimus.com.au/ajdunstan/avisynth64.zip
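To illustrate the point about coefficient accuracy, here is a sketch only; it is not avisynth's actual code. It computes normalized Lanczos-3 weights in double precision, then rounds the same weights to single precision to mimic a less precise code path. The tap count, the sample phase of 0.3, the single-precision round-trip and the fixed-point scale are all my assumptions, picked just to show how a tiny difference in a coefficient can flip its rounding once it is converted to an integer.

```python
import math
import struct

def lanczos(x, taps=3):
    """Lanczos kernel: sinc(x) * sinc(x / taps) for |x| < taps, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= taps:
        return 0.0
    px = math.pi * x
    return taps * math.sin(px) * math.sin(px / taps) / (px * px)

def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def weights(phase, taps=3, single=False):
    """Normalized weights for the 2*taps source pixels around one output sample."""
    raw = [lanczos(i - phase, taps) for i in range(-taps + 1, taps + 1)]
    if single:
        raw = [to_float32(w) for w in raw]
    total = sum(raw)
    return [w / total for w in raw]

SCALE = 16384  # assumed fixed-point scale, for illustration only
w64 = weights(0.3)
w32 = weights(0.3, single=True)
for a, b in zip(w64, w32):
    print(f"{a:.10f} {b:.10f} -> {round(a * SCALE)} vs {round(b * SCALE)}")
```

The two sets of weights agree to several decimal places, so any visible difference has to come from where the rounding happens afterwards. Whether that accounts for the full range of about 4 seen here is not something this sketch can show.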

AliceD
30th June 2005, 22:03
Hi,

I also tried to install all these progs to test the incredible performance of a 64bit sys, but unfortunately it's not working.

I got a WinXP for 64bit ext. and installed it on drive D.
Then I got
vdub 1.6.x for AMD64
XviD, avisynth and dgdecode somewhere from this thread I think.

So I installed XviD and the files are under windows\sysWOW64\, but nothing shows up in veedub, in add/remove software, or under video codecs.
Then I made a d2v file with dgindex 1.2.1, as it says in the text file of dgdecode64, and an avs, but when I load the avs in veedub I get an error

"Avi Import Error: (Unknown)(80040154)"

eehhhh....

can anybody tell me how i get these things to work??

AliceD
30th June 2005, 22:05
avisynth and dgdecode are from squid_80°° thank you!

squid_80
30th June 2005, 23:55
Xvid64 and avisynth64 should be installed by right-clicking the .inf file that comes with them and choosing the install option. Nothing should be installed into windows\syswow64, they should automatically be copied into windows\system32. If they end up in syswow64 you're probably using a 32-bit file manager (people tend to do this so the context menus for winrar etc. still work) which is a bad idea.
If xvid64 is installed correctly it will definitely show up in add/remove programs and should also be listed under video codecs->properties in device manager.
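The reason a 32-bit file manager causes this: on 64-bit Windows, a 32-bit process that accesses windows\system32 is transparently redirected to windows\syswow64. Below is a toy model of that rule, heavily simplified. The function name, the default Windows directory and the file name are placeholders, and the real redirector has exceptions that this ignores.

```python
import ntpath  # Windows path rules, usable on any OS

def effective_path(requested, process_is_32bit, windir=r"C:\Windows"):
    """Model where an access to `requested` really lands on 64-bit Windows."""
    system32 = ntpath.join(windir, "System32").lower()
    norm = ntpath.normpath(requested)
    lowered = norm.lower()
    inside = lowered == system32 or lowered.startswith(system32 + "\\")
    if process_is_32bit and inside:
        # 32-bit processes get SysWOW64 in place of System32.
        tail = norm[len(system32):]
        return ntpath.join(windir, "SysWOW64") + tail
    return norm

# Hypothetical codec file copied by a 32-bit vs. a 64-bit installer:
print(effective_path(r"C:\Windows\System32\codec.dll", process_is_32bit=True))
print(effective_path(r"C:\Windows\System32\codec.dll", process_is_32bit=False))
```

So an .inf installed from a 32-bit shell puts the files where only 32-bit applications look, and 64-bit VirtualDub never sees them.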

AliceD
1st July 2005, 16:05
Oh yeah, it's working. Thank you. That was the problem: I installed it with Total Commander, which is still 32-bit software. I didn't know that it even makes a difference to Windows where programs are installed from. I used the MS Explorer and it's working.

thank you!

fumigator
18th July 2005, 00:03
Long time unseen in the scene, now I'm back. WOW squid, you've really come a long way with your Xvid; however, I don't have Windows 64 bit installed anymore because of space problems (must get a bigger hard drive).

Anyway, the real reason I came back is that I heard someone did an experiment with his video card: he did some work using the video card hardware instead of the PC processor. Such a thing could be outstanding, knowing that video card processors are much faster than x86 processors in math computations. I don't know who did this experiment; I searched the net and found nothing, but I've heard of it and it seems to be true.

Thinking more about it, both Nvidia and ATI cards (or any of the kind) have hardware functions for antialiasing and resizing. So my idea is: if you encode in Divx, would it be possible to let your video card do the resizing work and/or antialiasing, or even execute part of the encoding code?

What about that?

wiak
30th July 2005, 18:09
and btw 64bit xvid is the same as 32bit xvid in interface ;) it's basically the same layout but it's compiled for 64bit and not 32bit ;)

fumigator
30th July 2005, 18:18
Look at some earlier posts in this thread, especially page 3. C ya!

zToFFe
12th August 2005, 22:53
just tried those apps out,
source: PAL 704x576 (interlaced) capped with dvb-s card.
avs:
LoadPlugin("D:\dgdecode64\dgdecode.dll")
mpeg2source("D:\test64.d2v", idct=6)

the first 18 seconds of the encoding run at 84fps, then 75, and then it starts to jump between 44, 56, 60, 64, 66, 70 etc.

but when i add any of the lanczos resize filters (which i guess are not really compatible yet) the speed drops to 28fps.

running on a:
dual xeon 3.2ghz/1mb/800fsb
2x512mb kingston ddr2 ecc/reg.

really nice work squid_80! :)

wiak
30th August 2005, 04:40
which software do you guys make the d2v file with?

am getting this error
http://img374.imageshack.us/img374/4942/error4qq.jpg
here is how my avs file is:

LoadPlugin("D:\Download\dgdecode.dll")
mpeg2source("I:\1.d2v")

i cant get it to work :(

i have installed XviD64 (using the .inf), Avisynth64 (in system32 & i have run avisynth.reg)

System:
AMD Athlon 64 3200+
4x 512 MB TwinMOS PC3200 DDR400
4x WDC 250 GB
ATi Radeon X800 XT
ATi Theater 550 PRO
Windows XP Professional x64 Edition

i dont know what but its something that is WRONG :(

Axed
30th August 2005, 06:52
Obvious question first, did you download the update of DGIndex? If so, did you create a new D2V file or try to use the old one?

squid_80
30th August 2005, 13:26
From the readme file that I included with the 64-bit version of dgdecode:
It is based on the source code for version 1.2.1 and should be used with a corresponding version of DGIndex.
So the .d2v file needs to be made with DGIndex 1.2.1. Newer versions will create incompatible .d2v files.

Sirber
30th August 2005, 13:58
Keep up the good work! I'm starting to see an advantage to 64bit ^^

wiak
30th August 2005, 17:57
Thanks guys, it really works :D:D:D:
good work ! :D

Xayd
7th September 2005, 11:10
Awesome stuff squid. I'm anxiously awaiting more of your ports, got a dual nocona xeon machine that's begging to be a full 64 bit encoding box.

when you get back to working on filters again the remaining few filters that AutoGK uses would be awesome (removegrain, FDecimate, autocrop).

if you need anything for your efforts i'd be happy to chip in too.

IvS
24th December 2005, 05:07
1. squid_80: have you done any more work on the 64-bit XviD port?
2. Has anyone else tried seriously developing this idea?
3. Does anyone have a working link? :)

squid_80
24th December 2005, 05:24
Originally posted by IvS

1. squid_80: have you done any more work on the 64-bit XviD port?
2. Has anyone else tried seriously developing this idea?
3. Does anyone have a working link? :)

1. No, haven't touched it since May. But I might redo it when syskin's multithreading stuff is finished.
2. ? (not that I know of)
3. If my ftp server (ftp://squid80.no-ip.com) is unavailable try http://okejl.dk/dunstan

IvS
24th December 2005, 15:27
Thanks squid_80. I hope to see this being developed with help from others as well. The potential is huge.

NeonEva
2nd September 2006, 17:44
Originally posted by squid_80

3. If my ftp server (ftp://squid80.no-ip.com) is unavailable try http://okejl.dk/dunstan

Neither works anymore. Any way to get a link to new working downloads?

daStorm
16th September 2006, 23:46
http://nwgat.net/mirrors/okejl.dk/dunstan/
^_^
have fun