Old 13th May 2003, 08:59   #21  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
@alx:
Very weird. I've not added anything, only taken bits out of loops: every time a frame was fetched, a memory allocation was done (and leaked), the iDCT would get refreshed, and a bunch of variables were set that don't need to be set, etc.
I think it's impossible for mine to be slower (unless the Intel compiler is causing it, but it makes it faster on mine), but I'll look into it.
As for dvd2avi_nic, lol, maybe I'll never release it. Mainly because I don't use Comp check, so dvd2avi_nic doesn't have one. But people will want it, so I'd better code one first.
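
To show the kind of change meant by taking the allocation out of the per-frame path, here is a minimal sketch. It is not the MPEG2Dec3 code: `FrameDecoder`, `decode` and the fill loop are hypothetical stand-ins, and the point is only that the buffer is allocated once and reused for every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder (illustration only): the scratch buffer is sized once
// at construction and reused, instead of being allocated on every frame.
class FrameDecoder {
public:
    explicit FrameDecoder(std::size_t frame_bytes) : scratch_(frame_bytes) {}

    // Per-frame work writes into the existing buffer; nothing is allocated
    // or freed here, so nothing can leak per frame.
    const uint8_t* decode(uint8_t fill) {
        assert(!scratch_.empty());
        for (auto& b : scratch_) b = fill;  // stand-in for real decoding
        return scratch_.data();
    }

    std::size_t size() const { return scratch_.size(); }

private:
    std::vector<uint8_t> scratch_;  // released automatically by the destructor
};
```

Calling `decode` twice on the same object returns the same pointer both times, which is the whole point: the cost of the allocation is paid once, not per frame.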

@trbarry: Add_Block was just one that it mentioned (that I remembered). It's important and may not be able to be improved. The SSE iDCT is there, but not the other SSE2 code. Marc FD removed/ifdef'd it because it was unstable (?), I think, or at least not producing accurate results. If I had an SSE2 computer for development I'd test through each bit and find the bits that worked (i.e. created the same output as the SSEMMX parts). But I don't at present (you got any free time?)

BTW: Marc's post on SSE2:-
http://forum.doom9.org/showthread.ph...SE2#post207193

-Nic

Last edited by Nic; 13th May 2003 at 09:17.
Old 13th May 2003, 12:55   #22  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
If memory is no longer aligned, a minor penalty can be expected on Athlon (and other processors). But it seems like a lot - MPEG2 decoding shouldn't take much more than 10-15% of the overall processing time.

Could you repeat the test, just to be sure it isn't something strange like Windows swapping?
__________________
Regards, sh0dan // VoxPod
Old 13th May 2003, 13:08   #23  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
@sh0dan: the memory is still aligned. Windows caching makes a big difference on small tests... as I've been finding out. I'm going to try to make mpeg2dec3's disk access more efficient and then stop for now.

-Nic
Old 13th May 2003, 14:20   #24  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Quote:
@trbarry: Add_Block was just one that it mentioned (that I remembered). It's important and may not be able to be improved. The SSE iDCT is there, but not the other SSE2 code. Marc FD removed/ifdef'd it because it was unstable (?), I think, or at least not producing accurate results. If I had an SSE2 computer for development I'd test through each bit and find the bits that worked (i.e. created the same output as the SSEMMX parts). But I don't at present (you got any free time?)
Nic -

As Int21h pointed out, the SSE2 code (except for the IDCT) made only marginal improvements, though it made more of a difference on my machine than his for some reason. But some of it was very sensitive to compiler optimization and would crash with some settings and combinations of inlining. That's probably why Marc FD had to turn it off. There is a whole series of timing tests written up for similar DVD2AVI code somewhere in that huge DVD2AVI Sourceforge thread in the DVD2AVI section.

I've been planning for a while to add a couple more simple assembler optimizations but haven't quite got to it. These would just need P3's (or less), not P4's.

First, the iDCT code should probably be called via pointer like in Xvid (or my DctFilter), avoiding all the extra logic in AddBlock. You want to do this? (the SSE2 prefetch call is unneeded)

Second, the assembler now in AddBlock can be easily optimized a bit.

And third, and more important, would probably be asm optimizing the dequant functions. This wouldn't be hard and -h nags us about this from time to time.

I'll try to get to those.
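
For the first point, the pointer idea can be sketched roughly as follows. This is only an illustration of the dispatch pattern, not Xvid's or MPEG2DEC3's actual code: `IdctFn`, `CpuCaps`, `select_idct`, `decode_block` and the two stub functions are made-up names, and the stubs do not perform a real transform.

```cpp
#include <cassert>
#include <cstdint>

// Signature shared by every iDCT variant: transforms one 8x8 block in place.
using IdctFn = void (*)(int16_t* block);

// Stand-in bodies (NOT real transforms): each only records which variant ran.
static int g_last_variant = 0;
void idct_reference_stub(int16_t* block) { (void)block; g_last_variant = 1; }
void idct_ssemmx_stub(int16_t* block) { (void)block; g_last_variant = 2; }

// Hypothetical capability flags, assumed to be filled in by CPU detection
// elsewhere.
struct CpuCaps {
    bool ssemmx;
};

// Done once at start-up: pick the routine for this machine.
IdctFn select_idct(const CpuCaps& caps) {
    return caps.ssemmx ? idct_ssemmx_stub : idct_reference_stub;
}

// Per-block hot path: one indirect call, no CPU-flag tests inside the loop.
void decode_block(IdctFn idct, int16_t* block) {
    assert(idct != nullptr);
    idct(block);
}
```

The selection logic runs once, so the per-block code no longer needs to test CPU flags on every call.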

I haven't checked yet to see if there are unneeded data copies in YV12 but maybe you can get those if you can find them. It certainly seems YV12 should be able to copy data fewer times since (IIRC for MPEG2DEC2) for YUY2 there was first a pass to planar 4:2:2 and then a conversion to YUY2. It seems at least one and maybe both of those should be unneeded.
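
For reference, the last step of that YUY2 path amounts to something like the sketch below: one more full pass over the picture just to interleave the planes. `pack_row_yuy2` is a made-up name and this is plain C++ rather than the MPEG2DEC code; the only thing taken as given is the YUY2 byte order (Y0 U Y1 V).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interleave one row of planar 4:2:2 (separate Y, U, V rows) into packed
// YUY2, whose byte order is Y0 U0 Y1 V0. `width` is the luma width in
// pixels and must be even; the U and V rows hold width/2 samples each.
void pack_row_yuy2(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                   uint8_t* dst, std::size_t width) {
    assert(width % 2 == 0);
    for (std::size_t x = 0; x < width / 2; ++x) {
        dst[4 * x + 0] = y[2 * x];      // first luma sample of the pair
        dst[4 * x + 1] = u[x];          // shared Cb
        dst[4 * x + 2] = y[2 * x + 1];  // second luma sample of the pair
        dst[4 * x + 3] = v[x];          // shared Cr
    }
}
```

When the decoder hands back YV12 directly, no interleaving pass of this kind should be necessary.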

- Tom

Last edited by trbarry; 13th May 2003 at 14:24.
Old 13th May 2003, 14:42   #25  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
Trying to get my head around what's safe and what's not to store in the GOPBuffer is tricky. I'm tempted to rewrite that whole bit.

I'll do the iDCT pointer stuff, as that's a good idea indeed.

I had a look at the dequant functions ages ago. If you find any speedups or improvements, let me know.

Anything you can give to it would be much appreciated, Tom. Please post here if you come up with any improvements.

Cheers,
-Nic
Old 14th May 2003, 12:05   #26  |  Link
hakko504
Remember Rule One
 
Join Date: Oct 2001
Location: SWEDEN
Posts: 1,611
I'll update the DVD2AVI FAQ ASAP.
__________________
/hakko

http://www.boardgamegeek.com
Old 14th May 2003, 13:51   #27  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Quote:
I had a look at the dequant functions ages ago. If you find any speedups or improvements, let me know.
I think there are a couple quick asm tweaks I can make without trying to recode the whole dequant stuff. I'll try that today and see if it helps any.

Is 1.04 the best source to start from now?

- Tom
Old 14th May 2003, 14:15   #28  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
Yup, it is the best to start from. I wanted to do more last night but corrupted my registry while fitting a new gfx card. Doh.

-Nic
Old 14th May 2003, 21:43   #29  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Coding now.

But I notice that running in the debugger I will get messages about bad heap frees etc. from VirtualDubMod when I exit or try to do a "Save & Refresh". This does not happen with previous versions. There may be a problem with the new storage management.

- Tom
Old 14th May 2003, 23:41   #30  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Well, back to the drawing board.

My first simple attempts to optimize the dequant stuff made it about 2% slower.

- Tom
Old 15th May 2003, 16:27   #31  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
"I will get messages about bad heap frees etc. from VirtualDubMod when I exit or try to do a "Save & Refresh". This does not happen with previous versions." Are you referring to previous versions of MPEG2Dec3 or VDubMod?

"My first simple attempts to optimize the dequant stuff made it about 2% slower". No luck. I'm sure you'll be able to do something to speed it up, though.

Cheers,
-Nic
Old 15th May 2003, 16:40   #32  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
I also get these errors when debugging through VirtualDubMod - probably for a month or two now. Quite annoying actually - but it made me wonder - how can we even get these unless there is some code somewhere which is compiled in debug mode?

AFAIK these checks are only present in Debug mode - or am I mistaken?
__________________
Regards, sh0dan // VoxPod
Old 15th May 2003, 19:13   #33  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
V 1.0.5 temp version for testing only

Quote:
Are you referring to previous versions of MPEG2Dec3 or VDubMod?
Nic -

It happens if I compile your 1.0.4 (or my new one) for debug.

Anyway, I did a little more optimization, based upon your changes. I only seem to be able to squeeze out another 1-2% improvement from it but added to your recent changes that might add up to about 5-6%.

And I'm using VS6 without the Intel compiler so maybe if you compiled and hosted it we might get a tad more. And I'm sure there's still more to do somewhere.

I temporarily put out the source and dll for you or anyone to test at

edit: Removed link to buggy test version.

I changed some assembler code in GetPic.cpp functions Add_Block(), Decode_MPEG2_Intra_Block(), and Decode_MPEG2_Non_Intra_Block().

The changes will only help machines with SSEMMX. This would include all P3's, P4's, Athlons, Durons, and Celerons above about 550 MHz. Older machines won't notice the difference.

I kind of eyed the SSE2 stuff again but decided it maybe wasn't worth playing with again right now.

- Tom

Last edited by trbarry; 20th May 2003 at 18:16.
Old 15th May 2003, 19:51   #34  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Have you profiled to see which functions are using the most time?

__________________
Regards, sh0dan // VoxPod
Old 15th May 2003, 20:23   #35  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
My version of VS6 is not the one with profiling. I had a trial version of Intel Vtune a year ago when I was fooling with this stuff in DVD2AVI but that has long since expired.

Is there a good free way to profile stuff?

- Tom
Old 15th May 2003, 20:32   #36  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
AMD CodeAnalyst is a great tool IMO - it enables function-level profiling and pipeline analysis (with detailed state/stall information). I don't know if it requires an AMD CPU, though.

I'm downloading it now (their server is dog-slow from where I sit).

A very minor thing I noticed while browsing (I know it's nitpicking):

Two pack instructions don't pair, so you could save a few (two) cycles by doing:
Code:
        movq    mm0, [ebx+0*16]
        movq    mm1, [ebx+1*16]
        packsswb    mm0, [ebx+0*16+8]   // pack with SIGNED saturate (unlike old way)
        movq    mm2, [eax]        // get rfp val
        movq    mm3, [eax+edx]      // "
        packsswb    mm1, [ebx+1*16+8]   // pack with SIGNED saturate
But since it is probably quite memory saturated, it probably doesn't matter even a bit.

In general most routines seem memory-intense - so either faster RAM or less memory use is probably the only way to get any significant speedups.
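
On the "SIGNED saturate" comments in the snippet above: the difference between the two MMX pack flavours is easy to see in a scalar model. The sketch below models a single lane of `packsswb` and `packuswb` in plain C++; the function names are made up, and since the "old way" isn't shown here, `packuswb` is included only for contrast.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of PACKSSWB: signed 16-bit word to signed 8-bit
// byte, clamped to [-128, 127].
int8_t pack_signed_sat(int16_t w) {
    if (w > 127) return 127;
    if (w < -128) return -128;
    return static_cast<int8_t>(w);
}

// Scalar model of one lane of PACKUSWB: signed 16-bit word to unsigned 8-bit
// byte, clamped to [0, 255].
uint8_t pack_unsigned_sat(int16_t w) {
    if (w > 255) return 255;
    if (w < 0) return 0;
    return static_cast<uint8_t>(w);
}
```

A negative residual such as -5 survives the signed pack but is clamped to 0 by the unsigned one, which is why the choice of pack instruction matters when the packed values are later added to a prediction.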
__________________
Regards, sh0dan // VoxPod

Last edited by sh0dan; 15th May 2003 at 20:43.
Old 15th May 2003, 23:36   #37  |  Link
trbarry
Registered User
 
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
"But since it is probably quite memory saturated, it probably doesn't matter even a bit. "

Can't hurt. I'll change it.

I agree that most of our stuff is memory bound. Nic's probably right that we should next be checking in MPEG2DEC3 to see if there are still any unneeded buffer copies now that we are returning YV12.

My problem with optimizing with VTune was that it got a bit confused by all the inlining used by MPEG2DEC. If I compiled with debug and no inlining then I could get very clear results, but they no longer matched the usual usage profile, since there are a lot of small routines that, without inlining, spend a good amount of time in call linkage.

Let me know if you find out whether the AMD analyzer works only on AMD boxes. I downloaded it over a year ago but then never tried it for some reason (think I forgot about it).

- Tom
Old 16th May 2003, 09:12   #38  |  Link
Nic
Moderator
 
Join Date: Oct 2001
Location: England
Posts: 3,285
@Tom: CodeAnalyst is AMD-box only. I'll disassemble it and see why that might be...

-Nic
Old 16th May 2003, 11:24   #39  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Some numbers:

Source: SVCD 480x480 (sorry I currently have no DVD material)
Processor: Athlon 500 non-DDR memory.
AVS2AVI -> XviD "null encoder".
Code:
77.63% mpeg2dec3.dll
11.52% avisynth.dll
That's about the overhead that can be expected.

Distribution within mpeg2dec3.dll
Code:
16.13% SSEMMX_IDCT
11.79% CMPEG2Decoder::Copyall
10.53% CMPEG2Decoder::decode_macroblock
4.68%  CMPEG2Decoder::motion_compensation
3.84%  CMPEG2Decoder::Copyodd
2.66%  MC_put_16_mmxext
2.55%  CMPEG2Decoder::Show_Bits
Copyall is a bit suspicious. Either there is a LOT of copying going on, or there are some inefficiencies. Block prefetching/movntq might be worth trying out. I'll try replacing it with a BitBlt.
__________________
Regards, sh0dan // VoxPod
Old 16th May 2003, 12:20   #40  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
vfapidec.cpp:
Code:
void CMPEG2Decoder::Copyall(YV12PICT *src, YV12PICT *dst)
{
  AVSenv->BitBlt(dst->y, dst->ypitch, src->y, src->ypitch, src->ypitch, Coded_Picture_Height);
  AVSenv->BitBlt(dst->u, dst->uvpitch, src->u, src->uvpitch, src->uvpitch, Coded_Picture_Height>>1);
  AVSenv->BitBlt(dst->v, dst->uvpitch, src->v, src->uvpitch, src->uvpitch, Coded_Picture_Height>>1);
} 

void CMPEG2Decoder::Copyodd(YV12PICT *src, YV12PICT *dst)
{
  AVSenv->BitBlt(dst->y, dst->ypitch*2, src->y,src->ypitch*2, src->ypitch, Coded_Picture_Height>>1);
  AVSenv->BitBlt(dst->u, dst->uvpitch*2, src->u,src->uvpitch*2, src->uvpitch, Coded_Picture_Height>>2);
  AVSenv->BitBlt(dst->v, dst->uvpitch*2, src->v,src->uvpitch*2, src->uvpitch, Coded_Picture_Height>>2);
}

void CMPEG2Decoder::Copyeven(YV12PICT *src, YV12PICT *dst)
{
  AVSenv->BitBlt(dst->y+dst->ypitch, dst->ypitch*2, src->y+src->ypitch, src->ypitch*2, src->ypitch, Coded_Picture_Height>>1);
  AVSenv->BitBlt(dst->u+dst->uvpitch, dst->uvpitch*2, src->u+src->uvpitch, src->uvpitch*2, src->uvpitch, Coded_Picture_Height>>2);
  AVSenv->BitBlt(dst->v+dst->uvpitch, dst->uvpitch*2, src->v+src->uvpitch, src->uvpitch*2, src->uvpitch, Coded_Picture_Height>>2);
}
AviSynthAPI.cpp:
Code:
PVideoFrame __stdcall MPEG2Source::GetFrame(int n, IScriptEnvironment* env)
{
  m_decoder.AVSenv = env;
  [...]
global.h:
Code:
class MPEG2DEC_API CMPEG2Decoder
{
	friend class MPEG2Source;
protected:
  IScriptEnvironment* AVSenv;
I don't see much change here (maybe a percent) - could you test on DDR systems?
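
For anyone who wants to time this outside AviSynth, the calls above amount to roughly the following portable sketch. It assumes BitBlt behaves like a plain row-by-row copy with independent source and destination pitches (the real one may use optimised paths); `copy_plane` and `copy_field` are made-up names, and the field copy assumes an even frame height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `height` rows of `row_size` bytes each; consecutive rows are `pitch`
// bytes apart in memory, and the two pitches may differ.
void copy_plane(uint8_t* dst, std::size_t dst_pitch,
                const uint8_t* src, std::size_t src_pitch,
                std::size_t row_size, std::size_t height) {
    assert(row_size <= dst_pitch && row_size <= src_pitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dst_pitch, src + row * src_pitch, row_size);
    }
}

// Copy one field of a plane: start at line `first_line` (0 or 1) and step
// two lines at a time by doubling the pitch, as Copyodd/Copyeven do.
void copy_field(uint8_t* dst, const uint8_t* src, std::size_t pitch,
                std::size_t row_size, std::size_t frame_height,
                int first_line) {
    assert(first_line == 0 || first_line == 1);
    copy_plane(dst + first_line * pitch, pitch * 2,
               src + first_line * pitch, pitch * 2,
               row_size, frame_height / 2);
}
```

With `first_line` set to 1 only lines 1, 3, 5, ... are written and the other field in the destination is left untouched.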
__________________
Regards, sh0dan // VoxPod