Doom9's Forum > Capturing and Editing Video > Avisynth Development

21st May 2003, 15:45   #61
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
I thought as much... that's exactly what I'm doing. I have to check the AVSEnv in copyall etc. because the DLL can also be used standalone.

But I was wondering whether this "if" statement will negate any improvement we get from using BitBlt?
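
Nic's pattern of checking for the AviSynth environment before blitting can be sketched in C as below. The environment is modelled as a nullable function pointer with the same parameter order as AviSynth's IScriptEnvironment::BitBlt (dstp, dst_pitch, srcp, src_pitch, row_size, height); bitblt_fn, bitblt_c and copy_plane are hypothetical names, and the plain memcpy fallback is an assumption standing in for whatever MPEG2Dec3 does when used standalone.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char BYTE;

/* Hypothetical blitter type; same parameter order as AviSynth's BitBlt. */
typedef void (*bitblt_fn)(BYTE *dstp, int dst_pitch, const BYTE *srcp,
                          int src_pitch, int row_size, int height);

/* Portable fallback for standalone use (no host environment available). */
static void bitblt_c(BYTE *dstp, int dst_pitch, const BYTE *srcp,
                     int src_pitch, int row_size, int height)
{
    int y;
    if (dst_pitch == row_size && src_pitch == row_size) {
        /* Rows are contiguous in both buffers: one copy does it all. */
        memcpy(dstp, srcp, (size_t)row_size * (size_t)height);
        return;
    }
    for (y = 0; y < height; y++) {
        memcpy(dstp, srcp, (size_t)row_size);
        dstp += dst_pitch;
        srcp += src_pitch;
    }
}

/* host_bitblt is NULL when the DLL runs without AviSynth. The check runs
   once per plane copy, not once per pixel. */
static void copy_plane(bitblt_fn host_bitblt, BYTE *dstp, int dst_pitch,
                       const BYTE *srcp, int src_pitch, int row_size, int height)
{
    bitblt_fn blit = host_bitblt ? host_bitblt : bitblt_c;
    blit(dstp, dst_pitch, srcp, src_pitch, row_size, height);
}
```

Calling copy_plane(NULL, dst, 5, src, 4, 3, 2) copies two 3-byte rows from a buffer with pitch 4 into one with pitch 5, leaving the padding bytes untouched.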

The invocation (I like that word) could also be used to take the resize parameters from dvd2avi, but I don't think I'll implement that (yet).

-Nic

21st May 2003, 15:51   #62
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Even in the worst case it will cost 20-30 cycles (on a P4; less on a K7), but considering the amount of data being copied, this is nothing.

Besides, the processor will be able to branch-predict this 100% after three runs, since it never changes.
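
To put the 20-30 cycles in perspective, assume the check runs once per frame copy of a 720x576 PAL frame in YV12 (a full-size Y plane plus two quarter-size chroma planes). The frame size and the function names below are assumptions for illustration only.

```c
#include <assert.h>

/* Bytes in one YV12 frame: Y plane plus U and V planes at half width and
   half height. Assumes even dimensions. */
static unsigned long yv12_frame_bytes(unsigned long width, unsigned long height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * (width / 2) * (height / 2);
}

/* A fixed per-call cost spread over every byte copied in that call. */
static double overhead_per_byte(double fixed_cycles, unsigned long bytes)
{
    return fixed_cycles / (double)bytes;
}
```

For 720x576 that is 622,080 bytes per frame, so even 30 cycles works out to under 0.0001 cycles per copied byte.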
__________________
Regards, sh0dan // VoxPod

21st May 2003, 19:14   #63
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Quote:
@Tom: Well, I'm using an AMD 1800 XP; I'll leave the code in and test more. Have you spotted the bug in Add_Block? Can you fix it?
Nic -

Could you set a breakpoint and confirm it is even going through the new _SSE versions of those functions on your Athlon?

I'll still take a look at Add_Block. It's possible I really can't do 8-bit arithmetic there without loss of precision, but it's probably just a silly bug.
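
To see how narrowing to 8 bits can lose precision, here is a scalar sketch. It assumes (the thread does not spell this out) that the failing section adds signed 16-bit IDCT residuals to unsigned 8-bit reference pixels, and that the residuals are squeezed into signed bytes before the add; all function names are hypothetical.

```c
#include <assert.h>

/* Clamp an int to the 0..255 pixel range. */
static unsigned char clamp_u8(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Clamp an int to the signed-byte range, as a signed-saturating pack does. */
static int sat_s8(int v)
{
    return v < -128 ? -128 : (v > 127 ? 127 : v);
}

/* Reference: add the residual at full precision, then clamp. */
static unsigned char add_exact(unsigned char pred, short residual)
{
    return clamp_u8(pred + residual);
}

/* 8-bit shortcut: narrow the residual to a signed byte first, so any
   residual outside -128..127 is altered before it is added. */
static unsigned char add_via_s8(unsigned char pred, short residual)
{
    return clamp_u8(pred + sat_s8(residual));
}
```

With a prediction of 200 and a residual of -200 the exact result is 0, but the narrowed residual becomes -128 and the pixel comes out as 72, which would look like a visibly wrong block.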

- Tom

21st May 2003, 19:46   #64
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
I can see no reason at all why Tom's Add_Block should be slower; everything tells me it should be faster. Loop unrolling will have a larger effect on the P4 than on the K7, but it should still be faster.

There are fewer instructions in the loop, and no branch mispredicts.

It might be connected to the non-linear memory access. Is it possible to do a version that accesses memory more linearly (and doesn't look up eax followed by eax+edx)? That does, however, seem unavoidable to me.

Perhaps doing:
Code:
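        // load reference (rfp) qwords for rows 0 and 1
        prefetchnta [eax+edx*4]             // prefetch rfp row 4, ahead of its use (ISSE only)
        movq    mm2, [eax]                  // rfp row 0: 8 unsigned pixels
        movq    mm3, [eax+edx]              // rfp row 1 (edx = row pitch)
        movq    mm0, [ebx+0*16]             // block row 0, coefficients 0-3 (16-bit)
        movq    mm1, [ebx+1*16]             // block row 1, coefficients 0-3
        packsswb    mm0, [ebx+0*16+8]       // + coefficients 4-7 -> 8 bytes, SIGNED saturate (unlike old way)
        packsswb    mm1, [ebx+1*16+8]       // same for row 1, SIGNED saturate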
This might be a bit faster (the eax lookups are moved to the top to avoid stalling the ebx lookups, and the prefetch reaches further down the block). Remove the prefetch if ISSE is not allowed here.
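
For readers less familiar with MMX, here is a scalar C model of what each packsswb in the snippet does: the register operand holds coefficients 0-3 of a block row as 16-bit words, the memory operand supplies coefficients 4-7, and the result is eight signed bytes, each clamped to -128..127. packsswb_model is a hypothetical name, and the model covers only the data movement, not timing.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp one signed word to the signed-byte range. */
static int8_t sat_s8(int16_t v)
{
    return (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
}

/* Models PACKSSWB dst, src: the four words of dst become bytes 0-3 and the
   four words of src become bytes 4-7, each with signed saturation. */
static void packsswb_model(const int16_t dst[4], const int16_t src[4], int8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_s8(dst[i]);
        out[i + 4] = sat_s8(src[i]);
    }
}
```

Any coefficient outside -128..127 is clamped at this step, so the packed bytes are only exact when every value in the row already fits in a signed byte.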
__________________
Regards, sh0dan // VoxPod

21st May 2003, 21:51   #65
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Sh0dan -

From what Nic said, Add_Block wasn't the slow part. It was the broken part.

And I didn't want to add a separate SSEMMX section there, hence the lack of prefetch. That part is not in a tight loop anyway.

But right now I first have to figure out how to even get the right answer.

- Tom

22nd May 2003, 08:57   #66
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
Yup, Add_Block was causing the errors for me. They were only slight (you could see random blocks appear). If you need a test clip to reproduce it, I'll put one up along with an accompanying d2v file.

As for my minor progress: the crop stuff is working and tested, and the iDCT is now done using a function pointer. I've added two more iDCTs: one is Skal's from his MPEG-4 project (the fastest I've ever come across; he's given me permission to put it into mpeg2dec), and the other is SimpleiDCT from XviD, which is known to have very high precision (although it's a tad slower, I thought it might be useful?).
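
Selecting the iDCT through a function pointer can be sketched as below. This assumes a hypothetical idct_fn signature that transforms an 8x8 block of shorts in place; only a slow double-precision reference iDCT is included, and the table where Skal's iDCT or SimpleiDCT would be registered is illustrative, not MPEG2Dec3's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common signature: 8x8 coefficients (row-major) -> samples, in place. */
typedef void (*idct_fn)(short *block);

/* cos(k*pi/16) for k = 0..8, hard-coded so the sketch needs no libm. */
static const double cos_pi16[9] = {
    1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
    0.19509032201612825, 0.0
};

/* cos(k*pi/16) for any k >= 0, folded into the 0..8 table range. */
static double cos16(int k)
{
    k %= 32;                                /* period of 2*pi */
    if (k > 16) k = 32 - k;                 /* cos(2pi - a) = cos(a) */
    if (k > 8) return -cos_pi16[16 - k];    /* cos(pi - a) = -cos(a) */
    return cos_pi16[k];
}

/* Separable double-precision reference iDCT (slow, but easy to check). */
static void idct_reference(short *block)
{
    double tmp[64];
    const double c0 = cos_pi16[4];          /* C(0) = 1/sqrt(2) */
    int x, y, u, v;

    /* Horizontal pass: each row of coefficients -> row of samples. */
    for (v = 0; v < 8; v++)
        for (x = 0; x < 8; x++) {
            double s = 0.0;
            for (u = 0; u < 8; u++)
                s += (u ? 1.0 : c0) * block[v * 8 + u] * cos16((2 * x + 1) * u);
            tmp[v * 8 + x] = s / 2.0;
        }

    /* Vertical pass over the intermediate result, rounded to nearest. */
    for (x = 0; x < 8; x++)
        for (y = 0; y < 8; y++) {
            double s = 0.0;
            for (v = 0; v < 8; v++)
                s += (v ? 1.0 : c0) * tmp[v * 8 + x] * cos16((2 * y + 1) * v);
            s /= 2.0;
            block[y * 8 + x] = (short)(s >= 0.0 ? (int)(s + 0.5) : -(int)(-s + 0.5));
        }
}

/* Selectable implementations; faster variants would be appended here. */
static const idct_fn idct_table[] = { idct_reference };

static idct_fn select_idct(int choice)
{
    size_t n = sizeof idct_table / sizeof idct_table[0];
    if (choice < 0 || (size_t)choice >= n)
        choice = 0;                         /* unknown choice: use the reference */
    return idct_table[choice];
}
```

The decoder would call select_idct once at startup and then call through the returned pointer for every block, so switching iDCTs costs one indirect call per block rather than a branch inside the transform.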

I've added Sh0dan's suggestion of using BitBlt, and all the external code (i.e. using MPEG2Dec3.dll without AviSynth) seems to be working fine. (I've written a little command-line example to go with the source, i.e. GetPic d2vfile frame output.bmp, for capturing bitmaps.)

The speed is now definitely faster on all machines I've tested, but trbarry's intra/non-intra code is still slowing it down on my Athlon? I can't think why; I'll test more.

Hope that all sounds OK.

-Nic

22nd May 2003, 09:32   #67
JohnMK
Registered User
Join Date: Sep 2002
Location: Seattle
Posts: 551
Can you post your source or a binary? I'm just learning how to use Visual Studio .NET Professional + ICL 7.1, and I'd love to experiment.

22nd May 2003, 09:56   #68
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
Use the 1.04 source (first post of this thread) for now. I'll post mine when it's ready for release, which will hopefully be later today (I don't have it on me now; hopefully Tom/Sh0dan will fix the new Add_Block in that time).

I've got ICL 7.1 as well (well, I've got the evaluation version until they send the full license). It doesn't make any real difference, about 1-2 fps faster, and it also had a problem with some of the 3DNow assembler, if I remember correctly.

Cheers,
-Nic

22nd May 2003, 22:54   #69
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Quote:
The speed is now definitely faster on all machines I've tested, but trbarry's intra/non-intra code is still slowing it down on my Athlon? I can't think why; I'll test more.
Nic -

Did you ever check whether the following block of code (in Getpic) is even being executed? Mine's the only place in Getpic that checks the cpu.ssemmx flag, and I can't test that myself on an Athlon.

- Tom

Code:
/* decode blocks */
	// separate rtn for ssemmx now - trbarry 5/2003
	if (cpu.ssemmx)
	{
		for (comp=0; comp<block_count; comp++)
		{
			if (coded_block_pattern & (1<<(block_count-1-comp)))
			{
				if (*macroblock_type & MACROBLOCK_INTRA)
					Decode_MPEG2_Intra_Block_SSE(comp, dc_dct_pred);
				else
					Decode_MPEG2_Non_Intra_Block_SSE(comp);
				if (Fault_Flag) {
					#ifdef PROFILING
				//			stop_decMB_timer();
					#endif
					return 0;	// trigger: go to next slice
				}
			}
		}
	}
	else
	{
		/* ... existing non-SSE block-decode loop (not shown in this excerpt) ... */
	}

23rd May 2003, 08:40   #70
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
Yup, I did check... It does get called (as it should; the Athlon XP has all the extended instructions apart from SSE2).

I honestly don't know why it's slower, but it does appear to be (only slightly). Did you manage to fix add_block?

Cheers,
-Nic

23rd May 2003, 09:35   #71
Sigmatador
Guest
Posts: n/a
(Off-topic) Does anyone know this compiler?
http://www.codeplay.com/vectorc/bench.html

23rd May 2003, 12:28   #72
Acaila
Retired
Join Date: Jan 2002
Location: Netherlands
Posts: 1,529
I tried Codeplay out a while ago, and this is what I think of it:

- Although the demos look really promising, it is specifically made for vectorizing code (2D/3D modelling, rotations like the demos) and isn't very spectacular on normal code.
- I could hardly find any code that compiled with it at all. Almost everything gave an error of some sort.
- The latest version supports C++, but that support is so limited that you're better off using it only for plain C.
- In my opinion it was very expensive: $100 for a neutered version, $800 for the full version.
- The neutered version isn't capable of Athlon optimizations (although at first they said it would be the P4 that wouldn't be supported). And since I have an Athlon, I very much didn't like that.

23rd May 2003, 14:59   #73
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
VectorC is primarily aimed at floating-point optimizations. MPEG2DEC doesn't really contain any floating-point code, so I very much doubt it will be of much use here.

However, we really cannot know this until it is tested.
__________________
Regards, sh0dan // VoxPod

23rd May 2003, 15:07   #74
Sigmatador
Guest
Posts: n/a
I'm talking about VectorC not specifically for mpeg2dec3, but for general C/C++/asm compiling (and increasing filter speed, for people like me who write asm code that's slower than their pure C ^^).

23rd May 2003, 16:42   #75
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Quote:
Did you manage to fix add_block?
Nic -

I found it but haven't corrected it yet. There are two sections of MMX code in Add_Block, and the first one cannot be done in 8-bit arithmetic without overflowing. But I can still optimize it a bit. Hopefully I'll get a replacement out today.
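
If the residual add has to stay out of 8-bit arithmetic, one common MMX recipe is to widen the reference pixels to 16-bit words, add the residuals with signed saturation (PADDSW), and narrow back with unsigned saturation (PACKUSWB). Below is a scalar C model of that recipe, offered as an assumption about how such a fix can work, not as Tom's replacement code; add_row_16bit and the helper names are hypothetical.

```c
#include <assert.h>

/* Signed-saturating 16-bit add, modelling PADDSW on one lane. */
static short add_sat_s16(short a, short b)
{
    int s = a + b;
    return (short)(s < -32768 ? -32768 : (s > 32767 ? 32767 : s));
}

/* Signed word -> unsigned byte with saturation, modelling PACKUSWB on one lane. */
static unsigned char pack_us(short v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add one row of residuals to reference pixels in 16-bit lanes: widen each
   pixel (as PUNPCKLBW/PUNPCKHBW against a zeroed register would), add with
   saturation, then narrow back with unsigned saturation. */
static void add_row_16bit(unsigned char *rfp, const short *blk, int n)
{
    for (int i = 0; i < n; i++) {
        short widened = (short)rfp[i];
        rfp[i] = pack_us(add_sat_s16(widened, blk[i]));
    }
}
```

Because the sum of a 0..255 pixel and a residual in a few-thousand range always fits in 16 bits, this path matches a full-precision add followed by a 0..255 clamp.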

Should I still base it on v1.04?

- Tom

23rd May 2003, 17:46   #76
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Uh, conflicts.

IMO you should use 1.05. Maybe you should just check for SSE2 instead; that way it will only run on P4 boxes for now.
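
Gating the new routines on SSE2 comes down to testing one CPUID bit. A minimal sketch, assuming the caller has already read EDX from CPUID leaf 1 and, on AMD CPUs, from leaf 0x80000001 (how to read them is compiler-specific, e.g. MSVC's __cpuid or GCC's __get_cpuid). struct cpu_flags and the function names are hypothetical and only loosely mirror the cpu.ssemmx flag used in Getpic.

```c
#include <assert.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID1_EDX_MMX      (1u << 23)
#define CPUID1_EDX_SSE      (1u << 25)
#define CPUID1_EDX_SSE2     (1u << 26)
/* CPUID leaf 0x80000001, EDX bit 22 on AMD: MMX extensions (integer SSE subset).
   Pass 0 for amd_ext_edx on non-AMD CPUs. */
#define AMD_EXT_EDX_MMXEXT  (1u << 22)

struct cpu_flags {
    int mmx;
    int ssemmx;   /* integer SSE ops: Intel SSE or AMD MMX extensions */
    int sse2;
};

static struct cpu_flags decode_flags(unsigned int leaf1_edx, unsigned int amd_ext_edx)
{
    struct cpu_flags f;
    f.mmx    = (leaf1_edx & CPUID1_EDX_MMX) != 0;
    f.ssemmx = (leaf1_edx & CPUID1_EDX_SSE) != 0
            || (amd_ext_edx & AMD_EXT_EDX_MMXEXT) != 0;
    f.sse2   = (leaf1_edx & CPUID1_EDX_SSE2) != 0;
    return f;
}

/* Only SSE2-capable CPUs take the new block-decode routines for now. */
static int use_new_block_decode(struct cpu_flags f)
{
    return f.sse2;
}
```

With these definitions an Athlon XP (SSE and MMX extensions, no SSE2) keeps using the old code path, while a P4 takes the new one.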
__________________
Regards, sh0dan // VoxPod

23rd May 2003, 18:10   #77
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
Use any version you like, trbarry, or just post the fixed code snippets and I'll add them into the version I've made up.

I'll just take your code, fit it into my version, test it and release it. And then that will be that for a little while, I think.

Cheers,
-Nic

PS
BTW: I just quickly bunged the latest source up at:
http://nic.dnsalias.com/src.zip
Just in case you want to see what I've done so far.

Last edited by Nic; 23rd May 2003 at 18:21.

23rd May 2003, 18:38   #78
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Nic -

Oops. Too late to get the new one first.

I just posted a fixed Add_Block function snippet at:
www.trbarry.com/Add_Block.txt .

And maybe give some thought to Sh0dan's comments about SSE2 only for the performance functions, though it should run faster on P3s too. Maybe the Athlon has a slower bsr instruction (emulated?). I don't often use it, but I don't see a fast way around it.

- Tom

23rd May 2003, 18:47   #79
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Quote:
Originally posted by trbarry
Maybe Athlon has a slower bsr instruction (emulated?). I don't often use that but don't see a fast way around it.

- Tom
Yes, BSR is VectorPath and takes at least 10 cycles to execute. Furthermore, the code isn't pairable at all, leaving two of the three pipes unused.
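
One common way to avoid BSR is a byte-sized lookup table: find the highest nonzero byte with shifts, then look up the bit index within that byte. This is a swapped-in technique, not something proposed in the thread, and whether it beats BSR on a K7 would have to be measured; msb_table, init_msb_table and highest_bit are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* msb_table[i] = index of the highest set bit of i (1..255); -1 for 0. */
static signed char msb_table[256];

static void init_msb_table(void)
{
    msb_table[0] = -1;
    for (int i = 1; i < 256; i++)
        msb_table[i] = (signed char)(msb_table[i >> 1] + 1);
}

/* Same result as BSR for a nonzero 32-bit value. Returns -1 for 0, where
   BSR leaves its destination undefined. Call init_msb_table() first. */
static int highest_bit(uint32_t v)
{
    if (v >> 16) {
        if (v >> 24) return 24 + msb_table[v >> 24];
        return 16 + msb_table[v >> 16];
    }
    if (v >> 8) return 8 + msb_table[v >> 8];
    return msb_table[v];
}
```

The shifts and compares are plain DirectPath integer operations, but the function still contains data-dependent branches, so its real cost depends on how predictable the input values are.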
__________________
Regards, sh0dan // VoxPod

23rd May 2003, 19:31   #80
Nic
Moderator
Join Date: Oct 2001
Location: England
Posts: 3,285
Thanks, Tom. I'm off to see The Matrix Reloaded tonight, so I can't test now. But I've got a P4 and an old Athlon 800 to test on at home, so I'll try them and check.

Cheers,
-Nic

All times are GMT +1.