Koepi vs personal build

plugh · 19th April 2007, 22:22

I had noted in the past that Koepi's build of 1.1.2 was bigger than my builds.
For example, xvidcore.dll - Koepi 748KB - mine 588KB

I just chalked it up to different compilers but assumed the output was the same.

I don't normally muck around with the core (my interest has been the 2nd pass vbv stuff, which I moved to vfw.dll) so I normally use Koepi's core.dll, but I was just trying an experiment and discovered that my assumption was incorrect. For the same clip and encoder options I get somewhat different output files depending upon which xvidcore build I use.

Which begs the questions:
Which one is 'right'?
How do I determine that?
If mine is 'wrong' how do I fix it?

I've been using the visual studio workspace/project files included with the xvid source kit. I'm using MS Visual Studio 6 with Service Pack 5, Processor Pack 5, and nasm 0.98.39. I was under the impression this was a 'correct' setup. I have noted that I do get an error about an unknown compiler option 'Qipo' but as I understand it that is an option used by the Intel C compiler, not the MS one, and is harmless - perhaps not?

Suggestions?

Dark Shikari · 20th April 2007, 00:27

He's probably using more aggressive optimization options. In GCC land, that might be -O3 and -funroll-loops that would increase the size as such.

plugh · 20th April 2007, 01:58

Yeah, I figure he's using a different compiler and/or options.

The significant point is that the results are different as well.

As a first take, I diff'd the .pass files for my 7800 frame clip. These were both "full quality" first passes (not using the fast approx routines).

There are no differences in I frames (interval 240), scattered differences in isolated b-frames, and two segments with runs of different p and b frames - one 198 frames long, the other 210 frames long. For example: ( <! is Koepi, !> is mine )

Code:

 <! b 3 0 2560 0 4104 2879
 !> b 3 0 2560 0 4106 2900

 <! b 3 0 2560 0 5498 4357
 !> b 3 0 2560 0 5471 4337

 <! b 3 0 2560 0 15629 7732
 !> b 3 0 2560 0 15652 7742

 <! b 3 0 2560 0 13280 6691
 !> b 3 0 2560 0 13280 6690

 <! b 3 0 2560 0 7909 5051
 !> b 3 0 2560 0 7907 5048

 <! b 3 0 2560 0 10205 5730
 !> b 3 0 2560 0 10211 5734

 <! b 3 0 2560 0 11031 5689
 !> b 3 0 2560 0 11034 5682

 <! b 3 0 2560 0 4950 2688
 !> b 3 0 2560 0 4947 2688

 <! b 3 0 2560 0 3489 2439
 !> b 3 0 2560 0 3495 2437

and

Code:

 <! p 2 47 2505 8 17747 5163
 !> p 2 46 2506 8 17743 5161
 <! b 3 0 2560 0 4003 2819
 !> b 3 0 2560 0 3999 2814
 <! p 2 48 2503 9 17272 5148
 !> p 2 48 2503 9 17271 5145
 <! b 3 0 2560 0 3695 2587
 !> b 3 0 2560 0 3705 2596
 <! p 2 47 2512 1 17926 5131
 !> p 2 47 2512 1 17929 5132
 <! b 3 0 2560 0 3206 2219
 !> b 3 0 2560 0 3204 2215
 <! p 2 45 2511 4 17892 4939
 !> p 2 46 2510 4 17906 4939
 <! b 3 0 2560 0 4197 2658
 !> b 3 0 2560 0 4220 2673
 <! p 2 111 2442 7 18951 5259
 !> p 2 112 2441 7 18939 5250
 <! b 3 0 2560 0 2699 1958
 !> b 3 0 2560 0 2699 1952
 <! p 2 75 2474 11 16939 4795
 !> p 2 75 2474 11 16949 4799
 <! b 3 0 2560 0 3496 2529
 !> b 3 0 2560 0 3493 2529
 <! p 2 68 2484 8 17503 4963
 !> p 2 68 2484 8 17502 4963
 <! b 3 0 2560 0 3120 2309
 !> b 3 0 2560 0 3113 2298
 <! p 2 70 2480 10 18055 4899
 !> p 2 70 2480 10 18068 4894
(continues)

fwiw, output files
Koepi 118,568,960 bytes
mine 118,566,912 bytes

Does this provide any clues as to what / where / why they are different?
How do you determine which one is "correct"?
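The .pass comparison described in this post can be automated instead of reading raw diff output. Below is a minimal sketch in C that walks two stats files in step and reports each differing frame. The field names (quant, kblks, mblks, ublks, length, hlength) are an assumption about the seven-column layout shown above, and the function names are made up for illustration; lines that do not parse as seven fields (comments, blanks) are skipped.

```c
#include <assert.h>
#include <stdio.h>

/* One line of a first-pass stats file. Field names are an assumption. */
typedef struct {
    char type;                      /* 'i', 'p' or 'b' */
    int quant, kblks, mblks, ublks;
    int length, hlength;            /* assumed: frame bytes, header bytes */
} PassEntry;

/* Parse one stats line; returns 1 on success, 0 for comments or junk. */
static int parse_pass_line(const char *line, PassEntry *e)
{
    return sscanf(line, " %c %d %d %d %d %d %d",
                  &e->type, &e->quant, &e->kblks, &e->mblks,
                  &e->ublks, &e->length, &e->hlength) == 7;
}

/* Read the next parsable entry from a stream; returns 0 at end of file. */
static int next_entry(FILE *f, PassEntry *e)
{
    char line[256];
    while (fgets(line, sizeof line, f))
        if (parse_pass_line(line, e))
            return 1;
    return 0;
}

static int entries_equal(const PassEntry *a, const PassEntry *b)
{
    return a->type == b->type && a->quant == b->quant &&
           a->kblks == b->kblks && a->mblks == b->mblks &&
           a->ublks == b->ublks && a->length == b->length &&
           a->hlength == b->hlength;
}

/* Walk both files in step, print each differing frame, return the count. */
static int diff_pass_streams(FILE *fa, FILE *fb, FILE *out)
{
    PassEntry a, b;
    int frame = 0, ndiff = 0;

    assert(fa && fb && out);
    while (next_entry(fa, &a) && next_entry(fb, &b)) {
        if (!entries_equal(&a, &b)) {
            fprintf(out, "frame %d (%c/%c): length %d vs %d, hlength %d vs %d\n",
                    frame, a.type, b.type, a.length, b.length,
                    a.hlength, b.hlength);
            ndiff++;
        }
        frame++;
    }
    return ndiff;
}
```

To use it, open the two .pass files with fopen and call diff_pass_streams(fa, fb, stdout). Frame numbers are counted from 0 in stats-file order, and the comparison stops at the end of the shorter file.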

henryho_hk · 20th April 2007, 02:44

Celtic Druid's 01 Apr builds are nearly 1MB. I am using Celtic Druid's March builds because the April snapshot seems to have problems in 2-pass rate control.

It appears to be difficult to judge the "correctness" as we don't even have a reference code base (which beta, alpha or even snapshots?) nor build (maybe plain MSVC 7 or old stable GCC w/o any optimization?).

Dark Shikari · 20th April 2007, 03:08

If they are using exactly the same codebase, the only thing I can think of is that he's using some option like -ffast-math that doesn't abide perfectly by the ANSI C standards. It's most likely that you're using different codebases though.

plugh · 20th April 2007, 04:16

The source kit is the "Xvid 1.1.2 stable release" source kit from xvid.org. The source kit includes "Generic install procedure for Win32/MSVC" instructions (from 2004), and includes MS Visual Studio workspace and project files. Assuming you have the required software, unzip it, open the workspace, select the project, and build it. Done!

I believe Koepi's build is also based upon this code base.

Like I said, the only glitch I encountered is that the canned build wants to use a "Qipo" compile switch, which MSVC noted and ignored.

To be fair, the bulk of the encode appears to be the same - perhaps 500 frames out of the 7800 were different. But that still seems like a lot from simply using a different compiler.

henryho_hk · 20th April 2007, 06:46

Sorry for my ignorance.

Isn't "/Qipo" an ICL option? Owing to the past records of ICL, I believe your pure MSVC compile is more "correct".

plugh · 20th April 2007, 07:29

Yes, from what I was able to find on the net, I believe that is an Intel C compiler option. I too thought that was odd, coming from the Xvid canned Visual Studio project files.

I have a (not-installed) copy someplace in my archives. I vaguely recall there was some kind of integration, where you could substitute it for the MS C compiler - compatible command lines etc. But the docs included in the xvid source kit don't mention it at all, just gcc and MSVC.

celtic_druid · 21st April 2007, 09:45

Yeah, with ICL installed there is an option in VC6 to pick what compiler to use. /Qipo enables multi-file optimisation.

plugh · 21st April 2007, 17:10

Hmmm. So perhaps I should try using ICL and see what happens.

As there doesn't appear to be a clear answer as to which is "correct", it occurred to me to ask 'which is better'? In that context, I stumbled across this MSU Video Quality Measurement Tool which appears to be tailor made for this - allowing comparison between source (avs input) and two alternative encodes (avi input).

Has anyone else used this tool? Any advice?

sysKin · 22nd April 2007, 08:46

The output should be identical regardless of the compiler. Something's funny, maybe Koepi used some earlier sources?

celtic_druid · 22nd April 2007, 13:52

In that case I guess also try my 1.1.2 compile since it was also compiled with ICL. See if the output matches.

plugh · 22nd April 2007, 15:25

Koepi and celtic_druid builds produced identical results.

So it seems safe to conclude both used same sources, and both used ICL?

celtic_druid, did you use the 1.1.2 source kit download zip file, or cvs?

Anyone got a gcc build of 1.1.2 xvidcore.dll ?

Guess I'll dig out ICL from my archives, and see what that gives me...

EDIT: change 'identical' to 'nearly identical' - the .pass files *are* identical, the avi files differ by _one_ byte part way in.

celtic_druid · 22nd April 2007, 15:40

Could have been zip/tar.bz or CVS. Can't remember.

http://ffdshow.faireal.net/mirror/Xv...id-1.1.2gcc.7z
Should be a bunch of gcc builds for different CPU's from recollection.

plugh · 22nd April 2007, 18:28

As I'm using an 'Applebred' (Duron 1.8Ghz) cpu, I used the gcc Athlon-XP build (and I'm not overclocking).

Results:

Comparing the Koepi and gcc-xp .pass files, I get a set of differences ("set1")
Comparing gcc-xp and my-msvc .pass files, I get a set of differences ("set2")
Comparing my-msvc and Koepi .pass files, I get a set of differences ("set3")

-->in the context of this test clip<--

set3 has the smallest number of differences, set2 the largest.

working from smallest to largest, comparing

set3 and set1 - there are two frames in set3 not in set1
(set3 is NOT a proper subset of set1)
set1 and set2 - set1 IS a proper subset of set2

I'm using a different clip than I used above; with this clip, ALL the differences are B-frames (no P-frames like above).

I also dug out and installed ICL 9.0.28 (fairly old), recompiled using it, and compared to Koepi .pass file - only three frames different, interestingly 2 P's and 1 B, and each by only one byte in length.
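The subset relations between the difference sets can be checked mechanically once each set is reduced to a sorted list of differing frame numbers. This is a minimal sketch in C; the function names are made up for illustration, and it assumes each array is sorted ascending with no duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every element of a[] also appears in b[].
 * Both arrays must be sorted ascending and free of duplicates. */
static int is_subset(const int *a, size_t na, const int *b, size_t nb)
{
    size_t i = 0, j = 0;

    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);
    while (i < na && j < nb) {
        if (a[i] == b[j]) {
            i++;
            j++;
        } else if (a[i] > b[j]) {
            j++;                /* b has an extra frame, keep scanning */
        } else {
            return 0;           /* a[i] is missing from b */
        }
    }
    return i == na;             /* all of a was matched */
}

/* Proper subset: contained in b, and b has at least one extra element. */
static int is_proper_subset(const int *a, size_t na, const int *b, size_t nb)
{
    return na < nb && is_subset(a, na, b, nb);
}
```

With this, "set1 is a proper subset of set2" is is_proper_subset(set1, n1, set2, n2) returning 1, and "two frames in set3 not in set1" shows up as is_subset(set3, n3, set1, n1) returning 0.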

plugh · 22nd April 2007, 19:39

Another quick set of comparisons...

Same clip as above, same three builds (Koepi, my-msvc, gcc-xp)
This time, vhq=0 and 'vhq for b-frames' off (was vhq=3 and 'on' above)

my-msvc vs gcc-xp - .pass files were identical, and _one_ byte difference in avi files (same byte position as above when comparing Koepi and druid ICL builds, but different values)

Koepi vs gcc-xp, and Koepi vs my-msvc - many differences, including both P and B frames

So msvc and gcc give same results, until you turn on vhq.
ICL builds give lots of differences from this common result.

(I stated this previously, but as a reminder, these were 'full quality' first passes; NOT using the fast approximation routines.)
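Locating a one-byte difference between two avi files, and confirming it sits at the same position in another pair, only needs a byte-by-byte walk. This is a minimal sketch in C; the function name is made up for illustration. It compares only the common prefix, so a size mismatch between the files has to be checked separately.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two streams byte by byte over their common length.
 * Returns the number of differing byte positions; *first receives
 * the offset of the first difference, or -1 if there is none. */
static long count_byte_diffs(FILE *fa, FILE *fb, long *first)
{
    long offset = 0, ndiff = 0;
    int ca, cb;

    assert(fa && fb && first);
    *first = -1;
    while ((ca = getc(fa)) != EOF && (cb = getc(fb)) != EOF) {
        if (ca != cb) {
            if (*first < 0)
                *first = offset;
            ndiff++;
        }
        offset++;
    }
    return ndiff;
}
```

Open both files in binary mode ("rb") before calling it, otherwise some platforms will translate line endings and shift the offsets.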

plugh · 22nd April 2007, 23:35

More comparisons between msvc and gcc build...

They continue to produce identical results, across the entire range of vhq settings, as long as "VHQ for B-frames" is off.

Looks like there is something in estimation_rd_based_bvop.c and/or its unique subordinates that causes msvc/gcc to differ.

These tests also lead to the conclusion that the ICL builds are BORKED.

I'm continuing to compare behaviour on other code paths - qpel, gmc, ...

sysKin · 23rd April 2007, 01:56

Quote:

Originally Posted by plugh

They continue to produce identical results, across the entire range of vhq settings, as long as "VHQ for B-frames" is off.

Looks like there is something in estimation_rd_based_bvop.c and/or its unique subordinates that causes msvc/gcc to differ.

You're on to something here ~~ perhaps you'll find the evil elusive bug that causes the output to sometimes (very very rarely) depend on the number of threads.

plugh · 23rd April 2007, 02:52

I'm not doing multiple threads, so not sure how this helps that.

FYI - I've been using a custom profile that has 4MV off; but have tested 4MV on as well, and so far there does not seem to be dependence on that flag elsewhere - the two builds continue to give identical results with vhq-b off.

I've decided to hold off on the qpel testing; I don't use it anyway, and it further multiplies the number of tests. (Though I did do one; vhqb=off, vhq=4, 4mv=on, qpel=on, h263, yielding a single p-frame difference between the builds.)

There seem to be three major predicates in that vhq-b module; h263 or not, inter4v or not, qpel or not. So I'll try some test cases and see what turns up...

But I'm back to my "original" problem - which build (msvc, gcc), if either, is behaving "correctly"? Perhaps this is a gcc glitch...

Are there any xvidcore.dll builds around using yet another compiler? The ICL ones are out, since they differ no matter what encoder options I select...

BTW syskin, perhaps you can answer a question. I've noticed several constructs in that module similar to this one

Code:

	switch(mode) {
		case MODE_DIRECT: return Data_d->iMinSAD[0];
		case MODE_FORWARD: return Data_f->iMinSAD[0];
		case MODE_BACKWARD: return Data_b->iMinSAD[0];
		default:
		case MODE_INTERPOLATE: return Data_i->iMinSAD[0];
	}

My question is - is 'case interpolate' the correct path when mode is DIRECT_NONE_MV or (in particular) DIRECT_NO4V?
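As far as the C language goes, a `default:` label stacked directly on a `case` label sends every unlisted value down that same branch. The sketch below only demonstrates that language rule; it does not say whether that is the intended behaviour in Xvid. The enum values are hypothetical stand-ins, and the function returns a tag character instead of a SAD value.

```c
#include <assert.h>

/* Hypothetical mode values, for illustration only. */
enum {
    MODE_DIRECT,
    MODE_INTERPOLATE,
    MODE_BACKWARD,
    MODE_FORWARD,
    MODE_DIRECT_NONE_MV,
    MODE_DIRECT_NO4V
};

/* Same label layout as the quoted switch, returning a tag per branch. */
static char pick_branch(int mode)
{
    switch (mode) {
        case MODE_DIRECT:   return 'd';
        case MODE_FORWARD:  return 'f';
        case MODE_BACKWARD: return 'b';
        default:                        /* anything not listed above ... */
        case MODE_INTERPOLATE: return 'i';  /* ... lands here too */
    }
}

/* Self-check: both unlisted DIRECT variants take the interpolate branch. */
static void check_unlisted_modes(void)
{
    assert(pick_branch(MODE_DIRECT_NONE_MV) == 'i');
    assert(pick_branch(MODE_DIRECT_NO4V) == 'i');
}
```

So with this label layout, a mode of DIRECT_NONE_MV or DIRECT_NO4V would read the interpolate data rather than the direct data, unless the callers never pass those values.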

plugh · 23rd April 2007, 15:32

Continuing my comparisons between msvc and gcc,
I tried various relevant encoder options to probe
the "vhq for b-frames" behavioural difference.

The results were inconclusive.
So I decided to try a different tack.

Note that I am using a Duron 1.8GHz
that has mmx, xmm, sse, 3dne, 3dne2

Using encode options vhq-b=on, vhq=1, 4mv=off
(this is _a_ case where msvc and gcc builds differ)

Code:

Using a normal 'optimized' msvc build, compare

h263,default	h263,mmxonly	identical
mpeg,default	mpeg,mmxonly	different

Using a build with estimation_rd_based_bvop.c compiled noopt

h263,default	h263,mmxonly	identical
mpeg,default	mpeg,mmxonly	different

So far so good. Horizontal differences, while not ideal,
*may* simply indicate an accuracy difference between the
mmx asm routines, and the default mixed, non-orthogonal,
set used on my processor.

Now if there is NO optimization sensitivity on the relevant
C code paths, then comparing the above 8 encodes *vertically*
should give me all identical comparison results. They don't!

Code:

identical	identical
DIFFERENT	identical

To make this clear, whether this module is optimized or not
*should* have no effect on its results. The vertical comparisons
*should* all be 'identical'. But it appears there is an interaction
between the optimization status of this module and the default mix
of asm routines used on my processor. This is not good.
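The horizontal and vertical comparisons over the eight encodes can be written down as a small grid lookup. This is a minimal sketch in C, assuming each encode has been reduced to some checksum string beforehand; the enum names and function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

enum { OPT, NOOPT };               /* how the bvop module was compiled */
enum { H263, MPEG };               /* quantizer type */
enum { CPU_DEFAULT, CPU_MMXONLY }; /* asm routine selection */

/* Horizontal: same build and quantizer, default vs mmx-only routines. */
static int horizontal_same(const char *d[2][2][2], int build, int quant)
{
    assert(d != NULL);
    return strcmp(d[build][quant][CPU_DEFAULT],
                  d[build][quant][CPU_MMXONLY]) == 0;
}

/* Vertical: same quantizer and routines, optimized vs non-optimized. */
static int vertical_same(const char *d[2][2][2], int quant, int cpu)
{
    assert(d != NULL);
    return strcmp(d[OPT][quant][cpu], d[NOOPT][quant][cpu]) == 0;
}
```

If the C code paths had no optimization sensitivity, vertical_same would return 1 for all four quantizer/routine combinations, whatever the horizontal results are.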

For what it's worth, the relevant asm routines are:

quant_mpeg_inter_xmm and dequant_mpeg_inter_3dne
--vs--
quant_mpeg_inter_mmx and dequant_mpeg_inter_mmx
(and perhaps fdct_mmx_skal vs fdct_xmm_skal )

These are called from within two 'large' static inline routines.
Block_CalcBits_BVOP and Block_CalcBits_BVOP_direct

These two inline routines are virtually identical (only ONE line
is different), and they are invoked multiple times both within
and outside of loops. My gut tells me that the various compiler
optimizers are having a field day with this.

Where to go from here? Beats me - perhaps someone 'out there'
can look at those asm routines and the relevant C code and figure
out why there is an interaction with the C compiler's optimizer...
