23rd April 2007, 22:24 | #23
A hollow voice says
Join Date: Sep 2006
Posts: 269
The difference set in my preceding post, between the two mpeg+default-cpuflags encodes (using opt or noopt on that bvop rd module), consists of a ~180-frame sequence of P and B frames, starting at a P frame and ending at the next I frame. The rest of the 5000-frame clip is identical.
Very curious indeed - somehow opt/noopt of *this* module polluted/affected a P frame?

Ran another series of tests, expanding on the 'diff' axes:
  constant -> vhq-b=on, vhq=1, 4mv=off, mpeg
  variable -> range of cpu flags
10 encodes
Code:
               /diff\   /same\   /same\   /SAME\
full optimize   mmx      xmm      sse      3dn      3de
                 |        |        |        |        |
                same     same     same     same     DIFF
                 |        |        |        |        |
noopt bvop rd   mmx      xmm      sse      3dn      3de
               \diff/   \same/   \same/   \DIFF/
Well, at least it further narrows things down...

Last edited by plugh; 23rd April 2007 at 23:54.
24th April 2007, 00:44 | #24
A hollow voice says
Join Date: Sep 2006
Posts: 269
Well, I still don't know what/why, but I can say this particular issue IS _directly_ related to the asm routine dequant_mpeg_inter_3dne (in module quantize_mpeg_xmm.asm) and NOT the other 3dne asm routines...
I modified xvid.c and commented out the function-pointer assignment for this routine, leaving all the others alone. Rebuilt, with bvop rd still 'noopt', reran that corner case, and now the output matches.

Really need someone who knows those SIMD instructions to look at that routine... Why, with the routine enabled, do we get different output for the 'opt' and 'noopt' cases?

Last edited by plugh; 24th April 2007 at 00:56.
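To be concrete, the experiment amounts to disabling a single assignment in the cpu-flags setup, roughly like this (a sketch, not a patch; the pointer name dequant_mpeg_inter is my assumption based on the routine's name):
Code:
	/* in xvid.c, inside the 3DNow!-ext capability block */
#if 0	/* experiment: leave the default dequant routine in place */
	dequant_mpeg_inter = dequant_mpeg_inter_3dne;
#endif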
24th April 2007, 15:26 | #26
A hollow voice says
Join Date: Sep 2006
Posts: 269
I can't comment in general, but I don't see any floats in that module, at least. I do know there are floats in plugin_2pass2, but that wouldn't impact this.
I remembered I had a VMware virtual machine with W2K and a gcc setup on it (MSYS 1.0.10, MinGW 4.1.0, gcc 3.4.4). Added nasm, installed the xvid 1.1.2 sources, tried doing a build - success! So now I can poke at the gcc-built version and see what it reveals about that module.

FWIW, the canned xvid+gcc build procedure uses the following gcc flags:
Code:
-Wall -O2 -fstrength-reduce -finline-functions -freduce-all-givs -ffast-math -fomit-frame-pointer
(note the 'fast-math' referred to earlier) Guess I need to read up on them...

re-EDIT: Just did a quick one-shot encode comparison between gcc builds with/without the fast-math flag --> one B frame is slightly different in the .pass files. However, the gcc build *with* fast-math and msvc are in agreement on that particular frame.

Last edited by plugh; 24th April 2007 at 17:16.
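As an aside, the reason -ffast-math can change output at all: it lets the compiler reassociate floating-point arithmetic, and float addition isn't associative. A minimal standalone illustration (my example; nothing to do with xvid's actual float usage):
Code:
#include <stdio.h>

int main(void)
{
	float a = 1e20f, b = -1e20f, c = 1.0f;
	/* mathematically identical, but not in float arithmetic */
	printf("(a+b)+c = %g\n", (a + b) + c);	/* (0) + 1 -> 1 */
	printf("a+(b+c) = %g\n", a + (b + c));	/* c is absorbed into b -> 0 */
	return 0;
}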
25th April 2007, 21:26 | #28
A hollow voice says
Join Date: Sep 2006
Posts: 269
It has been an extremely tedious process, but I have tracked back to a specific chunk of code that gives different results when compiled with msvc vs gcc.
(When I say tedious, I mean it - tracking backwards through the code, identifying where a particular rd mode evaluation for a particular macroblock in a particular frame goes weird.)

The chunk of code is
Code:
static __inline uint32_t
d_mv_bits(int x, int y, const VECTOR pred, const uint32_t iFcode, const int qpel)
{
	unsigned int bits;

	x <<= qpel;
	y <<= qpel;

	x -= pred.x;
	bits = (x != 0 ? iFcode : 0);
	x = -abs(x);
	x >>= (iFcode - 1);
	bits += r_mvtab[x+63];

	y -= pred.y;
	bits += (y != 0 ? iFcode : 0);
	y = -abs(y);
	y >>= (iFcode - 1);
	bits += r_mvtab[y+63];

	return bits;
}
The arguments being passed in this particular case are:
  x=-64  y=63  pred={x=63, y=15}  iFcode=2  qpel=0

msvc produces the value 4128837
gcc produces the value 14
I manually calc it, and I get 26

The call stack is ModeDecision_BVOP_RD -> SearchInterpolate_RD -> CheckCandidateRDInt -> the first instance in the following statement
Code:
rd += BITS_MULT * (d_mv_bits(xf, yf, data->predMV, data->iFcode, data->qpel^data->qpel_precision) +
                   d_mv_bits(xb, yb, data->bpredMV, data->iFcode, data->qpel^data->qpel_precision));
25th April 2007, 22:24 | #29
A hollow voice says
Join Date: Sep 2006
Posts: 269
Found another one: different macroblock, different rd mode.

The arguments being passed:
  x=63  y=24  pred={x=-64, y=-20}  iFcode=2  qpel=0

msvc computes 4128837 (again)
gcc computes 14 (again)
I manually calculate 26 (again)

Call stack is ModeDecision_BVOP_RD -> SearchBF_RD (mode is Forward) -> CheckCandidateRDBF -> the following line
Code:
rd += BITS_MULT * (d_mv_bits(x, y, data->predMV, data->iFcode, data->qpel^data->qpel_precision) - 2);
25th April 2007, 23:10 | #30
A hollow voice says
Join Date: Sep 2006
Posts: 269
Bingo - I see it!
-127 integer divide by two is -63
-127 shift right once (sign extended) is -64

With x = -64, the index x+63 is -1, so the lookup runs off the front of the array...

So the next question is: is this a bug in the routine, or is "-64" an illegal/out-of-range value for a vector? Perhaps some asm routine is not rounding / range-limiting correctly?

Last edited by plugh; 25th April 2007 at 23:25.
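The two behaviours side by side, as a standalone snippet (my illustration, not xvid code; assumes the usual x86 arithmetic right shift for negative ints):
Code:
#include <stdio.h>

int main(void)
{
	int x = -127;
	printf("%d / 2  = %d\n", x, x / 2);	/* -63: C division truncates toward zero */
	printf("%d >> 1 = %d\n", x, x >> 1);	/* -64: arithmetic shift rounds toward -infinity */
	/* d_mv_bits uses the shift, so it indexes r_mvtab[-64 + 63] = r_mvtab[-1] */
	printf("index  = %d\n", (x >> 1) + 63);	/* -1: out of bounds */
	return 0;
}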
26th April 2007, 02:27 | #31
A hollow voice says
Join Date: Sep 2006
Posts: 269
Well, as an experiment, I changed three lines in motion_inlines.h
Code:
static const int r_mvtab[64] = {    to    static const int r_mvtab[65] = {12,
bits += r_mvtab[x+63];              to    bits += r_mvtab[x+64];
bits += r_mvtab[y+63];              to    bits += r_mvtab[y+64];
Compared the paired output .pass and .avi files, and they are identical now!

Not saying the above is a "fix", but it does seem to show that both compilers are generating equivalent functional representations of the source code (unlike the ICL builds). I'll probably run some more comparison series (range of VHQ, range of cpu flags), but I have much greater confidence that my builds are 'right' now.

I hope someone knowledgeable will chime in and indicate whether "-64" is a valid value for a vector component - if it is, then the above *is* a fix. If not, it's just a workaround for some badly behaved code elsewhere (which both the msvc and gcc compilers are building as directed )

Might be interesting to see if this change improves psnr/ssim/xyzzy...

The other weirdness, with the opt vs noopt msvc builds and that one asm routine - I'm not sure what to think about that one. As an experiment, I added a 'femms' instruction to the asm file just before the return, and it magically caused the noopt build to produce the same output as the opt build - not the other way around. Again, I hope someone more knowledgeable will look at that oddity...
26th April 2007, 06:10 | #32
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
A motion vector goes from -2^x to 2^x - 1/2 (or 1/4 for QPel), so yes, -64 is valid (in your case, -64 is -16 integer pixels, and 63 is 15.75 integer pel).
26th April 2007, 10:03 | #33
Registered User
Join Date: Jun 2002
Location: Adelaide, Australia
Posts: 1,167
Whoa plugh, what great work.
Yes, -64 is valid. So we were nicely reading r_mvtab[-1]? Great. I wonder why memory access analysis tools didn't pick it up.

I suppose I should stick this d_mv_bits() after the motion vector writing code and assert that the calculated length is the actual bitstream length. That would make us 100% sure nothing else is wrong. Although, then again, I did have such an assertion for a whole macroblock (part of VHQ debugging). I suppose vectors of -64 were never chosen (as they appeared to be horribly costly, 44 kilobits!) and therefore the assertion was never hit.
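Such an assertion could look roughly like this (a sketch only; mv, pred, bs, iFcode come from the surrounding encoder context, and BitstreamPos / CodeVector are my assumed names for the bit-position query and the MV writer):
Code:
/* needs <assert.h>; placed right where the encoder writes a motion vector */
const uint32_t predicted = d_mv_bits(mv.x, mv.y, pred, iFcode, 0 /* qpel */);
const uint32_t start = BitstreamPos(bs);
CodeVector(bs, mv.x - pred.x, iFcode);	/* differential x component */
CodeVector(bs, mv.y - pred.y, iFcode);	/* differential y component */
assert(predicted == BitstreamPos(bs) - start);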
26th April 2007, 13:44 | #34
A hollow voice says
Join Date: Sep 2006
Posts: 269
So that three-line change *can* be considered a "fix" for 1.1.2?

Given I'm only working with "HD" encodes (and with 4mv off), perhaps that magnitude of vector is somewhat more likely? ie MB displacement across X% of the image literally crosses more pixels? (Don't know what I'm talking about, but it sounds good anyway )

BTW, there was one other thing in the huge volume of debug print data I collected that struck me - I'll pass it on, for whatever it is worth. In ModeDecisionBVOP_RD, right after the "evaluate cost of all modes" loop, the values for d_rd, f_rd, b_rd, i_rd were frequently the same (with my short test clip). The code is evaluating the modes in increasing SAD order, but in this case should it simply choose the 'first' mode at that cost? EDIT: Duh - stupid question; you want the one with the lowest SAD. Never mind...

Anyway, it happens enough (multiple modes yielding the same rd) in my data that it caught my eye, so I thought I'd mention it. Seemed odd, given radically different code paths.

Examples - frame 182; the 4 SADs, evaluation order stuff, x/y of MB, the four RD costs, the chosen cost/mode:
Code:
182 ds=464 bs=301 fs=464 is=238 bst=238 order=1 2 0 3 num=4 I0 B1 D2 F3 x=41 y=0 d=770 f=786 b=770 i=770 rd=770 mod=1
182 ds=324 bs=306 fs=340 is=216 bst=216 order=1 2 0 3 num=4 I0 B1 D2 F3 x=22 y=1 d=1179 f=1195 b=1179 i=1179 rd=1179 mod=1

Last edited by plugh; 26th April 2007 at 15:58.
27th April 2007, 08:31 | #35
A hollow voice says
Join Date: Sep 2006
Posts: 269
Out of curiosity, I also did a build using ICL 9.0.28 with the above "fix", and compared it to the msvc/gcc builds.
The difference set is now much smaller; however, there are still differences. I've poked at it some, and made the following observations.

1) Every so often, the VOP header is a single bit longer than 'usual'. This extra bit is sometimes enough to make the byte-padded frame a single byte longer. The msvc and ICL builds do not do this 'in sync' with each other. Thus, a comparison of the .pass files for ICL vs msvc shows occasional one-byte frame length differences. No such difference is observed comparing msvc vs gcc .pass files. The source of this difference in behaviour is the following routine in encoder.c
Code:
static void
simplify_time(int *inc, int *base)
{
	/* common factor */
	const int s = gcd(*inc, *base);
	*inc /= s;
	*base /= s;

	if (*base > 65535 || *inc > 65535) {
		int *biggest;
		int *other;
		float div;

		if (*base > *inc) {
			biggest = base;
			other = inc;
		} else {
			biggest = inc;
			other = base;
		}

		div = ((float)*biggest) / ((float)65535);
		*biggest = (unsigned int)(((float)*biggest) / div);
		*other = (unsigned int)(((float)*other) / div);
	}
}
2) If I encode a very short sequence of frames (so that I don't encounter that extra bit/byte 'time' thing above), then binary-compare the avi files, I consistently see a single byte difference per frame. In my test case, the msvc build will have an 'FF' where the ICL build has an 'FB'. I don't have any tool to parse the avi and tell me where this byte is in the frame (though I would guess it's at the end?). Again, the msvc and gcc builds show no such difference. I'm suspicious of the "bitstream" code in this case, but will leave that as 'an exercise' for someone else...

Last edited by plugh; 27th April 2007 at 08:46.
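Since the one-bit drift comes from float division plus truncation, an integer-only variant would make every compiler agree. A hedged sketch of that idea (mine, not XviD's; the inline gcd stands in for the helper encoder.c already uses):
Code:
#include <stdint.h>

/* stand-in for the gcd helper encoder.c already has */
static int gcd(int a, int b) { while (b) { int t = a % b; a = b; b = t; } return a; }

static void simplify_time_int(int *inc, int *base)
{
	const int s = gcd(*inc, *base);
	*inc /= s;
	*base /= s;

	if (*base > 65535 || *inc > 65535) {
		const int64_t big = (*base > *inc) ? *base : *inc;
		/* 64-bit integer scaling truncates identically on every compiler */
		*inc  = (int)(((int64_t)*inc  * 65535) / big);
		*base = (int)(((int64_t)*base * 65535) / big);
	}
}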
27th April 2007, 15:56 | #36
A hollow voice says
Join Date: Sep 2006
Posts: 269
When I did my initial "fixed" msvc/gcc/icl compare above I collected one other datum, which yielded a quite surprising comparative result:
Code:
time (min:sec) to complete test-clip encode

             msvc    gcc     icl
h263 quant   16:11   16:30   16:39
mpeg quant   17:35   17:52   18:22

dll size     580KB   728KB   808KB
The only hypothesis I can come up with is that the more compact dll works better with my cache-challenged Duron processor. Guess which one I'll be using for my future encodes 

If anyone wants to experiment, attached is an msvc build of v1.1.2 xvidcore with the above array-size "fix".

EDIT: withdrawn, based upon syskin's post below. Updated build here

Last edited by plugh; 28th April 2007 at 20:04.
28th April 2007, 07:01 | #37
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
If it helps, avisynth had its own problems with fps and ended up with this function to fix things up:
Code:
// This function uses continued fractions to find the best rational
// approximation that satisfies (denom <= limit). The algorithm
// is from Wikipedia, Continued Fractions.
//
static void reduce_frac(unsigned &num, unsigned &den, unsigned limit)
{
	unsigned n0 = 0, n1 = 1, n2, nx = num;	// numerators
	unsigned d0 = 1, d1 = 0, d2, dx = den;	// denominators
	unsigned a2, ax, amin;	// integer parts of quotients
	unsigned f1, f2;	// fractional parts of quotients
	int i = 0;	// number of loop iterations

	while (1) {	// calculate convergents
		a2 = nx / dx;
		f2 = nx % dx;
		n2 = n0 + n1 * a2;
		d2 = d0 + d1 * a2;

		if (f2 == 0) break;
		if ((i++) && (d2 >= limit)) break;

		n0 = n1; n1 = n2;
		d0 = d1; d1 = d2;
		nx = dx; dx = f1 = f2;
	}

	if (d2 <= limit) {
		num = n2; den = d2;	// use last convergent
	} else {	// (d2 > limit)
		// d2 = d0 + d1 * ax
		// d1 * ax = d2 - d1
		ax = (limit - d0) / d1;	// set d2 = limit and solve for a2

		if ((a2 % 2 == 0) && (d0 * f1 > f2 * d1))
			amin = a2 / 2;	// passed 1/2 a_k admissibility test
		else
			amin = a2 / 2 + 1;

		if (ax < amin) {	// use previous convergent
			num = n1;
			den = d1;
		} else {	// calculate best semiconvergent
			num = n0 + n1 * ax;
			den = d0 + d1 * ax;
		}
	}
}
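If I read the algorithm right, a quick sanity check: called with num=168000, den=7007 (an unreduced 24000/1001) and limit=65535, the loop terminates on the f2 == 0 branch with exactly num=24000, den=1001, since the reduced denominator is already under the limit. The semiconvergent logic only kicks in when the reduced denominator exceeds the limit, which is precisely the case simplify_time mangles with floats.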
28th April 2007, 17:27 | #39
A hollow voice says
Join Date: Sep 2006
Posts: 269
I just used the canned build options from the source kit, as my focus was getting the builds to produce identical output.
I may experiment in that area some, but it would mean re-running encoder output comparisons (a time-consuming process) to ensure such tweaks didn't change the results - like that msvc opt/noopt oddity I discuss above...

Last edited by plugh; 28th April 2007 at 17:46.
28th April 2007, 17:37 | #40
Registered User
Join Date: Jun 2002
Location: Adelaide, Australia
Posts: 1,167
OK, I committed a fix for the d_mv_bits out-of-bounds memory access bustage.
Unfortunately, the fix is not correct. For some negative vectors which land in the range mv_table[64-34]..[64-64], the correct value seems to be 11, not 12. I added an assertion that fails when an incorrectly-estimated vector is coded.

I am not sure if the logic is incorrect in one place, or maybe the entire mv_bits can't be calculated in such a "smart", branchless way. Following the code from CodeVector is surely correct but unfortunately measurably slower. We should just use a LUT.

Anyway, overestimating the cost by one in those rare cases (I need to encode over 200 frames to hit the assertion) should have absolutely no effect on quality.

Last edited by sysKin; 28th April 2007 at 17:49.