Stupid brain


SansGrip
26th November 2002, 22:58
I'm currently attempting to port FluxSmooth to MMX and my brain isn't cooperating at all. Here's the algorithm, in a depressingly small nutshell:


cpel = pel from curr frame
ppel = pel from prev frame
npel = pel from next frame

if (cpel < ppel and cpel < npel) or (cpel > ppel and cpel > npel)
    sum = mean of immediate temporal neighbours within temporal threshold
    sum += mean of immediate spatial neighbours within spatial threshold
    new pel = sum / count of values in sum
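
For reference, the outline above can be sketched as scalar C for a single pel. This is only one reading of the nutshell, not the actual FluxSmooth source: the function name flux_pel, the parameter layout, the choice to include cpel itself in the sum, and the rounded division are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: smooth one pel.
 * spatial[] holds the immediate spatial neighbours of cpel (caller's choice).
 * Assumption: the current pel is part of the average, and a neighbour is
 * "within threshold" when its absolute difference from cpel is <= threshold. */
static uint8_t flux_pel(uint8_t cpel, uint8_t ppel, uint8_t npel,
                        const uint8_t *spatial, int n_spatial,
                        int t_thresh, int s_thresh)
{
    /* Only pels that are a temporal local minimum or maximum are touched. */
    int is_min = cpel < ppel && cpel < npel;
    int is_max = cpel > ppel && cpel > npel;
    if (!is_min && !is_max)
        return cpel;

    int sum = cpel;
    int count = 1;

    /* Temporal neighbours within the temporal threshold. */
    if (abs(ppel - cpel) <= t_thresh) { sum += ppel; ++count; }
    if (abs(npel - cpel) <= t_thresh) { sum += npel; ++count; }

    /* Spatial neighbours within the spatial threshold. */
    for (int i = 0; i < n_spatial; ++i) {
        if (abs(spatial[i] - cpel) <= s_thresh) { sum += spatial[i]; ++count; }
    }

    /* Rounded mean of everything that was accumulated. */
    return (uint8_t)((sum + count / 2) / count);
}
```

The variable count is what makes a straight pavgb chain awkward here: the divisor depends on how many neighbours pass the threshold test, so it differs from pel to pel.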


I've figured out the conditional part and have the code for it, but it's the averaging that I'm having the biggest difficulty with. Life would be fine if I could keep everything in packed bytes and use pavgb, but I fear the cumulative error would be substantial. The only alternative I can see is to process two pels at a time and work with packed words.

I read the C3D source but it made me even more confused. Would any kind soul out there be able to push me in the right direction? I think once I have an idea of how to proceed I'll be able to code it up fairly easily. It seems to be getting started that's the hard part...

Any help would be much appreciated.

vlad59
27th November 2002, 17:55
Originally posted by SansGrip
I'm currently attempting to port FluxSmooth to MMX and my brain isn't cooperating at all. Here's the algorithm, in a depressingly small nutshell:

...

I read the C3D source but it made me even more confused. Would any kind soul out there be able to push me in the right direction? I think once I have an idea of how to proceed I'll be able to code it up fairly easily. It seems to be getting started that's the hard part...

Any help would be much appreciated.

If you read the YUY2 version of C3D, it's normal that you got confused; I'm sometimes confused too when I try to reread it ;) . YV12 was way easier for C3D.

What I did for C3D YUY2 is to apply the test to the 8 bytes (full mmx register) and to unpack the low part and high part to packed words to make all the paddw.
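
A rough illustration of the unpack-then-paddw idea, written as plain C stand-ins rather than real MMX so it runs anywhere. It is not taken from the C3D source: the helper names (widen_lo, widen_hi, add_words, sum3_rows) are hypothetical, and the threshold masking that would happen on the packed bytes beforehand is left out.

```c
#include <stdint.h>

/* Stand-in for punpcklbw against a zero register:
 * bytes 0..3 become four 16-bit words. */
static void widen_lo(const uint8_t src[8], uint16_t dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i];
}

/* Stand-in for punpckhbw against a zero register:
 * bytes 4..7 become four 16-bit words. */
static void widen_hi(const uint8_t src[8], uint16_t dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i + 4];
}

/* Stand-in for paddw: lane-wise 16-bit add that wraps on overflow. */
static void add_words(uint16_t acc[4], const uint16_t x[4])
{
    for (int i = 0; i < 4; ++i)
        acc[i] = (uint16_t)(acc[i] + x[i]);
}

/* Sum 8 pels from the previous, current and next frame into 16-bit lanes. */
static void sum3_rows(const uint8_t p[8], const uint8_t c[8],
                      const uint8_t n[8], uint16_t out[8])
{
    uint16_t lo[4], hi[4], t[4];

    widen_lo(c, lo);
    widen_hi(c, hi);
    widen_lo(p, t); add_words(lo, t);
    widen_hi(p, t); add_words(hi, t);
    widen_lo(n, t); add_words(lo, t);
    widen_hi(n, t); add_words(hi, t);

    for (int i = 0; i < 4; ++i) {
        out[i] = lo[i];
        out[i + 4] = hi[i];
    }
}
```

A 16-bit lane can hold the sum of up to 257 bytes (257 * 255 = 65535), so accumulating a handful of neighbours this way cannot overflow, which is the point of leaving packed bytes for the additions.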

Are you working with YV12 or YUY2 ?????

If you have more specific questions, feel free to ask.

SansGrip
27th November 2002, 22:26
What I did for C3D YUY2 is to apply the test to the 8 bytes (full mmx register)

By the test I assume you mean the threshold test...?

and to unpack the low part and high part to packed words to make all the paddw.

Yes, that's kind of what I'm thinking too.

Are you working with YV12 or YUY2 ?????

Both. Flux is going to support YUY2 and YV12, at least until 2.5 is stable, at which point I might delete all the YUY2 code while drinking champagne ;).

I'm going to grab the C3D YV12 source and take a look right now.

If you have more specific questions, feel free to ask.

Thanks! I'm currently trying to get it all planned out on paper first. We'll see if this attempt gets off the ground :).

vlad59
27th November 2002, 22:38
SansGrip,

By the test I assume you mean the threshold test...?

Yes, the threshold check.

About the YV12 sources, I just remembered that I forgot to include all the .h files in my latest release. So I'll try to mail them to you tomorrow (or attach them to this thread), as I won't have enough time to make an official new release in the meantime.

SansGrip
27th November 2002, 22:46
vlad59: About the YV12 sources, I just remember I forgot to add all the .h in my latest release. So I'll try to mail it to you tomorrow (or to attach it to this thread)

That would be great, thanks :).

trbarry
28th November 2002, 16:11
Life would be fine if I could keep everything in packed bytes and use pavgb, but I fear the cumulative error would be substantial.

Since the pavgb instruction rounds up it comes out high by an average of .25. Intel recommends avoiding (most of) a cumulative drift when using a series of pavgb instructions by subtracting 1 from one of the 2 operands every other time. Like:

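pavgb a,b         ; a = avg(a,b), rounded up
pavgb c,d         ; c = avg(c,d), rounded up

psubusb a, ONES   ; a = a - 1 per byte, unsigned saturating (ONES = 0x01 in every byte)
pavgb a,c         ; a = avg(a,c), rounded up
...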

There is still a tiny drift because the psubusb instruction will saturate if a and b both started at 0. They even published better code in a monograph somewhere but I don't remember the details.

The above has always seemed close enough for me, at least in most cases.
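
To get a feel for the size of the drift, here is a small C model of one byte lane. It is only a sketch: pavgb is modelled as (a + b + 1) >> 1 and psubusb with ONES as a saturating subtract of 1, and the helper names (sub1_sat, avg4_plain, avg4_corrected, drift_x4) are made up for this example.

```c
#include <stdint.h>

/* One byte lane of pavgb: halves round up. */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* One byte lane of psubusb with a constant of 1: saturates at 0. */
static uint8_t sub1_sat(uint8_t a)
{
    return a ? (uint8_t)(a - 1) : 0;
}

/* Mean of four pels by chained pavgb, no correction. */
static uint8_t avg4_plain(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return pavgb(pavgb(a, b), pavgb(c, d));
}

/* Same chain, with 1 subtracted from one operand before the last pavgb. */
static uint8_t avg4_corrected(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint8_t ab = sub1_sat(pavgb(a, b));
    return pavgb(ab, pavgb(c, d));
}

/* Total error against the exact mean, scaled by 4 to stay in integers,
 * over every 4-tuple of values in [0, limit). */
static long drift_x4(uint8_t (*avg4)(uint8_t, uint8_t, uint8_t, uint8_t),
                     int limit)
{
    long total = 0;
    for (int a = 0; a < limit; ++a)
        for (int b = 0; b < limit; ++b)
            for (int c = 0; c < limit; ++c)
                for (int d = 0; d < limit; ++d)
                    total += 4L * avg4((uint8_t)a, (uint8_t)b,
                                       (uint8_t)c, (uint8_t)d)
                             - (a + b + c + d);
    return total;
}
```

Comparing drift_x4(avg4_plain, 16) with drift_x4(avg4_corrected, 16) shows the uncorrected chain running consistently high, while the corrected one keeps only the small residue from the saturation at 0.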

- Tom