MinMax filter...

Gatiori
29th June 2007, 19:53
I want to write an Avisynth filter like Photoshop's "Minimum/Maximum". I know a little C++ programming, but I don't know whether this filter can be optimized.

If you can help me optimize this code, I will thank you a lot.
It is very slow.
How can I make it faster?

This is the code for RGB24 and RGB32:

if (vi.IsRGB24()) {
  for (h = 0; h < src_height; h++) {
    for (w = 0; w < src_width; w += 3) {
      // Start outside the 0..255 range so the first sample always wins,
      // and give x1/y1 a defined value before the search.
      if (Radius >= 0) {
        l1 = -1;
      }
      else {
        l1 = 256;
      }
      x1 = 0;
      y1 = 0;

      for (y = -abs(Radius); y <= abs(Radius); y++) {
        for (x = -abs(Radius); x <= abs(Radius); x++) {
          if (w+(x*3) >= 0 && w+(x*3) < src_width && (h+y) < src_height && (h+y) >= 0) {
            b = *(srcp + w + (x*3) + (y*src_pitch));
            g = *(srcp + w + 1 + (x*3) + (y*src_pitch));
            r = *(srcp + w + 2 + (x*3) + (y*src_pitch));
            l = (r+g+b)/3;

            if ((l > l1 && Radius >= 0) || (l < l1 && Radius < 0)) {
              x1 = x;
              y1 = y;
              l1 = l;
            }
          }
        }
      }
      // Copy the pixel that had the highest (or lowest) luminance.
      *(dstp + w) = *(srcp + w + (x1*3) + (y1*src_pitch));
      *(dstp + w + 1) = *(srcp + w + 1 + (x1*3) + (y1*src_pitch));
      *(dstp + w + 2) = *(srcp + w + 2 + (x1*3) + (y1*src_pitch));
    }
    srcp = srcp + src_pitch;
    dstp = dstp + dst_pitch;
  }
}



if (vi.IsRGB32()) {
  for (h = 0; h < src_height; h++) {
    for (w = 0; w < src_width/4; w += 1) {
      if (Radius >= 0) {
        l1 = -1;
      }
      else {
        l1 = 256;
      }

      x1 = 0;
      y1 = 0;

      for (y = -abs(Radius); y <= abs(Radius); y++) {
        for (x = -abs(Radius); x <= abs(Radius); x++) {
          if (w+x >= 0 && w+x < (src_width/4) && (h+y) < src_height && (h+y) >= 0) {
            pix = *((unsigned int *)srcp + w + x + (y*(src_pitch/4)));
            // Mask each channel so the alpha byte does not leak into r.
            b = pix & 0xFF;
            g = (pix >> 8) & 0xFF;
            r = (pix >> 16) & 0xFF;
            l = (r+g+b)/3;

            if ((l > l1 && Radius >= 0) || (l < l1 && Radius < 0)) {
              x1 = x;
              y1 = y;
              l1 = l;
            }
          }
        }
      }
      // Copy the whole 32-bit pixel that had the highest (or lowest) luminance.
      *((unsigned int *)dstp + w) = *((unsigned int *)srcp + w + x1 + (y1*(src_pitch/4)));
    }

    srcp = srcp + src_pitch;
    dstp = dstp + dst_pitch;
  }
}





Help...

... please..

Guest
30th June 2007, 00:00
Take stuff outside the loops where possible.

Get rid of abs() calls.

Don't recalculate things you've already calculated.
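
A minimal sketch of those three points, assuming a packed RGB24 buffer. The function name window_max_sum and its parameters are hypothetical, not part of the Avisynth API, and here width is counted in pixels rather than bytes. abs() is evaluated once, the window is clamped to the frame once per pixel instead of testing every sample, and the row address is computed once per window row:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns the largest r+g+b sum inside the window
// centred on pixel (px, py). `width` is in pixels, `pitch` in bytes.
int window_max_sum(const uint8_t* frame, int width, int height, int pitch,
                   int px, int py, int Radius)
{
    const int radius = std::abs(Radius);  // abs() evaluated once, not per iteration

    // Clamp the window to the frame once, so the loops need no bounds tests.
    const int y0 = (py - radius < 0) ? 0 : py - radius;
    const int y1 = (py + radius > height - 1) ? height - 1 : py + radius;
    const int x0 = (px - radius < 0) ? 0 : px - radius;
    const int x1 = (px + radius > width - 1) ? width - 1 : px + radius;

    int best = -1;
    for (int y = y0; y <= y1; ++y) {
        // Row address computed once per row, then stepped by 3 bytes per pixel.
        const uint8_t* p = frame + y * pitch + x0 * 3;
        for (int x = x0; x <= x1; ++x, p += 3) {
            const int sum = p[0] + p[1] + p[2];
            if (sum > best) best = sum;
        }
    }
    return best;
}
```

In a real filter the radius and the pitch-based offsets would be hoisted further, out of the per-pixel loops entirely.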

Fizick
30th June 2007, 10:53
Use scaled multiplication followed by a bit shift instead of dividing by 3.

((a + b + c)*341)/1024
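
As a sketch of what that looks like in code: the 341/1024 form is an approximation, and it comes out one lower than a true division whenever the sum is a positive multiple of 3 (for example 3 gives 0). It is still monotonic, so it ranks pixels consistently. The 683/2048 variant below is my own addition rather than something from the post above; it matches integer division exactly for sums from 0 to 765. The function names are hypothetical.

```cpp
// Form from the post above: (sum * 341) / 1024, written with a shift.
inline int avg3_approx(int sum) { return (sum * 341) >> 10; }

// Alternative constant (an assumption of this sketch, not from the thread):
// 683/2048 is slightly above 1/3, which makes the floor exact for 0..765.
inline int avg3_exact(int sum) { return (sum * 683) >> 11; }

// Checks the claim above for every possible r+g+b sum of 8-bit channels.
inline bool avg3_exact_matches_division()
{
    for (int sum = 0; sum <= 255 * 3; ++sum)
        if (avg3_exact(sum) != sum / 3) return false;
    return true;
}
```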

Leak
30th June 2007, 11:57
Write an MMX/SSE version of your code.

That'll probably need quite an overhaul of your algorithm, though.
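
One possible shape for such an overhaul, as a rough sketch only: it swaps the luminance-based pixel pick for a per-byte (per-channel) maximum, which is a different algorithm from the code in the first post and will give different output. It uses SSE2 intrinsics from <emmintrin.h>; the function name max_rows_u8 is hypothetical. Taking the maximum of a row with its neighbouring rows, and then of each pixel with its horizontal neighbours, builds up the window maximum.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Writes max(a[i], b[i]) into dst[i] for n bytes, 16 bytes at a time.
void max_rows_u8(const uint8_t* a, const uint8_t* b, uint8_t* dst, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads, so no assumption is made about buffer alignment.
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_max_epu8(va, vb));
    }
    // Scalar tail for the bytes that do not fill a whole 16-byte vector.
    for (; i < n; ++i)
        dst[i] = (a[i] > b[i]) ? a[i] : b[i];
}
```

A minimum version would use _mm_min_epu8 in the same place.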

np: Apparat - Like Porcelain (Walls)

IanB
2nd July 2007, 01:32
@Gatiori, any chance you can edit your 1st post to make the code sample width-friendly, i.e. tabs -> 2 spaces, strip the trailing spaces from the lines, etc.? Being able to read all the code without horizontal scrolling is the first step in seeing what to optimize.

Quote:
Get rid of abs() calls.

i.e. write separate min and max routines and do the if (Radius < 0) test once to select the routine.

Quote:
Use scaled multiplication followed by a bit shift instead of dividing by 3.

((a + b + c)*341)/1024

Even better would be to superscale that part of the code by 3, i.e.

l1 = 255 * 3;
...
l = r + g + b;

Also, precalculate and remember the w+x1+(y1*src_pitch_{3 or 4}) value inside the inner loop's if (l < l1) test.
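
A rough sketch combining those three suggestions for RGB32, under stated assumptions: the names pick_pixel and pick are hypothetical, and the caller is assumed to pass a window already clamped to the frame. The min and max searches are separate instantiations chosen by one test of Radius, the comparison uses the raw r+g+b sum (range 0..765) with no division, and the byte offset of the winning pixel is stored instead of its (x, y) pair. The min search starts at 766 rather than 255*3 so that an all-white window still records an offset.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the byte offset (from `frame`) of the brightest or darkest pixel
// in the window [x0..x1] x [y0..y1] of a packed RGB32 buffer.
template <bool FindMax>
std::ptrdiff_t pick_pixel(const uint8_t* frame, int pitch,
                          int x0, int x1, int y0, int y1)
{
    int best = FindMax ? -1 : 255 * 3 + 1;  // outside the 0..765 range of sums
    std::ptrdiff_t best_off = 0;
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const std::ptrdiff_t off = std::ptrdiff_t(y) * pitch + x * 4;
            const uint8_t* p = frame + off;
            const int sum = p[0] + p[1] + p[2];  // alpha byte p[3] is ignored
            if (FindMax ? (sum > best) : (sum < best)) {
                best = sum;
                best_off = off;  // remember the address, not x and y
            }
        }
    }
    return best_off;
}

// The sign of Radius is tested once here; in a real filter this branch
// would sit outside the per-pixel loops, once per frame.
std::ptrdiff_t pick(const uint8_t* frame, int pitch,
                    int x0, int x1, int y0, int y1, int Radius)
{
    return (Radius >= 0) ? pick_pixel<true>(frame, pitch, x0, x1, y0, y1)
                         : pick_pixel<false>(frame, pitch, x0, x1, y0, y1);
}
```

The output pixel can then be copied with a single 32-bit read from frame + offset.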

And remove the edge condition tests from within the loops. Hard-code the edges before and after each X and Y loop respectively, i.e.

Top row.
for (2nd row to 2nd last row) {
  left edge.
  for (2nd column to 2nd last column)
    ...
  right edge.
}
Bottom row.
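
The outline above could be sketched like this. To keep it short it works on a single 8-bit plane and takes a plain maximum, which is a simplification of the RGB code in the first post; all names are hypothetical and r is assumed to be non-negative. Border pixels go through a slow routine that clamps the window, while interior pixels use a routine with no bounds tests at all. With a radius larger than 1 the "edges" are r rows and r columns wide rather than one.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Slow path: the window is clamped to the plane. Used only near the borders.
static uint8_t max_clamped(const std::vector<uint8_t>& src, int w, int h,
                           int cx, int cy, int r)
{
    uint8_t best = 0;
    for (int y = std::max(cy - r, 0); y <= std::min(cy + r, h - 1); ++y)
        for (int x = std::max(cx - r, 0); x <= std::min(cx + r, w - 1); ++x)
            best = std::max(best, src[y * w + x]);
    return best;
}

// Fast path: the whole window is known to be inside the plane.
static uint8_t max_interior(const std::vector<uint8_t>& src, int w,
                            int cx, int cy, int r)
{
    uint8_t best = 0;
    for (int y = cy - r; y <= cy + r; ++y) {
        const uint8_t* row = &src[y * w + cx - r];
        for (int i = 0; i <= 2 * r; ++i)
            best = std::max(best, row[i]);
    }
    return best;
}

std::vector<uint8_t> max_filter(const std::vector<uint8_t>& src, int w, int h, int r)
{
    std::vector<uint8_t> dst(src.size());
    const int top = std::min(r, h), bottom = std::max(h - r, top);
    const int left = std::min(r, w), right = std::max(w - r, left);

    // Fills columns [xa, xb) of row y through the clamped slow path.
    auto edge = [&](int y, int xa, int xb) {
        for (int x = xa; x < xb; ++x)
            dst[y * w + x] = max_clamped(src, w, h, x, y, r);
    };

    for (int y = 0; y < top; ++y) edge(y, 0, w);      // top rows
    for (int y = top; y < bottom; ++y) {
        edge(y, 0, left);                             // left edge
        for (int x = left; x < right; ++x)            // interior, no tests
            dst[y * w + x] = max_interior(src, w, x, y, r);
        edge(y, right, w);                            // right edge
    }
    for (int y = bottom; y < h; ++y) edge(y, 0, w);   // bottom rows
    return dst;
}
```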