
View Full Version : Questions about motion estimation


JimiK
15th November 2002, 14:24
Hello, I'd like to ask you programming gurus some questions about motion estimation. At work I have to solve this problem: calculate a person's location using the pictures taken by a camera mounted on their head. When you know the camera parameters and the object you're looking at, you can calculate a position from this 2D view. In the beginning you know both (you tell the person to stand at a certain spot and look in a specific direction). But now the person starts to walk or to turn their head. I want to keep track of this motion, and I thought maybe I could use motion estimation for that. I don't know much (barely anything) about motion estimation. From what I understood, it takes two pictures and looks at where a set of pixels moved from one picture to the other, then it applies a vector to this set. Is that correct so far? If yes, I'd like to work with these vectors.


The program would run on a P2-400 (mobile PC) and the pictures would be captured at 10fps at a resolution of 640x480 (which could be reduced if necessary).
Now my questions:
1. Would motion estimation algorithms work on such a system with fast movement
(the person turning their head)? I read there are different ones like PMVFAST or EPZS.


2. How hard is it to implement? In the avisynth forum I read that SansGrip used
motion estimation for one of his filters and I wonder if he coded more than 1000
lines just for motion estimation.


3. If anybody knows how my problem could be solved in another way, feel free to
make proposals ;)





Thanks a lot,
JimiK

SansGrip
16th November 2002, 19:31
I want to keep track of this motion and I thought maybe I could use motion estimation for that.

Sounds like a very difficult challenge. Good luck, and keep us posted on your progress :).

2. How hard is it to implement? In the avisynth forum I read that SansGrip used motion estimation for one of his filters and I wonder if he coded more than 1000 lines just for motion estimation.

What I used was motion adaption, not estimation, with a very basic algorithm that picks up motion without too many false hits from noise. Adaption is much simpler to code but only detects moving edges and areas of high contrast. Low-contrast motion is not picked up, and if that's what you need then proper motion estimation is the only way of doing it.

That said, for what you need M.A. might work, since you only need to track known "landmarks" in the image. Motion adaption will pick up the moving edges of these objects, which should give you enough information to attempt to identify them.

Download NoMoSmooth from my web site (see sig) and run it in show=true mode. This'll highlight areas it considers to be in motion. By adjusting the motion_threshold you should be able to find a setting that picks up just the very edges of objects. Looking at the output should give you an idea of whether you can use this kind of motion adaption or whether you'll have to bite the bullet and implement full estimation.

If you choose the latter, you might want to entice someone like -h to join the discussion since he's much more knowledgeable about M.E. than I am.

As far as recognizing these landmarks once you have an outline of them, you might want to consider analyzing the chroma information (i.e. the colour of the object) as well as the luma variance, which would give you an idea of the texture.

Many more advanced object recognition techniques exist. Dr Google should be able to present you with myriad PDFs on the subject, most of which are way over my head ;).

What makes this really difficult is that the person can not only rotate but also move around, thus changing the size of the objects. Depending on the application it might be possible to put highly recognizable "placemarkers" around the room, say very bright neon shapes that are easy to detect programmatically.

Just some ideas that may or may not be helpful ;).

JimiK
17th November 2002, 00:52
Thank you for the answer. It was very useful, and I read the NoMoComp thread. It looks like many people are happy with the motion detection. I'll download the source. Unfortunately I'm a Java programmer and know only a little C (motion detection would run via JNI, using a DLL of the motion detector).
The whole task is very difficult. The project is outdoor augmented reality. For indoor AR you can use placemarkers, and there are programs for that. For outdoor AR I don't know any programs, and I don't know if people would be happy if you put a neon yellow triangle on their church ;) We tried different approaches, and motion detection is just one of my new ideas. Even though we have no landmarks, I hope that there will be some contrast between a wall and a window, for example. Right now I have no captured material to test with. Next week I'll make some tests and keep you posted.
I think when he's not away, -h reads almost everything in the forums. He was always very helpful in all the threads I read, so I guess if he had an idea, he would have posted.
Of course I'll also ask Dr Google :) I just wanted to ask all the smart people here if my idea is complete crap or if it's actually doable.

Best regards,
JimiK

SansGrip
17th November 2002, 01:11
Unfortunately I'm a Java programmer and know only a little C (motion detection would run via JNI, using a DLL of the motion detector).

It's very simple. Move a 3x3 block over the image, and for each pixel find the difference between it and the corresponding pixel from the previous frame. Add all these differences together and compare the absolute value with motion_threshold. If it's greater, the block is moving; otherwise it isn't.

Complex, no? ;)

I like to call it an "absolute sum of differences" even though it probably already has a better name.
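The test described above can be sketched in a few lines of Python. This is only an illustration, not NoMoSmooth's actual code; the frame representation (2D lists of luma values) and the default threshold are assumptions:

```python
def block_is_moving(prev, curr, x, y, motion_threshold=24):
    """Test whether the 3x3 block at (x, y) moved between two frames.

    Sums the signed per-pixel differences between the current and
    previous frame over the block, then compares the absolute value
    of that sum against the threshold, as described in the post.
    """
    diff = 0
    for dy in range(3):
        for dx in range(3):
            diff += curr[y + dy][x + dx] - prev[y + dy][x + dx]
    # Note: because the differences are signed, changes of opposite
    # sign inside one block can cancel out -- it's a quick-and-dirty
    # test, not a full SAD-based motion search.
    return abs(diff) > motion_threshold
```

An unchanged block sums to zero and is reported as stationary; a block where a bright edge appeared easily exceeds the threshold.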

I don't know if people would be happy if you put a neon yellow triangle on their church ;)

Maybe a neon yellow cross? :D

I hope that there will be some contrast between wall and a window for example.

Oh yes, assuming light levels are good and you're not operating in a snowfield or something there'll be plenty of things to show motion. The hard part will be turning the moving/stationary information into coordinates. Rather you than me ;).

JimiK
18th November 2002, 09:26
Hi, I still haven't had the time to look at your code (well, maybe a minute, but it seems I'm very busy) ;) Here's what I understood from reading your post: you're just detecting motion, you're not looking at where the motion is directed (o.k., we should be through with this) :) What I did not exactly understand was: do you use a matrix to add all pixel values together? I think you don't. You wrote that you compare every pixel in a 3x3 block to its neighbour in time. So you compare the upper-left pixel in a 3x3 block from picture 1 to the upper-left pixel in picture 2, right?

So now to an idea of how to actually get the direction of the motion. It should be possible to search the neighbourhood of the 3x3 block in pic 2 for a 3x3 block where the difference to the 3x3 block in pic 1 is minimal. Then you could make a vector from the block's location in pic 1 to its location in pic 2. Once you have found the first vector, it should be easier to find other vectors to prove this vector is correct. Do you think that would be smart?

I haven't seen -h around in a while. Maybe I should ask sysKin about motion estimation; I think he wrote an implementation for the new hinted MV.
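The neighbourhood search described here is essentially exhaustive block matching, the simplest form of motion estimation. A minimal Python sketch, where the frame layout, search radius, and the sum-of-absolute-differences cost are illustrative assumptions:

```python
def best_match(prev, curr, bx, by, radius=2, block=3):
    """Find where the block at (bx, by) in `prev` moved to in `curr`.

    Exhaustively tries every offset within `radius` pixels and keeps
    the one with the smallest sum of absolute differences (SAD).
    Returns the motion vector (dx, dy) of the best match.
    """
    def sad(ox, oy):
        total = 0
        for y in range(block):
            for x in range(block):
                total += abs(prev[by + y][bx + x]
                             - curr[by + oy + y][bx + ox + x])
        return total

    best, best_cost = (0, 0), sad(0, 0)
    for oy in range(-radius, radius + 1):
        for ox in range(-radius, radius + 1):
            cost = sad(ox, oy)
            if cost < best_cost:
                best, best_cost = cost and best or (ox, oy), cost
    return best
```

With a bright 3x3 patch shifted one pixel to the right between frames, the search returns the vector (1, 0). Fast algorithms like PMVFAST avoid the exhaustive scan, but the cost function is the same idea.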

Thank you,
JimiK

-h
19th November 2002, 04:08
Yes I have been getting lax lately.

I've given this some thought, and decided that it's far beyond me :)

You could use a form of ME to garner the global vector for the frame, decide if the person is turning (simple pan) or walking (simple zoom), then use this vector to adjust the person's calculated location and direction.

However this would be wildly inaccurate. You can't be sure of the distances involved, as vectors measured in pels could be feet or inches depending on how far away the surface was that the vector was found on. You would have to periodically recalculate the person's location by creating a number of "known points", which you would have to identify beforehand then compare the current scene to.
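One simple way to sketch the global-vector idea from the previous paragraph: once per-block vectors exist, take their component-wise median. The median tolerates outlier blocks, such as a person moving in front of an otherwise static background. The function name and the choice of median are illustrative assumptions, not a reference to any particular codec:

```python
from statistics import median

def global_vector(block_vectors):
    """Estimate the camera's global motion from per-block vectors.

    Taking the component-wise median keeps the estimate robust when
    a minority of blocks (e.g. a moving person) disagree with the
    dominant camera motion.
    """
    xs = [v[0] for v in block_vectors]
    ys = [v[1] for v in block_vectors]
    return (median(xs), median(ys))
```

For example, four blocks reporting (3, 0) and one outlier reporting (-5, 2) still yield a global vector of (3, 0).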

There are probably known ways to do this, but I'm just brainstorming as I have no experience in the matter of 3D antics. I think (well, for any technical matter actually) that MfA is the man you're after :)

-h

SansGrip
19th November 2002, 05:28
However this would be wildly inaccurate.

:D :D

I have no idea why this made me laugh out loud. It just did ;).

JimiK
19th November 2002, 11:01
Hello -h, thank you for joining this thread :)
I've given this some thought, and decided that it's far beyond me

I don't want all of you to do my research for me, just wanted to share some thoughts and see if motion estimation could be used for my task.

You could use a form of ME to garner the global vector for the frame, decide if the person is turning (simple pan) or walking (simple zoom), then use this vector to adjust the person's calculated location and direction.

Yes, that's what I thought. I will try to make it clearer to you all. We have techniques that "should" identify objects in a picture and determine their location and orientation towards the viewer. Then you could calculate the person's location quite exactly. There are (at least) two flaws in this technique.
First: it takes a lot of CPU power we don't have.
Second: it would be accurate if you could identify the objects in the picture. But when you have almost no information, it's very hard (or almost impossible) to find those objects.
Now my solution: when you start, you know the person's location and where they are looking, so you have all the info you need. With every frame I'm trying to find out the direction of the motion, so you can calculate your new position. You're right when you say that this will get screwed up sooner or later. That's when the other technique kicks in: every two or three seconds you identify the objects' location and orientation. It saves a lot of CPU power if you don't have to do it every frame, and, even better, by using motion vecs beforehand you have a slight idea of where to find the objects in the picture.
That's why I wanted to know more about motion vecs. I want to know how fast they can be calculated and how "robust" they are (I don't know the right word; I mean that you might not get 100% accurate results, but you don't get results that are completely wrong). Does anyone know of some good papers about this theme? (Why search in Google and read crap if you know people who certainly know the links to some good papers) ;)

Thank you all a lot,
JimiK

SansGrip
19th November 2002, 17:46
Does anyone know of some good papers about this theme? (Why search in Google and read crap if you know people who certainly know the links to some good papers) ;)

I'm afraid Dr Google is the only source I know of. You'll find many papers on the two fast algorithms you mentioned in your first post, though. Most of them were above my head, but with enough headache medicine you'll probably have some luck with them :).

-h
19th November 2002, 19:31
Yes in order for this to be accurate you will have to perform some level of object recognition or scene segmentation, and that's beyond me. You could try looking into existing solutions such as this (http://www.facit.co.uk/peripherals_position.htm).

It seems to me that it might be easier to do this with radio-based triangulation. Creating a visual-based motion tracking system mostly from scratch is going to be a nightmare.

The motion estimation step can be performed quite fast, if you're only interested in finding "true" motion (instead of compression). A P2-400 should be enough for running EPZS in full-pel mode. You can find the description of EPZS here (http://citeseer.nj.nec.com/tourapis02enhanced.html), with the description of PMVFAST it builds off here (http://citeseer.nj.nec.com/tourapis01predictive.html).
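Very roughly, the speed of PMVFAST/EPZS comes from testing a few predicted vectors first (e.g. the vectors of neighbouring blocks) and terminating early on a good-enough match, instead of scanning every offset. A loose Python sketch of that idea; the predictor set, threshold, and one-pixel refinement pattern are heavy simplifications of what the papers actually describe:

```python
def predictive_search(prev, curr, bx, by, predictors, early_stop=8, block=3):
    """Loose sketch of predictor-based search (the idea behind PMVFAST/EPZS).

    Start from candidate vectors taken from neighbouring blocks, stop
    early once a good-enough match is found, otherwise refine around
    the best candidate one pixel at a time.
    """
    def sad(ox, oy):
        total = 0
        for y in range(block):
            for x in range(block):
                total += abs(prev[by + y][bx + x]
                             - curr[by + oy + y][bx + ox + x])
        return total

    # 1. Evaluate the candidate predictors (plus the zero vector).
    best, best_cost = (0, 0), sad(0, 0)
    for v in predictors:
        cost = sad(*v)
        if cost < best_cost:
            best, best_cost = v, cost
    # 2. Early termination: good-enough matches skip the refinement.
    if best_cost <= early_stop:
        return best
    # 3. Small local refinement around the best candidate so far.
    improved = True
    while improved:
        improved = False
        for ox, oy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (best[0] + ox, best[1] + oy)
            cost = sad(*cand)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
    return best
```

When a neighbouring block already found the true vector, step 2 returns it after a single cost evaluation, which is where most of the speedup over exhaustive search comes from.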

-h

JimiK
19th November 2002, 22:53
Thank you for the links. I will have a look at these pages.

Sincerely,
JimiK

High Speed Dubb
20th November 2002, 05:27
You might have some luck using “global motion estimation.” The assumption there is that everything in the scene is still, but that the camera is moving. You’ll find quite a few papers with a Google search for it.

JimiK
20th November 2002, 10:54
You're right, why didn't I think of this? All I've heard about GMC so far is that it catches camera pans and zooms. But since that is exactly what I want to do, why not give it a try. I already did a small test with XviD GMC. I encoded a piece of a STTNG episode: Picard moves in front of a wall, and the camera moves with him. FFDshow reports GMC. The camera stops, but Picard is still moving his arms: no GMC. So GMC doesn't seem to be disturbed by persons moving in front of the background, which is exactly what would be required. This looks promising. At the moment I have so much to do for school (why do Americans say school when it's a university?), so I don't have much time to do research. But I'll tell you when I have first results.

Best regards,
JimiK

High Speed Dubb
22nd November 2002, 03:08
XviD has global motion estimation? Cool — That means there’s some GPL code for it out there. How slow/fast is it?

JimiK
22nd November 2002, 08:58
Are you serious or are you joking? XviD has an implementation of Global Motion Compensation, so I thought this would require Global Motion Estimation. Is that wrong? However, it's said to be buggy and to increase filesize compared to normal ME. I haven't had a look at the code (and I hope I never will, it would be too complicated) ;) But at least the detection of moving backgrounds seemed to work, as I reported. So if anybody wants to write a book called "GMC for Dummies", feel free, and don't forget to send me a PDF version. :)
I don't know how fast XviD GMC is. I think it gave me a frame drop from 15fps to 14fps or 13fps.

Best regards,
JimiK