Auto clip matching using motion vectors?


Karyudo
9th October 2004, 04:31
On occasion, I have been interested in repairing one copy of a clip using good portions of a second, generally inferior clip. For example, if I have a DVD rip with hard subs and a VCD version with no subs, I'd like to be able to replace the areas of the DVD frame containing subs with the same part of the VCD, and get a result that maintains most of the DVD frame quality, just using the VCD to patch over the area with subs.

Regardless of how the repair is attempted (averaging, masking, replacing frames, rotoscoping, etc.), the first step pretty much has to be aligning the two sources, both in time (i.e. same frames must be shown at the same time) and in space (i.e. x-y coordinates of both clips must match). In my above example, you'd have to scale up the VCD (352 x 288 for a PAL VCD) to match the DVD (720 x 576, let's say).

Lining clips up in the time axis is pretty easy to do with good sources (no frame drops) and something like StackVertical, so maybe that can be left manual at the moment.

But lining up two clips in space is painful, slow, and boring work!

I've noticed work being done on motion vector tools (like MVTools), and I got to wondering:

Would it be possible to write a filter that used motion vectors to automatically align two (or more) clips in space (and maybe time)?

I don't know nearly enough about the practical application of motion vectors, but I can imagine that two copies of the same clip would produce identical sets of motion vectors. Two copies of the same clip *of different frame sizes* should give a set of vectors that is the same, except for an offset of the origin and a scale factor. I therefore suspect it should be possible to use calculated motion vectors from the two sources to back-calculate the appropriate offsets and scale factors needed to supersample (and then downsample) the smaller frame to match the larger one. Once the smaller frame is scaled up and matched to the larger, any intra-frame work can then be performed as flawlessly as possible.
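
To make the "back-calculate" step concrete, here is a rough Python sketch (not an AviSynth filter). It assumes matched positions in the two clips are already known, for example the centres of blocks that moved the same way; how to get those matches out of a motion-vector tool is left open. The function name and the sample coordinates are made up for illustration. Along each axis it fits large = scale * small + offset by least squares.

```python
def fit_scale_offset(small, large):
    """Least-squares fit of large = scale * small + offset along one axis."""
    n = len(small)
    if n < 2 or n != len(large):
        raise ValueError("need at least two matched coordinates per axis")
    mean_s = sum(small) / n
    mean_l = sum(large) / n
    # Spread of the small-clip coordinates; zero means the fit is undefined.
    var = sum((s - mean_s) ** 2 for s in small)
    if var == 0:
        raise ValueError("coordinates must not all be identical")
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(small, large))
    scale = cov / var
    offset = mean_l - scale * mean_s
    return scale, offset


# Hypothetical matched x positions: the same features seen in the
# small clip and in the large clip (here large = 2 * small + 8).
small_x = [10, 100, 200, 300]
large_x = [28.0, 208.0, 408.0, 608.0]
scale_x, offset_x = fit_scale_offset(small_x, large_x)
```

The same function would be run again on the y coordinates. With noisy matches the least-squares fit averages out small errors, but a few badly wrong matches would still skew it.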

[A further abstraction would be to use the motion vectors to line up the two sources in time, as well, although I can see that being a lot of work up front that doesn't save all that much effort on the back end.]

Am I out to lunch, or could this work? If it could work, does anyone have a better idea than I do of how to do it in practice?

Mug Funky
9th October 2004, 04:59
in my case, i'd find manual alignment easier than using a plugin... if you use photoshop with screen shots of your video, you can very quickly get a good match that can be replicated in avisynth. just make sure to use rulers and guides to measure how many pixels the offsets are.

one possible problem with the above method is that YV12 can only be cropped by mod2. a way around this is with yv12convolution (in masktools)... simply give it a matrix that moves everything by 1 pixel (or even half, quarter, or whatever if you want uber precision). this is of course quite slow.
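
a rough illustration of the shift-by-convolution idea, in plain python rather than the actual masktools call (the function name and test row are made up for this sketch, and edge pixels are simply repeated). a kernel with all its weight on the left neighbour moves everything one pixel to the right; splitting the weight between the left neighbour and the centre gives a half-pixel shift by linear interpolation.

```python
def convolve_row(row, kernel):
    """Apply an odd-length 1-D kernel to a row, repeating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Clamp the source index so the borders reuse the edge pixel.
            j = min(max(i + k - radius, 0), len(row) - 1)
            acc += weight * row[j]
        out.append(acc)
    return out


row = [0, 0, 100, 0, 0]

# All weight on the left neighbour: whole-pixel shift to the right.
shift_one = convolve_row(row, [1, 0, 0])

# Weight split between left neighbour and centre: half-pixel shift.
shift_half = convolve_row(row, [0.5, 0.5, 0])
```

the half-pixel version softens the picture a little, since every output pixel is an average of two inputs.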

however, i like the idea of using motion-vectors for inpainting. that sounds like a pretty good idea - when an object moves underneath a subtitle, it is compensated from the last frame and moved over the sub. that could be acceptable quality without the need for a second clip (but it'd fall apart in static scenes).

MVs for simple frame-alignment seem like the wrong tool - the vectors work on blocks of 8x8 or 4x4 pixels, so the chances of getting a recognizably similar vector-field for the same frame in 2 clips would be quite low indeed.

perhaps DePan could handle it though. that moves the whole frame, rather than bits of it. you could interleave your 2 clips (at the same res) and then run depan on it, then split it back out into 2 clips, which hopefully will be in the same spatial location. that might destroy a lot of other pan information though.

Karyudo
12th July 2005, 07:37
This is a big-ass bump.

I have yet another project that would benefit greatly from being able to line up two sources, and this one is even nastier than before (and therefore even less fun to do by hand): using the fullscreen (sic) version of a video to help boost the resolution of a portion of the widescreen version.

For example, if you're capping a widescreen NTSC laserdisc of a 2.35:1 title, you're only getting about 270 lines of resolution. If you cap the fullscreen (sic) version of the same film, you're getting fully 480 lines of resolution. Obviously, you're missing picture information in each case -- but if you could easily combine the two captures in the area they overlap, you'd have something that's better than the sum of the parts. Wouldn't you?

Problem is, they don't call fullscreen (sic) "pan-and-scan" for nothing: very often there are artificial cuts and pans introduced where none existed in the original. This means lining things up by hand is pretty much impossible. You would likely have to mess with every scene -- and in some scenes, you'd have to mess with every frame.

Anybody got any ideas for being able to take two sources, scale and/or stretch and/or pad them both automagically so the picture information in each overlaps as closely as possible (some sort of overall difference-minimization on a frame-by-frame basis, using something like Subtract(), say?), and then scale back down to a user-selectable size?
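
For what it's worth, the difference-minimization part could look something like the brute-force Python sketch below. It is only a sketch under several assumptions: both frames are already at the same pixel scale, the smaller picture lies wholly inside the larger one, frames are plain lists of rows of luma values, and the function names are invented for this example. It tries every placement and keeps the one with the smallest sum of absolute differences.

```python
def sad(ref, patch, dx, dy):
    """Sum of absolute differences with patch placed at (dx, dy) in ref."""
    return sum(abs(ref[dy + y][dx + x] - patch[y][x])
               for y in range(len(patch))
               for x in range(len(patch[0])))


def best_offset(ref, patch):
    """Try every placement of patch inside ref; return (dx, dy, score)."""
    h, w = len(ref), len(ref[0])
    ph, pw = len(patch), len(patch[0])
    if ph > h or pw > w:
        raise ValueError("patch must fit inside the reference frame")
    candidates = ((sad(ref, patch, dx, dy), dx, dy)
                  for dy in range(h - ph + 1)
                  for dx in range(w - pw + 1))
    score, dx, dy = min(candidates)
    return dx, dy, score


# Synthetic 8x6 "widescreen" frame and a 4x3 crop taken from x=3, y=2,
# standing in for the pan-and-scan window.
ref = [[(7 * x + 13 * y) % 32 for x in range(8)] for y in range(6)]
patch = [row[3:7] for row in ref[2:5]]
found = best_offset(ref, patch)
```

On real captures the score would never reach zero, and an exhaustive search over full frames (let alone over scale as well) would be very slow, so this only shows the principle of picking the placement with the lowest difference.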

Just dreamin'... but some of the filters I see discussed here blow my little mind, so I figured I may as well dream out loud!

Firesurfer
4th August 2005, 09:05
@Karyudo

The laserdisc player I remember (Pioneer I think) had a zoom function with a movable zoom area. This way one could capture every "quadrant" of the movie, resulting in 4 clips with higher resolution that could be combined to one. Maybe there is a way to do that with your player.


After re-reading Karyudo's second post I actually understood what he really meant, so please discard the above as -nearly- (see next post by Mug Funky) unusable.

Mug Funky
4th August 2005, 09:59
you wouldn't have any more lines that way though. laserdisc's horizontal resolution is below the capturing rate, and vertical resolution on SDTV is fixed at 480/576, so the pioneer would've been upscaling the image in its zoom mode. it could help in eliminating rainbows and noise though.