Old 13th April 2011, 04:23   #1  |  Link
jmac698
cutfinder/drop detection

Thanks to a script from Gavino, I know how to search through two videos, comparing them to find the differences. It reminds me of an insertion sort. The problem is the comparison stage; they have to be very similar in all properties to match just on lumadiff etc.
Is there a comparison more robust to noise?
Well, correlation of course, but is there some correlation filter which returns a script variable for the final answer?
Old 18th April 2011, 11:41   #2  |  Link
pbristow
It depends partly on what you want to do the matching for. Are you just trying to get the correct frame from A matched up with the correct frame from B, i.e. synchronise them? Or are you trying to measure some specific aspect of how video A differs from video B? (In which case, you need to choose a process that reduces the extraneous factors such as noise, but doesn't destroy or distort the characteristic you want to measure.)

For solving the first problem, here are some ideas:

1. Run the two videos through a de-noise stage (same process and parameters for both) before comparing them.

2. If there are big differences in the noise characteristics of the two sources, you could run the two videos through various different de-noising techniques (e.g. one spatial, one temporal; or a spatial soften with two different sets of parameters, one subtle, one more aggressive), then do cross-comparisons of each technique on video A against each technique on video B, and select whichever combination gives the closest matches.

3. Do something similar to what MVtools does for matching blocks, only applied to whole frames: Start by comparing really low-resolution versions of the two videos (e.g. 1/8th size), to get within a few frames of correct synch; then progressively improve the resolution (1/4 size, 1/2 size, full size) checking and refining the result as you go.
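A rough sketch of the coarse pass in idea 3 (untested; a and b stand for whatever your two source clips are, and the target sizes should stay mod-2 for YV12):

Code:
# compare heavily downscaled copies first, to get near the right offset cheaply
a8 = a.BilinearResize(a.Width/8, a.Height/8)
b8 = b.BilinearResize(b.Width/8, b.Height/8)
# ...search for the best offset on a8/b8, then repeat at /4, /2 and full size,
# narrowing the search range at each step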

As for how you calculate the correct frame offset between the two videos, the process I think you need is:

1. Compare video A, frame N, against video B frames N-2, N-1, N, N+1 and N+2 - i.e. for offsets of -2 to +2, find which is the closest match. Add that offset to a variable called "total_offset".
2. Repeat step one for the next frame, and the next... Keep a count of how many frames you've compared.
3. After some time (e.g. 10 frames), divide "total_offset" by the number of frames (use floating point), and round it to the nearest integer. If things are going well, this should equal the amount of adjustment you need to make to get the two videos in synch. If the video is fairly static, you may need to test over a longer sequence of frames.
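A minimal runtime sketch of step 1 (untested, and trimmed to offsets -1..+1 to keep it short; the clip names and sources are hypothetical, and the winning offset is just displayed rather than accumulated):

Code:
global a   = AviSource("videoA.avi")
global b   = AviSource("videoB.avi")
global bp1 = b.Trim(1, 0)       # b advanced one frame (offset +1)
global bm1 = b.Loop(2, 0, 0)    # frame 0 doubled: b delayed one frame (offset -1)
ScriptClip(a, """
  dm1 = LumaDifference(a, bm1)
  d0  = LumaDifference(a, b)
  dp1 = LumaDifference(a, bp1)
  best = (d0 <= dm1 && d0 <= dp1) ? 0 : (dm1 <= dp1 ? -1 : 1)
  Subtitle("best offset here: " + String(best))
""")

Extending it to +/-2 is more of the same, and total_offset can be accumulated in a global variable assigned from inside the runtime script.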
Old 18th April 2011, 11:59   #3  |  Link
pbristow
Gah! And now I re-read your question and realise you already have that second part covered. You meant correlation of values integrated over the frame (spatial correlation) - I was thinking of correlation against time! [SLAPS FOREHEAD]

My history in acoustic signal processing is catching me out there: We only had one dimension to worry about!

I don't know of any existing spatial correlation filters, but it should be easy to create one with the MT_lut family. (Having just created my first filter using those, I'm suddenly a big fan!)

You need to create the product of videoA * videoB, then sum or average that over the whole frame. Something like:

MT_lutxy(... "x y *" ...).calculate_the_average_luma()

will calculate the correlation for the case with no spatial offset between the two videos. You could use MT_convolution to create copies of video B with lots of different spatial offsets, calculate the correlation coefficient for each against (un-shifted) video A, and choose the spatial offset that gives the best fit (averaging the vector over several frames to find a consistent fit). Or you could go ahead and create a full correlation map over the whole frame, and output that as a video clip... but that would be VERY slow!
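For instance, something along these lines (untested; videoA and videoB are placeholders, and the product has to be scaled back into 8 bits because the lut result is clamped):

Code:
global prod = mt_lutxy(videoA, videoB, "x y * 255 /")  # per-pixel product, scaled to 0..255
ScriptClip(videoA, """
  Subtitle("mean product: " + String(AverageLuma(prod) * 255))
""")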

Using mt_luts might be more efficient: it allows you to do your calculation taking into account neighbouring pixels, so if you tell it to use *one* neighbouring pixel (and not the original position) each time, that might be more efficient than running MT_convolution lots of times.

Hope these musings help. Interesting problem!

Old 18th April 2011, 20:09   #4  |  Link
jmac698
Looks like this is going to be my next project. Another scripter is interested too. Our problem is capture cards which create freeze-frames (duplicated frames) and then drop frames a little later. The task is to fix a video by automatically detecting the missing frames and interpolating them, and also removing the duplicated frames. A further problem is to synchronise n videos (up to 5!) frame by frame.
You gave me a lot of ideas to think about.
It looks like dup, dedup, and getdubs all output a log file which could be used, so it would have to be two-pass for now, unfortunately. In my case I have even worse problems: the noise is extreme and always different on each copy/frame, so I might need to compare only the pixels that are good, using a mask (the damage is little white lines on the video, which I can detect).
By correlation I mean Pearson's coefficient; you can't calculate that in a script with MVTools. It's immune to brightness, contrast, and the order of pixels, and gives the best expectation in the presence of white noise.
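For reference, over the pixels of a frame that is:

Code:
r = sum((x-mx)*(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))

A brightness shift moves mx along with x and cancels in the subtraction, and a contrast scale cancels between numerator and denominator - which is why it's robust where a plain luma difference isn't.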
Old 19th April 2011, 09:07   #5  |  Link
pbristow
Quote:
Originally Posted by jmac698
By correlation I mean Pearson's coefficient; you can't calculate that in a script with MVTools.
[GOES GOOGLING] AH! Yes, that's basically what I was talking about, except I forgot a couple of details: subtracting the mean to make the signals symmetrical about zero before multiplying (something I never needed to do with the audio stuff, as the mean was already zero), and then dividing the product by the standard deviations. Have a heart, it's been over 20 years!

You *can* calculate this using MaskTools (not mvtools), but it would involve multiple steps and be very slow, and in most cases would be total overkill.

...But I might just do it sometime anyway, for the practice!
Old 19th April 2011, 16:01   #6  |  Link
jmac698
Well, actually, I tried. First of all, when you do divisions you generally need a lot of accuracy: I did numerical studies, and you need at least 16 bits (the values ranged up to 24 bits). Next, there's one step in the equation where you have an expression of 4 variables, which I couldn't do. Maybe you can find some way to do it.
Here's some code:
Code:
' Pearson's r between two lists, ref() and v(), each of length l
mx=0 : my=0 : sxy=0 : s2x=0 : s2y=0

For i=0 To l-1
	mx=mx+ref(i)
	my=my+v(i)
Next i

' means of each list
mx=mx/l
my=my/l

For i=0 To l-1
	sxy=sxy+(ref(i)-mx)*(v(i)-my)	' covariance sum
	s2x=s2x+(ref(i)-mx)^2		' variance sums
	s2y=s2y+(v(i)-my)^2
Next i

r=sxy/(Sqr(s2x)*Sqr(s2y))
PS: this is one of the reasons I was wishing for shift operators in MaskTools; I finally got them in a48, so it should be easier now anyway.
I calculated the means mx my as averages directly. I think you can even do that with a simple resize.
My DeepColor tools has some of the code needed (it does 16bit operations with masktools).

Old 19th April 2011, 16:22   #7  |  Link
pbristow
I'm not clear exactly what that code is doing, since you haven't identified the variables... But in any case, thinking about how you would do the calculation in a "normal" programming language is a red herring. The trick to using the MT_lut toolkit is to break down the expressions you're trying to evaluate into components that can each be computed in a single MT_(whatever) call, and then combine them.

A consequence of how they work is that the functions don't return single values, but complete video clips holding the results of whatever expression you give them, evaluated over the entire input clip(s). If you need a single value, such as the average of some expression evaluated over the whole frame, then you evaluate the expression using MT_(whatever) and pass the result to the runtime function AverageLuma. (Yes, you do have to get into using the runtime environment, alas, which is another thing that makes things slow.)

MT_lut calculations are done in (at least) 16 bits internally, by the way; you just have to make sure the *result* of each block of calculation (i.e. what comes out of the MT_(whatever) call) can be expressed adequately in 8 bits.
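To make the pattern concrete, a hedged example (untested; a and b are hypothetical clips) that gets the mean squared difference between two clips - the per-pixel expression is done by mt_lutxy, the frame-wide reduction by the runtime function:

Code:
global diff2 = mt_lutxy(a, b, "x y - x y - * 255 /")  # (x-y)^2 scaled into 8 bits
ScriptClip(a, """
  Subtitle("mean sq diff: " + String(AverageLuma(diff2) * 255))
""")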
Old 19th April 2011, 21:38   #8  |  Link
jmac698
mx=mean of x
x=ref(i)=the first list of numbers to correlate
y=v(i)=the second list of numbers to correlate
my=mean of y
sxy=sum of (x-mx)*(y-my)
s2x=sum of (x-mx)^2
s2y=sum of (y-my)^2
r=the correlation (between -1 and 1, where close to 1 means the originals are equivalent)
MT has 32-bit precision for intermediates, but I can only store an 8-bit result. If I break the calculation into steps, I have to store a result, and that stored result then limits the precision. And you don't need AverageLuma; you can either resize, or have a copy of the average in all pixels at once.
Do you want to try to write it? It would be useful; I just don't think it's possible.

Here's the first two calculations:
Code:
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$648080)#luma=100
r=blankclip(l,color_yuv=$C88080)#luma=200
x=stackhorizontal(l,r)
y=stackhorizontal(r,l)
mx=mt_lutf(x,x,expr="x")
my=mt_lutf(y,y,expr="x")
mx#all grey image with luma=150

Old 20th April 2011, 00:27   #9  |  Link
Gavino
Quote:
Originally Posted by jmac698 View Post
I just don't think it's possible.
Sure it's possible. Perhaps tedious and messy, but certainly possible.

First notice that the formula for the correlation can be rewritten as
r = (avg(x*y) - avg(x)*avg(y)) / (std(x)*std(y))
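(That's the standard identity avg((x-mx)*(y-my)) = avg(x*y) - avg(x)*avg(y), which you get by expanding the product and using the linearity of avg; the same trick gives std(x)^2 = avg(x^2) - avg(x)^2, which comes in handy later.)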

All these operations are available in mt_lutf and mt_lutxy, so it's just a question of putting it all together.
(left as an exercise for the reader)
Old 21st April 2011, 19:27   #10  |  Link
pbristow
Quote:
Originally Posted by jmac698
mx=mean of x
x=ref(i)=the first list of numbers to correlate
y=v(i)=the second list of numbers to correlate
my=mean of y
sxy=sum of (x-mx)*(y-my)
s2x=sum of (x-mx)^2
s2y=sum of (y-my)^2
...
This approach is a bit like trying to put an orange through a bacon slicer: you won't get nice neat slices of orange; you'll get a lot of ragged orange fragments and a splatter of juice everywhere (i.e. the bits of data that get lost from storing 16-bit numbers in 8-bit locations).

What you need to do is look under the skin of the orange, and see the natural structure of it. Instead of trying to create slices of even thickness, see that the orange naturally falls into a set of similar-shaped radial segments, each of which has a natural boundary that keeps the juice in (i.e. the interfaces between them can all be done in 8 bits, without data loss).

That's what Gavino's done.
Old 22nd April 2011, 00:30   #11  |  Link
jmac698
But there is data loss. His formula is right, and it can be done in fewer steps, but I still need to extract 16 bits from those operations. You gave me an idea, though; however, even with 2 pixels I still get numbers bigger than 8 bits. I see at least about 6 steps there. The maximum intermediate values seem to occur when half the pixels of each image are white and half are black (and the same ones are white in each).
Old 22nd April 2011, 00:54   #12  |  Link
Gavino
Part of the 'messiness' I was alluding to was the probable need to scale intermediate results to maintain precision. For example, x*y has a range of 0-65025, so you need to store it as x*y/255 or possibly as two separate 8-bit quantities (e.g. 200*100 = 20000 splits into a high byte of 78 and a low byte of 32, since 78*256 + 32 = 20000). Another is that the final result in the range -1 to 1 should be scaled to be between 0 and 255. There is also the question of how you handle the case of a uniform clip (std()=0, so the correlation is undefined).
Quote:
Originally Posted by jmac698
The maximum intermediate values seem to be when half the pixels of each image are white and half are black (and the same ones are white in each).
That will give the greatest std deviation, I think.
I believe the std cannot exceed 128, so you could gain more precision by multiplying it by 2.
Old 22nd April 2011, 10:50   #13  |  Link
jmac698
How's this:
Code:
#Correlation test v0.1 by jmac
#trying to compute a correlation in script between video clips x and y, result as r
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$648080)#luma=100
r=blankclip(l,color_yuv=$C88080)#luma=200
x=stackhorizontal(l,r)
y=stackhorizontal(r,l)
#r=(avg(xy)-avg(x)*avg(y))/(std(x)*std(y))
mx=mt_lutf(x,x,expr="x")#all grey image with luma=150
my=mt_lutf(y,y,expr="x")
stdx=mt_lutf(x,x,"std",expr="x")
stdy=mt_lutf(y,y,"std",expr="x")
mxyt=mt_lutxy(x,y,"x y * 256 /")
mxy=mt_lutf(mxyt,mxyt,expr="x")
tops=mt_lutxyz(mxy,mx,my,"x 256 * x y * - 512 / 128 +")
r=mt_lutxyz(tops,stdx,stdy,"x 128 - y z * / abs 255 *")
v1=scriptclip(mx.pointresize(100,20),"""subtitle("mx="+string(averageluma,"%.0f"))""")
v2=scriptclip(my.pointresize(100,20),"""subtitle("my="+string(averageluma,"%.0f"))""")
v3=scriptclip(stdx.pointresize(100,20),"""subtitle("stdx="+string(averageluma,"%.0f"))""")
v4=scriptclip(stdy.pointresize(100,20),"""subtitle("stdy="+string(averageluma,"%.0f"))""")
v5=scriptclip(mxyt.pointresize(100,20),"""subtitle("mxyt="+string(averageluma,"%.0f"))""")
v6=scriptclip(mxy.pointresize(100,20),"""subtitle("mxy="+string(averageluma,"%.0f"))""")
v7=scriptclip(tops.pointresize(100,20),"""subtitle("tops="+string(averageluma,"%.0f"))""")
v8=scriptclip(r.pointresize(100,20),"""subtitle("r="+string(averageluma,"%.0f"))""")
stackvertical(v1,v2,v3,v4,v5,v6,v7,v8)
Still not quite right, however...
Old 22nd April 2011, 11:55   #14  |  Link
Gavino
Quote:
Originally Posted by jmac698
Code:
tops=mt_lutxyz(mxy,mx,my,"x 256 * x y * - 512 / 128 +")
Still not quite right, however...
You're not using z in that expression!
It should be "x 256 * z y * - 512 / 128 +"

However, a simpler (and faster) way is to note that each avg or std component of the 'r' formula is a single number (same for all pixels), so it can be more easily done in floating point precision directly within ScriptClip.

Code:
ScriptClip(x, """
  axy = AverageLuma(mxyt)
  ax = AverageLuma(x)
  ay = AverageLuma(y)
  sx = AverageLuma(stdx)
  sy = AverageLuma(stdy)
  r = (256*axy - ax*ay)/(sx*sy) # in [-1, 1]
  Subtitle(...)
""")
For speed, the stdx and stdy clips could be PointResized down to 2x2 outside the ScriptClip, and apart from mxyt, the others (mx, my, etc) are no longer needed.

For greater accuracy, you could even compute x*y in full precision as two separate 8-bit clips and use them in the calculation instead of mxyt.
Old 22nd April 2011, 23:31   #15  |  Link
jmac698
Ok, this works except for some little details (what is the proper correlation of x=(103,201) and y=(201,201)?)
Code:
#Correlation test v0.2 by jmac
#big thanks to Gavino, again :)
#trying to compute a correlation in script between video clips x and y, result as r
#this works, but there may be accuracy problems in std()
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$678080)#luma=103
r=blankclip(l,color_yuv=$C98080)#luma=201
x=stackhorizontal(l,r)
y=stackhorizontal(r,l)#change to r,r to get r=0/0
#r=(avg(xy)-avg(x)*avg(y))/(std(x)*std(y))

stdx=mt_lutf(x,x,"std",expr="x")
stdy=mt_lutf(y,y,"std",expr="x")
xyhigh=mt_lutxy(x,y,"x y * 8 >>u")#don't use 256 / it rounds i.e. 80.8=>81
xylow=mt_lutxy(x,y,"x y * 255 &u")
blankclip#something to display results on
ScriptClip("""
mx=averageluma(x)
my=averageluma(y)
mxyhigh=averageluma(xyhigh)
mxylow=averageluma(xylow)
mxy=mxyhigh*256+mxylow
stx=averageluma(stdx)
sty=averageluma(stdy)
top=mxy-mx*my
bot=stx*sty
r=top/bot
subtitle("mx="+string(mx)+" my="+string(my))
subtitle(y=20,"mxyhigh="+string(mxyhigh)+" mxylow="+string(mxylow)+" mxy="+string(mxy))
subtitle(y=40,"mx*my="+string(mx*my)+" mxy-mx*my="+string(top))
subtitle(y=60,"stx="+string(stx)+" stdy="+string(sty)+" stx*sty="+string(stx*sty))
subtitle(y=80,"r="+string(top/bot))
""")
Old 23rd April 2011, 01:30   #16  |  Link
jmac698
So what does /0 mean? r = (avg(xy) - avg(x)*avg(y)) / (std(x)*std(y))
std(x) is essentially the average of (x-mx)^2 (square-rooted).
So std(x)=0 when every pixel equals the mean of all pixels, i.e. when the clip is a flatfield.
We get r=x/0 when one or both clips are a flatfield. I only want a flatfield to match another flatfield, so most of the time I should set r=x/0=0 so there's no match.
When is the numerator 0? Take a two-pixel example: avg(xy)=avg(x)*avg(y) means (x0*y0+x1*y1)/2 = ((x0+x1)/2)*((y0+y1)/2), which multiplies out to (x1-x0)*(y1-y0)=0.
That means one of them is a flatfield (I'm jumping ahead here).
There's no way to tell from r alone whether both are flatfields, so if the divisor is 0 I would assume they don't match unless they are both flatfields. I would test that by seeing if they both have the same low std.
Old 23rd April 2011, 08:24   #17  |  Link
jmac698
Well, I have good news and bad news.
First the good news: I got it to work, and in practice it seems pretty good. I didn't get a divide-by-0 in real life. The accuracy of std doesn't seem to matter too much, though I didn't test thoroughly.
The bad news: I can't distinguish my frames. Matching frames with different noise have r>.90, but they score only about .001 higher than a comparison against the previous frame, even with large motion. Sometimes the wrong frame matches better. It seems the noise is too much, and denoising doesn't seem to help.
Looking at the current frame minus the previous one, when they are different there are obvious "shadows" around the edges, so maybe a count of "8-connected" pixels would help to detect this (a rough sketch follows). Otherwise, is there some more robust way to match two frames?
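One possible starting point for that, as a guess (untested; x is the clip under test, and the threshold of 20 needs tuning): binarize the absolute difference against the previous frame, then erode it so isolated noise pixels drop out and only connected "shadow" regions contribute:

Code:
cur  = x
prev = x.Loop(2, 0, 0)   # x delayed one frame
# 255 where |cur - prev| exceeds the threshold, 0 elsewhere
global d8 = mt_lutxy(cur, prev, "x y - abs 20 > 255 0 ?").mt_inpand()
# mt_inpand erodes: a pixel survives only if its whole 3x3 neighbourhood is set
ScriptClip(cur, """Subtitle("shadow score: " + String(AverageLuma(d8)))""")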
Old 23rd April 2011, 08:52   #18  |  Link
pbristow
Another trick to improve accuracy: the time when you need the most precision in the std is when both stds are low. A low denominator means a high final value, and small steps in the denominator will have a much bigger effect on that value than when the denominator is higher.

Three possible tricks to achieve the required extra precision at low values of std, by trading off against precision at the (less significant) higher values:

1. Store the natural logarithm of the std. This gives equal percentage precision across the whole range of values. (Problem: is there an operator to create logs in the MT_lut... set? If I wasn't so lazy I could look at the docs and check.)
2. Store the square root of the std. This still allows more precision (percentage wise) at higher values than low ones, but may be computationally quicker/easier to deal with. (Note: The computation gets done when opening the script, not during execution, so overall performance shouldn't really be affected.)
3. Store the reciprocal of the std. In a sense this is the most natural option, since we'll be dividing by the sum of the stds anyway.

The snag is that MT_lutf doesn't allow extra operations to be done after computing the std, which means you end up having to implement the std operation yourself before applying the final precision-saving step. Messy.

I wonder if a strong enough case could be made for adding a "post-expression" option to MT_lutf?
Old 23rd April 2011, 09:36   #19  |  Link
Gavino
Quote:
Originally Posted by pbristow
The snag is that MT_lutf doesn't allow extra operations to be done after computing the std, which means you end up having to implement the std operation yourself before applying the final precision-saving step. Messy.

I wonder if a strong enough case could be made for adding a "post-expression" option to MT_lutf?
It does allow extra operations, e.g. mt_lutf(c, c, "std", "x log") would store the log of the std. However, as jmac698 reports, the std value itself is only delivered as an 8-bit integer. I hadn't realised this, but it's obvious when you consider how the mt_lut* functions work - as you said:
Quote:
The computation gets done when opening the script, not during execution
The good news is that std(x) can be calculated as sqrt(avg(x^2)-avg(x)^2), which can be done using AverageLuma, with mt_lut providing the clip holding x^2 (or two clips, as was done for x*y).
Old 23rd April 2011, 16:56   #20  |  Link
jmac698
The log idea is a great trick. I once used a log table to divide (z/d*32) and to do some other operations in a 3D engine. Tables can be extremely powerful.

Yep, when I saw r>>1 I knew there was a problem. I know the std is internally available to more precision, because I'm comparing the true values against the MaskTools values and can see that MaskTools is rounding. A similar thing was updated in the last alpha, which only took a few days, so it should be a quick fix.
Here is a replacement function:
Code:
function std(clip x) {
    # split x*x (range 0-65025) into high and low bytes, so no precision is lost
    x2high=mt_lut(x,"x x * 8 >>u")
    x2low=mt_lut(x,"x x * 255 &u")
    avgx2=averageluma(x2high)*256+averageluma(x2low)
    avgx=averageluma(x)
    # std(x) = sqrt(avg(x^2) - avg(x)^2); the last expression is the return value
    sqrt(avgx2-pow(avgx,2))
}
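Presumably that gets called from inside a runtime script, since AverageLuma needs a current frame - something like (untested, with x declared global):

Code:
global x = x
ScriptClip(x, """Subtitle("std(x) = " + String(std(x)))""")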
Now my tests are much more reliable: it fully works in the noisy panning scene, and only fails where you'd expect - when there's virtually no movement. The difference between (same frame, different noise) and (different frames, different noise) is a small but significant .005 or so. The correlation of (same frame, different noise) is still poor at r=.9, which shows how much noise I'm dealing with.

Moving on, how should I detect low-motion frames, so I can use fuzzy logic to ignore the correlation's indications on those frames? I'll just assume that frames the correlation flags as out-of-sync, when they sit in a series of low-motion frames surrounded by in-sync frames, are really in sync. (A rough sketch of one detection idea follows.)
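One cheap way to flag them, as a hedged guess (untested; the threshold is arbitrary and needs tuning): use the runtime difference functions on the reference clip itself:

Code:
ScriptClip(x, """
  lowmotion = YDifferenceToNext(x) < 0.5   # hypothetical threshold
  Subtitle(lowmotion ? "low motion: distrust r" : "r usable")
""")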

It's a lot slower now.