Welcome to Doom9's Forum, THE in place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. 
13th April 2011, 04:23  #1  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

cutfinder/drop detection
Thanks to a script from Gavino, I know how to search through two videos, comparing them to find the differences. It reminds me of an insertion sort. The problem is the comparison stage: the frames have to be very similar in all properties to match on a simple luma difference etc.
Is there a comparison that is more robust to noise? Correlation, of course, but is there an existing correlation filter that returns its result as a script variable? 
18th April 2011, 11:41  #2  Link 
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

It depends partly on what you want to do the matching for. Are you just trying to get the correct frame from A matched up with the correct frame from B, i.e. synchronise them? Or are you trying to measure some specific aspect of how video A differs from video B? (In which case, you need to choose a process that reduces the extraneous factors such as noise, but doesn't destroy or distort the characteristic you want to measure.)
For solving the first problem, here are some ideas:

1. Run the two videos through a denoise stage (same process and parameters for both) before comparing them.

2. If there are big differences in the noise characteristics of the two sources, you could run the two videos through various different denoising techniques (e.g. one spatial, one temporal; or spatial softening with two different sets of parameters, one subtle, one more aggressive; etc.); then do cross-comparisons for each technique on video A against each technique on video B, and select whichever combination gives the closest matches.

3. Do something similar to what MVTools does for matching blocks, only applied to whole frames: start by comparing really low-resolution versions of the two videos (e.g. 1/8th size) to get within a few frames of correct synch; then progressively increase the resolution (1/4 size, 1/2 size, full size), checking and refining the result as you go.

As for how you calculate the correct frame offset between the two videos, the process I think you need is:

1. Compare video A, frame N, against video B frames N-2, N-1, N, N+1 and N+2 (i.e. for an offset of -2 to +2), and find which is the closest match. Add that offset to a variable called "total_offset".

2. Repeat step one for the next frame, and the next... Keep a count of how many frames you've compared.

3. After some time (e.g. 10 frames), divide "total_offset" by the number of frames (use floating point) and round it to the nearest integer. If things are going well, this should equal the amount of adjustment you need to make to get the two videos in synch. If the video is fairly static, you may need to test over a longer sequence of frames. 
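[Editor's sketch] The offset-averaging procedure in those three steps can be illustrated in plain Python, with frame comparison reduced to a summed absolute difference over toy 1-D "frames". All names here are hypothetical, not from any plugin:

```python
def frame_diff(a, b):
    """Sum of absolute differences between two 'frames' (flat lists of pixels)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def estimate_offset(video_a, video_b, search=2, span=10):
    """Average, over `span` frames, the best-matching offset of B relative to A,
    then round to the nearest integer (steps 1-3 above)."""
    total_offset, count = 0, 0
    for n in range(search, min(len(video_a), len(video_b)) - search):
        if count >= span:
            break
        # find the offset in [-search, +search] whose frame is the closest match
        best = min(range(-search, search + 1),
                   key=lambda d: frame_diff(video_a[n], video_b[n + d]))
        total_offset += best
        count += 1
    return round(total_offset / count)

# toy example: video B is video A delayed by one frame
a = [[i, i + 1] for i in range(20)]
b = [[0, 0]] + a[:-1]
print(estimate_offset(a, b))  # 1 - B lags A by one frame
```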
18th April 2011, 11:59  #3  Link 
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

Gah! And now I've reread your question and realise you already have that second part covered. You meant correlation of values integrated over the frame (spatial correlation); I was thinking of correlation over time! [SLAPS FOREHEAD]
My history in acoustic signal processing is catching me out there: we only had one dimension to worry about!

I don't know of any existing spatial correlation filters, but it should be easy to create one with the mt_lut family. (Having just created my first filter using those, I'm suddenly a big fan! ) You need to create the product videoA * videoB, then sum or average that over the whole frame. Something like mt_lutxy(... "x y *" ...).calculate_the_average_luma() will calculate the correlation for the case with no spatial offset between the two videos.

You could use mt_convolution to create copies of video B with lots of different spatial offsets, calculate the correlation coefficient for each against the (unshifted) video A, and choose the spatial offset that gives the best fit (averaging the vector over several frames to find a consistent fit). Or you could go ahead and create a full correlation map over the whole frame and output that as a video clip... but that would be VERY slow! Using mt_luts might be more efficient: it lets you take neighbouring pixels into account in the calculation, so if you tell it to use *one* neighbouring pixel (and not the original position) each time, that might be cheaper than running mt_convolution lots of times.

Hope these musings help. Interesting problem!

Last edited by pbristow; 18th April 2011 at 12:07. 
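[Editor's sketch] The product-and-average idea, as a pure-Python stand-in for mt_lutxy("x y *") followed by an average (1-D "frames" for brevity; this is an illustration of the arithmetic, not masktools code):

```python
def product_mean(a, b):
    """avg(A*B): the un-normalised spatial correlation at zero offset."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

frame = [10, 250, 10, 250]
shifted = frame[1:] + frame[:1]  # the same frame, spatially offset by one pixel
print(product_mean(frame, frame))    # 31300.0 - self-match scores high
print(product_mean(frame, shifted))  # 2500.0  - the offset copy scores much lower
```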
18th April 2011, 20:09  #4  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

Looks like this is going to be my next project. Another scripter is interested too. Our problem is capture cards which create freeze-frames (duplicated frames) and then drop frames a little after. The task is to fix a video by automatically detecting missing frames and interpolating them, and also to remove the duplicated frames. A further problem is to synchronise n videos (up to 5!) frame by frame.
You gave me a lot of ideas to think about. It looks like Dup, DeDup, and getdubs all output a log file which could be used, so it would have to be two-pass for now, unfortunately. In my case I have even worse problems: the noise is extreme and different on every copy/frame, so I might need to compare only the pixels that are good, using a mask (the damage is little white lines on the video, which I can detect). By correlation I mean Pearson's coefficient; you can't calculate that in script with MVTools. It's invariant to brightness, contrast, and the order of pixels, and gives the best estimate in the presence of white noise. 
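[Editor's sketch] For reference, a minimal pure-Python Pearson coefficient over flattened pixel lists, with a quick check of the brightness/contrast invariance claimed above (the gain and offset values are arbitrary):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient over two equal-length pixel lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s2x = sum((x - mx) ** 2 for x in xs)
    s2y = sum((y - my) ** 2 for y in ys)
    return sxy / (sqrt(s2x) * sqrt(s2y))

a = [10, 40, 90, 200, 30]
b = [round(1.5 * v + 20) for v in a]  # same frame after a contrast/brightness change
print(round(pearson(a, a), 6))  # 1.0
print(round(pearson(a, b), 6))  # 1.0 - unaffected by the gain/offset
```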
19th April 2011, 09:07  #5  Link  
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

You *can* calculate this using MaskTools (not mvtools), but it would involve multiple steps and be very slow, and in most cases would be total overkill. ...But I might just do it sometime anyway, for the practice! 

19th April 2011, 16:01  #6  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

Well, actually I tried. First of all, when you do divisions you generally need a lot of accuracy; I did numerical studies, and you need at least 16 bits (the values ranged up to 24 bits). Next, there's one step in the equation where you have an expression of 4 variables, which I couldn't do. Maybe you can find some way to do it.
Here's some code: Code:
For i=0 To l-1
    mx=mx+ref(i)
    my=my+v(i)
Next i
mx=mx/l
my=my/l
For i=0 To l-1
    sxy=sxy+(ref(i)-mx)*(v(i)-my)
    s2x=s2x+(ref(i)-mx)^2
    s2y=s2y+(v(i)-my)^2
Next i
r=sxy/(Sqr(s2x)*Sqr(s2y))

I calculated the means mx, my as averages directly. I think you can even do that with a simple resize. My DeepColor tools have some of the code needed (they do 16-bit operations with masktools).

Last edited by jmac698; 19th April 2011 at 16:23. 
19th April 2011, 16:22  #7  Link 
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

I'm not clear exactly what that code is doing, since you haven't identified the variables... But in any case, thinking about how you would do the calculation in a "normal" programming language is a red herring. The trick to using the mt_lut toolkit is to break down the expressions you're trying to evaluate into components that can each be computed in a single mt_(whatever) call, and then combine them.

One complication is that the functions don't return single values, but complete video clips containing the results of whatever expression you give them, evaluated over the entire input clip(s). If you need a single value, such as the average of some expression evaluated over the whole frame, then you evaluate the expression using mt_(whatever) and pass the result to the runtime function AverageLuma. (Yes, you do have to get into using the runtime environment, alas, which is another thing that makes things slow.)
MT_lut calculations are done in (at least) 16 bits internally, by the way; you just have to make sure the *result* of each block of calculation (i.e. what comes out of the MT_(whatever) call) can be expressed adequately in 8 bits. 
19th April 2011, 21:38  #8  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

mx = mean of x
my = mean of y
x = ref(i) = the first list of numbers to correlate
y = v(i) = the second list of numbers to correlate
sxy = sum of (x-mx)*(y-my)
s2x = sum of (x-mx)^2
s2y = sum of (y-my)^2
r = the correlation (between -1 and 1, where close to 1 means the originals are equivalent)

masktools has 32-bit precision for intermediates, but I can only store an 8-bit result. If I break the calculation into steps, I have to store a result, and that result then limits the precision. And you don't need AverageLuma; you can either resize or have a copy of the average in all pixels at once. Do you want to try to write it? It would be useful, I just don't think it's possible. Here are the first two calculations: Code:
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$648080) #luma=100
r=blankclip(l,color_yuv=$C88080) #luma=200
x=stackhorizontal(l,r)
y=stackhorizontal(r,l)
mx=mt_lutf(x,x,expr="x")
my=mt_lutf(y,y,expr="x")
mx #all-grey image with luma=150

Last edited by jmac698; 19th April 2011 at 22:00. 
20th April 2011, 00:27  #9  Link 
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,380

Sure it's possible. Perhaps tedious and messy, but certainly possible.
First, notice that the formula for the correlation can be rewritten as

r = (avg(x*y) - avg(x)*avg(y)) / (std(x)*std(y))

All these operations are available in mt_lutf and mt_lutxy, so it's just a question of putting it all together. (Left as an exercise for the reader. ) 
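[Editor's sketch] A quick numerical check, in plain Python with arbitrary sample values, that this rewriting agrees with the textbook definition:

```python
from math import sqrt

xs = [3, 7, 1, 9, 4]
ys = [2, 8, 0, 7, 5]
n = len(xs)

# textbook form: sum of products of deviations over the product of deviations' norms
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r1 = sxy / (sqrt(sum((x - mx) ** 2 for x in xs)) *
            sqrt(sum((y - my) ** 2 for y in ys)))

# rewritten form: r = (avg(x*y) - avg(x)*avg(y)) / (std(x)*std(y))
avg = lambda vs: sum(vs) / len(vs)
std = lambda vs: sqrt(avg([v * v for v in vs]) - avg(vs) ** 2)
r2 = (avg([x * y for x, y in zip(xs, ys)]) - avg(xs) * avg(ys)) / (std(xs) * std(ys))

print(abs(r1 - r2) < 1e-9)  # True - the two forms agree
```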
21st April 2011, 19:27  #10  Link  
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

What you need to do is look under the skin of the orange and see its natural structure. Instead of trying to cut slices of even thickness, see that the orange naturally falls into a set of similar-shaped radial segments, each of which has a natural boundary that keeps the juice in (i.e. the interfaces between them can all be done in 8 bits, without data loss). That's what Gavino's done. 

22nd April 2011, 00:30  #11  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

But there is data loss. His formula is right, and can be done in fewer steps, but I still need to extract 16 bits from those operations. You gave me an idea, though; however, even with 2 pixels I still get numbers wider than 8 bits. I see at least 6 steps there. The maximum intermediate values seem to occur when half the pixels of each image are white and half are black (and the same ones are white in each).

22nd April 2011, 00:54  #12  Link  
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,380

Part of the 'messiness' I was alluding to was the probable need to scale intermediate results to maintain precision. For example, x*y has a range of 0 to 65025, so you need to store it as x*y/255, or possibly as two separate 8-bit quantities. Another is that the final result, in the range -1 to 1, should be scaled to lie between 0 and 255. There is also the question of how you handle the case of a uniform clip (std()=0, so the correlation is undefined).
I believe the std cannot exceed 128, so you could gain more precision by multiplying it by 2. 
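[Editor's sketch] The split into two separate 8-bit quantities is lossless, which is easy to verify exhaustively. A plain-Python sketch of the idea (hypothetical helper names, not masktools code):

```python
# split a product of two 8-bit values into two 8-bit "planes" and rebuild it exactly
def split_product(x, y):
    p = x * y                 # 0 .. 65025 for 8-bit x, y
    return p >> 8, p & 255    # high byte (0..254), low byte (0..255)

def join_product(high, low):
    return high * 256 + low

# exhaustive check over all 8-bit pixel pairs: the round trip is exact
assert all(join_product(*split_product(x, y)) == x * y
           for x in range(256) for y in range(256))
print(split_product(255, 255))  # (254, 1) - the maximum product, 65025
```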

22nd April 2011, 10:50  #13  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

How's this:
Code:
#Correlation test v0.1 by jmac
#trying to compute a correlation in script between video clips x and y, result as r
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$648080) #luma=100
r=blankclip(l,color_yuv=$C88080) #luma=200
x=stackhorizontal(l,r)
y=stackhorizontal(r,l)
#r=(avg(x*y)-avg(x)*avg(y))/(std(x)*std(y))
mx=mt_lutf(x,x,expr="x") #all-grey image with luma=150
my=mt_lutf(y,y,expr="x")
stdx=mt_lutf(x,x,"std",expr="x")
stdy=mt_lutf(y,y,"std",expr="x")
mxyt=mt_lutxy(x,y,"x y * 256 /")
mxy=mt_lutf(mxyt,mxyt,expr="x")
tops=mt_lutxyz(mxy,mx,my,"x 256 * x y * - 512 / 128 +")
r=mt_lutxyz(tops,stdx,stdy,"x 128 - y z * / abs 255 *")
v1=scriptclip(mx.pointresize(100,20),"""subtitle("mx="+string(averageluma,"%.0f"))""")
v2=scriptclip(my.pointresize(100,20),"""subtitle("my="+string(averageluma,"%.0f"))""")
v3=scriptclip(stdx.pointresize(100,20),"""subtitle("stdx="+string(averageluma,"%.0f"))""")
v4=scriptclip(stdy.pointresize(100,20),"""subtitle("stdy="+string(averageluma,"%.0f"))""")
v5=scriptclip(mxyt.pointresize(100,20),"""subtitle("mxyt="+string(averageluma,"%.0f"))""")
v6=scriptclip(mxy.pointresize(100,20),"""subtitle("mxy="+string(averageluma,"%.0f"))""")
v7=scriptclip(tops.pointresize(100,20),"""subtitle("tops="+string(averageluma,"%.0f"))""")
v8=scriptclip(r.pointresize(100,20),"""subtitle("r="+string(averageluma,"%.0f"))""")
stackvertical(v1,v2,v3,v4,v5,v6,v7,v8) 
22nd April 2011, 11:55  #14  Link  
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,380

It should be "x 256 * z y * - 512 / 128 +"

However, a simpler (and faster) way is to note that each avg or std component of the 'r' formula is a single number (the same for all pixels), so it can be more easily handled in floating-point precision directly within ScriptClip. Code:
ScriptClip(x, """
    axy = AverageLuma(mxyt)
    ax = AverageLuma(x)
    ay = AverageLuma(y)
    sx = AverageLuma(stdx)
    sy = AverageLuma(stdy)
    r = (256*axy - ax*ay)/(sx*sy) # in [-1, 1]
    Subtitle(...)
""")

For greater accuracy, you could even compute x*y in full precision as two separate 8-bit clips and use them in the calculation instead of mxyt. 
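[Editor's sketch] In plain-Python terms, with AverageLuma replaced by an ordinary mean and std computed as sqrt(E[x^2] - E[x]^2), Gavino's combination step amounts to:

```python
from math import sqrt

def avg(vs):
    return sum(vs) / len(vs)

def std(vs):
    # population standard deviation via E[x^2] - E[x]^2
    return sqrt(avg([v * v for v in vs]) - avg(vs) ** 2)

def frame_correlation(x, y):
    # plays the role of AverageLuma(mxyt), where mxyt carries x*y/256
    axy = avg([px * py for px, py in zip(x, y)]) / 256
    return (256 * axy - avg(x) * avg(y)) / (std(x) * std(y))

x = [100, 200, 100, 200]
y = [200, 100, 200, 100]
print(frame_correlation(x, y))  # -1.0 - perfectly anti-correlated frames
```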

22nd April 2011, 23:31  #15  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

Ok, this works except for some little details (what is the proper correlation of x=(103,201) and y=(201,201)?)
Code:
#Correlation test v0.2 by jmac
#big thanks to Gavino, again :)
#trying to compute a correlation in script between video clips x and y, result as r
#this works, but there may be accuracy problems in std()
l=blankclip(60,16,32,pixel_type="YV12",color_yuv=$678080) #luma=103
r=blankclip(l,color_yuv=$C98080) #luma=201
x=stackhorizontal(l,r)
y=stackhorizontal(r,l) #change to r,r to get r=0/0
#r=(avg(x*y)-avg(x)*avg(y))/(std(x)*std(y))
stdx=mt_lutf(x,x,"std",expr="x")
stdy=mt_lutf(y,y,"std",expr="x")
xyhigh=mt_lutxy(x,y,"x y * 8 >>u") #don't use "256 /", it rounds, i.e. 80.8 => 81
xylow=mt_lutxy(x,y,"x y * 255 &u")
blankclip() #something to display results on
ScriptClip("""
mx=averageluma(x)
my=averageluma(y)
mxyhigh=averageluma(xyhigh)
mxylow=averageluma(xylow)
mxy=mxyhigh*256+mxylow
stx=averageluma(stdx)
sty=averageluma(stdy)
top=mxy-mx*my
bot=stx*sty
r=top/bot
subtitle("mx="+string(mx)+" my="+string(my))
subtitle(y=20,"mxyhigh="+string(mxyhigh)+" mxylow="+string(mxylow)+" mxy="+string(mxy))
subtitle(y=40,"mx*my="+string(mx*my)+" mxy-mx*my="+string(top))
subtitle(y=60,"stx="+string(stx)+" stdy="+string(sty)+" stx*sty="+string(stx*sty))
subtitle(y=80,"r="+string(top/bot))
""") 
23rd April 2011, 01:30  #16  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

So what does /0 mean?

r = (avg(x*y) - avg(x)*avg(y)) / (std(x)*std(y))

std(x) is essentially the sum of (x-mx)^2, so std(x)=0 when every pixel equals the mean of all pixels, i.e. the frame is a flat field. We get r=x/0 when one or both clips are a flat field. I only want a flat field to match another flat field, so most of the time I should set r=x/0=0, meaning no match.

When is the numerator 0? When x1*y1+x0*y0 = (x1+x0)*(y1+y0) (i.e. x1*y1+x0*y0+x1*y0+x0*y1). That means one of them is a flat field (I'm jumping ahead here). There's no way to tell from the correlation alone whether both are flat fields, so if the divisor is 0 I would assume the frames don't match unless both are flat. I would test that by checking whether both have the same low std. 
23rd April 2011, 08:24  #17  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

Well I have good news and bad news.
First the good news: I got it to work, and in practice it seems pretty good. I didn't get a divide-by-zero in real life, and the accuracy of the std doesn't seem to matter too much, though I didn't test thoroughly.

The bad news: I can't distinguish my frames. The same frames with different noise have r > 0.90, but a true match scores only about 0.001 higher than a comparison against the previous frame, even with large motion. Sometimes the wrong frame matches better. It seems the noise is too much, and denoising doesn't seem to help. Looking at the current frame minus the previous one, when they are different there are obvious "shadows" around edges, so maybe a count of 8-connected pixels would help to detect this. Otherwise, is there some more robust way to match two frames? 
23rd April 2011, 08:52  #18  Link 
Registered User
Join Date: Jun 2009
Location: UK
Posts: 262

Another trick to improve accuracy: the time when you need the most precision in the std is when both stds are low. A low denominator means a high final value, and small steps in the denominator have a much bigger effect on that value than when the denominator is higher.
Three possible tricks to achieve the required extra precision at low values of std, by trading off precision at the (less significant) higher values:

1. Store the natural logarithm of the std. This gives equal percentage precision across the whole range of values. (Problem: is there an operator to create logs in the mt_lut... set? If I weren't so lazy I could look at the docs and check... )

2. Store the square root of the std. This still allows more precision (percentage-wise) at higher values than at low ones, but may be computationally quicker/easier to deal with. (Note: the computation gets done when opening the script, not during execution, so overall performance shouldn't really be affected.)

3. Store the reciprocal of the std. In a sense this is the most natural option, since we'll be dividing by the product of the stds anyway. The snag is that mt_lutf doesn't allow extra operations after computing the std, which means you end up having to implement the std operation yourself before applying the final precision-saving step. Messy. I wonder if a strong enough case could be made for adding a "post-expression" option to mt_lutf? 
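[Editor's sketch] Trick 2 can be illustrated numerically: quantise an std value (taken as 0..128) to 8 bits directly, and via its square root, then compare worst-case relative error at the small std values where precision matters most. The scale factors are arbitrary choices, not masktools behaviour:

```python
from math import sqrt

def roundtrip_direct(s):
    """Store std (0..128) directly in 8 bits: quantise, then decode."""
    return float(round(s))

def roundtrip_sqrt(s):
    """Store sqrt(std) scaled into 0..255, square again on decode."""
    code = round(sqrt(s) * 255 / sqrt(128))
    return (code * sqrt(128) / 255) ** 2

# worst relative error over small std values (0.5 .. 4.9), the critical region
small = [s / 10 for s in range(5, 50)]
err = lambda f: max(abs(f(s) - s) / s for s in small)
print(err(roundtrip_direct) > err(roundtrip_sqrt))  # True - sqrt storage wins at low std
```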
23rd April 2011, 09:36  #19  Link  
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,380

Last edited by Gavino; 23rd April 2011 at 09:56. 

23rd April 2011, 16:56  #20  Link 
Registered User
Join Date: Jan 2006
Posts: 1,862

The log idea is a great trick. I once used a log table to divide (z/d*32) and to do some other operations in a 3D engine; tables can be extremely powerful.

Yep, when I saw r >> 1, I knew there was a problem. I know the std is internally available to more precision, because I'm comparing the true values against the masktools values and can see that masktools is rounding. A similar thing was updated in the last alpha, which only took a few days, so it should be a quick fix. Here is a replacement function: Code:
function std(clip x) {
    x2high=mt_lut(x,"x x * 8 >>u")
    x2low=mt_lut(x,"x x * 255 &u")
    avgx2=averageluma(x2high)*256+averageluma(x2low)
    avgx=averageluma(x)
    sqrt(avgx2-pow(avgx,2))
}

Moving on, how should I detect low-motion frames, so I can add fuzzy logic to ignore the correlation indications at those frames? I'll just assume that frames reported out of sync by correlation, sitting in a series of low-motion frames and surrounded by in-sync frames, are really in sync. It's a lot slower now. 
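[Editor's sketch] As a cross-check of the arithmetic in that replacement function (not of masktools itself): the same E[x^2] reconstruction from high and low bytes in plain Python, compared against the standard population std:

```python
from math import sqrt
from statistics import pstdev

def std_from_bytes(pixels):
    """std via sqrt(E[x^2] - E[x]^2), with x^2 carried as two 8-bit planes."""
    n = len(pixels)
    high = sum((p * p) >> 8 for p in pixels) / n   # averageluma(x2high) analogue
    low = sum((p * p) & 255 for p in pixels) / n   # averageluma(x2low) analogue
    avgx2 = high * 256 + low                       # recombine to full precision
    avgx = sum(pixels) / n
    return sqrt(avgx2 - avgx ** 2)

px = [100, 200, 100, 200]
print(std_from_bytes(px), pstdev(px))  # both 50.0
```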