Convert 24p to 60p through both duplication and interpolation?

miamicanes

17th December 2010, 01:33

Suppose you have a 720p24 video that you want to convert to 480i60, specifically optimized for playback on an inherently interlaced 480i60 CRT display. I'm pretty sure I can figure out most of it, but I'm totally stumped about how to do the "middle" part (selective interpolation and duplication):

-----

for each sequential pair of 720p24 source frames A and B:

1. use MVinterpolate(?) to synthesize a frame that's roughly 2/3 of the way between A and B. We'll call it "Q".

2. output five frames: A A Q B B

-----
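A small Python model (not AviSynth) of the frame order the two steps above would produce. It only tracks labels, so the synthesized frame is written as "A~B" and its exact temporal position (2/3 in step 1) is not modelled; the function name `aaqbb` is made up for illustration.

```python
from typing import List


def aaqbb(frames: List[str]) -> List[str]:
    """Expand pairs of 24p frame labels into the proposed A A Q B B order.

    The synthesized middle frame of each pair is labelled "A~B".
    """
    if len(frames) % 2:
        raise ValueError("expects an even number of source frames")
    out: List[str] = []
    for a, b in zip(frames[0::2], frames[1::2]):
        out += [a, a, f"{a}~{b}", b, b]   # five 60p slots per two 24p frames
    return out


print(aaqbb(["A", "B", "C", "D"]))
# ['A', 'A', 'A~B', 'B', 'B', 'C', 'C', 'C~D', 'D', 'D']
```

Every two source frames become five output frames, which is exactly the 24 to 60 ratio.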

Ultimately, I'd rip through the faux-60fps progressive output, alternately grabbing the odd and even lines to create an interlaced 30fps video. If A, B, C, and D are 24p source frames, Q is synthesized roughly 2/3 of the way between A and B, V is synthesized roughly 1/3 of the way between C and D, and "o" and "e" indicate a field of odd or even scanlines, then a 5-frame/10-field chunk of the final video would look like this:

AoAe QoBe BoCe CoVe DoDe (then repeat from AoAe...)
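A Python sketch of the field-picking step just described: even-numbered 60p frames contribute their odd lines, odd-numbered ones their even lines, and consecutive fields are paired into 30i frames. It reproduces the chunk above from the 60p sequence A A Q B B C C V D D. In AviSynth this step is commonly written as `AssumeTFF().SeparateFields().SelectEvery(4, 0, 3).Weave()`. Which physical lines count as "odd" depends on the field order you set, so treat that mapping as an assumption to check against your target.

```python
from typing import List


def weave_alternating(frames60: List[str]) -> List[str]:
    """Take odd lines ("o") from even-numbered 60p frames and even lines ("e")
    from odd-numbered ones, then pair consecutive fields into 30i frames."""
    if len(frames60) % 2:
        raise ValueError("need an even number of 60p frames")
    fields = [f + ("o" if i % 2 == 0 else "e") for i, f in enumerate(frames60)]
    return [fields[i] + fields[i + 1] for i in range(0, len(fields), 2)]


print(" ".join(weave_alternating(list("AAQBBCCVDD"))))
# AoAe QoBe BoCe CoVe DoDe
```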

I know this would look horrific on a natively progressive display, but I believe it would be a nearly ideal compromise between minimizing judder and limiting motion artifacts from synthetic fields on an SD CRT TV. Each source frame would start at the same relative moment as it would in normal 3:2 pulldown, but the two interpolated fields would turn the 3:2:3:2 cadence into 2:(1):2:2:(1):2. Put another way, it would still "buzz" from judder, but the buzzing would be at roughly double the frequency of conventional pulldown.
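The cadence comparison can be checked with a run-length count of how many consecutive fields show each picture. This Python sketch applies it to conventional 3:2 pulldown (A A A B B C C C D D) and to the proposed sequence, where Q and V are the synthetic fields.

```python
from itertools import groupby
from typing import List, Tuple


def cadence(fields: str) -> List[Tuple[str, int]]:
    """Run-length encode a field sequence: (picture, consecutive field count)."""
    return [(k, len(list(g))) for k, g in groupby(fields)]


print(cadence("AAABBCCCDD"))  # [('A', 3), ('B', 2), ('C', 3), ('D', 2)]
print(cadence("AAQBBCCVDD"))  # [('A', 2), ('Q', 1), ('B', 2), ('C', 2), ('V', 1), ('D', 2)]
```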

I know mixing fields from different source frames in a single frame is an absolute no-no for progressive output, but for a natively interlaced display it seems logically similar to the way ClearType uses adjacent subpixels to smooth text. The MPEG encoder might know that film frame #2 is split between video frames #2 and #3, but presumably your eyes aren't keeping tally of field pairs, and the TV itself is completely indifferent. You'd still notice something is amiss, but I'm predicting it wouldn't be nearly as obvious as the effect you get from traditional 3:2 pulldown. As an added bonus, only 2 out of 10 fields would be synthetic, and each would persist for only 1/60th of a second. So even if a motion glitch caused a distracting artifact, it would be a single field surrounded by four unmodified fields from the original video (2 source frames).

Worst case, I'm guessing it might throw a monkey wrench into MPEG-2 encoding efficiency by breaking an assumption its authors made about the nature of the video being encoded... but if file size is only a secondary concern, it seems like it would work spectacularly well (and might even have become the norm, had CRT displays not become commercially obsolete a decade before computers became fast enough to casually synthesize interpolated video fields like this).

Anyway, the part between the hyphenated lines is what I'm really stuck on right now. (I've done a fair bit with AviSynth, but I've never tried anything that didn't apply to *everything*, as opposed to selectively picking out frames and doing specific things to specific frame numbers.)

miamicanes

17th December 2010, 04:14

Ah, OK... I think I found a way to at least do a proof-of-concept rendering, even though it's not terribly efficient:

super = MSuper(pel=2)
backward_vec = MAnalyse(super, overlap=4, isb=true, delta=1, search=3)
forward_vec = MAnalyse(super, overlap=4, isb=false, delta=1, search=3)
MFlowFPS(super, backward_vec, forward_vec, num=48000, den=1001)
# we now have AQBV
RepeatEveryFrame(2)
# we now have AAQQBBVV
DeleteEvery(8,6,7)
# we now have AAQQBB
DeleteEvery(6,3)
# we now have AAQBB
AssumeFPS(60) # forcibly tell AviSynth our video is now 60fps

TimeStretch(tempo=100.1)
# 2.5x the source frame count played at exactly 60fps runs 24/23.976 = 1.001x faster
# than the 23.976fps source, so the audio must be sped up by the same 0.1% to stay in sync.

I'm not sure yet whether it worked as intended... I'm rendering a 5-minute test clip, but it's slow as hell. I'm afraid to see how slow it's going to be when I put GradFun2dbmod(str=96,temp=65,radius=3, range=8, mode=2) back in...
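A Python model (not AviSynth) of the frame bookkeeping in the script above. It assumes `DeleteEvery(cycle, a, b)` drops 0-based offsets a and b from each group of `cycle` frames. That reading is inferred from the script's own comments, not from the plugin's documentation. Labelling the 48p frames by what MFlowFPS puts between them shows that the 48p "V" (halfway between B and C) is discarded and the frame left in the Q slot is the 48p midpoint. That midpoint sits 50% of the way from A to B, not 2/3, which is the issue raised in the next reply.

```python
from typing import List


def repeat_each(frames: List[str], n: int) -> List[str]:
    """Show every frame n times in a row (what RepeatEveryFrame(2) is used for)."""
    return [f for f in frames for _ in range(n)]


def delete_every(frames: List[str], cycle: int, *drop: int) -> List[str]:
    """Remove the given 0-based offsets within each group of `cycle` frames."""
    return [f for i, f in enumerate(frames) if i % cycle not in drop]


def simulate(clip48: List[str]) -> List[str]:
    clip = repeat_each(clip48, 2)          # AAQQBBVV
    clip = delete_every(clip, 8, 6, 7)     # AAQQBB
    return delete_every(clip, 6, 3)        # AAQBB


# After MFlowFPS to 48p, each new frame sits halfway between its neighbours.
print(simulate(["A", "A~B", "B", "B~C", "C", "C~D", "D", "D~E"]))
# ['A', 'A', 'A~B', 'B', 'B', 'C', 'C', 'C~D', 'D', 'D']
```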

Gavino

17th December 2010, 11:26

Quote:
MFlowFPS(super, backward_vec, forward_vec, num=48000, den=1001)
# we now have AQBV
That doesn't look right: you have made Q halfway between A and B rather than 2/3 of the way.
I think what you want is this:
super = MSuper(pel=2)
backward_vec=MAnalyse(super, overlap=4, isb=true, delta=1, search=3)
forward_vec=MAnalyse(super, overlap=4,isb=false, delta=1, search=3)
Q = MFlowInter(super, backward_vec, forward_vec, time=200.0/3).SelectEven()
A = SelectEven()
B = SelectOdd()
Interleave(A, A, Q, B, B)
That will directly give you 60fps (more precisely, 2.5 times the original frame rate, i.e. 59.94fps for a 23.976fps source).
No need for TimeStretch, since duration is preserved.
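A Python model (not AviSynth) of the order Gavino's `Interleave(A, A, Q, B, B)` produces, plus the frame-rate arithmetic behind "2.5 times original frame rate". Source frames are just numbered, and the interpolated frames are labelled "n~n+1". It assumes an even number of source frames so the even and odd halves have equal length. `gavino_order` and `output_rate` are illustrative names.

```python
from fractions import Fraction
from typing import List, Union


def gavino_order(n_src: int) -> List[Union[int, str]]:
    """Frame order of Interleave(A, A, Q, B, B) for source frames 0..n_src-1."""
    src = list(range(n_src))
    a = src[0::2]                          # A = SelectEven()
    b = src[1::2]                          # B = SelectOdd()
    # MFlowInter yields one frame between n and n+1 for every n;
    # .SelectEven() keeps only those starting on an even frame.
    q = [f"{n}~{n + 1}" for n in a]
    out: List[Union[int, str]] = []
    for five in zip(a, a, q, b, b):        # Interleave(A, A, Q, B, B)
        out.extend(five)
    return out


def output_rate(src_rate: Fraction) -> Fraction:
    # Five interleaved clips, each running at half the source rate.
    return 5 * (src_rate / 2)


print(gavino_order(4))                       # [0, 0, '0~1', 1, 1, 2, 2, '2~3', 3, 3]
print(output_rate(Fraction(24000, 1001)))    # 60000/1001, about 59.94 fps
```

Because the frame count and the frame rate both grow by 2.5x, the clip's duration (and therefore audio sync) is unchanged.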

Tip: For initial testing of this sort of thing, use as source a short blank clip with frame numbers added:
BlankClip(20, pixel_type="YV12").ShowFrameNumber()
Then check frame rate of result and step through frames to see if the order is correct.
No need to wait hours for an encode. ;)
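When stepping through the numbered test clip, it helps to know exactly what each output frame should show. Here is a Python sketch, with illustrative helper names, of a checker for the A A Q B B order: it lists the source number(s) expected on output frame k (two numbers on the interpolated frame) and reports the first frame that deviates. Assuming BlankClip's default 24fps, 20 source frames should come out as 50 output frames.

```python
from typing import List, Optional, Tuple


def expected_sources(k: int) -> Tuple[int, ...]:
    """Source frame number(s) visible on output frame k under A A Q B B."""
    pair, slot = divmod(k, 5)
    a, b = 2 * pair, 2 * pair + 1
    return ((a,), (a,), (a, b), (b,), (b,))[slot]


def first_mismatch(observed: List[Tuple[int, ...]]) -> Optional[int]:
    """Index of the first output frame whose visible numbers differ, else None."""
    for k, seen in enumerate(observed):
        if tuple(sorted(seen)) != expected_sources(k):
            return k
    return None


good = [expected_sources(k) for k in range(50)]
print(first_mismatch(good))          # None
dropped = good[:7] + good[8:]        # simulate one dropped frame
print(first_mismatch(dropped))       # 7
```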

miamicanes

17th December 2010, 19:01

Cool, I can't wait to try that tonight when I get home from work! For what it's worth, even with the interpolated frames' timing slightly off (50/50 instead of 2/3), the experiment was definitely a success. The 2:(1):2:2:(1):2 video looked better than my earlier experiments with straight 3:2 pulldown, MFlowFPS (23.976 to 29.97fps), and MFlowFPS (23.976 to 59.94fps).

The frame-number rendering idea is great. In fact, I'd like to take it a step further so I can visually confirm that neither TMPGEnc nor the player itself is insidiously mangling the video along the way. Can you think of a good way to get the source frame number modulo 24 (0..23) and the output frame number modulo 60 (0..59), and render both values according to the following rules:

* even values get rendered on the left side of the video, odd values get rendered on the right.

* output frame value gets rendered closer to the video's edge than source frame value (maybe a different color, too?)

* vertical position is a function of the value... 0/1 rendered at the top, 22/23 and 58/59 rendered at the bottom

Visually, it would be something like this:

 0   0           -   -
 -   0           -   1
 2   0           1   -
 -   -           1   3
 4   -           1   -
 -   2           -   5
 6   2           -   -
 -   2           3   7
 8   -           3   -
 -   -           3   9
10   4           -   -
 -   4           -  11
12   4           5   -
...
56  22           -   -
 -  22          23  57
58   -          23   -
 -   -          23  59
(with more space between the left pair and right pair than the forum software will allow me to show here, and blank space where I indicated a "-" above).

Obviously, only a single line at a time would be rendered onto any given frame, but over the course of a second, I'd expect to see 30 rows of 4 values lighting up in a predictable cadence that will hopefully make any dropped fields/frames instantly noticeable.
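The table above follows mechanically from the A A Q B B order. Here is a Python sketch that regenerates one row per output frame, with columns for even output (left edge), even source (left inner), odd source (right inner), and odd output (right edge). Per the rules above, a value v's vertical slot would be v // 2. The function name `overlay_row` is hypothetical.

```python
from typing import List


def overlay_row(k: int) -> List[str]:
    """Row of the overlay table for output frame k ("-" means blank)."""
    out = k % 60
    pair, slot = divmod(out, 5)
    a, b = (2 * pair) % 24, (2 * pair + 1) % 24
    srcs = ((a,), (a,), (a, b), (b,), (b,))[slot]   # Q slot shows both sources
    even_src = [s for s in srcs if s % 2 == 0]
    odd_src = [s for s in srcs if s % 2 == 1]
    return [
        str(out) if out % 2 == 0 else "-",
        str(even_src[0]) if even_src else "-",
        str(odd_src[0]) if odd_src else "-",
        str(out) if out % 2 == 1 else "-",
    ]


for k in (0, 1, 2, 3, 57):
    print(" ".join(overlay_row(k)))
```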

Didée

17th December 2010, 19:37

Suggestion: MFlowFPS to 48 Hz, then BlendFPS to 60 Hz with an aperture of about 0.5.
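Why a blend step is needed here: 60 Hz output frames land between 48 Hz frames at a repeating set of offsets. This Python sketch locates each output frame on the 48p timeline. Weighting the two neighbours linearly by that offset is an illustrative assumption only, not BlendFPS's documented formula or a description of what `aperture` does. The same 4:5 ratio holds for 47.952 to 59.94.

```python
import math
from fractions import Fraction
from typing import Tuple


def position_48_in_60(k: int) -> Tuple[int, int, Fraction]:
    """Locate 60 Hz output frame k on the 48 Hz timeline.

    Returns (earlier 48p frame, later 48p frame, fraction of the way
    from the earlier to the later one).
    """
    pos = Fraction(48 * k, 60)        # 0.8 * k, in 48p frame units
    lo = math.floor(pos)
    return lo, lo + 1, pos - lo


for k in range(6):
    print(k, position_48_in_60(k))
```

Only every fifth output frame coincides with a 48p frame; the other four sit 4/5, 3/5, 2/5 and 1/5 of the way to the next one.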
_____

If you're concerned about speed, then do NOT use search=3. (Exhaustive search is very slow, and sort of a joke together with truemotion=true, which is the default: exhaustive search evaluates plenty of candidates, then truemotion immediately rejects most of them because of its [holy thresholds].)

Search=2 should be good, search=5 should be fine.

Gavino

17th December 2010, 20:45

Quote:
Can you think of any good way to get the modulus of the source frame (0..23) and output frame (0..59), and render both values according to the following rules: ...
I've produced something like your requested frame-numbered output using this code:
# ... your source goes here ...

# add input frame numbers:
ScriptClip("""
f = current_frame%24
Subtitle(string(f,"%02.0f"), y=f/2*45, align=8, text_color=color_red)
""")

# ... your processing goes here ...
# (either mine from earlier, or Didée's suggestion)

# add output frame numbers:
ScriptClip("""
f = current_frame%60
align = (f % 2 == 0 ? 7 : 9)
Subtitle(string(f,"%02.0f"), y=f/2*18, align=align)
""")
A problem is that, since the input is motion-interpolated, adjacent pairs of input frame numbers ([0,1], [2,3], etc.) must be shown in the same place (otherwise they disappear in the interpolated frames), so I've chosen to put them in the middle.
The code assumes your clip is tall enough to show 30 separate lines of text at default spacing.
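The y positions in the two ScriptClip blocks can be checked directly. This Python sketch mirrors the expressions `f/2*45` and `f/2*18`; AviSynth truncates integer division, which matches `//` for non-negative f. The lowest output number starts at y=522. Assuming text roughly 18 px tall (my understanding of Subtitle's default size), the clip needs around 540 lines of height at the point where the numbers are drawn.

```python
def input_y(frame: int) -> int:
    """y of the red source-frame number (align=8, top centre)."""
    f = frame % 24
    return f // 2 * 45   # pairs [0,1], [2,3], ... share one position


def output_y(frame: int) -> int:
    """y of the output-frame number (align=7 or 9, top left or right)."""
    f = frame % 60
    return f // 2 * 18


print(max(input_y(f) for f in range(24)), max(output_y(f) for f in range(60)))
# 495 522
```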

miamicanes

17th December 2010, 23:22

Awesome, thanks! I can't wait to try it out (~2-3 hours from now) :)

Actually, while we're on the topic... the videos have a static logo at the lower right. The problem is, any time there's lots of motion in the area it overlays, the logo itself gets artifacted. Is there any way to capture a frame of the video, load it into Photoshop, turn the logo area into a monochrome mask, then pass it (as an instantiated image object, or as a filename string) to MVinterpolate as a parameter and tell MVinterpolate and/or MVanalyse, "don't even look at the masked area; just pretend it doesn't exist and leave it alone"?

update: rendering now (7pm)
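The question asks whether the motion search itself can ignore the logo, and this thread doesn't answer that. A different, commonly used workaround is to interpolate normally and then paste the untouched logo pixels back through the mask. Because the logo is static, either neighbouring source frame supplies them. Here is a Python sketch of that per-pixel merge on a single plane, assuming an 8-bit mask where 255 marks the logo. In AviSynth this kind of merge would typically be done with Overlay() and a mask clip; exact parameters are left to the docs.

```python
from typing import List

Plane = List[List[int]]


def restore_logo(interpolated: Plane, original: Plane, mask: Plane) -> Plane:
    """Take pixels from `original` where mask is 255, from `interpolated`
    where it is 0, and blend proportionally in between (rounded)."""
    return [
        [(o * m + i * (255 - m) + 127) // 255 for i, o, m in zip(ri, ro, rm)]
        for ri, ro, rm in zip(interpolated, original, mask)
    ]


print(restore_logo([[200, 100]], [[10, 20]], [[255, 0]]))   # [[10, 100]]
```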

miamicanes

20th December 2010, 04:29

I'm not quite sure what to make of it, but I've confirmed it in three double-blind tests on different videos so far (every time, I could pick out which of the two videos I liked better, and every time it turned out to be the one interpolated at time=50). Interpolating the AB frame at 200.0/3 produces video that looks better than interpolating it at 50 when viewed frame by frame or in relative isolation, but the videos where I've interpolated the AB frame at 50 seem nicer to actually "watch".

It's weird, but I can hardly watch the ones interpolated at ~66 for more than a few minutes before wanting to do something, anything, else. In contrast, the ones interpolated at 50 leave me feeling like I can see artifacts everywhere, but they nevertheless "feel" more correct.

Here's an example where the effect seems to be particularly vivid: take a scene where two people are sitting next to each other. One is looking to the left, then quickly turns his head 180 degrees to the right over the span of a few frames. At ~66, it "feels" like the person pauses for a moment, then whips his head around with almost cartoon-like speed. At 50, you can almost taste the judder, and it looks terrible when stepping through frame by frame... but when viewed at 60fps, the head turn looks normal. The person starts, continues, and completes the turn at what looks like a natural and deliberate speed.

Ditto for hand sweeps. At 66, it looks like the person pauses, takes half a breath, then whips his hand to the target position smoothly, but with comic velocity (think Bugs Bunny or Elmer Fudd conducting an orchestra). Interpolated at 50, the hand (and especially the fingers) gets slaughtered by the interpolation algorithm and turns into a blurred mess, but once again the motion looks normal and deliberate when viewed at 60fps.
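For a constant-speed pan, the two settings produce different frame-to-frame jumps. This Python sketch measures them in source-frame units under the A A Q B B order, with Q placed at fraction t. It only describes what lands on screen; it doesn't claim to explain the perceived difference. For reference, a perfectly smooth pan would move 2/5 of a source-frame interval on every output frame.

```python
from fractions import Fraction
from typing import List


def steps(t: Fraction, pairs: int = 2) -> List[Fraction]:
    """Per-output-frame displacement for a constant-speed pan under A A Q B B,
    with the synthesized frame placed at fraction t between A and B."""
    pos: List[Fraction] = []
    for p in range(pairs):
        a = Fraction(2 * p)
        pos += [a, a, a + t, a + 1, a + 1]
    return [nxt - cur for cur, nxt in zip(pos, pos[1:])]


print([str(s) for s in steps(Fraction(1, 2))])   # time=50
print([str(s) for s in steps(Fraction(2, 3))])   # time=200.0/3
```

At time=50 the move between the held frames is split into two equal halves; at time=200.0/3 it becomes a 2/3 jump followed by a 1/3 jump.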

I'm torn, because I'm now about halfway done, and I'm not sure whether I want to do the second half at 50 or 200.0/3, and whether I want to go back and re-render the first half to 50 too. Intellectually, I know that 66 makes sense and looks correct (in theory, at least), but the actual rendered videos just don't feel "right".