Preparing for numPy reshape [Archive] - Page 2

markfilipak

6th November 2021, 02:02

The last line on the right: A+a.... looks correct.

That's good news. Note that the result is 3-2-3-2-2 pull-down. Hmmm... 10 fields in, 12 fields out. ... :eek:

...Yes, you read that right: "fields". At that point, they are fields. :)
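The counting can be checked with a small Python sketch. It assumes five progressive source pictures labelled A to E (placeholder labels), expands the 3-2-3-2-2 cadence into a field sequence, and then pairs consecutive fields into frames:

```python
from itertools import chain

def pulldown_fields(pictures, cadence):
    """Emit each picture's label once per field, following the cadence counts."""
    return list(chain.from_iterable([p] * n for p, n in zip(pictures, cadence)))

pictures = ["A", "B", "C", "D", "E"]   # 5 progressive source pictures (hypothetical labels)
cadence = [3, 2, 3, 2, 2]              # fields emitted per source picture

fields_out = pulldown_fields(pictures, cadence)
fields_in = 2 * len(pictures)          # each picture holds 2 halfpics

# Pair consecutive fields into frames (first field, second field).
woven = [fields_out[i] + fields_out[i + 1] for i in range(0, len(fields_out), 2)]

print("".join(fields_out))             # AAABBCCCDDEE
print(fields_in, len(fields_out))      # 10 12
print(woven)                           # ['AA', 'AB', 'BC', 'CC', 'DD', 'EE']
```

Two of the six frames (AB and BC) mix fields from different pictures, and the 12/10 ratio turns 50 fields/s of 25p content into 60 fields/s, i.e. a nominal 30 frames/s.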

poisondeathray

6th November 2021, 02:14

In the pps[hps] case, what is the distinction between 2 half pics combined to form a picture vs. not yet a frame? Isn't the "picture" already a "frame" in this case?

25pps <== this is a picture stream.
25fps[25pps] <== this is a picture stream, framed.

"Isn't the picture already a frame" -- a picture is not a frame. A picture is a picture. A frame is a frame.

"A picture is a picture." "A frame is a frame." Great definition... :)

What does a pps (picture) stream, combined from halfpics yet "unframed", look like? => I think this is the trouble I'm having, in the case of progressive content: pps "framed" vs. pps "unframed".

This is where I'm coming from: in the old way of describing things as fields, 2 fields from the same moment in time, when combined, form a progressive frame. So in the pps[hps] case, where we pretend 2 "halfpics" are the "fields", and we're combining them, what is it? You're calling it 2 half pics forming a picture. That's fine. But in the old way, that was also a "frame". I'm not seeing the distinction between picture and frame in this case.

I'm making the assumption it's playing at the same speed (we're not doing things like speeding up, or adding frames, those sorts of things). Is that the difference when you write "not yet a frame"? The frame rate is not assigned? The 25fps in "25fps[25pps]" is "assumed" in my head, unless otherwise stated. Or is it something else I'm missing? It feels like I'm missing something...

"Encoded interlaced" vs. "Encoded Progressive"

Here's my notion about all that. Correct me, please.
"encoded interlaced" is 25fps[50sps] for example -- scans, interlaced [note] and framed.
"encoded progressive" is 25fps[25pps] for example -- pictures framed.

We're describing a file that was "encoded I or P"

Encoded interlaced would be something [25pps], because the input source [25pps] used to generate the file we are describing now was the same in both cases. The content is the same in both cases for input and output [25pps]; it doesn't change. Nothing funky is going on: nothing like fields added or deleted, i.e. no change in "structure."

No, "encoded interlaced" (or "encoded progressive") is independent of [] .

? "independent of []"? "[]" is the source, right? How can the state of being "encoded interlaced" or "encoded progressive" be independent of the video? Or are you saying that "encoded interlaced" & "encoded progressive" apply only to targets? What's the difference?

It can be [sps] or [pps]

?

Yes, it's independent of the input source. "Encoding interlaced" vs. "encoding progressive" is applied to an input source, producing an output. I.e., we're describing the output file from that encoder, and that the encoder used either the I or the P setting to encode. The encoder uses different algorithms for I vs. P encoding, and the output file is flagged differently. But the underlying content and structure are not changed. There are differences in quality between "encoded i" and "encoded p", and the choice has implications for various issues and handling, as described in earlier posts.

I don't think "Encoded I or P" fits in the way the notation describes "structure" and streams. The streams (encoded "i" or encoded "p") functionally look the same on paper in terms of structure.

If there is no motion, both field pairs are from the same image: it's progressive content during that scene (unless it's mishandled by the playback chain, e.g. deinterlaced).

I don't understand. If there's no motion, then the images are identical, but they're still in a stream of separate images, aren't they?

(it should have said both fields from the pair are from the same image - "both field pairs" implies 4 fields)

Yes, it's basically the same thing repeated. If you examined frames (in a non-deinterlacing player), it would look like a single progressive, full-quality image repeated at 29.97 frames/s. (If it was deinterlaced using most algorithms, there would be lower quality and aliasing.)

"non-deinterlacing player" means a player that doesn't apply a deinterlace filter automatically.

<Translation Crib sheet>

your pps <=> means "x progressive content"

your "interlace" <=> means "put together", such as combining 2 half pics into pics

your "deinterlace" <=> means "take apart", such as separating pics into half pics

"Deinterlace" in most video forums implies separating fields and interpolating the missing scan lines (by various algorithms) to generate a full frame (or "picture"). An example of a deinterlacing filter would be "yadif". Single rate deinterlace means dropping one set of scan lines (halving the temporal resolution if the content was interlaced to begin with, i.e. starting with [50sps] or [59.94sps]); I think the naming would be 25fps[25pps[50sps]]. Double rate deinterlace means keeping the temporal resolution; I think it would be 50fps[50pps[50sps]].

<>
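In the crib sheet's sense, "take apart" and "put together" are just row slicing and row interleaving. A minimal Python sketch, modelling a picture as a list of rows (placeholder strings) with an even row count; it covers only the separation and recombination, not the interpolation that a forum-style deinterlacer adds:

```python
def separate(picture):
    """Take a picture apart into its two halfpics: even rows and odd rows."""
    return picture[0::2], picture[1::2]

def weave(top, bottom):
    """Put two halfpics back together, alternating their rows."""
    rows = []
    for t, b in zip(top, bottom):
        rows.extend([t, b])
    return rows

picture = ["row0", "row1", "row2", "row3", "row4", "row5"]
top, bottom = separate(picture)
print(top)                             # ['row0', 'row2', 'row4']
print(bottom)                          # ['row1', 'row3', 'row5']
print(weave(top, bottom) == picture)   # True: a lossless round trip
```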

1) Let's say you receive a clip with no other information about the camera or background, and the recording was that static scene only: what would you "label" it?

30'fps[60'sps].

How did you determine it was [60'sps] from the clip only?

Would 29.97fps[29.97pps] "look" the same ?

poisondeathray

6th November 2021, 02:15

Well then! All the words you've written and all the meaning you've struggled to convey boil down to this: 30fps[1,3,2,5,4,7,6,8-12[1+0000[6x[50hps[25fps]]]]]

Is that an improvement?

30fps[1,3,2,5,4,7,6,8-12[1+0000[6x[50hps[25fps]]]]]
\     \                  \      \  \     \____ the source video
\     \                  \      \  \__________ unframe and deinterlace
\     \                  \      \_____________ 1-to-6 duplicate each halfpic
\     \                  \____________________ drop halfpics #2 3 4 5
\     \_______________________________________ reshuffle the remaining halfpics
\_____________________________________________ interlace and frame
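One way to check the diagram is to run its operators on labelled halfpics. The Python sketch below is an interpretation, not a definition of the notation: it assumes "1+0000" is a keep/drop cycle with "+" as a separator, that the reshuffle list names, for each output slot, which input position (1-based) fills it, and that "8-12" expands to 8, 9, 10, 11, 12. Halfpics are labelled with the picture letter plus t (top) or b (bottom):

```python
def duplicate(stream, n):
    """'6x': repeat every halfpic n times."""
    return [h for h in stream for _ in range(n)]

def keep_pattern(stream, pattern):
    """'1+0000': cycle of '1' (keep) / '0' (drop); '+' is treated as a separator."""
    marks = pattern.replace("+", "")
    return [h for i, h in enumerate(stream) if marks[i % len(marks)] == "1"]

def reshuffle(stream, order):
    """'1,3,2,...': output slot k takes input position order[k] (1-based), per cycle."""
    cycle = len(order)
    out = []
    for start in range(0, len(stream) - cycle + 1, cycle):
        block = stream[start:start + cycle]
        out.extend(block[i - 1] for i in order)
    return out

def frame_pairs(stream):
    """Interlace and frame: pair consecutive halfpics as (top, bottom)."""
    return list(zip(stream[0::2], stream[1::2]))

# 5 pictures of 25fps content, each split into top and bottom halfpics: 50hps
source = [f"{p}{half}" for p in "ABCDE" for half in "tb"]
s = duplicate(source, 6)                                      # 300hps
s = keep_pattern(s, "1+0000")                                 # 60hps
s = reshuffle(s, [1, 3, 2, 5, 4, 7, 6, 8, 9, 10, 11, 12])
frames = frame_pairs(s)                                       # 30fps
print(frames)
# [('At', 'Ab'), ('At', 'Bb'), ('Bt', 'Cb'), ('Ct', 'Cb'), ('Dt', 'Db'), ('Et', 'Eb')]
```

Under those assumptions each cycle of five source pictures yields one clean frame, two mixed frames, then three clean frames, and every frame keeps a top halfpic first.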

If I want to be verbose:

30fps[30pps[60hps[1,3,2,5,4,7,6,8,9,10,11,12[60hps[1+0000[300hps[6x[50hps[25pps[25fps]]]]]]]]]]

Yes, the added descriptors help with my "homework".

It's accurate, concise, and conveys a lot of precise information (the patterns, which ones were dropped, etc.). Unfortunately, nobody would understand the notation without investing a fair bit of time... and I'm still not comfortable with it yet....

In this forum you could say "25p content telecined for NTSC" and most would know in basic terms what was meant (not necessarily the pattern details, or how you got there). And if the blend variant was used, add "field blended" to the description (field blends are present in that case, but the procedure involves frame blending, not field blending).

poisondeathray

6th November 2021, 02:46

markfilipak

6th November 2021, 04:53

"A picture is a picture." "A frame is a frame." Great definition... :)
Not definitions, distinctions. They are two different things, at least, to me. I find it very clarifying to keep them well separated.

What does a pps (picture) stream, combined from halfpics yet "unframed", look like?
Like numbers in a memory hex dump. I don't mean to be snarky, but I actually don't know what you mean by "look like". How does one represent a mental visualization or a blob of data in memory?
=> I think this is the trouble I'm having, in the case of progressive content. pps "framed" vs. pps "unframed"
Waxing pedantic, pps is a stream of pictures in memory. It's a handy abstraction. "Framed" just means the pictures in the stream have each been enclosed in a frame structure: a pattern of surrounding data bits defined by MPEG as constituting an elementary stream.
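The picture/frame distinction can be made concrete as two data types. This is a hypothetical Python sketch: the class and field names (pts, interlaced, top_field_first) are illustrative only and are not MPEG syntax element names. What it shows is that the same picture can sit inside two different frames:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Picture:
    """A bare blob of samples: no timing, no field flags, no enclosing structure."""
    width: int
    height: int
    pixels: bytes

@dataclass(frozen=True)
class Frame:
    """A picture enclosed in a frame structure with (hypothetical) header fields."""
    picture: Picture
    pts: int                  # presentation timestamp in time-base ticks
    interlaced: bool          # scan-type flag belongs to the frame, not the picture
    top_field_first: bool

pic = Picture(4, 2, bytes(8))
progressive = Frame(pic, pts=0, interlaced=False, top_field_first=False)
interlaced = Frame(pic, pts=0, interlaced=True, top_field_first=True)

print(progressive.picture == interlaced.picture)  # True: the same picture
print(progressive == interlaced)                  # False: two different frames
```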

This is where I'm coming from - In the old way of describing things as fields: 2 fields from the same moment in time, when combined, form a progressive frame.
Agree.
So in the pps[hps] case, where we pretend 2 "halfpics" are the "fields", and we're combining them, what is it?
The two halfpics are not fields. They have to be in a frame to be fields.
What 'it' is, is a picture: a blob of memory that has no containing structure (i.e. no frame).

You're calling it 2 half pics forming a picture. That's fine. But in the old way, that was also a "frame". I'm not seeing the distinction between picture and frame in this case.
When you look at a display, you see pictures. When you look inside an elementary stream with a hex editor, you see frames.
Frames are data, ephemeral things. Pictures (as samples) are what is shot with a camera or sampled with a film scanner, plus (as pixels) are the interpretation of the data that is enclosed within frames, plus (as pels) are what is displayed on a screen.
I'm feeling self-conscious. Why am I explaining this to You? Why are You asking?

I'm making the assumption it's playing at the same speed (we're not doing things like speeding up, or adding frames, those sorts of things). Is that the difference when you write "not yet a frame"? The frame rate is not assigned? The 25fps in "25fps[25pps]" is "assumed" in my head, unless otherwise stated. Or is it something else I'm missing? It feels like I'm missing something...
A series of numbers placed on a page (or on a display) is just a series of numbers. When entered into a spreadsheet cell, that series of numbers becomes part of a spreadsheet. When surrounded by a frame structure, a picture becomes part of the frame. But a picture isn't a frame any more than a series of numbers is a spreadsheet. If you habitually call a series of numbers a "cell", then I don't know whether they're already in a spreadsheet or not in a spreadsheet. That makes a difference when someone (me) tries to understand what you are writing about, eh?

"Encoded interlaced" vs. "Encoded Progressive"

We're describing a file that was "encoded I or P"

Encoded interlaced would be something [25pps], because the input source [25pps] used to generate the file we are describing now was the same in both cases. The content is the same in both cases for input and output [25pps]; it doesn't change. Nothing funky is going on: nothing like fields added or deleted, i.e. no change in "structure."
Sorry, those are semi-rhetorical statements? Or are they questions?

Yes, it's independent of the input source. "Encoding interlaced" vs. "encoding progressive" is applied to an input source, producing an output. I.e., we're describing the output file from that encoder, and that the encoder used either the I or the P setting to encode. The encoder uses different algorithms for I vs. P encoding, and the output file is flagged differently. But the underlying content and structure are not changed. There are differences in quality between "encoded i" and "encoded p", and the choice has implications for various issues and handling, as described in earlier posts.
I can't understand/appreciate that ...quality... that is the result of an encoder setting. It means nothing to me. I understand structure. I don't understand "Darmok & Jalad at Tanagra".
I don't understand if someone says "A" is "B" in certain contexts. They ask me to climb into their heads in order to understand what they say. I don't know how to do that.

I don't think "Encoded I or P" fits in the way the notation describes "structure" and streams. The streams (encoded "i" or encoded "p") functionally look the same on paper in terms of structure.
Why are you saying that? That's not true at all unless you're referring to a raw stream (in which case, there is no 'i' or 'p'). I'm mystified.

(it should have said both fields from the pair are from the same image - "both field pairs" implies 4 fields)
Okay. Why are you mentioning 4 fields?

Yes, it's basically the same thing repeated. If you examined frames (in a non-deinterlacing player), it would look like a single progressive, full-quality image repeated at 29.97 frames/s. (If it was deinterlaced using most algorithms, there would be lower quality and aliasing.)
Huh?

"non-deinterlacing player" means a player that doesn't apply a deinterlace filter automatically.
Why would a player deinterlace?

<Translation Crib sheet>

your pps <=> means "x content"
I don't know what your "x content" means. "pps" means "pictures per second".

your "interlace" <=> means "put together", such as combining 2 half pics into pics
It's not my "interlace". I detest the word. Interlace is a verb for a process: the interleaving of odd and even lines, that happens solely on a display. But the rest of the world uses the word to mean anything that makes (or could make) a picture out of 2 half pictures or 2 scans, and I'm tired of fighting the fight.

your "deinterlace" <=> means "take apart", such as separating pics into half pics
Again, I'm using the word because I'm tired of fighting that fight. "Separating half pictures" is much more accurate. We share the same idea.

"Deinterlace" in most video forums implies separating fields and interpolating the missing scan lines (by various algorithms) to generate a full frame (or "picture"). An example of a deinterlacing filter would be "yadif". Single rate deinterlace means dropping one set of scan lines (halving the temporal resolution if the content was interlaced to begin with, i.e. starting with [50sps] or [59.94sps]); I think the naming would be 25fps[25pps[50sps]]. Double rate deinterlace means keeping the temporal resolution; I think it would be 50fps[50pps[50sps]].
"Single rate deinterlace"... What a curious expression. What does it mean? Deinterlace at a single rate? Or some sort of deinterlace that produces a rate that's the same as the input rate? I have not the slightest idea. I suspect a phrase like "single rate deinterlace" mixes up all sorts of operations that ought to be kept separate and distinct.

How did you determine it was [60'sps] from the clip only?
No particular reason.

Would 29.97fps[29.97pps] "look" the same ?
The same as what?

markfilipak

6th November 2021, 05:11

Yes, the added descriptors help with my "homework".
I had such descriptors in previous diagrams. I think you maybe didn't appreciate them at the time. I'm heartened that you appreciate them.

It's accurate, concise, and conveys a lot of precise information (the patterns, which ones were dropped, etc.). Unfortunately, nobody would understand the notation without investing a fair bit of time... and I'm still not comfortable with it yet....
Hahaha... Maybe, never.

In this forum you could say "25p content telecined for NTSC" and most would know in basic terms what was meant (not necessarily the pattern details, or how you got there).
And in casual conversation, "25p content telecined for NTSC" would be fine. The notation is not intended for casual conversation. It's intended to provide a technically precise, unambiguous specification for particular video structures and how they were created. It's intended to cut through vague or nebulous concepts to reveal concrete properties.

And if the blend variant was used, add "field blended" to the description (field blends are present in that case, but the procedure involves frame blending, not field blending).
I know some of the various ways to blend pictures (all of which involve blending pixels). Tell me: How does one blend fields? Do you put them in a blender?

PS: I apologize. I can be snarky. I know I'm being a PITA about this frames v. pictures thing. But they really are different things at (sometimes) differing rates. If you can break the habit of mixing them up, it will really help me and others, and I think it will help you.

markfilipak

6th November 2021, 05:28

For reference, here are the steps in avisynth and vapoursynth to perform the same operation ("dropping 25p content on a 29.97i timeline" in an NLE). I'm having a bit of trouble replicating it in ffmpeg: I can't get shuffleframes to work properly. I tried some setpts expressions and vsync vfr; maybe you can figure it out. I also tried a different approach, using select expressions with interleave and trimming one of the separatefields branches to get field 3.

avisynth

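#25p content source (the BlankClip line is only a stand-in; replace it with your source filter)
BlankClip(length=250, width=720, height=576, pixel_type="YV12", fps=25)
ChangeFPS(60000,1001)           # duplicate pictures: 25p -> 59.94p
AssumeTFF().SeparateFields()    # each frame -> top field, then bottom field
SelectEvery(4, 0, 3)            # keep the 1st and 4th field of every group of 4
Weave()                         # weave the field pairs back into frames: 29.97i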

vapoursynth

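import vapoursynth as vs
import havsfunc as haf   # third-party script collection that provides ChangeFPS
core = vs.core

# 25p content source: this BlankClip is only a stand-in; replace it with your source filter
clip = core.std.BlankClip(width=720, height=576, format=vs.YUV420P8, fpsnum=25, fpsden=1, length=250)

clip = haf.ChangeFPS(clip, 60000, 1001)                  # duplicate pictures: 25p -> 59.94p
sep = core.std.SeparateFields(clip, tff=True)             # each frame -> top field, bottom field
a0 = core.std.SelectEvery(clip=sep, cycle=4, offsets=0)   # 1st field of every group of 4
a3 = core.std.SelectEvery(clip=sep, cycle=4, offsets=3)   # 4th field of every group of 4
i = core.std.Interleave(clips=[a0, a3])                   # a0, a3, a0, a3, ...
w = core.std.DoubleWeave(i, tff=True)                     # weave every overlapping pair of fields
w = core.std.SelectEvery(w, 2, 0)                         # keep non-overlapping pairs: 29.97i
w.set_output()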
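For a VapourSynth newcomer, it may help to see what the two SelectEvery calls plus Interleave do to the field order. The Python sketch below models clips as plain lists (it does not import VapourSynth and ignores partial cycles at the end of a clip), with integers standing in for the field numbers SeparateFields produces:

```python
def select_every(clip, cycle, offsets):
    """List model of std.SelectEvery: per full cycle, emit items at the given offsets."""
    out = []
    for start in range(0, len(clip) - cycle + 1, cycle):
        out.extend(clip[start + o] for o in offsets)
    return out

def interleave(clips):
    """List model of std.Interleave for equal-length clips: round-robin one item from each."""
    return [item for group in zip(*clips) for item in group]

sep = list(range(16))   # stand-in field numbers
two_step = interleave([select_every(sep, 4, [0]), select_every(sep, 4, [3])])
one_step = select_every(sep, 4, [0, 3])
print(two_step)              # [0, 3, 4, 7, 8, 11, 12, 15]
print(two_step == one_step)  # True
```

If the model holds, then for a field clip whose length is a multiple of 4 the three lines could be collapsed into a single call, core.std.SelectEvery(sep, cycle=4, offsets=[0, 3]), since offsets also accepts a list.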

I want to retire all my FFmpeg and go entirely with VapourSynth. But I'm a VS novice and don't grok python -- I'm an OOL animal, but I don't know how to query python constructors and such stuff. Will you help me?

ffmpeg (I verified that everything up to shuffleframes gives the expected output):

ffmpeg -i "25p_content.ext" -filter_complex "fps=60000/1001, setfield=tff, separatefields, shuffleframes='0|-1|-1|3', setpts='N/(120000/1001*TB)', weave" -c:v utvideo -an 25p_to_29.97i_telecine.avi -y

"fps=60000/1001, setfield=tff" is not needed.
"shuffleframes='0|-1|-1|3'" works only on frames, and that particular pattern operates on a stride of 4 frames. It copies frame 0 to '0', drops frame 1, drops frame 2, and copies frame 3 to ?where?. The ?where? should be '1'. The result will be a video with half the number of frames. Is that what you intend?
"setpts='N/(120000/1001*TB)'". Try this, instead: settb=1/25,setpts=N

PS: 120000/1001? Are you trying to fast motion to 120'fps or did you simply make a mistake?

PPS: I've gotten some fresh sleep...
The 'shuffleframes' filter pattern you're looking for may be
0|3|-1|-1
I can never remember whether the pattern is source-wise or target-wise.
If you're just dropping frames, then this: <target>fps[1+00+1[<source>fps]], will do the trick. You can simply implement that in FFmpeg with this:
select=or(eq(mod(n\,4)\,0)\,eq(mod(n\,4)\,3))
I haven't figured out what you're trying to do or the frame rates you're trying to achieve. 60'fps? 120'fps?
Dropping 2 frames in a row is awfully brutal.

See if this does what you want to do (my best guess given what you give)

-vf "settb=1/25,setpts=N,shuffleframes=0|3|-1|-1" or
-vf "settb=1/25,setpts=N,select=or(eq(mod(n\,4)\,0)\,eq(mod(n\,4)\,3))" <== this is escaped for Windows -- remove the backslashes if not Windows
The settb=1/25,setpts=N part just hard-wires the input frequency (CFR) and avoids PTS errors, but I can't know what output you want from what you provided.

HTH
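The "1+00+1" pattern keeps positions 0 and 3 of each group of four, so as a test on the frame number n it is "n mod 4 is 0 or 3". A Python sketch comparing that predicate with one reading of a shuffleframes mapping; the reading used here (output slot k is filled by input position mapping[k], and -1 leaves the slot empty) is an assumption, and timestamps are not modelled at all:

```python
def keep_1_00_1(n):
    """Predicate for '1+00+1': keep positions 0 and 3 of every group of 4."""
    return n % 4 in (0, 3)

def shuffle_target_wise(frames, mapping):
    """Assumed shuffleframes reading: slot k takes input mapping[k]; -1 emits nothing."""
    out = []
    cycle = len(mapping)
    for start in range(0, len(frames) - cycle + 1, cycle):
        block = frames[start:start + cycle]
        out.extend(block[m] for m in mapping if m != -1)
    return out

fields = list(range(12))
print([n for n in fields if keep_1_00_1(n)])        # [0, 3, 4, 7, 8, 11]
print(shuffle_target_wise(fields, [0, -1, -1, 3]))  # [0, 3, 4, 7, 8, 11]
print(shuffle_target_wise(fields, [0, 3, -1, -1]))  # [0, 3, 4, 7, 8, 11]
```

Under this reading both mappings select the same fields in the same order; if they still misbehave in ffmpeg, the difference would have to lie in something this model leaves out, such as the timestamps attached to the surviving frames.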

markfilipak

6th November 2021, 23:22

poisondeathray

7th November 2021, 04:29

What does a pps (picture) stream, combined from halfpics yet "unframed", look like?

Like numbers in a memory hex dump. I don't mean to be snarky, but I actually don't know what you mean by "look like". How does one represent a mental visualization or a blob of data in memory?

"Look like": I'm trying to visualize the "unframed" state so I can understand the difference between "framed" and "unframed", similar to the way I can visualize a stream of "pictures" on the display right now, or fields, or whatever. The "blob of data" is converted to an RGB representation and rendered on the display.

pps is a stream of pictures in memory. It's a handy abstraction. "Framed" just means the pictures in the stream have each been enclosed in a frame structure: a pattern of surrounding data bits defined by MPEG as constituting an elementary stream.

I can use the "old system" using "fields" as an abstraction, to describe patterns and rearrange things. E.g., I can use some pattern notation such as "AaBbCc" to describe something. I'm pretending "A" is the top field as an abstraction ("fields" don't really exist in a native progressive stream; it's just a mental exercise). But I can display the stream of "pictures", or "halfpics", or "fields" in a video editor, or avisynth/avspmod, or vapoursynth/vsedit, etc. I can see and arrange/rearrange/manipulate the "pictures", or frames, or fields using programs and "see" the results on the display of a raw yuv stream (or any other type of video data stream; it could be raw RGB or a CMYK image sequence, it doesn't matter). I can display "A" as a "half height" picture (halfpic) representing the "top field" abstraction. If the notation describes something happening, i.e. that some process is applied, I want to visualize it on a display at each step or node.

So in the pps[hps] case, where we pretend 2 "halfpics" are the "fields", and we're combining them, what is it?

The two halfpics are not fields. They have to be in a frame to be fields.
What 'it' is, is a picture: a blob of memory that has no containing structure (i.e. no frame).

Let's stick to "halfpics": 2 halfpics combining to form a "picture" (but not yet a "frame").

To me, when you combine the 2 halfpics of the pair, that becomes a "picture", but it's also a "frame". I don't understand what is done to the "picture" to get it "promoted" to a "frame".

The blob of memory is still comprised of 1's and 0's. The stream of halfpics itself, or when combined to form a stream of pictures in memory (pps) can both be visualized.

Is "visualization" part of the distinction?

Why am I explaining this to You? Why are You asking?

You separated them out in the reference, and I'd like to know the significance. It feels like I'm missing something occurring with the "framing" step, or I'm thinking about it in a different manner, or am I just overthinking it?

Do you have an example or situation that illustrates the differences, or the need to distinguish between the following two operations?

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic (and produces fast/slow motion if 2*fps != hps).
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic (and produces fast/slow motion if 2*pps != hps).
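A worked check of when pairing halfpics changes motion speed: a stream of hps halfpics per second carries hps/2 pictures' worth of content per second, so emitting the pairs at a rate r plays at 2r/hps times normal speed, and normal speed needs hps = 2r. A minimal Python sketch using exact fractions:

```python
from fractions import Fraction

def speed_factor(out_rate, hps):
    """Playback speed when halfpic pairs (hps/2 pictures of content per second) are emitted at out_rate."""
    return Fraction(out_rate) / (Fraction(hps) / 2)

print(speed_factor(25, 50))   # 1    : 25fps[50hps] plays at normal speed
print(speed_factor(30, 50))   # 6/5  : the same 50hps emitted at 30fps is fast motion
print(speed_factor(30, 60))   # 1    : 30fps[60hps] plays at normal speed
```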

You're calling it 2 half pics forming a picture. That's fine. But in the old way, that was also a "frame". I'm not seeing the distinction between picture and frame in this case.

When you look at a display, you see pictures. When you look inside an elementary stream with a hex editor, you see frames.
Frames are data, ephemeral things. Pictures (as samples) are what is shot with a camera or sampled with a film scanner, plus (as pixels) are the interpretation of the data that is enclosed within frames, plus (as pels) are what is displayed on a screen.

"When you look at a display, you see pictures": OK, but you're using "picture(s)" in 2 slightly different ways. On one hand you're seeing the "pictures" on the display; on the other hand, a stream of half pic pairs, when combined, forms a "picture" stream... (but apparently it's not "framed" yet... I don't get it, sorry. Something is not "clicking" for me about "framing".)

"Encoded interlaced" vs. "Encoded Progressive"

We're describing a file that was "encoded I or P"

Encoded interlaced would be something [25pps], because the input source [25pps] used to generate the file we are describing now was the same in both cases. The content is the same in both cases for input and output [25pps]; it doesn't change. Nothing funky is going on: nothing like fields added or deleted, i.e. no change in "structure."

Sorry, those are semi-rhetorical statements? Or are they questions?

Not questions (no question mark). Statements or explanations so we are on the "same page". And if this is not how you "think" of it, then say so and explain.

To clarify, I was still referring to "P content, encoded I" vs. "P content, encoded P". But the other 2 cases also occur in real life: "I content, encoded I" vs. "I content, encoded P", and the last one (I content encoded P) has additional problems, as I'm sure you can imagine.

I don't think "Encoded I or P" fits in the way the notation describes "structure" and streams. The streams (encoded "i" or encoded "p") functionally look the same on paper in terms of structure.

Why are you saying that? That's not true at all unless you're referring to a raw stream (in which case, there is no 'i' or 'p'). I'm mystified.

To be clear, I was still referring to "progressive content encoded interlaced" vs. "progressive content encoded progressive", e.g. 25p content encoded interlaced vs. 25p content encoded progressive. The structure is the same. Both can be represented as AaBbCc... To recap: all this discussion about "25p content encoded I or P" started with the 25p content PAL DVD example, i.e. 25p content encoded interlaced.

What is the "25p content, telecined for PAL" notation? Because that is the same thing as "25p content, encoded interlaced". The analogous case for North America would be "29.97p content, telecined for NTSC" (and also "29.97p content encoded interlaced").

(it should have said both fields from the pair are from the same image - "both field pairs" implies 4 fields)
Okay. Why are you mentioning 4 fields?

I'm highlighting my mistake (lots of them, apparently, the later at night it gets). "If there is no motion, both field pairs are from the same image" is wrong because "both field pairs" implies 4 fields (1 field "pair" is 2 fields). It should only be 2 total fields from the same image.

Yes, it's basically the same thing repeated. If you examined frames (in a non-deinterlacing player), it would look like a single progressive, full-quality image repeated at 29.97 frames/s. (If it was deinterlaced using most algorithms, there would be lower quality and aliasing.)

Huh?

What is the question about?

"non-deinterlacing player" means a player that doesn't apply a deinterlace filter automatically.

Why would a player deinterlace?

("Deinterlacing" in this context meaning something like "yadif" being applied.)

Some players automatically deinterlace in some situations, usually based on flags and metadata. E.g., if you play a typical DV or AVCHD file (interlaced content, interlaced encoding) in something like Windows Media Player, it will usually deinterlace automatically because the file has an "interlaced" scan type and a declared scan field order. By convention, interlaced HD is normally TFF and DV is BFF, and programs like mediainfo will report it. A typical (NTSC-area) DV or AVCHD file with motion will display in WMP as 59.94 different images per second: each field is separated, and the missing scan lines are spatially interpolated to make "full-sized" frames. Essentially, 59.94 fields/s become 59.94 frames/s if double rate deinterlaced, or 29.97 frames/s if single rate deinterlaced (half the temporal samples in the latter case). I don't want to get into a deinterlacer discussion (it's a big topic); in short, there are different kinds of deinterlacing algorithms of varying quality. Some are "dumb" and blindly deinterlace everywhere, including static content (therefore degrading those sections). Others are "smarter": they use motion-adaptive algorithms and data from adjacent fields. Some use "AI" or machine learning, for example high-end TV set chips.

"Deinterlace" in most video forums implies separating fields and interpolating the missing scan lines (by various algorithms) to generate a full frame (or "picture"). An example of a deinterlacing filter would be "yadif". Single rate deinterlace means dropping one set of scan lines (halving the temporal resolution if the content was interlaced to begin with, i.e. starting with [50sps] or [59.94sps]); I think the naming would be 25fps[25pps[50sps]]. Double rate deinterlace means keeping the temporal resolution; I think it would be 50fps[50pps[50sps]].

"Single rate deinterlace"... What a curious expression. What does it mean? Deinterlace at a single rate? Or some sort of deinterlace that produces a rate that's the same as the input rate? I have not the slightest idea. I suspect a phrase like "single rate deinterlace" mixes up all sorts of operations that ought to be kept separate and distinct.

Yes, a bunch of operations are included, and the naming could probably be better. I'm just explaining what is meant by the terms when you come across them in various video forums. If it helps, yadif mode=0 is single rate deinterlacing and yadif mode=1 is double rate deinterlacing.
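The difference between single and double rate shows up when counting outputs. This Python sketch uses crude line doubling ("bob") as the interpolation step, an assumption made for brevity; yadif itself is motion adaptive and uses neighbouring fields. Single rate here simply keeps one halfpic per frame, as in the description above:

```python
def line_double(halfpic):
    """Crude spatial interpolation: repeat each line to restore full height."""
    out = []
    for row in halfpic:
        out.extend([row, row])
    return out

def deinterlace(frames, double_rate):
    """frames: list of (top_halfpic, bottom_halfpic).
    Single rate: one full-height picture per frame (from the top halfpic only).
    Double rate: one full-height picture per halfpic."""
    out = []
    for top, bottom in frames:
        out.append(line_double(top))
        if double_rate:
            out.append(line_double(bottom))
    return out

frames = [(["t0", "t2"], ["b1", "b3"]) for _ in range(30)]   # 30 frames = 60 fields
print(len(deinterlace(frames, double_rate=False)))  # 30
print(len(deinterlace(frames, double_rate=True)))   # 60
```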

your pps <=> means "x content"

I don't know what your "x content" means. "pps" means "pictures per second".

I edited it before you posted: "x progressive content". For example, "25p content" would be 25pps.

your "deinterlace" <=> means "take apart", such as separating pics into half pics

Again, I'm using the word because I'm tired of fighting that fight. "Separating half pictures" is much more accurate. We share the same idea.

"separating half pictures" would imply resulting quarter pictures; (I know you wrote that late at night)

How did you determine it was [60'sps] from the clip only?

No particular reason.

Would 29.97fps[29.97pps] "look" the same ?

The same as what?

To reiterate:

The camera uses an old-style CCD sensor, so the sensor itself also works in "field scan" (as opposed to more modern progressive-scan CMOS sensors, where "interlaced content" [sps] outputs are created in camera after the sensor, but before recording to media). For the DV recording, during motion, each field represents a different image, so the DV camera's output recording is 29.97fps[59.94sps]. If there is no motion, both fields of a pair are from the same image: it's progressive content during that scene.

1) Let's say you receive a clip with no other information about the camera or background, and the recording was that static scene only (no motion): what would you "label" it?

Would it be wrong to label the clip with no motion and no background info as "29.97fps[29.97pps]"? When viewed on a display, both should look the same on that clip (I mean an actual 29.97fps[29.97pps] clip, for example from a different camera, and this DV camera shot with no motion). You see 29.97 pictures per second; they are essentially the same full picture repeated (assuming no processing such as deinterlacing is applied, and ignoring compression differences, noise, etc.).
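A toy model of the static-scene argument: a "scene" is a function of time and row, field-scan capture takes even rows at t and odd rows one field period later, and progressive capture takes every row at t. The two scene functions are hypothetical placeholders:

```python
def scene_static(t, row):
    return f"wall-row{row}"              # same content at every instant

def scene_moving(t, row):
    return f"ball@{t:.4f}-row{row}"      # content changes with time

def capture_interlaced(scene, t, height, field_dt):
    """Field-scan capture: even rows at t, odd rows one field period later."""
    return [scene(t if r % 2 == 0 else t + field_dt, r) for r in range(height)]

def capture_progressive(scene, t, height):
    """Progressive capture: every row at the same instant t."""
    return [scene(t, r) for r in range(height)]

field_dt = 1001 / 60000   # one field period at 59.94 fields/s
for scene in (scene_static, scene_moving):
    same = capture_interlaced(scene, 0.0, 4, field_dt) == capture_progressive(scene, 0.0, 4)
    print(scene.__name__, same)
# scene_static True
# scene_moving False
```

In this model the static scene gives identical frames either way, so nothing in the pictures themselves separates [59.94sps] from [29.97pps]; only motion, or metadata such as a scan-type flag, tells them apart.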

What made you choose [60'sps]? You wrote "no particular reason". I'm trying to understand the thought process behind that.

What happens if you now know the background info, or look at mediainfo and it says "Scan type : Interlaced"? Does that alter how you got to the answer, or change the answer? What if mediainfo said "Scan type : Progressive"? Does it change how you arrived at the answer, or change the answer?

poisondeathray

7th November 2021, 04:34

I want to retire all my FFmpeg and go entirely with VapourSynth. But I'm a VS novice and don't grok python -- I'm an OOL animal, but I don't know how to query python constructors and such stuff. Will you help me?

I can try to help, but I'm a python novice. If I had never touched vapoursynth, I never would have touched python. I know just enough python to do the tasks I need to do in vapoursynth... I learn from examples and usage, and ask if I can't do something. I'm more familiar with avisynth. Avisynth has been around longer, so there are more plugins and scripts already developed for various audio/video tasks. But having been exposed to both, I'd say vapoursynth has much more potential than avisynth, because of python. So I plan on learning it properly eventually (books, courses...).

"fps=60000/1001, setfield=tff" is not needed.
"shuffleframes='0|-1|-1|3'" works only on frames, and that particular pattern operates on a stride of 4 frames. It copies frame 0 to '0', drops frame 1, drops frame 2, and copies frame 3 to ?where?. The ?where? should be '1'. The result will be a video with half the number of frames. Is that what you intend?
"setpts='N/(120000/1001*TB)'". Try this, instead: settb=1/25,setpts=N

PS: 120000/1001? Are you trying to fast motion to 120'fps or did you simply make a mistake?

PPS: I've gotten some fresh sleep...
The 'shuffleframes' filter pattern you're looking for may be
0|3|-1|-1
I can never remember whether the pattern is source-wise or target-wise.
If you're just dropping frames, then this: <target>fps[1+00+1[<source>fps]], will do the trick. You can simply implement that in FFmpeg with this:
select=or(eq(mod(n\,4)\,0)\,eq(mod(n\,4)\,3))
I haven't figured out what you're trying to do or the frame rates you're trying to achieve. 60'fps? 120'fps?
Dropping 2 frames in a row is awfully brutal.

See if this does what you want to do (my best guess given what you give)

-vf "settb=1/25,setpts=N,shuffleframes=0|3|-1|-1" or
-vf "settb=1/25,setpts=N,select=or(eq(mod(n\,4)\,0)\,eq(mod(n\,4)\,3))" <== this is escaped for Windows -- remove the backslashes if not Windows
The "settb=1/25,setpts=N" part just hard-wires the input frequency (CFR) and avoids PTS errors, but I can't know what output you want from what you provided.

HTH

That refers to the same "25p content telecined for NTSC" task, translated from vapoursynth and avisynth, just attempted in ffmpeg.

The way I would have written the schematic out is not as pretty, and doesn't have proper spacing (maybe my lack of html or ascii art skills).

The pattern of duplicates is what you described in the earlier post. ChangeFPS duplicates or drops frames (call them pictures if you want) to achieve the desired frame rate. -vf fps in ffmpeg does the same thing as ChangeFPS (assuming ffmpeg interprets the timecodes in the same way, which is not always the case). But I verified that the pattern is the same in ffmpeg (fps=60000/1001), avisynth, vapoursynth, and a couple of NLEs with a test clip. Applying ChangeFPS(60000,1001) to a 25p content source results in AAABBCCCDDEE expressed as frames.
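
The duplicate pattern can be reproduced with a few lines of arithmetic. This is a minimal Python sketch, assuming (an assumption, not taken from the AviSynth or ffmpeg source) that each output frame simply repeats the latest source picture that has already started, i.e. source index floor(n × src_rate / dst_rate). The function name change_fps_pattern is hypothetical.

```python
from fractions import Fraction


def change_fps_pattern(labels, src_rate, dst_rate, count):
    """Return the source picture shown by each of `count` output frames.

    Assumption (not taken from the AviSynth or ffmpeg source): output
    frame n repeats the latest source picture that has already started,
    i.e. source index floor(n * src_rate / dst_rate).
    """
    ratio = Fraction(src_rate) / Fraction(dst_rate)
    out = []
    for n in range(count):
        index = (n * ratio.numerator) // ratio.denominator  # exact floor
        out.append(labels[index])
    return "".join(out)


# 25p content converted to 60000/1001 fps: the first 12 output frames
pattern = change_fps_pattern("ABCDEF", 25, Fraction(60000, 1001), 12)
print(pattern)  # AAABBCCCDDEE -> 3-2-3-2-2
```

With exactly 60 instead of 60000/1001 the first 12 slots come out the same; over long runs the 1001 factor makes the pattern shift now and then.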

SeparateFields is the same in ffmpeg, avisynth, vapoursynth. It's analogous to "separating pics to halfpics" (since the "parent" is progressive). The next step is selecting, in every group of 4, the 1st and 4th (frame numbering in most programs starts at "zero"), and by extension dropping the 2nd and 3rd "fields". So SelectEvery(4,0,3) in avisynth means: for every group of four, select positions 0 and 3 (which are the 1st and the 4th in every group of four). "Weave" puts separated fields back together into a frame (you would label "weave" slightly differently, something like "interlacing half pic pairs, +/- framing them"; I'm still not too clear on the use of "framing").

AaAaAaBbBbCcCcCcDdDdEeEe ( SeparateFields )
A a A b B c C c D d E e ( SelectEvery(4,0,3) )
AaAbBcCcDdEe ( Weave )
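
The three steps can be mimicked on letters in plain Python. This is a sketch only: separate_fields, select_every and weave are hypothetical helpers that imitate what SeparateFields, SelectEvery(4,0,3) and Weave do to the field order; they move labels around, not pixels.

```python
def separate_fields(frames):
    """Split each 2-letter frame like 'Aa' into its two fields, in order."""
    return [field for frame in frames for field in frame]


def select_every(items, step, *offsets):
    """Keep items[start + offset] for each whole group of `step`, like SelectEvery."""
    out = []
    for start in range(0, len(items) - step + 1, step):
        out.extend(items[start + off] for off in offsets)
    return out


def weave(fields):
    """Pair consecutive fields back into frames (field count must be even)."""
    return [fields[i] + fields[i + 1] for i in range(0, len(fields), 2)]


# Output of ChangeFPS (AAABBCCCDDEE), each frame written as its field pair
frames = ["Aa", "Aa", "Aa", "Bb", "Bb", "Cc", "Cc", "Cc", "Dd", "Dd", "Ee", "Ee"]
fields = separate_fields(frames)
picked = select_every(fields, 4, 0, 3)
print("".join(fields))   # AaAaAaBbBbCcCcCcDdDdEeEe
print(" ".join(picked))  # A a A b B c C c D d E e
print(weave(picked))     # ['Aa', 'Ab', 'Bc', 'Cc', 'Dd', 'Ee']
```

Five source pictures end up as six woven frames, which is the 25 to 30 (29.97) ratio the task is after.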

Yes, 120000/1001 for setpts is a "mistake" - but I actually tried it too. It's one of the many versions I tried, and I just copied and pasted that version late at night.

Replacing with settb=1/25,setpts=N does not give the expected result.

"fps=60000/1001, setfield=tff" is needed in ffmpeg to replicate what I described for that first step in the conversion. The avisynth script is the "easiest" to understand IMO when breaking apart individual steps, and the output of each individual step (or node) is the same when compared to ffmpeg, up to the shuffleframes point (the ffmpeg output just after separatefields is correct, i.e. the output from separatefields can be thought of as a "half pic" stream, but we're calling them "fields" in ffmpeg, avisynth and vapoursynth).

If you're certain that ffmpeg shuffleframes only works on frames (it does not work after separatefields), then this approach won't work. I also mentioned trying split with select, interleave, and weaving the 2 streams of fields with one offset by trim; that didn't work either. I tried dozens of combinations, you know how it is with ffmpeg....

Do you know of another way to accomplish this "25p content telecined for NTSC" task in ffmpeg? If it helps, I can upload the input and the expected output video (avs, vpy, NLEs all match) to compare. I would like to figure out how to do it in ffmpeg, because sometimes it's more streamlined for a specific task I'm doing (or a related issue might come up). I tend to use a bunch of tools; each has pros/cons.

poisondeathray

7th November 2021, 04:37

I'd like to know what you think of one of my favorite tricks: Use halfpic interpolation to turn 50sps into 25fps that's fully progressive (except for a 1st combed frame).

Interesting thought exercise. Have you tested it on real video compared to standard "deinterlacers", or more complex ones like QTGMC?

Interpolation (I'm assuming motion vector interpolation, temporal midpoint of the halfpics) can yield weird results and artifacts, and when weaved (combined into full pic), they might not "fit" as you expected them to, resulting in aliasing issues.

markfilipak

7th November 2021, 16:16

...

Would you kindly edit your post and change the CODE tag to QUOTE so that the text wraps?

poisondeathray

7th November 2021, 16:40

Would you kindly edit your post and change the CODE tag to QUOTE so that the text wraps?

Sorry, does that fix it?

markfilipak

7th November 2021, 16:50

Interesting thought exercise. Have you tested it on real video compared to standard "deinterlacers", or more complex ones like QTGMC?
What is QTGMC? By standard deinterlacers I reckon you mean cosmetic filters that do deinterlacing to separate half pictures as part of their function. If so, I stopped using cosmetics a while back. I do better.

Ah! Found it!
http://avisynth.nl/index.php/QTGMC
Yeah, yet another cosmetic. No, thanks. I'm a mechanical man.

Interpolation (I'm assuming motion vector interpolation, temporal midpoint of the halfpics) can yield weird results and artifacts, and when weaved (combined into full pic), they might not "fit" as you expected them to, resulting in aliasing issues.
I'm doing it all the time. I'm even doing 1-to-5 MV interpolation from 24fps to 120fps and getting great results. A 1-to-2 interpolation is a piece of cake. There are no issues, no weirdness. 1-to-5 stretches the limits, and for some particular scenarios, such as tracking shots with high contrast scrolling backgrounds with vertical lines, there is sometimes weirdness caused by the large size of the detection blocks used. I'd like to see smaller blocks and/or pixel-by-pixel motion detection, but the developers don't see it that way.

Did this:
25fps[25pps{50hps[25hps[1+0}{1,[50hps[25hps[0+1}[50sps]]
'sing' to you?

markfilipak

7th November 2021, 16:55

Sorry, does that fix it?

Nope. It (http://forum.doom9.org/showthread.php?p=1956788#post1956788) still has a wide CODE block at the end.

markfilipak

7th November 2021, 17:08

... The way I would have written the schematic out is not as pretty, and doesn't have proper spacing (maybe my lack of html or ascii art skills). ...

Hahaha... You have to put the texipix diagrams in CODE tags.
It's the only way to format text as monospaced text in this forum.

I wish so-called "rich text" would go away.

PS: You know, I'm going to put all my posts in CODE tags and do the text wrapping, if not a texipix diagram, myself, manually.

poisondeathray

7th November 2021, 19:05

I think I fixed the [code] tag

I know some of the various ways to blend pictures (all of which involve blending pixels). Tell me: How does one blend fields? Do you put them in a blender?

Not sure if that was a joke... If you have a blended frame (picture), and separate the fields (or separate to "half pics"), each field will also exhibit blends. I personally think it's a terrible method, but it happens in real life, in some professional distributions. Just search "field blend" and you'll get hundreds of examples of sources in this forum alone. I mentioned earlier the scripts developed to attempt to "undo" the damage.

If you start with A and B as 2 frames, and (A/B) is the 50/50 weighted blend frame: if separating "normal" frame A to fields results in Aa, then separating blended frame (A/B) to fields gives (A/B)(a/b).
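
A small numeric check of that claim, as a sketch: frames are lists of pixel rows, the blend is a plain 50/50 average, and the fields are the even and odd rows. The names blend and fields are hypothetical; real blends happen per plane and with rounding, which this ignores.

```python
def blend(frame_a, frame_b):
    """50/50 weighted blend of two frames (lists of rows of numbers)."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def fields(frame):
    """Split a frame into (top field, bottom field): even rows, odd rows."""
    return frame[0::2], frame[1::2]


A = [[10, 10], [20, 20], [30, 30], [40, 40]]
B = [[50, 50], [60, 60], [70, 70], [80, 80]]

top_ab, bottom_ab = fields(blend(A, B))   # separate the blended frame (A/B)
top_a, bottom_a = fields(A)
top_b, bottom_b = fields(B)

# Each field of (A/B) equals the blend of the matching fields of A and B
print(top_ab == blend(top_a, top_b))            # True
print(bottom_ab == blend(bottom_a, bottom_b))   # True
```

Because the blend works pixel by pixel and field separation only picks rows, the order of the two operations doesn't matter, so the blend survives into every field.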

PS: I apologize. I can be snarky. I know I'm being a PITA about this frames v. pictures thing. But they really are different things at (sometimes) differing rates. If you can break the habit of mixing them up, it will really help me and others, and I think it will help you.

The truth is I don't understand the distinction in how/why you use frames vs. pictures. Try to explain in other words or describe how they are different, or sometimes have different rates.

I understand you can change the rate, do other things like add duplicates, but I consider those as additional steps afterwards. Is that why? But I already asked about that earlier.

Did this:
25fps[25pps{50hps[25hps[1+0}{1,[50hps[25hps[0+1}[50sps]]
'sing' to you?

I understand conceptually what it does from the earlier descriptions on the right (in the earlier post). I'm not at the point of looking at that notation and understanding it yet.

But I suspect problems in the end result. "The proof is in the pudding". It's like a fun challenge. Post some tests and comparisons if you think it does a good job in some scenarios.

If I have time I'll try it later in avisynth or vapoursynth.

What is QTGMC? By standard deinterlacers I reckon you mean cosmetic filters that do deinterlacing to separate half pictures as part of their function. If so, I stopped using cosmetics a while back. I do better.

Ah! Found it!
http://avisynth.nl/index.php/QTGMC
Yeah, yet another cosmetic. No, thanks. I'm a mechanical man.

motion vector interpolated frames are a "cosmetic" too

Interlaced content ([sps]) sources are by definition starting with fields already. "50sps" has 50 different fields during motion. You have 1/2 vertical resolution to begin with, compared to a 50p content source. Your earlier interpolation example is "mechanically" discarding 1/2 the temporal resolution as a trade off - not necessarily an ideal tradeoff either.

" that do deinterlacing to separate half pictures" - Deinterlacers are typically not used on progressive content ("half pictures" would suggest a progressive parent, right?). Field matching and decimation (aka IVTC) are used to recover the original progressive "pictures". IVTC is not "deinterlacing".

QTGMC is a mega function and smooths over the problems that other typical deinterlacers have (line flicker, aliasing, "marching ants" artifacts). Usually the cosmetics are better than the alternative for interlaced content, but there is no one "best" at everything.

I'm doing it all the time. I'm even doing 1-to-5 MV interpolation from 24fps to 120fps and getting great results. A 1-to-2 interpolation is a piece of cake. There are no issues, no weirdness. 1-to-5 stretches the limits, and for some particular scenarios, such as tracking shots with high contrast scrolling backgrounds with vertical lines, there is sometimes weirdness caused by the large size of the detection blocks used. I'd like to see smaller blocks and/or pixel-by-pixel motion detection, but the developers don't see it that way.

Yes, there are artifacts and weirdness as the result of MV interpolation. If you get 100% clean results, with no issues, your testing set is very limited.

I've been using interpolation for many years. You can search my old posts and read about the categories of problems that you will definitely encounter. If you want samples, test videos, that illustrate the problems let me know. I use professional interpolation software too - this is an area I have a lot of experience with, including manual and semi-manual fixes to the problems.

Newer approaches such as DAIN and RIFE can fix many of the MVTools2 problems in some scenes, but have some of their own issues too. Pros/cons. Combining methods yields the best results, but requires more user interaction, roto, and masking.

Smaller blocks during interpolation often cause more problems with artifacts near edges. You can see this effect when varying blocksize from high to low. But more settings "options" are always welcome.

poisondeathray

7th November 2021, 21:16

I think I finally "get" it walking through that example: the difference in how you use "frame" (the picture stream) and "interlace" (combining the half pic pairs).

Your "interlace" of the half pic pairs pps[hps is like "interleave" in avisynth or ffmpeg. They can be thought of as "side by side". At that point they are still half height. The "framing" step is like "weave" in avisynth or ffmpeg, combining them to full height.

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if fps != 2*hps).
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if pps != 2*hps).
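
To make the interleave/weave distinction concrete, here is a sketch with each halfpic as a list of half-height rows. interleave and weave here are hypothetical Python helpers modeled loosely on the AviSynth/ffmpeg filters of the same names, not those filters themselves.

```python
def interleave(stream_a, stream_b):
    """Alternate items from two streams in time: a0, b0, a1, b1, ...
    Each item keeps its own (half) height."""
    out = []
    for a, b in zip(stream_a, stream_b):
        out.extend([a, b])
    return out


def weave(top, bottom):
    """Combine two half-height pictures into one full-height picture by
    alternating their rows: top row 0, bottom row 0, top row 1, ..."""
    rows = []
    for t, b in zip(top, bottom):
        rows.extend([t, b])
    return rows


A = ["A0", "A1"]   # top halfpic: 2 rows
a = ["a0", "a1"]   # bottom halfpic: 2 rows

side_by_side = interleave([A], [a])
full = weave(A, a)

print(side_by_side)   # [['A0', 'A1'], ['a0', 'a1']]  -> 2 items, each 2 rows
print(full)           # ['A0', 'a0', 'A1', 'a1']       -> 1 item, 4 rows
```

Interleaving only changes the order of items in time; weaving is the step that produces a single full-height picture.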

Sorry for being slow...

markfilipak

7th November 2021, 21:22

"Look like" - I'm trying to visualize the "unframed" state so I can understand the difference between "framed" and "unframed". Similar to the manner that I can visualize a stream of "pictures" on the display right now, or fields, or whatever: the "blob of data" is converted to an RGB representation and rendered on the display.

Strange as it may seem, a picture is data, not what you see on a display.

Frames (populated by samples having particular formats) --> Pictures (populated by pixels) --> Display (populated by pels)

It's not me saying that. It's MPEG.

To "visualize" pictures, think of raw (image) data in an editing pipeline. There aren't any frames there. There are just memory blocks. You (everyone) visualize pictures by what you see on a screen, but that's a rendered image (DAR=PAR*SAR stuff). I'm sure you've experienced times when what you saw on a screen had unexpected proportions, especially with SD material from DVDs. Well, you were looking at a rendered image, not at a picture (as "picture" is defined by MPEG). So, what does a picture look like? What does data look like?

What I do is imagine what the picture (data) would look like if it was a rendered image on a display (e.g. DAR=16:9) for SAR=1:1. That's as close to "seeing" what a picture looks like as you'll get.

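The DAR=PAR*SAR relationship can be checked with exact fractions. This sketch assumes SAR there means the storage aspect ratio (stored width / height in pixels) and PAR the pixel aspect ratio; note that ffmpeg uses "SAR" for the sample (pixel) aspect ratio instead. 32:27 is a value ffmpeg commonly reports for 720x480 16:9 DVD video; display_aspect is a hypothetical helper.

```python
from fractions import Fraction


def display_aspect(width, height, pixel_aspect):
    """DAR = PAR * SAR, with SAR = width/height of the stored picture."""
    storage_aspect = Fraction(width, height)
    return Fraction(pixel_aspect) * storage_aspect


dar = display_aspect(720, 480, Fraction(32, 27))
print(dar)                                       # 16/9
print(display_aspect(720, 480, Fraction(1, 1)))  # 3/2, the square-pixel view
```

The second line is roughly the "picture as data" view: the same 720x480 samples shown with square pixels come out 3:2, not 16:9.
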
I can use the "old system" using "fields" as an abstraction, to describe patterns and rearrange things. E.g. I can use some pattern notation such as "AaBbCc" to describe something. I'm pretending "A" is the top field as an abstraction ("fields" don't really exist in a native progressive stream
...

They ("fields") certainly do exist. But they're fractionalized inside particular macroblocks. Fields don't exist as discrete things, no. But they do exist within macroblocks, even in progressive frames. Just look at the structural differences between so-called "progressive" macroblocks and so-called "interleaved" -- MPEG doesn't use the word "interlaced" -- macroblocks.

Aside: I can tell by your use of quotes that you are recognizing the shortcomings of the limited video vocabulary most people have.

... it's just a mental exercise). But I can display the stream of "pictures", or "halfpics", or "fields" in a video editor, or avisynth/avspmod, or vapoursynth/vsedit... etc. I can see and arrange/rearrange/manipulate the "pictures", or frames, or fields using programs and "see" the results on the display of a raw yuv stream (or any other type of video data stream; it could be raw RGB or a CMYK image sequence, doesn't matter). I can display "A" as a "half height" picture (halfpic) representing the "top field" abstraction. If the notation describes something happening, that some process is applied, I want to visualize it on a display at each step or node.

Well, I suppose an interactive notation visualizer that renders pictures on a display with each '[' step could be written, but not by me. You'll have to use your imagination and texipix.

A good system of notation will help mental visualization, I think.

Let's stick to "halfpics". 2 halfpics combining to form a "picture" (but not yet a "frame")

To me, when you combine 2 halfpics of the pair, that becomes a "picture", but it's also a "frame". I don't understand what is done to the "picture" to get it "promoted" to a "frame"?

TaDa! That's the pregnant question, isn't it?

A picture is raw data. A frame is formatted data: pixel quads, blocks, macroblocks, slices, and surrounding metadata. What you see on a display is rendered frames. What a camera shoots is sample frames. Pictures are the unframed raw data in the realm between.

The blob of memory is still comprised of 1's and 0's. The stream of halfpics itself, or when combined to form a stream of pictures in memory (pps) can both be visualized.

Is "visualization" part of the distinction ?

Short answer: Yes. Long answer: You know that a "field" can be framed and becomes a half-height "picture" all on its own. It becomes a frame: separating frames to fields creates a new stream of half-height frames. Well, a halfpic stream is just the raw version of that half-height stream, so it's as though you separated the original picture stream directly to a halfpic stream. It's what FFmpeg's 'separatefields' does:

[A+a].. to [A][a]..
 \          \  \___ half picture
  \          \___ half picture
   \___ picture

It's really that simple.

You separated them out in the reference, and I'd like to know the significance. It feels like I'm missing something occurring with the "framing" step, or I'm thinking about it in a different manner, or am I just overthinking it?

I think you're overthinking it. The notation is not profound. It's just handy for representing complex operations in a manner that can be understood without resorting to words.
Here's a beginning list of notations that I'm sure of:

25fps[1-24,24[24pps]] -- 25fps[24pps] via 1-of-24 picture repeat
25fps[23x(1+0)+2[2x[24pps]]] -- 25fps[24pps] via 1-to-2 picture repeat + 23-of-48 picture drop
25fps[24pps] -- 4% PAL speedup
25fps[50sps] -- 50sps interlace
25fps{50hps[1+0}{1,50hps[0+1}[50sps] -- comb-free 25fps[50sps]
30fps[1,3,2,5,4,7,6,8-12[1+0000[6x[50hps[25pps]]]]] -- 30fps[25pps] telecine via 3-2-3-2-2 pull-down
30fps[1-5,5[25pps[50sps]]] -- combed 30fps[50sps] via 1-of-5 picture repeat
30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] -- 30fps[24pps] telecine via 2-3 pull-down
30fps[60sps] -- 60sps interlace
30fps{60hps[1+0}{1,60hps[0+1}[60sps] -- comb-free 30fps[60sps]
60fps[1+0[5x[24pps]]] -- 60fps[24pps] via 1-to-5 picture repeat + 2-to-1 picture drop
120fps[120pps[24pps]] -- 120fps[24pps] via 1-to-5 picture interpolation
120fps[5x[24pps]] -- 120fps[24pps] via 1-to-5 picture repeat
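
One way to sanity-check the rates in that list is to multiply the stage factors, innermost bracket first. A sketch with exact fractions; the factor used for each stage (x2 for pictures to halfpics, kept/total for a masked stride, outputs/inputs for a mapped stride, /2 when halfpic pairs are framed) is my reading of the notation, not something the notation itself defines, and chain is a hypothetical helper.

```python
from fractions import Fraction


def chain(rate, *factors):
    """Apply rate factors innermost-first and return the final rate."""
    result = Fraction(rate)
    for factor in factors:
        result *= Fraction(factor)
    return result


# 30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]]
telecine = chain(24,
                 2,                 # 2x: 24pps -> 48pps
                 2,                 # pictures -> halfpics: 96hps
                 Fraction(10, 16),  # 2+00+3+00+2+00+3 keeps 10 of 16
                 Fraction(1, 2))    # frame halfpic pairs
# 25fps[1-24,24[24pps]]: 24 in, 25 out
pal_repeat = chain(24, Fraction(25, 24))
# 25fps[23x(1+0)+2[2x[24pps]]]: 46 in/23 out, then 2 in/2 out
pal_drop = chain(24, 2, Fraction(23 + 2, 46 + 2))
# 60fps[1+0[5x[24pps]]]: 5x repeat, then 1+0 keeps 1 of 2
sixty = chain(24, 5, Fraction(1, 2))

print(telecine, pal_repeat, pal_drop, sixty)  # 30 25 25 60
```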

Do you have an example or situation that illustrates differences, or the need to distinguish between these following two operations?

"When you look at a display, you see pictures" - ok, but you're using "picture(s)" in 2 slightly different ways. On one hand you're seeing the "pictures" on the display ...

I should have written "images on the display". Forgive me if I used the word "picture".

... on the other hand, a stream of half pic pairs, when combined, forms a "picture" stream... (but apparently it's not "framed" yet... I don't get it, sorry. Something is not "clicking" for me about "framing")

I hope what I've written above has resolved it. You know, it's not simply a matter of semantics. I've even seen real video pros argue pointlessly because they misunderstood which "frame" was which: from an encoder or in a raw stream. When I read the MPEG specs I can tell which sections were written by whom, well, not by name but by psychology/terminology. It seems the MPEG engineers agreed on one thing: Never to use the word "interlace".

To clarify, I was still referring to the "P content, encoded I" vs. "P content, encoded P". But the other 2 cases also occur in real life: "I content, encoded I" vs. "I content, encoded P", and the last one (I content encoded P) has additional problems, I'm sure you can imagine.

Sorry. I'm lost.

To be clear, I was still referring to "progressive content encoded interlaced" vs. "progressive content encoded progressive", e.g. 25p content encoded interlaced vs. 25p content encoded progressive. The structure is the same. Both can be represented as AaBbCc... To recap, all this discussion about "25p content encoded I or P" started with the 25p content PAL DVD example: 25p content encoded interlaced.

Is this a metaquestion? What does anything that's encoded have to do with anything that's useful? Certainly, the notation has nothing to do with encoding.

What is the "25p content, telecined for PAL" notation? Because that is the same thing as "25p content, encoded interlaced". The analogous case for North America would be "29.97p content, telecined for NTSC" (and also "29.97p content encoded interlaced").

"25p content, telecined for PAL" notation? Did I write that?

I'm highlighting my mistake.

Oh.

(Lots of them, apparently, the later at night it gets.) "If there is no motion, both field pairs are from the same image" is wrong because "both field pairs" implies 4 fields (1 field "pair" is 2 fields). It should be only 2 total fields from the same image.

A good night's sleep generally fixes those things.

("Deinterlacing" in this context meaning like something like "yadif" being applied.)

Related: I once asked the FFmpeg developers what the format was for picture data in the filter pipeline. I suspected that some of their obtuse answers were based on the pipe-internal format rather than based on what everyone else would consider to be a picture. They thought it was a stupid question and that I'm a stupid person.

Some players automatically deinterlace in some situations, usually based on flags and metadata. E.g. if you play a typical DV or AVCHD (interlaced content, interlaced encoding) file in something like Windows Media Player, it will usually deinterlace automatically because the file has an "interlace" scan type and a declared scan field order. By convention, all interlaced HD is TFF, and DV is BFF, and programs like mediainfo will report it. ...

A typical (NTSC area) DV or AVCHD file with motion ...

What does "with motion" mean? What special meaning does "with motion" have for you?

... will display in WMP as 59.94p different images per second (each field is separated, and spatially the missing scan lines are interpolated to "full sized" frames; ...

How do you know that? There's no accounting for how Microsoft chooses to provide an "enhanced user experience".

... essentially 59.94 fields/s become 59.94 frames/s if double rate deinterlaced, or 29.97 frames/s if single rate deinterlaced, with half the temporal samples in the latter case).

I'm not familiar with any metadata that specifies "double rate deinterlaced".

Yes, a bunch of operations are included and the "naming" could probably be better. I'm just explaining what is meant by the terms when you come across them in various video forums. If it helps, yadif mode=0 is single rate deinterlacing, and yadif mode=1 is double rate deinterlacing.

Yes, I was aware of that. yadif, like all deinterlacing filters, just incidentally deinterlaces. It's primarily a cosmetic. I don't use cosmetics. I attempt to undo whatever has been done to pictures so as to turn them back into the best replica of the original pictures that I can recreate. I then prefer to convert them to 120pps and encode them 120fps.

To reiterate:

The camera uses an old style CCD sensor, so the sensor itself also works in "field scan" (as opposed to more modern CMOS sensors with progressive scan, where "interlaced content" [sps] outputs are created in camera after the sensor but before the recording to media). For the DV recording, during motion, each field represents a different image. So for the DV camera, the output recording is 29.97fps[59.94sps]. If there is no motion, field pairs are from the same image: it's progressive content during that scene.

1) Let's say you receive a clip, with no other information about the camera or background info, and the recording was that static scene only (no motion). What would you "label" it?

Would it be wrong to label the clip with no motion, no background info, as "29.97fps[29.97pps]" ? When viewed on a display, both should look the same on that clip (I mean actual 29.97fps[29.97pps] for example from a different camera, and this DV camera shot with no motion). You see 29.97 pictures per second, they are essentially the same full picture repeated (assuming no processing such as deinterlacing is applied, and ignoring compression differences, noise etc...).

What made you choose [60'sps]? You wrote "no reason". I'm trying to understand the thought process behind that.

Because I heard a rumor that the original camera stream was 60'sps?

Hmmm... motion v. no motion? What I think you're 'talking' about is layers in MPEG-4 AVC. I'm not interested in encoded formats. I don't anticipate
the notation applying to encoded streams.

What happens if you now know the background info, or look at mediainfo and it says "Scan type : Interlaced"? Does that alter how you got to the answer, or change the answer? What if mediainfo said "Scan type : Progressive"? Does it change how you arrived at the answer, or change the answer?

I don't much use MediaInfo. I watch a video with MPV. I rely on its report:
"FPS: sss.mmm (specified)" v. "sss.mmm (estimated)",
and I single frame-step while looking for/at visual clues.

I still don't know what you mean by "Scan type : Interlaced" and "Scan type : Progressive". Interlacing is what happens on a display screen or out of a decoder when producing a raw stream. Interlacing is how you make a picture from 2 progressive halfpics.

It's also what you get when you interlace 2 scans to make a (1/60th second combed) picture.

Oh! Oh! Idea! Is "Scan type" just a label? Does "Scan type : Interlaced" just mean "Progressive: No", and does "Scan type : Progressive" just mean "Progressive: Yes"? If so, how does that jibe with the metadata? -- There are 2 of them in an MPEG-ES.

Metadata:

MPEG elemental stream
-- sequence_extension
-- -- offset 0005.4 : 'progressive_sequence' : '0' for picture or halfpics or scans; '1' for picture
-- picture_coding_extension
-- -- offset 0008.0 : 'progressive_frame' : '0' for halfpics or scans; '1' for picture
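
Those two offsets can be read programmatically. A minimal sketch, assuming each buffer begins at the 00 00 01 B5 extension start code; the offsets above are byte.bit counted from that point, bit 0 being the most significant, and they line up with the MPEG-2 sequence_extension and picture_coding_extension syntax. The helper names are hypothetical and there is no start-code scanning.

```python
EXT_START = b"\x00\x00\x01\xb5"  # MPEG-2 extension_start_code


def bit(data, byte_offset, bit_offset):
    """Read one bit; bit_offset 0 is the most significant bit of the byte."""
    return (data[byte_offset] >> (7 - bit_offset)) & 1


def progressive_sequence(seq_ext):
    """sequence_extension: 4-bit id '0001', 8-bit profile_and_level_indication,
    then progressive_sequence, which lands on byte 5, bit 4."""
    if seq_ext[:4] != EXT_START or seq_ext[4] >> 4 != 0x1:
        raise ValueError("not a sequence_extension")
    return bit(seq_ext, 5, 4)


def picture_flags(pic_ext):
    """picture_coding_extension: id '1000', four 4-bit f_codes,
    intra_dc_precision (2), picture_structure (2), then one-bit flags from
    top_field_first (byte 7, bit 0) to progressive_frame (byte 8, bit 0)."""
    if pic_ext[:4] != EXT_START or pic_ext[4] >> 4 != 0x8:
        raise ValueError("not a picture_coding_extension")
    return {
        "top_field_first": bit(pic_ext, 7, 0),
        "repeat_first_field": bit(pic_ext, 7, 6),
        "progressive_frame": bit(pic_ext, 8, 0),
    }


# Hand-made example bytes, not taken from a real stream
seq_ext = bytes([0x00, 0x00, 0x01, 0xB5, 0x14, 0x88])
pic_ext = bytes([0x00, 0x00, 0x01, 0xB5, 0x8F, 0xFF, 0xF3, 0x80, 0x80])

print(progressive_sequence(seq_ext))  # 1
print(picture_flags(pic_ext))
```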

markfilipak

7th November 2021, 22:05

I think I finally "get" it walking through that example - The difference in how you use "frame" picture stream, and "interlace" combining the half pic pairs

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if fps != 2*hps).
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if pps != 2*hps).

Your "interlace" of the half pic pairs pps[hps is like "interleave" in avisynth or ffmpeg. They can be thought of as "side by side". At that point they are still half height. The "framing" step is like "weave" in avisynth or ffmpeg, combining them to full height.

Sorry for being slow...

That's fine. Just so long as you don't call everything a frame. :)

##pps : a stream of pictures at '##' pictures per second.
##hps : a stream of halfpics at '##' halfpics per second.
##sps : a stream of scans at '##' scans per second.

They are serial streams, so
"hps[pps" can be implemented: 'separatefields'
"pps[hps" can be implemented: 'weave'
"pps[sps" can be implemented: 'weave'
"fps[hps" is interlace ('weave') and frame
"fps[sps" is interlace ('weave') and frame
"fps[pps" is frame, only

Strides are also stream animals.
masked strides:
1+0 -- inputs stream-sets of 2 stream thingies and outputs the 1st stream thingy (drops the 2nd stream thingy)
2+00+3+00+2+00+3 -- the infamous 2-3 pull-down: pass 2 stream thingies, drop 2 thingies, pass 3, drop 2, etc.
Note: "2+00+3+00+2+00+3" is an abbreviation of "1+1+0+0+1+1+1+0+0+1+1+0+0+1+1+1"
mapped strides:
1-24,24 -- map stream thingies 1 to 24 to the output stream followed by a copy of stream thingy #24
1,3,2,5,4,7,6,8-12 -- map groups of 12 stream thingies to the output stream in the following order: 1 3 2 5 4 7 6 8 9 10 11 12, as the stream thingies arrive.
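
Because strides are plain stream operations, they are easy to model. A sketch in Python that expands the masked and mapped stride strings as described above and applies them group by group. The function names are hypothetical; the group size of a mapped stride is taken to be its highest index (my assumption), and incomplete trailing groups are simply dropped.

```python
def expand_mask(spec):
    """'2+00+3' -> [1, 1, 0, 0, 1, 1, 1]: a number n passes n thingies,
    a run of k zeros drops k thingies."""
    mask = []
    for token in spec.split("+"):
        if set(token) == {"0"}:
            mask.extend([0] * len(token))
        else:
            mask.extend([1] * int(token))
    return mask


def apply_mask(stream, spec):
    """Keep thingies whose position in each stride-set is marked 1."""
    mask = expand_mask(spec)
    usable = len(stream) - len(stream) % len(mask)  # whole stride-sets only
    return [x for i, x in enumerate(stream[:usable]) if mask[i % len(mask)]]


def expand_map(spec):
    """'1,3,2,8-12' -> [1, 3, 2, 8, 9, 10, 11, 12] (1-based positions)."""
    order = []
    for token in spec.split(","):
        if "-" in token:
            first, last = map(int, token.split("-"))
            order.extend(range(first, last + 1))
        else:
            order.append(int(token))
    return order


def apply_map(stream, spec):
    """Emit each group of max(order) thingies in the mapped order."""
    order = expand_map(spec)
    size = max(order)
    out = []
    for start in range(0, len(stream) - size + 1, size):
        out.extend(stream[start + pos - 1] for pos in order)
    return out


print(expand_mask("2+00+3+00+2+00+3"))
print(len(apply_mask(list(range(96)), "2+00+3+00+2+00+3")))       # 60
print("".join(apply_map(list("ABCDEFGHIJKLMNOPQRSTUVWX"), "1-24,24")))
```

The first print reproduces the long form "1+1+0+0+1+1+1+0+0+1+1+0+0+1+1+1" given above, and the last one shows 24 thingies in, 25 out, with the 24th repeated.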

See? They're simple building blocks, like Legos.
With simple building blocks plus a plan, I think we can build more efficient conversations.
...fewer words, more understanding.
...less frustration.
...fewer good people accused of being trolls.

markfilipak

7th November 2021, 22:14

Let me seal the deal via a familiar example...
A 30'fps telecine conveys how many pictures per second?
30'fps=30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]]
24 pictures per second...
framed at 30 frames per second...
played (via metadata) at 30000/1001 frames per second (i.e. 0.1% slow).
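
The arithmetic behind those last three lines, as a sketch with exact fractions (the variable names are mine):

```python
from fractions import Fraction

nominal_fps = Fraction(30)
played_fps = Fraction(30000, 1001)       # rate signalled in the metadata
pictures_per_frame = Fraction(24, 30)    # 24 pictures carried in 30 frames

played_pps = played_fps * pictures_per_frame   # pictures actually shown per second
slowdown = 1 - played_fps / nominal_fps        # how much slower than nominal

print(played_pps)              # 24000/1001, i.e. about 23.976
print(slowdown)                # 1/1001, i.e. about 0.1%
```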

markfilipak

7th November 2021, 22:26

You're collecting "disciples." Look out Betamax ! :devil:

This line is here because "The message you have entered is too short. Please lengthen your message to at least 5 characters."

poisondeathray

7th November 2021, 22:48

It seems the MPEG engineers agreed on one thing: Never to use the word "interlace".

And that was confusing me: "interlaces halfpic pairs", but in which way are they "combined"? They are "interleaved", such as side by side, still half height. That's the language I understand: Interleave() in ffmpeg or avisynth. The framing step is "Weave".

To be clear, I was still referring to "progressive content encoded interlaced" vs. "progressive content encoded progressive", e.g. 25p content encoded interlaced vs. 25p content encoded progressive. The structure is the same. Both can be represented as AaBbCc... To recap, all this discussion about "25p content encoded I or P" started with the 25p content PAL DVD example: 25p content encoded interlaced.

What does anything that's encoded have to do with anything that's useful? Certainly, the notation has nothing to do with encoding.

It's useful, because it has to do with the possible consequences in stream handling. I described "why" it was important in an earlier post. If it's not covered by the notation, you should still be aware of it.

It affects vapoursynth too, because of frame props. A progressive content stream "encoded and flagged interlaced" will cause operations to use the "interlaced" form by default. This can cause errors in some operations (e.g. resampling, or converting to RGB, where the chroma planes are upsampled). For example, if you take a screenshot, you will have chroma upsampling errors. You have to set the frame props to "progressive" to override it. (It's similar to NLEs, where you have to "interpret" the footage as progressive.)

"25p content, telecined for PAL" notation? Did I write that?

I don't think so, or at least not final version. This also is the "25p content DVD, encoded interlaced". Very common. This is what started all this "progressive content encoded interlaced, or encoded progressive" discussion. The 2:2 pulldown (misnomer).

A typical (NTSC area) DV or AVCHD file with motion ...

What does "with motion" mean? What special meaning does "with motion" have for you?

It means objects are moving, or the camera is moving. It means not static, so no frame repeats. If you were to look at the separated fields, each field would show a different picture: 50 different pictures per second.

Motion is required to properly analyze the highest frequency of image capture by the system setting. If I shoot 120fps with my gopro and wave the camera around, there are 120 different images per sec. If it's a static shot, you can't tell definitively.

... will display in WMP as 59.94p different images per second (each field is separated, and spatially the missing scan lines are interpolated to "full sized" frames; ...

How do you know that? There's no accounting for how Microsoft chooses to provide an "enhanced user experience".

I have experience with many different Windows systems and configurations, and many different users. I'm clear on the (interpolated) 59.94p part, but not necessarily on the exact algorithm used by WMP.

I'm not familiar with any metadata that specifies "double rate deinterlaced".

I'm not either - in terms of metadata.

But the double rate occurs by default, not single rate. It occurs with your flat panel TV too. E.g. an interlaced content DVD will display as (interpolated) 59.94p, not 29.97p. A 1080i29.97 sports channel will display as (interpolated) 1080p59.94, not 1080p29.97.

Interlaced content streams display as (interpolated) 50p or 59.94p on almost all consumer systems by default. The method by which they get there is a "double rate deinterlace" applied somewhere.

yadif, like all deinterlacing filters, just incidentally deinterlaces. It's primarily a cosmetic.

How are you defining "cosmetic" ?

Yadif in mode=1 converts fields to frames. You're starting with half the spatial information anyway with interlaced content (each field has half the vertical data of a 50p or 60p content source).
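What "fields to frames" means mechanically can be sketched as a plain linear bob: each field is stretched back to full height, so one interlaced frame in gives two frames out. This is a minimal sketch assuming top field first and simple averaging of the neighbouring field lines; it is not yadif's algorithm (yadif also uses temporal information). The field's own scan lines are copied through unchanged, which is the "lossless" property mentioned below.

```python
def split_fields(frame):
    """Top field = even rows, bottom field = odd rows."""
    return frame[0::2], frame[1::2]


def field_to_frame(field, parity, height):
    """Rebuild a full-height picture from one field (simple linear bob).

    The field's own lines are copied unchanged; every missing line is the
    average of the field lines directly above and below it (or a copy of the
    single neighbour at the top/bottom edge).
    """
    out = [None] * height
    for i, row in enumerate(field):
        out[2 * i + parity] = list(row)
    for r in range(height):
        if out[r] is None:
            # r has the other parity, so r-1 and r+1 are lines of this field.
            neighbours = [out[n] for n in (r - 1, r + 1) if 0 <= n < height]
            out[r] = [sum(px) / len(neighbours) for px in zip(*neighbours)]
    return out


def bob(frames):
    """Double-rate deinterlace: every input frame yields two output frames."""
    out = []
    for frame in frames:
        top, bottom = split_fields(frame)
        out.append(field_to_frame(top, 0, len(frame)))
        out.append(field_to_frame(bottom, 1, len(frame)))
    return out


# One 4-line, 2-pixel-wide interlaced frame.
frame = [[0, 0], [10, 10], [20, 20], [30, 30]]
result = bob([frame])
print(len(result))   # 2
print(result[0])     # [[0, 0], [10.0, 10.0], [20, 20], [20.0, 20.0]]
```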

You can keep the original scan lines in the recording; for example, QTGMC has a "lossless" mode, where the original scan lines are preserved. That's usually not a good thing in terms of visual quality. The normal "cosmetics" applied make it look better.

I don't use cosmetics.

Motion interpolation is a cosmetic too. Deinterlacers are interpolating spatially; motion vector interpolation is interpolating temporally.

If you use interpolation algorithms that preserve the original frame set, then you can recover the original source frames too, similar to "lossless" deinterlacing. (Some interpolation algorithms resample all frames.)
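A minimal sketch of the "preserve the original frame set" property: originals land on every other output slot untouched, so taking every other frame gives them back exactly. The in-between frame here is a plain 50/50 blend, used only as a stand-in for real motion-vector interpolation; frames are tiny pixel lists, and N originals give 2N-1 output frames in this simple version.

```python
def double_rate_preserving(frames):
    """Insert one synthesized frame between each pair of neighbouring originals.

    Originals stay untouched at even output indices. The in-between frame is
    a plain 50/50 blend, standing in for motion-vector interpolation.
    """
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    if frames:
        out.append(frames[-1])
    return out


def recover_originals(doubled):
    """Undo the interpolation by keeping every other frame."""
    return doubled[0::2]


src = [[0, 0], [10, 20], [30, 40]]    # three tiny "frames" of two pixels each
dbl = double_rate_preserving(src)
print(dbl)                             # [[0, 0], [5.0, 10.0], [10, 20], [20.0, 30.0], [30, 40]]
print(recover_originals(dbl) == src)   # True
```

An algorithm that resamples all frames would fail the second check, because none of its outputs would be an untouched original.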

I attempt to undo whatever has been done to pictures so as to turn them back into the best replica of the original pictures that I can recreate.

You mean the best replica as if you were there in person with your "eyes" (e.g. a teleporting time machine)?

Double rate deinterlacers attempt that "replica" too. When you apply a double rate deinterlacer to a 25i content source, it becomes a spatially interpolated 50p version. 25i content can be thought of as starting with a 50p source and discarding half the scan lines.
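That last sentence can be sketched directly: take 50p frames in pairs, keep the even lines of the earlier frame and the odd lines of the later one, and weave them. This assumes top field first and ignores any vertical filtering a real camera might apply; the function name is hypothetical. Every 50p frame contributes exactly one field, so 50 frames/s become 25 interlaced frames/s carrying 50 fields/s.

```python
def interlace_pairs(frames, top_field_first=True):
    """Make one interlaced frame from each pair of consecutive progressive frames.

    The earlier frame supplies the first field's lines, the later frame the
    second field's lines; the other half of each source frame is discarded.
    Frames are lists of rows; a leftover odd frame at the end is dropped.
    """
    out = []
    for earlier, later in zip(frames[0::2], frames[1::2]):
        top, bottom = (earlier, later) if top_field_first else (later, earlier)
        out.append([top[r] if r % 2 == 0 else bottom[r] for r in range(len(top))])
    return out


# Four 50p frames of four lines; each row value encodes 10*frame + line.
p50 = [[10 * f + line for line in range(4)] for f in range(4)]
i25 = interlace_pairs(p50)
print(len(i25))   # 2
print(i25)        # [[0, 11, 2, 13], [20, 31, 22, 33]]
```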

1) Let's say you receive a clip with no other information about the camera or background, and the recording was of that static scene only (no motion). What would you "label" it?

Because I heard a rumor that the original camera stream was 60'sps?

I purposely wrote that you heard no such rumors for case 1.

There is a method to this madness. I wrote about those situations to determine whether the notation can convey "content" in an "sps" stream, or whether [sps] solely indicates "interlaced content".

For example, many AVCHD cameras have a "30p" mode too. It's a 29.97p content stream in 59.94 fields/s. Early-generation ones were CCD, so they had an interlaced-scan sensor only. Knowing that, what would you call a "29.97p content stream in 59.94 fields/s"?

Is "Scan type" just a label?

In mediainfo, exactly: it's just a metadata label, usually derived from the encoding mode used (I or P). mediainfo does not say anything about the actual content.

poisondeathray

8th November 2021, 01:06

I'd like to know what you think of one of my favorite tricks: use halfpic interpolation to turn 50sps into 25fps that's fully progressive (except for a 1st combed frame).

{ }{ }[50sps] Split the source into 2 streams
{ 1+0}{ }[50sps] For the 1st stream, drop every even scan
{ 25hps[1+0}{ }[50sps] The result is 25hps -- a halfpic stream because scans can only be sources, and because the stream will eventually populate a picture stream.
{50hps[25hps[1+0}{ }[50sps] Interpolate to 50hps -- imagine the halfpics come out of 50hps[25hps[1+0[50sps]]]: (A_)(AC)(C_)(CE)(E_).. -- (AC) and (CE) are interpolated halfpics
{50hps[25hps[1+0}{ 50hps[25hps[0+1}[50sps] For the 2nd stream, drop every odd scan
{50hps[25hps[1+0}{1,[50hps[25hps[0+1}[50sps] Map an extra halfpic #1 for the 2nd stream -- imagine: (b_)(b_)(bd)(d_)(de).. -- (bd) and (de) are interpolated halfpics
25pps{50hps[25hps[1+0}{1,[50hps[25hps[0+1}[50sps] Interlace the two streams
25fps[25pps{50hps[25hps[1+0}{1,[50hps[25hps[0+1}[50sps]] Frame -- imagine: [A+b_][AC+b][C+bd][CE+d][E+de]..

If you have 2 separate interpolated halfpic streams, each at "50hps", then "interlace" (I'm assuming it means "interleave"), then "frame" (I'm assuming it's "weave"), shouldn't the end result be 50fps?

How are you "interlacing" (combining) the 2 separate interpolated halfpic streams, each at 50hps, to get 25pps?

If I "pretend" that "interlace" means "interleave" (and they are side by side), then interleaving 2 50hps streams should result in 100pps. If "framing" means "weave", then framing the 100pps stream would give 50fps[50pps].
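The arithmetic in this question can be checked with a small sketch, under the reading stated in the question: "interlace" = alternate items from the two streams (like AviSynth's Interleave()), and "frame" = combine consecutive items two at a time (the pairing step of a weave, ignoring field parity). The stream contents are placeholder labels for one second of each 50hps stream.

```python
def interleave(a, b):
    """Alternate items from two streams, like AviSynth's Interleave()."""
    return [item for pair in zip(a, b) for item in pair]


def pair_up(stream):
    """Combine consecutive items two at a time (the pairing step of a weave)."""
    return [(stream[i], stream[i + 1]) for i in range(0, len(stream) - 1, 2)]


# One second of each interpolated halfpic stream at 50 halfpics per second.
stream1 = [f"1:{n}" for n in range(50)]
stream2 = [f"2:{n}" for n in range(50)]

mixed = interleave(stream1, stream2)
pictures = pair_up(mixed)
print(len(mixed), len(pictures))   # 100 50
print(pictures[0])                  # ('1:0', '2:0')
```

Under that reading, one second of input yields 100 interleaved items and 50 paired pictures, which is the 50fps figure the question arrives at.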

markfilipak

8th November 2021, 01:08

And that was confusing me: "interlaces halfpic pairs", but in which way are they "combined"? They are "interleaved", such as side by side, still half height. That's the language I understand: Interleave() in ffmpeg or avisynth. The framing step is "Weave".

In an hps stream, yes, the halfpics are interleaved -- everything is interleaved. When 2 halfpics are combined to form a picture (as by 25pps[50sps]), they are interlaced (as a noun, verb, and adjective).

It affects vapoursynth too, because of frame props. A progressive content stream "encoded and flagged interlaced" ...

Are you 'talking' about soft telecine? E.g. 30'fps[24pps]. Or is there more to what you 'say'?

I don't think so, or at least not final version. This also is the "25p content DVD, encoded interlaced". ...

Are you 'talking' about hard telecine? E.g. 30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] (i.e. 24pps telecined via 2-3 pull-down). Or is there more to what you 'say'?
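For readers following the telecine side of this exchange, here is a minimal sketch of what a pull-down cadence does to a frame sequence. It ignores field parity and the 1000/1001 slowdown, and it does not try to decode the bracket notation above. A (2, 3) cadence turns 4 film frames into 10 fields (so 24 frames/s become 60 fields/s); a (2, 2) cadence is the "25p telecined for PAL" case, where each frame simply becomes two fields.

```python
def pulldown(frames, cadence):
    """Repeat each progressive frame as the next number of fields in `cadence`."""
    fields = []
    for i, frame in enumerate(frames):
        fields.extend([frame] * cadence[i % len(cadence)])
    return fields


def pair_fields(fields):
    """Group the field sequence two at a time into video frames."""
    return [fields[i] + fields[i + 1] for i in range(0, len(fields) - 1, 2)]


film = ["A", "B", "C", "D"]                 # four 24p frames
print("".join(pulldown(film, (2, 3))))      # AABBBCCDDD  (4 frames -> 10 fields)
print(pair_fields(pulldown(film, (2, 3))))  # ['AA', 'BB', 'BC', 'CD', 'DD']
print(pair_fields(pulldown(film, (2, 2))))  # ['AA', 'BB', 'CC', 'DD']
```

The mixed frames 'BC' and 'CD' are the combed frames of hard telecine; the 2:2 case never mixes pictures, which is why it is often called a misnomer.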

It means objects are moving, or camera is moving. ...

The ordinary meaning of "moving". Okay. That's a yawn.

Motion is required to properly analyze the highest frequency of image capture by the system setting. If I shoot 120fps with my gopro and wave the camera around, there are 120 different images per sec. If it's a static shot, you can't tell definitively.

Why is that important? You've brought that up several times. So what? A picture is a picture. If it's totally, perfectly static, then there will be no MVs. But there will still be a picture. What are you driving at? What am I not understanding?

How are you defining "cosmetic" ?

cosmetic [n.] 2 cosmetics, superficial measures to make something appear better, more attractive, or more impressive:
The budget committee opted for cosmetics instead of a serious urban renewal plan.

Yadif in mode=1 converts fields to frames. You're starting with half the spatial information anyway with interlaced content (each field has half the vertical data of a 50p or 60p content source).

You can keep the original scan lines in the recording; for example, QTGMC has a "lossless" mode, where the original scan lines are preserved. That's usually not a good thing in terms of visual quality. The normal "cosmetics" applied make it look better.

Better by a specific, use-case criterion, worse in others. You can't get out more than what's in the source pictures.

Motion interpolation is a cosmetic too.

Not in my book.

Deinterlacers are interpolating spatially ...

A deinterlacer deinterlaces. What you are implying is that a cosmetic filter that incidentally deinterlaces in order to add cosmetics is a deinterlacer. If that's so, then automobiles are speedometers.

... motion vector interpolation is interpolating temporally

Any process that adds new structures (new pictures, or new pixels to existing pictures) is not cosmetic. It's mechanical. Cosmetics just enhance/emphasize some aspect that's already there at the cost of other aspects. Twiddling pixels that were shot (created) by camera is cosmetic. Creating new pixels/halfpics/pictures/fields/frames is mechanical.

If you use interpolation algorithms that preserve the original frame set, then you can recover the original source frames too, similar to "lossless" deinterlacing. (Some interpolation algorithms resample all frames.)

Good advice. I use only MV interpolation methods that preserve the original pictures because that produces the best final stream. I've seen some MV interp algorithms that discard some originals. I don't use them. The TV may discard them (e.g. 120fps to a 60Hz TV), but they're still in the encode.

I attempt to undo whatever has been done to pictures so as to turn them back into the best replica of the original pictures that I can recreate.

You mean the best replica if you were there in person with your "eyes" (e.g. teleporting time machine)

I have to assume that the camera was there and made the recording. That will do. Whatever the camera took, and the editor edited, and the color tech color graded, and the media master mastered, that's what I want. If the color grading or the mastering got screwed up, and I can fix it, then I want to fix it. If I can't fix it, then I'll consider cosmetics to make it the best I can, knowing that I'll never obtain the original. Sometimes I get frustrated by the technology. If I have a scan-based TV source with mixed hard telecine segments (e.g. "Making Of" documentaries), then maybe the best I can do is bob. I could perfectly recover each segment, and I could create a scoreboard to do it, but a reliable comb detector would make automation possible. Do you know of any reliable comb detector I can try? It's the only thing I lack to make a one-click 'anything'-to-120fps script.

Double rate deinterlacers attempt that "replica" too. When you apply a double rate deinterlacer to a 25i content source, it becomes a spatially interpolated 50p version. 25i content can be thought of as starting with a 50p source and discarding half the scan lines.

Really? What do you call 25i?

I purposely wrote that you heard no such rumors for case 1.

There is a method to this madness. I wrote about those situations to determine whether the notation can convey "content" in an "sps" stream, or whether [sps] solely indicates "interlaced content".

[sps] solely indicates that the image produced is the cinematographer's intended field of view with 2:1 proportions and is separated from its 'nearest' siblings by 1/(2*|frame rate|) number of seconds.

By the way, [hps] solely indicates that the image produced is the cinematographer's intended field of view with 2:1 proportions and is separated from its 'nearest' siblings by zero seconds on one 'side' and 1/|frame rate| number of seconds on the other 'side'.
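The two timing definitions can be written out as capture instants, a minimal sketch that takes only the timing part of each definition (not the field-of-view part). With a frame rate of 25, consecutive scans are always 1/50 s apart, while halfpics come in same-instant pairs, so their gaps alternate between 0 and 1/25 s. Exact fractions are used so that rates like 30000/1001 also work.

```python
from fractions import Fraction


def sps_times(frame_rate, count):
    """Scans: each one 1/(2*frame_rate) s after the previous one."""
    step = 1 / (2 * Fraction(frame_rate))
    return [n * step for n in range(count)]


def hps_times(frame_rate, count):
    """Halfpics: two per picture, both at that picture's instant."""
    step = 1 / Fraction(frame_rate)
    return [(n // 2) * step for n in range(count)]


def gaps(times):
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(gaps(sps_times(25, 5)))   # four gaps of 1/50 s
print(gaps(hps_times(25, 5)))   # gaps alternate 0 and 1/25 s
```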

For example, many AVCHD cameras have a "30p" mode too. It's a 29.97p content stream in 59.94 fields/s. Early-generation ones were CCD, so they had an interlaced-scan sensor only. Knowing that, what would you call a "29.97p content stream in 59.94 fields/s"?

The notation for such early gen CCD cameras would be the same as for an analog NTSC camera: 30'fps[60'sps].

Oh! Oh! Idea! Is "Scan type" just a label? Does "Scan type : Interlaced" just mean "Progressive: No", and does "Scan type : Progressive" just mean "Progressive: Yes"? If so, how does that jibe with the metadata? There are two of them in an MPEG-ES.

Metadata:

MPEG elementary stream
-- sequence_extension
-- -- offset 0005.4 : 'progressive_sequence' : '0' for picture or halfpics or scans; '1' for picture
-- picture_coding_extension
-- -- offset 0008.0 : 'progressive_frame' : '0' for halfpics or scans; '1' for picture
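A sketch of where those two flags sit, assuming the offsets in the listing are counted from the first byte of the 00 00 01 B5 extension start code, with bit 0 as the most significant bit. Under that assumption they line up with the H.262 syntax: in sequence_extension, progressive_sequence follows the 4-bit extension identifier and the 8-bit profile_and_level_indication (byte 5, bit 4); in picture_coding_extension, progressive_frame follows the identifier, four 4-bit f_codes, intra_dc_precision, picture_structure and eight 1-bit flags (byte 8, bit 0). The helper names are hypothetical, the example bytes are hand-made rather than taken from a disc, and nothing here searches a real stream for start codes.

```python
EXT_START = b"\x00\x00\x01\xb5"


def bit_at(data: bytes, byte_offset: int, bit: int) -> int:
    """Read one bit; bit 0 is the most significant bit of the byte."""
    return (data[byte_offset] >> (7 - bit)) & 1


def check_extension(data: bytes, identifier: int) -> None:
    # The 4-bit extension_start_code_identifier follows the start code.
    if data[:4] != EXT_START or data[4] >> 4 != identifier:
        raise ValueError("not the expected MPEG-2 extension")


def progressive_sequence(seq_ext: bytes) -> int:
    check_extension(seq_ext, 0x1)          # sequence_extension
    return bit_at(seq_ext, 5, 4)           # "offset 0005.4"


def picture_flags(pic_ext: bytes) -> dict:
    check_extension(pic_ext, 0x8)          # picture_coding_extension
    return {
        "top_field_first": bit_at(pic_ext, 7, 0),
        "repeat_first_field": bit_at(pic_ext, 7, 6),
        "progressive_frame": bit_at(pic_ext, 8, 0),   # "offset 0008.0"
    }


seq = bytes([0x00, 0x00, 0x01, 0xB5, 0x14, 0x8A])
pic = bytes([0x00, 0x00, 0x01, 0xB5, 0x8F, 0xFF, 0xFF, 0x82, 0x80])
print(progressive_sequence(seq))   # 1
print(picture_flags(pic))          # {'top_field_first': 1, 'repeat_first_field': 1, 'progressive_frame': 1}
```

repeat_first_field is included because soft telecine (e.g. 24'fps[24pps]) is signalled through it together with progressive_frame.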
Is "Scan type" just a label?
In mediainfo, exactly: it's just a metadata label, usually derived from the encoding mode used (I or P). mediainfo does not say anything about the actual content.

Well, it's not metadata.
Does "Scan type : Interlaced" just mean "Progressive: No", and does "Scan type : Progressive" just mean "Progressive: Yes"?

poisondeathray

8th November 2021, 03:18

It affects vapoursynth too, because of frame props. A progressive content stream "encoded and flagged interlaced" ...

Are you 'talking' about soft telecine? E.g. 30'fps[24pps]. Or is there more to what you 'say'?

Not soft telecine, because soft telecine is encoded progressive, by definition. Yes, there is more to what I wrote; I wrote about "why it matters" a few pages back.

What is the "25p content, telecined for PAL" notation? Because that is the same thing as "25p content, encoded interlaced". The analogous case for North America would be "29.97p content, telecined for NTSC" (and also "29.97p content encoded interlaced").

"25p content, telecined for PAL" notation? Did I write that?

I don't think so, or at least not final version. This also is the "25p content DVD, encoded interlaced". ...

Are you 'talking' about hard telecine? E.g. 30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] (i.e. 24pps telecined via 2-3 pull-down). Or is there more to what you 'say'?

Not "24pps telecined via 2-3 pull-down)" . I quoted the flow of the conversation above. To clarify - 25p content , "telecined" for PAL DVD. And 29.97p content, "telecined" for NTSC DVD.

Motion is required to properly analyze the highest frequency of image capture by the system setting. If I shoot 120fps with my gopro and wave the camera around, there are 120 different images per sec. If it's a static shot, you can't tell definitively.

Why is that important? You've brought that up several times. So what? A picture is a picture. If it's totally, perfectly static, then there will be no MVs. But there will still be a picture. What are you driving at? What am I not understanding?

"Content" rate is what I'm driving at: the number of unique motion samples per second (during motion). I.e., does the notation differentiate by the unique count of "images" for "interlaced" scan? Refer to the AVCHD CCD example below.

... motion vector interpolation is interpolating temporally

Any process that adds new structures (new pictures, or new pixels to existing pictures) is not cosmetic. It's mechanical. Cosmetics just enhance/emphasize some aspect that's already there at the cost of other aspects. Twiddling pixels that were shot (created) by camera is cosmetic. Creating new pixels/halfpics/pictures/fields/frames is mechanical.

By that definition, and by how a "deinterlacer" is defined on this forum, a deinterlacer is "mechanical" too. New scan lines are being interpolated. New frames are being interpolated from fields. A deinterlacer spatially interpolates new pixels from a single field to create a frame.

Do you know of any reliable comb detector I can try? It's the only thing I lack to make a one-click 'anything'-to-120fps script.

No. It depends on the source, and often you have to adjust the settings because of different source characteristics.
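Why the settings end up source-dependent can be seen with a deliberately naive comb metric: compare each line with the average of the lines above and below it. This is an illustration only, not the metric used by any particular filter. Two different fields woven together score high, but so would a genuinely progressive picture of one-line horizontal stripes (the `woven` test pattern below is exactly such a picture), so the threshold has to be tuned to the material.

```python
def comb_score(frame):
    """Mean |line - average(line above, line below)| over interior lines.

    Naive illustration only; not the metric of any particular filter.
    """
    total, n = 0.0, 0
    for r in range(1, len(frame) - 1):
        for above, here, below in zip(frame[r - 1], frame[r], frame[r + 1]):
            total += abs(here - (above + below) / 2)
            n += 1
    return total / n if n else 0.0


def looks_combed(frame, threshold):
    return comb_score(frame) > threshold   # threshold must be tuned per source


gradient = [[10 * r] * 4 for r in range(6)]                        # smooth progressive picture
woven = [[0] * 4 if r % 2 == 0 else [100] * 4 for r in range(6)]   # two very different fields woven
print(comb_score(gradient), comb_score(woven))                      # 0.0 100.0
```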

Double rate deinterlacers attempt that "replica" too. When you apply a double rate deinterlacer to a 25i content source, it becomes a spatially interpolated 50p version. 25i content can be thought of as starting with a 50p source and discarding half the scan lines.

Really? What do you call 25i?

"25i"? - if you mean "25i content", I call it "25i content".

The camera uses an old-style CCD sensor, so the sensor itself also works in "field scan" (as opposed to more modern progressive-scan CMOS sensors, where the "interlaced content" [sps] outputs are created in camera after the sensor but before recording to media). For the DV recording, during motion, each field represents a different image. So the DV camera's output recording is 29.97fps[59.94sps].

Yes, I believe that's 30'fps[60'sps].
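The prime in tokens like 30'fps and 60'sps appears, from how the thread pairs them with 29.97fps and 59.94sps, to mean "multiplied by 1000/1001". A small parser under that assumption (hypothetical helper; it handles single rate tokens only, not nested notation like 30'fps[60'sps]):

```python
import re
from fractions import Fraction


def parse_rate(token: str):
    """Split a rate such as "30'fps" into (exact rate, unit).

    Assumption: a trailing prime means "x 1000/1001" (30' -> 29.97...).
    """
    m = re.fullmatch(r"(\d+)(')?([a-z]ps)", token)
    if m is None:
        raise ValueError(f"unrecognised rate: {token!r}")
    rate = Fraction(int(m.group(1)))
    if m.group(2):
        rate *= Fraction(1000, 1001)
    return rate, m.group(3)


for token in ("30'fps", "60'sps", "25pps"):
    rate, unit = parse_rate(token)
    print(token, rate, round(float(rate), 3), unit)
```

Keeping the rate as an exact fraction (30000/1001 rather than 29.97) avoids drift when field and frame timings are later compared.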

For example, many AVCHD cameras have a "30p" mode too. It's a 29.97p content stream in 59.94 fields/s. Early-generation ones were CCD, so they had an interlaced-scan sensor only. Knowing that, what would you call a "29.97p content stream in 59.94 fields/s"?

The notation for such early gen CCD cameras would be the same as for an analog NTSC camera: 30'fps[60'sps].

The notation labels the 2 different content streams the same way?

During motion, one has 59.94 different fields in time represented in 59.94 fields. The second has 29.97 different fields represented in 59.94 fields (i.e. there are duplicate fields). But they are both labelled 30'fps[60'sps]?
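The difference being asked about can be made concrete by listing the capture instant of each field. This is a sketch assuming ideal timing: in the 60i case every field is its own image, and in the "30p in 59.94 fields/s" case each image spans two fields. Both sequences have exactly the same field rate; only the number of distinct capture instants per second differs, and only motion makes that visible in the pictures.

```python
from fractions import Fraction

FIELD_RATE = Fraction(60000, 1001)   # 59.94 fields per second


def field_instants(n_fields, fields_per_image):
    """Capture instant of each field when each image spans `fields_per_image` fields."""
    period = 1 / FIELD_RATE
    return [(k // fields_per_image) * fields_per_image * period for k in range(n_fields)]


def image_rate(instants):
    """Distinct capture instants per second over the span of the field sequence."""
    duration = len(instants) / FIELD_RATE
    return len(set(instants)) / duration


n = 60000                                   # exactly 1001 seconds of fields
print(image_rate(field_instants(n, 1)))     # 60000/1001 (59.94): one image per field
print(image_rate(field_instants(n, 2)))     # 30000/1001 (29.97): pairs of fields share an image
```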

[sps] solely indicates that the image produced is the cinematographer's intended field of view with 2:1 proportions and is separated from its 'nearest' siblings by 1/(2*|frame rate|) number of seconds.

In the 29.97p content in 59.94fields/s AVCHD CCD case, the "cinematographer" sets it to the "30p" setting, as the desired output. Can you clarify what the "intended field of view" is?

In mediainfo, exactly: it's just a metadata label, usually derived from the encoding mode used (I or P). mediainfo does not say anything about the actual content.

Well, it's not metadata.

"Metadata" is a bad description; I mean "flags" in the ES and/or container that mediainfo parses.

Does "Scan type : Interlaced" just mean "Progressive: No", and does "Scan type : Progressive" just mean "Progressive: Yes"?

What are you referring to with "Progressive: No" or "Progressive: Yes"? Are you referring to MPEG-2 specifically, e.g. the progressive_frame flag in the picture_coding_extension, or progressive_sequence in the sequence_extension?

I'm not entirely sure for mediainfo; it can be unreliable. It looks in multiple places, not just the ES but container flags as well, and there appears to be a priority order: if the container indicates one thing but the ES indicates another, usually the container wins (not just for Scan type, other entries too). It depends on the situation. If it's a fresh ES out of the encoder, then progressive_frame flag = 1 (true) will usually cause mediainfo to report Scan type: Progressive. But I've seen examples (where I don't know the provenance; somebody uploads a sample of something) where progressive_frame flag = 0 (false) is also reported as Scan type: Progressive. I usually put low weight on what mediainfo reports, or interpret it cautiously in context with other information.

markfilipak

12th November 2021, 00:17

I've made a large compendium here: Video Notation, A Video Lingua Franca (http://forum.doom9.org/showthread.php?p=1957239#post1957239).

markfilipak

12th November 2021, 05:28

... What are you referring to with "Progressive: No" or "Progressive: Yes"? Are you referring to MPEG-2 specifically, e.g. the progressive_frame flag in the picture_coding_extension, or progressive_sequence in the sequence_extension?
Both. Here are some instances that perhaps have something to do with what you are trying to explain but I just can't seem to understand:

MPEG elementary stream
-- sequence_extension
-- -- offset 0005.4 : 'progressive_sequence' : '0' "may contain both frame pictures and field pictures"; '1' "contains only progressive frame pictures".
-- -- -- '0': 20th Century Fox splash screen, RUNNING ON EMPTY, THE DEAD ZONE, 28 DAYS, CHINATOWN
-- -- -- '1': I've never seen a single case of this.
-- picture_coding_extension
-- -- offset 0008.0 : 'progressive_frame' : '0' for scans & hard telecined cinema or '1' for soft telecined cinema (24'fps[24pps])
\ \
20th Century Fox splash screen RUNNING ON EMPTY
28 DAYS THE DEAD ZONE
CHINATOWN

When set to '1' the coded video sequence contains only progressive frame pictures. When progressive_sequence is set to '0' the coded video sequence may contain both frame pictures and field pictures, and frame picture may be progressive or interlaced frames.

¶1 If progressive_frame is set to 0 it indicates that the two fields of the frame are interlaced fields in which an interval of time of the field period exists between (corresponding spatial samples) of the two fields.
¶3 If progressive_frame is set to 1 it indicates that the two fields (of the frame) are actually from the same time instant as one another.
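One way to read the two quoted paragraphs is as a rule for when each field of a coded frame was captured. A minimal sketch of that literal reading, assuming the field period is half the frame period and ignoring repeat_first_field (the function name is hypothetical): with progressive_frame = 1 both fields share one instant; with 0 they are one field period apart, and top_field_first decides which one is earlier.

```python
from fractions import Fraction


def field_capture_times(frame_start, frame_rate, progressive_frame, top_field_first):
    """Return (top_field_time, bottom_field_time) for one coded frame.

    progressive_frame = 1: both fields come from the same instant.
    progressive_frame = 0: the fields are one field period apart, and
    top_field_first says which of the two is the earlier one.
    (repeat_first_field is ignored in this sketch.)
    """
    if progressive_frame:
        return frame_start, frame_start
    later = frame_start + 1 / (2 * Fraction(frame_rate))
    return (frame_start, later) if top_field_first else (later, frame_start)


print(field_capture_times(0, 25, progressive_frame=1, top_field_first=1))  # (0, 0)
print(field_capture_times(0, 25, progressive_frame=0, top_field_first=1))  # (0, Fraction(1, 50))
```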

I don't fully understand what H.262 § 6.3.5 is 'saying'.

markfilipak

19th August 2022, 22:43

RE halfpics:

I would clarify the rate vs. composition ...

Thanks for all your inputs. I've incorporated them here:
Github repository (https://github.com/markfilipak/Video-Object-Notation/)

Or skip github and go direct to the page here:
The live page (https://markfilipak.github.io/Video-Object-Notation/)

I would guess that it will take you about 15 minutes to read it. Please, it would be an honor to have your views.

Enjoy,
Mark Filipak.