Preparing for numPy reshape [Archive]

markfilipak

14th October 2021, 15:58

btw, you can get ffmpeg raw output to vapoursynth,
Yes, that's what I'm doing now. In the future I need to put VS right in the middle of a filter graph, so I'm thinking of breaking the filter graph in half: preprocess and postprocess.

Preprocess in ffmpeg: open source video, do TB & PTS fix ups and whatnot.

MV interpolate in VS: pipe the raw frames via ffmpeg's pipe to do motion vector interpolation in VS, then pipe the frames back to ffmpeg via subprocess.Popen() -- I think.

Postprocess in ffmpeg & mkvmerge: mux in audio, subs, and chapters whatnot, and then output.
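
A minimal sketch of the ffmpeg-to-Python half of that pipeline, assuming ffmpeg is told to write headerless raw frames to stdout. The helper names `ffmpeg_raw_cmd` and `read_frames` and the file name are made up for illustration. Raw video carries no header, so the frame size in bytes has to be known in advance.

```python
import io
import subprocess

def ffmpeg_raw_cmd(path, pix_fmt="yuv420p"):
    """Build an ffmpeg command that writes headerless raw frames to stdout."""
    return ["ffmpeg", "-v", "error", "-i", path,
            "-f", "rawvideo", "-pix_fmt", pix_fmt, "-"]

def read_frames(stream, frame_bytes):
    """Yield one bytes object per complete frame from a binary stream."""
    while True:
        buf = stream.read(frame_bytes)
        if len(buf) < frame_bytes:   # end of stream (or a truncated last frame)
            return
        yield buf

# Typical use (hypothetical file name; 720x480 yuv420p8 is w*h*3//2 bytes):
#   proc = subprocess.Popen(ffmpeg_raw_cmd("in.mkv"), stdout=subprocess.PIPE)
#   for frame in read_frames(proc.stdout, 720 * 480 * 3 // 2):
#       ...

# Self-check with an in-memory stream standing in for the pipe.
fake_pipe = io.BytesIO(bytes(range(12)) * 2 + b"\x00\x01")  # 2 frames + stray bytes
frames = list(read_frames(fake_pipe, 12))
print(len(frames))
```

The same `read_frames` loop works on any binary stream, which is why the self-check can use `io.BytesIO` instead of a real ffmpeg process.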

I would do everything in VS if I knew how. I'm lost in python -- totally a python newbie who is largely ignorant of standard python architecture and methods. From what I've found, the python documentation seems to concentrate on syntax and functions. But for what I'm trying to do, I need architecture methods, such as '<object>.list_functions()', which are unnamed in the documentation I've found -- you know that learning a new language is likened to climbing a mountain: the most time is spent looking at maps of the mountain and planning a route, not spent getting gear together. :-) I'm kind of bogged down looking for the maps.
If you want a numpy image from a format whose planes have different sizes, you need each plane's size so you can read the data plane by plane (not the whole frame at once), shove each plane into numpy, and then stack those three planes into an image array. I'm not sure what you want; processing with numpy usually uses RGB, and RGB has the same plane size for each array.
Well, you see, if the input is yuv420p8 or yuv420p10, does ffmpeg create raw frames that are yuv420? Or yuv444? Or rgb420? Or rgb444? I don't know and haven't learned how to find out such things because I have been concentrating on purely mechanical DVD & BD movie fixups (meaning: frame & field & timing manipulation, only, with no color processing and no cosmetic processing, e.g. no yadif).
So if the stdout is subsampled, it would be awkward to get it into a numpy array, because one plane has a different shape than the other two. Is that still what you want? iirc, .reshape(-1,h,w,3) should give you all frames at once (numpy puts rows, i.e. height, before width), but the planes need to have the same size (yuv444 or rgb).
If it is for viewing raw data, it should be rgb -> numpy.
I'm not sure what you mean, but, you know, it helps to explain the issues to someone knowledgeable like you. Your questions act as a guide pointing the correct path to take. And my explaining helps to clarify the issues in my mind. Thank you for pointing the correct path.

I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le, only, so that all color planes are the same size and all pixels are on byte boundaries. What do you say? Or perhaps there's another way that involves less processing and memory? What do you say?

Thanks, Al.
--Mark.
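
For the subsampled case discussed above, here is a hedged sketch of the plane-by-plane read. It assumes one 8-bit yuv420p frame is already in memory as bytes, with the Y plane first, then U, then V, and chroma planes half the luma size (rounded up). The function name `split_yuv420p8` is invented for this example.

```python
import numpy as np

def split_yuv420p8(buf, width, height):
    """Split one raw yuv420p (8-bit, planar) frame into Y, U, V arrays."""
    cw, ch = (width + 1) // 2, (height + 1) // 2   # chroma is half size, rounded up
    y_size, c_size = width * height, cw * ch
    if len(buf) != y_size + 2 * c_size:
        raise ValueError("buffer size does not match a yuv420p8 frame")
    data = np.frombuffer(buf, dtype=np.uint8)
    y = data[:y_size].reshape(height, width)          # rows first: (h, w)
    u = data[y_size:y_size + c_size].reshape(ch, cw)
    v = data[y_size + c_size:].reshape(ch, cw)
    return y, u, v

# 4x2 test frame: 8 luma bytes, then 2 U bytes, then 2 V bytes.
raw = bytes(range(8)) + bytes([100, 101]) + bytes([200, 201])
y, u, v = split_yuv420p8(raw, width=4, height=2)
print(y.shape, u.shape, v.shape)
```

Because the three arrays have different shapes, they cannot be stacked into a single (h, w, 3) image without first upsampling the chroma planes.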

poisondeathray

14th October 2021, 16:45

What is the 'pix_fmt' of raw frames in an ffmpeg filter complex?

It's whatever the input pix_fmt is, or whatever you transformed it to in the filter_complex.

Some filters have limitations on pixel formats, sometimes ffmpeg will "sneakily" auto inject some conversion.

You can insert a -vf showinfo to see what the pixel format is at that node, or point, in the filter graph.

Or does the 'p' in 'yuv...p10' NOT mean "packed"?

planar

Are the raw frames in filter complexes RGB or YUV?

see above

Is there even such a thing as rgb420p10?

no - because RGB is never subsampled

You can use ffmpeg -pix_fmts to list ffmpeg supported pixel formats

Preprocess in ffmpeg: open source video, do TB & PTS fix ups and whatnot.

What are you using ffmpeg for in terms of preprocessing ?

Note there are no TB or PTS issues in vapoursynth from DVD or BD sources (they are CFR encoded only, although the content can be VFR) - when you use a frame accurate source filter. None of the flaky wandering timestamp jitter issues that ffmpeg is prone to.

A buggy timestamp DVD or BD caused by ripping with makemkv using 1ms timebase can be interpreted as CFR, with perfect timestamps, perfect timebase, if you use a frame accurate source filter. What is important in these scenarios is "frame accuracy" - you can assume any framerate - and that will fix any timestamp issues if there were any

It would probably be simpler to do all the preprocessing steps in vapoursynth, since you're using it for the interpolation part anyways. Then continue encoding / muxing with ffmpeg/mkvmerge

I would do everything in VS if I knew how.

I find a faster way to learn is from examples.

State what you're trying to do in basic terms - and there are probably examples posted already.

Well, you see, if the input is yuv420p8 or yuv420p10, does ffmpeg create raw frames that are yuv420? Or yuv444? Or rgb420? Or rgb444?

ffmpeg generally doesn't do anything unless you tell it to. It keeps the input pixel_fmt unless some procedure forces it to convert to another pixel format

If you send yuv420p8 data, it reads yuv420p8 data.

If you add some RGB-only filter, but don't specify the preconversion, it will auto insert a conversion (or sometimes throw an error)

I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le, only, so that all color planes are the same size and all pixels are on byte boundaries. What do you say? Or perhaps there's another way that involves less processing and memory? What do you say?

No - unless your desired output format is 16 bit RGB (unlikely... but maybe you're going to photoshop or after effects for some manual clean up, painting, compositing)
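
To put rough numbers on the memory question, the sketch below computes the size of one raw frame for a few planar ffmpeg pixel formats. The table only covers formats listed here and is an illustration, not a complete description of ffmpeg's format database; `frame_bytes` is a made-up helper. Samples deeper than 8 bits are assumed to occupy 16-bit words.

```python
# pix_fmt -> (bytes per sample, horizontal chroma shift, vertical chroma shift)
FORMATS = {
    "yuv420p":     (1, 1, 1),
    "yuv420p10le": (2, 1, 1),   # 10-bit samples stored in 16-bit words
    "yuv444p":     (1, 0, 0),
    "gbrp16le":    (2, 0, 0),   # planar RGB, 16 bits per sample
}

def frame_bytes(pix_fmt, width, height):
    """Size in bytes of one raw planar frame (three planes, no padding)."""
    bps, hshift, vshift = FORMATS[pix_fmt]
    cw = (width + (1 << hshift) - 1) >> hshift    # subsampled plane width, rounded up
    ch = (height + (1 << vshift) - 1) >> vshift   # subsampled plane height, rounded up
    return bps * (width * height + 2 * cw * ch)

for name in FORMATS:
    print(name, frame_bytes(name, 720, 480))
```

For a 720x480 frame this gives 518400 bytes for yuv420p and 2073600 bytes for 16-bit planar RGB, i.e. four times the data per frame.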

markfilipak

14th October 2021, 18:47

@poisondeathray, you are... a prince! In one reply, you have cleared up so much!
It's whatever the input pix_fmt is, or whatever you transformed it to in the filter_complex.
Thanks, I've often wondered about that.
Some filters have limitations on pixel formats, sometimes ffmpeg will "sneakily" auto inject some conversion.

You can insert a -vf showinfo to see what the pixel format is at that node, or point, in the filter graph.
Ah! I did that long ago and forgot about it. Thanks for reminding me.
Or does the 'p' in 'yuv...p10' NOT mean "packed"?
planar
Thank you! Hmmm... Bad interpretation of previous clues on my part. Or perhaps, some people do think 'p' means 'packed', eh? Thanks for clarification. I'll not forget it.

Do you happen to know why it's called 'planar'? Is that simply to indicate that color is treated as separate (and separable) components or is it more profound like having to do with the structure of macroblocks (e.g. the differences how pixels are stored in frame v. field structure)? -- I'm quite familiar with the various species of macroblocks.
Is there even such a thing as rgb420p10?
no - because RGB is never subsampled
Ding-ding-ding! That's a 10! I get it. I thought RGB could be subsampled. That's going to clear up a lot of issues for me.
You can use ffmpeg -pix_fmts to list ffmpeg supported pixel formats
I did that from the get-go with ffmpeg and saved it as a text file, ''ffmpeg -pix_fmts' .txt', to which I add notes. The main problem is that there are just names (not details/pictures) of the formats. I suppose that what's important is to match processing to 'pix_fmt' but I've not yet fully cracked that nut.
What are you using ffmpeg for in terms of preprocessing ?
Mostly to set TB to 1/FPS and PTS to 'N' to avoid ffmpeg complaints/errors. I was advised that that's not the right thing to do but, you know, I've transcoded many movies and I've yet to see one that's actually VFR or that produces an out-of-order raw frame stream. Despite assertions to the contrary, I'm pretty convinced that ALL DVD & BD content is CFR.
Note there are no TB or PTS issues in vapoursynth from DVD or BD sources (they are CFR encoded only, although the content can be VFR) - when you use a frame accurate source filter. None of the flaky wandering timestamp jitter issues that ffmpeg is prone to.
I am 100% with you there. Well, that's really, really, going to help me retire ffmpeg entirely. Paul Mahol insists that a frame number, 'N', approach is not good and that monotonic PTSs are mandatory (without providing a clue how to achieve monotonic PTSs). I never could accept that frames can come out of the decoder out of order and that PTSs are significant. I have seen clues from way back in time regarding a dispute between the ffmpeg folks and the avisynth folks in regard to the primacy of frame numbers over PTSs. Now I think I understand it and that Paul simply is wrong.
A buggy timestamp DVD or BD caused by ripping with makemkv using 1ms timebase can be interpreted as CFR, with perfect timestamps, perfect timebase, if you use a frame accurate source filter. What is important in these scenarios is "frame accuracy" - you can assume any framerate - and that will fix any timestamp issues if there were any
Okay. First, I don't use makemkv. I use AnyDVD-HD and rip (i.e. back up discs I own) to ISOs that I mount. Second, what you describe is exactly what I've been doing with 'settb=eval=1/24,setPTS=eval=N' and I've never gotten in trouble or had unexpected results. I think you just voided the need for preprocessing: just do the decode in VS, do the MV interpolation in VS, and then pipe the raw to ffmpeg. I LIKE IT! By the way, I set TB to 1/24 to retime movies to the speed and running times seen in theaters, and I use 'atempo' to fix up the audio, but I've not found a way to fix up subtitles and chapters, but I've discovered that it often doesn't matter -- mkvmerge must be doing some sort of fixup.
It would probably be simpler to do all the preprocessing steps in vapoursynth, since you're using it for the interpolation part anyways. Then continue encoding / muxing with ffmpeg/mkvmerge
Yup! Thanks!
I find a faster way to learn is from examples.
Me too. And I think the best examples are complete workflows, not fragments of code. People are pretty smart and are generally able to extrapolate what they need to do in their workflows from a few good workflow examples, even if the objectives differ.
State what you're trying to do in basic terms - and there are probably examples posted already.
I think you have a good vision of what I'm doing already:

For progressive and soft telecine (what I call 23.9fps[24pps]): Force TB & PTSs to 24fps[24pps], MV interpolate to 120fps[120pps], resync audio subs & chaps to the resulting 1/120 TB, encode & mux.

For hard telecine (what I call 23.9fps[2-3[24pps]] for example): Add detelecine.

For so-called NTSC (what I call '29.9fps[59.9sps]'): separatefields, separately MV interpolate each field stream to 120fps, bob the first field stream for 2 frames while delaying the second field stream by 2 frames and finally weave them together to get a perfect run of 100% progressive frames.

Mixed 'NTSC'+hard telecine presents a problem: I've not found a way to flag combed frames, frame-by-frame. I need that flag in order to switch between my 29.9fps[2-3[24pps]]-to-120fps[120pps] method and my 29.9fps[59.9sps]-to-120fps[120pps] method.

The overall objective is a one-click solution to take anything on a DVD or BD disc that's been professionally mastered, probe it to ascertain its properties, and then transcode it to HEVC/MKV. What I don't believe is that professionals make so many mistakes as to make that goal unattainable -- after all, anything a DVD/BD player can play can't be that messed up.
ffmpeg generally doesn't do anything unless you tell it to. It keeps the input pixel_fmt unless some procedure forces it to convert to another pixel format ... If you add some RGB-only filter, but don't specify the preconversion, it will auto insert a conversion (or sometimes throw an error)
I still lurk the ffmpeg-user list. So many problems people have are just the sort of surprises you cite.
I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le ...
No - unless your desired output format is 16 bit RGB (unlikely... but maybe you're going to photoshop or after effects for some manual clean up, painting, compositing)
Thanks so much.

poisondeathray

14th October 2021, 19:42

Do you happen to know why it's called 'planar'? Is that simply to indicate that color is treated as separate (and separable) components or is it more profound like having to do with the structure of macroblocks (e.g. the differences how pixels are stored in frame v. field structure)? -- I'm quite familiar with the various species of macroblocks.

Packed vs. planar has to do with the way uncompressed data is stored and organized. If you look at fourcc.org, it lists some common formats and orientation, with some diagrams
https://www.fourcc.org/yuv.php

YUV formats fall into two distinct groups, the packed formats where Y, U (Cb) and V (Cr) samples are packed together into macropixels which are stored in a single array, and the planar formats where each component is stored as a separate array, the final image being a fusing of the three separate planes.

For example, yuv420p8 is "8bit per pixel component, 4:2:0 subsampling". But the uncompressed format can be stored or arranged in a variety of ways. The "fourcc" code is supposed to identify the arrangement

e.g. "YV12", "NV12" and "IYUV" are all 8bit 4:2:0, but they store/arrange the data differently. "NV12" has the U and V samples interleaved in a single plane, while "IYUV" has reversed chroma plane order compared to "YV12".
https://www.fourcc.org/pixel-format/yuv-yv12/
https://www.fourcc.org/pixel-format/yuv-nv12/
https://www.fourcc.org/pixel-format/yuv-i420/
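
A small numpy sketch of those three arrangements, assuming 8-bit samples and even frame dimensions. I420 is treated here as the same layout as IYUV. The function name `planes_from_420` is made up. All three inputs carry the same picture; only the byte order after the Y plane differs.

```python
import numpy as np

def planes_from_420(buf, width, height, layout):
    """Return (Y, U, V) from an 8-bit 4:2:0 frame stored as I420, YV12 or NV12."""
    cw, ch = width // 2, height // 2          # assumes even width and height
    data = np.frombuffer(buf, dtype=np.uint8)
    y = data[:width * height].reshape(height, width)
    chroma = data[width * height:]
    if layout == "I420":                      # planar: Y, then U, then V
        u, v = chroma[:cw * ch], chroma[cw * ch:]
    elif layout == "YV12":                    # planar: Y, then V, then U
        v, u = chroma[:cw * ch], chroma[cw * ch:]
    elif layout == "NV12":                    # Y plane, then interleaved U,V pairs
        u, v = chroma[0::2], chroma[1::2]
    else:
        raise ValueError(layout)
    return y, u.reshape(ch, cw), v.reshape(ch, cw)

luma = bytes(range(8))                        # 4x2 luma plane
i420 = luma + bytes([50, 51, 60, 61])
yv12 = luma + bytes([60, 61, 50, 51])
nv12 = luma + bytes([50, 60, 51, 61])
for name, buf in (("I420", i420), ("YV12", yv12), ("NV12", nv12)):
    y, u, v = planes_from_420(buf, 4, 2, name)
    print(name, u.tolist(), v.tolist())
```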

I'm pretty convinced that ALL DVD & BD content is CFR.

For DVD/BD - the encoding is CFR, 100% always.

But the content can be VFR. It's very common - What I mean is you can have mixed cadence with field and frame repeats. e.g. 59.94 fields/s interlaced content sections - where there is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections, or other frame rates. For example a "slow motion" section of a 23.976p film might have 14.985p frames as duplicates (for effectively 1/2 speed during that sequence).

I am 100% with you there. Well, that's really, really, going to help me retire ffmpeg entirely. Paul Mahol insists that a frame number, 'N', approach is not good and that monotonic PTSs are mandatory (without providing a clue how to achieve monotonic PTSs). I never could accept that frames can come out of the decoder out of order and that PTSs are significant. I have seen clues from way back in time regarding a dispute between the ffmpeg folks and the avisynth folks in regard to the primacy of frame numbers over PTSs. Now I think I understand it and that Paul simply is wrong.

For general use, he's correct. You have to be able to cover VFR content cases, which FAR outnumber the CFR cases these days. Think cell phone video, tablets etc... - they are all recorded VFR.

The problem with avisynth is that it is natively CFR. VFR is more difficult to output (it's possible with decimation and output of timestamps to keep sync)

Okay. First, I don't use makemkv. I use AnyDVD-HD and rip (i.e. back up discs I own) to ISOs that I mount. Second, what you describe is exactly what I've been doing with 'settb=eval=1/24,setPTS=eval=N' and I've never gotten in trouble or had unexpected results. I think you just voided the need for preprocessing: just do the decode in VS, do the MV interpolation in VS, and then pipe the raw to ffmpeg. I LIKE IT! By the way, I set TB to 1/24 to retime movies to the speed and running times seen in theaters, and I use 'atempo' to fix up the audio, but I've not found a way to fix up subtitles and chapters, but I've discovered that it often doesn't matter -- mkvmerge must be doing some sort of fixup.

It's up to you, more than one way to do things.

I was just suggesting doing most of it in 1 program, because going back and forth adds complexity and overhead (slower)

I think you have a good vision of what I'm doing already:

For progressive and soft telecine (what I call 23.9fps[24pps]): Force TB & PTSs to 24fps[24pps], MV interpolate to 120fps[120pps], resync audio subs & chaps to the resulting 1/120 TB, encode & mux.

For hard telecine (what I call 23.9fps[2-3[24pps]] for example): Add detelecine.

For so-called NTSC (what I call '29.9fps[59.9sps]'): separatefields, separately MV interpolate each field stream to 120fps, bob the first field stream for 2 frames while delaying the second field stream by 2 frames and finally weave them together to get a perfect run of 100% progressive frames.

Those ones are covered and "textbook" cases . Hard and soft can be treated functionally the same.

If you want to "speedup" 24000/1001 to 24/1, you can use
core.std.AssumeFPS(clip, fpsnum=24, fpsden=1). It's the same frame count; only the frame rate changes, so the timestamps are all adjusted.

There are 24000/1001 vs. 24/1 variants on BD but rarely are BD's telecined (almost always native progressive); but "film" DVD will always be 24000/1001
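
The effect of that reinterpretation on running time can be checked with exact arithmetic. This sketch does not call VapourSynth. It only recomputes timestamps for the same frame numbers at the two rates, and `timestamp` is a made-up helper.

```python
from fractions import Fraction

def timestamp(n, fps):
    """Presentation time in seconds of frame n at a constant frame rate."""
    return n / fps                     # int / Fraction gives an exact Fraction

ntsc_film = Fraction(24000, 1001)
cinema = Fraction(24, 1)

n = 24 * 3600                          # frame count of one hour of film
slow = timestamp(n, ntsc_film)         # running time as stored on the disc
fast = timestamp(n, cinema)            # running time after assuming 24/1
print(float(slow), float(fast), float(slow - fast))
```

The frame count is untouched. The hour of film that takes 3603.6 seconds at 24000/1001 takes 3600 seconds at 24/1, which is the 3.6 seconds per hour mentioned elsewhere in the thread.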

The NTSC interlaced content case is slightly problematic because of the even/odd field offset (spatially displaced 1 pixel). If you do it the way you propose, you will get an up/down flutter motion per frame pair. Usually a smart double rate deinterlacer would be used, like QTGMC , to output 59.94p, then interpolate to something else if desired.

Mixed 'NTSC'+hard telecine presents a problem: I've not found a way to flag combed frames, frame-by-frame. I need that flag in order to switch between my 29.9fps[2-3[24pps]]-to-120fps[120pps] method and my 29.9fps[59.9sps]-to-120fps[120pps] method.

Yes, this is "VFR" content.

Combed frames can be flagged (but comb detection is not necessarily 100% accurate - there are different types of "comb" patterns) , but I don't know of a good way to automatically process mixed VFR cadence with interpolation properly, and automatically.
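
As an illustration of why comb detection is only approximate, here is one very simple per-frame metric in numpy. It is not the method used by any particular VapourSynth or ffmpeg filter. The metric, the threshold of 1.0, and the names `comb_ratio`, `looks_combed` and `edge` are all assumptions made for this sketch. It compares differences between neighbouring lines (opposite fields) with differences between lines two apart (same field).

```python
import numpy as np

def comb_ratio(luma):
    """Adjacent-line difference divided by same-field (two lines apart) difference."""
    a = luma.astype(np.float64)                   # avoid uint8 wrap-around
    adjacent = np.abs(a[1:] - a[:-1]).mean()      # neighbouring lines: opposite fields
    same_field = np.abs(a[2:] - a[:-2]).mean()    # two lines apart: same field
    return adjacent / (same_field + 1e-6)

def looks_combed(luma, threshold=1.0):
    """Flag a frame as combed when the ratio exceeds an arbitrary threshold."""
    return comb_ratio(luma) > threshold

def edge(x0, h=8, w=64):
    """Synthetic picture: a vertical edge at column x0."""
    img = np.zeros((h, w), dtype=np.uint8)
    img[:, x0:] = 200
    return img

progressive = edge(32)
woven = edge(32)
woven[1::2] = edge(40)[1::2]      # odd lines taken from a later moment: edge has moved

print(looks_combed(progressive), looks_combed(woven))
```

A progressive picture with fine horizontal detail would also score high on this metric, which is one reason real detectors use more elaborate tests and still make mistakes.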

The overall objective is a one-click solution to take anything on a DVD or BD disc that's been professionally mastered, probe it to ascertain its properties, and then transcode it to HEVC/MKV. What I don't believe is that professionals make so many mistakes as to make that goal unattainable -- after all, anything a DVD/BD player can play can't be that messed up.

Maybe for standard titles, big budget Hollywood movies.

I have a feeling you haven't encountered "problem" DVD's, such as some low budget DVD's, multi format converted sources, some anime DVD's. These have many layers of problems on top of what you describe - and what many posts in the avisynth forum deal with . (There is no one click solution for those situations, they need human eyes and custom script solutions)

markfilipak

17th October 2021, 01:32

... For DVD/BD - the encoding is CFR, 100% always.

But the content can be VFR. It's very common - What I mean is you can have mixed cadence with field and frame repeats. e.g. 59.94 fields/s interlaced content sections - where there is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections, or other frame rates. For example a "slow motion" section of a 23.976p film might have 14.985p frames as duplicates (for effectively 1/2 speed during that sequence). ...
You are so generous with your time and your replies that I almost hesitate to take more of it.

I know you're very knowledgeable and have given video much thought. May I share my views?

That you mix frame rate and picture rate does not surprise me. The MPEG engineers do the same. If there's one small contribution I can make, I would like it to be my nomenclature to separate frame rate and picture rate. Let me give examples and see what you think of it.

'24pps' denotes 24 pictures per second. '23.9fps' denotes 24000 frames per 1001 seconds.
'23.9fps[24pps]' denotes cinema-to-video that runs slow by 1 part in 1000 parts -- running time is +3.6 seconds per hour.
(The brackets essentially mean 'contained'.)

'72fps[24pps]' denotes cinema that's essentially triple-shuttered.

'2-3[24pps]' denotes 2-3 pull-down cinema that yields the equivalent of 30pps.
'29.9fps[2-3[24pps]]' denotes hard telecine of 2-3 pull-down cinema that runs slow by 3.6 seconds per hour.
'3-2[24pps]' '2-2-2-4[24pps]' etc. denote other pull-downs.

'59.9sps' denotes NTSC. '50sps' denotes PAL. '29.9fps[59.9sps]' & '25fps[50sps]' denote interlaced digital-NTSC & -PAL.
'120fps[59.9sps]' denotes interlaced digital-NTSC that's doubled and runs fast by 3.596403.. seconds per hour.

'120fps[120pps]' is cinema that's been 1-to-5 interpolated to 120pps and put in frames on a 1-to-1 basis.
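
The pull-down cadences in this notation can be expanded mechanically. The sketch below is a simplification that ignores field parity (top/bottom): each picture is repeated for the number of fields the cadence gives it, and consecutive fields are paired into frames. The function name `pulldown` is made up.

```python
def pulldown(pictures, cadence):
    """Expand pictures into fields per the cadence, then pair fields into frames."""
    fields = []
    for i, pic in enumerate(pictures):
        fields.extend([pic] * cadence[i % len(cadence)])
    if len(fields) % 2:
        raise ValueError("cadence leaves an unpaired field")
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]

frames = pulldown("ABCD", [2, 3])                 # '2-3[24pps]'
combed = [f for f in frames if f[0] != f[1]]      # frames that mix two pictures
print(frames)
print(len(combed), "of", len(frames), "frames mix two pictures")
```

With the 2-3 cadence, four pictures become five frames (24 pictures per second fill 30 frames per second) and two of every five frames mix fields from two different pictures. Calling `pulldown("ABCD", [2, 2, 2, 4])` instead gives five clean frames with the last picture shown twice.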

Using this notation, I haven't encountered any video situation that can't be compactly characterized with one exception: the DVD of the movie "PASSION FISH". That DVD feature has sequences of 2 combed frames alternating with an odd number of progressive frames. The number of progressive frames is between 5 and 71, is always an odd number, and the alternation has no discernible repetition pattern.

You cite an example: "59.94 fields/s interlaced content sections - where there is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections".
If they are characterized by '29.9fps[59.9sps]' '29.9fps[29.9pps]' '29.9fps[24pps]' and '29.9fps[14.9pps]', then they are all contained in 29.9fps frames and are therefore a CFR stream.
Is that what you intended?

PS: In the same way that a TS contains PSs (i.e. frames), a notation that has frames containing pictures is, I think, a useful and consistent extension.

poisondeathray

17th October 2021, 02:46

That you mix frame rate and picture rate does not surprise me. The MPEG engineers do the same.

Not a fan of the notation, but that's just me. Maybe someone will like it.

I'm just distinguishing between the content frame rate vs. encoded frame rate or field rate. The frame rate is 29.97 , and the field rate is 59.94 for all interlaced encoded NTSC DVD's for the encoded stream. But that is not necessarily reflective of what the content frame rate truly is.

You cite an example: "59.94 fields/s interlaced content sections - where there is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections".
If they are characterized by '29.9fps[59.9sps]' '29.9fps[29.9pps]' '29.9fps[24pps]' and '29.9fps[14.9pps]', then they are all contained in 29.9fps frames and are therefore a CFR stream.
Is that what you intended?

For that example, it was supposed to be an NTSC DVD. For an interlaced encoded DVD it's all something in 59.94 fields/s - because that's the field rate that everything is contained in. So yes, the 59.94 fields/second is the CFR stream. And the point was the content can be variable in that CFR stream (the content frame rate is changing if the duplicates were removed). In your notation it would be 59.9sps[something pps], except for the interlaced content which would be 29.97fps[59.9sps], I think

What people have been using for years are descriptions like "29.97i" for interlaced content, "29.97p in 29.97i" , "23.976p in 29.97i" , "14.985p in 29.97i" most people would abbreviate as "15p in 29.97i". pN for progressive native, such as 23.976pN , 24pN, 29.97pN .

How would you distinguish between a video 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)? - or is that what sps is for ? - you never fully explained what the sps letters stand for - so would it be "29.9fps[14.9pps]" vs "59.9sps[14.9pps]" ? And 14.985p in 59.94p would be "59.94fps[14.9pps]" ?

markfilipak

17th October 2021, 05:18

... What people have been using for years are descriptions like "29.97i" for interlaced content, "29.97p in 29.97i" , "23.976p in 29.97i" , "14.985p in 29.97i" most people would abbreviate as "15p in 29.97i". pN for progressive native, such as 23.976pN , 24pN, 29.97pN .
It seems to me that what I'm calling "pictures" (hence, pps) you are calling "content". Does that help clarify? The MPEG specs call them "pictures".
How would you distinguish between a video 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)?
14.985p in 29.97p (encoded progressively) == 29.970fps[14.985pps]. That is 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate (sped up).

"14.985p in 29.97i (encoded interlaced as fields)?" Well, if I understand correctly, that would be 14.985 pictures per second, deinterlaced (so, 29.970sps), then interlaced and encoded at 29.970 frames per second -- that, though it doesn't make much sense to me. So I guess that'd be '29.970fps[1-1[14.985pps]]'. My problem is with the meaning of "14.985p in 29.97i".

I'm probably misunderstanding because, you see, I have a problem understanding what most folks mean by the word "interlace". For example, you wrote "encoded interlaced as fields". That, to me, is a contradictory statement -- an encoding is either frame-based (interlaced) or field-based (not interlaced). Since field encoding is not interlaced, "encoded interlaced as fields" confuses me.

By "interlaced", most folks mean 2 temporal scans woven together to form (combed) pictures but in the macroblocks, the picture data is field-based, not interlaced -- the interlace occurs in the decoder, not in the stream and the metadata are instructions to the decoder, not a statement regarding what the macroblock format is. So, what most people call "interlaced" video is actually not interlaced. The MPEG engineers solve this problem by calling such streams "interleaved", not "interlaced" -- in fact, when you read the various MPEG specs, you won't find the word "interlace" anywhere. Thus, an interlaced video means that the decoder is required to interlace the fields (i.e. the scans), not that the stream consists of interlaced data. In other words, an interlaced video is actually non-interlaced. Confusing isn't it?

The key to my understanding is that metadata: 'progressive_sequence' 'picture_structure' 'top_field_first' 'repeat_first_field' 'progressive_frame', are all instructions to decoders. However, most folks interpret them as stream data states, instead. So, I've learned that when someone refers to "interlaced video" they mean a video that needs to be interlaced.

I've likewise learned that when most folks refer to "deinterlace" (as, for example, a deinterlace filter), they mean a process that deinterlaces as part of the process, not as an input or an output format.

I even have a problem with the word "filter" because many of the processes that are called filters do absolutely no filtering (i.e. no separating, no sorting, no routing -- even a process that weaves is called a "filter"). They are not filters but, instead, are processes. But I've learned that the word "filter" in video can mean almost anything, usually denoting a position in a processing pipeline (such as 'filter_complex'), not what I learned in electrical engineering courses.

I tried to discuss such stuff in the ffmpeg-user mailing list and was attacked as not knowing anything and stupid.

- or is that what sps is for ? - you never fully explained what the sps letters stand for
Oh, sorry, "scans per second". In other words, non-interlaced fields (scans) -- what MPEG calls "half-pictures" (which is not really correct for scans, eh? But then there's a lot of stuff in MPEG that's not quite correct, eh?).

I use "scans" instead of "fields" because "scan" can be abbreviated as "s" whereas "field" presents an obvious problem. Also, I think the word "scan" is more descriptive of the camera (television) used to make pictures.
- so would it be "29.9fps[14.9pps]" vs "59.9sps[14.9pps]" ?
If the original video was recorded at 14.985pps and then put into 29.970fps frames, that would simply be '29.970fps[14.985pps]', meaning: 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate.
And 14.985p in 59.94p would be "59.94fps[14.9pps]" ?
Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.

PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?

PPS: You're (naturally) asking me about a lot of things I haven't considered in my workflows. For example, to be consistent,
59.9fps[14.9pps] would be 14.9pps framed at 59.9fps, i.e. sped up by 4x.
59.9fps[8[14.9pps]] would be 14.9pps with 8 fields per picture contained in 4 frames, i.e. 14.9pps simply quadruple-shuttered -- '8' specifies a type of telecine, in the same way that '2-3' is telecine except that '8' doesn't change the cadence, it just uses the 8 fields (1 field-pair are original + 3 field-pairs are copies) to fill out the 4 frames. so, no speed up but just 4x shutter.

_Al_

17th October 2021, 06:03

markfilipak

17th October 2021, 06:38

@poisondeathray

I tried to include (attach?) a '.jpg' showing 29.9fps[2-1-2-5[24pps]] from an "ALL ABOUT EVE" BD bonus feature (i.e. 00390.m2ts, frames 1018-1029). It may be approved and show up in a future post to this thread. Also, I have a '.jpg' showing 29.9fps[1-4-4-4..[24pps]+2-3[24pps]] from the title screens of "28 DAYS" (i.e. VTS_02_1.VOB). It's a mix of a badly edited background (the 1-4-4-4..[24pps] part) with overlaid text (the 2-3[24pps], i.e. normally telecined, part that apparently was added later). If the "ALL ABOUT EVE" post shows up here, I'll post the "28 DAYS" jpeg. Both of them are terribly hard to describe but using the notation, they are precisely and unambiguously characterized. They provide excellent examples of the power of the notational system.

Selur

17th October 2021, 06:41

side note: might be easier/faster to upload the images to something like imgbb.com

markfilipak

17th October 2021, 06:48

Oh man, just hop on the wagon and use terminology that is used in the neighborhood and there is a reason why. In DVD, broadcast , if interlaced, it is all about fields. Everyone knows what is delivery fps and what is actual fps that is most of the time "dug out" from there. Not sure why would you suddenly decided to attack a terminology. That leads absolutely nowhere.
Hi Al.

Good to hear from you. You know that, if you don't like what I write -- totally understandable -- you're free to ignore it, eh?
I'd be worried about that 24fps to 120fps. It is just insane. You are going to create tons of artifacts that does not belong to video, and you will be storing it.
I've been motion vector interpolating 24fps[24pps] to 120fps[120pps] for some time now and have gotten amazing results and, with such short motion vectors that compress really well, file sizes between 1/6th and 1/8th the sizes of the originals despite using placebo settings.... Truly stunning outputs.

markfilipak

17th October 2021, 06:54

side note: might be easier/faster to upload the images to something like imgbb.com
Thanks for the tip, Selur,

I can be patient. Also, I like to have the jpegs in the thread. That's more convenient, and it makes the posting look so cool. :)
...Like I know what I'm 'talking' about.

poisondeathray

17th October 2021, 16:27

14.985p in 29.97p (encoded progressively) == 29.970fps[14.985pps]. That is 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate (sped up).

Yes, to be clear , this is 14.985p content with duplicates , encoded progressively. 2x the number of frames, but 2x the speed

If you removed the duplicates you have the original, and timestamps would show a delta of ~66.733ms for that section.
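
That delta can be reproduced with exact fractions. The sketch below stands in for a real decimation step: frames are plain labels, a frame is dropped when it equals its predecessor, and the surviving frames keep their original timestamps. The name `decimate_duplicates` is hypothetical.

```python
from fractions import Fraction

def decimate_duplicates(frames, fps):
    """Drop frames identical to their predecessor; keep the original timestamps."""
    kept = []
    for n, frame in enumerate(frames):
        if not kept or frame != kept[-1][1]:
            kept.append((n / fps, frame))      # (timestamp in seconds, frame)
    return kept

fps = Fraction(30000, 1001)
section = ["A", "A", "B", "B", "C", "C"]        # 14.985p content, each picture doubled
kept = decimate_duplicates(section, fps)
deltas_ms = [float((t1 - t0) * 1000) for (t0, _), (t1, _) in zip(kept, kept[1:])]
print(deltas_ms)
```

Each surviving frame is 1001/15000 of a second after the previous one, which is the roughly 66.733 ms quoted above.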

Different sections might have different "pps" - hence the usefulness of "VFR" and timestamps that Paul alluded to (? or Elon... WTF :)). Timestamp VFR means you can have unique frames and content runs at the original rate (or each frame is displayed for the proper amount of time). You don't need wasteful frame or field repeats - that archaic system was imposed on us to comply with NTSC, broadcast engineers

"14.985p in 29.97i (encoded interlaced as fields)?" Well, if I understand correctly, that would be 14.985 pictures per second, deinterlaced (so, 29.970sps), then interlaced and encoded at 29.970 frames per second -- that, though it doesn't make much sense to me. So I guess that'd be '29.970fps[1-1[14.985pps]]'. My problem is with the meaning of "14.985p in 29.97i".

"deinterlacing" means different things to different people. I wouldn't call that situation "deinterlaced".

This is 14.985p original content encoded for NTSC DVD. Hard telecined if you will, so there would be frame duplicate pairs if you were examining frames. Nothing else is done to the content. The encoding type is interlaced (in MPEG2 alternate scan would be used) instead of progressive (in MPEG2 zig zag scan, but progressive would be soft telecined with repeat field flags for DVD compatibility). The metadata "flagging" will be different: one will be interlaced, the other progressive. The flagging and metadata have potential implications for other programs and how the stream is handled. (And there is MBAFF too, but we will avoid that for the moment)

I'm probably misunderstanding because, you see, I have a problem understanding what most folks mean by the word "interlace". For example, you wrote "encoded interlaced as fields". That, to me, is a contradictory statement -- an encoding is either frame-based (interlaced) or field-based (not interlaced). Since field encoding is not interlaced, "encoded interlaced as fields" confuses me.

By "interlaced", most folks mean 2 temporal scans woven together to form (combed) pictures but in the macroblocks, the picture data is field-based, not interlaced -- the interlace occurs in the decoder, not in the stream and the metadata are instructions to the decoder, not a statement regarding what the macroblock format is. So, what most people call "interlaced" video is actually not interlaced. The MPEG engineers solve this problem by calling such streams "interleaved", not "interlaced" -- in fact, when you read the various MPEG specs, you won't find the word "interlace" anywhere. Thus, an interlaced video means that the decoder is required to interlace the fields (i.e. the scans), not that the stream consists of interlaced data. In other words, an interlaced video is actually non-interlaced. Confusing isn't it?

The key to my understanding is that metadata: 'progressive_sequence' 'picture_structure' 'top_field_first' 'repeat_first_field' 'progressive_frame', are all instructions to decoders. However, most folks interpret them as stream data states, instead. So, I've learned that when someone refers to "interlaced video" they mean a video that needs to be interlaced.

I've likewise learned that when most folks refer to "deinterlace" (as, for example, a deinterlace filter), they mean a process that deinterlaces as part of the process, not as an input or an output format.

I even have a problem with the word "filter" because many of the processes that are called filters do absolutely no filtering (i.e. no separating, no sorting, no routing -- even a process that weaves is called a "filter"). They are not filters but, instead, are processes. But I've learned that the word "filter" in video can mean almost anything, usually denoting a position in a processing pipeline (such as 'filter_complex'), not what I learned in electrical engineering courses.

I tried to discuss such stuff in the ffmpeg-user mailing list and was attacked as not knowing anything and stupid.

Yes, valid points.

You're correct - it is field encoding, frame encoding, or a mixed field or frame macroblocks (MBAFF).

"interlace", "deinterlace", and "filter" can mean different things to different people . Everyone might be on a different page.

People and official organizations make up and change terms too. For example "PAR" pixel aspect ratio in MPEG2 is now called "SAR" or sample aspect ratio in MPEG4 terminology. It's all there in the ITU specs. There is no PAR anymore in modern formats. Or "29.97i" and "25i" is the original official notation that many organizations like broadcasters, EBU use, but Sony, Adobe, a bunch of other companies now call it "59.94i" and "50i." Maybe their marketing team thought a higher number would sell more cameras

You'll never get everyone to agree on nomenclature, just do the best you can to describe whatever it is.

And 14.985p in 59.94p would be "59.94fps[14.9pps]" ?

Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.

Yes, 4x repeats.

For example 720p59.94 broadcast channels. The encoded stream is 59.94p. The content frame rate (pps if you want) in a given section might be 14.985fps consisting of 4x frame repeats.

If you decimated the duplicates, you would have the original 14.985fps

PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?

That was referring to a 29.97i stream, with 14.985 fps content.

PPS: You're (naturally) asking me about a lot of things I haven't considered in my workflows. For example, to be consistent,
59.9fps[14.9pps] would be 14.9pps framed at 59.9fps, i.e. sped up by 4x.

I like "framed at" description, it's similar to the "14.985p in 59.94p" description (4x frame repeats)

I don't like term "sped up", because that implies a speed change (it is a speed change, but with duplicates - someone might interpret that differently, it's a point that might cause confusion)

14.985p native content (unique frames) appears the same as 14.985p in 59.94p (4x frame repeats), on a 59.94Hz display. The former has 4x fewer frames and is more efficient in terms of encoding

markfilipak

18th October 2021, 00:40

And 14.985p in 59.94p would be "59.94fps[14.9pps]" ?

Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.

Yes, 4x repeats.

No, not 4x repeats; 4x picture rate.
Bear with me, this is at the core of our miscommunication regarding the notation. Trust me, it's simpler than you think.
14.985pps framed by the camera would be 14.985fps[14.985pps].
14.985pps with 4x repeats would be 59.940[8[14.985pps]] -- pictures have 2-to-8 field telecine, so would run at 1x but are actually 4x shuttered.
14.985pps with 4x speed up would be 59.940[14.985pps] -- pictures would run fast by 4x (running time would be 1/4).
For example 720p59.94 broadcast channels. The encoded stream is 59.94p. The content frame rate (pps if you want) in a given section might be 14.985fps consisting of 4x frame repeats.

If you decimated the duplicates, you would have the original 14.985fps

Okay, that's definitely 59.940[8[14.985pps]] -- 14.985pps with 2-to-8 field telecine.

Give me more use cases and I'll give you more examples of the notation.

To cite one of the most common use cases:
29.970[2-3[24pps]] is cinema (24pps) that's 2-3 telecined (8-to-10 field telecine, so 30 telecined pictures per second) inside 29.970fps (telecined picture rate = frame rate, so no speed up). Well, actually 2-3[24pps] runs slow by 29.970/30 (a 'feature' that the notation exposes unambiguously).

If I write this conversion:
23.976fps[24pps] --> 29.970fps[2-3[24pps]]
isn't that an easier, more compact, and more precise way to characterize 2-3 telecine that runs x/1.001 slow than by using words to explain it?
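
One way to check the speed that such a notation implies is to compute it. The sketch below is only one reading of the bracket notation, not a definition from this thread: `speed_factor` and its `cadence` argument are hypothetical names, and the cadence is assumed to list how many fields each successive picture occupies.

```python
from fractions import Fraction

def speed_factor(frame_rate, picture_rate, cadence=(2,)):
    """Playback speed relative to the original picture rate.

    cadence: fields occupied by each successive picture.
    (2,) is one picture per frame, (2, 3) is 2-3 telecine,
    (8,) is the 2-to-8 field repeat case.
    """
    fields_per_picture = Fraction(sum(cadence), len(cadence))
    frames_per_picture = fields_per_picture / 2       # 2 fields per frame
    telecined_rate = picture_rate * frames_per_picture
    return frame_rate / telecined_rate

ntsc = Fraction(30000, 1001)

print(speed_factor(Fraction(24000, 1001), Fraction(24)))  # 23.976fps[24pps]
print(speed_factor(ntsc, Fraction(24), (2, 3)))           # 29.970fps[2-3[24pps]]
print(speed_factor(2 * ntsc, ntsc / 2, (8,)))             # 59.940fps[8[14.985pps]]
print(speed_factor(2 * ntsc, ntsc / 2))                   # 59.940fps[14.985pps]
```

Under this reading both sides of the conversion above come out at 1000/1001 of normal speed, the 2-to-8 case runs at 1x, and the un-telecined 59.940fps[14.985pps] case runs 4x fast.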

Another example, this time interpolating to a higher picture rate:
24pps --> 120pps is 1-to-5 picture interpolation.
24fps[24pps] --> 120fps[120pps] is the same interpolation, but contained in frames.
24fps[120pps] would be x/5 slow motion.

You see, when you write "24p", I can't tell whether you mean 24fps (a frame rate) or 24pps (a picture rate).
But if I write "24fps", you know that's a frame rate. If I write "24pps", you know that's a picture rate. If I write "24fps[24pps]", you know that's 24 pictures per second in frames running at 24 frames per second, and you can conclude that the pictures are shown at normal rate. See? Simple, eh? Restricting communication to just "p" and "i" for everything leads to frame v. picture confusion that breaks understanding.

Let me state it another way so you understand. When you write "24p", you know what you mean (by its context), but I don't know what you mean because I don't yet understand the context -- the context is the thing you're trying to explain, otherwise, no one would misunderstand anything, eh? Using terms that rely on context when trying to explain the context (the use case) leads to frustration for both of us.

PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?

That was referring to a 29.97i stream, with 14.985 fps content.

Okay, let me stick with that for a bit... (I think it may be fruitful.)
By "a 29.97i stream", I think you mean 29.970 frames per second, right? Or do you actually mean 29.970 fields per second? ...At this point, I really don't know.
1 - If you mean 29.970 frames per second, then that's 29.970fps[??????pps].
2 - But if you mean 29.970 fields (i.e. scans) per second, then that's ??????fps[29.970sps].
And by "14.985 fps content", I think you mean 14.985 frames per second, right? Or do you actually mean "content" (meaning: pictures/scans)? ...At this point, I really don't know.
A - If you really do mean 14.985 fps, then that's simply 14.985[??????pps].
B - But if you really do mean content, then that's either ??????fps[14.985pps] or ??????fps[14.985sps].
And to be clear, by "picture" I mean 720x480 for example, and by "scan" I mean 720x240 for example.
(Now, technically, if the original frame's picture is progressive and has been separated into fields (deinterlaced if you wish), then the result is actually half-pictures, not scans, but I'm going to ignore that detail for the time being because what you wrote includes the letter "i" which leads me to believe you're referring to scans. Okay?)
So when attempting to understand what you wrote: "29.97i stream, with 14.985 fps content", I'm presented with 4 possibilities:
1A - The union of 29.970fps[??????pps] and 14.985fps[??????pps], or
1B - The union of 29.970fps[??????pps] and ??????fps[14.985sps], or
2A - The union of ??????fps[29.970sps] and ??????fps[14.985sps], or
2B - The union of ??????fps[29.970sps] and ??????fps[14.985sps].
Well, I can immediately toss out 2A and 2B because they are the same, and they both cite 'sps' with no 'fps' and those 'sps's conflict.
And, I can toss out 1A because a video can't be both 29.970fps and 14.985fps at the same time.
That leaves me with this:
1B - The union of 29.970fps[??????pps] and ??????fps[14.985sps].
At this point I think you refer to a video made by a field-scan camera (maybe a TV camera or camcorder) producing digital-NTSC frames. Am I right?
In that case, then the notation is 29.970fps[14.985sps].
If that is the case, and since the normal frame rate for 14.985sps would be 7.4925fps, then I'd say that you're referring to an interlaced video taken at 14.985sps that is sped up by 4x.

How'd I do?

poisondeathray

18th October 2021, 02:19

No, not 4x repeats; 4x picture rate.
Bear with me, this is at the core of our miscommunication regarding the notation. Trust me, it's simpler than you think.
14.985pps framed by the camera would be 14.985fps[14.985pps].
14.985pps with 4x repeats would be 59.940[8[14.985pps]] -- pictures have 2-to-8 field telecine, so would run at 1x but are actually 4x shuttered.
14.985pps with 4x speed up would be 59.940[14.985pps] -- pictures would run fast by 4x (running time would be 1/4).

Ok I mostly get it now.

Okay, that's definitely 59.940[8[14.985pps]] -- 14.985pps with 2-to-8 field telecine.

But it's encoded progressively as frames, broadcast as frames - so "field telecine" might be inappropriate way to describe it

There is definitely merit to the notation - but trust me - you're going to confuse many people.

If I were you, I would probably write a guide or "FAQS" with common examples - so people can understand what you mean.

The rest of the world is probably fine with the "15p in 60p", or "23.976p in 29.97i" style notation :D It's easier and it already works. People don't like change. Like PAR to SAR, 29.97i to 59.94i. Video people on forums generally know what it means, because they deal with processing of various video

.

If I write this conversion:
23.976fps[24pps] --> 29.970fps[2-3[24pps]]
isn't that an easier, more compact, and more precise way to characterize 2-3 telecine that runs x/1.001 slow than by using words to explain it?

Yes it is . But you're also going to lose some readers and confuse them with that notation, at least initially. And for the "video" people , they already know what "film NTSC telecined" is and how you got from A to B. For new people - you're going to lose them too, or you're going to have to explain the process in words in addition to the notation anyways...

You see, when you write "24p", I can't tell whether you mean 24fps (a frame rate) or 24pps (a picture rate).
But if I write "24fps", you know that's a frame rate. If I write "24pps", you know that's a picture rate. If I write "24fps[24pps]", you know that's 24 pictures per second in frames running at 24 frames per second, and you can conclude that the pictures are shown at normal rate. See? Simple, eh? Restricting communication to just "p" and "i" for everything leads to frame v. picture confusion that breaks understanding.

Personally, I don't write it as "24p" . I usually write as 24.0p or 24/1 (mainly to distinguish it from 23.976p - people often use "24p" when they mean 23.976p). "24p" can only mean 1 thing in common usage (it indicates both frame rate and picture rate). But I can see how that notation can be useful in some situations - eg. if you're speeding up or slowing down . But you can add "sped up to x" or "slowed down to x". That's how people communicate , it's clear and it works

Let me state it another way so you understand. When you write "24p", you know what you mean (by its context), but I don't know what you mean because I don't yet understand the context -- the context is the thing you're trying to explain, otherwise, no one would misunderstand anything, eh? Using terms that rely on context when trying to explain the context (the use case) leads to frustration for both of us.

Yes, but most video people "get it" and understand what is being said based on experience and context of the topic. If you browse video forums, there is a way of communicating, and people (I mean the experienced regulars, not necessarily new members) understand it. Different forums have slightly different sub-cultures and slightly different ways of communicating, but for the most part it works, and people know what is being said. Part of your misunderstanding might be due to not dealing much with video or posting on forums.

But there is nothing wrong with explaining in different ways, or different words - I welcome it (but others might not as you've seen on some forums)

By "a 29.97i stream", I think you mean 29.970 frames per second, right? Or do you actually mean 29.970 fields per second? ...At this point, I really don't know.

You might not know , but people that work with video definitely know...

"in 29.97i" means the stream is encoded as fields at 59.94 fields/second.
"in 29.97p" means the stream was encoded as frames at 29.97 frames/s
"in 59.94p" means the stream was encoded as frames at 59.94 frames/s

"i" means field encoding, "p" means frame encoding.

14.985p in 29.97i can only mean 1 thing to people here (and most that work with video)
14.985p in 59.94p can only mean 1 thing to people here (and most that work with video)

So when attempting to understand what you wrote: "29.97i stream, with 14.985 fps content", I'm presented with 4 possibilities:

That pulled out of context alone might be ambiguous, but if you follow the conversation, you're partially quoting what was originally written - it was originally stated as "14.985p in 29.97i"

How would you distinguish between a video 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)? - or is that what sps is for ? - you never fully explained what the sps letters stand for - so would it be "29.9fps[14.9pps]" vs "59.9sps[14.9pps]" ? And 14.985p in 59.94p would be "59.94fps[14.9pps]" ?

markfilipak

20th October 2021, 03:34

Ok I mostly get it now.
The rest of the world is probably fine with the "15p in 60p", or "23.976p in 29.97i" style notation :D It's easier and it already works. People don't like change. Like PAR to SAR, 29.97i to 59.94i. Video people on forums generally know what it means, because they deal with processing of various video
Well, dear friend, that would be fine if everyone followed your example (and it would be fine with me), but they don't ...because it's not been formalized. I propose formalizing it, and where else but in Doom9, eh?
PAR, "picture aspect ratio" -- The MPEG folks never use the acronym "PAR", but they do call the macroblock data "picture".
SAR, "sample aspect ratio" -- That's what the MPEG folks call it.
Now, bear with me...
In the case of PAR & SAR, it doesn't really matter.
DAR (display AR) = PAR (picture AR) x SAR (sample AR), or
DAR (display AR) = PAR (pixel AR) x SAR (storage AR).
The equation is the same in either case. But I've seen this:
DAR (data AR), and that is wrong, but the people who write it defend it.

That's not really the same type of issue as with fps vs. pps/sps, is it?
29.97i might be interpreted as 29.97fps with scan interlaced fields, or
59.94i might be interpreted as 59.94 scans per second in 29.97fps frames (but not necessarily: they could be in 119.8fps frames or in 10fps frames).
Big difference there.
29.97fps[59.94sps] is unambiguous. It specifies both frame rate and original scan rate, and does so in a way that can't be misinterpreted, eh?

poisondeathray

20th October 2021, 04:35

Well, dear friend, that would be fine if everyone followed your example (and it would be fine with me), but they don't ...because it's not been formalized. I propose formalizing it, and where else but in Doom9, eh?
PAR, "picture aspect ratio" -- The MPEG folks never use the acronym "PAR", but they do call the macroblock data "picture".
SAR, "sample aspect ratio" -- That's what the MPEG folks call it.
Now, bear with me...
In the case of PAR & SAR, it doesn't really matter.
DAR (display AR) = PAR (picture AR) x SAR (sample AR), or
DAR (display AR) = PAR (pixel AR) x SAR (storage AR).
The equation is the same in either case. But I've seen this:
DAR (data AR), and that is wrong, but the people who write it defend it.

I've also seen FAR, as frame aspect ratio (dimensions of frame w:h)

DAR = FAR x SAR
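
As a worked example of that formula, here is a small sketch. The 720x480 frame size and the two SAR values (32:27 and 8:9) are assumptions chosen for illustration, not figures from this thread, and `display_aspect_ratio` is a hypothetical helper.

```python
from fractions import Fraction

def display_aspect_ratio(width, height, sar):
    """DAR = FAR x SAR, where FAR is simply width:height of the stored frame."""
    far = Fraction(width, height)
    return far * sar

# Assumed sample aspect ratios for a 720x480 frame.
print(display_aspect_ratio(720, 480, Fraction(32, 27)))  # 16/9
print(display_aspect_ratio(720, 480, Fraction(8, 9)))    # 4/3
```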

That's not really the same type of issue as with fps vs. pps/sps, is it?
29.97i might be interpreted as 29.97fps with scan interlaced fields, or
59.94i might be interpreted as 59.94 scans per second in 29.97fps frames (but not necessarily: they could be in 119.8fps frames or in 10fps frames).
Big difference there.
29.97fps[59.94sps] is unambiguous. It specifies both frame rate and original scan rate, and does so in a way that can't be misinterpreted, eh?

Yes, it is a better notation style, no argument here

The problem is there are 2 people in the world that know what it means. It's a bit confusing at first. I know now , so I can help translate what you say, or help translate what other people say to you

In your mind it's unambiguous , but you're assuming someone took time to learn it. That notation might be interpreted as 29.97fps content in 59.94 fields/s, or does that mean actual interlaced content 59.94 fields/s content, with a framerate of 29.97 ? It's quite easy to misinterpret the first time you see it

The more simple your "readme" or "faqs" are with examples - the higher the chance that someone might take time to figure out what you're trying to say. It might end up like "betamax" vs. "vhs", where betamax was technically superior but wasn't as popular :)

People here (or other video related forums) tend to use the "content" description as your "pps" , or just describe it in words.

markfilipak

20th October 2021, 04:51

Yes, it is a better notation style, no argument here
Well, thank you poisondeathray. That means a lot to me.

The problem is there are 2 people in the world that knows what it means. It's a bit confusing at first. I know now , so I can help translate what you say, or help translate what other people say to you
You are a helper and a giver. That's plain to see. You make my world go round.

Actually, I've seen other folks using the notation in the last few weeks. If it's going to catch on, better that it catch on slowly.

We both prefer showing by examples over pedantic explanations. Give me some simple and some difficult use cases in words and I'll try to 'translate' them, eh? That will help me. And other people who read this will catch on. People are pretty smart, especially people here.

PS: With the possible exception of implementing finite state machines in software, anything that requires much explanation isn't worth a sh*t.

markfilipak

20th October 2021, 05:07

I'm thinking of making one more extension: "hps" -- half-pictures per second.
"fps" -- frames/sec
"pps" -- pictures/sec
"sps" -- scan-fields/sec
"hps" -- half-pictures/sec, e.g. 24fps[24hps], to indicate temporally aligned fields pulled out of a picture -- the MPEG folks use this term in one place that I've seen. In most places in the specs, they call framed fields "pictures" but in one section they call them "half pictures".

poisondeathray

20th October 2021, 16:54

"hps" -- half-pictures/sec, e.g. 24fps[24hps], to indicate temporally aligned fields pulled out of a picture -- the MPEG folks use this term in one place that I've seen. In most places in the specs, they call framed fields "pictures" but in one section they call them "half pictures".

Did you mean PsF ?

Actually, I've seen other folks using the notation in the last few weeks. If it's going to catch on, better that it catch on slowly.

Give me some simple and some difficult use cases in words and I'll try to 'translate' them, eh? That will help me. And other people who read this will catch on.

You're collecting "disciples." Look out Betamax ! :devil:
I doubt anyone is going to use it here, but you might get some new recruits :)

But I'll try to help you improve the notation and describe some examples. If you want further explanation or video samples (in most video forums, a video sample is worth 10,000 words) let me know

Some common NTSC cases have been mentioned here already, but they should be put together in one handy reference - your homework project :) . Hard and soft telecine with different content rates . eg. super8 or 8mm content at 16 or 18fps , telecined for DVD. Animation DVD with different content rates, eg. a "12p" (11.988p) content pan, interlaced content credits, interlaced content fades. Live action and CG e.g. Star Trek derivatives - main sections have 23.976p content, but CG scenes commonly were 29.97p content. ie. The "'x' content in 29.97i" cases

The majority of 25p PAL content for DVD or broadcast will use 2:2 pulldown and "encoded interlaced" . (I know you don't like that "encoded interlaced" description, but it's what people use, and MPEG2 encoders use in their GUI's - there is a tickbox "progressive" or "interlaced" for the encoding type for every one of them; it's never labelled as "frame" or "field", even though it probably should be. Just think of "interlaced" as "fields" ) . ie. The "'x' content in 25i" cases

PAL<=> NTSC conversions that preserve framecount, but speedup or slowdown (duration changed) e.g PAL that are sped up from NTSC sources, or film sources, (or vice-versa NTSC slowdown from PAL) , +/- pitch shift .

PAL<=> NTSC conversions that preserve the duration (instead of slowdown/speedup) . This style is typically used for audio focused content because audio is preserved - such as concert DVD's. e.g. if you started with 25p content , it's frame duplicated to 60000/1001, then "interlaced" and encoded as fields (If you look at separated fields, the first and last field in a group of 4 fields are selected, then weaved). For this example, in practice (e.g. a studio) , 25p content is placed on an interlaced timeline (e.g. 29.97i) in a NLE which is set to frame duplicating for the interpolation. (I know, I know... "what is interlaced timeline", just think "fields"). You could call this "25p in 29.97i" using duplication. "Combing" is visible when viewed as frames, and it can be "reversed" with IVTC (field matching and decimation)
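
The duplicate-then-interlace procedure can be simulated with letters instead of pixels. This is a sketch under stated assumptions: top field first, and the "frame duplicating" step picks the source frame with a plain floor() mapping (a real NLE may round differently). The function names are hypothetical.

```python
from fractions import Fraction

SRC_RATE = Fraction(25)            # 25p content
DUP_RATE = Fraction(60000, 1001)   # rate after frame duplication

def duplicated_source_index(k):
    """Source frame shown in duplicated frame k (assumed floor mapping)."""
    return (k * SRC_RATE) // DUP_RATE

def interlace(num_output_frames):
    """Weave field 0 and field 3 of every group of 4 separated fields.

    Field 4j is the top field of duplicated frame 2j, and field 4j+3 is the
    bottom field of duplicated frame 2j+1. Uppercase = top, lowercase = bottom.
    Labels run A..Z, so keep num_output_frames small.
    """
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    frames = []
    for j in range(num_output_frames):
        top = labels[duplicated_source_index(2 * j)]
        bottom = labels[duplicated_source_index(2 * j + 1)].lower()
        frames.append(top + bottom)
    return frames

woven = interlace(6)
combed = [frame[0].lower() != frame[1] for frame in woven]
print(woven)
print(combed)
```

With these assumptions, five source frames fill six woven frames, and two of those six mix fields from different source frames, which is where the visible "combing" comes from.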

"Field blending" is a semi common conversion style. It's the same procedure as above, but instead of full frame duplicates like above, the interpolation uses frame blending. e.g. for the 25p content conversion to NTSC example : 25p content is frame blended to 60000/1001, then "interlaced" and encoded as fields as above. Or in "studio" practice - 25p content is placed on an interlaced 29.97i NLE timeline set to blends instead of duplicates. You could argue it's slightly more smooth end result. You could call this "25p in 29.97i" using blending. "Combing" is visible when viewed as frames, and it can be partially "reversed" with "srestore" scripts (blends can pose problems for completely clean reversal)

Format conversions that use weighted blending . eg. 24/1p film to 25p (then typically encoded as fields for a "PAL" DVD) , blends occur on a frame basis so there is no "combing" when viewing frames. This is known as a "convertfps" style conversion in this forum, and can be partially "reversed" with a "restorefps" script (blends can pose problems for completely clean reversal)

The "'x' content in 59.94p" cases - e.g. 720p59.94 channels. eg. 23.976p content with triplicate/duplicates, 29.97p content with duplicates, "15p" (14.985p) content with 4x repeats, "12p" (11.988p) content with 5x repeats, etc...
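
The repeat counts in those cases follow from dividing the stream rate by the content rate. A minimal sketch, assuming each decimal rate is shorthand for the corresponding N/1001 fraction:

```python
from fractions import Fraction

STREAM_RATE = Fraction(60000, 1001)   # 59.94p stream

# Assumption: the decimal labels stand for these exact N/1001 rates.
content_rates = {
    "23.976p": Fraction(24000, 1001),
    "29.97p": Fraction(30000, 1001),
    "14.985p": Fraction(15000, 1001),
    "11.988p": Fraction(12000, 1001),
}

# Average number of stream frames that each unique picture occupies.
repeats = {name: STREAM_RATE / rate for name, rate in content_rates.items()}

for name, count in repeats.items():
    print(f"{name} in 59.94p: {count} stream frames per picture")
```

A non-integer average such as 5/2 corresponds to an alternating pattern, here 3 frames then 2 frames.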

Sometimes you don't know the provenance - eg. sometimes modern TV series are actually shot at 24000/1001, not 24/1 . So I think a guideline should be: write what you have, unless you absolutely know it was converted from something else . eg. Peter Jackson shot at 48fps for some scenes, but does it matter to describe what is in your hands ? If the current scene is 24/1 p content (not slowed down, just decimated), that's what you have in your hands.

120fps[120pps] is cinema that's been 1-to-5 interpolated to 120pps and put in frames on a 1-to-1 basis.

120fps[120pps] doesn't have to be cinema or interpolated ; it could be a gameplay recording - 120fps is a very common recording rate for gamers

Frame repeat flag in some types of AVCHD , such as AVCHD lite. It's 29.97p content (or 25p content in PAL areas), with flagged duplicates (not encoded, similar to soft telecine and repeat field flags), so playback at 59.94p or 50p consisting of duplicates from the decoder

Mixed content in "layers" - such as overlays, scrolling titles over a background , lower thirds, news tickers. The background might have different pps than the foreground elements, not sure how - or if - you want to describe it

"Improperly" produced content, such as interlaced content encoded as progressive frames instead of fields. It happens in real world. You see a disproportionate amount here and other forums, because people discuss "problems", they don't strike up conversations because stuff just "works." e.g. 29.97i content encoded as a 29.97p stream, instead of the proper 29.97i stream as encoded fields.

Field shifting (also called phase shifted) is essentially field misalignment. Not super common, but not super rare either. It occurs in real life such as on retail DVD's , broadcast channels, even some cameras. eg. a normal progressive content stream is AaBbCcDd . aBbCcDdE is "field shifted" and looks "combed" when viewing as frames . Not sure what the notation would be ,or if you want to describe it. This can be reversed by field matching or discarding the 1st field to align
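
The AaBbCcDd example can be played out in a few lines. This is only a toy model with letters for fields; `weave` and `is_combed` are hypothetical helpers, not functions from any video library.

```python
def weave(fields):
    """Pair consecutive fields into frames; a trailing unpaired field is dropped."""
    return [fields[i] + fields[i + 1] for i in range(0, len(fields) - 1, 2)]

def is_combed(frame):
    """A frame is 'combed' when its two fields come from different pictures."""
    return frame[0].lower() != frame[1].lower()

aligned = "AaBbCcDd"
shifted = "aBbCcDdE"

print(weave(aligned))        # every frame holds two fields of the same picture
print(weave(shifted))        # every frame mixes two different pictures
print(weave(shifted[1:]))    # discarding the 1st field realigns the pairs
```

Dropping the first field costs one unpaired field at each end of the stream, which is why the repaired list is one frame shorter.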

There are probably more "textbook" cases that I'm forgetting. Multi "nth" generation conversions , with edits that interrupt patterns (e.g. broadcast edits) are going to be difficult to describe, you have to take them on a case by case basis. Sometimes stuff doesn't "fit nicely in a box" - lots of weird stuff out there.

markfilipak

22nd October 2021, 06:14

@poisondeathray,
I'm crafting a long reply with notations for all of your wonderful use cases, but I first have a question about one of them.
The majority of 25p PAL content for DVD or broadcast will use 2:2 pulldown and "encoded interlaced" .
Well, 2:2 pull-down (similar to 2:3 pull-down but 2:2, instead) is taking 2 successive fields and framing them. Doesn't that just copy the input?
And regarding "encoded interlaced":
... there is a tickbox "progressive" or "interlaced" for the encoding type ...
Wouldn't "interlaced" apply only to scan-field inputs (e.g. 50sps)?
So, I think you really mean 'bobbing' interlaced 50sps to eliminate 'combing', eh?
The notation currently doesn't handle 'bobbing', just as pull-down (the scheme upon which the notation is based) doesn't handle 'bobbing'.

Am I right? Do you mean 'bob'?

poisondeathray

22nd October 2021, 22:21

Well, 2:2 pull-down (similar to 2:3 pull-down but 2:2, instead) is taking 2 successive fields and framing them. Doesn't that just copy the input?

Yes, and 2:2 pulldown is actually a misnomer - but that term is used in the industry in practice, included in TV manuals etc...

1 frame makes up 2 fields (It's 25 frames in 50 fields/s) , and functionally it is "copying" in a sense. The distinction is that "fields" implies it is not considered a native progressive signal, it's encoded interlaced (encoded as fields)

Why it matters:
Functionally, "25pN" and "25p in 25i" are the same - if handled correctly - a 25p content stream encoded progressive is ok, but a 25p content stream encoded as fields can be ok too because both fields are from the same time - provided the display chain handles it correctly. The relevance is 1) native progressive encoded 25p (ie. 25pN) content streams are technically not compatible with BD, DVD, or broadcast. It's analogous to the North American case where 29.97pN content is not compatible with BD, DVD, or broadcast scenarios either. It has to be encoded interlaced (translation: encoded as fields instead of progressive). 2) receiving software or hardware might not handle it correctly. e.g. If you had an AVC stream, upon detecting "field_pic_flag=1" , some might incorrectly apply a deinterlace ("deinterlace" in this context, meaning interpolating the missing scan lines from a set of even or odd scan lines), thus degrading the image . This commonly occurs in editors - often you have to manually "interpret the footage" as progressive if it was 25p content (25pps) stream encoded interlaced (encoded with field encoding), otherwise it gets mishandled. It's important to distinguish the actual content (pps to you ) and encoding type (p or i , again "i" would be fields) . Native progressive encoded streams are almost never mishandled.

And regarding "encoded interlaced":

Wouldn't "interlaced" apply only to scan-field inputs (e.g. 50sps)?
So, I think you really mean 'bobbing' interlaced 50sps to eliminate 'combing', eh?
The notation currently doesn't handle 'bobbing', just as pull-down (the scheme upon which the notation is based) doesn't handle 'bobbing'.

Am I right? Do you mean 'bob'?

No ; I don't mean "bobbing" to eliminate "combing" . (Bobbing in video forums usually means separating fields and interpolating the missing scanlines, so even,odd,even,odd fields become even, odd, even odd full frames)
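
To make that definition of bobbing concrete, here is a deliberately crude sketch that uses line doubling as the "interpolation". Real bobbers interpolate far more carefully and respect the half-line vertical offset between fields; `bob_line_double` is a hypothetical name, and a frame is modelled as a plain list of scanlines.

```python
def bob_line_double(frame):
    """Split a frame (list of scanlines, top line first) into its two fields
    and stretch each field back to full height by repeating every line."""
    top_field = frame[0::2]      # even scanlines
    bottom_field = frame[1::2]   # odd scanlines
    from_top = [line for line in top_field for _ in range(2)]
    from_bottom = [line for line in bottom_field for _ in range(2)]
    return from_top, from_bottom

frame = ["t0", "b0", "t1", "b1"]
first, second = bob_line_double(frame)
print(first)
print(second)
```

One input frame yields two full-height output frames, so bobbing doubles the frame rate.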

"encoding interlaced" refers to the output, regardless of the input. It has no regard for the actual input (actual pps). It means the encoding type is set to "fields", instead of "frame" , (and instead of mbaff in encoders that support it)

markfilipak

23rd October 2021, 01:22

The majority of 25p PAL content for DVD or broadcast will use 2:2 pulldown and "encoded interlaced" .
Well, 2:2 pull-down (similar to 2:3 pull-down but 2:2, instead) is taking 2 successive fields and framing them. Doesn't that just copy the input?
Yes, and 2:2 pulldown is actually a misnomer - but that term is used in the industry in practice, included in TV manuals etc...

1 frame makes up 2 fields (It's 25 frames in 50 fields/s) , and functionally it is "copying" in a sense.
In what sense is it not copying? What term "is used in the industry in practice"? Is the term "2:2 pull-down for 25p"? or is the term "'encoded interlaced' for 25p"? "Encoded interlaced" seems to be very important to you, so I ask because what's important to you should be important to me: Doesn't "encoded interlaced" apply solely to 'i' video? If yes, then how can that apply to 25p? I'm lost.

The distinction is that "fields" implies it is not considered a native progressive signal, it's encoded interlaced (encoded as fields)
I can only comment about blocks in macroblocks in encoded MPEG elemental streams because that's all I know. All macroblocks have fields: For YUV420, 2, 8x8 blocks per Y field, and 1, 8x4 half-block per field of U & V. In frame-based macroblocks, fields are continuous (progressive). In field-based macroblocks, fields are interleaved (which is MPEG-speak for "interlaced"). But they both have fields. Of course, that all disappears in the decoder's raw stream output (in which "field" is just an analog for odd/even lines). What does that have to do with 2:2 pull-down and how does it mean the 2:2 pull-down doesn't just copy frames?

More to follow when my current non-understanding is resolved...

poisondeathray

23rd October 2021, 05:13

What term "is used in the industry in practice"? Is the term "2:2 pull-down for 25p"? or is the term "'encoded interlaced' for 25p"?

Both actually, but that was referring to 2:2 pulldown because 2:2 pulldown was in the same sentence, and 2:2 pulldown was quoted.

The "2:2 pulldown" (misnomer) is used to refer to "25p content encoded interlaced", or "25p content telecined" .

"Encoded interlaced" seems to be very important to you, so I ask because what's important to you should be important to me: Doesn't "encoded interlaced" apply solely to 'i' video? If yes, then how can that apply to 25p? I'm lost.

"encoded interlaced" vs. "encoded progressive" is important - I mentioned potential ramifications with compatibility, mishandling of streams, NLE's.

I think that's where the confusion is:
"Encoded interlaced" does not apply solely to "i" content video. You can feed an encoder progressive content or interlaced content. "Encoded interlaced" vs. "Encoded progressive" does not change the "structure" of the input. It does not do things like create, duplicate or drop fields, it does not bob, it does not deinterlace or any of those sorts of things. It's just a tickbox in an encoder that changes the way it encodes, and the flagging/signalling.

What does that have to do with 2:2 pull-down and how does it mean the 2:2 pull-down doesn't just copy frames?

"Progressive content encoded interlaced" is what the 2-2 pulldown (misnomer) refers to. "Progressive content encoded interlaced" in the 25p content scenario can also be described as "25p content telecine". Similarly in North American situation "29.97p content encoded interlaced" can be described as "29.97p content telecine"

It's a "copy" in the sense that the output structure is essentially the same as the input. This seems to be what you are focused on . When the "interlaced" encoding tick box is checked - there is nothing "weird" going on like reorganizing fields, bobbing, changing "pps" - that sort of thing.

What some decoder, or program does later to the video based on whether it's encoded "p" or "i" - That's another step after the encoder - but it has relevance and repercussions as pointed out in post 24.

It's not a "copy" in the sense that we're talking about lossy encoding (not just MPEG2; AVC is also used in BD and broadcast). Bits are discarded and changed. There are quality differences between "Progressive" encoding vs "interlaced" encoding (or in AVC, Progressive vs. PAFF vs. MBAFF encoding) at the same bitrate for a progressive input signal into the encoder such as "25p content". For example, different predicted motion vectors (frame vs. field prediction), zig zag scan vs. alternate scan order, frame vs. field dct. In general, for progressive content input, progressive encoding will yield higher quality than interlaced encoding. So that's the 3rd reason why identifying "type of encoding, p or i" matters.

Hard vs soft telecine in the NTSC DVD scenario is another example related to this identifying "type of encoding", p or i. Hard vs. soft telecine has ramifications regarding quality, and methods of IVTC. One of the cases listed earlier was "wrong" processing - interlaced content encoded progressive - and that causes a bunch of specific issues too.

markfilipak

23rd October 2021, 05:29

Both ...
I'm sorry, my friend. I know you spent a lot of time writing your reply. But I still have no idea how 2:2 pull-down can produce any raw (y4m) stream that's different from what came out of the decoder.

poisondeathray

23rd October 2021, 05:39

I'm sorry, my friend. I know you spent a lot of time writing your reply. But I still have no idea how 2:2 pull-down can produce any raw (y4m) stream that's different from what came out of the decoder.

Source => MPEG2 encode progressive at "x" bitrate => decoder => YUV(1)

Source => MPEG2 encode interlaced at "x" bitrate => decoder => YUV(2)

YUV(1) will not equal YUV(2)

Functionally, it will appear to be same (in terms of structure). ie. if source is 25p content , YUV(1) and YUV(2) will still be 25p content

But there will be quality differences (mostly negligible) when adequate bitrate is used.
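One way to see "same structure, slightly different samples" is to compare the two decoded buffers sample by sample. The following is only a sketch: it assumes both decodes are raw 8-bit buffers of equal size, and the function name compare_decodes and the sample values are invented for illustration.

```python
def compare_decodes(yuv1: bytes, yuv2: bytes):
    """Return (differing_samples, max_abs_diff, mean_abs_diff) for two
    equally sized buffers of 8-bit samples."""
    if len(yuv1) != len(yuv2):
        raise ValueError("buffers differ in size, so the structure differs")
    diffs = [abs(a - b) for a, b in zip(yuv1, yuv2)]
    differing = sum(1 for d in diffs if d)
    mean = sum(diffs) / len(diffs) if diffs else 0.0
    return differing, max(diffs, default=0), mean


# Made-up sample values standing in for YUV(1) and YUV(2) of one picture.
yuv_1 = bytes([16, 128, 235, 100])
yuv_2 = bytes([16, 129, 233, 100])
print(compare_decodes(yuv_1, yuv_2))  # (2, 2, 0.75)
```

Equal sizes with small nonzero differences is the "functionally the same, not bit-identical" situation described above.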

Why do people bother adding "25p content... but encoded interlaced" ? Why not just "25p content"? (or 25pps) ? Why the distinction ? - Because of the potential consequences mentioned earlier. The potential mishandling of the streams because they are encoded (and "labelled") "i" instead of "p". They "look" basically the same from the decoder if you examine the YUV stream (minor differences) - but the "i" vs. "p" difference can cause the stream to be butchered in some software, or to become non compliant in some situations like BD , DVD.

markfilipak

23rd October 2021, 07:01

Source => MPEG2 encode progressive at "x" bitrate => decoder => YUV(1)

Source => MPEG2 encode interlaced at "x" bitrate => decoder => YUV(2)

YUV(1) will not equal YUV(2)

Functionally, it will "appear" to be same (in terms of structure), if you use high enough bitrate , but there will be encoding quality differences

Yeah, as I went for some food it occurred to me that you're talking about encoders, not streams.

Yes, I know that interlaced pictures, especially interlaced YUV420 pictures in which Cb & Cr are 'splashed' across 3 lines (across line N of the 1st field, line N+1 of the 2nd field, and line N+2 of the 1st field), will be different from progressive pictures, even YUV 420 progressive. Such spatial differences are outside the scope of the notation I'm developing. I apologize if I wrote anything to the contrary. I intend to separate frames & fields from pictures, half-pictures & scans primarily to handle temporal structure because that's what gives folks the most trouble and because a notation, any notation, cannot possibly cover everything. I'm trying to create a means to get the basics right that's not too difficult to parse in order for people to free up enough mental bandwidth to absorb the spatial stuff without going into overload.

As I wrote earlier, I'm revising the 'presentation' [note] in order to accommodate bob as easily as it accommodates cadenced half-picture drops (aka telecine) and picture drops, with and without speed changes. Your use cases are great for that purpose.

[note]
This notation employs a field mapping scheme that is similar to the field mapping method employed to specify pull-down.
But the pull-down mapping method can only duplicate (not drop) fields -- it is not a comprehensive mapping scheme.
In contrast, ffmpeg's 'shuffleframes' mapping method can duplicate/drop/reorder frames.
But the shuffleframes mapping method is limited to frames (not fields) -- it is not a comprehensive mapping scheme.
This mapping scheme works like the 'shuffleframes' method but, 1, extends to fields, and 2, uses '-' instead of '-1' as its drop token.

I shall continue with the use cases you've graciously provided but will skip the 2:2 pulldown and "encoded interlaced" case until such time that I fully understand it. ;)

markfilipak

2nd November 2021, 06:51

poisondeathray, where have you been all my life? :cool:

Did you mean PsF ?
Does "PsF" mean "progressive with separated fields"? If so, then they're halfpics, eh?
You're collecting "disciples." Look out Betamax ! :devil:
Oh, you devil.

But I'll try to help you improve the notation and describe some examples. If you want further explanation or video samples (in most video forums, a video sample is worth 10,000 words) let me know
Thanks! But rest assured that I will faithfully endeavor to not burden you.
Some common NTSC cases have been mentioned here already, but they should be put together in one handy reference - your homework project :) . Hard and soft telecine with different content rates . eg. super8 or 8mm content at 16 or 18fps , telecined for DVD. Animation DVD with different content rates, eg. a "12p" (11.988p) content pan, interlaced content credits, interlaced content fades. Live action and CG e.g. Star Trek derivatives - main sections have 23.976p content, but CG scenes commonly were 29.97p content. ie. The "'x' content in 29.97i" cases
My experience is solely with commercial BDs and DVDs. An opportunity to expand it to include other media is very welcome. I will address each homework assignment separately in the following paragraphs and posts.

First, to serve as an example: How about the notation for something complex but familiar?

4-to-5 Telecine: 2x[24pps] picture double : // Hey! Skeletal software is over here!
(2-3 pull-down) \____________(A+a)(A+a).. : halfpic getHalfpicByMask(N, mask) { if (!mask[(N - 1) % mask.length]) return "drop"; ... }
96hps[2x[24pps]] deinterlace
\__________________(A)(a)(A)(a)(B)(b)(B)(b)(C)(c)(C)(c)(D)(d)(D)(d)
2+00+3+00+2+00+3[96hps[2x[24pps]]] drop #s 3 4 8 9 12 13 of 16 in 3 strides/s -- output is 60hps
\___________________________________(A)(a) (B)(b)(B) (c)(C) (d)(D)(d)
30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] interlace (60hps to 30pps) and frame at 30fps
\_________________________________________[A+a][B+b][B+c][C+d][D+d]
30'fps=30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] and retime to (30/1.001)fps : // build mask array from mask notation and process the stream
: mask = [1,1,0,0,1,1,1,0,0,1,1,0,0,1,1,1];
: for (N=1; !stop(); N++) procHalfpic(getHalfpicByMask(N, mask));

Note: 30' is an abbreviation of 30/1.001.
Note: 30fps=30'fps resets time base, not actual frames.

Oh, by the way, streams flow right to left, but to preserve sanity, masks & maps 'read' left to right.
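The 4-to-5 telecine chain above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the notation's reference implementation: pictures are modelled as (upper, lower) halfpic pairs of letters, the function name telecine_4_to_5 is invented, and indexing is 0-based so the mask lines up with the first halfpic.

```python
def telecine_4_to_5(pictures, mask):
    """pictures: list of (upper, lower) halfpic pairs, e.g. [("A", "a"), ...]."""
    # 2x[24pps]: duplicate every picture.
    doubled = [p for p in pictures for _ in range(2)]
    # 96hps[...]: split each picture into its two halfpics.
    halfpics = [h for p in doubled for h in p]
    # Stride mask: keep a halfpic when its mask entry is 1 (0-based index).
    kept = [h for n, h in enumerate(halfpics) if mask[n % len(mask)]]
    # 30fps[...]: pair successive halfpics into frames.
    return [(kept[i], kept[i + 1]) for i in range(0, len(kept) - 1, 2)]


mask = [1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1]  # 2+00+3+00+2+00+3
pictures = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
frames = telecine_4_to_5(pictures, mask)
print(frames)
# [('A', 'a'), ('B', 'b'), ('B', 'c'), ('C', 'd'), ('D', 'd')]
```

Four pictures in, five frames out, matching [A+a][B+b][B+c][C+d][D+d] in the diagram.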

[note 1] The following notations are all equivalent; all perform the same task.
Given a left-to-right (time ordered) stream of 16 halfpics: (A)(a)(A)(a)(B)(b)(B)(b)(C)(c)(C)(c)(D)(d)(D)(d),
the following methods all output (A)(a) (B)(b)(B) (c)(C) (d)(D)(d).
By stride masking:
2+00+3+00+2+00+3 Mask the first 2 halfpics ON, the next 2 OFF, 3 ON, 2 OFF, 2 ON, 2 OFF, and 3 ON;
2+0+0+3+0+0+2+0+0+3 mask the first 2 halfpics ON, the next OFF, the next OFF, the next 3 ON, etc.;
1+1+0+0+1+1+1+0+0+1+1+0+0+1+1+1 mask the halfpics: ON ON OFF OFF ON ON ON OFF OFF ON ON OFF OFF ON ON ON;
2x1+00+3x1+00+2x1+00+3x1 mask 2 ON, 2 OFF, 3 ON, 2 OFF, 2 ON, 2 OFF, 3 ON;
2x1+2x0+3x1+2x0+2x1+2x0+3x1 mask 2 ON, 2 OFF, 3 ON, 2 OFF, 2 ON, 2 OFF, 3 ON.
Don't write this: 11+00+111+00+11+00+111, unless the stride actually has eleven ON, two OFF, one hundred eleven ON, etc., and
don't write this: 1100111001100111, unless the stride actually contains more than a million-million halfpics.
By stride mapping:
1,2,0,0,3,4,5,0,0,6,7,0,0,8,9,10 Map halfpic 1 to output 1, 2 to 2, 3 & 4 to nowhere (drop), 5 to 3, 6 to 4, 7 to 5, drop 8 & 9, 10 11 to 6 7, drop 12 13, and 14 15 16 to 8 9 10;
1,2,0,0,3-5,0,0,6,7,0,0,8-10 map to 1 2 drop drop 3..5 drop drop 6 7 drop drop 8..10;
1,2,,,3,4,5,,,6,7,,,8,9,10 map 1 2, drop 2, map the next 3 to 3 4 5, drop 2, map the next 2 to 6 7, drop 2, map the next 3 to 8 9 10;
1,2,,,3-5,,,6,7,,,8-10 map: 1 2 drop drop 3..5 drop drop 6 7 drop drop 8..10.
Note that all the examples above implicitly have 16 members that explicitly span a stride of 16 input halfpics.
Tips:
- Use masking or mapping, but don't mix them. Use whichever notation makes the most sense.
- In most cases, masking is more compact and easier to read -- why it exists. But in some cases, mapping can do more (such as shuffling halfpics around).
- All of the above apply equally to pictures and scans, also.
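To check that the stride-mask spellings in note 1 really are equivalent, here is a small Python sketch of a parser. The grammar is inferred from the examples only (a run of zeros means that many OFF, a plain number n means n ON, and "NxV" means N copies of V), and the names parse_stride_mask and apply_mask are invented for illustration.

```python
def parse_stride_mask(notation):
    """Expand a stride mask such as '2+00+3' into a list of 0/1 flags."""
    mask = []
    for term in notation.split("+"):
        if "x" in term:                  # '3x1' -> three ON, '2x0' -> two OFF
            count, value = term.split("x")
            mask += [int(value)] * int(count)
        elif set(term) == {"0"}:         # '0' or '00' -> that many OFF
            mask += [0] * len(term)
        else:                            # '3' -> three ON, '11' -> eleven ON
            mask += [1] * int(term)
    return mask


def apply_mask(items, mask):
    """Keep the items whose (0-based, repeating) mask entry is 1."""
    return [item for n, item in enumerate(items) if mask[n % len(mask)]]


spellings = [
    "2+00+3+00+2+00+3",
    "2+0+0+3+0+0+2+0+0+3",
    "1+1+0+0+1+1+1+0+0+1+1+0+0+1+1+1",
    "2x1+00+3x1+00+2x1+00+3x1",
    "2x1+2x0+3x1+2x0+2x1+2x0+3x1",
]
masks = [parse_stride_mask(s) for s in spellings]
print(all(m == masks[0] for m in masks))                       # True
print("".join(apply_mask(list("AaAaBbBbCcCcDdDd"), masks[0])))  # AaBbBcCdDd
```

All five spellings expand to the same 16-entry mask, and applying it to the 16 halfpics gives the 10 halfpics listed in the note.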

And to complement 4-to-5 Telecine...

5-to-4 Detelecine: 30fps=30'fps retime to 30fps : // Hey! Skeletal software is over here!
60hps[30fps=30'fps] deinterlace
1-4,,6,5,,7,8[60hps[30fps=30'fps]] swap #s 6-7 & drop #s 5 8 of 10 in 4 strides/s
24fps[1-4,,6,5,,7,8[60hps[30fps=30'fps]]] interlace the resulting 48hps to 24pps and frame at 24fps
\ \ \ \________________[A+a][B+b][B+c][C+d][D+d]
\ \ \_____________________(A)(a)(B)(b)(B)(c)(C)(d)(D)(d) : halfpic getHalfpicFromMap(N, map) { if (!map[(N - 1) % map.length]) return "drop"; ... }
\ \__________________________________(A)(a)(B)(b)(C)(c)(D)(d) : // build map array from map notation and process the stream
\______________________________________[A+a][B+b][C+c][D+d] : map = [1,2,3,4,0,6,5,0,7,8];
: for (N=1; !stop(); N++) procHalfpic(getHalfpicFromMap(N, map));
[A][a][B][b][B][c][C][d][D][d]
"shuffleframes= 0 1 2 3 -1 5 4 -1 6 7" In FFmpeg's 'shuffleframes' filter, indexes must be separated by one and only one space (otherwise, error).
"1, 2, 3, 4, 0, 6, 5, 0, 7, 8" In this notation's stride map, there are commas instead of spaces but spaces can be added for readability.
[A][a][B][b] [c][C] [D][d] <== step 1: Drop some inputs
[A][a][B][b][C][c] [D][d] <== step 2: Move some inputs
'shuffleframes' works solely on so-called frames, so it must be preceded by 'separatefields'.
Stride maps work on halfpics and scans and even pictures, too.
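A stride map can be sketched in Python the same way. This assumes, as in the examples above, that each map entry is the 1-based output position for the corresponding input and that 0 means drop; it also assumes the non-zero entries number the outputs 1..k without gaps. The names apply_stride_map and to_shuffleframes are invented for illustration.

```python
def apply_stride_map(items, stride_map):
    """Reorder/drop items stride by stride. stride_map[i] is the 1-based
    output slot for input i of the stride; 0 drops that input."""
    size = len(stride_map)
    slots_per_stride = sum(1 for dest in stride_map if dest)
    output = []
    for start in range(0, len(items), size):
        slots = [None] * slots_per_stride
        for item, dest in zip(items[start:start + size], stride_map):
            if dest:
                slots[dest - 1] = item
        # A short final stride leaves unfilled slots; skip them.
        output.extend(s for s in slots if s is not None)
    return output


def to_shuffleframes(stride_map):
    """Same map written 0-based with -1 as the drop token."""
    return " ".join(str(dest - 1) for dest in stride_map)


stride_map = [1, 2, 3, 4, 0, 6, 5, 0, 7, 8]
print("".join(apply_stride_map(list("AaBbBcCdDd"), stride_map)))  # AaBbCcDd
print(to_shuffleframes(stride_map))  # 0 1 2 3 -1 5 4 -1 6 7
```

The second function only converts the numbering; it does not call FFmpeg.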

Now that the appetizers are over, it's on to the main course...
super8 or 8mm content at 16 or 18fps , telecined for DVD
Well, I think that 16fps & 18fps from film are 16pps & 18pps, while from a camcorder, they are actually 32sps & 36sps. So, when mastered for DVD, I think they'd be:
1 - 30fps[fixup[16pps]] -- 16fps film (progressive)
2 - 30fps[fixup[18pps]] -- 18fps film (progressive)
3 - 30fps[fixup[32sps]] -- 16fps digital (interlaced) ...I don't know whether you intended to include this, but it doesn't hurt, eh?
4 - 30fps[fixup[36sps]] -- 18fps digital (interlaced) ...I don't know whether you intended to include this, but it doesn't hurt, eh?

Now, for the fixups. ...Let's see...

Some skeletal code snips ==> : picture getPictureByMask(N, mask) { if (!mask[(N - 1) % mask.length]) return "drop"; ... } // N counts from 1
(there's no loss if ignored) : halfpic getHalfpicByMask(N, mask) { if (!mask[(N - 1) % mask.length]) return "drop"; ... }
: scan getScanByMask(N, mask) { if (!mask[(N - 1) % mask.length]) return "drop"; ... }
: halfpic getHalfpicFromMap(N, map) { if (!map[(N - 1) % map.length]) return "drop"; ... }

1 - 30fps[fixup[16pps]] -- fixup: Picture drop, or 8-to-15 Telecine.

1.1 - Picture drop: 30fps[15+0[2x[16pps]]] drop # 16 of 16 in 2 strides : // strides = |maskin-maskout| = |32pps-30pps| = 2; stridesPerSecond = |maskin|/strides = |32pps|/2 = 16
\ \_____________..(G+g)(G+g)(H+h)(H+h) : mask = [1,1, 1,1, 1,1, 1,1, 1,1, 1,1, 1,1, 1,0];
\_________________..(G+g)(G+g)(H+h) : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

1.2 - 8-to-15 Telecine: 30fps[27+00+3[64hps[2x[16pps]]]] drop #s 28 29 of 32 in 2 strides : // strides = |maskin-maskout| = |64hps-60hps|/2 = 2; stridesPerSecond = |maskin|/strides = |64hps|/2 = 32
(6x4-3-3 pull-down) \ \___________________..(G)(g)(G)(g)(H)(h)(H)(h) : mask = [1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,0, 0,1,1,1];
\__________________________..(G)(g)(G)(h)(H)(h) : for (N=1; !stop(); N++) procHalfpic(getHalfpicByMask(N, mask));

2 - 30fps[fixup[18pps]] -- fixup: Picture drop, or 3-to-5 Telecine.

2.1 - Picture drop: 30fps[5+0[72hps[2x[18pps]]]] drop #s 11 12 of 12 in 6 strides : // strides = |maskin-maskout| = |36pps-30pps| = 6; stridesPerSecond = |maskin|/strides = |36pps|/6 = 6
\ \___________________(A+a)(B+b)(C+c)(D+d)(E+e)(F+f) : mask = [1,1,1,1,1,0];
\______________________(A+a)(B+b)(C+c)(D+d)(E+e) : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

2.2 - 3-to-5 Telecine: 30fps[3+00+7[72hps[2x[18pps]]]] drop #s 4 5 of 12 in 6 strides : // strides = |maskin-maskout| = |72hps-60hps|/2 = 6; stridesPerSecond = |maskin|/strides = |72hps|/6 = 12
(4-3-3 pull-down) \ \___________________(A+a)(A+b)(B+b)(B+b)(C+c)(C+c) : mask = [1,1,1,1, 1,1,1,0, 0,1,1,1];
\_________________________(A+a)(A+b)(B+b)(B+c)(C+c) : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

3 - 30fps[fixup[32sps]] -- fixup: Scan pair drop, Bob'n'drop, or Bob'n'tele.

3.1 - Scan pair drop: 30fps[2x[15+0[16pps[32sps]]]] drop #s 31 32 of 32 in 1 stride : // strides = |maskin-maskout| = |16pps-15pps| = 1; stridesPerSecond = |maskin|/strides = |16pps|/1 = 16
\ \________________(1+2)..(29+30)(31+32) : mask = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0]
\____________________(1+2)..(29+30) [note 2] : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

3.2 - Bob'n'drop: 30fps[30+00[2x[32sps]]] drop #s 31 32 of 32 in 2 strides : // strides = |maskin-maskout| = |64sps/2-30pps| = 2; stridesPerSecond = |maskin|/strides = |64sps|/2 = 32
\ \_____________(1)(1)..(15)(15)(16)(16) : mask = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0]
\__________________(1)(1)..(15)(15) [note 2] : for (N=1; !stop(); N++) procScan(getScanByMask(N, mask));

3.3 - Bob'n'tele: 30fps[29+00+1[2x[32sps]]] drop #s 30 31 of 32 in 2 strides : // strides = |maskin-maskout| = |64sps-60sps|/2 = 2; stridesPerSecond = |maskin|/strides = |64sps|/2 = 32
(14x2-1-1 pull-down) \ \ \_____________(1)(1)..(14)(14)(15)(15)(16)(16) : mask = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1];
\ \___________________(1)(1)..(14)(14)(15)(16) : for (N=1; !stop(); N++) procScan(getScanByMask(N, mask));
\_______________________[1+1]..[14+14][15+16]

4 - 30fps[fixup[36sps]] -- fixup: Scan pair drop, Bob'n'drop, or Bob'n'tele.

4.1 - Scan pair drop: 30fps[2x[5+0[18pps[36sps]]]] drop #s 11 12 of 12 in 3 strides : // strides = |maskin-maskout| = |18pps-15pps| = 3; stridesPerSecond = |maskin|/strides = |18pps|/3 = 6
\ \________________(1+2)..(9+10)(11+12) : mask = [1,1,1,1,1,0];
\___________________(1+2)..(9+10) [note 2] : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

4.2 - Bob'n'drop: 30fps[5+0[36pps[2x[36sps]]]] drop # 6 of 6 in 6 strides : // strides = |maskin-maskout| = |36pps-30pps| = 6; stridesPerSecond = |maskin|/strides = |36pps|/6 = 6
\ \___________________(1+1)..(5+5)(6+6) : mask = [1,1,1,1,1,0];
\______________________(1+1)..(5+5) [note 2] : for (N=1; !stop(); N++) procPicture(getPictureByMask(N, mask));

4.3 - Bob'n'tele: 30fps[9+00+1[2x[36sps]]] drop #s 10 11 of 12 in 6 strides : // strides = |maskin-maskout| = |72sps-60sps|/2 = 6; stridesPerSecond = |maskin|/strides = |72sps|/6 = 12
(4x2-1-1 pull-down) \ \ \_____________(1)(1)..(4)(4)(5)(5)(6)(6) : mask = [1,1,1,1,1,1,1,1,1,0,0,1];
\ \___________________(1)(1)..(4)(4)(5)(6) : for (N=1; !stop(); N++) procScan(getScanByMask(N, mask));
\_______________________[1+1]..[4+4][5+6]

[note 2] Note the '2x' placement in §3.1 v. §3.2 and §4.1 v. §4.2 and the difference it makes.

As I worked, this occurred to me: With better knowledge of python and VapourSynth functions, I could write a notation interpreter (a python script) to directly execute the right VapourSynth stuff in the right order.

Apparently, posts are limited to 16000 chars. This seems a good place to break. If you see an error (or think you see an error), kindly tell me. We'll discuss it, okay? (I'm relying on smart people and if I can't resolve issues, then this scheme is broken.)

PS: How could a real video geek possibly resist this stuff, eh? :sly: ...Is this the path to Fame & Fortune or Ill Fame & Ms Fortune?

markfilipak

3rd November 2021, 22:53

...continued
From the preceding:

24pps, 4-to-5 Telecine: 30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] | Useful principles to note:
24fps, 5-to-4 Detelecine: 24fps[1-4,,6,5,,7,8[60hps[30fps=30'fps]]] | 4-to-5 Telecine: ...[2+00+3+00+2+00+3[...
16pps, Picture drop: 30fps[15+0[2x[16pps]]] | 8-to-15 Telecine: ...[27+00+3[...
16pps, 8-to-15 Telecine: 30fps[27+00+3[64hps[2x[16pps]]]] | 3-to-5 Telecine: ...[3+00+7[...
18pps, Picture drop: 30fps[5+0[72hps[2x[18pps]]]] | Bob'n'tele: ...[9+00+1[...
18pps, 3-to-5 Telecine: 30fps[3+00+7[72hps[2x[18pps]]]] | \______Telecine stride masks all end in odd numbers ...always true.
32sps, Scan pair drop: 30fps[2x[15+0[16pps[32sps]]]] | Telecine always requires either halfpics or scans.
32sps, Bob'n'drop: 30fps[30+00[2x[32sps]]] | Move zeros left to 'expose' an odd # when converting a drop to a telecine ...almost mindlessly easy.
32sps, Bob'n'tele: 30fps[29+00+1[2x[32sps]]] | +0+ # of zeros must be odd when dropping pictures.
36sps, Scan pair drop: 30fps[2x[5+0[18pps[36sps]]]] | +00+ # of zeros must be even when dropping halfpics or scans.
36sps, Bob'n'drop: 30fps[5+0[36pps[2x[36sps]]]] | \______Use odd or even need to determine whether pps (or hps or sps) are required.
36sps, Bob'n'tele: 30fps[9+00+1[2x[36sps]]] |

As the notational scheme comes into focus, so do its key architectural properties:

- In analog TV, pictures or scans were shot, fields were played.
- In analog TV, "field" and "scan" were synonymous.
- In digital TV, pictures or scans are shot, frames are played.
- In digital TV, "field" is an undifferentiated pseudonym meaning either "halfpic" or "scan" depending on context.
- Think of the notational elements as being like Legos.
- fps means frame rate (or may mean field rate in some contexts).
- fps is playing rate for digital TV -- frames are what are played, not pictures or halfpics or scans.
- [fps] is invalid.
- ...fps=...fps denotes a frame rate metadata (and play rate) change without any internal change to pictures, halfpics, or scans (or to internal rates).
- ...fps...[pps] is what poisondeathray calls "encoded progressively", I think.
- ...fps...[sps] is what poisondeathray calls "encoded interlaced", I think.
- hps is never shooting rate -- halfpics are never shot but always result from a conversion.
- hps is understood to be 2*pps, even when pps is not explicitly stated. [was: "pps/2", corrected 4 Nov 2021]
- 2x[hps (or 3x, 4x, etc.) duplicates each halfpic, halfpic by halfpic, 2 (or 3, 4, etc.) times and doubles (or triples, quadruples, etc.) halfpic rate.
- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if fps != hps/2).
- hps[hps creates new halfpics via interpolation -- the factor doesn't have to be noted -- to achieve a new halfpic rate.
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if pps != hps/2).
- pps (not surrounded by brackets) is some intermediate picture rate added for clarity.
- pps is understood to be hps/2, even when hps is not explicitly stated.
- [pps] is shooting picture rate (called: "progressive").
- 2x[pps (or 3x, 4x, etc.) duplicates each picture, picture by picture, 2 (or 3, 4, etc.) times and doubles (or triples, quadruples, etc.) picture rate.
- fps[pps implies framing (and fast/slow motion if fps != pps).
- hps[pps deinterlaces from pps.
- pps[pps creates new pictures via interpolation -- the factor doesn't have to be noted -- to achieve a new picture rate.
- sps[pps is invalid.
- sps is synonymous with field rate.
- sps is understood to be 2*pps, even when pps is not explicitly stated. [was: "pps/2", corrected 4 Nov 2021]
- sps (not surrounded by brackets) is some intermediate scan rate added for clarity.
- sps was playing rate for analog TV -- scans (transported by fields) were what were played, not pictures or halfpics.
- [sps] is shooting scan rate (called: "interlaced/interleaved").
- 2x[sps (or 3x, 4x, etc.) duplicates each scan, scan by scan, 2 (or 3, 4, etc.) times and doubles (or triples, quadruples, etc.) scan rate.
- fps[sps interlaces & frames scan pairs, beginning with a stream's 1st scan, (and produces fast/slow motion if fps != sps/2).
- pps[sps interlaces scan pairs, beginning with a stream's 1st scan, (and produces fast/slow motion if pps != sps/2).
- sps[sps creates new scans via interpolation -- the factor doesn't have to be noted -- to achieve a new scan rate.

Those properties and their distinctions force everything else.
All other aspects of the notation involve operations/conversions of some sort.
For example:
120fps[24pps] (read: "24 shot pictures per second played at 120 frames per second") results in 5x playing rate (i.e. fast motion), whereas
120fps[120pps[24pps]] (read: "24 shot pictures per second converted to 120 pictures per second played at 120 frames per second") specifies some sort of 1-to-5 picture interpolation.

120fps[120pps[24pps]], an example:
86,400 pictures shot at 24pps (shooting rate) have a shooting time of 1 hour.
432,000 pictures in 432,000 frames have a playing time of 1 hour.
The trick is to interpolate the extra 345,600 pictures from the 86,400 shot pictures.

In contrast, 120fps[5x[24pps]] (read: "24 shot pictures per second with 5x duplication played at 120 frames per second") also has a shooting time of 1 hour but without interpolation.
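The three 120fps cases differ only in where the extra pictures come from, which a short Python sketch can make explicit. The helper names duplicate and speed_factor are invented for illustration; the figures are the ones given in the example above.

```python
from fractions import Fraction


def duplicate(pictures, factor):
    """The Nx[ operator: repeat each picture 'factor' times, in order."""
    return [p for p in pictures for _ in range(factor)]


def speed_factor(play_fps, picture_rate):
    """How much faster than shot speed the stream plays (1 = normal)."""
    return Fraction(play_fps, picture_rate)


shot = 24 * 3600                         # pictures shot in 1 hour at 24pps
print(shot)                              # 86400
print(len(duplicate(range(shot), 5)))    # 432000 frames for 120fps[5x[24pps]]
print(120 * 3600 - shot)                 # 345600 pictures to interpolate
print(speed_factor(120, 24))             # 5 -> 120fps[24pps] is 5x fast motion
print(speed_factor(120, 5 * 24))         # 1 -> 120fps[5x[24pps]] is normal speed
```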

I'm using "30'" & "60'" as contractions of "30/1.001" & "60/1.001".
PAL<=> NTSC conversions that preserve framecount, but speedup or slowdown (duration changed) e.g PAL that are sped up from NTSC sources, ...
These are easy:

30'fps[50sps] (read: "50 shot scans per second played at 30/1.001 frames per second", implied: 2 scans per frame).

This of course is never done because James Earl Jones would sound like Mickey Mouse and walk like Charlie Chaplin.

25fps[60'sps] (read: "60/1.001 shot scans per second played at 25 frames per second", implied: 2 scans per frame).

This of course is never done because Mickey Mouse would sound like James Earl Jones and walk like Frankenstein's monster.
... or film sources, ...
Again, slow/fast motion are easy:

30'fps[24pps] (read: "24 shot pictures per second played at 30/1.001 frames per second", implied: 1 picture per frame).

This of course is never done because James Earl Jones would sound like Mickey Mouse and walk like Charlie Chaplin.

25fps[24pps] (read: "24 shot pictures per second played at 25 frames per second", implied: 1 picture per frame).

This is often done because the speed up is only 4%.

25fps[24pps] for novices:
What's implied but rarely stated is: 1 frame always contains 1 picture.
So, 86,400 pictures shot at 24pps is a shooting time of 1 hour,
but 86,400 pictures played at 25fps -- implied: 1 picture per frame -- is a playing time of only 57 minutes 36 seconds.
Thus, for every hour of movie, 25fps plays 2 minutes, 24 seconds shorter.
That's 1.04x fast motion: objects appear to move 4% faster, characters talk 4% faster and with 4% raised pitch, playing time is 4% shorter than shooting time.

The way to correct the playing speed is to slow it down.
This: 24fps=25fps[24pps], changes the frame rate from 25fps to 24fps without changing the frames (or the pictures).
Note that 24fps=25fps[24pps] is a metadata-only change that does not/should not require video transcoding.
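The 25fps[24pps] arithmetic is easy to verify. A minimal Python sketch, with the helper name playing_time invented for illustration:

```python
from fractions import Fraction


def playing_time(picture_count, play_fps):
    """Playing time as (minutes, seconds), one picture per frame."""
    seconds = Fraction(picture_count, play_fps)
    minutes, secs = divmod(seconds, 60)
    return int(minutes), float(secs)


pictures = 24 * 3600                      # 1 hour shot at 24pps
print(playing_time(pictures, 24))         # (60, 0.0)
print(playing_time(pictures, 25))         # (57, 36.0)
shortfall = 3600 - Fraction(pictures, 25)
print(divmod(int(shortfall), 60))         # (2, 24)
print(float(Fraction(25, 24)))            # about 1.0417, the "4%" speed-up
```

Retiming with 24fps=25fps[24pps] brings the first result back, since only the frame rate changes and the picture count stays at 86,400.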

PAL<=> NTSC conversions that preserve the duration (instead of slowdown/speedup) . ...

NTSC to PAL: 60sps[30fps=30'fps] source is (60/1.001)fps
25fps[?[60sps[30fps=30'fps]]] target is 25fps
==> 25fps[5x+0[60sps[30fps=30'fps]]] |60sps-50sps| = 10 strides @ |60sps|/10 = 6 spsPerStride

PAL to NTSC: [50sps] source is 50sps
30'fps=30fps[?[50sps]] target is 30'fps
30'fps=30fps[?[50sps]] |50sps-60sps| < 0 strides, so 2x 50sps to 100sps
30'fps=30fps[?[2x[50sps]]] |100sps-60sps| = 40 strides @ |100sps|/40 = 2.5 spsPerStride ...oops! fractional mask. that doesn't work, does it? try again
30'fps=30fps[?[5x[50sps]]] |300sps-60sps| = 240 strides @ |300sps|/240 = 1.25 spsPerStride ...oops! fractional mask. that doesn't work, does it? try again
30'fps=30fps[?[10x[50sps]]] |600sps-60sps| = 540 strides @ |600sps|/540 = 1.[1..] spsPerStride ...Hmmm... I've been 'here' before :sly:
==> 30'fps=30fps[60sps[50sps]] Interpolation. It's the only way. Tell me the precision to which to "preserve the duration" and I'll give you something more informative.
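The stride arithmetic in the two conversions above can be reproduced with exact fractions. This Python sketch simply follows the formulas as written in this post (strides = |in - out|, spsPerStride = in / strides); the function name stride_arithmetic is invented, and the sketch does not claim these formulas are the only way to find a mask.

```python
from fractions import Fraction


def stride_arithmetic(rate_in, rate_out):
    """Return (strides, per_stride, is_whole) using the post's formulas."""
    strides = abs(rate_in - rate_out)
    per_stride = Fraction(rate_in, strides)
    return strides, per_stride, per_stride.denominator == 1


for rate_in, rate_out in [(60, 50), (100, 60), (300, 60), (600, 60)]:
    strides, per_stride, whole = stride_arithmetic(rate_in, rate_out)
    print(rate_in, rate_out, strides, per_stride, whole)
# 60 50 10 6 True
# 100 60 40 5/2 False
# 300 60 240 5/4 False
# 600 60 540 10/9 False
```

Only the NTSC-to-PAL case comes out whole under these formulas, which matches the "oops! fractional mask" remarks above.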

Next up:
... This style is typically used for audio focused content because audio is preserved - such as concert DVD's. e.g. if you started with 25p content , it's frame duplicated to 60000/1001, then "interlaced" and encoded as fields (If you look at separated fields, the first and last field in a group of 4 fields are selected, then weaved).
For this example, in practice (e.g. a studio) , 25p content is placed on an interlaced timeline (e.g. 29.97i) in a NLE which is set to frame duplicating for the interpolation.
(I know, I know... "what is interlaced timeline", just think "fields").
You could call this "25p in 29.97i" using duplication.
"Combing" is visible when viewed as frames, and it can be "reversed" with IVTC (field matching and decimation)
in pieces:
... if you started with 25p content , it's frame duplicated to 60000/1001 ...
You can't get from 25fps to 30fps by frame duplication.
You can get there only by interpolation and/or wildly complex telecine, neither of which touch the audio, so would be fine for concert material.
... If you look at separated fields, the first and last field in a group of 4 fields are selected, then weaved ...
Hmmm... I understand you of course. I could do it with modulo arithmetic in an FFmpeg filter graph

split[1][2][3][4],[1]select=eq(mod(n\,4)\,1)[5],[2]select=eq(mod(n\,4)\,2)[6],[3]select=eq(mod(n\,4)\,3)[7],[4]select=eq(mod(n\,4)\,0)[8],[5][8]interleave,weave[9],[6][7][9]interleave=nb_inputs=3

But I have a problem with the "weave" part.
For this, let's assume SAR 1:1, okay?
The 4 fields (which I call halfpics) start 2:1.
The 1st & 4th are weaved.
The woven image is now 1:1 (a picture) while the original 2nd and 3rd are still 2:1 (halfpics).
Now what?
Do you bob (double & weave) [6] & [7] before the 'interleave'?
That would give you 3 pictures (6 NTSC 'fields') for every 4 PAL 'fields' -- not what I'd call a 25fps-to-30'fps conversion, eh?

PS:
@poisondeathray, I think I can do everything you outline with the notation, I just have to understand what you wrote.

No doubt, you see how a comprehensive notation would serve as a sort of lingua franca in lieu of so much verbiage and confusion, eh?

this particular use-case to be continued...
this whole topic to be continued...

poisondeathray

4th November 2021, 02:38

Does "PsF" mean "progressive with separated fields"? If so, then they're halfpics, eh?

PsF = progressive segmented frames. You'd typically only see it in a studio setting. Most consumer displays do not support PsF, and SDI / HD-SDI connections are usually required (usually a consumer display won't have those connections). PsF is the official name used in ITU BT.709. PsF is a method of transmitting progressive content that makes it compatible with interlace equipment. Functionally, PsF is similar to "x" progressive content in "i". It can be thought of as a "field" in that sense. For example 25p content would be sent as 50 "segmented frames". Or 29.97p content sent as "59.94 segmented frames". It also has the same organization of a progressive frame split into even and odd lines as "interlace encoding". But 23.976PsF isn't "in 59.94" with pulldown like you would normally expect in DVD, or broadcast as "telecine"; 23.976PsF would mean "23.976p in 47.952 sf/s" (and 24PsF would mean "24/1p in 48 sf/s"). 47.952Hz and 48/1Hz are not common frequencies in consumer level hardware, hence the telecine for consumer consumption.
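The "progressive frame split into even and odd lines" idea can be sketched in Python. This is only an illustration: a frame is modelled as a list of scan lines, the function names are invented, and which segment travels first is an assumption here, not something taken from the BT.709 text.

```python
def split_segments(frame):
    """Split one progressive frame into two segments (even lines, odd lines)."""
    return frame[0::2], frame[1::2]


def weave_segments(first, second):
    """Rebuild the progressive frame by interleaving the two segments."""
    frame = []
    for even_line, odd_line in zip(first, second):
        frame.extend([even_line, odd_line])
    frame.extend(first[len(second):])   # leftover line when height is odd
    return frame


frame = ["line0", "line1", "line2", "line3"]
seg_a, seg_b = split_segments(frame)
print(seg_a, seg_b)                            # ['line0', 'line2'] ['line1', 'line3']
print(weave_segments(seg_a, seg_b) == frame)   # True
```

Because both segments come from the same instant, weaving them back gives the original picture exactly, with no deinterlacing needed. 25 frames per second therefore travel as 50 segments per second.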

If you're using this following definition "for halfpics" , I think it's similar concept wise , but the notation is a bit confusing

"hps" -- half-pictures/sec, e.g. 24fps[24hps], to indicate temporally aligned fields pulled out of a picture -- the MPEG folks use this term in one place that I've seen. In most places in the specs, they call framed fields "pictures" but in one section they call them "half pictures".

Are both aligned fields "pulled out" (the field pair for a progressive frame), at the same time ? So actually half pic "pairs" /s ? 24 half pics /s ([hps]?)would imply 12 full pictures /s . or was the decision because "hpps" too "wordy" ? ie. Shouldn't it be 48hps ?

Or does it indicate a single separated field ? (a half pic) . If so, which one of the pair ?

I think that 16fps & 18fps from film are 16pps & 18pps, while from a camcorder, they are actually 32sps & 36sps. So, when mastered for DVD, I think they'd be:

Camcorder - it depends on the recording setting. e.g. a common method is speeding up a variable speed projector to 18fps to 20fps , and recording camera at 60/1 fps progressive (triplicates). Then decimating back to 20fps (unique frames), and slowing it down to 18fps (same framecount, just slowed down) . For the DVD step, it's just telecine (but it can be soft telecine too; field repeats aren't necessarily encoded)

30'fps[50sps] (read: "50 shot scans per second played at 30/1.001 frames per second", implied: 2 scans per frame).

This of course is never done because James Earl Jones would sound like Mickey Mouse and walk like Charlie Chaplin.

25fps[60'sps] (read: "60/1.001 shot scans per second played at 25 frames per second", implied: 2 scans per frame).

This of course is never done because Mickey Mouse would sound like James Earl Jones and walk like Frankenstein's monster.

Yes, what you describe is never done, obviously. "PAL", "NTSC" was used to denote the region of the DVD, not the content (pps). "PAL" does not necessarily mean 50 fields/s interlaced content. It could mean 25p content (25pps) , such as in this case . "PAL<=> NTSC conversions that preserve framecount, but speedup or slowdown (duration changed)" - refers only to 23.976p <=> 25p content conversions - those are the only times the "speedup" and "slowdown" terms are ever used in the PAL<=>NTSC DVD video context. I guess I shouldn't have assumed it common knowledge.

... if you started with 25p content , it's frame duplicated to 60000/1001 ...

You can't get from 25fps to 30fps by frame duplication.

The first step is 25fps to 59.94fps via frame repeats in a pattern (ie. "25p in 59.94p").

But I have a problem with the "weave" part.

Only the selection set is weaved. The non selected are discarded

I'll need some time to digest the other stuff...

markfilipak

4th November 2021, 06:36

...continued

25fps v. a sort of 30fps that poisondeathray is trying to describe in words.
<---------------------------------------------------------------------------------------- 1/5 s ----------------------------------------------------------------------------------->
[A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [
[a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [
<---PICTURE-A---> <---PICTURE-B---> <---PICTURE-C---> <---PICTURE-D---> <---PICTURE-E---> <---PICTURE-F---> <---PICTURE-G---> <---PICTURE-H---> <---PICTURE-I---> <---PICTURE-J--->

Above: 25fps; below: a sort of 30fps that poisondeathray is trying to describe in words.

<------------------------------------------------------------------------- 1/6 s -------------------------------------------------------------------->
[A] [B] [B] [C] [D] [E] [F] [F] [G] [H] [
[a] [b] [c] [d] [d] [e] [f] [g] [h] [h] [
<----PICTURE-A----> <--PICTURE-B---> <---PICTURE-C---> <----PICTURE-D----> <----PICTURE-E----> <--PICTURE-F---> <---PICTURE-G---> <----PICTURE-H---->

Okay, here's what I see. In the lower diagram, I see standard, 4-to-5 telecine that, though actually 30fps, plays 25fps 'progressive'.
It's brilliant.
I never thought of doing something so simple, especially something like telecining 25fps and then marking it 30fps progressive.

@poisondeathray, kindly study the following carefully.
It's pretty astounding.

Here's the standard, 4-to-5 telecined 30'fps recipe: 30'fps=30fps[2+00+3+00+2+00+3[96hps[2x[24pps]]]] <== note that this starts with 24pps video
For a 24pps source, the mask 2+00+3+00+2+00+3 is a stride.
It has 16 halfpics, 6 of which are dropped.
There are 96/16=6 strides per second.
In them, 6x6=36 of the halfpics are dropped.
That leaves 96-36=60 left (or, another way, 6x(2+3+2+3) left).

Here's what, I think, you describe: 30'fps=30fps[2x[2+00+3+00+2+00+5+00+3+00+2+00+5+00+3+00+2+00+3+00[50hps[2x[25pps]]]]] <== but this starts with 25pps video
For a 25pps source, the mask 2+00+3+00+2+00+5+00+3+00+2+00+5+00+3+00+2+00+3+00 is a stride.
It has 50 halfpics, 20 of which are dropped.
There is 50/50=1 stride per second.
In it, 10x2=20 of the halfpics are dropped.
That leaves 50-20=30 left.
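The stride counts above can be checked mechanically. The sketch below assumes one reading of the mask notation, inferred from the counts given in the post: terms are separated by '+', a term made only of zeros drops one halfpic per zero, and any other term keeps that many halfpics. The helper `mask_counts` is a made-up name, not part of any tool.

```python
def mask_counts(mask: str) -> tuple:
    """Return (kept, dropped) halfpics for one stride of a mask.

    Assumed reading: '+' separates terms; an all-zero term drops one
    halfpic per zero; any other term keeps that many halfpics.
    """
    kept = dropped = 0
    for term in mask.split("+"):
        if set(term) == {"0"}:
            dropped += len(term)
        else:
            kept += int(term)
    return kept, dropped

# 24pps source: 96 halfpics per second after 2x duplication.
kept_24, dropped_24 = mask_counts("2+00+3+00+2+00+3")
strides_24 = 96 // (kept_24 + dropped_24)
print(kept_24 + dropped_24, dropped_24, strides_24 * kept_24)

# 25pps source, using the mask quoted in the post.
kept_25, dropped_25 = mask_counts(
    "2+00+3+00+2+00+5+00+3+00+2+00+5+00+3+00+2+00+3+00"
)
print(kept_25 + dropped_25, dropped_25, kept_25)
```

Under that reading the first mask gives a 16-halfpic stride with 6 dropped and 60 kept per second, and the second gives a 50-halfpic stride with 20 dropped and 30 kept, matching the figures in the post.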

Do you see this: "30fps[2x["?
First, it's real doubling -- I think it's the doubling you mentioned.
Second, it has to be there to keep the video and audio in sync.
Third, it means that the video that plays is actually 15pps playing at 30fps, not 25pps playing at 25fps -- did you know that the actual pictures are culled by 2/5ths?

The scheme is brilliant even though it does lose 2/5ths temporal resolution.

The point here is not whether what you wrote was 100% correct, or whether I understood only 50% of it.
The point is that the notation precisely describes it and does so compactly: 85 characters on 1 line, and unambiguously.
Yes, the notation takes some pondering, but it's a recipe that shows how to mechanically manipulate pictures, halfpics, and scans that teaches while it shows.

markfilipak

4th November 2021, 08:09

PsF = progressive segmented frames. ...
Yes, what I read about PsF was in a research paper from the late 90s.
If you're using this following definition "for halfpics" , I think it's similar concept wise , but the notation is a bit confusing
How confusing? A halfpic (as opposed to a scan, which is the same size) is the occupant of a field that was separated from a picture.
Yes, there are 2 halfpics in a picture.
No, they are not differentiated from one another -- they don't need to be as they only exist within a raw filter pipeline as 1/2-height virtual images.
Are both aligned fields "pulled out" (the field pair for a progressive frame), at the same time ?
Not necessarily. A halfpic could simply be copied out. It doesn't matter how a halfpic is created, only that it's a half picture. In the MPEG spec, halfpics are also called "pictures" but their use is not confusing because to them, any image is a picture. I call them halfpics simply to differentiate them from pictures -- halfpics are progressive from a progressive parent -- and from scans -- halfpics are the same 'size' as scans, but scan pairs are not progressive.
So actually half pic "pairs" /s ? 24 half pics /s ([hps]?)would imply 12 full pictures /s . or was the decision because "hpps" too "wordy" ? ie. Shouldn't it be 48hps ?
Yes, and yes. 24pps =(temporally)= 48hps (i.e. as streams, halfpics have 2x the rate of pictures).
In the notation, it's understood that since 2 halfpics come out of each picture, then 24pps and 48hps have the same 'data density' and can convert back and forth without processing.
Or does it indicate a single separated field ? (a half pic) . If so, which one of the pair ?
To the stream? It doesn't matter. To metadata if a halfpicture became framed? It still wouldn't matter because, at that point, the halfpic would become a picture in its own right and pictures don't have top and bottom.
Camcorder - it depends on the recording setting. e.g. a common method is speeding up a variable speed projector to 18fps to 20fps , and recording camera at 60/1 fps progressive (triplicates). Then decimating back to 20fps (unique frames), and slowing it down to 18fps (same framecount, just slowed down) . For the DVD step, it's just telecine (but it can be soft telecine too; field repeats aren't necessarily encoded)
Zero-length motion vectors compress to nothing. In my work I've found that HDTV at 24fps converted to 120fps via placebo transcoding resulted in files (and streams) that were 1/6th to 1/8th the size. Now, some of that was from going from H.264 to H.265, but I reckon that most of it is because, though there were more of them, the resulting 1/5th length motion vectors compressed much more, and with fewer planar fixups required by the target decoder. Those planar fixups must take up a lot of bits.
Yes, what you describe is never done, obviously. "PAL", "NTSC" was used to denote the region of the DVD, not the content (pps). "PAL" does not necessarily mean 50 fields/s interlaced content. It could mean 25p content (25pps) , such as in this case . "PAL<=> NTSC conversions that preserve framecount, but speedup or slowdown (duration changed)" - refers only to 23.976p <=> 25p content conversions - those are the only times the "speedup" and "slowdown" terms are ever used in the PAL<=>NTSC DVD video context. I guess I shouldn't have assumed it common knowledge.
No, no, my friend. I do think you can assume almost anyone will know that 24fps content (i.e. 24pps) is framed at 25fps without interpolation. 25fps[24pps], so 4% speedup.
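For reference, the 4% figure works out as follows. This is only a sketch of the arithmetic; the 120-minute running time is a made-up example value.

```python
from fractions import Fraction

shooting_rate = Fraction(24)   # pps: rate at which the pictures were shot
playing_rate = Fraction(25)    # fps: rate at which the frames are played

speedup = playing_rate / shooting_rate        # 25/24
percent_faster = float(speedup - 1) * 100     # a little over 4 percent

runtime_min = Fraction(120)                   # hypothetical film length
played_min = runtime_min / speedup            # shorter when played at 25fps

print(f"{percent_faster:.2f}% faster, {float(played_min):.1f} min")
```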

Something that helps:
25fps[24 pps] is a speed up because the frames are faster than the pictures and 'pull' the pictures out of the data (the playing rate) 1/25th second faster than they were put into the data (the shooting rate).
24fps[24pps] is a shooting rate and only applies to the source of the pictures: film, camera, etc.
25pps[24pps] is picture interpolation: 1 new picture for every 24.
2x[24pps] 3x[24pps] 4x[24pps] etc. are picture duplications.
1-24,24[24pps] doubles the last picture, only
-- the notation "1-24,24" is an example of a stride mapping (as opposed to a stride masking) which, in this case, is 1 stride per second.
25pps[1-24,24[24pps]] is the same -- '25pps' is simply added to clarify it but it doesn't 'do' anything.
Get it?
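A stride map such as "1-24,24" can be sketched as a list of 1-based positions applied to each stride. This is an illustration of the idea under that assumption; `expand_map` and `apply_map` are hypothetical helpers, not existing tools.

```python
def expand_map(spec: str) -> list:
    """Expand a map such as '1-24,24' into 1-based positions within a stride."""
    positions = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            positions.extend(range(first, last + 1))
        else:
            positions.append(int(part))
    return positions

def apply_map(spec: str, stride_len: int, pictures: list) -> list:
    """Apply the map to each full stride of `stride_len` pictures."""
    positions = expand_map(spec)
    out = []
    for start in range(0, len(pictures) - stride_len + 1, stride_len):
        stride = pictures[start:start + stride_len]
        out.extend(stride[p - 1] for p in positions)
    return out

one_second = list(range(1, 25))              # 24 pictures, numbered 1..24
result = apply_map("1-24,24", 24, one_second)
print(len(result), result[-2:])
```

With a 24-picture stride the map emits 25 pictures, the last one doubled, which is the 25pps[1-24,24[24pps]] case.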
The first step is 25fps to 59.94fps via frame repeats in a pattern (ie. "25p in 59.94p").
If I have that right, those are picture repeats. In the notation, you can get from 25pps to 60pps via 2 routes:
by employing a map
60fps[1,1,1,2,2,3,3,3,4,4,5,5,6,6,6,7,7,8,8,9,9,9,10,10,11,11,12,12,12,13,13,14,14,15,15,15,16,16,17,17,18,18,18,19,19,20,20,21,21,22,22,22,23,23,24,24,25,25[25pps]]
by employing a mask
60fps[2+0+2+0+1+0+1+0+2+0+2+0+2+0+1+0+1+0+2+0+2+0+2+0+2+0+1+0+1+0+1[3x[25pps]]]
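A repeat map like the one above can also be generated rather than typed by hand. The sketch below assumes a nominal 60 frames per second (not 60000/1001) and picks, for each output frame, the source picture whose time slot it falls in. That is one possible placement of the repeats, not necessarily the one any particular tool uses. `repeat_map` is a made-up name.

```python
from collections import Counter

def repeat_map(src_rate: int, dst_rate: int) -> list:
    """For each output frame in one second, give the 1-based source picture."""
    return [(i * src_rate) // dst_rate + 1 for i in range(dst_rate)]

m = repeat_map(25, 60)
repeats = Counter(m)          # how many times each picture is shown

print(",".join(map(str, m[:12])))
print(len(m), sorted(set(repeats.values())))
```

A 25-to-60 map needs 60 entries per second, so 10 of the 25 pictures are shown three times and the other 15 twice.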

Only the selection set is weaved. The non selected are discarded
Well, I gave up trying to figure out exactly which of the halfpics those were. The two ends of 4 'fields' left too many ways I could/did get that wrong. That's the trouble with words -- there's too many of them, they're too easy to misstate, they're too easy to mistake, and they're too easy to misunderstand.

I'll need some time to digest the other stuff...
Take all the time you need. :)

I'm just delighted to find someone who can verify the 'crazy notions' I hold.

markfilipak

4th November 2021, 10:06

"Field blending" ...
Format conversions that use weighted blending ...
I don't do cosmetics, ever. They only degrade images. I don't plan to add cosmetics to the notation. Besides that, people who apply cosmetics generally don't need help and don't preplan their work -- it's all ad hoc via knob twiddling.
The "'x' content in 59.94p" cases - e.g. 720p59.94 channels.
eg. 23.976p content with triplicate/duplicates, 29.97p content with duplicates,
That's 2x[30'pps] or, if from framed video, 2x[30'fps] or even 2x[30fps=30'fps] (just in order to simplify succeeding math).
"15p" (14.985p) content with 4x repeats,
That's 4x[15'pps] or 4x[15fps=15'fps]
"12p" (11.988p) content with 5x repeats, etc...
That's 5x[12'pps] or 5x[12fps=12'fps]
Sometimes you don't know the provenance ...
Why not? There's FFprobe, MediaInfo, the readout built into MPV. Do you know something I don't know about them (or others)? Or are you giving a standard, 'YMMV' disclaimer?
So I think a guideline should be write what you have, unless you absolutely know it was converted from something else .
I do the converting, usually to 120fps.
eg. Peter Jackson shot at 48fps for some scenes, but does it matter to describe what is in your hands ?
Yes, indeed. That is, if I want to convert to 120fps.
If the current scene is 24/1 p content (not slowed down, just decimated), that's what you have in your hands.
Mixed content in "layers" - such as overlays, scrolling titles over a background , lower thirds, news tickers.
The background might have different pps than the foreground elements, not sure how - or if - you want to describe it
Those are special effects. As with cosmetics, they don't need documentation.
"Improperly" produced content, such as interlaced content encoded as progressive frames instead of fields.

It happens in real world.
You see a disproportionate amount here and other forums, because people discuss "problems", they don't strike up conversations up because stuff just "works."
e.g. 29.97i content encoded as a 29.97p stream, instead of the proper 29.97i stream as encoded fields.
There's no notation because there's no structural change made (just metadata, only) and there's no conversion.
Field shifting (also called phase shifted) is essentially field misalignment.
Not super common, but not super rare either.
It occurs in real life such as on retail DVD's , broadcast channels, even some cameras.
eg. a normal progressive content stream is AaBbCcDd . aBbCcDdE is "field shifted" and looks "combed" when viewing as frames .
Not sure what the notation would be ,or if you want to describe it.
That would be indicated thus: 2,1[...hps[... (meaning: a stride map with a stride of 2 -- 2 halfpics in this case -- showing a reversed stream order).
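The quoted AaBbCcDd example can be sketched in a few lines, assuming an upper-case and a lower-case letter denote the two fields of one picture. Pairing from the first field gives matched frames; pairing from the second field gives frames built from two different pictures, which is what looks "combed". `pair_fields` and `is_matched` are hypothetical helpers.

```python
def pair_fields(fields: str, offset: int = 0) -> list:
    """Group a field sequence into frames of two, starting at `offset`."""
    usable = fields[offset:]
    return [usable[i:i + 2] for i in range(0, len(usable) - 1, 2)]

def is_matched(frame: str) -> bool:
    """True when both fields of the frame come from the same picture."""
    return frame[0].lower() == frame[1].lower()

stream = "AaBbCcDdEe"
aligned = pair_fields(stream)              # frames start on the first field
shifted = pair_fields(stream, offset=1)    # frames start one field late

print(aligned, all(is_matched(f) for f in aligned))
print(shifted, any(is_matched(f) for f in shifted))
```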

@poisondeathray, I'm done with the homework assignment. I'm eagerly looking forward to seeing my grade.

markfilipak

4th November 2021, 17:35

I think "PsF" is a good example of an ad hoc notation -- a name can be considered a type of notation, eh?

To a master, PsF is a yawn.

To a novice, PsF can seem a mystery when the explanation is couched in what seem like vague terms or terms taken out of context based on some tool that masters use and what that tool is called (as a piece of hardware -- "telecine" comes to mind -- or as a NLE setting for example).

To an apprentice, PsF may sow doubt about prior knowledge (about the whole basis of prior knowledge) if there are loose ends. Loose ends can happen when a newly mentioned terminology is misunderstood -- "This can't be what I think it is, so what I thought prior must have been wrong in some important respect".

A fixed and clearly understood notation can cut through all that. It saves masters having to write a bunch of words, it takes novices to where they need to be surely and quickly, and it provides the context to keep apprentices on the correct path.

But I think I may be preaching to the choir here, eh?

markfilipak

4th November 2021, 17:52

From what I sense, there's one aspect of video structure that could/(should?) be folded into a good notation: color. A colorspace, for example, is more structural than it is cosmetic. Perhaps a change of colorspace can be incorporated into the notation, so to show at what step in the journey from source to target a change, if any, was/can be made.

I don't know a whole lot about colorspaces because I've never needed to know. And I don't know how a colorspace change could be best incorporated into a general notation -- perhaps as a dotted property of a stream, or perhaps as a process denoted by braces, or ...(? something else).

What do you think?

markfilipak

4th November 2021, 18:38

I'd consider yadif, for example, to be a video cosmetic, a very good video cosmetic (and clever) but a cosmetic nonetheless. I used it before I learned about structure and mechanical repair. Most video tools and transcoder filters are really cosmetics, eh? And it seems that most of the tool developers' interests lie in creating special purpose (use-case) cosmetics.

Why is that? Why do developers spend so much time and effort writing cosmetics?

The FFmpeg documentarians throw the cosmetics into a general list of filters without overarching classifications that separate the mechanical tools from the cosmetics. Why is that? Have they been lazy? thoughtless? simply disorganized? I'll tell you this: It creates no end to confusion in the minds of novices. (Of course, it doesn't really matter why/how FFmpeg got to where it is now, but what about Avisynth? Vapoursynth? Are they any better organized either functionally or by documentation/use?)

When I worked as a video architect at Atari in the early 80s -- I didn't know I was a video architect, but that's what I was -- I came up with all sorts of novel ways to get proper NTSC timing and more than 80 text characters per line out of the relatively primitive, low clock rate digital electronics of the day. To be fair about it, creating sprite graphics to feed to NTSC and PAL and SECAM TVs in real time was daunting.

My point is that the state of video cosmetics today seems to me to be a lot like the Atari kludge circuits some folks cooked up in the early Atari days. I look at the specifications and think, "What's the point?"

Perhaps the thoughts of masters here at Doom9 can contribute to either a new appreciation of video cosmetics or a renewed disdain for them. Do video cosmetics belong in a notation and, if so, how would they be incorporated?

markfilipak

4th November 2021, 20:40

poisondeathray

4th November 2021, 20:58

RE halfpics:

I would clarify the rate , vs. composition

Yes, there are 2 halfpics in a picture.

2 halfpics in a full pic: makes sense, in terms of composition

24pps =(temporally)= 48hps (i.e. as streams, halfpics have 2x the rate of pictures).

Makes sense

In post 31 , the handy reference - you wrote

- hps is understood to be pps/2, even when pps is not explicitly stated.

"hps understood to be pps/2 " - not in terms of rate, but in terms of composition

Same with this:

- sps is understood to be pps/2, even when pps is not explicitly stated.

I would clarify - not in terms of rate, but in terms of composition

I call them halfpics simply to differentiate them from pictures -- halfpics are progressive from a progressive parent -- and from scans -- halfpics are the same 'size' as scans, but scan pairs are not progressive.

So a halfpic is essentially a "field" from a progressive parent, and a scan is a field from a "non progressive" parent (what is referred to as "interlaced content" on video forums)?

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if pps != 2*hps).

- fps[sps interlaces & frames scan pairs, beginning with a stream's 1st scan, (and produces fast/slow motion if fps != 2*sps).
- pps[sps interlaces scan pairs, beginning with a stream's 1st scan, (and produces fast/slow motion if pps != 2*sps)

You should define what you mean by "interlaces", because as you know there are different meanings and context - it's an area of potential confusion. In this context, I think you mean something like "putting together". What did you mean by "frames" ? Does "frames" in this context (as a verb?) mean something similar to "weave" ?

What is the distinction between fps[hps] vs. pps[hps] ? Is it arrangement ? Does fps[hps] mean "weaved" half pic pairs resulting in a full picture, at half the hps rate ?

Does pps[hps] mean half pics arranged individually? Basically, "separate fields", but for "full pics"

And I assume fps[sps] and pps[sps] would be analogous but for "scan field"

- ...fps...[pps] is what poisondeathray calls "encoded progressively", I think.
- ...fps...[sps] is what poisondeathray calls "encoded interlaced", I think.

progressive content, encoded progressively is ok

But progressive content, "encoded interlaced" should be ...something...[pps] , I think... The "encoded interlaced" vs. "encoded progressive" can occur after the acquisition step. Pretend it's a 25p native progressive camera [25pps] acquisition. Then you re-encode with, say, ffmpeg and you either (a) "encode progressive" or (b) "encode interlaced". If the right side [] brackets indicate what it "is", or the "content" - both should be [25pps]

I would have thought "encoded interlaced" would be sps[pps], or 50sps[25pps] in this example, because "encoded interlaced" can be thought of as the original 25pps being put through a scan rate in a sense, even though the acquired content is progressive. But you have "sps[pps is invalid."

That example would be "25p content in 25i" . Or if expressed as the field rate: "25p content arranged in 50 fields per second"
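To make "25p content arranged in 50 fields per second" concrete, here is a minimal sketch that splits one progressive picture into its even-line and odd-line fields and weaves them back. It uses plain lists of rows and assumes an even number of rows; the function names are made up for illustration and do not refer to any filter.

```python
def separate_fields(picture: list) -> tuple:
    """Split a picture (a list of rows) into even-line and odd-line fields."""
    return picture[0::2], picture[1::2]

def weave(top: list, bottom: list) -> list:
    """Interleave two fields line by line to rebuild a full-height picture."""
    rows = []
    for t, b in zip(top, bottom):
        rows.extend([t, b])
    return rows

# A tiny 6-row by 4-column test picture with distinct values per pixel.
picture = [[y * 10 + x for x in range(4)] for y in range(6)]

top, bottom = separate_fields(picture)
rebuilt = weave(top, bottom)

print(len(top), len(bottom), rebuilt == picture)
```

Because both fields come from the same instant, weaving them restores the original picture exactly, which is why the progressive and interlaced arrangements of 25p content carry the same pictures.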

These are essentially the same if they are handled correctly in the display chain. e.g. In the UK, on a 50Hz display, you would see 25p with duplicate frames in both cases. (Flagging differences, and mostly negligible differences in encoding). So visually, that's no different than the 25fps[25pps] case (the encoded progressive case) ... But I mentioned the potential issues, and why people make the distinction earlier. I thought you decided not to include it, or not distinguish between them

poisondeathray

4th November 2021, 21:04

...continued

25fps v. a sort of 30fps that poisondeathray is trying to describe in words.
<---------------------------------------------------------------------------------------- 1/5 s ----------------------------------------------------------------------------------->
[A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [
[a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [
<---PICTURE-A---> <---PICTURE-B---> <---PICTURE-C---> <---PICTURE-D---> <---PICTURE-E---> <---PICTURE-F---> <---PICTURE-G---> <---PICTURE-H---> <---PICTURE-I---> <---PICTURE-J--->

Above: 25fps;

The schematic shows 10 "pictures" over 1/5 s ; that would indicate 50p content (or [50pps] ) .

Yes, the notation takes some pondering, but it's a recipe that shows how to mechanically manipulate pictures, halfpics, and scans that teaches while it shows.

It's going to take some time - But I can see the potential usefulness in some situations, as it breaks everything out. In other cases, it might be too verbose. If I say "drop 25p content on a 29.97i timeline" , almost every editor knows what I mean. In most discussion contexts, you don't have to spell out the exact pattern or how you got there.

The point here is not whether what you wrote was 100% correct, or whether I understood only 50% of it.

Yes, but "common ground" examples are required to describe what is occurring, in order to understand the proposed notation. What I described is a common conversion in practice.

Third, it means that the video that plays is actually 15pps playing at 30fps, not 25pps playing at 25fps -- did you know that the actual pictures are culled by 2/5ths?

The scheme is brilliant even though it does lose 2/5ths temporal resolution.

I didn't take a full look at it yet, but it can't be what I described. Put on the "common sense hat" for a second. In generic terms, you have 25p content [25pps] , organized in 59.94 fields. How is it possible that you "lose" temporal resolution? It would be a poor conversion (ie. it would never be done like that). Also, I mentioned this could be "reversed" with IVTC, so you have the 25p content back - that means the full temporal resolution is contained in the 59.94 fields.
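The "no temporal resolution is lost" argument can be illustrated with a sketch. It assumes a nominal 60 fields per second (not 60000/1001), alternating top/bottom parity, and that each output field is taken from the source picture whose time slot it falls in. That placement is an assumption for illustration, not a description of any specific tool.

```python
def fields_for_one_second(src_pictures: int = 25, dst_fields: int = 60) -> list:
    """Return (picture, parity) for each output field; parity alternates t, b."""
    out = []
    for j in range(dst_fields):
        picture = (j * src_pictures) // dst_fields
        parity = "t" if j % 2 == 0 else "b"
        out.append((picture, parity))
    return out

fields = fields_for_one_second()

# Collect which parities are available for each source picture.
parities = {}
for picture, parity in fields:
    parities.setdefault(picture, set()).add(parity)

# A picture can be rebuilt when both its top and bottom fields are present.
recoverable = [p for p, seen in sorted(parities.items()) if seen == {"t", "b"}]
print(len(fields), len(recoverable))
```

Every one of the 25 source pictures contributes at least one top and one bottom field, so field matching can in principle rebuild all 25.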

I'll try to use other words , or a schematic to describe what is occurring. Or if you want, a vapoursynth or avisynth script. Or a before/after video. A video sample is worth 10,000 words in video forums. If both of us can verify examples, then that can be used for learning. So I'll revisit this in a bit...

poisondeathray

4th November 2021, 21:05

Sometimes you don't know the provenance ...

Why not? There's FFprobe, MediaInfo, the readout built into MPV. Do you know something I don't know about them (or others)? Or are you giving a standard, 'YMMV' disclaimer?

That's not what I'm getting at. FFProbe, MediaInfo, MPV, etc. don't describe the possible multiple steps in the journey to what you have now, or the content itself. It seems like it can be easy to make the notation longer than it has to be to communicate something basic. I'm just suggesting that people describe what is in their hands for certain, not what it might have been 16 steps ago, because you don't necessarily know what was done. I gave the 24000/1001 acquisition example - that's what it actually started as - so you wouldn't second guess whether it was actually 24/1. But sometimes the context gives you "clarity", eg. you know an old film was shot on 24/1p celluloid.

RE - "Cosmetics" ;

I don't do cosmetics, ever. They only degrade images. I don't plan to add cosmetics to the notation. Besides that, people who apply cosmetics generally don't need help and don't preplan their work -- it's all ad hoc via knob twiddling

Unfortunately these are fairly common real life scenarios, and they occur on professional distributions (The field blending case). Many of the threads in video forums deal with various types of issues, and how to improve them.

I'm not sure how wide or narrow your definition of "cosmetics" is; but they might be for improvement purposes, either objective or subjective -

eg. If you watch an interlaced content [sps] home video of relatives, in your living room - Your modern flat panel TV will deinterlace 60000/1001 fields per second content, to 60000/1001 progressive frames per second. The method of deinterlacing matters, and is a "cosmetic" . Some TV's do a terrible job. (Expensive ones can do a good job). If you can produce a better result in software than your current TV, then why not? eg. Compare yadif to QTGMC. Enormous quality differences. Most "cheap" TV's do something similar to a "bob" deinterlace

Or you upload an interlaced content video to youtube, are you going to let YT butcher it, or control the high quality "cosmetic" used ?

Motion interpolation can be considered "cosmetic", especially when using ones that resample original frames

Interpolating dropped frames and duplicates during a bad capture is a "cosmetic"; it changes the current structure, but it is a definite improvement

Other "cosmetics" might include:
The chroma upsampling algorithm used is a "cosmetic" (how subsampled YCbCr is converted to RGB for display). The algorithm you use can make a difference in quality on color borders: blocky colors versus smooth color borders

Sometimes you're improving or "fixing" various issues. e.g. a video is too underexposed, too dark. Or maybe undersaturated. Maybe you want to denoise a video. Those are all "cosmetic" manipulations. Sharpening a blurry video. Adjusting contrast.

poisondeathray

4th November 2021, 21:11

I'm unsure what these are. Sometimes I think they're colorspaces, and other times I think they're macroblock structures, and yet other times I think they're somehow related to "packed" v. "planar". The differentiation may be regarding pixel structure and/or color structure and/or data access topology (or all of the above). I simply don't know.

I've seen many clues, but nothing definitive. Does anyone have any better links than the ones I've found?

You can download the official documents from ITU

https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.601-7-201103-I!!PDF-E.pdf
https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf

markfilipak

4th November 2021, 22:45

Good inputs! Thanks.
RE halfpics:
I'm going to respond fully in the near future, but I wanted to reflect a bit, first.
You know, I've never read a fully satisfying description of the bizarre video ecosystem in which video is son-of-cinema and cinema (now) is son-of-video. Some begin with, "The Earth: a molten ball in space", and proceed to relate the evolution of cinema and video. Others begin with right now and describe the What-Is, and leave it at that. Some day I will write something between those two poles.

markfilipak

5th November 2021, 02:14

RE halfpics:

I would clarify the rate , vs. composition
Yes, of course. My errors.
I've corrected both. Thanks for pointing that out. Yes, I inadvertently mixed temporal and spatial.

So a halfpic is essentially a "field" from a progressive parent, and a scan is a field from a "non progressive" parent (what is referred to as "interlaced content" on video forums)?
That's correct, though to me, pictures and scans evolved first, then fields, then frames. In analog TV, fields are what existed and frames were almost ethereal. Today, frames are everything and fields almost seem like afterthoughts.

What's important to me is to decouple pictures (and halfpics and scans) from the frames (and fields) that transport them (much like MPEG-ESs transport frames). It's harder to see image transport because the images so strongly dominate the process (at least, to our perceptions). I visualize the process like a shutter box: The shutter flies around while the pictures in the box change, but the rate of picture change and the rate of the shutter (i.e. frame) are actually independent of one another (or, at least, they can be independent). Tying frames (and fps) to pictures (and pps) is misleading, methinks. Look how many people think the pictures are the frames. Such prejudices really limit their ability to truly understand video and what can be done to/with video, eh?

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic
You should define what you mean by "interlaces", because as you know there are different meanings and context - it's an area of potential confusion.
Good insight! Would you say this is clearer: "- fps[hps is an operation that interlaces pairs of halfpics, beginning with a stream's 1st pair, and then frames them"
In this context, I think you mean something like "putting together". What did you mean by "frames" ? Does "frames" in this context (as a verb?) mean something similar to "weave" ?
Yup, verb. Nope, not weave. It simply means to put into a frame.

Frame [n]: a border or case for enclosing a picture, mirror, etc.; [v] to put into a frame.
What is the distinction between fps[hps] vs. pps[hps] ? Is it arrangement ? Does fps[hps] mean "weaved" half pic pairs resulting in a full picture, at half the hps rate ?
The notation is pretty brain-dead simplistic and mechanical.
fps[hps] means to take a source, [hps], and put it in frames, fps. Consequentially, the hps can assume to become interlaced as a picture (with/without combing) -- the assumed parts are not explicitly shown in the notation but could be (as "fps[pps[hps]]" for example).
pps[hps] means to take a source, [hps], and interlace them to form a picture, pps, but don't frame them yet.

As you might have noticed, I'm using 'pps', for example, as both a rate-&-units token and as a moniker for the picture, itself. I thought long and hard about that. I decided that doing so is so damned efficient that it's irresistible, and that modern folks in the Age of Apps will catch on pretty quickly. What do you think? Have an opinion?
Does pps[hps] mean half pics arranged individually? Basically, "separate fields", but for "full pics"
No, but good question... 'pps' means pictures per second in a stream of pictures, and 'hps' means halfpics per second in a stream of half pictures -- think like FFmpeg's filter graph. So, what I describe is really a stream conversion: a stream of halfpics (on the right) becoming a stream of pictures (on the left). -- Damn! Sam! You are one great editor.
And I assume fps[sps] and pps[sps] would be analogous but for "scan field"
Yes, exactly.
But progressive content, "encoded interlaced" should be ...something...[pps] , I think...
Yes. That's exactly the way I think of 'pps'.
So,
- there's 'pps': both a moniker for a stream of pictures and their rate,
- there's 'hps': both a moniker for a stream of halfpics and their rate,
- there's 'sps': both a moniker for a stream of scans and their rate,
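
For readers who want to experiment with the notation, here is a minimal sketch of a parser for it. It is not part of the thread: the function names are hypothetical, and the only rules it encodes are the two stated in this discussion (a halfpic stream cannot be a source, and "sps[pps" is invalid). It reads a string such as "25fps[50hps[25fps]]" and returns the stages ordered from source (rightmost) to target (leftmost), tolerating the dropped trailing brackets seen in forms like "fps[hps".

```python
import re

# A token is an optional rate (digits, optional decimals, optional prime)
# followed by one of the four units used in the thread.
TOKEN = re.compile(r"^(\d+(?:\.\d+)?'?)?(fps|pps|hps|sps)$")


def parse_stream_notation(text):
    """Split e.g. '25fps[50hps[25fps]]' into stages ordered source -> target.

    Each stage is a (rate, unit) tuple; rate is a string, or None if omitted.
    """
    text = text.strip()
    if text.count("]") > text.count("["):
        raise ValueError(f"too many closing brackets in {text!r}")
    parts = text.rstrip("]").split("[")
    if parts and parts[0] == "":
        parts = parts[1:]  # a bare source such as "[25pps]"
    stages = []
    for part in parts:
        match = TOKEN.match(part)
        if match is None:
            raise ValueError(f"unrecognised token {part!r}")
        stages.append(match.groups())
    stages.reverse()  # the rightmost token is the source
    return stages


def rule_violations(stages):
    """Check only the two rules stated in the thread (an assumption of scope)."""
    problems = []
    if stages and stages[0][1] == "hps":
        problems.append("a halfpic stream cannot be a source")
    for (_, src), (_, dst) in zip(stages, stages[1:]):
        if src == "pps" and dst == "sps":
            problems.append("scans cannot be derived from a picture")
    return problems


print(parse_stream_notation("25fps[50hps[25fps]]"))
print(rule_violations(parse_stream_notation("50sps[25pps]")))
```

Keeping the rate as a string avoids guessing what a primed rate such as 30' should evaluate to.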
The "encoded interlaced" vs. "encoded progressive" can occur after the acquisition step.
Good word, that: "acquisition".
Pretend it's a 25p native progressive camera [25pps] acquisition. Then you re-encode with, say, ffmpeg and you either (a) "encode progressive" or (b) "encode interlaced". If the right side [] brackets indicate what it "is", or the "content" - both should be [25pps]
Erm... "encode interlaced"? That's a ...dangerous way to put it. Here's where we temporarily part company and I (maybe) insist that I'm right.
No offense meant.
In my mind, in a raw stream, there's only pictures or halfpics or scans. And I've invented halfpictures solely in order to rationalize/'explain' certain things (like the next couple of sentences).
In an encoded stream, well, unless my education is incomplete, there's only 2 halfpics (or 2 scans) inside 2 fields, and 2 fields inside each frame. I think you (and, granted, maybe many others) try to imply the properties of the encode's source as either "encode progressive" or "encode interlaced", but the notation (which is an analog of how I think) makes it explicit by putting the source in brackets, e.g. "[pps]". Now I know that may seem a matter of semantics, but it goes to the heart of the notation and helps to make the notation work and to be efficient. On the other hand, if you can visualize "[pps]" as "encode progressive" and "[sps]" as "encode interlaced", well, that's just fine with me and makes me happy. If not, well, that's okay too, but why? ...To you, doesn't "encode interlaced" simply mean "interlaced" (meaning: consisting of scans)? Or, alternatively, after encoding, there's a very complex structure of interleaved quad pixels and sub-/half-blocks & blocks along color pixel planes and macroblocks and slices... is that what you mean by "interlaced"? As I once said, I have a problem with the word "interlaced".
I would have thought "encoded interlaced" would be sps[pps], or 50sps[25pps] in this example, because "encoded interlaced" can be thought of as the original 25pps being put through a scan rate in a sense; even though the acquired content is progressive. But you have "sps[pps is invalid."
I think -- I may be wrong -- that when you write "encoded interlaced", you refer solely to "[sps]", right?
Well, "sps[pps" is invalid because, in my mind, you can't get (temporally skewed) scans from a picture.
That example would be "25p content in 25i" . Or if expressed as the field rate: "25p content arranged in 50 fields per second"
Okay, I think I understand you. I'd say, "No, you can separate out the fields of 25pps (aka 25p), but you don't get 50sps (aka 25i) because the fields are temporally offset, one to the next, by 1/(frame rate). What you get from 25pps is 50hps (which has no analog in the 'p' & 'i' world)."
They are essentially the same if they are handled correctly in the display chain. e.g. In the UK, on a 50Hz display,
Sure, but you can't just wave your hand and say that and rest assured that everyone knows what you mean. That -- cases like that -- is what the notation attempts to clarify.

Look, let me be honest here. To a certain degree, a new notation or method makes an implied request: "Think like me. Think like I do." I admit it. And I admit that may not appeal to everyone. But the choice of syntax implicitly asks all aspiring codesmiths to "Think like me", does it not? And a new scientific theory implicitly asks all aspiring scientists to "Think like me", eh?
You would see 25p with duplicate frames in both cases. (Flagging differences, and mostly negligible differences in encoding). So visually, that's no different than 25fps[25pps] case (the encoded progressive case) ... But I mentioned the potential issues, and why people make the distinction earlier. I thought you decided not to include it, or not distinguish between them
Sorry, I got lost between the cases. Could you quote directly before your response so I can match them up?

poisondeathray

5th November 2021, 06:23

- fps[hps interlaces & frames halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if hps != 2*fps).
- pps[hps interlaces halfpic pairs, beginning with a stream's 1st halfpic, (and produces fast/slow motion if hps != 2*pps).
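
The fast/slow-motion remark can be checked with a little arithmetic. The sketch below is an illustration only, with a hypothetical function name; it assumes each output frame (or picture) consumes exactly two halfpics, so n halfpics last n/hps seconds at the source and n/(2*fps) seconds at the target. Under that assumption, normal speed corresponds to hps == 2*fps.

```python
from fractions import Fraction


def playback_speed(target_rate, halfpic_rate, halfpics_per_unit=2):
    """Speed factor when halfpics are paired into frames (or pictures).

    1 means normal speed, > 1 fast motion, < 1 slow motion.
    """
    return Fraction(target_rate) * halfpics_per_unit / Fraction(halfpic_rate)


print(playback_speed(25, 50))  # 25fps[50hps]: normal speed
print(playback_speed(30, 50))  # 30fps[50hps]: faster than the source
print(playback_speed(24, 50))  # 24fps[50hps]: slower than the source
```

Using Fraction keeps the ratios exact, which matters once rates such as 30000/1001 enter the picture.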

fps[hps] means to take a source, [hps], and put it in frames, fps. Consequently, the hps can be assumed to become interlaced as a picture (with/without combing) -- the assumed parts are not explicitly shown in the notation but could be (as "fps[pps[hps]]" for example).
pps[hps] means to take a source, [hps], and interlace them to form a picture, pps, but don't frame them yet.

In the pps[hps] case - What is the distinction between 2 half pics combined to form a picture vs. not yet a frame ? Isn't the "picture" already a "frame" in this case ?

A halfpic will never be a 1st generation source, or a distributed format - what is the rationale for including [hps] ? I understand the potential usefulness of it on the left slots, but not the far right "starting point" .

Let's say you start with 25p content, native progressive, camera acquisition 25fps[25pps]. Now you separate to a halfpic stream. Pretend that is your new starting point "source" [50hps]. If you "interlace" them to form a picture you have 25pps[50hps]. Isn't that what you started with, really 25fps[25pps[50hps]]?

As you might have noticed, I'm using 'pps', for example, as both a rate-&-units token and as a moniker for the picture, itself. I thought long and hard about that. I decided that doing so is so damned efficient that it's irresistible, and that modern folks in the Age of Apps will catch on pretty quickly. What do you think? Have an opinion?

I can't say yet... still digesting

Erm... "encode interlaced"? That's a ...dangerous way to put it. Here's where we temporarily part company and I (maybe) insist that I'm right.
No offense meant.
In my mind, in a raw stream, there's only pictures or halfpics or scans. And I've invented halfpictures solely in order to rationalize/'explain' certain things (like the next couple of sentences).
In an encoded stream, well, unless my education is incomplete, there's only 2 halfpics (or 2 scans) inside 2 fields, and 2 fields inside each frame. I think you (and, granted, maybe many others) try to imply the properties of the encode's source as either "encode progressive" or "encode interlaced", but the notation (which is an analog of how I think) makes it explicit by putting the source in brackets, e.g. "[pps]". Now I know that may seem a matter of semantics, but it goes to the heart of the notation and helps to make the notation work and to be efficient. On the other hand, if you can visualize "[pps]" as "encode progressive" and "[sps]" as "encode interlaced", well, that's just fine with me and makes me happy. If not, well, that's okay too, but why? ...To you, doesn't "encode interlaced" simply mean "interlaced" (meaning: consisting of scans)? Or, alternatively, after encoding, there's a very complex structure of interleaved quad pixels and sub-/half-blocks & blocks along color pixel planes and macroblocks and slices... is that what you mean by "interlaced"? As I once said, I have a problem with the word "interlaced".

We already discussed this; "encoded interlaced" is the button you push and it changes the way an encoder functions and flags the output (with potential consequences). "Encoding interlaced" is not the same as "encoding progressive". Recall the YUV1 vs. YUV2 example in post 28. Both are functionally similar, but different from each other and the source. This interlaced encoding vs. progressive encoding terminology is entrenched in all software, professional/commercial and freeware, you can't escape it - so learn to live with it or translate it on the fly in your head. Again, "encoded interlaced vs. encoded progressive" is NOT a content description (not a pps description). I explained earlier (post 26), it's completely independent of the source properties. "Content" is described by "x content encoded interlaced", or "x content encoded progressive". "x content" is the part that you can think of as your [pps], or [sps].

Yes, again problems with "interlaced" terminology. It's used differently in different places, and you're not going to change the way it's used in professional or non-professional settings. e.g. "interlaced timeline" in an NLE. e.g. A 29.97i timeline means assets placed on the timeline conform to 59.94 fields per second. Live with it.

I would have thought "encoded interlaced" would be sps[pps], or 50sps[25pps] in this example, because "encoded interlaced" can be thought of as the original 25pps being put through a scan rate in a sense; even though the acquired content is progressive. But you have "sps[pps is invalid."

I think -- I may be wrong -- that when you write "encoded interlaced", you refer solely to "[sps]", right?
Well, "sps[pps" is invalid because, in my mind, you can't get (temporally skewed) scans from a picture.

No, "encoded interlaced" (or "encoded progressive") is independent of [] . It can be [sps] or [pps]

That example would be "25p content in 25i" . Or if expressed as the field rate: "25p content arranged in 50 fields per second"

Okay, I think I understand you. I'd say, "No, you can separate out the fields of 25pps (aka 25p), but you don't get 50sps (aka 25i) because the fields are temporally offset, one to the next, by 1/(frame rate). What you get from 25pps is 50hps (which has no analog in the 'p' & 'i' world)."

To reiterate what you mentioned earlier a few pages back - you decided against including , or there is no way to distinguish it (encoded interlaced vs. progressive) in the notation, and that's fine

You would see 25p with duplicate frames in both cases. (Flagging differences, and mostly negligible differences in encoding). So visually, that's no different than 25fps[25pps] case (the encoded progressive case) ... But I mentioned the potential issues, and why people make the distinction earlier. I thought you decided not to include it, or not distinguish between them

Sorry, I got lost between the cases. Could you quote directly before your response so I can match them up?

This was referring to "25p content encoded interlaced" vs. "25p content encoded progressive"

Another situation - consider an old home video recording, say DV. Interlaced recording mode (some DV variants had progressive modes). The camera uses an old style CCD sensor - so the sensor itself also works in "field scan" (as opposed to more modern CMOS sensors, progressive scan, where "interlaced content" [sps] outputs are created in camera after the sensor, but before the recording to media in the CMOS case). For the DV recording, during motion, each field represents a different image. So the DV camera's output recording is 29.97fps[59.94sps]. If there is no motion, both fields of a pair are from the same image - it's progressive content during that scene (unless it's mishandled by the playback chain, e.g. deinterlaced). 1) Let's say you receive a clip, no other information about the camera or background info, and the recording was that static scene only - what would you "label" it? 2) Let's say you have all the information - what would you "label" it?

markfilipak

5th November 2021, 21:26

Correction.
This: http://forum.doom9.org/showthread.php?p=1956536#post1956536, is interesting, and I'll investigate what it does to see if it might be useful, but it's wrong.

<------------------------------------------------------------------------------------- 1/5 s -------------------------------------------------------------------------------------->
[25fps] [A+a_______________________________][B+b_______________________________][C+c_______________________________][D+d_______________________________][E+e_______________________________]
50hps[25fps] (A_______________)(a_______________)(B_______________)(b_______________)(C_______________)(c_______________)(D_______________)(d_______________)(E_______________)(e_______________)
6x[50hps[25fps]] (A)(A)(A)(A)(A)(A)(a)(a)(a)(a)(a)(a)(B)(B)(B)(B)(B)(B)(b)(b)(b)(b)(b)(b)(C)(C)(C)(C)(C)(C)(c)(c)(c)(c)(c)(c)(D)(D)(D)(D)(D)(D)(d)(d)(d)(d)(d)(d)(E)(E)(E)(E)(E)(E)(e)(e)(e)(e)(e)(e)
1+0000[6x[50hps[25fps]]] (A____________)(A____________)(a____________)(B____________)(b____________)(C____________)(c____________)(c____________)(D____________)(d____________)(E____________)(e____________)
1,3,2,5,4,7,6,8-12[1+0000[6x[50hps[25fps]]]] (A____________)(a____________)(A____________)(b____________)(B____________)(c____________)(C____________)(c____________)(D____________)(d____________)(E____________)(e____________)
30fps[1,3,2,5,4,7,6,8-12[1+0000[6x[50hps[25fps]]]]] [A+a_________________________][A+b_________________________][B+c_________________________][C+c_________________________][D+d_________________________][E+e_________________________]

@poisondeathray, Is the final frame sequence what you tried to describe?
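
The diagram above can be reproduced mechanically, which is a handy way to check a stride by hand. The following is a minimal sketch, not the author's tooling: the function names are hypothetical, and it assumes that "6x" repeats each halfpic six times, that the mask "1+0000" keeps one item and drops the next four, and that the map list gives 1-based positions within one 12-item stride.

```python
def halfpics(frames):
    """50hps[25fps]: unframe each 'X+x' frame into its two halfpics."""
    out = []
    for frame in frames:
        out.extend(frame.split("+"))
    return out


def repeat(stream, times):
    """6x[...]: repeat every element `times` times."""
    return [item for item in stream for _ in range(times)]


def mask_stride(stream, mask):
    """1+0000[...]: cycle the mask over the stream, keeping items under a '1'."""
    bits = mask.replace("+", "")
    return [item for i, item in enumerate(stream) if bits[i % len(bits)] == "1"]


def map_stride(stream, order):
    """1,3,2,...[...]: reorder each full stride by 1-based position."""
    n = len(order)
    out = []
    for start in range(0, len(stream), n):
        chunk = stream[start:start + n]
        out.extend(chunk[pos - 1] for pos in order)
    return out


def frame_pairs(stream):
    """30fps[...]: interlace and frame consecutive halfpic pairs."""
    return [f"{a}+{b}" for a, b in zip(stream[0::2], stream[1::2])]


source = ["A+a", "B+b", "C+c", "D+d", "E+e"]      # 1/5 s of [25fps]
order = [1, 3, 2, 5, 4, 7, 6, 8, 9, 10, 11, 12]   # 1,3,2,5,4,7,6,8-12
masked = mask_stride(repeat(halfpics(source), 6), "1+0000")
result = frame_pairs(map_stride(masked, order))
print(result)  # ['A+a', 'A+b', 'B+c', 'C+c', 'D+d', 'E+e']
```

Each function consumes one stream and produces a new one, mirroring the way the notation reads from right to left. Note that map_stride as written expects the stream length to be a multiple of the stride length.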

poisondeathray

5th November 2021, 23:36

@poisondeathray, Is the final frame sequence what you tried to describe?

The last line on the right : A+a.... looks correct .

The left side is my homework project this weekend :) . I have to examine with the examples and explanations you posted on the last page with this one

markfilipak

5th November 2021, 23:57

May I comment, first?
I think you may be interpreting the notation to be a lot more complex than it actually is, and maybe with hidden moving parts that it doesn't actually have.
If it helps, think 'stream'. 25pps is a stream. 50hps and 50sps are streams.
When I write "50hps[25fps]", for example, I'm not changing the frames, I'm extracting the picture stream and deinterlacing to a halfpic stream.
A mask stride or a map stride is simply consuming a stream and producing a new stream that 'flows' toward the left.

Yes, the notation shows (on the left) not just WhatIs, but also (the stuff in the middle) how it got 'there' and (on the right) 'where' it came from. I think that stuff matters. The alternative is words that describe only WhatIs, but WhatIs can be very complex and is better understood by HowItBecame.

In the pps[hps] case - What is the distinction between 2 half pics combined to form a picture vs. not yet a frame ? Isn't the "picture" already a "frame" in this case ?

25pps <== this is a picture stream.
25fps[25pps] <== this is a picture stream, framed.
"Isn't the picture already a frame" -- a picture is not a frame. A picture is a picture. A frame is a frame.

A halfpic will never be a 1st generation source, or a distributed format - what is the rationale for including [hps] ? I understand the potential usefulness of it on the left slots, but not the far right "starting point" .
You are correct. A halfpic stream cannot be a source (i.e. "[25hps]", for example, is invalid).

25fps <== this is a frame stream.
50hps[25fps] <== this is a halfpic stream, unframed and deinterlaced -- the intermediate pps step can be skipped over because including it wouldn't add any additional info.
The purpose of halfpics is to provide a 'field'-sized image that can be counted, masked, and manipulated without calling it a "field" (because a halfpicture is not a field).

Let's say you start with 25p content, native progressive, camera acquisition 25fps[25pps]. Now you separate to a halfpic stream. Pretend that is your new starting point "source" [50hps]. If you "interlace" them to form a picture you have 25pps[50hps]. Isn't that what you started with, really 25fps[25pps[50hps]]?
What you describe would be notated: 25fps[50hps[25fps]]. It just unframes, deinterlaces, interlaces, and frames. Yes, it doesn't really 'do' anything, but it's what you describe.
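
The round trip 25fps[50hps[25fps]] can be sketched on a single picture. This is an illustration under stated assumptions, not anyone's actual workflow: a picture is modelled as a plain list of rows, a halfpic is taken to be every other row, and the function names are hypothetical. The same `[0::2]` and `[1::2]` slices work along the first axis of a NumPy array.

```python
def separate_halfpics(picture):
    """Deinterlace: split a picture into its even-row and odd-row halfpics."""
    return picture[0::2], picture[1::2]


def interlace_halfpics(top, bottom):
    """Interlace two halfpics into one picture (rows alternate top, bottom)."""
    picture = []
    for top_row, bottom_row in zip(top, bottom):
        picture.append(top_row)
        picture.append(bottom_row)
    return picture


# A made-up 6-row by 4-column single-plane picture.
picture = [[row * 10 + col for col in range(4)] for row in range(6)]
top, bottom = separate_halfpics(picture)
restored = interlace_halfpics(top, bottom)
print(restored == picture)  # True: the round trip changes nothing
```

Both halfpics come from the same instant, which is the property that separates them from scans in this notation.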

We already discussed this; "encoded interlaced" is the button you push and it changes the way an encoder functions and flags the output (with potential consequences). "Encoding interlaced" is not the same as "encoding progressive". Recall the YUV1 vs. YUV2 example in post 28.
Is the difference inside the slices? Macroblock topology? I diagrammed the full topologies of 'i'-style macroblocks v. 'p'-style macroblocks but I've never published it. It's very, very complicated. I didn't dive into the macroblocks because that was at the time when I realized that knowing MPEG2 ES was not going to help me to understand FFmpeg. If you think this is important, then shall we start a new thread to pursue it? I have a lot of topology diagrams I could contribute, immediately.

Both are functionally similar, but different from each other and the source. This interlaced encoding vs. progressive encoding terminology is entrenched in all software, professional/commercial and freeware, you can't escape it - so learn to live with it or translate it on the fly in your head. Again, "encoded interlaced vs. encoded progressive" is NOT a content description (not a pps description). I explained earlier (post 26), it's completely independent of the source properties. "Content" is described by "x content encoded interlaced", or "x content encoded progressive". "x content" is the part that you can think of as your [pps], or [sps].
Thank you. Yes, I understand that there are differences. I just don't know what they are. If the differences are inside macroblocks, and if that's important, then I suppose the notation needs to include it. When I know more about it, I'll include it in the notation. Thanks for being patient with me.

Here's my notion about all that. Correct me, please.
"encoded interlaced" is 25fps[50sps] for example -- scans, interlaced [note] and framed.
"encoded progressive" is 25fps[25pps] for example -- pictures framed.

[note] Interleaved within macroblocks, not actually interlaced -- it's actually a lot more complex than just 'interlaced'; for YUV420 it's 2x2 Y quads in 4, 4x8 blocks plus Cb and Cr, each in 4x8x2 blocks -- a total of 6, 16x16 pixel blocks per macroblock -- with each pair of 4 Y blocks interlaced and each 1/2 of each chroma block (top/bottom) interlaced. For progressive content, the macroblock topology is the same except that the final interlaces between Y blocks and within chroma blocks is not done. I have produced diagrams showing the topology plus how the pixels 'thread' in from 'i' and from 'p' streams.

Yes, again problems with "interlaced" terminology. It's used differently in different places, and you're not going to change the way it's used in professional or non-professional settings. e.g. "interlaced timeline" in an NLE. e.g. A 29.97i timeline means assets placed on the timeline conform to 59.94 fields per second. Live with it.
I'm not trying to change anything or anyone. I'm trying to supplement what exists with a notation that I find useful and other folks may find useful.

No, "encoded interlaced" (or "encoded progressive") is independent of [] .
? "independent of []"? "[]" is the source, right? How can the state of being "encoded interlaced" or "encoded progressive" be independent of the video? Or are you saying that "encoded interlaced" & "encoded progressive" apply only to targets? What's the difference?
It can be [sps] or [pps]
?

To reiterate what you mentioned earlier a few pages back - you decided against including , or there is no way to distinguish it (encoded interlaced vs. progressive) in the notation, and that's fine
I can see that "encoded interlaced" v. "encoded progressive" is very important. I think it's just the difference between macroblock topologies but I may be wrong because I'm finding it difficult to understand what you're getting at. My fault.

Quote:
You would see 25p with duplicate frames in both cases. (Flagging differences, and mostly negligible differences in encoding). So visually, that's no different than 25fps[25pps] case (the encoded progressive case) ... But I mentioned the potential issues, and why people make the distinction earlier. I thought you decided not to include it, or not distinguish between them
Quote:
Sorry, I got lost between the cases. Could you quote directly before your response so I can match them up?
This was referring to "25p content encoded interlaced" vs. "25p content encoded progressive"
Well, I'm totally dumbfounded. Perhaps we can ignore this issue for a while, eh?

Another situation - consider an old home video recording say DV. Interlaced recording mode (some DV variant had progressive modes).
Would that be [60'sps]?
The camera uses an old style CCD sensor - so the sensor itself also works in "field scan" (as opposed to more modern CMOS sensors, progressive scan, where "interlaced content" [sps] outputs are created in camera after the sensor, but before the recording to media in the CMOS case). For the DV recording, during motion, each field represents a different image. So the DV camera's output recording is 29.97fps[59.94sps].
Yes, I believe that's 30'fps[60'sps].
If there is no motion, both fields of a pair are from the same image - it's progressive content during that scene (unless it's mishandled by the playback chain, e.g. deinterlaced).
I don't understand. If there's no motion, then the images are identical, but they're still in a stream of separate images, aren't they?
1) Let's say you receive a clip, no other information about the camera or background info, and the recording was that static scene only - what would you "label" it?
30'fps[60'sps].
2) Let's say you have all the information - what would you "label" it?
30'fps[60'sps].
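
The primed rates are not defined in this part of the thread. The sketch below assumes, as a hypothesis only, that a prime means the nominal rate multiplied by 1000/1001, which is how 29.97 and 59.94 are conventionally derived from 30 and 60. The constant and function names are made up for illustration.

```python
from fractions import Fraction

# Assumption: the prime in 30' and 60' stands for the 1000/1001 factor.
PRIME_FACTOR = Fraction(1000, 1001)


def primed(nominal):
    """Return the exact rate for a primed nominal rate such as 30' or 60'."""
    return nominal * PRIME_FACTOR


frame_rate = primed(30)  # 30000/1001, about 29.97 frames per second
scan_rate = primed(60)   # 60000/1001, about 59.94 scans per second
print(frame_rate, float(frame_rate))
print(scan_rate, float(scan_rate))
```

Under that reading, 30'fps[60'sps] and 29.97fps[59.94sps] name the same stream, with exactly two scans per frame.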

Lovingly intended as humor...
Star Trek, The Next Generation, Season 5, Episode 2: "Darmok" [1991]. Guest species: Tamarian.
How does one communicate with aliens who speak solely in allegories? "Darmok & Jalad at Tanagra" is what the Tamarian captain proposes just prior to abruptly beaming Picard and himself from their respective ships, down to the uninhabited planet below. Does he intend personal combat? Greetings? Negotiations? Who or what are Darmok & Jalad? What or where is Tanagra? More importantly, what does "Darmok & Jalad at Tanagra" signify? It seems very important to the Tamarians. They repeat it several times, each time with greater emphasis. But it's a senseless riddle to the Enterprise bridge officers, even to Troi. The Tamarians are supposedly peaceful beings, and they've apparently arranged a safe venue for their unexpected contact, but their communications are incomprehensible.

I can't possibly interpret what you write when it's based on a video editor's setting. I can't possibly interpret what you write when it's based on a video editor or equipment I've never used.

Can you describe WhatIs?