14th October 2021, 04:04   #1
markfilipak
Preparing for numpy reshape

Though it might not seem like it, this is VS- and python-oriented.

What is the 'pix_fmt' of raw frames in an ffmpeg filter complex?

Is it 60 unpadded bits formatted as yuv420, like the quads in macroblocks [note 1]?...
like this:
Code:
  <-byte->
  yyyyyyyy
  yy
    yyyyyy
  yyyy
      yyyy
  yyyyyy
        yy
  yyyyyyyy
  uuuuuuuu
  uu
    vvvvvv
  vvvv
  etc.
Or are decoded frames in an ffmpeg filter complex yuv444p10 so that pixels are individually stored as 30 unpadded bits?...
like this:
Code:
  <-byte->
  yyyyyyyy
  yy
    uuuuuu
  uuuu
      vvvv
  vvvvvv
  etc.
Or something else?

Or does the 'p' in 'yuv...p10' NOT mean "packed"?

Are the raw frames in filter complexes RGB or YUV?

Is there even such a thing as rgb420p10?

I need to ask these questions in order to configure ndarrayFrame.reshape() in this python script:
Code:
  import numpy
  rawvideoFrame = procPipeIn.stdout.read(bytesPerFrame)
  # frombuffer replaces the deprecated fromstring for reading binary data
  ndarrayFrame = numpy.frombuffer(rawvideoFrame, dtype='uint8')
  # rows (height) come first in a (rows, columns, channels) array
  ndarrayFrame = ndarrayFrame.reshape((rowsPerFrame, pixelsPerRow, bytesPerPixel))
At the moment, I don't know how to set 'bytesPerPixel' for a 'pix_fmt' that is not byte-aligned, so that's probably the next step.
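For byte-aligned formats, at least, the per-frame byte count seems straightforward to compute from the plane sizes. A sketch (my own helper, not anything from numpy or ffmpeg; the format names follow ffmpeg's pix_fmt list):
Code:
  # Sketch: per-frame byte counts for a few byte-aligned planar pix_fmts.
  # Formats that are not byte-aligned would need bit-level unpacking first.
  def bytes_per_frame(width, height, pix_fmt):
      if pix_fmt == 'yuv420p':       # 8-bit; chroma planes are (w/2) x (h/2)
          return width * height * 3 // 2
      if pix_fmt == 'yuv420p10le':   # 10 bits stored as 2 bytes per sample
          return width * height * 3
      if pix_fmt == 'yuv444p':       # three full-size 8-bit planes
          return width * height * 3
      if pix_fmt == 'gbrp16le':      # planar RGB, 2 bytes per sample
          return width * height * 6
      raise ValueError('unhandled pix_fmt: ' + pix_fmt)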
Code:
[note 1]
          SAMPLE-QUAD    RGB COMPONENTS       YCbCr420             YCbCr422             YCbCr444    ...simplified notation
          ·-------·      ·-------·            ·-------·            ·-------·            ·-------·
          ¦ S ¦ S ¦      ¦ R ¦ R ¦            ¦ Y ¦ Y ¦            ¦ Y ¦ Y ¦            ¦ Y ¦ Y ¦   ...samples, conceived as color planes
          ¦---¦---¦      ¦---¦---¦            ¦---¦---¦            ¦---¦---¦            ¦---¦---¦
          ¦ S ¦ S ¦      ¦ R·-------·         ¦ Y·-------·         ¦ Y·-------·         ¦ Y·-------·
          ·-------·      ·--¦ G ¦ G ¦         ·--¦       ¦         ·--¦ Cb    ¦         ·--¦ Cb¦ Cb¦
                          \ ¦---¦---¦          \ ¦ Cb    ¦          \ ¦-------¦          \ ¦---¦---¦
                           \¦ G·-------·        \¦  ·-------·        \¦ C·-------·        \¦ C·-------·
                            ·--¦ B ¦ B ¦         ·--¦       ¦         ·--¦ Cr    ¦         ·--¦ Cr¦ Cr¦
                             \ :---¦---:          \ ¦ Cr    ¦          \ ¦-------¦          \ ¦---¦---¦
                              \¦ B ¦ B ¦           \¦       ¦           \¦ Cr    ¦           \¦ Cr¦ Cr¦
                               ·-------·            ·-------·            ·-------·            ·-------·
Thanks!
--Mark.

14th October 2021, 07:18   #2
_Al_
btw, you can get ffmpeg raw output into vapoursynth.

If you want to get a numpy image where the planes have different sizes, you need to read the data plane by plane (not the whole frame), shove each plane into numpy, then stack those three planes into an image array. Not sure what you want; almost all processing using numpy assumes RGB, where every plane has the same size.

So if the stdout data is subsampled, it'd be awkward to get into a numpy array, because one plane has a different shape than the other two. Is that still what you want? IIRC, .reshape(-1, h, w, 3) should get you all the frames at once, but the planes need to have the same size (yuv444 or RGB).
If it is for viewing, the raw data should go rgb -> numpy.
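Roughly like this -- just a sketch, assuming 8-bit yuv420p on the pipe (the 'pipe' object and the dimensions are placeholders):
Code:
  import numpy as np
  w, h = 720, 480                  # assumed frame dimensions
  # yuv420p: one full-size Y plane, then quarter-size U and V planes
  y = np.frombuffer(pipe.read(w * h), np.uint8).reshape(h, w)
  u = np.frombuffer(pipe.read(w * h // 4), np.uint8).reshape(h // 2, w // 2)
  v = np.frombuffer(pipe.read(w * h // 4), np.uint8).reshape(h // 2, w // 2)
  # the planes have different shapes, so they can't be stacked directly;
  # with yuv444 or rgb, all three are (h, w) and stacking works:
  # img = np.stack([y, u, v], axis=-1)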

14th October 2021, 15:58   #3
markfilipak
Quote:
Originally Posted by _Al_ View Post
btw, you can get ffmpeg raw output into vapoursynth.
Yes, that's what I'm doing now. In the future I need to put VS right in the middle of a filter graph, so I'm thinking of breaking the filter graph in half: preprocess and postprocess.

Preprocess in ffmpeg: open the source video, do TB & PTS fixups and whatnot.

MV interpolate in VS: pipe the raw frames via ffmpeg's pipe to do motion vector interpolation in VS, then pipe the frames back to ffmpeg via subprocess.Popen() -- I think.
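I imagine the plumbing looking roughly like this (a sketch only: filenames, frame size, and pix_fmt are placeholders, and the interpolation step is elided):
Code:
  import subprocess
  FRAME = 720 * 480 * 3 // 2    # bytes per yuv420p frame (assumed size)
  # decoder: ffmpeg pushes raw frames out on stdout
  dec = subprocess.Popen(
      ['ffmpeg', '-i', 'in.mkv', '-f', 'rawvideo',
       '-pix_fmt', 'yuv420p', 'pipe:1'],
      stdout=subprocess.PIPE)
  # encoder: a second ffmpeg reads raw frames from stdin
  enc = subprocess.Popen(
      ['ffmpeg', '-f', 'rawvideo', '-pix_fmt', 'yuv420p',
       '-s', '720x480', '-r', '120000/1001', '-i', 'pipe:0',
       '-c:v', 'libx265', 'out.mkv'],
      stdin=subprocess.PIPE)
  while True:
      frame = dec.stdout.read(FRAME)
      if not frame:
          break
      # ...motion vector interpolation would happen here...
      enc.stdin.write(frame)
  enc.stdin.close()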

Postprocess in ffmpeg & mkvmerge: mux in audio, subs, chapters, and whatnot, and then output.

I would do everything in VS if I knew how. I'm lost in python -- a total python newbie who is largely ignorant of standard python architecture and methods. From what I've found, the python documentation seems to concentrate on syntax and functions, but for what I'm trying to do, I need architecture methods, such as '<object>.list_functions()', which are unnamed in the documentation I've found. You know how learning a new language is likened to climbing a mountain: most of the time is spent looking at maps of the mountain and planning a route, not getting gear together. :-) I'm kind of bogged down looking for the maps.
Quote:
If you want to get a numpy image where the planes have different sizes, you need to read the data plane by plane (not the whole frame), shove each plane into numpy, then stack those three planes into an image array. Not sure what you want; almost all processing using numpy assumes RGB, where every plane has the same size.
Well, you see, if the input is yuv420p8 or yuv420p10, does ffmpeg create raw frames that are yuv420? Or yuv444? Or rgb420? Or rgb444? I don't know, and I haven't learned how to find out such things, because I've been concentrating on purely mechanical DVD & BD movie fixups (meaning frame, field, and timing manipulation only, with no color processing and no cosmetic processing, e.g. no yadif).
Quote:
So if the stdout data is subsampled, it'd be awkward to get into a numpy array, because one plane has a different shape than the other two. Is that still what you want? IIRC, .reshape(-1, h, w, 3) should get you all the frames at once, but the planes need to have the same size (yuv444 or RGB).
If it is for viewing, the raw data should go rgb -> numpy.
I'm not sure what you mean, but, you know, it helps to explain the issues to someone knowledgeable like you. Your questions act as a guide pointing the correct path to take. And my explaining helps to clarify the issues in my mind. Thank you for pointing the correct path.

I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le, only, so that all color planes are the same size and all pixels are on byte boundaries. Or perhaps there's another way that involves less processing and memory? What do you say?
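Something like this, if I have the names right -- though I notice 'rgb444p16le' doesn't actually appear in ffmpeg's pix_fmt list; the closest real formats seem to be gbrp16le (planar RGB, 16-bit) and rgb48le (packed RGB, 16-bit):
Code:
  ffmpeg -i in.mkv -f rawvideo -pix_fmt gbrp16le pipe:1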

Thanks, Al.
--Mark.

14th October 2021, 16:45   #4
poisondeathray
Quote:
Originally Posted by markfilipak View Post
What is the 'pix_fmt' of raw frames in an ffmpeg filter complex?
It's whatever the input pix_fmt is, or whatever you transformed it to in the filter_complex.

Some filters have limitations on pixel formats; sometimes ffmpeg will "sneakily" auto-insert a conversion.

You can insert -vf showinfo to see what the pixel format is at a given node, or point, in the filter graph.
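For example (showinfo logs each frame's properties, including fmt:, and '-f null -' throws the output away):
Code:
  ffmpeg -i input.mkv -vf showinfo -frames:v 1 -f null -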

Quote:
Or does the 'p' in 'yuv...p10' NOT mean "packed"?
planar

Quote:
Are the raw frames in filter complexes RGB or YUV?
see above

Quote:
Is there even such a thing as rgb420p10?
no - because RGB is never subsampled


You can use ffmpeg -pix_fmts to list ffmpeg supported pixel formats



Quote:
Originally Posted by markfilipak View Post
Preprocess in ffmpeg: open source video, do TB & PTS fix ups and whatnot.
What are you using ffmpeg for in terms of preprocessing?

Note there are no TB or PTS issues in vapoursynth from DVD or BD sources (they are CFR encoded only, although the content can be VFR) - when you use a frame accurate source filter. None of the flaky wandering-timestamp jitter issues that ffmpeg is prone to.

A DVD or BD with buggy timestamps caused by ripping with makemkv using a 1ms timebase can be interpreted as CFR, with perfect timestamps and perfect timebase, if you use a frame accurate source filter. What is important in these scenarios is "frame accuracy" - you can assume any framerate - and that will fix any timestamp issues if there were any.

It would probably be simpler to do all the preprocessing steps in vapoursynth, since you're using it for the interpolation part anyways. Then continue encoding / muxing with ffmpeg/mkvmerge
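A minimal sketch of the source side in vapoursynth (assuming the d2vsource and L-SMASH Works plugins are installed; the file names are placeholders):
Code:
  import vapoursynth as vs
  core = vs.core
  # DVD (MPEG-2): index first with DGIndex/D2V Witch, then:
  clip = core.d2v.Source(input='movie.d2v')
  # or, for most other sources, L-SMASH Works:
  # clip = core.lsmas.LWLibavSource(source='movie.mkv')
  # frames are addressed by number - no timestamps to wander
  clip.set_output()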

Quote:
Originally Posted by markfilipak View Post
I would do everything in VS if I knew how.
I find a faster way to learn is from examples.

State what you're trying to do in basic terms - and there are probably examples posted already.

Quote:
Originally Posted by markfilipak View Post
Well, you see, if the input is yuv420p8 or yuv420p10, does ffmpeg create raw frames that are yuv420? Or yuv444? Or rgb420? Or rgb444?
ffmpeg generally doesn't do anything unless you tell it to. It keeps the input pix_fmt unless some procedure forces it to convert to another pixel format.

If you send yuv420p8 data, it reads yuv420p8 data.

If you add some RGB-only filter but don't specify the preconversion, it will auto-insert a conversion (or sometimes throw an error).
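For example, lutrgb only accepts RGB formats, so the first command gets a hidden YUV-to-RGB conversion auto-inserted; the second states the conversion explicitly with the format filter (a sketch):
Code:
  ffmpeg -i in.mkv -vf "lutrgb=r=negval" out.mkv
  ffmpeg -i in.mkv -vf "format=rgb24,lutrgb=r=negval" out.mkv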

Quote:
I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le, only, so that all color planes are the same size and all pixels are on byte boundaries. What do you say? Or perhaps there's another way that involves less processing and memory? What do you say?
No - unless your desired output format is 16-bit RGB (unlikely... but maybe you're going to Photoshop or After Effects for some manual cleanup, painting, compositing).

14th October 2021, 18:47   #5
markfilipak
@poisondeathray, you are... a prince! In one reply, you have cleared up so much!
Quote:
Originally Posted by poisondeathray View Post
It's whatever the input pix_fmt is, or whatever you transformed it to in the filter_complex.
Thanks, I've often wondered about that.
Quote:
Some filters have limitations on pixel formats; sometimes ffmpeg will "sneakily" auto-insert a conversion.

You can insert -vf showinfo to see what the pixel format is at a given node, or point, in the filter graph.
Ah! I did that long ago and forgot about it. Thanks for reminding me.
Quote:
Quote:
Originally Posted by markfilipak View Post
Or does the 'p' in 'yuv...p10' NOT mean "packed"?
planar
Thank you! Hmmm... a bad interpretation of previous clues on my part. Or perhaps some people do think 'p' means 'packed', eh? Thanks for the clarification. I'll not forget it.

Do you happen to know why it's called 'planar'? Is that simply to indicate that color is treated as separate (and separable) components, or is it more profound, like having to do with the structure of macroblocks (e.g. the differences in how pixels are stored in frame v. field structure)? -- I'm quite familiar with the various species of macroblocks.
Quote:
Quote:
Is there even such a thing as rgb420p10?
no - because RGB is never subsampled
Ding-ding-ding! That's a 10! I get it. I thought RGB could be subsampled. That's going to clear up a lot of issues for me.
Quote:
You can use ffmpeg -pix_fmts to list ffmpeg supported pixel formats
I did that from the get-go with ffmpeg and saved it as a text file, ''ffmpeg -pix_fmts' .txt', to which I add notes. The main problem is that there are just names (not details/pictures) of the formats. I suppose that what's important is to match processing to 'pix_fmt', but I've not yet fully cracked that nut.
Quote:
What are you using ffmpeg for in terms of preprocessing?
Mostly to set TB to 1/FPS and PTS to 'N' to avoid ffmpeg complaints/errors. I was advised that that's not the right thing to do but, you know, I've transcoded many movies and I've yet to see one that's actually VFR or that produces an out-of-order raw frame stream. Despite assertions to the contrary, I'm pretty convinced that ALL DVD & BD content is CFR.
Quote:
Note there are no TB or PTS issues in vapoursynth from DVD or BD sources (they are CFR encoded only, although the content can be VFR) - when you use a frame accurate source filter. None of the flaky wandering-timestamp jitter issues that ffmpeg is prone to.
I am 100% with you there. Well, that's really, really going to help me retire ffmpeg entirely. Paul Mahol insists that a frame number ('N') approach is not good and that monotonic PTSs are mandatory (without providing a clue how to achieve monotonic PTSs). I never could accept that frames can come out of the decoder out of order and that PTSs are significant. I have seen clues from way back in time regarding a dispute between the ffmpeg folks and the avisynth folks in regard to the primacy of frame numbers over PTSs. Now I think I understand it, and that Paul simply is wrong.
Quote:
A DVD or BD with buggy timestamps caused by ripping with makemkv using a 1ms timebase can be interpreted as CFR, with perfect timestamps and perfect timebase, if you use a frame accurate source filter. What is important in these scenarios is "frame accuracy" - you can assume any framerate - and that will fix any timestamp issues if there were any.
Okay. First, I don't use makemkv. I use AnyDVD-HD and rip (i.e. back up discs I own) to ISOs that I mount. Second, what you describe is exactly what I've been doing with 'settb=expr=1/24,setpts=expr=N', and I've never gotten in trouble or had unexpected results. I think you just voided the need for preprocessing: just do the decode in VS, do the MV interpolation in VS, and then pipe the raw to ffmpeg. I LIKE IT! By the way, I set TB to 1/24 to retime movies to the speed and running times seen in theaters, and I use 'atempo' to fix up the audio. I've not found a way to fix up subtitles and chapters, but I've discovered that it often doesn't matter -- mkvmerge must be doing some sort of fixup.
Quote:
It would probably be simpler to do all the preprocessing steps in vapoursynth, since you're using it for the interpolation part anyways. Then continue encoding / muxing with ffmpeg/mkvmerge
Yup! Thanks!
Quote:
I find a faster way to learn is from examples.
Me too. And I think the best examples are complete workflows, not fragments of code. People are pretty smart and are generally able to extrapolate what they need to do in their workflows from a few good workflow examples, even if the objectives differ.
Quote:
State what you're trying to do in basic terms - and there are probably examples posted already.
I think you have a good vision of what I'm doing already:

For progressive and soft telecine (what I call 23.9fps[24pps]): Force TB & PTSs to 24fps[24pps], MV interpolate to 120fps[120pps], resync audio subs & chaps to the resulting 1/120 TB, encode & mux.

For hard telecine (what I call 23.9fps[2-3[24pps]], for example): Add detelecine.

For so-called NTSC (what I call '29.9fps[59.9sps]'): separatefields, separately MV interpolate each field stream to 120fps, bob the first field stream for 2 frames while delaying the second field stream by 2 frames and finally weave them together to get a perfect run of 100% progressive frames.
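For the interpolation step in the cases above, I'm picturing something like this on the VS side (a sketch using MVTools; the parameters are placeholders, not tuned values):
Code:
  import vapoursynth as vs
  core = vs.core
  clip = core.d2v.Source(input='movie.d2v')   # e.g. 23.976fps[24pps]
  sup = core.mv.Super(clip, pel=2)
  bvec = core.mv.Analyse(sup, isb=True)       # backward motion vectors
  fvec = core.mv.Analyse(sup, isb=False)      # forward motion vectors
  # motion-compensated frame interpolation to 120000/1001 fps
  clip = core.mv.FlowFPS(clip, sup, bvec, fvec, num=120000, den=1001)
  clip.set_output()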

Mixed 'NTSC'+hard telecine presents a problem: I've not found a way to flag combed frames, frame-by-frame. I need that flag in order to switch between my 29.9fps[2-3[24pps]]-to-120fps[120pps] method and my 29.9fps[59.9sps]-to-120fps[120pps] method.

The overall objective is a one-click solution that takes anything on a DVD or BD disc that's been professionally mastered, probes it to ascertain its properties, and then transcodes it to HEVC/MKV. What I don't believe is that professionals make so many mistakes as to make that goal unattainable -- after all, anything a DVD/BD player can play can't be that messed up.
Quote:
ffmpeg generally doesn't do anything unless you tell it to. It keeps the input pix_fmt unless some procedure forces it to convert to another pixel format ... If you add some RGB-only filter but don't specify the preconversion, it will auto-insert a conversion (or sometimes throw an error)
I still lurk the ffmpeg-user list. So many problems people have are just the sort of surprises you cite.
Quote:
Quote:
I guess my best approach would be to force ffmpeg to convert the input to raw rgb444p16le ...
No - unless your desired output format is 16-bit RGB (unlikely... but maybe you're going to Photoshop or After Effects for some manual cleanup, painting, compositing)
Thanks so much.

14th October 2021, 19:42   #6
poisondeathray
Quote:
Originally Posted by markfilipak View Post
Do you happen to know why it's called 'planar'? Is that simply to indicate that color is treated as separate (and separable) components, or is it more profound, like having to do with the structure of macroblocks (e.g. the differences in how pixels are stored in frame v. field structure)? -- I'm quite familiar with the various species of macroblocks.

Packed vs. planar has to do with the way uncompressed data is stored and organized. If you look at fourcc.org, it lists some common formats and orientation, with some diagrams
https://www.fourcc.org/yuv.php

Quote:
YUV formats fall into two distinct groups, the packed formats where Y, U (Cb) and V (Cr) samples are packed together into macropixels which are stored in a single array, and the planar formats where each component is stored as a separate array, the final image being a fusing of the three separate planes.
For example, yuv420p8 is "8bit per pixel component, 4:2:0 subsampling". But the uncompressed format can be stored or arranged in a variety of ways. The "fourcc" code is supposed to identify the arrangement

eg. Both "YV12" and "NV12" and "IYUV" are all 8bit 4:2:0. But they store/arrage the data differently. "NV12" has U,V planes interleaved, but "IYUV" has reversed plane order compared to "YV12"
https://www.fourcc.org/pixel-format/yuv-yv12/
https://www.fourcc.org/pixel-format/yuv-nv12/
https://www.fourcc.org/pixel-format/yuv-i420/
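To make the difference concrete, here is the byte order of a single 4x4 frame in each layout (a sketch based on the fourcc descriptions):
Code:
  4x4 frame = 16 Y samples, 4 U samples, 4 V samples
  IYUV/I420:  YYYYYYYYYYYYYYYY UUUU VVVV    (planar; U plane, then V plane)
  YV12:       YYYYYYYYYYYYYYYY VVVV UUUU    (planar; V plane, then U plane)
  NV12:       YYYYYYYYYYYYYYYY UVUVUVUV     (Y plane, then interleaved U/V)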



Quote:
I'm pretty convinced that ALL DVD & BD content is CFR.

For DVD/BD - the encoding is CFR, 100% always.

But the content can be VFR. It's very common. What I mean is you can have mixed cadence with field and frame repeats, e.g. 59.94 fields/s interlaced content sections - where there is motion in every field - mixed with 29.97p sections, 23.976p sections, 14.985p sections, or other frame rates. For example, a "slow motion" section of a 23.976p film might have 14.985p frames as duplicates (for effectively 1/2 speed during that sequence).

Quote:
I am 100% with you there. Well, that's really, really going to help me retire ffmpeg entirely. Paul Mahol insists that a frame number ('N') approach is not good and that monotonic PTSs are mandatory (without providing a clue how to achieve monotonic PTSs). I never could accept that frames can come out of the decoder out of order and that PTSs are significant. I have seen clues from way back in time regarding a dispute between the ffmpeg folks and the avisynth folks in regard to the primacy of frame numbers over PTSs. Now I think I understand it, and that Paul simply is wrong.
For general use, he's correct. You have to be able to cover VFR content cases, which FAR outnumber the CFR cases these days. Think cell phone video, tablets, etc. - they are all recorded VFR.

The problem with avisynth is that it's natively CFR. VFR is more difficult to output (it's possible with decimation and output of timestamps to keep sync).


Quote:
Okay. First, I don't use makemkv. I use AnyDVD-HD and rip (i.e. back up discs I own) to ISOs that I mount. Second, what you describe is exactly what I've been doing with 'settb=expr=1/24,setpts=expr=N', and I've never gotten in trouble or had unexpected results. I think you just voided the need for preprocessing: just do the decode in VS, do the MV interpolation in VS, and then pipe the raw to ffmpeg. I LIKE IT! By the way, I set TB to 1/24 to retime movies to the speed and running times seen in theaters, and I use 'atempo' to fix up the audio. I've not found a way to fix up subtitles and chapters, but I've discovered that it often doesn't matter -- mkvmerge must be doing some sort of fixup.
It's up to you, more than one way to do things.

I was just suggesting doing most of it in 1 program, because going back and forth adds complexity and overhead (slower)



Quote:
I think you have a good vision of what I'm doing already:

For progressive and soft telecine (what I call 23.9fps[24pps]): Force TB & PTSs to 24fps[24pps], MV interpolate to 120fps[120pps], resync audio subs & chaps to the resulting 1/120 TB, encode & mux.

For hard telecine (what I call 23.9fps[2-3[24pps]], for example): Add detelecine.

For so-called NTSC (what I call '29.9fps[59.9sps]'): separatefields, separately MV interpolate each field stream to 120fps, bob the first field stream for 2 frames while delaying the second field stream by 2 frames and finally weave them together to get a perfect run of 100% progressive frames.
Those are covered, "textbook" cases. Hard and soft telecine can be treated functionally the same.

If you want to "speedup" 24000/1001 to 24/1, you can use
core.std.AssumeFPS(clip, fpsnum=24, fpsden=1) . It's the same frame count, just the framerate, their timestamps are all adjusted.
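In context (a sketch; 'clip' is assumed to come from a source filter, and the audio would still need a matching ~0.1% tempo adjustment elsewhere, e.g. atempo):
Code:
  # 24000/1001 -> 24/1: same frames, new timestamps, ~0.1% shorter runtime
  clip = core.std.AssumeFPS(clip, fpsnum=24, fpsden=1)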

There are 24000/1001 vs. 24/1 variants on BD, and BD's are rarely telecined (almost always native progressive), but "film" DVDs will always be 24000/1001.

The NTSC interlaced content case is slightly problematic because of the even/odd field offset (spatially displaced by 1 pixel). If you do it the way you propose, you will get an up/down flutter motion per frame pair. Usually a smart double-rate deinterlacer, like QTGMC, would be used to output 59.94p, then interpolate to something else if desired.
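For example (a sketch using the havsfunc port of QTGMC; assumes havsfunc and its plugin dependencies are installed, and a top-field-first source):
Code:
  import havsfunc as haf
  # double-rate deinterlace: 29.97i in, 59.94p out
  clip = haf.QTGMC(clip, Preset='Slower', TFF=True)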

Quote:
Mixed 'NTSC'+hard telecine presents a problem: I've not found a way to flag combed frames, frame-by-frame. I need that flag in order to switch between my 29.9fps[2-3[24pps]]-to-120fps[120pps] method and my 29.9fps[59.9sps]-to-120fps[120pps] method.
Yes, this is "VFR" content.

Combed frames can be flagged (though comb detection is not necessarily 100% accurate - there are different types of "comb" patterns), but I don't know of a good way to automatically and properly process mixed VFR cadence with interpolation.
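The flagging part can be sketched, though: VIVTC's VFM sets a '_Combed' frame property, and std.FrameEval can switch per frame on it (path_combed and path_progressive are placeholders for whatever processing you choose):
Code:
  matched = core.vivtc.VFM(clip, order=1)   # sets the '_Combed' frame prop
  def pick(n, f):
      # choose a processing path per frame, based on the comb flag
      return path_combed if f.props['_Combed'] else path_progressive
  out = core.std.FrameEval(matched, eval=pick, prop_src=matched)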



Quote:
The overall objective is a one-click solution to take anything on a DVD or BD disc that's been professionally mastered, probe it to ascertain its properties, and then transcode it to HEVC/MKV. What I don't believe is that professionals make so many mistakes as to make that goal unattainable -- afterall, anything a DVD/BD player can play can't be that messed up.
Maybe for standard titles, big budget Hollywood movies.

I have a feeling you haven't encountered "problem" DVD's, such as some low budget DVD's, multi-format converted sources, and some anime DVD's. These have many layers of problems on top of what you describe - the kind many posts in the avisynth forum deal with. (There is no one-click solution for those situations; they need human eyes and custom script solutions.)

17th October 2021, 01:32   #7
markfilipak
Quote:
Originally Posted by poisondeathray View Post
... For DVD/BD - the encoding is CFR, 100% always.

But the content can be VFR. It's very common. What I mean is you can have mixed cadence with field and frame repeats, e.g. 59.94 fields/s interlaced content sections - where there is motion in every field - mixed with 29.97p sections, 23.976p sections, 14.985p sections, or other frame rates. For example, a "slow motion" section of a 23.976p film might have 14.985p frames as duplicates (for effectively 1/2 speed during that sequence). ...
You are so generous with your time and your replies that I almost hesitate to take more of it.

I know you're very knowledgeable and have given video much thought. May I share my views?

That you mix frame rate and picture rate does not surprise me. The MPEG engineers do the same. If there's one small contribution I can make, I would like it to be my nomenclature separating frame rate and picture rate. Let me give examples and see what you think of it.

'24pps' denotes 24 pictures per second. '23.9fps' denotes 24000 frames per 1001 seconds.
'23.9fps[24pps]' denotes cinema-to-video that runs slow by 1 part in 1000 -- running time is +3.6 seconds per hour.
(The brackets essentially mean 'contained'.)

'72fps[24pps]' denotes cinema that's essentially triple-shuttered.

'2-3[24pps]' denotes 2-3 pull-down cinema that yields the equivalent of 30pps.
'29.9fps[2-3[24pps]]' denotes hard telecine of 2-3 pull-down cinema that runs slow by 3.6 seconds per hour.
'3-2[24pps]' '2-2-2-4[24pps]' etc. denote other pull-downs.

'59.9sps' denotes NTSC. '50sps' denotes PAL. '29.9fps[59.9sps]' & '25fps[50sps]' denote interlaced digital-NTSC & -PAL.
'120fps[59.9sps]' denotes interlaced digital-NTSC that's doubled and runs fast by 3.596403.. seconds per hour.

'120fps[120pps]' is cinema that's been 1-to-5 interpolated to 120pps and put in frames on a 1-to-1 basis.

Using this notation, I haven't encountered any video situation that can't be compactly characterized, with one exception: the DVD of the movie "PASSION FISH". That DVD feature has sequences of 2 combed frames alternating with an odd number of progressive frames. The number of progressive frames is between 5 and 71, is always an odd number, and the alternation has no discernible repetition pattern.

You cite an example: "59.94 fields/s interlaced content sections - where this is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections".
If they are characterized by '29.9fps[59.9sps]', '29.9fps[29.9pps]', '29.9fps[24pps]', and '29.9fps[14.9pps]', then they are all contained in 29.9fps frames and the result is therefore a CFR stream.
Is that what you intended?

PS: In the same way that a TS contains PSs (i.e. frames), a notation that has frames containing pictures is, I think, a useful and consistent extension.

17th October 2021, 02:46   #8
poisondeathray
Quote:
Originally Posted by markfilipak View Post

That you mix frame rate and picture rate does not surprise me. The MPEG engineers do the same.
Not a fan of the notation, but that's just me. Maybe someone will like it.

I'm just distinguishing between the content frame rate vs. the encoded frame rate or field rate. The frame rate is 29.97 and the field rate is 59.94 for the encoded stream of all interlaced encoded NTSC DVD's. But that is not necessarily reflective of what the content frame rate truly is.

Quote:

You cite an example: "59.94 fields/s interlaced content sections - where this is motion in every field, with 29.97p sections, 23.976p sections, or with 14.985p sections".
If they are characterized by '29.9fps[59.9sps]', '29.9fps[29.9pps]', '29.9fps[24pps]', and '29.9fps[14.9pps]', then they are all contained in 29.9fps frames and the result is therefore a CFR stream.
Is that what you intended?
For that example, it was supposed to be an NTSC DVD. For an interlaced encoded DVD it's all something in 59.94 fields/s - because that's the field rate that everything is contained in. So yes, the 59.94 fields/second is the CFR stream. And the point was that the content can be variable in that CFR stream (the content frame rate changes if the duplicates are removed). In your notation it would be 59.9sps[something pps], except for the interlaced content, which would be 29.97fps[59.9sps], I think.

What people have been using for years are descriptions like "29.97i" for interlaced content, "29.97p in 29.97i", "23.976p in 29.97i", and "14.985p in 29.97i" (which most people would abbreviate as "15p in 29.97i"), plus pN for progressive native, such as 23.976pN, 24pN, 29.97pN.

How would you distinguish between a video that's 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)? Or is that what sps is for? You never fully explained what the sps letters stand for. So would it be "29.9fps[14.9pps]" vs. "59.9sps[14.9pps]"? And 14.985p in 59.94p would be "59.94fps[14.9pps]"?

17th October 2021, 05:18   #9
markfilipak
Quote:
Originally Posted by poisondeathray View Post
... What people have been using for years are descriptions like "29.97i" for interlaced content, "29.97p in 29.97i", "23.976p in 29.97i", and "14.985p in 29.97i" (which most people would abbreviate as "15p in 29.97i"), plus pN for progressive native, such as 23.976pN, 24pN, 29.97pN.
It seems to me that what I'm calling "pictures" (hence, pps) you are calling "content". Does that help clarify? The MPEG specs call them "pictures".
Quote:
How would you distinguish between a video that's 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)?
14.985p in 29.97p (encoded progressively) == 29.970fps[14.985pps]. That is 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate (sped up).

"14.985p in 29.97i (encoded interlaced as fields)?" Well, if I understand correctly, that would be 14.985 pictures per second, deinterlaced (so, 29.970sps), then interlaced and encoded at 29.970 frames per second -- that, though it doesn't make much sense to me. So I guess that'd be '29.970fps[1-1[14.985pps]]. My problem is with the meaning of "14.985p in 29.97i".

I'm probably misunderstanding because, you see, I have a problem understanding what most folks mean by the word "interlace". For example, you wrote "encoded interlaced as fields". That, to me, is a contradictory statement -- an encoding is either frame-based (interlaced) or field-based (not interlaced). Since field encoding is not interlaced, "encoded interlaced as fields" confuses me.

By "interlaced", most folks mean 2 temporal scans woven together to form (combed) pictures but in the macroblocks, the picture data is field-based, not interlaced -- the interlace occurs in the decoder, not in the stream and the metadata are instructions to the decoder, not a statement regarding what the macroblock format is. So, what most people call "interlaced" video is actually not interlaced. The MPEG engineers solve this problem by calling such streams "interleaved", not "interlaced" -- in fact, when you read the various MPEG specs, you won't find the word "interlace" anywhere. Thus, an interlaced video means that the decoder is required to interlace the fields (i.e. the scans), not that the stream consists of interlaced data. In other words, an interlaced video is actually non-interlaced. Confusing isn't it?

The key to my understanding is that the metadata -- 'progressive_sequence', 'picture_structure', 'top_field_first', 'repeat_first_field', 'progressive_frame' -- are all instructions to decoders. However, most folks interpret them as stream data states instead. So I've learned that when someone refers to "interlaced video" they mean a video that needs to be interlaced.

I've likewise learned that when most folks refer to "deinterlace" (as, for example, a deinterlace filter), they mean a process that deinterlaces as part of the process, not as an input or an output format.

I even have a problem with the word "filter" because many of the processes that are called filters do absolutely no filtering (i.e. no separating, no sorting, no routing -- even a process that weaves is called a "filter"). They are not filters but, instead, are processes. But I've learned that the word "filter" in video can mean almost anything, usually denoting a position in a processing pipeline (such as 'filter_complex'), not what I learned in electrical engineering courses.

I tried to discuss such stuff in the ffmpeg-user mailing list and was attacked as knowing nothing and being stupid.

Quote:
Or is that what sps is for? You never fully explained what the sps letters stand for.
Oh, sorry, "scans per second". In other words, non-interlaced fields (scans) -- what MPEG calls "half-pictures" (which is not really correct for scans, eh? But then there's a lot of stuff in MPEG that's not quite correct, eh?).

I use "scans" instead of "fields" because "scan" can be abbreviated as "s" whereas "field" presents an obvious problem. Also, I think the word "scan" is more descriptive of the camera (television) used to make pictures.
Quote:
So would it be "29.9fps[14.9pps]" vs. "59.9sps[14.9pps]"?
If the original video was recorded at 14.985pps and then put into 29.970fps frames, that would simply be '29.970fps[14.985pps]', meaning: 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate.
Quote:
And 14.985p in 59.94p would be "59.94fps[14.9pps]"?
Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.

PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?

PPS: You're (naturally) asking me about a lot of things I haven't considered in my workflows. For example, to be consistent,
59.9fps[14.9pps] would be 14.9pps framed at 59.9fps, i.e. sped up by 4x.
59.9fps[8[14.9pps]] would be 14.9pps with 8 fields per picture contained in 4 frames, i.e. 14.9pps simply quadruple-shuttered -- '8' specifies a type of telecine, in the same way that '2-3' does, except that '8' doesn't change the cadence: it just uses the 8 fields (1 field-pair original + 3 field-pair copies) to fill out the 4 frames. So, no speed-up, just 4x shutter.

17th October 2021, 06:03   #10
_Al_
Oh man, just hop on the wagon and use the terminology that is used in the neighborhood; there's a reason why it is. In DVD and broadcast, if interlaced, it is all about fields. Everyone knows what the delivery fps is and what the actual fps is, which most of the time is "dug out" from there. Not sure why you suddenly decided to attack the terminology. That leads absolutely nowhere.

I'd be worried about that 24fps to 120fps. It is just insane. You are going to create tons of artifacts that do not belong in the video, and you will be storing them.

17th October 2021, 06:38   #11
markfilipak
@poisondeathray

I tried to include (attach?) a '.jpg' showing 29.9fps[2-1-2-5[24pps]] from an "ALL ABOUT EVE" BD bonus feature (i.e. 00390.m2ts, frames 1018-1029). It may be approved and show up in a future post to this thread. Also, I have a '.jpg' showing 29.9fps[1-4-4-4..[24pps]+2-3[24pps]] from the title screens of "28 DAYS" (i.e. VTS_02_1.VOB). It's a mix of a badly edited background (the 1-4-4-4..[24pps] part) with overlaid text (the 2-3[24pps], i.e. normally telecined, part that apparently was added later). If the "ALL ABOUT EVE" post shows up here, I'll post the "28 DAYS" jpeg. Both of them are terribly hard to describe, but using the notation they are precisely and unambiguously characterized. They provide excellent examples of the power of the notational system.

17th October 2021, 06:41   #12
Selur
side note: might be easier/faster to upload the images to something like imgbb.com

17th October 2021, 06:48   #13
markfilipak
Quote:
Originally Posted by _Al_ View Post
Oh man, just hop on the wagon and use the terminology that is used in the neighborhood; there's a reason why it is. In DVD and broadcast, if interlaced, it is all about fields. Everyone knows what the delivery fps is and what the actual fps is, which most of the time is "dug out" from there. Not sure why you suddenly decided to attack the terminology. That leads absolutely nowhere.
Hi Al.

Good to hear from you. You know that, if you don't like what I write -- totally understandable -- you're free to ignore it, eh?
Quote:
I'd be worried about that 24fps to 120fps. It is just insane. You are going to create tons of artifacts that do not belong in the video, and you will be storing them.
I've been motion vector interpolating 24fps[24pps] to 120fps[120pps] for some time now and have gotten amazing results and, with such short motion vectors that compress really well, file sizes between 1/6th and 1/8th of the originals despite using placebo settings.... Truly stunning outputs.
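(For reference, an encode command of this shape -- a sketch; the frame size and CRF value are placeholders:)
Code:
  ffmpeg -f rawvideo -pix_fmt yuv420p -s 720x480 -r 120000/1001 -i pipe:0 \
      -c:v libx265 -preset placebo -crf 18 out.mkv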

17th October 2021, 06:54   #14
markfilipak
Quote:
Originally Posted by Selur View Post
side note: might be easier/faster to upload the images to something like imgbb.com
Thanks for the tip, Selur,

I can be patient. Also, I like to have the jpegs in the thread. That's more convenient, and it makes the posting look so cool.
...Like I know what I'm 'talking' about.

17th October 2021, 16:27   #15
poisondeathray
Quote:
Originally Posted by markfilipak View Post
14.985p in 29.97p (encoded progressively) == 29.970fps[14.985pps]. That is 14.985 pictures per second shown at 29.970 frames per second, so shown at 2x the original picture rate (sped up).
Yes, to be clear, this is 14.985p content with duplicates, encoded progressively. 2x the number of frames, but 2x the speed.

If you removed the duplicates you have the original, and timestamps would show a delta of ~66.733ms for that section.

Different sections might have different "pps" - hence the usefulness of "VFR" and the timestamps that Paul alluded to (? or Elon... WTF). Timestamp VFR means you can have unique frames and content runs at the original rate (or each frame is displayed for the proper amount of time). You don't need wasteful frame or field repeats - that archaic system was imposed on us by broadcast engineers to comply with NTSC.


Quote:
Originally Posted by markfilipak View Post
"14.985p in 29.97i (encoded interlaced as fields)?" Well, if I understand correctly, that would be 14.985 pictures per second, deinterlaced (so, 29.970sps), then interlaced and encoded at 29.970 frames per second -- that, though it doesn't make much sense to me. So I guess that'd be '29.970fps[1-1[14.985pps]]. My problem is with the meaning of "14.985p in 29.97i".
"deinterlacing" means different things to different people. I wouldn't call it "deinterlaced" that situation

This is 14.985p original content encoded for NTSC DVD. Hard telecined, if you will, so there would be frame duplicate pairs if you were examining frames. Nothing else is done to the content. The encoding type is interlaced (in MPEG2, alternate scan would be used) instead of progressive (in MPEG2, zig-zag scan; but progressive would be soft telecined with repeat field flags for DVD compatibility). The metadata "flagging" will be different: one will be interlaced, the other progressive. The flagging and metadata have potential implications for other programs and how the stream is handled. (And there is MBAFF too, but we will avoid that for the moment.)

Quote:
Originally Posted by markfilipak View Post
I'm probably misunderstanding because, you see, I have a problem understanding what most folks mean by the word "interlace". For example, you wrote "encoded interlaced as fields". That, to me, is a contradictory statement -- an encoding is either frame-based (interlaced) or field-based (not interlaced). Since field encoding is not interlaced, "encoded interlaced as fields" confuses me.

By "interlaced", most folks mean 2 temporal scans woven together to form (combed) pictures but in the macroblocks, the picture data is field-based, not interlaced -- the interlace occurs in the decoder, not in the stream and the metadata are instructions to the decoder, not a statement regarding what the macroblock format is. So, what most people call "interlaced" video is actually not interlaced. The MPEG engineers solve this problem by calling such streams "interleaved", not "interlaced" -- in fact, when you read the various MPEG specs, you won't find the word "interlace" anywhere. Thus, an interlaced video means that the decoder is required to interlace the fields (i.e. the scans), not that the stream consists of interlaced data. In other words, an interlaced video is actually non-interlaced. Confusing isn't it?

The key to my understanding is that the metadata -- 'progressive_sequence', 'picture_structure', 'top_field_first', 'repeat_first_field', 'progressive_frame' -- are all instructions to decoders. However, most folks interpret them as stream data states instead. So I've learned that when someone refers to "interlaced video" they mean a video that needs to be interlaced.

I've likewise learned that when most folks refer to "deinterlace" (as, for example, a deinterlace filter), they mean a process that deinterlaces as part of the process, not as an input or an output format.

I even have a problem with the word "filter" because many of the processes that are called filters do absolutely no filtering (i.e. no separating, no sorting, no routing -- even a process that weaves is called a "filter"). They are not filters but, instead, are processes. But I've learned that the word "filter" in video can mean almost anything, usually denoting a position in a processing pipeline (such as 'filter_complex'), not what I learned in electrical engineering courses.

I tried to discuss such stuff in the ffmpeg-user mailing list and was attacked as knowing nothing and being stupid.
Yes, valid points.

You're correct - it is field encoding, frame encoding, or mixed field/frame macroblocks (MBAFF).

"Interlace", "deinterlace", and "filter" can mean different things to different people. Everyone might be on a different page.

People and official organizations make up and change terms too. For example, "PAR" (pixel aspect ratio) in MPEG2 is now called "SAR" (sample aspect ratio) in MPEG4 terminology. It's all there in the ITU specs. There is no PAR anymore in modern formats. Or: "29.97i" and "25i" are the original official notation that many organizations like broadcasters and the EBU use, but Sony, Adobe, and a bunch of other companies now call it "59.94i" and "50i". Maybe their marketing team thought a higher number would sell more cameras.

You'll never get everyone to agree on nomenclature; just do the best you can to describe whatever it is.

Quote:
Originally Posted by markfilipak View Post
Quote:
And 14.985p in 59.94p would be "59.94fps[14.9pps]"?
Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.
Yes, 4x repeats.

For example 720p59.94 broadcast channels. The encoded stream is 59.94p. The content frame rate (pps if you want) in a given section might be 14.985fps consisting of 4x frame repeats.

If you decimated the duplicates, you would have the original 14.985fps

Quote:
Originally Posted by markfilipak View Post
PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?
That was referring to a 29.97i stream, with 14.985 fps content.

Quote:
PPS: You're (naturally) asking me about a lot of things I haven't considered in my workflows. For example, to be consistent,
59.9fps[14.9pps] would be 14.9pps framed at 59.9fps, i.e. sped up by 4x.
I like "framed at" description, it's similar to the "14.985p in 59.94p" description (4x frame repeats)

I don't like the term "sped up", because that implies a speed change (it is a speed change, but with duplicates - someone might interpret that differently; it's a point that might cause confusion).

14.985p native content (unique frames) appears the same as 14.985p in 59.94p (4x frame repeats) on a 59.94Hz display. The former has 4x fewer frames and is more efficient in terms of encoding.

18th October 2021, 00:40   #16
markfilipak
Quote:
Originally Posted by poisondeathray View Post
Quote:
Originally Posted by markfilipak View Post
Quote:
Originally Posted by poisondeathray View Post
And 14.985p in 59.94p would be "59.94fps[14.9pps]"?
Yes. 14.985 pictures per second shown at 59.940 frames per second, so shown at 4x the original picture rate.
Yes, 4x repeats.
No, not 4x repeats; 4x picture rate.
Bear with me, this is at the core of our miscommunication regarding the notation. Trust me, it's simpler than you think.
14.985pps framed by the camera would be 14.985fps[14.985pps].
14.985pps with 4x repeats would be 59.940fps[8[14.985pps]] -- pictures have 2-to-8 field telecine, so they run at 1x but are actually 4x shuttered.
14.985pps with 4x speed-up would be 59.940fps[14.985pps] -- pictures would run fast by 4x (running time would be 1/4).
Quote:
For example 720p59.94 broadcast channels. The encoded stream is 59.94p. The content frame rate (pps if you want) in a given section might be 14.985fps consisting of 4x frame repeats.

If you decimated the duplicates, you would have the original 14.985fps
Okay, that's definitely 59.940fps[8[14.985pps]] -- 14.985pps with 2-to-8 field telecine.

Give me more use cases and I'll give you more examples of the notation.

To cite one of the most common use cases:
29.970fps[2-3[24pps]] is cinema (24pps) that's 2-3 telecined (8-to-10 field telecine, so 30 telecined pictures per second) inside 29.970fps (telecined picture rate = frame rate, so no speed-up). Well, actually 2-3[24pps] runs slow by 29.970/30 (a 'feature' that the notation exposes unambiguously).

If I write this conversion:
23.976fps[24pps] --> 29.970fps[2-3[24pps]]
isn't that an easier, more compact, and more precise way to characterize 2-3 telecine that runs x/1.001 slow than by using words to explain it?

Another example, this time interpolating to a higher picture rate:
24pps --> 120pps is 1-to-5 picture interpolation.
24fps[24pps] --> 120fps[120pps] is the same interpolation, but contained in frames.
24fps[120pps] would be x/5 slow motion.

You see, when you write "24p", I can't tell whether you mean 24fps (a frame rate) or 24pps (a picture rate).
But if I write "24fps", you know that's a frame rate. If I write "24pps", you know that's a picture rate. If I write "24fps[24pps]", you know that's 24 pictures per second in frames running at 24 frames per second, and you can conclude that the pictures are shown at normal rate. See? Simple, eh? Restricting communication to just "p" and "i" for everything leads to frame v. picture confusion that breaks understanding.

Let me state it another way so you understand. When you write "24p", you know what you mean (by its context), but I don't know what you mean because I don't yet understand the context -- the context is the thing you're trying to explain, otherwise, no one would misunderstand anything, eh? Using terms that rely on context when trying to explain the context (the use case) leads to frustration for both of us.
Quote:
Quote:
PS: Just a moment. Upon rereading I see you wrote "59.9sps[14.9pps]". What do you mean by that?
That was referring to a 29.97i stream, with 14.985 fps content.
Okay, let me stick with that for a bit... (I think it may be fruitful.)
By "a 29.97i stream", I think you mean 29.970 frames per second, right? Or do you actually mean 29.970 fields per second? ...At this point, I really don't know.
1 - If you mean 29.970 frames per second, then that's 29.970fps[??????pps].
2 - But if you mean 29.970 fields (i.e. scans) per second, then that's ??????fps[29.970sps].
And by "14.985 fps content", I think you mean 14.985 frames per second, right? Or do you actually mean "content" (meaning: pictures/scans)? ...At this point, I really don't know.
A - If you really do mean 14.985 fps, then that's simply 14.985fps[??????pps].
B - But if you really do mean content, then that's either ??????fps[14.985pps] or ??????fps[14.985sps].
And to be clear, by "picture" I mean 720x480 for example, and by "scan" I mean 720x240 for example.
(Now, technically, if the original frame's picture is progressive and has been separated into fields (deinterlaced if you wish), then the result is actually half-pictures, not scans, but I'm going to ignore that detail for the time being because what you wrote includes the letter "i" which leads me to believe you're referring to scans. Okay?)
So when attempting to understand what you wrote: "29.97i stream, with 14.985 fps content", I'm presented with 4 possibilities:
1A - The union of 29.970fps[??????pps] and 14.985fps[??????pps], or
1B - The union of 29.970fps[??????pps] and ??????fps[14.985sps], or
2A - The union of ??????fps[29.970sps] and ??????fps[14.985sps], or
2B - The union of ??????fps[29.970sps] and ??????fps[14.985sps].
Well, I can immediately toss out 2A and 2B because they are the same, and they both cite 'sps' with no 'fps' and those 'sps's conflict.
And, I can toss out 1A because a video can't be both 29.970fps and 14.985fps at the same time.
That leaves me with this:
1B - The union of 29.970fps[??????pps] and ??????fps[14.985sps].
At this point I think you refer to a video made by a field-scan camera (maybe a TV camera or camcorder) producing digital-NTSC frames. Am I right?
In that case, then the notation is 29.970fps[14.985sps].
If that is the case, and since the normal frame rate for 14.985sps would be 7.4925fps, then I'd say that you're referring to an interlaced video taken at 14.985sps that is sped up by 4x.

How'd I do?

18th October 2021, 02:19   #17
poisondeathray
Quote:
Originally Posted by markfilipak View Post
No, not 4x repeats; 4x picture rate.
Bear with me, this is at the core of our miscommunication regarding the notation. Trust me, it's simpler than you think.
14.985pps framed by the camera would be 14.985fps[14.985pps].
14.985pps with 4x repeats would be 59.940fps[8[14.985pps]] -- pictures have 2-to-8 field telecine, so they run at 1x but are actually 4x shuttered.
14.985pps with 4x speed-up would be 59.940fps[14.985pps] -- pictures would run fast by 4x (running time would be 1/4).
Ok I mostly get it now.

Quote:
Okay, that's definitely 59.940fps[8[14.985pps]] -- 14.985pps with 2-to-8 field telecine.
But it's encoded progressively as frames and broadcast as frames - so "field telecine" might be an inappropriate way to describe it.



There is definitely merit to the notation - but trust me - you're going to confuse many people.

If I were you, I would probably write a guide or "FAQS" with common examples - so people can understand what you mean.

The rest of the world is probably fine with the "15p in 60p" or "23.976p in 29.97i" style notation. It's easier and it already works. People don't like change - like PAR to SAR, or 29.97i to 59.94i. Video people on forums generally know what it means, because they deal with processing of various video.




Quote:
If I write this conversion:
23.976fps[24pps] --> 29.970fps[2-3[24pps]]
isn't that an easier, more compact, and more precise way to characterize 2-3 telecine that runs x/1.001 slow than by using words to explain it?
Yes, it is. But you're also going to lose some readers and confuse them with that notation, at least initially. And the "video" people already know what "film NTSC telecined" is and how you got from A to B. For new people - you're going to lose them too, or you're going to have to explain the process in words in addition to the notation anyways...

Quote:

You see, when you write "24p", I can't tell whether you mean 24fps (a frame rate) or 24pps (a picture rate).
But if I write "24fps", you know that's a frame rate. If I write "24pps", you know that's a picture rate. If I write "24fps[24pps]", you know that's 24 pictures per second in frames running at 24 frames per second, and you can conclude that the pictures are shown at normal rate. See? Simple, eh? Restricting communication to just "p" and "i" for everything leads to frame v. picture confusion that breaks understanding.
Personally, I don't write it as "24p". I usually write it as 24.0p or 24/1 (mainly to distinguish it from 23.976p - people often use "24p" when they mean 23.976p). "24p" can only mean 1 thing in common usage (it indicates both frame rate and picture rate). But I can see how that notation can be useful in some situations - e.g. if you're speeding up or slowing down. But you can just add "sped up to x" or "slowed down to x". That's how people communicate; it's clear and it works.


Quote:
Let me state it another way so you understand. When you write "24p", you know what you mean (by its context), but I don't know what you mean because I don't yet understand the context -- the context is the thing you're trying to explain, otherwise, no one would misunderstand anything, eh? Using terms that rely on context when trying to explain the context (the use case) leads to frustration for both of us.
Yes, but most video people "get it" and understand what is being said based on experience and the context of the topic. If you browse video forums, there is a way of communicating, and people (I mean the experienced regulars, not necessarily new members) understand it. Different forums have slightly different sub-cultures and slightly different ways of communicating, but for the most part it works, and people know what is being said. Part of your misunderstanding might be due to not dealing much with video or posting on forums.

But there is nothing wrong with explaining in different ways, or different words - I welcome it (but others might not as you've seen on some forums)


Quote:
By "a 29.97i stream", I think you mean 29.970 frames per second, right? Or do you actually mean 29.970 fields per second? ...At this point, I really don't know.
You might not know, but people that work with video definitely know...

"in 29.97i" means the stream is encoded as fields at 59.94 fields/second.
"in 29.97p" means the stream was encoded as frames at 29.97 frames/s
"in 59.94p" means the stream was encoded as frames at 59.94 frames/s

"i" means field encoding, "p" means frame encoding.

14.985p in 29.97i can only mean 1 thing to people here (and most that work with video)
14.985p in 59.94p can only mean 1 thing to people here (and most that work with video)

Quote:
So when attempting to understand what you wrote: "29.97i stream, with 14.985 fps content", I'm presented with 4 possibilities:
That pulled out of context alone might be ambiguous, but if you follow the conversation, you're partially quoting what was originally written - it was originally stated as "14.985p in 29.97i"

Quote:
Originally Posted by poisondeathray View Post
How would you distinguish between a video that's 14.985p in 29.97p (encoded progressively) vs. 14.985p in 29.97i (encoded interlaced as fields)? Or is that what sps is for? You never fully explained what the sps letters stand for. So would it be "29.9fps[14.9pps]" vs. "59.9sps[14.9pps]"? And 14.985p in 59.94p would be "59.94fps[14.9pps]"?

20th October 2021, 03:34   #18
markfilipak
Quote:
Originally Posted by poisondeathray View Post
Ok I mostly get it now.
The rest of the world is probably fine with the "15p in 60p" or "23.976p in 29.97i" style notation. It's easier and it already works. People don't like change - like PAR to SAR, or 29.97i to 59.94i. Video people on forums generally know what it means, because they deal with processing of various video.
Well, dear friend, that would be fine if everyone followed your example (and it would be fine with me), but they don't ...because it's not been formalized. I propose formalizing it, and where else but in Doom9, eh?
PAR, "picture aspect ratio" -- The MPEG folks never use the acronym "PAR", but they do call the macroblock data "picture".
SAR, "sample aspect ratio" -- That's what the MPEG folks call it.
Now, bear with me...
In the case of PAR & SAR, it doesn't really matter.
DAR (display AR) = PAR (picture AR) x SAR (sample AR), or
DAR (display AR) = PAR (pixel AR) x SAR (storage AR).
The equation is the same in either case. But I've seen this:
DAR (data AR), and that is wrong, but the people who write it defend it.

That's not really the same type of issue as with fps vs. pps/sps, is it?
29.97i might be interpreted as 29.97fps with scan interlaced fields, or
59.94i might be interpreted as 59.94 scans per second in 29.97fps frames (but not necessarily: they could be in 119.8fps frames or in 10fps frames).
Big difference there.
29.97fps[59.94sps] is unambiguous. It specifies both frame rate and original scan rate, and does so in a way that can't be misinterpreted, eh?

20th October 2021, 04:35   #19
poisondeathray
Quote:
Originally Posted by markfilipak View Post
Well, dear friend, that would be fine if everyone followed your example (and it would be fine with me), but they don't ...because it's not been formalized. I propose formalizing it, and where else but in Doom9, eh?
PAR, "picture aspect ratio" -- The MPEG folks never use the acronym "PAR", but they do call the macroblock data "picture".
SAR, "sample aspect ratio" -- That's what the MPEG folks call it.
Now, bear with me...
In the case of PAR & SAR, it doesn't really matter.
DAR (display AR) = PAR (picture AR) x SAR (sample AR), or
DAR (display AR) = PAR (pixel AR) x SAR (storage AR).
The equation is the same in either case. But I've seen this:
DAR (data AR), and that is wrong, but the people who write it defend it.
I've also seen FAR, as frame aspect ratio (dimensions of frame w:h).

DAR = FAR x SAR


Quote:
That's not really the same type of issue as with fps vs. pps/sps, is it?
29.97i might be interpreted as 29.97fps with scan interlaced fields, or
59.94i might be interpreted as 59.94 scans per second in 29.97fps frames (but not necessarily: they could be in 119.8fps frames or in 10fps frames).
Big difference there.
29.97fps[59.94sps] is unambiguous. It specifies both frame rate and original scan rate, and does so in a way that can't be misinterpreted, eh?

Yes, it is a better notation style, no argument here

The problem is that there are 2 people in the world who know what it means. It's a bit confusing at first. I know now, so I can help translate what you say, or help translate what other people say to you.

In your mind it's unambiguous, but you're assuming someone took the time to learn it. That notation might be interpreted as 29.97fps content in 59.94 fields/s, or as actual interlaced 59.94 fields/s content with a framerate of 29.97. It's quite easy to misinterpret the first time you see it.

The simpler your "readme" or "faqs" are, with examples, the higher the chance that someone might take the time to figure out what you're trying to say. It might end up like "betamax" vs. "vhs", where betamax was technically superior but wasn't as popular.

People here (or on other video-related forums) tend to use the "content" description for your "pps", or just describe it in words.

20th October 2021, 04:51   #20
markfilipak
Quote:
Originally Posted by poisondeathray View Post
Yes, it is a better notation style, no argument here
Well, thank you poisondeathray. That means a lot to me.
Quote:
The problem is that there are 2 people in the world who know what it means. It's a bit confusing at first. I know now, so I can help translate what you say, or help translate what other people say to you.
You are a helper and a giver. That's plain to see. You make my world go round.

Actually, I've seen other folks using the notation in the last few weeks. If it's going to catch on, better that it catch on slowly.

We both prefer showing by examples over pedantic explanations. Give me some simple and some difficult use cases in words and I'll try to 'translate' them, eh? That will help me. And other people who read this will catch on. People are pretty smart, especially people here.

PS: With the possible exception of implementing finite state machines in software, anything that requires much explanation isn't worth a sh*t.
