2nd January 2018, 16:40 | #1
Registered User
Join Date: Mar 2006
Location: Shanghai, China
Posts: 203
Confused by PTS, DTS and CTS in MP4 and FLV
Hello, I have recently been learning about the FLV and MP4 containers and am confused by their timestamps.
A long time ago I came into contact with XviD in AVI and heard about packed stream. AVI is said to have no B-frame support and to require the decoder to output one frame for each packet it receives. To satisfy that requirement, a hack called packed stream is used to get around the limitation: it places a P-frame and the B-frame that depends on it in the same packet. The packets are thus stored in the order [I] [PB] [B] [] [PB] [B] []... so that one packet of input yields one frame of output.

Time flies, and now it is the age of MP4. I work on FLV streaming at my job, and I have learned that MP4 and FLV store a DTS and a CTS offset in their headers. The well-known FFmpeg package includes ffprobe, which can dump the packet information of a stream and reports a PTS and a DTS for each packet. It treats DTS + CTS offset as the PTS. I have several ideas that I would like to confirm.

In the CFR (constant frame rate) case, I know that PTS is "[frame number] * [frame interval]" where the frame number is counted in presentation order, and DTS is "[frame number] * [frame interval]" where the frame number is counted in decoding order. For example, take a bit stream
Code:
I P B B P B B P B B
Code:
      I    P    B    B    P    B    B    P  ...
DTS   0   40   80  120  160  200  240  280  ...
CTS   0  120   40   80  240  160  200  360  ...
Code:
      I    P    B    B    P    B    B    P  ...
DTS   0   40   80  120  160  200  240  280  ...
CTS  40  160   80  120  280  200  240  400  ...

But I don't know which of them is correct. As I understand it, DTS stands for decoding timestamp and PTS for presentation timestamp. The former is the time at which the packet should be fed to the decoder, and the latter is the time at which the decoded frame should be shown. (Is that right?) This seems very important for hardware players, which need to know how to manage their input and output buffers. If CTS means output and DTS means input (is that right?), the actions sorted by time will be
Code:
action          buffer   present
I (DTS: 0)      I
I (CTS: 40)              I
P (DTS: 40)     P        I
B (DTS: 80)     P B      I
B (CTS: 80)     P        I B
B (DTS: 120)    P B      I B
B (CTS: 120)    P        I B B
P (CTS: 160)             I B B P
P (DTS: 160)    P        I B B P
B (DTS: 200)    P B      I B B P
B (CTS: 200)    P        I B B P B
B (DTS: 240)    P B      I B B P B
B (CTS: 240)    P        I B B P B B
P (CTS: 280)             I B B P B B P
...             ...      ...

By this reasoning, as long as the input order, the output order and the CTS values stay unchanged, modifying the DTS and the CTS offset will not affect the final playback. (Is that right?) I can imagine two situations:

1. A hardware player will run out of internal buffer if I feed it an MP4 file whose DTS values at the beginning are very small, like this:
Code:
      I    P    B    B    P    B    B    P  ...
DTS   0    1    2    3    4    5    6  280  ...
CTS  40  160   80  120  280  200  240  400  ...

2. I can assign timestamps the way packed stream does, like this:
Code:
      I    P    B    B    P    B    B    P  ...
DTS   0   79   80  120  199  200  240  319  ...
CTS   0  120   40   80  240  160  200  360  ...

The trouble comes with VFR. I have seen many tools that can import timecodes into a track of an MP4 file: tc4mp4, mp4fpsmod, dtsedit, dtsrepair and lsmash-timelineeditor. They seem to generate output files with different DTS values. If the idea I came up with is right, all of those outputs are correct. But I am not confident in the idea.

I think the concept of DTS and CTS is the same in MP4 and FLV. The open-source project flv.js (https://github.com/bilibili/flv.js which is a library for FLV playback in HTML5-compatible browsers) computes a video sample's duration as [next frame's dts] - [current frame's dts]. (https://github.com/Bilibili/flv.js/b...emuxer.js#L359 ) I think the duration should be computed from CTS (or PTS), because that is the presentation timestamp. (Is that right?) The interval at which packets are fed to the decoder should have nothing to do with a sample's duration. However, the MP4 standard (http://standards.iso.org/ittf/Public...96-12_2015.zip ) mentions that
Quote:
Quote:
So the DTS is not only the time at which the packet is sent to the decoder; it also affects how long the sample lasts. That raises a large number of questions, such as "what happens if the duration does not match the PTS interval?" and "which of the tools that import timecodes into MP4 tracks follows the standard best?"

Nearly all players are built to tolerate various kinds of incorrect streams, so it is not easy to check whether the idea I described is correct: they are likely to give correct output even for incorrect input. But some of them, such as the flv.js implementation, may produce incorrect A/V sync for incorrect input.

I am currently working on live broadcasting of FLV streams, and I think it is important to push a correct stream to the server. Mobile phones have limited resources and will not produce a CFR stream. In addition, Android's "MediaCodec" API gives no DTS for the packets it outputs, so I must fill it in myself.

Thanks very much.

Last edited by leiming2006; 2nd January 2018 at 18:07.
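To make the timestamp relationships in the post concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: all function names are hypothetical, the values are milliseconds at 25 fps for seven samples in decode order (I P B B P B B), and `derive_dts` is one possible heuristic for filling in a missing DTS from PTS alone. It is not taken from ffprobe, flv.js, MediaCodec or the MP4 standard.

```python
from typing import List


def pts_from_offsets(dts: List[int], cts_offset: List[int]) -> List[int]:
    """PTS of each sample in decode order: DTS plus its CTS offset."""
    return [d + o for d, o in zip(dts, cts_offset)]


def decode_deltas(dts: List[int]) -> List[int]:
    """Gaps between consecutive DTS values, i.e. durations derived from DTS."""
    return [b - a for a, b in zip(dts, dts[1:])]


def presentation_deltas(pts: List[int]) -> List[int]:
    """Gaps between consecutive frames after sorting into presentation order."""
    ordered = sorted(pts)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def derive_dts(pts: List[int]) -> List[int]:
    """Assumed heuristic: reuse the sorted PTS values as DTS, shifted earlier
    by the smallest delay that keeps DTS <= PTS for every sample."""
    ordered = sorted(pts)
    delay = max(s - p for s, p in zip(ordered, pts))
    return [s - delay for s in ordered]


# PTS in decode order for I P B B P B B (the second table in the post).
pts = [40, 160, 80, 120, 280, 200, 240]

dts = derive_dts(pts)                         # [0, 40, 80, 120, 160, 200, 240]
offsets = [p - d for p, d in zip(pts, dts)]   # [40, 120, 0, 0, 120, 0, 0]

print(dts)
print(offsets)
print(pts_from_offsets(dts, offsets) == pts)  # True: PTS = DTS + CTS offset
print(decode_deltas(dts), presentation_deltas(pts))

# The "small DTS at the beginning" case from the post: same PTS, squeezed DTS.
squeezed = [0, 1, 2, 3, 4, 5, 6]
print(decode_deltas(squeezed))                # [1, 1, 1, 1, 1, 1]
```

With the regular DTS, the DTS deltas and the presentation intervals are both 40 ms, so either definition of sample duration gives the same answer. With the squeezed DTS they disagree (1 ms against 40 ms), which is the case where a duration computed from DTS and one computed from PTS would differ.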