Editing in the compressed domain versus IDR frequency
neuron2
24th August 2008, 00:42
It's also worth remembering that open GOPs cause big problems for editing in the compressed domain, because you need to re-encode GOPs. That's another reason I like to have frequent IDRs.
bobololo
24th August 2008, 01:58
It's also worth remembering that open GOPs cause big problems for editing in the compressed domain, because you need to re-encode GOPs. That's another reason I like to have frequent IDRs.
Is that specific to open GOP? If I'm correct, re-encoding is required when editing as long as you're dealing with inter-frame prediction, isn't it? I mean, if I need to use a part of a sequence that starts in the middle of a GOP, I won't have much choice other than re-encoding the first trimmed GOP.
As I understand it, this is why most editing software usually imports assets into an intra-only mezzanine format instead of working directly on the original format.
neuron2
24th August 2008, 03:10
If you have frequent IDRs and are happy cutting with that granularity, then no re-encoding is needed. This is a typical operation when cutting commercials from transport streams. Four seconds can be too much for that application.
I have to say that all my applications are in the transport stream/broadcast domain, so my requirements may be different from the typical "ripper" type guy.
bobololo
24th August 2008, 03:55
If you have frequent IDRs and are happy cutting with that granularity, then no re-encoding is needed. This is a typical operation when cutting commercials from transport streams. Four seconds can be too much for that application.
Ah, maybe we're misunderstanding each other, because in that sense open GOP doesn't change anything with respect to editing :)
By any chance, were you understanding open GOP as long GOP (like a 10 s long GOP)? To me, open GOP means that some frames of a GOP are encoded in the bitstream after the first frame of the next GOP.
My initial point was that an IDR isn't required to start a GOP, and that requiring one would prevent the use of open GOP. Anyway, I agree with you that having a reasonable GOP size (i.e. no more than a few seconds) would be a good thing.
I have to say that all my applications are in the transport stream/broadcast domain, so my requirements may be different from the typical "ripper" type guy.
Same for me actually :)
neuron2
24th August 2008, 04:17
If you cut the GOP in front of an open GOP, then there will be nondecodable frames at the start of the remaining open GOP.
bobololo
24th August 2008, 06:22
If you cut the GOP in front of an open GOP, then there will be nondecodable frames at the start of the remaining open GOP.
It doesn't matter, since these frames are not intended to be displayed: they belong to the previous GOP (in presentation order) and can therefore be discarded.
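A minimal sketch of that discard rule, assuming a simplified model in which each frame carries only a type and a PTS. The `Frame` class and `drop_leading_frames` function are hypothetical names used for illustration, not part of any real editor or decoder:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # "I", "P" or "B"
    pts: int   # presentation timestamp, in arbitrary ticks


def drop_leading_frames(gop):
    """Return the frames of `gop` (given in decode order) that are still
    displayable once the preceding GOP has been cut away.

    Assumption: any frame presented before the GOP's I frame is a leading
    frame that references the removed GOP, so it cannot be decoded.
    """
    if not gop or gop[0].kind != "I":
        raise ValueError("expected a GOP that starts with an I frame")
    first_pts = gop[0].pts
    return [frame for frame in gop if frame.pts >= first_pts]


# Open GOP in decode order: the two B frames after the I frame are
# presented before it and belong, in presentation order, to the previous GOP.
gop = [Frame("I", 2), Frame("B", 0), Frame("B", 1),
       Frame("P", 5), Frame("B", 3), Frame("B", 4)]
kept = drop_leading_frames(gop)
```

Here only the frames with PTS 0 and 1 are dropped; everything from the I frame onward in presentation order is kept.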
neuron2
24th August 2008, 06:59
Please inform me of some editors that cut them away on a cut at an open GOP.
Sergey A. Sablin
24th August 2008, 12:42
Please inform me of some editors that cut them away on a cut at an open GOP.
This is really becoming quite OT, but anyway: there is no need to cut them away.
MPEG-2 uses the broken_link flag to discard such frames from the decoding process (and this flag was designed specifically for use in editing apps).
AVC has the same information in the recovery point SEI. So when a recovery point SEI is present, all that is needed is to set broken_link_flag to 1. Same as in MPEG-2, actually.
neuron2
24th August 2008, 18:15
This is really becoming quite OT, but anyway: there is no need to cut them away.
MPEG-2 uses the broken_link flag to discard such frames from the decoding process (and this flag was designed specifically for use in editing apps).
AVC has the same information in the recovery point SEI. So when a recovery point SEI is present, all that is needed is to set broken_link_flag to 1. Same as in MPEG-2, actually.
Yes, but then an editor, e.g., for transport streams, would have to cut away the corresponding audio or else every cut will desync the audio further and further. It's an idea that sounds OK, and it may be fine for editing elementary streams, but the main need is for editing containers, such as for commercial removal in transport streams. Am I missing something?
bobololo
24th August 2008, 19:49
Yes, but then an editor, e.g., for transport streams, would have to cut away corresponding audio or else every cut will desync the audio further and further. It's an idea that sounds OK, and it may be fine for editing elementary streams, but the main need is for editing containers, such as for commercial removal in transport streams. Am I missing something?
Are you assuming that such editors cut and join parts of the sequence at the TS packet level? If so, the only proper way, to me, is to cut/join at the splice points defined by the SCTE 35/DPI standards. However, these are mainly used in ATSC/CableLabs streams and are not very common, at least in European DVB transport streams.
Otherwise, I don't see how this could work properly without remuxing the elementary streams into a final TS. Indeed, we would have loads of discontinuities (continuity_counter, PCR, PTS, T-STD buffers, etc.) at the junctions, which would require the decoder device to restart itself, with the corresponding effects on the screen (freeze, glitch, etc.).
In the case where the editor actually remuxes the elementary streams, it shouldn't be so difficult to drop the unwanted access units.
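As a rough illustration of one of those discontinuities, the following sketch scans a buffer of 188-byte TS packets and reports every place where a PID's continuity_counter does not follow on from the previous packet of that PID. The function name `find_cc_breaks` is hypothetical, and the check is deliberately simplified: it ignores legal duplicate packets and the discontinuity_indicator in the adaptation field.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def find_cc_breaks(data: bytes):
    """Return a list of (packet_index, pid, expected_cc, found_cc) tuples,
    one for each packet whose continuity_counter breaks the sequence."""
    last_cc = {}  # most recent continuity_counter seen per PID
    breaks = []
    for index in range(len(data) // TS_PACKET_SIZE):
        pkt = data[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at packet {index}")
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)  # adaptation_field_control, payload bit
        cc = pkt[3] & 0x0F
        # The counter only advances on packets that carry payload,
        # and it is undefined for null packets.
        if pid == NULL_PID or not has_payload:
            continue
        if pid in last_cc:
            expected = (last_cc[pid] + 1) & 0x0F
            if cc != expected:
                breaks.append((index, pid, expected, cc))
        last_cc[pid] = cc
    return breaks
```

Run on a file produced by joining two segments at TS packet boundaries, a check like this would typically flag each junction, which is exactly what a strict decoder sees.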
neuron2
24th August 2008, 20:33
Well, I hear you, but people are trying to do such cuts on transport streams! The idea is to edit the transport stream and then demux audio and video for transcoding. That way the AV sync is retained.
Sergey A. Sablin
24th August 2008, 21:54
Yes, but then an editor, e.g., for transport streams, would have to cut away corresponding audio or else every cut will desync the audio further and further. It's an idea that sounds OK, and it may be fine for editing elementary streams, but the main need is for editing containers, such as for commercial removal in transport streams. Am I missing something?
If the video frames weren't cut away, then the decoder will output corrupted frames (if the broken link flag was not set) along with the audio. There should be no desync here, am I correct? (I'm a bit too lazy to draw diagrams; it's still the weekend here! :))
If the broken link flag is set, the video decoder should compensate for the skipped frames by changing the frame duration.
Problems should only appear if the playback side does not use timestamps for the elementary stream samples.
I might be quite mistaken, by the way, as I don't have much experience with this kind of editing.
neuron2
24th August 2008, 22:12
Problems should only appear if the playback side does not use timestamps for the elementary stream samples.
Yes. In the scenario I just gave (cutting TS then demuxing), timestamps are lost.
So the problem is to cut commercials in transport. Demuxing before cutting creates the problem of how to cut the audio and video in a corresponding way.
Any ideas, anyone? The only thing I can see is cutting at IDRs or recovery points, which is why I like frequent IDRs.
bobololo
24th August 2008, 22:20
Well, I hear you, but people are trying to do such cuts on transport streams! The idea is to edit the transport stream and then demux audio and video for transcoding. That way the AV sync is retained.
Why don't they index the TS using dgindex and do the cut/join thing with avisynth?
Anyway, if I had to write such a commercial removal tool, I would regenerate a proper output file (TS, mp4 or mkv) with corrected and continuous PTS. And I would take advantage of this process to remove unnecessary frames from open GOPs. After all, remuxing into any container is a very fast and light process compared to re-encoding, or even to demuxing the input TS (which is necessary to preview and set the mark in/out points).
Now that I understand your comment better, I definitely can't support the proposal to restrict open GOP in order to accommodate limited software :D (please don't take offense; this is just my technical and objective opinion).
Sergey A. Sablin
24th August 2008, 22:32
Yes. In the scenario I just gave (cutting TS then demuxing), timestamps are lost.
How come? I see how discontinuities might occur, but how could timestamps be lost?
neuron2
24th August 2008, 22:34
When you demux to elementary streams, there are no longer any PES packets and therefore no longer any PTS/DTS timestamps.
Sergey A. Sablin
24th August 2008, 22:41
When you demux to elementary streams, there are no longer any PES packets and therefore no longer any PTS/DTS timestamps.
How are video/audio renderers supposed to work in this case? How are they synchronized then? Do they count samples?
neuron2
24th August 2008, 22:48
Well, AVI has no timestamps! They just play the access units at the specified rates and that is sufficient to keep things in sync.
bobololo
24th August 2008, 22:52
Don,
Just imagine that you had to do the cut/join process within dgindex. Inside your application, once you've indexed the source TS, you know each access unit's DTS/PTS exactly. You should then be able to create a new TS based on a provided edit list. For each segment, you have to compensate the original DTS/PTS pairs with an offset that accounts for the parts that were cut. I hope I'm clear enough :)
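A minimal sketch of that offset compensation, assuming the edit list is given as (start, end) ranges in original DTS ticks (end exclusive, e.g. 90 kHz units) and each access unit is a (dts, pts) pair. The names `segment_offsets` and `retime` are hypothetical, and the sketch ignores 33-bit timestamp wrap-around and the dropping of leading open-GOP frames:

```python
def segment_offsets(segments):
    """For each kept segment, return the amount to subtract from the
    original timestamps so that the output timeline is continuous."""
    offsets = []
    out_pos = 0  # where the next kept segment starts on the output timeline
    for start, end in segments:
        if end <= start:
            raise ValueError("segment end must be after its start")
        offsets.append(start - out_pos)
        out_pos += end - start
    return offsets


def retime(access_units, segments):
    """Keep the access units whose DTS falls inside a kept segment and
    shift both DTS and PTS by that segment's offset."""
    offsets = segment_offsets(segments)
    out = []
    for dts, pts in access_units:
        for (start, end), offset in zip(segments, offsets):
            if start <= dts < end:
                out.append((dts - offset, pts - offset))
                break
    return out


# Keep [0, 9000) and [27000, 36000); the 18000 ticks in between are cut.
segments = [(0, 9000), (27000, 36000)]
units = [(0, 3600), (3600, 7200), (9000, 12600), (27000, 30600), (30600, 34200)]
retimed = retime(units, segments)
```

The same offset is applied to DTS and PTS within a segment, so the decode-to-presentation delay of each access unit is preserved, and audio access units can be shifted by the same per-segment offsets.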
neuron2
24th August 2008, 22:56
[My name is Don.]
I know I can (with some difficulty) write a tool to implement that. I'm just saying that existing TS editors don't do this (that's why I asked you to name any that do). So for them to work properly, frequent seek points are needed.
bobololo
24th August 2008, 23:04
[My name is Don.]
I know I can (with some difficulty) write a tool to implement that. I'm just saying that existing TS editors don't do this (that's why I asked you to name any that do). So for them to work properly, frequent seek points are needed.
Why not suggest to these app authors that they do it the right way?
PS: The editing software I know (Premiere, Avid & FCP), in the versions I tried (this may have changed since then), all works on a mezzanine format (HDV, uncompressed, DNxHD, etc.) and not on the original format.
Sergey A. Sablin
24th August 2008, 23:09
Well, AVI has no timestamps! They just play the access units at the specified rates and that is sufficient to keep things in sync.
That's bad, that's all I can say. I'm not much help here, I know, but if timestamps are not available, then yes, you're not able to handle open GOPs correctly.
neuron2
24th August 2008, 23:20
PS: The editing software I know (Premiere, Avid & FCP), in the versions I tried (this may have changed since then), all works on a mezzanine format (HDV, uncompressed, DNxHD, etc.) and not on the original format.
But that means you have to re-encode your stream! The equivalent in my world would be to just demux the original stream and serve both video and audio into VirtualDub, from where you can do cuts to both the audio and video at once.
But the goal is to have a TS and edit it directly, creating a workable TS that they can play just as the original, without re-encoding. I suppose it is an unrealistic goal in the absence of frequent seek points.
bobololo
24th August 2008, 23:47
But that means you have to re-encode your stream! The equivalent in my world would be to just demux the original stream and serve both video and audio into VirtualDub, from where you can do cuts to both the audio and video at once.
But the goal is to have a TS and edit it directly, creating a workable TS that they can play just as the original, without re-encoding. I suppose it is an unrealistic goal in the absence of frequent seek points.
Absolutely true. And with regard to open GOP, there is still the very hackish approach of filtering out the TS packets that hold the unwanted frames (provided the video frames are PES-aligned, which is the case in DVB).
neuron2
24th August 2008, 23:49
You'd have to remove the corresponding audio. It's not a simple thing.
bobololo
24th August 2008, 23:56
You'd have to remove the corresponding audio. It's not a simple thing.
No, that's unnecessary: audio and video will still remain in sync because the audio frame PTS values are correct. At worst, you may have a little bit of audio "preroll", but nothing more.
Sergey A. Sablin
25th August 2008, 00:01
No, that's unnecessary: audio and video will still remain in sync because the audio frame PTS values are correct. At worst, you may have a little bit of audio "preroll", but nothing more.
Correct me if I'm wrong, but Don said there are no timestamps after remuxing in these systems, so you actually won't know whether this is preroll or the beginning of the real audio.
neuron2
25th August 2008, 00:05
No, that's unnecessary: audio and video will still remain in sync because the audio frame PTS values are correct. At worst, you may have a little bit of audio "preroll", but nothing more.
Not if you are going to demux the streams after the TS cutting.
Sergey A. Sablin
25th August 2008, 03:32
Well, there is still room for a hack, but at a deeper level: if the system does not provide timing on samples (which is just silly, given that MPEG-2 was introduced in 1993 with frame reordering) and everybody knows about it, then the video decoder can repeat the I frame as many times as there are frames to skip (i.e. those from the previous GOP), which prevents the desync from propagating. If even the video decoders don't bother to handle this, then I second the blame on such ridiculously ancient editing apps and their decoders altogether.
This whole situation looks pretty silly, given that there is obviously strong demand for a working solution and no solution at all from either side, neither the system layer nor the decoders. Something must definitely be wrong...
bobololo
25th August 2008, 23:10
Hmm, I feel a bit lost at this point after your latest comments. I've probably missed something.
So, to recap the process: we have an off-the-air TS capture file that contains commercials that we would like to remove.
For this purpose, we're cutting and joining different segments of the source file by operating on TS packet boundaries. And there is no doubt that a finer editing granularity is achieved thanks to regular random access points.
And this is where I'm confused: what is the final goal?
1. Are we trying to use this output TS file as it is, i.e. is it intended to be played back directly?
2. Or is this output TS file an intermediate stage of a process that aims to generate two ES (audio and video) that are synchronized, of the same duration and, of course, commercial-free?
In case (1), thanks to the broken link signaling (useless frames aren't displayed) and the PTS present in the TS (audio and video are synchronized), a player can play the file correctly.
In case (2), I think there is an audio/video sync concern, and it occurs whether or not we have open GOP. Indeed, audio and video frame durations are neither identical nor multiples of each other. Moreover, audio PES packets can contain several audio frames.
Let me give you an example and, for simplicity, let's consider that we don't have open GOP. If I cut the TS at a time corresponding to the start of a video frame (a random access point), I can at best cut the audio at the closest audio PES packet start. In any case, I'll never have the same duration of video and audio data. I'll have either an excess of audio, if the closest audio PES is before the video frame in time, or a lack of audio otherwise.
If I repeat this for each segment and am pretty unlucky, I can easily accumulate the time shifts between audio and video at each segment boundary, which would result in an uncontrolled A/V sync error. So how can this work?
I've surely missed something here :)
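A small sketch of how that error can accumulate once timestamps are gone, under stated assumptions: times are in 90 kHz ticks, video cuts fall exactly on video frame boundaries, audio can only be cut at multiples of `audio_unit` (the duration of one audio PES packet), and audio PES boundaries sit on a fixed grid starting at 0. The names `snap` and `cumulative_drift` are hypothetical, and the model covers only the granularity mismatch, not mux interleaving:

```python
def snap(t, unit):
    """Round t to the nearest multiple of unit (ties round up)."""
    return ((t + unit // 2) // unit) * unit


def cumulative_drift(segments, audio_unit):
    """For each kept segment (start, end), compare the video duration kept
    with the audio duration kept when the audio is cut at the nearest
    audio unit boundary. Return the running audio-minus-video total after
    each segment; positive means excess audio, negative means missing audio."""
    drift = 0
    history = []
    for start, end in segments:
        video = end - start
        audio = snap(end, audio_unit) - snap(start, audio_unit)
        drift += audio - video
        history.append(drift)
    return history


# Assumed numbers: 25 fps video (3600 ticks per frame); audio frames of
# 1152 samples at 48 kHz (2160 ticks), four frames per PES packet.
VIDEO_FRAME = 3600
AUDIO_PES = 4 * 2160
segments = [(0, 250 * VIDEO_FRAME), (1000 * VIDEO_FRAME, 1300 * VIDEO_FRAME)]
drift = cumulative_drift(segments, AUDIO_PES)
```

Each entry of `drift` is what a timestamp-free player would see as the A/V offset after that join; with timestamps carried through, the same per-segment mismatch is known and can be corrected at every boundary instead of accumulating.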
Sergey A. Sablin
26th August 2008, 20:35
If I repeat this for each segment and am pretty unlucky, I can easily accumulate the time shifts between audio and video at each segment boundary, which would result in an uncontrolled A/V sync error. So how can this work?
There can only be two solutions here:
1. The demuxer outputs A/V in sync: video samples that have no corresponding audio are not delivered, and vice versa.
2. There is timing in the system that every component passes through, so the muxer receives the timing from the demuxer; in this case the muxer is responsible for synchronization, provided that the timestamps are correct.
Case 1 seems to be far from reality, while case 2 is how all this is actually supposed to work.