Split Blu-ray (sup or xml+pngs) subtitles?


pepelugil
6th February 2014, 20:28
Hello,

Is there any tool (free or paid) that lets you choose split points to create multiple subtitle files from a whole-movie subtitle stream, in order to create subtitles for a seamless-branching movie?

I have Lemony Pro but it seems that this cannot be done with it.

Thank you.

pball
24th August 2015, 00:05
I'd also like to know if there is a tool that can split pgs (.sup) subtitle files.

I'm currently using mkvmerge's splitting option, which ends up splitting the .sup track too. But this seems like a pretty lousy workaround, as it takes time to create the split MKV files and then demux the .sup files from them. I also believe this method doesn't always split the subs at the exact frame specified, because mkvmerge splits the video at the nearest keyframe (or something like that) rather than always at the specified frame.

minhjirachi
24th August 2015, 05:02
I have the same question: I don't know how to split them in order to create seamless branching.

DarkSpace
24th August 2015, 09:18
You could try using ffmpeg:

ffmpeg -i INPUT.sup -c:s copy -ss STARTTIME -to ENDTIME OUTPUT.sup


I don't know if this will work, but since ffmpeg only needs to modify the timing information this way, rather than having to re-encode to sup (which it can't), it might work...

Ghitulescu
24th August 2015, 09:29
I believe it can be done by splitting the main movie at the required positions, then ripping the subtitles from each segment.

pball
25th August 2015, 00:58
DarkSpace:
That didn't work for me. ffmpeg threw an error about not finding codec parameters, and it also mentioned the mp3 format somewhere in the output.

Ghitulescu:
That sounds like the method I described using mkvmerge to create a set of split mkv files and demux the subs from them. That works but the timing is thrown off because the video does not split at the exact time specified.

Unless there is a better method you could suggest.

sneaker_ger
25th August 2015, 07:52
Quote (pball): That sounds like the method I described using mkvmerge to create a set of split mkv files and demux the subs from them. That works but the timing is thrown off because the video does not split at the exact time specified.
Turn off all video and audio tracks when muxing; then mkvmerge will not be limited to the keyframes. But mkvmerge will never cut a subtitle line into two parts. Selur has written a separate app because of that limitation (I have not tested it):
SECut filename:<filename> cut:<ms-ms>[;ms-ms][...] outfilename:<output file name>
https://www.sendspace.com/file/jynqgz
http://forum.gleitz.info/showthread.php?46462-Bild-basierte-Untertitel-(idx-sub-pgs)-schneiden&p=439066&viewfull=1#post439066

minhjirachi
25th August 2015, 10:42
Quote (sneaker_ger): SECut filename:<filename> cut:<ms-ms>[;ms-ms][...] outfilename:<output file name>

Do you have any example commands? I have tested with this command:

SECut filename:sample.sup cut:1500ms outfilename:part1.sup

But the output file is just 0 KB.

sneaker_ger
25th August 2015, 11:02
Not tested, but I assume you always need to use a range, e.g. 0ms-1500ms, to get your first file. Then run it again for the next part, etc.
Are you sure you want only 1500ms? That's only 1.5 seconds...
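Since each SECut run writes one file, splitting a whole stream into parts means one run per range. A hypothetical Python helper that builds those command lines from a list of boundaries, using only the syntax quoted above; times are plain milliseconds without the "ms" suffix (the form pball later reports working). It has not been tested against SECut itself.

```python
def secut_commands(source, boundaries_ms, out_pattern="part{n}.sup"):
    """Hypothetical helper: one SECut argument list per consecutive pair of
    boundaries, e.g. [0, 20000, 500000] -> parts 0-20000 and 20000-500000."""
    commands = []
    for n, (start, end) in enumerate(zip(boundaries_ms, boundaries_ms[1:]), start=1):
        if end <= start:
            raise ValueError(f"part {n}: end {end} is not after start {start}")
        commands.append([
            "SECut",
            f"filename:{source}",
            f"cut:{start}-{end}",  # plain milliseconds, no "ms" suffix
            f"outfilename:{out_pattern.format(n=n)}",
        ])
    return commands


for cmd in secut_commands("Vietnamese.sup", [0, 20000, 500000]):
    print(" ".join(cmd))
# SECut filename:Vietnamese.sup cut:0-20000 outfilename:part1.sup
# SECut filename:Vietnamese.sup cut:20000-500000 outfilename:part2.sup
```

Each argument list can be passed as-is to subprocess.run, which also copes with file names containing spaces.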

DarkSpace
25th August 2015, 11:23
Quote (pball): That didn't work for me. ffmpeg threw an error about not finding codec parameters, and it also mentioned the mp3 format somewhere in the output.


I see. A bit of research showed me that ffmpeg apparently can't write raw pgs files. I'm sorry about that, it was just an idea.
I can't explain the bit about mp3, though...

minhjirachi
25th August 2015, 13:12
Quote (sneaker_ger): Are you sure you want only 1500ms? That's only 1.5 seconds...

Just for testing.

P.S.: I tried again but got the same problem. Here is my command:

SECut filename:Vietnamese.sup cut:20000ms-500000ms outfilename:Test1.sup

pball
26th August 2015, 00:40
I was able to get that program to work, mostly. I used the following command and it made a split sup file. The issue is that the time codes for the subs aren't adjusted, so the episode 2 subs still start at 22m and the episode 3 subs at 44m. I checked by loading the .sup file into a sub OCR program.

E:\Pics>secut.exe filename:"T5_Subtitle - English.sup" cut:2672671-4008004 outfilename:"ep3 subs.sup"

I have an idea of how .sup files are structured thanks to mosu, the mkvmerge dev, so I'm going to try to write my own program to split .sup files. I'll post back if I make any progress; I'm not exactly a first-rate programmer lol.

minhjirachi:
Try it without the "ms" on the times.
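A minimal sketch of the missing step (shifting the kept subtitles back to 0), assuming the segment layout nevcairiel describes later in the thread: a 13-byte header of "PG", 32-bit PTS, 32-bit DTS, 1-byte type and 2-byte size, with timestamps on a 90 kHz clock. It keeps only display sets whose 0x16 (TIMES) segment starts inside the cut and subtracts the cut start from every PTS and from non-zero DTS values. The function names are hypothetical; this is not SECut's or pball's code. Subtitles that straddle a cut point are simply dropped, and 32-bit timestamp wraparound is ignored.

```python
import struct

# Assumed SUP segment header: "PG", PTS, DTS, type, payload size (big-endian, 13 bytes)
HEADER = struct.Struct(">2sIIBH")
PCS_TYPE = 0x16     # "TIMES" section: carries the subtitle's start time
END_TYPE = 0x80     # closes one display set
TICKS_PER_MS = 90   # 90 kHz clock


def read_display_sets(data):
    """Split raw .sup bytes into display sets: lists of (pts, dts, type, payload)."""
    sets, current, pos = [], [], 0
    while pos + HEADER.size <= len(data):
        magic, pts, dts, seg_type, size = HEADER.unpack_from(data, pos)
        if magic != b"PG":
            raise ValueError(f"missing PG magic at offset {pos}")
        start = pos + HEADER.size
        current.append((pts, dts, seg_type, data[start:start + size]))
        pos = start + size
        if seg_type == END_TYPE:
            sets.append(current)
            current = []
    return sets  # a trailing set without an END segment is discarded


def cut_and_rebase(data, start_ms, end_ms):
    """Hypothetical helper: keep display sets starting in [start_ms, end_ms)
    and shift their timestamps so the cut starts at 0."""
    start, end = start_ms * TICKS_PER_MS, end_ms * TICKS_PER_MS
    out = bytearray()
    for ds in read_display_sets(data):
        # Use the TIMES (0x16) segment's PTS; fall back to the first segment.
        ds_pts = next((pts for pts, _, t, _ in ds if t == PCS_TYPE), ds[0][0])
        if not start <= ds_pts < end:
            continue
        for pts, dts, seg_type, payload in ds:
            new_pts = max(pts - start, 0)
            new_dts = max(dts - start, 0) if dts else 0  # DTS 0 = "unset", keep it
            out += HEADER.pack(b"PG", new_pts, new_dts, seg_type, len(payload))
            out += payload
    return bytes(out)
```

For pball's episode 3 range this would be called as cut_and_rebase(data, 2672671, 4008004) on the bytes of the whole .sup file, with the result written to a new file.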

minhjirachi
26th August 2015, 04:42
Quote (pball): Try it without the "ms" on the times.

Thank you,
That worked. But now I have the other problem you mentioned: the timecodes don't start from the beginning of each video file. :(

pball
27th August 2015, 01:41
Well, I have the source code for both SECut and mkvmerge, which both deal with splitting sup files. So together with my expert programmer friend, we should be able to put something together that works. Hopefully I'll have an update this weekend.

hubblec4
28th August 2015, 21:08
Hi pball

Sounds good, what you want to do.

What happens when the split point falls within a sup's display time?

Sup splitting works very well for me with mkvmerge, but not in the case I'm asking about.

Small example:

Subtitle 1 starts at 10 sec and ends at 14 sec (display time 4 sec).

The split should be at 12 sec.

mkvmerge cuts fine, but the subtitle loses its display time information (PGS has a start time and a duration, not an end time, I think).
When playing the mkv, the subtitle appears at 10 sec but stays on screen for the rest of the mkv, and none of the following sups are shown. Only a "jump" in the timeline stops the display of subtitle 1.

Can you keep this problem in mind?
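The timing side of hubblec4's case can be sketched as plain interval arithmetic. Assuming each subtitle's start and end are already known in seconds, a subtitle that overlaps a cut is clipped into both parts, and each part's times are shifted to start at 0. Turning the clipped intervals back into PGS (e.g. an extra display set at the cut that clears the picture, and a repeated display set at 0 in the next part) is not shown; split_intervals is a hypothetical name.

```python
def split_intervals(subs, cut_points):
    """Clip display intervals to each part between cut points.

    subs: (start, end) pairs in seconds; cut_points: sorted split times in seconds.
    Returns one list per part, with times shifted so each part starts at 0.
    """
    bounds = [0.0, *cut_points, float("inf")]
    parts = []
    for lo, hi in zip(bounds, bounds[1:]):
        part = []
        for start, end in subs:
            s, e = max(start, lo), min(end, hi)
            if s < e:  # the subtitle is on screen during this part
                part.append((s - lo, e - lo))
        parts.append(part)
    return parts


# hubblec4's example: shown 10-14 s, split at 12 s
print(split_intervals([(10.0, 14.0)], [12.0]))
# [[(10.0, 12.0)], [(0.0, 2.0)]]
```

The straddling subtitle ends up in both parts: shown until the cut in part 1, and from 0 to 2 s in part 2.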

pball
29th August 2015, 04:29
I can try, but that sounds like an odd case to me. Personally, I'm only going to be cutting subtitles when multiple episodes' worth are in a single file (because of how the Blu-ray was made). To start, I'm going to focus on splitting between subtitles, since that is supposedly easy and I don't fully understand how .sup files work yet. Anything more will come after that. I'm also going to make a dedicated thread when I have something to show.

hubblec4
29th August 2015, 10:25
Ok thanks.

I'm sure that if you try this, you can get help with that problem from Mosu or other developers.

Could you have a look at the SUP specs? (Sorry, I don't have them.)

sneaker_ger
29th August 2015, 10:41
It's a known problem, Mosu said he won't fix it. (But that was a long time ago, maybe he has forgotten and you can convince him now.)

hubblec4
29th August 2015, 20:48
Quote (sneaker_ger): It's a known problem, Mosu said he won't fix it. (But that was a long time ago, maybe he has forgotten and you can convince him now.)

I asked him by PM within the last 2 months, so I think he knows about it :-).

He said he could implement it, but not now, only in the distant future.

@pball

The "solution" must cut the subtitle and write new timestamps and durations.

pball
31st August 2015, 00:44
Well, for anyone hitting F5 on this thread, I have news. I have code that reads a .sup file and gets the time codes. I still have to write the code that writes the newly split .sup files and adjusts the time codes as needed. It won't be hard at all, but it'll take a bit of time to get it together. So my next post should be a new thread for my script.

hubblec4
31st August 2015, 09:33
wow, good news.

EDIT: Have you found any specs for PGS/sup?

nevcairiel
31st August 2015, 10:06
.sup files are pretty simple, I don't know if there is a formal spec somewhere, but this is how they work:

Every Frame has a 10+3 byte header (10+3 because 3 bytes are technically part of the PGS subtitle data already), all values stored in Big-Endian:
- 2 bytes magic number (0x5047)
- 4 bytes PTS (Presentation Timestamp)
- 4 bytes DTS (Decoding Timestamp)
- 1 byte type
- 2 bytes size (excluding header)

The first 10 bytes belong to the SUP format; the type and size fields start the actual subtitle data and should be preserved for the subtitle decoder.

... followed by the subtitle data of the specified size
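A minimal Python sketch of reading one such header with the standard struct module; the format string ">2sIIBH" is just the layout above (big-endian: 2-byte magic, 4-byte PTS, 4-byte DTS, 1-byte type, 2-byte size, 13 bytes in total). The sample bytes are the first header from pball's example further down the thread; parse_header is an illustrative name.

```python
import struct

# Layout from the post above, big-endian, 13 bytes in total:
# 2s = magic b"PG" (0x50 0x47), I = PTS, I = DTS, B = type, H = payload size
SUP_HEADER = struct.Struct(">2sIIBH")


def parse_header(buf, offset=0):
    """Decode the segment header starting at buf[offset]."""
    magic, pts, dts, seg_type, size = SUP_HEADER.unpack_from(buf, offset)
    if magic != b"PG":
        raise ValueError(f"bad magic {magic!r} at offset {offset}")
    return {"pts": pts, "dts": dts, "type": seg_type, "size": size}


# First header of pball's example later in the thread
print(parse_header(bytes.fromhex("5047 0001E3E0 00000000 16 0013")))
# {'pts': 123872, 'dts': 0, 'type': 22, 'size': 19}
```

The payload of "size" bytes then starts at offset + 13, and the next header follows directly after it.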

sneaker_ger
31st August 2015, 10:13
PTS and DTS for a raw subtitle format?

nevcairiel
31st August 2015, 10:18
Since the main source of PGS is m2ts, someone probably decided to keep both timestamps around for easier reconstruction of an m2ts later.
Note that many tools write a DTS of 0 when demuxing from other sources (like mkv), so a DTS of 0 should be considered "unset" for compatibility reasons.

hubblec4
1st September 2015, 08:31
Quote (nevcairiel): .sup files are pretty simple, I don't know if there is a formal spec somewhere, but this is how they work:

Every Frame has a 12 byte header, all values stored in Big-Endian:
- 2 bytes magic number (0x5047)
- 4 bytes PTS (Presentation Timestamp)
- 4 bytes DTS (Decoding Timestamp)
- 2 bytes size (excluding header)

.. followed by the subtitle data of the specified size

Thanks for this info, nevcairiel.

Can you explain it in a bit more detail?

1. Magic number -> does every new sub start there?

These 6 lines are from a hex editor view of a sup with one subtitle (the sub is empty). There is more than one "magic number" (50 47).
BDSup2Sub (5.1.2) decodes "frame 1/1" once, while BDSup2Sub++ decodes "frame 1/1" three times.


50 47 00 00 49 50 00 00 2C 29 16 00 13 07 80 04
38 10 00 00 80 00 00 01 00 00 00 40 00 00 03 0A
50 47 00 00 42 F1 00 00 2C 29 17 00 0A 01 00 00
00 03 0A 07 80 01 2E 50 47 00 00 2C 29 00 00 00
00 14 00 07 00 00 00 10 80 80 FF 50 47 00 00 38
E7 00 00 2C 29 15 07 1F 00 00 00 C0 00 07 18 07
... ...


2. Presentation Timestamp -> is this the start time when the sub is shown?

In which format is the time stored? In the short example the PTS is 00 00 49 50.

3. Decoding Timestamp -> is this the display duration?

Same question: which time format?
But I see that a sup file extracted with eac3to has 00 00 00 00 as the DTS almost everywhere.


4. Size -> should be clear :-)


But a sup contains more than these 4 pieces of info:
Forced Caption
Position top
Position left
etc.

Where is this info stored?
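Walking hubblec4's hex dump segment by segment with the 10+3 byte header from nevcairiel's post shows why there are several 50 47 markers: each one starts a separate segment (types 0x16, 0x17, 0x14, 0x15) of the same subtitle, not a new subtitle. A minimal Python sketch; the dump is copied from the post, and because its last segment is cut off, that segment is flagged as truncated.

```python
import struct

HEADER = struct.Struct(">2sIIBH")  # magic, PTS, DTS, type, payload size (13 bytes)

# The hex dump from hubblec4's post; the last segment is cut off ("... ...").
DUMP = """
50 47 00 00 49 50 00 00 2C 29 16 00 13 07 80 04
38 10 00 00 80 00 00 01 00 00 00 40 00 00 03 0A
50 47 00 00 42 F1 00 00 2C 29 17 00 0A 01 00 00
00 03 0A 07 80 01 2E 50 47 00 00 2C 29 00 00 00
00 14 00 07 00 00 00 10 80 80 FF 50 47 00 00 38
E7 00 00 2C 29 15 07 1F 00 00 00 C0 00 07 18 07
"""


def walk_segments(data):
    """Yield (offset, pts, dts, type, size, complete) for each segment header."""
    pos = 0
    while pos + HEADER.size <= len(data):
        magic, pts, dts, seg_type, size = HEADER.unpack_from(data, pos)
        if magic != b"PG":
            raise ValueError(f"lost sync at offset {pos}")
        complete = pos + HEADER.size + size <= len(data)
        yield pos, pts, dts, seg_type, size, complete
        pos += HEADER.size + size  # the next "PG" follows the payload directly


for pos, pts, dts, seg_type, size, complete in walk_segments(bytes.fromhex(DUMP)):
    print(f"offset {pos:3}: type 0x{seg_type:02X}, PTS {pts}, DTS {dts}, "
          f"{size} bytes{'' if complete else ' (truncated)'}")
# offset   0: type 0x16, PTS 18768, DTS 11305, 19 bytes
# offset  32: type 0x17, PTS 17137, DTS 11305, 10 bytes
# offset  55: type 0x14, PTS 11305, DTS 0, 7 bytes
# offset  75: type 0x15, PTS 14567, DTS 11305, 1823 bytes (truncated)
```

Note that the PTS and DTS fields are raw 90 kHz tick counts, so 00 00 49 50 (18768) is about 0.2085 s.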

pball
2nd September 2015, 03:24
It's not a spec, but here is more in-depth info on the structure of a .sup file. I will also explain in more detail with an example.

SupRip GitHub page with sup-related code. You can read more specifics there about each section that I don't discuss.
https://github.com/peterdk/SupRip/blob/master/Bluray%20Sup.txt

The hex bytes 0x50 0x47 are headers for sections in the .sup file; however, there are multiple sections per subtitle line. In my experience there is always a 0x16, 0x17, 0x14, 0x15, and 0x80 flagged section for displayed subtitles. There can be multiple 0x15 sections if the bitmap data is large. There are also "empty" subtitles, used to end a subtitle, which only have 0x16, 0x17, and 0x80 flags. Subtitles only have a start time and no end time or duration, so a subtitle is ended by the next subtitle starting. So if two subtitle lines are separated by a gap, a blank subtitle section is used to end the first subtitle.

Referencing the linked page, these are the flags each header can have:
TIMES = 0x16
SIZE = 0x17
PALETTE = 0x14
BITMAP = 0x15
END = 0x80

Hopefully walking through the hex of a full subtitle will help. I separated each of the sections and truncated the palette and bitmap sections.

Header structure, always 13 bytes long


50 47 start of header
00 01 E3 E0 time code
00 00 00 00 second time code (normally zero)
16 flag
00 13 data size


To get the time code into a usable form, convert the hex to decimal, then multiply by 100,000 and divide by 9 to get nanoseconds (the clock runs at 90,000 ticks per second).

0001e3e0 -> 123872 * 100,000 / 9 ≈ 1,376,355,556 ns ≈ 1.37636 sec
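The same conversion as a small Python sketch (function names are illustrative). Integer maths keeps the nanosecond value exact up to the final floor division, which is why it gives 1376355555 where the rounded figure is 1,376,355,556.

```python
PGS_CLOCK_HZ = 90_000  # PTS/DTS tick rate (MPEG 90 kHz clock)


def ticks_to_ns(ticks):
    # ticks * 1e9 / 90000 == ticks * 100000 / 9; floor division keeps it an int
    return ticks * 100_000 // 9


def ticks_to_seconds(ticks):
    return ticks / PGS_CLOCK_HZ


def seconds_to_ticks(seconds):
    return round(seconds * PGS_CLOCK_HZ)


ticks = int("0001E3E0", 16)
print(ticks, ticks_to_ns(ticks), f"{ticks_to_seconds(ticks):.6f}")
# 123872 1376355555 1.376356
```

seconds_to_ticks is the direction needed when writing adjusted time codes back into a split file.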


start of .sup file
1st subtitle

50 47 00 01 E3 E0 00 00 00 00 16 00 13 07 80 04 38 10 00 02 80 00 00 01 00 00 00 00 05 41 02 6F Times section of subtitle, gives the start time of the subtitle

50 47 00 01 E3 34 00 00 00 00 17 00 13 02 00 05 41 02 6F 00 81 00 37 01 02 40 03 CB 03 04 00 46 Size section of subtitle

50 47 00 01 CD 04 00 00 00 00 14 01 47 00 00 00 ... Palette section of subtitle

50 47 00 01 CD 2C 00 00 00 00 15 0C 68 00 00 00 ... Bitmap section of subtitle

50 47 00 01 CD 2C 00 00 00 00 80 00 00 End section of subtitle, denotes the end of one subtitle

2nd subtitle (actually blank which ends 1st subtitle)

50 47 00 07 E7 80 00 00 00 00 16 00 0B 07 80 04 38 10 00 05 00 00 00 00 Times section of subtitle, gives the start time of the subtitle. Previous subtitle ends when this "subtitle" begins

50 47 00 07 E6 D4 00 00 00 00 17 00 13 02 00 05 41 02 6F 00 81 00 37 01 02 40 03 CB 03 04 00 46 Size section of subtitle

50 47 00 07 E6 D3 00 00 00 00 80 00 00 End section of subtitle, denotes the end of one subtitle


Hope that helps.
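Following pball's description (the 0x16 section gives the start time, a subtitle ends when the next display set starts, and empty display sets have no 0x15 bitmap section), a small sketch that groups sections at each 0x80 END and derives display intervals. The (PTS, flag) pairs are copied from the example above; "no 0x15 section means empty" is pball's observation rather than a checked spec rule, and a last subtitle with no following display set is left out.

```python
# Section flags as listed in the SupRip notes (usual PGS names in comments)
TIMES = 0x16    # PCS, presentation composition
SIZE = 0x17     # WDS, window definition
PALETTE = 0x14  # PDS, palette definition
BITMAP = 0x15   # ODS, object (bitmap) definition
END = 0x80      # end of display set


def display_sets(segments):
    """Group (pts, flag) pairs into display sets, each closed by an END section."""
    current = []
    for pts, flag in segments:
        current.append((pts, flag))
        if flag == END:
            yield current
            current = []


def subtitle_intervals(segments, clock_hz=90_000):
    """(start_s, end_s) per displayed subtitle: it starts at its TIMES section
    and ends when the next display set's TIMES section starts."""
    starts = []
    for ds in display_sets(segments):
        start = next(pts for pts, flag in ds if flag == TIMES)
        shown = any(flag == BITMAP for _, flag in ds)  # empty sets carry no bitmap
        starts.append((start, shown))
    return [(start / clock_hz, nxt / clock_hz)
            for (start, shown), (nxt, _) in zip(starts, starts[1:]) if shown]


# (PTS, flag) pairs from the two example subtitles above
example = [
    (0x0001E3E0, TIMES), (0x0001E334, SIZE), (0x0001CD04, PALETTE),
    (0x0001CD2C, BITMAP), (0x0001CD2C, END),
    (0x0007E780, TIMES), (0x0007E6D4, SIZE), (0x0007E6D3, END),
]
print([(round(s, 3), round(e, 3)) for s, e in subtitle_intervals(example)])
# [(1.376, 5.756)]
```

So the 1st subtitle in the example is on screen for roughly 4.38 seconds, ended by the blank 2nd "subtitle".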

nevcairiel
2nd September 2015, 17:00
Oh yeah, my explanation missed the type field (one byte), sorry about that.
It's also important to note that the type and size fields are part of the PGS subtitle chunk, while the 10 bytes before them were added specifically for the SUP format.

The times are stored in the MPEG timebase, i.e. 1/90000 s, so "00 00 49 50" would be 0x00004950 (18768) divided by 90000, or 0.20853 seconds (208.53 ms).
The presentation time is, of course, the time at which the subtitle is supposed to be shown.

Decoding Time is used to ensure proper interleaving, so that the subtitles can reach the subtitle decoder early enough to be done decoding when they are needed.
Many container formats do not have this concept, and only carry a Presentation Time, so it may often simply be 0 and be considered not set.

pball
4th September 2015, 02:36
pball's Bluray PGS Subtitle splitter (.sup file splitter)

http://forum.doom9.org/showthread.php?p=1737122

Here is version 1.0 of my subtitle splitter. Try it out and let me know how it works.

hubblec4
5th September 2015, 21:32
Thanks a lot for your overview of the specs.