
PGS (BD *.SUP) files binary layout


Rudde
3rd December 2015, 04:25
Hello.

I'm working on a community for crowd-sourcing OCR effort and QA on subtitles, and I need to parse BD SUP files, in my case in PHP.

I've been looking around and found the libbluray source in C, but I find it very complicated to get anything useful out of it. I also cannot find any documentation on the BD SUP format other than the SupRip author's short guide here: http://exar.ch/suprip/hddvd.php

Are the header, footer, and repeating pattern documented anywhere so I can implement them?

hubblec4
6th December 2015, 10:52
The best documentation I have found: http://forum.doom9.org/showthread.php?p=1736749#post1736749

Rudde
6th December 2015, 16:28
Hey! Thank you very much! I didn't find that and it is very helpful!

It's not a spec, but here is more in-depth info on the structure of a .sup file. I will also explain in more detail with an example.

SupRip GitHub page with .sup-related code. You can read more specifics there about each section that I don't discuss:
https://github.com/peterdk/SupRip/blob/master/Bluray%20Sup.txt

The bytes 0x50 0x47 (ASCII "PG") mark the start of each section header in the .sup file, and there are multiple sections per subtitle line. For a displayed subtitle there is always (in my experience) a 0x16, 0x17, 0x14, 0x15, and 0x80 flagged section. There can be multiple 0x15 sections if the bitmap data is large. There are also "empty" subtitles, used to end a subtitle, which only have the 0x16, 0x17, and 0x80 flags. Subtitles only have a start time, with no end time or duration, so a subtitle is ended by the next subtitle starting. If two subtitle lines are separated by a gap in time, a blank subtitle is used to end the first one.
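Since end times are implied rather than stored, they have to be derived from whatever comes next. A rough Python sketch of that pairing, assuming the start time codes and a "blank" marker have already been extracted (the function name pair_start_end and the tuple layout are made up for illustration):

```python
def pair_start_end(events):
    """Derive (start, end) pairs from subtitles that only carry a start time.

    events: list of (time_code, is_blank) tuples in file order.
    A displayed subtitle ends when the next entry (blank or not) starts.
    The end is None when nothing follows the subtitle.
    """
    cues = []
    for i, (start, is_blank) in enumerate(events):
        if is_blank:
            continue  # blank entries only terminate the previous subtitle
        end = events[i + 1][0] if i + 1 < len(events) else None
        cues.append((start, end))
    return cues

# One displayed subtitle followed by the blank one that ends it.
events = [(0x0001E3E0, False), (0x0007E780, True)]
print(pair_start_end(events))
```

The time codes are left in their raw units here; converting them to seconds is a separate step.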

Per the linked page, these are the flags each header can have:
TIMES = 0x16
SIZE = 0x17
PALETTE = 0x14
BITMAP = 0x15
END = 0x80
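In code, the flag list can live in a small lookup table. A minimal Python sketch; the names SEGMENT_NAMES and is_blank_subtitle are invented here, and the "blank" rule is only the one described in this post (no 0x14 or 0x15 section present):

```python
# Flag byte -> section name, as listed above.
SEGMENT_NAMES = {
    0x16: "TIMES",
    0x17: "SIZE",
    0x14: "PALETTE",
    0x15: "BITMAP",
    0x80: "END",
}

def is_blank_subtitle(flags):
    """Return True if the flags of one subtitle look like an "empty" one.

    Assumption taken from the description above: a blank subtitle has
    no PALETTE (0x14) and no BITMAP (0x15) section.
    """
    return 0x14 not in flags and 0x15 not in flags

print(is_blank_subtitle([0x16, 0x17, 0x14, 0x15, 0x80]))  # displayed subtitle
print(is_blank_subtitle([0x16, 0x17, 0x80]))              # blank subtitle
```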

Hopefully walking through the hex of a full subtitle will help. I separated each of the sections and truncated the palette and bitmap sections.

Header structure, always 13 bytes long


50 47          start of header
00 01 E3 E0    time code
00 00 00 00    second time code (normally zero)
16             flag
00 13          data size
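The 13-byte layout above maps directly onto a fixed-size unpack. A minimal Python sketch, assuming the fields are big-endian (which matches the example: 00 13 is followed by 19 data bytes); the function name parse_header and the dictionary keys are made up for illustration:

```python
import struct

def parse_header(buf, offset=0):
    """Parse one 13-byte section header starting at buf[offset]."""
    # ">2sIIBH" = big-endian: 2-byte marker, two 4-byte time codes,
    # 1-byte flag, 2-byte data size (2 + 4 + 4 + 1 + 2 = 13 bytes).
    marker, time_code, second_time_code, flag, size = struct.unpack_from(
        ">2sIIBH", buf, offset
    )
    if marker != b"PG":  # 0x50 0x47
        raise ValueError("no section header at offset %d" % offset)
    return {
        "time_code": time_code,
        "second_time_code": second_time_code,
        "flag": flag,
        "size": size,
    }

header = bytes.fromhex("50 47 00 01 E3 E0 00 00 00 00 16 00 13")
print(parse_header(header))
```

The size field counts only the data that follows the header, so the next header starts 13 + size bytes later.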


To get the time code into a usable form, convert the hex to decimal, then multiply by 100,000 and divide by 9 to get nanoseconds (equivalently, divide by 90,000 to get seconds).

0x0001E3E0 -> 123872 * 100,000 / 9 ≈ 1,376,355,556 ns ≈ 1.376 sec
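The same conversion as a small Python sketch. The function names are made up for illustration, and the timestamp format is just one convenient choice; multiplying by 100,000 and dividing by 9 is the same as treating the time code as 90 units per millisecond:

```python
def time_code_to_ns(time_code):
    # Multiply by 100,000 and divide by 9; integer division rounds down.
    return time_code * 100_000 // 9

def time_code_to_timestamp(time_code):
    # 90 time-code units per millisecond (same ratio as above).
    total_ms = time_code // 90
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"

print(time_code_to_ns(0x0001E3E0))         # 1376355555
print(time_code_to_timestamp(0x0001E3E0))  # 00:00:01.376
```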


start of .sup file
1st subtitle

Times section of subtitle, gives the start time of the subtitle:
50 47 00 01 E3 E0 00 00 00 00 16 00 13 07 80 04 38 10 00 02 80 00 00 01 00 00 00 00 05 41 02 6F

Size section of subtitle:
50 47 00 01 E3 34 00 00 00 00 17 00 13 02 00 05 41 02 6F 00 81 00 37 01 02 40 03 CB 03 04 00 46

Palette section of subtitle:
50 47 00 01 CD 04 00 00 00 00 14 01 47 00 00 00 ...

Bitmap section of subtitle:
50 47 00 01 CD 2C 00 00 00 00 15 0C 68 00 00 00 ...

End section of subtitle, denotes the end of one subtitle:
50 47 00 01 CD 2C 00 00 00 00 80 00 00

2nd subtitle (actually blank; it ends the 1st subtitle)

Times section of subtitle, gives the start time of the subtitle. The previous subtitle ends when this "subtitle" begins:
50 47 00 07 E7 80 00 00 00 00 16 00 0B 07 80 04 38 10 00 05 00 00 00 00

Size section of subtitle:
50 47 00 07 E6 D4 00 00 00 00 17 00 13 02 00 05 41 02 6F 00 81 00 37 01 02 40 03 CB 03 04 00 46

End section of subtitle, denotes the end of one subtitle:
50 47 00 07 E6 D3 00 00 00 00 80 00 00
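Walking a file then comes down to reading a 13-byte header, skipping over its data, and repeating. A Python sketch run against the three sections of the blank 2nd subtitle (the only ones above that are not truncated); the name iter_sections is made up, and big-endian fields are assumed as in the example bytes:

```python
import struct

HEADER_SIZE = 13  # 2 marker + 4 time code + 4 second time code + 1 flag + 2 size

def iter_sections(buf):
    """Yield (time_code, flag, data) for each section found in buf."""
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        marker, time_code, _second, flag, size = struct.unpack_from(
            ">2sIIBH", buf, offset
        )
        if marker != b"PG":
            raise ValueError("no section header at offset %d" % offset)
        start = offset + HEADER_SIZE
        yield time_code, flag, buf[start:start + size]
        offset = start + size  # next header follows the data directly

blank_subtitle = bytes.fromhex(
    "50 47 00 07 E7 80 00 00 00 00 16 00 0B 07 80 04 38 10 00 05 00 00 00 00 "
    "50 47 00 07 E6 D4 00 00 00 00 17 00 13 02 00 05 41 02 6F 00 81 00 37 01 "
    "02 40 03 CB 03 04 00 46 "
    "50 47 00 07 E6 D3 00 00 00 00 80 00 00"
)

for time_code, flag, data in iter_sections(blank_subtitle):
    print(hex(flag), time_code, len(data))
```

For a real file the same loop would run over the whole file contents, collecting sections until each 0x80 to form one subtitle.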


Hope that helps.

But there are a couple of things I don't understand. What exactly is 0x70, and what do I use it for? What are the 19 bytes in 0x16 once I have the timestamp in PTS? And what is a palette?

It also looks like 07 80 04 38 in 0x16 is supposed to represent 1920x1080 (0x0780 = 1920, 0x0438 = 1080).
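A quick Python check of that reading, assuming the four bytes are two big-endian 16-bit values (width, then height):

```python
import struct

# First four data bytes of the 0x16 section from the example.
data = bytes.fromhex("07 80 04 38")
width, height = struct.unpack(">HH", data)
print(width, height)  # 1920 1080
```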

Also, according to the Subtitle Edit source, the last 4 bytes of the first 0x16 are two location coordinates for the placement of the sub, and the one before that is an FPS tag, which is always 0x10, representing 24p.