Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
30th March 2007, 06:20 | #1 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Converting Bluray disc subtitles
I'm wondering if this is possible yet.
I've tried ripping the data from a bluray disc's .m2ts file. I believe that PIDs starting from 1200 are the subtitle streams, but I have no idea what to do with them. Anyone have any information on how to convert them to SRTs? Or where I can find the format specifications for them? |
30th March 2007, 17:37 | #2 | Link | |
Coder
Join Date: Jan 2007
Location: Around the World
Posts: 697
|
Quote:
Could you send me a sample? |
|
31st March 2007, 20:54 | #5 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
I used TSReader Lite. There are also paid versions of it, but I think the Lite version is all we need right now. I guessed that the PIDs for subs begin at PID 1200. I hope I am right!
I would like to try it with VLC too - and see if I get similar results. |
13th April 2007, 21:50 | #8 | Link | |
Coder
Join Date: Jan 2007
Location: Around the World
Posts: 697
|
Yes, I did. It's not the same as the HD DVD's subtitle stream. I think it contains some headers.
Quote:
If you figure out the file structure, I'll change SUPread.
Last edited by Pelican9; 13th April 2007 at 21:55. |
|
15th April 2007, 20:50 | #9 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Here is another rip of subtitles, from Memento blu-ray:
http://www.sendspace.com/file/ghi4gb
This should be a better rip than the first one. For that one I used TSReader Lite, which doesn't rip from the beginning of the stream! |
17th April 2007, 14:40 | #10 | Link | |
Coder
Join Date: Jan 2007
Location: Around the World
Posts: 697
|
Quote:
|
|
21st April 2007, 04:23 | #11 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Yeah I know how you feel - it's a big job figuring these files out!
Fortunately, I think I figured out the Run Length Encoding scheme used for the bitmaps. However, I haven't figured out where the subtitle time and duration information is stored, and I also need to decode the color table somehow.
Some examples from my RLE findings:
00 47 80 00 00 = The next 1920 (0x780) pixels across are of color index 0. The 00 pair at the end signifies the end of a line.
00 84 0e = The next 4 pixels are of color index 14 (0x0e).
00 9e 14 = The next 30 (0x1e) pixels are of color 20 (0x14).
00 c1 4e 33 = The next 334 (0x14e) pixels are of color 51 (0x33).
If a sequence doesn't begin with 00, then the byte is the color of the single pixel at that point.
If a sequence ends with two 00s, we have reached the end of the line. |
21st April 2007, 19:47 | #12 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
More:
00 03 = The next 3 pixels use color index 0.
00 01 = The next pixel is color index 0.
So, basically, 00 is reserved as an RLE command. To get the color 0 you need to do the above.
Given the above, here is an example:
00 47 80 00 00 00 42 0F 0E 14 00 84 FE 14 0E 00 0F
Breaks down into:
00 47 80 = 1920 pixels of color 0.
00 00 = end of line, go to the beginning of the next line.
00 42 0F = 527 pixels of color 0.
0E = pixel at x location 528 is color 14.
14 = pixel at x location 529 is color 20.
00 84 FE = the next 4 pixels are of color 254.
14 = the next pixel is color 20.
0E = the next pixel is color 14.
00 0F = the next 15 pixels are color 0.
This will go on until 1920 pixels for the line have been specified, and the line shall end in a 00 00 pair. Now, not all subpictures will be 1920 pixels across - this depends on the header before the bitmap info.
Last edited by Rectal Prolapse; 21st April 2007 at 19:57. |
25th April 2007, 17:12 | #13 | Link |
Coder
Join Date: Jan 2007
Location: Around the World
Posts: 697
|
Code:
rle_coded_line() {
    do {
        if (nextbits != ‘0000 0000b’) {
            pixel_code                          (8bit)
        } else {
            8-bit_zero                          (8bit)
            switch_1                            (1bit)
            switch_2                            (1bit)
            if (switch_1 == ‘0b’) {
                if (switch_2 == ‘0b’) {
                    if (nextbits != ‘00 0000b’)
                        run_length_zero_1-63    (6bit)
                    else
                        end_of_line_signal      (6bit)
                } else {
                    run_length_zero_64-16K      (14bit)
                }
            } else {
                if (switch_2 == ‘0b’) {
                    run_length_3-63             (6bit)
                    pixel_code                  (8bit)
                } else {
                    run_length_64-16K           (14bit)
                    pixel_code                  (8bit)
                }
            }
        }
    } while (!end_of_line_signal)
}

8-bit-zero: An 8-bit field filled with ‘0000 0000b’.
switch_1: A 1-bit switch that identifies the meaning of the fields that follow: if set to the value ‘0b’, the field indicates a run-length for a pixel value of ‘0x00’ or an end_of_line_signal; if set to the value ‘1b’, the field indicates a run-length for a pixel value that is not ‘0x00’.
switch_2: A 1-bit switch that identifies the meaning of the fields that follow: if set to the value ‘0b’, the field indicates a small run-length or end_of_line_signal; if set to the value ‘1b’, the field indicates a long run-length.
run_length_zero_1-63: The number of pixels that shall be set to a value of ‘0x00’.
end_of_line_signal: A 6-bit field filled with ‘00 0000b’. The presence of this field signals the end of the coded line.
run_length_3-63: The number of pixels that shall be set to the pixel value defined next. This field shall not have a value less than 3.
run_length_zero_64-16K: The number of pixels that shall be set to a value of ‘0x00’. This field shall not have a value less than 64.
run_length_64-16K: The number of pixels that shall be set to the pixel value defined next. This field shall not have a value less than 64. |
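For anyone who wants to try this out, here is a minimal Python sketch of a decoder for one coded line, following the syntax above. The function name `decode_rle_line` and the return shape are my own choices for illustration, not part of any specification or existing tool. It assumes the input is well formed: there are no bounds checks, and it does not enforce the minimum run lengths the field definitions require.

```python
def decode_rle_line(data, pos=0):
    """Decode one RLE-coded line starting at data[pos].

    Returns (pixels, next_pos): a list of palette indices for the line and
    the offset of the first byte after the end_of_line_signal.
    Hypothetical helper; raises IndexError if the data ends mid-line.
    """
    pixels = []
    while True:
        first = data[pos]
        pos += 1
        if first != 0x00:
            # pixel_code: any non-zero byte is a single pixel of that color.
            pixels.append(first)
            continue

        flags = data[pos]
        pos += 1
        switch_1 = flags & 0x80  # set: a pixel_code byte follows the run length
        switch_2 = flags & 0x40  # set: 14-bit run length instead of 6-bit
        length = flags & 0x3F
        if switch_2:
            # Long run: the low 6 bits are the high part of a 14-bit length.
            length = (length << 8) | data[pos]
            pos += 1

        if switch_1:
            color = data[pos]
            pos += 1
        else:
            color = 0
            if not switch_2 and length == 0:
                # 00 00 is the end_of_line_signal.
                return pixels, pos

        pixels.extend([color] * length)


line = bytes.fromhex("00 42 0F 0E 14 00 84 FE 14 0E 00 0F 00 00")
pixels, next_pos = decode_rle_line(line)
print(len(pixels), pixels[527:535])
```

Run against the byte sequences posted earlier in the thread, this gives 1920 pixels of color 0 for `00 47 80 00 00`, and treats `00 0F` as 15 pixels of color 0.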
5th May 2007, 22:40 | #14 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Thanks Pelican9!
Last week I spent a lot of time doing more work reverse engineering m2ts subpicture streams. I believe I have figured out the essentials. I will try to post more details in the coming days.
I can say with absolute certainty that the timestamps are NOT included in the elementary PG stream. They are embedded in the PTS values of the PES packets before each new subtitle. These PES packets should be extracted by a tool (like TSReader or Manzanita) that preserves all the packets in the stream. Unfortunately this means we will need a transport stream parser.
I also discovered how subtitles are forced on. There are separate commands for defining the palette and the bitmap, and a command for blanking the subtitle. The latter is preceded by a PES packet that specifies the PTS.
One problem remaining is determining the subtitle timing relative to the presentation of the first video frame. I don't yet know how to get this value.
I hope to provide very specific details soon. Hopefully you and Haali and whoever else can make use of this information.
BTW, I used a clean-room approach to this - I do not have the full Blu-ray specification, only the publicly available one. I also used the XVI hex editor and PowerDVD 7.3 for my experiments. Lots of trial and error was involved.
Thanks to everyone who helped! |
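Since the timestamps live in the PES headers, here is a small Python sketch of pulling the PTS out of a single PES packet. It relies on the standard MPEG-2 PES header layout (start code, stream id, length, two flag bytes, header length, then a 5-byte PTS field), not on anything Blu-ray specific. The function name `pes_pts` is made up for illustration, and the sketch assumes the PES packet has already been reassembled from the transport stream packets, which is the hard part and is not shown.

```python
def pes_pts(packet):
    """Return the 33-bit PTS of a reassembled PES packet, or None if absent.

    Hypothetical helper. Assumes a stream type that uses the optional PES
    header (flag bytes at offsets 6 and 7), as audio, video and private
    streams do.
    """
    if packet[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    if not packet[7] & 0x80:
        # PTS_DTS_flags: the top bit of byte 7 says whether a PTS is present.
        return None
    b = packet[9:14]
    # The 33 bits are split 3 + 15 + 15, each group followed by a marker bit.
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


# Hand-built header carrying PTS = 90000 ticks, i.e. one second at 90 kHz.
sample = bytes.fromhex("00 00 01 BD 00 08 80 80 05 21 00 05 BF 21")
print(pes_pts(sample))
```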
5th May 2007, 23:39 | #15 | Link | |||
Coder
Join Date: Jan 2007
Location: Around the World
Posts: 697
|
Quote:
Quote:
If he can make a .sup file I will change SUPread to handle it.
Quote:
Same as HD DVD. :-) We have to find the first video PTS and then subtract this value from every subtitle PTS. (It's very easy to find in HD DVD's DSI packet.)
Last edited by Pelican9; 5th May 2007 at 23:49. |
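The subtraction described above can be sketched in a few lines of Python. This is only an illustration: `pts_to_srt_time` is a hypothetical name, the PTS clock is the usual 90 kHz MPEG one, and the sketch assumes the subtitle PTS is not earlier than the first video PTS and that the 33-bit counter does not wrap during the movie.

```python
def pts_to_srt_time(subtitle_pts, first_video_pts):
    """Convert a subtitle PTS to an SRT timestamp relative to the first video frame.

    Hypothetical helper; both arguments are in 90 kHz ticks.
    """
    ticks = subtitle_pts - first_video_pts
    if ticks < 0:
        raise ValueError("subtitle PTS is earlier than the first video PTS")
    ms = ticks // 90  # 90 ticks per millisecond
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# A subtitle shown 61.5 seconds after a first video frame stamped at tick 1000.
print(pts_to_srt_time(1000 + 61 * 90000 + 45000, 1000))
```

The end time of each subtitle would come from the PTS of the packet that blanks it, converted the same way.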
|||
5th May 2007, 23:51 | #16 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Oh, I'm sure HD-DVD is the same.
I find it very strange that no one has posted their findings on these formats before though - is there another forum for that?
EDIT: Never mind.
Last edited by Rectal Prolapse; 6th May 2007 at 00:02. |
6th May 2007, 00:05 | #17 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
The subpicture stream consists of data inside sections. Each section begins with an identifier byte, followed by a 16 bit integer that contains the length of the data, followed by the data itself.
Identifiers, in the order of presentation found in a typical subtitle stream:

16 00 13 07 80 04 38 40 00 01 00 00 00 01 00 00 00 00 00 00 00 00
16 = identifier
00 13 = size of section
07 80 04 38 = 1920x1080.
01 = do not clear. If set to 00 instead it will clear the subpicture!
40 00 01 00 = sequence number??
00 = if set to 40 instead of 00 the next subtitle will be forced on.

17 00 0A 01 00 00 00 00 00 07 80 04 38
17 = ?
07 80 04 38 = 1920x1080

14 04 FD 00 00 00 10 80 80 00
14 = palette definition
04 FD = size of section following in bytes.
00 10 80 80 00 = color index 0 with YCbCr = 10,80,80, alpha channel = 0.

15 89 92 00 00 01 C0 00 89 8B 07 80 04 38 00 47 80 00 00
15 = bitmap picture section
89 92 = size of section in bytes following.
00 00 01 C0 00 = unknown!
89 8B = appears to be another length indicator. 89 92 minus 5 bytes.
07 80 04 38 = 1920x1080 image dimensions.

80 00 00 80 = end of picture? No section? End of Epoch? Go blank? |
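To make the section layout concrete, here is a rough Python sketch that walks a buffer of sections as described above: one identifier byte, a 16-bit length, then that many bytes of data. The name `iter_sections` is hypothetical, and the big-endian length is inferred from the samples (00 13 followed by 19 bytes of data). The sketch only splits the stream; it does not interpret the payloads beyond reading the dimensions of the first one.

```python
import struct


def iter_sections(data):
    """Yield (identifier, payload) for each section in a raw subpicture stream.

    Hypothetical helper. Assumes the buffer starts on a section boundary and
    that the 16-bit size is big-endian and counts only the payload bytes.
    """
    pos = 0
    while pos + 3 <= len(data):
        ident, size = struct.unpack_from(">BH", data, pos)
        payload = data[pos + 3 : pos + 3 + size]
        if len(payload) < size:
            raise ValueError(f"section 0x{ident:02X} at offset {pos} is truncated")
        yield ident, payload
        pos += 3 + size


# The 0x16 and 0x17 samples from this post, followed by an empty 0x80 section.
stream = bytes.fromhex(
    "16 00 13 07 80 04 38 40 00 01 00 00 00 01 00 00 00 00 00 00 00 00"
    "17 00 0A 01 00 00 00 00 00 07 80 04 38"
    "80 00 00"
)
for ident, payload in iter_sections(stream):
    print(f"0x{ident:02X}", len(payload), payload.hex(" "))

# The first four payload bytes of the 0x16 section hold the picture size.
width, height = struct.unpack_from(">HH", stream, 3)
print(width, height)
```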
6th May 2007, 00:13 | #18 | Link | |
Registered User
Join Date: Mar 2005
Posts: 433
|
Quote:
|
|
6th May 2007, 00:14 | #19 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
BTW Pelican9, you implied that you already knew the format - is that true? If dmz01 can write a subtitle extractor then he must know the format already?
Is there a reason why this information isn't more widely known? On the other hand, I don't have any special access to Blu-ray specs - so NDAs and licenses are not a concern for me - just old-fashioned true reverse engineering. |
6th May 2007, 02:43 | #20 | Link |
Registered User
Join Date: Jan 2003
Location: Silicon Valley
Posts: 455
|
Here's an update to xport that demuxes the subtitle (Presentation Graphics) stream.
http://www.w6rz.net/xportpgs.zip
An extra parameter has been added to select the subtitle stream.
xport -h movie.m2ts 1 1 1 1
Output filename is bits0001.pgs and the -u option dumps the PTS.
Ron
__________________
HD MPEG-2 Test Patterns http://www.w6rz.net |
|
|