subtitle conversion from DVD to text without OCR?
lovelove
20th November 2010, 21:52
Hi. I couldn't find a lot of info on the net on subtitles. But from what I understand, DVD subtitles don't exist as text streams but as bitmap (pixel) images ONLY, correct? (... although MediaInfo displays my .vob files as having text streams)
I found a lot of threads here on doom9 pointing to programs which demux the subtitles from the .vob files. Depending on the program, this results in .sup files or in a .sub/.idx combination. But none of them are human readable, are they? My guess is that the files still contain a stream of bitmap graphics.
My question is: Is there any program which would decode the data in those files directly to text without the intermediate OCR step? A .sup to .srt converter? Or a .sub to .srt converter? Or something like that?
Thanks.
Inspector.Gadget
20th November 2010, 23:36
although MediaInfo displays my .vob files as having text streams
Some DVDs have closed captions, which are actually text streams muxed into the video stream. They are not the same as bitmap subtitles, which MUST be OCR'd to turn them into text subs, because there is no metadata in the bitmap sub stream that will tell any "dumb" converter what each letter should be. You can hand off the OCR to automatic methods with varying degrees of success using SubRip (very accurate but requires some intervention) or DVDSubEdit (reasonably accurate, requires little user input).
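To illustrate why a plain converter has nothing to work with, here is a minimal sketch of run-length decoding. It is a toy scheme with made-up (run length, color index) pairs, not the actual DVD subpicture bit layout, and the names `decode_rle` and `render` are invented for this example. The point is only that decoding yields rows of pixels; which letter those pixels form is not stored anywhere.

```python
def decode_rle(runs, width):
    """Expand (run_length, color_index) pairs into rows of pixel values.

    Toy format for illustration only; real DVD subpictures pack the runs
    into variable-length bit codes.
    """
    pixels = []
    for length, color in runs:
        pixels.extend([color] * length)
    if len(pixels) % width != 0:
        raise ValueError("run lengths do not fill whole rows")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def render(rows):
    """Draw non-zero pixels as '#' and background pixels as '.'."""
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)


# A 5x5 bitmap that a human reads as the letter "T".
# Nothing in the data says "T"; only OCR (or a person) can tell.
runs = [(5, 1), (2, 0), (1, 1), (4, 0), (1, 1),
        (4, 0), (1, 1), (4, 0), (1, 1), (2, 0)]

if __name__ == "__main__":
    print(render(decode_rle(runs, 5)))
```

Running it prints a small "T" made of `#` characters, which is all the information the stream carries about that glyph.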
lovelove
21st November 2010, 00:54
bitmap subtitles, which MUST be OCR'd to translate them to text subs, because there is no metadata in the bitmap sub stream that will tell any "dumb" converter what each letter should be.
OK, I am not sure if I will manage to articulate my reply in an intelligible way, but I will try. Take a JPEG encoded image. You can open it in an image editor, put the image on screen and then rotate it 90 degrees to the right. And then save it again as a .jpeg file.
On the other hand, there are programs which manipulate the JPEG data directly and rotate the image without ever bringing it on the screen (not even "hidden", because the data is never decoded to x,y image pixels; the compressed coefficient blocks are rearranged directly).
When translating this situation to subtitles, I was hoping that *somehow* the step of bringing the subtitles on the screen (hidden or unhidden) for OCR could be avoided. Now I admit that even after pondering this for quite a bit, I fail to see how this could possibly be done. But my hope was that the experts here, who have seen a lot in their life, would maybe know of *any* other way than OCR...
Is the analogy more or less understandable?
lovelove
21st November 2010, 01:07
Some DVDs have closed captions, which are actually text streams muxed into the video stream. They are not the same as bitmap subtitles
And how can I convert the closed captions on DVD to a text file?
The MediaInfo output of one of my .vob files (the second 1 GB vob file of six) looks as follows:
Text #1
ID : 224 (0xE0)-DVD-1
Format : EIA-608
Muxing mode : MPEG Video / DVD-Video
Muxing mode, more info : Muxed in Video #1
Stream size : 0.00 Byte (0%)
Text #2
ID : 32 (0x20)
Format : RLE
Format/Info : Run-length encoding
Text #3
ID : 33 (0x21)
Format : RLE
Format/Info : Run-length encoding
Text #4
ID : 34 (0x22)
Format : RLE
Format/Info : Run-length encoding
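Going by the earlier reply, the Format line of each entry above is what separates the two kinds of stream: EIA-608 is closed-caption text, while RLE is a run-length encoded bitmap. A minimal sketch that sorts such a listing, assuming the report is pasted in as plain text in the layout shown here (the function name `classify_text_streams` and the label strings are invented for this example):

```python
# Mapping based on the two formats that appear in this thread only.
KINDS = {
    "EIA-608": "closed captions (text, no OCR needed)",
    "RLE": "bitmap subtitles (OCR needed)",
}


def classify_text_streams(report):
    """Map each 'Text #n' heading to a description of its Format field."""
    streams = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Text"):
            current = line
        elif current and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            # Match the exact key so that "Format/Info" is ignored.
            if key == "Format":
                streams[current] = KINDS.get(value, "unknown format: " + value)
    return streams


sample = """Text #1
ID : 224 (0xE0)-DVD-1
Format : EIA-608
Muxing mode : MPEG Video / DVD-Video
Text #2
ID : 32 (0x20)
Format : RLE
Format/Info : Run-length encoding
"""

if __name__ == "__main__":
    for name, kind in classify_text_streams(sample).items():
        print(name, "->", kind)
```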
When playing this .vob file in VLC, I have this in my subtitle menu:
closed captions 1
closed captions 2
closed captions 3
closed captions 4
When playing the VIDEO_TS folder, VLC offers me a lot more subtitles:
Track 1 - [Russian]
Track 2 - [English]
Track 3 - [Espagnol]
Track 4 - [Esperanto]
closed captions 1
closed captions 2
closed captions 3
closed captions 4
Why the difference between MediaInfo and VLC?
Hm... this seems so complicated, and I just don't know where to start to understand it better.
Inspector.Gadget
21st November 2010, 01:41
And how can I convert the closed captions on DVD to a text file?
CCExtractor.
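A minimal sketch of calling it from a script, assuming the `ccextractor` binary is installed and on the PATH, and using its `-o` option to name the output file. The file names and the helper names `build_command` and `extract_captions` are placeholders for this example, not anything from the thread.

```python
import shutil
import subprocess


def build_command(vob_path, srt_path, exe="ccextractor"):
    """Return the argument list for extracting closed captions to a file."""
    return [exe, vob_path, "-o", srt_path]


def extract_captions(vob_path, srt_path, exe="ccextractor"):
    """Run CCExtractor on one .vob file; raises if the tool is missing or fails."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(exe + " not found on PATH")
    subprocess.run(build_command(vob_path, srt_path, exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with a real .vob from the VIDEO_TS folder.
    print(" ".join(build_command("VTS_01_2.VOB", "captions.srt")))
```

This only covers the closed captions; the RLE bitmap tracks would still need one of the OCR tools mentioned earlier.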
lovelove
21st November 2010, 01:58
Ok, thanks. I'll try.
Any idea about the results posted in #4?