ocr rip permanent sub from mpeg2 to text-file?
Chetwood
19th November 2003, 12:21
I have an mpeg2 file with permanent subs in the lower border of the picture. I wanna extract the subs to a text file so I can later import it into DVD Maestro and add it as a regular subtitle stream.
I have the very same movie clip without permanent subtitles in a higher resolution which will then be combined on DVD with the newly generated subtitles. Any suggestions on which tool to use for ocr-ing?
I'm aware I might not be able to get the time codes, so I may have to set them manually in Sub Station Alpha. Still, my primary concern is not to have to type the subtitles manually.
smiller667
19th November 2003, 19:11
There are two tools for this, the one is sublog, the other one was mentioned in this forum as well. I have never tried this myself - good luck.
Chetwood
21st November 2003, 08:35
Originally posted by smiller667
There are two tools for this, the one is sublog, the other one was mentioned in this forum as well. I have never tried this myself - good luck.
And would you mind telling me the name of the other tool you're referring to, and the URLs of both tools? I've used the search function of this board several times, but obviously always with the wrong keywords. That's why I posted in the first place. Thanks.
Shalcker
21st November 2003, 11:14
Well, there is also my AVISubDetector (you'll have to use AviSynth to make it work on mpeg2 files though, and in some low-quality cases it can fail), but it only extracts timecodes and saves frames with subtitle-like images :)
You'll either have to type the text manually (once for each text appearance/change) or crop and OCR the extracted bitmaps with some kind of generic OCR software.
The current version of AVISubDetector is available here (http://web.etel.ru/~shalcker/)
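To illustrate the "once per text appearance/change" idea: if an external OCR pass gives you one text string per frame, consecutive identical frames can be collapsed into a single subtitle event. This is only a rough Python sketch and not how AVISubDetector works internally; the function name `collapse_frames` and the `(frame_number, text)` input layout are assumptions made up for the example.

```python
from itertools import groupby


def collapse_frames(frame_texts):
    """Collapse per-frame text into (first_frame, last_frame, text) events.

    frame_texts: list of (frame_number, text) pairs in ascending frame order.
    An empty string means no subtitle was visible on that frame.
    (This input layout is hypothetical, chosen for the example.)
    """
    events = []
    # groupby yields one run per stretch of consecutive identical text
    for text, run in groupby(frame_texts, key=lambda pair: pair[1]):
        frames = [frame for frame, _ in run]
        if text:  # skip runs where nothing was on screen
            events.append((frames[0], frames[-1], text))
    return events


sample = [(100, ""), (101, "Hello."), (102, "Hello."),
          (103, ""), (104, "Bye."), (105, "Bye.")]
print(collapse_frames(sample))
```

Real OCR output is noisy, so in practice the same subtitle may come back with slightly different text on different frames and would need some fuzzy matching before grouping.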
smiller667
21st November 2003, 11:23
Searched some more ... it is AVISubDetector (see e.g. this thread: http://forum.doom9.org/showthread.php?s=&postid=333009).
And just to slightly correct my previous posting: neither tool does any OCR itself. For sublog, bmp files can be extracted for external processing in e.g. subrip. The second tool extracts subtitle timings, but optionally allows you to manually type the text and save the frames, see above.
Did you try sublog? With subs purely in the black bars, the readme suggests it shouldn't be a problem to extract clean bmps. If you don't want to use an external OCR tool, you might convert the script to microdvd and convert the subs to sup using the dvdsuptools. Use ifoedit's dvd-author function to produce a subbed vob, and then you can use subrip either to extract bmps suitable for Maestro or to do the OCR in subrip itself. Complicated, but maybe faster than typing all the subs yourself. Just an idea.
NB: Shalcker beat me with his posting :).
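For reference, a MicroDVD script is plain text with one subtitle per line in the form `{start_frame}{end_frame}text`, where `|` marks a line break inside a subtitle. A minimal Python sketch of writing such lines from a list of timed events could look like this; the function name `to_microdvd` and the `(start, end, text)` tuples are assumptions for illustration, not something any of the tools above produce directly.

```python
def to_microdvd(events):
    """Format (start_frame, end_frame, text) tuples as MicroDVD lines."""
    lines = []
    for start, end, text in events:
        # MicroDVD uses '|' to separate lines within one subtitle
        text = text.replace("\n", "|")
        lines.append("{%d}{%d}%s" % (start, end, text))
    return "\n".join(lines)


events = [(101, 150, "Hello.\nHow are you?"), (180, 230, "Bye.")]
print(to_microdvd(events))
```

Since the format is frame-based, the frame numbers have to match the frame rate of the clip the subtitles will be combined with.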