#1 | Link |
|
Registered User
Join Date: Feb 2009
Posts: 11
|
DVB Subtitles to Text Format
Hi,
I'm trying to rip subtitles from DVB .ts recordings into text. I need them as text because I want to build a database whereby I can search through different TV programs using keyword search terms. I can rip subpics with ProjectX and have tried SubRip, VobSub and DVDSubEdit. However, I can never OCR the files seamlessly. The fonts on all the recordings appear the same, so it should be possible to do this. Can anyone offer any advice, like training a certain program to read the font? Regards Dave |
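For the database side of this, here is a minimal sketch of what a keyword search over ripped subtitles could look like once they exist as .srt text. It is only an illustration: the table layout and the function names (`parse_srt`, `build_index`, `search`) are made up for this example, and it assumes plain SRT input and Python's standard `sqlite3` module.

```python
import re
import sqlite3


def parse_srt(srt_text):
    """Split SRT text into (start, end, text) tuples, one per cue."""
    cues = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.splitlines()
        # A cue needs an index line, a timing line and at least one text line.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (t.strip() for t in lines[1].split("-->"))
        text = " ".join(line.strip() for line in lines[2:])
        cues.append((start, end, text))
    return cues


def build_index(conn, programme, srt_text):
    """Store every cue of one recording in a (hypothetical) 'cues' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cues "
        "(programme TEXT, start_time TEXT, end_time TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO cues VALUES (?, ?, ?, ?)",
        [(programme, s, e, t) for s, e, t in parse_srt(srt_text)],
    )
    conn.commit()


def search(conn, keyword):
    """Return (programme, start_time, text) for cues containing the keyword.

    LIKE is case-insensitive for ASCII in SQLite; '%' and '_' inside the
    keyword act as wildcards, which this sketch does not escape.
    """
    return conn.execute(
        "SELECT programme, start_time, text FROM cues "
        "WHERE text LIKE ? ORDER BY programme, start_time",
        (f"%{keyword}%",),
    ).fetchall()
```

Each hit comes back with the programme name and the cue start time, so a search result points at a position in a specific recording. Substring matching with `LIKE` also handles whole phrases, at the cost of scanning every row.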
|
|
|
|
|
#5 | Link |
|
Registered User
Join Date: Mar 2009
Location: Germany
Posts: 5,773
|
ProjectX can definitely save the teletext as text. I have been using ProjectX for some years now and this is exactly how I "work" Arte, which has 2 languages (FR+DE) and 4 subtitles (2 FR + 2 DE).
I cannot tell you exactly how, because my internet PC is far, far away from my video PC, which has no internet connection. There are also subtitles in DVB format, which is essentially DVD format. If that's the case, then do TS->DVD and rip the subtitles with SubRip. |
|
|
|
|
|
#7 | Link |
|
Registered User
Join Date: Feb 2009
Posts: 11
|
Yeah, I've been trying to use SubRip but it's not without its problems. The fonts and subpics should be totally OCR-able, as they look fine in programs such as subview etc., but SubRip sometimes can't read the lines properly.
Maybe there's a setting or something to make it work, as it does work on about half the subs. I'm not recording teletext subs at the moment as they are analogue and soon to be obsolete. I don't know much about Avidemux; I tried doing the .ts to .srt earlier and it wouldn't let me. Any ideas? |
|
|
|
|
|
#8 | Link |
|
MPlayer addict
Join Date: Dec 2008
Posts: 33
|
You could also convert DVB subs into VobSubs with the latest (development version) ProjectX
http://www.oozoon.de/main_en.html Just check the VobSub export box. VobSubs can be converted to SRT with Avidemux |
|
|
|
|
|
#10 | Link |
|
Registered User
Join Date: Mar 2009
Location: Germany
Posts: 5,773
|
For the last time:
There are two types of subtitles in a TS (DVBS, DVBT, DVBC etc.): subtitles that appear within teletext (you select page 150, 888 etc.) or DVB-specific subtitles that are DVD compatible. In the first case you already have them in text form, and the font plays no role (you know, like in Notepad), so it's absurd and impossible to OCR that text unless you make a BMP of Notepad and run your OCR software on it. Use ProjectX for this. In the second case you use one of the methods listed above, since DVB/DVD subtitles are bitmaps and they need to be OCRed. Just check for yourself how they are displayed on your TV to know their type. |
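Besides checking on the TV, the two types can usually be told apart from the stream's Program Map Table, because each kind of subtitle stream carries its own DVB descriptor. The sketch below is an illustration only. It assumes you already have one complete, reassembled PMT section as bytes (extracting it from the TS packets is not shown), it does not verify the CRC, and the function name and the labels it returns are made up for this example. The descriptor tags themselves (0x56 teletext, 0x59 subtitling) come from the DVB SI specification.

```python
# Descriptor tags defined by DVB SI (ETSI EN 300 468).
DESCRIPTOR_LABELS = {
    0x56: "teletext",    # teletext_descriptor: text pages, no OCR needed
    0x59: "dvb-bitmap",  # subtitling_descriptor: bitmap subtitles, need OCR
}


def classify_pmt_subtitles(section: bytes) -> dict:
    """Map elementary-stream PID -> subtitle type found in a PMT section.

    Note: a teletext descriptor marks a teletext stream in general; whether
    it actually carries subtitle pages is not checked in this sketch.
    """
    if section[0] != 0x02:
        raise ValueError("not a PMT section (table_id must be 0x02)")

    # section_length counts the bytes after its own field, CRC included.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    loop_end = 3 + section_length - 4  # stop before the 4-byte CRC_32

    # Skip the fixed header and the program-level descriptors.
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    pos = 12 + program_info_length

    found = {}
    while pos + 5 <= loop_end:
        pid = ((section[pos + 1] & 0x1F) << 8) | section[pos + 2]
        es_info_length = ((section[pos + 3] & 0x0F) << 8) | section[pos + 4]
        desc = pos + 5
        desc_end = min(desc + es_info_length, loop_end)
        # Walk the descriptors attached to this elementary stream.
        while desc + 2 <= desc_end:
            tag, length = section[desc], section[desc + 1]
            if tag in DESCRIPTOR_LABELS:
                found[pid] = DESCRIPTOR_LABELS[tag]
            desc += 2 + length
        pos = desc_end
    return found
```

A PID reported as "teletext" can go straight to a text export, while a PID reported as "dvb-bitmap" has to be ripped as bitmaps and OCRed.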
|
|
|
|
|
#11 | Link |
|
HDConvertToX author
Join Date: Nov 2003
Location: Cesena,Italy
Posts: 6,552
|
you can always upload a sample of ts with subs
BHH
__________________
HDConvertToX: your tool for BD backup MultiX264: The quick gui for x264 AutoMen: The Mencoder GUI AutoWebM: supporting WebM/VP8 |
|
|
|
|
|
#12 | Link |
|
Registered User
Join Date: Feb 2009
Posts: 11
|
I know there are 2 types of subtitles (teletext and DVB bitmap). Teletext is pretty much soon to be dead in the UK, so I'm not really interested in it. I can rip DVB subs with ProjectX as bitmaps (.sup and sup/idx) just fine. The problem lies in the OCR: I haven't found the right solution for that yet (with SubRip or anything else). The fonts on the subs are the same for all the recordings I have made, so it should be very possible. Here is a link to some .ts files.
http://www.mediafire.com/?sharekey=8...eada0a1ae8665a |
|
|
|
|
|
#13 | Link | |
|
Registered User
Join Date: Mar 2009
Location: Germany
Posts: 5,773
|
Quote:
In this case, if I understand your question correctly, I assume you have never had anything else OCRed; otherwise you'd know by now that every OCR process needs a helping hand from the human operator once in a while. No algorithm is perfect, not to mention that transmission errors may affect the bitmap image in the very same way they affect the video image or audio track. I cannot test the files now, only at the weekend. |
|
|
|
|
|
|
#14 | Link |
|
Registered User
Join Date: Feb 2009
Posts: 11
|
Yeah, they're DVB subtitles (bitmaps).
I have actually tried OCR and I can get it to work on some of the subpics but not all, and I don't know why. The best solution at the moment is a version of DVDSubEdit which reads pretty much all the characters but still doesn't do a perfect job, as it leaves words with too many spaces between the letters (e.g. "w h ere i s Jo hn Smith" etc.). The author of the program is looking into this for me, but I want to know if anybody else has a solution, as creating text files from digital TV recordings is something that I really need for what I'm trying to achieve. There must be a way! |
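The spacing problem described above can sometimes be patched up after OCR rather than inside the OCR tool. The sketch below is not how DVDSubEdit works and is only one possible workaround: it throws away all spaces in a line and re-splits the letters into the fewest possible words from a known word list. The function name and the word list are hypothetical, and the approach assumes every word in the line (names included) is in that list and that punctuation has been handled separately.

```python
def resegment(text, vocabulary):
    """Re-split badly spaced OCR text into the fewest known words.

    `vocabulary` is a set of lower-case words. If the letters cannot be
    fully covered by vocabulary words, the input is returned unchanged.
    """
    letters = text.replace(" ", "")
    n = len(letters)
    max_len = max((len(word) for word in vocabulary), default=0)

    # best[i] holds (word_count, words) for the best split of letters[:i],
    # or None when no split of that prefix exists.
    best = [None] * (n + 1)
    best[0] = (0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is None:
                continue
            piece = letters[j:i]
            if piece.lower() in vocabulary:
                candidate = (best[j][0] + 1, best[j][1] + [piece])
                if best[i] is None or candidate[0] < best[i][0]:
                    best[i] = candidate

    if best[n] is None:
        return text  # leave the line for manual correction
    return " ".join(best[n][1])


if __name__ == "__main__":
    words = {"where", "is", "john", "smith"}
    print(resegment("w h ere i s Jo hn Smith", words))
```

Preferring the fewest words avoids splitting "where" into "w", "here" and similar fragments when short words are also in the list. Lines that cannot be fully matched are returned untouched, so they can still be fixed by hand.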
|
|
|
|
|
#15 | Link |
|
Registered User
Join Date: Feb 2009
Posts: 11
|
Guys, if any of you actually know how I can go about solving this problem of getting DVB (bitmap) subtitles into text format, please let me know. Teletext is now being scrapped earlier than anticipated in the UK due to loss of revenue and will be decommissioned next year. This makes it even more important that I find a way of working with digital subtitles.
Dave. |
|
|
|
|
|
#16 | Link |
|
Registered User
Join Date: Mar 2009
Location: Germany
Posts: 5,773
|
If your keyword database doesn't include common words like and, or, hey, you, what, because and the like, I think you can spare yourself the effort and enter the relevant ones yourself.
Unless, of course, you'd like to search for whole phrases. |
|
|
|