18th May 2010, 23:22   #1
Daveyboyc
Registered User
 
Join Date: Feb 2009
Posts: 11
Converting DVB subtitles to text

Hello

I need to convert DVB subtitles to text format for a project that I am making.
I have been trying for a while, and this is what I currently do:
demux the videos with ProjectX to .SUP files (bitmaps, like DVD subtitles),
then run OCR on them. The best tool so far is DVDSubEdit, which uses the GOCR engine for OCR. It still has problems with some characters, however, and I can't go through and correct them all by hand, as I need to rip a lot of subs.

Anyone have any ideas for a different solution I could use, or how I can improve the OCR, maybe the bitmaps themselves?
19th May 2010, 07:41   #2
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,769
What are DVB subtitles in your opinion?
19th May 2010, 13:32   #3
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Quote:
Originally Posted by Ghitulescu
What are DVB subtitles in your opinion?
http://broadcasting.ru/pdf-standard-...743.v1.2.1.pdf
19th May 2010, 13:51   #4
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,769
Thank you for the link, I have the document already. I was asking for "his" opinion ... because:

Having worked with DVB subtitles for some time, I have noticed that most people simply don't know how to set up ProjectX, and that TXT subtitles are text-based anyway, so no OCR is needed ... unlike the PGS ones. ProjectX can extract both of them as SUP.
19th May 2010, 14:06   #5
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
In my experience the majority of broadcast subtitles are bitmaps.

I write subtitle drivers for settop boxes and I have never seen text subtitles. The reason is that multiple languages must be supported and it's easier to just send bitmaps rather than try to implement font rendering in the settop box.

Last edited by Guest; 19th May 2010 at 15:01.
19th May 2010, 14:31   #6
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,769
You are US-based, of course you don't have TXT subtitles, you have CCs.

Within the EU, on the other hand, only the Nordics have PGS (as well as TXT), followed by the UK and France. This is principally for HD, as the subtitles for SD are kept in the old(?) format to ensure compatibility with the installed TV base (many/most cable operators simply feed the SAT signal into their network).
19th May 2010, 14:35   #7
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Quote:
Originally Posted by Ghitulescu
You are US-based, of course you don't have TXT subtitles, you have CCs.
No, I work on settop boxes for countries all over the world. For example, most recently, I worked on subtitling for a Serbian service. It is bitmap based.

Let's wait for the OP to tell us what he actually has in his source material. If he would like to post a sample, we can see what he actually has.

Last edited by Guest; 19th May 2010 at 14:39.
19th May 2010, 14:51   #8
pandy
Registered User
 
Join Date: Mar 2006
Posts: 1,049
The BBC transmits both types (TXT and DVB), including dynamically typed subtitles for live content (for DVB this is quite rare, and it seems that some vendors have problems displaying them properly). BBC subtitles are also rich in attributes (especially colors, sometimes with a few changes in one line).

btw
I am looking for an open-source solution to create DVB subtitles directly (without an intermediate DVD subtitle phase), ideally from a series of pictures. Some limited capabilities are implemented in VLC, but the documentation is quite vague...
19th May 2010, 14:54   #9
pandy
Registered User
 
Join Date: Mar 2006
Posts: 1,049
Quote:
Originally Posted by Daveyboyc
Anyone have any ideas for a different solution I could use, or how I can improve the OCR, maybe the bitmaps themselves?
Use a better OCR?
Maybe, for example, the ABBYY OCR solution? http://www.abbyy.com/ (one of the best in my opinion, if not the best of all)
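
Cleaning up the bitmaps before OCR, as the original question suggests, may also be worth trying regardless of the OCR engine. The following is only a minimal Python sketch of that idea, not something taken from DVDSubEdit or ProjectX. It assumes a subtitle bitmap has already been decoded into rows of palette indices, and that one index (`text_index`, a hypothetical parameter) is the text fill colour. It turns the image into black text on white and enlarges it by an integer factor, which tends to be easier input for OCR. The names `binarize` and `upscale` are made up for illustration.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixel values


def binarize(bitmap: Bitmap, text_index: int) -> Bitmap:
    """Map the assumed text fill colour to black (0), everything else to white (255)."""
    return [[0 if px == text_index else 255 for px in row] for row in bitmap]


def upscale(bitmap: Bitmap, factor: int) -> Bitmap:
    """Enlarge a bitmap by an integer factor using nearest-neighbour repetition."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out: Bitmap = []
    for row in bitmap:
        # Repeat each pixel horizontally.
        wide = [px for px in row for _ in range(factor)]
        # Repeat the widened row vertically, as independent copies.
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    # Hypothetical 2x3 bitmap: 0 = transparent, 1 = text fill, 2 = outline.
    sample = [[0, 1, 2],
              [1, 1, 0]]
    for row in upscale(binarize(sample, text_index=1), factor=2):
        print(row)
```

The resulting rows would still need to be written out as an image in whatever format the OCR step expects; that part depends on the tools in use and is left out here.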
14th September 2010, 18:59   #10
Daveyboyc
Registered User
 
Join Date: Feb 2009
Posts: 11
Sorry about the late response, I haven't looked at this for a while.
I was referring to DVB *bitmap* subtitles, which are the only type available on Freeview television (not Freesat).
I found a script to demux and OCR; the thing is, it takes a while and is not completely free of errors. It's a real pain.
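
If the remaining errors are mostly look-alike characters, part of the cleanup could be automated after OCR instead of being corrected by hand. The sketch below is a rough illustration in Python, assuming the typical l/I confusion (the thread does not say which characters actually go wrong). `fix_common_ocr_errors` is a hypothetical helper, and its two rules are heuristics that can misfire on unusual text such as names with an inner capital I.

```python
import re


def fix_common_ocr_errors(line: str) -> str:
    """Apply two heuristic fixes for l/I confusion in English OCR output."""
    # A lowercase "l" standing alone as a word is almost certainly the pronoun "I".
    line = re.sub(r"\bl\b", "I", line)
    # An uppercase "I" directly between two lowercase letters is most likely "l".
    line = re.sub(r"(?<=[a-z])I(?=[a-z])", "l", line)
    return line


if __name__ == "__main__":
    print(fix_common_ocr_errors("l wiIl go"))
```

Rules like these only cover errors that are predictable from context, so a spell-check pass over the output would still be needed for anything else.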
15th September 2010, 09:52   #11
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,769
You don't need to OCR the subtitles, you can use them as bitmaps (PGS).
15th September 2010, 11:25   #12
bigotti5
Spielberger
 
Join Date: Feb 2005
Posts: 838
Quote:
I need to convert DVB subtitles to text format for a project that I am making.
Quote:
You don't need to OCR the subtitles, you can use them as bitmaps (PGS).
SCNR

Tags
dvb, dvdsubedit, ocr, subtitles, television
