Converting DVB subtitles to text


Daveyboyc
18th May 2010, 23:22
Hello

I need to convert DVB subtitles to text format for a project that I am making.
I have been trying for a while, and this is what I currently do:
demux the videos with ProjectX to .SUP files (bitmaps, like DVD subtitles),
then run OCR on them. The best tool so far is DVDSubEdit, which uses OCR by GOCR. It still has problems with some characters, however, and I can't go through and correct them all, as I need to rip a lot of subs.

Anyone have any ideas of a different solution I could use or how I can improve the OCR, maybe the bitmaps themselves?
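
On the "maybe the bitmaps themselves" idea, here is a minimal sketch of one thing that could be tried before OCR: enlarging each subtitle bitmap and reducing it to plain black text on a white background. It assumes the bitmap has already been decoded into rows of grey levels (0-255) and that the subtitle text is lighter than its background. The function names, the scale factor and the threshold are all hypothetical, not part of ProjectX, DVDSubEdit or GOCR.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D grid of grey levels by an integer factor (nearest neighbour)."""
    out = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...and every widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


def binarize(pixels, threshold=128, text_is_light=True):
    """Map each pixel to 0 (black text) or 255 (white background)."""
    result = []
    for row in pixels:
        if text_is_light:
            # Assumption: light text on a dark background, so invert it.
            result.append([0 if v >= threshold else 255 for v in row])
        else:
            result.append([0 if v < threshold else 255 for v in row])
    return result


def prepare_for_ocr(pixels, factor=3, threshold=128):
    """Upscale, then binarize, a decoded subtitle bitmap."""
    return binarize(upscale_nearest(pixels, factor), threshold)
```

Whether this actually helps depends on the OCR engine and the source font, so it would need checking against a handful of real subtitles first.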

Ghitulescu
19th May 2010, 07:41
What are DVB subtitles in your opinion?

Guest
19th May 2010, 13:32
What are DVB subtitles in your opinion?
http://broadcasting.ru/pdf-standard-specifications/subtitling/dvb-sub/en300743.v1.2.1.pdf

Ghitulescu
19th May 2010, 13:51
Thank you for the link, I have the document already. I asked for "his" opinion ... :rolleyes: because:

Having worked with DVB subtitles for some time, I have noticed that most people simply don't know how to set up ProjectX, and that TXT subtitles are text-based anyway, so no OCR is needed ... unlike the PGS ones. ProjectX can extract both of them as SUP.

Guest
19th May 2010, 14:06
In my experience the majority of broadcast subtitles are bitmaps.

I write subtitle drivers for settop boxes and I have never seen text subtitles. The reason is that multiple languages must be supported and it's easier to just send bitmaps rather than try to implement font rendering in the settop box.

Ghitulescu
19th May 2010, 14:31
You are US-based :rolleyes:, of course you don't have TXT subtitles, you have CCs.

Within the EU, on the other hand, only the Nordics have PGS (also TXT), followed by the UK and France. This is principally for HD, as the subtitles for SD are kept in the old(?) format to ensure compatibility with the installed TV base (many/most cable operators simply feed the SAT signal into their network).

Guest
19th May 2010, 14:35
You are US-based :rolleyes:, of course you don't have TXT subtitles, you have CCs.
No, I work on settop boxes for countries all over the world. For example, most recently, I worked on subtitling for a Serbian service. It is bitmap based.

Let's wait for the OP to tell us what he actually has in his source material. If he would like to post a sample, we can see what he actually has.

pandy
19th May 2010, 14:51
The BBC transmits both types (TXT and DVB), including dynamically typed subtitles for live content (for DVB this is quite rare, and it seems that some vendors have problems displaying them properly). BBC subtitles are also rich in attributes (especially colors, sometimes with a few changes in one line).

btw
I am looking for an open source solution to create DVB subtitles directly (without an intermediate DVD subtitle phase), ideally from a series of pictures. Some limited capabilities are implemented in VLC, but the documentation is quite vague...

pandy
19th May 2010, 14:54
Anyone have any ideas of a different solution I could use or how I can improve the OCR, maybe the bitmaps themselves?

Use a better OCR?
For example, maybe the ABBYY OCR solution? http://www.abbyy.com/ (one of the best in my opinion, if not the best of all)

Daveyboyc
14th September 2010, 18:59
Sorry about the late response, I haven't looked at this for a while.
I was referring to DVB *bitmap* subtitles, which are the only type available on Freeview television (not Freesat).
I found a script to demux and OCR; the thing is, it takes a while and is not completely free of errors. It's a real pain.
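
Since the errors are mostly the same few character mix-ups, one option is an automatic clean-up pass over the OCR output instead of correcting by hand. The sketch below is only an illustration: the three substitution rules are assumptions about typical OCR confusions (a lone "l" for "I", a digit stuck between lowercase letters), not rules taken from the script or from GOCR, and they would need adjusting after looking at real output.

```python
import re

# Hypothetical fix-up rules: (pattern, replacement) pairs applied in order.
OCR_FIXES = [
    # A lowercase "l" standing alone as a word is most likely the pronoun "I".
    (re.compile(r"\bl\b"), "I"),
    # A zero between two lowercase letters is most likely the letter "o".
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),
    # A one between two lowercase letters is most likely the letter "l".
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),
]


def fix_ocr_line(line):
    """Apply every fix-up rule to a single line of OCR output."""
    for pattern, replacement in OCR_FIXES:
        line = pattern.sub(replacement, line)
    return line


def fix_ocr_text(text):
    """Apply the fix-ups line by line, keeping the original line breaks."""
    return "\n".join(fix_ocr_line(line) for line in text.split("\n"))
```

For example, fix_ocr_line("l think it is g0od") gives "I think it is good", while plain numbers such as "room 101" are left alone.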

Ghitulescu
15th September 2010, 09:52
You don't need to OCR the subtitles, you can use them as bitmaps (PGS).

bigotti5
15th September 2010, 11:25
I need to convert DVB subtitles to text format for a project that I am making.

You don't need to OCR the subtitles, you can use them as bitmaps (PGS). :rolleyes: :) SCNR