
View Full Version : What do you use to OCR Subtitles?


LeXXuz
28th August 2010, 21:48
Just curious what other people use to OCR either PGS or Vobsub subtitles.

For Vobsub (up to 576p resolution) I use SubRip 1.54, and for PGS I use SupRip 1.16.

SubRip works quite well, since I have built up quite a set of character matrices and dictionaries over the years.

I am not quite happy with SupRip, though. It has a lot of problems with italics, and a lot of special characters sometimes cause trouble too. It's also a pain that it seems to forget already OCR'd tracks, sometimes after OCRing other tracks, sometimes directly after closing the program.

I'm getting tired of spending more than half an hour on OCR for a single subtitle track, including all the cleanup needed afterwards to fix lots of OCR errors.

So, I was wondering if there are other tools to OCR PGS subtitles I should take a look at.

7ekno
29th August 2010, 03:25
Any reason it needs to be OCRed?

Can't live with idx/sub? Far quicker and doesn't have too much container overhead ...

7ek

LeXXuz
29th August 2010, 07:52
7ekno wrote: "Any reason it needs to be OCRed? Can't live with idx/sub? Far quicker and doesn't have too much container overhead ..."

My players do not fully support picture-based subtitles like Vobsub/PGS. That's why I stick with OCR and all its flaws.

Ghitulescu
29th August 2010, 08:09
I think that in the end you'll nevertheless get tired of all this extra work and buy yourself a player that can display PGS subtitles.

LeXXuz
29th August 2010, 08:28
I'm already tired.^^

I would buy new players, but I can't afford them at the moment or in the near future. I fell for the vendor's promise that there would be full Vobsub support "soon". This promise is more than 20 months old. Meanwhile the player is EOL.

I get the feeling that nowadays most people don't use OCR anymore. :rolleyes:

Should have posted a poll, would be interesting to know.

laserfan
29th August 2010, 14:46
LeXXuz wrote: "I get the feeling that nowadays most people don't use OCR anymore. :rolleyes:"

Not true. I (and others I'm sure) use SupRip 1.16 all the time. I didn't respond to your post because I don't know of any alternatives myself (and would be interested), though honestly the very few cases where SupRip has failed to work for me have been with corrupt streams.

Yes some PGS files require a half-hour of hand-holding the OCR process (and some not-insignificant typing) and then processing with Subtitle Workshop for fixing OCR errors (like confused Is and ls) and SDH stuff (LOUD EXPLOSIONS). But since subtitles are usually the reason for my conversions in the first place, I don't mind the extra effort (and I don't do it all that often).
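
For anyone who wants to script part of that cleanup, here is a minimal sketch of the two fixes mentioned above (lone "l" misread for "I", and all-caps SDH cues). It is not what Subtitle Workshop does internally; the function names and regular expressions are my own assumptions, and it assumes English text where a standalone lowercase "l" is never a real word.

```python
import re

# Assumption: in English subtitles a standalone lowercase "l" (including
# "l'm", "l'll", "l-l") is a misread capital "I".
_LONE_L = re.compile(r"\bl\b")

# Assumption: a line that is only a bracketed/parenthesised phrase with no
# lowercase letters, e.g. "(LOUD EXPLOSIONS)", is an SDH cue.
_SDH_LINE = re.compile(r"^\s*[\(\[][^a-z\)\]]+[\)\]]\s*$")


def fix_line(line):
    """Replace standalone lowercase 'l' with capital 'I'."""
    return _LONE_L.sub("I", line)


def clean_lines(lines):
    """Drop whole-line SDH cues and fix l/I confusion in the remaining lines."""
    cleaned = []
    for line in lines:
        if _SDH_LINE.match(line):
            continue  # skip cues such as "(LOUD EXPLOSIONS)"
        cleaned.append(fix_line(line))
    return cleaned
```

This only handles text lines, not SRT numbering or timestamps, and it would mangle languages where "l'" is legitimate (French, Italian), so treat it as a starting point for a manual review pass rather than a replacement for one.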

Still, if there's a better tool than SupRip I too would like to know of it.

Ghitulescu
30th August 2010, 08:11
I see only one advantage of OCRed subtitles (text) over PGS ones (bitmap):
- smaller size

If one needs diacritics (ç, ¢, 듨, ઑ, ƾ, ʤ, Ճ, ڱ, ט ) then OCRed subtitles are more problematic than the PGS ones, as one needs:
- a working system (support for code pages, with the right code page available) to get them correctly into text form
- a working player that can understand the text and output the correct characters on the screen.
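
To illustrate the code page problem described in the two points above, here is a small Python sketch. The sample word and the helper names are assumptions for illustration only; the point is that text saved in one code page and read in another shows the wrong characters, while UTF-8 (here with a BOM, which some players use to detect the encoding) avoids the guesswork if the player supports it.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def srt_bytes(text, encoding="utf-8-sig"):
    """Encode subtitle text for writing to disk; utf-8-sig prepends a BOM."""
    return text.encode(encoding)


# Hypothetical sample: Romanian "tara" with t-cedilla (U+0163) and a-breve (U+0103).
line = "\u0163ar\u0103"

# The word fits the Central European code page but not the Western one.
fits_cp1250 = can_encode(line, "cp1250")
fits_cp1252 = can_encode(line, "cp1252")

# What a player shows if the file was saved as cp1250 but read as cp1252.
misread = line.encode("cp1250").decode("cp1252")
```

Here `misread` comes out with a thorn and an a-tilde in place of the original letters, which is the kind of garbage a player displays when it assumes the wrong code page.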

So, many anime lovers would be forced to hardcode the subtitles into the video rather than packing an SRT file with the main movie.

Usedocne
21st September 2010, 01:21
I hate OCR'ing, especially when every other word, capitalization and spacing get fudged to heck. (plus my lack of grammar doesn't help :D)

As for what tools I use: SubRip and SupRip (<- this one is the worser of the two :devil:)

Superb
21st September 2010, 05:28
I use Subresync to OCR IDX/SUB tracks.

Pros:
- OCR is pretty accurate: fewer out-of-place spaces, fewer I's becoming l's and so on...
- Handles higher resolutions in IDX/SUB, for example when converting a Blu-ray SUP file into IDX/SUB (using BDSup2Sub) for OCRing.
- Clearly displays current letter and line.

Cons:
- Cannot save/load the CharMatrix between sessions. You can, however, rip an entire season (which usually uses the same font) by keeping the same Subresync window open and loading the next IDX/SUB pair without losing your CharMatrix.
- OCR thresholds are not configurable, so it has problems OCRing bad fonts or text whose letter hinting is slightly different every time.
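
To make the two cons concrete, here is a rough sketch of what a character matrix with a configurable threshold and save/load could look like. This is not Subresync's actual implementation: the glyph format (rows of "#" and "."), the function names and the JSON file layout are all assumptions for illustration.

```python
import json


def mismatch(a, b):
    """Fraction of differing pixels between two glyphs (lists of row strings)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 1.0  # different sizes: treat as a complete mismatch
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def recognise(glyph, matrix, threshold=0.1):
    """Return the best-matching character, or None if nothing is close enough."""
    best_char, best_score = None, 1.0
    for char, samples in matrix.items():
        for sample in samples:
            score = mismatch(glyph, sample)
            if score < best_score:
                best_char, best_score = char, score
    return best_char if best_score <= threshold else None


def save_matrix(matrix, path):
    """Persist the character matrix so it survives between sessions."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(matrix, fh)


def load_matrix(path):
    """Load a previously saved character matrix."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Tiny hypothetical matrix: one 3x3 sample each for "I" and "l".
matrix = {
    "I": [["###", ".#.", "###"]],
    "l": [[".#.", ".#.", ".#."]],
}

# An "I" with one pixel rendered differently, as with inconsistent hinting.
hinted = ["###", ".#.", "##."]
```

With the strict default threshold, `recognise(hinted, matrix)` gives up and the user would have to type the letter again, whereas `recognise(hinted, matrix, threshold=0.2)` accepts the one-pixel difference and returns "I". That is the trade-off a configurable threshold would expose.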