View Full Version : idx/sub to srt and correct font


ParkerLewis
24th October 2009, 14:11
Hello all,

I have a bunch of idx/sub (image-based) subtitles that I'd like to convert to text-based subtitles (namely srt).

After searching the forum, it looked like my best hope was:

1) Convert them to Blu-Ray SUP format (image-based too) with BDSup2Sub

2) Run them through SupRip for the OCR and export to srt (text-based).
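
For reference, the srt target of step 2 is just numbered text blocks with a start and end timestamp. A minimal sketch of writing one out, assuming the OCR step hands back cues as (start_ms, end_ms, text) tuples (that tuple layout and the function names are only for illustration, not what SupRip does internally):

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm form used in srt files."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_entry(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """One srt block: counter, timing line, then the subtitle text."""
    timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
    return f"{index}\n{timing}\n{text}\n"


def to_srt(cues) -> str:
    """Join cues into one srt document; blocks are separated by a blank line.

    `cues` is a hypothetical list of (start_ms, end_ms, text) tuples.
    """
    return "\n".join(
        srt_entry(i, start, end, text)
        for i, (start, end, text) in enumerate(cues, start=1)
    )


if __name__ == "__main__":
    print(to_srt([(0, 1500, "Hello all,"), (2000, 3000, "Thanks for your help.")]))
```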

Unfortunately, SupRip doesn't seem to be able to automatically OCR them. The subs are displayed fine, I can tweak the numbers so that each of them is correctly identified as a character (same thing with the space size), but it won't be able to determine which character it is.

Maybe I need to tell it which font was used, but I have no idea which one it is, except that it looks like the standard font I've always seen displayed by those ugly idx/subs.

I guess my questions are...

- Is what I'm doing the best way to convert these idx/sub subtitles? My only requirement is that I definitely don't want to go through the tiresome manual recognition I remember having to cope with in SubRip years ago when I had a similar problem. I just don't want to have to create the char matrix every time.

- Do you guys know what the default font in idx/subs is, in case this info is of any use in the process? I could provide one of the subtitle files if needed, but I'm pretty sure it's just the same as it usually is with idx/sub files.

Thanks for your help.

kevinsert
24th October 2009, 15:28
+11111 :(

manusse
24th October 2009, 17:57
Hi,

If your sub/idx are Standard Def (not HD), then you can use SubtitleCreator to convert them to the Sup format. Then use DVDSubedit to OCR them. This will only work with PAL subtitles because SC still has a bug regarding the timings of NTSC subtitles.

Cheers
Manusse

ParkerLewis
24th October 2009, 19:58
Hi,

If your sub/idx are Standard Def (not HD), then you can use SubtitleCreator to convert them to the Sup format. Then use DVDSubedit to OCR them. This will only work with PAL subtitles because SC still has a bug regarding the timings of NTSC subtitles.

Cheers
Manusse

This technically works, thanks. Although there seem to be quite a lot of mismatches (lots of "f"s detected as "r"s, things like that). Is there a way to get better results (still automatically)?

hamletiii
25th October 2009, 22:02
Why not just use VOBSUB's subresync? It can read idx+sub directly and has an OCR engine.
It will ask you to build a character matrix which you can save for later use.

ParkerLewis
28th October 2009, 18:02
Why not just use VOBSUB's subresync? It can read idx+sub directly and has an OCR engine.
It will ask you to build a character matrix which you can save for later use.

Because in my experience, the chance of being able to reuse the same matrix on another source is next to nothing. Building a character matrix for each file sucks big balls and I refuse to do that (not to mention it would be stupid as the characters are actually visually identical, they apparently just don't numerically match for some reason).
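
To illustrate the "visually identical but not numerically matching" point: an exact bitmap lookup fails as soon as a single pixel differs, while a nearest-match lookup with a small tolerance would still hit. The following is only a sketch under assumptions: glyphs are tiny made-up bitmaps stored as tuples of strings, and `max_diff` is an arbitrary threshold. It is not how SubRip or subresync store or compare their matrices.

```python
def pixel_distance(a, b):
    """Count differing pixels between two same-sized glyph bitmaps.

    Returns None when the bitmaps have different dimensions.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph, matrix, max_diff=2):
    """Return the closest character in `matrix`, or None if nothing is near enough.

    `matrix` maps a character to its reference bitmap (hypothetical layout).
    """
    best_char, best_dist = None, None
    for char, reference in matrix.items():
        dist = pixel_distance(glyph, reference)
        if dist is None:
            continue  # different size, cannot compare pixel by pixel
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    if best_dist is not None and best_dist <= max_diff:
        return best_char
    return None


# Made-up 3x3 reference glyphs, for illustration only.
MATRIX = {
    "I": ("###", ".#.", "###"),
    "L": ("#..", "#..", "###"),
}

# The same "L" from another source, with one pixel missing.
NOISY_L = ("#..", "#..", "##.")

if __name__ == "__main__":
    print(NOISY_L in MATRIX.values())              # exact match fails
    print(recognise(NOISY_L, MATRIX))              # tolerant match succeeds
    print(recognise(NOISY_L, MATRIX, max_diff=0))  # zero tolerance behaves like exact match
```

The catch is the threshold: too loose and you get exactly the kind of false positives people complain about with catch-all matrices.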

talen9
28th October 2009, 20:17
Ahem ... SubRip (http://zuggy.wz.cz/) anyone?

Why go through BDSup2sub and Suprip when there's the right program for .idx/.sub already? :confused:

And no, there's NO default font for that kind of subs; there are at least 3-4 of them, and that's the reason why, even if they all seem the same to you, the OCR phase is not simple for the various programs.

On the SubRip page linked above you can download a "catch-all" kind of matrix already prepared for you, but I actually did not have a very satisfying experience with it, a few too many false positives ...

ParkerLewis
28th October 2009, 20:26
Ahem ... SubRip (http://zuggy.wz.cz/) anyone?

Why go through BDSup2sub and Suprip when there's the right program for .idx/.sub already? :confused:

And no, there's NO default font for that kind of subs; there are at least 3-4 of them, and that's the reason why, even if they all seem the same to you, the OCR phase is not simple for the various programs.

On the SubRip page linked above you can download a "catch-all" kind of matrix already prepared for you, but I actually did not have a very satisfying experience with it, a few too many false positives ...

I mentioned SubRip in my initial post, and dismissed it because of the problems you mention. My goal was (and is) to end up with a better (i.e. more accurate) way to do it (without having to do it manually).

talen9
28th October 2009, 21:03
Whoops, sorry :o and to think that I re-read your post at least twice ...

Anyway, IMHO the problem is more or less intrinsic to the fact that DVDs have NO standard font for the subs, whereas the Blu-rays I saw all had a "standard" Arial font, very OCR-friendly (for SupRip at least) ... but even there you still have to do a bit manually ;)

manono
31st October 2009, 11:30
My goal was (and is) to end up with a better (i.e. more accurate) way to do it (without having to do it manually).
Keep dreaming. Either use the image-based subs (IDX/SUB or SUP), or do a manual OCR. Heck, even without a matrix it only takes 10-15 minutes, maybe with some editing afterwards. As for:
Because in my experience, the chance of being able to reuse the same matrix on another source is next to nothing.
I don't usually have to type much any more because the matrices are reused once you build up a good number of them.

Conspicuous57
31st October 2009, 23:45
Download and install SubRip 1.20. (You must use 1.20.)

Then start ripping your own subtitles. Just define all the characters.
After a few subtitle packs, you will rarely be asked to use your keyboard.
If you only do this occasionally, you can use "SubResync".
But if you want to do this work regularly, you should definitely use "SubRip".
Don't use any beta version.

Chetwood
8th November 2009, 13:22
What's wrong with the beta? In my experience even the 1.20 final has some flaws, namely the inability to properly distinguish between "i" and "l".
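
Whatever the version, the "i"/"l" confusion (and the "f"/"r" mix-ups mentioned earlier in the thread) can be partly cleaned up after the OCR pass with a word list. A rough sketch, assuming you have a vocabulary of known words for the subtitle language; the `VOCABULARY` set and the `CONFUSABLE` table below are made up for the example, and this is not a feature of SubRip itself:

```python
from itertools import product

# Characters the OCR tends to mix up, per the reports in this thread.
CONFUSABLE = {"i": "il", "l": "il", "f": "fr", "r": "fr"}


def candidates(word):
    """Yield every spelling obtained by swapping confusable characters.

    The count doubles with each confusable character, which is fine for
    single words but not something to run on whole lines.
    """
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    return ("".join(combo) for combo in product(*options))


def fix_word(word, vocabulary):
    """Replace `word` only when exactly one swapped spelling is a known word."""
    if word in vocabulary:
        return word
    matches = {c for c in candidates(word) if c in vocabulary}
    return matches.pop() if len(matches) == 1 else word


# Hypothetical vocabulary, lowercase only to keep the example short.
VOCABULARY = {"life", "for", "fill", "the"}

if __name__ == "__main__":
    print(" ".join(fix_word(w, VOCABULARY) for w in "ror the lire".split()))
```

Ambiguous words (more than one valid swap) are left alone on purpose, so some editing afterwards is still needed.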