Batch OCR convert subtitles?


zerowalker
18th November 2013, 06:14
I would like to batch convert many .sub/.idx files to .srt.

The reason I want to batch this rather than go through it manually is that there are so many files, and I don't really care about a few errors; it's not worth the time.

I am currently using Subtitle Editor to manually OCR then Save as .srt.

But doing that all the time takes ages, as we are talking 50+ files.

Chetwood
18th November 2013, 06:58
Why not ask in the subtitle subforum? There's a link to Tappen's Subextractor which does batch OCRing when the files are similarly named like:

my show s01e01
my show s01e02

It will ask you to confirm some chars on the first couple of subs, but then it will go through the rest very fast.
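
If the files carry episode titles or mixed separators, it may help to bring them to the uniform "my show s01e01" shape first. The following is only a rough Python sketch of that idea: the `sXXeYY` pattern and the `suggest_names` helper are assumptions for illustration, not anything Subextractor provides, and the sketch only proposes new names without renaming anything on disk.

```python
import re

# Assumed naming scheme: "<show name> s01e02", optionally followed by an
# episode title. Adjust the pattern if your files are named differently.
EPISODE_RE = re.compile(
    r"^(?P<show>.+?)[ ._-]*s(?P<season>\d{1,2})e(?P<episode>\d{1,3})",
    re.IGNORECASE,
)

def suggest_names(filenames):
    """Map each .idx/.sub filename to a uniform '<show> sXXeYY.<ext>' name.

    Files with another extension, or without a recognizable sXXeYY tag,
    are left out of the result.
    """
    renames = {}
    for name in filenames:
        stem, _, ext = name.rpartition(".")
        if ext.lower() not in ("idx", "sub"):
            continue
        match = EPISODE_RE.match(stem)
        if match is None:
            continue
        # Collapse dots, underscores and dashes in the show name to spaces.
        show = re.sub(r"[ ._-]+", " ", match.group("show")).strip().lower()
        season = int(match.group("season"))
        episode = int(match.group("episode"))
        renames[name] = f"{show} s{season:02d}e{episode:02d}.{ext.lower()}"
    return renames

if __name__ == "__main__":
    files = ["My.Show.S01E01.Pilot.idx", "My.Show.S01E01.Pilot.sub", "notes.txt"]
    for old, new in suggest_names(files).items():
        print(old, "->", new)
```

Review the proposed names before renaming, since a .idx file and its .sub file must end up with the same base name.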

zerowalker
18th November 2013, 07:32
Oh, I missed that. I searched for "subtitles" and found General, so I thought a subtitle subforum didn't exist. A blunder indeed.

Hmm may work perhaps, will look it up, thanks.

Admins or moderators, would you kindly move this to the Subtitle subforum when you have time?

zerowalker
19th November 2013, 17:41
I am trying Subtitle Extractor, but I don't get how to run it as a batch.
I can add all the files, but I still have to go through them one by one and do many character checks.

Asmodian
20th November 2013, 01:53
Well yes, how do you expect it to know what the characters are until you tell it? It stops asking as you teach it more but there is always that "&" or something that wasn't used until episode 22.

That is the problem with OCR; it isn't magic. That said, Subtitle Extractor is very impressive: it is easy to process a lot of files and the OCR works very well. I do not believe you will find a better solution, but you will have to click a few times per file at minimum. I too would like a pure batch mode which only prompts for unknown characters.

Chetwood
20th November 2013, 06:37
Which is exactly what sub extractor does. When the files are named similarly, it will only check for a few chars that are always ambiguous and then apply those results to the following files. And honestly, I don't think you waste time on correcting some 'I' 'l' confusion at the end of the OCR process.

Asmodian
20th November 2013, 10:13
I remember needing to click a few times for each episode after teaching it most characters, to move through the steps and save and start the next one? I did go through the auto fix lists too, very helpful.

After getting used to it I found Subtitle Extractor quick and easy to use, but there are a few extra clicks that, at least with some workflows, could be automatic. Maybe an option on each page so SubExtractor will automatically go to the next step using the current values, or auto save? There is a good reason not to do this by default (that "Review and Correct OCR Matches" button!), but when working through 100s of episodes, clicking "Next" every time the OCR is done can get painful. ;)

Chetwood
20th November 2013, 15:45
Well, you can ask Tappen to implement a true batch mode, he's usually quite receptive but he seems to be busy with work these days.

zerowalker
20th November 2013, 19:12
But it asked for pretty much the same stuff every episode: ' , . O o and a few more characters, every time.
Is it supposed to be like that?

I can clearly understand that it will ask if there is a new character. But if a character looks the same, I don't get why it should ask, unless it's in a difficult position in the picture or something.

Asmodian
20th November 2013, 21:57
Actually they are all different. It depends on the font used, but some fonts keep rendering those characters differently. In my experience, once you have done a lot of episodes it always stops asking (after you get all 14 different "O"s). Subtitle Extractor is very picky, which is why it is so good. I understand you say you don't care about errors, but most of us do. :)
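
To illustrate why the same letter can be asked about more than once, here is a toy Python sketch of a glyph memory that only recognizes exact bitmap matches. This is an assumption made for illustration, not how Subtitle Extractor is actually implemented; `GlyphCache` and the tiny bitmaps are hypothetical.

```python
class GlyphCache:
    """Remembers which character each exact glyph bitmap stands for."""

    def __init__(self):
        self.known = {}   # bitmap (tuple of row strings) -> character
        self.prompts = 0  # how many times the user had to be asked

    def recognize(self, bitmap, ask):
        """Return the character for bitmap, calling ask() only if it is new."""
        key = tuple(bitmap)
        if key not in self.known:
            self.prompts += 1
            self.known[key] = ask(bitmap)
        return self.known[key]

if __name__ == "__main__":
    cache = GlyphCache()
    # Two renderings of "O" that differ by a few pixels.
    round_o = [".##.", "#..#", "#..#", ".##."]
    square_o = ["####", "#..#", "#..#", "####"]
    for glyph in (round_o, square_o, round_o, square_o):
        cache.recognize(glyph, ask=lambda bitmap: "O")
    print(cache.prompts)  # each distinct rendering is asked about once
```

With a matcher this strict, every new rendering of a character costs one prompt, and the prompts stop once all renderings used by the font have been seen.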

@Chetwood
TBH, I never felt it was worth bothering Tappen with and Subtitle Extractor was such a big improvement in my workflow already. :o
I'll mention it.

Chetwood
21st November 2013, 06:47
"But it asked for the same stuff every episode pretty much ' , . O o"
It does not when they are named similarly (without episode titles).

zerowalker
21st November 2013, 20:01
The names aren't that similar, so I guess that's why.
But I guess I would also have to keep pressing Next, so it's a "manual" batch run?

And indeed it's good that it's picky. That explains why there were something like 5 different ' characters. They looked a bit different so I didn't know what to do, and I just used ' for all of them.

Chetwood
22nd November 2013, 08:12
It's a manual batch run alright. But given the small number of quick clicks you need to get a nearly perfect sub, I'd still prefer it over a completely automated version.