Old 12th September 2005, 08:58   #200  |  Link
ai4spam
Programmer
 
 
Join Date: Sep 2003
Posts: 382
@Saligia: Thanks for the comments.

This whole "scale to obtain large, clear, non-joined chars" idea sounds interesting; it's along the lines of AVISubDetector's "double resolution" feature. I did not use it for SubRip because it makes the characters too large: SubRip's current limit is 72x44, a leftover from the days when Delphi couldn't handle dynamic arrays. I guess I could raise that limit, but that would definitely impact performance and potentially make char matrices less reusable. I'll think about it, but again, I have very little time to devote to this.
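
To make the size problem concrete, here is a rough sketch (not SubRip's or AVISubDetector's actual code) of what doubling the resolution does to a glyph that sits inside a fixed-size char slot. It assumes plain nearest-neighbour scaling and reads the 72x44 limit as width x height; both are assumptions for illustration, and `scale_glyph` and `fits_limit` are made-up names.

```python
# Assumption: the 72x44 limit is width x height; the post does not say which is which.
MAX_W, MAX_H = 72, 44

def scale_glyph(glyph, factor=2):
    """Nearest-neighbour upscale of a binary glyph given as a list of equal-length strings."""
    out = []
    for row in glyph:
        wide = "".join(ch * factor for ch in row)  # repeat every pixel horizontally
        out.extend([wide] * factor)                # repeat every row vertically
    return out

def fits_limit(glyph, max_w=MAX_W, max_h=MAX_H):
    """True if the glyph fits inside the fixed char slot."""
    height = len(glyph)
    width = len(glyph[0]) if glyph else 0
    return width <= max_w and height <= max_h

# Hypothetical 40x30 glyph, e.g. two joined letters.
glyph = ["#" * 40] * 30
doubled = scale_glyph(glyph)

print(fits_limit(glyph), fits_limit(doubled))
```

A glyph that fits comfortably at the original size becomes 80x60 after doubling and no longer fits, which is why the fixed limit would have to grow before scaling could be built in.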

A question: what kind of scaling are we talking about? Is there some processing going on there to make letters disjoint? Can you send me an example (look up my email in the manual)? The scaling could be implemented directly in SubRip.

Also, it seems that your steps 3 and 4 are superfluous. Why don't you just load the "enlarged" .sub/.idx directly in SubRip, instead of making an .avi? This would also help with your problem, because you wouldn't get .avi compression artifacts interfering with the binarization during the hard-subbed .avi OCR.

About your agenda: what the best guess does is put a new char in the char matrix once you confirm it. So the result of option A would be a char matrix potentially filled with really bad guesses. In my experience, once you get going, you seldom have to type. Try one of the following:
- lower the OCR sensitivity
- use the "fill matrix" option - it's a bit of work to match the font, but after that it should be smooth
I'm not clear what good options B and C would do. Presumably, you'd insert some special char or comment, and then go in and edit manually afterwards (spellcheckers never quite work). Again, if you've got a decent source, once you start the OCR you have to type less and less.
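
A rough sketch of the char-matrix idea, to show why unreviewed best guesses are risky. This is not SubRip's actual code: the matrix is modelled as a plain dictionary from glyph bitmap to text, "sensitivity" is modelled as the maximum fraction of differing pixels (`tolerance`), and all function names are hypothetical.

```python
def diff_ratio(a, b):
    """Fraction of differing pixels between two glyphs (tuples of strings)."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return 1.0  # different sizes: treat as no match
    total = sum(len(row) for row in a)
    diff = sum(p != q for x, y in zip(a, b) for p, q in zip(x, y))
    return diff / total if total else 1.0

def best_guess(matrix, glyph):
    """Return (text, ratio) of the closest stored glyph, or (None, 1.0) if none."""
    best = (None, 1.0)
    for stored, text in matrix.items():
        r = diff_ratio(stored, glyph)
        if r < best[1]:
            best = (text, r)
    return best

def recognise(matrix, glyph, tolerance, confirm):
    """Match a glyph against the matrix; ask the user when nothing is close enough."""
    text, ratio = best_guess(matrix, glyph)
    if text is not None and ratio <= tolerance:
        return text                  # close enough: no prompt needed
    answer = confirm(glyph, text)    # user confirms or corrects the guess
    matrix[glyph] = answer           # the confirmed entry is stored for reuse
    return answer

letter_l = ("#..", "#..", "###")
noisy_l = ("#..", "#..", "##.")      # same letter, one pixel lost to noise
letter_i = (".#.", ".#.", ".#.")

matrix = {letter_l: "L"}
# A looser tolerance absorbs the noisy copy without prompting.
print(recognise(matrix, noisy_l, 0.2, lambda g, guess: "?"))
# Auto-accepting the best guess stores a different letter under the wrong text.
print(recognise(matrix, letter_i, 0.05, lambda g, guess: guess))
```

In the last call the `confirm` callback just returns the guess, which is my reading of what accepting best guesses without review would amount to: the glyph for "I" ends up in the matrix labelled "L", and every later match against it repeats the mistake.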

Hopefully, with my suggestions above, you'll cut down the time you spend doing this.