20th March 2012, 23:00   #229
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Ah, I finally see the problem. It's the same as the issue with l and I having identical bit patterns in many subtitle fonts, which makes accurate matching impossible; here it's i and ¡ that look the same. I added the entire spellcheck step just to solve the l/I issue.

To fix this, I'll have to add an option to the spellcheck step that discriminates between i and ¡. I suppose the rule is: if the character is not at the beginning of a word, or comes just after a ¿ at the beginning of a word, I can assume it's an i (eye) and not an inverted exclamation mark. Otherwise I'll have to ask the user and build up a dictionary of words that really do begin with i. It's quite a bit of work, but I'll see what I can do.
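A minimal Python sketch of that rule, for illustration only. It assumes the OCR reports the position of the ambiguous glyph within each word, treats "beginning of a word" as "no letter before the glyph", and uses hypothetical names (`classify_ambiguous`, `resolve`, the `i_words` set and the `ask` callback) that are not part of the actual tool.

```python
from typing import Callable, Optional, Set


def _dict_key(word: str, pos: int) -> str:
    # Hypothetical dictionary key: the word spelled with a leading "i", letters only, lowercased.
    rest = "".join(ch for ch in word[pos + 1:] if ch.isalpha())
    return ("i" + rest).lower()


def classify_ambiguous(word: str, pos: int, i_words: Set[str]) -> Optional[str]:
    """Return "i" when a rule decides, or None when the user has to be asked."""
    prefix = word[:pos]
    # Rule 1: a letter precedes the glyph, so it is not at the start of the word.
    if any(ch.isalpha() for ch in prefix):
        return "i"
    # Rule 2: right after an opening ¿ (e.g. "¿ir?"), an inverted exclamation is implausible.
    if prefix.endswith("¿"):
        return "i"
    # At the start of a word: accept "i" only for words known to really begin with i.
    if _dict_key(word, pos) in i_words:
        return "i"
    return None


def resolve(word: str, pos: int, i_words: Set[str],
            ask: Callable[[str], str]) -> str:
    """Replace the ambiguous glyph, asking (and remembering) only when no rule applies."""
    choice = classify_ambiguous(word, pos, i_words)
    if choice is None:
        choice = ask(word)  # hypothetical prompt; expected to return "i" or "¡"
        if choice == "i":
            i_words.add(_dict_key(word, pos))  # grow the dictionary of i-words
    return word[:pos] + choice + word[pos + 1:]
```

Only the word-initial case ever reaches the prompt, so the dictionary grows just with the words the user confirms, which keeps the number of questions down on later runs.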

For now, I'd remove that training, and when you next run the OCR, choose i (eye) rather than the inverted exclamation mark: there are likely more i's than ¡'s in the text, which makes cleanup easier.