@ai4spam: thanks, looks promising. I'll report back.
@niamh: The subs are not too hard to OCR - edit the colours to remove all background and antialiasing colours, and set the OCR difference threshold a bit lower, perhaps 700. Distinguishing italics from plain text is tricky, though. I'd probably ignore all text attributes and set the few lines in italics manually afterwards.
Steve