SubRip / OCR Advanced / Font Style (OCR)


DataLore
24th August 2004, 06:18
I have done serious research on subrip and it seems pretty simple..

I ran subrip on a 2 season dvd boxset and it went thru the first season without any problems.. (Just the basics: i's and l's mixed up, and the ocr correction removed almost every one of those errors)
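For readers who have not seen this kind of i/l clean-up, here is a minimal sketch of what such a post-OCR correction could look like. It is not SubRip's actual correction logic; the function name `fix_i_l_confusion` and both rules are assumptions chosen only to illustrate the idea.

```python
import re

def fix_i_l_confusion(text):
    """Apply two simple, hypothetical rules for the classic I/l OCR mix-up."""
    # Rule 1: a capital "I" sandwiched between lowercase letters is almost
    # certainly a misread lowercase "l" (e.g. "wiIl" -> "will").
    text = re.sub(r"(?<=[a-z])I(?=[a-z])", "l", text)
    # Rule 2: a lowercase "l" standing alone as a word is almost certainly
    # the pronoun "I" (e.g. "l am" -> "I am").
    text = re.sub(r"\bl\b", "I", text)
    return text

print(fix_i_l_confusion("HeIlo, l am wiIling"))
```

Rules like these only work when the rest of the word was read correctly, which is why they cope with a few mixed-up letters but not with lines that are mostly wrong.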

Once I started subrip on the second season (with a new matrix or the old saved matrix), I was getting about 90% errors on reads.. It seemed to be confused by the letter u (pulling the left half of the letter, then the right half of the letter)

M and N were misunderstood as 2 letters as well.. I tried to subrip it over and over.. there seems to be a visual difference in the font used on the actual images stored in the .VOB's

I have tried to tweak the advanced OCR features but I am at best guessing on what I'm playing with.. There seems to be almost no documentation on the advanced settings..

I can subrip and pull the 10% good and manually adjust the 90% horrible errors and load them into substation and manually watch the show and make each line correct.. but this is VERY time consuming..

Running spell check on words that even next to other words make no sense, is impossible.

Has anyone else encountered this almost total inability to read a particular font style?? If so was it correctable... is there a chance that OCR tweaking will help??

If there is anything you can suggest I would be very grateful. I want to get the subs out to someone that requested them.. As I know he cannot enjoy the audio.. but I do not have the time required to fix 20+ episodes.

Thanks for your time!

DataLore

niamh
24th August 2004, 07:18
Does vobsubconfigure choke on those fonts too? You can OCR from there as well...
And what's wrong with leaving the subtitles as idx/sub?

In subrip, you can edit the character matrix manually from "character matrix". While I've never done it, it seems like it could be a solution for you, want to give it a go?

DataLore
24th August 2004, 17:12
I have followed the how-to's with vobsub.. and my first problem was that smart ripper named the vob's vts_02_2.vob, vts_02_3.vob, vts_02_4.vob, and this caused a problem as the numbering didn't start with 1, i.e. vts_02_1.vob.

So once corrected I was able to load the file with the .idx file created by smart ripper..
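The renaming described above can be scripted. The following is only a sketch, assuming the files follow the `vts_NN_N.vob` naming pattern mentioned in this post and that closing the gap in the numbering is really what the tool needs; the helper names `plan_renumbering` and `renumber` are made up for this example. Part 0 (by DVD convention the menu VOB) is left alone.

```python
import re
from pathlib import Path

# Matches names such as vts_02_3.vob (case-insensitive).
VOB_NAME = re.compile(r"^(vts_\d+_)(\d+)(\.vob)$", re.IGNORECASE)

def plan_renumbering(names):
    """Return (old, new) name pairs so each title set's parts run 1, 2, 3, ..."""
    groups = {}
    for name in names:
        m = VOB_NAME.match(name)
        if m and int(m.group(2)) > 0:  # skip part 0, the menu VOB
            groups.setdefault(m.group(1).lower(), []).append((int(m.group(2)), name, m))
    plan = []
    for parts in groups.values():
        # Ascending order guarantees a target name is free before it is used.
        ordered = sorted(parts, key=lambda p: p[0])
        for new_index, (_, old, m) in enumerate(ordered, start=1):
            new = f"{m.group(1)}{new_index}{m.group(3)}"
            if new != old:
                plan.append((old, new))
    return plan

def renumber(folder):
    """Apply the plan to the files in a folder (work on a copy to be safe)."""
    folder = Path(folder)
    for old, new in plan_renumbering(p.name for p in folder.iterdir()):
        (folder / old).rename(folder / new)

print(plan_renumbering(["vts_02_2.vob", "vts_02_3.vob", "vts_02_4.vob"]))
```

Printing the plan first, before calling `renumber`, makes it easy to check that nothing unexpected will be renamed.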

But then nothing else seemed to happen once it loaded.. (It went thru the indexing and opening scan thingie) (Click OK) and then 3 files should be created in a directory I selected.. This didn't occur.. only 2 files and there was really nothing inside them.. 1 had very basic information but nothing really (2 KB)

I tried to use those files but they didn't seem to have any meaning..

I followed the instructions I found to OCR them with subresync.. this did not work at all.. I was unable to OPEN anything with that application.. (Even renaming the files)

I really hate to sound new to subtitles.. but unfortunately I'm very new.. I have used subrip and substation and I'm quite good with them.. but Vobsub and subresync seem to be a little confusing.. If you can shed any light on their detailed use.. (Like how to change the font style to allow subrip to read it)

just fyi:
the reason for the change to .srt or .sub (from the vob's) is to include the subs with the already divx compressed .avi's (Unless there is a better way.. and I'm truly open to suggestions..)

Again thank you for your assistance, it's greatly appreciated! :D

DataLore

niamh
24th August 2004, 21:51
hhmmmmmmm............
So the subs are not extracted properly, that would explain why they're not ocr'ed properly either. sub files should be pretty big (around 10 MB), so there was no information extracted it seems, hence subresync freezing. (Try playing those as separate subs with directvobsub, I bet they don't show up)
I have nothing against smart ripper but would you do me a favour and try to rip the vobs with dvd decrypter in IFO mode (under options; leave all the rest default)? Then you can extract the subs by loading the ifo file in vobsubconfigure, and start from there. Just curious whether it is a decrypting issue :)

Theoretically, if you load the idx in subresync, you would have a lot of timestamps appearing, and when clicking save as... srt, you would get the same ocr window as subrip, except cruder.
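A quick way to check whether an extracted idx/sub pair holds anything before opening it in subresync is to count the timestamp entries in the .idx and look at the size of the .sub. This is a rough sketch only: it assumes the usual VobSub text index with lines of the form `timestamp: 00:00:05:672, filepos: 000000000`, and the function names and the 100 KB size threshold are arbitrary choices for illustration.

```python
import re
from pathlib import Path

# One subtitle entry per line in a VobSub .idx file.
TIMESTAMP_LINE = re.compile(
    r"^timestamp:\s*\d+:\d{2}:\d{2}:\d{3},\s*filepos:\s*[0-9a-fA-F]+",
    re.MULTILINE,
)

def count_idx_entries(idx_text):
    """Count the subtitle entries listed in the text of an .idx file."""
    return len(TIMESTAMP_LINE.findall(idx_text))

def check_pair(idx_path, min_sub_bytes=100_000):
    """Report entry count and .sub size; min_sub_bytes is an arbitrary guess."""
    idx_path = Path(idx_path)
    sub_path = idx_path.with_suffix(".sub")
    entries = count_idx_entries(idx_path.read_text(errors="replace"))
    sub_bytes = sub_path.stat().st_size if sub_path.exists() else 0
    return {
        "entries": entries,
        "sub_bytes": sub_bytes,
        "looks_valid": entries > 0 and sub_bytes >= min_sub_bytes,
    }

sample = (
    "# VobSub index file\n"
    "timestamp: 00:00:05:672, filepos: 000000000\n"
    "timestamp: 00:00:09:120, filepos: 000001000\n"
)
print(count_idx_entries(sample))
```

An index with zero entries, or a .sub of only a few KB, would match the "nothing was extracted" situation described above.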

I take it you ocr your subs in subrip by loading the vobs directly too, so it points again to a decrypting issue. (If you use dvd decrypter, you can do it all by the ifo file as well.)

By "include the subs", do you mean burn them in, mux them in or simply share them as separates? In any case, the way it looks, the idx/subs aren't even valid, so I can't see that they would display anyway :)

[edit] by the way, it has not much to do with a "font style" since vobsub files are bmp image based. That's what ocr'ing does, converts an image into text.
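To make the image-to-text point concrete, here is a toy sketch of how a letter like "u" could end up read as two halves. It assumes the OCR step cuts a line into characters wherever a pixel column has no "ink", which is a common approach but not necessarily what SubRip does; the 0-9 pixel values, the bitmap and the function name `segment_columns` are all invented for this example.

```python
def segment_columns(bitmap, threshold):
    """Split a bitmap into (start, end) column runs that contain 'ink'.

    A pixel counts as ink when its value is >= threshold.
    """
    width = len(bitmap[0])
    inked = [any(row[x] >= threshold for row in bitmap) for x in range(width)]
    runs, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            runs.append((start, x - 1))    # an empty column ends it
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs

# A crude "u": strong vertical strokes (9) joined by a faint bottom stroke (3).
U = [
    [9, 0, 0, 0, 9],
    [9, 0, 0, 0, 9],
    [9, 0, 0, 0, 9],
    [9, 3, 3, 3, 9],
]

print(segment_columns(U, threshold=2))  # faint stroke kept: one piece
print(segment_columns(U, threshold=5))  # faint stroke dropped: two pieces
```

If the joining stroke of a letter is treated as background, the two sides get matched separately, which would look like exactly the "left half, then right half" behaviour, whatever the font.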