Getting subs from a DVD
Octo-puss
2nd September 2008, 18:14
Maybe a dumb question, but how do I extract (or whatever you want to call it) the subs from a DVD? I know I should use EVOdemux, but the data is split into 1 GB .VOBs as usual... So how do I do it? Should I merge them together first?
XhmikosR
2nd September 2008, 18:27
Just use VSRip (http://www.videohelp.com/tools/VSRip) and load the ifo of the main movie. Choose your desired languages and then you'll have the subs as .idx/.sub. If you want to convert them to .srt you can use SubRip (http://www.videohelp.com/tools/Subrip).
Octo-puss
2nd September 2008, 18:33
Oh, VSRip is by gabest! From 2003 though... But I assume the technology hasn't evolved a bit in this case? :)
Anyway, thanks!
Octo-puss
2nd September 2008, 18:37
Hm, doesn't work...
Opening ifo OK
ERROR: Not a Video Title Set IFO file!
That's a DVD copied directly to the HDD :(
SvT
2nd September 2008, 18:59
:) Select VTS_01_0.IFO (NOT VIDEO_TS.IFO) :)
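For discs where the main movie is not in title set 01, a rough way to guess which VTS_xx_0.IFO to load is to pick the title set with the most VOB data. The sketch below is only a heuristic built on the standard VIDEO_TS file naming (VTS_xx_0.VOB holds menus, VTS_xx_1.VOB and up hold the title); the function name main_title_ifo is made up for this example and is not part of VSRip.

```python
import re
from collections import defaultdict
from pathlib import Path

# Title VOBs are VTS_xx_1.VOB .. VTS_xx_9.VOB; VTS_xx_0.VOB is the menu.
VOB_PATTERN = re.compile(r"^VTS_(\d{2})_([1-9])\.VOB$", re.IGNORECASE)


def main_title_ifo(video_ts_dir):
    """Guess the IFO of the main movie: the title set with the most VOB bytes."""
    sizes = defaultdict(int)
    for path in Path(video_ts_dir).iterdir():
        match = VOB_PATTERN.match(path.name)
        if match:
            sizes[match.group(1)] += path.stat().st_size
    if not sizes:
        return None  # no title VOBs found in this folder
    biggest = max(sizes, key=sizes.get)
    return Path(video_ts_dir) / f"VTS_{biggest}_0.IFO"
```

Calling main_title_ifo on the copied VIDEO_TS folder returns the path to try in VSRip; it never returns VIDEO_TS.IFO, which is the disc-level file that triggers the "Not a Video Title Set IFO file" error.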
Octo-puss
2nd September 2008, 20:33
jesus:D
but it's not THAT obvious :)
edit: I can't find any conversion option in SubRip... Or am I looking wrong?
XhmikosR
2nd September 2008, 20:44
File-->Open-->Open IFO and load your .idx. After that of course press the Start button.
Octo-puss
2nd September 2008, 20:53
confusing confusing confusing
but I am enlightened!
Is it normal that it doesn't process the data by itself and I have to type in almost every character of the subs?
XhmikosR
2nd September 2008, 21:08
Yes, it is pretty normal. You have to "teach" the program first. The next time that you are going to do OCR with SubRip, you can load the Characters Matrix that you saved from the first time and you'll do the job much faster.
Be careful though: if your saved characters matrix contains errors (let's say you associated the glyph for a capital I with the letter l), they will be carried over to every .srt you make in the future. You can of course correct your mistakes by editing the matrix.
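The "teaching" idea can be pictured as a lookup table from glyph bitmaps to characters. The sketch below is an illustration only, not SubRip's actual code: the glyph representation (a tuple of text rows) and the names ocr_glyphs, matrix and ask are all assumptions made for the example.

```python
def ocr_glyphs(glyphs, matrix, ask):
    """Translate glyph bitmaps to text, asking only about glyphs not yet known.

    glyphs: iterable of bitmaps, each a sequence of row strings (hypothetical format)
    matrix: dict mapping a bitmap (as a tuple of rows) to a character
    ask:    callback that shows the glyph to the user and returns the typed character
    """
    out = []
    for glyph in glyphs:
        key = tuple(glyph)
        if key not in matrix:
            # Whatever is typed here is remembered, right or wrong.
            matrix[key] = ask(glyph)
        out.append(matrix[key])
    return "".join(out)
```

Because the dictionary is reused between runs, each glyph is asked about once, which is why a saved matrix speeds things up and also why one wrong answer keeps reappearing until the entry is edited.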
Octo-puss
2nd September 2008, 21:18
When I extracted the subs with SubRip it was doable. But the ones produced by VSRip are a disaster: they show up in a huge, interlaced-looking font, and I can't even make out some characters at first glance myself, lol. I guess I could dig into the options a bit and maybe change the numbers, but I don't understand them too well.
XhmikosR
2nd September 2008, 21:22
I never had to change anything in VSRip. I just choose the languages I like. The .idx/.sub subtitles are exactly like they are on the DVD. So I cannot understand what you mean... If you like, post the subtitles so I can take a look at them.
Octo-puss
2nd September 2008, 21:26
Let me try again.
What's this "extract closed captions" and "forced subs only"? Also, the Angles tab is something I don't get.
XhmikosR
2nd September 2008, 21:32
They do what they say: one extracts closed captions and the other extracts forced subs only. Don't tick them. Just tick "Reset time at the first selected cell".
http://thumbnails3.imagebam.com/1247/3de3d312463380.gif (http://www.imagebam.com/image/3de3d312463380)
What I didn't understand was:
"But those produced by vsrip are a disaster - they appear as a kind of interlaced huge fonts and I can't even tell some chars on first look myself, lol."
Do you mean during the OCR process with SubRip or during playback?
Octo-puss
2nd September 2008, 21:40
(edit) the OCR process of course :)
OK, the first is the subs extracted by VSRip and the second is what SubRip shows when used on its own. Speaking of the second case, is there any way to stop it from catching parts of neighbouring letters?
Guess it can take up to a few hours before the pictures are approved :(
XhmikosR
2nd September 2008, 21:42
Upload the images to an image hosting site like imageshack, imagebam etc.
Octo-puss
2nd September 2008, 21:55
All right.
subs generated by vsrip:
http://img291.imageshack.us/img291/2730/vsripbq5.jpg
and subs being done in subrip itself:
http://img291.imageshack.us/img291/2733/subripqh0.jpg
XhmikosR
2nd September 2008, 22:21
Again, I cannot see where the difference comes from. Did you load the .idx/.sub exported with VSRip both times?
Upload the .idx/.sub (in a .rar) to http://zshare.net/ so I can take a look at the subtitles.
Octo-puss
2nd September 2008, 22:24
No, the first is the .idx/.sub created by VSRip, loaded in SubRip.
The second is a screenshot taken while trying to extract the subs straight from the .VOBs in SubRip.
XhmikosR
2nd September 2008, 22:28
I don't know what exactly is happening; it's the first time I've seen something like this. Did you rip the DVD with DVDFab? If not, try again after ripping the DVD with DVDFab.
If the second method works for you, then do it that way.
Octo-puss
2nd September 2008, 22:31
DVD Shrink, actually. I guess different DVDs have the subs stored a bit differently, which would be the reason they show up in such a weird big "font".
What I don't understand is why they look different when opened in different programs. Hm. Too late to dig into it today, will look tomorrow.
talen9
2nd September 2008, 22:41
I had this same problem for a long time ... then I saw a little, well hidden option which is on by default: "Use IDX's file offset" in "Options" -> "Global Options".
Just deactivate it .... and try again loading the .idx/.sub pair into SubRip ;)
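For context on what an ".idx file offset" is: a VobSub .idx is a plain text file whose subtitle entries pair a start time with a byte position (filepos, written in hexadecimal) inside the matching .sub file. The thread does not say what SubRip does internally with that option, so the sketch below only shows how such entries can be read; parse_idx_entries is a name invented for this example.

```python
import re

# Entry lines look like: "timestamp: 00:01:02:003, filepos: 00000a800"
ENTRY = re.compile(
    r"^timestamp:\s*(\d+):(\d{2}):(\d{2}):(\d{3}),\s*filepos:\s*([0-9a-fA-F]+)"
)


def parse_idx_entries(idx_text):
    """Return (start_time_ms, byte_offset_into_sub) for every subtitle entry."""
    entries = []
    for line in idx_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            h, m, s, ms = (int(match.group(i)) for i in range(1, 5))
            start_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
            entries.append((start_ms, int(match.group(5), 16)))
    return entries
```

Opening the .idx in a text editor and comparing a few filepos values against the size of the .sub is a quick sanity check when a tool shows garbage instead of readable subtitle images.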
Octo-puss
2nd September 2008, 22:52
Oh!!! Holy #@$%! That's it! I was about to post another screenshot of subs from a different DVD where all I could see was a wall of "fog". This is MUCH better! Thanks a lot, dude :)
Still though, what to do when the program asks about a letter which in fact contains a part of the next one? This logically could lead to a mess.
talen9
2nd September 2008, 23:00
Yes, I know what you mean.
It usually happens when an italic (slanted) font is used ... with words that contain, e.g., "fi", SubRip highlights the 'f' together with the dot of the 'i' ... and right after it, what is left of the 'i' without its dot.
The right thing to do, IMO, is to tell it to consider the "f + the dot" as an 'f' and the 'i without the dot' as an 'i' ... because it will only match these glyphs again when they are repeated exactly as they are now ... one next to the other.
I dunno if I succeeded in explaining myself, though :)
You can change another parameter too, on a "movie by movie" basis: "Space width setup" in "Options" -> "Advanced OCR Setup".
It regulates the way SubRip decides where a character ends and the one beside it starts. Try raising/lowering it by one-two pixels and see if you like the results better :)
And, as I already said, this is very "font-dependent".
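Why a pixel threshold changes where one character ends and the next begins can be shown with a toy segmenter that cuts a line of text wherever it finds enough blank columns. This is a hypothetical illustration, not SubRip's algorithm; the bitmap format ('#' for ink, '.' for background) and the min_gap parameter are assumptions made for the example.

```python
def split_columns(bitmap, min_gap=1):
    """Return (start, end) column ranges of ink, end exclusive.

    Two inked areas are treated as separate glyphs only when at least
    min_gap completely blank columns lie between them.
    """
    width = len(bitmap[0])
    inked = [any(row[x] == "#" for row in bitmap) for x in range(width)]
    segments = []
    start = None  # first column of the glyph being collected
    gap = 0       # blank columns seen since the last inked column
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, width - gap))
    return segments
```

With a slanted font the top of an 'f' can overhang the dot of the following 'i', so no blank column separates them and the two end up in one segment; raising or lowering the threshold by a pixel or two shifts where the cuts fall.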
Octo-puss
3rd September 2008, 08:05
It works pretty well now :)
Shall I keep saving the matrix in the same file after each new DVD processed?
talen9
3rd September 2008, 13:59
I personally prefer to start the whole process again for each DVD.
You can think that this is a somewhat tedious process (and ... it is :P), but I prefer this to lots of false matches ... :rolleyes:
Anyway, I save each matrix in a file of its own ... this is useful especially when you're ripping the subs out of very similar DVDs (like the ones from a TV series ;)).
Octo-puss
3rd September 2008, 14:04
The bad thing about SubRip is that if you mistype something you can't go back. aaargh!
talen9
3rd September 2008, 14:15
Yes you can :)
"Character matrix" -> "Edit/View char. matrix" -> browse the list of all the glyph/char associations and, when you find one which is not right, correct it (and don't forget to click on the "modify" button ;)).
You'll have to restart the current rip from the beginning ... but I don't think that's a big deal, as all the chars up to the point where you stopped the matching are already known to SubRip ;)
EDIT: there's a lot to discover about SubRip, don't you think? :p
Octo-puss
3rd September 2008, 14:26
Wish a day had 25 hours...
Octo-puss
3rd September 2008, 14:41
Capital I being swapped with lowercase l is killing me though. And there's no obvious way to make SubRip recognize them properly.
talen9
3rd September 2008, 14:52
Well, from a bitmap POV, capital 'i' and 'l' are exactly the same image for a lot of fonts.
BUT!
You are doing the "Post OCR spelling correction" step after the end of the OCR, right?
I think that this can rid you of the " I <-> l " problem for the best part of your subtitle file ... and the best way for this NOT to be a problem at all, is to use a sans serif font for the subtitle in your player ;)
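The kind of context rule a spelling-correction pass can apply to the I/l mix-up might look like the sketch below. It is a guess at the general idea for English subtitles, not SubRip's actual correction logic, and fix_i_l is a made-up name.

```python
import re


def fix_i_l(text):
    """Apply two context rules for the capital-I / lowercase-l confusion."""
    # A lowercase l standing alone as a word is almost certainly the pronoun "I"
    # (this also covers contractions such as l'm and l'll).
    text = re.sub(r"\bl\b", "I", text)
    # A capital I squeezed between two lowercase letters is most likely an "l".
    text = re.sub(r"(?<=[a-z])I(?=[a-z])", "l", text)
    return text
```

Rules like these cannot settle every case (a word-initial "l" versus "I", for instance), so a dictionary check or a manual pass is still needed for what is left.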
Octo-puss
3rd September 2008, 15:39
Jesus. I was doing it all manually.
talen9
3rd September 2008, 16:43
That's the best feature of SubRip, IMHO ... and it's good for you that you've learned it too, now :p
Octo-puss
3rd September 2008, 21:51
I wonder how it works though. I let it eat through one subs file and it really corrected everything 100%. Whoa.
Curious how many more interesting features I will find :D
lovelove
23rd August 2011, 16:11
"Well, from a bitmap POV, capital 'i' and 'l' are exactly the same image for a lot of fonts."
NO, they aren't. Have you ever really checked before making this claim? They may look very similar but I tried with a few sans serif fonts and they definitely are not identical. Open the bitmaps in an image editor, put one layer over the other and zoom to 1000%, you will see ...
Unfortunately, even with SubRip sensitivity set to 1000 the error still persists.
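The overlay check described above can also be done by counting pixels instead of zooming in. A minimal sketch, assuming both glyphs have already been exported as same-sized grids of characters ('#' for ink, '.' for background); differing_pixels is a name chosen for the example.

```python
def differing_pixels(a, b):
    """Count the positions where two same-sized bitmaps disagree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
```

A result of 0 means the two glyphs really are the same image, so no OCR setting could tell them apart; any positive count means they differ, even if only by a pixel or two.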