Doom9's Forum > General > Subtitles
9th November 2005, 23:56   #241
JnZ
Registered User
Join Date: Jan 2004
Location: Czech
Posts: 181
Quote:
Originally Posted by ai4spam
How do you choose the right matrix out of 200, anyway?
It's easy. Just click "Search for Match" and the program searches all the matrices I have. When it finds one, I test it, and if some of the following chars are different, I try the next one, etc. When almost all chars are OK, I fill in the missing chars and save them to this matrix.

So I have many matrices with different charmaps. The matrices differ, but each contains only one charmap. I don't have problems with badly recognized chars.
__________________
(Sorry for my bad English, I'm Czech, not an Englishman... :))
15th November 2005, 00:40   #242
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by JnZ
I don't have problems with badly recognized chars.
Well, I think you've been lucky. The "search for char in another matrix" feature just explores all the matrices in the directory. If one of them happens to contain a char that's wrongly recognized in the context of a new DVD, you'll still get into trouble.
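To see why a directory-wide search can still go wrong, here is a minimal Python sketch. It is not SubRip's code: the matrix layout (a dict from a glyph key to a character), the matrix names, and `search_for_match` are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: a matrix maps a glyph key (standing in for a bitmap)
# to the character it was saved as. The real matrix format is not shown here.
Matrix = Dict[str, str]


def search_for_match(glyph: str, matrices: Dict[str, Matrix]) -> Optional[Tuple[str, str]]:
    """Return (matrix_name, char) from the first matrix that knows this glyph."""
    for name, matrix in matrices.items():
        if glyph in matrix:
            return name, matrix[glyph]
    return None


matrices = {
    "matrix_dvd_a": {"glyph_17": "l"},  # a lowercase L in DVD A's font
    "matrix_dvd_b": {"glyph_17": "I"},  # the same shape is a capital I in DVD B's font
}

# The first hit wins, whether or not it suits the DVD being ripped.
print(search_for_match("glyph_17", matrices))
```

Under these assumptions the search returns whatever the first matching matrix says, so a glyph that means something else in the new DVD's font is silently misread.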
28th November 2005, 17:13   #243
masken
uhm... ?
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
@ai4spam, have you done any work on the "whole words formatting only" issue?

Two other things I've come to notice:
If the vobsubs are smaller than usual, the words often get stuck together if you don't manually adjust the Space Width setting. Nothing strange in itself, but actually, vobsub could make a pretty good guess at the space width automatically by checking the average character height in a vobsub picture when OCR:ing and adjusting it according to a table or math function etc. What do you think about this? Check your PMs for an example of this.
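One way to picture the suggestion is a simple proportional rule. This is only a sketch: `estimate_space_width`, the 0.3 ratio and the 2-pixel floor are made-up values, not anything taken from SubRip or VobSub, and the median is used here as a robust stand-in for the average height.

```python
from statistics import median


def estimate_space_width(char_heights, ratio=0.3, minimum=2):
    """Guess a space-width threshold in pixels from the glyph heights of one picture.

    ratio and minimum are illustrative assumptions, not measured values.
    """
    if not char_heights:
        raise ValueError("need at least one character height")
    return max(minimum, round(median(char_heights) * ratio))


# Heights (in pixels) of the glyphs found in one subtitle picture.
print(estimate_space_width([20, 22, 21, 30]))
```

A single ratio is unlikely to fit every font or style, so a practical version would probably need the table masken mentions rather than one constant.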

Another thing in the post-OCR check... I think there might already be something similar there, but I'll post about it anyhow. In the days before selections could be extended, one often missed parts of certain characters (especially in italics) when OCR:ing, and many still haven't got into the habit of extending a selection since they think most of the character is covered. The post-OCR check could automatically replace these character combinations:
.! > !
!. > !
?. > ?
.? > ?
.: > :
:. > :
29th November 2005, 10:46   #244
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
@masken:
1) It wouldn't be that easy, because there are other factors involved, and kerning is different from font to font, from regular to italic to bold, and so on.

2) You can do those yourself by editing the "English_screwed.dic" file or creating a new one.

PS: A new version will be up soon, but my time to work on it is limited.

Last edited by ai4spam; 29th November 2005 at 10:53.
29th November 2005, 13:28   #245
zuggy
Registered User
Join Date: May 2002
Location: Czech Republic
Posts: 171
Quote:
Originally Posted by masken
.! > !
!. > !
?. > ?
.? > ?
.: > :
:. > :
Beware of the .! > ! and .? > ? replacements.

What about these sentences:
[Mama] Wash your hands...!
[Good son] Should I do it today...?

The other replacements you suggested are already covered by the post-OCR correction.
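The caveat above can be handled by refusing to touch a period that belongs to a run of periods. The following Python sketch is one possible way to write such a rule; it is not the post-OCR correction that SubRip ships, and `fix_stray_periods` is a hypothetical name.

```python
import re

# ".!" -> "!", ".?" -> "?", ".:" -> ":" unless the period is the tail of "..."
_STRAY_BEFORE = re.compile(r"(?<!\.)\.([!?:])")
# "!." -> "!", "?." -> "?", ":." -> ":" unless more periods follow
_STRAY_AFTER = re.compile(r"([!?:])\.(?!\.)")


def fix_stray_periods(line: str) -> str:
    """Drop a single stray period next to ! ? or : while leaving ellipses alone."""
    line = _STRAY_BEFORE.sub(r"\1", line)
    return _STRAY_AFTER.sub(r"\1", line)


for sample in ["Stop.!", "Really?.", "Wash your hands...!", "Should I do it today...?"]:
    print(repr(sample), "->", repr(fix_stray_periods(sample)))
```

The lookbehind and lookahead are what keep "...!" and "...?" intact. Abbreviations followed by a colon would still be altered, so the rule is not free of false positives.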
4th December 2005, 04:16   #246
Mtz
Registered User
Join Date: Sep 2003
Location: On The Beach
Posts: 714
There is a bug in the English Post OCR Correction.
For example: Wasn't
Becomes: Wasr't

enjoy,
Mtz
4th December 2005, 12:37   #247
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
There's an error in Dict\English_screwed.dic. Just delete the following 2 lines:
Code:
n'
r
4th December 2005, 22:20   #248
LeMoi
Registered User
Join Date: Sep 2004
Location: France
Posts: 367
Subripping of idx/sub extracted from an mkv file is really weird
http://lemoi.fr.free.fr/sub_fr.rar
idx/sub are correct, but after subrip, timecodes are messed up
5th December 2005, 12:36   #249
johner23
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hi, dear all.

I have tried to rip subtitles from an extra DVD edition and ran into some problems with OCR.

In the end, SubRip can "extract" all of the subs as BMP without problems (about 400 BMP files showing all the dialogue). But when I try to rip to SRT text, SubRip can't read directly.

Is there a way to correct that in SubRip? Or maybe by using IfoEdit (or another tool) to create a new (DVD palette) file that SubRip can read better later?

And, as an additional feature request: could SubRip in the future read/rip/save SUP as well as SRT and convert between them, the way I get subtitles when I demux a DVD rip using DVD-D and similar rippers?

Thanks.

Last edited by johner23; 5th December 2005 at 13:14.
5th December 2005, 15:26   #250
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
@LeMoi: I don't understand what you mean by "messed up", the subtitles you sent work fine here.

@johner23: You'll have to give me more details than "can't read directly". What exactly does not work? Are the colors in the bitmaps somehow screwed up? Can you send me the index file and a couple of bitmaps?
By the way, if at all possible, use vsRip from http://guliverki.sf.net for ripping into idx/sub files instead of bitmaps.
5th December 2005, 16:07   #251
johner23
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
Quote:
Are the colors in the bitmaps somehow screwed up
Yes, I guess the original DVD color palette is the problem.

PS: Is it possible to correct them using IfoEdit or another tool?

If I choose to rip to bitmaps, OK: SubRip gives me almost 400 BMP files, which contain all the lines as images.

But when I try to rip into an SRT text file, no way: SubRip can't recognize the colors, so I need to type the lines manually one by one. There is an option in SubRip with 4 basic colors (blue, red, black and one other that I can't remember now).

I turn off 2 of them, and after that SubRip can partly "read" the files. But I still need to type all the lines, because SubRip only recognizes the times, not the words themselves.

Do you want me to send you that file? It's 7 MB in size.

I've tried to use Gabest's tool, but it seems to fail at extraction. Only SubRip could get the times.

Thanks.

Last edited by johner23; 5th December 2005 at 16:10.
6th December 2005, 11:23   #252
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by johner23
Do you want me to send you that file? It's 7 MB in size.
Sure, go ahead.
6th December 2005, 12:41   #253
johner23
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
@ai4spam

I tried to rip the sub again, but this time I used the most recent version of SubRip (the other was a little bit older) and could rip it well.

Of course there were some small parts that I had to correct manually, but almost 98% was done this time.

As a suggestion, could you add functions similar to those of DVD Subtitle Tools to the SubRip core?

---> http://web.quick.cz/FKasparek/

In the worst case, if I couldn't rip the sub, I would try to demux into SUP files using DVD-D or VobEdit and try to convert them into text using DVD Subtitle Tools.

If we could work with SUP files and edit them, it would be useful for correcting the colour palette and other aspects in hard ripping cases.

Thanks for your attention.

Last edited by johner23; 6th December 2005 at 12:44.
6th December 2005, 13:16   #254
LeMoi
Registered User
Join Date: Sep 2004
Location: France
Posts: 367
Quote:
Originally Posted by ai4spam
@LeMoi: I don't understand what you mean by "messed up", the subtitles you sent work fine here.
Didn't you notice any difference between the subs in the idx/sub and the subripped ones? Do the sentences match the same timecodes?
7th December 2005, 15:04   #255
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
As a matter of fact, I did. Here are the first and last 3 subtitles, from the .idx and from the result:

Code:
id: fr, index: 0
timestamp: 00:00:48:760, filepos: 000000000
timestamp: 00:04:05:120, filepos: 000001000
timestamp: 00:04:08:560, filepos: 000002000
...
timestamp: 01:43:00:800, filepos: 0003ec000
timestamp: 01:43:03:160, filepos: 0003ed000
timestamp: 01:43:05:720, filepos: 0003ee000
Code:
1
00:00:48,760 --> 00:00:51,593
blah blah blah

2
00:04:05,120 --> 00:04:08,317
blah blah blah

3
00:04:08,560 --> 00:04:10,755
blah blah blah

...

1127
01:43:00,800 --> 01:43:03,872
blah blah blah

1128
01:43:03,160 --> 01:43:05,116
blah blah blah

1129
01:43:05,720 --> 01:43:07,676
blah blah blah
They look the same to me. Now, I looked at my version of the subs (I have them from another source), and the timings are quite different. It may be due to differences in NTSC/PAL framerates. I suggest you grab the .srt from somewhere else and skip ripping this one with SubRip.
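For anyone who wants to repeat this comparison on their own files, a rough Python sketch follows. It assumes the `timestamp: HH:MM:SS:mmm, filepos: ...` layout shown in the excerpt above; the function names are invented for this example.

```python
import re

# Matches lines such as "timestamp: 00:04:05:120, filepos: 000001000".
_IDX_LINE = re.compile(
    r"timestamp:\s*(\d+):(\d+):(\d+):(\d+),\s*filepos:\s*([0-9a-fA-F]+)"
)


def idx_timestamps_ms(idx_text):
    """Yield every 'timestamp:' entry of an .idx file as milliseconds."""
    for h, m, s, ms, _filepos in _IDX_LINE.findall(idx_text):
        yield ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def to_srt_time(total_ms):
    """Format milliseconds the way SRT start and end times are written."""
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


idx = """id: fr, index: 0
timestamp: 00:00:48:760, filepos: 000000000
timestamp: 00:04:05:120, filepos: 000001000
"""
print([to_srt_time(t) for t in idx_timestamps_ms(idx)])
```

Comparing this list with the start times in the .srt only confirms that the timestamps were carried over. It says nothing about whether the right picture or text is attached to each one.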
7th December 2005, 15:07   #256
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by johner23
I tried to rip the sub again, but this time I used the most recent version of SubRip (the other was a little bit older) and could rip it well.
Well, it's always a good idea to get the latest version.
As for functionality implemented elsewhere, there's very little chance you'll see it in SubRip. I simply don't have the time, and it doesn't make sense to reimplement something that works well. If it didn't... that's another story.
7th December 2005, 17:13   #257
LeMoi
Registered User
Join Date: Sep 2004
Location: France
Posts: 367
Quote:
Originally Posted by ai4spam
As a matter of fact, I did. Here are the first and last 3 subtitles, from the .idx and from the result: [...]
They look the same to me.
It's not really the timestamps, but the content!
At the end of the subripping process, for example:
idx/sub : 00:52:38:320 : Pourquoi ils rient ?
srt : 00:52:38:320 : On a la sensation d'être amoureux
srt : 00:51:33:680 : Pourquoi ils rient ?
idx/sub : 00:52:49:120 : On a la sensation d'être amoureux


Is that normal?
7th December 2005, 19:27   #258
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
Hmm, ok... so the timestamps don't correspond to the content. It's weird, you can actually just open the .sub file (without the .idx) and still get them "wrong". What tool are you using to visualize the .idx/.sub?
Anyway, I can't really help you, maybe Zuggy can take a look, but my guess is that it's a malformed .sub file.
Again, just go online and get the text subs from some specialized website.
7th December 2005, 19:35   #259
LeMoi
Registered User
Join Date: Sep 2004
Location: France
Posts: 367
I see the content of the sub with SubResync.
Going online to find subs is not really helpful. If I extract subs from a file, it's often so I can find subs in other languages and sync them with the extracted ones, so that they match and I can add them to the original mkv. If the extracted subs are wrong, I can't resync them ^^. I talked about this with Mosu and he thought it wasn't an mkvtoolnix problem, but I think he's wrong; he asked me to check with you...
7th December 2005, 21:07   #260
ai4spam
Programmer
Join Date: Sep 2003
Posts: 382
Well, sync-ing won't help you, since the contents are shifted. What you need to do is get a sub that is "good" from the web (maybe in another language), then open it in translator mode in SubtitleWorkshop, and load the "bad" sub as the translation. Then the times will be taken from the "good" sub, and all you have to do is save the "bad" sub with the new timings. Of course, SubtitleWorkshop also lets you check if the translation is correct.
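The same idea, outside SubtitleWorkshop, can be sketched in a few lines of Python. This is an illustration only, not how SubtitleWorkshop works internally: `parse_srt` and `retime` are hypothetical helpers, and the sketch assumes both files contain the same number of subtitles in the same order.

```python
import re

_BLANK_LINE = re.compile(r"\n\s*\n")


def parse_srt(text):
    """Return a list of (timing_line, subtitle_text) tuples from SRT content."""
    cues = []
    for block in _BLANK_LINE.split(text.strip()):
        lines = block.splitlines()
        # Expected block layout: index line, timing line, then the text lines.
        if len(lines) >= 2 and "-->" in lines[1]:
            cues.append((lines[1].strip(), "\n".join(lines[2:])))
    return cues


def retime(good_srt, bad_srt):
    """Combine the timings of good_srt with the text of bad_srt, cue by cue."""
    good, bad = parse_srt(good_srt), parse_srt(bad_srt)
    if len(good) != len(bad):
        raise ValueError(f"cue counts differ: {len(good)} vs {len(bad)}")
    blocks = []
    for number, ((timing, _), (_, text)) in enumerate(zip(good, bad), start=1):
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)


good = "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n2\n00:00:03,000 --> 00:00:04,000\nBye\n"
bad = "1\n00:00:09,000 --> 00:00:10,000\nBonjour\n\n2\n00:00:11,000 --> 00:00:12,000\nSalut\n"
print(retime(good, bad))
```

Pairing by position only works if no subtitle is missing or split in either file, which is why checking the pairs side by side, as translator mode allows, is still worthwhile.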
All times are GMT +1.