Old 21st September 2005, 04:58   #221  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@TiaoMacaleh: If you mean that you often get the prompt to type the entire dialogue, it's probably because the lines don't have enough distance between them (that usually happens if you have accented characters on the second line). Try lowering the "min interline height" value in the OCR options. Sometimes it's impossible to draw a straight line between the two lines of text no matter what you do, so you'll need to type in those dialogues manually. It is indeed possible to separate text lines with a line that "goes around" accents and the like, but these cases are too rare to justify the programming effort at the moment. And since you now also have the special characters table in that window, it's a lot easier to type the entire line.
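
To illustrate why an accent can defeat the line split, here is a minimal Python sketch. It is not SubRip's code: the function name, the '#' bitmap representation, and the exact meaning given to min_interline_height are all assumptions made for the example. It separates text lines wherever enough fully blank pixel rows lie between two inked rows.

```python
def split_text_lines(bitmap, min_interline_height):
    """Return (top, bottom) row ranges of the text lines in a binary bitmap.

    bitmap is a list of equal-length strings in which '#' marks an ink pixel.
    Two runs of ink count as separate lines only if at least
    min_interline_height fully blank rows lie between them.
    (Hypothetical helper, not taken from SubRip.)
    """
    lines = []
    start = None      # first row of the line currently being built
    last_ink = None   # most recent row that contained ink
    for y, row in enumerate(bitmap):
        if "#" not in row:
            continue
        if start is None:
            start = y
        elif y - last_ink - 1 >= min_interline_height:
            # The blank gap is tall enough: close the current line.
            lines.append((start, last_ink))
            start = y
        last_ink = y
    if start is not None:
        lines.append((start, last_ink))
    return lines


subtitle = [
    "..##..##..",  # first text line
    "..##..##..",
    "..........",  # the only blank row left between the lines
    "....#.....",  # accent of a character on the second line
    "..##..##..",  # second text line
    "..##..##..",
]

print(split_text_lines(subtitle, 2))  # gap too small: one merged block
print(split_text_lines(subtitle, 1))  # lower threshold: two lines
```

With a threshold of 2 the accent leaves only one blank row, so both lines come out as a single block; lowering the threshold to 1 separates them, which is the effect the advice above is after.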

@LeMoi:
1) You probably mean the last CodePage, not CharSet. The French language usually triggers the UniCode flag due to the "oe" ligature character. Answer "No" when asked to save as UniCode, then just leave "DEFAULT_CHARSET" and "1252 - ANSI Latin I". You should have no problems with the resulting subtitle. You should never need to choose CodePage 65000 or 65001 (which are UTF-7 and UTF-8, by the way). When you change the CharSet, SubRip automatically sets the corresponding CodePage for you, and normally you should never have to change the CodePage yourself (it's there only for more flexibility). Alternatively, you can save the text as UniCode, then load it in Word and save it as text with the encoding (CodePage) of your choice. Word highlights the characters that cannot be converted for you (it shows them as red question marks).

2) The translator didn't respect the requirement to keep the number of characters when translating, so the best guess text is overlapping the text that says "Meilleure reponse". It should be a bit better in the last beta, because now the best guess has spaces before and after it, like " l ". The font and color are different also, so you should easily see the best guess. Or... edit "Lang\Francais.lng" with NotePad and put "Meil. repo." instead of "Meilleure reponse".
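
Regarding point 1, a quick way to see which characters a given CodePage cannot hold (roughly what Word's red question marks show) is to try encoding them. This is a minimal Python sketch that uses Python's own codec tables, not SubRip's conversion code; the function name is made up for the example.

```python
def unencodable_chars(text, codepage):
    """Return the distinct characters of text that codepage cannot store.

    codepage is a Python codec name such as "cp1252" or "cp1250".
    (Hypothetical helper, for illustration only.)
    """
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


french = "Ça, c'est le cœur français"
czech = "Příliš žluťoučký kůň"

print(unencodable_chars(french, "cp1252"))  # French text against 1252
print(unencodable_chars(czech, "cp1252"))   # Czech text against 1252
print(unencodable_chars(czech, "cp1250"))   # Czech text against 1250
```

An empty list means the text can be saved in that CodePage without losing anything.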
Old 21st September 2005, 10:08   #222  |  Link
LeMoi
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 367
Quote:
Originally Posted by ai4spam
@LeMoi:
1) You probably mean the last CodePage, not CharSet. The French language usually triggers the UniCode flag due to the "oe" ligature character. Answer "No" when asked to save as UniCode, then just leave "DEFAULT_CHARSET" and "1252 - ANSI Latin I". You should have no problems with the resulting subtitle. You should never need to choose CodePage 65000 or 65001 (which are UTF-7 and UTF-8, by the way). When you change the CharSet, SubRip automatically sets the corresponding CodePage for you, and normally you should never have to change the CodePage yourself (it's there only for more flexibility). Alternatively, you can save the text as UniCode, then load it in Word and save it as text with the encoding (CodePage) of your choice. Word highlights the characters that cannot be converted for you (it shows them as red question marks).
OK, but yesterday I ripped subs that contained "Ç", and SubRip didn't detect that Unicode was needed to save those characters; it offered to save in ANSI, which doesn't keep that character.

And thanks for the tip.
Old 21st September 2005, 15:49   #223  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@LeMoi: That's weird, because that character is kept by default. Are you sure you OCR-ed it as such? For example, if you OCR-ed it as regular "C" (either by mistake, or because your char matrix tells it to) and the inter-line distance is too small, it may skip the "," underneath the special "C" altogether.
Old 21st September 2005, 15:56   #224  |  Link
LeMoi
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 367
Hmm, I don't really remember, but I'm almost sure. Maybe I'm wrong, though, since in most cases it's OCR-ed correctly and SubRip advises me to save in Unicode. I'll try with other subs and tell you about it.
Old 21st September 2005, 18:53   #225  |  Link
TiaoMacaleh
Registered User
 
Join Date: Aug 2004
Posts: 24
ai4spam, it worked. I had tried changing the setting before without luck, but the problem was that the program doesn't seem to apply the changes right away: you need to restart SubRip for the changes to take effect. After the restart everything was fine. Thanks =]
Old 21st September 2005, 19:36   #226  |  Link
sapient
Unbeliever
 
Join Date: Sep 2002
Location: Greece
Posts: 111
Every time I load ai4spam's matrix, SubRip freezes up and stops responding... Any ideas?
Old 21st September 2005, 19:43   #227  |  Link
bourtzovlakas
dvd.stuff.gr moderator
 
Join Date: Apr 2004
Location: Greece
Posts: 312
It's big and needs some time to load...
Old 21st September 2005, 21:35   #228  |  Link
sapient
Unbeliever
 
Join Date: Sep 2002
Location: Greece
Posts: 111
No amount of time unfreezes it.
I realised, though, that the problem exists only if the matrix is loaded automatically using the search for match button. If it is loaded manually at the beginning, it works fine.
Old 22nd September 2005, 01:00   #229  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@LeMoi: the special ",c" character is part of the ANSI CharSet, so if you only have that and no other special characters SubRip will not ask you to save as UniCode, but will directly go to the saving window and let you set the CharSet and CodePage for conversion.
The problem may be if DEFAULT_CHARSET has a different value for French Windows XP. To be sure, select ANSI_CHARSET instead - this changes the CodePage to 1252 and it should be fine.
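
The CharSet-to-CodePage pairing described here can be sketched as a lookup table. The charset names are the Windows GDI ones and the code pages are their conventional ANSI partners; whether SubRip uses exactly this table is an assumption, and the function names and the system_default parameter are hypothetical.

```python
# Windows GDI charset names and the ANSI code page conventionally
# paired with each. Assumed to resemble what SubRip selects.
CHARSET_TO_CODEPAGE = {
    "ANSI_CHARSET": 1252,
    "EASTEUROPE_CHARSET": 1250,
    "RUSSIAN_CHARSET": 1251,
    "GREEK_CHARSET": 1253,
    "TURKISH_CHARSET": 1254,
    "HEBREW_CHARSET": 1255,
    "ARABIC_CHARSET": 1256,
    "BALTIC_CHARSET": 1257,
}


def codepage_for(charset, system_default=1252):
    """Return the code page number for a charset name.

    DEFAULT_CHARSET depends on the Windows installation, so its value is
    passed in as system_default (hypothetical parameter).
    """
    if charset == "DEFAULT_CHARSET":
        return system_default
    return CHARSET_TO_CODEPAGE[charset]


def encode_for(text, charset, system_default=1252):
    """Encode text with the code page that belongs to charset."""
    return text.encode(f"cp{codepage_for(charset, system_default)}")


print(codepage_for("ANSI_CHARSET"))
print(encode_for("Çç", "ANSI_CHARSET"))
```

Picking ANSI_CHARSET explicitly removes the dependence on the system default, which is why it is the safer choice when DEFAULT_CHARSET is in doubt.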

@sapient: bad idea, using the search button with my huge matrix. Again, if you see some really weird font, make a new matrix, otherwise just add characters to mine.

Last edited by ai4spam; 22nd September 2005 at 03:09.
Old 22nd September 2005, 10:31   #230  |  Link
LeMoi
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 367
Quote:
Originally Posted by ai4spam
@LeMoi: the special ",c" character is part of the ANSI CharSet, so if you only have that and no other special characters SubRip will not ask you to save as UniCode, but will directly go to the saving window and let you set the CharSet and CodePage for conversion.
The problem may be if DEFAULT_CHARSET has a different value for French Windows XP. To be sure, select ANSI_CHARSET instead - this changes the CodePage to 1252 and it should be fine.
I know, I was talking about the 'Ç' character (the same one in caps).
Old 22nd September 2005, 14:30   #231  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@LeMoi: Both are part of the standard CharSet.
Old 26th October 2005, 13:33   #232  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
SubRip 1.40 Beta 3 is now up.
ChangeLog:
Added support for negative timestamps. It'll show a warning that some delay needs to be added to synchronize, and start with 00:00:00,000 as the first subtitle timestamp.
Added support for using the file offsets from .idx files. It does not seem to help with badly-formed subtitles and screws up some good ones, as the file offsets seem to be bogus, so it's not on by default.
Fixed a few bugs.
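
One reading of the negative-timestamp handling is that every subtitle gets shifted by the same delay so that the first one starts at 00:00:00,000. The Python sketch below shows that reading only; it is not SubRip's code, and the function names and the (start, end) millisecond tuples are assumptions made for the example.

```python
def format_timestamp(ms):
    """Format non-negative milliseconds as an SRT-style HH:MM:SS,mmm string."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_to_zero(cues):
    """Shift cues so that a negative first start time becomes zero.

    cues is a list of (start_ms, end_ms) tuples. Returns (delay_ms, shifted),
    where delay_ms is what was added and must later be compensated for when
    synchronizing. (Hypothetical helper, for illustration only.)
    """
    if not cues:
        return 0, []
    first_start = min(start for start, _ in cues)
    delay = -first_start if first_start < 0 else 0
    return delay, [(start + delay, end + delay) for start, end in cues]


delay, shifted = shift_to_zero([(-1500, 500), (2000, 4000)])
if delay:
    print(f"Warning: {delay} ms of delay added; re-synchronize the subtitles.")
for start, end in shifted:
    print(format_timestamp(start), "-->", format_timestamp(end))
```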
Old 29th October 2005, 04:15   #233  |  Link
fight2win
What's in a name dude !
 
Join Date: Sep 2005
Location: Cloud 9
Posts: 331
Please, please, please re-upload that char matrix that works with 90% of DVDs, I badly need it!
Old 30th October 2005, 03:15   #234  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
There's a mirror (kindly provided by FoxyShadis): http://foxyshadis.slightlydark.com/r...CharMatrix.rar
Old 30th October 2005, 09:19   #235  |  Link
fight2win
What's in a name dude !
 
Join Date: Sep 2005
Location: Cloud 9
Posts: 331
thanks!
Old 7th November 2005, 18:15   #236  |  Link
JnZ
Registered User
 
Join Date: Jan 2004
Location: Czech
Posts: 181
I can't save in the right format

Hi,
I have a big problem with SubRip. When I rip Czech subtitles, the program asks me whether to save as Unicode or not, but I don't want Unicode, because it sucks. So I chose No, but what now? Which charset should I select? I tried 1250 and 1252, but neither of them saves the right format. With the first, it saves "e" instead of the character "č", and with the second, it saves "r, u, ..." instead of "ř, ů, ...".

Beee, I want the old SubRip without Unicode, but with the new features...
Thanks for any ideas...

EDIT: I found where the problem is: I'm using my old saved matrices, which maybe weren't saved as Unicode...
__________________
(Sorry for my bad english, I'm czech, not englishman... :))

Last edited by JnZ; 7th November 2005 at 19:35.
Old 8th November 2005, 00:07   #237  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Well, as per your edit, go through your "old" matrices (which are, BTW, updated to the new format automatically), and manually correct the "wrong" characters.
Old 8th November 2005, 13:39   #238  |  Link
JnZ
Registered User
 
Join Date: Jan 2004
Location: Czech
Posts: 181
Quote:
Originally Posted by ai4spam
Well, as per your edit, go through your "old" matrices (which are, BTW, updated to the new format automatically), and manually correct the "wrong" characters.
Hm, thanks, but browsing and correcting about 200 matrices would be a little suicide. I'll make a converter from the source code instead.

BTW: I've recently ripped subs from about 30 DVDs, and your matrices couldn't cover a single case, while my matrices cover 95% of cases. So they are very valuable to me. :-)
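
The core of such a converter could be quite small if (and this is only an assumption, the thread does not describe the matrix file format) the old matrices stored each character as a single byte under code page 1250 and those bytes are now being read as 1252. A minimal Python sketch with a made-up function name:

```python
def redecode(text, read_as="cp1252", really="cp1250"):
    """Undo a wrong code page guess.

    text was produced by decoding bytes with read_as although the bytes were
    really written in really. Encoding back restores the original bytes,
    which are then decoded with the right code page. Raises a UnicodeError
    if text contains a character with no counterpart in either code page.
    (Hypothetical helper; how a matrix entry is read and written back is
    left out because the file format is not known here.)
    """
    return text.encode(read_as).decode(really)


for wrong in ["è", "ø", "ù"]:
    print(wrong, "->", redecode(wrong))
```

Plain ASCII entries pass through unchanged, so a converter built this way would only touch the accented characters.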
__________________
(Sorry for my bad english, I'm czech, not englishman... :))
Old 9th November 2005, 13:34   #239  |  Link
svcdprayer
Registered User
 
Join Date: May 2002
Posts: 203
char matrix ai4spam problem

Hello!

The char matrix works perfectly, but when I save and choose ANSI charset 1250, the saved file has problems: for the character "i" I get "!", and in italics I get "0" instead of "o".

In previous versions, when I had to enter characters for the matrix, there was no option to save the character set.

Is there a workaround so that only the codepage has to be chosen and that's it? I ask because the char matrix from ai4spam is perfect.

Thanks for any suggestions!
Old 9th November 2005, 19:46   #240  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@JnZ: Well, you probably use the option to search in other char matrices, and you have many small ones. You should have had them all in one, and only start a new one when the big matrix wasn't working. BTW, if you use alphabetical sorting, correcting them shouldn't be that hard. How do you choose the right matrix out of 200, anyway?
@svcdprayer: Hmm, "perfect" is a little too much. The problem is not the charset you save in, but the OCR sensitivity. Try setting it to 1000. If you still get problems, then somewhere in the matrix you have an "!" instead of an "i" and so on. Use alphabetical sorting (click on the column heading to sort) and look at all the "!"; one of them must be wrong. The "0" vs "o" problem may not be solvable, since characters that are otherwise identical may mean different things on different DVDs. For that, I'd run a spellchecker (Word or SubtitleWorkshop) on the final text.
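
As a rough complement to a spellchecker, the two confusions mentioned above can be caught with a small heuristic: a "0" or "!" with a letter on both sides is almost certainly a misread "o" or "i". This Python sketch is not a SubRip feature, the function name is made up, and it will miss cases such as a "0" at the start of a word.

```python
import re

# [^\W\d_] matches any Unicode letter (a word character that is
# neither a digit nor an underscore).
_LETTER = r"[^\W\d_]"
_ZERO_IN_WORD = re.compile(rf"(?<={_LETTER})0(?={_LETTER})")
_BANG_IN_WORD = re.compile(rf"(?<={_LETTER})!(?={_LETTER})")


def fix_ocr_confusions(line):
    """Replace '0' and '!' by 'o' and 'i' when they sit between two letters.

    (Hypothetical helper, for illustration only.)
    """
    line = _ZERO_IN_WORD.sub("o", line)
    return _BANG_IN_WORD.sub("i", line)


print(fix_ocr_confusions("G0od m0rning, th!s is h!s car."))
print(fix_ocr_confusions("Stop! It costs 100 dollars."))
```

Numbers and sentence-ending exclamation marks are left alone because they have no letter on both sides.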