Old 2nd September 2005, 11:54   #181  |  Link
Sinistral
Registered User
 
Join Date: Jun 2005
Posts: 9
Hey ai4spam,

First of all, thanks for all the great work you've done on subrip. I use it a lot and it just keeps getting better.

I do however have a problem: when I rip subtitles from certain languages, and I save them as .srt format (not UniCode), some characters aren't saved properly. Like these:

ę --> e
ś --> s
ł --> l
č --> c
ź --> z

These are just some examples. Is there any way to fix this, except for saving them as unicode (I use subrip 1.30b11)? Thanks in advance.

And now for something constructive. I rip subtitles in a lot of languages and sometimes some characters aren't in the little boxes beneath the typing bar. So I made a list for myself of all the characters I used for a language that weren't in those boxes. I thought you might want to add some characters from my list to the boxes. Here it is:

ENGLISH: ñ á ú í ó ä ö ç è ü ı ë ï ¡ ¿ © å ã ł â ½ Å
DUTCH: ï é Ë è ó à ô ö ê î ç £ ú ñ á â
GERMAN: Ö à é ² º
FRENCH: Œ º Â Û Ô Î í æ ã á
ITALIAN: Ì Ò
SPANISH: Ú Ì ä ª
PORTUGUESE: à À ô Ô Ê Õ í Í Ç ñ º ª
SWEDISH: Ö æ Å ø á ü è
NORWEGIAN: æ Æ Ø é É â ô ò
DANISH: å Å é É ñ ö ó í
FINNISH: å Å š Š
ICELANDIC: Þ É é ý Ý í Í Đ Æ ó Ó Ö Ú á Á
HUNGARIAN: ű Ű í Í ó Ó
CZECH: ř Ř š Š č Č ĕ Ĕ ť ů Ů ň Ň ď Ž á Á ú Ú ó Ó é Ť ń Ď §
POLISH: ś Ś ę Ę ń Ń ł Ł ż Ż ą Ą ć Ć ź Ź ó
TURKISH: ğ Ğ ı İ ş Ş ö Ö â Ç à è
SERBIAN: đ Đ ć Ć č Č š Š ž Ž
SLOVENIAN: š Š č Č ž Ž é
ROMANIAN: ţ Ţ ă Ǎ ş Ş
SLOVAK: í Í č Č ľ Ľ ú Ú á Á ý Ý é É ť ž Ž š Š ô ŕ ň Ď ä ó Ó ř Ť
ESTONIAN: ä Ä õ Õ š Š ř
CROATIAN: č Č ć Ć ž đ Đ

I tried to sort it on importance. So the characters I used the most are in front. I hope you can use them, but if not it doesn't matter.

I also found an orthography dictionary in Dutch for subrip. I tried it a couple of times and it seems to work very well. Here's the link: http://www.nlondertitels.com/board/a...achmentid=1395

I hope I helped out a little bit. If you have any questions about my problem or about something else, just post it here and I'll respond. Thanks for your time!
Old 2nd September 2005, 22:39   #182  |  Link
ukendt
Moderator
 
Join Date: Oct 2001
Location: Denmark
Posts: 541
DANISH: å Å é É ñ ö ó í


It ain't danish, sorry to say


ñ is spanish, it doesn't exist in the danish language.
ö ain't danish either, we use ø instead (pronounced almost like the french "eu")
ó never heard of it
í ??? We use i. Are U sure U didn't confuse italian with danish?
Old 3rd September 2005, 00:01   #183  |  Link
Sinistral
Registered User
 
Join Date: Jun 2005
Posts: 9
You are right about that, that isn't Danish. But sometimes I rip foreign movies with Danish subtitles. Those were probably from a Spanish or Italian movie with Danish subtitles. In some of the names some of those characters might have been used. If you look at English, you'll see a lot more of them.

Because it isn't Danish, I've put them in the back (the less used characters). Hope that clears it up.
Old 3rd September 2005, 03:50   #184  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@Sinistral: thanks for the input. Can you please send me your charmaps.ini file (look up my email in the SubRip manual, or just post a link to somewhere)? That would be easiest for me. I'll also include the Dutch dictionary (if it's not too big) in the distribution, otherwise I may just put it separately on the web page. Please send it to me, I cannot log in to that forum since I don't know the language.
Anybody that has such dictionaries, please submit them to me or Zuggy.

As for the characters that are wrongly converted, you need to select the appropriate CodePage and CharSet in the Global Options window. There's a bug in Beta 11 that doesn't update all the elements of the GUI to the new font. Currently, I'm working on UniCode support in the GUI without the need to change the system locale, and I've made some significant changes. After a couple of bugs are ironed out, a new beta will be up on the website.
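
To illustrate why the code page matters, here is a minimal Python sketch (not SubRip's own code). It assumes the five letters reported earlier in the thread and uses Python's built-in `cp1250` (Central European) and `cp1252` (Western European) codecs; the helper name `fits_codepage` is made up for this example.

```python
def fits_codepage(text: str, codepage: str) -> bool:
    """Return True if every character of text can be stored in codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


sample = "ęśłčź"  # the characters reported as damaged in the thread

# Central European code page: each of these letters has its own byte.
central = fits_codepage(sample, "cp1250")
# Western European code page: none of these letters exist in it.
western = fits_codepage(sample, "cp1252")

# With errors="replace" Python substitutes "?" for anything it cannot store.
lossy = sample.encode("cp1252", errors="replace")

print(central, western, lossy)
```

The thread reports a plain-letter substitution (ę --> e) rather than "?", but the cause looks the same: the selected code page has no slot for the character, so something else gets written in its place.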

Last edited by ai4spam; 3rd September 2005 at 03:57.
Old 3rd September 2005, 15:46   #185  |  Link
Sinistral
Registered User
 
Join Date: Jun 2005
Posts: 9
Okay, I sent you the mail. And thanks for solving the problem, it works perfectly now!

EDIT: I got a message in my inbox saying my e-mail to you wasn't delivered. Maybe I used the wrong address. Could you send your e-mail address to quintrixjvc@yahoo.com ? Thx in advance.

Last edited by Sinistral; 3rd September 2005 at 17:29.
Old 4th September 2005, 06:48   #186  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Well, I found a great resource in http://diacritics.typo.cz/index.php?id=49 , I'm in the process of incorporating it.
Also, tons of free fonts are at http://www.alanwood.net/unicode/fonts.html , hopefully some users will be able to get one of those instead of Arial UniCode MS.

Last edited by ai4spam; 4th September 2005 at 06:55.
Old 7th September 2005, 14:00   #187  |  Link
chkp45
Registered User
 
Join Date: Sep 2005
Posts: 6
Support for a new input format ?

ProjectX can extract subtitles from DVB-T broadcasts to .sup format which it says is used by IFOedit. These files have lines like:

{start frame number}{end frame number}{path to .bmp file}

ProjectX creates this file and the relevant bitmap files.
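
A rough sketch of how such a file could be read, in Python. It assumes exactly three brace-delimited fields per line as described above; the sample file names and the names `SubEntry` and `parse_index` are invented for illustration, and a real file may differ in details.

```python
import re
from typing import NamedTuple


class SubEntry(NamedTuple):
    start_frame: int
    end_frame: int
    bitmap_path: str


# Three brace-delimited fields per line, as described in the post.
LINE_RE = re.compile(r"^\{(\d+)\}\{(\d+)\}\{(.+)\}$")


def parse_index(text: str) -> list[SubEntry]:
    """Parse index lines; skip blank lines, reject malformed ones."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {number}: unexpected format: {line!r}")
        start, end, path = match.groups()
        entries.append(SubEntry(int(start), int(end), path))
    return entries


# Made-up sample data, not taken from a real ProjectX run.
sample = "{120}{185}{x_000001.bmp}\n\n{200}{260}{x_000002.bmp}\n"
entries = parse_index(sample)
```

Raising on a malformed line instead of skipping it is a deliberate choice here: a silently dropped line would shift every later subtitle against its bitmap.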

What would be nice is to have SubRip understand this format...

I can supply a sample / link to a sample if needed.
Old 7th September 2005, 15:58   #188  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by chkp45
Support for a new input format ?
I can supply a sample / link to a sample if needed.
Please do.
Old 7th September 2005, 19:27   #189  |  Link
johner23
Registered User
 
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hi, I have some other questions too.

1) Can you add or improve the mechanism for capturing closed captions in SubRip? In the past, there were some DVDs using closed captions that SubRip couldn't handle or rip.

Here there is a great tool that can do that in some cases and video formats.

---> http://forum.doom9.org/showthread.ph...losed+captions

2) Gabest has created a tool that can extract closed captions too. Can you add similar features to the SubRip core?

I guess the program is called "VobSub Ripper Wizard".

---> http://sourceforge.net/projects/guliverkli/ (try this link)

Well, if you could integrate, improve and expand all those features in SubRip, it would be useful for all users who need to extract closed captions from their videos.

Thanks in advance for your time.

Best regards.

devil (johner)

Last edited by johner23; 7th September 2005 at 19:32.
Old 7th September 2005, 20:27   #190  |  Link
chkp45
Registered User
 
Join Date: Sep 2005
Posts: 6
Actually I was wrong about which program does what...

ProjectX extracts subtitles to x.sup, and x.sup.IFO files which are all binary (+ a log file).

What you need then is DVDSupDecode from the DVDSupTools package: run it with the -bitmap option and give it x.sup as input. After this you get x.txt, which is in the format I described in my earlier message, plus x_000001.bmp files.
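
Since x.txt stores frame numbers while .srt needs timestamps, a conversion step is required somewhere. Here is a small Python sketch of that arithmetic. The default of 25 frames per second is an assumption (typical for PAL broadcasts), not something stated in this thread, and `frame_to_srt_time` is a made-up helper name.

```python
def frame_to_srt_time(frame: int, fps: float = 25.0) -> str:
    """Convert a frame number to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# One subtitle shown from frame 120 to frame 185 (made-up numbers).
start = frame_to_srt_time(120)
end = frame_to_srt_time(185)
print(f"{start} --> {end}")
```

If the source runs at a different rate, pass it explicitly, e.g. `frame_to_srt_time(120, fps=29.97)`.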

Samples of all these are somewhere on the net, I will send the location to ai4spam privately.
Old 8th September 2005, 01:06   #191  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@johner:
1) I am not familiar with the format (don't even know if they're bitmaps or text), and I don't have the time to implement support for it. But hey, this is the beauty of OpenSource, anyone can take over.
2) Unfortunately, although hosted on SourceForge, none of Gabest's work is actually available in source form. I could have used some of his VSRip code for saving .idx/.sub from .avi files, but alas, he only gives the executable. And again, no time to reverse-engineer or look up the documentation for this.

@chkp45: Thanks for the sample, I'll see if I can do something about it. It looks like a decent enough format, I may even implement saving to it myself (from .avi). The problem is, I don't know what the palette is: there are only 4 colors (background, outline, text1 and text2), but I cannot assume they are present in all the bitmaps of this type. Do you have another example, so that I can test (just one bitmap will be enough)?
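
One way to check the palette question on a given bitmap is to count the distinct colour values before deciding which ones are text. The Python sketch below works on a plain 2-D list of colour indices so it needs no image library. The index-to-role mapping (0 = background, 1 = outline, 2 and 3 = text) is an assumption for the example, as are the helper names.

```python
from collections import Counter


def palette_histogram(pixels):
    """Count how often each colour value occurs in a 2-D pixel grid."""
    return Counter(value for row in pixels for value in row)


def binarize(pixels, text_colours):
    """Map the given text colours to 1 and everything else to 0."""
    return [[1 if value in text_colours else 0 for value in row] for row in pixels]


# Tiny made-up bitmap: 0 = background, 1 = outline, 2 = text1, 3 = text2.
bitmap = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 3, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

histogram = palette_histogram(bitmap)
# Heuristic only: treat the most frequent colour as the background.
background = histogram.most_common(1)[0][0]
mask = binarize(bitmap, text_colours={2, 3})
```

A bitmap whose histogram has more or fewer than four entries would show that the four-colour assumption does not hold for that file.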
Old 8th September 2005, 04:23   #192  |  Link
DonGato
Registered User
 
Join Date: Sep 2004
Posts: 17
The CVS isn't useful or doesn't have the needed files?
http://cvs.sourceforge.net/viewcvs.p...src/subtitles/
Old 8th September 2005, 05:22   #193  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@DonGato: Hmm, that may work, thanks. Too bad it doesn't do much about my free time, which is still next to none. My thinking is... don't duplicate the functionality of something if you can't do it better. In this case, I can't do any better, so you'll have to keep using both programs.

@chkp45: Too bad DVDSupDecode doesn't give you any other format, like a "regular" .idx/.sub pair.

Last edited by ai4spam; 11th September 2005 at 00:39.
Old 8th September 2005, 12:46   #194  |  Link
chkp45
Registered User
 
Join Date: Sep 2005
Posts: 6
More than you (or anybody) would ever want to know about DVB subtitles

http://webapp.etsi.org/action%5COP/O...43v010201o.pdf
Old 9th September 2005, 05:20   #195  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by chkp45
More than you (or anybody) would ever want to know about DVB subtitles
Indeed. Good thing ProjectX and DVDSupDecode did all the work.
Old 11th September 2005, 08:09   #196  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Well, SubRip 1.40 Beta 1 is up on the home page.
Changelog:
- Large GUI overhaul (a better way of entering special characters) and bugfixes everywhere.
- Changed the format of the language and char map files to UniCode, so that they can be edited in Word.
- Implemented a way to fill a char matrix with entire UniCode ranges. With just a little exercise in matching you can potentially save a lot of typing for languages with lots of characters, like Chinese and Japanese.
- Implemented opening bitmap sequences that DVDSupDecode outputs from converting DVB-T broadcasts saved as .sup by ProjectX.
- Made Modify and Delete work faster for characters in the Char Matrix.
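
For readers wondering what "filling with an entire UniCode range" amounts to, here is a minimal Python sketch. It only enumerates the characters of a range; how SubRip matches them to glyphs is not shown, and `chars_in_range` is a made-up name. Hiragana (U+3041 to U+3096) is used purely as an example range.

```python
import unicodedata


def chars_in_range(first: int, last: int) -> list[str]:
    """Return the assigned characters between two code points, inclusive."""
    result = []
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        # Category "Cn" marks code points with no character assigned.
        if unicodedata.category(ch) != "Cn":
            result.append(ch)
    return result


hiragana = chars_in_range(0x3041, 0x3096)
print(len(hiragana), "".join(hiragana[:10]))
```

Skipping unassigned code points keeps the list free of entries that no font could render anyway.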

Thanks go to Sinistral and chkp45 for their suggestions.

Last edited by ai4spam; 15th September 2005 at 05:01.
Old 11th September 2005, 12:15   #197  |  Link
bourtzovlakas
dvd.stuff.gr moderator
 
Join Date: Apr 2004
Location: Greece
Posts: 312
Is v1.30 beta11 considered the stable version of the v1.30 family??
Old 11th September 2005, 17:56   #198  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Quote:
Originally Posted by bourtzovlakas
Is v1.30 beta11 considered the stable version of the v1.30 family??
Well, just like with 1.30 from 1.20, there were so many changes I thought it deserved a new version, and yes, with all the changes I may have introduced new bugs that may need a whole new series of betas to fix. So, happy bug hunting!
Old 12th September 2005, 06:05   #199  |  Link
Saligia
Registered User
 
Join Date: Jun 2005
Location: Denmark
Posts: 11
Wish it was there...

Hi!
First of all, thanx for devoting all these hours, weeks, months... keeping Subrip going strong.
It is a one-and-only. And without it we would be nowhere.
THX!

Lately I've come across some subs that, no matter what, just wouldn't be OCR'ed to anything but endless lines of OCR errors.

Born out of pure frustration, I came up with this solution, that works just fine.

1. Rip subs to BMP/SON with Subrip.
2. Convert to SUB/IDX with Son2vobsub.
(Can be scaled, to obtain large, clear, non-joined chars.)
3. Load SUB/IDX into AVIsynth on blank clip, with movie length.
(White chars on black background)
4. Make fake AVI with MakeAVIS.
5. OCR the .avi with Subrip.

It works well, though it's no timesaver. But then again, I'd rather spend some hours clicking the mouse now and then, than powerbashing the keyboard until it breaks and burns.

And now to my real agenda. (Of course there had to be a catch!)
When OCR'ing "hardsubbed" txt this way, it seems that Best Guess is correct, next to always.

So these options would be Soooo cool:
A. Use Best guess after (n) seconds, if not interrupted. 0=always
B. Insert (selectable chars) if best guess=none
C. Comment in lines where best guess is used
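
To make the three requested options concrete, here is a hypothetical Python sketch of the decision logic. It is not based on SubRip's code. The timeout itself is not simulated: `user_input` being `None` stands for "the (n) seconds ran out". The placeholder character and the `{guessed}` marker are invented for the example.

```python
from typing import Optional

PLACEHOLDER = "¤"  # hypothetical "no guess available" marker (option B)


def resolve_glyph(best_guess: Optional[str], user_input: Optional[str]) -> tuple[str, bool]:
    """Return (character, was_guessed) for one unknown glyph."""
    if user_input is not None:
        return user_input, False      # the user answered in time
    if best_guess is not None:
        return best_guess, True       # option A: fall back to the best guess
    return PLACEHOLDER, True          # option B: nothing to guess from


def ocr_line(glyphs) -> str:
    """glyphs: list of (best_guess, user_input) pairs for one subtitle line."""
    chars, guessed = [], False
    for best_guess, user_input in glyphs:
        ch, was_guessed = resolve_glyph(best_guess, user_input)
        chars.append(ch)
        guessed = guessed or was_guessed
    text = "".join(chars)
    # Option C: flag lines that contain at least one unattended guess.
    return text + "  {guessed}" if guessed else text


print(ocr_line([(None, "H"), ("i", None)]))
```

Flagged lines could then be found with a plain text search during the later correction pass.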

This could even make the OCR'ing unattended, and easy to correct afterwards with some spellchecking.

Thanks for your great work.

Last edited by Saligia; 12th September 2005 at 06:23.
Old 12th September 2005, 08:58   #200  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@Saligia: Thanks for the comments.

This whole "scale, to obtain large, clear, non-joined chars" sounds interesting, it's along the lines of AVISubDetector's "double resolution" feature. I did not use it for SubRip because that makes the characters too large, and SubRip's current limit is 72x44 ever since Delphi couldn't handle dynamic arrays. I guess I could make them larger, but that would definitely impact performance and potentially make char matrices less reusable. I'll think about it, but again, I have very little time to devote to this.
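
The size concern can be shown with a few lines of Python. This sketch doubles a glyph by nearest-neighbour scaling and checks it against the 72x44 limit quoted above. Reading that limit as width x height is an assumption, and the scaling method is just the simplest one; it is not known to be what Son2vobsub or AVISubDetector actually do.

```python
MAX_WIDTH, MAX_HEIGHT = 72, 44  # limit quoted in the post; axis order is assumed


def upscale(glyph, factor=2):
    """Nearest-neighbour upscale of a 2-D grid of 0/1 pixels."""
    scaled = []
    for row in glyph:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row, copying it so rows stay independent.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


def fits_limit(glyph) -> bool:
    """Check a glyph's dimensions against the assumed size limit."""
    height = len(glyph)
    width = len(glyph[0]) if glyph else 0
    return width <= MAX_WIDTH and height <= MAX_HEIGHT


glyph = [[0, 1], [1, 0]]
doubled = upscale(glyph)

# A made-up 40x30 glyph fits, but its doubled version (80x60) does not.
big = [[0] * 40 for _ in range(30)]
print(fits_limit(big), fits_limit(upscale(big)))
```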

A question: what kind of scaling are we talking about? Is there some processing going on there to make letters disjoint? Can you send me an example (look up my email in the manual)? The scaling could be implemented directly in SubRip.

Also, it seems that your steps 3 and 4 are superfluous. Why don't you just load the "enlarged" .sub/.idx directly in SubRip, instead of making an .avi? This will also help with your problem, because you won't get the .avi compression artifacts screwing up with the binarization in the hard-subbed .avi OCR.

About your agenda: what the best guess does is it puts a new char in the char matrix, once you confirm it. So, the result of option A would be a char matrix potentially filled with really bad guesses. In my experience, once you start it, you seldom have to type. Try one of the following:
- lower the OCR sensitivity
- use the "fill matrix" option - it's a bit of work to match the font, but after that it should be smooth
I'm not clear what good options B and C would be. Presumably, you'd insert some special char or comment, and then go in and edit manually afterwards (spellcheckers never quite work). Again, if you got a decent source, once you start the OCR you have to type less and less.
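
The concern about a char matrix filling up with bad guesses can be illustrated with a toy matcher in Python. This is only a sketch of the general idea (nearest stored glyph by pixel difference), not SubRip's actual algorithm. `max_distance` is a stand-in tolerance and should not be read as SubRip's sensitivity setting.

```python
def hamming(a, b) -> int:
    """Number of differing pixels between two equally sized 0/1 bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def best_guess(glyph, char_matrix, max_distance):
    """Return the closest stored character, or None if nothing is close enough."""
    best_char, best_dist = None, max_distance + 1
    for stored_glyph, char in char_matrix:
        dist = hamming(glyph, stored_glyph)
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char


# Tiny made-up char matrix with two 3x3 glyphs.
char_matrix = [
    ([[1, 1, 1], [1, 0, 1], [1, 1, 1]], "o"),
    ([[0, 1, 0], [0, 1, 0], [0, 1, 0]], "l"),
]
noisy_o = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]  # one pixel off from "o"

guess = best_guess(noisy_o, char_matrix, max_distance=2)
strict = best_guess(noisy_o, char_matrix, max_distance=0)

# Confirming a guess stores the noisy glyph as a new entry.
if guess is not None:
    char_matrix.append((noisy_o, guess))
```

Every confirmed guess adds another entry, so accepting guesses without review means any wrong guess becomes a stored glyph that later matches can latch onto.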

Hopefully, with my suggestion above, you'll cut down the time you spend doing this.