Doom9's Forum > General > Subtitles

29th July 2005, 04:03   #161
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Well, there's a good reason why SubRip asks you to type in stuff: this way it can tell whether a white part of the image is actually a subtitle or just white background (when it mistakes the background for a subtitle, you just press Enter for a "null" character).
So, just type in garbage (or, if you want to type in Chinese, you should be able to do so in Unicode using one of those "helper" applications that let you type several Latin letters to get a Chinese character), and then open the resulting subtitle in SubtitleWorkshop and edit it to your heart's content.

30th July 2005, 21:00   #162
Zerryk
Registered User
Join Date: Jan 2002
Location: Prague
Posts: 6
minor UI bugs

Hi, there are some minor UI bugs in 1.30b10:
- in the Global options dialog, the Charset listbox always resets to Default and the Codepage listbox resets to 1252 as soon as the dialog is opened (until I open the dialog again, the previous settings are remembered and saved correctly)
- if different fonts are chosen for the Subtitle window (e.g. Courier) and in the Global options (e.g. Arial Unicode MS), the font in the Subtitle window resets to the global font when the Global options dialog is opened (load some text subs, then open the Global options dialog and see the ugly redrawn screen)
- in the Time correction dialog, the last numeric field (after the comma) should probably be padded with zeroes on the right, not on the left (i.e. "0:0:0,5" should mean "0:0:0,500", not "0:0:0,005")
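A minimal Python sketch of the right-padding rule proposed in the last item, treating the digits after the comma as a decimal fraction of a second. The function name `parse_time_ms` is hypothetical, the "h:m:s,fff" layout is assumed from the example, and this is not SubRip's code.

```python
def parse_time_ms(text: str) -> int:
    """Parse an 'h:m:s,fff' time into milliseconds (hypothetical helper)."""
    clock, _, frac = text.partition(",")
    h, m, s = (int(part) for part in clock.split(":"))
    # Right-pad the fraction so "5" means 500 ms; digits beyond milliseconds are dropped.
    ms = int((frac or "0").ljust(3, "0")[:3])
    return ((h * 60 + m) * 60 + s) * 1000 + ms


print(parse_time_ms("0:0:0,5"))    # 500
print(parse_time_ms("0:0:0,005"))  # 5
```

With this rule, typing a single digit after the comma reads naturally as tenths of a second, and the old meaning is still reachable by typing the zeroes explicitly.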

30th July 2005, 21:50   #163
Zerryk
Registered User
Join Date: Jan 2002
Location: Prague
Posts: 6
Fill matrix from text - wrong characters generated

The character generator under "Fill matrix from text" generates different characters from those in the corresponding section of charmaps.ini (Czech in this case). In the OCR window and in the generator's sample text window, the "hint" buttons show the correct characters, but the resulting character matrix contains different ones. See the screenshot and compare it with the Notepad window in the background.
These were generated with CP 1250 set in Global options. With a different CP set in Global options, different characters (but still not the correct ones) are generated.
When UTF-8 or Unicode is set in Global options, the character generator crashes with "Access violation at 004eedd5 in module SubRip.exe. Read of address 00000000".
My Windows uses codepage 1250 by default, and the Arial Unicode MS font is installed.

1st August 2005, 06:22   #164
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Thanks for the bug reports. The first two bugs (font reset) will be fixed in the next beta. The third one will probably be left as it is (hard to fix and confusing).
As for the char matrix generator, it only loops over character codes 33 to 255, nothing outside that range, so you won't get any "special characters", just whatever the current codepage maps into that range. I will improve it when I have the time (it needs to know Unicode ranges, and that's not easy to do).
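To illustrate why the generated characters depend on the codepage, here is a Python sketch (hypothetical function name, not SubRip's code) that loops over byte values 33 to 255 and decodes each one with Python's built-in codec for a given codepage. Czech "ř" only shows up under cp1250; under cp1252 the same byte decodes to "ø". With UTF-8, single bytes 128 to 255 are not complete characters, so such a loop yields only ASCII. A Unicode-aware generator would instead have to iterate over code point ranges, as described above.

```python
def generated_chars(codepage: str) -> str:
    """Decode each single byte 33..255 in `codepage`, skipping unassigned bytes."""
    chars = []
    for code in range(33, 256):
        try:
            chars.append(bytes([code]).decode(codepage))
        except UnicodeDecodeError:
            continue  # byte has no mapping (or is not a complete character) here
    return "".join(chars)


print("ř" in generated_chars("cp1250"))  # True
print("ř" in generated_chars("cp1252"))  # False
```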
No date is set for the next beta, as I'm busy with other stuff. Meanwhile, Zuggy may release a new beta if he manages to get some work done on the DVD part.

1st August 2005, 17:49   #165
Zerryk
Registered User
Join Date: Jan 2002
Location: Prague
Posts: 6
char matrix generator - random colors

When I type sample text in the char matrix generator window, the background color of both the sample bitmap and the "fit-text-to-sub" bitmap changes randomly. Is it a feature or a bug?

2nd August 2005, 05:06   #166
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
It's a bug in the color palette conversion that I haven't had time to look into.

16th August 2005, 09:04   #167
Nick Less
Registered User
Join Date: May 2005
Posts: 45
OK, first of all I apologize if what I'm about to ask has been dealt with before, so point me in the right direction if it has.

I've just started ripping subs from AVIs, and this is the problem: when SubRip finds a sub, it displays it correctly and recognizes it, but it doesn't realize that it's the same sub, so it keeps repeating it over and over and I have to press the Skip button hundreds of times. Any way around this?

17th August 2005, 13:20   #168
veverica
Registered User
Join Date: Aug 2002
Posts: 16
Suggestions

Hi ai4spam, I have some suggestions for AVI ripping in SubRip. I've developed PodPis, a VirtualDub filter that extracts subs from an AVI into idx/sub files, which I then rip with SubRip. It works really well: for a 45-minute TV series episode, it takes around an hour of your time to get a subtitle (including a manual check of the subs against the AVI).

The detection routine is rather simple. It detects around 99 percent of subtitles and produces duplicates for around 1%, with medium-quality video. We can talk more about it if you like, and you can have the source code.

My first proposal would be to rip the subs in two steps: first run the detection, which would create an idx/sub file, and then rip that file. It saves a lot of time.

Another proposal would be to include a button that skips the OCR but still writes a [skiped item nr.] line to the text file, like it does when you hit Done in manual typing.

Another proposal would be a way to delete the last typed char from the char matrix without opening the matrix. With Alt-U this happens often (when I'm in a hurry).

The Alt-U (Use best guess) is superb!!!

Thanks for your work.

17th August 2005, 16:59   #169
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Quote:
Originally Posted by Nick Less
I've just started ripping subs from AVIs, and this is the problem: when SubRip finds a sub, it displays it correctly and recognizes it, but it doesn't realize that it's the same sub, so it keeps repeating it over and over and I have to press the Skip button hundreds of times. Any way around this?
Try increasing the "same sub tolerance" value.
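The thread doesn't say how SubRip measures "same sub tolerance". As an assumption, here is a Python sketch that compares two binarized subtitle bitmaps by the fraction of differing pixels and treats them as the same subtitle when that fraction is within the tolerance. Under this rule, raising the tolerance merges more near-identical frames into one subtitle. All names are hypothetical.

```python
from typing import Sequence

Bitmap = Sequence[Sequence[int]]  # rows of 0/1 pixels, 1 = subtitle text


def difference_ratio(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ between two equally sized bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


def same_subtitle(a: Bitmap, b: Bitmap, tolerance: float) -> bool:
    """Hypothetical rule: same subtitle if the differing fraction is within tolerance."""
    return difference_ratio(a, b) <= tolerance


prev = [[0, 1, 1, 0], [0, 1, 1, 0]]
curr = [[0, 1, 1, 0], [0, 1, 0, 0]]  # one pixel changed by noise: 1/8 = 0.125
print(same_subtitle(prev, curr, 0.05))  # False
print(same_subtitle(prev, curr, 0.20))  # True
```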

17th August 2005, 17:28   #170
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Quote:
Originally Posted by veverica
The detection routine is rather simple. It detects around 99 percent of subtitles and produces duplicates for around 1%, with medium-quality video. We can talk more about it if you like, and you can have the source code.
Well, it would be nice to compare the detection routines, and maybe see where mine works and yours doesn't, and the other way around. Have you tried anime fansubs? They are a bit harder to work with (see the example in the guide on the SubRip homepage). How does PodPis deal with that? Do you still get 99% detection accuracy?

Quote:
Originally Posted by veverica
My first proposal would be to rip the subs in two steps: first run the detection, which would create an idx/sub file, and then rip that file. It saves a lot of time.
Well, I'd have to run them side by side to decide. Sounds to me like you already have everything in place. We can ask zuggy to include a link to PodPis on the SubRip homepage, or I can do it myself, if he agrees. One question: what do you do when you have a "possible"? Do you still output it in the .idx/.sub files? I'd say actually seeing the subtitles helps a lot. And do you actually save the files with different text and outline colors, or do you just make them black and white?

Quote:
Originally Posted by veverica
Another proposal would be to include a button that skips the OCR but still writes a [skiped item nr.] line to the text file, like it does when you hit Done in manual typing.
That would be easy enough to do, but what advantage is there? The "Skip this subtitle" button skips without adding that message. Do you actually go in with some other program (like SubtitleWorkshop) and fill in your skipped items? On the other hand, it would make sense to have a "Same as last" button in the manual typing window. However, that only shows up if SubRip can't detect the lines in a multi-line subtitle, which doesn't happen if you use the "inter-line options".

Quote:
Originally Posted by veverica
Another proposal would be a way to delete the last typed char from the char matrix without opening the matrix. With Alt-U this happens often (when I'm in a hurry).
Hmm, it would also have to restart the detection of the last subtitle, since that character actually needs to be replaced, not just deleted. That would also mean going back N subtitles if all the other characters have been properly detected. It is possible, but not easy to implement, and seeking backwards in videos is not easy, or guaranteed to work all the time. So... my advice: don't be hasty when pressing the "Use" button. Maybe I should put a delay timer on it?

Back to your proposal: I now have zero time available to work on this, and won't have any for a few more months. At most I can give advice. Hopefully, zuggy will get back to the DVD part, make some improvements, and fix some of the reported bugs along the way.

On another note, I was trying to get the subtitle removal filter to work (see some earlier posts). It would be a great help if you could finish it (it's only debugging, everything else is in place) using your knowledge of VirtualDub, and maybe combining it with PodPis.

Last edited by ai4spam; 18th August 2005 at 09:46.

18th August 2005, 09:38   #171
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Beta 11

Beta 11 is up. I found some time after all, because I was ripping some subs in Romanian and they weren't saved properly, so I ended up fixing some bugs.

Changes: fixed some GUI bugs, including the ones mentioned in the recent posts; new French language (thanks to Kurtnoise).

Last edited by ai4spam; 18th August 2005 at 09:43.

18th August 2005, 10:14   #172
veverica
Registered User
Join Date: Aug 2002
Posts: 16
Quote:
Originally Posted by ai4spam
Well, it would be nice to compare the detection routines, and maybe see where mine works and yours doesn't, and the other way around. Have you tried anime fansubs? They are a bit harder to work with (see the example in the guide on the SubRip homepage). How does PodPis deal with that? Do you still get 99% detection accuracy?
I'll try that. Detection often works very well, but the picture is too noisy to OCR. I just skip subs like these, keeping only their times, and then enter the text manually in SubtitlesTranslator. I think that saves a lot of time.
In PodPis I compare the frames I consider to show the same subtitle, and then use only the one with the fewest white pixels. Even if parts of the subtitle are noisy in some frames, there is a good chance of getting a clean subpicture.
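A minimal Python sketch of the selection rule described above, assuming the frames are already binarized (1 = white) and already grouped as showing the same subtitle. PodPis's own source isn't shown in the thread, and the names here are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # binarized frame: 1 = white pixel, 0 = background


def white_count(frame: Frame) -> int:
    """Number of white pixels in a binarized frame."""
    return sum(sum(row) for row in frame)


def cleanest_frame(group: List[Frame]) -> Frame:
    """Among frames judged to show the same subtitle, keep the one with the fewest white pixels."""
    if not group:
        raise ValueError("empty frame group")
    return min(group, key=white_count)
```

The presumable reasoning is that bright background noise only adds white pixels on top of the text, so the frame with the fewest is the likeliest to be clean.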


Quote:
Originally Posted by ai4spam
Well, I'd have to run them side by side to decide. Sounds to me like you already have everything in place. We can ask zuggy to include a link to PodPis on the SubRip homepage, or I can do it myself, if he agrees. One question: what do you do when you have a "possible"? Do you still output it in the .idx/.sub files? I'd say actually seeing the subtitles helps a lot. And do you actually save the files with different text and outline colors, or do you just make them black and white?
Yes, simply black and white. It wouldn't work if the subs were yellow; when they are white, I can use the threshold parameter to isolate the white color.
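One way a single threshold can isolate white text, sketched in Python under the assumption that "white" means every RGB channel is at or above the threshold (PodPis's actual test isn't described). Under that rule, pure yellow (255, 255, 0) fails on its blue channel, which is consistent with the remark that yellow subs would not work. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def binarize_white(image: Sequence[Sequence[Pixel]], threshold: int) -> List[List[int]]:
    """Mark a pixel as subtitle (1) only if all three channels reach `threshold`."""
    return [[1 if min(px) >= threshold else 0 for px in row] for row in image]


row = [(250, 250, 250), (255, 255, 0), (40, 40, 40)]  # white, yellow, dark gray
print(binarize_white([row], 200))  # [[1, 0, 0]]
```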


Quote:
Originally Posted by ai4spam
That would be easy enough to do, but what advantage is there? The "Skip this subtitle" button skips without adding that message. Do you actually go in with some other program (like SubtitleWorkshop) and fill in your skipped items? On the other hand, it would make sense to have a "Same as last" button in the manual typing window. However, that only shows up if SubRip can't detect the lines in a multi-line subtitle, which doesn't happen if you use the "inter-line options".
See the first answer... If the subtitle is detected but too cluttered (a lot of noise), I skip it and then type it in during correction. I think correction is crucial for AVI sub ripping; to get a proper result, I correct the subs in SubtitlesTranslator at 3x speed.




Quote:
Originally Posted by ai4spam
On another note, I was trying to get the subtitle removal filter to work (see some earlier posts). It would be a great help if you could finish it (it's only debugging, everything else is in place) using your knowledge of VirtualDub, and maybe combining it with PodPis.
Yes, maybe I can look into that. I remember the long nights spent bringing PodPis to life. Spins, Preview button, sliders, ah, thousands of bugs....

18th August 2005, 12:31   #173
johner23
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 228
Where can I find PodPis nowadays?

---> http://neuron2.net/other.html

I tried that link, but the path to the PodPis filter seems to be broken.

I have also heard about the SubLog Extractor filter, by Alain Vielle.

---> http://avielle.chez.tiscali.fr//video/sublog.html

Could those filters (PodPis and SubLog Extractor) be improved to help SubRip recognize hardcoded subtitles better?

Nice job, guys.

I hope you keep working on these projects and improving them. Thanks.

Best regards.

devil (johner)

Last edited by johner23; 18th August 2005 at 12:34.

18th August 2005, 17:06   #174
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
@veverica: About the extra button: well, SubRip will still try to fill in the characters that it does recognize, but you're right, I guess it makes sense. I'll think about the best way to do it. About the sub removal: it just occurred to me that it would ideally work as the reverse of VobSub/VSRip, with the filter reading a .sub/.idx file and removing the burned-in subtitles. I was too lazy to dig into the file format, and settled for reading simple .pgm images saved by SubRip instead. Maybe VobSub/VSRip's file-reading code can be reused (if available) and combined with DeLogo's removal code (which is what I use). Do you think you could do that? It would be a lot more work than debugging my stuff, which is based on DeLogo. If so, send me your email in a PM.
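For reference, a minimal Python sketch of reading and writing the binary PGM (P5) container mentioned above, assuming 8-bit grayscale. Whether SubRip writes P5 or the ASCII P2 variant isn't stated in the thread, and the function names are hypothetical.

```python
from typing import Tuple


def write_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Encode row-major 8-bit grayscale pixels as a binary PGM (P5) file."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def read_pgm(data: bytes) -> Tuple[int, int, bytes]:
    """Decode an 8-bit binary PGM (P5), skipping '#' comments in the header."""
    tokens = []
    pos = 0
    while len(tokens) < 4:  # magic, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            pos = data.index(b"\n", pos) + 1  # comment runs to end of line
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PGM header")
        tokens.append(data[start:pos])
    if tokens[0] != b"P5" or int(tokens[3]) > 255:
        raise ValueError("only 8-bit binary PGM (P5) is supported")
    width, height = int(tokens[1]), int(tokens[2])
    pos += 1  # a single whitespace byte separates the header from the raster
    pixels = data[pos:pos + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated PGM raster")
    return width, height, pixels
```

A removal filter would then pair each decoded mask with the video frames it belongs to; that pairing is not covered by this sketch.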
On the other hand, I could work on improving SubRip's detection, based on both PodPis and SubLog Extractor, and make it save the result in standard .sub/.idx files instead of the .pgm sequences it does now. What do you think?
Again, I think the main advantage of using SubRip is that you see the subtitles on screen, and that some of the characters are filled in for you. So, even when you decide to skip a sub to fill it manually later, it helps, and will save you time as you progress, because it will recognize more and more characters as you go on. Maybe adding the possibility to edit a subtitle right within SubRip would help too, I'll think about it.

@johner: Thanks for the links, I didn't know about SubLog Extractor.

Last edited by ai4spam; 18th August 2005 at 17:24.

19th August 2005, 11:41   #175
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
From the looks of it, both PodPis and SubLog Extractor have fairly simple detection schemes. SubRip's detection works in several steps, much like AviSubDetector's, and I think it works a lot better. I guess the only missing functionality is saving the bitmaps as .sub/.idx files for later processing/removal, without user assistance beyond the initial cropping and color setup. I'll see what I can do about implementing it. Now if only someone would take on making a "reverse VobSub/VSRip" to remove burned-in subtitles given .sub/.idx files...

25th August 2005, 11:14   #176
pcjco04
Registered User
Join Date: Mar 2004
Posts: 44
ai4spam, the page for your char matrix displays the message "The file was removed." again.
Can you post it again?

26th August 2005, 12:23   #177
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Sure, here it is: http://sr1.mytempdir.com/132702
If anyone cares to host it, feel free to do it and post your mirror here, I'll update my first post accordingly.

29th August 2005, 09:50   #178
foxyshadis
ангел смерти
Join Date: Nov 2004
Location: Lost
Posts: 9,175
Laaaaaaaaaaaaaaaaame. (The site, not you.)

http://foxyshadis.slightlydark.com/r...CharMatrix.rar

29th August 2005, 23:56   #179
ai4spam
Programmer
Join Date: Sep 2003
Posts: 378
Thanks for the mirror. Let us know if it ever becomes a burden, and I'll remove the link.

1st September 2005, 03:48   #180
foxyshadis
ангел смерти
Join Date: Nov 2004
Location: Lost
Posts: 9,175
170K as a burden? Don't worry about it, that site's up 24/7 and puts out tens of gigabytes a month.

All times are GMT +1.