Old 4th May 2005, 20:25   #21  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Thanks everybody for the replies.

@Endzeit: Thanks for the compliment. AviSubDetector ( http://forum.doom9.org/showthread.php?s=&threadid=89802 ) does the same thing and has TONS of options, but it is hard to use (the interface is just horrendous) and makes lots of mistakes. SubRipAvi will hopefully work just as well or better, and be easier to use.

@niamh: By default, pressing Enter in the edit box means an empty char, and you can use it to skip over characters that are too small and the like. I guess I could add a checkbox to give you the choice of having the functionality you want. About that .idx file: try not using all the colors; sometimes it helps.

@smiller667: The beta version is already up on the official site mentioned in the first post. Now I'm working on improving the speed and accuracy of the detection.

@castellanos: Use the "Post-OCR correction" facility in the text window (there's a button and a menu option, your choice). Also, you can use URUSoft's Subtitle Workshop ( http://www.urusoft.net/downloads.php?lang=1 ) to correct many other OCR mistakes. You can even use the corrector in MS Word for that (Subtitle Workshop can do it for you).

@E-male: There is a subtitle and logo remover filter for VirtualDub, check out http://www.compression.ru/video/subt...msu_delogo.zip

Last edited by ai4spam; 4th May 2005 at 20:41.
Old 4th May 2005, 23:05   #22  |  Link
zuggy
Registered User
 
Join Date: May 2002
Location: Czech Republic
Posts: 171
Quote:
Originally posted by castellanos
...if an old bug is fixed it...
How could the I and l letters be fixed at the (primary) OCR stage if their "pictures" are frequently exactly identical?
Old 4th May 2005, 23:32   #23  |  Link
masken
uhm... ?
 
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
Quote:
Originally posted by ai4spam
Thanks for the reply. I don't quite understand what you mean by "hard cases". About the feature suggestion: in what window? The pause/abort can be done, no problem.

I'll resume work at the end of this week. Meanwhile, if he has time, Zuggy will try to solve the crash bug on P4.
The pause/abort is when selecting colors. If you've got a "hard case" you end up in a "catch 22" there, and have to use Ctrl+Alt+Del to be able to end the application.

For accelerator indicators, I mean the OCR-process window, where you work through the subs while OCR'ing them. It would be very nice to be able to perform an OCR using just the keyboard and accelerator indicators (e.g. Alt+I to switch Italics on/off), i.e. no mouse operations or tabbing four times plus Space to reach the Italics checkbox.

About "hard cases": I've previously sent Zuggy examples of vobsubs that have been hard or impossible to OCR using SubRip. If there's still a need for such examples, I can send more.

btw, I forgot the most sought-after feature suggestion of them all: the "Expand/Reduce selection" feature, similar to the one in Gabest's SubResync, for extending/reducing a selection in the image/letter being OCR'd.

Oh, came up with yet another one: how about an "Only allow whole words to be formatted" feature in the post-OCR correction settings? I often find that when working with formatted subs (italics etc.), some letters are OCR'd with the wrong formatting, making the subs quite a lot larger (and harder to read/post-OCR fix) because of all the <i></i>'s. If one could limit formatting to whole words instead of single letters, this problem would be gone. Also note that Subtitle Workshop can't handle this; SW only handles formatting on a per-dialogue basis (and actually reduces the formatting to that state too if you open/save a sub in SW).
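
A minimal sketch of what such a "whole words only" pass could look like, in Python. This is not SubRip's code: the function names are hypothetical, only `<i>` tags are handled, and the majority rule (a word becomes italic when more than half of its letters were italic) is an assumption chosen for illustration.

```python
import re

TAG = re.compile(r"</?i>")

def italic_flags(line):
    """Strip <i> tags; return (plain_text, flags), flags[k] True if char k was italic."""
    text, flags, italic, pos = [], [], False, 0
    for m in TAG.finditer(line):
        chunk = line[pos:m.start()]
        text.append(chunk)
        flags.extend([italic] * len(chunk))
        italic = (m.group() == "<i>")
        pos = m.end()
    chunk = line[pos:]
    text.append(chunk)
    flags.extend([italic] * len(chunk))
    return "".join(text), flags

def whole_word_italics(line):
    """Re-emit the line so that every word is either fully italic or fully plain."""
    text, flags = italic_flags(line)
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+|\s+", text)]
    wants = []
    for tok, start, end in tokens:
        if tok.isspace():
            wants.append(None)  # decided below from the neighbouring words
        else:
            # assumed rule: italic only if more than half of the letters were italic
            wants.append(2 * sum(flags[start:end]) > (end - start))
    for k, want in enumerate(wants):
        if want is None:
            before = wants[k - 1] if k > 0 else False
            after = wants[k + 1] if k + 1 < len(wants) else False
            wants[k] = bool(before and after)  # keep spaces inside an italic run
    out, is_open = [], False
    for (tok, _, _), want in zip(tokens, wants):
        if want != is_open:
            out.append("<i>" if want else "</i>")
            is_open = want
        out.append(tok)
    if is_open:
        out.append("</i>")
    return "".join(out)

print(whole_word_italics("H<i>e</i>llo <i>w</i><i>orld</i>"))
```

With this rule a single mis-formatted letter no longer splits a word into several tagged fragments, and consecutive italic words share one pair of tags.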

EDIT #4
Another, low-prio feature suggestion is Regular Expressions support in the post-OCR correction. It would be neat to have a script file, much like in SW, where you could specify how the post-OCR correction should be performed using RegEx, i.e. an open-format type of post-OCR correction (including the default one). Low prio for me as this feature is already available in SW.
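
To make the idea concrete, here is a small Python sketch of a script-driven correction pass. The `pattern => replacement` script format, the rule set and the function names are all made up for illustration; they are neither SubRip's nor Subtitle Workshop's actual format.

```python
import re

# Hypothetical script format: one "pattern => replacement" rule per line, '#' for comments.
DEFAULT_SCRIPT = r"""
# capital I between lowercase letters is almost certainly a lowercase l
(?<=[a-z])I(?=[a-z]) => l
# a lone lowercase l is almost certainly the pronoun I
\bl\b => I
# drop the space that OCR often leaves before punctuation
\s+([,.!?]) => \1
"""

def parse_rules(script):
    """Turn the script text into a list of (compiled_pattern, replacement) pairs."""
    rules = []
    for line in script.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        pattern, sep, replacement = line.partition(" => ")
        if not sep:
            continue  # not a rule line; ignored in this sketch
        rules.append((re.compile(pattern), replacement))
    return rules

def correct(text, rules=None):
    """Apply every rule in order to the OCR output."""
    if rules is None:
        rules = parse_rules(DEFAULT_SCRIPT)
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

print(correct("l think it's a smaIl world , isn't it ?"))
```

Because I and l often have identical bitmaps, context rules like the first two are one way to separate them after the OCR pass; the rules are applied in file order, so more specific patterns should come first.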

Last but not least, a HUGE THANK YOU for keeping up development of the absolutely best OCR application available for subtitles!

Last edited by masken; 5th May 2005 at 00:01.
Old 5th May 2005, 00:06   #24  |  Link
smiller667
Registered User
 
Join Date: Oct 2001
Posts: 1,125
@ai4spam: thanks, looks promising. I'll report back.

@niamh: The subs are not so hard to OCR - edit the colours to remove all background and antialiasing colours and set the OCR difference threshold a bit lower, perhaps 700. Distinguishing italics and plain text is tricky, though. I'd probably ignore all text attributes and set the few lines in italics manually afterwards.

Steve
Old 5th May 2005, 03:56   #25  |  Link
johner23
Registered User
 
Join Date: Jul 2003
Location: Brazil
Posts: 234
I have tried your program to extract some hard-burned subtitles from an avi (as I did using ASD, another great tool) and have the same problems described in the link below:

---> http://forum.doom9.org/showthread.php?threadid=93986

Any way to solve this?

Thanks.

devil (johner)
Old 5th May 2005, 10:33   #26  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@masken: Great suggestions. Will work on them (the expand/reduce selection is next on the agenda). Whole words may take a bit, since there are all these output formats that are supported independently. Please do send me (email) some hard cases, with a line or two of explanation.

@johner23: Yes, SubRipAvi already has duplicate removal, but only for .avi subtitles. It removes subs that are IDENTICAL and have 1 frame distance between them. I guess I should also remove subs that have 0 frames distance between them. It'll be trivial to make it work for DVD subtitles also.
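
The rule described above (identical text, at most 1 frame apart, including 0 frames) can be sketched in Python as follows. The `Sub` record and `merge_duplicates` are hypothetical names for illustration, not SubRipAvi's actual data structures.

```python
from typing import List, NamedTuple

class Sub(NamedTuple):
    start: int  # first frame on which the subtitle is visible
    end: int    # last frame on which the subtitle is visible
    text: str

def merge_duplicates(subs: List[Sub], max_gap: int = 1) -> List[Sub]:
    """Merge consecutive subs with identical text whose frame gap is <= max_gap."""
    merged: List[Sub] = []
    for sub in sorted(subs):  # NamedTuples sort by start frame first
        if merged:
            prev = merged[-1]
            gap = sub.start - prev.end
            if sub.text == prev.text and gap <= max_gap:
                # extend the previous subtitle instead of emitting a duplicate
                merged[-1] = prev._replace(end=max(prev.end, sub.end))
                continue
        merged.append(sub)
    return merged

subs = [Sub(10, 50, "Hi"), Sub(51, 90, "Hi"),
        Sub(90, 120, "Bye"), Sub(120, 150, "Bye"), Sub(200, 220, "Bye")]
for s in merge_duplicates(subs):
    print(s)
```

Using `gap <= max_gap` covers both the 1-frame and the 0-frame case, and also overlapping duplicates; the last "Bye" stays separate because it starts 50 frames later.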

Last edited by ai4spam; 5th May 2005 at 10:44.
Old 5th May 2005, 15:36   #27  |  Link
yaz
n00b ever
 
Join Date: May 2002
Posts: 627
yep ... that 'whole word only' thing would be great. not a big deal as it can be easily corrected even w/the most basic editor but quite annoying when it occurs.

another clue. i've never ever been able to define a good color palette, so i don't know how to cope w/'soft-looking' subs. ye know, white letters outlined w/light grey on grey background, or such. ripping such subs practically means retyping the whole thing, as subrip skips complete lines very frequently. so, would it be possible to implement something like 'loading predefined color schemes' (say, black and white, or so)? it'd be quite beneficial for simpletons like me.

thx a lot for your worx
y
Old 5th May 2005, 16:14   #28  |  Link
niamh
Dismembered
 
Join Date: Nov 2003
Location: Craggy Island
Posts: 873
@ Ai4spam and smiller : thx for the tip
__________________
Allen's Axiom: When all else fails, read the instructions.
Old 5th May 2005, 17:36   #29  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
@yaz: Unfortunately, as far as I know, each DVD has its own palette, so predefined schemes won't work. One thing that could be done is to redo the binarization part of the algorithm. Right now it takes at most 4 out of 16 colors into consideration. Maybe some tolerance in color is also necessary. Another solution would be to click manually on each color, just as with hard-subbed videos. However, this is not exactly my area; I only work on the hard-subbed videos part and just occasionally dabble elsewhere, when it is easy enough to understand.
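
As a rough illustration of "some tolerance in color", here is a Python sketch that binarizes a palette-indexed image by treating every palette entry close enough to a chosen text color as text. It is not SubRip's binarization; the function names, the RGB distance measure and the tolerance value are assumptions.

```python
def text_indices(palette, text_colour, tolerance):
    """Palette indices whose RGB colour lies within `tolerance` of text_colour."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    limit = tolerance ** 2
    return {i for i, rgb in enumerate(palette) if dist2(rgb, text_colour) <= limit}

def binarize(pixels, palette, text_colour, tolerance=0):
    """Map a palette-indexed image to 1 (text) / 0 (background)."""
    keep = text_indices(palette, text_colour, tolerance)
    return [[1 if p in keep else 0 for p in row] for row in pixels]

# Made-up example palette: black, white, light grey outline, mid grey background.
palette = [(0, 0, 0), (255, 255, 255), (235, 235, 235), (128, 128, 128)]
row = [[0, 1, 2, 3]]
print(binarize(row, palette, (255, 255, 255)))                # exact match only
print(binarize(row, palette, (255, 255, 255), tolerance=40))  # light grey counts as text too
```

With tolerance 0 only the exact text color survives; raising it lets a "soft" light-grey outline be folded into the letter shape instead of fragmenting it.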

Last edited by ai4spam; 5th May 2005 at 18:27.
Old 5th May 2005, 22:57   #30  |  Link
masken
uhm... ?
 
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
@ai4spam, sent u a pm

Also came up with another feature that really should be in SubRip: a built-in UnRAR.dll routine. SubRip really should be able to unRAR the .sub in an .idx/.sub set, as it's very popular to RAR the .sub when you back up as XviD.
Old 6th May 2005, 08:54   #31  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
I can't access your website; please mail me the subs instead (@gmail).
unRAR is not on my list of priorities; there are tons of other improvements that are more important.
Old 6th May 2005, 10:21   #32  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Beta 10 is up on the website.
Changes:
- speedup in avi files (but still a lot to do)
- moved all avi stuff to a new window, changed some things in the GUI
- added Ctrl-Enter for using best guess in OCR
- added Pause/Abort button in color change dialog
- possibly solved the duplicate subs problem (can't test, someone mail me some problem subs @gmail)

@masken: shortcuts are already available for formatting, use Ctrl-I, Ctrl-B and so on, while the cursor is inside the edit box. Can't use Alt- because that shifts the focus away from the edit box. Besides, Ctrl- is the standard key used in other applications (Word, etc.).

Last edited by ai4spam; 6th May 2005 at 18:14.
Old 6th May 2005, 12:45   #33  |  Link
yaz
n00b ever
 
Join Date: May 2002
Posts: 627
Quote:
Originally posted by ai4spam
... as far as I know, each DVD has its own palette, so predefined schemes won't work ...
yep ... that's what i've been afraid of
thx for the new release (it's getting better and better)
y
Old 7th May 2005, 12:47   #34  |  Link
masken
uhm... ?
 
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
@ai4spam, would it be possible to add ToolTip texts to the shortcuts (Ctrl + I) etc? Perhaps also underline the letter for each accelerator indicator (Italic etc)?
Old 8th May 2005, 09:52   #35  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Beta 11 is available now. Tons of bugfixes and improvements, including some of the things requested here.
Video file support is about as fast and as good as I can make it. Of course, additional heuristics are possible, but I probably won't be implementing them (no time). I'll only deal with bug reports and some features that I think are worth implementing, but I'll be happy to explain what I did if someone wants to continue.
It still misses some subtitles sometimes, in very hard cases (like, a sub disappears, and the very next frame, everything is white where the subtitles were). This doesn't happen very often (2 times in the movie I tested). When it does happen, stop processing, go to the first frame of the subtitle and play with the settings until it looks just right. That solved the problem in my case.
It often confuses italic and regular characters, so just make everything regular if possible; otherwise you'll need to crank up the OCR sensitivity and pretty much end up typing the whole subtitle.
Old 9th May 2005, 18:28   #36  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
A request I received via email, and my reply:

Quote:
ai4spam, thanks for all the improvements to SubRip!
They are appreciated by me and many others.

Would it be possible to modify the "Open Hard Subbed Video Files" feature so that you could open VOBs on a DVD? You would choose VTS_01_1.vob on the DVD and it would automatically open VTS_01_1.vob, VTS_01_2.vob, etc. (kind of like the "Open VOB(s)" feature but for hardcoded subtitles).

I have some DVDs with hardcoded subtitles that I would like to rip.
Hmm, I'm not sure, but it should work already: if you have the MPEG2 video splitter/filters installed (i.e., if you can play DVDs in MediaPlayer and other generic players like ViPlay, but NOT DVD players like CyberLink PowerDVD), then it should simply open them. It won't be able to detect the frame rate, so you'll have to input it manually. Also, you'll need to resize the video window so that you can see the subtitles in it.
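
The manually entered frame rate matters because it is what turns frame numbers into subtitle times. A minimal Python sketch of that conversion into the .srt timestamp layout (HH:MM:SS,mmm) follows; `frame_to_srt_time` is a hypothetical helper, not a SubRip function.

```python
def frame_to_srt_time(frame, fps):
    """Convert a frame number to an SRT timestamp, given the frame rate."""
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"

print(frame_to_srt_time(1500, 25))  # 1500 frames at 25 fps is exactly one minute
```

A wrong frame rate scales every timestamp by the same factor, so the subtitles drift further out of sync the longer the movie runs.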

Please try it out and let me know if it works. I don't know how it deals with encryption; try opening the movie with a DVD player first, to authenticate. Also, the MPEG2 splitter may only be able to deal with separate VOBs, not with an entire IFO, so you may need to open them one by one and join the subs manually later.
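
If the VOBs do have to be opened one by one, finding the files that belong to the same title is simple. The following Python sketch lists the consecutive VTS_xx_N.vob segments starting from the one the user picked; `vob_set` and its `exists` parameter are hypothetical and only illustrate the file naming scheme, not anything SubRip does.

```python
import re
from pathlib import Path

def vob_set(first_vob, exists=Path.exists):
    """Return the chosen VOB followed by the consecutively numbered ones that exist."""
    first = Path(first_vob)
    m = re.fullmatch(r"(VTS_\d{2}_)(\d+)\.vob", first.name, re.IGNORECASE)
    if not m:
        return [first]  # not a VTS_xx_N.vob name; nothing to enumerate
    prefix, number = m.group(1), int(m.group(2))
    found = []
    while True:
        candidate = first.with_name(f"{prefix}{number}{first.suffix}")
        if not exists(candidate):
            break  # stop at the first missing segment
        found.append(candidate)
        number += 1
    return found

# Usage with a fake directory listing, so the sketch runs without a DVD:
names = {"VTS_01_1.VOB", "VTS_01_2.VOB", "VTS_01_4.VOB"}
print([p.name for p in vob_set("VIDEO_TS/VTS_01_1.VOB", exists=lambda p: p.name in names)])
```

The subtitles ripped from each segment would still need their frame numbers offset by the length of the preceding segments before being joined.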

On another note, I just tried an .ogm file, and although it works with MediaPlayer, it refuses to work within SubRip. I guess it's Delphi's fault, I'll try to find another way to deal with this. SubtitleWorkshop works fine, so I guess there is a solution. I wrote DeK about it, waiting for his reply.

-ai4spam

Last edited by ai4spam; 9th May 2005 at 18:42.
Old 10th May 2005, 06:37   #37  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Beta 12 is available now.

Changes:
- Introduced the "extend right and left" feature, which allows you to take disjoint characters as one. Technical details follow. It only lets you do it once, so you can't get more than 2 disjoint parts together. Don't abuse it: once you designate a character as part of such a pair, it will always be considered as such, and all subsequent pairs which contain it will need to be typed in manually. For example, if you set "in" as a group of letters, any time "i" shows up, SubRip will ask you for the pair of characters that "i" is part of, like "it", "im" and so on. This behavior is flipped if the first part of a pair ("i") is encountered at the end of a line, at which point it is overwritten as a single char, and all subsequent pairs will not be recognized until another flip is performed by setting it again as part of a pair.

I know, this sounds complicated, but I think it's easy to use. If someone from Israel can please test this with right-to-left processing, it would be great to get some feedback (I did my best, but I have no idea if it does what it's supposed to do).

- Improved the video recognizing routine, and added an option to draw lines between the text lines, to erase stray points. Its parameters are set manually, but if applied, it greatly improves the accuracy of the results. It pretty much solved the problems I had before, but it's still best to use just plain characters, instead of trying to use italics and other styles.

As specified in my previous post, the only other thing I think is worth doing right now is finding another way to play videos, to support all the formats (.ogm, .vob and so on). If I find the appropriate component, I will put it in. Otherwise, I won't have much time for other features.

While the routines are not as advanced as in AviSubDetector, and some manual tweaking is required (for example when the subtitle color and/or position change), I think the ease of use will make SubRip a good choice for ripping subtitles from hard-subbed videos.

It would be nice if someone made a tutorial on how to use the new features. I really don't have the time, and I think that if someone else makes it, it would be an opportunity to check thoroughly for functional problems. Any volunteers? Please send me an email or reply here.

Last edited by ai4spam; 10th May 2005 at 07:43.
Old 11th May 2005, 11:29   #38  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Quote:
Originally posted by ai4spam

Hmm, I'm not sure, but it should work already: if you have the MPEG2 video splitter/filters installed (i.e., if you can play DVDs in MediaPlayer and other generic players like ViPlay, but NOT DVD players like CyberLink PowerDVD), then it should simply open them. It won't be able to detect the frame rate, so you'll have to input it manually. Also, you'll need to resize the video window so that you can see the subtitles in it.
Well, DVDs seem to work (if you have the filters installed, as specified above, for example from http://sourceforge.net/projects/gplmpgdec ); you only need to set up the MCI extensions for .vob and .ifo to be handled by MPEGVideo2, as shown in
http://www.atlasti.com/mcisetup.shtml
.ogm files would work also, if you register the .ogm extension to MPEGVideo. Unfortunately, even if you uncheck the OGM filter option "seek only to keyframes", it won't seek to a regular frame.

The best way to use SubRip is still to rip the DVD to an .avi or a .mpg first.

Last edited by ai4spam; 11th May 2005 at 11:44.
Old 11th May 2005, 11:36   #39  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Beta 13 is available on the official site.
Changes: minor improvement when filling "open areas"; new areas are filled when drawing the lines in the spaces between subtitles.
Since nobody seems to find any serious bugs, chances are this will be declared 1.2 Final.

Last edited by ai4spam; 11th May 2005 at 11:40.
Old 11th May 2005, 20:18   #40  |  Link
ai4spam
Programmer
 
Join Date: Sep 2003
Posts: 382
Continuation on ripping VOBs (question received by mail and my reply):

Quote:
Hello,

These are the steps I am using to open VOB:

1) File -> Open Hard Subbed Video Files
2) Select VTS_01_1.VOB
3) I get "Cannot detect frame rate" and Input frame rate value
4) I enter frame rate value and click OK
5) I get "File opened in MediaPlayer mode. Move all windows away from the subtitle region."
6) The Video File Viewer opens but the window is all black (no picture)

Am I doing anything wrong? I installed gplmpgdec and added MCI extensions for .vob and .ifo in the registry.
Well, I'm in the process of looking for a better MediaPlayer. I'm not sure whether or not my search will be successful.
As for playing VOBs, I only tried with my AniMusic DVD, which is not encrypted. Try ripping the VOBs first with a DVD ripper, and open the decrypted VOBs from your hard drive. Also, I have not actually checked whether the GPL MPEG2 decoder works. Can you please confirm whether you're able to play VOBs using mplayer.exe or mplay32.exe? If they don't work with the GPL decoder, then neither does SubRip.

I should mention that I don't know whether decrypted VOBs really work, or if they behave like OGM files and only seek to keyframes (I was too lazy to copy the files from the DVD to my HDD). Please try ripping and then let me know. Basically, you should be able to use the edit control to change the frame number, and see changes on the screen.

If you're in a hurry, the quick and dirty solution is to use a 1-pass DVD2AVI or something similar and then run SubRip on the resulting AVI.

Last edited by ai4spam; 11th May 2005 at 20:55.