Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.
4th May 2005, 20:25 | #21
Programmer
Join Date: Sep 2003
Posts: 382
Thanks everybody for the replies.
@Endzeit: Thanks for the compliment. AviSubDetector ( http://forum.doom9.org/showthread.php?s=&threadid=89802 ) does the same thing and has TONS of options, but it is hard to use (the interface is just horrendous) and makes lots of mistakes. SubRipAvi will hopefully work just as well or better, and be easier to use.
@niamh: By default, pressing Enter in the edit box enters an empty character, which you can use to skip over characters that are too small and the like. I guess I could add a checkbox so you can choose the behavior you want. About that .idx file, try not using all the colors; sometimes it helps.
@smiller667: The beta version is already up on the official site mentioned in the first post. Now I'm working on improving the speed and accuracy of the detection.
@castellanos: Use the "Post-OCR correction" facility in the text window (there's a button and a menu option, your choice). You can also use URUSoft's Subtitle Workshop ( http://www.urusoft.net/downloads.php?lang=1 ) to correct many other OCR mistakes. You can even use the spell checker in MS Word for that (Subtitle Workshop can do it for you).
@E-male: There is a subtitle and logo remover filter for VirtualDub; check out http://www.compression.ru/video/subt...msu_delogo.zip
Last edited by ai4spam; 4th May 2005 at 20:41.
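The rules behind SubRip's "Post-OCR correction" aren't spelled out in this thread. As a rough illustration of how such a pass can work, here is a minimal Python sketch that applies a table of regex substitutions in order. The entries in `RULES` are hypothetical examples of common OCR confusions, not SubRip's or Subtitle Workshop's actual rule lists.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
# These are illustrative OCR confusions, not SubRip's built-in rules.
RULES = [
    (re.compile(r"\bl\b"), "I"),                  # a lone lowercase L is usually a capital I
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),   # digit zero between letters -> letter o
    (re.compile(r" +([,.!?])"), r"\1"),           # drop spaces before punctuation
]

def post_ocr_correct(line: str) -> str:
    """Apply every rule in order and return the corrected line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

print(post_ocr_correct("l think he w0n ."))  # I think he won.
```

Keeping the rules as data rather than code is what makes a user-editable "script file" of corrections possible.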
4th May 2005, 23:32 | #23
uhm... ?
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
Quote:
By accelerator indicators, I mean in the OCR-process window, where you work on OCR'ing the subs. It would be very nice to be able to do the whole OCR using just the keyboard and accelerator keys (e.g., Alt+I to toggle Italics), i.e., no mouse operations and no tabbing four times + Space to reach the Italics checkbox.
About "hard cases": I've previously sent Zuggy examples of vobsubs that have been hard or impossible to OCR with SubRip. If there's still a need for such examples, I can send more.
BTW, I forgot the most sought-after feature suggestion of them all: the "Expand/Reduce selection" feature, similar to the one in Gabest's SubResync, for extending/reducing the selection in the image/letter being OCR'd.
Oh, I came up with yet another one: how about an "Only allow whole words to be formatted" option in the post-OCR correction settings? When working with formatted subs (italics etc.), I often find that some letters are OCR'd with the wrong formatting, which makes the subs quite a lot larger (and harder to read and fix after OCR) because of all the <i></i>'s. If formatting could be limited to whole words instead of single letters, this problem would be gone. Also note that Subtitle Workshop can't handle this: SW only handles formatting on a per-dialogue basis (and actually reduces the formatting to that level if you open and save a sub in SW).
EDIT #4: Another, low-priority feature suggestion is Regular Expression support in the post-OCR correction. It would be neat to have a script file, much like in SW, where you could specify how the post-OCR correction should be performed using regexes, i.e., an open format for post-OCR correction (including the default rules). Low priority for me, as this feature is already available in SW.
Last but not least, a HUGE THANK YOU for keeping up development of the absolutely best OCR application available for subtitles!
Last edited by masken; 5th May 2005 at 00:01.
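A minimal sketch of the "whole words only" idea, assuming the OCR output uses `<i>...</i>` tags and nothing else: strip the tags while remembering which letters were italic, then italicize a word only if more than half of its letters were, merging neighbouring italic words into one tag. The majority rule and the name `whole_word_italics` are illustrative choices, not an existing SubRip option.

```python
import re

TAG = re.compile(r"</?i>")

def _close(out: list[str]) -> None:
    """Insert </i> before any trailing whitespace so the tag hugs the last italic word."""
    at = len(out) - 1 if out and out[-1].isspace() else len(out)
    out.insert(at, "</i>")

def whole_word_italics(line: str) -> str:
    """Re-tag <i>...</i> so italics cover whole words, decided by majority of letters."""
    # Pass 1: strip the tags, remembering for each visible char whether it was italic.
    chars: list[str] = []
    flags: list[bool] = []
    italic, pos = False, 0
    for m in TAG.finditer(line):
        for ch in line[pos:m.start()]:
            chars.append(ch)
            flags.append(italic)
        italic = m.group() == "<i>"
        pos = m.end()
    for ch in line[pos:]:
        chars.append(ch)
        flags.append(italic)
    text = "".join(chars)

    # Pass 2: a word is italic if more than half of its chars were; adjacent italic words share a tag.
    out: list[str] = []
    inside = False
    for m in re.finditer(r"\S+|\s+", text):
        tok = m.group()
        if not tok.isspace():
            votes = flags[m.start():m.end()]
            want = sum(votes) * 2 > len(votes)
            if want and not inside:
                out.append("<i>")
                inside = True
            elif not want and inside:
                _close(out)
                inside = False
        out.append(tok)
    if inside:
        _close(out)
    return "".join(out)

print(whole_word_italics("<i>Hel</i>lo <i>world</i>"))  # <i>Hello world</i>
```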
5th May 2005, 00:06 | #24
Registered User
Join Date: Oct 2001
Posts: 1,125
@ai4spam: thanks, looks promising. I'll report back.
@niamh: The subs are not that hard to OCR: edit the colours to remove all background and antialiasing colours, and set the OCR difference threshold a bit lower, perhaps 700. Distinguishing italics from plain text is tricky, though. I'd probably ignore all text attributes and set the few lines in italics manually afterwards.
Steve
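SubRip's actual difference metric, and the scale of its threshold (the 700 above), aren't documented here. Purely to illustrate the idea of a difference threshold, this sketch treats each glyph as a small bitmap, scores a candidate by counting mismatched pixels, and accepts the closest known glyph only if its score is within `threshold`. In this toy metric, lowering the threshold makes automatic matches stricter. `difference` and `recognize` are hypothetical names.

```python
from typing import Optional

Bitmap = tuple[str, ...]  # rows of '#' (ink) and '.' (background)

def difference(a: Bitmap, b: Bitmap) -> int:
    """Count mismatched pixels; bitmaps of different shape get an effectively infinite score."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 10**9
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(glyph: Bitmap, known: dict[Bitmap, str], threshold: int) -> Optional[str]:
    """Return the text of the closest known glyph, or None if nothing is within threshold."""
    best_text, best_diff = None, threshold + 1
    for bitmap, text in known.items():
        d = difference(glyph, bitmap)
        if d < best_diff:
            best_text, best_diff = text, d
    return best_text
```

A `None` result corresponds to the case where the user has to type the character manually.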
5th May 2005, 03:56 | #25
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
I have tried your program to extract some hard-burned subtitles from an AVI (as I did with ASD, another great tool) and I have the same problems described in this thread:
---> http://forum.doom9.org/showthread.php?threadid=93986
Any way to solve this? Thanks.
devil (johner)
5th May 2005, 10:33 | #26
Programmer
Join Date: Sep 2003
Posts: 382
@masken: Great suggestions. I will work on them (the expand/reduce selection is next on the agenda). Whole words may take a while, since all the supported output formats are handled independently. Please do email me some hard cases, with a line or two of explanation.
@johner23: Yes, SubRipAvi already has duplicate removal, but only for .avi subtitles. It removes subs that are IDENTICAL and 1 frame apart. I guess I should also remove subs that are 0 frames apart. It'll be trivial to make it work for DVD subtitles too.
Last edited by ai4spam; 5th May 2005 at 10:44.
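The post doesn't say exactly how SubRipAvi measures "frame distance". Assuming it means the start frame of the next subtitle minus the end frame of the previous one, a minimal sketch of the merge described above (identical text, 0 or 1 frames apart) could look like this. `Sub` and `merge_duplicates` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Sub:
    start: int  # first frame
    end: int    # last frame
    text: str

def merge_duplicates(subs: list[Sub], max_gap: int = 1) -> list[Sub]:
    """Merge consecutive subs with identical text that are at most max_gap frames apart."""
    merged: list[Sub] = []
    for sub in sorted(subs, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev is not None and prev.text == sub.text and 0 <= sub.start - prev.end <= max_gap:
            prev.end = max(prev.end, sub.end)  # extend the earlier sub instead of keeping a duplicate
        else:
            merged.append(Sub(sub.start, sub.end, sub.text))  # copy so the input list is untouched
    return merged
```

With `max_gap=1`, both the 1-frame and the 0-frame cases mentioned above are merged.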
5th May 2005, 15:36 | #27
n00b ever
Join Date: May 2002
Posts: 627
Yep... that 'whole word only' thing would be great. Not a big deal, as it can easily be corrected even with the most basic editor, but quite annoying when it occurs.
Another thing: I've never been able to define a good color palette, so I don't know how to cope with 'soft-looking' subs, you know, white letters outlined with light grey on a grey background, or the like. Ripping such subs practically means retyping them all, as SubRip very frequently skips complete lines. So, would it be possible to implement something like 'loading predefined color schemes' (say, black and white, or so)? It'd be quite beneficial for simpletons like me.
Thx a lot for your work,
y
5th May 2005, 17:36 | #29
Programmer
Join Date: Sep 2003
Posts: 382
@yaz: Unfortunately, as far as I know, each DVD has its own palette, so predefined schemes won't work. One thing that could be done is to redo the binarization part of the algorithm: right now it takes at most 4 out of 16 colors into consideration, and maybe some color tolerance is also necessary. Another solution would be to click manually on each color, just as with hard-subbed videos. However, this is not exactly my area; I only work on the hard-subbed videos part and just occasionally dabble elsewhere, when it is easy enough to understand.
Last edited by ai4spam; 5th May 2005 at 18:27.
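A rough sketch of binarization with a color tolerance, assuming the subpicture has already been decoded into palette indices and the user has picked the text color(s). Every palette entry within `tolerance` of a picked color counts as text, so two near-identical greys are treated alike. Euclidean RGB distance is an arbitrary choice here. This is not SubRip's algorithm, and `binarize` and its parameters are hypothetical.

```python
Color = tuple[int, int, int]

def color_distance(a: Color, b: Color) -> float:
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def binarize(pixels: list[list[int]], palette: list[Color],
             text_colors: list[Color], tolerance: float) -> list[str]:
    """Map palette-indexed pixels to rows of '#' (text) and '.' (background)."""
    # Decide once per palette entry, then look the answer up for every pixel.
    is_text = [any(color_distance(c, t) <= tolerance for t in text_colors) for c in palette]
    return ["".join("#" if is_text[i] else "." for i in row) for row in pixels]
```

With `tolerance=0` this reduces to the exact-color selection of picking colors by hand.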
5th May 2005, 22:57 | #30
uhm... ?
Join Date: Oct 2001
Location: Gothenburg, Sweden
Posts: 281
@ai4spam: sent you a PM.
I also came up with another feature that really should be in SubRip: a built-in UnRAR.dll routine. SubRip really should be able to unRAR the .sub of an .idx/.sub set, as it's very common to RAR the .sub when you back up your movies as XviD.
6th May 2005, 10:21 | #32
Programmer
Join Date: Sep 2003
Posts: 382
Beta 10 is up on the website.
Changes:
- speedup in avi files (but still a lot to do)
- moved all avi stuff to a new window, changed some things in the GUI
- added Ctrl-Enter for using best guess in OCR
- added Pause/Abort button in color change dialog
- possibly solved the duplicate subs problem (can't test; someone please mail me some problem subs @gmail)
@masken: Shortcuts are already available for formatting: use Ctrl-I, Ctrl-B and so on while the cursor is inside the edit box. I can't use Alt- because that shifts the focus away from the edit box. Besides, Ctrl- is the standard key used in other applications (Word, etc.).
Last edited by ai4spam; 6th May 2005 at 18:14.
8th May 2005, 09:52 | #35
Programmer
Join Date: Sep 2003
Posts: 382
Beta 11 is available now. Tons of bugfixes and improvements, including some of the things requested here.
Video file support is about as fast and as good as I can make it. Of course, additional heuristics are possible, but I probably won't be implementing them (no time). I'll only deal with bug reports and some features that I think are worth implementing, but I'll be happy to explain what I did if someone wants to continue.
It still sometimes misses subtitles in very hard cases (e.g., a sub disappears, and in the very next frame everything is white where the subtitle was). This doesn't happen very often (twice in the movie I tested). When it does happen, stop processing, go to the first frame of the subtitle and play with the settings until it looks just right. That solved the problem in my case.
It often confuses italics and regular characters, so just make everything regular if possible; otherwise you'll need to crank up the OCR sensitivity and will pretty much end up typing the whole subtitle.
9th May 2005, 18:28 | #36
Programmer
Join Date: Sep 2003
Posts: 382
A request I received via email, and my reply:
Quote:
Please try it out and let me know if it works. I don't know how it deals with encryption; try opening the movie with a DVD player first, to authenticate. Also, the MPEG2 splitter may only be able to deal with separate VOBs, not with an entire IFO, so you may need to open them one by one and join the subs manually later.
On another note, I just tried an .ogm file, and although it works with Media Player, it refuses to work within SubRip. I guess it's Delphi's fault; I'll try to find another way to deal with this. Subtitle Workshop works fine, so I guess there is a solution. I wrote to DeK about it and am waiting for his reply.
-ai4spam
Last edited by ai4spam; 9th May 2005 at 18:42.
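If the VOBs have to be opened one by one, each part's subtitle times will presumably start from zero, so joining them means shifting every cue by the total duration of the parts before it. A minimal sketch, assuming times in seconds and known per-VOB durations. `join_vob_subs` is a hypothetical helper, not part of SubRip.

```python
Cue = tuple[float, float, str]  # (start seconds, end seconds, text)

def join_vob_subs(parts: list[tuple[float, list[Cue]]]) -> list[Cue]:
    """parts: (vob_duration_seconds, cues timed from that VOB's start), in playback order."""
    joined: list[Cue] = []
    offset = 0.0
    for duration, cues in parts:
        # Shift this part's cues by the running total of earlier VOB durations.
        joined.extend((start + offset, end + offset, text) for start, end, text in cues)
        offset += duration
    return joined
```

The durations must be the real playback lengths of each VOB; otherwise every later subtitle drifts by the error.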
10th May 2005, 06:37 | #37
Programmer
Join Date: Sep 2003
Posts: 382
Beta 12 is available now.
Changes:
- Introduced the "extend right and left" feature, which lets you treat disjoint characters as one. Technical details follow. It only lets you do it once, so you can't join more than 2 disjoint parts. Don't abuse it: once you designate a character as part of such a pair, it will always be treated as such, and all subsequent pairs that contain it will need to be typed in manually. For example, if you set "in" as a group of letters, any time "i" shows up, SubRip will ask you for the pair of characters that "i" is part of, like "it", "im" and so on. This behavior is flipped if the first part of a pair ("i") is encountered at the end of a line: at that point it is overwritten as a single char, and all subsequent pairs will not be recognized until another flip is performed by setting it again as part of a pair. I know this sounds complicated, but I think it's easy to use. If someone from Israel could test this with right-to-left processing, it would be great to get some feedback (I did my best, but I have no idea if it does what it's supposed to do).
- Improved the video recognition routine, and added an option to draw lines between the text lines, to erase stray points. Its parameters are set manually, but when applied, it greatly improves the accuracy of the results. It pretty much solved the problems I had before, but it's still best to use just plain characters instead of trying to use italics and other styles.
As specified in my previous post, the only other thing I think is worth doing right now is finding another way to play videos, to support all formats (.ogm, .vob and so on). If I find the appropriate component, I will put it in. Otherwise, I won't have much time for other features.
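The pairing rules above can be modelled as a small lookup table. This is a hypothetical sketch of the described behavior, not SubRip's data structure. Glyph keys are plain strings standing in for bitmaps. A "pair head" is always combined with the next glyph, an unknown pair returns `None` (the user must type it), and a head found at the end of a line is re-read as a single character, which flips it back.

```python
from typing import Optional

class GlyphTable:
    """Hypothetical model: single glyphs plus 'pair heads' that always join the next glyph."""

    def __init__(self) -> None:
        self.single: dict[str, str] = {}             # glyph -> text
        self.pair_heads: set[str] = set()            # glyphs that currently start a pair
        self.pairs: dict[tuple[str, str], str] = {}  # (head, next glyph) -> text

    def set_single(self, glyph: str, text: str) -> None:
        self.single[glyph] = text

    def set_pair(self, head: str, nxt: str, text: str) -> None:
        self.pair_heads.add(head)
        self.pairs[(head, nxt)] = text

    def read_line(self, glyphs: list[str]) -> list[Optional[str]]:
        """Return recognized text per unit; None marks a unit the user must type."""
        out: list[Optional[str]] = []
        i = 0
        while i < len(glyphs):
            g = glyphs[i]
            if g in self.pair_heads and i + 1 < len(glyphs):
                out.append(self.pairs.get((g, glyphs[i + 1])))  # unknown pair -> None
                i += 2
            else:
                if g in self.pair_heads:
                    # A head at the end of a line is re-read as a single char,
                    # which flips it back until it is set as a pair head again.
                    self.pair_heads.discard(g)
                out.append(self.single.get(g))
                i += 1
        return out
```

After `set_pair("i", "n", "in")`, any "i" followed by a glyph other than "n" comes back as `None`, which matches the warning above about every later pair having to be typed in manually.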
While the routines are not as advanced as in AviSubDetector, and some manual tweaking is required (for example when the subtitle color and/or position changes), I think the ease of use will make SubRip a good choice for ripping subtitles from hard-subbed videos. It would be nice if someone made a tutorial on how to use the new features. I really don't have the time, and I think that if someone else makes it, it would be an opportunity to check thoroughly for functional problems. Any volunteers? Please send me an email or reply here.
Last edited by ai4spam; 10th May 2005 at 07:43.
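The Beta 12 option to "draw lines between the text lines" has manually set parameters, and its exact behavior isn't described here. One plausible reading is sketched below under that assumption. First, find rows that lie between two text lines and contain almost no ink, using a per-row ink count (`suggest_gap_rows`). Then overwrite those rows with background so that stray points there disappear (`erase_between_lines`). Both function names and the `max_ink`/`width` parameters are hypothetical.

```python
def suggest_gap_rows(bitmap: list[str], max_ink: int) -> list[int]:
    """Rows with at most max_ink '#' pixels that lie between the first and last text rows."""
    ink = [row.count("#") for row in bitmap]
    text_rows = [y for y, n in enumerate(ink) if n > max_ink]
    if not text_rows:
        return []
    first, last = text_rows[0], text_rows[-1]
    return [y for y in range(first + 1, last) if ink[y] <= max_ink]

def erase_between_lines(bitmap: list[str], gap_rows: list[int], width: int = 1) -> list[str]:
    """Overwrite `width` rows starting at each gap row with background ('.')."""
    rows = [list(r) for r in bitmap]
    for top in gap_rows:
        for y in range(top, min(top + width, len(rows))):
            rows[y] = ["."] * len(rows[y])
    return ["".join(r) for r in rows]
```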
11th May 2005, 11:29 | #38
Programmer
Join Date: Sep 2003
Posts: 382
Quote:
http://www.atlasti.com/mcisetup.shtml
.ogm files would also work if you register the .ogm extension to MPEGVideo. Unfortunately, even if you uncheck the OGM filter option "seek only to keyframes", it won't seek to a regular (non-key) frame. The best way to use SubRip is still to rip the DVD to an .avi or .mpg first.
Last edited by ai4spam; 11th May 2005 at 11:44.
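This is why a keyframe-only seek is a problem for SubRip: it can only land on the last keyframe at or before the requested frame, and reaching the exact frame means decoding forward from there. A small sketch, assuming a sorted list of keyframe numbers that starts at frame 0. `keyframe_seek` and `frames_to_decode` are illustrative names.

```python
import bisect

def keyframe_seek(keyframes: list[int], target: int) -> int:
    """Frame a keyframe-only seek lands on: the last keyframe at or before target."""
    i = bisect.bisect_right(keyframes, target) - 1
    return keyframes[max(i, 0)]

def frames_to_decode(keyframes: list[int], target: int) -> int:
    """Extra frames an accurate seek must decode after jumping to that keyframe."""
    return target - keyframe_seek(keyframes, target)
```

With keyframes every 250 frames, asking for frame 260 lands on frame 250. That is exactly the wrong frame when you are stepping through subtitles one frame at a time.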
11th May 2005, 11:36 | #39
Programmer
Join Date: Sep 2003
Posts: 382
Beta 13 is available on the official site.
Changes: a minor improvement when filling "open areas": new areas are now filled when drawing the lines in the spaces between subtitles. Since nobody seems to be finding any serious bugs, chances are this will be declared 1.2 Final.
Last edited by ai4spam; 11th May 2005 at 11:40.
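The thread doesn't define "open areas". Assume they are text-colored regions connected to the edge of the subtitle area: real letters are normally enclosed by their outline, while background blobs spill off the edge. Under that assumption, a common way to remove them is a flood fill from the border. Here is a minimal sketch with a hypothetical `fill_open_areas` on a '#'/'.' bitmap, using 4-connectivity.

```python
from collections import deque

def fill_open_areas(bitmap: list[str]) -> list[str]:
    """Erase ink ('#') regions that touch the image border (4-connected flood fill)."""
    h = len(bitmap)
    w = len(bitmap[0]) if h else 0
    grid = [list(r) for r in bitmap]
    queue: deque[tuple[int, int]] = deque()
    # Seed the fill with every ink pixel on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and grid[y][x] == "#":
                grid[y][x] = "."
                queue.append((y, x))
    # Spread to all ink pixels connected to those seeds.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == "#":
                grid[ny][nx] = "."
                queue.append((ny, nx))
    return ["".join(r) for r in grid]
```

Drawing background lines first can cut a blob off from the letters it touches, which is one way the line-drawing step could expose new areas to such a fill.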
11th May 2005, 20:18 | #40
Programmer
Join Date: Sep 2003
Posts: 382
Continuation on ripping VOBs (question received by mail and my reply):
Quote:
As for playing VOBs, I only tried with my AniMusic DVD, which is not encrypted. Try ripping the VOBs first with a DVD ripper, and open the decrypted VOBs from your hard drive. Also, I have not actually checked whether the GPL MPEG2 decoder works. Can you please confirm whether you're able to play VOBs using mplayer.exe or mplay32.exe? If they don't work with the GPL decoder, then neither will SubRip. I should mention that I don't know whether decrypted VOBs really work, or whether they behave like OGM files and only seek to keyframes (I was too lazy to copy the files from the DVD to my HDD). Please try ripping and then let me know. Basically, you should be able to use the edit control to change the frame number and see the changes on screen. If you're in a hurry, the quick and dirty solution is to do a 1-pass encode with DVD2AVI or something similar and then run SubRip on the resulting AVI.
Last edited by ai4spam; 11th May 2005 at 20:55.