Old 26th February 2012, 20:46   #221  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Quote:
Originally Posted by ben_franklin View Post
Apparently I spoke too soon. After doing several subs almost effortlessly I tried to do the subs for the "Black Mask" bluray. They don't seem to be any different than other subs, yet I have to do identification on letters in every single sentence.....
My OCR fu is still pretty basic I'm afraid. If whatever authoring tool the bluray disc creators used for subtitles produces inconsistent characters (due to scaling most likely) then my OCR step will require a lot of manual work. This is why I spent so much time getting the manual entry user interface to be easy and efficient.

Basically I look for an exact match of the pixels between the characters in the database and the test character on the screen. If it's a bluray sup file I also shrink both the database character and the test character by a factor of 3 in the x and y dimensions to try to find an approximate match. This results in 9 smaller patterns for each database character and 9 for each test character (the shrinking can be done 9 different ways with different starting positions) for a total of 81 chances at a match. But there's nothing more complicated than that going on - I don't in any way understand the shape of the characters.
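
For anyone who wants to see the idea in code, here is a minimal Python sketch of the matching described above. It is not SubExtractor's actual code: the function names are made up, bitmaps are assumed to be rows of 0/1 pixels, and "shrinking" is assumed to mean keeping every third pixel from one of the 9 possible starting offsets, which the description does not spell out.

```python
from itertools import product


def shrink(bitmap, dx, dy, factor=3):
    """Keep every `factor`-th pixel, starting at column dx and row dy."""
    return tuple(tuple(row[dx::factor]) for row in bitmap[dy::factor])


def shrunk_variants(bitmap, factor=3):
    """All factor * factor (here 9) shrunken versions of one bitmap."""
    return [shrink(bitmap, dx, dy, factor)
            for dx, dy in product(range(factor), repeat=2)]


def matches(test, trained, factor=3, allow_shrink=True):
    """Exact pixel match first; optionally fall back to the 9 x 9 shrunken comparison."""
    if test == trained:
        return True
    if not allow_shrink:
        return False
    trained_variants = set(shrunk_variants(trained, factor))
    # 9 test variants checked against 9 trained variants: 81 chances.
    return any(v in trained_variants for v in shrunk_variants(test, factor))
```

With `allow_shrink=False` this behaves like the plain exact match; the fallback is what lets a glyph that is off by a pixel or two still hit a training, at the cost of looser matching.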
Old 12th March 2012, 08:19   #222  |  Link
nibus
Telewhining
 
Join Date: Mar 2010
Posts: 275
I've had a weird issue where the program detects all "i" characters as "¡" (the upside-down !). I can't figure out how to fix this. Is there a way to manually set a character assignment?
Old 12th March 2012, 21:34   #223  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
When doing OCR there's a button in the bottom right labelled "Review and Correct OCR Matches". Press that.

Open the OCR Training drop-down list and find the bad match between i and the upside-down ! and press "Remove a Training".

Pressing "Remove all Trainings for this Character" would also work. Assuming you don't OCR Spanish much it would probably prevent future problems since it would eliminate all matches for the upside-down ! character currently in the database.
Old 13th March 2012, 17:07   #224  |  Link
nibus
Telewhining
 
Join Date: Mar 2010
Posts: 275
I actually tried that, but strangely there is no listing for the letter i. I also tried starting from a brand new OcrMap.bin file.
Old 14th March 2012, 20:49   #225  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
If you delete your OcrMap.bin file the program will re-initialize it with the OcrMapOrig.bin file so that might explain why the 2nd fix didn't work.

The bad match would be for the upside down "!" character, not for "i". That character is probably at the beginning or end of the list - outside of the alphabet.
Old 15th March 2012, 06:16   #226  |  Link
nibus
Telewhining
 
Join Date: Mar 2010
Posts: 275
Strangely, the upside down "!" is matched correctly - I think it's the regular letter i that is being seen as the upside down "!". But like I said there is no letter i (lower case) listed, except in italics. I guess I could always just do a search and replace.


Old 17th March 2012, 07:04   #227  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Can't you just remove the training error (highlighted in the 2nd picture) and your problem is fixed? I don't understand what the problem is.
Old 17th March 2012, 08:32   #228  |  Link
nibus
Telewhining
 
Join Date: Mar 2010
Posts: 275
The training in that second screenshot isn't an error - it's correctly identified the upside down ! mark. So removing it has no effect on the incorrect training of matching a lowercase letter i with the same character. I would remove the training of the letter i with the upside down ! but it is not listed, as shown in the first screenshot.
Old 20th March 2012, 23:00   #229  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Ah, I finally see the problem. This is the same as the issue with l and I having the same bit pattern in many subtitle fonts making accurate matching impossible. I added the entire spellcheck step just to solve that issue.

I'll have to make an option in the spellcheck step to discriminate between i and ¡ to fix this. I suppose the rule is that if it's not at the beginning of a word, or just after a ¿ at the beginning of a word, I can assume it's an i (eye) and not an inverted exclamation point. Otherwise I'll have to ask and build up a dictionary of words that really begin with i. Quite a bit of work, but I'll see what I can do.
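
A rough Python sketch of that rule, purely as an illustration: `resolve_glyph` and `i_words` are hypothetical names, and nothing here reflects how the spellcheck step is really implemented. It assumes a leading ¿ is kept as part of the word, so a glyph right after it sits at index 1.

```python
def resolve_glyph(word, index, i_words):
    """Decide what the ambiguous glyph at word[index] should be.

    Returns "i" when the rule is confident, or None when the user
    would have to be asked (and the answer added to i_words).
    """
    if index > 0:
        # Mid-word, or directly after a leading '¿': assume the letter i.
        return "i"
    # At the very start of a word: only trust known words beginning with i.
    candidate = "i" + word[1:]
    if candidate.lower() in i_words:
        return "i"
    return None  # could be an inverted exclamation mark, so ask


known = {"is", "it", "in"}
print(resolve_glyph("th¡s", 2, known))   # mid-word
print(resolve_glyph("¡s", 0, known))     # known word starting with i
print(resolve_glyph("¡Hola", 0, known))  # undecided
```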

For now, I'd remove the training and when you next run the OCR choose i (eye) and not the inverted exclamation because there are likely more of the former than the latter making cleanup easier.
Old 22nd March 2012, 10:07   #230  |  Link
aMvEL
Registered User
 
Join Date: Dec 2008
Posts: 30
Any chance for a possibility to select more than one language at a time when ripping, or at least being able to filter out or prioritize languages from the Subtitle track selection list?

It would speed up the ripping for me, since I usually rip English and Norwegian subtitles. The problem is usually that the Norwegian subtitle track is near the bottom of the list which makes it tiresome when ripping several seasons of tv-series, seeing as how I need to scroll down the subtitle track list every time.
Old 22nd March 2012, 21:46   #231  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
aMvEL: you guys and your batch processing: always surprises me what my customers want to do. But this is a reasonable request, and shouldn't be too difficult, so I'll see what I can do.

I suppose some sort of option that lets you choose 2 languages to put at the top of the sorted subtitle track list would be a start. Also the list currently only shows 5 items. Perhaps I can change the layout of the dialog to make the list taller and allow more to be visible without scrolling.
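
As a sketch of the "preferred languages on top" idea (Python, with a made-up track structure; the real track list format is not shown anywhere in this thread): a stable sort keyed on the position in the preferred list moves the chosen languages up and leaves everything else in disc order.

```python
def order_tracks(tracks, preferred):
    """Put tracks in the preferred languages first, in the given order.

    `tracks` is assumed to be a list of dicts with a "lang" key.
    sorted() is stable, so all other tracks keep their original order.
    """
    rank = {lang: i for i, lang in enumerate(preferred)}
    return sorted(tracks, key=lambda t: rank.get(t["lang"], len(preferred)))


tracks = [{"lang": "fr"}, {"lang": "en"}, {"lang": "de"}, {"lang": "no"}]
print([t["lang"] for t in order_tracks(tracks, ["en", "no"])])
```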
Old 22nd March 2012, 22:08   #232  |  Link
aMvEL
Registered User
 
Join Date: Dec 2008
Posts: 30
It wouldn't have been as much of a problem if I only rip movies, but since I rip mostly tv-series with a lot of episodes, I'd like to make it as easy as possible.

Both of your suggested solutions seem excellent to me, as they would simplify things a lot.
Old 23rd March 2012, 08:12   #233  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Same here. There could be some tweaking done to streamline the process, but I thought I'd hold back with suggestions until more pressing issues (like proper vertical positioning when OCRing to ASS, or changing palettes to counter blurred outlines) are solved.
__________________

MultiMakeMKV: MakeMKV batch processing (Win)
MultiShrink: DVD Shrink batch processing
Official translator of the German version of DVD Shrink
Old 27th March 2012, 22:17   #234  |  Link
Slasher
Registered User
 
Join Date: Oct 2011
Posts: 14
Hi Tappen,

Thanks again for all your work.

I want to second Chetwood's request about proper vertical positioning when exporting to ASS. Other than this issue the app worked fine for me.
Old 28th March 2012, 00:04   #235  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Could you guys explain again what exactly you want to change with vertical positioning that isn't done when the "Keep Source Lines and Positions" option is set on the "Create Subtitle File" page? Do you just want a "left-align" or "center-align" option for the ASS tags?

I have to say the reason I wrote this app in the first place was because I didn't like the vertical positioning of DVD subtitles (too high on the screen, with too many line-breaks on 16:9 film) so for me allowing the ASS rendering software to place and line-break the subtitles wherever it wants is the main reason I use my own program. This is why it's hard for me to understand other points of view on the subject and you have to spell it out repeatedly and in simple language.

Last edited by Tappen; 28th March 2012 at 00:21.
Old 28th March 2012, 17:36   #236  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
It's like I wrote in my last mails to you: I want word-wrap and horizontal/vertical positioning to be identical to the Vobsub's:



But when outputting to ASS and having 'Keep Source Line Breaks' selected it looks like this:



The SRT has the proper horizontal placement but vertical is off and thus blocking credits:

Old 28th March 2012, 18:16   #237  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
You have to understand that ASS won't use identical fonts to the DVD subtitles so the width of a line of text won't be the same.

This means you have to choose between left-aligned and center-aligned for all text. The pictures above are left-aligned so their centers don't match. I can put in an option to center-align but then any text that is left-aligned will look strange. Early versions of SubExtractor worked that way and people reported it as a bug.
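
To put numbers on that trade-off, here is a small Python sketch. The function name and pixel values are invented for illustration; it only shows where a line ends up once the re-rendered text has a different width than the original DVD bitmap.

```python
def edges_after_rerender(src_left, src_width, new_width, align):
    """Return (left edge, center) of a line re-rendered at new_width.

    "left" keeps the original left edge; "center" keeps the original center.
    """
    if align == "left":
        left = src_left
    elif align == "center":
        left = src_left + (src_width - new_width) / 2
    else:
        raise ValueError("align must be 'left' or 'center'")
    return left, left + new_width / 2


# Original bitmap: starts at x=200, 320 px wide, so its center is at x=360.
print(edges_after_rerender(200, 320, 280, "left"))    # left edge kept, center moves
print(edges_after_rerender(200, 320, 280, "center"))  # center kept, left edge moves
```

Whichever edge is not pinned drifts by half the width difference, which is why neither option can match the source in both respects.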

I'm sorry to say you'll never be completely happy with any solution that a computer can produce.
Old 28th March 2012, 23:33   #238  |  Link
Slasher
Registered User
 
Join Date: Oct 2011
Posts: 14
Quote:
Originally Posted by Tappen View Post
I have to say the reason I wrote this app in the first place was because I didn't like the vertical positioning of DVD subtitles (too high on the screen, with too many line-breaks on 16:9 film).
Let me explain my issue better. When OCRing a bluray subtitle and outputting to ASS with the "Keep Source Lines and Positions" option enabled, the resulting ASS subtitle is too high on the screen compared to the original bluray subtitle position.

I think this is linked to the fact that when calculating the positions the program assumes 1080p video. But I want these positions adapted to 720p. Could you do that? Maybe offer an option like scaling positions to a preset set of resolutions or maybe custom resolutions? I already tried the "resample resolution" option in Aegisub but with no effect, the subtitle stays the same, even though it changes the values for positions.
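
The kind of scaling I have in mind would look roughly like this in Python. This is only a sketch of the arithmetic: `scale_pos` is a made-up name, and I am assuming the positions are plain pixel coordinates on a 1920x1080 canvas. (ASS scripts declare their coordinate space with PlayResX and PlayResY in the [Script Info] section, so those values would need to agree with whatever resolution the positions are scaled to.)

```python
def scale_pos(x, y, src=(1920, 1080), dst=(1280, 720)):
    """Scale a pixel position from the source canvas to the target canvas."""
    sx = dst[0] / src[0]
    sy = dst[1] / src[1]
    return round(x * sx), round(y * sy)


# A subtitle anchored at (960, 1000) on a 1080p canvas, mapped to 720p.
print(scale_pos(960, 1000))
```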

Last edited by Slasher; 29th March 2012 at 03:32.
Old 29th March 2012, 07:24   #239  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Quote:
Originally Posted by Tappen View Post
I'm sorry to say you'll never be completely happy with any solution that a computer can produce.
I get that. The Vobsub does not provide info on what font is used, and even if it did, it's not certain that an ASS set to that font would display identically, because there's no way of knowing how the standalone/software player will render the font.

Still, AFAIK many Vobsubs do not use a full-screen bitmap positioned at coordinates 0,0 but have a bitmap only as large as the rendered text and position that accordingly. Could SubExtractor translate this position info to the ASS? If not, it should default to horizontally centered items, since the overwhelming majority of the subs I've seen so far are centered.
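
Something along these lines is what I mean, as a Python sketch. `ass_pos_tag` is a hypothetical helper, and I am assuming the bitmap's left/top/width/height are available in the same pixel space the ASS script uses. `\an2` (bottom-center) and `\an1` (bottom-left) together with `\pos(x,y)` are standard ASS override tags; anchoring at the bottom edge of the bitmap keeps the baseline where the DVD put it even if the font height differs.

```python
def ass_pos_tag(left, top, width, height, align="center"):
    """Build an ASS override block that pins a line to a bitmap's position."""
    bottom = top + height
    if align == "center":
        # Anchor at the bottom-center of the original bitmap.
        return f"{{\\an2\\pos({left + width // 2},{bottom})}}"
    if align == "left":
        # Anchor at the bottom-left corner of the original bitmap.
        return f"{{\\an1\\pos({left},{bottom})}}"
    raise ValueError("align must be 'center' or 'left'")


# Bitmap at x=200, y=400, 320 px wide and 60 px tall.
print(ass_pos_tag(200, 400, 320, 60))
print(ass_pos_tag(200, 400, 320, 60, align="left"))
```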

I've seen very few DVDs that place color-coded items off-center close to the person speaking and only one TV show that does this so far (and this only on the US DVD, the English sub of the German DVD has centered subs only):



which looks like this in ASS:



and like this in SRT:



I agree with Slasher that scaling options would be awesome. When saving to ASS, the save dialogue would offer to pop up a preview window where the longest item is displayed and people could change size, color and position. If that is too much work, please add the option to center-align the subs (a mouseover would explain the implications so people would not report this as a bug again). Thanks!

Last edited by Chetwood; 29th March 2012 at 07:26.
Old 29th March 2012, 21:18   #240  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
SDH subtitles (for the deaf and hard of hearing) position the text near the speaker and left-align. A lot of people use this type of subtitle for various reasons and they were the ones complaining about the center-aligned text. I have a lot of sympathy for their viewpoint so I'll have to add centered as an option and leave left as the default.

In the DVD screenshots above the font used to create the DVD subtitle bitmaps was clearly unusually tall and narrow. If you changed the font used by SubExtractor to something like Arial Narrow instead of Tahoma it would probably line up better. Just a tip.