Doom9's Forum > General > Subtitles

Old 25th October 2011, 21:46   #101  |  Link
nautilus7
Registered User
 
Join Date: Jan 2006
Location: Athens, Greece
Posts: 1,518
I've also seen professional studio subs that use 2 dashes to distinguish each speaker in that case.
Old 26th October 2011, 01:15   #102  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
I've seen hyphens, em dashes, and double hyphens all used to separate speakers. Sometimes, though it's not common, the first speaker has no indicator at all; only the 2nd and any others further down the screen get one.

It's also common for speakers to be separated by positioning of the subtitles on the screen, so this is much more of an SRT conversion issue than ASS conversion where positioning can be preserved.

Interesting problems. I think I'm going to try to preserve the choices of the original subtitle authors where possible, but use a single hyphen the rest of the time to indicate multiple speakers after SDH removal. No one will be perfectly happy but things should work out pretty well.
Old 26th October 2011, 06:28   #103  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
Quote:
Originally Posted by Tappen View Post
but use a single hyphen the rest of the time to indicate multiple speakers after SDH removal. No one will be perfectly happy but things should work out pretty well.
give an example please of how you mean it to look, not sure what you mean. imho the one-hyphen thing should only be used when it's really one hyphen in the original subtitle layout, if possible. otherwise, the two-hyphen thing for two lines we have so far is better imho. I find that doing your own one-hyphen layout can sometimes look a bit strange compared to what the studios would do. they seem to have given more thought to when it's logical to put one hyphen, two, or none. that wouldn't be possible here.
__________________
Laptop Lenovo Legion 5 17IMH05: i5-10300H, 16 GB Ram, NVIDIA GTX 1650 Ti (+ Intel UHD 630), Windows 10 x64, madVR (x64), MPC-HC (x64), LAV Filter (x64), XySubfilter (x64) (K-lite codec pack)

Last edited by Thunderbolt8; 26th October 2011 at 06:37.
Old 26th October 2011, 13:26   #104  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
The current "Change of speaker prefix" I use is a single "-". All I'm saying is that I'm going to keep this the same. If there's a different prefix in the subs already I leave it alone.
Old 28th October 2011, 20:41   #105  |  Link
nautilus7
Registered User
 
Join Date: Jan 2006
Location: Athens, Greece
Posts: 1,518
version 1019 has some issues with SDH removal...

Code:
143
00:15:34,893 --> 00:15:37,062
[ALARM CONTINUES SCREECHING
IN DISTANCE]
isn't removed at all, probably because the brackets are not on the same line.
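
To illustrate what matching across lines could look like, here is a minimal Python sketch. It is not SubExtractor's code (none is shown in this thread), and `strip_bracketed_sdh` is a hypothetical helper. It assumes the cue's lines are joined before matching, so an opening bracket on one line can pair with a closing bracket on the next.

```python
import re

# Hypothetical sketch, not SubExtractor's implementation.
# The negated character classes also match newlines, so a bracketed
# description may span several lines of the same cue.
_SDH = re.compile(r"\[[^\[\]]*\]|\([^()]*\)")

def strip_bracketed_sdh(cue_lines):
    """Remove [..] and (..) descriptions from one cue, even across lines."""
    text = "\n".join(cue_lines)
    text = _SDH.sub("", text)
    # Drop lines that became empty after the removal.
    return [line.strip() for line in text.split("\n") if line.strip()]

print(strip_bracketed_sdh(["[ALARM CONTINUES SCREECHING", "IN DISTANCE]"]))
print(strip_bracketed_sdh(["(HORN HONKING)", "Hi, John."]))
```

An unclosed bracket is left untouched because nothing matches. The obvious risk is that ordinary dialogue in parentheses gets removed too.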
Old 28th October 2011, 23:00   #106  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
that is the same for all versions so far. Tappen said he hasn't thought of a way to remove SDH stuff which goes over two lines (without risking breaking something)
Old 28th October 2011, 23:17   #107  |  Link
nautilus7
Registered User
 
Join Date: Jan 2006
Location: Athens, Greece
Posts: 1,518
Ah, ok, missed that.
Old 29th October 2011, 00:10   #108  |  Link
mindbomb
Registered User
 
Join Date: Aug 2010
Posts: 576
neat.
Old 29th October 2011, 16:51   #109  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
would it be possible to stop the '-' hyphens from being added in the course of SDH removal when .ass output and exactly position every line are both ticked?

because then the different positions of the lines of different speakers on screen already indicate that more than 1 person is speaking at the moment. the additional hyphens are then superfluous and look strange with that kind of subtitles (the situation I am referring to is subtitles consisting of up to 3 lines which are positioned pretty much anywhere on screen)

if others have a different opinion on this, then at least having the option for this would be nice
Old 30th October 2011, 00:41   #110  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
http://www.mediafire.com/?yujdlgn86d157uk

imho it would be useful if lines which begin with '--' didn't get the additional hyphen added in case of SDH removal. currently

MAN 1 [ON RADIO]:
<i>--supported Senator Eagleman.</i>

gets changed to

<i>---supported Senator Eagleman.</i>

while that 2nd line would look just fine the way it was:

<i>--supported Senator Eagleman.</i>



now I am not sure, are there cases in which another speaker is indicated first, like

man 1: blabla
<i>--supported Senator Eagleman.</i>

which normally would get changed to (? just a guess, haven't seen such a case yet)

- blabla
<i>- --supported Senator Eagleman.</i>

but that looks somehow strange. after the proposed change, it would look like

- blabla
<i>--supported Senator Eagleman.</i>

which would look a bit strange as well. but maybe that situation doesn't really occur? at least as far as I can remember, a -- at the beginning of a line is usually only used in combination with other lines if they don't contain a hyphen at the beginning. I might be wrong though.
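
The proposed rule could be sketched roughly like this in Python. This is only an illustration under assumptions: `add_speaker_prefix` is a hypothetical name, and leading markup is assumed to be simple tags such as `<i>`. The prefix is skipped whenever the text already starts with some kind of dash.

```python
import re

# Hypothetical sketch of the rule proposed above, not SubExtractor's code.
_LEADING_TAGS = re.compile(r"^(?:<[^>]+>)*")

def add_speaker_prefix(line, prefix="-"):
    """Insert the speaker prefix after leading tags unless a dash is already there."""
    tags = _LEADING_TAGS.match(line).group(0)  # e.g. "<i>", or "" if no tags
    rest = line[len(tags):]
    # "-", "--" and an em dash (U+2014) all count as an existing indicator.
    if rest.lstrip().startswith(("-", "\u2014")):
        return line
    return f"{tags}{prefix}{rest}"

print(add_speaker_prefix("<i>--supported Senator Eagleman.</i>"))
print(add_speaker_prefix("<i>Can I wizz</i>"))
```

With this check the first line stays as it was, while the second gets a single hyphen inside the italics tag.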

Last edited by Thunderbolt8; 30th October 2011 at 00:52.
Old 30th October 2011, 01:41   #111  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
http://www.mediafire.com/?zz19kis7hcfcxe7

some inconsistency:

(HORN HONKING)
GIRL: Hi, John.

gets changed to

-Hi, John. (hyphen superfluous, but we know that problem already)

or

GIRL: <i>Can I wizz</i>
<i>on you, Wolfman?</i>
(SOFT ROMANTIC SONG PLAYING)

gets changed to

<i>-Can I wizz</i>
<i>on you, Wolfman?</i> (SRT, 1 hyphen)

{\an4\pos(377,737)}{\i1}-Can I wizz{\i0}
{\an4\pos(377,817)}{\i1}-on you, Wolfman?{\i0} (ASS, 2 hyphens; seems to depend on whether you tick exact position... or not, then the SDH line () gets inserted above that dialogue instead of below)



while

Hi, John.
JOHN: Not too good, huh?

gets changed to

Hi, John.
Not too good, huh?

instead of

Hi, John.
- Not too good, huh?

here the hyphen is actually missing (which wouldn't be bad for .ass in combination with exact position of every line as proposed some posts ago, but is bad for .srt and/or when not using exact position of every line)
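
One consistent rule that would produce the outputs expected in the three examples above, sketched in Python. It rests on assumptions: speaker labels are upper-case names followed by a colon, bracketed SDH lines have already been removed, and `strip_speaker_labels` is a hypothetical helper, not what SubExtractor does. A hyphen is added only to labelled lines, and only when the cue contains more than one speaker turn.

```python
import re

# Assumption: a speaker label is an upper-case name followed by a colon.
_LABEL = re.compile(r"^([A-Z][A-Z0-9 .'\-]*):\s*")

def strip_speaker_labels(lines, prefix="- "):
    """Drop 'NAME:' labels; mark a change of speaker only when one occurs."""
    parsed = []
    for line in lines:
        m = _LABEL.match(line)
        parsed.append((bool(m), line[m.end():] if m else line))
    # A turn starts at the first line and at every labelled line.
    turns = sum(1 for i, (labelled, _) in enumerate(parsed) if labelled or i == 0)
    return [prefix + text if labelled and turns > 1 else text
            for labelled, text in parsed]

print(strip_speaker_labels(["GIRL: Hi, John."]))
print(strip_speaker_labels(["Hi, John.", "JOHN: Not too good, huh?"]))
print(strip_speaker_labels(["GIRL: <i>Can I wizz</i>", "<i>on you, Wolfman?</i>"]))
```

A single speaker gets no hyphen at all, and a continuation line never gets one. False positives such as a dialogue line starting with "NOTE:" are the price of the simple label pattern.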

Last edited by Thunderbolt8; 30th October 2011 at 01:53.
Old 30th October 2011, 10:46   #112  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Why change that at all and not simply convert what's in there?

Also some more issues/feature requests:
  • I'm having trouble OCRing the word "figures" in italics. I can only split the word into 2 parts instead of 3, and I'm not automatically asked for a second split.

  • I can't save SRTs ripped from Vobsubs anywhere but to c:\Users\Chetwood\videos. "Store Sup File Outputs in Source Directory" only applies to SUP but not Vobsub?

  • I can't change the "OCR data file location"

  • please add an option to save to ANSI instead of UTF (a lot of standalones have problems with the latter)

  • Being able to drag and drop a sub file onto the program window would be cool

Thanks!
__________________

MultiMakeMKV: MakeMKV batch processing (Win)
MultiShrink: DVD Shrink batch processing
Official translator of the German version of DVD Shrink
Old 30th October 2011, 12:51   #113  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
Quote:
Originally Posted by Chetwood View Post
Why change that at all and not simply convert what's in there?
because these problems occur in combination with SDH removal.
Old 31st October 2011, 06:33   #114  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Right. Apparently I overlooked this. I'm still annoyed, though, that the authoring people don't just add a separate stream for this. Should be a piece of cake for them.
Old 2nd November 2011, 23:25   #115  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Chetwood:

Just split multiple times if you have to. So in the case you shared, highlight the "dot" of the i and complete the split, then start a split again and get the bottom of the i. (Sorry, I'm making people think like a programmer rather than a normal human in this case, but my long estimate of the time it'd take to code it to work the proper way makes it a low-priority issue.)

I'll make the option to save in same directory apply to sup and idx/sub next release.

Sorry I don't yet allow moving the OcrMap.bin file location. I show the location so people can back up or move it to new machines manually. I'm thinking about how to allow this to be changed safely. Different versions of Windows have really different default program data locations and security around writing files. I don't want to spend a lot of time on error handling on such a minor feature.

Good idea to allow option to save to ANSI SRT as well as UTF. I'll try to add it soon. Whatever ANSI codepage the Windows UI Culture is currently running should be ok.
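
For illustration only, the "ANSI as well as UTF" choice could be sketched like this in Python. `encode_srt` is a hypothetical helper, and the default relies on `locale.getpreferredencoding`, which on Windows normally reports the system's ANSI code page; whether that matches the UI culture mentioned above is an assumption. Characters the code page can't represent are replaced with "?" rather than raising an error.

```python
import locale

def encode_srt(text, encoding=None):
    """Encode SRT text for saving; defaults to the system's preferred encoding.

    Hypothetical sketch. Pass "utf-8" for the UTF variant or e.g. "cp1252"
    for a specific ANSI code page.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # errors="replace" turns unencodable characters into "?".
    return text.encode(encoding, errors="replace")

print(encode_srt("Übersetzer", "cp1252"))
print(encode_srt("\u266a la la", "cp1252"))
```

The bytes can then be written to a file opened in binary mode. The lossy replacement is the main drawback of ANSI output, e.g. for music-note symbols in song lyrics.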

Drag'n'drop files. Yeah maybe.

Last edited by Tappen; 3rd November 2011 at 01:56.
Old 2nd November 2011, 23:28   #116  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
Thunderbolt8: I'll look into the SDH errors soon. One thing I'm definitely going to change is to remove the added hyphens when saving to ASS if the 2 lines aren't part of the same block (in terms of position on the screen).
Old 3rd November 2011, 00:14   #117  |  Link
Thunderbolt8
Registered User
 
Join Date: Sep 2006
Posts: 2,197
how do you plan to find out whether the 2 lines are part of the same block? by distance of letters and lines?

usually, even when 2 people are standing next to each other, there's always enough space to indicate 2 different speakers. but sometimes, for example when one person is speaking off-screen or standing behind another speaker, those 2 or 3 lines of speech on screen are so close to each other that it's easy to mistake them all for lines of a single speaker. in such cases the lines of one speaker are often differentiated from the other speaker's by being italicized. so maybe italics could also be a criterion when determining whether lines belong to the same block, since narrowing the distance criterion down too far wouldn't help.
Old 3rd November 2011, 02:02   #118  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
I already break characters into rectangular blocks and OCR them separately in the code. The rule is something like "within 4 normal characters' width left or right or 2 normal characters' height up or down means it's in the same block".

If there's an error, you'll see an extra couple of hyphens occasionally. Add too many rules and it'll just make the code unfixable AND unreliable. So we'll go with what I've already got for blocks.
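
As a rough Python sketch of a rule of that shape (not the actual SubExtractor code; `Rect`, `gap` and `same_block` are hypothetical names): two rectangles count as one block when the horizontal gap between them is at most 4 character widths and the vertical gap at most 2 character heights. Reading the rule as requiring both conditions at once is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

def gap(a_min, a_max, b_min, b_max):
    """Distance between two 1-D intervals; 0 if they touch or overlap."""
    return max(0.0, max(a_min, b_min) - min(a_max, b_max))

def same_block(a, b, char_w, char_h, max_w=4, max_h=2):
    """True if b lies within max_w widths and max_h heights of a."""
    dx = gap(a.left, a.right, b.left, b.right)
    dy = gap(a.top, a.bottom, b.top, b.bottom)
    return dx <= max_w * char_w and dy <= max_h * char_h

line1 = Rect(0, 0, 100, 20)
line2 = Rect(0, 30, 100, 50)    # directly below line1
far = Rect(300, 0, 400, 20)     # far to the right of line1
print(same_block(line1, line2, char_w=10, char_h=20))
print(same_block(line1, far, char_w=10, char_h=20))
```

When saving to ASS, the added hyphens would then be kept only for lines where `same_block` is true.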
Old 3rd November 2011, 14:19   #119  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Quote:
Originally Posted by Tappen View Post
Just split multiple times if you have to. So in the case you shared, highlight the "dot" of the i and complete the split, then start a split again and get the bottom of the i. (Sorry, I'm making people think like a programmer rather than a normal human in this case, but my long estimate of the time it'd take to code it to work the proper way makes it a low-priority issue.)
MMh, gonna retry this next time it occurs. IIRC splitting it once made the item not appear again so another split was impossible. Same goes for three letters "erj" recognized as one.

I get that you want to minimize any potential troubleshooting for users, but I'd really appreciate being able to select the OcrMap.bin file location myself. I don't trust 'c:\users' or 'My documents', so I put all important files into a folder that I back up regularly. Maybe you could pop up a short message ("all changes at your own risk!") when someone tries to deviate from the default location and be done with it.

BTW, I had another char not recognized: it had the low double quotes Germans often use and looked like this: ,,e''. I had to fix it manually because SubExtractor would not accept it. Thanks again.
Old 3rd November 2011, 16:44   #120  |  Link
Tappen
Registered User
 
Join Date: Dec 2006
Posts: 196
There's an automated attempt to split every unknown character: if there's a perfect split SubExtractor won't stop and ask, it'll just do it. So sometimes even 3 sections joined together require only 1 split.

I can add the low double quotes to the character selection box if it's a common occurrence in German. I think there's an empty spot right now. Let me see if I can find the Unicode code point. I'll have to make it work like double quotes I guess.
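
For reference, the low double quote is U+201E (DOUBLE LOW-9 QUOTATION MARK), and the usual German closing quote is U+201C. As a stopgap on already-saved text, a clean-up along these lines could work. This is a hypothetical Python sketch that assumes the OCR output shows the quotes as two commas and two apostrophes, as in the example above; it has nothing to do with how the character selection box works.

```python
import re

LOW_QUOTE = "\u201e"   # „ DOUBLE LOW-9 QUOTATION MARK (German opening quote)
HIGH_QUOTE = "\u201c"  # “ LEFT DOUBLE QUOTATION MARK (usual German closing quote)

def fix_german_quotes(text):
    """Replace ',,' before a word and "''" after non-space text with real quotes."""
    text = re.sub(r",,(?=\w)", LOW_QUOTE, text)
    text = re.sub(r"(?<=\S)''", HIGH_QUOTE, text)
    return text

print(fix_german_quotes(",,e''"))
```

Players that only handle ANSI would need plain double quotes instead, which is a one-line change to the two constants.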
All times are GMT +1.