Old 12th June 2020, 02:03   #1121  |  Link
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 321
@Janusz
There's a better workaround: once the visual/waveform adjustment is complete, we can simply apply a +267 ms delay to the subtitles. But fixing the issue itself would be even better.
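
For anyone who wants to apply such a delay outside SE, here is a minimal sketch in Python. It assumes the subtitles are in SRT format, and the sample cue and the function names are made up for illustration; it is not how SE itself does it.

```python
import re

# SRT timestamps look like 00:00:19,769
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, delay_ms):
    """Return one timestamp moved by delay_ms (clamped at zero)."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + delay_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, delay_ms):
    """Shift every cue in an SRT document; only timing lines are touched."""
    lines = []
    for line in text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, delay_ms), line)
        lines.append(line)
    return "\n".join(lines)


# Hypothetical sample cue, shifted by the +267 ms mentioned above.
sample = "1\n00:00:19,769 --> 00:00:21,999\nA MARTIN SCORSESE PICTURE"
shifted = shift_srt(sample, 267)
```

Only lines containing "-->" are rewritten, so a time-like string inside the subtitle text itself is left alone.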
__________________
Magically yours
Raistlin
Old 12th June 2020, 08:47   #1122  |  Link
varekai
Registered User
 
Join Date: Jul 2006
Posts: 457
Quote:
Originally Posted by GCRaistlin View Post
BTW how do you take screenshots of SE with video displayed correctly? I get the black screen instead (that's why I used the camera for the left shot).
https://imgur.com/a/0VcVZf9
Old 12th June 2020, 10:47   #1123  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
@GCRaistlin
This is the last time I am writing about this matter:

The text "A MARTIN SCORSESE PICTURE" starts to appear on screen at frame 474, at time 00:00:19.769.
When your subtitles should be applied to the image is your problem, not SE's.
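
The frame number and the time quoted above are consistent with each other if the video runs at 24000/1001 fps (about 23.976). That frame rate is an assumption; the thread does not state it. A small Python sketch of the conversion, with a made-up function name:

```python
def frame_to_timestamp(frame, fps_num=24000, fps_den=1001):
    """Convert a frame number to HH:MM:SS.mmm, truncating to whole milliseconds.

    The default frame rate of 24000/1001 is an assumption, not taken from the video.
    """
    total_ms = frame * fps_den * 1000 // fps_num  # integer math avoids float rounding
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


stamp = frame_to_timestamp(474)
```

At exactly 24 fps the same frame would land at 00:00:19.750, so the frame rate matters when comparing frame numbers with times.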

Edit 01:
I'll tell you one more time: learn the tool. In the posts above I wrote where you should look for a solution to your problem.
You are syncing to the soundtrack, not to the video, and if the soundtrack has a noticeable delay relative to the video
you have to take this into account. SE won't do it for you.
Now that you know the exact delay for the subtitles, just fix it in the subtitles. That is what this program was created for.
Yet for several posts you have been looking for some error in the program.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 12th June 2020 at 12:52.
Old 12th June 2020, 11:08   #1124  |  Link
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 321
Quote:
Originally Posted by Janusz View Post
The text "A MARTIN SCORSESE PICTURE" begins to enter the screen from 474 frames and time 00:00:19.769.
I feel we are getting close to you understanding the problem.
__________________
Magically yours
Raistlin
Old 12th June 2020, 13:10   #1125  |  Link
varekai
Registered User
 
Join Date: Jul 2006
Posts: 457
Hello Subtitle Edit forum members!
Just wanted to warn you that GCRaistlin's image links lead to UGLY pron!
He also links to a potentially unwanted application (JS/ExAds.A).
Does anyone else find this behavior inappropriate?
Old 13th June 2020, 20:26   #1126  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
Dangerous tool: "Inspect nocr matches for ..."

In my opinion, this tool makes unauthorized changes to the contents of the character database, and I suspect the case described here is not the only one.
I wrote this text so that anyone who wants to can check the situation for themselves.

@Nikse555: If you cannot reproduce this error, I will send the files.

I do not know in which version the character database saving format changed so much that the new format cannot be read by the stable versions 3.5.14 and 3.5.15. Beta 145 no longer reads the new format either. The Subtitle Edit changelog for 3.5.16 (xth July 2020) BETA doesn't mention this. I wanted to reproduce the error described below on the stable versions, but I was unable to load the new character database into them, and with their own character databases it may not behave the same way. That does not mean there is no problem there. That is one thing; the second is the question of how the new version 3.5.16 will handle the old character database.

Description of the problem in beta 269 and several earlier ones (261 for sure):
1. I created a new character database for a new text consisting only of non-italic characters. That was my own text.
[Binary image compare threshold] = 238 (in my case this value eliminates some, though not all, joined characters),
[No of pixels is space] = 4 (the proper value for the font used in the text),
[Max wrong pixels] = 5 (maybe too strict, but I wanted it that way),
[Contains italic] = off (the text has no italics, so I see no need for another setting),
[Line split ...] = Auto (it works, so I don't change it).
Up to this point everything is fine. After correcting a few errors in the character database with a better match, I got error-free text.
Conclusion: the character database for this text is error-free and contains 242 characters (this is important).

2. Time for "Batman" - this file probably contains everything needed to find whatever may not work.
[Draw missing texts] = off (I only want to check how the new character database works with the same text (font), but also with italics),
[Max wrong pixels] = 10 (to see how it works and what mistakes it makes),
[Contains italic] = on (there are lines with italics, so, as I understand it, OCR should read italics correctly based on this parameter and [Set italic angle ...]).
You have clearly written that this parameter, and italics in general, do not work well yet, so that is not the purpose of this test either.
We look for an italic line, in my case e.g. 283. To see whether more characters can be recognized, I change [Max wrong pixels] to 25. Start, then stop immediately.
I go back to line 283: there are new characters in the line, great. Characters in the database: 242 - nothing has changed.
Note: <i> </i> tags are only added when the entire word is recognized. This does not apply to single letters such as "A", "I", the Polish "z" and probably many more in different languages - for those, the tags are added.

Now we will destroy our database:
On any line with italics we choose "Inspect nocr matches for ...". In the "Inspect items" field we select the first character at the top and use the down arrow to go down to the last character. We can move the cursor back up and watch the "Is italic" field: the check mark for italics will not appear next to any character. We choose OK and close this window.
Because it can be hard to find the line or lines that show which changes were made to the character database, it is best to run the scan again for the entire file and then search (CTRL + F) for "<i>". Then we know:
in my case "A", "you?" etc. (too many to list) have been marked with <i> </i>. The number of characters in the database is still 242, which means these characters were not added as new ones. I would say they have been marked as italic in the character database. Which characters get marked this way - I don't know. It probably depends on which characters OCR recognizes and treats as italics based on [Max wrong pixels] and [Set italic angle ...].
The effect is that from now on single italic characters will appear in the text where there are no italics. We will have to add new characters in places where characters had already been added. Each time you open the "Inspect nocr matches for ..." window, you may make further uncontrolled changes in addition to your own.
The fact is that in this way we gained, for example, an italic "A", but lost every regular "A" in archived, future and currently processed files.
I have a few more comments, but this text is already too long, so I will leave them for another occasion.
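
The search for "<i>" after a re-scan can also be scripted when the file is long. This is only a sketch in Python: it assumes the OCR result has been saved as plain text lines, and the function name, the length limit and the sample lines are made up for illustration.

```python
import re

ITALIC = re.compile(r"<i>(.*?)</i>")


def short_italics(lines, max_len=4):
    """List (line number, text) for italic tags wrapping only a few characters.

    Very short italic runs are the suspicious ones in the scenario above;
    max_len is an arbitrary cut-off, not a value taken from SE.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in ITALIC.finditer(line):
            if len(match.group(1).strip()) <= max_len:
                hits.append((number, match.group(1)))
    return hits


# Hypothetical OCR output, not taken from the Batman file.
sample = [
    "<i>A</i> long time ago",
    "Are <i>you?</i> sure",
    "<i>This whole line is italic</i>",
    "No tags here",
]
suspects = short_italics(sample)
```

Lines that are italic as a whole are skipped, so the list contains only the stray single letters and short words.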
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 13th June 2020 at 20:54.
Old 14th June 2020, 12:13   #1127  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 362
@Janusz: Yes, I've changed the .nOCR file format to be slightly more compact. SE 3.5.16 will be able to read both the old format from 3.5.15 and the new format. Version 3.5.15, however, will not be able to read the new nOCR format from 3.5.16.
nOCR now uses the "margin-top" value (useful for e.g. comma vs. apostrophe), so nOCR files from 3.5.15 and older will not work optimally.
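
To illustrate why a "margin-top" value helps with comma vs. apostrophe: the two glyphs have almost the same shape and differ mainly in how far below the top of the text line they start. The Python sketch below is a toy model under that assumption; the function name and the half-line threshold are hypothetical and are not taken from the SE source.

```python
def classify_tick(margin_top, line_height):
    """Tell an apostrophe from a comma by vertical position only.

    margin_top: pixels between the top of the text line and the top of the glyph.
    The half-line threshold is an illustrative assumption.
    """
    return "'" if margin_top < line_height / 2 else ","


high_mark = classify_tick(margin_top=2, line_height=40)
low_mark = classify_tick(margin_top=30, line_height=40)
```

A character database saved without this value gives the matcher no way to tell the two apart, which fits the remark about older nOCR files.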

Beta 276 (or later) is now here: https://github.com/SubtitleEdit/subt...leEditBeta.zip
(fixed misc minor issues - mostly regarding expanded characters)

I was not able to re-create the italic-check-issue...
Old 14th June 2020, 12:59   #1128  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
@Nikse555
I will e-mail you the full set of Polish XML files and the character database. I hope you still have the Batman files.

Edit:
With the uploaded files you can immediately check the case described
here: https://forum.doom9.org/showpost.php...postcount=1104
and my answer here: https://forum.doom9.org/showpost.php...postcount=1106
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 14th June 2020 at 13:55.
Old 14th June 2020, 13:30   #1129  |  Link
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 327
@Nikse555 Thanks for your work. I use the program pretty constantly.

Old 14th June 2020, 14:44   #1130  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 362
@Janusz: I think you have to click "Add better match" on the false italics... and add the same letter again just without italic. Or... I might be misunderstanding.

@jlw_4049: You're welcome, and thx
Old 14th June 2020, 18:02   #1131  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
@ Nikse555
Yes, that is the cure and I use it.
It's just that, like any medicine, it helps with one thing and harms another, and so it is here. I recover, for example, "A", but I lose another character. I have seen this SE behavior before, but I didn't know where it came from. I assumed it was my mistake, added the character again and all was well. For some time now I have suspected [Contains italic] of causing it, so in normal use of SE I keep this option permanently disabled. The same goes for [Fix common OCR errors - also use hard-coded rules], which I use only after OCR.
Thank you for your work and your time.

Edit:
An excellent move:
Code:
<!-- Will be used to check words not in dictionary.
If new word(s) and longer than 4 chars and exists
in spelling dictionary, it is (or they are) accepted -->
Edit 02:
Problem with "c", "w" and "." at the end of the line.
Image to download

The new character database was created during OCR with the "Draw missing texts" option enabled. Although I entered the characters correctly, the text is not displayed correctly.
The last "s" in place of the dot was matched automatically, without my involvement.
Each re-import of the image into OCR produces the effect visible in the image. The distorted text is then transferred to the main program window.

Edit 03: 15.06
Correct text can be achieved, but at what cost and for how long?
We close the "Import / OCR ..." window and import our image into the program again.
In the "Import / OCR ..." window we turn off [Draw missing texts], create a new character database and press START OCR. As a result we get only "*" characters - this result is correct.
Using "Inspect nocr matches ..." we add a better match for the first "C", "W", "." and "-".
We can press START OCR and see that everything is in place. The character database contains only the 4 characters we entered.
In the next step we enable [Draw missing texts] to enter the text faster, and add the remaining missing characters from "o" to "m".
The "Import / OCR ..." window has closed. We look at the result. It is fine.
Someone will ask: so what is my point?
This: before you press START OCR again, start watching the text that has looked good so far. First press - the second "C" has disappeared,
the next press - the first "C" is gone, one more press and we have lost the "W".
A look at the character database - we have lowercase letters instead of capital letters.

@Nikse555, please take a look at this. Somewhere there is an error responsible for this behavior of the program.

In one of my earlier posts I wrote that re-scanning the text fixes previously made mistakes.
I'm not backing away from that; it is what actually happens. In this particular case, however, it failed.

Edit 04. 16.06
Today I added a new image, "t.03.z_and_Z.png", to the "Image to download" archive. After importing the image into the program
and before scanning, I chose the "Latin" character database with [Draw missing texts] disabled. SE version 3.5.16.
First scan: "22 P*Dz!ERN!KA 2**1 YEAR"
Second scan: "22 P*DZ!ERN!KA 2**1 YEAR" - this is correct
As you can see, the small "z" has changed into a large "Z". Why is this happening?
It seems to me that [Try to guess unknown words] has gained new capabilities, and not only for English.
@ Nikse555, you and the whole team - congratulations on the release of the new stable version of the program?
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 16th June 2020 at 23:15.
Old 17th June 2020, 12:04   #1132  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 362
Quote:
Originally Posted by Janusz View Post
As you can see, the small "z" has changed into a large "Z". Why is this happening?

@ Nikse555, you and the whole team - congratulations on the release of the new stable version of the program?
Yes, thx. SE 3.5.16 is out now: https://github.com/SubtitleEdit/subtitleedit/releases
(Released a bit earlier than planned due to changed spell check dictionary links).
And nOCR would not have been released/improved without your input Janusz
By the way, your image gives a 403.

SE 3.5.16 introduces the first (non-beta) version of nOCR.
A bit like "image compare" but just with lines which makes it easier to scale and recognize different font sizes.
nOCR can also be trained with different fonts fairly easy!!!
Just tried (really fast) to make a small tutorial: https://nikse.dk/SubtitleEdit/nocr


In nOCR, casing of "z" and some other letters are determined by average size of letters... so the first few lines may be different in second run.
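
A toy model of why the first lines can come out differently on a second run, written in Python. Everything in it is an assumption made for illustration (the set of ambiguous letters, the 0.9 factor, the function names); it is not the nOCR code. The idea: letters such as "z" look the same in both cases, so their case is guessed from the average height of letters seen so far, and on the very first line of the first run there is no average yet.

```python
from typing import List, Optional, Tuple

# Letters whose upper and lower case shapes are nearly identical (illustrative set).
AMBIGUOUS = set("cosuvwxz")

Glyph = Tuple[str, float]  # (matched letter, glyph height in pixels)


def resolve_case(letter: str, height: float, avg_upper: Optional[float]) -> str:
    """Guess the case of a shape-ambiguous letter from its height."""
    if letter.lower() not in AMBIGUOUS:
        return letter
    if avg_upper is None:
        return letter.lower()  # nothing to compare against yet
    return letter.upper() if height >= 0.9 * avg_upper else letter.lower()


def ocr_pass(lines: List[List[Glyph]], avg_upper: Optional[float] = None):
    """Resolve each line using the average known from the lines before it."""
    texts, heights = [], []
    for line in lines:
        texts.append("".join(resolve_case(ch, h, avg_upper) for ch, h in line))
        heights += [h for ch, h in line if ch.isupper() and ch.lower() not in AMBIGUOUS]
        if heights:
            avg_upper = sum(heights) / len(heights)
    return texts, avg_upper


# One line where "z" is as tall as the capitals around it.
line = [("D", 20.0), ("z", 20.0), ("I", 20.0)]
first, learned = ocr_pass([line])      # first run: no average known yet
second, _ = ocr_pass([line], learned)  # second run: starts with the learned average
```

In this model the first run yields "DzI" and the second "DZI", the same kind of change as "Dz" becoming "DZ" between the two scans reported above.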
Old 17th June 2020, 13:10   #1133  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
Quote:
By the way, your image gives a 403.
My mistake. I hadn't changed the access rights, sorry. The link should work now.
Quote:
In nOCR, casing of "z" and some other letters are determined by average size of letters ... so the first few lines may be different in second run.
That's right. The first case is the first line in the text, the second one appears in the text as line 7.
Quote:
A bit like "image compare" but just with lines which makes it easier to scale and recognize different font sizes.
Probably as a result of this I could not add a large "Z" as a new character. Only loading another text ended my fight to add "Z".
So far it works well, it distinguishes [ , ] and [ ' ], well done. Thank you.
Quote:
Just tried (really fast) to make a small tutorial:
I have read it. I think the information it contains is sufficient.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 17th June 2020 at 15:19.
Old 17th June 2020, 17:08   #1134  |  Link
varekai
Registered User
 
Join Date: Jul 2006
Posts: 457
@Nikse555
Thanks for the update! Much appreciated!
Old 18th June 2020, 07:16   #1135  |  Link
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 327
Quote:
Originally Posted by Nikse555 View Post
Yes, thx. SE 3.5.16 is out now: https://github.com/SubtitleEdit/subtitleedit/releases
(Released a bit earlier than planned due to changed spell check dictionary links).
And nOCR would not have been released/improved without your input Janusz
By the way, your image gives a 403.

SE 3.5.16 introduces the first (non-beta) version of nOCR.
A bit like "image compare" but just with lines which makes it easier to scale and recognize different font sizes.
nOCR can also be trained with different fonts fairly easy!!!
Just tried (really fast) to make a small tutorial: https://nikse.dk/SubtitleEdit/nocr


In nOCR, casing of "z" and some other letters are determined by average size of letters... so the first few lines may be different in second run.
Thanks for the update. I'll grab latest version tomorrow and test it out!

Old 18th June 2020, 12:12   #1136  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,635
Quote:
Originally Posted by Nikse555 View Post
Could you provide a image/sup so I can try it?
Here you can find a good example.

Plus I did a fresh install with new stable version, deleting everything but latin.db.

Two OCR problems, which show up both during binary image compare OCR and in Fix common errors:
  • "I" becomes "i"
  • "E " becomes "Es "
To help you find where (for example):

00:09:05,712 --> 00:09:07,297 Es lei ha detto: "Bene.

01:18:53,145 --> 01:18:54,581 INDIGNAZIONE: i CINQUE MOTIVI PER CUI O.J. SIMPSON SE L' CAVATA

P.S.: It would be really nice to be able to add a manually corrected word to the dictionary during OCR, such as "AIIampanato" instead of "Allampanato". I can correct it by hand, but as it is not in the dictionary, SE will ask me about the same word again and again. I wish the two buttons "add to noise" and "add to dictionary" could take manually modified words into account too.
__________________
@turment on Telegram

Last edited by tormento; 18th June 2020 at 12:21.
Old 18th June 2020, 16:19   #1137  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
Quote:
Originally Posted by tormento View Post
P.S.: It would be really nice to be able to add a manually corrected word to the dictionary during OCR, such as "AIIampanato" instead of "Allampanato". I can correct it by hand, but as it is not in the dictionary, SE will ask me about the same word again and again. I wish the two buttons "add to noise" and "add to dictionary" could take manually modified words into account too.
At the moment you have four ways to do what you ask for:
1. Options / Settings / Word lists - here you can add any word to the dictionary with or without spelling distinction. You can add a replacement or fix any word during OCR. All in one step.
2. Use the [Unknown words] list during OCR - select any word in the list and use the buttons on the right. You can enter any words in the fields. Whatever you enter will be saved in the dictionary or in the replace list.
3. Use [Spell check] - you can enter any word in the field and use the buttons below. Unfortunately, here you cannot add words that you would like to replace with others.
4. Edit the files manually: it_names_user.xml, it_IT_UseAlways.xml, ita_OCRFixReplaceList_User.xml, ita_OCRFixReplaceList.xml. Not all four at once, of course. You change these files at your own risk.
In your case option 1 or 2 is all you need; which one depends on what you are doing in the program.
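
What a replace list does can be pictured with a short Python sketch. It is a generic illustration of whole-word replacement, not the SE implementation and not the format of the XML files named above; the function name and the sample sentence are made up.

```python
import re


def apply_replacements(text, replacements):
    """Replace whole words only, so longer words containing the pattern stay untouched."""
    for wrong, right in replacements.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function as the replacement avoids backslash interpretation in `right`.
        text = re.sub(pattern, lambda _m, r=right: r, text)
    return text


# The misread word from the post above: capital "I" twice instead of "ll".
user_list = {"AIIampanato": "Allampanato"}
fixed = apply_replacements("AIIampanato e stanco", user_list)
```

Once a word is in such a list it is corrected on every later occurrence, which is the behavior asked for in the quoted P.S.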
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 18th June 2020 at 18:04.
Old 19th June 2020, 08:57   #1138  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,635
Quote:
Originally Posted by Janusz View Post
At the moment you have four options for doing what you ask for
Thanks for your hints.

Number 2 is the most reasonable temporary solution.
__________________
@turment on Telegram
Old 25th June 2020, 16:00   #1139  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 96
@Nikse555

1. The shift in the drawing of the vertical table lines in the [List view] window is not new in version 3.5.16.
It has always been present.
It does not interfere with normal use of the program, but it spoils the overall impression,
all the more so because you usually work in the main program window.
If correcting this is not a big problem, I would ask for it.



2.1. Each time the File / Compare window is opened with [Subtitle font size] > 8 set for [List view],
the width of the [Start time] and [End time] columns is not recalculated for the different font size and is,
for example, too small (see figure below).
A newly set width is not remembered, unlike in the main window.
It would be enough to take the width of these columns from the corresponding columns of the main window.



2.2. When we compare the text with the content of another file, the left table
of the [Compare] window reflects the in-memory content of the main window.
Because we can still modify the text in the main window after opening [Compare],
a [Refresh] button that reloads the left table from memory would be useful,
so that [Compare] does not have to be closed and reopened.

3. After importing subtitles from a ts stream, I have access to the [Greyscale]
and [Use color] options (marked in red).
I use the second option, in four simple steps available in the program, to mark dialog lines by adding "-".
The effect can be seen in the figure in point 2, in the right table of the [Compare] window.



I want to ask whether there is an important reason why these options are not available when importing subtitles
from sup files or from png images in html directories. Or maybe I just can't find them.
If it is not a problem, I would ask you to make these options always available.
As far as I remember, once upon a time they were.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 25th June 2020 at 17:27.
Old 29th June 2020, 00:22   #1140  |  Link
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 321
Bug: switching from Italic to non-Italic doesn't work inside a word.
  1. Install Latin.db.
  2. Open SUP file.
  3. No of pixels is space: 11.
  4. Go to subpic #837, press 'Start OCR', then 'Stop'.
The subtitle is recognized as
Code:
I'll <i>vafangoolyou!</i>
SE correctly recognizes 'you' as non-Italic (this can be verified in 'Inspect compare matches for current image...'), yet in the recognized text 'you' is still enclosed in the Italic tag.
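
To make the expected behavior concrete, here is a small Python sketch that builds the tagged text from per-character italic flags and switches the tag wherever the flag changes, even inside a word. The data layout (a list of character/flag pairs) and the function names are assumptions for illustration; this is not how SE stores its matches.

```python
from itertools import groupby


def tag_italics(chars):
    """chars: list of (character, is_italic). Wrap each italic run in <i>...</i>."""
    parts = []
    for italic, run in groupby(chars, key=lambda item: item[1]):
        text = "".join(ch for ch, _ in run)
        parts.append(f"<i>{text}</i>" if italic else text)
    return "".join(parts)


def flagged(text, italic):
    """Helper: give every character of text the same italic flag."""
    return [(ch, italic) for ch in text]


# Same characters as in the subtitle above, with only 'vafangool' flagged italic.
chars = flagged("I'll ", False) + flagged("vafangool", True) + flagged("you!", False)
expected = tag_italics(chars)
```

With these flags the tag closes right after 'vafangool', so 'you!' stays outside it, which is the output one would expect from the match data described above.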
__________________
Magically yours
Raistlin
All times are GMT +1.