Subtitle Edit [Archive] - Page 23

Janusz

10th June 2020, 19:27

@GCRaistlin

The problem is your text, not splitting and joining lines.
Organize the text first, then report the problem.

1
00:12:45,863 --> 00:12:48,864
Креветки и раки были лучше всего.
Они быстро сплавливались.

GCRaistlin

10th June 2020, 21:02

Bugs:

A really dangerous issue with 'Column - Paste from clipboard' feature. In 'Shift cells down' mode, we actually lose as many subtitles at the bottom as were inserted. You can easily check it on a file with only one subtitle: you'll never get the second (duplicate) line by performing 'Copy as text in clipboard' then 'Column - Paste from clipboard'.
I often get the following error performing 'Copy as text in clipboard':
https://i112.fastpic.ru/thumb/2020/0610/7e/282c2dd503e042e6600513a95a48127e.jpeg (https://fastpic.ru/view/112/2020/0610/282c2dd503e042e6600513a95a48127e.jpg.html)
'Retry' works OK. I believe it's because I use the Ditto and TextBoard clipboard managers; they seem to cause a delay in clipboard operations or something like that.
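A minimal Python model of what 'Shift cells down' is expected to do may make the first report easier to check. It is a hypothetical sketch, not Subtitle Edit's code: subtitles are reduced to a plain list of texts, and `paste_column_shift_down` / `paste_column_reported` are illustrative names. Pasting N texts should grow the list by N; the reported behaviour keeps the length fixed, so the last N texts fall off the bottom.

```python
from typing import List


def paste_column_shift_down(texts: List[str], index: int, pasted: List[str]) -> List[str]:
    """Insert `pasted` at `index`, shifting existing texts down.

    Hypothetical model of the expected behaviour: every original text
    survives, so the result has len(texts) + len(pasted) entries.
    """
    if not 0 <= index <= len(texts):
        raise IndexError("paste position out of range")
    return texts[:index] + pasted + texts[index:]


def paste_column_reported(texts: List[str], index: int, pasted: List[str]) -> List[str]:
    """Model of the reported bug: the list length stays fixed,
    so the last len(pasted) texts are silently dropped."""
    return (texts[:index] + pasted + texts[index:])[:len(texts)]


one_subtitle = ["Hello"]
print(paste_column_shift_down(one_subtitle, 0, ["Hello"]))  # ['Hello', 'Hello']
print(paste_column_reported(one_subtitle, 0, ["Hello"]))    # ['Hello']
```

With a one-subtitle file, the expected result is the duplicated line; the reported result is just the original line.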

varekai

11th June 2020, 08:34

@GCRaistlin
WTF!! Any admin or mod around to stop this idiot?
What about forum rules? Pron? It has to be a violation?
Do we have to put up with this fastpic.ru image hoster?
Every time someone clicks on his links they get to see pron!! WTF!
Look here what that arrogant prick Rhinoceros-GCRaistlin links to...
https://imgur.com/a/8fjKszN

If he had a brain he would do it like this, right? Right!
https://imgur.com/a/rTdKWgZ

So please admin/moderator give this poster a
serious reprimand or why not ban him for a while?

Edit:
When I get a reply, the links will be removed.

tormento

11th June 2020, 10:35

Melan

11th June 2020, 10:51

Each language has its own rules. Therefore, users should create the appropriate .xml files.
Here is, for example, a fragment of my file.
https://i.imgur.com/roHSlTs.png

Janusz

11th June 2020, 12:20

@Nikse555
I have found a constant bug in the Italian OCR common errors correction, i.e. it transforms every standalone "I" into "i".
I have looked into the XML dictionary files but can't find the rule at all.
Please let me know where to look.

You won't find it in any xml. There is one option to blame for this, which you definitely have selected.
It is: Settings / Tools [Fix common OCR errors - also use hard-coded rules].
@Melan probably also has it enabled, which is why his XML is extended to the point of exaggeration.

Out of curiosity, I processed your Evangelion 2.22 [ita] file.
Unfortunately, you won't get different colors for top and bottom subtitles in SRT. You need to export the file back to SUP, so I don't know if it's worth playing with.

tormento

11th June 2020, 13:23

You won't find it in any xml. There is one option to blame for this, which you definitely have selected.
It is: Settings / Tools [Fix common OCR errors - also use hard-coded rules].
Yes, that's it.

@Nikse555 please move it to an external file.

Out of curiosity, I processed your Evangelion 2.22 [ita] file. Unfortunately, you won't get different colors for top and bottom subtitles in SRT. You need to export the file back to SUP, so I don't know if it's worth playing with.
Yes, in the end I used SRT for the forced subtitles (one line only, at the bottom, in the original SUP) and SUP for the normal ones.

If only SE supported time-overlapping top and bottom lines, I could use the ASS format and get rid of all those issues.

GCRaistlin

11th June 2020, 15:57

The internal (DirectShow) video engine displays video with a negative delay, which makes it useless for adjusting the subtitles. Here (https://mir.cr/XMVAHBOR) is a 30-second video and subtitle example. When played back in SE ('Play from just before text'), the subtitle appears simultaneously with the title text on the screen. When played back in MPC-HC, the subtitle appears earlier than the title text. The difference is about 267 ms. Note that if we perform frame steps using the 'Video position' field (by pressing the Up and Down arrows there), there seems to be no delay.

Janusz

11th June 2020, 18:38

@GCRaistlin,
think, or learn. There is no such thing as a video delay. It is the audio or subtitle streams that have a delay relative to the video.
SE always displays the video according to the times contained in the subtitles, i.e. delay = 0. 'Play from just before text' has nothing to do with synchronization.

Nikse555

11th June 2020, 18:39

@Nikse555

I have found a constant bug in the Italian OCR common errors correction, i.e. it transforms every standalone "I" into "i".

I have looked into the XML dictionary files but can't find the rule at all.

Please let me know where to look.

Could you provide an image/sup so I can try it?
(You can right-click on the image in the OCR window and click "Save as...")

Janusz

11th June 2020, 19:28

@ Nikse555
Since this also applies to Polish, you can use 192 "Batman" subtitles to show that this option works incorrectly for languages other than English.
1. [Fix common OCR errors - also use hard-coded rules] = disabled: my "pol_OCRFixReplaceList.xml" turns "l" into "I", and that's fine.
2. [Fix common OCR errors - also use hard-coded rules] = enabled: my "pol_OCRFixReplaceList.xml" turns "l" into "I", then your function turns my "I" into "L", and that's bad.
I have now noticed that this is not done on line 195. Anyway, this is incorrect.

Edit:
Other lines in these subtitles where the replacement took place: 255, 359, 448, 520, 934, 1128, 1136.
Lines where the replacement did not take place: 195, 308, 560, 619. Analyzing the text itself, some logic can be seen in these changes, but to be safe I prefer not to use this function.

GCRaistlin

11th June 2020, 20:11

The problem is your text, not splitting and joining lines.
Organize the text first, then report the problem.

Thanks for your senseless post.

@GCRaistlin,
think, or learn. There is no such thing as a video delay.
I don't care what you call this. The problem is definitely present.

Janusz

11th June 2020, 20:29

@GCRaistlin
If your subtitle starts at 00:00:03.456, then "I don't care what you call this." will appear on screen 3.456 seconds after the beginning of the film (00:00:00.000).
If you start watching from 2 seconds, your subtitle will appear after 1.456 seconds, and this is not a problem for either SE or MPC. Whatever subtitle delay you set in MPC is your problem.
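The arithmetic above, sketched in Python. This assumes timestamps always carry exactly three millisecond digits; `parse_ts` is an illustrative helper, not part of SE or MPC-HC.

```python
from datetime import timedelta


def parse_ts(ts: str) -> timedelta:
    """Parse 'HH:MM:SS.mmm' (or SRT-style 'HH:MM:SS,mmm') into a timedelta.

    Assumes exactly three millisecond digits.
    """
    hms, ms = ts.replace(",", ".").split(".")
    h, m, s = (int(part) for part in hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=int(ms))


subtitle_start = parse_ts("00:00:03.456")  # relative to 00:00:00.000
playback_start = parse_ts("00:00:02.000")  # the viewer starts watching here
print((subtitle_start - playback_start).total_seconds())  # 1.456
```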

GCRaistlin

11th June 2020, 20:48

Janusz
Have you downloaded my example and performed the steps?

I can't reproduce the joining issue now, so my apologies about "senseless post".

Melan

11th June 2020, 21:05

The demons are back. B255

https://i.imgur.com/V6OZ3Fn.png

Janusz

11th June 2020, 21:45

@GCRaistlin

https://drive.google.com/uc?export=view&id=1-AsywNf_kNgaUc_WCDEgbweKRL3H4iQD

In SE, there is no subtitle delay relative to the video. Subtitles start at their specified time.
SE does not take into account the subtitle delay contained in TS, M2TS, etc. streams. From the image you can see that the subtitle will be displayed a bit too late, and SE is what you use to fix that, because the subtitles have wrong timings.
You can find out which subtitle delay is used in the M2TS stream with MediaInfo, but this information will not be useful for the external subtitles you create.

GCRaistlin

11th June 2020, 23:28

From the image you can see that the subtitle will be displayed a bit too late
On the contrary, it appears a bit too early - it should appear when the still title "A Martin Scorsese Picture" appears. Now see:
https://i112.fastpic.ru/thumb/2020/0612/65/_dafdd15f3d46a21e8617fafe0031a165.jpeg (https://fastpic.ru/view/112/2020/0612/_dafdd15f3d46a21e8617fafe0031a165.jpg.html) https://i112.fastpic.ru/thumb/2020/0612/1f/_956e7db42f638a2be35bedee2ce8b71f.jpeg (https://fastpic.ru/view/112/2020/0612/_956e7db42f638a2be35bedee2ce8b71f.jpg.html)
The left shot is of the SE window; its timestamp is 00:20,389. The right screenshot is of MPC-HC; its timestamp is 00:20,395, later than the left one. But it actually shows an earlier moment, as the still title comes after the scrolling one, whose tail can be seen in the right shot.

BTW how do you take screenshots of SE with video displayed correctly? I get the black screen instead (that's why I used the camera for the left shot).

Janusz

12th June 2020, 00:15

Your basic mistake is: you want to sync the video to the subtitles.
That's not the way. You won't change the video, so you have to change the display time of the subtitle.
If you want the subtitles to be displayed earlier, e.g. when the text in the video appears on the screen while it is still scrolling,
you need to shift the subtitles so that they appear earlier. SUBTITLES, not video.
Set 00:00:19.717 - 00:00:29.909 and your subtitle "Фильм Мартина Скорсезе" ("A Martin Scorsese Picture") will appear when the text
"A MARTIN SCORSESE PICTURE" appears on the screen and will last for as long as the text scrolls on the screen.

BTW how do you take screenshots of SE with video displayed correctly?
Normally: Left ALT + Print Screen, new bmp file, paste, save as png.

GCRaistlin

12th June 2020, 00:49

Your basic mistake is: you want to sync the video to subtitles.
No, you're wrong. I'm afraid you don't even try to understand what I say.
I'm not going to change the video. I want to sync the subtitles with the video. If I do it with SE's internal player, the result is fine if I'm going to watch the movie with SE. But I am not; I'm going to watch it with MPC. And here I have a problem: the subtitles that are in sync with the video played back in SE's internal player are NOT in sync with the same video played back in MPC-HC. The screenshots above prove it.

Normally: Left ALT + Print Screen, new bmp file, paste, save as png.
It doesn't work for me.

Janusz

12th June 2020, 01:04

In addition, F1/F2 in MPC-HC advance or delay the subtitles.

GCRaistlin

12th June 2020, 02:03

Janusz
There's a better workaround: we can just apply a +267 ms delay to the subtitles after the visual/waveform adjustment is complete. But fixing the issue would be even better.
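Applying a fixed offset such as +267 ms is plain arithmetic on every SRT timestamp. Inside SE you would normally use its own time-adjustment tool; the Python sketch below (hypothetical helper `shift_srt`, assuming standard `HH:MM:SS,mmm` SubRip timestamps) only illustrates the calculation, clamping negative results to zero.

```python
import re

# Standard SubRip timestamp, e.g. 00:00:20,389
SRT_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text: str, offset_ms: int) -> str:
    """Add offset_ms to every HH:MM:SS,mmm timestamp; results below 0 are clamped to 0."""
    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = max(((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms, 0)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return SRT_TS.sub(shift, text)


print(shift_srt("1\n00:00:20,389 --> 00:00:22,000\nText\n", 267))
```

Note that the regex would also shift a timestamp-like string inside the subtitle text itself; a real tool would only touch the timing lines.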

varekai

12th June 2020, 08:47

BTW how do you take screenshots of SE with video displayed correctly? I get the black screen instead (that's why I used the camera for the left shot).
https://imgur.com/a/0VcVZf9

Janusz

12th June 2020, 10:47

@GCRaistlin
This is the last time I'm writing about this matter:

https://drive.google.com/uc?export=view&id=1eJ60FvbaQUIDSKrwzc5tXYcCCyCj0mjK

The text "A MARTIN SCORSESE PICTURE" begins to enter the screen at frame 474, time 00:00:19.769.
When your subtitles are to be shown over the image is your problem, not SE's.
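Frame 474 and 00:00:19.769 are consistent with a 23.976 fps (24000/1001) clip if frames are counted from 0 and the frame start time is truncated to whole milliseconds. The post does not state the frame rate, so treat that as an assumption. The Python sketch uses exact fractions to avoid floating-point rounding; the helper names are illustrative.

```python
from fractions import Fraction

NTSC_FILM_FPS = Fraction(24000, 1001)  # ~23.976 fps; assumed, not stated in the post


def frame_start_ms(frame: int, fps: Fraction) -> int:
    """Start time of a 0-based frame in whole milliseconds (truncated)."""
    return int(frame * 1000 / fps)  # exact Fraction arithmetic, then truncate


def format_ms(total_ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm."""
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


print(format_ms(frame_start_ms(474, NTSC_FILM_FPS)))  # 00:00:19.769
```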

Edit 01:
I'll tell you one more time: learn. In the posts above, I wrote where you should look for a solution to your problem.
You do not sync to the video but to the soundtrack, and if the soundtrack has a noticeable delay compared to the video,
you have to take this into account. SE won't do it for you.
Now that you know the exact subtitle delay, just fix it in the subtitles. That is what this program was created for.
And yet for several posts you have been looking for some error in the program.

GCRaistlin

12th June 2020, 11:08

The text "A MARTIN SCORSESE PICTURE" begins to enter the screen at frame 474, time 00:00:19.769.

I'm feeling like we are close to your understanding the problem.
https://i112.fastpic.ru/thumb/2020/0612/03/_b80cf106aef7559e9a1c2784c8f2ba03.jpeg (https://fastpic.ru/view/112/2020/0612/_b80cf106aef7559e9a1c2784c8f2ba03.jpg.html)

varekai

12th June 2020, 13:10

Hello Subtitle Edit forum members!
Just wanted to warn you that GCRaistlin's (http://forum.doom9.org/member.php?u=101288) image links lead to UGLY pron!
He also links to a potentially unwanted application (JS/ExAds.A).
Does anyone other than me find this inappropriate behavior?

Janusz

13th June 2020, 20:26

Dangerous tool: "Inspect nOCR matches for ..."

In my opinion, this is an unauthorized change to the content of the character database. I suspect this is not the only case.
This text was created so that anyone who wants can check the situation at home.

@ Nikse555: If you fail to reproduce this error, I will send the files.

I do not know from which version the character database saving format changed so much that the new format cannot be read by the stable versions 3.5.14 and 3.5.15. Beta 145 no longer reads the new format either. "Subtitle Edit Changelog 3.5.16 (xth July 2020) BETA" doesn't mention this. I wanted to reproduce the error described below on the stable versions; unfortunately, I was unable to load the new character database into them, and with their own character databases it may not behave the same way. That does not mean there is no problem there. That's one thing; the other question remains: how will the new version 3.5.16 handle the old character database?

Description of the problem in beta 269 and several earlier ones (261 for sure):
1. I created a new character database for a new text consisting only of non-italic characters. That was my text.
= 238 (this value allows, in my case, eliminating not all, but at least some character connections),
[No of pixels is space] = 4 (proper value for the font used in the text),
[Max wrong pixels] = 5 (maybe too aggressive, but that's what I wanted),
[Contains italic] = off (I won't need it, so I see no need for another setting),
[Line split ...] = Auto (it works, so I don't change it).
Up to this point everything is fine. After correcting a few errors in the character database by adding better matches, I received error-free text.
Conclusion: the character database for this text is error-free and contains 242 characters (this is important).

2. Time for "Batman" - this file probably contains everything you need to find something that may not work. ;)
[Draw missing texts] = off (I will only review how the new character database works with the same text (font), but also with italics),
[Max wrong pixels] = 10 (to see how it works and what mistakes it makes),
[Contains italic] = on (there are lines with italics, so, at least as I understand it, based on this parameter and [Set italic angle ...] OCR should read italics correctly).
For this parameter and italics in general, you have clearly written that it does not work well yet, so that is not the purpose of this test either.
We look for a line with italics, in my case e.g. line 283. To see if more characters can be obtained, I change [Max wrong pixels] to 25. Start, stop immediately.
I go back to line 283: there are new characters in the line, great. Characters in the database: 242, nothing has changed.
Note: flags are only added when the entire word is recognized. This does not apply to single letters such as "A", "I", Polish "z" and probably many more in other languages; here flags are added.

Now we will destroy our database:
on any line with italics, choose "Inspect nOCR matches for ...", in the "Inspect items" field select the first character from the top and go down with the down arrow to the last character. We can move the cursor back up and watch the "Is italic" field: "v" for italics will not appear next to any character. We choose OK and close this window.
Because it can be hard to find the line or lines where you can see what changes have been made to characters, it's best to run the scan again for the entire file, then use "CTRL + F", and we already know:
in my case "A", "you?" etc. (too many to list) have been marked as italic. The number of characters in the database is still 242, which means these characters have not been added as new. I would say they have been marked as italic in the character database. Which characters get marked this way, I don't know. It probably depends on which characters OCR recognizes and considers italic based on [Max wrong pixels] and [Set italic angle ...].
The effect is that from now on, single italic characters will appear in the text where there are no italics. We will have to add new characters where characters have already been added. Each time you open the "Inspect nOCR matches for ..." window, you may make further uncontrolled changes in addition to your own.
The fact is that in this way we obtained, for example, an italic "A", but lost every normal "A" in archived, future and currently processed files.
I have a few more comments, but this text is already too long, so on another occasion.

Nikse555

14th June 2020, 12:13

@Janusz: Yes, I've changed the .nOCR file format to be slightly more compact. SE 3.5.16 will be able to read both the old format from 3.5.15 and the new format. Version 3.5.15 however will not be able to read the new nOCR format from 3.5.16.
nOCR now uses the "margin-top" value (useful for e.g. comma vs apostrophe), so all nOCR files from 3.5.15 and older will not work optimally.

Beta 276 (or later) is now here: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip
(fixed misc minor issues - mostly regarding expanded characters)

I was not able to re-create the italic-check-issue...

Janusz

14th June 2020, 12:59

@Nikse555
I will send by e-mail the full set of Polish xml and character base. I hope you still have Batman files.

Edit:
With the uploaded files you can immediately check the case described
here: https://forum.doom9.org/showpost.php?p=1915280&postcount=1104
and my answer here: https://forum.doom9.org/showpost.php?p=1915295&postcount=1106

jlw_4049

14th June 2020, 13:30

@Nikse555 Thanks for your work. I use the program pretty constantly.

Nikse555

14th June 2020, 14:44

@Janusz: I think you have to click "Add better match" on the false italics... and add the same letter again just without italic. Or... I might be misunderstanding.

@jlw_4049: You're welcome, and thx :)

Janusz

14th June 2020, 18:02

@ Nikse555
Yes, this is the cure and I use it.
It's just that, like any medicine, it helps with one thing and harms another. And so it is in this case. I will recover, for example, "A", but I will lose another character. I've seen this SE behavior before, but I didn't know where it came from. I thought it was my mistake, added another character, and it was fine. For some time now I have suspected [Contains italic] of causing this, so in normal use of SE this option is permanently disabled. The same goes for [Fix common OCR errors - also use hard-coded rules], which I use only after OCR.
Thank you for your work and your time.

Edit:
An excellent move: :thanks:


Edit 02:
Problem with "c", "w" and "." at the end of the line.
Image to download (https://drive.google.com/uc?export=view&id=1GxbEbG9h6yL67jL3UJYsMPLfJazQvNIQ)

https://drive.google.com/uc?export=view&id=1El2jaMLvXrDJjcJxYM9dN9bUBhc5F8wa

New character database created during OCR with the "Draw missing texts" option enabled. Despite entering the characters correctly, the text is not displayed correctly.
The final "s" in place of the dot was matched automatically, without my input.
Each re-import of the image into OCR produces the effect visible in the image. This distorted text is transferred to the main program window.

Edit 03: 15.06
Correct text can be achieved, but at what cost and for how long?
We close the "Import / OCR ..." window and import our image into the program again.
In the "Import / OCR ..." window we turn off [Draw missing texts], create a new character database and press START OCR; as a result we get the same "*", which is correct.
Using "Inspect nOCR matches ..." we add a better match for the first "C", "W", "." and "-".
We can press START OCR, and we will see that everything is in place. The character database contains only the 4 characters we have entered.
In the next step, select [Draw missing text] to enter the text faster. We add the remaining missing characters from "o" to "m".
The "Import / OCR ..." window closes. We look at the result. It is fine.
Someone will ask: so what is the problem?
Here it is: before you press START OCR again, watch the text, which looks good so far. First press: the second "C" has disappeared;
the next press: the first "C" is gone; one more press and we have lost the "W".
A look at the character database: we have lowercase letters instead of capital letters.

@Nikse555, please take a look at this. Somewhere there is an error that is responsible for such behavior of the program.

In one of the earlier posts I wrote that re-scanning the text fixes previously made mistakes.
I stand by that. That is the reality. In this particular case, however, it failed.

Edit 04. 16.06
Today I added a new image, "t.03.z_and_Z.png", to the "Image to download" archive. After importing the image into the program,
before scanning I chose the "Latin" character database, with [Draw missing text] disabled. SE version 3.5.16.
First scan: "22 P*Dz!ERN!KA 2**1 YEAR"
Second scan: "22 P*DZ!ERN!KA 2**1 YEAR" - this is correct
As you can see, the small "z" has changed into a large "Z". Why is this happening?
It seems to me that [Try to guess unknown words] has gained new capabilities, and not only for English. :)
@ Nikse555, you and the whole team - congratulations on the release of the new stable version of the program?

Nikse555

17th June 2020, 12:04

As you can see, the small "z" has changed into a large "Z". Why is this happening?

@ Nikse555, you and the whole team - congratulations on the release of the new stable version of the program?

Yes, thx. SE 3.5.16 is out now: https://github.com/SubtitleEdit/subtitleedit/releases
(Released a bit earlier than planned due to changed spell check dictionary links).
And nOCR would not have been released/improved without your input Janusz :)
By the way, your image gives a 403.

SE 3.5.16 introduces the first (non-beta) version of nOCR.
A bit like "image compare" but just with lines which makes it easier to scale and recognize different font sizes.
nOCR can also be trained with different fonts fairly easily!!!
Just tried (really fast) to make a small tutorial: https://nikse.dk/SubtitleEdit/nocr

In nOCR, the casing of "z" and some other letters is determined by the average size of letters... so the first few lines may be different in a second run.
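To illustrate why the first lines can come out differently in a second run, here is a hypothetical Python sketch of size-based casing with a running average. It is not SE's nOCR code: the reference letters, the set of size-ambiguous letters and the 1.25 ratio are all assumptions chosen for the example.

```python
from typing import List, Tuple

X_HEIGHT_LETTERS = set("aemnru")  # assumed reference letters (no ascenders/descenders)
SIZE_AMBIGUOUS = set("cosvwxz")   # assumed: upper and lower case differ mainly in size


class CasingGuesser:
    """Hypothetical sketch, not SE's nOCR code."""

    def __init__(self, ratio: float = 1.25) -> None:
        self.ratio = ratio
        self.total_height = 0
        self.count = 0

    def process_line(self, glyphs: List[Tuple[str, int]]) -> str:
        """glyphs: (recognized character, glyph height in pixels)."""
        # Update the running average x-height with this line's reference letters.
        for ch, height in glyphs:
            if ch in X_HEIGHT_LETTERS:
                self.total_height += height
                self.count += 1
        result = []
        for ch, height in glyphs:
            # Decide the case only once some reference letters have been seen;
            # otherwise keep the character exactly as recognized.
            if ch.lower() in SIZE_AMBIGUOUS and self.count:
                avg = self.total_height / self.count
                ch = ch.upper() if height > avg * self.ratio else ch.lower()
            result.append(ch)
        return "".join(result)


guesser = CasingGuesser()
line1 = [("P", 30), ("z", 30)]  # all-caps line: no reference letters
line2 = [("m", 20), ("a", 20)]
print(guesser.process_line(line1))  # Pz (nothing to compare against yet)
print(guesser.process_line(line2))  # ma
print(guesser.process_line(line1))  # PZ (30 > 20 * 1.25)
```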

Janusz

17th June 2020, 13:10

By the way, your image gives a 403.
My mistake. I hadn't changed the access rights, sorry. The link should work now.
In nOCR, the casing of "z" and some other letters is determined by the average size of letters ... so the first few lines may be different in a second run.
That's right. The first case is the first line in the text, the second one appears in the text as line 7.
A bit like "image compare" but just with lines which makes it easier to scale and recognize different font sizes.
Probably as a result of this, I could not add a capital "Z" as a new character. Only loading another text ended my struggle to add "Z".
So far it works well, it distinguishes [ , ] and [ ' ], well done. Thank you.
Just tried (really fast) to make a small tutorial:
I have read it. I think the information it contains is sufficient.

varekai

17th June 2020, 17:08

@Nikse555
Thanks for the update! Much appreciated!

jlw_4049

18th June 2020, 07:16

Thanks for the update. I'll grab the latest version tomorrow and test it out! :)

tormento

18th June 2020, 12:12

Could you provide an image/sup so I can try it?
Here (https://www.mediafire.com/file/gx5sl5oy4j9etqt/jewell_PID_1203_ita.7z/file) you can find a good example.

Plus I did a fresh install with new stable version, deleting everything but latin.db.

Two OCR-fix problems, which you can find both during binary-compare OCR and in Fix common errors:

"I" becomes "i"
"E " becomes "Es "

To help you find them (examples):

00:09:05,712 --> 00:09:07,297 Es lei ha detto: "Bene.

01:18:53,145 --> 01:18:54,581 INDIGNAZIONE: i CINQUE MOTIVI PER CUI O.J. SIMPSON SE L'È CAVATA

P.S: it would be really nice to be able to add a word corrected manually during OCR to the dictionary, such as "Allampanato", which OCR reads as "AIIampanato". I can correct it by hand, but as it is not in the dictionary, I get asked about the same word again and again. I wish the two buttons "add to noise" and "add to dictionary" would take manually modified words into account too.

Janusz

18th June 2020, 16:19

P.S: it would be really nice to be able to add a word corrected manually during OCR to the dictionary, such as "Allampanato", which OCR reads as "AIIampanato". I can correct it by hand, but as it is not in the dictionary, I get asked about the same word again and again. I wish the two buttons "add to noise" and "add to dictionary" would take manually modified words into account too.
At the moment you have four options for doing what you ask for:
1. Options / Settings / Word lists: here you can add any word to the dictionary, with or without spelling distinction. You can add a replacement or fix any word during OCR, all in one step.
2. Use the [Unknown words] list during OCR: select any word in the list and use the buttons on the right. You can enter any words in the fields. Whatever you enter will be saved to the dictionary or to the replace list.
3. Use [Spell check]: you can enter any word in the field and use the buttons below. Unfortunately, here you cannot add words that you would like to replace with others.
4. Manual file editing: it_names_user.xml, it_IT_UseAlways.xml, ita_OCRFixReplaceList_User.xml, ita_OCRFixReplaceList.xml. Of course, not all four at once. You edit these files at your own risk.
In your case, all you have to do is use point 1 or 2, depending on what you are doing in the program.

tormento

19th June 2020, 08:57

At the moment you have four options for doing what you ask for
Thanks for your hints.

Number 2 is the most reasonable temporary solution.

Janusz

25th June 2020, 16:00

@Nikse555

1. The shift in drawing the vertical lines of the table in the [List view] window did not first appear in version 3.5.16;
it has always been present.
This does not interfere with normal use of the program, but it spoils the overall impression,
all the more so since you usually work in the main program window.
If fixing this is not a big problem, I'd ask for it.

https://drive.google.com/uc?export=view&id=1_nQ_jkraj-PVyYN8c0pD5eBwKEldKBmC

2.1. Each time the File / Compare window is opened with [Subtitle font size] > 8 for [List view],
the width of the [Start time] and [End time] columns is not recalculated for the different font size and is,
for example, too small (see figure below).
A newly set width is not remembered, as it is for the main window.
It would be enough for the width of these columns to follow the column widths of the main window.

https://drive.google.com/uc?export=view&id=1D_HwXo4UJjkSBoj90dR2cPH9Yhcx_i3H

2.2. If we want to compare the text with the content of another file, the left table
of the [Compare] window mirrors the in-memory content of the main window.
Because we can still modify the text in the main window after opening [Compare],
instead of closing and reopening [Compare], a [Refresh] button would be useful to reload
the contents of the left table from memory.

3. After importing subtitles from a TS stream, I have access to the [Greyscale]
and [Use color] options (marked in red).
I use the second option, in four simple steps available in the program, to mark dialog lines by adding "-".
The effect can be seen in the screenshot in point 2, in the right table of the [Compare] window.

https://drive.google.com/uc?export=view&id=1W5gd-Rs9yyAUoowaPItmFLyfjrViQp4P

I want to ask if there is an important reason why these options are not available when importing subtitles
from SUP files or PNG images from HTML directories? Or maybe I just can't find them.
If this is not a problem, I would ask you to add these options so that they are always available.
As far as I remember, they once were.

GCRaistlin

29th June 2020, 00:22

Bug: switching from Italic to non-Italic doesn't work inside a word.

Install Latin.db (https://mir.cr/Z31BR2S0).
Open SUP file (https://mir.cr/HFECXRLR).
No of pixels is space: 11.
Go to subpic #837, press 'Start OCR', then 'Stop'.

The subtitle is recognized as

I'll vafangoolyou!

SE correctly recognized 'you' as non-italic (we can confirm it in 'Inspect compare matches for current image...'), yet 'you' is enclosed in the italic tag in the recognized text.

Janusz

29th June 2020, 08:02

@GCRaistlin
Use the US English dictionary for OCR, select [Fix OCR errors] and [Try to guess unknown words], and as a result you will get your
I'll vafangool you!

GCRaistlin

29th June 2020, 10:17

Janusz
What does it have to do with the reported issue? This time your trick helps (maybe, I didn't check), next time it won't.

varekai

30th June 2020, 08:55

I'll vafangool you! (https://streamable.com/7o2d68)

tormento

11th July 2020, 08:34

Could you provide an image/sup so I can try it?
I saw you updated beta but you never replied to my post (https://forum.doom9.org/showthread.php?p=1916021#post1916021).

jlw_4049

16th July 2020, 17:28

Still having major issues with music notes in the latest beta version for tesseract/binary.

http://www.mediafire.com/file/2pyfynot2lx6lb4/example.sup/file

Here is a file I've had the issues with.

Nikse555

18th July 2020, 09:05

@Tormento: I've tested your sup file and it works fine... I don't get the strange replacements that you get, so you should probably do a clean install (delete all old SE files before - including those in %appdata%\Subtitle Edit).
EDIT: Also, the latest beta has improved casing in OCR for the Italian letter "Ú": https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.16/SubtitleEditBeta.zip

@jlw_4049: You should request better support for music symbols for tesseract here: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.16/SubtitleEditBeta.zip
Or you could try "nOCR" or "Binary image compare"...

tormento

18th July 2020, 10:00

I've tested your sup file and it works fine.

You are right about "Es", I have found it in ita_OCRFixReplaceList_User.xml somehow...

The "i" issue comes from ita_OCRFixReplaceList.xml, where "l" is replaced by "i", and an "I" is sometimes OCR'd as "l".
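The interaction can be modelled as two steps: OCR misreads a standalone "I" as "l", and a whole-word rule "l" → "i" then produces "i". The Python sketch below is a hypothetical model of whole-word replacement (`apply_whole_word_rules` is an illustrative name); it does not reproduce the actual format or matching logic of ita_OCRFixReplaceList.xml.

```python
import re
from typing import Dict


def apply_whole_word_rules(text: str, rules: Dict[str, str]) -> str:
    """Replace complete words found in `rules`; partial matches are left alone."""
    return re.sub(r"\w+", lambda m: rules.get(m.group(0), m.group(0)), text)


rules = {"l": "i"}  # hypothetical whole-word rule, as described in the post
ocr_output = "INDIGNAZIONE: l CINQUE MOTIVI"  # "I" misread as "l" by OCR
print(apply_whole_word_rules(ocr_output, rules))  # INDIGNAZIONE: i CINQUE MOTIVI
print(apply_whole_word_rules("il libro", rules))  # il libro
```

Only the isolated "l" is touched; "il" and "libro" are left alone because whole-word matching compares complete words.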

Also, latest beta has improved casing in OCR for italian letter "Ú"

We don't have any Ú letter in Italian.

We do have ù and Ù. :p

loninapleton

19th July 2020, 06:35

Not hijacking anything. I just need to know if this is the main forum discussion for Subtitle Edit. I am just beginning to do translations. Some online tools are available. But my current need is to get an ASS file, translated from English to Polish, saved as SRT or VOB so that it is recognized by MKVToolNix.

I'm only getting a text save in Subtitle Edit. Please give the steps for getting this kind of save. And thank you for this amazing tool.

jlw_4049

19th July 2020, 07:21

@jlw_4049: You should request better support for music symbols for tesseract here: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.16/SubtitleEditBeta.zip
Or you could try "nOCR" or "Binary image compare"...

I'm not sure what nOCR is. All I see is binary or Tesseract.

I'll look more into it tomorrow when I get off.

Nikse555

19th July 2020, 08:32

@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the Italian letter accent U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.
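As a rough idea of what the SubRip conversion involves, here is a minimal Python sketch that turns the Dialogue lines of an ASS file into SRT blocks. It assumes the default [Events] field order with Text as the tenth field, and it simply strips override tags such as {\i1}, so styling and positioning are lost. `ass_to_srt` is a hypothetical helper; SE's own converter handles far more cases.

```python
import re
from typing import List


def ass_time_to_srt(t: str) -> str:
    """Convert an ASS time 'H:MM:SS.cc' (centiseconds) to SRT 'HH:MM:SS,mmm'."""
    h, m, rest = t.strip().split(":")
    s, cs = rest.split(".")
    return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int(cs) * 10:03d}"


def ass_to_srt(ass_text: str) -> str:
    """Convert 'Dialogue:' lines to numbered SubRip blocks.

    Assumes the default field order (Layer, Start, End, Style, Name,
    MarginL, MarginR, MarginV, Effect, Text).
    """
    blocks: List[str] = []
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # maxsplit=9 keeps commas inside the Text field intact
        fields = line[len("Dialogue:"):].split(",", 9)
        start, end, text = fields[1], fields[2], fields[9]
        text = re.sub(r"\{[^}]*\}", "", text)                  # drop override tags like {\i1}
        text = text.replace("\\N", "\n").replace("\\n", "\n")  # ASS line breaks
        number = len(blocks) + 1
        blocks.append(f"{number}\n{ass_time_to_srt(start)} --> {ass_time_to_srt(end)}\n{text}\n")
    return "\n".join(blocks)


sample = (
    "[Events]\n"
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n"
    "Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,{\\i1}Hello,{\\i0} world\\NSecond line\n"
)
print(ass_to_srt(sample))
```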

@jlw_4049: If you cannot see the OCR method "nOCR" then you probably don't use SE 3.5.16?