Subtitle Edit [Archive] - Page 20


jlw_4049

7th May 2020, 22:38

Could you upload a sample where this happens?
SE tries to fix this via the eng_OcrFixReplaceList.xml file...

So I re-opened the .sup today and it wasn't doing it at all. I walked away for a bit after minimizing it and the display screen was enlarged when I opened it back up.

It was zoomed in about 50% too much and focused on the left side.

Here is the .sup I was able to reproduce it with.

http://www.mediafire.com/file/r69v2z6h7vyhkmk/sample.sup

Nikse555

8th May 2020, 10:06

...after minimizing it and the display screen was enlarged when I opened it back up.

thx for the file - karaoke :)

The OCR window back-from-minimized should be fixed in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

tormento

8th May 2020, 11:52

@Nikse555,

Please, from the feature list I posted, at least put Italic on the right side of the character to input during binary compare.

You don't know how many times I forget to enable/disable italic and delete part of the OCR database...

GCRaistlin

8th May 2020, 14:43

Bug:

Revoke write access to Waveforms directory for the current user.
Generate waveform data for a video file. You'll get an error.
Grant write access to Waveforms directory for the current user.
Press Retry.

Nothing happens. The 'Generate waveform data' window stays forever on the screen. No waveform data is being written to Waveforms directory.

GCRaistlin

8th May 2020, 17:48

Feature request: apply undo/redo to blocks of similar actions rather than to separate actions. For example, I adjust the boundaries of a subtitle in the waveform by dragging a boundary with the mouse. I perform one move, but SE treats it as a chain of small boundary shifts. These small actions then push the older ones out of the Undo stack. This is pointless: it makes it harder to undo the whole move and makes it impossible to undo the older actions.
I suggest introducing a new setting: "Consider similar actions as separate if there are at least x seconds between them". Then, if x is set to 3, adjustments (e.g. made by clicking the small arrows in the 'Start time' or 'Duration' fields) with less than 3 seconds between any two consecutive ones are treated as one action for Undo/Redo.
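
A minimal sketch of the grouping idea above, in Python. This is not Subtitle Edit's undo code: the names UndoStack, UndoEntry, kind and min_gap_seconds are hypothetical. Consecutive actions of the same kind are merged into one undo entry when less than x seconds separate them, so one undo reverts the whole drag.

```python
from dataclasses import dataclass, field


@dataclass
class UndoEntry:
    kind: str          # hypothetical action name, e.g. "move-boundary"
    before: object     # state to restore when this entry is undone
    last_time: float   # time of the latest action merged into this entry


@dataclass
class UndoStack:
    min_gap_seconds: float = 3.0            # the proposed setting "x"
    entries: list = field(default_factory=list)

    def record(self, kind, before, now):
        """Record an action; merge it into the top entry if similar and recent."""
        if self.entries:
            top = self.entries[-1]
            if top.kind == kind and now - top.last_time < self.min_gap_seconds:
                # Same kind within the time window: keep the original "before"
                # state so a single undo reverts the whole block of actions.
                top.last_time = now
                return
        self.entries.append(UndoEntry(kind, before, now))

    def undo(self):
        """Pop the latest entry and return the state to restore (None if empty)."""
        return self.entries.pop().before if self.entries else None


stack = UndoStack(min_gap_seconds=3.0)
stack.record("move-boundary", before=1000, now=0.0)
stack.record("move-boundary", before=1010, now=0.5)
stack.record("move-boundary", before=1020, now=1.0)
stack.record("move-boundary", before=1030, now=10.0)  # more than 3 s later
```

Here the first three shifts collapse into one entry that restores the state from before the drag started, and the fourth one becomes a separate entry.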

GCRaistlin

8th May 2020, 20:28

Feature request: keyboard shortcut for Video - Show/hide waveform.

GCRaistlin

8th May 2020, 20:57

Feature request: ability to set mouse wheel scroll step. Now it is 2 subtitles, I would like to have it set to 1 subtitle.

jlw_4049

9th May 2020, 04:37

thx for the file - karaoke :)

The OCR window back-from-minimized should be fixed in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

Wanted to post back and say the issue with it zooming in too far on the 'subtitle image' window has not been solved with the latest Beta.

https://i.imgur.com/hX6gCfh.png

It is still doing this here when you minimize and come back. Although it seems like the program is reading them just fine.

EDIT: Maximizing the window seems to allow you to read it, however, staying in the smaller window it looks like the picture that I posted above.

Janusz

9th May 2020, 11:52

Wanted to post back and say the issue with it zooming in too far on the 'subtitle image' window has not been solved with the latest Beta.

@ Nikse555
I checked at home. The latest version 3.5.15 NEXT, beta 51 scales this window correctly.
For me, only the first stable version 3.5.15 had a problem with this.
It looked exactly as it does for jlw_4049.

tormento

9th May 2020, 12:37

Italian OCR correction wants to change
334
00:19:23,913 --> 00:19:25,081
ABBASSO I GLADIATORS
to
334
00:19:23,913 --> 00:19:25,081
ABBASSO i GLADIATORS

Nikse555

10th May 2020, 06:56

@tormento:
"ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

@jlw_4049/Janusz: I also cannot re-create the resize-and-restore-issue in latest beta, but I'll test on a few other computers.
jlw_4049, did you check version in Help -> About - also, how do you restore the minimized OCR window?

Latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip
Contains some good fixes for rippers:
- Bluray sup files could miss some images (where a subtitle would be expanded with more text)
- Teletext from .ts/.m2ts/.mts sometimes missed last subtitle

jlw_4049

10th May 2020, 07:04

tormento

10th May 2020, 11:41

@tormento: "ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Same issue with:
871
01:00:00,263 --> 01:00:02,849
<i>Ripeto. I sospetti del Nite Owl
sono scappati.</i>

https://i1.lensdump.com/i/jBaLA7.md.png (https://lensdump.com/i/jBaLA7)

Here (https://www.mediafire.com/file/3oi17eue9oj1tfd/%5Bita%5D.7z/file) is the srt.

Fresh install. The OCR files are the ones you distribute.

Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

Here it is:

https://i.lensdump.com/i/jBaMVr.md.png (https://lensdump.com/i/jBaMVr)

tormento

12th May 2020, 18:08

It would be nice, when aborting OCR recognition, not to discard the text of the current paragraph, but to keep it up to the unrecognized character.

Sometimes a strange symbol can't be corrected simply by expanding, and I have to abort to enter it manually. Unfortunately I then have to enter the whole text!

Janusz

12th May 2020, 22:15

Since we are on the subject: the <Import/OCR Blu-ray (.sup)...> window
behaves inconsistently depending on the selected OCR method.

Maybe it was designed to work this way, but:
when the OCR process is stopped with <Binary image compare>
or <OCR via nOCR> selected, it behaves as @tormento wrote above.
When you select <Tesseract>, the line is recognized to the end of the line
and only then the process is stopped.

On the right side of the window there are 3 lists: <Unknown words>, <All fixes> and <Guesses used>.
While the OCR process runs, these lists are populated accordingly.
When the process is stopped and resumed, the <Unknown words> list is cleared completely,
and the other two are not. Therefore, before resuming the process, I must always first
check the unknown words or correct the errors in the <Unknown words> list before they disappear.

I think a better solution would be to append to this list, just as with the other two.
Ideally, in all 3 lists, the new entries would replace the old ones starting from the line at which
the process was resumed, instead of being appended at the end.

Also, the <Skip entire image> button could be useful in the <VobsubOCRNOcrCharacter> window, not only in <VobSub - Manual image to text>,
e.g. for illegible images and more.
@GCRaistlin wrote about it here. (https://forum.doom9.org/showpost.php?p=1909844&postcount=895)

Finally: there is an error in the program's Polish translation, in line 2528:

is: <Skip>P&omoń</Skip>

to be: <Skip>P&omiń</Skip>

Melan

13th May 2020, 11:28

is: <Skip>P&omoń</Skip>

to be: <Skip>P&omiń</Skip>

And another error (line 2532):

<AutoSubmitOnFirstChar>Autom. proponuj &amp;pierwszy znak</AutoSubmitOnFirstChar>

<AutoSubmitOnFirstChar>Autom. proponuj pierwszy znak</AutoSubmitOnFirstChar>

borifax ;)

Nikse555

13th May 2020, 14:51

Same issue with:
https://i.lensdump.com/i/jBaMVr.md.png (https://lensdump.com/i/jBaMVr)

Good idea, fixed in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).

@Melan/Janusz: thx - updated Polish translation.
(the "&" string will cause the following letter to be a shortcut - e.g. "&Skip" will react to the "Alt+S" shortcut).
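
To illustrate the "&" convention described above, here is a small Python sketch (not code from Subtitle Edit) that finds the Alt-shortcut letter in a label. The function name accelerator is hypothetical, and treating "&&" as a literal ampersand is an assumption. In the XML translation file the ampersand has to be escaped as "&amp;".

```python
import xml.etree.ElementTree as ET


def accelerator(label):
    """Return the letter after the first single '&', upper-cased, or None."""
    i = 0
    while i < len(label) - 1:
        if label[i] == "&":
            if label[i + 1] == "&":   # assumption: "&&" is a literal ampersand
                i += 2
                continue
            return label[i + 1].upper()
        i += 1
    return None


# The XML parser turns "&amp;" back into "&" before we look for the shortcut.
element = ET.fromstring("<Skip>P&amp;omiń</Skip>")
skip_key = accelerator(element.text)
english_key = accelerator("&Skip")
no_key = accelerator("Autom. proponuj pierwszy znak")
```

With the corrected Polish label the shortcut letter is the "o" of "Pomiń", and a label without "&" has no shortcut at all.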

Janusz

14th May 2020, 14:36

Note: Applies to version 3.5.15 NEXT, beta 92.

Thank you for this change, Nikse555.

Error when building the <Unknown words> list.

https://drive.google.com/uc?export=view&id=1W5TpqxtLlct48xd2Fy9s7pQROFYuk2X1

Lines #40, #73 and #81 contain the word FBl, which should be FBI.
I had already added the word FBI to the "names.xml" dictionary earlier.
Using <Add pair to OCR replace list> I add the pair FBl -> FBI. I start OCR and get this:

https://drive.google.com/uc?export=view&id=1mUqAarUdIUSzQIfZMeQ2RfJa8ZcrE4G1

In the <Subtitle text> window you can see that the conversion has been made and the word is known. The green color of this line confirms it.
However, line #40: FBl still hangs in <Unknown words>, although #73 and #81 are gone.
Adding more word pairs works correctly - they do not appear again in the list. Well, unless the new word is not in the dictionary.
Line #40 in this particular case disappears only when I close the <Import / OCR Blu-ray ...> window and start the whole OCR process again.
But then another line, with a different word, becomes the first one and stays with us until the window is closed.
I also checked it for words added to the dictionary - the first line displayed with the unknown word does not disappear.

Edit 1
The duplicated first lines always appear on the second and subsequent scans of the file, in all 3 lists, also after changes made automatically
by the rules from the OCRFixReplaceList_User and OCRFixReplaceList files, or with the <Fix common OCR errors ...> option enabled in Options/Settings/Tools.
They do not appear for the automatic conversion of "l" (lowercase L) into "I" done by Subtitle Edit, but that conversion is not shown in any of the lists anyway,
except as an unknown word, when such a replacement creates a new incorrect word.

tormento

14th May 2020, 15:12

Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).
Thanks and please apply to abort too.

Nikse555

16th May 2020, 20:49

Latest beta has new (and hopefully improved) detection of space between italic letters: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip
Do let me know how it works! (it uses the value from "Set un-italic factor" in the list view context menu - probably normally between 0.22-0.32)
@tormento: thx for the test .sup files :)

Thanks and please apply to abort too.
I actually meant that it works for the "Abort" button ;)

@Janusz: I've fixed an issue related to your last post, but it's really hard to test without your exact setup/sup... could you make a .zip archive with all relevant files, if latest beta still has issues?

GCRaistlin

16th May 2020, 23:22

jlw_4049

17th May 2020, 00:52

The latest BETA struggles with ♪ characters very badly.

Nikse555

17th May 2020, 06:41

When performing OCR I am unable to add a proper match for the percent sign (https://mir.cr/10PHMJUD, # 68): SE recognizes its first part as "o". To add a better match, I deleted this "o" from the DB and ran OCR again. This time the first part was recognized as "O", and the 'Delete' button is inactive.

thx for the file :)
To fix "%", double-click in the list view in the main OCR window, then right-click in the list box in the "Inspect" window and choose "Add better multi match", then expand the image to cover the "%" sign:
https://nikse.dk/ocr-percent.png

The latest BETA struggles with ♪ characters very badly.
I probably need more info... subtitle + screenshots... you're using Tesseract for OCR'ing?

Nikse555

17th May 2020, 08:25

Shortcuts for the "OCR Character" window are:
https://nikse.dk/se-ocr-char.png

Expand selection: Alt + arrow right
Shrink selection: Alt + arrow left
Toggle italic: Ctrl+I (+ Alt+I depending on translation)
Toggle auto-submit-first-char: Alt+F (depending on translation)
Skip current letter(s): Esc (+ Alt+S depending on translation)
Skip entire subtitle: Ctrl+Shift+S (new shortcut)

tormento

17th May 2020, 11:37

Shortcuts for the "OCR Character" window are
Now that you made me think about it, it would be nice to be able to expand on the right and/or the left side. Sometimes the % character has bad OCR on the left and/or on the right too. The only thing I can do now is abort and input it manually. Two buttons such as

|←|expand|→|
|→|shrink|←|

or the same buttons changing function with the SHIFT key would be nice.

P.S: The red italic word on the right of the character is great. You could remove the one on top of the window now. :)

GCRaistlin

17th May 2020, 22:39

What does auto-submit-first-char do?

Nikse555

17th May 2020, 23:02

What does auto-submit-first-char do?

It will use the first key press as the OCR letter without waiting for a click on "OK" or a press of the "Enter" key.

I often set the error rate to zero when starting OCR of a new sub for the first 10-20 lines, in which case I add a lot of single letters, and that's much faster without having to press the "Enter" key or the "OK" button.
(you need to turn it off again, if the prompt is for a multi letter image, like "ft")

GCRaistlin

17th May 2020, 23:11

Nikse555
Thanks. I'd say a brief explanation of this option should be added to the UI, as well as for "add better multi match", since these options' names aren't self-explanatory.

GCRaistlin

17th May 2020, 23:25

thx for the file :)
To fix "%", double-click in the list view in the main OCR window, then right-click in the list box in the "Inspect" window and choose "Add better multi match", then expand the image to cover the "%" sign:

Something went wrong. I performed all the actions above, then reran OCR from this line - SE didn't ask me anything, but the percent sign is missing in the recognized line:
https://i112.fastpic.ru/thumb/2020/0518/f0/f29c6ed49b180fc586df360f20e75cf0.jpeg (https://fastpic.ru/view/112/2020/0518/f29c6ed49b180fc586df360f20e75cf0.jpg.html)

UPD: It seems that I didn't enter "%" into the field. It's worth checking that it isn't empty...

GCRaistlin

17th May 2020, 23:47

Bug(s):
Follow the steps above but add a wrong match, for example "@".
Start OCR from the same line, then interrupt it.
Call 'Inspect compare matches' window.
Delete the wrong match, add the right match, press OK.
Start OCR from the same line again.
You'll get 'VobSub - Manual image to text' window for the char you have just added a match for. And by the way the window title is incorrect - it's not the VobSub being recognized. But let's go further.
Press Abort, try to add multi match again. You'll get 'Image already in db' error.

Janusz

18th May 2020, 03:04

@Janusz: I've fixed an issue related to your last post, but it's really hard to test without your exact setup/sup... could you make a .zip archive with all relevant files, if latest beta still has issues?

I use Windows 10, 64 bit. For this test Subtitle Edit 3.5.15 NEXT, beta 106, and nOCR.
For the purposes of the test I am not using pol_OCRFixReplaceList_User.xml.
Settings.xml, pol_OCRFixReplaceList, test.sup, test.db (incomplete), test.nocr in janusz.test.zip for download. (https://drive.google.com/uc?export=view&id=1RkRsnLSHtUeOoArF9KH9Y9T7Bd9ioX5e)
Images for this file come from various subtitles, hence many duplicate characters, but this is not a problem.
These or other characters are to be interpreted (read) correctly. This is the assumption.

I know that the sup file for this test was generated from images containing some errors, and that the number of errors
(5 in 4 lines) says nothing about the number of errors in a text of a thousand or more lines.

https://drive.google.com/uc?export=view&id=1usEFhkMjzJz_1TuzgyGO7ioiY0-DxXn7

To begin with, an analysis of text created without a dictionary - that is, of how the program itself deals with OCR.

1. In lines 5, 6 and 7 we see "I", which we do not have in the character database. It could not have been added to the database for this example,
because the text does not contain "I" at all. So where does it come from? Suspicion falls on the program.
Browsing this forum (not everyone looks here), we find out that the program can replace l with I: at the beginning,
in the middle and, surprisingly, at the end of words written in lower case.
And also at the beginning of a paragraph or sentence - see line 5, where the dot in this case does not mean the end of a sentence.
Lines 6 and 7 were, in the original texts, a continuation of a sentence and should not be changed.
I will add that the words "lub" (or) and "lecz" (but) are used often in Polish, so for a normal, full text there will be many mistakes.

I believe that a program function which always runs must not generate errors for any selected language, and especially not when none is selected.
What have we gained? 3 errors instead of 0 (zero). With longer texts, the number of good replacements will always be smaller than
the number of errors, for a simple reason: statistically, "I" is less common than "l" at the beginning of words, and certainly
in the middle or at the end of words written in lowercase. Therefore, I would prefer that only errors arising in the OCR process be corrected.
Why do I need the extra ones?
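
One way to picture a more conservative rule than the one criticized above, as a hypothetical Python sketch (this is not Subtitle Edit's actual logic): change "l" to "I" only inside words whose other letters are all upper case, such as "FBl", and leave lowercase words like "lub" and "lecz" alone. The function name and the two-letter minimum are assumptions made for illustration.

```python
import re


def fix_lowercase_l(text):
    """Replace 'l' with 'I' only in words that are otherwise all upper case."""
    def repl(match):
        word = match.group(0)
        others = word.replace("l", "")
        # Require at least two other letters, all upper case (assumed threshold),
        # so that short names such as "Al" are left untouched.
        if "l" in word and len(others) >= 2 and others.isupper():
            return word.replace("l", "I")
        return word

    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    return re.sub(r"[^\W\d_]+", repl, text)


fixed = fix_lowercase_l("FBl lub lecz")
```

Under this rule ordinary Polish words starting with "l" are never touched, at the price of missing some genuine l/I errors in mixed-case words.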

2. Line 8. There are 2 cases of joined words here: "chybajuż" and "przynajmniejpod".
I can fix them by reducing [No of pixels is space] to 3. I get "przynajmniej pod" - that's OK, and so is the rest of the text above.
The phrase "chybajuż" splits into the two words "chyba już" only at 2. However, OCR now finds additional apostrophes,
which I had combined into one ["] when creating the character database at the beginning.
The effect: line 8 is OK, but the text above falls apart. The [No of pixels is space] parameter is too small,
hence my request in one of the previous posts for a separate space setting for italics.

Interesting fact: selecting [Inspect nocr matches for current image ...] on line 8 displays the text correctly, with appropriate spacing, for [No of pixels is space] = 2. Pressing OK does not save any changes to the text, however, because this window is only for characters in the database. It would be great if pressing OK saved these changes to the text, at least until the italics problem is solved globally.

OK. To deal with line 8 I return to the setting [4] and switch the dictionary to Polish.
The pol_OCRFixReplaceList.xml file already contains a <WordPart from = "j" to = " j" /> line in the <PartialWords> section
- this is OK for "chybajuż" - but let's see what happened with "przynajmniejpod".
Based on the comment for this section: the program added a space before "j" and found neither "przynajmnie"
nor "jpod" in the dictionary - such words do not exist in Polish. In my opinion, the repair should end at this stage and change nothing.
Why did the program split at "j" and also replace "p" with "j"? I could use <WordPart from = "j" to = "j " /> for this and similar expressions,
but in Polish, at least, such a conversion would split one correct word into two other, also correct, words, e.g. "najjaśniejszy" (brightest)
into "naj" (most) and "jaśniejszy" (brighter), so I can't use it. In addition, I would not see such a replacement in the [All fixes] or [Guesses used] lists, as opposed to the replacement of "p" with "j". That replacement is visible and can be quickly corrected manually.
Bottom line: the "improved" text has to be improved again, as in item 1.
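
The behaviour asked for in item 2 can be sketched roughly like this in Python. It is only an illustration under stated assumptions, not how Subtitle Edit applies <WordPart> rules: the function split_joined_word and the tiny word set are hypothetical. A split is accepted only if both halves are dictionary words, and a word that is already in the dictionary is never split.

```python
def split_joined_word(word, split_letter, dictionary):
    """Split a joined word at split_letter only if both halves are known words."""
    if word in dictionary:
        return word                      # a correct word is left alone
    for i, ch in enumerate(word):
        if ch != split_letter:
            continue
        for cut in (i, i + 1):           # try a space before, then after the letter
            left, right = word[:cut], word[cut:]
            if left in dictionary and right in dictionary:
                return left + " " + right
    return word                          # no valid split found: change nothing


# Tiny stand-in for a Polish dictionary (illustrative only).
polish_words = {"chyba", "już", "przynajmniej", "pod",
                "naj", "jaśniejszy", "najjaśniejszy"}

first = split_joined_word("chybajuż", "j", polish_words)
second = split_joined_word("przynajmniejpod", "j", polish_words)
third = split_joined_word("najjaśniejszy", "j", polish_words)
```

With this guard "przynajmniejpod" is only ever split into two real words, and "najjaśniejszy" stays in one piece even though "naj" and "jaśniejszy" are both valid.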

I saw in some files, e.g. dan_OCRFixReplaceList.xml, in the part concerning splitting into two words, entries such as
<WordPart from = "o" to = "e" />. What is this supposed to do, if not just a simple conversion of "o" to "e"?

3. Now lines 1 to 4. They look flawless, and they are. Please click [New] to create a new character database
with any name other than "test", then press [Edit], [Import], select our database "test.nocr", and press [OK].
We return to OCR, go to the first line and press [Start].
Result: during the import we lost all characters created by combining 2 or 3 adjacent characters, i.e. [''] which is ["], and [o/o] which is [%].
This is what it looks like. Characters added by expanding onto the adjacent character or characters
are, first, invisible in the database and, second, lost when importing into a new character database.

It's a lot, but I wanted to write more than just "not working".

Thank you for the dark background in [Set un-italic factor].
I wanted to ask for this for a long time.

varekai

18th May 2020, 12:52

Something went wrong. I performed all the actions above, then reran OCR from this line - SE didn't ask me anything, but the percent sign is missing in the recognized line:
https://i112.fastpic.ru/thumb/2020/0518/f0/f29c6ed49b180fc586df360f20e75cf0.jpeg (https://fastpic.ru/view/112/2020/0518/f29c6ed49b180fc586df360f20e75cf0.jpg.html)
UPD: It seems that I didn't enter "%" into the field. It's worth checking that it isn't empty...
WTF!! You are extremely obnoxious!
Are you really that stupid?
If you wanna post a link to your neverending images use imgur.com and point directly to the jpg
https://i.imgur.com/7jpu1si.jpg
or use imgur.html
https://imgur.com/a/dOBSfAA
Please stop using fastpic*ru it's awful!!
Grr...

Nikse555

18th May 2020, 14:50

@Janusz: Sorry, I've not really done any work with "nOcr (line ocr)"... I've mostly done stuff to improve "Binary image compare"
With latest beta ( https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip ) I get this result with your sup file:

https://nikse.dk/se-doom9-pol.png

Janusz

18th May 2020, 15:20

Thank you, Nikse555.
I also thought that nothing was happening with nOCR.
Please look again at line 8 on your side and at my point 2 above. Why this split, and why is "p" converted to "j"?

The nOCR method gave me the same result with latest beta 119

Nikse555

18th May 2020, 15:48

@Janusz: yes, thx :)
line 8 seems to be a bug - I'll look into it.

Nikse555

18th May 2020, 16:35

@Janusz: Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip
(also tried to fix the nOcr issues)
Works best with pixels-is-space = 3 for me...

Bug(s):
Follow the steps above but add a wrong match, for example "@".
Start OCR from the same line, then interrupt it.
Call 'Inspect compare matches' window.
Delete the wrong match, add the right match, press OK.
Start OCR from the same line again.
You'll get 'VobSub - Manual image to text' window for the char you have just added a match for. And by the way the window title is incorrect - it's not the VobSub being recognized. But let's go further.
Press Abort, try to add multi match again. You'll get 'Image already in db' error.

Thx, should also be fixed in above beta.

jlw_4049

18th May 2020, 16:36

I will try the next beta out.


Nikse555

18th May 2020, 18:50

@Janusz: And now really fixed the italic-space-stuff in nOcr: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

Janusz

18th May 2020, 21:07

@Janusz: And now really fixed the italic-space-stuff in nOcr: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

It's perfect now. Two lines in pol_OCRFixReplaceList.xml

<WordPart from = "ą" to = "ą " />
<WordPart from = "j" to = " j" />

divide expressions consisting of two or even three joined words into single words. With the earlier fix regarding "l" and "I", a text of 1189 lines, almost half of it in italics, is read almost 100% correctly. There are two mistakes left to fix. If you add them to your replacements, the accuracy will be 100%.
Really good work @ Nikse555. Thank you again.

GCRaistlin

18th May 2020, 22:08

Nikse555
The latest beta still allows adding an empty better multi match.

Could you please allow selecting a character by a right click in 'Inspect items' area of 'Inspect compare matches for current image' window? I mean along with showing the context menu.

varekai

19th May 2020, 07:05

This is the message that was sent from you:
***************
Who are you to speak to me this way?
These pics aren't for you.
Don't open them and relax if you never have heard about ad blockers.
***************

This is an open forum, when you post something, everyone can read/see your post.
We post images to show and clarify the issues we have when we want to report bugs and get input/help from the forum.
You make a clickable link and the whole idea of that is to make someone click on it, right?
Therefore you should spare the forum from ugly sites like fastpic*ru.
Of course I have many layers of protection for my computer, including AV, Firewall, AD- and Script-blockers and what not.
Not all in here have that protection.
How hard can it be for you to understand that? Really?
Do yourself and the forum a favour and use another imagehost, imgur*com is very good and easy to use and... it's adfree (almost)!
No nasty pics close to pron, no ads, no popup windows etc etc...
If you don't understand the difference...
This is it:
Link:
https://imgur.com/a/Lqb3rjn
Image:
https://i.imgur.com/DPDnOaG.png

varekai

19th May 2020, 11:33

@GCRaistlin
This is the message that was sent from you:
***************
If you don't understand that other visitors aren't interested in this discussion it's your problem.
Nobody else seems to care about Fastpic so don't try protecting those who don't need your protection.
And don't bother to address me again on the forum, you won't get any answer.
***************

tormento

19th May 2020, 11:54

Latest beta has new (and hopefully improved) detection of space between italic letters:
Enjoy with line 13 of this (https://www.mediafire.com/file/2lkuf2xhlxkp3vv/Apollo_13_eng.7z/file). :)

Melan

19th May 2020, 12:39

https://i.imgur.com/oePIeyR.png

I did it in 10 minutes.
http://www.mediafire.com/file/7daaf4nb889ffak/Apollo_13_eng.srt/file

tormento

19th May 2020, 16:45

I did it in 10 minutes.
And you did it wrong. :D

"of" is italic while in your OCR it is in normal style.

I am finding issues, not establishing OCR time records.

Would you please explain to me how line 1097 can contain the {\an8} marker?

I never noticed Subtitle Edit was capable of it.

Janusz

19th May 2020, 20:38

"of" is italic while in your OCR it is in normal style.

To make the text look good, [No of pixels is space] = 12 is needed, and this means that "of Apollo" is one word, "ofApollo", and as such the algorithm probably classified it as not italic. That's my guess - I don't know the algorithm. I do not know at what point it is split into two words, or on what basis. Probably this happens after selecting the "English" dictionary and enabling [Fix OCR errors] and [Try to guess unknown words].
If you change [No of pixels is space] to e.g. 8, you get 2 words, "of" in italics and "Apollo" not in italics, and "</i>" is inserted after "of", but with such a small space value the remaining text falls apart.
As you can see, this functionality still needs to be refined.
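
The effect of the threshold can be pictured with a small Python sketch. It is only an assumption about how such a setting might work, not the real algorithm: the glyph positions are made up, and whether SE compares with ">=" or ">" is unknown to me. A space is inserted wherever the gap between two glyphs is at least [No of pixels is space].

```python
def join_glyphs(glyphs, pixels_is_space):
    """Join (char, left_x, width) glyphs, adding a space where the gap is wide enough."""
    parts = []
    prev_right = None
    for char, left, width in glyphs:
        if prev_right is not None and left - prev_right >= pixels_is_space:
            parts.append(" ")
        parts.append(char)
        prev_right = left + width
    return "".join(parts)


# Made-up positions: 2 px between letters, 9 px between "of" and "Apollo".
glyphs = [("o", 0, 10), ("f", 12, 8), ("A", 29, 12), ("p", 43, 10),
          ("o", 55, 10), ("l", 67, 4), ("l", 73, 4), ("o", 79, 10)]

wide = join_glyphs(glyphs, 12)    # threshold larger than the word gap
narrow = join_glyphs(glyphs, 8)   # threshold smaller than the word gap
```

With the threshold at 12 the 9 px gap is not treated as a space and the two words stay joined; at 8 they are separated, but in a real subtitle the same value may start splitting other, tightly spaced text.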

Would you please explain to me how line 1097 can contain the {\an8} marker?
I never noticed Subtitle Edit was capable of it.

For some time now, Subtitle Edit has been adding this tag for me when importing subtitles from .ts files, for text placed at the top of the screen. I don't remember since which version.
Perhaps at that moment other, permanently embedded subtitles appear at the bottom of the screen.

Melan

19th May 2020, 20:51

@tormento
Don't be a child. If you think that more than 2,000 lines will not contain errors, you are wrong.
SE works really well.

Nikse555

20th May 2020, 12:20

@Melan: thx, I think SE works really well too. It's still nice to get feedback and ideas, as they might help make SE even better.

@tormento: Ah, did you set the proper "italic factor"? Right click in the list view, and choose "Set un-italic" factor (I think it's called). [No of pixels is space] = 13 worked fine for me I think.
SE can detect top align from Bluray .sup files - can be toggled via right click on the image... I've also added an on-video preview for each image - press Ctrl+P to see the subtitle on actual screen size.

@GCRaistlin
>The latest beta still allows adding an empty better multi match.
I think that "empty string" could be a valid text... perhaps a warning?

>Could you please allow selecting a character by a right click in 'Inspect items' area of 'Inspect compare matches for current image' window?
I don't follow... ?

Latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip

Janusz

20th May 2020, 14:02

@Nikse555

Is there any point in continuing to report nOCR bugs in this forum, since no one here is using this method?
As you wrote above, you recommend "Binary image compare", and nothing has happened with the nOCR project for a long time.
I would just ask you to fix the crash that occurs right at the start of the nOCR process when "no dictionary" is selected.

I get this error (latest beta 123 and several earlier ones) regardless of the program configuration.
In stable versions 3.5.14 and 3.5.15 this error does not occur. If you need any files, you can use those from 18/05/2020.
Changing the various options in the nOCR window, other than [Dictionary = none], does not affect the error.

https://drive.google.com/uc?export=view&id=1oPsaPYKK8HOOhclbrik2UIdkof-Sr0QG

Excerpt from error_log.txt

-----------------------------------------------------------------------------
Date: 05/19/2020 22:38:29
Message: Unable to load '' (also check libc.so.6 + libdl.so.2)
-----------------------------------------------------------------------------
Date: 05/19/2020 22:38:29
Message: Not all required methods was found in libvlc
-----------------------------------------------------------------------------
Date: 05/19/2020 22:52:21
Message: Unable to load '' (also check libc.so.6 + libdl.so.2)
-----------------------------------------------------------------------------

Nikse555

20th May 2020, 14:27

@Janusz: Is the crash fixed in this beta?
https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.15/SubtitleEditBeta.zip