Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
16th January 2022, 15:19 | #1521 | Link |
Banana User
Join Date: Sep 2008
Posts: 985
|
SE 3.6.4 fails to download any spell-checking dictionaries. I tried English and a few random ones.
__________________
InpaintDelogo, DoomDelogo, JerkyWEB Fixer, Standalone Faster-Whisper - AI subtitling |
19th January 2022, 21:35 | #1522 | Link | |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Quote:
@Janusz: Perhaps it's better with external images/files? Last edited by Nikse555; 20th January 2022 at 07:21. |
|
20th January 2022, 01:08 | #1523 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
The problem with the rules
<WordPart from = "W" to = "W " /> <WordPart from = "w" to = "w " /> is explained. The Polish dictionary contains the unused word "wżyć", and for this reason the word "wżyciu" was not split into the two words "w życiu" (in life). Sorry for the confusion. As for the change from lower case to a capital letter at the beginning of a paragraph, or from (".) to (...) at the end, that topic is still relevant. Once I have prepared proper examples, I will come back to the matter.
__________________
Sorry for my mistakes - I'm using a translator. |
20th January 2022, 22:14 | #1524 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
@Nikse555
Here are examples of how enabling "Fix common OCR ..." affects our text received during OCR. Sample files to download.
__________________
Sorry for my mistakes - I'm using a translator. Last edited by Janusz; 21st January 2022 at 15:10. |
21st January 2022, 22:53 | #1526 | Link | |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
I wanted to split the phrase "wżyciu" (inlife) into two words "w życiu" (in life) and it didn't work because, as it turned out, the phrase "wżyciu" (inlife) is in the dictionary so the rule <WordPart from = "w" to = "w " /> will not work in this case.
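The behaviour described above can be illustrated with a minimal Python sketch. It assumes, as the post suggests, that a partial-word rule is only tried on words the spell-check dictionary does not already accept. The function name and the tiny word sets are hypothetical and are not taken from Subtitle Edit's source.

```python
def apply_word_part_rule(word, rule_from, rule_to, dictionary):
    """Try a partial-word replacement, but only on words the dictionary rejects."""
    # Assumption: a word that is already in the dictionary is never touched.
    if word.lower() in dictionary:
        return word
    # Replace the first occurrence only, then accept the result
    # if every resulting part is a known word.
    candidate = word.replace(rule_from, rule_to, 1)
    parts = candidate.split()
    if parts and all(part.lower() in dictionary for part in parts):
        return candidate
    return word


# Hypothetical word lists standing in for the Polish dictionary.
with_entry = {"w", "życiu", "wżyciu"}
without_entry = {"w", "życiu"}

kept = apply_word_part_rule("wżyciu", "w", "w ", with_entry)
split = apply_word_part_rule("wżyciu", "w", "w ", without_entry)
print(kept)
print(split)
```

With the merged form present in the word list the rule is skipped, and without it the word is split, which matches what I observed.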
Quote:
In the second example, I made a mistake with the order of the characters: I wrote ." and it should be ". so in this case we will not get ... instead of ". This is especially frustrating when you create a rule to fix an error on a specific line and it works for that line, but after scanning the whole text you find that it does not work.
__________________
Sorry for my mistakes - I'm using a translator. Last edited by Janusz; 21st January 2022 at 23:12. |
|
25th January 2022, 03:45 | #1527 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
In addition to the previous post, I attach a new image describing the imperfections of text correction after OCR when the option [Settings/Tools/Fix common errors - also use hard-coded rules] is enabled.
My program version: 3.6.4 NEXT, beta 388.
The contents of the Dictionaries directory: apart from the standard English and Polish dictionaries, I have deleted the remaining files.
The contents of the zip file:
- ivon.source.srt - source file, used to create the sup and for comparison with the OCR result,
- ivon.source.sup - proper file with subtitles,
- ivon_60.12.8.131.250.nocr - character base (please set threshold = 131),
- ivon.d-on_f-on.srt - OCR result without any correction.
My OCR settings are as in the picture.
Files to download
Observations:
In addition to nOCR, I also checked:
Conclusions:
__________________
Sorry for my mistakes - I'm using a translator. |
25th January 2022, 03:54 | #1528 | Link |
Registered User
Join Date: May 2021
Posts: 16
|
I am using the "Subtitle Edit" app to convert PGS to SRT. In this example there is the word "ANNIE",
but the app reads it as "ANN IH". In the "inspect compare matches" option, how can I remove the space between "ANN" and "lH"? Any help, please? https://i.imgur.com/cuRJKjl.png |
25th January 2022, 05:12 | #1529 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
@iKron
1. Increase the number of pixels that counts as a space by 1 or 2 and check whether the gap disappears. If that is not enough, add more.
2. You have assigned the character H to the image of E. Change the assignment in the text field so that the image of E is set to E.
__________________
Sorry for my mistakes - I'm using a translator. |
25th January 2022, 05:26 | #1530 | Link |
Registered User
Join Date: May 2021
Posts: 16
|
@Janusz thank you. Setting the number of pixels for a space to 10 worked fine. I have another problem:
there is a space between two words, but they are merged. Is there any way to add the space? Please check the screenshot. The last word was converted by OCR to "ofAbed"; it is supposed to be "of Abed". It was working fine when I used a pixel space of 8. https://i.imgur.com/xh6igwk.png |
25th January 2022, 08:06 | #1531 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
@iKron
In this case, decreasing the space value will separate the words. Additionally, enable the [Try to guess unknown words] option, because the corrections may already be on the list of suggestions.
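To show why one spacing value cannot suit every line, here is a rough Python sketch of how a pixel threshold could decide where spaces go. It is only an illustration: the glyph coordinates are invented, and the `>=` comparison is an assumption, not the program's documented behaviour.

```python
def join_glyphs(glyphs, space_px):
    """Build text from (char, left_x, right_x) boxes sorted left to right.

    A space is inserted when the horizontal gap between two neighbouring
    glyphs is at least space_px pixels (assumed comparison).
    """
    text = ""
    prev_right = None
    for char, left, right in glyphs:
        if prev_right is not None and left - prev_right >= space_px:
            text += " "
        text += char
        prev_right = right
    return text


# Hypothetical boxes: the gap between "f" and "A" is 9 px, all others are 2 px.
glyphs = [
    ("o", 0, 8), ("f", 10, 15),
    ("A", 24, 34), ("b", 36, 43), ("e", 45, 52), ("d", 54, 61),
]

merged = join_glyphs(glyphs, 10)  # 9 px gap is below the threshold
separated = join_glyphs(glyphs, 8)  # 9 px gap now counts as a space
print(merged)
print(separated)
```

A wide gap inside a word and a narrow gap between words can fall on opposite sides of the same threshold, so the value that fixes one line can merge words on another.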
__________________
Sorry for my mistakes - I'm using a translator. |
25th January 2022, 18:25 | #1532 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
@Janusz: thx for the files - I've tried to improve the ocr fix engine here: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Better? |
25th January 2022, 18:53 | #1533 | Link |
Registered User
Join Date: May 2021
Posts: 16
|
@Janusz thank you for the output. I am really new to Subtitle Edit, and I am looking for a few suggestions. Is it a wise idea to add unknown words such as "BFFs" to the "user dictionary"?
https://i.imgur.com/znA1MO1.png What is the difference between "add to name/noise list" and "add to user dictionary"? And is there any way to disable this option? Whenever I finish in Subtitle Edit, a popup box appears. https://i.imgur.com/N1FKczy.png Last edited by iKron; 25th January 2022 at 20:45. |
25th January 2022, 23:41 | #1534 | Link | |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
Quote:
While looking for a way to recover lost characters quickly and reliably, I ran into an error in [Tools/Fix common error]: with the [Add missing quotes (")] option checked, the list of fixes does not show lines that contain a single ("). This can be checked in the current stable or beta version on our example. @iKron
__________________
Sorry for my mistakes - I'm using a translator. Last edited by Janusz; 26th January 2022 at 00:16. |
|
26th January 2022, 02:01 | #1535 | Link |
Registered User
Join Date: May 2021
Posts: 16
|
Thank you so much, Janusz. Two more questions:
when I converted a subtitle via nOCR I got a popup box with the options "Foreground" and "NOT foreground". What is the difference between "Foreground" and "NOT foreground"? Also, what is the difference between "OCR via nOCR" and "Binary image compare"? Lastly, is the Tesseract method good? Which method is best for converting the sub? |
26th January 2022, 08:14 | #1536 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
@iKron
__________________
Sorry for my mistakes - I'm using a translator. |
29th January 2022, 11:00 | #1537 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
|
I have issues with a left/right hearing-impaired sup file.
It places the sentences on the left or right side according to which actor is talking. I know that asking you to support {\an*} would lead to excessive programming work, as you have already stated. What would be useful is to fix subtitles with more than 3 lines by removing the CR only where a line ends with a comma or a space, and not where it ends with a full stop or the next line starts with a capital letter. Just try to OCR it and fix common errors and you will see what I mean: it mixes dialogue between different actors. The least I can ask is that the rule not behave in a dumb way. After that, some manual work will await me. Perhaps you could introduce some "special" characters to mark the left and right side, giving us an easier job with this kind of sup file.
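The rule I am asking for could look roughly like the Python sketch below. It is my own illustration of the idea, not how the program currently works, and the function names are made up: a line break is removed only when the previous line does not end a sentence and the next line does not start with a capital letter.

```python
def should_join(previous, current):
    """Join only when the previous line is unfinished and the next one continues it."""
    ends_sentence = previous.endswith((".", "!", "?"))
    starts_upper = current[:1].isupper()
    return not ends_sentence and not starts_upper


def merge_soft_breaks(lines):
    """Remove line breaks inside a sentence, keep breaks between speakers."""
    merged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if merged and should_join(merged[-1], line):
            merged[-1] = merged[-1] + " " + line
        else:
            merged.append(line)
    return merged


# Hypothetical four-line subtitle with two speakers.
result = merge_soft_breaks([
    "I told you,",
    "we are late.",
    "No, we are not.",
    "Hurry up!",
])
print(result)
```

Lines that begin with a proper name in mid-sentence would still stay unmerged under this rule, so some manual work remains, but dialogue from different actors would no longer be mixed.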
__________________
@turment on Telegram |
10th February 2022, 03:07 | #1538 | Link |
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
|
@Nikse
SE does not recognize a missing <WholeWords> section in _OCRFixReplaceList_User.xml.
After installing the program, the file "_OCRFixReplaceList_User.xml" is created the first time you use [Add pair to OCR replace list] during Import/OCR ... or [Add pair] to [OCR fix list] via Settings/Word lists. If we deliberately remove the <WholeWords> section from it for some reason and then forget about it, the program will not recreate the missing section; it still lets you add new word pairs, but they are not saved anywhere.
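The fix I have in mind is to recreate the section before writing to it. Below is a small Python sketch of that idea using the standard xml.etree.ElementTree module. The root element name OCRFixReplaceList and the child element name Word are my assumptions about the file layout; only <WholeWords> and the from/to attributes come from the files discussed here.

```python
import xml.etree.ElementTree as ET


def add_whole_word_pair(xml_text, wrong, right):
    """Add a from/to pair, recreating <WholeWords> if it was deleted."""
    root = ET.fromstring(xml_text)
    section = root.find("WholeWords")
    if section is None:
        # The section is missing: create it instead of silently dropping the pair.
        section = ET.Element("WholeWords")
        root.insert(0, section)
    # "Word" is an assumed element name for a whole-word pair.
    ET.SubElement(section, "Word", {"from": wrong, "to": right})
    return ET.tostring(root, encoding="unicode")


# Hypothetical damaged file: the <WholeWords> section has been removed by hand.
damaged = "<OCRFixReplaceList><PartialWords /></OCRFixReplaceList>"
repaired = add_whole_word_pair(damaged, "wżyciu", "w życiu")
print(repaired)
```

With a check like this the new pair would end up in the file even after the section was deleted.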
__________________
Sorry for my mistakes - I'm using a translator. Last edited by Janusz; 10th February 2022 at 03:16. |
10th February 2022, 10:11 | #1539 | Link |
Registered User
Join Date: Sep 2009
Posts: 1
|
Hi,
Is it possible to move the video forwards or backwards frame-by-frame in SubEdit, as it is in Aegisub? I couldn't find any reference to it and it is sometimes essential to avoid "flashing" subtitles on scene changes. |
12th February 2022, 10:28 | #1540 | Link |
Registered User
Join Date: Sep 2010
Posts: 34
|
Hi,
So I was trying to export some subs to SUP with Subtitle Edit. But no matter what font or style I choose, lines come out horribly misaligned. See: https://i.ibb.co/TgyRm7P/Untitled.png (Image as link because it's wide and breaks the forum layout)

Without apparent sense lines randomly appear higher or lower. The first window shows the desired height, the one the most lines are shown at. You can see the other three at varying heights. Double line, italics, caps, it seems it doesn't matter, it makes no sense. How do you make the bottom line in every picture appear at the same height? :-/

P.S.: I've tried some more. Depending on the font, more or less number of lines are shown aligned. For example Times New Roman is the most consistent, still a few lines are too high or low. Even if it worked it's a horrible font for subs though.

Edit 2: Some shitty fonts, like Tempus Sans ITC, seems perfectly in line. I scrolled through a lot of sub-pictures and they look pixel perfect. Ofc, it's an even more horrible font for subs. Seems it's a matter of having just the right font? Why can't it work with Arial? Weird. |