View Full Version : Subtitle Edit



tormento
19th July 2020, 09:38
could you give some line numbers?
Line 33 of ita_OCRFixReplaceList.xml

Nikse555
19th July 2020, 10:25
Line 33 of ita_OCRFixReplaceList.xml

Do you also have a line number in .sup file?

tormento
19th July 2020, 12:15
Do you also have a line number in .sup file?
1557

Even removing the OCR line that I told you, it's wrongly OCRing "I" as "i".

Nikse555
19th July 2020, 13:41
1557

Even removing the OCR line that I told you, it's wrongly OCRing "I" as "i".

I get
INDIGNAZIONE: I CINQUE MOTIVI
PER CUl O.J. SIMPSON SE L'É CAVATA

tormento
19th July 2020, 13:44
I get
INDIGNAZIONE: I CINQUE MOTIVI
PER CUl O.J. SIMPSON SE L'É CAVATA


WTF.

Apart from the wrong É (it should be È), it looks like your OCR doesn't have the same issue as mine.

Need to sort this thing out.

Perhaps some regional setting? I had problems with an AVS script some time ago.

jlw_4049
19th July 2020, 13:48
@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the italic letter accent U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.

@jlw_4049: If you cannot see the OCR method "nOCR" then you probably don't use SE 3.5.16?

I downloaded the latest beta recently. Maybe I need to delete everything and replace it.


Janusz
19th July 2020, 18:52
@jlw_4049
I described what needs to be done to access nOCR here: https://forum.doom9.org/showthread.php?p=1913645#post1913645

loninapleton
19th July 2020, 22:56
@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the italic letter accent U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.



Thank you. I knew I was missing something-- the tool bar part. I'll try it again. I must be in the right place. :-)

tormento
20th July 2020, 10:21
I get INDIGNAZIONE: I CINQUE MOTIVI PER CUl O.J. SIMPSON SE L'É CAVATA
OK, it was enough to delete *user*.xml and install the latest beta.

tormento
20th July 2020, 16:03
Update: I wrote a single-line SRT by hand and ran Fix common errors on it, leaving the OCR process out.

I found that if I save the letter "i" to Names (it_names_user.xml), Subtitle Edit wants to change "I" to "i".

I usually save single-letter words ("i", "a", etc.) to Names because they can't be found in the Italian dictionary and so stop the OCR processing.

Any idea how to solve this issue?

Boulder
21st July 2020, 07:27
I usually save single-letter words ("i", "a", etc.) to Names because they can't be found in the Italian dictionary and so stop the OCR processing.

Any idea how to solve this issue?

Why don't you add them to the dictionary? Then they won't stop the process.

tormento
21st July 2020, 08:40
Why don't you add them to the dictionary? Then they won't stop the process.


Did you read the first two lines?

loninapleton
25th July 2020, 19:14
A simple question on Subtitle Edit. I appreciate the previous help given to me as a new user.

My current problem is that I am in OCR hell. I followed a quick link
to find out why OCR opens on a SUB/idx pair demuxed (using Inviska with MKVToolNix) from a HandBrake DVD rip.

All I need (I think) is to convert the SUB/idx pair into SRT so
I can add some translations to the work. But I got lost in the OCR screen activities.

Are there a few simple steps to do this?

nekrovski
25th July 2020, 22:18
Too bad there's no Subtitle Edit for games :(

https://i.imgur.com/n2YwroY.jpg

https://i.imgur.com/2D6T6gH.jpg

loninapleton
28th July 2020, 07:39
A simple question on Subtitle Edit. I appreciate the previous help given to me as a new user.

My current problem is that I am in OCR hell. I followed a quick link
to find out why OCR opens on a SUB/idx pair demuxed (using Inviska with MKVToolNix) from a HandBrake DVD rip.

All I need (I think) is to convert the SUB/idx pair into SRT so
I can add some translations to the work. But I got lost in the OCR screen activities.

Are there a few simple steps to do this?

Progress. I downloaded Tesseract 5 and started the OCR operation which will run for a while.

Press OK, return to the main screen, then File > Save As > SubRip.

Success.

A fine program I will have to explore again.

loninapleton
1st August 2020, 01:32
An additional problem I'm working on.

Can Subtitle Edit join the first and second parts of an edited MKV?

The recode was made from DVD originally and then I added an act break--
like being at the theatre. I've had a request for subs for that piece.

I can redo from the DVD from scratch or begin a join of the parts one and two.

The subtitles extracted from my MKV are SUB/idx, like the initial item I asked about.

Can either the OCR version or an SRT made with Subtitle Edit's features
combine into one SRT?

Lucius Snow
6th August 2020, 18:18
Hi guys,

I need your help urgently because I must deliver an EBU N19 (STL) file to a channel. Their software reported the following error: "MISSING STARTBOX!"

Do you know what I am missing in the EBU properties during the export?

Thank you very much.

Nikse555
6th August 2020, 18:54
Hi guys,

I need your help urgently because I must deliver an EBU N19 (STL) file to a channel. Their software reported the following error: "MISSING STARTBOX!"

Do you know what I am missing in the EBU properties during the export?

Thank you very much.

Could you try: Display standard code = 1 Level-1 teletext ?

Lucius Snow
6th August 2020, 19:00
Could you try: Display standard code = 1 Level-1 teletext ?
That's the one I already use.

Nikse555
6th August 2020, 19:04
OK, in "Text and timing information" - do you have "Use box around text" checked?

Lucius Snow
6th August 2020, 19:45
OK, in "Text and timing information" - do you have "Use box around text" checked?
Nope. By the way, I downloaded a program called EBUSTLViewer, which reported 5 errors in the attached file. They appear with or without "Use box around text" checked.

I re-exported the EBU STL from another program and these 5 errors disappeared. I don't know if they're linked to the "start box" issue reported by the TV channel.

Nikse555
7th August 2020, 18:06
Nope. By the way, I downloaded a program called EBUSTLViewer, which reported 5 errors in the attached file. They appear with or without "Use box around text" checked.

I re-exported the EBU STL from another program and these 5 errors disappeared. I don't know if they're linked to the "start box" issue reported by the TV channel.

Could you perhaps upload to some file share site?

Lucius Snow
9th August 2020, 15:45
Could you perhaps upload to some file share site?
Actually, there are two different issues:

1/ The new export with "Use box around text" seems to work according to the TV channel (waiting for confirmation though).

2/ The errors reported by EBUSTLViewer come from the timecodes converted from milliseconds to frames. For example, the software would read 00:01:02:25 for a 25 fps file, which is incorrect (valid frame numbers at 25 fps run from 00 to 24). I had to adjust the timecodes very slightly to get rid of the errors.

EDIT: I confirm the error was due to "Use box around text" unchecked. The TV channel has now accepted the file.
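
The frame overflow in the millisecond-to-frame conversion can be illustrated with a minimal Python sketch. It is not Subtitle Edit's code: the function name `ms_to_timecode` and the decision to carry an overflowing frame count into the next second (rather than clamp it to the last frame) are assumptions made for the example.

```python
def ms_to_timecode(total_ms: int, fps: int = 25) -> str:
    """Format milliseconds as HH:MM:SS:FF, keeping FF in the range 0..fps-1."""
    seconds, ms = divmod(total_ms, 1000)
    frames = round(ms * fps / 1000)
    if frames >= fps:
        # Rounding reached a frame number that does not exist (e.g. 25 at 25 fps):
        # carry it into the next second instead of writing an invalid timecode.
        frames = 0
        seconds += 1
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02}:{minutes:02}:{secs:02}:{frames:02}"


overflow_case = ms_to_timecode(62_990)  # 990 ms would otherwise round to frame 25
normal_case = ms_to_timecode(62_480)    # 480 ms is exactly frame 12 at 25 fps
```

Using floor division (`ms * fps // 1000`) instead of `round` would also keep the frame field below `fps`, at the cost of always truncating.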

loninapleton
13th August 2020, 07:56
An additional problem I'm working on.

Can Subtitle Edit join the first and second parts of an edited MKV?

The recode was made from DVD originally and then I added an act break--
like being at the theatre. I've had a request for subs for that piece.

I can redo from the DVD from scratch or begin a join of the parts one and two.

The subtitles extracted from my MKV are SUB/idx, like the initial item I asked about.

Can either the OCR version or an SRT made with Subtitle Edit's features
combine into one SRT?

I came back to this because I saw that I had left it hanging. There is actually a fix for this using MKVToolNix: the two MKV parts can be joined together with their subtitles, then you demux the new single subtitle track in MKVToolNix, _and_ the numbering and timing will be adjusted through to the end of the file. I'll see if I can find the exact post at videohelp.com with the solution.

time passes....

Here is the link I mentioned:

https://forum.videohelp.com/threads/191107-Help-Joining-Two-srt-Files-As-One

Post #16 has the specific technique, described carefully step by step. The whole thread is
pretty useful.
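
For text-based subtitles, the time adjustment that such a join involves can be sketched in Python: the cues of the second part are shifted by the length of the first part and everything is renumbered. This is an illustration only, neither the MKVToolNix procedure from the linked post nor Subtitle Edit's own code. The names `join_srt` and `shift_time` and the one-hour offset are hypothetical, and the sketch assumes well-formed SRT with LF line endings.

```python
import re

# Matches one SRT timestamp such as 00:01:02,500
TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_time(match, offset_ms):
    """Return the matched timestamp moved forward by offset_ms."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def join_srt(part1: str, part2: str, offset_ms: int) -> str:
    """Append part2 to part1, shifting part2's times and renumbering all cues."""
    blocks1 = [b for b in part1.strip().split("\n\n") if b.strip()]
    blocks2 = [b for b in part2.strip().split("\n\n") if b.strip()]
    out = []
    for number, block in enumerate(blocks1 + blocks2, start=1):
        lines = block.split("\n")
        timing = lines[1]
        if number > len(blocks1):  # this cue comes from the second part
            timing = TIME.sub(lambda m: shift_time(m, offset_ms), timing)
        out.append("\n".join([str(number), timing] + lines[2:]))
    return "\n\n".join(out) + "\n"


part1 = "1\n00:00:01,000 --> 00:00:02,000\nAct one.\n"
part2 = "1\n00:00:01,500 --> 00:00:03,000\nAct two.\n"
joined = join_srt(part1, part2, offset_ms=3_600_000)  # assume part one lasts 1 hour
```

The offset has to be the real duration of the first video part; the sketch does not try to detect it.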

loninapleton
13th August 2020, 08:25
A new question for me is: can a movie with hardcoded subs on the image be defeated with a player that will display a new subtitle
in the black border outside the image?

The movie is old and perhaps direct copied from VHS. There is no
subtitle file listed in MediaInfo. It is subtitled at the source. But there is an SRT for the film which could be used to make translations into other languages, if it can be displayed properly in VLC, Daum, MediaPlayer Home edition, etc.

Nikse555
13th August 2020, 09:27
The errors reported by EBUSTLViewer come from the timecodes converted from milliseconds to frames. For example, the software would read 00:01:02:25 for a 25 fps file which is incorrect.

That's a bug, thx :)
Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.16/SubtitleEditBeta.zip
( code fix is here: https://github.com/SubtitleEdit/subtitleedit/commit/d3ab44a8aa3df8ce691e1de8cc90a7fa95bf9bf0#diff-392f0be95fc0e084b6820e7e18cc9957L703-L707 )


@loninapleton: SE has "Tools -> Append" and "Tools -> Join", but they only work with text-based subtitles.

Nikse555
13th August 2020, 09:32
can a movie with hardcoded subs on the image be defeated with a player that will display a new subtitle
in the black border outside the image?

Yes, you could use the ASS format with a box. If the alignment is not the same for all subtitles, it's probably better to use Aegisub (and a lot of time).

loninapleton
14th August 2020, 00:44
Erf. Thanks for answering. I'm not good enough at the simplest tasks, much less at creating a box. I thought Media Player or one of those had an option for subtitles outside the frame. I don't care if the hardsubs show. Saying 'defeated' was not accurate, as if I were trying to scrub them; I just want to provide an option for translations in SRT or another format.

Here is what I see: the hardsubs are visible, but VLC shows no subtitle options (the list is empty). So can I put an optional SRT down below, selectable as an external sub file? I was just looking around at opensubtitles etc. and found an SRT for this video/old movie. It is a classic, though, just never updated. VLC may not be the right option; it's just what I use, and I know how to add subs from its player menu.

robena
16th August 2020, 09:51
I was using SubtitleEdit-3.5.13, and now upgraded to SubtitleEdit-3.5.16, so I'm not yet sure if my problems are solved or not.

I upgraded recently from a 6 year old 6 core on Windows 7 to a 3 times faster ASUS WS X299 SAGE/10G + Intel Core i9-10900X overclocked at 4.7 GHz on all cores + Windows 10.

I hate Windows 10. It took them several years to make it usable, and that's only by using WindowBlinds to skin it and DPI Awareness Enabler to get something where most apps are not fuzzy on a high-DPI monitor.

The problem is that with a 3 times faster system, Subtitle Edit is much, much slower.

I use an AutoIt macro to do repetitive tasks. With the old system there were no problems; with the new one, SE was so slow that the macro was falling out of sync with the various windows, and I had to redesign it to read their header content rather than Sleep for a reasonable amount of time until the task completes.

I also had to write a C++ program that elevates itself to REALTIME_PRIORITY_CLASS before calling SE, which then also runs in REALTIME_PRIORITY_CLASS, to make SE responsive enough to work well with the macro.

That problem is (painfully) solved, but I have another one.

I often have whole series to OCR.

What I do is start, for example, 13 SE OCR windows, launch OCR in each (useful to have 10 cores!), go watch something, and come back 60 minutes later to make the manual corrections.

But SE does not like AT ALL having 13 windows open, even when they have all finished OCR.

When I type in an Unknown Word window, SE easily takes 5 seconds to respond; it's excruciatingly slow. When there is only one window left, response time is less than 1 second.

Any idea why I get that on this new system, and how to speed up the time response?

Edit: I just tried with SubtitleEdit-3.5.16 and can confirm that the problem is still there. With 12 windows open, clicking on an Unknown Word window makes SE respond in 8 seconds, even though I am only using 25% CPU and half of my 64 GB of memory. OCR also seems to be much slower with Tesseract 5.0 than with 4.0.
The sup file is located on a 3500 MB/s NVMe SSD. I don't think you can have a PC much faster than mine, save a new one with a PCIe 4.0 bus, which I'm sure would make no difference.

Thanks!

Edit: I double-checked by running a lot of OCR windows on an old Windows 7 PC. Clicking on an error brings you to the faulty line in less than half a second. SE has a BIG performance problem on Windows 10.
It's not specific to my config. A few years ago, the first time I tested Windows 10, using a script to open 10 windows almost simultaneously froze Windows, and I had to reboot. The same worked perfectly with Windows 7.

junah
17th August 2020, 13:18
I'm pretty sure it's not a fault of Subtitle Edit.

Emulgator
18th August 2020, 19:13
Win10 trying to keylog things? Muuuhahahaaa...

"It took them several years to make it usable",
yes, I am with you regarding both XP and 7, and hopefully I won't have to do the same again with 10 too soon.

WinXP32ProSP3 and Win7U64SP1 here.

nekrovski
21st August 2020, 15:14
Can anyone help me with Subtitle Edit's regex?
I would like to use Find, to find only double lines that both start with a dash -

Nikse555
24th August 2020, 11:39
@robena:
SE calls tesseract.exe for each image, and tesseract.exe itself uses multithreading. Running multiple OCR windows with Tesseract will probably use up all threads pretty fast.
Using one of the other OCR methods will give better results for you when running in parallel.

>The problem is that with a 3 times faster system, Subtitle Edit is much much slower.
You're talking about Tesseract 5 vs Tesseract 3? Yes, that's probably correct.


@nekrovski: You can try this:
-.+\n-

nekrovski
24th August 2020, 15:12
@nekrovski: You can try this:
-.+\n-
Thanks a lot, works.

loninapleton
25th August 2020, 23:14
I had a DVD rip which showed a VOB sub. It shows in programs like MKVmerge but won't display in Daum or VLC. Where did it go?
I used Subtitle edit to extract the VOB and save it as SRT for subtitle compatibility.

The workaround I have tried is delete the VOB sub in the
original then recode with Handbrake adding in the SRT. It's coding now.

What makes this so odd is the Text from the VOB sub looked fine and complete as an SRT format viwing it in Notepad++.

Janusz
26th August 2020, 00:43
@nekrovski, @Nikse555

Can anyone help me with Subtitle Edit's regex?
I would like to use Find, to find only double lines that both start with a dash -

@nekrovski: You can try this:
-.+\n-

In my opinion, "\A" should be added before the expression "-.+\n-". Then the first "-" is sure to be matched only at the very beginning of the text, not in the middle of a line.
The entire expression would be "\A-.+\n-".
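
The difference between the two patterns can be checked with a short Python sketch. Python's `re` module is used only for illustration (Subtitle Edit is a .NET program, and the sketch assumes both engines treat `.`, `\n` and `\A` the same way here); the sample texts are made up.

```python
import re

# Hypothetical subtitle texts, one string per subtitle, lines joined with "\n".
samples = [
    "- Hello.\n- Hi there.",      # both lines start with a dash
    "Well - maybe.\n- Fine.",     # dash in the middle of the first line
    "A single line - with dash",  # one line only
]

loose = re.compile(r"-.+\n-")     # the first dash may sit anywhere in line 1
strict = re.compile(r"\A-.+\n-")  # \A pins the first dash to the start of the text


def matching(pattern, texts):
    """Return the texts in which the pattern is found."""
    return [t for t in texts if pattern.search(t)]


loose_hits = matching(loose, samples)
strict_hits = matching(strict, samples)
```

Here `loose_hits` contains the first two samples, while `strict_hits` contains only the first one, because the second sample has its first dash in the middle of the line.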

nekrovski
26th August 2020, 12:51
@nekrovski, @Nikse555

In my opinion, "\A" should be added before the expression "-.+\n-". Then the first "-" is sure to be matched only at the very beginning of the text, not in the middle of a line.
The entire expression would be "\A-.+\n-".
Thank you.

This is gonna sound super nitpicky, but sometimes the "break long lines" option does this to a long line:
Though this trip to Tochigi was pretty far,
too.
The "too" goes to a second line, and I really dislike it when there's something really long on the first line and only a single word on the other.

Is there a way to prevent this without checking manually in the "fix common errors" window? When there's only a handful of "break long lines" suggestions, I can check them. But when there are 50 or so, checking each one manually puts a strain on my eyes/brain.

So as a workaround, after I apply "break long lines", I'm looking for an option that will let me find/search/display only the two-line subtitles in which there's a significant difference between the number of characters in each line. And, if possible, let me specify the difference.

Is there such a thing?
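
A minimal sketch of such a check, done outside Subtitle Edit in Python. The function name `unbalanced` and the `min_diff` threshold are hypothetical, and the subtitle texts are assumed to be already extracted into a list of strings with lines separated by `\n`.

```python
def unbalanced(subtitles, min_diff=20):
    """Return (number, text) for two-line subtitles whose line lengths
    differ by at least min_diff characters."""
    hits = []
    for number, text in enumerate(subtitles, start=1):
        lines = text.split("\n")
        if len(lines) == 2 and abs(len(lines[0]) - len(lines[1])) >= min_diff:
            hits.append((number, text))
    return hits


subs = [
    "Though this trip to Tochigi was pretty far,\ntoo.",  # 43 vs 4 characters
    "- Hello.\n- Hi there.",                              # 8 vs 11 characters
    "Just one line",
]
result = unbalanced(subs, min_diff=20)
```

With these samples only the first subtitle is reported; raising `min_diff` above the 39-character gap makes the list empty.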

robena
27th August 2020, 14:39
@robena:
>The problem is that with a 3 times faster system, Subtitle Edit is much much slower.
You're talking about Tesseract 5 vs Tesseract 3? Yes, that's probably correct.

No, I'm talking about the fact that when OCR is finished, each window on Windows 10 takes more than 5 seconds to react when I click on a sentence needing manual input.

Even Tesseract 3 is slower on Windows 10, by the way, but that's not a big problem: it's not something I do interactively, I do something else until it's finished.

What is insufferable is on Windows 10:

1) I click in a sentence needing manual correction

2) It may take up to 8 seconds before the window reacts and I can work.

That happens only when many OCR windows are opened at the same time.

My system has 64GB of memory and 20 threads, I use less than that, so the problem is elsewhere.

That does not happen with other software. I can have 10 Firefox windows open with 10 tabs each, and clicking on a tab switches to it instantly.

That does not happen with Windows 7.

Janusz
28th August 2020, 09:10
@Nikse555
On August 25, 2020 on the main page of the program in the comment to version 3.5.16 @MagratG wrote:
"Query: My temp directory is filling up with 1000s of png files,
the subtitle images, that are not being deleted after closing the program. "

Looking at my "temp" directory, I can see that SE automatically creates files with similar names,
e.g. a9474388-3f2f-4ae9-b73b-5bff0e0bec39.ass, which are also not deleted after exiting the program.

These files are created if "mpv" is selected as the video engine in the program options
and only if the "mpv handles preview text" option is checked.
Sometimes several different files with the same content are created in quick succession for one and the same subtitle.

****************
Edit 30/08/2020
I checked the 3.5.16 Beta 134 version - the described problem no longer exists.
Thank you.

Nikse555
31st August 2020, 17:33
@Janusz: Cool, thx for reporting/testing :)
fixed via this commit: https://github.com/SubtitleEdit/subtitleedit/commit/d95c64833eca550fbdcce20b0204e063cbcb9ff7

Janusz
11th September 2020, 00:01
@Nikse555
Bug in the stable version of Subtitle Edit 3.5.16 and above.
"OCR auto correction" does not respect the options you set.

https://drive.google.com/uc?export=view&id=1-ty62o9B6C3PqDGnHg1zSmQ2ybzsF9Cw

As you can see in the picture - except for the dictionary - the other "OCR auto correction" options are disabled,
and yet the OCR program made 13 corrections, although it should not. All fixes can be seen in the [All Fixes] tab.
The situation described occurs only for italics. See lines 521 and 524. It always happens,
regardless of whether I use pol_OCRFixReplaceList.xml or not.
I checked other texts with and without italics - the problem is with all files.
I also checked the stable version 3.5.15 - the problem does not occur.

The remaining tabs: [Guesses used] and [Unknown words] are filled in as expected.
[Guesses used] is empty and [Unknown words] contains unrecognized words.

For those who do not know Polish, the good news is
that all corrections were made flawlessly.


Edit 17-09-2020

I checked Subtitle Edit beta 184 - the "All fixes" list is no longer populated for the case described above.

Another problem has arisen, concerning the "Subtitle text" window.
In the picture above, with a language selected, lines that OCR recognized completely without errors have a green text background,
lines with whole unrecognized words have a yellow background, and lines with unrecognized single characters have a brown background.
This lets you quickly locate an error line and its type visually. And that's great.
In beta 184 this is lost: despite selecting a dictionary in the [Dictionary] field, all text from the first to the last line has a white background, as if no dictionary were specified (None).
Background coloring is only restored when "Fix OCR errors" is checked. Until now, this worked without having to select that option.

tormento
11th September 2020, 15:57
I was playing with the Tesseract sources and compiled an x64 build.

No time to test, take it (http://www.mediafire.com/file/y7wbyba9oxkjse7/file) as it is. :)
tesseract 5.0.0-alpha-781-gb19e3
leptonica-1.81.0
libjpeg 8d (libjpeg-turbo 2.0.5) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11
Found AVX
Found SSE
Found OpenMP 201511
Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.5

loninapleton
12th September 2020, 00:29
The latest version 3.5.16 which I downloaded just for updates shows a new screen that pops up for OCR which I don't know how to use.

Is it preferable to demux the VOB and avoid this screen rather than trying to drag and drop an MKV which is what I did?

loninapleton
19th September 2020, 05:07
The latest version 3.5.16 which I downloaded just for updates shows a new screen that pops up for OCR which I don't know how to use.

Is it preferable to demux the VOB and avoid this screen rather than trying to drag and drop an MKV which is what I did?


I am the OP. I have fixed things. I did a fresh install with the
translation box un-ticked, Tesseract 5 selected and downloaded
for VOB, and English installed as the dictionary language.

Maybe someone can say whether ticking the translation box activated that
pop-up screen I did not know what to do with. I had an older copy on a different machine and reverted to that, looking for
differences.

Janusz
23rd September 2020, 22:33
@Nikse555

A.
In my opinion, Subtitle Edit is not properly managing the computer's RAM.
I prepared the description for version 3.5.16 beta 222.
The version doesn't matter. I checked previous stable versions 16, 15, 14 all the way to 10.
With the same operations, the results are similar everywhere. But it gets worse from version to version,
so that in version 10 it takes up to 2 GB of RAM during the first loading.

The very launch of the program is ok. The RAM occupancy increases slightly from version to version.
It is known that the program is growing, new functions are added, and this requires space.

The RAM occupancy is based on the Task Manager.
Test file: mpeg-ts contains 1 video stream, 2 audio streams, 1 stream with DVBSUB subtitles
File size: 9,793,003 KB.

After starting, in my case, the program takes up 20.3 MB of RAM,
1. When I drop the ts file on the main program window, parsing of the file starts.
After it completes, the Import / OCR Vobsub ... window opens. - RAM = 88.4 MB.
In the window I choose [Cancel] and go back to the main window - RAM = 88.4 MB !!! Why?
2. I do the same as in step 1 again.
The RAM usage drops to 78 MB, and by the time the Import / OCR Vobsub ... window opens, it shows RAM = 131.8 MB.
I choose [Cancel] again, and the program still occupies 139 MB of RAM.
If I keep repeating the operations from step 1, I will eventually use up all RAM.
The program also does not release the memory if I select [OK] and then remove the subtitles from the main window by selecting [File / NEW].
In this case, the program takes up RAM even faster.

B.
The second thing is about parsing the file itself. While the file is being parsed, the progress is shown in %.
With every file, I get situations where the progress counter stops - the numbers stop changing.
During this time, the system message "(no response)" is displayed next to the program name and version in the program title bar.
The program nevertheless continues to work, because after a shorter or longer time the progress jumps by a few,
or even several dozen, % more. I get one to several such stalls during the file analysis.
The file analysis itself works and completes fine, but these counter stops and messages are annoying.

Janusz
27th September 2020, 16:57
Subtitle Edit beta 232 crashes when trying to import ts file.
The same file in beta 222 opens correctly.
https://drive.google.com/uc?export=view&id=1TEgOKNWGw3Wd0oHPumt03PLsbHfotPVc

Nikse555
28th September 2020, 11:46
@Janusz: Do you still get the crash in the latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.16/SubtitleEditBeta.zip ? (beta 240)
The beta also fixes an issue where BD sups lost overlapping subtitles: https://github.com/SubtitleEdit/subtitleedit/issues/4392
In general, .NET programs do not manage memory release themselves; the garbage collector decides when memory is freed.

Janusz
28th September 2020, 14:13
@Janusz: Do you still get the crash in latest beta 240?

Thank you @Nikse555, in beta 240 the earlier file opens correctly. I also checked a few other ts files - they also open without problems.

Nikse555
29th September 2020, 11:26
Thank you @Nikse555, in beta 240 the earlier file opens correctly. I also checked a few other ts files - they also open without problems.

Cool, thx for testing :)

von Suppé
30th September 2020, 09:03
Hi Nikse555,

I don't know if this has already been addressed, but now that I think about it:

Is it possible to load a SUP and/or XML/PNG file into SE without OCR-ing it, only to adjust the timecodes? And after that, export back to SUP or XML/PNG, without changing the original subtitle images and their X/Y coordinates. Preferably, of course, with real-time monitoring in the preview with a chosen video.

I would be very happy if that's possible.
Cheers