View Full Version : Subtitle Edit



shag00
18th September 2019, 23:20
With so many text-processors, sinking this software into an infinite sea of requests is a bit counterproductive.

Is this in response to the above post or just a random thought on people asking for product enhancements?

Ghitulescu
19th September 2019, 08:43
Is this in response to the above post or just a random thought on people asking for product enhancements?
This is a gentle reminder of a feature I asked for some 5 years ago, thinking that maybe the author got caught in a repetitive loop. At that time it was the only software that could deal with images in BD/M2TS format.
OCR is performed OUTSIDE of this software, and if one half of the users require a -dash before the first line of dialogue and the other half require exactly the opposite, and this goes on and on for the aforementioned period of time, well, ... :)

Nikse555
20th September 2019, 21:08
@Matt: Atm, if the duration is longer than "max duration" in settings, then SE just sets the duration to 3.5 secs - I just changed the code to use "max duration" instead of 3.5 seconds - but perhaps it would be better to just let the user fix that afterwards?
You can set the "max duration" to something large like 20 seconds to allow long duration.
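
A rough illustration of the change described above, in Python. This is only a sketch and not SE's actual code; the function name and the use of milliseconds are assumptions made for the example.

```python
def clamp_duration(start_ms, end_ms, max_duration_ms):
    """Return an end time whose duration does not exceed max_duration_ms."""
    duration = end_ms - start_ms
    if duration > max_duration_ms:
        # Behaviour described in the post: cap at the configured
        # "max duration" instead of a fixed 3.5 seconds.
        return start_ms + max_duration_ms
    return end_ms


# With "max duration" raised to 20 seconds, a 12-second caption is left alone.
print(clamp_duration(1000, 13000, 20000))  # 13000
# With a max of 8 seconds the same caption is shortened to 8 seconds.
print(clamp_duration(1000, 13000, 8000))   # 9000
```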

@Ghitulescu: Most changes to SE are from people asking for product enhancements, but I do really try to be careful about what features I add - and I don't want to add everything to SE (like advanced ASS features as we already have Aegisub for that or features that will have very few users). SE focuses on creating/adjusting/fixing/reviewing text subtitles and ocr'ing/converting + translating - the last one is ever changing :(. New subtitle formats are also very important (like new stuff from Amazon, Netflix or w3c).
In the last five years many improvements have been made in reading bitmap formats from mp4/ts/m2ts/bdsup and converting between/to bitmap based formats - read more here: https://raw.githubusercontent.com/SubtitleEdit/subtitleedit/master/Changelog.txt
What was your original request by the way?

Matt Kirby
21st September 2019, 11:22
@Matt:
You can set the "max duration" to something large like 20 seconds

Thank you! That works...

Ghitulescu
23rd September 2019, 10:51
What was your original request by the way?

I kindly asked for a possibility to change the colours (maps) of the SUP, sort of what DVDSubEdit did for DVD. Essentially, I need to get rid of the stupid black box surrounding the text, to make it transparent.

Nikse555
28th September 2019, 07:33
...I need to get rid of the stupid black box surrounding the text, to make it transparent.

Could you post/link/email a sample subtitle?

Ghitulescu
2nd October 2019, 21:35
Could you post/link/email a sample subtitle?

I don't know how to cut full streams comprising subtitles (and to keep them in), therefore I needed 3 days to delete 2000 images one-by-one in order to fit the 200kB limit of doom9:

nekrovski
6th October 2019, 10:36
Is there a way to fix capitalization on the next line when there wasn't a full stop/question mark/exclamation mark etc. on the previous line?
https://i.imgur.com/Tc3Ofp1.jpg

Nikse555
6th October 2019, 14:19
@nekrovski: Try Tools -> Change casing... Normal casing
You can do a compare afterwards to see the changes.

Nikse555
8th October 2019, 06:53
@Ghitulescu: Perhaps you can use some file host for the sample file or email it to me?
In Bluray sup files each subtitle has its own palette (up to 255 colors), and vobsub normally has a global 4 color palette.

Ghitulescu
8th October 2019, 14:58
In Bluray sup files each subtitle has its own palette (up to 255 colors), and vobsub normally has a global 4 color palette.

I know this :(

nevertheless it's easy to find out the background color index entry.

nekrovski
9th October 2019, 09:37
@nekrovski: Try Tools -> Change casing... Normal casing
You can do a compare afterwards to see the changes.
Thank you, this worked.

Verminaard
10th October 2019, 14:22
I have a question: When I use "remove text for hearing impaired" it removes the unnecessary SDH tags fine but it adds speaker dashes to some lines. What could be the reason for this? Version 3.5.10.

Nikse555
11th October 2019, 06:04
@Ghitulescu: Sorry, the attached subtitle is pretty impossible to do anything about... sometimes the text is completely transparent and sometimes the text is just as transparent as the box.

@Verminaard: Could you give some examples? Speaker dashes should be added sometimes...

nekrovski
11th October 2019, 09:22
I opened the non hearing impaired subtitles from the new Breaking Bad movie, and it didn't find any common errors.
This is the first time I see an original subtitle in which Subtitle Edit doesn't find any common errors.
I was appalled.
Did they, by any chance, use Subtitle Edit for the subtitles? :D

Nikse555
11th October 2019, 09:38
@nekrovski: that may be possible: https://partnerhelp.netflixstudios.com/hc/en-us/articles/115000258712-Subtitle-Edit

Ghitulescu
11th October 2019, 17:45
@Ghitulescu: Sorry, the attached subtitle is pretty impossible to do anything about... sometimes the text is completely transparent and sometimes the text is just as transparent as the box.

It's one of those dynamic palettes, I know, but it's very simple to identify and then change only the background.

Verminaard
11th October 2019, 22:23
@Nikse555

I will send you a link for an example file later. But logically, a function called "Remove" should not be adding something now should it :) Fix subtitles on the other hand has an option to add dashes. But in my case they're unchecked.

Thanks for responding.

von Suppé
14th October 2019, 22:36
I have the same problem as Verminaard.

But, after some testing, I think I found something. The tool may have an issue when text for hearing impaired is BOTH italic AND in double lines. I created a little .srt file (attachment) that should explain.

dngnt
17th October 2019, 20:57
@dngnt: In the OCR window you can right click in the list view... and choose "Save all images with HTML index..." or "Export -> BDN xml/png"

I've been using this feature quite a lot. I would suggest that when using
"Export -> BDN xml/png" --> Export all
instead of presenting Desktop as default export directory, to use the current location of the idx/sub or sup file ?
That would be a big time saver.
Thanks for your great job!

Nikse555
18th October 2019, 05:04
I've been using this feature quite a lot. I would suggest that when using
"Export -> BDN xml/png" --> Export all
instead of presenting Desktop as default export directory, to use the current location of the idx/sub or sup file ?
That would be a big time saver.
Thanks for your great job!

@dngnt: thx for the info :)
I've tried to fix it here: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.10/SubtitleEditBeta.zip
Does that work for you?

Nikse555
23rd October 2019, 21:13
I have the same problem as Verminaard.

But, after some testing, I think I found something. The tool may have an issue when text for hearing impaired is BOTH italic AND in double lines. I created a little .srt file (attachment) that should explain.


Thx for the info - latest beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.10/SubtitleEditBeta.zip

von Suppé
24th October 2019, 09:47
Hi Nikse,

When I load the test srt (which I uploaded) in your latest beta, the issue of adding double dashes is fixed, indeed. But it still adds a dash to the first line?

Zetti
27th October 2019, 11:41
Thanks for new release:
https://github.com/SubtitleEdit/subtitleedit/releases/tag/3.5.11

Nikse555
3rd November 2019, 15:05
@von Suppé: SE prefers dialog dashes for both lines... but please do post examples if you find bugs :)

@Zetti: you're welcome, seems to be a nice and stable release!

Also, if you have transport streams (.ts) files with teletext, please do test latest beta version (use File -> Open.. and choose .ts file): https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.11/SubtitleEditBeta.zip

dngnt
3rd November 2019, 19:03
@dngnt: thx for the info :)
I've tried to fix it here: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.10/SubtitleEditBeta.zip
Does that work for you?

I've tried your latest Beta and your 3.5.11 version with the export --> png/xml .
It now offers "Desktop" as the default for the output.
Still, I would prefer the current location of the .sup or idx/sub file (which is on a separate hard disk, in a specific directory) as the default.
When I import from... and then do the OCR and then save..., the current sub file location is the default.
With the export feature, one should expect the same behaviour.

Thanks a lot for your versatile tool!

von Suppé
17th November 2019, 15:35
Hi Nikse,

I am figuring out how to deal with some issues I run into when using another subtitle tool, regarding SUP --> XML/PNG conversion and vice versa. In doing so, but also just because I want to know, I ask myself: how are BD SUP files built up?
Since SubtitleEdit can export to BD SUP, I wonder if you can make some things clear.

As for graphics, of course I can export SUP to XML/PNG and take a look at the .png images. Doing this with an image-editor, it tells me that it's "2D" (only 1 layer) and of course it carries opacity-data.
Again, this is after SUP --> PNG/XML conversion. Are the pictures in SUP indeed stored as "2D" PNG (or uncompressed BMP32) or is there more to them?

Also (this would be ages ago) I read something about the pictures' in/out timings being defined differently, using frame numbers & framerate instead of time-codes such as used in SRT. I never gave it more thought at the time, but now it's more topical for me.

I hope you can shed more light - for a newbie, as this is the first time I go deeper into SUP.

Nikse555
20th November 2019, 12:26
@von Suppé: Some nice info about bdsup is available here: http://blog.thescorpius.com/index.php/2017/07/15/presentation-graphic-stream-sup-files-bluray-subtitle-format/
bdsup does not have frame numbers (it does have a "frame rate" but that is mostly not used), so it's time based like SRT.
Each image has a palette with 255 colors and each entry also has an "alpha" (transparency) value - much like an 8-bit png.
A bdsup entry can have multiple images (but mostly doesn't).
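
To make the palette idea concrete, here is a minimal Python sketch of the kind of background change requested earlier in the thread. It is not how SE or the bdsup format stores data: the palette is modelled as plain (R, G, B, alpha) tuples for simplicity, the helper names are hypothetical, and guessing the background from the image border is only an assumption that will fail on samples where the text and box share the same transparency.

```python
from collections import Counter


def border_index(pixels):
    """Guess the background palette index as the most common index on the image border."""
    top, bottom = pixels[0], pixels[-1]
    left = [row[0] for row in pixels]
    right = [row[-1] for row in pixels]
    return Counter(top + bottom + left + right).most_common(1)[0][0]


def make_transparent(palette, index):
    """Return a copy of the palette with the alpha of one entry set to 0."""
    new_palette = list(palette)
    r, g, b, _alpha = new_palette[index]
    new_palette[index] = (r, g, b, 0)
    return new_palette


# Tiny 3x4 image: index 0 is the opaque black box, index 1 is white text.
pixels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
palette = [(0, 0, 0, 255), (255, 255, 255, 255)]

bg = border_index(pixels)
palette = make_transparent(palette, bg)
print(bg, palette[bg])  # 0 (0, 0, 0, 0)
```

Only the palette entry changes; the pixel indices stay untouched, which is why a per-image palette makes this kind of edit cheap when the background really has its own entry.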

von Suppé
20th November 2019, 12:50
Thanks for the info, Nikse. I'll go read.

darksen
2nd December 2019, 00:09
I've been using your program for a very long time and it is great. I always wanted to make a request, but I never registered here until some time ago.
I make heavy use of the replace list with a lot of custom regex and other stuff, and I have tuned it as well as I could so that I don't have to review every little change it makes. My request is to split the message "Fix common OCR errors (using OCR replace list)" into two messages: one indicating that the change comes from the default replace list and another indicating that it comes from the user replace list. That way I can disable the changes I have to review and apply the ones I know I don't have to check (those coming from the user replace list), and then go through the rest one by one in a second pass. I often get more than 200 entries in there, and going through them one by one to see what has changed and disable the ones that shouldn't be applied is very time consuming. Sometimes I don't have to disable anything because everything comes from my user list, so I just waste time checking each entry.
Please, if you can consider adding this it would mean a lot.
Thank you.

nekrovski
5th December 2019, 10:18
Hi,
I was working on a subtitle. Then I used Handbrake to burn in the subtitle in the video.
When I played the video, this showed up
https://i.imgur.com/I9BNR0H.jpg
When the video is played without burned in subtitles, but instead with subtitles from a separate .srt file from the video, it shows fine.
I opened the subtitles with Notepad and I noticed that this empty space highlighted here
https://i.imgur.com/fXB3ydT.jpg
confuses Handbrake.

Is there a way for Subtitle Edit to detect errors like this with the Fix common errors option?
I tried but couldn't find.

Nikse555
19th January 2020, 15:52
SE 3.5.12 is out :)

Can now read teletext from .ts / .m2ts files - includes any colors + top alignment (.ts reading should also be faster)
Tesseract updated from 4.1.0 to 4.1.1 + more pre-processing image settings

@darksen: Did you get a lot of false corrections? Perhaps that could be improved if you posted some.

@nekrovski: It's not really errors with whitespaces... you could re-save with SE via command line perhaps?

EDIT and off topic: Is it possible to change the thread name? I've tried to change the title in the first post, but the thread title still says "Subtitle Edit 3.5.10"...

mkver
4th February 2020, 15:23
Now that I wanted to test the new teletext feature I directly stumbled upon the issue behind my ticket 1897 (https://github.com/SubtitleEdit/subtitleedit/issues/1897) again: It should be possible to treat several transport stream files as one big file (in case a DVR only supports FAT32 and therefore splits files into pieces smaller than 4GB each).
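
One way to picture the request is a reader that hands out transport stream packets from several files as if they were a single stream. The Python sketch below is only an illustration, not SE code: it assumes plain 188-byte .ts packets, the file names are made up, and whether a DVR actually splits in the middle of a packet is an assumption the sketch merely tolerates.

```python
import os
import tempfile

PACKET_SIZE = 188  # standard MPEG transport stream packet length
SYNC_BYTE = 0x47   # first byte of every transport stream packet


def iter_packets(paths):
    """Yield 188-byte packets from several files as if they were one stream."""
    leftover = b""
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(PACKET_SIZE * 1024)
                if not chunk:
                    break
                data = leftover + chunk
                usable = len(data) - len(data) % PACKET_SIZE
                for i in range(0, usable, PACKET_SIZE):
                    yield data[i:i + PACKET_SIZE]
                leftover = data[usable:]  # partial packet carried into the next read


# Demo: three packets split across two files in the middle of the second packet.
with tempfile.TemporaryDirectory() as tmp:
    stream = b"".join(
        bytes([SYNC_BYTE]) + bytes([n]) * (PACKET_SIZE - 1) for n in range(3)
    )
    part1 = os.path.join(tmp, "rec_001.ts")  # hypothetical file names
    part2 = os.path.join(tmp, "rec_002.ts")
    with open(part1, "wb") as f:
        f.write(stream[:300])
    with open(part2, "wb") as f:
        f.write(stream[300:])
    packets = list(iter_packets([part1, part2]))
    print(len(packets), all(p[0] == SYNC_BYTE for p in packets))  # 3 True
```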

arslan
5th March 2020, 05:02
Hi Nikse,

would it be possible to implement the new Serbian dictionary and spellchecker, which is significantly more comprehensive than the current one?

(The new one is located here:
https://github.com/msmiljan/korektor
and here:
https://extensions.libreoffice.org/extensions/serbian-spellcheck-and-hyphenation)

Thank you very much!

GCRaistlin
17th March 2020, 17:04
How can I use portable MPC-HC with Subtitle Edit? MPC-HC option is greyed out in Settings.

Feature requests:

[Options - Settings... - Word lists] Double click on a pair in OCR fix list fills the fields beside 'Add pair' button with the corresponding values.
[Options - Settings... - Tools - Fix common OCR errors - also use hard-coded rules] Make using hard-coded rules customizable. For example, replacing 'l' between uppercase letters with 'I' is surely needed while converting the first letter of the paragraph to uppercase may be completely unwanted.
[Import/OCR Blu-ray (.sup) subtitle file...] Add the ability to disable spell checking while still using OCR fix list for selected language. This makes sense because some errors like "l instead of I after the dot" aren't being fixed by OCR fix list and hence force spell checking dialog to appear. But they may be fixed by applying a regexp in an external editor (for the mentioned error it would be "(?<!\w)l(?!\w)"). Applying regexps before spell checking saves a lot of time but to use regexps currently we need to OCR without error fixing and then call Fix common errors tool with only 'Fix common OCR errors (using OCR replace list)' option checked.
Add the ability to use regexps to fix common OCR errors. It would be great to create a predefined set of regexps like the one above. I'm ready to share my own.
Look for Settings.xml in the current (working) directory (i.e. the directory that was current when SE was launched) instead of the directory where SubtitleEdit.exe is located. This would allow different settings for different cases or users.
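
For reference, the regexp quoted in the requests above can be tried outside SE. The Python sketch below applies it as a plain search and replace; the function name is made up, and the sample lines are adapted from OCR output of the kind discussed in this thread.

```python
import re

# Pattern quoted in the post: a lone "l" with no word character on either side.
LONE_L = re.compile(r"(?<!\w)l(?!\w)")


def fix_lone_l(text):
    """Replace a standalone lowercase 'l' with an uppercase 'I'."""
    return LONE_L.sub("I", text)


print(fix_lone_l("how can l say. . ."))      # how can I say. . .
print(fix_lone_l("- lt is actually small"))  # unchanged: "lt" is not a lone "l"
```

The lookarounds keep words such as "actually" or "small" intact, but they also mean that "lt" or "ln" need their own replace pairs.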

Bugs (Import/OCR Blu-ray subtitle, OCR method: Binary image compare, Image database: Latin):

Non-Italic dashes that are followed by Italic text are erroneously recognized as Italic (example (https://i111.fastpic.ru/big/2020/0317/1e/2bdc9546c5f53f9260e09643bcd76c1e.png)). Also another bug with this example subpic: if Dictionary field is empty then the space after "Audience" is lost; if Dictionary is set to English then the space is preserved.
"t ]" in Italic is recognized as "t]" with default "8 pixels is space" (example (https://i111.fastpic.ru/big/2020/0317/b1/4aa295f7e8614bf5a559e097361f38b1.png)).
'9' is recognized as '0' (example (https://i111.fastpic.ru/big/2020/0318/ce/a06f2c687da4c65aadd7f52b14a772ce.png)).
Jumping to a subpic by typing its number in 'Subtitle text' area isn't working for #555: typing '5' repeatedly moves the cursor from #50 to #51, then to #52... #59, then to #500 and so on.
'Fix OCR errors' checkbox state isn't being saved.

OCR fix list (English):

Why does the default OCR fix list include "backseat -> back seat"? Is "backseat" really incorrect?
Why does the default OCR fix list include "lt -> it"? I believe it should be "lt -> It". The same goes for the various pairs starting with lf: some of them have "If" as a result (which is correct), others have "if" (which is not).

Lucius Snow
18th March 2020, 12:53
Hi all,

Since the update to 3.5.13, when I reload existing subtitles from a recent SRT, it no longer loads the video that goes with it.

Can you please tell me how to restore this?

Thank you.

Nikse555
18th March 2020, 15:15
@arslan: thx, will be included in next update.

@GCRaistlin: A lot of input...
You probably need a 64-bit version of MPC-HC... but I would recommend that you try "mpv" as video player, as that seems to be the best option atm.
You can use "Ctrl+G" for go to sub#
'9' is recognized as '0'... Double click on the line in the list view, and use "Add better match".
Look for Settings.xml... that's what profiles were made for.
Regular expressions are already supported - see the english ocr fix replace list.
"backseat -> back seat"... yes, that seems like a bug, thx.

@Lucius Snow: SE 3.5.14 is out - and try "mpv" as video player (Options - Settings - Video player - Download mpv lib)

GCRaistlin
19th March 2020, 09:59
You probably need a 64-bit version of MPC-HC
I have both 32-bit and 64-bit portable versions installed. How does SE recognize that MPC-HC is (not) installed?

'9' is recognized as '0'... Double click on the line in the list view, and use "Add better match".
That's what I've already done, but isn't it an issue for everyone?

that's what profiles were made for.
Profile on General tab? I don't see how to add a new one there. Also, changes seem to be written to the current profile without asking. For example I've changed 'Single line max. length' and the new value has been saved to 'Default' profile immediately.

Regular expressions are already supported - see the english ocr fix replace list.
I don't see any regexps there (Words lists - OCR fix lists). Also, I mean support for manually applying regexps, not without asking.

Remove text for hearing impaired issues:

It doesn't remove commas: "I am, uh, late" -> "I am, late". Clearly some commas may be needed, but leaving them all in - without the possibility to edit the text right there - isn't a good decision either.
It's hard to identify a caption that needs to be edited after using this tool: only caption numbers are shown, but the numbering will change due to the deletion of captions. So the only way is to write down a portion of the target caption text and then search for it. Displaying time codes (and/or a log of applied fixes available after quitting the tool) would be great.


Now in List view Start time and Duration are displayed and available for editing. In some cases, editing of End time is preferable. Can you please add such a possibility?

The biggest trouble for me, though, is that the hard-coded rules are applied and can't be customized. To prevent making first letters uppercase I'm forced to perform a global replace with a regexp before Fix common errors and another global replace after.

Lucius Snow
19th March 2020, 18:48
@Lucius Snow: SE 3.5.14 is out - and try "mpv" as video player (Options - Settings - Video player - Download mpv lib)
Thank you.

I already use mpv but I installed it manually because it doesn't work from Subtitle Edit (error from a server with no SSL/TLS).

Same with 3.5.14.

Nikse555
19th March 2020, 21:49
@GCRaistlin: Do you have some examples where first letters are converted wrongly to uppercase via hard coded rules?
About the profile on General tab... click the "..." button to make new or delete profiles.
The double-click on word lists feature is available in latest beta.
About MPC-HC, you can use the "MpcHcLocation" in settings.xml or just a subfolder called "MPC-HC" in the SE folder, using the installer should also work (MPC-HC does not really have an API - SE actually just steals the video from the MPC-HC UI) - but just try mpv :)

@Lucius Snow: Hm, that works fine here... how can I re-create your issue with video not opening? Can you give more information? Is it all .srt files or only some. Video type? Do you have the video window open?

Nikse555
19th March 2020, 22:34
@GCRaistlin: Editing the text in "Remove text for HI" could be possible - how does this work: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.14/SubtitleEditBeta.zip

Lucius Snow
19th March 2020, 22:52
@Lucius Snow: Hm, that works fine here... how can I re-create your issue with video not opening? Can you give more information? Is it all .srt files or only some. Video type? Do you have the video window open?
Anyway, mpv seems properly installed because I use it to play videos. I manually installed it. The video does play.

My problem is just when I open an existing SRT from File / Reopen menu. Before the update, it opened both the SRT and the associated video file. Now, it doesn't open the video. I have to do it again each time I open the SRT.

It concerns any codec / container.

Nikse555
20th March 2020, 07:08
@Lucius Snow: Do you still have problems in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.14/SubtitleEditBeta.zip ?
If yes, what are the steps for re-creating this in detail (drag-n-drop or file open or shortcuts)?

GCRaistlin
20th March 2020, 11:54
some examples where first letters are converted wrongly to uppercase via hard coded rules

92
00:07:54,641 --> 00:07:56,559
- l would take the idea to its extreme -
- [ Whispering, lndistinct ]

93
00:07:56,643 --> 00:08:00,188
and draw parallels
between reproduction in art. . .

577
00:38:09,245 --> 00:38:13,875
- Well, he said that, uh -
- lt is actually as beautiful as the original.

578
00:38:14,000 --> 00:38:17,295
- that they thought it was an original
for many, many centuries -
- [ Man, ln ltalian ] When was it made?

1091
01:12:28,344 --> 01:12:32,014
- That impression is quite right, but. . .
- [ Crowd Cheering ]

1092
01:12:32,181 --> 01:12:33,808
how can l say. . .

See #93, #578 and #1092. Anyway, "OCR error" means that something is erroneously recognized, while the first letter may be in lower case in the original caption. It may be an error in a general sense, but if we just want to get text that is identical to the graphical source we surely don't want such AI.


About the profile on General tab... click the "..." button to make new or delete profiles.
Oh I see, how could I just miss it. But searching for Settings.xml in the current directory first would be still useful for multi-user environment.

The double-click on word lists feature is available in latest beta.
Working, thanks. Can you please implement replacing of an existing word on 'Add pair' press (with confirmation)?

Editing the text in "Remove text for HI" could be possible - how does this work
It does, thanks again.

Lucius Snow
20th March 2020, 13:08
@Lucius Snow: Do you still have problems in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.14/SubtitleEditBeta.zip ?
If yes, what are the steps for re-creating this in detail (drag-n-drop or file open or shortcuts)?

Thank you but there's no change.

Nikse555
20th March 2020, 13:35
Thank you but there's no change.
But can you give steps to re-create this issue?

Lucius Snow
20th March 2020, 14:29
But can you give steps to re-create this issue?

That's what I described earlier:

My problem is just when I open an existing SRT from File / Reopen menu. Before the update, it opened both the SRT and the associated video file. Now, it doesn't open the video. I have to do it again each time I open the SRT.

Difficult to explain more :(

GCRaistlin
20th March 2020, 22:28
Nikse555

Now in List view Start time and Duration are displayed and available for editing. In some cases, editing of End time is preferable. Can you please add such a possibility?
In addition to my previous request I'm offering to add 'Pause before next' field. The result could look like this:

[x] Start time: ___ [ ] Duration: ___
[ ] End time: ___ [x] Pause before next: ___

The idea is that 0, 1 or 2 checkboxes can be set at the same time. Inactive fields (related to cleared checkboxes) are greyed out, their values get changed in accordance to the values in active fields.
This bug isn't reproducible with Latin.db in the latest beta but I decided to report it 'cause it is really strange. Try to OCR these SUP(BD) subtitles (https://mir.cr/0RLUHRRR) with 3.5.14 (clear all checkboxes but [x] Fix OCR errors). The problematic caption is #203. If you start from #100 (skip unrecognized characters twice) "Sunday" will be recognized as "sunday". If you start from #101 (close current OCR session and start a new one) it will be recognized as "Sunday".
Currently, the installation package contains files that may be changed by a user (Latin.db, en_US_user.xml and so on). Hence, they may be replaced with the default ones by an update. It would be better if all changes were made to files that don't exist by default.
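
The relationship between the four timing fields proposed above (Start time, Duration, End time, Pause before next) can be sketched in a few lines of Python. The function names are hypothetical and this is not SE code; the numbers are taken from captions #92 and #93 posted earlier in the thread, converted to milliseconds.

```python
def derive_timing(start_ms, end_ms, next_start_ms):
    """Return (duration, pause_before_next) for a caption, in milliseconds."""
    duration = end_ms - start_ms
    pause = next_start_ms - end_ms
    return duration, pause


def set_pause_before_next(start_ms, next_start_ms, pause_ms):
    """Keep the start fixed and choose the end so that the pause is pause_ms."""
    end_ms = next_start_ms - pause_ms
    return end_ms, end_ms - start_ms


# Caption #92: 00:07:54,641 --> 00:07:56,559; caption #93 starts at 00:07:56,643.
start, end, next_start = 474641, 476559, 476643
print(derive_timing(start, end, next_start))          # (1918, 84)
print(set_pause_before_next(start, next_start, 100))  # (476543, 1902)
```

Whichever fields are active, the others follow from them, which is why the inactive fields in the proposal can simply be greyed out and recalculated.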


My current Latin.db (https://mir.cr/19P4ZIDN) - maybe you'll find my additions useful for all.

tormento
21st March 2020, 10:31
@Nikse555

I am doing some OCR on idx+sub files and every "I" that begins a sentence is converted to "L".

Can you fix that?

Here (https://www.upload.ee/files/11307246/zcd.7z.html) is a sample.

tormento
29th March 2020, 11:31
@Nikse555

I am finding that some encodings cause problems in your editor.

Here (https://www.mediafire.com/file/kkr62qfk1mvpica/SupRip_samples.zip/file) you can find some. I hope the names are self explanatory enough.

The only ones I can open with no problems on accented vowels and symbols are UTF16-LE ones. With UTF8 it is a mess ;)
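
For anyone checking such files outside SE, a common approach is to look for a byte order mark first and only then try UTF-8. The Python sketch below is an illustration of that approach, not a description of how SE detects encodings; the function name and the cp1252 fallback are assumptions.

```python
import codecs


def decode_subtitle(data):
    """Decode subtitle bytes, preferring a byte order mark when one is present."""
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, encoding in boms:
        if data.startswith(bom):
            return data.decode(encoding)
    try:
        return data.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Assumed legacy fallback for files saved in a Western code page.
        return data.decode("cp1252", errors="replace")


text = "Perché è così?"
print(decode_subtitle(text.encode("utf-8")) == text)                             # True
print(decode_subtitle(codecs.BOM_UTF16_LE + text.encode("utf-16-le")) == text)  # True
print(decode_subtitle(text.encode("cp1252")) == text)                            # True
```

A UTF-8 file without a BOM that gets read as a legacy code page is a typical way accented vowels turn into garbage, so comparing the results of the three branches on a problem file can help narrow down where the mix-up happens.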