Subtitle Edit

markfilipak
25th February 2025, 18:06
Is there a way to suppress the blue line? I see in Settings, 'Waveform appearance' that it's called "Cursor color". I don't want to change its color. I want it to be gone (transparent). It just gets in the way.

PS: Maybe it would be helpful for the waveform cursor to appear only during playback and to disappear otherwise.

markfilipak
25th February 2025, 19:49
The medium gray background is fine, but, you know, red (spatial resolution is poorest in red) was the poorest color choice, and dark green is also poor. Try these:

Selected color: BCE9FF
Color: FFE955

They're easier on the eyes, too.

markfilipak
26th February 2025, 03:40
The largest Start and End times are 99:59:59.999. Honestly, don't you think 9:59:59.999 would be sufficient? That would cut the width by 2 characters. Space in SE comes at a premium.

Emulgator
26th February 2025, 11:29
Since TC conventions (MPEG-2 transport streams, DVD ticks and all the rest) allow for 2-digit hours,
any truncation would be a dead end for those who have to sub surveillance video.
It is a bigger world out there indeed...

markfilipak
27th February 2025, 16:52
Upon re-editing some videos that had excellent cues, I'm finding that what looks like 'Synchronization', 'Adjust all times (show earlier/later)...', 'Selected and subsequent lines' has fired all by itself. Sometimes it appears that 'Show earlier' has fired and sometimes 'Show later', but always with 'Selected and subsequent lines'. This sometimes seems to happen multiple times down the video, so the cues get progressively further off from their initial, excellent timings.

My guess is that this happens during initialization when reloading. I wish I could be more specific, but I'd say the problem isn't with 'Synchronization' but with initialization.

von Suppé
28th February 2025, 13:15
It is apparent to me that SE is not intended to "copy" PGS subtitles.

Subtitle Edit does have a PGS tool onboard for editing positions, timings and forced flags. Go "File --> Import --> Blu-ray (.sup) subtitle file for edit."

markfilipak
28th February 2025, 22:14
Ya know, I have an idea that could double the edit-speed of SE.

I don't move subtitles, I move cues. I move sub1's out-cue to open more gap so that I can then move sub2's in-cue to where I want it, then move sub1's out-cue back to close the gap.

I'm actually moving gaps!

So, why not click-drag a gap? And why not click-drag gap-ends instead of cues? I mean, it accomplishes the same thing but it's twice as fast.

markfilipak
1st March 2025, 06:22
Here's my first cut at proposing my ideal editor. Keyboard editing, no mouse required.

---------- NAVIGATE ----------
        << shot
       /     < sub
      /     /     play (looping)
     /     /     /     sub >
    /     /     /     /     shot >>
   /     /     /     /     /
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
| A | | S | | D | | F | | G | | H | | J | | K | | L | | : | | " |
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
                               /     /     /     /     /     /
                      << expand     /     /     /     /     /
                           shrink >>     /     /     /     /
                                   < move     /     /     /
                                        move >     /     /
                                          << shrink     /
                                               expand >>
                            -------------- GAP EDIT --------------
                                 (by one frame per key press)
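The gap-edit half of that key map can be modelled with very little state. This is a hypothetical sketch (none of these names exist in SE): a gap is the pair (out-cue of the earlier subtitle, in-cue of the later one), each key moves one or both ends by one frame, the key-to-action mapping is read off the diagram, and 24000/1001 fps material is assumed.

```python
from dataclasses import dataclass

# Assumed frame duration for 24000/1001 fps material, in milliseconds.
FRAME_MS = 1001 / 24


@dataclass(frozen=True)
class Gap:
    start: float  # out-cue of the earlier subtitle (ms)
    end: float    # in-cue of the later subtitle (ms)


# Key -> (frames added to start, frames added to end), per the GAP EDIT row.
GAP_KEYS = {
    "H": (-1, 0),   # << expand: pull the out-cue earlier
    "J": (+1, 0),   # shrink >>: push the out-cue later
    "K": (-1, -1),  # < move: slide the whole gap earlier
    "L": (+1, +1),  # move >: slide the whole gap later
    ":": (0, -1),   # << shrink: pull the in-cue earlier
    '"': (0, +1),   # expand >>: push the in-cue later
}


def apply_key(gap: Gap, key: str, frame_ms: float = FRAME_MS) -> Gap:
    """Return the gap after one key press; refuse edits that would invert it."""
    d_start, d_end = GAP_KEYS[key]
    moved = Gap(gap.start + d_start * frame_ms, gap.end + d_end * frame_ms)
    if moved.end < moved.start:  # the two cues would cross over
        return gap
    return moved
```

With a 40 ms frame for easy arithmetic, pressing K on `Gap(1000, 1200)` gives `Gap(960, 1160)`: both cues move together, which is the "click-drag a gap" idea expressed as a single key press.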

Editing the gaps instead of the cues would make this very quick, like playing a guitar.

PS: or like playing a piano.

markfilipak
3rd March 2025, 17:12
'Settings', 'Waveform/spectrogram', 'Single click to select subtitles' has a checkmark.

A single click on the waveform focuses on the clicked subtitle only if the click is quick. If it's slow, nothing happens.

markfilipak
3rd March 2025, 22:14
I did not find any way to do what I ask for below.

What I want:
Use the Tab key (or really, any key) to switch focus between text panes and waveform pane.

How it should work:
if tab_key_event() {
    if text_focus, then focus_waveform()
    else, focus_text()
}
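The requested behaviour is a two-state toggle. Below is a hypothetical model of it, not SE code (`Pane`, `FocusManager` and the "Tab" key name are illustrative), in which only the toggle key changes focus and mouse-over is ignored, as the post asks.

```python
from enum import Enum


class Pane(Enum):
    TEXT = "text"
    WAVEFORM = "waveform"


class FocusManager:
    """Focus moves between the text panes and the waveform only via one key."""

    def __init__(self, toggle_key: str = "Tab") -> None:
        self.toggle_key = toggle_key
        self.focused = Pane.TEXT

    def on_key(self, key: str) -> Pane:
        if key == self.toggle_key:
            self.focused = Pane.WAVEFORM if self.focused is Pane.TEXT else Pane.TEXT
        return self.focused

    def on_mouse_over(self, pane: Pane) -> Pane:
        # Deliberately ignored: hovering never steals focus in this model.
        return self.focused
```

Making `toggle_key` a parameter reflects the "or really, any key" remark.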

That's it. Anyone know how to do it? Possible? I would also need to disable all current focus methods, especially mouse-over, which I think is already possible.

Thanks!

markfilipak
4th March 2025, 23:11
When a search finds the search string, the mouse pointer is over the waveforms, and I hit the Delete key, SE asks whether to delete the current subtitle. So the list apparently has the focus even though it doesn't show it, even though the found string is highlighted in the text box, and even though the mouse pointer is over the waveforms. That's not logical.

I expected it to delete the found string in the text box.

PART TWO, ADDED:

When a search for "BELL BEEPING" finds "[ DOORBELL BEEPING ]", "[ DOORBELL BEEPING ]" is put in the text box with "BELL BEEPING" highlighted. When I hit the Delete key, the entire text box is wiped out. That's illogical.

markfilipak
5th March 2025, 16:53
With the mouse over the waveforms, when I press either 'Z' or 'X' keys, the waveform scrolls by approximately -7.5 seconds. How can I disable that?

The above happens if the 'Center' thingy (between the play arrow and the 'Play rate' speedometer) is active. If the 'Center' thingy is not active, then the 'Z' key scrolls -100ms and the 'X' key scrolls +100ms.

Is there a way to kill keys that are not listed in 'Setup', 'Shortcuts'?

castellanos
7th March 2025, 02:32
Hi. New version 4.0.11. I can't use the spacebar to toggle video play/pause anymore (it worked in the previous version).
I've changed the shortcut in the settings: "Settings/Shortcuts/Video/Toggle play/pause [Space]" but no luck.

Music Fan
7th March 2025, 14:32
Hi,
is it possible to add this option:
Replace comma by point when next line begins by uppercase letter or dash, thus a new sentence ?

It could be problematic if the next line begins with a capitalized name that doesn't start a new sentence, but it would still help to have this option.
And it's still possible to deactivate the option when unneeded for some lines.

Thanks a lot.

markfilipak
7th March 2025, 22:19
@Music Fan

find:
,(\n[A-Z\-])
replace:
.$1
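To see what that pattern does outside SE, here it is in Python's `re` module. Python is an assumption here, not the engine SE uses, and its replacement syntax differs: the captured group is written `\1` rather than `$1`.

```python
import re

# A comma followed by a line break and an uppercase letter or a dash.
COMMA_BEFORE_NEW_SENTENCE = re.compile(r",(\n[A-Z\-])")


def comma_to_period(text: str) -> str:
    # Keep the captured line break + first character; swap the comma for a period.
    return COMMA_BEFORE_NEW_SENTENCE.sub(r".\1", text)
```

Only commas followed by a line break change: in `"I went there,\nThat's a beautiful place,"` the final comma has nothing after it, so this pattern leaves it alone.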

Music Fan
7th March 2025, 23:02
Thanks but it does not work, no line is detected.
It may need the line break character; I believe it's `n (that's what I use with PowerShell).
But I tried with this, and it's no better.

Edit: I tried with PowerShell, but of course it can't work because of the blank line and the next timecode following each (double) line.

markfilipak
8th March 2025, 01:30
Thanks but it does not work ...

I just tested it. It works for me. You have to use 'Edit', 'Multiple replace...' and define it as a regular expression.

Try it and report back, eh?

PS: You're right about it not working in 'Edit', 'Replace'. Well, what do you want for free? -- just kidding. It's a bug.

Music Fan
8th March 2025, 11:58
Actually I see that your trick works for commas inside lines but not at the end of lines.
Example:
1
00:00:48,000 --> 00:00:50,200
I went there,
That's a beautiful place,

2
00:01:02,700 --> 00:01:04,084
Yes, indeed.
becomes:
1
00:00:48,000 --> 00:00:50,200
I went there.
That's a beautiful place,

2
00:01:02,700 --> 00:01:04,084
Yes, indeed.
instead of:
1
00:00:48,000 --> 00:00:50,200
I went there.
That's a beautiful place.

2
00:01:02,700 --> 00:01:04,084
Yes, indeed.

markfilipak
8th March 2025, 17:55
Actually I see that your trick works for commas inside lines but not at the end of lines.

It's not a trick.

You asked for this: "Replace comma by point when next line begins by uppercase letter or dash, thus a new sentence", and that's exactly what I gave you. Now you also want to cover text that ends in a comma with _nothing_ following it. That's
,($|(\n[A-Z\-]))
Also try
,((\n\n)|(\n[A-Z\-]))
That might cover better -- untested.

PS:
I just tested ,((\n\n)|(\n[A-Z\-]))
That's what you want.
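Applied to the whole SRT text, where each subtitle block ends with a blank line, the `\n\n` branch does catch the last comma. A Python sketch (again `\1` instead of `$1`) using the example file from this thread:

```python
import re

SRT = (
    "1\n"
    "00:00:48,000 --> 00:00:50,200\n"
    "I went there,\n"
    "That's a beautiful place,\n"
    "\n"
    "2\n"
    "00:01:02,700 --> 00:01:04,084\n"
    "Yes, indeed.\n"
)

# Comma before a blank line, or before a line starting with a capital or dash.
PATTERN = re.compile(r",((\n\n)|(\n[A-Z\-]))")


def fix_whole_file(text: str) -> str:
    return PATTERN.sub(r".\1", text)
```

The commas in the timecodes and in "Yes, indeed." are untouched because no line break follows them. As written, the pattern assumes `\n` line endings; on text with Windows `\r\n` endings neither branch would match.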

Music Fan
9th March 2025, 11:31
Thanks, but here the result is the same as in my example above; the comma stays at the end. :o

You asked for this: "Replace comma by point when next line begins by uppercase letter or dash, thus a new sentence", and that's exactly what I gave you.
Actually, for SE, a line can be split in two with a line break, but it's still considered as one line.
That's how the "Add period after lines ..." option works in the "Fix common errors" menu.

I don't know how that option works, but it seems to handle lines differently from the replace menu.

So both solutions would be helpful: yours when the problem appears inside a line (broken in two or not), and a new option in the "Fix common errors" menu when it's at the very end of the line (assuming a line has one timecode, whether it's split in two or not).

markfilipak
9th March 2025, 16:22
Actually, for SE, a line can be split in two with a line break, but it's still considered as one line.
No, a line that's split in two becomes 2 lines. This:
I went there,\nThat's a beautiful place,\n\n
is 3 lines. This regular expression:
,((\n\n)|(\n[A-Z\-]))
finds the textual strings that follow both commas, and this replacement:
.$1
replaces the commas with periods to produce this:
I went there.\nThat's a beautiful place.\n\n

ASIDE TO SE'S DEVELOPERS: I once wrote a Snobol script to beautify a friend's code. He was a professional programmer in Silicon Valley. He could not devise a regular expression in 'C' that would do it. It took me less than 5 minutes. That was 40 years ago. I recommend Ralph Griswold's book. Griswold invented Snobol. Snobol is much more powerful than REs.

Music Fan
9th March 2025, 16:41
I mean a line in the vocabulary of SE (one number and one timecode per line).

markfilipak
9th March 2025, 16:55
I mean a line in the vocabulary of SE (one number and one timecode per line).
No. An index number line followed by a time code line followed by lines of text terminated by \n\n is a subtitle.

Music Fan
9th March 2025, 18:12
Look at the menus, that's also a line for SE.
But that doesn't matter, the fact is that your last code does not give a different result from the previous one in the replace menu.

markfilipak
9th March 2025, 18:26
Look at the menus, that's also a line for SE.
But that doesn't matter, the fact is that your last code does not give a different result from the previous one in the replace menu.
I will not comment about fuzzy thinking and fuzzy nomenclature. You are not being specific. You are not citing specific regular expressions. You are being vague. I can advise you no further if you continue to be vague.

Music Fan
9th March 2025, 19:38
Nothing vague. I was very precise, but you don't understand even though it's quite simple. Not my fault.
Nikse555 will surely understand.

markfilipak
9th March 2025, 20:16
test.srt, before:
1
00:00:48,000 --> 00:00:50,200
I went there,
That's a beautiful place,

2
00:01:02,700 --> 00:01:04,084
Yes, indeed.
find and replace:
,($|\n[A-Z\-])
.$1
test.srt, after:
1
00:00:48,000 --> 00:00:50,200
I went there.
That's a beautiful place.

2
00:01:02,700 --> 00:01:04,084
Yes, indeed.
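One plausible reading of the disagreement above, offered as an assumption rather than something confirmed in the thread: if the replace is applied to each subtitle's text on its own, without the index, timecode and trailing blank line, then the `\n\n` branch never sees a blank line, while `$` still matches at the end of the text. In Python's `re`, `$` without `re.MULTILINE` matches at the end of the string (or just before a final newline), so the two patterns behave differently on the same subtitle text:

```python
import re

# Assumed input: the text of subtitle 1 alone, as a per-subtitle replace would see it.
SUBTITLE_TEXT = "I went there,\nThat's a beautiful place,"

DOLLAR = re.compile(r",($|\n[A-Z\-])")             # comma at end of text, or before a new line
BLANK_LINE = re.compile(r",((\n\n)|(\n[A-Z\-]))")  # comma before a blank line, or a new line


def replace_commas(pattern, text: str) -> str:
    return pattern.sub(r".\1", text)
```

`replace_commas(DOLLAR, SUBTITLE_TEXT)` turns both commas into periods, while `replace_commas(BLANK_LINE, SUBTITLE_TEXT)` leaves the final comma, which matches what was reported from the replace menu.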

markfilipak
9th March 2025, 22:03
Regular expressions (RE) have recognized limitations. For example, you cannot build a string processor with RE alone. What is required is a language that supports logic threading -- by "logic threading" I don't mean concurrent processing, I mean the ability to move a character pointer back and forth, dynamically gathering and discarding characters according to some controlling logic. That controlling logic can be implemented in 'C', but that's very difficult (and time consuming and painful).

SNOBOL was a language specifically developed for string processing. It is incredibly powerful. SNOBOL can be used to make string processes that drive 'C' codesmiths insane.

SPITBOL is an Intel-based PC implementation of SNOBOL. SPITBOL is compiled for speed. Each 'script' becomes a tool that's exactly suited to the species of task required. Generating SPITBOL for a particular task, compiling it, and running it is faster than RE in 'C', and it can be done on-the-fly.

I strongly recommend SPITBOL for the string processing and time code beautification done in SE. A month of learning SPITBOL will save you a year of writing 'C'.

UPDATE:

I searched and found this:

https://github.com/spitbol/windows-nt

It appears to be SPITBOL-386 by Mark Emmer, Catspaw Inc., renamed. SPITBOL-386 is what I have used.

I will be done with a subtitling project in a week. I'll explore SPITBOL-NT at that time.

UPDATE 2: I wanted to gather as much documentation of SNOBOL/SPITBOL as possible. Not much has survived. I gotta admit, this looks kinda crackpot. I promise it's not.

nekrovski
13th March 2025, 06:08
Hello!
What format should the picture filenames have when importing for OCR via File -> Import -> Images, so that SE can read the start/end/duration of the lines?

markfilipak
20th March 2025, 23:36
This procedure adds 'creature comforts' that speed up the current method of setting in- and out-cues. It also automates as much as possible. If implemented, the 'creature comforts' combined with the automation would make processing subtitles very speedy and very accurate.

=== Begin manual part ===
Click the waveform inside a subtitle. (Call the spot where you clicked THE WAVEPOINT.) Then press and hold THE KEY. While holding THE KEY down, move the mouse to either the in-cue or the out-cue, click THE CUE, and drag it.

- If THE CUE is an out-cue, SE plays an AUDIO LOOP from _THE_WAVEPOINT_to_THE_CUE_, _THE_WAVEPOINT_to_THE_CUE_, over and over for as long as THE KEY is held down.

- If THE CUE is an in-cue, SE plays an AUDIO LOOP from _THE_CUE_to_THE_WAVEPOINT_, _THE_CUE_to_THE_WAVEPOINT_, over and over for as long as THE KEY is held down.

While holding both THE KEY and THE CLICK, THE CUE can be dragged. As THE CUE is dragged back and forth, the AUDIO LOOP will be heard to shorten and lengthen. Hearing the AUDIO LOOP change back and forth makes it easy to find the point where an utterance ends (or begins) and to drop THE CUE at that exact spot. When either THE KEY or THE CLICK is released, THE CUE is dropped.

- If THE KEY is released first, THE CUE is dropped and the AUDIO LOOP stops playing. If it's sub1's out-cue that was dropped, then SE waits until you do sub2's in-cue. If it's sub2's in-cue that was dropped, then SE runs the AUTOMATED PART on the sub1-to-sub2 interval.

- If THE CLICK is released first, THE CUE is dropped but the AUDIO LOOP continues playing and the AUTOMATED PART is not run because THE KEY is still pressed. THE CUE can be picked back up and moved again and again as long as THE KEY is held down.

SE now knows the spots when utterance1 ends and when utterance2 begins. That's the key to the AUTOMATED PART. In the AUTOMATED PART, out-cue1 and in-cue2 are both moved to their final spots.

If you make a mistake, the whole manual part, or any piece of it, can be repeated. SE always knows what you intend to do. For example, if you click either out-cue1 or in-cue2, you're working on the sub1-to-sub2 interval. If you click either out-cue2 or in-cue3, you're working on the sub2-to-sub3 interval. You can skip around if you like, and work on any interval anywhere at any time.

Notes to the developers:
1. The AUDIO LOOP plays just _THE_WAVEPOINT_to_THE_CUE_ or just _THE_CUE_to_THE_WAVEPOINT_, not entire subtitles. Playing entire subtitles is a different key.
2. Releasing THE KEY always runs the AUTOMATED PART.
=== End manual part ===
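The two rules that decide what the AUDIO LOOP covers, and which interval a clicked cue belongs to, are small enough to state as code. This is a hypothetical sketch (function names are illustrative, times in ms, subtitles numbered from 1); raising an error when a cue is dragged past the wavepoint is a choice of this sketch, not something the proposal specifies.

```python
from typing import Literal, Tuple

CueKind = Literal["in", "out"]


def loop_bounds(wavepoint_ms: int, cue_ms: int, kind: CueKind) -> Tuple[int, int]:
    """(start, end) of the AUDIO LOOP while THE KEY is held."""
    if kind == "out":
        start, end = wavepoint_ms, cue_ms   # _THE_WAVEPOINT_to_THE_CUE_
    else:
        start, end = cue_ms, wavepoint_ms   # _THE_CUE_to_THE_WAVEPOINT_
    if end <= start:
        raise ValueError("cue dragged past the wavepoint; loop would be empty")
    return start, end


def interval_for_cue(sub_index: int, kind: CueKind) -> Tuple[int, int]:
    """subN's out-cue and subN+1's in-cue both select the subN-to-subN+1 interval."""
    if kind == "out":
        return sub_index, sub_index + 1
    return sub_index - 1, sub_index
```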
=== Begin AUTOMATED PART ===
CASE1: (X-gap)>1300ms
              out-cue1                       in-cue2
Before:           v                             v
        utterance1|<-------------X------------->|utterance2

After:                         v           v
        utterance1<----------->|           |<--->utterance2
                        1s                  300ms
                    out-pad1               in-pad2

CASE2: !CASE1 & (X-gap)>=600ms
Before:           v             v
        utterance1|<-----X----->|utterance2

After:                 v   v
        utterance1     |gap|<--->utterance2
                            300ms

CASE3: !CASE1 & !CASE2
Before:           v         v
        utterance1|<---X--->|utterance2

After:             v   v
        utterance1 |gap|<--->utterance2
                         X/2
Note that the gap may often overlap utterance1.
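The three cases translate into a short function. This is a sketch under stated assumptions: times are in ms, `gap` is the minimum gap between subtitles, the function name is hypothetical, and CASE3's "X/2" label is read as the length of in-pad2, which the diagram does not spell out.

```python
from typing import Tuple

OUT_PAD_MS = 1000   # out-pad1 in CASE1
IN_PAD_MS = 300     # in-pad2 in CASE1 and CASE2


def place_cues(u1_end: float, u2_start: float, gap: float) -> Tuple[float, float]:
    """Return (out_cue1, in_cue2) from where utterance1 ends and utterance2 begins."""
    x = u2_start - u1_end                       # X in the diagrams
    if x - gap > OUT_PAD_MS + IN_PAD_MS:        # CASE1: (X-gap) > 1300 ms
        return u1_end + OUT_PAD_MS, u2_start - IN_PAD_MS
    if x - gap >= 2 * IN_PAD_MS:                # CASE2: (X-gap) >= 600 ms
        in_cue2 = u2_start - IN_PAD_MS
        return in_cue2 - gap, in_cue2           # out-pad1 absorbs the rest of X
    # CASE3 (assumed reading): in-pad2 is X/2 and the gap sits just before it,
    # so out_cue1 can fall before u1_end, i.e. overlap utterance1.
    in_cue2 = u2_start - x / 2
    return in_cue2 - gap, in_cue2
```

For example, with utterance1 ending at 10000 ms, utterance2 starting at 13000 ms and a 100 ms gap, CASE1 applies and the cues land at 11000 ms and 12700 ms.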

After THE CASES are run, run THE FIXUPS.

THE FIXUPS.

Shot change within utterance1:
- If utterance1 ends 100ms or more past the shot change, do nothing; leave out-cue1 where it is.
- Otherwise, move out-cue1 to the shot change and mark this interval FOR REWORK.

Shot change within out-pad1: Move out-cue1 to the shot change. Then move in-cue2 to either in-cue2 minus 300ms or out-cue1 plus gap, whichever is greater.

Shot change within the gap: Do nothing; leave as is.

Shot change within in-pad2: Move in-cue2 to the shot change. Then move out-cue1 to either in-cue2 minus gap or out-cue1 plus 1s, whichever is lesser.

Shot change within utterance2:
- If utterance2 begins 100ms or more before the shot change, do nothing; leave in-cue2 where it is.
- Otherwise, move in-cue2 to the shot change and mark this interval FOR REWORK.
=== End AUTOMATED PART ===
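THE FIXUPS can likewise be sketched as one function per shot change. It follows the rules as written (including "in-cue2 minus 300ms"); the function name, the assumption that the shot change lies between the start of utterance1 and the end of utterance2, and the choice of `<` versus `<=` at each boundary are this sketch's, not the proposal's.

```python
from typing import Tuple

NEAR_MS = 100        # the "100ms or more" threshold for shot changes inside speech
IN_PAD_MS = 300
OUT_PAD_MS = 1000


def apply_fixup(shot: float, u1_end: float, out_cue1: float, in_cue2: float,
                u2_start: float, gap: float) -> Tuple[float, float, bool]:
    """Apply THE FIXUPS for one shot change. Returns (out_cue1, in_cue2, rework)."""
    if shot < u1_end:                       # within utterance1
        if u1_end - shot >= NEAR_MS:
            return out_cue1, in_cue2, False
        return shot, in_cue2, True
    if shot < out_cue1:                     # within out-pad1
        return shot, max(in_cue2 - IN_PAD_MS, shot + gap), False
    if shot <= in_cue2:                     # within the gap: leave as is
        return out_cue1, in_cue2, False
    if shot <= u2_start:                    # within in-pad2
        return min(shot - gap, out_cue1 + OUT_PAD_MS), shot, False
    if shot - u2_start >= NEAR_MS:          # within utterance2
        return out_cue1, in_cue2, False
    return out_cue1, shot, True
```

Intervals returned with `rework=True` are the ones to show the user one at a time.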

Show the user the intervals that are marked FOR REWORK. Show them one at a time.

FOR REWORK: if utterance1 and utterance2 have the same speaker, consider merging and re-splitting at a better point.

markfilipak
22nd March 2025, 03:13
In waveform, there really needs to be a way to continuously loop between _any_ two points. Press and hold a key, click point A, click point B, and the waveform is looped, A-B, until the key is released. While the key is held, either point can be dragged and the loop responds.
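A minimal model of that A-B loop, assuming playback advances in ticks and that dragging A or B simply updates the endpoint; the class and method names are hypothetical. Because the endpoints are read on every tick, the loop responds to a drag on its next pass.

```python
class ABLoop:
    """Playback position that jumps back to A whenever it reaches B (times in ms)."""

    def __init__(self, a_ms: float, b_ms: float) -> None:
        self.a = a_ms
        self.b = b_ms
        self.pos = min(a_ms, b_ms)

    def drag_a(self, a_ms: float) -> None:
        self.a = a_ms

    def drag_b(self, b_ms: float) -> None:
        self.b = b_ms

    def tick(self, dt_ms: float) -> float:
        """Advance playback by dt_ms and return the new position."""
        lo, hi = min(self.a, self.b), max(self.a, self.b)  # A may be dragged past B
        self.pos += dt_ms
        if self.pos < lo or self.pos >= hi:
            self.pos = lo  # restart the loop at its (possibly new) start
        return self.pos
```

Restarting exactly at A, rather than carrying the overshoot past B into the next pass, keeps the start of every repetition identical, which is what makes an utterance's onset easy to hear.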

Music Fan
24th March 2025, 23:35
Hi Nikse,
is there a way to replace an uppercase letter with a lowercase one when it follows a comma and a space?
For example:
Hello, How are you Hunter ?
should become:
Hello, how are you Hunter ?

This pattern can be found with:
(\,\s)([A-Z])
And I hoped it could be replaced with this, but it does not work:
$1\l$2

Edit: I finally added a case for each letter:
(\,\s)(A)
replace with:
$1a
...
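Case conversion in the replacement is what `\l` was meant to do. Python's `re` has no such escape either, but a replacement function does the job in one rule instead of 26. A sketch in Python (not SE's engine), which also skips the pronoun "I" so that "Yes, I know" is left alone:

```python
import re

# Comma + whitespace + a capital letter, skipping the pronoun "I" (and "I'm", "I'll").
CAPITAL_AFTER_COMMA = re.compile(r"(,\s)(?!I\b)([A-Z])")


def lower_after_comma(text: str) -> str:
    # Python replacement strings have no \l, so lowercase the letter in a function.
    return CAPITAL_AFTER_COMMA.sub(lambda m: m.group(1) + m.group(2).lower(), text)
```

Names after a comma ("Thanks, Hunter") would still be lowercased, so the result needs a review pass.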

GCRaistlin
1st April 2025, 22:46
The default DirectShow Video Player has an issue with audio sync (http://forum.doom9.org/showthread.php?p=1995945#post1995945). The mpv library that SE downloads doesn't work on Windows 8.1 x64. Windows 8.1 users should manually replace it with mpv-dev-x86_64-20240922-git-71f2220.7z (https://sourceforge.net/projects/mpv-player-windows/files/libmpv/mpv-dev-x86_64-20240922-git-71f2220.7z/download).

markfilipak
8th April 2025, 03:46
I'm editing the subtitles for "The Ghost and Mrs. Muir" [1947], DVD. Timing wise, they are a mess. And much of the audio is too soft to see it in waveforms.

Waveforms lose half their resolution (by design), and there's no function (by design) that loops like this: push a key, click point A, click point B, and the audio loops A-to-B; drag A (as the audio loops) to find where an utterance starts; drag B (as the audio loops) to find where it ends; release the key. Instead, I have to drag A, play, drag A again, play, drag A again, etc., then drag B, play, drag B again, play, drag B again, etc. Without smarter, better thought-out functions, editing just takes forever.

I am in despair. I suggest better operations here and get no responses. Does no one give a sh!t?

READ ME: See https://forum.doom9.org/showthread.php?p=2017445#post2017445 for the resolution of this issue.

TR-9970X
8th April 2025, 05:20
Why don't you download it from somewhere in better resolution than DVD? It may already have the subtitles... or you could get the subtitles from other places (not going to post URLs).

I have had a look, and it's all out there, ready to be got...

markfilipak
8th April 2025, 23:16
Why don't you download it ...
Good grief. Thank you, but my comment is not about the movie. It's about how poorly thought out SE's editing functions are, and how my suggestions get no response. Correcting subtitle times in waveforms is crude and incredibly tedious because the editing functions are crude.

TR-9970X
8th April 2025, 23:56
I guess why I didn't make any suggestions was, that I don't use SE for what you're trying to do...

I've only just recently started using Whisper....

It's all very time consuming, at the best of times.

Good luck.

VoodooFX
9th April 2025, 00:51
I'm editing the subtitles for "The Ghost and Mrs. Muir" [1947], DVD. Timing wise, they are a mess. And much of the audio is too soft to see it in waveforms.

Can you PM me the audio and timestamps where "audio is too soft to see"?

markfilipak
9th April 2025, 02:07
Can you PM me the audio and timestamps where "audio is too soft to see"?
No, I'm sorry to say that I can't. It's copyrighted video and it's 2.6 GB. Would it do if I posted screen shots with arrows showing where an utterance _actually_ starts and ends but that isn't otherwise obvious?

Sometimes the audio is just a flat line but there's actually several frames of utterance there -- sometimes _seconds_ of utterance. Sometimes the utterance is buried in music, so it's all just jagged. If you've tried to set subtitles precisely (meaning: within 10 frames or so), you've run across this problem. You cannot rely on the waveform to show you where an utterance starts and ends. You have to hear it, and it's best to hear it in a loop and to have the power to move the cues while hearing the loop.

Right now there's no good way to audition an utterance, so there's no good way to set in- and out-cues quickly. I have posted a couple of ways to speed up editing. The latest also has "Faster editing" as the subject. I conservatively estimate that providing that function would speed up editing in the waveform window by at least 10x. My audition between points A & B (looping, with A & B both actively draggable) is not the same as simply looping from in-cue to out-cue. Please read what I wrote and I'm sure you will 'get it'. If you don't 'get it', ask. My proposed method includes a button assignment, clicking A, clicking B, dragging A and/or dragging B while hearing the audition, and releasing the button. That audition is then automatically followed by setting the length of in-pad and out-pad, with and without an intervening shot change. In other words, it does everything beautify does, but beautify is incapable of listening to utterances.

READ ME: See https://forum.doom9.org/showthread.php?p=2017445#post2017445 for the resolution of this issue.

markfilipak
9th April 2025, 02:26
I guess why I didn't make any suggestions was, that I don't use SE for what you're trying to do...

I've only just recently started using Whisper....

Cute name. What does it do?

It's all very time consuming, at the best of times.

It doesn't have to be so time consuming.

Good luck.
Thanks! But luck has little to do with it.

TR-9970X
9th April 2025, 03:01
Cute name. What does it do?

"Whisper" is an "add on" for SE, that performs an audio to text operation, that is, it creates subtitles from audio.

However, if your video/audio isn't "loud" enough, Whisper may not be able to do its job.

I've tried it on a couple of movies that I can't get any subtitles for, and it definitely does a pretty good job...there would be some reviewing & editing, but at least it's a very good start.

https://www.youtube.com/watch?v=4YZ0B1Zsi70&t=11s&ab_channel=DavidMbugua

https://www.youtube.com/watch?v=ZDXyBAzApH8&t=168s&ab_channel=SubtitlingwithClaudia

markfilipak
9th April 2025, 05:05
"Whisper" is an "add on" for SE, that performs an audio to text operation, that is, it creates subtitles from audio.

Ah, that's what I thought. Thanks. And as you note, it wouldn't work with soft utterances. Besides that, of the several hundred subtitles I've done, the videos come with subtitles that I OCR and fix up, so no Whisper. It's those fix ups that take forever with the current SE waveform tools but which could be greatly streamlined.

https://www.youtube.com/watch?v=4YZ0B1Zsi70&t=11s&ab_channel=DavidMbugua
https://www.youtube.com/watch?v=ZDXyBAzApH8&t=168s&ab_channel=SubtitlingwithClaudia
I've watched quite a few YouTube videos, but they weren't useful. All the ones I've seen review how to use SE, not how to deal with difficult subs or how SE can be improved.

TR-9970X
9th April 2025, 05:15
You could try and run the current video thru Whisper and see what it finds.

I'm actually running an old movie that I can't get any subs for, and I'm using a "bigger" library/model, and it's taking forever, I hope it finds everything & accurately too.

markfilipak
9th April 2025, 05:29
You could try and run the current video thru Whisper and see what it finds.
Oh, that's a very good idea, but I'm very skeptical. I could compare the Whisper subs to the provided subs, but the comparison could only be academic -- not a practical solution, even if it worked. Such a comparison would only take even more time but with no assurance that the in- and out-cues were correct without me listening to them, which is what I'm doing now. I'm not saying that the solution has to be foolproof, only that nothing beats actually listening. It's that listening that I'm trying to optimize.

I'm actually running an old movie that I can't get any subs for, and I'm using a "bigger" library/model, and it's taking forever, I hope it finds everything & accurately too.
Well, good luck to you!

TR-9970X
9th April 2025, 05:39
Do you use SE to OCR ??

I generally use gMKVExtractGUI.

I have done a couple of tests with a basic Whisper model, and despite the odd typo or misinterpretation, the timing was pretty good.

I will let you know how this current job turns out. It's STILL going; it's been well over 2 hours for a movie that's 1.5 hours long.

But if it turns out well, then it's better than the alternative, I guess.

TR-9970X
9th April 2025, 05:48
I'm editing the subtitles for "The Ghost and Mrs. Muir" [1947], DVD. Timing wise, they are a mess. And much of the audio is too soft to see it in waveforms.

Does no one give a sh!t?

I just thought of something...

You're saying that the audio is "soft"...what if you extracted the audio and amplified it, and then see if the waveform process works for you !!

markfilipak
9th April 2025, 05:50
Do you use SE to OCR ??
Yes. I'm satisfied with it. Not perfect, but very good. Kudos to Nik.

I generally use gMKVExtractGUI.
I package solely MP4. MKV has a 1 kHz clock, and that leads to too many problems.
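To put a number on the 1 kHz remark: Matroska's default TimestampScale is 1,000,000 ns, so timestamps are stored in whole milliseconds, while a frame at 24000/1001 fps lasts about 41.708 ms. The sketch below (the frame rate is an assumption) computes how far each frame's start moves when snapped to the millisecond grid; whether that drift causes problems in practice depends on the tools involved.

```python
from fractions import Fraction

FPS = Fraction(24000, 1001)   # assumed NTSC film rate (~23.976 fps)


def frame_start_ms(n: int) -> Fraction:
    """Exact start time of frame n, in milliseconds."""
    return 1000 * n / FPS


def ms_rounding_error(n: int) -> float:
    """How far frame n's start moves when stored on a 1 ms grid."""
    exact = frame_start_ms(n)
    return float(round(exact) - exact)
```

Frame 1 starts at 41.708 ms and is stored as 42 ms, about 0.29 ms late; the error never exceeds half a millisecond and returns to zero every 24 frames (1001 ms).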

markfilipak
9th April 2025, 07:28
I just thought of something...

You're saying that the audio is "soft"...what if you extracted the audio and amplified it, and then see if the waveform process works for you !!
1) I would have to make the louder audio.
2) I would have to mux the louder audio into the movie at the beginning, and mux it out at the end.
3) Doing so would not improve the situation -- I still have to listen -- and would only add more time to the effort.

The problem isn't that I can't hear the utterances. The problem is that I can't see the actual start and end of the utterances. That's partly because the waveform could have twice its current resolution, but doesn't.

The solution is one that facilitates setting in- and out-cues while simultaneously listening, and doing so much more rapidly than is currently possible.

Compare these methods:

Current SE: A is an in-cue, B is an out-cue. Audio is the sub.
Click-drag A, press a key to listen to A plus a little bit, release key.
Click-drag A again, repeat the hunt until A coincides with the start of the utterance.
Click-drag B, press a key to listen to the whole subtitle in order to hear the end, release key.
Click-drag B again, repeat the hunt until B coincides with the end of the utterance.
Manually add in-padding and out-padding by again dragging A, and again dragging B.
It takes many clicks, many drags, and many listen-key presses to accomplish this.

Proposed SE: A is an out-cue, B is an in-cue. Audio is the space between subs.
Press and hold a key, click A, click B, (SE continuously loops A-to-B).
Click-drag A while audio loops and drop it where utterance A ends.
Click-drag B while audio loops and drop it where utterance B begins.
Release key, (SE automatically adds out- and in-padding while taking shot changes into account).
It takes one mode-key press-and-hold, two clicks, and two drags to accomplish this.

You see, the proposed is not editing subs, it's editing the spaces between subs!

Large gaps between subs exist of course. For them, set A & B using the current, hunting method, above. However, small gaps greatly outnumber large gaps in real videos, so the proposed will work in the vast majority of cases.

READ ME: See https://forum.doom9.org/showthread.php?p=2017445#post2017445 for the resolution of this issue.

TR-9970X
9th April 2025, 07:39
Well, now that you've put it that way, it does sound like a LOT of extra work.

However, I thought I saw that you can export the audio to a text file....and also grab the subs from just the audio track.

I ended up stopping that Whisper run, @ 4 hours, it kept what it had done, and it got up to just over an hour thru the movie, there was a lot of extra stuff generated (not needed), but the timing was pretty good, and there weren't too many typos.

I'm going to try a different library/model...

Has the author of SE got a "git" page ??? maybe you need to post your concerns there, not here....

I might try Whisper on the "The Ghost and Mrs Muir" that I got the other day, even tho it came with subs.

markfilipak
9th April 2025, 16:02
Well, now that you've put it that way, it does sound like a LOT of extra work.
Yes. Going through a 2 hour movie while checking and correcting timing can take a full day. Setting accurate in- and out-cue times is very important for making subtitles that flow well and are therefore easy to read. I try to match the pace of the utterances. In a well made movie, dialog has a certain pacing that expresses the mood that the director intends. I have found that when the cues match that pacing, the subtitles almost magically become easier to read and understand. It's quite amazing.

However, I thought I saw that you can export the audio to a text file...
Yes. I save subtitles in SRT format -- that's text. SRT is easy to mux-merge into a package stream like MP4, via FFmpeg.
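For reference, a typical FFmpeg invocation for that mux, wrapped in Python so the arguments are easy to see. This is a generic FFmpeg recipe, not necessarily the exact command used here; the function name, file names and language tag are placeholders. FFmpeg's MP4 muxer doesn't accept SubRip streams directly, so the subtitle is converted to `mov_text` while video and audio are stream-copied.

```python
from typing import List


def mux_srt_into_mp4(video: str, srt: str, out: str, lang: str = "eng") -> List[str]:
    """Build an FFmpeg argument list that copies video/audio and adds the SRT."""
    return [
        "ffmpeg", "-i", video, "-i", srt,
        "-map", "0:v", "-map", "0:a?", "-map", "1:s",   # "?" = audio is optional
        "-c", "copy", "-c:s", "mov_text",               # copy A/V, convert subs
        "-metadata:s:s:0", f"language={lang}",
        out,
    ]
```

Run it with, for example, `subprocess.run(mux_srt_into_mp4("movie.mp4", "movie.srt", "movie.subbed.mp4"), check=True)`.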

and also grab the subs from just the audio track.
It seems that all movies and TV shows after about the year 2000 include subtitles. So, no, I haven't had to make subs from just an audio track. I have some very old DVDs that don't have subtitles but I just leave them be.

I ended up stopping that Whisper run, @ 4 hours, it kept what it had done, and it got up to just over an hour thru the movie, there was a lot of extra stuff generated (not needed), but the timing was pretty good, and there weren't too many typos.
Well, that's good to know. May I ask: How do you know the timing was pretty good?

Has the author of SE got a "git" page ???
Yes (https://github.com/SubtitleEdit/subtitleedit), and a web site (https://www.nikse.dk/subtitleedit), too.
maybe you need to post your concerns there, not here....
Doom9 is for discussion. I appreciate discussion of proposed changes. I think discussion makes for better applications like SE. SE needs to be more interactive than it is now. I've appreciated your thoughts.