Subtitle Edit [Archive] - Page 43

markfilipak

26th November 2025, 03:02

junah

28th November 2025, 00:16

Thanks for adding Google Lens under OCR options. It handles lots of languages very well, except that it doesn't honor lowercase at the start of a line at all. Italics don't work either.

Janusz

2nd December 2025, 21:20

So that everyone understands the issue...

When I play part 1 via 'FFplay -ignore_editlist 1', the audio is approximately 700 ms early; for part 2, it's approximately 1300 ms early.

When I edit part 1 via SE, the waveforms that SE draws are approximately 700 ms early; for part 2, they are approximately 1300 ms early.

Those are not coincidences.
...

I wouldn't worry about what FFplay shows with this or that parameter, as we're not using it in our case. All we need is the DELAY from our video.

Solution/workaround for your problem:

First, a demonstration... (https://www.mediafire.com/file/82q2b5x05ji823w/K-003_%25233.mp4/file)

1. Loading subtitles
2. Loading the original video (no. 1) (the video has a 29 ms audio offset relative to the video, but I'll skip that for now).
Important!!
Now, reset the program by using "Menu\Video\Set video offset..." [Reset]
As you can see, the video and subtitle synchronization is OK.

3. Loading a specially prepared video (no. 2) with audio, to which I've added 700 ms of silence at the beginning of the audio track (the video has a -700 ms audio offset relative to the video).
As you can see, the video and subtitle synchronization is OK.
SubtitleEdit ignored the negative audio offset relative to the video. The generated waveforms are exactly the same as in step 2.

Now for your case.
4. The next video (no. 3) has had 700 ms trimmed from the beginning of its audio track. To synchronize it with the video, I used a delay of 700 ms. The generated waveforms don't match the video at all. Shifting the subtitles is out of the question, as you can't synchronize the subtitles with both the video and the shifted audio track simultaneously.

Solution!!
Now you need to extract the audio track from your video. You can use ffmpeg, for example.

- In the program, close the video file ("Menu\Video\Close video file")
- Load the newly obtained audio track into the program by dragging it onto the waveform window,
- In the program, set the delay to 700 ms (Menu\Video\Set video offset...)
Syncing only with the audio will do the trick. Finally, you need to add a 700 ms offset for all subtitles so that everything plays correctly when combined with the video.
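For anyone who would rather apply that final 700 ms shift outside the program, here is a minimal Python sketch. It assumes the subtitles are saved as plain SRT text; the function name shift_srt is made up for this example, and whether the offset should be positive or negative depends on which way your audio was shifted.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345
_TS = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text: str, offset_ms: int) -> str:
    """Return SRT text with every timestamp moved by offset_ms (clamped at 0)."""

    def _shift(m):
        h, mnt, s, ms = (int(g) for g in m.groups())
        total = max(0, ((h * 60 + mnt) * 60 + s) * 1000 + ms + offset_ms)
        h, rem = divmod(total, 3_600_000)
        mnt, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{mnt:02d}:{s:02d},{ms:03d}"

    # Only touch timing lines (those containing "-->"), never the dialogue text.
    return "\n".join(
        _TS.sub(_shift, line) if "-->" in line else line
        for line in text.split("\n")
    )


if __name__ == "__main__":
    sample = "1\n00:00:01,500 --> 00:00:03,000\nHello\n"
    print(shift_srt(sample, 700))
```

Negative offsets work the same way; times that would fall below zero are clamped to 00:00:00,000.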

Edit:
Another solution:
Once you have the extracted audio track, generate an additional, empty audio file (containing only silence) no shorter than 700 ms with the same parameters as your original video file and combine them (silence + main file) without decoding. Add this combined file as a new audio track to your video with "delay=length of the added audio portion in ms" and use this new video to synchronize subtitles by selecting this new track to generate the waveforms in SubtitleEdit.
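A rough sketch of the ffmpeg calls this second approach might use, written as Python functions that only build the argument lists (nothing is executed). The file names, the .mka container, the sample rate, the channel layout and the codec are all assumptions for illustration; they must be replaced with values matching your own audio track, otherwise joining "without decoding" is unlikely to work.

```python
import shlex
from typing import List


def extract_audio_cmd(video: str, out_audio: str, track: int = 0) -> List[str]:
    """ffmpeg arguments that copy one audio track out of a file without re-encoding."""
    return ["ffmpeg", "-i", video, "-map", f"0:a:{track}", "-vn",
            "-c:a", "copy", out_audio]


def silence_cmd(out_audio: str, seconds: float, rate: int,
                layout: str, codec: str) -> List[str]:
    """ffmpeg arguments that encode a silent clip from the lavfi 'anullsrc' source."""
    return ["ffmpeg", "-f", "lavfi", "-i", f"anullsrc=r={rate}:cl={layout}",
            "-t", f"{seconds:.3f}", "-c:a", codec, out_audio]


def concat_list(files: List[str]) -> str:
    """Text of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{name}'\n" for name in files)


def concat_cmd(list_file: str, out_audio: str) -> List[str]:
    """ffmpeg arguments that join the listed files with stream copy (no decoding)."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out_audio]


if __name__ == "__main__":
    # Hypothetical names and parameters, shown only to illustrate the order of steps.
    print(shlex.join(extract_audio_cmd("video.mp4", "audio.mka")))
    print(shlex.join(silence_cmd("silence.mka", 0.7, 48000, "stereo", "aac")))
    print(concat_list(["silence.mka", "audio.mka"]), end="")
    print(shlex.join(concat_cmd("list.txt", "joined.mka")))
```

Because encoders work in whole frames, the silent clip may come out slightly longer than requested, so measure its real length before using it as the delay value.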

markfilipak

3rd December 2025, 01:00

I wouldn't worry about what FFplay shows with this or that parameter, as we're not using it in our case. All we need is the DELAY from our video.

Solution/workaround for your problem:

First, a demonstration... (https://www.mediafire.com/file/82q2b5x05ji823w/K-003_%25233.mp4/file)

Thank you for your work, Janusz. You're very kind.

I downloaded your MP4. It's not what you think it is. The problems are caused by the structure of the MP4, not by SE. I'll write more about that in a bit.

First, your MP4 has an edit list. The source of the following lines is FFprobe's report (level 48), by the [mov,mp4,m4a,3gp,3g2,mj2 @ xxxxxxxxxxxxxxxx] demuxer.

Processing st: 1, edit list 0 - media time: 23956, duration: 4013585
drop a frame at curr_cts: 0 @ 0
drop a frame at curr_cts: 1024 @ 1
drop a frame at curr_cts: 2048 @ 2
drop a frame at curr_cts: 3072 @ 3
drop a frame at curr_cts: 4096 @ 4
drop a frame at curr_cts: 5120 @ 5
drop a frame at curr_cts: 6144 @ 6
drop a frame at curr_cts: 7168 @ 7
drop a frame at curr_cts: 8192 @ 8
drop a frame at curr_cts: 9216 @ 9
drop a frame at curr_cts: 10240 @ 10
drop a frame at curr_cts: 11264 @ 11
drop a frame at curr_cts: 12288 @ 12
drop a frame at curr_cts: 13312 @ 13
drop a frame at curr_cts: 14336 @ 14
drop a frame at curr_cts: 15360 @ 15
drop a frame at curr_cts: 16384 @ 16
drop a frame at curr_cts: 17408 @ 17
drop a frame at curr_cts: 18432 @ 18
drop a frame at curr_cts: 19456 @ 19
drop a frame at curr_cts: 20480 @ 20
drop a frame at curr_cts: 21504 @ 21
skip 1428 audio samples from curr_cts: 22528
That is an edit list. It's the first thing that may cause trouble.
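The numbers in that report can be cross-checked with a few lines of Python. This is only a sketch: the three log lines are copied from the report above, the function names are invented for the example, and the 48000 Hz sample rate used for the conversion to milliseconds is an assumption, since the report does not show the audio timescale.

```python
import re

# Three lines copied from the FFprobe report quoted above.
LOG = """\
Processing st: 1, edit list 0 - media time: 23956, duration: 4013585
drop a frame at curr_cts: 21504 @ 21
skip 1428 audio samples from curr_cts: 22528
"""


def edit_list_skip(log: str) -> dict:
    """Extract the edit-list start and the partial-frame skip from the report."""
    media_time = int(re.search(r"media time: (-?\d+)", log).group(1))
    skip = re.search(r"skip (\d+) audio samples from curr_cts: (\d+)", log)
    skipped, from_cts = int(skip.group(1)), int(skip.group(2))
    return {
        "media_time": media_time,   # where the edit list says playback starts
        "dropped_until": from_cts,  # samples removed as whole dropped frames
        "partial_skip": skipped,    # extra samples skipped inside the next frame
        "consistent": from_cts + skipped == media_time,
    }


def samples_to_ms(samples: int, sample_rate: int) -> float:
    """Convert a sample count to milliseconds at the given sample rate."""
    return 1000.0 * samples / sample_rate


if __name__ == "__main__":
    info = edit_list_skip(LOG)
    print(info)
    # ASSUMPTION: 48000 Hz. Substitute the real rate of the audio track.
    print(round(samples_to_ms(info["media_time"], 48000), 1), "ms")
```

The 22 dropped frames (0 through 21) step by 1024 samples, so they account for 22528 samples; adding the 1428 skipped samples gives exactly the media time of 23956.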

The second thing that does cause trouble is a gap in the audio timestamps at the beginning of the MP4.

I've fixed my problem. Yeah! (I think.) The fix was very tedious, involving several days of work.

I plan to write a detailed report. Before I do that, I'll finish the subtitles for part 1. Then I'll process part 2 in the same manner. Then I'll splice them together. The running time for parts 1+2 together is 5 hours.

I want to make sure that what I do is understandable and effective. Then I'll write the report, post it here, and then delete most of my previous postings on this subject. It will take me a few days.

By the way, the trouble was not caused by "Menu\Video\Set video offset..." -- I've never used it.

markfilipak

9th December 2025, 02:03

I'm still working on this.

Edit lists in MP4 and MKV TSes seem to be one of the best kept secrets in video. I've done a ton of searching and reading, and everyone blames MP4 and/or MKV. From what I've read, no one seems to know about edit lists. Let me give you a simple, easy-to-understand example using something you probably already do know about: soft telecine. After that I'll tie this subject into SE.

Telecine takes runs of four frames and makes runs of five frames in order to play on legacy televisions. Stated another way, telecine converts eight fields played at 23.976 fps to ten fields played at 29.970 fps. It does it by duplicating two of the fields. Usually, [Aa][Bb][Cc][Dd] on disc becomes [Aa][Bb][Bc][Cd][Dd] sent to the TV. Note that the process is technically called 2-3 pulldown, but everyone calls it telecine because the machines that did it were called telecines.

Around the year 2000, DVD makers realized that putting ten fields on disc to get 29.970 fps was wasting 20% of the potential disc space! What they needed was a way to get the playback unit (the decoder) to accept [Aa][Bb][Cc][Dd], and to then make [Aa][Bb][Bc][Cd][Dd] to feed the TV. MPEG okayed the scheme and thus was born soft telecine.
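The 2-3 cadence described above can be sketched in a few lines of Python. This is purely illustrative: the function name is made up, and each frame is represented as a (top field, bottom field) pair of letters, following the [Aa][Bb] notation used in this post.

```python
from fractions import Fraction


def pulldown_2_3(frames):
    """Expand progressive frames into 2-3 pulldown frames.

    Each input frame is a (top, bottom) pair such as ("A", "a").
    """
    fields = []  # fields in display order, alternating top/bottom
    for index, (top, bottom) in enumerate(frames):
        count = 2 if index % 2 == 0 else 3  # the "2-3" cadence
        for _ in range(count):
            is_top = len(fields) % 2 == 0   # even positions carry top fields
            fields.append(top if is_top else bottom)
    # Pair consecutive fields back into interlaced frames.
    return [fields[i] + fields[i + 1] for i in range(0, len(fields) - 1, 2)]


if __name__ == "__main__":
    source = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
    print(pulldown_2_3(source))
    # Four frames in, five frames out: the frame rate scales by 5/4.
    print(Fraction(24000, 1001) * Fraction(5, 4) == Fraction(30000, 1001))
```

With soft telecine only the four source frames are stored; the player performs this expansion itself while decoding.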

Making edit lists is like making soft telecine in that it gets decoders to finish the editing. However, unlike soft telecine instructions, edit lists should never appear in final videos. That edit lists _do_ appear in what people _think_ are final videos is their mistake.

So, why do edit lists exist, eh?

Imagine you're an editor who's cutting and splicing video to make a final cut. _Without_ edit lists, you have to wait for hours of re-coding to finish after every edit before you can see the results (unless of course you're editing raw video). _With_ edit lists you see the results almost immediately. You only need to re-code one time, at the end, when you've worked out all the final cuts and splices. During the editing, you can even watch before-&-after, two-window comparisons by telling the before-window to 'ignore_editlist'.

What I am seeing in SE is actual audio, timestamps, shot changes, and subtitles that are all in sync, but audio waveforms that appear to be 'ignore_editlist'. I have already confirmed it because I've succeeded with DVD 1of2.

The project I'm working on is the American release of a five-hour, German movie that spans two DVDs, composed of VOBs that change back and forth between 29.970 and 23.976 fps, and with German-to-English dubbing (so lip-sync is very difficult to discern). I'm working on the five-hour version of "Das Boot".

TR-9970X

9th December 2025, 02:35

I'm still working on this.

The project I'm working on is the American release of a five-hour, German movie that spans two DVDs, composed of VOBs that change back and forth between 29.970 and 23.976 fps, and with German-to-English dubbing (so lip-sync is very difficult to discern). I'm working on the five-hour version of "Das Boot".

Hi, I'm curious about this 5 hour version ??

I checked IMDb, and the only thing that comes close is the "mini series", which is split over 3 episodes, but I've seen it available in 6 parts.

The movie is from 1981, and the series is 1985. is that what you're working on ??

But for me, they appear to be in German (obviously), and that's not ideal, so hopefully it has accurate subtitles.

It would be interesting to translate the German audio to English...can SE do that accurately ??

markfilipak

9th December 2025, 07:02

Hi, I'm curious about this 5 hour version ??

I checked IMDb, and the only thing that comes close is the "mini series", which is split over 3 episodes, but I've seen it available in 6 parts.

There have been five versions. I have [1981], [1996], and [2004]. I did the summaries below based on what I have, which is complete, plus what I researched. I have Das Boot [2004] on two DVDs. It's those that are driving me nuts. [1981] and [1996] are Blu-rays and were a breeze.

Das Boot [1981][1984][1988][1996][2004]

Das Boot [1981] is the 2-1/2 hour, theatrical release version. It's actually an abbreviation of the miniseries and the resulting plot is mostly event-driven.

Das Boot [1984] is the nearly 5 hour, television miniseries broken into 3, 100-minute episodes prefaced by previous-episode summaries. To the theatrical version's tale, the 5-hour version adds countermeasures employed against attacking destroyers, evasive maneuvers used for escapes, British mistakes, and German luck. It presents more of the day-to-day story: coping with weeks-long periods of stormy weather, maintaining sleep cycles, treating incessant skin ailments, reacting to news from home (e.g., the firebombing of Hamburg & Cologne), suffering through boredom and loneliness, upholding morale (e.g., the captain's personal counseling), etc.

Das Boot [1988] is the same miniseries, but broken into 6, 50-minute episodes prefaced by previous-episode summaries.

Das Boot [1996] is the so-called DIRECTOR'S CUT that includes all of the event-driven plot found in the theatrical release combined with some of the character-driven scenes found in the miniseries. This version was digitally restored, recut with 1 hour of new scenes added, and sonically redesigned with new sound effects mixed into 8 channels of digital surround sound.

Das Boot [2004] is the miniseries except it's continuous, without episode breaks or previous-episode summaries.
The movie is from 1981, and the series is 1985. is that what you're working on ??

But for me, they appear to be in German (obviously), and that's not ideal, so hopefully it has accurate subtitles.

It would be interesting to translate the German audio to English...can SE do that accurately ??
I don't know.

The subtitles are a funny story. The English dubbing is quite good regarding lip-sync, but the subtitles don't match the dubbing. You see, the subtitles are translations from German, and they're quite raunchy. The dub is often so different from the translated subtitles that you'd think they're from different movies. I like the subtitles because they illustrate just how prudish Americans are compared to Europeans.

I'm giving up on this project for a while.

I'm beginning to lean heavily in the direction of bugs in SE rather than MP4 edit lists as the source of problems. For example, I went to great effort to avoid MP4 and MKV in favor of M2TS and others (but there aren't many others). When presented with AC3-in-M2TS, SE waveform behavior went wildly erratic: jumping around, stopping, skipping. I'd like to concatenate the VOBs via FFmpeg, but FFmpeg just has too many bugs and the VOBs require too much clean-up. I tried, but have not succeeded. I've spent weeks on it to the exclusion of my actual life.

This has all been heartbreaking because I dearly love the 5-hour version, but it's on DVD and inaccessible because of FFmpeg and SE bugs.

TR-9970X

9th December 2025, 07:11

I didn't realize there were so many variants...

I think I have a couple of different revisions of the movie, but until your previous post, I wasn't aware of a L-O-N-G mini series.

So I have downloaded the 6-part version, which looks pretty good video-wise; the audio is German, but I have good (I hope) English subs.

So I'm going to edit each part, then join them all together; that should work out OK.

It's 1080p x265, with 6 ch E-AC3 audio.

It would be great to find an app that could translate the German to English, tho.

Cheers.

markfilipak

10th December 2025, 20:31

It would be great to find an app that could translate the German to English, tho.
Since you already have English subtitles, I assume you mean translating German speech to English speech. I think you would not like the result. Lip and mouth movements would look quite silly.

Dubbing into English is the art of substituting English words that match German lip and mouth movements while still maintaining the thread of conversations. Dubbing is not translation.

Edit: Have you seen the original Godzilla where they did exactly that: Translate Japanese dialog directly into English? Audiences laughed. Godzilla was a serious movie about the dangers of nuclear testing that became a sort of cult comedy.

TR-9970X

11th December 2025, 03:04

As mentioned in the PM's, I don't have to worry about translating, just sync & length.

But I have noticed that the subtitles don't match word for word with the dubbed English audio, I might try running the English audio thru SE using Whisper to create hopefully more accurate subs.

markfilipak

11th December 2025, 03:29

As mentioned in the PM's, I don't have to worry about translating, just sync & length.
Sync is not a problem.
But I have noticed that the subtitles don't match word for word with the dubbed English audio, I might try running the English audio thru SE using Whisper to create hopefully more accurate subs.
The subs are translations from the German dialog. The dubs are created dialog (art, as I tried to explain) so that lip and mouth movements don't promote laughter but the conversations still maintain the story. That's always true of dubs since the Godzilla fiasco.

I didn't 'fix' the subtitles because I think it's interesting how raunchy the actual German was. Americans tend to be prudish and the raunchy German shows how Europeans are different. ...Just makes for an extra study in culture along with the story. The English dubbing is so good, so clear, you don't really need subs.

TR-9970X

11th December 2025, 03:36

I agree that the dubbing would be good, especially with Das Boot, and therein lies the problem with using the subs, in this case, as they are not accurate.

But it would still be interesting to create subs that DO match the dubbed English, wouldn't it ??

Have you used Whisper in SE to create subs for a video that hasn't got any ??

It takes quite a long time, depending on the used settings & libraries, but it turns out pretty damn good, I've used it a couple of times.

markfilipak

11th December 2025, 04:02

Have you used Whisper in SE to create subs for a video that hasn't got any ??
Nope.
It takes quite a long time, depending on the used settings & libraries, but it turns out pretty damn good, I've used it a couple of times.
Thanks for the tip. I'll try Whisper when the need arises.

jay123210599

14th December 2025, 14:45

Janusz

14th December 2025, 22:22

Can anyone help me with this?

Now I see what my problem is. The shortcuts I set up for moving videos frame by frame work for this menu (https://imgbox.com/WcMhGnID), but not for this menu (https://imgbox.com/VNpn3u3V). But why is that, and what should I do to fix it?

Let me put it this way... Something hasn't been fully implemented in the keyboard shortcuts in SubtitleEdit and its window for importing *.sub subtitles for editing. A workaround is to use [Video engine]=MPC-HC as the "Video player" and use the video player's shortcuts. For example, CTRL+Right Arrow moves the video forward by one frame, and CTRL+Left Arrow moves the video back by one frame. The spacebar starts video playback. These are the default settings for this video player, and you don't need to change anything. For this to work, in both cases, you need to click on the video window to make it active.

You can delete your settings in SubtitleEdit's "Settings/Shortcuts/Video" menu.

jay123210599

15th December 2025, 01:50

It worked, but whenever I move a frame forward or backward, this (https://imgbox.com/QcWaPElr) doesn't move along with the video. It always remained the same. How do I fix that?

hidef_rec

15th December 2025, 23:00

Is there a way to save a .sup (PGS) as a .sup after removing a few lines of the subtitle?

Janusz

16th December 2025, 09:56

@ hidef_rec
Yes. In the main program window, select File/Export/Bluray sup...

Music Fan

16th December 2025, 10:06

Or drag & drop the sup onto SE's main window and the Import/OCR window will open.
There you can remove lines.
Then right click on the text, export, Blu-ray sup.

Nikse555

20th December 2025, 15:56

Is there a way to save a .sup (PGS) as a .sup after removing a few lines of the subtitle?

Or use File - Import - Blu-ray (.sup) subtitle file for edit...

Here you can delete/edit/adjust lines

TR-9970X

6th January 2026, 14:42

I have been using Whisper to transcribe some videos that don't have subtitles. I'm using just the CPP engine with the large V2 model, and it takes about 7 hours to do a 50-minute clip. The first one I did was really good and accurate; the next one I did, far from it.

Is there a better, faster way or can I use whisper outside of SE ??

Cheers.

Emulgator

7th January 2026, 02:56

TR-9970X

7th January 2026, 03:18

Yes and yes.
VoodooFX has built Purfview's Faster-Whisper-XXL which I use most of the time as the quick first approach for English,
then refine with additional passes for CPP and models v1, maybe v2, v3 missing more than hitting.
plus there are 4 more working Whisper implementations within SE, some running beautifully quick on GPU
(plus the last 2 I could not get to run), plus multiple Whisper standalones.

You're busy answering 2 threads :thanks:

OK, I just tried the Open AI model/engine with a medium model, didn't do a really good job.

Tried Const-me, which uses the GPU, so it's fast, but I only tried a small model, again, not too good.

Will try your suggestion, installing as we speak.

And I'm also trying a standalone, quite a lot to install for Whisper AI :(

So what model do you use for Purfview ?? And do you then run it thru again using CPP ??

EDIT :- Just did a Purfview run with medium model, started out in sync, then completely lost the plot :(

VoodooFX

7th January 2026, 04:26

EDIT :- Just did a Purfview run with medium model, started out in sync, then completely lost the plot :(

What do you mean by "lost the plot"? You want to use large-v2 model for accuracy.
What language is that audio, can you upload audio to wetransfer?

TR-9970X

7th January 2026, 04:32

What do you mean by "lost the plot"? You want to use large-v2 model for accuracy.
What language is that audio, can you upload audio to wetransfer?

Meaning that it went way out of sync, that's all.

So recommending large v2, why not v3, and what's large turbo v3 ??

It's all English.

What is the point of just uploading the audio, when you've no video to check it against ??

VoodooFX

7th January 2026, 04:45

Meaning that it went way out of sync, that's all.

So recommending large v2, why not v3, and what's large turbo v3 ??

It's all English.

What is the point of just uploading the audio, when you've no video to check it against ??

large-v2 is best for English.
Turbo model is faster version of large-v3, but it's a bit less accurate.

Video is not needed at all to check audio/subtitles. Remux the file with MKVToolNix deselecting everything except audio.
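For those who prefer the command line to the MKVToolNix GUI, an audio-only remux might be built roughly as below. This is a sketch that only assembles an argument list for mkvmerge, the command-line tool that ships with MKVToolNix; the file names are placeholders, and the function name is invented for the example.

```python
import shlex
from typing import List


def audio_only_remux_cmd(source: str, output: str) -> List[str]:
    """mkvmerge arguments that keep the audio tracks and drop everything else."""
    return [
        "mkvmerge", "-o", output,
        "--no-video",        # drop video tracks
        "--no-subtitles",    # drop subtitle tracks
        "--no-attachments",  # drop attachments such as fonts or cover art
        "--no-chapters",     # drop chapter entries
        source,
    ]


if __name__ == "__main__":
    # Placeholder file names.
    print(shlex.join(audio_only_remux_cmd("movie.mkv", "movie.mka")))
```

Nothing is re-encoded here, so the audio in the resulting .mka is the same stream that was in the source file.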

Can you share srt too (where it's out of sync)?

TR-9970X

7th January 2026, 04:48

large-v2 is best for English.
Turbo model is faster version of large-v3, but it's a bit less accurate.

Video is not needed at all to check audio/subtitles. Remux the file with MKVToolNix deselecting everything except audio.

Can you share srt too (where it's out of sync)?

OK, I will download & try large v2, and if there's still issues, I will upload both.

Cheers & thanks.

EDIT :-

OK, I think there was an issue with the previous files I was using, when I demuxed it with MKVToolNix, it has some sync issues, already :(

So I used another old movie with no subs, and ran that thru with large v2, and it did it rather quickly. However, there are still a lot of sync issues, as you will probably find for yourself.

Here's the link to the audio & sub files, as requested.

https://www.mediafire.com/file/zo66yrozf4yq1xx/audio-subs.7z/file

VoodooFX

8th January 2026, 00:14

This tarzan audio is out of sync too. Your subtitles fit OK on the waveform.
When you decode your audio it's shorter by 6 seconds.

TR-9970X

8th January 2026, 00:49

This tarzan audio is out of sync too. Your subtitles fit OK on the waveform.
When you decode your audio it's shorter by 6 seconds.

Hi, well that's not good news, not good at all :(

Do you have any suggestions on what I can do ??

Is there a particular way to demux/decode so there is no sync issues, or is that just how it is ??

Can SE or other app re-sync everything ??

Thanks for your help :)

EDIT :-

I just ran a very different video thru SE, and using the large v2 model, it transcribed the 50 minute video very quickly.

However, on play back, there were a LOT of lines missed, that should have been recognised, but at least it was pretty much synced, all the way thru.

Any suggestions ??

VoodooFX

8th January 2026, 17:15

Do you have any suggestions on what I can do ??

Try "--ff_sync" argument, if you run from SE then be sure that SE passes the original file.

Here is tarzan.srt: https://pastebin.com/D8RBuhUX [Produced with Faster-Whisper-XXL Pro, so it will be a bit more accurate]

I just ran a very different video thru SE, and using the large v2 model, it transcribed the 50 minute video very quickly.

However, on play back, there were a LOT of lines missed, that should have been recognised, but at least it was pretty much synced, all the way thru.

Any suggestions ??
Share the audio, the command used and timings where something is missing.

TR-9970X

9th January 2026, 01:25

Try "--ff_sync" argument, if you run from SE then be sure that SE passes the original file.

Here is tarzan.srt: https://pastebin.com/D8RBuhUX [Produced with Faster-Whisper-XXL Pro, so it will be a bit more accurate]

Share the audio, the command used and timings where something is missing.

Hi VoodooFX, I tried nearly everything in SE yesterday, and the results differ so much, but ALL still have some issues of sync, or missed lines or even added lines that are not in the audio :(

One annoying thing, if I try and load the Waveform, it throws up a whole list of errors, and it can't be used :(

So I've just about run out of options for getting subtitles for these movies.

I will try a few more things today, including your suggestion.

Will keep you updated.

Also, thanks for the "tarzan" subs :)

Cheers.

EDIT :- Have tried several more settings & models, it's just not working for me...well not as good as it should or could.

Maybe I'm expecting too much :(

VoodooFX

9th January 2026, 06:53

Hi VoodooFX, I tried nearly everything in SE yesterday, and the results differ so much, but ALL still have some issues of sync, or missed lines or even added lines that are not in the audio :(

So I've just about run out of options for getting subtitles for these movies.

Where do you get these out of sync files, from youtube?

Here is exact command that I've used:

faster-whisper-xxl.exe tarzan.mka -l en -m large-v2 --vad_method pyannote_v3 -o source --standard --max_gap 1 --ff_sync -hst 2 -ct float16 --realign

Remove "--realign" if you don't use Pro version, and remove "-ct float16" if you don't have GPU with CUDA. Remove "--ff_sync" if a file doesn't have sync problems.
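The three "remove if" rules above can be written down as a small Python helper that assembles the same command. It is only a sketch: the function and parameter names are invented, it builds the argument list without running anything, and every flag in it is taken from the command line quoted in this post.

```python
from typing import List


def build_command(audio: str, pro: bool, cuda: bool, needs_sync_fix: bool) -> List[str]:
    """Assemble the faster-whisper-xxl command, dropping flags that do not apply."""
    args = ["faster-whisper-xxl.exe", audio, "-l", "en", "-m", "large-v2",
            "--vad_method", "pyannote_v3", "-o", "source", "--standard",
            "--max_gap", "1"]
    if needs_sync_fix:
        args.append("--ff_sync")      # only for files that have sync problems
    args += ["-hst", "2"]
    if cuda:
        args += ["-ct", "float16"]    # only with a CUDA-capable GPU
    if pro:
        args.append("--realign")      # only in the Pro version
    return args


if __name__ == "__main__":
    # Free version, CUDA GPU available, file with sync problems.
    print(" ".join(build_command("tarzan.mka", pro=False, cuda=True, needs_sync_fix=True)))
```

With all three switches set to True the helper reproduces the command exactly as posted.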

Whisper model is non-deterministic by default, you can get different results every run, especially on this audio where monkeys are screaming and Aborigines are mumbling, model will increase temperature and will hallucinate on those parts.

One annoying thing, if I try and load the Waveform, it throws up a whole list of errors, and it can't be used :(

I didn't have any problems getting waveform from that mka file in SE, but it's useless because it's out of sync with the original audio.
And if you use "--ff_sync" then srt will be out of sync to the waveform, but it will be in sync with the actual video file.

EDIT :- Have tried several more settings & models, it's just not working for me...well not as good as it should or could.

Just share an audio with a problem and I can check it.

TR-9970X

9th January 2026, 07:09

Where do you get these out of sync files, from youtube?

No, they are "proper" downloaded files, rar sets, mainly.

Here is exact command that I've used:

faster-whisper-xxl.exe tarzan.mka -l en -m large-v2 --vad_method pyannote_v3 -o source --standard --max_gap 1 --ff_sync -hst 2 -ct float16 --realign

Can I add that code somewhere in SE ??, or do you use a cmd line process ??

Remove "--realign" if you don't use Pro version, and remove "-ct float16" if you don't have GPU with CUDA. Remove "--ff_sync" if a file doesn't have sync problems.

Don't have Pro (that's contribution version, isn't it??)

I have been using the PC with a 4080 Super.

Whisper model is non-deterministic by default, you can get different results every run, especially on this audio where monkeys are screaming and Aborigines are mumbling, model will increase temperature and will hallucinate on those parts.

Yes, I certainly seem to get different results :(

I didn't have any problems getting waveform from that mka file in SE, but it's useless because it's out of sync with the original audio.

I'm still curious why the wave form errors out.

Just share an audio with a problem and I can check it.

I will have some more attempts, but I've just about had enough for the time being :(

I REALLY appreciate your ongoing help.

I've been using SE for years, but only at a basic level. It wasn't until I started asking certain questions a couple of weeks ago that I found out just how much more SE can do, and it's SO much better than what I used to do.

VoodooFX

9th January 2026, 07:38

Can I add that code somewhere in SE ??, or do you use a cmd line process ??

In SE you can add the commands in the Advanced field there (but you need to understand what you are doing, because SE is passing some commands on its own too).
I always use it directly in console/terminal. You can check these links for fancy usage:
"One Click Transcribe" tool (https://github.com/Purfview/whisper-standalone-win/discussions/337)
Context menu tool (https://github.com/Purfview/whisper-standalone-win/discussions/539)

Don't have Pro (that's contribution version, isn't it??)

Yeah, "Pro" (https://github.com/Purfview/whisper-standalone-win/discussions/456) is not a free version.

I'm still curious why the wave form errors out.

Maybe something with ffmpeg version. I use 7.1.

TR-9970X

9th January 2026, 07:47

In SE you can add the commands in the Advanced field there (but you need to understand what you are doing, because SE is passing some commands on its own too).
I always use it directly with console. You can check these links for fancy usage:
"One Click Transcribe" tool (https://github.com/Purfview/whisper-standalone-win/discussions/337)
Context menu tool (https://github.com/Purfview/whisper-standalone-win/discussions/539)

Well, I don't really understand what I'm doing :(
But I know where the Advanced commands go.
I'll just copy most of your line, and see what happens.

Yeah, "Pro" is not free version.

Is it a one-off payment, or subscription ??

Maybe something with ffmpeg version. I use 7.1.

Using 8.0.1

Thanks again, I will see if I can educate myself to do it differently.

VoodooFX

9th January 2026, 08:01

I know where the Advanced commands go.
I'll just copy most of your line, and see what happens.

Just don't add these to SE: tarzan.mka -l en -m large-v2 -o source.
And untick all postprocessing boxes there.

BTW, you want to check the whisper log file in SE, to be sure that it's using the original file as input and not a temp wav. In some cases SE can use a temp wav, which is no bueno for "--ff_..." args, as those need the original file as input.
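As a rough sketch of the advice above, the Python snippet below trims a full console command down to the part that would go into SE's Advanced field. The function name advanced_field is hypothetical, and it assumes the layout used in this thread: executable first, input file second, no spaces in paths, and -l, -m and -o each followed by exactly one value.

```python
# Options (each followed by one value) that the post says not to add in SE.
SE_SUPPLIED = {"-l", "-m", "-o"}


def advanced_field(full_command):
    """Drop the executable, the input file, and -l/-m/-o with their values."""
    tokens = full_command.split()[2:]   # assumes: exe, input file, then options
    kept = []
    skip_next = False
    for tok in tokens:
        if skip_next:                   # this token is the value of a dropped option
            skip_next = False
            continue
        if tok in SE_SUPPLIED:
            skip_next = True
            continue
        kept.append(tok)
    return " ".join(kept)


cmd = ("faster-whisper-xxl.exe tarzan.mka -l en -m large-v2 "
       "--vad_method pyannote_v3 -o source --standard --max_gap 1 "
       "--ff_sync -hst 2 -ct float16 --realign")
print(advanced_field(cmd))
# prints: --vad_method pyannote_v3 --standard --max_gap 1 --ff_sync -hst 2 -ct float16 --realign
```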

Is it a one-off payment, or subscription ??

It's one-off, but I wouldn't mind a subscription too. :D

TR-9970X

9th January 2026, 09:03

Just don't add these to SE: tarzan.mka -l en -m large-v2 -o source.
And untick all postprocessing boxes there.

BTW, you want to check the whisper log file in SE, to be sure that it's using the original file as input and not a temp wav. In some cases SE can use temp wav, that's no bueno for "--ff_..." args, as those need original file input.

It's one-off, but I wouldn't mind a subscription too. :D

Curiosity got the better of me, so I've run a couple of transcripts using your suggested command line with the post-processing options unchecked, and both subs turned out pretty damn good. The sync seems very, very close.
There are a couple of words that aren't where they should be, but one thing I noticed is that it didn't pick up any quiet background audio (speech), only the main audio.

Are there any other commands that might help with that ??

I know you recommend large-v2, but should I give v3 a try (not the turbo)?

VoodooFX

9th January 2026, 10:41

...one thing I noticed, it didn't pick up on any sort of background quiet audio (speech), only the main audio.

Are there any other commands that might help with that ??

Maybe, dunno without the audio.

I know you recommend large v2, but should I give v3 a try (not the turbo).

Try it if you want. I wouldn't.

TR-9970X

9th January 2026, 12:00

Maybe, dunno without the audio.

Try it if you want. I wouldn't.

I will have another session tomorrow, and see how I go, probably won't get the v3 (but you never know).

I'll just have to send you the audio track, and the best .srt that I can produce.

Currently the audio is .flac, but that shouldn't be an issue, I can convert it to .ac3 if you think that might make a difference.

Cheers.

VoodooFX

9th January 2026, 15:39

I'll just have to send you the audio track, and the best .srt that I can produce.

Currently the audio is .flac, but that shouldn't be an issue, I can convert it to .ac3 if you think that might make a difference.

I don't need srt, just write the times where something is missing.
No need to touch audio.

TR-9970X

10th January 2026, 03:50

I don't need srt, just write the times where something is missing.
No need to touch audio.

Hello again,

OK, done more testing, now it's over to you to see if anything else can be done.

I have added the audio, a couple of .srt's (they're small for comparison), and the error msg that Waveform pops up...

And a read me.txt, so please read thru that, saves me adding it here.

https://www.mediafire.com/file/yahfak2vt9t0g6d/New+folder.7z/file

VoodooFX

10th January 2026, 12:41

https://www.mediafire.com/file/yahfak2vt9t0g6d/New+folder.7z/file

Those missing phrases/words were not identified as voice by the VAD, so there was no attempt to transcribe those areas.
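To illustrate the gating effect described above, here is a toy Python sketch. It is not how pyannote_v3 works (that is a neural model); a plain energy threshold is swapped in as a stand-in, and the function name, the per-frame energy values and the threshold are all made up for the example. The point is only that anything the VAD rejects never reaches the transcriber.

```python
def speech_regions(energies, threshold, frame_seconds=1.0):
    """Return (start, end) spans, in seconds, whose frame energy meets the threshold."""
    regions = []
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i                                   # a voiced span begins
        elif energy < threshold and start is not None:
            regions.append((start * frame_seconds, i * frame_seconds))
            start = None                                # the voiced span ends
    if start is not None:                               # span runs to the end
        regions.append((start * frame_seconds, len(energies) * frame_seconds))
    return regions


# Made-up energies: frame 3 stands for quiet background speech (0.1).
energies = [0.9, 0.8, 0.05, 0.1, 0.7, 0.02]
print(speech_regions(energies, threshold=0.5))
# prints: [(0.0, 2.0), (4.0, 5.0)]
```

Only the returned spans would be handed to the model, so the quiet speech in frame 3 is skipped rather than transcribed badly.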

Here is srt produced with Pro version: https://pastecode.io/s/nds33xaw

faster-whisper-xxl.exe boot.flac -l en -m large-v2 --vad_method pyannote_v3 -o source --standard --max_gap 1 -hst 2 -ct float16 --ff_vocal_extract mb-roformer --realign

BTW, that CPP srt is weird: 00:00:00-00:00:30 is not in the audio, and the timestamps are off by up to dozens of seconds.

TR-9970X

10th January 2026, 13:04

Those missing phrases/words were not identified as voice by vad, so there was no attempt to transcribe those areas.

That's interesting, I'm sure that other models did pick up some of that "missing" audio/text.

I will compare them tomorrow.

Here is srt produced with Pro version: https://pastecode.io/s/nds33xaw

faster-whisper-xxl.exe boot.flac -l en -m large-v2 --vad_method pyannote_v3 -o source --standard --max_gap 1 -hst 2 -ct float16 --ff_vocal_extract mb-roformer --realign

I should be able to use some of this command line in non Pro, there's only a little bit more than the previous.

BTW, that CPP srt is weird: 00:00:00-00:00:30 is not in the audio, and the timestamps are off by up to dozens of seconds.

Well, I'm glad that I sent them too :)

From the quick look I've had of the subs you sent, it looks like it grabbed a LOT of extra dialogue :)

BTW, did you bother to check if my files opened Waveform ??

VoodooFX

10th January 2026, 13:55

That's interesting, I'm sure that other models did pick up some of that "missing" audio/text.

Not "did pick up", as they don't try to pick up anything. They never check whether what they are trying to transcribe is actually speech. That's why you get those hallucinations where there is no speech at all.

I should be able to use some of this command line in non Pro

You won't be able to use that, as those are Pro-only features.

From the quick look I've had of the subs you sent, it looks like it grabbed a LOT of extra dialogue

Yes, because cpp is weird, instead of transcribing stuff, cpp outputs "(speaking in foreign language)" bs.

BTW, did you bother to check if my files opened Waveform ??

Yes, opened after I named your files properly.

TR-9970X

10th January 2026, 14:29

Not "did pick up", as they don't try to pick up anything. They never check whether what they are trying to transcribe is actually speech. That's why you get those hallucinations where there is no speech at all.

You won't be able to use that, as those are Pro-only features.

I used this: --vad_method pyannote_v3 -o source --standard --max_gap 1 --ff_sync -hst 2 -ct float16

and all I can see that's different is this: --ff_vocal_extract mb-roformer

Yes, because cpp is weird, instead of transcribing stuff, cpp outputs "(speaking in foreign language)" bs.

Yes, opened after I named your files properly.

OK, so what did you rename them to, so I can check if it's my different version of FFMPEG that is the problem. I'm guessin' just a really simple name, then.

So one question about Pro version, is it a standalone command line, or can it be used within SE ??

Thanks

VoodooFX

10th January 2026, 14:40

OK, so what did you rename them to, so I can check if it's my different version of FFMPEG that is the problem. I'm guessin' just a really simple name, then.

To the same name as the audio. BTW, I use an older SE version, 4.0.12.

So one question about Pro version, is it a standalone command line, or can it be used within SE ??

Yes to both.

markfilipak

15th January 2026, 00:27

Above this message there are a lot of pleas for help regarding audio sync and loss of sync and clipped audio. I'm sorry that I don't have time to respond but I hope what I write here will help.

ISOM containers like MP4 and MOV can contain edit lists. I believe that's also true of MKV but I don't use MKV. If you edit audio without recoding it, that will likely result in an edit list being generated and attached to the AV. If you then audition the AV without recoding the audio, and if there is an edit list, then the player/utility may or may not appear to have synchronized audio depending upon whether it honors edit lists and how well it honors edit lists. You can detect and view edit lists via FFprobe, but it's complicated.

SE's waveform rendering -- specifically, the rendered pictures of the timing -- seems to have some issues with edit lists. What I do is recode the audio in an MP4 so that there are no edit lists. The audio must be cleaned of edit lists. For example, don't just 'FFmpeg -ss...' or 'FFmpeg -to...' or 'FFmpeg -itsoffset...' or 'FFmpeg -itsscale:a...' without recoding the audio -- for the final AV and any temporary AV that you're going to use in SE.

If I'm going to be working on the audio extensively, I recode it to FLAC so that it's a separate .flac stream and then edit to my heart's content. If I need to see it in an AV, I make a temporary AV just to do the viewing. Then, in the final MP4, I recode the FLAC to whatever audio I want. By keeping the FLAC audio separate, it stays pristine and the final MP4 has no edit lists. All that takes planning, and it is tedious to make temporary AVs just to see some result I need to see, but it's worth it in the end.
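A minimal sketch of that workflow in Python, assuming the lossless intermediate is FLAC and that FFmpeg is on the PATH. The two function names and the file names are hypothetical, and the FFmpeg options are one plausible choice rather than the exact commands used here. The functions only build the argument lists; whether the resulting MP4 is really free of edit lists should be checked on your own files.

```python
def extract_flac_cmd(source, flac_out):
    """Decode the source's audio and re-encode it as a standalone FLAC file."""
    # -vn drops the video; -c:a flac re-encodes the audio instead of stream-copying it.
    return ["ffmpeg", "-i", source, "-vn", "-c:a", "flac", flac_out]


def final_mux_cmd(video_source, flac_in, output, audio_codec="aac"):
    """Copy the video stream and re-encode the edited FLAC into the final MP4."""
    return [
        "ffmpeg", "-i", video_source, "-i", flac_in,
        "-map", "0:v:0",        # video from the first input
        "-map", "1:a:0",        # audio from the FLAC file
        "-c:v", "copy",         # video is not re-encoded
        "-c:a", audio_codec,    # audio IS re-encoded (aac is an assumed default)
        output,
    ]


print(" ".join(extract_flac_cmd("input.mp4", "audio.flac")))
print(" ".join(final_mux_cmd("input.mp4", "audio.flac", "final.mp4")))
```

To actually run a step, pass the list to subprocess.run(cmd, check=True). The same final_mux_cmd call can produce the temporary AV for viewing; only the output name changes.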

All that I've written above is in addition to some SE waveform rendering issues that appear to be unrelated to edit lists. I'm still trying to figure it out.

I've written extensively about edit lists during the past month and my explorations are here in the SE forum. I promised to clean up my documentation and I reiterate that promise. But that has to wait until the time when I've figured it all out.

I hope my experiences help you. In the meantime, I'm positioning subtitles based on the audio that MPV plays, not on the positions of the waveforms that SE renders.

jay123210599

17th January 2026, 18:37