
View Full Version : Standalone Faster-Whisper-XXL - AI auto-transcription-translation



Yosho
26th April 2024, 15:46
No idea what you are missing there, you tell me what you are missing... only the input is needed to run it.

I can tell you what I'm missing - I don't see you posting your problem, if you have any.

I figured it out.
I now have subtitles. They are out of sync and need to be synced to the dialog.
Do you have something you regularly use for automatic subtitle syncing, or advice on what's best in your educated opinion to use for automatic subtitle syncing?

VoodooFX
27th April 2024, 09:59
I now have subtitles. They are out of sync and need to be synced to the dialog.

For me it produces mostly in sync results, can you share an audio sample where it's out of sync?


Do you have something you regularly use for automatic subtitle syncing, or advice on what's best in your educated opinion to use for automatic subtitle syncing?

There is no perfect automatic subtitle syncing.

VoodooFX
21st January 2025, 05:58
If someone doesn't have a fast CUDA GPU, you can use one for free in Colab.

EDIT: Look at the first post.

TR-9970X
22nd January 2026, 09:10
Thanks for directing me here:-

So I have been spending a LOT of time with the 6-episode, nearly 5-hour version of Das Boot.

Have downloaded so many variants, I'm getting lost & confused.

So to cut a very long story short, I got 2 different sets, as one was 1080p and the 720p had the good dubbed English audio, but each episode was minutes shorter, so I had to overcome that issue, and did. The audio overdub is pretty damn good :)

So once I trimmed all the episodes of the useless intro, outro & recaps, I then wanted to transcribe the dubbed English audio to get matching subtitles, as the original audio & English subs are crap.

I tried Whisper @ Google Colab, and that did a pretty good job on the majority, but there are a few really tricky bits that I want to get as correct as I can, and this one I've sent you is a little song, either in French or German, and I just can't quite get it.

I tried to set up WhisperX on my PC, but I got stuck at some error that I couldn't get any help on, so I gave up.

Using SE, the only model that came close to getting this clip was const-me large. ALL the others didn't find anything :(

I'd really like to be able to post my final compilation somewhere for anyone that wants a good "ENGLISH" copy of Das Boot, the massive near 5 hours of it.

Here's the file, I sent a video, so you can see what you're dealing with.

https://www.mediafire.com/file/s8kbobp1sv30p5v/Das+Boot+(1985)+S01E01+la+belle+song.7z/file

BTW, I figured out how to use waveforms, what a godsend that is :)

Cheers & thanks again.

VoodooFX
22nd January 2026, 10:33
https://www.mediafire.com/file/s8kbobp1sv30p5v/Das+Boot+(1985)+S01E01+la+belle+song.7z/file


The command used:

-m large-v2 --vad_method pyannote_v3 -o source --standard --max_gap 1 --realign --ff_vocal_extract mb-roformer --multilingual true --batched



Subs:

1
00:00:01,076 --> 00:00:01,701
Exactly.

2
00:00:03,608 --> 00:00:04,080
Me too.

3
00:00:04,705 --> 00:00:09,019
Avec les marins s'attendent, les ponts
cassant leur aveugle.

4
00:00:09,419 --> 00:00:14,500
Moi, je sais bien ce qui leur manque à
tous ces mondes et leurs généraux.

5
00:00:14,859 --> 00:00:21,917
Allez, je m'appelle la reine, la reine du
pays de la Rochelle.

6
00:00:22,120 --> 00:00:29,492
Côle bleu, sentinelle, m'appelle... M
'appelle la reine de la Rochelle.

7
00:00:32,447 --> 00:00:36,042
Come on now, he's too young to play games
with you, Monique, huh?




I'd really like to be able to post my final compilation somewhere for anyone that wants a good "ENGLISH" copy of Das Boot


Why? There are retail English subs for it.

TR-9970X
22nd January 2026, 10:56
Why? There are retail English subs for it.

Of course I'm guessin' that's with your Pro version ??

Can it do translations??

I let Google translate, and now I should be able to add that.

I have searched far & wide, and all the English subtitles are crap, the dubbed english transcript is pretty much spot on.

And the 6 part Das Boot is pretty rare, as well.

So you've all but convinced me that your Pro version is pretty damn good.

So, not to be a pain, but how do I get it, how much is it going to cost me, and how to integrate it in SE.

Oh, and thanks for doing that little job for me so quickly:)

VoodooFX
22nd January 2026, 11:18
Of course I'm guessin' that's with your Pro version ??

Can it do translations??

I let Google translate, and now I should be able to add that.

So, not to be a pain, but how do I get it, how much is it going to cost me, and how to integrate it in SE.

Yes.
Yes, it does translation to English.
Its translation is better than Google translate.
Look there: https://github.com/Purfview/whisper-standalone-win/discussions/456
Just copy it to the same folder where the regular version is in SE. (Delete the old files there, excluding the models)


I have searched far & wide, and all the English subtitles are crap, the dubbed english transcript is pretty much spot on.

The retail subs are for the original German audio, I guess you're comparing them to the dubbed audio, which is different from the original.

TR-9970X
22nd January 2026, 11:46
Yes.
Yes, it does translation to English.
Its translation is better than Google translate.
Look there: https://github.com/Purfview/whisper-standalone-win/discussions/456
Just copy it to the same folder where the regular version is in SE. (Delete the old files there, excluding the models)




The retail subs are for the original German audio, I guess you're comparing them to the dubbed audio, which is different from the original.

OK, so if your minimum "donation" is £50, that calculates to nearly AU$100...is that correct ??

From what I've seen, the english that comes with the video, isn't that good, and the subtitles don't match either.

The dubbed english was done by the actors that were in the movie, so it's pretty good.

VoodooFX
22nd January 2026, 11:52
OK, so if your minimum "donation" is £50, that calculates to nearly AU$100...is that correct ??
Yes, it should be around that.

TR-9970X
22nd January 2026, 11:57
Yes, it should be around that.

OK, well, I'm going to have to give that some serious thought.

Regards.

TR-9970X
25th January 2026, 05:39
Yes, it should be around that.

Hi Voodoo,

Well, I have been wasting my time trying to setup various Whisper versions, and even after following the YouTube instructions as best I can, the end result is pretty much the same, they don't work :(

And it's hard to get support when a lot of the clips are 12 months old, or older....

So I think that you have proven to me what your build can do, I should just "bite the bullet", and donate to you, to get the Pro version.

I'm pretty sure you will provide good support, if I have any issues, or need some experienced assistance.

So I just want to confirm, that the minimum "donation" to be eligible for Pro is £50, is that correct ??

Regards

VoodooFX
25th January 2026, 08:07
Hi Voodoo,

Well, I have been wasting my time trying to setup various Whisper versions, and even after following the YouTube instructions as best I can, the end result is pretty much the same, they don't work :(

Hi. That's why I made it, because the original Whisper and other implementations weren't good enough for me.



So I think that you have proven to me what your build can do, I should just "bite the bullet", and donate to you, to get the Pro version.

I'm pretty sure you will provide good support, if I have any issues, or need some experienced assistance.

So I just want to confirm, that the minimum "donation" to be eligible for Pro is £50, is that correct ??

The Pro version has some extra features, if you want them, it's £50 at the moment.

TR-9970X
25th January 2026, 09:17
Hi. That's why I made it, because the original Whisper and other implementations weren't good enough for me.




The Pro version has some extra features, if you want them, it's £50 at the moment.

sold !!!

VoodooFX
25th January 2026, 09:39
sold !!!
Thanks for the donation, enjoy the Pro version! :thanks:

TR-9970X
25th January 2026, 10:07
Thanks for the donation, enjoy the Pro version! :thanks:

OK, successfully downloaded & unpacked.

Turned out to be AU$103.21.

So you have already mentioned how to add this to SE,

Just copy it to the same folder where the regular version is in SE. (Delete the old files there, excluding the models)


Are there any other instructions?

Be aware, that I could end up asking quite a few "stupid" questions, until I get the hang of it, so sorry in advance.

VoodooFX
25th January 2026, 10:25
Are there any other instructions?

Be aware, that I could end up asking quite a few "stupid" questions, until I get the hang of it, so sorry in advance.

No.
There are no stupid questions, as it's a pretty technically sophisticated app. :)

TR-9970X
26th January 2026, 02:32
No.
There are no stupid questions, as it's a pretty technically sophisticated app. :)

OK, here are my first "questions". Since sending this I have figured out most of them :)

I've just run it for the first time running at your defaults, dragging the audio file to a shortcut on the Desktop, it downloaded the medium model, so now I know where they go, and what the directory looks like :)

How to copy the models I downloaded for your app, within SE ?? sorted

Hopefully I don't need to download them again, although that would ensure they were the correct ones. don't

To change the transcription commands. is that here :- confirm this, tho :)

:: Start processing
"%dp%faster-whisper-xxl.exe" %file_list% -pp -o source --batch_recursive --check_files --standard -f json srt -m medium

And just to confirm, adding this to SE so it can be run from there (which probably isn't necessary) (see attached screenshot)

https://imgur.com/eyLGzwt

Does it go in Users\Appdata\Roaming, or Program Files.? also sorted

That's all for now. :rolleyes:

PS:- It would nice to have SE display that it's now the Pro version.

EDIT:- OK, I have done several passes on a Das Boot audio track, and it's doing a pretty good job, but there might be some more advanced commands that might improve it even more.

It's mispronouncing a few words, but nothing a little manual editing won't fix.

Can a form of SDH subs be generated? For example, if I use const-me large, it does, to a degree.

VoodooFX
26th January 2026, 12:53
Can a form of SDH sub's be generated, for example if I use const-me large, it does, to a degree.

Forget that const-me, it's a subpar implementation and abandonware.
It can, but to get "SDH" you need to disable some quality settings/safety measures, basically you are asking for hallucinations.

Use --suppress_tokens="" or --suppress_tokens=None; the two should behave differently at the low level. [I don't remember the differences; an empty list [""] is not intended behaviour, but there were reports that it was useful for something]

Then you may want to disable the VAD audio preprocessing with --vad_filter=false, and don't enable any other audio preprocessing.

Then probably you want to enable -hst=2 to combat hallucination.
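
For anyone who would rather script this than edit the .bat by hand, here is a minimal Python sketch that assembles the flags suggested above into an argument list. The function name, the default model and the file names are made up for illustration; only the flags themselves come from this thread. The sketch only builds and prints the command, it does not run anything.

```python
def build_sdh_style_args(exe, audio, model="large-v2", suppress_tokens="None"):
    """Build an argument list from the flags quoted in this thread.

    `exe`, `audio` and the defaults are illustrative assumptions;
    nothing here is an official recommendation of the tool.
    """
    return [
        str(exe),
        str(audio),
        "-m", model,                              # model name, as used earlier in the thread
        "-f", "srt",                              # subtitle output format
        "-o", "source",                           # write next to the input file
        f"--suppress_tokens={suppress_tokens}",   # the post also mentions ="" as a variant
        "--vad_filter=false",                     # disable the VAD audio preprocessing
        "-hst=2",                                 # suggested above to combat hallucination
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed only so the result can be inspected.
    print(" ".join(build_sdh_style_args("faster-whisper-xxl.exe", "episode.ac3")))
```

To actually launch it, the list can be handed to subprocess.run(args). No vocal extraction or other preprocessing flag is added, in line with the advice above.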



Hopefully I don't need to download them again

You can set --model_dir to the path of a models folder if you don't want it to look in the default location.


And just to confirm, adding this to SE so it can be run from there (which probably isn't necessary) (see attached screenshot)

That site doesn't work for me. Anyway, about SE, ask at the SE thread.

TR-9970X
26th January 2026, 13:16
Forget that const-me, it's a subpar implementation and abandonware.
It can, but to get "SDH" you need to disable some quality settings/safety measures, basically you are asking for hallucinations.

Use, --suppress_tokens="" or --suppress_tokens=None, both should've different behavior at the low level. [I don't remember differences, empty list [""] is not intended behaviour, but there were reports that it was useful for something]

Then you may want to disable VAD audio preprocess, --vad_filter=false, and don't enable other audio preprocess.

Then probably you want to enable -hst=2 to combat hallucination.

Excellent, I will give that a try tomorrow



You can set --model_dir to the path of a models folder if you don't want it to look in the default location.



That site doesn't work for me. Anyway, about SE, ask at the SE thread.

I've been able to figure out where the models go, and to get Pro working within SE, was pretty easy once I checked out the folder & files within the Subtitle Edit default location.

I am watching Pt 5 of Das Boot, in which I transcribed the audio track with 5 different models, Medium, Large v1, v2 & v3, and each one has slightly different results, so I will be able to go thru and use the best or most correct subtitle for each line, tedious, but accurate.

I'm just being VERY particular with this movie/series.

So with the drag & drop onto the desktop shortcut, can the default command line be changed, like using the Advanced command line used in SE ??

Although I will probably prefer to use SE, it would be nice to customise the drag & drop process.

So now that I have Pro, I can run several models of the transcription, and get the subs spot on, and it's interesting to see how much the GPU is used during the process....

Still got a lot of testing to do, but so far, pretty happy :)

Cheers

VoodooFX
26th January 2026, 13:47
So with the drag & drop onto the desktop shortcut, can the default command line be changed, like using the Advanced command line used in SE ??
Although I will probably prefer to use SE, it would be nice to customise the drag & drop process.


Of course, the commands can be changed, added, or removed however you like.
Like I said before, be aware that on some files SE doesn't work as expected, the results can be worse then.

TR-9970X
27th January 2026, 05:30
Of course, the commands can be changed, added, or removed however you like.
Like I said before, be aware that on some files SE doesn't work as expected, the results can be worse then.

Doom9 has been down for most of today (my time)

Hi, so I've tried to get some SDH stuff working, but I don't think it's producing what I thought it should. The only thing I think I noticed was a lot of "speech marks" (these things ")

Here's the command I used, it's probably quite wrong :-

"%dp%faster-whisper-xxl.exe" %file_list% -pp -o source --batch_recursive --check_files --standard --vad_method pyannote_v3 -o source --standard --max_gap 1 -hst 2 -ct float16 --ff_vocal_extract mb-roformer --realign -f srt -m large-v2 --language en -suppress_tokens"" --vad_filter=false

And another question, is there a way to transcribe foreign language parts, into english, or the language of the part ?? (Would you need to know what the language was to start with?)

OR, if you transcribed the video to catch the english parts, then ran a different script to capture the foreign parts ??

Regards.
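
A long command like the one in this post is easy to get wrong when it is pasted together from several sources. As a rough aid, this Python sketch lists option names that occur more than once in a command line. It is a plain text check and assumes nothing about how the program itself treats repeated options; the function name is made up for illustration.

```python
import shlex
from collections import Counter


def repeated_options(command_line):
    """Return option names that appear more than once, in first-seen order.

    shlex is used only to split the text into tokens for inspection;
    it follows POSIX quoting rules, so do not reuse its output to launch
    a Windows command that contains backslashes.
    """
    tokens = shlex.split(command_line)
    # Keep tokens that look like options and drop any "=value" suffix.
    names = [t.split("=", 1)[0] for t in tokens if t.startswith("-")]
    counts = Counter(names)
    repeated = []
    for name in names:
        if counts[name] > 1 and name not in repeated:
            repeated.append(name)
    return repeated


if __name__ == "__main__":
    example = "tool.exe input.ac3 -o source --standard -m medium -o source --standard"
    print(repeated_options(example))
```

Run on the command posted above, this reports -o and --standard, which were both given twice.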

TR-9970X
27th January 2026, 08:43
Got an error whilst trying to transcribe a long ac3.

Audio filtering is in progress...
Estimating duration from bitrate, this may be inaccurate
Estimating duration from bitrate, this may be inaccurate
MB-RoFormer model running on CUDA: 1% | 31/2927 | 11:34<<18:00:48

Traceback (most recent call last):
File "__main__.py", line 212, in ffmpeg_audio
File "faster_whisper\roformer_infer.py", line 234, in RoFormer_separator
File "faster_whisper\roformer_infer.py", line 83, in demix_track
torch.AcceleratorError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Press any key to continue . . .

VoodooFX
27th January 2026, 08:52
I noticed where a lot of "speech marks" (these things ")

I don't know what that means.
Anyway, remove ff_vocal_extract, what "SDH" do you expect when all non-voice is removed from the audio. :)

And I think maybe you need to use one of these too: -prompt None or --reprompt false or --prompt_reset_on_no_end 0 [the former will disable the latter args]
EDIT: Or maybe there will be no harm in not disabling prompt_reset_on_no_end.


And another question, is there a way to transcribe foreign language parts, into english, or the language of the part ?? (Would you need to know what the language was to start with?)

Try --task translate
The model is not meant to transcribe multi-language audio, those measures are workarounds.

VoodooFX
27th January 2026, 09:01
long

Let's not operate with abstractions. What does "long" mean? What does "speech marks" mean?

I guess you ran out of RAM/VRAM. Try --roformer_vram 6 or another value.

TR-9970X
27th January 2026, 09:06
I don't know what that means.
Anyway, remove ff_vocal_extract, what "SDH" do you expect when all non-voice is removed from the audio. :)
And I think, you need to use one from these too: -prompt None or --reprompt false [the former will disable the latter]

OK, will give it a try :)

I don't know what to expect, I was just asking.


Try --task translate
The model is not meant to transcribe multi-language audio, those measures are workarounds.

Lots of question's, sorry.

TR-9970X
27th January 2026, 09:10
Let's not operate with abstractions. What "long" means? What "speech marks" means?

I guess you run out of RAM/VRAM. Try --roformer_vram 6 or other value.

By long, nearly 5 hours, 1.3 GB !!!!

I had never heard of "speech marks" until I heard a guy on YouTube saying it.

I used to always call them inverted comma's.... " "

I'm running a 4080 Super, surely there won't be a VRAM issue.

VoodooFX
27th January 2026, 09:41
By long, nearly 5 hours

I'm running a 4080 Super, surely there won't be a VRAM issue.

For example, when running mb-roformer on CPU with 3:26:00 long audio, it eats ~28GB RAM.




I had never heard of "speech marks" until I heard a guy on YouTube saying it.

I used to always call them inverted comma's.... " "

I don't know what the " " abstraction means either. Please spare me from any abstractions. :)
If you have an issue, please post showing the exact problem, preferably with an audio example to reproduce it.

TR-9970X
27th January 2026, 11:37
For example, when running mb-roformer on CPU with 3:26:00 long audio, it eats ~28GB RAM.

OK, so can you please provide the command needed to use the CPU instead of the GPU ?



I don't know what " " abstraction means too. Please spare me from any abstractions. :)
If you have an issue, please post showing the exact problem, preferably with an audio example to reproduce it.

I don't think I've heard this word "abstraction" before.

What I was trying to explain, that when I used one of your commands, it appeared to produce a lot of extra " " throughout the subtitles.

Here's a Google explanation:-

Speech marks, also known as quotation marks or inverted commas, are punctuation marks used to indicate direct speech or quotations in writing.

VoodooFX
27th January 2026, 13:18
I don't think I've heard this word "abstraction" before.

Aren't you a native English speaker?
https://en.wikipedia.org/wiki/Abstraction


What I was trying to explain, that when I used one of your commands, it appeared to produce a lot of extra " " throughout the subtitles.
Here's a Google explanation:


I know what quotes are, still, I've no idea what the issue is. Or there is no issue?

OK, so can you please provide the command needed to use the CPU instead of the GPU ?

--voc_device cpu
Why do you need it when you have a CUDA GPU?

TR-9970X
28th January 2026, 00:46
Aren't you a native English speaker?
https://en.wikipedia.org/wiki/Abstraction

Yes, Australian, but in all my years in the workplace, and my small circle of friends, I can safely say that "abstraction" has NEVER been in any conversation or discussion.



I know what quotes are, still, I've no idea what the issue is. Or there is no issue?

Again, I thought I noticed a lot of extra "'s, when I used one of your commands to attempt SDH, that's all.

--voc_device cpu
Why you need it when you have CUDA GPU?

You just told me that a 3:26:00 long audio can use 28 GB of RAM, so that would require a CPU, as the 4080 "only" has 16 GB

Anyway, another day of discovery & learning.

VoodooFX
29th January 2026, 12:05
You just told me that a 3:26:00 long audio can use 28Gb of RAM, so that would require a CPU, as the 4080 "only" has 16Gb

That was just for an example, use the command that I wrote.

TR-9970X
31st January 2026, 00:34
Hi Voodoo, I would like to propose a challenge.

I have only been using Pro for about a week, and it's proving to be very good, of course depending what model is used, and I think on a straight forward english transcribe, it's probably the best available, atm.

I have asked for your help with several issues, and you've provided appropriate suggestions, but as a true newbie, I get confused, as it is very complex, and some of your suggested commands have not yielded what I expected :(, but that's alright.

The SDH commands didn't really provide any SDH (well the type I'm familiar with) results, but again, that's not important.

As you know I have been transcribing the dubbed english audio for the 5 hours Das Boot series, and it's VERY time consuming, and I am still having to do a huge amount of manual editing & adding of lines of text.

What I have done is run the audio track thru Pro, with the medium, large v1, v2 & v3 models, and then having the results open in notepad, and also having the video open in SE, and going thru using Waveform to compare between ALL the models.

This has turned out to be a very accurate way of getting all possible subtitles, even if I have to listen to certain lines, over, & over, & over again to get it correct.
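
The side-by-side comparison described above can be partly automated. A rough Python sketch, assuming standard .srt files from each model run: parse every file into cues and, for each cue of a reference run, collect the overlapping text from the other runs only where it differs. The model labels and sample lines are placeholders, and matching cues by time overlap is an assumption of this sketch, not something the transcription tool does.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(stamp):
    """Convert an SRT timestamp such as 00:00:04,705 to seconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Parse SRT text into a list of Cue objects (no styling or position support)."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            if "-->" in ln:
                start, end = ln.split("-->")
                cues.append(Cue(_seconds(start), _seconds(end), " ".join(lines[i + 1:])))
                break
    return cues


def overlapping(cue, others):
    """Return the cues in `others` whose time range overlaps `cue`."""
    return [o for o in others if o.start < cue.end and cue.start < o.end]


def compare(reference, candidates):
    """For each reference cue, collect differing text from each named run."""
    rows = []
    for cue in reference:
        variants = {}
        for name, cues in candidates.items():
            text = " ".join(o.text for o in overlapping(cue, cues))
            if text != cue.text:
                variants[name] = text
        if variants:
            rows.append((cue, variants))
    return rows


if __name__ == "__main__":
    run_a = "1\n00:00:01,076 --> 00:00:01,701\nExactly.\n\n2\n00:00:03,608 --> 00:00:04,080\nMe too.\n"
    run_b = "1\n00:00:01,000 --> 00:00:01,700\nExactly.\n\n2\n00:00:03,600 --> 00:00:04,100\nMe, too.\n"
    for cue, variants in compare(parse_srt(run_a), {"other-run": parse_srt(run_b)}):
        print(f"{cue.start:.3f}-{cue.end:.3f} {cue.text!r} vs {variants}")
```

Lines where every run agrees are skipped, so only the disputed cues need to be checked against the waveform by ear.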

There is a reasonable amount of dialogue that is loud & clear enough to be picked up during the transcription, but isn't, but then there's some that is, but there are also sections that are very fast & confusing & noisy, that is a big problem.

Would increasing the volume of the audio track help ?

So anyway, enough of that, what I would really appreciate is, if I sent you the audio for one full part of the series, (a difficult part) along with the .srt that I have edited, could you do a transcription, and do as many commands as you know, to get it as close (or better) to my .srt ??, and maybe an SDH test.

If you're not interested, I'll understand, but you ARE the creator of this, and this would be a very good "test"/challenge for Pro.

Regards.

VoodooFX
31st January 2026, 09:09
The SDH commands didn't really provide and SDH (well the type I'm familiar with) results, but again, that's not important.

Post the problem with all info to reproduce.


There is a reasonable amount of dialogue that is loud & clear enough to be picked up during the transcription, but isn't,

Post the problem with all info to reproduce.


Would increasing the volume of the audio track help ?

Most likely it would not.



So anyway, enough of that, what I would really appreciate is, if I sent you the audio for one full part of the series, (a difficult part) along with the .srt that I have edited, could you do a transcription, and do as many commands as you know, to get it as close (or better) to my .srt ??

I'm not interested. Of course, AI-generated subtitles can't compare to human produced ones.

TR-9970X
31st January 2026, 09:48
Post the problem with all info to reproduce.



Post the problem with all info to reproduce.



Most likely that it would not.




I'm not interested. Of course, AI-generated subtitles can't compare to human produced ones.

OK, fair enough, so to cover all queries, I would like to send you an .ac3 of part 6 of Das Boot, that will provide some dialogue that isn't recognised, also an opportunity to try an SDH transcription, and to attempt to extract as much dialogue as possible.

If you could do that for me, and provide some commands/scripts you used, that will be a HUGE help to my ongoing use of Pro, which IS going to get a LOT of work.

I spent nearly 7 hours on this part, today, and I've still got a few lines I can't figure out.

I've been pretty much using the command from post #56, with the additions from post #68 & #73.

You may still have the original files I uploaded:-

https://www.mediafire.com/file/yahfak2vt9t0g6d/New_folder.7z/file
this is the 1st one I uploaded, contains part 1 as a .flac, and a few other small files.

https://www.mediafire.com/file/l1hv4ia2kp4rffv/Das_6_track2_%255Beng%255D_DELAY_0ms.ac3/file
this one is part 6 as a .ac3.

Thanks.

VoodooFX
31st January 2026, 10:58
Post the command to reproduce the issue on the first file.
Why would I need that bare ac3 file?

TR-9970X
31st January 2026, 11:08
Post the command to reproduce the issue on the first file.
Why would I need that bare ac3 file?

I doubt that I have that command anymore, as it didn't work for me, but I said that I was using a combo of the command(s) that are here :-

I've been pretty much using the command from post #56, with the additions from post #68 & #73.

The .ac3 was the file I wanted you to "play" with, as you'd already downloaded the other files, last week.

Jamaika
31st January 2026, 11:39
As an amateur, I don't know where to download the latest Whisler with CUDA. What version of CUDA is it? Should I care?
cublas64_13.dll cudart64_13.dll
I don't know why ffmpeg doesn't have CUDA? Is it complicated? Does it have a lot of bugs? It's definitely being modified constantly. Strangely, it's not Whisler's GitHub that's being modified, but Llama, and then every month there's a mirror with a patch list.
The much-derided OpenCL, Vulkan, and many other systems are also heavily modified.
https://github.com/ggml-org/llama.cpp/tree/master/ggml/src
Is OpenCL recommended for smartphones? Who knows?
Where can I download the latest multilingual .bin translation files for the latest versions?
There is another question that has been puzzling me for years, don't use the "shit" GCC UCRT because it doesn't have CUDA.

VoodooFX
31st January 2026, 15:13
I doubt that I have that command anymore, as it didn't work for me, but I said that I was using a combo of the command(s) that are here :-

I've been pretty much using the command from post #56, with the additions from post #68 & #73.

I looked at the included txt; it's the same thing you asked before, and it was already answered. Why did you send it again?


The .ac3 was the file I wanted you to "play" with, as you'd already downloaded the other files, last week.

I don't want to "play" anything.

TR-9970X
31st January 2026, 15:25
I looked at the txt included, it's same as you asked before. And was answered already. Why you sent it again?



I don't want to "play" anything.

I would like you to transcribe the .ac3 file, to extract as much dialogue as possible, (and SDH if possible) using the commands you sent me that are on the posts on your thread, that I quoted before.

VoodooFX
31st January 2026, 16:47
I would like you to transcribe the .ac3 file

Sorry, I'm not interested.

TR-9970X
1st February 2026, 01:09
Sorry, I'm not interested.

Not surprised.

You created a very complex transcription app, that seems to be way better than anything currently available, and so easy to use.

However, having to pay good money for this, and get NO instructions, NO examples, and as it's turned out, very piss poor after sales service.

You've spent just as much time "helping", as you have questioning my English !!

All I was after was a command that might help in extracting as much text as possible, that I could use for future projects.

And all I get is:- "Sorry, I'm not interested".

So may this be a warning for current & future users of this fine app.

VERY disappointed.

TR-9970X
1st February 2026, 01:13
As an amateur, I don't know where to download the latest Whisler with CUDA. What version of CUDA is it? Should I care?
cublas64_13.dll cudart64_13.dll
I don't know why ffmpeg doesn't have CUDA? Is it complicated? Does it have a lot of bugs? It's definitely being modified constantly. Strangely, it's not Whisler's GitHub that's being modified, but Llama, and then every month there's a mirror with a patch list.
The much-derided OpenCL, Vulkan, and many other systems are also heavily modified.
https://github.com/ggml-org/llama.cpp/tree/master/ggml/src
Is OpenCL recommended for smartphones? Who knows?
Where can I download the latest multilingual .bin translation files for the latest versions?
There is another question that has been puzzling me for years, don't use the "shit" GCC UCRT because it doesn't have CUDA.

WTF are you on about, none of this makes much sense!!!

I think you're in the wrong place!!!

And get your info correct, what's Whisler ??

VoodooFX
1st February 2026, 07:19
All I was after was a command that might help in extracting as much text as possible, that I could use for future projects.


No, you were asking me to create the subtitles for you. I don't offer such services.
There is no such magic command.

You've spent just as much time "helping", as you have questioning my English !!

That's a wrong assumption. Failing to formulate an issue isn't an "English" problem, it's a logical one.
I can even understand Nania's "English", which is the most "encrypted" English I've encountered in my life. [Now he is using AI tools for the posts] :D

get NO instructions, NO examples, and as it's turned out, very piss poor after sales service.

Wrong assumption again, I don't offer any services.
The GitHub repo is full of examples and instructions. Actually, all your questions were already answered there.

VoodooFX
1st February 2026, 07:30
As an amateur, I don't know where to download the latest Whisler with CUDA. What version of CUDA is it? Should I care?

Those Python repos are not meant for the amateur end users.
At my repo you can find a download which is ready to run.

Strangely, it's not Whisler's GitHub that's being modified

Yes, there is not much activity on the OpenAI Whisper repo. I had to insist for months to get my PR fixing a critical bug merged...

StainlessS
1st February 2026, 07:48
No, you was asking to create the subtitles for you. I don't offer such services.
Good for you.

I once (long ago) had a cry for help from a user of my software, after talking to him on the phone, I travelled down to Guildford,
(some tens of miles south of London) and I phoned him back, "I'm outside of the station" he said, "so am I", I said, but no-one in sight.
Turned out that my destination should have been "Ilford", some miles north of London.
Back on the train and went to N.London, and within 10 seconds of him showing me his problem, it became clear he was doing something
totally unexpected and very silly. {I dont recall what it was but just really daft action by him}.
So after many hours of travel, problem sorted in 10 seconds.

You just cant really go out of your way to help to such a degree, it dont make sense.
(I later got a company to do disk duplication and sales and such, much better for me as I am just too damn nice for my own good).
Dont ever make the same mistakes as me {stay mean, keep em keen}. :)

EDIT: The train fares cost quite a bit more than the cost of the software. {but the real cost was my time}

TR-9970X
1st February 2026, 10:27
Well, you got REALLY sucked in with that then...

All I wanted was a good command for a reference point, as I have only had the software for a week, and it just got out of hand, so "stay mean, keep 'em keen" won't work, he's lost a "customer".

StainlessS
1st February 2026, 10:34
won't work, he's lost a "customer".
I doubt he cares, he's the one doing you a favour.

This is the only command (.BAT) file that I use,

DropAudioOnME.bat

Whisper-Faster\whisper.exe --model_dir ".\_models" --language en --model "large-v2" %*


EDIT: Large v3 is out, but I aint gotten around to using it.
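
For anyone who would rather script this than keep a .BAT around, here is a rough Python sketch of the same call. It uses only the three options from the command above (--model_dir, --language, --model); the executable path and model folder are copied from that command and are assumptions about where things sit on your machine, and the function names are made up for illustration. The input files are appended at the end, which is what %* does in the batch file.

```python
import subprocess

# Assumed locations, copied from the batch command above; adjust to your install.
WHISPER_EXE = r"Whisper-Faster\whisper.exe"
MODEL_DIR = r".\_models"


def build_command(audio_files, model="large-v2", language="en"):
    """Build the argument list the batch file would run for the given files."""
    return [
        WHISPER_EXE,
        "--model_dir", MODEL_DIR,
        "--language", language,
        "--model", model,
        *audio_files,  # same role as %* in the .BAT: every dropped file
    ]


def transcribe(audio_files, model="large-v2", language="en"):
    """Run the executable on the given files; raises if it exits with an error."""
    subprocess.run(build_command(audio_files, model, language), check=True)
```

Calling transcribe(["episode1.wav"]) would run the same command as dropping episode1.wav on the batch file. Passing a different model name is an assumption that your build accepts that name.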

TR-9970X
1st February 2026, 10:43
Thanks.

That's a pretty basic command, but I'll give it a go.

The more commands I collect, the better it will be for me :)

I have been using medium, large v1, v2 & v3, and they all come up with different results, so you can pick & choose whichever lines are closest to the audio.
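
A rough sketch of how that picking and choosing could be narrowed down, assuming each model's output was saved as a separate .srt file. It pulls the text out of each cue and uses Python's difflib to list only the places where two transcripts disagree, so only those lines need checking against the audio. The function names are hypothetical, and it compares text only, not timings.

```python
from difflib import SequenceMatcher


def cue_texts(srt_text):
    """Return the text of each cue in an SRT document, in order."""
    texts = []
    # Normalise Windows line endings so cues split cleanly on blank lines.
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.strip().splitlines()
        # An SRT cue is: index line, timing line, then one or more text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            texts.append(" ".join(lines[2:]))
    return texts


def disagreements(texts_a, texts_b):
    """Return (lines_from_a, lines_from_b) pairs where the transcripts differ."""
    matcher = SequenceMatcher(a=texts_a, b=texts_b, autojunk=False)
    return [
        (texts_a[i1:i2], texts_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Feed it the contents of two .srt files (for example one from medium and one from large-v2) and it returns just the cues that differ, side by side.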

VoodooFX
1st February 2026, 11:49
I once (long ago) had a cry for help...
So after many hours of travel, problem sorted in 10 seconds.

You just cant really go out of your way to help to such a degree, it dont make sense.

I've worked in sales and marketing, I could write an academic paper on human craziness. :D
Once a company sent me on a field trip to fix an issue. They didn't offer such support services, but the equipment sold was expensive. It took seconds to show where to press the button...

Fun story: I sold my used laptop on eBay to a lady. After 6 months, she contacted me claiming she had caught a virus and demanded a refund. After I explained that it was not my problem, I was bombarded with various threats: the police, the low and high courts, you name it. I just blocked her.
Two years later, I got a desperate message from eBay support saying that the same lady was bombarding them. They offered me her contact details and asked if I could deal with her. My response was short: "I don't give a flying fuck", and I asked them not to contact me anymore. :D

Well, sometimes I go out of my way and offer remote desktop help. And sometimes people compensate me for the time wasted.


EDIT:
And don't get me started on the crazy emails I get from the GitHub projects alone. :D
Usually from various religious organizations/cults, with crazy offers, demands, threats.

Got dozens of messages from this guy (he's on the lower end of the spectrum): https://www.youtube.com/watch?v=N8JwbmFY_zE

VoodooFX
1st February 2026, 12:03
All I wanted was a good command for a reference point

That's not true, you asked me to "play" with some file, then asked me to produce the subtitles for you.
What makes even less sense is that you already have multiple commands for reference...