
View Full Version : I want to ask about whisper ai


cns00
15th February 2025, 19:27
I have some questions about whisper ai.

I am following https://www.youtube.com/watch?v=ABFqbY_rmEk to install whisper ai. I have Windows 11 on my laptop.

I did the 5 steps in the video. Assume that the audio file is test.wav. I ran:
whisper test.wav

Because this was the first time I ran the command, it started downloading large-v3-turbo.pt, which is 1.5 GB. The problem is that my internet connection is very unreliable and the download keeps failing. It reached 18%, then failed with this error message:
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.

I tried again and it reached 34% before failing. I tried once more and it reached 55% before failing. I even tested it through a VPN, with no luck. On three other attempts the download simply got stuck (at around 34%, for example) and stopped making progress.

I saw https://github.com/openai/whisper/discussions/1027. They said that the issue was fixed if you deleted:
"C:\Users\.cache\torch\whisperx-vad-segmentation.bin"
"C:\Users\USER-NAME.cache\torch\whisperx-vad-segmentation.bak"

I don't have those files. What do I do? How do I download large-v3-turbo.pt?
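
For reference, the check behind that error message can be reproduced by hand: a file that was cut off partway through a download will not hash to the published SHA256 value. The following is a minimal sketch, not Whisper's own code. It assumes the partially downloaded file sits in the default cache folder (`.cache\whisper` under the user's home folder) and that the expected digest is known; the path and digest in the commented usage lines are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete_download(path, expected_sha256):
    """True when the file exists and its digest equals the expected value."""
    path = Path(path)
    return path.is_file() and sha256_of_file(path) == expected_sha256.lower()


# Hypothetical usage; the location and the digest are placeholders:
# model_path = Path.home() / ".cache" / "whisper" / "large-v3-turbo.pt"
# if model_path.exists() and not is_complete_download(model_path, "<expected sha256>"):
#     model_path.unlink()  # remove the broken file so the next run starts clean
```

If the check fails, deleting the broken file before retrying avoids reusing a corrupt copy.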

I have subtitle files for the non-English parts of my movies. The lines appear before or after a person starts speaking and disappear before or after the person stops speaking, and the delays are noticeable.

I want to make the lines show exactly when a person starts speaking and they go away exactly when a person stops speaking. Can whisper do that?

Assume that I have an English movie that has French dialogue and I have a subtitle file for the French part. ChatGPT said that I can use whisper to generate a subtitle file for the movie, and that the timing in that generated file can then be used to fix the timing of my subtitle file. The command it gave is:
python sync_subtitles.py my_movie.mp4 my_subtitles.srt

https://drive.google.com/file/d/1ShP5MmmOrr1CCuHPkPHhZgRKOVBzWyMt/view?usp=sharing is sync_subtitles.py that ChatGPT made.

After that it said this:
https://imgur.com/a/nCcZBm6

I said yes and it made for me a modified file which is modified_sync_subtitles.py and it is https://drive.google.com/file/d/1bvLnCeG9WUfXOydFCQWiUamj_-aS7sA_/view?usp=sharing. The modified script does this:
https://imgur.com/a/k0ql3OM

Is the code in both scripts correct? Which script should I use?

VoodooFX
15th February 2025, 21:37
Why did you PM me the same post (twice)? Don't PM me; this is a forum, not Facebook.

I doubt that anyone will be interested in your chats with AI.
While the Whisper models (weights) are good, the vanilla Whisper client software is nothing to write home about. It's better to use other, more advanced projects.
For example, Faster-Whisper-XXL (https://github.com/Purfview/whisper-standalone-win).

To differentiate languages you can try [experimental, results - random luck]:
"--multilingual true"

Or this [less "random luck", so differentiation expected to be much better, but transcription quality expected to be a bit worse]:
"--batched --unmerged --multilingual true"

cns00
16th February 2025, 00:29
For example Faster-Whisper-XXL (https://github.com/Purfview/whisper-standalone-win).

To differentiate languages you can try [experimental, results - random luck]:
"--multilingual true"

Or this [less "random luck", so differentiation expected to be much better, but transcription quality expected to be a bit worse]:
"--batched --unmerged --multilingual true"

Does Faster Whisper need to download large-v3-turbo.pt?

Assume that my movie file is abc.mp4 and the subtitle file is abc.srt. How do I use Faster Whisper to fix the timing of abc.srt?

VoodooFX
16th February 2025, 00:39
Does Faster Whisper need to download large-v3-turbo.pt?

Not exactly that file, but yes, it would download the model files, or you can get them manually. The files for the turbo model are here: https://huggingface.co/Purfview/faster-whisper-large-v3-turbo/tree/main
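
On an unreliable connection, a manual download can also be scripted so that each retry continues where the previous one stopped instead of starting from zero. This is a minimal sketch using only the Python standard library. It assumes the server honours HTTP Range requests and that files are reachable through the `resolve/main` direct-download URL form; `resolve_url` and `download_with_resume` are hypothetical helper names, and `model.bin` is simply the file name mentioned elsewhere in this thread.

```python
import os
import urllib.request


def resolve_url(repo_id, filename, revision="main"):
    """Build the direct-download URL for one file in a Hugging Face repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_with_resume(url, dest, chunk_size=1024 * 1024):
    """Download url to dest, continuing from a leftover '.part' file if one exists."""
    part = dest + ".part"
    offset = os.path.getsize(part) if os.path.exists(part) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server for only the bytes that are still missing.
        request.add_header("Range", f"bytes={offset}-")
    with urllib.request.urlopen(request, timeout=60) as response:
        # 206 means the range was honoured; anything else restarts from zero.
        resumed = bool(offset) and response.status == 206
        length = response.headers.get("Content-Length")
        expected = None
        if length is not None:
            expected = int(length) + (offset if resumed else 0)
        with open(part, "ab" if resumed else "wb") as handle:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                handle.write(chunk)
    if expected is not None and os.path.getsize(part) != expected:
        # Keep the partial file so the next call can continue from it.
        raise IOError("connection dropped early; run again to resume")
    os.replace(part, dest)


# Hypothetical usage (needs network access), retrying until the file is complete:
# url = resolve_url("Purfview/faster-whisper-large-v3-turbo", "model.bin")
# for attempt in range(20):
#     try:
#         download_with_resume(url, "model.bin")
#         break
#     except OSError as error:
#         print("attempt", attempt + 1, "failed:", error)
```

The other files listed on that page would be fetched the same way, one call per file name.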



Assume that my movie file is abc.mp4 and the subtitle file is abc.srt. How do I use Faster Whisper to fix the timing of abc.srt?

No idea what you mean by that.
First, try to get that srt with it, maybe you wouldn't need to fix anything.

cns00
16th February 2025, 06:36
Not exactly that file, but yes, it would download the model files, or you can get them manually. The files for the turbo model are here: https://huggingface.co/Purfview/faster-whisper-large-v3-turbo/tree/main

How do I download from here? The turbo model file is .pt and there are no .pt files in that page.

No idea what you mean by that.
First, try to get that srt with it, maybe you wouldn't need to fix anything.

I have an English movie, abc.mp4, that has non-English parts in it. I downloaded abc.srt, which has the subtitles for the non-English parts, and that subtitle file has bad timing. If I give abc.mp4 to Faster Whisper, it will generate an srt file. How do I use that generated file to fix the timing in abc.srt? Do I have to use ffsubsync to do that?

Should Faster Whisper generate a subtitle file that has both the English and the non-English parts, or one that has only the non-English parts?

VoodooFX
16th February 2025, 10:42
How do I download from here? The turbo model file is .pt and there are no .pt files in that page.

By pressing a download button [at the file size]. This is a different program; it doesn't use "pt" files.

Do I have to use ffsubsync to do that?

You need to do that manually, for example with Notepad, or Subtitle Edit.
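
If editing every line by hand is too tedious, the same adjustment could in principle be scripted. The sketch below is one possible approach under stated assumptions, and it is not the script ChatGPT produced: it assumes both files are ordinary SRT, and it copies the start and end times of the generated cue that overlaps each downloaded cue the most, leaving cues without a clear match untouched. All function names are hypothetical.

```python
import re
from dataclasses import dataclass

TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
TIMING = re.compile(TIME + r"\s*-->\s*" + TIME)


@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def to_ms(hours, minutes, seconds, millis):
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content):
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                text = "\n".join(lines[index + 1:])
                cues.append(Cue(to_ms(*parts[:4]), to_ms(*parts[4:]), text))
                break
    return cues


def overlap(a, b):
    """Length in milliseconds of the time span shared by two cues (0 if none)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def retime(target, reference, min_overlap=200):
    """Give each target cue the times of the reference cue it overlaps the most."""
    result = []
    for cue in target:
        best = max(reference, key=lambda ref: overlap(cue, ref), default=None)
        if best is not None and overlap(cue, best) >= min_overlap:
            result.append(Cue(best.start, best.end, cue.text))
        else:
            result.append(cue)  # no confident match: keep the original timing
    return result


def format_ms(value):
    hours, value = divmod(value, 3_600_000)
    minutes, value = divmod(value, 60_000)
    seconds, millis = divmod(value, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    blocks = [
        f"{number}\n{format_ms(cue.start)} --> {format_ms(cue.end)}\n{cue.text}"
        for number, cue in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

The matching only works when the existing timing is off by less than a cue's length. Reading the files with `encoding="utf-8-sig"` avoids a stray byte-order mark on the first cue number.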

Should Faster Whisper generate a subtitle file that has the English parts and the non English parts or it should generate a subtitle file that has only the non English parts?

All would be in one file. But you can transcribe only those parts you need, with "--clip_timestamps" option.
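
To transcribe only the foreign-language passages, the time ranges of the existing subtitle lines could be turned into a value for that option. This sketch assumes the option accepts a comma-separated "start,end,start,end,..." list in seconds, which is how the faster-whisper library describes its `clip_timestamps` parameter; the program's `--help` output should confirm the exact format. The helper names, padding and gap values are hypothetical.

```python
def merge_ranges(ranges, padding=1.0, gap=2.0):
    """Pad each (start, end) range in seconds and merge ranges closer than gap seconds."""
    merged = []
    for start, end in sorted(ranges):
        start, end = max(0.0, start - padding), end + padding
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous range: extend it instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def clip_timestamps_argument(ranges):
    """Flatten ranges into a 'start,end,start,end' string with two decimals."""
    return ",".join(f"{value:.2f}" for pair in ranges for value in pair)


# Example: three subtitle lines, the first two close together.
ranges = merge_ranges([(10.0, 12.0), (13.0, 15.0), (60.0, 62.5)])
print(clip_timestamps_argument(ranges))
```

The padding leaves some room around each line because the existing timing is known to be imprecise.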

cns00
16th February 2025, 11:08
By pressing a download button [at the file size]. This is a different program; it doesn't use "pt" files.

The .bat file initially says:
faster-whisper-xxl.exe %file_list% -pp -o source --batch_recursive --check_files --standard -f json srt -m medium

I changed medium to large, then dropped a test wav file on the .bat file, and it downloaded a 3 GB file. I got an error message. I don't know if the download completed successfully.

https://i.imgur.com/nFhxnAm.jpeg is what I see in faster-whisper-large-v3. The file is 3 GB. https://i.imgur.com/yJIfGtM.jpeg is what I see at the link that you gave me; it says that the name is faster-whisper-large-v3-turbo and the file is 1.5 GB.

How come one file is 3 GB and the other is 1.5 GB when they are both under large?

VoodooFX
16th February 2025, 11:58
"large" is not "turbo". What error?

cns00
16th February 2025, 13:15
"large" is not "turbo". What error?

I think that the error happened because some files were not downloaded properly.

Anyway, I tested the 1.5 GB model.bin file. I got a message which says that large version 3 may not be as good as large version 2, which I can get from https://huggingface.co/openai/whisper-large-v2/tree/main.

I tested the srt file that I got from large version 3. The timing wasn't perfect and some lines were delayed; for example, a line disappeared only after the person had already stopped speaking. Does large version 2 do a better job with the timing?

VoodooFX
16th February 2025, 14:00
I think that the error happened because the model.bin file was not downloaded properly.

I don't want to know what you think.
The question was "What error?", not "What you think about the error?"

I got a message which says that large version 3 may not be as good as large version 2. From where do I download large version 2?

It will be downloaded the same way as v3; use "-m large-v2".
It can also be downloaded manually from here: https://huggingface.co/Systran

cns00
16th February 2025, 17:21
I don't want to know what you think.
The question was "What error?", not "What you think about the error?"
I didn't read what the error message said. It just asked me to press Enter and no srt file was generated, so I assume that there was a problem downloading the files.



It will be downloaded the same way as v3; use "-m large-v2".
It can also be downloaded manually from here: https://huggingface.co/Systran

Ok. Please answer the question that I asked you before. Does large v2 do a better job with the timing, without the delays?

VoodooFX
16th February 2025, 20:58
Does large v2 do a better job with the timing, without the delays?

Maybe.

cns00
18th February 2025, 06:04
Maybe.

It's still not that accurate with large v2. Is there something else that I can try?

VoodooFX
18th February 2025, 12:40
It's still not that accurate with large v2. Is there something else that I can try?

How inaccurate is "not that accurate", and why do you need it to be very accurate?

cns00
18th February 2025, 12:50
How inaccurate is "not that accurate", and why do you need it to be very accurate?

I need each line to appear when a person starts to speak and disappear when that person stops speaking. Some of the generated lines are like that, but other lines have bad timing: either the person starts speaking before the line appears, or the person stops speaking and the line stays on screen for a while before it disappears. Why is the AI making mistakes with the timing of those lines?
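
One way to put numbers on "not that accurate" is to correct a handful of lines by hand and compare them with the generated times. This is a minimal sketch, assuming both lists hold (start, end) pairs in milliseconds for the same lines in the same order; the function names and sample values are illustrative only.

```python
from statistics import mean


def timing_errors(generated, checked):
    """Per-line (start_error, end_error) in ms; positive means the generated time is late."""
    return [(g[0] - c[0], g[1] - c[1]) for g, c in zip(generated, checked)]


def summarize(errors):
    """Mean and worst absolute error for line starts and line ends."""
    starts = [abs(start) for start, _ in errors]
    ends = [abs(end) for _, end in errors]
    return {
        "mean_start": mean(starts),
        "max_start": max(starts),
        "mean_end": mean(ends),
        "max_end": max(ends),
    }


# Illustrative values, not measurements from this thread.
generated = [(1400, 3600), (5000, 7000)]
checked = [(1000, 3000), (5000, 6800)]
print(summarize(timing_errors(generated, checked)))
```

Reporting the mean and the worst case separately shows whether the whole file is shifted or only a few lines are off.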