Doom9's Forum > General > Subtitles

Old 17th May 2023, 23:53   #21  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by Emulgator View Post
Removed cudart64_110.dll 6.14.11.11080 from system32
Why did you remove it? Maybe it's present in another folder?


Quote:
Originally Posted by Emulgator View Post
generating of subtitles ended there at #1757,
so it did not reach the movie's end at 01:43:40
Subtitle numbers don't mean anything; several lines can end up in one subtitle, so you need to check the actual subtitles.
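
One way to check the actual subtitles instead of the counter is to read the end time of the last cue in the .srt file. A minimal sketch in Python, assuming the usual SRT timing line (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); the function name and the two-cue sample are made up for illustration:

```python
import re

# Matches an SRT timing line such as "01:43:35,000 --> 01:43:40,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def last_cue_end_seconds(srt_text):
    """Return the latest cue end time in seconds, or None if no cue is found."""
    latest = None
    for line in srt_text.splitlines():
        m = TIMING.search(line)
        if not m:
            continue
        # Groups 5..8 hold the end time of the cue.
        h, mi, s, ms = (int(x) for x in m.groups()[4:])
        end = h * 3600 + mi * 60 + s + ms / 1000
        if latest is None or end > latest:
            latest = end
    return latest

# Hypothetical sample: the second cue carries two spoken lines,
# so the cue count is lower than the number of lines.
sample = """1
00:00:01,000 --> 00:00:03,500
First line.

2
01:43:35,000 --> 01:43:40,000
Second line,
third line in the same cue.
"""

movie_end = 1 * 3600 + 43 * 60 + 40  # 01:43:40, the movie's end quoted above
print(last_cue_end_seconds(sample), movie_end)
```

If the last cue ends close to the movie's end, the run covered the whole file, whatever the subtitle number says.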


Quote:
Originally Posted by Emulgator View Post
At the moment r103 looks better.
Did you check the transcription differences?

By default r117 runs int8 quantization on GPU, while r103 runs float16. [On CPU both use int8.]
I changed that because a few users reported that int8 is more accurate than float16 and that the speed is the same.

Quantization can be set by "--compute_type".

EDIT:
@Emulgator
Could you test these 2 short files with "medium": https://we.tl/t-S5gnRvMuQB , using "--compute_type=float16" and "--compute_type=int8" on CUDA, and share the 4 srt files?
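
The four requested runs (2 files x 2 compute types) could be scripted roughly like this. A sketch only: it uses just the flags and file names that appear in this thread (`--language`, `--model`, `--compute_type`, `test_original.aac`, `test_ffmpeg6.wav`); the helper names are hypothetical, and the paths assume the files sit next to `whisper.exe`.

```python
import itertools
import subprocess

# File names as they appear in this thread; adjust the paths to your setup.
FILES = ["test_original.aac", "test_ffmpeg6.wav"]
COMPUTE_TYPES = ["float16", "int8"]

def build_commands(exe="whisper.exe", model="medium", language="en"):
    """Return one argument list per (file, compute type) combination."""
    commands = []
    for audio, ctype in itertools.product(FILES, COMPUTE_TYPES):
        commands.append([
            exe, audio,
            "--language", language,
            "--model", model,
            f"--compute_type={ctype}",
        ])
    return commands

def run_all(commands):
    """Run each command in turn; raises on the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Print the command lines for review; call run_all(...) to execute them.
for cmd in build_commands():
    print(subprocess.list2cmdline(cmd))
```

Printing first and running second makes it easy to confirm that only the compute type changes between two runs on the same file.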

Last edited by VoodooFX; 18th May 2023 at 05:03.
Old 18th May 2023, 11:41   #22  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Quote:
Originally Posted by Emulgator View Post
Removed cudart64_110.dll 6.14.11.11080 from system32
Quote:
Why did you remove it? Maybe it's present in another folder?
To make sure that only the .dlls I deliberately introduce are loaded, and no additional dependency that is not accounted for.

Ah, well, 8bit vs. 16bit can make all the difference !
I feed it a 32-bit float .wav decoded from the DVD .ac3 track and use only the large multilingual model.

Comparing 3 runs on the English soundtrack of a 1961 musical movie (sped up to 25 fps),
with fast talking in Cockney and other slang, interleaved with songs,
using WinMerge's three-way comparison:

r103 GPU from 04.05.2023
r103 GPU from 17.05.2023
r117 GPU from 17.05.2023

All versions have their uses and guess differently.
Which is good for me: a wealth to choose from.
Now it is up to the subtitler (me) just to merge the best parts.

Will have to talk Nikse into having 3 editor tabs in SubtitleEdit, muhahaha ;-)

Downloaded your sample, testing soon.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."

Last edited by Emulgator; 18th May 2023 at 12:35.
Old 18th May 2023, 14:15   #23  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Code:
C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>whisper.exe "C:\_PROG\! Subtitle Tools\! Testfile VoodooFX 2023 05 18\test_original.aac" --language en --model "large" --compute_type=float16

Standalone Faster-Whisper r117 running on: CUDA

Estimating duration from bitrate, this may be inaccurate
2023-05-18 15:05:00.1132781 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.

[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:03.760]  No.
[00:03.760 --> 00:08.890]  Thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
[00:12.890 --> 00:17.890]  There's a boat tour of the Trakla Island formations this afternoon.
[00:17.890 --> 00:22.890]  I was thinking we could go on that and make reservations in town for dinner.
[00:23.890 --> 00:25.890]  You could try the Chinese place.
[00:25.890 --> 00:28.890]  I don't think I'd survive another dinner in town.
[00:29.890 --> 00:31.890]  Even the idea that...
[00:31.890 --> 00:33.890]  Does anyone think it's a real town?
[00:36.890 --> 00:38.890]  Why would they have a Chinese place?
[00:43.280 --> 00:46.280]  Is it okay if I go? I'll meet you on the beach.
[00:47.280 --> 00:49.280]  Yeah, sure.
[01:22.270 --> 01:24.270]  Someone's making a statement.
[01:24.270 --> 01:26.270]  One of the locals, I guess.
[01:29.270 --> 01:31.270]  What do you think he's trying to say?
[01:31.270 --> 01:35.270]  He's saying that he wants to put a long knife right through her.
[01:35.270 --> 01:40.270]  And after you die, he'll hang your body at the airport to scare off the other tourists.
[01:42.270 --> 01:44.270]  Seems a bit extreme.
[01:46.270 --> 01:49.270]  The Latokans are a melodramatic people.
[01:54.740 --> 01:56.740]  I loved your book.
[01:58.740 --> 01:59.740]  Sorry?
[02:00.740 --> 02:03.740]  You're James Foster. I loved your book.
[02:06.740 --> 02:09.740]  Sorry, is that good? I don't mean to put you in the spot.
[02:09.740 --> 02:11.740]  No, thank you.
[02:11.740 --> 02:13.740]  It's just, um...
[02:13.740 --> 02:15.740]  Not a lot of people read my book.
[02:15.740 --> 02:17.740]  I'm Gabby Bauer.
[02:17.740 --> 02:19.740]  I'm James Foster.
[02:21.740 --> 02:22.740]  Alvin!
[02:25.520 --> 02:27.520]  This is James Foster.
[02:27.520 --> 02:29.520]  Hi, nice to meet you. Albon Bauer.
[02:29.520 --> 02:30.520]  Pleasure.
[02:30.520 --> 02:32.520]  He wrote your book that I love, The Variable Sheath.
[02:32.520 --> 02:34.520]  Oh, yeah, I remember.
[02:34.520 --> 02:36.520]  I thought it was brilliant.
[02:36.520 --> 02:37.520]  Yes.
[02:37.520 --> 02:41.520]  James, do you think I could convince you to join us for dinner this evening?
[02:42.520 --> 02:46.520]  I've been seeing you around the resort for a few days now and I would love to get to know you.
[02:46.520 --> 02:49.520]  We have a reservation tonight at Yang's.
[03:00.450 --> 03:02.450]  Yeah, it was a good...
[03:04.450 --> 03:06.450]  ...learning experience.
[03:06.450 --> 03:07.450]  All right.
[03:07.450 --> 03:10.450]  Is there anything else I can get you?
[03:10.450 --> 03:12.450]  Um, that's all I think.
[03:12.450 --> 03:14.450]  All right, everyone, please have a great meal.
[03:14.450 --> 03:15.450]  Thank you.
[03:15.450 --> 03:19.450]  And let me know any time if I can make your experience even more enjoyable.
[03:22.450 --> 03:24.450]  He's an interesting guy.
[03:24.450 --> 03:25.450]  Yes.
[03:25.450 --> 03:29.450]  This resort is labelled in the resort guide as a multicultural dining experience.
[03:30.450 --> 03:32.450]  Well, it certainly is an experience.
[03:33.450 --> 03:36.450]  So, Albon, what is it you do for a living?
[03:36.450 --> 03:39.450]  Oh, architecture. But I'm mostly retired.
[03:39.450 --> 03:42.450]  Now I run a journal out of Los Angeles called Glass Pane.
[03:42.450 --> 03:43.450]  You're French?
[03:43.450 --> 03:46.450]  Oh, no. Swiss first, from Geneva.
[03:46.450 --> 03:48.450]  Then Paris, then LA.
[03:49.450 --> 03:52.450]  I'm from London first. Then Paris.
[03:52.450 --> 03:53.450]  We met there.
[03:53.450 --> 03:54.450]  That's how we met.
[03:54.450 --> 03:57.450]  But I couldn't get work there, so I made Albon move with me.
[03:58.450 --> 04:00.450]  And what do you do?
[04:00.450 --> 04:03.450]  Well, I'm an actress, of course.
[04:03.450 --> 04:05.450]  Oh, really? She's great.
[04:06.450 --> 04:07.450]  For commercials.
[04:07.450 --> 04:09.450]  I have a contract with an LA company.
[04:09.450 --> 04:11.450]  They've been grooming me.
[04:11.450 --> 04:13.450]  I specialize in failing naturally.
[04:14.450 --> 04:17.450]  What does that mean? Failing naturally?
[04:18.450 --> 04:22.450]  Finding a natural-seeming way to fail at any given task.
[04:22.450 --> 04:24.450]  In each of the commercials that I'm in,
[04:24.450 --> 04:27.450]  I'm the one who simply can't go on without the product.
[04:27.450 --> 04:29.450]  It's ridiculous for me not to have the product.
[04:30.450 --> 04:31.450]  Okay.
[04:31.450 --> 04:32.450]  Show them.
[04:32.450 --> 04:33.450]  No.
[04:33.450 --> 04:34.450]  No, you should.
[04:34.450 --> 04:35.450]  Yeah.
[04:35.450 --> 04:36.450]  Please.
[04:36.450 --> 04:37.450]  Do you want to see?
[04:37.450 --> 04:38.450]  I want to see.
[04:38.450 --> 04:39.450]  Here.
[04:42.450 --> 04:43.450]  She's amazing.
[04:56.660 --> 05:02.450]  I just...
[05:04.450 --> 05:05.450]  I...

Standalone Faster-Whisper operation finished in: 25 seconds

C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>pause
Press any key to continue . . .
Old 18th May 2023, 14:16   #24  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Code:
C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>whisper.exe "C:\_PROG\! Subtitle Tools\! Testfile VoodooFX 2023 05 18\test_original.aac" --language en --model "large" --compute_type=int8

Standalone Faster-Whisper r117 running on: CUDA

Estimating duration from bitrate, this may be inaccurate
2023-05-18 15:08:01.9382416 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.

[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:08.890]  No, thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
[00:12.890 --> 00:17.890]  There's a boat tour of the Trakla Island formations this afternoon.
[00:17.890 --> 00:22.890]  I was thinking we could go on that and make reservations in town for dinner.
[00:23.890 --> 00:25.890]  You could try the Chinese place.
[00:25.890 --> 00:28.890]  I don't think I'd survive another dinner in town.
[00:29.890 --> 00:31.890]  Even the idea that...
[00:31.890 --> 00:33.890]  Does anyone think it's a real town?
[00:35.890 --> 00:38.890]  Why would they have a Chinese place?
[00:43.280 --> 00:46.280]  Is it okay if I go? I'll meet you on the beach.
[00:47.280 --> 00:49.280]  Yeah, sure.
[01:22.270 --> 01:24.270]  Someone's making a statement.
[01:24.270 --> 01:27.270]  One of the locals, I guess.
[01:29.270 --> 01:31.270]  What do you think he's trying to say?
[01:31.270 --> 01:35.270]  He's saying that he wants to put a long knife right through her.
[01:35.270 --> 01:40.270]  And after you die, he'll hang your body at the airport to scare off the other tourists.
[01:42.270 --> 01:44.270]  Seems a bit extreme.
[01:46.270 --> 01:49.270]  The Latokans are a melodramatic people.
[01:54.740 --> 01:56.740]  I loved your book.
[01:58.740 --> 01:59.740]  Sorry?
[02:00.740 --> 02:03.740]  You're James Foster. I loved your book.
[02:06.740 --> 02:09.740]  Sorry, is that good? I don't mean to put you in the spot.
[02:09.740 --> 02:11.740]  No, thank you.
[02:11.740 --> 02:13.740]  It's just, um...
[02:13.740 --> 02:15.740]  Not a lot of people read my book.
[02:15.740 --> 02:17.740]  I'm Gabby Bauer.
[02:17.740 --> 02:19.740]  I'm James Foster.
[02:21.740 --> 02:22.740]  Alvin!
[02:25.520 --> 02:27.520]  This is James Foster.
[02:27.520 --> 02:29.520]  Hi, nice to meet you. Albon Bauer.
[02:29.520 --> 02:30.520]  Pleasure.
[02:30.520 --> 02:32.520]  He wrote your book that I love, The Variable Sheath.
[02:32.520 --> 02:34.520]  Oh, yeah, I remember.
[02:34.520 --> 02:36.520]  I thought it was brilliant.
[02:36.520 --> 02:37.520]  Yes.
[02:37.520 --> 02:41.520]  James, do you think I could convince you to join us for dinner this evening?
[02:42.520 --> 02:46.520]  I've been seeing you around the resort for a few days now and I would love to get to know you.
[02:46.520 --> 02:49.520]  We have a reservation tonight at Yang's.
[03:00.450 --> 03:02.450]  Yeah, it was a good...
[03:04.450 --> 03:06.450]  ...learning experience.
[03:06.450 --> 03:07.450]  All right.
[03:07.450 --> 03:10.450]  Is there anything else I can get you?
[03:10.450 --> 03:12.450]  Um, that's all I think.
[03:12.450 --> 03:14.450]  All right, everyone, please have a great meal.
[03:14.450 --> 03:15.450]  Thank you.
[03:15.450 --> 03:19.450]  And let me know any time if I can make your experience even more enjoyable.
[03:22.450 --> 03:24.450]  He's an interesting guy.
[03:24.450 --> 03:25.450]  Yes.
[03:25.450 --> 03:29.450]  This resort is labelled in the resort guide as a multicultural dining experience.
[03:30.450 --> 03:32.450]  Well, it certainly is an experience.
[03:33.450 --> 03:36.450]  So, Albon, what is it you do for a living?
[03:36.450 --> 03:39.450]  Oh, architecture. But I'm mostly retired.
[03:39.450 --> 03:42.450]  Now I run a journal out of Los Angeles called Glass Pane.
[03:42.450 --> 03:43.450]  You're French?
[03:43.450 --> 03:46.450]  Oh, no. Swiss first, from Geneva.
[03:46.450 --> 03:48.450]  Then Paris, then L.A.
[03:49.450 --> 03:52.450]  I'm from London first. Then Paris.
[03:52.450 --> 03:53.450]  We met there.
[03:53.450 --> 03:54.450]  That's how we met.
[03:54.450 --> 03:57.450]  But I couldn't get work there, so I made Albon move with me.
[03:58.450 --> 04:00.450]  And what do you do?
[04:00.450 --> 04:03.450]  Well, I'm an actress, of course.
[04:03.450 --> 04:05.450]  Oh, really? She's great.
[04:06.450 --> 04:07.450]  For commercials.
[04:07.450 --> 04:09.450]  I have a contract with an L.A. company.
[04:09.450 --> 04:11.450]  They've been grooming me.
[04:11.450 --> 04:13.450]  I specialize in failing naturally.
[04:14.450 --> 04:17.450]  What does that mean? Failing naturally?
[04:18.450 --> 04:22.450]  Finding a natural-seeming way to fail at any given task.
[04:22.450 --> 04:24.450]  In each of the commercials that I'm in,
[04:24.450 --> 04:27.450]  I'm the one who simply can't go on without the product.
[04:27.450 --> 04:29.450]  It's ridiculous for me not to have the product.
[04:30.450 --> 04:31.450]  Okay.
[04:31.450 --> 04:32.450]  Show them.
[04:32.450 --> 04:33.450]  No.
[04:33.450 --> 04:34.450]  No, you should.
[04:34.450 --> 04:35.450]  Yeah.
[04:35.450 --> 04:36.450]  Please.
[04:36.450 --> 04:37.450]  Do you want to see?
[04:37.450 --> 04:38.450]  I want to see.
[04:38.450 --> 04:39.450]  Here.
[04:42.450 --> 04:43.450]  She's amazing.
[04:56.660 --> 05:02.450]  I just...
[05:04.450 --> 05:05.450]  I...

Standalone Faster-Whisper operation finished in: 38 seconds

C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>pause
Press any key to continue . . .
Old 18th May 2023, 14:16   #25  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Code:
C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>whisper.exe "C:\_PROG\! Subtitle Tools\! Testfile VoodooFX 2023 05 18\test_ffmpeg6.wav" --language en --model "large" --compute_type=float16

Standalone Faster-Whisper r117 running on: CUDA

2023-05-18 15:09:37.3092070 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.

[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:03.760]  No.
[00:03.760 --> 00:08.890]  Thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
[00:12.890 --> 00:17.890]  There's a boat tour of the Trakla Island formations this afternoon.
[00:17.890 --> 00:22.890]  I was thinking we could go on that and make reservations in town for dinner.
[00:23.890 --> 00:25.890]  You could try the Chinese place.
[00:25.890 --> 00:28.890]  I don't think I'd survive another dinner in town.
[00:29.890 --> 00:31.890]  Even the idea that...
[00:31.890 --> 00:33.890]  Does anyone think it's a real town?
[00:36.890 --> 00:38.890]  Why would they have a Chinese place?
[00:43.280 --> 00:46.280]  Is it okay if I go? I'll meet you on the beach.
[00:47.280 --> 00:49.280]  Yeah, sure.
[01:22.270 --> 01:24.270]  Someone's making a statement.
[01:24.270 --> 01:26.270]  One of the locals, I guess.
[01:29.270 --> 01:31.270]  What do you think he's trying to say?
[01:31.270 --> 01:35.270]  He's saying that he wants to put a long knife right through her.
[01:35.270 --> 01:40.270]  And after you die, he'll hang your body at the airport to scare off the other tourists.
[01:42.270 --> 01:44.270]  Seems a bit extreme.
[01:46.270 --> 01:49.270]  The Latokans are a melodramatic people.
[01:54.740 --> 01:56.740]  I loved your book.
[01:58.740 --> 01:59.740]  Sorry?
[02:00.740 --> 02:03.740]  You're James Foster. I loved your book.
[02:06.740 --> 02:09.740]  Sorry, is that good? I don't mean to put you in the spot.
[02:09.740 --> 02:11.740]  No, thank you.
[02:11.740 --> 02:13.740]  It's just, um...
[02:13.740 --> 02:15.740]  Not a lot of people read my book.
[02:15.740 --> 02:17.740]  I'm Gabby Bauer.
[02:17.740 --> 02:19.740]  I'm James Foster.
[02:21.740 --> 02:22.740]  Alvin!
[02:25.520 --> 02:27.520]  This is James Foster.
[02:27.520 --> 02:29.520]  Hi, nice to meet you. Albon Bauer.
[02:29.520 --> 02:30.520]  Pleasure.
[02:30.520 --> 02:32.520]  He wrote your book that I love, The Variable Sheath.
[02:32.520 --> 02:34.520]  Oh, yeah, I remember.
[02:34.520 --> 02:36.520]  I thought it was brilliant.
[02:36.520 --> 02:37.520]  Yes.
[02:37.520 --> 02:41.520]  James, do you think I could convince you to join us for dinner this evening?
[02:42.520 --> 02:46.520]  I've been seeing you around the resort for a few days now and I would love to get to know you.
[02:46.520 --> 02:49.520]  We have a reservation tonight at Yang's.
[03:00.450 --> 03:02.450]  Yeah, it was a good...
[03:04.450 --> 03:06.450]  ...learning experience.
[03:06.450 --> 03:07.450]  All right.
[03:07.450 --> 03:10.450]  Is there anything else I can get you?
[03:10.450 --> 03:12.450]  Um, that's all I think.
[03:12.450 --> 03:14.450]  All right, everyone, please have a great meal.
[03:14.450 --> 03:15.450]  Thank you.
[03:15.450 --> 03:19.450]  And let me know any time if I can make your experience even more enjoyable.
[03:22.450 --> 03:24.450]  He's an interesting guy.
[03:24.450 --> 03:25.450]  Yes.
[03:25.450 --> 03:29.450]  This resort is labelled in the resort guide as a multicultural dining experience.
[03:30.450 --> 03:32.450]  Well, it certainly is an experience.
[03:33.450 --> 03:36.450]  So, Albon, what is it you do for a living?
[03:36.450 --> 03:39.450]  Oh, architecture. But I'm mostly retired.
[03:39.450 --> 03:42.450]  Now I run a journal out of Los Angeles called Glass Pane.
[03:42.450 --> 03:43.450]  You're French?
[03:43.450 --> 03:46.450]  Oh, no. Swiss first, from Geneva.
[03:46.450 --> 03:48.450]  Then Paris, then LA.
[03:49.450 --> 03:52.450]  I'm from London first. Then Paris.
[03:52.450 --> 03:53.450]  We met there.
[03:53.450 --> 03:54.450]  That's how we met.
[03:54.450 --> 03:57.450]  But I couldn't get work there, so I made Albon move with me.
[03:58.450 --> 04:00.450]  And what do you do?
[04:00.450 --> 04:03.450]  Well, I'm an actress, of course.
[04:03.450 --> 04:05.450]  Oh, really? She's great.
[04:06.450 --> 04:07.450]  For commercials.
[04:07.450 --> 04:09.450]  I have a contract with an LA company.
[04:09.450 --> 04:11.450]  They've been grooming me.
[04:11.450 --> 04:13.450]  I specialize in failing naturally.
[04:14.450 --> 04:17.450]  What does that mean? Failing naturally?
[04:18.450 --> 04:22.450]  Finding a natural-seeming way to fail at any given task.
[04:22.450 --> 04:24.450]  In each of the commercials that I'm in,
[04:24.450 --> 04:27.450]  I'm the one who simply can't go on without the product.
[04:27.450 --> 04:29.450]  It's ridiculous for me not to have the product.
[04:30.450 --> 04:31.450]  Okay.
[04:31.450 --> 04:32.450]  Show them.
[04:32.450 --> 04:33.450]  No.
[04:33.450 --> 04:34.450]  No, you should.
[04:34.450 --> 04:35.450]  Yeah.
[04:35.450 --> 04:36.450]  Please.
[04:36.450 --> 04:37.450]  Do you want to see?
[04:37.450 --> 04:38.450]  I want to see.
[04:38.450 --> 04:39.450]  Here.
[04:42.450 --> 04:43.450]  She's amazing.
[04:56.660 --> 05:02.450]  I just...
[05:04.450 --> 05:05.450]  I...

Standalone Faster-Whisper operation finished in: 21 seconds

C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>pause
Press any key to continue . . .
Old 18th May 2023, 14:17   #26  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Code:
C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>whisper.exe "C:\_PROG\! Subtitle Tools\! Testfile VoodooFX 2023 05 18\test_ffmpeg6.wav" --language en --model "large" --compute_type=int8

Standalone Faster-Whisper r117 running on: CUDA

2023-05-18 15:11:27.5628509 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.

[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:08.890]  No, thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
[00:12.890 --> 00:17.890]  There's a boat tour of the Trakla Island formations this afternoon.
[00:17.890 --> 00:22.890]  I was thinking we could go on that and make reservations in town for dinner.
[00:23.890 --> 00:25.890]  You could try the Chinese place.
[00:25.890 --> 00:28.890]  I don't think I'd survive another dinner in town.
[00:29.890 --> 00:31.890]  Even the idea that...
[00:31.890 --> 00:33.890]  Does anyone think it's a real town?
[00:36.890 --> 00:38.890]  Why would they have a Chinese place?
[00:43.280 --> 00:46.280]  Is it okay if I go? I'll meet you on the beach.
[00:47.280 --> 00:49.280]  Yeah, sure.
[01:22.270 --> 01:24.270]  Someone's making a statement.
[01:24.270 --> 01:26.270]  One of the locals, I guess.
[01:29.270 --> 01:31.270]  What do you think he's trying to say?
[01:31.270 --> 01:35.270]  He's saying that he wants to put a long knife right through her.
[01:35.270 --> 01:40.270]  And after you die, he'll hang your body at the airport to scare off the other tourists.
[01:42.270 --> 01:44.270]  Seems a bit extreme.
[01:46.270 --> 01:49.270]  The Latokans are a melodramatic people.
[01:54.740 --> 01:56.740]  I loved your book.
[01:58.740 --> 01:59.740]  Sorry?
[02:00.740 --> 02:03.740]  You're James Foster. I loved your book.
[02:06.740 --> 02:09.740]  Sorry, is that good? I don't mean to put you in the spot.
[02:09.740 --> 02:11.740]  No, thank you.
[02:11.740 --> 02:13.740]  It's just, um...
[02:13.740 --> 02:15.740]  Not a lot of people read my book.
[02:15.740 --> 02:17.740]  I'm Gabby Bauer.
[02:17.740 --> 02:19.740]  I'm James Foster.
[02:21.740 --> 02:22.740]  Alvin!
[02:25.520 --> 02:27.520]  This is James Foster.
[02:27.520 --> 02:29.520]  Hi, nice to meet you. Albon Bauer.
[02:29.520 --> 02:30.520]  Pleasure.
[02:30.520 --> 02:32.520]  He wrote your book that I love, The Variable Sheath.
[02:32.520 --> 02:34.520]  Oh, yeah, I remember.
[02:34.520 --> 02:36.520]  I thought it was brilliant.
[02:36.520 --> 02:37.520]  Yes.
[02:37.520 --> 02:42.520]  James, do you think I could convince you to join us for dinner this evening?
[02:42.520 --> 02:46.520]  I've been seeing you around the resort for a few days now and I would love to get to know you.
[02:46.520 --> 02:49.520]  We have a reservation tonight at Yang's.
[03:00.450 --> 03:02.450]  Yeah, it was a good...
[03:04.450 --> 03:06.450]  learning experience.
[03:06.450 --> 03:07.450]  All right.
[03:07.450 --> 03:10.450]  Is there anything else I can get you?
[03:10.450 --> 03:12.450]  Um, that's all I think.
[03:12.450 --> 03:14.450]  All right, everyone, please have a great meal.
[03:14.450 --> 03:15.450]  Thank you.
[03:15.450 --> 03:20.450]  And let me know any time if I can make your experience even more enjoyable.
[03:22.450 --> 03:24.450]  He's an interesting guy.
[03:24.450 --> 03:25.450]  Yes.
[03:25.450 --> 03:30.450]  This resort is labeled in the resort guide as a multicultural dining experience.
[03:30.450 --> 03:33.450]  Well, it certainly is an experience.
[03:33.450 --> 03:36.450]  So, Albon, what is it you do for a living?
[03:36.450 --> 03:39.450]  Oh, architecture. But I'm mostly retired.
[03:39.450 --> 03:42.450]  Now I run a journal out of Los Angeles called Glass Pane.
[03:42.450 --> 03:43.450]  You're French?
[03:43.450 --> 03:46.450]  Oh, no. Swiss first, from Geneva.
[03:46.450 --> 03:48.450]  Then Paris, then LA.
[03:49.450 --> 03:52.450]  I'm from London first. Then Paris.
[03:52.450 --> 03:53.450]  We met there.
[03:53.450 --> 03:54.450]  That's how we met.
[03:54.450 --> 03:57.450]  But I couldn't get work there, so I made Albon move with me.
[03:58.450 --> 04:00.450]  And what do you do?
[04:00.450 --> 04:03.450]  Well, I'm an actress, of course.
[04:03.450 --> 04:05.450]  Oh, really? She's great.
[04:06.450 --> 04:07.450]  For commercials.
[04:07.450 --> 04:09.450]  I have a contract with an LA company.
[04:09.450 --> 04:11.450]  They've been grooming me.
[04:11.450 --> 04:13.450]  I specialize in failing naturally.
[04:14.450 --> 04:17.450]  What does that mean? Failing naturally?
[04:17.450 --> 04:22.450]  Finding a natural-seeming way to fail at any given task.
[04:22.450 --> 04:24.450]  In each of the commercials that I'm in,
[04:24.450 --> 04:27.450]  I'm the one who simply can't go on without the product.
[04:27.450 --> 04:29.450]  It's ridiculous for me not to have the product.
[04:30.450 --> 04:31.450]  Okay.
[04:31.450 --> 04:32.450]  Show them.
[04:32.450 --> 04:33.450]  No.
[04:33.450 --> 04:34.450]  No, you should.
[04:34.450 --> 04:35.450]  Yeah.
[04:35.450 --> 04:36.450]  Please.
[04:36.450 --> 04:37.450]  Do you want to see?
[04:37.450 --> 04:38.450]  I want to see.
[04:38.450 --> 04:39.450]  Here.
[04:42.450 --> 04:43.450]  She's amazing.
[04:56.660 --> 05:02.450]  I just...
[05:04.450 --> 05:05.450]  I...

Standalone Faster-Whisper operation finished in: 38 seconds

C:\_PROG\! Subtitle Tools\Whisper-Faster_Win.x64_2023.05.13.b117_GPU>pause
Press any key to continue . . .
Old 18th May 2023, 14:22   #27  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Tiny differences; float16 was quicker than int8, and float16 on ffmpeg6.wav was the quickest.
The .aac was slower.
As I thought: Precision pays off ?
After all, it is about cross-comparing spectrograms,
and tiny losses in density differences can lead to costlier, more exhaustive searches.
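
To see such differences without the timestamps getting in the way, the console output of two runs can be reduced to the spoken text and diffed. A small sketch in Python using the standard `difflib` module; the sample lines are the first cues of the float16 and int8 runs on `test_original.aac` posted above, and the helper name `cue_texts` is made up:

```python
import difflib
import re

# Console line format used in the runs above: "[MM:SS.mmm --> MM:SS.mmm]  text"
CUE = re.compile(r"^\[(\d+:\d+\.\d+) --> (\d+:\d+\.\d+)\]\s+(.*)$")

def cue_texts(console_output):
    """Extract only the spoken text of each cue, dropping the timestamps."""
    texts = []
    for line in console_output.splitlines():
        m = CUE.match(line.strip())
        if m:
            texts.append(m.group(3))
    return texts

float16_run = """\
[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:03.760]  No.
[00:03.760 --> 00:08.890]  Thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
"""

int8_run = """\
[00:00.760 --> 00:02.760]  Feeling inspired yet?
[00:02.760 --> 00:08.890]  No, thank you.
[00:08.890 --> 00:10.890]  I thought you said you were hungry.
"""

diff = list(difflib.unified_diff(
    cue_texts(float16_run), cue_texts(int8_run),
    fromfile="float16", tofile="int8", lineterm="",
))
print("\n".join(diff))
```

Comparing text only keeps a pure timing shift (such as 00:36.890 vs. 00:35.890) from showing up as a wording difference; timings would need a separate comparison.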

All of these were run on the large model; only this one was available on that system for now,
and I did not want to let that machine onto the internet again after it was bluescreened twice
by the last 2 forced M$ Win10 updates, the last time leaving me with an unrepairable system.

I was not aware that M$ had decided the unspeakable from W10 r1803 on:
NOT to perform any registry backups anymore by default...
To save HDD space. WTF?

https://learn.microsoft.com/en-us/tr...regback-folder

NOT to perform any system restore points anymore by default...
Even deleting manually made ones. WTF?

https://answers.microsoft.com/en-us/...e-b605ea095ab1

https://answers.microsoft.com/en-us/...9-f6fd51184185

https://learn.microsoft.com/en-us/tr...oints-disabled

Last edited by Emulgator; 18th May 2023 at 14:51.
Old 18th May 2023, 15:24   #28  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
I need, and asked for, the medium model and srt files. [uploaded somewhere like WeTransfer]
But I'll check these large tests too. [saved, so those posts are not needed anymore]

Quote:
Originally Posted by Emulgator View Post
Ah, well, 8bit vs. 16bit can make all the difference !
There are many other quantization types; run with "--verbose" to see all that are supported on your device.

Quote:
Originally Posted by Emulgator View Post
I feed it a 32-bit float .wav decoded from the DVD .ac3 track
Don't. Use original audio.

Quote:
Originally Posted by Emulgator View Post
r103 GPU from 04.05.2023
r103 GPU from 17.05.2023
r117 GPU from 17.05.2023

All versions have their uses and guess differently.
But there was only one "b103" version. Why do you need the old version?

Quote:
Originally Posted by Emulgator View Post
Will have to talk Nikse into having 3 editor tabs in SubtitleEdit, muhahaha ;-)
You can open other SE instances.


Quote:
Originally Posted by Emulgator View Post
Tiny differences, float16 was quicker than int8
Benchmarks on short files don't mean much.
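
For a rough sense of scale, the four reported run times can be turned into a speed relative to real time. A sketch in Python; the clip length is an assumption taken from the end of the last cue (05:05.450), and whether the reported time includes model loading is not stated in the thread:

```python
# Run times reported in the four posts above (seconds).
runs = {
    ("test_original.aac", "float16"): 25,
    ("test_original.aac", "int8"): 38,
    ("test_ffmpeg6.wav", "float16"): 21,
    ("test_ffmpeg6.wav", "int8"): 38,
}

# Assumption: clip length taken from the last cue's end time, 05:05.450.
clip_seconds = 5 * 60 + 5.450

def realtime_factor(clip, elapsed):
    """Seconds of audio processed per second of wall-clock time."""
    return clip / elapsed

for (name, ctype), elapsed in runs.items():
    factor = realtime_factor(clip_seconds, elapsed)
    print(f"{name:18} {ctype:8} {elapsed:3d} s  {factor:5.1f}x")
```

With totals of only 21 to 38 seconds, any fixed startup cost would make up a large share of each figure, so the ranking may not carry over to a full-length movie.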

Quote:
Originally Posted by Emulgator View Post
As I thought: Precision pays off ?
Did you mean compute types? I'm not sure how they correlate with accuracy or speed. So far, for me, int8 looks best while float32 is fastest. Some users reported the opposite.

EDIT:
Or did you mean something about the audio? That "wav" test file is only there to check some quirks with FFmpeg v6. For some reason, results from v6 can be worse or different; it affects the int types.

Last edited by VoodooFX; 18th May 2023 at 15:35.
Old 18th May 2023, 15:38   #29  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Quote:
Originally Posted by Emulgator View Post
r103 GPU from 04.05.2023
r103 GPU from 17.05.2023
r117 GPU from 17.05.2023

All versions have their uses and guess differently.
But there was only one "b103" version. Why do you need the old version?
These were my test dates, not the .exe dates.
Sorry for the ambiguity.

Quote:
Originally Posted by Emulgator View Post
I feed it a 32-bit float .wav decoded from the DVD .ac3 track
Don't. Use original audio.
This (in this case .ac3) will have to be decoded to uncompressed audio before the FFT anyway,
and I want to be in control of the decoding precision.
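
For readers who want to reproduce such a decode, one option is FFmpeg's `pcm_f32le` encoder, which writes 32-bit float PCM. This is an assumption for illustration, since the post does not say which decoder was actually used; the helper name and file names are hypothetical:

```python
import subprocess

def decode_to_float_wav_cmd(src, dst, ffmpeg="ffmpeg"):
    """Build an FFmpeg command that decodes src to a 32-bit float PCM .wav."""
    return [
        ffmpeg,
        "-i", src,            # input file, e.g. the demuxed .ac3 track
        "-c:a", "pcm_f32le",  # 32-bit float little-endian PCM
        dst,
    ]

cmd = decode_to_float_wav_cmd("audio.ac3", "audio.wav")
print(subprocess.list2cmdline(cmd))

# To actually run it (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```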

Last edited by Emulgator; 18th May 2023 at 15:42.
Old 20th May 2023, 13:44   #30  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
@Emulgator Could you do a few more tests on the aac with CUDA: "--language en --model=large --compute_type=float32" and "--language en --model=medium --compute_type=float16"?
[Results in the same form as your previous tests.]

Btw, for your own tests you can try "--beam_size=5"; it's slower but should produce better results.

Last edited by VoodooFX; 20th May 2023 at 14:05.
Old 22nd May 2023, 23:23   #31  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Soon (...still trying to get my main system up and running as before)
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."
Emulgator is offline   Reply With Quote
Old 13th July 2023, 23:03   #32  |  Link
StainlessS
HeartlessS Usurer
 
StainlessS's Avatar
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
These are basically the two that I've tried [only on a few occasions, maybe 5 or 6],
Code:
Whisper-Faster\whisper.exe --model_dir ".\_models" --language en --model "large-v2" ".\audio.wav"

Whisper-Faster\whisper.exe --model_dir ".\_models" --language en --model "large-v2" ".\audio.dts"
I just paste the above into the command line.
It's weird how some subs are flagged <during non-talkative periods> maybe up to a minute ahead of the actual start of speech, yet stop pretty much at the end of speech.
Also,
Quote:
Originally Posted by StainlessS View Post
EDIT: I did one recently on music video {live gig} containing Eng and Spanish, some of the Spanish speech
came out in Spanish, some of it came out translated to English. {& Eng came out Eng}.

EDIT: A few hiccoughs can occur, in one instance, the name "Hiller" was transformed throughout video, into "Hitler"
{Perhaps "Captain Steve Hitler" rings a bell}

EDIT: I wonder if it's worth giving it a go on some Star Trek with lots of Klingon, I bet that some of that stuff was
scanned during A.I. training, might auto convert to earthling English.
Spanish/English thingy is Odd.
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 13th July 2023 at 23:08.
StainlessS is offline   Reply With Quote
Old 14th July 2023, 01:44   #33  |  Link
VoodooFX
Banana User
 
VoodooFX's Avatar
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by StainlessS View Post
whisper.exe --model_dir ".\_models" --language en --model "large-v2" ".\audio.wav"
I think you are using an old version; the current one is r134.6.
The model_dir parameter is redundant in your example, at least in the latest version.

Quote:
Originally Posted by StainlessS View Post
...up to a minute ahead of the actual start of speech
You probably won't see that in the latest version.

Quote:
Originally Posted by StainlessS View Post
Spanish/English thingy is Odd.
Not odd: Whisper models don't support transcription of multilingual audio. You can try processing it twice, first with the English and then with the Spanish language parameter.
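
A rough sketch of that two-pass idea, reusing the flags from the quoted command. The executable path and "gig.wav" are assumptions. If the tool writes its subtitle file under the same name on each run, the first result would need renaming before the second pass.

```python
from typing import List, Sequence

EXE = r"Whisper-Faster\whisper.exe"  # assumed path, adjust to your install


def two_pass_commands(audio: str,
                      languages: Sequence[str] = ("en", "es")) -> List[List[str]]:
    """Return one command per language, to be run one after the other."""
    return [
        [EXE, "--language", lang, "--model", "large-v2", audio]
        for lang in languages
    ]


if __name__ == "__main__":
    for cmd in two_pass_commands("gig.wav"):  # hypothetical file name
        print(" ".join(cmd))
```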
VoodooFX is offline   Reply With Quote
Old 14th July 2023, 15:15   #34  |  Link
SaurusX
Registered User
 
Join Date: Feb 2017
Posts: 134
Does this have the word-level timing feature of the original Whisper? I've yet to find a version of this with that word-level timing, CUDA, and the ability to use the HF models.
SaurusX is offline   Reply With Quote
Old 14th July 2023, 15:31   #35  |  Link
VoodooFX
Banana User
 
VoodooFX's Avatar
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by SaurusX View Post
Does this have the word-level timing feature of the original Whisper? I've yet to find a version of this with that word-level timing, CUDA, and the ability to use the HF models.
Yes, it's enabled by default. It includes all those things.
VoodooFX is offline   Reply With Quote
Old 14th July 2023, 16:00   #36  |  Link
SaurusX
Registered User
 
Join Date: Feb 2017
Posts: 134
But the examples shown in the OP screenshot and by Emulgator do not show this. They're all specific second-based intervals that are seemingly locked into a particular fraction-of-a-second start point.

Last edited by SaurusX; 14th July 2023 at 16:03.
SaurusX is offline   Reply With Quote
Old 14th July 2023, 17:44   #37  |  Link
VoodooFX
Banana User
 
VoodooFX's Avatar
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by SaurusX View Post
But the examples shown in the OP screenshot and by Emulgator do not show this. They're all specific second-based intervals that are seemingly locked into a particular fraction-of-a-second start point.
No idea what you mean by that; it shows the same thing as the original Whisper.
Post your screenshot of what your "original Whisper" shows.
VoodooFX is offline   Reply With Quote
Old 14th July 2023, 18:23   #38  |  Link
SaurusX
Registered User
 
Join Date: Feb 2017
Posts: 134
Quote:
Originally Posted by VoodooFX View Post
No idea what you mean by that; it shows the same thing as the original Whisper.
Post your screenshot of what your "original Whisper" shows.
"Original Whisper" as in from OpenAI's github repo.

https://github.com/openai/whisper

When using their CLI I add "--word_timestamps True" and the timing of each sentence or segment is more precise. To the fraction of a second usually, though it can hiccup.
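
To illustrate why word-level timing tightens the subtitle boundaries, here is a small sketch that takes a segment's start and end from its first and last word. The input layout (a list of dicts with "word", "start" and "end" keys, words carrying a leading space) is assumed to resemble what openai-whisper returns with word timestamps enabled. The sample values are made up.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segment_to_srt(index: int, words: List[Dict]) -> str:
    """Build one SRT entry whose timing comes from the first and last word."""
    start = words[0]["start"]
    end = words[-1]["end"]
    text = "".join(w["word"] for w in words).strip()
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"


if __name__ == "__main__":
    sample = [  # made-up values, for illustration only
        {"word": " Hello", "start": 12.34, "end": 12.70},
        {"word": " world.", "start": 12.75, "end": 13.21},
    ]
    print(segment_to_srt(1, sample))
```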

I'll add some screenshots later today when I get to my computer.
SaurusX is offline   Reply With Quote
Old 14th July 2023, 19:15   #39  |  Link
VoodooFX
Banana User
 
VoodooFX's Avatar
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by SaurusX View Post
When using their CLI I add "--word_timestamps True" and the timing of each sentence or segment is more precise.
Yep, it includes that. The screenshot in the first post is an old one.

Last edited by VoodooFX; 14th July 2023 at 19:58.
VoodooFX is offline   Reply With Quote
Old 15th July 2023, 00:18   #40  |  Link
SaurusX
Registered User
 
Join Date: Feb 2017
Posts: 134
I was getting a DLL error saying that I was missing "cudnn_ops_infer64_8.dll" and that I should put it into my system path. I downloaded it from this zip and dropped it into my CUDA bin folder.

https://developer.download.nvidia.co.../cudnn/v8.3.0/
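
A quick way to check whether the DLL is now reachable is to scan the PATH directories for it. This is only a sketch: Windows also searches other locations (for example the application folder), so "not found" here is not conclusive.

```python
import os
from typing import Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> Optional[str]:
    """Return the first PATH directory entry containing `filename`, else None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    dll = "cudnn_ops_infer64_8.dll"
    location = find_on_path(dll)
    print(location if location else f"{dll} not found on PATH")
```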

word_timestamps is working as you said it would. Doing other tests now with the different model sizes.


OK, that's fast. Using the large-v2 model!


Using the medium.en model.

Last edited by SaurusX; 15th July 2023 at 00:38.
SaurusX is offline   Reply With Quote

Tags
audio, openai, speech, subtitles, text

All times are GMT +1.