
Integrate faster models #36

Closed
jsboige opened this issue Sep 9, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@jsboige
Contributor

jsboige commented Sep 9, 2023

The large model currently brings good results, but it seems faster versions have emerged here and there, typically through quantization, with similar results, faster inference and a much lower memory footprint.
See for instance:

What would it take to support some of those?

@jhj0517 jhj0517 added the enhancement New feature or request label Sep 9, 2023
@jhj0517
Owner

jhj0517 commented Sep 9, 2023

Hi @jsboige.
Thanks for the suggestion.
I'm currently thinking about integrating the first of the options you listed:

https://github.com/guillaumekln/faster-whisper

According to the repo, it reduces VRAM usage from 12 GB to 4 GB on the large-v2 model, as well as reducing inference time.
This is probably the easiest way to implement "faster whisper".

I guess I should add a command-line argument for faster-whisper and implement it.
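A minimal sketch of such a flag with `argparse` (the flag name `--use_faster_whisper` is hypothetical; the actual argument added to the project may differ):

```python
import argparse

# Hypothetical flag name; the project's real argument may differ.
parser = argparse.ArgumentParser(description="Whisper-WebUI launcher")
parser.add_argument(
    "--use_faster_whisper",
    action="store_true",
    help="Use the faster-whisper backend instead of openai-whisper",
)

args = parser.parse_args(["--use_faster_whisper"])
backend = "faster-whisper" if args.use_faster_whisper else "openai-whisper"
print(backend)  # -> faster-whisper
```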

@jsboige
Contributor Author

jsboige commented Sep 9, 2023

That would be great!

@jhj0517
Owner

jhj0517 commented Sep 10, 2023

faster-whisper is implemented in #37.
It reduced the processing time from 4 minutes 8 seconds to 3 minutes 53 seconds for a 30-minute audio file.
I'm not sure whether this speedup is representative, and I only compared one file (a Korean audio file), so it could be wrong, but it should at least be useful for reducing VRAM usage.

@jsboige
Contributor Author

jsboige commented Sep 10, 2023

> faster-whisper implemented in #37

Thanks, I tested it successfully on French songs, but language selection does not seem to work (I had to use auto-detection). I got the following error ("fr" is expected instead of "french"):

Error transcribing file on line french is not a valid language code

> It reduced the time consumption from 4 minutes 8 seconds -> 3 minutes 53 seconds for a 30 minute audio file. I'm not sure if this efficiency is right or not, and I only compared one file ( Korean audio file ), so it could be wrong, but it would probably be useful for reducing VRAM usage.

That does look underwhelming indeed. In any case, it definitely uses less VRAM, and I believe I noticed a perceptible speed gain on short songs' MP3s.
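One way to address the error above is to normalize the UI's full language names into the ISO 639-1 codes that faster-whisper expects. A sketch (the mapping below is a small illustrative subset, not the project's actual table):

```python
# Illustrative subset of a language-name -> ISO 639-1 code table;
# a real mapping would cover all Whisper-supported languages.
LANGUAGE_CODES = {
    "english": "en",
    "french": "fr",
    "german": "de",
    "korean": "ko",
    "auto": None,  # None lets faster-whisper auto-detect the language
}

def to_language_code(name: str):
    """Normalize a UI language name like 'French' to the code faster-whisper expects."""
    key = name.strip().lower()
    try:
        return LANGUAGE_CODES[key]
    except KeyError:
        raise ValueError(f"{name!r} is not a valid language") from None

print(to_language_code("French"))  # -> fr
```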

@jhj0517
Owner

jhj0517 commented Sep 10, 2023

> language selection does not seem to work (I had to use auto-detection)
> I got the following error ("fr" is expected instead of "french")

Thanks for pointing this out. Fixed in 6726c6a.

@guillaumekln

You set a default beam size of 5 for faster-whisper:

self.default_beam_size = 5

but you don't set the same beam size in openai-whisper (whose default is 1). You should set the same beam size in both when comparing transcription times.
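A fair benchmark passes the same decode options to both backends. A minimal sketch of sharing the beam size (the helper itself is illustrative; the commented-out `transcribe` calls assume models already loaded from each library):

```python
def decode_options(beam_size: int = 1) -> dict:
    """Options to pass identically to openai-whisper and faster-whisper.

    openai-whisper decodes greedily by default (effectively beam_size 1),
    while faster-whisper defaults to beam_size=5, so passing the value
    explicitly keeps the timing comparison apples-to-apples.
    """
    return {"beam_size": beam_size}

# Both backends would then receive the same value, e.g.:
#   whisper_model.transcribe(audio, **decode_options(1))   # openai-whisper
#   faster_model.transcribe(audio, **decode_options(1))    # faster-whisper
opts = decode_options(1)
print(opts)
```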

@jhj0517
Owner

jhj0517 commented Sep 11, 2023

@guillaumekln Thanks! You're right. I didn't read it correctly.
I compared again with the same file, and the time reduction efficiency is much better.
The processing time was reduced from 4 minutes 8 seconds -> 2 minutes 33 seconds for a 30-minute audio file, when both beam sizes are set to 1.
Thanks for informing me. Also, thanks for creating faster-whisper; it has helped my project a lot.

By the way, beam_size should be a tunable parameter. I'm thinking of creating a collapsible "Advanced Parameters" tab to include it.
There could also be logprob_threshold, no_speech_threshold, etc.
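One way to keep such tunables together before wiring them into a collapsible UI tab is a small parameter container that can be forwarded as keyword arguments. A sketch (field names follow faster-whisper's `transcribe()` keywords; the container class itself is hypothetical, not from the project):

```python
from dataclasses import dataclass, asdict

@dataclass
class AdvancedParams:
    # Defaults mirror faster-whisper's documented transcribe() defaults.
    beam_size: int = 5
    log_prob_threshold: float = -1.0
    no_speech_threshold: float = 0.6

# The UI tab would surface these fields and forward them as kwargs:
params = AdvancedParams(beam_size=1)
kwargs = asdict(params)
print(kwargs)
```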

@jhj0517
Owner

jhj0517 commented Apr 7, 2024

Resolved with faster-whisper.

@jhj0517 jhj0517 closed this as completed Apr 7, 2024