
Android real-time transcription #4

Open
salehsoleimani opened this issue Feb 22, 2025 · 10 comments

@salehsoleimani

Hey! Thanks for the great repo.
Is there any chance of running Whisper in real time on mobile devices? According to your docs it needs 1.5 GB, which is too much for mobile devices. Any chance of getting memory usage down to ~300 MB?

niedev commented Feb 22, 2025

@salehsoleimani This library is derived from RTranslator, an Android app, so 1.5 GB is not too much for mobile devices (although it is certainly quite heavy).

@salehsoleimani (Author)

Can you provide an Android example in this repo?

@salehsoleimani (Author)

Are you sure it works in real time on mobile devices? I tried out RTranslator and it didn't seem to be real-time! It takes at least 3-6 seconds to process each chunk.

eix128 (Owner) commented Feb 24, 2025

@salehsoleimani
This is a heavily modified version of the RTranslator library, as niedev said, but it's not really the same: it has been optimized further. I don't currently have much time to keep updating it.
Many libraries have appeared after Whisper. One of them, as I told niedev, is SenseVoice. You can check it out if you want a really small footprint. It's a newer library on the market, and some people call it a Whisper killer.

niedev commented Feb 27, 2025

> Are you sure it works in real time on mobile devices? I tried out RTranslator and it didn't seem to be real-time! It takes at least 3-6 seconds to process each chunk.

It depends a lot on the phone you use (mine takes 1.6-2 seconds per chunk), but yeah, the audio is always processed in chunks, no matter how small or fast the model is. For true real-time speech recognition with Whisper, the only option I know of is the stream version of whisper.cpp.
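
For reference, here is a minimal sketch of what chunk-based processing looks like with the whisper.cpp C API (this is not code from this repo or from RTranslator; the model path is just an example and `record_chunk_16khz_mono()` is a hypothetical stand-in for real audio capture via AudioRecord/Oboe):

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

// Hypothetical capture helper: in a real app this would pull ~3-6 s of 16 kHz
// mono float PCM from the microphone; here it just returns 5 s of silence.
static std::vector<float> record_chunk_16khz_mono() {
    return std::vector<float>(5 * WHISPER_SAMPLE_RATE, 0.0f);
}

int main() {
    // Load a ggml Whisper model (example path).
    whisper_context *ctx = whisper_init_from_file_with_params(
            "ggml-base.bin", whisper_context_default_params());
    if (ctx == nullptr) return 1;

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads      = 4;
    params.language       = "en";
    params.print_progress = false;

    // One chunk in, one transcription out: the per-chunk latency discussed in
    // this thread is the time this single whisper_full() call takes on-device.
    std::vector<float> pcm = record_chunk_16khz_mono();
    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            std::printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}
```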

@salehsoleimani (Author)

> > Are you sure it works in real time on mobile devices? I tried out RTranslator and it didn't seem to be real-time! It takes at least 3-6 seconds to process each chunk.
>
> It depends a lot on the phone you use (mine takes 1.6-2 seconds per chunk), but yeah, the audio is always processed in chunks, no matter how small or fast the model is. For true real-time speech recognition with Whisper, the only option I know of is the stream version of whisper.cpp.

Thanks for the reply. Where you mentioned 1.6 seconds per chunk, do you mean RTranslator or this repo? And do you have any examples for the code you mentioned in your reply?

niedev commented Feb 27, 2025

I mean RTranslator. And what do you mean by an example?

@salehsoleimani (Author)

> I mean RTranslator. And what do you mean by an example?

An example of an Android implementation.

niedev commented Feb 27, 2025

Oh ok, there is a Whisper.cpp example app for Android, but it doesn't implement stream inference for Whisper. You could implement it yourself by understanding how the stream version works and porting it to Android in C++ (the code is in the example I linked in the previous message, and the issue linked on that page explains how it works).
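
If it helps, here is a rough sketch of the idea behind the stream approach, written as an NDK-style C++ loop: every `step_ms` milliseconds it re-runs inference over the most recent `length_ms` of audio, so partial text appears while the user is still speaking. This is an approximation of the technique, not the actual whisper.cpp stream example code; `capture_audio()` is a hypothetical callback you would back with Oboe/AAudio or AudioRecord.

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

// Hypothetical blocking capture: fills 'out' with n_samples of 16 kHz mono PCM
// from the microphone (e.g. via Oboe/AAudio); stubbed here with silence.
static void capture_audio(std::vector<float> &out, int n_samples) {
    out.assign(n_samples, 0.0f);
}

static void stream_transcribe(whisper_context *ctx) {
    const int step_ms   = 500;   // how often inference is re-run
    const int length_ms = 5000;  // audio window fed to the model each time
    const int n_step    = (WHISPER_SAMPLE_RATE * step_ms)   / 1000;
    const int n_length  = (WHISPER_SAMPLE_RATE * length_ms) / 1000;

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads      = 4;
    params.no_context     = true;  // decode each window independently
    params.single_segment = true;  // one segment per window keeps latency low
    params.print_progress = false;

    std::vector<float> window;        // sliding window: last length_ms of audio
    std::vector<float> step(n_step);  // newly captured samples for this iteration

    while (true) {
        capture_audio(step, n_step);
        window.insert(window.end(), step.begin(), step.end());
        if ((int) window.size() > n_length) {
            // Drop the oldest samples so the window never exceeds length_ms.
            window.erase(window.begin(), window.end() - n_length);
        }

        if (whisper_full(ctx, params, window.data(), (int) window.size()) == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                std::printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }
    }
}

int main() {
    whisper_context *ctx = whisper_init_from_file_with_params(
            "ggml-base.bin", whisper_context_default_params());
    if (ctx == nullptr) return 1;
    stream_transcribe(ctx);  // runs until the process is killed
    whisper_free(ctx);
    return 0;
}
```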

@salehsoleimani (Author)

> Oh ok, there is a Whisper.cpp example app for Android, but it doesn't implement stream inference for Whisper. You could implement it yourself by understanding how the stream version works and porting it to Android in C++ (the code is in the example I linked in the previous message, and the issue linked on that page explains how it works).

Nice, thanks, I appreciate it.
