Added API support for local Zonos. #73
base: main
Conversation
i would maybe separate that into a different api file without gradio - and have an api-consuming gradio ui - as a refactor, if that is the goal. also as a request - maybe try to keep alignment with openai's tts api; this would allow easy integration for 3rd party systems without much hassle and with sane defaults
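To make the "alignment with OpenAI's TTS API" suggestion concrete: OpenAI's endpoint is `POST /v1/audio/speech` with `model`, `input`, and `voice` fields. A sketch of what a compatible request against a local Zonos server might look like (the host, port, and model id are assumptions, not part of this PR):

```python
import json
import urllib.request

# Payload shape follows OpenAI's /v1/audio/speech API; the model id and
# voice name here are placeholders for whatever the local server exposes.
payload = {
    "model": "Zyphra/Zonos-v0.1-transformer",  # hypothetical model id
    "input": "Hello from Zonos!",
    "voice": "default",
    "response_format": "mp3",
}

req = urllib.request.Request(
    "http://localhost:8000/v1/audio/speech",  # assumed local endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# req can be sent with urllib.request.urlopen(req) once the server is running.
```

Because the payload shape matches OpenAI's, any client that already speaks that API could be pointed at the local server by swapping the base URL.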
Thank you @PhialsBasement, you are a lifesaver.
thats more akin to what im proposing .. (mind you, uploading a voice file for every request to a remote machine may be suboptimal)

we may even want to load transformer and hybrid at the same time so there is no need to swap over .. models are small enough to fit even in peanut cards (model loading time would hurt throughput). optional pinning or fully override-able, but i would make that the default behaviour for any load-bearing api. in an api scenario a batch processor with a queue could be prefixed with just which model to take, as both are present in vram (i'll work on that once we get a go-ahead or at least an LGTM from the team)

voices could be embedded as tensors on voice upload - and on usage we just pull in the tensor to save computation. atm i support mp3/wav while always converting to wav as a baseline

happy to help out .. but i think api and gradio should be clearly separated .. can someone from zyphra chip in here?
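The "embed the voice as a tensor on upload, then reuse it" idea above can be sketched as a content-addressed cache. Here `compute_embedding` is a stand-in for the actual speaker encoder (which would return a tensor); the caching logic is the point:

```python
import hashlib

# Cache keyed by a hash of the uploaded audio bytes, so re-uploading the
# same voice file skips the (expensive) embedding computation.
_embedding_cache: dict[str, object] = {}

def compute_embedding(wav_bytes: bytes) -> object:
    # placeholder: the real version would run the speaker encoder on the
    # audio and return a speaker-embedding tensor
    return ("embedding-for", hashlib.sha256(wav_bytes).hexdigest()[:8])

def get_speaker_embedding(wav_bytes: bytes) -> object:
    key = hashlib.sha256(wav_bytes).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = compute_embedding(wav_bytes)
    return _embedding_cache[key]
```

In a real API the cache would likely persist embeddings to disk keyed by a voice id, so clients can reference a voice by id instead of re-uploading audio per request, which is exactly the throughput concern raised above.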
Just want to mention this thread as relevant for when a teammate comes around to see this PR: #37.
agreed, but that is different as their api has different sampling .. that should be compensatable once we know what they use
With an OAI endpoint and speakers from a folder as returned voices it would work straight away in sillytavern. Unconditional emotions and it would be good "as-is".
pretty much why i proposed it that way .. integration in hundreds of systems would work w/o any extra work
@darkacorn just threw in some of your suggestions, check it out and tell me if it's what you were thinking
amazing, thanks for pulling that in, good baseline
I'm currently testing the openai endpoint, will report back if I run into any issues!
Has anyone been able to create embeddings? I'm running into this error:
@ther3zz Fixed. The issue was in api.py: I was trying to use .query() on a CUDA stream handle; now it's just a normal UNIX timestamp instead.
Looks like it's working!
Another issue I noticed is that MODEL_CACHE_DIR=/app/models doesn't seem to work. I'm not seeing the models cached there; I see them going to /root/.cache/huggingface/hub/ instead.
Whack, I'll look into it and see what's going on there
Why can't we just load models from a folder we manually saved? I get that huggingface hub is used for docker, but not all of us are doing that.
i don't think there is anything that prevents it .. you can even use it offline with the hf client
I've had to change loading to from_local in gradio and all. The from_pretrained is hijacked away from torch.
hope this helps: HF hub config:
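The config itself isn't reproduced above, but the likely cause of the MODEL_CACHE_DIR issue is that `huggingface_hub` resolves its cache location from its own environment variables (`HF_HOME` / `HF_HUB_CACHE`), not from an arbitrary app-level variable. A sketch of redirecting the HF cache, which must run before any huggingface import since the variables are read at import time (`/app/models` here mirrors the path from the report above):

```python
import os

# Point the Hugging Face cache at the app's model directory.
# huggingface_hub reads these variables when it is first imported,
# so this has to happen before importing huggingface_hub/transformers.
os.environ["HF_HOME"] = "/app/models"
os.environ["HF_HUB_CACHE"] = "/app/models/hub"

# After this, from_pretrained() downloads land under /app/models/hub
# instead of /root/.cache/huggingface/hub/.
```

In a Docker setup the same effect can be had with `ENV HF_HOME=/app/models` in the Dockerfile, which avoids any ordering concerns in the Python code.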
@ther3zz can you move this to the issues tab over on my fork?
Add REST API Endpoints
This PR adds FastAPI endpoints to Zonos, allowing programmatic access to the model's functionality alongside the existing Gradio interface.
Added Features
- /models endpoint to list available models
- /generate endpoint for text-to-speech generation
- /speaker_embedding endpoint for creating speaker embeddings

Changes
Testing
Tested with curl commands:
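The curl commands themselves are not reproduced here. As a rough Python equivalent of exercising the list and generate endpoints (paths from the feature list above; the port and request body shape are assumptions), building the requests without sending them:

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # assumed port; check the PR for the real one

# Equivalent of `curl http://localhost:8000/models`
list_req = urllib.request.Request(f"{BASE}/models")

# Equivalent of `curl -X POST .../generate -d '{"text": ...}'`
gen_req = urllib.request.Request(
    f"{BASE}/generate",
    data=json.dumps({"text": "Hello, Zonos!"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(list_req) would perform the actual call.
```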
The implementation reuses existing model management code and runs alongside the Gradio interface on a different port.