
Added API support for local Zonos. #73

Open
wants to merge 9 commits into base: main

Conversation

PhialsBasement

Add REST API Endpoints

This PR adds FastAPI endpoints to Zonos, allowing programmatic access to the model's functionality alongside the existing Gradio interface.

Added Features

  • /models endpoint to list available models
  • /generate endpoint for text-to-speech generation
  • /speaker_embedding endpoint for creating speaker embeddings

Changes

  • Added FastAPI integration
  • Model responses are streamed as WAV files
  • Added Pydantic models for request validation (a rough sketch of the endpoint surface follows this list)
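A minimal sketch of what that surface might look like; the request fields and the `generate_audio` stand-in are assumptions, not the PR's actual code.

```python
# Minimal sketch of the described endpoints; request fields and the
# generate_audio() stand-in are assumptions, not the PR's actual code.
import io

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI(title="Zonos API")

def generate_audio(model: str, text: str, language: str) -> bytes:
    """Stand-in for the reused Zonos model-management code; returns WAV bytes."""
    return b""

class GenerateRequest(BaseModel):
    model: str = "Zyphra/Zonos-v0.1-transformer"
    text: str
    language: str = "en-us"

@app.get("/models")
def list_models() -> list[str]:
    return ["Zyphra/Zonos-v0.1-transformer", "Zyphra/Zonos-v0.1-hybrid"]

@app.post("/generate")
def generate(req: GenerateRequest):
    wav_bytes = generate_audio(req.model, req.text, req.language)
    return StreamingResponse(io.BytesIO(wav_bytes), media_type="audio/wav")

# /speaker_embedding would follow the same pattern, taking an uploaded
# reference clip and returning the computed embedding.
```

Since the API runs alongside the Gradio interface on a different port, it would be served with something like `uvicorn api:app --port 8000` (the module name is assumed from the api.py mentioned later in this thread).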

Testing

Tested with curl commands (equivalent calls are sketched below):

  • GET /models works as expected
  • POST /generate successfully generates audio
  • POST /speaker_embedding successfully creates embeddings
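The exact curl invocations aren't included in the PR text; a rough equivalent using Python's requests, with request-body fields assumed to match the sketch above:

```python
# Rough smoke test of the three endpoints; the JSON fields and the port
# are assumptions, not the PR's exact calls.
import requests

BASE = "http://localhost:8000"  # assumed port for the API process

# GET /models
print(requests.get(f"{BASE}/models").json())

# POST /generate -> WAV bytes
resp = requests.post(
    f"{BASE}/generate",
    json={"model": "Zyphra/Zonos-v0.1-transformer",
          "text": "Hello from the Zonos API.",
          "language": "en-us"},
)
with open("out.wav", "wb") as f:
    f.write(resp.content)

# POST /speaker_embedding from a reference clip (endpoint shape is assumed)
with open("reference.wav", "rb") as f:
    emb = requests.post(f"{BASE}/speaker_embedding", files={"file": f})
print(emb.status_code)
```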

The implementation reuses existing model management code and runs alongside the Gradio interface on a different port.

@PhialsBasement mentioned this pull request Feb 14, 2025
@darkacorn
Contributor

darkacorn commented Feb 14, 2025

I would maybe separate that into a different API file without Gradio,
as you'd most likely use one or the other, not both at the same time,

and have a Gradio UI that consumes the API, as a refactor, if that is the goal.

Also, as a request:

maybe try to keep it in alignment with OpenAI's TTS API,
which is very widely integrated and supported, with optional features as separate parameters.

This would allow easy integration for 3rd-party systems without much hassle and with sane defaults.
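For concreteness, OpenAI's TTS endpoint is `POST /v1/audio/speech` with a small JSON body; a compatible route might look like the sketch below. Only the request fields mirror OpenAI's API; the `synth` stand-in and everything around it are assumptions.

```python
# Sketch of an OpenAI-style route: POST /v1/audio/speech with the same
# body fields OpenAI's TTS API uses; synth() is a stand-in, not Zonos code.
from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()

def synth(model: str, text: str, voice: str) -> bytes:
    """Stand-in for Zonos inference; returns WAV bytes."""
    return b""

class SpeechRequest(BaseModel):
    model: str = "Zyphra/Zonos-v0.1-transformer"
    input: str
    voice: str = "default"
    response_format: str = "wav"  # Zonos emits WAV; mp3 would need transcoding
    speed: float = 1.0            # accepted for compatibility; may be ignored

@app.post("/v1/audio/speech")
def create_speech(req: SpeechRequest):
    audio = synth(req.model, req.input, req.voice)
    return Response(content=audio, media_type="audio/wav")
```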

@Steveboy123

Thank you @PhialsBasement, you are a lifesaver.

@darkacorn
Contributor

darkacorn commented Feb 15, 2025

[screenshot]

That's more akin to what I'm proposing (mind you, uploading a voice file for every request to a remote machine may be suboptimal).

We may even want to load the transformer and the hybrid model at the same time so there is no need to swap over; the models are small enough to fit even on tiny cards, and model loading time would hurt throughput. (Optional pinning, or fully overridable, but I would make that the default behaviour for any load-bearing API.)

In an API scenario, a batch processor with a queue could be prefixed with just which model to use, as both are present in VRAM (I'll work on that once we get a go-ahead, or at least an LGTM, from the team).

Voices could be embedded as tensors on voice upload, and on usage we just pull in the tensor to save computation.

At the moment I support mp3/wav, always converting to wav as a baseline.

Happy to help out, but I think the API and Gradio should be clearly separated. Can someone from Zyphra chime in here?
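A sketch of the "pin both models, cache voice embeddings as tensors" idea; `from_pretrained` and `make_speaker_embedding` follow the repo README, but the surrounding structure is illustrative only.

```python
# Illustrative: keep both models resident and cache speaker embeddings as
# tensors at upload time, so later requests pay no load/embedding cost.
import torch
import torchaudio
from zonos.model import Zonos

device = "cuda"

# Load once at startup so requests never pay model-load latency.
MODELS = {
    "transformer": Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device),
    "hybrid": Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device=device),
}

SPEAKERS: dict[str, torch.Tensor] = {}  # voice name -> cached embedding tensor

def register_voice(name: str, wav_path: str) -> None:
    """Compute the speaker embedding once on upload and keep the tensor."""
    wav, sr = torchaudio.load(wav_path)
    SPEAKERS[name] = MODELS["transformer"].make_speaker_embedding(wav, sr)

def get_voice(name: str) -> torch.Tensor:
    # Later requests just reuse the cached tensor instead of re-embedding.
    return SPEAKERS[name]
```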

@zaydek

zaydek commented Feb 15, 2025

Just want to mention this thread as relevant for when a teammate comes around to see this PR: #37.

@darkacorn
Contributor

darkacorn commented Feb 15, 2025

Agreed, but that is different, as their API has different sampling; that should be compensatable once we know what they use.
The model conditioning has params for min_p, top_k, top_p, temperature and rep_pen, which are not exposed or used in the OSS release at the moment; only min_p for the time being.
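If those knobs were exposed, the request model could simply carry them as optional fields; the defaults below are placeholders, not the model's tuned values.

```python
# Placeholder request model exposing the sampling parameters mentioned above;
# the default values are illustrative, not the conditioning code's defaults.
from pydantic import BaseModel, Field

class SamplingParams(BaseModel):
    min_p: float = Field(0.1, ge=0.0, le=1.0)
    top_p: float = Field(1.0, ge=0.0, le=1.0)
    top_k: int = Field(0, ge=0)               # 0 = disabled
    temperature: float = Field(1.0, gt=0.0)
    repetition_penalty: float = Field(1.0, ge=1.0)
```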

@Ph0rk0z

Ph0rk0z commented Feb 15, 2025

With an OAI endpoint and speakers from a folder returned as voices, it would work straight away in SillyTavern. Add unconditional emotions and it would be good as-is.
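A sketch of the "speakers from a folder" idea: files under a speakers/ directory become the voice names the API reports and accepts. The directory layout and naming are assumptions.

```python
# Illustrative: treat every audio file under SPEAKER_DIR as a selectable voice.
from pathlib import Path

SPEAKER_DIR = Path("speakers")  # assumed location, e.g. speakers/alice.wav

def list_voices() -> list[str]:
    return sorted(p.stem for p in SPEAKER_DIR.glob("*") if p.suffix in {".wav", ".mp3"})

def resolve_voice(name: str) -> Path:
    # Map a requested voice name ("alice") back to its reference clip.
    matches = [p for p in SPEAKER_DIR.glob(f"{name}.*") if p.suffix in {".wav", ".mp3"}]
    if not matches:
        raise FileNotFoundError(f"unknown voice: {name}")
    return matches[0]
```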

@darkacorn
Contributor

With an OAI endpoint and speakers from a folder returned as voices, it would work straight away in SillyTavern. Add unconditional emotions and it would be good as-is.

Pretty much why I proposed it that way; integration in hundreds of systems would work without any extra work.

@PhialsBasement
Author

@darkacorn just threw in some of your suggestions, check it out and tell me if it's what you were thinking.

@darkacorn
Contributor

Amazing, thanks for pulling that in; good baseline.

@ther3zz

ther3zz commented Feb 16, 2025

I'm currently testing the OpenAI endpoint, will report back if I run into any issues!
That being said, it makes sense to include a Swagger docs endpoint as well (or at least some variable to enable/disable the docs page).
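FastAPI already serves Swagger UI at /docs by default; gating it behind an environment variable could be as small as the sketch below (the variable name ENABLE_DOCS is made up).

```python
# Toggle the auto-generated Swagger/ReDoc pages with an env var;
# ENABLE_DOCS is a hypothetical variable name, not something in the PR.
import os
from fastapi import FastAPI

docs_enabled = os.getenv("ENABLE_DOCS", "1") == "1"

app = FastAPI(
    docs_url="/docs" if docs_enabled else None,
    redoc_url="/redoc" if docs_enabled else None,
    openapi_url="/openapi.json" if docs_enabled else None,
)
```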

@ther3zz

ther3zz commented Feb 16, 2025

Has anyone been able to create embeddings? I'm running into this error:

{
    "detail": "'int' object has no attribute 'query'"
}

@PhialsBasement
Author

@ther3zz Fixed. The issue was in api.py; I was trying to use .query() on a CUDA stream handle, now it's just a normal UNIX timestamp instead.
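Roughly what the described fix amounts to; this is a reconstruction, not the PR's exact diff.

```python
# Reconstruction of the described change, not the actual diff.
import time

# Before: a CUDA stream handle (which can surface as a plain int) was being
# asked for .query(), hence "'int' object has no attribute 'query'".
# embedding_id = stream.query()   # AttributeError when `stream` is an int handle

# After: a plain UNIX timestamp is enough to tag the embedding artifact.
embedding_id = int(time.time())
```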

@ther3zz

ther3zz commented Feb 17, 2025

@ther3zz Fixed. The issue was in api.py; I was trying to use .query() on a CUDA stream handle, now it's just a normal UNIX timestamp instead.

Looks like it's working!

@ther3zz

ther3zz commented Feb 17, 2025

Another issue I noticed is that MODEL_CACHE_DIR=/app/models doesn't seem to work. I'm not seeing the models cached there; I see them going here: /root/.cache/huggingface/hub/
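For context, the Hugging Face hub client decides the cache location from its own settings (HF_HOME / HF_HUB_CACHE, or a cache_dir argument), so a custom MODEL_CACHE_DIR only takes effect if it is wired through. A sketch of one way to forward it; the wiring itself is an assumption about how the PR might do it.

```python
# The hub client only honours its own settings; a custom MODEL_CACHE_DIR
# has to be forwarded explicitly. The wiring shown here is an assumption.
import os

cache_dir = os.getenv("MODEL_CACHE_DIR")
if cache_dir:
    # Either set the standard env var before anything imports huggingface_hub...
    os.environ["HF_HUB_CACHE"] = cache_dir
    # ...or pass it per download, e.g.:
    # snapshot_download("Zyphra/Zonos-v0.1-transformer", cache_dir=cache_dir)
```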

@PhialsBasement
Author

Whack, I'll look into it and see what's going on there.

@Ph0rk0z

Ph0rk0z commented Feb 17, 2025

Why can't we just load models from a folder we manually saved? I get that the Hugging Face Hub is used for Docker, but not all of us are doing that.

@darkacorn
Contributor

I don't think there is anything that prevents it; you can even use it offline with the HF client.
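e.g. the generic offline pattern with the hub client: populate the cache once while online, then resolve from the local cache only. The repo id is the real one; the rest is standard huggingface_hub usage, not Zonos-specific code.

```python
# Generic offline pattern with the HF hub client: fetch once while online,
# then run with HF_HUB_OFFLINE=1 set in the environment (before the process
# starts) or pass local_files_only=True per call.
from huggingface_hub import snapshot_download

# One-time, while online (or copy the cache directory over manually):
snapshot_download("Zyphra/Zonos-v0.1-transformer")

# Later, fully offline:
path = snapshot_download("Zyphra/Zonos-v0.1-transformer", local_files_only=True)
```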

@Ph0rk0z

Ph0rk0z commented Feb 17, 2025

I've had to change the loading to from_local in Gradio and elsewhere; from_pretrained is hijacked away from torch.
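If from_local takes a config path and a checkpoint path, pointing it at a manually saved folder would look roughly like this; treat the exact signature and the paths as assumptions and check zonos/model.py.

```python
# Assumed usage of the local loader mentioned above; verify the actual
# signature in zonos/model.py before relying on this. Paths are hypothetical.
from zonos.model import Zonos

model = Zonos.from_local(
    "models/zonos-transformer/config.json",
    "models/zonos-transformer/model.safetensors",
    device="cuda",
)
```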

@PhialsBasement
Author

@ther3zz can you move this to the Issues tab over on my fork?

@ther3zz

ther3zz commented Feb 18, 2025

Another issue I noticed is that MODEL_CACHE_DIR=/app/models doesn't seem to work. I'm not seeing the models cached there; I see them going here: /root/.cache/huggingface/hub/

I don't actually see an Issues tab when on your fork.
