Splend1d
left a comment
Hi,
Thank you for your contribution to the codebase. Could you split the changes into the following 3 types you mentioned?
- Support speaker caching
- vram clean up
- webui/api server
For speaker caching, please include a two step approach example
- generate cache files
- use cache files to generate speech
Please use an argument to point to the .pt file directly, don't use other model_dir arguments to find the cached path.
If you would like to adopt a spk_id approach, please register the spk_id when generating the cache files. (Apologies if I missed this part)
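To illustrate the two-step flow I have in mind, here is a minimal sketch. Everything in it is hypothetical and not existing project code: the function names, the `--spk-cache` and `--spk-id` arguments, and the placeholder `extract_speaker_info`. It uses `pickle` as a stand-in so that it runs without PyTorch; the actual change would presumably call `torch.save` and `torch.load` on the .pt file instead.

```python
import argparse
import pickle


def extract_speaker_info(prompt_wav: str) -> dict:
    """Hypothetical placeholder for the real (expensive) embedding extraction."""
    return {"embedding": [float(len(prompt_wav))], "source": prompt_wav}


def generate_cache(prompt_wavs: dict, cache_path: str) -> None:
    """Step 1: register every spk_id and write all entries to one cache file."""
    spk2info = {spk_id: extract_speaker_info(wav) for spk_id, wav in prompt_wavs.items()}
    with open(cache_path, "wb") as f:
        pickle.dump(spk2info, f)  # stand-in for torch.save(spk2info, cache_path)


def load_speaker(cache_path: str, spk_id: str) -> dict:
    """Step 2: open the cache file given directly by path and look up spk_id."""
    with open(cache_path, "rb") as f:
        spk2info = pickle.load(f)  # stand-in for torch.load(cache_path)
    if spk_id not in spk2info:
        raise KeyError(f"spk_id {spk_id!r} was not registered in {cache_path}")
    return spk2info[spk_id]


def build_parser() -> argparse.ArgumentParser:
    """The cache file is passed explicitly, not derived from a model_dir."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--spk-cache", required=True, help="path to the cached .pt file")
    parser.add_argument("--spk-id", required=True, help="speaker registered in step 1")
    return parser
```

The point of the explicit `--spk-cache` argument is that the second step never has to guess where the cache lives, and an unregistered spk_id fails loudly instead of silently falling back.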
Thank you for your review! I'd like to address your questions as follows: 1. Regarding
Hi Brandon, I see, thank you for your clarification. I missed the part where spkinfo is part of the model. However, regarding speaker caching, I would still require the following fixes:
Would you be able to commit this separately, without the TTS server and UI code? I would like to review the other components individually. Best,
Hi Jeff, Just a quick update: I’ve set the default I’ve also cleaned up some variable and function names that were potentially unclear. Let me know if the new naming fits with the project’s style. As for the API server and UI parts, I’ve removed them from this PR for now, per your suggestion. Once this is merged, I’ll open a separate PR to submit the cosyvoice API server and Web UI components. Appreciate your time!
This PR introduces a modification that allows users to cache speaker embeddings in a spk2info.pt file. Additionally, users can now specify a speaker_id at runtime to directly reference the stored speaker information.
If a speaker_id is provided, the system retrieves speaker info from spk2info.pt instead of reprocessing the audio file.
This should significantly reduce latency during voice cloning or synthesis by avoiding redundant speaker embedding extraction.
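The runtime lookup can be pictured roughly as follows. This is only a sketch under assumptions, not the code in this PR: `resolve_speaker_info` and the `extract` callable are hypothetical names, and `spk2info` is assumed to be a plain dict already loaded from spk2info.pt.

```python
from typing import Callable, Optional


def resolve_speaker_info(
    spk2info: dict,
    extract: Callable[[str], dict],
    speaker_id: Optional[str] = None,
    prompt_wav: Optional[str] = None,
) -> dict:
    """Return cached speaker info when speaker_id is given, else extract it."""
    if speaker_id is not None:
        # Cache path: no audio is read and no embedding is recomputed.
        if speaker_id not in spk2info:
            raise KeyError(f"speaker_id {speaker_id!r} not found in cache")
        return spk2info[speaker_id]
    if prompt_wav is None:
        raise ValueError("provide either speaker_id or prompt_wav")
    # Slow path: run the (expensive) extraction on the prompt audio.
    return extract(prompt_wav)
```

With a cached speaker_id, `extract` is never called, which is where the latency saving would come from.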