Hexgrad PRO

hexgrad

AI & ML interests

Solo Leveling Research

Organizations

None yet

hexgrad's activity

replied to their post 1 day ago

@to-be There are more details at https://hf.co/hexgrad/Kokoro-82M/discussions/21 and my Discord DMs are open if you have more questions, but essentially I am looking for segmented text-audio pairs: likely .txt and .wav pairs, with each .txt being ~500 characters or less (it needs to fit inside the hard 512-token context limit) and each .wav matching its text.
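As a rough illustration of the pairing described above, here is a minimal Python sketch that checks a folder of .txt/.wav files before sending them. The function name `check_pairs`, the flat folder layout, and the use of a 500-character cutoff as a stand-in for the 512-token limit are assumptions made for this example, not a format confirmed by the post.

```python
from pathlib import Path

# Assumption: ~500 characters is used as a rough proxy for the 512-token limit.
MAX_CHARS = 500


def check_pairs(root, max_chars=MAX_CHARS):
    """Return (ok, problems) for .txt/.wav pairs found directly under root."""
    root = Path(root)
    ok, problems = [], []
    for txt in sorted(root.glob("*.txt")):
        wav = txt.with_suffix(".wav")
        text = txt.read_text(encoding="utf-8").strip()
        if not wav.exists():
            problems.append((txt.name, "missing .wav"))
        elif not text:
            problems.append((txt.name, "empty text"))
        elif len(text) > max_chars:
            problems.append((txt.name, f"{len(text)} chars > {max_chars}"))
        else:
            ok.append(txt.stem)
    # Audio files without a transcript are also reported.
    for wav in sorted(root.glob("*.wav")):
        if not wav.with_suffix(".txt").exists():
            problems.append((wav.name, "missing .txt"))
    return ok, problems
```

Calling `check_pairs("my_dataset")` on a hypothetical folder returns the stems of usable pairs plus a list of (filename, reason) tuples to fix. It does not verify that the audio actually matches the text; that still needs a listen or an ASR pass.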

replied to their post 5 days ago

It's simple: what you put in is what you get out. 😄 German support in the future depends mostly on how much German data (synthetic audio + text labels) is contributed.

replied to their post 5 days ago
posted an update 5 days ago
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️

More details at hexgrad/Kokoro-82M#21
replied to their post 10 days ago
posted an update 10 days ago
Happy New Year! 🌃 af_sky landed in Kokoro, along with an article: hexgrad/Kokoro-82M
posted an update 14 days ago
🇬🇧 Four British voices have joined hexgrad/Kokoro-82M (Apache TTS model): bf_emma, bf_isabella, bm_george, bm_lewis
posted an update 16 days ago
Tonight, Adam & Michael join the 82M Apache TTS model in hexgrad/Kokoro-82M
posted an update 17 days ago
Merry Christmas! 🎄 Open sourced a small TTS model at hexgrad/Kokoro-82M
posted an update about 1 month ago
🚀 Shipmas Day 2.5 🚀 Kokoro v0.22 packs 5 languages in 82M params! 🇺🇸🇬🇧🇫🇷🇯🇵🇰🇷🇨🇳 hexgrad/Kokoro-TTS

Feedback appreciated, positive or negative. Non-English languages haven't been validated by the model creator(s), so if you're a native speaker, criticize away!

「ココロティーティーエスは、英語と日本語に加えて、中国語、韓国語、フランス語を話すことができるようになりました。」 ("Kokoro TTS can now speak Chinese, Korean, and French, in addition to English and Japanese.")

Wav converted to mp4 using FFmpeg, since audio attachments aren't allowed in Posts. You may have to unmute the video.
replied to their post about 1 month ago

The voice quality actually sounds close to ElevenLabs.

I might've mentioned this elsewhere, but if you plug Kokoro outputs for named ElevenLabs voices into https://elevenlabs.io/ai-speech-classifier you should get very reliable positives (98% confident generated by ElevenLabs).

By ear, I think Kokoro is indeed close to ElevenLabs, especially on certain voices. For Nicole, they are indistinguishable to me. Michael is pretty close; Adam is still somewhat weak.

But StyleTTS usually is not very emotional.

I agree. Kokoro also has 2 specific issues in this area: (1) little to no emotional audio seen during training, and (2) even if there was, the stock voices are average style vectors over 10-100 samples, creating an average/neutral style anyway.
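To make point (2) concrete, here is a toy Python sketch of why averaging style vectors tends toward a neutral style. The 2-D vectors and the `average_style` helper are purely hypothetical illustrations; Kokoro's actual voicepack format and dimensionality are not described here.

```python
from statistics import mean


def average_style(vectors):
    """Element-wise mean of equal-length style vectors."""
    if not vectors:
        raise ValueError("need at least one style vector")
    return [mean(dim) for dim in zip(*vectors)]


# Toy 2-D vectors (hypothetical): dimension 0 stands in for "emotional intensity".
calm = [-1.0, 0.2]
excited = [1.0, 0.2]

voice = average_style([calm, excited])
# Dimension 0 cancels out, so the averaged voice sits at the neutral midpoint.
print(voice)
```

With 10-100 samples instead of two, the same effect applies: whatever varies between samples is smoothed away, and only what they share survives in the averaged voice.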

posted an update about 1 month ago
self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Discounting the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top 3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
- MeloTTS has 52M params
- F5 TTS has 330M params
- XTTSv2 has 467M params
replied to fdaudens's post about 1 month ago

I used ffmpeg to make the video:

ffmpeg -i input.wav -r 25 -filter_complex "[0:a]compand,showwaves=size=400x400:colors=#ffd700:draw=full:mode=line,format=yuv420p[vout]" -map "[vout]" -map 0:a -c:v libx264 -c:a aac output.mp4
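For anyone scripting this, the same command can be assembled as an argument list in Python. This is only a sketch: the helper name `waveform_video_cmd` and its parameters are made up for illustration, while the ffmpeg options themselves are copied from the command above.

```python
def waveform_video_cmd(wav_in, mp4_out, size="400x400", color="#ffd700", fps=25):
    """Build the ffmpeg argument list that renders a waveform video from a wav."""
    filter_graph = (
        f"[0:a]compand,showwaves=size={size}:colors={color}"
        ":draw=full:mode=line,format=yuv420p[vout]"
    )
    return [
        "ffmpeg", "-i", wav_in,
        "-r", str(fps),                    # output frame rate
        "-filter_complex", filter_graph,   # audio -> waveform video stream [vout]
        "-map", "[vout]",                  # video: the rendered waveform
        "-map", "0:a",                     # audio: the original input track
        "-c:v", "libx264",
        "-c:a", "aac",
        mp4_out,
    ]


cmd = waveform_video_cmd("input.wav", "output.mp4")
# To actually run it (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string means the filter graph needs no extra quoting, even though it contains `#`, `[` and `]`.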
posted an update about 1 month ago
@Respair just dropped Tsukasa: frontier TTS in Japanese Respair/Tsukasa_Speech
It's expressive, punches way above its weight class, and supports voice cloning. Go check it out! 🚀
(Unmute the audio sample below after hitting play)
replied to fdaudens's post about 1 month ago
reacted to fdaudens's post with 👍 about 1 month ago
The rapid progress in small audio models is mind-blowing! 🤯 Just tested OuteTTS v0.2 - cloned my voice from a 10s clip with impressive accuracy and natural prosody.

At 500M parameters, it's efficient enough to run on basic hardware but powerful enough for professional use.

This could transform how we produce audio content for news: think instant translated interviews keeping original voices, or scaled audio article production!

Demo and Model on the Hub: OuteAI/OuteTTS-0.2-500M h/t @reach-vb
replied to Pendrokar's post about 2 months ago

This is conjecture, but it's possible the voice sample for XTTS is in-distribution, i.e. seen during training, and if so you'd expect it to perform better than F5 given the same reference. No knock on XTTS btw, Kokoro is equally guilty of this: the voice used in the Arena is also in-distribution.

It would not be surprising to me if voice cloning is simply "looking up" the most similar speaker or interpolation of speakers seen in training. François Chollet has discussed this phenomenon many times wrt LLMs, and I highly recommend listening to his talks.

https://hf.co/spaces/hexgrad/Kokoro-TTS/discussions/3#6744bdea8c689a7071742134

posted an update about 2 months ago
hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥

Read more and listen to before/after audio samples at https://hf.co/blog/hexgrad/kokoro-short-burst-upgrade

(Probably would have made that Article a Post instead, if audio could be embedded into Posts.)
posted an update about 2 months ago