Kokoro TTS Local

A local implementation of the Kokoro Text-to-Speech model, featuring dynamic module loading, automatic dependency management, and a web interface.

Current Status

✅ WORKING - READY TO USE ✅

The project has been updated with:

Automatic espeak-ng installation and configuration
Dynamic module loading from Hugging Face
Improved error handling and debugging
Interactive CLI interface
Cross-platform setup scripts
Web interface with Gradio
Fast package management with uv

Features

Local text-to-speech synthesis using the Kokoro model
Automatic espeak-ng setup using espeakng-loader
Multiple voice support with easy voice selection
Phoneme output support and visualization
Interactive CLI for custom text input
Voice listing functionality
Dynamic module loading from Hugging Face
Comprehensive error handling and logging
Cross-platform support (Windows, Linux, macOS)
NEW: Web Interface Features
- Modern, user-friendly UI
- Real-time generation progress
- Multiple output formats (WAV, MP3, AAC)
- Network sharing capabilities
- Audio playback and download
- Voice selection dropdown
- Detailed process logging

Prerequisites

Python 3.8 or higher
Git (for cloning the repository)
Internet connection (for initial model download)
FFmpeg (required for MP3/AAC conversion):
- Windows: Automatically installed during setup
- Linux: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
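
To confirm that FFmpeg is visible before trying MP3 or AAC export, a quick check like the one below can help. This is a minimal sketch, not part of the project; the helper name find_ffmpeg is hypothetical, and it only looks for an executable on PATH.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg"):
    """Return the full path of the FFmpeg executable, or None if it is not on PATH.

    Hypothetical helper for illustration; the project itself may locate
    FFmpeg differently.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg not found on PATH; MP3/AAC conversion will likely fail.")
    else:
        print(f"FFmpeg found at {path}")
```

If the check reports that FFmpeg is missing, install it with the command for your platform listed above and open a new terminal so the updated PATH is picked up.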

Windows-Specific Requirements

For optimal performance on Windows, you should either:

Enable Developer Mode:
- Open Windows Settings
- Navigate to System > For developers (on Windows 10: Update & Security > For developers)
- Turn on Developer Mode

OR

Run Python as Administrator:
- Right-click your terminal (PowerShell/Command Prompt)
- Select "Run as administrator"
- Run the commands from there

Either option allows the Hugging Face cache system to create symlinks. If you skip this step, the system will still work, but the cache may use more disk space.
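
To see whether your current terminal session is allowed to create symlinks, you can try creating one in a temporary directory. The sketch below uses only the standard library; the function name symlinks_supported is hypothetical, and it does not reproduce the exact check the Hugging Face cache performs.

```python
import os
import tempfile


def symlinks_supported() -> bool:
    """Try to create a symlink in a temporary directory and report success.

    Illustrative only: this approximates, but is not, the check done by
    the Hugging Face cache.
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as fh:
            fh.write("test")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError, AttributeError):
            # Typical on Windows without Developer Mode or admin rights.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    print("Symlinks supported:", symlinks_supported())
```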

Dependencies

torch
phonemizer-fork
transformers
scipy
munch
soundfile
huggingface-hub
espeakng-loader
gradio>=4.0.0
pydub  # For audio format conversion
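
If something fails at import time, it can help to list which of these packages are installed in the active environment. The following is a minimal sketch using importlib.metadata from the standard library; the REQUIRED list simply mirrors the names above, and installed_versions is a hypothetical helper, not part of the project.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
REQUIRED = [
    "torch",
    "phonemizer-fork",
    "transformers",
    "scipy",
    "munch",
    "soundfile",
    "huggingface-hub",
    "espeakng-loader",
    "gradio",
    "pydub",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Run it inside the virtual environment created by the setup script; running it with the system Python will report on the wrong environment.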

Setup

We use the modern uv package manager for faster and more reliable dependency management.

Windows

# Clone the repository
git clone https://github.com/PierrunoYT/Kokoro-TTS-Local.git
cd Kokoro-TTS-Local

# Run the setup script (will install uv if not present)
.\setup.ps1

Linux/macOS

# Clone the repository
git clone https://github.com/PierrunoYT/Kokoro-TTS-Local.git
cd Kokoro-TTS-Local

# Run the setup script (will install uv if not present)
chmod +x setup.sh
./setup.sh

The setup scripts will:

Install the uv package manager if not present
Create a virtual environment
Install all dependencies using uv
Install system requirements (espeak-ng, FFmpeg)

Usage

Web Interface

# Start the web interface
python gradio_interface.py

After running the command:

Open your web browser and visit: http://localhost:7860
The interface will also create a public share link (optional)
You can now:
- Input text to synthesize
- Select from available voices
- Choose output format (WAV/MP3/AAC)
- Monitor generation progress
- Play or download generated audio

Note: If port 7860 is already in use, Gradio will automatically try the next available port (7861, 7862, etc.). Check the terminal output for the correct URL.
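
The port fallback described in the note can be pictured with a short sketch. This is not Gradio's actual implementation; it is an illustrative scan, with a hypothetical function name, that tries to bind each port in turn starting at 7860 and returns the first one that is free.

```python
import socket


def first_free_port(start: int = 7860, attempts: int = 100,
                    host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound.

    Illustrative only; Gradio's own port selection logic may differ.
    """
    stop = min(start + attempts, 65536)  # stay inside the valid port range
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy or not permitted, try the next one
            return port
    raise RuntimeError(f"No free port found in range {start}-{stop - 1}")


if __name__ == "__main__":
    print("First free port from 7860:", first_free_port())
```

This can be handy for guessing which URL to open when several instances are running, but the terminal output from Gradio remains the authoritative source.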

Command Line Interface

python tts_demo.py

The script will:

Download necessary model files from Hugging Face
Set up espeak-ng automatically using espeakng-loader
Import required modules dynamically
Test the phonemizer functionality
Generate speech from your text with phoneme visualization
Save the output as 'output.wav' (22050Hz sample rate)
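
To verify the generated file, you can read its header with the standard library wave module. This is a minimal sketch that assumes the output is a PCM WAV file (the wave module cannot read floating-point WAV data); wav_info is a hypothetical helper, not part of tts_demo.py.

```python
import os
import sys
import wave


def wav_info(path: str):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / float(rate)


if __name__ == "__main__":
    # Default to the file name mentioned above; pass another path as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "output.wav"
    if os.path.exists(path):
        rate, seconds = wav_info(path)
        print(f"{path}: {rate} Hz, {seconds:.2f} s")
    else:
        print(f"{path} not found; run tts_demo.py first.")
```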

Project Structure

.
├── .cache/                 # Cache directory for downloaded models
│   └── huggingface/       # Hugging Face model cache
├── .git/                   # Git repository data
├── .gitignore             # Git ignore rules
├── .gradio/               # Gradio cache and configuration
│   ├── certificate.pem    # SSL certificate for Gradio
│   └── ...               # Other Gradio config files
├── __pycache__/           # Python cache files
├── outputs/               # Generated audio output files
│   ├── output.wav        # Default output file
│   ├── output.mp3        # MP3 converted files
│   └── output.aac        # AAC converted files
├── voices/                # Voice model files (downloaded on demand)
│   └── ...               # Voice files are downloaded when needed
├── venv/                  # Python virtual environment
├── LICENSE                # Apache 2.0 License file
├── README.md             # Project documentation
├── gradio_interface.py    # Web interface implementation
├── models.py             # Core TTS model implementation
├── requirements.txt      # Python dependencies
├── setup.ps1             # Windows setup script
├── setup.sh              # Linux/macOS setup script
└── tts_demo.py          # CLI demo implementation

Model Information

The project uses the Kokoro-82M model from Hugging Face:

Repository: hexgrad/Kokoro-82M
Model file: kokoro-v0_19.pth
Voice files: Located in the voices/ directory (downloaded automatically when needed)
Available voices:
- American Female: af_bella, af_nicole, af_sarah, af_sky
- American Male: am_adam, am_michael
- British Female: bf_emma, bf_isabella
- British Male: bm_george, bm_lewis
Required files are downloaded automatically from Hugging Face
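
The voice names listed above appear to follow a two-letter prefix pattern: the first letter for the accent (a = American, b = British) and the second for the gender (f = Female, m = Male). The sketch below decodes that pattern. The convention is inferred from the list in this README rather than from a documented API, and describe_voice is a hypothetical helper.

```python
# Prefix meanings inferred from the voice list in this README (an assumption).
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> str:
    """Turn an ID such as 'af_bella' into 'American Female: bella'."""
    prefix, _, name = voice_id.partition("_")
    if (
        len(prefix) != 2
        or not name
        or prefix[0] not in ACCENTS
        or prefix[1] not in GENDERS
    ):
        raise ValueError(f"Unrecognized voice ID: {voice_id!r}")
    return f"{ACCENTS[prefix[0]]} {GENDERS[prefix[1]]}: {name}"


if __name__ == "__main__":
    for voice in ["af_bella", "am_adam", "bf_emma", "bm_george"]:
        print(describe_voice(voice))
```

A helper like this could be used to group voices in a dropdown, but any voice that does not match the pattern would need separate handling.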

Technical Details

Sample rate: 22050Hz
Input: Text in any language (English recommended)
Output: WAV/MP3/AAC audio file
Dependencies are automatically managed
Modules are dynamically loaded from Hugging Face
Error handling includes stack traces for debugging
Cross-platform compatibility through setup scripts

Contributing

Feel free to contribute by:

Opening issues for bugs or feature requests
Submitting pull requests with improvements
Helping with documentation
Testing different voices and reporting issues
Suggesting new features or optimizations
Testing on different platforms and reporting results

License

This project is licensed under the Apache 2.0 License.
