Production First and Production Ready End-to-End Text-to-Speech Toolkit
We suggest installing WeTTS with Anaconda or Miniconda.

Clone this repo:

``` sh
git clone https://github.com/wenet-e2e/wetts.git
```

Create the environment:

``` sh
conda create -n wetts python=3.8 -y
```

Install MFA:

``` sh
conda install -n wetts montreal-forced-aligner=2.0.1 -c conda-forge -y
```

For CUDA 10.2, run:

``` sh
conda install -n wetts pytorch=1.11 torchaudio cudatoolkit=10.2 -c pytorch -y
```

For CUDA 11.3, run:

``` sh
conda install -n wetts pytorch=1.11 torchaudio cudatoolkit=11.3 -c pytorch -y
```

Install the remaining dependencies:

``` sh
conda activate wetts
python -m pip install -r requirements.txt
```
We mainly focus on production and on-device TTS, and we plan to use:
- AM: FastSpeech2
- Vocoder: HiFi-GAN / MelGAN
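At inference time these two components form a pipeline: the acoustic model (AM) maps phoneme IDs to a mel spectrogram, and the vocoder maps the mel spectrogram to a waveform. A minimal NumPy sketch of this two-stage flow, with hypothetical stand-in models (`dummy_acoustic_model`, `dummy_vocoder`, and the constants are illustrative, not WeTTS APIs):

``` python
import numpy as np

N_MELS = 80       # mel channels, a common choice
HOP_LENGTH = 256  # audio samples per mel frame, a common choice

def dummy_acoustic_model(phoneme_ids):
    """Hypothetical stand-in for FastSpeech2: phoneme IDs -> mel frames.
    Here we simply pretend every phoneme spans 5 frames."""
    frames_per_phoneme = 5
    n_frames = len(phoneme_ids) * frames_per_phoneme
    return np.zeros((n_frames, N_MELS), dtype=np.float32)

def dummy_vocoder(mel):
    """Hypothetical stand-in for HiFi-GAN/MelGAN: mel frames -> waveform.
    A neural vocoder upsamples each frame by the hop length."""
    return np.zeros(mel.shape[0] * HOP_LENGTH, dtype=np.float32)

def synthesize(phoneme_ids):
    mel = dummy_acoustic_model(phoneme_ids)  # stage 1: text -> mel
    wav = dummy_vocoder(mel)                 # stage 2: mel -> audio
    return mel, wav

mel, wav = synthesize([12, 7, 33, 5])
print(mel.shape, wav.shape)  # (20, 80) (5120,)
```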
We will also provide reference solutions for:
- Prosody
- Polyphones
- Text normalization
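To illustrate why the last two frontend modules matter for Mandarin TTS: text normalization expands non-standard tokens (digits, dates, money) into spoken words, and polyphone disambiguation picks the correct reading of a character from context. A toy sketch, assuming a hypothetical digit rule and a tiny hand-made lexicon (a real system uses far richer rules and a model):

``` python
import re

# Mandarin readings for digits 0-9 (toy rule; a real text-normalization
# module also handles dates, money, measures, phone numbers, etc.).
DIGITS = "零一二三四五六七八九"

def normalize_number(text):
    """Replace each digit with its Mandarin reading."""
    return re.sub(r"\d", lambda m: DIGITS[int(m.group())], text)

# Hypothetical polyphone lexicon: the character 行 reads differently
# depending on the word it appears in.
POLYPHONE_LEXICON = {
    "银行": ["yin2", "hang2"],  # "bank": 行 is read hang2
    "行走": ["xing2", "zou3"],  # "walk": 行 is read xing2
}

def g2p_word(word):
    """Look up a word's pinyin, illustrating polyphone disambiguation."""
    return POLYPHONE_LEXICON.get(word)

print(normalize_number("我有3个苹果"))  # 我有三个苹果
print(g2p_word("银行"))                 # ['yin2', 'hang2']
```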
We plan to support a variety of open-source TTS datasets, including but not limited to:
- BZNSYP, a Chinese Standard Mandarin speech corpus open-sourced by Data Baker.
- AISHELL-3, a large-scale, high-fidelity multi-speaker Mandarin speech corpus.
- Opencpop, a Mandarin singing voice synthesis (SVS) corpus open-sourced by Netease Fuxi.
We plan to support a variety of hardware and platforms, including:
- x86
- Android
- Raspberry Pi
- Other on-device platforms
- We borrow some code from FastSpeech2 for the FastSpeech2 implementation.
- We refer to PaddleSpeech for feature extraction, pinyin lexicon preparation for alignment, and the length regulator in FastSpeech2.
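The length regulator mentioned above is the key non-autoregressive mechanism in FastSpeech2: it repeats each phoneme's encoder hidden state according to its predicted duration, so the sequence length matches the number of mel frames. A minimal NumPy sketch (shapes and names are illustrative; the actual implementation is a PyTorch module):

``` python
import numpy as np

def length_regulate(hidden, durations):
    """Expand phoneme-level hidden states to frame level by repeating
    the i-th state durations[i] times (FastSpeech2's length regulator)."""
    return np.repeat(hidden, durations, axis=0)

# 3 phonemes, each with a 4-dim hidden state
hidden = np.arange(12, dtype=np.float32).reshape(3, 4)
durations = np.array([2, 1, 3])  # predicted mel frames per phoneme

frames = length_regulate(hidden, durations)
print(frames.shape)  # (6, 4): total frames == sum(durations)
```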