About • Installation • How To Use • Useful Links
This repository contains an implementation of the HiFiGAN model (a vocoder for the TTS task). To test an end-to-end TTS pipeline, we use the pretrained NeMo FastPitchModel for English to generate mel spectrograms from text. We train on the LJ Speech dataset and use WV-MOS for automatic MOS evaluation.
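The two-stage pipeline described above (FastPitch produces a mel spectrogram, HiFiGAN turns it into audio) can be sketched as a small function. This is a minimal sketch, not the repository's inference code: `parse` and `generate_spectrogram` are the FastPitchModel methods documented by NeMo, while `synthesize`, `SpectrogramGenerator` and the `vocoder` callable are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Protocol


class SpectrogramGenerator(Protocol):
    """Structural type for the two FastPitchModel methods used below."""

    def parse(self, text: str) -> Any: ...

    def generate_spectrogram(self, tokens: Any) -> Any: ...


def synthesize(
    text: str,
    spec_generator: SpectrogramGenerator,
    vocoder: Callable[[Any], Any],  # hypothetical: HiFiGAN wrapped as mel -> audio
) -> Any:
    """Two-stage TTS: text -> mel spectrogram -> waveform."""
    # Stage 1: the acoustic model tokenizes the text and predicts a mel spectrogram
    tokens = spec_generator.parse(text)
    mel = spec_generator.generate_spectrogram(tokens=tokens)
    # Stage 2: the vocoder converts the mel spectrogram into audio samples
    return vocoder(mel)
```

With NeMo installed, `spec_generator` could be `FastPitchModel.from_pretrained("tts_en_fastpitch")` from `nemo.collections.tts.models`; how this repository exposes its HiFiGAN generator as a callable is an assumption here. Keeping both stages behind small interfaces means the vocoder can be swapped or tested in isolation.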
The general steps are the following:
- (Optional) Create and activate a new environment using conda or venv (+pyenv).
  a. conda version:
    # create env
    conda create -n project_env python=PYTHON_VERSION
    # activate env
    conda activate project_env
  b. venv (+pyenv) version:
    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    # alternatively, using default python version
    python3 -m venv project_env
    # activate env
    source project_env/bin/activate
- Install all required packages:
    pip install -r requirements.txt
- Download the model checkpoint using the Python script:
    python3 src/utils/download_best_model.py
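Before installing the requirements, it can help to confirm that the environment from the optional first step is actually active. A minimal sketch, assuming only standard behavior: inside a venv `sys.prefix` differs from `sys.base_prefix`, and `conda activate` exports `CONDA_DEFAULT_ENV`. The function name `active_environment` is hypothetical and not part of this repository.

```python
import os
import sys


def active_environment() -> str:
    """Best-effort description of the Python environment currently in use."""
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from
    if sys.prefix != sys.base_prefix:
        return f"venv: {sys.prefix}"
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment name
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda: {conda_env}"
    return "system interpreter (no venv/conda env detected)"


if __name__ == "__main__":
    print(sys.version.split()[0], "-", active_environment())
```

Run it with the same `python3` you will use for `pip install`; a "system interpreter" result usually means the activation step was skipped.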
You also need to download the LJ Speech dataset to data/datasets/lj_dataset, either with wget or manually.
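A minimal download-and-parse sketch, assuming the public LJ Speech 1.1 archive URL and its standard layout (`metadata.csv` with `ID|transcription|normalized transcription` lines, audio in `wavs/<ID>.wav`). Whether this repository expects the extracted `LJSpeech-1.1` folder inside data/datasets/lj_dataset or its contents directly is an assumption, so the root is a parameter; the helper names are hypothetical.

```python
from __future__ import annotations

import tarfile
import urllib.request
from pathlib import Path
from typing import Iterable

LJ_URL = "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2"


def download_lj_speech(dest: Path) -> Path:
    """Download and unpack LJ Speech 1.1 into dest; returns the extracted root."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "LJSpeech-1.1.tar.bz2"
    if not archive.exists():  # skip the ~2.6 GB download if already present
        urllib.request.urlretrieve(LJ_URL, archive)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest)
    return dest / "LJSpeech-1.1"


def parse_metadata(lines: Iterable[str], root: Path) -> list[tuple[str, str, Path]]:
    """Turn metadata.csv lines into (utterance id, normalized text, wav path)."""
    items = []
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        utt_id, normalized = parts[0], parts[2]
        items.append((utt_id, normalized, root / "wavs" / f"{utt_id}.wav"))
    return items


def read_metadata(root: Path) -> list[tuple[str, str, Path]]:
    with open(root / "metadata.csv", encoding="utf-8") as f:
        return parse_metadata(f, root)
```

For example, `read_metadata(download_lj_speech(Path("data/datasets/lj_dataset")))` should yield 13,100 entries for the full corpus; checking that every listed wav path exists is a cheap sanity test after a manual download.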
Multi-GPU training is also supported via PyTorch Distributed Data Parallel (DDP).
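Under DDP each GPU runs its own process on a disjoint shard of the data, and gradients are averaged across processes with an all-reduce before each optimizer step. The sketch below reproduces, in plain Python, how `torch.utils.data.DistributedSampler` assigns indices when `shuffle=False` and `drop_last=False`; it illustrates the sharding scheme and is not this repository's training code (`shard_indices` is a hypothetical name).

```python
import math


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> list:
    """Indices seen by one rank, mirroring DistributedSampler(shuffle=False)."""
    if dataset_len <= 0:
        raise ValueError("dataset must be non-empty")
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around so every rank gets exactly num_samples items
    indices = (indices * math.ceil(total_size / dataset_len))[:total_size]
    # Rank r takes every num_replicas-th index, starting at r
    return indices[rank:total_size:num_replicas]
```

With 10 samples on 4 GPUs, each rank gets 3 indices and rank 2 receives `[2, 6, 0]`: the first two samples are repeated so that all processes run the same number of steps. With shuffling enabled, the real sampler also permutes indices using a seed plus the epoch, which is why `set_epoch` should be called every epoch.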
To train a model, run the following command:
python3 train.py -cn=hifigan
To fine-tune a model, run:
python3 train.py -cn=hifigan_finetune writer.run_name="hifigan_v1_finetune"
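The `-cn` flag and the dotted `key=value` arguments in these commands follow Hydra's command-line syntax (`-cn` is Hydra's short form of `--config-name`). The toy function below only illustrates what an override such as `writer.run_name=...` does to a nested config; it is a simplified stand-in, not Hydra's implementation, and the config contents are hypothetical.

```python
from __future__ import annotations

import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of config with dotted 'a.b.c=value' overrides applied."""
    result = copy.deepcopy(config)  # never mutate the base config
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node[name]  # unknown groups raise KeyError
        if leaf not in node:
            # Hydra's struct mode likewise rejects new keys unless prefixed with '+'
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = value  # Hydra would also parse ints, floats, lists; kept as str here
    return result
```

For example, applying `["writer.run_name=hifigan_v1_finetune"]` to `{"writer": {"run_name": "hifigan_v1"}}` changes only that one field. Note that the shell strips the double quotes in `writer.run_name="hifigan_v1_finetune"` before the program ever sees them.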
To run inference (synthesize speech from the provided text):
python3 synthesize.py -cn=synthesize_text_based inferencer.text="Test phrase for the demo" # here you can provide any other text
To resynthesize speech from ground-truth audio instead, use another config:
python3 synthesize.py -cn=synthesize_mel_based datasets.inference.data_path="YOUR_PATH"
For more examples, see demo.ipynb.
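In mel-based resynthesis the audio is first converted to mel frames and the vocoder upsamples each frame back to `hop_size` samples. The arithmetic below assumes the original HiFi-GAN V1 settings (22,050 Hz, hop size 256, upsample rates 8, 8, 2, 2) and a centered STFT as in `torch.stft(center=True)`; this repository's configs may use other values or padding, which can shift the count by a frame. The helper names are hypothetical.

```python
import math

SAMPLE_RATE = 22050  # assumed: HiFi-GAN V1 / LJ Speech sampling rate
HOP_SIZE = 256  # assumed: samples between consecutive mel frames
UPSAMPLE_RATES = (8, 8, 2, 2)  # assumed: HiFi-GAN V1 transposed-conv strides


def num_mel_frames(num_samples: int, hop_size: int = HOP_SIZE) -> int:
    """Frames from a centered STFT: n_fft // 2 padding on both sides gives 1 + n // hop."""
    return 1 + num_samples // hop_size


def vocoder_output_samples(num_frames: int, rates=UPSAMPLE_RATES) -> int:
    """Each upsampling stage multiplies the time axis by its rate."""
    return num_frames * math.prod(rates)


if __name__ == "__main__":
    n = 2 * SAMPLE_RATE  # two seconds of ground-truth audio
    frames = num_mel_frames(n)
    print(n, frames, vocoder_output_samples(frames))
```

The product of the upsample rates must equal the hop size (8 · 8 · 2 · 2 = 256) so that one mel frame maps to exactly one hop of audio. For two seconds of audio (44,100 samples) this gives 173 frames and 44,288 output samples, so the resynthesized waveform is slightly longer than the input and is usually trimmed before comparison.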