This repository contains a model for generating audio from text. The model was trained on the LJSpeech dataset.
Follow these steps to install the project:
- (Optional) Create and activate a new environment using conda.
    # create env
    conda create -n project_env python=PYTHON_VERSION
    # activate env
    conda activate project_env
- Install all required packages.
    pip install -r requirements.txt
- Install speechbrain for model inference.
    git clone https://github.com/speechbrain/speechbrain.git
    cd speechbrain
    pip install --editable .
    cd ..
- Download the model checkpoint.
    mkdir pretrained
    gdown "https://drive.google.com/uc?id=16ci_beU3km20_ZFZ4VOFpr4KbIUbKCMi" -O pretrained/model_best.pth
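Downloads from Google Drive can occasionally fail or leave an empty file, so it may be worth confirming that the checkpoint landed where the later commands expect it. The snippet below is a minimal sketch, not part of this repository: `checkpoint_ready` is a hypothetical helper name, and it only checks that the file exists and is non-empty, not that it is a valid model checkpoint.

```python
from pathlib import Path


def checkpoint_ready(path="pretrained/model_best.pth"):
    """Return True if the checkpoint file exists and is not empty.

    Hypothetical helper for illustration; the default path matches the
    gdown output path used in the installation steps.
    """
    checkpoint = Path(path)
    return checkpoint.is_file() and checkpoint.stat().st_size > 0


if __name__ == "__main__":
    print("checkpoint ready:", checkpoint_ready())
```

If this prints `checkpoint ready: False`, re-running the gdown command is a reasonable first step.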
To train a model, run the following command:
python3 train.py -cn=train HYDRA_CONFIG_ARGUMENTS
Here train is the config name passed via -cn (replace it with any other CONFIG_NAME from src/configs), and HYDRA_CONFIG_ARGUMENTS are optional arguments.
To run inference (synthesize audio from transcriptions or re-synthesize audio):
# resynthesize - if True, inputs_dir should contain audio samples; otherwise, text transcriptions
# save_mel - whether to save mel spectrograms to the outputs_dir/spectrograms folder
# input_text - if passed, the dataset from inputs_dir is ignored
# save_name - the file name for audio generated from input_text
python synthesize.py \
inferencer.inputs_dir="path/to/transcriptions/or/audio/dir" \
inferencer.outputs_dir="path/to/output/dir" \
inferencer.from_pretrained="path/to/model/checkpoint/file" \
inferencer.resynthesize=False \
inferencer.input_text=null \
inferencer.save_name="audio_name" \
inferencer.save_mel=False
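When inference is launched from another Python script, the same overrides can be assembled programmatically. The following is a minimal sketch under stated assumptions: `format_override` and `build_synthesize_command` are hypothetical helper names, not part of this repository, and the option keys are simply the ones listed in the command above. String values containing special characters (commas, brackets, and similar) may need additional quoting for Hydra, which this sketch does not handle.

```python
import sys


def format_override(key, value):
    """Render one Hydra-style `key=value` override (hypothetical helper)."""
    if value is None:
        # Hydra spells a missing value as `null`, as in inferencer.input_text=null
        return f"{key}=null"
    return f"{key}={value}"


def build_synthesize_command(options, script="synthesize.py"):
    """Build an argument list for synthesize.py from a dict of inferencer options."""
    overrides = [format_override(f"inferencer.{name}", value)
                 for name, value in options.items()]
    return [sys.executable, script] + overrides


# Same settings as the shell command above.
command = build_synthesize_command({
    "inputs_dir": "path/to/transcriptions/or/audio/dir",
    "outputs_dir": "path/to/output/dir",
    "from_pretrained": "path/to/model/checkpoint/file",
    "resynthesize": False,
    "input_text": None,
    "save_name": "audio_name",
    "save_mel": False,
})
print(" ".join(command[1:]))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root. Because it is an argument list rather than a shell string, no shell quoting is needed.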
This repository is based on a PyTorch Project Template.