An implementation of continuous emotional speech synthesis in TensorFlow, forked from Keithito's Tacotron implementation.
Azam Rabiee, Tae-Ho Kim, Soo-Young Lee, "Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer," accepted at Interspeech 2019 as a Show and Tell demonstration.
In April 2017, Google published a paper, Tacotron: Towards End-to-End Speech Synthesis, where they present a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs.
Here we add emotion to the TTS!
In the first project, categorical emotions were added; for more info, see "Emotional End-to-End Neural Speech Synthesizer". In the second step, we built the continuous emotional TTS!
Emotion is not limited to discrete categories such as happy, sad, angry, fear, disgust, and surprise. Here, each emotion category is projected onto a set of independent dimensions named Pleasure-Arousal-Dominance (PAD). The value of each dimension varies from -1 to 1, such that the neutral emotion lies at the center with all-zero values. You can generate speech with various emotions either by setting an arbitrary PAD vector or by selecting an emotion category.
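As a rough sketch of this idea, each emotion category can be treated as a point in PAD space, while arbitrary PAD vectors are clipped to the valid range. The category-to-PAD values below are illustrative assumptions only, not the values used by the model:

```python
# Hypothetical mapping of emotion categories to Pleasure-Arousal-Dominance
# (PAD) vectors. The exact values depend on the model and training data;
# these numbers are for illustration only.
EMOTION_TO_PAD = {
    "neutral": (0.0, 0.0, 0.0),    # neutral sits at the origin
    "happy":   (0.8, 0.5, 0.4),
    "angry":   (-0.5, 0.8, 0.3),
    "sad":     (-0.6, -0.4, -0.3),
}

def pad_for(emotion=None, pad=None):
    """Return a PAD triple from either a category name or an arbitrary vector."""
    if pad is not None:
        # Any arbitrary PAD, clipped to the valid [-1, 1] range.
        return tuple(max(-1.0, min(1.0, v)) for v in pad)
    return EMOTION_TO_PAD[emotion]
```

For example, `pad_for(pad=(1.5, 0.0, -2.0))` clips the out-of-range values back into [-1, 1].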
For a demo, click here.
You can find waves synthesized with continuous emotions in this video.
- Install Python 3.

- Install the latest version of TensorFlow for your platform. For better performance, install with GPU support if it's available. This code works with TensorFlow 1.12 and later.

- Install the requirements:

  ```
  pip install -r requirements.txt
  ```
- Download the model. A pretrained model for the Korean language is available here.

- Run the demo server:

  ```
  python3 demo_server_gpu.py --checkpoint <path-to-the-pretrained-model>/model.ckpt-385000
  ```

- Point your browser at localhost:9000.
- Select an emotion category or any PAD values.

- Type what you want to synthesize.
- Download an emotional speech dataset. We used an internal Korean emotional speech dataset containing six emotion categories plus neutral speech. You can use other datasets if you convert them to the right format. See TRAINING_DATA.md for more info.

  Note: In our emotional dataset, the wave files are in the `wav_16k` folder, and the filename contains the emotion label. For example, `acriil_hap_m30_1981.wav` means that a 30-year-old man uttered the sentence with ID 1981 with the happy emotion. In addition, all scripts are in `emoTTS_script.txt` as `<sentence-ID> <text>`, one per line.
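A minimal sketch of extracting the metadata fields from such a filename, assuming the field layout shown in the example above (prefix, emotion, gender+age, sentence ID); the actual code in the repository may parse it differently:

```python
# Parse a dataset filename such as "acriil_hap_m30_1981.wav".
# The field layout is assumed from the example in this README.
def parse_wav_name(filename):
    stem = filename.rsplit(".", 1)[0]  # drop the .wav extension
    prefix, emotion, speaker, sent_id = stem.split("_")
    return {
        "emotion": emotion,       # e.g. "hap" for happy
        "gender": speaker[0],     # "m" or "f"
        "age": int(speaker[1:]),  # e.g. 30
        "sentence_id": sent_id,   # e.g. "1981"
    }
```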
- Preprocess the data. Preprocessing prepares triples of the linear spectrogram, the Mel spectrogram, and the text, as `<spec-npy-filename|mel-npy-filename|text>`, for training, then saves them in the folder specified by `--output`. Note that the current implementation (in `datafeeder.py`) assumes that the filename contains the emotion label.

  Note 1: Make sure to trim the silence at the beginning and end of the wave files. You can run

  ```
  python3 ./datasets/trimmer.py
  ```

  with your `in_dir` path pointing to the wave folder. Edit `trimmer.py` for your parameters if needed. `trimmer.py` uses voice activity detection (VAD) to trim the silence.

  Note 2: For the Korean language, make sure to run

  ```
  python3 preprocess_kor_text.py
  ```

  to separate the characters of each syllable, for example from 안녕 to ㅇㅏㄴㄴㅕᆼ. Modify `preprocess_kor_text.py` as needed.

  Run the following command to generate the `mel-*.npy` and `spec-*.npy` files:

  ```
  python3 preprocess.py --dataset emotionalDS
  ```

  Check `preprocess.py` and `emotionalDS.py` for your dataset, and replace the dataset name with yours.
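Hangul syllables can be decomposed into jamo with plain Unicode arithmetic. Below is a minimal sketch of what `preprocess_kor_text.py` is assumed to do; the actual script may differ (for instance, this sketch emits compatibility jamo rather than conjoining jamo):

```python
# Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

def to_jamo(text):
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:            # precomposed Hangul syllable
            idx = code - 0xAC00
            out.append(CHO[idx // 588])         # 588 = 21 vowels * 28 finals
            out.append(JUNG[(idx % 588) // 28])
            out.append(JONG[idx % 28])          # "" when there is no final
        else:
            out.append(ch)                      # pass other characters through
    return "".join(out)

# to_jamo("안녕") -> "ㅇㅏㄴㄴㅕㅇ"
```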
  After this step, the training folder contains the `mel-*.npy` and `spec-*.npy` files, as well as a metadata text file named `train.txt` with the format `<spec-filename|mel-filename|number-of-frames|text>` on each line. Here is an example:

  ```
  spec-neu_f30_0001.npy|mel-neu_f30_0001.npy|495|ㅈㅔㅇㅣㅍㅣㄴㅡㄴ ㅅㅏㅁㄱㅗㅇㄸㅐ ㅈㅔㅊㅓㄹㅅㅗㄹㅡㄹ ㅅㅣㅈㅏㄱㅎㅏㄹ ㄷㅏㅇㅅㅣ ㅊㅗㅇㄹㅣㄷㅗ ㅎㅏㄱㅗㅎㅐ ㅈㅏㅇㅕㄴㅅㅡㄹㅓㅂㄱㅔ ㅈㅏㅈㅜ ㅁㅏㄴㄴㅏㅆㅇㅡㄹ ㅃㅜㄴㅇㅣㅂㄴㅣㄷㅏ.
  ```

  Note 3: Don't forget to shuffle `train.txt` with

  ```
  python3 shuffle_train.txt.py
  ```

  on your desired path. Also point to the shuffled text file in `hparams.py` via the `base_dir` and `input` parameters.
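As a sketch of the metadata handling, assuming the `<spec-filename|mel-filename|number-of-frames|text>` format described above (the helpers below are illustrative, not the repository's actual code):

```python
import random

def parse_meta_line(line):
    # Split one metadata line on the "|" delimiter; frame count is an int.
    spec, mel, n_frames, text = line.strip().split("|")
    return spec, mel, int(n_frames), text

def shuffle_metadata(lines, seed=1234):
    # A deterministic shuffle, similar in spirit to shuffle_train.txt.py.
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled
```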
- Train a model:

  ```
  python3 train.py --name <your-desired-run-name>
  ```

  Tunable hyperparameters are found in `hparams.py`. You can adjust them on the command line using the `--hparams` flag, for example `--hparams="batch_size=16,outputs_per_step=2"`. Hyperparameters should generally be set to the same values at both training and eval time. The default hyperparameters are recommended. See TRAINING_DATA.md for other languages.
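The training code relies on TensorFlow's hyperparameter handling for this; as a standalone illustration of how such an override string can be turned into typed values, here is a minimal sketch (not the repository's actual parser):

```python
# Parse a string like "batch_size=16,outputs_per_step=2" into a dict of
# overrides, coercing values to int or float where possible.
def parse_hparams(spec):
    overrides = {}
    if not spec:
        return overrides
    for pair in spec.split(","):
        key, value = pair.split("=", 1)
        try:
            overrides[key] = int(value)
        except ValueError:
            try:
                overrides[key] = float(value)
            except ValueError:
                overrides[key] = value  # fall back to a string value
    return overrides
```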
- Monitor with TensorBoard (optional):

  ```
  tensorboard --logdir ./logs-<your-desired-run-name>
  ```

  The trainer dumps audio and alignments every 1000 steps. You can find them in `./logs-<your-desired-run-name>`.
- Synthesize from a checkpoint:

  ```
  python3 demo_server_gpu.py --checkpoint ./logs-<your-desired-run-name>/model.ckpt-185000
  ```

  Replace "185000" with the number of the checkpoint you want to use, then open a browser at localhost:9000 and type what you want to speak. Alternatively, you can run `eval.py` from the command line:

  ```
  python3 eval.py --checkpoint ./logs-<your-desired-run-name>/model.ckpt-185000
  ```

  If you set the `--hparams` flag when training, set the same value here.