- 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
- 2020/07/17 Support MultiGPU for all Trainers.
- 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
- 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with TensorFlow is supported.
- 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with TensorFlow is supported.
## Features
- High performance on Speech Synthesis.
- Ability to fine-tune on other languages.
- Fast, Scalable, and Reliable.
- Suitable for deployment.
- Easy to implement a new model based on the abstract class.
- Mixed precision to speed up training when possible.
- Support both single/multi-GPU in the base trainer class.
- TFlite conversion for all supported models.
- Android example.
- Support many languages (currently Chinese, Korean, and English).
- Support C++ inference.
- Support converting weights for some models from PyTorch to TensorFlow.
## Requirements
This repository is tested on Ubuntu 18.04 with:
Different TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use MultiGPU.**
## Installation
### With pip
```bash
$ pip install TensorFlowTTS
```
### From source
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source, following the instructions below.
If you want to upgrade the repository and its dependencies:
```bash
$ git pull
$ pip install --upgrade .
```
# Supported Model architectures
TensorFlowTTS currently provides the following architectures:
1. **MelGAN** released with the paper [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711) by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.
2. **Tacotron-2** released with the paper [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884) by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.
3. **FastSpeech** released with the paper [FastSpeech: Fast, Robust, and Controllable Text to Speech](https://arxiv.org/abs/1905.09263) by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
4. **Multi-band MelGAN** released with the paper [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.
5. **FastSpeech2** released with the paper [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558) by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
6. **Parallel WaveGAN** released with the paper [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480) by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.
We also implement some techniques to improve quality and convergence speed from the following papers:
2. **Guided Attention Loss** released with the paper [Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention](https://arxiv.org/abs/1710.08969) by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.
| |- ...
```
Here, `metadata.csv` has the following format: `id|transcription`. This is an ljspeech-like format; you can ignore the preprocessing steps if your dataset is in another format.
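A minimal sketch of parsing this format (the helper name is illustrative; it is not part of the framework):

```python
# Parse an ljspeech-like metadata.csv with one "id|transcription" entry per line.
# Illustrative helper only; TensorFlowTTS ships its own dataset processors.
def read_metadata(path):
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the FIRST "|" so transcriptions may contain pipes.
            utt_id, transcription = line.split("|", 1)
            entries[utt_id] = transcription
    return entries
```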
Note that `NAME_DATASET` should be one of `ljspeech`, `kss`, `baker`, or `libritts`.
- Convert characters to IDs
- Compute mel spectrograms
- Normalize mel spectrograms to [-1, 1] range
- Split the dataset into train and validation
- Compute the mean and standard deviation of multiple features from the **training** split
2. Standardize mel spectrograms based on the computed statistics

Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar), and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.
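The normalization and statistics steps above can be sketched as follows (a simplified illustration with random data, not the repository's actual preprocessing code):

```python
import numpy as np

def minmax_scale(mel, eps=1e-8):
    """Normalize a mel spectrogram to the [-1, 1] range."""
    lo, hi = mel.min(), mel.max()
    return 2.0 * (mel - lo) / (hi - lo + eps) - 1.0

# Hypothetical training split: mel spectrograms of shape (frames, mel_bins).
train_mels = [np.random.rand(100, 80), np.random.rand(120, 80)]

# Mean and standard deviation are computed over the *training* split only,
# then reused to standardize both training and validation features.
stacked = np.concatenate(train_mels, axis=0)
mean, std = stacked.mean(axis=0), stacked.std(axis=0)
standardized = [(m - mean) / std for m in train_mels]
```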
**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset needs to be reformatted before running preprocessing.
After preprocessing, the structure of the project folder should be:
```
- `stats_f0.npy` contains the mean and std of F0 values in the training split
- `train_utt_ids.npy` / `valid_utt_ids.npy` contain the training and validation utterance IDs, respectively
We use a suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, or `wave`) for each input type.
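As an illustration, a loader keyed on these suffixes might look like the sketch below. The `<split>/<suffix>/<utt_id>-<suffix>.npy` layout is an assumption made for this example, not the repository's guaranteed layout:

```python
import numpy as np
from pathlib import Path

# Load one dumped feature by utterance ID and suffix.
# ASSUMED layout for illustration: <dump_dir>/<split>/<suffix>/<utt_id>-<suffix>.npy
def load_feature(dump_dir, split, utt_id, suffix):
    path = Path(dump_dir) / split / suffix / f"{utt_id}-{suffix}.npy"
    return np.load(path)
```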
**IMPORTANT NOTES**:
- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the above structure to be able to use the training script, or you can modify it by yourself 😄.
## Training models
To learn how to train a model from scratch or fine-tune with other datasets/languages, please see the details in the example directory.
- For the Tacotron-2 tutorial, please see [examples/tacotron2](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2)
- For the FastSpeech tutorial, please see [examples/fastspeech](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech)
A detailed implementation of the abstract dataset class is in [tensorflow_tts/dataset/abstract_dataset](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/datasets/abstract_dataset.py). There are some functions you need to override and understand:
1. **get_args**: This function returns the arguments for the **generator** function; normally this is utt_ids.
2. **generator**: This function takes the inputs from the **get_args** function and returns the inputs for the model. **Note that every generator function returns a dictionary whose keys exactly match the model's parameters, because base_trainer uses model(\*\*batch) to do the forward step.**
3. **get_output_dtypes**: This function must return the dtypes for each element from the **generator** function.
4. **get_len_dataset**: Returns the length of the dataset, normally len(utt_ids).
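The four functions above can be sketched framework-free as follows (class and field names are illustrative, not the repository's exact API):

```python
# A toy dataset following the four-function contract described above.
class ToyCharDataset:
    def __init__(self, utt_ids, char_ids):
        self.utt_ids = utt_ids
        self.char_ids = char_ids  # utt_id -> list of character IDs

    def get_args(self):
        # Arguments handed to generator(); normally the utterance IDs.
        return [self.utt_ids]

    def generator(self, utt_ids):
        # Yield dicts whose keys match the model's parameter names,
        # so the trainer can call model(**batch).
        for utt_id in utt_ids:
            yield {"utt_ids": utt_id, "input_ids": self.char_ids[utt_id]}

    def get_output_dtypes(self):
        # One dtype per element yielded by generator().
        return {"utt_ids": "string", "input_ids": "int32"}

    def get_len_dataset(self):
        # Normally len(utt_ids).
        return len(self.utt_ids)


dataset = ToyCharDataset(["LJ001-0001"], {"LJ001-0001": [7, 3, 9]})
batch = next(dataset.generator(*dataset.get_args()))
```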
**IMPORTANT NOTES**:
- The pipeline for creating a dataset should be: cache -> shuffle -> map_fn -> get_batch -> prefetch.
- If you shuffle before cache, the dataset won't re-shuffle when you re-iterate over it.
- You should apply map_fn so that each element returned from the **generator** function has the same length before batching and feeding it into a model.
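The recommended ordering can be sketched with a toy `tf.data` pipeline (toy generator and shapes; assumes TensorFlow ≥ 2.4 for `tf.data.AUTOTUNE`):

```python
import tensorflow as tf

# Toy generator yielding variable-length ID sequences.
def gen():
    for ids in ([1, 2, 3], [4, 5], [6]):
        yield {"input_ids": tf.constant(ids, dtype=tf.int32)}

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature={"input_ids": tf.TensorSpec([None], tf.int32)},
)

# cache -> shuffle -> map_fn -> get_batch -> prefetch, as recommended above.
dataset = (
    dataset.cache()              # cache BEFORE shuffle so re-iteration re-shuffles
    .shuffle(buffer_size=3)
    .map(lambda ex: ex)          # map_fn slot: per-element trimming/augmentation
    .padded_batch(2, padded_shapes={"input_ids": [None]})  # pad to equal length
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset)
```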
Some examples that use this **abstract_dataset** are [tacotron_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/tacotron2/tacotron_dataset.py), [fastspeech_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/fastspeech/fastspeech_dataset.py), [melgan_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan/audio_mel_dataset.py), and [fastspeech2_dataset.py](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py).
Overall, almost all models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorFlowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
# Acknowledgement
We would like to thank [Tomoki Hayashi](https://github.com/kan-bayashi), who discussed MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron with us at length. This framework is based on his great open-source [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN) project.