Commit 4f250c2

committed · 2 parents 303e5de + 5f0b3a7

File tree

1 file changed: +23 -23 lines

README.md

Lines changed: 23 additions & 23 deletions
@@ -26,24 +26,24 @@
 - 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
 - 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
 - 2020/07/17 Support MultiGPU for all Trainers.
- - 2020/07/05 Support Convert Tacotron-2, FastSpeech to Tflite. Pls see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thank @jaeyoo from TFlite team for his support.
+ - 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
 - 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with Tensorflow is supported.
 - 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with Tensorflow is supported.


 ## Features
 - High performance on Speech Synthesis.
 - Be able to fine-tune on other languages.
- - Fast, Scalable and Reliable.
+ - Fast, Scalable, and Reliable.
 - Suitable for deployment.
- - Easy to implement new model based-on abtract class.
- - Mixed precision to speed-up training if posible.
+ - Easy to implement a new model based on the abstract class.
+ - Mixed precision to speed up training if possible.
 - Support both Single/Multi GPU in base trainer class.
- - TFlite conversion for all supported model.
+ - TFLite conversion for all supported models.
 - Android example.
 - Support many languages (currently, we support Chinese, Korean, and English).
 - Support C++ inference.
- - Support Convert weight for some models from pytorch to tensorflow to accelerate speed.
+ - Support converting weights for some models from PyTorch to TensorFlow to accelerate speed.

 ## Requirements
 This repository is tested on Ubuntu 18.04 with:
@@ -54,37 +54,37 @@ This repository is tested on Ubuntu 18.04 with:
 - Tensorflow 2.2/2.3
 - [Tensorflow Addons](https://github.com/tensorflow/addons) >= 0.10.0

- Different Tensorflow version should be working but not tested yet. This repo will try to work with latest stable tensorflow version. **We recommend you install tensorflow 2.3.0 to training in case you want to use MultiGPU.**
+ Different TensorFlow versions should work but have not been tested yet. This repo aims to track the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training in case you want to use MultiGPU.**

 ## Installation
 ### With pip
 ```bash
 $ pip install TensorFlowTTS
 ```
 ### From source
- Examples are included in the repository but are not shipped with the framework. Therefore, in order to run the latest verion of examples, you need install from source following bellow.
+ Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as follows.
 ```bash
 $ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
 $ cd TensorFlowTTS
 $ pip install .
 ```
- If you want upgrade the repository and its dependencies:
+ If you want to upgrade the repository and its dependencies:
 ```bash
 $ git pull
 $ pip install --upgrade .
 ```

- # Supported Model achitectures
+ # Supported Model architectures
 TensorFlowTTS currently provides the following architectures:

 1. **MelGAN** released with the paper [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711) by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.
 2. **Tacotron-2** released with the paper [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884) by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.
- 3. **FastSpeech** released with the paper [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263) by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
+ 3. **FastSpeech** released with the paper [FastSpeech: Fast, Robust, and Controllable Text to Speech](https://arxiv.org/abs/1905.09263) by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
 4. **Multi-band MelGAN** released with the paper [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.
 5. **FastSpeech2** released with the paper [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558) by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
 6. **Parallel WaveGAN** released with the paper [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480) by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.

- We are also implement some techniques to improve quality and convergence speed from following papers:
+ We are also implementing some techniques to improve quality and convergence speed from the following papers:

 2. **Guided Attention Loss** released with the paper [Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
 ](https://arxiv.org/abs/1710.08969) by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.
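As background on that last item, guided attention loss penalizes attention weights that stray from the near-diagonal text-to-frame alignment expected in TTS, using the penalty `W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2g^2))`. Below is a minimal sketch following the paper's formulation; the function names are ours, and this is not necessarily this repository's exact implementation:

```python
# Minimal sketch of guided attention loss (Tachibana et al., 2017).
# Not necessarily identical to this repository's implementation.
import numpy as np

def guided_attention_weights(text_len: int, mel_len: int, g: float = 0.2) -> np.ndarray:
    """Penalty matrix W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 * g^2))."""
    n = np.arange(text_len) / text_len  # normalized character positions
    t = np.arange(mel_len) / mel_len    # normalized mel-frame positions
    return 1.0 - np.exp(-((n[:, None] - t[None, :]) ** 2) / (2.0 * g * g))

def guided_attention_loss(attention: np.ndarray, g: float = 0.2) -> float:
    """Mean attention mass falling far from the diagonal.

    attention: [text_len, mel_len] alignment matrix from the decoder.
    """
    text_len, mel_len = attention.shape
    return float(np.mean(attention * guided_attention_weights(text_len, mel_len, g)))
```

Minimizing this term alongside the reconstruction loss pushes the attention matrix toward a roughly monotonic diagonal, which is what speeds up alignment learning.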
@@ -106,7 +106,7 @@ Prepare a dataset in the following format:
 | |- ...
 ```

- where `metadata.csv` has the following format: `id|transcription`. This is a ljspeech-like format, you can ignore preprocessing steps if you have other format dataset.
+ where `metadata.csv` has the following format: `id|transcription`. This is an ljspeech-like format; you can skip the preprocessing steps if your dataset is in a different format.

 Note that `NAME_DATASET` should be `[ljspeech/kss/baker/libritts]`, for example.
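To make the `id|transcription` convention concrete, here is a minimal parsing sketch; the file path and the sample row are hypothetical, not taken from a real dataset:

```python
# Minimal sketch: parse an ljspeech-like metadata.csv with id|transcription rows.
# The path and the example row are hypothetical.
from pathlib import Path

def load_metadata(path: str) -> dict:
    """Return a mapping of utterance ID -> transcription."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        utt_id, transcription = line.split("|", maxsplit=1)
        entries[utt_id] = transcription
    return entries

# A file containing the row "utt_0001|Hello world." would yield
# {"utt_0001": "Hello world."}.
```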
@@ -118,8 +118,8 @@ The preprocessing has two steps:
 - Convert characters to IDs
 - Compute mel spectrograms
 - Normalize mel spectrograms to the [-1, 1] range
- - Split dataset into train and validation
- - Compute mean and standard deviation of multiple features from the **training** split
+ - Split the dataset into train and validation
+ - Compute the mean and standard deviation of multiple features from the **training** split
 2. Standardize mel spectrograms based on the computed statistics

 To reproduce the steps above:
@@ -130,7 +130,7 @@ tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts] --outdir

 Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.

- **Note**: To runing `libritts` preprocessing, please first read the instruction in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). We need reformat it first before run preprocessing.
+ **Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset needs to be reformatted before running preprocessing.

 After preprocessing, the structure of the project folder should be:
 ```
@@ -195,16 +195,16 @@ After preprocessing, the structure of the project folder should be:
 - `stats_f0.npy` contains the mean and std of F0 values in the training split
 - `train_utt_ids.npy` / `valid_utt_ids.npy` contain the training and validation utterance IDs, respectively

- We use suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats` and `wave`) for each type of input.
+ We use a suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, or `wave`) for each input type.


 **IMPORTANT NOTES**:
 - This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
- - Regardless how your dataset is formatted, the final structure of `dump` folder **SHOULD** follow above structure to be able use the training script or you can modify by yourself 😄.
+ - Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the above structure to be able to use the training script, or you can modify it yourself 😄.

 ## Training models

- To know how to training model from scratch or fine-tune with other datasets/languages, pls see detail at example directory.
+ To learn how to train a model from scratch or fine-tune it with other datasets/languages, please see the details in the examples directory.

 - For the Tacotron-2 tutorial, please see [examples/tacotron2](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2)
 - For the FastSpeech tutorial, please see [examples/fastspeech](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech)
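As an aside on the statistics files mentioned above, standardization is plain per-bin mean/variance scaling. A minimal sketch follows, assuming `stats.npy` stores the training-split mean and standard deviation stacked together; that layout is an assumption about the dump files, so verify it against your own `dump` folder:

```python
# Minimal sketch of mel standardization with precomputed training-split stats.
# Assumes stats.npy holds the per-bin mean and standard deviation stacked as
# a [2, n_mels] array; this layout is an assumption, not a guarantee.
import numpy as np

def standardize_mel(mel: np.ndarray, stats_path: str = "stats.npy") -> np.ndarray:
    """Scale a [frames, n_mels] mel spectrogram to zero mean, unit variance."""
    mean, std = np.load(stats_path)
    return (mel - mean) / std

def destandardize_mel(mel_norm: np.ndarray, stats_path: str = "stats.npy") -> np.ndarray:
    """Invert the scaling, e.g. before feeding a vocoder trained on raw mels."""
    mean, std = np.load(stats_path)
    return mel_norm * std + mean
```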
@@ -222,15 +222,15 @@ To know how to training model from scratch or fine-tune with other datasets/lang
 A detailed implementation of the abstract dataset class is in [tensorflow_tts/dataset/abstract_dataset](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/datasets/abstract_dataset.py). There are some functions you need to override and understand:

 1. **get_args**: This function returns the arguments for the **generator** class, normally utt_ids.
- 2. **generator**: This funtion have an inputs from **get_args** function and return a inputs for models. **Note that we return dictionary for all generator function with they keys exactly match with the parameter of the model because base_trainer will use model(\*\*batch) to do forward step.**
+ 2. **generator**: This function takes its inputs from the **get_args** function and returns the inputs for the models. **Note that every generator function returns a dictionary whose keys exactly match the model's parameters, because base_trainer uses model(\*\*batch) to do the forward step.**
 3. **get_output_dtypes**: This function needs to return the dtypes for each element yielded by the **generator** function.
 4. **get_len_dataset**: Returns the length of the dataset, normally len(utt_ids).

 **IMPORTANT NOTES**:

 - The pipeline for creating a dataset should be: cache -> shuffle -> map_fn -> get_batch -> prefetch.
 - If you shuffle before cache, the dataset won't reshuffle when you re-iterate over it.
- - You should apply map_fn to make each elements return from **generator** function have a same length before get batch and feed it into a model.
+ - You should apply map_fn to make each element returned from the **generator** function have the same length before batching and feeding it into a model.

 Some examples of using this **abstract_dataset** are [tacotron_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/tacotron2/tacotron_dataset.py), [fastspeech_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/fastspeech/fastspeech_dataset.py), [melgan_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan/audio_mel_dataset.py), and [fastspeech2_dataset.py](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py)
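To make the contract above concrete, here is a minimal sketch of a dataset that implements the four functions and the cache -> shuffle -> map_fn -> get_batch -> prefetch pipeline. It uses plain `tf.data` rather than the repository's abstract_dataset base class, and every name in it (`ToyMelDataset`, the `mel_gts` key, the 80 mel bins) is an illustrative assumption:

```python
# Minimal sketch of the dataset contract described above, using plain tf.data
# rather than the repository's abstract_dataset base class. All names here
# (ToyMelDataset, mel_gts, the 80 mel bins) are illustrative assumptions.
import numpy as np
import tensorflow as tf

class ToyMelDataset:
    def __init__(self, utt_ids, feats):
        self.utt_ids = utt_ids  # list of utterance IDs
        self.feats = feats      # dict: utt_id -> [frames, 80] float array

    def get_args(self):
        return [self.utt_ids]   # arguments forwarded to generator()

    def generator(self, utt_ids):
        for utt_id in utt_ids:
            key = utt_id.decode("utf-8") if isinstance(utt_id, bytes) else utt_id
            # Keys must exactly match the model's parameter names so the
            # trainer can call model(**batch).
            yield {"utt_ids": key, "mel_gts": self.feats[key].astype(np.float32)}

    def get_output_dtypes(self):
        return {"utt_ids": tf.string, "mel_gts": tf.float32}

    def get_len_dataset(self):
        return len(self.utt_ids)

    def create(self, batch_size=2):
        ds = tf.data.Dataset.from_generator(
            self.generator,
            output_types=self.get_output_dtypes(),
            args=self.get_args(),
        )
        # cache before shuffle, so re-iteration still reshuffles.
        ds = ds.cache().shuffle(self.get_len_dataset())
        # padded_batch plays the role of map_fn here: it pads variable-length
        # mels to a common length within each batch.
        ds = ds.padded_batch(
            batch_size, padded_shapes={"utt_ids": [], "mel_gts": [None, 80]}
        )
        return ds.prefetch(tf.data.experimental.AUTOTUNE)
```

A trainer could then iterate `for batch in ToyMelDataset(ids, feats).create(): model(**batch)`, in the style the notes above describe.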
@@ -301,7 +301,7 @@ sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")
 [Minh Nguyen Quan Anh](https://github.com/tensorspeech): [email protected], [erogol](https://github.com/erogol): [email protected], [Kuan Chen](https://github.com/azraelkuan): [email protected], [Dawid Kobus](https://github.com/machineko): [email protected], [Takuya Ebata](https://github.com/MokkeMeguru): [email protected], [Trinh Le Quang](https://github.com/l4zyf9x): [email protected], [Yunchao He](https://github.com/candlewill): [email protected], [Alejandro Miguel Velasquez](https://github.com/ZDisket): [email protected]

 # License
- Overrall, Almost models here are licensed under the [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam** this framework cannot be used for production in any way without permission from TensorFlowTTS's Authors. There is an exception, Tacotron-2 can be used with any perpose. So, if you are VietNamese and want to use this framework for production, you **Must** contact our in andvance.
+ Overall, almost all models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorFlowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. If you are Vietnamese and want to use this framework for production, you **must** contact us in advance.

 # Acknowledgement
- We would like to thank [Tomoki Hayashi](https://github.com/kan-bayashi), who discussed with our much about Melgan, Multi-band melgan, Fastspeech and Tacotron. This framework based-on his great open-source [ParallelWaveGan](https://github.com/kan-bayashi/ParallelWaveGAN) project.
+ We want to thank [Tomoki Hayashi](https://github.com/kan-bayashi), who discussed MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron with us at length. This framework is based on his great open-source [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN) project.
