- 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
- 2020/07/17 Support MultiGPU for all Trainers.
- 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
- 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with TensorFlow is supported.
- 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with TensorFlow is supported.
## Features
- High performance on Speech Synthesis.
- Ability to fine-tune on other languages.
- Fast, Scalable, and Reliable.
- Suitable for deployment.
- Easy to implement a new model based on the abstract class.
- Mixed precision to speed up training when possible.
- Support both single/multi-GPU in the base trainer class.
- TFlite conversion for all supported models.
- Android example.
- Support many languages (currently Chinese, Korean, and English).
- Support C++ inference.
- Support converting weights for some models from PyTorch to TensorFlow.
## Requirements
This repository is tested on Ubuntu 18.04 with:
Different TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use MultiGPU.**
## Installation
### With pip
```bash
$ pip install TensorFlowTTS
```
### From source
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source, following the instructions below.
If you want to upgrade the repository and its dependencies:
```bash
$ git pull
$ pip install --upgrade .
```
# Supported Model architectures
TensorFlowTTS currently provides the following architectures:
1. **MelGAN** released with the paper [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711) by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.
2. **Tacotron-2** released with the paper [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884) by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.
3. **FastSpeech** released with the paper [FastSpeech: Fast, Robust, and Controllable Text to Speech](https://arxiv.org/abs/1905.09263) by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
4. **Multi-band MelGAN** released with the paper [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.
5. **FastSpeech2** released with the paper [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558) by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
6. **Parallel WaveGAN** released with the paper [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480) by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.
We also implement some techniques to improve quality and convergence speed from the following papers:
2. **Guided Attention Loss** released with the paper [Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention](https://arxiv.org/abs/1710.08969) by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.
| |- ...
```
Here, `metadata.csv` has the following format: `id|transcription`. This is an ljspeech-like format; you can ignore the preprocessing steps if your dataset is in another format.
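A minimal sketch of parsing this format (the helper name is illustrative; it is not part of the framework):

```python
# Parse an ljspeech-like metadata.csv with one "id|transcription" entry per line.
# Illustrative helper only; TensorFlowTTS ships its own dataset processors.
def read_metadata(path):
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the FIRST "|" so transcriptions may contain pipes.
            utt_id, transcription = line.split("|", 1)
            entries[utt_id] = transcription
    return entries
```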
Note that `NAME_DATASET` should be one of `ljspeech`, `kss`, `baker`, or `libritts`.
- Convert characters to IDs
- Compute mel spectrograms
- Normalize mel spectrograms to [-1, 1] range
- Split the dataset into train and validation
- Compute the mean and standard deviation of multiple features from the **training** split
2. Standardize mel spectrograms based on the computed statistics

Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar), and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.
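The normalization and statistics steps above can be sketched as follows (a simplified illustration with random data, not the repository's actual preprocessing code):

```python
import numpy as np

def minmax_scale(mel, eps=1e-8):
    """Normalize a mel spectrogram to the [-1, 1] range."""
    lo, hi = mel.min(), mel.max()
    return 2.0 * (mel - lo) / (hi - lo + eps) - 1.0

# Hypothetical training split: mel spectrograms of shape (frames, mel_bins).
train_mels = [np.random.rand(100, 80), np.random.rand(120, 80)]

# Mean and standard deviation are computed over the *training* split only,
# then reused to standardize both training and validation features.
stacked = np.concatenate(train_mels, axis=0)
mean, std = stacked.mean(axis=0), stacked.std(axis=0)
standardized = [(m - mean) / std for m in train_mels]
```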
**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset needs to be reformatted before running preprocessing.
After preprocessing, the structure of the project folder should be:
```
- `stats_f0.npy` contains the mean and std of F0 values in the training split
- `train_utt_ids.npy` / `valid_utt_ids.npy` contain the training and validation utterance IDs, respectively
We use a suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, or `wave`) for each input type.
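As an illustration, a loader keyed on these suffixes might look like the sketch below. The `<split>/<suffix>/<utt_id>-<suffix>.npy` layout is an assumption made for this example, not the repository's guaranteed layout:

```python
import numpy as np
from pathlib import Path

# Load one dumped feature by utterance ID and suffix.
# ASSUMED layout for illustration: <dump_dir>/<split>/<suffix>/<utt_id>-<suffix>.npy
def load_feature(dump_dir, split, utt_id, suffix):
    path = Path(dump_dir) / split / suffix / f"{utt_id}-{suffix}.npy"
    return np.load(path)
```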
**IMPORTANT NOTES**:
- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the above structure to be able to use the training script, or you can modify it by yourself 😄.
## Training models
To learn how to train a model from scratch or fine-tune with other datasets/languages, please see the details in the example directory.
- For the Tacotron-2 tutorial, please see [examples/tacotron2](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2)
- For the FastSpeech tutorial, please see [examples/fastspeech](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech)
A detailed implementation of the abstract dataset class is in [tensorflow_tts/dataset/abstract_dataset](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/datasets/abstract_dataset.py). There are some functions you need to override and understand:
1. **get_args**: This function returns the arguments for the **generator** function; normally this is utt_ids.
2. **generator**: This function takes the inputs from the **get_args** function and returns the inputs for the model. **Note that every generator function returns a dictionary whose keys exactly match the model's parameters, because base_trainer uses model(\*\*batch) to do the forward step.**
3. **get_output_dtypes**: This function must return the dtypes for each element from the **generator** function.
4. **get_len_dataset**: Returns the length of the dataset, normally len(utt_ids).
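The four functions above can be sketched framework-free as follows (class and field names are illustrative, not the repository's exact API):

```python
# A toy dataset following the four-function contract described above.
class ToyCharDataset:
    def __init__(self, utt_ids, char_ids):
        self.utt_ids = utt_ids
        self.char_ids = char_ids  # utt_id -> list of character IDs

    def get_args(self):
        # Arguments handed to generator(); normally the utterance IDs.
        return [self.utt_ids]

    def generator(self, utt_ids):
        # Yield dicts whose keys match the model's parameter names,
        # so the trainer can call model(**batch).
        for utt_id in utt_ids:
            yield {"utt_ids": utt_id, "input_ids": self.char_ids[utt_id]}

    def get_output_dtypes(self):
        # One dtype per element yielded by generator().
        return {"utt_ids": "string", "input_ids": "int32"}

    def get_len_dataset(self):
        # Normally len(utt_ids).
        return len(self.utt_ids)


dataset = ToyCharDataset(["LJ001-0001"], {"LJ001-0001": [7, 3, 9]})
batch = next(dataset.generator(*dataset.get_args()))
```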
**IMPORTANT NOTES**:
- The pipeline for creating a dataset should be: cache -> shuffle -> map_fn -> get_batch -> prefetch.
- If you shuffle before cache, the dataset won't re-shuffle when you re-iterate over it.
- You should apply map_fn so that each element returned from the **generator** function has the same length before batching and feeding it into a model.
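The recommended ordering can be sketched with a toy `tf.data` pipeline (toy generator and shapes; assumes TensorFlow ≥ 2.4 for `tf.data.AUTOTUNE`):

```python
import tensorflow as tf

# Toy generator yielding variable-length ID sequences.
def gen():
    for ids in ([1, 2, 3], [4, 5], [6]):
        yield {"input_ids": tf.constant(ids, dtype=tf.int32)}

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature={"input_ids": tf.TensorSpec([None], tf.int32)},
)

# cache -> shuffle -> map_fn -> get_batch -> prefetch, as recommended above.
dataset = (
    dataset.cache()              # cache BEFORE shuffle so re-iteration re-shuffles
    .shuffle(buffer_size=3)
    .map(lambda ex: ex)          # map_fn slot: per-element trimming/augmentation
    .padded_batch(2, padded_shapes={"input_ids": [None]})  # pad to equal length
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset)
```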
Some examples that use this **abstract_dataset** are [tacotron_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/tacotron2/tacotron_dataset.py), [fastspeech_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/fastspeech/fastspeech_dataset.py), [melgan_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan/audio_mel_dataset.py), and [fastspeech2_dataset.py](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py).
Overall, almost all models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorFlowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
# Acknowledgement
We would like to thank [Tomoki Hayashi](https://github.com/kan-bayashi), who discussed MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron with us at length. This framework is based on his great open-source [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN) project.