:zany_face: TensorFlowTTS provides real-time, state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multiband-MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference and optimize models further using [quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), so TTS models can run faster than real-time and be deployed on mobile devices or embedded systems.
## What's new
- 2021/06/01 (**NEW!**) Integrated with [Huggingface Hub](https://huggingface.co/tensorspeech). See the [PR](https://github.com/TensorSpeech/TensorFlowTTS/pull/555). Thanks [patrickvonplaten](https://github.com/patrickvonplaten) and [osanseviero](https://github.com/osanseviero).
- 2021/03/18 (**NEW!**) Support iOS for FastSpeech2 and MB MelGAN. Thanks [kewlbear](https://github.com/kewlbear). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/ios).
- 2021/01/18 (**NEW!**) Support TFLite C++ inference. Thanks [luan78zaoha](https://github.com/luan78zaoha). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cpptflite).
- 2020/12/02 Support German TTS with the [Thorsten dataset](https://github.com/thorstenMueller/deep-learning-german-tts). See the [Colab](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing). Thanks [thorstenMueller](https://github.com/thorstenMueller) and [monatis](https://github.com/monatis).
- 2020/11/24 Add HiFi-GAN vocoder. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/hifigan).
- 2020/11/19 Add multi-GPU gradient accumulator. See [here](https://github.com/TensorSpeech/TensorFlowTTS/pull/377).
- 2020/08/23 Add Parallel WaveGAN TensorFlow implementation. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/parallel_wavegan).
- 2020/08/23 Add MB-MelGAN generator + Parallel WaveGAN generator example. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/multiband_pwgan).
- 2020/08/20 Add C++ inference code. Thanks [@ZDisket](https://github.com/ZDisket). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin).
- 2020/08/18 Update the [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) JSON files.
- 2020/08/14 Support Chinese TTS. Please see the [Colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 Support Korean TTS. Please see the [Colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks [@crux153](https://github.com/crux153).
```python
import soundfile as sf  # needed for sf.write below
import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize the FastSpeech2 model.
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

# initialize the MB-MelGAN vocoder model.
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

# inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")

# fastspeech2 inference: text ids -> mel spectrograms.
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# mb_melgan inference: mel spectrograms -> waveforms.
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")
```
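Under the hood, `processor.text_to_sequence` maps the input text to a sequence of integer symbol IDs using the pretrained processor's symbol table. A toy, self-contained illustration of that idea (this symbol table is made up for the sketch, not the real LJSpeech one):

```python
# Toy character-to-ID mapping illustrating what a TTS text processor does.
symbols = ["pad"] + list("abcdefghijklmnopqrstuvwxyz ,.")
symbol_to_id = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text: str) -> list:
    """Lowercase the text and map each known character to its symbol ID."""
    return [symbol_to_id[ch] for ch in text.lower() if ch in symbol_to_id]

ids = text_to_sequence("Hello, world.")
# -> [8, 5, 12, 12, 15, 28, 27, 23, 15, 18, 12, 4, 29]
```

The real processors also handle cleaners, phonemization, and language-specific symbol sets, which is why they ship with a pretrained JSON mapper.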