Add a TTS recipe VITS on LJSpeech dataset #1372

Merged

yaozengwei merged 17 commits into k2-fsa:master from yaozengwei:vits

Nov 29, 2023

Collaborator

yaozengwei commented Nov 6, 2023

This PR adds a TTS baseline in icefall.

model: VITS, Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.
dataset: LJSpeech

The model-related code is mostly copied from ESPnet.

TODO:

Support model exporting.
Upload checkpoints and training logs.
Add a document.

Will add a recipe on VCTK dataset later.

yaozengwei added 6 commits

October 22, 2023 23:14


          first commit

3df16b3


          replace phonimizer with g2p

b719581


          use Conformer as text encoder

8d09f8e


          modify training script, clean codes

04c6ecb


          minor fix

fc359be


          Merge remote-tracking branch 'k2-fsa/master' into vits

0ec1e7d

Collaborator Author

yaozengwei commented Nov 6, 2023

This also requires some changes in lhotse. Will make a PR to lhotse soon.


          rename directory

cd59a69

csukuangfj reviewed

egs/ljspeech/TTS/local/compute_spectrogram_ljspeech.py Outdated

+                  with get_executor() as ex:  # Initialize the executor only once.
+                      cuts_filename = f"{prefix}_cuts_{partition}.{suffix}"
+                      if (output_dir / cuts_filename).is_file():
+                          logging.info(f"{partition} already exists - skipping.")

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-                          logging.info(f"{partition} already exists - skipping.")
+                          logging.info(f"{cuts_filename} already exists - skipping.")

egs/ljspeech/TTS/local/compute_spectrogram_ljspeech.py Outdated

+              This file computes fbank features of the LJSpeech dataset.
+              It looks for manifests in the directory data/manifests.
+              The generated fbank features are saved in data/spectrogram.

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-              The generated fbank features are saved in data/spectrogram.
+              The generated spectrogram features are saved in data/spectrogram.

egs/ljspeech/TTS/local/prepare_token_file.py Outdated



		"""
		This file reads the texts in given manifest and generate the file that maps tokens to IDs.

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-              This file reads the texts in given manifest and generate the file that maps tokens to IDs.
+              This file reads the texts in given manifest and generates the file that maps tokens to IDs.

egs/ljspeech/TTS/local/prepare_token_file.py Outdated

+              from pathlib import Path
+              from typing import Dict
+              import g2p_en

Collaborator

csukuangfj Nov 6, 2023

Please add code about how to install g2p_en.

egs/ljspeech/TTS/local/prepare_token_file.py Outdated

+              from typing import Dict
+              import g2p_en
+              import tacotron_cleaner.cleaners

Collaborator

csukuangfj Nov 6, 2023

Please also add code about how to install tacotron_cleaner.

Collaborator Author

yaozengwei Nov 6, 2023

Thanks a lot!

csukuangfj reviewed

egs/ljspeech/TTS/local/prepare_token_file.py Outdated

+                  # Sort by the number of occurrences in descending order
+                  tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
+                  for token, idx in extra_tokens.items():

Collaborator

csukuangfj Nov 6, 2023

Items in a dict are iterated in an unknown order.
Please use a list for extra_tokens.

You can use

tokens_and_counts = extra_tokens + tokens_and_counts
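A minimal sketch of the list-based approach suggested above, not the code that was merged. The function name `build_token2id` and the extra token names (`<blk>`, `<sos/eos>`, `<unk>`) are hypothetical, and the sketch assumes the extra tokens never occur in the text itself. Note that Python 3.7+ dicts do preserve insertion order, but a list states the intended ordering explicitly and avoids inserting at stored indices.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def build_token2id(
    token_lists: List[List[str]],
    extra_tokens: List[str],
) -> Dict[str, int]:
    """Map tokens to IDs, with the extra tokens occupying the first IDs."""
    counter: Counter = Counter()
    for tokens in token_lists:
        counter.update(tokens)

    # Sort by the number of occurrences in descending order.
    # sorted() is stable, so tokens with equal counts keep first-seen order.
    tokens_and_counts: List[Tuple[str, Optional[int]]] = sorted(
        counter.items(), key=lambda x: -x[1]
    )

    # Prepend the extra tokens in list order; they carry no count.
    tokens_and_counts = [(t, None) for t in extra_tokens] + tokens_and_counts

    return {token: i for i, (token, _) in enumerate(tokens_and_counts)}


# Hypothetical extra tokens and phoneme sequences, for illustration only.
token2id = build_token2id(
    [["HH", "AH", "L", "OW"], ["L", "OW"]],
    ["<blk>", "<sos/eos>", "<unk>"],
)
```

Because the extra tokens are simply concatenated in front, their IDs follow directly from their position in the list.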

egs/ljspeech/TTS/local/prepare_token_file.py Outdated

+                          counter[t] += 1
+                  # Sort by the number of occurrences in descending order
+                  tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])

Collaborator

csukuangfj Nov 6, 2023

Is there a reason to sort them by count?

Collaborator Author

yaozengwei Nov 6, 2023

It just makes it easy to cut off the vocabulary according to the counts if needed. But we don't need this now.
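To illustrate why a count-sorted list makes such a cutoff easy, here is a small sketch under stated assumptions. The helper `truncate_vocab` and the `max_size` parameter are hypothetical and not part of the recipe, which does not truncate the vocabulary.

```python
from collections import Counter
from typing import List


def truncate_vocab(counter: Counter, max_size: int) -> List[str]:
    """Keep only the max_size most frequent tokens."""
    # Same ordering as in the script: descending by number of occurrences.
    tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
    # With the list sorted, a cutoff is a plain slice.
    return [token for token, _ in tokens_and_counts[:max_size]]


# Toy counts for illustration only.
counter = Counter(["a", "b", "a", "c", "a", "b"])
kept = truncate_vocab(counter, max_size=2)
```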

egs/ljspeech/TTS/local/prepare_token_file.py Outdated

+                  for token, idx in extra_tokens.items():
+                      tokens_and_counts.insert(idx, (token, None))
+                  token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-                  token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}
+                  token2id: Dict[str, int] = {token: i for i, (token, _) in enumerate(tokens_and_counts)}

egs/ljspeech/TTS/local/prepare_token_file.py

+                  args = get_args()
+                  manifest_file = Path(args.manifest_file)
+                  out_file = Path(args.tokens)

Collaborator

csukuangfj Nov 6, 2023

Please check whether out_file exists and, if it does, return directly without any further computation.

Collaborator Author

yaozengwei Nov 6, 2023

We have checked this in prepare.sh
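For reference, an early-return guard inside the Python script itself could look roughly like the sketch below. This is an illustration of the reviewer's request, not the merged code (the recipe performs the check in prepare.sh); the function name `write_tokens_if_missing` and the token line format are assumptions.

```python
import logging
import tempfile
from pathlib import Path
from typing import List


def write_tokens_if_missing(out_file: Path, lines: List[str]) -> bool:
    """Write the token file unless it already exists.

    Returns True if the file was written, False if it was skipped.
    """
    if out_file.is_file():
        logging.info(f"{out_file} already exists - skipping.")
        return False
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True


# Demonstration in a temporary directory: the second call is skipped.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokens.txt"
    first = write_tokens_if_missing(path, ["<blk> 0"])
    second = write_tokens_if_missing(path, ["<blk> 0"])
```

Keeping the guard in the script makes it safe to run on its own, while the check in prepare.sh avoids starting the Python process at all.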

egs/ljspeech/TTS/local/validate_manifest.py Outdated

+                  assert manifest.is_file(), f"{manifest} does not exist"
+                  cut_set = load_manifest_lazy(manifest)
+                  assert isinstance(cut_set, CutSet)

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-                  assert isinstance(cut_set, CutSet)
+                  assert isinstance(cut_set, CutSet), type(cut_set)

egs/ljspeech/TTS/prepare.sh Outdated

+                log "Stage 0: Download data"
+                # If you have pre-downloaded it to /path/to/LJSpeech,
+                # you can create a symlink

Collaborator

csukuangfj Nov 6, 2023

Please add a small description about what the directory LJSpeech contains.

egs/ljspeech/TTS/prepare.sh Outdated

+                # If you have pre-downloaded it to /path/to/LJSpeech,
+                # you can create a symlink
+                #
+                #   ln -sfv /path/to/LJSpeech $dl_dir/LJSpeech

Collaborator

csukuangfj Nov 6, 2023

Is it LJSpeech or LJSpeech-1.1?

egs/ljspeech/TTS/prepare.sh Outdated

+                fi
+                if [ ! -e data/spectrogram/.ljspeech-validated.done ]; then
+                  log "Validating data/fbank for LJSpeech"

Collaborator

csukuangfj Nov 6, 2023

Suggested change

      
-                  log "Validating data/fbank for LJSpeech"
+                  log "Validating data/spectrogram for LJSpeech"

egs/ljspeech/TTS/shared/parse_options.sh Outdated

		@@ -0,0 +1,97 @@
		#!/usr/bin/env bash

Collaborator

csukuangfj Nov 6, 2023

Could you replace it with a symlink?

Collaborator Author

yaozengwei Nov 6, 2023

Ok. Thanks.


          minor fixes

f55e80a

yaozengwei mentioned this pull request

Modify SpeechSynthesisDataset class, make it return text lhotse-speech/lhotse#1205

Merged

Collaborator Author

yaozengwei commented Nov 6, 2023

This also requires some changes in lhotse. Will make a PR to lhotse soon.

See lhotse-speech/lhotse#1205

JinZr mentioned this pull request

A TTS recipe VITS on VCTK dataset #1380

Merged

yaozengwei added 7 commits

November 13, 2023 21:56


          convert text to tokens in data preparation stage

8791a4e


          fix tts_datamodule.py

32931b7


          minor fix

1ed6b4e


          support onnx export and testing the exported onnx model

a983dcd


          add doc

5ab1428


          add README.md

0030d1b


          fix style

70343a8

Collaborator Author

yaozengwei commented Nov 29, 2023

Training logs, Tensorboard logs, and checkpoints are uploaded to https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2023-11-29.

yaozengwei added 2 commits

November 29, 2023 20:02


          modify pyproject.toml

0f72104


          minor fix

24587e3

yaozengwei added the ready label

yaozengwei merged commit 0622dea into k2-fsa:master

37 checks passed
