Add a TTS recipe VITS on LJSpeech dataset #1372
This also requires some changes in lhotse. Will make a PR to lhotse soon.
with get_executor() as ex:  # Initialize the executor only once.
    cuts_filename = f"{prefix}_cuts_{partition}.{suffix}"
    if (output_dir / cuts_filename).is_file():
        logging.info(f"{partition} already exists - skipping.")
Suggested change:
- logging.info(f"{partition} already exists - skipping.")
+ logging.info(f"{cuts_filename} already exists - skipping.")
This file computes fbank features of the LJSpeech dataset.
It looks for manifests in the directory data/manifests.

The generated fbank features are saved in data/spectrogram.
Suggested change:
- The generated fbank features are saved in data/spectrogram.
+ The generated spectrogram features are saved in data/spectrogram.
"""
This file reads the texts in given manifest and generate the file that maps tokens to IDs.
Suggested change:
- This file reads the texts in given manifest and generate the file that maps tokens to IDs.
+ This file reads the texts in given manifest and generates the file that maps tokens to IDs.
from pathlib import Path
from typing import Dict

import g2p_en
Please add instructions on how to install g2p_en.
from typing import Dict

import g2p_en
import tacotron_cleaner.cleaners
Could you also add instructions on how to install tacotron_cleaner?
Thanks a lot!
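One way to address both install requests is to fail early with a hint that names the package to install. The sketch below is illustrative only: `require` is a hypothetical helper, not part of the recipe, and the pip package names in the comments (`g2p_en`, `espnet_tts_frontend`) are assumptions that should be checked against the recipe's own documentation.

```python
import importlib
from types import ModuleType


def require(module_name: str, pip_name: str) -> ModuleType:
    """Import module_name, or raise ImportError with an install hint.

    Hypothetical helper for illustration; pip_name is whatever package
    is believed to provide the module.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Cannot import {module_name}. Try: pip install {pip_name}"
        ) from e


# Possible usage in the token-preparation script (package names assumed):
#   g2p_en = require("g2p_en", "g2p_en")
#   cleaners = require("tacotron_cleaner.cleaners", "espnet_tts_frontend")
```

A plain comment near the imports listing the pip commands would serve the same purpose with less code.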
# Sort by the number of occurrences in descending order
tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])

for token, idx in extra_tokens.items():
Items in a dict are iterated in an unknown order. Please use a list for extra_tokens. You can use
tokens_and_counts = extra_tokens + tokens_and_counts
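The list-based approach suggested here could look roughly like the following. This is a minimal sketch, not the PR's final code: `build_token2id` and the example token names are hypothetical, and it assumes the extra tokens should occupy the lowest IDs in the order given.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def build_token2id(
    token_seqs: Iterable[Iterable[str]], extra_tokens: List[str]
) -> Dict[str, int]:
    """Map tokens to IDs, with extra_tokens first in their listed order."""
    counter: Counter = Counter()
    for tokens in token_seqs:
        counter.update(tokens)

    # Extra tokens carry no count; a list keeps their positions explicit.
    extra: List[Tuple[str, Optional[int]]] = [(t, None) for t in extra_tokens]

    # most_common() sorts by count in descending order.
    tokens_and_counts = extra + counter.most_common()

    return {token: i for i, (token, _) in enumerate(tokens_and_counts)}
```

With a list, the ID of each extra token follows directly from its position, so no `insert(idx, ...)` calls are needed.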
counter[t] += 1

# Sort by the number of occurrences in descending order
tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
Is there a reason to sort them by count?
It just makes it easy to cut off the vocabulary according to the counts if needed. But we don't need this now.
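For reference, a count-based cutoff of the kind mentioned here might be sketched as follows. The recipe does not do this; `truncate_vocab` and its parameters are hypothetical and only show why a count-sorted list is convenient.

```python
from collections import Counter
from typing import List, Optional, Tuple


def truncate_vocab(
    counter: Counter, min_count: int = 1, max_size: Optional[int] = None
) -> List[Tuple[str, int]]:
    """Keep tokens seen at least min_count times, most frequent first."""
    kept = [(t, c) for t, c in counter.most_common() if c >= min_count]
    # Because the list is sorted by count, a size limit is a simple slice.
    return kept if max_size is None else kept[:max_size]
```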
for token, idx in extra_tokens.items():
    tokens_and_counts.insert(idx, (token, None))

token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}
Suggested change:
- token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}
+ token2id: Dict[str, int] = {token: i for i, (token, _) in enumerate(tokens_and_counts)}
args = get_args()
manifest_file = Path(args.manifest_file)
out_file = Path(args.tokens)
Please check whether out_file exists; if it does, return directly without any further computation.
We already check this in prepare.sh.
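If the check were done in the Python script rather than in prepare.sh, it might look like the sketch below. `write_if_missing` and its `build` callback are hypothetical names used only to illustrate the early return; they are not part of the PR.

```python
from pathlib import Path
from typing import Callable


def write_if_missing(out_file: Path, build: Callable[[], str]) -> bool:
    """Write build() to out_file unless the file already exists.

    Returns True if the file was written, False if the work was skipped.
    """
    if out_file.is_file():
        # Return before calling build(), so no computation is repeated.
        return False
    out_file.write_text(build(), encoding="utf-8")
    return True
```

Keeping the check in prepare.sh, as the author does, avoids starting the Python process at all; the in-script check only matters when the script is run by hand.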
assert manifest.is_file(), f"{manifest} does not exist"
cut_set = load_manifest_lazy(manifest)
assert isinstance(cut_set, CutSet)
Suggested change:
- assert isinstance(cut_set, CutSet)
+ assert isinstance(cut_set, CutSet), type(cut_set)
egs/ljspeech/TTS/prepare.sh
Outdated
log "Stage 0: Download data"

# If you have pre-downloaded it to /path/to/LJSpeech,
# you can create a symlink
Please add a small description about what the directory LJSpeech contains.
egs/ljspeech/TTS/prepare.sh
Outdated
# If you have pre-downloaded it to /path/to/LJSpeech,
# you can create a symlink
#
# ln -sfv /path/to/LJSpeech $dl_dir/LJSpeech
Is it LJSpeech or LJSpeech-1.1?
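Both questions about the download directory come down to what the symlink target is expected to contain. The sketch below assumes the standard LJSpeech-1.1 layout, in which the extracted directory holds a `metadata.csv` file and a `wavs` directory; `looks_like_ljspeech` is a hypothetical helper, not part of the recipe.

```python
from pathlib import Path


def looks_like_ljspeech(root: Path) -> bool:
    """Return True if root has the assumed LJSpeech-1.1 layout.

    Assumption: the dataset directory contains metadata.csv
    (transcripts) and wavs/ (audio files).
    """
    return (root / "metadata.csv").is_file() and (root / "wavs").is_dir()
```

A check like this could be run on the symlink target before later stages start, so a wrong directory name is caught early.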
egs/ljspeech/TTS/prepare.sh
Outdated
fi

if [ ! -e data/spectrogram/.ljspeech-validated.done ]; then
  log "Validating data/fbank for LJSpeech"
Suggested change:
- log "Validating data/fbank for LJSpeech"
+ log "Validating data/spectrogram for LJSpeech"
@@ -0,0 +1,97 @@
#!/usr/bin/env bash
Could you replace it with a symlink?
Ok. Thanks.
Training logs, TensorBoard logs, and checkpoints are uploaded to https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2023-11-29.
This PR adds a TTS baseline in icefall.
The model-related code is mostly copied from espnet.
TODO:
Add a recipe for the VCTK dataset later.