TTS Eval

tts_eval is a Python library for automatic evaluation of TTS outputs.

Setup

pip install tts_eval

Metrics

ASR Metric

The ASR metric evaluates the fidelity of generated speech by comparing transcripts. The reference transcript is the text prompt that was given to the TTS model, and the transcript of the generated speech is predicted by an ASR model. The difference between the two is reported as character error rate (CER) and word error rate (WER).

Python Usage:

Get sample audio.

wget https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000/resolve/main/sample.flac -O sample_1.flac
wget https://huggingface.co/datasets/japanese-asr/en_asr.esb_eval/resolve/main/sample.wav -O sample_2.wav

Evaluate via Python.

from tts_eval import ASRMetric

pipe = ASRMetric(
    model_id="kotoba-tech/kotoba-whisper-v2.0",  # ASR model used to transcribe the speech input
    metrics=["cer", "wer"]  # error rates to compute
)
output = pipe(
    ["sample_1.flac", "sample_2.wav"],  # list of audio files to evaluate
    transcript="水をマレーシアから買わなくてはならない"  # reference transcript
)
print(output)
# Example output (one score per audio file):
# {
#     'cer': [15.789473684210526, 110.5263157894737],
#     'wer': [100.0, 100.0]
# }
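For intuition about what the scores above mean, the following is a minimal sketch of an edit-distance based error rate. It is not the tts_eval implementation; `edit_distance` and `error_rate` are hypothetical helper names, and the formula (edit distance divided by reference length, times 100) is an assumption that happens to be consistent with the sample values above (for example, 3 edits against the 19-character reference gives 15.789...).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="char"):
    """Error rate in percent; unit="char" gives CER, unit="word" gives WER.

    Assumption: words are whitespace-separated tokens.
    """
    if unit == "char":
        ref, hyp = list(reference), list(hypothesis)
    else:
        ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


print(error_rate("abcd", "abxd"))                     # one substitution over 4 characters
print(error_rate("the cat sat", "the cat", "word"))  # one deletion over 3 words
```

Under this definition a rate can exceed 100 when the hypothesis needs more edits than the reference has units, as with the second CER value above. A text without spaces, such as the Japanese reference, counts as a single word under whitespace tokenization, so any mismatch yields a WER of 100.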

Speech Embedding Similarity

Speech embedding similarity evaluates the voice cloning capability of a TTS model. It takes the reference speech that was used as the speaker reference for generation, and computes the similarity between the reference speech and the generated speech based on their speech embeddings.

Python Usage:

Get sample audio.

wget https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000/resolve/main/sample.flac -O sample_1.flac
wget https://huggingface.co/datasets/japanese-asr/en_asr.esb_eval/resolve/main/sample.wav -O sample_2.wav

Evaluate via Python.

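from tts_eval import SpeakerEmbeddingSimilarity

pipe = SpeakerEmbeddingSimilarity(model_id="metavoice")  # speech embedding model
output = pipe(
    audio_target=["sample_1.flac", "sample_2.wav"],  # generated audio to evaluate
    audio_reference="sample_1.flac"  # speaker reference audio
)
print(output)
# Example output (one score per target audio file):
# {
#     'cosine_similarity': [1.0000001, 0.65718323]
# }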
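The cosine_similarity scores above compare one embedding vector per audio file. As a minimal sketch of that comparison step only (not the tts_eval implementation), the snippet below computes cosine similarity on toy vectors. The function name, the three-dimensional vectors, and their values are hypothetical; real speech embeddings come from the selected model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for speech embeddings (hypothetical values).
reference = [0.2, 0.5, -0.1]
targets = [
    [0.2, 0.5, -0.1],  # same vector as the reference
    [0.4, 0.1, 0.3],   # a different vector
]
scores = [cosine_similarity(t, reference) for t in targets]
print(scores)
```

A score near 1 means the two embeddings point in nearly the same direction, which is expected when a file is compared with itself. A value marginally above 1, such as 1.0000001, is most likely a floating-point rounding artifact.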

The following speech embedding models are available:

metavoice, pyannote, clap, clap_general, w2v_bert, hubert_xl, hubert_large, hubert_base, wav2vec, xlsr_2b, xlsr_1b, xlsr_300m
