
speech tokenizer training code #9

Open
indiejoseph opened this issue Oct 21, 2024 · 2 comments

Comments

@indiejoseph

This project demonstrates a very good way to tokenize speech with different features, such as style and pitch tokens, which enables downstream applications to have fine-grained control over the generated voice.

I've tested the speech tokenizer on Cantonese, and the output has a very strong accent, probably because the training dataset only contains English. I was wondering how I can train the tokenizer myself. I know Fairseq has HuBERT and HiFi-GAN training recipes, but I'm not sure how to go about the pitch and style features.

@hitchhicker
Contributor

@tuanh208 Could you share some insights for this question? Thanks!

@tuanh208
Contributor

Hi, I think the reason the output has a strong accent in Chinese is that we only trained the HiFi-GAN vocoder on Expresso (which is in English).

For the pitch tokenizer, as mentioned in the paper, we trained a VQ-VAE model on the extracted F0 (you can use any F0 extractor in this repo), following this work: https://github.com/facebookresearch/speech-resynthesis?tab=readme-ov-file#f0-quantizer-model
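To make the F0-quantizer idea concrete, here is a minimal sketch of turning an F0 contour into discrete pitch tokens. It substitutes a simple k-means codebook for the actual VQ-VAE trained in speech-resynthesis, and assumes F0 frames have already been extracted (unvoiced frames marked as 0); the function names are illustrative, not from the repo.

```python
import numpy as np

def fit_f0_codebook(f0_values, n_codes=32, n_iters=20, seed=0):
    """Fit a small 1-D k-means codebook on voiced log-F0 values.

    Illustrative stand-in for the VQ-VAE F0 quantizer; a real setup
    trains a VQ-VAE on F0 contours as in speech-resynthesis.
    """
    rng = np.random.default_rng(seed)
    voiced = f0_values[f0_values > 0]          # drop unvoiced frames (F0 == 0)
    logf0 = np.log(voiced)                     # quantize in log domain
    codebook = rng.choice(logf0, size=n_codes, replace=False)
    for _ in range(n_iters):
        # hard-assign each frame to its nearest code, then update centroids
        assign = np.abs(logf0[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(n_codes):
            members = logf0[assign == k]
            if len(members):
                codebook[k] = members.mean()
    return np.sort(codebook)

def quantize_f0(f0_values, codebook):
    """Map each frame to a pitch token: 0 for unvoiced frames,
    1 + nearest-codebook-index for voiced frames."""
    tokens = np.zeros(len(f0_values), dtype=int)
    voiced = f0_values > 0
    logf0 = np.log(f0_values[voiced])
    tokens[voiced] = 1 + np.abs(logf0[:, None] - codebook[None, :]).argmin(axis=1)
    return tokens
```

In practice you would replace the k-means step with the VQ-VAE recipe linked above, but the input/output contract (F0 frames in, discrete pitch tokens out) is the same.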

For the style tokenizer, we initially fine-tuned SpeechProp (from this work: https://ai.meta.com/research/publications/sonar-expressive-zero-shot-expressive-speech-to-speech-translation/) to predict styles on the Expresso dataset, and trained a k-means tokenizer on the features extracted from SpeechProp. For this release, however, we distilled a smaller wav2vec 2.0 model to predict the tokens produced by SpeechProp, which turned out to work quite well. So if you want to train a new style tokenizer, I would suggest fine-tuning a good speech encoder (e.g. wav2vec 2.0, WavLM) on some expressive datasets with style labels; that should work well.
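The k-means tokenization step described above can be sketched as follows. This is a hedged illustration, not the released pipeline: it assumes you already have frame-level features of shape `(n_frames, dim)` from a fine-tuned encoder (e.g. wav2vec 2.0 or WavLM hidden states), and the function names are hypothetical.

```python
import numpy as np

def fit_style_tokenizer(features, n_tokens=100, n_iters=25, seed=0):
    """Fit a k-means codebook over encoder features.

    `features` is (n_frames, dim) -- e.g. hidden states from a speech
    encoder fine-tuned on style labels, as suggested in the comment above.
    """
    rng = np.random.default_rng(seed)
    # initialize centroids from random frames (fancy indexing copies)
    centroids = features[rng.choice(len(features), size=n_tokens, replace=False)]
    for _ in range(n_iters):
        # squared Euclidean distance to every centroid, then hard assignment
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_tokens):
            members = features[assign == k]
            if len(members):
                centroids[k] = members.mean(0)
    return centroids

def style_tokens(features, centroids):
    """Assign each frame to its nearest centroid, yielding style token ids."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)
```

The distillation step then amounts to training the smaller student model with cross-entropy against these token ids as targets.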
