Vikhr Salt is a multimodal model built on a pre-trained large language model and extended with new audio tokens to handle both TTS (text-to-speech) and ASR (automatic speech recognition) tasks. The model supports two audio-encoding variants, Encodec and SpeechTokenizer, and achieves stable training by carefully tuning numerical-precision settings. This approach lets Vikhr Salt leverage pre-existing LLM knowledge while effectively generating and understanding speech, marking a step forward in multimodal learning.
Ksenia Sycheva, Konstantin Korolev, Aleksandr Nikolic
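Conceptually, extending a text-only LLM with discrete audio tokens amounts to enlarging the tokenizer vocabulary by one token per audio code and resizing the model's embedding matrix to match. The sketch below illustrates that pattern with Hugging Face `transformers`; the base model name, codebook size, and token format are illustrative assumptions, not the repository's actual choices.

```python
# Minimal sketch of adding discrete audio tokens to a pre-trained LLM.
# BASE_MODEL, NUM_AUDIO_CODES, and the "<audio_i>" format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # hypothetical base LLM
NUM_AUDIO_CODES = 1024                   # assumed size of the audio codebook

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# One new token per discrete audio code: "<audio_0>" ... "<audio_1023>".
audio_tokens = [f"<audio_{i}>" for i in range(NUM_AUDIO_CODES)]
tokenizer.add_tokens(audio_tokens, special_tokens=True)

# Grow the input embeddings and output head to cover the enlarged
# vocabulary; the new rows are learned during multimodal fine-tuning.
model.resize_token_embeddings(len(tokenizer))
```

With this setup, a TTS example can be serialized as text tokens followed by audio tokens, and an ASR example as audio tokens followed by text, so both tasks reduce to ordinary next-token prediction.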
To tokenize the data, run `prepare_data.py`. Configs for the different tokenizers (SpeechTokenizer, WavTokenizer, FishTokenizer) are available in `configs/quantization/`.
```bash
python prepare_data.py --config configs/quantization/<your-tokenizer-config>.yaml
```
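The exact schema of these configs is repository-specific; as a rough sketch, a tokenizer config along the lines below would name the quantizer and its settings (all keys here are assumed for illustration):

```yaml
# configs/quantization/<your-tokenizer-config>.yaml (hypothetical sketch;
# key names are illustrative, not the repository's actual schema)
quantizer: speechtokenizer   # one of: speechtokenizer, wavtokenizer, fishtokenizer
n_codebooks: 4               # SpeechTokenizer accepts 1-8; WavTokenizer is fixed at 1
input_dir: data/raw          # assumed location of raw audio
output_dir: data/tokenized   # assumed destination for token sequences
```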
Tokenization can be configured differently for TTS and ASR:
- a different number of tokens
- different tokenizers

To do that, specify the quantizer type and the number of codebooks for each task (a hypothetical sketch follows the notes below). Example configs can be found in `configs/quantization/`. Notes:
- music and other non-speech data are only supported by this version of WavTokenizer
- WavTokenizer has a fixed number of codebooks (1); for SpeechTokenizer, any value from 1 to 8 can be chosen
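As a hypothetical sketch of per-task settings (key names assumed, not the repository's actual schema), a config that tokenizes the two tasks differently might look like:

```yaml
# Hypothetical per-task quantization section; names are illustrative only.
tts:
  quantizer: speechtokenizer
  n_codebooks: 8        # more codebooks -> finer acoustic detail for generation
asr:
  quantizer: wavtokenizer
  n_codebooks: 1        # WavTokenizer's codebook count is fixed at 1
```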
For a single GPU:

```bash
source scripts/run_me.sh
```

For multiple GPUs with DeepSpeed (ds2):

```bash
source scripts/run_me_ds2.sh
```
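Assuming `ds2` refers to DeepSpeed ZeRO stage 2, the multi-GPU script would typically point the launcher at a DeepSpeed JSON config along these lines (batch sizes and precision here are placeholder values, not the repository's):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

ZeRO stage 2 shards optimizer states and gradients across GPUs, which cuts per-device memory without changing the model's math.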