MMS-TTS Fine-tuned for Kabardian (Speaker: Sokhov Murat)

This repository contains a fine-tuned version of Facebook's MMS-TTS model, adapted for generating speech in the Kabardian language. The model is trained on a dataset of audio recordings by the speaker Sokhov Murat.

Model Details

Usage

The simplest way to generate speech with this model is the text-to-speech pipeline from the Transformers library:

```python
from transformers import pipeline
from scipy.io import wavfile

model_id = "anzorq/mms_finetune_kbd_murat"
# device=0 runs on the first GPU; remove it (or pass device=-1) to run on the CPU
synthesiser = pipeline("text-to-speech", model=model_id, device=0)

text = "дауэ ущыт?"
speech = synthesiser(text)

# Save the generated audio to a file
wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"][0])
```

This writes the synthesized speech for the Kabardian input text to finetuned_output.wav.
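scipy.io.wavfile.write stores a float32 array as a 32-bit floating-point WAV, which some players and tools do not accept. If you need 16-bit PCM instead, the following is a minimal sketch using only the Python standard library. It assumes the samples are mono floats nominally in [-1.0, 1.0] (values outside that range are clipped), and the helper name save_wav_int16 is hypothetical, not part of Transformers or SciPy.

```python
import array
import sys
import wave
from typing import Iterable


def save_wav_int16(path: str, samples: Iterable[float], sampling_rate: int) -> None:
    """Write mono float samples (nominally in [-1.0, 1.0]) as a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        pcm.append(int(clipped * 32767))  # scale to int16; int() truncates toward zero
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(sampling_rate))
        wf.writeframes(pcm.tobytes())
```

With the pipeline output from the example above, this would be called as save_wav_int16("finetuned_output_int16.wav", speech["audio"][0], speech["sampling_rate"]). The per-sample loop keeps the sketch dependency-free; for long clips a vectorized NumPy conversion would be faster.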

Notes

  • Fine-tuned following the guide at https://github.com/ylacombe/finetune-hf-vits
  • Because no pre-trained MMS-TTS model exists for Kabardian, we fine-tuned the Chechen model instead, since its character set is the closest to Kabardian's.
  • Do not use in production. This model's performance is considerably worse than that of the fine-tuned VITS model anzorq/kbd-vits-tts-male for Kabardian text-to-speech.
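One rough way to compare candidate base checkpoints by how well their character sets match Kabardian is to measure what share of the distinct characters in some Kabardian text each tokenizer vocabulary covers. The sketch below illustrates that idea under stated assumptions; it is not the procedure used to pick the Chechen model. char_coverage and best_base are hypothetical helpers, and lower-casing the text assumes a lower-case character vocabulary.

```python
from typing import Dict, Iterable, Set, Tuple


def char_coverage(text: str, vocab: Iterable[str]) -> Tuple[float, Set[str]]:
    """Share of distinct non-space characters in `text` found in `vocab`, plus the missing ones."""
    known = set(vocab)
    # Lower-case first, assuming the tokenizer vocabulary is lower-case.
    chars = {c for c in text.lower() if not c.isspace()}
    if not chars:
        return 1.0, set()
    missing = chars - known
    return 1.0 - len(missing) / len(chars), missing


def best_base(text: str, vocabs: Dict[str, Iterable[str]]) -> str:
    """Name of the candidate vocabulary with the highest character coverage of `text`."""
    return max(vocabs, key=lambda name: char_coverage(text, vocabs[name])[0])
```

For example, with the toy vocabulary set("абвгдежзийклмнопрстуфхцчшщъыьэюя"), char_coverage("дауэ ущыт?", ...) reports a coverage of 0.875 with {"?"} missing. With Transformers installed, a real vocabulary could be taken from a character-level tokenizer via tokenizer.get_vocab(); which checkpoints to compare is left to you.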

License

The original MMS-TTS model by Meta is licensed under CC BY-NC 4.0, and this fine-tuned version inherits the same license.

Acknowledgments

  • AI at Meta for the original MMS-TTS model.
  • Sokhov Murat for providing the audio recordings used for fine-tuning.
Model size: 36.3M parameters (Safetensors, F32 tensors)