
Training on mini1.1 #179

Open
krishankantmehra opened this issue Dec 19, 2024 · 2 comments
krishankantmehra commented Dec 19, 2024

What changes need to be made to the training scripts to train parler-tts/parler-tts-mini-v1.1?

As I understand it, the tokenizer for this model has changed.

Currently I am passing:

```
--model_name_or_path "parler-tts/parler-tts-mini-v1.1"
--feature_extractor_name "parler-tts/parler-tts-mini-v1.1"
--description_tokenizer_name "google/flan-t5-large"
--prompt_tokenizer_name "google/flan-t5-large"
```

I am getting the following error:

[error screenshot attached]
@JahidBasher

In the `apply_audio_decoder` function, pop `bandwidth` and `padding_mask` before calling the encoder:

```python
def apply_audio_decoder(batch):
    len_audio = batch.pop("len_audio")
    audio_decoder.to(batch["input_values"].device).eval()
    if bandwidth is not None:
        batch["bandwidth"] = bandwidth
    elif "num_quantizers" in encoder_signature:
        batch["num_quantizers"] = num_codebooks
    elif "num_codebooks" in encoder_signature:
        batch["num_codebooks"] = num_codebooks
    elif "n_quantizers" in encoder_signature:
        batch["n_quantizers"] = num_codebooks

    # The codec used by mini-v1.1 does not accept these kwargs,
    # so drop them before calling encode().
    batch.pop("padding_mask", None)
    batch.pop("bandwidth", None)

    with torch.no_grad():
        labels = audio_decoder.encode(**batch)["audio_codes"]
    output = {}
    output["len_audio"] = len_audio
    # (1, bsz, codebooks, seq_len) -> (bsz, seq_len, codebooks)
    output["labels"] = labels.squeeze(0).transpose(1, 2)

    # if `pad_to_max_length`, the maximum corresponding audio length
    # of the current batch is max_duration * sampling_rate
    max_length = len_audio.max() if padding != "max_length" else max_target_length
    output["ratio"] = torch.ones_like(len_audio) * labels.shape[-1] / max_length
    return output
```
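A more general alternative, sketched below with a hypothetical `encode` function rather than the repo's actual codec: instead of hard-coding which keys to pop, filter the batch against the encoder's call signature so only accepted kwargs are passed. The `filter_kwargs_for` helper is my own illustration, not part of parler-tts.

```python
import inspect

def filter_kwargs_for(fn, batch):
    """Keep only the batch keys that fn's signature accepts.

    Illustrates why popping `padding_mask` / `bandwidth` fixes the
    crash: the new codec's encode() simply has no such parameters.
    """
    accepted = set(inspect.signature(fn).parameters)
    return {k: v for k, v in batch.items() if k in accepted}

# Hypothetical encoder that, like the new codec, takes no `padding_mask`:
def encode(input_values, n_quantizers=None):
    return {"audio_codes": input_values}

batch = {"input_values": [1, 2, 3], "padding_mask": [1, 1, 1], "bandwidth": 6.0}
codes = encode(**filter_kwargs_for(encode, batch))["audio_codes"]
print(codes)  # [1, 2, 3]
```

This avoids a `TypeError: encode() got an unexpected keyword argument ...` without needing to know in advance which legacy keys each codec rejects.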

krishankantmehra commented Jan 13, 2025

Thanks for the reply, @JahidBasher.

I am able to train now, but the outputs still have a hallucination problem.
