I experimented with fine-tuning Whisper large-v3 for Persian, following this tutorial by Sanchit Gandhi and using the Common Voice Persian dataset.
However, the fine-tuned model transcribes only the first short segment of each audio file (roughly the first 10 seconds) and ignores the rest. I suspect this happens because the clips in Common Voice are short, but that does not fully make sense, since others have fine-tuned the model the same way without running into this problem.
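One way to test the short-clip suspicion is to measure how long the training clips actually are compared with Whisper's 30-second input window. The snippet below is only a minimal sketch: the helper names (`clip_duration_s`, `summarize_durations`) and the example sample counts are hypothetical, and it assumes each clip is available as a pair of sample count and sampling rate.

```python
from statistics import mean

# Whisper processes audio in fixed 30-second windows.
WHISPER_WINDOW_S = 30.0


def clip_duration_s(num_samples, sampling_rate):
    """Return the duration of one clip in seconds."""
    return num_samples / sampling_rate


def summarize_durations(clips, window_s=WHISPER_WINDOW_S):
    """Summarize clip lengths for an iterable of (num_samples, sampling_rate)."""
    durations = [clip_duration_s(n, sr) for n, sr in clips]
    if not durations:
        raise ValueError("no clips given")
    shorter = sum(d < window_s for d in durations)
    return {
        "count": len(durations),
        "mean_s": mean(durations),
        "max_s": max(durations),
        # Share of clips that do not fill a single Whisper window.
        "share_shorter_than_window": shorter / len(durations),
    }


# Hypothetical values for illustration only, not real Common Voice statistics.
example_clips = [(48_000, 16_000), (80_000, 16_000), (160_000, 16_000)]
summary = summarize_durations(example_clips)
print(summary)
```

If the dataset is loaded with Hugging Face `datasets` and an audio column, the pairs could presumably be built from each example's `audio["array"]` length and `audio["sampling_rate"]`. A summary showing that every training clip is far shorter than 30 seconds would be consistent with the suspicion above, though it would not prove that clip length is the cause.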
Can someone help me with this?
Fine Tuned Model Huggingface
Dataset