1. In the model config, set the self_attention_model in NeMo to local, and set a smaller local_attention_window.
2. Chunk the audio into multiple segments using Silero VAD, run inference on the individual chunks, and finally merge the output transcripts.
The second option might give a better result: the model was trained on 5-25 second audio chunks, so it should be most accurate on audio whose duration falls within that range.
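The chunking route can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not NeMo's or Silero's own code: `segments` is assumed to be a list of dicts with `start` and `end` sample offsets (the shape Silero VAD's speech-timestamp helper is understood to return), and `transcribe_chunk` is a hypothetical callable that wraps whatever per-chunk inference call you use. The names `group_segments` and `transcribe_long` are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def group_segments(
    segments: Sequence[Dict[str, int]],
    sample_rate: int = 16000,
    max_seconds: float = 25.0,
) -> List[Tuple[int, int]]:
    """Greedily merge neighbouring speech segments into chunks of at most max_seconds."""
    max_len = int(max_seconds * sample_rate)
    chunks: List[Tuple[int, int]] = []
    cur_start = cur_end = None
    for seg in segments:
        start, end = seg["start"], seg["end"]
        # Hard-split any single segment that is longer than the limit.
        pieces = [(s, min(s + max_len, end)) for s in range(start, end, max_len)]
        for p_start, p_end in pieces:
            if cur_start is not None and p_end - cur_start <= max_len:
                cur_end = p_end  # still fits: extend the current chunk
            else:
                if cur_start is not None:
                    chunks.append((cur_start, cur_end))
                cur_start, cur_end = p_start, p_end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


def transcribe_long(
    audio: Sequence,
    segments: Sequence[Dict[str, int]],
    transcribe_chunk: Callable[[Sequence], str],  # hypothetical per-chunk inference wrapper
    sample_rate: int = 16000,
    max_seconds: float = 25.0,
) -> str:
    """Transcribe each chunk separately and join the non-empty transcripts in order."""
    texts = []
    for start, end in group_segments(segments, sample_rate, max_seconds):
        text = transcribe_chunk(audio[start:end]).strip()
        if text:
            texts.append(text)
    return " ".join(texts)
```

Because only one chunk of at most 25 seconds goes through the model at a time, peak memory depends on the chunk length rather than on the full 3-hour recording. The 25-second cap mirrors the upper end of the training range mentioned above.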
Fantastic. I am going ahead with the audio chunk route. Also, is it possible to turn off the tqdm progress output while transcribing? I already show my own tqdm progress bar for the chunks.
For now, I am using a suppressor class to suppress NeMo outputs:
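A suppressor of this kind might look like the sketch below. It is an assumption about the general approach, not the code actually used in this thread: the class name `Suppressor` is hypothetical, and it relies only on the Python standard library (redirecting `stdout`/`stderr` to the null device and temporarily disabling `logging`).

```python
import contextlib
import logging
import os


class Suppressor:
    """Context manager that silences stdout, stderr and standard logging inside the block."""

    def __enter__(self):
        self._stack = contextlib.ExitStack()
        devnull = self._stack.enter_context(open(os.devnull, "w"))
        # Progress bars and prints created inside the block go to the null device.
        self._stack.enter_context(contextlib.redirect_stdout(devnull))
        self._stack.enter_context(contextlib.redirect_stderr(devnull))
        # Remember the current global logging threshold, then silence everything.
        self._previous_disable = logging.root.manager.disable
        logging.disable(logging.CRITICAL)
        return self

    def __exit__(self, exc_type, exc, tb):
        logging.disable(self._previous_disable)  # restore logging
        self._stack.close()  # restore stdout/stderr and close the null device
        return False  # never swallow exceptions
```

Create your own progress bar outside the `with Suppressor():` block so it keeps the real `stderr`, and wrap only the per-chunk transcription call. Output written directly to the process file descriptors by native code is not caught by this approach.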
Is it not possible to transcribe long audio files, around 3 hours? I am trying to transcribe a 3-hour audio file to Hindi, but it uses a huge amount of memory.
The memory usage is huuuuge.