CUDA out of memory. Tried to allocate 91.71 GiB. #4

Open
Aniruddh-J opened this issue Nov 17, 2024 · 2 comments

Aniruddh-J commented Nov 17, 2024

Is it not possible to transcribe long audio files, around 3 hours? I am trying to transcribe a 3-hour audio file to Hindi, but it uses a huge amount of memory.

import torch
import nemo.collections.asr as nemo_asr

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
nm_model = nemo_asr.models.EncDecCTCModel.from_pretrained('ai4bharat/indicconformer_stt_hi_hybrid_rnnt_large')
nm_model.freeze() # inference mode
nm_model = nm_model.to(device)
nm_model.cur_decoder = 'rnnt'
text = nm_model.transcribe(audio=str(processed_audio_file), batch_size=1, language_id='hi')[0]

The memory usage is huuuuge.

@ryback123 (Collaborator)

You can try two things:

  • In the model config, set the self_attention_model in NeMo to local, and set a smaller local_attention_window.
  • Chunk the audio into multiple segments using Silero VAD, run inference on the individual chunks, and finally merge the output transcripts.

The second option might give a better result, since the model was trained on 5–25 second audio chunks and is therefore most accurate on audio whose duration falls within that range.
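A minimal sketch of the second option, assuming a VAD (e.g. Silero) has already produced speech timestamps. The `group_segments` and `merge_transcripts` helpers and the 20-second cap are illustrative, not part of NeMo or Silero:

```python
def group_segments(segments, max_chunk_s=20.0):
    """Greedily pack VAD speech segments (start_s, end_s) into chunks no longer
    than max_chunk_s, cutting only at segment boundaries (i.e. in silence).
    A single segment longer than max_chunk_s would still need further splitting."""
    chunks, current = [], []
    for start, end in segments:
        if current and end - current[0][0] > max_chunk_s:
            chunks.append((current[0][0], current[-1][1]))
            current = []
        current.append((start, end))
    if current:
        chunks.append((current[0][0], current[-1][1]))
    return chunks

def merge_transcripts(texts):
    """Join per-chunk transcripts into one string, dropping empty chunks."""
    return " ".join(t.strip() for t in texts if t.strip())
```

Each `(start, end)` chunk can then be sliced out of the waveform, passed to `nm_model.transcribe` one at a time (so memory stays bounded by the longest chunk), and the per-chunk outputs combined with `merge_transcripts`.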


Aniruddh-J commented Nov 18, 2024

Fantastic. I am going ahead with the audio-chunking route. Also, is it possible to turn off tqdm logging while transcribing? I already have my own tqdm progress bar for the chunks.

For now, I am using a suppressor class to silence NeMo's output:

import os
import sys

class SuppressNeMo:
    """Temporarily redirect stderr to os.devnull to hide NeMo's console output."""
    def __enter__(self):
        self._original_stderr = sys.stderr
        sys.stderr = open(os.devnull, "w")
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        sys.stderr.close()
        sys.stderr = self._original_stderr
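Redirecting stderr works because tqdm writes there by default, but it also hides real errors for the duration. A lighter-weight alternative, sketched here with only the standard library, scopes the suppression via `contextlib` (depending on the NeMo version, `transcribe()` may also accept a `verbose=False` argument to disable its progress bar directly; worth checking, not guaranteed):

```python
import contextlib
import os

@contextlib.contextmanager
def suppress_stderr():
    """Silence anything written to stderr (e.g. tqdm bars) inside the block.
    The devnull handle is closed automatically and stderr restored on exit,
    even if an exception is raised."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
        yield
```

Usage would look like `with suppress_stderr(): text = nm_model.transcribe(...)`.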

@Aniruddh-J Aniruddh-J changed the title CUDA out of memory. Tried to allocate 91.71 GiB. GPU CUDA out of memory. Tried to allocate 91.71 GiB. Nov 18, 2024