Steps to install and use OpenAI Whisper for video and audio files transcription

Ryan-PG/openai-whisper-installation-usage

What is OpenAI Whisper?

OpenAI Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It is designed to transcribe spoken language into text with high accuracy. Whisper is trained on a large and diverse dataset of multilingual and multitask supervised data, which enables it to handle various languages, accents, and domains. It can also perform tasks like language identification, translation, and transcription of audio files.

Key Features of OpenAI Whisper:

  1. Multilingual Support: It can understand and transcribe multiple languages, making it versatile for global applications.
  2. Robust to Accents and Noise: Whisper is trained on a wide range of audio data, which includes diverse accents and background noises, making it robust in challenging audio conditions.
  3. Multitask Capabilities: Beyond transcription, it can also identify the spoken language and translate speech from other languages into English. It does not separate or label different speakers on its own.
  4. High Accuracy: Due to its extensive training dataset, Whisper can achieve high levels of accuracy in transcription, especially in comparison to traditional ASR systems.
  5. Open-Source: Whisper is available as an open-source model, allowing developers to integrate it into their applications and further adapt it for specific use cases.

Overall, Whisper is a powerful tool for converting spoken language into text, useful for applications such as transcribing meetings, generating subtitles, voice-activated commands, and more.

Steps to install and use OpenAI Whisper for video and audio files transcription

Follow these steps to install OpenAI Whisper and use it to transcribe video and audio files. Have fun 😉

Needed Links

Installing Python

Installing PyTorch

Example Installation Commands:

  • Using CPU:
    pip3 install torch torchvision torchaudio
  • Using CUDA 11.8 (NVIDIA GPUs):
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
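After running one of the commands above, it can help to confirm that PyTorch is importable and whether it sees a GPU. The following is a minimal sketch, not part of the original steps; the function name `torch_status` is made up for illustration, and it relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_status():
    """Report whether PyTorch is importable and whether CUDA is usable."""
    # find_spec lets us check for the package without raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}

    import torch  # imported lazily, only once we know it is present

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_status())
```

If `cuda` is reported as false after installing the CUDA build, Whisper should still run, only on the CPU and more slowly.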

Installing Chocolatey

  • Open PowerShell as Administrator and run the following command:
    Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

Installing FFmpeg

  • Use Chocolatey to install FFmpeg, which Whisper uses to read audio and video files:
    choco install ffmpeg
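Whisper calls the `ffmpeg` executable to decode input files, so it needs to be reachable on your PATH. As a rough check (the helper name `ffmpeg_version_line` is hypothetical), something like the following could be run after the installation; you may need to open a new terminal first so the updated PATH is picked up.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg was not found on PATH")
```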

Installing OpenAI Whisper

Install Whisper:

pip install -U openai-whisper

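Set this environment variable so that Whisper reuses models it has already downloaded instead of downloading them again. In a Bash-style shell (for example Git Bash):

export XDG_CACHE_HOME="C:\Users\<USERNAME>\.cache"

In PowerShell, the equivalent for the current session is:

$env:XDG_CACHE_HOME = "C:\Users\<USERNAME>\.cache"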

Do not forget to change <USERNAME> to your username.
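To see which folder Whisper is likely to use and which models are already there, a small sketch such as the one below may help. It assumes that Whisper stores models in a `whisper` subfolder of `XDG_CACHE_HOME` (falling back to `.cache` in your home folder) and that model files end in `.pt`; both function names are hypothetical.

```python
import os


def whisper_cache_dir(env=None):
    """Guess the folder Whisper uses for downloaded models (assumed layout)."""
    env = os.environ if env is None else env
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(env.get("XDG_CACHE_HOME", default), "whisper")


def cached_models(env=None):
    """List model files found in the cache folder, or [] if it does not exist."""
    folder = whisper_cache_dir(env)
    if not os.path.isdir(folder):
        return []
    # Assumption: downloaded checkpoints are stored as <model name>.pt
    return sorted(name for name in os.listdir(folder) if name.endswith(".pt"))


if __name__ == "__main__":
    print(whisper_cache_dir())
    print(cached_models())
```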

Transcribing Video or Audio Files

  • Default Transcription:
    whisper file_name1 file_name2 file_name3
  • Specify a Model:
    whisper file_name --model medium
  • Use an Already Downloaded Model (avoids downloading the model again when the environment variable is not set):
    whisper file_name --model medium --model_dir C:\Users\<USERNAME>\.cache\whisper
  • Specify English-only Model:
    whisper file_name --model medium.en
  • Specify Language:
    whisper file_name --language German
  • Translate to English:
    whisper file_name --task translate
  • Output Only the SRT File:
    whisper file_name --model medium --output_format srt
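If you want to run these commands from a script, for example to process a folder of recordings, the options above can be assembled programmatically. This is only a sketch: `build_whisper_command` and `transcribe` are hypothetical helper names, and the flags are exactly the ones listed above.

```python
import subprocess


def build_whisper_command(files, model=None, model_dir=None, language=None,
                          task=None, output_format=None):
    """Build the argument list for the `whisper` command-line tool."""
    cmd = ["whisper", *files]
    options = [
        ("--model", model),
        ("--model_dir", model_dir),
        ("--language", language),
        ("--task", task),
        ("--output_format", output_format),
    ]
    # Only add the flags that were actually requested.
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def transcribe(files, **options):
    """Run whisper on the given files; raises CalledProcessError on failure."""
    return subprocess.run(build_whisper_command(files, **options), check=True)


if __name__ == "__main__":
    # Prints the command without running it.
    print(build_whisper_command(["talk.mp4"], model="medium", output_format="srt"))
```

Passing the arguments as a list rather than a single string avoids quoting problems with file names that contain spaces.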
