OpenAI Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It is designed to transcribe spoken language into text with high accuracy. Whisper is trained on a large and diverse dataset (about 680,000 hours) of multilingual and multitask supervised audio, which enables it to handle various languages, accents, and domains. Besides transcribing audio files, it can identify the spoken language and translate speech into English.
- Multilingual Support: It can understand and transcribe multiple languages, making it versatile for global applications.
- Robust to Accents and Noise: Whisper is trained on a wide range of audio data, which includes diverse accents and background noises, making it robust in challenging audio conditions.
- Multitask Capabilities: Beyond transcription, it can identify the spoken language and translate speech from many languages into English. It does not identify or separate individual speakers (speaker diarization).
- High Accuracy: Because of its large and varied training data, Whisper transcribes accurately across many kinds of recordings, and it often holds up better than traditional ASR systems trained on narrower datasets.
- Open-Source: Whisper's code and model weights are released under the MIT license, so developers can integrate it into their applications and adapt it to specific use cases.
Overall, Whisper is a powerful tool for converting spoken language into text, useful for applications such as transcribing meetings, generating subtitles, voice-activated commands, and more.
Follow these steps to install OpenAI Whisper on Windows and use it to transcribe video and audio files.
Have fun 😉
- Visit the Python website (python.org) and download a recent Python 3 release that PyTorch supports.
- Go to the PyTorch Start Locally page to configure the installation command based on your system and requirements.
- Using CPU:
pip3 install torch torchvision torchaudio
- Using CUDA 11.8 (NVIDIA GPUs):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
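To confirm which PyTorch build you installed, you can ask PyTorch whether it can see a CUDA device. This is a minimal sketch, assuming it runs with the same Python interpreter you used for `pip3`; the helper name `torch_device` is hypothetical, while `torch.cuda.is_available()` is a standard PyTorch call.

```python
import importlib.util
from typing import Optional


def torch_device() -> Optional[str]:
    """Return "cuda" if PyTorch can use an NVIDIA GPU, "cpu" if it cannot,
    or None if PyTorch is not installed in this interpreter."""
    # find_spec returns None when the package cannot be found
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(torch_device())  # "cuda", "cpu", or None
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel was most likely installed; rerun the CUDA command above.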
- Install Chocolatey, a package manager for Windows: open PowerShell as Administrator and run the following command:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
- Use Chocolatey to install FFmpeg, which Whisper uses to decode audio and video files:
choco install ffmpeg
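Whisper runs the `ffmpeg` executable to decode input files, so it must be reachable on your PATH. The sketch below checks that from Python; `ffmpeg_version` is a hypothetical helper, and it relies only on the standard `shutil.which` and `subprocess.run` functions.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")  # full path to ffmpeg(.exe), or None
    if exe is None:
        return None
    # `ffmpeg -version` prints build information to stdout
    completed = subprocess.run([exe, "-version"], capture_output=True, text=True, check=True)
    lines = completed.stdout.splitlines()
    return lines[0] if lines else exe


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH (try opening a new terminal)")
```

A freshly installed program is usually only visible on PATH in terminals opened after the installation.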
- Install Whisper:
pip install -U openai-whisper
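To verify the install, you can import the package and list the model names it knows (such as `tiny`, `base`, `small`, and `medium`). This is a minimal sketch; `whisper_models` is a hypothetical helper, while `whisper.available_models()` is provided by the openai-whisper package.

```python
import importlib.util
from typing import List, Optional


def whisper_models() -> Optional[List[str]]:
    """Return the model names known to openai-whisper, or None if it is not usable."""
    # The pip package is "openai-whisper", but the import name is "whisper"
    if importlib.util.find_spec("whisper") is None:
        return None
    import whisper

    # An unrelated PyPI package is also called "whisper"; it has no available_models()
    if not hasattr(whisper, "available_models"):
        return None
    return whisper.available_models()


if __name__ == "__main__":
    models = whisper_models()
    print(models if models is not None else "openai-whisper is not installed")
```

Running `pip install whisper` (without the `openai-` prefix) installs that unrelated package, which is why the sketch checks for `available_models` before calling it.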
- For more information about Whisper models, visit the Whisper GitHub repository.
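- Optionally, set the XDG_CACHE_HOME environment variable so Whisper saves and reuses downloaded models in C:\Users\<USERNAME>\.cache\whisper. In PowerShell (current session only):
$env:XDG_CACHE_HOME = "C:\Users\<USERNAME>\.cache"
To keep the setting for new terminals, use:
setx XDG_CACHE_HOME "C:\Users\<USERNAME>\.cache"
Do not forget to replace <USERNAME> with your Windows username.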
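When no model directory is passed, Whisper stores models in `XDG_CACHE_HOME\whisper`, falling back to `~\.cache\whisper`. The sketch below mirrors that default-path logic as it appears in `whisper.load_model` in recent releases; treat it as an approximation rather than the package's API. The helper names `whisper_cache_dir` and `cached_models` are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def whisper_cache_dir() -> Path:
    """Approximate Whisper's default model directory:
    $XDG_CACHE_HOME/whisper if the variable is set, else ~/.cache/whisper."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def cached_models(cache_dir: Path) -> List[str]:
    """List model checkpoint files (*.pt) already present in cache_dir."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.glob("*.pt"))


if __name__ == "__main__":
    cache = whisper_cache_dir()
    print(f"Model cache: {cache}")
    print("Cached models:", cached_models(cache) or "none yet")
```

Whisper saves each checkpoint under its original file name (for example `medium.pt`) and verifies its checksum before use, so an intact cached file is not downloaded again.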
- Default Transcription:
whisper file_name1 file_name2 file_name3
- Specify a Model:
whisper file_name --model medium
- Specify the model directory (avoids downloading the model again when XDG_CACHE_HOME is not set):
whisper file_name --model medium --model_dir C:\Users\<USERNAME>\.cache\whisper
- Specify English-only Model:
whisper file_name --model medium.en
- Specify Language:
whisper file_name --language German
- Translate to English:
whisper file_name --task translate
- Output only an SRT subtitle file:
whisper file_name --model medium --output_format srt
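The options above can be combined in one call, and `whisper` accepts several files at once. As a sketch, the Python wrapper below collects media files from a folder and assembles a single `whisper` command using the flags shown in this section plus `--output_dir`, which sets where output files are written. The helper names, the `recordings` folder, and the extension list are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical set of extensions to pick up; FFmpeg can decode many more formats.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mkv"}


def build_whisper_command(
    files: List[str],
    model: str = "medium",
    model_dir: Optional[str] = None,
    language: Optional[str] = None,
    task: str = "transcribe",
    output_format: str = "srt",
    output_dir: str = ".",
) -> List[str]:
    """Assemble a whisper CLI invocation as an argument list."""
    # The CLI only accepts these two lowercase task names
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    cmd = ["whisper", *files, "--model", model, "--task", task,
           "--output_format", output_format, "--output_dir", output_dir]
    if model_dir is not None:
        cmd += ["--model_dir", model_dir]
    if language is not None:
        cmd += ["--language", language]
    return cmd


def media_files(folder: str) -> List[str]:
    """Return media files directly inside folder, sorted for a stable order."""
    return sorted(str(p) for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS)


if __name__ == "__main__":
    folder = Path("recordings")  # hypothetical input folder
    files = media_files(str(folder)) if folder.is_dir() else []
    if files:
        cmd = build_whisper_command(files, language="German", task="translate")
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)  # requires whisper and ffmpeg on PATH
```

Passing the command to `subprocess.run` as a list avoids shell quoting problems with spaces in Windows paths.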