Whisper ASR Box

🎉 Join our Discord Community! Connect with other users, get help, and stay updated on the latest features: https://discord.gg/4Q5YVrePzZ

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.

Features

The current release (v1.9.1) supports the following Whisper implementations, selected with ASR_ENGINE: OpenAI Whisper, Faster Whisper, and WhisperX.

Quick Usage

CPU

docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest

GPU

docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest
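When the container is launched from a script, the three commands above can be assembled from a few options. The following is a minimal sketch, not part of the project: the helper name `docker_run_args` and its parameters are hypothetical, and it only combines the flags, environment variables, and image tags already shown in this section.

```python
import os

IMAGE = "onerahmet/openai-whisper-asr-webservice"


def docker_run_args(model="base", engine="openai_whisper", gpu=False,
                    cache_dir=None, port=9000):
    """Return the argv list for a `docker run` call like the ones above.

    Hypothetical helper: it only rearranges flags shown in this README.
    """
    args = ["docker", "run", "-d"]
    if gpu:
        # GPU variant: expose all GPUs to the container.
        args += ["--gpus", "all"]
    # Map the chosen host port to the service port inside the container.
    args += ["-p", f"{port}:9000"]
    if cache_dir is not None:
        # Persist the cache directory so models are not downloaded again.
        args += ["-v", f"{os.path.abspath(cache_dir)}:/root/.cache/"]
    args += ["-e", f"ASR_MODEL={model}", "-e", f"ASR_ENGINE={engine}"]
    # The GPU image uses the `latest-gpu` tag, the CPU image uses `latest`.
    args.append(f"{IMAGE}:latest-gpu" if gpu else f"{IMAGE}:latest")
    return args
```

The returned list can be passed to `subprocess.run(...)`, or joined with `shlex.join(...)` to print the equivalent shell command.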

Key Features

Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
Multiple output formats (text, JSON, VTT, SRT, TSV)
Word-level timestamps support
Voice activity detection (VAD) filtering
Speaker diarization (with WhisperX)
FFmpeg integration for broad audio/video format support
GPU acceleration support
Configurable model loading/unloading
REST API with Swagger documentation

Environment Variables

Key configuration options:

ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
ASR_MODEL_PATH: Custom path to store/load models
ASR_DEVICE: Device selection (cuda, cpu)
MODEL_IDLE_TIMEOUT: Timeout for model unloading
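A small pre-flight check can catch typos in these variables before a container is started. This is a sketch under stated assumptions: `check_asr_env` is a hypothetical helper, the accepted engine and device values are the ones listed above, and treating MODEL_IDLE_TIMEOUT as a non-negative whole number is an assumption (this README does not state its unit or format). ASR_MODEL is not checked because the list above is open-ended.

```python
KNOWN_ENGINES = {"openai_whisper", "faster_whisper", "whisperx"}
KNOWN_DEVICES = {"cuda", "cpu"}


def check_asr_env(env):
    """Return a list of problems found in a mapping of environment variables.

    Hypothetical helper; unset variables are accepted as-is.
    """
    problems = []

    engine = env.get("ASR_ENGINE")
    if engine is not None and engine not in KNOWN_ENGINES:
        problems.append(
            f"ASR_ENGINE={engine!r} is not one of {sorted(KNOWN_ENGINES)}"
        )

    device = env.get("ASR_DEVICE")
    if device is not None and device not in KNOWN_DEVICES:
        problems.append(
            f"ASR_DEVICE={device!r} is not one of {sorted(KNOWN_DEVICES)}"
        )

    timeout = env.get("MODEL_IDLE_TIMEOUT")
    # Assumption: the timeout is written as a non-negative whole number.
    if timeout is not None and not timeout.isdigit():
        problems.append(
            f"MODEL_IDLE_TIMEOUT={timeout!r} is not a non-negative integer"
        )

    return problems
```

Calling `check_asr_env(os.environ)` returns an empty list when nothing looks wrong.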

Documentation

For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

Development

# Install poetry v2.X
pip3 install poetry

# Install dependencies for cpu
poetry install --extras cpu

# Install dependencies for cuda
poetry install --extras cuda

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.
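The same endpoints can be called from a script. The sketch below uses only the Python standard library to build a multipart upload. The `/asr` path, the `audio_file` form field, and the `task` and `output` query parameters are assumptions that are not stated in this README, so confirm them in the Swagger UI before relying on them. The function names are hypothetical.

```python
import os
import urllib.parse
import urllib.request
import uuid


def build_asr_request(audio_bytes, filename="audio.wav",
                      base_url="http://localhost:9000",
                      task="transcribe", output="json"):
    """Build (but do not send) a multipart POST carrying one audio file.

    Assumed, not confirmed here: `/asr`, `audio_file`, `task`, `output`.
    """
    query = urllib.parse.urlencode({"task": task, "output": output})
    boundary = uuid.uuid4().hex
    # Hand-rolled multipart/form-data body with a single file part.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="audio_file"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/asr?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, **options):
    """Upload the file at `path` and return the response body as text."""
    with open(path, "rb") as audio:
        request = build_asr_request(
            audio.read(), filename=os.path.basename(path), **options
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running locally, `transcribe("sample.wav", output="srt")` would return the subtitle text if the assumed parameters match the deployed API.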

Credits

This software uses libraries from the FFmpeg project under the LGPLv2.1.
