WhisperWave is a transcription system that lets users upload audio or video files and transcribe them with OpenAI's Whisper model. It is built with a Flask backend and a React frontend, and supports file management features such as uploading files, deleting them, and viewing transcripts.
- Upload audio/video files for transcription.
- Real-time transcription using the Whisper model.
- Manage uploaded files (delete, view transcripts).
- Responsive frontend built with React.
- Dockerized for ease of deployment in production.
The app can be run either in standalone mode (without containers) or in Docker containers.
- Python 3.10 or higher
- Node.js 18.x or higher
- npm (comes with Node.js)
- FFmpeg (for handling audio and video)
- Docker and Docker Compose (for running in containers)
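Whisper shells out to FFmpeg when decoding media, so it is worth failing fast if FFmpeg is missing. A minimal pre-flight check (the helper name is just for illustration, not part of the repo):

```python
import shutil

def check_ffmpeg() -> bool:
    """Return True if an ffmpeg executable is on PATH.

    Whisper invokes ffmpeg to decode audio/video, so nothing can be
    transcribed without it.
    """
    return shutil.which("ffmpeg") is not None
```

Calling `check_ffmpeg()` at startup gives a clearer failure than a mid-transcription decode error.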
This mode is ideal for local development.

- Clone the repository:

  ```
  git clone https://github.com/idanshimon/WhisperWave.git
  cd WhisperWave
  ```

- Create and activate a Python virtual environment:

  ```
  cd backend
  python -m venv venv
  source venv/bin/activate   # On macOS/Linux
  .\venv\Scripts\activate    # On Windows
  ```
- Install the dependencies (including Whisper):

  ```
  pip install -r requirements.txt
  ```
- Ensure FFmpeg is installed on your system.

  On macOS:

  ```
  brew install ffmpeg
  ```

  On Ubuntu/Debian:

  ```
  sudo apt update && sudo apt install ffmpeg
  ```
- Run the Flask app:

  ```
  python app.py
  ```

  The Flask backend will start on http://localhost:9010.
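Once the backend is up, you can exercise it from any HTTP client. A standard-library sketch; the `/files` endpoint path here is an assumption for illustration, not taken from the repo (check the routes in `backend/app.py` for the real ones):

```python
import json
import urllib.request

BASE_URL = "http://localhost:9010"

def api_url(base: str, endpoint: str) -> str:
    """Join the backend base URL and an endpoint path."""
    return base.rstrip("/") + "/" + endpoint.lstrip("/")

def list_files(base: str = BASE_URL):
    """GET the (assumed) /files endpoint and decode its JSON body."""
    with urllib.request.urlopen(api_url(base, "/files")) as resp:
        return json.loads(resp.read().decode("utf-8"))
```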
- In a new terminal, navigate to the frontend directory:

  ```
  cd WhisperWave/frontend
  ```

- Install the frontend dependencies:

  ```
  npm install
  ```

- Start the React development server:

  ```
  npm start
  ```

  The frontend will run on http://localhost:3000, and it will proxy API requests to the Flask backend on port 9010.
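If the frontend was bootstrapped with Create React App, the proxying typically comes from the `proxy` field in `frontend/package.json`; the relevant fragment would look like this (an assumption about how this repo wires it, not verified):

```json
{
  "proxy": "http://localhost:9010"
}
```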
This mode is ideal for production or containerized environments.
Docker Compose will build and run both the backend (Flask) and the frontend (React) as separate services.
- Clone the repository:

  ```
  git clone https://github.com/idanshimon/WhisperWave.git
  cd WhisperWave
  ```

- Build and run the app using Docker Compose:

  ```
  docker-compose up --build
  ```
This will:
- Build the backend (Flask) container and expose it on port 9010.
- Build the frontend (React) container and serve it via Nginx on port 3000.
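A minimal `docker-compose.yml` consistent with the ports above might look like the following sketch (service names, build contexts, and the internal Nginx port are assumptions; the repo's actual compose file is authoritative):

```yaml
services:
  backend:
    build: ./backend        # Flask + Whisper
    ports:
      - "9010:9010"
  frontend:
    build: ./frontend       # React build served by Nginx
    ports:
      - "3000:80"           # assuming Nginx listens on 80 inside the container
    depends_on:
      - backend
```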
To stop the running containers:

```
docker-compose down
```

Note: currently only the "base" model is supported.
Whisper offers multiple models for transcription, ranging from the small, fast tiny model to the large model, which is slower but more accurate. You can configure the model size during transcription via the API.

Available models: tiny, base, small, medium, large.

For English-only transcriptions, use the .en variants (e.g., tiny.en, base.en) for better performance.
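The naming scheme above can be captured in a small helper. A sketch (the helper name is illustrative; it assumes, as in the official Whisper releases, that every size except large has a `.en` variant):

```python
VALID_SIZES = ("tiny", "base", "small", "medium", "large")

def pick_model(size: str = "base", english_only: bool = False) -> str:
    """Return the Whisper model name for a size, preferring the
    English-only ".en" variant when requested (no large.en exists)."""
    if size not in VALID_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size != "large":
        return size + ".en"
    return size
```

For example, `pick_model("tiny", english_only=True)` returns `"tiny.en"`, while `pick_model("large", english_only=True)` falls back to `"large"`.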
To back up and restore the database and uploaded files (including transcriptions), use the provided backup_restore.py script in the backend/ directory.
Backup

```
cd backend
python backup_restore.py backup --file backup_filename.tar.gz
```

Restore
```
cd backend
python backup_restore.py restore --file backup_filename.tar.gz
```

- Real-Time Transcription: Consider adding support for real-time transcription in the future.
- Authentication/Authorization: If the system will be used by multiple users, add authentication to secure file uploads, viewing, and deletion.
- Cloud Storage: For large-scale deployments, consider integrating cloud storage (e.g., AWS S3) for file storage, and cloud-based transcription services to scale the workload.
- Multiple Language Support: Whisper supports multiple languages. Allow users to select the language for transcription.
- File Streaming: Implement file streaming and partial transcription for large files.
This project is licensed under the MIT License. See the LICENSE file for details.
Idan Shimon

