Record audio in the browser, transcribe it locally with Faster-Whisper, and polish the text with an LLM (Ollama, LM Studio, or any OpenAI-compatible endpoint). Full-stack: FastAPI backend, React/Vite frontend, runs offline-first on your own machine.
Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.
- Capture microphone audio in the browser with the MediaRecorder API
- Stream audio uploads to a FastAPI backend with multipart file handling
- Run Whisper locally with
faster-whisperfor speech-to-text on CPU or GPU - Use any OpenAI-compatible LLM (Ollama, LM Studio, OpenAI) to clean transcripts
- Wire a React + Vite frontend to a Python backend with CORS-aware fetch calls
- Ship an offline-first AI app that works without sending audio to the cloud
- Backend: Python 3.12, FastAPI, Uvicorn, faster-whisper, OpenAI-compatible client
- Frontend: React 19, Vite 7, TypeScript, lucide-react icons
- LLM runtime: Ollama (default), LM Studio, or OpenAI API
- Tooling: uv for Python deps, npm for JS deps
- Python 3.12+, Node.js 20+
- uv and npm
- An LLM server: Ollama (pull a model:
ollama pull gemma3:4b) or LM Studio, or an OpenAI API key
# One command to set up both apps
make setup
# Then run in two terminals:
make backend # starts FastAPI at :8000
make frontend # starts Vite at :3000Open http://localhost:3000.
make build
make up
make logs
make downWork through these incrementally to build the full app:
- Browser Recording - Use MediaRecorder to capture webm audio chunks
- Upload Endpoint - FastAPI route that accepts multipart uploads and saves a temp file
- Local Whisper - Load
faster-whisperwith device auto-detect and transcribe the upload - LLM Cleanup - Send the raw transcript to an OpenAI-compatible LLM with a cleanup system prompt
- Provider Switching - Swap Ollama for LM Studio or OpenAI via
.envwithout changing code - Custom System Prompts - Let the user edit the cleanup prompt from the UI
- Copy to Clipboard - One-click copy of both raw and cleaned text
- Streaming Transcripts - Stream partial results back as Whisper processes longer audio
voice-transcription-whisper/
├── backend/ FastAPI + Whisper + LLM cleanup
│ ├── app.py
│ ├── transcription.py
│ ├── system_prompt.txt
│ └── pyproject.toml
├── frontend/ React + Vite recorder UI
│ ├── src/
│ └── package.json
├── Makefile
├── Dockerfile
└── docker-compose.yml
make help Show all available commands
make setup Install backend and frontend dependencies
make backend Run only the backend
make frontend Run only the frontend
make build Build Docker images
make up Start containers
make down Stop containers
make logs View container logs
make clean Remove venv, node_modules, and caches
- Start the course: learnwithparam.com/courses/voice-transcription-whisper
- AI Bootcamp for Software Engineers: learnwithparam.com/ai-bootcamp
- All courses: learnwithparam.com/courses