Build a realtime phone voice agent that streams low-latency audio over WebRTC using FastRTC, pipes it through an LLM, and speaks back with natural-sounding TTS. You will ship a working phone assistant you can actually call into from a browser.
Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.
- Realtime audio streaming over WebRTC with FastRTC
- The STT to LLM to TTS voice agent loop
- Designing spoken-style system prompts that keep replies short and natural
- Running a voice agent inside a single FastAPI process
- Swapping LLM providers (OpenRouter, Fireworks, Gemini, OpenAI) with one env var
- FastAPI - High-performance async Python web framework
- FastRTC - Python-native WebRTC streaming library (
Stream,ReplyOnPause) - WebRTC - Browser-standard low-latency audio transport
- OpenRouter - Default LLM gateway (swap providers via
.env) - Whisper (optional) - Speech-to-text when
OPENAI_API_KEYis set - Pydantic - Request/response validation
- Docker - Containerized development
LiveKit is a great alternative if you need multi-participant rooms. This project intentionally uses FastRTC to stay single-process and dependency-light.
- Python 3.11+
- uv (installed automatically by
make setup) - An LLM provider API key (OpenRouter recommended for the default config)
# One command to set up and run
make dev
# Or step by step:
make setup # Create .env and install dependencies
# Edit .env with your API keys
make run # Start the FastAPI + FastRTC serverThen open:
- http://localhost:8000/phone/ for the phone agent landing page
- http://localhost:8000/docs for the interactive Swagger UI
make build # Build the Docker image
make up # Start the container
make logs # View logs
make down # Stop the containerWork through these incrementally to build the full application:
- The First Connection - Boot the FastAPI app and hit
/phone/healthfrom curl - The Text Turn - POST to
/phone/turnand verify the LLM replies in spoken style - The WebRTC Mount - Stand up the FastRTC
Streamat/phone/webrtc - The STT Seam - Plug Whisper into
VoiceAgent.transcribebehind an env flag - The TTS Seam - Wire a real TTS provider into
VoiceAgent.synthesize - The Phone Persona - Tune the system prompt for a specific domain (airline, clinic, delivery)
- Tool Calling - Let the agent look up flights, bookings, or order state mid-call
- Barge-in & Turn Detection - Handle interruptions cleanly using FastRTC
ReplyOnPause
make help Show all available commands
make setup Initial setup (create .env, install deps)
make dev Setup and run the server
make run Start the FastAPI + FastRTC server
make build Build Docker image
make up Start container
make down Stop container
make clean Remove venv and cache
- Start the course: learnwithparam.com/courses/realtime-phone-agents-fastrtc
- AI Bootcamp for Software Engineers: learnwithparam.com/ai-bootcamp
- All courses: learnwithparam.com/courses