Skip to content

learnwithparam/realtime-phone-agents-fastrtc

Repository files navigation

Realtime Phone Agents with FastRTC

learnwithparam.com

Build a realtime phone voice agent that streams low-latency audio over WebRTC using FastRTC, pipes it through an LLM, and speaks back with natural-sounding TTS. You will ship a working phone assistant you can actually call into from a browser.

Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.

What You'll Learn

  • Realtime audio streaming over WebRTC with FastRTC
  • The STT to LLM to TTS voice agent loop
  • Designing spoken-style system prompts that keep replies short and natural
  • Running a voice agent inside a single FastAPI process
  • Swapping LLM providers (OpenRouter, Fireworks, Gemini, OpenAI) with one env var

Tech Stack

  • FastAPI - High-performance async Python web framework
  • FastRTC - Python-native WebRTC streaming library (Stream, ReplyOnPause)
  • WebRTC - Browser-standard low-latency audio transport
  • OpenRouter - Default LLM gateway (swap providers via .env)
  • Whisper (optional) - Speech-to-text when OPENAI_API_KEY is set
  • Pydantic - Request/response validation
  • Docker - Containerized development

LiveKit is a great alternative if you need multi-participant rooms. This project intentionally uses FastRTC to stay single-process and dependency-light.

Getting Started

Prerequisites

  • Python 3.11+
  • uv (installed automatically by make setup)
  • An LLM provider API key (OpenRouter recommended for the default config)

Quick Start

# One command to set up and run
make dev

# Or step by step:
make setup          # Create .env and install dependencies
# Edit .env with your API keys
make run            # Start the FastAPI + FastRTC server

Then open:

With Docker

make build          # Build the Docker image
make up             # Start the container
make logs           # View logs
make down           # Stop the container

Challenges

Work through these incrementally to build the full application:

  1. The First Connection - Boot the FastAPI app and hit /phone/health from curl
  2. The Text Turn - POST to /phone/turn and verify the LLM replies in spoken style
  3. The WebRTC Mount - Stand up the FastRTC Stream at /phone/webrtc
  4. The STT Seam - Plug Whisper into VoiceAgent.transcribe behind an env flag
  5. The TTS Seam - Wire a real TTS provider into VoiceAgent.synthesize
  6. The Phone Persona - Tune the system prompt for a specific domain (airline, clinic, delivery)
  7. Tool Calling - Let the agent look up flights, bookings, or order state mid-call
  8. Barge-in & Turn Detection - Handle interruptions cleanly using FastRTC ReplyOnPause

Makefile Targets

make help           Show all available commands
make setup          Initial setup (create .env, install deps)
make dev            Setup and run the server
make run            Start the FastAPI + FastRTC server
make build          Build Docker image
make up             Start container
make down           Stop container
make clean          Remove venv and cache

Learn more

About

Realtime Phone Agents with FastRTC

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors