A retrieval-augmented chatbot over the entire FastAPI documentation. Ask plain-English questions, get cited answers with links back to the exact section of the docs.
Live demo: coming soon — Fly.io deploy in progress
Screenshot/GIF will land in docs/demo.gif after the first deploy.
FastAPI's docs are excellent but sprawling. I keep losing time grepping through them when I'm building APIs — especially around dependency injection, security, and lifespan events. A focused chatbot over the docs is something I'd actually use, which means I'll dogfood it long enough to find the real bugs.
The headline feature isn't the chatbot UI. It's the eval harness: every release is scored on retrieval@5, citation accuracy, and groundedness against a hand-written question set. Numbers, not vibes.

    git clone https://github.com/ThushanthaSanju/askfastapi.git
    cd askfastapi
    cp .env.example .env  # add your OPENAI_API_KEY and ANTHROPIC_API_KEY
    docker compose up --build

Backend on :8000, frontend on :3000. First-time ingestion:

    docker compose exec backend python -m backend.ingest
    # or for a fast iteration cycle:
    docker compose exec backend python -m backend.ingest --limit 20

Run the eval suite (requires the backend to be up and ingested):

    docker compose exec backend python evals/run_evals.py --backend http://localhost:8000

Architecture:

    ┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
    │   Next.js    │────▶│   FastAPI    │────▶│ Hybrid retrieval │
    │  (frontend)  │ SSE │  (backend)   │     │ BM25 + semantic  │
    └──────────────┘     └──────┬───────┘     │ + RRF + rerank   │
                                │             └────────┬─────────┘
                                ▼                      ▼
                         ┌──────────────┐       ┌─────────────┐
                         │  Anthropic   │       │  ChromaDB   │
                         │  Claude SSE  │       │ (persistent)│
                         └──────────────┘       └─────────────┘

Pipeline: ingest crawls the FastAPI sitemap → markdown-aware chunker (header-anchored, ~500 tokens, 50-token overlap) → OpenAI embeddings → Chroma. Retrieval runs BM25 and semantic search in parallel, fuses with Reciprocal Rank Fusion, reranks the top 20 down to 5 with a CPU cross-encoder, then streams Claude's answer with inline citations.
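
The fusion step can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the project's actual retrieval.py: the function name, the k=60 constant (a commonly used default), the top_n cutoff, and the sample document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=20):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical chunk IDs standing in for BM25 and semantic search results.
bm25_hits = ["deps", "security", "lifespan"]
semantic_hits = ["lifespan", "deps", "background-tasks"]
candidates = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(candidates)
```

Documents that rank well in both lists rise to the top, which is why fusion needs only ranks and no score normalization between BM25 and embedding similarity. The fused candidates would then go to the cross-encoder for reranking.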
See evals/results.md for the latest run.
| Metric | Value |
|---|---|
| Questions | 25 |
| Retrieval accuracy@5 | pending first run |
| Answer faithfulness (mean) | pending |
| Citation accuracy (mean) | pending |
| Latency p50 | pending |
| Latency p95 | pending |
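
As a rough sketch of how the retrieval and latency rows might be computed: the record fields, URLs, and numbers below are made-up placeholders rather than results from this project, and the nearest-rank percentile is only one of several possible definitions.

```python
import math


def hit_at_k(retrieved_urls, expected_url, k=5):
    """Return 1.0 if the expected source is among the top-k results, else 0.0."""
    return 1.0 if expected_url in retrieved_urls[:k] else 0.0


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-question records; a real run would load these from the harness.
runs = [
    {"retrieved": ["/advanced/events/", "/tutorial/dependencies/"], "expected": "/advanced/events/", "latency_s": 0.8},
    {"retrieved": ["/tutorial/security/", "/tutorial/cors/"], "expected": "/tutorial/cors/", "latency_s": 1.2},
    {"retrieved": ["/tutorial/testing/"], "expected": "/tutorial/background-tasks/", "latency_s": 2.0},
    {"retrieved": ["/tutorial/dependencies/"], "expected": "/tutorial/dependencies/", "latency_s": 3.5},
]

accuracy_at_5 = sum(hit_at_k(r["retrieved"], r["expected"]) for r in runs) / len(runs)
latencies = [r["latency_s"] for r in runs]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
print(accuracy_at_5, p50, p95)
```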
The eval harness uses Claude as a judge for faithfulness (structured JSON output, 0..1) and parses inline [N] citations to compute citation accuracy against the retrieved set. Per-question results land in the same file.
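
The citation check could look roughly like the sketch below. It assumes that citation accuracy means the share of distinct cited indices that point at one of the retrieved chunks (numbered from 1), and that an answer with no citations scores 0.0. Both choices, along with the function names, are assumptions rather than the harness's confirmed behavior.

```python
import re

# Matches inline citations such as [1] or [12].
CITATION_RE = re.compile(r"\[(\d+)\]")


def parse_citations(answer: str) -> list[int]:
    """Return every cited index in order of appearance."""
    return [int(n) for n in CITATION_RE.findall(answer)]


def citation_accuracy(answer: str, num_retrieved: int) -> float:
    """Fraction of distinct cited indices that refer to a retrieved chunk (1..num_retrieved)."""
    cited = set(parse_citations(answer))
    if not cited:
        return 0.0  # assumed convention: an uncited answer earns no credit
    valid = {n for n in cited if 1 <= n <= num_retrieved}
    return len(valid) / len(cited)


# Hypothetical answer text: [7] is out of range when only 5 chunks were retrieved.
answer = "Use Depends() for shared logic [1]. Lifespan replaces startup events [3][7]."
score = citation_accuracy(answer, num_retrieved=5)
print(parse_citations(answer), round(score, 3))
```

A stricter variant would also check that each cited chunk actually supports the sentence it is attached to, which is closer to what the faithfulness judge measures.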
| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.11 | Native async, owns the bit |
| LLM | Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Best long-context grounding |
| Embeddings | OpenAI text-embedding-3-small | ~$0.02 / 1M tokens, accurate |
| Vector store | ChromaDB | Persistent, no API key, Docker-friendly |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Small, CPU-only, fast |
| Lexical search | rank_bm25 | Simple, in-memory, fits the corpus |
| Frontend | Next.js 14 App Router + Tailwind + shadcn primitives | Streaming-friendly, type-safe |
| Orchestration | LangChain (minimal) | Retrieval chain only — no agents |
| Eval judge | Claude (structured JSON output) | Same model, cheap to run |

    askfastapi/
    ├── backend/              # FastAPI app, ingest CLI, retrieval pipeline
    │   ├── main.py           # lifespan, /chat (SSE), /stats, /health
    │   ├── ingest.py         # crawl → chunk → embed → upsert
    │   ├── chunking.py       # markdown-header-aware splitter
    │   ├── retrieval.py      # BM25 + semantic + RRF + cross-encoder
    │   ├── chat.py           # prompt + citation orchestration
    │   └── tests/            # unit tests for chunking, retrieval, chat
    ├── frontend/             # Next.js 14 chat UI with streaming + citations
    ├── evals/                # 25 hand-written Q/A pairs + scoring harness
    ├── Dockerfile            # multi-stage backend image
    ├── docker-compose.yml
    └── fly.toml
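
To make the header-aware splitter in chunking.py more concrete, here is a minimal sketch of header-anchored chunking with overlap. It is not the project's code: the function names are hypothetical, whitespace-separated words stand in for real tokens, and it does not handle `#` lines inside fenced code blocks.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown):
    """Return (header, body) pairs; text before the first header gets header ''."""
    sections, header, lines = [], "", []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            if lines:
                sections.append((header, " ".join(lines)))
            header, lines = match.group(2).strip(), []
        elif line.strip():
            lines.append(line.strip())
    if lines:
        sections.append((header, " ".join(lines)))
    return sections


def chunk_markdown(markdown, max_tokens=500, overlap=50):
    """Split each section into windows of max_tokens words that share `overlap` words."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for header, body in split_by_headers(markdown):
        tokens = body.split()  # assumption: words approximate tokens
        for start in range(0, len(tokens), step):
            window = tokens[start:start + max_tokens]
            chunks.append({"header": header, "text": " ".join(window)})
            if start + max_tokens >= len(tokens):
                break  # the section's tail is already covered
    return chunks


# Tiny illustrative document with small limits so the overlap is visible.
sample = "# Lifespan\n" + " ".join(f"w{i}" for i in range(12)) + "\n## Testing\nshort body\n"
chunks = chunk_markdown(sample, max_tokens=5, overlap=2)
for chunk in chunks:
    print(chunk["header"], "|", chunk["text"])
```

Keeping the header on every chunk is what lets a citation link back to the exact docs section rather than just the page.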

    fly launch --no-deploy  # creates the app from fly.toml
    fly volumes create chroma_data --size 1
    fly secrets set OPENAI_API_KEY=... ANTHROPIC_API_KEY=...
    fly deploy
    fly ssh console -C "python -m backend.ingest"

The frontend deploys separately to Vercel; point NEXT_PUBLIC_BACKEND_URL at the Fly app URL.

- Hybrid retrieval (BM25 + semantic + RRF + rerank)
- Streaming /chat endpoint with structured citations
- Next.js chat UI with inline [N] popovers + sources panel
- 25-question eval harness with Claude-as-judge
- Docker + Fly.io deployment config
- First public eval results published
- Demo GIF
- Conversation memory beyond current session
- Caching layer (only if evals show latency is the bottleneck)
- Auto-reingest on docs changes
Issues and pull requests welcome. See CONTRIBUTING.md.
MIT — see LICENSE.