ThushanthaSanju/askfastapi

askfastapi

A retrieval-augmented chatbot over the entire FastAPI documentation. Ask plain-English questions, get cited answers with links back to the exact section of the docs.

CI · License: MIT · Python 3.11+

Live demo: coming soon (Fly.io deploy in progress).

Screenshot/GIF will land in docs/demo.gif after the first deploy.


Why this exists

FastAPI's docs are excellent but sprawling. I keep losing time grepping through them when I'm building APIs, especially for dependency injection, security, and lifespan events. A focused chatbot over the docs is something I'd actually use, which means I'll dogfood it long enough to find the real bugs.

The headline feature isn't the chatbot UI; it's the eval harness. Every release is scored on retrieval accuracy@5, citation accuracy, and answer faithfulness (groundedness) against a hand-written question set. Numbers, not vibes.

Quickstart

git clone https://github.com/ThushanthaSanju/askfastapi.git
cd askfastapi
cp .env.example .env  # add your OPENAI_API_KEY and ANTHROPIC_API_KEY
docker compose up --build

Backend on :8000, frontend on :3000. First-time ingestion:

docker compose exec backend python -m backend.ingest
# or for a fast iteration cycle:
docker compose exec backend python -m backend.ingest --limit 20

Run the eval suite (requires the backend to be up and ingested):

docker compose exec backend python evals/run_evals.py --backend http://localhost:8000

Architecture

┌──────────────┐     ┌──────────────┐     ┌───────────────────┐
│   Next.js    │────▶│   FastAPI    │────▶│  Hybrid retrieval │
│  (frontend)  │ SSE │   (backend)  │     │  BM25 + semantic  │
└──────────────┘     └──────┬───────┘     │  + RRF + rerank   │
                            │             └────────┬──────────┘
                            ▼                      ▼
                     ┌──────────────┐       ┌─────────────┐
                     │  Anthropic   │       │  ChromaDB   │
                     │  Claude SSE  │       │ (persistent)│
                     └──────────────┘       └─────────────┘

Pipeline: ingest crawls the FastAPI sitemap → markdown-aware chunker (header-anchored, ~500 tokens, 50-token overlap) → OpenAI embeddings → Chroma. At query time, retrieval runs BM25 and semantic search in parallel, fuses the two ranked lists with Reciprocal Rank Fusion (RRF), reranks the top 20 fused candidates down to 5 with a CPU cross-encoder, and then streams Claude's answer with inline citations.
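For readers new to Reciprocal Rank Fusion, here is a minimal Python sketch of the fuse-then-rerank step described above. It is not the repo's retrieval.py: the constant k = 60 (the usual RRF default), the function names, and the plain callables standing in for BM25, Chroma, and the cross-encoder are all assumptions.

```python
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor

# Assumption: k = 60, the usual RRF default; the repo's actual constant isn't documented.
RRF_K = 60


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = RRF_K) -> list[str]:
    """Fuse several best-first lists of chunk IDs into one best-first list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list adds 1 / (k + rank), so chunks ranked well by both lists rise to the top.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_retrieve(
    query: str,
    bm25_search: Callable[[str, int], list[str]],      # hypothetical: returns top-n chunk IDs
    semantic_search: Callable[[str, int], list[str]],  # hypothetical: e.g. a Chroma query wrapper
    rerank_score: Callable[[str, str], float],         # hypothetical: cross-encoder score for (query, chunk)
    fuse_top: int = 20,
    final_k: int = 5,
) -> list[str]:
    """BM25 + semantic in parallel -> RRF -> top `fuse_top` -> rerank -> top `final_k`."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lexical_future = pool.submit(bm25_search, query, fuse_top)
        semantic_future = pool.submit(semantic_search, query, fuse_top)
        lexical, semantic = lexical_future.result(), semantic_future.result()
    candidates = reciprocal_rank_fusion([lexical, semantic])[:fuse_top]
    # A cross-encoder scores every (query, chunk) pair, so it only sees the short fused list.
    reranked = sorted(candidates, key=lambda cid: rerank_score(query, cid), reverse=True)
    return reranked[:final_k]
```

In a real service, rerank_score would look up each chunk's text and hand it to the cross-encoder; keeping it a plain callable makes the fusion logic unit-testable without downloading any models.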

Eval results

See evals/results.md for the latest run.

| Metric | Value |
|---|---|
| Questions | 25 |
| Retrieval accuracy@5 | pending first run |
| Answer faithfulness (mean) | pending |
| Citation accuracy (mean) | pending |
| Latency p50 | pending |
| Latency p95 | pending |
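The README doesn't spell out how retrieval accuracy@5 or the latency percentiles are computed. A minimal sketch, assuming a question counts as a hit when any of its hand-labelled gold URLs appears among the top five retrieved sources, and using nearest-rank percentiles over per-question latencies in milliseconds. The function names are hypothetical, and the real evals/run_evals.py may define these differently.

```python
import math
from collections.abc import Sequence


def hit_at_k(retrieved_urls: Sequence[str], gold_urls: set[str], k: int = 5) -> bool:
    """True if any gold doc URL appears among the first k retrieved sources."""
    return any(url in gold_urls for url in retrieved_urls[:k])


def retrieval_accuracy_at_k(
    runs: Sequence[tuple[Sequence[str], set[str]]], k: int = 5
) -> float:
    """Fraction of questions with at least one gold source in the top k."""
    if not runs:
        return 0.0
    hits = sum(hit_at_k(retrieved, gold, k) for retrieved, gold in runs)
    return hits / len(runs)


def percentile(latencies_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With only 25 questions, p95 is effectively the second-slowest request, so a single slow call can move it noticeably.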

The eval harness uses Claude as a judge for faithfulness (structured JSON output, 0..1) and parses inline [N] citations to compute citation accuracy against the retrieved set. Per-question results land in the same file.
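The citation-parsing and judge-reading parts of the harness can be sketched as follows. Assumptions: markers are plain [N] with 1-based indices into the retrieved sources; "citation accuracy" means the share of distinct cited markers that point at a source that was actually retrieved; an uncited answer scores 0; and the judge replies with a JSON object whose faithfulness key (a hypothetical name) holds a number. None of these names come from the repo.

```python
import json
import re

# Inline markers like [1] or [12]; assumed 1-based indices into the retrieved sources.
CITATION_RE = re.compile(r"\[(\d+)\]")


def extract_citations(answer: str) -> list[int]:
    """Distinct citation numbers in order of first appearance."""
    seen: list[int] = []
    for match in CITATION_RE.finditer(answer):
        n = int(match.group(1))
        if n not in seen:
            seen.append(n)
    return seen


def citation_accuracy(answer: str, num_retrieved: int) -> float:
    """Share of distinct [N] markers that point at a source that was actually retrieved."""
    cited = extract_citations(answer)
    if not cited:
        return 0.0  # assumption: an answer with no citations earns no credit
    valid = sum(1 for n in cited if 1 <= n <= num_retrieved)
    return valid / len(cited)


def parse_judge_score(raw_json: str, key: str = "faithfulness") -> float:
    """Read the judge's structured JSON reply and clamp the score to 0..1."""
    data = json.loads(raw_json)
    return min(1.0, max(0.0, float(data[key])))
```

A stricter variant would also check each cited source against the question's gold sections rather than only confirming it was retrieved.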

Tech stack

| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.11 | Native async, owns the bit |
| LLM | Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Best long-context grounding |
| Embeddings | OpenAI text-embedding-3-small | ~$0.02 / 1M tokens, accurate |
| Vector store | ChromaDB | Persistent, no API key, Docker-friendly |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | Small, CPU-only, fast |
| Lexical search | rank_bm25 | Simple, in-memory, fits the corpus |
| Frontend | Next.js 14 App Router + Tailwind + shadcn primitives | Streaming-friendly, type-safe |
| Orchestration | LangChain (minimal) | Retrieval chain only, no agents |
| Eval judge | Claude (structured JSON output) | Same model, cheap to run |
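rank_bm25 keeps the whole index in memory, which works for a corpus the size of the FastAPI docs. To show what the lexical leg computes, here is a stdlib-only Okapi BM25 sketch. It is not rank_bm25's code: the class name TinyBM25 and the tokenizer are hypothetical, k1 = 1.5 and b = 0.75 are common defaults, and the IDF uses the always-positive Lucene-style formula, which differs slightly from rank_bm25's BM25Okapi.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer; the real ingest may tokenize differently.
    return TOKEN_RE.findall(text.lower())


class TinyBM25:
    """Okapi BM25 over an in-memory corpus (hypothetical stand-in for rank_bm25)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        doc_tokens = [tokenize(d) for d in docs]
        self.doc_tf = [Counter(toks) for toks in doc_tokens]
        self.doc_len = [len(toks) for toks in doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs) if docs else 0.0
        n_docs = len(docs)
        # Document frequency: how many docs contain each term at least once.
        df = Counter(term for toks in doc_tokens for term in set(toks))
        # Lucene-style IDF, always positive (rank_bm25's BM25Okapi uses another variant).
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def scores(self, query: str) -> list[float]:
        """BM25 score of every document for the query, in corpus order."""
        q_terms = tokenize(query)
        out: list[float] = []
        for tf, dl in zip(self.doc_tf, self.doc_len):
            s = 0.0
            for term in q_terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue  # term absent from this doc (or from the whole corpus)
                # Length normalisation: longer-than-average docs need more hits to score well.
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[term] * f * (self.k1 + 1) / (f + norm)
            out.append(s)
        return out

    def top_n(self, query: str, n: int = 5) -> list[int]:
        """Indices of the n best-scoring documents, best first."""
        scored = self.scores(query)
        return sorted(range(len(scored)), key=lambda i: -scored[i])[:n]
```

BM25 catches exact identifiers such as `Depends` or `OAuth2PasswordBearer` that embeddings can blur, which is the usual argument for pairing it with semantic search.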

Project structure

askfastapi/
├── backend/          # FastAPI app, ingest CLI, retrieval pipeline
│   ├── main.py       # lifespan, /chat (SSE), /stats, /health
│   ├── ingest.py     # crawl → chunk → embed → upsert
│   ├── chunking.py   # markdown-header-aware splitter
│   ├── retrieval.py  # BM25 + semantic + RRF + cross-encoder
│   ├── chat.py       # prompt + citation orchestration
│   └── tests/        # unit tests for chunking, retrieval, chat
├── frontend/         # Next.js 14 chat UI with streaming + citations
├── evals/            # 25 hand-written Q/A pairs + scoring harness
├── Dockerfile        # multi-stage backend image
├── docker-compose.yml
└── fly.toml
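chunking.py is described only as a markdown-header-aware splitter (~500 tokens, 50-token overlap). A minimal sketch of that idea, assuming whitespace-separated words as a stand-in for model tokens (a real tokenizer would count differently) and ignoring `#` lines inside fenced code blocks, since code samples in the docs can contain Python comments. Chunk, split_by_headers, window, and chunk_markdown are hypothetical names.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # spelled this way so the marker doesn't close this README's own code block


@dataclass
class Chunk:
    headers: tuple[str, ...]  # header path, e.g. ("Security", "OAuth2 scopes")
    text: str


def split_by_headers(markdown: str) -> list[tuple[tuple[str, ...], str]]:
    """Split a markdown page into (header path, section body) pairs."""
    sections: list[tuple[tuple[str, ...], str]] = []
    path: list[str] = []
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append((tuple(path), text))
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headers at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections


def window(words: list[str], max_tokens: int = 500, overlap: int = 50) -> list[list[str]]:
    """Cut a word list into windows of max_tokens that overlap by `overlap`."""
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    if len(words) <= max_tokens:
        return [words]
    step = max_tokens - overlap
    return [words[i:i + max_tokens] for i in range(0, len(words) - overlap, step)]


def chunk_markdown(markdown: str, max_tokens: int = 500, overlap: int = 50) -> list[Chunk]:
    """Header-anchored chunks: sections never straddle a header; long ones are windowed."""
    chunks: list[Chunk] = []
    for headers, text in split_by_headers(markdown):
        for piece in window(text.split(), max_tokens, overlap):
            chunks.append(Chunk(headers, " ".join(piece)))
    return chunks
```

Joining words with single spaces flattens code indentation, so a production chunker would more likely window on lines or real tokens; the header path is what lets each citation link back to the exact docs section.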

Deployment (Fly.io)

fly launch --no-deploy            # creates the app from fly.toml
fly volumes create chroma_data --size 1
fly secrets set OPENAI_API_KEY=... ANTHROPIC_API_KEY=...
fly deploy
fly ssh console -C "python -m backend.ingest"

The frontend deploys separately to Vercel: point NEXT_PUBLIC_BACKEND_URL at the Fly app URL.

Roadmap

  • Hybrid retrieval (BM25 + semantic + RRF + rerank)
  • Streaming /chat endpoint with structured citations
  • Next.js chat UI with inline [N] popovers + sources panel
  • 25-question eval harness with Claude-as-judge
  • Docker + Fly.io deployment config
  • First public eval results published
  • Demo GIF
  • Conversation memory beyond current session
  • Caching layer (only if evals show latency is the bottleneck)
  • Auto-reingest on docs changes

Contributing

Issues and pull requests welcome. See CONTRIBUTING.md.

License

MIT. See LICENSE.

About

RAG chatbot over the FastAPI documentation. Hybrid retrieval (BM25 + semantic + RRF + rerank), streaming Claude responses with inline citations, and a 25-question eval harness.
