askfastapi

A retrieval-augmented chatbot over the entire FastAPI documentation. Ask plain-English questions, get cited answers with links back to the exact section of the docs.

Live demo: coming soon — Fly.io deploy in progress

Screenshot/GIF will land in docs/demo.gif after the first deploy.

Why this exists

FastAPI's docs are excellent but sprawling. I keep losing time grepping through them when I'm building APIs — especially around dependency injection, security, and lifespan events. A focused chatbot over the docs is something I'd actually use, which means I'll dogfood it long enough to find the real bugs.

The headline feature isn't the chatbot UI. It's the eval harness: every release is scored on retrieval accuracy@5, citation accuracy, and answer faithfulness (groundedness) against a hand-written question set. Numbers, not vibes.

Quickstart

```bash
git clone https://github.com/ThushanthaSanju/askfastapi.git
cd askfastapi
cp .env.example .env  # add your OPENAI_API_KEY and ANTHROPIC_API_KEY
docker compose up --build
```

Backend on :8000, frontend on :3000. First-time ingestion:

```bash
docker compose exec backend python -m backend.ingest
# or for a fast iteration cycle:
docker compose exec backend python -m backend.ingest --limit 20
```

Run the eval suite (requires the backend to be up and ingested):

```bash
docker compose exec backend python evals/run_evals.py --backend http://localhost:8000
```

Architecture

```text
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│   Next.js    │────▶│   FastAPI    │────▶│  Hybrid retrieval │
│  (frontend)  │ SSE │   (backend)  │     │  BM25 + semantic  │
└──────────────┘     └──────┬───────┘     │  + RRF + rerank   │
                            │             └────────┬──────────┘
                            ▼                      ▼
                     ┌──────────────┐       ┌─────────────┐
                     │  Anthropic   │       │  ChromaDB   │
                     │  Claude SSE  │       │ (persistent)│
                     └──────────────┘       └─────────────┘
```

Pipeline: ingest crawls the FastAPI sitemap → markdown-aware chunker (header-anchored, ~500 tokens, 50-token overlap) → OpenAI embeddings → Chroma. Retrieval runs BM25 and semantic search in parallel, fuses with Reciprocal Rank Fusion, reranks the top 20 down to 5 with a CPU cross-encoder, then streams Claude's answer with inline citations.
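
The fusion step is small enough to show in full. A minimal Reciprocal Rank Fusion sketch over ranked lists of chunk IDs, assuming each retriever returns IDs best-first; the function name and `k=60` (the constant from the original RRF paper) are assumptions, not read from `backend/retrieval.py`. The fused top 20 would then go to the cross-encoder.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 20
) -> list[str]:
    """Fuse ranked lists of chunk IDs: each list adds 1 / (k + rank), rank 1-based."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so results are deterministic.
    fused = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return fused[:top_n]


bm25_hits = ["a", "b", "c"]
semantic_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never have to be put on a common scale: a chunk ranked well by both retrievers ("a", then "c" above) rises to the top.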

Eval results

See evals/results.md for the latest run.

| Metric | Value |
|---|---|
| Questions | 25 |
| Retrieval accuracy@5 | pending first run |
| Answer faithfulness (mean) | pending |
| Citation accuracy (mean) | pending |
| Latency p50 | pending |
| Latency p95 | pending |
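
For the numbers in this table to be comparable across releases, each metric needs a fixed definition. A minimal sketch, assuming "retrieval accuracy@5" means the share of questions where at least one hand-labeled source URL appears in the top five retrieved results, and using nearest-rank percentiles for latency; `EvalRecord` and its fields are hypothetical, not taken from `evals/run_evals.py`.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    # Hypothetical per-question record; field names are assumptions.
    expected_urls: set[str]    # doc sections a correct answer should draw on
    retrieved_urls: list[str]  # retriever output, best first
    latency_s: float           # end-to-end time for the /chat call


def retrieval_accuracy_at_k(records: list[EvalRecord], k: int = 5) -> float:
    """Fraction of questions with at least one expected URL in the top k."""
    hits = sum(1 for r in records if r.expected_urls & set(r.retrieved_urls[:k]))
    return hits / len(records)


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile, p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[max(rank, 1) - 1]
```

Latency p50 and p95 would then be `percentile([r.latency_s for r in records], 50)` and `percentile(..., 95)`.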

The eval harness uses Claude as a judge for faithfulness (structured JSON output, 0..1) and parses inline [N] citations to compute citation accuracy against the retrieved set. Per-question results land in the same file.
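
Citation parsing can be sketched with a single regex. This assumes `[N]` is a 1-based index into the list of retrieved chunks passed to the model, counts each distinct citation once, and scores an answer with no citations as 0.0; all three are assumptions about how the harness defines "citation accuracy", and the function names are hypothetical.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")


def cited_indices(answer: str) -> list[int]:
    """Distinct [N] markers, in order of first appearance."""
    seen: list[int] = []
    for match in CITATION_RE.finditer(answer):
        n = int(match.group(1))
        if n not in seen:
            seen.append(n)
    return seen


def citation_accuracy(answer: str, retrieved: list[str]) -> float:
    """Share of distinct citations that point at a chunk in the retrieved set."""
    cited = cited_indices(answer)
    if not cited:
        return 0.0  # assumption: an uncited answer earns no citation credit
    valid = sum(1 for n in cited if 1 <= n <= len(retrieved))
    return valid / len(cited)
```

For an answer citing `[1]`, `[3]`, and `[7]` against five retrieved chunks, `[7]` is out of range, so the score is 2/3.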

Tech stack

| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Python 3.11 | Native async, owns the bit |
| LLM | Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) | Best long-context grounding |
| Embeddings | OpenAI `text-embedding-3-small` | ~$0.02 / 1M tokens, accurate |
| Vector store | ChromaDB | Persistent, no API key, Docker-friendly |
| Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Small, CPU-only, fast |
| Lexical search | `rank_bm25` | Simple, in-memory, fits the corpus |
| Frontend | Next.js 14 App Router + Tailwind + shadcn primitives | Streaming-friendly, type-safe |
| Orchestration | LangChain (minimal) | Retrieval chain only, no agents |
| Eval judge | Claude (structured JSON output) | Same model, cheap to run |
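
`rank_bm25` handles lexical scoring here. To make what it computes concrete, a stdlib-only Okapi BM25 sketch with the common defaults `k1=1.5`, `b=0.75`. It uses the non-negative Lucene-style idf, whereas `rank_bm25`'s `BM25Okapi` floors negative idf values with an epsilon instead, and the tokenizer is an assumption. Illustrative only, not a replacement for the library.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of word characters.
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Okapi BM25 over an in-memory corpus, with non-negative (Lucene-style) idf."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.avgdl = sum(self.doc_lens) / len(docs)
        n = len(docs)
        # Document frequency: each doc contributes each of its terms once.
        df = Counter(term for tf in self.doc_tfs for term in tf)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def scores(self, query: str) -> list[float]:
        terms = tokenize(query)
        out: list[float] = []
        for tf, dl in zip(self.doc_tfs, self.doc_lens):
            # Length normalization: longer-than-average chunks are penalized.
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            score = 0.0
            for term in terms:
                f = tf[term]  # Counter returns 0 for absent terms
                if f:
                    score += self.idf[term] * f * (self.k1 + 1) / (f + norm)
            out.append(score)
        return out

    def top_k(self, query: str, k: int = 20) -> list[int]:
        """Indices of the k best-scoring docs, best first."""
        s = self.scores(query)
        return sorted(range(len(s)), key=lambda i: -s[i])[:k]
```

The list returned by `top_k` is exactly the kind of ranked input the RRF step consumes alongside the semantic results.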

Project structure

```text
askfastapi/
├── backend/          # FastAPI app, ingest CLI, retrieval pipeline
│   ├── main.py       # lifespan, /chat (SSE), /stats, /health
│   ├── ingest.py     # crawl → chunk → embed → upsert
│   ├── chunking.py   # markdown-header-aware splitter
│   ├── retrieval.py  # BM25 + semantic + RRF + cross-encoder
│   ├── chat.py       # prompt + citation orchestration
│   └── tests/        # unit tests for chunking, retrieval, chat
├── frontend/         # Next.js 14 chat UI with streaming + citations
├── evals/            # 25 hand-written Q/A pairs + scoring harness
├── Dockerfile        # multi-stage backend image
├── docker-compose.yml
└── fly.toml
```
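
`chunking.py` is described as a markdown-header-aware splitter (header-anchored, ~500 tokens, 50-token overlap). A minimal sketch, assuming each crawled page has already been converted to markdown, that whitespace-delimited words stand in for model tokens (a real splitter would more likely count tokens with the embedding model's tokenizer), and that the anchor is a `" > "`-joined heading path; `Chunk`, `split_by_headers`, and `chunk_markdown` are hypothetical names. Fenced code is skipped during header detection because Python `#` comments inside code blocks would otherwise look like headers.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
PIECE_RE = re.compile(r"\S+\s*")  # a word plus its trailing whitespace


@dataclass
class Chunk:
    heading_path: str  # hypothetical anchor format, e.g. "Security > OAuth2"
    text: str


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Group body text under its nearest header, tracking the header path."""
    path: list[str] = []
    buf: list[str] = []
    sections: list[tuple[str, str]] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            sections.append((" > ".join(path), body))
        buf.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Lines inside fenced code are never treated as headers.
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            path[:] = path[: level - 1] + [match.group(2).strip()]
        else:
            buf.append(line)
    flush()
    return sections


def chunk_markdown(markdown: str, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Header-anchored windows of ~`size` words sharing `overlap` words."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks: list[Chunk] = []
    for heading_path, body in split_by_headers(markdown):
        pieces = PIECE_RE.findall(body)  # joining pieces preserves original spacing
        for start in range(0, max(len(pieces) - overlap, 1), step):
            text = "".join(pieces[start : start + size]).strip()
            chunks.append(Chunk(heading_path, text))
    return chunks
```

Keeping each word's trailing whitespace means code samples survive chunking with their line breaks and indentation intact.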

Deployment (Fly.io)

```bash
fly launch --no-deploy            # creates the app from fly.toml
fly volumes create chroma_data --size 1
fly secrets set OPENAI_API_KEY=... ANTHROPIC_API_KEY=...
fly deploy
fly ssh console -C "python -m backend.ingest"
```

The frontend deploys separately to Vercel — point NEXT_PUBLIC_BACKEND_URL at the Fly app URL.
