A Canadian AI assistant that answers questions about any City of Toronto service using official content from Toronto.ca. Every answer is cited and grounded in public evidence.
Zeever.ca: A Low-Budget Experiment in Sovereign Canadian AI
Live at www.zeever.ca
Ask a question about any Toronto city service and get a cited, evidence-based answer:
Q: How do I dispute a parking ticket in Toronto?
A: You can request a review of a parking ticket within 15 days of the
issue date through the City's online dispute portal or in person at
a Court Services counter...
[Source: toronto.ca/services-payments/streets-parking-transportation/...]
Covers property taxes, waste and recycling, parking, transit, housing, recreation, public health, permits and licences, bylaws, city government, and everything in between — every public section of Toronto.ca.
| Metric | Count |
|---|---|
| Parsed documents | 23,000+ |
| Searchable chunks | 112,000+ |
| Parsed PDFs | 10,700+ |
| Models benchmarked | 9 |
| Benchmark questions | 100 across 16 categories |
| Default model relevance | 0.94 |
User Query → Next.js Frontend → FastAPI Backend
│
Query Classifier
│
Model Router → Qwen2.5-7B (Together.ai, default)
Qwen3-8B (Fireworks, fallback)
Qwen3-32B (OVHcloud, free tier)
│
Retriever
/ \
Vector RAG GraphRAG
\ /
pgvector + PostgreSQL
│
Document Chunks (embedded)
│
Parser / Normalizer
│
Crawler
│
Toronto.ca (all sections)
- PostgreSQL as the single store — raw docs, parsed content, embeddings (pgvector), graph data, eval results. No separate vector database.
- Open-source models only — Qwen family via Together.ai (default), Fireworks.ai (fallback), and OVHcloud (free tier). Nomic Embed v1.5 for embeddings. No OpenAI or Google dependencies.
- Heading-aware chunking — HTML pages are split at section headings (h2/h3), producing focused chunks that match better than whole-page embeddings.
- Full Toronto.ca coverage — all sections crawled, not just building permits.
- 5-layer prompt injection hardening — input sanitization, sandwich defense, output validation, context sanitization, and suspicious query logging.
| Component | Technology |
|---|---|
| Frontend | Next.js 16, React 19, Tailwind CSS |
| API | FastAPI (Python) |
| Database | PostgreSQL 16 + pgvector |
| LLM | Qwen2.5-7B-Instruct-Turbo via Together.ai (default, ~2.4s), Qwen3-8B via Fireworks.ai (fallback), Qwen3-32B via OVHcloud (free tier) |
| Embeddings | Nomic Embed v1.5 via Fireworks.ai (768 dim) |
| PDF parsing | PyMuPDF |
| Observability | Langfuse (LLM tracing) |
| Hosting | PM2, Apache |
| Package management | uv (Python), pnpm (Node) |
/apps
/web # Next.js frontend — chat UI, research blog, admin dashboard
/packages
/crawler # Toronto.ca crawler with sitemap-based re-crawl
/parser # HTML/PDF parsing, content classification, heading-aware chunking
/indexer # Embedding generation, pgvector storage and search
/query-engine # FastAPI API, query classification, retrieval, answer generation
/graphrag # Entity/relationship extraction, graph building
/evals # 100 benchmark prompts, signal scorer, LLM-as-judge scorer
/llm-router # Provider abstraction (Together, Fireworks, OVHcloud, Claude)
/shared # Database connection, config, SQLAlchemy ORM models
/db
/migrations # PostgreSQL schema (pgvector, 9 tables)
/scripts
manage.py # Unified crawl management CLI (see below)
eval.py # Benchmark evaluation suite
compare_models.py # Multi-model comparison with CSV export
build_graph.py # Build knowledge graph from chunks
pipeline.py # Full data pipeline (crawl + parse + embed)
recrawl.py # Smart re-crawl from sitemap
crawl_gaps.py # Crawl missing HTML pages
download_pdfs.py # Download uncrawled PDFs by category
gap_analysis.py # Coverage gap analysis
warm_cache.py # Pre-populate semantic query cache
provider_comparison.py # 24-hour latency test across providers
deploy-webhook.js # GitHub webhook for auto-deploy
All crawl operations are available through a unified CLI:
# System overview — stats, gaps, coverage
uv run python scripts/manage.py status
# Full pipeline — crawl + parse + embed + cleanup
uv run python scripts/manage.py pipeline
uv run python scripts/manage.py pipeline --skip-crawl # parse + embed only
uv run python scripts/manage.py pipeline --dry-run # show stats only
# Smart re-crawl — only pages changed since last crawl
uv run python scripts/manage.py recrawl
uv run python scripts/manage.py recrawl --since 7 # last 7 days
uv run python scripts/manage.py recrawl --section recreation
uv run python scripts/manage.py recrawl --dry-run
# Fill gaps — crawl and parse missing HTML pages
uv run python scripts/manage.py gaps
uv run python scripts/manage.py gaps --dry-run
# Download PDFs by category
uv run python scripts/manage.py download --categories guides forms policies building
uv run python scripts/manage.py download --all --dry-run
# PDF analysis — breakdown of uncrawled PDFs by category
uv run python scripts/manage.py pdfs
uv run python scripts/manage.py pdfs --samples 5
# Pre-populate query cache — instant responses for known questions
uv run python scripts/manage.py warmup # all 130 questions
uv run python scripts/manage.py warmup --homepage # 30 homepage questions
uv run python scripts/manage.py warmup --benchmark # 100 benchmark questions
uv run python scripts/manage.py warmup --dry-run # previewThe web admin dashboard at /admin provides the same stats plus pipeline controls, PDF download by category, and gap analysis.
Answers are cached using pgvector semantic similarity. Repeat and similar questions (cosine > 0.95) return cached answers in ~500ms instead of 3-5s.
- Cache auto-invalidates when content changes (chunk hash mismatch after re-crawl)
- 7-day TTL with
cleanup_cache()function - Skipped when model override is specified (benchmark runs)
- Warm the cache after deploys:
uv run python scripts/manage.py warmup
The benchmark suite includes 100 prompts across 16 categories covering all sections of Toronto.ca.
# Run against default model (Qwen2.5-7B via Together.ai)
uv run python scripts/eval.py
# Run against all 9 Fireworks models
uv run python scripts/eval.py --all-models
# Quick test — 3 questions across small models
uv run python scripts/eval.py --small -n 3
# With LLM-as-judge scoring
uv run python scripts/eval.py --all-models --judge
# Graph-enhanced mode
uv run python scripts/eval.py --mode graph --judgeAvailable models: gpt-oss-120b, kimi-k2.5, glm-5, deepseek-v3.2, mixtral-8x22b, glm-4.7, deepseek-v3.1, qwen3-8b, llama3.3-70b
| Model | Relevance | Citation | Latency | Errors |
|---|---|---|---|---|
| Qwen3-8B | 0.94 | 0.80 | 5.7s | 0 |
| Kimi K2.5 | 0.89 | 0.87 | 18.7s | 2 |
| GLM-5 | 0.88 | 0.82 | 8.0s | 0 |
| DeepSeek v3.2 | 0.88 | 0.81 | 7.9s | 3 |
| DeepSeek v3.1 | 0.88 | 0.82 | 9.6s | 0 |
| Mixtral 8x22B | 0.86 | 0.81 | 4.6s | 0 |
| GLM-4.7 | 0.86 | 0.84 | 17.2s | 0 |
| Llama 3.3 70B | 0.86 | 0.81 | 3.6s | 0 |
| GPT-oss 120B | 0.82 | 0.82 | 8.9s | 0 |
Qwen3-8B (8B parameters) outperformed GPT-oss 120B (120B parameters) by 15% on relevance. See the full analysis.
5-layer prompt injection hardening:
- Input sanitization — 34 regex patterns strip injection attempts from user queries
- Sandwich defense — security rules at top and bottom of system prompt
- Output validation — catches prompt leakage and role-breaking in LLM output
- Context sanitization — strips LLM-framing markers from crawled content
- Suspicious query logging — flags and logs queries with multiple injection indicators
Plus: rate limiting (20/min per IP), CORS allowlist, admin auth with timing-safe comparison, model allowlist validation, query mode regex validation.
Published at zeever.ca/research:
- The Missing Layer: AI Inference in Canada
- 100 Questions Across Toronto.ca
- Scaling a RAG Pipeline to 35,000 Documents
- How We Built an LLM-as-Judge
- Comparing 7 Open-Source Models
- Fixing Vector Search
- Vector RAG vs GraphRAG
- 24-Hour Provider Latency Comparison
- Python 3.12+
- Node.js 18+
- Docker (for PostgreSQL)
- Fireworks.ai API key
- uv and pnpm
git clone <repo-url>
cd zeever_ca
cp .env.example .env
# Edit .env — add your FIREWORKS_API_KEY
# Install dependencies
uv sync
pnpm install
# Start PostgreSQL
docker compose up -d
# Run the full pipeline
uv run python scripts/manage.py pipeline
# Start the API
uv run uvicorn query_engine.api:app --port 3034
# Start the frontend (separate terminal)
cd apps/web && pnpm dev
# Open http://localhost:3033Built by Colin Smillie
We welcome contributions! See CONTRIBUTING.md for setup instructions, development workflow, and PR guidelines.
To report a vulnerability, see SECURITY.md. Do not open a public issue for security vulnerabilities.