Transition Risk Scout automates transition-climate-risk due diligence by coupling a FastAPI research pipeline with a Next.js dashboard. The backend orchestrates LLM query generation, Tavily search, evidence extraction, and report synthesis, forming a retrieval-augmented generation (RAG) loop that keeps model outputs grounded in sourced evidence, while the frontend streams structured progress events to analysts in real time.
- End-to-end research pipeline spanning query generation, web/PDF ingestion, markdown evidence routing, and OpenAI-backed synthesis into an ISSB/TCFD-aligned JSON report.
- Live progress telemetry via server-sent events (SSE) that surface stage headlines, document conversion progress, artifacts, and ticker updates in the UI.
- Evidence-first deliverables including structured JSON, raw model output, a consolidated evidence dossier, and searchable appendices saved per run for auditability.
- Recovery-friendly storage where each company assessment is timestamped under `backend/data/<slug>/<timestamp>/transition`, making the latest outputs easy to reload or diff.
Transition Risk Scout implements a classic RAG pattern to tether large language model reasoning to vetted evidence:
- Query ideation – an LLM drafts company-specific probes that seed the retrieval step.
- Document retrieval – Tavily search fans out across the web and PDFs, ranking identity vs. transition-domain sources.
- Evidence distillation – fetched documents are converted to markdown and routed into domain buckets with citations.
- Grounded synthesis – the OpenAI report prompt ingests the curated evidence and emits JSON summaries with inline references.
This loop runs end-to-end for each assessment so every recommendation remains traceable back to the origin documents captured in `evidence.md` and the artifact appendices.
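In outline, the loop composes the four stages above into a single pass. The sketch below stubs out the LLM and search calls to show the data flow only; every function name here is illustrative, not the pipeline's actual API:

```python
def generate_queries(company: str) -> list[str]:
    # Stub: in the real pipeline an LLM drafts company-specific probes.
    return [f"{company} net zero targets", f"{company} transition plan capex"]

def retrieve(queries: list[str]) -> list[dict]:
    # Stub: Tavily search would fan out here and return ranked sources.
    return [{"url": f"https://example.com/{i}", "text": q}
            for i, q in enumerate(queries)]

def distill(docs: list[dict]) -> list[dict]:
    # Stub: convert each document to markdown and attach a citation id.
    return [{"snippet": d["text"], "cite": f"[{i + 1}]"}
            for i, d in enumerate(docs)]

def synthesize(evidence: list[dict]) -> dict:
    # Stub: the report prompt would consume evidence and emit cited JSON.
    return {
        "summary": " ".join(e["snippet"] + " " + e["cite"] for e in evidence),
        "sources": [e["cite"] for e in evidence],
    }

def run_rag(company: str) -> dict:
    # Query ideation -> retrieval -> distillation -> grounded synthesis.
    return synthesize(distill(retrieve(generate_queries(company))))
```

The key property the real pipeline preserves is the same as in this toy version: every sentence in the synthesized output carries a citation id that maps back to a retrieved document.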
```
.
├── backend
│   ├── app.py     # FastAPI entrypoint + pipeline orchestration
│   ├── prompts/   # LLM prompt templates (report, scenarios, queries)
│   └── src/       # Pipeline helpers (search, crawling, evidence, utils)
├── frontend
│   ├── app/         # Next.js app router
│   ├── components/  # UI building blocks (ReportCards, ProgressPanel, etc.)
│   ├── hooks/       # Custom hooks (SSE consumer, toast)
│   └── lib/         # API client, slug helpers
├── todo.md                          # Current roadmap and polish tasks
└── progress_sse_implementation.md   # Design doc for the Buzzline progress feature
```
The pipeline stages mirror the BuzzObserver events exposed to the UI:
- `scope` – establish output folders and slug.
- `aggregate_sources` – generate LLM queries and run parallel Tavily searches.
- `rank_filter` – score and prioritize URLs (PDF vs. web).
- `convert_docs` – fetch HTML/PDF to markdown with optional table extraction.
- `synthesize` – route evidence snippets into transition-specific buckets.
- `model_risk` – render the final JSON report with inline citations.
- `finalize` – persist artifacts and emit wrap-up tickers.
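These stage names map naturally onto structured progress events. A hypothetical observer might collect and serialize them like this; the event shape shown is an assumption for illustration, not the actual BuzzObserver contract:

```python
import json
from dataclasses import dataclass, field

@dataclass
class ProgressObserver:
    """Collects stage events; a real observer would forward them over SSE."""
    events: list[dict] = field(default_factory=list)

    def stage(self, name: str, headline: str) -> None:
        self.events.append({"type": "stage", "name": name, "headline": headline})

    def to_sse(self) -> str:
        # Each event becomes one SSE 'data:' frame.
        return "\n".join(f"data: {json.dumps(e)}\n" for e in self.events)

obs = ProgressObserver()
obs.stage("scope", "Establishing output folders")
obs.stage("aggregate_sources", "Generating queries")
```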
- Python 3.11+
- Node.js 18+ (Next.js 14 requirement)
- npm 9+ or pnpm/yarn equivalent
- Tavily API key and OpenAI API key for search + report generation
Install Python dependencies with pip (see `backend/requirements.txt`) and JavaScript dependencies via `npm install` in `frontend/`.
| Variable | Required | Purpose | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | ✅ | Auth for the OpenAI responses API used in report generation | – |
| `OPENAI_MODEL` | ❌ | Override the OpenAI model name | `gpt-5-mini` |
| `TAVILY_API_KEY` | ✅ | Enables Tavily web search for evidence gathering | – |
| `FETCH_TIMEOUT_MS` | ❌ | Timeout for the markdown fetcher | `45000` |
| `FETCH_MARKDOWN_WORKERS` | ❌ | Concurrency for markdown fetches | `6` |
| `FETCH_POLITENESS_SECONDS` | ❌ | Delay between fetches to reduce bot blocks | `0.00` |
| `TAVILY_DEPTH` | ❌ | Tavily search depth (`basic` or `advanced`) | `basic` |
| `TAVILY_MAX_WORKERS` | ❌ | Parallel Tavily worker cap | `12` |
| `TAVILY_QPS` | ❌ | Queries-per-second limiter for Tavily calls | `8` |
| `PORT` | ❌ | Backend serve port when running `python app.py --serve` | `8000` |
| `NEXT_PUBLIC_API_BASE` | ✅ (frontend) | URL the Next.js client should target for API calls | `http://localhost:8000` |
Store secrets in `backend/.env` and `frontend/.env.local` (both git-ignored).
1. Clone the repo

   ```bash
   git clone <repo-url>
   cd climate-risk
   ```

2. Backend setup

   ```bash
   cd backend
   python -m venv .venv
   source .venv/bin/activate   # Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   cp .env.example .env        # if you maintain one; otherwise create it
   # set OPENAI_API_KEY and TAVILY_API_KEY in .env
   python app.py --serve --host 0.0.0.0 --port 8000
   ```

   Alternatively, run `uvicorn app:api --reload --host 0.0.0.0 --port 8000` while in the `backend/` directory.

3. Frontend setup

   ```bash
   cd ../frontend
   npm install
   echo "NEXT_PUBLIC_API_BASE=http://localhost:8000" > .env.local
   npm run dev
   ```

4. Visit `http://localhost:3000` to open Transition Risk Scout.
- Enter a company name and optional tuning parameters (`per_q`, `pdf_cap`).
- Click **Generate report** to start the pipeline.
- Watch progress in the Buzzline panel—stage chips, determinate doc-conversion meter, ticker, and artifact list update live.
- When streaming completes, the app fetches the latest report/evidence from the backend and renders:
  - Narrative report cards with citation jump links.
  - Raw JSON view (downloadable as `report.json`).
  - Evidence markdown rendered in a reader-friendly panel.
```bash
cd backend
python app.py "Company Name" --per-q 2 --pdf-cap 5
```

Outputs land in `backend/data/<slug>/<timestamp>/transition` with:

- `queries.json`, `search_results.json`
- Markdown per source + `search_results_appendix.md`
- `evidence.md`
- `report.json` and `report_raw.txt`
`POST /api/report` kicks off the pipeline synchronously and returns the resulting JSON payload.
```bash
curl -X POST http://localhost:8000/api/report \
  -H "Content-Type: application/json" \
  -d '{"company": "Example Corp", "per_q": 2, "pdf_cap": 5}'
```

Response: `{ company, run_dir, report, artifacts }` with paths to saved files.
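The same request can be issued from Python with only the standard library; the payload keys mirror the curl example. This is a client sketch, not part of the project:

```python
import json
import urllib.request

def build_report_request(company: str, per_q: int = 2, pdf_cap: int = 5,
                         base: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request matching the /api/report payload shape."""
    payload = json.dumps({"company": company, "per_q": per_q, "pdf_cap": pdf_cap})
    return urllib.request.Request(
        f"{base}/api/report",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually run it (requires the backend to be up):
# with urllib.request.urlopen(build_report_request("Example Corp")) as resp:
#     report = json.load(resp)
```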
Streams structured SSE events while the pipeline runs.
```
GET /api/report/stream?payload={"company":"Example Corp","per_q":2,"pdf_cap":5}
```

Event types include `stage`, `progress`, `metric`, `ticker`, `artifact`, `error`, and `done`. Heartbeat comments are emitted every 10 seconds to keep connections alive.
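A client consuming this stream must split frames on blank lines, skip heartbeat comment lines (which start with `:`), and JSON-decode the `data:` payloads. A minimal parser sketch for an already-buffered stream (real clients parse incrementally):

```python
import json

def parse_sse(raw: str) -> list[dict]:
    """Parse decoded SSE text into a list of event dicts.

    Heartbeat comment frames (lines starting with ':') carry no
    'data:' lines and are skipped.
    """
    events = []
    for frame in raw.split("\n\n"):
        data_lines = [line[5:].lstrip() for line in frame.splitlines()
                      if line.startswith("data:")]
        if data_lines:
            # Multi-line data fields are joined with newlines per the SSE spec.
            events.append(json.loads("\n".join(data_lines)))
    return events
```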
```
GET /api/runs/{company_slug}/latest/report.json
GET /api/runs/{company_slug}/latest/evidence.md
```
The frontend uses these endpoints to recover results if the SSE stream drops.
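These recovery URLs key on the company slug. A plausible slugify helper for mapping a company name to that path segment (the project's actual slug rules may differ):

```python
import re

def slugify(company: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to hyphens, trim ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", company.lower())
    return slug.strip("-")
```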
- Markdown ingestion (`src/markdown_parallel.py`) enriches PDFs with auto-extracted tables when available.
- `src/evidence_md.py` routes snippets into transition-specific buckets (targets, capex alignment, policy engagement, etc.).
- `prompts/esg_report.txt` enforces anti-hallucination rules and structured JSON output with inline `[n]` citations.
- The React client's `SourceAwareText` component renders citations as clickable superscripts tied to the sources map.
- Pipeline helpers live under `backend/src/` (search, crawling, utils, chains, evidence extraction).
- Progress streaming is coordinated via `BuzzObserver` in `backend/app.py` and rendered through `ProgressPanel` on the frontend.
- `todo.md` tracks prioritized UX improvements (download buttons, expanded inputs, artifact exposure).
- The `progress_sse_implementation.md` design doc explains the Buzzline contract if you extend telemetry.
- Missing API keys: the pipeline will short-circuit if `OPENAI_API_KEY` or `TAVILY_API_KEY` is absent—double-check your `.env` files.
- SSE disconnects: the frontend automatically retries by querying the latest artifacts; ensure the backend run directory is writable.
- PDF parsing issues: adjust `FETCH_TIMEOUT_MS` or `FETCH_POLITENESS_SECONDS`, or disable table extraction by setting `FETCH_PDF_TABLES=0` if large PDFs cause timeouts.
- Rate limits: tune `TAVILY_QPS` or reduce `per_q` to stay within plan quotas.
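The `TAVILY_QPS` cap can be enforced with a simple interval limiter that spaces calls at least `1/qps` seconds apart. This sketch injects the clock and sleep functions so the pacing logic is testable without real delays; it is illustrative only, not the project's actual limiter:

```python
class QpsLimiter:
    """Space calls at least 1/qps seconds apart using an injected clock."""

    def __init__(self, qps: float, clock, sleep):
        self.interval = 1.0 / qps
        self.clock = clock      # returns current time in seconds
        self.sleep = sleep      # blocks for the given number of seconds
        self.next_slot = 0.0    # earliest time the next call may proceed

    def acquire(self) -> None:
        now = self.clock()
        if now < self.next_slot:
            # Too soon: wait out the remainder of the interval.
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
```

In production code you would pass `time.monotonic` and `time.sleep`; a shared limiter instance then throttles all Tavily workers together.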
Review `todo.md` for upcoming features such as additional pipeline tunables, improved artifact downloads, and UI polish.
Released under the MIT License (see LICENSE).
- Add license (MIT included).
- Confirm no secrets are committed. Keep keys in `backend/.env` and `frontend/.env.local` (both are git-ignored).
- Use the new `ALLOW_ORIGINS` env in the backend to restrict CORS to your domains in production.
- Ensure `backend/data/` (outputs) is git-ignored (already configured).
- Optionally add a short demo GIF/screencast to this README.
- Backend: copy `backend/.env.example` to `backend/.env` and set `OPENAI_API_KEY`, `TAVILY_API_KEY`, and (in production) `ALLOW_ORIGINS` to a comma-separated list of allowed origins.
- Frontend: copy `frontend/.env.local.example` to `frontend/.env.local` and set `NEXT_PUBLIC_API_BASE` to your backend URL.
- Show the end-to-end run: entering a company, live progress (SSE), and the final report with citations and evidence.
- Use conservative parameters (e.g., `per_q=1`, `pdf_cap=3`) to keep runtime short.
- Consider pre-generating a run and using the "Load latest" option for a quick view.
This project is for research and demonstration purposes only and does not constitute investment advice. Model outputs rely on public sources and may be incomplete or out of date.