Last updated: 2026-05-01 Active workstream: Hardening pass ✅ complete through Phase 6 (see
docs/hardening-roadmap-2026-04-16.md) Paused workstream: V1 product backlog — Phase 7.2 / 7.5 / 7.6 / 7.7 (seedocs/roadmap.md)
This file is the single source of truth for /session-start task selection. Phased lists are ordered by dependency. The first unchecked [ ] item is the next task.
Full item specs (Problem, Fix, Files, Acceptance, Effort) live in docs/hardening-roadmap-2026-04-16.md. GitHub Issues are filed per item for tracking.
- H-1 #5 Sandbox external content in LLM prompts — M — no deps ✅ 2026-04-19
- H-2 #6 Populate
retentionExpiresAtat PII collection time — M — no deps ✅ 2026-04-30 - H-3 #7 Stop logging raw PII to stdout — S — no deps ✅ 2026-04-30
- H-4 #8 Upgrade Anthropic default model to
claude-sonnet-4-6— S — no deps ✅ 2026-04-30 - H-5 #9 Replace hand-rolled config validator with Zod — S — no deps ✅ 2026-04-30
- H-10 #10 Stable sort for GitHub repo selection — S — no deps ✅ 2026-04-30
- H-6 #11 Zod-parse checkpoint and intake-context deserialization — S — needs #9 ✅ 2026-04-30
- H-11 #12 Zod-parse external API responses — M — needs #9 ✅ 2026-05-01
- H-7 #13 Real token-usage accounting — M — no deps (pairs with E-2) ✅ 2026-04-30
- H-8 #14 Fix malformed SearchConfig in budget gate — S — no deps ✅ 2026-05-01
- H-9 #15 Penalize the score on hallucinated IDs — S — design-decision: soft proportional + 0.15 floor, surfaced to user, no cap ✅ 2026-04-30
- H-1 follow-up #18 Run behavioral adversarial eval with real LLM — narrative paraphrase directive — ✅ 2026-04-30 (5/5 defended)
- E-2 Structured logging & run telemetry — M — pairs with #7 ✅ 2026-05-01
- E-4 Versioned prompt registry — S — no deps ✅ 2026-05-01
- H-12 #16 Grow scoring-package test coverage — M — needs #5, #15 ✅ 2026-05-01
- H-13 #17 Document plaintext-PII-at-rest posture — S — no deps ✅ 2026-05-01 (encryption follow-up #21)
- E-3 Cache-driven replay mode — S–M — needs E-4 ✅ 2026-05-01
- E-1 Golden-set evaluation harness — L — needs E-2 ✅ 2026-05-01
- E-5 Opus-4.7 / 1M-context batch scoring spike — M + L — recommendation: continue as experimental, do not make default without live-run lift ✅ 2026-05-01
Minimum-viable hardening pass: Phase 1 + H-5 + H-7. Closes every High-severity finding plus the most important Medium in 2–3 sessions.
Full plan in docs/roadmap.md. Resumed after hardening lands.
- 7.2 Post-discovery expansion (
find_similar) — bounded recursion on top-scoring candidates - 7.5 Premium adapters —
adapter-pearch,adapter-pdl,adapter-contactout - 7.6
output-sheets— Google Sheets adapter (deferred from Phase 6, OAuth complexity) - 7.7 Advanced intake — competitor mapping, anti-pattern filtering
- Phases 1–6 + 7.1 / 7.3 / 7.4 (2026-04-06) — core pipeline, budget estimation, non-interactive mode, run management. See
docs/roadmap.mdfor details. - 2026-04-16 audit — full-repo security/privacy/correctness sweep. Output:
docs/hardening-roadmap-2026-04-16.md.