This roadmap documents the evolution of the Research Project Template
infrastructure. For architecture details, see architecture.md and workflow.md.
Last verified: 2026-06-13 (package and latest published release at
v3.4.0). Measured metrics defer to
TO-DO.md and
docs/_generated/COUNTS.md
unless this file is re-measured.
- published the `v3.4.0` release and resolved the prior pending-tag gap
- rebaselined public exemplar scope to nine tracked templates under `projects/templates/`
- closed the v3.3 follow-up backlog sweep and moved live measured facts into generated docs and `TO-DO.md`
- see `CHANGELOG.md` for the full entry
- completed Pandoc DOCX rendering so output embeds figures and resolves cross-references (`infrastructure/rendering/pipeline.py`)
- ran a deep per-exemplar quality pass across the eight tracked public templates and completed and cross-linked sidecar publication metadata for all nine
- reconciled the generated project-scope collection count in `docs/_generated/COUNTS.md` to 216
- committed refreshed rendered `output/` artifacts for the public exemplars alongside source so the repo ships reproducible, inspectable deliverables
- see `CHANGELOG.md` for the full entry
- added reference-existence verification (`infrastructure/reference/verification`): a deterministic anti-hallucination gate that resolves each cited reference against Crossref → OpenAlex / arXiv (offline-first, opt-in live, SQLite cache)
- added an AI-writing fingerprint detector (`infrastructure/validation/content/ai_writing.py`, `validation.cli prose-quality`)
- shipped the evidence graph (EVIDENCE-GRAPH-1), reproduction bundle (REPRO-BUNDLE-1), release-readiness dashboard (DASHBOARD-1), pipeline plugin stages (PLUGIN-STAGES-1), and incremental pipeline skipping (INCREMENTAL-PIPELINE-1); all are opt-in, and the default plan is unchanged
- parallelized CI infrastructure tests (`pytest-xdist -n auto`) and derived the `test-project` matrix dynamically from `public_scope` (CI-MATRIX-DYNAMIC-1)
- quieted terminal logging (LOG-CLEAN-1), consolidated the safe markdown reader (READFILE-SAFE-1), and ran documentation-accuracy passes across `docs/` and infrastructure `{SKILL,README,AGENTS}.md`
- see `CHANGELOG.md` for the full entry
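
The offline-first, opt-in-live behaviour of the reference-existence gate described above can be illustrated with a minimal sketch. This is not the code in `infrastructure/reference/verification`: the function names, the table schema, and the resolver callables are all hypothetical, and the resolvers stand in for Crossref, OpenAlex, and arXiv lookups without performing any network calls.

```python
import sqlite3
from typing import Callable, Iterable, Optional


def make_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite cache of verification results (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refs (key TEXT PRIMARY KEY, found INTEGER NOT NULL)"
    )
    return conn


def verify_reference(
    conn: sqlite3.Connection,
    key: str,
    resolvers: Iterable[Callable[[str], bool]] = (),
    live: bool = False,
) -> Optional[bool]:
    """Return True/False if the reference is known, or None if unknown while offline."""
    # Offline-first: a cached answer always wins, so repeated runs are deterministic.
    row = conn.execute("SELECT found FROM refs WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return bool(row[0])
    # Live lookups are opt-in; without them an uncached reference stays unresolved.
    if not live:
        return None
    # Try resolvers in order (e.g. Crossref first, then fallbacks); any() stops at the first hit.
    found = any(resolver(key) for resolver in resolvers)
    conn.execute(
        "INSERT OR REPLACE INTO refs (key, found) VALUES (?, ?)", (key, int(found))
    )
    conn.commit()
    return found
```

Under these assumptions, a first offline call for an uncached key returns `None`, a live call records the answer, and later offline calls reuse it without touching the resolvers.
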
- added optional Pandoc-backed DOCX/EPUB rendering with per-format toggles (default off)
- extended the Active Inference validation spine with producer completeness, stale-artifact checks, and dependency graph v2
- hardened the public SIA harness boundary and re-baselined coverage gaps
- closed GitHub supply-chain hygiene (SHA-pinned actions, `actionlint` gate, guarded Dependabot automerge) and the XML-parser policy
- see `CHANGELOG.md` for the full entry
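
As an illustration of what "SHA-pinned actions" means in practice, the sketch below flags `uses:` references in workflow text that are not pinned to a full 40-character commit SHA. It is a hypothetical standalone check, not the repository's actual gate (which the list above attributes to `actionlint` and related CI wiring); the function name and regular expressions are assumptions made for this example.

```python
import re

# Matches "uses: owner/repo@ref" with an optional leading list dash; captures the reference.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)")
# A pinned reference ends in "@" followed by a full 40-hex-character commit SHA.
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            continue
        ref = match.group(1)
        # Local actions ("./path") live in the repo itself and carry no version ref.
        if ref.startswith("./"):
            continue
        if not SHA_RE.search(ref):
            unpinned.append(ref)
    return unpinned
```

A tag reference such as `actions/checkout@v4` would be reported, while the same action pinned to a commit SHA (optionally followed by a `# v4` comment) would pass.
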
- added the sixth public exemplar, `projects/templates/template_sia/`, plus the reusable `infrastructure.sia` harness and validation docs
- promoted the first Active Inference validation-spine tracks: `provenance`, `reproducibility`, and `counterexample`
- refreshed public project signposting to `projects/templates/...` and checked folder-level `AGENTS.md` / `README.md` coverage across public exemplars
- hardened public project coverage orchestration by pinning subprocess coverage behavior across project virtual environments
- shipped curated release notes and a `3.1.0` metadata bump
Roster note (post-`v3.1.0`): the git-tracked public exemplar set under `projects/templates/` has since grown beyond the six referenced above. `template_autoscientists`, `template_newspaper`, and `template_textbook` were added (each double-published as a standalone GitHub repo + Zenodo DOI), bringing the current public exemplar roster to nine. The authoritative, always-current count and names live in `docs/_generated/active_projects.md` and `docs/_generated/COUNTS.md`; consult those rather than this historical release log.
- mypy strict adopted as the baseline gate for `infrastructure/` (live counts in `TO-DO.md`)
- Ruff format enforcement
- Security hardening: Bandit MEDIUM+ gate in CI; pip-audit blocking since v0.7.2 (ignore list + retries; see `CHANGELOG.md`)
- Dockerfile modernised to `python:3.12` + `uv`
| Release | Theme |
|---|---|
| v2.0.0 | Two-layer architecture, thin orchestrator pattern, declared DAG pipeline, multi-project support |
| v2.1.0 | Unified intelligent logging: ProjectLogger, structured format, log_operation(), format_duration() |
| v2.1.1 | CI Zero-Mock gate (verify_no_mocks.py); mock/fake patterns eliminated from suite |
| v2.2.0 | Orchestration hermeticity: script discovery, get_subprocess_env(), hermetic subprocess env |
| v2.3.0 | Type safety: TypedDicts for config, ResolvedTestingConfig, ProjectInfo dataclass |
| v2.4.0 | Test isolation: real tmp_path + env-isolation fixtures (pytest's monkeypatch fixture remains permitted by the no-mocks policy and is still used for boundary substitution and env isolation; it was not eliminated) |
| v2.5.0 | Structured log assertions: caplog-based, log_parser.py |
| v2.6.0 | Ruff lint remediation: 710 → 0 errors across infrastructure/, scripts/, tests/ |
| v2.7.0 | Type narrowing & mypy baseline: 100 → 0 errors across core/ |
| v2.8.0 | Error reporting & resilience: typed InfraError constants, standardized error format |
| v2.9.0 | Documentation parity: python3 → uv run python, auto-generated API reference |

161-commit systematic blind-review campaign across all infrastructure packages:

- Import hygiene (unused imports, `sys.path` mutations, `TYPE_CHECKING` guards)
- Exception narrowing (specific types, context restoration, no silent swallowing)
- Dead-code removal (`coverage_reporter.py`, stub wrappers, passthrough methods)
- Type annotations modernised (legacy `typing` → built-in generics)
- API surface consolidation (`OllamaClientConfig`, `PerformanceMetrics`, `ProjectLogger`)
- Bug fixes (inverted bool, stall detection, path bugs, broken imports)
- Structural: eliminated `core.py` hub, extracted `_build_stage_list`, broke circular dep
- Logging noise reduction; docstring bloat; test name collisions resolved

The active planning surface is TO-DO.md. It is intentionally
more specific than this roadmap and now groups work as Minor, Medium,
and Major.

- keep `TO-DO.md`, generated facts, release metadata, and public-scope docs synchronized after each release or major verifier pass
- keep forkability checks and regression pins green, finish the active-inference fixed-point cluster, and only then add new scientific surface

The GitHub supply-chain hygiene items (SHA-pinned actions, `actionlint`, safe Dependabot automerge) shipped in `v3.2.0`; terminal logging cleanup (LOG-CLEAN-1), the safe markdown-read helper (READFILE-SAFE-1), and the dynamic CI project matrix (CI-MATRIX-DYNAMIC-1) shipped in `v3.3.0`. See Completed Releases above.

The authoritative open backlog lives in TO-DO.md. Current
themes are verifier-first maintenance items:

- Active-inference semantic fixed point (`AI-SEMANTIC-FIXPOINT-1`): finish the variable-generation and composed sheaf/roadmap cluster before expanding scientific scope.
- Active-inference gate runtime (`AI-GATE-PERF-2`): reduce redundant slow refreshes only after the semantic fixed-point and negative controls are green.
- AutoResearch report and benchmark ergonomics (`AR-REPORT-ERGONOMICS-1`, `AR-BENCHMARK-ERGONOMICS-1`): clarify reviewer-facing evidence and benchmark boundaries before adding new evidence types.
- AutoResearch source-ledger contract (`AR-SOURCE-LEDGER-2`): promote claim/source drift checks into project-local gates.

The prior vision list has fully shipped: evidence graph (EVIDENCE-GRAPH-1), incremental pipeline (INCREMENTAL-PIPELINE-1), plugin architecture (PLUGIN-STAGES-1), hermetic release bundles (REPRO-BUNDLE-1), and local dashboard (DASHBOARD-1) all landed in `v3.3.0` (opt-in/default-off). See Completed Releases above. The live follow-up work for these capabilities is tracked in the open backlog above and in `TO-DO.md`; new long-horizon vision items should replace this note as they are defined.

Use TO-DO.md as the authoritative backlog and live snapshot.
The current open top items are:

- `AI-SEMANTIC-FIXPOINT-1`
- `AI-GATE-PERF-2`
- `AR-REPORT-ERGONOMICS-1`
- `AR-BENCHMARK-ERGONOMICS-1`
- `AR-SOURCE-LEDGER-2`

Shipped and not re-tracked here: the GitHub supply-chain hygiene set
(GH-PIN-1, GH-ACTIONLINT-1, GH-AUTOMERGE-1), LOG-CLEAN-1,
READFILE-SAFE-1, CI-MATRIX-DYNAMIC-1, FMT-BUNDLE-1, AI-SPINE-V2,
COVERAGE-REBASE-1, and the five v3.3.0 Major capabilities
(EVIDENCE-GRAPH-1, INCREMENTAL-PIPELINE-1, PLUGIN-STAGES-1,
REPRO-BUNDLE-1, DASHBOARD-1), plus pip-audit blocking CI, Bandit LOW triage,
docs-lint CI, the per-project test runner, and SIA public-exemplar promotion. See
CHANGELOG.md for the release history.
Authoritative counts and gate outputs live in the Live state snapshot
table in TO-DO.md (re-baseline there after substantive
changes). This roadmap avoids duplicating numbers that drift between audits.
| Topic | Where to verify |
|---|---|
| mypy / ruff / Bandit / pip-audit / health | TO-DO.md |
| CI wiring | .github/workflows/ci.yml, .github/AGENTS.md |
| Coverage gaps | coverage-gaps.md |

- Contributing: how to contribute to the template
- `../../TO-DO.md`: active backlog with acceptance criteria
- `../../CHANGELOG.md`: historical release notes