feat: milestones v1.1–v1.5 — adapters, observability, testing, streaming by RichardHightower · Pull Request #4 · SpillwaveSolutions/agent-cron

RichardHightower · 2026-02-23T22:51:39Z

Summary

- v1.1 — Adapter framework: config hot-reload with ArcSwap, custom adapter TOML definitions, CLI flag fixes for all 5 adapters
- v1.2 — Job observability: history/stats CLI, log access (logs show/logs tail), live monitoring (watch dashboard), alerting & webhook notifications, follow-log path alignment
- v1.3 — Adapter verification: comprehensive command construction tests for Claude, OpenCode, Gemini, Codex, Copilot
- v1.4 — End-to-end testing: test harness infrastructure, lifecycle/failure/concurrency/integration/streaming/webhook/hotreload E2E suites (76 tests)
- v1.5 — Multi-CLI integration testing: CLI discovery with capability matrix, workspace isolation, skip macros, smoke test planning
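The v1.1 hot-reload item relies on ArcSwap (the external `arc-swap` crate). As a rough, std-only sketch of the same idea, readers take a cheap `Arc` snapshot of the current config while a reloader publishes a replacement. Here `RwLock<Arc<T>>` stands in for `ArcSwap<T>` (same snapshot semantics, more lock contention), and `SchedulerConfig`/`ConfigHandle` are hypothetical names, not types from this PR.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical config type; the real schema lives in the PR's config module.
#[derive(Debug, Clone, PartialEq)]
pub struct SchedulerConfig {
    pub max_concurrent_jobs: usize,
    pub default_adapter: String,
}

/// Std-only stand-in for `ArcSwap<SchedulerConfig>`: readers clone the inner
/// `Arc` (a snapshot) and release the lock at once, so a reload never
/// mutates a config that an in-flight job is still reading.
pub struct ConfigHandle {
    current: RwLock<Arc<SchedulerConfig>>,
}

impl ConfigHandle {
    pub fn new(initial: SchedulerConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Snapshot the current config (analogous to `ArcSwap::load_full`).
    pub fn load(&self) -> Arc<SchedulerConfig> {
        Arc::clone(&self.current.read().expect("config lock poisoned"))
    }

    /// Publish a freshly parsed config (analogous to `ArcSwap::store`).
    pub fn store(&self, next: SchedulerConfig) {
        *self.current.write().expect("config lock poisoned") = Arc::new(next);
    }
}
```

A job that took a snapshot before `store` keeps seeing the old values, which is the property that makes swapping configs under running jobs safe.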

130 files changed, ~23k lines added across Rust source, E2E tests, and planning docs. All 269 tests pass (189 unit + 76 E2E + 4 doc-tests).

Test plan

- cargo test — 269 passed, 0 failed
- E2E suite — 76 integration tests covering job lifecycle, failure modes, concurrency, streaming, webhooks, hot-reload
- CLI binary tests — 14 tests (help, version, completions, error cases)
- All 5 CLIs detected in discovery pre-flight (claude, opencode, gemini, codex, copilot)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <[email protected]>

…bing, and LazyLock caching
- CliStatus/CliDiscovery structs for 5 CLIs (claude, opencode, gemini, codex, copilot)
- probe_binary uses which + --help (sync std::process::Command, not tokio)
- probe_auth with per-CLI strategies (subcommand, env var, or both)
- LazyLock DISCOVERY static prints pre-flight summary on first access
- Version extraction from --help/--version output
Co-Authored-By: Claude Opus 4.6 <[email protected]>
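The discovery commit extracts CLI versions from `--help`/`--version` output. One possible heuristic, sketched here, returns the first whitespace-separated token that looks like a dotted numeric version. `extract_version` in this sketch is a hypothetical helper; the real parser in `cli_discovery.rs` may use different rules.

```rust
/// Return the first token that looks like a dotted version number
/// (e.g. "1.2.3" or "v0.9"), ignoring surrounding punctuation.
pub fn extract_version(output: &str) -> Option<String> {
    output
        .split_whitespace()
        // Strip brackets, colons, commas, etc., but keep interior dots.
        .map(|tok| tok.trim_matches(|c: char| !c.is_ascii_alphanumeric() && c != '.'))
        // Drop a sentence-ending period such as "v2.5.".
        .map(|tok| tok.trim_end_matches('.'))
        // Accept an optional leading "v".
        .map(|tok| tok.strip_prefix('v').unwrap_or(tok))
        .find(|tok| {
            let parts: Vec<&str> = tok.split('.').collect();
            parts.len() >= 2
                && parts.iter().all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
        })
        .map(str::to_string)
}
```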

…caching
- cli_capabilities.toml with 5 CLI configs (hooks, auto-approve, prompt delivery)
- CliCapability struct with Deserialize + convenience methods
- load_capabilities() uses include_str! for compile-time embedding
- CAPABILITIES LazyLock for once-per-process loading
Co-Authored-By: Claude Opus 4.6 <[email protected]>
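The capability matrix is loaded once per process through a `LazyLock`. This sketch keeps that shape without external crates. Where the PR embeds `cli_capabilities.toml` with `include_str!` and deserializes it with serde, this version hard-codes a small table instead. The CLI names `alpha`/`beta`, the field values, and the `capability()` helper are placeholders, not the real matrix. A counter confirms that the initializer runs only on first access.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Placeholder record; field names follow the commit message
/// (hooks, auto-approve, prompt delivery), values are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct CliCapability {
    pub hooks: bool,
    pub auto_approve: bool,
    pub prompt_delivery: &'static str,
}

impl CliCapability {
    /// Example convenience method of the kind the commit describes.
    pub fn supports_unattended_runs(&self) -> bool {
        self.auto_approve
    }
}

/// Counts how many times the table has been built (should stay at 1).
pub static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Built on first access, then shared for the rest of the process.
/// (The PR parses embedded TOML here; this sketch uses a literal table.)
pub static CAPABILITIES: LazyLock<HashMap<&'static str, CliCapability>> = LazyLock::new(|| {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    HashMap::from([
        ("alpha", CliCapability { hooks: true, auto_approve: true, prompt_delivery: "argument" }),
        ("beta", CliCapability { hooks: false, auto_approve: false, prompt_delivery: "stdin" }),
    ])
});

/// Look up one CLI's capabilities, forcing the lazy load if needed.
pub fn capability(cli: &str) -> Option<&'static CliCapability> {
    CAPABILITIES.get(cli)
}
```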

- SUMMARY.md with execution metrics and self-check
- STATE.md updated with position, decisions, and session info
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Fake HOME + XDG_CONFIG_HOME prevents CLIs from reading real user config
- Git-initialized workspace with user identity for CLI compatibility
- apply_env helpers for std::process and tokio::process Commands
- No std::env::set_var -- all overrides via Command::env() only
Co-Authored-By: Claude Opus 4.6 <[email protected]>
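Workspace isolation points `HOME` and `XDG_CONFIG_HOME` at throwaway directories on each spawned `Command`, never through `std::env::set_var`. A minimal sketch of that idea follows. `IsolatedWorkspace` and its directory layout are hypothetical, and the git-initialization step from the commit is left out.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process::Command;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical per-test sandbox: a fake HOME plus an XDG config dir.
pub struct IsolatedWorkspace {
    pub root: PathBuf,
    pub home: PathBuf,
    pub xdg_config: PathBuf,
}

impl IsolatedWorkspace {
    /// Create a uniquely named directory tree under the system temp dir.
    pub fn create(label: &str) -> io::Result<Self> {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let root = std::env::temp_dir()
            .join(format!("agcron-{label}-{}-{nanos}", std::process::id()));
        let home = root.join("home");
        let xdg_config = home.join(".config");
        fs::create_dir_all(&xdg_config)?;
        Ok(Self { root, home, xdg_config })
    }

    /// Apply overrides to this one command only; the test process's own
    /// environment is never modified.
    pub fn apply_env<'a>(&self, cmd: &'a mut Command) -> &'a mut Command {
        cmd.env("HOME", &self.home)
            .env("XDG_CONFIG_HOME", &self.xdg_config)
            .current_dir(&self.root)
    }
}

impl Drop for IsolatedWorkspace {
    fn drop(&mut self) {
        // Best-effort cleanup; ignore errors if the tree is already gone.
        let _ = fs::remove_dir_all(&self.root);
    }
}
```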

… wire all Phase 29 modules
- Add SKIP_LOG global counter and record_skip() to cli_discovery.rs
- Create test_discovery.rs with 11 tests: discovery, capabilities, workspace, skip macros, summary
- Wire cli_discovery, cli_capabilities, cli_workspace, test_discovery into e2e.rs
- Fix pre-existing test_extract_version_none bug (test string contained "version")
- require_cli!, require_cli_auth!, require_capability! macros with skip recording
Co-Authored-By: Claude Opus 4.6 <[email protected]>
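The skip macros return early from a test and record the reason. This sketch shows one way to wire a `require_cli!`-style macro to a global counter. Its macro signature is an assumption for illustration: it takes an availability flag instead of consulting the real `DISCOVERY` static. `skip_count()` is also hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Global tally of skipped tests, in the spirit of the PR's SKIP_LOG counter.
pub static SKIP_LOG: AtomicUsize = AtomicUsize::new(0);

/// Record one skip and say why on stderr so it appears in test output.
pub fn record_skip(cli: &str, reason: &str) {
    SKIP_LOG.fetch_add(1, Ordering::SeqCst);
    eprintln!("SKIP [{cli}]: {reason}");
}

/// Current number of recorded skips.
pub fn skip_count() -> usize {
    SKIP_LOG.load(Ordering::SeqCst)
}

/// Hypothetical form: `require_cli!(available, "name")` returns from the
/// enclosing test (which must return `()`) when the CLI is unavailable.
macro_rules! require_cli {
    ($available:expr, $cli:expr) => {
        if !$available {
            record_skip($cli, "binary not found on PATH");
            return;
        }
    };
}
```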

- 29-02-SUMMARY.md with metrics, decisions, deviation documentation
- STATE.md updated: Phase 29 complete (2/2 plans), ready for Phase 30
Co-Authored-By: Claude Opus 4.6 <[email protected]>

5/5 requirements done (DISC-01..04, FAIL-08). 17/17 must-haves verified. 462 tests passing (386 unit + 76 e2e), zero regressions.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Enable research step in GSD workflow config and remove completed cch→rulez renaming todo that is no longer relevant.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

…uction
- RealCliHarness wraps TestHarness with real GenericCliAdapter from CliAdapterConfig builtins
- Supports all 5 CLIs: claude, opencode, gemini, codex, copilot
- Provides build_real_registry() for constructing AdapterRegistry with real adapter
Co-Authored-By: Claude Opus 4.6 <[email protected]>
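`RealCliHarness` builds an `AdapterRegistry` from `CliAdapterConfig` builtins. As a rough illustration of that config-driven pattern (not the PR's actual types), this sketch keeps one generic adapter per CLI name and derives its argument vector purely from configuration. Every type, field, and flag in it is a hypothetical placeholder, not a real CLI option.

```rust
use std::collections::HashMap;

/// Hypothetical adapter config: the binary plus the flags it expects.
#[derive(Debug, Clone)]
pub struct AdapterConfig {
    pub binary: String,
    pub base_args: Vec<String>,
    pub model_flag: Option<String>,
}

/// One generic adapter type serves every CLI; only the config differs.
#[derive(Debug, Clone)]
pub struct GenericAdapter {
    pub config: AdapterConfig,
}

impl GenericAdapter {
    /// Build argv (excluding the binary) for a prompt and optional model.
    pub fn build_args(&self, prompt: &str, model: Option<&str>) -> Vec<String> {
        let mut args = self.config.base_args.clone();
        // Only pass a model when both the CLI supports it and one was requested.
        if let (Some(flag), Some(model)) = (&self.config.model_flag, model) {
            args.push(flag.clone());
            args.push(model.to_string());
        }
        args.push(prompt.to_string());
        args
    }
}

/// Registry keyed by CLI name, similar in spirit to an AdapterRegistry.
pub fn build_registry(configs: Vec<(&str, AdapterConfig)>) -> HashMap<String, GenericAdapter> {
    configs
        .into_iter()
        .map(|(name, config)| (name.to_string(), GenericAdapter { config }))
        .collect()
}
```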

…st binary
- Add real_cli_harness to Phase 29 infrastructure section
- Add test_smoke placeholder module for Phase 30 smoke tests
- Both modules compile as part of the e2e test binary
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Created 30-01-SUMMARY.md with execution results
- Updated STATE.md position to Phase 30 plan 1/2
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Three async helper functions: smoke_echo, smoke_file_creation, smoke_model_flag
- Module-local skip macros: require_cli!, require_cli_auth!, require_capability!
- Structural-only assertions (never assert on AI output content)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
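The smoke helpers assert only on structure, never on what the model wrote, because AI output varies from run to run. A hypothetical checker along those lines could look like this. `RunOutcome` and its fields are assumptions, not the harness's real types.

```rust
/// Hypothetical summary of one smoke run.
pub struct RunOutcome {
    pub exit_code: Option<i32>,
    pub stdout: String,
    /// `Some(..)` only for scenarios that expect a marker file on disk.
    pub marker_file_exists: Option<bool>,
}

/// Check shape only: clean exit, some output, and (when the scenario asked
/// for one) a marker file. The text of `stdout` is never inspected.
pub fn check_structure(outcome: &RunOutcome) -> Result<(), String> {
    match outcome.exit_code {
        Some(0) => {}
        other => return Err(format!("expected exit code 0, got {other:?}")),
    }
    if outcome.stdout.trim().is_empty() {
        return Err("expected non-empty stdout".to_string());
    }
    if outcome.marker_file_exists == Some(false) {
        return Err("expected marker file to be written".to_string());
    }
    Ok(())
}
```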

- SMOK-01: 5 echo round-trip tests (claude, opencode, gemini, codex, copilot)
- SMOK-02: 5 file creation tests verifying marker file written to disk
- SMOK-03: 5 model flag passthrough tests verifying history entry records model
- zzz_smoke_skip_summary test prints accumulated skip table
- All 15 tests are #[ignore] and gated by require_cli_auth! macro
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Phase 30 complete: RealCliHarness infrastructure + 15 smoke tests (5 CLIs x 3 requirements: echo, file creation, model flag). All 8/8 must-haves verified. 77 e2e tests pass, 15 ignored (real CLI).
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- 5 FAIL-05 tests: missing binary produces Crashed state for all 5 CLI adapters
- 5 FAIL-06 tests: auth failure produces Failed state for all 5 CLI adapters
- Shared helpers: adapter_config_for(), fail05_missing_binary(), fail06_auth_failure()
- Module registered in e2e.rs
Co-Authored-By: Claude Opus 4.6 <[email protected]>
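FAIL-05 and FAIL-06 distinguish a binary that cannot be spawned (Crashed) from one that runs and exits non-zero (Failed). This std-only sketch of that classification assumes a simplified `JobState` enum rather than the PR's real type. It relies on `Command::output` returning an `io::ErrorKind::NotFound` error on Unix when the program does not exist.

```rust
use std::io;
use std::process::Command;

/// Simplified stand-in for the scheduler's job states.
#[derive(Debug, PartialEq, Eq)]
pub enum JobState {
    Completed,
    Failed(Option<i32>),
    Crashed(String),
}

/// Run a command to completion and map the result onto a job state.
pub fn run_and_classify(program: &str, args: &[&str]) -> JobState {
    match Command::new(program).args(args).output() {
        // The process never started because the binary does not exist.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            JobState::Crashed(format!("binary not found: {program}"))
        }
        // Any other spawn error (e.g. permission denied) is also a crash.
        Err(e) => JobState::Crashed(e.to_string()),
        // The process ran; a non-zero exit (such as an auth error) is a failure.
        Ok(out) if out.status.success() => JobState::Completed,
        Ok(out) => JobState::Failed(out.status.code()),
    }
}
```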

…dapters
- 5 FAIL-07 tests: SIGTERM-resistant script with 2s timeout produces Timeout state
- Proves SIGTERM->SIGKILL escalation: elapsed >= 3s (timeout + grace period)
- Uses trap '' TERM with busy-wait pattern to resist SIGTERM
- All 15 tests pass together, full suite has no regressions
Co-Authored-By: Claude Opus 4.6 <[email protected]>
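The FAIL-07 tests depend on the scheduler escalating from SIGTERM to SIGKILL when a process ignores the first signal. This sketch shows one Unix-only way to do that with the standard library. std has no SIGTERM API, so it sends SIGTERM through the shell's `kill` builtin, then falls back to `Child::kill`, which sends SIGKILL on Unix. `run_with_deadline` and `Outcome` are hypothetical names, and the PR's own implementation may differ.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Exited on its own before the timeout.
    Exited(Option<i32>),
    /// Stopped after SIGTERM, within the grace period.
    Terminated,
    /// Ignored SIGTERM and had to be SIGKILLed.
    Killed,
}

/// Poll `child` until it exits or `limit` elapses; returns the exit
/// status if it exited in time, `None` otherwise.
fn wait_up_to(child: &mut Child, limit: Duration) -> io::Result<Option<ExitStatus>> {
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= limit {
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}

/// Unix-only: wait for `timeout`, then SIGTERM, then SIGKILL after `grace`.
pub fn run_with_deadline(mut cmd: Command, timeout: Duration, grace: Duration) -> io::Result<(Outcome, Duration)> {
    let start = Instant::now();
    let mut child = cmd.spawn()?;
    if let Some(status) = wait_up_to(&mut child, timeout)? {
        return Ok((Outcome::Exited(status.code()), start.elapsed()));
    }
    // Polite request first: SIGTERM via the shell's `kill` builtin.
    let _ = Command::new("sh").arg("-c").arg(format!("kill -TERM {}", child.id())).status()?;
    if wait_up_to(&mut child, grace)?.is_some() {
        return Ok((Outcome::Terminated, start.elapsed()));
    }
    // Escalate: Child::kill sends SIGKILL on Unix; wait() reaps the process.
    child.kill()?;
    child.wait()?;
    Ok((Outcome::Killed, start.elapsed()))
}
```

As in the PR's tests, a script that runs `trap '' TERM` before busy-waiting ignores the SIGTERM, so the total elapsed time is at least timeout + grace.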

- SUMMARY.md with 15 per-adapter failure mode tests across FAIL-05/06/07
- STATE.md updated: Phase 31 complete, metrics recorded
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add AGCRON_SKIP::{cli}::{reason} stdout marker in record_skip() for report parsing
- Add quick-junit = "0.5" to [dependencies] for JUnit XML generation
- Add [[bin]] target for test-report binary
Co-Authored-By: Claude Opus 4.6 <[email protected]>
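The `AGCRON_SKIP::{cli}::{reason}` marker gives the report binary a line it can recognize in captured test stdout. Formatting and parsing the marker can be sketched as follows. The function names are hypothetical. Splitting on the first `::` after the prefix assumes that CLI names never contain `::`, while reasons may.

```rust
const SKIP_PREFIX: &str = "AGCRON_SKIP::";

/// Format the marker line a skipped test prints to stdout.
pub fn skip_marker(cli: &str, reason: &str) -> String {
    format!("{}{}::{}", SKIP_PREFIX, cli, reason)
}

/// Recover `(cli, reason)` from a marker line, or `None` for any other line.
pub fn parse_skip_marker(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim().strip_prefix(SKIP_PREFIX)?;
    // The first "::" separates the CLI name from the free-form reason.
    let (cli, reason) = rest.split_once("::")?;
    if cli.is_empty() { None } else { Some((cli, reason)) }
}
```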

…nit output
- Parse cargo test JSON line-by-line for test results
- Detect skips via AGCRON_SKIP:: stdout marker (not just event status)
- Generate test-results.json with CLI x scenario matrix (REPT-01)
- Generate colored terminal table with per-CLI tallies (REPT-02)
- Generate test-results.xml JUnit XML with one suite per CLI (REPT-03)
- Dynamic discovery of CLIs and scenarios from test output
- 15 unit tests covering parsing, skip detection, and output generation
Co-Authored-By: Claude Opus 4.6 <[email protected]>
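The report generator folds per-test results into a CLI x scenario matrix. It treats a skip marker in stdout as a skip even when the test itself reported success. Here is a sketch of that aggregation step. It operates on records assumed to be already parsed from the JSON stream, and it assumes test names end in `_{cli}` (e.g. `smoke_echo_claude`). None of these types come from the PR.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
    Skip,
}

/// One test result, assumed already extracted from `cargo test` JSON output.
pub struct TestRecord<'a> {
    pub name: &'a str,
    pub passed: bool,
    pub stdout: &'a str,
}

/// A skip marker wins over the test's own pass/fail status.
pub fn classify(record: &TestRecord<'_>) -> Status {
    if record.stdout.lines().any(|l| l.trim_start().starts_with("AGCRON_SKIP::")) {
        Status::Skip
    } else if record.passed {
        Status::Pass
    } else {
        Status::Fail
    }
}

/// Split "module::smoke_echo_claude" into ("smoke_echo", "claude").
pub fn scenario_and_cli(name: &str) -> Option<(&str, &str)> {
    let short = name.rsplit("::").next()?;
    short.rsplit_once('_')
}

/// cli -> scenario -> status; CLIs and scenarios are discovered from names.
pub fn build_matrix(records: &[TestRecord<'_>]) -> BTreeMap<String, BTreeMap<String, Status>> {
    let mut matrix: BTreeMap<String, BTreeMap<String, Status>> = BTreeMap::new();
    for record in records {
        if let Some((scenario, cli)) = scenario_and_cli(record.name) {
            matrix
                .entry(cli.to_string())
                .or_default()
                .insert(scenario.to_string(), classify(record));
        }
    }
    matrix
}

/// Per-CLI (pass, fail, skip) tally, e.g. for a terminal summary table.
pub fn tally(row: &BTreeMap<String, Status>) -> (usize, usize, usize) {
    let count = |s: Status| row.values().filter(|&&v| v == s).count();
    (count(Status::Pass), count(Status::Fail), count(Status::Skip))
}
```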

- GitHub Actions workflow with 3AM UTC cron schedule and manual dispatch
- Per-CLI API key secrets (Anthropic, OpenAI, Gemini, GitHub)
- Nightly Rust toolchain for --format json test output
- JUnit report via mikepenz/action-junit-report with test-results.xml
- Matrix summary written to GITHUB_STEP_SUMMARY and stdout
- Artifact upload on failure (JSON, XML, logs, 14-day retention)
- Fork guard prevents runs on non-SpillwaveSolutions repos
- Copilot browser OAuth skip documented in header comments
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Summary documenting workflow creation and verification
- STATE.md updated: Phase 32 complete, v1.5 milestone complete
Co-Authored-By: Claude Opus 4.6 <[email protected]>

…verified
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Closes critical audit gap: Phase 31's 15 failure tests lacked #[ignore], which excluded them from the CI pipeline's --ignored run. Adds Phase 33 to restore FAIL-05/06/07 and CIPL-04 coverage in nightly CI.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Annotated 5 FAIL-05 (missing binary) tests with #[ignore]
- Annotated 5 FAIL-06 (auth failure) tests with #[ignore]
- Annotated 5 FAIL-07 (timeout SIGKILL) tests with #[ignore]
- Tests now visible to CI pipeline's cargo test -- --ignored run
Co-Authored-By: Claude Opus 4.6 <[email protected]>
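Some background on the audit gap. A test marked `#[ignore]` is skipped by a plain `cargo test` and runs only under `cargo test -- --ignored` (or `--include-ignored`). Because the nightly job runs only `--ignored`, the unannotated Phase 31 tests never reached it. The shape of an annotated test is sketched below; the helper and test names are placeholders, not the PR's code.

```rust
/// Placeholder for a real FAIL-07 scenario body (hypothetical).
pub fn timeout_escalates() -> bool {
    true
}

#[cfg(test)]
mod failure_modes {
    // The optional reason string documents why the test is ignored by default.
    #[test]
    #[ignore = "failure-mode test; run with `cargo test -- --ignored`"]
    fn fail07_timeout_sigkill_example() {
        assert!(super::timeout_escalates());
    }
}
```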

- Created 33-01-SUMMARY.md with execution results
- Updated STATE.md with phase 33 position and decisions
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- REQUIREMENTS.md: check all v1.5 requirement boxes (were unchecked despite Done in traceability)
- ROADMAP.md: update v1.5 progress row (was TBD/In progress, now 8 plans/Complete/2026-03-05)
- STATE.md: fix velocity metrics (v1.5 not v1.4, 8 plans, ~2.5min avg not ~12min)
- PROJECT.md: v1.5 now shown as completed, not current/starting
- MILESTONES.md: add v1.5 section
- v1.5-MILESTONE-AUDIT.md: status passed (was gaps_found), GAP-1 closed by Phase 33
- Remove empty phases/18-e2e-test-infrastructure directory
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Archive ROADMAP.md → milestones/v1.5-ROADMAP.md
- Archive REQUIREMENTS.md → milestones/v1.5-REQUIREMENTS.md
- Move MILESTONE-AUDIT.md → milestones/v1.5-MILESTONE-AUDIT.md
- Delete REQUIREMENTS.md (fresh for next milestone)
- Collapse ROADMAP.md v1.5 into <details> block
- Evolve PROJECT.md: v1.5 requirements validated, key decisions added
- Update STATE.md: milestone complete, ready for next
Co-Authored-By: Claude Opus 4.6 <[email protected]>

RichardHightower and others added 30 commits February 22, 2026 19:51

docs: start milestone v1.5 Multi-CLI Integration Testing

e456749

docs: v1.5 research — multi-CLI integration testing

8b36170

docs: define milestone v1.5 requirements (17 requirements)

b3a65b5

docs: create milestone v1.5 roadmap (4 phases, 17 requirements)

a84895c

docs(29): capture phase context

3d2bb70

docs(29): research CLI discovery and test harness phase

7a9ab2f

docs(29): create phase plan

bfaa780

fix(29): revise plans based on checker feedback

940f2eb

docs(29-01): complete CLI discovery and capability matrix plan

b7e76fb

docs(29-02): complete workspace isolation and skip macros plan

f87bf29

docs(phase-29): complete phase execution — discovery + harness verified

b4c2ebd

docs(phase-30): complete smoke tests research

db6046d

docs(30): create phase plan for smoke tests

3722a54

chore: enable research workflow toggle, remove obsolete todo

a3b6b5b

docs(30-01): complete RealCliHarness infrastructure plan

af016c1

docs(30-02): complete smoke tests plan

c7c48d3

docs(phase-31): complete failure mode tests research

b2732d2

docs(31): create phase plan for failure mode tests

db3712b

docs(31-01): complete failure mode tests plan

300dec7

RichardHightower and others added 18 commits February 24, 2026 19:07

docs(phase-31): complete phase execution — failure mode tests verified

165af18

docs(32): research phase domain — reporting and CI pipeline

3a8adea

docs(32): create phase plan for reporting and CI pipeline

540d33b

docs(32-01): complete test report generator plan

199504b

docs(32-02): complete nightly CI pipeline plan

b90ceaa

docs(phase-32): complete phase execution — reporting and CI pipeline …

08bb5e1

…verified Co-Authored-By: Claude Opus 4.6 <[email protected]>

docs(v1.5): milestone audit — 1 critical gap found (Phase 31 #[ignore])

3499113

docs(33): create phase plan for wiring failure tests to CI

96e2a9d

docs(33-01): complete wire-failure-tests-to-ci plan

6cd91c0

docs(phase-33): complete phase execution — failure tests wired to CI

6ca831c

test(33): complete UAT - 4 passed, 0 issues

f02bc44

RichardHightower wants to merge 48 commits into main from feature/phase-23-streaming-notifications

RichardHightower commented Feb 23, 2026
