SpillwaveSolutions · RichardHightower · Feb 23, 2026 · Feb 23, 2026 · Feb 23, 2026 · Feb 23, 2026
diff --git a/.github/workflows/nightly-cli-tests.yml b/.github/workflows/nightly-cli-tests.yml
@@ -0,0 +1,101 @@
+# =============================================================================
+# Nightly Real CLI Integration Tests
+# =============================================================================
+#
+# Purpose: Run real CLI adapter integration tests on a nightly schedule to
+#          validate that all supported AI CLIs (Claude, OpenCode, Gemini,
+#          Codex, Copilot) work correctly with agent-cron.
+#
+# Required GitHub Secrets:
+#   - ANTHROPIC_API_KEY  : Claude CLI authentication
+#   - OPENAI_API_KEY     : Codex CLI authentication
+#   - GEMINI_API_KEY     : Gemini CLI authentication
+#   - GH_CLI_TOKEN       : GitHub Copilot CLI authentication
+#
+# Notes:
+#   - Copilot tests will show as SKIP in CI -- browser OAuth requires local
+#     testing only (no headless auth available).
+#   - --format json requires nightly Rust; regular builds use stable.
+#   - See .planning/phases/32-reporting-ci-pipeline/32-RESEARCH.md for details.
+# =============================================================================
+
+name: Nightly CLI Integration Tests
+
+on:
+  schedule:
+    - cron: '0 3 * * *'  # 3 AM UTC nightly
+  workflow_dispatch: {}    # Manual trigger for debugging
+
+jobs:
+  cli-integration:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    if: github.repository == 'SpillwaveSolutions/agent-cron'
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Install Rust nightly
+        uses: dtolnay/rust-toolchain@nightly
+
+      - name: Cache cargo registry and build artifacts
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cargo/registry
+            ~/.cargo/git
+            rust/target
+          key: ${{ runner.os }}-cargo-nightly-${{ hashFiles('rust/Cargo.lock') }}
+
+      - name: Build project
+        run: cargo build --manifest-path rust/Cargo.toml
+
+      - name: Build test-report binary
+        run: cargo build --manifest-path rust/Cargo.toml --bin test-report
+
+      - name: Run real CLI integration tests
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+          GITHUB_TOKEN: ${{ secrets.GH_CLI_TOKEN }}
+        run: |
+          # Run ignored tests with JSON output, capture to file
+          # Note: Do NOT use --nocapture with --format json (Pitfall 2)
+          cargo test --manifest-path rust/Cargo.toml \
+            -- --ignored --format json -Z unstable-options \
+            2>test-stderr.log | tee test-output.json || true
+          # Generate reports from JSON output
+          cargo run --manifest-path rust/Cargo.toml \
+            --bin test-report -- test-output.json
+
+      - name: Publish JUnit report
+        uses: mikepenz/action-junit-report@v5
+        if: always()
+        with:
+          report_paths: 'test-results.xml'
+          check_name: 'CLI Integration Tests'
+          include_passed: true
+
+      - name: Print matrix summary
+        if: always()
+        run: |
+          echo "## CLI Integration Test Matrix" >> $GITHUB_STEP_SUMMARY
+          echo '```' >> $GITHUB_STEP_SUMMARY
+          cat test-matrix-summary.txt >> $GITHUB_STEP_SUMMARY || echo "No summary generated" >> $GITHUB_STEP_SUMMARY
+          echo '```' >> $GITHUB_STEP_SUMMARY
+          cat test-matrix-summary.txt || true
+
+      - name: Upload test artifacts on failure
+        uses: actions/upload-artifact@v4
+        if: failure()
+        with:
+          name: test-artifacts-${{ github.run_id }}
+          path: |
+            test-output.json
+            test-results.json
+            test-results.xml
+            test-matrix-summary.txt
+            test-stderr.log
+          retention-days: 14
diff --git a/.planning/MILESTONES.md b/.planning/MILESTONES.md
@@ -101,3 +101,23 @@
 
 ---
 
+
+## v1.5 Multi-CLI Integration Testing (Shipped: 2026-03-05)
+
+**Phases:** 5 phases, 8 plans | **Tests:** 30 real CLI integration tests (15 smoke + 15 failure) | **Requirements:** 17/17
+**Timeline:** 11 days (2026-02-23 to 2026-03-05) | **Commits:** 14
+
+**Delivered:** Comprehensive real CLI integration test suite verifying all 5 AI CLI adapters with discovery, smoke tests, failure mode coverage, CI-ready reporting, and nightly GitHub Actions pipeline.
+
+**Key accomplishments:**
+- CLI discovery module with PATH probing, auth detection, and TOML capability matrix (LazyLock cached)
+- 15 smoke tests: echo round-trip, file creation, model flag passthrough — per CLI with require_cli_auth! gating
+- 15 failure mode tests: missing binary (Crashed), auth failure (Failed), timeout/SIGKILL escalation (Timeout) — per CLI
+- Test report binary generating JSON matrix, colored terminal table, and JUnit XML from cargo test JSON output
+- AGCRON_SKIP:: stdout marker chain for skip detection across test harness → report generator
+- GitHub Actions nightly workflow with per-CLI API key secrets, artifact upload, JUnit integration, fork guard
+
+**Archives:** [ROADMAP](milestones/v1.5-ROADMAP.md) | [REQUIREMENTS](milestones/v1.5-REQUIREMENTS.md) | [AUDIT](milestones/v1.5-MILESTONE-AUDIT.md)
+
+---
+
diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
@@ -55,14 +55,25 @@ Run AI agent workflows on a schedule — reliably, portably, and transparently.
 - [x] IPC round-trip tested: trigger → execute → query history via Unix socket RPC — v1.4
 - [x] Large output (10K+ lines) doesn't deadlock or truncate — v1.4
 - [x] No-record mode verified: no state file or history entry — v1.4
+- [x] CLI discovery detects binary availability and auth status for all 5 CLIs — v1.5
+- [x] TOML capability matrix gates tests by per-CLI features — v1.5
+- [x] 15 smoke tests verify daemon round-trip per CLI (echo, file creation, model flag) — v1.5
+- [x] 15 failure mode tests verify error states per CLI (missing binary, auth, timeout) — v1.5
+- [x] Test report binary produces JSON, terminal matrix, and JUnit XML — v1.5
+- [x] GitHub Actions nightly CI with per-CLI secrets and artifact upload — v1.5
 
-### Active
+## Last Milestone: v1.5 Multi-CLI Integration Testing (Complete)
 
-<!-- No active milestone - v1.4 complete -->
+**Goal:** Verified Agent Cron's adapters correctly invoke each of the 5 real AI CLIs (Claude, Gemini, Codex, Copilot, OpenCode) in headless mode, with full daemon round-trip validation, failure mode coverage, and CI-ready reporting.
 
-*(No active milestone — all milestones through v1.4 complete.)*
+**Delivered:**
+- CLI discovery with PATH probing, auth detection, capability matrix from TOML
+- 15 smoke tests (5 CLIs x 3 scenarios: echo, file creation, model flag)
+- 15 failure mode tests (5 CLIs x 3 modes: missing binary, auth failure, timeout/SIGKILL)
+- Test report binary (JSON, terminal matrix, JUnit XML)
+- GitHub Actions nightly CI workflow with per-CLI secrets and artifact upload
 
-## Last Milestone: v1.4 End-to-End Testing (Complete)
+## Previous Milestone: v1.4 End-to-End Testing (Complete)
 
 **Goal:** Added E2E tests verifying the full job lifecycle through real subprocess execution — covering happy path, failure modes, concurrency, retry, IPC, and no-record mode — using mock shell scripts instead of real AI CLIs.
 
@@ -77,15 +88,17 @@ Run AI agent workflows on a schedule — reliably, portably, and transparently.
 
 ## Context
 
-Shipped v1.4 with ~18,000 LOC Rust, 426 tests passing (375 unit + 5 integration + 42 E2E + 4 doc).
-All milestones through v1.4 complete.
+Shipped v1.5 with ~18,000 LOC Rust, 30 real CLI integration tests (15 smoke + 15 failure), CI pipeline with nightly reporting.
+All milestones through v1.5 complete.
 
 **Tech stack:** Rust + Tokio, clap (CLI), tokio-cron-scheduler, nix (signals), notify (file watcher), serde + toml/json (config/state), gray_matter (frontmatter parsing), fork (daemonization), memory-stats (macOS memory), arc-swap (config hot reload), tracing-appender (file logging), reqwest (webhook delivery), owo-colors (terminal formatting).
 
 **Codebase:** 20+ source modules in `rust/src/` covering daemon, scheduler, IPC, adapters (claude, opencode, gemini, codex, copilot, mock, custom), executor, state machine, locks, history, retry, queue, watcher, validation, config.
 
 **Known technical debt:**
-- None critical (all 5 adapter configs verified correct in v1.3)
+- CliWorkspace unused by smoke/failure tests (self-tested only)
+- Skip macros duplicated across test modules (Rust macro_rules! limitation)
+- CI workflow has no CLI binary installation steps (tests SKIP on stock runners)
 
 <details>
 <summary>v1.0 Context (2026-02-10)</summary>
@@ -125,6 +138,9 @@ Codebase: 20 source modules in `rust/src/` covering daemon, scheduler, IPC, adap
 | Fork before Tokio runtime | tokio#4301 constraint compliance | ✓ Good — no thread corruption in child |
 | ArcSwap for config | Lock-free reads, atomic swap on reload | ✓ Good — zero-cost reads, safe hot reload |
 | Semaphore in ArcSwap | Can't shrink Tokio Semaphore | ✓ Good — old permits drain naturally |
+| #[ignore] for real CLI tests | Separate fast vs slow test runs | ✓ Good — default cargo test stays fast, CI runs --ignored |
+| AGCRON_SKIP:: stdout marker | Skip detection without cargo test internals | ✓ Good — works with JSON stdout field |
+| TOML capability matrix | Per-CLI feature gating without hardcoding | ✓ Good — easy to update when CLIs change |
 
 ---
-*Last updated: 2026-02-12 after v1.4 milestone complete (426 tests)*
+*Last updated: 2026-03-05 after v1.5 milestone complete*
diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md