48 commits
e456749
docs: start milestone v1.5 Multi-CLI Integration Testing
RichardHightower Feb 23, 2026
8b36170
docs: v1.5 research — multi-CLI integration testing
RichardHightower Feb 23, 2026
b3a65b5
docs: define milestone v1.5 requirements (17 requirements)
RichardHightower Feb 23, 2026
a84895c
docs: create milestone v1.5 roadmap (4 phases, 17 requirements)
RichardHightower Feb 23, 2026
3d2bb70
docs(29): capture phase context
RichardHightower Feb 23, 2026
7a9ab2f
docs(29): research CLI discovery and test harness phase
RichardHightower Feb 23, 2026
bfaa780
docs(29): create phase plan
RichardHightower Feb 23, 2026
940f2eb
fix(29): revise plans based on checker feedback
RichardHightower Feb 23, 2026
4204425
feat(29-01): add CLI discovery module with binary detection, auth pro…
RichardHightower Feb 23, 2026
752a82c
feat(29-01): add CLI capability matrix with TOML config and LazyLock …
RichardHightower Feb 23, 2026
b7e76fb
docs(29-01): complete CLI discovery and capability matrix plan
RichardHightower Feb 23, 2026
25783b4
feat(29-02): add CliWorkspace for isolated CLI test environments
RichardHightower Feb 23, 2026
d2f4b85
feat(29-02): add skip macros, test_discovery tests, zzz_skip_summary,…
RichardHightower Feb 23, 2026
f87bf29
docs(29-02): complete workspace isolation and skip macros plan
RichardHightower Feb 23, 2026
b4c2ebd
docs(phase-29): complete phase execution — discovery + harness verified
RichardHightower Feb 23, 2026
db6046d
docs(phase-30): complete smoke tests research
RichardHightower Feb 23, 2026
3722a54
docs(30): create phase plan for smoke tests
RichardHightower Feb 23, 2026
a3b6b5b
chore: enable research workflow toggle, remove obsolete todo
RichardHightower Feb 23, 2026
55ddb5d
feat(30-01): create RealCliHarness module for real CLI adapter constr…
RichardHightower Feb 24, 2026
7115be1
feat(30-01): wire real_cli_harness and test_smoke modules into e2e te…
RichardHightower Feb 24, 2026
af016c1
docs(30-01): complete RealCliHarness infrastructure plan
RichardHightower Feb 24, 2026
31fb9fc
feat(30-02): add smoke test helpers and skip macros
RichardHightower Feb 24, 2026
8275e47
feat(30-02): add 15 smoke test functions for all 5 CLIs
RichardHightower Feb 24, 2026
c7c48d3
docs(30-02): complete smoke tests plan
RichardHightower Feb 24, 2026
29cd8c1
docs(phase-30): complete phase execution — smoke tests verified
RichardHightower Feb 24, 2026
b2732d2
docs(phase-31): complete failure mode tests research
RichardHightower Feb 24, 2026
db3712b
docs(31): create phase plan for failure mode tests
RichardHightower Feb 24, 2026
4d389d1
feat(31-01): add FAIL-05 and FAIL-06 per-adapter failure mode tests
RichardHightower Feb 25, 2026
f44b0c4
feat(31-01): add FAIL-07 timeout/SIGKILL escalation tests for all 5 a…
RichardHightower Feb 25, 2026
300dec7
docs(31-01): complete failure mode tests plan
RichardHightower Feb 25, 2026
165af18
docs(phase-31): complete phase execution — failure mode tests verified
RichardHightower Feb 25, 2026
3a8adea
docs(32): research phase domain — reporting and CI pipeline
RichardHightower Feb 25, 2026
540d33b
docs(32): create phase plan for reporting and CI pipeline
RichardHightower Feb 25, 2026
bfc84d1
feat(32-01): add AGCRON_SKIP marker and quick-junit dependency
RichardHightower Feb 25, 2026
09a325a
feat(32-01): implement test-report binary with JSON, terminal, and JU…
RichardHightower Feb 25, 2026
199504b
docs(32-01): complete test report generator plan
RichardHightower Feb 25, 2026
fa54399
feat(32-02): add nightly CI workflow for real CLI integration tests
RichardHightower Feb 25, 2026
b90ceaa
docs(32-02): complete nightly CI pipeline plan
RichardHightower Feb 25, 2026
08bb5e1
docs(phase-32): complete phase execution — reporting and CI pipeline …
RichardHightower Feb 25, 2026
3499113
docs(v1.5): milestone audit — 1 critical gap found (Phase 31 #[ignore])
RichardHightower Mar 5, 2026
a34b2fd
docs(roadmap): add gap closure phase 33 — wire failure tests to CI
RichardHightower Mar 5, 2026
96e2a9d
docs(33): create phase plan for wiring failure tests to CI
RichardHightower Mar 5, 2026
95d86d7
feat(33-01): add #[ignore] to all 15 failure test functions
RichardHightower Mar 5, 2026
6cd91c0
docs(33-01): complete wire-failure-tests-to-ci plan
RichardHightower Mar 5, 2026
6ca831c
docs(phase-33): complete phase execution — failure tests wired to CI
RichardHightower Mar 5, 2026
f02bc44
test(33): complete UAT - 4 passed, 0 issues
RichardHightower Mar 5, 2026
fdbf634
docs(v1.5): fix documentation inconsistencies across planning files
RichardHightower Mar 5, 2026
f76c69e
chore: complete v1.5 milestone — archive and evolve project docs
RichardHightower Mar 5, 2026
101 changes: 101 additions & 0 deletions .github/workflows/nightly-cli-tests.yml
@@ -0,0 +1,101 @@
# =============================================================================
# Nightly Real CLI Integration Tests
# =============================================================================
#
# Purpose: Run real CLI adapter integration tests on a nightly schedule to
# validate that all supported AI CLIs (Claude, OpenCode, Gemini,
# Codex, Copilot) work correctly with agent-cron.
#
# Required GitHub Secrets:
# - ANTHROPIC_API_KEY : Claude CLI authentication
# - OPENAI_API_KEY : Codex CLI authentication
# - GEMINI_API_KEY : Gemini CLI authentication
# - GH_CLI_TOKEN : GitHub Copilot CLI authentication
#
# Notes:
# - Copilot tests will show as SKIP in CI -- browser OAuth requires local
# testing only (no headless auth available).
# - --format json requires nightly Rust; regular builds use stable.
# - See .planning/phases/32-reporting-ci-pipeline/32-RESEARCH.md for details.
# =============================================================================

name: Nightly CLI Integration Tests

on:
  schedule:
    - cron: '0 3 * * *' # 3 AM UTC nightly
  workflow_dispatch: {} # Manual trigger for debugging

jobs:
  cli-integration:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    if: github.repository == 'SpillwaveSolutions/agent-cron'

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install Rust nightly
        uses: dtolnay/rust-toolchain@nightly

      - name: Cache cargo registry and build artifacts
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            rust/target
          key: ${{ runner.os }}-cargo-nightly-${{ hashFiles('rust/Cargo.lock') }}

      - name: Build project
        run: cargo build --manifest-path rust/Cargo.toml

      - name: Build test-report binary
        run: cargo build --manifest-path rust/Cargo.toml --bin test-report

      - name: Run real CLI integration tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GH_CLI_TOKEN }}
        run: |
          # Run ignored tests with JSON output, capture to file
          # Note: Do NOT use --nocapture with --format json (Pitfall 2)
          cargo test --manifest-path rust/Cargo.toml \
            -- --ignored --format json -Z unstable-options \
            2>test-stderr.log | tee test-output.json || true
          # Generate reports from JSON output
          cargo run --manifest-path rust/Cargo.toml \
            --bin test-report -- test-output.json

      - name: Publish JUnit report
        uses: mikepenz/action-junit-report@v5
        if: always()
        with:
          report_paths: 'test-results.xml'
          check_name: 'CLI Integration Tests'
          include_passed: true

      - name: Print matrix summary
        if: always()
        run: |
          echo "## CLI Integration Test Matrix" >> $GITHUB_STEP_SUMMARY
          echo '```' >> $GITHUB_STEP_SUMMARY
          cat test-matrix-summary.txt >> $GITHUB_STEP_SUMMARY || echo "No summary generated" >> $GITHUB_STEP_SUMMARY
          echo '```' >> $GITHUB_STEP_SUMMARY
          cat test-matrix-summary.txt || true

      - name: Upload test artifacts on failure
        uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: test-artifacts-${{ github.run_id }}
          path: |
            test-output.json
            test-results.json
            test-results.xml
            test-matrix-summary.txt
            test-stderr.log
          retention-days: 14
20 changes: 20 additions & 0 deletions .planning/MILESTONES.md
@@ -101,3 +101,23 @@

---


## v1.5 Multi-CLI Integration Testing (Shipped: 2026-03-05)

**Phases:** 5 phases, 8 plans | **Tests:** 30 real CLI integration tests (15 smoke + 15 failure) | **Requirements:** 17/17
**Timeline:** 11 days (2026-02-23 to 2026-03-05) | **Commits:** 14

**Delivered:** Comprehensive real CLI integration test suite verifying all 5 AI CLI adapters with discovery, smoke tests, failure mode coverage, CI-ready reporting, and nightly GitHub Actions pipeline.

**Key accomplishments:**
- CLI discovery module with PATH probing, auth detection, and TOML capability matrix (LazyLock cached)
- 15 smoke tests: echo round-trip, file creation, model flag passthrough — per CLI with require_cli_auth! gating
- 15 failure mode tests: missing binary (Crashed), auth failure (Failed), timeout/SIGKILL escalation (Timeout) — per CLI
- Test report binary generating JSON matrix, colored terminal table, and JUnit XML from cargo test JSON output
- AGCRON_SKIP:: stdout marker chain for skip detection across test harness → report generator
- GitHub Actions nightly workflow with per-CLI API key secrets, artifact upload, JUnit integration, fork guard

**Archives:** [ROADMAP](milestones/v1.5-ROADMAP.md) | [REQUIREMENTS](milestones/v1.5-REQUIREMENTS.md) | [AUDIT](milestones/v1.5-MILESTONE-AUDIT.md)

---
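
The `AGCRON_SKIP::` marker chain listed under the key accomplishments can be sketched roughly as follows. This is a minimal illustration, not the project's report generator: the `Outcome` enum, the `skip_reason` and `classify` functions, and the assumption that the marker is followed by a free-text reason on the same line are all hypothetical. It takes the per-test event name and captured stdout as plain strings, leaving JSON parsing out of scope.

```rust
/// Marker prefix named in the milestone notes. The payload layout after the
/// prefix (a free-text reason) is an assumption made for this sketch.
const SKIP_MARKER: &str = "AGCRON_SKIP::";

/// Hypothetical result categories for the report matrix.
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed,
    Skipped(String),
}

/// Returns the text after the first skip marker found in captured stdout.
fn skip_reason(stdout: &str) -> Option<&str> {
    stdout
        .lines()
        .find_map(|line| line.trim().strip_prefix(SKIP_MARKER))
}

/// Classifies one test result. A test that prints the marker and returns
/// early is still reported as "ok" by the harness, so the marker is checked
/// before the event name is trusted.
fn classify(event: &str, stdout: &str) -> Outcome {
    match (event, skip_reason(stdout)) {
        ("ok", Some(reason)) => Outcome::Skipped(reason.to_string()),
        ("ok", None) => Outcome::Passed,
        _ => Outcome::Failed,
    }
}
```

Checking the marker only on `ok` events keeps a genuinely failing test from being hidden behind a stray skip line.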

32 changes: 24 additions & 8 deletions .planning/PROJECT.md
@@ -55,14 +55,25 @@ Run AI agent workflows on a schedule — reliably, portably, and transparently.
- [x] IPC round-trip tested: trigger → execute → query history via Unix socket RPC — v1.4
- [x] Large output (10K+ lines) doesn't deadlock or truncate — v1.4
- [x] No-record mode verified: no state file or history entry — v1.4
- [x] CLI discovery detects binary availability and auth status for all 5 CLIs — v1.5
- [x] TOML capability matrix gates tests by per-CLI features — v1.5
- [x] 15 smoke tests verify daemon round-trip per CLI (echo, file creation, model flag) — v1.5
- [x] 15 failure mode tests verify error states per CLI (missing binary, auth, timeout) — v1.5
- [x] Test report binary produces JSON, terminal matrix, and JUnit XML — v1.5
- [x] GitHub Actions nightly CI with per-CLI secrets and artifact upload — v1.5
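
The capability-matrix item in the checklist above pairs a config file with a `LazyLock` cache. A rough sketch of that caching pattern is shown here under stated assumptions: the `Capabilities` fields, the `supports` helper, and the row values are hypothetical, and a tiny `name = flag,flag` format stands in for the real TOML file so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

/// Hypothetical per-CLI capabilities; the real matrix's fields are not shown
/// in this diff.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capabilities {
    headless_auth: bool,
    model_flag: bool,
}

/// Stand-in for the TOML file: one `name = headless_auth,model_flag` row per
/// CLI. The values are illustrative placeholders, not the project's data.
const MATRIX_SOURCE: &str = "\
claude = true,true
copilot = false,true
";

/// Parsed once on first access, then shared by every test in the process.
static MATRIX: LazyLock<HashMap<String, Capabilities>> = LazyLock::new(|| {
    MATRIX_SOURCE
        .lines()
        .filter_map(|line| {
            let (name, flags) = line.split_once('=')?;
            let (auth, model) = flags.split_once(',')?;
            Some((
                name.trim().to_string(),
                Capabilities {
                    headless_auth: auth.trim().parse().ok()?,
                    model_flag: model.trim().parse().ok()?,
                },
            ))
        })
        .collect()
});

/// Returns true only when the CLI is known and the selected feature is set.
fn supports(cli: &str, feature: impl Fn(&Capabilities) -> bool) -> bool {
    MATRIX.get(cli).is_some_and(|caps| feature(caps))
}
```

A test could then gate itself with something like `supports("claude", |c| c.model_flag)`; unknown CLIs fall through to `false`, so they are skipped instead of failing.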

### Active
## Last Milestone: v1.5 Multi-CLI Integration Testing (Complete)

<!-- No active milestone - v1.4 complete -->
**Goal:** Verified Agent Cron's adapters correctly invoke each of the 5 real AI CLIs (Claude, Gemini, Codex, Copilot, OpenCode) in headless mode, with full daemon round-trip validation, failure mode coverage, and CI-ready reporting.

*(No active milestone — all milestones through v1.4 complete.)*
**Delivered:**
- CLI discovery with PATH probing, auth detection, capability matrix from TOML
- 15 smoke tests (5 CLIs x 3 scenarios: echo, file creation, model flag)
- 15 failure mode tests (5 CLIs x 3 modes: missing binary, auth failure, timeout/SIGKILL)
- Test report binary (JSON, terminal matrix, JUnit XML)
- GitHub Actions nightly CI workflow with per-CLI secrets and artifact upload
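
The three failure modes in the list above map onto distinct terminal states (Crashed, Failed, Timeout, per MILESTONES.md). The sketch below shows one way that mapping could look using only the standard library. It is an assumption-laden illustration, not the project's executor: `JobState::Completed` and `run_with_timeout` are hypothetical names, and the sketch goes straight to a hard kill where the real tests exercise an escalation to SIGKILL.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Crashed, Failed, and Timeout are named in the milestone notes;
/// Completed is a hypothetical success state added for the sketch.
#[derive(Debug, PartialEq)]
enum JobState {
    Completed,
    Failed,
    Crashed,
    Timeout,
}

/// Runs a program and maps the result onto a job state.
fn run_with_timeout(program: &str, args: &[&str], limit: Duration) -> JobState {
    let mut child = match Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
    {
        Ok(child) => child,
        // Assumption: a binary that cannot be spawned counts as a crash.
        Err(_) => return JobState::Crashed,
    };

    let deadline = Instant::now() + limit;
    loop {
        match child.try_wait() {
            Ok(Some(status)) if status.success() => return JobState::Completed,
            Ok(Some(_)) => return JobState::Failed,
            Ok(None) if Instant::now() >= deadline => {
                // Child::kill sends SIGKILL on Unix; a graceful signal
                // before it would need a crate such as nix.
                let _ = child.kill();
                let _ = child.wait(); // reap the process
                return JobState::Timeout;
            }
            Ok(None) => sleep(Duration::from_millis(10)),
            Err(_) => return JobState::Crashed,
        }
    }
}
```

Polling `try_wait` keeps the example free of async machinery; a Tokio-based daemon would more plausibly race the child against a timer.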

## Last Milestone: v1.4 End-to-End Testing (Complete)
## Previous Milestone: v1.4 End-to-End Testing (Complete)

**Goal:** Added E2E tests verifying the full job lifecycle through real subprocess execution — covering happy path, failure modes, concurrency, retry, IPC, and no-record mode — using mock shell scripts instead of real AI CLIs.

@@ -77,15 +88,17 @@ Run AI agent workflows on a schedule — reliably, portably, and transparently.

## Context

Shipped v1.4 with ~18,000 LOC Rust, 426 tests passing (375 unit + 5 integration + 42 E2E + 4 doc).
All milestones through v1.4 complete.
Shipped v1.5 with ~18,000 LOC Rust, 30 real CLI integration tests (15 smoke + 15 failure), CI pipeline with nightly reporting.
All milestones through v1.5 complete.

**Tech stack:** Rust + Tokio, clap (CLI), tokio-cron-scheduler, nix (signals), notify (file watcher), serde + toml/json (config/state), gray_matter (frontmatter parsing), fork (daemonization), memory-stats (macOS memory), arc-swap (config hot reload), tracing-appender (file logging), reqwest (webhook delivery), owo-colors (terminal formatting).

**Codebase:** 20+ source modules in `rust/src/` covering daemon, scheduler, IPC, adapters (claude, opencode, gemini, codex, copilot, mock, custom), executor, state machine, locks, history, retry, queue, watcher, validation, config.

**Known technical debt:**
- None critical (all 5 adapter configs verified correct in v1.3)
- CliWorkspace unused by smoke/failure tests (self-tested only)
- Skip macros duplicated across test modules (Rust macro_rules! limitation)
- CI workflow has no CLI binary installation steps (tests SKIP on stock runners)

<details>
<summary>v1.0 Context (2026-02-10)</summary>
@@ -125,6 +138,9 @@ Codebase: 20 source modules in `rust/src/` covering daemon, scheduler, IPC, adap
| Fork before Tokio runtime | tokio#4301 constraint compliance | ✓ Good — no thread corruption in child |
| ArcSwap for config | Lock-free reads, atomic swap on reload | ✓ Good — zero-cost reads, safe hot reload |
| Semaphore in ArcSwap | Can't shrink Tokio Semaphore | ✓ Good — old permits drain naturally |
| #[ignore] for real CLI tests | Separate fast vs slow test runs | ✓ Good — default cargo test stays fast, CI runs --ignored |
| AGCRON_SKIP:: stdout marker | Skip detection without cargo test internals | ✓ Good — works with JSON stdout field |
| TOML capability matrix | Per-CLI feature gating without hardcoding | ✓ Good — easy to update when CLIs change |

---
*Last updated: 2026-02-12 after v1.4 milestone complete (426 tests)*
*Last updated: 2026-03-05 after v1.5 milestone complete*
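
The `#[ignore]` decision recorded in the table above combines with the skip markers roughly as sketched here. The names `cli_on_path` and `require_cli!` are hypothetical stand-ins (the project's own macro is `require_cli_auth!`, whose body is not shown in this diff), and the probe checks only for a file on `PATH`, not for authentication.

```rust
/// Hypothetical probe: true when a file with this name exists in any PATH
/// directory. Real discovery would also check auth state.
fn cli_on_path(binary: &str) -> bool {
    std::env::var_os("PATH")
        .map(|paths| std::env::split_paths(&paths).any(|dir| dir.join(binary).is_file()))
        .unwrap_or(false)
}

/// Illustrative skip macro, not the project's `require_cli_auth!`.
/// It prints the marker and returns early, so the test counts as passed
/// and the report generator has to reclassify it as skipped.
macro_rules! require_cli {
    ($binary:expr) => {
        if !cli_on_path($binary) {
            println!("AGCRON_SKIP::{} not found on PATH", $binary);
            return;
        }
    };
}

#[test]
#[ignore = "needs a real CLI; run with `cargo test -- --ignored`"]
fn smoke_echo_round_trip() {
    require_cli!("claude");
    // ... drive the real CLI and assert on its output here ...
}
```

With this shape, a plain `cargo test` never touches the real CLIs, while the nightly job's `--ignored` run executes them and prints a skip marker wherever a binary is absent.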
101 changes: 0 additions & 101 deletions .planning/REQUIREMENTS.md

This file was deleted.
