From 6b241b986226240b0388b500c80d53218912947f Mon Sep 17 00:00:00 2001
From: jiaen-nv
Date: Fri, 24 Apr 2026 16:09:41 -0700
Subject: [PATCH] Rewrite testbot README with architecture diagrams

Add system architecture diagram showing the LLM/harness/external API
separation, per-workflow flow diagrams (generate and respond), and a
comprehensive guardrails table covering all safety layers (tool
allowlist, label gate, fork rejection, author association, required
reviewer, org membership, commit sanitization, partial work discard).

Also document the auto-approver workflow, the harness responsibilities
split, and the file-based prompt structure (TESTBOT_RULES.md shared
between generate and respond).

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 src/scripts/testbot/README.md | 209 +++++++++++++++++++++++-----------
 1 file changed, 142 insertions(+), 67 deletions(-)

diff --git a/src/scripts/testbot/README.md b/src/scripts/testbot/README.md
index 5c1bde709..16c66fa71 100644
--- a/src/scripts/testbot/README.md
+++ b/src/scripts/testbot/README.md
@@ -1,78 +1,27 @@
-# Testbot: AI-Powered Test Generation
+# Testbot: AI-Powered Test Generation & Review Response
 
-Testbot analyzes coverage gaps, generates tests using Claude Code, validates them, and opens PRs for human review. It also responds to inline review comments via `/testbot`.
+Testbot is a GitHub Actions bot backed by Claude Code that:
 
-## Architecture
+1. **Generates tests** for low-coverage files on a weekly schedule (opens PRs for review)
+2. **Responds to `/testbot` comments** on any PR labeled `ai-generated` (applies fixes, writes tests, addresses CodeRabbit feedback)
 
-### Test Generation (`testbot.yaml`)
+## Using testbot
 
-```text
-Codecov API → coverage_targets.py → Claude Code CLI → guardrails → create_pr.py
-                                      |         ↑
-                                      └─────────┘  (agent retries on test failures)
-```
-
-| Stage | Component | Description |
-|-------|-----------|-------------|
-| **Coverage analysis** | `coverage_targets.py` | Fetches Codecov report, selects lowest-coverage files |
-| **Test generation** | Claude Code CLI | Reads source, writes test files and BUILD entries, runs tests, iterates on failures |
-| **Guardrails** | `guardrails.py` | Filters out any non-test file changes made by Claude |
-| **PR creation** | `create_pr.py` | Creates branch, commits test files, pushes, opens PR with `ai-generated` label |
-
-Claude Code is sandboxed: it can only read files, edit test files, and run test commands (`bazel test`, `pnpm test`). It cannot run `git`, `gh`, or modify source code. All git and GitHub operations are in deterministic harness scripts.
-
-### Review Response (`testbot-respond.yaml`)
-
-```text
-/testbot comment → respond.py
-  ├─ fetch all thread comments (GraphQL)
-  ├─ filter: trigger phrase, author, dedup
-  ├─ Claude Code CLI: read files, apply fix, run tests
-  ├─ respond.py: git commit + push
-  ├─ structured reply via --json-schema
-  └─ post inline reply to each thread
-```
-
-| Feature | Description |
-|---------|-------------|
-| **Trigger** | Comment starting with `/testbot` on any PR with the `ai-generated` label |
-| **Thread context** | Full conversation history (all nested comments) passed to Claude |
-| **Structured output** | `--json-schema` returns per-thread replies and commit message |
-| **Safety** | Repo-member-only access, crash recovery, push retry |
-| **Dedup** | Skips threads where the bot already replied and is awaiting human follow-up |
-
-### Security Boundary
-
-| | Claude Code | Harness scripts |
-|---|---|---|
-| Read source files | Yes | — |
-| Write/edit test files | Yes | — |
-| Run `bazel test` / `pnpm test` | Yes | — |
-| Run `git` commands | **No** | `create_pr.py`, `respond.py` |
-| Run `gh` commands | **No** | `create_pr.py`, `respond.py` |
-| Filter non-test changes | — | `guardrails.py` |
-
-## Triggering on GitHub
-
-### Manual dispatch
+### Generate workflow — manual dispatch
 
 **Actions → Testbot → Run workflow**, or via CLI:
 
 ```bash
-gh workflow run testbot.yaml --ref \
+gh workflow run testbot.yaml --ref main \
   -f max_targets=1 \
   -f max_uncovered=300 \
   -f max_turns=50 \
   -f model=aws/anthropic/claude-opus-4-5
 ```
 
-### Schedule
-
-Runs automatically on weekdays at 6 AM UTC.
-
-### Review response
+### Respond workflow — /testbot comments
 
-Add the `ai-generated` label to your PR, then start an inline review comment with `/testbot `. The command must be the first text in the comment. Examples:
+Add the `ai-generated` label to your PR, then post an **inline review comment** (on the "Files changed" tab) starting with `/testbot`. Examples:
 
 ```text
 /testbot add unit tests for this file
@@ -81,11 +30,15 @@ Add the `ai-generated` label to your PR, then start an inline review comment wit
 /testbot refactor this function to reduce duplication
 ```
 
-The bot responds only to repo members (OWNER, MEMBER, COLLABORATOR). It will not respond to its own replies or comments from bots.
+The command must be the **first text** in the comment. Only repo members (OWNER, MEMBER, COLLABORATOR) can trigger the bot. It won't respond to its own replies or to other bots.
+
+**Example threads** showing the bot in action on PR #890:
+- [Thread r3126197776](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126197776)
+- [Thread r3126743347](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126743347)
 
 ### Reverting a testbot commit
 
-If the bot's commit isn't what you wanted, revert it and retry:
+If the bot's commit isn't what you wanted:
 
 ```bash
 git pull && git revert HEAD --no-edit && git push
@@ -93,9 +46,131 @@ git pull && git revert HEAD --no-edit && git push
 
 Then post a new `/testbot` comment with clearer instructions.
 
+## System Architecture
+
+```text
+                         ┌──────────────────────────────────┐
+                         │          Claude Code CLI         │
+                         │  (sandboxed — Read/Edit/Write,   │
+                         │   bazel test, pnpm test, gh pr)  │
+                         └───────────────┬──────────────────┘
+                                         ▲
+                                         │ --allowedTools, --json-schema
+                                         │
+  ┌──────────────────────┐     ┌─────────┴──────────┐     ┌──────────────────┐
+  │  GENERATE WORKFLOW   │     │      HARNESS       │     │ RESPOND WORKFLOW │
+  │  (testbot.yaml)      │────▶│   Python scripts   │◀────│ (testbot-respond │
+  │  Weekly cron         │     │   (git, gh, auth,  │     │  .yaml)          │
+  │  or dispatch         │     │    guardrails)     │     │ /testbot comment │
+  └──────────┬───────────┘     └─────────┬──────────┘     └──────────┬───────┘
+             │                           │                           │
+             ▼                           ▼                           ▼
+        ┌──────────┐             ┌───────────────┐              ┌──────────┐
+        │ Codecov  │             │   git push    │              │  GitHub  │
+        │   API    │             │   gh pr ...   │              │   API    │
+        └──────────┘             └───────────────┘              └──────────┘
+```
+
+The architecture separates **what the LLM can do** (read code, run tests) from **what the harness does** (git, GitHub API, auth, guardrails). The LLM is never trusted with write access to branches or the GitHub API.
+
+## Workflow 1: Generate Tests (`testbot.yaml`)
+
+```text
+┌─────────────┐  ┌────────────────────────┐  ┌─────────────────────┐  ┌──────────────┐  ┌────────────┐
+│ Trigger     │  │ 1. coverage_targets.py │  │ 2. Claude Code CLI  │  │ 3. guardrails│  │ 4. create_ │
+│ weekday     │─▶│    Fetch Codecov       │─▶│   Read source       │─▶│   .py        │─▶│    pr.py   │
+│ 6 AM UTC    │  │    Pick low-cov file   │  │   Write tests+BUILD │  │   Keep tests │  │    Branch, │
+│ or manual   │  │    Emit target list    │  │   Run bazel test    │  │   only       │  │    commit, │
+└─────────────┘  └────────────────────────┘  │   Retry on fail     │  │   Revert src │  │    open PR │
+                                             └─────────────────────┘  └──────────────┘  └────────────┘
+```
+
+**Trigger:** Cron `0 6 * * 1-5` (weekdays 6 AM UTC) or `workflow_dispatch`
+
+**Output:** A new branch `testbot/YYYYMMDD-HHMM` and a PR titled `[testbot] Add tests for ` with the `ai-generated` label
+
+## Workflow 2: Respond to /testbot (`testbot-respond.yaml`)
+
+```text
+┌──────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐  ┌────────────────┐
+│ Trigger          │  │ 1. auto-approve     │  │ 2. respond.py       │  │ 3. Claude Code │
+│ pull_request_    │─▶│    (workflow_run)   │─▶│    GraphQL: fetch   │─▶│   CLI          │
+│ review_comment   │  │    Check NVIDIA     │  │    threads + filter │  │   Read, Edit,  │
+│ + ai-generated   │  │    org membership   │  │    for /testbot     │  │   Write, bazel,│
+│ label            │  │    Approve env      │  │    trigger + author │  │   gh pr view   │
+└──────────────────┘  └─────────────────────┘  │    assoc (MEMBER+)  │  └────────┬───────┘
+                                               │    Build prompt     │           │
+                                               └─────────────────────┘           │
+                                                         ▲                       │
+                                                         │ structured JSON       │
+                                                         │ {commit_message,      │
+                                                         │  replies[...]}        │
+                                                         │                       │
+                     ┌─────────────────────┐  ┌──────────┴──────────┐            │
+                     │ 5. Post inline      │  │ 4. respond.py       │            │
+                     │    replies via      │◀─│   get_changed_files │◀───────────┘
+                     │    GitHub API       │  │   commit_and_push   │
+                     │    (one per thread) │  │   (retry, detect    │
+                     │                     │  │    GH013)           │
+                     └─────────────────────┘  └─────────────────────┘
+```
+
+**Trigger:** Inline review comment (on "Files changed" tab) that starts with `/testbot` on a PR labeled `ai-generated`
+
+**Output:** A new commit by `testbot[bot]` pushed to the PR branch, plus an inline reply to each addressed comment
+
+## Auto-approver (`testbot-respond-approve.yaml`)
+
+The respond workflow runs under `environment: testbot-respond` (gated by a required reviewer) so it can access the `NVIDIA_NIM_KEY` secret. The auto-approver runs on `main` via `workflow_run` and:
+
+1. Checks if the triggering actor is in the `NVIDIA/osmo-dev` team, OR a trusted bot (`svc-osmo-ci`, `github-actions[bot]`, `coderabbitai[bot]`)
+2. If authorized, calls GitHub's `reviewPendingDeploymentsForRun` to approve the deployment
+
+This lets `/testbot` comments from NVIDIA team members run automatically while blocking external contributors and bots that don't need to run the full pipeline.
+
+## Guardrails
+
+| Guardrail | Scope | Implementation |
+|-----------|-------|----------------|
+| **Tool allowlist** | Both workflows | `--allowedTools` flag restricts Claude Code to `Read,Edit,Write,Glob,Grep,bazel test,pnpm test/validate/format,gh pr view/diff/checks` — no `git`, no `gh api`, no arbitrary bash |
+| **Test-file-only filter** | Generate only | `guardrails.py:get_changed_test_files()` reverts non-test file changes before commit |
+| **Label gate** | Respond only | Workflow `if:` requires `ai-generated` label on the PR |
+| **Fork rejection** | Respond only | Workflow `if:` requires `head.repo == base.repo` (no forks) |
+| **Author association** | Respond only | `respond.py` requires the triggering comment author to be `OWNER`, `MEMBER`, or `COLLABORATOR` |
+| **Required reviewer** | Respond only | `environment: testbot-respond` blocks unauthorized runs from accessing the API key secret |
+| **Org membership check** | Respond only | Auto-approver verifies the actor is in `NVIDIA/osmo-dev` before approving |
+| **Commit message sanitization** | Both | `sanitize_commit_message()` enforces `testbot:` prefix, strips git trailers, caps at 500 chars |
+| **Push retry with GH013 detection** | Both | `commit_and_push()` retries up to 3x; fails fast on repository ruleset violations |
+| **Partial work discard** | Respond only | On timeout or max-turns hit, `respond.py` discards file changes and posts an informative reply (doesn't push half-finished work) |
+
+## Harness responsibilities
+
+Claude Code is intentionally given a **narrow capability surface**. Everything else lives in the Python harness:
+
+| Responsibility | Component |
+|----------------|-----------|
+| Coverage analysis & target selection | `coverage_targets.py` |
+| Prompt construction (shared rules + workflow-specific) | `respond.py`, `TESTBOT_PROMPT.md`, `TESTBOT_RESPOND_PROMPT.md`, `TESTBOT_RULES.md` |
+| Git operations (branch, commit, push, retry, revert) | `create_pr.py`, `respond.py` |
+| GitHub API (fetch threads, post replies, create PRs) | `create_pr.py`, `respond.py` |
+| Guardrail enforcement (file-type filter) | `guardrails.py` |
+| Structured output parsing (3-tier fallback) | `respond.py:_extract_replies()` |
+| Timeout / max-turns handling | `respond.py:run_claude()` and `main()` |
+| Auto-approval of environment deployments | `testbot-respond-approve.yaml` |
+
+## Prompt files
+
+Prompts are **file-based** (not inlined in Python) so they can be edited, diffed, and reviewed independently:
+
+| File | Purpose |
+|------|---------|
+| `TESTBOT_RULES.md` | **Shared** test quality rules, bug-detection process, verification steps, language conventions. Referenced by both workflows. |
+| `TESTBOT_PROMPT.md` | Generate-specific: coverage targets process, BUILD file handling, guardrails. References `TESTBOT_RULES.md`. |
+| `TESTBOT_RESPOND_PROMPT.md` | Respond-specific: role framing, PR context guidance, output JSON schema example. References `TESTBOT_RULES.md`. |
+
 ## Configuration
 
-### Test generation (dispatch inputs)
+### Generate workflow (dispatch inputs)
 
 | Input | Default | Description |
 |-------|---------|-------------|
@@ -106,13 +181,13 @@ Then post a new `/testbot` comment with clearer instructions.
 | `model` | `aws/anthropic/claude-opus-4-5` | LLM model on API gateway |
 | `dry_run` | `false` | Generate without creating PR |
 
-### Review response (CLI args in `testbot-respond.yaml`)
+### Respond workflow (CLI args in `testbot-respond.yaml`)
 
 | Arg | Default | Description |
 |-----|---------|-------------|
-| `--max-turns` | `50` | Claude Code agent turns |
+| `--max-turns` | `75` | Claude Code agent turns |
 | `--max-responses` | `10` | Max threads to address per trigger |
-| `--timeout` | `720` | Claude Code CLI timeout in seconds |
+| `--timeout` | `900` | Claude Code CLI timeout in seconds |
 | `--model` | `aws/anthropic/claude-opus-4-5` | LLM model |
 
 ### Coverage target selection (constants in `coverage_targets.py`)
@@ -143,5 +218,5 @@ src/scripts/testbot/
 .github/workflows/
 ├── testbot.yaml                 # Scheduled test generation
 ├── testbot-respond.yaml         # /testbot review response
-└── testbot-respond-approve.yaml # Auto-approve for org members
+└── testbot-respond-approve.yaml # Auto-approve for NVIDIA org members
 ```
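
Reviewer note, outside the patch: the "Commit message sanitization" guardrail in the README table (enforce a `testbot:` prefix, strip git trailers, cap at 500 chars) could look roughly like the sketch below. This is not the code in `respond.py` or `create_pr.py`. The function name `sanitize_commit_message_sketch`, the trailer pattern, and the fallback subject are all assumptions made for illustration; only the prefix, the trailer stripping, and the 500-character cap come from the README.

```python
import re

MAX_LENGTH = 500            # cap stated in the README guardrails table
REQUIRED_PREFIX = "testbot:"  # prefix stated in the README guardrails table

# Assumption: a "trailer" is any line shaped like "Signed-off-by: ..." or
# "Co-Authored-By: ...". The real function may recognize a different set.
TRAILER_PATTERN = re.compile(r"^[A-Za-z-]+-by:\s", re.IGNORECASE)

# Assumption: hypothetical fallback used when nothing is left after stripping.
FALLBACK_SUBJECT = "address review feedback"


def sanitize_commit_message_sketch(raw_message: str) -> str:
    """Hypothetical stand-in for sanitize_commit_message()."""
    lines = [line.rstrip() for line in raw_message.strip().splitlines()]
    # Drop trailer-shaped lines so the model cannot attribute the commit.
    kept = [line for line in lines if not TRAILER_PATTERN.match(line)]
    message = "\n".join(kept).strip()
    if not message:
        message = FALLBACK_SUBJECT
    # Enforce the prefix without doubling it when it is already present.
    if not message.startswith(REQUIRED_PREFIX):
        message = f"{REQUIRED_PREFIX} {message}"
    # Truncate last so the prefix always survives the cap.
    return message[:MAX_LENGTH].rstrip()


if __name__ == "__main__":
    print(sanitize_commit_message_sketch(
        "Add tests\n\nCo-Authored-By: Someone <someone@example.com>"
    ))
```

Truncating after the prefix is applied keeps the `testbot:` marker intact even for very long model output, which is the property the guardrail table appears to rely on.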