Rewrite testbot README with architecture diagrams #901
base: main
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,78 +1,27 @@ | ||||||||||||||||
| # Testbot: AI-Powered Test Generation | ||||||||||||||||
| # Testbot: AI-Powered Test Generation & Review Response | ||||||||||||||||
|
|
||||||||||||||||
| Testbot analyzes coverage gaps, generates tests using Claude Code, validates them, and opens PRs for human review. It also responds to inline review comments via `/testbot`. | ||||||||||||||||
| Testbot is a GitHub Actions bot backed by Claude Code that: | ||||||||||||||||
|
|
||||||||||||||||
| ## Architecture | ||||||||||||||||
| 1. **Generates tests** for low-coverage files on a weekly schedule (opens PRs for review) | ||||||||||||||||
| 2. **Responds to `/testbot` comments** on any PR labeled `ai-generated` (applies fixes, writes tests, addresses CodeRabbit feedback) | ||||||||||||||||
|
|
||||||||||||||||
| ### Test Generation (`testbot.yaml`) | ||||||||||||||||
| ## Using testbot | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| Codecov API → coverage_targets.py → Claude Code CLI → guardrails → create_pr.py | ||||||||||||||||
| | ↑ | ||||||||||||||||
| └─────────┘ (agent retries on test failures) | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| | Stage | Component | Description | | ||||||||||||||||
| |-------|-----------|-------------| | ||||||||||||||||
| | **Coverage analysis** | `coverage_targets.py` | Fetches Codecov report, selects lowest-coverage files | | ||||||||||||||||
| | **Test generation** | Claude Code CLI | Reads source, writes test files and BUILD entries, runs tests, iterates on failures | | ||||||||||||||||
| | **Guardrails** | `guardrails.py` | Filters out any non-test file changes made by Claude | | ||||||||||||||||
| | **PR creation** | `create_pr.py` | Creates branch, commits test files, pushes, opens PR with `ai-generated` label | | ||||||||||||||||
|
|
||||||||||||||||
| Claude Code is sandboxed: it can only read files, edit test files, and run test commands (`bazel test`, `pnpm test`). It cannot run `git`, `gh`, or modify source code. All git and GitHub operations are in deterministic harness scripts. | ||||||||||||||||
|
|
||||||||||||||||
| ### Review Response (`testbot-respond.yaml`) | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| /testbot comment → respond.py | ||||||||||||||||
| ├─ fetch all thread comments (GraphQL) | ||||||||||||||||
| ├─ filter: trigger phrase, author, dedup | ||||||||||||||||
| ├─ Claude Code CLI: read files, apply fix, run tests | ||||||||||||||||
| ├─ respond.py: git commit + push | ||||||||||||||||
| ├─ structured reply via --json-schema | ||||||||||||||||
| └─ post inline reply to each thread | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| | Feature | Description | | ||||||||||||||||
| |---------|-------------| | ||||||||||||||||
| | **Trigger** | Comment starting with `/testbot` on any PR with the `ai-generated` label | | ||||||||||||||||
| | **Thread context** | Full conversation history (all nested comments) passed to Claude | | ||||||||||||||||
| | **Structured output** | `--json-schema` returns per-thread replies and commit message | | ||||||||||||||||
| | **Safety** | Repo-member-only access, crash recovery, push retry | | ||||||||||||||||
| | **Dedup** | Skips threads where the bot already replied and is awaiting human follow-up | | ||||||||||||||||
|
|
||||||||||||||||
| ### Security Boundary | ||||||||||||||||
|
|
||||||||||||||||
| | | Claude Code | Harness scripts | | ||||||||||||||||
| |---|---|---| | ||||||||||||||||
| | Read source files | Yes | — | | ||||||||||||||||
| | Write/edit test files | Yes | — | | ||||||||||||||||
| | Run `bazel test` / `pnpm test` | Yes | — | | ||||||||||||||||
| | Run `git` commands | **No** | `create_pr.py`, `respond.py` | | ||||||||||||||||
| | Run `gh` commands | **No** | `create_pr.py`, `respond.py` | | ||||||||||||||||
| | Filter non-test changes | — | `guardrails.py` | | ||||||||||||||||
|
|
||||||||||||||||
| ## Triggering on GitHub | ||||||||||||||||
|
|
||||||||||||||||
| ### Manual dispatch | ||||||||||||||||
| ### Generate workflow — manual dispatch | ||||||||||||||||
|
|
||||||||||||||||
| **Actions → Testbot → Run workflow**, or via CLI: | ||||||||||||||||
|
|
||||||||||||||||
| ```bash | ||||||||||||||||
| gh workflow run testbot.yaml --ref <branch> \ | ||||||||||||||||
| gh workflow run testbot.yaml --ref main \ | ||||||||||||||||
| -f max_targets=1 \ | ||||||||||||||||
| -f max_uncovered=300 \ | ||||||||||||||||
| -f max_turns=50 \ | ||||||||||||||||
| -f model=aws/anthropic/claude-opus-4-5 | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| ### Schedule | ||||||||||||||||
|
|
||||||||||||||||
| Runs automatically on weekdays at 6 AM UTC. | ||||||||||||||||
|
|
||||||||||||||||
| ### Review response | ||||||||||||||||
| ### Respond workflow — /testbot comments | ||||||||||||||||
|
|
||||||||||||||||
| Add the `ai-generated` label to your PR, then start an inline review comment with `/testbot <instruction>`. The command must be the first text in the comment. Examples: | ||||||||||||||||
| Add the `ai-generated` label to your PR, then post an **inline review comment** (on the "Files changed" tab) starting with `/testbot`. Examples: | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| /testbot add unit tests for this file | ||||||||||||||||
|
|
@@ -81,21 +30,147 @@ Add the `ai-generated` label to your PR, then start an inline review comment wit | |||||||||||||||
| /testbot refactor this function to reduce duplication | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| The bot responds only to repo members (OWNER, MEMBER, COLLABORATOR). It will not respond to its own replies or comments from bots. | ||||||||||||||||
| The command must be the **first text** in the comment. Only repo members (OWNER, MEMBER, COLLABORATOR) can trigger the bot. It won't respond to its own replies or to other bots. | ||||||||||||||||
|
|
||||||||||||||||
| **Example threads** showing the bot in action on PR #890: | ||||||||||||||||
| - [Thread r3126197776](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126197776) | ||||||||||||||||
| - [Thread r3126743347](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126743347) | ||||||||||||||||
|
|
||||||||||||||||
| ### Reverting a testbot commit | ||||||||||||||||
|
|
||||||||||||||||
| If the bot's commit isn't what you wanted, revert it and retry: | ||||||||||||||||
| If the bot's commit isn't what you wanted: | ||||||||||||||||
|
|
||||||||||||||||
| ```bash | ||||||||||||||||
| git pull && git revert HEAD --no-edit && git push | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| Then post a new `/testbot` comment with clearer instructions. | ||||||||||||||||
|
|
||||||||||||||||
| ## System Architecture | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| ┌──────────────────────────────────┐ | ||||||||||||||||
| │ Claude Code CLI │ | ||||||||||||||||
| │ (sandboxed — Read/Edit/Write, │ | ||||||||||||||||
| │ bazel test, pnpm test, gh pr) │ | ||||||||||||||||
| └───────────────┬──────────────────┘ | ||||||||||||||||
| ▲ | ||||||||||||||||
| │ --allowedTools, --json-schema | ||||||||||||||||
| │ | ||||||||||||||||
| ┌──────────────────────┐ ┌─────────┴──────────┐ ┌──────────────────┐ | ||||||||||||||||
| │ GENERATE WORKFLOW │ │ HARNESS │ │ RESPOND WORKFLOW │ | ||||||||||||||||
| │ (testbot.yaml) │────▶│ Python scripts │◀────│ (testbot-respond │ | ||||||||||||||||
| │ Weekly cron │ │ (git, gh, auth, │ │ .yaml) │ | ||||||||||||||||
| │ or dispatch │ │ guardrails) │ │ /testbot comment │ | ||||||||||||||||
| └──────────┬───────────┘ └─────────┬──────────┘ └──────────┬───────┘ | ||||||||||||||||
| │ │ │ | ||||||||||||||||
| ▼ ▼ ▼ | ||||||||||||||||
| ┌──────────┐ ┌───────────────┐ ┌──────────┐ | ||||||||||||||||
| │ Codecov │ │ git push │ │ GitHub │ | ||||||||||||||||
| │ API │ │ gh pr ... │ │ API │ | ||||||||||||||||
| └──────────┘ └───────────────┘ └──────────┘ | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| The architecture separates **what the LLM can do** (read code, run tests) from **what the harness does** (git, GitHub API, auth, guardrails). The LLM is never trusted with write access to branches or the GitHub API. | ||||||||||||||||
|
|
||||||||||||||||
| ## Workflow 1: Generate Tests (`testbot.yaml`) | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| ┌─────────────┐ ┌────────────────────────┐ ┌─────────────────────┐ ┌──────────────┐ ┌────────────┐ | ||||||||||||||||
| │ Trigger │ │ 1. coverage_targets.py │ │ 2. Claude Code CLI │ │ 3. guardrails│ │ 4. create_ │ | ||||||||||||||||
| │ weekday │─▶│ Fetch Codecov │─▶│ Read source │─▶│ .py │─▶│ pr.py │ | ||||||||||||||||
| │ 6 AM UTC │ │ Pick low-cov file │ │ Write tests+BUILD │ │ Keep tests │ │ Branch, │ | ||||||||||||||||
| │ or manual │ │ Emit target list │ │ Run bazel test │ │ only │ │ commit, │ | ||||||||||||||||
| └─────────────┘ └────────────────────────┘ │ Retry on fail │ │ Revert src │ │ open PR │ | ||||||||||||||||
| └─────────────────────┘ └──────────────┘ └────────────┘ | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| **Trigger:** Cron `0 6 * * 1-5` (weekdays 6 AM UTC) or `workflow_dispatch` | ||||||||||||||||
|
|
||||||||||||||||
| **Output:** A new branch `testbot/YYYYMMDD-HHMM` and a PR titled `[testbot] Add tests for <source_file>` with the `ai-generated` label | ||||||||||||||||
|
|
||||||||||||||||
| ## Workflow 2: Respond to /testbot (`testbot-respond.yaml`) | ||||||||||||||||
|
|
||||||||||||||||
| ```text | ||||||||||||||||
| ┌──────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌────────────────┐ | ||||||||||||||||
| │ Trigger │ │ 1. auto-approve │ │ 2. respond.py │ │ 3. Claude Code │ | ||||||||||||||||
| │ pull_request_ │─▶│ (workflow_run) │─▶│ GraphQL: fetch │─▶│ CLI │ | ||||||||||||||||
| │ review_comment │ │ Check NVIDIA │ │ threads + filter │ │ Read, Edit, │ | ||||||||||||||||
| │ + ai-generated │ │ org membership │ │ for /testbot │ │ Write, bazel,│ | ||||||||||||||||
| │ label │ │ Approve env │ │ trigger + author │ │ gh pr view │ | ||||||||||||||||
| └──────────────────┘ └─────────────────────┘ │ assoc (MEMBER+) │ └────────┬───────┘ | ||||||||||||||||
| │ Build prompt │ │ | ||||||||||||||||
| └─────────────────────┘ │ | ||||||||||||||||
| ▲ │ | ||||||||||||||||
| │ structured JSON │ | ||||||||||||||||
| │ {commit_message, │ | ||||||||||||||||
| │ replies[...]} │ | ||||||||||||||||
| │ │ | ||||||||||||||||
| ┌─────────────────────┐ ┌──────────┴──────────┐ │ | ||||||||||||||||
| │ 5. Post inline │ │ 4. respond.py │ │ | ||||||||||||||||
| │ replies via │◀─│ get_changed_files │◀───────────┘ | ||||||||||||||||
| │ GitHub API │ │ commit_and_push │ | ||||||||||||||||
| │ (one per thread) │ │ (retry, detect │ | ||||||||||||||||
| │ │ │ GH013) │ | ||||||||||||||||
| └─────────────────────┘ └─────────────────────┘ | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| **Trigger:** Inline review comment (on "Files changed" tab) that starts with `/testbot` on a PR labeled `ai-generated` | ||||||||||||||||
|
|
||||||||||||||||
| **Output:** A new commit by `testbot[bot]` pushed to the PR branch, plus an inline reply to each addressed comment | ||||||||||||||||
|
|
||||||||||||||||
| ## Auto-approver (`testbot-respond-approve.yaml`) | ||||||||||||||||
|
|
||||||||||||||||
| The respond workflow runs under `environment: testbot-respond` (gated by a required reviewer) so it can access the `NVIDIA_NIM_KEY` secret. The auto-approver runs on `main` via `workflow_run` and: | ||||||||||||||||
|
|
||||||||||||||||
| 1. Checks if the triggering actor is in the `NVIDIA/osmo-dev` team, OR a trusted bot (`svc-osmo-ci`, `github-actions[bot]`, `coderabbitai[bot]`) | ||||||||||||||||
|
Contributor
The auto-approver authorization description is internally inconsistent. Line 85 says authorization is granted to a team member or a trusted bot, while Line 100 describes only team membership verification. Please make these statements consistent. Also applies to: 100-100
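To make the two readings concrete, here is a minimal Python sketch of the rule as worded in step 1 of the Auto-approver section (team member OR trusted bot). The function name `is_authorized` and the `team_members` argument are hypothetical, introduced only for illustration. The actual check lives in `testbot-respond-approve.yaml`, which this diff does not show, so treat this as one reading of the README rather than the implementation.

```python
# Hypothetical sketch: the authorization rule as described in step 1.
# The bot names are copied from the README text; everything else is assumed.
TRUSTED_BOTS = frozenset(
    {"svc-osmo-ci", "github-actions[bot]", "coderabbitai[bot]"}
)


def is_authorized(actor: str, team_members: set[str]) -> bool:
    """Return True if the actor is a team member or a trusted bot."""
    return actor in team_members or actor in TRUSTED_BOTS


# A team member and a trusted bot both pass; anyone else is rejected.
print(is_authorized("alice", {"alice"}))
print(is_authorized("coderabbitai[bot]", set()))
print(is_authorized("mallory", {"alice"}))
```

If the Guardrails table row is the accurate one (team membership only), the `or actor in TRUSTED_BOTS` clause would not apply, which is exactly the ambiguity the comment asks the author to resolve.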
||||||||||||||||
| 2. If authorized, calls GitHub's `reviewPendingDeploymentsForRun` to approve the deployment | ||||||||||||||||
|
|
||||||||||||||||
| This lets `/testbot` comments from NVIDIA team members run automatically while blocking external contributors and bots that don't need to run the full pipeline. | ||||||||||||||||
|
|
||||||||||||||||
| ## Guardrails | ||||||||||||||||
|
|
||||||||||||||||
| | Guardrail | Scope | Implementation | | ||||||||||||||||
| |-----------|-------|----------------| | ||||||||||||||||
| | **Tool allowlist** | Both workflows | `--allowedTools` flag restricts Claude Code to `Read,Edit,Write,Glob,Grep,bazel test,pnpm test/validate/format,gh pr view/diff/checks` — no `git`, no `gh api`, no arbitrary bash | | ||||||||||||||||
| | **Test-file-only filter** | Generate only | `guardrails.py:get_changed_test_files()` reverts non-test file changes before commit | | ||||||||||||||||
| | **Label gate** | Respond only | Workflow `if:` requires `ai-generated` label on the PR | | ||||||||||||||||
| | **Fork rejection** | Respond only | Workflow `if:` requires `head.repo == base.repo` (no forks) | | ||||||||||||||||
| | **Author association** | Respond only | `respond.py` requires the triggering comment author to be `OWNER`, `MEMBER`, or `COLLABORATOR` | | ||||||||||||||||
| | **Required reviewer** | Respond only | `environment: testbot-respond` blocks unauthorized runs from accessing the API key secret | | ||||||||||||||||
| | **Org membership check** | Respond only | Auto-approver verifies the actor is in `NVIDIA/osmo-dev` before approving | | ||||||||||||||||
| | **Commit message sanitization** | Both | `sanitize_commit_message()` enforces `testbot:` prefix, strips git trailers, caps at 500 chars | | ||||||||||||||||
| | **Push retry with GH013 detection** | Both | `commit_and_push()` retries up to 3x; fails fast on repository ruleset violations | | ||||||||||||||||
| | **Partial work discard** | Respond only | On timeout or max-turns hit, `respond.py` discards file changes and posts an informative reply (doesn't push half-finished work) | | ||||||||||||||||
|
|
||||||||||||||||
| ## Harness responsibilities | ||||||||||||||||
|
|
||||||||||||||||
| Claude Code is intentionally given a **narrow capability surface**. Everything else lives in the Python harness: | ||||||||||||||||
|
|
||||||||||||||||
| | Responsibility | Component | | ||||||||||||||||
| |----------------|-----------| | ||||||||||||||||
| | Coverage analysis & target selection | `coverage_targets.py` | | ||||||||||||||||
| | Prompt construction (shared rules + workflow-specific) | `respond.py`, `TESTBOT_PROMPT.md`, `TESTBOT_RESPOND_PROMPT.md`, `TESTBOT_RULES.md` | | ||||||||||||||||
| | Git operations (branch, commit, push, retry, revert) | `create_pr.py`, `respond.py` | | ||||||||||||||||
| | GitHub API (fetch threads, post replies, create PRs) | `create_pr.py`, `respond.py` | | ||||||||||||||||
| | Guardrail enforcement (file-type filter) | `guardrails.py` | | ||||||||||||||||
| | Structured output parsing (3-tier fallback) | `respond.py:_extract_replies()` | | ||||||||||||||||
| | Timeout / max-turns handling | `respond.py:run_claude()` and `main()` | | ||||||||||||||||
| | Auto-approval of environment deployments | `testbot-respond-approve.yaml` | | ||||||||||||||||
|
|
||||||||||||||||
| ## Prompt files | ||||||||||||||||
|
|
||||||||||||||||
| Prompts are **file-based** (not inlined in Python) so they can be edited, diffed, and reviewed independently: | ||||||||||||||||
|
|
||||||||||||||||
| | File | Purpose | | ||||||||||||||||
| |------|---------| | ||||||||||||||||
| | `TESTBOT_RULES.md` | **Shared** test quality rules, bug-detection process, verification steps, language conventions. Referenced by both workflows. | | ||||||||||||||||
| | `TESTBOT_PROMPT.md` | Generate-specific: coverage targets process, BUILD file handling, guardrails. References `TESTBOT_RULES.md`. | | ||||||||||||||||
| | `TESTBOT_RESPOND_PROMPT.md` | Respond-specific: role framing, PR context guidance, output JSON schema example. References `TESTBOT_RULES.md`. | | ||||||||||||||||
|
|
||||||||||||||||
| ## Configuration | ||||||||||||||||
|
|
||||||||||||||||
| ### Test generation (dispatch inputs) | ||||||||||||||||
| ### Generate workflow (dispatch inputs) | ||||||||||||||||
|
|
||||||||||||||||
| | Input | Default | Description | | ||||||||||||||||
| |-------|---------|-------------| | ||||||||||||||||
|
|
@@ -106,13 +181,13 @@ Then post a new `/testbot` comment with clearer instructions. | |||||||||||||||
| | `model` | `aws/anthropic/claude-opus-4-5` | LLM model on API gateway | | ||||||||||||||||
| | `dry_run` | `false` | Generate without creating PR | | ||||||||||||||||
|
|
||||||||||||||||
| ### Review response (CLI args in `testbot-respond.yaml`) | ||||||||||||||||
| ### Respond workflow (CLI args in `testbot-respond.yaml`) | ||||||||||||||||
|
|
||||||||||||||||
| | Arg | Default | Description | | ||||||||||||||||
| |-----|---------|-------------| | ||||||||||||||||
| | `--max-turns` | `50` | Claude Code agent turns | | ||||||||||||||||
| | `--max-turns` | `75` | Claude Code agent turns | | ||||||||||||||||
| | `--max-responses` | `10` | Max threads to address per trigger | | ||||||||||||||||
| | `--timeout` | `720` | Claude Code CLI timeout in seconds | | ||||||||||||||||
| | `--timeout` | `900` | Claude Code CLI timeout in seconds | | ||||||||||||||||
|
Comment on lines +188 to +190
Contributor
Fix documented respond defaults to match
These defaults are currently incorrect in docs.
Suggested doc fix:
-| `--max-turns` | `75` | Claude Code agent turns |
+| `--max-turns` | `50` | Claude Code agent turns |
...
-| `--timeout` | `900` | Claude Code CLI timeout in seconds |
+| `--timeout` | `720` | Claude Code CLI timeout in seconds |
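This kind of drift between documented and configured defaults can be caught mechanically. The sketch below parses the README table rows and compares them against a set of expected values. It is an illustration only: `parse_defaults`, `find_mismatches`, and `REVIEWER_VALUES` are hypothetical names, and the expected values are simply the ones proposed in the suggestion above, not values read from `testbot-respond.yaml`.

```python
import re

# Matches rows such as: | `--max-turns` | `75` | Claude Code agent turns |
ROW = re.compile(r"^\|\s*`(--[\w-]+)`\s*\|\s*`([^`]*)`\s*\|")


def parse_defaults(markdown: str) -> dict[str, str]:
    """Extract {arg: default} from a Markdown table of CLI arguments."""
    defaults = {}
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            defaults[match.group(1)] = match.group(2)
    return defaults


def find_mismatches(
    documented: dict[str, str], expected: dict[str, str]
) -> dict[str, tuple]:
    """Return {arg: (documented, expected)} for every value that differs."""
    return {
        arg: (documented.get(arg), value)
        for arg, value in expected.items()
        if documented.get(arg) != value
    }


README_TABLE = """\
| Arg | Default | Description |
|-----|---------|-------------|
| `--max-turns` | `75` | Claude Code agent turns |
| `--max-responses` | `10` | Max threads to address per trigger |
| `--timeout` | `900` | Claude Code CLI timeout in seconds |
"""

# Assumption: the values proposed in the review suggestion.
REVIEWER_VALUES = {"--max-turns": "50", "--timeout": "720"}

mismatches = find_mismatches(parse_defaults(README_TABLE), REVIEWER_VALUES)
print(mismatches)
```

In a real check, the expected values would come from the workflow file or the script's argument parser rather than a hard-coded dictionary.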
||||||||||||||||
| | `--model` | `aws/anthropic/claude-opus-4-5` | LLM model | | ||||||||||||||||
|
|
||||||||||||||||
| ### Coverage target selection (constants in `coverage_targets.py`) | ||||||||||||||||
|
|
@@ -143,5 +218,5 @@ src/scripts/testbot/ | |||||||||||||||
| .github/workflows/ | ||||||||||||||||
| ├── testbot.yaml # Scheduled test generation | ||||||||||||||||
| ├── testbot-respond.yaml # /testbot review response | ||||||||||||||||
| └── testbot-respond-approve.yaml # Auto-approve for org members | ||||||||||||||||
| └── testbot-respond-approve.yaml # Auto-approve for NVIDIA org members | ||||||||||||||||
| ``` | ||||||||||||||||
Resolve schedule wording mismatch (“weekly” vs weekday cron).
Line 5 says generation runs on a weekly schedule, but Line 47 defines a weekday cron (`0 6 * * 1-5`), which is daily on weekdays. Please align wording to avoid confusion. Also applies to: 47-47
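A quick way to confirm how often `0 6 * * 1-5` fires is to count matches over one week. The sketch below hard-codes that single expression instead of parsing cron syntax, and the names `fires` and `runs_in_week` are hypothetical. It relies on two facts: cron day-of-week `1-5` means Monday through Friday, and GitHub Actions evaluates schedules in UTC.

```python
from datetime import datetime, timedelta, timezone


def fires(moment: datetime) -> bool:
    """True if `moment` matches the cron expression `0 6 * * 1-5`."""
    # isoweekday(): Monday is 1 and Sunday is 7, so 1..5 is Monday..Friday.
    return (
        moment.minute == 0
        and moment.hour == 6
        and 1 <= moment.isoweekday() <= 5
    )


def runs_in_week(start: datetime) -> list[datetime]:
    """List every hourly slot in the 7 days after `start` that fires."""
    slots = (start + timedelta(hours=h) for h in range(7 * 24))
    return [slot for slot in slots if fires(slot)]


week_start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # a Monday, 00:00 UTC
runs = runs_in_week(week_start)
print(len(runs))  # 5 runs: one per weekday, so the schedule is not weekly
```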