From 6b241b986226240b0388b500c80d53218912947f Mon Sep 17 00:00:00 2001
From: jiaen-nv <jiaenr@nvidia.com>
Date: Fri, 24 Apr 2026 16:09:41 -0700
Subject: [PATCH] Rewrite testbot README with architecture diagrams

Add system architecture diagram showing the LLM/harness/external API
separation, per-workflow flow diagrams (generate and respond), and a
comprehensive guardrails table covering all safety layers (tool
allowlist, label gate, fork rejection, author association, required
reviewer, org membership, commit sanitization, partial work discard).

Also document the auto-approver workflow, the harness responsibilities
split, and the file-based prompt structure (TESTBOT_RULES.md shared
between generate and respond).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 src/scripts/testbot/README.md | 209 +++++++++++++++++++++++-----------
 1 file changed, 142 insertions(+), 67 deletions(-)

diff --git a/src/scripts/testbot/README.md b/src/scripts/testbot/README.md
index 5c1bde709..16c66fa71 100644
--- a/src/scripts/testbot/README.md
+++ b/src/scripts/testbot/README.md
@@ -1,78 +1,27 @@
-# Testbot: AI-Powered Test Generation
+# Testbot: AI-Powered Test Generation & Review Response
 
-Testbot analyzes coverage gaps, generates tests using Claude Code, validates them, and opens PRs for human review. It also responds to inline review comments via `/testbot`.
+Testbot is a GitHub Actions bot backed by Claude Code that:
 
-## Architecture
+1. **Generates tests** for low-coverage files on a weekly schedule (opens PRs for review)
+2. **Responds to `/testbot` comments** on any PR labeled `ai-generated` (applies fixes, writes tests, addresses CodeRabbit feedback)
 
-### Test Generation (`testbot.yaml`)
+## Using testbot
 
-```text
-Codecov API → coverage_targets.py → Claude Code CLI → guardrails → create_pr.py
-                                      |         ↑
-                                      └─────────┘ (agent retries on test failures)
-```
-
-| Stage | Component | Description |
-|-------|-----------|-------------|
-| **Coverage analysis** | `coverage_targets.py` | Fetches Codecov report, selects lowest-coverage files |
-| **Test generation** | Claude Code CLI | Reads source, writes test files and BUILD entries, runs tests, iterates on failures |
-| **Guardrails** | `guardrails.py` | Filters out any non-test file changes made by Claude |
-| **PR creation** | `create_pr.py` | Creates branch, commits test files, pushes, opens PR with `ai-generated` label |
-
-Claude Code is sandboxed: it can only read files, edit test files, and run test commands (`bazel test`, `pnpm test`). It cannot run `git`, `gh`, or modify source code. All git and GitHub operations are in deterministic harness scripts.
-
-### Review Response (`testbot-respond.yaml`)
-
-```text
-/testbot comment → respond.py
-  ├─ fetch all thread comments (GraphQL)
-  ├─ filter: trigger phrase, author, dedup
-  ├─ Claude Code CLI: read files, apply fix, run tests
-  ├─ respond.py: git commit + push
-  ├─ structured reply via --json-schema
-  └─ post inline reply to each thread
-```
-
-| Feature | Description |
-|---------|-------------|
-| **Trigger** | Comment starting with `/testbot` on any PR with the `ai-generated` label |
-| **Thread context** | Full conversation history (all nested comments) passed to Claude |
-| **Structured output** | `--json-schema` returns per-thread replies and commit message |
-| **Safety** | Repo-member-only access, crash recovery, push retry |
-| **Dedup** | Skips threads where the bot already replied and is awaiting human follow-up |
-
-### Security Boundary
-
-|  | Claude Code | Harness scripts |
-|---|---|---|
-| Read source files | Yes | — |
-| Write/edit test files | Yes | — |
-| Run `bazel test` / `pnpm test` | Yes | — |
-| Run `git` commands | **No** | `create_pr.py`, `respond.py` |
-| Run `gh` commands | **No** | `create_pr.py`, `respond.py` |
-| Filter non-test changes | — | `guardrails.py` |
-
-## Triggering on GitHub
-
-### Manual dispatch
+### Generate workflow — manual dispatch
 
 **Actions → Testbot → Run workflow**, or via CLI:
 
 ```bash
-gh workflow run testbot.yaml --ref <branch> \
+gh workflow run testbot.yaml --ref main \
   -f max_targets=1 \
   -f max_uncovered=300 \
   -f max_turns=50 \
   -f model=aws/anthropic/claude-opus-4-5
 ```
 
-### Schedule
-
-Runs automatically on weekdays at 6 AM UTC.
-
-### Review response
+### Respond workflow — /testbot comments
 
-Add the `ai-generated` label to your PR, then start an inline review comment with `/testbot <instruction>`. The command must be the first text in the comment. Examples:
+Add the `ai-generated` label to your PR, then post an **inline review comment** (on the "Files changed" tab) starting with `/testbot`. Examples:
 
 ```text
 /testbot add unit tests for this file
@@ -81,11 +30,15 @@ Add the `ai-generated` label to your PR, then start an inline review comment wit
 /testbot refactor this function to reduce duplication
 ```
 
-The bot responds only to repo members (OWNER, MEMBER, COLLABORATOR). It will not respond to its own replies or comments from bots.
+The command must be the **first text** in the comment. Only repo members (OWNER, MEMBER, COLLABORATOR) can trigger the bot. It won't respond to its own replies or to other bots.
+
+**Example threads** showing the bot in action on PR #890:
+- [Thread r3126197776](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126197776)
+- [Thread r3126743347](https://github.com/NVIDIA/OSMO/pull/890/changes/40b026ff5eb4cb99d697476a49dead9811a9131b#r3126743347)
 
 ### Reverting a testbot commit
 
-If the bot's commit isn't what you wanted, revert it and retry:
+If the bot's commit isn't what you wanted:
 
 ```bash
 git pull && git revert HEAD --no-edit && git push
@@ -93,9 +46,131 @@ git pull && git revert HEAD --no-edit && git push
 
 Then post a new `/testbot` comment with clearer instructions.
 
+## System Architecture
+
+```text
+                         ┌──────────────────────────────────┐
+                         │        Claude Code CLI           │
+                         │  (sandboxed — Read/Edit/Write,   │
+                         │   bazel test, pnpm test, gh pr)  │
+                         └───────────────┬──────────────────┘
+                                         ▲
+                                         │ --allowedTools, --json-schema
+                                         │
+  ┌──────────────────────┐     ┌─────────┴──────────┐     ┌──────────────────┐
+  │  GENERATE WORKFLOW   │     │     HARNESS        │     │  RESPOND WORKFLOW │
+  │  (testbot.yaml)      │────▶│  Python scripts    │◀────│  (testbot-respond │
+  │  Weekly cron         │     │  (git, gh, auth,   │     │   .yaml)          │
+  │  or dispatch         │     │   guardrails)      │     │  /testbot comment │
+  └──────────┬───────────┘     └─────────┬──────────┘     └──────────┬───────┘
+             │                           │                           │
+             ▼                           ▼                           ▼
+       ┌──────────┐             ┌───────────────┐             ┌──────────┐
+       │ Codecov  │             │   git push    │             │  GitHub  │
+       │   API    │             │   gh pr ...   │             │  API     │
+       └──────────┘             └───────────────┘             └──────────┘
+```
+
+The architecture separates **what the LLM can do** (read code, run tests) from **what the harness does** (git, GitHub API, auth, guardrails). The LLM is never trusted with write access to branches or the GitHub API.
+
+## Workflow 1: Generate Tests (`testbot.yaml`)
+
+```text
+┌─────────────┐  ┌────────────────────────┐  ┌─────────────────────┐  ┌──────────────┐  ┌────────────┐
+│  Trigger    │  │ 1. coverage_targets.py │  │ 2. Claude Code CLI  │  │ 3. guardrails│  │ 4. create_ │
+│  weekday    │─▶│   Fetch Codecov        │─▶│   Read source       │─▶│   .py        │─▶│    pr.py   │
+│  6 AM UTC   │  │   Pick low-cov file    │  │   Write tests+BUILD │  │   Keep tests │  │   Branch,  │
+│  or manual  │  │   Emit target list     │  │   Run bazel test    │  │   only       │  │   commit,  │
+└─────────────┘  └────────────────────────┘  │   Retry on fail     │  │   Revert src │  │   open PR  │
+                                             └─────────────────────┘  └──────────────┘  └────────────┘
+```
+
+**Trigger:** Cron `0 6 * * 1-5` (weekdays 6 AM UTC) or `workflow_dispatch`
+
+**Output:** A new branch `testbot/YYYYMMDD-HHMM` and a PR titled `[testbot] Add tests for <source_file>` with the `ai-generated` label
+
+## Workflow 2: Respond to /testbot (`testbot-respond.yaml`)
+
+```text
+┌──────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐  ┌────────────────┐
+│  Trigger         │  │ 1. auto-approve     │  │ 2. respond.py       │  │ 3. Claude Code │
+│  pull_request_   │─▶│   (workflow_run)    │─▶│   GraphQL: fetch    │─▶│   CLI          │
+│  review_comment  │  │   Check NVIDIA      │  │   threads + filter  │  │   Read, Edit,  │
+│  + ai-generated  │  │   org membership    │  │   for /testbot      │  │   Write, bazel,│
+│  label           │  │   Approve env       │  │   trigger + author  │  │   gh pr view   │
+└──────────────────┘  └─────────────────────┘  │   assoc (MEMBER+)   │  └────────┬───────┘
+                                               │   Build prompt      │           │
+                                               └─────────────────────┘           │
+                                                         ▲                       │
+                                                         │ structured JSON       │
+                                                         │ {commit_message,      │
+                                                         │  replies[...]}        │
+                                                         │                       │
+                     ┌─────────────────────┐  ┌──────────┴──────────┐            │
+                     │ 5. Post inline      │  │ 4. respond.py       │            │
+                     │    replies via      │◀─│   get_changed_files │◀───────────┘
+                     │    GitHub API       │  │   commit_and_push   │
+                     │    (one per thread) │  │   (retry, detect    │
+                     │                     │  │    GH013)           │
+                     └─────────────────────┘  └─────────────────────┘
+```
+
+**Trigger:** Inline review comment (on "Files changed" tab) that starts with `/testbot` on a PR labeled `ai-generated`
+
+**Output:** A new commit by `testbot[bot]` pushed to the PR branch, plus an inline reply to each addressed comment
+
+## Auto-approver (`testbot-respond-approve.yaml`)
+
+The respond workflow runs under `environment: testbot-respond` (gated by a required reviewer) so it can access the `NVIDIA_NIM_KEY` secret. The auto-approver runs on `main` via `workflow_run` and:
+
+1. Checks if the triggering actor is in the `NVIDIA/osmo-dev` team, OR a trusted bot (`svc-osmo-ci`, `github-actions[bot]`, `coderabbitai[bot]`)
+2. If authorized, calls GitHub's `reviewPendingDeploymentsForRun` to approve the deployment
+
+This lets `/testbot` comments from NVIDIA team members run automatically while blocking external contributors and bots that don't need to run the full pipeline.
+
+## Guardrails
+
+| Guardrail | Scope | Implementation |
+|-----------|-------|----------------|
+| **Tool allowlist** | Both workflows | `--allowedTools` flag restricts Claude Code to `Read,Edit,Write,Glob,Grep,bazel test,pnpm test/validate/format,gh pr view/diff/checks` — no `git`, no `gh api`, no arbitrary bash |
+| **Test-file-only filter** | Generate only | `guardrails.py:get_changed_test_files()` reverts non-test file changes before commit |
+| **Label gate** | Respond only | Workflow `if:` requires `ai-generated` label on the PR |
+| **Fork rejection** | Respond only | Workflow `if:` requires `head.repo == base.repo` (no forks) |
+| **Author association** | Respond only | `respond.py` requires the triggering comment author to be `OWNER`, `MEMBER`, or `COLLABORATOR` |
+| **Required reviewer** | Respond only | `environment: testbot-respond` blocks unauthorized runs from accessing the API key secret |
+| **Org membership check** | Respond only | Auto-approver verifies the actor is in `NVIDIA/osmo-dev` before approving |
+| **Commit message sanitization** | Both | `sanitize_commit_message()` enforces `testbot:` prefix, strips git trailers, caps at 500 chars |
+| **Push retry with GH013 detection** | Both | `commit_and_push()` retries up to 3x; fails fast on repository ruleset violations |
+| **Partial work discard** | Respond only | On timeout or max-turns hit, `respond.py` discards file changes and posts an informative reply (doesn't push half-finished work) |
+
+## Harness responsibilities
+
+Claude Code is intentionally given a **narrow capability surface**. Everything else lives in the Python harness:
+
+| Responsibility | Component |
+|----------------|-----------|
+| Coverage analysis & target selection | `coverage_targets.py` |
+| Prompt construction (shared rules + workflow-specific) | `respond.py`, `TESTBOT_PROMPT.md`, `TESTBOT_RESPOND_PROMPT.md`, `TESTBOT_RULES.md` |
+| Git operations (branch, commit, push, retry, revert) | `create_pr.py`, `respond.py` |
+| GitHub API (fetch threads, post replies, create PRs) | `create_pr.py`, `respond.py` |
+| Guardrail enforcement (file-type filter) | `guardrails.py` |
+| Structured output parsing (3-tier fallback) | `respond.py:_extract_replies()` |
+| Timeout / max-turns handling | `respond.py:run_claude()` and `main()` |
+| Auto-approval of environment deployments | `testbot-respond-approve.yaml` |
+
+## Prompt files
+
+Prompts are **file-based** (not inlined in Python) so they can be edited, diffed, and reviewed independently:
+
+| File | Purpose |
+|------|---------|
+| `TESTBOT_RULES.md` | **Shared** test quality rules, bug-detection process, verification steps, language conventions. Referenced by both workflows. |
+| `TESTBOT_PROMPT.md` | Generate-specific: coverage targets process, BUILD file handling, guardrails. References `TESTBOT_RULES.md`. |
+| `TESTBOT_RESPOND_PROMPT.md` | Respond-specific: role framing, PR context guidance, output JSON schema example. References `TESTBOT_RULES.md`. |
+
 ## Configuration
 
-### Test generation (dispatch inputs)
+### Generate workflow (dispatch inputs)
 
 | Input | Default | Description |
 |-------|---------|-------------|
@@ -106,13 +181,13 @@ Then post a new `/testbot` comment with clearer instructions.
 | `model` | `aws/anthropic/claude-opus-4-5` | LLM model on API gateway |
 | `dry_run` | `false` | Generate without creating PR |
 
-### Review response (CLI args in `testbot-respond.yaml`)
+### Respond workflow (CLI args in `testbot-respond.yaml`)
 
 | Arg | Default | Description |
 |-----|---------|-------------|
-| `--max-turns` | `50` | Claude Code agent turns |
+| `--max-turns` | `75` | Claude Code agent turns |
 | `--max-responses` | `10` | Max threads to address per trigger |
-| `--timeout` | `720` | Claude Code CLI timeout in seconds |
+| `--timeout` | `900` | Claude Code CLI timeout in seconds |
 | `--model` | `aws/anthropic/claude-opus-4-5` | LLM model |
 
 ### Coverage target selection (constants in `coverage_targets.py`)
@@ -143,5 +218,5 @@ src/scripts/testbot/
 .github/workflows/
 ├── testbot.yaml                    # Scheduled test generation
 ├── testbot-respond.yaml            # /testbot review response
-└── testbot-respond-approve.yaml    # Auto-approve for org members
+└── testbot-respond-approve.yaml    # Auto-approve for NVIDIA org members
 ```