feat: add Claude Code CLI adapter#1
Open
hculap wants to merge 12 commits into
Open
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a new agent variant that uses the Claude Code CLI (`claude --print`) as the execution backend instead of direct SDK calls. This enables running AutoAgent benchmarks with Claude Code's built-in tool suite. Files added: - agent-claude-code.py: Harbor adapter with editable/fixed boundary - Dockerfile.claude-code: base image with Node.js + Claude Code CLI - program-claude-code.md: meta-agent directive for this variant
The adapter now copies the host's ~/.claude credentials into the container at runtime. No ANTHROPIC_API_KEY or .env file needed — Claude Code CLI uses its own OAuth session from the host machine.
- Fixed CLI args: added --verbose (required for stream-json), prompt is positional not --prompt flag - Rewrote ATIF parser to handle actual stream-json message structure: assistant messages with tool_use content blocks, user messages with tool_result blocks, pending tool pairing - Tested locally: all 4 tasks produce valid ATIF-v1.6 trajectories
Major restructure: claude CLI now runs on the HOST machine (not inside Docker) using the user's existing OAuth session. No API key needed. - Agent syncs files from container to host temp dir before running - Claude executes with full OAuth auth from host keychain - Results synced back to container for verifier - Simplified Dockerfile (no Node.js/Claude Code needed in image) - Rewrites /task/ paths to temp dir paths for correct file placement - Fixed download_file arg order for container-to-host sync Tested: 4/4 tasks pass (hello-world, fibonacci, csv-analysis, git-log) with Harbor e2e Docker pipeline. Mean score: 1.000.
Critical fixes: - Replace manual file sync with Harbor's upload_dir/download_dir (fixes shell injection, silent file loss, and fragile find+loop) - Check subprocess exit code and log errors - Preserve partial stdout on TimeoutExpired Important fixes: - Use asyncio.to_thread() for subprocess.run to avoid blocking event loop - Remove bare except Exception: pass on metrics (no longer needed) - Log non-JSON lines from CLI output instead of silently dropping - Add logging throughout via logger instead of print() Doc fixes: - Remove false claim about copying ~/.claude auth into container - Add PERMISSION_MODE as configurable constant (was hardcoded) - Add security warning about host-side execution - Fix ATIF acronym expansion in docstring - Remove unused workdir param from build_cli_args - Remove unused output_dir creation Tested: 4/4 tasks pass (Mean: 1.000) after all changes.
Author
|
@kevinrgu Hey! Would love your review on this. Adds a Claude Code CLI adapter that uses OAuth (no API key needed). 4/4 Harbor e2e tasks pass. Happy to run on a real benchmark suite if you point me to one. |
- Add MAX_BUDGET_USD (default $1.00) via --max-budget-usd flag - Add TIMEOUT_SEC (default 540s) as configurable constant - Prevents meta-agent from getting stuck on hung tasks
2e68f3b to
eb3f185
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a Claude Code CLI adapter for Harbor that uses the user's existing OAuth session — no API key needed.
agent-claude-code.py— Harbor adapter that runsclaudeCLI on the host machineDockerfile.claude-code— lightweight base image (no Node.js/Claude Code in container)program-claude-code.md— meta-agent directive for iterating on this variantArchitecture
Unlike the SDK-based adapters, this variant runs Claude Code CLI on the host (same as
agent.pyruns OpenAI SDK host-side). The flow:download_dirclaude --print --output-format stream-jsonruns host-side in a temp dirupload_dirSetup
No
.envfile, noANTHROPIC_API_KEY, no API billing — uses your Claude subscription via OAuth.Test results
Full Docker + Harbor e2e pipeline with 4 baseline tasks (hello-world, fibonacci, csv-analysis, git-log):
Safety & timeout controls
MAX_TURNSMAX_BUDGET_USDTIMEOUT_SECPERMISSION_MODESecurity note: The CLI runs on the host (not sandboxed in Docker) with
bypassPermissions.Only run on trusted task sets.
Meta-agent iteration surface
The editable section exposes:
SYSTEM_PROMPT— agent instructionsMODEL— sonnet/haiku/opusMAX_TURNS— turn budgetMAX_BUDGET_USD— cost cap per taskTIMEOUT_SEC— hard timeout for CLI processPERMISSION_MODE— CLI permission levelALLOWED_TOOLS— restrict Claude's tool setCLI_EXTRA_FLAGS— additional CLI flagsbuild_cli_args()— full CLI invocation strategyTest plan
claude --print🤖 Generated with Claude Code