feat: add Claude Code CLI adapter #1

Open
hculap wants to merge 12 commits into kevinrgu:main from hculap:feat/claude-code-adapter

hculap commented Apr 3, 2026

Summary

Adds a Claude Code CLI adapter for Harbor that uses the user's existing OAuth session — no API key needed.

  • agent-claude-code.py — Harbor adapter that runs claude CLI on the host machine
  • Dockerfile.claude-code — lightweight base image (no Node.js/Claude Code in container)
  • program-claude-code.md — meta-agent directive for iterating on this variant

Architecture

Unlike the SDK-based adapters, this variant shells out to the Claude Code CLI. The CLI runs on the host, just as agent.py runs the OpenAI SDK host-side. The flow:

Host machine                          Docker container
┌──────────────────┐                  ┌───────────────────┐
│ Harbor calls     │                  │ Task environment  │
│ AutoAgent.run()  │<──download_dir── │ /task/files/      │
│                  │                  │ /task/output/     │
│ claude CLI runs  │                  │                   │
│ (OAuth session)  │                  │ Verifier runs     │
│ writes to tmpdir │──upload_dir───>  │ checks /task/     │
└──────────────────┘                  └───────────────────┘
  1. Harbor spins up a container per task
  2. Adapter downloads pre-existing files from container via download_dir
  3. claude --print --output-format stream-json runs host-side in a temp dir
  4. Claude reads instruction, writes output files using its own tools
  5. Adapter syncs output files back via upload_dir
  6. Harbor runs the verifier inside the container
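
Steps 2 to 5 above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `TaskEnvironment`, its method signatures, `Runner`, and `run_task` are hypothetical stand-ins (only the names `download_dir` and `upload_dir` come from this PR), and the `/task/` path rewrite is inferred from the commit notes.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Awaitable, Callable, Protocol


class TaskEnvironment(Protocol):
    """Hypothetical stand-in for Harbor's environment object (signatures assumed)."""

    async def download_dir(self, source_dir: str, target_dir: str) -> None: ...

    async def upload_dir(self, source_dir: str, target_dir: str) -> None: ...


# Hypothetical runner type: takes (instruction, workdir) and drives the CLI.
Runner = Callable[[str, Path], Awaitable[None]]


async def run_task(
    env: TaskEnvironment,
    instruction: str,
    run_cli: Runner,
    task_root: str = "/task",
) -> None:
    with tempfile.TemporaryDirectory(prefix="autoagent-") as tmp:
        workdir = Path(tmp)
        # Step 2: pull pre-existing task files from the container to the host.
        await env.download_dir(task_root, str(workdir))
        # Point container paths in the instruction at the host temp dir.
        host_instruction = instruction.replace(f"{task_root}/", f"{workdir}/")
        # Steps 3-4: the CLI runs host-side and writes into workdir.
        await run_cli(host_instruction, workdir)
        # Step 5: push results back so the in-container verifier can see them.
        await env.upload_dir(str(workdir), task_root)
```

Passing the runner in as a parameter keeps the sync logic testable without a logged-in `claude` binary.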

Setup

# 1. Install Claude Code CLI (if not already)
npm install -g @anthropic-ai/claude-code

# 2. Login (uses your existing Claude subscription, no API key)
claude login

# 3. Build the base image
docker build -f Dockerfile.claude-code -t autoagent-base .

# 4. Run tasks
uv run harbor run -p tasks/ --agent-import-path agent-claude-code:AutoAgent -o jobs

No .env file, no ANTHROPIC_API_KEY, no API billing — uses your Claude subscription via OAuth.
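
A small preflight check along these lines could catch a broken setup before any task runs. It is only a sketch: `preflight_problems` is a hypothetical helper that is not part of this PR, and the `ANTHROPIC_API_KEY` warning rests on the assumption that a key left in the environment would make the CLI bill the API instead of using the subscription session.

```python
import os
import shutil
from typing import List, Mapping, Optional


def preflight_problems(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems that would break the setup steps."""
    environ = os.environ if environ is None else environ
    problems = []
    # The setup commands rely on these three executables being installed.
    for tool in ("claude", "docker", "uv"):
        if shutil.which(tool, path=environ.get("PATH")) is None:
            problems.append(f"{tool!r} not found on PATH")
    # Assumption: a stray API key would switch the CLI to API billing.
    if environ.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is set; unset it to use the OAuth session")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("setup problem:", problem)
```

The check does not verify that `claude` is actually logged in; it only confirms the tools exist.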

Test results

Full Docker + Harbor e2e pipeline with 4 baseline tasks (hello-world, fibonacci, csv-analysis, git-log):

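| Task | Score | Turns | Cost | Duration |
| --- | --- | --- | --- | --- |
| hello-world | 1.0 | 2 | $0.05 | 7s |
| fibonacci | 1.0 | 3 | $0.05 | 14s |
| csv-analysis | 1.0 | 3 | $0.05 | 13s |
| git-log | 1.0 | 2 | $0.04 | 15s |
| Total | 4/4 | | $0.19 | 36s |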

Note: These are baseline validation tasks to prove the adapter works end-to-end
through the full Harbor pipeline. Real benchmark results (e.g. swe-bench-verified,
featurebench, dabstep) would be a natural next step for contributor testing.

Safety & timeout controls

| Control | Default | Purpose |
| --- | --- | --- |
| MAX_TURNS | 30 | Limits conversation turns |
| MAX_BUDGET_USD | 1.0 | Cost cap per task |
| TIMEOUT_SEC | 540 | Hard kill after 9 minutes |
| PERMISSION_MODE | bypassPermissions | Configurable permission level |

Security note: The CLI runs on the host (not sandboxed in Docker) with bypassPermissions.
Only run on trusted task sets.
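
For illustration, the TIMEOUT_SEC hard kill could be enforced roughly like this. The sketch follows the commit notes (exit code kept for checking, partial stdout preserved on `TimeoutExpired`, `asyncio.to_thread()` around `subprocess.run`), but `CliResult`, `run_cli_blocking`, and `run_cli` are hypothetical names, not the adapter's real ones.

```python
import asyncio
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence

TIMEOUT_SEC = 540  # default from the controls table


@dataclass
class CliResult:
    stdout: str
    returncode: Optional[int]  # None when the process was killed on timeout
    timed_out: bool


def _as_text(data) -> str:
    # TimeoutExpired may carry bytes or None even when text=True was requested.
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    return data


def run_cli_blocking(
    args: Sequence[str], cwd: Optional[str], timeout: float = TIMEOUT_SEC
) -> CliResult:
    try:
        proc = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; keep what it printed.
        return CliResult(_as_text(exc.stdout), None, True)
    return CliResult(proc.stdout, proc.returncode, False)


async def run_cli(
    args: Sequence[str], cwd: Optional[str], timeout: float = TIMEOUT_SEC
) -> CliResult:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(run_cli_blocking, args, cwd, timeout)
```

Note that `subprocess.run` only kills the direct child on timeout; any processes that child spawned would need separate handling.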

Meta-agent iteration surface

The editable section exposes:

  • SYSTEM_PROMPT — agent instructions
  • MODEL — sonnet/haiku/opus
  • MAX_TURNS — turn budget
  • MAX_BUDGET_USD — cost cap per task
  • TIMEOUT_SEC — hard timeout for CLI process
  • PERMISSION_MODE — CLI permission level
  • ALLOWED_TOOLS — restrict Claude's tool set
  • CLI_EXTRA_FLAGS — additional CLI flags
  • build_cli_args() — full CLI invocation strategy
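
As a rough picture of how these knobs might feed `build_cli_args()`, consider the sketch below. The mapping from constants to flags is an assumption on my part: `--print`, `--output-format stream-json`, `--verbose`, `--max-budget-usd`, and the positional prompt are named in this PR, while the use of `--model`, `--max-turns`, `--permission-mode`, `--append-system-prompt`, and `--allowedTools` here is a guess at the wiring, and the defaults for MODEL, SYSTEM_PROMPT, ALLOWED_TOOLS, and CLI_EXTRA_FLAGS are placeholders.

```python
from typing import List, Sequence

# Editable section (MAX_TURNS, MAX_BUDGET_USD, PERMISSION_MODE defaults from the PR).
SYSTEM_PROMPT = ""                      # placeholder: empty means "no extra prompt"
MODEL = "sonnet"                        # placeholder choice among sonnet/haiku/opus
MAX_TURNS = 30
MAX_BUDGET_USD = 1.0
PERMISSION_MODE = "bypassPermissions"
ALLOWED_TOOLS: Sequence[str] = ()       # placeholder: empty means "do not restrict"
CLI_EXTRA_FLAGS: Sequence[str] = ()


def build_cli_args(prompt: str) -> List[str]:
    args = [
        "claude",
        "--print",
        prompt,  # positional, not a --prompt flag; kept ahead of list-valued flags
        "--output-format", "stream-json",
        "--verbose",  # the PR notes that stream-json requires it
        "--model", MODEL,
        "--max-turns", str(MAX_TURNS),
        "--max-budget-usd", str(MAX_BUDGET_USD),
        "--permission-mode", PERMISSION_MODE,
    ]
    if SYSTEM_PROMPT:
        args += ["--append-system-prompt", SYSTEM_PROMPT]
    if ALLOWED_TOOLS:
        args += ["--allowedTools", ",".join(ALLOWED_TOOLS)]
    args += list(CLI_EXTRA_FLAGS)
    return args
```

Building an argument list instead of a shell string avoids quoting problems when the task instruction contains spaces or shell metacharacters.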

Test plan

  • Build Dockerfile and verify imports
  • Run single task locally with claude --print
  • Verify ATIF-v1.6 trajectory format
  • Full Harbor e2e: 4/4 baseline tasks pass with Docker pipeline
  • Run on real benchmark suite (contributor testing)
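
For the trajectory check, the tool pairing described in the commit notes could look something like this. It is a sketch under assumptions: the event shape (a `type` of `assistant` or `user`, with `message.content` blocks of type `text`, `tool_use`, and `tool_result`) reflects my reading of the CLI's stream-json output, `pair_tool_calls` is a hypothetical name, and the step dictionaries are a simplified intermediate form that does not attempt to reproduce the ATIF-v1.6 schema.

```python
import json
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger(__name__)


def pair_tool_calls(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Turn stream-json lines into ordered steps, pairing tool calls with results."""
    steps: List[Dict[str, Any]] = []
    pending: Dict[Any, Dict[str, Any]] = {}  # tool_use id -> step awaiting a result
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Log stray CLI output instead of dropping it silently.
            logger.warning("non-JSON CLI output: %s", raw[:200])
            continue
        message = event.get("message") if isinstance(event, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # e.g. system or result events without content blocks
        role = event.get("type")
        for block in content:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            if role == "assistant" and kind == "text":
                steps.append({"kind": "message", "text": block.get("text", "")})
            elif role == "assistant" and kind == "tool_use":
                step = {"kind": "tool_call", "name": block.get("name"),
                        "input": block.get("input"), "result": None}
                steps.append(step)
                pending[block.get("id")] = step
            elif role == "user" and kind == "tool_result":
                step = pending.pop(block.get("tool_use_id"), None)
                if step is not None:
                    step["result"] = block.get("content")
    return steps
```

Any tool call still in `pending` at the end keeps `result` set to `None`, which makes runs cut off by the timeout easy to spot.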

🤖 Generated with Claude Code

kevinrgu and others added 11 commits April 3, 2026 00:16
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a new agent variant that uses the Claude Code CLI (`claude --print`)
as the execution backend instead of direct SDK calls. This enables
running AutoAgent benchmarks with Claude Code's built-in tool suite.

Files added:
- agent-claude-code.py: Harbor adapter with editable/fixed boundary
- Dockerfile.claude-code: base image with Node.js + Claude Code CLI
- program-claude-code.md: meta-agent directive for this variant

The adapter now copies the host's ~/.claude credentials into the
container at runtime. No ANTHROPIC_API_KEY or .env file needed —
Claude Code CLI uses its own OAuth session from the host machine.

- Fixed CLI args: added --verbose (required for stream-json), prompt
  is positional not --prompt flag
- Rewrote ATIF parser to handle actual stream-json message structure:
  assistant messages with tool_use content blocks, user messages with
  tool_result blocks, pending tool pairing
- Tested locally: all 4 tasks produce valid ATIF-v1.6 trajectories

Major restructure: claude CLI now runs on the HOST machine (not inside
Docker) using the user's existing OAuth session. No API key needed.

- Agent syncs files from container to host temp dir before running
- Claude executes with full OAuth auth from host keychain
- Results synced back to container for verifier
- Simplified Dockerfile (no Node.js/Claude Code needed in image)
- Rewrites /task/ paths to temp dir paths for correct file placement
- Fixed download_file arg order for container-to-host sync

Tested: 4/4 tasks pass (hello-world, fibonacci, csv-analysis, git-log)
with Harbor e2e Docker pipeline. Mean score: 1.000.

Critical fixes:
- Replace manual file sync with Harbor's upload_dir/download_dir
  (fixes shell injection, silent file loss, and fragile find+loop)
- Check subprocess exit code and log errors
- Preserve partial stdout on TimeoutExpired

Important fixes:
- Use asyncio.to_thread() for subprocess.run to avoid blocking event loop
- Remove bare except Exception: pass on metrics (no longer needed)
- Log non-JSON lines from CLI output instead of silently dropping
- Add logging throughout via logger instead of print()

Doc fixes:
- Remove false claim about copying ~/.claude auth into container
- Add PERMISSION_MODE as configurable constant (was hardcoded)
- Add security warning about host-side execution
- Fix ATIF acronym expansion in docstring
- Remove unused workdir param from build_cli_args
- Remove unused output_dir creation

Tested: 4/4 tasks pass (Mean: 1.000) after all changes.

hculap commented Apr 3, 2026

@kevinrgu Hey! Would love your review on this. Adds a Claude Code CLI adapter that uses OAuth (no API key needed). 4/4 Harbor e2e tasks pass. Happy to run on a real benchmark suite if you point me to one.

- Add MAX_BUDGET_USD (default $1.00) via --max-budget-usd flag
- Add TIMEOUT_SEC (default 540s) as configurable constant
- Prevents meta-agent from getting stuck on hung tasks
@kevinrgu kevinrgu force-pushed the main branch 6 times, most recently from 2e68f3b to eb3f185 Compare April 3, 2026 21:11