feat: add Claude Code CLI adapter by hculap · Pull Request #1 · kevinrgu/autoagent

hculap · 2026-04-03T09:42:49Z

Summary

Adds a Claude Code CLI adapter for Harbor that uses the user's existing OAuth session — no API key needed.

- agent-claude-code.py: Harbor adapter that runs the claude CLI on the host machine
- Dockerfile.claude-code: lightweight base image (no Node.js/Claude Code in container)
- program-claude-code.md: meta-agent directive for iterating on this variant

Architecture

Unlike the SDK-based adapters, this variant shells out to the Claude Code CLI. The CLI runs on the host, in the same way that agent.py runs the OpenAI SDK host-side. The flow:

Host machine                          Docker container
┌──────────────────┐                  ┌───────────────────┐
│ Harbor calls     │                  │ Task environment  │
│ AutoAgent.run()  │ <─download_dir── │ /task/files/      │
│                  │                  │ /task/output/     │
│ claude CLI runs  │                  │                   │
│ (OAuth session)  │                  │ Verifier runs     │
│ writes to tmpdir │ ──upload_dir───> │ checks /task/     │
└──────────────────┘                  └───────────────────┘

1. Harbor spins up a container per task
2. Adapter downloads pre-existing files from container via download_dir
3. claude --print --output-format stream-json runs host-side in a temp dir
4. Claude reads instruction, writes output files using its own tools
5. Adapter syncs output files back via upload_dir
6. Harbor runs the verifier inside the container
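
The steps above can be sketched roughly as follows. This is an illustration, not the adapter's actual code: `TaskEnvironment`, `run_task`, the `run_cli` callback, and the argument order of `download_dir`/`upload_dir` are assumptions made for the example, so check them against Harbor's real environment interface.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Awaitable, Callable, Protocol


class TaskEnvironment(Protocol):
    """Hypothetical stand-in for Harbor's environment object (signatures assumed)."""

    async def download_dir(self, source_dir: str, target_dir: str) -> None: ...

    async def upload_dir(self, source_dir: str, target_dir: str) -> None: ...


async def run_task(
    env: TaskEnvironment,
    instruction: str,
    run_cli: Callable[[str, Path], Awaitable[str]],
    task_dir: str = "/task",
) -> str:
    """Sync container files to a host temp dir, run the CLI there, sync back."""
    with tempfile.TemporaryDirectory(prefix="autoagent-") as tmp:
        workdir = Path(tmp)
        # Step 2: copy pre-existing task files from the container to the host.
        await env.download_dir(task_dir, str(workdir))
        # Steps 3-4: the CLI runs host-side and writes its output into workdir.
        transcript = await run_cli(instruction, workdir)
        # Step 5: copy everything back so the verifier can see the outputs.
        await env.upload_dir(str(workdir), task_dir)
    return transcript
```

Passing `run_cli` in as a callback keeps the sync logic testable without a real container or a logged-in CLI.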

Setup

# 1. Install Claude Code CLI (if not already)
npm install -g @anthropic-ai/claude-code

# 2. Login (uses your existing Claude subscription, no API key)
claude login

# 3. Build the base image
docker build -f Dockerfile.claude-code -t autoagent-base .

# 4. Run tasks
uv run harbor run -p tasks/ --agent-import-path agent-claude-code:AutoAgent -o jobs

No .env file, no ANTHROPIC_API_KEY, no API billing — uses your Claude subscription via OAuth.

Test results

Full Docker + Harbor e2e pipeline with 4 baseline tasks (hello-world, fibonacci, csv-analysis, git-log):

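| Task | Score | Turns | Cost | Duration |
| --- | --- | --- | --- | --- |
| hello-world | 1.0 | 2 | $0.05 | 7s |
| fibonacci | 1.0 | 3 | $0.05 | 14s |
| csv-analysis | 1.0 | 3 | $0.05 | 13s |
| git-log | 1.0 | 2 | $0.04 | 15s |
| Total | 4/4 | | $0.19 | 36s |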

Note: These are baseline validation tasks to prove the adapter works end-to-end
through the full Harbor pipeline. Real benchmark results (e.g. swe-bench-verified,
featurebench, dabstep) would be a natural next step for contributor testing.

Safety & timeout controls

| Control | Default | Purpose |
| --- | --- | --- |
| `MAX_TURNS` | 30 | Limits conversation turns |
| `MAX_BUDGET_USD` | 1.0 | Cost cap per task |
| `TIMEOUT_SEC` | 540 | Hard kill after 9 minutes |
| `PERMISSION_MODE` | bypassPermissions | Configurable permission level |

Security note: The CLI runs on the host (not sandboxed in Docker) with bypassPermissions.
Only run on trusted task sets.

Meta-agent iteration surface

The editable section exposes:

- SYSTEM_PROMPT: agent instructions
- MODEL: sonnet/haiku/opus
- MAX_TURNS: turn budget
- MAX_BUDGET_USD: cost cap per task
- TIMEOUT_SEC: hard timeout for CLI process
- PERMISSION_MODE: CLI permission level
- ALLOWED_TOOLS: restrict Claude's tool set
- CLI_EXTRA_FLAGS: additional CLI flags
- build_cli_args(): full CLI invocation strategy
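
To make the editable surface concrete, here is a minimal sketch of what a `build_cli_args()` along these lines might look like. It is not the adapter's actual implementation. The PR itself only names `--print`, `--output-format stream-json`, `--verbose`, `--max-budget-usd`, and the positional prompt; the other flag spellings, the `MODEL` default, and the omission of `SYSTEM_PROMPT` are assumptions that should be checked against `claude --help`.

```python
from typing import List, Sequence

# Editable constants mirroring the names listed above.
MODEL = "sonnet"                      # assumed default; the PR lists sonnet/haiku/opus
MAX_TURNS = 30
MAX_BUDGET_USD = 1.0
PERMISSION_MODE = "bypassPermissions"
ALLOWED_TOOLS: Sequence[str] = ()     # empty means "do not restrict tools"
CLI_EXTRA_FLAGS: Sequence[str] = ()


def build_cli_args(prompt: str) -> List[str]:
    """Assemble the argv list for one non-interactive claude run."""
    args = [
        "claude",
        "--print",
        prompt,                       # positional prompt, not a --prompt flag
        "--output-format", "stream-json",
        "--verbose",                  # required for stream-json per the PR
        "--model", MODEL,
        "--max-turns", str(MAX_TURNS),
        "--max-budget-usd", str(MAX_BUDGET_USD),
        "--permission-mode", PERMISSION_MODE,
    ]
    if ALLOWED_TOOLS:
        # Flag spelling assumed; joins tool names into one comma-separated value.
        args += ["--allowedTools", ",".join(ALLOWED_TOOLS)]
    args += list(CLI_EXTRA_FLAGS)
    return args
```

The prompt is placed directly after `--print` as a precaution, so that an option accepting several values cannot absorb it. Returning a list rather than a shell string also means the prompt never passes through shell quoting.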

Test plan

- Build Dockerfile and verify imports
- Run single task locally with claude --print
- Verify ATIF-v1.6 trajectory format
- Full Harbor e2e: 4/4 baseline tasks pass with Docker pipeline
- Run on real benchmark suite (contributor testing)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add a new agent variant that uses the Claude Code CLI (`claude --print`) as the execution backend instead of direct SDK calls. This enables running AutoAgent benchmarks with Claude Code's built-in tool suite.

Files added:
- agent-claude-code.py: Harbor adapter with editable/fixed boundary
- Dockerfile.claude-code: base image with Node.js + Claude Code CLI
- program-claude-code.md: meta-agent directive for this variant

The adapter now copies the host's ~/.claude credentials into the container at runtime. No ANTHROPIC_API_KEY or .env file needed — Claude Code CLI uses its own OAuth session from the host machine.

- Fixed CLI args: added --verbose (required for stream-json), prompt is positional not --prompt flag
- Rewrote ATIF parser to handle actual stream-json message structure: assistant messages with tool_use content blocks, user messages with tool_result blocks, pending tool pairing
- Tested locally: all 4 tasks produce valid ATIF-v1.6 trajectories
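
The pairing of tool_use and tool_result blocks described in this commit could look roughly like the sketch below. It is a simplification under stated assumptions: the event shape (`type`, `message.content`, `id`, `tool_use_id`) follows the description above, and the output is a plain list of dicts invented for this example, not the ATIF-v1.6 schema. `parse_stream` is a hypothetical name.

```python
import json
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger("agent-claude-code")


def parse_stream(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Turn stream-json lines into simplified steps, pairing tool calls with results."""
    steps: List[Dict[str, Any]] = []
    pending: Dict[str, Dict[str, Any]] = {}  # tool_use id -> step awaiting its result
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Log stray CLI output instead of silently dropping it.
            logger.warning("non-JSON CLI output: %s", raw)
            continue
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        content = (event.get("message") or {}).get("content") or []
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        for block in content:
            if not isinstance(block, dict):
                continue
            btype = block.get("type")
            if kind == "assistant" and btype == "text":
                steps.append({"kind": "message", "text": block.get("text", "")})
            elif kind == "assistant" and btype == "tool_use":
                step = {
                    "kind": "tool_call",
                    "tool": block.get("name"),
                    "input": block.get("input"),
                    "result": None,  # filled in when the matching result arrives
                }
                pending[block.get("id")] = step
                steps.append(step)
            elif kind == "user" and btype == "tool_result":
                step = pending.pop(block.get("tool_use_id"), None)
                if step is not None:
                    step["result"] = block.get("content")
    return steps
```

Keeping unmatched calls in `pending` means a tool call whose result never arrives (for example after a timeout) still appears in the output with `result` left as `None`.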

Major restructure: claude CLI now runs on the HOST machine (not inside Docker) using the user's existing OAuth session. No API key needed.

- Agent syncs files from container to host temp dir before running
- Claude executes with full OAuth auth from host keychain
- Results synced back to container for verifier
- Simplified Dockerfile (no Node.js/Claude Code needed in image)
- Rewrites /task/ paths to temp dir paths for correct file placement
- Fixed download_file arg order for container-to-host sync

Tested: 4/4 tasks pass (hello-world, fibonacci, csv-analysis, git-log) with Harbor e2e Docker pipeline. Mean score: 1.000.
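
The /task/ path rewriting mentioned in this commit can be illustrated with a small helper. This is a guess at the idea, not the adapter's code: `rewrite_task_paths` and its regex are hypothetical, and the real adapter may handle more cases.

```python
import re
from pathlib import PurePosixPath

# Match "/task/" only where it starts a path, so "/data/task/x" is left alone.
_TASK_ROOT = re.compile(r"(?<![\w.-])/task/")


def rewrite_task_paths(instruction: str, workdir: PurePosixPath) -> str:
    """Point container-style /task/ paths in the instruction at the host temp dir."""
    local_root = str(workdir).rstrip("/") + "/"
    # A callable replacement avoids backslash-escape handling in the temp path.
    return _TASK_ROOT.sub(lambda _match: local_root, instruction)
```

With this, an instruction that tells Claude to write `/task/output/result.txt` ends up writing under the temp dir, which is the directory later synced back to the container.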

Critical fixes:
- Replace manual file sync with Harbor's upload_dir/download_dir (fixes shell injection, silent file loss, and fragile find+loop)
- Check subprocess exit code and log errors
- Preserve partial stdout on TimeoutExpired

Important fixes:
- Use asyncio.to_thread() for subprocess.run to avoid blocking event loop
- Remove bare except Exception: pass on metrics (no longer needed)
- Log non-JSON lines from CLI output instead of silently dropping
- Add logging throughout via logger instead of print()

Doc fixes:
- Remove false claim about copying ~/.claude auth into container
- Add PERMISSION_MODE as configurable constant (was hardcoded)
- Add security warning about host-side execution
- Fix ATIF acronym expansion in docstring
- Remove unused workdir param from build_cli_args
- Remove unused output_dir creation

Tested: 4/4 tasks pass (Mean: 1.000) after all changes.
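
Three of the fixes above (exit-code checking, keeping partial stdout on timeout, and moving the blocking call off the event loop) fit together in a pattern like the one below. This is a sketch using only the standard library; `run_cli` and `_run_blocking` are hypothetical names and the adapter's real code may differ.

```python
import asyncio
import logging
import subprocess
import sys
from typing import List, Optional, Tuple, Union

logger = logging.getLogger("agent-claude-code")

TIMEOUT_SEC = 540  # default from the controls table


def _as_text(data: Union[str, bytes, None]) -> str:
    """TimeoutExpired.stdout may be None, bytes, or str depending on platform."""
    if data is None:
        return ""
    return data.decode("utf-8", errors="replace") if isinstance(data, bytes) else data


def _run_blocking(
    args: List[str], cwd: Optional[str], timeout: float
) -> Tuple[str, Optional[int]]:
    try:
        proc = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; keep whatever it printed before the deadline.
        logger.error("CLI timed out after %ss; keeping partial output", timeout)
        return _as_text(exc.stdout), None
    if proc.returncode != 0:
        logger.error("CLI exited with %d: %s", proc.returncode, proc.stderr.strip())
    return proc.stdout, proc.returncode


async def run_cli(
    args: List[str], cwd: Optional[str] = None, timeout: float = TIMEOUT_SEC
) -> Tuple[str, Optional[int]]:
    """Run the blocking subprocess call in a worker thread."""
    return await asyncio.to_thread(_run_blocking, args, cwd, timeout)
```

A return code of `None` signals a timeout, so the caller can still parse the partial transcript while marking the run as incomplete.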

hculap · 2026-04-03T11:36:13Z

@kevinrgu Hey! Would love your review on this. Adds a Claude Code CLI adapter that uses OAuth (no API key needed). 4/4 Harbor e2e tasks pass. Happy to run on a real benchmark suite if you point me to one.

- Add MAX_BUDGET_USD (default $1.00) via --max-budget-usd flag
- Add TIMEOUT_SEC (default 540s) as configurable constant
- Prevents meta-agent from getting stuck on hung tasks

kevinrgu and others added 11 commits April 3, 2026 00:16

clean up README structure and add thirdlayer branding

95981d4

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

move thirdlayer branding to top of README

74c0bf3

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

resize progress graph to 600px width

467540d

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

add product signup quote under thirdlayer branding

2c9fcd6

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Built by thirdlayer.inc, capitalize AutoAgent, full-width graph

0602cbb

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

remove autoresearch link

b0b0a85

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

fix: use local Claude CLI auth instead of API key

1c5f7bc


feat: add timeout and budget controls

7a021f5


kevinrgu force-pushed the main branch 6 times, most recently from 2e68f3b to eb3f185 on April 3, 2026 21:11

hculap wants to merge 12 commits into kevinrgu:main from hculap:feat/claude-code-adapter
