kevinrgu · hculap · Apr 3, 2026
diff --git a/Dockerfile.claude-code b/Dockerfile.claude-code
@@ -0,0 +1,17 @@
+FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends ca-certificates git && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# Python deps — only what the agent needs (harbor excluded via .dockerignore)
+COPY pyproject.toml ./
+RUN uv pip install --system .
+
+# Agent code
+COPY agent-claude-code.py ./
+
+RUN ln -sf "$(which python3)" /usr/local/bin/python
+RUN mkdir -p /logs /app/output /task/output
diff --git a/README.md b/README.md
@@ -1,26 +1,37 @@
-# autoagent
+<p align="center">
+  <a href="https://www.thirdlayer.inc">
+    <img src="https://www.thirdlayer.inc/thirdlayer-logo.svg" alt="thirdlayer" width="200">
+  </a>
+</p>
+<p align="center">
+  Built by <a href="https://www.thirdlayer.inc">thirdlayer.inc</a>
+</p>
 
-![teaser](progress.png)
+> We're launching a product around self-configuring agents soon. [Sign up here](https://form.typeform.com/to/ZQbnbO09).
+
+# AutoAgent
 
-Like [autoresearch](https://github.com/karpathy/autoresearch) but for agent engineering. Give an AI agent a task, let it build and iterate on an agent harness autonomously overnight. It modifies the system prompt, tools, agent configuration, and orchestration, runs the benchmark, checks the score, keeps or discards the change, and repeats.
+> Like autoresearch but for agent engineering. Give an AI agent a task, let it build and iterate on an agent harness autonomously overnight. It modifies the system prompt, tools, agent configuration, and orchestration, runs the benchmark, checks the score, keeps or discards the change, and repeats.
 
-The core idea is the same: you're not touching the harness Python files like you normally would as an engineer. Instead, you program `program.md` — the Markdown file that provides context to the meta-agent and defines the agent-engineering loop.
+![teaser](progress.png)
+
+The core idea is the same: you're not touching the harness Python files like you normally would as an engineer. Instead, you program `program.md`, the Markdown file that provides context to the meta-agent and defines the agent-engineering loop.
 
 ## How it works
 
 The repo has a few files and directories that matter:
 
-- **`agent.py`** — the entire harness under test in a single file. It contains
+- **`agent.py`** -- the entire harness under test in a single file. It contains
   config, tool definitions, agent registry, routing/orchestration, and the
   Harbor adapter boundary. The adapter section is explicitly marked as fixed;
   the rest is the primary edit surface for the meta-agent.
-- **`program.md`** — instructions for the meta-agent + the directive (what
+- **`program.md`** -- instructions for the meta-agent + the directive (what
   kind of agent to build). **This file is edited by the human**.
-- **`tasks/`** — evaluation tasks in
+- **`tasks/`** -- evaluation tasks in
   [harbor](https://github.com/laude-institute/harbor) format. In a clean
   baseline branch, benchmark payloads may be omitted and added in
   benchmark-specific branches.
-- **`.agent/`** — optional workspace artifacts for reusable instructions,
+- **`.agent/`** -- optional workspace artifacts for reusable instructions,
   notes, prompts, or skills.
 
 The metric is total **score** produced by the benchmark's task test suites. The
@@ -70,16 +81,16 @@ benchmark, diagnose failures, modify `agent.py`, and iterate.
 ## Project structure
 
 ```text
-agent.py                       — single-file harness under test
-  editable harness section     — prompt, registries, tools, routing
-  fixed adapter section        — Harbor integration + trajectory serialization
-program.md                     — meta-agent instructions + directive
-Dockerfile.base                — base image
-.agent/                        — optional agent workspace artifacts
-tasks/                         — benchmark tasks, typically added in benchmark-specific branches
-jobs/                          — Harbor job outputs
-results.tsv                    — experiment log (created by meta-agent, gitignored)
-run.log                        — latest run output
+agent.py                       -- single-file harness under test
+  editable harness section     -- prompt, registries, tools, routing
+  fixed adapter section        -- Harbor integration + trajectory serialization
+program.md                     -- meta-agent instructions + directive
+Dockerfile.base                -- base image
+.agent/                        -- optional agent workspace artifacts
+tasks/                         -- benchmark tasks, typically added in benchmark-specific branches
+jobs/                          -- Harbor job outputs
+results.tsv                    -- experiment log (created by meta-agent, gitignored)
+run.log                        -- latest run output
 ```
 
 ## Task format
@@ -88,17 +99,17 @@ When present, tasks follow [harbor](https://github.com/laude-institute/harbor)'s
 
 ```text
 tasks/my-task/
-  task.toml           — config (timeouts, metadata)
-  instruction.md      — prompt sent to the agent
+  task.toml           -- config (timeouts, metadata)
+  instruction.md      -- prompt sent to the agent
   tests/
-    test.sh           — entry point, writes /logs/reward.txt
-    test.py           — verification (deterministic or LLM-as-judge)
+    test.sh           -- entry point, writes /logs/reward.txt
+    test.py           -- verification (deterministic or LLM-as-judge)
   environment/
-    Dockerfile        — task container (FROM autoagent-base)
-  files/              — reference files mounted into container
+    Dockerfile        -- task container (FROM autoagent-base)
+  files/              -- reference files mounted into container
 ```
 
-Tests write a score (0.0–1.0) to the verifier logs. The meta-agent hill-climbs
+Tests write a score (0.0-1.0) to the verifier logs. The meta-agent hill-climbs
 on this.
 
 ## Design choices
@@ -108,8 +119,7 @@ on this.
 - **Single-file, registry-driven harness.** The implementation lives in one
   file for simplicity, but agent and tool registration stay structured so the
   harness can still evolve cleanly.
-- **Docker isolation.** The agent-under-test runs in a container. It can't
-  damage the host.
+- **Docker isolation.** The agent runs in a container. It can't damage the host.
 - **Score-driven.** Every experiment produces a numeric score. Keep if better,
   discard if not. Same loop as autoresearch.
 - **Harbor-compatible tasks.** Tasks use the same format as harbor benchmarks,
@@ -130,7 +140,7 @@ docker system prune -a -f
 docker container prune -f
 ```
 
-If Docker becomes unresponsive (e.g. after many concurrent runs), restart
+If Docker becomes unresponsive (for example, after many concurrent runs), restart
 Docker Desktop:
 
 ```bash
@@ -144,3 +154,4 @@ You can equip the agent with [Agent Skills for Context Engineering](https://gith
 ## License
 
 MIT
+
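
The README's task format section says `tests/test.sh` writes `/logs/reward.txt` with a score between 0.0 and 1.0, and the design choices say to keep a change only if the score improves. A minimal Python sketch of both steps follows, under stated assumptions: the helper names `write_reward` and `keep_change` are hypothetical, and the plain-text float format of the reward file is an assumption, since the diff does not show what Harbor expects.

```python
from pathlib import Path


def write_reward(score: float, reward_path: Path) -> float:
    """Clamp a score to [0.0, 1.0] and write it to the reward file.

    Assumption: the verifier reads a single plain-text float.
    """
    clamped = min(1.0, max(0.0, float(score)))
    # Create the log directory if the container image did not.
    reward_path.parent.mkdir(parents=True, exist_ok=True)
    reward_path.write_text(f"{clamped}\n")
    return clamped


def keep_change(best_score: float, new_score: float) -> bool:
    """Keep an experiment only if it strictly improves on the best score."""
    return new_score > best_score
```

In a task's `test.py`, this might be called as `write_reward(score, Path("/logs/reward.txt"))`. Ties are discarded here because the README says "keep if better"; whether the meta-agent treats equal scores differently is not stated in the diff.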