The linter for your agent harness.
ESLint was for the code humans wrote.
AgentLint is for the context agents read.
🌐 Site · Blog · Install · Demo · Harness 101 · Checks · Evidence · FAQ · 中文
Agent = Model + Harness. The model isn't the bottleneck anymore — the harness is.
Your `AGENTS.md`, `CLAUDE.md`, CI config, hooks, and `.gitignore` are the harness. When they're wrong, Claude Code, Cursor, and Codex ship AI slop. When they're right, agents compound.

AgentLint scores your harness across 51 deterministic checks on 6 core dimensions, plus 7 opt-in extended checks (Deep + Session) that use AI sub-agents and local Claude Code session logs when available. Evidence-backed. Zero opinions.
📚 Full docs, 20+ long-form guides, and the complete check catalog live at agentlint.app. Highlights: Writing a Good CLAUDE.md · The 33-Check Catalog · AGENTS.md vs CLAUDE.md · 中文博客.
```bash
npm install -g agentlint-ai   # CLI only — no Claude plugin yet
npx agentlint-ai install      # opt-in: register /al Claude Code plugin
```

The first command installs the `agentlint` CLI on `$PATH` and does not touch `~/.claude/`. The second command (one-time, opt-in) detects Claude Code, copies the `/al` slash command into `~/.claude/commands/`, and registers the marketplace plugin. Side-effect details and uninstall path in INSTALL.md.
Then in any git repo:
```bash
agentlint check
```

In Claude Code (after running `npx agentlint-ai install`): run `/al` for the interactive scan-fix-report flow.
Using an AI coding agent? Point it at INSTALL.md — it's written to be read once and acted on.
```text
$ /al

AgentLint — Score: 72/100 (core)

Findability  ██████████████░░░░░░  7/10
Instructions ████████████████░░░░  8/10
Workability  ████████████░░░░░░░░  6/10
Safety       ██████████░░░░░░░░░░  5/10
Continuity   ██████████████░░░░░░  7/10
Harness      ████████████████████ 10/10
Deep         ░░░░░░░░░░░░░░░░░░░░  n/a (opt-in)
Session      ░░░░░░░░░░░░░░░░░░░░  n/a (opt-in)

Fix Plan (7 items):
  [guided]   Pin 8 GitHub Actions to SHA (supply chain risk)
  [guided]   Add .env to .gitignore (AI exposes secrets)
  [assisted] Generate HANDOFF.md
  [guided]   Reduce IMPORTANT keywords (7 found, Anthropic uses 4)

Select items → AgentLint fixes → re-scores → saves HTML report
```
In February 2026, Mitchell Hashimoto (HashiCorp) coined the term. OpenAI's Ryan Lopopolo formalized it days later. LangChain's Vivek Trivedy gave it the cleanest definition:
Agent = Model + Harness. If you're not the model, you're the harness.
The harness is every piece of code, configuration, and instruction that wraps an LLM and turns it into an agent. For coding agents, your harness includes:
- `AGENTS.md` / `CLAUDE.md` — the persistent rules injected at session start
- `.cursor/rules/`, `.github/copilot-instructions.md` — tool-specific instruction layers
- CI, pre-commit hooks, `.gitignore` — the deterministic constraints the agent can't override
- SECURITY.md, changelogs, handoff notes — the context that survives across sessions
Harness engineering is the discipline of designing those pieces so the agent stays reliable across hundreds of tool calls, not just the first ten.
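One possible shape of that harness inside a repo (a sketch; file names are illustrative, and only the entry file is strictly required):

```text
repo/
├── CLAUDE.md                     # entry file: persistent rules injected at session start
├── AGENTS.md                     # cross-tool equivalent of the entry file
├── .cursor/rules/                # Cursor-specific instruction layer
├── .github/
│   ├── workflows/ci.yml          # deterministic enforcement the agent can't override
│   └── copilot-instructions.md   # Copilot-specific instruction layer
├── .gitignore                    # keeps .env and secrets out of the agent's reach
└── HANDOFF.md                    # context that survives across sessions
```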
The research is blunt:
- Anthropic's 2026 Agentic Coding Trends Report found that teams maintaining a good context file report 40% fewer "bad suggestion" sessions
- DORA 2025 State of AI-Assisted Software Development reached the same conclusion: AI is an amplifier — it accelerates teams with good harnesses and amplifies dysfunction in teams without them
- An ETH Zurich study found that auto-generated context files actually reduce agent success rates in 5 of 8 tested settings, and increase inference cost by 20–23%
- A randomized controlled trial found developers using AI were 19% slower on complex tasks — while believing they were 20% faster
- LangChain's February 2026 report: 70% of agent performance lives outside the model. Same weights, different harness, different results.
Translation: a bad harness is worse than no harness. And almost nobody knows what a good one looks like.
AgentLint is the first linter for the harness itself.
Every check is backed by data, not opinions. The data comes from places most developers never look — and it's what lets us measure harness health rigorously:
- 265 versions of Anthropic's own Claude Code system prompt — we tracked every single word they added, deleted, and rewrote. When they cut `IMPORTANT` from 12 uses to 4, we knew. When they removed every "You are a helpful assistant..." identity section, we knew.
- Claude Code source code — which is where the harness hard limits live. 40,000-character entry files get silently truncated. 256 KB files can't be read at all. Pre-commit hooks that take too long cause commits to hang forever because Claude Code never uses `--no-verify`.
- Real production audits across open-source codebases — the security gaps that agents walk straight into.
- 6 academic papers on instruction compliance, context-file effectiveness, and documentation decay.
If a check can't cite a source, it doesn't ship.
58 checks total: 51 deterministic core checks across 6 dimensions (always run), plus 7 opt-in extended checks (Deep: 3 AI-powered analysis checks; Session: 4 Claude Code log-reading checks). Default agentlint check and the GitHub Action only run the 51 core checks — the extended ones need AI sub-agents or local Claude Code session logs, so they're opt-in via /al inside Claude Code.
The total score is averaged only over dimensions that actually ran. A default CI run shows Score: NN/100 (core) and marks Deep/Session as n/a, never as 0/10. When extended checks do run, the header shows (core+extended).
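Assuming a plain mean over the dimensions that ran (which matches the demo output above): six core dimensions scored (7 + 8 + 6 + 5 + 7 + 10) / 60 = 71.7%, reported as 72/100 (core); Deep and Session appear in neither the numerator nor the denominator.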
| Check | What | Why |
|---|---|---|
| F1 | Entry file exists | No CLAUDE.md / AGENTS.md = AI starts blind |
| F2 | Project description in first 10 lines | AI needs context before rules |
| F3 | Conditional loading guidance | "If working on X, read Y" prevents context bloat |
| F4 | Large directories have INDEX | >10 files without index = AI reads everything |
| F5 | All references resolve | Broken links waste tokens on dead-end reads |
| F6 | Standard file naming | README.md, CLAUDE.md are auto-discovered |
| F7 | `@include` directives resolve | Missing targets are silently ignored — you think it's loaded, it isn't |
| F8 | Rule file frontmatter uses globs | .cursor/rules/ MDC files should match glob patterns, not exact paths |
| F9 | No unfilled template placeholders | {{variables}} left in context files waste tokens and confuse the model |
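For instance, F3 and F7 reward entry files that route the agent to deeper docs conditionally, via references that actually resolve. A minimal sketch using Claude Code's @-import syntax (paths are illustrative):

```markdown
## Working on the API?
Read docs/api/INDEX.md before touching src/api/.

## Shared conventions
@docs/conventions.md
```

If `docs/conventions.md` doesn't exist, Claude Code skips the import without complaint; that silent failure is exactly what F7 flags.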
| Check | What | Why |
|---|---|---|
| I1 | Emphasis keyword count | Anthropic cut IMPORTANT from 12 to 4 across 265 versions |
| I2 | Keyword density | More emphasis = less compliance. Anthropic: 7.5 → 1.4 per 1K words |
| I3 | Rule specificity | "Don't X. Instead Y. Because Z." — Anthropic's golden formula |
| I4 | Action-oriented headings | Anthropic deleted all "You are a..." identity sections |
| I5 | No identity language | "Follow conventions" removed — model already does this |
| I6 | Entry file length | 60–120 lines is the sweet spot. Longer dilutes priority |
| I7 | Under 40,000 characters | Claude Code hard limit. Above this, your file is truncated — silently |
| I8 | Total injected content within budget | All auto-injected files stay within the 200K context budget |
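To make I3 concrete, here's an invented rule rewritten into the "Don't X. Instead Y. Because Z." shape:

```markdown
<!-- Vague: gives the model nothing to evaluate against -->
Be careful with database migrations.

<!-- Specific: passes I3 -->
Don't edit an already-applied migration. Instead, add a new migration
file. Because edited migrations break checksum validation on every
other machine.
```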
| Check | What | Why |
|---|---|---|
| W1 | Build/test commands documented | AI can't guess your test runner |
| W2 | CI exists | Rules without enforcement are suggestions |
| W3 | Tests exist (not empty shell) | A CI that runs pytest with 0 test files always "passes" |
| W4 | Linter configured | Mechanical formatting frees AI from guessing style |
| W5 | No files over 256 KB | Claude Code cannot read them — hard error |
| W6 | Pre-commit hooks are fast | Claude Code never uses --no-verify. Slow hooks = stuck commits |
| W7 | Local fast test command documented | Entry file documents a fast (<30s) test command for mid-session verification |
| W8 | npm test script exists | JS/Node repos need npm test so AI can run tests without guessing |
| W9 | Release workflow validates version consistency | Automated drift detection across package.json, CHANGELOG, and badges |
| W10 | Test cost tiers defined (pytest markers) | @pytest.mark.fast lets AI run the cheap subset, not the full 10-minute suite |
| W11 | feat/fix commits paired with test commits | Gate that catches features landing without corresponding tests |
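W7 and W10 pair naturally in a Python repo: register a cheap pytest marker, then document the fast command in the entry file. A sketch (marker name and threshold are illustrative):

```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
    "fast: unit tests that run in under 1s, no network or database",
]
```

The entry file can then say: run `pytest -m fast` for mid-session verification (<30s); the full suite stays in CI.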
| Check | What | Why |
|---|---|---|
| C1 | Document freshness | Stale instructions are worse than no instructions |
| C2 | Handoff file exists | Without it, every session starts from zero |
| C3 | Changelog has "why" | "Updated INDEX" says nothing. "Fixed broken path" says everything |
| C4 | Plans in repo | Plans in Jira don't exist for AI |
| C5 | CLAUDE.local.md not in git | Private per-user file — must be in `.gitignore` |
| C6 | HANDOFF.md has verify conditions | Notes with evidence (score ≥ X, tests pass) let the next session skip full re-audit |
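A hypothetical HANDOFF.md that would satisfy C2 and C6, because the next session can verify the claims instead of re-auditing:

```markdown
# HANDOFF (2026-03-01)

Done: migrated auth middleware to the new token format.
Verify: `npm test` passes; `agentlint check` scores >= 80.
Next: delete the legacy token path once staging is clean.
```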
| Check | What | Why |
|---|---|---|
| S1 | `.env` in `.gitignore` | AI's Glob tool ignores `.gitignore` by default — secrets visible |
| S2 | Actions SHA pinned | AI push triggers CI. Floating tags = supply chain attack vector |
| S3 | Secret scanning configured | AI won't self-check for accidentally written API keys |
| S4 | SECURITY.md exists | AI needs security context for sensitive code decisions |
| S5 | Workflow permissions minimized | AI-triggered workflows shouldn't have write access by default |
| S6 | No hardcoded secrets | Detects sk-, ghp_, AKIA, private key patterns in source |
| S7 | No personal paths in source | Absolute home-dir paths leak machine identity and break on other machines |
| S8 | No `pull_request_target` trigger | Runs in privileged context — supply chain attack vector for external PRs |
| S9 | No personal email in git history | Personal email in commits is a privacy and identity leak |
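S2 and S5 in workflow terms: pin third-party actions to a full commit SHA (keeping the human-readable tag as a comment) and start from read-only permissions. A fragment of a hypothetical `.github/workflows/ci.yml`:

```yaml
permissions:
  contents: read   # S5: least privilege by default

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # S2: a full commit SHA, not a floating tag like @v4
      - uses: actions/checkout@<full-40-char-commit-sha>  # v4
      - run: npm ci && npm test
```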
| Check | What | Why |
|---|---|---|
| H1 | Hook event names valid | PoToolUse vs PostToolUse — typos silently prevent hooks from ever firing |
| H2 | PreToolUse hooks have matcher | Without a tool matcher, the hook runs before every tool call |
| H3 | Stop hook has circuit breaker | Stop hooks without an exit condition run forever |
| H4 | No dangerous auto-approve | * or .* grant unlimited tool execution with no human check |
| H5 | Env deny coverage complete | Missing deny patterns let secrets leak to untrusted tools |
| H6 | Hook scripts network access | Outbound calls from hooks can exfiltrate data triggered by the agent |
| H7 | Gate workflows are blocking | Warn-only CI gates are effectively disabled — agents merge despite failures |
| H8 | Hook errors use structured format | what/rule/fix lets the agent self-correct; unstructured errors leave it stuck |
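What H1 and H2 look for in Claude Code's `settings.json` (the guard script path is illustrative):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/guard-bash.sh" }
        ]
      }
    ]
  }
}
```

H1 validates the event name (`PreToolUse`, not `PoToolUse`); H2 checks that `matcher` is present so the hook doesn't fire before every tool call.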
Spawns AI subagents to find what pattern-matching can't:
| Check | What | Why |
|---|---|---|
| D1 | Contradictory rules | Two rules that conflict cause the model to pick one — usually the wrong one |
| D2 | Dead-weight rules | Rules the model would follow anyway waste tokens and dilute priority |
| D3 | Vague rules without decision boundary | "Use good judgment" gives the model nothing to evaluate against |
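A D1 example (rules invented for illustration) — two instructions that each look fine alone but can't both be followed:

```markdown
- Always write a failing test before any implementation change.
- Prototype features may skip tests to keep iteration fast.
```

Which one applies when a bug fix touches a prototype? The model picks one; D1 surfaces the conflict before it does.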
Reads your session history to surface patterns you'd never notice manually:
| Check | What | Why |
|---|---|---|
| SS1 | Repeated instructions | Instructions you type every session belong in CLAUDE.md |
| SS2 | Ignored rules | Rules AI keeps bypassing need rewriting, not repeating |
| SS3 | Friction hotspots | Which projects and tasks generate the most re-work |
| SS4 | Missing rule suggestions | Common corrections that aren't captured anywhere yet |
/init generates a template CLAUDE.md from scratch. Useful on day one. Useless on day fifty — when the file is stale, bloated with emphasis keywords the model ignores, missing `.env` in `.gitignore`, and silently past the 40K hard limit.
/init writes a file. AgentLint audits the whole system:
| | /init | AgentLint |
|---|---|---|
| Generates template CLAUDE.md | ✅ | — |
| Checks entry-file quality | — | ✅ |
| Finds broken `@include` references | — | ✅ |
| Enforces the 40K character hard limit | — | ✅ |
| Audits CI, hooks, `.gitignore`, Actions SHA pinning | — | ✅ |
| Detects instruction rot over time | — | ✅ |
| Audits Claude Code hook configuration | — | ✅ |
| Auto-fixes what it can | — | ✅ |
| Every check backed by a cited data source | — | ✅ |
- Solo developers using Claude Code, Cursor, or Codex who want the agent to stop ignoring your rules
- Team leads who need every repo in the org to be AI-ready before agents ship to prod
- OSS maintainers whose external contributors (and their agents) should write code in your style
- Security-conscious engineers worried about agents exfiltrating `.env` or triggering vulnerable workflows
AgentLint ships as a Claude Code plugin and standalone CLI. When it runs, it audits any of the following if present in your repo:
- `CLAUDE.md` (Anthropic's Claude Code)
- `AGENTS.md` (the universal standard — used by OpenAI Codex, Cursor, Windsurf, Kilo, GitHub Copilot, Gemini CLI, and 60,000+ open-source repos)
- `.cursor/rules/`
- `.github/copilot-instructions.md`
Roadmap: native Cursor and Codex integrations. Star the repo to follow.
```bash
npm install -g agentlint-ai
```

Or update the Claude Code plugin directly:

```bash
claude plugin update agent-lint@agent-lint
```

Every check cites its source. No opinions, no best practices — data.
| Source | Type |
|---|---|
| Anthropic 265 prompt versions | Primary dataset |
| Claude Code source code | Hard limits and internal behavior |
| IFScale (NeurIPS) | Instruction compliance at scale |
| ETH Zurich | Do context files help coding agents? |
| Codified Context | Stale content as #1 failure mode |
| Agent READMEs | Concrete vs abstract effectiveness |
Full citations in standards/evidence.json.
What exactly is an "agent harness"?
The term got popular in early 2026 (Mitchell Hashimoto, OpenAI, LangChain). Shortest definition: Agent = Model + Harness. The harness is everything that wraps an LLM and turns it into an agent — tools, state management, feedback loops, and the persistent rules it reads at session start. For coding agents, that last part is your AGENTS.md, CLAUDE.md, .cursor/rules, CI, pre-commit hooks, and .gitignore. AgentLint is the first linter built specifically to audit that layer.
Why not just use /init and call it a day?
See the table above. /init writes a file; it doesn't audit your repo. AgentLint does 51 deterministic checks across 6 core dimensions (plus 7 opt-in extended checks) — and fixes what it finds.
Does this work with Cursor, Codex, or GitHub Copilot?
Today AgentLint runs inside Claude Code, but the checks apply to repo assets every agent reads: AGENTS.md, .cursor/rules, .github/copilot-instructions.md. A well-linted repo makes every agent better, not just Claude. Native Cursor and Codex integrations are on the roadmap.
Is my code sent anywhere?
It depends on which mode you run. The default (agentlint check and the GitHub Action) is local-only and runs zero AI. The two opt-in extended modes do touch AI or local session logs — we spell it out so there's no surprise:
| Mode | Data accessed | Network / AI |
|---|---|---|
| `agentlint check` (default) | files in the repo being scanned | Local only, no AI |
| GitHub Action | files in the checked-out repo inside the runner | Local only, no AI |
| `/al` (core dims only) | git repos under the configured `PROJECTS_ROOT` | Local only, no AI |
| `/al` with Deep (opt-in) | selected entry files (e.g. `CLAUDE.md`) | Sends file contents to a Claude sub-agent |
| `/al` with Session (opt-in) | `~/.claude/projects/` logs on your machine | Local analyzer. Output is redacted by default; raw snippets require `--include-raw-snippets` |
Deep is the only mode that transmits file contents off your machine, and it only runs when you explicitly ask for it inside Claude Code. Everything the default scan produces — the Score: NN/100 (core) output, the JSONL, the SARIF, the GitHub Action annotations — comes from pattern checks on disk, no API calls.
Does npm install write outside node_modules?
No. npm install -g agentlint-ai only installs the agentlint CLI to npm's global prefix (just like any other CLI tool). The Claude Code plugin install is opt-in: run npx agentlint-ai install (one-time) to detect Claude Code and register the /al slash command in ~/.claude/commands/. The CLI works without that step; the /al slash command does not.
Failure-mode fallbacks live in INSTALL.md.
Isn't this just "best practices"?
No. Every check cites a specific source — Anthropic's 265 prompt versions, Claude Code source code, peer-reviewed papers, or real production audits. If a check can't be backed by data, it doesn't ship.
Why do you lint AGENTS.md if this is a Claude Code plugin?
Because good context engineering is cross-tool. If you're using any combination of Claude Code, Cursor, and Codex, the same AGENTS.md serves all of them. AgentLint checks it against the same evidence base regardless of which agent ends up reading it.
How long does a scan take?
Under 5 seconds for most repos. The Deep and Session dimensions take longer because they spawn subagents or read session logs.
- Node 20+
- `jq`
- Claude Code (for the `/al` plugin and Deep/Session analysis)
Issues and PRs welcome. See CONTRIBUTING.md.
If AgentLint saved you from one bad agent session, please ⭐ star the repo — it's how we find out it's useful.
Built by @0xmariowu · agentlint.app