AgentLint

The linter for your agent harness.

ESLint was for the code humans wrote.
AgentLint is for the context agents read.


🌐 Site · Blog · Install · Demo · Harness 101 · Checks · Evidence · FAQ · 中文


Agent = Model + Harness. The model isn't the bottleneck anymore — the harness is.

Your AGENTS.md, CLAUDE.md, CI config, hooks, and .gitignore are the harness. When they're wrong, Claude Code, Cursor, and Codex ship AI slop. When they're right, agents compound.

AgentLint scores your harness across 51 deterministic checks on 6 core dimensions, plus 7 opt-in extended checks (Deep + Session) that use AI sub-agents and local Claude Code session logs when available. Evidence-backed. Zero opinions.

📚 Full docs, 20+ long-form guides, and the complete check catalog live at agentlint.app. Highlights: Writing a Good CLAUDE.md · The 33-Check Catalog · AGENTS.md vs CLAUDE.md · 中文博客.

Install

npm install -g agentlint-ai           # CLI only — no Claude plugin yet
npx agentlint-ai install              # opt-in: register /al Claude Code plugin

The first command installs the agentlint CLI on $PATH and does not touch ~/.claude/. The second command (one-time, opt-in) detects Claude Code, copies the /al slash command into ~/.claude/commands/, and registers the marketplace plugin. Side-effect details and uninstall path in INSTALL.md.

Then in any git repo:

agentlint check

In Claude Code (after running npx agentlint-ai install): run /al for the interactive scan-fix-report flow.

Using an AI coding agent? Point it at INSTALL.md — it's written to be read once and acted on.

What you get

$ /al

AgentLint — Score: 72/100 (core)

Findability      ██████████████░░░░░░  7/10
Instructions     ████████████████░░░░  8/10
Workability      ████████████░░░░░░░░  6/10
Safety           ██████████░░░░░░░░░░  5/10
Continuity       ██████████████░░░░░░  7/10
Harness          ████████████████████  10/10
Deep             ░░░░░░░░░░░░░░░░░░░░  n/a   (opt-in)
Session          ░░░░░░░░░░░░░░░░░░░░  n/a   (opt-in)

Fix Plan (7 items):
  [guided]   Pin 8 GitHub Actions to SHA (supply chain risk)
  [guided]   Add .env to .gitignore (AI exposes secrets)
  [assisted] Generate HANDOFF.md
  [guided]   Reduce IMPORTANT keywords (7 found, Anthropic uses 4)

Select items → AgentLint fixes → re-scores → saves HTML report

The harness problem

In February 2026, Mitchell Hashimoto (HashiCorp) coined the term. OpenAI's Ryan Lopopolo formalized it days later. LangChain's Vivek Trivedy gave it the cleanest definition:

Agent = Model + Harness. If you're not the model, you're the harness.

The harness is every piece of code, configuration, and instruction that wraps an LLM and turns it into an agent. For coding agents, your harness includes:

  • AGENTS.md / CLAUDE.md — the persistent rules injected at session start
  • .cursor/rules/, .github/copilot-instructions.md — tool-specific instruction layers
  • CI, pre-commit hooks, .gitignore — the deterministic constraints the agent can't override
  • SECURITY.md, changelogs, handoff notes — the context that survives across sessions

Harness engineering is the discipline of designing those pieces so the agent stays reliable across hundreds of tool calls, not just the first ten.

The research is blunt:

  • Anthropic's 2026 Agentic Coding Trends Report found that teams maintaining a good context file report 40% fewer "bad suggestion" sessions
  • DORA 2025 State of AI-Assisted Software Development reached the same conclusion: AI is an amplifier — it accelerates teams with good harnesses and amplifies dysfunction in teams without them
  • An ETH Zurich study found that auto-generated context files actually reduce agent success rates in 5 of 8 tested settings, and increase inference cost by 20–23%
  • A randomized controlled trial found developers using AI were 19% slower on complex tasks — while believing they were 20% faster
  • LangChain's February 2026 report: 70% of agent performance lives outside the model. Same weights, different harness, different results.

Translation: a bad harness is worse than no harness. And almost nobody knows what a good one looks like.

AgentLint is the first linter for the harness itself.

What makes AgentLint different

Every check is backed by data, not opinions. The data comes from places most developers never look — and it's what lets us measure harness health rigorously:

  • 265 versions of Anthropic's own Claude Code system prompt — we tracked every single word they added, deleted, and rewrote. When they cut IMPORTANT from 12 uses to 4, we knew. When they removed every "You are a helpful assistant..." identity section, we knew.
  • Claude Code source code — where the harness hard limits live. Entry files over 40,000 characters get silently truncated. Files over 256 KB can't be read at all. Pre-commit hooks that take too long cause commits to hang forever, because Claude Code never uses --no-verify.
  • Real production audits across open-source codebases — the security gaps that agents walk straight into.
  • 6 academic papers on instruction compliance, context-file effectiveness, and documentation decay.

If a check can't cite a source, it doesn't ship.

What it checks

58 checks total: 51 deterministic core checks across 6 dimensions (always run), plus 7 opt-in extended checks (Deep: 3 AI-powered analysis checks; Session: 4 Claude Code log-reading checks). Default agentlint check and the GitHub Action only run the 51 core checks — the extended ones need AI sub-agents or local Claude Code session logs, so they're opt-in via /al inside Claude Code.

The total score is averaged only over dimensions that actually ran. A default CI run shows Score: NN/100 (core) and marks Deep/Session as n/a, never as 0/10. When extended checks do run, the header shows (core+extended).
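
A minimal sketch of that averaging rule, for readers who want the mechanics. The type shape, function name, and the choice to re-normalize the documented dimension weights over whatever ran are illustrative assumptions, not AgentLint's actual implementation:

```ts
// Sketch: dimensions that didn't run (n/a) are excluded, and the documented
// weights are re-normalized over the dimensions that did run, so an opt-out
// never drags the score toward 0/10. Names and shapes are hypothetical.
type DimensionResult = { name: string; weight: number; score: number | null }; // score is 0–10, null = n/a

function overallScore(dims: DimensionResult[]): string {
  const ran = dims.filter((d) => d.score !== null);
  const totalWeight = ran.reduce((sum, d) => sum + d.weight, 0);
  const normalized = ran.reduce(
    (sum, d) => sum + (d.score! / 10) * (d.weight / totalWeight),
    0,
  );
  const extended = ran.some((d) => d.name === "Deep" || d.name === "Session");
  return `Score: ${Math.round(normalized * 100)}/100 (${extended ? "core+extended" : "core"})`;
}
```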

🔍 Findability — can AI find what it needs? (20%)

| Check | What | Why |
|---|---|---|
| F1 | Entry file exists | No CLAUDE.md / AGENTS.md = AI starts blind |
| F2 | Project description in first 10 lines | AI needs context before rules |
| F3 | Conditional loading guidance | "If working on X, read Y" prevents context bloat |
| F4 | Large directories have INDEX | >10 files without an index = AI reads everything |
| F5 | All references resolve | Broken links waste tokens on dead-end reads |
| F6 | Standard file naming | README.md, CLAUDE.md are auto-discovered |
| F7 | @include directives resolve | Missing targets are silently ignored — you think it's loaded, it isn't (sketch below) |
| F8 | Rule file frontmatter uses globs | .cursor/rules/ MDC files should match glob patterns, not exact paths |
| F9 | No unfilled template placeholders | {{variables}} left in context files waste tokens and confuse the model |
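
A rough sketch of what the reference checks (F5/F7) reduce to: scan the entry file for @-style include directives and verify the targets exist on disk. The regex, the repo-root assumption, and the function name are simplifications for illustration, not AgentLint's real resolver:

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// Naive F7-style check: collect `@path/to/file` directives from an entry file
// and report any whose target doesn't exist. Regex and semantics are simplified.
function findBrokenIncludes(entryFile: string): string[] {
  const text = fs.readFileSync(entryFile, "utf8");
  const refs = [...text.matchAll(/(?:^|\s)@([\w][\w./-]*)/g)].map((m) => m[1]);
  const root = path.dirname(path.resolve(entryFile));
  return refs.filter((ref) => !fs.existsSync(path.resolve(root, ref)));
}
```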

📝 Instructions — are your rules well-written? (25% — highest weight)

| Check | What | Why |
|---|---|---|
| I1 | Emphasis keyword count | Anthropic cut IMPORTANT from 12 to 4 across 265 versions (sketch below) |
| I2 | Keyword density | More emphasis = less compliance. Anthropic: 7.5 → 1.4 per 1K words |
| I3 | Rule specificity | "Don't X. Instead Y. Because Z." — Anthropic's golden formula |
| I4 | Action-oriented headings | Anthropic deleted all "You are a..." identity sections |
| I5 | No identity language | "Follow conventions" removed — model already does this |
| I6 | Entry file length | 60–120 lines is the sweet spot. Longer dilutes priority |
| I7 | Under 40,000 characters | Claude Code hard limit. Above this, your file is truncated — silently |
| I8 | Total injected content within budget | All auto-injected files stay within the 200K context budget |
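
The emphasis checks (I1/I2) and the 40K limit (I7) are mechanical enough to show in a few lines. The keyword set and the metric shapes below are illustrative assumptions; the real thresholds live in the check catalog:

```ts
// Sketch of I1/I2/I7-style metrics. Keyword set and thresholds are illustrative.
const EMPHASIS = /\b(IMPORTANT|CRITICAL|NEVER|ALWAYS|MUST)\b/g;

function entryFileStats(text: string) {
  const keywordHits = (text.match(EMPHASIS) ?? []).length;
  const words = text.split(/\s+/).filter(Boolean).length;
  return {
    keywordHits,                                                  // I1: raw emphasis keyword count
    densityPer1kWords: words ? (keywordHits / words) * 1000 : 0,  // I2: emphasis density
    truncatedByClaudeCode: text.length > 40_000,                  // I7: silently truncated past this
  };
}
```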

🔨 Workability — can AI build and test? (18%)

| Check | What | Why |
|---|---|---|
| W1 | Build/test commands documented | AI can't guess your test runner |
| W2 | CI exists | Rules without enforcement are suggestions |
| W3 | Tests exist (not an empty shell) | A CI that runs pytest with 0 test files always "passes" |
| W4 | Linter configured | Mechanical formatting frees AI from guessing style |
| W5 | No files over 256 KB | Claude Code cannot read them — hard error (sketch below) |
| W6 | Pre-commit hooks are fast | Claude Code never uses --no-verify. Slow hooks = stuck commits |
| W7 | Local fast test command documented | Entry file documents a fast (<30s) test command for mid-session verification |
| W8 | npm test script exists | JS/Node repos need npm test so AI can run tests without guessing |
| W9 | Release workflow validates version consistency | Automated drift detection across package.json, CHANGELOG, and badges |
| W10 | Test cost tiers defined (pytest markers) | @pytest.mark.fast lets AI run the cheap subset, not the full 10-minute suite |
| W11 | feat/fix commits paired with test commits | Gate that catches features landing without corresponding tests |
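
Of the workability checks, W5 is the most mechanical; a sketch follows. The directory skip-list and return shape are assumptions, not AgentLint's implementation:

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch of a W5-style scan: files larger than 256 KB are unreadable for Claude
// Code, so flag them. The skip-list is minimal and illustrative.
function filesOverReadLimit(root: string, limit = 256 * 1024): string[] {
  const offenders: string[] = [];
  const skip = new Set(["node_modules", ".git", "dist", "build"]);
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (skip.has(entry.name)) continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (entry.isFile() && fs.statSync(full).size > limit) offenders.push(full);
    }
  };
  walk(root);
  return offenders;
}
```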

🔄 Continuity — can the next session pick up? (12%)

| Check | What | Why |
|---|---|---|
| C1 | Document freshness | Stale instructions are worse than no instructions (sketch below) |
| C2 | Handoff file exists | Without it, every session starts from zero |
| C3 | Changelog has "why" | "Updated INDEX" says nothing. "Fixed broken path" says everything |
| C4 | Plans in repo | Plans in Jira don't exist for AI |
| C5 | CLAUDE.local.md not in git | Private per-user file — must be in .gitignore |
| C6 | HANDOFF.md has verify conditions | Notes with evidence (score ≥ X, tests pass) let the next session skip a full re-audit |
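
Freshness (C1) can be approximated from git metadata alone; a sketch follows. The "days behind HEAD" framing and the helper name are illustrative assumptions:

```ts
import { execSync } from "node:child_process";

// Sketch of a C1-style freshness signal: how many days the entry file lags behind
// the repo's latest commit. What counts as "stale" is a policy question.
function daysBehindHead(entryFile = "CLAUDE.md", cwd = "."): number {
  const git = (cmd: string) => Number(execSync(cmd, { cwd, encoding: "utf8" }).trim());
  const fileTouched = git(`git log -1 --format=%ct -- ${entryFile}`); // last commit touching the file
  const headTouched = git("git log -1 --format=%ct");                 // latest commit on the branch
  return (headTouched - fileTouched) / 86_400;
}
```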

🔒 Safety — is AI working securely? (15%)

| Check | What | Why |
|---|---|---|
| S1 | .env in .gitignore | AI's Glob tool ignores .gitignore by default — secrets visible |
| S2 | Actions SHA pinned | AI push triggers CI. Floating tags = supply chain attack vector (sketch below) |
| S3 | Secret scanning configured | AI won't self-check for accidentally written API keys |
| S4 | SECURITY.md exists | AI needs security context for sensitive code decisions |
| S5 | Workflow permissions minimized | AI-triggered workflows shouldn't have write access by default |
| S6 | No hardcoded secrets | Detects sk-, ghp_, AKIA, and private-key patterns in source |
| S7 | No personal paths in source | Absolute home-dir paths leak machine identity and break on other machines |
| S8 | No pull_request_target trigger | Runs in a privileged context — supply chain attack vector for external PRs |
| S9 | No personal email in git history | Personal email in commits is a privacy and identity leak |
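
Two of the safety checks are pure pattern matching, sketched below. The regexes are illustrative and not exhaustive; AgentLint's actual detectors and the exact token formats it recognizes may differ:

```ts
// Sketch of S2/S6-style detectors. Patterns are illustrative, not exhaustive.

// S2: `uses: owner/repo@v4` (floating tag) instead of a 40-hex-char commit SHA.
const UNPINNED_ACTION = /^\s*uses:\s*[\w.-]+\/[\w.-]+@(?![0-9a-f]{40}\b)\S+/im;

// S6: common credential shapes: OpenAI-style keys, GitHub tokens, AWS key IDs, PEM private keys.
const SECRET_PATTERNS = [
  /\bsk-[A-Za-z0-9]{20,}/,
  /\bghp_[A-Za-z0-9]{36}\b/,
  /\bAKIA[0-9A-Z]{16}\b/,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
];

const hasUnpinnedAction = (workflowYaml: string) => UNPINNED_ACTION.test(workflowYaml);
const hasSecretLikeString = (source: string) => SECRET_PATTERNS.some((p) => p.test(source));
```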

⚙️ Harness — is your Claude Code setup correct? (10%)

| Check | What | Why |
|---|---|---|
| H1 | Hook event names valid | PoToolUse vs PostToolUse — typos silently prevent hooks from ever firing (sketch below) |
| H2 | PreToolUse hooks have a matcher | Without a tool matcher, the hook runs before every tool call |
| H3 | Stop hook has a circuit breaker | Stop hooks without an exit condition run forever |
| H4 | No dangerous auto-approve | `*` or `.*` grants unlimited tool execution with no human check |
| H5 | Env deny coverage complete | Missing deny patterns let secrets leak to untrusted tools |
| H6 | Hook scripts network access | Outbound calls from hooks can exfiltrate data triggered by the agent |
| H7 | Gate workflows are blocking | Warn-only CI gates are effectively disabled — agents merge despite failures |
| H8 | Hook errors use a structured format | what/rule/fix lets the agent self-correct; unstructured errors leave it stuck |
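
H1 is essentially a spell-check against a fixed vocabulary. The sketch below assumes the hook event names Claude Code documents at the time of writing (they may change), and simplifies the settings shape:

```ts
// Sketch of an H1-style check: any hook key that isn't a known Claude Code event
// name will simply never fire. Event list reflects current docs; verify before use.
const KNOWN_HOOK_EVENTS = new Set([
  "PreToolUse", "PostToolUse", "UserPromptSubmit", "Notification",
  "Stop", "SubagentStop", "PreCompact", "SessionStart", "SessionEnd",
]);

function misspelledHookEvents(settings: { hooks?: Record<string, unknown> }): string[] {
  return Object.keys(settings.hooks ?? {}).filter((event) => !KNOWN_HOOK_EVENTS.has(event));
}

// Example: { hooks: { PoToolUse: [] } } yields ["PoToolUse"], the typo from the table above.
```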

🧠 Deep — AI-powered instruction analysis (opt-in, extended)

Spawns AI subagents to find what pattern-matching can't:

| Check | What | Why |
|---|---|---|
| D1 | Contradictory rules | Two rules that conflict cause the model to pick one — usually the wrong one |
| D2 | Dead-weight rules | Rules the model would follow anyway waste tokens and dilute priority |
| D3 | Vague rules without a decision boundary | "Use good judgment" gives the model nothing to evaluate against |

📊 Session — learn from your Claude Code logs (opt-in, extended)

Reads your session history to surface patterns you'd never notice manually:

| Check | What | Why |
|---|---|---|
| SS1 | Repeated instructions | Instructions you type every session belong in CLAUDE.md |
| SS2 | Ignored rules | Rules AI keeps bypassing need rewriting, not repeating |
| SS3 | Friction hotspots | Which projects and tasks generate the most re-work |
| SS4 | Missing rule suggestions | Common corrections that aren't captured anywhere yet |

How is this different from /init?

/init generates a template CLAUDE.md from scratch. Useful on day one. Useless on day fifty — when the file is stale, bloated with emphasis keywords the model ignores, missing .env in .gitignore, and silently exceeds the 40K hard limit.

/init writes a file. AgentLint audits the whole system:

| | /init | AgentLint |
|---|---|---|
| Generates a template CLAUDE.md | ✅ | ❌ |
| Checks entry-file quality | ❌ | ✅ |
| Finds broken @include references | ❌ | ✅ |
| Enforces the 40K-character hard limit | ❌ | ✅ |
| Audits CI, hooks, .gitignore, Actions SHA pinning | ❌ | ✅ |
| Detects instruction rot over time | ❌ | ✅ |
| Audits Claude Code hook configuration | ❌ | ✅ |
| Auto-fixes what it can | ❌ | ✅ |
| Every check backed by a cited data source | ❌ | ✅ |

Who this is for

  • Solo developers using Claude Code, Cursor, or Codex who want the agent to stop ignoring their rules
  • Team leads who need every repo in the org to be AI-ready before agents ship to prod
  • OSS maintainers whose external contributors (and their agents) should write code in the project's style
  • Security-conscious engineers worried about agents exfiltrating .env or triggering vulnerable workflows

Compatibility

AgentLint ships as a Claude Code plugin and standalone CLI. When it runs, it audits any of the following if present in your repo:

  • CLAUDE.md (Anthropic's Claude Code)
  • AGENTS.md (the universal standard — used by OpenAI Codex, Cursor, Windsurf, Kilo, GitHub Copilot, Gemini CLI, and 60,000+ open-source repos)
  • .cursor/rules/
  • .github/copilot-instructions.md

Roadmap: native Cursor and Codex integrations. Star the repo to follow.

Update

npm install -g agentlint-ai

Or update the Claude Code plugin directly:

claude plugin update agent-lint@agent-lint

Evidence

Every check cites its source. No opinions, no best practices — data.

| Source | Type |
|---|---|
| Anthropic 265 prompt versions | Primary dataset |
| Claude Code source code | Hard limits and internal behavior |
| IFScale (NeurIPS) | Instruction compliance at scale |
| ETH Zurich | Do context files help coding agents? |
| Codified Context | Stale content as #1 failure mode |
| Agent READMEs | Concrete vs abstract effectiveness |

Full citations in standards/evidence.json.

FAQ

What exactly is an "agent harness"?

The term got popular in early 2026 (Mitchell Hashimoto, OpenAI, LangChain). Shortest definition: Agent = Model + Harness. The harness is everything that wraps an LLM and turns it into an agent — tools, state management, feedback loops, and the persistent rules it reads at session start. For coding agents, that last part is your AGENTS.md, CLAUDE.md, .cursor/rules, CI, pre-commit hooks, and .gitignore. AgentLint is the first linter built specifically to audit that layer.

Why not just use /init and call it a day?

See the table above. /init writes a file; it doesn't audit your repo. AgentLint does 51 deterministic checks across 6 core dimensions (plus 7 opt-in extended checks) — and fixes what it finds.

Does this work with Cursor, Codex, or GitHub Copilot?

Today AgentLint runs inside Claude Code, but the checks apply to repo assets every agent reads: AGENTS.md, .cursor/rules, .github/copilot-instructions.md. A well-linted repo makes every agent better, not just Claude. Native Cursor and Codex integrations are on the roadmap.

Is my code sent anywhere?

It depends on which mode you run. The default (agentlint check and the GitHub Action) is local-only and runs zero AI. The two opt-in extended modes do touch AI or local session logs — we spell it out so there's no surprise:

| Mode | Data accessed | Network / AI |
|---|---|---|
| agentlint check (default) | Files in the repo being scanned | Local only, no AI |
| GitHub Action | Files in the checked-out repo inside the runner | Local only, no AI |
| /al (core dims only) | Git repos under the configured PROJECTS_ROOT | Local only, no AI |
| /al with Deep (opt-in) | Selected entry files (e.g. CLAUDE.md) | Sends file contents to a Claude sub-agent |
| /al with Session (opt-in) | ~/.claude/projects/ logs on your machine | Local analyzer. Output is redacted by default; raw snippets require --include-raw-snippets |

Deep is the only mode that transmits file contents off your machine, and it only runs when you explicitly ask for it inside Claude Code. Everything the default scan produces — the Score: NN/100 (core) output, the JSONL, the SARIF, the GitHub Action annotations — comes from pattern checks on disk, no API calls.

Does npm install write outside node_modules?

No. npm install -g agentlint-ai only installs the agentlint CLI to npm's global prefix (just like any other CLI tool). The Claude Code plugin install is opt-in: run npx agentlint-ai install (one-time) to detect Claude Code and register the /al slash command in ~/.claude/commands/. The CLI works without that step; the /al slash command does not.

Failure-mode fallbacks live in INSTALL.md.

Isn't this just "best practices"?

No. Every check cites a specific source — Anthropic's 265 prompt versions, Claude Code source code, peer-reviewed papers, or real production audits. If a check can't be backed by data, it doesn't ship.

Why do you lint AGENTS.md if this is a Claude Code plugin?

Because good context engineering is cross-tool. If you're using any combination of Claude Code, Cursor, and Codex, the same AGENTS.md serves all of them. AgentLint checks it against the same evidence base regardless of which agent ends up reading it.

How long does a scan take?

Under 5 seconds for most repos. The Deep and Session dimensions take longer because they spawn subagents or read session logs.

Requirements

  • Node 20+
  • jq
  • Claude Code (for /al plugin and Deep/Session analysis)

Contributing

Issues and PRs welcome. See CONTRIBUTING.md.

License

MIT


If AgentLint saved you from one bad agent session, please ⭐ star the repo — it's how we find out it's useful.

Built by @0xmariowu · agentlint.app