audit-agents

Stop repeating the same agent failures.

Built out of frustration: I kept seeing the exact same issues (scope creep, hallucinations, weak robustness) across every new agent I wrote.

So I turned my checklist into a Claude Code skill that runs an 11-section PASS/WARN/FAIL audit on your agent definitions — with exact line numbers and fix suggestions.

Works with Claude Code, Cursor, Codex, and any markdown-based agent format.

Quick Install (recommended)

npx skills add ajmalhassan/audit-agents

Other install methods

Clone into your Claude Code skills directory:

git clone https://github.com/ajmalhassan/audit-agents.git ~/.claude/skills/audit-agents

Or copy just the skill file (run this from a local checkout of the repository):

mkdir -p ~/.claude/skills/audit-agents
cp SKILL.md ~/.claude/skills/audit-agents/

Why I built this

I keep writing new agents for different projects and different purposes, but the things that go wrong are always the same: the agent creeps beyond its scope, hallucinates findings, or declares "done" too early. I open the definition and find a gap I've already fixed in another agent.

There's no quick way to check whether a definition is solid before you start using it. You run the agent, watch it break, fix the definition, and repeat. So I wrote down the criteria I keep checking, compared them against industry best practices, and packaged the result as a skill. I now run it after every revision to see whether the score improves.

What It Checks

#	Section	Covers
1	Frontmatter Integrity	name, description, tool whitelist, model selection
2	Scope Boundaries	purpose, handoffs, exit conditions, agent spiral/relentless patterns
3	Anti-Hallucination	read-before-claim, evidence requirements, exhaustion gates
4	Prompt Structure	phase breakdown, approval gates, thin agent principle
5	Output Format	structured templates, severity levels
6	Tool Usage Discipline	least-privilege, role archetypes, command restrictions
7	Robustness	missing file handling, failure behavior, fallbacks
8	Hook Awareness	redundancy with lifecycle hooks
9	Reusability	generic instructions, no hardcoded context
10	Security	secrets, prompt injection, unsafe commands, scoped access
11	Model-Capability Alignment	instructions match model strengths

Plus cross-agent checks for scope overlap, handoff coherence, and defense-in-depth.
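
To make section 1 (Frontmatter Integrity) concrete, here is a minimal Python sketch of what such a check looks for. It is only an illustration: the skill itself is a prompt-driven review, not this code. The sketch assumes the agent definition starts with a `---`-delimited block of simple `key: value` lines, and that a missing or empty field counts as a FAIL. The function names, the sample description, and the sample tool list are hypothetical.

```python
from typing import Dict, List, Tuple

# Fields named in the "Frontmatter Integrity" row of the table above.
REQUIRED_FIELDS = ("name", "description", "tools", "model")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Return key/value pairs found between the leading '---' delimiters.

    Assumption: flat 'key: value' lines only (no nested YAML).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the frontmatter as missing


def check_frontmatter(text: str) -> List[Tuple[str, str]]:
    """Mark each required field PASS if present and non-empty, else FAIL."""
    fields = parse_frontmatter(text)
    return [
        (field, "PASS" if fields.get(field) else "FAIL")
        for field in REQUIRED_FIELDS
    ]


# Hypothetical agent definition used only for this illustration.
SAMPLE_AGENT = """---
name: component-reviewer
description: Reviews one component and reports findings.
tools: Read, Grep, Glob, Bash
model: sonnet
---
Review the component the user names and produce a structured report.
"""

for field, status in check_frontmatter(SAMPLE_AGENT):
    print(f"[Frontmatter] {field}: {status}")
```

A purely syntactic check like this only covers presence. Judging whether the description is clear or the tool list is appropriately narrow is the semantic part that the audit handles.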

Example Output

Real audit of a component-reviewer agent (60/63 checks passed):

Agent Audit: component-reviewer

Summary
- Checks: 60/63
- Warnings: 3
- Failures: 0

Warnings (suggested improvements)
1. [Prompt Structure — Thin Agent] 340 lines (2.3× guideline). The 7 category
   contracts (lines 185–237) could be a companion file loaded based on the
   component's detected category.
   Rationale: Reduces main body by ~50 lines and limits attention dilution
   for mid-capability model.

2. [Robustness] No explicit instruction for what to do if the component
   directory doesn't exist (user provides wrong name).
   Rationale: The reviewer reads packages/react/src/components/{Name}/index.tsx
   — if it's missing, behavior is undefined.

3. [Model-Capability] 80+ checklist items may exceed mid-capability model's
   attention span for a single pass.
   Rationale: Sonnet works best with concise, outcome-oriented rules. Consider
   priority-ordering items or splitting into focused passes.

Passes
- [Frontmatter] name, description, tools (8), model (sonnet) all correct;
  Judge archetype (no Write/Edit)
- [Scope] Purpose clear; known codebase-wide gaps called out; layout
  components explicitly excluded; exit = structured report
- [Anti-Hallucination] "re-read the actual file" before FAILs; UNDETERMINED
  escape hatch; evidence requires 3-question reasoning
- [Output Format] Severity levels defined; structured report template with
  Failures/Warnings/Passes
- [Tool Usage] All 8 tools have concrete use cases; Bash constrained to
  grep, pnpm run check, pnpm run test
- [Reusability] Generic placeholders; category-based conditional contracts
- [Security] No secrets, scoped to packages/, no unsafe commands
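
The Summary block in the report above follows from simple counting: 60 passes plus 3 warnings plus 0 failures gives 63 checks. A small Python sketch of that tally, assuming each check yields one PASS, WARN, or FAIL result (the `Finding` type and `summarize` function are hypothetical, and the 60 passing checks are placeholders because the report does not list them one by one):

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    section: str  # e.g. "Robustness"
    status: str   # "PASS", "WARN", or "FAIL"


def summarize(findings: List[Finding]) -> str:
    """Build the Summary lines: passed checks over total, warnings, failures."""
    counts = Counter(f.status for f in findings)
    total = len(findings)
    return "\n".join([
        f"- Checks: {counts['PASS']}/{total}",
        f"- Warnings: {counts['WARN']}",
        f"- Failures: {counts['FAIL']}",
    ])


# Placeholder passes plus the three warnings from the example report.
findings = [Finding("Placeholder", "PASS") for _ in range(60)] + [
    Finding("Prompt Structure", "WARN"),
    Finding("Robustness", "WARN"),
    Finding("Model-Capability", "WARN"),
]

print(summarize(findings))
```

Tracking these three numbers across revisions of a definition is one way to see whether the score is improving.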

Usage

Invoke directly:

/audit-agents

Or describe what you need:

Audit my agent definitions
Review the agents in this project for best practices

Tested With

Claude Code (primary)
Cursor
Any tool that reads markdown-based agent definitions

Complementary Tools

This skill performs a semantic, expert-level prompt review. For programmatic CI checks, pair it with:

Tool	What it does
AgentLinter	25+ syntax/secrets/schema rules, VS Code + GitHub Action
cclint	Frontmatter validation, heading structure, dangerous commands
Agent Audit	Security scanner, 53 rules mapped to OWASP Agentic Top 10
Promptfoo	Agent evaluation framework, red teaming

Contributing

PRs welcome — especially new checks, edge cases, or example audits.

License

MIT
