prompt-defense-audit

Deterministic LLM prompt defense scanner. Checks system prompts for missing defenses against 17 attack vectors (12 base + 5 agent-specific in v1.4). Pure regex — no LLM, no API calls, < 5ms, 100% reproducible.

繁體中文版

$ npx prompt-defense-audit "You are a helpful assistant."

  Grade: F  (8/100, 1/12 defenses)

  Defense Status:

  ✗ Role Boundary (80%)
    Partial: only 1/2 defense pattern(s)
  ✗ Instruction Boundary (80%)
    No defense pattern found
  ✗ Data Protection (80%)
    No defense pattern found
  ...

Why

OWASP lists Prompt Injection as the #1 threat to LLM applications. Yet most developers ship system prompts with zero defense.

We scanned 1,646 production system prompts from 4 public datasets. Results:

97.8% lack indirect injection defense
78.3% score F (below 45/100)
Average score: 36/100

Existing security tools require LLM calls (expensive, non-deterministic) or cloud services (privacy concerns). This package runs locally, instantly, for free.

Our philosophy: The deterministic engine is the product. AI deep analysis is optional — because regex is already strong enough for 90%+ of use cases. Zero AI cost by default.

Install

npm install prompt-defense-audit

Usage

Programmatic (TypeScript / JavaScript)

import { audit, auditWithDetails } from 'prompt-defense-audit'

// Quick audit
const result = audit('You are a helpful assistant.')
console.log(result.grade)    // 'F'
console.log(result.score)    // 8
console.log(result.missing)  // ['instruction-override', 'data-leakage', ...]

// Detailed audit with per-vector evidence
const detailed = auditWithDetails(mySystemPrompt)
for (const check of detailed.checks) {
  console.log(`${check.defended ? '✅' : '❌'} ${check.name}: ${check.evidence}`)
}

CLI

# Inline prompt
npx prompt-defense-audit "You are a helpful assistant."

# From file
npx prompt-defense-audit --file my-prompt.txt

# Pipe from stdin
cat prompt.txt | npx prompt-defense-audit

# JSON output (for CI/CD)
npx prompt-defense-audit --json "Your prompt"

# Traditional Chinese output
npx prompt-defense-audit --zh "你的系統提示"

# List all 12 attack vectors
npx prompt-defense-audit --vectors

CI/CD Gate

GRADE=$(npx prompt-defense-audit --json --file prompt.txt | node -e "
  const r = JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
  console.log(r.grade);
")
if [[ "$GRADE" == "D" || "$GRADE" == "F" ]]; then
  echo "Prompt defense audit failed: grade $GRADE"
  exit 1
fi

17 Attack Vectors

Based on OWASP LLM Top 10, empirical research on 1,646 production prompts, and structured analysis of six documented crypto AI agent incidents (see CASE_STUDIES.md).

12 Base Vectors

#	Vector	What it checks	Gap rate*
1	Role Escape	Role definition + boundary enforcement	92.4%
2	Instruction Override	Refusal clauses + meta-instruction protection	—
3	Data Leakage	System prompt / training data disclosure prevention	9.4%
4	Output Manipulation	Output format restrictions	88.3%
5	Multi-language Bypass	Language-specific defense	64.3%
6	Unicode Attacks	Homoglyph / zero-width character detection	—
7	Context Overflow	Input length limits	—
8	Indirect Injection	External data validation	97.8%
9	Social Engineering	Emotional manipulation resistance	71.4%
10	Output Weaponization	Harmful content generation prevention	—
11	Abuse Prevention	Rate limiting / auth awareness	—
12	Input Validation	XSS / SQL injection / sanitization	—

5 Agent Vectors (v1.4, May 2026)

Added after analysing six documented crypto AI agent incidents. Each vector is grounded in a specific real-world failure — see CASE_STUDIES.md for primary sources and root-cause analysis.

#	Vector	What it checks	Reference incident
13	Encoding-aware Indirect Injection	Treating decoded/translated content (Morse, base64, ROT13) as untrusted data, not instructions	Grok×Bankrbot Morse code, May 2026
14	Function/Tool Semantic Immutability	Function or tool semantics cannot be redefined mid-conversation	Freysa `approveTransfer` redefinition, Nov 2024
15	Memory Provenance Awareness	Retrieved RAG memory may be poisoned by adversaries on other platforms	ElizaOS memory injection, Princeton 2025
16	Cross-Agent Authorization Boundary	Authority does not silently inherit from another agent's output	Grok×Bankrbot principal confusion, May 2026
17	Financial Transaction Guardrails	Hard limits, multi-sig, refusal thresholds for transactions	Lobstar Wilde decimal-error transfer, Feb 2026

*Gap rate = % of 1,646 production prompts missing this defense. Source: research data.

Grading

Grade	Score	Meaning
A	90–100	Strong defense coverage
B	70–89	Good, some gaps
C	50–69	Moderate, significant gaps
D	30–49	Weak, most defenses missing
F	0–29	Critical, nearly undefended

API Reference

`audit(prompt: string): AuditResult`

Quick audit. Returns grade, score, and list of missing defense IDs.

interface AuditResult {
  grade: 'A' | 'B' | 'C' | 'D' | 'F'
  score: number       // 0-100
  coverage: string    // e.g. "4/12"
  defended: number    // count of defended vectors
  total: number       // 12
  missing: string[]   // IDs of undefended vectors
}

`auditWithDetails(prompt: string): AuditDetailedResult`

Full audit with per-vector evidence.

interface AuditDetailedResult extends AuditResult {
  checks: DefenseCheck[]
  unicodeIssues: { found: boolean; evidence: string }
}

interface DefenseCheck {
  id: string
  name: string          // English
  nameZh: string        // 繁體中文
  defended: boolean
  confidence: number    // 0-1
  evidence: string      // Human-readable explanation
}

`ATTACK_VECTORS: AttackVector[]`

Array of all 12 attack vector definitions with bilingual names and descriptions.

How It Works

Parses the system prompt text
For each of 12 attack vectors, applies regex patterns that detect defensive language
A defense is "present" when enough patterns match (usually >= 1, some require >= 2)
Checks for suspicious Unicode characters embedded in the prompt
Calculates coverage score and assigns a letter grade

This tool does NOT:

Send your prompt to any external service
Use LLM calls (100% regex-based)
Guarantee security (it checks for defensive language, not runtime behavior)
Replace penetration testing or behavioral evaluation

What This Scanner Does NOT Catch

Static prompt analysis is layer 1 of a defense-in-depth model. The following classes of attack require defenses at other layers — this scanner does not replace them, and we say so explicitly so it isn't oversold:

Runtime credential compromise. Dashboard takeovers, leaked API keys, malicious deployment commits. Standard infosec, out of scope. (Reference: AIXBT dashboard takeover, Mar 2025.)
Tool / permission scoping bugs. Whether the agent has dangerous tools, and how those tools are gated, is invisible to a prompt scanner. (Reference: Bankrbot NFT-as-authorization, May 2026.)
Whether declared defenses are enforced at runtime. A prompt can declare "verify retrieved memory" and the framework can ignore it. The scanner cannot tell.
Numerical and unit bugs. Off-by-1000 decimal errors, wrong-token-id transfers. Code-level bugs, not prompt issues. (Reference: Lobstar Wilde, Feb 2026.)
Effectiveness vs. presence. A prompt with the keyword "never" registers as defended even if a "helpful" framing dominates under adversarial pressure. We check for presence of defensive language, not its strength.
Multi-turn adversarial dynamics. Static scan of turn 0 cannot predict turn 482. (Reference: Freysa, Nov 2024.)

A pass on this scanner is necessary, not sufficient. See CASE_STUDIES.md for an honest mapping of which documented incidents this scanner would flag versus which it cannot help with.

Limitations

Regex-based detection is heuristic — a prompt can contain defensive language but still be vulnerable at runtime. This tool measures intent to defend, not actual defense effectiveness.
Only checks system prompt text, not model behavior under adversarial pressure.
English and Traditional Chinese patterns only (contributions welcome for other languages).
False positives/negatives are possible. See research data for calibration details.
Fullwidth CJK punctuation (e.g. ，) triggers Unicode detection — known limitation.

Complementary tools

prompt-defense-audit is a static, design-time check. It pairs cleanly with runtime-side projects that detect attacks as they happen:

Lifecycle stage	Tool	Question it answers
Build / CI gate	`prompt-defense-audit` (this)	"Is the prompt designed to resist attacks?"
Runtime detection	Agent-Threat-Rule (ATR)	"Is an attack happening right now?"

Failure modes are orthogonal: the audit misses novel attacks not anticipated at design time; ATR misses prompts that have no resistance even before traffic arrives. Used together they form a defense-in-depth pattern (CI gate → runtime detection).

Detailed integration including the 1:N vector mapping (20 defense vectors → 9 ATR detection categories), recommended usage pattern, and cross-references: docs/integrations/agent-threat-rules.md.

Research

This tool is backed by empirical analysis of 1,646 production system prompts from 4 public datasets:

Dataset	Size	Source
LouisShark/chatgpt_system_prompt	1,389	GPT Store custom GPTs
jujumilk3/leaked-system-prompts	121	ChatGPT, Claude, Grok, Perplexity, Cursor, v0
x1xhlol/system-prompts-and-models	80	Cursor, Windsurf, Devin, Augment
elder-plinius/CL4R1T4S	56	Claude, Gemini, Grok, Cursor

Key references:

Greshake et al. (2023), Not what you've signed up for — indirect prompt injection
Schulhoff et al. (2023), Ignore This Title and HackAPrompt — prompt injection taxonomy
OWASP LLM Top 10 (2025)

Contributing

See CONTRIBUTING.md. Key areas: new language patterns, better regex accuracy, integration examples.

Security

See SECURITY.md. Report vulnerabilities to dev@ultralab.tw — not via GitHub issues.

License

MIT — Ultra Lab

Used In Production

This library powers prompt defense detection across multiple production deployments and security frameworks. 11 PRs merged into indicator-org repos (Microsoft / Cisco / OWASP / UK Government AISI / awesome-list curators):

Frameworks + scanners (engine-level integration)

Microsoft Agent Governance Toolkit — agent-compliance — PromptDefenseEvaluator integrated with MerkleAuditChain + PromotionGate, merged Apr 2026.
Microsoft Agent Governance Toolkit — Docker provider doc fix — companion docstring fix, merged May 2026.
Cisco AI Defense — mcp-scanner — PromptDefenseAnalyzer module (12-vector regex audit), merged Apr 2026 (proposal → merge in 39 minutes).

Test harnesses + eval suites

OWASP Agent-Security-Regression-Harness — mcp_trust_boundary scenario — adversarial-seeding regression test, merged May 2026.
OWASP Agent-Security-Regression-Harness — goal_hijack scenario — foot-in-the-door variant, merged May 2026.
OWASP AI Testing Guide — AITG-APP-05 output-injection categories — 6 vector categories added, merged Jun 2026.
UK Government BEIS AISI — inspect_evals — SimpleQA config migration to single-file --run-config, merged Jun 2026.

Standards mappings + awesome lists

sint-ai/sint-protocol — Static Analysis Complement — ASI mapping addendum, merged May 2026.
Agent-Threat-Rule/agent-threat-rules — Integration page — 20 vectors → 9 detection categories cross-walk, merged May 2026.
TalEliyahu/Awesome-AI-Security — primary listing — added under Servers & Dev tooling, merged May 2026.
TalEliyahu/Awesome-AI-Security — misp-mcp-server — companion listing, merged May 2026.

Ultra Lab products built on this engine

UltraProbe (UltraLab) — free AI security scanner; uses this library as the Prompt Security engine.
Quartz Cloud — Taiwan-domiciled runtime AI firewall (Q3 2026 closed beta). Quartz uses this engine as its ingress detector + extends it with runtime + jurisdictional layers. The engine is open source under Ultra Lab; Quartz is a commercial brand built on top of it. Customers can audit, fork, or self-host the engine without lock-in.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github		.github
docs/integrations		docs/integrations
examples		examples
paper		paper
research		research
src		src
test		test
.gitignore		.gitignore
CASE_STUDIES.md		CASE_STUDIES.md
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README.zh-TW.md		README.zh-TW.md
SECURITY.md		SECURITY.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

prompt-defense-audit

Why

Install

Usage

Programmatic (TypeScript / JavaScript)

CLI

CI/CD Gate

17 Attack Vectors

12 Base Vectors

5 Agent Vectors (v1.4, May 2026)

Grading

API Reference

audit(prompt: string): AuditResult

auditWithDetails(prompt: string): AuditDetailedResult

ATTACK_VECTORS: AttackVector[]

How It Works

What This Scanner Does NOT Catch

Limitations

Complementary tools

Research

Contributing

Security

License

Used In Production

Frameworks + scanners (engine-level integration)

Test harnesses + eval suites

Standards mappings + awesome lists

Ultra Lab products built on this engine

Related

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`audit(prompt: string): AuditResult`

`auditWithDetails(prompt: string): AuditDetailedResult`

`ATTACK_VECTORS: AttackVector[]`

Packages