security: implement comprehensive prompt injection prevention and input sanitization by nik-kale · Pull Request #6 · nik-kale/AutoOPS-Architect

nik-kale · 2025-12-26T21:49:56Z

Summary

Implements comprehensive input sanitization and prompt injection prevention to protect against LLM manipulation attacks. Features configurable strictness levels and integrates seamlessly into the planning workflow.

Changes

Enhanced sanitizer.py with advanced prompt injection detection:
- Critical patterns: "ignore previous instructions", role overrides, forget commands
- Moderate patterns: Role injection (system:/user:/assistant:), instruction tokens, code blocks
- Permissive patterns: Minimal blocking for less sensitive deployments
Added PromptInjectionError exception for blocked attacks
Integrated sanitization into Architect.plan() method
Added configurable sanitization modes: strict, moderate, permissive
Added block_on_injection flag to control blocking behavior
Comprehensive test suite with 30+ test cases covering:
- All injection pattern types
- Different sanitization modes
- Integration with planner
- URL validation (SSRF prevention)
- Path validation (traversal prevention)
- Shell injection detection
- Sensitive data redaction

Type of Change

New feature (non-breaking change adding functionality)
Bug fix (non-breaking change fixing an issue)
Breaking change (fix or feature causing existing functionality to change)
Security patch

Problem

The original implementation had placeholder sanitization that didn't actually prevent prompt injection attacks. Malicious users could:

Override system prompts with "ignore previous instructions"
Inject role tokens to impersonate system/assistant
Use special instruction tokens ([INST], <|im_start|>) to manipulate behavior
Embed code blocks to trick the LLM into executing malicious logic

Solution

Multi-layered defense:

Pattern detection: Regex patterns for known injection techniques
Configurable strictness: Choose security level based on deployment context
Sanitization options: Block (raise exception) or sanitize (remove patterns)
Minimal content enforcement: Ensure goals remain meaningful after sanitization
Unicode filtering: Remove unusual Unicode that might confuse models

Configuration Examples

Strict mode (recommended for production):

config = PlannerConfig(
    sanitization_mode="strict",
    block_on_injection=True
)

Moderate mode (balanced):

config = PlannerConfig(
    sanitization_mode="moderate",
    block_on_injection=False  # Sanitize instead of blocking
)

Permissive mode (trusted environments):

config = PlannerConfig(
    sanitization_mode="permissive",
    block_on_injection=False
)

Testing

30+ test cases covering all attack vectors
Tests verify both blocking and sanitization modes
Integration tests with Architect planner
No false positives on normal workflow descriptions

Security Impact

Prevents entire class of attacks:

✅ Prompt injection
✅ Role override attempts
✅ Instruction token injection
✅ Code block injection
✅ System prompt leakage attempts

Checklist

Code follows project style guidelines
Self-review completed
Comments added for complex logic
Documentation updated (inline docstrings)
No new warnings introduced
Tests added and passing (30+ test cases)

Related Issues

Addresses security roadmap item: "Implement input sanitization for prompt injection prevention"

…ut sanitization

chatgpt-codex-connector · 2025-12-26T21:50:03Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

security: implement comprehensive prompt injection prevention and inp…

09df3cc

…ut sanitization

nik-kale merged commit 2f8efa2 into claude/init-autoops-architect-01U1Ygp8bjM5jUUNtnD9kR2E Dec 26, 2025
6 of 12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: implement comprehensive prompt injection prevention and input sanitization#6

security: implement comprehensive prompt injection prevention and input sanitization#6
nik-kale merged 1 commit into
claude/init-autoops-architect-01U1Ygp8bjM5jUUNtnD9kR2Efrom
security/prompt-injection-prevention

nik-kale commented Dec 26, 2025

Uh oh!

chatgpt-codex-connector Bot commented Dec 26, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nik-kale commented Dec 26, 2025

Summary

Changes

Type of Change

Problem

Solution

Configuration Examples

Testing

Security Impact

Checklist

Related Issues

Uh oh!

chatgpt-codex-connector Bot commented Dec 26, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants