Skip to content

Add OWASP LLM02 output-side scorer pack: SSRF / SSTI / XXE / open redirect / LDAP injection#2118

Merged
romanlutz merged 4 commits into
microsoft:mainfrom
ppcvote:add-injection-scorer-pack
Jul 3, 2026
Merged

Add OWASP LLM02 output-side scorer pack: SSRF / SSTI / XXE / open redirect / LDAP injection#2118
romanlutz merged 4 commits into
microsoft:mainfrom
ppcvote:add-injection-scorer-pack

Conversation

@ppcvote

@ppcvote ppcvote commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds five RegexScorer subclasses extending the OWASP LLM02 (Insecure Output Handling) scorer pack from #1868 with the remaining statically-detectable payload families: SSRF, SSTI, XXE, open redirect, and LDAP injection.

Proposed as a follow-up in #2002. Same design as #1868: one focused subclass per payload family, following the CredentialLeakScorer pattern, each independently enable/disable-able per scenario.

I'm opening this without a help-wanted label on #2002 — given the direct pattern parity with the merged #1868, I wanted the code visible for review rather than a standalone comment. Happy to close and wait for triage if the team prefers.

Per-scorer breakdown

Scorer Default patterns Pattern names
SSRFOutputScorer 4 Cloud Metadata Endpoint · Loopback URL Target · Private Network URL Target · SSRF URL Scheme
SSTIOutputScorer 2 Arithmetic Eval Probe · Python Gadget Chain
XXEOutputScorer 3 External Entity Declaration · External Parameter Entity · Doctype Internal Subset Entity
OpenRedirectOutputScorer 3 Protocol-Relative Redirect Param · Encoded Slash Redirect · Userinfo Host Confusion
LDAPInjectionOutputScorer 3 Filter Break Sequence · Always-True Clause · Boolean Operator Injection

15 default patterns across 5 scorers.

What's included

  • 5 scorer classes under pyrit/score/true_false/regex/ (~55-65 lines each, mirroring the CredentialLeakScorer / Add OWASP LLM02 output-side scorer pack (XSS / SQLi / Shell / Path) #1868 structure: _DEFAULT_PATTERNS class var, keyword-only __init__ with an optional patterns override, categories=["security"], OR aggregator).
  • 5 test files under tests/unit/score/regex/ — parametrized positive/negative cases + rationale-name assertion + custom-pattern override + memory-write assertion, mirroring test_credential_leak_scorer.py. The tests/unit/score/regex/ directory runs green (262 passed locally).
  • Export wiring: the submodule pyrit/score/true_false/regex/__init__.py is kept alphabetized; the insertions into the top-level pyrit/score/__init__.py follow that file's existing (non-alphabetical) ordering.

Note on precision

These are output-side detectors, so the patterns lean toward flagging when a model emits attack-shaped content. Two deliberate scoping choices worth calling out for review:

  • LDAP: each pattern requires an attr= clause adjacent to the filter break (*)(uid=, )(cn=*)), so ordinary code punctuation with the same *)( shape — e.g. a regex group (\w*)(\s+) — does not match. Negative tests cover that case.
  • SSRF: the Cloud Metadata Endpoint pattern matches the link-local metadata IP/host by design, so it will also fire if a model merely echoes 169.254.169.254 in prose. That is intentional for an output scorer (emitting the metadata endpoint is itself signal), but reviewers should be aware it is not URL-scheme-gated. Happy to tighten it to require a fetch/URL context if you'd prefer lower recall.

Not included (deliberate)

Pattern provenance

Ported from the MIT-licensed prompt-defense-audit-py (same provenance as #1868, same author). Each pattern is verified in-repo against positive and negative cases in the new test files.

Test evidence

  • pytest tests/unit/score/regex/ → 262 passed locally.
  • Branch rebased onto current main; top-level from pyrit.score import SSRFOutputScorer, SSTIOutputScorer, XXEOutputScorer, OpenRedirectOutputScorer, LDAPInjectionOutputScorer verified.
  • pre-commit ruff format + ruff check pass on the changed files (the ty type-check hook was skipped locally for lack of uv; it will run in CI).

…irect / LDAP

Extends the regex true/false scorer family from microsoft#1868 with five additional
output-side payload detectors, mirroring the existing RegexScorer pattern.
Each is deterministic (no LLM call), categorized "security", with unit tests
covering positive payloads, benign negatives, rationale, custom patterns, and
memory integration.
@romanlutz romanlutz self-assigned this Jul 2, 2026
Copilot AI and others added 3 commits July 2, 2026 14:31
- List SSRF/SSTI/XXE/open-redirect/LDAP scorers in the OWASP LLM02 doc section (.py + .ipynb)

- Fix alphabetical placement of LDAP/OpenRedirect entries in pyrit.score __all__

- Align Encoded Slash Redirect param list with Protocol-Relative Redirect Param (adds returnto/destination/forward/location); add regression test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz added this pull request to the merge queue Jul 3, 2026
Merged via the queue into microsoft:main with commit 1e21a04 Jul 3, 2026
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants