Add OWASP LLM02 output-side scorer pack: SSRF / SSTI / XXE / open redirect / LDAP injection #2118
Merged
…irect / LDAP

Extends the regex true/false scorer family from microsoft#1868 with five additional output-side payload detectors, mirroring the existing RegexScorer pattern. Each is deterministic (no LLM call), categorized "security", with unit tests covering positive payloads, benign negatives, rationale, custom patterns, and memory integration.
- List SSRF/SSTI/XXE/open-redirect/LDAP scorers in the OWASP LLM02 doc section (.py + .ipynb)
- Fix alphabetical placement of LDAP/OpenRedirect entries in pyrit.score __all__
- Align Encoded Slash Redirect param list with Protocol-Relative Redirect Param (adds returnto/destination/forward/location); add regression test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
romanlutz approved these changes on Jul 3, 2026
Summary
Adds five RegexScorer subclasses extending the OWASP LLM02 (Insecure Output Handling) scorer pack from #1868 with the remaining statically-detectable payload families: SSRF, SSTI, XXE, open redirect, and LDAP injection.

Proposed as a follow-up in #2002. Same design as #1868: one focused subclass per payload family, following the CredentialLeakScorer pattern, each of which can be enabled or disabled independently per scenario.

I'm opening this without a help-wanted label on #2002: given the direct pattern parity with the merged #1868, I wanted the code visible for review rather than a standalone comment. Happy to close and wait for triage if the team prefers.

Per-scorer breakdown
- SSRFOutputScorer
- SSTIOutputScorer
- XXEOutputScorer
- OpenRedirectOutputScorer
- LDAPInjectionOutputScorer

15 default patterns across 5 scorers.
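The shared design (one subclass per payload family, a class-level default pattern table, an optional override, and OR aggregation across patterns) can be sketched roughly as follows. This is a standalone illustration only: RegexPayloadDetector, XXEDetectorSketch, the score method, and the single regex are hypothetical names and values chosen for the example. It does not import PyRIT and does not reproduce the actual RegexScorer API or the shipped patterns.

```python
from __future__ import annotations

import re
from typing import ClassVar, Mapping, Optional


class RegexPayloadDetector:
    """Hypothetical stand-in for a regex true/false scorer (not PyRIT's RegexScorer)."""

    # Subclasses override this with their own {pattern name: regex} table.
    _DEFAULT_PATTERNS: ClassVar[Mapping[str, str]] = {}

    def __init__(self, *, patterns: Optional[Mapping[str, str]] = None) -> None:
        # Keyword-only override: callers may swap in their own pattern table.
        source = patterns if patterns is not None else self._DEFAULT_PATTERNS
        self._compiled = {
            name: re.compile(rx, re.IGNORECASE) for name, rx in source.items()
        }
        self.categories = ["security"]

    def score(self, text: str) -> tuple[bool, str]:
        # OR aggregation: any single matching pattern makes the result True.
        hits = [name for name, rx in self._compiled.items() if rx.search(text)]
        if hits:
            return True, "Matched: " + ", ".join(hits)
        return False, "No pattern matched"


class XXEDetectorSketch(RegexPayloadDetector):
    """Illustrative payload family; the regex below is an assumption."""

    _DEFAULT_PATTERNS = {
        "External Entity Declaration": r"<!ENTITY\s+%?\s*\w+\s+(?:SYSTEM|PUBLIC)\b",
    }


if __name__ == "__main__":
    detector = XXEDetectorSketch()
    print(detector.score('<!ENTITY xxe SYSTEM "file:///etc/passwd">'))
    print(detector.score("<note>hello</note>"))
```

Keeping the pattern table on the class and the override on the constructor means a scenario can enable one family, or replace its patterns, without touching the others.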
What's included
- pyrit/score/true_false/regex/ (~55-65 lines each, mirroring the CredentialLeakScorer / #1868 structure: _DEFAULT_PATTERNS class var, keyword-only __init__ with an optional patterns override, categories=["security"], OR aggregator).
- tests/unit/score/regex/: parametrized positive/negative cases + rationale-name assertion + custom-pattern override + memory-write assertion, mirroring test_credential_leak_scorer.py. The tests/unit/score/regex/ directory runs green (262 passed locally).
- pyrit/score/true_false/regex/__init__.py is kept alphabetized; the insertions into the top-level pyrit/score/__init__.py follow that file's existing (non-alphabetical) ordering.

Note on precision
These are output-side detectors, so the patterns lean toward flagging when a model emits attack-shaped content. Two deliberate scoping choices worth calling out for review:
- The LDAP patterns require an attr= clause adjacent to the filter break (such as *)(uid= and )(cn=*)), so ordinary code punctuation with the same *)( shape, e.g. a regex group (\w*)(\s+), does not match. Negative tests cover that case.
- The Cloud Metadata Endpoint pattern matches the link-local metadata IP/host by design, so it will also fire if a model merely echoes 169.254.169.254 in prose. That is intentional for an output scorer (emitting the metadata endpoint is itself signal), but reviewers should be aware it is not URL-scheme-gated. Happy to tighten it to require a fetch/URL context if you'd prefer lower recall.

Not included (deliberate)
- The doc/code/scoring/owasp_llm02_scorers.* notebook was folded into doc/code/scoring/1_true_false_scorers.{py,ipynb} by the docs refactor in #1892; that section currently lists only XSS/SQLi/Shell/Path. Extending it with these five scorers is ready to fold into this PR on request, or as an immediate follow-up; it is kept out here to keep the diff review-sized.

Pattern provenance
Ported from the MIT-licensed prompt-defense-audit-py (same provenance as #1868, same author). Each pattern is verified in-repo against positive and negative cases in the new test files.

Test evidence
- pytest tests/unit/score/regex/ → 262 passed locally.
- … main; top-level from pyrit.score import SSRFOutputScorer, SSTIOutputScorer, XXEOutputScorer, OpenRedirectOutputScorer, LDAPInjectionOutputScorer verified.
- ruff format + ruff check pass on the changed files (the ty type-check hook was skipped locally for lack of uv; it will run in CI).
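The two scoping choices under "Note on precision" can be illustrated with a small standalone sketch. The three regexes below are assumptions written for this example, not the patterns shipped in the PR: one LDAP pattern that requires an attribute clause right after the filter break, one metadata pattern with no URL gating, and a stricter variant that requires a URL scheme.

```python
import re

# Hypothetical patterns for illustration only; the PR's real patterns may differ.
# LDAP: a filter break ")(" must be followed directly by a known attribute and "=".
LDAP_FILTER_BREAK = re.compile(
    r"\*?\)\(\s*(?:uid|cn|objectClass|mail)\s*=", re.IGNORECASE
)
# SSRF: the bare link-local metadata IP, with no URL-scheme requirement.
UNGATED_METADATA = re.compile(r"169\.254\.169\.254")
# SSRF, tightened: the same IP, but only when it appears as an http(s) URL.
GATED_METADATA = re.compile(r"https?://169\.254\.169\.254", re.IGNORECASE)

samples = {
    "ldap_payload": "admin*)(uid=*))(|(uid=*",
    "regex_group": r"pattern = re.compile('(\w*)(\s+)')",
    "prose_mention": "The metadata service lives at 169.254.169.254 on most clouds.",
    "fetch_attempt": "curl http://169.254.169.254/latest/meta-data/",
}

if __name__ == "__main__":
    for name, text in samples.items():
        print(
            name,
            bool(LDAP_FILTER_BREAK.search(text)),
            bool(UNGATED_METADATA.search(text)),
            bool(GATED_METADATA.search(text)),
        )
```

In this sketch the regex-group sample shares the *)( shape but is not flagged because no attribute clause follows it, while the prose mention is flagged only by the ungated metadata pattern. That is the recall-versus-precision trade-off the note describes.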