fix(roe): close two target-extraction scope bypasses (URL userinfo + TLD/extension collision)#392
Closed
VoidChecksum wants to merge 3 commits into
Closed
fix(roe): close two target-extraction scope bypasses (URL userinfo + TLD/extension collision)#392VoidChecksum wants to merge 3 commits into
VoidChecksum wants to merge 3 commits into
Conversation
The RoE evaluator refuses out-of-scope hosts only for targets returned by `extract_targets()` (roe.py:205). Two extraction gaps let a host evade scope enforcement entirely: 1. URL userinfo. `_URL_RE` captured `[^\s/:]+` right after `://`, so `curl https://user:pass@evil.example/` captured `user` (dropped as invalid) and the real host `evil.example` was never evaluated. The regex now skips an optional `userinfo@` and captures the host, excluding `@`. 2. TLD/extension collision. `_NON_TARGET_EXTENSIONS` dropped any token whose last label matched a file extension — but `.sh .md .py .pl .pub .zip` are delegated DNS TLDs, so `curl https://evil.zip` was silently dropped. Those real-TLD entries are removed (file-only extensions are retained). The stale comment claiming no TLD collides is corrected. Both fixes err toward over-extraction, matching this module's documented "false positives are safer than false negatives" design. Adds regression tests for userinfo stripping (with port handling), TLD- colliding hosts, and confirms genuine local-file arguments stay excluded.
CodeQL py/incomplete-url-substring-sanitization flagged the 'host in extract_targets(...)' membership asserts. extract_targets returns a set; rewrite the asserts as explicit element equality (any/all) so the scanner no longer mistakes them for URL sanitization. Behaviour unchanged.
VoidChecksum
added a commit
that referenced
this pull request
May 30, 2026
Folds in the second RoE scope-bypass fix from #392: .sh/.md/.py/.pl/.pub/.zip are delegated DNS TLDs, so a host like evil.zip was silently dropped from scope enforcement by the file-extension denylist. Removes those entries (file-only extensions retained) and corrects the stale comment. Adds regression tests. Consolidates #392 into this PR.
Collaborator
Author
|
Closing as a duplicate. The two RoE scope-extraction fixes here (URL userinfo + real-TLD/extension collision) are now consolidated into #405, which uses a more robust RFC-3986 urlsplit-based host extraction and additionally normalizes packed-integer/hex IPv4 IMDS encodings. The TLD-collision fix + regression tests from this PR have been folded into #405. Merge #405 instead. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The RoE evaluator refuses out-of-scope hosts only for targets returned by
extract_targets()(middleware/roe.py:205). Two extraction gaps let a host evade scope enforcement entirely.1. URL userinfo bypass (security)
_URL_REcaptured[^\s/:]+immediately after://, socurl https://user:pass@evil.example/captureduser(dropped as invalid) and the real hostevil.examplewas never evaluated by RoE. The regex now skips an optionaluserinfo@segment and captures the host, excluding@.2. TLD / file-extension collision (security)
_NON_TARGET_EXTENSIONSdropped any token whose last label matched a file extension — but.sh .md .py .pl .pub .zipare delegated DNS TLDs, socurl https://evil.zipwas silently dropped from scope enforcement. Those real-TLD entries are removed (file-only extensions retained); the stale comment claiming no TLD collides is corrected with a security rationale to prevent regression.Both fixes err toward over-extraction, matching this module's documented "false positives are safer than false negatives" design — a spurious target is operator-overridable; a dropped real one is a silent scope breach.
Tests
New
tests/unit/middleware/test_command_targets_scope_bypass.py: userinfo stripping (with port handling), TLD-colliding hosts now extracted, and confirmation that genuine local-file arguments (key.pem,scan.txt, …) stay excluded. Full existing RoE suite passes (64 tests green locally).🤖 Generated with Claude Code