Skip to content

fix(roe): close two target-extraction scope bypasses (URL userinfo + TLD/extension collision)#392

Closed
VoidChecksum wants to merge 3 commits into
mainfrom
fix/roe-userinfo-and-tld-bypass
Closed

fix(roe): close two target-extraction scope bypasses (URL userinfo + TLD/extension collision)#392
VoidChecksum wants to merge 3 commits into
mainfrom
fix/roe-userinfo-and-tld-bypass

Conversation

@VoidChecksum
Copy link
Copy Markdown
Collaborator

Summary

The RoE evaluator refuses out-of-scope hosts only for targets returned by extract_targets() (middleware/roe.py:205). Two extraction gaps let a host evade scope enforcement entirely.

1. URL userinfo bypass (security)

_URL_RE captured [^\s/:]+ immediately after ://, so curl https://user:pass@evil.example/ captured user (dropped as invalid) and the real host evil.example was never evaluated by RoE. The regex now skips an optional userinfo@ segment and captures the host, excluding @.

2. TLD / file-extension collision (security)

_NON_TARGET_EXTENSIONS dropped any token whose last label matched a file extension — but .sh .md .py .pl .pub .zip are delegated DNS TLDs, so curl https://evil.zip was silently dropped from scope enforcement. Those real-TLD entries are removed (file-only extensions retained); the stale comment claiming no TLD collides is corrected with a security rationale to prevent regression.

Both fixes err toward over-extraction, matching this module's documented "false positives are safer than false negatives" design — a spurious target is operator-overridable; a dropped real one is a silent scope breach.

Tests

New tests/unit/middleware/test_command_targets_scope_bypass.py: userinfo stripping (with port handling), TLD-colliding hosts now extracted, and confirmation that genuine local-file arguments (key.pem, scan.txt, …) stay excluded. Full existing RoE suite passes (64 tests green locally).

🤖 Generated with Claude Code

The RoE evaluator refuses out-of-scope hosts only for targets returned by
`extract_targets()` (roe.py:205). Two extraction gaps let a host evade scope
enforcement entirely:

1. URL userinfo. `_URL_RE` captured `[^\s/:]+` right after `://`, so
   `curl https://user:pass@evil.example/` captured `user` (dropped as
   invalid) and the real host `evil.example` was never evaluated. The regex
   now skips an optional `userinfo@` and captures the host, excluding `@`.

2. TLD/extension collision. `_NON_TARGET_EXTENSIONS` dropped any token whose
   last label matched a file extension — but `.sh .md .py .pl .pub .zip` are
   delegated DNS TLDs, so `curl https://evil.zip` was silently dropped. Those
   real-TLD entries are removed (file-only extensions are retained). The stale
   comment claiming no TLD collides is corrected.

Both fixes err toward over-extraction, matching this module's documented
"false positives are safer than false negatives" design.

Adds regression tests for userinfo stripping (with port handling), TLD-
colliding hosts, and confirms genuine local-file arguments stay excluded.
CodeQL py/incomplete-url-substring-sanitization flagged the 'host in extract_targets(...)' membership asserts. extract_targets returns a set; rewrite the asserts as explicit element equality (any/all) so the scanner no longer mistakes them for URL sanitization. Behaviour unchanged.
VoidChecksum added a commit that referenced this pull request May 30, 2026
Folds in the second RoE scope-bypass fix from #392: .sh/.md/.py/.pl/.pub/.zip are delegated DNS TLDs, so a host like evil.zip was silently dropped from scope enforcement by the file-extension denylist. Removes those entries (file-only extensions retained) and corrects the stale comment. Adds regression tests. Consolidates #392 into this PR.
@VoidChecksum
Copy link
Copy Markdown
Collaborator Author

Closing as a duplicate. The two RoE scope-extraction fixes here (URL userinfo + real-TLD/extension collision) are now consolidated into #405, which uses a more robust RFC-3986 urlsplit-based host extraction and additionally normalizes packed-integer/hex IPv4 IMDS encodings. The TLD-collision fix + regression tests from this PR have been folded into #405. Merge #405 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants