fix(roe): close target-extraction scope bypasses (userinfo, IP-encoding, compound args)#405
Open
VoidChecksum wants to merge 4 commits into
Open
fix(roe): close target-extraction scope bypasses (userinfo, IP-encoding, compound args)#405VoidChecksum wants to merge 4 commits into
VoidChecksum wants to merge 4 commits into
Conversation
…ng, compound args) RoEEnforcementMiddleware gates offensive commands by the targets that extract_targets() returns; an empty or incomplete extraction means NO scope check at all (fail-open). Three extraction gaps let a command reach an out-of-scope or forbidden host while the gate saw nothing or an in-scope decoy: - URL userinfo confusion: the old `://([^\s/:]+)` slice captured the authority up to the first `:`/`/`, i.e. the userinfo, not the host. `curl http://in-scope.acme.com@evil.com/` extracted the in-scope decoy (or nothing) while curl connects to evil.com. The authority is now parsed RFC-3986-correctly (urlsplit .hostname), so userinfo and port are dropped and the real connect host is evaluated. - Encoded-IP IMDS bypass: `http://2852039166/`, `http://0xa9fea9fe/` and `http://[fd00:ec2::254]/` never matched the dotted-quad / forbidden rules. Packed integer/hex hosts are normalized to dotted-quad and IPv6 literals de-bracketed, so cloud-metadata (169.254.169.254) and other forbidden destinations can no longer be reached via an alternate encoding. - Compound option mis-extraction: `--resolve host:port:ip` was emitted as a single junk `host:port:ip` target; it is now split so each piece (the real IP) is validated independently. Adds tests/unit/middleware/test_command_targets_scope_bypass.py (11 cases) covering each bypass plus regressions (plain URL, explicit port, IPv4, CIDR, and a small integer NOT mangled into an IP). Scoped to middleware/_command_targets.py only; the core matcher (decepticon_core/types/roe.py) is untouched, so this does not collide with #374 (FQDN trailing-dot normalization). Note: #383 adds a test_command_targets.py asserting current extraction output -- merge it after this PR and re-baseline those assertions against the corrected host extraction. Fast-lane: 1717 passed, 26 skipped. basedpyright 0 errors. ruff clean.
…tion false positive The userinfo-decoy cases asserted `"evil.com" in targets`, which CodeQL's py/incomplete-url-substring-sanitization flags as a HIGH alert (it cannot tell `targets` is a set, not a URL). Asserting exact set equality is both stronger (proves the in-scope decoy is absent) and free of the flagged pattern.
Folds in the second RoE scope-bypass fix from #392: .sh/.md/.py/.pl/.pub/.zip are delegated DNS TLDs, so a host like evil.zip was silently dropped from scope enforcement by the file-extension denylist. Removes those entries (file-only extensions retained) and corrects the stale comment. Adds regression tests. Consolidates #392 into this PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Closes three target-extraction gaps in
RoEEnforcementMiddlewarethat let an offensive command reach an out-of-scope or forbidden host while the RoE gate saw nothing (fail-open) or an in-scope decoy. Extraction is the sole input to the scope evaluator, so each gap is a scope bypass, not a cosmetic miss.curl http://in-scope.acme.com@evil.com/evil.com(real host)curl http://2852039166/·http://0xa9fea9fe/·http://[fd00:ec2::254]/169.254.169.254/ de-bracketed IPv6curl --resolve host:80:169.254.169.254 …host:80:169.254.169.254targetThe authority is now parsed RFC-3986-correctly (
urlsplit().hostname) so userinfo + port are dropped; packed integer/hex hosts are canonicalized to dotted-quad; compoundhost:port:iptokens are split.Tests
tests/unit/middleware/test_command_targets_scope_bypass.py— 11 cases: each bypass + regressions (plain URL, explicit port, IPv4, CIDR, and a small integer that must not be mangled into an IP).Scope / merge notes
middleware/_command_targets.py; the core matcher (decepticon_core/types/roe.py) is untouched → no collision with fix(roe): normalize FQDN trailing dot so it can't bypass scope deny checks #374 (FQDN trailing-dot).test_command_targets.pyasserting the current extraction output. Merge it after this PR and re-baseline those assertions against the corrected host extraction.Verification (local, Python 3.13, same uv.lock as CI)
ruff check✅ ·ruff format --check✅basedpyright✅ 0 errors (152 pre-existing warnings, unchanged)pytest -n auto -m "not slow"✅ 1717 passed, 26 skipped