feat(scan): add opt-in transitive reference scanning#225
Conversation
rng1995
left a comment
There was a problem hiding this comment.
Requesting changes. Transitive prefix controls can be escaped through non-normalized URL paths, traversal has no fan-out budget, and shared visited state causes sibling skills to omit shared dependency findings. The merged report also misreports dependency coverage, several report tests are no longer collected, and the branch conflicts with current main. Please address the inline blockers and rebase while preserving current CLI, input-handler, and reporting behavior.
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
47dfb07 to
4fb451c
Compare
|
Addressed the blockers on the transitive traversal path rather than patching each edge case in isolation. The branch now normalizes target identities before allow or deny checks, applies an explicit traversal budget, reuses cached child results per referring root instead of suppressing sibling findings behind one shared visited set, keeps dependency coverage source-aware in reporting, restores the report tests that had fallen out of pytest collection, and replays the whole change onto current |
Summary
Adds opt-in transitive scanning for source references inside scanned skill content, including normalized trust-prefix checks, bounded traversal budgets, shared dependency result reuse across sibling roots, and source-aware reporting.
Closes #97
Root cause
The current scan pipeline resolves one input into one local directory, builds context from that tree, and writes the graph result from that single invocation. Recursive mode only repeats that one-hop scan for immediate local child skill directories, and report emitters assume every finding belongs to the directly requested source. The original transitive branch added follow-up scanning on top of that model, but it still split normalization, caching, and merged coverage across separate helpers, which left path-trust escapes, unbounded fan-out, sibling result suppression, and path-only report merging behind.
Diff Notes
file_cache, filters out adjacent non-source URLs, canonicalizes source identities with dot-segment and unreserved percent-decoding normalization, enforces allow or deny prefix checks on normalized path boundaries, and owns per-root visited-state mutation.--transitive,--transitive-depth, and repeated allow or deny prefix controls toskillspector scan, then route both single-skill and recursive multi-skill entrypoints through a cached traversal helper.InputHandlerfor approved external targets so existing host allowlists, SSRF checks, clone or download handling, and archive protections stay authoritative, including archive URLs on allowed Git hosts.transitive_depthandsource_urltoFinding, preserve the root cleanup path through both the merged and zero-depth transitive paths, keep baseline fingerprints scoped by provenance, and merge component coverage with source-aware keys so same-named dependency files do not collapse or appear skipped.main.Scope
This change stays on source types already supported by
InputHandler, normalized trust-prefix checks, traversal depth plus hard fan-out limits, allow or deny prefix controls, cache reuse across recursive siblings, and provenance in reports. It does not add a web crawler, new allowed hosts, MCP behavior, or any default behavior change when--transitiveis absent.Verification
python -m pytest tests/unit/test_transitive.py tests/unit/test_cli.py tests/nodes/test_report.py tests/nodes/test_sarif_rules_and_empty_findings.py -q- pass,110 passeduv run ruff check src/ tests/- passuv run ruff format --check src/ tests/- not clean locally; currently reportssrc/skillspector/nodes/analyzers/mcp_least_privilege.pyLint & Test (Python 3.12),Lint & Test (Python 3.13), andDCO Check- pending maintainer approval and rerun after push