fix(check): collapse duplicate manual and OSV findings per package version#137
Merged
Merged
Conversation
…rsion
When a refreshed OSV snapshot catches up to a hand-curated manual
advisory, the matcher returns BOTH records for the same (ecosystem,
name, version, path) tuple. The check output layer was rendering
each record as a separate Finding, so a single real-world exposure
showed up twice in `aguara check --fresh` once OSV ingested a tuple
already in the manual snapshot.
Fix lives at the output boundary, not in the matcher. The matcher
keeps returning every record so correlation consumers can still
see the full set; check/runner output collapses to one Finding/Hit
per exposure. EmbeddedSnapshots() returns the manual snapshot first
and the OSV snapshot second, so hits[0] picks the curated advisory
ID (e.g. SOCKET-*) over the OSV one (MAL-*) when both are present.
The user-facing advisory token stays stable across `aguara check`
and `aguara check --fresh`.
Sites touched:
- internal/incident/npm.go installed-tree npm path
- internal/incident/checker.go PyPI site-packages + dist-info
cache scan
- internal/packagecheck/runner.go multi-ecosystem lockfile path,
including the version-alias loop
(Composer v-prefix etc.) which
also breaks on first match
Tests:
- TestCheckNPM_CollapsesManualAndOSVDuplicate: synthetic manual
+ OSV snapshots covering @antv/g2 5.6.8; asserts exactly one
Finding, manual advisory ID wins.
- TestRunnerCollapsesMultipleIntelRecordsPerPackageRef: same
scenario through the packagecheck runner against the existing
pnpm-mini-shai-hulud-antv fixture; asserts one Hit, manual
record wins.
- Existing TestMatcherDistinctIDsAtSameTupleStaySeparate stays
green: the matcher continues to return both records.
- Existing TestRunner_ComposerAliasDoesNotDoubleCountFinding
stays green: alias-loop dedup still works at the new layer.
JSON shape unchanged. KnownCompromised unchanged. OSV importer
unchanged. intel.Matcher unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a refreshed OSV snapshot catches up to a hand-curated manual advisory, the matcher correctly returns both records for the same
(ecosystem, name, version, path)tuple. The check output layer was rendering each record as a separate Finding, so a single real-world exposure showed up twice inaguara check --freshonce OSV ingested a tuple already in the manual snapshot.Where the fix lives
At the output boundary, not in the matcher.
intel.Match) can still see the full set.(ecosystem, name, version, path)tuple.EmbeddedSnapshots()returns the manual snapshot first and OSV second, sohits[0]picks the curated advisory ID (e.g.SOCKET-*) over the OSV one (e.g.MAL-*) when both are present. The user-facing advisory token stays stable acrossaguara checkandaguara check --fresh.Sites touched
internal/incident/npm.gointernal/incident/checker.gointernal/incident/checker.goseen[path], now also short-circuits on first hit)internal/packagecheck/runner.goTests
TestCheckNPM_CollapsesManualAndOSVDuplicate: synthetic manual + OSV snapshots covering the same package version. Asserts exactly one Finding and that the manual advisory ID wins the title.TestRunnerCollapsesMultipleIntelRecordsPerPackageRef: the same scenario routed through the packagecheck runner against the existingpnpm-mini-shai-hulud-antvfixture. Asserts one Hit and that the manual record wins.TestMatcherDistinctIDsAtSameTupleStaySeparatestays green: the matcher continues to return both records.TestRunner_ComposerAliasDoesNotDoubleCountFindingstays green: the version-alias dedup still works at the new layer.Scope and what does not change
KnownCompromisedunchangedintel.MatcherunchangedEnd-to-end
Before, with manual intel + OSV both covering an affected tuple:
After:
Test plan
go test -race -count=1 ./...cleango vet ./...cleangolangci-lint run ./...0 issuesTestMatcherDistinctIDsAtSameTupleStaySeparatestill green (correct invariant preserved)ghcr.io/garagon/aguara:0.18.1confirms 4 -> 2 findings on the--freshpath