fix(check): collapse duplicate manual and OSV findings per package version #137

Merged
garagon merged 1 commit into main from fix/check-collapse-duplicate-findings
May 19, 2026

@garagon (Owner) commented May 19, 2026

Summary

When a refreshed OSV snapshot catches up to a hand-curated manual advisory, the matcher correctly returns both records for the same (ecosystem, name, version, path) tuple. The check output layer was rendering each record as a separate Finding, so a single real-world exposure showed up twice in aguara check --fresh once OSV ingested a tuple already in the manual snapshot.

Where the fix lives

At the output boundary, not in the matcher.

  • The matcher keeps returning every record so correlation consumers (existing analyzers and any future tools that iterate intel.Match) can still see the full set.
  • The check / runner output layers collapse to one Finding (or Hit) per (ecosystem, name, version, path) tuple.

EmbeddedSnapshots() returns the manual snapshot first and OSV second, so hits[0] picks the curated advisory ID (e.g. SOCKET-*) over the OSV one (e.g. MAL-*) when both are present. The user-facing advisory token stays stable across aguara check and aguara check --fresh.
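The first-record-wins collapse described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Record` type, the `tupleKey` type, the `collapseFirstWins` helper, and the advisory IDs are all hypothetical stand-ins, and the sketch only assumes that hits arrive in snapshot order (manual first, OSV second).

```go
package main

import "fmt"

// Record is a hypothetical stand-in for one intel match. The real
// aguara types are not shown in this PR description.
type Record struct {
	Ecosystem, Name, Version, Path string
	AdvisoryID                     string
}

// tupleKey identifies a single real-world exposure.
type tupleKey struct {
	Ecosystem, Name, Version, Path string
}

// collapseFirstWins keeps the first record seen for each
// (ecosystem, name, version, path) tuple and preserves input order.
// If manual records precede OSV records, the manual advisory ID wins.
func collapseFirstWins(hits []Record) []Record {
	seen := make(map[tupleKey]struct{}, len(hits))
	out := make([]Record, 0, len(hits))
	for _, h := range hits {
		k := tupleKey{h.Ecosystem, h.Name, h.Version, h.Path}
		if _, dup := seen[k]; dup {
			continue // a later record for the same exposure is dropped
		}
		seen[k] = struct{}{}
		out = append(out, h)
	}
	return out
}

func main() {
	// Assumed ordering: manual snapshot records first, OSV records second.
	// Package names and advisory IDs are placeholders.
	hits := []Record{
		{"npm", "pkg-a", "1.0.0", "node_modules/pkg-a", "SOCKET-EXAMPLE-A"},
		{"npm", "pkg-b", "2.0.0", "node_modules/pkg-b", "SOCKET-EXAMPLE-B"},
		{"npm", "pkg-a", "1.0.0", "node_modules/pkg-a", "MAL-EXAMPLE-A"},
		{"npm", "pkg-b", "2.0.0", "node_modules/pkg-b", "MAL-EXAMPLE-B"},
	}
	findings := collapseFirstWins(hits)
	fmt.Println("findings_count:", len(findings))
	for _, f := range findings {
		fmt.Printf("%s@%s -> %s\n", f.Name, f.Version, f.AdvisoryID)
	}
}
```

Because the collapse runs on a copy of the match results, a caller that needs every record can still iterate the uncollapsed slice.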

Sites touched

| File | Path | Behaviour after |
| --- | --- | --- |
| internal/incident/npm.go | installed-tree npm | one Finding per package |
| internal/incident/checker.go | PyPI site-packages | one Finding per package |
| internal/incident/checker.go | PyPI dist-info cache scan | one Finding per cache path (already capped by seen[path], now also short-circuits on first hit) |
| internal/packagecheck/runner.go | multi-ecosystem lockfile runner | one Hit per ref; version-alias loop now breaks on first match |
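For the runner row, a version-alias loop that stops at the first matching alias might look like the sketch below. Everything here is an assumption made for illustration: the `versionAliases` and `firstAdvisory` helpers, the map-based intel lookup, and the single "strip a leading v" alias rule are hypothetical and are not taken from `runner.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// versionAliases returns the version as written plus, if it starts
// with "v", the same version without the prefix. Hypothetical rule.
func versionAliases(version string) []string {
	aliases := []string{version}
	if trimmed := strings.TrimPrefix(version, "v"); trimmed != version {
		aliases = append(aliases, trimmed)
	}
	return aliases
}

// firstAdvisory returns the first advisory ID found for any alias of
// the given package version. Returning on the first match means one
// package ref produces at most one hit, even when several aliases
// (or several records per alias) match.
func firstAdvisory(intel map[string][]string, name, version string) (string, bool) {
	for _, alias := range versionAliases(version) {
		if ids := intel[name+"@"+alias]; len(ids) > 0 {
			return ids[0], true
		}
	}
	return "", false
}

func main() {
	// Hypothetical intel keyed by "name@version"; IDs are placeholders.
	intel := map[string][]string{
		"vendor/pkg@v1.2.3": {"SOCKET-EXAMPLE", "MAL-EXAMPLE"},
		"vendor/pkg@1.2.3":  {"MAL-EXAMPLE"},
	}
	if id, ok := firstAdvisory(intel, "vendor/pkg", "v1.2.3"); ok {
		fmt.Println("one hit:", id)
	}
}
```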

Tests

  • TestCheckNPM_CollapsesManualAndOSVDuplicate: synthetic manual + OSV snapshots covering the same package version. Asserts exactly one Finding and that the manual advisory ID wins the title.
  • TestRunnerCollapsesMultipleIntelRecordsPerPackageRef: the same scenario routed through the packagecheck runner against the existing pnpm-mini-shai-hulud-antv fixture. Asserts one Hit and that the manual record wins.
  • TestMatcherDistinctIDsAtSameTupleStaySeparate stays green: the matcher continues to return both records.
  • TestRunner_ComposerAliasDoesNotDoubleCountFinding stays green: the version-alias dedup still works at the new layer.

Scope and what does not change

  • JSON schema unchanged
  • KnownCompromised unchanged
  • OSV importer unchanged
  • intel.Matcher unchanged

End-to-end

Before, with manual intel + OSV both covering an affected tuple:

aguara check /repo --ecosystem npm --fresh --format json
findings_count: 4
package A -> SOCKET-* and MAL-*  (2 findings)
package B -> SOCKET-* and MAL-*  (2 findings)

After:

findings_count: 2
package A -> SOCKET-*
package B -> SOCKET-*

Test plan

  • go test -race -count=1 ./... clean
  • go vet ./... clean
  • golangci-lint run ./... 0 issues
  • TestMatcherDistinctIDsAtSameTupleStaySeparate still green (correct invariant preserved)
  • Docker E2E against ghcr.io/garagon/aguara:0.18.1 confirms 4 -> 2 findings on the --fresh path
  • CI green on this PR

…rsion

When a refreshed OSV snapshot catches up to a hand-curated manual
advisory, the matcher returns BOTH records for the same (ecosystem,
name, version, path) tuple. The check output layer was rendering
each record as a separate Finding, so a single real-world exposure
showed up twice in `aguara check --fresh` once OSV ingested a tuple
already in the manual snapshot.

Fix lives at the output boundary, not in the matcher. The matcher
keeps returning every record so correlation consumers can still
see the full set; check/runner output collapses to one Finding/Hit
per exposure. EmbeddedSnapshots() returns the manual snapshot first
and the OSV snapshot second, so hits[0] picks the curated advisory
ID (e.g. SOCKET-*) over the OSV one (MAL-*) when both are present.
The user-facing advisory token stays stable across `aguara check`
and `aguara check --fresh`.

Sites touched:
  - internal/incident/npm.go        installed-tree npm path
  - internal/incident/checker.go    PyPI site-packages + dist-info
                                    cache scan
  - internal/packagecheck/runner.go multi-ecosystem lockfile path,
                                    including the version-alias loop
                                    (Composer v-prefix etc.) which
                                    also breaks on first match

Tests:
  - TestCheckNPM_CollapsesManualAndOSVDuplicate: synthetic manual
    + OSV snapshots covering @antv/g2 5.6.8; asserts exactly one
    Finding, manual advisory ID wins.
  - TestRunnerCollapsesMultipleIntelRecordsPerPackageRef: same
    scenario through the packagecheck runner against the existing
    pnpm-mini-shai-hulud-antv fixture; asserts one Hit, manual
    record wins.
  - Existing TestMatcherDistinctIDsAtSameTupleStaySeparate stays
    green: the matcher continues to return both records.
  - Existing TestRunner_ComposerAliasDoesNotDoubleCountFinding
    stays green: alias-loop dedup still works at the new layer.

JSON shape unchanged. KnownCompromised unchanged. OSV importer
unchanged. intel.Matcher unchanged.
@garagon garagon merged commit b8d2dc3 into main May 19, 2026
1 check passed
@garagon garagon deleted the fix/check-collapse-duplicate-findings branch May 19, 2026 14:58