
Reduce Aguara scanner noise: filter, deduplicate, and weight findings #17

@rbodkin


Context

From scanning analysis on the PR #13 branch. Aguara is currently the only scanner producing significant findings (188 across 25 skills), but the signal-to-noise ratio is poor:

  • 60% are LOW severity, mostly pattern matches in example code and tutorial content
  • Top categories by volume: THIRDPARTY_009 (HTTP URLs in test strings, 30 hits), THIRDPARTY_008 (CDN scripts in example HTML, 30 hits), CRED_021 (.env references in Angular tutorials, 16 hits)
  • A single skill (react-best-practices) has 56 findings because it contains extensive example code
  • Genuinely concerning findings (PROMPT_INJECTION_011, NLP_HIDDEN_INSTRUCTION, MCP_005) are a small minority (11 of 188)

Proposed improvements

  1. Filter to HIGH+ for ranking: Only use HIGH/CRITICAL findings when computing risk scores. LOW/MEDIUM are informational noise for most skills.
  2. Deduplicate same rule + same evidence: Aguara fires the same rule multiple times on the same pattern (e.g. spawnSync( matched 8 times in one repo).
  3. Weight by file context: Findings in .md, example/test directories, or non-executable files should be weighted lower than findings in actual source code.
  4. Per-skill finding cap: Consider capping findings per skill (e.g. top 10 by severity) to prevent monorepos from dominating aggregate stats.
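The four proposals can be combined into one scoring pass. Below is a minimal sketch, assuming a hypothetical `Finding` record with `skill`, `rule_id`, `severity`, `file_path` and `evidence` fields (Aguara's real output schema may differ). The severity points, the set of "non-executable" suffixes and directories, and the 0.25 context discount are all illustrative assumptions, not values from Aguara or this repo.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; Aguara's real output schema may differ."""
    skill: str
    rule_id: str
    severity: str  # "LOW", "MEDIUM", "HIGH" or "CRITICAL"
    file_path: str
    evidence: str


# Illustrative assumptions only, not values taken from Aguara.
SEVERITY_POINTS = {"LOW": 0.0, "MEDIUM": 0.0, "HIGH": 5.0, "CRITICAL": 10.0}
DOC_SUFFIXES = {".md", ".mdx", ".txt", ".html"}
LOW_TRUST_DIRS = {"example", "examples", "test", "tests", "docs"}
CONTEXT_DISCOUNT = 0.25


def context_weight(path: str) -> float:
    """Proposal 3: down-weight docs, examples, tests and non-executable files."""
    p = PurePosixPath(path)
    in_low_trust_dir = any(part.lower() in LOW_TRUST_DIRS for part in p.parts[:-1])
    if p.suffix.lower() in DOC_SUFFIXES or in_low_trust_dir:
        return CONTEXT_DISCOUNT
    return 1.0


def weighted_points(f: Finding) -> float:
    """Severity points scaled by where the finding was located."""
    return SEVERITY_POINTS[f.severity] * context_weight(f.file_path)


def risk_scores(findings: list[Finding], cap: int = 10) -> dict[str, float]:
    # Proposal 1: only HIGH/CRITICAL findings contribute to ranking.
    ranked = [f for f in findings if f.severity in ("HIGH", "CRITICAL")]

    # Proposal 2: collapse repeats of the same rule + evidence within a skill,
    # keeping the occurrence with the highest weighted score.
    best: dict[tuple[str, str, str], Finding] = {}
    for f in ranked:
        key = (f.skill, f.rule_id, f.evidence.strip())
        if key not in best or weighted_points(f) > weighted_points(best[key]):
            best[key] = f

    # Proposal 4: per-skill cap, summing only the top `cap` weighted findings.
    per_skill: dict[str, list[float]] = defaultdict(list)
    for f in best.values():
        per_skill[f.skill].append(weighted_points(f))
    return {
        skill: sum(sorted(points, reverse=True)[:cap])
        for skill, points in per_skill.items()
    }


# Example with hypothetical skills and rule IDs.
example = [
    Finding("skill-b", "RULE_C", "LOW", "docs/setup.md", ".env"),
    Finding("skill-a", "RULE_A", "HIGH", "src/server.ts", "spawnSync("),
    Finding("skill-a", "RULE_A", "HIGH", "examples/demo.ts", "spawnSync("),
    Finding("skill-a", "RULE_B", "CRITICAL", "SKILL.md", "hidden instruction"),
]
print(risk_scores(example))  # {'skill-a': 7.5}
```

In the example, skill-b's only finding is LOW and drops out entirely, the duplicate `spawnSync(` hit in `examples/` collapses into the source-code hit (5.0), and the CRITICAL hit in a `.md` file is discounted to 2.5. The cap here ranks by weighted points rather than by raw severity alone, which is a deliberate variation on "top 10 by severity" so that file context also influences which findings survive the cap.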

Data

Severity   Count   Share
LOW          113     60%
MEDIUM        47     25%
HIGH          24     13%
CRITICAL       4      2%
Total        188    100%
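As a quick arithmetic check on the table, and on how much proposal 1 would remove, the shares can be recomputed from the counts. This sketch uses only the counts listed above; `severity_shares` is an illustrative helper name.

```python
COUNTS = {"LOW": 113, "MEDIUM": 47, "HIGH": 24, "CRITICAL": 4}


def severity_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all findings per severity, rounded to one decimal."""
    total = sum(counts.values())
    return {sev: round(100 * n / total, 1) for sev, n in counts.items()}


total = sum(COUNTS.values())
high_plus = COUNTS["HIGH"] + COUNTS["CRITICAL"]
print(severity_shares(COUNTS))  # {'LOW': 60.1, 'MEDIUM': 25.0, 'HIGH': 12.8, 'CRITICAL': 2.1}
print(f"HIGH+ kept: {high_plus} of {total}")  # HIGH+ kept: 28 of 188
```

So filtering to HIGH+ for ranking would keep 28 of the 188 findings (about 15%) before any deduplication or context weighting.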

Metadata

Labels: enhancement (New feature or request)