bugbot: add recent-commit cross-reference to dedup layer #8

@kofort9

Problem

Bugbot's dedup logic (bugbot/src/dedup.ts) has three layers:

  1. Fingerprint cache — previous bugbot scan results
  2. GH issue search — open issues mentioning the same file + title similarity
  3. GH PR search — open PRs touching the same file

Missing: a cross-reference against recent git commits. If a bug was fixed in a recent commit on origin/main before bugbot runs, bugbot still flags it and files a GH issue, and nightshift creates a worktree, all for already-fixed code.

Evidence

On 2026-03-11, bugbot filed 6 issues that were already moot: 4 for bugs fixed 8-9 days earlier, 1 false positive, and 1 for a function that was already tested:

| Issue | Filed | Already Fixed | Fix Commit |
| --- | --- | --- | --- |
| #215 (dead getVariablesByForm) | Mar 11 | Mar 2 | e4e4e73 |
| #216 (dead resolveCourtName) | Mar 11 | Mar 2 (false positive: function is used) | n/a |
| #221 (formatNumber 999K boundary) | Mar 11 | Mar 2 | fb47934 |
| #222 (DEFAULT_CSV_PATH relative) | Mar 11 | Mar 2 | e4e4e73 |
| #223 (absoluteUrl empty string) | Mar 11 | Mar 3 | 7a10081 |
| #233 (untested parseBmfRecord) | Mar 11 | already tested | n/a |

All 6 resulted in empty nightshift worktrees (0 commits ahead of main) that sat around for 2+ weeks.

Proposed Fix

Add a Layer 0 in deduplicateFindings() that checks whether the finding's target code was modified in recent commits:

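import { execFileSync } from "node:child_process";

// Layer 0: Check if the flagged code was recently changed on origin/main.
// If the function/pattern was touched in the last N commits, skip it:
// the fix may already be on main.
function wasRecentlyModified(scanRoot: string, file: string, pattern: string, lookback = 50): boolean {
  // --max-count only caps how many matching commits are printed; it does not
  // limit how far back -S searches. Bound the search with a revision range.
  const range = `origin/main~${lookback}..origin/main`;
  try {
    const log = execFileSync("git", [
      "log", "--oneline", "-S", pattern, range, "--", file
    ], { cwd: scanRoot }).toString().trim();
    return log.length > 0;
  } catch {
    // git failed (e.g. history shorter than lookback, or origin/main not
    // fetched): do not skip the finding.
    return false;
  }
}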

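The -S flag (pickaxe search) selects commits in which the number of occurrences of the string changed, i.e. commits that added or removed it. If the flagged function/pattern shows up this way in recent commits, the bug is likely already addressed. An edit that leaves the occurrence count unchanged (for example, a change inside a function body that does not touch its name) is not matched by -S.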
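
One way Layer 0 could slot in ahead of the existing layers is sketched below. The `Finding` shape, `applyLayer0`, and the injected `isRecentlyModified` predicate are hypothetical names for illustration; the actual deduplicateFindings() signature in bugbot/src/dedup.ts may differ.

```typescript
// Hypothetical finding shape; the real type in bugbot/src/dedup.ts may differ.
interface Finding {
  file: string;
  pattern: string; // identifier or literal the finding is about
  title: string;
}

// Layer 0: partition findings into those to keep and those skipped because
// the flagged code changed recently on main. The predicate has the same
// meaning as wasRecentlyModified(scanRoot, file, pattern) and is injected so
// the filter can be exercised without a git repository.
function applyLayer0(
  findings: Finding[],
  isRecentlyModified: (file: string, pattern: string) => boolean,
): { kept: Finding[]; skipped: Finding[] } {
  const kept: Finding[] = [];
  const skipped: Finding[] = [];
  for (const f of findings) {
    (isRecentlyModified(f.file, f.pattern) ? skipped : kept).push(f);
  }
  return { kept, skipped };
}
```

Returning the skipped findings instead of dropping them silently would let bugbot log what Layer 0 suppressed, which helps when auditing the false-negative risk discussed under Trade-offs.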

Trade-offs

  • False negatives: A commit that touches the function but doesn't fix the bug would cause bugbot to skip it. Mitigated by keeping lookback small (50 commits ~ 2 weeks of active dev).
  • Performance: One git log -S per finding. Should be fast on local repos. Can batch by file.
  • Scope: Only useful for categories that flag specific code patterns (dead code, untested functions). Categories like "missing test coverage" need different dedup.
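
The performance and scope points above could be handled roughly as follows. This is a sketch under assumptions: it memoizes per (file, pattern) pair instead of truly batching by file, and the category names in `LAYER0_CATEGORIES` are illustrative, not bugbot's actual identifiers.

```typescript
// Categories for which a recent-commit check is assumed to be meaningful.
// Illustrative names only; bugbot's real category identifiers may differ.
const LAYER0_CATEGORIES = new Set(["dead-code", "untested-function"]);

// Wrap a (file, pattern) check so each distinct pair runs git at most once
// per scan, however many findings share it.
function memoizeCheck(
  check: (file: string, pattern: string) => boolean,
): (file: string, pattern: string) => boolean {
  const cache = new Map<string, boolean>();
  return (file, pattern) => {
    const key = `${file}\u0000${pattern}`; // NUL cannot appear in a file path
    let hit = cache.get(key);
    if (hit === undefined) {
      hit = check(file, pattern);
      cache.set(key, hit);
    }
    return hit;
  };
}

// Only consult Layer 0 for categories that point at a concrete code pattern.
function shouldRunLayer0(category: string): boolean {
  return LAYER0_CATEGORIES.has(category);
}
```

Memoization is the simpler option when several findings reference the same function; batching all patterns for one file into a single git invocation would save more process spawns but needs output parsing to attribute matches to patterns.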

Impact

Prevents phantom issues, prevents phantom worktrees, reduces nightshift noise and worktree sprawl.

Labels: enhancement