Research: semantic change chunking for PR review #313

@sentry-junior

PR review currently processes diffs per hunk — the raw unified-diff chunks produced by git. Hunks are syntactic boundaries, not semantic ones, so a single logical change (e.g. renaming a parameter, moving a function, introducing a new abstraction) can span several hunks across files while unrelated changes can land in the same hunk.
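To make the per-hunk unit concrete, here is a minimal sketch of splitting `git diff` output into hunks. It is not Warden's actual parser; the `Hunk` class, `split_hunks`, and the sample diff are hypothetical names and data invented for illustration, and the code assumes each file section starts with a `diff --git` line. The sample is a single parameter rename that ends up as three separate hunks across two files.

```python
import re
from dataclasses import dataclass, field

# Matches unified-diff hunk headers such as "@@ -10,2 +10,2 @@" or "@@ -4 +4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:  # hypothetical container, not a Warden type
    path: str
    new_start: int
    lines: list[str] = field(default_factory=list)


def split_hunks(diff_text: str) -> list[Hunk]:
    """Split `git diff` output into one Hunk per '@@' header."""
    hunks: list[Hunk] = []
    path = ""
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current = None  # leaving the previous file's hunks
        elif current is None and line.startswith("+++ "):
            path = line[4:].removeprefix("b/")
        elif (m := HUNK_HEADER.match(line)):
            current = Hunk(path, int(m.group(3)))
            hunks.append(current)
        elif current is not None:
            current.lines.append(line)
    return hunks


# Invented sample: one logical change (timeout -> timeout_s), three hunks.
SAMPLE_DIFF = """\
diff --git a/client.py b/client.py
--- a/client.py
+++ b/client.py
@@ -1,2 +1,2 @@
-def fetch(url, timeout):
+def fetch(url, timeout_s):
     return get(url)
@@ -10 +10 @@
-    fetch(u, timeout=5)
+    fetch(u, timeout_s=5)
diff --git a/cli.py b/cli.py
--- a/cli.py
+++ b/cli.py
@@ -4 +4 @@
-fetch(arg, timeout=1)
+fetch(arg, timeout_s=1)
"""

hunks = split_hunks(SAMPLE_DIFF)
```

Reviewing each of these three hunks in isolation hides the fact that they are one rename, which is the limitation this issue is about.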

Grouping review context by semantic change instead of per-hunk could improve intent recognition and produce more useful review feedback.

Research areas

  • Semantic chunking strategies — how to cluster diff hunks into logical change units (AST-level analysis, symbol-based grouping, commit-message/description cross-referencing, embedding similarity)
  • Tradeoffs vs. per-hunk — cases where semantic grouping helps (renames, refactors, multi-file features) and where it might hurt (unrelated co-located edits, large PRs)
  • Implementation approaches — lightweight heuristics (e.g. shared symbol references) vs. heavier analysis (tree-sitter / AST diffing, LLM-based clustering)
  • Prior art — existing tools or papers on semantic diff grouping (e.g. GumTree, Semantic Diff, difftastic for structural diffs)
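As a starting point for the "lightweight heuristics" option, the following is a rough sketch of symbol-based grouping: hunks that change the same identifier are merged into one change unit with a union-find. Everything here is an assumption for illustration (the `changed_symbols` and `group_hunks` names, the regex-based identifier extraction, the Python keyword filter, and the sample hunks); a real implementation would more likely rely on tree-sitter or similar for symbol extraction.

```python
import keyword
import re
from collections import defaultdict

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def changed_symbols(lines: list[str]) -> set[str]:
    """Identifiers present on only one side (removed XOR added) of a hunk."""
    removed, added = set(), set()
    for line in lines:
        if line.startswith("-"):
            removed.update(IDENT.findall(line[1:]))
        elif line.startswith("+"):
            added.update(IDENT.findall(line[1:]))
    # Assumes Python sources: drop keywords so pure additions do not all link up.
    return (removed ^ added) - set(keyword.kwlist)


def group_hunks(hunks: dict[str, list[str]]) -> list[list[str]]:
    """Cluster hunk ids that share at least one changed symbol."""
    parent = {hid: hid for hid in hunks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: dict[str, str] = {}
    for hid, lines in hunks.items():
        for sym in changed_symbols(lines):
            if sym in first_seen:
                ra, rb = find(first_seen[sym]), find(hid)
                if ra != rb:
                    parent[rb] = ra
            else:
                first_seen[sym] = hid

    groups: dict[str, list[str]] = defaultdict(list)
    for hid in hunks:
        groups[find(hid)].append(hid)
    return list(groups.values())


# Invented sample: a rename spread over three hunks plus an unrelated edit.
sample = {
    "client.py:1": ["-def fetch(url, timeout):", "+def fetch(url, timeout_s):"],
    "client.py:10": ["-    fetch(u, timeout=5)", "+    fetch(u, timeout_s=5)"],
    "cli.py:4": ["-fetch(arg, timeout=1)", "+fetch(arg, timeout_s=1)"],
    "setup.py:3": ['-VERSION = "1.0"', '+VERSION = "1.1"'],
}
groups = group_hunks(sample)
```

On this sample the three rename hunks land in one group and the version bump stays on its own. The tradeoff noted above still applies: a very common identifier (for example a logger or a widely used helper) would merge unrelated hunks, so a real heuristic would probably need frequency weighting or a cap on group size.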

Goal

Determine whether semantic chunking meaningfully improves review quality and, if so, propose an approach suitable for integration into Warden's review pipeline.

Action taken on behalf of David Cramer.
