Learnings from Pullfrog: detection and UX improvements #312

@sentry-junior

Analysis of pullfrog/pullfrog surfaced several patterns worth adopting in Warden. These are the most concrete, high-impact takeaways from a deep codebase comparison.

Multi-lens parallel review orchestration

Pullfrog dispatches independent read-only subagents, one per review lens (correctness, security, user-journey, performance, etc.), in parallel, then aggregates and deduplicates their findings. Warden currently runs one skill at a time per hunk. Coordinated multi-skill reviews that aggregate findings across lenses could improve both recall and coherence.

  • Parallel subagent fan-out per lens with independent context discovery
  • Orchestrator-level aggregation: a finding reported independently by several lenses is a strong signal
  • Lens selection is adaptive based on PR triage (domain, seams, external contracts touched)
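The fan-out and aggregation steps above could be sketched roughly as follows. This is a minimal illustration, not Pullfrog's or Warden's actual code: the `Finding`, `Lens`, and `AggregatedFinding` types, the `file:line` merge key, and both function names are assumptions made for the example.

```typescript
// Hypothetical types; neither project is known to expose these shapes.
type Finding = { file: string; line: number; message: string };
type Lens = { name: string; review: (diff: string) => Promise<Finding[]> };
type LensResult = { name: string; findings: Finding[] };
type AggregatedFinding = Finding & { lenses: string[] };

// Merge findings that point at the same location and remember which lenses agreed.
function aggregateFindings(results: LensResult[]): AggregatedFinding[] {
  const merged = new Map<string, AggregatedFinding>();
  for (const { name, findings } of results) {
    for (const f of findings) {
      const key = `${f.file}:${f.line}`; // assumed dedup key
      const existing = merged.get(key);
      if (existing) {
        if (!existing.lenses.includes(name)) existing.lenses.push(name);
      } else {
        merged.set(key, { ...f, lenses: [name] });
      }
    }
  }
  // Findings reported by more lenses sort first.
  return Array.from(merged.values()).sort((a, b) => b.lenses.length - a.lenses.length);
}

// Run every lens concurrently against the same diff, then aggregate.
async function runLenses(diff: string, lenses: Lens[]): Promise<AggregatedFinding[]> {
  const results = await Promise.all(
    lenses.map(async (lens) => ({ name: lens.name, findings: await lens.review(diff) })),
  );
  return aggregateFindings(results);
}
```

Keeping aggregation separate from the fan-out means the merge rule (here a plain location match) can be swapped for something fuzzier without touching how lenses are dispatched.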

Cross-run PR context (rolling summary snapshots)

Pullfrog maintains a rolling PR summary file that persists across re-review runs. Each run reads the previous snapshot, uses it to inform triage and lens selection, then updates it with the PR's current state. This gives incremental reviews cumulative memory instead of starting cold.

  • Seeds a tmpfile with the previous snapshot (fetched from API)
  • Agent reads it at run start alongside the diff
  • Agent updates it in place; persisted server-side at run end
  • Prevents re-flagging resolved issues and surfaces new risks from new commits
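A rough sketch of the snapshot lifecycle described above, under stated assumptions: the `Snapshot` shape, the use of stable finding ids, and the rule "a previously open finding that no longer appears is resolved" are all hypothetical. The issue does not describe Pullfrog's actual file format.

```typescript
// Hypothetical snapshot shape, invented for illustration.
type Snapshot = {
  headSha: string;
  summary: string;
  open: string[]; // finding ids still unresolved as of headSha
  resolved: string[]; // finding ids seen in earlier runs and since fixed
};

// Split this run's findings into new ones and ones already known from earlier runs.
function triageFindings(prev: Snapshot, currentIds: string[]) {
  const known = new Set([...prev.open, ...prev.resolved]);
  return {
    fresh: currentIds.filter((id) => !known.has(id)),
    repeated: currentIds.filter((id) => known.has(id)),
  };
}

// Produce the snapshot to persist at the end of the run.
function updateSnapshot(
  prev: Snapshot,
  headSha: string,
  summary: string,
  currentIds: string[],
): Snapshot {
  const current = new Set(currentIds);
  const alreadyResolved = new Set(prev.resolved);
  // Previously open findings that no longer appear are treated as resolved.
  const newlyResolved = prev.open.filter((id) => !current.has(id));
  return {
    headSha,
    summary,
    open: currentIds.filter((id) => !alreadyResolved.has(id)),
    resolved: prev.resolved.concat(newlyResolved),
  };
}
```

One design question this leaves open is regressions: the sketch suppresses a finding that was resolved earlier and then reappears, which may or may not be the desired behavior.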

Repo-level learnings persistence

Pullfrog seeds a repo-level learnings file each run. The agent can record patterns it discovers (e.g. "this repo uses X convention for Y") and reference them in future runs. Warden's skill execution is stateless: each run starts with no repo-specific context beyond what the skill prompt and diff provide.
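One possible shape for such a learnings store is sketched below. The `Learning` type, the confirmation counter, and the rendering format are assumptions for illustration only; nothing here reflects how Pullfrog actually stores its file.

```typescript
// Hypothetical record: one convention the agent has noticed in this repo.
type Learning = { topic: string; note: string; confirmations: number };

// Add a new learning, or refresh an existing one and count the extra confirmation.
function recordLearning(learnings: Learning[], topic: string, note: string): Learning[] {
  const exists = learnings.some((l) => l.topic === topic);
  if (!exists) return learnings.concat({ topic, note, confirmations: 1 });
  return learnings.map((l) =>
    l.topic === topic ? { ...l, note, confirmations: l.confirmations + 1 } : l,
  );
}

// Render the most-confirmed learnings as text that could be prepended to a skill prompt.
function renderLearnings(learnings: Learning[], limit = 5): string {
  return learnings
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.confirmations - a.confirmations)
    .slice(0, limit)
    .map((l) => `- ${l.topic}: ${l.note}`)
    .join("\n");
}
```

Capping the rendered list keeps the extra prompt context bounded as the file grows across runs.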

Incremental range-diff for re-reviews

Pullfrog tracks beforeSha on pull_request synchronize events and generates incremental range-diffs scoped to the new commits. Warden's review-state tracking could be enriched with similar range-diff intelligence to avoid re-analyzing unchanged hunks.
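As a simpler stand-in for a true range-diff, Warden could skip hunks whose content it has already reviewed. The sketch below shows that content-fingerprint approach; it is not Pullfrog's mechanism, and the `Hunk` type and function names are hypothetical.

```typescript
// Hypothetical hunk shape: content is the hunk body without its @@ header,
// so a hunk that merely shifts line numbers keeps the same fingerprint.
type Hunk = { file: string; content: string };

function fingerprint(hunk: Hunk): string {
  // A real implementation would likely hash this string to keep stored state small.
  return `${hunk.file}\n${hunk.content}`;
}

// Keep only hunks that were not seen in a previous review run.
function hunksToReview(reviewed: Set<string>, current: Hunk[]): Hunk[] {
  return current.filter((h) => !reviewed.has(fingerprint(h)));
}

// Return the review state to persist after analyzing the given hunks.
function markReviewed(reviewed: Set<string>, hunks: Hunk[]): Set<string> {
  const next = new Set(reviewed);
  for (const h of hunks) next.add(fingerprint(h));
  return next;
}
```

Unlike a range-diff between beforeSha and the new head, this cannot tell that an unchanged hunk became risky because of a change elsewhere, so it trades some recall for less repeated work.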

Structured fix quality validation

Pullfrog's Fix mode runs tests and self-reviews before pushing. On the Warden side, the suggestedFix pipeline already validates that the diff applies, but extending this to semantic validation (does the fix actually address the finding?) and test-awareness would improve fix acceptance rates.
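A staged validation gate along those lines might look like the following sketch. The `FixCandidate`, `Check`, and `Validation` types and the `validateFix` function are hypothetical and are not part of Warden's suggestedFix pipeline; checks are synchronous here for brevity, whereas running tests or a model-based self-review would be asynchronous in practice.

```typescript
// Hypothetical types for illustration.
type FixCandidate = { findingId: string; diff: string };
type Check = { name: string; run: (fix: FixCandidate) => boolean };
type Validation = { accepted: boolean; failedAt?: string };

// Run checks in order (cheapest first) and stop at the first failure.
function validateFix(fix: FixCandidate, checks: Check[]): Validation {
  for (const check of checks) {
    if (!check.run(fix)) return { accepted: false, failedAt: check.name };
  }
  return { accepted: true };
}

// Example wiring with stubbed checks standing in for the real stages.
const exampleChecks: Check[] = [
  { name: "diff-applies", run: (fix) => fix.diff.trim().length > 0 },
  { name: "tests-pass", run: () => true }, // stub: would run the affected tests
  { name: "addresses-finding", run: (fix) => fix.findingId !== "" }, // stub: semantic review
];
```

Recording which stage rejected a fix (`failedAt`) would also make it possible to measure where suggested fixes most often fail.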


Context: deep comparison documented in this Slack canvas.

Action taken on behalf of David Cramer.
