Learnings from Pullfrog: detection and UX improvements #312

Analysis of [pullfrog/pullfrog](https://github.com/pullfrog/pullfrog) surfaced several patterns worth adopting in Warden. These are the most concrete, high-impact takeaways from a deep codebase comparison.

## Multi-lens parallel review orchestration

Pullfrog dispatches independent read-only subagents per review lens (correctness, security, user-journey, performance, etc.) in parallel, then aggregates and deduplicates their findings. Warden currently runs one skill at a time per hunk. Coordinated multi-skill reviews with finding aggregation across lenses could improve both recall and coherence.

- Parallel subagent fan-out per lens with independent context discovery
- Orchestrator-level aggregation: overlapping findings from multiple lenses are a strong signal
- Lens selection is adaptive based on PR triage (domain, seams, external contracts touched)
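
A minimal sketch of the fan-out and aggregation steps, assuming findings can be keyed by file and line. The `Lens`, `Finding`, `LensRunner`, `runLenses`, and `aggregate` names are hypothetical and are not taken from Pullfrog or Warden; adaptive lens selection is left out.

```typescript
// Hypothetical shapes for illustration; not Pullfrog's or Warden's actual types.
type Lens = "correctness" | "security" | "performance";

interface Finding {
  lens: Lens;
  file: string;
  line: number;
  message: string;
}

interface AggregatedFinding {
  file: string;
  line: number;
  lenses: Lens[];
  messages: string[];
}

// A lens runner reviews the diff through one lens and returns its findings.
type LensRunner = (diff: string) => Promise<Finding[]>;

// Fan-out: start every lens at once. A lens that rejects contributes no
// findings instead of failing the whole review.
async function runLenses(diff: string, runners: LensRunner[]): Promise<Finding[]> {
  const perLens = await Promise.all(
    runners.map((run) => run(diff).catch(() => [] as Finding[])),
  );
  return ([] as Finding[]).concat(...perLens);
}

// Aggregation: merge findings that point at the same file and line, and rank
// locations flagged by more lenses first (overlap treated as a stronger signal).
function aggregate(findings: Finding[]): AggregatedFinding[] {
  const byLocation = new Map<string, AggregatedFinding>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}`;
    const entry: AggregatedFinding =
      byLocation.get(key) ?? { file: f.file, line: f.line, lenses: [], messages: [] };
    if (entry.lenses.indexOf(f.lens) === -1) entry.lenses.push(f.lens);
    entry.messages.push(f.message);
    byLocation.set(key, entry);
  }
  return Array.from(byLocation.values()).sort(
    (a, b) => b.lenses.length - a.lenses.length,
  );
}
```

Keying on exact file and line is the simplest possible overlap rule; a real aggregator would probably need fuzzier matching (nearby lines, similar messages) to catch findings that describe the same problem differently.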

## Cross-run PR context (rolling summary snapshots)

Pullfrog maintains a rolling PR summary file that persists across re-review runs. Each run reads the previous snapshot, uses it to inform triage and lens selection, then updates it with the PR's current state. This gives incremental reviews cumulative memory instead of starting cold.

- Seeds a tmpfile with the previous snapshot (fetched from API)
- Agent reads it at run start alongside the diff
- Agent updates it in place; persisted server-side at run end
- Prevents re-flagging resolved issues and surfaces new risks from new commits
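
The read-then-update cycle above could look roughly like the following. This is a sketch under assumptions: Pullfrog's snapshot is described only as a summary file, so the structured `PrSnapshot` shape, the finding IDs, and the `updateSnapshot` function are hypothetical, and fetching or persisting the snapshot is out of scope.

```typescript
// Hypothetical snapshot shape; the real snapshot may be free-form text.
interface PrSnapshot {
  headSha: string;
  summary: string;
  openFindings: string[]; // IDs of findings still unresolved
  resolvedFindings: string[]; // IDs already resolved; never re-flagged
}

interface RunResult {
  headSha: string;
  summary: string;
  findings: string[]; // IDs of findings detected in this run
}

// Combine the previous snapshot (null on the first run) with this run's
// results. Returns the snapshot to persist and the findings worth reporting.
function updateSnapshot(
  previous: PrSnapshot | null,
  run: RunResult,
): { snapshot: PrSnapshot; toReport: string[] } {
  const prevOpen = previous ? previous.openFindings : [];
  const prevResolved = previous ? previous.resolvedFindings : [];

  // Open before but not detected now: treat as resolved by the new commits.
  const newlyResolved = prevOpen.filter((id) => run.findings.indexOf(id) === -1);

  // Report only findings this PR has never seen, open or resolved.
  const toReport = run.findings.filter(
    (id) => prevOpen.indexOf(id) === -1 && prevResolved.indexOf(id) === -1,
  );

  return {
    snapshot: {
      headSha: run.headSha,
      summary: run.summary,
      openFindings: run.findings.filter((id) => prevResolved.indexOf(id) === -1),
      resolvedFindings: prevResolved.concat(newlyResolved),
    },
    toReport,
  };
}
```

Suppressing a finding forever once it has been resolved is a deliberate simplification here; whether a regression should be re-flagged is a policy choice the sketch does not settle.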

## Repo-level learnings persistence

Pullfrog seeds a repo-level learnings file each run. The agent can record patterns it discovers (e.g. "this repo uses X convention for Y") and reference them in future runs. Warden's skill execution is stateless: each run starts with zero repo-specific context beyond what the skill prompt and diff provide.
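
One way this could map onto Warden is a small learnings list that is updated after a run and rendered into the next run's skill prompt. The sketch below assumes one note per topic; `Learning`, `recordLearning`, and `renderLearnings` are hypothetical names, and storage is left out.

```typescript
// Hypothetical learnings entry: one note per topic.
interface Learning {
  topic: string; // e.g. "error handling"
  note: string; // e.g. "wrap errors in a domain-specific error type"
}

// Record a learning, replacing any earlier note on the same topic so the
// list stays bounded and the most recent observation wins.
function recordLearning(existing: Learning[], next: Learning): Learning[] {
  const others = existing.filter((l) => l.topic !== next.topic);
  return others.concat([next]);
}

// Render the learnings as a block that could be prepended to a skill prompt.
// Returns an empty string when there is nothing to add.
function renderLearnings(learnings: Learning[]): string {
  if (learnings.length === 0) return "";
  const lines = learnings.map((l) => `- ${l.topic}: ${l.note}`);
  return ["Known conventions for this repository:"].concat(lines).join("\n");
}
```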

## Incremental range-diff for re-reviews

Pullfrog tracks `beforeSha` on `pull_request_synchronize` events and generates incremental range-diffs scoped to new commits. Warden's review-state tracking could be enriched with similar range-diff intelligence to avoid re-analyzing unchanged hunks.
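
As a rough illustration, the choice of diff scope can be reduced to picking `git` arguments from the SHAs carried by the synchronize event. The `SyncContext` shape, its `forcePushed` flag, and `incrementalDiffArgs` are assumptions made for this sketch, not Pullfrog's or Warden's API; only the `git diff` and `git range-diff` invocations themselves are standard.

```typescript
// Hypothetical view of what a re-review knows about the push.
interface SyncContext {
  baseSha: string; // tip of the PR's target branch
  afterSha: string; // new head of the PR
  beforeSha?: string; // previous head; absent on the first review
  forcePushed?: boolean; // true when beforeSha is no longer an ancestor of afterSha
}

// Pick the git arguments that scope the review to what actually changed.
function incrementalDiffArgs(ctx: SyncContext): string[] {
  if (!ctx.beforeSha) {
    // First review: everything on the PR branch since it diverged from base.
    return ["diff", `${ctx.baseSha}...${ctx.afterSha}`];
  }
  if (ctx.forcePushed) {
    // History was rewritten: compare the old and new commit series.
    return ["range-diff", ctx.baseSha, ctx.beforeSha, ctx.afterSha];
  }
  // Plain push: only the commits added since the last reviewed head.
  return ["diff", `${ctx.beforeSha}..${ctx.afterSha}`];
}
```

The force-push branch assumes the base did not move between the two series; after a rebase onto a newer base, the two ranges would need their own base commits.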

## Structured fix quality validation

Pullfrog's Fix mode runs tests and self-reviews before pushing. On the Warden side, the `suggestedFix` pipeline already validates diff application, but extending this to semantic validation (does the fix actually address the finding?) and test-awareness could improve fix acceptance rates.
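
The layered validation described above can be sketched as an ordered list of checks that stops at the first failure. Everything here is hypothetical apart from the `suggestedFix` name: the `SuggestedFix` shape, the `FixCheck` interface, and `validateFix` are illustrative, and real checks (applying a diff, re-running a skill, running tests) would likely be asynchronous.

```typescript
// Hypothetical shapes; Warden's actual suggestedFix type may differ.
interface SuggestedFix {
  findingId: string;
  diff: string;
}

interface FixCheck {
  name: string;
  passes: (fix: SuggestedFix) => boolean;
}

interface FixVerdict {
  ok: boolean;
  failedCheck?: string; // name of the first check that rejected the fix
}

// Run checks in order, cheapest first, and stop at the first failure so that
// expensive steps (such as a test run) are skipped for fixes already rejected.
function validateFix(fix: SuggestedFix, checks: FixCheck[]): FixVerdict {
  for (const check of checks) {
    if (!check.passes(fix)) {
      return { ok: false, failedCheck: check.name };
    }
  }
  return { ok: true };
}
```

A plausible ordering would be "diff applies", then "fix addresses the finding", then "tests pass", so that the existing diff-application check stays the first gate.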

---

Context: deep comparison documented in [this Slack canvas](https://sentry.slack.com/docs/T024ZCV9U/F0B3G5XDXEY).

Action taken on behalf of David Cramer.
