You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
pr_review stays advisory until it earns enforcement. Grow the eval set from n=10 to n≥50 (target 100) with a documented labeling methodology that fixes the v5 mislabeling; track precision/recall per voter role so chronically noisy voters can be demoted on the Epic D ladder; define and encode the explicit pr_review→enforce promotion criterion.
v5 baseline (2026-04-26): n=10 (5 synthetic buggy, 3 historical, 2 synthetic clean). Bug-catch 100% (6/6), location-match 83.3%, raw FP rate 50% — re-classified after triage as dataset error + borderline judgment calls: fix(research): repair github + semantic_scholar source coverage (#2234) #2235 was labeled "clean" but shipped a real bug (wrong env var in a rate-limit message) that the voters CAUGHT. The "false positives" were partly the dataset being wrong — the labeling methodology is the bottleneck, not just the panel.
No per-voter metrics exist; v5 reports panel-level only. The 5 pr_review roles: architect, security, devex, catfish, scope_steward.
The README claim "100% bug-catch" enters the claims registry (Epic A) linked to the v5 artifact WITH the n=10 caveat; v6 supersedes it.
Depends on Epic D for the demotion path (noisy voter → ladder demotion) and the promotion-criterion machinery. #3134 (author identity input) related, not absorbed.
Definition of done
n≥50 labeled eval set with a published rubric; v5 mislabels corrected
Per-voter precision/recall in the outcome store, queryable over time
v6 results doc; promotion criterion encoded in the claims registry
Checkbox reconciliation 2026-06-21: flipped boxes for children closed since the last edit (backlog deep-dive). Remaining unchecked items are genuinely open.
Narrative
pr_reviewstays advisory until it earns enforcement. Grow the eval set from n=10 to n≥50 (target 100) with a documented labeling methodology that fixes the v5 mislabeling; track precision/recall per voter role so chronically noisy voters can be demoted on the Epic D ladder; define and encode the explicit pr_review→enforce promotion criterion.Grounding (Phase 0 recon — docs/research/pr-review-experiment-results-v5.md)
Children
Absorbed
Dependency notes
Depends on Epic D for the demotion path (noisy voter → ladder demotion) and the promotion-criterion machinery. #3134 (author identity input) related, not absorbed.
Definition of done
Checkbox reconciliation 2026-06-21: flipped boxes for children closed since the last edit (backlog deep-dive). Remaining unchecked items are genuinely open.