feat(ce-doc-review): autofix tiers, per-finding walk-through, multi-round memory by tmchow · Pull Request #601 · EveryInc/compound-engineering-plugin

tmchow · 2026-04-19T00:04:58Z

Summary

Overhaul ce-doc-review to match the interaction quality of ce-code-review (post-PR #590). The current skill surfaces too many findings as "needs user judgment" when one clear fix exists, nitpicks at low confidence, and ends with a binary "refine / complete" question that forces re-review when the user wants to apply fixes and move on. This PR replaces binary classification with a three-tier system aligned with ce-code-review's names, ports the per-finding walk-through + bulk preview + routing question interaction, adds in-doc Open Questions deferral, introduces multi-round decision memory, rewrites ce-learnings-researcher to handle domain-agnostic institutional knowledge, and expands the ce-compound frontmatter problem_type enum.

Brainstorm: docs/brainstorms/2026-04-18-ce-doc-review-autofix-and-interaction-requirements.md (38 requirements, R31–R35 dropped from scope per pipeline-separation decision).

Plan: docs/plans/2026-04-18-001-feat-ce-doc-review-autofix-and-interaction-overhaul-plan.md (8 units, 4 phases).

What changed

Autofix classification (three tiers, ce-code-review-aligned)

autofix_class enum renamed from [auto, present] to [safe_auto, gated_auto, manual]:

safe_auto — one clear correct fix, applied silently
gated_auto — concrete fix exists but touches document meaning; user confirms via walk-through (Apply marked (recommended))
manual — requires user judgment; multiple valid approaches

No advisory tier. Low-confidence manual findings (above a 0.40 FYI floor, below the severity gate) surface in a distinct FYI subsection at the presentation layer — observational value without forcing a decision.

Per-severity confidence gates replace the flat 0.50 floor: P0 0.50 / P1 0.60 / P2 0.65 / P3 0.75. Residual-concern promotion (old step 3.4) removed; replaced with cross-persona agreement boost (+0.10) on merged findings multiple personas flagged.

Classification rule sharpened with a strawman-aware test: a "do nothing / accept the defect" alternative is the failure state, not a real alternative. When any genuine non-strawman alternative exists, safe_auto downgrades to gated_auto so the user sees the tradeoff before silent apply.

Framing guidance block added to the shared subagent template (observable-consequence-first, why-the-fix-works grounding, 2-4 sentence budget, positive/negative example pair). One file edit propagates to all 7 personas.

Interaction model (ported from ce-code-review)

After safe_auto fixes apply, the new four-option routing question dispatches:

A. Review each finding one by one (per-finding walk-through)
B. LFG (best-judgment action per finding)
C. Append findings to the doc's Open Questions section
D. Report only

Per-finding walk-through renders each finding in two surfaces — a terminal markdown block (explanation) and a compact two-line question stem (decision). Four per-finding options: Apply / Defer / Skip / LFG-the-rest, with the post-tie-break recommendation marked (recommended). N=1 and no-append adaptations included.

Bulk preview shows the plan before LFG, Append-to-Open-Questions, and walk-through LFG-the-rest executes. Two options: Proceed / Cancel.

Differences from ce-code-review: no tracker-detection tuple or sink-availability logic (Defer is always in-doc append); no [TRACKER] label substitution; no fixer-subagent batch dispatch (the orchestrator applies edits inline); no advisory-variant Acknowledge option.

In-doc Open Questions deferral (document-native Defer)

When the user picks Defer on a finding, an entry appends to a ## Deferred / Open Questions section at the end of the document under review, under a timestamped subsection (### From YYYY-MM-DD review). Deferred findings stay attached to the document they came from — no external tracker, no sibling file. Failure path (read-only doc, write error, concurrent-edit collision) surfaces a blocking sub-question with Retry / Record-only / Convert-to-Skip options.

Multi-round decision memory

Every review round after the first passes a cumulative decision primer to every persona. Rejected findings (Skip or Defer) from any prior round in the current session are carried in a <prior-decisions> block in the subagent template. Synthesis-level R29 suppression drops re-raised rejected findings using the same fingerprint + evidence-substring predicate as R30 fix-landed verification. No diff is passed — fixed findings self-suppress because evidence is gone from the current doc; regressions surface as normal findings; rejected findings are handled by pattern-match suppression. Cross-session persistence is out of scope.

Terminal question (three options, separates "apply" from "re-review")

The old binary "refine / complete" question replaced with:

Apply decisions and proceed to <next stage> — default when fixes applied; <next stage> is ce-plan for requirements docs, ce-work for plan docs
Apply decisions and re-review — opt-in when the user believes edits warrant another pass
Exit without further action

Zero-actionable case drops the re-review option and adapts the primary label to Proceed to <next stage>. Caller-context (ce-brainstorm nested invocation, ce-plan 5.3.8, standalone) is handled implicitly by the agent reading conversation state — no nested:true argument or caller-side change.

learnings-researcher rewrite (domain-agnostic)

ce-learnings-researcher rewritten for domain-agnostic institutional knowledge. Identity framing dropped "preventing repeated mistakes" for "find and distill applicable learnings." Accepts a structured <work-context> input (Activity / Concepts / Decisions / Domains) while preserving free-form fallback. Hardcoded Feature-Type-to-Directory table replaced with dynamic probe of docs/solutions/ at invocation time. Keyword extraction expanded from 4 dimensions (Module, Technical terms, Problem indicators, Component) to 8 (adds Concepts, Decisions, Approaches, Domains). critical-patterns.md read is now conditional on file existence (the file doesn't exist in this repo). Integration Points drops ce-doc-review — the agent is ce-plan-owned and ce-doc-review explicitly does not dispatch it (pipeline-separation principle per docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md).

ce-compound problem_type enum expansion

Four knowledge-track values added: architecture_pattern, design_pattern, tooling_decision, convention. Absorbs the best_practice overflow that had been a catch-all. Category mappings added. Both ce-compound and ce-compound-refresh schemas updated in sync. Plugin AGENTS.md gets a new "Documented Solutions" discoverability section naming all knowledge-track and bug-track categories. 10 existing best_practice entries migrated to narrower values; one correctness-gap schema violation resolved (docs/solutions/workflow/todo-status-lifecycle.md → workflow_issue).

Scope boundaries

No external tracker integration. Defer is an in-doc section.
No persona activation/selection logic change. The 7 personas and conditional triggers stay as-is.
No batch fixer subagent. Document fixes apply inline.
No document-review-beta fork. Phased ship directly on the stable skill; headless envelope preserves backward compatibility with ce-brainstorm and ce-plan callers.
No cross-session primer persistence. In-memory across rounds within one session; fresh round 1 on new invocation.
No frontmatter schema dimensional redesign. Enum expansion only.

Testing

bun test: 798 pass, 0 fail. 13 new contract tests cover the three-tier enum, framing guidance, strawman rule, Open-Questions exclusion, decision primer, per-severity gates, FYI floor, cross-persona boost, R29 suppression, R30 verification, headless envelope tier buckets, terminal question three-option structure, label adaptation, Interactive-mode pre-load, walk-through and bulk-preview required mechanics, Open-Questions-defer append flow and failure path, schema enum expansion, schema duplicate-sync, yaml-schema category mappings, and learnings-researcher domain-agnostic framing.
bun run release:validate: in sync. compound-engineering currently has 50 agents, 43 skills, 0 MCP servers.
Seeded test fixture created at tests/fixtures/ce-doc-review/seeded-plan.md with known issues planted across tier shapes — 3 safe_auto candidates, 3 gated_auto, 5 manual, 5 FYI candidates, 3 drop-worthy P3s. Intended for pre-merge pipeline validation: run the new synthesis against this fixture and verify the bucket distribution matches expectations. Not yet exercised in this PR — doing so requires the new pipeline to be live in the installed skill cache. Recommend running once the PR lands and before promoting this into the default document-review flow.

Review

Brainstorm went through document-review (5 personas, 38 findings → 13 auto-applied, pushed back on 15 findings through 2 refinement rounds)
Plan went through recursive document-review against itself (5 personas, 38 findings → 9 auto-applied, resolved 15 findings through 2 refinement rounds including scope trim, tier rename, strawman safeguard, Open-Questions exclusion, primer structure spec, R30 matching predicate, pipeline sequencing fix, envelope-promotion-to-Unit-4, ce-plan-side validation for Unit 2)
Pipeline-separation learning consulted (docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md); drove decision to NOT dispatch learnings-researcher from ce-doc-review
State-machine discipline learning applied (docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md); finding-lifecycle state machine modeled explicitly in the plan with each transition narrated in synthesis prose
Post-PR-590 ce-code-review end-to-end learnings treated as mandatory reading (docs/solutions/best-practices/ce-pipeline-end-to-end-learnings-2026-04-17.md)

Commits (9)

Each unit shipped as its own commit so the progression is reviewable:

docs(ce-doc-review): brainstorm + plan
feat(ce-compound): Unit 1 — enum expansion + 10 migrations
feat(learnings-researcher): Unit 2 — domain-agnostic rewrite
feat(ce-doc-review): Unit 3 — subagent template upgrade + findings-schema enum
feat(ce-doc-review): Unit 4 — synthesis pipeline with per-severity gates and headless envelope
feat(ce-doc-review): Unit 5 — routing question + walk-through + bulk preview
feat(ce-doc-review): Unit 6 — in-doc Open Questions deferral
feat(ce-doc-review): Unit 7 — terminal question + multi-round decision primer
test(ce-doc-review): Unit 8 — contract tests + framing polish

Post-deploy validation

Run ce-doc-review against tests/fixtures/ce-doc-review/seeded-plan.md and verify bucket distribution: ≥2 of 3 seeded safe_auto applied silently, ≥2 of 3 seeded gated_auto in walk-through Apply bucket, all 3 drop-worthy P3s dropped by the 0.75 gate, ≥3 of 5 FYI candidates in the FYI subsection, zero false-auto on manual-shaped seeds.
Re-run ce-doc-review against this PR's brainstorm doc (docs/brainstorms/2026-04-18-ce-doc-review-autofix-and-interaction-requirements.md) as a before/after comparison with the old pipeline; verify meaningful decision-count reduction without regressing previously-applied P0/P1 fixes.
Run ce-plan against any plan doc and verify the rewritten learnings-researcher produces reasonable results across code-bug queries and non-code queries (skill-design, workflow). If output quality regresses, iterate via prompt edits in follow-up PRs.

Fixes #562 spiritually (document-review analog of the ce-review interactive judgment overhaul).

🤖 Generated with Claude Code

…ion overhaul Capture requirements brainstorm (38 requirements across 11 sections) and implementation plan (8 units, 4 phases) for overhauling ce-doc-review: three-tier autofix classification (safe_auto / gated_auto / manual) aligned with ce-code-review names, per-severity confidence gates, ported walk-through + bulk-preview + routing-question interaction, in-doc Open Questions deferral, multi-round decision memory, domain-agnostic learnings-researcher rewrite, and ce:compound problem_type enum expansion. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

… / design_pattern / tooling_decision / convention Adds four knowledge-track values to the ce:compound frontmatter schema to absorb the best_practice overflow that had been serving as a catch-all for architecture, design, tooling, and convention learnings. best_practice remains valid but is now documented as the fallback for entries that no narrower knowledge-track value covers. Schema updates land in both ce-compound and ce-compound-refresh (duplicate files are kept in sync intentionally). Category mappings added for the four new values under docs/solutions/. Plugin AGENTS.md gets a new "Documented Solutions" discoverability section naming all knowledge-track and bug-track categories so agents reading project instructions know the full taxonomy. Migrates 10 existing best_practice entries to narrower values and resolves one correctness-gap schema violation (workflow/todo-status-lifecycle.md -> workflow_issue). Remaining best_practice entries are genuinely broad enough to stay in that bucket (e.g., ce-pipeline-end-to-end-learnings). bun test tests/frontmatter.test.ts passes. Part of the ce-doc-review autofix and interaction overhaul plan. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

… knowledge The learnings-researcher was shaped around bug-hunting — its taxonomy, category table, output framing, and DO/DON'T guidelines all assumed the caller was looking for past defects. But the team captures more than bugs: architecture patterns, design patterns, tooling decisions, conventions, and workflow learnings are all first-class in docs/solutions/ now. Authors were retreating to best_practice as a catch-all because the agent didn't reliably surface non-bug entries. This rewrite treats all learning shapes equally: - Identity framing dropped "preventing repeated mistakes" for "find and distill applicable learnings" — bug avoidance is one flavor, not the premise - Accepts a structured <work-context> input (Activity / Concepts / Decisions / Domains) and still handles free-form text for callers that don't use the structured form - Replaced the hardcoded Feature-Type-to-Directory table with a dynamic probe of docs/solutions/ at invocation time so new subdirectories (architecture-patterns/, design-patterns/, conventions/, etc.) are picked up automatically - Expanded keyword extraction from 4 dimensions (Module, Technical terms, Problem indicators, Component) to 8 by adding Concepts, Decisions, Approaches, Domains — so design-pattern and convention queries have something to match on - Made Step 3b (critical-patterns.md read) conditional on file existence — the file does not exist in this repo and demanding it was emitting warnings - Reframed output to "Applicable Past Learnings" with a Type field so knowledge-track entries surface as themselves (architecture_pattern, design_pattern, etc.) instead of being flattened to a bug-shape - Reworked Integration Points to drop the ce-doc-review reference (it explicitly does not dispatch this agent) and stop framing planning-time as the agent's primary home - Updated DO/DON'T guidelines: don't discard candidates for missing bug-shaped fields, treat non-bug learnings as first-class, don't assume critical-patterns.md exists Preserved the grep-first filtering strategy intact (it was already efficient) and the overall 7-step flow. File grew from 246 to 283 lines due to the expanded taxonomy and structured input documentation. Part of the ce-doc-review autofix and interaction overhaul plan. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…tion, strawman rule, framing guidance The persona subagent template was producing output that pushed too much into 'present' (requires user judgment) because the classification rule was permissive about what counted as a 'reasonable alternative' — (a) real fix + (b) 'accept the defect' strawman + (c) 'document in release notes' strawman looked like three valid approaches, inflating the manual/present bucket. This upgrade sharpens classification and improves persona output quality across all seven document-review personas via one file edit. Changes: - autofix_class enum renamed and expanded: [auto, present] -> [safe_auto, gated_auto, manual]. Aligns names with ce-code-review's first three tiers so cross-skill vocabulary is consistent. Drops the advisory tier in favor of a presentation-layer FYI subsection handled by synthesis - Three-tier routing spelled out with one-sentence descriptions and examples: safe_auto for single-option fixes, gated_auto for concrete fixes that touch meaning/scope (the new default for 'I know the fix but the author should sign off'), manual for genuine judgment calls - Strawman-aware classification rule added: a 'do nothing / accept the defect' option is not a real alternative — it is the failure state the finding describes. Same for 'document in release notes' and similar sidesteps. Listed alternatives must be ones a competent implementer would genuinely weigh. Positive and negative example pair included - Strawman safeguard on safe_auto: if safe_auto is chosen via strawman-dismissal, the dismissed alternatives must be named in why_it_matters. Any genuine non-strawman alternative downgrades the finding to gated_auto (user confirms) rather than silent apply - Auto-promotion patterns consolidated into an explicit rule set: factually incorrect behavior, missing standard security/reliability controls, codebase-pattern-resolved fixes (with cited pattern), framework-native-API substitutions, mechanically-implied completeness - Framing guidance block added (ported from ce-code-review's subagent-template): observable-consequence-first phrasing, why-the-fix-works grounding, 2-4 sentence budget, weak/strong illustrative pair. Tuned for document-review's surface — 'observable consequence' rather than 'observable behavior' since doc problems surface as disagreements, misreads, and wrong decisions rather than runtime behavior - Persona exclusion rule added: personas must exclude '## Deferred / Open Questions' sections and their timestamped subsections from review scope. Prevents round-2 feedback loops where deferred entries get flagged as new findings or quoted as evidence for suppression checks. Unit 6 is going to create these sections as the document-native defer mechanic Template grows from 52 to 95 lines, still under the 150-line @ inclusion threshold so SKILL.md continues to use @-inclusion. findings-schema.json autofix_class enum and description updated to match. Part of the ce-doc-review autofix and interaction overhaul plan. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…s and FYI subsection Rewrites the synthesis pipeline to route findings by the three new tiers (safe_auto / gated_auto / manual) with per-severity confidence gates and a distinct FYI subsection for low-confidence manual findings. This is the load-bearing synthesis change that enables the interaction model (Unit 5) and multi-round memory (Unit 7) to follow. Pipeline changes: - Step 3.2: replace flat 0.50 gate with per-severity table. P0 0.50, P1 0.60, P2 0.65, P3 0.75. FYI floor at 0.40: below-gate manual findings above the floor survive in an FYI subsection rather than being dropped — preserves observational value without forcing decisions - Step 3.4: delete residual-concern promotion. Replace with cross-persona agreement boost (+0.10, capped at 1.0) applied to merged findings that multiple personas flagged. Mirrors ce-code-review stage 5 step 4. Residual concerns surface in Coverage only; below-gate findings are not promoted back into the review surface - Step 3.5: tier language updated for three tiers. Contradictions route to manual with both perspectives - Step 3.6: auto-promotion patterns consolidated into an explicit rule set: codebase-pattern-resolved, factually incorrect behavior, missing standard security/reliability controls, framework-native-API substitutions, mechanically-implied completeness additions. Promotion targets safe_auto or gated_auto depending on whether the addition is substantive. Strawman-downgrade safeguard: if safe_auto was chosen via strawman-dismissal per the subagent template, synthesis verifies the dismissed alternatives were genuinely strawmen and downgrades to gated_auto if any were plausible design choices - Step 3.7: routing rewritten for three tiers + FYI. safe_auto applies silently in Phase 4. gated_auto enters walk-through with Apply recommended. manual enters walk-through with user-judgment framing. FYI-subsection routes outside the walk-through entirely - State-machine narration added to the phase header so each transition (Raised -> Classified -> one of four buckets) is a named step in synthesis prose, not an implied carry-forward - R30 fix-landed matching predicate specified explicitly: fingerprint + evidence-substring overlap >50%. Section renames count as different locations so a rename-caused re-raise is treated as new, not as 'fix did not land' Phase 4 updates: - Headless envelope extended with distinct 'Gated-auto findings' and 'FYI observations' sections alongside existing 'Applied N safe_auto fixes' and 'Manual findings' sections. Backward-compatible: callers that only read the old sections continue to work; new sections surface alongside, not instead of, the existing buckets. Landing the envelope change here (Unit 4) rather than Unit 8 means ce-plan 5.3.8 headless invocations never observe an interstitial state where new tiers are produced but the envelope can't carry them - Interactive presentation split by tier and finding type. FYI observations get their own subsection. Applied safe_auto fixes list every change with section + reviewer attribution - Phase 5 simplified: handoff to Unit 5 walkthrough.md for routing question and per-finding interaction. Terminal question (Unit 7's three-option structure) fires after walk-through or bulk action Creates tests/fixtures/ce-doc-review/seeded-plan.md — a test fixture plan doc with known issues planted across tier shapes (3 safe_auto, 3 gated_auto, 5 manual, 5 FYI candidates, 3 drop-worthy P3s). Used for the validation gate: run the new pipeline against this fixture and verify bucket distribution matches expectations before shipping Phase 2. Part of the ce-doc-review autofix and interaction overhaul plan. Unit 4 of 8. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…and bulk preview Ports ce-code-review's interactive flow (post-PR #590) into ce-doc-review while adapting for document-specific semantics. Replaces the current binary 'refine / complete' Phase 5 question with a four-option routing question that dispatches to focused interactions sized to the user's intent. New references: - walkthrough.md — the per-finding walk-through that fires when the user picks routing option A. Each finding renders in two surfaces: a terminal markdown block carrying the explanation (heading, What's wrong, Proposed fix, Why it works) and a compact two-line question stem carrying the decision. Four per-finding options: Apply / Defer (append to Open Questions) / Skip / LFG-the-rest. Post-tie-break recommendation marked with '(recommended)'. N=1 and no-append adaptations included. Unified completion report covers walk-through, LFG, Append-to-Open-Questions, and zero-findings terminal paths - bulk-preview.md — the compact plan preview before every bulk action (routing B, routing C, walk-through LFG-the-rest). Grouped buckets (Applying / Appending to Open Questions / Skipping) with one-line plain-English summary per finding. Two options: Proceed or Cancel. Cancel semantics route back to the originating question (routing or per-finding). Proceed executes the plan and emits the unified completion report Differences from ce-code-review's equivalents: - No tracker-detection tuple or sink-availability logic — Defer is always an in-doc append, no external tracker - No [TRACKER] label substitution — bucket is always 'Appending to Open Questions' - No fixer-subagent batch dispatch — the orchestrator applies gated_auto/manual fixes inline (document edits are single-file markdown with no cross-file dependencies) - No advisory variant with Acknowledge — advisory-style findings surface in the FYI subsection at the presentation layer, outside the walk-through - No .context/ artifact-lookup paths — ce-doc-review personas don't write run artifacts - Observable-consequence phrasing tuned for document surface (readers, implementers, downstream decisions) rather than ce-code-review's runtime-behavior framing SKILL.md changes: - Added 'Interactive mode rules' section at the top with the AskUserQuestion pre-load directive and numbered-list fallback rule. Mirrors ce-code-review's approach for reliable question-tool loading on Claude Code - Updated the Phases 3-5 handoff to load walkthrough.md and bulk-preview.md after agent dispatch completes, lazily via backtick paths per the skill compliance checklist Part of the ce-doc-review autofix and interaction overhaul plan. Unit 5 of 8. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Implements the Defer action's document-native analogue to ce-code-review's tracker-ticket mechanic. When the user chooses Defer on a finding (from the walk-through or from the bulk-preview Append-to-Open-Questions path), an entry appends to a '## Deferred / Open Questions' section at the end of the document under review. Design decisions: - In-doc append instead of sibling follow-up file or external tracker: deferred findings stay attached to the document they came from, naturally discoverable by anyone reading the doc, no new infrastructure required. Trade-off: deferred findings visibly mutate the doc — but that's the point, an Open Questions section is a canonical place for 'worth remembering but not acting on now' in planning docs - Section location handling covers three cases: end-of-doc heading (most common — append inside), mid-doc heading (user positioned deliberately — still append at that location, don't duplicate at end), absent (create at end, or above trailing separators/footers, or after frontmatter for frontmatter-only docs) - Timestamped subsections (### From YYYY-MM-DD review) group multiple defers per review and keep sequential reviews distinguishable. Same-day reviews within a session share one subsection; cross-day reviews get distinct subsections - Entry format: title, severity, reviewer attribution, confidence, and the why_it_matters framing. Intentionally excludes suggested_fix and evidence — those live in the run artifact when applicable; the in-doc entry is a concern summary for the reader returning months later, not a full decision packet - Idempotence on title collisions within the same subsection prevents duplicate entries from LFG-the-rest re-routing or orchestrator retry paths - Concurrent-edit safety: re-read document from disk before every append to reduce collision window with user-in-editor writes. No persistent lock — interactive review needs observation-before-write, not coordination - Failure path via blocking sub-question (Retry / Record-only / Convert- to-Skip) ensures no silent failures. Default-to-Record-only when the sub-question goes unanswered keeps in-memory state consistent - Upstream availability signal cached at Phase 4 start — known-unwritable documents suppress the Defer option in the walk-through menu and routing question; mid-flow failures route through the per-finding failure path without flipping the session signal Referenced by references/walkthrough.md (per-finding Defer option) and references/bulk-preview.md (routing option C Proceed). Includes a worked example showing the appended-content format in context. Part of the ce-doc-review autofix and interaction overhaul plan. Unit 6 of 8. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…nd decision primer Replaces the current binary 'refine / complete' Phase 5 question with a three-option terminal question that separates 'apply decisions' from 're-review' — the most common user frustration the brainstorm surfaced was that the old wording forced a re-review even when the user wanted to apply fixes and move on. Terminal question: - 'Apply decisions and proceed to <next stage>' — default when fixes were applied or decisions were made. <next stage> substitutes to ce-plan for requirements docs and ce-work for plan docs - 'Apply decisions and re-review' — opt-in re-review when the user believes their edits warrant another pass - 'Exit without further action' — user wants to stop for now When fixes_applied_count == 0 (zero-actionable case or all-Skip walk- through), the 're-review' option drops (re-review is useless when nothing changed) and the 'Apply decisions and' prefix drops from the primary option's label so it matches what the system is doing. Caller-context handling is implicit rather than requiring a formal nested:true argument. The agent interprets 'Proceed to <next stage>' contextually from the visible conversation state — when invoked from inside another skill's flow, it returns control to the caller; when standalone, it dispatches the next stage skill. No caller-side change to ce-brainstorm or ce-plan. If implicit handling proves unreliable, an explicit nested:true flag can be added as a follow-up. Multi-round decision memory: - SKILL.md Phase 2 adds a new {decision_primer} template variable that accumulates prior-round decisions across the session. Round 1 renders an empty <prior-decisions> block; round 2+ renders applied and rejected entries by round with section, title, reviewer, confidence, and rejection reason. Skip and Defer both count as 'rejected' for suppression purposes. Cross-session persistence is out of scope — fresh invocations start a fresh round 1 - synthesis-and-presentation.md adds an R29 suppression step. For each current-round finding, compare against the primer's rejected list using the same fingerprint + evidence-substring predicate as R30. On match, drop the current-round finding and log the suppression in Coverage. Materially-different exception: if the section was edited enough that the prior-round evidence quote no longer appears in the current document, treat as new. Synthesis is the authoritative gate; the persona instruction below is a soft hint - subagent-template.md adds a <decision-primer-rules> block that instructs personas to honor the primer: don't re-raise a finding whose title and evidence pattern-match a prior-round rejected entry unless materially different. Prior-round Applied findings are informational — the orchestrator verifies those landed via R30. Round 1 runs with no primer constraints Part of the ce-doc-review autofix and interaction overhaul plan. Unit 7 of 8. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ugh, defer, enum expansion Extends pipeline-review-contract.test.ts with structural assertions for the ce-doc-review overhaul. Uses pattern matching on key tokens rather than exact prose so future wording improvements don't ossify the tests. New test groups: - ce-doc-review contract (8 tests): - findings-schema autofix_class enum uses [safe_auto, gated_auto, manual] — aligned with ce-code-review's first three tier names, no advisory tier, no legacy auto/present values - subagent template carries framing guidance, strawman rule, strawman safeguard, Deferred/Open-Questions exclusion, and decision-primer slot + rules - synthesis pipeline has per-severity gates (P0 0.50 / P1 0.60 / P2 0.65 / P3 0.75), FYI floor at 0.40, cross-persona agreement boost (+0.10), three-tier routing, R29 rejected-finding suppression, R30 fix-landed matching predicate - headless envelope surfaces new tiers distinctly (Applied N safe_auto fixes / Gated-auto findings / Manual findings / FYI observations) with preserved 'Review complete' terminal signal - terminal question is three-option by default with label adaptation in the zero-actionable case; next-stage substitution rules documented - SKILL.md has Interactive mode rules with AskUserQuestion pre-load via ToolSearch, decision_primer variable in dispatch, lazy references to walkthrough.md and bulk-preview.md - walkthrough has routing question with front-loaded distinguishing words, four per-finding options, (recommended) marker, no advisory variant, no tracker-detection machinery; bulk-preview has Proceed/ Cancel with three bucket labels, no Acknowledging bucket - open-questions-defer implements the append mechanic with timestamped subsections, field-level entry format, failure-path sub-question (Retry / Record-only / Convert-to-Skip), no tracker-detection logic - ce-compound frontmatter schema expansion contract (3 tests): - problem_type enum includes architecture_pattern, design_pattern, tooling_decision, convention — with best_practice retained as fallback - ce-compound-refresh schema is byte-identical to canonical ce-compound schema (kept in sync per AGENTS.md duplicate convention) - yaml-schema.md documents category mappings for the four new values - ce-learnings-researcher domain-agnostic contract (1 test): - Agent prompt frames as domain-agnostic (not bug-focused), names Architecture / Design / Tooling / Conventions as first-class learning shapes, accepts structured <work-context> input, replaces the hardcoded category table with a dynamic probe, reads critical-patterns.md conditionally, and drops ce-doc-review from Integration Points (agent is ce-plan-owned) All 13 new tests pass. Full suite: 798 pass, 0 fail. bun run release:validate: in sync. Part of the ce-doc-review autofix and interaction overhaul plan. Unit 8 of 8 — final unit; this ships the contract tests that lock in the three-tier / multi-round / interaction-model changes shipped in Units 3 through 7. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d897823f52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Preserve evidence snippets in decision primer entries (thread r3106059451, P1) R29 rejected-finding suppression and R30 fix-landed verification both use fingerprint + evidence-substring overlap (>50%) as their matching predicate. The primer format in SKILL.md previously stored only section/title/reviewer/confidence/reason, leaving the orchestrator without prior-round evidence needed to compute the overlap check. Adds an 'Evidence:' line to each primer entry containing the first evidence quote truncated to ~120 chars (word-boundary-preserving, internal quotes escaped). Documented inline why the field is required. - Align review-output-template.md with new three-tier taxonomy (thread r3106059452, P2) The interactive presentation template still referenced the legacy 'auto/present' contract — summary line, 'Auto-fixes Applied' heading, 'Auto | Present' coverage columns, and rules text all leaked stale bucket semantics into user-facing output. Rewrites the template for the new taxonomy: 'Applied N safe_auto fixes' summary line, a Tier column in severity tables, a dedicated 'FYI Observations' section, and Coverage columns for SafeAuto | GatedAuto | Manual | FYI | Residual. Residual Concerns section clarified: no longer promoted via residual-concern promotion (dropped in Unit 4) — listed for transparency only. Adds a note up-front that this template covers Phase 4 interactive presentation specifically; the headless envelope lives in synthesis-and-presentation.md. All contract tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df79db5015

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Both findings fall in the same pattern the prior round surfaced: high-level rules written without the deterministic mechanics the orchestrator needs to actually implement them. Cross-invocation signal from resolve-pr-feedback flagged the theme. - Add deterministic recommended-action tie-break to synthesis (thread r3106064985, P1) walkthrough.md said 'recommended action has already been normalized by synthesis step 3.6' but step 3.6 only handled auto-promotion; there was no rule for picking Apply/Defer/Skip when contributing personas implied different actions. Without a tie-break, LFG outcomes are non-deterministic across runs and walk-through '(recommended)' marks drift. Adds synthesis step 3.5b 'Deterministic Recommended-Action Tie-Break' implementing 'Skip > Defer > Apply' (most conservative first). Scans contributing personas in that order; first match wins. Documents persona-to-action mapping (safe_auto/gated_auto -> Apply, manual tradeoffs -> Defer, residual-suppression-eligible -> Skip, contradiction-set with 'keep as-is' -> Skip). Silent-personas fallback to Apply. Records a conflict-context string on the merged finding for the walk-through's R15 conflict-context line. Updates walkthrough.md's reference from 'synthesis step 3.6' to the correct '3.5b'. Invariant: walk-through and bulk-preview read recommended_action and never recompute. LFG-the-rest and routing option B execute the recorded recommendation in bulk, so the same review artifact always produces the same bulk plan. - Strengthen dedup key in open-questions-defer.md (thread r3106064987, P2) Title-only idempotence silently drops legitimate deferred entries when two different concerns share a short title in the same review date. Loses user-visible backlog context. Compound key is now 'normalize(section) + normalize(title) + evidence_fingerprint', mirroring the R29/R30 matching predicate (fingerprint + evidence-substring overlap) so cross-round and intra-round dedup behave consistently. Evidence fingerprint is the first ~120 chars of the finding's first evidence quote (same slice used in the decision primer). Falls back to section+title when evidence is unavailable. All contract tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afc699da8f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Both findings continue the pattern: specification gaps in deterministic orchestration mechanics. Different gaps each round — Codex keeps finding adjacent spots where high-level rules need concrete implementation detail. - Drop routing-level recommendation labeling (thread r3106084407, P2) walkthrough.md's routing question said any of A/B/C/D could be marked '(recommended)' based on synthesis, but synthesis only defines per-finding recommended_action — there was no rule for picking a routing-level recommendation. Rather than invent one (a mapping from finding-set shape to routing recommendation would pressure users toward automated paths in ways that conflict with the user-intent framing of the four options), remove the routing- level (recommended) labeling. Per-finding (recommended) inside the walk-through and bulk preview still applies — that's deterministic via step 3.5b. The routing choice itself is user-intent: engage / trust / triage / skim. - Persist dedup-key fields in the appended entry (thread r3106084409, P2) Previous round added a compound dedup key (section + title + evidence_fingerprint) but the deferred-entry format on disk only stored title/severity/reviewer/confidence/why_it_matters — so retries or same-day reruns couldn't reliably reconstruct the key from persisted data. Duplicates could still slip through. Adds an HTML-comment line on each appended entry carrying the machine-readable dedup fields: . Keeps the visible content reader-friendly (section now surfaces in the header line alongside severity/reviewer/confidence; the HTML comment stays hidden in rendered markdown) while persisting exactly the fields Step 4's compound-key check needs. Legacy entries without the comment fall back to title-only comparison — imperfect but strictly better than duplicate-appending. All contract tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f65830da9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T00:53:03Z

-1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
-2. The finding includes a concrete `suggested_fix` that follows that cited pattern
-3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use
+If the contributing personas are all silent on action (e.g., a merged `manual` finding from personas that all flagged it as observation without recommendation) → `recommended_action: Apply` as the pragmatic default; the walk-through still lets the user pick any of the four options.


Do not default actionless manual findings to Apply

Step 3.5b sets recommended_action: Apply when contributing personas are “silent on action”, but manual findings are allowed to omit suggested_fix (3.7), and end-of-walk-through execution applies each Apply-set finding’s suggested_fix (walkthrough.md line 183). This means a merged manual observation with no concrete fix can still be auto-routed to Apply (including via LFG), yielding non-executable or empty apply actions. Gate Apply on suggested_fix presence (or default this branch to Defer/Skip) so every recommended action is executable.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-19T00:53:03Z

+A. Review each finding one by one — accept the recommendation or choose another action
+B. LFG. Apply the agent's best-judgment action per finding
+C. Append findings to the doc's Open Questions section and proceed
+D. Report only — take no further action


Suppress routing option C when append is unavailable

The routing question always includes option C (Append findings to the doc's Open Questions section and proceed), but open-questions-defer.md says this option must be suppressed when append_available: false (e.g., read-only document). In that environment, exposing C sends users into guaranteed append failures and per-finding fallback prompts instead of preventing the invalid path up front. Add the append-availability adaptation here so option C is omitted when append is known unavailable.

Useful? React with 👍 / 👎.

tmchow and others added 9 commits April 18, 2026 16:42

chatgpt-codex-connector bot reviewed Apr 19, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-doc-review/SKILL.md

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md

chatgpt-codex-connector bot reviewed Apr 19, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/open-questions-defer.md Outdated

chatgpt-codex-connector bot reviewed Apr 19, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated

Comment thread plugins/compound-engineering/skills/ce-doc-review/references/open-questions-defer.md Outdated

chatgpt-codex-connector bot reviewed Apr 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ce-doc-review): autofix tiers, per-finding walk-through, multi-round memory#601

feat(ce-doc-review): autofix tiers, per-finding walk-through, multi-round memory#601
tmchow wants to merge 12 commits intomainfrom
tmchow/doc-review-autofix

tmchow commented Apr 19, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 19, 2026

Uh oh!

chatgpt-codex-connector bot Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tmchow commented Apr 19, 2026

Summary

What changed

Autofix classification (three tiers, ce-code-review-aligned)

Interaction model (ported from ce-code-review)

In-doc Open Questions deferral (document-native Defer)

Multi-round decision memory

Terminal question (three options, separates "apply" from "re-review")

learnings-researcher rewrite (domain-agnostic)

ce-compound problem_type enum expansion

Scope boundaries

Testing

Review

Commits (9)

Post-deploy validation

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant