feat(ce-doc-review): autofix tiers, per-finding walk-through, multi-round memory #601

Open
tmchow wants to merge 12 commits into main from tmchow/doc-review-autofix



@tmchow tmchow commented Apr 19, 2026

Summary

Overhaul ce-doc-review to match the interaction quality of ce-code-review (post-PR #590). The current skill surfaces too many findings as "needs user judgment" when one clear fix exists, nitpicks at low confidence, and ends with a binary "refine / complete" question that forces re-review when the user wants to apply fixes and move on. This PR replaces binary classification with a three-tier system aligned with ce-code-review's names, ports the per-finding walk-through + bulk preview + routing question interaction, adds in-doc Open Questions deferral, introduces multi-round decision memory, rewrites ce-learnings-researcher to handle domain-agnostic institutional knowledge, and expands the ce-compound frontmatter problem_type enum.

Brainstorm: docs/brainstorms/2026-04-18-ce-doc-review-autofix-and-interaction-requirements.md (38 requirements, R31–R35 dropped from scope per pipeline-separation decision).

Plan: docs/plans/2026-04-18-001-feat-ce-doc-review-autofix-and-interaction-overhaul-plan.md (8 units, 4 phases).

What changed

Autofix classification (three tiers, ce-code-review-aligned)

autofix_class enum renamed from [auto, present] to [safe_auto, gated_auto, manual]:

  • safe_auto — one clear correct fix, applied silently
  • gated_auto — concrete fix exists but touches document meaning; user confirms via walk-through, with Apply marked (recommended)
  • manual — requires user judgment; multiple valid approaches

No advisory tier. Low-confidence manual findings (above a 0.40 FYI floor, below the severity gate) surface in a distinct FYI subsection at the presentation layer — observational value without forcing a decision.

Per-severity confidence gates replace the flat 0.50 floor: P0 0.50 / P1 0.60 / P2 0.65 / P3 0.75. Residual-concern promotion (old step 3.4) removed; replaced with cross-persona agreement boost (+0.10) on merged findings multiple personas flagged.
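The gate-plus-boost arithmetic above can be sketched in a few lines. This is an illustrative Python sketch only — the skill itself is prompt text, and the field names (`confidence`, `persona_count`, `severity`, `autofix_class`) and bucket labels are hypothetical:

```python
# Illustrative sketch of per-severity gating, the FYI floor, and the
# cross-persona agreement boost. Names are hypothetical, not the skill's code.
SEVERITY_GATES = {"P0": 0.50, "P1": 0.60, "P2": 0.65, "P3": 0.75}
FYI_FLOOR = 0.40
AGREEMENT_BOOST = 0.10  # applied once to merged findings multiple personas flagged

def route(finding):
    """Return 'review', 'fyi', or 'drop' for a merged finding."""
    confidence = finding["confidence"]
    if finding["persona_count"] > 1:
        confidence = min(1.0, confidence + AGREEMENT_BOOST)
    if confidence >= SEVERITY_GATES[finding["severity"]]:
        return "review"
    # Only manual findings survive below the gate, and only as FYI
    if finding["autofix_class"] == "manual" and confidence >= FYI_FLOOR:
        return "fyi"
    return "drop"
```

A P3 manual finding at 0.70 from one persona lands in FYI; the same finding flagged by two personas clears the 0.75 gate after the boost.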

Classification rule sharpened with a strawman-aware test: a "do nothing / accept the defect" alternative is the failure state, not a real alternative. When any genuine non-strawman alternative exists, safe_auto downgrades to gated_auto so the user sees the tradeoff before silent apply.

Framing guidance block added to the shared subagent template (observable-consequence-first, why-the-fix-works grounding, 2-4 sentence budget, positive/negative example pair). One file edit propagates to all 7 personas.

Interaction model (ported from ce-code-review)

After safe_auto fixes apply, the new four-option routing question dispatches to one of:

  • A. Review each finding one by one (per-finding walk-through)
  • B. LFG (best-judgment action per finding)
  • C. Append findings to the doc's Open Questions section
  • D. Report only

Per-finding walk-through renders each finding in two surfaces — a terminal markdown block (explanation) and a compact two-line question stem (decision). Four per-finding options: Apply / Defer / Skip / LFG-the-rest, with the post-tie-break recommendation marked (recommended). N=1 and no-append adaptations included.

Bulk preview shows the plan before any bulk action executes (LFG, Append-to-Open-Questions, or walk-through LFG-the-rest). Two options: Proceed / Cancel.

Differences from ce-code-review: no tracker-detection tuple or sink-availability logic (Defer is always in-doc append); no [TRACKER] label substitution; no fixer-subagent batch dispatch (the orchestrator applies edits inline); no advisory-variant Acknowledge option.

In-doc Open Questions deferral (document-native Defer)

When the user picks Defer on a finding, an entry appends to a ## Deferred / Open Questions section at the end of the document under review, under a timestamped subsection (### From YYYY-MM-DD review). Deferred findings stay attached to the document they came from — no external tracker, no sibling file. Failure path (read-only doc, write error, concurrent-edit collision) surfaces a blocking sub-question with Retry / Record-only / Convert-to-Skip options.
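The append mechanic (end-of-doc case only) can be sketched as follows. The function name, entry format, and bold-title idempotence key are assumptions for illustration, not the skill's actual implementation:

```python
# Hypothetical sketch of the in-doc Defer append with a timestamped
# subsection and title-collision idempotence. End-of-doc case only.
SECTION = "## Deferred / Open Questions"

def append_deferred(doc_text: str, title: str, body: str, date: str) -> str:
    """Append one deferred entry under a per-review dated subsection."""
    subsection = f"### From {date} review"
    if SECTION not in doc_text:
        doc_text = doc_text.rstrip() + f"\n\n{SECTION}\n"
    if subsection not in doc_text:
        doc_text = doc_text.rstrip() + f"\n\n{subsection}\n"
    # Idempotence: a title already present after this subsection is not re-appended,
    # guarding against LFG-the-rest re-routing or orchestrator retries
    if f"**{title}**" in doc_text.split(subsection, 1)[1]:
        return doc_text
    return doc_text.rstrip() + f"\n\n- **{title}** ({body})\n"
```

Same-day defers collapse into one subsection; a retried append of the same title is a no-op.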

Multi-round decision memory

Every review round after the first passes a cumulative decision primer to every persona. Rejected findings (Skip or Defer) from any prior round in the current session are carried in a <prior-decisions> block in the subagent template. Synthesis-level R29 suppression drops re-raised rejected findings using the same fingerprint + evidence-substring predicate as R30 fix-landed verification. No diff is passed — fixed findings self-suppress because evidence is gone from the current doc; regressions surface as normal findings; rejected findings are handled by pattern-match suppression. Cross-session persistence is out of scope.
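The shared R29/R30 matching predicate (fingerprint plus evidence-substring overlap above 50%) can be sketched as below. This is a hedged illustration — the field names and the crude word-level overlap measure are assumptions; the skill specifies the predicate in prose, not code:

```python
# Illustrative sketch of the fingerprint + evidence-substring predicate shared
# by R29 rejected-finding suppression and R30 fix-landed verification.
def is_reraise(current: dict, prior: dict) -> bool:
    """True when a current-round finding pattern-matches a prior-round entry."""
    if current["fingerprint"] != prior["fingerprint"]:
        return False
    # Materially-different exception: if the prior evidence no longer
    # substantially appears, treat the finding as new rather than suppressed
    words = prior["evidence"].split()
    hits = sum(1 for w in words if w in current["evidence"])
    return hits / max(len(words), 1) > 0.5
```

A finding whose section was rewritten enough that the old evidence quote is gone fails the overlap test and surfaces as new, matching the materially-different exception described above.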

Terminal question (three options, separates "apply" from "re-review")

The old binary "refine / complete" question replaced with:

  • Apply decisions and proceed to <next stage> — default when fixes applied; <next stage> is ce-plan for requirements docs, ce-work for plan docs
  • Apply decisions and re-review — opt-in when the user believes edits warrant another pass
  • Exit without further action

Zero-actionable case drops the re-review option and adapts the primary label to Proceed to <next stage>. Caller-context (ce-brainstorm nested invocation, ce-plan 5.3.8, standalone) is handled implicitly by the agent reading conversation state — no nested:true argument or caller-side change.

learnings-researcher rewrite (domain-agnostic)

ce-learnings-researcher rewritten for domain-agnostic institutional knowledge. Identity framing dropped "preventing repeated mistakes" for "find and distill applicable learnings." Accepts a structured <work-context> input (Activity / Concepts / Decisions / Domains) while preserving free-form fallback. Hardcoded Feature-Type-to-Directory table replaced with dynamic probe of docs/solutions/ at invocation time. Keyword extraction expanded from 4 dimensions (Module, Technical terms, Problem indicators, Component) to 8 (adds Concepts, Decisions, Approaches, Domains). critical-patterns.md read is now conditional on file existence (the file doesn't exist in this repo). Integration Points drops ce-doc-review — the agent is ce-plan-owned and ce-doc-review explicitly does not dispatch it (pipeline-separation principle per docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md).

ce-compound problem_type enum expansion

Four knowledge-track values added: architecture_pattern, design_pattern, tooling_decision, convention. Absorbs the best_practice overflow that had been a catch-all. Category mappings added. Both ce-compound and ce-compound-refresh schemas updated in sync. Plugin AGENTS.md gets a new "Documented Solutions" discoverability section naming all knowledge-track and bug-track categories. 10 existing best_practice entries migrated to narrower values; one correctness-gap schema violation resolved (docs/solutions/workflow/todo-status-lifecycle.md -> workflow_issue).

Scope boundaries

  • No external tracker integration. Defer is an in-doc section.
  • No persona activation/selection logic change. The 7 personas and conditional triggers stay as-is.
  • No batch fixer subagent. Document fixes apply inline.
  • No document-review-beta fork. Phased ship directly on the stable skill; headless envelope preserves backward compatibility with ce-brainstorm and ce-plan callers.
  • No cross-session primer persistence. In-memory across rounds within one session; fresh round 1 on new invocation.
  • No frontmatter schema dimensional redesign. Enum expansion only.

Testing

  • bun test: 798 pass, 0 fail. 13 new contract tests cover the three-tier enum, framing guidance, strawman rule, Open-Questions exclusion, decision primer, per-severity gates, FYI floor, cross-persona boost, R29 suppression, R30 verification, headless envelope tier buckets, terminal question three-option structure, label adaptation, Interactive-mode pre-load, walk-through and bulk-preview required mechanics, Open-Questions-defer append flow and failure path, schema enum expansion, schema duplicate-sync, yaml-schema category mappings, and learnings-researcher domain-agnostic framing.
  • bun run release:validate: in sync. compound-engineering currently has 50 agents, 43 skills, 0 MCP servers.
  • Seeded test fixture created at tests/fixtures/ce-doc-review/seeded-plan.md with known issues planted across tier shapes — 3 safe_auto candidates, 3 gated_auto, 5 manual, 5 FYI candidates, 3 drop-worthy P3s. Intended for pre-merge pipeline validation: run the new synthesis against this fixture and verify the bucket distribution matches expectations. Not yet exercised in this PR — doing so requires the new pipeline to be live in the installed skill cache. Recommend running once the PR lands and before promoting this into the default document-review flow.

Review

  • Brainstorm went through document-review (5 personas, 38 findings → 13 auto-applied, pushed back on 15 findings through 2 refinement rounds)
  • Plan went through recursive document-review against itself (5 personas, 38 findings → 9 auto-applied, resolved 15 findings through 2 refinement rounds including scope trim, tier rename, strawman safeguard, Open-Questions exclusion, primer structure spec, R30 matching predicate, pipeline sequencing fix, envelope-promotion-to-Unit-4, ce-plan-side validation for Unit 2)
  • Pipeline-separation learning consulted (docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md); drove decision to NOT dispatch learnings-researcher from ce-doc-review
  • State-machine discipline learning applied (docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md); finding-lifecycle state machine modeled explicitly in the plan with each transition narrated in synthesis prose
  • Post-PR-590 ce-code-review end-to-end learnings treated as mandatory reading (docs/solutions/best-practices/ce-pipeline-end-to-end-learnings-2026-04-17.md)

Commits (9)

Each unit shipped as its own commit so the progression is reviewable:

  1. docs(ce-doc-review): brainstorm + plan
  2. feat(ce-compound): Unit 1 — enum expansion + 10 migrations
  3. feat(learnings-researcher): Unit 2 — domain-agnostic rewrite
  4. feat(ce-doc-review): Unit 3 — subagent template upgrade + findings-schema enum
  5. feat(ce-doc-review): Unit 4 — synthesis pipeline with per-severity gates and headless envelope
  6. feat(ce-doc-review): Unit 5 — routing question + walk-through + bulk preview
  7. feat(ce-doc-review): Unit 6 — in-doc Open Questions deferral
  8. feat(ce-doc-review): Unit 7 — terminal question + multi-round decision primer
  9. test(ce-doc-review): Unit 8 — contract tests + framing polish

Post-deploy validation

  1. Run ce-doc-review against tests/fixtures/ce-doc-review/seeded-plan.md and verify bucket distribution: ≥2 of 3 seeded safe_auto applied silently, ≥2 of 3 seeded gated_auto in walk-through Apply bucket, all 3 drop-worthy P3s dropped by the 0.75 gate, ≥3 of 5 FYI candidates in the FYI subsection, zero false-auto on manual-shaped seeds.
  2. Re-run ce-doc-review against this PR's brainstorm doc (docs/brainstorms/2026-04-18-ce-doc-review-autofix-and-interaction-requirements.md) as a before/after comparison with the old pipeline; verify meaningful decision-count reduction without regressing previously-applied P0/P1 fixes.
  3. Run ce-plan against any plan doc and verify the rewritten learnings-researcher produces reasonable results across code-bug queries and non-code queries (skill-design, workflow). If output quality regresses, iterate via prompt edits in follow-up PRs.
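The bucket-distribution check in step 1 amounts to a handful of threshold assertions. A minimal Python sketch, assuming hypothetical bucket labels on the findings the pipeline run produces:

```python
# Sketch of the seeded-fixture validation gate; bucket label strings
# are hypothetical, the thresholds mirror the expectations in step 1.
from collections import Counter

def check_distribution(findings):
    buckets = Counter(f["bucket"] for f in findings)
    assert buckets["safe_auto_applied"] >= 2       # >=2 of 3 seeded safe_auto
    assert buckets["gated_auto_walkthrough"] >= 2  # >=2 of 3 seeded gated_auto
    assert buckets["dropped"] >= 3                 # all 3 drop-worthy P3s
    assert buckets["fyi"] >= 3                     # >=3 of 5 FYI candidates
    return buckets
```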

Fixes #562 spiritually (document-review analog of the ce-review interactive judgment overhaul).

🤖 Generated with Claude Code

tmchow and others added 9 commits April 18, 2026 16:42
…ion overhaul

Capture requirements brainstorm (38 requirements across 11 sections) and
implementation plan (8 units, 4 phases) for overhauling ce-doc-review:
three-tier autofix classification (safe_auto / gated_auto / manual)
aligned with ce-code-review names, per-severity confidence gates, ported
walk-through + bulk-preview + routing-question interaction, in-doc
Open Questions deferral, multi-round decision memory, domain-agnostic
learnings-researcher rewrite, and ce:compound problem_type enum
expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… / design_pattern / tooling_decision / convention

Adds four knowledge-track values to the ce:compound frontmatter schema
to absorb the best_practice overflow that had been serving as a catch-all
for architecture, design, tooling, and convention learnings. best_practice
remains valid but is now documented as the fallback for entries that no
narrower knowledge-track value covers.

Schema updates land in both ce-compound and ce-compound-refresh (duplicate
files are kept in sync intentionally). Category mappings added for the
four new values under docs/solutions/. Plugin AGENTS.md gets a new
"Documented Solutions" discoverability section naming all knowledge-track
and bug-track categories so agents reading project instructions know
the full taxonomy.

Migrates 10 existing best_practice entries to narrower values and resolves
one correctness-gap schema violation (workflow/todo-status-lifecycle.md
-> workflow_issue). Remaining best_practice entries are genuinely broad
enough to stay in that bucket (e.g., ce-pipeline-end-to-end-learnings).

bun test tests/frontmatter.test.ts passes.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… knowledge

The learnings-researcher was shaped around bug-hunting — its taxonomy,
category table, output framing, and DO/DON'T guidelines all assumed the
caller was looking for past defects. But the team captures more than
bugs: architecture patterns, design patterns, tooling decisions,
conventions, and workflow learnings are all first-class in docs/solutions/
now. Authors were retreating to best_practice as a catch-all because the
agent didn't reliably surface non-bug entries.

This rewrite treats all learning shapes equally:

- Identity framing dropped "preventing repeated mistakes" for "find and
  distill applicable learnings" — bug avoidance is one flavor, not the
  premise
- Accepts a structured <work-context> input (Activity / Concepts /
  Decisions / Domains) and still handles free-form text for callers that
  don't use the structured form
- Replaced the hardcoded Feature-Type-to-Directory table with a dynamic
  probe of docs/solutions/ at invocation time so new subdirectories
  (architecture-patterns/, design-patterns/, conventions/, etc.) are
  picked up automatically
- Expanded keyword extraction from 4 dimensions (Module, Technical terms,
  Problem indicators, Component) to 8 by adding Concepts, Decisions,
  Approaches, Domains — so design-pattern and convention queries have
  something to match on
- Made Step 3b (critical-patterns.md read) conditional on file existence
  — the file does not exist in this repo and demanding it was emitting
  warnings
- Reframed output to "Applicable Past Learnings" with a Type field so
  knowledge-track entries surface as themselves (architecture_pattern,
  design_pattern, etc.) instead of being flattened to a bug-shape
- Reworked Integration Points to drop the ce-doc-review reference (it
  explicitly does not dispatch this agent) and stop framing planning-time
  as the agent's primary home
- Updated DO/DON'T guidelines: don't discard candidates for missing
  bug-shaped fields, treat non-bug learnings as first-class, don't assume
  critical-patterns.md exists

Preserved the grep-first filtering strategy intact (it was already
efficient) and the overall 7-step flow. File grew from 246 to 283 lines
due to the expanded taxonomy and structured input documentation.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…tion, strawman rule, framing guidance

The persona subagent template was producing output that pushed too much
into 'present' (requires user judgment) because the classification rule
was permissive about what counted as a 'reasonable alternative' — (a)
real fix + (b) 'accept the defect' strawman + (c) 'document in release
notes' strawman looked like three valid approaches, inflating the
manual/present bucket. This upgrade sharpens classification and improves
persona output quality across all seven document-review personas via
one file edit.

Changes:

- autofix_class enum renamed and expanded: [auto, present] -> [safe_auto,
  gated_auto, manual]. Aligns names with ce-code-review's first three
  tiers so cross-skill vocabulary is consistent. Drops the advisory tier
  in favor of a presentation-layer FYI subsection handled by synthesis
- Three-tier routing spelled out with one-sentence descriptions and
  examples: safe_auto for single-option fixes, gated_auto for concrete
  fixes that touch meaning/scope (the new default for 'I know the fix
  but the author should sign off'), manual for genuine judgment calls
- Strawman-aware classification rule added: a 'do nothing / accept the
  defect' option is not a real alternative — it is the failure state the
  finding describes. Same for 'document in release notes' and similar
  sidesteps. Listed alternatives must be ones a competent implementer
  would genuinely weigh. Positive and negative example pair included
- Strawman safeguard on safe_auto: if safe_auto is chosen via
  strawman-dismissal, the dismissed alternatives must be named in
  why_it_matters. Any genuine non-strawman alternative downgrades the
  finding to gated_auto (user confirms) rather than silent apply
- Auto-promotion patterns consolidated into an explicit rule set:
  factually incorrect behavior, missing standard security/reliability
  controls, codebase-pattern-resolved fixes (with cited pattern),
  framework-native-API substitutions, mechanically-implied completeness
- Framing guidance block added (ported from ce-code-review's
  subagent-template): observable-consequence-first phrasing,
  why-the-fix-works grounding, 2-4 sentence budget, weak/strong
  illustrative pair. Tuned for document-review's surface — 'observable
  consequence' rather than 'observable behavior' since doc problems
  surface as disagreements, misreads, and wrong decisions rather than
  runtime behavior
- Persona exclusion rule added: personas must exclude '## Deferred /
  Open Questions' sections and their timestamped subsections from review
  scope. Prevents round-2 feedback loops where deferred entries get
  flagged as new findings or quoted as evidence for suppression checks.
  Unit 6 is going to create these sections as the document-native defer
  mechanic

Template grows from 52 to 95 lines, still under the 150-line @-inclusion
threshold so SKILL.md continues to use @-inclusion.

findings-schema.json autofix_class enum and description updated to match.

Part of the ce-doc-review autofix and interaction overhaul plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…s and FYI subsection

Rewrites the synthesis pipeline to route findings by the three new tiers
(safe_auto / gated_auto / manual) with per-severity confidence gates and
a distinct FYI subsection for low-confidence manual findings. This is
the load-bearing synthesis change that enables the interaction model
(Unit 5) and multi-round memory (Unit 7) to follow.

Pipeline changes:

- Step 3.2: replace flat 0.50 gate with per-severity table. P0 0.50,
  P1 0.60, P2 0.65, P3 0.75. FYI floor at 0.40: below-gate manual
  findings above the floor survive in an FYI subsection rather than
  being dropped — preserves observational value without forcing decisions
- Step 3.4: delete residual-concern promotion. Replace with cross-persona
  agreement boost (+0.10, capped at 1.0) applied to merged findings that
  multiple personas flagged. Mirrors ce-code-review stage 5 step 4.
  Residual concerns surface in Coverage only; below-gate findings are
  not promoted back into the review surface
- Step 3.5: tier language updated for three tiers. Contradictions route
  to manual with both perspectives
- Step 3.6: auto-promotion patterns consolidated into an explicit rule
  set: codebase-pattern-resolved, factually incorrect behavior, missing
  standard security/reliability controls, framework-native-API
  substitutions, mechanically-implied completeness additions. Promotion
  targets safe_auto or gated_auto depending on whether the addition is
  substantive. Strawman-downgrade safeguard: if safe_auto was chosen via
  strawman-dismissal per the subagent template, synthesis verifies the
  dismissed alternatives were genuinely strawmen and downgrades to
  gated_auto if any were plausible design choices
- Step 3.7: routing rewritten for three tiers + FYI. safe_auto applies
  silently in Phase 4. gated_auto enters walk-through with Apply
  recommended. manual enters walk-through with user-judgment framing.
  FYI-subsection routes outside the walk-through entirely
- State-machine narration added to the phase header so each transition
  (Raised -> Classified -> one of four buckets) is a named step in
  synthesis prose, not an implied carry-forward
- R30 fix-landed matching predicate specified explicitly: fingerprint +
  evidence-substring overlap >50%. Section renames count as different
  locations so a rename-caused re-raise is treated as new, not as
  'fix did not land'

Phase 4 updates:

- Headless envelope extended with distinct 'Gated-auto findings' and
  'FYI observations' sections alongside existing 'Applied N safe_auto
  fixes' and 'Manual findings' sections. Backward-compatible: callers
  that only read the old sections continue to work; new sections surface
  alongside, not instead of, the existing buckets. Landing the envelope
  change here (Unit 4) rather than Unit 8 means ce-plan 5.3.8 headless
  invocations never observe an interstitial state where new tiers are
  produced but the envelope can't carry them
- Interactive presentation split by tier and finding type. FYI observations
  get their own subsection. Applied safe_auto fixes list every change
  with section + reviewer attribution
- Phase 5 simplified: handoff to Unit 5 walkthrough.md for routing
  question and per-finding interaction. Terminal question (Unit 7's
  three-option structure) fires after walk-through or bulk action

Creates tests/fixtures/ce-doc-review/seeded-plan.md — a test fixture
plan doc with known issues planted across tier shapes (3 safe_auto, 3
gated_auto, 5 manual, 5 FYI candidates, 3 drop-worthy P3s). Used for
the validation gate: run the new pipeline against this fixture and
verify bucket distribution matches expectations before shipping Phase 2.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 4
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…and bulk preview

Ports ce-code-review's interactive flow (post-PR #590) into ce-doc-review
while adapting for document-specific semantics. Replaces the current
binary 'refine / complete' Phase 5 question with a four-option routing
question that dispatches to focused interactions sized to the user's
intent.

New references:

- walkthrough.md — the per-finding walk-through that fires when the user
  picks routing option A. Each finding renders in two surfaces: a
  terminal markdown block carrying the explanation (heading, What's
  wrong, Proposed fix, Why it works) and a compact two-line question
  stem carrying the decision. Four per-finding options: Apply / Defer
  (append to Open Questions) / Skip / LFG-the-rest. Post-tie-break
  recommendation marked with '(recommended)'. N=1 and no-append
  adaptations included. Unified completion report covers walk-through,
  LFG, Append-to-Open-Questions, and zero-findings terminal paths

- bulk-preview.md — the compact plan preview before every bulk action
  (routing B, routing C, walk-through LFG-the-rest). Grouped buckets
  (Applying / Appending to Open Questions / Skipping) with one-line
  plain-English summary per finding. Two options: Proceed or Cancel.
  Cancel semantics route back to the originating question (routing or
  per-finding). Proceed executes the plan and emits the unified
  completion report

Differences from ce-code-review's equivalents:

- No tracker-detection tuple or sink-availability logic — Defer is always
  an in-doc append, no external tracker
- No [TRACKER] label substitution — bucket is always 'Appending to Open
  Questions'
- No fixer-subagent batch dispatch — the orchestrator applies
  gated_auto/manual fixes inline (document edits are single-file
  markdown with no cross-file dependencies)
- No advisory variant with Acknowledge — advisory-style findings surface
  in the FYI subsection at the presentation layer, outside the
  walk-through
- No .context/ artifact-lookup paths — ce-doc-review personas don't
  write run artifacts
- Observable-consequence phrasing tuned for document surface (readers,
  implementers, downstream decisions) rather than ce-code-review's
  runtime-behavior framing

SKILL.md changes:

- Added 'Interactive mode rules' section at the top with the
  AskUserQuestion pre-load directive and numbered-list fallback rule.
  Mirrors ce-code-review's approach for reliable question-tool loading
  on Claude Code
- Updated the Phases 3-5 handoff to load walkthrough.md and
  bulk-preview.md after agent dispatch completes, lazily via backtick
  paths per the skill compliance checklist

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 5
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Implements the Defer action's document-native analogue to ce-code-review's
tracker-ticket mechanic. When the user chooses Defer on a finding (from
the walk-through or from the bulk-preview Append-to-Open-Questions path),
an entry appends to a '## Deferred / Open Questions' section at the end
of the document under review.

Design decisions:

- In-doc append instead of sibling follow-up file or external tracker:
  deferred findings stay attached to the document they came from,
  naturally discoverable by anyone reading the doc, no new infrastructure
  required. Trade-off: deferred findings visibly mutate the doc — but
  that's the point, an Open Questions section is a canonical place for
  'worth remembering but not acting on now' in planning docs

- Section location handling covers three cases: end-of-doc heading (most
  common — append inside), mid-doc heading (user positioned deliberately
  — still append at that location, don't duplicate at end), absent
  (create at end, or above trailing separators/footers, or after
  frontmatter for frontmatter-only docs)

- Timestamped subsections (### From YYYY-MM-DD review) group multiple
  defers per review and keep sequential reviews distinguishable. Same-day
  reviews within a session share one subsection; cross-day reviews get
  distinct subsections

- Entry format: title, severity, reviewer attribution, confidence, and
  the why_it_matters framing. Intentionally excludes suggested_fix and
  evidence — those live in the run artifact when applicable; the in-doc
  entry is a concern summary for the reader returning months later, not
  a full decision packet

- Idempotence on title collisions within the same subsection prevents
  duplicate entries from LFG-the-rest re-routing or orchestrator retry
  paths

- Concurrent-edit safety: re-read document from disk before every
  append to reduce collision window with user-in-editor writes. No
  persistent lock — interactive review needs observation-before-write,
  not coordination

- Failure path via blocking sub-question (Retry / Record-only / Convert-
  to-Skip) ensures no silent failures. Default-to-Record-only when the
  sub-question goes unanswered keeps in-memory state consistent

- Upstream availability signal cached at Phase 4 start — known-unwritable
  documents suppress the Defer option in the walk-through menu and
  routing question; mid-flow failures route through the per-finding
  failure path without flipping the session signal

Referenced by references/walkthrough.md (per-finding Defer option) and
references/bulk-preview.md (routing option C Proceed). Includes a
worked example showing the appended-content format in context.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 6
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…nd decision primer

Replaces the current binary 'refine / complete' Phase 5 question with a
three-option terminal question that separates 'apply decisions' from
're-review' — the most common user frustration the brainstorm surfaced
was that the old wording forced a re-review even when the user wanted
to apply fixes and move on.

Terminal question:

- 'Apply decisions and proceed to <next stage>' — default when fixes
  were applied or decisions were made. <next stage> substitutes to
  ce-plan for requirements docs and ce-work for plan docs
- 'Apply decisions and re-review' — opt-in re-review when the user
  believes their edits warrant another pass
- 'Exit without further action' — user wants to stop for now

When fixes_applied_count == 0 (zero-actionable case or all-Skip walk-
through), the 're-review' option drops (re-review is useless when
nothing changed) and the 'Apply decisions and' prefix drops from the
primary option's label so it matches what the system is doing.

Caller-context handling is implicit rather than requiring a formal
nested:true argument. The agent interprets 'Proceed to <next stage>'
contextually from the visible conversation state — when invoked from
inside another skill's flow, it returns control to the caller; when
standalone, it dispatches the next stage skill. No caller-side change
to ce-brainstorm or ce-plan. If implicit handling proves unreliable,
an explicit nested:true flag can be added as a follow-up.

Multi-round decision memory:

- SKILL.md Phase 2 adds a new {decision_primer} template variable that
  accumulates prior-round decisions across the session. Round 1 renders
  an empty <prior-decisions> block; round 2+ renders applied and
  rejected entries by round with section, title, reviewer, confidence,
  and rejection reason. Skip and Defer both count as 'rejected' for
  suppression purposes. Cross-session persistence is out of scope —
  fresh invocations start a fresh round 1
- synthesis-and-presentation.md adds an R29 suppression step. For each
  current-round finding, compare against the primer's rejected list
  using the same fingerprint + evidence-substring predicate as R30.
  On match, drop the current-round finding and log the suppression in
  Coverage. Materially-different exception: if the section was edited
  enough that the prior-round evidence quote no longer appears in the
  current document, treat as new. Synthesis is the authoritative gate;
  the persona instruction below is a soft hint
- subagent-template.md adds a <decision-primer-rules> block that
  instructs personas to honor the primer: don't re-raise a finding whose
  title and evidence pattern-match a prior-round rejected entry unless
  materially different. Prior-round Applied findings are informational
  — the orchestrator verifies those landed via R30. Round 1 runs with
  no primer constraints

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 7
of 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ugh, defer, enum expansion

Extends pipeline-review-contract.test.ts with structural assertions for
the ce-doc-review overhaul. Uses pattern matching on key tokens rather
than exact prose so future wording improvements don't ossify the tests.

New test groups:

- ce-doc-review contract (8 tests):
  - findings-schema autofix_class enum uses [safe_auto, gated_auto,
    manual] — aligned with ce-code-review's first three tier names,
    no advisory tier, no legacy auto/present values
  - subagent template carries framing guidance, strawman rule, strawman
    safeguard, Deferred/Open-Questions exclusion, and decision-primer
    slot + rules
  - synthesis pipeline has per-severity gates (P0 0.50 / P1 0.60 /
    P2 0.65 / P3 0.75), FYI floor at 0.40, cross-persona agreement
    boost (+0.10), three-tier routing, R29 rejected-finding suppression,
    R30 fix-landed matching predicate
  - headless envelope surfaces new tiers distinctly (Applied N safe_auto
    fixes / Gated-auto findings / Manual findings / FYI observations)
    with preserved 'Review complete' terminal signal
  - terminal question is three-option by default with label adaptation
    in the zero-actionable case; next-stage substitution rules
    documented
  - SKILL.md has Interactive mode rules with AskUserQuestion pre-load
    via ToolSearch, decision_primer variable in dispatch, lazy
    references to walkthrough.md and bulk-preview.md
  - walkthrough has routing question with front-loaded distinguishing
    words, four per-finding options, (recommended) marker, no advisory
    variant, no tracker-detection machinery; bulk-preview has Proceed/
    Cancel with three bucket labels, no Acknowledging bucket
  - open-questions-defer implements the append mechanic with timestamped
    subsections, field-level entry format, failure-path sub-question
    (Retry / Record-only / Convert-to-Skip), no tracker-detection logic

- ce-compound frontmatter schema expansion contract (3 tests):
  - problem_type enum includes architecture_pattern, design_pattern,
    tooling_decision, convention — with best_practice retained as
    fallback
  - ce-compound-refresh schema is byte-identical to canonical
    ce-compound schema (kept in sync per AGENTS.md duplicate convention)
  - yaml-schema.md documents category mappings for the four new values

- ce-learnings-researcher domain-agnostic contract (1 test):
  - Agent prompt frames as domain-agnostic (not bug-focused), names
    Architecture / Design / Tooling / Conventions as first-class
    learning shapes, accepts structured <work-context> input, replaces
    the hardcoded category table with a dynamic probe, reads
    critical-patterns.md conditionally, and drops ce-doc-review from
    Integration Points (agent is ce-plan-owned)
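The per-severity gates, FYI floor, and agreement boost those tests assert on can be sketched like this; the thresholds come from the contract description above, while the function and type names are assumed:

```typescript
// Illustrative routing sketch: per-severity confidence gates, a 0.40 FYI
// floor, and a +0.10 cross-persona agreement boost.
type Severity = "P0" | "P1" | "P2" | "P3";
type Route = "actionable" | "fyi" | "dropped";

const GATES: Record<Severity, number> = { P0: 0.5, P1: 0.6, P2: 0.65, P3: 0.75 };
const FYI_FLOOR = 0.4;
const AGREEMENT_BOOST = 0.1;

function routeFinding(severity: Severity, confidence: number, personaCount: number): Route {
  // Two or more personas raising the same finding earns the +0.10 boost.
  const effective =
    personaCount >= 2 ? Math.min(1, confidence + AGREEMENT_BOOST) : confidence;
  if (effective >= GATES[severity]) return "actionable";
  if (effective >= FYI_FLOOR) return "fyi"; // observational, no decision forced
  return "dropped";
}
```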

All 13 new tests pass. Full suite: 798 pass, 0 fail. bun run
release:validate: in sync.

Part of the ce-doc-review autofix and interaction overhaul plan. Unit 8
of 8 — final unit; this ships the contract tests that lock in the
three-tier / multi-round / interaction-model changes shipped in Units 3
through 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d897823f52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-doc-review/SKILL.md
- Preserve evidence snippets in decision primer entries (thread r3106059451, P1)
  R29 rejected-finding suppression and R30 fix-landed verification both
  use fingerprint + evidence-substring overlap (>50%) as their matching
  predicate. The primer format in SKILL.md previously stored only
  section/title/reviewer/confidence/reason, leaving the orchestrator
  without prior-round evidence needed to compute the overlap check.
  Adds an 'Evidence:' line to each primer entry containing the first
  evidence quote truncated to ~120 chars (word-boundary-preserving,
  internal quotes escaped). Documented inline why the field is required.

- Align review-output-template.md with new three-tier taxonomy (thread
  r3106059452, P2)
  The interactive presentation template still referenced the legacy
  'auto/present' contract — summary line, 'Auto-fixes Applied' heading,
  'Auto | Present' coverage columns, and rules text all leaked stale
  bucket semantics into user-facing output. Rewrites the template for
  the new taxonomy: 'Applied N safe_auto fixes' summary line, a Tier
  column in severity tables, a dedicated 'FYI Observations' section,
  and Coverage columns for SafeAuto | GatedAuto | Manual | FYI |
  Residual. Residual Concerns section clarified: no longer promoted
  via residual-concern promotion (dropped in Unit 4) — listed for
  transparency only. Adds a note up-front that this template covers
  Phase 4 interactive presentation specifically; the headless envelope
  lives in synthesis-and-presentation.md.
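The ~120-char, word-boundary-preserving, quote-escaping truncation described for the primer's 'Evidence:' line might look like this sketch (the exact limit and escaping rules are assumptions from the commit text):

```typescript
// Hypothetical construction of the primer 'Evidence:' field value.
function primerEvidence(quote: string, limit = 120): string {
  let out = quote;
  if (out.length > limit) {
    const cut = out.slice(0, limit);
    const lastSpace = cut.lastIndexOf(" ");
    out = lastSpace > 0 ? cut.slice(0, lastSpace) : cut; // back up to a word boundary
  }
  return out.replace(/"/g, '\\"'); // escape internal quotes for the primer entry
}
```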

All contract tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Reviewed commit: df79db5015


Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
Both findings fall in the same pattern the prior round surfaced:
high-level rules written without the deterministic mechanics the
orchestrator needs to actually implement them. Cross-invocation
signal from resolve-pr-feedback flagged the theme.

- Add deterministic recommended-action tie-break to synthesis
  (thread r3106064985, P1)
  walkthrough.md said 'recommended action has already been normalized
  by synthesis step 3.6' but step 3.6 only handled auto-promotion;
  there was no rule for picking Apply/Defer/Skip when contributing
  personas implied different actions. Without a tie-break, LFG
  outcomes are non-deterministic across runs and walk-through
  '(recommended)' marks drift.

  Adds synthesis step 3.5b 'Deterministic Recommended-Action Tie-Break'
  implementing 'Skip > Defer > Apply' (most conservative first). Scans
  contributing personas in that order; first match wins. Documents
  persona-to-action mapping (safe_auto/gated_auto -> Apply, manual
  tradeoffs -> Defer, residual-suppression-eligible -> Skip,
  contradiction-set with 'keep as-is' -> Skip). Silent-personas
  fallback to Apply. Records a conflict-context string on the merged
  finding for the walk-through's R15 conflict-context line. Updates
  walkthrough.md's reference from 'synthesis step 3.6' to the correct
  '3.5b'.

  Invariant: walk-through and bulk-preview read recommended_action
  and never recompute. LFG-the-rest and routing option B execute the
  recorded recommendation in bulk, so the same review artifact
  always produces the same bulk plan.

- Strengthen dedup key in open-questions-defer.md (thread r3106064987,
  P2)
  Title-only idempotence silently drops legitimate deferred entries
  when two different concerns share a short title in the same review
  date. Loses user-visible backlog context.

  Compound key is now 'normalize(section) + normalize(title) +
  evidence_fingerprint', mirroring the R29/R30 matching predicate
  (fingerprint + evidence-substring overlap) so cross-round and
  intra-round dedup behave consistently. Evidence fingerprint is the
  first ~120 chars of the finding's first evidence quote (same slice
  used in the decision primer). Falls back to section+title when
  evidence is unavailable.
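The step 3.5b tie-break and its persona-to-action mapping can be sketched as follows; the mapping and the 'Skip > Defer > Apply' order follow the commit text, while the type and function names are illustrative:

```typescript
// Hypothetical sketch of the deterministic recommended-action tie-break.
type Action = "Apply" | "Defer" | "Skip";

interface PersonaFinding {
  autofixClass?: "safe_auto" | "gated_auto" | "manual";
  hasTradeoffs?: boolean;            // manual finding with genuine tradeoffs
  residualSuppressionEligible?: boolean;
  keepAsIsContradiction?: boolean;   // contradiction-set resolved as 'keep as-is'
}

function impliedAction(p: PersonaFinding): Action | null {
  if (p.residualSuppressionEligible || p.keepAsIsContradiction) return "Skip";
  if (p.autofixClass === "manual" && p.hasTradeoffs) return "Defer";
  if (p.autofixClass === "safe_auto" || p.autofixClass === "gated_auto") return "Apply";
  return null; // persona silent on action
}

function recommendedAction(personas: PersonaFinding[]): Action {
  // Most conservative first: any Skip wins, then Defer, then Apply.
  for (const want of ["Skip", "Defer", "Apply"] as Action[]) {
    if (personas.some((p) => impliedAction(p) === want)) return want;
  }
  return "Apply"; // silent-personas fallback per the commit text
}
```

Because the walk-through and bulk preview read `recommended_action` and never recompute, this pure function is what makes LFG bulk plans repeatable across runs of the same review artifact.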

All contract tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Reviewed commit: afc699da8f


Comment thread plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md Outdated
Both findings continue the pattern: specification gaps in deterministic
orchestration mechanics. Different gaps each round — Codex keeps
finding adjacent spots where high-level rules need concrete
implementation detail.

- Drop routing-level recommendation labeling (thread r3106084407, P2)
  walkthrough.md's routing question said any of A/B/C/D could be
  marked '(recommended)' based on synthesis, but synthesis only
  defines per-finding recommended_action — there was no rule for
  picking a routing-level recommendation. Rather than invent one
  (a mapping from finding-set shape to routing recommendation would
  pressure users toward automated paths in ways that conflict with
  the user-intent framing of the four options), remove the routing-
  level (recommended) labeling. Per-finding (recommended) inside the
  walk-through and bulk preview still applies — that's deterministic
  via step 3.5b. The routing choice itself is user-intent:
  engage / trust / triage / skim.

- Persist dedup-key fields in the appended entry (thread r3106084409,
  P2)
  Previous round added a compound dedup key (section + title +
  evidence_fingerprint) but the deferred-entry format on disk only
  stored title/severity/reviewer/confidence/why_it_matters — so
  retries or same-day reruns couldn't reliably reconstruct the key
  from persisted data. Duplicates could still slip through.

  Adds an HTML-comment line on each appended entry carrying the
  machine-readable dedup fields: <!-- dedup-key: section="..."
  title="..." evidence="..." -->. Keeps the visible content
  reader-friendly (section now surfaces in the header line alongside
  severity/reviewer/confidence; the HTML comment stays hidden in
  rendered markdown) while persisting exactly the fields Step 4's
  compound-key check needs. Legacy entries without the comment fall
  back to title-only comparison — imperfect but strictly better than
  duplicate-appending.
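Reading the dedup-key comment back off a persisted entry, with the legacy title-only fallback, might look like this sketch; the field names follow the format quoted above, but the regex and parsing details are assumptions:

```typescript
// Hypothetical parser for the persisted dedup-key HTML comment.
interface DedupKey { section?: string; title: string; evidence?: string }

function parseDedupKey(entryText: string, visibleTitle: string): DedupKey {
  const m = entryText.match(
    /<!--\s*dedup-key:\s*section="([^"]*)"\s*title="([^"]*)"\s*evidence="([^"]*)"\s*-->/,
  );
  if (m) return { section: m[1], title: m[2], evidence: m[3] };
  // Legacy entry without the comment: title-only comparison.
  return { title: visibleTitle };
}
```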

All contract tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Reviewed commit: f65830da9b


1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
2. The finding includes a concrete `suggested_fix` that follows that cited pattern
3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use
If the contributing personas are all silent on action (e.g., a merged `manual` finding from personas that all flagged it as observation without recommendation) → `recommended_action: Apply` as the pragmatic default; the walk-through still lets the user pick any of the four options.


P1: Do not default actionless manual findings to Apply

Step 3.5b sets recommended_action: Apply when contributing personas are “silent on action”, but manual findings are allowed to omit suggested_fix (3.7), and end-of-walk-through execution applies each Apply-set finding’s suggested_fix (walkthrough.md line 183). This means a merged manual observation with no concrete fix can still be auto-routed to Apply (including via LFG), yielding non-executable or empty apply actions. Gate Apply on suggested_fix presence (or default this branch to Defer/Skip) so every recommended action is executable.


Comment on lines +20 to +23
A. Review each finding one by one — accept the recommendation or choose another action
B. LFG. Apply the agent's best-judgment action per finding
C. Append findings to the doc's Open Questions section and proceed
D. Report only — take no further action


P2: Suppress routing option C when append is unavailable

The routing question always includes option C (Append findings to the doc's Open Questions section and proceed), but open-questions-defer.md says this option must be suppressed when append_available: false (e.g., read-only document). In that environment, exposing C sends users into guaranteed append failures and per-finding fallback prompts instead of preventing the invalid path up front. Add the append-availability adaptation here so option C is omitted when append is known unavailable.
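A sketch of the append-availability adaptation the comment asks for, with option labels abbreviated from the quoted routing question (`routingOptions` and the `appendAvailable` flag name are assumptions):

```typescript
// Hypothetical construction of the routing question's option list.
function routingOptions(appendAvailable: boolean): string[] {
  const options = [
    "A. Review each finding one by one",
    "B. LFG. Apply the agent's best-judgment action per finding",
    "C. Append findings to the doc's Open Questions section and proceed",
    "D. Report only",
  ];
  // Suppress option C when the document cannot be appended to (e.g. read-only).
  return appendAvailable ? options : options.filter((o) => !o.startsWith("C."));
}
```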




Development

Successfully merging this pull request may close these issues.

What is the best workflow after ce:review - worth introducing a ce:fix?

1 participant