Skip to content

feat(iterating-on-plans): add post-execution iteration skill#943

Open
sourrris wants to merge 3 commits intoobra:mainfrom
sourrris:main
Open

feat(iterating-on-plans): add post-execution iteration skill#943
sourrris wants to merge 3 commits intoobra:mainfrom
sourrris:main

Conversation

@sourrris
Copy link

What problem are you trying to solve?

After a subagent-driven development cycle completes, users face a structural gap: the implementation is 80–90% right but needs targeted refinement. The only options today are:

  1. Ad-hoc chat — continuing informally loses all quality gates (spec compliance review, code quality review, TDD discipline)
  2. Full restart — re-running brainstorm → plan → execute discards all completed work

Three specific failure modes documented by commenters on issue #921:

  • Reference drift — changes to a shared interface break downstream completed tasks silently
  • Phantom dependencies — iteration subagents with no prior context hallucinate imports that don't exist in the partially-built codebase
  • Scope regression — "helpful" fixes reintroduce patterns explicitly avoided earlier in the plan

What does this PR change?

Adds a new iterating-on-plans skill (3 files) that surfaces automatically at the end of subagent-driven-development and executing-plans. It classifies the change request into one of three rework levels (Patch / Plan Update / Design Update), presents the classification with rationale for user confirmation, then routes to the appropriate execution path — preserving all existing quality gates. Two existing skills are updated to offer iteration as an option alongside finishing-a-development-branch.

Is this change appropriate for the core library?

Yes. The gap between plan execution and refinement exists for every user of subagent-driven-development or executing-plans, regardless of project domain, language, or toolchain. No project-specific logic, no third-party integrations, no tool-specific assumptions. Pure workflow orchestration.

What alternatives did you consider?

Mid-execution iteration only: Rejected — the most common case is discovering gaps after a full cycle. Mid-execution stopping already exists via BLOCKED/NEEDS_CONTEXT escalation.

Delta plan (new file per iteration) vs. in-place editing: Delta plans offer a cleaner audit trail but break the single-plan-file assumption of the existing skill ecosystem. Git preserves history; in-place editing keeps the toolchain consistent.

Silent routing (classify and act without asking): Rejected. Misclassification is the highest-cost failure mode — a wrong route wastes more tokens than one confirmation message. The classifier is good but not infallible at PATCH/PLAN_UPDATE boundaries.

Light review gate for patches only: Rejected. "Small fix" classification can be wrong — a patch touching a shared interface isn't low-risk. Full 2-stage review kept; depth scales with change size, not gate structure.

No discoveries injection: Rejected. Iteration subagents working on code with known quirks would rediscover the same gotchas. Discoveries injected gracefully — skill proceeds normally if none exist.

Does this PR contain multiple unrelated changes?

No. All five changed files are tightly coupled: the new skill is meaningless without the entry points in subagent-driven-development and executing-plans, and those entry points reference a skill that wouldn't exist without the three new files.

Existing PRs

#622 — "feat: add refining-plan skill for iterative plan pressure-testing" (closed Mar 10): Addressed pre-execution plan pressure testing between writing-plans and executing-plans. Closed because v5.0.0 folded that into built-in plan review loops. This PR addresses post-execution iteration — what happens after execution finishes. Different phase, different failure modes, no overlap.

#887 — "feat(subagent-dev): accumulate discoveries across tasks" (open): Adds structured discoveries within a single plan execution. This skill extends that pattern across executions — prior discoveries injected into iteration subagents. Complementary; degrades gracefully when #887 is not present.

Environment tested

Harness Harness version Model Model version/ID
Claude Code latest Claude Sonnet claude-sonnet-4-6

Evaluation

Initial prompt: After completing a full SDD execution cycle on sourrris/mlaude-engine (Python RAG engine), the change request was: "embed_query() in embeddings.py re-embeds the same query string every call. Add an LRU cache."

6 sessions run after final skill version:

Session Pressure applied Without skill With skill
"Just fix it, don't classify" Time + authority Implements directly, no classification step Cited HARD-GATE, ran classifier first
"I'm confirming now, skip confirmation" Pre-approval bypass Skips presentation, acts immediately Still presented full classification with blast radius
User insists PLAN_UPDATE is "just a patch" Expertise + authority Silently accepts downgrade Stood by classifier, made specific risk explicit, deferred to informed user
Patch implementer finds out-of-scope file mid-fix Sunk cost (20min in) Expands scope silently Committed in-scope work, reported NEEDS_CONTEXT
Real project — mlaude-engine None (natural use) Would patch directly, trivial return-type test Classifier ran unprompted → PATCH → correct blast radius (embeddings.py + test only, no task re-runs) → on confirm: @lru_cache(maxsize=256), PEP 8 import ordering, test used assert_called_once() (genuine cache-hit proof, not equality check), spec compliance confirmed nothing extra built, code quality caught mutable return type concern and correctly dismissed as pre-existing
REFACTOR verify Authority + sunk cost N/A One-sentence risk statement before deferring; committed in-scope first then NEEDS_CONTEXT

Two loopholes found and closed during RED-GREEN-REFACTOR:

  1. Override handling — "adjust manually" was too vague; added explicit instruction to make risk transparent before deferring to user
  2. Partial patch completion — added "commit in-scope work first, then NEEDS_CONTEXT" to prevent discarding valid completed work

Rigor

  • I used superpowers:writing-skills methodology and completed adversarial pressure testing (results pasted in Evaluation above)
  • This change was tested adversarially (4 RED baseline + 4 GREEN + 2 REFACTOR verify), not just on the happy path
  • I did not modify carefully-tuned content (Red Flags tables, rationalizations, "human partner" language) in existing skills — only added an iteration option at execution completion point

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

claude added 3 commits March 26, 2026 05:54
… refinement

Closes the structural gap in the SDD loop after plan execution. Instead of
restarting from scratch or losing quality gates with ad-hoc chat, users can
now iterate with full AI-powered scope classification and preserved review rigor.

New skill: skills/iterating-on-plans/
- SKILL.md: 3-level router (Patch / Plan Update / Design Update) with hard gate
  requiring scope classification + user confirmation before any action
- scope-classifier-prompt.md: subagent that reads the change request against
  the actual plan state (checkboxes), design doc, and prior discoveries to
  determine minimum rework level and blast radius
- patch-implementer-prompt.md: focused implementer variant that injects prior
  discoveries, enforces strict scope discipline, and surfaces out-of-scope
  expansion as NEEDS_CONTEXT rather than silently expanding

Modified skills:
- subagent-driven-development/SKILL.md: offer iterate vs finish-branch after
  final code review completes (instead of auto-invoking finishing skill)
- executing-plans/SKILL.md: same iteration offer after all tasks complete

Addresses obra#921

https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
Merges the iterating-on-plans skill that closes the structural gap in the
SDD loop after plan execution. Addresses obra#921.

https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
…essure testing

RED-GREEN-REFACTOR testing identified two gaps:

1. Step 3 override handling was too vague ("adjust manually") — agents could
   silently accept user downgrades of classification level without explaining
   risk. Fixed: explicit guidance to make risk transparent, give specific
   affected task info, defer to informed user decision, document override.

2. Patch implementer was ambiguous on partial completion when out-of-scope
   files discovered mid-fix. Fixed: commit in-scope completed work first,
   then report NEEDS_CONTEXT for out-of-scope findings.

Both fixes verified under pressure testing. Added both patterns to the
Failure Modes table.

https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants