feat(iterating-on-plans): add post-execution iteration skill by sourrris · Pull Request #943 · obra/superpowers

sourrris · 2026-03-26T19:16:31Z

What problem are you trying to solve?

After a subagent-driven development cycle completes, users face a structural gap: the implementation is 80–90% right but needs targeted refinement. The only options today are:

- Ad-hoc chat: continuing informally loses all quality gates (spec compliance review, code quality review, TDD discipline).
- Full restart: re-running brainstorm → plan → execute discards all completed work.

Three specific failure modes documented by commenters on issue #921:

- Reference drift: changes to a shared interface silently break downstream completed tasks.
- Phantom dependencies: iteration subagents with no prior context hallucinate imports that don't exist in the partially-built codebase.
- Scope regression: "helpful" fixes reintroduce patterns explicitly avoided earlier in the plan.
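
To make the phantom-dependency failure mode concrete, here is a minimal sketch of how one might detect imports that do not resolve in the current environment. This is an illustration only, not part of the skill: the function name `unresolved_imports` is hypothetical, and it checks top-level module names only.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that cannot be found."""
    missing = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped; they need package context.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if importlib.util.find_spec(top_level) is None:
                missing.add(top_level)
    return sorted(missing)
```

A check like this only catches modules that are absent entirely; a hallucinated name inside a real module would need a deeper check.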

What does this PR change?

Adds a new iterating-on-plans skill (3 files) that surfaces automatically at the end of subagent-driven-development and executing-plans. It classifies the change request into one of three rework levels (Patch / Plan Update / Design Update), presents the classification with rationale for user confirmation, then routes to the appropriate execution path — preserving all existing quality gates. Two existing skills are updated to offer iteration as an option alongside finishing-a-development-branch.
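
The classify, confirm, then route flow described above can be sketched roughly as follows. The skill itself is written as prompts, not Python, so every name here (`ReworkLevel`, `Classification`, `route`, and the route descriptions) is a hypothetical stand-in, and the Design Update route in particular is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ReworkLevel(Enum):
    PATCH = "patch"
    PLAN_UPDATE = "plan_update"
    DESIGN_UPDATE = "design_update"


@dataclass(frozen=True)
class Classification:
    level: ReworkLevel
    rationale: str
    blast_radius: tuple[str, ...]  # files the change is expected to touch


# Hypothetical route descriptions; the real skill delegates to prompts and other skills.
ROUTES = {
    ReworkLevel.PATCH: "run patch implementer, then both review stages",
    ReworkLevel.PLAN_UPDATE: "edit the plan in place, re-run affected tasks",
    ReworkLevel.DESIGN_UPDATE: "revise the design doc, then re-plan",
}


def route(classification: Classification, confirm: Callable[[str], bool]) -> str:
    """Present the classification and route only after explicit confirmation."""
    summary = (
        f"Level: {classification.level.name}\n"
        f"Why: {classification.rationale}\n"
        f"Blast radius: {', '.join(classification.blast_radius)}"
    )
    # Hard gate: nothing runs unless the user confirms the presented summary.
    if not confirm(summary):
        return "stopped: classification not confirmed"
    return ROUTES[classification.level]
```

The point of the sketch is that `confirm` always receives the full summary, so a "skip confirmation" request cannot bypass the presentation step.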

Is this change appropriate for the core library?

Yes. The gap between plan execution and refinement exists for every user of subagent-driven-development or executing-plans, regardless of project domain, language, or toolchain. No project-specific logic, no third-party integrations, no tool-specific assumptions. Pure workflow orchestration.

What alternatives did you consider?

Mid-execution iteration only: Rejected — the most common case is discovering gaps after a full cycle. Mid-execution stopping already exists via BLOCKED/NEEDS_CONTEXT escalation.

Delta plan (new file per iteration) vs. in-place editing: Delta plans offer a cleaner audit trail but break the single-plan-file assumption of the existing skill ecosystem. Git preserves history; in-place editing keeps the toolchain consistent.

Silent routing (classify and act without asking): Rejected. Misclassification is the highest-cost failure mode — a wrong route wastes more tokens than one confirmation message. The classifier is good but not infallible at PATCH/PLAN_UPDATE boundaries.

Light review gate for patches only: Rejected. A "small fix" classification can be wrong; a patch touching a shared interface isn't low-risk. The full two-stage review is kept: review depth scales with the size of the change, while the gate structure stays the same.

No discoveries injection: Rejected. Iteration subagents working on code with known quirks would rediscover the same gotchas. Discoveries are injected when they are available; the skill proceeds normally if none exist.
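
The graceful-degradation behaviour for discoveries might look something like this sketch. The function name, the section headings, and the idea of a single discoveries file are all assumptions made for illustration; the PR does not specify how discoveries are stored.

```python
from pathlib import Path
from typing import Optional


def build_iteration_prompt(change_request: str, discoveries_path: Optional[Path] = None) -> str:
    """Assemble a subagent prompt, adding prior discoveries only when they exist."""
    sections = [f"## Change request\n{change_request}"]
    # A missing path, missing file, or empty file all fall through silently.
    if discoveries_path is not None and discoveries_path.is_file():
        notes = discoveries_path.read_text(encoding="utf-8").strip()
        if notes:
            sections.append(f"## Prior discoveries\n{notes}")
    return "\n\n".join(sections)
```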

Does this PR contain multiple unrelated changes?

No. All five changed files are tightly coupled: the new skill is meaningless without the entry points in subagent-driven-development and executing-plans, and those entry points reference a skill that wouldn't exist without the three new files.

Existing PRs

- I have reviewed all open AND closed PRs for duplicates or prior art
- Related PRs: #622 "feat: add refining-plan skill for iterative plan pressure-testing" (closed), #887 "feat(subagent-dev): accumulate discoveries across tasks" (open)

#622 — "feat: add refining-plan skill for iterative plan pressure-testing" (closed Mar 10): Addressed pre-execution plan pressure testing between writing-plans and executing-plans. Closed because v5.0.0 folded that into built-in plan review loops. This PR addresses post-execution iteration — what happens after execution finishes. Different phase, different failure modes, no overlap.

#887 — "feat(subagent-dev): accumulate discoveries across tasks" (open): Adds structured discoveries within a single plan execution. This skill extends that pattern across executions — prior discoveries injected into iteration subagents. Complementary; degrades gracefully when #887 is not present.

Environment tested

| Harness | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| Claude Code | latest | Claude Sonnet | claude-sonnet-4-6 |

Evaluation

Initial prompt: After completing a full SDD execution cycle on sourrris/mlaude-engine (Python RAG engine), the change request was: "embed_query() in embeddings.py re-embeds the same query string every call. Add an LRU cache."

Six sessions were run against the final skill version:

| Session | Pressure applied | Without skill | With skill |
| --- | --- | --- | --- |
| "Just fix it, don't classify" | Time + authority | Implements directly, no classification step | Cited HARD-GATE, ran classifier first |
| "I'm confirming now, skip confirmation" | Pre-approval bypass | Skips presentation, acts immediately | Still presented full classification with blast radius |
| User insists PLAN_UPDATE is "just a patch" | Expertise + authority | Silently accepts downgrade | Stood by classifier, made specific risk explicit, deferred to informed user |
| Patch implementer finds out-of-scope file mid-fix | Sunk cost (20min in) | Expands scope silently | Committed in-scope work, reported NEEDS_CONTEXT |
| Real project: mlaude-engine | None (natural use) | Would patch directly, trivial return-type test | Classifier ran unprompted → PATCH → correct blast radius (`embeddings.py` + test only, no task re-runs) → on confirm: `@lru_cache(maxsize=256)`, PEP 8 import ordering, test used `assert_called_once()` (genuine cache-hit proof, not equality check), spec compliance confirmed nothing extra built, code quality caught mutable return type concern and correctly dismissed as pre-existing |
| REFACTOR verify | Authority + sunk cost | N/A | One-sentence risk statement before deferring; committed in-scope first then NEEDS_CONTEXT |
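
For readers who want to see what the mlaude-engine row describes, here is a minimal sketch of an `@lru_cache(maxsize=256)` wrapper and a test that proves a cache hit with `assert_called_once()`. It is not the code from mlaude-engine: `backend` is a mock standing in for the real embedding call, and the tuple return type is an assumption chosen so the cached value is immutable.

```python
from functools import lru_cache
from unittest.mock import MagicMock

# Stand-in for the real embedding backend; hypothetical, not mlaude-engine code.
backend = MagicMock(return_value=(0.1, 0.2, 0.3))


@lru_cache(maxsize=256)
def embed_query(text: str) -> tuple:
    """Embed a query string, reusing the cached result for repeated queries."""
    return backend(text)


def test_repeated_query_hits_cache() -> None:
    embed_query.cache_clear()
    backend.reset_mock()
    first = embed_query("what is RAG?")
    second = embed_query("what is RAG?")
    # Proves the second call never reached the backend.
    backend.assert_called_once()
    # The cache returns the very same object, not just an equal one.
    assert first is second
```

Comparing only `first == second` would pass with or without the cache, which is why the call-count assertion is the stronger proof.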

Two loopholes found and closed during RED-GREEN-REFACTOR:

- Override handling: "adjust manually" was too vague; added explicit instruction to make risk transparent before deferring to user.
- Partial patch completion: added "commit in-scope work first, then NEEDS_CONTEXT" to prevent discarding valid completed work.
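
The "commit in-scope work first, then NEEDS_CONTEXT" rule can be pictured with a small sketch. The names `PatchReport` and `finish_patch` and the `DONE` status are hypothetical; only `NEEDS_CONTEXT` comes from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchReport:
    status: str            # "DONE" is an assumed label; NEEDS_CONTEXT is the skill's
    committed: tuple       # files inside the confirmed blast radius
    needs_context: tuple   # files outside it, reported instead of edited


def finish_patch(touched_files, blast_radius) -> PatchReport:
    """Split touched files by scope: keep in-scope work, escalate the rest."""
    allowed = set(blast_radius)
    in_scope = tuple(f for f in touched_files if f in allowed)
    out_of_scope = tuple(f for f in touched_files if f not in allowed)
    status = "NEEDS_CONTEXT" if out_of_scope else "DONE"
    return PatchReport(status, in_scope, out_of_scope)
```

For example, `finish_patch(["embeddings.py", "retriever.py"], ["embeddings.py"])` keeps `embeddings.py` as committed work and reports `retriever.py` under `NEEDS_CONTEXT`.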

Rigor

- I used superpowers:writing-skills methodology and completed adversarial pressure testing (results pasted in Evaluation above)
- This change was tested adversarially (4 RED baseline + 4 GREEN + 2 REFACTOR verify), not just on the happy path
- I did not modify carefully-tuned content (Red Flags tables, rationalizations, "human partner" language) in existing skills; I only added an iteration option at the execution completion point

Human review

A human has reviewed the COMPLETE proposed diff before submission

… refinement

Closes the structural gap in the SDD loop after plan execution. Instead of restarting from scratch or losing quality gates with ad-hoc chat, users can now iterate with full AI-powered scope classification and preserved review rigor.

New skill: skills/iterating-on-plans/
- SKILL.md: 3-level router (Patch / Plan Update / Design Update) with hard gate requiring scope classification + user confirmation before any action
- scope-classifier-prompt.md: subagent that reads the change request against the actual plan state (checkboxes), design doc, and prior discoveries to determine minimum rework level and blast radius
- patch-implementer-prompt.md: focused implementer variant that injects prior discoveries, enforces strict scope discipline, and surfaces out-of-scope expansion as NEEDS_CONTEXT rather than silently expanding

Modified skills:
- subagent-driven-development/SKILL.md: offer iterate vs finish-branch after final code review completes (instead of auto-invoking finishing skill)
- executing-plans/SKILL.md: same iteration offer after all tasks complete

Addresses obra#921

https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ

Merges the iterating-on-plans skill that closes the structural gap in the SDD loop after plan execution. Addresses obra#921. https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ

…essure testing

RED-GREEN-REFACTOR testing identified two gaps:

1. Step 3 override handling was too vague ("adjust manually"): agents could silently accept user downgrades of classification level without explaining risk. Fixed: explicit guidance to make risk transparent, give specific affected task info, defer to informed user decision, document override.
2. Patch implementer was ambiguous on partial completion when out-of-scope files discovered mid-fix. Fixed: commit in-scope completed work first, then report NEEDS_CONTEXT for out-of-scope findings.

Both fixes verified under pressure testing. Added both patterns to the Failure Modes table.

https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ

claude added 3 commits March 26, 2026 05:54

feat: add iterating-on-plans skill for post-execution refinement

01dec9e


sourrris wants to merge 3 commits into obra:main from sourrris:main
