feat(iterating-on-plans): add post-execution iteration skill#943
Open
… refinement

Closes the structural gap in the SDD loop after plan execution. Instead of restarting from scratch or losing quality gates with ad-hoc chat, users can now iterate with full AI-powered scope classification and preserved review rigor.

New skill: skills/iterating-on-plans/
- SKILL.md: 3-level router (Patch / Plan Update / Design Update) with hard gate requiring scope classification + user confirmation before any action
- scope-classifier-prompt.md: subagent that reads the change request against the actual plan state (checkboxes), design doc, and prior discoveries to determine minimum rework level and blast radius
- patch-implementer-prompt.md: focused implementer variant that injects prior discoveries, enforces strict scope discipline, and surfaces out-of-scope expansion as NEEDS_CONTEXT rather than silently expanding

Modified skills:
- subagent-driven-development/SKILL.md: offer iterate vs finish-branch after final code review completes (instead of auto-invoking finishing skill)
- executing-plans/SKILL.md: same iteration offer after all tasks complete

Addresses obra#921
https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
Merges the iterating-on-plans skill that closes the structural gap in the SDD loop after plan execution. Addresses obra#921. https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
…essure testing
RED-GREEN-REFACTOR testing identified two gaps:
1. Step 3 override handling was too vague ("adjust manually") — agents could
silently accept user downgrades of classification level without explaining
risk. Fixed: explicit guidance to make risk transparent, give specific
affected task info, defer to informed user decision, document override.
2. Patch implementer was ambiguous on partial completion when out-of-scope
files were discovered mid-fix. Fixed: commit in-scope completed work first,
then report NEEDS_CONTEXT for out-of-scope findings.
Both fixes verified under pressure testing. Added both patterns to the
Failure Modes table.
https://claude.ai/code/session_01Mwwc9jcY5KQF4ewEcUXeCJ
What problem are you trying to solve?
After a subagent-driven development cycle completes, users face a structural gap: the implementation is 80–90% right but needs targeted refinement. The only options today are:
Three specific failure modes documented by commenters on issue #921:
What does this PR change?
Adds a new iterating-on-plans skill (3 files) that surfaces automatically at the end of subagent-driven-development and executing-plans. It classifies the change request into one of three rework levels (Patch / Plan Update / Design Update), presents the classification with rationale for user confirmation, then routes to the appropriate execution path, preserving all existing quality gates. Two existing skills are updated to offer iteration as an option alongside finishing-a-development-branch.
Is this change appropriate for the core library?
Yes. The gap between plan execution and refinement exists for every user of subagent-driven-development or executing-plans, regardless of project domain, language, or toolchain. No project-specific logic, no third-party integrations, no tool-specific assumptions. Pure workflow orchestration.
What alternatives did you consider?
Mid-execution iteration only: Rejected — the most common case is discovering gaps after a full cycle. Mid-execution stopping already exists via BLOCKED/NEEDS_CONTEXT escalation.
Delta plan (new file per iteration) vs. in-place editing: Delta plans offer a cleaner audit trail but break the single-plan-file assumption of the existing skill ecosystem. Git preserves history; in-place editing keeps the toolchain consistent.
Silent routing (classify and act without asking): Rejected. Misclassification is the highest-cost failure mode — a wrong route wastes more tokens than one confirmation message. The classifier is good but not infallible at PATCH/PLAN_UPDATE boundaries.
Light review gate for patches only: Rejected. "Small fix" classification can be wrong — a patch touching a shared interface isn't low-risk. Full 2-stage review kept; depth scales with change size, not gate structure.
No discoveries injection: Rejected. Iteration subagents working on code with known quirks would rediscover the same gotchas. Discoveries injected gracefully — skill proceeds normally if none exist.
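To make the confirm-before-routing choice concrete, here is a minimal Python sketch of a three-level router with a hard confirmation gate. It is illustrative only: the skill itself is prompt-based, and every name here (ReworkLevel, Classification, route, the path strings) is hypothetical rather than taken from the skill files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ReworkLevel(Enum):
    """The three rework levels named in this PR."""
    PATCH = "Patch"
    PLAN_UPDATE = "Plan Update"
    DESIGN_UPDATE = "Design Update"


@dataclass(frozen=True)
class Classification:
    """Hypothetical shape of a scope classifier's output."""
    level: ReworkLevel
    rationale: str


# Placeholder labels: the real execution paths live in the skill prompts.
EXECUTION_PATHS = {
    ReworkLevel.PATCH: "patch path",
    ReworkLevel.PLAN_UPDATE: "plan-update path",
    ReworkLevel.DESIGN_UPDATE: "design-update path",
}

NOT_CONFIRMED = "stopped: classification not confirmed"


def route(classification: Classification,
          confirm: Callable[[str], bool]) -> str:
    """Return an execution path only after the user confirms the level."""
    summary = f"{classification.level.value}: {classification.rationale}"
    if not confirm(summary):
        # Hard gate: never act on an unconfirmed classification.
        return NOT_CONFIRMED
    return EXECUTION_PATHS[classification.level]
```

Passing the confirmation step in as a callable keeps the gate explicit: there is no code path that reaches an execution path without it returning True.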
Does this PR contain multiple unrelated changes?
No. All five changed files are tightly coupled: the new skill is meaningless without the entry points in subagent-driven-development and executing-plans, and those entry points reference a skill that wouldn't exist without the three new files.
Existing PRs
#622 ("feat: add refining-plan skill for iterative plan pressure-testing", closed Mar 10): Addressed pre-execution plan pressure testing between writing-plans and executing-plans. Closed because v5.0.0 folded that into built-in plan review loops. This PR addresses post-execution iteration, that is, what happens after execution finishes. Different phase, different failure modes, no overlap.
#887 ("feat(subagent-dev): accumulate discoveries across tasks", open): Adds structured discoveries within a single plan execution. This skill extends that pattern across executions, with prior discoveries injected into iteration subagents. Complementary; degrades gracefully when #887 is not present.
Environment tested
Evaluation
Initial prompt: After completing a full SDD execution cycle on sourrris/mlaude-engine (Python RAG engine), the change request was: "embed_query() in embeddings.py re-embeds the same query string every call. Add an LRU cache."
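For readers unfamiliar with the requested change, the following is a rough Python sketch of an LRU-cached query embedder and a test that proves a cache hit by counting backend calls. It is not the code from mlaude-engine: the backend mock, the function body, and the tuple return type are assumptions made for illustration.

```python
from functools import lru_cache
from unittest.mock import MagicMock

# Hypothetical stand-in for the real embedding backend.
backend = MagicMock(return_value=(0.1, 0.2, 0.3))


@lru_cache(maxsize=256)
def embed_query(query: str) -> tuple[float, ...]:
    """Embed a query string, caching the result per distinct string."""
    # Assumption of this sketch: return an immutable tuple so that
    # callers cannot mutate the shared cached value.
    return tuple(backend(query))


def test_repeated_query_hits_cache() -> None:
    backend.reset_mock()
    embed_query.cache_clear()

    first = embed_query("what is RAG?")
    second = embed_query("what is RAG?")

    assert first == second
    # Equality alone would also pass without a cache; the call count
    # shows the second lookup never reached the backend.
    backend.assert_called_once()
```

Because lru_cache hands the same object back on every hit, a mutable return value would be shared between callers, which is why this sketch returns a tuple.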
6 sessions run after final skill version:
embeddings.py + test only, no task re-runs) → on confirm: @lru_cache(maxsize=256), PEP 8 import ordering, test used assert_called_once() (genuine cache-hit proof, not equality check), spec compliance confirmed nothing extra built, code quality caught mutable return type concern and correctly dismissed it as pre-existing
Two loopholes found and closed during RED-GREEN-REFACTOR:
Rigor
superpowers:writing-skills methodology and completed adversarial pressure testing (results pasted in Evaluation above)
Human review