feat(strands-py): add GoalLoop vended plugin with docs #2738
notowen333 wants to merge 17 commits into
Review Summary
Assessment: Comment (a few changes worth making before merge)
Clean, faithful port of the TS
Themes
Process note: This adds a new public plugin class plus 8 public exports, a "moderate" change under API_BAR_RAISING.md. The PR description has the API prep (use cases, signatures, exports) ✅, but the
Nice work: the port reads cleanly and the docs page is thorough.
Re-review — commit
Re-check at head
Port the GoalLoop iterative-refinement plugin from TypeScript to Python. The plugin validates agent responses against a goal (NL string or programmatic validator) and loops with feedback until satisfied. Includes 35 unit tests and a dual-language documentation page.
TypeScript examples must live in sibling .ts files and be included via --8<-- directives, not inlined in MDX. Created goal-loop.ts and goal-loop_imports.ts with proper snippet regions. Updated docs-writer skill to make this convention unmissable: added CRITICAL callout in Step 3b and a new top-level Gotcha.
All three skills (writer, reviewer, audit) now enforce:
- TypeScript is never inlined in MDX
- Imports live in a separate _imports.ts file with per-example regions
- Every TS example must include both imports and body snippets
- A body-only include missing imports fails review
Replace ASCII box-drawing diagram with a mermaid flowchart. Add mermaid requirement to docs-writer and docs-reviewer skills.
Both were plain facts that belong as inline prose, not visually loud admonitions. The caution described behavior the plugin already warns about; the note was just context.
The callout sparing-use rule already lives in mdx-authoring.md. Remove redundant restatements from writer/reviewer skills and instead point to the existing guidance at the right moments.
- Reflow goal-loop.mdx prose to fill lines to ~80-90 chars
- Remove language-specific param from heading ("Stateless Retries")
- Restructure reviewer skill: split monolithic Constraints bullet
into separate dimensions (Voice Stack, Multi-Language, Terminology,
Code Examples with site conventions, Readability, Type Alignment)
- Add heading language-neutrality rule to writer Step 4
Prose outside tabs must be language-neutral. Replaced Python-specific parameter names (preserve_context=False, max_attempts, stop_reason, last_result()) with plain English equivalents. Updated writer skill to make language-neutral shared prose the top-level rule.
…ator
- Decompose `build_judge_prompt` into small named helpers instead of nested loops
- Rename `_nl_validator` to `_judge_validator` for clarity
- Add integration tests mirroring the TS integ suite (standard loop + `preserve_context=False`)
- Type WeakSet/WeakKeyDictionary with `Agent` instead of `Any`
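A note on why weak containers suit per-agent plugin state: a `WeakKeyDictionary` or `WeakSet` keyed by the agent drops its entry as soon as the agent is garbage-collected, so the plugin never keeps an agent alive just by tracking it. The sketch below is a minimal illustration under stated assumptions: `Agent` is a hypothetical stand-in for `strands.Agent`, and `RunState`, `run_states`, and `active_agents` are hypothetical names, not the plugin's real attributes.

```python
import gc
import weakref
from dataclasses import dataclass, field
from typing import List


class Agent:
    """Hypothetical stand-in for strands.Agent (any weak-referenceable class works)."""


@dataclass
class RunState:
    """Hypothetical per-agent bookkeeping; not the plugin's real state type."""

    attempts: int = 0
    feedback: List[str] = field(default_factory=list)


# Keys are held weakly: once an Agent is garbage-collected its entry vanishes,
# so tracking state never keeps an agent alive on its own.
run_states: "weakref.WeakKeyDictionary[Agent, RunState]" = weakref.WeakKeyDictionary()

# WeakSet gives the same lifetime guarantee for plain membership flags.
active_agents: "weakref.WeakSet[Agent]" = weakref.WeakSet()

agent = Agent()
run_states[agent] = RunState()
active_agents.add(agent)
run_states[agent].attempts += 1
assert run_states[agent].attempts == 1

# Dropping the last strong reference removes the agent from both containers.
del agent
gc.collect()
```

Typing the containers with the agent class rather than `Any` lets mypy catch a non-agent key being stored by mistake.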
- Reject timeout <= 0 (0 was previously allowed, which causes an immediate timeout)
- Use unicode ellipsis in truncation to match TS output format
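The two changes in this commit can be sketched as a pair of small helpers. This is a hypothetical illustration, not the plugin's code: the names `validate_timeout` and `truncate`, the choice of `ValueError`, and the assumption that the ellipsis counts toward the length limit are all mine.

```python
import math


def validate_timeout(timeout: float = math.inf) -> float:
    """Hypothetical check: math.inf (no limit) is fine, zero or negative is rejected."""
    if timeout <= 0:
        # A zero budget would trip the first elapsed-time check before any real work.
        raise ValueError(f"timeout must be greater than 0, got {timeout!r}")
    return timeout


def truncate(text: str, limit: int) -> str:
    """Hypothetical helper: keep at most `limit` characters, the last being U+2026."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(text) <= limit:
        return text
    # One single-character ellipsis instead of three ASCII dots ("...").
    return text[: limit - 1] + "\u2026"
```

Using the single U+2026 character keeps the truncated length predictable and matches what a TS implementation emitting `…` would produce.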
- Include `system_prompt` in Ralph-mode snapshot (restores TS parity)
- Use `Literal` type for `GoalStopReason` instead of bare `str`
- Move `import inspect` to module level
- Fix mypy: type judge helpers with Message/ContentBlock/ToolResultContent
- Add NL judge unit tests (construction, feedback, model/prompt overrides, fallback path, fresh-agent-per-validation)
- Remove unused pytest import from integ tests
…ment
- Convert `Validator` from `Callable` alias to `Protocol` with `**kwargs` for forward-compatible extensibility
- Fix `logger.warning` to use structured format (`plugin=<%s>, error=<%s> |`)
- Use full `GoalResult`/`GoalAttempt` equality assertions instead of per-field
- Add comment explaining Python snapshot preset divergence from TS
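To illustrate what a `Protocol` with `**kwargs` buys over a plain `Callable` alias: any function that accepts the response plus arbitrary keyword arguments conforms, and a later release can pass extra context without breaking existing validators. This is a sketch under assumptions: the `response: str` parameter is a guess at the real signature, `ValidationOutcome` here is a local stand-in using the `passed`/`feedback` fields from the export table, and `make_word_count_validator` is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class ValidationOutcome:
    """Stand-in with the passed/feedback fields listed in the PR's export table."""

    passed: bool
    feedback: Optional[str] = None


@runtime_checkable
class Validator(Protocol):
    """Hypothetical shape: the response (assumed str) plus arbitrary keyword arguments."""

    def __call__(self, response: str, **kwargs: Any) -> ValidationOutcome: ...


def make_word_count_validator(min_words: int) -> Validator:
    """Hypothetical programmatic validator: passes once the response is long enough."""

    def validate(response: str, **kwargs: Any) -> ValidationOutcome:
        # **kwargs swallows any extra context a later plugin version passes in,
        # so this validator keeps working without a signature change.
        count = len(response.split())
        if count >= min_words:
            return ValidationOutcome(passed=True)
        return ValidationOutcome(passed=False, feedback=f"Only {count} words; need at least {min_words}.")

    return validate
```

Calling the validator with an unexpected keyword (for example `attempt=2`) still works, which is the forward-compatibility property the commit describes.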
Replace 'plugins' (not in the collection schema) with 'event-loop'.
Documentation Preview Ready
Your documentation preview has been successfully deployed! Changed pages:
Updated at: 2026-06-12T22:15:28.370Z
- Remove hook/event implementation details from "How It Works" section
- Convert `JUDGE_SYSTEM_PROMPT` to triple-quoted string for readability
- Extract TS `word_count_validator` into a named function (matches Python)
- Normalize variable naming to `plugin` in "Inspecting Results" examples
- Replace Spanish resume prompt examples with English
- Use <Syntax> component for language-specific inline terms
- Add prompt-authoring tag to link goal-loop with steering
- Add explanatory comments for WeakKeyDictionary and WeakSet usage
The "start over from scratch" prompt shows a real reason to customize — diverging from the default incremental-fix behavior — rather than restating the default in slightly different words.
when you have actionable feedback for the next attempt.
"""

@runtime_checkable
Issue: `ruff format --check` fails on this file, so the ci.yml merge gate will block. The PR checklist marks `hatch run prepare` as run, but its formatting changes aren't applied here.
$ ruff format --check src/strands/vended_plugins/goal/plugin.py
Would reformat: src/strands/vended_plugins/goal/plugin.py
Three spots need it:
- missing blank line before `@runtime_checkable class Validator` (here, L79)
- missing blank line before `GoalStopReason = ...` (L97)
- the `__call__` signature (L91-93) fits on a single line under the 120-char limit, so the formatter collapses it
Suggestion: Run hatch fmt --formatter (or ruff format) and commit the result. ruff check and mypy are both clean — it's only the formatter.
return

elapsed = time.monotonic() - run.start_time
if elapsed >= self._timeout:
Issue: The timeout is checked before the response is validated. If the budget elapses during the model call that produces a response, that attempt is reported as passed=False, stop_reason="timeout" even when the response would satisfy the goal: the response is produced but never validated, so a successful result is discarded.
Suggestion (non-blocking): Consider validating first and only falling back to "timeout" when validation fails — that way a passing final attempt still resolves as "satisfied". If reporting timeout even on a would-pass response is intentional (strict wall-clock budget), a one-line comment here would make that deliberate choice clear to the next reader.
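The validate-first ordering suggested above could be expressed as a small decision helper. This is a hypothetical sketch: `decide_stop` is not a function in the PR, and the injectable `now` clock exists only to make the behavior easy to check. The PR as reviewed checks elapsed time before validating; keeping that order is a legitimate choice if a strict wall-clock budget is the intent.

```python
import time
from typing import Callable, Literal, Optional

GoalStopReason = Literal["satisfied", "max_attempts", "timeout"]


def decide_stop(
    passed: bool,
    attempt: int,
    max_attempts: float,
    start_time: float,
    timeout: float,
    now: Callable[[], float] = time.monotonic,
) -> Optional[GoalStopReason]:
    """Hypothetical helper: return a stop reason, or None to request another attempt."""
    # Validation wins: a passing response resolves as "satisfied" even if the
    # budget ran out while the model was producing it.
    if passed:
        return "satisfied"
    # The clock only matters once the response has failed validation.
    if now() - start_time >= timeout:
        return "timeout"
    if attempt >= max_attempts:
        return "max_attempts"
    return None
```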
Assessment: Comment (one merge-gate fix before merge)
Clean, well-documented port with strong test coverage. I verified the suite locally: 41/41 unit tests pass,
What I found
Only the formatter is blocking; everything else is non-blocking. Nice work: the docs page and judge/validator separation read really well.
Description
Agents often need iterative refinement — retry until the response meets a quality bar. Today that means hand-rolling a loop with timeout logic, attempt tracking, feedback injection, and state management. GoalLoop encapsulates all of that as a vended plugin that works inside the existing hook lifecycle.
This PR ports the GoalLoop plugin from TypeScript to Python and adds a dual-language documentation page.
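The refinement loop that GoalLoop packages up (attempt, validate, inject feedback, stop on success or on a budget) can be sketched as a plain function. This is a simplified illustration, not the plugin: the real GoalLoop runs inside the agent's hook lifecycle and also handles the LLM judge and `preserve_context`, none of which are modeled here. The `GoalResult`, `GoalAttempt`, and `ValidationOutcome` fields mirror the export table in this description; `run_goal_loop`, `respond`, and the resume wording are hypothetical. The sketch validates before checking the clock, which is the ordering suggested in the review rather than the PR's current behavior.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable, List, Literal, Optional

GoalStopReason = Literal["satisfied", "max_attempts", "timeout"]


@dataclass
class ValidationOutcome:
    passed: bool
    feedback: Optional[str] = None


@dataclass
class GoalAttempt:
    attempt: int
    passed: bool
    feedback: Optional[str] = None


@dataclass
class GoalResult:
    passed: bool
    stop_reason: GoalStopReason
    attempts: List[GoalAttempt] = field(default_factory=list)


# Hypothetical wording; the plugin's real resume_prompt_template default is not shown here.
RESUME_TEMPLATE = "Your previous answer did not meet the goal. Feedback: {feedback}"


def run_goal_loop(
    respond: Callable[[str], str],
    validate: Callable[[str], ValidationOutcome],
    prompt: str,
    max_attempts: float = math.inf,
    timeout: float = math.inf,
    resume_template: str = RESUME_TEMPLATE,
) -> GoalResult:
    """Ask, validate, and retry with feedback until the goal is met or a budget runs out."""
    start = time.monotonic()
    attempts: List[GoalAttempt] = []
    next_prompt = prompt
    while True:
        response = respond(next_prompt)
        outcome = validate(response)
        attempts.append(GoalAttempt(len(attempts) + 1, outcome.passed, outcome.feedback))
        if outcome.passed:
            return GoalResult(True, "satisfied", attempts)
        if len(attempts) >= max_attempts:
            return GoalResult(False, "max_attempts", attempts)
        if time.monotonic() - start >= timeout:
            return GoalResult(False, "timeout", attempts)
        # Feedback injection: the validator's critique becomes the next prompt.
        next_prompt = resume_template.format(feedback=outcome.feedback or "")
```

With the defaults of `math.inf`, the loop only ends when validation passes, which is why callers normally set at least one of the two budgets.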
Public API Changes
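New module: `strands.vended_plugins.goal`
Exported symbols

| Symbol | Notes |
| --- | --- |
| `GoalLoop` | `plugins=[...]` |
| `GoalResult` | `passed`, `stop_reason`, `attempts` |
| `GoalAttempt` | `attempt`, `passed`, `feedback` |
| `GoalStopReason` | `"satisfied"` \| `"max_attempts"` \| `"timeout"` |
| `JudgeConfig` | `model`, `system_prompt` |
| `ValidationOutcome` | `passed`, `feedback` |
| `Validator` | |
| `JUDGE_SYSTEM_PROMPT` | |
| `JudgeOutcome` | |
| `build_judge_prompt` | |

GoalLoop constructor parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `goal` | | |
| `max_attempts` | `inf` | |
| `timeout` | `inf` | |
| `judge` | `None` | `JudgeConfig` to override the judge model or system prompt |
| `preserve_context` | `True` | |
| `resume_prompt_template` | | |
| `name` | `"strands:goal-loop"` | |

Related Issues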
N/A - new feature port
Documentation PR
Included in this PR under `site/src/content/docs/user-guide/concepts/plugins/goal-loop.mdx` with dual-language tabs (Python + TypeScript).
Type of Change
New feature
Testing
`hatch run prepare`
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.