
feat(strands-py): add GoalLoop vended plugin with docs #2738

Open
notowen333 wants to merge 17 commits into strands-agents:main from notowen333:python-goal-plugin-with-docs

@notowen333 (Contributor) commented Jun 11, 2026

Description

Agents often need iterative refinement — retry until the response meets a quality bar. Today that means hand-rolling a loop with timeout logic, attempt tracking, feedback injection, and state management. GoalLoop encapsulates all of that as a vended plugin that works inside the existing hook lifecycle.

This PR ports the GoalLoop plugin from TypeScript to Python and adds a dual-language documentation page.

Public API Changes

New module: strands.vended_plugins.goal

```python
from strands import Agent
from strands.vended_plugins.goal import GoalLoop

# Natural-language goal: judged by an internal agent built from the host's model
concise = GoalLoop(
    goal="At most 3 sentences, accessible to a 10-year-old, no jargon.",
    max_attempts=3,
)

agent = Agent(plugins=[concise])
agent("Explain how rainbows form.")
print(concise.last_result(agent))
# GoalResult(passed=True, stop_reason='satisfied', attempts=[...])
```

```python
# Programmatic validator: pass a callable to skip the judge agent entirely
from strands.vended_plugins.goal import GoalLoop

def word_count_validator(response, agent):
    text = " ".join(
        block["text"] for block in response["content"] if "text" in block
    )
    words = len(text.split())
    if words <= 50:
        return True
    return {"passed": False, "feedback": f"Too long ({words} words). Cap at 50."}

plugin = GoalLoop(goal=word_count_validator, max_attempts=5, timeout=30.0)
```

Exported symbols

| Symbol | Kind | Purpose |
|---|---|---|
| GoalLoop | Plugin class | Main entry point; attach to an agent via plugins=[...] |
| GoalResult | Dataclass | Aggregate result with passed, stop_reason, attempts |
| GoalAttempt | Dataclass | Per-attempt record: attempt, passed, feedback |
| GoalStopReason | Literal type | "satisfied" \| "max_attempts" \| "timeout" |
| JudgeConfig | Dataclass | Optional judge tuning: model, system_prompt |
| ValidationOutcome | Dataclass | Canonical validator return: passed, feedback |
| Validator | Protocol | Type for programmatic validator callables |
| JUDGE_SYSTEM_PROMPT | str | Default system prompt for the NL judge |
| JudgeOutcome | Pydantic model | Structured output schema the judge fills |
| build_judge_prompt | Function | Builds the judge input from a goal + transcript |

GoalLoop constructor parameters

| Parameter | Default | Description |
|---|---|---|
| goal | (required) | NL string (judged by internal agent) or callable validator |
| max_attempts | inf | Maximum attempts before stopping |
| timeout | inf | Wall-clock budget in seconds |
| judge | None | JudgeConfig to override the judge model or system prompt |
| preserve_context | True | Keep conversation history across retries |
| resume_prompt_template | (built-in) | `Callable[[str` |
| name | "strands:goal-loop" | Plugin name (must be unique per agent) |
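Both budgets default to unbounded, and a later commit in this PR rejects a timeout of 0 or less because 0 would time out immediately. A minimal sketch of that kind of argument check follows; `validate_budget` is a hypothetical helper (not the plugin's code), and requiring at least one attempt is an added assumption:

```python
import math


def validate_budget(max_attempts: float = math.inf, timeout: float = math.inf) -> None:
    """Hypothetical constructor-time check for GoalLoop-style budgets."""
    # `not timeout > 0` also rejects NaN, which compares False to everything.
    if not timeout > 0:
        raise ValueError(f"timeout must be > 0 seconds, got {timeout!r}")
    # Assumption: a loop needs at least one attempt.
    if not max_attempts >= 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts!r}")


validate_budget()  # unbounded defaults (inf) are accepted
validate_budget(max_attempts=3, timeout=30.0)
```

Writing the comparisons as `not x > 0` rather than `x <= 0` is a deliberate choice: it fails closed on NaN.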

Related Issues

N/A - new feature port

Documentation PR

Included in this PR under site/src/content/docs/user-guide/concepts/plugins/goal-loop.mdx with dual-language tabs (Python + TypeScript).

Type of Change

New feature

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions github-actions Bot added enhancement New feature or request python Pull requests that update python code area-hooks Features or requests that might be implementable via hooks area-structured-output Related to the structured output api documentation Documentation changes, improvements, additions, content updates, site improvements, examples, guides size/xl and removed size/xl labels Jun 11, 2026
@github-actions (Contributor)

Review Summary

Assessment: Comment (a few changes worth making before merge)

Clean, faithful port of the TS GoalLoop. The hook lifecycle, resumed-flag continuation logic, fresh-judge-per-validation, and snapshot/restore all mirror the original correctly, and the snapshot preset difference between the two SDKs was handled deliberately rather than copied blindly. Test coverage is strong.

Themes
  • Type design (Python-specific): Public callable types (Validator, resume_prompt_template) use Callable, which the repo style guide steers away from for extensible interfaces — Protocol with **kwargs keeps the door open for future arguments. Worth fixing while the API is new.
  • Conventions: One log line doesn't follow the SDK's structured field=<%s> logging style.
  • Tests: A few assertions check dataclass fields individually where a single == against the expected GoalResult would be stronger. Non-blocking.
  • Port fidelity: Verified faithful. One intentional divergence (Ralph mode restores conversation_manager_state in Python) deserves a code comment.
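The forward-compatibility argument in the first theme can be shown with a small sketch. Here `Validator` mirrors the shape the review asks for, while `keep_short`, `rigid`, and the `attempt=` keyword are hypothetical (no such keyword exists in the plugin's API); the extra keyword stands in for an argument a future SDK version might pass:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Validator(Protocol):
    """Sketch of a forward-compatible validator type."""

    def __call__(self, response: Any, agent: Any, **kwargs: Any) -> Any: ...


def keep_short(response: Any, agent: Any, **kwargs: Any) -> bool:
    # **kwargs absorbs keyword arguments a future SDK version might add.
    return len(str(response).split()) <= 50


def rigid(response: Any, agent: Any) -> bool:
    # Fixed two-argument signature: nothing can be added later.
    return True


# A hypothetical future call site that passes an extra keyword argument.
print(keep_short("hello world", None, attempt=2))
try:
    rigid("hello world", None, attempt=2)
except TypeError as e:
    print(f"rigid validator breaks: {e}")
```

Note that `@runtime_checkable` only checks that a `__call__` attribute exists, so `isinstance(rigid, Validator)` is also `True`. The `**kwargs` requirement is enforced by static type checkers such as mypy, not at runtime.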

Process note: This adds a new public plugin class plus 8 public exports — a "moderate" change under API_BAR_RAISING.md. The PR description has the API prep (use cases, signatures, exports) ✅, but the needs-api-review label isn't applied. Consider adding it (this PR even introduces the label workflow). The PR is also still in draft.

Nice work — the port reads cleanly and the docs page is thorough.

@github-actions (Contributor)

Re-review — commit 7d855d0

Assessment: Approve (one minor, non-blocking follow-up)

Thanks for the fast turnaround — all four review points are addressed and the suite is green (41 unit tests pass locally).

Verification
  • ✅ Validator: now a @runtime_checkable Protocol with **kwargs (forward-compatible)
  • ✅ Structured logging: logger.warning("plugin=<%s>, error=<%s> | validator threw", ...)
  • ✅ Tests assert full GoalResult/GoalAttempt equality instead of per-field
  • ✅ Snapshot preset divergence now documented with an inline comment
  • ✅ Port fidelity re-confirmed against the TS source
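The test-assertion change above is easiest to see side by side. This minimal sketch uses stand-in dataclasses whose field names follow the exported-symbols table; the field types and defaults are assumptions, not the plugin's definitions:

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

GoalStopReason = Literal["satisfied", "max_attempts", "timeout"]


@dataclass
class GoalAttempt:  # stand-in
    attempt: int
    passed: bool
    feedback: Optional[str] = None


@dataclass
class GoalResult:  # stand-in
    passed: bool
    stop_reason: GoalStopReason
    attempts: List[GoalAttempt] = field(default_factory=list)


actual = GoalResult(
    passed=True,
    stop_reason="satisfied",
    attempts=[GoalAttempt(1, False, "Too long (80 words)."), GoalAttempt(2, True)],
)

# Per-field checks silently skip anything not listed (here, the feedback text).
assert actual.passed and actual.stop_reason == "satisfied" and len(actual.attempts) == 2

# One equality check covers every field, including the nested attempts.
assert actual == GoalResult(
    passed=True,
    stop_reason="satisfied",
    attempts=[GoalAttempt(1, False, "Too long (80 words)."), GoalAttempt(2, True)],
)
```

Dataclass `__eq__` compares fields in order and recurses into lists, so a change to any single feedback string makes the full comparison fail while the per-field check still passes.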

Minor follow-up: resume_prompt_template (L228) is still a frozen Callable — same forward-compat reasoning as Validator, so converting it to a Protocol with **kwargs would round this out. Not a blocker.
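Applied to resume_prompt_template, the same pattern might look like the sketch below. The source only shows that the current type begins `Callable[[str`, so this assumes the template receives the feedback string first and returns the next prompt. `ResumePromptTemplate` and `start_over` are hypothetical names, and the body follows the "start over from scratch" customization described in the docs commit:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ResumePromptTemplate(Protocol):
    """Hypothetical forward-compatible type for resume prompts."""

    def __call__(self, feedback: str, **kwargs: Any) -> str: ...


def start_over(feedback: str, **kwargs: Any) -> str:
    # Diverges from incremental fixing: ask for a fresh answer, not a patch.
    return (
        "Your previous answer did not meet the goal.\n"
        f"Feedback: {feedback}\n"
        "Discard that answer and start over from scratch."
    )


print(start_over("Too long (80 words).", attempt=2))  # extra keyword is ignored
```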

Process reminder: still worth applying api/needs-review (new public abstraction) and flipping out of draft when ready. Otherwise this looks good to go.

@agent-of-mkmeral (Contributor)

Re-check at head 7d855d0 (covers 930c808 + 7d855d0)

Verdict: the port is still faithful, and all four items from my original review are resolved. Re-verified the changed surface against the TS source and ran the suite — 41/41 unit tests pass locally.

✅ The one real fidelity gap is fixed (Ralph-mode system_prompt rewind)

```python
run.initial_snapshot = event.agent.take_snapshot(
    preset="session", include=["system_prompt"], exclude=["state"]
)
```

Verified empirically against the snapshot resolver:

```text
before fix: ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state']
after fix:  ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state', 'system_prompt']
```

Python now rewinds everything the TS session preset rewinds (plus conversation_manager_state, which is documented inline as an intentional Python-only divergence — exactly what I'd hoped for). The test at test_snapshot_taken_on_first_model_call pins the exact call signature, so this can't silently regress. And the docs claim in goal-loop.mdx ("messages, system prompt, model state") is now accurate for Python — no doc change needed.
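The before/after lists are consistent with a simple resolution rule: start from the preset's fields, add `include`, then drop `exclude`. The sketch below reproduces both lists under stated assumptions. The `session` preset contents are inferred (the "before" list plus `state`), the pre-fix call is assumed to have already passed `exclude=["state"]`, exclusion is assumed to win over inclusion, and `resolve_fields` is a hypothetical name rather than the SDK's resolver:

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed contents of the Python "session" preset (not taken from the SDK).
PRESETS: Dict[str, FrozenSet[str]] = {
    "session": frozenset(
        {"conversation_manager_state", "interrupt_state", "messages", "model_state", "state"}
    ),
}


def resolve_fields(
    preset: str, include: Iterable[str] = (), exclude: Iterable[str] = ()
) -> List[str]:
    """Hypothetical resolver: preset fields, plus include, minus exclude (sorted)."""
    fields = set(PRESETS[preset]) | set(include)
    fields -= set(exclude)  # assumption: exclude takes precedence over include
    return sorted(fields)


before = resolve_fields("session", exclude=["state"])
after = resolve_fields("session", include=["system_prompt"], exclude=["state"])
print(before)
print(after)
```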

✅ Minor items from my review, all addressed

  • Judge-path unit tests — the previously untested _judge_validator now has 6 mocked tests (test_nl_judge_*: first-attempt pass, feedback loop, judge.model override, judge.system_prompt override, no-structured-output fallback, fresh-agent-per-validation), matching the TS suite's coverage 1:1. The patch target (strands.agent.agent.Agent) correctly intercepts the plugin's lazy import.
  • GoalStopReason — now Literal["satisfied", "max_attempts", "timeout"].
  • import inspect — moved to module level.
  • Bonus: judge.py rendering helpers got real types (Message/ContentBlock/ToolResultContent instead of dict) — behavior unchanged, port fidelity unaffected.
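The lazy-import point in the first bullet can be shown in isolation. An import inside a function looks the name up on the module object each time the function runs, so patching that module attribute is visible to every call made while the patch is active. In this sketch `fake_sdk_agent` is a throwaway module standing in for `strands.agent.agent`, and `build_judge` is a hypothetical stand-in for the plugin's judge construction:

```python
import sys
import types
from unittest import mock

# Throwaway module standing in for strands.agent.agent (hypothetical name).
fake_agent_module = types.ModuleType("fake_sdk_agent")


class Agent:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


fake_agent_module.Agent = Agent
sys.modules["fake_sdk_agent"] = fake_agent_module


def build_judge():
    # Lazy import: resolved when the function runs, not when this file loads.
    from fake_sdk_agent import Agent as JudgeAgent

    return JudgeAgent(system_prompt="You are a strict judge.")


with mock.patch("fake_sdk_agent.Agent") as agent_cls:
    build_judge()  # first validation
    build_judge()  # second validation builds a fresh judge

print(agent_cls.call_count)  # constructor calls seen while patched
print(type(build_judge()).__name__)  # patch removed: the real class is back
```

Counting constructor calls on the mock is also how a "fresh judge per validation" test can be expressed.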

Port fidelity of the new changes themselves

Converting Validator to a @runtime_checkable Protocol preserves the exact call shape ((response, agent) positionally, return bool | dict | ValidationOutcome, sync or async), so all TS-equivalent behavior in _fn_validator is untouched. The structured-logging reformat changes only the log string, not semantics.
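The call shape described above (positional `(response, agent)`, a sync or async validator, and a `bool`, `dict`, or `ValidationOutcome` result) implies a normalization step. The sketch below is hypothetical (`run_validator` and the stand-in `ValidationOutcome` are not the plugin's code). It reuses the structured log format quoted in the re-review and assumes that a raising validator counts as a failed attempt:

```python
import asyncio
import inspect
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class ValidationOutcome:  # stand-in: fields follow the exported-symbols table
    passed: bool
    feedback: Optional[str] = None


async def run_validator(
    validator: Callable[..., Any], response: Any, agent: Any, plugin_name: str = "strands:goal-loop"
) -> ValidationOutcome:
    """Call a sync or async validator and normalize its result."""
    try:
        result = validator(response, agent)  # positional (response, agent)
        if inspect.isawaitable(result):
            result = await result  # async validators return an awaitable
    except Exception as e:
        # Assumption: a raising validator counts as a failed attempt.
        logger.warning("plugin=<%s>, error=<%s> | validator threw", plugin_name, e)
        return ValidationOutcome(passed=False)

    if isinstance(result, ValidationOutcome):
        return result
    if isinstance(result, bool):
        return ValidationOutcome(passed=result)
    if isinstance(result, dict):
        return ValidationOutcome(passed=bool(result["passed"]), feedback=result.get("feedback"))
    raise TypeError(f"unsupported validator result type: {type(result).__name__}")


async def async_check(response: Any, agent: Any) -> dict:
    return {"passed": False, "feedback": "Too long."}


print(asyncio.run(run_validator(lambda r, a: True, "hi", None)))
print(asyncio.run(run_validator(async_check, "hi", None)))
```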

Remaining (non-blocking)

  • resume_prompt_template is still a frozen Callable — same forward-compat argument as Validator, worth doing while the API is new (the /strands review bot flagged this too).
  • CI note: all Python unit-test jobs are green across 3.10–3.14 / linux+windows (the CANCELLED entries are superseded runs); the label-size check failure is labeler noise unrelated to this change.

Good to go from a port-fidelity standpoint. 🚢

Port the GoalLoop iterative-refinement plugin from TypeScript to Python.
The plugin validates agent responses against a goal (NL string or
programmatic validator) and loops with feedback until satisfied.

Includes 35 unit tests and a dual-language documentation page.

TypeScript examples must live in sibling .ts files and be included
via --8<-- directives, not inlined in MDX. Created goal-loop.ts and
goal-loop_imports.ts with proper snippet regions.

Updated docs-writer skill to make this convention unmissable:
added CRITICAL callout in Step 3b and a new top-level Gotcha.
All three skills (writer, reviewer, audit) now enforce:
- TypeScript is never inlined in MDX
- Imports live in a separate _imports.ts file with per-example regions
- Every TS example must include both imports and body snippets
- A body-only include missing imports fails review

Replace ASCII box-drawing diagram with a mermaid flowchart.
Add mermaid requirement to docs-writer and docs-reviewer skills.

Both were plain facts that belong as inline prose, not visually
loud admonitions. The caution described behavior the plugin
already warns about; the note was just context.
The callout sparing-use rule already lives in mdx-authoring.md.
Remove redundant restatements from writer/reviewer skills and
instead point to the existing guidance at the right moments.

- Reflow goal-loop.mdx prose to fill lines to ~80-90 chars
- Remove language-specific param from heading ("Stateless Retries")
- Restructure reviewer skill: split monolithic Constraints bullet
  into separate dimensions (Voice Stack, Multi-Language, Terminology,
  Code Examples with site conventions, Readability, Type Alignment)
- Add heading language-neutrality rule to writer Step 4

Prose outside tabs must be language-neutral. Replaced Python-specific
parameter names (preserve_context=False, max_attempts, stop_reason,
last_result()) with plain English equivalents. Updated writer skill
to make language-neutral shared prose the top-level rule.

…ator

- Decompose build_judge_prompt into small named helpers instead of nested loops
- Rename _nl_validator to _judge_validator for clarity
- Add integration tests mirroring the TS integ suite (standard loop + preserve_context=False)
- Type WeakSet/WeakKeyDictionary with Agent instead of Any
- Reject timeout <= 0 (previously 0 was allowed, which causes an immediate timeout)
- Use unicode ellipsis in truncation to match TS output format
- Include system_prompt in Ralph-mode snapshot (restores TS parity)
- Use Literal type for GoalStopReason instead of bare str
- Move `import inspect` to module level
- Fix mypy: type judge helpers with Message/ContentBlock/ToolResultContent
- Add NL judge unit tests (construction, feedback, model/prompt overrides,
  fallback path, fresh-agent-per-validation)
- Remove unused pytest import from integ tests

…ment

- Convert Validator from Callable alias to Protocol with **kwargs for
  forward-compatible extensibility
- Fix logger.warning to use structured format (plugin=<%s>, error=<%s> |)
- Use full GoalResult/GoalAttempt equality assertions instead of per-field
- Add comment explaining Python snapshot preset divergence from TS

Replace 'plugins' (not in the collection schema) with 'event-loop'.
@github-actions (Bot, Contributor) commented Jun 12, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Changed pages:

Updated at: 2026-06-12T22:15:28.370Z

- Remove hook/event implementation details from "How It Works" section
- Convert JUDGE_SYSTEM_PROMPT to triple-quoted string for readability
- Extract TS word_count_validator into a named function (matches Python)
- Normalize variable naming to `plugin` in "Inspecting Results" examples
- Replace Spanish resume prompt examples with English
- Use <Syntax> component for language-specific inline terms
- Add prompt-authoring tag to link goal-loop with steering
- Add explanatory comments for WeakKeyDictionary and WeakSet usage
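The last bullet concerns per-agent bookkeeping. Assuming the plugin keys per-agent results on the agent object (consistent with `last_result(agent)` in the description), this sketch shows why weak references fit: entries vanish when an agent is garbage-collected, so a long-lived plugin instance never keeps agents alive. `GoalStore` and the stand-in `Agent` class are hypothetical:

```python
import gc
import weakref
from typing import Optional


class Agent:
    """Stand-in for strands.Agent (any hashable, weak-referenceable object works)."""


class GoalStore:
    """Hypothetical per-agent bookkeeping for a plugin shared across agents."""

    def __init__(self) -> None:
        # Agents the plugin is attached to; held weakly so they can be freed.
        self._attached: "weakref.WeakSet[Agent]" = weakref.WeakSet()
        # Latest result per agent; an entry disappears along with its agent.
        self._results: "weakref.WeakKeyDictionary[Agent, str]" = weakref.WeakKeyDictionary()

    def attach(self, agent: Agent) -> None:
        self._attached.add(agent)

    def record(self, agent: Agent, result: str) -> None:
        self._results[agent] = result

    def last_result(self, agent: Agent) -> Optional[str]:
        return self._results.get(agent)


store = GoalStore()
agent = Agent()
store.attach(agent)
store.record(agent, "satisfied")
print(store.last_result(agent))

del agent
gc.collect()  # CPython frees the agent at `del`; collect() covers other runtimes
print(len(store._results), len(store._attached))
```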
The "start over from scratch" prompt shows a real reason to customize —
diverging from the default incremental-fix behavior — rather than
restating the default in slightly different words.

```python
when you have actionable feedback for the next attempt.
"""

@runtime_checkable
```


Issue: ruff format --check fails on this file, so the ci.yml merge gate will block. The PR checklist marks hatch run prepare as run, but formatting isn't applied here.

```console
$ ruff format --check src/strands/vended_plugins/goal/plugin.py
Would reformat: src/strands/vended_plugins/goal/plugin.py
```

Three spots need it:

  • missing blank line before @runtime_checkable class Validator (here, L79)
  • missing blank line before GoalStopReason = ... (L97)
  • the __call__ signature (L91-93) fits within the 120-char limit, so the formatter collapses it to a single line

Suggestion: Run hatch fmt --formatter (or ruff format) and commit the result. ruff check and mypy are both clean — it's only the formatter.

```python
return

elapsed = time.monotonic() - run.start_time
if elapsed >= self._timeout:
```


Issue: The timeout is checked before the response is validated, so an attempt whose response would satisfy the goal is reported as passed=False, stop_reason="timeout" if the budget elapsed during the model call that produced it. The successful result is computed-then-discarded.

Suggestion (non-blocking): Consider validating first and only falling back to "timeout" when validation fails — that way a passing final attempt still resolves as "satisfied". If reporting timeout even on a would-pass response is intentional (strict wall-clock budget), a one-line comment here would make that deliberate choice clear to the next reader.
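The suggested ordering can be sketched as a pure decision function. `decide_stop` is hypothetical (not part of the plugin), and it assumes that when a failing attempt has exhausted both budgets, the timeout is reported:

```python
from typing import Literal, Optional

StopReason = Literal["satisfied", "max_attempts", "timeout"]


def decide_stop(
    passed: bool, attempt: int, max_attempts: float, elapsed: float, timeout: float
) -> Optional[StopReason]:
    """Return why the loop should stop, or None to run another attempt."""
    # Validate first: a passing response is "satisfied" even if the
    # wall-clock budget ran out while the model was producing it.
    if passed:
        return "satisfied"
    # Only a failing attempt falls back to the budget checks.
    if elapsed >= timeout:
        return "timeout"
    if attempt >= max_attempts:
        return "max_attempts"
    return None


print(decide_stop(passed=True, attempt=1, max_attempts=3, elapsed=45.0, timeout=30.0))
print(decide_stop(passed=False, attempt=1, max_attempts=3, elapsed=45.0, timeout=30.0))
```

With the timeout check placed before validation, as in the hunk above, the first call would report "timeout" instead.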

@github-actions (Contributor)

Assessment: Comment (one merge-gate fix before merge)

Clean, well-documented port with strong test coverage. I verified the suite locally — 41/41 unit tests pass, mypy is clean, and ruff check (lint) is clean. The prior review history (port fidelity, Validator Protocol, structured logging, full-object assertions, snapshot-preset docs) is all addressed, so I focused only on what's new.

What I found
  • Formatting gate (worth fixing before merge): ruff format --check fails on plugin.py — see the inline comment. CI's ci.yml runs this, so it'll block merge despite the hatch run prepare checkbox. One ruff format run clears it.
  • Timeout semantics (suggestion): timeout is checked before validation, so a final response that would pass is reported as timeout/not-passed. Inline note suggests validating first or documenting the choice.

Only the formatter is blocking; everything else is non-blocking. Nice work — the docs page and judge/validator separation read really well.


Labels

  • area-hooks: Features or requests that might be implementable via hooks
  • area-structured-output: Related to the structured output api
  • documentation: Documentation changes, improvements, additions, content updates, site improvements, examples, guides
  • enhancement: New feature or request
  • python: Pull requests that update python code
  • size/xl


3 participants