feat(strands-py): add GoalLoop vended plugin with docs by notowen333 · Pull Request #2738 · strands-agents/harness-sdk

notowen333 · 2026-06-11T18:08:40Z

Description

Agents often need iterative refinement — retry until the response meets a quality bar. Today that means hand-rolling a loop with timeout logic, attempt tracking, feedback injection, and state management. GoalLoop encapsulates all of that as a vended plugin that works inside the existing hook lifecycle.

This PR ports the GoalLoop plugin from TypeScript to Python and adds a dual-language documentation page.
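For contrast, the kind of hand-rolled loop the description refers to might look like the sketch below. The names `refine`, `call_model`, and `validate` are hypothetical stand-ins written for illustration; none of this is the plugin's API or implementation.

```python
import time
from typing import Callable, Optional, Tuple


def refine(
    call_model: Callable[[str], str],
    validate: Callable[[str], Tuple[bool, Optional[str]]],
    prompt: str,
    max_attempts: int = 3,
    timeout: float = 30.0,
) -> Tuple[str, str, int]:
    """Retry until `validate` passes, attempts run out, or the budget elapses.

    Returns (last_response, stop_reason, attempts_used).
    """
    start = time.monotonic()
    response = ""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        response = call_model(current_prompt)
        passed, feedback = validate(response)
        if passed:
            return response, "satisfied", attempt
        if time.monotonic() - start >= timeout:
            return response, "timeout", attempt
        # Feedback injection: fold the validator's complaint into the next prompt.
        current_prompt = f"{prompt}\n\nPrevious attempt was rejected: {feedback}"
    return response, "max_attempts", max_attempts
```

Even this small version has to juggle the clock, the attempt counter, and the prompt rewrite by hand, which is the bookkeeping the plugin is meant to absorb.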

Public API Changes

New module: strands.vended_plugins.goal

from strands import Agent
from strands.vended_plugins.goal import GoalLoop

# Natural-language goal — judged by an internal agent built from the host's model
concise = GoalLoop(
    goal="At most 3 sentences, accessible to a 10-year-old, no jargon.",
    max_attempts=3,
)

agent = Agent(plugins=[concise])
agent("Explain how rainbows form.")
print(concise.last_result(agent))
# GoalResult(passed=True, stop_reason='satisfied', attempts=[...])

# Programmatic validator — pass a callable to skip the judge agent entirely
from strands.vended_plugins.goal import GoalLoop

def word_count_validator(response, agent):
    text = " ".join(
        block["text"] for block in response["content"] if "text" in block
    )
    words = len(text.split())
    if words <= 50:
        return True
    return {"passed": False, "feedback": f"Too long ({words} words). Cap at 50."}

plugin = GoalLoop(goal=word_count_validator, max_attempts=5, timeout=30.0)

Exported symbols

| Symbol | Kind | Purpose |
| --- | --- | --- |
| `GoalLoop` | Plugin class | Main entry point — attach to an agent via `plugins=[...]` |
| `GoalResult` | Dataclass | Aggregate result with `passed`, `stop_reason`, `attempts` |
| `GoalAttempt` | Dataclass | Per-attempt record: `attempt`, `passed`, `feedback` |
| `GoalStopReason` | Literal type | `"satisfied"` \| `"max_attempts"` \| `"timeout"` |
| `JudgeConfig` | Dataclass | Optional judge tuning: `model`, `system_prompt` |
| `ValidationOutcome` | Dataclass | Canonical validator return: `passed`, `feedback` |
| `Validator` | Protocol | Type for programmatic validator callables |
| `JUDGE_SYSTEM_PROMPT` | str | Default system prompt for the NL judge |
| `JudgeOutcome` | Pydantic model | Structured output schema the judge fills |
| `build_judge_prompt` | Function | Builds the judge input from a goal + transcript |
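To make the result types in the table concrete, here is a rough sketch of their shapes. The field names and the three stop reasons come from the table; the field types, the defaults, and the use of plain dataclasses are assumptions, and the real definitions in `strands.vended_plugins.goal` may differ.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

GoalStopReason = Literal["satisfied", "max_attempts", "timeout"]


@dataclass
class GoalAttempt:
    attempt: int  # assumed to be a 1-based attempt index
    passed: bool
    feedback: Optional[str] = None  # assumed optional


@dataclass
class GoalResult:
    passed: bool
    stop_reason: GoalStopReason
    attempts: List[GoalAttempt] = field(default_factory=list)


result = GoalResult(
    passed=True,
    stop_reason="satisfied",
    attempts=[GoalAttempt(1, False, "Too long"), GoalAttempt(2, True)],
)
```

Because dataclasses compare field by field, a whole result built this way can be checked with a single `==` against an expected value.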

GoalLoop constructor parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `goal` | (required) | NL string (judged by internal agent) or callable validator |
| `max_attempts` | `inf` | Maximum attempts before stopping |
| `timeout` | `inf` | Wall-clock budget in seconds |
| `judge` | `None` | `JudgeConfig` to override the judge model or system prompt |
| `preserve_context` | `True` | Keep conversation history across retries |
| `resume_prompt_template` | (built-in) | `Callable[[str |
| `name` | `"strands:goal-loop"` | Plugin name (must be unique per agent) |

Related Issues

N/A - new feature port

Documentation PR

Included in this PR under site/src/content/docs/user-guide/concepts/plugins/goal-loop.mdx with dual-language tabs (Python + TypeScript).

Type of Change

New feature

Testing

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2026-06-12T03:09:06Z

Review Summary

Assessment: Comment (a few changes worth making before merge)

Clean, faithful port of the TS GoalLoop. The hook lifecycle, resumed-flag continuation logic, fresh-judge-per-validation, and snapshot/restore all mirror the original correctly, and the snapshot preset difference between the two SDKs was handled deliberately rather than copied blindly. Test coverage is strong.

Themes

Type design (Python-specific): Public callable types (Validator, resume_prompt_template) use Callable, which the repo style guide steers away from for extensible interfaces — Protocol with **kwargs keeps the door open for future arguments. Worth fixing while the API is new.
Conventions: One log line doesn't follow the SDK's structured field=<%s> logging style.
Tests: A few assertions check dataclass fields individually where a single == against the expected GoalResult would be stronger. Non-blocking.
Port fidelity: Verified faithful. One intentional divergence (Ralph mode restores conversation_manager_state in Python) deserves a code comment.
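The type-design theme can be illustrated with a small sketch. The names `ValidatorProto`, `new_style`, and `call_validator`, and the `attempt` keyword, are hypothetical and chosen only to show why a Protocol whose `__call__` accepts `**kwargs` leaves room for future arguments; this is not the plugin's code.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ValidatorProto(Protocol):
    """Hypothetical validator shape: two positional args plus open-ended kwargs."""

    def __call__(self, response: Any, agent: Any, **kwargs: Any) -> Any: ...


def new_style(response: Any, agent: Any, **kwargs: Any) -> bool:
    # Ignores keyword arguments it does not know about, so a caller can
    # start passing extra context later without breaking this function.
    return bool(response)


def call_validator(validator: ValidatorProto, response: Any, agent: Any) -> Any:
    # A later version of the caller could add keywords here (the `attempt`
    # keyword is invented for this example) without changing the type.
    return validator(response, agent, attempt=1)
```

A plain `Callable[[Any, Any], Any]` alias cannot express the extra keyword, so adding one later would be a breaking change for type checkers. Note that `runtime_checkable` only verifies that `__call__` exists, not that its signature matches.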

Process note: This adds a new public plugin class plus 8 public exports — a "moderate" change under API_BAR_RAISING.md. The PR description has the API prep (use cases, signatures, exports) ✅, but the needs-api-review label isn't applied. Consider adding it (this PR even introduces the label workflow). The PR is also still in draft.

Nice work — the port reads cleanly and the docs page is thorough.

github-actions · 2026-06-12T03:23:56Z

Re-review — commit `7d855d0`

Assessment: Approve (one minor, non-blocking follow-up)

Thanks for the fast turnaround — all four review points are addressed and the suite is green (41 unit tests pass locally).

Verification

✅ Validator → @runtime_checkable Protocol with **kwargs (forward-compatible)
✅ Structured logging: logger.warning("plugin=<%s>, error=<%s> | validator threw", ...)
✅ Tests assert full GoalResult/GoalAttempt equality instead of per-field
✅ Snapshot preset divergence now documented with an inline comment
✅ Port fidelity re-confirmed against the TS source
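As a sketch of the structured logging style mentioned in the list above, the snippet below reuses the quoted format string. The logger name, the helper `run_validator_safely`, and the choice to return `False` when the validator raises are assumptions made for the example, not behavior confirmed by the PR.

```python
import logging

logger = logging.getLogger("goal_loop_sketch")  # hypothetical logger name


def run_validator_safely(plugin_name, validator, response):
    """Call a validator; on failure, log in the field=<value> | message style."""
    try:
        return validator(response)
    except Exception as error:  # broad on purpose: any validator failure is logged
        # Values are passed as arguments so formatting is deferred to logging.
        logger.warning("plugin=<%s>, error=<%s> | validator threw", plugin_name, error)
        return False  # assumption: treat a raising validator as a failed check
```

Keeping the fields in a fixed `name=<value>` form ahead of the `|` separator makes the lines easy to search and parse.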

Minor follow-up: resume_prompt_template (L228) is still a frozen Callable — same forward-compat reasoning as Validator, so converting it to a Protocol with **kwargs would round this out. Not a blocker.

Process reminder: still worth applying api/needs-review (new public abstraction) and flipping out of draft when ready. Otherwise this looks good to go.

agent-of-mkmeral · 2026-06-12T09:44:52Z

Re-check at head `7d855d0` (covers `930c808` + `7d855d0`)

Verdict: the port is still faithful, and all four items from my original review are resolved. Re-verified the changed surface against the TS source and ran the suite — 41/41 unit tests pass locally.

✅ The one real fidelity gap is fixed (Ralph-mode `system_prompt` rewind)

run.initial_snapshot = event.agent.take_snapshot(
    preset="session", include=["system_prompt"], exclude=["state"]
)

Verified empirically against the snapshot resolver:

before fix: ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state']
after fix:  ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state', 'system_prompt']

Python now rewinds everything the TS session preset rewinds (plus conversation_manager_state, which is documented inline as an intentional Python-only divergence — exactly what I'd hoped for). The test at test_snapshot_taken_on_first_model_call pins the exact call signature, so this can't silently regress. And the docs claim in goal-loop.mdx ("messages, system prompt, model state") is now accurate for Python — no doc change needed.
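The include/exclude behavior behind those two lists can be modeled with a toy resolver. This is not the SDK's snapshot resolver: `PRESETS` and `resolve_fields` are hypothetical, and the contents of the "session" preset are inferred from the before/after lists above rather than taken from the source.

```python
from typing import Dict, FrozenSet, Iterable, List

# ASSUMPTION: illustrative preset contents inferred from the lists above,
# not copied from the SDK.
PRESETS: Dict[str, FrozenSet[str]] = {
    "session": frozenset(
        {"conversation_manager_state", "interrupt_state", "messages", "model_state", "state"}
    ),
}


def resolve_fields(
    preset: str, include: Iterable[str] = (), exclude: Iterable[str] = ()
) -> List[str]:
    """Start from the preset, add `include`, drop `exclude`, return sorted names."""
    fields = set(PRESETS[preset]) | set(include)
    fields -= set(exclude)
    return sorted(fields)


before = resolve_fields("session", exclude=["state"])
after = resolve_fields("session", include=["system_prompt"], exclude=["state"])
```

Under these assumptions, `after` differs from `before` only by the added `system_prompt` entry, matching the change described above.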

✅ Minor items from my review, all addressed

Judge-path unit tests — the previously untested _judge_validator now has 6 mocked tests (test_nl_judge_*: first-attempt pass, feedback loop, judge.model override, judge.system_prompt override, no-structured-output fallback, fresh-agent-per-validation), matching the TS suite's coverage 1:1. The patch target (strands.agent.agent.Agent) correctly intercepts the plugin's lazy import.
GoalStopReason — now Literal["satisfied", "max_attempts", "timeout"].
import inspect — moved to module level.
Bonus: judge.py rendering helpers got real types (Message/ContentBlock/ToolResultContent instead of dict) — behavior unchanged, port fidelity unaffected.

Port fidelity of the new changes themselves

The Validator → @runtime_checkable Protocol conversion preserves the exact call shape ((response, agent) positionally, return bool | dict | ValidationOutcome, sync or async), so all TS-equivalent behavior in _fn_validator is untouched. The structured-logging reformat changes only the log string, not semantics.
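The call shape described here (positional `(response, agent)`, a `bool`, `dict`, or outcome object as the return value, sync or async) could be normalized along the following lines. `Outcome` is a hypothetical stand-in for `ValidationOutcome`, and `run_and_normalize` is an illustrative helper, not the plugin's `_fn_validator`.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Outcome:  # hypothetical stand-in for ValidationOutcome
    passed: bool
    feedback: Optional[str] = None


async def run_and_normalize(validator: Any, response: Any, agent: Any) -> Outcome:
    """Call a sync or async validator and coerce its result to an Outcome."""
    result = validator(response, agent)
    if inspect.isawaitable(result):
        # Async validators return an awaitable; resolve it before inspecting.
        result = await result
    if isinstance(result, Outcome):
        return result
    if isinstance(result, bool):
        return Outcome(passed=result)
    if isinstance(result, dict):
        return Outcome(passed=bool(result.get("passed")), feedback=result.get("feedback"))
    raise TypeError(f"unsupported validator result: {type(result).__name__}")


demo = asyncio.run(run_and_normalize(lambda response, agent: True, "hi", None))
```

Checking the returned value with `inspect.isawaitable`, rather than inspecting the function itself, also covers callable objects and partials that happen to return coroutines.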

Remaining (non-blocking)

resume_prompt_template is still a frozen Callable — same forward-compat argument as Validator, worth doing while the API is new (the /strands review bot flagged this too).
CI note: all Python unit-test jobs are green across 3.10–3.14 / linux+windows (the CANCELLED entries are superseded runs); the label-size check failure is labeler noise unrelated to this change.

Good to go from a port-fidelity standpoint. 🚢

Port the GoalLoop iterative-refinement plugin from TypeScript to Python. The plugin validates agent responses against a goal (NL string or programmatic validator) and loops with feedback until satisfied. Includes 35 unit tests and a dual-language documentation page.

TypeScript examples must live in sibling .ts files and be included via --8<-- directives, not inlined in MDX. Created goal-loop.ts and goal-loop_imports.ts with proper snippet regions. Updated docs-writer skill to make this convention unmissable: added CRITICAL callout in Step 3b and a new top-level Gotcha.

All three skills (writer, reviewer, audit) now enforce:
- TypeScript is never inlined in MDX
- Imports live in a separate _imports.ts file with per-example regions
- Every TS example must include both imports and body snippets
- A body-only include missing imports fails review

Replace ASCII box-drawing diagram with a mermaid flowchart. Add mermaid requirement to docs-writer and docs-reviewer skills.

Both were plain facts that belong as inline prose, not visually loud admonitions. The caution described behavior the plugin already warns about; the note was just context.

The callout sparing-use rule already lives in mdx-authoring.md. Remove redundant restatements from writer/reviewer skills and instead point to the existing guidance at the right moments.

- Reflow goal-loop.mdx prose to fill lines to ~80-90 chars
- Remove language-specific param from heading ("Stateless Retries")
- Restructure reviewer skill: split monolithic Constraints bullet into separate dimensions (Voice Stack, Multi-Language, Terminology, Code Examples with site conventions, Readability, Type Alignment)
- Add heading language-neutrality rule to writer Step 4

Prose outside tabs must be language-neutral. Replaced Python-specific parameter names (preserve_context=False, max_attempts, stop_reason, last_result()) with plain English equivalents. Updated writer skill to make language-neutral shared prose the top-level rule.

…ator
- Decompose build_judge_prompt into small named helpers instead of nested loops
- Rename _nl_validator to _judge_validator for clarity
- Add integration tests mirroring the TS integ suite (standard loop + preserve_context=False)
- Type WeakSet/WeakKeyDictionary with Agent instead of Any

- Reject timeout <= 0 (was allowing 0 which causes immediate timeout)
- Use unicode ellipsis in truncation to match TS output format

- Include system_prompt in Ralph-mode snapshot (restores TS parity)
- Use Literal type for GoalStopReason instead of bare str
- Move `import inspect` to module level
- Fix mypy: type judge helpers with Message/ContentBlock/ToolResultContent
- Add NL judge unit tests (construction, feedback, model/prompt overrides, fallback path, fresh-agent-per-validation)
- Remove unused pytest import from integ tests

…ment
- Convert Validator from Callable alias to Protocol with **kwargs for forward-compatible extensibility
- Fix logger.warning to use structured format (plugin=<%s>, error=<%s> |)
- Use full GoalResult/GoalAttempt equality assertions instead of per-field
- Add comment explaining Python snapshot preset divergence from TS

Replace 'plugins' (not in the collection schema) with 'event-loop'.

github-actions · 2026-06-12T21:04:10Z

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Changed pages:

user-guide/concepts/plugins/goal-loop

Updated at: 2026-06-12T22:15:28.370Z

- Remove hook/event implementation details from "How It Works" section
- Convert JUDGE_SYSTEM_PROMPT to triple-quoted string for readability
- Extract TS word_count_validator into a named function (matches Python)
- Normalize variable naming to `plugin` in "Inspecting Results" examples
- Replace Spanish resume prompt examples with English
- Use <Syntax> component for language-specific inline terms
- Add prompt-authoring tag to link goal-loop with steering
- Add explanatory comments for WeakKeyDictionary and WeakSet usage

The "start over from scratch" prompt shows a real reason to customize — diverging from the default incremental-fix behavior — rather than restating the default in slightly different words.

github-actions · 2026-06-12T22:17:03Z

+when you have actionable feedback for the next attempt.
+"""
+
+@runtime_checkable


Issue: ruff format --check fails on this file, so the ci.yml merge gate will block. The PR checklist marks hatch run prepare as run, but formatting isn't applied here.

$ ruff format --check src/strands/vended_plugins/goal/plugin.py
Would reformat: src/strands/vended_plugins/goal/plugin.py

Three spots need it:

missing blank line before @runtime_checkable class Validator (here, L79)

missing blank line before GoalStopReason = ... (L97)

the __call__ signature (L91-93) collapses to a single line under the 120-char limit

Suggestion: Run hatch fmt --formatter (or ruff format) and commit the result. ruff check and mypy are both clean — it's only the formatter.

github-actions · 2026-06-12T22:17:04Z

+                return
+
+            elapsed = time.monotonic() - run.start_time
+            if elapsed >= self._timeout:


Issue: The timeout is checked before the response is validated, so an attempt whose response would satisfy the goal is reported as passed=False, stop_reason="timeout" if the budget elapsed during the model call that produced it. The successful result is computed-then-discarded.

Suggestion (non-blocking): Consider validating first and only falling back to "timeout" when validation fails — that way a passing final attempt still resolves as "satisfied". If reporting timeout even on a would-pass response is intentional (strict wall-clock budget), a one-line comment here would make that deliberate choice clear to the next reader.
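A minimal sketch of the validate-first ordering suggested here, with an injectable clock so the behavior is easy to test. `resolve_attempt` and its parameters are hypothetical and stand in for the plugin's hook logic, which is only partially visible in the diff above.

```python
import time
from typing import Callable, Tuple


def resolve_attempt(
    validate: Callable[[str], bool],
    response: str,
    start_time: float,
    timeout: float,
    now: Callable[[], float] = time.monotonic,
) -> Tuple[bool, str]:
    """Validate first; report "timeout" only when validation fails and the budget is spent.

    Returns (passed, stop_reason); stop_reason is "" when the loop should continue.
    """
    if validate(response):
        return True, "satisfied"
    if now() - start_time >= timeout:
        return False, "timeout"
    return False, ""
```

The trade-off is that validation itself may take time (for example a judge model call), so a strict wall-clock budget can be overrun by one validation; whether that is acceptable is the design choice the comment asks to be made explicit.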

github-actions · 2026-06-12T22:17:05Z

Assessment: Comment (one merge-gate fix before merge)

Clean, well-documented port with strong test coverage. I verified the suite locally — 41/41 unit tests pass, mypy is clean, and ruff check (lint) is clean. The prior review history (port fidelity, Validator Protocol, structured logging, full-object assertions, snapshot-preset docs) is all addressed, so I focused only on what's new.

What I found

Formatting gate (worth fixing before merge): ruff format --check fails on plugin.py — see the inline comment. CI's ci.yml runs this, so it'll block merge despite the hatch run prepare checkbox. One ruff format run clears it.
Timeout semantics (suggestion): timeout is checked before validation, so a final response that would pass is reported as timeout/not-passed. Inline note suggests validating first or documenting the choice.

Only the formatter is blocking; everything else is non-blocking. Nice work — the docs page and judge/validator separation read really well.

github-actions Bot added the size/xl label Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 18:11 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 18:11 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:35 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:35 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:35 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:38 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:38 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:38 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:43 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:43 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:43 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:45 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:45 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:45 — with GitHub Actions Waiting

github-actions Bot removed the size/xl label Jun 11, 2026

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/tests/strands/vended_plugins/goal/test_plugin.py Outdated

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/src/strands/vended_plugins/goal/plugin.py Outdated

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/src/strands/vended_plugins/goal/plugin.py

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/src/strands/vended_plugins/goal/plugin.py Outdated

notowen333 added 15 commits June 12, 2026 16:56

fix: use mermaid for diagrams, update skills to enforce it

f3652e2

fix: remove unnecessary callout boxes from goal-loop page

c083530

fix: consolidate callout guidance to mdx-authoring.md reference

a470236

fix: use proper Agent type instead of Any in weakref collections

caf7c9e

fix: align Python GoalLoop validation behavior with TypeScript

5c0d918

refactor: rename 'raw' variable to 'result' in validator normalization

08e7293

fix(site): use valid schema tag in goal-loop frontmatter

0dc0bf7

zastrowm reviewed Jun 12, 2026

notowen333 added 2 commits June 12, 2026 18:07

fix(docs): use more illustrative custom resume prompt example

cf3935e

github-actions Bot reviewed Jun 12, 2026

zastrowm approved these changes Jun 13, 2026
