Conversation

@jeffmylife commented Nov 23, 2025

TL;DR in code

import { generateText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'Generate a text message: no markdown, under 160 characters',
  onStepFinish: async (step) => {
    const text = step.text;
    if (/[*_`]/.test(text) || text.length > 160) {
      return {
        continue: true,
        // temporarily inject feedback into message history
        messages: [{
          role: 'user',
          content: 'Remove markdown and keep under 160 characters.' 
        }]
      };
    }
    return { continue: false }; // success!
  },
  stopWhen: stepCountIs(5),
});

Background

My original motivation can be simplified into the TL;DR snippet above. I wanted to generate SMS text messages that don't break basic assumptions: content length, and avoiding keywords or special characters you would never see in a real text message. For example, I wrote a regex that never lets the model emit markdown characters in the output, since we can reasonably say we shouldn't text those. I even had validators checking for Chinese characters, since some models occasionally generate them.

The way I originally implemented this was with a "reviewMessage" tool call that returned pass or fail with reasons. But the model had to generate a tool call for this! Sometimes the tool call's output didn't match the resulting text, so I added a final validator outside the agent loop, which meant I just had to retry the whole function without feedback. It costs more and adds complexity to use a tool call for something that should be deterministic. Of course, I could have injected the tool call myself, but while implementing that it smelled wrong.

Detailed Thought Process / Notes

So, I want a custom piece of code inside onStepFinish that decides whether it's okay to continue and also provides feedback on why we failed. Why within onStepFinish? I considered options like stopWhen, where we could stop once we have a successfully validated message, combined with prepareStep to inject feedback. In the end, it was more intuitive for me to use onStepFinish since we're adjusting control flow. Perhaps a new callback like onStepContinue would've made more sense here rather than extending onStepFinish.

Implementing the feature for generateText wasn't too challenging, but I realized we also had to support streaming, then the UI dependencies, then generateObject and experimental_output.

The "gotcha" for implementing continuation for streamText in the UI was the fact that if a text is finished streaming and fails the onStepFinish validation, we have to restart the request which looks like the message just disappears after completion. I think this is appropriate but I'm not sure how to give developers control over this operation so I added an option to "Clear Step on Retry" with experimental_clearStep so that the default behavior of clearing is disabled.

The best part of this implementation, in my opinion, is that it also sets us up for validators with feedback on objects. If an object couldn't be generated due to complex schema failures - such as Zod validations - the developer controls how that information is brought back to the model via the messages array of the StepContinueResult returned from onStepFinish.

Summary

This PR adds support for returning a StepContinueResult from the onStepFinish callback in generateText, streamText, and generateObject.

Key Changes

onStepFinish Callback Update: The callback can now return a StepContinueResult object (sketched after this list).

  • { continue: true, messages: [...] }: Continues the generation loop, injecting the provided messages (feedback/corrections) as the next user step.
  • { continue: false }: Stops the generation loop immediately, even if there are pending tool calls.
  • void / undefined: Default behavior (continues if tool calls exist, stops otherwise).
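
For reference, the return shape described above might be sketched roughly like this. The continue, messages, and experimental_clearStep fields come from this PR's description; the exact exported type may differ.

import type { ModelMessage } from 'ai';

// Rough sketch of what onStepFinish may now return.
// Returning void/undefined keeps the default loop behavior.
type StepContinueResult =
  | {
      continue: true;
      // Feedback/corrections injected as the next user step.
      messages: Array<ModelMessage>;
      // Whether the failed step is cleared from the UI stream before the retry (default: true).
      experimental_clearStep?: boolean;
    }
  | {
      // Stop the generation loop immediately, even with pending tool calls.
      continue: false;
    };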

generateText & streamText:

  • Added logic to handle the continuation result.
  • UI Handling: For streamText, when a retry happens (continuation), the previous failed step needs to be handled in the UI.
  • Added experimental_clearStep (default: true) to StepContinueResult. This lets developers control whether the "failed" step is cleared from the UI stream before the retry starts; disabling it keeps the invalid attempt visible instead of having it silently disappear. A usage sketch follows this list.
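
A minimal sketch of how this could look with streamText, reusing the TL;DR validation and opting out of clearing; it assumes the same setup as the TL;DR plus streamText from 'ai'.

import { streamText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamText({
  model: openai('gpt-4o-mini'),
  prompt: 'Generate a text message: no markdown, under 160 characters',
  stopWhen: stepCountIs(5),
  onStepFinish: async (step) => {
    if (/[*_`]/.test(step.text) || step.text.length > 160) {
      return {
        continue: true,
        messages: [
          {
            role: 'user',
            content: 'Remove markdown and keep under 160 characters.',
          },
        ],
        // keep the failed attempt visible in the UI instead of clearing it
        experimental_clearStep: false,
      };
    }
    return { continue: false };
  },
});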

generateObject:

  • Supports validation retries for structured data.
  • If Zod validation fails, developers can catch the error in onStepFinish and return a continuation with a specific error message to guide the model to fix the JSON (see the sketch after this list).
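
A rough sketch of what that could look like, assuming onStepFinish is exposed on generateObject as described in this PR; how the validation error surfaces on the step (step.error here) is illustrative, not the actual field name.

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const result = await generateObject({
  model: openai('gpt-4o-mini'),
  schema: z.object({
    name: z.string(),
    age: z.number().int().min(0),
  }),
  prompt: 'Extract the person mentioned in the following text: ...',
  onStepFinish: async (step) => {
    // Hypothetical: assume the step exposes the schema validation error for this attempt.
    if (step.error != null) {
      return {
        continue: true,
        messages: [
          {
            role: 'user',
            content:
              `The JSON failed validation: ${String(step.error)}. ` +
              'Fix the invalid fields and return only valid JSON.',
          },
        ],
      };
    }
    return { continue: false }; // valid object, stop the loop
  },
});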

Manual Verification

Verified locally using the following test apps and test cases:

  1. generateText & streamText: Confirmed that returning continue: true triggers a new generation step with the provided feedback messages.
  2. generateObject: Verified that schema validation errors can be caught in onStepFinish and used to trigger a retry, successfully correcting the output in subsequent steps.
  3. UI Stream: Verified experimental_clearStep behavior. When true (default), the invalid step is cleared from the UI before the new stream starts.
  4. Automated Tests:
    • Added stream-text-continuation.test.ts (10 tests passed) covering retry limits, message injection, and stop conditions.
    • Updated generate-object.test.ts (33 tests passed) to verify retry-on-validation logic.
    • Ran full test suite for packages/ai: 88 test files passed.
  5. Created example apps to confirm the changes work with feature flags:
    • examples/next-openai/app/test-on-step-finish-continuation/
    • examples/next-openai/app/test-object-continuation/

Checklist

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • I have reviewed this pull request (self-review)

Future Work

It's worth considering the comment I made below about how we handle message history for messages that fail validation: #10507 (comment)

Screenshots

This is test-on-step-finish-continuation with clear step disabled, so both messages are shown (the highlighted one is the resulting message). If clear step is enabled (the default), only the highlighted message remains.

Screenshot 2025-11-23 at 2 17 42 PM

This is test-object-continuation/ showing a failed validation step with feedback.

Screenshot 2025-11-23 at 2 16 15 PM

- Fix condition to add assistant response to responseMessages when stepContinueResult.continue is true
- Add test case to verify assistant message is included in continuation without tool calls
- Fix TypeScript errors by checking Array.isArray before calling .some() on content
Contributor

In generateObject with onStepFinish continuation and multiple retry attempts, continuation feedback messages from earlier attempts are discarded instead of accumulating. This causes the model to only receive the latest feedback message, losing context from previous failed attempts.

Patch Details
diff --git a/packages/ai/src/generate-object/generate-object.ts b/packages/ai/src/generate-object/generate-object.ts
index 53de2c6b6..eb3ed5d6e 100644
--- a/packages/ai/src/generate-object/generate-object.ts
+++ b/packages/ai/src/generate-object/generate-object.ts
@@ -335,7 +335,7 @@ functionality that can be fully encapsulated in the provider.
 
         const initialMessages = standardizedPrompt.messages;
         let currentMessages: Array<ModelMessage> = [...initialMessages];
-        let nextStepContinuationMessages: Array<ModelMessage> = [];
+        let accumulatedContinuationMessages: Array<ModelMessage> = [];
 
         let result: string;
         let finishReason: FinishReason;
@@ -353,12 +353,11 @@ functionality that can be fully encapsulated in the provider.
         do {
           attemptCount++;
 
-          // Combine initial messages with continuation messages
+          // Combine initial messages with accumulated continuation messages
           const stepInputMessages = [
             ...currentMessages,
-            ...nextStepContinuationMessages,
+            ...accumulatedContinuationMessages,
           ];
-          nextStepContinuationMessages = []; // Clear after use
 
           const promptMessages = await convertToLanguageModelPrompt({
             prompt: {
@@ -530,8 +529,10 @@ functionality that can be fully encapsulated in the provider.
               'continue' in onStepFinishResult
             ) {
               if (onStepFinishResult.continue === true) {
-                // Store continuation messages for the next step's input
-                nextStepContinuationMessages = onStepFinishResult.messages;
+                // Accumulate continuation messages for the next step's input
+                accumulatedContinuationMessages.push(
+                  ...onStepFinishResult.messages,
+                );
                 shouldContinue = true;
               }
               // continue: false means stop

Analysis

Continuation feedback messages discarded on retry attempts in generateObject

What fails: In generateObject with onStepFinish continuation and multiple retry attempts (maxRetries > 0), continuation feedback messages from earlier failed attempts are discarded instead of accumulated. The model receives only the latest feedback message, losing context from previous validation failures.

How to reproduce:

When using generateObject with:

  • A schema that fails validation on the first attempt
  • An onStepFinish callback that returns continue: true with feedback messages
  • Multiple retry attempts before success

The expected behavior is:

Attempt 1: [initial_prompt] → invalid
Attempt 2: [initial_prompt + feedback_1] → invalid  
Attempt 3: [initial_prompt + feedback_1 + feedback_2] → success

Actual behavior (before fix):

Attempt 1: [initial_prompt] → invalid
Attempt 2: [initial_prompt + feedback_1] → invalid
Attempt 3: [initial_prompt + feedback_2] ← feedback_1 is LOST

Root cause: In packages/ai/src/generate-object/generate-object.ts lines 338 and 534:

  • Line 338: nextStepContinuationMessages initialized as empty array
  • Line 363 (old): nextStepContinuationMessages = [] clears after each use
  • Line 534 (old): nextStepContinuationMessages = onStepFinishResult.messages replaces instead of accumulating

This caused messages to be replaced rather than accumulated across iterations, losing earlier feedback.

The fix: Changed the logic to accumulate continuation messages across attempts:

  • Renamed nextStepContinuationMessages to accumulatedContinuationMessages for clarity
  • Removed the line that cleared messages after use
  • Changed line 534 from assignment to push() to accumulate messages: accumulatedContinuationMessages.push(...onStepFinishResult.messages)

This ensures the model receives full context from all previous validation failures, matching the pattern used in generateText with responseMessages accumulation and aligning with the documented behavior that "messages injected via continuation are added to message history."

Author

Hmm, I see this is a legit problem, but it's probably best to give the developer control over the message history. The ideal solution would allow for accumulation, no accumulation, or direct control over the message history.

This highlights something else. Is it desirable to have a message history that preserves the validation errors? Illustration below:

Options

Current Behavior:

Message History:
- UserMessage: ...
- AssistantMessage: ... <-- validation success

Detailed History With Messages:

Message History:
- UserMessage: ...
- AssistantMessage: ... <-- validation failed
- UserMessage: "You failed because of XYZ. Regenerate." <-- validation failure feedback
- AssistantMessage: ... <-- validation success

Detailed History With Tool Injection:

Message History:
- UserMessage: ...
- ToolMessage: ... <-- validation failure feedback with original message
- AssistantMessage: ... <-- validation success
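
As data, the "Detailed History With Messages" option might look roughly like this; it's a sketch with plain ModelMessage objects and made-up content strings, and the tool-injection variant would swap the feedback user message for a tool message.

import type { ModelMessage } from 'ai';

// Sketch: validation failures and feedback remain in the history that
// later steps (and the caller) see. Content strings are made up.
const detailedHistory: Array<ModelMessage> = [
  { role: 'user', content: 'Generate a text message: no markdown, under 160 characters' },
  { role: 'assistant', content: '**Hey!** Your order has shipped...' }, // <-- validation failed
  { role: 'user', content: 'You failed because the output contains markdown. Regenerate.' },
  { role: 'assistant', content: 'Hey! Your order shipped and arrives Friday.' }, // <-- validation success
];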

Of these choices, it's probably best to consider tool injection so that we can preserve everything! However, I'm biased toward my own use cases, where I'm trying to be token-efficient and support models without strong tool-calling abilities.
