
fix: preserve responses continuation state for openai-compatible upstreams#2307

Open
xpcrack wants to merge 1 commit into router-for-me:main from xpcrack:codex/fix-openai-responses-http-continuation

Conversation


@xpcrack xpcrack commented Mar 24, 2026

Summary

  • preserve HTTP /v1/responses continuation state for openai-compatible upstreams that do not support previous_response_id
  • cache completed response turns for both non-streaming and streaming responses
  • normalize string shorthand input into a user message before merging continuation input
  • add regression coverage for HTTP continuation and streaming completion caching

Testing

  • go test ./sdk/api/handlers/openai -count=1
  • verified /v1/responses continuation against a live gpt-5.4 openai-compatible upstream for both non-streaming and streaming flows

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the API's ability to manage conversational continuity for OpenAI-compatible upstreams. By introducing a caching mechanism for previous request and response states, the system can now intelligently merge these states into subsequent continuation requests. This ensures that the full conversational context is preserved and correctly transmitted, even when the upstream service does not inherently support direct continuation IDs, leading to a more robust and seamless user experience for multi-turn interactions.

Highlights

  • Continuation State Preservation: Implemented a mechanism to preserve HTTP /v1/responses continuation state, especially for OpenAI-compatible upstreams that do not natively support previous_response_id.
  • Response Turn Caching: Introduced caching for completed response turns, covering both non-streaming and streaming response flows, to maintain conversational context.
  • Input Normalization: Added logic to normalize string shorthand input into a structured user message before merging it into continuation requests.
  • Regression Coverage: Included new regression tests to ensure the correct functioning of HTTP continuation and streaming completion caching.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature to preserve conversation state for OpenAI-compatible upstreams by caching response turns. The implementation correctly handles both streaming and non-streaming responses and includes good regression test coverage. My feedback focuses on improving the robustness of the new JSON manipulation logic by adding more comprehensive error handling, and I've also suggested a refactoring to improve code organization and reduce coupling between different parts of the handler logic.

Comment on lines +112 to +115
normalized, errDelete := sjson.DeleteBytes(rawJSON, "previous_response_id")
if errDelete != nil {
	normalized = bytes.Clone(rawJSON)
}


Severity: high

The error from sjson.DeleteBytes is not handled correctly. If sjson.DeleteBytes fails, the code proceeds with a clone of the original rawJSON. This means the previous_response_id field is not removed and will be sent to the upstream, which this change is intended to prevent. This could lead to unexpected behavior or errors from upstreams that don't support this field. The error should be properly handled and returned to the caller.

normalized, errDelete := sjson.DeleteBytes(rawJSON, "previous_response_id")
if errDelete != nil {
	return nil, &interfaces.ErrorMessage{
		StatusCode: http.StatusBadRequest,
		Error:      fmt.Errorf("failed to remove previous_response_id from request: %w", errDelete),
	}
}

Comment on lines +124 to +135
if !gjson.GetBytes(normalized, "model").Exists() {
	modelName := strings.TrimSpace(gjson.GetBytes(lastRequest, "model").String())
	if modelName != "" {
		normalized, _ = sjson.SetBytes(normalized, "model", modelName)
	}
}
if !gjson.GetBytes(normalized, "instructions").Exists() {
	instructions := gjson.GetBytes(lastRequest, "instructions")
	if instructions.Exists() {
		normalized, _ = sjson.SetRawBytes(normalized, "instructions", []byte(instructions.Raw))
	}
}


Severity: medium

Errors from sjson.SetBytes (line 127) and sjson.SetRawBytes (line 133) are ignored when setting the model and instructions from the previous request. If these operations fail, the normalized request might be missing a required model field, leading to a failure when the request is sent to the upstream. These errors should be handled to ensure request integrity.

Comment on lines +179 to +184
for _, payload := range websocketJSONPayloadsFromChunk(chunk) {
	if gjson.GetBytes(payload, "type").String() != wsEventTypeCompleted {
		continue
	}
	h.rememberCompletedResponse(requestJSON, payload)
}


Severity: medium

The function rememberCompletedResponseFromChunk uses websocketJSONPayloadsFromChunk and wsEventTypeCompleted, which are defined in openai_responses_websocket.go. This creates tight coupling between the turn-state logic and websocket-specific implementation details. Furthermore, the names are misleading, since this logic is also used for HTTP SSE streaming.

To improve modularity and clarity, consider moving these shared components to a more neutral location. For example, websocketJSONPayloadsFromChunk could be renamed to parseSSEPayloads and moved to this file (openai_responses_turn_state.go) or a new sse.go file within the package. Similarly, wsEventTypeCompleted could be renamed to something like SSEEventResponseCompleted.
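The proposed neutral helper might look like the following sketch. The name parseSSEPayloads comes from the suggestion above, but the framing rules are assumptions, since the original websocketJSONPayloadsFromChunk is not shown in this review:

```go
package main

import (
	"bytes"
	"fmt"
)

// parseSSEPayloads extracts the JSON bodies of "data:" lines from a streamed
// chunk, skipping blank lines and the [DONE] sentinel, with no
// websocket-specific naming.
func parseSSEPayloads(chunk []byte) [][]byte {
	var payloads [][]byte
	for _, line := range bytes.Split(chunk, []byte("\n")) {
		line = bytes.TrimSpace(line)
		if !bytes.HasPrefix(line, []byte("data:")) {
			continue
		}
		body := bytes.TrimSpace(bytes.TrimPrefix(line, []byte("data:")))
		if len(body) == 0 || bytes.Equal(body, []byte("[DONE]")) {
			continue
		}
		payloads = append(payloads, body)
	}
	return payloads
}

func main() {
	chunk := []byte("data: {\"type\":\"response.completed\"}\n\ndata: [DONE]\n")
	fmt.Println(len(parseSSEPayloads(chunk)))
}
```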


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: caac256b33

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +143 to +147
previousResponseID := strings.TrimSpace(gjson.GetBytes(rawJSON, "previous_response_id").String())
if previousResponseID == "" {
return rawJSON, nil
}
lastRequest, lastResponseOutput, ok := h.turnState.load(previousResponseID)


P1: Namespace continuation cache by auth context

Continuation lookup is keyed only by previous_response_id, so the handler applies cached turn state without checking which authenticated client created it. In a multi-tenant deployment, if a response ID is exposed across users (for example via logs or shared tooling), a different user can attach that ID and have another tenant’s prior input/output merged into their request, leaking prompt context and corrupting conversation state. Scope cache keys by auth/client identity before loading and merging.
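The scoping this comment asks for could be as simple as building a composite cache key. This is a sketch under the assumption that the handler has some stable per-client identity available; turnStateKey and the key format are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// turnStateKey combines a stable per-client identity (for example a hashed
// API key or tenant ID) with the response ID, so a response ID leaked across
// tenants cannot load another tenant's cached turn. Hashing keeps the raw
// credential out of the cache key.
func turnStateKey(clientID, responseID string) string {
	sum := sha256.Sum256([]byte(clientID))
	return hex.EncodeToString(sum[:8]) + ":" + responseID
}

func main() {
	fmt.Println(turnStateKey("tenant-a", "resp_123"))
	fmt.Println(turnStateKey("tenant-b", "resp_123")) // different prefix, so no cross-tenant hit
}
```

Both store and load would then use this key instead of the bare response ID.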

Useful? React with 👍 / 👎.

Comment on lines +54 to +58
c.entries.Store(responseID, responsesTurnStateEntry{
	request: bytes.Clone(requestJSON),
	output:  bytes.Clone(outputJSON),
	expire:  time.Now().Add(responsesTurnStateTTL),
})


P2: Add proactive eviction for turn-state cache

The cache stores every completed response with a TTL, but expired entries are only removed when that exact key is loaded later. Since many response IDs are never referenced again, stale entries remain in sync.Map indefinitely and memory usage will grow over time under normal traffic. Add proactive cleanup (background sweep or size-bounded eviction) so one-shot turns do not accumulate permanently.

Useful? React with 👍 / 👎.

@luispater luispater (Collaborator) left a comment


Summary:
This PR improves /v1/responses continuation for OpenAI-compatible upstreams by caching completed turns and reconstructing continuation input when previous_response_id is present. The core behavior and targeted regression tests are solid.

Key findings:

  • Blocking: The new turn-state cache is unbounded in practice. Entries expire logically, but cleanup only happens on exact-key reads. Since response IDs are generally not read again after expiration, memory can grow without bound over time.
  • Non-blocking: Continuation normalization currently requires input to be an array and may reject valid string shorthand input in cache-hit flows.
  • Non-blocking: Missing cache-eviction/boundedness tests.

Test plan:

  • Reviewed PR metadata, file-level diff, and full patch via gh/script context collection.
  • Reviewed CI checks (intentionally ignored translator-path-guard per instruction).
  • Did not run local tests or checkout the PR branch in this review pass.

Requested changes:
Please add a deterministic cache cleanup/bounding strategy (for example periodic sweep, size cap + eviction policy, or both) and corresponding tests before approval.
