fix: preserve responses continuation state for openai-compatible upstreams #2307
xpcrack wants to merge 1 commit into router-for-me:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the API's ability to manage conversational continuity for OpenAI-compatible upstreams. By introducing a caching mechanism for previous request and response states, the system can now merge those states into subsequent continuation requests. This ensures the full conversational context is preserved and correctly transmitted even when the upstream service does not natively support continuation IDs, making multi-turn interactions more robust.
Code Review
This pull request introduces a valuable feature to preserve conversation state for OpenAI-compatible upstreams by caching response turns. The implementation correctly handles both streaming and non-streaming responses and includes good regression test coverage. My feedback focuses on improving the robustness of the new JSON manipulation logic by adding more comprehensive error handling, and I've also suggested a refactoring to improve code organization and reduce coupling between different parts of the handler logic.
```go
normalized, errDelete := sjson.DeleteBytes(rawJSON, "previous_response_id")
if errDelete != nil {
	normalized = bytes.Clone(rawJSON)
}
```
The error from sjson.DeleteBytes is not handled correctly. If sjson.DeleteBytes fails, the code proceeds with a clone of the original rawJSON. This means the previous_response_id field is not removed and will be sent to the upstream, which this change is intended to prevent. This could lead to unexpected behavior or errors from upstreams that don't support this field. The error should be properly handled and returned to the caller.
```go
normalized, errDelete := sjson.DeleteBytes(rawJSON, "previous_response_id")
if errDelete != nil {
	return nil, &interfaces.ErrorMessage{
		StatusCode: http.StatusBadRequest,
		Error:      fmt.Errorf("failed to remove previous_response_id from request: %w", errDelete),
	}
}
```

```go
if !gjson.GetBytes(normalized, "model").Exists() {
	modelName := strings.TrimSpace(gjson.GetBytes(lastRequest, "model").String())
	if modelName != "" {
		normalized, _ = sjson.SetBytes(normalized, "model", modelName)
	}
}
if !gjson.GetBytes(normalized, "instructions").Exists() {
	instructions := gjson.GetBytes(lastRequest, "instructions")
	if instructions.Exists() {
		normalized, _ = sjson.SetRawBytes(normalized, "instructions", []byte(instructions.Raw))
	}
}
```
Errors from sjson.SetBytes (line 127) and sjson.SetRawBytes (line 133) are ignored when setting the model and instructions from the previous request. If these operations fail, the normalized request might be missing a required model field, leading to a failure when the request is sent to the upstream. These errors should be handled to ensure request integrity.
```go
for _, payload := range websocketJSONPayloadsFromChunk(chunk) {
	if gjson.GetBytes(payload, "type").String() != wsEventTypeCompleted {
		continue
	}
	h.rememberCompletedResponse(requestJSON, payload)
}
```
This function rememberCompletedResponseFromChunk uses websocketJSONPayloadsFromChunk and wsEventTypeCompleted which are defined in openai_responses_websocket.go. This creates a tight coupling between the turn state logic and websocket-specific implementation details. Furthermore, the names are misleading as this logic is also used for HTTP SSE streaming.
To improve modularity and clarity, consider moving these shared components to a more neutral location. For example, websocketJSONPayloadsFromChunk could be renamed to parseSSEPayloads and moved to this file (openai_responses_turn_state.go) or a new sse.go file within the package. Similarly, wsEventTypeCompleted could be renamed to something like SSEEventResponseCompleted.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: caac256b33
```go
previousResponseID := strings.TrimSpace(gjson.GetBytes(rawJSON, "previous_response_id").String())
if previousResponseID == "" {
	return rawJSON, nil
}
lastRequest, lastResponseOutput, ok := h.turnState.load(previousResponseID)
```
Namespace continuation cache by auth context
Continuation lookup is keyed only by previous_response_id, so the handler applies cached turn state without checking which authenticated client created it. In a multi-tenant deployment, if a response ID is exposed across users (for example via logs or shared tooling), a different user can attach that ID and have another tenant’s prior input/output merged into their request, leaking prompt context and corrupting conversation state. Scope cache keys by auth/client identity before loading and merging.
```go
c.entries.Store(responseID, responsesTurnStateEntry{
	request: bytes.Clone(requestJSON),
	output:  bytes.Clone(outputJSON),
	expire:  time.Now().Add(responsesTurnStateTTL),
})
```
Add proactive eviction for turn-state cache
The cache stores every completed response with a TTL, but expired entries are only removed when that exact key is loaded later. Since many response IDs are never referenced again, stale entries remain in sync.Map indefinitely and memory usage will grow over time under normal traffic. Add proactive cleanup (background sweep or size-bounded eviction) so one-shot turns do not accumulate permanently.
luispater left a comment:
Summary:
This PR improves /v1/responses continuation for OpenAI-compatible upstreams by caching completed turns and reconstructing continuation input when previous_response_id is present. The core behavior and targeted regression tests are solid.
Key findings:
- Blocking: The new turn-state cache is unbounded in practice. Entries expire logically, but cleanup only happens on exact-key reads. Since response IDs are generally not read again after expiration, memory can grow without bound over time.
- Non-blocking: Continuation normalization currently requires `input` to be an array and may reject valid string shorthand input in cache-hit flows.
- Non-blocking: Missing cache-eviction/boundedness tests.
Test plan:
- Reviewed PR metadata, file-level diff, and full patch via `gh`/script context collection.
- Reviewed CI checks (intentionally ignored `translator-path-guard` per instruction).
- Did not run local tests or check out the PR branch in this review pass.
Requested changes:
Please add a deterministic cache cleanup/bounding strategy (for example periodic sweep, size cap + eviction policy, or both) and corresponding tests before approval.
Summary
- Preserve `/v1/responses` continuation state for openai-compatible upstreams that do not support `previous_response_id`
- Normalize string `input` into a user message before merging continuation input

Testing
- `go test ./sdk/api/handlers/openai -count=1`
- Exercised `/v1/responses` continuation against a live `gpt-5.4` openai-compatible upstream for both non-streaming and streaming flows