
fix(websocket): skip stale state merge after client-side compact #2266

Open

DragonFSKY wants to merge 3 commits into router-for-me:dev from DragonFSKY:fix/ws-compact-tool-output-mismatch

Conversation

@DragonFSKY
Contributor

Summary

  • After a Codex CLI compact, the client sends the full conversation transcript as input (including compaction/compaction_summary items). normalizeResponseSubsequentRequest() unconditionally merged this with the stale lastRequest/lastResponseOutput, breaking function_call/function_call_output pairings and causing 400 errors ("No tool output found for function call").
  • Added inputContainsFullTranscript() heuristic to detect full transcripts (compaction items or assistant messages), bypassing the stale merge when present.
  • Normal incremental flow (user messages / function_call_output only) is unaffected.

Test plan

  • Unit tests: 5 new tests covering detection heuristic + merge bypass + incremental merge preservation
  • Production verification: deployed to staging, triggered a compact via Codex CLI, confirmed the "full transcript detected, skipping stale merge (input items=127)" log line appeared, and saw no subsequent 400 errors

Fixes #2207

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the websocket response normalization logic that occurred after client-side conversation compaction. By intelligently detecting when a full conversation transcript is provided, the system now correctly avoids merging it with outdated state, preventing data corruption and ensuring the stability of function call interactions.

Highlights

  • Stale State Merge Fix: Addressed an issue where normalizeResponseSubsequentRequest() incorrectly merged stale lastRequest/lastResponseOutput with full conversation transcripts after client-side compaction, causing function_call/function_call_output pairing breakage and 400 errors.
  • Full Transcript Detection: Introduced an inputContainsFullTranscript() heuristic to identify full conversation transcripts (containing compaction items or assistant messages), allowing the system to bypass the problematic stale merge in these cases.
  • Incremental Flow Preservation: Ensured that the normal incremental conversation flow, involving only user messages or function call outputs, remains unaffected by these changes, preserving existing functionality.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses an issue where stale state was being merged with full conversation transcripts from the client after a compaction, leading to errors. The introduction of the inputContainsFullTranscript heuristic to detect and bypass this merge for full transcripts is a solid approach. The new unit tests are comprehensive and cover the detection logic, the merge bypass, and ensure that the normal incremental flow remains unaffected.

I've added a few suggestions to improve code structure and test maintainability, primarily by refactoring to use a switch statement and adopting a table-driven approach for some of the new tests.

Comment on lines +623 to +631
for _, item := range input.Array() {
	t := item.Get("type").String()
	if t == "message" && item.Get("role").String() == "assistant" {
		return true
	}
	if t == "compaction" || t == "compaction_summary" {
		return true
	}
}

Severity: medium

For better readability and structure, you can refactor this loop to use a switch statement on the item's type. This makes the logic clearer and is more idiomatic in Go for handling multiple cases based on a string value.

Suggested change
for _, item := range input.Array() {
	t := item.Get("type").String()
	if t == "message" && item.Get("role").String() == "assistant" {
		return true
	}
	if t == "compaction" || t == "compaction_summary" {
		return true
	}
}

for _, item := range input.Array() {
	switch item.Get("type").String() {
	case "message":
		if item.Get("role").String() == "assistant" {
			return true
		}
	case "compaction", "compaction_summary":
		return true
	}
}

Comment on lines +677 to +682
for _, typ := range []string{"compaction", "compaction_summary"} {
	input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + typ + `","encrypted_content":"summary"}]`)
	if !inputContainsFullTranscript(input) {
		t.Fatalf("expected full transcript for type=%s", typ)
	}
}

Severity: medium

To improve test clarity and maintainability, consider converting this test to a table-driven test using t.Run. This will create a separate sub-test for each compaction type, making it easier to identify which case fails if the test breaks in the future.

testCases := []struct {
		name string
		typ  string
	}{
		{name: "compaction", typ: "compaction"},
		{name: "compaction_summary", typ: "compaction_summary"},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + tc.typ + `","encrypted_content":"summary"}]`)
			if !inputContainsFullTranscript(input) {
				t.Errorf("expected full transcript for type=%s", tc.typ)
			}
		})
	}

Comment on lines +687 to +695
for _, raw := range []string{
	`[{"type":"function_call_output","call_id":"call-1","output":"result"}]`,
	`[{"type":"message","role":"user","content":"next question"}]`,
	`[]`,
} {
	if inputContainsFullTranscript(gjson.Parse(raw)) {
		t.Fatalf("incremental input must not be detected as full transcript: %s", raw)
	}
}
The reason will be displayed to describe this comment to others. Learn more.

Severity: medium

This test can be improved by converting it into a table-driven test with t.Run. This approach makes the test more organized and provides clearer output on failures, as each input case will run as a distinct sub-test.

testCases := []struct {
		name  string
		input string
	}{
		{name: "function call output", input: `[{"type":"function_call_output","call_id":"call-1","output":"result"}]`},
		{name: "user message", input: `[{"type":"message","role":"user","content":"next question"}]`},
		{name: "empty input", input: `[]`},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			if inputContainsFullTranscript(gjson.Parse(tc.input)) {
				t.Errorf("incremental input must not be detected as full transcript: %s", tc.input)
			}
		})
	}


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26bed8cc1b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +311 to +313
if inputContainsFullTranscript(nextInput) {
	log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array()))
	mergedInput = nextInput.Raw

P2: Keep appending history when new input includes assistant items

This branch now treats any payload with an assistant message as a "full transcript" and skips the merge entirely. That regresses valid response.append / follow-up response.create requests that append an assistant exemplar or repaired assistant turn: the upstream request will contain only the new items, and all prior context from lastRequest/lastResponseOutput is dropped. Assistant messages are still accepted input elsewhere in this repo, so this is not just a compaction-only shape.


Comment on lines +624 to +626
t := item.Get("type").String()
if t == "message" && item.Get("role").String() == "assistant" {
	return true

P2: Recognize role-only assistant messages in the full-transcript check

inputContainsFullTranscript() only matches assistant items when type == "message", but the rest of the Responses pipeline accepts shorthand message objects with just role/content and no explicit type. In that case a replayed full transcript like [{"role":"assistant",...}] will fall through to the stale-merge path, reintroducing the duplicated history / broken tool-call pairing that this fix is trying to avoid.



@luispater luispater left a comment


Thanks for the fix and the focused tests around compact replay behavior.

I’m requesting changes for one correctness risk in the new detection heuristic:

  • Blocking: The new inputContainsFullTranscript() logic marks any assistant-role message as “full transcript,” which causes normalizeResponseSubsequentRequest() to skip merging prior state. That can drop required historical context in incremental (non-compact) requests that legitimately contain assistant items, resulting in behavior changes outside the compact scenario this PR targets.

Suggested adjustment

  • Narrow full-transcript detection to stronger replay signals (for example, compaction / compaction_summary), or gate assistant-message detection with additional conditions that distinguish replay from incremental append.

Tests

  • Please add one regression test where incremental input includes an assistant message but should still preserve prior merged history. Current tests cover compact replay and user/function_call_output incremental flows, but not this ambiguity.

Once this heuristic is tightened, I’m happy to re-review.

@DragonFSKY
Contributor Author

Thanks, that was a valid regression risk.

I tightened the full-transcript detection so we now only treat compact replay markers (compaction / compaction_summary) as a signal to skip the stale merge. Assistant messages alone no longer bypass the merge path.

I also added a regression test covering incremental assistant input to make sure prior history is still merged in that case, while keeping the compact replay test in place.

Targeted verification:
go test ./sdk/api/handlers/openai -run 'Test(InputContainsFullTranscript|NormalizeSubsequentRequest)'


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abbff2f524


Comment on lines +311 to +313
if inputContainsFullTranscript(nextInput) {
	log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array()))
	mergedInput = nextInput.Raw

P1: Do not drop merged history for backends that ignore compaction

When /v1/responses/ws is used with a non-Codex backend after a client-side compact, this branch now sends nextInput as-is and skips the lastRequest/lastResponseOutput merge. I checked the downstream request translators in internal/translator/gemini/openai/responses/gemini_openai-responses_request.go:118 and internal/translator/claude/openai/responses/claude_openai-responses_request.go:180: their switch handles message/function_call/function_call_output only, so compaction and compaction_summary items are silently discarded. In that scenario the upstream request loses the prior conversation/tool state entirely and only sees the fresh user items, whereas the pre-change merge path still preserved the old transcript for those providers.



@luispater luispater left a comment


The follow-up correctly narrows replay detection to compaction markers, keeps incremental assistant-message flows on the merge path, and adds a Codex-only bypass for compact replays. The new tests cover both the bypass and the unsupported-downstream fallback, so I do not see a remaining blocking correctness issue in sdk/api/handlers/openai/openai_responses_websocket.go or sdk/api/handlers/openai/openai_responses_websocket_test.go.

GitHub checks are green, so this looks ready.

This is an automated Codex review result and still requires manual verification by a human reviewer.

@luispater luispater added the codex label Apr 5, 2026
@luispater
Collaborator

This PR is approved but currently has merge conflicts. Please change the base branch to dev, resolve the conflicts, and then wait for merge.

@KajiMaCN

KajiMaCN commented Apr 7, 2026

This PR is approved but currently has merge conflicts. Please change the base branch to dev, resolve the conflicts, and then wait for merge.

@DragonFSKY @luispater

I pulled this branch over a week ago, and it had been working fine up until today. But now the 400 error has suddenly reappeared:

{
  "type": "error",
  "status": 400,
  "error": {
    "message": "No tool call found for custom tool call output with call_id call_xxx.",
    "type": "invalid_request_error",
    "param": "input",
    "code": null
  }
}

What seems a bit different from before is that if I ask Codex to modify the text, the content in the document does get changed, but a 400 error appears in the Codex chat window, and GPT’s response message does not show up.

Also, this now happens in every conversation. Only during the first 2–3 seconds do normal messages appear. After that, it always turns into a 400 error. Codex’s reply also disappears, leaving only the error code shown above.

After I switched to the main branch and rebuilt the container locally, the 400 error disappeared.

Could you help check and confirm what might be causing this?

After a Codex CLI compact, the client sends a full conversation
transcript (with compaction items or assistant messages) as input.
Previously, normalizeResponseSubsequentRequest() unconditionally
merged this with stale lastRequest/lastResponseOutput, breaking
function_call/function_call_output pairings and causing 400 errors
("No tool output found for function call").

Add inputContainsFullTranscript() heuristic that detects compaction
items (type=compaction/compaction_summary) or assistant messages in
the input array, and bypasses the merge when a full transcript is
present.

Fixes router-for-me#2207
@DragonFSKY DragonFSKY force-pushed the fix/ws-compact-tool-output-mismatch branch from f91adbd to 4ca00f7 Compare April 7, 2026 06:25
@DragonFSKY DragonFSKY changed the base branch from main to dev April 7, 2026 06:25
@chatgpt-codex-connector

💡 Codex Review

if err := worktree.Checkout(&git.CheckoutOptions{Branch: branchRefName}); err != nil {
	return fmt.Errorf("checkout branch %s: %w", branchRefName.String(), err)

P2: Fall back when default-branch checkout hits unstaged changes

When branch is unset, EnsureRepository() now calls checkoutRemoteDefaultBranch() before pull, and this path returns an error immediately if worktree.Checkout(...) fails. In repositories with local unstaged edits (for example, after an interrupted write) and a branch switch to the remote default, this makes startup fail even though the subsequent pull logic explicitly treats git.ErrUnstagedChanges as non-fatal (“local changes win”). This regression can block token-store initialization for a recoverable state; handling ErrUnstagedChanges here the same way as pull would preserve prior behavior.


@DragonFSKY
Contributor Author

@KajiMaCN Hi, thanks for the detailed report.

The 400 error you're seeing (No tool call found for custom tool call output with call_id) is caused by tool-call / tool-call-output pairing breakage in the websocket transcript — exactly the class of bug this PR targets.

The reason it re-appeared on this branch is that the branch was based on the old main and was missing several complementary fixes that have since landed on dev:

  • c1d75998 — handle transcript replacement after websocket compaction
  • d1c07a09 — add websocket tool call repair with caching (auto-patches orphaned tool-call outputs)
  • b6c6379b — fix repair logic for custom_tool_call specifically

These fixes work together with this PR's compaction-bypass logic to cover the full range of edge cases.

I've just rebased the branch onto the latest dev, so all of the above fixes are now included. Please re-pull and rebuild — the issue should be resolved.

If it still reproduces after the rebase, please share the request log (with LOG_LEVEL=debug) so I can trace the exact merge path.

@KajiMaCN

KajiMaCN commented Apr 7, 2026


Understood. Thank you very much for your patient explanation and for fixing the bug.



Development

Successfully merging this pull request may close these issues.

bug(codex): invalid_request_error after auto compact

3 participants