
fix(gateway): add content-based session hash fallback for non-Codex clients #1428

Open
YanzheL wants to merge 4 commits into Wei-Shaw:main from YanzheL:fix/openai-gateway-content-session-hash-fallback

Conversation

Contributor

YanzheL commented Apr 1, 2026

Summary

  • Add content-based fallback (tier 4) to OpenAIGatewayService.GenerateSessionHash for sticky session routing when no explicit session signals are provided
  • New helper deriveOpenAIContentSessionSeed extracts a stable seed from request body content (model + tools + system + first user message)
  • 23 new tests covering both Chat Completions and Responses API formats, JSON canonicalization, and priority ordering

Closes #1421

Problem

Non-Codex clients using the Chat Completions API without sending session_id, conversation_id, or prompt_cache_key receive an empty string from GenerateSessionHash. This causes SelectAccountWithScheduler to fall back to random load-balanced routing, making prompt caching impossible (cache_read_input_tokens is always 0).

Solution

Added a 4th priority tier to GenerateSessionHash that derives a stable content-based seed from the request body when all explicit signals are absent:

Priority:
 1. Header: session_id          (Codex CLI)
 2. Header: conversation_id     (Codex CLI)
 3. Body:   prompt_cache_key    (Codex CLI)
 4. Body:   content-based seed  ← NEW (non-Codex clients)
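The priority ordering can be sketched as follows. This is a minimal stand-in, not the PR's actual code: the `request` struct and `sessionHash` function are hypothetical, and only the field names mirror the signals listed above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// request bundles the four session signals in priority order.
// The struct itself is illustrative; field names mirror the PR.
type request struct {
	SessionID      string // tier 1 — header: session_id
	ConversationID string // tier 2 — header: conversation_id
	PromptCacheKey string // tier 3 — body: prompt_cache_key
	ContentSeed    string // tier 4 — body: content-based seed (NEW)
}

func sessionHash(r request) string {
	seed := ""
	switch {
	case r.SessionID != "":
		seed = r.SessionID
	case r.ConversationID != "":
		seed = r.ConversationID
	case r.PromptCacheKey != "":
		seed = r.PromptCacheKey
	case r.ContentSeed != "":
		seed = r.ContentSeed // NEW fallback for non-Codex clients
	default:
		return "" // no signal at all: caller falls back to load balancing
	}
	sum := sha256.Sum256([]byte(seed))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Explicit signals win over the content seed.
	fmt.Println(sessionHash(request{SessionID: "s1", ContentSeed: "c"}) ==
		sessionHash(request{SessionID: "s1"}))
	// A content seed alone still yields a stable, non-empty hash.
	fmt.Println(sessionHash(request{ContentSeed: "c"}) != "")
}
```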

The seed includes only fields that remain constant across conversation turns:

  • model — always the same per conversation
  • tools / functions — tool definitions don't change mid-conversation
  • instructions — Responses API system prompt
  • system / developer messages — stable system prompts
  • First user message — the conversation opener

JSON fragments (tools, functions, structured content) are canonicalized via the existing normalizeCompatSeedJSON to handle whitespace/key-order differences.
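The canonicalization step works because Go's `json.Marshal` emits map keys in sorted order. The sketch below shows the round-trip the PR describes; `canonicalizeJSON` is a hypothetical stand-in for the existing `normalizeCompatSeedJSON` helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalizeJSON round-trips a fragment through Unmarshal/Marshal so that
// whitespace and key order no longer affect the result. Go's json.Marshal
// writes map keys in sorted order, which is what makes this canonical.
func canonicalizeJSON(raw string) string {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return raw // not valid JSON: use the fragment verbatim
	}
	out, err := json.Marshal(v)
	if err != nil {
		return raw
	}
	return string(out)
}

func main() {
	a := canonicalizeJSON(`{"type": "function",  "name": "get_weather"}`)
	b := canonicalizeJSON(`{"name":"get_weather","type":"function"}`)
	fmt.Println(a == b) // same canonical form despite key order and whitespace
}
```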

Files Changed

| File | Change |
| --- | --- |
| `openai_content_session_seed.go` | NEW — `deriveOpenAIContentSessionSeed` helper (~105 lines) |
| `openai_gateway_service.go` | +3 lines — add tier 4 fallback call + comment |
| `openai_content_session_seed_test.go` | NEW — 20 unit tests for the seed helper |
| `openai_gateway_service_test.go` | +54 lines — 3 integration tests for content fallback |

Design Decisions

  1. No client identity mixing — Unlike GatewayService.GenerateSessionHash which mixes in ClientIP/UserAgent/APIKeyID, the OpenAI content fallback is purely content-based. Two different users with identical prompts route to the same account, maximizing prompt cache hits.

  2. JSON canonicalization — Reuses existing normalizeCompatSeedJSON (json.Unmarshal → json.Marshal) for tools, functions, and structured content to ensure key-order and whitespace differences don't produce different seeds.

  3. Responses API input support — Handles string input, role-based array input, and input_text typed items.

  4. Prefix compat_cs_ — Prevents collisions between content-derived seeds and explicit session IDs or compat_cc_ prompt cache keys.

Test Evidence

  • ✅ Build: go build ./... — exit 0
  • ✅ New tests: 23/23 pass
  • ✅ Existing GenerateSessionHash tests: 4/4 pass (zero regressions)
  • ✅ Full service package: all tests pass (35.5s)

Known Trade-offs

  • Hot account risk: Highly popular identical prompts could pin to one account. This is an intentional trade-off for cache locality — monitor after deployment.
  • WS behavior change: GenerateSessionHashWithFallback now prefers content seed over the generic fallbackSeed for first turns. The explicit seed still takes effect when content extraction yields nothing.

YanzheL added 4 commits April 2, 2026 00:11
…lients

When no explicit session signals (session_id, conversation_id, prompt_cache_key)
are provided, derive a stable session seed from the request body content
(model + tools + system prompt + first user message) to enable sticky routing
and prompt caching for non-Codex clients using the Chat Completions API.

This mirrors the content-based fallback already present in GatewayService.
GenerateSessionHash, adapted for the OpenAI gateway's request formats (both
Chat Completions messages and Responses API input).

JSON fragments are canonicalized via normalizeCompatSeedJSON to ensure
semantically identical requests produce the same seed regardless of
whitespace or key ordering.

Closes Wei-Shaw#1421
- 20 unit tests for deriveOpenAIContentSessionSeed covering:
  - Empty/nil inputs, model-only, stable across turns
  - Different model/system/first-user produce different seeds
  - Tools, functions, developer role, structured content
  - Responses API: input string, input array, instructions, input_text typed items
  - JSON canonicalization (whitespace/key-order insensitive)
  - Prefix presence, empty tools ignored, messages preferred over input
- 3 integration tests for GenerateSessionHash content fallback:
  - Content fallback produces stable hash
  - Explicit signals override content fallback
  - Empty body still returns empty hash


Development

Successfully merging this pull request may close these issues.

[fix] `OpenAIGatewayService.GenerateSessionHash` lacks content-based fallback, causing cache misses for non-Codex clients like Claude Code
