
fix: fail over codex accounts after retryable 2xx outcomes #223

Merged
mxyhi merged 1 commit into main from fix/codex-account-outcome-failover
Apr 16, 2026

mxyhi commented Apr 16, 2026

Summary

This change fixes unpinned Codex account failover so account switching is decided from the final proxy AttemptOutcome, not only the first raw upstream HTTP status.

Previously, the auto-selected Codex account path switched to the next account when the initial send failed or when the raw upstream response was non-2xx, but it stopped account failover as soon as the upstream returned a raw 2xx response. As a result, retryable "fake success" cases that are only discovered later in proxy post-processing were returned as a final 502 instead of continuing to the next available Codex account. One such case is an empty chat completion produced by the Codex-to-chat transformation.

This PR makes the unpinned Codex account pool behave like the intended account-pool policy:

  • continue to the next account for Success(non-2xx)
  • continue to the next account for Retryable
  • stop for Fatal and SkippedAuth
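The policy above can be sketched as a single decision function. This is a minimal illustration, not the code from this PR: the variant names come from the description, but the payload shape (a bare `status: u16` on `Success`) and the function name `should_try_next_account` are assumptions made for the example.

```rust
/// Hypothetical stand-in for the proxy's final per-attempt outcome.
/// Variant names follow the PR description; the payloads are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AttemptOutcome {
    /// A response that will be returned to the client, with its HTTP status.
    Success { status: u16 },
    /// The attempt failed in a way that another account might fix.
    Retryable,
    /// The attempt failed in a way that no other account can fix.
    Fatal,
    /// The attempt was skipped for authentication reasons.
    SkippedAuth,
}

/// Returns true when the unpinned Codex account pool should move on
/// to the next account (hypothetical helper name).
fn should_try_next_account(outcome: &AttemptOutcome) -> bool {
    match outcome {
        // Success(non-2xx) continues; a 2xx success ends the loop.
        AttemptOutcome::Success { status } => !(200..300).contains(status),
        AttemptOutcome::Retryable => true,
        AttemptOutcome::Fatal | AttemptOutcome::SkippedAuth => false,
    }
}
```

Keeping the rule in one `match` makes the compiler flag any new outcome variant that has no failover decision yet.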

Root Cause

The old Codex failover path in retry_with_next_codex_account(...) decided whether to advance to the next account before finalize_attempt(...) ran. As a result, it only had access to the raw reqwest::Response and could not see retryable outcomes that are produced later by proxy response handling.
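To make the blind spot concrete, here is a rough sketch of a decision that only looks at the raw response. `RawResponse` is a hypothetical stand-in for `reqwest::Response`, and `decide_from_raw_response` is an invented name; neither is taken from the actual code.

```rust
/// Hypothetical stand-in for the raw upstream response. The real code
/// works with reqwest::Response; this keeps the sketch dependency-free.
struct RawResponse {
    status: u16,
    /// Chat body that post-processing would produce (assumed field).
    chat_body: String,
}

/// Pre-fix style decision: advance only on send errors or raw non-2xx.
/// It never inspects what post-processing would make of the body.
fn decide_from_raw_response(raw: &Result<RawResponse, String>) -> bool {
    match raw {
        Err(_) => true,
        Ok(response) => !(200..300).contains(&response.status),
    }
}
```

With this shape, a raw 200 whose transformed body is empty is treated like any other success, so the account loop ends before the retryable outcome is known.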

Fix

The failover loop now finalizes each attempted Codex account into a real AttemptOutcome first, then decides whether to continue account failover from that final outcome. This keeps upstream ordering unchanged and only changes the semantics inside the unpinned Codex account pool.
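A simplified sketch of the finalize-then-decide ordering follows. Everything here is illustrative: `RawResponse`, `finalize_attempt_sketch`, `should_try_next_account`, and `run_account_pool` are hypothetical names, the real `finalize_attempt(...)` signature is not shown in this PR, and the "empty body means retryable" rule stands in for whatever the proxy's post-processing actually checks.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum AttemptOutcome {
    Success { status: u16 },
    Retryable,
    Fatal,
    SkippedAuth,
}

/// Hypothetical raw upstream response (stand-in for reqwest::Response).
struct RawResponse {
    status: u16,
    transformed_body: String,
}

/// Stand-in for finalization: turns a raw result into a final outcome.
/// Assumption: send errors and empty 2xx bodies are retryable.
fn finalize_attempt_sketch(raw: Result<RawResponse, String>) -> AttemptOutcome {
    match raw {
        Err(_) => AttemptOutcome::Retryable,
        Ok(r) if (200..300).contains(&r.status) && r.transformed_body.is_empty() => {
            AttemptOutcome::Retryable
        }
        Ok(r) => AttemptOutcome::Success { status: r.status },
    }
}

fn should_try_next_account(outcome: &AttemptOutcome) -> bool {
    match outcome {
        AttemptOutcome::Success { status } => !(200..300).contains(status),
        AttemptOutcome::Retryable => true,
        AttemptOutcome::Fatal | AttemptOutcome::SkippedAuth => false,
    }
}

/// Tries accounts in order. Each attempt is finalized first, and only the
/// final outcome decides whether to advance. Returns the last account tried
/// and its outcome, or None when the pool is empty.
fn run_account_pool<F>(accounts: &[&str], mut send: F) -> Option<(String, AttemptOutcome)>
where
    F: FnMut(&str) -> Result<RawResponse, String>,
{
    let mut last = None;
    for &account in accounts {
        let outcome = finalize_attempt_sketch(send(account));
        let advance = should_try_next_account(&outcome);
        last = Some((account.to_string(), outcome));
        if !advance {
            break;
        }
    }
    last
}
```

Because the decision reads only the finalized outcome, any retryable condition that post-processing learns to detect later takes part in account failover without changes to the loop.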

A regression test was added for the missing case: the first Codex account returns HTTP 200 but transforms into an empty chat completion, and the proxy must fail over to the next Codex account instead of returning 502 immediately.

Validation

  • cargo test -p token_proxy_core chat_request_failovers_to_next_codex_account_after_empty_2xx_response -- --nocapture
  • cargo test -p token_proxy_core codex_account -- --nocapture
  • cargo test -p token_proxy_core messages_request_failovers_to_next_kiro_account_before_next_upstream -- --nocapture
  • cargo fmt --all --check

mxyhi marked this pull request as ready for review April 16, 2026 11:16
mxyhi merged commit 958e8f0 into main Apr 16, 2026
1 check passed
mxyhi deleted the fix/codex-account-outcome-failover branch April 16, 2026 11:16