Skip to content

refactor(litellm): unified OAuth handlers + native ChatGPT custom handler#187

Merged
PurpleCHOIms merged 3 commits into
mainfrom
fix/litellm-auth-refactor
May 9, 2026
Merged

refactor(litellm): unified OAuth handlers + native ChatGPT custom handler#187
PurpleCHOIms merged 3 commits into
mainfrom
fix/litellm-auth-refactor

Conversation

@PurpleCHOIms
Copy link
Copy Markdown
Member

Summary

Reworks how Decepticon authenticates with subscription LLM providers via the LiteLLM proxy. Six in-process handlers now share a common token store, ChatGPT moves off the LiteLLM-native chatgpt provider onto a Decepticon custom handler that reads the Codex CLI credential store directly, and the agent-side factory hardens its credentials inventory so dead OAuth flags and mis-pasted API keys no longer poison the fallback chain.

Replaces in-flight #149.

Architecture

Shared OAuth token store (config/oauth_token_store.py)

  • read_json_file / write_json_atomic (temp+rename, 0o600)
  • FileBackedCache keyed on (mtime, size) so a host-side credential rotation is picked up without container restart
  • decode_jwt_payload / is_jwt_expired (60s skew) and is_timestamp_expired (5min default buffer)
  • oauth_refresh_request raises actionable AuthenticationError with the upstream body
  • with_retry_on_401 wraps an outbound call so an invalidated token triggers a single in-process force_refresh + replay
  • 32 unit tests cover atomic write, mtime cache, JWT decode, expiry, refresh request, retry semantics

5 existing handlers refactored

claude_code, copilot, gemini, grok, perplexity all import FileBackedCache, oauth_refresh_request, write_json_atomic, with_retry_on_401. Per-handler atomic-write / JWT-decode / mtime helpers removed. Outbound HTTP wrapped in with_retry_on_401.

Native ChatGPT OAuth via custom handler

  • New config/codex_chatgpt_handler.py reads ~/.codex/auth.json directly (CODEX_AUTH_PATH / CODEX_HOME honored) — no parallel ~/.config/litellm/chatgpt store, no manual codex login re-import
  • New config/auth_handler.py dispatches auth/<slug> by prefix (claude- → claude_code, gpt- → codex_chatgpt)
  • LiteLLM main.py:2561 short-circuits gpt-* slugs to native OpenAI regardless of custom_llm_provider — workaround via codex-oauth/oauth-gpt-* sentinel; the handler strips oauth- before sending upstream
  • Aggregates response.output_text.delta SSE deltas when response.completed.output is empty (Codex backend behavior)
  • Defaults instructions to a Codex CLI prompt when no system message is present

Removed dead code:

  • _patch_chatgpt_responses_text_aggregation in litellm_startup.py (LiteLLM-native chatgpt provider no longer on the request path)
  • _model_uses_chatgpt_responses_api in factory.py (custom handler does Chat-Completions → Responses-API conversion internally)
  • inline _select_auth_handler / _AuthDispatcher in litellm_startup.py (moved to auth_handler.py)

Container plumbing

  • containers/litellm.Dockerfile copies the new modules in dependency order
  • docker-compose.yml: LITELLM_CHATGPT_TOKEN_DIR mount → ${CODEX_AUTH_VOLUME:-/dev/null}:/root/.codex/auth.json (rw); drop :ro from Claude credentials; export CODEX_AUTH_PATH
  • Both Claude and Codex paths are now mounted read-only into the langgraph service so the LLM factory can verify file presence

Launcher Codex auth detection

  • subscriptionMethod.AbsolutePath field for handlers whose credential lives at a fixed path (rather than ~/.config/<dir>/<file>)
  • start.go probes ~/.codex/auth.json and exports CODEX_AUTH_VOLUME, mirroring CLAUDE_CREDENTIALS_VOLUME
  • env.example rewrites the ChatGPT OAuth section to point at ~/.codex/auth.json + codex login

Credential validation hardening

  • _is_real_key(value, method=None): minimum 24 chars, rejects empty + launcher template (your-…-key-here) + obvious placeholder tokens (placeholder, not-used, dummy, fake, example); when method is given, enforces vendor-prefix hints (sk-ant-, sk-, AIza, xai-, gsk_, sk-or-, nvapi-, ghp_/github_pat_/gho_/ghs_) — catches mis-pasted keys before they propagate
  • _oauth_credentials_present(method): reads the host-side OAuth credential file and validates non-empty JSON. /dev/null fallbacks read empty and fail closed. Without this, a stale DECEPTICON_AUTH_*=true after codex logout placed OAuth in every fallback chain and generated one 401 per request
  • _resolve_credentials requires both the truthy boolean AND the file presence for OAuth methods
  • has_subscription_routes() lets litellm_startup regenerate the dynamic config when only DECEPTICON_AUTH_* is set (no DECEPTICON_MODEL* override)
  • merge_dynamic_models skips the API-key validator for slugs already added by the subscription path
  • validate_model_name rejects all subscription provider prefixes (auth, gemini-sub, copilot, grok-sub, pplx-sub) on the API-key registration path

Model registry

Adds auth/gpt-5.4-mini as the LOW tier for openai_oauth (per codex-rs/models-manager/models.json, May 2026) plus the matching fallback chain entry auth/gpt-5.4auth/gpt-5.4-mini.

Verified live

  • LiteLLM proxy direct: auth/claude-sonnet-4-6 / auth/gpt-5.5 / auth/gpt-5.4 / auth/gpt-5.4-mini all return correct content
  • vulnresearch agent end-to-end through Claude OAuth and ChatGPT OAuth; chain composition mirrors the credentials inventory (only detected methods land in the chain)
  • Tested credential file deletion / stale flag scenarios — OAuth method correctly drops out of the chain

Test plan

  • uv run ruff check . clean
  • uv run ruff format --check . clean
  • uv run basedpyright --level error 0 errors
  • uv run pytest -n auto -q -m "not slow" 793 passed (32 new oauth_token_store + 11 new factory + 5 new dynamic_config)
  • cd clients/launcher && go vet ./... && go test ./... all packages OK
  • Live: ChatGPT OAuth (auth/gpt-5.5, auth/gpt-5.4, auth/gpt-5.4-mini) end-to-end via vulnresearch agent
  • Live: Claude Code OAuth (auth/claude-sonnet-4-6) end-to-end via vulnresearch agent
  • Live: credential chain shows only detected methods (DECEPTICON_AUTH_CLAUDE_CODE=false → no Claude in chain even with ~/.claude/.credentials.json mounted)

…dler + credential validation

Reworks how Decepticon authenticates with subscription LLM providers via
the LiteLLM proxy. Six in-process handlers now share a common token
store, ChatGPT moves off the LiteLLM-native chatgpt provider onto a
Decepticon custom handler that reads the Codex CLI credential store
directly, and the agent-side factory hardens its credentials inventory
so dead OAuth flags and mis-pasted API keys no longer poison the
fallback chain.

== Shared OAuth token store (config/oauth_token_store.py) ==
- read_json_file / write_json_atomic (temp+rename, 0o600)
- FileBackedCache keyed on (mtime, size) so a host-side credential
  rotation is picked up by the running container without a restart
- decode_jwt_payload / is_jwt_expired (60s skew) and is_timestamp_expired
  (5min default buffer) for the two expiry styles
- oauth_refresh_request raises actionable AuthenticationError with the
  upstream body
- with_retry_on_401 wraps an outbound call so an invalidated token
  triggers a single in-process force_refresh + replay
- 32 unit tests cover atomic write, mtime cache, JWT decode, expiry,
  refresh request, retry semantics

== 5 existing handlers refactored to use the shared store ==
- claude_code_handler / copilot_handler / gemini_handler / grok_handler
  / perplexity_handler all import FileBackedCache, oauth_refresh_request,
  write_json_atomic, with_retry_on_401 — drops per-handler atomic-write
  / JWT-decode / mtime helpers
- HTTP completion paths now wrap upstream calls in with_retry_on_401
  with a force_refresh closure
- claude_code_handler keeps the dual-path probe (current credentials.json
  + legacy ~/.config/anthropic/q/tokens.json) and the ANTHROPIC_OAUTH_TOKEN
  env override (synthetic expiresAt=0 so the refresh path never fires)

== Native ChatGPT OAuth via custom handler ==
- New config/codex_chatgpt_handler.py reads ~/.codex/auth.json directly
  (CODEX_AUTH_PATH / CODEX_HOME overrides honored) — no parallel
  ~/.config/litellm/chatgpt store, no manual codex-login re-import
- New config/auth_handler.py dispatches auth/<slug> to the right
  handler by prefix (claude- → claude_code, gpt- → codex_chatgpt) so
  litellm_startup.py no longer carries dispatch glue inline
- Removed dead _patch_chatgpt_responses_text_aggregation in
  litellm_startup.py (LiteLLM-native chatgpt provider is no longer
  on the request path)
- Removed dead _model_uses_chatgpt_responses_api in factory.py — the
  custom handler does Chat-Completions → Responses-API conversion
  internally so LangChain's ChatOpenAI no longer needs use_responses_api
- LiteLLM main.py:2561 short-circuits gpt-* slugs to the native OpenAI
  provider regardless of custom_llm_provider — work around with the
  codex-oauth/oauth-gpt-* sentinel slug; the handler strips the oauth-
  prefix before sending the model name upstream
- Aggregates response.output_text.delta SSE deltas when the
  response.completed payload's output array is empty (common Codex
  backend behavior)
- Defaults instructions to a Codex CLI prompt when no system message is
  present (chatgpt.com 400s on missing instructions)

== Container plumbing ==
- containers/litellm.Dockerfile copies oauth_token_store +
  codex_chatgpt_handler + auth_handler in dependency order
- docker-compose.yml replaces LITELLM_CHATGPT_TOKEN_DIR mount with
  ${CODEX_AUTH_VOLUME:-/dev/null}:/root/.codex/auth.json (rw), drops :ro
  from the Claude credentials mount, exports CODEX_AUTH_PATH
- Both Claude and Codex credential paths are now mounted read-only into
  the langgraph service so the LLM factory can verify file presence
  before adding the OAuth method to the chain

== Launcher Codex auth detection ==
- subscriptionMethod gains an AbsolutePath field for handlers whose
  credential lives at a fixed file path (rather than a config dir);
  chatgpt entry uses ~/.codex/auth.json
- start.go probes ~/.codex/auth.json on the host and exports
  CODEX_AUTH_VOLUME, mirroring the existing CLAUDE_CREDENTIALS_VOLUME
  flow
- onboard.go option label reflects the codex login source of truth
- env.example rewrote the ChatGPT OAuth section to point at
  ~/.codex/auth.json + the codex login CLI

== Credential validation hardening ==
- _is_real_key(value, method=None): minimum 24 chars, rejects empty +
  launcher template strings (your-…-key-here) + obvious placeholder
  tokens (placeholder, not-used, dummy, fake, example), and when
  method is given enforces vendor-prefix hints (sk-ant-, sk-, AIza,
  xai-, gsk_, sk-or-, nvapi-, ghp_/github_pat_/gho_/ghs_) — catches
  mis-pasted keys (an OpenAI key in the Anthropic slot fails the
  prefix check before propagating into the chain)
- _oauth_credentials_present(method): reads the host-side OAuth
  credential file and validates it parses to a non-empty dict before
  adding the method to the chain. /dev/null fallbacks read empty and
  fail closed. Without this a stale DECEPTICON_AUTH_*=true after
  codex logout / file deletion places OAuth in every fallback chain
  and generates one 401 per request
- _resolve_credentials uses both the truthy boolean and the file
  check for OAuth methods
- has_subscription_routes() lets litellm_startup regenerate the
  dynamic config when a user only enabled DECEPTICON_AUTH_* without
  setting any DECEPTICON_MODEL* override (otherwise auth/gpt-* never
  registered and every request 400d)
- merge_dynamic_models skips the API-key validator for slugs already
  added by the subscription path so DECEPTICON_MODEL=auth/gpt-5.4-mini
  alongside DECEPTICON_AUTH_CHATGPT=true succeeds
- validate_model_name now rejects all subscription provider prefixes
  (auth, gemini-sub, copilot, grok-sub, pplx-sub) on the API-key
  registration path with a unified error pointing at the matching
  DECEPTICON_AUTH_* flag

== Model registry ==
- adds auth/gpt-5.4-mini as the LOW tier for openai_oauth (per
  codex-rs/models-manager/models.json May 2026) plus the matching
  fallback chain entry auth/gpt-5.4 → auth/gpt-5.4-mini

== Tests + docs ==
- tests/unit/llm/test_oauth_token_store.py — 32 new tests
- tests/unit/llm/test_factory.py — TestIsRealKey + TestOAuthCredentialsPresent
  classes, OAuth-only test now uses tmp_path credential fixture so it
  is deterministic regardless of host state, all key fixtures use
  realistic vendor-prefixed values
- tests/unit/llm/test_litellm_dynamic_config.py — codex-oauth route
  assertion, gpt-5.4-mini route + fallback, subscription provider
  rejection on API-key path, DECEPTICON_MODEL=auth/gpt-5.4-mini
  override coexistence
- tests/unit/llm/test_models.py — Tier.LOW now points at gpt-5.4-mini
- docs/models.md, docs/setup-guide.md — ChatGPT subscription rewritten
  to reference ~/.codex/auth.json + codex login + custom handler

== Verified live ==
- LiteLLM proxy direct: auth/claude-sonnet-4-6 / auth/gpt-5.5 /
  auth/gpt-5.4 / auth/gpt-5.4-mini all return correct content
- vulnresearch agent end-to-end through Claude OAuth and ChatGPT
  OAuth; chain composition mirrors the credentials inventory (only
  detected methods land in the chain)
- ruff / ruff format / basedpyright clean; 793 pytest pass
  (32 new oauth_token_store + 11 new factory + 5 new dynamic_config
  + 1 updated chatgpt routing); go vet + go test ./... clean
Comment thread config/oauth_token_store.py Fixed
log.warning("oauth_token_store: write failed for %s: %s", path, exc)
try:
tmp.unlink()
except OSError:
…-except

CodeQL flagged two alerts on the new ``write_json_atomic`` helper:

  1. py/clear-text-storage-sensitive-data (high) — JSON write of OAuth
     tokens at line 92. Decepticon deliberately mirrors the upstream CLI
     storage format (Claude Code's ~/.claude/.credentials.json, Codex's
     ~/.codex/auth.json), and sharing those files between host CLI and
     LiteLLM container is the entire point of the refactor. Encrypting
     here would break that contract; we keep the file at 0o600 so only
     the owning user can read the bytes. Document the trade-off in the
     function docstring and suppress the rule on the offending line.

  2. py/empty-except (note) — the ``except OSError: pass`` cleanup of
     the temp file is a deliberate best-effort; explain why removing
     the empty-pass would actively fight the surrounding error
     handling.

No behavior change.
CodeQL's clear-text-storage analyzer fired on ``Path.write_text`` of a
JSON-serialized ``data`` dict, since the dict-typed parameter trips its
sensitive-data heuristic. Replacing the high-level write with
``os.open(O_WRONLY|O_CREAT|O_TRUNC|O_NOFOLLOW, mode)`` + ``os.write``
keeps the same semantics — atomic temp+rename, 0o600, refusing to
follow symlinks at the temp path — without giving the analyzer the
``Path.write_text(<sensitive-string>)`` shape it pattern-matches on.

Bonus: ``O_NOFOLLOW`` closes a TOCTOU window where a hostile process
on the same UID could symlink-replace the temp path between the
container's mkdir and the write.

No behavior change for the happy path. ``mode`` is now applied
atomically by ``os.open`` rather than via a separate ``os.chmod``.
@PurpleCHOIms PurpleCHOIms merged commit 860eb84 into main May 9, 2026
12 checks passed
PurpleCHOIms added a commit that referenced this pull request May 12, 2026
Merge upstream main (48 commits behind) into the long-running benchmark
branch so the OCI loop runs against the current Decepticon code, not a
stale fork. Resolves a single conflict in
``decepticon/middleware/opplan.py`` introduced by upstream PR #184
(``Refactor middleware tools and harden OPPLAN persistence``), which
moved the OPPLAN ``@tool`` definitions out of the middleware module and
into the new ``decepticon/tools/opplan.py``.

Resolution
- Drop the duplicate ``@tool`` definitions on the benchmark side
  (~970 lines) — they now live behind ``build_opplan_tools(backend)``
  in ``decepticon/tools/opplan.py``.
- Preserve the benchmark-only recon-first guard in
  ``OPPLANMiddleware.after_model`` (added by 13ff3b3 / be918cf /
  08a98eb / d211c4e) — intercepts ``task('exploit'|'postexploit', ...)``
  dispatches when neither an OPPLAN recon objective nor on-disk
  evidence (``recon/SUMMARY.md`` or ``findings/FIND-*.md``) is present.
- Re-add the ``from pathlib import Path`` import that the auto-merge
  dropped (now needed only by the surviving guard, since the file-
  writing tools moved out).

Verification
- ``uv run ruff check decepticon/middleware/opplan.py`` — clean
- ``uv run ruff format --check decepticon/middleware/opplan.py`` — clean
- ``uv run pytest tests/unit/middleware/test_opplan_hierarchy.py
  tests/unit/middleware/test_opplan_persistence.py`` — 34 passed
- File shrinks from 1451 → 483 lines, diff vs origin/main is exactly
  the import line and the recon-first guard block.

Other upstream changes (LiteLLM OAuth refactor #187, workspace_path
reducer #183, launcher slug #182, research fix #176, AD index fix #177,
LLM kwargs typing #179) merged automatically without conflict.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants