Skip to content

feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280

Open
albertolinard wants to merge 3 commits into
headroomlabs-ai:mainfrom
albertolinard:feat/multi-upstream-routing
Open

feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280
albertolinard wants to merge 3 commits into
headroomlabs-ai:mainfrom
albertolinard:feat/multi-upstream-routing

Conversation

@albertolinard

@albertolinard albertolinard commented Jun 22, 2026

Copy link
Copy Markdown

Description

Adds per-request upstream routing so a single proxy instance can fan out to multiple upstreams by model prefix (e.g. glm-* → ollama.com, claude-* → anthropic, gpt-5.5 → openai, * → default) while keeping one shared savings history / dashboard across all traffic families. With the feature unconfigured, behavior is byte-identical to today's single-upstream resolution.

Closes #1279

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • New env var HEADROOM_UPSTREAM_ROUTES — a JSON array of route objects parsed at startup:

    [
      {"model_prefix": "glm-", "upstream": "https://ollama.com", "auth": "env:OLLAMA_API_KEY"},
      {"model_prefix": "claude-", "protocol": "anthropic", "auth": "passthrough"},
      {"model_prefix": "gpt-5.5", "auth": "passthrough"},
      {"model_prefix": "*", "auth": "passthrough"}
    ]
    field description
    model_prefix case-insensitive prefix of the request model field; "*" is the catch-all. First match wins.
    upstream optional base URL override (trailing slash stripped). If omitted, protocol selects the default api_targets URL.
    protocol "openai" or "anthropic" — selects the api_targets slot when upstream is unset. Default "openai".
    auth "passthrough" (forward inbound headers) or "env:VARNAME" (strip inbound auth, inject env-sourced bearer per-request).
  • headroom/providers/registry.py — new UpstreamRoute, UpstreamResolution, PassthroughAuth, BearerAuth dataclasses; parse_upstream_routes() env parser; ProxyProviderRuntime.resolve_upstream(); wired into build_proxy_provider_runtime().

  • headroom/proxy/server.pyHeadroomProxy.resolve_upstream() (URL + auth substitution) and resolve_upstream_url() convenience wrapper.

  • Routing consulted at 4 forward call sites before the upstream URL is built:

    1. headroom/proxy/handlers/openai.py/v1/chat/completions
    2. headroom/proxy/handlers/openai.py/v1/responses (HTTP)
    3. headroom/proxy/handlers/openai.py/v1/responses WS-to-HTTP fallback (_ws_http_fallback)
    4. headroom/proxy/handlers/anthropic.py/v1/messages
  • Legacy self.OPENAI_API_URL / self.ANTHROPIC_API_URL paths are used only as fallbacks when no route matches and no "*" catch-all is configured.

Failure modes (all degrade to legacy single-upstream behavior except the fail-closed auth case):

  • Malformed JSON / non-array → log + ignore
  • Empty model_prefix → skip with warning
  • Unknown protocol → default "openai" with warning
  • Unknown auth spec → passthrough with warning
  • BearerAuth env var empty → 502 + error log (fail closed; inbound credentials stripped)

Security: BearerAuth routes strip inbound Authorization / x-api-key / api-key before injecting the env-sourced bearer, so inbound credentials never leak to a foreign upstream. The env var name is operator-configured, not derived from request input. Bearer tokens are read per-request from os.environ, so a secret rotation + restart is picked up without re-parsing routes.

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ uv run pytest tests/test_multi_upstream_routes.py tests/test_provider_registry.py tests/test_provider_registry_extended.py tests/test_banner_upstream_targets.py -q
.............................................                             [100%]
45 passed in 3.90s

$ ruff check .
All checks passed!

18 new unit tests in tests/test_multi_upstream_routes.py:

  • parse_upstream_routes: empty/invalid JSON, non-array, valid routes, trailing slash, empty prefix, bad protocol, bad auth
  • resolve_upstream: empty routes legacy fallback, prefix match, "*" fallback, no-match-no-star legacy, missing model, first-match-wins, protocol slot, case-insensitive, anthropic-header legacy, BearerAuth env-var exposure

Existing suites continue to pass: test_provider_registry.py (11), test_provider_registry_extended.py (7), test_banner_upstream_targets.py (9).

Real Behavior Proof

  • Environment: a downstream fork of this routing logic runs in production today on a self-hosted Kubernetes cluster as a forked proxy image, fronting three live upstreams behind one proxy/dashboard.
  • Exact command / steps: proxy started with HEADROOM_UPSTREAM_ROUTES configured for glm-* → ollama.com (BearerAuth via env:OLLAMA_API_KEY), claude-* → anthropic (passthrough), and * → default. Drove real OpenAI- and Anthropic-protocol traffic from coding harnesses through the single proxy port.
  • Observed result: glm-* requests reached ollama.com with the injected bearer (inbound auth stripped); claude-* requests reached anthropic via passthrough; all traffic families accumulated into one shared proxy_savings.json and a single dashboard. With HEADROOM_UPSTREAM_ROUTES unset, the proxy behaves identically to the unpatched build.
  • Not tested: mypy headroom type-check pass (see checklist); upstream-side compression quality differences across non-Anthropic vendors are out of scope for this routing change.

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

Add per-request upstream routing so a single proxy instance can fan
out to multiple upstreams by model prefix (e.g. glm-* -> ollama.com,
claude-* -> anthropic, gpt-5.5 -> openai, * -> default) while keeping
one shared savings history / dashboard across all traffic families.

Configuration
------------
Set HEADROOM_UPSTREAM_ROUTES to a JSON array of route objects:
  [
    {"model_prefix": "glm-", "upstream": "https://ollama.com",
     "auth": "env:OLLAMA_API_KEY"},
    {"model_prefix": "claude-", "protocol": "anthropic"},
    {"model_prefix": "gpt-5.5", "auth": "passthrough"},
    {"model_prefix": "*", "auth": "passthrough"}
  ]

Each entry:
- model_prefix: case-insensitive prefix of the request's model field;
  "*" is the catch-all fallback (first match wins, order matters).
- upstream: optional base URL override (trailing slash stripped). If
  omitted, the route's protocol slot selects the default api_targets URL.
- protocol: "openai" or "anthropic" — selects the api_targets slot when
  upstream is not set. Defaults to "openai".
- auth: "passthrough" (forward inbound headers verbatim) or
  "env:VARNAME" (strip inbound Authorization/x-api-key/api-key and
  inject a bearer read from the env var at request time, so token
  rotation is picked up without re-parsing routes).

With HEADROOM_UPSTREAM_ROUTES unset/empty, behavior is byte-identical
to the legacy single-upstream resolution (select_passthrough_base_url
+ PassthroughAuth).

Call sites patched (3)
---------------------
- OpenAI /v1/chat/completions
- OpenAI /v1/responses (HTTP + WS-to-HTTP fallback)
- Anthropic /v1/messages

Tests
-----
18 new unit tests covering parse_upstream_routes and
ProxyProviderRuntime.resolve_upstream, including legacy fallback,
case-insensitive matching, first-match-wins ordering, protocol slot
resolution, BearerAuth env-var exposure, and anthropic-header legacy
routing. Existing test_provider_registry{,_extended}.py and
test_banner_upstream_targets.py continue to pass (27 tests).
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

PR governance

This PR follows the template and is marked ready for human review.

@github-actions github-actions Bot added status: needs author action Pull request body or readiness checklist still needs author updates status: ready for review Pull request body is complete and the author marked it ready for human review and removed status: needs author action Pull request body or readiness checklist still needs author updates labels Jun 22, 2026

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall routing shape is useful, but two blockers need tightening before this is ready. First, resolve_upstream() says an empty env: auth var should fail closed, but the call sites do not detect that condition; they just send the request with inbound auth stripped and no replacement Authorization header. Please return an explicit error/sentinel and make the OpenAI/Anthropic call paths fail with a clear 502 (or equivalent) before contacting the upstream when a configured route token is missing. Second, this PR includes unrelated uv.lock changes for package version/extras/litellm markers/spreadsheet deps that are not part of multi-upstream routing; please drop those from this branch unless they are required and explained/tested here. Also note the PR is currently merge-conflicted (DIRTY) and will need a refresh from main.

…abs-ai#1280)

Addresses review feedback on the multi-upstream routing PR.

resolve_upstream() previously documented a fail-closed contract for a
matched BearerAuth route whose env token is unset/empty, but the call
sites never enforced it: the inbound Authorization/x-api-key was stripped
and the request was forwarded to the foreign upstream with no replacement
credential.

- registry: add UpstreamAuthUnavailable; resolve_upstream() now raises it
  (instead of logging and returning) when a matched BearerAuth route has
  an empty env var. Raising fails *closed* by default — a call site that
  forgets to catch yields a 500, never an unauthenticated forward.
- handlers: catch UpstreamAuthUnavailable at all four forward call sites
  (/v1/chat/completions, /v1/responses HTTP + WS-to-HTTP fallback,
  /v1/messages) and return a 502 (or a WS error event) *before* contacting
  the upstream. The client message is generic so the env-var name never
  leaks. The passthrough/no-routes path cannot raise, preserving the
  byte-identical legacy fallback.

Tests
- new tests/test_multi_upstream_fail_closed.py drives every handler path
  end-to-end with a BearerAuth route whose env var is empty and asserts
  (a) 502 / WS error event and (b) the upstream httpx client is never
  contacted (mock httpx.AsyncClient.send). This closes the gap the
  isolated resolve_upstream unit tests could not catch.
- update minimal handler stubs that now depend on resolve_upstream
  (ws_http_fallback, codex_routing, anthropic backpressure / stage timings)
  to provide a legacy single-upstream resolve_upstream.
@albertolinard

Copy link
Copy Markdown
Author

Thanks for the review @JerrettDavis — all three addressed in 5e39ae5c:

1. Fail-closed not enforced at the call sites. You were exactly right: resolve_upstream() documented the contract but only logged and returned, so the request went out with inbound auth stripped and no replacement. Fixed properly:

  • resolve_upstream() now raises a new UpstreamAuthUnavailable when a matched env: route has an empty/unset token. Raising (vs. a sentinel) fails closed by construction — a call site that forgets to catch yields a 500, never an unauthenticated forward.
  • All four forward call sites catch it and fail before contacting the upstream: /v1/chat/completions, /v1/responses (HTTP), the WS→HTTP fallback, and /v1/messages502 (or a WS error event on the already-upgraded WS arm). The client message is generic so the env-var name never leaks.
  • New tests/test_multi_upstream_fail_closed.py drives each handler path end-to-end with an empty-token route and asserts (a) 502 / WS error and (b) httpx.AsyncClient.send is never called. This is the level the isolated resolve_upstream unit tests couldn't cover.

2. Unrelated uv.lock changes. Dropped — uv.lock is now byte-identical to main (the PR touches no dependency manifests).

3. Merge conflict. Merged main; branch is mergeable again.

While wiring the handler-level tests I also found the routing change had broken a few existing stub-based handler tests (test_ws_http_fallback, test_openai_codex_routing, test_anthropic_pre_upstream_backpressure, test_anthropic_stage_timings) — their minimal mock handlers didn't provide resolve_upstream. Updated those stubs with a legacy single-upstream resolve_upstream so the full handler suite is green again.

ruff check / ruff format clean.

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the current head. The fail-closed gap is fixed now: missing env: route credentials raise UpstreamAuthUnavailable, the OpenAI/Anthropic HTTP paths return 502 before upstream contact, and the WS fallback emits an error event rather than forwarding unauthenticated. The new handler-level tests cover the exact call-site failure mode from the previous review, and the unrelated uv.lock churn is gone.

Only governance checks have run for this head, so this still needs the normal CI path before merge. I do not see a remaining code blocker.

@github-actions github-actions Bot added status: ci failing Required or reported CI checks are failing and removed status: ready for review Pull request body is complete and the author marked it ready for human review labels Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: ci failing Required or reported CI checks are failing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)

2 participants