feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES) by albertolinard · Pull Request #1280 · headroomlabs-ai/headroom

albertolinard · 2026-06-22T10:48:39Z

Description

Adds per-request upstream routing so a single proxy instance can fan out to multiple upstreams by model prefix (e.g. glm-* → ollama.com, claude-* → anthropic, gpt-5.5 → openai, * → default) while keeping one shared savings history / dashboard across all traffic families. With the feature unconfigured, behavior is byte-identical to today's single-upstream resolution.

Closes #1279

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Performance improvement
Code refactoring (no functional changes)

Changes Made

New env var HEADROOM_UPSTREAM_ROUTES — a JSON array of route objects parsed at startup:

[
  {"model_prefix": "glm-", "upstream": "https://ollama.com", "auth": "env:OLLAMA_API_KEY"},
  {"model_prefix": "claude-", "protocol": "anthropic", "auth": "passthrough"},
  {"model_prefix": "gpt-5.5", "auth": "passthrough"},
  {"model_prefix": "*", "auth": "passthrough"}
]

field	description
`model_prefix`	case-insensitive prefix of the request `model` field; `"*"` is the catch-all. First match wins.
`upstream`	optional base URL override (trailing slash stripped). If omitted, `protocol` selects the default `api_targets` URL.
`protocol`	`"openai"` or `"anthropic"` — selects the `api_targets` slot when `upstream` is unset. Default `"openai"`.
`auth`	`"passthrough"` (forward inbound headers) or `"env:VARNAME"` (strip inbound auth, inject env-sourced bearer per-request).

headroom/providers/registry.py — new UpstreamRoute, UpstreamResolution, PassthroughAuth, BearerAuth dataclasses; parse_upstream_routes() env parser; ProxyProviderRuntime.resolve_upstream(); wired into build_proxy_provider_runtime().
headroom/proxy/server.py — HeadroomProxy.resolve_upstream() (URL + auth substitution) and resolve_upstream_url() convenience wrapper.
Routing consulted at 4 forward call sites before the upstream URL is built:
1. headroom/proxy/handlers/openai.py — /v1/chat/completions
2. headroom/proxy/handlers/openai.py — /v1/responses (HTTP)
3. headroom/proxy/handlers/openai.py — /v1/responses WS-to-HTTP fallback (_ws_http_fallback)
4. headroom/proxy/handlers/anthropic.py — /v1/messages
Legacy self.OPENAI_API_URL / self.ANTHROPIC_API_URL paths are used only as fallbacks when no route matches and no "*" catch-all is configured.

Failure modes (all degrade to legacy single-upstream behavior except the fail-closed auth case):

Malformed JSON / non-array → log + ignore
Empty model_prefix → skip with warning
Unknown protocol → default "openai" with warning
Unknown auth spec → passthrough with warning
BearerAuth env var empty → 502 + error log (fail closed; inbound credentials stripped)

Security: BearerAuth routes strip inbound Authorization / x-api-key / api-key before injecting the env-sourced bearer, so inbound credentials never leak to a foreign upstream. The env var name is operator-configured, not derived from request input. Bearer tokens are read per-request from os.environ, so a secret rotation + restart is picked up without re-parsing routes.

Testing

Unit tests pass (pytest)
Linting passes (ruff check .)
Type checking passes (mypy headroom)
New tests added for new functionality
Manual testing performed

Test Output

$ uv run pytest tests/test_multi_upstream_routes.py tests/test_provider_registry.py tests/test_provider_registry_extended.py tests/test_banner_upstream_targets.py -q
.............................................                             [100%]
45 passed in 3.90s

$ ruff check .
All checks passed!

18 new unit tests in tests/test_multi_upstream_routes.py:

parse_upstream_routes: empty/invalid JSON, non-array, valid routes, trailing slash, empty prefix, bad protocol, bad auth
resolve_upstream: empty routes legacy fallback, prefix match, "*" fallback, no-match-no-star legacy, missing model, first-match-wins, protocol slot, case-insensitive, anthropic-header legacy, BearerAuth env-var exposure

Existing suites continue to pass: test_provider_registry.py (11), test_provider_registry_extended.py (7), test_banner_upstream_targets.py (9).

Real Behavior Proof

Environment: a downstream fork of this routing logic runs in production today on a self-hosted Kubernetes cluster as a forked proxy image, fronting three live upstreams behind one proxy/dashboard.
Exact command / steps: proxy started with HEADROOM_UPSTREAM_ROUTES configured for glm-* → ollama.com (BearerAuth via env:OLLAMA_API_KEY), claude-* → anthropic (passthrough), and * → default. Drove real OpenAI- and Anthropic-protocol traffic from coding harnesses through the single proxy port.
Observed result: glm-* requests reached ollama.com with the injected bearer (inbound auth stripped); claude-* requests reached anthropic via passthrough; all traffic families accumulated into one shared proxy_savings.json and a single dashboard. With HEADROOM_UPSTREAM_ROUTES unset, the proxy behaves identically to the unpatched build.
Not tested: mypy headroom type-check pass (see checklist); upstream-side compression quality differences across non-Anthropic vendors are out of scope for this routing change.

Review Readiness

I have performed a self-review
This PR is ready for human review

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have updated the CHANGELOG.md if applicable

Additional Notes

mypy headroom was not run as part of this change; the routing code is fully type-annotated but I left the box unchecked rather than assert an unverified pass — happy to add a typed run if maintainers require it.
Docs / CHANGELOG left unchecked pending maintainer preference on where operator-facing env vars are documented; I can add a HEADROOM_UPSTREAM_ROUTES section wherever existing env vars live.
Complements (does not overlap) [FEATURE] Anthropic-compatible third-party upstream support (DashScope, XF-Yun, SiliconFlow, etc.) #1264 (Anthropic-compatible third-party upstream support — per-upstream compatibility quirks vs. routing to multiple upstreams), and is adjacent to [BUG] Proxy returns 502 with httpx incomplete chunked read against OpenAI-compatible upstream #1112 and [FEATURE] Bedrock-native compression parity for non-Anthropic vendors (Amazon Nova, ZAI/GLM, MiniMax, Kimi) + cross-region inference profile fix #953.

Add per-request upstream routing so a single proxy instance can fan out to multiple upstreams by model prefix (e.g. glm-* -> ollama.com, claude-* -> anthropic, gpt-5.5 -> openai, * -> default) while keeping one shared savings history / dashboard across all traffic families. Configuration ------------ Set HEADROOM_UPSTREAM_ROUTES to a JSON array of route objects: [ {"model_prefix": "glm-", "upstream": "https://ollama.com", "auth": "env:OLLAMA_API_KEY"}, {"model_prefix": "claude-", "protocol": "anthropic"}, {"model_prefix": "gpt-5.5", "auth": "passthrough"}, {"model_prefix": "*", "auth": "passthrough"} ] Each entry: - model_prefix: case-insensitive prefix of the request's model field; "*" is the catch-all fallback (first match wins, order matters). - upstream: optional base URL override (trailing slash stripped). If omitted, the route's protocol slot selects the default api_targets URL. - protocol: "openai" or "anthropic" — selects the api_targets slot when upstream is not set. Defaults to "openai". - auth: "passthrough" (forward inbound headers verbatim) or "env:VARNAME" (strip inbound Authorization/x-api-key/api-key and inject a bearer read from the env var at request time, so token rotation is picked up without re-parsing routes). With HEADROOM_UPSTREAM_ROUTES unset/empty, behavior is byte-identical to the legacy single-upstream resolution (select_passthrough_base_url + PassthroughAuth). Call sites patched (3) --------------------- - OpenAI /v1/chat/completions - OpenAI /v1/responses (HTTP + WS-to-HTTP fallback) - Anthropic /v1/messages Tests ----- 18 new unit tests covering parse_upstream_routes and ProxyProviderRuntime.resolve_upstream, including legacy fallback, case-insensitive matching, first-match-wins ordering, protocol slot resolution, BearerAuth env-var exposure, and anthropic-header legacy routing. Existing test_provider_registry{,_extended}.py and test_banner_upstream_targets.py continue to pass (27 tests).

github-actions · 2026-06-22T10:48:53Z

PR governance

This PR follows the template and is marked ready for human review.

JerrettDavis

The overall routing shape is useful, but two blockers need tightening before this is ready. First, resolve_upstream() says an empty env: auth var should fail closed, but the call sites do not detect that condition; they just send the request with inbound auth stripped and no replacement Authorization header. Please return an explicit error/sentinel and make the OpenAI/Anthropic call paths fail with a clear 502 (or equivalent) before contacting the upstream when a configured route token is missing. Second, this PR includes unrelated uv.lock changes for package version/extras/litellm markers/spreadsheet deps that are not part of multi-upstream routing; please drop those from this branch unless they are required and explained/tested here. Also note the PR is currently merge-conflicted (DIRTY) and will need a refresh from main.

…-routing # Conflicts: # uv.lock

…abs-ai#1280) Addresses review feedback on the multi-upstream routing PR. resolve_upstream() previously documented a fail-closed contract for a matched BearerAuth route whose env token is unset/empty, but the call sites never enforced it: the inbound Authorization/x-api-key was stripped and the request was forwarded to the foreign upstream with no replacement credential. - registry: add UpstreamAuthUnavailable; resolve_upstream() now raises it (instead of logging and returning) when a matched BearerAuth route has an empty env var. Raising fails *closed* by default — a call site that forgets to catch yields a 500, never an unauthenticated forward. - handlers: catch UpstreamAuthUnavailable at all four forward call sites (/v1/chat/completions, /v1/responses HTTP + WS-to-HTTP fallback, /v1/messages) and return a 502 (or a WS error event) *before* contacting the upstream. The client message is generic so the env-var name never leaks. The passthrough/no-routes path cannot raise, preserving the byte-identical legacy fallback. Tests - new tests/test_multi_upstream_fail_closed.py drives every handler path end-to-end with a BearerAuth route whose env var is empty and asserts (a) 502 / WS error event and (b) the upstream httpx client is never contacted (mock httpx.AsyncClient.send). This closes the gap the isolated resolve_upstream unit tests could not catch. - update minimal handler stubs that now depend on resolve_upstream (ws_http_fallback, codex_routing, anthropic backpressure / stage timings) to provide a legacy single-upstream resolve_upstream.

albertolinard · 2026-06-22T22:13:56Z

Thanks for the review @JerrettDavis — all three addressed in 5e39ae5c:

1. Fail-closed not enforced at the call sites. You were exactly right: resolve_upstream() documented the contract but only logged and returned, so the request went out with inbound auth stripped and no replacement. Fixed properly:

resolve_upstream() now raises a new UpstreamAuthUnavailable when a matched env: route has an empty/unset token. Raising (vs. a sentinel) fails closed by construction — a call site that forgets to catch yields a 500, never an unauthenticated forward.
All four forward call sites catch it and fail before contacting the upstream: /v1/chat/completions, /v1/responses (HTTP), the WS→HTTP fallback, and /v1/messages → 502 (or a WS error event on the already-upgraded WS arm). The client message is generic so the env-var name never leaks.
New tests/test_multi_upstream_fail_closed.py drives each handler path end-to-end with an empty-token route and asserts (a) 502 / WS error and (b) httpx.AsyncClient.send is never called. This is the level the isolated resolve_upstream unit tests couldn't cover.

2. Unrelated uv.lock changes. Dropped — uv.lock is now byte-identical to main (the PR touches no dependency manifests).

3. Merge conflict. Merged main; branch is mergeable again.

While wiring the handler-level tests I also found the routing change had broken a few existing stub-based handler tests (test_ws_http_fallback, test_openai_codex_routing, test_anthropic_pre_upstream_backpressure, test_anthropic_stage_timings) — their minimal mock handlers didn't provide resolve_upstream. Updated those stubs with a legacy single-upstream resolve_upstream so the full handler suite is green again.

ruff check / ruff format clean.

JerrettDavis

Re-reviewed the current head. The fail-closed gap is fixed now: missing env: route credentials raise UpstreamAuthUnavailable, the OpenAI/Anthropic HTTP paths return 502 before upstream contact, and the WS fallback emits an error event rather than forwarding unauthenticated. The new handler-level tests cover the exact call-site failure mode from the previous review, and the unrelated uv.lock churn is gone.

Only governance checks have run for this head, so this still needs the normal CI path before merge. I do not see a remaining code blocker.

JerrettDavis requested changes Jun 22, 2026

View reviewed changes

albertolinard added 2 commits June 22, 2026 18:27

Merge remote-tracking branch 'upstream/main' into feat/multi-upstream…

4d7dddc

…-routing # Conflicts: # uv.lock

JerrettDavis approved these changes Jun 22, 2026

View reviewed changes

github-actions Bot added status: ci failing Required or reported CI checks are failing and removed status: ready for review Pull request body is complete and the author marked it ready for human review labels Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280

feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280
albertolinard wants to merge 3 commits into
headroomlabs-ai:mainfrom
albertolinard:feat/multi-upstream-routing

albertolinard commented Jun 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

JerrettDavis left a comment

Uh oh!

albertolinard commented Jun 22, 2026

Uh oh!

JerrettDavis left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

albertolinard commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Changes Made

Testing

Test Output

Real Behavior Proof

Review Readiness

Checklist

Additional Notes

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR governance

Uh oh!

JerrettDavis left a comment

Choose a reason for hiding this comment

Uh oh!

albertolinard commented Jun 22, 2026

Uh oh!

JerrettDavis left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

albertolinard commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading