feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280
feat(proxy): multi-upstream routing by model prefix (HEADROOM_UPSTREAM_ROUTES)#1280albertolinard wants to merge 3 commits into
Conversation
Add per-request upstream routing so a single proxy instance can fan
out to multiple upstreams by model prefix (e.g. glm-* -> ollama.com,
claude-* -> anthropic, gpt-5.5 -> openai, * -> default) while keeping
one shared savings history / dashboard across all traffic families.
Configuration
------------
Set HEADROOM_UPSTREAM_ROUTES to a JSON array of route objects:
[
{"model_prefix": "glm-", "upstream": "https://ollama.com",
"auth": "env:OLLAMA_API_KEY"},
{"model_prefix": "claude-", "protocol": "anthropic"},
{"model_prefix": "gpt-5.5", "auth": "passthrough"},
{"model_prefix": "*", "auth": "passthrough"}
]
Each entry:
- model_prefix: case-insensitive prefix of the request's model field;
"*" is the catch-all fallback (first match wins, order matters).
- upstream: optional base URL override (trailing slash stripped). If
omitted, the route's protocol slot selects the default api_targets URL.
- protocol: "openai" or "anthropic" — selects the api_targets slot when
upstream is not set. Defaults to "openai".
- auth: "passthrough" (forward inbound headers verbatim) or
"env:VARNAME" (strip inbound Authorization/x-api-key/api-key and
inject a bearer read from the env var at request time, so token
rotation is picked up without re-parsing routes).
With HEADROOM_UPSTREAM_ROUTES unset/empty, behavior is byte-identical
to the legacy single-upstream resolution (select_passthrough_base_url
+ PassthroughAuth).
Call sites patched (3)
---------------------
- OpenAI /v1/chat/completions
- OpenAI /v1/responses (HTTP + WS-to-HTTP fallback)
- Anthropic /v1/messages
Tests
-----
18 new unit tests covering parse_upstream_routes and
ProxyProviderRuntime.resolve_upstream, including legacy fallback,
case-insensitive matching, first-match-wins ordering, protocol slot
resolution, BearerAuth env-var exposure, and anthropic-header legacy
routing. Existing test_provider_registry{,_extended}.py and
test_banner_upstream_targets.py continue to pass (27 tests).
PR governanceThis PR follows the template and is marked ready for human review. |
JerrettDavis
left a comment
There was a problem hiding this comment.
The overall routing shape is useful, but two blockers need tightening before this is ready. First, resolve_upstream() says an empty env: auth var should fail closed, but the call sites do not detect that condition; they just send the request with inbound auth stripped and no replacement Authorization header. Please return an explicit error/sentinel and make the OpenAI/Anthropic call paths fail with a clear 502 (or equivalent) before contacting the upstream when a configured route token is missing. Second, this PR includes unrelated uv.lock changes for package version/extras/litellm markers/spreadsheet deps that are not part of multi-upstream routing; please drop those from this branch unless they are required and explained/tested here. Also note the PR is currently merge-conflicted (DIRTY) and will need a refresh from main.
…-routing # Conflicts: # uv.lock
…abs-ai#1280) Addresses review feedback on the multi-upstream routing PR. resolve_upstream() previously documented a fail-closed contract for a matched BearerAuth route whose env token is unset/empty, but the call sites never enforced it: the inbound Authorization/x-api-key was stripped and the request was forwarded to the foreign upstream with no replacement credential. - registry: add UpstreamAuthUnavailable; resolve_upstream() now raises it (instead of logging and returning) when a matched BearerAuth route has an empty env var. Raising fails *closed* by default — a call site that forgets to catch yields a 500, never an unauthenticated forward. - handlers: catch UpstreamAuthUnavailable at all four forward call sites (/v1/chat/completions, /v1/responses HTTP + WS-to-HTTP fallback, /v1/messages) and return a 502 (or a WS error event) *before* contacting the upstream. The client message is generic so the env-var name never leaks. The passthrough/no-routes path cannot raise, preserving the byte-identical legacy fallback. Tests - new tests/test_multi_upstream_fail_closed.py drives every handler path end-to-end with a BearerAuth route whose env var is empty and asserts (a) 502 / WS error event and (b) the upstream httpx client is never contacted (mock httpx.AsyncClient.send). This closes the gap the isolated resolve_upstream unit tests could not catch. - update minimal handler stubs that now depend on resolve_upstream (ws_http_fallback, codex_routing, anthropic backpressure / stage timings) to provide a legacy single-upstream resolve_upstream.
|
Thanks for the review @JerrettDavis — all three addressed in 1. Fail-closed not enforced at the call sites. You were exactly right:
2. Unrelated 3. Merge conflict. Merged While wiring the handler-level tests I also found the routing change had broken a few existing stub-based handler tests (
|
JerrettDavis
left a comment
There was a problem hiding this comment.
Re-reviewed the current head. The fail-closed gap is fixed now: missing env: route credentials raise UpstreamAuthUnavailable, the OpenAI/Anthropic HTTP paths return 502 before upstream contact, and the WS fallback emits an error event rather than forwarding unauthenticated. The new handler-level tests cover the exact call-site failure mode from the previous review, and the unrelated uv.lock churn is gone.
Only governance checks have run for this head, so this still needs the normal CI path before merge. I do not see a remaining code blocker.
Description
Adds per-request upstream routing so a single proxy instance can fan out to multiple upstreams by model prefix (e.g.
glm-*→ ollama.com,claude-*→ anthropic,gpt-5.5→ openai,*→ default) while keeping one shared savings history / dashboard across all traffic families. With the feature unconfigured, behavior is byte-identical to today's single-upstream resolution.Closes #1279
Type of Change
Changes Made
New env var
HEADROOM_UPSTREAM_ROUTES— a JSON array of route objects parsed at startup:[ {"model_prefix": "glm-", "upstream": "https://ollama.com", "auth": "env:OLLAMA_API_KEY"}, {"model_prefix": "claude-", "protocol": "anthropic", "auth": "passthrough"}, {"model_prefix": "gpt-5.5", "auth": "passthrough"}, {"model_prefix": "*", "auth": "passthrough"} ]model_prefixmodelfield;"*"is the catch-all. First match wins.upstreamprotocolselects the defaultapi_targetsURL.protocol"openai"or"anthropic"— selects theapi_targetsslot whenupstreamis unset. Default"openai".auth"passthrough"(forward inbound headers) or"env:VARNAME"(strip inbound auth, inject env-sourced bearer per-request).headroom/providers/registry.py— newUpstreamRoute,UpstreamResolution,PassthroughAuth,BearerAuthdataclasses;parse_upstream_routes()env parser;ProxyProviderRuntime.resolve_upstream(); wired intobuild_proxy_provider_runtime().headroom/proxy/server.py—HeadroomProxy.resolve_upstream()(URL + auth substitution) andresolve_upstream_url()convenience wrapper.Routing consulted at 4 forward call sites before the upstream URL is built:
headroom/proxy/handlers/openai.py—/v1/chat/completionsheadroom/proxy/handlers/openai.py—/v1/responses(HTTP)headroom/proxy/handlers/openai.py—/v1/responsesWS-to-HTTP fallback (_ws_http_fallback)headroom/proxy/handlers/anthropic.py—/v1/messagesLegacy
self.OPENAI_API_URL/self.ANTHROPIC_API_URLpaths are used only as fallbacks when no route matches and no"*"catch-all is configured.Failure modes (all degrade to legacy single-upstream behavior except the fail-closed auth case):
model_prefix→ skip with warningprotocol→ default"openai"with warningauthspec →passthroughwith warningSecurity: BearerAuth routes strip inbound
Authorization/x-api-key/api-keybefore injecting the env-sourced bearer, so inbound credentials never leak to a foreign upstream. The env var name is operator-configured, not derived from request input. Bearer tokens are read per-request fromos.environ, so a secret rotation + restart is picked up without re-parsing routes.Testing
pytest)ruff check .)mypy headroom)Test Output
18 new unit tests in
tests/test_multi_upstream_routes.py:parse_upstream_routes: empty/invalid JSON, non-array, valid routes, trailing slash, empty prefix, bad protocol, bad authresolve_upstream: empty routes legacy fallback, prefix match,"*"fallback, no-match-no-star legacy, missing model, first-match-wins, protocol slot, case-insensitive, anthropic-header legacy, BearerAuth env-var exposureExisting suites continue to pass:
test_provider_registry.py(11),test_provider_registry_extended.py(7),test_banner_upstream_targets.py(9).Real Behavior Proof
HEADROOM_UPSTREAM_ROUTESconfigured forglm-*→ ollama.com (BearerAuth viaenv:OLLAMA_API_KEY),claude-*→ anthropic (passthrough), and*→ default. Drove real OpenAI- and Anthropic-protocol traffic from coding harnesses through the single proxy port.glm-*requests reached ollama.com with the injected bearer (inbound auth stripped);claude-*requests reached anthropic via passthrough; all traffic families accumulated into one sharedproxy_savings.jsonand a single dashboard. WithHEADROOM_UPSTREAM_ROUTESunset, the proxy behaves identically to the unpatched build.mypy headroomtype-check pass (see checklist); upstream-side compression quality differences across non-Anthropic vendors are out of scope for this routing change.Review Readiness
Checklist
Additional Notes
mypy headroomwas not run as part of this change; the routing code is fully type-annotated but I left the box unchecked rather than assert an unverified pass — happy to add a typed run if maintainers require it.HEADROOM_UPSTREAM_ROUTESsection wherever existing env vars live.