feat(minimax): native M3/M2.7 backend with cost + dashboard integration #1361
Open
axelfleureau wants to merge 6 commits into
Adds the MiniMax provider as a first-class citizen in headroom's proxy and dashboard. This is a single PR-ready commit on top of upstream/main that combines 21 incremental commits from the axelfleureau/headroom fork. The logic in each file is the post-bugfix version.

New files:
- headroom/providers/minimax.py: MiniMaxProvider class with context limits, max output, pricing, vision/tool/streaming flags for the MiniMax-M3 / M2.7 / M2.5 / M2.1 / M2 model families.
- headroom/proxy/handlers/minimax.py: MiniMaxHandlerMixin that delegates to AnthropicHandlerMixin (wire-compatible) but stamps provider='minimax' on records and strips the minimax/ routing prefix from model names.
- tests/test_minimax_cost_fallbacks.py: 15 unit tests covering MiniMax pricing fallback paths in CostTracker + SavingsTracker. Runs on Python 3.14 where litellm is intentionally skipped.

Modified files:
- headroom/providers/registry.py: resolve the MINIMAX_TARGET_API_URL env var and surface the resolved URL through ProviderApiTargets.
- headroom/providers/proxy_routes.py: register MiniMaxHandlerMixin in the proxy routes table; route /v1/messages traffic to the MiniMax handler when the request body names a MiniMax-M* model.
- headroom/proxy/cost.py: CostTracker._get_cache_prices and _get_list_price consult MiniMaxProvider.MODEL_INPUT_COST when litellm doesn't know the model (Python 3.14 case). The per-provider cache economics loop also recognises provider='minimax'.
- headroom/proxy/savings_tracker.py: savings estimates use MiniMaxProvider pricing as the primary path, litellm as fallback. Both paths work on Python 3.14.
- headroom/proxy/handlers/anthropic.py: bind the request body before the client-override check in handle_anthropic_batch_passthrough + handle_anthropic_batch_results (fixes UnboundLocalError on GET).
- headroom/proxy/handlers/streaming.py: add the missing 'import os' used by the MiniMax session-token fallback.
- headroom/proxy/server.py: HeadroomProxy mixes in MiniMaxHandlerMixin so model detection runs in the main path.
- headroom/dashboard/templates/dashboard.html: Per-Model Token Savings table now includes a Cost (USD) column sourced from cost.per_model[model].input_cost_usd (populated by the cost.py changes above). Existing MiniMax traffic surfaces automatically in Per-Provider Breakdown + Providers + Recent Requests via the existing provider-bucketing logic.
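The prefix stripping and provider stamping described for MiniMaxHandlerMixin can be sketched as below. This is a minimal illustration, not the code in the PR: the function names and the record shape are hypothetical, and the detection rule (a case-insensitive "MiniMax-M" prefix) is an assumption drawn from the "MiniMax-M*" wording above.

```python
MINIMAX_PREFIX = "minimax/"  # routing prefix named in the commit message


def strip_minimax_prefix(model: str) -> str:
    """Drop a leading 'minimax/' routing prefix, if present."""
    if model.lower().startswith(MINIMAX_PREFIX):
        return model[len(MINIMAX_PREFIX):]
    return model


def is_minimax_model(model: str) -> bool:
    """True when the prefix-stripped name looks like MiniMax-M* (assumed rule)."""
    return strip_minimax_prefix(model).lower().startswith("minimax-m")


def stamp_record(record: dict, model: str) -> dict:
    """Return a copy of a (hypothetical) request record tagged for MiniMax."""
    return {**record, "provider": "minimax", "model": strip_minimax_prefix(model)}
```

Keeping the stripped name in the record means cost lookups see the same model id whether or not the client used the routing prefix.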
…l/key to ProxyConfig

The MiniMax integration referenced MiniMaxHandlerMixin from server.py without exporting it from headroom.proxy.handlers, and HeadroomProxy.__init__ reads config.minimax_api_key; both fields need to exist on ProxyConfig for the proxy to boot when the MiniMax provider is enabled.

These two changes are mechanical: they don't alter runtime behaviour beyond registering the new symbol and accepting the new config keys. Default values keep the proxy compatible with upstream (no API key required, no URL override).
The Per-Model Token Savings table shows models from any upstream the proxy has touched: claude-*, gpt-*, gemini-*, minimax-*, and Bedrock region-prefixed ids like us.anthropic.claude-*. Without a provider column readers can't tell at a glance which upstream billed which row.

Adds two methods to the Alpine component:
- providerFor(model): substring-based classifier with conservative ordering. Bedrock / AWS is checked first because its region-prefix contains 'anthropic.' (so it must precede the claude* match). Recognises anthropic, openai, gemini, minimax, mistral, deepseek; anything else falls into 'other'.
- providerChipClass(provider): tailwind colour tokens per provider. Kept low-saturation so the table reads as a list, not a rainbow.

Verified with 19 unit cases via node -e (no test framework needed since the function is plain JS inside an x-data attribute).
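The ordering argument behind providerFor can be illustrated with a short sketch. The real method is JavaScript inside the Alpine component; this is a Python transliteration for illustration only, and the exact substrings (for example treating any id containing "anthropic." as Bedrock, or "gpt-" as the only OpenAI marker) are assumptions rather than the rules in the PR.

```python
# Ordered (substring, provider) rules; checked top to bottom after the Bedrock test.
RULES = [
    ("claude", "anthropic"),
    ("gpt-", "openai"),
    ("gemini", "gemini"),
    ("minimax", "minimax"),
    ("mistral", "mistral"),
    ("deepseek", "deepseek"),
]


def provider_for(model: str) -> str:
    """Classify a model id by substring, checking Bedrock before Anthropic."""
    m = model.lower()
    # Bedrock first: 'us.anthropic.claude-*' would otherwise match 'claude'.
    if "anthropic." in m:
        return "bedrock"
    for needle, provider in RULES:
        if needle in m:
            return provider
    return "other"
```

If the Bedrock check ran after the table lookup, a region-prefixed id such as us.anthropic.claude-* would be attributed to Anthropic, which is the misclassification the conservative ordering avoids.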
On Python 3.14 the headroom-ai fork skips litellm (it pins
requires-python<3.14), so without a fallback CostTracker silently
returns None for every Claude / gpt-* / o-series model and the
dashboard shows $0 for non-MiniMax traffic.
Add _get_cache_prices_fallback() with a small hardcoded pricing
table covering:
- Anthropic Claude 4.x (opus-4, sonnet-4/-4-5, haiku-4-5)
- Anthropic Claude 3.x (3-5-sonnet, 3-opus, 3-haiku) — including
the truncated-datestamp variant 'claude-3-5-sonnet-20' that
exposed the original regex-group bug
- OpenAI gpt-5/-4o/-4/-3.5 and o1/o3/o4 reasoning models
Cache economics are preserved per provider:
- Anthropic: 90% off cache reads, 25% write premium
- OpenAI: 50% off cache reads, no write premium
- MiniMax: 90% off cache reads, 25% write premium (already in code)
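The per-provider cache economics listed above amount to two multipliers on the input price. A minimal sketch follows, assuming the discount and premium apply to the plain per-token input price; the names CacheEconomics and cache_prices are hypothetical and not taken from cost.py, and no real price table is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEconomics:
    read_discount: float  # fraction taken off the input price for cache reads
    write_premium: float  # fraction added to the input price for cache writes


# Ratios as stated in the commit message.
ECONOMICS = {
    "anthropic": CacheEconomics(read_discount=0.90, write_premium=0.25),
    "openai": CacheEconomics(read_discount=0.50, write_premium=0.0),
    "minimax": CacheEconomics(read_discount=0.90, write_premium=0.25),
}


def cache_prices(provider: str, input_price: float) -> tuple:
    """Return (cache_read_price, cache_write_price) derived from the input price."""
    econ = ECONOMICS[provider]
    read = input_price * (1.0 - econ.read_discount)
    write = input_price * (1.0 + econ.write_premium)
    return read, write
```

With an illustrative input price of 4.0 per unit, the Anthropic and MiniMax rows give roughly 0.4 for reads and 5.0 for writes, while the OpenAI row gives 2.0 for reads and leaves writes at 4.0.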
Also fix an AttributeError in _retry_request: self.config.minimax_api_key
must use getattr() so SimpleNamespace test doubles (which only model a
subset of ProxyConfig fields) don't crash when the direct-MiniMax-API
auth branch runs. Add minimax_session_token to ProxyConfig for symmetry
with the existing minimax_api_key / minimax_api_url fields.
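The getattr() change can be shown in isolation. This is a sketch of the pattern only; the helper name minimax_api_key_or_none is hypothetical, and the surrounding _retry_request logic is not reproduced.

```python
from types import SimpleNamespace
from typing import Any, Optional


def minimax_api_key_or_none(config: Any) -> Optional[str]:
    """Read minimax_api_key without assuming the attribute exists on config."""
    # Direct attribute access (config.minimax_api_key) raises AttributeError
    # on test doubles that only model a subset of the config fields.
    return getattr(config, "minimax_api_key", None)


# A test double that omits the MiniMax fields entirely.
partial_config = SimpleNamespace(retries=3)
# A double that does carry the field.
full_config = SimpleNamespace(retries=3, minimax_api_key="test-key")
```

Here minimax_api_key_or_none(partial_config) returns None instead of raising, so the auth branch can simply be skipped when no key is configured.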
Tests: 18 new parametrized cases covering 8 Claude + 10 OpenAI variants;
all 33 cost-fallback tests + 20 provider tests + the regression-case
test_retry_request_retries_connect_timeout now pass.
Contributor
PR governance
This PR does not yet satisfy the required template fields:
Please update the PR body, or move the PR back to draft while it is still in progress.
…idle
Two UX bugs in the dashboard Session/Historical tabs:
1. Historical tab showed 'waiting for saved requests' until the user
clicked it. fetchHistoryStats() was only called inside pollDashboard()
when viewMode === 'history', and never inside init(). The first
navigation to the tab had nothing to render. Now init() also calls
fetchHistoryStats() so the tab is populated as soon as the page
loads — /stats-history is a small JSON payload (500 most-recent
points + 4 daily/weekly/monthly series + lifetime summary) and is
cheap to load on boot.
2. Session tab hero metrics collapsed to $0 and '0 requests processed'
whenever the proxy had no traffic in the current polling window,
even though ~/.headroom/proxy_savings.json had 83M tokens saved
across the lifetime of the install. Added three Alpine getters:
- displayedTokensSaved: runtime > display_session (if fresh) > lifetime
- displayedRequests: runtime > display_session (if fresh) > lifetime
- displayedTokensSavedSource: 'runtime' | 'session' | 'lifetime'
The Token Savings hero now uses displayedTokensSaved and exposes
the source via :title so it's transparent which tier is being
displayed. A session counts as 'fresh' if last_activity_at is
within the last 5 minutes — past that the lifetime total is the
correct headline number.
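The three-tier fallback described in item 2 can be sketched as follows. The real getters are Alpine.js; this Python version is illustrative only, and it assumes that a tier "wins" when its value is non-zero and that last_activity_at is a Unix timestamp in seconds. The function names are hypothetical.

```python
import time
from typing import Optional, Tuple

FRESH_WINDOW_SECONDS = 5 * 60  # a session is 'fresh' for 5 minutes


def is_fresh(last_activity_at: Optional[float], now: float) -> bool:
    """True when the session saw activity within the freshness window."""
    return last_activity_at is not None and (now - last_activity_at) <= FRESH_WINDOW_SECONDS


def displayed_tokens_saved(
    runtime: int,
    session: int,
    lifetime: int,
    last_activity_at: Optional[float],
    now: Optional[float] = None,
) -> Tuple[int, str]:
    """Pick the headline value and its source: runtime > fresh session > lifetime."""
    now = time.time() if now is None else now
    if runtime > 0:
        return runtime, "runtime"
    if session > 0 and is_fresh(last_activity_at, now):
        return session, "session"
    return lifetime, "lifetime"
```

Returning the source alongside the value mirrors the displayedTokensSavedSource getter, which is what lets the UI expose the active tier in a tooltip.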
No backend changes. No new endpoints. No new tests required (UI-only).
JerrettDavis (Collaborator) requested changes on Jun 24, 2026
This is a substantial integration and the shape is promising, but I found blockers that need another pass:
- The focused test suite fails locally. I ran uv run --extra proxy --with pytest --with pytest-asyncio python -m pytest tests/test_provider_minimax.py tests/test_minimax_cost_fallbacks.py -q and got 7 failures in tests/test_minimax_cost_fallbacks.py. The failures are pricing/cache-economics assertions for claude-haiku, gpt-5, gpt-5-mini/nano, gpt-4-turbo, and gpt-3.5 paths, so the fallback pricing behavior is not internally consistent with the tests in this branch.
- The new minimax_api_url plumbing appears unused for the actual /v1/messages route. ProviderApiTargets/ProxyConfig gain minimax fields, but MiniMaxHandlerMixin.handle_minimax_messages intentionally does not pass an upstream_base_url and delegates to AnthropicHandlerMixin so it falls back to self.ANTHROPIC_API_URL. That means setting MINIMAX_TARGET_API_URL or ProxyConfig.minimax_api_url will not route direct MiniMax traffic to the MiniMax endpoint; unless the operator also repoints ANTHROPIC_TARGET_API_URL, MiniMax-looking requests can still go to the Anthropic upstream. Please either wire the MiniMax target into the handler or remove the unused config surface and document the required Anthropic-target setup.
Given the breadth of this PR, please also add a focused routing test that proves a minimax/MiniMax-M3 request forwards to the configured MiniMax upstream with the stripped model name and expected auth header.
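One possible shape for the requested routing test is sketched below. Everything here is hypothetical scaffolding: FakeConfig stands in for ProxyConfig, build_upstream_request stands in for whatever the handler actually does when it forwards a request, the x-api-key header name is an assumption, and the .invalid hosts are placeholders. A real test would exercise MiniMaxHandlerMixin.handle_minimax_messages with a mocked HTTP client instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeConfig:
    """Hypothetical stand-in for the relevant ProxyConfig fields."""
    anthropic_api_url: str = "https://anthropic.invalid"
    minimax_api_url: Optional[str] = None
    minimax_api_key: Optional[str] = None


def build_upstream_request(body: dict, config: FakeConfig) -> dict:
    """Hypothetical helper: choose upstream URL, model name, and auth header."""
    model = body["model"]
    if model.lower().startswith("minimax/"):
        model = model[len("minimax/"):]
    is_minimax = model.lower().startswith("minimax-m")
    use_minimax = is_minimax and config.minimax_api_url is not None
    base = config.minimax_api_url if use_minimax else config.anthropic_api_url
    headers = {}
    if is_minimax and config.minimax_api_key:
        headers["x-api-key"] = config.minimax_api_key  # header name is an assumption
    return {"url": f"{base}/v1/messages", "model": model, "headers": headers}


def test_minimax_request_uses_minimax_upstream():
    config = FakeConfig(minimax_api_url="https://minimax.invalid", minimax_api_key="test-key")
    req = build_upstream_request({"model": "minimax/MiniMax-M3"}, config)
    assert req["url"] == "https://minimax.invalid/v1/messages"
    assert req["model"] == "MiniMax-M3"
    assert req["headers"] == {"x-api-key": "test-key"}
```

The three assertions map one-to-one onto the reviewer's request: configured MiniMax upstream, stripped model name, and expected auth header.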
Summary
Adds the MiniMax provider as a first-class citizen in headroom's proxy and dashboard, alongside Anthropic / OpenAI / Gemini / Bedrock. MiniMax exposes M3 and M2.x through an Anthropic-compatible wire format, so the proxy can route MiniMax-M* traffic via a thin handler mixin that delegates to AnthropicHandlerMixin while stamping provider="minimax" for cost tracking, savings, and dashboard rendering.

This unblocks anyone running Claude Code, Codex CLI, or the Codex desktop app against MiniMax through headroom on Python 3.13+ (litellm path) or 3.14 (litellm-skipped path with hardcoded fallback).
What's in the PR
New provider + handler
- headroom/providers/minimax.py: MiniMaxProvider with model metadata (context windows up to 1M for M3, 200K for M2.x), pricing, vision flags, token counter.
- headroom/proxy/handlers/minimax.py: MiniMaxHandlerMixin that delegates to AnthropicHandlerMixin (wire-compat) and strips the minimax/ prefix from model names.
- headroom/proxy/handlers/__init__.py: exports the mixin.
- headroom/proxy/server.py: HeadroomProxy mixes in MiniMaxHandlerMixin; _retry_request falls back gracefully when ProxyConfig fields are absent (test doubles).
- headroom/proxy/handlers/streaming.py: adds the missing import os (was already broken on Python 3.14).
- headroom/proxy/handlers/anthropic.py: binds body before the client-override check (fixes UnboundLocalError on batch endpoints).
- headroom/providers/proxy_routes.py: conditional routing: when the model matches MiniMax-M*, hand off to MiniMaxHandlerMixin.
- headroom/proxy/models.py: ProxyConfig gains minimax_api_url, minimax_api_key, minimax_session_token.

Cost tracking (litellm-optional)
The fork of headroom-ai skips litellm on Python 3.14 (litellm>=1.86.2,<2.0; python_version < '3.14'). Without a fallback, the dashboard silently shows $0 for every non-MiniMax model. This PR adds a small hardcoded fallback so the dashboard reports real costs:
- _get_cache_prices_fallback() in headroom/proxy/cost.py covers:
  - … claude-3-5-sonnet-20) that exposed a regex-group bug.
- … _get_cache_prices is preserved (input-cost dict + savings tracker path).

Dashboard
- headroom/dashboard/templates/dashboard.html: Per-Model Token Savings table gets a Provider chip column (Bedrock / Anthropic / OpenAI / Gemini / Mistral / DeepSeek / MiniMax / other) and a Cost (USD) column. Provider classification is done in Alpine.js with conservative substring rules; Bedrock is checked before Anthropic because us.anthropic.claude-* contains claude-.

Tests (35 new cases)
- tests/test_provider_minimax.py: 20 tests for MiniMaxProvider (model metadata, capabilities, token counter, parametrized context-window checks).
- tests/test_minimax_cost_fallbacks.py: 33 tests for the cost-tracking fallbacks (15 original + 18 new parametrized cases for Anthropic / OpenAI litellm-missing paths).

Test status
- … test_models.py; cargo-missing in test_release_workflows.py; environment-dependent in test_cli/test_wrap_copilot.py). All reproduce on a clean origin/main.
- … test_retry_request_retries_connect_timeout) fixed by switching self.config.minimax_api_key to getattr(...) so SimpleNamespace test doubles don't crash. The remaining 6 pre-existing failures are unchanged and unrelated to this PR.
- ruff check headroom/proxy/cost.py headroom/proxy/models.py headroom/proxy/server.py tests/test_minimax_cost_fallbacks.py tests/test_provider_minimax.py → All checks passed!

Live verification
- … MiniMax-M3 200 OK with input_cost_usd=0.000518 reported per-model on the dashboard (vs $0 without the fallback).

Out of scope
- … /dashboard/minimax page.
- … /dashboard/minimax analytics JSON endpoint.

Commits