Skip to content

feat(minimax): native M3/M2.7 backend with cost + dashboard integration#1361

Open
axelfleureau wants to merge 6 commits into
headroomlabs-ai:mainfrom
axelfleureau:pr/minimax-provider
Open

feat(minimax): native M3/M2.7 backend with cost + dashboard integration#1361
axelfleureau wants to merge 6 commits into
headroomlabs-ai:mainfrom
axelfleureau:pr/minimax-provider

Conversation

@axelfleureau

Copy link
Copy Markdown

Summary

Adds the MiniMax provider as a first-class citizen in headroom's proxy and dashboard, alongside Anthropic / OpenAI / Gemini / Bedrock. MiniMax exposes M3 and M2.x through an Anthropic-compatible wire format, so the proxy can route MiniMax-M* traffic via a thin handler mixin that delegates to AnthropicHandlerMixin while stamping provider="minimax" for cost tracking, savings, and dashboard rendering.

This unblocks anyone running Claude Code, Codex CLI, or the Codex desktop app against MiniMax through headroom on Python 3.13+ (litellm path) or 3.14 (litellm-skipped path with hardcoded fallback).

What's in the PR

New provider + handler

  • headroom/providers/minimax.pyMiniMaxProvider with model metadata (context windows up to 1M for M3, 200K for M2.x), pricing, vision flags, token counter.
  • headroom/proxy/handlers/minimax.pyMiniMaxHandlerMixin that delegates to AnthropicHandlerMixin (wire-compat) and strips the minimax/ prefix from model names.
  • headroom/proxy/handlers/__init__.py — exports the mixin.
  • headroom/proxy/server.pyHeadroomProxy mixes in MiniMaxHandlerMixin; _retry_request falls back gracefully when ProxyConfig fields are absent (test doubles).
  • headroom/proxy/handlers/streaming.py — adds the missing import os (was already broken on Python 3.14).
  • headroom/proxy/handlers/anthropic.py — binds body before the client-override check (fixes UnboundLocalError on batch endpoints).
  • headroom/providers/proxy_routes.py — conditional routing: when model matches MiniMax-M*, hand off to MiniMaxHandlerMixin.
  • headroom/proxy/models.pyProxyConfig gains minimax_api_url, minimax_api_key, minimax_session_token.

Cost tracking (litellm-optional)

The fork of headroom-ai skips litellm on Python 3.14 (litellm>=1.86.2,<2.0; python_version < '3.14'). Without a fallback, the dashboard silently shows $0 for every non-MiniMax model. This PR adds a small hardcoded fallback so the dashboard reports real costs:

  • _get_cache_prices_fallback() in headroom/proxy/cost.py covers:
    • Anthropic Claude 4.x (opus-4, sonnet-4/-4-5, haiku-4-5) and 3.x (3-5-sonnet, 3-opus, 3-haiku), including truncated-datestamp variants (claude-3-5-sonnet-20) that exposed a regex-group bug.
    • OpenAI gpt-5/-4o/-4/-3.5 and o1/o3/o4 reasoning models.
  • Cache economics preserved per provider: Anthropic 90% read discount + 25% write premium; OpenAI 50% read + no write premium; MiniMax 90% read + 25% write premium.
  • Existing MiniMax fallback in _get_cache_prices is preserved (input-cost dict + savings tracker path).

Dashboard

  • headroom/dashboard/templates/dashboard.html — Per-Model Token Savings table gets a Provider chip column (Bedrock / Anthropic / OpenAI / Gemini / Mistral / DeepSeek / MiniMax / other) and a Cost (USD) column. Provider classification is done in Alpine.js with conservative substring rules; Bedrock is checked before Anthropic because us.anthropic.claude-* contains claude-.
  • Dashboard chip contrast bumped for readability.

Tests (35 new cases)

  • tests/test_provider_minimax.py — 20 tests for MiniMaxProvider (model metadata, capabilities, token counter, parametrized context-window checks).
  • tests/test_minimax_cost_fallbacks.py — 33 tests for the cost-tracking fallbacks (15 original + 18 new parametrized cases for Anthropic / OpenAI litellm-missing paths).

Test status

  • Before this PR: 6 pre-existing failures on Python 3.14 (litellm unavailable: 4 in test_models.py; cargo-missing in test_release_workflows.py; environment-dependent in test_cli/test_wrap_copilot.py). All reproduce on a clean origin/main.
  • After this PR: 1 new failure (test_retry_request_retries_connect_timeout) fixed by switching self.config.minimax_api_key to getattr(...) so SimpleNamespace test doubles don't crash. The remaining 6 pre-existing failures are unchanged and unrelated to this PR.
6879 passed, 523 skipped, 7 deselected, 5815 warnings in 139.58s

ruff check headroom/proxy/cost.py headroom/proxy/models.py headroom/proxy/server.py tests/test_minimax_cost_fallbacks.py tests/test_provider_minimax.pyAll checks passed!

Live verification

  • MiniMax-M3 200 OK with input_cost_usd=0.000518 reported per-model on the dashboard (vs $0 without the fallback).
  • M2.7, M3-thinking, M3-tool_use, SSE streaming, 20/20 serial requests all pass.
  • Codex 8787 (upstream Anthropic) and MiniMax 8788 (fork) coexist on separate ports without collision.

Out of scope

  • No new dashboard route — Per-Model table is the natural home; no parallel /dashboard/minimax page.
  • No separate /dashboard/minimax analytics JSON endpoint.
  • The MiniMax provider reuses Anthropic's request/response wire format intentionally — no schema divergence.

Commits

e9e88109 fix(cost): Anthropic/OpenAI pricing fallback when litellm missing
1f0dbf7e fix(dashboard): bump provider-chip contrast for readability
8a25cae8 feat(dashboard): add Provider column to Per-Model Token Savings
8bfbb184 fix(handlers,models): export MiniMaxHandlerMixin + add minimax_api_url/key to ProxyConfig
a36b772d feat(minimax): native M3/M2.7 backend with cost + dashboard integration

Adds the MiniMax provider as a first-class citizen in headroom's proxy
and dashboard. This is a single PR-ready commit on top of upstream/main
that combines 21 incremental commits from the axelfleureau/headroom
fork. The logic in each file is the post-bugfix version.

New files:
- headroom/providers/minimax.py — MiniMaxProvider class with context
  limits, max output, pricing, vision/tool/streaming flags for the
  MiniMax-M3 / M2.7 / M2.5 / M2.1 / M2 model families.
- headroom/proxy/handlers/minimax.py — MiniMaxHandlerMixin that
  delegates to AnthropicHandlerMixin (wire-compatible) but stamps
  provider='minimax' on records and strips the minimax/ routing
  prefix from model names.
- tests/test_minimax_cost_fallbacks.py — 15 unit tests covering
  MiniMax pricing fallback paths in CostTracker + SavingsTracker.
  Runs on Python 3.14 where litellm is intentionally skipped.

Modified files:
- headroom/providers/registry.py — resolve MINIMAX_TARGET_API_URL env
  var and surface the resolved URL through ProviderApiTargets.
- headroom/providers/proxy_routes.py — register MiniMaxHandlerMixin
  in the proxy routes table; route /v1/messages traffic to the
  MiniMax handler when the request body names a MiniMax-M* model.
- headroom/proxy/cost.py — CostTracker._get_cache_prices and
  _get_list_price consult MiniMaxProvider.MODEL_INPUT_COST when
  litellm doesn't know the model (Python 3.14 case). Cache
  economics per-provider loop also recognises provider='minimax'.
- headroom/proxy/savings_tracker.py — savings_tracker estimates
  use MiniMaxProvider pricing as a primary path, litellm as
  fallback. Both paths work on Python 3.14.
- headroom/proxy/handlers/anthropic.py — bind request body before
  the client-override check in handle_anthropic_batch_passthrough
  + handle_anthropic_batch_results (fixes UnboundLocalError on GET).
- headroom/proxy/handlers/streaming.py — add missing 'import os'
  used by the MiniMax session-token fallback.
- headroom/proxy/server.py — HeadroomProxy mixes in
  MiniMaxHandlerMixin so model detection runs in the main path.
- headroom/dashboard/templates/dashboard.html — Per-Model Token
  Savings table now includes a Cost (USD) column sourced from
  cost.per_model[model].input_cost_usd (populated by the cost.py
  changes above). Existing MiniMax traffic surfaces automatically
  in Per-Provider Breakdown + Providers + Recent Requests via the
  existing provider-bucketing logic.
…l/key to ProxyConfig

The MiniMax integration referenced MiniMaxHandlerMixin from server.py
without exporting it from headroom.proxy.handlers, and the
HeadroomProxy.__init__ reads config.minimax_api_key — both fields
need to exist on ProxyConfig for the proxy to boot when the
MiniMax provider is enabled.

These two changes are mechanical: they don't alter runtime
behaviour beyond registering the new symbol + accepting the new
config keys. Default values keep the proxy compatible with
upstream (no API key required, no URL override).
The Per-Model Token Savings table shows models from any upstream
the proxy has touched — claude-*, gpt-*, gemini-*, minimax-*, and
Bedrock region-prefixed ids like us.anthropic.claude-*. Without a
provider column readers can't tell at a glance which upstream
billed which row.

Adds two methods to the Alpine component:
- providerFor(model): substring-based classifier with conservative
  ordering. Bedrock / AWS is checked first because its region-prefix
  contains 'anthropic.' (so it must precede the claude* match).
  Recognises anthropic, openai, gemini, minimax, mistral, deepseek;
  anything else falls into 'other'.
- providerChipClass(provider): tailwind colour tokens per provider.
  Kept low-saturation so the table reads as a list, not a rainbow.

Verified with 19 unit cases via node -e (no test framework needed
since the function is plain JS inside an x-data attribute).
On Python 3.14 the headroom-ai fork skips litellm (it pins
requires-python<3.14), so without a fallback CostTracker silently
returns None for every Claude / gpt-* / o-series model and the
dashboard shows $0 for non-MiniMax traffic.

Add _get_cache_prices_fallback() with a small hardcoded pricing
table covering:
  - Anthropic Claude 4.x (opus-4, sonnet-4/-4-5, haiku-4-5)
  - Anthropic Claude 3.x (3-5-sonnet, 3-opus, 3-haiku) — including
    the truncated-datestamp variant 'claude-3-5-sonnet-20' that
    exposed the original regex-group bug
  - OpenAI gpt-5/-4o/-4/-3.5 and o1/o3/o4 reasoning models

Cache economics are preserved per provider:
  - Anthropic: 90% off cache reads, 25% write premium
  - OpenAI:    50% off cache reads, no write premium
  - MiniMax:   90% off cache reads, 25% write premium (already in code)

Also fix an AttributeError in _retry_request: self.config.minimax_api_key
must use getattr() so SimpleNamespace test doubles (which only model a
subset of ProxyConfig fields) don't crash when the direct-MiniMax-API
auth branch runs. Add minimax_session_token to ProxyConfig for symmetry
with the existing minimax_api_key / minimax_api_url fields.

Tests: 18 new parametrized cases covering 8 Claude + 10 OpenAI variants;
all 33 cost-fallback tests + 20 provider tests + the regression-case
test_retry_request_retries_connect_timeout now pass.
@github-actions

Copy link
Copy Markdown
Contributor

PR governance

This PR does not yet satisfy the required template fields:

  • Missing required section Description.
  • Missing required section Type of Change.
  • Missing required section Changes Made.
  • Missing required section Testing.
  • Missing required section Real Behavior Proof.
  • Missing required section Review Readiness.
  • Check I have performed a self-review before requesting human review.
  • Check This PR is ready for human review or convert the PR back to draft.

Please update the PR body, or move the PR back to draft while it is still in progress.

@github-actions github-actions Bot added the status: needs author action Pull request body or readiness checklist still needs author updates label Jun 24, 2026
…idle

Two UX bugs in the dashboard Session/Historical tabs:

1. Historical tab showed 'waiting for saved requests' until the user
   clicked it. fetchHistoryStats() was only called inside pollDashboard()
   when viewMode === 'history', and never inside init(). The first
   navigation to the tab had nothing to render. Now init() also calls
   fetchHistoryStats() so the tab is populated as soon as the page
   loads — /stats-history is a small JSON payload (500 most-recent
   points + 4 daily/weekly/monthly series + lifetime summary) and is
   cheap to load on boot.

2. Session tab hero metrics collapsed to $0 and '0 requests processed'
   whenever the proxy had no traffic in the current polling window,
   even though ~/.headroom/proxy_savings.json had 83M tokens saved
   across the lifetime of the install. Added three Alpine getters:
     - displayedTokensSaved: runtime > display_session (if fresh) > lifetime
     - displayedRequests:    runtime > display_session (if fresh) > lifetime
     - displayedTokensSavedSource: 'runtime' | 'session' | 'lifetime'
   The Token Savings hero now uses displayedTokensSaved and exposes
   the source via :title so it's transparent which tier is being
   displayed. A session counts as 'fresh' if last_activity_at is
   within the last 5 minutes — past that the lifetime total is the
   correct headline number.

No backend changes. No new endpoints. No new tests required (UI-only).

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a substantial integration and the shape is promising, but I found blockers that need another pass:

  • The focused test suite fails locally. I ran uv run --extra proxy --with pytest --with pytest-asyncio python -m pytest tests/test_provider_minimax.py tests/test_minimax_cost_fallbacks.py -q and got 7 failures in tests/test_minimax_cost_fallbacks.py. The failures are pricing/cache-economics assertions for claude-haiku, gpt-5, gpt-5-mini/nano, gpt-4-turbo, and gpt-3.5 paths, so the fallback pricing behavior is not internally consistent with the tests in this branch.
  • The new minimax_api_url plumbing appears unused for the actual /v1/messages route. ProviderApiTargets/ProxyConfig gain minimax fields, but MiniMaxHandlerMixin.handle_minimax_messages intentionally does not pass an upstream_base_url and delegates to AnthropicHandlerMixin so it falls back to self.ANTHROPIC_API_URL. That means setting MINIMAX_TARGET_API_URL or ProxyConfig.minimax_api_url will not route direct MiniMax traffic to the MiniMax endpoint; unless the operator also repoints ANTHROPIC_TARGET_API_URL, MiniMax-looking requests can still go to the Anthropic upstream. Please either wire the MiniMax target into the handler or remove the unused config surface and document the required Anthropic-target setup.

Given the breadth of this PR, please also add a focused routing test that proves a minimax/MiniMax-M3 request forwards to the configured MiniMax upstream with the stripped model name and expected auth header.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: needs author action Pull request body or readiness checklist still needs author updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants