GPU free-tier arbitrage routing policy with observable fallback #818

@Spherrrical

Plano already sits in the routing layer, but it does not use that position to reduce cost for developers automatically. We should build a first-class GPU free-tier arbitrage policy that routes low-stakes or bursty agent traffic to free or low-cost providers when they are available, falls back deterministically to the primary provider when they are unavailable or overloaded, and exposes every routing decision in traces.

Requirements

  • Configurable arbitrage policy at the model_providers level: specify a ranked list of free/low-cost providers with fallback ordering
  • Deterministic fallback: when a free-tier provider is unavailable, rate-limited, or returns an error, Plano falls back to the primary predictably
  • All routing decisions surfaced in traces: which provider was selected, why, when it fell back, and what the next selection was
  • Reliability guardrails: free-tier providers must not degrade silently; failures must be explicit and logged
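
To make the fallback and tracing requirements concrete, here is a minimal Python sketch of the intended behavior. It is not Plano code: `route`, `RoutingEvent`, `ProviderUnavailable`, and the `call` callback are hypothetical names chosen for illustration, and the real implementation would live in Plano's routing layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error: provider is down, rate-limited, or returned an error."""


@dataclass
class RoutingEvent:
    """One routing decision, as it might appear in a trace (assumed shape)."""
    provider: str
    outcome: str                      # "selected" or "failed"
    reason: str
    next_provider: Optional[str] = None


def route(
    prompt: str,
    free_tier: List[str],
    primary: str,
    call: Callable[[str, str], str],
) -> Tuple[str, List[RoutingEvent]]:
    """Try free-tier providers in rank order, then the primary.

    The order is fixed by the config, so the same failures always produce
    the same fallback chain. Every attempt is recorded in the trace.
    """
    chain = list(free_tier) + [primary]
    trace: List[RoutingEvent] = []
    for i, name in enumerate(chain):
        try:
            response = call(name, prompt)
        except ProviderUnavailable as exc:
            # Failures are never swallowed: record why and where we go next.
            nxt = chain[i + 1] if i + 1 < len(chain) else None
            trace.append(RoutingEvent(name, "failed", str(exc), nxt))
            continue
        reason = "primary fallback" if name == primary else "free-tier available"
        trace.append(RoutingEvent(name, "selected", reason))
        return response, trace
    # Even the primary failed: surface an explicit error with the attempted chain.
    attempted = ", ".join(event.provider for event in trace)
    raise RuntimeError(f"all providers failed: {attempted}")
```

Raising when the whole chain fails, rather than returning a degraded result, is one way to satisfy the "explicit and logged" guardrail; the actual error type and trace schema are open design questions.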

What "done" looks like

A developer can add a minimal config block to enable arbitrage, run a request, and see in the trace the provider selected, the reason (for example, free-tier available), and the fallback chain if applicable.
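
As a rough illustration of what a "minimal config block" could carry, the sketch below validates a hypothetical policy shape in Python. The keys `arbitrage`, `free_tier`, and `primary`, and the nesting under `model_providers`, are assumptions made for this example only; the actual schema is still to be designed.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class ArbitragePolicy:
    """Parsed policy: ranked free-tier providers plus the primary (assumed shape)."""
    free_tier: Tuple[str, ...]
    primary: str


def parse_policy(config: Mapping[str, Any]) -> ArbitragePolicy:
    """Validate a hypothetical arbitrage block and return an immutable policy."""
    block = config.get("model_providers", {}).get("arbitrage")
    if not isinstance(block, Mapping):
        raise ValueError("model_providers.arbitrage block is missing")

    primary = block.get("primary")
    if not isinstance(primary, str) or not primary:
        raise ValueError("arbitrage.primary must be a non-empty string")

    free_tier = block.get("free_tier", [])
    if not isinstance(free_tier, list) or not all(
        isinstance(name, str) and name for name in free_tier
    ):
        raise ValueError("arbitrage.free_tier must be a list of provider names")
    if len(set(free_tier)) != len(free_tier):
        # Duplicates would make the fallback order ambiguous.
        raise ValueError("arbitrage.free_tier contains duplicate providers")
    if primary in free_tier:
        raise ValueError("arbitrage.primary must not also appear in free_tier")

    # List order is the fallback order, so it is preserved as-is.
    return ArbitragePolicy(free_tier=tuple(free_tier), primary=primary)


# Example with placeholder provider names:
example = {
    "model_providers": {
        "arbitrage": {"free_tier": ["free-a", "free-b"], "primary": "paid-main"}
    }
}
policy = parse_policy(example)
```

Rejecting duplicates and a primary that also appears in the ranked list keeps the fallback chain unambiguous, which is what makes the routing deterministic.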
