feat: per-key daily rate limits with whitelist tier #34

Open: wms2537 wants to merge 4 commits into main from feat/rate-limits

wms2537 commented May 1, 2026

Summary

Adds enforced per-key daily rate limits across all paid endpoints (chat, REST tools, MCP), with a 3-tier system (free / pro / unlimited) configurable per-key from the admin dashboard. Counters reset at UTC midnight.

Closes the gap where the README claimed "100 calls/day" but no code enforced it.

Tiers

| Tier | Chat / day | Tool / day | Worst-case spend per key |
| --- | --- | --- | --- |
| free (default) | 30 | 100 | ~$1.50 |
| pro | 300 | 1000 | ~$15 |
| unlimited | 10000 | 10000 | uncapped-ish |

Numbers tuned against openai/gpt-oss-120b pricing. Tiers live in code (apps/web/src/lib/rateLimit.ts) so they can be tuned without a migration; the schema only stores the tier name.
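As a rough illustration of what "tiers live in code" could look like, here is a minimal sketch of a tier table and lookup. The names `TIER_LIMITS` and `limitFor` are hypothetical, not taken from `rateLimit.ts`, and the fallback to `free` for unrecognized tier names is an assumption that mirrors the column default.

```typescript
// Hypothetical sketch; the actual table lives in apps/web/src/lib/rateLimit.ts.
type Tier = "free" | "pro" | "unlimited";
type Category = "chat" | "tool";

interface TierLimits {
  chatPerDay: number;
  toolPerDay: number;
}

// Values copied from the Tiers table in this description.
const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { chatPerDay: 30, toolPerDay: 100 },
  pro: { chatPerDay: 300, toolPerDay: 1000 },
  unlimited: { chatPerDay: 10000, toolPerDay: 10000 },
};

// Assumption: unknown or missing tier names fall back to "free".
function limitFor(tierName: string | null | undefined, category: Category): number {
  const tier: Tier =
    tierName === "pro" || tierName === "unlimited" ? tierName : "free";
  const limits = TIER_LIMITS[tier];
  return category === "chat" ? limits.chatPerDay : limits.toolPerDay;
}
```

Because the schema stores only the tier name, changing a number in a table like this would take effect on deploy without touching the database.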

Storage

KV counters at rl:{subject}:{category}:{YYYY-MM-DD} with 48h TTL. Subject is key:{keyId} for API-key auth and user:{userId} for session auth (playground). Admin Bearer auth bypasses entirely.

KV is eventually consistent across regions, so counters can overshoot slightly under bursty traffic. That is acceptable for a per-day quota.
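The key scheme and daily reset can be sketched as follows. This is a minimal illustration under stated assumptions: `CounterStore` and `MemoryStore` are hypothetical stand-ins for the KV binding, and `counterKey`, `secondsUntilUtcMidnight`, and `incrementDaily` are illustrative names rather than the functions in this PR.

```typescript
// Hypothetical stand-in for a KV namespace (get/put with an optional TTL).
interface CounterStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// In-memory store for local experiments; it ignores the TTL.
class MemoryStore implements CounterStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

const DAY_TTL_SECONDS = 48 * 60 * 60; // 48h TTL, as described above

// rl:{subject}:{category}:{YYYY-MM-DD}, with the date taken in UTC.
function counterKey(subject: string, category: string, now: Date): string {
  const day = now.toISOString().slice(0, 10);
  return `rl:${subject}:${category}:${day}`;
}

// Seconds until the next UTC midnight, when a new day key starts at zero.
function secondsUntilUtcMidnight(now: Date): number {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((next - now.getTime()) / 1000);
}

// Read-then-write increment. It is not atomic, which is one source of the
// small overshoot mentioned above.
async function incrementDaily(
  store: CounterStore,
  subject: string,
  category: string,
  now: Date,
): Promise<number> {
  const key = counterKey(subject, category, now);
  const current = Number((await store.get(key)) ?? "0");
  const next = current + 1;
  await store.put(key, String(next), { expirationTtl: DAY_TTL_SECONDS });
  return next;
}
```

In this sketch the "reset" is implicit: after UTC midnight the date component changes, so requests read a fresh key, and the old key simply expires via its TTL.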

Enforcement points

  • POST /api/v1/chat/completions — 429 returns OpenAI-shape {error:{type:"rate_limit_exceeded"}}
  • All 18 POST /api/v1/tools/* routes — 429 returns {error, type, limit, used, resetSeconds, tier}
  • POST /mcp tools/call — 429 returns JSON-RPC error code -32002

Every response (200 and 429) carries X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-RateLimit-Tier. 429s additionally carry Retry-After: <seconds>.
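For reference, the headers and the three 429 shapes listed above might be assembled along these lines. This is a sketch, not the code in this PR: the `Decision` type and helper names are hypothetical, the `message` strings and the tool-route `type` value are assumptions, and `X-RateLimit-Reset` is assumed here to be seconds until reset (the description does not say whether it is a delta or a timestamp).

```typescript
// Hypothetical result of a rate-limit check.
interface Decision {
  allowed: boolean;
  limit: number;
  used: number; // calls counted so far today, including this one if allowed
  resetSeconds: number;
  tier: string;
}

// Headers attached to every response; Retry-After only on denials.
function rateLimitHeaders(d: Decision): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(d.limit),
    "X-RateLimit-Remaining": String(Math.max(0, d.limit - d.used)),
    "X-RateLimit-Reset": String(d.resetSeconds), // assumption: seconds, not epoch
    "X-RateLimit-Tier": d.tier,
  };
  if (!d.allowed) headers["Retry-After"] = String(d.resetSeconds);
  return headers;
}

const MESSAGE = "Daily rate limit exceeded"; // assumed wording

// Chat: OpenAI-shaped error object.
function chat429Body(d: Decision) {
  return { error: { type: "rate_limit_exceeded", message: `${MESSAGE} (${d.tier} tier)` } };
}

// Tool routes: flat JSON with the fields named in the description.
function tool429Body(d: Decision) {
  return {
    error: MESSAGE,
    type: "rate_limit_exceeded", // assumed value
    limit: d.limit,
    used: d.used,
    resetSeconds: d.resetSeconds,
    tier: d.tier,
  };
}

// MCP: JSON-RPC 2.0 error response with code -32002.
function mcp429Body(id: number | string | null, d: Decision) {
  return {
    jsonrpc: "2.0",
    id,
    error: { code: -32002, message: `${MESSAGE} (${d.tier} tier)` },
  };
}
```

With a free-tier key after its first chat call (`limit: 30`, `used: 1`), this yields `X-RateLimit-Remaining: 29`, matching the live verification under Deployment status.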

What's new

  • Migration apps/web/migrations/0005_rate_limits.sql: ALTER TABLE api_keys ADD COLUMN rate_limit_tier TEXT NOT NULL DEFAULT 'free'
  • Library apps/web/src/lib/rateLimit.ts: enforceRateLimit(), checkToolRateLimit(), subjectFor(), tier table
  • Auth validateRequest.ts returns rateLimitTier for API-key auth; MCP validateApiKey() likewise
  • Admin API apps/web/src/app/api/admin/rate-limits/route.ts: GET (list keys + 24h call counts), PATCH (bump tier). Gated by X-Admin-Secret
  • Admin UI /dashboard/admin adds a "Rate Limits" tab with a searchable/tier-filterable table and per-key dropdown. Reuses the existing AUTH_SECRET flow
  • User UI /dashboard/keys shows a tier badge on each key (read-only)
  • Docs docs/api/chat-completions.md rate-limits section, README.md, CLAUDE.md

Deployment status

  • Migration 0005_rate_limits.sql already applied to remote D1 arbbuilder
  • Worker deployed (version 20f85e74-ea76-4e91-98ef-0ecea741b079)
  • Verified live with a real arb_ key:
    • Chat 200 → X-RateLimit-Limit: 30, Remaining: 29, Tier: free
    • Tool 200 → X-RateLimit-Limit: 100, Remaining: 99, Tier: free

Test plan

  • cd apps/web && npm test → 20/20 pass
  • npx tsc --noEmit clean
  • Hit chat endpoint with valid key → 200, headers present, counter increments per call
  • Hit any /api/v1/tools/* → 200, headers present, counter increments
  • Bump a key to pro from /dashboard/admin (Rate Limits tab) → next call shows X-RateLimit-Limit: 300
  • Confirm /dashboard/keys shows a tier badge

🤖 Generated with Claude Code

wms2537 and others added 4 commits May 2, 2026 00:37
Adds tier-based daily quotas to /api/v1/chat/completions, every
/api/v1/tools/* route, and /mcp tools/call. Counters live in KV
(rl:{subject}:{category}:{YYYY-MM-DD}, 48h TTL); admin Bearer auth
bypasses; session auth counts per user under the free tier.

Tiers (code-defined, tunable without migration):
  free       -> 30 chat / 100 tool per day
  pro        -> 300 chat / 1000 tool per day
  unlimited  -> 10K / 10K (effectively uncapped)

Schema: 0005_rate_limits.sql adds api_keys.rate_limit_tier (default 'free').

Admin UX: new "Rate Limits" tab on /dashboard/admin lists all keys with
24h call counts and a tier dropdown. Backed by GET/PATCH /api/admin/rate-limits
(X-Admin-Secret). User-facing: /dashboard/keys shows a tier badge per key.

Headers on every response: X-RateLimit-Limit, -Remaining, -Reset, -Tier.
429 also carries Retry-After. Chat returns OpenAI-shape
{ error: { type: "rate_limit_exceeded" }}; tool routes return JSON 429;
MCP returns JSON-RPC error code -32002.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…total)

Both windows must allow a request; whichever is exhausted first triggers 429.
Per-minute catches abuse bursts; per-day caps total cost.

  free       -> 100/min, 1000/day
  pro        -> 500/min, 10K/day
  unlimited  -> 10K/min, 1M/day  (effectively uncapped)

KV keys:
  rl:{subject}:{category}:m:{YYYY-MM-DDTHH:MM}  TTL 120s
  rl:{subject}:{category}:d:{YYYY-MM-DD}        TTL 48h

Headers expose both windows (X-RateLimit-Limit-Minute / -Day) plus the
canonical bottleneck triplet for clients that only check the standard
names. Retry-After uses the denying window. Error messages name the
window that denied.
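A minimal sketch of the two-window decision described in this commit, under stated assumptions: `WindowState`, `Verdict`, and `evaluateWindows` are hypothetical names, the "bottleneck" is taken to be the window with the fewest remaining calls, and when both windows are exhausted the one with the longer wait is treated as the denying window.

```typescript
type WindowName = "minute" | "day";

interface WindowState {
  name: WindowName;
  limit: number;
  used: number; // calls already counted in this window, before this request
  resetSeconds: number;
}

interface Verdict {
  allowed: boolean;
  bottleneck: WindowState; // window reported under the standard header names
  retryAfter?: number;
  message?: string;
}

function evaluateWindows(minute: WindowState, day: WindowState): Verdict {
  const windows = [minute, day];
  const exhausted = windows.filter((w) => w.used >= w.limit);

  if (exhausted.length > 0) {
    // Assumption: if both windows deny, report the longer wait so that a
    // client honoring Retry-After is not denied again immediately.
    const denying = exhausted.reduce((a, b) => (b.resetSeconds > a.resetSeconds ? b : a));
    return {
      allowed: false,
      bottleneck: denying,
      retryAfter: denying.resetSeconds,
      message: `Rate limit exceeded for the per-${denying.name} window`,
    };
  }

  // Both windows allow the request; the bottleneck is the tighter one.
  const bottleneck = windows.reduce((a, b) =>
    b.limit - b.used < a.limit - a.used ? b : a,
  );
  return { allowed: true, bottleneck };
}
```

For example, a free-tier key at 100/100 for the minute and 500/1000 for the day is denied by the minute window, and `Retry-After` carries that window's reset time rather than the time until UTC midnight.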

Tier names and DB schema unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Endpoint reports current chat + tool counters (minute and day windows) plus
24h activity summary from usage_logs. Does not increment any counter, so
clients can poll it freely to plan around the limits.

For admin Bearer auth, returns tier='unlimited' with empty counters since
admin requests bypass enforcement entirely. Session auth gets per-window
counters but no recent summary (no key_id to filter usage_logs on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GET /api/keys/usage returns counters and 24h activity for every active key
the session user owns; /dashboard/keys polls it every 15s and renders a
two-row bar widget per key (chat + tool, minute + day windows) with
24h call count and success rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>