Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
7542c67
feat(plugin-openai): add rate limiter, improve embeddings, and update…
odilitime Feb 14, 2026
913e2e3
utils: fix the lockfile
odilitime Feb 18, 2026
90ab886
utils: fix the lockfile
odilitime Feb 18, 2026
052007c
utils: fix the lockfile
odilitime Feb 18, 2026
1a60bad
utils: fix the lockfile
odilitime Feb 18, 2026
632eb93
utils: fix the lockfile
odilitime Feb 18, 2026
eb569e3
utils: fix the lockfile
odilitime Feb 18, 2026
1131992
utils: fix the lockfile
odilitime Feb 18, 2026
e3b6dac
utils: fix the lockfile
odilitime Feb 18, 2026
b6de97e
utils: fix the lockfile
odilitime Feb 18, 2026
5e83b06
utils: fix the lockfile
odilitime Feb 18, 2026
12d397c
utils: fix the lockfile
odilitime Feb 18, 2026
cbf84d1
utils: fix in cursor
odilitime Feb 20, 2026
1ef8bab
banner.ts: fix string
odilitime Feb 20, 2026
a4bd589
models: fix empty
odilitime Feb 20, 2026
386f55c
models: fix in cursor
odilitime Feb 20, 2026
546ea10
utils: simplify implementation
odilitime Feb 20, 2026
6e91d47
utils: fix in cursor
odilitime Feb 20, 2026
bd7defc
banner.ts: fix it if
odilitime Feb 20, 2026
c7bd8fa
banner.ts: simplify implementation
odilitime Feb 20, 2026
ec07a45
banner.ts: fix string
odilitime Feb 20, 2026
3b4ffd0
banner.ts: simplify implementation
odilitime Feb 20, 2026
9c0cec7
banner.ts: simplify implementation
odilitime Feb 20, 2026
059337b
banner.ts: simplify implementation
odilitime Feb 20, 2026
3d76937
banner.ts: fix string
odilitime Feb 20, 2026
f9ffeff
banner.ts: simplify implementation
odilitime Feb 20, 2026
0ca08b9
banner.ts: simplify implementation
odilitime Feb 20, 2026
dbb543d
banner.ts: fix string
odilitime Feb 20, 2026
49c3aba
banner.ts: simplify implementation
odilitime Feb 20, 2026
1e105f1
banner.ts: fix string
odilitime Feb 20, 2026
786bd25
utils: simplify implementation
odilitime Feb 20, 2026
5e06d42
utils: fix in cursor
odilitime Feb 20, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,7 @@ node_modules
.turbo
dist
.env
.elizadb-test
.elizadb-test

# prr state file (auto-generated)
.pr-resolver-state.json
23 changes: 23 additions & 0 deletions .prr/lessons.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# PRR Lessons Learned

> This file is auto-generated by [prr](https://github.com/elizaOS/prr).
> It contains lessons learned from PR review fixes to help improve future fix attempts.
> You can edit this file manually or let prr update it.
> To share lessons across your team, commit this file to your repo.

## File-Specific Lessons

### src/index.ts

- Fix for src/index.ts:349 - tool modified wrong files (src/banner.ts), need to modify src/index.ts
- Fix for src/index.ts:349 - Looking at the code, I can see that the streaming test at line 333-373 **already has stream: true set** at line 345.

### src/utils/rate-limiter.ts

- Fix for src/utils/rate-limiter.ts - The current code at lines 396-401 already has a proper fix that checks for NaN: typescript
- Fix for src/utils/rate-limiter.ts - RESULT: ALREADY_FIXED — The code at lines 396-401 already includes a proper NaN check: retrySeconds !
- Fix for src/utils/rate-limiter.ts - The review comment mentions that parseFloat on retry-after can return NaN and the || undefined fallback never triggers, but the current code at lines 396-401 already has a proper fix: typescript
- Fix for src/utils/rate-limiter.ts - I can see that the code at lines 296-306 already has a proper fix that checks for NaN and handles zero values correctly: typescript
- Fix for src/utils/rate-limiter.ts:248 - tool modified wrong files (src/banner.ts, src/models/embedding.ts), need to modify src/utils/rate-limiter.ts
- Fix for src/utils/rate-limiter.ts:248 - The cleanup condition must account for map updates during concurrent calls—identity checks fail when other acquire() calls mutate the map before cleanup runs.
- Fix for src/utils/rate-limiter.ts:248 - The cleanup condition compares a stale promise reference against a freshly-created derived promise — they're guaranteed different objects on every subsequent call.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-generated PR review tool state committed to repo

Low Severity

The .prr/lessons.md file contains auto-generated debugging output from the prr review tool — fix attempt logs, RESULT: ALREADY_FIXED markers, and stale line-number references. While the tool suggests committing it, the content is process-specific noise (e.g., "tool modified wrong files", references to now-outdated line numbers) that provides no value to repository readers and clutters the codebase.

Fix in Cursor Fix in Web

74 changes: 74 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# Changelog

All notable changes to `@elizaos/plugin-openai` are documented in this file.
Format based on [Keep a Changelog](https://keepachangelog.com/). Newest entries first.

## [Unreleased]

### Added

- **Billing 429 detection and fail-fast behavior** (`src/utils/rate-limiter.ts`)
- New `QuotaExceededError` class for permanent billing failures (distinct from transient `RateLimitError`).
- `extractRateLimitInfo()` now returns `isBillingError: boolean` to distinguish quota exhaustion from rate limiting.
- Billing errors detected via OpenAI's `error.code === "insufficient_quota"` (most reliable) with keyword fallback ("quota", "billing").
- `withRateLimit()` now fails immediately on billing 429s instead of wasting 30+ seconds on 5 retries.
- **WHY**: OpenAI uses HTTP 429 for both "wait a minute" (rate limit) and "add credits" (quota). Before this change, quota errors would retry pointlessly, wasting time and filling logs. Now they fail instantly with a clear billing URL.

- **Enhanced `throwIfRateLimited` for body inspection** (`src/utils/rate-limiter.ts`)
- Changed from synchronous to async to read 429 response bodies.
- Uses `response.clone().json()` to inspect error structure without consuming the original response.
- Throws `QuotaExceededError` for billing 429s, `RateLimitError` for rate-limit 429s.
- Updated all 5 call sites (`embedding.ts`, `image.ts` ×2, `audio.ts` ×2) to `await throwIfRateLimited(response)`.
- **WHY**: Raw-fetch handlers (embeddings, images, audio) need to distinguish billing from rate-limit 429s. Reading the body is the only reliable way to detect OpenAI's quota exhaustion errors.

- **Startup configuration banner** (`src/banner.ts`, `src/init.ts`)
- Displays compact config table on initialization showing API key (masked), base URL, models, and their status (set/default).
- Always includes direct link to OpenAI billing dashboard.
- **WHY**: Configuration issues are the #1 source of runtime errors. The banner makes misconfigurations immediately obvious (e.g., missing key, wrong base URL, unexpected model) without requiring users to dig through env vars or character settings.

- **Automatic tier detection** (`src/utils/rate-limiter.ts`, model handlers)
- New `logTierOnce(response)` function extracts RPM/TPM limits from `x-ratelimit-limit-*` headers.
- Logs account tier info after first successful API call (one-shot, zero cost).
- Integrated into all raw-fetch handlers (`embedding.ts`, `image.ts`, `audio.ts`).
- Silently skips if headers missing (Azure, Ollama, etc.).
- **WHY**: Users often don't know their OpenAI tier, which determines quota limits. Tier doesn't change during runtime, so logging it once from a response we're already processing has zero cost and helps users understand their limits.

- **Daemon-based rate limiting** (`src/utils/rate-limiter.ts`)
- Process-level singleton that persists across agent reinitializations within the same Node process.
- Per-category sliding-window RPM tracking (embeddings, chat, images, audio) — mirrors how OpenAI actually measures rate limits.
- Exponential backoff with jitter on 429 errors, using the `Retry-After` header when available for optimal timing.
- `withRateLimit(category, fn)` wrapper for transparent retry on rate-limited API calls.
- `acquireRateLimit(category)` for throttling without retry (used for streaming where transparent retry isn't possible).
- `throwIfRateLimited(response)` for raw-fetch handlers to convert 429 responses into typed `RateLimitError` before generic error handling.
- Configurable via `OPENAI_RATE_LIMIT_RPM` (global RPM override) and `OPENAI_RATE_LIMIT_MAX_RETRIES` (retry count override).

- **Forward/backward compatibility shims** (`src/types/index.ts`)
- `StreamingTextParams` — extends `GenerateTextParams` with `stream` and `onStreamChunk` for older `@elizaos/core` versions that don't export these.
- `TextStreamResult` — local definition of the streaming result type for older cores.
- These shims are structurally identical to the newer core types, so they become redundant (but harmless) when the core is upgraded.

- **Synthetic embedding fast-paths** (`src/models/embedding.ts`)
- Returns synthetic vectors (no API call) for `null` params, empty text, and the `"test"` string.
- The `"test"` fast-path is specifically for the core runtime's `ensureEmbeddingDimension()` probe, which sends `{ text: "test" }` at startup to discover vector length. Since we already know the dimension from `OPENAI_EMBEDDING_DIMENSIONS`, the API call was wasteful and consumed rate-limit budget at the worst time (startup, when other initialization calls may be in-flight).

### Changed

- **Non-blocking plugin initialization** (`src/init.ts`)
- Removed the eager `GET /models` API key validation fetch that ran during `init()`.
- **Why**: The validation raced with the core's `ensureEmbeddingDimension()` embedding probe for the same rate-limit budget. The validation would succeed (consuming a slot), then the embedding probe would get 429'd — triggering up to 5 retries with exponential backoff, adding 30+ seconds to startup. A bad API key surfaces on the first real model call anyway, so the eager check provided no actionable value.
- Now performs synchronous config presence checks only.

- **DRY model registration** (`src/index.ts`)
- Replaced 11 redundant lambda wrappers (`async (runtime, params) => handler(runtime, params)`) with direct function references (`handleTextEmbedding`, `handleTextSmall`, etc.). Eliminated ~70 lines of boilerplate with zero behavioral change.
- Applied `as unknown as NonNullable<Plugin["models"]>` type assertion to work around a TypeScript contravariance issue (TS2418) in older `@elizaos/core` versions where the `Plugin.models` type has an incompatible intersection with a string index signature.

- **Rate-limited model handlers**
- `embedding.ts` — wrapped in `withRateLimit("embeddings", ...)` with `throwIfRateLimited` for 429 detection.
- `text.ts` — `acquireRateLimit("chat")` for streaming, `withRateLimit("chat", ...)` for non-streaming.
- `object.ts` — wrapped in `withRateLimit("chat", ...)`.
- `image.ts` — generation wrapped in `withRateLimit("images", ...)`, description in `withRateLimit("chat", ...")`.
- `audio.ts` — both TTS and transcription wrapped in `withRateLimit("audio", ...)`.

## [1.6.1] - 2025-01-01

_Baseline version before rate limiting and compatibility changes. No changelog entries recorded for prior versions._
Loading