-
Notifications
You must be signed in to change notification settings - Fork 10
feat(plugin-openai): add rate limiter, improve embeddings, and update… #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
odilitime
wants to merge
32
commits into
1.x
Choose a base branch
from
odi-dev
base: 1.x
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
32 commits
Select commit
Hold shift + click to select a range
7542c67
feat(plugin-openai): add rate limiter, improve embeddings, and update…
odilitime 913e2e3
utils: fix the lockfile
odilitime 90ab886
utils: fix the lockfile
odilitime 052007c
utils: fix the lockfile
odilitime 1a60bad
utils: fix the lockfile
odilitime 632eb93
utils: fix the lockfile
odilitime eb569e3
utils: fix the lockfile
odilitime 1131992
utils: fix the lockfile
odilitime e3b6dac
utils: fix the lockfile
odilitime b6de97e
utils: fix the lockfile
odilitime 5e83b06
utils: fix the lockfile
odilitime 12d397c
utils: fix the lockfile
odilitime cbf84d1
utils: fix in cursor
odilitime 1ef8bab
banner.ts: fix string
odilitime a4bd589
models: fix empty
odilitime 386f55c
models: fix in cursor
odilitime 546ea10
utils: simplify implementation
odilitime 6e91d47
utils: fix in cursor
odilitime bd7defc
banner.ts: fix it if
odilitime c7bd8fa
banner.ts: simplify implementation
odilitime ec07a45
banner.ts: fix string
odilitime 3b4ffd0
banner.ts: simplify implementation
odilitime 9c0cec7
banner.ts: simplify implementation
odilitime 059337b
banner.ts: simplify implementation
odilitime 3d76937
banner.ts: fix string
odilitime f9ffeff
banner.ts: simplify implementation
odilitime 0ca08b9
banner.ts: simplify implementation
odilitime dbb543d
banner.ts: fix string
odilitime 49c3aba
banner.ts: simplify implementation
odilitime 1e105f1
banner.ts: fix string
odilitime 786bd25
utils: simplify implementation
odilitime 5e06d42
utils: fix in cursor
odilitime File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| # PRR Lessons Learned | ||
|
|
||
| > This file is auto-generated by [prr](https://github.com/elizaOS/prr). | ||
| > It contains lessons learned from PR review fixes to help improve future fix attempts. | ||
| > You can edit this file manually or let prr update it. | ||
| > To share lessons across your team, commit this file to your repo. | ||
|
|
||
| ## File-Specific Lessons | ||
|
|
||
| ### src/index.ts | ||
|
|
||
| - Fix for src/index.ts:349 - tool modified wrong files (src/banner.ts), need to modify src/index.ts | ||
| - Fix for src/index.ts:349 - Looking at the code, I can see that the streaming test at line 333-373 **already has stream: true set** at line 345. | ||
|
|
||
| ### src/utils/rate-limiter.ts | ||
|
|
||
| - Fix for src/utils/rate-limiter.ts - The current code at lines 396-401 already has a proper fix that checks for NaN: typescript | ||
| - Fix for src/utils/rate-limiter.ts - RESULT: ALREADY_FIXED — The code at lines 396-401 already includes a proper NaN check: retrySeconds ! | ||
| - Fix for src/utils/rate-limiter.ts - The review comment mentions that parseFloat on retry-after can return NaN and the || undefined fallback never triggers, but the current code at lines 396-401 already has a proper fix: typescript | ||
| - Fix for src/utils/rate-limiter.ts - I can see that the code at lines 296-306 already has a proper fix that checks for NaN and handles zero values correctly: typescript | ||
| - Fix for src/utils/rate-limiter.ts:248 - tool modified wrong files (src/banner.ts, src/models/embedding.ts), need to modify src/utils/rate-limiter.ts | ||
| - Fix for src/utils/rate-limiter.ts:248 - The cleanup condition must account for map updates during concurrent calls—identity checks fail when other acquire() calls mutate the map before cleanup runs. | ||
| - Fix for src/utils/rate-limiter.ts:248 - The cleanup condition compares a stale promise reference against a freshly-created derived promise — they're guaranteed different objects on every subsequent call. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| # Changelog | ||
|
|
||
| All notable changes to `@elizaos/plugin-openai` are documented in this file. | ||
| Format based on [Keep a Changelog](https://keepachangelog.com/). Newest entries first. | ||
|
|
||
| ## [Unreleased] | ||
|
|
||
| ### Added | ||
|
|
||
| - **Billing 429 detection and fail-fast behavior** (`src/utils/rate-limiter.ts`) | ||
| - New `QuotaExceededError` class for permanent billing failures (distinct from transient `RateLimitError`). | ||
| - `extractRateLimitInfo()` now returns `isBillingError: boolean` to distinguish quota exhaustion from rate limiting. | ||
| - Billing errors detected via OpenAI's `error.code === "insufficient_quota"` (most reliable) with keyword fallback ("quota", "billing"). | ||
| - `withRateLimit()` now fails immediately on billing 429s instead of wasting 30+ seconds on 5 retries. | ||
| - **WHY**: OpenAI uses HTTP 429 for both "wait a minute" (rate limit) and "add credits" (quota). Before this change, quota errors would retry pointlessly, wasting time and filling logs. Now they fail instantly with a clear billing URL. | ||
|
|
||
| - **Enhanced `throwIfRateLimited` for body inspection** (`src/utils/rate-limiter.ts`) | ||
| - Changed from synchronous to async to read 429 response bodies. | ||
| - Uses `response.clone().json()` to inspect error structure without consuming the original response. | ||
| - Throws `QuotaExceededError` for billing 429s, `RateLimitError` for rate-limit 429s. | ||
| - Updated all 5 call sites (`embedding.ts`, `image.ts` ×2, `audio.ts` ×2) to `await throwIfRateLimited(response)`. | ||
| - **WHY**: Raw-fetch handlers (embeddings, images, audio) need to distinguish billing from rate-limit 429s. Reading the body is the only reliable way to detect OpenAI's quota exhaustion errors. | ||
|
|
||
| - **Startup configuration banner** (`src/banner.ts`, `src/init.ts`) | ||
| - Displays compact config table on initialization showing API key (masked), base URL, models, and their status (set/default). | ||
| - Always includes direct link to OpenAI billing dashboard. | ||
| - **WHY**: Configuration issues are the #1 source of runtime errors. The banner makes misconfigurations immediately obvious (e.g., missing key, wrong base URL, unexpected model) without requiring users to dig through env vars or character settings. | ||
|
|
||
| - **Automatic tier detection** (`src/utils/rate-limiter.ts`, model handlers) | ||
| - New `logTierOnce(response)` function extracts RPM/TPM limits from `x-ratelimit-limit-*` headers. | ||
| - Logs account tier info after first successful API call (one-shot, zero cost). | ||
| - Integrated into all raw-fetch handlers (`embedding.ts`, `image.ts`, `audio.ts`). | ||
| - Silently skips if headers missing (Azure, Ollama, etc.). | ||
| - **WHY**: Users often don't know their OpenAI tier, which determines quota limits. Tier doesn't change during runtime, so logging it once from a response we're already processing has zero cost and helps users understand their limits. | ||
|
|
||
| - **Daemon-based rate limiting** (`src/utils/rate-limiter.ts`) | ||
| - Process-level singleton that persists across agent reinitializations within the same Node process. | ||
| - Per-category sliding-window RPM tracking (embeddings, chat, images, audio) — mirrors how OpenAI actually measures rate limits. | ||
| - Exponential backoff with jitter on 429 errors, using the `Retry-After` header when available for optimal timing. | ||
| - `withRateLimit(category, fn)` wrapper for transparent retry on rate-limited API calls. | ||
| - `acquireRateLimit(category)` for throttling without retry (used for streaming where transparent retry isn't possible). | ||
| - `throwIfRateLimited(response)` for raw-fetch handlers to convert 429 responses into typed `RateLimitError` before generic error handling. | ||
| - Configurable via `OPENAI_RATE_LIMIT_RPM` (global RPM override) and `OPENAI_RATE_LIMIT_MAX_RETRIES` (retry count override). | ||
|
|
||
| - **Forward/backward compatibility shims** (`src/types/index.ts`) | ||
| - `StreamingTextParams` — extends `GenerateTextParams` with `stream` and `onStreamChunk` for older `@elizaos/core` versions that don't export these. | ||
| - `TextStreamResult` — local definition of the streaming result type for older cores. | ||
| - These shims are structurally identical to the newer core types, so they become redundant (but harmless) when the core is upgraded. | ||
|
|
||
| - **Synthetic embedding fast-paths** (`src/models/embedding.ts`) | ||
| - Returns synthetic vectors (no API call) for `null` params, empty text, and the `"test"` string. | ||
| - The `"test"` fast-path is specifically for the core runtime's `ensureEmbeddingDimension()` probe, which sends `{ text: "test" }` at startup to discover vector length. Since we already know the dimension from `OPENAI_EMBEDDING_DIMENSIONS`, the API call was wasteful and consumed rate-limit budget at the worst time (startup, when other initialization calls may be in-flight). | ||
|
|
||
| ### Changed | ||
|
|
||
| - **Non-blocking plugin initialization** (`src/init.ts`) | ||
| - Removed the eager `GET /models` API key validation fetch that ran during `init()`. | ||
| - **Why**: The validation raced with the core's `ensureEmbeddingDimension()` embedding probe for the same rate-limit budget. The validation would succeed (consuming a slot), then the embedding probe would get 429'd — triggering up to 5 retries with exponential backoff, adding 30+ seconds to startup. A bad API key surfaces on the first real model call anyway, so the eager check provided no actionable value. | ||
| - Now performs synchronous config presence checks only. | ||
|
|
||
| - **DRY model registration** (`src/index.ts`) | ||
| - Replaced 11 redundant lambda wrappers (`async (runtime, params) => handler(runtime, params)`) with direct function references (`handleTextEmbedding`, `handleTextSmall`, etc.). Eliminated ~70 lines of boilerplate with zero behavioral change. | ||
| - Applied `as unknown as NonNullable<Plugin["models"]>` type assertion to work around a TypeScript contravariance issue (TS2418) in older `@elizaos/core` versions where the `Plugin.models` type has an incompatible intersection with a string index signature. | ||
|
|
||
| - **Rate-limited model handlers** | ||
| - `embedding.ts` — wrapped in `withRateLimit("embeddings", ...)` with `throwIfRateLimited` for 429 detection. | ||
| - `text.ts` — `acquireRateLimit("chat")` for streaming, `withRateLimit("chat", ...)` for non-streaming. | ||
| - `object.ts` — wrapped in `withRateLimit("chat", ...)`. | ||
| - `image.ts` — generation wrapped in `withRateLimit("images", ...)`, description in `withRateLimit("chat", ...")`. | ||
| - `audio.ts` — both TTS and transcription wrapped in `withRateLimit("audio", ...)`. | ||
|
|
||
| ## [1.6.1] - 2025-01-01 | ||
|
|
||
| _Baseline version before rate limiting and compatibility changes. No changelog entries recorded for prior versions._ |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-generated PR review tool state committed to repo
Low Severity
The
.prr/lessons.mdfile contains auto-generated debugging output from theprrreview tool — fix attempt logs,RESULT: ALREADY_FIXEDmarkers, and stale line-number references. While the tool suggests committing it, the content is process-specific noise (e.g., "tool modified wrong files", references to now-outdated line numbers) that provides no value to repository readers and clutters the codebase.