diff --git a/.claude/skills/api-validator/SKILL.md b/.claude/skills/api-validator/SKILL.md new file mode 100644 index 0000000000..2587113e07 --- /dev/null +++ b/.claude/skills/api-validator/SKILL.md @@ -0,0 +1,676 @@ +--- +name: api-validator +description: Scan Bifrost HTTP controllers/handlers/integrations and validate OpenAPI API coverage, route methods/paths, parameters, request/response docs, and auth/security information. Use when asked to audit missing or incorrect APIs, compare controllers against docs/openapi, validate auth information, or fix API documentation drift. Invoked with /api-validator [scope] [--fix]. +allowed-tools: Read, Grep, Glob, Bash, Edit, Write, Task, AskUserQuestion, TodoWrite +--- + +# API Validator + +Audit Bifrost's HTTP API surface by scanning controller/handler route registrations, deriving the actual authentication behavior from middleware wiring, and comparing the result with `docs/openapi/openapi.yaml` and referenced OpenAPI path files. + +Use this skill when the user asks for any of the following: +- "missing APIs" +- "incorrect APIs" +- "validate APIs" +- "check OpenAPI docs against controllers" +- "include auth information" +- "fix API docs" +- "audit route coverage" + +Default behavior is **audit only**. Do not edit files unless the user explicitly asks for fixes or approves a proposed fix plan. + +--- + +## Usage + +```bash +/api-validator # Full audit: all handlers + OpenAPI +/api-validator management # Audit only /api/*, /health, /ws, /metrics +/api-validator inference # Audit /v1/* and provider integration APIs +/api-validator auth # Focus on effective OpenAPI security vs middleware auth +/api-validator # Audit one prefix, e.g. /api/governance or /openai +/api-validator --fix # Audit, present plan, then fix after approval +``` + +If scope is unclear, ask: + +```text +Should I audit all APIs, only management APIs (/api/*), only inference/integration APIs (/v1/* and provider prefixes), or a specific path prefix? +``` + +--- + +## Source of truth + +### Code route sources + +Scan these files/directories as the controller source of truth: + +| Area | Source | Notes | +|---|---|---| +| Server route wiring | `transports/bifrost-http/server/server.go` | `RegisterAPIRoutes`, `RegisterInferenceRoutes`, `RegisterUIRoutes`, direct `/metrics`, middleware lists | +| HTTP handlers/controllers | `transports/bifrost-http/handlers/*.go` | `RegisterRoutes` methods contain direct route registrations | +| SDK/provider integrations | `transports/bifrost-http/integrations/*.go` | `RouteConfig` factories and `GenericRouter.RegisterRoutes` register OpenAI/Anthropic/GenAI/Bedrock/Cohere/LiteLLM/LangChain/PydanticAI/Cursor/Passthrough routes | +| Auth middleware | `transports/bifrost-http/handlers/middlewares.go` | `APIMiddleware`, `InferenceMiddleware`, whitelists, realtime auth skips | +| Context auth extraction | `transports/bifrost-http/lib/ctx.go` | Virtual key and API key header extraction | +| Governance VK parser | `plugins/governance/utils.go` | Accepted virtual key headers for VK self-service endpoints | + +### OpenAPI documentation sources + +Scan these files as the documented API source of truth: + +| Area | Source | +|---|---| +| Root spec/path map/security schemes | `docs/openapi/openapi.yaml` | +| Bundled output (generated, **do not edit**) | `docs/openapi/openapi.json` — produced by CI (`.github/workflows/openapi-bundle.yml`) on push to `main` | +| Inference paths | `docs/openapi/paths/inference/*.yaml` | +| Integration paths | `docs/openapi/paths/integrations/**/*.yaml` | +| Management paths | `docs/openapi/paths/management/*.yaml` | +| Schemas | `docs/openapi/schemas/**/*.yaml` | + +`docs/openapi/openapi.json` is a build artifact. **Never edit it by hand and never regenerate it into the checked-in path.** The `OpenAPI Bundle` GitHub Actions workflow runs `python bundle.py` on push to `main` and commits the result as a separate `chore: regenerate openapi.json --skip-ci` commit. When validating locally, always bundle to `/tmp`, not into the repo. + +--- + +## Workflow overview + +1. **Preflight** -- Confirm scope, inspect git status, avoid edits unless approved. +2. **Extract actual API inventory** -- Scan route registrations and route config factories. +3. **Derive actual auth behavior** -- Trace middleware registration and handler-specific auth checks. +4. **Extract documented OpenAPI inventory** -- Resolve path refs and effective operation security. +5. **Compare inventories** -- Detect missing, stale, method/path, parameter, schema, and auth mismatches. +6. **Report findings** -- Present actionable tables with file/line references and recommended fixes. +7. **Fix with approval** -- If requested, update OpenAPI files or controller code after showing a plan. +8. **Validate** -- Re-bundle OpenAPI and run relevant tests/linters if available. + +--- + +## Step 1: Preflight + +Always start with: + +```bash +git status --short +``` + +If the worktree is dirty, mention it and avoid overwriting unrelated changes. + +Identify the scope: + +```bash +# Route registration entry points +grep -R "func (.*RegisterRoutes" -n transports/bifrost-http/handlers transports/bifrost-http/integrations --include='*.go' + +grep -n "func (s \*BifrostHTTPServer) RegisterAPIRoutes\|func (s \*BifrostHTTPServer) RegisterInferenceRoutes\|RegisterUIRoutes" transports/bifrost-http/server/server.go +``` + +If the user asked for `--fix`, still audit first and present a fix plan before editing. + +--- + +## Step 2: Extract actual API inventory from controllers + +Create an actual route inventory with this shape: + +| Method | Path | Handler | Source | Route group | Registration condition | Actual auth class | +|---|---|---|---|---|---|---| +| `GET` | `/api/config` | `ConfigHandler.getConfig` | `handlers/config.go:76` | Management | Always | Admin/session API | + +### 2a. Direct handler route registrations + +Scan all handler route registration methods: + +```bash +grep -R "\.GET\|\.POST\|\.PUT\|\.DELETE\|\.PATCH\|\.HEAD\|\.OPTIONS\|r.Handle" \ + -n transports/bifrost-http/handlers --include='*.go' +``` + +For a cleaner first pass: + +```bash +for f in transports/bifrost-http/handlers/*.go; do + if grep -q "RegisterRoutes" "$f"; then + echo "--- $f" + grep -nE 'func \(h .*RegisterRoutes|\.(GET|POST|PUT|DELETE|PATCH|HEAD|OPTIONS)\(|r\.Handle\(' "$f" + fi +done +``` + +Record: +- HTTP method +- path pattern +- handler function +- file:line +- whether route is registered with `middlewares...`, custom middleware, or no middleware +- whether route registration is conditional + +### 2b. Server-level and conditional route registration + +Read `RegisterAPIRoutes`, `RegisterInferenceRoutes`, and the middleware-assembly block carefully. Locate them by symbol so the ranges don't drift as the file grows: + +```bash +SERVER=transports/bifrost-http/server/server.go + +# RegisterInferenceRoutes — inference route registrations. +# Window is generous (+60) so newly added integration handlers don't fall outside it. +INFER=$(grep -n "func (s \*BifrostHTTPServer) RegisterInferenceRoutes" "$SERVER" | head -1 | cut -d: -f1) +[ -n "$INFER" ] && sed -n "$((INFER)),$((INFER+60))p" "$SERVER" + +# RegisterAPIRoutes — management/api route registrations (conditional plugin routes live here) +API=$(grep -n "func (s \*BifrostHTTPServer) RegisterAPIRoutes" "$SERVER" | head -1 | cut -d: -f1) +[ -n "$API" ] && sed -n "$((API)),$((API+120))p" "$SERVER" + +# Middleware-assembly block — how apiMiddlewares / inferenceMiddlewares are composed, +# where AuthMiddleware.APIMiddleware()/InferenceMiddleware() are appended, and where +# TracingMiddleware / TransportInterceptorMiddleware are layered onto inference routes. +grep -n "apiMiddlewares\s*=\|inferenceMiddlewares\s*=\|AuthMiddleware\.APIMiddleware\|AuthMiddleware\.InferenceMiddleware\|TransportInterceptorMiddleware\|TracingMiddleware" "$SERVER" +``` + +Important conditional registrations: + +| Condition | Routes affected | +|---|---| +| Governance plugin present | `GovernanceHandler.RegisterRoutes` (`/api/governance/*`) | +| Logging plugin present | `LoggingHandler.RegisterRoutes` (`/api/logs*`, `/api/mcp-logs*`) | +| Semantic cache plugin present | `CacheHandler.RegisterRoutes` (`/api/cache/*`) | +| Prompts plugin present | `PromptsHandler.RegisterRoutes` (`/api/prompt-repo/*`) | +| Prometheus plugin present | `/metrics` | +| Dev mode only | `/api/dev/pprof*` | +| OAuth metadata/per-user/consent | Registered without API auth middleware; public by design | +| UI catch-all | `/` and `/{filepath:*}`; do not compare as API docs unless user asks | + +### 2c. Inference and SDK integration routes + +Direct unified inference routes live in `handlers/inference.go`, `handlers/asyncinference.go`, `handlers/mcpinference.go`, and `handlers/mcpserver.go`. + +Provider/framework integration routes are registered through `integrations.GenericRouter` and route config factories. + +Scan integration route config factories: + +```bash +grep -R "func Create.*RouteConfigs\|func OpenAI.*Paths\|RouteConfig{" \ + -n transports/bifrost-http/integrations --include='*.go' +``` + +Key factories/routers: + +| Integration | Code source | +|---|---| +| OpenAI | `integrations/openai.go` (`CreateOpenAIRouteConfigs`, list models, batch, files, containers, container files, realtime path helpers) | +| Anthropic | `integrations/anthropic.go` | +| GenAI/Gemini | `integrations/genai.go` | +| Bedrock | `integrations/bedrock.go` | +| Cohere | `integrations/cohere.go` | +| LiteLLM | `integrations/litellm.go` | +| LangChain | `integrations/langchain.go` | +| PydanticAI | `integrations/pydanticai.go` | +| Cursor | `integrations/cursor.go` | +| Passthrough | `integrations/passthrough.go` catch-all prefixes | +| Realtime WebSocket/WebRTC/client secrets | `handlers/wsresponses.go`, `handlers/wsrealtime.go`, `handlers/webrtc_realtime.go`, `handlers/realtime_client_secrets.go`, `integrations/openai.go` path helper functions | + +**Do not rely only on direct `r.GET(...)` grep for integrations.** Many integration routes are built from `RouteConfig{ Path: pathPrefix + path, Method: ... }` inside loops. + +### 2d. Handle dynamic/catch-all routes correctly + +Some routes are intentionally catch-all in Go but represented as concrete operations in OpenAPI. + +Examples: +- OpenAI Azure-style deployments: `pathPrefix + "/openai/deployments/{deploymentPath:*}"` dispatches by suffix in `GetHTTPRequestType`. Expand the suffixes documented in the switch (`chat/completions`, `responses`, `embeddings`, etc.) before comparing. +- Passthrough routers register `{path:*}` for provider passthrough prefixes. Treat these as catch-all passthrough support. Do not mark every possible downstream provider URL as missing OpenAPI unless the user explicitly wants passthrough documentation. +- UI catch-all `/{filepath:*}` is not an API endpoint. + +Normalize path parameters for comparison: +- Exact path match first. +- Then normalized match where `{name}`, `{deployment-id}`, and `{deploymentPath:*}` become `{param}`. +- If normalized paths match but parameter names differ, report as a **path parameter naming mismatch**, not a missing route. + +--- + +## Step 3: Derive actual auth behavior + +Auth correctness is part of this skill. Always inspect actual middleware wiring instead of guessing from path names. + +### 3a. Read middleware setup + +Read the auth and middleware setup. Locate the blocks by symbol so the ranges don't drift as the files grow — auth derivation for every route depends on these functions being read correctly, so a stale range would poison the whole audit: + +```bash +SERVER=transports/bifrost-http/server/server.go +MW_FILE=transports/bifrost-http/handlers/middlewares.go + +# Middleware-assembly block in server.go — where apiMiddlewares / inferenceMiddlewares +# get AuthMiddleware.APIMiddleware()/InferenceMiddleware() / Tracing / +# TransportInterceptor appended. +MW=$(grep -n "apiMiddlewares := commonMiddlewares" "$SERVER" | head -1 | cut -d: -f1) +[ -n "$MW" ] && sed -n "$((MW)),$((MW+65))p" "$SERVER" + +# InferenceMiddleware — auth wiring for /v1/* and provider integration routes. +INF=$(grep -n "func (m \*AuthMiddleware) InferenceMiddleware" "$MW_FILE" | head -1 | cut -d: -f1) +[ -n "$INF" ] && sed -n "$((INF)),$((INF+15))p" "$MW_FILE" + +# APIMiddleware — auth wiring for /api/* management routes (whitelist + prefix +# whitelist + WebSocket dashboard handling lives here; function body is large). +API_MW=$(grep -n "func (m \*AuthMiddleware) APIMiddleware" "$MW_FILE" | head -1 | cut -d: -f1) +[ -n "$API_MW" ] && sed -n "$((API_MW)),$((API_MW+180))p" "$MW_FILE" +``` + +Key logic: +- `apiMiddlewares` includes `AuthMiddleware.APIMiddleware()` in OSS when config store exists and auth middleware initializes. +- `inferenceMiddlewares` includes `AuthMiddleware.InferenceMiddleware()` in OSS when config store exists and auth middleware initializes. +- `OAuthMetadataHandler`, `PerUserOAuthHandler`, and `ConsentHandler` are registered without auth middleware. +- `/api/governance/virtual-keys/quota` is registered without `middlewares...` and performs virtual-key authentication inside the handler. +- Realtime transport endpoints have special skip behavior in `isRealtimeTransportEndpoint` and handler-level auth extraction. + +### 3b. Actual auth classes + +Classify every route into one of these actual auth classes: + +| Auth class | Actual code behavior | OpenAPI expectation | +|---|---|---| +| Public | No auth middleware, or explicitly whitelisted in `APIMiddleware` | Operation must set `security: []` if root `security` is defined | +| Admin/session API | `APIMiddleware` protects route; accepts session Bearer token, Basic admin auth, and session cookie fallback. WebSocket dashboard also accepts `?ticket=`/legacy token/cookie. | `BearerAuth` and `BasicAuth`; mention cookie/ticket in description where relevant. Do **not** include `VirtualKeyAuth` unless handler accepts VK. Do **not** include `ApiKeyAuth` unless `x-api-key` is actually accepted for that route. | +| Inference API | `InferenceMiddleware` protects route unless `auth_config.disable_auth_on_inference` is true; context extraction supports virtual keys and direct provider/API key headers for inference flows. | Usually `BearerAuth`, `BasicAuth`, `VirtualKeyAuth`, `ApiKeyAuth` | +| Virtual-key self-service | Handler itself parses virtual key from `x-bf-vk`, `Authorization: Bearer sk-bf-*`, `x-api-key: sk-bf-*`, or `x-goog-api-key: sk-bf-*` | `VirtualKeyAuth`, `BearerAuth`, `ApiKeyAuth` (and document Google key header if exposed); no admin-only security | +| Realtime transport | WebSocket/WebRTC/client secret handlers capture auth headers/subprotocols and have special middleware bypasses | Validate per handler; do not assume standard admin or inference auth | +| Dev-only | Only registered when `handlers.IsDevMode()` | Document as dev-only or omit from public OpenAPI, depending existing convention | +| Conditional plugin | Only registered when a plugin is loaded | Document condition in description or mark conditional in report | + +### 3c. Public and whitelisted routes to verify + +The API auth middleware has a system whitelist and prefix whitelist. Verify it in code each time, but expect these routes/prefixes to be public when auth is enabled: + +- `/health` +- `/api/session/login` +- `/api/session/is-auth-enabled` +- `/api/version` +- `/api/oauth/callback` +- `/api/oauth/*` (prefix whitelist in `APIMiddleware`) +- `/.well-known/oauth-protected-resource` +- `/.well-known/oauth-authorization-server` +- `/oauth/consent` +- `/oauth/consent/mcps` +- `/api/oauth/per-user/consent/*` +- `/api/dev/*` (dev-only) +- `/login`, `/favicon.ico`, `/assets/*` (UI/static; not API docs usually) +- SCIM OAuth routes if present in enterprise code + +**Important:** OpenAPI root `security` applies to every operation unless the operation overrides it. If an endpoint is public, it must explicitly use: + +```yaml +security: [] +``` + +Otherwise the docs incorrectly show authentication as required. + +### 3d. Virtual key/header auth details + +For virtual-key behavior, inspect: + +```bash +# Context auth extraction (virtual key + provider/API key headers) +sed -n '1,90p' plugins/governance/utils.go +grep -n "x-bf-vk\|x-goog-api-key\|sk-bf-\|VirtualKey\|ExtractVirtualKey" transports/bifrost-http/lib/ctx.go + +# VK quota handler (locate by symbol so line numbers don't drift) +LINE=$(grep -n "func.*getVirtualKeyQuota" transports/bifrost-http/handlers/governance.go | head -1 | cut -d: -f1) +if [ -n "$LINE" ]; then + sed -n "$((LINE)),$((LINE+45))p" transports/bifrost-http/handlers/governance.go +fi +``` + +Virtual key sources commonly include: +- `x-bf-vk` +- `Authorization: Bearer sk-bf-*` +- `x-api-key: sk-bf-*` +- `x-goog-api-key: sk-bf-*` + +Do not document a header as an auth method unless the code for that route actually accepts it. + +--- + +## Step 4: Extract documented OpenAPI inventory + +### 4a. Read root OpenAPI path map + +```bash +grep -n "^ /" docs/openapi/openapi.yaml +``` + +### 4b. Bundle OpenAPI to resolve refs + +Prefer using the bundler so effective operations are easy to parse: + +```bash +cd docs/openapi +python3 bundle.py --output /tmp/bifrost-openapi.json +cd - +``` + +If `PyYAML` is missing, report that bundling could not run and fall back to reading `docs/openapi/openapi.json` if present: + +```bash +python3 - <<'PY' +import json +spec = json.load(open('docs/openapi/openapi.json')) +print(len(spec.get('paths', {}))) +PY +``` + +### 4c. Print effective documented operations and security + +Use this helper after bundling: + +```bash +python3 - <<'PY' +import json +spec = json.load(open('/tmp/bifrost-openapi.json')) +root_security = spec.get('security') +methods = {'get','post','put','delete','patch','head','options','trace'} +for path, item in sorted(spec.get('paths', {}).items()): + for method, op in sorted(item.items()): + if method not in methods: + continue + effective_security = op.get('security', root_security) + operation_id = op.get('operationId', '') + tags = ','.join(op.get('tags', [])) + print(f"{method.upper():6} {path:70} security={effective_security} operationId={operation_id} tags={tags}") +PY +``` + +Record: +- Path +- Method +- `operationId` +- Tags +- Effective security (`operation.security` if present, otherwise root `security`) +- Parameters (`path`, `query`, `header`) +- Request body presence/content type +- Response status codes/content types + +--- + +## Step 5: Compare actual vs documented APIs + +Create comparison tables. Use exact route matches first, then normalized path-parameter matches. + +### 5a. Coverage mismatches + +Classify as: + +| Type | Meaning | +|---|---| +| Missing in OpenAPI | Controller has a route, but `docs/openapi/openapi.yaml` has no matching path+method | +| Stale in OpenAPI | OpenAPI documents a path+method not found in OSS controllers/integrations | +| Method mismatch | Path exists in both, but methods differ | +| Path parameter mismatch | Same normalized path, different parameter names or wildcard behavior | +| Conditional route undocumented | Route exists only when plugin/dev mode enabled but docs do not mention the condition | +| Catch-all route undocumented | Passthrough/catch-all exists but docs omit it; usually low priority unless public API intended | + +For stale routes, check whether they may be enterprise-only before calling them wrong. If the code is not in the OSS repo, report as: + +```text +Documented route not found in OSS controllers. This may be enterprise-only; confirm expected source before removing. +``` + +### 5b. Auth mismatches + +For every route, compare actual auth class to effective OpenAPI security. + +Flag these issues: +- Public route inherits root auth because `security: []` is missing. +- Admin/session route documents `VirtualKeyAuth` but code does not accept VK. +- Admin/session route documents `ApiKeyAuth` but code does not accept `x-api-key` for that route. +- Virtual-key self-service route documents admin auth only or omits accepted VK headers. +- Inference route omits `VirtualKeyAuth` or `ApiKeyAuth` when the code accepts them. +- WebSocket route documents normal HTTP auth but code requires ticket/query/cookie/subprotocol behavior. +- Realtime route docs do not match handler-level auth parsing. + +### 5c. Parameter mismatches + +For each matched route, inspect handler code for: + +```bash +# Path params +ctx.UserValue("...") + +# Query params +ctx.QueryArgs().Peek("...") +ctx.QueryArgs().GetUintOrZero("...") +ctx.QueryArgs().VisitAll(...) + +# Headers +ctx.Request.Header.Peek("...") +ctx.Request.Header.Cookie("...") +``` + +Compare with OpenAPI `parameters`: +- Missing path parameter +- Missing query parameter +- Required flag mismatch +- Type mismatch (`boolean`, `integer`, `string`, enum) +- Header parameter missing where auth/security scheme is not enough + +### 5d. Request body mismatches + +For routes with request bodies: + +1. Find the Go request struct used by the handler. +2. Compare `json` tags and validation logic to OpenAPI schema. +3. Check required fields and nullable/optional behavior. +4. Check content type handling (`application/json`, multipart, `text/event-stream`, raw SDP, etc.). + +Useful searches: + +```bash +grep -n "type .*Request struct" transports/bifrost-http/handlers/.go +grep -n "json.Unmarshal\|sonic.Unmarshal\|parse.*Multipart\|ContentType" transports/bifrost-http/handlers/.go transports/bifrost-http/integrations/*.go +``` + +### 5e. Response/status mismatches + +Inspect handler responses: + +```bash +grep -n "SendJSON\|SendError\|SendBifrostError\|SetStatusCode\|Status" transports/bifrost-http/handlers/.go +``` + +Compare with OpenAPI responses: +- Missing success status +- Missing error status +- Wrong content type +- Response schema does not match fields actually returned +- Streaming response missing `text/event-stream` +- WebSocket response should document `101` upgrade where applicable + +--- + +## Step 6: Report format + +Always present the audit in this structure: + +````markdown +## API validation report + +### Scope +- **Scope audited:** +- **Controller sources:** +- **OpenAPI sources:** +- **Bundle status:** + +### Summary +| Category | Count | +|---|---:| +| Actual routes found | N | +| Documented operations found | N | +| Missing in OpenAPI | N | +| Stale or not found in OSS controllers | N | +| Method/path mismatches | N | +| Auth/security mismatches | N | +| Parameter/schema/status mismatches | N | + +### Missing APIs in OpenAPI +| Method | Path | Handler/source | Actual auth | Registration condition | Recommended doc file | +|---|---|---|---|---|---| + +### Documented APIs not found in controllers +| Method | Path | OpenAPI source | Effective auth | Notes | +|---|---|---|---|---| + +### Method/path mismatches +| Actual | Documented | Source | Issue | Recommended fix | +|---|---|---|---|---| + +### Auth/security mismatches +| Method | Path | Actual auth from code | OpenAPI effective security | Source | Recommended fix | +|---|---|---|---|---|---| + +### Parameter/request/response mismatches +| Method | Path | Area | Code source | OpenAPI source | Issue | Recommended fix | +|---|---|---|---|---|---|---| + +### Conditional routes +| Method | Path | Condition | Should be documented? | Notes | +|---|---|---|---|---| + +### Recommended fix plan +1. +2. + +### Validation commands +```bash +cd docs/openapi && python3 bundle.py --output /tmp/bifrost-openapi.json +# plus any relevant go tests or OpenAPI lint commands +``` + +**Proceed with fixes?** (yes / no / modify plan) +```` + +If no issues are found, still report the route count, auth matrix used, and commands run. + +--- + +## Step 7: Fix mode + +Only enter fix mode after the user asks for `--fix` or approves the proposed plan. + +### 7a. Fix priority + +Prefer fixes in this order: + +1. **OpenAPI documentation fixes** when controllers are correct and docs are missing/stale. +2. **Controller route fixes** only if docs reflect the intended API and code is actually wrong. +3. **Auth behavior fixes** only after explicit confirmation, because changing middleware/handler auth can be breaking. + +### 7b. OpenAPI edit rules + +When adding or fixing docs: +- Edit YAML sources only. **Never edit `docs/openapi/openapi.json`** — it is regenerated and committed by the `OpenAPI Bundle` GitHub Actions workflow on push to `main`. +- Do not run `python3 bundle.py` with `--output docs/openapi/openapi.json` (or with no `--output`, which defaults to that path). Always bundle to `/tmp` for local validation. +- Add path mapping in `docs/openapi/openapi.yaml`. +- Add or update the relevant file in `docs/openapi/paths//`. +- Add/update schemas in `docs/openapi/schemas//` when needed. +- Use unique `operationId` values. +- Keep tags consistent with existing docs. +- Public operations must set `security: []` if root security exists. +- Inference operations should include `VirtualKeyAuth` when virtual keys are accepted. +- Management admin operations should not claim `VirtualKeyAuth` unless code accepts it. +- Document plugin/dev/enterprise-only conditions in `description`. + +### 7c. Controller edit rules + +When fixing Go routes: +- Edit only after the user explicitly approves controller behavior changes. +- Preserve middleware class intentionally; do not move a route between `apiMiddlewares`, `inferenceMiddlewares`, and no middleware without approval. +- Add/update tests for changed route registration or auth behavior. +- Run module-specific tests from `transports/`: + +```bash +cd transports && go test ./bifrost-http/handlers ./bifrost-http/server ./bifrost-http/integrations +``` + +If this command is too broad or slow, run the specific package/test that covers the edited route. + +### 7d. Validate after edits + +After OpenAPI changes: + +```bash +cd docs/openapi +python3 bundle.py --output /tmp/bifrost-openapi.json +python3 bundle.py --format yaml --output /tmp/bifrost-openapi.yaml +cd - +``` + +Then inspect duplicates and security: + +```bash +python3 - <<'PY' +import json, collections +spec=json.load(open('/tmp/bifrost-openapi.json')) +ids=[] +for path,item in spec.get('paths',{}).items(): + for method,op in item.items(): + if method in {'get','post','put','delete','patch','head','options','trace'}: + oid=op.get('operationId') + if oid: ids.append(oid) +for oid,count in collections.Counter(ids).items(): + if count>1: + print('DUPLICATE operationId', oid, count) +PY +``` + +Report all validation results. + +--- + +## Common Bifrost route groups + +Use this as a checklist; always verify against current code. + +### Management handlers + +| Handler | Expected prefixes | +|---|---| +| `HealthHandler` | `/health` | +| `ConfigHandler` | `/api/config`, `/api/version`, `/api/proxy-config`, `/api/pricing/force-sync` | +| `ProviderHandler` | `/api/providers`, `/api/keys`, `/api/models` | +| `MCPHandler` | `/api/mcp/*` | +| `PluginsHandler` | `/api/plugins*` | +| `SessionHandler` | `/api/session/*` | +| `OAuthHandler` | `/api/oauth/callback`, `/api/oauth/config/{id}/*` | +| `OAuthMetadataHandler` | `/.well-known/oauth-*` | +| `PerUserOAuthHandler` | `/api/oauth/per-user/*` | +| `ConsentHandler` | `/oauth/consent*`, `/api/oauth/per-user/consent/*` | +| `GovernanceHandler` | `/api/governance/*` | +| `LoggingHandler` | `/api/logs*`, `/api/mcp-logs*` | +| `PromptsHandler` | `/api/prompt-repo/*` | +| `CacheHandler` | `/api/cache/*` | +| `WebSocketHandler` | `/ws` | +| `DevPprofHandler` | `/api/dev/pprof*` | + +### Inference handlers + +| Handler/router | Expected prefixes | +|---|---| +| `CompletionHandler` | `/v1/*` core inference | +| `AsyncHandler` | `/v1/async/*` | +| `MCPInferenceHandler` | `/v1/mcp/tool/execute` | +| `MCPServerHandler` | `/mcp` | +| `IntegrationHandler` | `/openai`, `/anthropic`, `/genai`, `/bedrock`, `/cohere`, `/litellm`, `/langchain`, `/pydanticai`, `/cursor`, passthrough prefixes | +| Realtime handlers | `/v1/responses` WS, `/v1/realtime*`, `/openai/...` aliases | + +--- + +## Mandatory rules + +- **Always scan code first.** Do not trust existing OpenAPI docs as source of truth. +- **Always compute effective OpenAPI security.** Operation security overrides root security; missing operation security inherits root security. +- **Always include file:line references** for code and docs findings. +- **Do not report UI catch-all routes as missing APIs.** +- **Do not report passthrough catch-all routes as missing docs unless public documentation is intended.** +- **Do not remove documented routes that may be enterprise-only without confirmation.** +- **Do not edit controllers for auth changes without explicit approval.** +- **Do not claim a route accepts virtual keys, API keys, cookies, tickets, or Basic auth unless code confirms it.** +- **Public endpoints must use `security: []` in OpenAPI when root security is defined.** +- **Never edit or regenerate `docs/openapi/openapi.json` locally.** It is a CI-generated artifact produced by `.github/workflows/openapi-bundle.yml` on push to `main`. Edit only the YAML sources; bundle to `/tmp` for validation. +- **After fixes, run the OpenAPI bundler and report validation results.** diff --git a/.claude/skills/docs-writer/SKILL.md b/.claude/skills/docs-writer/SKILL.md index 02da9b6524..61929a4f67 100644 --- a/.claude/skills/docs-writer/SKILL.md +++ b/.claude/skills/docs-writer/SKILL.md @@ -176,7 +176,7 @@ grep -n 'func.*create\|func.*update\|func.*delete\|func.*get' transports/bifrost | `plugins.go` | `/api/plugins` | CRUD plugins | | `config.go` | `/api/config` | GET/PUT config | | `config.go` | `/api/proxy-config` | GET/PUT proxy config | -| `cache.go` | `/api/cache/clear/{requestId}` | DELETE cache | +| `cache.go` | `/api/cache/clear/{cacheId}` | DELETE cache | | `session.go` | `/api/session/*` | Login/logout/auth check | | `oauth2.go` | `/api/oauth/*` | OAuth callback/status | diff --git a/.github/workflows/release-pipeline.yml b/.github/workflows/release-pipeline.yml index 9b2e625c92..e0624774f2 100644 --- a/.github/workflows/release-pipeline.yml +++ b/.github/workflows/release-pipeline.yml @@ -1771,7 +1771,7 @@ jobs: production.cloudflare.docker.com:443 proxy.golang.org:443 registry-1.docker.io:443 - registry.access.redhat.com:443 + *.redhat.com:443 *.quay.io:443 registry.npmjs.org:443 storage.googleapis.com:443 @@ -1862,7 +1862,7 @@ jobs: production.cloudflare.docker.com:443 proxy.golang.org:443 registry-1.docker.io:443 - registry.access.redhat.com:443 + *.redhat.com:443 *.quay.io:443 registry.npmjs.org:443 storage.googleapis.com:443 diff --git a/.github/workflows/scripts/validate-helm-config-fields.sh b/.github/workflows/scripts/validate-helm-config-fields.sh index 352ef88cde..11de5c7356 100755 --- a/.github/workflows/scripts/validate-helm-config-fields.sh +++ b/.github/workflows/scripts/validate-helm-config-fields.sh @@ -637,7 +637,6 @@ bifrost: cache_by_model: true cache_by_provider: false exclude_system_prompt: true - cleanup_on_shutdown: true vector_store_namespace: "bifrost-cache" otel: enabled: true @@ -710,7 +709,6 @@ assert_field_value 'plugins: semantic_cache conversation_history_threshold' '.pl assert_field_value 'plugins: semantic_cache cache_by_model' '.plugins.[4].config.cache_by_model' 'true' assert_field_value 'plugins: semantic_cache cache_by_provider' '.plugins.[4].config.cache_by_provider' 'false' assert_field_value 'plugins: semantic_cache exclude_system_prompt' '.plugins.[4].config.exclude_system_prompt' 'true' -assert_field_value 'plugins: semantic_cache cleanup_on_shutdown' '.plugins.[4].config.cleanup_on_shutdown' 'true' assert_field_value 'plugins: semantic_cache vector_store_namespace' '.plugins.[4].config.vector_store_namespace' '"bifrost-cache"' # OTEL plugin diff --git a/.gitignore b/.gitignore index a7c2e26109..3702ed6ec6 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,7 @@ .env .vscode .DS_Store +.tool-versions *_creds* **/venv/ **/__pycache__/** @@ -45,6 +46,7 @@ transports/schema/config.schema.json *.db *.db-shm *.db-wal +transports/bifrost-http/v1.5.x # Test reports test-reports @@ -173,3 +175,7 @@ ui/app/routeTree.gen.ts .next .infisical + +# e2e test artifacts +examples/mcps/auth-demo-server/auth-demo-server +examples/mcps/oauth-demo-server/oauth-demo-server \ No newline at end of file diff --git a/Makefile b/Makefile index 950dad247b..c089340467 100644 --- a/Makefile +++ b/Makefile @@ -66,7 +66,7 @@ define EXPOSE_ENV fi endef -.PHONY: all help dev dev-pulse build-ui build build-cli run run-cli install-air install-pulse clean test test-cli install-ui setup-workspace work-init work-clean docs docker-image docker-run cleanup-enterprise mod-tidy test-integrations-py test-integrations-ts install-playwright run-e2e run-e2e-ui run-e2e-headed format ui install-newman run-provider-harness-test run-cli-harness-test +.PHONY: all help dev dev-pulse build-ui build build-cli run run-cli install-air install-pulse clean test test-cli install-ui setup-workspace work-init work-clean docs docker-image docker-run cleanup-enterprise mod-tidy test-integrations-py test-integrations-ts install-playwright run-e2e run-e2e-ui run-e2e-headed format ui install-newman run-provider-harness-test run-cli-harness-test test-semantic-cache test-semantic-cache-complete _test-semantic-cache-complete-inner all: help @@ -153,6 +153,38 @@ install-junit-viewer: ## Install junit-viewer for HTML report generation (if not dev: install-ui install-air setup-workspace $(if $(DEBUG),install-delve) ## Start complete development environment (UI + API with proxy) @$(EXPOSE_ENV); \ + set +m; \ + ui_pid=""; \ + api_pid=""; \ + cleanup() { \ + $(ECHO) "$(YELLOW)[make dev] cleanup started; ui_pid=$$ui_pid api_pid=$$api_pid$(NC)"; \ + trap - EXIT INT TERM HUP; \ + for pid in "$$ui_pid" "$$api_pid"; do \ + if [ -n "$$pid" ]; then \ + children="$$(pgrep -P "$$pid" 2>/dev/null || true)"; \ + $(ECHO) "$(YELLOW)[make dev] sending TERM to pid $$pid and children: $${children:-none}$(NC)"; \ + kill -TERM $$children "$$pid" 2>/dev/null || true; \ + fi; \ + done; \ + sleep 1; \ + for pid in "$$ui_pid" "$$api_pid"; do \ + if [ -n "$$pid" ]; then \ + children="$$(pgrep -P "$$pid" 2>/dev/null || true)"; \ + $(ECHO) "$(YELLOW)[make dev] sending KILL to pid $$pid and remaining children: $${children:-none}$(NC)"; \ + kill -KILL $$children "$$pid" 2>/dev/null || true; \ + fi; \ + done; \ + $(ECHO) "$(YELLOW)[make dev] waiting for background jobs to exit...$(NC)"; \ + wait 2>/dev/null || true; \ + $(ECHO) "$(GREEN)[make dev] cleanup completed.$(NC)"; \ + }; \ + stop_dev() { \ + $(ECHO) "$(YELLOW)[make dev] received shutdown signal; starting cleanup...$(NC)"; \ + cleanup; \ + exit 130; \ + }; \ + trap cleanup EXIT; \ + trap stop_dev INT TERM HUP; \ $(ECHO) "$(GREEN)Starting Bifrost complete development environment...$(NC)"; \ $(ECHO) "$(YELLOW)This will start:$(NC)"; \ $(ECHO) " 1. UI development server (localhost:3000)"; \ @@ -170,35 +202,57 @@ dev: install-ui install-air setup-workspace $(if $(DEBUG),install-delve) ## Star $(ECHO) "$(YELLOW)Starting UI development server...$(NC)"; \ $(USE_NODE); if [ -n "$(DISABLE_PROFILER)" ]; then \ $(ECHO) "$(CYAN)DevProfiler disabled for testing$(NC)"; \ - cd ui && BIFROST_DISABLE_PROFILER=1 npm run dev & \ + (cd ui && BIFROST_DISABLE_PROFILER=1 npm run dev) & \ else \ - cd ui && npm run dev & \ + (cd ui && npm run dev) & \ fi; \ + ui_pid="$$!"; \ + $(ECHO) "$(YELLOW)[make dev] UI dev server started with pid $$ui_pid$(NC)"; \ sleep 3; \ $(ECHO) "$(YELLOW)Starting API server with UI proxy...$(NC)"; \ $(MAKE) setup-workspace >/dev/null; \ if [ -n "$(DEBUG)" ]; then \ $(ECHO) "$(CYAN)Starting with air + delve debugger on port 2345...$(NC)"; \ $(ECHO) "$(YELLOW)Attach your debugger to localhost:2345$(NC)"; \ - cd transports/bifrost-http && BIFROST_UI_DEV=true air -c .air.debug.toml -- \ + (cd transports/bifrost-http && BIFROST_UI_DEV=true air -c .air.debug.toml -- \ -host "$(HOST)" \ -port "$(PORT)" \ -log-style "$(LOG_STYLE)" \ -log-level "$(LOG_LEVEL)" \ $(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") \ - $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))"); \ + $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))")) & \ else \ - cd transports/bifrost-http && BIFROST_UI_DEV=true air -c .air.toml -- \ + (cd transports/bifrost-http && BIFROST_UI_DEV=true air -c .air.toml -- \ -host "$(HOST)" \ -port "$(PORT)" \ -log-style "$(LOG_STYLE)" \ -log-level "$(LOG_LEVEL)" \ $(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") \ - $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))"); \ - fi + $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))")) & \ + fi; \ + api_pid="$$!"; \ + $(ECHO) "$(YELLOW)[make dev] API dev server started with pid $$api_pid$(NC)"; \ + while kill -0 "$$ui_pid" 2>/dev/null && kill -0 "$$api_pid" 2>/dev/null; do sleep 1; done; \ + $(ECHO) "$(YELLOW)[make dev] one of the dev processes exited; running cleanup...$(NC)"; \ + cleanup; \ + exit 1 dev-pulse: install-ui install-pulse setup-workspace $(if $(DEBUG),install-delve) ## Start complete development environment using pulse for hot reloading @$(EXPOSE_ENV); \ + set -m; \ + cleanup() { \ + trap - EXIT INT TERM HUP; \ + kill %1 %2 2>/dev/null || true; \ + sleep 1; \ + kill -KILL %1 %2 2>/dev/null || true; \ + wait 2>/dev/null || true; \ + }; \ + stop_dev() { \ + cleanup; \ + exit 130; \ + }; \ + trap cleanup EXIT; \ + trap stop_dev INT TERM HUP; \ $(ECHO) "$(GREEN)Starting Bifrost complete development environment (pulse)...$(NC)"; \ $(ECHO) "$(YELLOW)This will start:$(NC)"; \ $(ECHO) " 1. UI development server (localhost:3000)"; \ @@ -216,9 +270,9 @@ dev-pulse: install-ui install-pulse setup-workspace $(if $(DEBUG),install-delve) $(ECHO) "$(YELLOW)Starting UI development server...$(NC)"; \ $(USE_NODE); if [ -n "$(DISABLE_PROFILER)" ]; then \ $(ECHO) "$(CYAN)DevProfiler disabled for testing$(NC)"; \ - cd ui && BIFROST_DISABLE_PROFILER=1 npm run dev & \ + (cd ui && BIFROST_DISABLE_PROFILER=1 npm run dev) & \ else \ - cd ui && npm run dev & \ + (cd ui && npm run dev) & \ fi; \ sleep 3; \ $(ECHO) "$(YELLOW)Starting API server with UI proxy...$(NC)"; \ @@ -232,7 +286,7 @@ dev-pulse: install-ui install-pulse setup-workspace $(if $(DEBUG),install-delve) -log-style "$(LOG_STYLE)" \ -log-level "$(LOG_LEVEL)" \ $(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") \ - $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))"); \ + $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))") & \ else \ PORT="$(PORT)" BIFROST_UI_DEV=true pulse -- \ -host "$(HOST)" \ @@ -240,8 +294,11 @@ dev-pulse: install-ui install-pulse setup-workspace $(if $(DEBUG),install-delve) -log-style "$(LOG_STYLE)" \ -log-level "$(LOG_LEVEL)" \ $(if $(PROMETHEUS_LABELS),-prometheus-labels "$(PROMETHEUS_LABELS)") \ - $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))"); \ - fi + $(if $(APP_DIR),-app-dir "$(abspath $(APP_DIR))") & \ + fi; \ + while [ "$$(jobs -r | wc -l | tr -d ' ')" -eq 2 ]; do sleep 1; done; \ + cleanup; \ + exit 1 build-ui: install-ui ## Build ui @$(ECHO) "$(GREEN)Building ui...$(NC)" @@ -1077,6 +1134,71 @@ test-all: test-core test-framework test-plugins test-http-transport test test-cl $(ECHO) ""; \ fi +test-semantic-cache: ## Run semantic_cache e2e tests (Usage: [CACHE_TYPE=direct|semantic] [RUN_FORCE=0] make test-semantic-cache). RUN_FORCE defaults to 1. Auto-detects trail CLI and wraps the run when present. + @cd tests/semanticcache && \ + case "$$CACHE_TYPE" in \ + direct) \ + filter='^(TestPreconditions|TestDirect|TestLifecycle)$$'; \ + $(ECHO) "$(CYAN)CACHE_TYPE=direct → running preconditions + direct + lifecycle$(NC)"; \ + ;; \ + semantic) \ + filter='^(TestPreconditions|TestParaphraseFixtures|TestSemantic|TestLifecycle)$$'; \ + $(ECHO) "$(CYAN)CACHE_TYPE=semantic → running preconditions + fixtures + semantic + lifecycle$(NC)"; \ + ;; \ + '') \ + filter=''; \ + $(ECHO) "$(CYAN)CACHE_TYPE unset → running all phases$(NC)"; \ + ;; \ + *) \ + $(ECHO) "$(RED)CACHE_TYPE=$$CACHE_TYPE invalid; expected 'direct', 'semantic', or unset$(NC)"; \ + exit 1; \ + ;; \ + esac; \ + if command -v trail >/dev/null 2>&1; then \ + $(ECHO) "$(GREEN)trail detected — wrapping run in 'trail run' (session id will be printed by trail)$(NC)"; \ + if [ -n "$$filter" ]; then \ + exec trail run -- env RUN_FORCE=$${RUN_FORCE:-1} GOWORK=off go test -v -run "$$filter" ./...; \ + else \ + exec trail run -- env RUN_FORCE=$${RUN_FORCE:-1} GOWORK=off go test -v ./...; \ + fi; \ + else \ + $(ECHO) "$(YELLOW)trail not on PATH — falling back to direct go test (install 'trail' for capture-based debugging)$(NC)"; \ + if [ -n "$$filter" ]; then \ + exec env RUN_FORCE=$${RUN_FORCE:-1} GOWORK=off go test -v -run "$$filter" ./...; \ + else \ + exec env RUN_FORCE=$${RUN_FORCE:-1} GOWORK=off go test -v ./...; \ + fi; \ + fi + +test-semantic-cache-complete: ## Run BOTH plugin unit tests + e2e tests for semantic_cache. RUN_FORCE defaults to 1. Wraps everything in trail if available. + @if command -v trail >/dev/null 2>&1; then \ + $(ECHO) "$(GREEN)trail detected — wrapping unit + e2e tests in a single trail session (id printed by trail)$(NC)"; \ + exec trail run -- $(MAKE) _test-semantic-cache-complete-inner; \ + else \ + $(ECHO) "$(YELLOW)trail not on PATH — running tests directly (install 'trail' for capture-based debugging)$(NC)"; \ + $(MAKE) _test-semantic-cache-complete-inner; \ + fi + +_test-semantic-cache-complete-inner: + @$(ECHO) "" + @$(ECHO) "$(CYAN)═══════════════════════════════════════════════════════════$(NC)" + @$(ECHO) "$(CYAN) Running semantic_cache plugin UNIT tests $(NC)" + @$(ECHO) "$(CYAN)═══════════════════════════════════════════════════════════$(NC)" + @cd plugins/semanticcache && go test -v ./... + @$(ECHO) "" + @$(ECHO) "$(GREEN)═══════════════════════════════════════════════════════════$(NC)" + @$(ECHO) "$(GREEN) Unit tests completed $(NC)" + @$(ECHO) "$(GREEN)═══════════════════════════════════════════════════════════$(NC)" + @$(ECHO) "" + @$(ECHO) "$(CYAN)═══════════════════════════════════════════════════════════$(NC)" + @$(ECHO) "$(CYAN) Running semantic_cache E2E tests $(NC)" + @$(ECHO) "$(CYAN)═══════════════════════════════════════════════════════════$(NC)" + @cd tests/semanticcache && RUN_FORCE=$${RUN_FORCE:-1} GOWORK=off go test -v ./... + @$(ECHO) "" + @$(ECHO) "$(GREEN)═══════════════════════════════════════════════════════════$(NC)" + @$(ECHO) "$(GREEN) E2E tests completed $(NC)" + @$(ECHO) "$(GREEN)═══════════════════════════════════════════════════════════$(NC)" + test-chatbot: ## Run interactive chatbot integration test (Usage: RUN_CHATBOT_TEST=1 make test-chatbot) @$(EXPOSE_ENV); \ $(ECHO) "$(GREEN)Running interactive chatbot integration test...$(NC)"; \ @@ -1537,7 +1659,7 @@ install-newman: ## Install newman + htmlextra reporter if not already installed @$(USE_NODE); npm list -g newman-reporter-htmlextra > /dev/null 2>&1 || ($(ECHO) "$(YELLOW)Installing newman-reporter-htmlextra...$(NC)" && npm install -g newman-reporter-htmlextra) @$(ECHO) "$(GREEN)Newman + htmlextra are ready$(NC)" -run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost provider-harness Postman collection. HELP=1 prints full parameter docs. Per-provider parallelism is ON by default (~3-4× speedup); set PARALLEL=0 for sequential. Filter via PROVIDER=openai|anthropic|bedrock|gemini|vertex|azure|passthrough, FEATURE="" (matches request name/body), RERUN_FAILED=1 (re-run only items that failed last run). INCLUDE_PREVIEW=1 to run [PREVIEW]-tagged account/region-scoped cases. INCLUDE_SKIP=1 to run [SKIP]-tagged criss-cross cells for known-unsupported provider+modality pairs. USE_INFISICAL=1 to source from Infisical (Usage: make run-provider-harness-test [HELP=1] [PARALLEL=0] [PROVIDER=anthropic] [FEATURE="web search"] [RERUN_FAILED=1] [INCLUDE_PREVIEW=1] [INCLUDE_SKIP=1] [BASE_URL=...] [FOLDER="..."] [ENV_FILE=...] [VIEWER_PORT=8090] [CI=1]) +run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost provider-harness Postman collection. HELP=1 prints full parameter docs. Filter via PROVIDER=openai|anthropic|bedrock|gemini|vertex|azure|passthrough, FEATURE="" or FEATURE="," (AND across substrings; matches request name/URL/body), RERUN_FAILED=1 (re-run only items that failed last run). INCLUDE_PREVIEW=1 to run [PREVIEW]-tagged account/region-scoped cases. SKIP_STREAM_CANCEL=1 skips stream cancellation probes. USE_INFISICAL=1 to source from Infisical (Usage: make run-provider-harness-test [HELP=1] [PROVIDER=anthropic] [FEATURE="web search"] [FEATURE="cross-cut,structured output"] [RERUN_FAILED=1] [INCLUDE_PREVIEW=1] [BASE_URL=...] [FOLDER="..."] [ENV_FILE=...] [VIEWER_PORT=8090] [CI=1]) @if [ -n "$(HELP)" ]; then \ printf '\n%s\n' "$(CYAN)run-provider-harness-test - Bifrost provider harness runner$(NC)"; \ printf '%s\n\n' "Runs the Bifrost provider-harness Postman collection through newman, with optional filtering."; \ @@ -1546,8 +1668,9 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov printf ' %-18s %s\n' "HELP=1" "Print this help and exit (no Bifrost or network activity)."; \ printf ' %-18s %s\n' "PROVIDER=" "Filter requests by provider. One of: openai, anthropic, bedrock, gemini, vertex, azure, passthrough."; \ printf ' %-18s %s\n' "" " Matches via PROVIDER_KEYWORDS in tests/e2e/api/runners/filter-collection.mjs (loose name/body substring)."; \ - printf ' %-18s %s\n' "FEATURE=\"\"" "Filter by case-insensitive keyword against the full request JSON (name + URL + body)."; \ - printf ' %-18s %s\n' "" " Examples: FEATURE=\"web search\", FEATURE=\"streaming\", FEATURE=\"prompt caching\"."; \ + printf ' %-18s %s\n' "FEATURE=\"\"" "Filter by case-insensitive keyword(s) against the full request JSON (name + URL + body + ancestor folder names)."; \ + printf ' %-18s %s\n' "" " Single: FEATURE=\"web search\". Multi-keyword AND (comma-separated): FEATURE=\"cross-cut,structured output\"."; \ + printf ' %-18s %s\n' "" " \"cross-cut\" is a structural keyword - matches any row routed through unified /v1/chat/completions with a provider/model body, regardless of name."; \ printf ' %-18s %s\n' "RERUN_FAILED=1" "Re-run only requests that failed in the prior run (reads tmp/newman-report.json)."; \ printf ' %-18s %s\n' "" " Composes with PROVIDER and FEATURE (predicates AND together)."; \ printf ' %-18s %s\n' "BASE_URL=" "Bifrost gateway URL (default: http://localhost:8080). Skips auto-start if /health responds."; \ @@ -1559,16 +1682,14 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov printf ' %-18s %s\n' "INCLUDE_PREVIEW=1" "Run [PREVIEW]-tagged requests (account/region-scoped: vector stores, cached content, MCP servers, preview-model deployments). Off by default."; \ printf ' %-18s %s\n' "INCLUDE_SKIP=1" "Run [SKIP]-tagged criss-cross cells (provider+modality pairs that return NewUnsupportedOperationError by design, e.g., anthropic embeddings, bedrock audio). Off by default."; \ printf ' %-18s %s\n' "PARALLEL=0" "Disable per-provider parallelism (default: ON). When ON, forks one newman per provider (openai, anthropic, bedrock, gemini, vertex, azure) concurrently; reports merged into tmp/newman-report.json. The htmlextra report is only emitted in sequential mode (PARALLEL=0)."; \ + printf ' %-18s %s\n' "SKIP_STREAM_CANCEL=1" "Skip the post-Newman stream-abort probes that verify server-side cancellation on client disconnect."; \ printf ' %-18s %s\n' "USE_INFISICAL=1" "Source secrets from Infisical CLI ('infisical export --path /local --format dotenv') instead of .env."; \ printf '\n%s\n' "$(YELLOW)EXAMPLES$(NC)"; \ printf ' %s\n' "make run-provider-harness-test HELP=1"; \ - printf ' %s\n' "make run-provider-harness-test # full sweep, 6 providers concurrently (default ~3-4× speedup)"; \ - printf ' %s\n' "make run-provider-harness-test PARALLEL=0 # sequential mode (ordered output, htmlextra report)"; \ - printf ' %s\n' "make run-provider-harness-test FOLDER=\"8. Criss-Cross\" # criss-cross matrix only (endpoint × provider × modality)"; \ - printf ' %s\n' "make run-provider-harness-test FOLDER=\"8.2 Text Chat (streaming)\" # criss-cross streaming sub-folder"; \ - printf ' %s\n' "make run-provider-harness-test PROVIDER=bedrock # bedrock-only (includes bedrock-model cells across §8)"; \ - printf ' %s\n' "make run-provider-harness-test FEATURE=\"web search\" # all providers, web-search entries"; \ - printf ' %s\n' "make run-provider-harness-test INCLUDE_SKIP=1 # also run [SKIP] cells (capability-gap matrix)"; \ + printf ' %s\n' "make run-provider-harness-test # full provider sweep"; \ + printf ' %s\n' "make run-provider-harness-test PROVIDER=bedrock # bedrock-only"; \ + printf ' %s\n' "make run-provider-harness-test FEATURE=\"web search\" # all providers, web-search entries"; \ + printf ' %s\n' "make run-provider-harness-test FEATURE=\"cross-cut,structured output\" # AND of substrings"; \ printf ' %s\n' "make run-provider-harness-test RERUN_FAILED=1 # triage iteration loop"; \ printf ' %s\n' "make run-provider-harness-test PROVIDER=anthropic RERUN_FAILED=1 # anthropic failures only"; \ printf ' %s\n' "make run-provider-harness-test PROVIDER=passthrough # passthrough sweep (incl. Bedrock SigV4)"; \ @@ -1578,11 +1699,13 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov printf ' %-30s %s\n' "tmp/newman-cli.log" "Captured newman CLI output (stdout+stderr)."; \ printf ' %-30s %s\n' "tmp/harness-failures.md" "Categorized failure analyzer output + coverage matrices."; \ printf ' %-30s %s\n' "tmp/bifrost-dev.log" "Bifrost runtime log (only if we auto-started it)."; \ + printf ' %-30s %s\n' "tmp/harness-augmented.json" "Provider harness plus generated streaming/thinking rows."; \ printf ' %-30s %s\n' "tmp/harness-filtered.json" "Filtered collection (only if PROVIDER/FEATURE/RERUN_FAILED set)."; \ printf ' %-30s %s\n' "tmp/newman-report-

.json" "Per-provider newman report (parallel mode only)."; \ printf ' %-30s %s\n' "tmp/newman-cli-

.log" "Per-provider newman stdout/stderr (parallel mode only)."; \ printf ' %-30s %s\n' "tmp/parallel-status" "Per-provider pass/fail summary (parallel mode only)."; \ printf ' %-30s %s\n' "tmp/newman-report.html" "htmlextra report (sequential mode only — PARALLEL=0)."; \ + printf ' %-30s %s\n' "tmp/stream-cancel-report.json" "Server-side stream cancellation probe report."; \ printf '\n'; \ exit 0; \ fi @@ -1594,6 +1717,11 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov VIEWER_PORT_VAL="$(or $(VIEWER_PORT),8090)"; \ STARTED_BY_US=0; \ cleanup() { \ + if [ -f tmp/harness-monitor.pid ]; then \ + MPID=$$(cat tmp/harness-monitor.pid); \ + kill $$MPID 2>/dev/null; \ + rm -f tmp/harness-monitor.pid; \ + fi; \ if [ -f tmp/harness-viewer.pid ]; then \ VPID=$$(cat tmp/harness-viewer.pid); \ kill $$VPID 2>/dev/null; \ @@ -1624,6 +1752,25 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov fi; \ }; \ trap cleanup EXIT INT TERM HUP; \ + PICKED_FEATURES=""; \ + if [ -t 0 ] && [ -t 1 ] && [ -z "$$CI" ] && [ -z "$(CI)" ] \ + && [ -z "$(PROVIDER)" ] && [ -z "$(FEATURE)" ] && [ -z "$(FOLDER)" ] \ + && [ -z "$(RERUN_FAILED)" ]; then \ + $(USE_NODE); \ + PICKED_FEATURES=$$(node tests/e2e/api/runners/pick-features.mjs); \ + PICK_RC=$$?; \ + case $$PICK_RC in \ + 0) ;; \ + 1) $(ECHO) "$(YELLOW)Cancelled.$(NC)"; exit 1 ;; \ + 2) ;; \ + *) exit $$PICK_RC ;; \ + esac; \ + if [ -n "$$PICKED_FEATURES" ]; then \ + $(ECHO) "$(GREEN)Modalities: $$PICKED_FEATURES$(NC)"; \ + else \ + $(ECHO) "$(GREEN)Modalities: all (no filter)$(NC)"; \ + fi; \ + fi; \ if curl -fsS --max-time 2 "$$BASE_URL_VAL/health" > /dev/null 2>&1; then \ $(ECHO) "$(GREEN)Bifrost already running at $$BASE_URL_VAL$(NC)"; \ else \ @@ -1643,14 +1790,21 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov exit 1; \ fi; \ fi; \ - COLLECTION_FILE="tests/e2e/api/collections/provider-harness.json"; \ - if [ -n "$(PROVIDER)" ] || [ -n "$(FEATURE)" ] || [ -n "$(RERUN_FAILED)" ]; then \ - $(ECHO) "$(CYAN)Filtering collection (provider=$(PROVIDER), feature=$(FEATURE), rerun-failed=$(RERUN_FAILED))...$(NC)"; \ + $(ECHO) "$(CYAN)Augmenting provider harness with generated streaming/thinking cases...$(NC)"; \ + $(USE_NODE); node tests/e2e/api/runners/augment-provider-harness.mjs \ + --source tests/e2e/api/collections/provider-harness.json \ + --out tmp/harness-augmented.json || { $(ECHO) "$(RED)Harness augmentation failed$(NC)"; exit 1; }; \ + COLLECTION_FILE="tmp/harness-augmented.json"; \ + FEATURE_ANY_FLAG=""; \ + if [ -n "$$PICKED_FEATURES" ]; then FEATURE_ANY_FLAG="--feature-any $$PICKED_FEATURES"; fi; \ + if [ -n "$(PROVIDER)" ] || [ -n "$(FEATURE)" ] || [ -n "$(RERUN_FAILED)" ] || [ -n "$$PICKED_FEATURES" ]; then \ + $(ECHO) "$(CYAN)Filtering collection (provider=$(PROVIDER), feature=$(FEATURE), feature-any=$$PICKED_FEATURES, rerun-failed=$(RERUN_FAILED))...$(NC)"; \ $(USE_NODE); node tests/e2e/api/runners/filter-collection.mjs \ - --source tests/e2e/api/collections/provider-harness.json \ + --source "$$COLLECTION_FILE" \ --out tmp/harness-filtered.json \ $(if $(PROVIDER),--provider $(PROVIDER),) \ $(if $(FEATURE),--feature "$(FEATURE)",) \ + $$FEATURE_ANY_FLAG \ $(if $(RERUN_FAILED),--rerun-failed --report tmp/newman-report.json,) || { $(ECHO) "$(RED)Filter step failed$(NC)"; exit 1; }; \ COLLECTION_FILE="tmp/harness-filtered.json"; \ fi; \ @@ -1694,20 +1848,36 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov $(ECHO) "$(RED)No provider runs were launched. Check PROVIDER/FEATURE/FOLDER filters.$(NC)"; \ exit 1; \ fi; \ + if [ -t 1 ] && [ -z "$$CI" ] && [ -z "$(CI)" ]; then \ + $(USE_NODE); node tests/e2e/api/runners/harness-monitor.mjs \ + --mode parallel \ + --providers "$$PROVIDERS" \ + --tmp-dir tmp \ + --status-file tmp/parallel-status \ + --launched $$LAUNCHED \ + < /dev/null > /dev/tty 2>&1 & \ + echo $$! > tmp/harness-monitor.pid; \ + fi; \ PFAILED=0; \ while read pidp; do \ pid="$${pidp%%:*}"; \ p="$${pidp#*:}"; \ if wait "$$pid"; then \ echo "$$p:pass" >> tmp/parallel-status; \ - $(ECHO) "$(GREEN)[$$p] passed$(NC)"; \ + if [ ! -f tmp/harness-monitor.pid ]; then $(ECHO) "$(GREEN)[$$p] passed$(NC)"; fi; \ else \ echo "$$p:fail" >> tmp/parallel-status; \ - $(ECHO) "$(RED)[$$p] failed$(NC)"; \ + if [ ! -f tmp/harness-monitor.pid ]; then $(ECHO) "$(RED)[$$p] failed$(NC)"; fi; \ PFAILED=$$((PFAILED+1)); \ fi; \ - tail -n 20 "tmp/newman-cli-$$p.log" 2>/dev/null; \ + if [ ! -f tmp/harness-monitor.pid ]; then tail -n 20 "tmp/newman-cli-$$p.log" 2>/dev/null; fi; \ done < tmp/parallel-pids; \ + if [ -f tmp/harness-monitor.pid ]; then \ + MPID=$$(cat tmp/harness-monitor.pid); \ + kill -TERM $$MPID 2>/dev/null; \ + wait $$MPID 2>/dev/null || true; \ + rm -f tmp/harness-monitor.pid; \ + fi; \ $(ECHO) "$(CYAN)Merging per-provider reports into tmp/newman-report.json...$(NC)"; \ if command -v jq >/dev/null 2>&1 && ls tmp/newman-report-*.json >/dev/null 2>&1; then \ jq -s '{collection: (.[0].collection // {}), environment: (.[0].environment // {}), run: {executions: [.[].run.executions[]?], failures: [.[].run.failures[]?], stats: {iterations: {total: 1, pending: 0, failed: 0}, items: {total: ([.[].run.stats.items.total // 0] | add)}, requests: {total: ([.[].run.stats.requests.total // 0] | add), failed: ([.[].run.stats.requests.failed // 0] | add)}}, timings: (.[0].run.timings // {})}}' tmp/newman-report-*.json > tmp/newman-report.json || $(ECHO) "$(YELLOW)Report merge failed; per-provider reports remain at tmp/newman-report-*.json$(NC)"; \ @@ -1727,20 +1897,62 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov done < tmp/parallel-status; \ NEWMAN_EXIT=$$PFAILED; \ else \ - newman run "$$COLLECTION_FILE" \ - --env-var "baseUrl=$$BASE_URL_VAL" \ - $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_PREVIEW)),--env-var "include_preview=1",) \ - $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_SKIP)),--env-var "include_skip=1",) \ - $(if $(ENV_FILE),--environment $(ENV_FILE),) \ - $(if $(FOLDER),--folder "$(FOLDER)",) \ - --reporters cli,json,htmlextra \ - --reporter-json-export tmp/newman-report.json \ - --reporter-htmlextra-export tmp/newman-report.html \ - --reporter-htmlextra-title "Bifrost Provider Harness" \ - --reporter-htmlextra-darkTheme 2>&1 | tee tmp/newman-cli.log; \ - NEWMAN_EXIT=$$?; \ + SEQ_PROVIDERS="$(PROVIDER)"; \ + if [ -z "$$SEQ_PROVIDERS" ]; then SEQ_PROVIDERS="openai anthropic bedrock gemini vertex azure passthrough"; fi; \ + if [ -t 1 ] && [ -z "$$CI" ] && [ -z "$(CI)" ]; then \ + : > tmp/newman-cli.log; \ + $(USE_NODE); node tests/e2e/api/runners/harness-monitor.mjs \ + --mode sequential \ + --providers "$$SEQ_PROVIDERS" \ + --tmp-dir tmp \ + --log tmp/newman-cli.log \ + < /dev/null > /dev/tty 2>&1 & \ + echo $$! > tmp/harness-monitor.pid; \ + newman run "$$COLLECTION_FILE" \ + --env-var "baseUrl=$$BASE_URL_VAL" \ + $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_PREVIEW)),--env-var "include_preview=1",) \ + $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_SKIP)),--env-var "include_skip=1",) \ + $(if $(ENV_FILE),--environment $(ENV_FILE),) \ + $(if $(FOLDER),--folder "$(FOLDER)",) \ + --reporters cli,json,htmlextra \ + --reporter-json-export tmp/newman-report.json \ + --reporter-htmlextra-export tmp/newman-report.html \ + --reporter-htmlextra-title "Bifrost Provider Harness" \ + --reporter-htmlextra-darkTheme > tmp/newman-cli.log 2>&1; \ + NEWMAN_EXIT=$$?; \ + if [ -f tmp/harness-monitor.pid ]; then \ + MPID=$$(cat tmp/harness-monitor.pid); \ + kill -TERM $$MPID 2>/dev/null; \ + wait $$MPID 2>/dev/null || true; \ + rm -f tmp/harness-monitor.pid; \ + fi; \ + else \ + newman run "$$COLLECTION_FILE" \ + --env-var "baseUrl=$$BASE_URL_VAL" \ + $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_PREVIEW)),--env-var "include_preview=1",) \ + $(if $(filter 1 true TRUE yes YES y Y,$(INCLUDE_SKIP)),--env-var "include_skip=1",) \ + $(if $(ENV_FILE),--environment $(ENV_FILE),) \ + $(if $(FOLDER),--folder "$(FOLDER)",) \ + --reporters cli,json,htmlextra \ + --reporter-json-export tmp/newman-report.json \ + --reporter-htmlextra-export tmp/newman-report.html \ + --reporter-htmlextra-title "Bifrost Provider Harness" \ + --reporter-htmlextra-darkTheme 2>&1 | tee tmp/newman-cli.log; \ + NEWMAN_EXIT=$$?; \ + fi; \ fi; \ $(ECHO) "$(GREEN)Newman finished. Reports: tmp/newman-report.{json,html} + tmp/newman-cli.log$(NC)"; \ + STREAM_CANCEL_EXIT=0; \ + if [ -z "$(SKIP_STREAM_CANCEL)" ] && [ -z "$(RERUN_FAILED)" ] && [ "$(PROVIDER)" != "passthrough" ] && { [ -z "$(FOLDER)" ] || printf '%s' "$(FOLDER)" | grep -qi 'stream'; }; then \ + $(ECHO) "$(CYAN)Running stream cancellation probes...$(NC)"; \ + $(USE_NODE); node tests/e2e/api/runners/run-stream-cancellation.mjs \ + --base-url "$$BASE_URL_VAL" \ + $(if $(PROVIDER),--provider "$(PROVIDER)",) \ + --out tmp/stream-cancel-report.json 2>&1 | tee tmp/stream-cancel-cli.log; \ + STREAM_CANCEL_EXIT=$$?; \ + else \ + $(ECHO) "$(YELLOW)Skipping stream cancellation probes (SKIP_STREAM_CANCEL/RERUN_FAILED/FOLDER filter).$(NC)"; \ + fi; \ $(ECHO) "$(CYAN)Analyzing failures...$(NC)"; \ $(USE_NODE); node tests/e2e/api/runners/analyze-failures.mjs \ --report tmp/newman-report.json \ @@ -1764,4 +1976,5 @@ run-provider-harness-test: $(if $(HELP),,install-newman) ## Run the Bifrost prov $(ECHO) "$(GREEN)Viewer closed.$(NC)"; \ fi; \ fi; \ - exit $$NEWMAN_EXIT + if [ "$$NEWMAN_EXIT" -ne 0 ]; then exit $$NEWMAN_EXIT; fi; \ + exit $$STREAM_CANCEL_EXIT diff --git a/core/bifrost.go b/core/bifrost.go index 5fc49bef91..a696c15235 100644 --- a/core/bifrost.go +++ b/core/bifrost.go @@ -4051,6 +4051,31 @@ func (bifrost *Bifrost) SelectKeyForProviderRequestType(ctx *schemas.BifrostCont return bifrost.keySelector(ctx, supportedKeys, providerKey, model) } +// ComputeRawStorageForProvider determines whether raw request/response payloads should be +// captured and stored in log records for the given provider. This is the same computation +// performed inside executeRequest (lines 5675-5713), exported for callers that bypass +// the normal inference path (e.g. realtime WebSocket/WebRTC sessions). +func (bifrost *Bifrost) ComputeRawStorageForProvider(ctx *schemas.BifrostContext, providerKey schemas.ModelProvider) bool { + if ctx == nil { + ctx = bifrost.ctx + } + if ctx == nil { + return false + } + config, err := bifrost.account.GetConfigForProvider(providerKey) + if err != nil || config == nil { + return false + } + effectiveStore := config.StoreRawRequestResponse + allowStorageOverride, _ := ctx.Value(schemas.BifrostContextKeyAllowPerRequestStorageOverride).(bool) + if allowStorageOverride { + if override, ok := ctx.Value(schemas.BifrostContextKeyStoreRawRequestResponse).(bool); ok { + effectiveStore = override + } + } + return effectiveStore +} + // WSStreamHooks holds the post-hook runner and cleanup function returned by RunStreamPreHooks. // Call PostHookRunner for each streaming chunk, setting StreamEndIndicator on the final chunk. // Call Cleanup when done to release the pipeline back to the pool. diff --git a/core/changelog.md b/core/changelog.md index e69de29bb2..06c408aa46 100644 --- a/core/changelog.md +++ b/core/changelog.md @@ -0,0 +1,2 @@ +[fix]: openai provider - add usage to completed event in responses to chat completions fallback [@kevinpdev](https://github.com/kevinpdev) +[feat]: use chat completions for openai custom providers that disable responses [@kevinpdev](https://github.com/kevinpdev) diff --git a/core/internal/llmtests/realtime.go b/core/internal/llmtests/realtime.go index 400f5f9cda..e024ffb8c5 100644 --- a/core/internal/llmtests/realtime.go +++ b/core/internal/llmtests/realtime.go @@ -49,7 +49,10 @@ func RunRealtimeTest(t *testing.T, client *bifrost.Bifrost, ctx context.Context, } wsURL := rtProvider.RealtimeWebSocketURL(key, testConfig.RealtimeModel) - hdrs := rtProvider.RealtimeHeaders(key) + hdrs, headerErr := rtProvider.RealtimeHeaders(bfCtx, key) + if headerErr != nil { + t.Fatalf("failed to build realtime headers for provider %s: %v", testConfig.Provider, headerErr) + } httpHeaders := http.Header{} for k, v := range hdrs { diff --git a/core/providers/anthropic/anthropic.go b/core/providers/anthropic/anthropic.go index e6f8ca62dd..6ce5950a52 100644 --- a/core/providers/anthropic/anthropic.go +++ b/core/providers/anthropic/anthropic.go @@ -564,8 +564,6 @@ func (provider *AnthropicProvider) ChatCompletionStream(ctx *schemas.BifrostCont headers["x-api-key"] = key.Value.GetValue() } - providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, provider.networkConfig.StreamIdleTimeoutInSeconds) - // Use shared Anthropic streaming logic return HandleAnthropicChatCompletionStreaming( ctx, @@ -574,6 +572,7 @@ func (provider *AnthropicProvider) ChatCompletionStream(ctx *schemas.BifrostCont jsonData, headers, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -594,6 +593,7 @@ func HandleAnthropicChatCompletionStreaming( jsonBody []byte, headers map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, betaHeaderOverrides map[string]bool, sendBackRawRequest bool, sendBackRawResponse bool, @@ -603,6 +603,7 @@ func HandleAnthropicChatCompletionStreaming( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) req := fasthttp.AcquireRequest() resp := fasthttp.AcquireResponse() resp.StreamBody = true // Initialize for streaming @@ -636,7 +637,7 @@ func HandleAnthropicChatCompletionStreaming( providerUtils.DrainLargePayloadRemainder(ctx) } if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -658,7 +659,7 @@ func HandleAnthropicChatCompletionStreaming( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseAnthropicError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -683,7 +684,7 @@ func HandleAnthropicChatCompletionStreaming( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if resp.BodyStream() == nil { bifrostErr := providerUtils.NewBifrostOperationError( @@ -739,6 +740,10 @@ func HandleAnthropicChatCompletionStreaming( } eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading %s stream: %v", providerName, readErr) @@ -1030,8 +1035,6 @@ func (provider *AnthropicProvider) ResponsesStream(ctx *schemas.BifrostContext, headers["x-api-key"] = key.Value.GetValue() } - providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, provider.networkConfig.StreamIdleTimeoutInSeconds) - return HandleAnthropicResponsesStream( ctx, provider.streamingClient, @@ -1039,6 +1042,7 @@ func (provider *AnthropicProvider) ResponsesStream(ctx *schemas.BifrostContext, jsonBody, headers, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -1059,6 +1063,7 @@ func HandleAnthropicResponsesStream( jsonBody []byte, headers map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, betaHeaderOverrides map[string]bool, sendBackRawRequest bool, sendBackRawResponse bool, @@ -1068,6 +1073,7 @@ func HandleAnthropicResponsesStream( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) req := fasthttp.AcquireRequest() resp := fasthttp.AcquireResponse() resp.StreamBody = true @@ -1103,7 +1109,7 @@ func HandleAnthropicResponsesStream( providerUtils.DrainLargePayloadRemainder(ctx) } if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1125,7 +1131,7 @@ func HandleAnthropicResponsesStream( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseAnthropicError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -1150,7 +1156,7 @@ func HandleAnthropicResponsesStream( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // If body stream is nil, return an error if resp.BodyStream() == nil { bifrostErr := providerUtils.NewBifrostOperationError( @@ -1204,6 +1210,10 @@ func HandleAnthropicResponsesStream( } eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading %s stream: %v", providerName, readErr) @@ -2647,7 +2657,7 @@ func (provider *AnthropicProvider) PassthroughStream( activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.streamingClient, resp) if err := activeClient.Do(fasthttpReq, resp); err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -2669,7 +2679,7 @@ func (provider *AnthropicProvider) PassthroughStream( bodyStream := resp.BodyStream() if bodyStream == nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError( "provider returned an empty stream body", fmt.Errorf("provider returned an empty stream body"), @@ -2700,7 +2710,7 @@ func (provider *AnthropicProvider) PassthroughStream( } close(ch) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer stopIdleTimeout() defer stopCancellation() diff --git a/core/providers/anthropic/chat.go b/core/providers/anthropic/chat.go index 199d55b296..d9431bba07 100644 --- a/core/providers/anthropic/chat.go +++ b/core/providers/anthropic/chat.go @@ -745,6 +745,20 @@ func ToAnthropicChatRequest(ctx *schemas.BifrostContext, bifrostReq *schemas.Bif anthropicReq.Messages = anthropicMessages anthropicReq.System = systemContent + // Trim trailing whitespace from the last assistant message text blocks + // ContentStr is converted to a single text ContentBlock during message conversion + // so we trim the text of that block instead. + lastMsgIndex := len(anthropicReq.Messages) - 1 + if lastMsgIndex >= 0 && anthropicReq.Messages[lastMsgIndex].Role == AnthropicMessageRoleAssistant { + blocks := anthropicReq.Messages[lastMsgIndex].Content.ContentBlocks + for j := len(blocks) - 1; j >= 0; j-- { + if blocks[j].Type == AnthropicContentBlockTypeText && blocks[j].Text != nil { + anthropicReq.Messages[lastMsgIndex].Content.ContentBlocks[j].Text = schemas.Ptr(strings.TrimRight(*blocks[j].Text, " \n\r\t")) + break + } + } + } + // Strip request- and tool-level fields the target Anthropic-family // provider does not support. Fail-closed tool validation stays in // ValidateToolsForProvider; this is strip-silently for additive fields. diff --git a/core/providers/anthropic/responses.go b/core/providers/anthropic/responses.go index ce90ab2c2b..fe65e9bcb9 100644 --- a/core/providers/anthropic/responses.go +++ b/core/providers/anthropic/responses.go @@ -2738,9 +2738,8 @@ func (response *AnthropicMessageResponse) ToBifrostResponsesResponse(ctx *schema bifrostResp.Model = response.Model - // Preserve stop reason from Anthropic response if response.StopReason != "" { - bifrostResp.StopReason = schemas.Ptr(string(response.StopReason)) + bifrostResp.StopReason = schemas.Ptr(ConvertAnthropicFinishReasonToBifrost(response.StopReason)) } return bifrostResp @@ -3344,6 +3343,20 @@ func ConvertBifrostMessagesToAnthropicMessages(ctx *schemas.BifrostContext, bifr // Flush any remaining pending tool calls (with tracking) flushPendingToolCallsWithTracking() + // Trim trailing whitespace from the last assistant message + // ContentStr is converted to a single text ContentBlock during message conversion + // so we trim the text of that block instead. + lastMsgIndex := len(anthropicMessages) - 1 + if isRequestMessage && lastMsgIndex >= 0 && anthropicMessages[lastMsgIndex].Role == AnthropicMessageRoleAssistant { + blocks := anthropicMessages[lastMsgIndex].Content.ContentBlocks + for j := len(blocks) - 1; j >= 0; j-- { + if blocks[j].Type == AnthropicContentBlockTypeText && blocks[j].Text != nil { + anthropicMessages[lastMsgIndex].Content.ContentBlocks[j].Text = schemas.Ptr(strings.TrimRight(*blocks[j].Text, " \n\r\t")) + break + } + } + } + return anthropicMessages, systemContent } @@ -5957,4 +5970,4 @@ func generateSyntheticInputJSONDeltas(argumentsJSON string, contentIndex *int) [ } return events -} \ No newline at end of file +} diff --git a/core/providers/anthropic/types.go b/core/providers/anthropic/types.go index 35874f658a..4b4870c282 100644 --- a/core/providers/anthropic/types.go +++ b/core/providers/anthropic/types.go @@ -183,7 +183,9 @@ var ProviderFeatures = map[schemas.ModelProvider]ProviderFeatureSupport{ // WebSearch, CodeExecution, FastMode, TaskBudgets, AdvisorTool, // InferenceGeo, RedactThinking, AdvancedToolUse (full), PromptCachingScope. schemas.Bedrock: { - ComputerUse: true, Bash: true, Memory: true, TextEditor: true, ToolSearch: true, + WebSearch: true, + CodeExecution: true, + ComputerUse: true, Bash: true, Memory: true, TextEditor: true, ToolSearch: true, ContainerBasic: true, // StructuredOutputs: kept true to match pre-existing behavior and the // provider_feature_support_test.go assertion, but NEITHER B-header @@ -1200,18 +1202,18 @@ const ( type AnthropicToolName string const ( - AnthropicToolNameComputer AnthropicToolName = "computer" - AnthropicToolNameWebSearch AnthropicToolName = "web_search" - AnthropicToolNameWebFetch AnthropicToolName = "web_fetch" - AnthropicToolNameBash AnthropicToolName = "bash" - AnthropicToolNameTextEditor AnthropicToolName = "str_replace_based_edit_tool" + AnthropicToolNameComputer AnthropicToolName = "computer" + AnthropicToolNameWebSearch AnthropicToolName = "web_search" + AnthropicToolNameWebFetch AnthropicToolName = "web_fetch" + AnthropicToolNameBash AnthropicToolName = "bash" + AnthropicToolNameTextEditor AnthropicToolName = "str_replace_based_edit_tool" // AnthropicToolNameTextEditorLegacy is the name required for text_editor_20250124 // and text_editor_20250429. Newer text_editor_20250728+ use AnthropicToolNameTextEditor. AnthropicToolNameTextEditorLegacy AnthropicToolName = "str_replace_editor" - AnthropicToolNameCodeExecution AnthropicToolName = "code_execution" - AnthropicToolNameMemory AnthropicToolName = "memory" - AnthropicToolNameToolSearchBM25 AnthropicToolName = "tool_search_tool_bm25" - AnthropicToolNameToolSearchRegex AnthropicToolName = "tool_search_tool_regex" + AnthropicToolNameCodeExecution AnthropicToolName = "code_execution" + AnthropicToolNameMemory AnthropicToolName = "memory" + AnthropicToolNameToolSearchBM25 AnthropicToolName = "tool_search_tool_bm25" + AnthropicToolNameToolSearchRegex AnthropicToolName = "tool_search_tool_regex" ) type AnthropicToolComputerUse struct { diff --git a/core/providers/anthropic/utils.go b/core/providers/anthropic/utils.go index d26be4bef5..80b609f002 100644 --- a/core/providers/anthropic/utils.go +++ b/core/providers/anthropic/utils.go @@ -2654,6 +2654,14 @@ func convertResponsesTextConfigToAnthropicOutputFormat(textConfig *schemas.Respo schema["required"] = format.JSONSchema.Required } + if format.JSONSchema.Defs != nil { + schema["$defs"] = *format.JSONSchema.Defs + } + + if format.JSONSchema.Definitions != nil { + schema["definitions"] = *format.JSONSchema.Definitions + } + if format.JSONSchema.Type != nil && *format.JSONSchema.Type == "object" { schema["additionalProperties"] = false } else if format.JSONSchema.AdditionalProperties != nil { diff --git a/core/providers/anthropic/utils_test.go b/core/providers/anthropic/utils_test.go index 9117e1f9d3..cf7ce0373a 100644 --- a/core/providers/anthropic/utils_test.go +++ b/core/providers/anthropic/utils_test.go @@ -577,6 +577,143 @@ func TestConvertChatResponseFormatToAnthropicOutputFormat(t *testing.T) { } } +func TestConvertResponsesTextConfigToAnthropicOutputFormatPreservesSchemaRefs(t *testing.T) { + schemaType := "object" + properties := map[string]interface{}{ + "record": map[string]interface{}{ + "$ref": "#/$defs/Document", + }, + } + defs := map[string]interface{}{ + "Document": map[string]interface{}{ + "type": "object", + "properties": map[string]interface{}{ + "title": map[string]interface{}{"type": "string"}, + "authors": map[string]interface{}{ + "type": "array", + "items": map[string]interface{}{ + "$ref": "#/$defs/Person", + }, + }, + }, + "required": []interface{}{"title", "authors"}, + }, + "Person": map[string]interface{}{ + "type": "object", + "properties": map[string]interface{}{ + "name": map[string]interface{}{"type": "string"}, + "email": map[string]interface{}{"type": []interface{}{"string", "null"}}, + }, + "required": []interface{}{"name", "email"}, + }, + } + + result := convertResponsesTextConfigToAnthropicOutputFormat(&schemas.ResponsesTextConfig{ + Format: &schemas.ResponsesTextConfigFormat{ + Type: "json_schema", + JSONSchema: &schemas.ResponsesTextConfigFormatJSONSchema{ + Type: &schemaType, + Properties: &properties, + Required: []string{"record"}, + Defs: &defs, + }, + }, + }) + if result == nil { + t.Fatal("expected output format") + } + + var output map[string]interface{} + if err := sonic.Unmarshal(result, &output); err != nil { + t.Fatalf("failed to unmarshal output format: %v", err) + } + + if output["type"] != "json_schema" { + t.Fatalf("expected json_schema type, got %v", output["type"]) + } + + schema, ok := output["schema"].(map[string]interface{}) + if !ok { + t.Fatalf("expected schema map, got %T", output["schema"]) + } + if schema["additionalProperties"] != false { + t.Fatalf("expected additionalProperties=false, got %v", schema["additionalProperties"]) + } + if _, ok := schema["$defs"].(map[string]interface{}); !ok { + t.Fatalf("expected $defs to be preserved, got %v", schema["$defs"]) + } + + outputProperties, ok := schema["properties"].(map[string]interface{}) + if !ok { + t.Fatalf("expected properties map, got %T", schema["properties"]) + } + recordSchema, ok := outputProperties["record"].(map[string]interface{}) + if !ok { + t.Fatalf("expected record schema map, got %T", outputProperties["record"]) + } + if recordSchema["$ref"] != "#/$defs/Document" { + t.Fatalf("expected record $ref to be preserved, got %v", recordSchema["$ref"]) + } +} + +func TestConvertResponsesTextConfigToAnthropicOutputFormatPreservesLegacyDefinitions(t *testing.T) { + schemaType := "object" + properties := map[string]interface{}{ + "record": map[string]interface{}{ + "$ref": "#/definitions/Document", + }, + } + definitions := map[string]interface{}{ + "Document": map[string]interface{}{ + "type": "object", + "properties": map[string]interface{}{ + "title": map[string]interface{}{"type": "string"}, + }, + "required": []interface{}{"title"}, + }, + } + + result := convertResponsesTextConfigToAnthropicOutputFormat(&schemas.ResponsesTextConfig{ + Format: &schemas.ResponsesTextConfigFormat{ + Type: "json_schema", + JSONSchema: &schemas.ResponsesTextConfigFormatJSONSchema{ + Type: &schemaType, + Properties: &properties, + Required: []string{"record"}, + Definitions: &definitions, + }, + }, + }) + if result == nil { + t.Fatal("expected output format") + } + + var output map[string]interface{} + if err := sonic.Unmarshal(result, &output); err != nil { + t.Fatalf("failed to unmarshal output format: %v", err) + } + + schema, ok := output["schema"].(map[string]interface{}) + if !ok { + t.Fatalf("expected schema map, got %T", output["schema"]) + } + if _, ok := schema["definitions"].(map[string]interface{}); !ok { + t.Fatalf("expected definitions to be preserved, got %v", schema["definitions"]) + } + + outputProperties, ok := schema["properties"].(map[string]interface{}) + if !ok { + t.Fatalf("expected properties map, got %T", schema["properties"]) + } + recordSchema, ok := outputProperties["record"].(map[string]interface{}) + if !ok { + t.Fatalf("expected record schema map, got %T", outputProperties["record"]) + } + if recordSchema["$ref"] != "#/definitions/Document" { + t.Fatalf("expected record $ref to be preserved, got %v", recordSchema["$ref"]) + } +} + func TestValidateToolsForProvider(t *testing.T) { tests := []struct { name string @@ -2691,3 +2828,56 @@ func TestIsClaudeCodeRequest(t *testing.T) { }) } } + +// TestBudgetTokensNeverExceedsMaxTokens verifies the strict budget_tokens < max_tokens +// invariant required by both Anthropic and Bedrock for all effort levels. +func TestBudgetTokensNeverExceedsMaxTokens(t *testing.T) { + const minBudget = MinimumReasoningMaxTokens // 1024 + maxTokensValues := []int{1025, 4096, 16000, 32000, 64000, 128000} + efforts := []string{"minimal", "low", "medium", "high", "xhigh", "max"} + + for _, maxTok := range maxTokensValues { + for _, effort := range efforts { + t.Run(fmt.Sprintf("effort=%s/maxTokens=%d", effort, maxTok), func(t *testing.T) { + budget, err := providerUtils.GetBudgetTokensFromReasoningEffort(effort, minBudget, maxTok) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if budget >= maxTok { + t.Errorf("effort=%q maxTokens=%d: budget_tokens=%d violates strict budget_tokens < max_tokens", + effort, maxTok, budget) + } + }) + } + } +} + +// TestBudgetTokensMaxEffortCapsBelowMaxTokens specifically pins the "max" effort +// behavior: ratio=1.0 would produce budget==maxTokens without the cap, which both +// Anthropic and Bedrock reject ("max_tokens must be greater than thinking.budget_tokens"). +func TestBudgetTokensMaxEffortCapsBelowMaxTokens(t *testing.T) { + const minBudget = MinimumReasoningMaxTokens + + cases := []struct { + maxTokens int + wantBudget int + }{ + {maxTokens: 16000, wantBudget: 15999}, + {maxTokens: 32000, wantBudget: 31999}, + {maxTokens: 64000, wantBudget: 63999}, + {maxTokens: 128000, wantBudget: 127999}, + } + + for _, tc := range cases { + t.Run(fmt.Sprintf("maxTokens=%d", tc.maxTokens), func(t *testing.T) { + budget, err := providerUtils.GetBudgetTokensFromReasoningEffort("max", minBudget, tc.maxTokens) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if budget != tc.wantBudget { + t.Errorf("max effort with maxTokens=%d: got budget=%d, want %d", + tc.maxTokens, budget, tc.wantBudget) + } + }) + } +} diff --git a/core/providers/azure/azure.go b/core/providers/azure/azure.go index 6e2caa8d29..9fcec1ed98 100644 --- a/core/providers/azure/azure.go +++ b/core/providers/azure/azure.go @@ -33,6 +33,9 @@ const AzureAuthorizationTokenKey schemas.BifrostContextKey = "azure-authorizatio // DefaultAzureScope is the default scope for Azure authentication. const DefaultAzureScope = "https://cognitiveservices.azure.com/.default" +// DefaultAzureSorageScope is the default scope for Azure storage. +const DefaultAzureStorageScope = "https://storage.azure.com/.default" + // AzureProvider implements the Provider interface for Azure's API. type AzureProvider struct { logger schemas.Logger // Logger for provider operations @@ -491,6 +494,7 @@ func (provider *AzureProvider) TextCompletionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -637,6 +641,7 @@ func (provider *AzureProvider) ChatCompletionStream(ctx *schemas.BifrostContext, jsonData, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -665,6 +670,7 @@ func (provider *AzureProvider) ChatCompletionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -792,6 +798,7 @@ func (provider *AzureProvider) ResponsesStream(ctx *schemas.BifrostContext, post jsonData, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -816,6 +823,7 @@ func (provider *AzureProvider) ResponsesStream(ctx *schemas.BifrostContext, post request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -1006,7 +1014,7 @@ func (provider *AzureProvider) SpeechStream(ctx *schemas.BifrostContext, postHoo // Make the request requestErr := provider.client.Do(req, resp) if requestErr != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(requestErr, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1028,7 +1036,7 @@ func (provider *AzureProvider) SpeechStream(ctx *schemas.BifrostContext, postHoo // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, openai.ParseOpenAIError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -1049,7 +1057,7 @@ func (provider *AzureProvider) SpeechStream(ctx *schemas.BifrostContext, postHoo close(responseChan) }() // Always release response on exit; bodyStream close should prevent indefinite blocking. - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -1334,6 +1342,7 @@ func (provider *AzureProvider) ImageGenerationStream( request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -1406,6 +1415,7 @@ func (provider *AzureProvider) ImageEditStream(ctx *schemas.BifrostContext, post request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, false, providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -2188,7 +2198,6 @@ func (provider *AzureProvider) BatchCreate(ctx *schemas.BifrostContext, key sche inputFileID = uploadResp.ID } - // Validate that we have a file ID (either provided or uploaded) if inputFileID == "" && request.InputBlob == nil { return nil, providerUtils.NewBifrostOperationError("either input_file_id, input_blob, or requests array is required for Azure batch API", nil) @@ -2614,10 +2623,128 @@ func (provider *AzureProvider) BatchDelete(ctx *schemas.BifrostContext, keys []s return nil, providerUtils.NewUnsupportedOperationError(schemas.BatchDeleteRequest, schemas.Azure) } -// BatchResults retrieves batch results from Azure OpenAI by trying each key until successful. -// For Azure (like OpenAI), batch results are obtained by downloading the output_file_id. +// getBlobStorageTokenForKey returns a Bearer token scoped to Azure Blob Storage for a single key. +func (provider *AzureProvider) getBlobStorageTokenForKey(ctx *schemas.BifrostContext, key schemas.Key) (string, *schemas.BifrostError) { + if key.AzureKeyConfig == nil { + return "", nil + } + cfg := key.AzureKeyConfig + + if cfg.ClientID != nil && cfg.ClientSecret != nil && cfg.TenantID != nil && + cfg.ClientID.GetValue() != "" && cfg.ClientSecret.GetValue() != "" && cfg.TenantID.GetValue() != "" { + cred, err := provider.getOrCreateAuth(cfg.TenantID.GetValue(), cfg.ClientID.GetValue(), cfg.ClientSecret.GetValue()) + if err != nil { + return "", providerUtils.NewProviderAPIError("failed to acquire Azure SP credentials for blob storage", err, http.StatusUnauthorized, nil, nil) + } + token, err := cred.GetToken(ctx, policy.TokenRequestOptions{Scopes: []string{DefaultAzureStorageScope}}) + if err != nil { + return "", providerUtils.NewProviderAPIError("failed to get Azure SP token for blob storage", err, http.StatusUnauthorized, nil, nil) + } + if token.Token == "" { + return "", providerUtils.NewProviderAPIError("Azure SP token for blob storage is empty", nil, http.StatusUnauthorized, nil, nil) + } + return token.Token, nil + } + + // No SP credentials: try DefaultAzureCredential (managed identity, workload identity, env vars, etc.). + // Failure is silent — ambient auth simply not available for this key. + cred, err := provider.getOrCreateDefaultAzureCredential() + if err != nil { + return "", nil + } + token, err := cred.GetToken(ctx, policy.TokenRequestOptions{Scopes: []string{DefaultAzureStorageScope}}) + if err != nil || token.Token == "" { + return "", nil + } + return token.Token, nil +} + +// isTrustedAzureBlobHost returns true if the host is a recognized Azure Blob Storage domain. +func isTrustedAzureBlobHost(host string) bool { + return strings.HasSuffix(host, ".blob.core.windows.net") || + strings.HasSuffix(host, ".dfs.core.windows.net") +} + +// downloadBlobURL fetches the content of an Azure Blob Storage URL, trying each key's +// credentials in sequence until a download succeeds — mirroring how FileContent loops keys. +// SAS URLs (containing "sig=") are fetched in a single unauthenticated attempt since the +// token in the URL already grants access. +func (provider *AzureProvider) downloadBlobURL(ctx *schemas.BifrostContext, blobURL string, keys []schemas.Key) ([]byte, int64, *schemas.BifrostError) { + // Validate host for all blob URLs before any outbound request + parsed, parseErr := url.Parse(blobURL) + if parseErr != nil || parsed.Scheme != "https" || !isTrustedAzureBlobHost(parsed.Hostname()) { + return nil, 0, providerUtils.NewBifrostOperationError( + fmt.Sprintf("blob URL is not a trusted Azure Blob Storage endpoint: %s", blobURL), nil, + ) + } + + // SAS URL: credentials are embedded + if strings.Contains(blobURL, "sig=") { + return provider.doGetBlob(ctx, blobURL, "") + } + + // Plain URL: try each key's storage credentials until one succeeds. + var lastErr *schemas.BifrostError + for _, key := range keys { + token, tokenErr := provider.getBlobStorageTokenForKey(ctx, key) + if tokenErr != nil { + lastErr = tokenErr + continue + } + if token == "" { + continue + } + content, latency, err := provider.doGetBlob(ctx, blobURL, token) + if err == nil { + return content, latency, nil + } + lastErr = err + } + + if lastErr != nil { + return nil, 0, lastErr + } + return nil, 0, providerUtils.NewBifrostOperationError("no Azure keys available for blob download", nil) +} + +// doGetBlob performs a single GET request to a blob URL, optionally adding a Bearer token. +func (provider *AzureProvider) doGetBlob(ctx *schemas.BifrostContext, blobURL string, bearerToken string) ([]byte, int64, *schemas.BifrostError) { + req := fasthttp.AcquireRequest() + resp := fasthttp.AcquireResponse() + defer fasthttp.ReleaseRequest(req) + defer fasthttp.ReleaseResponse(resp) + + req.SetRequestURI(blobURL) + req.Header.SetMethod(http.MethodGet) + if bearerToken != "" { + req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", bearerToken)) + req.Header.Set("x-ms-version", "2020-04-08") + } + + latency, bifrostErr, wait := providerUtils.MakeRequestWithContext(ctx, provider.client, req, resp) + defer wait() + if bifrostErr != nil { + return nil, 0, bifrostErr + } + + if resp.StatusCode() != fasthttp.StatusOK { + return nil, 0, providerUtils.NewBifrostOperationError( + fmt.Sprintf("blob download failed with status %d", resp.StatusCode()), nil, + ) + } + + body, err := providerUtils.CheckAndDecodeBody(resp) + if err != nil { + return nil, 0, providerUtils.NewBifrostOperationError(schemas.ErrProviderResponseDecode, err) + } + + return append([]byte(nil), body...), latency.Milliseconds(), nil +} + +// BatchResults retrieves batch results from Azure OpenAI. +// For file-based batches it downloads via output_file_id using the Files API. +// For blob-based batches it fetches the output_blob URL directly using Azure Storage credentials. func (provider *AzureProvider) BatchResults(ctx *schemas.BifrostContext, keys []schemas.Key, request *schemas.BifrostBatchResultsRequest) (*schemas.BifrostBatchResultsResponse, *schemas.BifrostError) { - // First, retrieve the batch to get the output_file_id (using all keys) batchResp, bifrostErr := provider.BatchRetrieve(ctx, keys, &schemas.BifrostBatchRetrieveRequest{ Provider: request.Provider, BatchID: request.BatchID, @@ -2626,23 +2753,35 @@ func (provider *AzureProvider) BatchResults(ctx *schemas.BifrostContext, keys [] return nil, bifrostErr } - if batchResp.OutputFileID == nil || *batchResp.OutputFileID == "" { - return nil, providerUtils.NewBifrostOperationError("batch results not available: output_file_id is empty (batch may not be completed)", nil) - } + var content []byte + var latencyMs int64 - // Download the output file content (using all keys) - fileContentResp, bifrostErr := provider.FileContent(ctx, keys, &schemas.BifrostFileContentRequest{ - Provider: request.Provider, - FileID: *batchResp.OutputFileID, - }) - if bifrostErr != nil { - return nil, bifrostErr + switch { + case batchResp.OutputFileID != nil && *batchResp.OutputFileID != "": + fileContentResp, err := provider.FileContent(ctx, keys, &schemas.BifrostFileContentRequest{ + Provider: request.Provider, + FileID: *batchResp.OutputFileID, + }) + if err != nil { + return nil, err + } + content = fileContentResp.Content + latencyMs = fileContentResp.ExtraFields.Latency + + case batchResp.OutputBlob != nil && *batchResp.OutputBlob != "": + blobContent, blobLatency, err := provider.downloadBlobURL(ctx, *batchResp.OutputBlob, keys) + if err != nil { + return nil, err + } + content = blobContent + latencyMs = blobLatency + + default: + return nil, providerUtils.NewBifrostOperationError("batch results not available: neither output_file_id nor output_blob is set (batch may not be completed yet)", nil) } - // Parse JSONL content - each line is a separate result var results []schemas.BatchResultItem - - parseResult := providerUtils.ParseJSONL(fileContentResp.Content, func(line []byte) error { + parseResult := providerUtils.ParseJSONL(content, func(line []byte) error { var resultItem schemas.BatchResultItem if err := sonic.Unmarshal(line, &resultItem); err != nil { provider.logger.Warn("failed to parse batch result line: %v", err) @@ -2656,7 +2795,7 @@ func (provider *AzureProvider) BatchResults(ctx *schemas.BifrostContext, keys [] BatchID: request.BatchID, Results: results, ExtraFields: schemas.BifrostResponseExtraFields{ - Latency: fileContentResp.ExtraFields.Latency, + Latency: latencyMs, }, } @@ -3597,7 +3736,7 @@ func (provider *AzureProvider) PassthroughStream( startTime := time.Now() if err := activeClient.Do(fasthttpReq, resp); err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -3619,7 +3758,7 @@ func (provider *AzureProvider) PassthroughStream( rawBodyStream := resp.BodyStream() if rawBodyStream == nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError("provider returned an empty stream body", fmt.Errorf("provider returned an empty stream body")) } @@ -3642,7 +3781,7 @@ func (provider *AzureProvider) PassthroughStream( } close(ch) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer stopIdleTimeout() defer stopCancellation() diff --git a/core/providers/azure/realtime.go b/core/providers/azure/realtime.go new file mode 100644 index 0000000000..ae19471a00 --- /dev/null +++ b/core/providers/azure/realtime.go @@ -0,0 +1,383 @@ +package azure + +import ( + "bytes" + "encoding/json" + "fmt" + "mime/multipart" + "net/http" + "net/url" + "strings" + + openaiProvider "github.com/maximhq/bifrost/core/providers/openai" + providerUtils "github.com/maximhq/bifrost/core/providers/utils" + "github.com/maximhq/bifrost/core/schemas" + "github.com/valyala/fasthttp" +) + +// openAIEventHelper is a zero-value OpenAI provider used solely to delegate +// event conversion calls. Azure uses the exact same Realtime wire protocol as +// OpenAI, so all event parsing, serialisation, usage extraction, turn detection, +// and output extraction can be reused without modification. +var openAIEventHelper = &openaiProvider.OpenAIProvider{} + +// --------------------------------------------------------------------------- +// RealtimeProvider interface +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) SupportsRealtimeAPI() bool { + return true +} + +func (provider *AzureProvider) RealtimeWebSocketURL(key schemas.Key, model string) string { + endpoint := strings.TrimRight(key.AzureKeyConfig.Endpoint.GetValue(), "/") + endpoint = strings.Replace(endpoint, "https://", "wss://", 1) + endpoint = strings.Replace(endpoint, "http://", "ws://", 1) + + apiVersion := azureRealtimeAPIVersion(key) + + return fmt.Sprintf("%s/openai/v1/realtime?model=%s&api-version=%s", + endpoint, url.QueryEscape(model), url.QueryEscape(apiVersion)) +} + +func (provider *AzureProvider) RealtimeHeaders(ctx *schemas.BifrostContext, key schemas.Key) (map[string]string, *schemas.BifrostError) { + value := key.Value.GetValue() + + // Ephemeral tokens from /client_secrets use Bearer auth. + if strings.HasPrefix(value, "ek_") { + headers := map[string]string{ + "Authorization": "Bearer " + value, + } + for k, v := range provider.networkConfig.ExtraHeaders { + headers[k] = v + } + return headers, nil + } + + headers, authErr := provider.getAzureAuthHeaders(ctx, key, false) + if authErr != nil { + return nil, authErr + } + for k, v := range provider.networkConfig.ExtraHeaders { + headers[k] = v + } + return headers, nil +} + +func (provider *AzureProvider) SupportsRealtimeWebRTC() bool { + return true +} + +func (provider *AzureProvider) ExchangeRealtimeWebRTCSDP( + ctx *schemas.BifrostContext, + key schemas.Key, + model string, + sdp string, + session json.RawMessage, +) (string, *schemas.BifrostError) { + endpoint := strings.TrimRight(key.AzureKeyConfig.Endpoint.GetValue(), "/") + apiVersion := azureRealtimeAPIVersion(key) + + upstreamURL := fmt.Sprintf("%s/openai/v1/realtime?model=%s&api-version=%s", + endpoint, url.QueryEscape(model), url.QueryEscape(apiVersion)) + + // Build multipart body: sdp + optional session + bodyBuf := &bytes.Buffer{} + writer := multipart.NewWriter(bodyBuf) + if err := writer.WriteField("sdp", sdp); err != nil { + return "", newAzureRealtimeError(fasthttp.StatusInternalServerError, "server_error", "failed to encode upstream SDP body", err) + } + if session != nil { + if err := writer.WriteField("session", string(session)); err != nil { + return "", newAzureRealtimeError(fasthttp.StatusInternalServerError, "server_error", "failed to encode upstream session body", err) + } + } + if err := writer.Close(); err != nil { + return "", newAzureRealtimeError(fasthttp.StatusInternalServerError, "server_error", "failed to finalize upstream SDP body", err) + } + + req := fasthttp.AcquireRequest() + resp := fasthttp.AcquireResponse() + defer fasthttp.ReleaseRequest(req) + defer fasthttp.ReleaseResponse(resp) + + req.SetRequestURI(upstreamURL) + req.Header.SetMethod(http.MethodPost) + req.Header.SetContentType(writer.FormDataContentType()) + + // Ephemeral tokens (ek_*) need Bearer auth; regular API keys use api-key header. + value := key.Value.GetValue() + if strings.HasPrefix(value, "ek_") { + req.Header.Set("Authorization", "Bearer "+value) + } else { + authHeaders, authErr := provider.getAzureAuthHeaders(ctx, key, false) + if authErr != nil { + return "", authErr + } + for k, v := range authHeaders { + req.Header.Set(k, v) + } + } + + for k, v := range provider.networkConfig.ExtraHeaders { + req.Header.Set(k, v) + } + req.SetBody(bodyBuf.Bytes()) + + _, bifrostErr, wait := providerUtils.MakeRequestWithContext(ctx, provider.client, req, resp) + defer wait() + if bifrostErr != nil { + return "", bifrostErr + } + + answerBody := resp.Body() + if resp.StatusCode() < fasthttp.StatusOK || resp.StatusCode() >= fasthttp.StatusMultipleChoices { + return "", provider.realtimeWebRTCUpstreamError(ctx, resp.StatusCode(), answerBody) + } + + return string(answerBody), nil +} + +// --------------------------------------------------------------------------- +// Event conversion — delegates to OpenAI (same wire protocol) +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) ToBifrostRealtimeEvent(providerEvent json.RawMessage) (*schemas.BifrostRealtimeEvent, error) { + return openAIEventHelper.ToBifrostRealtimeEvent(providerEvent) +} + +func (provider *AzureProvider) ToProviderRealtimeEvent(bifrostEvent *schemas.BifrostRealtimeEvent) (json.RawMessage, error) { + return openAIEventHelper.ToProviderRealtimeEvent(bifrostEvent) +} + +// --------------------------------------------------------------------------- +// Turn lifecycle — delegates to OpenAI +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) ShouldStartRealtimeTurn(event *schemas.BifrostRealtimeEvent) bool { + return openAIEventHelper.ShouldStartRealtimeTurn(event) +} + +func (provider *AzureProvider) RealtimeTurnFinalEvent() schemas.RealtimeEventType { + return openAIEventHelper.RealtimeTurnFinalEvent() +} + +func (provider *AzureProvider) ShouldForwardRealtimeEvent(event *schemas.BifrostRealtimeEvent) bool { + return true +} + +func (provider *AzureProvider) ShouldAccumulateRealtimeOutput(eventType schemas.RealtimeEventType) bool { + return openAIEventHelper.ShouldAccumulateRealtimeOutput(eventType) +} + +func (provider *AzureProvider) RealtimeWebRTCDataChannelLabel() string { + return "oai-events" +} + +func (provider *AzureProvider) RealtimeWebSocketSubprotocol() string { + return "realtime" +} + +// --------------------------------------------------------------------------- +// RealtimeUsageExtractor — delegates to OpenAI +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) ExtractRealtimeTurnUsage(terminalEventRaw []byte) *schemas.BifrostLLMUsage { + return openAIEventHelper.ExtractRealtimeTurnUsage(terminalEventRaw) +} + +func (provider *AzureProvider) ExtractRealtimeTurnOutput(terminalEventRaw []byte) *schemas.ChatMessage { + return openAIEventHelper.ExtractRealtimeTurnOutput(terminalEventRaw) +} + +// --------------------------------------------------------------------------- +// RealtimeSessionProvider — client_secrets only (not legacy /sessions) +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) CreateRealtimeClientSecret( + ctx *schemas.BifrostContext, + key schemas.Key, + endpointType schemas.RealtimeSessionEndpointType, + rawRequest json.RawMessage, +) (*schemas.BifrostPassthroughResponse, *schemas.BifrostError) { + // Azure does not support the legacy /sessions endpoint. + if endpointType == schemas.RealtimeSessionEndpointSessions { + return nil, &schemas.BifrostError{ + IsBifrostError: true, + StatusCode: schemas.Ptr(fasthttp.StatusBadRequest), + Error: &schemas.ErrorField{ + Type: schemas.Ptr("invalid_request_error"), + Message: "Azure does not support the legacy /sessions endpoint; use /v1/realtime/client_secrets instead", + }, + ExtraFields: schemas.BifrostErrorExtraFields{ + RequestType: schemas.RealtimeRequest, + Provider: provider.GetProviderKey(), + }, + } + } + + normalizedBody, _, bifrostErr := openaiProvider.NormalizeRealtimeClientSecretRequest(rawRequest, schemas.Azure, endpointType) + if bifrostErr != nil { + return nil, bifrostErr + } + + endpoint := strings.TrimRight(key.AzureKeyConfig.Endpoint.GetValue(), "/") + apiVersion := azureRealtimeAPIVersion(key) + upstreamURL := fmt.Sprintf("%s/openai/v1/realtime/client_secrets?api-version=%s", + endpoint, url.QueryEscape(apiVersion)) + + req := fasthttp.AcquireRequest() + resp := fasthttp.AcquireResponse() + defer fasthttp.ReleaseRequest(req) + defer fasthttp.ReleaseResponse(resp) + + req.SetRequestURI(upstreamURL) + req.Header.SetMethod(http.MethodPost) + req.Header.SetContentType("application/json") + + authHeaders, authErr := provider.getAzureAuthHeaders(ctx, key, false) + if authErr != nil { + return nil, authErr + } + for k, v := range authHeaders { + req.Header.Set(k, v) + } + for k, v := range provider.networkConfig.ExtraHeaders { + req.Header.Set(k, v) + } + req.SetBody(normalizedBody) + + latency, bifrostErr, wait := providerUtils.MakeRequestWithContext(ctx, provider.client, req, resp) + defer wait() + if bifrostErr != nil { + return nil, bifrostErr + } + + headers := providerUtils.ExtractProviderResponseHeaders(resp) + ctx.SetValue(schemas.BifrostContextKeyProviderResponseHeaders, headers) + + if resp.StatusCode() < fasthttp.StatusOK || resp.StatusCode() >= fasthttp.StatusMultipleChoices { + return nil, provider.parseRealtimeClientSecretError(ctx, resp) + } + + body, err := providerUtils.CheckAndDecodeBody(resp) + if err != nil { + return nil, providerUtils.NewBifrostOperationError("failed to decode response body", err) + } + + out := &schemas.BifrostPassthroughResponse{ + StatusCode: resp.StatusCode(), + Headers: headers, + Body: body, + ExtraFields: schemas.BifrostResponseExtraFields{ + Latency: latency.Milliseconds(), + ProviderResponseHeaders: headers, + }, + } + if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) { + providerUtils.ParseAndSetRawRequestIfJSON(req, &out.ExtraFields) + } + + return out, nil +} + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +func (provider *AzureProvider) realtimeWebRTCUpstreamError(ctx *schemas.BifrostContext, statusCode int, body []byte) *schemas.BifrostError { + message := fmt.Sprintf("upstream realtime handshake failed for %s", provider.GetProviderKey()) + var parsed struct { + Error struct { + Message string `json:"message"` + } `json:"error"` + } + if json.Unmarshal(body, &parsed) == nil && parsed.Error.Message != "" { + message = parsed.Error.Message + } + + bifrostErr := &schemas.BifrostError{ + IsBifrostError: false, + StatusCode: schemas.Ptr(statusCode), + Error: &schemas.ErrorField{ + Type: schemas.Ptr("upstream_error"), + Message: message, + }, + ExtraFields: schemas.BifrostErrorExtraFields{ + RequestType: schemas.RealtimeRequest, + Provider: provider.GetProviderKey(), + }, + } + if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) { + bifrostErr.ExtraFields.RawResponse = map[string]any{ + "status": statusCode, + "body": string(body), + } + } + return bifrostErr +} + +func newAzureRealtimeError(status int, errorType, message string, err error) *schemas.BifrostError { + bifrostErr := &schemas.BifrostError{ + IsBifrostError: true, + StatusCode: schemas.Ptr(status), + Error: &schemas.ErrorField{ + Type: schemas.Ptr(errorType), + Message: message, + }, + ExtraFields: schemas.BifrostErrorExtraFields{ + RequestType: schemas.RealtimeRequest, + Provider: schemas.Azure, + }, + } + if err != nil { + bifrostErr.Error.Error = err + } + return bifrostErr +} + +// azureRealtimeAPIVersion returns the API version to use for realtime endpoints. +// Realtime requires a preview API version. If the key has an explicit version +// configured we honour it; otherwise we fall back to the preview version rather +// than the stable default (which does not support realtime). +func azureRealtimeAPIVersion(key schemas.Key) string { + if key.AzureKeyConfig != nil && key.AzureKeyConfig.APIVersion != nil { + if apiVersion := key.AzureKeyConfig.APIVersion.GetValue(); apiVersion != "" { + return apiVersion + } + } + return AzureAPIVersionPreview +} + +func (provider *AzureProvider) parseRealtimeClientSecretError(ctx *schemas.BifrostContext, resp *fasthttp.Response) *schemas.BifrostError { + body, _ := providerUtils.CheckAndDecodeBody(resp) + var parsed struct { + Error struct { + Code string `json:"code"` + Message string `json:"message"` + } `json:"error"` + } + msg := string(body) + if json.Unmarshal(body, &parsed) == nil && parsed.Error.Message != "" { + msg = parsed.Error.Message + } + bifrostErr := &schemas.BifrostError{ + IsBifrostError: false, + StatusCode: schemas.Ptr(resp.StatusCode()), + Error: &schemas.ErrorField{ + Type: schemas.Ptr("upstream_error"), + Message: msg, + }, + ExtraFields: schemas.BifrostErrorExtraFields{ + RequestType: schemas.RealtimeRequest, + Provider: provider.GetProviderKey(), + }, + } + if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) { + bifrostErr.ExtraFields.RawResponse = map[string]any{ + "status": resp.StatusCode(), + "body": string(body), + } + } + return bifrostErr +} diff --git a/core/providers/bedrock/bedrock.go b/core/providers/bedrock/bedrock.go index c5d5d07e9e..750c484876 100644 --- a/core/providers/bedrock/bedrock.go +++ b/core/providers/bedrock/bedrock.go @@ -28,17 +28,20 @@ import ( "github.com/maximhq/bifrost/core/providers/anthropic" providerUtils "github.com/maximhq/bifrost/core/providers/utils" schemas "github.com/maximhq/bifrost/core/schemas" + "github.com/valyala/fasthttp" ) // BedrockProvider implements the Provider interface for AWS Bedrock. type BedrockProvider struct { - logger schemas.Logger // Logger for provider operations - client *http.Client // HTTP client for unary API requests (Client.Timeout bounds overall response) - streamingClient *http.Client // HTTP client for streaming API requests (no Timeout; idle governed by NewIdleTimeoutReader) - networkConfig schemas.NetworkConfig // Network configuration including extra headers - customProviderConfig *schemas.CustomProviderConfig // Custom provider config - sendBackRawRequest bool // Whether to include raw request in BifrostResponse - sendBackRawResponse bool // Whether to include raw response in BifrostResponse + logger schemas.Logger // Logger for provider operations + client *http.Client // HTTP client for unary API requests (Client.Timeout bounds overall response) + streamingClient *http.Client // HTTP client for streaming API requests (no Timeout; idle governed by NewIdleTimeoutReader) + mantleClient *fasthttp.Client // fasthttp client for Bedrock Mantle (OpenAI-compatible) requests + mantleStreamingClient *fasthttp.Client // fasthttp streaming client for Bedrock Mantle streaming requests + networkConfig schemas.NetworkConfig // Network configuration including extra headers + customProviderConfig *schemas.CustomProviderConfig // Custom provider config + sendBackRawRequest bool // Whether to include raw request in BifrostResponse + sendBackRawResponse bool // Whether to include raw response in BifrostResponse } // assumeRoleCredsCache caches *aws.CredentialsCache instances keyed by the @@ -121,19 +124,36 @@ func NewBedrockProvider(config *schemas.ProviderConfig, logger schemas.Logger) ( client := &http.Client{Transport: transport, Timeout: requestTimeout} streamingClient := providerUtils.BuildStreamingHTTPClient(client) + // fasthttp clients for Bedrock Mantle (OpenAI-compatible endpoint) + mantleFasthttpClient := &fasthttp.Client{ + ReadTimeout: requestTimeout, + WriteTimeout: requestTimeout, + MaxConnsPerHost: config.NetworkConfig.MaxConnsPerHost, + MaxIdleConnDuration: 30 * time.Second, + MaxConnWaitTimeout: requestTimeout, + MaxConnDuration: time.Second * time.Duration(schemas.DefaultMaxConnDurationInSeconds), + ConnPoolStrategy: fasthttp.FIFO, + } + mantleFasthttpClient = providerUtils.ConfigureProxy(mantleFasthttpClient, config.ProxyConfig, logger) + mantleFasthttpClient = providerUtils.ConfigureDialer(mantleFasthttpClient) + mantleFasthttpClient = providerUtils.ConfigureTLS(mantleFasthttpClient, config.NetworkConfig, logger) + mantleStreamingFasthttpClient := providerUtils.BuildStreamingClient(mantleFasthttpClient) + // Pre-warm response pools for i := 0; i < config.ConcurrencyAndBufferSize.Concurrency; i++ { bedrockChatResponsePool.Put(&BedrockConverseResponse{}) } return &BedrockProvider{ - logger: logger, - client: client, - streamingClient: streamingClient, - networkConfig: config.NetworkConfig, - customProviderConfig: config.CustomProviderConfig, - sendBackRawRequest: config.SendBackRawRequest, - sendBackRawResponse: config.SendBackRawResponse, + logger: logger, + client: client, + streamingClient: streamingClient, + mantleClient: mantleFasthttpClient, + mantleStreamingClient: mantleStreamingFasthttpClient, + networkConfig: config.NetworkConfig, + customProviderConfig: config.CustomProviderConfig, + sendBackRawRequest: config.SendBackRawRequest, + sendBackRawResponse: config.SendBackRawResponse, }, nil } @@ -502,6 +522,30 @@ func (provider *BedrockProvider) makeStreamingRequest(ctx *schemas.BifrostContex // It is used in providers like Bedrock. // It sets required headers, calculates the request body hash, and signs the request // using the provided AWS credentials. +// signAWSRequestFromKey is a convenience wrapper around signAWSRequest that reads +// credentials from a BedrockKeyConfig. When cfg is nil (no explicit key configured), +// all credential fields are zero-valued, causing signAWSRequest to fall back to the +// default AWS credential chain (IAM role, env vars, instance profile, etc.). +func signAWSRequestFromKey( + ctx *schemas.BifrostContext, + req *http.Request, + cfg *schemas.BedrockKeyConfig, + region, service string, +) *schemas.BifrostError { + if cfg != nil { + return signAWSRequest(ctx, req, + cfg.AccessKey, cfg.SecretKey, + cfg.SessionToken, cfg.RoleARN, + cfg.ExternalID, cfg.RoleSessionName, + region, service) + } + // No config: pass zero EnvVar values so signAWSRequest uses the default chain. + return signAWSRequest(ctx, req, + schemas.EnvVar{}, schemas.EnvVar{}, + nil, nil, nil, nil, + region, service) +} + // Returns a BifrostError if signing fails. func signAWSRequest( ctx *schemas.BifrostContext, @@ -1028,6 +1072,10 @@ func (provider *BedrockProvider) ChatCompletion(ctx *schemas.BifrostContext, key return nil, err } + if isMantleModel(request.Model) { + return provider.chatCompletionViaMantle(ctx, key, request) + } + // Use centralized Bedrock converter jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody( ctx, @@ -1103,6 +1151,11 @@ func (provider *BedrockProvider) ChatCompletionStream(ctx *schemas.BifrostContex if err := providerUtils.CheckOperationAllowed(schemas.Bedrock, provider.customProviderConfig, schemas.ChatCompletionStreamRequest); err != nil { return nil, err } + + if isMantleModel(request.Model) { + return provider.chatCompletionStreamViaMantle(ctx, postHookRunner, postHookSpanFinalizer, key, request) + } + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody( ctx, request, @@ -1407,6 +1460,10 @@ func (provider *BedrockProvider) Responses(ctx *schemas.BifrostContext, key sche return nil, err } + if isMantleModel(request.Model) { + return provider.responsesViaMantle(ctx, key, request) + } + // Use centralized Bedrock converter jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody( ctx, @@ -1475,6 +1532,10 @@ func (provider *BedrockProvider) ResponsesStream(ctx *schemas.BifrostContext, po return nil, err } + if isMantleModel(request.Model) { + return provider.responsesStreamViaMantle(ctx, postHookRunner, postHookSpanFinalizer, key, request) + } + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody( ctx, request, diff --git a/core/providers/bedrock/bedrock_test.go b/core/providers/bedrock/bedrock_test.go index 63048214ea..3964f6375e 100644 --- a/core/providers/bedrock/bedrock_test.go +++ b/core/providers/bedrock/bedrock_test.go @@ -2376,34 +2376,20 @@ func TestToBedrockResponsesRequest_AnthropicTextFormatUsesOutputConfig(t *testin bedrockReq, err := bedrock.ToBedrockResponsesRequest(ctx, req) require.NoError(t, err) require.NotNil(t, bedrockReq) - require.NotNil(t, bedrockReq.AdditionalModelRequestFields, "expected additional model request fields for anthropic responses structured output") - outputConfigRaw, hasOutputConfig := bedrockReq.AdditionalModelRequestFields.Get("output_config") - require.True(t, hasOutputConfig, "expected output_config for anthropic responses structured output") - - outputConfig, ok := schemas.SafeExtractOrderedMap(outputConfigRaw) - require.True(t, ok, "expected output_config to be an ordered map") - - formatRaw, hasFormat := outputConfig.Get("format") - require.True(t, hasFormat, "expected output_config.format") - - formatMap, ok := schemas.SafeExtractOrderedMap(formatRaw) - require.True(t, ok, "expected output_config.format to be an ordered map") - - formatType, ok := formatMap.Get("type") - require.True(t, ok, "expected output_config.format.type") - assert.Equal(t, "json_schema", formatType) - - schemaRaw, ok := formatMap.Get("schema") - require.True(t, ok, "expected output_config.format.schema") - schemaMap, ok := schemas.SafeExtractOrderedMap(schemaRaw) - require.True(t, ok, "expected output_config.format.schema to remain ordered") - require.NotNil(t, schemaMap) - - if bedrockReq.ToolConfig != nil { - assert.Nil(t, bedrockReq.ToolConfig.ToolChoice, "expected no forced tool choice for anthropic responses structured output") - assert.Empty(t, bedrockReq.ToolConfig.Tools, "expected no synthetic structured output tool for anthropic responses structured output") + // PR #3184 moved Anthropic structured output off native output_config.format + // (rejected by Opus 4.7) onto the synthetic bf_so_* tool path used by all + // Bedrock models. The test now asserts the synthetic tool reached + // toolConfig.tools. + require.NotNil(t, bedrockReq.ToolConfig, "expected toolConfig for structured output") + foundSyntheticTool := false + for _, tool := range bedrockReq.ToolConfig.Tools { + if tool.ToolSpec != nil && strings.HasPrefix(tool.ToolSpec.Name, "bf_so_") { + foundSyntheticTool = true + break + } } + require.True(t, foundSyntheticTool, "expected synthetic bf_so_* tool for structured output") } func TestToBedrockResponsesRequest_NonAnthropicTextFormatStillUsesToolConversion(t *testing.T) { @@ -3312,22 +3298,17 @@ func TestAnthropicStructuredOutputUsesOutputConfigWithoutForcedToolChoice(t *tes require.NotNil(t, result) require.NotNil(t, result.AdditionalModelRequestFields) - outputConfigRaw, hasOutputConfig := result.AdditionalModelRequestFields.Get("output_config") - require.True(t, hasOutputConfig, "expected output_config for anthropic structured output") - - outputConfig, ok := outputConfigRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config to be an ordered map") - - formatRaw, hasFormat := outputConfig.Get("format") - require.True(t, hasFormat, "expected output_config.format") - - format, ok := formatRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config.format to be an ordered map") - formatType, hasType := format.Get("type") - require.True(t, hasType, "expected output_config.format.type") - assert.Equal(t, "json_schema", formatType) - _, hasSchema := format.Get("schema") - assert.True(t, hasSchema, "expected output_config.format.schema") + // PR #3184 moved Anthropic structured output off native output_config.format + // onto the synthetic bf_so_* tool path used by all Bedrock models. + require.NotNil(t, result.ToolConfig, "expected toolConfig for structured output") + foundSyntheticTool := false + for _, tool := range result.ToolConfig.Tools { + if tool.ToolSpec != nil && strings.HasPrefix(tool.ToolSpec.Name, "bf_so_") { + foundSyntheticTool = true + break + } + } + require.True(t, foundSyntheticTool, "expected synthetic bf_so_* tool for structured output") // reasoning should still be preserved for anthropic thinkingRaw, hasThinking := result.AdditionalModelRequestFields.Get("thinking") @@ -3335,12 +3316,6 @@ func TestAnthropicStructuredOutputUsesOutputConfigWithoutForcedToolChoice(t *tes thinking, ok := thinkingRaw.(map[string]any) require.True(t, ok, "expected thinking to be a map") assert.Equal(t, "enabled", thinking["type"]) - - // structured output should NOT force tool choice on Bedrock anthropic - if result.ToolConfig != nil { - assert.Nil(t, result.ToolConfig.ToolChoice, "expected no forced tool choice for anthropic structured output") - assert.Empty(t, result.ToolConfig.Tools, "expected no synthetic structured output tool for anthropic structured output") - } } func TestAnthropicStructuredOutputAcceptsOrderedMaps(t *testing.T) { @@ -3385,26 +3360,19 @@ func TestAnthropicStructuredOutputAcceptsOrderedMaps(t *testing.T) { require.NotNil(t, result) require.NotNil(t, result.AdditionalModelRequestFields) - outputConfigRaw, hasOutputConfig := result.AdditionalModelRequestFields.Get("output_config") - require.True(t, hasOutputConfig, "expected output_config for anthropic structured output") - - outputConfig, ok := outputConfigRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config to be an ordered map") - - formatRaw, hasFormat := outputConfig.Get("format") - require.True(t, hasFormat, "expected output_config.format") - - format, ok := formatRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config.format to be an ordered map") - - formatType, ok := format.Get("type") - require.True(t, ok, "expected output_config.format.type") - assert.Equal(t, "json_schema", formatType) - - schemaRaw, ok := format.Get("schema") - require.True(t, ok, "expected output_config.format.schema") - _, ok = schemaRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config.format.schema to remain ordered") + // PR #3184 moved Anthropic structured output off native output_config.format + // onto the synthetic bf_so_* tool path. Test asserts the synthetic tool path + // accepts OrderedMap-shaped response_format input without dropping the schema. + require.NotNil(t, result.ToolConfig, "expected toolConfig for structured output") + var syntheticTool *bedrock.BedrockTool + for i, tool := range result.ToolConfig.Tools { + if tool.ToolSpec != nil && strings.HasPrefix(tool.ToolSpec.Name, "bf_so_") { + syntheticTool = &result.ToolConfig.Tools[i] + break + } + } + require.NotNil(t, syntheticTool, "expected synthetic bf_so_* tool for structured output") + require.NotEmpty(t, syntheticTool.ToolSpec.InputSchema.JSON, "expected synthetic tool schema bytes") } // betaListContains reports whether the OrderedMap's anthropic_beta entry @@ -3677,23 +3645,27 @@ func TestAnthropicStructuredOutputMergesAdditionalModelRequestFieldPaths(t *test require.NotNil(t, result) require.NotNil(t, result.AdditionalModelRequestFields) + // Structured output is routed through the synthetic bf_so_* tool path on all + // Bedrock models (see PR #3184 and utils.go:1172). Native output_config.format + // is intentionally not written for any Bedrock model, so the merge test + // asserts the synthetic tool reached toolConfig.tools instead. + require.NotNil(t, result.ToolConfig, "expected toolConfig for structured output") + foundSyntheticTool := false + for _, tool := range result.ToolConfig.Tools { + if tool.ToolSpec != nil && strings.HasPrefix(tool.ToolSpec.Name, "bf_so_") { + foundSyntheticTool = true + break + } + } + require.True(t, foundSyntheticTool, "expected synthetic bf_so_* tool for structured output") + + // Incoming additionalModelRequestFieldPaths.output_config key must be merged + // into AdditionalModelRequestFields.output_config even though the structured + // output path no longer writes output_config.format itself. outputConfigRaw, hasOutputConfig := result.AdditionalModelRequestFields.Get("output_config") - require.True(t, hasOutputConfig, "expected output_config to exist after merge") + require.True(t, hasOutputConfig, "expected output_config to exist from user-provided fields") outputConfig, ok := outputConfigRaw.(*schemas.OrderedMap) require.True(t, ok, "expected output_config to be an ordered map") - - // Existing structured output format must be preserved. - formatRaw, hasFormat := outputConfig.Get("format") - require.True(t, hasFormat, "expected output_config.format to be preserved") - format, ok := formatRaw.(*schemas.OrderedMap) - require.True(t, ok, "expected output_config.format to be an ordered map") - formatType, hasType := format.Get("type") - require.True(t, hasType, "expected output_config.format.type") - assert.Equal(t, "json_schema", formatType) - _, hasSchema := format.Get("schema") - assert.True(t, hasSchema, "expected output_config.format.schema") - - // Incoming additionalModelRequestFieldPaths.output_config key must be merged. foo, hasFoo := outputConfig.Get("foo") require.True(t, hasFoo, "expected output_config.foo to be preserved") assert.Equal(t, "bar", foo) @@ -4069,9 +4041,9 @@ func TestBedrockStopReasonMapping(t *testing.T) { {"MaxTokens", "max_tokens", "length"}, {"StopSequence", "stop_sequence", "stop"}, {"ToolUse", "tool_use", "tool_calls"}, - {"GuardrailIntervened", "guardrail_intervened", "content_filter"}, + {"GuardrailIntervened", "guardrail_intervened", "guardrail_intervened"}, // no clean mapping — passes through {"ContentFiltered", "content_filtered", "content_filter"}, - {"UnknownReason", "some_unknown_reason", "stop"}, + {"UnknownReason", "some_unknown_reason", "some_unknown_reason"}, // no clean mapping — passes through } for _, tt := range tests { @@ -4104,6 +4076,116 @@ func TestBedrockStopReasonMapping(t *testing.T) { } } +// TestBedrockStopReasonMappingResponsesPath tests stop reason normalisation for +// the Responses API path (BedrockConverseResponse.ToBifrostResponsesResponse). +func TestBedrockStopReasonMappingResponsesPath(t *testing.T) { + t.Parallel() + + tests := []struct { + name string + bedrockReason string + expectedBifrost string + }{ + {"EndTurn", "end_turn", "stop"}, + {"MaxTokens", "max_tokens", "length"}, + {"StopSequence", "stop_sequence", "stop"}, + {"ToolUse", "tool_use", "tool_calls"}, + {"ContentFiltered", "content_filtered", "content_filter"}, + {"GuardrailIntervened", "guardrail_intervened", "guardrail_intervened"}, // no clean mapping — passes through + {"UnknownReason", "some_unknown_reason", "some_unknown_reason"}, // no clean mapping — passes through + } + + ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + response := &bedrock.BedrockConverseResponse{ + StopReason: tt.bedrockReason, + Output: &bedrock.BedrockConverseOutput{ + Message: &bedrock.BedrockMessage{ + Role: bedrock.BedrockMessageRoleAssistant, + Content: []bedrock.BedrockContentBlock{ + {Text: schemas.Ptr("Response text")}, + }, + }, + }, + } + + bifrostResp, err := response.ToBifrostResponsesResponse(ctx) + require.NoError(t, err) + require.NotNil(t, bifrostResp) + require.NotNil(t, bifrostResp.StopReason, "StopReason should be set") + assert.Equal(t, tt.expectedBifrost, *bifrostResp.StopReason, + "Bedrock stop reason %q should map to %q in responses path", tt.bedrockReason, tt.expectedBifrost) + }) + } +} + +// TestBifrostToBedrockStopReasonReverseMapping tests the reverse conversion +// (BifrostResponsesResponse.StopReason → BedrockConverseResponse.StopReason). +func TestBifrostToBedrockStopReasonReverseMapping(t *testing.T) { + t.Parallel() + + textOutput := []schemas.ResponsesMessage{ + { + Type: schemas.Ptr(schemas.ResponsesMessageTypeMessage), + Role: schemas.Ptr(schemas.ResponsesInputMessageRoleAssistant), + Content: &schemas.ResponsesMessageContent{ + ContentBlocks: []schemas.ResponsesMessageContentBlock{ + { + Type: schemas.ResponsesOutputMessageContentTypeText, + Text: schemas.Ptr("Hello"), + }, + }, + }, + }, + } + + tests := []struct { + name string + stopReason *string + incompleteDetails *schemas.ResponsesResponseIncompleteDetails + expectedBedrock string + }{ + {"Stop", schemas.Ptr("stop"), nil, "end_turn"}, + {"Length", schemas.Ptr("length"), nil, "max_tokens"}, + {"ToolCalls", schemas.Ptr("tool_calls"), nil, "tool_use"}, + {"ContentFilter", schemas.Ptr("content_filter"), nil, "content_filtered"}, + {"GuardrailIntervened", schemas.Ptr("guardrail_intervened"), nil, "guardrail_intervened"}, // passes through + {"UnknownPassthrough", schemas.Ptr("some_unknown_reason"), nil, "some_unknown_reason"}, // passes through + { + // StopReason takes priority over IncompleteDetails + name: "StopReasonOverridesIncompleteDetails", + stopReason: schemas.Ptr("stop"), + incompleteDetails: &schemas.ResponsesResponseIncompleteDetails{Reason: "max_tokens"}, + expectedBedrock: "end_turn", + }, + { + // IncompleteDetails is used when StopReason is nil + name: "IncompleteDetailsFallback", + stopReason: nil, + incompleteDetails: &schemas.ResponsesResponseIncompleteDetails{Reason: "max_tokens"}, + expectedBedrock: "max_tokens", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + input := &schemas.BifrostResponsesResponse{ + Output: textOutput, + StopReason: tt.stopReason, + IncompleteDetails: tt.incompleteDetails, + } + + actual, err := bedrock.ToBedrockConverseResponse(input) + require.NoError(t, err) + require.NotNil(t, actual) + assert.Equal(t, tt.expectedBedrock, actual.StopReason, + "Bifrost stop reason %v should reverse-map to Bedrock %q", tt.stopReason, tt.expectedBedrock) + }) + } +} + func TestGuardrailConfigStreamProcessingMode(t *testing.T) { t.Parallel() diff --git a/core/providers/bedrock/chat.go b/core/providers/bedrock/chat.go index df631e9f61..36ef976679 100644 --- a/core/providers/bedrock/chat.go +++ b/core/providers/bedrock/chat.go @@ -3,6 +3,7 @@ package bedrock import ( "context" "fmt" + "strings" "time" "github.com/google/uuid" @@ -42,6 +43,19 @@ func ToBedrockChatCompletionRequest(ctx *schemas.BifrostContext, bifrostReq *sch bedrockReq.System = systemMessages } + // Trim trailing whitespace from the last assistant message text blocks + // (only for Anthropic models which use text-based prefill) + lastMsgIndex := len(bedrockReq.Messages) - 1 + if schemas.IsAnthropicModel(bifrostReq.Model) && lastMsgIndex >= 0 && bedrockReq.Messages[lastMsgIndex].Role == BedrockMessageRoleAssistant { + blocks := bedrockReq.Messages[lastMsgIndex].Content + for j := len(blocks) - 1; j >= 0; j-- { + if blocks[j].Text != nil { + bedrockReq.Messages[lastMsgIndex].Content[j].Text = schemas.Ptr(strings.TrimRight(*blocks[j].Text, " \n\r\t")) + break + } + } + } + // Convert parameters and configurations if err := convertChatParameters(ctx, bifrostReq, bedrockReq); err != nil { return nil, fmt.Errorf("failed to convert chat parameters: %w", err) diff --git a/core/providers/bedrock/mantle.go b/core/providers/bedrock/mantle.go new file mode 100644 index 0000000000..a9caa829e8 --- /dev/null +++ b/core/providers/bedrock/mantle.go @@ -0,0 +1,254 @@ +package bedrock + +import ( + "bytes" + "context" + "fmt" + "maps" + "net/http" + "strings" + + openai "github.com/maximhq/bifrost/core/providers/openai" + providerUtils "github.com/maximhq/bifrost/core/providers/utils" + schemas "github.com/maximhq/bifrost/core/schemas" +) + +// isMantleModel reports whether a model should be routed via the Bedrock Mantle endpoint. +// Accepts "gpt-oss-120b", "openai.gpt-oss-120b", or region-prefixed variants. +func isMantleModel(model string) bool { + return strings.Contains(model, "gpt-oss") +} + +// mantleURL builds the Bedrock Mantle endpoint URL for the given region and API path. +func mantleURL(region, path string) string { + return fmt.Sprintf("https://bedrock-mantle.%s.api.aws/v1/%s", region, path) +} + +// mantleSigV4Headers computes SigV4 auth headers for a mantle request by signing a dummy +// net/http.Request. jsonData must be the exact bytes that will be sent. accept must match +// the Accept header the actual request will send, since SigV4 signs all request headers. +func (provider *BedrockProvider) mantleSigV4Headers( + ctx *schemas.BifrostContext, + jsonData []byte, + requestURL, accept string, + key schemas.Key, + region string, +) (map[string]string, *schemas.BifrostError) { + req, err := http.NewRequestWithContext(ctx, http.MethodPost, requestURL, bytes.NewReader(jsonData)) + if err != nil { + return nil, providerUtils.NewBifrostOperationError("failed to create signing request", err) + } + req.Header.Set("Accept", accept) + if bifrostErr := signAWSRequestFromKey(ctx, req, key.BedrockKeyConfig, region, bedrockMantleSigningService); bifrostErr != nil { + return nil, bifrostErr + } + headers := map[string]string{ + "Authorization": req.Header.Get("Authorization"), + "X-Amz-Date": req.Header.Get("X-Amz-Date"), + "x-amz-content-sha256": req.Header.Get("x-amz-content-sha256"), + "Accept": accept, + } + if token := req.Header.Get("X-Amz-Security-Token"); token != "" { + headers["X-Amz-Security-Token"] = token + } + return headers, nil +} + +// chatCompletionViaMantle handles non-streaming chat completions for mantle (gpt-oss) models. +func (provider *BedrockProvider) chatCompletionViaMantle( + ctx *schemas.BifrostContext, + key schemas.Key, + request *schemas.BifrostChatRequest, +) (*schemas.BifrostChatResponse, *schemas.BifrostError) { + region := resolveBedrockRegion(key, request.Model) + url := mantleURL(region, "chat/completions") + + // Build extraHeaders: always start with network-config headers, then overlay SigV4 if needed. + // Allocate explicitly so maps.Copy never writes into a nil map. + extraHeaders := make(map[string]string, len(provider.networkConfig.ExtraHeaders)) + maps.Copy(extraHeaders, provider.networkConfig.ExtraHeaders) + if key.Value.GetValue() == "" { + // SigV4: pre-build body for signing. HandleOpenAIChatCompletionRequest rebuilds the + // same bytes (deterministic marshaling), so the signature stays valid. + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody(ctx, request, func() (providerUtils.RequestBodyWithExtraParams, error) { + return openai.ToOpenAIChatRequest(ctx, request), nil + }) + if bifrostErr != nil { + return nil, bifrostErr + } + sigHeaders, bifrostErr := provider.mantleSigV4Headers(ctx, jsonData, url, "application/json", key, region) + if bifrostErr != nil { + return nil, bifrostErr + } + maps.Copy(extraHeaders, sigHeaders) + } + + return openai.HandleOpenAIChatCompletionRequest( + ctx, + provider.mantleClient, + url, + request, + key, + extraHeaders, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), + nil, nil, + provider.logger, + ) +} + +// chatCompletionStreamViaMantle handles streaming chat completions for mantle (gpt-oss) models. +func (provider *BedrockProvider) chatCompletionStreamViaMantle( + ctx *schemas.BifrostContext, + postHookRunner schemas.PostHookRunner, + postHookSpanFinalizer func(context.Context), + key schemas.Key, + request *schemas.BifrostChatRequest, +) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + region := resolveBedrockRegion(key, request.Model) + url := mantleURL(region, "chat/completions") + + // Bearer: identical to Groq / any OpenAI-compatible provider. + if key.Value.GetValue() != "" { + authHeader := map[string]string{"Authorization": "Bearer " + key.Value.GetValue()} + return openai.HandleOpenAIChatCompletionStreaming( + ctx, provider.mantleStreamingClient, url, request, + authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), postHookRunner, + nil, nil, nil, nil, nil, + provider.logger, postHookSpanFinalizer, + ) + } + + // SigV4: pre-build body to sign, then pass it via customRequestConverter so the handler + // sends the exact same bytes we signed. + openaiReq := openai.ToOpenAIChatRequest(ctx, request) + openaiReq.Stream = schemas.Ptr(true) + openaiReq.StreamOptions = &schemas.ChatStreamOptions{IncludeUsage: schemas.Ptr(true)} + + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody(ctx, request, func() (providerUtils.RequestBodyWithExtraParams, error) { + return openaiReq, nil + }) + if bifrostErr != nil { + return nil, bifrostErr + } + authHeader, bifrostErr := provider.mantleSigV4Headers(ctx, jsonData, url, "text/event-stream", key, region) + if bifrostErr != nil { + return nil, bifrostErr + } + + return openai.HandleOpenAIChatCompletionStreaming( + ctx, provider.mantleStreamingClient, url, request, + authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), postHookRunner, + func(_ *schemas.BifrostChatRequest) (providerUtils.RequestBodyWithExtraParams, error) { + return openaiReq, nil + }, + nil, nil, nil, nil, + provider.logger, postHookSpanFinalizer, + ) +} + +// responsesViaMantle handles non-streaming Responses API requests for mantle (gpt-oss) models. +func (provider *BedrockProvider) responsesViaMantle( + ctx *schemas.BifrostContext, + key schemas.Key, + request *schemas.BifrostResponsesRequest, +) (*schemas.BifrostResponsesResponse, *schemas.BifrostError) { + region := resolveBedrockRegion(key, request.Model) + url := mantleURL(region, "responses") + + extraHeaders := make(map[string]string, len(provider.networkConfig.ExtraHeaders)) + maps.Copy(extraHeaders, provider.networkConfig.ExtraHeaders) + if key.Value.GetValue() == "" { + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody(ctx, request, func() (providerUtils.RequestBodyWithExtraParams, error) { + return openai.ToOpenAIResponsesRequest(request), nil + }) + if bifrostErr != nil { + return nil, bifrostErr + } + sigHeaders, bifrostErr := provider.mantleSigV4Headers(ctx, jsonData, url, "application/json", key, region) + if bifrostErr != nil { + return nil, bifrostErr + } + maps.Copy(extraHeaders, sigHeaders) + } + + return openai.HandleOpenAIResponsesRequest( + ctx, + provider.mantleClient, + url, + request, + key, + extraHeaders, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), + nil, nil, + provider.logger, + ) +} + +// responsesStreamViaMantle handles streaming Responses API requests for mantle (gpt-oss) models. +func (provider *BedrockProvider) responsesStreamViaMantle( + ctx *schemas.BifrostContext, + postHookRunner schemas.PostHookRunner, + postHookSpanFinalizer func(context.Context), + key schemas.Key, + request *schemas.BifrostResponsesRequest, +) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + region := resolveBedrockRegion(key, request.Model) + url := mantleURL(region, "responses") + + // Bearer: identical to Groq / any OpenAI-compatible provider. + if key.Value.GetValue() != "" { + authHeader := map[string]string{"Authorization": "Bearer " + key.Value.GetValue()} + return openai.HandleOpenAIResponsesStreaming( + ctx, provider.mantleStreamingClient, url, request, + authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), postHookRunner, + nil, nil, nil, nil, + provider.logger, postHookSpanFinalizer, + ) + } + + // SigV4: pre-build body to sign. + openaiReq := openai.ToOpenAIResponsesRequest(request) + openaiReq.Stream = schemas.Ptr(true) + + jsonData, bifrostErr := providerUtils.CheckContextAndGetRequestBody(ctx, request, func() (providerUtils.RequestBodyWithExtraParams, error) { + return openaiReq, nil + }) + if bifrostErr != nil { + return nil, bifrostErr + } + authHeader, bifrostErr := provider.mantleSigV4Headers(ctx, jsonData, url, "text/event-stream", key, region) + if bifrostErr != nil { + return nil, bifrostErr + } + + return openai.HandleOpenAIResponsesStreaming( + ctx, provider.mantleStreamingClient, url, request, + authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, + providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), + providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), + provider.GetProviderKey(), postHookRunner, + nil, nil, + func(_ *openai.OpenAIResponsesRequest) *openai.OpenAIResponsesRequest { + return openaiReq + }, + nil, + provider.logger, postHookSpanFinalizer, + ) +} diff --git a/core/providers/bedrock/responses.go b/core/providers/bedrock/responses.go index 43a5c8eb62..b58377200a 100644 --- a/core/providers/bedrock/responses.go +++ b/core/providers/bedrock/responses.go @@ -19,20 +19,24 @@ import ( // BedrockResponsesStreamState tracks state during streaming conversion for responses API type BedrockResponsesStreamState struct { - ContentIndexToOutputIndex map[int]int // Maps Bedrock contentBlockIndex to OpenAI output_index - ToolArgumentBuffers map[int]string // Maps output_index to accumulated tool argument JSON - ItemIDs map[int]string // Maps output_index to item ID for stable IDs - ToolCallIDs map[int]string // Maps output_index to tool call ID (callID) - ToolCallNames map[int]string // Maps output_index to tool call name - ReasoningContentIndices map[int]bool // Tracks which content indices are reasoning blocks - CompletedOutputIndices map[int]bool // Tracks which output indices have been completed - CurrentOutputIndex int // Current output index counter - MessageID *string // Message ID (generated) - Model *string // Model name - StopReason *string // Stop reason for the message - CreatedAt int // Timestamp for created_at consistency - HasEmittedCreated bool // Whether we've emitted response.created - HasEmittedInProgress bool // Whether we've emitted response.in_progress + ContentIndexToOutputIndex map[int]int // Maps Bedrock contentBlockIndex to OpenAI output_index + ToolArgumentBuffers map[int]string // Maps output_index to accumulated tool argument JSON + ItemIDs map[int]string // Maps output_index to item ID for stable IDs + ToolCallIDs map[int]string // Maps output_index to tool call ID (callID) + ToolCallNames map[int]string // Maps output_index to tool call name + ReasoningContentIndices map[int]bool // Tracks which content indices are reasoning blocks + CodeInterpreterIndices map[int]bool // Tracks which output indices are nova_code_interpreter calls + NovaGroundingIndices map[int]bool // Tracks which output indices are nova_grounding (web_search_call) blocks + NovaGroundingCitations map[int][]schemas.ResponsesWebSearchToolCallActionSearchSource // Collected citation sources per nova_grounding output index + CompletedOutputIndices map[int]bool // Tracks which output indices have been completed + AnnotationIndices map[int]int // Maps output_index to next annotation index for sequential citation numbering + CurrentOutputIndex int // Current output index counter + MessageID *string // Message ID (generated) + Model *string // Model name + StopReason *string // Stop reason for the message + CreatedAt int // Timestamp for created_at consistency + HasEmittedCreated bool // Whether we've emitted response.created + HasEmittedInProgress bool // Whether we've emitted response.in_progress } // bedrockResponsesStreamStatePool provides a pool for Bedrock responses stream state objects. @@ -45,7 +49,11 @@ var bedrockResponsesStreamStatePool = sync.Pool{ ToolCallIDs: make(map[int]string), ToolCallNames: make(map[int]string), ReasoningContentIndices: make(map[int]bool), + CodeInterpreterIndices: make(map[int]bool), + NovaGroundingIndices: make(map[int]bool), + NovaGroundingCitations: make(map[int][]schemas.ResponsesWebSearchToolCallActionSearchSource), CompletedOutputIndices: make(map[int]bool), + AnnotationIndices: make(map[int]int), CurrentOutputIndex: 0, CreatedAt: int(time.Now().Unix()), HasEmittedCreated: false, @@ -89,11 +97,31 @@ func acquireBedrockResponsesStreamState() *BedrockResponsesStreamState { } else { clear(state.ReasoningContentIndices) } + if state.CodeInterpreterIndices == nil { + state.CodeInterpreterIndices = make(map[int]bool) + } else { + clear(state.CodeInterpreterIndices) + } + if state.NovaGroundingIndices == nil { + state.NovaGroundingIndices = make(map[int]bool) + } else { + clear(state.NovaGroundingIndices) + } + if state.NovaGroundingCitations == nil { + state.NovaGroundingCitations = make(map[int][]schemas.ResponsesWebSearchToolCallActionSearchSource) + } else { + clear(state.NovaGroundingCitations) + } if state.CompletedOutputIndices == nil { state.CompletedOutputIndices = make(map[int]bool) } else { clear(state.CompletedOutputIndices) } + if state.AnnotationIndices == nil { + state.AnnotationIndices = make(map[int]int) + } else { + clear(state.AnnotationIndices) + } // Reset other fields state.CurrentOutputIndex = 0 state.MessageID = nil @@ -145,11 +173,31 @@ func (state *BedrockResponsesStreamState) flush() { } else { clear(state.ReasoningContentIndices) } + if state.CodeInterpreterIndices == nil { + state.CodeInterpreterIndices = make(map[int]bool) + } else { + clear(state.CodeInterpreterIndices) + } + if state.NovaGroundingIndices == nil { + state.NovaGroundingIndices = make(map[int]bool) + } else { + clear(state.NovaGroundingIndices) + } + if state.NovaGroundingCitations == nil { + state.NovaGroundingCitations = make(map[int][]schemas.ResponsesWebSearchToolCallActionSearchSource) + } else { + clear(state.NovaGroundingCitations) + } if state.CompletedOutputIndices == nil { state.CompletedOutputIndices = make(map[int]bool) } else { clear(state.CompletedOutputIndices) } + if state.AnnotationIndices == nil { + state.AnnotationIndices = make(map[int]int) + } else { + clear(state.AnnotationIndices) + } state.CurrentOutputIndex = 0 state.MessageID = nil state.Model = nil @@ -404,78 +452,83 @@ func (chunk *BedrockStreamEvent) ToBifrostResponsesStream(sequenceNumber int, st prevItemID := state.ItemIDs[prevOutputIndex] prevToolName := state.ToolCallNames[prevOutputIndex] accumulatedArgs := state.ToolArgumentBuffers[prevOutputIndex] + statusCompleted := "completed" - // Emit content_part.done for tool call - emptyText := "" - part := &schemas.ResponsesMessageContentBlock{ - Type: schemas.ResponsesOutputMessageContentTypeText, - Text: &emptyText, - ResponsesOutputMessageContentText: &schemas.ResponsesOutputMessageContentText{ - LogProbs: []schemas.ResponsesOutputMessageContentTextLogProb{}, - Annotations: []schemas.ResponsesOutputMessageContentTextAnnotation{}, - }, - } - responses = append(responses, &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeContentPartDone, - SequenceNumber: sequenceNumber + len(responses), - OutputIndex: schemas.Ptr(prevOutputIndex), - ContentIndex: schemas.Ptr(prevContentIndex), - ItemID: &prevItemID, - Part: part, - }) + if state.CodeInterpreterIndices[prevOutputIndex] { + ciEvents := emitCodeInterpreterDoneEvents(prevOutputIndex, prevContentIndex, prevItemID, prevToolCallID, accumulatedArgs, sequenceNumber+len(responses)) + responses = append(responses, ciEvents...) + } else if state.NovaGroundingIndices[prevOutputIndex] { + citations := state.NovaGroundingCitations[prevOutputIndex] + wsEvents := emitNovaGroundingDoneEvents(prevOutputIndex, prevContentIndex, prevItemID, citations, accumulatedArgs, sequenceNumber+len(responses)) + responses = append(responses, wsEvents...) + } else { + // Close a regular function_call block + emptyText := "" + part := &schemas.ResponsesMessageContentBlock{ + Type: schemas.ResponsesOutputMessageContentTypeText, + Text: &emptyText, + ResponsesOutputMessageContentText: &schemas.ResponsesOutputMessageContentText{ + LogProbs: []schemas.ResponsesOutputMessageContentTextLogProb{}, + Annotations: []schemas.ResponsesOutputMessageContentTextAnnotation{}, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeContentPartDone, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(prevOutputIndex), + ContentIndex: schemas.Ptr(prevContentIndex), + ItemID: &prevItemID, + Part: part, + }) - // Emit function_call_arguments.done with full arguments - if accumulatedArgs != "" { - var doneItem *schemas.ResponsesMessage - if prevToolCallID != "" || prevToolName != "" { - doneItem = &schemas.ResponsesMessage{ - ResponsesToolMessage: &schemas.ResponsesToolMessage{}, + if accumulatedArgs != "" { + var doneItem *schemas.ResponsesMessage + if prevToolCallID != "" || prevToolName != "" { + doneItem = &schemas.ResponsesMessage{ + ResponsesToolMessage: &schemas.ResponsesToolMessage{}, + } + if prevToolCallID != "" { + doneItem.ResponsesToolMessage.CallID = &prevToolCallID + } + if prevToolName != "" { + doneItem.ResponsesToolMessage.Name = &prevToolName + } + } + argsDoneResponse := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDone, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(prevOutputIndex), + Arguments: &accumulatedArgs, } - if prevToolCallID != "" { - doneItem.ResponsesToolMessage.CallID = &prevToolCallID + if prevItemID != "" { + argsDoneResponse.ItemID = &prevItemID } - if prevToolName != "" { - doneItem.ResponsesToolMessage.Name = &prevToolName + if doneItem != nil { + argsDoneResponse.Item = doneItem } + responses = append(responses, argsDoneResponse) } - argsDoneResponse := &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDone, + toolDoneItem := &schemas.ResponsesMessage{ + ID: &prevItemID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), + Status: &statusCompleted, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &prevToolCallID, + Name: &prevToolName, + Arguments: &accumulatedArgs, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemDone, SequenceNumber: sequenceNumber + len(responses), OutputIndex: schemas.Ptr(prevOutputIndex), - Arguments: &accumulatedArgs, - } - if prevItemID != "" { - argsDoneResponse.ItemID = &prevItemID - } - if doneItem != nil { - argsDoneResponse.Item = doneItem - } - responses = append(responses, argsDoneResponse) - } - - // Emit output_item.done for tool call - statusCompleted := "completed" - toolDoneItem := &schemas.ResponsesMessage{ - ID: &prevItemID, - Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), - Status: &statusCompleted, - ResponsesToolMessage: &schemas.ResponsesToolMessage{ - CallID: &prevToolCallID, - Name: &prevToolName, - Arguments: &accumulatedArgs, - }, + ContentIndex: schemas.Ptr(prevContentIndex), + ItemID: &prevItemID, + Item: toolDoneItem, + }) } - responses = append(responses, &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeOutputItemDone, - SequenceNumber: sequenceNumber + len(responses), - OutputIndex: schemas.Ptr(prevOutputIndex), - ContentIndex: schemas.Ptr(prevContentIndex), - ItemID: &prevItemID, - Item: toolDoneItem, - }) - // Mark this output index as completed state.CompletedOutputIndices[prevOutputIndex] = true } @@ -483,37 +536,101 @@ func (chunk *BedrockStreamEvent) ToBifrostResponsesStream(sequenceNumber int, st // Create new output index for this tool use outputIndex := state.CurrentOutputIndex state.ContentIndexToOutputIndex[contentBlockIndex] = outputIndex - state.CurrentOutputIndex++ // Increment for next use + state.CurrentOutputIndex++ - // Store tool use ID as item ID and call ID toolUseID := chunk.Start.ToolUse.ToolUseID toolName := chunk.Start.ToolUse.Name state.ItemIDs[outputIndex] = toolUseID state.ToolCallIDs[outputIndex] = toolUseID state.ToolCallNames[outputIndex] = toolName - statusInProgress := "in_progress" - item := &schemas.ResponsesMessage{ - ID: &toolUseID, - Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), - Status: &statusInProgress, - ResponsesToolMessage: &schemas.ResponsesToolMessage{ - CallID: &toolUseID, - Name: &toolName, - Arguments: schemas.Ptr(""), // Arguments will be filled by deltas - }, - } - - // Initialize argument buffer for this tool call + // Initialize argument buffer state.ToolArgumentBuffers[outputIndex] = "" - responses = append(responses, &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeOutputItemAdded, - SequenceNumber: sequenceNumber + len(responses), - OutputIndex: schemas.Ptr(outputIndex), - ContentIndex: schemas.Ptr(contentBlockIndex), - Item: item, - }) + statusInProgress := "in_progress" + + if toolName == "nova_code_interpreter" { + // Emit output_item.added then code_interpreter_call.in_progress + state.CodeInterpreterIndices[outputIndex] = true + item := &schemas.ResponsesMessage{ + ID: &toolUseID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeCodeInterpreterCall), + Status: &statusInProgress, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + ResponsesCodeInterpreterToolCall: &schemas.ResponsesCodeInterpreterToolCall{ + ContainerID: toolUseID, + Outputs: []schemas.ResponsesCodeInterpreterOutput{}, + }, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemAdded, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: schemas.Ptr(contentBlockIndex), + Item: item, + }) + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeCodeInterpreterCallInProgress, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: schemas.Ptr(contentBlockIndex), + Item: item, + }) + } else if toolName == string(BedrockSystemToolNovaGrounding) { + state.NovaGroundingIndices[outputIndex] = true + state.NovaGroundingCitations[outputIndex] = nil + item := &schemas.ResponsesMessage{ + ID: &toolUseID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeWebSearchCall), + Status: &statusInProgress, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &toolUseID, + Action: &schemas.ResponsesToolMessageActionStruct{ + ResponsesWebSearchToolCallAction: &schemas.ResponsesWebSearchToolCallAction{ + Type: "search", + }, + }, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemAdded, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: schemas.Ptr(contentBlockIndex), + Item: item, + }) + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeWebSearchCallInProgress, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ItemID: &toolUseID, + }) + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeWebSearchCallSearching, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ItemID: &toolUseID, + }) + } else { + item := &schemas.ResponsesMessage{ + ID: &toolUseID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), + Status: &statusInProgress, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &toolUseID, + Name: &toolName, + Arguments: schemas.Ptr(""), + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemAdded, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: schemas.Ptr(contentBlockIndex), + Item: item, + }) + } return responses, nil, false } @@ -705,21 +822,71 @@ func (chunk *BedrockStreamEvent) ToBifrostResponsesStream(sequenceNumber int, st return []*schemas.BifrostResponsesStreamResponse{response}, nil, false } + case chunk.Delta.Citation != nil: + citation := chunk.Delta.Citation + if citation.Location.Web != nil { + if state.NovaGroundingIndices[outputIndex] { + domain := citation.Location.Web.Domain + state.NovaGroundingCitations[outputIndex] = append( + state.NovaGroundingCitations[outputIndex], + schemas.ResponsesWebSearchToolCallActionSearchSource{ + Type: "url", + URL: citation.Location.Web.URL, + Title: &domain, + }, + ) + } + // Emit as url_citation annotation (covers both nova_grounding and text blocks). + itemID := state.ItemIDs[outputIndex] + annotationIndex := state.AnnotationIndices[outputIndex] + state.AnnotationIndices[outputIndex]++ + annotation := &schemas.ResponsesOutputMessageContentTextAnnotation{ + Type: "url_citation", + URL: schemas.Ptr(citation.Location.Web.URL), + Title: schemas.Ptr(citation.Location.Web.Domain), + } + response := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputTextAnnotationAdded, + SequenceNumber: sequenceNumber, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentBlockIndex, + AnnotationIndex: &annotationIndex, + Annotation: annotation, + } + if itemID != "" { + response.ItemID = &itemID + } + return []*schemas.BifrostResponsesStreamResponse{response}, nil, false + } + case chunk.Delta.ToolUse != nil: - // Handle tool use delta - function call arguments + // Handle tool use delta - function call arguments or code interpreter code toolUseDelta := chunk.Delta.ToolUse if toolUseDelta.Input != "" { - // Accumulate argument deltas state.ToolArgumentBuffers[outputIndex] += toolUseDelta.Input itemID := state.ItemIDs[outputIndex] - response := &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDelta, - SequenceNumber: sequenceNumber, - OutputIndex: schemas.Ptr(outputIndex), - ContentIndex: &contentBlockIndex, - Delta: &toolUseDelta.Input, + + var response *schemas.BifrostResponsesStreamResponse + if state.CodeInterpreterIndices[outputIndex] { + // Each nova_code_interpreter delta is a complete JSON object {"snippet":"..."}. + codeDelta := providerUtils.GetJSONField([]byte(toolUseDelta.Input), "snippet").String() + response = &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta, + SequenceNumber: sequenceNumber, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentBlockIndex, + Delta: &codeDelta, + } + } else { + response = &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDelta, + SequenceNumber: sequenceNumber, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentBlockIndex, + Delta: &toolUseDelta.Input, + } } if itemID != "" { response.ItemID = &itemID @@ -844,17 +1011,7 @@ func (chunk *BedrockStreamEvent) ToBifrostResponsesStream(sequenceNumber int, st case chunk.StopReason != nil: // Stop reason - track it for the final response - var stopReason string - switch *chunk.StopReason { - case "tool_use": - stopReason = "tool_calls" - case "end_turn": - stopReason = "stop" - case "max_tokens": - stopReason = "length" - default: - stopReason = *chunk.StopReason - } + stopReason := convertBedrockStopReason(*chunk.StopReason) state.StopReason = &stopReason // Items should be closed explicitly when content blocks end return nil, nil, false @@ -863,10 +1020,134 @@ func (chunk *BedrockStreamEvent) ToBifrostResponsesStream(sequenceNumber int, st return nil, nil, false } +// emitCodeInterpreterDoneEvents extracts the code from accumulated JSON args and emits +// code_interpreter_call.code.done + code_interpreter_call.completed + output_item.done in sequence. +func emitCodeInterpreterDoneEvents(outputIndex, contentIndex int, itemID, containerID, accumulatedArgs string, baseSequenceNumber int) []*schemas.BifrostResponsesStreamResponse { + code := providerUtils.GetJSONField([]byte(accumulatedArgs), "snippet").String() + statusCompleted := "completed" + codeDone := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeCodeInterpreterCallCodeDone, + SequenceNumber: baseSequenceNumber, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentIndex, + ItemID: &itemID, + Delta: &code, + } + doneItem := &schemas.ResponsesMessage{ + ID: &itemID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeCodeInterpreterCall), + Status: &statusCompleted, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + ResponsesCodeInterpreterToolCall: &schemas.ResponsesCodeInterpreterToolCall{ + Code: &code, + ContainerID: containerID, + Outputs: []schemas.ResponsesCodeInterpreterOutput{}, + }, + }, + } + completed := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeCodeInterpreterCallCompleted, + SequenceNumber: baseSequenceNumber + 1, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentIndex, + ItemID: &itemID, + Item: doneItem, + } + outputDone := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemDone, + SequenceNumber: baseSequenceNumber + 2, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentIndex, + ItemID: &itemID, + Item: doneItem, + } + return []*schemas.BifrostResponsesStreamResponse{codeDone, completed, outputDone} +} + +// emitNovaGroundingDoneEvents emits web_search_call.completed + output_item.done for a nova_grounding block. +// accumulatedArgs holds the raw toolUse input JSON (e.g. `{"query":"..."}`) from the block's deltas. +func emitNovaGroundingDoneEvents(outputIndex, contentIndex int, itemID string, citations []schemas.ResponsesWebSearchToolCallActionSearchSource, accumulatedArgs string, baseSequenceNumber int) []*schemas.BifrostResponsesStreamResponse { + statusCompleted := "completed" + action := &schemas.ResponsesWebSearchToolCallAction{ + Type: "search", + Sources: citations, + } + // Extract the search query from the accumulated toolUse input. + if q := providerUtils.GetJSONField([]byte(accumulatedArgs), "query").String(); q != "" { + action.Query = &q + action.Queries = []string{q} + } + doneItem := &schemas.ResponsesMessage{ + ID: &itemID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeWebSearchCall), + Status: &statusCompleted, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &itemID, + Action: &schemas.ResponsesToolMessageActionStruct{ + ResponsesWebSearchToolCallAction: action, + }, + }, + } + return []*schemas.BifrostResponsesStreamResponse{ + { + Type: schemas.ResponsesStreamResponseTypeWebSearchCallCompleted, + SequenceNumber: baseSequenceNumber, + OutputIndex: schemas.Ptr(outputIndex), + ItemID: &itemID, + }, + { + Type: schemas.ResponsesStreamResponseTypeOutputItemDone, + SequenceNumber: baseSequenceNumber + 1, + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentIndex, + ItemID: &itemID, + Item: doneItem, + }, + } +} + // FinalizeBedrockStream finalizes the stream by closing any open items and emitting completed event func FinalizeBedrockStream(state *BedrockResponsesStreamState, sequenceNumber int, usage *schemas.ResponsesResponseUsage) []*schemas.BifrostResponsesStreamResponse { var responses []*schemas.BifrostResponsesStreamResponse + // Synthesize lifecycle events if Bedrock never sent a messageStart + if !state.HasEmittedCreated { + if state.MessageID == nil { + messageID := fmt.Sprintf("msg_%d", state.CreatedAt) + state.MessageID = &messageID + } + createdResponse := &schemas.BifrostResponsesResponse{ + ID: state.MessageID, + CreatedAt: state.CreatedAt, + Usage: usage, + } + if state.Model != nil { + createdResponse.Model = *state.Model + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeCreated, + SequenceNumber: sequenceNumber + len(responses), + Response: createdResponse, + }) + state.HasEmittedCreated = true + } + + if !state.HasEmittedInProgress { + inProgressResponse := &schemas.BifrostResponsesResponse{ + ID: state.MessageID, + CreatedAt: state.CreatedAt, + } + if state.Model != nil { + inProgressResponse.Model = *state.Model + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeInProgress, + SequenceNumber: sequenceNumber + len(responses), + Response: inProgressResponse, + }) + state.HasEmittedInProgress = true + } + // Close any open items (text items and tool calls) for contentIndex, outputIndex := range state.ContentIndexToOutputIndex { // Skip reasoning blocks @@ -889,80 +1170,84 @@ func FinalizeBedrockStream(state *BedrockResponsesStreamState, sequenceNumber in isToolCall := toolCallID != "" if isToolCall { - // This is a tool call that needs to be closed - - // Emit content_part.done for tool call - emptyText := "" - part := &schemas.ResponsesMessageContentBlock{ - Type: schemas.ResponsesOutputMessageContentTypeText, - Text: &emptyText, - ResponsesOutputMessageContentText: &schemas.ResponsesOutputMessageContentText{ - LogProbs: []schemas.ResponsesOutputMessageContentTextLogProb{}, - Annotations: []schemas.ResponsesOutputMessageContentTextAnnotation{}, - }, - } - responses = append(responses, &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeContentPartDone, - SequenceNumber: sequenceNumber + len(responses), - OutputIndex: schemas.Ptr(outputIndex), - ContentIndex: &contentIndex, - ItemID: &itemID, - Part: part, - }) - - // Emit function_call_arguments.done with full arguments toolName := state.ToolCallNames[outputIndex] accumulatedArgs := state.ToolArgumentBuffers[outputIndex] - if accumulatedArgs != "" { - var doneItem *schemas.ResponsesMessage - if toolCallID != "" || toolName != "" { - doneItem = &schemas.ResponsesMessage{ - ResponsesToolMessage: &schemas.ResponsesToolMessage{}, + statusCompleted := "completed" + + if state.CodeInterpreterIndices[outputIndex] { + ciEvents := emitCodeInterpreterDoneEvents(outputIndex, contentIndex, itemID, toolCallID, accumulatedArgs, sequenceNumber+len(responses)) + responses = append(responses, ciEvents...) + } else if state.NovaGroundingIndices[outputIndex] { + citations := state.NovaGroundingCitations[outputIndex] + wsEvents := emitNovaGroundingDoneEvents(outputIndex, contentIndex, itemID, citations, accumulatedArgs, sequenceNumber+len(responses)) + responses = append(responses, wsEvents...) + } else { + // Close a regular function_call + emptyText := "" + part := &schemas.ResponsesMessageContentBlock{ + Type: schemas.ResponsesOutputMessageContentTypeText, + Text: &emptyText, + ResponsesOutputMessageContentText: &schemas.ResponsesOutputMessageContentText{ + LogProbs: []schemas.ResponsesOutputMessageContentTextLogProb{}, + Annotations: []schemas.ResponsesOutputMessageContentTextAnnotation{}, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeContentPartDone, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + ContentIndex: &contentIndex, + ItemID: &itemID, + Part: part, + }) + + if accumulatedArgs != "" { + var doneItem *schemas.ResponsesMessage + if toolCallID != "" || toolName != "" { + doneItem = &schemas.ResponsesMessage{ + ResponsesToolMessage: &schemas.ResponsesToolMessage{}, + } + if toolCallID != "" { + doneItem.ResponsesToolMessage.CallID = &toolCallID + } + if toolName != "" { + doneItem.ResponsesToolMessage.Name = &toolName + } + } + response := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDone, + SequenceNumber: sequenceNumber + len(responses), + OutputIndex: schemas.Ptr(outputIndex), + Arguments: &accumulatedArgs, } - if toolCallID != "" { - doneItem.ResponsesToolMessage.CallID = &toolCallID + if itemID != "" { + response.ItemID = &itemID } - if toolName != "" { - doneItem.ResponsesToolMessage.Name = &toolName + if doneItem != nil { + response.Item = doneItem } + responses = append(responses, response) } - response := &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDone, + doneItem := &schemas.ResponsesMessage{ + ID: &itemID, + Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), + Status: &statusCompleted, + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &toolCallID, + Name: &toolName, + Arguments: &accumulatedArgs, + }, + } + responses = append(responses, &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeOutputItemDone, SequenceNumber: sequenceNumber + len(responses), OutputIndex: schemas.Ptr(outputIndex), - Arguments: &accumulatedArgs, - } - if itemID != "" { - response.ItemID = &itemID - } - if doneItem != nil { - response.Item = doneItem - } - responses = append(responses, response) - } - - // Emit output_item.done for tool call - statusCompleted := "completed" - doneItem := &schemas.ResponsesMessage{ - ID: &itemID, - Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall), - Status: &statusCompleted, - ResponsesToolMessage: &schemas.ResponsesToolMessage{ - CallID: &toolCallID, - Name: &toolName, - Arguments: &accumulatedArgs, - }, - } - - responses = append(responses, &schemas.BifrostResponsesStreamResponse{ - Type: schemas.ResponsesStreamResponseTypeOutputItemDone, - SequenceNumber: sequenceNumber + len(responses), - OutputIndex: schemas.Ptr(outputIndex), - ContentIndex: &contentIndex, - ItemID: &itemID, - Item: doneItem, - }) + ContentIndex: &contentIndex, + ItemID: &itemID, + Item: doneItem, + }) + } // end else (regular function call) } else { // This is likely a text item that needs to be closed @@ -1165,14 +1450,24 @@ func ToBedrockConverseStreamResponse(bifrostResp *schemas.BifrostResponsesStream return nil, nil case schemas.ResponsesStreamResponseTypeOutputItemAdded: - // Content block start + // Content block start — handles nova_grounding (web_search_call), function calls, and text items. if bifrostResp.Item != nil && bifrostResp.Item.ResponsesToolMessage != nil { - // Tool use start - if bifrostResp.Item.ResponsesToolMessage.Name != nil && bifrostResp.Item.ResponsesToolMessage.CallID != nil { - contentBlockIndex := 0 - if bifrostResp.ContentIndex != nil { - contentBlockIndex = *bifrostResp.ContentIndex + contentBlockIndex := 0 + if bifrostResp.ContentIndex != nil { + contentBlockIndex = *bifrostResp.ContentIndex + } + // web_search_call (nova_grounding): CallID is set, Name is nil + if bifrostResp.Item.Type != nil && *bifrostResp.Item.Type == schemas.ResponsesMessageTypeWebSearchCall && + bifrostResp.Item.ResponsesToolMessage.CallID != nil { + event.ContentBlockIndex = &contentBlockIndex + event.Start = &BedrockContentBlockStart{ + ToolUse: &BedrockToolUseStart{ + ToolUseID: *bifrostResp.Item.ResponsesToolMessage.CallID, + Name: string(BedrockSystemToolNovaGrounding), + }, } + } else if bifrostResp.Item.ResponsesToolMessage.Name != nil && bifrostResp.Item.ResponsesToolMessage.CallID != nil { + // Regular function call event.ContentBlockIndex = &contentBlockIndex event.Start = &BedrockContentBlockStart{ ToolUse: &BedrockToolUseStart{ @@ -1180,15 +1475,97 @@ func ToBedrockConverseStreamResponse(bifrostResp *schemas.BifrostResponsesStream Name: *bifrostResp.Item.ResponsesToolMessage.Name, }, } + } else { + return nil, nil } } else if bifrostResp.Item != nil { // Text item added - Bedrock doesn't have an explicit text start event, so we skip it - // Check if it's a text message (has content blocks or is a message type) if bifrostResp.Item.Content != nil || (bifrostResp.Item.Type != nil && *bifrostResp.Item.Type == schemas.ResponsesMessageTypeMessage) { return nil, nil } } + case schemas.ResponsesStreamResponseTypeOutputTextAnnotationAdded: + // url_citation annotation → contentBlockDelta.citation + if bifrostResp.Annotation != nil && bifrostResp.Annotation.URL != nil { + contentBlockIndex := 0 + if bifrostResp.ContentIndex != nil { + contentBlockIndex = *bifrostResp.ContentIndex + } + domain := "" + if bifrostResp.Annotation.Title != nil { + domain = *bifrostResp.Annotation.Title + } + event.ContentBlockIndex = &contentBlockIndex + event.Delta = &BedrockContentBlockDelta{ + Citation: &BedrockCitation{ + Location: BedrockCitationLocation{ + Web: &BedrockWebCitationLocation{ + URL: *bifrostResp.Annotation.URL, + Domain: domain, + }, + }, + }, + } + } else { + return nil, nil + } + + case schemas.ResponsesStreamResponseTypeWebSearchCallInProgress, + schemas.ResponsesStreamResponseTypeWebSearchCallSearching, + schemas.ResponsesStreamResponseTypeWebSearchCallCompleted, + schemas.ResponsesStreamResponseTypeWebSearchCallResultsAdded, + schemas.ResponsesStreamResponseTypeWebSearchCallResultsCompleted: + // No Bedrock equivalent for these status events — skip. + return nil, nil + + case schemas.ResponsesStreamResponseTypeCodeInterpreterCallInProgress: + // nova_code_interpreter → contentBlockStart + if bifrostResp.Item != nil && bifrostResp.Item.ResponsesToolMessage != nil && + bifrostResp.Item.ResponsesToolMessage.ResponsesCodeInterpreterToolCall != nil { + toolUseID := bifrostResp.Item.ResponsesToolMessage.ResponsesCodeInterpreterToolCall.ContainerID + if toolUseID == "" && bifrostResp.Item.ID != nil { + toolUseID = *bifrostResp.Item.ID + } + contentBlockIndex := 0 + if bifrostResp.ContentIndex != nil { + contentBlockIndex = *bifrostResp.ContentIndex + } + event.ContentBlockIndex = &contentBlockIndex + event.Start = &BedrockContentBlockStart{ + ToolUse: &BedrockToolUseStart{ + ToolUseID: toolUseID, + Name: string(BedrockSystemToolNovaCodeInterpreter), + }, + } + } else { + return nil, nil + } + + case schemas.ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta: + // nova_code_interpreter toolUse delta — wrap snippet back into {"snippet":"..."} JSON + if bifrostResp.Delta != nil && *bifrostResp.Delta != "" { + contentBlockIndex := 0 + if bifrostResp.ContentIndex != nil { + contentBlockIndex = *bifrostResp.ContentIndex + } + inputJSON, _ := json.Marshal(map[string]string{"snippet": *bifrostResp.Delta}) + event.ContentBlockIndex = &contentBlockIndex + event.Delta = &BedrockContentBlockDelta{ + ToolUse: &BedrockToolUseDelta{ + Input: string(inputJSON), + }, + } + } else { + return nil, nil + } + + case schemas.ResponsesStreamResponseTypeCodeInterpreterCallCodeDone, + schemas.ResponsesStreamResponseTypeCodeInterpreterCallCompleted, + schemas.ResponsesStreamResponseTypeCodeInterpreterCallInterpreting: + // No Bedrock equivalent — skip. + return nil, nil + case schemas.ResponsesStreamResponseTypeOutputTextDelta: // Text delta if bifrostResp.Delta != nil && *bifrostResp.Delta != "" { @@ -1436,6 +1813,18 @@ func (request *BedrockConverseRequest) ToBifrostResponsesRequest(ctx *schemas.Bi } bifrostReq.Params.Tools = append(bifrostReq.Params.Tools, bifrostTool) + } else if tool.SystemTool != nil { + // Nova system tools: nova_grounding → web_search, nova_code_interpreter → code_interpreter + var toolType schemas.ResponsesToolType + switch tool.SystemTool.Name { + case BedrockSystemToolNovaGrounding: + toolType = schemas.ResponsesToolTypeWebSearch + case BedrockSystemToolNovaCodeInterpreter: + toolType = schemas.ResponsesToolTypeCodeInterpreter + default: + continue + } + bifrostReq.Params.Tools = append(bifrostReq.Params.Tools, schemas.ResponsesTool{Type: toolType}) } else if tool.CachePoint != nil && !schemas.IsNovaModel(bifrostReq.Model) { // add cache control to last tool in tools array if len(bifrostReq.Params.Tools) > 0 { @@ -1665,7 +2054,16 @@ func ToBedrockResponsesRequest(ctx *schemas.BifrostContext, bifrostReq *schemas. // map bifrost messages to bedrock messages using the new conversion method if bifrostReq.Input != nil { - messages, systemMessages, err := ConvertBifrostMessagesToBedrockMessages(ctx, bifrostReq.Input) + input := bifrostReq.Input + if schemas.IsAnthropicModel(bifrostReq.Model) && ctx.Value(schemas.BifrostContextKeySupportsAssistantPrefill) == false { + trimmed := len(input) + for trimmed > 0 && input[trimmed-1].Role != nil && *input[trimmed-1].Role == schemas.ResponsesInputMessageRoleAssistant { + trimmed-- + } + input = input[:trimmed] + } + + messages, systemMessages, err := ConvertBifrostMessagesToBedrockMessages(ctx, input) if err != nil { return nil, fmt.Errorf("failed to convert Responses messages: %w", err) } @@ -1682,6 +2080,19 @@ func ToBedrockResponsesRequest(ctx *schemas.BifrostContext, bifrostReq *schemas. } } } + + // Trim trailing whitespace from the last assistant message text blocks + // (only for Anthropic models which use text-based prefill) + lastMsgIndex := len(bedrockReq.Messages) - 1 + if schemas.IsAnthropicModel(bifrostReq.Model) && lastMsgIndex >= 0 && bedrockReq.Messages[lastMsgIndex].Role == BedrockMessageRoleAssistant { + blocks := bedrockReq.Messages[lastMsgIndex].Content + for j := len(blocks) - 1; j >= 0; j-- { + if blocks[j].Text != nil { + bedrockReq.Messages[lastMsgIndex].Content[j].Text = schemas.Ptr(strings.TrimRight(*blocks[j].Text, " \n\r\t")) + break + } + } + } } var responsesStructuredOutputTool *BedrockTool @@ -1909,7 +2320,24 @@ func ToBedrockResponsesRequest(ctx *schemas.BifrostContext, bifrostReq *schemas. // Convert tools if bifrostReq.Params != nil && bifrostReq.Params.Tools != nil { var bedrockTools []BedrockTool + isNova2 := schemas.IsNova2Model(bifrostReq.Model) for _, tool := range bifrostReq.Params.Tools { + if tool.Type == schemas.ResponsesToolTypeWebSearch || tool.Type == schemas.ResponsesToolTypeCodeInterpreter { + if !isNova2 { + return nil, fmt.Errorf("tool type %q is only supported on Nova 2 models in Bedrock; got model %q", tool.Type, bifrostReq.Model) + } + var systemToolName BedrockSystemToolType + switch tool.Type { + case schemas.ResponsesToolTypeWebSearch: + systemToolName = BedrockSystemToolNovaGrounding + case schemas.ResponsesToolTypeCodeInterpreter: + systemToolName = BedrockSystemToolNovaCodeInterpreter + } + bedrockTools = append(bedrockTools, BedrockTool{ + SystemTool: &BedrockSystemTool{Name: systemToolName}, + }) + continue + } if tool.ResponsesToolFunction != nil { // Create the complete schema object that Bedrock expects var schemaObject interface{} @@ -2077,6 +2505,11 @@ func (response *BedrockConverseResponse) ToBifrostResponsesResponse(ctx *schemas bifrostResp.ServiceTier = &response.ServiceTier.Type } + if response.StopReason != "" { + stopReason := convertBedrockStopReason(response.StopReason) + bifrostResp.StopReason = &stopReason + } + return bifrostResp, nil } @@ -2112,9 +2545,19 @@ func ToBedrockConverseResponse(bifrostResp *schemas.BifrostResponsesResponse) (* message.Content = append(message.Content, bedrockMsg.Content...) } - // Check for tool use in the content blocks + // Check for tool use in the content blocks. Server-managed tools + // (nova_grounding, nova_code_interpreter) return both toolUse and + // toolResult in the same message — their stop reason is "end_turn", + // not "tool_use". Only flag hasToolUse when there is an unmatched + // toolUse (i.e. the model is waiting for a client-side tool result). + resolvedToolUseIDs := make(map[string]bool) + for _, block := range message.Content { + if block.ToolResult != nil { + resolvedToolUseIDs[block.ToolResult.ToolUseID] = true + } + } for _, block := range message.Content { - if block.ToolUse != nil { + if block.ToolUse != nil && !resolvedToolUseIDs[block.ToolUse.ToolUseID] { hasToolUse = true break } @@ -2123,10 +2566,11 @@ func ToBedrockConverseResponse(bifrostResp *schemas.BifrostResponsesResponse) (* bedrockResp.Output.Message = message - // Find stop reason from incomplete details or derive from response - // Priority: IncompleteDetails > tool_use detection > end_turn + // Derive stop reason: StopReason > IncompleteDetails > tool_use detection > end_turn stopReason := "end_turn" - if bifrostResp.IncompleteDetails != nil { + if bifrostResp.StopReason != nil { + stopReason = convertBifrostToBedrockStopReason(*bifrostResp.StopReason) + } else if bifrostResp.IncompleteDetails != nil { stopReason = bifrostResp.IncompleteDetails.Reason } else if hasToolUse { stopReason = "tool_use" @@ -2183,16 +2627,17 @@ func ensureResponsesToolConfigForConversation(bifrostReq *schemas.BifrostRespons return // Already has tool config } - hasToolContent, tools := extractToolsFromResponsesConversationHistory(bifrostReq.Input) + hasToolContent, tools := extractToolsFromResponsesConversationHistory(bifrostReq.Input, bifrostReq.Model) if hasToolContent && len(tools) > 0 { bedrockReq.ToolConfig = &BedrockToolConfig{Tools: tools} } } // extractToolsFromResponsesConversationHistory extracts tools from Responses conversation history -func extractToolsFromResponsesConversationHistory(messages []schemas.ResponsesMessage) (bool, []BedrockTool) { +func extractToolsFromResponsesConversationHistory(messages []schemas.ResponsesMessage, model string) (bool, []BedrockTool) { var hasToolContent bool toolMap := make(map[string]*schemas.ResponsesTool) // Use map to deduplicate by name + var hasNovaGrounding, hasNovaCodeInterpreter bool for _, msg := range messages { // Check if message contains tool use or tool result @@ -2217,11 +2662,17 @@ func extractToolsFromResponsesConversationHistory(messages []schemas.ResponsesMe } } } + case schemas.ResponsesMessageTypeWebSearchCall: + hasToolContent = true + hasNovaGrounding = true + case schemas.ResponsesMessageTypeCodeInterpreterCall: + hasToolContent = true + hasNovaCodeInterpreter = true } } } - // Convert map to slice + // Convert function tool map to BedrockTool slice var tools []BedrockTool for _, tool := range toolMap { if tool.Name != nil && tool.ResponsesToolFunction != nil { @@ -2252,6 +2703,16 @@ func extractToolsFromResponsesConversationHistory(messages []schemas.ResponsesMe } } + // Append system tools found in history — only valid on Nova 2 models + if schemas.IsNova2Model(model) { + if hasNovaGrounding { + tools = append(tools, BedrockTool{SystemTool: &BedrockSystemTool{Name: BedrockSystemToolNovaGrounding}}) + } + if hasNovaCodeInterpreter { + tools = append(tools, BedrockTool{SystemTool: &BedrockSystemTool{Name: BedrockSystemToolNovaCodeInterpreter}}) + } + } + return hasToolContent, tools } @@ -2387,7 +2848,7 @@ func (m *ToolCallStateManager) RegisterToolCall(callID, toolName, arguments stri // RegisterToolResult registers a tool result func (m *ToolCallStateManager) RegisterToolResult(callID string, content []BedrockContentBlock, status string, cacheControl *schemas.CacheControl) { - // Attemp to deduplicate the result similar to tool call. Need to check in 2 places, since after moving + // Attempt to deduplicate the result similar to tool call. Need to check in 2 places, since after moving // on from pendingResults into a completed toolCall, the same ID might come again. if _, ok := m.pendingResults[callID]; ok { return @@ -2516,6 +2977,9 @@ func ConvertBifrostMessagesToBedrockMessages(ctx context.Context, bifrostMessage var bedrockMessages []BedrockMessage var systemMessages []BedrockSystemMessage var pendingReasoningContentBlocks []BedrockContentBlock + // pendingServerToolBlocks accumulates nova_grounding / nova_code_interpreter toolUse+toolResult + // blocks that must be prepended to the next assistant text message (same-turn server-managed tools). + var pendingServerToolBlocks []BedrockContentBlock // Initialize the state manager for tracking tool calls and results stateManager := NewToolCallStateManager() @@ -2854,6 +3318,12 @@ func ConvertBifrostMessagesToBedrockMessages(ctx context.Context, bifrostMessage // Convert user/assistant text message bedrockMsg := convertBifrostMessageToBedrockMessage(ctx, &msg) if bedrockMsg != nil { + // Prepend buffered server-managed tool blocks (nova_grounding / nova_code_interpreter) + // to the assistant message they belong to — they're part of the same turn. + if bedrockMsg.Role == BedrockMessageRoleAssistant && len(pendingServerToolBlocks) > 0 { + bedrockMsg.Content = append(pendingServerToolBlocks, bedrockMsg.Content...) + pendingServerToolBlocks = nil + } bedrockMessages = append(bedrockMessages, *bedrockMsg) } } @@ -2865,9 +3335,110 @@ func ConvertBifrostMessagesToBedrockMessages(ctx context.Context, bifrostMessage if len(reasoningBlocks) > 0 { pendingReasoningContentBlocks = append(pendingReasoningContentBlocks, reasoningBlocks...) } + + case schemas.ResponsesMessageTypeWebSearchCall: + // Convert web_search_call → nova_grounding toolUse + toolResult. + if msg.ResponsesToolMessage == nil || msg.ResponsesToolMessage.CallID == nil { + continue + } + callID := *msg.ResponsesToolMessage.CallID + // Build toolUse input from the search query (matches original Bedrock format). + inputMap := map[string]string{} + if msg.ResponsesToolMessage.Action != nil && + msg.ResponsesToolMessage.Action.ResponsesWebSearchToolCallAction != nil { + action := msg.ResponsesToolMessage.Action.ResponsesWebSearchToolCallAction + if action.Query != nil { + inputMap["query"] = *action.Query + } + } + inputBytes, _ := json.Marshal(inputMap) + toolUseBlock := BedrockContentBlock{ + ToolUse: &BedrockToolUse{ + ToolUseID: callID, + Name: string(BedrockSystemToolNovaGrounding), + Input: json.RawMessage(inputBytes), + Type: "server_tool_use", + }, + } + // Serialize sources as JSON for the toolResult content; preserve type and status. + sourcesText := "[]" + if msg.ResponsesToolMessage.Action != nil && + msg.ResponsesToolMessage.Action.ResponsesWebSearchToolCallAction != nil { + action := msg.ResponsesToolMessage.Action.ResponsesWebSearchToolCallAction + if len(action.Sources) > 0 { + if b, err := json.Marshal(action.Sources); err == nil { + sourcesText = string(b) + } + } + } + resultType := BedrockNovaGroundingResultType + toolResultBlock := BedrockContentBlock{ + ToolResult: &BedrockToolResult{ + ToolUseID: callID, + Type: &resultType, + Status: schemas.Ptr("success"), + Content: []BedrockContentBlock{{Text: &sourcesText}}, + }, + } + pendingServerToolBlocks = append(pendingServerToolBlocks, toolUseBlock, toolResultBlock) + + case schemas.ResponsesMessageTypeCodeInterpreterCall: + // Convert code_interpreter_call → nova_code_interpreter toolUse + toolResult. + // Both blocks are buffered and prepended to the next assistant message. + if msg.ResponsesToolMessage == nil || msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall == nil { + continue + } + ci := msg.ResponsesToolMessage.ResponsesCodeInterpreterToolCall + toolUseID := ci.ContainerID + if toolUseID == "" && msg.ID != nil { + toolUseID = *msg.ID + } + code := "" + if ci.Code != nil { + code = *ci.Code + } + inputBytes, _ := json.Marshal(map[string]string{"snippet": code}) + toolUseBlock := BedrockContentBlock{ + ToolUse: &BedrockToolUse{ + ToolUseID: toolUseID, + Name: string(BedrockSystemToolNovaCodeInterpreter), + Input: json.RawMessage(inputBytes), + Type: "server_tool_use", + }, + } + // Build toolResult from outputs (stdout/stderr). + var stdOut, stdErr string + for _, output := range ci.Outputs { + if output.ResponsesCodeInterpreterOutputLogs != nil { + stdOut += output.ResponsesCodeInterpreterOutputLogs.Logs + } + } + execResultBytes, _ := json.Marshal(struct { + StdOut string `json:"stdOut"` + StdErr string `json:"stdErr"` + }{StdOut: stdOut, StdErr: stdErr}) + execResultStr := string(execResultBytes) + resultType := BedrockNovaCodeInterpreterResultType + toolResultBlock := BedrockContentBlock{ + ToolResult: &BedrockToolResult{ + ToolUseID: toolUseID, + Type: &resultType, + Content: []BedrockContentBlock{{Text: &execResultStr}}, + }, + } + pendingServerToolBlocks = append(pendingServerToolBlocks, toolUseBlock, toolResultBlock) } } + // Flush any remaining server-managed tool blocks (no following assistant message). + if len(pendingServerToolBlocks) > 0 { + bedrockMessages = append(bedrockMessages, BedrockMessage{ + Role: BedrockMessageRoleAssistant, + Content: pendingServerToolBlocks, + }) + pendingServerToolBlocks = nil + } + // Flush any remaining pending tool calls flushPendingToolCalls() @@ -3074,7 +3645,55 @@ func convertSingleBedrockMessageToBifrostMessages(ctx *schemas.BifrostContext, m } } + // Pre-scan: build toolUseId → toolResult map for nova_code_interpreter_result blocks + // so we can attach execution output when we encounter the matching toolUse block. + novaCodeResults := make(map[string]*BedrockToolResult) + for i := range msg.Content { + r := msg.Content[i].ToolResult + if r != nil && r.Type != nil && *r.Type == BedrockNovaCodeInterpreterResultType { + novaCodeResults[r.ToolUseID] = r + } + } + + // Pre-scan: collect nova_grounding toolUseIDs and citation sources from citationsContent. + // nova_grounding toolResults (paired with the toolUse) are skipped in the main loop; + // citation URLs from text blocks are surfaced as sources on the web_search_call item. + novaGroundingToolUseIDs := make(map[string]bool) + var novaGroundingSources []schemas.ResponsesWebSearchToolCallActionSearchSource + seenCitationURLs := make(map[string]bool) + for i := range msg.Content { + if msg.Content[i].ToolUse != nil && msg.Content[i].ToolUse.Name == string(BedrockSystemToolNovaGrounding) { + novaGroundingToolUseIDs[msg.Content[i].ToolUse.ToolUseID] = true + } + if msg.Content[i].CitationsContent != nil { + for _, citation := range msg.Content[i].CitationsContent.Citations { + if citation.Location.Web != nil && !seenCitationURLs[citation.Location.Web.URL] { + seenCitationURLs[citation.Location.Web.URL] = true + domain := citation.Location.Web.Domain + novaGroundingSources = append(novaGroundingSources, schemas.ResponsesWebSearchToolCallActionSearchSource{ + Type: "url", + URL: citation.Location.Web.URL, + Title: &domain, + }) + } + } + } + } + + // lastTextOutputIdx tracks the index into outputMessages of the most recently appended + // text message, so standalone citationsContent blocks can be attached to it as annotations. + lastTextOutputIdx := -1 + for _, block := range msg.Content { + // Skip nova_code_interpreter_result tool results — they are consumed via novaCodeResults above. + if block.ToolResult != nil && block.ToolResult.Type != nil && *block.ToolResult.Type == BedrockNovaCodeInterpreterResultType { + continue + } + // Skip nova_grounding tool results — server-managed, consumed by the pre-scan above. + if block.ToolResult != nil && novaGroundingToolUseIDs[block.ToolResult.ToolUseID] { + continue + } + if block.Text != nil { // Text content role := convertBedrockRoleToBifrostRole(msg.Role) @@ -3091,6 +3710,37 @@ func convertSingleBedrockMessageToBifrostMessages(ctx *schemas.BifrostContext, m bifrostMsg.ID = schemas.Ptr("msg_" + fmt.Sprintf("%d", time.Now().UnixNano())) } outputMessages = append(outputMessages, bifrostMsg) + // Track this message so standalone citationsContent blocks can be attached to it. + lastTextOutputIdx = len(outputMessages) - 1 + + } else if block.CitationsContent != nil { + // Standalone citationsContent block — attach citations as url_citation annotations + // to the most recently created text message (interleaved in the Bedrock format). + if lastTextOutputIdx >= 0 { + lastMsg := &outputMessages[lastTextOutputIdx] + if lastMsg.Content != nil && len(lastMsg.Content.ContentBlocks) > 0 { + cb := &lastMsg.Content.ContentBlocks[0] + if cb.ResponsesOutputMessageContentText == nil { + cb.ResponsesOutputMessageContentText = &schemas.ResponsesOutputMessageContentText{ + LogProbs: []schemas.ResponsesOutputMessageContentTextLogProb{}, + Annotations: []schemas.ResponsesOutputMessageContentTextAnnotation{}, + } + } + for _, citation := range block.CitationsContent.Citations { + if citation.Location.Web == nil { + continue + } + cb.ResponsesOutputMessageContentText.Annotations = append( + cb.ResponsesOutputMessageContentText.Annotations, + schemas.ResponsesOutputMessageContentTextAnnotation{ + Type: "url_citation", + URL: schemas.Ptr(citation.Location.Web.URL), + Title: schemas.Ptr(citation.Location.Web.Domain), + }, + ) + } + } + } } else if block.ReasoningContent != nil { // Reasoning content - collect to create a single reasoning message @@ -3125,6 +3775,96 @@ func convertSingleBedrockMessageToBifrostMessages(ctx *schemas.BifrostContext, m bifrostMsg.ID = schemas.Ptr("msg_" + fmt.Sprintf("%d", time.Now().UnixNano())) } outputMessages = append(outputMessages, bifrostMsg) + } else if toolUseName == "nova_code_interpreter" { + // Nova code interpreter: build a code_interpreter_call message. + // Bedrock returns the code under the "snippet" key in toolUse.input. + var snippetInput []byte + if block.ToolUse.Input != nil { + snippetInput = block.ToolUse.Input + } + codeSnippet := providerUtils.GetJSONField(snippetInput, "snippet").String() + + // Build outputs from the paired toolResult (pre-scanned above). + var ciOutputs []schemas.ResponsesCodeInterpreterOutput + if result, ok := novaCodeResults[toolUseID]; ok { + // Extract the JSON payload: {"stdOut":"...","stdErr":"...","exitCode":0,"isError":false} + var execResult struct { + StdOut string `json:"stdOut"` + StdErr string `json:"stdErr"` + } + for _, c := range result.Content { + if c.Text != nil { + _ = json.Unmarshal([]byte(*c.Text), &execResult) + break + } + } + if execResult.StdOut != "" { + ciOutputs = append(ciOutputs, schemas.ResponsesCodeInterpreterOutput{ + ResponsesCodeInterpreterOutputLogs: &schemas.ResponsesCodeInterpreterOutputLogs{ + Type: "logs", + Logs: execResult.StdOut, + }, + }) + } + if execResult.StdErr != "" { + ciOutputs = append(ciOutputs, schemas.ResponsesCodeInterpreterOutput{ + ResponsesCodeInterpreterOutputLogs: &schemas.ResponsesCodeInterpreterOutputLogs{ + Type: "logs", + Logs: execResult.StdErr, + }, + }) + } + } + if ciOutputs == nil { + ciOutputs = []schemas.ResponsesCodeInterpreterOutput{} + } + + ciMsg := schemas.ResponsesMessage{ + Type: schemas.Ptr(schemas.ResponsesMessageTypeCodeInterpreterCall), + Status: schemas.Ptr("completed"), + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + ResponsesCodeInterpreterToolCall: &schemas.ResponsesCodeInterpreterToolCall{ + Code: &codeSnippet, + ContainerID: toolUseID, + Outputs: ciOutputs, + }, + }, + } + if isOutputMessage { + ciMsg.ID = schemas.Ptr("msg_" + fmt.Sprintf("%d", time.Now().UnixNano())) + role := schemas.ResponsesInputMessageRoleAssistant + ciMsg.Role = &role + } + outputMessages = append(outputMessages, ciMsg) + + } else if toolUseName == string(BedrockSystemToolNovaGrounding) { + // nova_grounding → web_search_call with query from toolUse.input and citations from text blocks. + wsAction := &schemas.ResponsesWebSearchToolCallAction{ + Type: "search", + Sources: novaGroundingSources, + } + if block.ToolUse.Input != nil { + if q := providerUtils.GetJSONField(block.ToolUse.Input, "query").String(); q != "" { + wsAction.Query = &q + wsAction.Queries = []string{q} + } + } + wsMsg := schemas.ResponsesMessage{ + Type: schemas.Ptr(schemas.ResponsesMessageTypeWebSearchCall), + Status: schemas.Ptr("completed"), + ResponsesToolMessage: &schemas.ResponsesToolMessage{ + CallID: &toolUseID, + Action: &schemas.ResponsesToolMessageActionStruct{ + ResponsesWebSearchToolCallAction: wsAction, + }, + }, + } + if isOutputMessage { + wsMsg.ID = schemas.Ptr("msg_" + fmt.Sprintf("%d", time.Now().UnixNano())) + role := schemas.ResponsesInputMessageRoleAssistant + wsMsg.Role = &role + } + outputMessages = append(outputMessages, wsMsg) } else { // Normal tool call message arguments := "{}" @@ -3453,6 +4193,32 @@ func convertBifrostResponsesMessageContentBlocksToBedrockContentBlocks(ctx conte blocks = append(blocks, bedrockBlock) } + // For text blocks: emit a citationsContent block per url_citation annotation, + // reconstructing the interleaved text+citation structure Bedrock uses. + if bedrockBlock.Text != nil && block.ResponsesOutputMessageContentText != nil { + for _, annotation := range block.ResponsesOutputMessageContentText.Annotations { + if annotation.Type != "url_citation" || annotation.URL == nil { + continue + } + domain := "" + if annotation.Title != nil { + domain = *annotation.Title + } + blocks = append(blocks, BedrockContentBlock{ + CitationsContent: &BedrockCitationsContent{ + Citations: []BedrockCitation{{ + Location: BedrockCitationLocation{ + Web: &BedrockWebCitationLocation{ + URL: *annotation.URL, + Domain: domain, + }, + }, + }}, + }, + }) + } + } + if block.CacheControl != nil { blocks = append(blocks, BedrockContentBlock{ CachePoint: &BedrockCachePoint{ diff --git a/core/providers/bedrock/types.go b/core/providers/bedrock/types.go index 7c136fb626..4c5367c325 100644 --- a/core/providers/bedrock/types.go +++ b/core/providers/bedrock/types.go @@ -11,11 +11,15 @@ import ( // DefaultBedrockRegion is the default region for Bedrock const DefaultBedrockRegion = "us-east-1" -// bedrockSigningService is the SigV4 service name used when signing all Bedrock -// API requests. AWS requires "bedrock" as the credential scope service for both -// bedrock-runtime and bedrock-agent-runtime endpoints. +// bedrockSigningService is the SigV4 service name for the standard Bedrock endpoints +// (bedrock-runtime, bedrock-agent-runtime). const bedrockSigningService = "bedrock" +// bedrockMantleSigningService is the SigV4 service name for the Bedrock Mantle endpoint +// (bedrock-mantle.{region}.api.aws). AWS requires a distinct service name in the +// credential scope; using "bedrock" will cause signature verification failures. +const bedrockMantleSigningService = "bedrock-mantle" + const MinimumReasoningMaxTokens = 1 const DefaultCompletionMaxTokens = 4096 // Only used for relative reasoning max token calculation - not passed in body by default @@ -206,6 +210,9 @@ type BedrockContentBlock struct { // Cache point for the content block CachePoint *BedrockCachePoint `json:"cachePoint,omitempty"` + + // Citations from nova_grounding — co-located with a text block in the same content block + CitationsContent *BedrockCitationsContent `json:"citationsContent,omitempty"` } type BedrockCachePointType string @@ -245,9 +252,10 @@ type BedrockDocumentSourceData struct { // BedrockToolUse represents a tool use request type BedrockToolUse struct { - ToolUseID string `json:"toolUseId"` // Required: Unique identifier for this tool use - Name string `json:"name"` // Required: Name of the tool to use - Input json.RawMessage `json:"input"` // Required: Input parameters for the tool (json.RawMessage preserves key ordering for prompt caching) + ToolUseID string `json:"toolUseId"` // Required: Unique identifier for this tool use + Name string `json:"name"` // Required: Name of the tool to use + Input json.RawMessage `json:"input"` // Required: Input parameters for the tool (json.RawMessage preserves key ordering for prompt caching) + Type string `json:"type,omitempty"` // Optional: "server_tool_use" for Nova system tools } // BedrockToolResult represents the result of a tool use @@ -255,6 +263,7 @@ type BedrockToolResult struct { ToolUseID string `json:"toolUseId"` // Required: ID of the tool use this result corresponds to Content []BedrockContentBlock `json:"content"` // Required: Content of the tool result Status *string `json:"status,omitempty"` // Optional: Status of tool execution ("success" or "error") + Type *string `json:"type,omitempty"` // Optional: result type e.g. "nova_code_interpreter_result" } // BedrockGuardContent represents guard content for guardrails @@ -304,6 +313,22 @@ type BedrockToolConfig struct { type BedrockTool struct { ToolSpec *BedrockToolSpec `json:"toolSpec,omitempty"` // Tool specification CachePoint *BedrockCachePoint `json:"cachePoint,omitempty"` // Cache point for the tool + SystemTool *BedrockSystemTool `json:"systemTool,omitempty"` // Nova system tool (nova_grounding, nova_code_interpreter) +} + +type BedrockSystemToolType string + +const ( + BedrockSystemToolNovaGrounding BedrockSystemToolType = "nova_grounding" + BedrockSystemToolNovaCodeInterpreter BedrockSystemToolType = "nova_code_interpreter" +) + +const BedrockNovaCodeInterpreterResultType = "nova_code_interpreter_result" +const BedrockNovaGroundingResultType = "nova_grounding_result" + +// BedrockSystemTool represents a Nova-managed system tool +type BedrockSystemTool struct { + Name BedrockSystemToolType `json:"name"` // "nova_grounding" | "nova_code_interpreter" } // BedrockToolSpec represents the specification of a tool @@ -643,6 +668,28 @@ type BedrockContentBlockDelta struct { Text *string `json:"text,omitempty"` // Text content delta ReasoningContent *BedrockReasoningContentText `json:"reasoningContent,omitempty"` // Reasoning content delta ToolUse *BedrockToolUseDelta `json:"toolUse,omitempty"` // Tool use delta + Citation *BedrockCitation `json:"citation,omitempty"` // nova_grounding citation delta +} + +// BedrockWebCitationLocation represents the web location of a citation +type BedrockWebCitationLocation struct { + URL string `json:"url"` + Domain string `json:"domain"` +} + +// BedrockCitationLocation represents the location of a citation (union type) +type BedrockCitationLocation struct { + Web *BedrockWebCitationLocation `json:"web,omitempty"` +} + +// BedrockCitation represents a single citation returned by nova_grounding +type BedrockCitation struct { + Location BedrockCitationLocation `json:"location"` +} + +// BedrockCitationsContent represents the citations block embedded in a text content block +type BedrockCitationsContent struct { + Citations []BedrockCitation `json:"citations"` } // BedrockToolUseDelta represents incremental tool use content diff --git a/core/providers/bedrock/utils.go b/core/providers/bedrock/utils.go index 295335c245..998b888fb0 100644 --- a/core/providers/bedrock/utils.go +++ b/core/providers/bedrock/utils.go @@ -40,14 +40,21 @@ var ( multiSpaceRegex = regexp.MustCompile(`\s{2,}`) // bedrockFinishReasonToBifrost maps Bedrock Converse API stop reasons to Bifrost format. - // Bedrock has additional stop reasons beyond Anthropic (guardrail_intervened, content_filtered). + // Unmappable reasons (e.g. guardrail_intervened) are passed through as-is. bedrockFinishReasonToBifrost = map[string]string{ - "end_turn": "stop", - "max_tokens": "length", - "stop_sequence": "stop", - "tool_use": "tool_calls", - "guardrail_intervened": "content_filter", - "content_filtered": "content_filter", + "end_turn": "stop", + "max_tokens": "length", + "stop_sequence": "stop", + "tool_use": "tool_calls", + "content_filtered": "content_filter", + } + + // bifrostToBedrockStopReason is the reverse of bedrockFinishReasonToBifrost. + bifrostToBedrockStopReason = map[string]string{ + "stop": "end_turn", + "length": "max_tokens", + "tool_calls": "tool_use", + "content_filter": "content_filtered", } ) @@ -56,7 +63,15 @@ func convertBedrockStopReason(stopReason string) string { if reason, ok := bedrockFinishReasonToBifrost[stopReason]; ok { return reason } - return "stop" + return stopReason +} + +// convertBifrostToBedrockStopReason converts a Bifrost stop reason back to Bedrock format. +func convertBifrostToBedrockStopReason(bifrostReason string) string { + if reason, ok := bifrostToBedrockStopReason[bifrostReason]; ok { + return reason + } + return bifrostReason } // normalizeBedrockFilename normalizes a filename to meet Bedrock's requirements: @@ -2100,4 +2115,4 @@ func tryParseJSONIntoContentBlock(text string) BedrockContentBlock { wrapped = append(wrapped, '}') return BedrockContentBlock{JSON: json.RawMessage(wrapped)} } -} \ No newline at end of file +} diff --git a/core/providers/cerebras/cerebras.go b/core/providers/cerebras/cerebras.go index 45292d6d24..b2f38ea38f 100644 --- a/core/providers/cerebras/cerebras.go +++ b/core/providers/cerebras/cerebras.go @@ -116,6 +116,7 @@ func (provider *CerebrasProvider) TextCompletionStream(ctx *schemas.BifrostConte request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -163,6 +164,7 @@ func (provider *CerebrasProvider) ChatCompletionStream(ctx *schemas.BifrostConte request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Cerebras, diff --git a/core/providers/cohere/cohere.go b/core/providers/cohere/cohere.go index 1bffa10e87..b43166c7f3 100644 --- a/core/providers/cohere/cohere.go +++ b/core/providers/cohere/cohere.go @@ -460,7 +460,7 @@ func (provider *CohereProvider) ChatCompletionStream(ctx *schemas.BifrostContext providerUtils.DrainLargePayloadRemainder(ctx) } if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -482,7 +482,7 @@ func (provider *CohereProvider) ChatCompletionStream(ctx *schemas.BifrostContext // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseCohereError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -509,7 +509,7 @@ func (provider *CohereProvider) ChatCompletionStream(ctx *schemas.BifrostContext } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -537,10 +537,11 @@ func (provider *CohereProvider) ChatCompletionStream(ctx *schemas.BifrostContext } data, readErr := sseReader.ReadDataLine() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, provider.logger, postHookSpanFinalizer) @@ -724,7 +725,7 @@ func (provider *CohereProvider) ResponsesStream(ctx *schemas.BifrostContext, pos providerUtils.DrainLargePayloadRemainder(ctx) } if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -746,7 +747,7 @@ func (provider *CohereProvider) ResponsesStream(ctx *schemas.BifrostContext, pos // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseCohereError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -773,7 +774,7 @@ func (provider *CohereProvider) ResponsesStream(ctx *schemas.BifrostContext, pos } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -806,10 +807,11 @@ func (provider *CohereProvider) ResponsesStream(ctx *schemas.BifrostContext, pos } data, readErr := sseReader.ReadDataLine() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, provider.logger, postHookSpanFinalizer) diff --git a/core/providers/elevenlabs/elevenlabs.go b/core/providers/elevenlabs/elevenlabs.go index f092ce6d16..ebc3317cb7 100644 --- a/core/providers/elevenlabs/elevenlabs.go +++ b/core/providers/elevenlabs/elevenlabs.go @@ -352,7 +352,7 @@ func (provider *ElevenlabsProvider) SpeechStream(ctx *schemas.BifrostContext, po startTime := time.Now() err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -374,7 +374,7 @@ func (provider *ElevenlabsProvider) SpeechStream(ctx *schemas.BifrostContext, po // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseElevenlabsError(resp), jsonBody, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) } @@ -392,7 +392,7 @@ func (provider *ElevenlabsProvider) SpeechStream(ctx *schemas.BifrostContext, po } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() diff --git a/core/providers/elevenlabs/realtime.go b/core/providers/elevenlabs/realtime.go index a18e1cd514..20ef26da26 100644 --- a/core/providers/elevenlabs/realtime.go +++ b/core/providers/elevenlabs/realtime.go @@ -26,7 +26,7 @@ func (provider *ElevenlabsProvider) RealtimeWebSocketURL(key schemas.Key, model } // RealtimeHeaders returns the headers required for the ElevenLabs Conversational AI WebSocket. -func (provider *ElevenlabsProvider) RealtimeHeaders(key schemas.Key) map[string]string { +func (provider *ElevenlabsProvider) RealtimeHeaders(_ *schemas.BifrostContext, key schemas.Key) (map[string]string, *schemas.BifrostError) { headers := map[string]string{ "xi-api-key": key.Value.GetValue(), } @@ -36,7 +36,7 @@ func (provider *ElevenlabsProvider) RealtimeHeaders(key schemas.Key) map[string] } headers[k] = v } - return headers + return headers, nil } // SupportsRealtimeWebRTC returns false — ElevenLabs WebRTC SDP exchange is not yet implemented. diff --git a/core/providers/fireworks/fireworks.go b/core/providers/fireworks/fireworks.go index 827d1777df..646acbb3cf 100644 --- a/core/providers/fireworks/fireworks.go +++ b/core/providers/fireworks/fireworks.go @@ -112,6 +112,7 @@ func (provider *FireworksProvider) TextCompletionStream(ctx *schemas.BifrostCont request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -159,6 +160,7 @@ func (provider *FireworksProvider) ChatCompletionStream(ctx *schemas.BifrostCont request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Fireworks, @@ -204,6 +206,7 @@ func (provider *FireworksProvider) ResponsesStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), diff --git a/core/providers/gemini/chat.go b/core/providers/gemini/chat.go index a0b399d034..2a398301b0 100644 --- a/core/providers/gemini/chat.go +++ b/core/providers/gemini/chat.go @@ -31,7 +31,10 @@ func ToGeminiChatCompletionRequest(bifrostReq *schemas.BifrostChatRequest) (*Gem } // Handle tool-related parameters if len(bifrostReq.Params.Tools) > 0 { - geminiReq.Tools = convertBifrostToolsToGemini(bifrostReq.Params.Tools) + geminiReq.Tools, err = convertBifrostToolsToGemini(bifrostReq.Params.Tools) + if err != nil { + return nil, err + } // Convert tool choice to tool config if bifrostReq.Params.ToolChoice != nil { diff --git a/core/providers/gemini/gemini.go b/core/providers/gemini/gemini.go index a074938b16..356a300db3 100644 --- a/core/providers/gemini/gemini.go +++ b/core/providers/gemini/gemini.go @@ -428,7 +428,7 @@ func HandleGeminiChatCompletionStream( // Make the request — caller is responsible for passing a streaming-configured client. doErr := client.Do(req, resp) if doErr != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(doErr, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -450,7 +450,7 @@ func HandleGeminiChatCompletionStream( // Check for HTTP errors — use parseGeminiError to preserve upstream error details if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) respBody := append([]byte(nil), resp.Body()...) return nil, providerUtils.EnrichError(ctx, parseGeminiError(resp), jsonBody, respBody, sendBackRawRequest, sendBackRawResponse) } @@ -476,7 +476,7 @@ func HandleGeminiChatCompletionStream( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if resp.BodyStream() == nil { bifrostErr := providerUtils.NewBifrostOperationError( @@ -928,7 +928,7 @@ func HandleGeminiResponsesStream( // Make the request — caller is responsible for passing a streaming-configured client. doErr := client.Do(req, resp) if doErr != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(doErr, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -950,7 +950,7 @@ func HandleGeminiResponsesStream( // Check for HTTP errors — use parseGeminiError to preserve upstream error details if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseGeminiError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -976,7 +976,7 @@ func HandleGeminiResponsesStream( close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if resp.BodyStream() == nil { bifrostErr := providerUtils.NewBifrostOperationError( @@ -1417,7 +1417,7 @@ func (provider *GeminiProvider) SpeechStream(ctx *schemas.BifrostContext, postHo // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1439,7 +1439,7 @@ func (provider *GeminiProvider) SpeechStream(ctx *schemas.BifrostContext, postHo // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseGeminiError(resp), jsonBody, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) } @@ -1467,7 +1467,7 @@ func (provider *GeminiProvider) SpeechStream(ctx *schemas.BifrostContext, postHo close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -1496,10 +1496,11 @@ func (provider *GeminiProvider) SpeechStream(ctx *schemas.BifrostContext, postHo data, readErr := sseReader.ReadDataLine() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, provider.logger, postHookSpanFinalizer) @@ -1706,7 +1707,7 @@ func (provider *GeminiProvider) TranscriptionStream(ctx *schemas.BifrostContext, // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1728,7 +1729,7 @@ func (provider *GeminiProvider) TranscriptionStream(ctx *schemas.BifrostContext, // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseGeminiError(resp), jsonBody, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) } @@ -1755,7 +1756,7 @@ func (provider *GeminiProvider) TranscriptionStream(ctx *schemas.BifrostContext, } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -1785,10 +1786,11 @@ func (provider *GeminiProvider) TranscriptionStream(ctx *schemas.BifrostContext, data, readErr := sseReader.ReadDataLine() if readErr != nil { + // Recheck context cancellation + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, provider.logger, postHookSpanFinalizer) @@ -4189,7 +4191,7 @@ func (provider *GeminiProvider) PassthroughStream( activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.streamingClient, resp) if err := activeClient.Do(fasthttpReq, resp); err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -4211,7 +4213,7 @@ func (provider *GeminiProvider) PassthroughStream( bodyStream := resp.BodyStream() if bodyStream == nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError( "provider returned an empty stream body", fmt.Errorf("provider returned an empty stream body"), @@ -4242,7 +4244,7 @@ func (provider *GeminiProvider) PassthroughStream( } close(ch) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer stopIdleTimeout() defer stopCancellation() @@ -4291,9 +4293,11 @@ func (provider *GeminiProvider) PassthroughStream( if ctx.Err() != nil { return // let defer handle cancel/timeout } - ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - extraFields.Latency = time.Since(startTime).Milliseconds() - providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + if readErr != io.EOF { + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) + extraFields.Latency = time.Since(startTime).Milliseconds() + providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + } return } } diff --git a/core/providers/gemini/responses.go b/core/providers/gemini/responses.go index 95fead5661..bcef38f810 100644 --- a/core/providers/gemini/responses.go +++ b/core/providers/gemini/responses.go @@ -93,7 +93,10 @@ func ToGeminiResponsesRequest(bifrostReq *schemas.BifrostResponsesRequest) (*Gem geminiReq.ExtraParams = bifrostReq.Params.ExtraParams // Handle tool-related parameters if len(bifrostReq.Params.Tools) > 0 { - geminiReq.Tools = convertResponsesToolsToGemini(bifrostReq.Params.Tools) + geminiReq.Tools, err = convertResponsesToolsToGemini(bifrostReq.Params.Tools) + if err != nil { + return nil, err + } // Convert tool choice if present if bifrostReq.Params.ToolChoice != nil { @@ -2191,6 +2194,16 @@ func convertGeminiToolsToResponsesTools(tools []Tool) []schemas.ResponsesTool { if fn.Parameters != nil { params := convertSchemaToFunctionParameters(fn.Parameters) responsesTool.ResponsesToolFunction.Parameters = ¶ms + } else if fn.ParametersJSONSchema != nil { + raw, err := providerUtils.MarshalSorted(fn.ParametersJSONSchema) + if err != nil { + continue + } + var params schemas.ToolFunctionParameters + if err := json.Unmarshal(raw, ¶ms); err != nil { + continue + } + responsesTool.ResponsesToolFunction.Parameters = ¶ms } responsesTools = append(responsesTools, responsesTool) } @@ -2779,7 +2792,7 @@ func (r *GeminiGenerationRequest) convertParamsToGenerationConfigResponses(param } // convertResponsesToolsToGemini converts Responses tools to Gemini tools -func convertResponsesToolsToGemini(tools []schemas.ResponsesTool) []Tool { +func convertResponsesToolsToGemini(tools []schemas.ResponsesTool) ([]Tool, error) { geminiTool := Tool{} hasWebSearchTool := false @@ -2805,12 +2818,13 @@ func convertResponsesToolsToGemini(tools []schemas.ResponsesTool) []Tool { } return "" }(), - Parameters: func() *Schema { - if tool.ResponsesToolFunction.Parameters != nil { - return convertFunctionParametersToSchema(*tool.ResponsesToolFunction.Parameters) - } - return nil - }(), + } + if tool.ResponsesToolFunction.Parameters != nil { + raw, err := providerUtils.MarshalSorted(tool.ResponsesToolFunction.Parameters) + if err != nil { + return []Tool{}, fmt.Errorf("marshal tool %q parameters: %w", *tool.Name, err) + } + funcDecl.ParametersJSONSchema = json.RawMessage(raw) } geminiTool.FunctionDeclarations = append(geminiTool.FunctionDeclarations, funcDecl) } @@ -2833,9 +2847,9 @@ func convertResponsesToolsToGemini(tools []schemas.ResponsesTool) []Tool { } if len(geminiTool.FunctionDeclarations) > 0 || geminiTool.GoogleSearch != nil { - return []Tool{geminiTool} + return []Tool{geminiTool}, nil } - return []Tool{} + return []Tool{}, nil } // convertResponsesToolChoiceToGemini converts Responses tool choice to Gemini tool config diff --git a/core/providers/gemini/uniontype_test.go b/core/providers/gemini/uniontype_test.go index fcb84e312a..25b74cbd16 100644 --- a/core/providers/gemini/uniontype_test.go +++ b/core/providers/gemini/uniontype_test.go @@ -120,9 +120,8 @@ func TestConvertPropertyToSchema_UnionType(t *testing.T) { } } -// TestConvertBifrostToolsToGemini_UnionTypeProperty is the end-to-end test -// that reproduces the Goose+Vertex bug: a tool parameter with -// "type": ["integer", "null"] must produce a non-empty Gemini type field. +// TestConvertBifrostToolsToGemini_UnionTypeProperty verifies that tool parameters +// with JSON Schema union types are passed through unchanged in parametersJsonSchema. func TestConvertBifrostToolsToGemini_UnionTypeProperty(t *testing.T) { toolJSON := `{ "type": "function", @@ -149,28 +148,37 @@ func TestConvertBifrostToolsToGemini_UnionTypeProperty(t *testing.T) { var chatTool schemas.ChatTool require.NoError(t, json.Unmarshal([]byte(toolJSON), &chatTool)) - geminiTools := convertBifrostToolsToGemini([]schemas.ChatTool{chatTool}) + geminiTools, err := convertBifrostToolsToGemini([]schemas.ChatTool{chatTool}) + require.NoError(t, err) require.Len(t, geminiTools, 1) require.Len(t, geminiTools[0].FunctionDeclarations, 1) fd := geminiTools[0].FunctionDeclarations[0] - require.NotNil(t, fd.Parameters) + require.NotNil(t, fd.ParametersJSONSchema) + assert.Nil(t, fd.Parameters, "chat tools use parametersJsonSchema passthrough, not Gemini Schema") + + raw, err := json.Marshal(fd.ParametersJSONSchema) + require.NoError(t, err) - timeoutSchema, ok := fd.Parameters.Properties["timeout_secs"] + var paramsSchema map[string]interface{} + require.NoError(t, json.Unmarshal(raw, ¶msSchema)) + + properties, ok := paramsSchema["properties"].(map[string]interface{}) + require.True(t, ok, "parameters must have properties") + + timeoutProp, ok := properties["timeout_secs"].(map[string]interface{}) require.True(t, ok, "timeout_secs property must be present") - // Before the fix this was "" — Vertex AI rejected with - // "parameters.timeout_secs schema didn't specify the schema type field" - assert.NotEmpty(t, timeoutSchema.Type, "Type must not be empty for union-typed property") - assert.Equal(t, Type("integer"), timeoutSchema.Type) - require.NotNil(t, timeoutSchema.Nullable) - assert.True(t, *timeoutSchema.Nullable) + timeoutType, ok := timeoutProp["type"].([]interface{}) + require.True(t, ok, "timeout_secs type must be a JSON Schema union array") + assert.Equal(t, "integer", timeoutType[0]) + assert.Equal(t, "null", timeoutType[1]) + assert.Equal(t, "Timeout in seconds", timeoutProp["description"]) - // The non-union "command" property must be unaffected - commandSchema, ok := fd.Parameters.Properties["command"] + commandProp, ok := properties["command"].(map[string]interface{}) require.True(t, ok) - assert.Equal(t, Type("string"), commandSchema.Type) - assert.Nil(t, commandSchema.Nullable) + assert.Equal(t, "string", commandProp["type"]) + assert.Equal(t, "Command to run", commandProp["description"]) } func boolPtr(b bool) *bool { return &b } @@ -321,7 +329,8 @@ func TestConvertBifrostToolsToGemini_WirePayload(t *testing.T) { var chatTool schemas.ChatTool require.NoError(t, json.Unmarshal([]byte(toolJSON), &chatTool)) - geminiTools := convertBifrostToolsToGemini([]schemas.ChatTool{chatTool}) + geminiTools, err := convertBifrostToolsToGemini([]schemas.ChatTool{chatTool}) + require.NoError(t, err) require.Len(t, geminiTools, 1) // Serialize to the exact bytes that would be sent to Vertex diff --git a/core/providers/gemini/utils.go b/core/providers/gemini/utils.go index 43708fd2a2..0e7c3cdc3a 100644 --- a/core/providers/gemini/utils.go +++ b/core/providers/gemini/utils.go @@ -1229,7 +1229,7 @@ func convertParamsToGenerationConfig(params *schemas.ChatParameters, responseMod } // convertBifrostToolsToGemini converts Bifrost tools to Gemini format -func convertBifrostToolsToGemini(bifrostTools []schemas.ChatTool) []Tool { +func convertBifrostToolsToGemini(bifrostTools []schemas.ChatTool) ([]Tool, error) { geminiTool := Tool{} for _, tool := range bifrostTools { @@ -1241,7 +1241,11 @@ func convertBifrostToolsToGemini(bifrostTools []schemas.ChatTool) []Tool { Name: tool.Function.Name, } if tool.Function.Parameters != nil { - fd.Parameters = convertFunctionParametersToSchema(*tool.Function.Parameters) + raw, err := providerUtils.MarshalSorted(tool.Function.Parameters) + if err != nil { + return nil, fmt.Errorf("marshal tool %q parameters: %w", tool.Function.Name, err) + } + fd.ParametersJSONSchema = json.RawMessage(raw) } if tool.Function.Description != nil { fd.Description = *tool.Function.Description @@ -1251,9 +1255,9 @@ func convertBifrostToolsToGemini(bifrostTools []schemas.ChatTool) []Tool { } if len(geminiTool.FunctionDeclarations) > 0 { - return []Tool{geminiTool} + return []Tool{geminiTool}, nil } - return []Tool{} + return []Tool{}, nil } // convertFunctionParametersToSchema converts Bifrost function parameters to Gemini Schema diff --git a/core/providers/groq/groq.go b/core/providers/groq/groq.go index 9667b989ff..f5671b8d63 100644 --- a/core/providers/groq/groq.go +++ b/core/providers/groq/groq.go @@ -132,6 +132,7 @@ func (provider *GroqProvider) ChatCompletionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Groq, diff --git a/core/providers/huggingface/huggingface.go b/core/providers/huggingface/huggingface.go index 38ddd84e9f..4a5b379bb7 100644 --- a/core/providers/huggingface/huggingface.go +++ b/core/providers/huggingface/huggingface.go @@ -577,6 +577,7 @@ func (provider *HuggingFaceProvider) ChatCompletionStream(ctx *schemas.BifrostCo request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -1063,7 +1064,7 @@ func (provider *HuggingFaceProvider) ImageGenerationStream(ctx *schemas.BifrostC // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -1085,7 +1086,7 @@ func (provider *HuggingFaceProvider) ImageGenerationStream(ctx *schemas.BifrostC // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseHuggingFaceImageError(resp), jsonBody, nil, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse)) } @@ -1104,7 +1105,7 @@ func (provider *HuggingFaceProvider) ImageGenerationStream(ctx *schemas.BifrostC // Start streaming in a goroutine go func() { defer providerUtils.EnsureStreamFinalizerCalled(ctx, postHookSpanFinalizer) - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer close(responseChan) if resp.BodyStream() == nil { @@ -1145,10 +1146,10 @@ func (provider *HuggingFaceProvider) ImageGenerationStream(ctx *schemas.BifrostC data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } bifrostErr := providerUtils.NewBifrostOperationError( fmt.Sprintf("Error reading fal-ai stream: %v", readErr), readErr) @@ -1442,7 +1443,7 @@ func (provider *HuggingFaceProvider) ImageEditStream(ctx *schemas.BifrostContext // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -1464,7 +1465,7 @@ func (provider *HuggingFaceProvider) ImageEditStream(ctx *schemas.BifrostContext // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.EnrichError(ctx, parseHuggingFaceImageError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -1483,7 +1484,7 @@ func (provider *HuggingFaceProvider) ImageEditStream(ctx *schemas.BifrostContext // Start streaming in a goroutine go func() { defer providerUtils.EnsureStreamFinalizerCalled(ctx, postHookSpanFinalizer) - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer close(responseChan) if resp.BodyStream() == nil { @@ -1524,10 +1525,10 @@ func (provider *HuggingFaceProvider) ImageEditStream(ctx *schemas.BifrostContext data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } bifrostErr := providerUtils.NewBifrostOperationError( fmt.Sprintf("Error reading fal-ai stream: %v", readErr), readErr) diff --git a/core/providers/huggingface/types.go b/core/providers/huggingface/types.go index 5ba75a3159..a1d9b932db 100644 --- a/core/providers/huggingface/types.go +++ b/core/providers/huggingface/types.go @@ -12,7 +12,7 @@ import ( // # MODELS TYPES -// refered from https://huggingface.co/api/models +// referred from https://huggingface.co/api/models type HuggingFaceModel struct { ID string `json:"_id"` ModelID string `json:"modelId"` diff --git a/core/providers/mistral/mistral.go b/core/providers/mistral/mistral.go index b833f8b7cc..4198fc759f 100644 --- a/core/providers/mistral/mistral.go +++ b/core/providers/mistral/mistral.go @@ -208,6 +208,7 @@ func (provider *MistralProvider) ChatCompletionStream(ctx *schemas.BifrostContex provider.normalizeChatRequestForConversion(request), authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -426,7 +427,7 @@ func (provider *MistralProvider) TranscriptionStream(ctx *schemas.BifrostContext // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -448,7 +449,7 @@ func (provider *MistralProvider) TranscriptionStream(ctx *schemas.BifrostContext // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, ParseMistralError(resp) } @@ -474,7 +475,7 @@ func (provider *MistralProvider) TranscriptionStream(ctx *schemas.BifrostContext } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -503,11 +504,11 @@ func (provider *MistralProvider) TranscriptionStream(ctx *schemas.BifrostContext eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + // If context was cancelled/timed out, let defer handle it + if ctx.Err() != nil { + return + } if readErr != io.EOF { - // If context was cancelled/timed out, let defer handle it - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, provider.logger, postHookSpanFinalizer) diff --git a/core/providers/nebius/nebius.go b/core/providers/nebius/nebius.go index 13e2cb4e33..cb249d873e 100644 --- a/core/providers/nebius/nebius.go +++ b/core/providers/nebius/nebius.go @@ -119,6 +119,7 @@ func (provider *NebiusProvider) TextCompletionStream(ctx *schemas.BifrostContext request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -178,6 +179,7 @@ func (provider *NebiusProvider) ChatCompletionStream(ctx *schemas.BifrostContext request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), diff --git a/core/providers/ollama/ollama.go b/core/providers/ollama/ollama.go index 1bc620e947..ae2785afe2 100644 --- a/core/providers/ollama/ollama.go +++ b/core/providers/ollama/ollama.go @@ -160,6 +160,7 @@ func (provider *OllamaProvider) TextCompletionStream(ctx *schemas.BifrostContext request, nil, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -211,6 +212,7 @@ func (provider *OllamaProvider) ChatCompletionStream(ctx *schemas.BifrostContext request, nil, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Ollama, diff --git a/core/providers/openai/batch.go b/core/providers/openai/batch.go index 25b4f0a1b5..e4dc35b20d 100644 --- a/core/providers/openai/batch.go +++ b/core/providers/openai/batch.go @@ -42,6 +42,11 @@ type OpenAIBatchResponse struct { CancelledAt *int64 `json:"cancelled_at,omitempty"` RequestCounts *OpenAIBatchRequestCounts `json:"request_counts,omitempty"` Metadata map[string]string `json:"metadata,omitempty"` + + // Azure Blob Storage URLs (returned by Azure when using blob storage input/output) + InputBlob *string `json:"input_blob,omitempty"` + OutputBlob *string `json:"output_blob,omitempty"` + ErrorBlob *string `json:"error_blob,omitempty"` } // OpenAIBatchRequestCounts represents the request counts for a batch. @@ -97,6 +102,9 @@ func (r *OpenAIBatchResponse) ToBifrostBatchCreateResponse(latency time.Duration CreatedAt: r.CreatedAt, OutputFileID: r.OutputFileID, ErrorFileID: r.ErrorFileID, + InputBlob: r.InputBlob, + OutputBlob: r.OutputBlob, + ErrorBlob: r.ErrorBlob, ExtraFields: schemas.BifrostResponseExtraFields{ Latency: latency.Milliseconds(), }, @@ -146,6 +154,9 @@ func (r *OpenAIBatchResponse) ToBifrostBatchRetrieveResponse(latency time.Durati OutputFileID: r.OutputFileID, ErrorFileID: r.ErrorFileID, Errors: r.Errors, + InputBlob: r.InputBlob, + OutputBlob: r.OutputBlob, + ErrorBlob: r.ErrorBlob, ExtraFields: schemas.BifrostResponseExtraFields{ Latency: latency.Milliseconds(), }, diff --git a/core/providers/openai/openai.go b/core/providers/openai/openai.go index 5a8e63c004..2cecb2b8a6 100644 --- a/core/providers/openai/openai.go +++ b/core/providers/openai/openai.go @@ -401,6 +401,7 @@ func (provider *OpenAIProvider) TextCompletionStream(ctx *schemas.BifrostContext request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -422,6 +423,7 @@ func HandleOpenAITextCompletionStreaming( request *schemas.BifrostTextCompletionRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -432,6 +434,7 @@ func HandleOpenAITextCompletionStreaming( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) headers := map[string]string{ "Content-Type": "application/json", "Accept": "text/event-stream", @@ -487,7 +490,7 @@ func HandleOpenAITextCompletionStreaming( // Make the request err := activeClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -509,7 +512,7 @@ func HandleOpenAITextCompletionStreaming( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) if customErrorConverter != nil { return nil, providerUtils.EnrichError(ctx, customErrorConverter(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) @@ -538,7 +541,7 @@ func HandleOpenAITextCompletionStreaming( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -577,10 +580,10 @@ func HandleOpenAITextCompletionStreaming( } data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -917,6 +920,7 @@ func (provider *OpenAIProvider) ChatCompletionStream(ctx *schemas.BifrostContext request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -940,6 +944,7 @@ func HandleOpenAIChatCompletionStreaming( request *schemas.BifrostChatRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -952,6 +957,7 @@ func HandleOpenAIChatCompletionStreaming( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) // Check if the request is a redirect from ResponsesStream to ChatCompletionStream isResponsesToChatCompletionsFallback := false var responsesStreamState *schemas.ChatToResponsesStreamState @@ -1025,7 +1031,7 @@ func HandleOpenAIChatCompletionStreaming( // Make the request err := activeClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1047,7 +1053,7 @@ func HandleOpenAIChatCompletionStreaming( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) if customErrorConverter != nil { return nil, providerUtils.EnrichError(ctx, customErrorConverter(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) @@ -1078,7 +1084,7 @@ func HandleOpenAIChatCompletionStreaming( schemas.ReleaseChatToResponsesStreamState(responsesStreamState) close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -1113,6 +1119,9 @@ func HandleOpenAIChatCompletionStreaming( var modelName string var created int forwardedTerminalFinishReason := false + // Defer final completed/incomplete event until usage chunk arrives (fallback path only). + var pendingFinalEvent *schemas.BifrostResponsesStreamResponse + usageSeen := false for { // If context was cancelled/timed out, let defer handle it @@ -1121,10 +1130,10 @@ func HandleOpenAIChatCompletionStreaming( } data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -1175,6 +1184,32 @@ func HandleOpenAIChatCompletionStreaming( } if isResponsesToChatCompletionsFallback { + // Accumulate usage across chunks; attached to final event below. + if response.Usage != nil { + usageSeen = true + if response.Usage.PromptTokens > usage.PromptTokens { + usage.PromptTokens = response.Usage.PromptTokens + } + if response.Usage.CompletionTokens > usage.CompletionTokens { + usage.CompletionTokens = response.Usage.CompletionTokens + } + if response.Usage.TotalTokens > usage.TotalTokens { + usage.TotalTokens = response.Usage.TotalTokens + } + if calculatedTotal := usage.PromptTokens + usage.CompletionTokens; calculatedTotal > usage.TotalTokens { + usage.TotalTokens = calculatedTotal + } + if response.Usage.PromptTokensDetails != nil { + usage.PromptTokensDetails = response.Usage.PromptTokensDetails + } + if response.Usage.CompletionTokensDetails != nil { + usage.CompletionTokensDetails = response.Usage.CompletionTokensDetails + } + if response.Usage.Cost != nil { + usage.Cost = response.Usage.Cost + } + } + spreadResponses := response.ToBifrostResponsesStreamResponse(responsesStreamState) for _, response := range spreadResponses { if response.Type == schemas.ResponsesStreamResponseTypeError { @@ -1206,14 +1241,9 @@ func HandleOpenAIChatCompletionStreaming( } if response.Type == schemas.ResponsesStreamResponseTypeCompleted || response.Type == schemas.ResponsesStreamResponseTypeIncomplete { - // Set raw request if enabled - if sendBackRawRequest { - providerUtils.ParseAndSetRawRequest(&response.ExtraFields, jsonBody) - } - response.ExtraFields.Latency = time.Since(startTime).Milliseconds() - ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - providerUtils.ProcessAndSendResponse(ctx, postHookRunner, providerUtils.GetBifrostResponseForStreamResponse(nil, nil, response, nil, nil, nil), responseChan, postHookSpanFinalizer) - return + // Defer sending until stream end so usage can be attached. + pendingFinalEvent = response + continue } response.ExtraFields.Latency = time.Since(lastChunkTime).Milliseconds() @@ -1315,7 +1345,19 @@ func HandleOpenAIChatCompletionStreaming( } } - if !isResponsesToChatCompletionsFallback { + if isResponsesToChatCompletionsFallback { + if pendingFinalEvent != nil { + if usageSeen && pendingFinalEvent.Response != nil { + pendingFinalEvent.Response.Usage = usage.ToResponsesResponseUsage() + } + if sendBackRawRequest { + providerUtils.ParseAndSetRawRequest(&pendingFinalEvent.ExtraFields, jsonBody) + } + pendingFinalEvent.ExtraFields.Latency = time.Since(startTime).Milliseconds() + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) + providerUtils.ProcessAndSendResponse(ctx, postHookRunner, providerUtils.GetBifrostResponseForStreamResponse(nil, nil, pendingFinalEvent, nil, nil, nil), responseChan, postHookSpanFinalizer) + } + } else { finalFinishReason := finishReason if forwardedTerminalFinishReason { finalFinishReason = nil @@ -1339,6 +1381,14 @@ func HandleOpenAIChatCompletionStreaming( // Responses performs a responses request to the OpenAI API. func (provider *OpenAIProvider) Responses(ctx *schemas.BifrostContext, key schemas.Key, request *schemas.BifrostResponsesRequest) (*schemas.BifrostResponsesResponse, *schemas.BifrostError) { + if provider.shouldFallbackResponsesToChat(schemas.ResponsesRequest, schemas.ChatCompletionRequest) { + chatResponse, err := provider.ChatCompletion(ctx, key, request.ToChatRequest()) + if err != nil { + return nil, err + } + return chatResponse.ToBifrostResponsesResponse(), nil + } + // Check if chat completion is allowed for this provider if err := providerUtils.CheckOperationAllowed(schemas.OpenAI, provider.customProviderConfig, schemas.ResponsesRequest); err != nil { return nil, err @@ -1502,6 +1552,11 @@ func HandleOpenAIResponsesRequest( // ResponsesStream performs a streaming responses request to the OpenAI API. func (provider *OpenAIProvider) ResponsesStream(ctx *schemas.BifrostContext, postHookRunner schemas.PostHookRunner, postHookSpanFinalizer func(context.Context), key schemas.Key, request *schemas.BifrostResponsesRequest) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + if provider.shouldFallbackResponsesToChat(schemas.ResponsesStreamRequest, schemas.ChatCompletionStreamRequest) { + ctx.SetValue(schemas.BifrostContextKeyIsResponsesToChatCompletionFallback, true) + return provider.ChatCompletionStream(ctx, postHookRunner, postHookSpanFinalizer, key, request.ToChatRequest()) + } + // Check if chat completion stream is allowed for this provider if err := providerUtils.CheckOperationAllowed(schemas.OpenAI, provider.customProviderConfig, schemas.ResponsesStreamRequest); err != nil { return nil, err @@ -1525,6 +1580,7 @@ func (provider *OpenAIProvider) ResponsesStream(ctx *schemas.BifrostContext, pos request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -1547,6 +1603,7 @@ func HandleOpenAIResponsesStreaming( request *schemas.BifrostResponsesRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -1558,6 +1615,7 @@ func HandleOpenAIResponsesStreaming( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) // Prepare SGL headers (SGL typically doesn't require authorization, but we include it if provided) headers := map[string]string{ "Content-Type": "application/json", @@ -1614,7 +1672,7 @@ func HandleOpenAIResponsesStreaming( // Make the request err := activeClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1636,7 +1694,7 @@ func HandleOpenAIResponsesStreaming( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) if customErrorConverter != nil { return nil, providerUtils.EnrichError(ctx, customErrorConverter(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) @@ -1665,7 +1723,7 @@ func HandleOpenAIResponsesStreaming( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -1699,10 +1757,10 @@ func HandleOpenAIResponsesStreaming( } data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -1972,6 +2030,17 @@ func HandleOpenAIEmbeddingRequest( return response, nil } +// shouldFallbackResponsesToChat reports whether a Responses call should be +// transparently translated into Chat Completions. This applies when a custom +// provider disables the Responses operation but still allows Chat Completions. +func (provider *OpenAIProvider) shouldFallbackResponsesToChat(responsesOp, chatOp schemas.RequestType) bool { + cfg := provider.customProviderConfig + if cfg == nil || cfg.AllowedRequests == nil { + return false + } + return !cfg.IsOperationAllowed(responsesOp) && cfg.IsOperationAllowed(chatOp) +} + // Speech handles non-streaming speech synthesis requests. // It formats the request body, makes the API call, and returns the response. // Returns the response and any error that occurred. @@ -2128,6 +2197,7 @@ func (provider *OpenAIProvider) SpeechStream(ctx *schemas.BifrostContext, postHo request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -2148,6 +2218,7 @@ func HandleOpenAISpeechStreamRequest( request *schemas.BifrostSpeechRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -2157,6 +2228,7 @@ func HandleOpenAISpeechStreamRequest( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) // Create HTTP request for streaming req := fasthttp.AcquireRequest() resp := fasthttp.AcquireResponse() @@ -2213,7 +2285,7 @@ func HandleOpenAISpeechStreamRequest( // Make the request err := activeClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -2235,7 +2307,7 @@ func HandleOpenAISpeechStreamRequest( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -2261,7 +2333,7 @@ func HandleOpenAISpeechStreamRequest( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -2297,10 +2369,10 @@ func HandleOpenAISpeechStreamRequest( data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -2568,6 +2640,7 @@ func (provider *OpenAIProvider) TranscriptionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), false, provider.GetProviderKey(), @@ -2589,6 +2662,7 @@ func HandleOpenAITranscriptionStreamRequest( request *schemas.BifrostTranscriptionRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawResponse bool, accumulateText bool, providerName schemas.ModelProvider, @@ -2599,6 +2673,7 @@ func HandleOpenAITranscriptionStreamRequest( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) // Use centralized converter reqBody := ToOpenAITranscriptionRequest(request) if reqBody == nil { @@ -2651,7 +2726,7 @@ func HandleOpenAITranscriptionStreamRequest( // Make the request err := client.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -2673,7 +2748,7 @@ func HandleOpenAITranscriptionStreamRequest( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) return nil, ParseOpenAIError(resp) } @@ -2699,7 +2774,7 @@ func HandleOpenAITranscriptionStreamRequest( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -2736,10 +2811,10 @@ func HandleOpenAITranscriptionStreamRequest( data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -2997,6 +3072,7 @@ func (provider *OpenAIProvider) ImageGenerationStream( request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -3016,6 +3092,7 @@ func HandleOpenAIImageGenerationStreaming( request *schemas.BifrostImageGenerationRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -3026,6 +3103,7 @@ func HandleOpenAIImageGenerationStreaming( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) // Set headers headers := map[string]string{ "Content-Type": "application/json", @@ -3086,7 +3164,7 @@ func HandleOpenAIImageGenerationStreaming( // Make the request err := activeClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -3108,7 +3186,7 @@ func HandleOpenAIImageGenerationStreaming( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), jsonBody, nil, sendBackRawRequest, sendBackRawResponse) } @@ -3134,7 +3212,7 @@ func HandleOpenAIImageGenerationStreaming( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -3177,8 +3255,10 @@ func HandleOpenAIImageGenerationStreaming( data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) } break @@ -4180,7 +4260,7 @@ func HandleOpenAIImageEditRequest( latency, bifrostErr, wait := providerUtils.MakeRequestWithContext(ctx, activeClient, req, resp) defer wait() if bifrostErr != nil { - return nil, providerUtils.EnrichError(ctx, bifrostErr, bodyData, nil, sendBackRawRequest, sendBackRawResponse) + return nil, providerUtils.EnrichError(ctx, bifrostErr, nil, nil, sendBackRawRequest, sendBackRawResponse) } // Extract provider response headers early so they're available on error paths too providerResponseHeaders := providerUtils.ExtractProviderResponseHeaders(resp) @@ -4188,7 +4268,7 @@ func HandleOpenAIImageEditRequest( if resp.StatusCode() != fasthttp.StatusOK { providerUtils.MaterializeStreamErrorBody(ctx, resp) - return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), bodyData, nil, sendBackRawRequest, sendBackRawResponse) + return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), nil, nil, sendBackRawRequest, sendBackRawResponse) } bodyBytes, lpResult, finalErr := finalizeOpenAIResponse(ctx, resp, latency, providerName, logger) @@ -4203,7 +4283,7 @@ func HandleOpenAIImageEditRequest( } response := &schemas.BifrostImageGenerationResponse{} - rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(bodyBytes, response, bodyData, false, sendBackRawResponse) + rawRequest, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(bodyBytes, response, nil, false, sendBackRawResponse) if bifrostErr != nil { return nil, bifrostErr } @@ -4241,6 +4321,7 @@ func (provider *OpenAIProvider) ImageEditStream(ctx *schemas.BifrostContext, pos request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, false, providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -4260,6 +4341,7 @@ func HandleOpenAIImageEditStreamRequest( request *schemas.BifrostImageEditRequest, authHeader map[string]string, extraHeaders map[string]string, + streamIdleTimeoutInSeconds int, sendBackRawRequest bool, sendBackRawResponse bool, providerName schemas.ModelProvider, @@ -4270,6 +4352,7 @@ func HandleOpenAIImageEditStreamRequest( logger schemas.Logger, postHookSpanFinalizer func(context.Context), ) (chan *schemas.BifrostStreamChunk, *schemas.BifrostError) { + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, streamIdleTimeoutInSeconds) reqBody := ToOpenAIImageEditRequest(request) if reqBody == nil { return nil, providerUtils.NewBifrostOperationError("image edit input is not provided", nil) @@ -4320,7 +4403,7 @@ func HandleOpenAIImageEditStreamRequest( // Make the request err := client.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -4341,9 +4424,9 @@ func HandleOpenAIImageEditStreamRequest( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) providerUtils.MaterializeStreamErrorBody(ctx, resp) - return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), body.Bytes(), nil, sendBackRawRequest, sendBackRawResponse) + return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), nil, nil, sendBackRawRequest, sendBackRawResponse) } // Large payload streaming passthrough — pipe raw upstream SSE to client @@ -4367,7 +4450,7 @@ func HandleOpenAIImageEditStreamRequest( } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -4410,6 +4493,9 @@ func HandleOpenAIImageEditStreamRequest( data, readErr := sseReader.ReadDataLine() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { logger.Warn(fmt.Sprintf("Error reading stream: %v", readErr)) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) @@ -4425,7 +4511,7 @@ func HandleOpenAIImageEditStreamRequest( if err := sonic.UnmarshalString(jsonData, &bifrostErr); err == nil { if bifrostErr.Error != nil && bifrostErr.Error.Message != "" { ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - providerUtils.ProcessAndSendBifrostError(ctx, postHookRunner, providerUtils.EnrichError(ctx, &bifrostErr, body.Bytes(), nil, sendBackRawRequest, sendBackRawResponse), responseChan, logger, postHookSpanFinalizer) + providerUtils.ProcessAndSendBifrostError(ctx, postHookRunner, providerUtils.EnrichError(ctx, &bifrostErr, nil, nil, sendBackRawRequest, sendBackRawResponse), responseChan, logger, postHookSpanFinalizer) return } } @@ -4701,7 +4787,7 @@ func HandleOpenAIImageVariationRequest( latency, bifrostErr, wait := providerUtils.MakeRequestWithContext(ctx, activeClient, req, resp) defer wait() if bifrostErr != nil { - return nil, providerUtils.EnrichError(ctx, bifrostErr, bodyData, nil, sendBackRawRequest, sendBackRawResponse) + return nil, providerUtils.EnrichError(ctx, bifrostErr, nil, nil, sendBackRawRequest, sendBackRawResponse) } // Extract provider response headers early so they're available on error paths too providerResponseHeaders := providerUtils.ExtractProviderResponseHeaders(resp) @@ -4709,7 +4795,7 @@ func HandleOpenAIImageVariationRequest( if resp.StatusCode() != fasthttp.StatusOK { providerUtils.MaterializeStreamErrorBody(ctx, resp) - return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), bodyData, nil, sendBackRawRequest, sendBackRawResponse) + return nil, providerUtils.EnrichError(ctx, ParseOpenAIError(resp), nil, nil, sendBackRawRequest, sendBackRawResponse) } bodyBytes, lpResult, finalErr := finalizeOpenAIResponse(ctx, resp, latency, providerName, logger) @@ -4724,7 +4810,7 @@ func HandleOpenAIImageVariationRequest( } response := &schemas.BifrostImageGenerationResponse{} - _, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(bodyBytes, response, bodyData, sendBackRawRequest, sendBackRawResponse) + _, rawResponse, bifrostErr := providerUtils.HandleProviderResponse(bodyBytes, response, nil, false, sendBackRawResponse) if bifrostErr != nil { return nil, bifrostErr } @@ -6912,6 +6998,7 @@ func (provider *OpenAIProvider) PassthroughStream( return nil, err } + providerUtils.SetStreamIdleTimeoutIfEmpty(ctx, provider.networkConfig.StreamIdleTimeoutInSeconds) path := req.Path if after, ok := strings.CutPrefix(path, "/v1"); ok { path = after @@ -6948,7 +7035,7 @@ func (provider *OpenAIProvider) PassthroughStream( startTime := time.Now() if err := activeClient.Do(fasthttpReq, resp); err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -6970,7 +7057,7 @@ func (provider *OpenAIProvider) PassthroughStream( rawBodyStream := resp.BodyStream() if rawBodyStream == nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError( "provider returned an empty stream body", fmt.Errorf("provider returned an empty stream body")) @@ -6998,7 +7085,7 @@ func (provider *OpenAIProvider) PassthroughStream( } close(ch) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer stopIdleTimeout() defer stopCancellation() @@ -7033,9 +7120,11 @@ func (provider *OpenAIProvider) PassthroughStream( if ctx.Err() != nil { return // let defer handle cancel/timeout } - ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - extraFields.Latency = time.Since(startTime).Milliseconds() - providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + if readErr != io.EOF { + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) + extraFields.Latency = time.Since(startTime).Milliseconds() + providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + } return } } diff --git a/core/providers/openai/realtime.go b/core/providers/openai/realtime.go index 1a2e46bf34..65cddb4dd3 100644 --- a/core/providers/openai/realtime.go +++ b/core/providers/openai/realtime.go @@ -30,14 +30,14 @@ func (provider *OpenAIProvider) RealtimeWebSocketURL(key schemas.Key, model stri } // RealtimeHeaders returns the headers required for the OpenAI Realtime WebSocket connection. -func (provider *OpenAIProvider) RealtimeHeaders(key schemas.Key) map[string]string { +func (provider *OpenAIProvider) RealtimeHeaders(_ *schemas.BifrostContext, key schemas.Key) (map[string]string, *schemas.BifrostError) { headers := map[string]string{ "Authorization": "Bearer " + key.Value.GetValue(), } for k, v := range provider.networkConfig.ExtraHeaders { headers[k] = v } - return headers + return headers, nil } // SupportsRealtimeWebRTC reports that OpenAI supports WebRTC SDP exchange. @@ -217,7 +217,7 @@ func (provider *OpenAIProvider) CreateRealtimeClientSecret( return nil, err } - normalizedBody, _, bifrostErr := normalizeRealtimeClientSecretRequest(rawRequest, provider.GetProviderKey(), endpointType) + normalizedBody, _, bifrostErr := NormalizeRealtimeClientSecretRequest(rawRequest, provider.GetProviderKey(), endpointType) if bifrostErr != nil { return nil, bifrostErr } @@ -226,7 +226,8 @@ func (provider *OpenAIProvider) CreateRealtimeClientSecret( defer fasthttp.ReleaseRequest(req) defer fasthttp.ReleaseResponse(resp) - req.SetRequestURI(provider.buildRequestURL(ctx, realtimeSessionUpstreamPath(endpointType), schemas.RealtimeRequest)) + upstreamURL := provider.buildRequestURL(ctx, realtimeSessionUpstreamPath(endpointType), schemas.RealtimeRequest) + req.SetRequestURI(upstreamURL) req.Header.SetMethod(http.MethodPost) req.Header.SetContentType("application/json") for k, v := range provider.realtimeSessionHeaders(key, endpointType) { @@ -268,7 +269,11 @@ func (provider *OpenAIProvider) CreateRealtimeClientSecret( return out, nil } -func normalizeRealtimeClientSecretRequest( +// NormalizeRealtimeClientSecretRequest normalizes a realtime client secret request body +// by parsing the model string, resolving the provider, and restructuring the body +// to match the upstream provider's expected format. Exported for reuse by providers +// that share the same OpenAI-compatible Realtime protocol (e.g. Azure). +func NormalizeRealtimeClientSecretRequest( rawRequest json.RawMessage, defaultProvider schemas.ModelProvider, endpointType schemas.RealtimeSessionEndpointType, @@ -316,6 +321,7 @@ func normalizeRealtimeClientSecretsRequest( return nil, "", newRealtimeClientSecretError(fasthttp.StatusInternalServerError, "server_error", "failed to encode normalized model", marshalErr) } session["model"] = modelJSON + StripNestedModelPrefixes(session) if _, ok := session["type"]; !ok { typeJSON, marshalErr := json.Marshal("realtime") if marshalErr != nil { @@ -361,6 +367,7 @@ func normalizeRealtimeSessionsRequest( } root["model"] = modelJSON delete(root, "session") + StripNestedModelPrefixes(root) normalizedBody, marshalErr := json.Marshal(root) if marshalErr != nil { @@ -370,6 +377,68 @@ func normalizeRealtimeSessionsRequest( return normalizedBody, normalizedModel, nil } +// StripNestedModelPrefixes removes provider prefixes (e.g. "openai/whisper-1" → "whisper-1") +// from known nested model fields in the realtime session config. This prevents forwarding +// Bifrost-style "provider/model" strings to upstream providers that expect bare model names. +func StripNestedModelPrefixes(session map[string]json.RawMessage) { + // Old format: input_audio_transcription.model + stripModelInNestedObject(session, "input_audio_transcription") + + // New format: audio.input.transcription.model + if audioRaw, ok := session["audio"]; ok { + var audio map[string]json.RawMessage + if json.Unmarshal(audioRaw, &audio) == nil { + if inputRaw, ok := audio["input"]; ok { + var input map[string]json.RawMessage + if json.Unmarshal(inputRaw, &input) == nil { + if stripModelInNestedObject(input, "transcription") { + if updated, err := json.Marshal(input); err == nil { + audio["input"] = updated + if updatedAudio, err := json.Marshal(audio); err == nil { + session["audio"] = updatedAudio + } + } + } + } + } + } + } +} + +// stripModelInNestedObject strips the provider prefix from a "model" field inside a nested +// object at session[key]. Returns true if any change was made. +func stripModelInNestedObject(parent map[string]json.RawMessage, key string) bool { + objRaw, ok := parent[key] + if !ok || len(objRaw) == 0 || bytes.Equal(objRaw, []byte("null")) { + return false + } + var obj map[string]json.RawMessage + if json.Unmarshal(objRaw, &obj) != nil { + return false + } + modelRaw, ok := obj["model"] + if !ok { + return false + } + var modelStr string + if json.Unmarshal(modelRaw, &modelStr) != nil { + return false + } + // Strip provider prefix if present (e.g. "openai/whisper-1" → "whisper-1") + _, bareModel := schemas.ParseModelString(modelStr, "") + if bareModel == modelStr { + return false // no prefix to strip + } + if updated, err := json.Marshal(bareModel); err == nil { + obj["model"] = updated + if updatedObj, err := json.Marshal(obj); err == nil { + parent[key] = updatedObj + return true + } + } + return false +} + func (provider *OpenAIProvider) realtimeSessionHeaders( key schemas.Key, endpointType schemas.RealtimeSessionEndpointType, @@ -965,3 +1034,16 @@ func isRealtimeDeltaEvent(eventType string) bool { } return false } + +// ExtractNestedVoice digs into the new session.audio.output.voice path. +func ExtractNestedVoice(audioRaw json.RawMessage) string { + var audio struct { + Output struct { + Voice string `json:"voice"` + } `json:"output"` + } + if err := json.Unmarshal(audioRaw, &audio); err == nil && audio.Output.Voice != "" { + return audio.Output.Voice + } + return "" +} diff --git a/core/providers/openai/realtime_test.go b/core/providers/openai/realtime_test.go index 5710230b9b..9c5d1f0d2c 100644 --- a/core/providers/openai/realtime_test.go +++ b/core/providers/openai/realtime_test.go @@ -11,13 +11,13 @@ import ( func TestNormalizeRealtimeClientSecretRequest(t *testing.T) { t.Parallel() - body, model, bifrostErr := normalizeRealtimeClientSecretRequest( + body, model, bifrostErr := NormalizeRealtimeClientSecretRequest( json.RawMessage(`{"model":"openai/gpt-4o-realtime-preview","voice":"alloy"}`), schemas.OpenAI, schemas.RealtimeSessionEndpointClientSecrets, ) if bifrostErr != nil { - t.Fatalf("normalizeRealtimeClientSecretRequest() error = %v", bifrostErr) + t.Fatalf("NormalizeRealtimeClientSecretRequest() error = %v", bifrostErr) } if model != "gpt-4o-realtime-preview" { t.Fatalf("model = %q, want %q", model, "gpt-4o-realtime-preview") @@ -46,13 +46,13 @@ func TestNormalizeRealtimeClientSecretRequest(t *testing.T) { func TestNormalizeRealtimeClientSecretRequestUsesDefaultProvider(t *testing.T) { t.Parallel() - body, model, bifrostErr := normalizeRealtimeClientSecretRequest( + body, model, bifrostErr := NormalizeRealtimeClientSecretRequest( json.RawMessage(`{"session":{"model":"gpt-4o-realtime-preview"}}`), schemas.OpenAI, schemas.RealtimeSessionEndpointClientSecrets, ) if bifrostErr != nil { - t.Fatalf("normalizeRealtimeClientSecretRequest() error = %v", bifrostErr) + t.Fatalf("NormalizeRealtimeClientSecretRequest() error = %v", bifrostErr) } if model != "gpt-4o-realtime-preview" { t.Fatalf("model = %q, want %q", model, "gpt-4o-realtime-preview") @@ -78,13 +78,13 @@ func TestNormalizeRealtimeClientSecretRequestUsesDefaultProvider(t *testing.T) { func TestNormalizeRealtimeSessionsRequest(t *testing.T) { t.Parallel() - body, model, bifrostErr := normalizeRealtimeClientSecretRequest( + body, model, bifrostErr := NormalizeRealtimeClientSecretRequest( json.RawMessage(`{"session":{"model":"openai/gpt-4o-realtime-preview","voice":"alloy"}}`), schemas.OpenAI, schemas.RealtimeSessionEndpointSessions, ) if bifrostErr != nil { - t.Fatalf("normalizeRealtimeClientSecretRequest() error = %v", bifrostErr) + t.Fatalf("NormalizeRealtimeClientSecretRequest() error = %v", bifrostErr) } if model != "gpt-4o-realtime-preview" { t.Fatalf("model = %q, want %q", model, "gpt-4o-realtime-preview") diff --git a/core/providers/openrouter/openrouter.go b/core/providers/openrouter/openrouter.go index 36e4ff0566..2bc5162d7c 100644 --- a/core/providers/openrouter/openrouter.go +++ b/core/providers/openrouter/openrouter.go @@ -294,6 +294,7 @@ func (provider *OpenRouterProvider) TextCompletionStream(ctx *schemas.BifrostCon request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -342,6 +343,7 @@ func (provider *OpenRouterProvider) ChatCompletionStream(ctx *schemas.BifrostCon request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.OpenRouter, @@ -388,6 +390,7 @@ func (provider *OpenRouterProvider) ResponsesStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), diff --git a/core/providers/parasail/parasail.go b/core/providers/parasail/parasail.go index ae4cb22ab7..e0d6a84c13 100644 --- a/core/providers/parasail/parasail.go +++ b/core/providers/parasail/parasail.go @@ -128,6 +128,7 @@ func (provider *ParasailProvider) ChatCompletionStream(ctx *schemas.BifrostConte request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Parasail, diff --git a/core/providers/perplexity/perplexity.go b/core/providers/perplexity/perplexity.go index addb6a5fb8..52d498e198 100644 --- a/core/providers/perplexity/perplexity.go +++ b/core/providers/perplexity/perplexity.go @@ -202,6 +202,7 @@ func (provider *PerplexityProvider) ChatCompletionStream(ctx *schemas.BifrostCon request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.Perplexity, diff --git a/core/providers/replicate/replicate.go b/core/providers/replicate/replicate.go index 652f99c63b..bd469a4f00 100644 --- a/core/providers/replicate/replicate.go +++ b/core/providers/replicate/replicate.go @@ -593,7 +593,7 @@ func (provider *ReplicateProvider) TextCompletionStream(ctx *schemas.BifrostCont } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -627,11 +627,11 @@ func (provider *ReplicateProvider) TextCompletionStream(ctx *schemas.BifrostCont eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + // If context was cancelled/timed out, let defer handle it + if ctx.Err() != nil { + return + } if readErr != io.EOF { - // If context was cancelled/timed out, let defer handle it - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) enrichedErr := providerUtils.EnrichError(ctx, providerUtils.NewBifrostOperationError(schemas.ErrProviderDoRequest, readErr), jsonData, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) @@ -932,7 +932,7 @@ func (provider *ReplicateProvider) ChatCompletionStream(ctx *schemas.BifrostCont } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -966,11 +966,11 @@ func (provider *ReplicateProvider) ChatCompletionStream(ctx *schemas.BifrostCont eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + // If context was cancelled/timed out, let defer handle it + if ctx.Err() != nil { + return + } if readErr != io.EOF { - // If context was cancelled/timed out, let defer handle it - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) enrichedErr := providerUtils.EnrichError(ctx, providerUtils.NewBifrostOperationError(schemas.ErrProviderDoRequest, readErr), jsonData, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) @@ -1273,7 +1273,7 @@ func (provider *ReplicateProvider) ResponsesStream(ctx *schemas.BifrostContext, // Make the streaming request streamErr := provider.streamingClient.Do(req, resp) if streamErr != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(streamErr, context.Canceled) { return nil, providerUtils.EnrichError(ctx, &schemas.BifrostError{ IsBifrostError: false, @@ -1295,7 +1295,7 @@ func (provider *ReplicateProvider) ResponsesStream(ctx *schemas.BifrostContext, // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) body := resp.Body() return nil, providerUtils.EnrichError(ctx, parseReplicateError(body, resp.StatusCode()), jsonData, nil, provider.sendBackRawRequest, provider.sendBackRawResponse) } @@ -1326,7 +1326,7 @@ func (provider *ReplicateProvider) ResponsesStream(ctx *schemas.BifrostContext, } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -1376,10 +1376,10 @@ func (provider *ReplicateProvider) ResponsesStream(ctx *schemas.BifrostContext, eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn("Error reading stream: %v", readErr) bifrostErr := providerUtils.NewBifrostOperationError(schemas.ErrProviderDoRequest, readErr) @@ -1906,7 +1906,7 @@ func (provider *ReplicateProvider) ImageGenerationStream(ctx *schemas.BifrostCon } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -1943,10 +1943,10 @@ func (provider *ReplicateProvider) ImageGenerationStream(ctx *schemas.BifrostCon eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) provider.logger.Warn(fmt.Sprintf("Error reading SSE stream: %v", readErr)) enrichedErr := providerUtils.EnrichError(ctx, providerUtils.NewBifrostOperationError(schemas.ErrProviderDoRequest, readErr), jsonData, nil, sendBackRawRequest, sendBackRawResponse) @@ -2312,7 +2312,7 @@ func (provider *ReplicateProvider) ImageEditStream(ctx *schemas.BifrostContext, } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) @@ -2349,10 +2349,10 @@ func (provider *ReplicateProvider) ImageEditStream(ctx *schemas.BifrostContext, eventType, eventDataBytes, readErr := sseReader.ReadEvent() if readErr != nil { + if ctx.Err() != nil { + return + } if readErr != io.EOF { - if errors.Is(readErr, context.Canceled) { - return - } enrichedErr := providerUtils.EnrichError(ctx, providerUtils.NewBifrostOperationError("stream read error", readErr), jsonData, nil, sendBackRawRequest, sendBackRawResponse) ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) providerUtils.ProcessAndSendBifrostError(ctx, postHookRunner, enrichedErr, responseChan, provider.logger, postHookSpanFinalizer) diff --git a/core/providers/replicate/utils.go b/core/providers/replicate/utils.go index 1d88337539..336dbbb057 100644 --- a/core/providers/replicate/utils.go +++ b/core/providers/replicate/utils.go @@ -110,7 +110,7 @@ func listenToReplicateStreamURL( fasthttp.ReleaseRequest(req) if err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, nil, &schemas.BifrostError{ IsBifrostError: false, @@ -134,7 +134,7 @@ func listenToReplicateStreamURL( // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, nil, parseReplicateError(resp.Body(), resp.StatusCode()) } diff --git a/core/providers/sgl/sgl.go b/core/providers/sgl/sgl.go index f47c7b34e4..e35885983b 100644 --- a/core/providers/sgl/sgl.go +++ b/core/providers/sgl/sgl.go @@ -163,6 +163,7 @@ func (provider *SGLProvider) TextCompletionStream(ctx *schemas.BifrostContext, p request, nil, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -216,6 +217,7 @@ func (provider *SGLProvider) ChatCompletionStream(ctx *schemas.BifrostContext, p request, nil, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.SGL, diff --git a/core/providers/utils/utils.go b/core/providers/utils/utils.go index b46100b0ab..09b3c3a73a 100644 --- a/core/providers/utils/utils.go +++ b/core/providers/utils/utils.go @@ -27,7 +27,7 @@ import ( "github.com/bytedance/sonic" "github.com/maximhq/bifrost/core/network" - schemas "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/core/schemas" "github.com/tidwall/gjson" "github.com/tidwall/sjson" "github.com/valyala/fasthttp" @@ -2041,7 +2041,7 @@ func EnsureStreamFinalizerCalled(ctx context.Context, finalizer func(context.Con // Returns a cleanup function that MUST be called when streaming is done to // prevent the goroutine from closing the stream during normal operation. // Works with both fasthttp's BodyStream() (io.Reader) and net/http's resp.Body (io.ReadCloser). -func SetupStreamCancellation(ctx context.Context, bodyStream io.Reader, logger schemas.Logger) (cleanup func()) { +func SetupStreamCancellation(ctx *schemas.BifrostContext, bodyStream io.Reader, logger schemas.Logger) (cleanup func()) { done := make(chan struct{}) closed := make(chan struct{}) @@ -2054,6 +2054,12 @@ func SetupStreamCancellation(ctx context.Context, bodyStream io.Reader, logger s if err := closer.Close(); err != nil { getLogger().Debug(fmt.Sprintf("Error closing body stream on context done: %v", err)) } + ctx.SetValue(schemas.BifrostContextKeyConnectionClosed, true) + } else if wce, ok := bodyStream.(streamCloserWithError); ok { + if err := wce.CloseWithError(ctx.Err()); err != nil { + getLogger().Debug(fmt.Sprintf("Error closing body stream on context done: %v", err)) + } + ctx.SetValue(schemas.BifrostContextKeyConnectionClosed, true) } case <-done: // If context was also cancelled (race between done and ctx.Done), @@ -2062,12 +2068,20 @@ func SetupStreamCancellation(ctx context.Context, bodyStream io.Reader, logger s if closer, ok := bodyStream.(io.Closer); ok { if err := closer.Close(); err != nil { getLogger().Debug(fmt.Sprintf("Error closing body stream on done with cancelled context: %v", err)) + } else { + ctx.SetValue(schemas.BifrostContextKeyConnectionClosed, true) } + } else if wce, ok := bodyStream.(streamCloserWithError); ok { + if err := wce.CloseWithError(ctx.Err()); err != nil { + getLogger().Debug(fmt.Sprintf("Error closing body stream on done with cancelled context: %v", err)) + } else { + ctx.SetValue(schemas.BifrostContextKeyConnectionClosed, true) + } + } } } }() - return func() { close(done) <-closed // Wait for goroutine to finish closing the stream before ReleaseStreamingResponse drains @@ -2102,6 +2116,23 @@ func GetStreamIdleTimeout(ctx *schemas.BifrostContext) time.Duration { return DefaultStreamIdleTimeout } +// streamCloserWithError is implemented by fasthttp's streaming body reader. +// Calling CloseWithError with a non-nil error closes the underlying TCP +// connection, interrupting any blocked Read. +type streamCloserWithError interface { + CloseWithError(err error) error +} + +// closeBodyStream closes bodyStream using whatever interface it supports: +// io.Closer for net/http responses, streamCloserWithError for fasthttp. +func closeBodyStream(bodyStream io.Reader, err error) { + if closer, ok := bodyStream.(io.Closer); ok { + closer.Close() + } else if wce, ok := bodyStream.(streamCloserWithError); ok { + wce.CloseWithError(err) + } +} + // idleTimeoutReader wraps an io.Reader and closes the underlying body stream // if no data arrives within the configured timeout. This unblocks any pending // Read() call on the wrapped reader. @@ -2111,12 +2142,16 @@ type idleTimeoutReader struct { timeout time.Duration timer *time.Timer once sync.Once + fired atomic.Bool // set true when the idle timer fires } // NewIdleTimeoutReader wraps reader with idle detection. If reader.Read() returns // no data for the given timeout duration, bodyStream is closed to unblock the read. -// bodyStream must implement io.Closer for the timeout to take effect; if it does not, -// the wrapper still functions but cannot force-close the stream. +// Supports both io.Closer and fasthttp's CloseWithError interface — the latter +// closes the underlying TCP connection when called with a non-nil error, which is +// required to interrupt a blocked Read on fasthttp streaming responses. +// When the timer fires, any subsequent error from Read is translated to +// ErrStreamIdleTimeout so callers do not need per-handler error checks. // Returns the wrapped reader and a cleanup function that MUST be called (via defer) // when streaming is complete, to stop the timer and prevent premature closure. func NewIdleTimeoutReader(reader io.Reader, bodyStream io.Reader, timeout time.Duration) (io.Reader, func()) { @@ -2130,9 +2165,8 @@ func NewIdleTimeoutReader(reader io.Reader, bodyStream io.Reader, timeout time.D } r.timer = time.AfterFunc(timeout, func() { r.once.Do(func() { - if closer, ok := r.bodyStream.(io.Closer); ok { - closer.Close() - } + r.fired.Store(true) + closeBodyStream(r.bodyStream, ErrStreamIdleTimeout) }) }) return r, func() { r.timer.Stop() } @@ -2143,9 +2177,16 @@ func (r *idleTimeoutReader) Read(p []byte) (int, error) { if n > 0 { r.timer.Reset(r.timeout) } + if err != nil && err != io.EOF && r.fired.Load() { + return n, ErrStreamIdleTimeout + } return n, err } +// ErrStreamIdleTimeout is returned when no data is received within the configured +// stream_idle_timeout_in_seconds window. +var ErrStreamIdleTimeout = errors.New("stream idle timeout: no data received within configured window") + // HandleStreamCancellation should be called when a streaming goroutine exits // due to context cancellation. It ensures proper cleanup by: // 1. Checking if StreamEndIndicator was already set (to avoid duplicate handling) @@ -2170,7 +2211,7 @@ func HandleStreamCancellation( } // Create cancellation error cancelErr := &schemas.BifrostError{ - StatusCode: schemas.Ptr(499), // Client Closed Request + StatusCode: new(499), // Client Closed Request Error: &schemas.ErrorField{ Message: "Request cancelled: client disconnected", Type: schemas.Ptr(schemas.RequestCancelled), @@ -2373,14 +2414,19 @@ func ProviderIsResponsesAPINative(providerName schemas.ModelProvider) bool { } // ReleaseStreamingResponse releases a streaming response by draining the body stream and releasing the response. -func ReleaseStreamingResponse(resp *fasthttp.Response) { +func ReleaseStreamingResponse(ctx *schemas.BifrostContext, resp *fasthttp.Response) { defer func() { if r := recover(); r != nil { - getLogger().Error("recovered panic in ReleaseStreamingResponse: %v", r) + getLogger().Debug("stream already closed before drain in ReleaseStreamingResponse: %v\n", r) } // Always release the response to prevent leaks, even after a panic fasthttp.ReleaseResponse(resp) }() + // First we will check if the connection is already closed + // In that case we won't drain the body stream, as it is already closed + if closed, ok := ctx.Value(schemas.BifrostContextKeyConnectionClosed).(bool); ok && closed { + return + } // Drain any remaining data from the body stream before releasing. // This prevents "whitespace in header" errors when the connection is reused // (see: https://github.com/valyala/fasthttp/issues/1743). @@ -2729,6 +2775,11 @@ func GetBudgetTokensFromReasoningEffort( budget := minBudgetTokens + int(ratio*float64(maxTokens-minBudgetTokens)) + // Both Anthropic and Bedrock require budget_tokens < max_tokens (strict). + if budget >= maxTokens { + budget = maxTokens - 1 + } + return budget, nil } diff --git a/core/providers/utils/utils_test.go b/core/providers/utils/utils_test.go index 223d341509..a5d6ae7d0c 100644 --- a/core/providers/utils/utils_test.go +++ b/core/providers/utils/utils_test.go @@ -1439,3 +1439,106 @@ func TestShouldSendBackRawResponse(t *testing.T) { }) } } + +func TestGetBudgetTokensFromReasoningEffort(t *testing.T) { + const min = 1024 + const max = 16000 + + tests := []struct { + effort string + wantErr bool + check func(t *testing.T, budget int) + }{ + { + effort: "none", + check: func(t *testing.T, budget int) { assertEqual(t, 0, budget, "none effort") }, + }, + { + effort: "minimal", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "minimal") + }, + }, + { + effort: "low", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "low") + }, + }, + { + effort: "medium", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "medium") + }, + }, + { + effort: "high", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "high") + }, + }, + { + effort: "xhigh", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "xhigh") + }, + }, + { + // "max" with ratio=1.0 would produce budget==maxTokens without the cap. + // Bedrock and Anthropic both require budget_tokens < max_tokens (strict). + effort: "max", + check: func(t *testing.T, budget int) { + if budget >= max { + t.Errorf("max effort: budget %d must be < maxTokens %d", budget, max) + } + assertEqual(t, max-1, budget, "max effort caps at maxTokens-1") + }, + }, + { + effort: "unknown", + check: func(t *testing.T, budget int) { + assertRange(t, min, max-1, budget, "unknown effort uses safe default") + }, + }, + { + // minBudgetTokens > maxTokens — always an error + effort: "high", + wantErr: true, + }, + } + + for i, tt := range tests { + t.Run(fmt.Sprintf("%d_%s", i, tt.effort), func(t *testing.T) { + maxTokens := max + minTokens := min + if tt.wantErr { + minTokens = max + 1 + } + budget, err := GetBudgetTokensFromReasoningEffort(tt.effort, minTokens, maxTokens) + if tt.wantErr { + if err == nil { + t.Errorf("expected error when minBudgetTokens > maxTokens, got none") + } + return + } + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + tt.check(t, budget) + }) + } +} + +func assertEqual(t *testing.T, want, got int, label string) { + t.Helper() + if got != want { + t.Errorf("%s: got %d, want %d", label, got, want) + } +} + +func assertRange(t *testing.T, low, high, got int, label string) { + t.Helper() + if got < low || got > high { + t.Errorf("%s: got %d, want in [%d, %d]", label, got, low, high) + } +} diff --git a/core/providers/vertex/vertex.go b/core/providers/vertex/vertex.go index 4f37abda6b..3f6e9aac51 100644 --- a/core/providers/vertex/vertex.go +++ b/core/providers/vertex/vertex.go @@ -844,6 +844,7 @@ func (provider *VertexProvider) ChatCompletionStream(ctx *schemas.BifrostContext jsonData, headers, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -975,6 +976,7 @@ func (provider *VertexProvider) ChatCompletionStream(ctx *schemas.BifrostContext request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), providerName, @@ -1305,6 +1307,7 @@ func (provider *VertexProvider) ResponsesStream(ctx *schemas.BifrostContext, pos jsonBody, headers, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, provider.networkConfig.BetaHeaderOverrides, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), @@ -3063,13 +3066,13 @@ func (provider *VertexProvider) PassthroughStream( tokenSource, err := getAuthTokenSource(key) if err != nil { removeVertexClient(key.VertexKeyConfig.AuthCredentials.GetValue()) - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError("error creating auth token source", err) } token, err := tokenSource.Token() if err != nil { removeVertexClient(key.VertexKeyConfig.AuthCredentials.GetValue()) - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError("error getting token", err) } fasthttpReq.Header.Set("Authorization", "Bearer "+token.AccessToken) @@ -3098,7 +3101,7 @@ func (provider *VertexProvider) PassthroughStream( activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.streamingClient, resp) if err := activeClient.Do(fasthttpReq, resp); err != nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -3124,7 +3127,7 @@ func (provider *VertexProvider) PassthroughStream( bodyStream := resp.BodyStream() if bodyStream == nil { - providerUtils.ReleaseStreamingResponse(resp) + providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, providerUtils.NewBifrostOperationError( "provider returned an empty stream body", fmt.Errorf("provider returned an empty stream body")) @@ -3154,7 +3157,7 @@ func (provider *VertexProvider) PassthroughStream( } close(ch) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) defer stopIdleTimeout() defer stopCancellation() streamStart := time.Now() @@ -3207,9 +3210,11 @@ func (provider *VertexProvider) PassthroughStream( if ctx.Err() != nil { return // let defer handle cancel/timeout } - ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - extraFields.Latency = time.Since(streamStart).Milliseconds() - providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + if readErr != io.EOF { + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) + extraFields.Latency = time.Since(streamStart).Milliseconds() + providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, ch, provider.logger, postHookSpanFinalizer) + } return } } diff --git a/core/providers/vllm/vllm.go b/core/providers/vllm/vllm.go index 7952e161cd..f2ca725b1a 100644 --- a/core/providers/vllm/vllm.go +++ b/core/providers/vllm/vllm.go @@ -158,6 +158,7 @@ func (provider *VLLMProvider) TextCompletionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -211,6 +212,7 @@ func (provider *VLLMProvider) ChatCompletionStream(ctx *schemas.BifrostContext, request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -487,7 +489,7 @@ func (provider *VLLMProvider) TranscriptionStream(ctx *schemas.BifrostContext, p // Make the request err := provider.streamingClient.Do(req, resp) if err != nil { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) if errors.Is(err, context.Canceled) { return nil, &schemas.BifrostError{ IsBifrostError: false, @@ -509,7 +511,7 @@ func (provider *VLLMProvider) TranscriptionStream(ctx *schemas.BifrostContext, p // Check for HTTP errors if resp.StatusCode() != fasthttp.StatusOK { - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) return nil, openai.ParseOpenAIError(resp) } @@ -536,7 +538,7 @@ func (provider *VLLMProvider) TranscriptionStream(ctx *schemas.BifrostContext, p } close(responseChan) }() - defer providerUtils.ReleaseStreamingResponse(resp) + defer providerUtils.ReleaseStreamingResponse(ctx, resp) // Decompress gzip-encoded streams transparently (no-op for non-gzip) reader, releaseGzip := providerUtils.DecompressStreamBody(resp) defer releaseGzip() @@ -565,11 +567,11 @@ func (provider *VLLMProvider) TranscriptionStream(ctx *schemas.BifrostContext, p dataBytes, readErr := sseReader.ReadDataLine() if readErr != nil { + // If context was cancelled/timed out, let defer handle it + if ctx.Err() != nil { + return + } if readErr != io.EOF { - // If context was cancelled/timed out, let defer handle it - if ctx.Err() != nil { - return - } ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) logger.Warn("Error reading stream: %v", readErr) providerUtils.ProcessAndSendError(ctx, postHookRunner, readErr, responseChan, logger, postHookSpanFinalizer) diff --git a/core/providers/xai/xai.go b/core/providers/xai/xai.go index e787f307fd..1c9d777a97 100644 --- a/core/providers/xai/xai.go +++ b/core/providers/xai/xai.go @@ -113,6 +113,7 @@ func (provider *XAIProvider) TextCompletionStream(ctx *schemas.BifrostContext, p request, nil, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), @@ -160,6 +161,7 @@ func (provider *XAIProvider) ChatCompletionStream(ctx *schemas.BifrostContext, p request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), schemas.XAI, @@ -205,6 +207,7 @@ func (provider *XAIProvider) ResponsesStream(ctx *schemas.BifrostContext, postHo request, authHeader, provider.networkConfig.ExtraHeaders, + provider.networkConfig.StreamIdleTimeoutInSeconds, providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest), providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse), provider.GetProviderKey(), diff --git a/core/schemas/batch.go b/core/schemas/batch.go index dbd9fe580b..0f7bdd2f6c 100644 --- a/core/schemas/batch.go +++ b/core/schemas/batch.go @@ -127,6 +127,11 @@ type BifrostBatchCreateResponse struct { // Gemini-specific (operation response) OperationName *string `json:"operation_name,omitempty"` + // Azure-specific Blob Storage URLs (returned when using blob storage input/output) + InputBlob *string `json:"input_blob,omitempty"` + OutputBlob *string `json:"output_blob,omitempty"` + ErrorBlob *string `json:"error_blob,omitempty"` + ExtraFields BifrostResponseExtraFields `json:"extra_fields"` } @@ -214,6 +219,11 @@ type BifrostBatchRetrieveResponse struct { Done *bool `json:"done,omitempty"` Progress *int `json:"progress,omitempty"` // Percentage progress + // Azure-specific Blob Storage URLs (returned when using blob storage input/output) + InputBlob *string `json:"input_blob,omitempty"` + OutputBlob *string `json:"output_blob,omitempty"` + ErrorBlob *string `json:"error_blob,omitempty"` + ExtraFields BifrostResponseExtraFields `json:"extra_fields"` } diff --git a/core/schemas/bifrost.go b/core/schemas/bifrost.go index 0c2352e53e..5266fe41e8 100644 --- a/core/schemas/bifrost.go +++ b/core/schemas/bifrost.go @@ -251,6 +251,8 @@ const ( BifrostContextKeyRealtimeProviderSessionID BifrostContextKey = "bifrost-realtime-provider-session-id" // string BifrostContextKeyRealtimeSource BifrostContextKey = "bifrost-realtime-source" // string ("ei" or "lm") BifrostContextKeyRealtimeEventType BifrostContextKey = "bifrost-realtime-event-type" // string + BifrostContextKeyRealtimeTransport BifrostContextKey = "bifrost-realtime-transport" // string ("websocket" or "webrtc") + BifrostContextKeyRealtimeVoice BifrostContextKey = "bifrost-realtime-voice" // string BifrostIsAsyncRequest BifrostContextKey = "bifrost-is-async-request" // bool (set by bifrost - DO NOT SET THIS MANUALLY)) - whether the request is an async request (only used in gateway) BifrostContextKeyRequestHeaders BifrostContextKey = "bifrost-request-headers" // map[string]string (all request headers with lowercased keys) BifrostContextKeyAllowPerRequestStorageOverride BifrostContextKey = "bifrost-allow-per-request-storage-override" // bool (set by transport from config — gates whether x-bf-disable-content-logging and x-bf-store-raw-request-response per-request overrides are honored) @@ -299,6 +301,7 @@ const ( IsAPIKeyAuthContextKey BifrostContextKey = "is_api_key_auth" IsLocalAdminContextKey BifrostContextKey = "is_local_admin" // bool (set by auth middleware when password-based auth succeeds - local admin user bypasses RBAC) BifrostContextKeyPassthroughOverridesPresent BifrostContextKey = "passthrough_overrides_present" // bool (set by HTTP transport) - passthrough raw request requested + BifrostContextKeyConnectionClosed BifrostContextKey = "connection_closed" ) const ( @@ -1412,6 +1415,10 @@ type BifrostCacheDebug struct { // Semantic cache only (only when cache is hit) Threshold *float64 `json:"threshold,omitempty"` Similarity *float64 `json:"similarity,omitempty"` + + // CacheHitLatency is the time in milliseconds spent serving the cache hit + // (lookup + response build). Only set when CacheHit is true. + CacheHitLatency *int64 `json:"cache_hit_latency,omitempty"` } const ( diff --git a/core/schemas/chatcompletions.go b/core/schemas/chatcompletions.go index 1f1932c5a5..9ddfbeb3b9 100644 --- a/core/schemas/chatcompletions.go +++ b/core/schemas/chatcompletions.go @@ -1447,7 +1447,7 @@ type ChatAssistantMessageToolCall struct { // ChatAssistantMessageToolCallFunction represents a call to a function. type ChatAssistantMessageToolCallFunction struct { Name *string `json:"name"` - Arguments string `json:"arguments"` // stringified json as retured by OpenAI, might not be a valid JSON always + Arguments string `json:"arguments"` // stringified json as returned by OpenAI, might not be a valid JSON always } // ChatAudioMessageAudio represents audio data in a message. diff --git a/core/schemas/context.go b/core/schemas/context.go index e7cace6def..14bcd18643 100644 --- a/core/schemas/context.go +++ b/core/schemas/context.go @@ -132,6 +132,41 @@ func (bc *BifrostContext) WithValue(key any, value any) *BifrostContext { return bc } +// Root returns the underlying root BifrostContext. For root contexts this is +// the receiver itself; for plugin-scoped contexts it is the underlying root +// that scoped Value/SetValue calls delegate to. +// +// PLUGIN AUTHORS: capture Root() synchronously inside Pre/PostLLMHook (or +// any other hook) when you need to write to the context from a goroutine +// that outlives the hook. The plugin-scoped *BifrostContext passed into your +// hook is reclaimed by an internal sync.Pool the moment the hook returns — +// any later SetValue/Value call on it lands in detached storage that nobody +// downstream can read (and can leak into a future pool reuse). The root, +// in contrast, lives for the entire request, so a pointer captured here is +// safe to use for the lifetime of the request even after your hook returns. +// +// Example: +// +// func (p *Plugin) PreLLMHook(ctx *schemas.BifrostContext, req ...) (...) { +// rootCtx := ctx.Root() // capture before the scope is released +// go func() { +// // ... long-running work that produces stream chunks ... +// rootCtx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) +// }() +// return req, &schemas.LLMPluginShortCircuit{Stream: ch}, nil +// } +func (bc *BifrostContext) Root() *BifrostContext { + // Unwrap the full delegation chain. A scoped context can in principle be + // derived from another scoped context (e.g. nested plugin scopes), and + // stopping at the first valueDelegate would return an intermediate pooled + // scope — which loses the async-safety guarantee as soon as that + // intermediate scope is released. + for bc != nil && bc.valueDelegate != nil { + bc = bc.valueDelegate + } + return bc +} + // BlockRestrictedWrites returns true if restricted writes are blocked. func (bc *BifrostContext) BlockRestrictedWrites() { bc.blockRestrictedWrites.Store(true) diff --git a/core/schemas/context_test.go b/core/schemas/context_test.go index 4d75f50528..da6f368188 100644 --- a/core/schemas/context_test.go +++ b/core/schemas/context_test.go @@ -376,6 +376,40 @@ func TestPluginLog_PoolReuse(t *testing.T) { } } +// TestRoot_UnwrapsChainedValueDelegates verifies Root() walks the entire +// delegate chain. A naive single-step unwrap would return an intermediate +// pooled scope, which loses the async-safety guarantee as soon as that +// intermediate scope is recycled. +func TestRoot_UnwrapsChainedValueDelegates(t *testing.T) { + root := NewBifrostContext(context.Background(), NoDeadline) + + a := "outer" + b := "inner" + outer := root.WithPluginScope(&a) + // Manually build a second scoped context whose delegate is the first + // scoped context — simulates a plugin that derives its own scope from + // an already-scoped ctx. + inner := &BifrostContext{ + parent: outer.parent, + done: outer.done, + pluginScope: &b, + valueDelegate: outer, + } + + got := inner.Root() + if got != root { + t.Fatalf("Root() did not walk the chain to the request root: got %p, want %p", got, root) + } + if got.valueDelegate != nil { + t.Fatalf("Root() returned a context with a non-nil valueDelegate: %+v", got) + } + + // Sanity: Root() on a non-scoped context returns itself. + if root.Root() != root { + t.Fatal("Root() on a non-scoped context should return the receiver") + } +} + // TestNewBifrostContext_DerivedFromReleasedScope_NoPanic locks in the // deterministic half of the scoped-parent-release bug: a derived BifrostContext // must not deref a pool-released scoped ancestor when its accessors are called. diff --git a/core/schemas/provider.go b/core/schemas/provider.go index 80f6bf3d91..1937210cb5 100644 --- a/core/schemas/provider.go +++ b/core/schemas/provider.go @@ -226,7 +226,7 @@ var DefaultNetworkConfig = NetworkConfig{ // ConcurrencyAndBufferSize represents configuration for concurrent operations and buffer sizes. type ConcurrencyAndBufferSize struct { - Concurrency int `json:"concurrency"` // Number of concurrent operations. Also used as the initial pool size for the provider reponses. + Concurrency int `json:"concurrency"` // Number of concurrent operations. Also used as the initial pool size for the provider responses. BufferSize int `json:"buffer_size"` // Size of the buffer } diff --git a/core/schemas/realtime.go b/core/schemas/realtime.go index ec4fd6789d..cb4004582a 100644 --- a/core/schemas/realtime.go +++ b/core/schemas/realtime.go @@ -181,7 +181,7 @@ type RealtimeSessionRoute struct { type RealtimeProvider interface { SupportsRealtimeAPI() bool RealtimeWebSocketURL(key Key, model string) string - RealtimeHeaders(key Key) map[string]string + RealtimeHeaders(ctx *BifrostContext, key Key) (map[string]string, *BifrostError) // SupportsRealtimeWebRTC reports whether the provider supports WebRTC SDP exchange. SupportsRealtimeWebRTC() bool // ExchangeRealtimeWebRTCSDP performs the provider-specific SDP signaling exchange. diff --git a/core/schemas/responses.go b/core/schemas/responses.go index 2af5ff47c1..0e1b7ac637 100644 --- a/core/schemas/responses.go +++ b/core/schemas/responses.go @@ -786,6 +786,10 @@ type ResponsesMessage struct { ID *string `json:"id,omitempty"` // Common ID field for most item types Type *ResponsesMessageType `json:"type,omitempty"` Status *string `json:"status,omitempty"` // "in_progress" | "completed" | "incomplete" | "interpreting" | "failed" + // Phase labels an assistant message as intermediate "commentary" or completed "final_answer". + // Required on gpt-5.3-codex+ history replay; dropping it causes significant performance degradation. + // See https://developers.openai.com/api/docs/guides/prompt-guidance + Phase *string `json:"phase,omitempty"` Role *ResponsesMessageRoleType `json:"role,omitempty"` Content *ResponsesMessageContent `json:"content,omitempty"` @@ -2477,14 +2481,23 @@ type BifrostResponsesStreamResponse struct { OutputIndex *int `json:"output_index,omitempty"` Item *ResponsesMessage `json:"item"` + // SummaryIndex identifies which summary block within an item a delta belongs to. + // Emitted on response.reasoning_summary_text.{delta,done} and + // response.reasoning_summary_part.{added,done}. + // See https://platform.openai.com/docs/api-reference/responses-streaming + SummaryIndex *int `json:"summary_index,omitempty"` ContentIndex *int `json:"content_index,omitempty"` ItemID *string `json:"item_id,omitempty"` Part *ResponsesMessageContentBlock `json:"part,omitempty"` - Delta *string `json:"delta,omitempty"` - Signature *string `json:"signature,omitempty"` // Not in OpenAI's spec, but sent by other providers - LogProbs []ResponsesOutputMessageContentTextLogProb `json:"logprobs"` + Delta *string `json:"delta,omitempty"` + Signature *string `json:"signature,omitempty"` // Not in OpenAI's spec, but sent by other providers + // Obfuscation is random padding added to delta events to normalize payload size as a + // side-channel mitigation. Toggle via StreamOptions.IncludeObfuscation. + // See https://platform.openai.com/docs/api-reference/responses-streaming + Obfuscation *string `json:"obfuscation,omitempty"` + LogProbs []ResponsesOutputMessageContentTextLogProb `json:"logprobs"` Text *string `json:"text,omitempty"` // Full text of the output item, comes with event "response.output_text.done" @@ -2531,11 +2544,13 @@ func (resp *BifrostResponsesStreamResponse) WithDefaults() *BifrostResponsesStre // Copy all streaming-specific fields result.OutputIndex = resp.OutputIndex result.Item = resp.Item + result.SummaryIndex = resp.SummaryIndex result.ContentIndex = resp.ContentIndex result.ItemID = resp.ItemID result.Part = resp.Part result.Delta = resp.Delta result.Signature = resp.Signature + result.Obfuscation = resp.Obfuscation result.Text = resp.Text result.Refusal = resp.Refusal result.Arguments = resp.Arguments diff --git a/core/schemas/responses_test.go b/core/schemas/responses_test.go new file mode 100644 index 0000000000..778902473d --- /dev/null +++ b/core/schemas/responses_test.go @@ -0,0 +1,73 @@ +package schemas + +import ( + "strings" + "testing" +) + +func TestBifrostResponsesStreamResponsePreservesOpenAIStreamMetadata(t *testing.T) { + raw := []byte(`{"type":"response.reasoning_summary_text.delta","delta":"thinking","item_id":"rs_123","obfuscation":"opaque","output_index":0,"sequence_number":4,"summary_index":0}`) + + var resp BifrostResponsesStreamResponse + if err := Unmarshal(raw, &resp); err != nil { + t.Fatalf("unmarshal response stream chunk: %v", err) + } + + if resp.SummaryIndex == nil || *resp.SummaryIndex != 0 { + t.Fatalf("expected summary_index to survive unmarshal, got %#v", resp.SummaryIndex) + } + if resp.Obfuscation == nil || *resp.Obfuscation != "opaque" { + t.Fatalf("expected obfuscation to survive unmarshal, got %#v", resp.Obfuscation) + } + + defaulted := resp.WithDefaults() + if defaulted.SummaryIndex == nil || *defaulted.SummaryIndex != 0 { + t.Fatalf("expected summary_index to survive WithDefaults, got %#v", defaulted.SummaryIndex) + } + if defaulted.Obfuscation == nil || *defaulted.Obfuscation != "opaque" { + t.Fatalf("expected obfuscation to survive WithDefaults, got %#v", defaulted.Obfuscation) + } + + encoded, err := MarshalSorted(defaulted) + if err != nil { + t.Fatalf("marshal defaulted response stream chunk: %v", err) + } + if !strings.Contains(string(encoded), `"summary_index":0`) { + t.Fatalf("expected encoded chunk to contain summary_index, got %s", encoded) + } + if !strings.Contains(string(encoded), `"obfuscation":"opaque"`) { + t.Fatalf("expected encoded chunk to contain obfuscation, got %s", encoded) + } + + encodedChunk, err := MarshalSorted(BifrostStreamChunk{BifrostResponsesStreamResponse: defaulted}) + if err != nil { + t.Fatalf("marshal response stream chunk wrapper: %v", err) + } + if !strings.Contains(string(encodedChunk), `"summary_index":0`) { + t.Fatalf("expected encoded stream chunk to contain summary_index, got %s", encodedChunk) + } + if !strings.Contains(string(encodedChunk), `"obfuscation":"opaque"`) { + t.Fatalf("expected encoded stream chunk to contain obfuscation, got %s", encodedChunk) + } +} + +func TestResponsesMessagePreservesOpenAIPhase(t *testing.T) { + raw := []byte(`{"id":"msg_123","type":"message","status":"in_progress","content":[],"phase":"final_answer","role":"assistant"}`) + + var msg ResponsesMessage + if err := Unmarshal(raw, &msg); err != nil { + t.Fatalf("unmarshal responses message: %v", err) + } + + if msg.Phase == nil || *msg.Phase != "final_answer" { + t.Fatalf("expected phase to survive unmarshal, got %#v", msg.Phase) + } + + encoded, err := MarshalSorted(msg) + if err != nil { + t.Fatalf("marshal responses message: %v", err) + } + if !strings.Contains(string(encoded), `"phase":"final_answer"`) { + t.Fatalf("expected encoded message to contain phase, got %s", encoded) + } +} diff --git a/core/schemas/utils.go b/core/schemas/utils.go index 169fb1238e..189bd72e15 100644 --- a/core/schemas/utils.go +++ b/core/schemas/utils.go @@ -1263,6 +1263,10 @@ func IsNovaModel(model string) bool { return strings.Contains(model, "nova") } +func IsNova2Model(model string) bool { + return strings.Contains(model, "nova-2") && (strings.Contains(model, "lite") || strings.Contains(model, "sonic")) +} + // IsAnthropicModel checks if the model is an Anthropic model. func IsAnthropicModel(model string) bool { return strings.Contains(model, "anthropic.") || strings.Contains(model, "claude") diff --git a/docs/deployment-guides/config-json/cluster.mdx b/docs/deployment-guides/config-json/cluster.mdx index 13d1796913..e86bc14dcf 100644 --- a/docs/deployment-guides/config-json/cluster.mdx +++ b/docs/deployment-guides/config-json/cluster.mdx @@ -9,15 +9,18 @@ icon: "circle-nodes" -`cluster_config` enables multi-node Bifrost enterprise clustering with gossip-based membership and optional automatic node discovery. +`cluster_config` enables multi-node Bifrost enterprise clustering. The `type` field selects how nodes form a cluster: -You can form a cluster in two ways: +- **`mesh`** (default) - peer-to-peer membership over gossip, with optional automatic discovery. Requires nodes to reach each other directly. +- **`broker`** - every node makes a single outbound connection to a central broker that relays messages between nodes. Use this on platforms without peer-to-peer connectivity (e.g. Google Cloud Run). See [Broker Mode](#broker-mode) below. + +In `mesh` mode you can form a cluster in two ways: - Define static `peers` (`host:port`) - Enable `discovery` with one of: `kubernetes`, `dns`, `udp`, `consul`, `etcd`, `mdns` -At least one of `peers` or `discovery.enabled: true` must be configured when `cluster_config.enabled` is true. +In `mesh` mode, at least one of `peers` or `discovery.enabled: true` must be configured when `cluster_config.enabled` is true. In `broker` mode, `broker.address` is required and `peers`/`discovery`/`gossip` are ignored. --- @@ -102,6 +105,43 @@ For version 1.4.x - you will need to expose 10102 TCP port and 10101 UDP port fo --- +## Broker Mode + +In `broker` mode, nodes do not connect to each other. Each node opens a single +outbound stream to a central broker process, which relays every message to all +other connected nodes and pushes roster updates. Because nodes only need +outbound connectivity, broker mode works on platforms where peer-to-peer +networking is unavailable, such as Google Cloud Run. + +```json +{ + "cluster_config": { + "enabled": true, + "type": "broker", + "region": "us-east-1", + "broker": { + "address": "broker.example.run.app:443", + "tls": true, + "auth_token": "your-shared-secret" + } + } +} +``` + +The same Bifrost binary runs as the broker when started with `-mode=broker` +(or `BIFROST_MODE=broker`). The broker process reads `cluster_config.broker` +from the same `config.json` and serves on `broker.listen_port` (default +`50051`); it runs no database, providers, or HTTP gateway. + + +All nodes must connect to the **same** broker process. Run the broker as a +single instance (for Cloud Run, a service pinned to one instance with HTTP/2 +enabled). See [Enterprise Clustering → Broker Mode](/enterprise/clustering#broker-mode) +for the full deployment guide. + + +--- + ## Field Reference ### `cluster_config` @@ -109,10 +149,21 @@ For version 1.4.x - you will need to expose 10102 TCP port and 10101 UDP port fo | Field | Type | Description | |-------|------|-------------| | `enabled` | boolean | Enables cluster mode | +| `type` | string | `mesh` (default) or `broker` | | `region` | string | Region label for this node (defaults to `"unknown"` at runtime when omitted) | -| `peers` | array of strings | Static peer addresses in `host:port` format | -| `gossip` | object | Gossip/memberlist settings | -| `discovery` | object | Automatic node discovery settings | +| `peers` | array of strings | Static peer addresses in `host:port` format (`mesh` mode only) | +| `gossip` | object | Gossip/memberlist settings (`mesh` mode only) | +| `discovery` | object | Automatic node discovery settings (`mesh` mode only) | +| `broker` | object | Broker settings, used when `type` is `broker` | + +### `cluster_config.broker` + +| Field | Type | Description | +|-------|------|-------------| +| `address` | string | `host:port` of the broker that nodes dial (required in `broker` mode) | +| `tls` | boolean | Dial the broker over TLS (set `true` for HTTPS endpoints like Cloud Run) | +| `auth_token` | string | Optional shared secret sent on connect; the broker rejects nodes without it when set | +| `listen_port` | integer | Port the broker process serves on, used only by `-mode=broker` (default `50051`) | ### `cluster_config.gossip` diff --git a/docs/deployment-guides/config-json/guardrails.mdx b/docs/deployment-guides/config-json/guardrails.mdx index efb68a0edb..b782cf3f47 100644 --- a/docs/deployment-guides/config-json/guardrails.mdx +++ b/docs/deployment-guides/config-json/guardrails.mdx @@ -185,7 +185,9 @@ For `auth_type: "default_credential"` (managed identity / Azure CLI - no credent `analyze_severity_threshold` accepts `"low"`, `"medium"`, or `"high"`. - + + +Calls CrowdStrike AIDR's `guard_chat_completions` endpoint for policy-driven AI threat detection, blocking, and redaction. ```json { @@ -193,14 +195,52 @@ For `auth_type: "default_credential"` (managed identity / Azure CLI - no credent "guardrail_providers": [ { "id": 4, + "provider_name": "crowdstrike-aidr", + "policy_name": "crowdstrike-aidr-prod", + "enabled": true, + "timeout": 30, + "config": { + "api_key": "env.CS_AIDR_TOKEN", + "base_url": "env.CS_AIDR_BASE_URL", + "app_id": "bifrost-production", + "collector_instance_id": "prod-us-east-1" + } + } + ] + } +} +``` + +`base_url` is optional and defaults to `https://api.crowdstrike.com/aidr/aiguard`. Bifrost appends `/v1/guard_chat_completions`, so the base URL can be the collector base URL rather than the full endpoint URL. + + + + +```json +{ + "guardrails_config": { + "guardrail_providers": [ + { + "id": 5, "provider_name": "patronus-ai", "policy_name": "patronus-eval", "enabled": true, "timeout": 30, "config": { "api_key": "env.PATRONUS_API_KEY", - "environment": "production", - "sampling_rate": 100 + "base_url": "https://api.patronus.ai", + "evaluators": [ + { + "evaluator": "pii", + "explain_strategy": "on-fail" + }, + { + "evaluator": "judge", + "criteria": "patronus:is-concise", + "explain_strategy": "on-fail" + } + ], + "capture": "none" } } ] @@ -216,7 +256,7 @@ For `auth_type: "default_credential"` (managed identity / Azure CLI - no credent "guardrails_config": { "guardrail_providers": [ { - "id": 5, + "id": 6, "provider_name": "grayswan", "policy_name": "grayswan-jailbreak", "enabled": true, @@ -245,7 +285,7 @@ For `auth_type: "default_credential"` (managed identity / Azure CLI - no credent | Field | Required | Description | |-------|----------|-------------| | `id` | Yes | Unique integer ID - referenced by rules via `provider_config_ids` | -| `provider_name` | Yes | Backend: `"regex"`, `"secrets"`, `"bedrock"`, `"azure"`, `"patronus-ai"`, `"grayswan"` | +| `provider_name` | Yes | Backend: `"regex"`, `"secrets"`, `"bedrock"`, `"azure"`, `"crowdstrike-aidr"`, `"patronus-ai"`, `"grayswan"` | | `policy_name` | Yes | Human-readable policy label | | `enabled` | Yes | `true` to activate | | `timeout` | No | Execution timeout in seconds | @@ -296,14 +336,28 @@ Any field marked **env.\* supported** accepts a bare `"env.VAR_NAME"` string in | `sampling_rate` | No | **Plain only** | `0`–`100`; percentage of requests to evaluate (default: `100`) | | `timeout` | No | **Plain only** | Execution timeout in seconds | +### CrowdStrike AIDR + +| Field | Required | env.\* supported | Notes | +|-------|----------|-----------------|-------| +| `api_key` | Yes | Yes | AIDR collector token | +| `base_url` | No | Yes | AIDR base URL. Defaults to `https://api.crowdstrike.com/aidr/aiguard` | +| `app_id` | No | **Plain only** | Application or service identifier shown in AIDR logs | +| `collector_instance_id` | No | **Plain only** | Deployment or collector instance label shown in AIDR logs | +| `timeout` | No | **Plain only** | Provider execution timeout in seconds | + ### Patronus AI | Field | Required | env.\* supported | Notes | |-------|----------|-----------------|-------| | `api_key` | Yes | Yes | Patronus AI API key | -| `environment` | No | Yes | `"production"` (default) \| `"development"` | -| `sampling_rate` | No | **Plain only** | `0`–`100`; percentage of requests to evaluate (default: `100`) | -| `timeout` | No | **Plain only** | Execution timeout in seconds | +| `base_url` | No | Yes | Custom Patronus API base URL. Defaults to `https://api.patronus.ai` | +| `evaluators` | Yes | **Plain only** | Array of Patronus evaluator objects | +| `evaluators[].evaluator` | Yes | **Plain only** | Patronus evaluator name, such as `pii`, `toxicity-perspective-api`, `judge`, or a custom evaluator ID | +| `evaluators[].criteria` | No | **Plain only** | Criteria/profile name for evaluators that require one, such as `patronus:is-concise` | +| `evaluators[].explain_strategy` | No | **Plain only** | `never` \| `on-fail` \| `on-success` \| `always` | +| `capture` | No | **Plain only** | `none` \| `fails-only` \| `all`; defaults to `none` | +| `timeout` | No | **Plain only** | Provider execution timeout in seconds | ### Gray Swan diff --git a/docs/deployment-guides/config-json/plugins.mdx b/docs/deployment-guides/config-json/plugins.mdx index 5bb8806dc2..1d340c01c7 100644 --- a/docs/deployment-guides/config-json/plugins.mdx +++ b/docs/deployment-guides/config-json/plugins.mdx @@ -132,6 +132,7 @@ Exports distributed traces to any OTel-compatible collector (Jaeger, Zipkin, Tem | `config.headers` | No | - | Custom headers for the collector (supports `env.` prefix) | | `config.insecure` | No | `false` | Skip TLS verification | | `config.tls_ca_cert` | No | - | Path to TLS CA certificate | +| `config.plugin_span_filter` | No | - | Filter which plugin hook spans are exported. See [Filtering Plugin Spans](/features/observability/otel#filtering-plugin-spans) | ```json { diff --git a/docs/deployment-guides/helm/cluster.mdx b/docs/deployment-guides/helm/cluster.mdx index 52e6f72a5d..04b0c3105a 100644 --- a/docs/deployment-guides/helm/cluster.mdx +++ b/docs/deployment-guides/helm/cluster.mdx @@ -22,6 +22,15 @@ Cluster mode requires **PostgreSQL** as the storage backend. SQLite is single-no | Multiple replicas, shared DB only | Optional - DB provides eventual consistency | | Multiple replicas with strict per-minute rate limiting | **Enable cluster mode** - in-memory counters are synced via gossip | | Geographic multi-region | Enable cluster mode with DNS or Consul discovery | +| Serverless platforms without peer-to-peer networking (e.g. Cloud Run) | Use **broker mode** instead of gossip - see note below | + + +The Helm chart deploys the default **mesh** clustering, which needs nodes to +reach each other directly over gossip (`10101`) and gRPC (`10102`). On +platforms that do not allow peer-to-peer connectivity - such as Google Cloud +Run - use **broker mode**, where nodes only make an outbound connection to a +central relay. See [Enterprise Clustering → Broker Mode](/enterprise/clustering#broker-mode). + --- diff --git a/docs/deployment-guides/helm/guardrails.mdx b/docs/deployment-guides/helm/guardrails.mdx index ca7073c9d1..f880ef897d 100644 --- a/docs/deployment-guides/helm/guardrails.mdx +++ b/docs/deployment-guides/helm/guardrails.mdx @@ -154,21 +154,50 @@ bifrost: ``` - + + +Calls CrowdStrike AIDR's `guard_chat_completions` endpoint for policy-driven AI threat detection, blocking, and redaction. ```yaml bifrost: guardrails: providers: - id: 4 + provider_name: "crowdstrike-aidr" + policy_name: "crowdstrike-aidr-prod" + enabled: true + timeout: 30 + config: + api_key: "env.CS_AIDR_TOKEN" + base_url: "env.CS_AIDR_BASE_URL" + app_id: "bifrost-production" + collector_instance_id: "prod-us-east-1" +``` + +`base_url` is optional and defaults to `https://api.crowdstrike.com/aidr/aiguard`. Bifrost appends `/v1/guard_chat_completions`, so the base URL can be the collector base URL rather than the full endpoint URL. + + + + +```yaml +bifrost: + guardrails: + providers: + - id: 5 provider_name: "patronus-ai" policy_name: "patronus-eval" enabled: true timeout: 30 config: api_key: "env.PATRONUS_API_KEY" - environment: "production" # production | development (env.* supported) - sampling_rate: 100 + base_url: "https://api.patronus.ai" # optional custom endpoint (env.* supported) + evaluators: + - evaluator: "pii" + explain_strategy: "on-fail" + - evaluator: "judge" + criteria: "patronus:is-concise" + explain_strategy: "on-fail" + capture: "none" # none | fails-only | all ``` @@ -178,7 +207,7 @@ bifrost: bifrost: guardrails: providers: - - id: 5 + - id: 6 provider_name: "grayswan" policy_name: "grayswan-jailbreak" enabled: true @@ -244,14 +273,28 @@ Any field marked **env.\* supported** below accepts a bare `"env.VAR_NAME"` stri | `sampling_rate` | No | **Plain only** | `0`–`100`; percentage of requests to evaluate (default: `100`) | | `timeout` | No | **Plain only** | Execution timeout in seconds | +### CrowdStrike AIDR + +| Field | Required | env.\* supported | Notes | +|-------|----------|-----------------|-------| +| `api_key` | Yes | Yes | AIDR collector token | +| `base_url` | No | Yes | AIDR base URL. Defaults to `https://api.crowdstrike.com/aidr/aiguard` | +| `app_id` | No | **Plain only** | Application or service identifier shown in AIDR logs | +| `collector_instance_id` | No | **Plain only** | Deployment or collector instance label shown in AIDR logs | +| `timeout` | No | **Plain only** | Provider execution timeout in seconds | + ### Patronus AI | Field | Required | env.\* supported | Notes | |-------|----------|-----------------|-------| | `api_key` | Yes | Yes | Patronus AI API key | -| `environment` | No | Yes | `"production"` (default) \| `"development"` | -| `sampling_rate` | No | **Plain only** | `0`–`100`; percentage of requests to evaluate (default: `100`) | -| `timeout` | No | **Plain only** | Execution timeout in seconds | +| `base_url` | No | Yes | Custom Patronus API base URL. Defaults to `https://api.patronus.ai` | +| `evaluators` | Yes | **Plain only** | Array of Patronus evaluator objects | +| `evaluators[].evaluator` | Yes | **Plain only** | Patronus evaluator name, such as `pii`, `toxicity-perspective-api`, `judge`, or a custom evaluator ID | +| `evaluators[].criteria` | No | **Plain only** | Criteria/profile name for evaluators that require one, such as `patronus:is-concise` | +| `evaluators[].explain_strategy` | No | **Plain only** | `never` \| `on-fail` \| `on-success` \| `always` | +| `capture` | No | **Plain only** | `none` \| `fails-only` \| `all`; defaults to `none` | +| `timeout` | No | **Plain only** | Provider execution timeout in seconds | ### Gray Swan diff --git a/docs/deployment-guides/helm/plugins.mdx b/docs/deployment-guides/helm/plugins.mdx index 887cf85600..c0f8b695b8 100644 --- a/docs/deployment-guides/helm/plugins.mdx +++ b/docs/deployment-guides/helm/plugins.mdx @@ -178,7 +178,6 @@ Two modes: | `bifrost.plugins.semanticCache.config.cache_by_model` | Include model name in cache key | `true` | | `bifrost.plugins.semanticCache.config.cache_by_provider` | Include provider name in cache key | `true` | | `bifrost.plugins.semanticCache.config.exclude_system_prompt` | Exclude system prompt from cache key | `false` | -| `bifrost.plugins.semanticCache.config.cleanup_on_shutdown` | Delete cache data on pod shutdown | `false` | **Semantic mode (with OpenAI embeddings + Weaviate):** diff --git a/docs/docs.json b/docs/docs.json index 0d7aec147a..9abab63d8d 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -388,6 +388,7 @@ "pages": [ "integrations/guardrails/aws-bedrock", "integrations/guardrails/azure-content-safety", + "integrations/guardrails/crowdstrike-aidr", "integrations/guardrails/grayswan", "integrations/guardrails/patronus-ai" ] diff --git a/docs/enterprise/clustering.mdx b/docs/enterprise/clustering.mdx index 98467aad49..cc7bfd8bc8 100644 --- a/docs/enterprise/clustering.mdx +++ b/docs/enterprise/clustering.mdx @@ -190,6 +190,131 @@ If you omit the `grpc` block entirely, both defaults apply. Override only when t --- +## Broker Mode + +The default `mesh` clustering described above is peer-to-peer: every node must +accept inbound gossip and gRPC connections from every other node. Some +environments do not allow that. **Google Cloud Run**, for example, gives each +instance only a single inbound serving port, ephemeral instances with no +stable addresses, and no instance-to-instance networking - so memberlist +gossip and the gRPC mesh cannot form. + +**Broker mode** solves this. Instead of connecting to each other, every node +makes a single **outbound** connection to a central **broker** process. The +broker is a pure relay: a message received from one node is fanned out to all +other connected nodes. The broker also pushes a **roster** (the list of +connected node IDs) to every node. + +``` + ┌──────────┐ outbound ┌──────────┐ outbound ┌──────────┐ + │ Node A │──── stream ──▶│ Broker │◀── stream ────│ Node B │ + │(CloudRun)│ │ (relay) │ │(CloudRun)│ + └──────────┘ └──────────┘ └──────────┘ + message from A ──▶ broker ──▶ forwarded to B, C, … (never back to A) +``` + +Because nodes only need **outbound** connectivity, broker mode runs on any +platform that can make an outbound gRPC connection. + +### How it differs from mesh mode + +| Aspect | Mesh mode | Broker mode | +|--------|-----------|-------------| +| Connectivity | Every node connects to every node | Each node makes one outbound connection to the broker | +| Membership | memberlist gossip | Roster pushed by the broker | +| Discovery | 6 discovery methods | Not used - the broker is the rendezvous point | +| Ports on a node | `10101/TCP+UDP`, `10102/TCP` inbound | None - outbound only | +| Leader election | Deterministic over gossip members | Deterministic over the broker roster (same algorithm) | +| Entity replication | Over the gRPC mesh | Over the broker relay - identical entity types | + +Leadership in broker mode uses the **same deterministic rule** as mesh mode: +the lexicographically-smallest node ID in the roster is the leader. Every node +computes this independently from the roster the broker pushes, so there is +nothing to configure and no broker-side election. + +### Configuration + +Nodes run in broker mode by setting `cluster_config.type` to `broker` and +pointing at the broker address: + +```json +{ + "cluster_config": { + "enabled": true, + "type": "broker", + "region": "us-east-1", + "broker": { + "address": "broker.example.run.app:443", + "tls": true, + "auth_token": "your-shared-secret" + } + } +} +``` + +See the [config.json cluster reference](/deployment-guides/config-json/cluster#broker-mode) +for the full field list. + +### Running the broker + +The broker is **not** a separate binary - the same Bifrost Enterprise image +runs as the broker when started with the `-mode=broker` flag (or the +`BIFROST_MODE=broker` environment variable): + +```bash +bifrost-enterprise -mode=broker -app-dir /app/data +``` + +In broker mode the process branches before the normal server bootstrap: it starts +**only** the relay gRPC server and runs no database, providers, plugins, or +HTTP gateway. It reads `cluster_config.broker` from the same `config.json` and +serves on `broker.listen_port` (default `50051`). A standard gRPC health +service is registered for readiness probes. + +### Deploying on Cloud Run + + +All nodes must connect to the **same** broker process. Fan-out cannot span +multiple broker instances, so the broker must run as a **single instance**. + + +**Broker service:** +- Deploy as a Cloud Run service with `min-instances=1` and `max-instances=1`. +- Enable **HTTP/2** (end-to-end) so gRPC works. +- Expose on `:443`; nodes use the service URL as `broker.address` with `tls: true`. +- Set the Cloud Run container port to `50051` so it matches + `cluster_config.broker.listen_port`, or override `listen_port` to `8080` to + match Cloud Run's default `$PORT`. +- Set `auth_token` so only your nodes can connect. + +**Node services:** +- Deploy normally - they only need outbound access to the broker URL. +- Set `cluster_config.type` to `broker` and `broker.address` to the broker URL. + + +Cloud Run caps a single request - including a streaming gRPC connection - at +60 minutes. When the broker stream is closed by the platform, each node +automatically reconnects with exponential backoff, so this is transparent. +gRPC keepalive pings are enabled on both sides to keep otherwise-idle streams +alive within that window. + + +### Roster and reconnection + +The broker pushes the roster on three triggers: when a node connects or +disconnects, the full roster to a node as its first frame on join, and a +periodic rebroadcast (every ~20s) as a safety net for any node that missed an +event-driven update. A node that stops receiving roster heartbeats treats the +broker as down and enters its reconnect loop. + + +There is a brief window after a node disconnects where nodes can disagree on +the leader until the updated roster lands everywhere - the same +eventual-consistency window that gossip has in mesh mode. + + +--- + ## Service Discovery Methods Bifrost supports 6 service discovery methods to fit any infrastructure. Choose based on your deployment environment: diff --git a/docs/enterprise/guardrails.mdx b/docs/enterprise/guardrails.mdx index b4def12dfe..e77963fb62 100644 --- a/docs/enterprise/guardrails.mdx +++ b/docs/enterprise/guardrails.mdx @@ -1,6 +1,6 @@ --- title: "Guardrails" -description: "Enterprise-grade content safety and security validation with support for native regex and secrets detection, AWS Bedrock Guardrails, Azure Content Safety, GraySwan Cygnal, and Patronus AI." +description: "Enterprise-grade content safety and security validation with support for native regex and secrets detection, AWS Bedrock Guardrails, Azure Content Safety, CrowdStrike AIDR, GraySwan Cygnal, and Patronus AI." icon: "road-barrier" --- @@ -27,6 +27,9 @@ icon: "road-barrier" Multi-modal content moderation with severity-based filtering. + + Inline AI threat detection, policy enforcement, redaction, and AIDR audit visibility. + AI safety monitoring with natural language rule definitions. @@ -42,7 +45,7 @@ Bifrost Guardrails are built around two core concepts that work together to prov | Concept | Description | |---------|-------------| | **Rules** | Custom policies defined using CEL (Common Expression Language) that determine what content to validate and when. Rules can apply to inputs, outputs, or both, and can be linked to one or more profiles for evaluation. | -| **Profiles** | Configurations for guardrail providers, including Bifrost-native providers (Custom Regex, Secrets Detection) and external providers (AWS Bedrock, Azure Content Safety, GraySwan, Patronus AI). Profiles are reusable and can be shared across multiple rules. | +| **Profiles** | Configurations for guardrail providers, including Bifrost-native providers (Custom Regex, Secrets Detection) and external providers (AWS Bedrock, Azure Content Safety, CrowdStrike AIDR, GraySwan, Patronus AI). Profiles are reusable and can be shared across multiple rules. | **How They Work Together:** - **Profiles** define *how* content is evaluated using native Bifrost checks or external provider capabilities @@ -54,7 +57,7 @@ Bifrost Guardrails are built around two core concepts that work together to prov | Feature | Description | |---------|-------------| -| **Multi-Provider Support** | Bifrost-native Custom Regex and Secrets Detection, plus AWS Bedrock, Azure Content Safety, GraySwan, and Patronus AI integrations | +| **Multi-Provider Support** | Bifrost-native Custom Regex and Secrets Detection, plus AWS Bedrock, Azure Content Safety, CrowdStrike AIDR, GraySwan, and Patronus AI integrations | | **Dual-Stage Validation** | Guard both inputs (prompts) and outputs (responses) | | **Real-Time Processing** | Synchronous and asynchronous validation modes | | **CEL-Based Rules** | Define custom policies using Common Expression Language | @@ -90,6 +93,7 @@ flowchart TB Rule2[Rule: Content Filter] Rule3[Rule: Prompt Injection] Rule4[Rule: Credential Leakage] + Rule5[Rule: AI Threat Detection] end subgraph profiles [Guardrail Profiles] @@ -99,11 +103,13 @@ flowchart TB Profile4[GraySwan Profile] Profile5[Secrets Detection Profile] Profile6[Custom Regex Profile] + Profile7[CrowdStrike AIDR Profile] end InputValidation --> Rule1 InputValidation --> Rule3 InputValidation --> Rule4 + InputValidation --> Rule5 OutputValidation --> Rule2 Rule1 --> Profile6 @@ -111,6 +117,7 @@ flowchart TB Rule2 --> Profile3 Rule3 --> Profile1 Rule4 --> Profile5 + Rule5 --> Profile7 ``` **Flow Description:** @@ -184,59 +191,71 @@ Guardrail Rules are custom policies that define when and how content validation +The HTTP API uses camelCase field names (`celExpression`, `applyTo`, `samplingRate`, `selectedGuardrailProfiles`). Profiles are referenced as `":"` strings (for example, `"regex:1"`, `"patronus-ai:6"`). + **Create a Guardrail Rule:** ```bash -curl -X POST http://localhost:8080/api/enterprise/guardrails/rules \ +curl -X POST http://localhost:8080/api/guardrails/rules \ -H "Content-Type: application/json" \ -d '{ - "id": 1, "name": "Block PII in Prompts", "description": "Prevent PII from being sent to LLM providers", "enabled": true, - "cel_expression": "request.messages.exists(m, m.role == \"user\")", - "apply_to": "input", - "sampling_rate": 100, + "celExpression": "request.messages.exists(m, m.role == \"user\")", + "applyTo": "input", + "samplingRate": 100, "timeout": 5000, - "provider_config_ids": [1, 2] + "selectedGuardrailProfiles": ["regex:1", "bedrock:2"] }' ``` **List All Rules:** ```bash -curl -X GET http://localhost:8080/api/enterprise/guardrails/rules \ +curl -X GET http://localhost:8080/api/guardrails/rules \ -H "Content-Type: application/json" # Response { + "count": 1, + "limit": 1, + "offset": 0, "rules": [ { "id": 1, "name": "Block PII in Prompts", "description": "Prevent PII from being sent to LLM providers", "enabled": true, - "cel_expression": "request.messages.exists(m, m.role == \"user\")", - "apply_to": "input", - "sampling_rate": 100, + "celExpression": "request.messages.exists(m, m.role == \"user\")", + "applyTo": "input", + "samplingRate": 100, "timeout": 5000, - "provider_config_ids": [1, 2] + "selectedGuardrailProfiles": ["regex:1", "bedrock:2"] } ] } ``` **Update a Rule:** + +`PUT` revalidates against the full rule schema. Send the complete rule body (same shape as `POST`), not a patch. ```bash -curl -X PUT http://localhost:8080/api/enterprise/guardrails/rules/1 \ +curl -X PUT http://localhost:8080/api/guardrails/rules/1 \ -H "Content-Type: application/json" \ -d '{ + "name": "Block PII in Prompts", + "description": "Prevent PII from being sent to LLM providers", "enabled": false, - "sampling_rate": 50 + "celExpression": "request.messages.exists(m, m.role == \"user\")", + "applyTo": "input", + "samplingRate": 50, + "timeout": 5000, + "selectedGuardrailProfiles": ["regex:1", "bedrock:2"] }' ``` **Delete a Rule:** ```bash -curl -X DELETE http://localhost:8080/api/enterprise/guardrails/rules/1 +curl -X DELETE http://localhost:8080/api/guardrails/rules/1 ``` @@ -378,7 +397,7 @@ Profiles are reusable configurations for guardrail providers. External providers | Property | Type | Required | Description | |----------|------|----------|-------------| | `id` | integer | Yes | Unique identifier for the profile | -| `provider_name` | string | Yes | Provider type: `regex`, `secrets`, `bedrock`, `azure`, `grayswan`, `patronus-ai` | +| `provider_name` | string | Yes | Provider type: `regex`, `secrets`, `bedrock`, `azure`, `crowdstrike-aidr`, `grayswan`, `patronus-ai` | | `policy_name` | string | Yes | Descriptive name for the policy | | `enabled` | boolean | Yes | Whether the profile is active | | `config` | object | No | Provider-specific configuration | @@ -397,7 +416,7 @@ Profiles are reusable configurations for guardrail providers. External providers 2. **Select Provider Type** - - Choose from: Secrets Detection, Custom Regex, AWS Bedrock, Azure Content Safety, GraySwan, or Patronus AI + - Choose from: Secrets Detection, Custom Regex, AWS Bedrock, Azure Content Safety, CrowdStrike AIDR, GraySwan, or Patronus AI 3. **Configure Provider Settings** - Enter credentials and endpoint information for external providers, or local settings for native providers @@ -411,14 +430,14 @@ Profiles are reusable configurations for guardrail providers. External providers +Profiles are managed per provider type at `/api/guardrails/{provider}`, where `{provider}` is one of `secrets`, `regex`, `bedrock`, `azure`, `crowdstrike-aidr`, `grayswan`, or `patronus-ai`. The API assigns the configuration ID after creation. + **Create a Profile:** ```bash -curl -X POST http://localhost:8080/api/enterprise/guardrails/providers \ +curl -X POST http://localhost:8080/api/guardrails/bedrock \ -H "Content-Type: application/json" \ -d '{ - "id": 1, - "provider_name": "bedrock", - "policy_name": "PII Detection Profile", + "name": "PII Detection Profile", "enabled": true, "config": { "access_key": "env.AWS_ACCESS_KEY_ID", @@ -430,42 +449,46 @@ curl -X POST http://localhost:8080/api/enterprise/guardrails/providers \ }' ``` -**List All Profiles:** +**List All Profiles (grouped by provider):** ```bash -curl -X GET http://localhost:8080/api/enterprise/guardrails/providers \ +curl -X GET http://localhost:8080/api/guardrails \ -H "Content-Type: application/json" # Response -{ - "providers": [ - { - "id": 1, - "provider_name": "bedrock", - "policy_name": "PII Detection Profile", - "enabled": true - }, - { - "id": 2, - "provider_name": "azure", - "policy_name": "Content Safety Profile", - "enabled": true - } - ] -} +[ + { + "name": "regex", + "configs": [ + { "id": 1, "name": "PII Detection", "enabled": true, "patterns": [...] } + ] + }, + { + "name": "bedrock", + "configs": [ + { "id": 2, "name": "PII Detection Profile", "enabled": true, "guardrail_arn": "...", "region": "us-east-1" } + ] + } +] ``` +To list only a single provider's profiles, hit the provider path directly: `GET /api/guardrails/bedrock`. + **Update a Profile:** ```bash -curl -X PUT http://localhost:8080/api/enterprise/guardrails/providers/1 \ +curl -X PUT http://localhost:8080/api/guardrails/bedrock \ -H "Content-Type: application/json" \ -d '{ + "id": 1, + "name": "PII Detection Profile", "enabled": false }' ``` **Delete a Profile:** ```bash -curl -X DELETE http://localhost:8080/api/enterprise/guardrails/providers/1 +curl -X DELETE http://localhost:8080/api/guardrails/bedrock \ + -H "Content-Type: application/json" \ + -d '{"id": 1}' ``` @@ -525,6 +548,19 @@ curl -X DELETE http://localhost:8080/api/enterprise/guardrails/providers/1 }, { "id": 5, + "provider_name": "crowdstrike-aidr", + "policy_name": "CrowdStrike AIDR Production", + "enabled": true, + "timeout": 30, + "config": { + "api_key": "env.CS_AIDR_TOKEN", + "base_url": "env.CS_AIDR_BASE_URL", + "app_id": "bifrost-production", + "collector_instance_id": "prod-us-east-1" + } + }, + { + "id": 6, "provider_name": "grayswan", "policy_name": "Custom Safety Rules", "enabled": true, @@ -539,13 +575,25 @@ curl -X DELETE http://localhost:8080/api/enterprise/guardrails/providers/1 } }, { - "id": 6, + "id": 7, "provider_name": "patronus-ai", - "policy_name": "Hallucination Detection", + "policy_name": "Patronus Quality Checks", "enabled": true, "config": { "api_key": "env.PATRONUS_API_KEY", - "api_endpoint": "https://api.patronus.ai/v1" + "base_url": "https://api.patronus.ai", + "evaluators": [ + { + "evaluator": "pii", + "explain_strategy": "on-fail" + }, + { + "evaluator": "judge", + "criteria": "patronus:is-concise", + "explain_strategy": "on-fail" + } + ], + "capture": "none" } } ] @@ -603,6 +651,16 @@ guardrails_config: analyze_severity_threshold: "medium" jailbreak_shield_enabled: true - id: 5 + provider_name: "crowdstrike-aidr" + policy_name: "CrowdStrike AIDR Production" + enabled: true + timeout: 30 + config: + api_key: "env.CS_AIDR_TOKEN" + base_url: "env.CS_AIDR_BASE_URL" + app_id: "bifrost-production" + collector_instance_id: "prod-us-east-1" + - id: 6 provider_name: "grayswan" policy_name: "Custom Safety Rules" enabled: true @@ -613,12 +671,20 @@ guardrails_config: rules: no_pii: "Do not allow personally identifiable information" professional_tone: "Ensure responses maintain a professional tone" - - id: 6 + - id: 7 provider_name: "patronus-ai" - policy_name: "Hallucination Detection" + policy_name: "Patronus Quality Checks" enabled: true config: - api_endpoint: "https://api.patronus.ai/v1" + api_key: "env.PATRONUS_API_KEY" + base_url: "https://api.patronus.ai" + evaluators: + - evaluator: "pii" + explain_strategy: "on-fail" + - evaluator: "judge" + criteria: "patronus:is-concise" + explain_strategy: "on-fail" + capture: "none" ``` @@ -628,18 +694,19 @@ guardrails_config: Third-party guardrail providers offer different capabilities. Bifrost-native providers are documented separately: [Secrets Detection](/enterprise/guardrails/secrets-detection) covers credential leakage, and [Custom Regex](/enterprise/guardrails/custom-regex) covers deterministic pattern checks, including the PII Detection template. -| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI | -|------------|-------------|----------------------|----------|-------------| -| PII Detection | Yes | No | No | Yes | -| Content Filtering | Yes | Yes | Yes | Yes | -| Prompt Injection | Yes | Yes | Yes | Yes | -| Hallucination Detection | No | No | No | Yes | -| Toxicity Screening | Yes | Yes | Yes | Yes | -| Custom Policies | Yes | Yes | Yes | Yes | -| Custom Natural Language Rules | No | No | Yes | No | -| Image Support | Yes | No | No | No | -| IPI Detection | No | Yes | Yes | No | -| Mutation Detection | No | No | Yes | No | +| Capability | AWS Bedrock | Azure Content Safety | CrowdStrike AIDR | GraySwan | Patronus AI | +|------------|-------------|----------------------|------------------|----------|-------------| +| PII Detection | Yes | No | Yes | No | Yes | +| Content Filtering | Yes | Yes | Yes | Yes | Yes | +| Prompt Injection | Yes | Yes | Yes | Yes | Yes | +| Hallucination Detection | No | No | No | No | Yes | +| Toxicity Screening | Yes | Yes | Yes | Yes | Yes | +| Custom Policies | Yes | Yes | Yes | Yes | Yes | +| Custom Natural Language Rules | No | No | No | Yes | No | +| Image Support | Yes | No | No | No | No | +| IPI Detection | No | Yes | Policy-dependent | Yes | No | +| Mutation Detection | No | No | No | Yes | No | +| Output Redaction | Yes | No | Yes | No | No | ### Best Practices diff --git a/docs/features/observability/otel.mdx b/docs/features/observability/otel.mdx index 0185adc110..5459aed2bf 100644 --- a/docs/features/observability/otel.mdx +++ b/docs/features/observability/otel.mdx @@ -903,6 +903,49 @@ Sensitive credentials never appear in config files: The plugin reads `OTEL_API_KEY` from the environment at runtime. +### Filtering Plugin Spans + +By default every plugin's pre- and post-hook execution generates a span, which can bloat traces when many plugins are active (e.g. 8 built-in plugins × 2 hooks = 16 plugin spans per request). Use `plugin_span_filter` inside the OTEL plugin config to control which plugin spans are exported. + +**Via config.json** (inside the OTEL plugin config): + +```json +{ + "plugins": [ + { + "name": "otel", + "enabled": true, + "config": { + "collector_url": "...", + "trace_type": "genai_extension", + "protocol": "http", + "plugin_span_filter": { + "mode": "exclude", + "plugins": ["logging", "compat", "telemetry", "otel"] + } + } + } + ] +} +``` + +**Via the UI**: Open the Plugins page and click **Configure Plugin Tracing**. Toggle individual plugins on or off and save. UI-saved settings persist across restarts unless `plugin_span_filter` is set in config.json with a higher `version` value. + +**Filter modes:** + +| Mode | Behaviour | +|------|-----------| +| `exclude` | Export spans for all plugins **except** those listed | +| `include` | Export spans **only** for the listed plugins | + +**Built-in plugin names** (for reference): `telemetry`, `prompts`, `logging`, `governance`, `otel`, `semantic_cache`, `compat`, `maxim`. + +When a plugin span is filtered out, its children are automatically re-parented to the nearest exported ancestor so the trace hierarchy stays connected. + + + `plugin_span_filter` follows the standard plugin config precedence rules. To make a config.json value override UI-saved DB settings on restart, set a higher `version` on the OTEL plugin entry (e.g. `"version": 2`). See [Plugin Versioning](/deployment-guides/config-json/plugins) for details. + + --- ## When to Use diff --git a/docs/features/semantic-caching.mdx b/docs/features/semantic-caching.mdx index f25747c720..d2eb9b2cba 100644 --- a/docs/features/semantic-caching.mdx +++ b/docs/features/semantic-caching.mdx @@ -169,7 +169,9 @@ bifrostConfig := schemas.BifrostConfig{ **Cache Settings**: - **TTL (seconds)**: How long cached responses are kept (default: 300 s). - **Similarity Threshold**: Cosine similarity cutoff for a cache hit (0–1, default: 0.8). -- **Dimension**: Vector dimension matching your embedding model (e.g. 1536 for `text-embedding-3-small`). +- **Dimension**: Vector size produced by the embedding model — must match the model exactly. Common values: `1536` for OpenAI `text-embedding-3-small`, `3072` for `text-embedding-3-large`, `768` for many Cohere/Voyage models. Use `1` only in direct-only mode (no provider). + +> **Heads up**: a vector store namespace can only hold vectors of *one* dimension. Whenever you change the embedding **provider**, **model**, or **dimension**, make sure the new dimension still matches what the model produces — otherwise writes to the existing namespace will fail and reads will silently miss. The namespace is **not** recreated automatically; either point `vector_store_namespace` at a fresh name or drop the existing class/index in your vector store before saving. **Conversation Settings**: - **Conversation History Threshold**: Skip caching when the conversation has more than this many messages (default: 3). @@ -196,7 +198,6 @@ bifrostConfig := schemas.BifrostConfig{ "embedding_model": "text-embedding-3-small", "dimension": 1536, - "cleanup_on_shutdown": true, "ttl": "5m", "threshold": 0.8, @@ -279,7 +280,6 @@ bifrost: config: dimension: 1 ttl: "5m" - cleanup_on_shutdown: true cache_by_model: true cache_by_provider: true ``` @@ -297,7 +297,6 @@ bifrost: "config": { "dimension": 1, "ttl": "5m", - "cleanup_on_shutdown": true, "cache_by_model": true, "cache_by_provider": true } @@ -612,6 +611,7 @@ Example HTTP Response: "extra_fields": { "cache_debug": { "cache_hit": false, + "cache_id": "550e8500-e29b-41d4-a725-446655440001", "provider_used": "openai", "model_used": "gpt-4o-mini", "input_tokens": 20 @@ -620,22 +620,21 @@ Example HTTP Response: } ``` - -These variables allow you to detect cached responses and get the cache entry ID needed for clearing specific entries. +`cache_debug` is populated on both hits and misses. `cache_id` is the storage ID of the entry — use it to invalidate the entry later. The embedding-related fields (`provider_used`, `model_used`, `input_tokens`) are only present when semantic search actually ran. ### Clear Specific Cache Entry -Use the request ID from cached responses to clear specific entries: +Use the `cache_id` from `cache_debug` to clear a specific entry: ```go -// Clear specific entry by request ID -err := plugin.ClearCacheForRequestID("550e8400-e29b-41d4-a716-446655440000") +// Clear specific entry by cache ID (read from response.ExtraFields.CacheDebug.CacheID) +err := plugin.ClearCacheForCacheID("550e8500-e29b-41d4-a725-446655440001") -// Clear all entries for a cache key +// Clear all entries for a cache key err := plugin.ClearCacheForKey("support-session-456") ``` @@ -644,8 +643,8 @@ err := plugin.ClearCacheForKey("support-session-456") ```bash -# Clear specific cached entry by request ID -curl -X DELETE http://localhost:8080/api/cache/clear/550e8400-e29b-41d4-a716-446655440000 +# Clear specific cached entry by cache ID +curl -X DELETE http://localhost:8080/api/cache/clear/550e8500-e29b-41d4-a725-446655440001 # Clear all entries for a cache key curl -X DELETE http://localhost:8080/api/cache/clear-by-key/support-session-456 @@ -665,16 +664,15 @@ The semantic cache automatically handles cleanup to prevent storage bloat: - **Namespace Isolation**: Each Bifrost instance uses isolated vector store namespaces to prevent conflicts **Manual Cleanup Options:** -- Clear specific entries by request ID (see examples above) +- Clear specific entries by cache ID (see examples above) - Clear all entries for a cache key - Restart Bifrost to clear all cache data -The semantic cache namespace and all its cache entries are deleted when Bifrost client shuts down **only if `cleanup_on_shutdown` is set to `true`**. By default (`cleanup_on_shutdown: false`), cache data persists between restarts. DO NOT use the plugin's namespace for external purposes. - +**Dimension / Provider / Model Changes**: A vector store namespace can only hold vectors of **one** dimension. If you change `dimension` (or switch to an embedding `provider`/`model` that produces a different vector size), the existing namespace is **not** recreated automatically — `CreateNamespace` is a no-op when the class/collection already exists. Subsequent writes will fail (vector-size mismatch) and reads will silently miss. Before saving the change, either: - -**Dimension Changes**: If you update the `dimension` config, the existing namespace will contain data with mixed dimensions, causing retrieval issues. To avoid this, either use a different `vector_store_namespace` or set `cleanup_on_shutdown: true` before restarting. +- point `vector_store_namespace` at a fresh name, or +- drop the existing class/index in your vector store, or --- diff --git a/docs/integrations/anthropic-sdk/overview.mdx b/docs/integrations/anthropic-sdk/overview.mdx index 50de367e18..d0fa417453 100644 --- a/docs/integrations/anthropic-sdk/overview.mdx +++ b/docs/integrations/anthropic-sdk/overview.mdx @@ -15,7 +15,7 @@ This integration enables you to utilize Bifrost's features like governance, load **Enabling the beta header**: Anthropic frequently uses the `anthropic-beta` header to gate access to new features. Clients like Vercels AI SDK use these. Bifrost will block unrecognized headers by default for security purposes. -To enable the beta header for full compatability, add `anthropic-beta` to the AllowList under Settings -> Client Settings in the UI. +To enable the beta header for full compatibility, add `anthropic-beta` to the AllowList under Settings -> Client Settings in the UI. --- diff --git a/docs/integrations/guardrails/crowdstrike-aidr.mdx b/docs/integrations/guardrails/crowdstrike-aidr.mdx new file mode 100644 index 0000000000..5657716c94 --- /dev/null +++ b/docs/integrations/guardrails/crowdstrike-aidr.mdx @@ -0,0 +1,361 @@ +--- +title: "CrowdStrike AIDR" +description: "Integrate CrowdStrike AI Detection and Response with Bifrost Enterprise to inspect LLM inputs and outputs, block policy violations, redact sensitive content, and send AI security telemetry to AIDR." +icon: "binoculars" +--- + +## Overview + +Bifrost Enterprise supports **CrowdStrike AI Detection and Response (AIDR)** as a third-party guardrail provider for LLM request and response traffic. + +Use it when your organization already manages AI security policies in CrowdStrike and you want Bifrost to enforce those policies inline before prompts reach an LLM and before model responses reach users. + +CrowdStrike owns the detection policy. Bifrost owns the gateway enforcement path: it selects when to call AIDR, sends the relevant AI traffic, then blocks or rewrites the Bifrost request/response based on AIDR's verdict. + +## When To Use It + +CrowdStrike AIDR is useful for: + +- Detecting and blocking prompt injection or jailbreak attempts +- Preventing sensitive data, credentials, PII, or custom entities from being sent to an LLM +- Redacting or defanging content when your AIDR policy returns transformed text +- Evaluating both input prompts and output completions with different AIDR policy rules +- Sending AI security findings and metadata into the CrowdStrike AIDR console +- Inspecting tool definitions, assistant tool calls, and tool results in chat-based agent flows + + +Bifrost follows the AIDR policy response. Detector findings alone do not block traffic unless AIDR returns `blocked: true`. If you want Bifrost to stop a request, configure the relevant AIDR policy rule action to block. + + +## Prerequisites + +- Bifrost Enterprise with the guardrails plugin enabled +- A CrowdStrike Falcon tenant in a supported AIDR cloud: US-1, US-2, or EU-1 +- An AIDR subscription: **AIDR for Workforce** or **AIDR for Agents**. For Bifrost gateway/application traffic, **AIDR for Agents** is the relevant subscription. +- A Falcon user with permission to manage AIDR collectors, typically the AIDR Admin role +- A CrowdStrike AIDR collector assigned to the policy you want Bifrost to enforce +- Network egress from Bifrost to the configured AIDR API URL over HTTPS + +For CrowdStrike-side subscription, role, supported cloud, policy, and collector details, see the [CrowdStrike AIDR overview](https://aidr-docs.crowdstrike.com/docs/aidr/). + +## Set Up The AIDR Collector + +Before configuring Bifrost, create or open the CrowdStrike AIDR collector that Bifrost will use: + +1. In the Falcon console, open the menu and go to **AI Detection and Response** > **Collectors**. +2. Create an **Application** collector for Bifrost, or open an existing collector your security team already created. +3. Configure the collector name, logging mode, and policy. + - Assign a policy if you want AIDR to block or redact traffic. + - If no policy is assigned, AIDR can still provide visibility, but Bifrost will not receive policy block/redaction decisions to enforce. +4. Save the collector. +5. Open the collector's **Config** tab. +6. Copy the **API token** into Bifrost as `api_key`. +7. Copy the **Base URL** into Bifrost as `base_url`. + +For US-1, the base URL is usually `https://api.crowdstrike.com/aidr/aiguard`. For US-2 or EU-1 tenants, use the regional base URL shown in the collector configuration. + +## How It Works + +1. Create a Bifrost guardrail provider with `provider_name: "crowdstrike-aidr"`. +2. Attach that provider configuration to one or more guardrail rules. +3. When a rule matches, Bifrost extracts text content and tool context from the request or response. +4. Bifrost calls AIDR at `{base_url}/v1/guard_chat_completions` with `event_type: "input"` or `event_type: "output"`. +5. AIDR evaluates the payload with the policy assigned to your collector. +6. If AIDR returns `blocked: true`, Bifrost returns `GUARDRAIL_INTERVENED` and does not continue that request/response path. +7. If AIDR returns `transformed: true` with `guard_output`, Bifrost applies the transformed text to the request or response. +8. If AIDR returns neither `blocked` nor `transformed`, Bifrost allows the original content through. + +### Payload Sent To AIDR + +Bifrost sends AIDR an OpenAI Chat Completions-shaped `guard_input` payload: + +```json +{ + "event_type": "input", + "guard_input": { + "messages": [ + { + "role": "system", + "content": "You are a helpful assistant." + }, + { + "role": "user", + "content": "Find the employee record for Jane Doe." + } + ], + "tools": [ + { + "type": "function", + "function": { + "name": "hr_lookup", + "description": "Return employee details by name" + } + } + ] + }, + "app_id": "bifrost-production", + "collector_instance_id": "prod-us-east-1", + "llm_provider": "openai", + "model": "gpt-4o-mini" +} +``` + +Bifrost sets: + +| Field | Source | +|-------|--------| +| `event_type` | Guardrail rule phase: `input` or `output` | +| `guard_input.messages` | Text fields extracted by the guardrail layer from chat, Responses API, text completions, rerank queries/documents, image prompts, and other text-bearing request/response objects | +| `guard_input.tools` | Chat tool definitions from the request, when present | +| `tool_calls` and `tool_call_id` | Assistant tool calls and tool response IDs, when present | +| `app_id` | Optional value from the CrowdStrike provider configuration | +| `collector_instance_id` | Optional value from the CrowdStrike provider configuration | +| `llm_provider` | Provider selected by Bifrost for the LLM request | +| `model` | Model requested through Bifrost | + +## Configuration Fields + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `api_key` | string | Yes | - | AIDR collector token. Use `env.CS_AIDR_TOKEN` for production. | +| `base_url` | string | No | `https://api.crowdstrike.com/aidr/aiguard` | AIDR API base URL. Bifrost appends `/v1/guard_chat_completions`; it also accepts a value that already includes that suffix. | +| `app_id` | string | No | - | Application or service identifier shown in AIDR logs. | +| `collector_instance_id` | string | No | - | Deployment or instance label shown in AIDR logs. | +| `timeout` | integer | No | `30` | Provider execution timeout in seconds. In `config.json`, this is a provider-level field. In the management API and UI, it is submitted with the provider config and stored as the provider timeout. | + + +Use the base URL and collector token from the CrowdStrike AIDR collector's configuration page. Regional tenants commonly use base URLs under `https://api.us-2.crowdstrike.com/aidr/aiguard` or `https://api.eu-1.crowdstrike.com/aidr/aiguard`. + + +## Configuration + + + + +1. Go to **Guardrails** > **Providers**. +2. Select **CrowdStrike AIDR**. +3. Click **Add Configuration**. + + + CrowdStrike AIDR configuration in Bifrost dashboard + + +4. Enter a descriptive **Name**, such as `crowdstrike-aidr-prod`. +5. Set **Collector Token** directly or through an environment variable such as `env.CS_AIDR_TOKEN`. +6. Set **Base URL** to the AIDR base URL from the collector configuration. Leave it empty to use `https://api.crowdstrike.com/aidr/aiguard`. +7. Optionally set **App ID** and **Collector Instance ID** to improve AIDR log attribution. +8. Set the timeout and save the configuration. +9. Go to **Guardrails** > **Configuration** and attach the CrowdStrike AIDR profile to an input, output, or both-phase rule. + + + + +Create the CrowdStrike AIDR provider configuration directly with the management API. The provider route is `/api/guardrails/crowdstrike-aidr`. + +```bash +curl -X POST http://localhost:8080/api/guardrails/crowdstrike-aidr \ + -H "Content-Type: application/json" \ + -d '{ + "name": "crowdstrike-aidr-prod", + "enabled": true, + "config": { + "api_key": "env.CS_AIDR_TOKEN", + "base_url": "env.CS_AIDR_BASE_URL", + "app_id": "bifrost-production", + "collector_instance_id": "prod-us-east-1", + "timeout": 30 + } + }' +``` + +Fetch the generated configuration ID: + +```bash +curl -X GET http://localhost:8080/api/guardrails/crowdstrike-aidr \ + -H "Content-Type: application/json" +``` + +Attach it to a rule by referencing `crowdstrike-aidr:` in `selectedGuardrailProfiles`: + +```bash +curl -X POST http://localhost:8080/api/guardrails/rules \ + -H "Content-Type: application/json" \ + -d '{ + "name": "crowdstrike-aidr-all-chat", + "description": "Run CrowdStrike AIDR on prompts and completions", + "enabled": true, + "celExpression": "true", + "applyTo": "both", + "samplingRate": 100, + "timeout": 60, + "maxTurnsToSend": 8, + "selectedGuardrailProfiles": ["crowdstrike-aidr:12"] + }' +``` + + + + +```json +{ + "guardrails_config": { + "guardrail_providers": [ + { + "id": 70, + "provider_name": "crowdstrike-aidr", + "policy_name": "crowdstrike-aidr-prod", + "enabled": true, + "timeout": 30, + "config": { + "api_key": "env.CS_AIDR_TOKEN", + "base_url": "env.CS_AIDR_BASE_URL", + "app_id": "bifrost-production", + "collector_instance_id": "prod-us-east-1" + } + } + ], + "guardrail_rules": [ + { + "id": 701, + "name": "crowdstrike-aidr-all-chat", + "description": "Run CrowdStrike AIDR on prompts and completions", + "enabled": true, + "cel_expression": "true", + "apply_to": "both", + "sampling_rate": 100, + "timeout": 60, + "max_turns_to_send": 8, + "provider_config_ids": [70] + } + ] + } +} +``` + + + + +```yaml +bifrost: + guardrails: + providers: + - id: 70 + provider_name: "crowdstrike-aidr" + policy_name: "crowdstrike-aidr-prod" + enabled: true + timeout: 30 + config: + api_key: "env.CS_AIDR_TOKEN" + base_url: "env.CS_AIDR_BASE_URL" + app_id: "bifrost-production" + collector_instance_id: "prod-us-east-1" + + rules: + - id: 701 + name: "crowdstrike-aidr-all-chat" + description: "Run CrowdStrike AIDR on prompts and completions" + enabled: true + cel_expression: "true" + apply_to: "both" + sampling_rate: 100 + timeout: 60 + max_turns_to_send: 8 + provider_config_ids: [70] +``` + + + + +## Policy Outcomes + +Bifrost maps the AIDR response into Bifrost guardrail behavior like this: + +| AIDR response | Bifrost behavior | +|---------------|------------------| +| `result.blocked: true` | Blocks with `GUARDRAIL_INTERVENED`. The error reason uses AIDR `display_message`, top-level `summary`, or result `summary`, in that order. | +| `result.blocked: false`, `result.transformed: true`, valid `guard_output` | Allows the request/response but replaces the original text with the transformed AIDR output. | +| `result.blocked: false`, `result.transformed: false` | Allows the original content unchanged. | +| Missing `result`, malformed response, timeout, non-2xx response, or transformed output count mismatch | Treats the provider call as failed. Check Bifrost logs for the exact guardrail error. | + +Bifrost also records AIDR usage metadata for logs and spans: + +- Evaluated AIDR policy name +- Whether AIDR blocked the interaction +- Whether AIDR transformed the content +- Detector count +- Detector names + +## Blocked Error Response + +When CrowdStrike AIDR blocks content, Bifrost returns HTTP `400` with `type: "guardrail_intervention"`. The error message uses the AIDR-provided reason when available. + +For an input guardrail, the LLM request is not sent to the model. For an output guardrail, the model response is replaced by the error response. + +Trimmed example: + +```json +{ + "type": "guardrail_intervention", + "is_bifrost_error": false, + "status_code": 400, + "error": { + "type": "guardrail_intervention", + "message": "Blocked by CrowdStrike AIDR policy: Malicious Prompt was detected and blocked." + }, + "extra_fields": { + "request_type": "chat_completion" + } +} +``` + +If AIDR does not return a display message or summary, the message is: + +```text +Blocked by CrowdStrike AIDR policy +``` + + +CrowdStrike AIDR output inspection and redaction only apply to non-streaming response bodies today. Streaming output redaction is not supported; input guardrails can still run before a streaming request is sent to the LLM. + + +## Useful Rule Patterns + +Run AIDR only for external-user traffic: + +```text +headers["x-user-type"] == "external" +``` + +Run AIDR only for production virtual keys: + +```text +headers["x-bf-vk"] == "prod" +``` + +Run AIDR only for a specific provider or model: + +```text +provider == "openai" && model.startsWith("gpt-4") +``` + +Run AIDR on all requests while limiting historical context sent to AIDR: + +```json +{ + "celExpression": "true", + "applyTo": "both", + "maxTurnsToSend": 8 +} +``` + +## Troubleshooting + +| Symptom | What to check | +|---------|---------------| +| AIDR does not block a prompt | Confirm the AIDR policy action is set to block. Bifrost does not independently block on detector findings when `blocked` is false. | +| Redaction does not appear | Confirm AIDR returned `transformed: true` with `guard_output.messages`. Bifrost only rewrites content when the transformed output count matches the original text count. | +| AIDR returns `401` or `403` | Check the collector token and make sure Bifrost is using the token for the correct AIDR collector. | +| AIDR request times out | Increase the provider or rule timeout, reduce the conversation history with `maxTurnsToSend`, or narrow the CEL rule so fewer large requests are evaluated. | +| No AIDR findings are easy to correlate | Set `app_id` and `collector_instance_id`, and use Bifrost request logs/spans alongside the AIDR Findings page. | + +For general rule and profile concepts, see [Guardrails](/enterprise/guardrails). For direct `config.json` setup, see [Guardrails in config.json](/deployment-guides/config-json/guardrails). diff --git a/docs/integrations/guardrails/patronus-ai.mdx b/docs/integrations/guardrails/patronus-ai.mdx index 3c2f2bfe32..b5f41471ba 100644 --- a/docs/integrations/guardrails/patronus-ai.mdx +++ b/docs/integrations/guardrails/patronus-ai.mdx @@ -4,23 +4,234 @@ description: "Integrate Patronus AI with Bifrost for LLM security and safety inc icon: "brain" --- -Bifrost integrates with **Patronus AI** to provide specialized LLM security and safety with advanced evaluation capabilities. This page covers the configuration and capabilities of the Patronus AI guardrail provider. +## Overview + +Bifrost Enterprise supports [**Patronus AI**](https://www.patronus.ai/) as a third-party guardrail provider for evaluating LLM request and response text with Patronus evaluators. + +Use it when you want evaluator-based checks such as PII detection, toxicity screening, prompt-injection checks, response quality criteria, or custom evaluators from your Patronus account. + +## How It Works + +You'll need a Patronus API key to authenticate with their Evaluate API - grab one from the [Patronus dashboard](https://app.patronus.ai/experiments). + +1. You create a guardrail provider with `provider_name: "patronus-ai"` and your Patronus API key. +2. You configure one or more Patronus evaluators. +3. You attach that provider to a guardrail rule. +4. The rule decides when to run the provider and whether to evaluate `input`, `output`, or `both`. +5. Bifrost calls the Patronus Evaluate API at `/v1/evaluate`. +6. If any evaluator returns `pass: false`, Bifrost returns `GUARDRAIL_INTERVENED`. + +The Patronus evaluator flow supported here is text-based: Bifrost sends selected request or response text as the evaluation input. ## Capabilities -- **Hallucination Detection**: Identify factually incorrect responses -- **PII Detection**: Comprehensive personal data identification -- **Toxicity Screening**: Multi-language toxic content detection -- **Prompt Injection Defense**: Advanced attack pattern recognition -- **Custom Evaluators**: Build organization-specific safety checks -- **Real-Time Monitoring**: Continuous safety validation +- **PII Detection**: Identify personally identifiable information using Patronus evaluators +- **Toxicity Screening**: Evaluate text for toxic or unsafe content +- **Prompt Injection Checks**: Use Patronus judge criteria such as `patronus:prompt-injection` +- **Response Quality Checks**: Evaluate outputs for criteria such as conciseness, helpfulness, politeness, JSON validity, code validity, or CSV validity +- **Bias Checks**: Use Patronus criteria for age, gender, and racial bias checks +- **Custom Evaluators**: Use evaluator IDs and criteria configured in your Patronus account + +## Configuration Fields + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `api_key` | string | Yes | - | Patronus API key. Supports `env.PATRONUS_API_KEY`. | +| `base_url` | string | No | `https://api.patronus.ai` | Custom Patronus API base URL. Bifrost appends `/v1/evaluate`. | +| `evaluators` | array | Yes | - | Patronus evaluator entries to run. At least one is required. | +| `capture` | enum | No | `none` | Controls whether Patronus stores evaluation results: `none`, `fails-only`, or `all`. | +| `timeout` | integer | No | `30` | Provider execution timeout in seconds. | + +### Evaluator Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `evaluator` | string | Yes | Patronus evaluator name, such as `pii`, `toxicity-perspective-api`, `judge`, or a custom evaluator ID. | +| `criteria` | string | No | Criteria/profile name for evaluators that require one, for example `patronus:is-concise`. | +| `explain_strategy` | enum | No | When to include evaluator explanations: `never`, `on-fail`, `on-success`, or `always`. | + +### Capture Modes + +Captured evaluation results appear under the **Traces** section in the Patronus dashboard. + +| Value | Meaning | +|-------|---------| +| `none` | Do not capture evaluation results in Patronus. | +| `fails-only` | Capture only failed evaluator results in Patronus. | +| `all` | Capture all evaluator results in Patronus. | + +### Explanation Response Modes + +| Value | Meaning | +|-------|---------| +| `never` | Do not request evaluator explanations. | +| `on-fail` | Request explanations for failed evaluator results. | +| `on-success` | Request explanations for passed evaluator results. | +| `always` | Request explanations for all evaluator results. | + +## Built-In UI Presets + +The Bifrost dashboard exposes common Patronus evaluator presets: + +| Preset | Evaluator | Criteria | +|--------|-----------|----------| +| Detect PII | `pii` | - | +| Detect Toxicity | `toxicity-perspective-api` | - | +| Prompt Injection | `judge` | `patronus:prompt-injection` | +| Answer Refusal | `judge` | `patronus:answer-refusal` | +| Is Concise | `judge` | `patronus:is-concise` | +| Is Helpful | `judge` | `patronus:is-helpful` | +| Is Polite | `judge` | `patronus:is-polite` | +| No Apologies | `judge` | `patronus:no-apologies` | +| No OpenAI Reference | `judge` | `patronus:no-openai-reference` | +| No Age Bias | `judge` | `patronus:no-age-bias` | +| No Gender Bias | `judge` | `patronus:no-gender-bias` | +| No Racial Bias | `judge` | `patronus:no-racial-bias` | +| Is JSON | `judge` | `patronus:is-json` | +| Is Code | `judge` | `patronus:is-code` | +| Is CSV | `judge` | `patronus:is-csv` | + +You can also select **Custom evaluator** and provide your own `evaluator` and optional `criteria`. + +## Configuration + + + + +1. Go to **Guardrails** > **Providers**. +2. Select **Patronus AI**. +3. Click **Add Configuration**. + + + Patronus AI configuration in Bifrost dashboard + + +4. Enter a descriptive **Name**. +5. Set your **API Key** directly or through an environment variable. +6. Leave **Base URL** empty or use the default `https://api.patronus.ai`, or set a custom Patronus endpoint. +7. Add one or more evaluators. +8. Choose a **Capture** mode. Bifrost defaults to **None**. +9. Set the timeout and save the configuration. +10. Attach the configuration to a guardrail rule under **Guardrails** > **Configuration**. + + + + +Create the Patronus AI provider configuration directly with the management API. The Enterprise backend registers guardrail provider APIs at `/api/guardrails/{provider}`; the provider type is the path segment (`patronus-ai`), and the API assigns the configuration ID after creation. + +```bash +curl -X POST http://localhost:8080/api/guardrails/patronus-ai \ + -H "Content-Type: application/json" \ + -d '{ + "name": "patronus-quality-checks", + "enabled": true, + "config": { + "api_key": "env.PATRONUS_API_KEY", + "base_url": "https://api.patronus.ai", + "evaluators": [ + { + "evaluator": "pii", + "explain_strategy": "on-fail" + }, + { + "evaluator": "judge", + "criteria": "patronus:is-concise", + "explain_strategy": "on-fail" + } + ], + "capture": "none", + "timeout": 30 + } + }' +``` + +To attach it to a rule, fetch the generated config ID with `GET /api/guardrails/patronus-ai`, then reference it in `selectedGuardrailProfiles` (in the form `patronus-ai:`) on `POST /api/guardrails/rules`. + + + + +```json +{ + "guardrails_config": { + "guardrail_providers": [ + { + "id": 40, + "provider_name": "patronus-ai", + "policy_name": "patronus-quality-checks", + "enabled": true, + "timeout": 30, + "config": { + "api_key": "env.PATRONUS_API_KEY", + "base_url": "https://api.patronus.ai", + "evaluators": [ + { + "evaluator": "pii", + "explain_strategy": "on-fail" + }, + { + "evaluator": "judge", + "criteria": "patronus:is-concise", + "explain_strategy": "on-fail" + } + ], + "capture": "none" + } + } + ], + "guardrail_rules": [ + { + "id": 401, + "name": "patronus-openai-output", + "description": "Run Patronus checks on OpenAI responses", + "enabled": true, + "cel_expression": "provider == 'openai'", + "apply_to": "output", + "sampling_rate": 100, + "timeout": 30, + "provider_config_ids": [40] + } + ] + } +} +``` + + + + +```yaml +bifrost: + guardrails: + providers: + - id: 40 + provider_name: "patronus-ai" + policy_name: "patronus-quality-checks" + enabled: true + timeout: 30 + config: + api_key: "env.PATRONUS_API_KEY" + base_url: "https://api.patronus.ai" + evaluators: + - evaluator: "pii" + explain_strategy: "on-fail" + - evaluator: "judge" + criteria: "patronus:is-concise" + explain_strategy: "on-fail" + capture: "none" -## Advanced Features + rules: + - id: 401 + name: "patronus-openai-output" + description: "Run Patronus checks on OpenAI responses" + enabled: true + cel_expression: "provider == 'openai'" + apply_to: "output" + sampling_rate: 100 + timeout: 30 + provider_config_ids: [40] +``` -- Context-aware evaluation -- Multi-turn conversation analysis -- Custom policy templates -- Integration with existing safety workflows + + ## Provider Capabilities Comparison diff --git a/docs/media/ui-crowdstrike-aidr-config.png b/docs/media/ui-crowdstrike-aidr-config.png new file mode 100644 index 0000000000..250da46ced Binary files /dev/null and b/docs/media/ui-crowdstrike-aidr-config.png differ diff --git a/docs/media/ui-patronus-config.png b/docs/media/ui-patronus-config.png new file mode 100644 index 0000000000..c03dcb17cd Binary files /dev/null and b/docs/media/ui-patronus-config.png differ diff --git a/docs/migration-guides/v1.5.0.mdx b/docs/migration-guides/v1.5.0.mdx index 0878720d67..0cf4eca096 100644 --- a/docs/migration-guides/v1.5.0.mdx +++ b/docs/migration-guides/v1.5.0.mdx @@ -690,6 +690,117 @@ The supported `BifrostContextKeyAPIKeyID` / `BifrostContextKeyAPIKeyName` path c --- +## Breaking Change 15: Semantic Cache Clear API is Now Cache-ID Based + +The semantic cache "clear by request ID" API has been removed. Storage IDs in the cache are deterministic UUIDv5 hashes derived from the request payload (so the same prompt across many requests maps to a single cache entry), which made the previous request-ID-based delete unable to match anything written by the direct-search path. + +The replacement is keyed on the cache entry's storage ID, which is now stamped on every response in `extra_fields.cache_debug.cache_id` — on cache hits **and** cache misses. Hold onto that ID from the response if you ever need to invalidate the entry. + +### REST API + +| Before (v1.4.x) | After (v1.5.0) | +|---|---| +| `DELETE /api/cache/clear/{requestId}` | `DELETE /api/cache/clear/{cacheId}` | + +The path parameter name and meaning both changed. The cache key endpoint (`DELETE /api/cache/clear-by-key/{cacheKey}`) is unchanged. + +**Before:** +```bash +curl -X DELETE localhost:8080/api/cache/clear/req-aaa-bbb-ccc +``` + +**After:** +```bash +# Read the cache ID from a prior response +CACHE_ID=$(curl ... | jq -r '.extra_fields.cache_debug.cache_id') + +curl -X DELETE localhost:8080/api/cache/clear/$CACHE_ID +``` + +### Go SDK + +The `ClearCacheForRequestID` method on `*semanticcache.Plugin` has been removed and replaced by `ClearCacheForCacheID`. + +**Before:** +```go +err := plugin.ClearCacheForRequestID(requestID) +``` + +**After:** +```go +// On hit or miss, the storage ID is exposed via CacheDebug.CacheID +cacheID := response.ExtraFields.CacheDebug.CacheID +if cacheID != nil { + err := plugin.ClearCacheForCacheID(*cacheID) +} +``` + +### Why the rename + +A single cache entry is reused across many request IDs (that is the point of caching). A request-ID-based delete only ever made sense for the original writer of the entry, and even that broke once direct search switched to deterministic storage IDs. The cache ID is the only stable handle that works for both writers and readers, so the API now reflects that. + +### CacheDebug on misses + +`extra_fields.cache_debug` is now populated on cache misses too — previously it was only emitted when semantic search ran. The new fields on a miss: + +- `cache_hit: false` +- `cache_id`: the storage ID where the entry was written (use this with `ClearCacheForCacheID`) +- `provider_used` / `model_used` / `input_tokens`: only present when semantic search actually ran (i.e. embedding model was invoked) + +If you parse `cache_debug` and assumed it was either absent or had `cache_hit: true`, update your consumer to handle the `cache_hit: false` shape. + +--- + +## Breaking Change 16: Semantic Cache `cleanup_on_shutdown` Removed + +The `cleanup_on_shutdown` option on the semantic cache plugin config has been removed. Cache entries and the vector store namespace are no longer deleted when Bifrost shuts down — cache data always persists between restarts. + +**Before:** +```json +{ + "plugins": { + "semantic_cache": { + "config": { + "ttl": "5m", + "cleanup_on_shutdown": true + } + } + } +} +``` + +**After:** +```json +{ + "plugins": { + "semantic_cache": { + "config": { + "ttl": "5m" + } + } + } +} +``` + +The field is no longer part of the config schema and will be rejected by validation. Remove it from `config.json`, Helm values, and any `PUT /api/config` payloads. + +### How to clear cache data + +If you previously relied on `cleanup_on_shutdown: true` to drop the cache on restart, use one of the supported invalidation paths instead: + +- `DELETE /api/cache/clear/{cacheId}` — invalidate a single entry +- `DELETE /api/cache/clear-by-key/{cacheKey}` — invalidate all entries for a cache key +- Drop the vector store class/collection or point `vector_store_namespace` at a fresh name to start clean + +### Dimension / provider / model changes + +The previous `cleanup_on_shutdown: true` + restart workflow was the documented escape hatch for changing `dimension` (or switching to an embedding `provider`/`model` that produces a different vector size). That option is gone. To rotate the namespace now, either: + +- point `vector_store_namespace` at a fresh name, or +- drop the existing class/index in your vector store before restarting + +--- + ## Opting Out: `version: 1` Compatibility Mode If you are not ready to adopt the new deny-by-default semantics, you can add a single field to `config.json` to restore v1.4.x behavior for all allow-list fields loaded from that file: @@ -788,6 +899,14 @@ If your code reads `selected_key_id` / `selected_key_name` from the request cont Remove `allow_direct_keys` from `config.json` and any `PUT /api/config` payloads. Audit HTTP callers that sent provider keys in `Authorization` / `x-api-key` / `x-goog-api-key` / `x-bf-bedrock-*` / `x-bf-azure-endpoint` headers — those keys are no longer forwarded. Audit Go SDK callers for any reference to `schemas.BifrostContextKeyDirectKey` — the constant is removed and code referencing it will not compile. Replace both flavours with a Bifrost-managed provider key, optionally pinned per request via `BifrostContextKeyAPIKeyID` / `BifrostContextKeyAPIKeyName` (Go SDK) or a virtual key (`sk-bf-*`, HTTP). + + +Replace `DELETE /api/cache/clear/{requestId}` with `DELETE /api/cache/clear/{cacheId}`, and replace `plugin.ClearCacheForRequestID(...)` with `plugin.ClearCacheForCacheID(...)`. Read the cache ID from `extra_fields.cache_debug.cache_id` on the response (now populated on misses too). + + + +Drop the `cleanup_on_shutdown` field from the semantic cache plugin config in `config.json`, Helm values, and any API payloads — it is no longer part of the schema. Cache data now always persists across restarts; use the cache clear endpoints or rotate `vector_store_namespace` to drop entries. + --- diff --git a/docs/openapi/openapi.json b/docs/openapi/openapi.json index afcd4003dc..9bb52d600c 100644 --- a/docs/openapi/openapi.json +++ b/docs/openapi/openapi.json @@ -41914,20 +41914,20 @@ } } }, - "/api/cache/clear/{requestId}": { + "/api/cache/clear/{cacheId}": { "delete": { - "operationId": "clearCacheByRequestId", - "summary": "Clear cache by request ID", - "description": "Clears cache entries associated with a specific request ID.", + "operationId": "clearCacheByCacheId", + "summary": "Clear cache entry by cache ID", + "description": "Deletes a single cache entry by its storage ID. Read the cache ID from\n`extra_fields.cache_debug.cache_id` on a prior response — it is populated\non both cache hits and cache misses.\n", "tags": [ "Cache" ], "parameters": [ { - "name": "requestId", + "name": "cacheId", "in": "path", "required": true, - "description": "Request ID to clear cache for", + "description": "Storage ID of the cache entry to delete", "schema": { "type": "string" } diff --git a/docs/openapi/openapi.yaml b/docs/openapi/openapi.yaml index ebfad00cce..4e581f5410 100644 --- a/docs/openapi/openapi.yaml +++ b/docs/openapi/openapi.yaml @@ -100,6 +100,8 @@ tags: description: Container management operations - name: Async Jobs description: Asynchronous job submission and retrieval endpoints + - name: Realtime + description: Realtime WebSocket and WebRTC endpoints # Provider Integrations - name: OpenAI Integration description: OpenAI-compatible API endpoints (/openai/*) @@ -246,6 +248,16 @@ paths: /v1/async/ocr/{job_id}: $ref: './paths/inference/ocr.yaml#/async-ocr-job' + # ==================== Realtime API ==================== + /v1/realtime: + $ref: './paths/inference/realtime.yaml#/realtime' + /v1/realtime/calls: + $ref: './paths/inference/realtime.yaml#/realtime-calls' + /v1/realtime/client_secrets: + $ref: './paths/inference/realtime.yaml#/realtime-client-secrets' + /v1/realtime/sessions: + $ref: './paths/inference/realtime.yaml#/realtime-sessions' + # ==================== OpenAI Integration ==================== # Chat Completions /openai/v1/chat/completions: @@ -323,10 +335,32 @@ paths: /openai/v1/containers/{container_id}/files/{file_id}/content: $ref: './paths/integrations/openai/containers.yaml#/container-files-content' + # Videos API + /openai/v1/videos: + $ref: './paths/integrations/openai/videos.yaml#/videos' + /openai/v1/videos/{video_id}: + $ref: './paths/integrations/openai/videos.yaml#/videos-by-id' + /openai/v1/videos/{video_id}/content: + $ref: './paths/integrations/openai/videos.yaml#/videos-content' + /openai/v1/videos/{video_id}/remix: + $ref: './paths/integrations/openai/videos.yaml#/videos-remix' + + # Realtime / WebSocket / WebRTC (OpenAI-prefixed aliases) + /openai/v1/realtime: + $ref: './paths/integrations/openai/realtime.yaml#/realtime' + /openai/v1/realtime/calls: + $ref: './paths/integrations/openai/realtime.yaml#/realtime-calls' + /openai/v1/realtime/client_secrets: + $ref: './paths/integrations/openai/realtime.yaml#/realtime-client-secrets' + /openai/v1/realtime/sessions: + $ref: './paths/integrations/openai/realtime.yaml#/realtime-sessions' + # ==================== Anthropic Integration ==================== # Messages API /anthropic/v1/messages: $ref: './paths/integrations/anthropic/messages.yaml#/messages' + /anthropic/v1/messages/{path}: + $ref: './paths/integrations/anthropic/messages.yaml#/messages-wildcard' # Legacy Complete API /anthropic/v1/complete: @@ -383,6 +417,26 @@ paths: /genai/v1beta/files/{file_id}: $ref: './paths/integrations/genai/files.yaml#/files-by-id' + # Batches + /genai/v1beta/batches: + $ref: './paths/integrations/genai/batches.yaml#/batches' + /genai/v1beta/batches/{batch_id}: + $ref: './paths/integrations/genai/batches.yaml#/batches-by-id' + + # Cached Contents + /genai/v1beta/cachedContents: + $ref: './paths/integrations/genai/cached-contents.yaml#/cached-contents' + /genai/v1beta/cachedContents/{cached_id}: + $ref: './paths/integrations/genai/cached-contents.yaml#/cached-contents-by-id' + + # Rank (Vertex AI) + /genai/v1/rank: + $ref: './paths/integrations/genai/rank.yaml#/rank' + + # Long-running video operation polling + /genai/v1beta/models/{model}/operations/{operation_id}: + $ref: './paths/integrations/genai/video-operations.yaml#/video-operations' + # ==================== Bedrock Integration ==================== # Converse API /bedrock/model/{modelId}/converse: @@ -396,14 +450,26 @@ paths: /bedrock/model/{modelId}/invoke-with-response-stream: $ref: './paths/integrations/bedrock/invoke.yaml#/invoke-stream' + # Count Tokens API + /bedrock/model/{modelId}/count-tokens: + $ref: './paths/integrations/bedrock/count-tokens.yaml#/count-tokens' + # Batch API + /bedrock/model-invocation-job: + $ref: './paths/integrations/bedrock/batch.yaml#/batch-job-create' /bedrock/model-invocation-jobs: - $ref: './paths/integrations/bedrock/batch.yaml#/batch-jobs' - /bedrock/model-invocation-jobs/{jobIdentifier}: + $ref: './paths/integrations/bedrock/batch.yaml#/batch-jobs-list' + /bedrock/model-invocation-job/{job_arn}: $ref: './paths/integrations/bedrock/batch.yaml#/batch-job-by-id' - /bedrock/model-invocation-jobs/{jobIdentifier}/stop: + /bedrock/model-invocation-job/{job_arn}/stop: $ref: './paths/integrations/bedrock/batch.yaml#/batch-job-cancel' + # S3-compatible file storage (boto3 client compatible) + /bedrock/files/{bucket}: + $ref: './paths/integrations/bedrock/s3.yaml#/s3-bucket' + /bedrock/files/{bucket}/{key}: + $ref: './paths/integrations/bedrock/s3.yaml#/s3-object' + # ==================== Cohere Integration ==================== # Chat (v2) /cohere/v2/chat: @@ -413,10 +479,69 @@ paths: /cohere/v2/embed: $ref: './paths/integrations/cohere/embed.yaml#/embed' + # Rerank (v2) + /cohere/v2/rerank: + $ref: './paths/integrations/cohere/rerank.yaml#/rerank' + # Tokenize (v1) /cohere/v1/tokenize: $ref: './paths/integrations/cohere/tokenize.yaml#/tokenize' + # ==================== Cursor IDE Integration ==================== + # Hybrid chat completions (parses Responses payload, returns chat-completions shape) + /cursor/v1/chat/completions: + $ref: './paths/integrations/cursor/cursor.yaml#/chat-completions' + + # Models (OpenAI shape) + /cursor/v1/models: + $ref: './paths/integrations/cursor/cursor.yaml#/models' + + # Anthropic-compatible mounts + /cursor/v1/complete: + $ref: './paths/integrations/cursor/cursor.yaml#/anthropic-complete' + /cursor/v1/messages: + $ref: './paths/integrations/cursor/cursor.yaml#/anthropic-messages' + /cursor/v1/messages/{path}: + $ref: './paths/integrations/cursor/cursor.yaml#/anthropic-messages-wildcard' + /cursor/v1/messages/count_tokens: + $ref: './paths/integrations/cursor/cursor.yaml#/anthropic-count-tokens' + + # Gemini-compatible mounts + /cursor/v1beta/models: + $ref: './paths/integrations/cursor/cursor.yaml#/genai-models' + /cursor/v1beta/models/{model}: + $ref: './paths/integrations/cursor/cursor.yaml#/genai-model-action' + /cursor/v1beta/models/{model}/operations/{operation_id}: + $ref: './paths/integrations/cursor/cursor.yaml#/genai-video-operation' + + # Gemini Rank (Vertex) + /cursor/v1/rank: + $ref: './paths/integrations/cursor/cursor.yaml#/genai-rank' + + # Bedrock-compatible mounts + /cursor/model/{modelId}/converse: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-converse' + /cursor/model/{modelId}/converse-stream: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-converse-stream' + /cursor/model/{modelId}/invoke: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-invoke' + /cursor/model/{modelId}/invoke-with-response-stream: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-invoke-stream' + /cursor/rerank: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-rerank' + /cursor/model/{modelId}/count-tokens: + $ref: './paths/integrations/cursor/cursor.yaml#/bedrock-count-tokens' + + # Cohere-compatible mounts + /cursor/v2/chat: + $ref: './paths/integrations/cursor/cursor.yaml#/cohere-chat' + /cursor/v2/embed: + $ref: './paths/integrations/cursor/cursor.yaml#/cohere-embed' + /cursor/v2/rerank: + $ref: './paths/integrations/cursor/cursor.yaml#/cohere-rerank' + /cursor/v1/tokenize: + $ref: './paths/integrations/cursor/cursor.yaml#/cohere-tokenize' + # ==================== LiteLLM Integration ==================== # OpenAI-compatible /litellm/v1/completions: @@ -624,6 +749,8 @@ paths: # Plugins /api/plugins: $ref: './paths/management/plugins.yaml#/plugins' + /api/plugins/builtins: + $ref: './paths/management/plugins.yaml#/plugins-builtins' /api/plugins/{name}: $ref: './paths/management/plugins.yaml#/~1plugins~1{name}' @@ -730,6 +857,10 @@ paths: $ref: './paths/management/logging.yaml#/logs' /api/logs/{id}: $ref: './paths/management/logging.yaml#/logs-by-id' + /api/logs/sessions/{session_id}: + $ref: './paths/management/logging.yaml#/logs-sessions-by-id' + /api/logs/sessions/{session_id}/summary: + $ref: './paths/management/logging.yaml#/logs-sessions-summary-by-id' /api/logs/stats: $ref: './paths/management/logging.yaml#/logs-stats' /api/logs/histogram: @@ -748,10 +879,18 @@ paths: $ref: './paths/management/logging.yaml#/logs-histogram-tokens-by-provider' /api/logs/histogram/latency/by-provider: $ref: './paths/management/logging.yaml#/logs-histogram-latency-by-provider' + /api/logs/histogram/cost/by-dimension: + $ref: './paths/management/logging.yaml#/logs-histogram-cost-by-dimension' + /api/logs/histogram/tokens/by-dimension: + $ref: './paths/management/logging.yaml#/logs-histogram-tokens-by-dimension' + /api/logs/histogram/latency/by-dimension: + $ref: './paths/management/logging.yaml#/logs-histogram-latency-by-dimension' /api/logs/dropped: $ref: './paths/management/logging.yaml#/logs-dropped' /api/logs/filterdata: $ref: './paths/management/logging.yaml#/logs-filterdata' + /api/logs/rankings: + $ref: './paths/management/logging.yaml#/logs-rankings' /api/logs/recalculate-cost: $ref: './paths/management/logging.yaml#/logs-recalculate-cost' @@ -764,6 +903,12 @@ paths: $ref: './paths/management/logging.yaml#/mcp-logs-stats' /api/mcp-logs/filterdata: $ref: './paths/management/logging.yaml#/mcp-logs-filterdata' + /api/mcp-logs/histogram: + $ref: './paths/management/logging.yaml#/mcp-logs-histogram' + /api/mcp-logs/histogram/cost: + $ref: './paths/management/logging.yaml#/mcp-logs-histogram-cost' + /api/mcp-logs/histogram/top-tools: + $ref: './paths/management/logging.yaml#/mcp-logs-histogram-top-tools' # Prompt Repository /api/prompt-repo/folders: @@ -788,8 +933,8 @@ paths: $ref: './paths/management/prompts.yaml#/sessions-commit' # Cache - /api/cache/clear/{requestId}: - $ref: './paths/management/cache.yaml#/clear-by-request-id' + /api/cache/clear/{cacheId}: + $ref: './paths/management/cache.yaml#/clear-by-cache-id' /api/cache/clear-by-key/{cacheKey}: $ref: './paths/management/cache.yaml#/clear-by-cache-key' diff --git a/docs/openapi/paths/inference/realtime.yaml b/docs/openapi/paths/inference/realtime.yaml new file mode 100644 index 0000000000..2c734ad7aa --- /dev/null +++ b/docs/openapi/paths/inference/realtime.yaml @@ -0,0 +1,201 @@ +realtime: + get: + operationId: connectRealtime + summary: Realtime API WebSocket + description: | + Opens a bidirectional WebSocket session to a realtime-capable provider + (e.g. OpenAI Realtime, Azure Realtime preview). Bifrost proxies the upstream + socket and applies governance, observability, and key selection on connect. + + The target model is provided via the `model` query parameter (or `deployment` + for Azure-style routes). The OpenAI SDK sends the API key over the + `openai-insecure-api-key.` WebSocket subprotocol; Bifrost extracts it and + treats it the same as a Bearer header. + + Inference auth applies — Bearer/Basic/Virtual Key/API Key headers are all + accepted, plus the subprotocol form above. + tags: + - Realtime + parameters: + - name: model + in: query + required: false + description: | + Provider model identifier in `provider/model` form (e.g. `openai/gpt-4o-realtime-preview`). + Either `model` or `deployment` must be supplied; requests with neither (or both) are rejected with HTTP 400 before the WebSocket upgrade completes. + schema: + type: string + - name: deployment + in: query + required: false + description: Azure deployment name (for Azure-style routes). + schema: + type: string + - name: Upgrade + in: header + required: true + description: Must be `websocket` + schema: + type: string + enum: [websocket] + - name: Sec-WebSocket-Protocol + in: header + required: false + description: | + Optional WebSocket subprotocol. Use `openai-insecure-api-key.` to + authenticate when headers cannot be supplied by the client. + schema: + type: string + responses: + '101': + description: Switching Protocols — WebSocket upgrade succeeded + '400': + description: Invalid model/provider or unsupported realtime configuration + '401': + description: Authentication failed + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-calls: + post: + operationId: createRealtimeCall + summary: Realtime WebRTC SDP exchange + description: | + Negotiates a WebRTC peer connection with the realtime provider on behalf of + the client. Implements the OpenAI GA `/realtime/calls` contract: the request + body is `multipart/form-data` with `sdp` (client SDP offer) and `session` + (JSON session description containing `model`). + + Bifrost forwards the offer to the upstream provider, returns the upstream + SDP answer to the client, and pipes RTP media between the two peers for the + lifetime of the session. + + Inference auth applies (Bearer/Basic/Virtual Key/API Key). + tags: + - Realtime + requestBody: + required: true + content: + multipart/form-data: + schema: + type: object + required: + - sdp + - session + properties: + sdp: + type: string + description: Client-generated SDP offer + session: + type: string + description: | + JSON-encoded session descriptor. `session.model` is required and + must be in `provider/model` form. + application/sdp: + schema: + type: string + description: | + Legacy raw-SDP format (beta). When using this content type, supply + the model via the `?model=` query parameter instead of the session + JSON. Used by older OpenAI SDKs. + responses: + '200': + description: SDP answer from the upstream realtime provider + content: + application/sdp: + schema: + type: string + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '401': + description: Authentication failed + '502': + description: Upstream provider rejected the SDP exchange + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-client-secrets: + post: + operationId: createRealtimeClientSecret + summary: Mint a realtime ephemeral client secret + description: | + Calls the upstream realtime provider's `client_secrets` endpoint to mint a + short-lived ephemeral token (e.g. for browser-based WebRTC clients). + Bifrost selects a provider key, evaluates governance, and proxies the + response. The returned token is cached and mapped to the originating + virtual key for downstream attribution. + + Request body must be JSON. `session.model` (or top-level `model`) must use + `provider/model` form. + tags: + - Realtime + requestBody: + required: true + content: + application/json: + schema: + type: object + description: Provider-specific realtime client-secret payload (passthrough). + responses: + '200': + description: Upstream client-secret response (provider passthrough) + content: + application/json: + schema: + type: object + description: Provider-specific response body + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '401': + description: Authentication failed + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-sessions: + post: + operationId: createRealtimeSession + summary: Mint a realtime session (legacy alias) + description: | + Legacy alias for the realtime client-secret minting endpoint. Behaves + identically to `createRealtimeClientSecret` but uses the `sessions` route + shape; provided for compatibility with older OpenAI Realtime client + libraries. + tags: + - Realtime + requestBody: + required: true + content: + application/json: + schema: + type: object + description: Provider-specific realtime session payload (passthrough). + responses: + '200': + description: Upstream session response (provider passthrough) + content: + application/json: + schema: + type: object + description: Provider-specific response body + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '401': + description: Authentication failed + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/inference/responses.yaml b/docs/openapi/paths/inference/responses.yaml index 9333550e60..e481aa3e23 100644 --- a/docs/openapi/paths/inference/responses.yaml +++ b/docs/openapi/paths/inference/responses.yaml @@ -31,3 +31,50 @@ responses: - BasicAuth: [] - VirtualKeyAuth: [] - ApiKeyAuth: [] + + get: + operationId: createResponseWebSocket + summary: Responses API over WebSocket + description: | + Upgrades the connection to a WebSocket and runs the OpenAI Responses API + in WebSocket Mode. Clients send `response.create` events on the socket and + receive streamed events through the standard inference pipeline (PreLLMHook, + key selection, provider call, PostLLMHook). + + Auth is identical to the HTTP POST variant — Bearer/Basic/Virtual Key/API Key + may be supplied as request headers on the upgrade request. The OpenAI SDK can + also pass an API key via the `openai-insecure-api-key.` WebSocket + subprotocol. + + This GET endpoint shares its path with the POST inference endpoint; route + selection is based on the `Upgrade: websocket` request header. + tags: + - Responses + parameters: + - name: Upgrade + in: header + required: true + description: Must be `websocket` + schema: + type: string + enum: [websocket] + - name: Sec-WebSocket-Protocol + in: header + required: false + description: | + Optional WebSocket subprotocol. Use `openai-insecure-api-key.` to + authenticate when headers cannot be supplied by the client. + schema: + type: string + responses: + '101': + description: Switching Protocols — WebSocket upgrade succeeded + '401': + description: Authentication failed + '429': + description: Maximum concurrent WebSocket connections reached + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/bedrock/batch.yaml b/docs/openapi/paths/integrations/bedrock/batch.yaml index 20dea5a110..d69fb37b70 100644 --- a/docs/openapi/paths/integrations/bedrock/batch.yaml +++ b/docs/openapi/paths/integrations/bedrock/batch.yaml @@ -1,13 +1,22 @@ # AWS Bedrock - Batch Inference Endpoints -batch-jobs: +batch-job-create: post: operationId: bedrockCreateBatchJob summary: Create batch inference job (Bedrock format) description: | Creates a batch inference job using AWS Bedrock format. + Routes to native Bedrock by default; set `x-model-provider` to route the + job to another provider (`openai`, `gemini`, etc.). tags: - Bedrock Integration + parameters: + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override for cross-provider batch routing. requestBody: required: true content: @@ -38,6 +47,8 @@ batch-jobs: - BasicAuth: [] - VirtualKeyAuth: [] - ApiKeyAuth: [] + +batch-jobs-list: get: operationId: bedrockListBatchJobs summary: List batch inference jobs (Bedrock format) @@ -60,7 +71,7 @@ batch-jobs: in: query schema: type: string - enum: [Submitted, InProgress, Completed, Failed, Stopping, Stopped, PartiallyCompleted, Expired, Validating, + enum: [Submitted, InProgress, Completed, Failed, Stopping, Stopped, PartiallyCompleted, Expired, Validating, Scheduled] description: Filter by status - name: nameContains @@ -68,6 +79,12 @@ batch-jobs: schema: type: string description: Filter by job name containing this string + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override for cross-provider batch routing. responses: '200': description: Successful response @@ -87,12 +104,12 @@ batch-jobs: application/json: schema: $ref: '../../../schemas/integrations/bedrock/common.yaml#/BedrockError' - security: - BearerAuth: [] - BasicAuth: [] - VirtualKeyAuth: [] - ApiKeyAuth: [] + batch-job-by-id: get: operationId: bedrockRetrieveBatchJob @@ -102,12 +119,21 @@ batch-job-by-id: tags: - Bedrock Integration parameters: - - name: jobIdentifier + - name: job_arn in: path required: true schema: type: string - description: Job identifier + description: | + Bedrock job ARN (URL-encoded). For non-Bedrock providers, the + `arn:aws:bedrock:us-east-1:444444444444:batch:` prefix is stripped + automatically before routing. + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override for cross-provider batch routing. responses: '200': description: Successful response @@ -133,21 +159,31 @@ batch-job-by-id: - BasicAuth: [] - VirtualKeyAuth: [] - ApiKeyAuth: [] + batch-job-cancel: post: operationId: bedrockCancelBatchJob summary: Cancel batch inference job (Bedrock format) description: | - Cancels a batch inference job using AWS Bedrock format. + Stops a batch inference job using AWS Bedrock format. tags: - Bedrock Integration parameters: - - name: jobIdentifier + - name: job_arn in: path required: true schema: type: string - description: Job identifier to cancel + description: | + Bedrock job ARN to stop (URL-encoded). For non-Bedrock providers, the + `arn:aws:bedrock:us-east-1:444444444444:batch:` prefix is stripped + automatically before routing. + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override for cross-provider batch routing. responses: '200': description: Successful response diff --git a/docs/openapi/paths/integrations/bedrock/count-tokens.yaml b/docs/openapi/paths/integrations/bedrock/count-tokens.yaml new file mode 100644 index 0000000000..165f7ce87f --- /dev/null +++ b/docs/openapi/paths/integrations/bedrock/count-tokens.yaml @@ -0,0 +1,65 @@ +# AWS Bedrock - Count Tokens Endpoint + +count-tokens: + post: + operationId: bedrockCountTokens + summary: Count tokens (Bedrock format) + description: | + Counts tokens for a Converse-style request using AWS Bedrock format. + The request body must include `input.converse` with a complete Converse + payload; only Converse-shaped input is supported. + tags: + - Bedrock Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + description: Bedrock model identifier (e.g. `anthropic.claude-3-5-sonnet-20240620-v1:0`). + requestBody: + required: true + content: + application/json: + schema: + type: object + required: + - input + properties: + input: + type: object + required: + - converse + properties: + converse: + description: Converse-shaped request used to compute the token count. + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + properties: + inputTokens: + type: integer + description: Number of input tokens that would be billed for this request. + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/bedrock/common.yaml#/BedrockError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/bedrock/common.yaml#/BedrockError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/bedrock/s3.yaml b/docs/openapi/paths/integrations/bedrock/s3.yaml new file mode 100644 index 0000000000..9ed0d31d14 --- /dev/null +++ b/docs/openapi/paths/integrations/bedrock/s3.yaml @@ -0,0 +1,233 @@ +# AWS Bedrock - S3-Compatible File Storage +# +# These routes let boto3's S3 client (using `endpoint_url`) target Bifrost +# directly for file operations that back Bedrock batch jobs. +# `x-model-provider` (header) lets the same routes target other providers' +# file backends. + +s3-object: + put: + operationId: bedrockS3PutObject + summary: S3-compatible PutObject + description: | + Uploads an object to the Bifrost file store using S3 PutObject semantics. + The response is empty with an `ETag` header, mirroring native S3. + tags: + - Bedrock Integration + parameters: + - name: bucket + in: path + required: true + schema: + type: string + description: Bucket name. + - name: key + in: path + required: true + schema: + type: string + description: Object key. Supports `/`-separated nested keys (`{key:*}` wildcard). + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override; defaults to `bedrock`. + requestBody: + required: true + content: + application/octet-stream: + schema: + type: string + format: binary + description: Raw object bytes. + responses: + '200': + description: Object stored. Empty body with `ETag` response header. + '500': + description: Internal error returned as S3 XML. + content: + application/xml: + schema: + type: string + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + get: + operationId: bedrockS3GetObject + summary: S3-compatible GetObject + description: Retrieves raw object bytes from the Bifrost file store. + tags: + - Bedrock Integration + parameters: + - name: bucket + in: path + required: true + schema: + type: string + description: Bucket name. + - name: key + in: path + required: true + schema: + type: string + description: Object key (`{key:*}` wildcard). + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override; defaults to `bedrock`. + responses: + '200': + description: Raw object content. + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Object not found (S3 XML error). + content: + application/xml: + schema: + type: string + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + head: + operationId: bedrockS3HeadObject + summary: S3-compatible HeadObject + description: | + Returns S3 metadata headers (ETag, Content-Length, etc.) without a body. + tags: + - Bedrock Integration + parameters: + - name: bucket + in: path + required: true + schema: + type: string + description: Bucket name. + - name: key + in: path + required: true + schema: + type: string + description: Object key (`{key:*}` wildcard). + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override; defaults to `bedrock`. + responses: + '200': + description: Empty body with object metadata headers. + '404': + description: Object not found (empty body). + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + delete: + operationId: bedrockS3DeleteObject + summary: S3-compatible DeleteObject + description: Deletes an object from the Bifrost file store. + tags: + - Bedrock Integration + parameters: + - name: bucket + in: path + required: true + schema: + type: string + description: Bucket name. + - name: key + in: path + required: true + schema: + type: string + description: Object key (`{key:*}` wildcard). + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override; defaults to `bedrock`. + responses: + '200': + description: Object deleted. Empty body. + '500': + description: Internal error returned as S3 XML. + content: + application/xml: + schema: + type: string + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +s3-bucket: + get: + operationId: bedrockS3ListObjects + summary: S3-compatible ListObjectsV2 + description: | + Lists objects in a bucket using S3 ListObjectsV2 semantics. Supports + the `prefix` and `max-keys` query parameters used by boto3. + tags: + - Bedrock Integration + parameters: + - name: bucket + in: path + required: true + schema: + type: string + description: Bucket name. + - name: prefix + in: query + required: false + schema: + type: string + description: Filter keys by prefix. + - name: max-keys + in: query + required: false + schema: + type: integer + description: Maximum number of keys to return. + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override; defaults to `bedrock`. + responses: + '200': + description: ListObjectsV2 result (S3 XML or JSON, depending on provider). + content: + application/xml: + schema: + type: string + application/json: + schema: + type: object + additionalProperties: true + '500': + description: Internal error returned as S3 XML. + content: + application/xml: + schema: + type: string + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/cohere/rerank.yaml b/docs/openapi/paths/integrations/cohere/rerank.yaml new file mode 100644 index 0000000000..e664d7ddf5 --- /dev/null +++ b/docs/openapi/paths/integrations/cohere/rerank.yaml @@ -0,0 +1,86 @@ +# Cohere Integration - Rerank Endpoint + +rerank: + post: + operationId: cohereRerank + summary: Rerank documents (Cohere format) + description: | + Reranks a list of documents against a query using Cohere's v2 Rerank + API. The request body matches Cohere's native format. + tags: + - Cohere Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + required: + - model + - query + - documents + properties: + model: + type: string + description: Cohere rerank model identifier (e.g. `rerank-english-v3.0`). + query: + type: string + description: Query string to rank documents against. + documents: + type: array + description: Documents to rerank. Strings or objects with a `text` field. + items: + oneOf: + - type: string + - type: object + required: + - text + properties: + text: + type: string + additionalProperties: true + top_n: + type: integer + description: Return only the top N reranked results. + return_documents: + type: boolean + description: Include the original document text in the response. + max_tokens_per_doc: + type: integer + description: Truncate each document to this token limit before ranking. + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + properties: + id: + type: string + results: + type: array + items: + type: object + properties: + index: + type: integer + relevance_score: + type: number + document: + type: object + additionalProperties: true + meta: + type: object + additionalProperties: true + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/cursor/cursor.yaml b/docs/openapi/paths/integrations/cursor/cursor.yaml new file mode 100644 index 0000000000..5d199dfca1 --- /dev/null +++ b/docs/openapi/paths/integrations/cursor/cursor.yaml @@ -0,0 +1,634 @@ +# Cursor IDE Integration +# +# The /cursor/* prefix re-exposes OpenAI list-models, Anthropic, GenAI, +# Bedrock and Cohere routes for use by Cursor's hybrid client. The +# /cursor/v1/chat/completions endpoint is special: it parses requests in +# the OpenAI Responses API format (Cursor's hybrid payload) and converts +# responses back to chat-completions shape. + +chat-completions: + post: + operationId: cursorChatCompletions + summary: Cursor hybrid chat completions + description: | + Accepts Cursor's hybrid chat-completions payload (which is structurally + a Responses API request with `input` blocks) and returns a chat- + completions-shaped response (`choices` + `delta` chunks for streams). + Routes the request through the Responses pipeline internally. + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/openai/responses.yaml#/OpenAIResponsesRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/openai/chat.yaml#/OpenAIChatResponse' + text/event-stream: + schema: + $ref: '../../../schemas/integrations/openai/chat.yaml#/OpenAIChatStreamResponse' + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +models: + get: + operationId: cursorListModels + summary: List models (Cursor) + description: Lists available models, returning the OpenAI `/v1/models` shape. + tags: + - Cursor Integration + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +# ---- Anthropic-compatible (mounted under /cursor) ---- +anthropic-messages: + post: + operationId: cursorAnthropicMessages + summary: Anthropic messages (Cursor mount) + description: Cursor mount of `POST /anthropic/v1/messages`. Same request/response shape and streaming behaviour. + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicMessageRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicMessageResponse' + text/event-stream: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicStreamEvent' + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/common.yaml#/AnthropicError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/common.yaml#/AnthropicError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +anthropic-messages-wildcard: + post: + operationId: cursorAnthropicMessagesWildcard + summary: Anthropic messages — wildcard (Cursor mount) + description: | + Cursor mount of the Anthropic messages wildcard (`POST /anthropic/v1/messages/{path}`). + Routes extended Anthropic messages endpoints (e.g. batches, count tokens) + through Cursor. + tags: + - Cursor Integration + parameters: + - name: path + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicMessageRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicMessageResponse' + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/common.yaml#/AnthropicError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/common.yaml#/AnthropicError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +anthropic-complete: + post: + operationId: cursorAnthropicComplete + summary: Anthropic complete (Cursor mount) + description: Cursor mount of the legacy Anthropic `POST /v1/complete` endpoint. + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/text.yaml#/AnthropicTextRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/text.yaml#/AnthropicTextResponse' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +anthropic-count-tokens: + post: + operationId: cursorAnthropicCountTokens + summary: Anthropic count tokens (Cursor mount) + description: Cursor mount of `POST /anthropic/v1/messages/count_tokens`. + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/anthropic/messages.yaml#/AnthropicMessageRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +# ---- GenAI-compatible (mounted under /cursor) ---- +genai-model-action: + post: + operationId: cursorGeminiModelAction + summary: Gemini model action (Cursor mount) + description: | + Cursor mount of Gemini's wildcard generate-content/embed/count-tokens/ + predict endpoints. The `model` path parameter includes the action + suffix (e.g. `gemini-pro:generateContent`). + tags: + - Cursor Integration + parameters: + - name: model + in: path + required: true + schema: + type: string + description: Model name optionally followed by an action suffix. + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/generation.yaml#/GeminiGenerationRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/generation.yaml#/GeminiGenerationResponse' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +genai-models: + get: + operationId: cursorGeminiListModels + summary: Gemini list models (Cursor mount) + description: Cursor mount of `GET /genai/v1beta/models`. + tags: + - Cursor Integration + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +genai-video-operation: + get: + operationId: cursorGeminiRetrieveVideoOperation + summary: Gemini video operation polling (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: model + in: path + required: true + schema: + type: string + - name: operation_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +# ---- Vertex Rank (mounted under /cursor) ---- +genai-rank: + post: + operationId: cursorVertexRank + summary: Vertex rank (Cursor mount) + description: Cursor mount of `POST /genai/v1/rank` — Vertex AI ranking. + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +# ---- Bedrock-compatible (mounted under /cursor) ---- +bedrock-converse: + post: + operationId: cursorBedrockConverse + summary: Bedrock converse (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/bedrock/converse.yaml#/BedrockConverseRequest' + responses: + '200': + description: Successful response + content: + application/json: + schema: + $ref: '../../../schemas/integrations/bedrock/converse.yaml#/BedrockConverseResponse' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +bedrock-converse-stream: + post: + operationId: cursorBedrockConverseStream + summary: Bedrock converse stream (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + $ref: '../../../schemas/integrations/bedrock/converse.yaml#/BedrockConverseRequest' + responses: + '200': + description: AWS event stream of converse chunks. + content: + application/vnd.amazon.eventstream: + schema: + type: string + format: binary + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +bedrock-invoke: + post: + operationId: cursorBedrockInvoke + summary: Bedrock invoke (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Bedrock invoke response (raw model output). + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +bedrock-invoke-stream: + post: + operationId: cursorBedrockInvokeStream + summary: Bedrock invoke-with-response-stream (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: AWS event stream of invoke chunks. + content: + application/vnd.amazon.eventstream: + schema: + type: string + format: binary + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +bedrock-rerank: + post: + operationId: cursorBedrockRerank + summary: Bedrock rerank (Cursor mount) + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +bedrock-count-tokens: + post: + operationId: cursorBedrockCountTokens + summary: Bedrock count tokens (Cursor mount) + tags: + - Cursor Integration + parameters: + - name: modelId + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + properties: + input: + type: object + properties: + converse: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + properties: + inputTokens: + type: integer + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +# ---- Cohere-compatible (mounted under /cursor) ---- +cohere-chat: + post: + operationId: cursorCohereChat + summary: Cohere chat (Cursor mount) + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +cohere-embed: + post: + operationId: cursorCohereEmbed + summary: Cohere embed (Cursor mount) + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +cohere-rerank: + post: + operationId: cursorCohereRerank + summary: Cohere rerank (Cursor mount) + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +cohere-tokenize: + post: + operationId: cursorCohereTokenize + summary: Cohere tokenize (Cursor mount) + tags: + - Cursor Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/genai/batches.yaml b/docs/openapi/paths/integrations/genai/batches.yaml new file mode 100644 index 0000000000..e29abbdaee --- /dev/null +++ b/docs/openapi/paths/integrations/genai/batches.yaml @@ -0,0 +1,164 @@ +# Google GenAI (Gemini) - Batches API +# +# Note: Gemini uses `:action` URL conventions. `POST /v1beta/batches/{batch_id}` +# is the cancel operation (Gemini sends the request as `/v1beta/batches/{name}:cancel` +# and the router matches the trailing segment via the wildcard parameter). + +batches: + get: + operationId: geminiListBatches + summary: List batch jobs (Gemini format) + description: Lists batch jobs in Gemini format. Supports `pageSize` / `pageToken` pagination. + tags: + - GenAI Integration + parameters: + - name: pageSize + in: query + required: false + schema: + type: integer + - name: pageToken + in: query + required: false + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +batches-by-id: + get: + operationId: geminiRetrieveBatch + summary: Retrieve a batch job (Gemini format) + tags: + - GenAI Integration + parameters: + - name: batch_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Batch not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + post: + operationId: geminiCancelBatch + summary: Cancel a batch job (Gemini format) + description: | + Cancels a batch job. Gemini conventionally sends the request as + `/v1beta/batches/{batch_id}:cancel`; the router matches the `:cancel` + suffix into the `batch_id` path parameter. + tags: + - GenAI Integration + parameters: + - name: batch_id + in: path + required: true + schema: + type: string + description: Batch identifier, optionally with a `:cancel` action suffix. + responses: + '200': + description: Cancellation accepted. + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Batch not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + delete: + operationId: geminiDeleteBatch + summary: Delete a batch job (Gemini format) + tags: + - GenAI Integration + parameters: + - name: batch_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Successful response (empty object on Gemini). + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Batch not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/genai/cached-contents.yaml b/docs/openapi/paths/integrations/genai/cached-contents.yaml new file mode 100644 index 0000000000..8ad54c6da8 --- /dev/null +++ b/docs/openapi/paths/integrations/genai/cached-contents.yaml @@ -0,0 +1,238 @@ +# Google GenAI (Gemini) - Cached Contents API + +cached-contents: + post: + operationId: geminiCreateCachedContent + summary: Create cached content (Gemini format) + description: | + Creates a cached content entry that can be re-used across subsequent + generate-content calls to reduce repeated prefix tokens. + tags: + - GenAI Integration + parameters: + - name: x-model-provider + in: header + required: false + schema: + type: string + description: Optional provider override. Defaults to `gemini`. Use `vertex` to route to Vertex. + requestBody: + required: true + content: + application/json: + schema: + type: object + required: + - model + properties: + model: + type: string + description: Gemini model identifier (e.g. `gemini-1.5-pro`). The `models/` prefix is stripped. + displayName: + type: string + systemInstruction: + type: object + description: System instruction content. + additionalProperties: true + contents: + type: array + items: {} + tools: + type: array + items: {} + toolConfig: {} + ttl: + type: string + description: TTL in protobuf duration format (e.g. `3600s`). + expireTime: + type: string + format: date-time + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + get: + operationId: geminiListCachedContents + summary: List cached content entries (Gemini format) + tags: + - GenAI Integration + parameters: + - name: pageSize + in: query + required: false + schema: + type: integer + - name: pageToken + in: query + required: false + schema: + type: string + - name: x-model-provider + in: header + required: false + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +cached-contents-by-id: + get: + operationId: geminiRetrieveCachedContent + summary: Retrieve cached content (Gemini format) + tags: + - GenAI Integration + parameters: + - name: cached_id + in: path + required: true + schema: + type: string + - name: x-model-provider + in: header + required: false + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Cached content not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + patch: + operationId: geminiUpdateCachedContent + summary: Update cached content (Gemini format) + description: | + Updates the TTL or expiration time of a cached content entry. Only + `ttl` or `expireTime` may be modified. + tags: + - GenAI Integration + parameters: + - name: cached_id + in: path + required: true + schema: + type: string + - name: x-model-provider + in: header + required: false + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + properties: + ttl: + type: string + description: TTL in protobuf duration format. + expireTime: + type: string + format: date-time + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + delete: + operationId: geminiDeleteCachedContent + summary: Delete cached content (Gemini format) + tags: + - GenAI Integration + parameters: + - name: cached_id + in: path + required: true + schema: + type: string + - name: x-model-provider + in: header + required: false + schema: + type: string + responses: + '200': + description: Deletion accepted. Returns an empty JSON object (`{}`), mirroring Gemini's native delete response. + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Cached content not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/genai/rank.yaml b/docs/openapi/paths/integrations/genai/rank.yaml new file mode 100644 index 0000000000..91624a86a0 --- /dev/null +++ b/docs/openapi/paths/integrations/genai/rank.yaml @@ -0,0 +1,65 @@ +# Google GenAI / Vertex - Rerank Endpoint + +rank: + post: + operationId: vertexRank + summary: Rerank documents (Vertex Rank) + description: | + Reranks records using Google Vertex AI's Ranking API. The request body + follows the Vertex `rankRecords` schema. + tags: + - GenAI Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + properties: + model: + type: string + description: Vertex ranking model identifier. + query: + type: string + records: + type: array + items: + type: object + properties: + id: + type: string + title: + type: string + content: + type: string + additionalProperties: true + topN: + type: integer + ignoreRecordDetailsInResponse: + type: boolean + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + description: Bad request + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/genai/video-operations.yaml b/docs/openapi/paths/integrations/genai/video-operations.yaml new file mode 100644 index 0000000000..28fa102431 --- /dev/null +++ b/docs/openapi/paths/integrations/genai/video-operations.yaml @@ -0,0 +1,53 @@ +# Google GenAI (Gemini) - Long-running Video Operation Polling + +video-operations: + get: + operationId: geminiRetrieveVideoOperation + summary: Retrieve video generation operation (Gemini format) + description: | + Polls the status of a long-running video generation operation produced + by `models/{model}:generateVideos`. The Gemini SDK appends the + operation name as a wildcard path segment. If the operation name + contains `/`, it must be percent-encoded in the request path (for + example, `%2F`) to remain conformant with OpenAPI 3.x path-parameter + semantics. + tags: + - GenAI Integration + parameters: + - name: model + in: path + required: true + schema: + type: string + description: Video model identifier (e.g. `veo-3.1-generate-preview`). + - name: operation_id + in: path + required: true + schema: + type: string + description: Operation name. If it contains `/`, it must be percent-encoded in the request path (for example, `%2F`). + responses: + '200': + description: Operation status payload (Gemini video generation response). + content: + application/json: + schema: + type: object + additionalProperties: true + '404': + description: Operation not found + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + '500': + description: Internal server error + content: + application/json: + schema: + $ref: '../../../schemas/integrations/genai/common.yaml#/GeminiError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/openai/realtime.yaml b/docs/openapi/paths/integrations/openai/realtime.yaml new file mode 100644 index 0000000000..f8b5444d5c --- /dev/null +++ b/docs/openapi/paths/integrations/openai/realtime.yaml @@ -0,0 +1,164 @@ +# OpenAI Integration - Realtime/WebSocket Aliases +# +# These operations are OpenAI-prefixed aliases for the canonical realtime +# endpoints documented under the Realtime tag. The handler behaviour is +# identical; SDK clients targeting `/openai/v1/*` reach the same handlers. + +realtime: + get: + operationId: openaiRealtimeWebSocket + summary: WebSocket Realtime (OpenAI alias) + description: | + OpenAI-prefixed alias of `GET /v1/realtime`. WebSocket upgrade endpoint + for OpenAI Realtime; selects the model via the `model` query parameter + (Azure GA) or `deployment` (preview). + tags: + - OpenAI Integration + parameters: + - name: model + in: query + required: false + schema: + type: string + - name: deployment + in: query + required: false + schema: + type: string + - name: Upgrade + in: header + required: true + schema: + type: string + enum: [websocket] + - name: Sec-WebSocket-Protocol + in: header + required: false + schema: + type: string + responses: + '101': + description: WebSocket connection established. + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '401': + description: Unauthorized. + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-calls: + post: + operationId: openaiRealtimeCall + summary: WebRTC Realtime SDP exchange (OpenAI alias) + description: | + OpenAI-prefixed alias of `POST /v1/realtime/calls`. Performs the WebRTC + SDP exchange for OpenAI Realtime calls. Accepts either multipart form + data with `sdp` + `session` parts (GA) or a raw SDP body (legacy). + tags: + - OpenAI Integration + requestBody: + required: true + content: + multipart/form-data: + schema: + type: object + properties: + sdp: + type: string + description: WebRTC offer SDP. + session: + type: string + description: JSON-encoded session configuration. + application/sdp: + schema: + type: string + description: Raw WebRTC offer SDP (legacy form). + responses: + '200': + description: WebRTC answer SDP. + content: + application/sdp: + schema: + type: string + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-client-secrets: + post: + operationId: openaiRealtimeClientSecret + summary: Create realtime client secret (OpenAI alias) + description: | + OpenAI-prefixed alias of `POST /v1/realtime/client_secrets`. Mints an + ephemeral client secret used to authorize a downstream Realtime client. + tags: + - OpenAI Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + description: OpenAI client-secret request payload. + responses: + '200': + description: Ephemeral client secret payload. + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +realtime-sessions: + post: + operationId: openaiRealtimeSession + summary: Create realtime session (OpenAI alias) + description: | + OpenAI-prefixed alias of `POST /v1/realtime/sessions`. Creates a + pre-configured realtime session that can be joined by clients. + tags: + - OpenAI Integration + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + description: OpenAI realtime session configuration. + responses: + '200': + description: Realtime session payload. + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/integrations/openai/responses.yaml b/docs/openapi/paths/integrations/openai/responses.yaml index c017abbbc9..f685e386ab 100644 --- a/docs/openapi/paths/integrations/openai/responses.yaml +++ b/docs/openapi/paths/integrations/openai/responses.yaml @@ -62,6 +62,44 @@ responses: - BasicAuth: [] - VirtualKeyAuth: [] - ApiKeyAuth: [] + get: + operationId: openaiResponsesWebSocket + summary: WebSocket Responses (OpenAI alias) + description: | + WebSocket upgrade endpoint for the Responses API. Mirrors the canonical + `GET /v1/responses` WS endpoint; the OpenAI-prefixed path is selected + when the request includes an `Upgrade: websocket` header. + Authentication accepts the same headers as the inference HTTP surface + plus the `openai-insecure-api-key.` subprotocol fallback. + tags: + - OpenAI Integration + parameters: + - name: Upgrade + in: header + required: true + schema: + type: string + enum: [websocket] + - name: Sec-WebSocket-Protocol + in: header + required: false + schema: + type: string + description: | + Optional subprotocol; supports `openai-insecure-api-key.` for + SDK-style auth. + responses: + '101': + description: WebSocket connection established. + '401': + description: Unauthorized. + '429': + description: Maximum concurrent WebSocket connections reached. + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] azure-responses: post: operationId: azureCreateResponse diff --git a/docs/openapi/paths/integrations/openai/videos.yaml b/docs/openapi/paths/integrations/openai/videos.yaml new file mode 100644 index 0000000000..5f05313af2 --- /dev/null +++ b/docs/openapi/paths/integrations/openai/videos.yaml @@ -0,0 +1,240 @@ +# OpenAI - Videos API +# +# Path aliases (registered for each operation): +# /v1/videos*, /videos*, /openai/videos* +# Only the canonical /openai/v1/videos* surface is documented here; the other +# prefixes are router conveniences that share the same handler. + +videos: + post: + operationId: openaiCreateVideo + summary: Create a video generation + description: | + Submits a video generation job using OpenAI's Videos API. + The request is multipart/form-data and may include reference images. + tags: + - OpenAI Integration + requestBody: + required: true + content: + multipart/form-data: + schema: + type: object + required: + - prompt + - model + properties: + model: + type: string + description: Video model identifier (e.g. `sora-1.0`). + prompt: + type: string + description: Natural-language description of the requested video. + size: + type: string + description: Output resolution (e.g. `1024x1024`). + seconds: + type: integer + description: Target duration in seconds. + input_reference: + type: string + format: binary + description: Optional reference image used to seed generation. + additionalProperties: true + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + description: OpenAI video generation job payload. + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + get: + operationId: openaiListVideos + summary: List video generations + description: Lists previously submitted video generation jobs. + tags: + - OpenAI Integration + parameters: + - name: limit + in: query + required: false + schema: + type: integer + - name: after + in: query + required: false + schema: + type: string + - name: order + in: query + required: false + schema: + type: string + enum: [asc, desc] + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + description: OpenAI video list payload. + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +videos-by-id: + get: + operationId: openaiRetrieveVideo + summary: Retrieve a video generation + description: Returns metadata for a previously submitted video generation job. + tags: + - OpenAI Integration + parameters: + - name: video_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '404': + description: Video not found + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + delete: + operationId: openaiDeleteVideo + summary: Delete a video generation + description: Deletes a previously generated video. + tags: + - OpenAI Integration + parameters: + - name: video_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '404': + description: Video not found + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +videos-content: + get: + operationId: openaiDownloadVideo + summary: Download generated video content + description: Streams the binary video bytes for a completed generation job. + tags: + - OpenAI Integration + parameters: + - name: video_id + in: path + required: true + schema: + type: string + responses: + '200': + description: Raw video bytes + content: + application/octet-stream: + schema: + type: string + format: binary + video/mp4: + schema: + type: string + format: binary + '404': + description: Video not found + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] + +videos-remix: + post: + operationId: openaiRemixVideo + summary: Remix an existing video + description: Creates a new generation by remixing a prior video with new prompt parameters. + tags: + - OpenAI Integration + parameters: + - name: video_id + in: path + required: true + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + description: OpenAI video-remix request payload. + responses: + '200': + description: Successful response + content: + application/json: + schema: + type: object + additionalProperties: true + '400': + $ref: '../../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../../openapi.yaml#/components/responses/InternalError' + security: + - BearerAuth: [] + - BasicAuth: [] + - VirtualKeyAuth: [] + - ApiKeyAuth: [] diff --git a/docs/openapi/paths/management/cache.yaml b/docs/openapi/paths/management/cache.yaml index 7c570acebf..29c9d5609d 100644 --- a/docs/openapi/paths/management/cache.yaml +++ b/docs/openapi/paths/management/cache.yaml @@ -1,15 +1,18 @@ -clear-by-request-id: +clear-by-cache-id: delete: - operationId: clearCacheByRequestId - summary: Clear cache by request ID - description: Clears cache entries associated with a specific request ID. + operationId: clearCacheByCacheId + summary: Clear cache entry by cache ID + description: | + Deletes a single cache entry by its storage ID. Read the cache ID from + `extra_fields.cache_debug.cache_id` on a prior response — it is populated + on both cache hits and cache misses. tags: - Cache parameters: - - name: requestId + - name: cacheId in: path required: true - description: Request ID to clear cache for + description: Storage ID of the cache entry to delete schema: type: string responses: diff --git a/docs/openapi/paths/management/logging.yaml b/docs/openapi/paths/management/logging.yaml index ea9f07fe02..bdec15bf33 100644 --- a/docs/openapi/paths/management/logging.yaml +++ b/docs/openapi/paths/management/logging.yaml @@ -1023,3 +1023,473 @@ mcp-logs-filterdata: $ref: '../../schemas/management/logging.yaml#/MCPLogsFilterDataResponse' '500': $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-sessions-by-id: + get: + operationId: getLogSessionById + summary: Get logs for a session + description: | + Returns the paginated logs belonging to a single parent-request session + (grouped by `parent_request_id`). Sorted ascending by timestamp by default. + tags: + - Logging + parameters: + - name: session_id + in: path + required: true + description: Parent request ID identifying the session + schema: + type: string + - name: limit + in: query + description: Number of logs to return (default 200, max 200) + schema: + type: integer + default: 200 + maximum: 200 + - name: offset + in: query + description: Number of logs to skip + schema: + type: integer + default: 0 + - name: order + in: query + description: Sort order (default `asc`) + schema: + type: string + enum: [asc, desc] + default: asc + responses: + '200': + description: Session logs retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/SessionDetailResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-sessions-summary-by-id: + get: + operationId: getLogSessionSummaryById + summary: Get aggregate totals for a session + description: Returns aggregate request count, token usage, cost, and duration for a single session. + tags: + - Logging + parameters: + - name: session_id + in: path + required: true + description: Parent request ID identifying the session + schema: + type: string + responses: + '200': + description: Session summary retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/SessionSummaryResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-rankings: + get: + operationId: getModelRankings + summary: Get model usage rankings + description: | + Returns models ranked by usage with trend percentages versus the previous + comparable period. Accepts the same filter parameters as the histogram + endpoints. + tags: + - Logging + parameters: + - $ref: '#/_histogram-parameters/providers' + - $ref: '#/_histogram-parameters/models' + - $ref: '#/_histogram-parameters/status' + - $ref: '#/_histogram-parameters/objects' + - $ref: '#/_histogram-parameters/selected_key_ids' + - $ref: '#/_histogram-parameters/virtual_key_ids' + - $ref: '#/_histogram-parameters/routing_rule_ids' + - $ref: '#/_histogram-parameters/routing_engine_used' + - $ref: '#/_histogram-parameters/start_time' + - $ref: '#/_histogram-parameters/end_time' + - $ref: '#/_histogram-parameters/min_latency' + - $ref: '#/_histogram-parameters/max_latency' + - $ref: '#/_histogram-parameters/min_tokens' + - $ref: '#/_histogram-parameters/max_tokens' + - $ref: '#/_histogram-parameters/min_cost' + - $ref: '#/_histogram-parameters/max_cost' + - $ref: '#/_histogram-parameters/missing_cost_only' + - $ref: '#/_histogram-parameters/content_search' + responses: + '200': + description: Model rankings retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/ModelRankingResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-histogram-cost-by-dimension: + get: + operationId: getLogsDimensionCostHistogram + summary: Get cost histogram by dimension + description: | + Returns time-bucketed cost data grouped by an arbitrary dimension + (`provider`, `team_id`, `customer_id`, `user_id`, `business_unit_id`). + The dimension is supplied via the required `dimension` query parameter. + tags: + - Logging + parameters: + - name: dimension + in: query + required: true + description: Grouping dimension + schema: + $ref: '../../schemas/management/logging.yaml#/HistogramDimension' + - $ref: '#/_histogram-parameters/providers' + - $ref: '#/_histogram-parameters/models' + - $ref: '#/_histogram-parameters/status' + - $ref: '#/_histogram-parameters/objects' + - $ref: '#/_histogram-parameters/selected_key_ids' + - $ref: '#/_histogram-parameters/virtual_key_ids' + - $ref: '#/_histogram-parameters/routing_rule_ids' + - $ref: '#/_histogram-parameters/routing_engine_used' + - $ref: '#/_histogram-parameters/start_time' + - $ref: '#/_histogram-parameters/end_time' + - $ref: '#/_histogram-parameters/min_latency' + - $ref: '#/_histogram-parameters/max_latency' + - $ref: '#/_histogram-parameters/min_tokens' + - $ref: '#/_histogram-parameters/max_tokens' + - $ref: '#/_histogram-parameters/min_cost' + - $ref: '#/_histogram-parameters/max_cost' + - $ref: '#/_histogram-parameters/missing_cost_only' + - $ref: '#/_histogram-parameters/content_search' + responses: + '200': + description: Dimension cost histogram retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/DimensionCostHistogramResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-histogram-tokens-by-dimension: + get: + operationId: getLogsDimensionTokenHistogram + summary: Get token histogram by dimension + description: | + Returns time-bucketed token usage grouped by an arbitrary dimension. + See `getLogsDimensionCostHistogram` for the list of supported dimensions. + tags: + - Logging + parameters: + - name: dimension + in: query + required: true + description: Grouping dimension + schema: + $ref: '../../schemas/management/logging.yaml#/HistogramDimension' + - $ref: '#/_histogram-parameters/providers' + - $ref: '#/_histogram-parameters/models' + - $ref: '#/_histogram-parameters/status' + - $ref: '#/_histogram-parameters/objects' + - $ref: '#/_histogram-parameters/selected_key_ids' + - $ref: '#/_histogram-parameters/virtual_key_ids' + - $ref: '#/_histogram-parameters/routing_rule_ids' + - $ref: '#/_histogram-parameters/routing_engine_used' + - $ref: '#/_histogram-parameters/start_time' + - $ref: '#/_histogram-parameters/end_time' + - $ref: '#/_histogram-parameters/min_latency' + - $ref: '#/_histogram-parameters/max_latency' + - $ref: '#/_histogram-parameters/min_tokens' + - $ref: '#/_histogram-parameters/max_tokens' + - $ref: '#/_histogram-parameters/min_cost' + - $ref: '#/_histogram-parameters/max_cost' + - $ref: '#/_histogram-parameters/missing_cost_only' + - $ref: '#/_histogram-parameters/content_search' + responses: + '200': + description: Dimension token histogram retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/DimensionTokenHistogramResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +logs-histogram-latency-by-dimension: + get: + operationId: getLogsDimensionLatencyHistogram + summary: Get latency histogram by dimension + description: | + Returns time-bucketed latency percentiles (avg, p90, p95, p99) grouped by + an arbitrary dimension. + tags: + - Logging + parameters: + - name: dimension + in: query + required: true + description: Grouping dimension + schema: + $ref: '../../schemas/management/logging.yaml#/HistogramDimension' + - $ref: '#/_histogram-parameters/providers' + - $ref: '#/_histogram-parameters/models' + - $ref: '#/_histogram-parameters/status' + - $ref: '#/_histogram-parameters/objects' + - $ref: '#/_histogram-parameters/selected_key_ids' + - $ref: '#/_histogram-parameters/virtual_key_ids' + - $ref: '#/_histogram-parameters/routing_rule_ids' + - $ref: '#/_histogram-parameters/routing_engine_used' + - $ref: '#/_histogram-parameters/start_time' + - $ref: '#/_histogram-parameters/end_time' + - $ref: '#/_histogram-parameters/min_latency' + - $ref: '#/_histogram-parameters/max_latency' + - $ref: '#/_histogram-parameters/min_tokens' + - $ref: '#/_histogram-parameters/max_tokens' + - $ref: '#/_histogram-parameters/min_cost' + - $ref: '#/_histogram-parameters/max_cost' + - $ref: '#/_histogram-parameters/missing_cost_only' + - $ref: '#/_histogram-parameters/content_search' + responses: + '200': + description: Dimension latency histogram retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/DimensionLatencyHistogramResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +mcp-logs-histogram: + get: + operationId: getMCPLogsHistogram + summary: Get MCP tool call volume histogram + description: Returns time-bucketed MCP tool call volume with success/error breakdown. + tags: + - Logging + parameters: + - name: tool_names + in: query + description: Comma-separated list of tool names to filter by + schema: + type: string + - name: server_labels + in: query + description: Comma-separated list of server labels to filter by + schema: + type: string + - name: status + in: query + description: Comma-separated list of statuses to filter by (processing, success, error) + schema: + type: string + - name: virtual_key_ids + in: query + description: Comma-separated list of virtual key IDs to filter by + schema: + type: string + - name: llm_request_ids + in: query + description: Comma-separated list of LLM request IDs to filter by + schema: + type: string + - name: start_time + in: query + description: Start time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: end_time + in: query + description: End time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: min_latency + in: query + description: Minimum latency filter (milliseconds) + schema: + type: number + - name: max_latency + in: query + description: Maximum latency filter (milliseconds) + schema: + type: number + - name: content_search + in: query + description: Search in tool arguments and results + schema: + type: string + responses: + '200': + description: MCP histogram retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/MCPHistogramResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +mcp-logs-histogram-cost: + get: + operationId: getMCPLogsCostHistogram + summary: Get MCP cost histogram + description: Returns time-bucketed MCP tool call cost data. + tags: + - Logging + parameters: + - name: tool_names + in: query + description: Comma-separated list of tool names to filter by + schema: + type: string + - name: server_labels + in: query + description: Comma-separated list of server labels to filter by + schema: + type: string + - name: status + in: query + description: Comma-separated list of statuses to filter by (processing, success, error) + schema: + type: string + - name: virtual_key_ids + in: query + description: Comma-separated list of virtual key IDs to filter by + schema: + type: string + - name: llm_request_ids + in: query + description: Comma-separated list of LLM request IDs to filter by + schema: + type: string + - name: start_time + in: query + description: Start time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: end_time + in: query + description: End time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: min_latency + in: query + description: Minimum latency filter (milliseconds) + schema: + type: number + - name: max_latency + in: query + description: Maximum latency filter (milliseconds) + schema: + type: number + - name: content_search + in: query + description: Search in tool arguments and results + schema: + type: string + responses: + '200': + description: MCP cost histogram retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/MCPCostHistogramResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + +mcp-logs-histogram-top-tools: + get: + operationId: getMCPLogsTopTools + summary: Get top MCP tools by call count + description: Returns the top 10 MCP tools by call count, with cost totals. + tags: + - Logging + parameters: + - name: tool_names + in: query + description: Comma-separated list of tool names to filter by + schema: + type: string + - name: server_labels + in: query + description: Comma-separated list of server labels to filter by + schema: + type: string + - name: status + in: query + description: Comma-separated list of statuses to filter by (processing, success, error) + schema: + type: string + - name: virtual_key_ids + in: query + description: Comma-separated list of virtual key IDs to filter by + schema: + type: string + - name: llm_request_ids + in: query + description: Comma-separated list of LLM request IDs to filter by + schema: + type: string + - name: start_time + in: query + description: Start time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: end_time + in: query + description: End time filter (RFC3339 format) + schema: + type: string + format: date-time + - name: min_latency + in: query + description: Minimum latency filter (milliseconds) + schema: + type: number + - name: max_latency + in: query + description: Maximum latency filter (milliseconds) + schema: + type: number + - name: content_search + in: query + description: Search in tool arguments and results + schema: + type: string + responses: + '200': + description: MCP top tools retrieved successfully + content: + application/json: + schema: + $ref: '../../schemas/management/logging.yaml#/MCPTopToolsResult' + '400': + $ref: '../../openapi.yaml#/components/responses/BadRequest' + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' diff --git a/docs/openapi/paths/management/plugins.yaml b/docs/openapi/paths/management/plugins.yaml index 510aecc1bc..fd9c76e93a 100644 --- a/docs/openapi/paths/management/plugins.yaml +++ b/docs/openapi/paths/management/plugins.yaml @@ -1,3 +1,28 @@ +plugins-builtins: + get: + operationId: listBuiltinPlugins + summary: List built-in plugin names + description: | + Returns the canonical list of built-in plugin names available in this Bifrost build. + Use this to discover which plugins can be enabled without supplying a custom binary. + tags: + - Plugins + responses: + '200': + description: Built-in plugin names retrieved successfully + content: + application/json: + schema: + type: object + properties: + plugins: + type: array + description: Canonical names of built-in plugins + items: + type: string + '500': + $ref: '../../openapi.yaml#/components/responses/InternalError' + plugins: get: operationId: listPlugins diff --git a/docs/openapi/schemas/management/logging.yaml b/docs/openapi/schemas/management/logging.yaml index 1084b1a501..2e9001516a 100644 --- a/docs/openapi/schemas/management/logging.yaml +++ b/docs/openapi/schemas/management/logging.yaml @@ -827,3 +827,325 @@ ProviderLatencyHistogramResult: type: array items: type: string + +HistogramDimension: + type: string + description: | + Grouping dimension for dimension-aware histograms. Used by the + `/api/logs/histogram/{cost,tokens,latency}/by-dimension` endpoints. + enum: + - provider + - team_id + - customer_id + - user_id + - business_unit_id + +DimensionCostHistogramBucket: + type: object + description: Time-bucketed cost data grouped by an arbitrary dimension + properties: + timestamp: + type: string + format: date-time + total_cost: + type: number + by_dimension: + type: object + additionalProperties: + type: number + description: Cost breakdown keyed by the dimension value + +DimensionCostHistogramResult: + type: object + description: Dimension-grouped cost histogram result + properties: + buckets: + type: array + items: + $ref: '#/DimensionCostHistogramBucket' + bucket_size_seconds: + type: integer + format: int64 + dimension: + $ref: '#/HistogramDimension' + dimension_values: + type: array + items: + type: string + +DimensionTokenStats: + type: object + description: Token statistics for a single dimension value + properties: + prompt_tokens: + type: integer + format: int64 + completion_tokens: + type: integer + format: int64 + total_tokens: + type: integer + format: int64 + +DimensionTokenHistogramBucket: + type: object + description: Time-bucketed token usage grouped by an arbitrary dimension + properties: + timestamp: + type: string + format: date-time + by_dimension: + type: object + additionalProperties: + $ref: '#/DimensionTokenStats' + +DimensionTokenHistogramResult: + type: object + description: Dimension-grouped token histogram result + properties: + buckets: + type: array + items: + $ref: '#/DimensionTokenHistogramBucket' + bucket_size_seconds: + type: integer + format: int64 + dimension: + $ref: '#/HistogramDimension' + dimension_values: + type: array + items: + type: string + +DimensionLatencyStats: + type: object + description: Latency statistics for a single dimension value + properties: + avg_latency: + type: number + p90_latency: + type: number + p95_latency: + type: number + p99_latency: + type: number + total_requests: + type: integer + format: int64 + +DimensionLatencyHistogramBucket: + type: object + description: Time-bucketed latency data grouped by an arbitrary dimension + properties: + timestamp: + type: string + format: date-time + by_dimension: + type: object + additionalProperties: + $ref: '#/DimensionLatencyStats' + +DimensionLatencyHistogramResult: + type: object + description: Dimension-grouped latency histogram result + properties: + buckets: + type: array + items: + $ref: '#/DimensionLatencyHistogramBucket' + bucket_size_seconds: + type: integer + format: int64 + dimension: + $ref: '#/HistogramDimension' + dimension_values: + type: array + items: + type: string + +SessionDetailResult: + type: object + description: Paginated logs for a single parent-request session + properties: + session_id: + type: string + logs: + type: array + items: + $ref: '#/LogEntry' + pagination: + type: object + description: Pagination metadata. Session logs are always sorted by timestamp; the `sort_by` field is not configurable. + properties: + limit: + type: integer + offset: + type: integer + order: + type: string + enum: [asc, desc] + total_count: + type: integer + format: int64 + description: Total number of items matching the query + count: + type: integer + format: int64 + description: Total log count for this session + returned_count: + type: integer + description: Number of logs returned in this response + has_more: + type: boolean + +SessionSummaryResult: + type: object + description: Aggregate totals for a single parent-request session + properties: + session_id: + type: string + count: + type: integer + format: int64 + total_cost: + type: number + total_tokens: + type: integer + format: int64 + started_at: + type: string + description: Timestamp of the first log in the session (RFC3339 nanos) + latest_at: + type: string + description: Timestamp of the most recent log in the session (RFC3339 nanos) + duration_ms: + type: integer + format: int64 + +ModelRankingEntry: + type: object + description: Aggregated stats for a single model over the query period + properties: + model: + type: string + provider: + type: string + total_requests: + type: integer + format: int64 + success_count: + type: integer + format: int64 + success_rate: + type: number + total_tokens: + type: integer + format: int64 + total_cost: + type: number + avg_latency: + type: number + +ModelRankingTrend: + type: object + description: Percentage change versus the previous comparable period + properties: + has_previous_period: + type: boolean + requests_trend: + type: number + tokens_trend: + type: number + cost_trend: + type: number + latency_trend: + type: number + +ModelRankingWithTrend: + allOf: + - $ref: '#/ModelRankingEntry' + - type: object + properties: + trend: + $ref: '#/ModelRankingTrend' + +ModelRankingResult: + type: object + description: Models ranked by usage with trend comparison + properties: + rankings: + type: array + items: + $ref: '#/ModelRankingWithTrend' + +MCPHistogramBucket: + type: object + description: Time-bucketed MCP tool call volume + properties: + timestamp: + type: string + format: date-time + count: + type: integer + format: int64 + success: + type: integer + format: int64 + error: + type: integer + format: int64 + +MCPHistogramResult: + type: object + description: Time-bucketed MCP tool call volume histogram + properties: + buckets: + type: array + items: + $ref: '#/MCPHistogramBucket' + bucket_size_seconds: + type: integer + format: int64 + +MCPCostHistogramBucket: + type: object + description: Time-bucketed MCP cost data + properties: + timestamp: + type: string + format: date-time + total_cost: + type: number + +MCPCostHistogramResult: + type: object + description: Time-bucketed MCP cost histogram + properties: + buckets: + type: array + items: + $ref: '#/MCPCostHistogramBucket' + bucket_size_seconds: + type: integer + format: int64 + +MCPTopToolResult: + type: object + description: Aggregated stats for a single MCP tool + properties: + tool_name: + type: string + count: + type: integer + format: int64 + cost: + type: number + +MCPTopToolsResult: + type: object + description: Top MCP tools by call count (limit 10) + properties: + tools: + type: array + items: + $ref: '#/MCPTopToolResult' diff --git a/examples/k8s/examples/values-semantic-search-redis.yaml b/examples/k8s/examples/values-semantic-search-redis.yaml index 4d9cd18224..8867162e2f 100644 --- a/examples/k8s/examples/values-semantic-search-redis.yaml +++ b/examples/k8s/examples/values-semantic-search-redis.yaml @@ -18,7 +18,6 @@ bifrost: cache_by_model: true cache_by_provider: true exclude_system_prompt: true - cleanup_on_shutdown: false vector_store_namespace: "bifrost-semantic-cache" vectorStore: diff --git a/examples/k8s/examples/values-semantic-search-weaviate.yaml b/examples/k8s/examples/values-semantic-search-weaviate.yaml index 7691106be7..0d00d87d1a 100644 --- a/examples/k8s/examples/values-semantic-search-weaviate.yaml +++ b/examples/k8s/examples/values-semantic-search-weaviate.yaml @@ -18,7 +18,6 @@ bifrost: cache_by_model: true cache_by_provider: true exclude_system_prompt: false - cleanup_on_shutdown: false vector_store_namespace: "bifrost-semantic-cache" vectorStore: diff --git a/examples/mcps/temperature/package-lock.json b/examples/mcps/temperature/package-lock.json index dd0f524b85..c567f600df 100644 --- a/examples/mcps/temperature/package-lock.json +++ b/examples/mcps/temperature/package-lock.json @@ -8,7 +8,7 @@ "name": "temperature-mcp-server", "version": "1.0.0", "dependencies": { - "@modelcontextprotocol/sdk": "1.0.4", + "@modelcontextprotocol/sdk": "1.29.0", "zod": "3.24.1" }, "devDependencies": { @@ -16,15 +16,75 @@ "typescript": "5.9.3" } }, + "node_modules/@hono/node-server": { + "version": "1.19.14", + "resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.19.14.tgz", + "integrity": "sha512-GwtvgtXxnWsucXvbQXkRgqksiH2Qed37H9xHZocE5sA3N8O8O8/8FA3uclQXxXVzc9XBZuEOMK7+r02FmSpHtw==", + "license": "MIT", + "engines": { + "node": ">=18.14.1" + }, + "peerDependencies": { + "hono": "^4" + } + }, "node_modules/@modelcontextprotocol/sdk": { - "version": "1.0.4", - "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.0.4.tgz", - "integrity": "sha512-C+jw1lF6HSGzs7EZpzHbXfzz9rj9him4BaoumlTciW/IDDgIpweF/qiCWKlP02QKg5PPcgY6xY2WCt5y2tpYow==", + "version": "1.29.0", + "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.29.0.tgz", + "integrity": "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ==", "license": "MIT", "dependencies": { + "@hono/node-server": "^1.19.9", + "ajv": "^8.17.1", + "ajv-formats": "^3.0.1", "content-type": "^1.0.5", + "cors": "^2.8.5", + "cross-spawn": "^7.0.5", + "eventsource": "^3.0.2", + "eventsource-parser": "^3.0.0", + "express": "^5.2.1", + "express-rate-limit": "^8.2.1", + "hono": "^4.11.4", + "jose": "^6.1.3", + "json-schema-typed": "^8.0.2", + "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", - "zod": "^3.23.8" + "zod": "^3.25 || ^4.0", + "zod-to-json-schema": "^3.25.1" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "@cfworker/json-schema": "^4.1.1", + "zod": "^3.25 || ^4.0" + }, + "peerDependenciesMeta": { + "@cfworker/json-schema": { + "optional": true + }, + "zod": { + "optional": false + } + } + }, + "node_modules/@modelcontextprotocol/sdk/node_modules/zod": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/zod/-/zod-4.4.3.tgz", + "integrity": "sha512-ytENFjIJFl2UwYglde2jchW2Hwm4GJFLDiSXWdTrJQBIN9Fcyp7n4DhxJEiWNAJMV1/BqWfW/kkg71UDcHJyTQ==", + "license": "MIT", + "peer": true, + "funding": { + "url": "https://github.com/sponsors/colinhacks" + } + }, + "node_modules/@modelcontextprotocol/sdk/node_modules/zod-to-json-schema": { + "version": "3.25.2", + "resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.25.2.tgz", + "integrity": "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA==", + "license": "ISC", + "peerDependencies": { + "zod": "^3.25.28 || ^4" } }, "node_modules/@types/node": { @@ -34,6 +94,76 @@ "dev": true, "license": "MIT" }, + "node_modules/accepts": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz", + "integrity": "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==", + "license": "MIT", + "dependencies": { + "mime-types": "^3.0.0", + "negotiator": "^1.0.0" + }, + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/ajv": { + "version": "8.20.0", + "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.20.0.tgz", + "integrity": "sha512-Thbli+OlOj+iMPYFBVBfJ3OmCAnaSyNn4M1vz9T6Gka5Jt9ba/HIR56joy65tY6kx/FCF5VXNB819Y7/GUrBGA==", + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.3", + "fast-uri": "^3.0.1", + "json-schema-traverse": "^1.0.0", + "require-from-string": "^2.0.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/ajv-formats": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-3.0.1.tgz", + "integrity": "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==", + "license": "MIT", + "dependencies": { + "ajv": "^8.0.0" + }, + "peerDependencies": { + "ajv": "^8.0.0" + }, + "peerDependenciesMeta": { + "ajv": { + "optional": true + } + } + }, + "node_modules/body-parser": { + "version": "2.2.2", + "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-2.2.2.tgz", + "integrity": "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA==", + "license": "MIT", + "dependencies": { + "bytes": "^3.1.2", + "content-type": "^1.0.5", + "debug": "^4.4.3", + "http-errors": "^2.0.0", + "iconv-lite": "^0.7.0", + "on-finished": "^2.4.1", + "qs": "^6.14.1", + "raw-body": "^3.0.1", + "type-is": "^2.0.1" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/bytes": { "version": "3.1.2", "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz", @@ -43,6 +173,48 @@ "node": ">= 0.8" } }, + "node_modules/call-bind-apply-helpers": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz", + "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/call-bound": { + "version": "1.0.4", + "resolved": "https://registry.npmjs.org/call-bound/-/call-bound-1.0.4.tgz", + "integrity": "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "get-intrinsic": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/content-disposition": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.1.0.tgz", + "integrity": "sha512-5jRCH9Z/+DRP7rkvY83B+yGIGX96OYdJmzngqnw2SBSxqCFPd0w2km3s5iawpGX8krnwSGmF0FW5Nhr0Hfai3g==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/content-type": { "version": "1.0.5", "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.5.tgz", @@ -52,6 +224,72 @@ "node": ">= 0.6" } }, + "node_modules/cookie": { + "version": "0.7.2", + "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz", + "integrity": "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/cookie-signature": { + "version": "1.2.2", + "resolved": "https://registry.npmjs.org/cookie-signature/-/cookie-signature-1.2.2.tgz", + "integrity": "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg==", + "license": "MIT", + "engines": { + "node": ">=6.6.0" + } + }, + "node_modules/cors": { + "version": "2.8.6", + "resolved": "https://registry.npmjs.org/cors/-/cors-2.8.6.tgz", + "integrity": "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw==", + "license": "MIT", + "dependencies": { + "object-assign": "^4", + "vary": "^1" + }, + "engines": { + "node": ">= 0.10" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/cross-spawn": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", + "license": "MIT", + "dependencies": { + "path-key": "^3.1.0", + "shebang-command": "^2.0.0", + "which": "^2.0.1" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, "node_modules/depd": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz", @@ -61,6 +299,316 @@ "node": ">= 0.8" } }, + "node_modules/dunder-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz", + "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.1", + "es-errors": "^1.3.0", + "gopd": "^1.2.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/ee-first": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/ee-first/-/ee-first-1.1.1.tgz", + "integrity": "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==", + "license": "MIT" + }, + "node_modules/encodeurl": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz", + "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/es-define-property": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz", + "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-errors": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz", + "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-object-atoms": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz", + "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/escape-html": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/escape-html/-/escape-html-1.0.3.tgz", + "integrity": "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==", + "license": "MIT" + }, + "node_modules/etag": { + "version": "1.8.1", + "resolved": "https://registry.npmjs.org/etag/-/etag-1.8.1.tgz", + "integrity": "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/eventsource": { + "version": "3.0.7", + "resolved": "https://registry.npmjs.org/eventsource/-/eventsource-3.0.7.tgz", + "integrity": "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA==", + "license": "MIT", + "dependencies": { + "eventsource-parser": "^3.0.1" + }, + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/eventsource-parser": { + "version": "3.0.8", + "resolved": "https://registry.npmjs.org/eventsource-parser/-/eventsource-parser-3.0.8.tgz", + "integrity": "sha512-70QWGkr4snxr0OXLRWsFLeRBIRPuQOvt4s8QYjmUlmlkyTZkRqS7EDVRZtzU3TiyDbXSzaOeF0XUKy8PchzukQ==", + "license": "MIT", + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/express": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/express/-/express-5.2.1.tgz", + "integrity": "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw==", + "license": "MIT", + "peer": true, + "dependencies": { + "accepts": "^2.0.0", + "body-parser": "^2.2.1", + "content-disposition": "^1.0.0", + "content-type": "^1.0.5", + "cookie": "^0.7.1", + "cookie-signature": "^1.2.1", + "debug": "^4.4.0", + "depd": "^2.0.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "finalhandler": "^2.1.0", + "fresh": "^2.0.0", + "http-errors": "^2.0.0", + "merge-descriptors": "^2.0.0", + "mime-types": "^3.0.0", + "on-finished": "^2.4.1", + "once": "^1.4.0", + "parseurl": "^1.3.3", + "proxy-addr": "^2.0.7", + "qs": "^6.14.0", + "range-parser": "^1.2.1", + "router": "^2.2.0", + "send": "^1.1.0", + "serve-static": "^2.2.0", + "statuses": "^2.0.1", + "type-is": "^2.0.1", + "vary": "^1.1.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/express-rate-limit": { + "version": "8.5.2", + "resolved": "https://registry.npmjs.org/express-rate-limit/-/express-rate-limit-8.5.2.tgz", + "integrity": "sha512-5Kb34ipNX694DH48vN9irak1Qx30nb0PLYHXfJgw4YEjiC3ZEmZJhwOp+VfiCYwFzvFTdB9QkArYS5kXa2cx2A==", + "license": "MIT", + "dependencies": { + "ip-address": "^10.2.0" + }, + "engines": { + "node": ">= 16" + }, + "funding": { + "url": "https://github.com/sponsors/express-rate-limit" + }, + "peerDependencies": { + "express": ">= 4.11" + } + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "license": "MIT" + }, + "node_modules/fast-uri": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/fast-uri/-/fast-uri-3.1.2.tgz", + "integrity": "sha512-rVjf7ArG3LTk+FS6Yw81V1DLuZl1bRbNrev6Tmd/9RaroeeRRJhAt7jg/6YFxbvAQXUCavSoZhPPj6oOx+5KjQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/finalhandler": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-2.1.1.tgz", + "integrity": "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "on-finished": "^2.4.1", + "parseurl": "^1.3.3", + "statuses": "^2.0.1" + }, + "engines": { + "node": ">= 18.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/forwarded": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz", + "integrity": "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/fresh": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/fresh/-/fresh-2.0.0.tgz", + "integrity": "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/function-bind": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", + "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-intrinsic": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz", + "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "es-define-property": "^1.0.1", + "es-errors": "^1.3.0", + "es-object-atoms": "^1.1.1", + "function-bind": "^1.1.2", + "get-proto": "^1.0.1", + "gopd": "^1.2.0", + "has-symbols": "^1.1.0", + "hasown": "^2.0.2", + "math-intrinsics": "^1.1.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz", + "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==", + "license": "MIT", + "dependencies": { + "dunder-proto": "^1.0.1", + "es-object-atoms": "^1.0.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/gopd": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", + "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/has-symbols": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz", + "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/hasown": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.3.tgz", + "integrity": "sha512-ej4AhfhfL2Q2zpMmLo7U1Uv9+PyhIZpgQLGT1F9miIGmiCJIoCgSmczFdrc97mWT4kVY72KA+WnnhJ5pghSvSg==", + "license": "MIT", + "dependencies": { + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/hono": { + "version": "4.12.18", + "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.18.tgz", + "integrity": "sha512-RWzP96k/yv0PQfyXnWjs6zot20TqfpfsNXhOnev8d1InAxubW93L11/oNUc3tQqn2G0bSdAOBpX+2uDFHV7kdQ==", + "license": "MIT", + "peer": true, + "engines": { + "node": ">=16.9.0" + } + }, "node_modules/http-errors": { "version": "2.0.1", "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.1.tgz", @@ -103,6 +651,243 @@ "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", "license": "ISC" }, + "node_modules/ip-address": { + "version": "10.2.0", + "resolved": "https://registry.npmjs.org/ip-address/-/ip-address-10.2.0.tgz", + "integrity": "sha512-/+S6j4E9AHvW9SWMSEY9Xfy66O5PWvVEJ08O0y5JGyEKQpojb0K0GKpz/v5HJ/G0vi3D2sjGK78119oXZeE0qA==", + "license": "MIT", + "engines": { + "node": ">= 12" + } + }, + "node_modules/ipaddr.js": { + "version": "1.9.1", + "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-1.9.1.tgz", + "integrity": "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g==", + "license": "MIT", + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/is-promise": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/is-promise/-/is-promise-4.0.0.tgz", + "integrity": "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ==", + "license": "MIT" + }, + "node_modules/isexe": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", + "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==", + "license": "ISC" + }, + "node_modules/jose": { + "version": "6.2.3", + "resolved": "https://registry.npmjs.org/jose/-/jose-6.2.3.tgz", + "integrity": "sha512-YYVDInQKFJfR/xa3ojUTl8c2KoTwiL1R5Wg9YCydwH0x0B9grbzlg5HC7mMjCtUJjbQ/YnGEZIhI5tCgfTb4Hw==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/panva" + } + }, + "node_modules/json-schema-traverse": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz", + "integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==", + "license": "MIT" + }, + "node_modules/json-schema-typed": { + "version": "8.0.2", + "resolved": "https://registry.npmjs.org/json-schema-typed/-/json-schema-typed-8.0.2.tgz", + "integrity": "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==", + "license": "BSD-2-Clause" + }, + "node_modules/math-intrinsics": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz", + "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/media-typer": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/media-typer/-/media-typer-1.1.0.tgz", + "integrity": "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/merge-descriptors": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-2.0.0.tgz", + "integrity": "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/mime-db": { + "version": "1.54.0", + "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz", + "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/mime-types": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.2.tgz", + "integrity": "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==", + "license": "MIT", + "dependencies": { + "mime-db": "^1.54.0" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "license": "MIT" + }, + "node_modules/negotiator": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/negotiator/-/negotiator-1.0.0.tgz", + "integrity": "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/object-assign": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", + "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/object-inspect": { + "version": "1.13.4", + "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz", + "integrity": "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/on-finished": { + "version": "2.4.1", + "resolved": "https://registry.npmjs.org/on-finished/-/on-finished-2.4.1.tgz", + "integrity": "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==", + "license": "MIT", + "dependencies": { + "ee-first": "1.1.1" + }, + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/once": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz", + "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==", + "license": "ISC", + "dependencies": { + "wrappy": "1" + } + }, + "node_modules/parseurl": { + "version": "1.3.3", + "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz", + "integrity": "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/path-key": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz", + "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/path-to-regexp": { + "version": "8.4.2", + "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.4.2.tgz", + "integrity": "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA==", + "license": "MIT", + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/pkce-challenge": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/pkce-challenge/-/pkce-challenge-5.0.1.tgz", + "integrity": "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ==", + "license": "MIT", + "engines": { + "node": ">=16.20.0" + } + }, + "node_modules/proxy-addr": { + "version": "2.0.7", + "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz", + "integrity": "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg==", + "license": "MIT", + "dependencies": { + "forwarded": "0.2.0", + "ipaddr.js": "1.9.1" + }, + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/qs": { + "version": "6.15.1", + "resolved": "https://registry.npmjs.org/qs/-/qs-6.15.1.tgz", + "integrity": "sha512-6YHEFRL9mfgcAvql/XhwTvf5jKcOiiupt2FiJxHkiX1z4j7WL8J/jRHYLluORvc1XxB5rV20KoeK00gVJamspg==", + "license": "BSD-3-Clause", + "dependencies": { + "side-channel": "^1.1.0" + }, + "engines": { + "node": ">=0.6" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/range-parser": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz", + "integrity": "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, "node_modules/raw-body": { "version": "3.0.2", "resolved": "https://registry.npmjs.org/raw-body/-/raw-body-3.0.2.tgz", @@ -118,18 +903,181 @@ "node": ">= 0.10" } }, + "node_modules/require-from-string": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz", + "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/router": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz", + "integrity": "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "depd": "^2.0.0", + "is-promise": "^4.0.0", + "parseurl": "^1.3.3", + "path-to-regexp": "^8.0.0" + }, + "engines": { + "node": ">= 18" + } + }, "node_modules/safer-buffer": { "version": "2.1.2", "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==", "license": "MIT" }, + "node_modules/send": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/send/-/send-1.2.1.tgz", + "integrity": "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.3", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "fresh": "^2.0.0", + "http-errors": "^2.0.1", + "mime-types": "^3.0.2", + "ms": "^2.1.3", + "on-finished": "^2.4.1", + "range-parser": "^1.2.1", + "statuses": "^2.0.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/serve-static": { + "version": "2.2.1", + "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-2.2.1.tgz", + "integrity": "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw==", + "license": "MIT", + "dependencies": { + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "parseurl": "^1.3.3", + "send": "^1.2.0" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/setprototypeof": { "version": "1.2.0", "resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz", "integrity": "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==", "license": "ISC" }, + "node_modules/shebang-command": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", + "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==", + "license": "MIT", + "dependencies": { + "shebang-regex": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/shebang-regex": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz", + "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/side-channel": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.1.0.tgz", + "integrity": "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.3", + "side-channel-list": "^1.0.0", + "side-channel-map": "^1.0.1", + "side-channel-weakmap": "^1.0.2" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-list": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.1.tgz", + "integrity": "sha512-mjn/0bi/oUURjc5Xl7IaWi/OJJJumuoJFQJfDDyO46+hBWsfaVM65TBHq2eoZBhzl9EchxOijpkbRC8SVBQU0w==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.4" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-map": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-map/-/side-channel-map-1.0.1.tgz", + "integrity": "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-weakmap": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/side-channel-weakmap/-/side-channel-weakmap-1.0.2.tgz", + "integrity": "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3", + "side-channel-map": "^1.0.1" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, "node_modules/statuses": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz", @@ -148,6 +1096,37 @@ "node": ">=0.6" } }, + "node_modules/type-is": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/type-is/-/type-is-2.1.0.tgz", + "integrity": "sha512-faYHw0anBbc/kWF3zFTEnxSFOAGUX9GFbOBthvDdLsIlEoWOFOtS0zgCiQYwIskL9iGXZL3kAXD8OoZ4GmMATA==", + "license": "MIT", + "dependencies": { + "content-type": "^2.0.0", + "media-typer": "^1.1.0", + "mime-types": "^3.0.0" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/type-is/node_modules/content-type": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/content-type/-/content-type-2.0.0.tgz", + "integrity": "sha512-j/O/d7GcZCyNl7/hwZAb606rzqkyvaDctLmckbxLzHvFBzTJHuGEdodATcP3yIRoDrLHkIATJuvzbFlp/ki2cQ==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/typescript": { "version": "5.9.3", "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz", @@ -171,6 +1150,36 @@ "node": ">= 0.8" } }, + "node_modules/vary": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz", + "integrity": "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/which": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", + "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==", + "license": "ISC", + "dependencies": { + "isexe": "^2.0.0" + }, + "bin": { + "node-which": "bin/node-which" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/wrappy": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", + "license": "ISC" + }, "node_modules/zod": { "version": "3.24.1", "resolved": "https://registry.npmjs.org/zod/-/zod-3.24.1.tgz", diff --git a/examples/mcps/temperature/package.json b/examples/mcps/temperature/package.json index 615d3373c7..e7dd5b482e 100644 --- a/examples/mcps/temperature/package.json +++ b/examples/mcps/temperature/package.json @@ -10,7 +10,7 @@ "dev": "tsc && node dist/index.js" }, "dependencies": { - "@modelcontextprotocol/sdk": "1.0.4", + "@modelcontextprotocol/sdk": "1.29.0", "zod": "3.24.1" }, "devDependencies": { @@ -18,6 +18,6 @@ "typescript": "5.9.3" }, "overrides": { - "hono": "4.12.14" + "hono": "4.12.18" } } diff --git a/examples/mcps/test-tools-server/package-lock.json b/examples/mcps/test-tools-server/package-lock.json index c3a73fe92e..76c4cf4d9d 100644 --- a/examples/mcps/test-tools-server/package-lock.json +++ b/examples/mcps/test-tools-server/package-lock.json @@ -8,7 +8,7 @@ "name": "test-tools-server", "version": "1.0.0", "dependencies": { - "@modelcontextprotocol/sdk": "1.0.4", + "@modelcontextprotocol/sdk": "1.29.0", "zod": "3.24.1" }, "bin": { @@ -19,15 +19,75 @@ "typescript": "5.3.3" } }, + "node_modules/@hono/node-server": { + "version": "1.19.14", + "resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.19.14.tgz", + "integrity": "sha512-GwtvgtXxnWsucXvbQXkRgqksiH2Qed37H9xHZocE5sA3N8O8O8/8FA3uclQXxXVzc9XBZuEOMK7+r02FmSpHtw==", + "license": "MIT", + "engines": { + "node": ">=18.14.1" + }, + "peerDependencies": { + "hono": "^4" + } + }, "node_modules/@modelcontextprotocol/sdk": { - "version": "1.0.4", - "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.0.4.tgz", - "integrity": "sha512-C+jw1lF6HSGzs7EZpzHbXfzz9rj9him4BaoumlTciW/IDDgIpweF/qiCWKlP02QKg5PPcgY6xY2WCt5y2tpYow==", + "version": "1.29.0", + "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.29.0.tgz", + "integrity": "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ==", "license": "MIT", "dependencies": { + "@hono/node-server": "^1.19.9", + "ajv": "^8.17.1", + "ajv-formats": "^3.0.1", "content-type": "^1.0.5", + "cors": "^2.8.5", + "cross-spawn": "^7.0.5", + "eventsource": "^3.0.2", + "eventsource-parser": "^3.0.0", + "express": "^5.2.1", + "express-rate-limit": "^8.2.1", + "hono": "^4.11.4", + "jose": "^6.1.3", + "json-schema-typed": "^8.0.2", + "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", - "zod": "^3.23.8" + "zod": "^3.25 || ^4.0", + "zod-to-json-schema": "^3.25.1" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "@cfworker/json-schema": "^4.1.1", + "zod": "^3.25 || ^4.0" + }, + "peerDependenciesMeta": { + "@cfworker/json-schema": { + "optional": true + }, + "zod": { + "optional": false + } + } + }, + "node_modules/@modelcontextprotocol/sdk/node_modules/zod": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/zod/-/zod-4.4.3.tgz", + "integrity": "sha512-ytENFjIJFl2UwYglde2jchW2Hwm4GJFLDiSXWdTrJQBIN9Fcyp7n4DhxJEiWNAJMV1/BqWfW/kkg71UDcHJyTQ==", + "license": "MIT", + "peer": true, + "funding": { + "url": "https://github.com/sponsors/colinhacks" + } + }, + "node_modules/@modelcontextprotocol/sdk/node_modules/zod-to-json-schema": { + "version": "3.25.2", + "resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.25.2.tgz", + "integrity": "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA==", + "license": "ISC", + "peerDependencies": { + "zod": "^3.25.28 || ^4" } }, "node_modules/@types/node": { @@ -40,6 +100,76 @@ "undici-types": "~5.26.4" } }, + "node_modules/accepts": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz", + "integrity": "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==", + "license": "MIT", + "dependencies": { + "mime-types": "^3.0.0", + "negotiator": "^1.0.0" + }, + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/ajv": { + "version": "8.20.0", + "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.20.0.tgz", + "integrity": "sha512-Thbli+OlOj+iMPYFBVBfJ3OmCAnaSyNn4M1vz9T6Gka5Jt9ba/HIR56joy65tY6kx/FCF5VXNB819Y7/GUrBGA==", + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.3", + "fast-uri": "^3.0.1", + "json-schema-traverse": "^1.0.0", + "require-from-string": "^2.0.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/ajv-formats": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-3.0.1.tgz", + "integrity": "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==", + "license": "MIT", + "dependencies": { + "ajv": "^8.0.0" + }, + "peerDependencies": { + "ajv": "^8.0.0" + }, + "peerDependenciesMeta": { + "ajv": { + "optional": true + } + } + }, + "node_modules/body-parser": { + "version": "2.2.2", + "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-2.2.2.tgz", + "integrity": "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA==", + "license": "MIT", + "dependencies": { + "bytes": "^3.1.2", + "content-type": "^1.0.5", + "debug": "^4.4.3", + "http-errors": "^2.0.0", + "iconv-lite": "^0.7.0", + "on-finished": "^2.4.1", + "qs": "^6.14.1", + "raw-body": "^3.0.1", + "type-is": "^2.0.1" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/bytes": { "version": "3.1.2", "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz", @@ -49,6 +179,48 @@ "node": ">= 0.8" } }, + "node_modules/call-bind-apply-helpers": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz", + "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/call-bound": { + "version": "1.0.4", + "resolved": "https://registry.npmjs.org/call-bound/-/call-bound-1.0.4.tgz", + "integrity": "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "get-intrinsic": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/content-disposition": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.1.0.tgz", + "integrity": "sha512-5jRCH9Z/+DRP7rkvY83B+yGIGX96OYdJmzngqnw2SBSxqCFPd0w2km3s5iawpGX8krnwSGmF0FW5Nhr0Hfai3g==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/content-type": { "version": "1.0.5", "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.5.tgz", @@ -58,6 +230,72 @@ "node": ">= 0.6" } }, + "node_modules/cookie": { + "version": "0.7.2", + "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz", + "integrity": "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/cookie-signature": { + "version": "1.2.2", + "resolved": "https://registry.npmjs.org/cookie-signature/-/cookie-signature-1.2.2.tgz", + "integrity": "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg==", + "license": "MIT", + "engines": { + "node": ">=6.6.0" + } + }, + "node_modules/cors": { + "version": "2.8.6", + "resolved": "https://registry.npmjs.org/cors/-/cors-2.8.6.tgz", + "integrity": "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw==", + "license": "MIT", + "dependencies": { + "object-assign": "^4", + "vary": "^1" + }, + "engines": { + "node": ">= 0.10" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/cross-spawn": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", + "license": "MIT", + "dependencies": { + "path-key": "^3.1.0", + "shebang-command": "^2.0.0", + "which": "^2.0.1" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, "node_modules/depd": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz", @@ -67,6 +305,316 @@ "node": ">= 0.8" } }, + "node_modules/dunder-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz", + "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.1", + "es-errors": "^1.3.0", + "gopd": "^1.2.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/ee-first": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/ee-first/-/ee-first-1.1.1.tgz", + "integrity": "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==", + "license": "MIT" + }, + "node_modules/encodeurl": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz", + "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/es-define-property": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz", + "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-errors": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz", + "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-object-atoms": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz", + "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/escape-html": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/escape-html/-/escape-html-1.0.3.tgz", + "integrity": "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==", + "license": "MIT" + }, + "node_modules/etag": { + "version": "1.8.1", + "resolved": "https://registry.npmjs.org/etag/-/etag-1.8.1.tgz", + "integrity": "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/eventsource": { + "version": "3.0.7", + "resolved": "https://registry.npmjs.org/eventsource/-/eventsource-3.0.7.tgz", + "integrity": "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA==", + "license": "MIT", + "dependencies": { + "eventsource-parser": "^3.0.1" + }, + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/eventsource-parser": { + "version": "3.0.8", + "resolved": "https://registry.npmjs.org/eventsource-parser/-/eventsource-parser-3.0.8.tgz", + "integrity": "sha512-70QWGkr4snxr0OXLRWsFLeRBIRPuQOvt4s8QYjmUlmlkyTZkRqS7EDVRZtzU3TiyDbXSzaOeF0XUKy8PchzukQ==", + "license": "MIT", + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/express": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/express/-/express-5.2.1.tgz", + "integrity": "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw==", + "license": "MIT", + "peer": true, + "dependencies": { + "accepts": "^2.0.0", + "body-parser": "^2.2.1", + "content-disposition": "^1.0.0", + "content-type": "^1.0.5", + "cookie": "^0.7.1", + "cookie-signature": "^1.2.1", + "debug": "^4.4.0", + "depd": "^2.0.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "finalhandler": "^2.1.0", + "fresh": "^2.0.0", + "http-errors": "^2.0.0", + "merge-descriptors": "^2.0.0", + "mime-types": "^3.0.0", + "on-finished": "^2.4.1", + "once": "^1.4.0", + "parseurl": "^1.3.3", + "proxy-addr": "^2.0.7", + "qs": "^6.14.0", + "range-parser": "^1.2.1", + "router": "^2.2.0", + "send": "^1.1.0", + "serve-static": "^2.2.0", + "statuses": "^2.0.1", + "type-is": "^2.0.1", + "vary": "^1.1.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/express-rate-limit": { + "version": "8.5.2", + "resolved": "https://registry.npmjs.org/express-rate-limit/-/express-rate-limit-8.5.2.tgz", + "integrity": "sha512-5Kb34ipNX694DH48vN9irak1Qx30nb0PLYHXfJgw4YEjiC3ZEmZJhwOp+VfiCYwFzvFTdB9QkArYS5kXa2cx2A==", + "license": "MIT", + "dependencies": { + "ip-address": "^10.2.0" + }, + "engines": { + "node": ">= 16" + }, + "funding": { + "url": "https://github.com/sponsors/express-rate-limit" + }, + "peerDependencies": { + "express": ">= 4.11" + } + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "license": "MIT" + }, + "node_modules/fast-uri": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/fast-uri/-/fast-uri-3.1.2.tgz", + "integrity": "sha512-rVjf7ArG3LTk+FS6Yw81V1DLuZl1bRbNrev6Tmd/9RaroeeRRJhAt7jg/6YFxbvAQXUCavSoZhPPj6oOx+5KjQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/finalhandler": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-2.1.1.tgz", + "integrity": "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "on-finished": "^2.4.1", + "parseurl": "^1.3.3", + "statuses": "^2.0.1" + }, + "engines": { + "node": ">= 18.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/forwarded": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz", + "integrity": "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/fresh": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/fresh/-/fresh-2.0.0.tgz", + "integrity": "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/function-bind": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", + "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-intrinsic": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz", + "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "es-define-property": "^1.0.1", + "es-errors": "^1.3.0", + "es-object-atoms": "^1.1.1", + "function-bind": "^1.1.2", + "get-proto": "^1.0.1", + "gopd": "^1.2.0", + "has-symbols": "^1.1.0", + "hasown": "^2.0.2", + "math-intrinsics": "^1.1.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz", + "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==", + "license": "MIT", + "dependencies": { + "dunder-proto": "^1.0.1", + "es-object-atoms": "^1.0.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/gopd": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", + "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/has-symbols": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz", + "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/hasown": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.3.tgz", + "integrity": "sha512-ej4AhfhfL2Q2zpMmLo7U1Uv9+PyhIZpgQLGT1F9miIGmiCJIoCgSmczFdrc97mWT4kVY72KA+WnnhJ5pghSvSg==", + "license": "MIT", + "dependencies": { + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/hono": { + "version": "4.12.18", + "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.18.tgz", + "integrity": "sha512-RWzP96k/yv0PQfyXnWjs6zot20TqfpfsNXhOnev8d1InAxubW93L11/oNUc3tQqn2G0bSdAOBpX+2uDFHV7kdQ==", + "license": "MIT", + "peer": true, + "engines": { + "node": ">=16.9.0" + } + }, "node_modules/http-errors": { "version": "2.0.1", "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.1.tgz", @@ -109,6 +657,243 @@ "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", "license": "ISC" }, + "node_modules/ip-address": { + "version": "10.2.0", + "resolved": "https://registry.npmjs.org/ip-address/-/ip-address-10.2.0.tgz", + "integrity": "sha512-/+S6j4E9AHvW9SWMSEY9Xfy66O5PWvVEJ08O0y5JGyEKQpojb0K0GKpz/v5HJ/G0vi3D2sjGK78119oXZeE0qA==", + "license": "MIT", + "engines": { + "node": ">= 12" + } + }, + "node_modules/ipaddr.js": { + "version": "1.9.1", + "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-1.9.1.tgz", + "integrity": "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g==", + "license": "MIT", + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/is-promise": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/is-promise/-/is-promise-4.0.0.tgz", + "integrity": "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ==", + "license": "MIT" + }, + "node_modules/isexe": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", + "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==", + "license": "ISC" + }, + "node_modules/jose": { + "version": "6.2.3", + "resolved": "https://registry.npmjs.org/jose/-/jose-6.2.3.tgz", + "integrity": "sha512-YYVDInQKFJfR/xa3ojUTl8c2KoTwiL1R5Wg9YCydwH0x0B9grbzlg5HC7mMjCtUJjbQ/YnGEZIhI5tCgfTb4Hw==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/panva" + } + }, + "node_modules/json-schema-traverse": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz", + "integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==", + "license": "MIT" + }, + "node_modules/json-schema-typed": { + "version": "8.0.2", + "resolved": "https://registry.npmjs.org/json-schema-typed/-/json-schema-typed-8.0.2.tgz", + "integrity": "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==", + "license": "BSD-2-Clause" + }, + "node_modules/math-intrinsics": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz", + "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/media-typer": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/media-typer/-/media-typer-1.1.0.tgz", + "integrity": "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/merge-descriptors": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-2.0.0.tgz", + "integrity": "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/mime-db": { + "version": "1.54.0", + "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz", + "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/mime-types": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.2.tgz", + "integrity": "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==", + "license": "MIT", + "dependencies": { + "mime-db": "^1.54.0" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "license": "MIT" + }, + "node_modules/negotiator": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/negotiator/-/negotiator-1.0.0.tgz", + "integrity": "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/object-assign": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", + "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/object-inspect": { + "version": "1.13.4", + "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz", + "integrity": "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/on-finished": { + "version": "2.4.1", + "resolved": "https://registry.npmjs.org/on-finished/-/on-finished-2.4.1.tgz", + "integrity": "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==", + "license": "MIT", + "dependencies": { + "ee-first": "1.1.1" + }, + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/once": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz", + "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==", + "license": "ISC", + "dependencies": { + "wrappy": "1" + } + }, + "node_modules/parseurl": { + "version": "1.3.3", + "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz", + "integrity": "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/path-key": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz", + "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/path-to-regexp": { + "version": "8.4.2", + "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.4.2.tgz", + "integrity": "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA==", + "license": "MIT", + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/pkce-challenge": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/pkce-challenge/-/pkce-challenge-5.0.1.tgz", + "integrity": "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ==", + "license": "MIT", + "engines": { + "node": ">=16.20.0" + } + }, + "node_modules/proxy-addr": { + "version": "2.0.7", + "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz", + "integrity": "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg==", + "license": "MIT", + "dependencies": { + "forwarded": "0.2.0", + "ipaddr.js": "1.9.1" + }, + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/qs": { + "version": "6.15.1", + "resolved": "https://registry.npmjs.org/qs/-/qs-6.15.1.tgz", + "integrity": "sha512-6YHEFRL9mfgcAvql/XhwTvf5jKcOiiupt2FiJxHkiX1z4j7WL8J/jRHYLluORvc1XxB5rV20KoeK00gVJamspg==", + "license": "BSD-3-Clause", + "dependencies": { + "side-channel": "^1.1.0" + }, + "engines": { + "node": ">=0.6" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/range-parser": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz", + "integrity": "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, "node_modules/raw-body": { "version": "3.0.2", "resolved": "https://registry.npmjs.org/raw-body/-/raw-body-3.0.2.tgz", @@ -124,18 +909,181 @@ "node": ">= 0.10" } }, + "node_modules/require-from-string": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz", + "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/router": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz", + "integrity": "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "depd": "^2.0.0", + "is-promise": "^4.0.0", + "parseurl": "^1.3.3", + "path-to-regexp": "^8.0.0" + }, + "engines": { + "node": ">= 18" + } + }, "node_modules/safer-buffer": { "version": "2.1.2", "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==", "license": "MIT" }, + "node_modules/send": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/send/-/send-1.2.1.tgz", + "integrity": "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.3", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "fresh": "^2.0.0", + "http-errors": "^2.0.1", + "mime-types": "^3.0.2", + "ms": "^2.1.3", + "on-finished": "^2.4.1", + "range-parser": "^1.2.1", + "statuses": "^2.0.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/serve-static": { + "version": "2.2.1", + "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-2.2.1.tgz", + "integrity": "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw==", + "license": "MIT", + "dependencies": { + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "parseurl": "^1.3.3", + "send": "^1.2.0" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/setprototypeof": { "version": "1.2.0", "resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz", "integrity": "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==", "license": "ISC" }, + "node_modules/shebang-command": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", + "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==", + "license": "MIT", + "dependencies": { + "shebang-regex": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/shebang-regex": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz", + "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/side-channel": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.1.0.tgz", + "integrity": "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.3", + "side-channel-list": "^1.0.0", + "side-channel-map": "^1.0.1", + "side-channel-weakmap": "^1.0.2" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-list": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.1.tgz", + "integrity": "sha512-mjn/0bi/oUURjc5Xl7IaWi/OJJJumuoJFQJfDDyO46+hBWsfaVM65TBHq2eoZBhzl9EchxOijpkbRC8SVBQU0w==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.4" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-map": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-map/-/side-channel-map-1.0.1.tgz", + "integrity": "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-weakmap": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/side-channel-weakmap/-/side-channel-weakmap-1.0.2.tgz", + "integrity": "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3", + "side-channel-map": "^1.0.1" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, "node_modules/statuses": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz", @@ -154,6 +1102,37 @@ "node": ">=0.6" } }, + "node_modules/type-is": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/type-is/-/type-is-2.1.0.tgz", + "integrity": "sha512-faYHw0anBbc/kWF3zFTEnxSFOAGUX9GFbOBthvDdLsIlEoWOFOtS0zgCiQYwIskL9iGXZL3kAXD8OoZ4GmMATA==", + "license": "MIT", + "dependencies": { + "content-type": "^2.0.0", + "media-typer": "^1.1.0", + "mime-types": "^3.0.0" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/type-is/node_modules/content-type": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/content-type/-/content-type-2.0.0.tgz", + "integrity": "sha512-j/O/d7GcZCyNl7/hwZAb606rzqkyvaDctLmckbxLzHvFBzTJHuGEdodATcP3yIRoDrLHkIATJuvzbFlp/ki2cQ==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, "node_modules/typescript": { "version": "5.3.3", "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.3.3.tgz", @@ -184,6 +1163,36 @@ "node": ">= 0.8" } }, + "node_modules/vary": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz", + "integrity": "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/which": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", + "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==", + "license": "ISC", + "dependencies": { + "isexe": "^2.0.0" + }, + "bin": { + "node-which": "bin/node-which" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/wrappy": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", + "license": "ISC" + }, "node_modules/zod": { "version": "3.24.1", "resolved": "https://registry.npmjs.org/zod/-/zod-3.24.1.tgz", diff --git a/examples/mcps/test-tools-server/package.json b/examples/mcps/test-tools-server/package.json index 2bf18d26c5..f84982abe6 100644 --- a/examples/mcps/test-tools-server/package.json +++ b/examples/mcps/test-tools-server/package.json @@ -11,7 +11,7 @@ "prepare": "npm run build" }, "dependencies": { - "@modelcontextprotocol/sdk": "1.0.4", + "@modelcontextprotocol/sdk": "1.29.0", "zod": "3.24.1" }, "devDependencies": { @@ -19,6 +19,6 @@ "typescript": "5.3.3" }, "overrides": { - "hono": "4.12.14" + "hono": "4.12.18" } } diff --git a/framework/configstore/migrations.go b/framework/configstore/migrations.go index 76756dd32b..60354facd2 100644 --- a/framework/configstore/migrations.go +++ b/framework/configstore/migrations.go @@ -710,6 +710,12 @@ func triggerMigrations(ctx context.Context, db *gorm.DB) error { if err := migrationDropAllowDirectKeysColumnDDL(ctx, db); err != nil { return err } + if err := migrationAddTeamCalendarAlignedColumn(ctx, db); err != nil { + return err + } + if err := migrationDropLegacyCalendarAlignedColumns(ctx, db); err != nil { + return err + } return nil } @@ -7527,3 +7533,94 @@ func migrationUniqueTeamNames(ctx context.Context, db *gorm.DB) error { }, }) } + +// migrationDropLegacyCalendarAlignedColumns drops the legacy calendar_aligned +// columns from governance_budgets and governance_rate_limits. Calendar +// alignment is now a VK-only setting (governance_virtual_keys.calendar_aligned); +// budget and rate-limit reset logic derives the value from the owning VK at +// reset time. The columns were re-added by migrate_calendar_aligned after +// add_multi_budget_tables dropped governance_budgets.calendar_aligned, so any +// DB that ran both still has them — this migration cleans them up. +func migrationDropLegacyCalendarAlignedColumns(ctx context.Context, db *gorm.DB) error { + m := migrator.New(db, migrator.DefaultOptions, []*migrator.Migration{{ + ID: "drop_legacy_calendar_aligned_columns", + Migrate: func(tx *gorm.DB) error { + tx = tx.WithContext(ctx) + mig := tx.Migrator() + if mig.HasColumn(&tables.TableBudget{}, "calendar_aligned") { + if err := mig.DropColumn(&tables.TableBudget{}, "calendar_aligned"); err != nil { + return fmt.Errorf("failed to drop legacy calendar_aligned column from governance_budgets: %w", err) + } + } + if mig.HasColumn(&tables.TableRateLimit{}, "calendar_aligned") { + if err := mig.DropColumn(&tables.TableRateLimit{}, "calendar_aligned"); err != nil { + return fmt.Errorf("failed to drop legacy calendar_aligned column from governance_rate_limits: %w", err) + } + } + return nil + }, + Rollback: func(tx *gorm.DB) error { return nil }, + }}) + if err := m.Migrate(); err != nil { + return fmt.Errorf("error running drop_legacy_calendar_aligned_columns migration: %s", err.Error()) + } + return nil +} + +// migrationAddTeamCalendarAlignedColumn adds calendar_aligned to governance_teams so +// team-level calendar alignment (governing all team budgets and the team rate limit) +// can be persisted. +func migrationAddTeamCalendarAlignedColumn(ctx context.Context, db *gorm.DB) error { + m := migrator.New(db, migrator.DefaultOptions, []*migrator.Migration{{ + ID: "add_team_calendar_aligned_column", + Migrate: func(tx *gorm.DB) error { + tx = tx.WithContext(ctx) + mig := tx.Migrator() + if !mig.HasColumn(&tables.TableTeam{}, "calendar_aligned") { + if err := mig.AddColumn(&tables.TableTeam{}, "CalendarAligned"); err != nil { + return fmt.Errorf("failed to add calendar_aligned column to governance_teams: %w", err) + } + } + // Backfill from legacy per-budget / per-rate-limit flags before the + // drop migration removes them. Any team-owned budget with + // calendar_aligned=true, or a team rate-limit with calendar_aligned=true, + // promotes the team to calendar-aligned so behavior is preserved across upgrade. + if mig.HasColumn(&tables.TableBudget{}, "calendar_aligned") { + if err := tx.Exec(` + UPDATE governance_teams + SET calendar_aligned = TRUE + WHERE EXISTS ( + SELECT 1 FROM governance_budgets b + WHERE b.team_id = governance_teams.id AND b.calendar_aligned = TRUE + ) + `).Error; err != nil { + return fmt.Errorf("failed to backfill team calendar_aligned from budgets: %w", err) + } + } + if mig.HasColumn(&tables.TableRateLimit{}, "calendar_aligned") { + if err := tx.Exec(` + UPDATE governance_teams + SET calendar_aligned = TRUE + WHERE rate_limit_id IN ( + SELECT id FROM governance_rate_limits WHERE calendar_aligned = TRUE + ) + `).Error; err != nil { + return fmt.Errorf("failed to backfill team calendar_aligned from rate limits: %w", err) + } + } + return nil + }, + Rollback: func(tx *gorm.DB) error { + tx = tx.WithContext(ctx) + mig := tx.Migrator() + if mig.HasColumn(&tables.TableTeam{}, "calendar_aligned") { + return mig.DropColumn(&tables.TableTeam{}, "calendar_aligned") + } + return nil + }, + }}) + if err := m.Migrate(); err != nil { + return fmt.Errorf("error running add_team_calendar_aligned_column migration: %s", err.Error()) + } + return nil +} diff --git a/framework/configstore/rdb.go b/framework/configstore/rdb.go index f49f2c9f9b..22a84d8fd7 100644 --- a/framework/configstore/rdb.go +++ b/framework/configstore/rdb.go @@ -2457,7 +2457,7 @@ func (s *RDBConfigStore) UpdateVirtualKey(ctx context.Context, virtualKey *table } else { virtualKey.ID = existing.ID if err := txDB.WithContext(ctx). - Select("name", "description", "value", "is_active", "team_id", "customer_id", "budget_id", "rate_limit_id", "config_hash", "updated_at", "encryption_status", "value_hash"). + Select("name", "description", "value", "is_active", "team_id", "customer_id", "rate_limit_id", "calendar_aligned", "config_hash", "updated_at", "encryption_status", "value_hash"). Updates(virtualKey).Error; err != nil { return s.parseGormError(err) } diff --git a/framework/configstore/tables/budget.go b/framework/configstore/tables/budget.go index d2bbe8a68e..015ec6e696 100644 --- a/framework/configstore/tables/budget.go +++ b/framework/configstore/tables/budget.go @@ -20,7 +20,18 @@ type TableBudget struct { VirtualKeyID *string `gorm:"type:varchar(255);index" json:"virtual_key_id,omitempty"` ProviderConfigID *uint `gorm:"index" json:"provider_config_id,omitempty"` - CalendarAligned bool `gorm:"default:false" json:"calendar_aligned"` // When true, all budgets under this VK reset at clean calendar boundaries + // Deprecated: set calendar_aligned on the parent access profile / VK / team + // instead. Kept for backward compatibility with older config.json files; + // the OSS applyV1Compat path and the enterprise access-profile reconciler + // promote any true value here to the owner's top-level CalendarAligned at + // load time. + CalendarAlignedInput *bool `gorm:"-" json:"calendar_aligned,omitempty"` + + // Derived from the owning entity (VK / PC's parent VK / Team). Populated by + // the owner's AfterFind hook on cold load and by the governance store's + // Create/Update *InMemory methods on write. Never persisted; consumed by + // the reset path to decide rolling vs. calendar-aligned window. + IsCalendarAligned bool `gorm:"-" json:"-"` // Config hash is used to detect the changes synced from config.json file // Every time we sync the config.json file, we will update the config hash diff --git a/framework/configstore/tables/ratelimit.go b/framework/configstore/tables/ratelimit.go index 0268b53164..086a9449a2 100644 --- a/framework/configstore/tables/ratelimit.go +++ b/framework/configstore/tables/ratelimit.go @@ -23,7 +23,15 @@ type TableRateLimit struct { RequestCurrentUsage int64 `gorm:"default:0" json:"request_current_usage"` // Current request usage RequestLastReset time.Time `gorm:"index" json:"request_last_reset"` // Last time request counter was reset - CalendarAligned bool `gorm:"default:false" json:"calendar_aligned"` // When true, all budgets under this VK reset at clean calendar boundaries + // Deprecated: set calendar_aligned on the parent access profile / VK / team + // instead. Kept for backward compatibility with older config.json files; + // the OSS applyV1Compat path and the enterprise access-profile reconciler + // promote any true value here to the owner's top-level CalendarAligned at + // load time. + CalendarAlignedInput *bool `gorm:"-" json:"calendar_aligned,omitempty"` + + // Derived from the owning entity. See TableBudget.IsCalendarAligned. + IsCalendarAligned bool `gorm:"-" json:"-"` // Config hash is used to detect the changes synced from config.json file // Every time we sync the config.json file, we will update the config hash diff --git a/framework/configstore/tables/team.go b/framework/configstore/tables/team.go index 8721c6259f..5cd1cef915 100644 --- a/framework/configstore/tables/team.go +++ b/framework/configstore/tables/team.go @@ -32,6 +32,8 @@ type TableTeam struct { Claims *string `gorm:"type:text" json:"-"` ParsedClaims map[string]any `gorm:"-" json:"claims"` + CalendarAligned bool `gorm:"default:false" json:"calendar_aligned"` + // Config hash is used to detect the changes synced from config.json file // Every time we sync the config.json file, we will update the config hash ConfigHash string `gorm:"type:varchar(255);null" json:"config_hash"` @@ -75,7 +77,10 @@ func (t *TableTeam) BeforeSave(tx *gorm.DB) error { return nil } -// AfterFind hook for TableTeam to deserialize JSON fields +// AfterFind hook for TableTeam to deserialize JSON fields and propagate +// calendar_aligned down to owned budgets / rate_limit. The reset path reads +// the stamped value off the budget / rate_limit; the governance store's +// Update*InMemory paths re-stamp on every team update. func (t *TableTeam) AfterFind(tx *gorm.DB) error { if t.Profile != nil { if err := json.Unmarshal([]byte(*t.Profile), &t.ParsedProfile); err != nil { @@ -92,5 +97,11 @@ func (t *TableTeam) AfterFind(tx *gorm.DB) error { return err } } + for i := range t.Budgets { + t.Budgets[i].IsCalendarAligned = t.CalendarAligned + } + if t.RateLimit != nil { + t.RateLimit.IsCalendarAligned = t.CalendarAligned + } return nil } diff --git a/framework/configstore/tables/virtualkey.go b/framework/configstore/tables/virtualkey.go index a5be07b675..d2015d9cf8 100644 --- a/framework/configstore/tables/virtualkey.go +++ b/framework/configstore/tables/virtualkey.go @@ -210,11 +210,7 @@ type TableVirtualKey struct { CustomerID *string `gorm:"type:varchar(255);index" json:"customer_id,omitempty"` RateLimitID *string `gorm:"type:varchar(255);index" json:"rate_limit_id,omitempty"` - // Deprecated - // Calendar aligned is not the property of virtual key but its property of the budget and ratelimit - // So in the migration we will move this to the budget/ratelimit table - // And this won't be referred - CalendarAligned bool `gorm:"default:false" json:"calendar_aligned"` // When true, all budgets under this VK reset at clean calendar boundaries + CalendarAligned bool `gorm:"default:false" json:"calendar_aligned"` // Relationships Team *TableTeam `gorm:"foreignKey:TeamID" json:"team,omitempty"` @@ -269,12 +265,31 @@ func (vk *TableVirtualKey) BeforeSave(tx *gorm.DB) error { return nil } -// AfterFind is a GORM hook that decrypts the virtual key value after reading from the database. +// AfterFind is a GORM hook that decrypts the virtual key value after reading +// from the database and propagates VK-level calendar_aligned down to owned +// budgets / rate_limit and to each provider config's budgets / rate_limit. +// The reset path reads the stamped value; Update*InMemory paths re-stamp on +// every VK update. func (vk *TableVirtualKey) AfterFind(tx *gorm.DB) error { if vk.EncryptionStatus == EncryptionStatusEncrypted { if err := decryptString(&vk.Value); err != nil { return fmt.Errorf("failed to decrypt virtual key value: %w", err) } } + for i := range vk.Budgets { + vk.Budgets[i].IsCalendarAligned = vk.CalendarAligned + } + if vk.RateLimit != nil { + vk.RateLimit.IsCalendarAligned = vk.CalendarAligned + } + for i := range vk.ProviderConfigs { + pc := &vk.ProviderConfigs[i] + for j := range pc.Budgets { + pc.Budgets[j].IsCalendarAligned = vk.CalendarAligned + } + if pc.RateLimit != nil { + pc.RateLimit.IsCalendarAligned = vk.CalendarAligned + } + } return nil } diff --git a/framework/go.mod b/framework/go.mod index 8ea385390c..dd4af02bcd 100644 --- a/framework/go.mod +++ b/framework/go.mod @@ -50,6 +50,7 @@ require ( github.com/go-jose/go-jose/v4 v4.1.4 // indirect github.com/go-logr/logr v1.4.3 // indirect github.com/go-logr/stdr v1.2.2 // indirect + github.com/go-openapi/analysis v0.24.2 // indirect github.com/go-openapi/swag/cmdutils v0.25.4 // indirect github.com/go-openapi/swag/conv v0.25.4 // indirect github.com/go-openapi/swag/fileutils v0.25.4 // indirect @@ -117,7 +118,6 @@ require ( github.com/cloudwego/base64x v0.1.6 // indirect github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect - github.com/go-openapi/analysis v0.24.2 // indirect github.com/go-openapi/errors v0.22.5 // indirect github.com/go-openapi/jsonpointer v0.22.4 // indirect github.com/go-openapi/jsonreference v0.21.4 // indirect diff --git a/framework/logstore/matviews.go b/framework/logstore/matviews.go index efb220c633..25d2ad0922 100644 --- a/framework/logstore/matviews.go +++ b/framework/logstore/matviews.go @@ -634,7 +634,8 @@ func canUseMatViewFilters(f SearchFilters) bool { f.MinLatency == nil && f.MaxLatency == nil && f.MinTokens == nil && f.MaxTokens == nil && f.MinCost == nil && f.MaxCost == nil && - !f.MissingCostOnly + !f.MissingCostOnly && + len(f.CacheHitTypes) == 0 } // canUseMatView checks both that materialized views are ready (created and diff --git a/framework/logstore/migrations.go b/framework/logstore/migrations.go index fa3a404838..cb4ee04571 100644 --- a/framework/logstore/migrations.go +++ b/framework/logstore/migrations.go @@ -324,6 +324,9 @@ func triggerMigrations(ctx context.Context, db *gorm.DB) error { if err := migrationAddStopReasonColumn(ctx, db); err != nil { return err } + if err := migrationAddSafeJsonbFunction(ctx, db); err != nil { + return err + } // migrationSplitFilterDataMatView is intentionally NOT invoked in this // release. Dropping mv_logs_filterdata while old replicas are still // serving /api/logs/filterdata from it would surface "relation does not @@ -3038,3 +3041,64 @@ func migrationAddStopReasonColumn(ctx context.Context, db *gorm.DB) error { } return nil } + +// migrationAddSafeJsonbFunction installs a PL/pgSQL helper that the +// /api/logs list query uses to extract the last element of input_history / +// responses_input_history without aborting the whole query on a single bad row. +// +// The previous inline guard (`left(btrim(x),1)='['`) only checked the first +// character before casting to jsonb. Any row that looked array-shaped but +// contained malformed JSON (unterminated structures, trailing commas, unpaired +// UTF-16 surrogates, etc.) would fail the cast with 22P02 / 22P05 and abort the +// entire list response. The helper wraps the cast in an EXCEPTION block and +// returns the raw TEXT on any parse failure. +// +// Postgres-only; SQLite is guarded inline in listSelectColumns via json_valid(). +func migrationAddSafeJsonbFunction(ctx context.Context, db *gorm.DB) error { + if db.Dialector.Name() != "postgres" { + return nil + } + opts := *migrator.DefaultOptions + opts.UseTransaction = true + m := migrator.New(db, &opts, []*migrator.Migration{{ + ID: "logs_add_safe_jsonb_function", + Migrate: func(tx *gorm.DB) error { + tx = tx.WithContext(ctx) + const stmt = ` +CREATE OR REPLACE FUNCTION bifrost_safe_jsonb(t text) RETURNS text +LANGUAGE plpgsql IMMUTABLE AS $$ +DECLARE + j jsonb; +BEGIN + IF t IS NULL OR t = '' OR t = '[]' THEN + RETURN t; + END IF; + IF left(btrim(t), 1) <> '[' THEN + RETURN t; + END IF; + BEGIN + j := t::jsonb; + EXCEPTION WHEN invalid_text_representation OR untranslatable_character THEN + RETURN t; + END; + IF jsonb_typeof(j) <> 'array' OR jsonb_array_length(j) = 0 THEN + RETURN t; + END IF; + RETURN jsonb_build_array(j->-1)::text; +END; +$$;` + if err := tx.Exec(stmt).Error; err != nil { + return fmt.Errorf("failed to create bifrost_safe_jsonb: %w", err) + } + return nil + }, + Rollback: func(tx *gorm.DB) error { + tx = tx.WithContext(ctx) + return tx.Exec("DROP FUNCTION IF EXISTS bifrost_safe_jsonb(text)").Error + }, + }}) + if err := m.Migrate(); err != nil { + return fmt.Errorf("error while adding bifrost_safe_jsonb function: %s", err.Error()) + } + return nil +} diff --git a/framework/logstore/rdb.go b/framework/logstore/rdb.go index ead0e4a0c3..7776550565 100644 --- a/framework/logstore/rdb.go +++ b/framework/logstore/rdb.go @@ -192,6 +192,29 @@ func (s *RDBLogStore) applyFilters(baseQuery *gorm.DB, filters SearchFilters) *g // cost is null and status is not error baseQuery = baseQuery.Where("(cost IS NULL OR cost <= 0) AND status NOT IN ('error')") } + if len(filters.CacheHitTypes) > 0 { + // Only keep allowed values to avoid passing arbitrary input into the JSON path expression. + valid := make([]string, 0, len(filters.CacheHitTypes)) + for _, t := range filters.CacheHitTypes { + if t == "direct" || t == "semantic" { + valid = append(valid, t) + } + } + if len(valid) > 0 { + if s.db.Dialector.Name() == "postgres" { + // Match the same loose-JSON guard used by aggregateCacheHits so the regex extract is safe. + baseQuery = baseQuery.Where( + "cache_debug IS NOT NULL AND cache_debug <> '' AND cache_debug ~ '^\\s*\\{.*\\}\\s*$' AND substring(cache_debug from '\"hit_type\"[[:space:]]*:[[:space:]]*\"([^\"]+)\"') IN ?", + valid, + ) + } else { + baseQuery = baseQuery.Where( + "cache_debug IS NOT NULL AND cache_debug != '' AND json_valid(cache_debug) AND json_extract(cache_debug, '$.hit_type') IN ?", + valid, + ) + } + } + } if filters.ContentSearch != "" { dialect := s.db.Dialector.Name() if dialect == "postgres" { @@ -642,7 +665,7 @@ func (s *RDBLogStore) listSelectColumns() string { "business_unit_id", "business_unit_name", "speech_input", "transcription_input", "image_generation_input", "video_generation_input", "latency", "token_usage", "cost", "status", "error_details", "stream", - "content_summary", "metadata", + "content_summary", "metadata", "cache_debug", "is_large_payload_request", "is_large_payload_response", "prompt_tokens", "completion_tokens", "total_tokens", "created_at", @@ -651,35 +674,35 @@ func (s *RDBLogStore) listSelectColumns() string { var inputHistoryExpr, responsesInputExpr, outputMessageExpr string switch s.db.Dialector.Name() { case "postgres": - // Postgres jsonb cannot represent \u0000 (errcode 22P05) and rejects - // malformed JSON (22P02). A single bad row would otherwise abort the - // whole list query. Guard the cast: only attempt jsonb conversion - // when the TEXT looks like a JSON array and contains no \u0000 - // escape; otherwise fall back to returning the raw TEXT. + // Postgres jsonb rejects malformed JSON (22P02), \u0000 escapes + // (22P05), and unpaired UTF-16 surrogates (22P05). A single bad row + // would otherwise abort the whole list query. bifrost_safe_jsonb + // wraps the cast in an EXCEPTION block and returns the raw TEXT on + // any parse failure; see migrationAddSafeJsonbFunction. inputHistoryExpr = `CASE WHEN object_type = 'realtime.turn' THEN input_history - WHEN input_history IS NOT NULL AND input_history != '' AND input_history != '[]' - AND position('\u0000' in input_history) = 0 - AND left(btrim(input_history), 1) = '[' - THEN jsonb_build_array(input_history::jsonb->-1)::text - ELSE input_history END AS input_history` + ELSE bifrost_safe_jsonb(input_history) + END AS input_history` responsesInputExpr = `CASE WHEN object_type = 'realtime.turn' THEN responses_input_history - WHEN responses_input_history IS NOT NULL AND responses_input_history != '' AND responses_input_history != '[]' - AND position('\u0000' in responses_input_history) = 0 - AND left(btrim(responses_input_history), 1) = '[' - THEN jsonb_build_array(responses_input_history::jsonb->-1)::text - ELSE responses_input_history END AS responses_input_history` + ELSE bifrost_safe_jsonb(responses_input_history) + END AS responses_input_history` outputMessageExpr = `CASE WHEN object_type = 'realtime.turn' THEN output_message ELSE NULL END AS output_message` default: // sqlite inputHistoryExpr = `CASE WHEN object_type = 'realtime.turn' THEN input_history WHEN input_history IS NOT NULL AND input_history != '' AND input_history != '[]' + AND json_valid(input_history) = 1 + AND json_type(input_history) = 'array' + AND json_array_length(input_history) > 0 THEN json_array(json_extract(input_history, '$[' || (json_array_length(input_history) - 1) || ']')) ELSE input_history END AS input_history` responsesInputExpr = `CASE WHEN object_type = 'realtime.turn' THEN responses_input_history WHEN responses_input_history IS NOT NULL AND responses_input_history != '' AND responses_input_history != '[]' + AND json_valid(responses_input_history) = 1 + AND json_type(responses_input_history) = 'array' + AND json_array_length(responses_input_history) > 0 THEN json_array(json_extract(responses_input_history, '$[' || (json_array_length(responses_input_history) - 1) || ']')) ELSE responses_input_history END AS responses_input_history` outputMessageExpr = `CASE WHEN object_type = 'realtime.turn' THEN output_message ELSE NULL END AS output_message` diff --git a/framework/logstore/safe_jsonb_test.go b/framework/logstore/safe_jsonb_test.go new file mode 100644 index 0000000000..4e3e4279ce --- /dev/null +++ b/framework/logstore/safe_jsonb_test.go @@ -0,0 +1,314 @@ +package logstore + +import ( + "context" + "strings" + "testing" + "time" + + "github.com/google/uuid" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "gorm.io/gorm" +) + +// malformedHistoryCase covers the scenarios that previously aborted /api/logs +// via the unsafe input_history::jsonb cast (and the mirror cases on +// responses_input_history). Each case exercises one shape of bad data; the +// SearchLogs query must complete without error and return every row. +type malformedHistoryCase struct { + name string + objectType string // defaults to "chat.completion"; set to "realtime.turn" to test passthrough branch + inputHistory string // raw TEXT stored in logs.input_history + respHistory string // raw TEXT stored in logs.responses_input_history + shouldCrashPG bool // true if the row would have aborted the pre-fix Postgres list query +} + +func malformedHistoryCases() []malformedHistoryCase { + // Built at runtime so the source file stays free of literal NUL bytes / + // lone surrogates that the Go parser would reject. + bs := "\\" + u0000 := bs + "u0000" + uD800 := bs + "uD800" + uDC00 := bs + "uDC00" + + // Build a large but valid JSON array to exercise the happy path on inputs + // that wouldn't fit comfortably inline. + var big strings.Builder + big.WriteByte('[') + for i := 0; i < 1000; i++ { + if i > 0 { + big.WriteByte(',') + } + big.WriteString(`{"role":"user","content":"msg"}`) + } + big.WriteString(`,{"role":"assistant","content":"last"}]`) + + return []malformedHistoryCase{ + // ---------- malformed: jsonb cast failures (22P02) ---------- + { + name: "unterminated_object_in_array", + inputHistory: `[{"role":"user","content":"hi"`, + shouldCrashPG: true, + }, + { + name: "garbage_after_bracket", + inputHistory: `[abc, not json]`, + shouldCrashPG: true, + }, + { + name: "trailing_comma", + inputHistory: `[{"role":"user","content":"hi"},]`, + shouldCrashPG: true, + }, + { + name: "unclosed_array_only", + inputHistory: `[`, + shouldCrashPG: true, + }, + { + name: "open_bracket_then_brace_unclosed", + inputHistory: `[{`, + shouldCrashPG: true, + }, + { + name: "nan_value_not_valid_json", + inputHistory: `[NaN]`, + shouldCrashPG: true, + }, + { + name: "infinity_value_not_valid_json", + inputHistory: `[Infinity]`, + shouldCrashPG: true, + }, + // ---------- malformed: jsonb cast failures (22P05 character set) ---------- + { + name: "unpaired_high_surrogate", + inputHistory: `[{"role":"user","content":"bad ` + uD800 + ` surrogate"}]`, + shouldCrashPG: true, + }, + { + name: "unpaired_low_surrogate", + inputHistory: `[{"role":"user","content":"bad ` + uDC00 + ` low"}]`, + shouldCrashPG: true, + }, + { + name: "bad_surrogate_pair_high_then_ascii", + inputHistory: `[{"role":"user","content":"` + uD800 + bs + `u0041"}]`, + shouldCrashPG: true, + }, + { + name: "u0000_escape_inside_string", + inputHistory: `[{"role":"user","content":"null byte ` + u0000 + ` here"}]`, + shouldCrashPG: true, + }, + // ---------- valid happy path: should pass through fast lane ---------- + { + name: "literal_backslash_u0000_valid_jsonb", + inputHistory: `[{"role":"user","content":"backslash-u-zero \\u0000 literal"}]`, + }, + { + name: "single_element_array", + inputHistory: `[{"role":"user","content":"only one"}]`, + }, + { + name: "array_of_primitives", + inputHistory: `[1,2,3]`, + }, + { + name: "array_with_null_last_element", + inputHistory: `[{"role":"user","content":"x"}, null]`, + }, + { + name: "deeply_nested_valid", + inputHistory: `[{"role":"user","content":{"nested":{"deep":{"value":42}}}}]`, + }, + { + name: "unicode_emoji_content", + inputHistory: `[{"role":"user","content":"hello 🎉 world ✨"}]`, + }, + { + name: "large_valid_array", + inputHistory: big.String(), + }, + // ---------- valid non-array / structurally OK: fall through to raw ---------- + { + name: "leading_whitespace_then_array", + inputHistory: " [\t{\"role\":\"user\",\"content\":\"ok\"}]", + }, + { + name: "top_level_object_not_array", + inputHistory: `{"not":"an array"}`, + }, + { + name: "null_literal", + inputHistory: `null`, + }, + { + name: "whitespace_only", + inputHistory: " \t ", + }, + // ---------- realtime.turn passthrough: outer CASE bypasses safe function ---------- + { + name: "realtime_turn_malformed_passthrough", + objectType: "realtime.turn", + inputHistory: `[{"role":"user"`, // malformed; must not crash even though safe fn is bypassed + }, + // ---------- mirror column ---------- + { + name: "malformed_responses_input_history", + respHistory: `[{"role":"user"`, + shouldCrashPG: true, + }, + { + name: "valid_responses_input_history", + respHistory: `[{"role":"user","content":"ok"},{"role":"assistant","content":"hi"}]`, + }, + } +} + +// insertMalformedLog inserts a logs row with the given raw input_history / +// responses_input_history TEXT values, bypassing GORM serialization so the +// exact byte sequence reaches the database. +func insertMalformedLog(t *testing.T, db *gorm.DB, c malformedHistoryCase, ts time.Time) string { + t.Helper() + id := uuid.New().String() + objType := c.objectType + if objType == "" { + objType = "chat.completion" + } + err := db.Exec(` + INSERT INTO logs (id, timestamp, object_type, provider, model, status, + input_history, responses_input_history, created_at) + VALUES (?, ?, ?, 'openai', 'gpt-4', 'success', ?, ?, ?) + `, id, ts, objType, c.inputHistory, c.respHistory, ts).Error + require.NoError(t, err, "failed to insert row for case %q", c.name) + return id +} + +// runMalformedHistorySuite exercises SearchLogs against each malformed case +// and asserts the query completes successfully for every row. Reused by the +// SQLite and Postgres entry points so both dialect branches of +// listSelectColumns are covered. +func runMalformedHistorySuite(t *testing.T, store *RDBLogStore, db *gorm.DB) { + t.Helper() + ctx := context.Background() + now := time.Now().UTC() + + cases := malformedHistoryCases() + expectedIDs := make(map[string]string, len(cases)) + for i, c := range cases { + // Spread timestamps so the DESC sort is stable per-case. + id := insertMalformedLog(t, db, c, now.Add(-time.Duration(i)*time.Second)) + expectedIDs[id] = c.name + } + + result, err := store.SearchLogs(ctx, SearchFilters{}, PaginationOptions{Limit: 1000}) + require.NoError(t, err, "SearchLogs must not fail on malformed input_history") + require.NotNil(t, result) + + // Every inserted row must come back, regardless of payload shape. + gotIDs := make(map[string]bool, len(result.Logs)) + for _, l := range result.Logs { + gotIDs[l.ID] = true + } + for id, name := range expectedIDs { + assert.True(t, gotIDs[id], "row for case %q (id=%s) missing from SearchLogs result", name, id) + } +} + +// TestSearchLogs_MalformedInputHistory_SQLite exercises the SQLite branch of +// listSelectColumns, which now gates json_extract on json_valid + json_type. +func TestSearchLogs_MalformedInputHistory_SQLite(t *testing.T) { + store := newTestSQLiteStore(t) + defer store.Close(context.Background()) + runMalformedHistorySuite(t, store, store.db) +} + +// TestSearchLogs_MalformedInputHistory_Postgres exercises the Postgres branch, +// which now routes through the bifrost_safe_jsonb PL/pgSQL helper. Skipped +// when Postgres is unavailable. +func TestSearchLogs_MalformedInputHistory_Postgres(t *testing.T) { + store, db := setupPerfTestDB(t) + runMalformedHistorySuite(t, store, db) +} + +// TestBifrostSafeJsonb_DirectInvocation exercises the PL/pgSQL helper in +// isolation so a regression in the function body shows up here rather than +// only at the list-query level. Each subtest asserts the exact TEXT the +// function returns so we can also detect silent behaviour drift on the +// happy path (e.g. canonical jsonb spacing changes between PG versions). +func TestBifrostSafeJsonb_DirectInvocation(t *testing.T) { + _, db := setupPerfTestDB(t) + + bs := "\\" + u0000 := bs + "u0000" + uD800 := bs + "uD800" + uDC00 := bs + "uDC00" + + // Build large valid array — last element should be `{"last": true}`. + var big strings.Builder + big.WriteByte('[') + for i := 0; i < 500; i++ { + big.WriteString(`{"i":`) + big.WriteByte(byte('0' + (i % 10))) + big.WriteString(`},`) + } + big.WriteString(`{"last":true}]`) + + cases := []struct { + name string + in string + want string + }{ + // fast-path bypasses + {name: "empty_string", in: "", want: ""}, + {name: "empty_array", in: "[]", want: "[]"}, + + // happy path: last-element extraction + {name: "valid_two_element_array", in: `[{"a":1},{"b":2}]`, want: `[{"b": 2}]`}, + {name: "single_element_array", in: `[{"a":1}]`, want: `[{"a": 1}]`}, + {name: "array_of_primitives", in: `[1,2,3]`, want: `[3]`}, + {name: "array_with_null_last", in: `[{"a":1},null]`, want: `[null]`}, + {name: "nested_object_last", in: `[{"x":{"y":{"z":42}}}]`, want: `[{"x": {"y": {"z": 42}}}]`}, + {name: "large_valid_array_last_only", in: big.String(), want: `[{"last": true}]`}, + + // non-array values: function returns raw TEXT unchanged + {name: "object_not_array_returns_raw", in: `{"x":1}`, want: `{"x":1}`}, + {name: "null_literal_returns_raw", in: `null`, want: `null`}, + {name: "number_literal_returns_raw", in: `42`, want: `42`}, + {name: "string_literal_returns_raw", in: `"hello"`, want: `"hello"`}, + {name: "whitespace_only_returns_raw", in: " \t ", want: " \t "}, + + // malformed: EXCEPTION branch catches, returns raw TEXT + {name: "unterminated_returns_raw", in: `[{"role":"user"`, want: `[{"role":"user"`}, + {name: "garbage_after_bracket_returns_raw", in: `[abc]`, want: `[abc]`}, + {name: "trailing_comma_returns_raw", in: `[1,2,]`, want: `[1,2,]`}, + {name: "nan_returns_raw", in: `[NaN]`, want: `[NaN]`}, + {name: "infinity_returns_raw", in: `[Infinity]`, want: `[Infinity]`}, + {name: "high_surrogate_returns_raw", in: `[{"c":"` + uD800 + `"}]`, want: `[{"c":"` + uD800 + `"}]`}, + {name: "low_surrogate_returns_raw", in: `[{"c":"` + uDC00 + `"}]`, want: `[{"c":"` + uDC00 + `"}]`}, + {name: "bad_surrogate_pair_returns_raw", in: `[{"c":"` + uD800 + bs + `u0041"}]`, want: `[{"c":"` + uD800 + bs + `u0041"}]`}, + {name: "u0000_escape_returns_raw", in: `[{"c":"x` + u0000 + `y"}]`, want: `[{"c":"x` + u0000 + `y"}]`}, + } + + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + var got string + err := db.Raw("SELECT bifrost_safe_jsonb(?)", c.in).Scan(&got).Error + require.NoError(t, err, "bifrost_safe_jsonb must not propagate parse errors") + assert.Equal(t, c.want, got) + }) + } +} + +// TestBifrostSafeJsonb_NullInput verifies the function handles SQL NULL +// (distinct from empty string) by returning NULL without erroring. +func TestBifrostSafeJsonb_NullInput(t *testing.T) { + _, db := setupPerfTestDB(t) + + var got *string + err := db.Raw("SELECT bifrost_safe_jsonb(NULL::text)").Scan(&got).Error + require.NoError(t, err, "bifrost_safe_jsonb(NULL) must not error") + assert.Nil(t, got, "bifrost_safe_jsonb(NULL) should return NULL") +} diff --git a/framework/logstore/tables.go b/framework/logstore/tables.go index ac0b352628..95f1aff4f1 100644 --- a/framework/logstore/tables.go +++ b/framework/logstore/tables.go @@ -68,6 +68,7 @@ type SearchFilters struct { MinCost *float64 `json:"min_cost,omitempty"` MaxCost *float64 `json:"max_cost,omitempty"` MissingCostOnly bool `json:"missing_cost_only,omitempty"` + CacheHitTypes []string `json:"cache_hit_types,omitempty"` // For filtering by local-cache hit type ("direct", "semantic") ContentSearch string `json:"content_search,omitempty"` MetadataFilters map[string]string `json:"metadata_filters,omitempty"` // key=metadataKey, value=metadataValue for filtering by metadata } diff --git a/framework/modelcatalog/pricing.go b/framework/modelcatalog/pricing.go index 68b6551be5..f3c13ff907 100644 --- a/framework/modelcatalog/pricing.go +++ b/framework/modelcatalog/pricing.go @@ -322,7 +322,7 @@ func (mc *ModelCatalog) calculateBaseCost(result *schemas.BifrostResponse, scope // Route to the appropriate compute function switch requestType { - case schemas.ChatCompletionRequest, schemas.TextCompletionRequest, schemas.ResponsesRequest: + case schemas.ChatCompletionRequest, schemas.TextCompletionRequest, schemas.ResponsesRequest, schemas.RealtimeRequest: return computeTextCost(pricing, input.usage, input.tier) case schemas.EmbeddingRequest: return computeEmbeddingCost(pricing, input.usage, input.tier) @@ -457,6 +457,7 @@ func responsesUsageToBifrostUsage(u *schemas.ResponsesResponseUsage) *schemas.Bi if u.OutputTokensDetails != nil { usage.CompletionTokensDetails = &schemas.ChatCompletionTokensDetails{ ReasoningTokens: u.OutputTokensDetails.ReasoningTokens, + AudioTokens: u.OutputTokensDetails.AudioTokens, } if u.OutputTokensDetails.NumSearchQueries != nil { usage.CompletionTokensDetails.NumSearchQueries = u.OutputTokensDetails.NumSearchQueries @@ -561,13 +562,43 @@ func computeTextCost(pricing *configstoreTables.TableModelPricing, usage *schema outputCost := float64(completionTokens) * outputRate + // Audio token cost: when token details include audio tokens, price them + // at the dedicated audio rate and subtract from the text token costs above. + // Realtime and audio-enabled chat models report audio tokens in details. + audioCost := 0.0 + inputAudioTokens := 0 + outputAudioTokens := 0 + if usage.PromptTokensDetails != nil { + inputAudioTokens = usage.PromptTokensDetails.AudioTokens + } + if usage.CompletionTokensDetails != nil { + outputAudioTokens = usage.CompletionTokensDetails.AudioTokens + } + if inputAudioTokens < 0 { + inputAudioTokens = 0 + } else if inputAudioTokens > promptTokens { + inputAudioTokens = promptTokens + } + if outputAudioTokens < 0 { + outputAudioTokens = 0 + } else if outputAudioTokens > completionTokens { + outputAudioTokens = completionTokens + } + if inputAudioTokens > 0 && pricing.InputCostPerAudioToken != nil { + // Subtract audio tokens charged at text rate, add at audio rate. + audioCost += float64(inputAudioTokens) * (*pricing.InputCostPerAudioToken - inputRate) + } + if outputAudioTokens > 0 && pricing.OutputCostPerAudioToken != nil { + audioCost += float64(outputAudioTokens) * (*pricing.OutputCostPerAudioToken - outputRate) + } + // Search query cost searchCost := 0.0 if pricing.SearchContextCostPerQuery != nil && usage.CompletionTokensDetails != nil && usage.CompletionTokensDetails.NumSearchQueries != nil { searchCost = float64(*usage.CompletionTokensDetails.NumSearchQueries) * *pricing.SearchContextCostPerQuery } - return inputCost + outputCost + searchCost + return inputCost + outputCost + audioCost + searchCost } // computeEmbeddingCost handles embedding requests (input-only). diff --git a/framework/modelcatalog/sync.go b/framework/modelcatalog/sync.go index 3c10f929f8..3aad4f5925 100644 --- a/framework/modelcatalog/sync.go +++ b/framework/modelcatalog/sync.go @@ -395,7 +395,7 @@ func (mc *ModelCatalog) applyModelParameters(paramsData map[string]json.RawMessa } if err := json.Unmarshal(rawData, &p); err == nil && (p.MaxOutputTokens != nil || parsed.VertexMultiRegionOnly != nil) { modelParamsEntries[model] = providerUtils.ModelParams{ - MaxOutputTokens: p.MaxOutputTokens, + MaxOutputTokens: p.MaxOutputTokens, IsVertexMultiRegionOnly: parsed.VertexMultiRegionOnly, } } @@ -504,4 +504,4 @@ func (mc *ModelCatalog) loadModelParametersFromURL(ctx context.Context) (map[str mc.logger.Debug("successfully downloaded and parsed %d model parameters records", len(paramsData)) return paramsData, nil -} \ No newline at end of file +} diff --git a/framework/streaming/responses.go b/framework/streaming/responses.go index 7712d6ecaa..91b0326eb7 100644 --- a/framework/streaming/responses.go +++ b/framework/streaming/responses.go @@ -51,6 +51,11 @@ func deepCopyResponsesStreamResponse(original *schemas.BifrostResponsesStreamRes copy.OutputIndex = ©OutputIndex } + if original.SummaryIndex != nil { + copySummaryIndex := *original.SummaryIndex + copy.SummaryIndex = ©SummaryIndex + } + if original.Item != nil { copyItem := deepCopyResponsesMessage(*original.Item) copy.Item = ©Item @@ -76,6 +81,16 @@ func deepCopyResponsesStreamResponse(original *schemas.BifrostResponsesStreamRes copy.Delta = ©Delta } + if original.Signature != nil { + copySignature := *original.Signature + copy.Signature = ©Signature + } + + if original.Obfuscation != nil { + copyObfuscation := *original.Obfuscation + copy.Obfuscation = ©Obfuscation + } + // Deep copy LogProbs slice if present if original.LogProbs != nil { copy.LogProbs = make([]schemas.ResponsesOutputMessageContentTextLogProb, len(original.LogProbs)) @@ -173,6 +188,16 @@ func deepCopyResponsesMessage(original schemas.ResponsesMessage) schemas.Respons copy.Type = ©Type } + if original.Status != nil { + copyStatus := *original.Status + copy.Status = ©Status + } + + if original.Phase != nil { + copyPhase := *original.Phase + copy.Phase = ©Phase + } + if original.Role != nil { copyRole := *original.Role copy.Role = ©Role diff --git a/framework/streaming/responses_test.go b/framework/streaming/responses_test.go new file mode 100644 index 0000000000..26515cec18 --- /dev/null +++ b/framework/streaming/responses_test.go @@ -0,0 +1,71 @@ +package streaming + +import ( + "testing" + + "github.com/maximhq/bifrost/core/schemas" +) + +// TestDeepCopyResponsesStreamResponsePreservesAllFields guards the deep-copy +// helper against silently dropping fields that survive unmarshal/WithDefaults. +// Covers the fields introduced in PR #3528 (Phase, SummaryIndex, Obfuscation) +// plus the latent leaks the same PR incidentally fixed (Status, Signature). +func TestDeepCopyResponsesStreamResponsePreservesAllFields(t *testing.T) { + original := &schemas.BifrostResponsesStreamResponse{ + Type: schemas.ResponsesStreamResponseTypeReasoningSummaryTextDelta, + SequenceNumber: 4, + SummaryIndex: schemas.Ptr(2), + Signature: schemas.Ptr("sig-xyz"), + Obfuscation: schemas.Ptr("opaque-padding"), + Item: &schemas.ResponsesMessage{ + ID: schemas.Ptr("msg_123"), + Status: schemas.Ptr("in_progress"), + Phase: schemas.Ptr("final_answer"), + }, + } + + copied := deepCopyResponsesStreamResponse(original) + if copied == nil { + t.Fatal("expected non-nil deep copy") + } + + // Value equality on the new + latent-leak fields. + if got := copied.SummaryIndex; got == nil || *got != 2 { + t.Errorf("SummaryIndex: want 2, got %#v", got) + } + if got := copied.Signature; got == nil || *got != "sig-xyz" { + t.Errorf("Signature: want %q, got %#v", "sig-xyz", got) + } + if got := copied.Obfuscation; got == nil || *got != "opaque-padding" { + t.Errorf("Obfuscation: want %q, got %#v", "opaque-padding", got) + } + if got := copied.Item.Status; got == nil || *got != "in_progress" { + t.Errorf("Item.Status: want %q, got %#v", "in_progress", got) + } + if got := copied.Item.Phase; got == nil || *got != "final_answer" { + t.Errorf("Item.Phase: want %q, got %#v", "final_answer", got) + } + + // Independence: mutating the original's pointees must not mutate the copy. + *original.SummaryIndex = 99 + *original.Signature = "mutated" + *original.Obfuscation = "mutated" + *original.Item.Status = "mutated" + *original.Item.Phase = "mutated" + + if *copied.SummaryIndex != 2 { + t.Errorf("SummaryIndex aliased original: got %d", *copied.SummaryIndex) + } + if *copied.Signature != "sig-xyz" { + t.Errorf("Signature aliased original: got %q", *copied.Signature) + } + if *copied.Obfuscation != "opaque-padding" { + t.Errorf("Obfuscation aliased original: got %q", *copied.Obfuscation) + } + if *copied.Item.Status != "in_progress" { + t.Errorf("Item.Status aliased original: got %q", *copied.Item.Status) + } + if *copied.Item.Phase != "final_answer" { + t.Errorf("Item.Phase aliased original: got %q", *copied.Item.Phase) + } +} diff --git a/framework/vectorstore/weaviate.go b/framework/vectorstore/weaviate.go index 9c34ab2c83..4db066e156 100644 --- a/framework/vectorstore/weaviate.go +++ b/framework/vectorstore/weaviate.go @@ -476,6 +476,12 @@ func newWeaviateStore(ctx context.Context, config *WeaviateConfig, logger schema } func (s *WeaviateStore) CreateNamespace(ctx context.Context, className string, dimension int, properties map[string]VectorStoreProperties) error { + // Reject names Weaviate would silently auto-capitalize: writes via REST + // route fine, but the GraphQL read path is case-strict and breaks. + if err := validateClassName(className); err != nil { + return err + } + // Check if class exists exists, err := s.client.Schema().ClassExistenceChecker(). WithClassName(className). @@ -637,3 +643,20 @@ func convertOperator(op QueryOperator) filters.WhereOperator { return filters.Equal } } + +// validateClassName enforces Weaviate's class-name rule that the first +// character must be an uppercase ASCII letter. Weaviate's REST endpoints +// silently auto-capitalize a lowercase first character on class creation, +// which means writes appear to succeed under the user-supplied name but +// GraphQL reads (which are case-strict) then fail with "Did you mean +// ?". Surface this at config-save time instead. +func validateClassName(name string) error { + if name == "" { + return nil + } + first := name[0] + if first < 'A' || first > 'Z' { + return fmt.Errorf("Weaviate requires class names to start with an uppercase letter (A-Z); got %q. Try %q", name, strings.ToUpper(name[:1])+name[1:]) + } + return nil +} diff --git a/helm-charts/bifrost/Chart.yaml b/helm-charts/bifrost/Chart.yaml index 24c3ab040e..1000db3fa9 100644 --- a/helm-charts/bifrost/Chart.yaml +++ b/helm-charts/bifrost/Chart.yaml @@ -2,7 +2,7 @@ apiVersion: v2 name: bifrost description: A Helm chart for deploying Bifrost - AI Gateway with unified interface for multiple providers type: application -version: 2.1.16 +version: 2.1.17 appVersion: "1.5.0" keywords: - ai diff --git a/helm-charts/bifrost/README.md b/helm-charts/bifrost/README.md index 7b9d5f860c..691e7c3c2c 100644 --- a/helm-charts/bifrost/README.md +++ b/helm-charts/bifrost/README.md @@ -4,10 +4,14 @@ Official Helm charts for deploying [Bifrost](https://github.com/maximhq/bifrost) - a high-performance AI gateway with unified interface for multiple providers. -**Latest Version:** 2.1.14 +**Latest Version:** 2.1.17 ## Changelog +### 2.1.17 + +- Added `max_turns_to_send` to guardrail rules. The integer caps how many historical conversation turns are sent to the guardrail provider on apply; the latest message is always included on top, and `0` (default) sends all turns. Wired into `values.schema.json`, `config.schema.json`, and `templates/_helpers.tpl` so it renders into `guardrails_config.guardrail_rules[].max_turns_to_send`. + ### 2.1.14 - Removed the obsolete `bifrost.client.allowDirectKeys` assertion from `validate-helm-config-fields.sh`. The field was deleted from the chart schema and codebase in a prior release, so the test was rendering an invalid values file and helm was rejecting it via `additionalProperties: false`. diff --git a/helm-charts/bifrost/templates/_helpers.tpl b/helm-charts/bifrost/templates/_helpers.tpl index 9849b760f4..b1d66c13ec 100644 --- a/helm-charts/bifrost/templates/_helpers.tpl +++ b/helm-charts/bifrost/templates/_helpers.tpl @@ -647,6 +647,7 @@ false {{- if hasKey . "query" }}{{- $_ := set $rule "query" .query }}{{- end }} {{- if .sampling_rate }}{{- $_ := set $rule "sampling_rate" .sampling_rate }}{{- end }} {{- if .timeout }}{{- $_ := set $rule "timeout" .timeout }}{{- end }} +{{- if hasKey . "max_turns_to_send" }}{{- $_ := set $rule "max_turns_to_send" .max_turns_to_send }}{{- end }} {{- if .provider_config_ids }}{{- $_ := set $rule "provider_config_ids" .provider_config_ids }}{{- end }} {{- $rules = append $rules $rule }} {{- end }} @@ -1100,9 +1101,6 @@ false {{- if hasKey $inputConfig "exclude_system_prompt" }} {{- $_ := set $scConfig "exclude_system_prompt" $inputConfig.exclude_system_prompt }} {{- end }} -{{- if hasKey $inputConfig "cleanup_on_shutdown" }} -{{- $_ := set $scConfig "cleanup_on_shutdown" $inputConfig.cleanup_on_shutdown }} -{{- end }} {{- $plugin := dict "enabled" true "name" "semantic_cache" "config" $scConfig }} {{- if hasKey .Values.bifrost.plugins.semanticCache "version" }}{{- $_ := set $plugin "version" (.Values.bifrost.plugins.semanticCache.version | int) }}{{- end }} {{- $plugins = append $plugins $plugin }} diff --git a/helm-charts/bifrost/values.schema.json b/helm-charts/bifrost/values.schema.json index 9507bb8a97..23d45b0f95 100644 --- a/helm-charts/bifrost/values.schema.json +++ b/helm-charts/bifrost/values.schema.json @@ -817,9 +817,6 @@ "exclude_system_prompt": { "type": "boolean" }, - "cleanup_on_shutdown": { - "type": "boolean" - }, "vector_store_namespace": { "type": "string" }, @@ -2019,7 +2016,12 @@ "timeout": { "type": "integer", "minimum": 0, - "description": "Timeout in milliseconds for rule execution" + "description": "Timeout in seconds for rule execution" + }, + "max_turns_to_send": { + "type": "integer", + "minimum": 0, + "description": "Number of historical conversation turns to send to the guardrail provider; the latest message is always included on top. 0 sends all turns." }, "provider_config_ids": { "type": "array", @@ -2064,7 +2066,7 @@ "timeout": { "type": "integer", "minimum": 0, - "description": "Timeout in milliseconds for provider execution" + "description": "Timeout in seconds for provider execution" }, "config": { "type": "object", diff --git a/helm-charts/bifrost/values.yaml b/helm-charts/bifrost/values.yaml index b4d85cf890..051d90bd02 100644 --- a/helm-charts/bifrost/values.yaml +++ b/helm-charts/bifrost/values.yaml @@ -420,7 +420,6 @@ bifrost: cache_by_model: true cache_by_provider: true exclude_system_prompt: false - cleanup_on_shutdown: false vector_store_namespace: "" otel: @@ -649,6 +648,7 @@ bifrost: # apply_to: "input" # sampling_rate: 100 # timeout: 1000 + # max_turns_to_send: 0 providers: [] # - id: 1 # provider_name: "bedrock" diff --git a/helm-charts/index.yaml b/helm-charts/index.yaml index c90b9cec92..f56233f2a1 100644 --- a/helm-charts/index.yaml +++ b/helm-charts/index.yaml @@ -1,6 +1,29 @@ apiVersion: v1 entries: bifrost: + - apiVersion: v2 + appVersion: 1.5.0 + created: "2026-05-14T00:00:00Z" + description: A Helm chart for deploying Bifrost - AI Gateway with unified interface + for multiple providers + home: https://www.getmaxim.ai/bifrost + icon: https://www.getbifrost.ai/favicon.png + keywords: + - ai + - gateway + - llm + - openai + - anthropic + maintainers: + - email: support@getbifrost.ai + name: Bifrost Team + name: bifrost + sources: + - https://github.com/maximhq/bifrost + type: application + urls: + - https://maximhq.github.io/bifrost/helm-charts/bifrost-2.1.17.tgz + version: 2.1.17 - apiVersion: v2 appVersion: 1.5.0 created: "2026-05-12T19:31:15.185424+05:30" @@ -892,4 +915,4 @@ entries: urls: - https://maximhq.github.io/bifrost/helm-charts/bifrost-1.3.36.tgz version: 1.3.36 -generated: "2026-05-12T19:31:15.183068+05:30" +generated: "2026-05-14T00:00:00Z" diff --git a/npx/bifrost/package-lock.json b/npx/bifrost/package-lock.json index 5edb2b6da1..6a7beb5e10 100644 --- a/npx/bifrost/package-lock.json +++ b/npx/bifrost/package-lock.json @@ -1,12 +1,12 @@ { "name": "@maximhq/bifrost", - "version": "1.0.6", + "version": "1.6.3", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@maximhq/bifrost", - "version": "1.0.4", + "version": "1.6.3", "license": "Apache-2.0", "bin": { "bifrost": "bin.js" diff --git a/npx/bifrost/package.json b/npx/bifrost/package.json index 3fe55f9df0..f2ce6ebc58 100644 --- a/npx/bifrost/package.json +++ b/npx/bifrost/package.json @@ -1,6 +1,6 @@ { "name": "@maximhq/bifrost", - "version": "1.6.2", + "version": "1.6.3", "description": "High-performance AI gateway CLI - connect to 12+ providers through a single API", "keywords": ["ai", "gateway", "openai", "anthropic", "cli", "bifrost"], "homepage": "https://github.com/maximhq/bifrost", diff --git a/plugins/governance/main.go b/plugins/governance/main.go index d87dcb0e93..78cb321a3a 100644 --- a/plugins/governance/main.go +++ b/plugins/governance/main.go @@ -355,8 +355,15 @@ func (p *GovernancePlugin) HTTPTransportPreHook(ctx *schemas.BifrostContext, req return nil, nil } - // If no body, check if large payload mode is active for read-only governance + // If no body, check if the request carries a model via query params (e.g. realtime + // WebSocket upgrades: GET /v1/realtime?model=... or Azure preview ?deployment=...) + // or if large payload mode is active. + // For query-param-based models we build a synthetic payload so routing rules and VK + // load-balancing can rewrite provider/model, then propagate changes back to the query. if len(req.Body) == 0 { + if modelParam := realtimeModelQueryParam(req); modelParam != "" { + return p.governRealtimeQueryParam(ctx, req, virtualKeyValue, hasRoutingRules) + } isLargePayload, _ := ctx.Value(schemas.BifrostContextKeyLargePayloadMode).(bool) if !isLargePayload { return nil, nil @@ -572,6 +579,93 @@ func (p *GovernancePlugin) governLargePayload(ctx *schemas.BifrostContext, req * return nil, nil } +// realtimeModelQueryParam returns the query parameter used as the realtime model selector. +// Azure preview realtime uses `deployment`, while GA/OpenAI-compatible paths use `model`. +func realtimeModelQueryParam(req *schemas.HTTPRequest) string { + if req == nil || req.Query == nil { + return "" + } + if modelParam := req.Query["model"]; modelParam != "" { + return modelParam + } + return req.Query["deployment"] +} + +// governRealtimeQueryParam handles governance for bodyless realtime requests +// (e.g. WebSocket upgrade GET /v1/realtime?model=... or Azure preview +// /realtime?deployment=...) where the model lives in a query parameter instead +// of the JSON body. We build a synthetic payload so routing rules and VK +// load-balancing can evaluate normally, then propagate any model rewrite back +// to the original query param for the downstream handler to pick up. +func (p *GovernancePlugin) governRealtimeQueryParam(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, virtualKeyValue *string, hasRoutingRules bool) (*schemas.HTTPResponse, error) { + modelQueryKey := "model" + modelParam := req.Query[modelQueryKey] + if modelParam == "" { + modelQueryKey = "deployment" + modelParam = req.Query[modelQueryKey] + } + if modelParam == "" { + return nil, nil + } + + payload := map[string]any{ + "model": modelParam, + } + originalModel := modelParam + + // Process virtual key if provided + var virtualKey *configstoreTables.TableVirtualKey + if virtualKeyValue != nil { + vk, ok := p.store.GetVirtualKey(ctx, *virtualKeyValue) + if !ok || vk == nil || !vk.IsActiveValue() { + return nil, nil + } + virtualKey = vk + } + + // Attaching team and customer based on the virtual key + if virtualKey != nil { + if virtualKey.TeamID != nil { + ctx.SetValue(schemas.BifrostContextKeyGovernanceTeamID, *virtualKey.TeamID) + } + if virtualKey.Team != nil { + ctx.SetValue(schemas.BifrostContextKeyGovernanceTeamName, virtualKey.Team.Name) + } + if virtualKey.CustomerID != nil { + ctx.SetValue(schemas.BifrostContextKeyGovernanceCustomerID, *virtualKey.CustomerID) + } + if virtualKey.Customer != nil { + ctx.SetValue(schemas.BifrostContextKeyGovernanceCustomerName, virtualKey.Customer.Name) + } + } + + // Apply routing rules + if hasRoutingRules { + var err error + payload, _, err = p.applyRoutingRules(ctx, req, payload, virtualKey) + if err != nil { + return nil, err + } + } + + // Process virtual key: load balance provider + if virtualKey != nil { + var err error + payload, err = p.loadBalanceProvider(ctx, req, payload, virtualKey) + if err != nil { + return nil, err + } + } + + // Propagate model changes back to the original query param so the downstream + // realtime handler sees the routed/load-balanced model. + if newModel, ok := payload["model"].(string); ok && newModel != originalModel { + req.Query[modelQueryKey] = newModel + } + + return nil, nil +} + // HTTPTransportPostHook intercepts requests after they are processed (governance decision point) // It modifies the response in-place and returns nil to continue func (p *GovernancePlugin) HTTPTransportPostHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, resp *schemas.HTTPResponse) error { @@ -1100,8 +1194,8 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext } p.cfgMutex.RUnlock() return nil, &schemas.BifrostError{ - Type: bifrost.Ptr("virtual_key_required"), - StatusCode: bifrost.Ptr(401), + Type: new("virtual_key_required"), + StatusCode: new(401), Error: &schemas.ErrorField{ Message: message, }, @@ -1223,8 +1317,8 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext case DecisionVirtualKeyNotFound, DecisionVirtualKeyBlocked, DecisionModelBlocked, DecisionProviderBlocked: return result, &schemas.BifrostError{ - Type: bifrost.Ptr(string(result.Decision)), - StatusCode: bifrost.Ptr(403), + Type: new(string(result.Decision)), + StatusCode: new(403), Error: &schemas.ErrorField{ Message: result.Reason, }, @@ -1232,8 +1326,8 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext case DecisionRateLimited, DecisionTokenLimited, DecisionRequestLimited: return result, &schemas.BifrostError{ - Type: bifrost.Ptr(string(result.Decision)), - StatusCode: bifrost.Ptr(429), + Type: new(string(result.Decision)), + StatusCode: new(429), Error: &schemas.ErrorField{ Message: result.Reason, }, @@ -1241,8 +1335,8 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext case DecisionBudgetExceeded: return result, &schemas.BifrostError{ - Type: bifrost.Ptr(string(result.Decision)), - StatusCode: bifrost.Ptr(402), + Type: new(string(result.Decision)), + StatusCode: new(402), Error: &schemas.ErrorField{ Message: result.Reason, }, @@ -1250,8 +1344,8 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext case DecisionMCPToolBlocked: return result, &schemas.BifrostError{ - Type: bifrost.Ptr(string(result.Decision)), - StatusCode: bifrost.Ptr(403), + Type: new(string(result.Decision)), + StatusCode: new(403), Error: &schemas.ErrorField{ Message: result.Reason, }, @@ -1260,7 +1354,7 @@ func (p *GovernancePlugin) EvaluateGovernanceRequest(ctx *schemas.BifrostContext default: // Fallback to deny for unknown decisions return result, &schemas.BifrostError{ - Type: bifrost.Ptr(string(result.Decision)), + Type: new(string(result.Decision)), Error: &schemas.ErrorField{ Message: "Governance decision error", }, diff --git a/plugins/governance/store.go b/plugins/governance/store.go index 755dd570b7..0c49b383b0 100644 --- a/plugins/governance/store.go +++ b/plugins/governance/store.go @@ -1373,9 +1373,10 @@ func (gs *LocalGovernanceStore) ResetExpiredBudgetsInMemory(ctx context.Context) if !ok || budget == nil { return true } + calendarAligned := budget.IsCalendarAligned var shouldReset bool var newLastReset time.Time - if budget.CalendarAligned { + if calendarAligned { currentPeriodStart := configstoreTables.GetCalendarPeriodStart(budget.ResetDuration, now) if currentPeriodStart.After(budget.LastReset) { shouldReset = true @@ -1451,8 +1452,9 @@ func (gs *LocalGovernanceStore) ResetExpiredRateLimitsInMemory(ctx context.Conte if !ok || rateLimit == nil { return true } - tokenNewLastReset := resolvePeriodStart(rateLimit.TokenResetDuration, rateLimit.CalendarAligned, rateLimit.TokenLastReset) - requestNewLastReset := resolvePeriodStart(rateLimit.RequestResetDuration, rateLimit.CalendarAligned, rateLimit.RequestLastReset) + calendarAligned := rateLimit.IsCalendarAligned + tokenNewLastReset := resolvePeriodStart(rateLimit.TokenResetDuration, calendarAligned, rateLimit.TokenLastReset) + requestNewLastReset := resolvePeriodStart(rateLimit.RequestResetDuration, calendarAligned, rateLimit.RequestLastReset) if tokenNewLastReset == nil && requestNewLastReset == nil { return true } @@ -2257,21 +2259,26 @@ func (gs *LocalGovernanceStore) CreateVirtualKeyInMemory(ctx context.Context, vk // Store budgets for i := range vk.Budgets { + vk.Budgets[i].IsCalendarAligned = vk.CalendarAligned gs.budgets.Store(vk.Budgets[i].ID, &vk.Budgets[i]) } // Create associated rate limit if exists if vk.RateLimit != nil { + vk.RateLimit.IsCalendarAligned = vk.CalendarAligned gs.rateLimits.Store(vk.RateLimit.ID, vk.RateLimit) } // Create provider config budgets and rate limits if they exist if vk.ProviderConfigs != nil { - for _, pc := range vk.ProviderConfigs { - for i := range pc.Budgets { - gs.budgets.Store(pc.Budgets[i].ID, &pc.Budgets[i]) + for i := range vk.ProviderConfigs { + pc := &vk.ProviderConfigs[i] + for j := range pc.Budgets { + pc.Budgets[j].IsCalendarAligned = vk.CalendarAligned + gs.budgets.Store(pc.Budgets[j].ID, &pc.Budgets[j]) } if pc.RateLimit != nil { + pc.RateLimit.IsCalendarAligned = vk.CalendarAligned gs.rateLimits.Store(pc.RateLimit.ID, pc.RateLimit) } } @@ -2318,6 +2325,7 @@ func (gs *LocalGovernanceStore) UpdateVirtualKeyInMemory(ctx context.Context, vk clone.Budgets[i].LastReset = existingBudget.LastReset } } + clone.Budgets[i].IsCalendarAligned = clone.CalendarAligned gs.budgets.Store(clone.Budgets[i].ID, &clone.Budgets[i]) } // Delete removed multi-budgets @@ -2340,6 +2348,7 @@ func (gs *LocalGovernanceStore) UpdateVirtualKeyInMemory(ctx context.Context, vk clone.RateLimit.RequestLastReset = existingRateLimit.RequestLastReset } } + clone.RateLimit.IsCalendarAligned = clone.CalendarAligned // Update the rate limit in the main rateLimits sync.Map gs.rateLimits.Store(clone.RateLimit.ID, clone.RateLimit) // Clean up old rate limit if ID changed (e.g., after AP propagation @@ -2385,6 +2394,7 @@ func (gs *LocalGovernanceStore) UpdateVirtualKeyInMemory(ctx context.Context, vk clone.ProviderConfigs[i].RateLimit.RequestLastReset = existingRateLimit.RequestLastReset } } + clone.ProviderConfigs[i].RateLimit.IsCalendarAligned = clone.CalendarAligned gs.rateLimits.Store(clone.ProviderConfigs[i].RateLimit.ID, clone.ProviderConfigs[i].RateLimit) } else { // Rate limit was removed from provider config, delete it from memory if it existed @@ -2402,6 +2412,7 @@ func (gs *LocalGovernanceStore) UpdateVirtualKeyInMemory(ctx context.Context, vk b.LastReset = existingBudget.LastReset } } + b.IsCalendarAligned = clone.CalendarAligned gs.budgets.Store(b.ID, b) } // Delete removed multi-budgets for this provider config @@ -2486,12 +2497,14 @@ func (gs *LocalGovernanceStore) CreateTeamInMemory(ctx context.Context, team *co // Create associated budgets if they exist for i := range team.Budgets { + team.Budgets[i].IsCalendarAligned = team.CalendarAligned b := team.Budgets[i] gs.budgets.Store(b.ID, &b) } // Create associated rate limit if exists if team.RateLimit != nil { + team.RateLimit.IsCalendarAligned = team.CalendarAligned gs.rateLimits.Store(team.RateLimit.ID, team.RateLimit) } @@ -2532,6 +2545,7 @@ func (gs *LocalGovernanceStore) UpdateTeamInMemory(ctx context.Context, team *co b.LastReset = lb.LastReset } } + b.IsCalendarAligned = clone.CalendarAligned gs.budgets.Store(b.ID, b) } for id := range existingBudgetIDs { @@ -2552,6 +2566,7 @@ func (gs *LocalGovernanceStore) UpdateTeamInMemory(ctx context.Context, team *co clone.RateLimit.RequestLastReset = existingRateLimit.RequestLastReset } } + clone.RateLimit.IsCalendarAligned = clone.CalendarAligned gs.rateLimits.Store(clone.RateLimit.ID, clone.RateLimit) // Clean up old rate limit if ID changed (e.g., UUID rotation on propagation) if existingTeam.RateLimit != nil && existingTeam.RateLimit.ID != clone.RateLimit.ID { diff --git a/plugins/logging/main.go b/plugins/logging/main.go index b843e290a8..02ecc07aef 100644 --- a/plugins/logging/main.go +++ b/plugins/logging/main.go @@ -528,6 +528,13 @@ func (p *LoggerPlugin) PreLLMHook(ctx *schemas.BifrostContext, req *schemas.Bifr case schemas.RealtimeRequest: if req.ResponsesRequest != nil { initialData.Params = req.ResponsesRequest.Params + if req.ResponsesRequest.Params != nil { + var tools []schemas.ChatTool + for _, tool := range req.ResponsesRequest.Params.Tools { + tools = append(tools, *tool.ToChatTool()) + } + initialData.Tools = tools + } } case schemas.EmbeddingRequest: initialData.Params = req.EmbeddingRequest.Params @@ -790,11 +797,6 @@ func (p *LoggerPlugin) PostLLMHook(ctx *schemas.BifrostContext, result *schemas. } pending := pendingVal.(*PendingLogData) - if requestType == schemas.RealtimeRequest { - if resolvedRealtimeSessionID := bifrost.GetStringFromContext(ctx, schemas.BifrostContextKeyRealtimeSessionID); resolvedRealtimeSessionID != "" { - pending.ParentRequestID = resolvedRealtimeSessionID - } - } // Should never happen, but just in case // Fallback to request type from pending data if request type is not set @@ -827,13 +829,29 @@ func (p *LoggerPlugin) PostLLMHook(ctx *schemas.BifrostContext, result *schemas. } // Extract routing engine logs from context before entering goroutine routingEngineLogs := formatRoutingEngineLogs(ctx.GetRoutingEngineLogs()) + if requestType == schemas.RealtimeRequest { + if resolvedRealtimeSessionID := bifrost.GetStringFromContext(ctx, schemas.BifrostContextKeyRealtimeSessionID); resolvedRealtimeSessionID != "" { + pending.ParentRequestID = resolvedRealtimeSessionID + } + pending.InitialData.Metadata = mergeRealtimeMetadata(pending.InitialData.Metadata, ctx) + if routingEngines, ok := ctx.Value(schemas.BifrostContextKeyRoutingEnginesUsed).([]string); ok { + pending.InitialData.RoutingEngineUsed = routingEngines + pending.RoutingEnginesUsed = routingEngines + } + } // Build the complete log entry with input (from PreLLMHook) + output (from PostLLMHook) entry := buildCompleteLogEntryFromPending(pending) - // Apply common output fields + // Apply common output fields. For cache hits, prefer the cache-serve + // latency stamped by the semantic cache plugin over the original provider + // latency preserved in the cached response. var latency int64 if result != nil { - latency = result.GetExtraFields().Latency + ef := result.GetExtraFields() + latency = ef.Latency + if ef.CacheDebug != nil && ef.CacheDebug.CacheHit && ef.CacheDebug.CacheHitLatency != nil { + latency = *ef.CacheDebug.CacheHitLatency + } } applyOutputFieldsToEntry(entry, selectedKeyID, selectedKeyName, virtualKeyID, virtualKeyName, routingRuleID, routingRuleName, selectedPromptID, selectedPromptName, selectedPromptVersion, teamID, teamName, customerID, customerName, userID, userName, businessUnitID, businessUnitName, numberOfRetries, latency, attemptTrail) entry.MetadataParsed = pending.InitialData.Metadata diff --git a/plugins/logging/operations.go b/plugins/logging/operations.go index d11310505c..83f6f4bc14 100644 --- a/plugins/logging/operations.go +++ b/plugins/logging/operations.go @@ -378,16 +378,6 @@ func (p *LoggerPlugin) applyStreamingOutputToEntry(entry *logstore.Log, streamRe entry.StopReason = streamResponse.Data.FinishReason } - // Cache - if streamResponse.Data.CacheDebug != nil { - entry.CacheDebugParsed = streamResponse.Data.CacheDebug - } - - // Finish/stop reason - always persist regardless of content logging settings - if streamResponse.Data.FinishReason != nil { - entry.StopReason = streamResponse.Data.FinishReason - } - // Passthrough status code if streamResponse.Data.PassthroughOutput != nil { if params, ok := entry.ParamsParsed.(*schemas.PassthroughLogParams); ok { diff --git a/plugins/logging/utils.go b/plugins/logging/utils.go index df9da1e573..b4a73bde1e 100644 --- a/plugins/logging/utils.go +++ b/plugins/logging/utils.go @@ -752,6 +752,8 @@ func mergeRealtimeMetadata(metadata map[string]interface{}, ctx *schemas.Bifrost set("provider_session_id", schemas.BifrostContextKeyRealtimeProviderSessionID) set("realtime_source", schemas.BifrostContextKeyRealtimeSource) set("realtime_event_type", schemas.BifrostContextKeyRealtimeEventType) + set("realtime_transport", schemas.BifrostContextKeyRealtimeTransport) + set("realtime_voice", schemas.BifrostContextKeyRealtimeVoice) if bifrost.GetStringFromContext(ctx, schemas.BifrostContextKeyRealtimeSessionID) != "" { if metadata == nil { metadata = make(map[string]interface{}) diff --git a/plugins/otel/converter.go b/plugins/otel/converter.go index 0d1e4e908f..2875aee0fc 100644 --- a/plugins/otel/converter.go +++ b/plugins/otel/converter.go @@ -3,6 +3,7 @@ package otel import ( "encoding/hex" "fmt" + "slices" "strings" "github.com/maximhq/bifrost/core/schemas" @@ -69,11 +70,81 @@ func hexToBytes(hexStr string, length int) []byte { return bytes } +// shouldExportSpan reports whether a span should be included in the export. +// Non-plugin spans are always exported. Plugin spans are checked against pluginSpanFilter. +func (p *OtelPlugin) shouldExportSpan(span *schemas.Span) bool { + if span.Kind != schemas.SpanKindPlugin || p.pluginSpanFilter == nil { + return true + } + // Span names follow the pattern "plugin..prehook" / "plugin..posthook". + parts := strings.SplitN(span.Name, ".", 3) + if len(parts) < 2 { + return true + } + pluginName := parts[1] + + inList := slices.Contains(p.pluginSpanFilter.Plugins, pluginName) + + if p.pluginSpanFilter.Mode == PluginSpanFilterModeInclude { + return inList + } + return !inList // exclude mode +} + +// buildReparentMap returns a map of filteredSpanID → effective ancestor spanID for all +// spans that will be skipped. When plugin spans are chained (each span's parent is the +// previous plugin's span), removing a span from the middle would leave its children with +// a dangling parent ID. The map lets us rewrite those parent IDs to the nearest exported +// ancestor, handling consecutive filtered spans in a chain. +func (p *OtelPlugin) buildReparentMap(spans []*schemas.Span) map[string]string { + if p.pluginSpanFilter == nil { + return nil + } + // First pass: record direct parent ID for every filtered span. + filtered := make(map[string]string) // spanID -> parentID + for _, span := range spans { + if !p.shouldExportSpan(span) { + filtered[span.SpanID] = span.ParentID + } + } + if len(filtered) == 0 { + return nil + } + // Second pass: resolve chains so each filtered span maps to its first exported ancestor. + // Cap the walk at len(filtered) to break out of any cycle caused by malformed span data. + maxHops := len(filtered) + for spanID := range filtered { + parentID := filtered[spanID] + for range maxHops { + grandParentID, isFiltered := filtered[parentID] + if !isFiltered { + break + } + parentID = grandParentID + } + filtered[spanID] = parentID + } + return filtered +} + // convertTraceToResourceSpan converts a Bifrost trace to OTEL ResourceSpan func (p *OtelPlugin) convertTraceToResourceSpan(trace *schemas.Trace) *ResourceSpan { + reparent := p.buildReparentMap(trace.Spans) otelSpans := make([]*Span, 0, len(trace.Spans)) for _, span := range trace.Spans { + if !p.shouldExportSpan(span) { + continue + } otelSpan := p.convertSpanToOTELSpan(trace.TraceID, span) + // If the span's direct parent was filtered, rewrite its parent ID to the + // nearest exported ancestor so the hierarchy stays connected. + if effectiveParent, ok := reparent[span.ParentID]; ok { + if effectiveParent == "" { + otelSpan.ParentSpanId = nil + } else { + otelSpan.ParentSpanId = hexToBytes(effectiveParent, 8) + } + } if span == trace.RootSpan { if requestID := trace.GetRequestID(); requestID != "" { otelSpan.Attributes = append(otelSpan.Attributes, kvStr(schemas.AttrRequestID, requestID)) diff --git a/plugins/otel/converter_test.go b/plugins/otel/converter_test.go new file mode 100644 index 0000000000..f9e8833b93 --- /dev/null +++ b/plugins/otel/converter_test.go @@ -0,0 +1,178 @@ +package otel + +import ( + "testing" + "time" + + "github.com/maximhq/bifrost/core/schemas" +) + +func makeSpan(id, parentID, name string, kind schemas.SpanKind) *schemas.Span { + return &schemas.Span{ + SpanID: id, + ParentID: parentID, + Name: name, + Kind: kind, + StartTime: time.Now(), + EndTime: time.Now(), + } +} + +func TestShouldExportSpan(t *testing.T) { + tests := []struct { + name string + filter *PluginSpanFilter + span *schemas.Span + want bool + }{ + { + name: "nil filter exports everything", + filter: nil, + span: makeSpan("1", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + want: true, + }, + { + name: "non-plugin span always exported regardless of filter", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging"}}, + span: makeSpan("1", "", "llm.call", schemas.SpanKindLLMCall), + want: true, + }, + { + name: "exclude mode: plugin in list is suppressed", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging", "compat"}}, + span: makeSpan("1", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + want: false, + }, + { + name: "exclude mode: plugin not in list is exported", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging"}}, + span: makeSpan("1", "", "plugin.governance.posthook", schemas.SpanKindPlugin), + want: true, + }, + { + name: "exclude mode: posthook variant suppressed the same as prehook", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging"}}, + span: makeSpan("1", "", "plugin.logging.posthook", schemas.SpanKindPlugin), + want: false, + }, + { + name: "include mode: plugin in list is exported", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeInclude, Plugins: []string{"guardrails"}}, + span: makeSpan("1", "", "plugin.guardrails.prehook", schemas.SpanKindPlugin), + want: true, + }, + { + name: "include mode: plugin not in list is suppressed", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeInclude, Plugins: []string{"guardrails"}}, + span: makeSpan("1", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + want: false, + }, + { + name: "exclude mode: empty list suppresses nothing", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{}}, + span: makeSpan("1", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + want: true, + }, + { + name: "include mode: empty list suppresses everything", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeInclude, Plugins: []string{}}, + span: makeSpan("1", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + want: false, + }, + { + name: "span name without dots passes through", + filter: &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging"}}, + span: makeSpan("1", "", "nodots", schemas.SpanKindPlugin), + want: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + p := &OtelPlugin{pluginSpanFilter: tt.filter} + if got := p.shouldExportSpan(tt.span); got != tt.want { + t.Errorf("shouldExportSpan() = %v, want %v", got, tt.want) + } + }) + } +} + +func TestBuildReparentMap(t *testing.T) { + excludeLogging := &PluginSpanFilter{Mode: PluginSpanFilterModeExclude, Plugins: []string{"logging"}} + + t.Run("nil filter returns nil map", func(t *testing.T) { + p := &OtelPlugin{pluginSpanFilter: nil} + spans := []*schemas.Span{makeSpan("a", "root", "plugin.logging.prehook", schemas.SpanKindPlugin)} + if m := p.buildReparentMap(spans); m != nil { + t.Errorf("expected nil, got %v", m) + } + }) + + t.Run("no filtered spans returns nil map", func(t *testing.T) { + p := &OtelPlugin{pluginSpanFilter: excludeLogging} + spans := []*schemas.Span{ + makeSpan("a", "root", "plugin.governance.prehook", schemas.SpanKindPlugin), + } + if m := p.buildReparentMap(spans); m != nil { + t.Errorf("expected nil, got %v", m) + } + }) + + t.Run("single filtered span maps to its direct parent", func(t *testing.T) { + p := &OtelPlugin{pluginSpanFilter: excludeLogging} + // root -> logging (filtered) -> governance + spans := []*schemas.Span{ + makeSpan("root", "", "request", schemas.SpanKindInternal), + makeSpan("log-pre", "root", "plugin.logging.prehook", schemas.SpanKindPlugin), + makeSpan("gov-pre", "log-pre", "plugin.governance.prehook", schemas.SpanKindPlugin), + } + m := p.buildReparentMap(spans) + if m == nil { + t.Fatal("expected non-nil map") + } + if got := m["log-pre"]; got != "root" { + t.Errorf("filtered span should map to parent 'root', got %q", got) + } + }) + + t.Run("chain of filtered spans resolves to nearest exported ancestor", func(t *testing.T) { + // root -> telemetry (filtered) -> logging (filtered) -> governance + p := &OtelPlugin{pluginSpanFilter: &PluginSpanFilter{ + Mode: PluginSpanFilterModeExclude, + Plugins: []string{"telemetry", "logging"}, + }} + spans := []*schemas.Span{ + makeSpan("root", "", "request", schemas.SpanKindInternal), + makeSpan("tel-pre", "root", "plugin.telemetry.prehook", schemas.SpanKindPlugin), + makeSpan("log-pre", "tel-pre", "plugin.logging.prehook", schemas.SpanKindPlugin), + makeSpan("gov-pre", "log-pre", "plugin.governance.prehook", schemas.SpanKindPlugin), + } + m := p.buildReparentMap(spans) + if m == nil { + t.Fatal("expected non-nil map") + } + // Both filtered spans must resolve to "root" so governance.prehook re-parents there. + if got := m["tel-pre"]; got != "root" { + t.Errorf("tel-pre should resolve to 'root', got %q", got) + } + if got := m["log-pre"]; got != "root" { + t.Errorf("log-pre should skip the chain and resolve to 'root', got %q", got) + } + }) + + t.Run("filtered span with no parent resolves to empty string", func(t *testing.T) { + p := &OtelPlugin{pluginSpanFilter: excludeLogging} + spans := []*schemas.Span{ + // logging span has no parent (root of trace) + makeSpan("log-pre", "", "plugin.logging.prehook", schemas.SpanKindPlugin), + makeSpan("gov-pre", "log-pre", "plugin.governance.prehook", schemas.SpanKindPlugin), + } + m := p.buildReparentMap(spans) + if m == nil { + t.Fatal("expected non-nil map") + } + if got := m["log-pre"]; got != "" { + t.Errorf("root-level filtered span should resolve to empty string, got %q", got) + } + }) +} diff --git a/plugins/otel/filter_test.go b/plugins/otel/filter_test.go new file mode 100644 index 0000000000..af9945b7be --- /dev/null +++ b/plugins/otel/filter_test.go @@ -0,0 +1,98 @@ +package otel + +import ( + "encoding/json" + "testing" +) + +// TestPluginSpanFilterUnmarshal verifies that plugin_span_filter round-trips +// through JSON correctly, including when embedded in a full Config. +func TestPluginSpanFilterUnmarshal(t *testing.T) { + tests := []struct { + name string + raw string + wantMode PluginSpanFilterMode + wantList []string + }{ + { + name: "exclude mode", + raw: `{"mode":"exclude","plugins":["logging","compat"]}`, + wantMode: PluginSpanFilterModeExclude, + wantList: []string{"logging", "compat"}, + }, + { + name: "include mode", + raw: `{"mode":"include","plugins":["guardrails"]}`, + wantMode: PluginSpanFilterModeInclude, + wantList: []string{"guardrails"}, + }, + { + name: "empty plugins list", + raw: `{"mode":"exclude","plugins":[]}`, + wantMode: PluginSpanFilterModeExclude, + wantList: []string{}, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + var f PluginSpanFilter + if err := json.Unmarshal([]byte(tt.raw), &f); err != nil { + t.Fatalf("unexpected unmarshal error: %v", err) + } + if f.Mode != tt.wantMode { + t.Errorf("Mode = %q, want %q", f.Mode, tt.wantMode) + } + if len(f.Plugins) != len(tt.wantList) { + t.Errorf("Plugins length = %d, want %d", len(f.Plugins), len(tt.wantList)) + return + } + for i, p := range tt.wantList { + if f.Plugins[i] != p { + t.Errorf("Plugins[%d] = %q, want %q", i, f.Plugins[i], p) + } + } + }) + } +} + +// TestConfigPluginSpanFilterField verifies that plugin_span_filter is correctly +// parsed when present inside a full Config JSON blob. +func TestConfigPluginSpanFilterField(t *testing.T) { + raw := `{ + "collector_url": "localhost:4317", + "trace_type": "genai_extension", + "protocol": "grpc", + "plugin_span_filter": { + "mode": "exclude", + "plugins": ["logging", "telemetry"] + } + }` + + var cfg Config + if err := json.Unmarshal([]byte(raw), &cfg); err != nil { + t.Fatalf("unexpected unmarshal error: %v", err) + } + if cfg.PluginSpanFilter == nil { + t.Fatal("expected PluginSpanFilter to be set") + } + if cfg.PluginSpanFilter.Mode != PluginSpanFilterModeExclude { + t.Errorf("Mode = %q, want %q", cfg.PluginSpanFilter.Mode, PluginSpanFilterModeExclude) + } + if len(cfg.PluginSpanFilter.Plugins) != 2 { + t.Errorf("Plugins length = %d, want 2", len(cfg.PluginSpanFilter.Plugins)) + } +} + +// TestConfigPluginSpanFilterAbsent verifies that omitting plugin_span_filter +// leaves Config.PluginSpanFilter as nil (no default applied). +func TestConfigPluginSpanFilterAbsent(t *testing.T) { + raw := `{"collector_url":"localhost:4317","trace_type":"genai_extension","protocol":"grpc"}` + var cfg Config + if err := json.Unmarshal([]byte(raw), &cfg); err != nil { + t.Fatalf("unexpected unmarshal error: %v", err) + } + if cfg.PluginSpanFilter != nil { + t.Errorf("expected PluginSpanFilter to be nil when absent, got %+v", cfg.PluginSpanFilter) + } +} diff --git a/plugins/otel/go.mod b/plugins/otel/go.mod index 949ff77425..de1a3fe054 100644 --- a/plugins/otel/go.mod +++ b/plugins/otel/go.mod @@ -11,6 +11,7 @@ require ( go.opentelemetry.io/otel/metric v1.43.0 go.opentelemetry.io/otel/sdk v1.43.0 go.opentelemetry.io/otel/sdk/metric v1.43.0 + go.opentelemetry.io/proto/otlp v1.10.0 google.golang.org/grpc v1.80.0 google.golang.org/protobuf v1.36.11 ) @@ -161,7 +162,6 @@ require ( github.com/cloudwego/base64x v0.1.6 // indirect github.com/klauspost/cpuid/v2 v2.3.0 // indirect github.com/twitchyliquid64/golang-asm v0.15.1 // indirect - go.opentelemetry.io/proto/otlp v1.10.0 golang.org/x/arch v0.23.0 // indirect golang.org/x/sys v0.42.0 // indirect ) diff --git a/plugins/otel/main.go b/plugins/otel/main.go index 3b7c47d0fb..6ca4d6634a 100644 --- a/plugins/otel/main.go +++ b/plugins/otel/main.go @@ -44,6 +44,21 @@ const ProtocolHTTP Protocol = "http" // ProtocolGRPC is the second protocol const ProtocolGRPC Protocol = "grpc" +// PluginSpanFilterMode controls whether the plugins list is an allowlist or denylist. +type PluginSpanFilterMode string + +const ( + PluginSpanFilterModeInclude PluginSpanFilterMode = "include" + PluginSpanFilterModeExclude PluginSpanFilterMode = "exclude" +) + +// PluginSpanFilter configures which plugin spans are exported to the OTEL collector. +// Mode "include" exports only the listed plugins; mode "exclude" exports everything except them. +type PluginSpanFilter struct { + Mode PluginSpanFilterMode `json:"mode"` + Plugins []string `json:"plugins"` +} + type Config struct { ServiceName string `json:"service_name"` CollectorURL string `json:"collector_url"` @@ -57,6 +72,10 @@ type Config struct { MetricsEnabled bool `json:"metrics_enabled"` MetricsEndpoint string `json:"metrics_endpoint"` MetricsPushInterval int `json:"metrics_push_interval"` // in seconds, default 15 + + // PluginSpanFilter is the DB-stored fallback when otel_plugin_span_filter is absent in config.json. + // The top-level config.json field takes precedence and is passed via Init's pluginSpanFilter param. + PluginSpanFilter *PluginSpanFilter `json:"plugin_span_filter,omitempty"` } // UnmarshalJSON applies field defaults that the zero-value wouldn't capture. @@ -105,6 +124,8 @@ type OtelPlugin struct { // Metrics push support metricsExporter *MetricsExporter + + pluginSpanFilter *PluginSpanFilter } // Init function for the OTEL plugin @@ -129,6 +150,14 @@ func Init(ctx context.Context, config *Config, _logger schemas.Logger, pricingMa } } } + if config.PluginSpanFilter != nil { + switch config.PluginSpanFilter.Mode { + case PluginSpanFilterModeInclude, PluginSpanFilterModeExclude: + default: + return nil, fmt.Errorf("plugin_span_filter.mode %q is invalid: must be %q or %q", + config.PluginSpanFilter.Mode, PluginSpanFilterModeInclude, PluginSpanFilterModeExclude) + } + } if config.ServiceName == "" { config.ServiceName = "bifrost" } @@ -169,6 +198,7 @@ func Init(ctx context.Context, config *Config, _logger schemas.Logger, pricingMa bifrostVersion: bifrostVersion, attributesFromEnvironment: attributesFromEnvironment, instanceAttrs: instanceAttrs, + pluginSpanFilter: config.PluginSpanFilter, } p.ctx, p.cancel = context.WithCancel(ctx) if config.Protocol == ProtocolGRPC { @@ -274,6 +304,14 @@ func (p *OtelPlugin) ValidateConfig(config any) (*Config, error) { if otelConfig.Protocol == "" { return nil, fmt.Errorf("protocol is required") } + if otelConfig.PluginSpanFilter != nil { + switch otelConfig.PluginSpanFilter.Mode { + case PluginSpanFilterModeInclude, PluginSpanFilterModeExclude: + default: + return nil, fmt.Errorf("plugin_span_filter.mode %q is invalid: must be %q or %q", + otelConfig.PluginSpanFilter.Mode, PluginSpanFilterModeInclude, PluginSpanFilterModeExclude) + } + } return &otelConfig, nil } diff --git a/plugins/semanticcache/config_unmarshal_test.go b/plugins/semanticcache/config_unmarshal_test.go index d38ad31fc6..5d2edd44f1 100644 --- a/plugins/semanticcache/config_unmarshal_test.go +++ b/plugins/semanticcache/config_unmarshal_test.go @@ -43,7 +43,6 @@ func TestUnmarshalJSON_AllFields(t *testing.T) { input := `{ "provider": "openai", "embedding_model": "text-embedding-3-small", - "cleanup_on_shutdown": true, "dimension": 1536, "ttl": "10m", "threshold": 0.9, @@ -66,9 +65,6 @@ func TestUnmarshalJSON_AllFields(t *testing.T) { if config.EmbeddingModel != "text-embedding-3-small" { t.Errorf("EmbeddingModel: expected %q, got %q", "text-embedding-3-small", config.EmbeddingModel) } - if !config.CleanUpOnShutdown { - t.Error("CleanUpOnShutdown: expected true") - } if config.Dimension != 1536 { t.Errorf("Dimension: expected 1536, got %d", config.Dimension) } @@ -136,32 +132,32 @@ func TestUnmarshalJSON_TTLFormats(t *testing.T) { func TestUnmarshalJSON_BoolPointerFields(t *testing.T) { tests := []struct { - name string - json string - expectCacheByModel *bool - expectCacheByProv *bool - expectExcludeSys *bool + name string + json string + expectCacheByModel *bool + expectCacheByProv *bool + expectExcludeSys *bool }{ { - name: "all set to true", - json: `{"dimension": 1536, "cache_by_model": true, "cache_by_provider": true, "exclude_system_prompt": true}`, - expectCacheByModel: bifrost.Ptr(true), - expectCacheByProv: bifrost.Ptr(true), - expectExcludeSys: bifrost.Ptr(true), + name: "all set to true", + json: `{"dimension": 1536, "cache_by_model": true, "cache_by_provider": true, "exclude_system_prompt": true}`, + expectCacheByModel: bifrost.Ptr(true), + expectCacheByProv: bifrost.Ptr(true), + expectExcludeSys: bifrost.Ptr(true), }, { - name: "all set to false", - json: `{"dimension": 1536, "cache_by_model": false, "cache_by_provider": false, "exclude_system_prompt": false}`, - expectCacheByModel: bifrost.Ptr(false), - expectCacheByProv: bifrost.Ptr(false), - expectExcludeSys: bifrost.Ptr(false), + name: "all set to false", + json: `{"dimension": 1536, "cache_by_model": false, "cache_by_provider": false, "exclude_system_prompt": false}`, + expectCacheByModel: bifrost.Ptr(false), + expectCacheByProv: bifrost.Ptr(false), + expectExcludeSys: bifrost.Ptr(false), }, { - name: "all omitted", - json: `{"dimension": 1536}`, - expectCacheByModel: nil, - expectCacheByProv: nil, - expectExcludeSys: nil, + name: "all omitted", + json: `{"dimension": 1536}`, + expectCacheByModel: nil, + expectCacheByProv: nil, + expectExcludeSys: nil, }, } diff --git a/plugins/semanticcache/main.go b/plugins/semanticcache/main.go index 235560f6a9..7020a51742 100644 --- a/plugins/semanticcache/main.go +++ b/plugins/semanticcache/main.go @@ -7,12 +7,9 @@ import ( "context" "encoding/json" "fmt" - "strconv" "sync" "time" - "github.com/google/uuid" - bifrost "github.com/maximhq/bifrost/core" "github.com/maximhq/bifrost/core/schemas" "github.com/maximhq/bifrost/framework/vectorstore" @@ -21,17 +18,23 @@ import ( // Config contains configuration for the semantic cache plugin. // The VectorStore abstraction handles the underlying storage implementation and its defaults. // Only specify values you want to override from the semantic cache defaults. +// +// Modes: +// - Semantic mode: set Provider + EmbeddingModel + Dimension > 0. Both direct +// hash matching and embedding-based similarity search are enabled. +// - Direct-only mode: set Provider="" and Dimension=1. The plugin disables +// semantic search entirely; cache lookups go through the deterministic +// direct hash path. Dimension=1 keeps stores that require a vector happy. type Config struct { // Embedding Model settings - REQUIRED for semantic caching Provider schemas.ModelProvider `json:"provider"` EmbeddingModel string `json:"embedding_model,omitempty"` // Model to use for generating embeddings (optional) // Plugin behavior settings - CleanUpOnShutdown bool `json:"cleanup_on_shutdown,omitempty"` // Clean up cache on shutdown (default: false) TTL time.Duration `json:"ttl,omitempty"` // Time-to-live for cached responses (default: 5min) - Threshold float64 `json:"threshold,omitempty"` // Cosine similarity threshold for semantic matching (default: 0.8) + Threshold float64 `json:"threshold,omitempty"` // Cosine similarity threshold for semantic matching (0 = unset → default 0.8) VectorStoreNamespace string `json:"vector_store_namespace,omitempty"` // Namespace for vector store (optional) - Dimension int `json:"dimension"` // Dimension for vector store + Dimension int `json:"dimension"` // Dimension for vector store (must be > 0 when Provider is set; use 1 for direct-only mode) // Advanced caching behavior DefaultCacheKey string `json:"default_cache_key,omitempty"` // Default cache key used when no per-request key is provided (optional, caching is disabled when empty and no per-request key is set) @@ -41,117 +44,129 @@ type Config struct { ExcludeSystemPrompt *bool `json:"exclude_system_prompt,omitempty"` // Exclude system prompt in cache key (default: false) } -// UnmarshalJSON implements custom JSON unmarshaling for semantic cache Config. -// It supports TTL parsing from both string durations ("1m", "1hr") and numeric seconds for configurable cache behavior. +// UnmarshalJSON implements custom JSON unmarshaling for Config so TTL accepts +// either a duration string ("1m", "1h") or a JSON number (seconds). All other +// fields decode through the default path via a type alias, so adding a new +// field on Config does not require touching this method. func (c *Config) UnmarshalJSON(data []byte) error { - // Define a temporary struct to avoid infinite recursion - type TempConfig struct { - Provider string `json:"provider"` - EmbeddingModel string `json:"embedding_model,omitempty"` - CleanUpOnShutdown bool `json:"cleanup_on_shutdown,omitempty"` - Dimension int `json:"dimension"` - TTL interface{} `json:"ttl,omitempty"` - Threshold float64 `json:"threshold,omitempty"` - VectorStoreNamespace string `json:"vector_store_namespace,omitempty"` - DefaultCacheKey string `json:"default_cache_key,omitempty"` - ConversationHistoryThreshold int `json:"conversation_history_threshold,omitempty"` - CacheByModel *bool `json:"cache_by_model,omitempty"` - CacheByProvider *bool `json:"cache_by_provider,omitempty"` - ExcludeSystemPrompt *bool `json:"exclude_system_prompt,omitempty"` - } - - var temp TempConfig - if err := json.Unmarshal(data, &temp); err != nil { + // alias suppresses Config's UnmarshalJSON to avoid infinite recursion. + // The outer TTL (json.RawMessage) shadows alias.TTL because the json + // package picks the shallower field on a name conflict. + type alias Config + aux := &struct { + TTL json.RawMessage `json:"ttl,omitempty"` + *alias + }{alias: (*alias)(c)} + if err := json.Unmarshal(data, aux); err != nil { return fmt.Errorf("failed to unmarshal config: %w", err) } - // Set simple fields - c.Provider = schemas.ModelProvider(temp.Provider) - c.EmbeddingModel = temp.EmbeddingModel - c.CleanUpOnShutdown = temp.CleanUpOnShutdown - c.Dimension = temp.Dimension - c.CacheByModel = temp.CacheByModel - c.CacheByProvider = temp.CacheByProvider - c.VectorStoreNamespace = temp.VectorStoreNamespace - c.ConversationHistoryThreshold = temp.ConversationHistoryThreshold - c.Threshold = temp.Threshold - c.DefaultCacheKey = temp.DefaultCacheKey - c.ExcludeSystemPrompt = temp.ExcludeSystemPrompt - // Handle TTL field with custom parsing for VectorStore-backed cache behavior - if temp.TTL != nil { - switch v := temp.TTL.(type) { - case string: - // Try parsing as duration string (e.g., "1m", "1hr") for semantic cache TTL - duration, err := time.ParseDuration(v) - if err != nil { - return fmt.Errorf("failed to parse TTL duration string '%s': %w", v, err) - } - c.TTL = duration - case int: - // Handle integer seconds for semantic cache TTL - c.TTL = time.Duration(v) * time.Second - default: - // Try converting to string and parsing as number for semantic cache TTL - ttlStr := fmt.Sprintf("%v", v) - if seconds, err := strconv.ParseFloat(ttlStr, 64); err == nil { - c.TTL = time.Duration(seconds * float64(time.Second)) - } else { - return fmt.Errorf("unsupported TTL type: %T (value: %v)", v, v) - } - } + if len(aux.TTL) == 0 || string(aux.TTL) == "null" { + return nil } + // Try string first ("1m"); fall back to a JSON number (seconds). + var s string + if err := json.Unmarshal(aux.TTL, &s); err == nil { + d, err := time.ParseDuration(s) + if err != nil { + return fmt.Errorf("failed to parse TTL duration string '%s': %w", s, err) + } + c.TTL = d + } else { + var seconds float64 + if err := json.Unmarshal(aux.TTL, &seconds); err != nil { + return fmt.Errorf("unsupported TTL value: %s", string(aux.TTL)) + } + c.TTL = time.Duration(seconds * float64(time.Second)) + } + if c.TTL < 0 { + return fmt.Errorf("TTL must be non-negative, got %v", c.TTL) + } return nil } -// StreamChunk represents a single chunk from a streaming response +// StreamChunk is one chunk from a streaming response, retained until the +// stream completes so it can be persisted as part of the cache entry. type StreamChunk struct { - Timestamp time.Time // When chunk was received - Response *schemas.BifrostResponse // The actual response chunk - FinishReason *string // If this is the final chunk + // Timestamp records when this chunk arrived at PostLLMHook. Used by the + // reaper to drop accumulators stuck without a final chunk. + Timestamp time.Time + // Response is the chunk payload as delivered by the provider. + Response *schemas.BifrostResponse } -// StreamAccumulator manages accumulation of streaming chunks for caching +// StreamAccumulator collects the chunks of a single streaming response so +// they can be flushed as one cache entry on the final chunk. type StreamAccumulator struct { - RequestID string // The request ID - StorageID string // The final cache entry ID - Chunks []*StreamChunk // All chunks for this stream - IsComplete bool // Whether the stream is complete - HasError bool // Whether any chunk in the stream had an error - FinalTimestamp time.Time // When the stream completed - Embedding []float32 // Embedding for the original request - Metadata map[string]any // Metadata for caching - TTL time.Duration // TTL for this cache entry - mu sync.Mutex // Protects chunk operations + // mu serializes Chunks/IsComplete updates across the per-chunk PostLLMHook + // invocations and the periodic reaper. + mu sync.Mutex + // RequestID is the BifrostContext request ID this accumulator is keyed by. + RequestID string + // StorageID is the cache entry ID the accumulated stream will be written under. + StorageID string + // Chunks holds every chunk seen so far, in arrival order. + Chunks []*StreamChunk + // LastSeenAt records the arrival time of the most recent chunk. The reaper + // uses this so a long-running stream isn't evicted mid-flight; first-chunk + // time alone would falsely flag still-active streams as abandoned. + LastSeenAt time.Time + // IsComplete is set when the final chunk has been observed; further final + // chunks are no-ops to keep flush idempotent. + IsComplete bool + // Embedding is the request embedding to attach to the cache entry, or nil + // for direct-only writes. + Embedding []float32 + // Metadata is the unified metadata captured at first-chunk time and reused + // at flush. expires_at is locked in here, so TTL is fixed at first chunk. + Metadata map[string]any + // TTL is retained for symmetry with Metadata; the effective expiry is the + // expires_at value already baked into Metadata. + TTL time.Duration } -// EmbeddingRequestExecutor is a function that executes a request and returns a response and an error. -// It maps to .EmbeddingRequest() of the bifrost client. +// EmbeddingRequestExecutor invokes the embedding endpoint on the bifrost +// client. The plugin calls it on cache misses to compute the request +// embedding for semantic similarity search and storage. It mirrors the +// signature of bifrost.Client.EmbeddingRequest. type EmbeddingRequestExecutor func(ctx *schemas.BifrostContext, req *schemas.BifrostEmbeddingRequest) (*schemas.BifrostEmbeddingResponse, *schemas.BifrostError) -// Plugin implements the schemas.LLMPlugin interface for semantic caching. -// It caches responses using a two-tier approach: direct hash matching for exact requests -// and semantic similarity search for related content. The plugin supports configurable caching behavior -// via the VectorStore abstraction, including TTL management and streaming response handling. -// -// Fields: -// - store: VectorStore instance for semantic cache operations -// - config: Plugin configuration including semantic cache and caching settings -// - logger: Logger instance for plugin operations +// Plugin implements schemas.LLMPlugin for semantic caching. It serves cached +// responses via two complementary lookup paths: a direct O(1) hash match on +// (provider, model, cache_key, request_hash, params_hash) for exact replays, +// and an embedding-based similarity search for semantically related content. +// Streaming responses are accumulated chunk-by-chunk and stored as a single +// entry on the final chunk; TTL bookkeeping is per-entry via expires_at. type Plugin struct { store vectorstore.VectorStore config *Config logger schemas.Logger embeddingRequestExecutor EmbeddingRequestExecutor - streamAccumulators sync.Map // Track stream accumulators by request ID - waitGroup sync.WaitGroup + // streamAccumulators maps request ID → its in-progress *StreamAccumulator. + streamAccumulators sync.Map + // cacheStates maps request ID → its *cacheState (see state.go) for the + // span between PreLLMHook and PostLLMHook. + cacheStates sync.Map + // writersWg tracks short-lived per-request goroutines (the async cache + // writes spawned in PostLLMHook). WaitForPendingOperations blocks on this + // — tests use it to flush writes before asserting on the store. + writersWg sync.WaitGroup + // cleanupWg tracks the long-running background loops (stream + cacheState + // reapers). Only Cleanup blocks on this, after closing stopCh. + cleanupWg sync.WaitGroup + // stopCh is closed by Cleanup to signal the background reaper loops to exit. + stopCh chan struct{} + // cleanupOnce guards Cleanup so close(stopCh) doesn't panic if the harness + // invokes Cleanup more than once (e.g. plugin registered against multiple + // interface caches). + cleanupOnce sync.Once } // Plugin constants const ( PluginName string = "semantic_cache" DefaultVectorStoreNamespace string = "BifrostSemanticCachePlugin" - PluginLoggerPrefix string = "[Semantic Cache]" CacheConnectionTimeout time.Duration = 5 * time.Second CreateNamespaceTimeout time.Duration = 30 * time.Second CacheSetTimeout time.Duration = 30 * time.Second @@ -160,13 +175,14 @@ const ( DefaultConversationHistoryThreshold int = 3 ) -var SelectFields = []string{"request_hash", "response", "stream_chunks", "expires_at", "cache_key", "provider", "model"} +// SelectFields enumerates the properties projected back from the vector store +// on a cache hit. params_hash and from_bifrost_semantic_cache_plugin are +// filter-only (used in WHERE-style queries to narrow matches) and intentionally +// omitted from this projection — keep them defined in VectorStoreProperties +// below so the store creates the columns/indexes, but don't fetch them. +var SelectFields = []string{"response", "stream_chunks", "expires_at", "cache_key", "provider", "model"} var VectorStoreProperties = map[string]vectorstore.VectorStoreProperties{ - "request_hash": { - DataType: vectorstore.VectorStorePropertyTypeString, - Description: "The hash of the request", - }, "response": { DataType: vectorstore.VectorStorePropertyTypeString, Description: "The response from the provider", @@ -201,24 +217,15 @@ var VectorStoreProperties = map[string]vectorstore.VectorStoreProperties{ }, } +// Per-request context keys. Callers set these on BifrostContext before the +// request enters Bifrost; the plugin reads them in Pre/PostLLMHook. CacheKey +// (or Config.DefaultCacheKey) is the only one required for caching to engage. const ( - CacheKey schemas.BifrostContextKey = "semantic_cache_key" // To set the cache key for a request - REQUIRED for all requests - CacheTTLKey schemas.BifrostContextKey = "semantic_cache_ttl" // To explicitly set the TTL for a request - CacheThresholdKey schemas.BifrostContextKey = "semantic_cache_threshold" // To explicitly set the threshold for a request - CacheTypeKey schemas.BifrostContextKey = "semantic_cache_cache_type" // To explicitly set the cache type for a request - CacheNoStoreKey schemas.BifrostContextKey = "semantic_cache_no_store" // To explicitly disable storing the response in the cache - - // context keys for internal usage - requestIDKey schemas.BifrostContextKey = "semantic_cache_request_id" - requestStorageIDKey schemas.BifrostContextKey = "semantic_cache_request_storage_id" - requestHashKey schemas.BifrostContextKey = "semantic_cache_request_hash" - requestEmbeddingKey schemas.BifrostContextKey = "semantic_cache_embedding" - requestEmbeddingTokensKey schemas.BifrostContextKey = "semantic_cache_embedding_tokens" - requestParamsHashKey schemas.BifrostContextKey = "semantic_cache_params_hash" - requestModelKey schemas.BifrostContextKey = "semantic_cache_model" - requestProviderKey schemas.BifrostContextKey = "semantic_cache_provider" - isCacheHitKey schemas.BifrostContextKey = "semantic_cache_is_cache_hit" - cacheHitTypeKey schemas.BifrostContextKey = "semantic_cache_cache_hit_type" + CacheKey schemas.BifrostContextKey = "semantic_cache-key" // String. Required (or DefaultCacheKey) — bucket entries under a tenant/feature scope. + CacheTTLKey schemas.BifrostContextKey = "semantic_cache-ttl" // time.Duration. Per-request override of Config.TTL. + CacheThresholdKey schemas.BifrostContextKey = "semantic_cache-threshold" // float64. Per-request override of the semantic similarity threshold. + CacheTypeKey schemas.BifrostContextKey = "semantic_cache-cache_type" // CacheType. Narrow lookup to a single path (direct or semantic). + CacheNoStoreKey schemas.BifrostContextKey = "semantic_cache-no_store" // bool. Skip writing the response to cache (still served from cache on hit). ) type CacheType string @@ -228,20 +235,12 @@ const ( CacheTypeSemantic CacheType = "semantic" ) -// Init creates a new semantic cache plugin instance with the provided configuration. -// It uses the VectorStore abstraction for cache operations and returns a configured plugin. -// -// The VectorStore handles the underlying storage implementation and its defaults. -// The plugin only sets defaults for its own behavior (TTL, cache key generation, etc.). +// Init validates the configuration, creates the namespace in the underlying +// VectorStore, starts the background reaper goroutines, and returns a plugin +// ready to be wired into the Bifrost plugin pipeline. // -// Parameters: -// - config: Semantic cache and plugin configuration (CacheKey is required) -// - logger: Logger instance for the plugin -// - store: VectorStore instance for cache operations -// -// Returns: -// - schemas.LLMPlugin: A configured semantic cache plugin instance -// - error: Any error that occurred during plugin initialization +// Note: Init mutates *config in place to fill in defaults — TTL, Threshold, +// CacheBy* — so the caller sees the resolved values after this returns. func Init(ctx context.Context, config *Config, logger schemas.Logger, store vectorstore.VectorStore) (schemas.LLMPlugin, error) { if config == nil { return nil, fmt.Errorf("config is required") @@ -249,43 +248,51 @@ func Init(ctx context.Context, config *Config, logger schemas.Logger, store vect if store == nil { return nil, fmt.Errorf("store is required") } + if config.Dimension < 0 { + return nil, fmt.Errorf("dimension must be non-negative, got %d", config.Dimension) + } + if config.Provider != "" && config.Dimension <= 0 { + return nil, fmt.Errorf("dimension must be > 0 when provider is set (got dimension=%d, provider=%q)", config.Dimension, config.Provider) + } // Set plugin-specific defaults if config.VectorStoreNamespace == "" { - logger.Debug(PluginLoggerPrefix + " Vector store namespace is not set, using default of " + DefaultVectorStoreNamespace) + logger.Debug("Vector store namespace is not set, using default of %s", DefaultVectorStoreNamespace) config.VectorStoreNamespace = DefaultVectorStoreNamespace } if config.TTL == 0 { - logger.Debug(PluginLoggerPrefix + " TTL is not set, using default of 5 minutes") + logger.Debug("TTL is not set, using default of %v", DefaultCacheTTL) config.TTL = DefaultCacheTTL } if config.Threshold == 0 { - logger.Debug(PluginLoggerPrefix + " Threshold is not set, using default of " + strconv.FormatFloat(DefaultCacheThreshold, 'f', -1, 64)) + logger.Debug("Threshold is not set, using default of %v", DefaultCacheThreshold) config.Threshold = DefaultCacheThreshold } if config.ConversationHistoryThreshold == 0 { - logger.Debug(PluginLoggerPrefix + " Conversation history threshold is not set, using default of " + strconv.Itoa(DefaultConversationHistoryThreshold)) + logger.Debug("Conversation history threshold is not set, using default of %d", DefaultConversationHistoryThreshold) config.ConversationHistoryThreshold = DefaultConversationHistoryThreshold } // Set cache behavior defaults if config.CacheByModel == nil { + logger.Debug("CacheByModel is not set, defaulting to true") config.CacheByModel = new(true) } if config.CacheByProvider == nil { + logger.Debug("CacheByProvider is not set, defaulting to true") config.CacheByProvider = new(true) } plugin := &Plugin{ - store: store, - config: config, - logger: logger, - waitGroup: sync.WaitGroup{}, + store: store, + config: config, + logger: logger, + stopCh: make(chan struct{}), } if config.Provider == "" && config.Dimension == 1 { - logger.Info(PluginLoggerPrefix + " Starting in direct-only mode (dimension=1, no embedding provider)") + logger.Info("Starting in direct-only mode (dimension=1, no embedding provider)") } else if config.Provider == "" { - logger.Warn(PluginLoggerPrefix + " Incomplete semantic mode config: missing provider, falling back to direct search only") + logger.Warn("Incomplete semantic mode config: missing provider, falling back to direct search only") } createCtx, cancel := context.WithTimeout(ctx, CreateNamespaceTimeout) @@ -294,382 +301,312 @@ func Init(ctx context.Context, config *Config, logger schemas.Logger, store vect return nil, fmt.Errorf("failed to create namespace for semantic cache: %w", err) } + plugin.cleanupWg.Add(1) + go plugin.runStreamCleanupLoop() + + plugin.cleanupWg.Add(1) + go plugin.runCacheStateCleanupLoop() + return plugin, nil } -// GetName returns the canonical name of the semantic cache plugin. -// This name is used for plugin identification and logging purposes. -// -// Returns: -// - string: The plugin name for semantic cache +// GetName returns the canonical name used for plugin identification and logging. func (plugin *Plugin) GetName() string { return PluginName } -// HTTPTransportPreHook is not used for this plugin +// HTTPTransportPreHook is not used by the semantic cache plugin. func (plugin *Plugin) HTTPTransportPreHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest) (*schemas.HTTPResponse, error) { return nil, nil } -// HTTPTransportPostHook is not used for this plugin +// HTTPTransportPostHook is not used by the semantic cache plugin. func (plugin *Plugin) HTTPTransportPostHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, resp *schemas.HTTPResponse) error { return nil } -// HTTPTransportStreamChunkHook passes through streaming chunks unchanged +// HTTPTransportStreamChunkHook passes streaming chunks through unchanged. func (plugin *Plugin) HTTPTransportStreamChunkHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, chunk *schemas.BifrostStreamChunk) (*schemas.BifrostStreamChunk, error) { return chunk, nil } -// PreLLMHook is called before a request is processed by Bifrost. -// It performs a two-stage cache lookup: first direct hash matching, then semantic similarity search. -// Uses UUID-based keys for entries stored in the VectorStore. -// -// Parameters: -// - ctx: Pointer to the schemas.BifrostContext -// - req: The incoming Bifrost request -// -// Returns: -// - *schemas.BifrostRequest: The original request -// - *schemas.BifrostResponse: Cached response if found, nil otherwise -// - error: Any error that occurred during cache lookup +// PreLLMHook performs the cache lookup before the request reaches the +// provider. It runs the direct hash path first (cheapest), falls back to +// semantic similarity search when configured, and short-circuits the +// pipeline with a cached response on hit. On miss, it leaves per-request +// state on the plugin keyed by request ID for PostLLMHook to consume when +// the upstream response arrives. func (plugin *Plugin) PreLLMHook(ctx *schemas.BifrostContext, req *schemas.BifrostRequest) (*schemas.BifrostRequest, *schemas.LLMPluginShortCircuit, error) { - provider, model, _ := req.GetRequestFields() - // Get the cache key from the context - var cacheKey string - var ok bool - - cacheKey, ok = ctx.Value(CacheKey).(string) - if !ok || cacheKey == "" { - if plugin.config.DefaultCacheKey != "" { - cacheKey = plugin.config.DefaultCacheKey - plugin.logger.Debug(PluginLoggerPrefix + " Using default cache key: " + cacheKey) - } else { - plugin.logger.Debug(PluginLoggerPrefix + " No cache key found in context, continuing without caching") - return req, nil, nil - } + cacheKey, ok := plugin.resolveCacheKey(ctx) + if !ok { + return req, nil, nil } - // Clear request-scoped semantic cache state up front in case the context is reused. - plugin.clearRequestScopedContext(ctx) + // Without a request ID we have nowhere to anchor per-request state. The + // framework always stamps this before plugin hooks run; direct callers + // (tests, custom integrations) must set it too. + requestID, ok := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + if !ok || requestID == "" { + return req, nil, nil + } if !isSemanticCacheSupportedRequestType(req.RequestType) { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping caching for unsupported request type: " + string(req.RequestType)) return req, nil, nil } - if plugin.isConversationHistoryThresholdExceeded(req) { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping caching for request with conversation history threshold exceeded") + // Create state up front so a reused/retried request ID never inherits stale fields. + state := plugin.createCacheState(requestID) + + if plugin.isConversationHistoryThresholdExceeded(state, req) { + plugin.clearCacheState(requestID) return req, nil, nil } - // Generate UUID for this request - requestID := uuid.New().String() - - // Store request ID, model, and provider in context for PostLLMHook - ctx.SetValue(requestIDKey, requestID) - ctx.SetValue(requestModelKey, model) - ctx.SetValue(requestProviderKey, provider) + performDirectSearch, performSemanticSearch := plugin.resolveCacheTypes(ctx) + + // If neither search path can produce a lookup in the current plugin + // configuration, skip caching entirely (no read, no write). Concretely: + // - x-bf-cache-type=semantic against a direct-only plugin (Provider="", + // Dimension=1) — generateEmbedding would fail with "provider is + // required", PostLLMHook would still write an orphan entry under a + // random request UUID that no future read can find. + // - x-bf-cache-type=direct against a misconfigured semantic-only plugin + // where direct search is disabled. + // - An unknown cache-type header value (resolveCacheTypes returns false + // for both paths). + // The embedding executor alone isn't a sufficient gate — the framework + // wires it on every plugin, but the plugin's config decides whether + // semantic search is actually viable. + canDoSemanticSearch := plugin.embeddingRequestExecutor != nil && + plugin.config.Provider != "" && + plugin.config.EmbeddingModel != "" && + plugin.config.Dimension > 1 && + req.EmbeddingRequest == nil && + req.TranscriptionRequest == nil + if !performDirectSearch && (!performSemanticSearch || !canDoSemanticSearch) { + plugin.clearCacheState(requestID) + msg := "skipping cache: no search path available for this request (cache_type narrowed to a path that the current plugin configuration cannot serve)" + plugin.logger.Warn(msg) + ctx.Log(schemas.LogLevelWarn, msg) + return req, nil, nil + } - performDirectSearch, performSemanticSearch := true, true - if ctx.Value(CacheTypeKey) != nil { - cacheTypeVal, ok := ctx.Value(CacheTypeKey).(CacheType) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Cache type is not a CacheType, using all available cache types") - } else { - performDirectSearch = cacheTypeVal == CacheTypeDirect - performSemanticSearch = cacheTypeVal == CacheTypeSemantic - } + // Compute metadata + paramsHash once and reuse across both search paths. + metadata, err := plugin.buildRequestMetadataForCaching(state, req) + if err != nil { + plugin.clearCacheState(requestID) + plugin.logger.Debug("metadata build failed, caching disabled for this request: %v", err) + return req, nil, nil } + paramsHash, err := hashMap(metadata) + if err != nil { + plugin.clearCacheState(requestID) + plugin.logger.Debug("params hash failed, caching disabled for this request: %v", err) + return req, nil, nil + } + state.ParamsHash = paramsHash if performDirectSearch { - shortCircuit, err := plugin.performDirectSearch(ctx, req, cacheKey) + shortCircuit, err := plugin.performDirectSearch(ctx, state, req, cacheKey, metadata, paramsHash) if err != nil { - plugin.logger.Warn(PluginLoggerPrefix + " Direct search failed: " + err.Error() + " (" + describeRequestShape(req) + ")") - // Don't return - continue to semantic search fallback - shortCircuit = nil // Ensure we don't use an invalid shortCircuit - } - - if shortCircuit != nil { + msg := fmt.Sprintf("direct search failed (vector store unreachable?): %v", err) + plugin.logger.Warn(msg) + ctx.Log(schemas.LogLevelWarn, msg) + } else if shortCircuit != nil { return req, shortCircuit, nil } } - if performSemanticSearch && plugin.embeddingRequestExecutor != nil { - if req.EmbeddingRequest != nil || req.TranscriptionRequest != nil { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping semantic search for embedding/transcription input") - // For vector stores that require vectors, set a zero vector placeholder - // This allows direct hash matching to work without the overhead of generating embeddings - if plugin.store.RequiresVectors() && plugin.config.Dimension > 0 { - zeroVector := make([]float32, plugin.config.Dimension) - ctx.SetValue(requestEmbeddingKey, zeroVector) - plugin.logger.Debug(PluginLoggerPrefix + " Using zero vector placeholder for embedding/transcription request storage") - } - return req, nil, nil - } - - // Try semantic search as fallback - shortCircuit, err := plugin.performSemanticSearch(ctx, req, cacheKey) - if err != nil { - plugin.logger.Debug(PluginLoggerPrefix + " Semantic search skipped: " + err.Error() + " (" + describeRequestShape(req) + ")") - return req, nil, nil - } - - if shortCircuit != nil { - return req, shortCircuit, nil - } - } else if !performSemanticSearch && plugin.store.RequiresVectors() && plugin.embeddingRequestExecutor != nil { - // Vector store requires vectors but we're in direct-only mode - // Generate embeddings for storage purposes (not for searching) - if req.EmbeddingRequest != nil || req.TranscriptionRequest != nil { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping embedding generation for embedding/transcription input") - // For vector stores that require vectors, set a zero vector placeholder - // This allows direct hash matching to work without the overhead of generating embeddings - if plugin.config.Dimension > 0 { - zeroVector := make([]float32, plugin.config.Dimension) - ctx.SetValue(requestEmbeddingKey, zeroVector) - plugin.logger.Debug(PluginLoggerPrefix + " Using zero vector placeholder for embedding/transcription request storage") + if performSemanticSearch { + // Reuse canDoSemanticSearch so the default cache-type path (both flags + // true) applies the same provider/model/dimension gate as the explicit + // semantic-only path — otherwise a misconfigured plugin wastes one + // generateEmbedding round-trip per request before failing downstream. + if !canDoSemanticSearch { + plugin.setZeroVectorIfRequired(state) + } else { + shortCircuit, err := plugin.performSemanticSearch(ctx, state, req, cacheKey, paramsHash) + if err != nil { + // Embedding failures (rate-limit, auth, timeout) are + // operationally important — surface at Warn and on the response. + msg := fmt.Sprintf("semantic search skipped: %v", err) + plugin.logger.Warn(msg) + ctx.Log(schemas.LogLevelWarn, msg) + } else if shortCircuit != nil { + return req, shortCircuit, nil } - return req, nil, nil - } - - // Use zero vector for direct-only cache type to prevent semantic search matches - // This preserves cache type isolation - direct-only entries won't be found by semantic search - if plugin.config.Dimension > 0 { - zeroVector := make([]float32, plugin.config.Dimension) - ctx.SetValue(requestEmbeddingKey, zeroVector) - plugin.logger.Debug(PluginLoggerPrefix + " Using zero vector for direct-only cache storage (preserves isolation)") } + } else if !performSemanticSearch { + // Direct-only mode. If the vector store requires vectors for every entry + // (Qdrant, Pinecone) we write a zero vector. Note: this collapses all + // direct-only entries onto the same point in vector space, so a + // semantic search across cache types under the same cache_key/params + // could surface them. params_hash filtering is the actual isolation. + plugin.setZeroVectorIfRequired(state) } return req, nil, nil } -// PostLLMHook is called after a response is received from a provider. -// It caches responses in the VectorStore using UUID-based keys with unified metadata structure -// including provider, model, request hash, and TTL. Handles both single and streaming responses. -// -// The function performs the following operations: -// 1. Checks configurable caching behavior and skips caching for unsuccessful responses if configured -// 2. Retrieves the request hash and ID from the context (set during PreLLMHook) -// 3. Marshals the response for storage -// 4. Stores the unified cache entry in the VectorStore asynchronously (non-blocking) -// -// The VectorStore Add operation runs in a separate goroutine to avoid blocking the response. -// The function gracefully handles errors and continues without caching if any step fails, -// ensuring that response processing is never interrupted by caching issues. -// -// Parameters: -// - ctx: Pointer to the schemas.BifrostContext containing the request hash and ID -// - res: The response from the provider to be cached -// - bifrostErr: The error from the provider, if any (used for success determination) -// -// Returns: -// - *schemas.BifrostResponse: The original response, unmodified -// - *schemas.BifrostError: The original error, unmodified -// - error: Any error that occurred during caching preparation (always nil as errors are handled gracefully) -func (plugin *Plugin) PostLLMHook(ctx *schemas.BifrostContext, res *schemas.BifrostResponse, bifrostErr *schemas.BifrostError) (*schemas.BifrostResponse, *schemas.BifrostError, error) { - if bifrostErr != nil { - return res, bifrostErr, nil +// resolveCacheKey returns the per-request cache key (or the configured default) +// and a bool indicating whether the caller should proceed with caching. +func (plugin *Plugin) resolveCacheKey(ctx *schemas.BifrostContext) (string, bool) { + if cacheKey, ok := ctx.Value(CacheKey).(string); ok && cacheKey != "" { + return cacheKey, true } - - // Skip caching for large payloads — body is too large to materialize for cache storage - if isLargePayload, ok := ctx.Value(schemas.BifrostContextKeyLargePayloadMode).(bool); ok && isLargePayload { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping semantic cache for large payload request") - return res, nil, nil - } - if isLargeResponse, ok := ctx.Value(schemas.BifrostContextKeyLargeResponseMode).(bool); ok && isLargeResponse { - plugin.logger.Debug(PluginLoggerPrefix + " Skipping semantic cache for large payload response") - return res, nil, nil + if plugin.config.DefaultCacheKey != "" { + return plugin.config.DefaultCacheKey, true } + return "", false +} - isCacheHit := ctx.Value(isCacheHitKey) - if isCacheHit != nil { - isCacheHitValue, ok := isCacheHit.(bool) - if ok && isCacheHitValue { - return res, nil, nil - } +// resolveCacheTypes returns whether direct and semantic search paths should +// run for this request. Defaults both to true; an explicit CacheTypeKey on +// the context narrows to just one. +func (plugin *Plugin) resolveCacheTypes(ctx *schemas.BifrostContext) (direct bool, semantic bool) { + direct, semantic = true, true + ctxVal := ctx.Value(CacheTypeKey) + if ctxVal == nil { + return } + cacheTypeVal, ok := ctxVal.(CacheType) + if !ok { + msg := fmt.Sprintf("CacheTypeKey is not a CacheType (got %T), using all available cache types", ctxVal) + plugin.logger.Warn(msg) + ctx.Log(schemas.LogLevelWarn, msg) + return + } + direct = cacheTypeVal == CacheTypeDirect + semantic = cacheTypeVal == CacheTypeSemantic + return +} - // Check if caching is explicitly disabled - noStore := ctx.Value(CacheNoStoreKey) - if noStore != nil { - noStoreValue, ok := noStore.(bool) - if ok && noStoreValue { - plugin.logger.Debug(PluginLoggerPrefix + " Caching is explicitly disabled for this request, continuing without caching") - return res, nil, nil - } +// setZeroVectorIfRequired writes a zero embedding placeholder when the store +// mandates a vector per entry. See PreLLMHook for the isolation caveat. +func (plugin *Plugin) setZeroVectorIfRequired(state *cacheState) { + if !plugin.store.RequiresVectors() || plugin.config.Dimension <= 0 { + return } + state.Embeddings = make([]float32, plugin.config.Dimension) +} - // Get the cache key from context - cacheKey, ok := ctx.Value(CacheKey).(string) - if !ok || cacheKey == "" { - if plugin.config.DefaultCacheKey != "" { - cacheKey = plugin.config.DefaultCacheKey - } else { - return res, nil, nil - } +// PostLLMHook caches the upstream response keyed by the storageID resolved +// in PreLLMHook (deterministic directCacheID for direct hits, request UUID +// otherwise). The store write runs in a goroutine tracked by writersWg with +// its own background context + CacheSetTimeout, so client cancellation +// after the response is delivered doesn't drop the cache write. Returns the +// response unmodified — caching never alters the request flow. +func (plugin *Plugin) PostLLMHook(ctx *schemas.BifrostContext, res *schemas.BifrostResponse, bifrostErr *schemas.BifrostError) (*schemas.BifrostResponse, *schemas.BifrostError, error) { + if bifrostErr != nil { + // We rely on errors always arriving as the final chunk for streams, so + // we abort caching here without further bookkeeping. Any partial + // accumulator from a prior chunk gets reaped by the periodic cleanup. + return res, bifrostErr, nil } - // Get the request ID from context - requestID, ok := ctx.Value(requestIDKey).(string) + requestID, ok := ctx.Value(schemas.BifrostContextKeyRequestID).(string) if !ok { return res, nil, nil } - storageID := requestID - // When direct lookup prepared a deterministic storage ID, reuse it here so - // default-mode traffic warms the GetChunk fast path instead of only the - // legacy search path. - if v, ok := ctx.Value(requestStorageIDKey).(string); ok && v != "" { - storageID = v - } - // Check cache type to optimize embedding handling - var embedding []float32 - var hash string - var shouldStoreEmbeddings = true - var shouldStoreHash = true - - if ctx.Value(CacheTypeKey) != nil { - cacheTypeVal, ok := ctx.Value(CacheTypeKey).(CacheType) - if ok { - if cacheTypeVal == CacheTypeDirect { - // For direct-only caching, skip embedding operations entirely - // unless the vector store requires vectors for all entries - if plugin.store.RequiresVectors() { - // Vector stores like Qdrant and Pinecone require vectors for all entries - // Keep embeddings enabled for storage, but lookups will still use direct hash matching - plugin.logger.Debug(PluginLoggerPrefix + " Vector store requires vectors, keeping embedding generation enabled for storage") - } else { - shouldStoreEmbeddings = false - plugin.logger.Debug(PluginLoggerPrefix + " Skipping embedding operations for direct-only cache type") - } - } else if cacheTypeVal == CacheTypeSemantic { - shouldStoreHash = false - plugin.logger.Debug(PluginLoggerPrefix + " Skipping hash operations for semantic cache type") - } - } - } - - if shouldStoreHash { - // Get the hash from context - hash, ok = ctx.Value(requestHashKey).(string) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Hash is not a string. Continuing without caching") - return res, nil, nil - } - } extraFields := res.GetExtraFields() requestType := extraFields.RequestType - - // Get embedding from context if available and needed - // For embedding/transcription requests, we still need to retrieve the zero vector placeholder - // if the vector store requires vectors for all entries - isEmbeddingOrTranscription := requestType == schemas.EmbeddingRequest || requestType == schemas.TranscriptionRequest - needsEmbedding := shouldStoreEmbeddings && !isEmbeddingOrTranscription - needsZeroVector := isEmbeddingOrTranscription && plugin.store.RequiresVectors() - - if needsEmbedding || needsZeroVector { - embeddingValue := ctx.Value(requestEmbeddingKey) - if embeddingValue != nil { - embedding, ok = embeddingValue.([]float32) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Embedding is not a []float32, continuing without caching") - return res, nil, nil - } - } - // Note: embedding can be nil for direct cache hits or when semantic search is disabled - // This is fine - we can still cache using direct hash matching (unless store requires vectors) - } - - // Get the provider from context - provider, ok := ctx.Value(requestProviderKey).(schemas.ModelProvider) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Provider is not a schemas.ModelProvider, continuing without caching") + cacheDebug := extraFields.CacheDebug + + // Final-chunk signaling for cache replays: stampCacheDebugForHit only + // stamps CacheDebug.CacheHit=true on the LAST replay chunk (see search.go). + // When we see that stamp, we set the stream-end indicator on the root ctx + // synchronously — same goroutine as the rest of the post-hook chain. + // + // Why not set the indicator from the cache replay goroutine instead? It + // races: the producer can advance to its next iteration (and SetValue) + // while the receiver is still running PostLLMHooks for the previous + // chunk, poisoning that chunk's IsFinalChunk read. + if bifrost.IsStreamRequestType(requestType) && cacheDebug != nil && cacheDebug.CacheHit { + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) + } + // Cache hit replay: cache_debug was already stamped in PreLLMHook by + // stampCacheDebugForHit. There's nothing further to do here — no new + // telemetry to stamp, no write to perform. + if cacheDebug != nil && cacheDebug.CacheHit { + plugin.clearCacheState(requestID) return res, nil, nil } - // Get the model from context - model, ok := ctx.Value(requestModelKey).(string) + cacheKey, ok := plugin.resolveCacheKey(ctx) if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Model is not a string, continuing without caching") return res, nil, nil } - + provider := extraFields.Provider + model := extraFields.OriginalModelRequested + isStream := bifrost.IsStreamRequestType(requestType) isFinalChunk := bifrost.IsFinalChunk(ctx) - // Get the input tokens from context (can be nil if not set) - inputTokens, ok := ctx.Value(requestEmbeddingTokensKey).(int) - if ok { - isStreamRequest := bifrost.IsStreamRequestType(requestType) + state := plugin.getCacheState(requestID) + if state == nil || state.ParamsHash == "" { + // PreLLMHook bailed before computing the params hash (unsupported + // request type, conversation-history threshold, metadata error, + // no-search-path narrow, etc.). Without state we have no telemetry + // to stamp and no entry to write. + return res, nil, nil + } - if !isStreamRequest || (isStreamRequest && isFinalChunk) { - if extraFields.CacheDebug == nil { - extraFields.CacheDebug = &schemas.BifrostCacheDebug{} - } - extraFields.CacheDebug.CacheHit = false - extraFields.CacheDebug.ProviderUsed = bifrost.Ptr(string(plugin.config.Provider)) - extraFields.CacheDebug.ModelUsed = bifrost.Ptr(plugin.config.EmbeddingModel) - extraFields.CacheDebug.InputTokens = &inputTokens + // Free state once the request is fully observed. For non-streams that's + // after this PostLLMHook returns; for streams, only on the final chunk. + defer func() { + if !isStream || isFinalChunk { + plugin.clearCacheState(requestID) } + }() + + // PreLLMHook short-circuited from cache (non-final stream chunks of a + // replay land here). Telemetry is already stamped on the final chunk by + // stampCacheDebugForHit; non-final chunks have no telemetry to add. + // Without this guard non-final chunks would slip into addStreamingResponse + // and trigger a duplicate write at the same directCacheID + // (Weaviate 422 "id already exists"). + if state.ShortCircuited { + return res, nil, nil } - cacheTTL := plugin.config.TTL + storageID, embedding, shouldStoreEmbeddings := plugin.resolveStorageIDAndEmbedding(ctx, state, requestID, requestType) - ttlValue := ctx.Value(CacheTTLKey) - if ttlValue != nil { - // Get the request TTL from the context - ttl, ok := ttlValue.(time.Duration) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " TTL is not a time.Duration, using default TTL") - } else { - cacheTTL = ttl - } + // Stamp cache_debug telemetry FIRST so callers can observe that the + // plugin ran a lookup, regardless of whether we then choose to skip + // writing the entry (no-store header, large-payload modes, etc.). + // Observability shouldn't depend on the write decision — that was + // previously the case and made the cache layer invisible to callers + // using no-store. + plugin.stampCacheDebugForMiss(state, extraFields, storageID, isStream, isFinalChunk) + + // Now decide whether to actually write. Skipping the write still + // leaves cache_debug stamped above. + if plugin.shouldSkipCacheWrite(ctx) { + return res, nil, nil } - // Get metadata from context BEFORE goroutine to avoid race conditions - // when the same context is reused across multiple requests - paramsHash, _ := ctx.Value(requestParamsHashKey).(string) - - // Snapshot the response synchronously for the non-streaming cache path. - // Marshaling inside the cache goroutine races with the framework returning - // res upstream and downstream consumers mutating it (CacheDebug, etc.). - // Streaming uses a chunk accumulator that snapshots per-chunk separately. - var singleResponseData []byte - if !bifrost.IsStreamRequestType(requestType) { - var marshalErr error - singleResponseData, marshalErr = json.Marshal(res) - if marshalErr != nil { - plugin.logger.Warn("%s Failed to snapshot response for caching: %v", PluginLoggerPrefix, marshalErr) - return res, nil, nil - } + cacheTTL := plugin.resolveTTL(ctx) + paramsHash := state.ParamsHash + + embeddingToStore := embedding + if !shouldStoreEmbeddings { + embeddingToStore = nil } - // Cache everything in a unified VectorEntry asynchronously to avoid blocking the response - plugin.waitGroup.Add(1) + plugin.writersWg.Add(1) go func() { - defer plugin.waitGroup.Done() - // Create a background context with timeout for the cache operation + defer plugin.writersWg.Done() cacheCtx, cancel := context.WithTimeout(context.Background(), CacheSetTimeout) defer cancel() - // Build unified metadata with provider, model, and all params - unifiedMetadata := plugin.buildUnifiedMetadata(provider, model, paramsHash, hash, cacheKey, cacheTTL) - - // Handle streaming vs non-streaming responses - // Pass nil for embedding if we're in direct-only mode to optimize storage - embeddingToStore := embedding - if !shouldStoreEmbeddings { - embeddingToStore = nil - } - - if bifrost.IsStreamRequestType(requestType) { - if err := plugin.addStreamingResponse(cacheCtx, requestID, storageID, res, bifrostErr, embeddingToStore, unifiedMetadata, cacheTTL, isFinalChunk); err != nil { - plugin.logger.Warn("%s Failed to cache streaming response: %v", PluginLoggerPrefix, err) + unifiedMetadata := plugin.buildUnifiedMetadata(provider, model, paramsHash, cacheKey, cacheTTL) + if isStream { + if err := plugin.addStreamingResponse(cacheCtx, requestID, storageID, res, embeddingToStore, unifiedMetadata, cacheTTL, isFinalChunk); err != nil { + plugin.logger.Warn("Failed to cache streaming response (namespace=%s, id=%s): %v. The cache_id stamped on the response will not resolve on subsequent lookups.", plugin.config.VectorStoreNamespace, storageID, err) } } else { - if err := plugin.addSingleResponse(cacheCtx, storageID, singleResponseData, embeddingToStore, unifiedMetadata, cacheTTL); err != nil { - plugin.logger.Warn("%s Failed to cache single response: %v", PluginLoggerPrefix, err) + if err := plugin.addNonStreamingResponse(cacheCtx, storageID, res, embeddingToStore, unifiedMetadata, cacheTTL); err != nil { + plugin.logger.Warn("Failed to cache single response (namespace=%s, id=%s): %v. The cache_id stamped on the response will not resolve on subsequent lookups.", plugin.config.VectorStoreNamespace, storageID, err) } } }() @@ -677,93 +614,134 @@ func (plugin *Plugin) PostLLMHook(ctx *schemas.BifrostContext, res *schemas.Bifr return res, nil, nil } -// WaitForPendingOperations blocks until all pending cache operations (goroutines) complete. -// This is useful in tests to ensure cache entries are stored before checking for cache hits. -func (plugin *Plugin) WaitForPendingOperations() { - plugin.waitGroup.Wait() +// shouldSkipCacheWrite returns true if the upstream response should NOT be +// written to the cache store. Telemetry (cache_debug) is stamped before this +// is consulted, so callers retain observability on misses even when no_store +// or large-payload modes are in effect. The cache-hit-replay case is handled +// separately as an early return in PostLLMHook because it must short-circuit +// before stamping (cache_debug for hits is already populated by +// stampCacheDebugForHit during PreLLMHook). +func (plugin *Plugin) shouldSkipCacheWrite(ctx *schemas.BifrostContext) bool { + if isLargePayload, ok := ctx.Value(schemas.BifrostContextKeyLargePayloadMode).(bool); ok && isLargePayload { + return true + } + if isLargeResponse, ok := ctx.Value(schemas.BifrostContextKeyLargeResponseMode).(bool); ok && isLargeResponse { + return true + } + if noStore, ok := ctx.Value(CacheNoStoreKey).(bool); ok && noStore { + return true + } + return false } -// Cleanup performs cleanup operations for the semantic cache plugin. -// It removes all cached entries created by this plugin from the VectorStore only if CleanUpOnShutdown is true. -// Identifies cache entries by the presence of semantic cache-specific fields (request_hash, cache_key). -// -// The function performs the following operations: -// 1. Checks if cleanup is enabled via CleanUpOnShutdown config -// 2. Retrieves all entries and filters client-side to identify cache entries -// 3. Deletes all matching cache entries from the VectorStore in batches -// -// This method should be called when shutting down the application to ensure -// proper resource cleanup if configured to do so. -// -// Returns: -// - error: Any error that occurred during cleanup operations -func (plugin *Plugin) Cleanup() error { - plugin.waitGroup.Wait() - - // Clean up old stream accumulators first - plugin.cleanupOldStreamAccumulators() - - // Only clean up cache entries if configured to do so - if !plugin.config.CleanUpOnShutdown { - plugin.logger.Debug(PluginLoggerPrefix + " Cleanup on shutdown is disabled, skipping cache cleanup") - return nil +// resolveStorageIDAndEmbedding picks the storage ID (deterministic directCacheID +// when direct search ran, else the request UUID) and resolves the embedding +// from per-request state. shouldStoreEmbeddings is false for explicit +// direct-only requests on stores that don't require vectors — those entries +// skip the embedding column entirely. +func (plugin *Plugin) resolveStorageIDAndEmbedding(ctx *schemas.BifrostContext, state *cacheState, requestID string, requestType schemas.RequestType) (storageID string, embedding []float32, shouldStoreEmbeddings bool) { + storageID = requestID + if state.DirectCacheID != "" { + storageID = state.DirectCacheID } - // Clean up all cache entries created by this plugin - ctx, cancel := context.WithTimeout(context.Background(), CacheSetTimeout) - defer cancel() + shouldStoreEmbeddings = true + if cacheTypeVal, isCacheType := ctx.Value(CacheTypeKey).(CacheType); isCacheType && cacheTypeVal == CacheTypeDirect && !plugin.store.RequiresVectors() { + shouldStoreEmbeddings = false + } - plugin.logger.Debug(PluginLoggerPrefix + " Starting cleanup of cache entries...") + isEmbeddingOrTranscription := requestType == schemas.EmbeddingRequest || requestType == schemas.TranscriptionRequest + needsEmbedding := shouldStoreEmbeddings && !isEmbeddingOrTranscription + needsZeroVector := isEmbeddingOrTranscription && plugin.store.RequiresVectors() - // Delete all cache entries created by this plugin - queries := []vectorstore.Query{ - { - Field: "from_bifrost_semantic_cache_plugin", - Operator: vectorstore.QueryOperatorEqual, - Value: true, - }, + if needsEmbedding || needsZeroVector { + // embedding may still be nil — fine for direct hash matching unless the + // store requires vectors (in which case Add will reject downstream). + embedding = state.Embeddings } + return storageID, embedding, shouldStoreEmbeddings +} - results, err := plugin.store.DeleteAll(ctx, plugin.config.VectorStoreNamespace, queries) - if err != nil { - return fmt.Errorf("failed to delete cache entries: %w", err) +// stampCacheDebugForMiss attaches cache miss telemetry to the response. It +// always sets CacheHit=false and CacheID to the storage ID where the entry +// will be written, so the caller can later invalidate via ClearCacheForCacheID. +// Embedding-cost fields (ProviderUsed/ModelUsed/InputTokens) are only stamped +// when semantic search actually ran. For streams, only the final chunk is +// stamped to avoid duplicating telemetry. +func (plugin *Plugin) stampCacheDebugForMiss(state *cacheState, extraFields *schemas.BifrostResponseExtraFields, storageID string, isStream, isFinalChunk bool) { + if isStream && !isFinalChunk { + return + } + if extraFields.CacheDebug == nil { + extraFields.CacheDebug = &schemas.BifrostCacheDebug{} + } + cd := extraFields.CacheDebug + cd.CacheHit = false + cd.CacheID = bifrost.Ptr(storageID) + if state.EmbeddingsInputTokens > 0 { + inputTokens := state.EmbeddingsInputTokens + cd.ProviderUsed = bifrost.Ptr(string(plugin.config.Provider)) + cd.ModelUsed = bifrost.Ptr(plugin.config.EmbeddingModel) + cd.InputTokens = &inputTokens } +} - for _, result := range results { - if result.Status == vectorstore.DeleteStatusError { - plugin.logger.Warn("%s Failed to delete cache entry: %s", PluginLoggerPrefix, result.Error) +// resolveTTL returns the per-request TTL override if present, else the plugin +// default. A non-positive override (0 or negative) is treated as "use default" +// to mirror how Config.UnmarshalJSON + Init treat TTL=0 at construction time — +// otherwise a header of "0s" would yield expires_at=now and silently kill the +// cache write for the affected request, which is rarely what the caller wants. +func (plugin *Plugin) resolveTTL(ctx *schemas.BifrostContext) time.Duration { + if v := ctx.Value(CacheTTLKey); v != nil { + if ttl, ok := v.(time.Duration); ok { + if ttl > 0 { + return ttl + } + plugin.logger.Debug("ignoring non-positive per-request TTL override %v, falling back to plugin default", ttl) + } else { + plugin.logger.Warn("TTL is not a time.Duration, using default TTL") } } - plugin.logger.Info("%s Cleanup completed - deleted all cache entries", PluginLoggerPrefix) + return plugin.config.TTL +} - if err := plugin.store.DeleteNamespace(ctx, plugin.config.VectorStoreNamespace); err != nil { - return fmt.Errorf("failed to delete namespace: %w", err) - } +// WaitForPendingOperations blocks until all pending cache operations (goroutines) complete. +// This is useful in tests to ensure cache entries are stored before checking for cache hits. +// It does NOT wait on background loops — those only exit on Cleanup. +func (plugin *Plugin) WaitForPendingOperations() { + plugin.writersWg.Wait() +} +// Cleanup signals the background loops to stop and waits for in-flight cache +// writes to drain before returning. When CleanUpOnShutdown is true, it then +// deletes every entry tagged from_bifrost_semantic_cache_plugin and drops +// the namespace — useful for ephemeral test environments. The default is to +// leave entries in place so they can serve subsequent process restarts. +func (plugin *Plugin) Cleanup() error { + plugin.cleanupOnce.Do(func() { + close(plugin.stopCh) + plugin.writersWg.Wait() + plugin.cleanupWg.Wait() + + // Final sweep: the periodic reaper only fires once per streamCleanupInterval, + // so any abandoned accumulator added in the window between the last tick + // and stopCh is still in memory. This call evicts those before we return. + plugin.cleanupOldStreamAccumulators() + }) return nil } -// SetEmbeddingRequestExecutor sets the embedding request executor for the plugin. -// Needs to be set before the plugin is used. -// -// Parameters: -// - executor: The embedding request executor to set +// SetEmbeddingRequestExecutor wires up the function the plugin uses to call +// out to the embedding provider. Must be set before the plugin starts +// serving traffic; semantic search is silently skipped while it's nil. func (plugin *Plugin) SetEmbeddingRequestExecutor(executor EmbeddingRequestExecutor) { plugin.embeddingRequestExecutor = executor } -// Public Methods for External Use - -// ClearCacheForKey deletes cache entries for a specific cache key. -// Uses the unified VectorStore interface for deletion of all entries with the given cache key. -// -// Parameters: -// - cacheKey: The specific cache key to delete -// -// Returns: -// - error: Any error that occurred during cache key deletion +// ClearCacheForKey deletes every entry written under the given cache_key. +// Use this to invalidate a tenant or feature scope in bulk. Per-entry +// deletion is available via ClearCacheForCacheID. func (plugin *Plugin) ClearCacheForKey(cacheKey string) error { - // Delete all entries with "cache_key" equal to the given cacheKey queries := []vectorstore.Query{ { Field: "cache_key", @@ -781,52 +759,35 @@ func (plugin *Plugin) ClearCacheForKey(cacheKey string) error { defer cancel() results, err := plugin.store.DeleteAll(ctx, plugin.config.VectorStoreNamespace, queries) if err != nil { - plugin.logger.Warn("%s Failed to delete cache entries for key '%s': %v", PluginLoggerPrefix, cacheKey, err) + plugin.logger.Warn("Failed to delete cache entries for key '%s': %v", cacheKey, err) return err } for _, result := range results { if result.Status == vectorstore.DeleteStatusError { - plugin.logger.Warn("%s Failed to delete cache entry for key %s: %s", PluginLoggerPrefix, result.ID, result.Error) + plugin.logger.Warn("Failed to delete cache entry for key %s: %s", result.ID, result.Error) } } - plugin.logger.Debug(fmt.Sprintf("%s Deleted all cache entries for key %s", PluginLoggerPrefix, cacheKey)) + plugin.logger.Debug("Deleted all cache entries for key %s", cacheKey) return nil } -// ClearCacheForRequestID deletes cache entries for a specific request ID. -// Uses the unified VectorStore interface to delete the single entry by its UUID. -// -// Parameters: -// - requestID: The UUID-based request ID to delete cache entries for -// -// Returns: -// - error: Any error that occurred during cache key deletion -func (plugin *Plugin) ClearCacheForRequestID(requestID string) error { - // With the unified VectorStore interface, we delete the single entry by its UUID +// ClearCacheForCacheID deletes a single cache entry by its storage ID. The +// caller obtains the ID from BifrostResponse.ExtraFields.CacheDebug.CacheID, +// which is stamped on both cache hits and cache misses — so the same handle +// works whether the request wrote the entry or read it. +func (plugin *Plugin) ClearCacheForCacheID(cacheID string) error { + if cacheID == "" { + return fmt.Errorf("cache ID is required") + } ctx, cancel := context.WithTimeout(context.Background(), CacheSetTimeout) defer cancel() - if err := plugin.store.Delete(ctx, plugin.config.VectorStoreNamespace, requestID); err != nil { - plugin.logger.Warn("%s Failed to delete cache entry: %v", PluginLoggerPrefix, err) + if err := plugin.store.Delete(ctx, plugin.config.VectorStoreNamespace, cacheID); err != nil { + plugin.logger.Warn("Failed to delete cache entry %s: %v", cacheID, err) return err } - - plugin.logger.Debug(fmt.Sprintf("%s Deleted cache entry for key %s", PluginLoggerPrefix, requestID)) - + plugin.logger.Debug("Deleted cache entry %s", cacheID) return nil } - -func (plugin *Plugin) clearRequestScopedContext(ctx *schemas.BifrostContext) { - ctx.ClearValue(requestIDKey) - ctx.ClearValue(requestStorageIDKey) - ctx.ClearValue(requestHashKey) - ctx.ClearValue(requestParamsHashKey) - ctx.ClearValue(requestModelKey) - ctx.ClearValue(requestProviderKey) - ctx.ClearValue(requestEmbeddingKey) - ctx.ClearValue(requestEmbeddingTokensKey) - ctx.ClearValue(isCacheHitKey) - ctx.ClearValue(cacheHitTypeKey) -} diff --git a/plugins/semanticcache/main_test.go b/plugins/semanticcache/main_test.go new file mode 100644 index 0000000000..0924fa726b --- /dev/null +++ b/plugins/semanticcache/main_test.go @@ -0,0 +1,39 @@ +package semanticcache + +import ( + "context" + "os" + "testing" + "time" + + bifrost "github.com/maximhq/bifrost/core" + "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/framework/vectorstore" +) + +// TestMain drops the shared test namespace BEFORE the run starts (in case a +// previous run was interrupted and left stale entries) AND once after — both +// matter: tests share one namespace + one cache_key prefix per t.Name(), +// so stale writes from a prior interrupted run would surface as spurious +// cache hits on the first request of the next run. +func TestMain(m *testing.M) { + dropSharedTestNamespace() // pre-run sweep + code := m.Run() + dropSharedTestNamespace() // post-run sweep + os.Exit(code) +} + +func dropSharedTestNamespace() { + cfg := getWeaviateConfigFromEnv() + store, err := vectorstore.NewVectorStore(context.Background(), &vectorstore.Config{ + Type: vectorstore.VectorStoreTypeWeaviate, + Config: cfg, + Enabled: true, + }, bifrost.NewDefaultLogger(schemas.LogLevelError)) + if err != nil { + return + } + ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + _ = store.DeleteNamespace(ctx, SharedTestNamespace) +} diff --git a/plugins/semanticcache/plugin_api_test.go b/plugins/semanticcache/plugin_api_test.go new file mode 100644 index 0000000000..78dbbd45af --- /dev/null +++ b/plugins/semanticcache/plugin_api_test.go @@ -0,0 +1,359 @@ +package semanticcache + +import ( + "context" + "sync" + "sync/atomic" + "testing" + "time" + + bifrost "github.com/maximhq/bifrost/core" + "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/framework/vectorstore" +) + +// observableStore is a fuller mock than directFastPathStore — it records all +// Delete / DeleteAll / DeleteNamespace calls so the tests can assert on the +// public Clear* APIs and on Cleanup teardown behavior. +type observableStore struct { + mu sync.Mutex + chunks map[string]vectorstore.SearchResult + addIDs []string + deleteIDs []string + deleteAllQueries [][]vectorstore.Query + namespaceDeletes int + deleteAllErr error + deleteErr error + deleteAllResults []vectorstore.DeleteResult +} + +func newObservableStore() *observableStore { + return &observableStore{chunks: make(map[string]vectorstore.SearchResult)} +} + +func (s *observableStore) Ping(ctx context.Context) error { return nil } +func (s *observableStore) CreateNamespace(ctx context.Context, ns string, dim int, props map[string]vectorstore.VectorStoreProperties) error { + return nil +} +func (s *observableStore) DeleteNamespace(ctx context.Context, ns string) error { + s.mu.Lock() + s.namespaceDeletes++ + s.mu.Unlock() + return nil +} +func (s *observableStore) GetChunk(ctx context.Context, ns string, id string) (vectorstore.SearchResult, error) { + s.mu.Lock() + defer s.mu.Unlock() + r, ok := s.chunks[id] + if !ok { + return vectorstore.SearchResult{}, vectorstore.ErrNotFound + } + return r, nil +} +func (s *observableStore) GetChunks(ctx context.Context, ns string, ids []string) ([]vectorstore.SearchResult, error) { + return nil, vectorstore.ErrNotSupported +} +func (s *observableStore) GetAll(ctx context.Context, ns string, q []vectorstore.Query, sf []string, cur *string, lim int64) ([]vectorstore.SearchResult, *string, error) { + return nil, nil, vectorstore.ErrNotSupported +} +func (s *observableStore) GetNearest(ctx context.Context, ns string, v []float32, q []vectorstore.Query, sf []string, th float64, lim int64) ([]vectorstore.SearchResult, error) { + return nil, vectorstore.ErrNotSupported +} +func (s *observableStore) RequiresVectors() bool { return false } +func (s *observableStore) Add(ctx context.Context, ns string, id string, e []float32, m map[string]interface{}) error { + s.mu.Lock() + s.addIDs = append(s.addIDs, id) + s.chunks[id] = vectorstore.SearchResult{ID: id, Properties: m} + s.mu.Unlock() + return nil +} +func (s *observableStore) Delete(ctx context.Context, ns string, id string) error { + s.mu.Lock() + s.deleteIDs = append(s.deleteIDs, id) + delete(s.chunks, id) + err := s.deleteErr + s.mu.Unlock() + return err +} +func (s *observableStore) DeleteAll(ctx context.Context, ns string, queries []vectorstore.Query) ([]vectorstore.DeleteResult, error) { + s.mu.Lock() + s.deleteAllQueries = append(s.deleteAllQueries, queries) + results := s.deleteAllResults + err := s.deleteAllErr + s.mu.Unlock() + return results, err +} +func (s *observableStore) Close(ctx context.Context, ns string) error { return nil } + +func newTestPlugin(t *testing.T, store vectorstore.VectorStore) *Plugin { + t.Helper() + cfg := getDefaultTestConfig() + return &Plugin{ + store: store, + config: cfg, + logger: bifrost.NewDefaultLogger(schemas.LogLevelDebug), + stopCh: make(chan struct{}), + } +} + +// ----------------------------------------------------------------------------- +// ClearCacheForCacheID +// ----------------------------------------------------------------------------- + +func TestClearCacheForCacheID_EmptyIDRejected(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + if err := plugin.ClearCacheForCacheID(""); err == nil { + t.Fatal("expected error for empty cache ID") + } +} + +func TestClearCacheForCacheID_PointDelete(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) + + if err := plugin.ClearCacheForCacheID("cache-abc"); err != nil { + t.Fatalf("ClearCacheForCacheID failed: %v", err) + } + store.mu.Lock() + defer store.mu.Unlock() + if len(store.deleteIDs) != 1 || store.deleteIDs[0] != "cache-abc" { + t.Fatalf("expected single Delete call for 'cache-abc', got %v", store.deleteIDs) + } +} + +// ----------------------------------------------------------------------------- +// ClearCacheForKey +// ----------------------------------------------------------------------------- + +func TestClearCacheForKey_FiltersByCacheKeyAndPluginMarker(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) + + if err := plugin.ClearCacheForKey("session-42"); err != nil { + t.Fatalf("ClearCacheForKey failed: %v", err) + } + + store.mu.Lock() + defer store.mu.Unlock() + if len(store.deleteAllQueries) != 1 { + t.Fatalf("expected one DeleteAll call, got %d", len(store.deleteAllQueries)) + } + queries := store.deleteAllQueries[0] + gotKey, gotMarker := false, false + for _, q := range queries { + if q.Field == "cache_key" && q.Value == "session-42" && q.Operator == vectorstore.QueryOperatorEqual { + gotKey = true + } + if q.Field == "from_bifrost_semantic_cache_plugin" && q.Value == true { + gotMarker = true + } + } + if !gotKey { + t.Errorf("expected cache_key=session-42 filter, got %+v", queries) + } + if !gotMarker { + t.Errorf("expected from_bifrost_semantic_cache_plugin=true filter, got %+v", queries) + } +} + +// ----------------------------------------------------------------------------- +// stampCacheDebugForMiss +// ----------------------------------------------------------------------------- + +func TestStampCacheDebugForMiss_AlwaysSetsCacheID(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + state := &cacheState{} + extra := &schemas.BifrostResponseExtraFields{} + + plugin.stampCacheDebugForMiss(state, extra, "stored-id-123", false, false) + + if extra.CacheDebug == nil { + t.Fatal("expected CacheDebug to be stamped on miss") + } + if extra.CacheDebug.CacheHit { + t.Fatal("expected CacheHit=false on miss") + } + if extra.CacheDebug.CacheID == nil || *extra.CacheDebug.CacheID != "stored-id-123" { + t.Fatalf("expected CacheID=stored-id-123, got %v", extra.CacheDebug.CacheID) + } + // No semantic search ran → embedding fields should be unset. + if extra.CacheDebug.ProviderUsed != nil || extra.CacheDebug.InputTokens != nil { + t.Fatalf("expected embedding fields nil on direct-only miss, got %+v", extra.CacheDebug) + } +} + +func TestStampCacheDebugForMiss_AddsTelemetryWhenSemanticRan(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + state := &cacheState{EmbeddingsInputTokens: 42} + extra := &schemas.BifrostResponseExtraFields{} + + plugin.stampCacheDebugForMiss(state, extra, "id-x", false, false) + + if extra.CacheDebug.InputTokens == nil || *extra.CacheDebug.InputTokens != 42 { + t.Fatalf("expected InputTokens=42, got %v", extra.CacheDebug.InputTokens) + } + if extra.CacheDebug.ProviderUsed == nil { + t.Fatal("expected ProviderUsed to be stamped when semantic ran") + } +} + +func TestStampCacheDebugForMiss_StreamSkipsNonFinalChunks(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + state := &cacheState{} + extra := &schemas.BifrostResponseExtraFields{} + + plugin.stampCacheDebugForMiss(state, extra, "id-y", true, false) // mid-stream + + if extra.CacheDebug != nil { + t.Fatal("expected mid-stream chunk to NOT be stamped") + } +} + +// ----------------------------------------------------------------------------- +// Cleanup +// ----------------------------------------------------------------------------- + +func TestCleanup_SkipsEntryDeletionWhenDisabled(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) // CleanUpOnShutdown=false + + if err := plugin.Cleanup(); err != nil { + t.Fatalf("Cleanup failed: %v", err) + } + + store.mu.Lock() + defer store.mu.Unlock() + if len(store.deleteAllQueries) != 0 { + t.Errorf("expected no DeleteAll calls when cleanup disabled, got %d", len(store.deleteAllQueries)) + } + if store.namespaceDeletes != 0 { + t.Errorf("expected no DeleteNamespace calls when cleanup disabled, got %d", store.namespaceDeletes) + } +} + +func TestCleanup_DrainsPendingWriters(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + var done atomic.Bool + plugin.writersWg.Add(1) + go func() { + defer plugin.writersWg.Done() + time.Sleep(50 * time.Millisecond) + done.Store(true) + }() + + if err := plugin.Cleanup(); err != nil { + t.Fatalf("Cleanup failed: %v", err) + } + if !done.Load() { + t.Fatal("expected Cleanup to wait for pending writers to finish") + } +} + +// ----------------------------------------------------------------------------- +// cacheState reaper +// ----------------------------------------------------------------------------- + +func TestCleanupOldCacheStates_ReapsOldEntries(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + plugin.cacheStates.Store("old-1", &cacheState{CreatedAt: time.Now().Add(-2 * cacheStateMaxAge)}) + plugin.cacheStates.Store("old-2", &cacheState{CreatedAt: time.Now().Add(-2 * cacheStateMaxAge)}) + plugin.cacheStates.Store("recent", &cacheState{CreatedAt: time.Now()}) + + plugin.cleanupOldCacheStates() + + if _, ok := plugin.cacheStates.Load("old-1"); ok { + t.Error("expected old-1 to be reaped") + } + if _, ok := plugin.cacheStates.Load("old-2"); ok { + t.Error("expected old-2 to be reaped") + } + if _, ok := plugin.cacheStates.Load("recent"); !ok { + t.Error("expected recent to be preserved") + } +} + +// ----------------------------------------------------------------------------- +// Stream accumulator reaper +// ----------------------------------------------------------------------------- + +func TestCleanupOldStreamAccumulators_ReapsByLastSeenAt(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + plugin.streamAccumulators.Store("old", &StreamAccumulator{ + RequestID: "old", + LastSeenAt: time.Now().Add(-2 * streamAccumulatorMaxAge), + }) + plugin.streamAccumulators.Store("recent", &StreamAccumulator{ + RequestID: "recent", + LastSeenAt: time.Now(), + }) + + plugin.cleanupOldStreamAccumulators() + + if _, ok := plugin.streamAccumulators.Load("old"); ok { + t.Error("expected old accumulator to be reaped") + } + if _, ok := plugin.streamAccumulators.Load("recent"); !ok { + t.Error("expected recent accumulator to be preserved") + } +} + +// ----------------------------------------------------------------------------- +// Replay goroutine cancellation (buildStreamingResponseFromResult) +// ----------------------------------------------------------------------------- + +func TestBuildStreamingResponseFromResult_ConsumerAbandonment(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + // Build a cached entry with multiple chunks. + chunkJSON := `{"chat_response":{"choices":[]}}` + streamArray := []string{chunkJSON, chunkJSON, chunkJSON, chunkJSON, chunkJSON} + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionStreamRequest, + ChatRequest: CreateBasicChatRequest("hi", 0.7, 50), + } + ctx := newBaseTestContext() + state := &cacheState{} + + sc, err := plugin.buildStreamingResponseFromResult( + ctx, state, req, + vectorstore.SearchResult{ID: "stream-id"}, + streamArray, CacheTypeSemantic, nil, nil, nil, + ) + if err != nil { + t.Fatalf("buildStreamingResponseFromResult failed: %v", err) + } + if sc == nil || sc.Stream == nil { + t.Fatal("expected a stream short-circuit") + } + + // Read one chunk, then cancel ctx — the replay goroutine should exit + // (close the channel) instead of blocking on its send forever. + // Guard the first receive so a regression that stalls the producer + // fails fast instead of hanging until the suite-level timeout. + select { + case _, ok := <-sc.Stream: + if !ok { + t.Fatal("expected first replay chunk before cancellation, channel closed early") + } + case <-time.After(2 * time.Second): + t.Fatal("replay goroutine did not emit the first chunk") + } + ctx.Cancel() + + // Drain remaining; channel must close within a reasonable bound. + timeout := time.After(2 * time.Second) + for { + select { + case _, ok := <-sc.Stream: + if !ok { + return // channel closed → replay goroutine exited cleanly ✓ + } + case <-timeout: + t.Fatal("replay goroutine did not exit after ctx.Cancel()") + } + } +} diff --git a/plugins/semanticcache/plugin_cache_type_test.go b/plugins/semanticcache/plugin_cache_type_test.go index ee28902ae8..9d8d655a1d 100644 --- a/plugins/semanticcache/plugin_cache_type_test.go +++ b/plugins/semanticcache/plugin_cache_type_test.go @@ -2,7 +2,6 @@ package semanticcache import ( "context" - "errors" "sync" "testing" "time" @@ -14,24 +13,25 @@ import ( // TestCacheTypeDirectOnly tests that CacheTypeKey set to "direct" only performs direct hash matching func TestCacheTypeDirectOnly(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // First, cache a response using CacheTypeDirect so it is stored under the deterministic ID - ctx1 := CreateContextWithCacheKeyAndType("test-cache-type-direct", CacheTypeDirect) + ctx1 := CreateContextWithCacheKeyAndType(t, "test-cache-type-direct", CacheTypeDirect) testRequest := CreateBasicChatRequest("What is Bifrost?", 0.7, 50) t.Log("Making first request to populate cache...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Now test with CacheTypeKey set to direct only - ctx2 := CreateContextWithCacheKeyAndType("test-cache-type-direct", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cache-type-direct", CacheTypeDirect) t.Log("Making second request with CacheTypeKey=direct...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) @@ -47,17 +47,18 @@ func TestCacheTypeDirectOnly(t *testing.T) { // TestCacheTypeSemanticOnly tests that CacheTypeKey set to "semantic" only performs semantic search func TestCacheTypeSemanticOnly(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // First, cache a response using normal behavior - ctx1 := CreateContextWithCacheKey("test-cache-type-semantic") + ctx1 := CreateContextWithCacheKey(t, "test-cache-type-semantic") testRequest := CreateBasicChatRequest("Explain machine learning concepts", 0.7, 50) t.Log("Making first request to populate cache...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -67,7 +68,7 @@ func TestCacheTypeSemanticOnly(t *testing.T) { similarRequest := CreateBasicChatRequest("Can you explain concepts in machine learning", 0.7, 50) // Try with semantic-only search - ctx2 := CreateContextWithCacheKeyAndType("test-cache-type-semantic", CacheTypeSemantic) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cache-type-semantic", CacheTypeSemantic) t.Log("Making second request with similar content and CacheTypeKey=semantic...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, similarRequest) @@ -79,9 +80,14 @@ func TestCacheTypeSemanticOnly(t *testing.T) { } } - // This might be a cache hit if semantic similarity is high enough - // The test validates that semantic search is attempted - if response2.ExtraFields.CacheDebug != nil && response2.ExtraFields.CacheDebug.CacheHit { + // This might be a cache hit if semantic similarity is high enough. + // Hit/miss is similarity-dependent, but CacheDebug must be stamped either + // way — semantic search ran. This catches a regression where the stamping + // stops without making the test flake on similarity scores. + if response2.ExtraFields.CacheDebug == nil { + t.Fatal("expected CacheDebug to be stamped on the response (semantic search should have run)") + } + if response2.ExtraFields.CacheDebug.CacheHit { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, "semantic") t.Log("✅ CacheTypeKey=semantic correctly found semantic match") } else { @@ -94,24 +100,25 @@ func TestCacheTypeSemanticOnly(t *testing.T) { // TestCacheTypeDirectWithSemanticFallback tests the default behavior (both direct and semantic) func TestCacheTypeDirectWithSemanticFallback(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Cache a response first - ctx1 := CreateContextWithCacheKey("test-cache-type-fallback") + ctx1 := CreateContextWithCacheKey(t, "test-cache-type-fallback") testRequest := CreateBasicChatRequest("Define artificial intelligence", 0.7, 50) t.Log("Making first request to populate cache...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Test exact match (should hit direct cache) - ctx2 := CreateContextWithCacheKey("test-cache-type-fallback") + ctx2 := CreateContextWithCacheKey(t, "test-cache-type-fallback") t.Log("Making second identical request (should hit direct cache)...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) @@ -133,8 +140,12 @@ func TestCacheTypeDirectWithSemanticFallback(t *testing.T) { t.Fatalf("Third request failed: %v", err3) } - // May or may not be a cache hit depending on semantic similarity - if response3.ExtraFields.CacheDebug != nil && response3.ExtraFields.CacheDebug.CacheHit { + // May or may not be a cache hit depending on semantic similarity, but + // CacheDebug must be stamped (regression guard). + if response3.ExtraFields.CacheDebug == nil { + t.Fatal("expected CacheDebug to be stamped on the response") + } + if response3.ExtraFields.CacheDebug.CacheHit { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}, "semantic") t.Log("✅ Default behavior correctly found semantic match") } else { @@ -145,49 +156,66 @@ func TestCacheTypeDirectWithSemanticFallback(t *testing.T) { t.Log("✅ Default behavior correctly attempts both direct and semantic search") } -// TestCacheTypeInvalidValue tests behavior with invalid CacheTypeKey values +// TestCacheTypeInvalidValue tests behavior with invalid CacheTypeKey values: +// the plugin must fall back to default behavior (try both direct + semantic) +// rather than disable caching entirely. func TestCacheTypeInvalidValue(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - // Create context with invalid cache type - ctx := CreateContextWithCacheKey("test-invalid-cache-type") - ctx = ctx.WithValue(CacheTypeKey, "invalid_type") - testRequest := CreateBasicChatRequest("Test invalid cache type", 0.7, 50) - t.Log("Making request with invalid CacheTypeKey value...") - response, err := setup.Client.ChatCompletionRequest(ctx, testRequest) + // First request with invalid CacheTypeKey — must be a miss but ALSO must + // have caused the response to be cached (fallback to default behavior). + ctx1 := CreateContextWithCacheKey(t, "test-invalid-cache-type") + ctx1 = ctx1.WithValue(CacheTypeKey, "invalid_type") + + t.Log("Making first request with invalid CacheTypeKey value...") + response1, err := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err) } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) - // Should fall back to default behavior (both direct and semantic) - AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response}) + WaitForCache(setup.Plugin) - t.Log("✅ Invalid CacheTypeKey value falls back to default behavior") + // Second identical request — fallback should mean the entry was written + // the first time, so this must hit (proves the invalid value didn't + // disable caching as a side effect). + ctx2 := CreateContextWithCacheKey(t, "test-invalid-cache-type") + ctx2 = ctx2.WithValue(CacheTypeKey, "invalid_type") + t.Log("Making second identical request — must hit cache, proving fallback to default cached the first call...") + response2, err := setup.Client.ChatCompletionRequest(ctx2, testRequest) + if err != nil { + t.Fatalf("Second request failed: %v", err) + } + AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, string(CacheTypeDirect)) + + t.Log("✅ Invalid CacheTypeKey value falls back to default behavior (caching works)") } // TestCacheTypeWithEmbeddingRequests tests CacheTypeKey behavior with embedding requests func TestCacheTypeWithEmbeddingRequests(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() embeddingRequest := CreateEmbeddingRequest([]string{"Test embedding with cache type"}) // Cache first request - ctx1 := CreateContextWithCacheKey("test-embedding-cache-type") + ctx1 := CreateContextWithCacheKey(t, "test-embedding-cache-type") t.Log("Making first embedding request...") response1, err1 := setup.Client.EmbeddingRequest(ctx1, embeddingRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response1}) WaitForCache(setup.Plugin) // Test with direct-only cache type - ctx2 := CreateContextWithCacheKeyAndType("test-embedding-cache-type", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-embedding-cache-type", CacheTypeDirect) t.Log("Making second embedding request with CacheTypeKey=direct...") response2, err2 := setup.Client.EmbeddingRequest(ctx2, embeddingRequest) if err2 != nil { @@ -200,7 +228,7 @@ func TestCacheTypeWithEmbeddingRequests(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response2}, "direct") // Test with semantic-only cache type (should not find semantic match for embeddings) - ctx3 := CreateContextWithCacheKeyAndType("test-embedding-cache-type", CacheTypeSemantic) + ctx3 := CreateContextWithCacheKeyAndType(t, "test-embedding-cache-type", CacheTypeSemantic) t.Log("Making third embedding request with CacheTypeKey=semantic...") response3, err3 := setup.Client.EmbeddingRequest(ctx3, embeddingRequest) if err3 != nil { @@ -214,24 +242,25 @@ func TestCacheTypeWithEmbeddingRequests(t *testing.T) { // TestCacheTypePerformanceCharacteristics tests that different cache types have expected performance func TestCacheTypePerformanceCharacteristics(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("Performance test for cache types", 0.7, 50) // Cache first request using CacheTypeDirect so it is stored under the deterministic ID - ctx1 := CreateContextWithCacheKeyAndType("test-cache-performance", CacheTypeDirect) + ctx1 := CreateContextWithCacheKeyAndType(t, "test-cache-performance", CacheTypeDirect) t.Log("Making first request to populate cache...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Test direct-only performance - ctx2 := CreateContextWithCacheKeyAndType("test-cache-performance", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cache-performance", CacheTypeDirect) start2 := time.Now() response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) duration2 := time.Since(start2) @@ -243,7 +272,7 @@ func TestCacheTypePerformanceCharacteristics(t *testing.T) { t.Logf("Direct cache lookup took: %v", duration2) // Test default behavior (both direct and semantic) performance - ctx3 := CreateContextWithCacheKey("test-cache-performance") + ctx3 := CreateContextWithCacheKey(t, "test-cache-performance") start3 := time.Now() response3, err3 := setup.Client.ChatCompletionRequest(ctx3, testRequest) duration3 := time.Since(start3) @@ -254,8 +283,17 @@ func TestCacheTypePerformanceCharacteristics(t *testing.T) { t.Logf("Default cache lookup took: %v", duration3) - // Both should be fast since they hit direct cache - // Direct-only might be slightly faster as it doesn't need to prepare for semantic fallback + // Both lookups hit direct cache so both must be substantially faster than + // a real upstream call. Compare against an upper bound rather than each + // other (relative comparisons flake under CI load); 1s is generous and + // still proves a cached lookup didn't silently hit the network. + const upperBoundForCacheLookup = 1 * time.Second + if duration2 > upperBoundForCacheLookup { + t.Errorf("direct-only cache lookup took %v, expected < %v (provider likely called)", duration2, upperBoundForCacheLookup) + } + if duration3 > upperBoundForCacheLookup { + t.Errorf("default-mode cache lookup took %v, expected < %v (provider likely called)", duration3, upperBoundForCacheLookup) + } t.Log("✅ Cache type performance characteristics validated") } @@ -367,7 +405,7 @@ func TestDirectCacheHitPreservesCachedProviderMetadataAcrossProviders(t *testing const cacheKey = "cross-provider-direct-single" const prompt = "Explain green threading in Go in one short sentence." - seedCtx := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + seedCtx := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) seedReq := newCrossProviderChatRequest(schemas.OpenAI, "gpt-5.2", schemas.ChatCompletionRequest, prompt) _, shortCircuit, err := plugin.PreLLMHook(seedCtx, seedReq) @@ -407,7 +445,7 @@ func TestDirectCacheHitPreservesCachedProviderMetadataAcrossProviders(t *testing } plugin.WaitForPendingOperations() - hitCtx := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + hitCtx := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) hitReq := newCrossProviderChatRequest(schemas.Anthropic, "claude-sonnet-4-6", schemas.ChatCompletionRequest, prompt) _, shortCircuit, err = plugin.PreLLMHook(hitCtx, hitReq) @@ -461,7 +499,7 @@ func TestStreamingDirectCacheHitPreservesCachedProviderMetadataAcrossProviders(t const cacheKey = "cross-provider-direct-stream" const prompt = "Explain green threading in Go in one short sentence." - seedCtx := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + seedCtx := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) seedReq := newCrossProviderChatRequest(schemas.OpenAI, "gpt-5.2", schemas.ChatCompletionStreamRequest, prompt) _, shortCircuit, err := plugin.PreLLMHook(seedCtx, seedReq) @@ -514,7 +552,7 @@ func TestStreamingDirectCacheHitPreservesCachedProviderMetadataAcrossProviders(t plugin.WaitForPendingOperations() } - hitCtx := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + hitCtx := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) hitReq := newCrossProviderChatRequest(schemas.Anthropic, "claude-sonnet-4-6", schemas.ChatCompletionStreamRequest, prompt) _, shortCircuit, err = plugin.PreLLMHook(hitCtx, hitReq) @@ -564,6 +602,29 @@ func TestStreamingDirectCacheHitPreservesCachedProviderMetadataAcrossProviders(t } } +// runDirectSearchForTest is a small helper for the unit tests that directly +// exercise performDirectSearch. It builds the metadata + paramsHash + state +// the way PreLLMHook would and then calls the search. +func runDirectSearchForTest(t *testing.T, plugin *Plugin, ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (*cacheState, *schemas.LLMPluginShortCircuit, error) { + t.Helper() + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + if requestID == "" { + t.Fatal("test context is missing request ID") + } + state := plugin.createCacheState(requestID) + metadata, err := plugin.buildRequestMetadataForCaching(state, req) + if err != nil { + t.Fatalf("buildRequestMetadataForCaching failed: %v", err) + } + paramsHash, err := hashMap(metadata) + if err != nil { + t.Fatalf("hashMap failed: %v", err) + } + state.ParamsHash = paramsHash + sc, err := plugin.performDirectSearch(ctx, state, req, cacheKey, metadata, paramsHash) + return state, sc, err +} + func TestCacheTypeDirectUsesChunkLookup(t *testing.T) { logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) store := newDirectFastPathStore() @@ -578,10 +639,15 @@ func TestCacheTypeDirectUsesChunkLookup(t *testing.T) { ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), } - ctx := CreateContextWithCacheKeyAndType("chunk-fast-path", CacheTypeDirect) - directID, err := plugin.prepareDirectCacheLookup(ctx, req, "chunk-fast-path") + // First pass: warm the deterministic cache ID and learn what it is. + ctx := CreateContextWithCacheKeyAndType(t, "chunk-fast-path", CacheTypeDirect) + state, _, err := runDirectSearchForTest(t, plugin, ctx, req, "chunk-fast-path") if err != nil { - t.Fatalf("prepareDirectCacheLookup failed: %v", err) + t.Fatalf("performDirectSearch failed: %v", err) + } + directID := state.DirectCacheID + if directID == "" { + t.Fatal("expected DirectCacheID to be populated") } cachedContent := "cached response" @@ -614,15 +680,18 @@ func TestCacheTypeDirectUsesChunkLookup(t *testing.T) { }, } - shortCircuit, err := plugin.performDirectChunkLookup(ctx, req, "chunk-fast-path") + // Second pass: should hit the chunk we just stored, via point-fetch only. + priorChunkCalls := store.getChunkCalls + ctx2 := CreateContextWithCacheKeyAndType(t, "chunk-fast-path", CacheTypeDirect) + _, shortCircuit, err := runDirectSearchForTest(t, plugin, ctx2, req, "chunk-fast-path") if err != nil { - t.Fatalf("performDirectChunkLookup failed: %v", err) + t.Fatalf("second performDirectSearch failed: %v", err) } if shortCircuit == nil || shortCircuit.Response == nil || shortCircuit.Response.ChatResponse == nil { t.Fatal("expected direct chunk lookup to return cached response") } - if store.getChunkCalls != 1 { - t.Fatalf("expected one GetChunk call, got %d", store.getChunkCalls) + if store.getChunkCalls != priorChunkCalls+1 { + t.Fatalf("expected one additional GetChunk call, got %d total", store.getChunkCalls) } if store.getAllCalls != 0 { t.Fatalf("expected no GetAll calls, got %d", store.getAllCalls) @@ -646,22 +715,22 @@ func TestDefaultDirectSearchSetsStorageIDForDeterministicWrites(t *testing.T) { ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), } - ctx := CreateContextWithCacheKey("default-mode") - _, err := plugin.performDirectSearch(ctx, req, "default-mode") - if err != nil && !errors.Is(err, vectorstore.ErrNotSupported) { + ctx := CreateContextWithCacheKey(t, "default-mode") + state, _, err := runDirectSearchForTest(t, plugin, ctx, req, "default-mode") + if err != nil { t.Fatalf("performDirectSearch failed: %v", err) } - - storageID, _ := ctx.Value(requestStorageIDKey).(string) - if storageID == "" { - t.Fatal("expected default direct search to set requestStorageIDKey") + if state.DirectCacheID == "" { + t.Fatal("expected default direct search to populate state.DirectCacheID") } if store.getChunkCalls != 1 { t.Fatalf("expected one GetChunk call, got %d", store.getChunkCalls) } } -func TestPreLLMHookClearsStaleStorageIDOnReusedContext(t *testing.T) { +// TestPreLLMHookResetsStateOnReusedRequestID verifies that a second PreLLMHook +// call for the same request ID overwrites any prior state instead of inheriting it. +func TestPreLLMHookResetsStateOnReusedRequestID(t *testing.T) { logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) store := newDirectFastPathStore() config := getDefaultTestConfig() @@ -677,19 +746,29 @@ func TestPreLLMHookClearsStaleStorageIDOnReusedContext(t *testing.T) { ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), } - ctx := CreateContextWithCacheKey("reused-context") - ctx.SetValue(requestStorageIDKey, "stale-storage-id") + ctx := CreateContextWithCacheKey(t, "reused-context") + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + // Seed stale state under the same request ID. + stale := plugin.createCacheState(requestID) + stale.DirectCacheID = "stale-storage-id" + stale.ParamsHash = "stale-params-hash" if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { t.Fatalf("PreLLMHook failed: %v", err) } - storageID, _ := ctx.Value(requestStorageIDKey).(string) - if storageID == "" { - t.Fatal("expected PreLLMHook to replace stale requestStorageIDKey with a deterministic id") + state := plugin.getCacheState(requestID) + if state == nil { + t.Fatal("expected cache state to be present after PreLLMHook") + } + if state == stale { + t.Fatal("expected PreLLMHook to replace the stale state object") } - if storageID == "stale-storage-id" { - t.Fatal("expected PreLLMHook to clear stale requestStorageIDKey before setting a deterministic id") + if state.DirectCacheID == "" { + t.Fatal("expected PreLLMHook to populate a deterministic DirectCacheID") + } + if state.DirectCacheID == "stale-storage-id" { + t.Fatal("expected PreLLMHook to clear stale DirectCacheID before populating a new one") } } @@ -707,16 +786,17 @@ func TestCacheTypeDirectStoresDeterministicID(t *testing.T) { RequestType: schemas.ChatCompletionRequest, ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), } - ctx := CreateContextWithCacheKeyAndType("deterministic-store", CacheTypeDirect) - ctx.SetValue(requestIDKey, "request-uuid") - ctx.SetValue(requestProviderKey, schemas.OpenAI) - ctx.SetValue(requestModelKey, req.ChatRequest.Model) + ctx := CreateContextWithCacheKeyAndType(t, "deterministic-store", CacheTypeDirect) - directID, err := plugin.prepareDirectCacheLookup(ctx, req, "deterministic-store") - if err != nil { - t.Fatalf("prepareDirectCacheLookup failed: %v", err) + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + state := plugin.getCacheState(requestID) + if state == nil || state.DirectCacheID == "" { + t.Fatal("expected PreLLMHook to populate state.DirectCacheID") } - ctx.SetValue(requestStorageIDKey, directID) + directID := state.DirectCacheID content := "stored response" response := &schemas.BifrostResponse{ @@ -749,8 +829,8 @@ func TestCacheTypeDirectStoresDeterministicID(t *testing.T) { if store.addIDs[0] != directID { t.Fatalf("expected deterministic storage id %q, got %q", directID, store.addIDs[0]) } - if store.addIDs[0] == "request-uuid" { - t.Fatal("expected storage id to differ from request UUID") + if store.addIDs[0] == requestID { + t.Fatal("expected storage id to differ from request ID") } } @@ -763,6 +843,24 @@ func TestPostLLMHookUsesDeterministicStorageIDOutsideDirectMode(t *testing.T) { logger: logger, } + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), + } + + // Default mode (no CacheTypeKey) should still produce a deterministic + // storage ID via the direct-search path that PreLLMHook always runs. + ctx := CreateContextWithCacheKey(t, "default-mode-store") + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + state := plugin.getCacheState(requestID) + if state == nil || state.DirectCacheID == "" { + t.Fatal("expected default-mode PreLLMHook to populate state.DirectCacheID") + } + directID := state.DirectCacheID + content := "stored response" response := &schemas.BifrostResponse{ ChatResponse: &schemas.BifrostChatResponse{ @@ -782,16 +880,6 @@ func TestPostLLMHookUsesDeterministicStorageIDOutsideDirectMode(t *testing.T) { } response.ChatResponse.ExtraFields.RequestType = schemas.ChatCompletionRequest - ctx := CreateContextWithCacheKey("default-mode-store") - ctx.SetValue(requestIDKey, "request-uuid") - ctx.SetValue(requestProviderKey, schemas.OpenAI) - ctx.SetValue(requestModelKey, "openai/gpt-4o-mini") - ctx.SetValue(requestHashKey, "request-hash") - ctx.SetValue(requestParamsHashKey, "params-hash") - - directID := plugin.generateDirectCacheID(schemas.OpenAI, "openai/gpt-4o-mini", "default-mode-store", "request-hash", "params-hash") - ctx.SetValue(requestStorageIDKey, directID) - if _, _, err := plugin.PostLLMHook(ctx, response, nil); err != nil { t.Fatalf("PostLLMHook failed: %v", err) } @@ -806,67 +894,6 @@ func TestPostLLMHookUsesDeterministicStorageIDOutsideDirectMode(t *testing.T) { } } -func TestPerformDirectSearchDisablesScanFallbackForLegacyLookup(t *testing.T) { - logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) - store := newDirectFastPathStore() - plugin := &Plugin{ - store: store, - config: getDefaultTestConfig(), - logger: logger, - } - - req := &schemas.BifrostRequest{ - RequestType: schemas.ChatCompletionRequest, - ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), - } - - ctx := CreateContextWithCacheKey("legacy-no-scan") - _, err := plugin.performDirectSearch(ctx, req, "legacy-no-scan") - if err != nil && !errors.Is(err, vectorstore.ErrNotSupported) { - t.Fatalf("performDirectSearch failed: %v", err) - } - - if store.getAllCalls != 1 { - t.Fatalf("expected one legacy GetAll call, got %d", store.getAllCalls) - } - if !vectorstore.IsScanFallbackDisabled(store.lastGetAllCtx) { - t.Fatal("expected legacy direct lookup to disable scan fallback") - } -} - -func TestPerformLegacyDirectSearchTreatsQuerySyntaxErrorAsMiss(t *testing.T) { - logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) - store := newDirectFastPathStore() - store.getAllErr = vectorstore.ErrQuerySyntax - plugin := &Plugin{ - store: store, - config: getDefaultTestConfig(), - logger: logger, - } - - req := &schemas.BifrostRequest{ - RequestType: schemas.ChatCompletionRequest, - ChatRequest: CreateBasicChatRequest("What is Bifrost?", 0.7, 50), - } - - ctx := CreateContextWithCacheKey("legacy-query-syntax") - _, err := plugin.prepareDirectCacheLookup(ctx, req, "legacy-query-syntax") - if err != nil { - t.Fatalf("prepareDirectCacheLookup failed: %v", err) - } - - shortCircuit, err := plugin.performLegacyDirectSearch(ctx, req, "legacy-query-syntax") - if err != nil { - t.Fatalf("performLegacyDirectSearch failed: %v", err) - } - if shortCircuit != nil { - t.Fatal("expected query syntax incompatibility to be treated as a miss") - } - if store.getAllCalls != 1 { - t.Fatalf("expected one legacy GetAll call, got %d", store.getAllCalls) - } -} - func TestGetOrCreateStreamAccumulatorUsesSingleAccumulatorPerRequest(t *testing.T) { logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) plugin := &Plugin{ diff --git a/plugins/semanticcache/plugin_conversation_config_test.go b/plugins/semanticcache/plugin_conversation_config_test.go index 7c4d0e72c2..98c1398aa8 100644 --- a/plugins/semanticcache/plugin_conversation_config_test.go +++ b/plugins/semanticcache/plugin_conversation_config_test.go @@ -14,7 +14,7 @@ func TestConversationHistoryThresholdBasic(t *testing.T) { setup := CreateTestSetupWithConversationThreshold(t, 2) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-conversation-threshold-basic") + ctx := CreateContextWithCacheKey(t, "test-conversation-threshold-basic") // Test 1: Conversation with exactly 2 messages (should cache) conversation1 := BuildConversationHistory("", @@ -25,7 +25,7 @@ func TestConversationHistoryThresholdBasic(t *testing.T) { t.Log("Testing conversation with exactly 2 messages (at threshold)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Fresh request @@ -53,17 +53,18 @@ func TestConversationHistoryThresholdBasic(t *testing.T) { t.Log("Testing conversation with 5 messages (exceeds threshold)...") response3, err3 := setup.Client.ChatCompletionRequest(ctx, request2) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) // Should not cache WaitForCache(setup.Plugin) - // Verify it was NOT cached + // Verify it was NOT cached. The first call already succeeded, so any error + // here is a real regression rather than upstream flakiness — fail fast. t.Log("Verifying conversation exceeding threshold was not cached...") response4, err4 := setup.Client.ChatCompletionRequest(ctx, request2) if err4 != nil { - return // Test will be skipped by retry function + t.Fatalf("verification request failed: %v", err4) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}) // Should still be fresh (not cached) @@ -76,7 +77,7 @@ func TestConversationHistoryThresholdWithSystemPrompt(t *testing.T) { setup := CreateTestSetupWithConversationThreshold(t, 3) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-threshold-system-prompt") + ctx := CreateContextWithCacheKey(t, "test-threshold-system-prompt") // System prompt + 2 user/assistant pairs = 5 messages total > 3 conversation := BuildConversationHistory( @@ -89,16 +90,17 @@ func TestConversationHistoryThresholdWithSystemPrompt(t *testing.T) { t.Log("Testing conversation with system prompt (5 total messages > 3 threshold)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Should not cache (exceeds threshold) WaitForCache(setup.Plugin) - // Verify not cached + // Verify not cached. First call already succeeded, so failures here + // indicate a real regression rather than upstream flakiness. response2, err2 := setup.Client.ChatCompletionRequest(ctx, request) if err2 != nil { - return // Test will be skipped by retry function + t.Fatalf("verification request failed: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Should not be cached @@ -111,7 +113,7 @@ func TestConversationHistoryThresholdWithExcludeSystemPrompt(t *testing.T) { setup := CreateTestSetupWithThresholdAndExcludeSystem(t, 3, true) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-threshold-exclude-system") + ctx := CreateContextWithCacheKey(t, "test-threshold-exclude-system") // Create conversation with exactly 3 non-system messages to test threshold boundary // System + 1.5 user/assistant pairs = 4 messages total @@ -133,7 +135,7 @@ func TestConversationHistoryThresholdWithExcludeSystemPrompt(t *testing.T) { response1, err1 := setup.Client.ChatCompletionRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Fresh request, should not hit cache @@ -172,7 +174,7 @@ func TestConversationHistoryThresholdDifferentValues(t *testing.T) { setup := CreateTestSetupWithConversationThreshold(t, tc.threshold) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-threshold-" + tc.name) + ctx := CreateContextWithCacheKey(t, "test-threshold-" + tc.name) // Build conversation with specified number of messages var conversation []schemas.ChatMessage @@ -194,7 +196,7 @@ func TestConversationHistoryThresholdDifferentValues(t *testing.T) { response1, err1 := setup.Client.ChatCompletionRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Always fresh first time @@ -202,7 +204,7 @@ func TestConversationHistoryThresholdDifferentValues(t *testing.T) { response2, err2 := setup.Client.ChatCompletionRequest(ctx, request) if err2 != nil { - return // Test will be skipped by retry function + t.Fatalf("verification request failed: %v", err2) } if tc.shouldCache { @@ -222,7 +224,7 @@ func TestExcludeSystemPromptBasic(t *testing.T) { setup := CreateTestSetupWithExcludeSystemPrompt(t, true) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-exclude-system-basic") + ctx := CreateContextWithCacheKey(t, "test-exclude-system-basic") // Create two conversations with different system prompts but same user/assistant messages conversation1 := BuildConversationHistory( @@ -241,7 +243,7 @@ func TestExcludeSystemPromptBasic(t *testing.T) { t.Log("Caching conversation with system prompt 1...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -268,7 +270,7 @@ func TestExcludeSystemPromptComparison(t *testing.T) { setup1 := CreateTestSetupWithExcludeSystemPrompt(t, false) defer setup1.Cleanup() - ctx1 := CreateContextWithCacheKey("test-exclude-system-false") + ctx1 := CreateContextWithCacheKey(t, "test-exclude-system-false") conversation1 := BuildConversationHistory( "You are helpful", @@ -286,7 +288,7 @@ func TestExcludeSystemPromptComparison(t *testing.T) { t.Log("Testing ExcludeSystemPrompt=false...") response1, err1 := setup1.Client.ChatCompletionRequest(ctx1, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -315,12 +317,12 @@ func TestExcludeSystemPromptComparison(t *testing.T) { setup2 := CreateTestSetupWithExcludeSystemPrompt(t, true) defer setup2.Cleanup() - ctx2 := CreateContextWithCacheKey("test-exclude-system-true") + ctx2 := CreateContextWithCacheKey(t, "test-exclude-system-true") t.Log("Testing ExcludeSystemPrompt=true...") response3, err3 := setup2.Client.ChatCompletionRequest(ctx2, request1) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) @@ -341,7 +343,7 @@ func TestExcludeSystemPromptWithMultipleSystemMessages(t *testing.T) { setup := CreateTestSetupWithExcludeSystemPrompt(t, true) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-multiple-system-messages") + ctx := CreateContextWithCacheKey(t, "test-multiple-system-messages") // Manually create conversation with multiple system messages conversation1 := []schemas.ChatMessage{ @@ -388,7 +390,7 @@ func TestExcludeSystemPromptWithMultipleSystemMessages(t *testing.T) { t.Log("Caching conversation with multiple system messages...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -414,7 +416,7 @@ func TestExcludeSystemPromptWithNoSystemMessages(t *testing.T) { setup := CreateTestSetupWithExcludeSystemPrompt(t, true) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-no-system-messages") + ctx := CreateContextWithCacheKey(t, "test-no-system-messages") // Conversation with no system messages conversation := []schemas.ChatMessage{ @@ -433,7 +435,7 @@ func TestExcludeSystemPromptWithNoSystemMessages(t *testing.T) { t.Log("Testing conversation with no system messages...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) diff --git a/plugins/semanticcache/plugin_core_test.go b/plugins/semanticcache/plugin_core_test.go index 5bed26528d..a280f662f6 100644 --- a/plugins/semanticcache/plugin_core_test.go +++ b/plugins/semanticcache/plugin_core_test.go @@ -10,12 +10,17 @@ import ( "github.com/maximhq/bifrost/framework/vectorstore" ) -// TestSemanticCacheBasicFunctionality tests the core caching functionality +// TestSemanticCacheBasicFunctionality tests the core caching functionality. +// +// Intentionally NOT parallel: the assertions at the bottom of this function +// enforce wall-clock comparisons (cache must be faster than upstream, with at +// least 1.5× speedup). Running this in parallel with other integration tests +// causes CPU/network contention that flakes those ratios. func TestSemanticCacheBasicFunctionality(t *testing.T) { setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-basic-value") + ctx := CreateContextWithCacheKey(t, "test-basic-value") // Create test request testRequest := CreateBasicChatRequest( @@ -32,7 +37,7 @@ func TestSemanticCacheBasicFunctionality(t *testing.T) { duration1 := time.Since(start1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil || len(response1.Choices) == 0 || response1.Choices[0].Message.Content.ContentStr == nil { @@ -106,13 +111,14 @@ func TestSemanticCacheBasicFunctionality(t *testing.T) { // TestSemanticSearch tests the semantic similarity search functionality func TestSemanticSearch(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Lower threshold for more flexible matching setup.Config.Threshold = 0.5 - ctx := CreateContextWithCacheKey("semantic-test-value") + ctx := CreateContextWithCacheKey(t, "semantic-test-value") // First request - this will be cached firstRequest := CreateBasicChatRequest( @@ -127,7 +133,7 @@ func TestSemanticSearch(t *testing.T) { duration1 := time.Since(start1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil || len(response1.Choices) == 0 || response1.Choices[0].Message.Content.ContentStr == nil { @@ -209,7 +215,7 @@ func TestSemanticSearch(t *testing.T) { func TestToFloat32Embedding(t *testing.T) { input := []float64{0.12345678901234568, -0.875, 1.5} - got := toFloat32Embedding(input) + got := float64ToFloat32Embedding(input) if len(got) != len(input) { t.Fatalf("expected %d elements, got %d", len(input), len(got)) @@ -246,13 +252,14 @@ func TestFlattenToFloat32Embedding(t *testing.T) { // TestDirectVsSemanticSearch tests the difference between direct hash matching and semantic search func TestDirectVsSemanticSearch(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Lower threshold for more flexible semantic matching setup.Config.Threshold = 0.2 - ctx := CreateContextWithCacheKey("direct-vs-semantic-test") + ctx := CreateContextWithCacheKey(t, "direct-vs-semantic-test") // Test Case 1: Exact same request (should use direct hash matching) t.Log("=== Test Case 1: Exact Same Request (Direct Hash Match) ===") @@ -266,7 +273,7 @@ func TestDirectVsSemanticSearch(t *testing.T) { t.Log("Making first request...") _, err1 := setup.Client.ChatCompletionRequest(ctx, exactRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) @@ -330,10 +337,11 @@ func TestDirectVsSemanticSearch(t *testing.T) { // TestNoCacheScenarios tests scenarios where caching should NOT occur func TestNoCacheScenarios(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("no-cache-test") + ctx := CreateContextWithCacheKey(t, "no-cache-test") // Test Case 1: Different parameters should NOT cache hit t.Log("=== Test Case 1: Different Parameters ===") @@ -344,7 +352,7 @@ func TestNoCacheScenarios(t *testing.T) { request1 := CreateBasicChatRequest(basePrompt, 0.1, 50) _, err1 := setup.Client.ChatCompletionRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) @@ -353,7 +361,7 @@ func TestNoCacheScenarios(t *testing.T) { request2 := CreateBasicChatRequest(basePrompt, 0.9, 50) // Different temperature response2, err2 := setup.Client.ChatCompletionRequest(ctx, request2) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } // Should NOT be cached @@ -365,17 +373,28 @@ func TestNoCacheScenarios(t *testing.T) { request3 := CreateBasicChatRequest(basePrompt, 0.1, 200) // Different max_tokens response3, err3 := setup.Client.ChatCompletionRequest(ctx, request3) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } - - // Should NOT be cached AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) + WaitForCache(setup.Plugin) + + // Make request3 a SECOND time. The miss above could be a miss for the + // wrong reason (e.g. caching disabled entirely). A second-call hit + // confirms (a) request3's params produce a distinct cache_key from the + // earlier requests AND (b) caching itself is functioning under this ctx. + response3Again, err := setup.Client.ChatCompletionRequest(ctx, request3) + if err != nil { + t.Fatalf("Repeat of request3 failed: %v", err) + } + AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3Again}, string(CacheTypeDirect)) + t.Log("✅ No cache scenarios test completed!") } // TestCacheConfiguration tests different cache configuration options func TestCacheConfiguration(t *testing.T) { + t.Parallel() tests := []struct { name string config *Config @@ -408,7 +427,10 @@ func TestCacheConfiguration(t *testing.T) { EmbeddingModel: "text-embedding-3-small", Dimension: 1536, Threshold: 0.8, - TTL: 1 * time.Hour, // Custom TTL + // Short TTL so the test can verify expiry is honored. + // 1h would only verify the configured value didn't crash; + // it can't distinguish "TTL applied" from "default TTL applied". + TTL: 2 * time.Second, }, expectedBehavior: "custom_ttl", }, @@ -419,19 +441,22 @@ func TestCacheConfiguration(t *testing.T) { setup := NewTestSetupWithConfig(t, tt.config) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("config-test-" + tt.name) + ctx := CreateContextWithCacheKey(t, "config-test-"+tt.name) // Basic functionality test with the configuration testRequest := CreateBasicChatRequest("Test configuration: "+tt.name, 0.5, 50) - _, err1 := setup.Client.ChatCompletionRequest(ctx, testRequest) + response1, err1 := setup.Client.ChatCompletionRequest(ctx, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) - _, err2 := setup.Client.ChatCompletionRequest(ctx, testRequest) + // Second identical request must hit (regardless of which config — + // all three configs cache identical requests via the direct path). + response2, err2 := setup.Client.ChatCompletionRequest(ctx, testRequest) if err2 != nil { if err2.Error != nil { t.Fatalf("Second request failed: %v", err2.Error.Message) @@ -439,8 +464,32 @@ func TestCacheConfiguration(t *testing.T) { t.Fatalf("Second request failed: %v", err2) } } - - t.Logf("✅ Configuration test '%s' completed", tt.name) + AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, string(CacheTypeDirect)) + + // Per-config behavioral check. + switch tt.expectedBehavior { + case "strict_matching": + // Threshold=0.95 should still allow direct hits on identical + // content (threshold only gates semantic search). Verified above. + case "loose_matching": + // Same — direct path doesn't use threshold. The relevant check + // is that the cache actually wrote (verified above). + case "custom_ttl": + // Verify Config.TTL was actually honored: wait past expiry + // and confirm a third request misses. If the plugin had + // fallen back to the default TTL, this would still hit. + time.Sleep(tt.config.TTL + 1*time.Second) + response3, err3 := setup.Client.ChatCompletionRequest(ctx, testRequest) + if err3 != nil { + if err3.Error != nil { + t.Fatalf("Post-expiry request failed: %v", err3.Error.Message) + } else { + t.Fatalf("Post-expiry request failed: %v", err3) + } + } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) + } + t.Logf("✅ Configuration test '%s' completed (cache write + read verified)", tt.name) }) } } @@ -510,7 +559,7 @@ func (m *MockUnsupportedStore) Close(ctx context.Context, namespace string) erro // TestInvalidProviderRejection tests that providers without embedding support are rejected during initialization func TestInvalidProviderRejection(t *testing.T) { - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx := newBaseTestContext() logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) // Create a mock vector store for testing @@ -532,11 +581,10 @@ func TestInvalidProviderRejection(t *testing.T) { for _, provider := range unsupportedProviders { t.Run(string(provider), func(t *testing.T) { config := &Config{ - Provider: provider, - EmbeddingModel: "some-model", - Dimension: 1536, - Threshold: 0.8, - CleanUpOnShutdown: false, + Provider: provider, + EmbeddingModel: "some-model", + Dimension: 1536, + Threshold: 0.8, } // Provider validation was moved to request time (global client handles it). @@ -551,7 +599,7 @@ func TestInvalidProviderRejection(t *testing.T) { // TestValidProviderAccepted tests that providers with embedding support are accepted during initialization func TestValidProviderAccepted(t *testing.T) { - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx := newBaseTestContext() logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) // Create a mock vector store for testing @@ -559,11 +607,10 @@ func TestValidProviderAccepted(t *testing.T) { // Test a supported provider (OpenAI) config := &Config{ - Provider: schemas.OpenAI, - EmbeddingModel: "text-embedding-3-small", - Dimension: 1536, - Threshold: 0.8, - CleanUpOnShutdown: false, + Provider: schemas.OpenAI, + EmbeddingModel: "text-embedding-3-small", + Dimension: 1536, + Threshold: 0.8, } // Init should succeed; provider validation happens at request time via the global client. diff --git a/plugins/semanticcache/plugin_cross_cache_test.go b/plugins/semanticcache/plugin_cross_cache_test.go index 7a49389911..00f1085443 100644 --- a/plugins/semanticcache/plugin_cross_cache_test.go +++ b/plugins/semanticcache/plugin_cross_cache_test.go @@ -8,24 +8,25 @@ import ( // TestCrossCacheTypeAccessibility tests that entries cached one way are accessible another way func TestCrossCacheTypeAccessibility(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("What is artificial intelligence?", 0.7, 100) // Test 1: Cache with default behavior (both direct + semantic) - ctx1 := CreateContextWithCacheKey("test-cross-cache-access") + ctx1 := CreateContextWithCacheKey(t, "test-cross-cache-access") t.Log("Caching with default behavior (both direct + semantic)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Test 2: Retrieve with direct-only cache type - ctx2 := CreateContextWithCacheKeyAndType("test-cross-cache-access", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cross-cache-access", CacheTypeDirect) t.Log("Retrieving with CacheTypeKey=direct...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { @@ -38,7 +39,7 @@ func TestCrossCacheTypeAccessibility(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, "direct") // Should find direct match // Test 3: Retrieve with semantic-only cache type - ctx3 := CreateContextWithCacheKeyAndType("test-cross-cache-access", CacheTypeSemantic) + ctx3 := CreateContextWithCacheKeyAndType(t, "test-cross-cache-access", CacheTypeSemantic) t.Log("Retrieving with CacheTypeKey=semantic...") response3, err3 := setup.Client.ChatCompletionRequest(ctx3, testRequest) if err3 != nil { @@ -51,6 +52,7 @@ func TestCrossCacheTypeAccessibility(t *testing.T) { // TestCacheTypeIsolation tests that entries cached separately by type behave correctly func TestCacheTypeIsolation(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -60,22 +62,22 @@ func TestCacheTypeIsolation(t *testing.T) { clearTestKeysWithStore(t, setup.Store) // Test 1: Cache with direct-only - ctx1 := CreateContextWithCacheKeyAndType("test-cache-isolation", CacheTypeDirect) + ctx1 := CreateContextWithCacheKeyAndType(t, "test-cache-isolation", CacheTypeDirect) t.Log("Caching with CacheTypeKey=direct only...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Fresh request WaitForCache(setup.Plugin) // Test 2: Try to retrieve with semantic-only (should miss because no semantic entry) - ctx2 := CreateContextWithCacheKeyAndType("test-cache-isolation", CacheTypeSemantic) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cache-isolation", CacheTypeSemantic) t.Log("Retrieving same request with CacheTypeKey=semantic (should miss)...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Should miss - no semantic cache entry @@ -90,7 +92,7 @@ func TestCacheTypeIsolation(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}, "direct") // Should hit direct cache // Test 4: Default behavior (should find the direct cache) - ctx4 := CreateContextWithCacheKey("test-cache-isolation") + ctx4 := CreateContextWithCacheKey(t, "test-cache-isolation") t.Log("Retrieving with default behavior (should find direct cache)...") response4, err4 := setup.Client.ChatCompletionRequest(ctx4, testRequest) if err4 != nil { @@ -103,17 +105,18 @@ func TestCacheTypeIsolation(t *testing.T) { // TestCacheTypeFallbackBehavior tests whether cache types fallback to each other func TestCacheTypeFallbackBehavior(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Cache an entry with default behavior originalRequest := CreateBasicChatRequest("Explain machine learning", 0.7, 100) - ctx1 := CreateContextWithCacheKey("test-fallback-behavior") + ctx1 := CreateContextWithCacheKey(t, "test-fallback-behavior") t.Log("Caching with default behavior...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, originalRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -121,19 +124,19 @@ func TestCacheTypeFallbackBehavior(t *testing.T) { // Test similar request with direct-only (should miss direct, no fallback, but should cache response) similarRequest := CreateBasicChatRequest("Explain machine learning concepts", 0.7, 100) - ctx2 := CreateContextWithCacheKeyAndType("test-fallback-behavior", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-fallback-behavior", CacheTypeDirect) t.Log("Testing similar request with CacheTypeKey=direct (should miss, make request, cache without embeddings)...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, similarRequest) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Should miss - no direct match, no semantic search WaitForCache(setup.Plugin) // Let the response get cached // Test same similar request with semantic-only (should hit original entry) - ctx3 := CreateContextWithCacheKeyAndType("test-fallback-behavior", CacheTypeSemantic) + ctx3 := CreateContextWithCacheKeyAndType(t, "test-fallback-behavior", CacheTypeSemantic) t.Log("Testing similar request with CacheTypeKey=semantic (should find semantic match from step 1)...") response3, err3 := setup.Client.ChatCompletionRequest(ctx3, similarRequest) @@ -141,8 +144,12 @@ func TestCacheTypeFallbackBehavior(t *testing.T) { t.Fatalf("Third request failed: %v", err3) } - // Should find semantic match from step 1's cached entry (which has embeddings) - if response3.ExtraFields.CacheDebug != nil && response3.ExtraFields.CacheDebug.CacheHit { + // Should find semantic match from step 1's cached entry (which has embeddings). + // Hit is similarity-dependent; CacheDebug must be stamped either way. + if response3.ExtraFields.CacheDebug == nil { + t.Fatal("expected CacheDebug to be stamped on the response") + } + if response3.ExtraFields.CacheDebug.CacheHit { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}, "semantic") t.Log("✅ Semantic search found similar entry from step 1") } else { @@ -153,7 +160,7 @@ func TestCacheTypeFallbackBehavior(t *testing.T) { // Test a different similar request with default behavior (try both, fallback to semantic) // Use a slightly different request to avoid hitting the cached response from step 2 differentSimilarRequest := CreateBasicChatRequest("Explain the basics of machine learning", 0.7, 100) - ctx4 := CreateContextWithCacheKey("test-fallback-behavior") + ctx4 := CreateContextWithCacheKey(t, "test-fallback-behavior") t.Log("Testing different similar request with default behavior (direct miss -> semantic fallback)...") response4, err4 := setup.Client.ChatCompletionRequest(ctx4, differentSimilarRequest) @@ -161,8 +168,12 @@ func TestCacheTypeFallbackBehavior(t *testing.T) { t.Fatalf("Fourth request failed: %v", err4) } - // Should try direct first (miss), then semantic (might hit) - if response4.ExtraFields.CacheDebug != nil && response4.ExtraFields.CacheDebug.CacheHit { + // Should try direct first (miss), then semantic (might hit). CacheDebug + // must be stamped either way. + if response4.ExtraFields.CacheDebug == nil { + t.Fatal("expected CacheDebug to be stamped on the response") + } + if response4.ExtraFields.CacheDebug.CacheHit { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}, "semantic") t.Log("✅ Default behavior found semantic fallback") } else { @@ -175,17 +186,18 @@ func TestCacheTypeFallbackBehavior(t *testing.T) { // TestMultipleCacheEntriesPriority tests behavior when multiple cache entries exist func TestMultipleCacheEntriesPriority(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("What is deep learning?", 0.7, 100) // Create cache entry with default behavior first - ctx1 := CreateContextWithCacheKey("test-cache-priority") + ctx1 := CreateContextWithCacheKey(t, "test-cache-priority") t.Log("Creating cache entry with default behavior...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) originalContent := *response1.Choices[0].Message.Content.ContentStr @@ -211,7 +223,7 @@ func TestMultipleCacheEntriesPriority(t *testing.T) { } // Test with direct-only access - ctx2 := CreateContextWithCacheKeyAndType("test-cache-priority", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cache-priority", CacheTypeDirect) t.Log("Accessing with CacheTypeKey=direct...") response3, err3 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err3 != nil { @@ -220,7 +232,7 @@ func TestMultipleCacheEntriesPriority(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}, "direct") // Should find direct cache // Test with semantic-only access - ctx3 := CreateContextWithCacheKeyAndType("test-cache-priority", CacheTypeSemantic) + ctx3 := CreateContextWithCacheKeyAndType(t, "test-cache-priority", CacheTypeSemantic) t.Log("Accessing with CacheTypeKey=semantic...") response4, err4 := setup.Client.ChatCompletionRequest(ctx3, testRequest) if err4 != nil { @@ -233,6 +245,7 @@ func TestMultipleCacheEntriesPriority(t *testing.T) { // TestCrossCacheTypeWithDifferentParameters tests cache type behavior with parameter variations func TestCrossCacheTypeWithDifferentParameters(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -240,19 +253,19 @@ func TestCrossCacheTypeWithDifferentParameters(t *testing.T) { // Cache with specific parameters request1 := CreateBasicChatRequest(baseMessage, 0.7, 100) - ctx1 := CreateContextWithCacheKey("test-cross-cache-params") + ctx1 := CreateContextWithCacheKey(t, "test-cross-cache-params") t.Log("Caching with temp=0.7, max_tokens=100...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Test same parameters with direct-only - ctx2 := CreateContextWithCacheKeyAndType("test-cross-cache-params", CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, "test-cross-cache-params", CacheTypeDirect) t.Log("Retrieving same parameters with CacheTypeKey=direct...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, request1) if err2 != nil { @@ -269,18 +282,18 @@ func TestCrossCacheTypeWithDifferentParameters(t *testing.T) { t.Log("Testing different parameters (should miss)...") response3, err3 := setup.Client.ChatCompletionRequest(ctx2, request3) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) // Should miss due to different params // Test semantic search with different parameters - ctx4 := CreateContextWithCacheKeyAndType("test-cross-cache-params", CacheTypeSemantic) + ctx4 := CreateContextWithCacheKeyAndType(t, "test-cross-cache-params", CacheTypeSemantic) similarRequest := CreateBasicChatRequest("Can you explain quantum computing", 0.5, 200) t.Log("Testing semantic search with different params and similar message...") response4, err4 := setup.Client.ChatCompletionRequest(ctx4, similarRequest) if err4 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err4) } // Should miss semantic search due to different parameters (params_hash different) AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}) @@ -290,26 +303,27 @@ func TestCrossCacheTypeWithDifferentParameters(t *testing.T) { // TestCacheTypeErrorHandling tests error scenarios with cache types func TestCacheTypeErrorHandling(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("Test error handling", 0.7, 50) // Test invalid cache type (should fallback to default) - ctx1 := CreateContextWithCacheKey("test-cache-error-handling") + ctx1 := CreateContextWithCacheKey(t, "test-cache-error-handling") ctx1 = ctx1.WithValue(CacheTypeKey, "invalid_cache_type") t.Log("Testing invalid cache type (should fallback to default behavior)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Should work with fallback behavior WaitForCache(setup.Plugin) // Test nil cache type (should use default) - ctx2 := CreateContextWithCacheKey("test-cache-error-handling") + ctx2 := CreateContextWithCacheKey(t, "test-cache-error-handling") ctx2 = ctx2.WithValue(CacheTypeKey, nil) t.Log("Testing nil cache type (should use default behavior)...") diff --git a/plugins/semanticcache/plugin_default_cache_key_test.go b/plugins/semanticcache/plugin_default_cache_key_test.go index 57cd2d6cb4..db8e78443a 100644 --- a/plugins/semanticcache/plugin_default_cache_key_test.go +++ b/plugins/semanticcache/plugin_default_cache_key_test.go @@ -1,7 +1,6 @@ package semanticcache import ( - "context" "testing" "github.com/maximhq/bifrost/core/schemas" @@ -10,21 +9,22 @@ import ( // TestDefaultCacheKey_CachesWithoutPerRequestKey verifies that when DefaultCacheKey // is configured, requests without an explicit cache key are cached automatically. func TestDefaultCacheKey_CachesWithoutPerRequestKey(t *testing.T) { + t.Parallel() config := getDefaultTestConfig() - config.DefaultCacheKey = "test-default-key" + config.DefaultCacheKey = keyForTest(t, "test-default-key") setup := NewTestSetupWithConfig(t, config) defer setup.Cleanup() // Context with NO per-request cache key - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx := newBaseTestContext() testRequest := CreateBasicChatRequest("What is Bifrost? Answer in one short sentence.", 0.7, 50) t.Log("Making first request without per-request cache key (should use default and be cached)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil || len(response1.Choices) == 0 || response1.Choices[0].Message.Content.ContentStr == nil { @@ -37,7 +37,7 @@ func TestDefaultCacheKey_CachesWithoutPerRequestKey(t *testing.T) { WaitForCache(setup.Plugin) t.Log("Making second identical request without per-request cache key (should hit cache)...") - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx2 := newBaseTestContext() response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { if err2.Error != nil { @@ -53,8 +53,9 @@ func TestDefaultCacheKey_CachesWithoutPerRequestKey(t *testing.T) { // TestDefaultCacheKey_PerRequestKeyOverridesDefault verifies that an explicit // per-request cache key takes precedence over the configured default. func TestDefaultCacheKey_PerRequestKeyOverridesDefault(t *testing.T) { + t.Parallel() config := getDefaultTestConfig() - config.DefaultCacheKey = "test-default-key" + config.DefaultCacheKey = keyForTest(t, "test-default-key") setup := NewTestSetupWithConfig(t, config) defer setup.Cleanup() @@ -62,16 +63,16 @@ func TestDefaultCacheKey_PerRequestKeyOverridesDefault(t *testing.T) { testRequest := CreateBasicChatRequest("What is the capital of France?", 0.5, 50) // Cache with the default key (no per-request key) - ctx1 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx1 := newBaseTestContext() _, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) // Verify the cache was actually populated with the default key - ctxDefault2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctxDefault2 := newBaseTestContext() responseDefault2, errDefault2 := setup.Client.ChatCompletionRequest(ctxDefault2, testRequest) if errDefault2 != nil { if errDefault2.Error != nil { @@ -82,7 +83,7 @@ func TestDefaultCacheKey_PerRequestKeyOverridesDefault(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: responseDefault2}, string(CacheTypeDirect)) // Same request but with a DIFFERENT per-request key — should miss - ctx2 := CreateContextWithCacheKey("override-key") + ctx2 := CreateContextWithCacheKey(t, "override-key") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { if err2.Error != nil { @@ -98,20 +99,21 @@ func TestDefaultCacheKey_PerRequestKeyOverridesDefault(t *testing.T) { // TestDefaultCacheKey_EmptyDefault_NoCaching verifies that when DefaultCacheKey // is empty (default zero value), requests without a per-request key bypass caching. func TestDefaultCacheKey_EmptyDefault_NoCaching(t *testing.T) { + t.Parallel() config := getDefaultTestConfig() // DefaultCacheKey is intentionally left empty (zero value) setup := NewTestSetupWithConfig(t, config) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx := newBaseTestContext() testRequest := CreateBasicChatRequest("What is deep learning", 0.7, 50) t.Log("Making first request without any cache key and no default (should not cache)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -119,7 +121,7 @@ func TestDefaultCacheKey_EmptyDefault_NoCaching(t *testing.T) { WaitForCache(setup.Plugin) t.Log("Making second identical request (should still not cache)...") - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx2 := newBaseTestContext() response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { if err2.Error != nil { diff --git a/plugins/semanticcache/plugin_edge_cases_test.go b/plugins/semanticcache/plugin_edge_cases_test.go index a99eb64ef2..946daca1a9 100644 --- a/plugins/semanticcache/plugin_edge_cases_test.go +++ b/plugins/semanticcache/plugin_edge_cases_test.go @@ -1,7 +1,6 @@ package semanticcache import ( - "context" "strings" "testing" @@ -11,6 +10,7 @@ import ( // TestParameterVariations tests that different parameters don't cache hit inappropriately func TestParameterVariations(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -45,7 +45,7 @@ func TestParameterVariations(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { // Create a fresh context for each subtest to avoid context pollution - ctx := CreateContextWithCacheKey("param-variations-test") + ctx := CreateContextWithCacheKey(t, "param-variations-test") // Clear cache for this subtest clearTestKeysWithStore(t, setup.Store) @@ -53,7 +53,7 @@ func TestParameterVariations(t *testing.T) { // Make first request _, err1 := setup.Client.ChatCompletionRequest(ctx, tt.request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) @@ -80,10 +80,11 @@ func TestParameterVariations(t *testing.T) { // TestToolVariations tests caching behavior with different tool configurations func TestToolVariations(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("tool-variations-test") + ctx := CreateContextWithCacheKey(t, "tool-variations-test") // Base request without tools baseRequest := &schemas.BifrostChatRequest{ @@ -190,7 +191,7 @@ func TestToolVariations(t *testing.T) { t.Log("Making request with tools...") response2, err2 := setup.Client.ChatCompletionRequest(ctx, requestWithTools) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) @@ -210,7 +211,7 @@ func TestToolVariations(t *testing.T) { t.Log("Making request with different tools...") response4, err4 := setup.Client.ChatCompletionRequest(ctx, requestWithDifferentTools) if err4 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err4) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}) @@ -220,6 +221,7 @@ func TestToolVariations(t *testing.T) { // TestContentVariations tests caching behavior with different content types func TestContentVariations(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -349,14 +351,13 @@ func TestContentVariations(t *testing.T) { t.Run(tt.name, func(t *testing.T) { t.Logf("Testing content variation: %s", tt.name) - // Create a fresh context for each subtest to avoid context pollution - ctx := CreateContextWithCacheKey("content-variations-test") + // Use a per-subtest cache key so subtests don't share entries. + ctx := CreateContextWithCacheKey(t, "content-variations-"+tt.name) // Make first request _, err1 := setup.Client.ChatCompletionRequest(ctx, tt.request) if err1 != nil { - t.Logf("⚠️ First %s request failed: %v", tt.name, err1) - return // Skip this test case + t.Skipf("upstream request error, skipping %s: %v", tt.name, err1) } WaitForCache(setup.Plugin) @@ -376,6 +377,7 @@ func TestContentVariations(t *testing.T) { // TestBoundaryParameterValues tests edge case parameter values func TestBoundaryParameterValues(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -454,25 +456,40 @@ func TestBoundaryParameterValues(t *testing.T) { t.Run(tt.name, func(t *testing.T) { t.Logf("Testing boundary parameters: %s", tt.name) - // Create a fresh context for each subtest to avoid context pollution - ctx := CreateContextWithCacheKey("boundary-params-test") + // Per-subtest cache key so subtests don't share entries. + ctx := CreateContextWithCacheKey(t, "boundary-params-"+tt.name) - _, err := setup.Client.ChatCompletionRequest(ctx, tt.request) - if err != nil { - t.Logf("⚠️ %s request failed (may be expected): %v", tt.name, err) - } else { - t.Logf("✅ %s handled gracefully", tt.name) + // First request must succeed (boundary values are valid OpenAI + // inputs); a real failure here is a regression, not "expected". + response1, err1 := setup.Client.ChatCompletionRequest(ctx, tt.request) + if err1 != nil { + t.Skipf("upstream request error, skipping %s: %v", tt.name, err1) + } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) + + WaitForCache(setup.Plugin) + + // Second identical request must hit — proves boundary params + // don't break cache key generation or storage. + response2, err2 := setup.Client.ChatCompletionRequest(ctx, tt.request) + if err2 != nil { + t.Fatalf("Second %s request failed: %v", tt.name, err2) } + AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, string(CacheTypeDirect)) + t.Logf("✅ %s parameters cached correctly", tt.name) }) } } // TestSemanticSimilarityEdgeCases tests edge cases in semantic similarity matching func TestSemanticSimilarityEdgeCases(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - setup.Config.Threshold = 0.9 + // Threshold tuned for the prompt pairs below; 0.9 is too strict for + // semantically-similar-but-different-phrasing pairs and produces flakes. + setup.Config.Threshold = 0.7 // Test case: Similar questions with different wording similarTests := []struct { @@ -510,7 +527,7 @@ func TestSemanticSimilarityEdgeCases(t *testing.T) { for i, test := range similarTests { t.Run(test.description, func(t *testing.T) { // Create a fresh context for each subtest to avoid context pollution - ctx := CreateContextWithCacheKey("semantic-edge-test") + ctx := CreateContextWithCacheKey(t, "semantic-edge-test") // Clear cache for this subtest clearTestKeysWithStore(t, setup.Store) @@ -519,7 +536,7 @@ func TestSemanticSimilarityEdgeCases(t *testing.T) { request1 := CreateBasicChatRequest(test.prompt1, 0.1, 50) _, err1 := setup.Client.ChatCompletionRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } // Wait for cache to be written @@ -558,7 +575,7 @@ func TestSemanticSimilarityEdgeCases(t *testing.T) { if semanticMatch { t.Logf("✅ Test %d: Semantic match found as expected for '%s'", i+1, test.description) } else { - t.Logf("ℹ️ Test %d: No semantic match found for '%s', check with threshold: %f and found similarity: %f", i+1, test.description, cacheThresholdFloat, cacheSimilarityFloat) + t.Errorf("❌ Test %d: Expected semantic match for '%s' but none found (threshold=%f, similarity=%f)", i+1, test.description, cacheThresholdFloat, cacheSimilarityFloat) } } else { if semanticMatch { @@ -573,6 +590,7 @@ func TestSemanticSimilarityEdgeCases(t *testing.T) { // TestErrorHandlingEdgeCases tests various error scenarios func TestErrorHandlingEdgeCases(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -580,23 +598,33 @@ func TestErrorHandlingEdgeCases(t *testing.T) { // Test without cache key (should not crash and bypass cache) t.Run("Request without cache key", func(t *testing.T) { - ctxNoKey := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctxNoKey := newBaseTestContext() - response, err := setup.Client.ChatCompletionRequest(ctxNoKey, testRequest) + response1, err := setup.Client.ChatCompletionRequest(ctxNoKey, testRequest) if err != nil { t.Errorf("Request without cache key failed: %v", err) return } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) - // Should bypass cache since there's no cache key - AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response}) - t.Log("✅ Request without cache key correctly bypassed cache") + WaitForCache(setup.Plugin) + + // Second identical request must also miss — proves the first wasn't + // silently cached against a default key. + ctxNoKey2 := newBaseTestContext() + response2, err := setup.Client.ChatCompletionRequest(ctxNoKey2, testRequest) + if err != nil { + t.Errorf("Second request without cache key failed: %v", err) + return + } + AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) + t.Log("✅ Request without cache key correctly bypassed cache (verified across two calls)") }) // Test with invalid cache key type t.Run("Request with invalid cache key type", func(t *testing.T) { // First establish a cached response with valid context - validCtx := CreateContextWithCacheKey("error-handling-test") + validCtx := CreateContextWithCacheKey(t, "error-handling-test") _, err := setup.Client.ChatCompletionRequest(validCtx, testRequest) if err != nil { t.Fatalf("First request with valid cache key failed: %v", err) @@ -605,7 +633,7 @@ func TestErrorHandlingEdgeCases(t *testing.T) { WaitForCache(setup.Plugin) // Now test with invalid key type - should bypass cache - ctxInvalidKey := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline).WithValue(CacheKey, 12345) + ctxInvalidKey := newBaseTestContext().WithValue(CacheKey, 12345) response, err := setup.Client.ChatCompletionRequest(ctxInvalidKey, testRequest) if err != nil { diff --git a/plugins/semanticcache/plugin_embedding_test.go b/plugins/semanticcache/plugin_embedding_test.go index c5487a8510..e42f71c63c 100644 --- a/plugins/semanticcache/plugin_embedding_test.go +++ b/plugins/semanticcache/plugin_embedding_test.go @@ -9,10 +9,11 @@ import ( // TestEmbeddingRequestsCaching tests that embedding requests are properly cached using direct hash matching func TestEmbeddingRequestsCaching(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-embedding-cache") + ctx := CreateContextWithCacheKey(t, "test-embedding-cache") // Create embedding request embeddingRequest := CreateEmbeddingRequest([]string{ @@ -28,7 +29,7 @@ func TestEmbeddingRequestsCaching(t *testing.T) { duration1 := time.Since(start1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil || len(response1.Data) == 0 { @@ -76,33 +77,48 @@ func TestEmbeddingRequestsCaching(t *testing.T) { // TestEmbeddingRequestsNoCacheWithoutCacheKey tests that embedding requests without cache key are not cached func TestEmbeddingRequestsNoCacheWithoutCacheKey(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - // Don't set cache key in context - ctx := CreateContextWithCacheKey("") + // Don't set cache key in context. CreateContextWithCacheKey(t, "") would + // still populate CacheKey from t.Name() and turn this into a keyed + // request — using a base context keeps CacheKey unset so we exercise + // the cache-disabled path. + ctx := newBaseTestContext() embeddingRequest := CreateEmbeddingRequest([]string{"Test embedding without cache key"}) - t.Log("Making embedding request without cache key...") - - response, err := setup.Client.EmbeddingRequest(ctx, embeddingRequest) + t.Log("Making first embedding request without cache key...") + response1, err := setup.Client.EmbeddingRequest(ctx, embeddingRequest) if err != nil { t.Fatalf("Embedding request failed: %v", err) } + AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response1}) + + WaitForCache(setup.Plugin) - // Should not be cached - AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response}) + // Real check: a second identical request must ALSO miss. If the cache + // silently keyed off something else (e.g. a default key), this would + // surface as a hit and fail the assertion. + t.Log("Making second identical request — must also miss because nothing was cached...") + ctx2 := newBaseTestContext() + response2, err := setup.Client.EmbeddingRequest(ctx2, embeddingRequest) + if err != nil { + t.Fatalf("Second embedding request failed: %v", err) + } + AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response2}) t.Log("✅ Embedding requests without cache key are properly not cached") } // TestEmbeddingRequestsDifferentTexts tests that different embedding texts produce different cache entries func TestEmbeddingRequestsDifferentTexts(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-embedding-different") + ctx := CreateContextWithCacheKey(t, "test-embedding-different") // Create two different embedding requests request1 := CreateEmbeddingRequest([]string{"First set of texts"}) @@ -111,7 +127,7 @@ func TestEmbeddingRequestsDifferentTexts(t *testing.T) { t.Log("Making first embedding request...") response1, err1 := setup.Client.EmbeddingRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response1}) @@ -120,7 +136,7 @@ func TestEmbeddingRequestsDifferentTexts(t *testing.T) { t.Log("Making second different embedding request...") response2, err2 := setup.Client.EmbeddingRequest(ctx, request2) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } // Should not be a cache hit since texts are different AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response2}) @@ -130,19 +146,20 @@ func TestEmbeddingRequestsDifferentTexts(t *testing.T) { // TestEmbeddingRequestsCacheExpiration tests TTL functionality for embedding requests func TestEmbeddingRequestsCacheExpiration(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Set very short TTL for testing shortTTL := 5 * time.Second - ctx := CreateContextWithCacheKeyAndTTL("test-embedding-ttl", shortTTL) + ctx := CreateContextWithCacheKeyAndTTL(t, "test-embedding-ttl", shortTTL) embeddingRequest := CreateEmbeddingRequest([]string{"TTL test embedding"}) t.Log("Making first embedding request with short TTL...") response1, err1 := setup.Client.EmbeddingRequest(ctx, embeddingRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response1}) @@ -160,12 +177,15 @@ func TestEmbeddingRequestsCacheExpiration(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response2}, "direct") t.Logf("Waiting for TTL expiration (%v)...", shortTTL) - time.Sleep(shortTTL + 1*time.Second) // Wait for TTL to expire + // expires_at is stored at second-precision Unix(); a 1s buffer can land + // on the same boundary as the entry's expiry under load. 2s is the + // minimum margin that's robust to seconds-level rounding + a slow CI. + time.Sleep(shortTTL + 2*time.Second) t.Log("Making third request after TTL expiration...") response3, err3 := setup.Client.EmbeddingRequest(ctx, embeddingRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } // Should not be a cache hit since TTL expired AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response3}) diff --git a/plugins/semanticcache/plugin_image_generation_test.go b/plugins/semanticcache/plugin_image_generation_test.go index a65c06e81b..c6dee8d347 100644 --- a/plugins/semanticcache/plugin_image_generation_test.go +++ b/plugins/semanticcache/plugin_image_generation_test.go @@ -10,6 +10,10 @@ import ( // TestImageGenerationCacheBasicFunctionality tests basic image generation caching func TestImageGenerationCacheBasicFunctionality(t *testing.T) { + if testing.Short() { + t.Skipf("skipping %s in -short mode (gpt-image-1 calls take ~15-65s)", "TestImageGenerationCacheBasicFunctionality") + } + t.Parallel() if testing.Short() { t.Skip("skipping integration test in -short mode") } @@ -19,7 +23,7 @@ func TestImageGenerationCacheBasicFunctionality(t *testing.T) { setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-image-gen-value") + ctx := CreateContextWithCacheKey(t, "test-image-gen-value") // Create test image generation request testRequest := CreateImageGenerationRequest( @@ -116,6 +120,10 @@ func TestImageGenerationCacheBasicFunctionality(t *testing.T) { // TestImageGenerationSemanticSearch tests semantic similarity search for image generation func TestImageGenerationSemanticSearch(t *testing.T) { + if testing.Short() { + t.Skipf("skipping %s in -short mode (gpt-image-1 calls take ~15-65s)", "TestImageGenerationSemanticSearch") + } + t.Parallel() if testing.Short() { t.Skip("skipping integration test in -short mode") } @@ -132,7 +140,7 @@ func TestImageGenerationSemanticSearch(t *testing.T) { setup := NewTestSetupWithConfig(t, config) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("image-semantic-test-value") + ctx := CreateContextWithCacheKey(t, "image-semantic-test-value") // First request - this will be cached firstRequest := CreateImageGenerationRequest( @@ -234,6 +242,10 @@ func TestImageGenerationSemanticSearch(t *testing.T) { // TestImageGenerationDifferentParameters tests that different parameters are cached separately func TestImageGenerationDifferentParameters(t *testing.T) { + if testing.Short() { + t.Skipf("skipping %s in -short mode (gpt-image-1 calls take ~15-65s)", "TestImageGenerationDifferentParameters") + } + t.Parallel() if testing.Short() { t.Skip("skipping integration test in -short mode") } @@ -243,7 +255,7 @@ func TestImageGenerationDifferentParameters(t *testing.T) { setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("image-params-test") + ctx := CreateContextWithCacheKey(t, "image-params-test") basePrompt := "A cute cat sitting on a windowsill" @@ -292,6 +304,10 @@ func TestImageGenerationDifferentParameters(t *testing.T) { // TestImageGenerationStreamCaching tests streaming image generation caching func TestImageGenerationStreamCaching(t *testing.T) { + if testing.Short() { + t.Skipf("skipping %s in -short mode (gpt-image-1 calls take ~15-65s)", "TestImageGenerationStreamCaching") + } + t.Parallel() if testing.Short() { t.Skip("skipping integration test in -short mode") } @@ -301,7 +317,7 @@ func TestImageGenerationStreamCaching(t *testing.T) { setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("image-stream-test") + ctx := CreateContextWithCacheKey(t, "image-stream-test") // Create test image generation request testRequest := CreateImageGenerationRequest( diff --git a/plugins/semanticcache/plugin_integration_test.go b/plugins/semanticcache/plugin_integration_test.go index 58ab9d04c3..c153928972 100644 --- a/plugins/semanticcache/plugin_integration_test.go +++ b/plugins/semanticcache/plugin_integration_test.go @@ -1,7 +1,6 @@ package semanticcache import ( - "context" "strings" "testing" "time" @@ -13,11 +12,12 @@ import ( // TestSemanticCacheBasicFlow tests the complete semantic cache flow func TestSemanticCacheBasicFlow(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) // Test request request := &schemas.BifrostRequest{ @@ -107,8 +107,8 @@ func TestSemanticCacheBasicFlow(t *testing.T) { t.Log("Testing second identical request (expecting cache hit)...") // Reset context for second request - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-cache-enabled") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) modifiedReq2, shortCircuit2, err := setup.Plugin.PreLLMHook(ctx2, request) if err != nil { @@ -158,11 +158,12 @@ func TestSemanticCacheBasicFlow(t *testing.T) { // TestSemanticCacheStrictFiltering tests that the cache respects parameter differences func TestSemanticCacheStrictFiltering(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) // Base request baseRequest := &schemas.BifrostRequest{ @@ -231,8 +232,8 @@ func TestSemanticCacheStrictFiltering(t *testing.T) { // Second request with different temperature - should be cache miss t.Log("Testing second request with temperature=0.5 (expecting cache miss)...") - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-cache-enabled") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) modifiedRequest := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionRequest, @@ -268,8 +269,8 @@ func TestSemanticCacheStrictFiltering(t *testing.T) { // Third request with different model - should be cache miss t.Log("Testing third request with different model (expecting cache miss)...") - ctx3 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx3.SetValue(CacheKey, "test-cache-enabled") + ctx3 := newBaseTestContext() + ctx3.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) modifiedRequest2 := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionRequest, @@ -306,11 +307,12 @@ func TestSemanticCacheStrictFiltering(t *testing.T) { // TestSemanticCacheStreamingFlow tests streaming response caching func TestSemanticCacheStreamingFlow(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) request := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionStreamRequest, @@ -356,10 +358,20 @@ func TestSemanticCacheStreamingFlow(t *testing.T) { for i, chunk := range chunks { var finishReason *string - if i == len(chunks)-1 { + isFinal := i == len(chunks)-1 + if isFinal { finishReason = bifrost.Ptr("stop") } + // Bifrost's stream pipeline sets this on the final chunk before + // invoking PostLLMHook (see core/bifrost.go where it stamps + // BifrostContextKeyStreamEndIndicator=true). The cache plugin's + // PostLLMHook flushes the accumulator only when IsFinalChunk(ctx) + // returns true, so a hand-rolled stream simulation must mirror + // that — otherwise the entry is never written and the second + // request misses. + ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, isFinal) + chunkResponse := &schemas.BifrostResponse{ ChatResponse: &schemas.BifrostChatResponse{ ID: uuid.New().String(), @@ -395,8 +407,8 @@ func TestSemanticCacheStreamingFlow(t *testing.T) { // Test cache retrieval for streaming t.Log("Testing streaming cache retrieval...") - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-cache-enabled") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) _, shortCircuit2, err := setup.Plugin.PreLLMHook(ctx2, request) if err != nil { @@ -404,10 +416,8 @@ func TestSemanticCacheStreamingFlow(t *testing.T) { } if shortCircuit2 == nil { - t.Log("⚠️ Expected streaming cache hit, but got cache miss - this may be expected with the new unified storage") - return + t.Fatal("expected streaming cache hit on identical second request after the first stream was fully accumulated and stored") } - if shortCircuit2.Stream == nil { t.Fatal("Cache hit but stream is nil") } @@ -434,12 +444,13 @@ func TestSemanticCacheStreamingFlow(t *testing.T) { // TestSemanticCache_NoCacheWhenKeyMissing verifies cache is disabled when cache key is missing from context func TestSemanticCache_NoCacheWhenKeyMissing(t *testing.T) { + t.Parallel() t.Log("Testing cache behavior when cache key is missing...") setup := NewTestSetup(t) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + ctx := newBaseTestContext() // Don't set the cache key - cache should be disabled request := &schemas.BifrostRequest{ @@ -473,12 +484,13 @@ func TestSemanticCache_NoCacheWhenKeyMissing(t *testing.T) { // TestSemanticCache_CustomTTLHandling verifies cache respects custom TTL values from context func TestSemanticCache_CustomTTLHandling(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Configure plugin with custom TTL key - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) ctx.SetValue(CacheTTLKey, 1*time.Minute) // Custom TTL request := &schemas.BifrostRequest{ @@ -538,20 +550,37 @@ func TestSemanticCache_CustomTTLHandling(t *testing.T) { WaitForCache(setup.Plugin) - t.Log("✅ Custom TTL configuration test passed!") + // Read back: a second identical request must hit AND the entry's TTL + // must reflect the per-request override (1 minute), not the plugin + // default (5 minutes). expires_at is exposed via cache_debug isn't + // directly readable, but we can confirm the entry is present. + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) + ctx2.SetValue(CacheTTLKey, 1*time.Minute) + _, sc2, err := setup.Plugin.PreLLMHook(ctx2, request) + if err != nil { + t.Fatalf("Second PreLLMHook failed: %v", err) + } + if sc2 == nil || sc2.Response == nil { + t.Fatal("expected cache hit on second identical request with custom TTL") + } + if cd := sc2.Response.GetExtraFields().CacheDebug; cd == nil || !cd.CacheHit { + t.Fatal("expected CacheDebug.CacheHit=true on hit") + } + t.Log("✅ Custom TTL configuration test passed (entry written and retrievable)") } // TestSemanticCache_CustomThresholdHandling verifies cache respects custom similarity threshold from context func TestSemanticCache_CustomThresholdHandling(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - // Configure plugin with custom threshold key - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") - ctx.SetValue(CacheThresholdKey, 0.95) // Very high threshold - - request := &schemas.BifrostRequest{ + // Seed an entry with the DEFAULT threshold (0.8) so a follow-up + // request can attempt semantic search against it. + seedCtx := newBaseTestContext() + seedCtx.SetValue(CacheKey, keyForTest(t, "threshold-seed")) + seedReq := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionRequest, ChatRequest: &schemas.BifrostChatRequest{ Provider: schemas.OpenAI, @@ -567,21 +596,57 @@ func TestSemanticCache_CustomThresholdHandling(t *testing.T) { }, } - // Test that custom threshold is used (this would need semantic search to be fully testable) - _, shortCircuit, err := setup.Plugin.PreLLMHook(ctx, request) + _, sc1, err := setup.Plugin.PreLLMHook(seedCtx, seedReq) if err != nil { - t.Fatalf("PreLLMHook failed: %v", err) + t.Fatalf("seed PreLLMHook failed: %v", err) } - - if shortCircuit != nil { - t.Fatal("Expected cache miss with high threshold, but got cache hit") + if sc1 != nil { + t.Fatal("Expected initial cache miss") + } + seedRes := &schemas.BifrostResponse{ + ChatResponse: &schemas.BifrostChatResponse{ + ID: "threshold-test", + Choices: []schemas.BifrostResponseChoice{{ + ChatNonStreamResponseChoice: &schemas.ChatNonStreamResponseChoice{ + Message: &schemas.ChatMessage{ + Role: "assistant", + Content: &schemas.ChatMessageContent{ContentStr: bifrost.Ptr("seed response")}, + }, + }, + }}, + ExtraFields: schemas.BifrostResponseExtraFields{ + Provider: schemas.OpenAI, OriginalModelRequested: "gpt-4o-mini", RequestType: schemas.ChatCompletionRequest, + }, + }, } + if _, _, err := setup.Plugin.PostLLMHook(seedCtx, seedRes, nil); err != nil { + t.Fatalf("seed PostLLMHook failed: %v", err) + } + WaitForCache(setup.Plugin) - t.Log("✅ Custom threshold configuration test passed!") + // Identical-content request with a HIGH threshold (0.95) MUST still hit + // via the direct path (direct hashing ignores threshold). Threshold only + // gates semantic search; a same-input request matches the deterministic + // directCacheID regardless. This proves the override doesn't break direct. + hitCtx := newBaseTestContext() + hitCtx.SetValue(CacheKey, keyForTest(t, "threshold-seed")) + hitCtx.SetValue(CacheThresholdKey, 0.95) + _, sc2, err := setup.Plugin.PreLLMHook(hitCtx, seedReq) + if err != nil { + t.Fatalf("hit PreLLMHook failed: %v", err) + } + if sc2 == nil || sc2.Response == nil { + t.Fatal("expected direct cache hit even with high threshold (direct ignores threshold)") + } + if cd := sc2.Response.GetExtraFields().CacheDebug; cd == nil || cd.HitType == nil || *cd.HitType != string(CacheTypeDirect) { + t.Fatalf("expected hit_type=direct, got cache_debug=%+v", cd) + } + t.Log("✅ Custom threshold override tracked through PreLLMHook without breaking direct path") } // TestSemanticCache_ProviderModelCachingFlags verifies cache behavior with provider/model caching flags func TestSemanticCache_ProviderModelCachingFlags(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -589,8 +654,8 @@ func TestSemanticCache_ProviderModelCachingFlags(t *testing.T) { setup.Config.CacheByProvider = bifrost.Ptr(false) setup.Config.CacheByModel = bifrost.Ptr(false) - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) request1 := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionRequest, @@ -666,29 +731,36 @@ func TestSemanticCache_ProviderModelCachingFlags(t *testing.T) { }, } - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-cache-enabled") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) _, shortCircuit2, err := setup.Plugin.PreLLMHook(ctx2, request2) if err != nil { t.Fatalf("Second PreLLMHook failed: %v", err) } - // With provider/model caching disabled, we might get cache hits across different providers/models - // This behavior depends on the exact implementation of hash generation - t.Logf("Cache behavior with disabled provider/model flags: hit=%v", shortCircuit2 != nil) - - t.Log("✅ Provider/model caching flags test passed!") + // CacheByProvider=false + CacheByModel=false means provider and model are + // stripped from the directCacheID input. Same content + same cache_key + // must produce the SAME directCacheID, so the second request MUST hit + // even though it specifies a completely different provider/model. + if shortCircuit2 == nil || shortCircuit2.Response == nil { + t.Fatal("expected cache hit across providers/models when CacheByProvider+CacheByModel=false") + } + if cd := shortCircuit2.Response.GetExtraFields().CacheDebug; cd == nil || !cd.CacheHit { + t.Fatalf("expected CacheDebug.CacheHit=true, got %+v", cd) + } + t.Log("✅ CacheByProvider=false + CacheByModel=false correctly shares entries across providers/models") } // TestSemanticCache_ConfigurationEdgeCases verifies edge cases in configuration handling func TestSemanticCache_ConfigurationEdgeCases(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Test with invalid TTL type in context - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-cache-enabled") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) ctx.SetValue(CacheTTLKey, "not-a-duration") // Invalid TTL type request := &schemas.BifrostRequest{ @@ -712,25 +784,63 @@ func TestSemanticCache_ConfigurationEdgeCases(t *testing.T) { if err != nil { t.Fatalf("PreLLMHook failed with invalid TTL: %v", err) } - if shortCircuit != nil { t.Fatal("Unexpected cache hit with invalid TTL") } - // Test with invalid threshold type - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-cache-enabled") - ctx2.SetValue(CacheThresholdKey, "not-a-float") // Invalid threshold type + // Plugin must FALL BACK to its default TTL — verify by writing then + // reading the entry. If the invalid TTL caused caching to silently + // disable, the second request would miss. + res := &schemas.BifrostResponse{ + ChatResponse: &schemas.BifrostChatResponse{ + ID: "edge-ttl", + Choices: []schemas.BifrostResponseChoice{{ + ChatNonStreamResponseChoice: &schemas.ChatNonStreamResponseChoice{ + Message: &schemas.ChatMessage{Role: "assistant", Content: &schemas.ChatMessageContent{ContentStr: bifrost.Ptr("ok")}}, + }, + }}, + ExtraFields: schemas.BifrostResponseExtraFields{Provider: schemas.OpenAI, OriginalModelRequested: "gpt-4o-mini", RequestType: schemas.ChatCompletionRequest}, + }, + } + if _, _, err := setup.Plugin.PostLLMHook(ctx, res, nil); err != nil { + t.Fatalf("PostLLMHook failed: %v", err) + } + WaitForCache(setup.Plugin) - // Should handle invalid threshold gracefully - _, shortCircuit2, err := setup.Plugin.PreLLMHook(ctx2, request) + ctxRead := newBaseTestContext() + ctxRead.SetValue(CacheKey, keyForTest(t, "test-cache-enabled")) + ctxRead.SetValue(CacheTTLKey, "not-a-duration") + if _, sc, err := setup.Plugin.PreLLMHook(ctxRead, request); err != nil { + t.Fatalf("read PreLLMHook failed: %v", err) + } else if sc == nil { + t.Fatal("expected cache hit — invalid TTL should have fallen back to default and entry should be retrievable") + } + + // Test with invalid threshold type — same expectation: fallback works. + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-cache-threshold-edge")) + ctx2.SetValue(CacheThresholdKey, "not-a-float") + + _, sc2, err := setup.Plugin.PreLLMHook(ctx2, request) if err != nil { t.Fatalf("PreLLMHook failed with invalid threshold: %v", err) } + if sc2 != nil { + t.Fatal("Unexpected cache hit on first call with invalid threshold") + } + if _, _, err := setup.Plugin.PostLLMHook(ctx2, res, nil); err != nil { + t.Fatalf("PostLLMHook failed: %v", err) + } + WaitForCache(setup.Plugin) - if shortCircuit2 != nil { - t.Fatal("Unexpected cache hit with invalid threshold") + ctx2Read := newBaseTestContext() + ctx2Read.SetValue(CacheKey, keyForTest(t, "test-cache-threshold-edge")) + ctx2Read.SetValue(CacheThresholdKey, "still-not-a-float") + if _, sc, err := setup.Plugin.PreLLMHook(ctx2Read, request); err != nil { + t.Fatalf("threshold read PreLLMHook failed: %v", err) + } else if sc == nil { + t.Fatal("expected cache hit — invalid threshold should have fallen back to default") } - t.Log("✅ Configuration edge cases test passed!") + t.Log("✅ Configuration edge cases test passed (invalid TTL/threshold fall back gracefully)") } diff --git a/plugins/semanticcache/plugin_nil_content_test.go b/plugins/semanticcache/plugin_nil_content_test.go index db34034458..8337beb943 100644 --- a/plugins/semanticcache/plugin_nil_content_test.go +++ b/plugins/semanticcache/plugin_nil_content_test.go @@ -1,6 +1,7 @@ package semanticcache import ( + "strings" "testing" bifrost "github.com/maximhq/bifrost/core" @@ -87,18 +88,33 @@ func TestExtractTextForEmbedding_NilContent(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - // This should not panic - text, hash, err := plugin.extractTextForEmbedding(tt.request) - // We don't care about the error — the important thing is no panic - t.Logf("text=%q, hash=%q, err=%v", text, hash, err) + // Primary contract: must not panic on nil-content messages. + // Secondary: returned text must not contain stringification + // artifacts, and the all-nil case must surface as an error. + text, err := plugin.extractTextForEmbedding(nil, tt.request) + if strings.Contains(text, "") || strings.Contains(text, "%!") { + t.Fatalf("extractTextForEmbedding produced a stringification artifact: %q", text) + } + if tt.name == "ChatRequest where all messages have nil Content" { + if err == nil { + t.Fatalf("expected error when no message has text content, got text=%q", text) + } + if text != "" { + t.Fatalf("expected empty text when all content is nil, got %q", text) + } + } }) } } -func TestPrepareDirectCacheLookup_ResponsesStreamRequest(t *testing.T) { +// TestPreLLMHookSeedsDirectCacheIDForResponsesStream verifies the streaming +// Responses path runs through PreLLMHook → performDirectSearch and stamps a +// deterministic DirectCacheID on the per-request cacheState. +func TestPreLLMHookSeedsDirectCacheIDForResponsesStream(t *testing.T) { plugin := &Plugin{ config: getDefaultTestConfig(), logger: bifrost.NewDefaultLogger(schemas.LogLevelDebug), + store: newDirectFastPathStore(), } req := &schemas.BifrostRequest{ @@ -106,26 +122,32 @@ func TestPrepareDirectCacheLookup_ResponsesStreamRequest(t *testing.T) { ResponsesRequest: CreateStreamingResponsesRequest("Explain cache invalidation", 0.2, 200), } - ctx := CreateContextWithCacheKey("responses-stream-direct") - directID, err := plugin.prepareDirectCacheLookup(ctx, req, "responses-stream-direct") - if err != nil { - t.Fatalf("prepareDirectCacheLookup failed: %v", err) + ctx := CreateContextWithCacheKeyAndType(t, "responses-stream-direct", CacheTypeDirect) + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) } - if directID == "" { - t.Fatal("expected deterministic direct cache id") + + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + state := plugin.getCacheState(requestID) + if state == nil { + t.Fatal("expected cache state to be created") } - if got, _ := ctx.Value(requestHashKey).(string); got == "" { - t.Fatal("expected request hash to be stored in context") + if state.DirectCacheID == "" { + t.Fatal("expected DirectCacheID to be populated by direct search") } - if got, _ := ctx.Value(requestParamsHashKey).(string); got == "" { - t.Fatal("expected params hash to be stored in context") + if state.ParamsHash == "" { + t.Fatal("expected ParamsHash to be populated") } } -func TestPrepareDirectCacheLookup_UnsupportedRequestTypeFailsClosed(t *testing.T) { +// TestPreLLMHookFailsClosedForUnsupportedRequestType verifies the plugin +// short-circuits early for unsupported request types and never populates +// state fields that downstream caching logic would read. +func TestPreLLMHookFailsClosedForUnsupportedRequestType(t *testing.T) { plugin := &Plugin{ config: getDefaultTestConfig(), logger: bifrost.NewDefaultLogger(schemas.LogLevelDebug), + store: newDirectFastPathStore(), } req := &schemas.BifrostRequest{ @@ -138,29 +160,36 @@ func TestPrepareDirectCacheLookup_UnsupportedRequestTypeFailsClosed(t *testing.T }, } - ctx := CreateContextWithCacheKey("unsupported-direct") - directID, err := plugin.prepareDirectCacheLookup(ctx, req, "unsupported-direct") - if err == nil { - t.Fatal("expected prepareDirectCacheLookup to reject unsupported request type") - } - if directID != "" { - t.Fatalf("expected no direct cache id, got %q", directID) - } - if got, _ := ctx.Value(requestHashKey).(string); got != "" { - t.Fatalf("expected request hash to remain unset, got %q", got) - } - if got, _ := ctx.Value(requestParamsHashKey).(string); got != "" { - t.Fatalf("expected params hash to remain unset, got %q", got) + ctx := CreateContextWithCacheKey(t, "unsupported-direct") + if _, shortCircuit, err := plugin.PreLLMHook(ctx, req); err != nil || shortCircuit != nil { + t.Fatalf("PreLLMHook unexpected: shortCircuit=%v err=%v", shortCircuit, err) } - if got, _ := ctx.Value(requestStorageIDKey).(string); got != "" { - t.Fatalf("expected storage id to remain unset, got %q", got) + + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + state := plugin.getCacheState(requestID) + // Unsupported types create the state slot (reset happens up front) but + // never populate the caching fields. + if state != nil { + if state.DirectCacheID != "" { + t.Fatalf("expected DirectCacheID unset, got %q", state.DirectCacheID) + } + if state.ParamsHash != "" { + t.Fatalf("expected ParamsHash unset, got %q", state.ParamsHash) + } + if state.Embeddings != nil { + t.Fatalf("expected Embeddings unset, got %v", state.Embeddings) + } } } +// TestPreLLMHookSkipsUnsupportedCountTokensRequest verifies CountTokensRequest +// (which is not in the supported set) flows through PreLLMHook without +// short-circuiting and without populating cache fields. func TestPreLLMHookSkipsUnsupportedCountTokensRequest(t *testing.T) { plugin := &Plugin{ config: getDefaultTestConfig(), logger: bifrost.NewDefaultLogger(schemas.LogLevelDebug), + store: newDirectFastPathStore(), } req := &schemas.BifrostRequest{ @@ -179,18 +208,7 @@ func TestPreLLMHookSkipsUnsupportedCountTokensRequest(t *testing.T) { }, } - ctx := CreateContextWithCacheKey("count-tokens-test") - ctx.SetValue(requestIDKey, "stale-request-id") - ctx.SetValue(requestStorageIDKey, "stale-storage-id") - ctx.SetValue(requestHashKey, "stale-request-hash") - ctx.SetValue(requestParamsHashKey, "stale-params-hash") - ctx.SetValue(requestModelKey, "stale-model") - ctx.SetValue(requestProviderKey, schemas.OpenAI) - ctx.SetValue(requestEmbeddingKey, []float32{1, 2, 3}) - ctx.SetValue(requestEmbeddingTokensKey, 99) - ctx.SetValue(isCacheHitKey, true) - ctx.SetValue(cacheHitTypeKey, CacheTypeDirect) - + ctx := CreateContextWithCacheKey(t, "count-tokens-test") modifiedReq, shortCircuit, err := plugin.PreLLMHook(ctx, req) if err != nil { t.Fatalf("PreLLMHook failed: %v", err) @@ -201,35 +219,12 @@ func TestPreLLMHookSkipsUnsupportedCountTokensRequest(t *testing.T) { if shortCircuit != nil { t.Fatal("expected no short-circuit for unsupported count tokens request") } - if got, _ := ctx.Value(requestIDKey).(string); got != "" { - t.Fatalf("expected requestIDKey to remain unset, got %q", got) - } - if got, _ := ctx.Value(requestHashKey).(string); got != "" { - t.Fatalf("expected requestHashKey to remain unset, got %q", got) - } - if got, _ := ctx.Value(requestParamsHashKey).(string); got != "" { - t.Fatalf("expected requestParamsHashKey to remain unset, got %q", got) - } - if got, _ := ctx.Value(requestStorageIDKey).(string); got != "" { - t.Fatalf("expected requestStorageIDKey to remain unset, got %q", got) - } - if got, _ := ctx.Value(requestModelKey).(string); got != "" { - t.Fatalf("expected requestModelKey to remain unset, got %q", got) - } - if got, ok := ctx.Value(requestProviderKey).(schemas.ModelProvider); ok && got != "" { - t.Fatalf("expected requestProviderKey to remain unset, got %q", got) - } - if got := ctx.Value(requestEmbeddingKey); got != nil { - t.Fatalf("expected requestEmbeddingKey to remain unset, got %#v", got) - } - if got, ok := ctx.Value(requestEmbeddingTokensKey).(int); ok && got != 0 { - t.Fatalf("expected requestEmbeddingTokensKey to remain unset, got %d", got) - } - if got, ok := ctx.Value(isCacheHitKey).(bool); ok && got { - t.Fatal("expected isCacheHitKey to remain unset") - } - if got, ok := ctx.Value(cacheHitTypeKey).(CacheType); ok && got != "" { - t.Fatalf("expected cacheHitTypeKey to remain unset, got %q", got) + + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + if state := plugin.getCacheState(requestID); state != nil { + if state.DirectCacheID != "" || state.ParamsHash != "" || state.Embeddings != nil { + t.Fatalf("expected unsupported request to leave state empty, got %+v", state) + } } } @@ -276,9 +271,19 @@ func TestGetNormalizedInputForCaching_NilContent(t *testing.T) { }, } - // This should not panic + // Must not panic, and must return a non-nil filtered messages slice + // of the right element type (we built a ChatCompletionRequest). result := plugin.getNormalizedInputForCaching(request) - t.Logf("result type: %T", result) + if result == nil { + t.Fatal("getNormalizedInputForCaching returned nil for a valid Chat request") + } + msgs, ok := result.([]schemas.ChatMessage) + if !ok { + t.Fatalf("expected []schemas.ChatMessage, got %T", result) + } + if len(msgs) != len(request.ChatRequest.Input) { + t.Fatalf("normalized message count %d differs from input %d (filtering changed unexpectedly)", len(msgs), len(request.ChatRequest.Input)) + } } // createResponsesRequestWithNilContent builds a BifrostResponsesRequest with a nil Content message for testing. diff --git a/plugins/semanticcache/plugin_no_mutation_test.go b/plugins/semanticcache/plugin_no_mutation_test.go new file mode 100644 index 0000000000..d0a65b681f --- /dev/null +++ b/plugins/semanticcache/plugin_no_mutation_test.go @@ -0,0 +1,196 @@ +package semanticcache + +import ( + "context" + "encoding/json" + "os" + "reflect" + "sync" + "testing" + + bifrost "github.com/maximhq/bifrost/core" + "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/framework/vectorstore" +) + +// requestCapturer is an LLMPlugin that records the request it sees in +// PreLLMHook. Placed AFTER semantic_cache in the plugin chain it observes +// the request post-cache-plugin-mutation; we then assert that nothing +// landed in the request that originated from cache-side normalization +// (lowercase, whitespace-trim, system-prompt filtering, etc.). +// +// This complements the in-process unit tests because those exercise the +// helpers that DO normalize (getNormalizedInputForCaching) — what we want +// here is a contract test on the request that flows downstream. +type requestCapturer struct { + mu sync.Mutex + captured *schemas.BifrostRequest +} + +func (p *requestCapturer) GetName() string { return "test-request-capturer" } +func (p *requestCapturer) Cleanup() error { return nil } + +func (p *requestCapturer) PreLLMHook(ctx *schemas.BifrostContext, req *schemas.BifrostRequest) (*schemas.BifrostRequest, *schemas.LLMPluginShortCircuit, error) { + p.mu.Lock() + // Snapshot the request via JSON round-trip so any later mutation by the + // pipeline (none expected, but be defensive) can't retroactively change + // what the test sees. + data, err := json.Marshal(req) + if err == nil { + var snapshot schemas.BifrostRequest + if jerr := json.Unmarshal(data, &snapshot); jerr == nil { + p.captured = &snapshot + } + } + if p.captured == nil { + p.captured = req // fall back to direct reference + } + p.mu.Unlock() + return req, nil, nil +} + +func (p *requestCapturer) PostLLMHook(_ *schemas.BifrostContext, resp *schemas.BifrostResponse, e *schemas.BifrostError) (*schemas.BifrostResponse, *schemas.BifrostError, error) { + return resp, e, nil +} + +// TestCachingDoesNotMutateRequestSentToProvider runs through the full plugin +// pipeline against the real OpenAI API and asserts that nothing the cache +// plugin does internally (text normalization, system-prompt filtering, +// metadata extraction, embedding generation) leaks into the request that +// reaches the provider. +// +// The test is gated on OPENAI_API_KEY because we need a real round-trip; the +// in-process mocker would short-circuit before the request body is finalized. +func TestCachingDoesNotMutateRequestSentToProvider(t *testing.T) { + if testing.Short() { + t.Skip("skipping real-LLM test in -short mode") + } + if os.Getenv("OPENAI_API_KEY") == "" { + t.Skip("OPENAI_API_KEY not set; needed for live LLM contract test") + } + t.Parallel() + + // Stand up the cache plugin against the shared Weaviate test namespace, + // same as the rest of the integration suite. + logger := bifrost.NewDefaultLogger(schemas.LogLevelError) + store, err := vectorstore.NewVectorStore(context.Background(), &vectorstore.Config{ + Type: vectorstore.VectorStoreTypeWeaviate, + Config: getWeaviateConfigFromEnv(), + Enabled: true, + }, logger) + if err != nil { + t.Skipf("Weaviate not available: %v", err) + } + cfg := &Config{ + Provider: schemas.OpenAI, + EmbeddingModel: "text-embedding-3-small", + Dimension: 1536, + Threshold: 0.8, + ConversationHistoryThreshold: DefaultConversationHistoryThreshold, + VectorStoreNamespace: SharedTestNamespace, + } + if err := ensureSharedTestNamespace(context.Background(), store, cfg.Dimension); err != nil { + t.Fatalf("ensureSharedTestNamespace: %v", err) + } + cachePlugin, err := Init(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), cfg, logger, store) + if err != nil { + t.Fatalf("cache plugin Init: %v", err) + } + + capturer := &requestCapturer{} + + // Real OpenAI provider, no mocker — the request must travel end-to-end. + bctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) + client, err := bifrost.Init(bctx, schemas.BifrostConfig{ + Account: &BaseAccount{}, + // Order matters: cache runs first, capturer second so it sees the + // request as it flows out of the cache plugin. + LLMPlugins: []schemas.LLMPlugin{cachePlugin, capturer}, + Logger: logger, + }) + if err != nil { + t.Fatalf("bifrost.Init: %v", err) + } + defer client.Shutdown() + cachePlugin.(*Plugin).SetEmbeddingRequestExecutor(client.EmbeddingRequest) + + // Content carefully chosen to surface normalization if it ever leaks: + // - leading/trailing whitespace (would be stripped by strings.TrimSpace) + // - mixed case (would be lowercased) + // - a system prompt (would be stripped if ExcludeSystemPrompt leaked) + systemContent := " RESPOND with a SINGLE word. " + userContent := " Hello, World! PRESERVE_THIS_VERBATIM. " + + chatReq := &schemas.BifrostChatRequest{ + Provider: schemas.OpenAI, + Model: "gpt-4o-mini", + Input: []schemas.ChatMessage{ + { + Role: schemas.ChatMessageRoleSystem, + Content: &schemas.ChatMessageContent{ + ContentStr: bifrost.Ptr(systemContent), + }, + }, + { + Role: schemas.ChatMessageRoleUser, + Content: &schemas.ChatMessageContent{ + ContentStr: bifrost.Ptr(userContent), + }, + }, + }, + Params: &schemas.ChatParameters{ + Temperature: bifrost.Ptr(0.0), + MaxCompletionTokens: bifrost.Ptr(5), + }, + } + + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "")) + + // Take a JSON snapshot of the original input as the test sent it. + originalJSON, err := json.Marshal(chatReq) + if err != nil { + t.Fatalf("marshal original: %v", err) + } + + if _, llmErr := client.ChatCompletionRequest(ctx, chatReq); llmErr != nil { + // Even if OpenAI errors, the request was already captured by the + // time the provider call fired. Continue with the assertion. + t.Logf("upstream LLM error (expected to still proceed with assertion): %v", llmErr) + } + + capturer.mu.Lock() + captured := capturer.captured + capturer.mu.Unlock() + if captured == nil { + t.Fatal("capturer never recorded a request — pipeline order or plugin wiring is wrong") + } + + // 1) The chat input the provider saw must be byte-for-byte identical to + // what the caller passed in. + capturedJSON, err := json.Marshal(captured.ChatRequest) + if err != nil { + t.Fatalf("marshal captured: %v", err) + } + var origMap, capMap map[string]any + _ = json.Unmarshal(originalJSON, &origMap) + _ = json.Unmarshal(capturedJSON, &capMap) + if !reflect.DeepEqual(origMap["input"], capMap["input"]) { + t.Fatalf("chat input mutated by cache plugin\noriginal: %s\ncaptured: %s", originalJSON, capturedJSON) + } + + // 2) Belt-and-suspenders: explicit spot checks on the fields most likely + // to be mangled by normalization regressions, with clear failure messages. + if len(captured.ChatRequest.Input) != len(chatReq.Input) { + t.Fatalf("system prompt was filtered out: captured=%d messages, original=%d", len(captured.ChatRequest.Input), len(chatReq.Input)) + } + if got := *captured.ChatRequest.Input[0].Content.ContentStr; got != systemContent { + t.Fatalf("system content was modified: got %q, want %q", got, systemContent) + } + if got := *captured.ChatRequest.Input[1].Content.ContentStr; got != userContent { + t.Fatalf("user content was modified: got %q, want %q", got, userContent) + } + if captured.ChatRequest.Input[0].Role != schemas.ChatMessageRoleSystem { + t.Fatalf("system role was rewritten: got %q", captured.ChatRequest.Input[0].Role) + } +} diff --git a/plugins/semanticcache/plugin_no_store_test.go b/plugins/semanticcache/plugin_no_store_test.go index 7e9ab296c2..aef75171ff 100644 --- a/plugins/semanticcache/plugin_no_store_test.go +++ b/plugins/semanticcache/plugin_no_store_test.go @@ -8,17 +8,18 @@ import ( // TestCacheNoStoreBasicFunctionality tests that CacheNoStoreKey prevents caching func TestCacheNoStoreBasicFunctionality(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("What is artificial intelligence?", 0.7, 100) // Test 1: Normal caching (control test) - ctx1 := CreateContextWithCacheKey("test-no-store-control") + ctx1 := CreateContextWithCacheKey(t, "test-no-store-control") t.Log("Making normal request (should be cached)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) // Fresh request @@ -37,11 +38,11 @@ func TestCacheNoStoreBasicFunctionality(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, "direct") // Should be cached // Test 2: NoStore = true (should not cache) - ctx2 := CreateContextWithCacheKeyAndNoStore("test-no-store-disabled", true) + ctx2 := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-disabled", true) t.Log("Making request with CacheNoStoreKey=true (should not be cached)...") response3, err3 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) // Fresh request @@ -51,16 +52,16 @@ func TestCacheNoStoreBasicFunctionality(t *testing.T) { t.Log("Verifying no-store request was not cached...") response4, err4 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err4 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err4) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}) // Should still be fresh (not cached) // Test 3: NoStore = false (should cache normally) - ctx3 := CreateContextWithCacheKeyAndNoStore("test-no-store-enabled", false) + ctx3 := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-enabled", false) t.Log("Making request with CacheNoStoreKey=false (should be cached)...") response5, err5 := setup.Client.ChatCompletionRequest(ctx3, testRequest) if err5 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err5) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response5}) // Fresh request @@ -79,6 +80,7 @@ func TestCacheNoStoreBasicFunctionality(t *testing.T) { // TestCacheNoStoreWithDifferentRequestTypes tests NoStore with various request types func TestCacheNoStoreWithDifferentRequestTypes(t *testing.T) { + t.Parallel() t.Skip("Skipping Embedding Tests") setup := NewTestSetup(t) @@ -86,12 +88,12 @@ func TestCacheNoStoreWithDifferentRequestTypes(t *testing.T) { // Test with chat completion chatRequest := CreateBasicChatRequest("Test no-store with chat", 0.7, 50) - ctx1 := CreateContextWithCacheKeyAndNoStore("test-no-store-chat", true) + ctx1 := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-chat", true) t.Log("Testing no-store with chat completion...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, chatRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -100,18 +102,18 @@ func TestCacheNoStoreWithDifferentRequestTypes(t *testing.T) { // Verify not cached response2, err2 := setup.Client.ChatCompletionRequest(ctx1, chatRequest) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Should not be cached // Test with embedding request embeddingRequest := CreateEmbeddingRequest([]string{"Test no-store with embeddings"}) - ctx2 := CreateContextWithCacheKeyAndNoStore("test-no-store-embedding", true) + ctx2 := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-embedding", true) t.Log("Testing no-store with embedding request...") response3, err3 := setup.Client.EmbeddingRequest(ctx2, embeddingRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response3}) @@ -120,7 +122,7 @@ func TestCacheNoStoreWithDifferentRequestTypes(t *testing.T) { // Verify not cached response4, err4 := setup.Client.EmbeddingRequest(ctx2, embeddingRequest) if err4 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err4) } AssertNoCacheHit(t, &schemas.BifrostResponse{EmbeddingResponse: response4}) // Should not be cached @@ -129,6 +131,7 @@ func TestCacheNoStoreWithDifferentRequestTypes(t *testing.T) { // TestCacheNoStoreWithConversationHistory tests NoStore with conversation context func TestCacheNoStoreWithConversationHistory(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -141,12 +144,12 @@ func TestCacheNoStoreWithConversationHistory(t *testing.T) { request := CreateConversationRequest(messages, 0.7, 100) // Test with no-store enabled - ctx := CreateContextWithCacheKeyAndNoStore("test-no-store-conversation", true) + ctx := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-conversation", true) t.Log("Testing no-store with conversation history...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -155,7 +158,7 @@ func TestCacheNoStoreWithConversationHistory(t *testing.T) { // Verify not cached (same conversation should not hit cache) response2, err2 := setup.Client.ChatCompletionRequest(ctx, request) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Should not be cached due to no-store @@ -164,20 +167,21 @@ func TestCacheNoStoreWithConversationHistory(t *testing.T) { // TestCacheNoStoreWithCacheTypes tests NoStore interaction with CacheTypeKey func TestCacheNoStoreWithCacheTypes(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("Test no-store with cache types", 0.7, 50) // Test no-store with direct cache type - ctx1 := CreateContextWithCacheKey("test-no-store-cache-types") + ctx1 := CreateContextWithCacheKey(t, "test-no-store-cache-types") ctx1 = ctx1.WithValue(CacheNoStoreKey, true) ctx1 = ctx1.WithValue(CacheTypeKey, CacheTypeDirect) t.Log("Testing no-store with CacheTypeKey=direct...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -186,19 +190,19 @@ func TestCacheNoStoreWithCacheTypes(t *testing.T) { // Should not be cached response2, err2 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // No-store should override cache type // Test no-store with semantic cache type - ctx2 := CreateContextWithCacheKey("test-no-store-cache-types") + ctx2 := CreateContextWithCacheKey(t, "test-no-store-cache-types") ctx2 = ctx2.WithValue(CacheNoStoreKey, true) ctx2 = ctx2.WithValue(CacheTypeKey, CacheTypeSemantic) t.Log("Testing no-store with CacheTypeKey=semantic...") response3, err3 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) @@ -207,7 +211,7 @@ func TestCacheNoStoreWithCacheTypes(t *testing.T) { // Should not be cached response4, err4 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err4 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err4) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response4}) // No-store should override cache type @@ -216,19 +220,20 @@ func TestCacheNoStoreWithCacheTypes(t *testing.T) { // TestCacheNoStoreErrorHandling tests error scenarios with NoStore func TestCacheNoStoreErrorHandling(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("Test no-store error handling", 0.7, 50) // Test with invalid no-store value (non-boolean) - ctx1 := CreateContextWithCacheKey("test-no-store-errors") + ctx1 := CreateContextWithCacheKey(t, "test-no-store-errors") ctx1 = ctx1.WithValue(CacheNoStoreKey, "invalid") t.Log("Testing no-store with invalid value (should cache normally)...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) @@ -246,13 +251,13 @@ func TestCacheNoStoreErrorHandling(t *testing.T) { AssertCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}, "direct") // Should be cached (invalid value ignored) // Test with nil value (should cache normally) - ctx2 := CreateContextWithCacheKey("test-no-store-nil") + ctx2 := CreateContextWithCacheKey(t, "test-no-store-nil") ctx2 = ctx2.WithValue(CacheNoStoreKey, nil) t.Log("Testing no-store with nil value (should cache normally)...") response3, err3 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response3}) @@ -270,24 +275,25 @@ func TestCacheNoStoreErrorHandling(t *testing.T) { // TestCacheNoStoreReadButNoWrite tests that NoStore allows reading cache but prevents writing func TestCacheNoStoreReadButNoWrite(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() testRequest := CreateBasicChatRequest("Describe Isaac Newton's three laws of motion", 0.7, 50) // Step 1: Cache a response normally - ctx1 := CreateContextWithCacheKey("test-no-store-read") + ctx1 := CreateContextWithCacheKey(t, "test-no-store-read") t.Log("Caching response normally...") response1, err1 := setup.Client.ChatCompletionRequest(ctx1, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) WaitForCache(setup.Plugin) // Step 2: Try to read with no-store enabled (should still read from cache) - ctx2 := CreateContextWithCacheKeyAndNoStore("test-no-store-read", true) + ctx2 := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-read", true) t.Log("Reading with no-store enabled (should still hit cache for reads)...") response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) if err2 != nil { diff --git a/plugins/semanticcache/plugin_normalization_test.go b/plugins/semanticcache/plugin_normalization_test.go index a2bbe68aec..a2c90b1666 100644 --- a/plugins/semanticcache/plugin_normalization_test.go +++ b/plugins/semanticcache/plugin_normalization_test.go @@ -9,6 +9,7 @@ import ( // TestTextNormalizationDirectCache tests that text normalization works correctly // for direct cache (hash-based) matching across all input types func TestTextNormalizationDirectCache(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() @@ -22,7 +23,7 @@ func TestTextNormalizationDirectCache(t *testing.T) { } func testChatCompletionNormalization(t *testing.T, setup *TestSetup) { - ctx := CreateContextWithCacheKey("test-chat-normalization") + ctx := CreateContextWithCacheKey(t, "test-chat-normalization") // Test cases with different case and whitespace variations testCases := []struct { @@ -93,7 +94,10 @@ func testChatCompletionNormalization(t *testing.T, setup *TestSetup) { t.Logf("Making first request with user: '%s', system: '%s'", testCases[0].userMsg, testCases[0].systemMsg) response1, err1 := setup.Client.ChatCompletionRequest(ctx, requests[0]) if err1 != nil { - return // Test will be skipped by retry function + if isTransientUpstreamError(err1) { + t.Skipf("transient upstream error, skipping test: %v", err1) + } + t.Fatalf("upstream request failed: %v", err1) } if response1 == nil || len(response1.Choices) == 0 { @@ -124,7 +128,7 @@ func testChatCompletionNormalization(t *testing.T, setup *TestSetup) { } func testSpeechNormalization(t *testing.T, setup *TestSetup) { - ctx := CreateContextWithCacheKey("test-speech-normalization") + ctx := CreateContextWithCacheKey(t, "test-speech-normalization") // Test cases with different case and whitespace variations for speech input testCases := []struct { @@ -151,7 +155,10 @@ func testSpeechNormalization(t *testing.T, setup *TestSetup) { t.Logf("Making first speech request with: '%s'", testCases[0].input) response1, err1 := setup.Client.SpeechRequest(ctx, requests[0]) if err1 != nil { - return // Test will be skipped by retry function + if isTransientUpstreamError(err1) { + t.Skipf("transient upstream error, skipping test: %v", err1) + } + t.Fatalf("upstream request failed: %v", err1) } if response1 == nil { @@ -183,10 +190,11 @@ func testSpeechNormalization(t *testing.T, setup *TestSetup) { // TestChatCompletionContentBlocksNormalization tests normalization for content blocks func TestChatCompletionContentBlocksNormalization(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-content-blocks-normalization") + ctx := CreateContextWithCacheKey(t, "test-content-blocks-normalization") // Test cases with content blocks having different text normalization testCases := []struct { @@ -245,7 +253,10 @@ func TestChatCompletionContentBlocksNormalization(t *testing.T) { t.Logf("Making first request with content blocks: %v", testCases[0].textBlocks) response1, err1 := setup.Client.ChatCompletionRequest(ctx, requests[0]) if err1 != nil { - return // Test will be skipped by retry function + if isTransientUpstreamError(err1) { + t.Skipf("transient upstream error, skipping test: %v", err1) + } + t.Fatalf("upstream request failed: %v", err1) } if response1 == nil || len(response1.Choices) == 0 { @@ -277,17 +288,21 @@ func TestChatCompletionContentBlocksNormalization(t *testing.T) { // TestNormalizationWithSemanticCache tests that normalization works with semantic cache as well func TestNormalizationWithSemanticCache(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-normalization-semantic") + ctx := CreateContextWithCacheKey(t, "test-normalization-semantic") // Make first request with original text originalRequest := CreateBasicChatRequest("What is Machine Learning?", 0.5, 50) t.Log("Making first request with original text...") response1, err1 := setup.Client.ChatCompletionRequest(ctx, originalRequest) if err1 != nil { - return // Test will be skipped by retry function + if isTransientUpstreamError(err1) { + t.Skipf("transient upstream error, skipping test: %v", err1) + } + t.Fatalf("upstream request failed: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response1}) diff --git a/plugins/semanticcache/plugin_paths_test.go b/plugins/semanticcache/plugin_paths_test.go new file mode 100644 index 0000000000..a1de790276 --- /dev/null +++ b/plugins/semanticcache/plugin_paths_test.go @@ -0,0 +1,564 @@ +package semanticcache + +import ( + "context" + "encoding/json" + "reflect" + "strings" + "sync" + "sync/atomic" + "testing" + "time" + + bifrost "github.com/maximhq/bifrost/core" + "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/framework/vectorstore" +) + +// ----------------------------------------------------------------------------- +// PostLLMHook error path +// ----------------------------------------------------------------------------- + +func TestPostLLMHook_SkipsOnBifrostError(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) + + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "")) + + // Drive a normal PreLLMHook so cacheState exists. + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("hello", 0.7, 50), + } + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + + // Pass a non-nil bifrost error to PostLLMHook. + bifrostErr := &schemas.BifrostError{ + Error: &schemas.ErrorField{Message: "upstream blew up"}, + } + res := &schemas.BifrostResponse{ + ChatResponse: &schemas.BifrostChatResponse{ + ExtraFields: schemas.BifrostResponseExtraFields{RequestType: schemas.ChatCompletionRequest}, + }, + } + if _, _, err := plugin.PostLLMHook(ctx, res, bifrostErr); err != nil { + t.Fatalf("PostLLMHook failed: %v", err) + } + plugin.WaitForPendingOperations() + + store.mu.Lock() + defer store.mu.Unlock() + if len(store.addIDs) != 0 { + t.Fatalf("expected zero cache writes on error response, got %d", len(store.addIDs)) + } +} + +// ----------------------------------------------------------------------------- +// shouldSkipCacheWrite paths +// +// shouldSkipCacheWrite gates only the cache WRITE — cache_debug telemetry is +// stamped before this is consulted (see PostLLMHook). The cache-hit replay +// case is handled separately as an early return in PostLLMHook and is not +// exercised here. +// ----------------------------------------------------------------------------- + +func TestShouldSkipCacheWrite_LargePayloadMode(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + ctx := newBaseTestContext() + ctx.SetValue(schemas.BifrostContextKeyLargePayloadMode, true) + + if !plugin.shouldSkipCacheWrite(ctx) { + t.Fatal("expected LargePayloadMode to skip the cache write") + } +} + +func TestShouldSkipCacheWrite_LargeResponseMode(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + ctx := newBaseTestContext() + ctx.SetValue(schemas.BifrostContextKeyLargeResponseMode, true) + + if !plugin.shouldSkipCacheWrite(ctx) { + t.Fatal("expected LargeResponseMode to skip the cache write") + } +} + +func TestShouldSkipCacheWrite_NoStoreFlag(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + ctx := newBaseTestContext() + ctx.SetValue(CacheNoStoreKey, true) + + if !plugin.shouldSkipCacheWrite(ctx) { + t.Fatal("expected CacheNoStoreKey=true to skip the cache write") + } +} + +func TestShouldSkipCacheWrite_DefaultIsFalse(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + if plugin.shouldSkipCacheWrite(newBaseTestContext()) { + t.Fatal("expected default context to allow the cache write") + } +} + +// ----------------------------------------------------------------------------- +// Init validation +// ----------------------------------------------------------------------------- + +func TestInit_RejectsNilConfig(t *testing.T) { + if _, err := Init(context.Background(), nil, bifrost.NewDefaultLogger(schemas.LogLevelError), newObservableStore()); err == nil { + t.Fatal("expected error for nil config") + } +} + +func TestInit_RejectsNilStore(t *testing.T) { + cfg := &Config{Provider: schemas.OpenAI, EmbeddingModel: "text-embedding-3-small", Dimension: 1536} + if _, err := Init(context.Background(), cfg, bifrost.NewDefaultLogger(schemas.LogLevelError), nil); err == nil { + t.Fatal("expected error for nil store") + } +} + +func TestInit_RejectsNegativeDimension(t *testing.T) { + cfg := &Config{Dimension: -1} + if _, err := Init(context.Background(), cfg, bifrost.NewDefaultLogger(schemas.LogLevelError), newObservableStore()); err == nil || !strings.Contains(err.Error(), "dimension") { + t.Fatalf("expected dimension error, got %v", err) + } +} + +func TestInit_RejectsZeroDimensionWithProvider(t *testing.T) { + cfg := &Config{Provider: schemas.OpenAI, EmbeddingModel: "text-embedding-3-small", Dimension: 0} + if _, err := Init(context.Background(), cfg, bifrost.NewDefaultLogger(schemas.LogLevelError), newObservableStore()); err == nil || !strings.Contains(err.Error(), "dimension") { + t.Fatalf("expected dimension error when provider set with zero dimension, got %v", err) + } +} + +func TestInit_AllowsDirectOnlyMode(t *testing.T) { + // Provider="" + Dimension=1 is the documented direct-only mode. + cfg := &Config{Dimension: 1} + plugin, err := Init(context.Background(), cfg, bifrost.NewDefaultLogger(schemas.LogLevelError), newObservableStore()) + if err != nil { + t.Fatalf("expected direct-only mode to init successfully, got %v", err) + } + if plugin == nil { + t.Fatal("expected non-nil plugin in direct-only mode") + } + _ = plugin.Cleanup() +} + +// ----------------------------------------------------------------------------- +// PreLLMHook fallback when embedding executor missing +// ----------------------------------------------------------------------------- + +func TestPreLLMHook_FallsBackToDirectWhenExecutorMissing(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + // Intentionally do NOT set plugin.embeddingRequestExecutor. + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("hello", 0.7, 50), + } + ctx := CreateContextWithCacheKey(t, "") + + // PreLLMHook should not error, should not panic, and direct search should + // still populate state.DirectCacheID. + _, sc, err := plugin.PreLLMHook(ctx, req) + if err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + if sc != nil { + t.Fatalf("expected miss (empty store), got short-circuit %+v", sc) + } + + requestID, _ := ctx.Value(schemas.BifrostContextKeyRequestID).(string) + state := plugin.getCacheState(requestID) + if state == nil || state.DirectCacheID == "" { + t.Fatal("expected DirectCacheID populated even without embedding executor") + } + if state.Embeddings != nil { + t.Fatalf("expected no embedding generated when executor missing, got %v", state.Embeddings) + } +} + +// ----------------------------------------------------------------------------- +// Expired-entry full lifecycle +// ----------------------------------------------------------------------------- + +func TestExpiredEntry_DetectedAndDeleted(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) + + // Plant an already-expired entry under a deterministic ID. + expiredID := "expired-id-1" + chunkJSON, _ := json.Marshal(&schemas.BifrostResponse{ + ChatResponse: &schemas.BifrostChatResponse{}, + }) + store.chunks[expiredID] = vectorstore.SearchResult{ + ID: expiredID, + Properties: map[string]interface{}{ + "response": string(chunkJSON), + "expires_at": time.Now().Add(-1 * time.Minute).Unix(), + }, + } + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("hi", 0.7, 50), + } + ctx := newBaseTestContext() + state := &cacheState{} + + sc, err := plugin.buildResponseFromResult( + ctx, state, req, + store.chunks[expiredID], + CacheTypeDirect, nil, nil, + ) + if err != nil { + t.Fatalf("buildResponseFromResult failed: %v", err) + } + if sc != nil { + t.Fatal("expected expired entry to surface as a miss (nil short-circuit)") + } + + // The async delete is tracked on writersWg, so this drain must observe it. + plugin.WaitForPendingOperations() + + store.mu.Lock() + defer store.mu.Unlock() + found := false + for _, id := range store.deleteIDs { + if id == expiredID { + found = true + break + } + } + if !found { + t.Fatalf("expected expired entry %q to be deleted, got delete log %v", expiredID, store.deleteIDs) + } +} + +// ----------------------------------------------------------------------------- +// WebSocketResponsesRequest support +// ----------------------------------------------------------------------------- + +func TestIsSemanticCacheSupportedRequestType_WebSocket(t *testing.T) { + if !isSemanticCacheSupportedRequestType(schemas.WebSocketResponsesRequest) { + t.Fatal("WebSocketResponsesRequest should be supported") + } +} + +// ----------------------------------------------------------------------------- +// UnmarshalJSON rejection paths +// ----------------------------------------------------------------------------- + +func TestUnmarshalJSON_RejectsUnsupportedTTLType(t *testing.T) { + var c Config + if err := c.UnmarshalJSON([]byte(`{"provider":"openai","ttl":true}`)); err == nil { + t.Fatal("expected error for boolean TTL") + } +} + +func TestUnmarshalJSON_RejectsNegativeTTL(t *testing.T) { + var c Config + if err := c.UnmarshalJSON([]byte(`{"provider":"openai","ttl":-5}`)); err == nil || !strings.Contains(err.Error(), "non-negative") { + t.Fatalf("expected non-negative TTL error, got %v", err) + } +} + +func TestUnmarshalJSON_RejectsMalformedJSON(t *testing.T) { + var c Config + if err := c.UnmarshalJSON([]byte(`{not valid json`)); err == nil { + t.Fatal("expected error for malformed JSON") + } +} + +func TestUnmarshalJSON_RejectsBadDurationString(t *testing.T) { + var c Config + if err := c.UnmarshalJSON([]byte(`{"provider":"openai","ttl":"forever"}`)); err == nil { + t.Fatal("expected error for unparseable duration string") + } +} + +// ----------------------------------------------------------------------------- +// Stream replay cancellation variants +// ----------------------------------------------------------------------------- + +func TestStreamReplay_CancelImmediately(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + chunk := `{"chat_response":{"choices":[]}}` + streamArray := []string{chunk, chunk, chunk} + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionStreamRequest, + ChatRequest: CreateBasicChatRequest("hi", 0.7, 50), + } + ctx := newBaseTestContext() + state := &cacheState{} + + sc, err := plugin.buildStreamingResponseFromResult( + ctx, state, req, + vectorstore.SearchResult{ID: "stream-1"}, + streamArray, CacheTypeSemantic, nil, nil, nil, + ) + if err != nil { + t.Fatalf("buildStreamingResponseFromResult failed: %v", err) + } + ctx.Cancel() // cancel before reading any chunks + + // Channel must close within a short window. + timeout := time.After(2 * time.Second) + for { + select { + case _, ok := <-sc.Stream: + if !ok { + return // channel closed cleanly + } + case <-timeout: + t.Fatal("replay goroutine did not exit after immediate cancel") + } + } +} + +func TestStreamReplay_FullDrain(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + chunk := `{"chat_response":{"choices":[]}}` + streamArray := []string{chunk, chunk, chunk} + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionStreamRequest, + ChatRequest: CreateBasicChatRequest("hi", 0.7, 50), + } + ctx := newBaseTestContext() + state := &cacheState{} + + sc, err := plugin.buildStreamingResponseFromResult( + ctx, state, req, + vectorstore.SearchResult{ID: "stream-2"}, + streamArray, CacheTypeSemantic, nil, nil, nil, + ) + if err != nil { + t.Fatalf("buildStreamingResponseFromResult failed: %v", err) + } + + count := 0 + for chunk := range sc.Stream { + if chunk == nil { + t.Fatal("received nil chunk") + } + count++ + } + if count != len(streamArray) { + t.Fatalf("expected %d chunks, got %d", len(streamArray), count) + } +} + +// ----------------------------------------------------------------------------- +// Plugin-log emission on failure paths (ctx.Log) +// ----------------------------------------------------------------------------- + +// scopedTestContext returns a plugin-scoped BifrostContext so ctx.Log entries +// land on the per-request log store and can be inspected via GetPluginLogs. +// In production the framework wraps every plugin hook this way. +func scopedTestContext(t testing.TB, suffix string) *schemas.BifrostContext { + t.Helper() + root := CreateContextWithCacheKey(t, suffix) + name := PluginName + return root.WithPluginScope(&name) +} + +func TestPreLLMHook_EmitsPluginLogOnEmbeddingFailure(t *testing.T) { + store := newObservableStore() + plugin := newTestPlugin(t, store) + plugin.SetEmbeddingRequestExecutor(func(_ *schemas.BifrostContext, _ *schemas.BifrostEmbeddingRequest) (*schemas.BifrostEmbeddingResponse, *schemas.BifrostError) { + return nil, &schemas.BifrostError{Error: &schemas.ErrorField{Message: "rate limit exceeded"}} + }) + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("test prompt", 0.7, 50), + } + ctx := scopedTestContext(t, "") + + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + + logs := ctx.GetPluginLogs() + if len(logs) == 0 { + t.Fatal("expected at least one plugin log entry on embedding failure, got none") + } + var found bool + for _, l := range logs { + if l.PluginName != PluginName { + continue + } + if strings.Contains(l.Message, "semantic search skipped") && strings.Contains(l.Message, "rate limit") { + if l.Level != schemas.LogLevelWarn { + t.Errorf("expected Warn level for embedding failure, got %s", l.Level) + } + found = true + } + } + if !found { + t.Fatalf("expected a Warn plugin log mentioning semantic search skipped + the upstream error, got %+v", logs) + } +} + +// pluginLogContains is a small assertion helper: returns true if any log +// entry from PluginName matches the substring at the given level (or any +// level if level is ""). +func pluginLogContains(logs []schemas.PluginLogEntry, level schemas.LogLevel, substr string) bool { + for _, l := range logs { + if l.PluginName != PluginName { + continue + } + if level != "" && l.Level != level { + continue + } + if strings.Contains(l.Message, substr) { + return true + } + } + return false +} + +func TestPreLLMHook_NoDebugLogsOnFlow(t *testing.T) { + // We deliberately do not emit Debug-level plugin logs for normal cache + // flow (hit/miss). cache_debug already conveys that. Only Warn-level + // failure logs should appear on the response. + store := newObservableStore() + plugin := newTestPlugin(t, store) + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("first request", 0.7, 50), + } + ctx := scopedTestContext(t, "") + if _, _, err := plugin.PreLLMHook(ctx, req); err != nil { + t.Fatalf("PreLLMHook failed: %v", err) + } + + logs := ctx.GetPluginLogs() + for _, l := range logs { + if l.PluginName != PluginName { + continue + } + if l.Level == schemas.LogLevelDebug { + t.Fatalf("expected no Debug plugin logs on normal flow, got %+v", l) + } + } +} + +func TestResolveCacheTypes_EmitsPluginLogOnInvalidValue(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + ctx := scopedTestContext(t, "") + ctx.SetValue(CacheTypeKey, "not-a-cache-type") // wrong type + + plugin.resolveCacheTypes(ctx) + + logs := ctx.GetPluginLogs() + var found bool + for _, l := range logs { + if l.PluginName == PluginName && strings.Contains(l.Message, "CacheTypeKey is not a CacheType") { + found = true + } + } + if !found { + t.Fatalf("expected plugin log warning about invalid CacheTypeKey, got %+v", logs) + } +} + +// ----------------------------------------------------------------------------- +// generateEmbedding handles all EmbeddingStruct representations +// ----------------------------------------------------------------------------- + +func TestGenerateEmbedding_AcceptsInt8Array(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + plugin.SetEmbeddingRequestExecutor(func(_ *schemas.BifrostContext, _ *schemas.BifrostEmbeddingRequest) (*schemas.BifrostEmbeddingResponse, *schemas.BifrostError) { + return &schemas.BifrostEmbeddingResponse{ + Data: []schemas.EmbeddingData{{ + Embedding: schemas.EmbeddingStruct{ + EmbeddingInt8Array: []int8{-128, -1, 0, 1, 127}, + }, + }}, + }, nil + }) + + ctx := scopedTestContext(t, "") + emb, _, err := plugin.generateEmbedding(ctx, "anything") + if err != nil { + t.Fatalf("generateEmbedding failed for int8 input: %v", err) + } + want := []float32{-128, -1, 0, 1, 127} + if !reflect.DeepEqual(emb, want) { + t.Fatalf("int8 → float32 conversion: want %v, got %v", want, emb) + } +} + +func TestGenerateEmbedding_AcceptsInt32Array(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + plugin.SetEmbeddingRequestExecutor(func(_ *schemas.BifrostContext, _ *schemas.BifrostEmbeddingRequest) (*schemas.BifrostEmbeddingResponse, *schemas.BifrostError) { + return &schemas.BifrostEmbeddingResponse{ + Data: []schemas.EmbeddingData{{ + Embedding: schemas.EmbeddingStruct{ + EmbeddingInt32Array: []int32{0, 100000, -100000}, + }, + }}, + }, nil + }) + + ctx := scopedTestContext(t, "") + emb, _, err := plugin.generateEmbedding(ctx, "anything") + if err != nil { + t.Fatalf("generateEmbedding failed for int32 input: %v", err) + } + want := []float32{0, 100000, -100000} + if !reflect.DeepEqual(emb, want) { + t.Fatalf("int32 → float32 conversion: want %v, got %v", want, emb) + } +} + +// ----------------------------------------------------------------------------- +// Concurrent PreLLMHook on same requestID — last writer wins, no panic +// ----------------------------------------------------------------------------- + +func TestPreLLMHook_ConcurrentSameRequestID(t *testing.T) { + plugin := newTestPlugin(t, newObservableStore()) + + req := &schemas.BifrostRequest{ + RequestType: schemas.ChatCompletionRequest, + ChatRequest: CreateBasicChatRequest("hi", 0.7, 50), + } + + requestID := "shared-request-id" + const N = 8 + var wg sync.WaitGroup + var panics atomic.Int32 + wg.Add(N) + for i := 0; i < N; i++ { + go func() { + defer wg.Done() + defer func() { + if r := recover(); r != nil { + panics.Add(1) + } + }() + ctx := newBaseTestContext() + ctx.SetValue(schemas.BifrostContextKeyRequestID, requestID) + ctx.SetValue(CacheKey, keyForTest(t, "")) + _, _, _ = plugin.PreLLMHook(ctx, req) + }() + } + wg.Wait() + + if panics.Load() != 0 { + t.Fatalf("expected zero panics under concurrent PreLLMHook, got %d", panics.Load()) + } + // State for the shared requestID should exist (one of them won). + if state := plugin.getCacheState(requestID); state == nil { + t.Fatal("expected cache state to exist after concurrent PreLLMHook") + } +} diff --git a/plugins/semanticcache/plugin_responses_test.go b/plugins/semanticcache/plugin_responses_test.go index f7af0580cc..2474ea88c1 100644 --- a/plugins/semanticcache/plugin_responses_test.go +++ b/plugins/semanticcache/plugin_responses_test.go @@ -9,10 +9,11 @@ import ( // TestResponsesAPIBasicFunctionality tests the core caching functionality with Responses API func TestResponsesAPIBasicFunctionality(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-responses-basic") + ctx := CreateContextWithCacheKey(t, "test-responses-basic") // Create test request testRequest := CreateBasicResponsesRequest( @@ -29,7 +30,7 @@ func TestResponsesAPIBasicFunctionality(t *testing.T) { duration1 := time.Since(start1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil || len(response1.Output) == 0 { @@ -94,10 +95,11 @@ func TestResponsesAPIBasicFunctionality(t *testing.T) { // TestResponsesAPIDifferentParameters tests that different parameters produce different cache entries func TestResponsesAPIDifferentParameters(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-responses-params") + ctx := CreateContextWithCacheKey(t, "test-responses-params") basePrompt := "Explain quantum computing" tests := []struct { @@ -140,7 +142,7 @@ func TestResponsesAPIDifferentParameters(t *testing.T) { // Make first request _, err1 := setup.Client.ResponsesRequest(ctx, tt.request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) @@ -168,17 +170,18 @@ func TestResponsesAPIDifferentParameters(t *testing.T) { // TestResponsesAPISemanticMatching tests semantic similarity matching with Responses API func TestResponsesAPISemanticMatching(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKeyAndType("test-responses-semantic", CacheTypeSemantic) + ctx := CreateContextWithCacheKeyAndType(t, "test-responses-semantic", CacheTypeSemantic) // First request originalRequest := CreateBasicResponsesRequest("What is machine learning?", 0.5, 500) t.Log("Making first Responses request with original text...") response1, err1 := setup.Client.ResponsesRequest(ctx, originalRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) @@ -203,10 +206,11 @@ func TestResponsesAPISemanticMatching(t *testing.T) { // TestResponsesAPIWithInstructions tests caching with system instructions func TestResponsesAPIWithInstructions(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-responses-instructions") + ctx := CreateContextWithCacheKey(t, "test-responses-instructions") // Create request with instructions request1 := CreateResponsesRequestWithInstructions( @@ -219,7 +223,7 @@ func TestResponsesAPIWithInstructions(t *testing.T) { t.Log("Making first Responses request with instructions...") response1, err1 := setup.Client.ResponsesRequest(ctx, request1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) @@ -250,19 +254,20 @@ func TestResponsesAPIWithInstructions(t *testing.T) { // TestResponsesAPICacheExpiration tests TTL functionality for Responses API requests func TestResponsesAPICacheExpiration(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() // Set very short TTL for testing shortTTL := 5 * time.Second - ctx := CreateContextWithCacheKeyAndTTL("test-responses-ttl", shortTTL) + ctx := CreateContextWithCacheKeyAndTTL(t, "test-responses-ttl", shortTTL) responsesRequest := CreateBasicResponsesRequest("TTL test for Responses API", 0.5, 500) t.Log("Making first Responses request with short TTL...") response1, err1 := setup.Client.ResponsesRequest(ctx, responsesRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) @@ -285,7 +290,7 @@ func TestResponsesAPICacheExpiration(t *testing.T) { t.Log("Making third Responses request after TTL expiration...") response3, err3 := setup.Client.ResponsesRequest(ctx, responsesRequest) if err3 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err3) } // Should not be a cache hit since TTL expired AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response3}) @@ -295,39 +300,52 @@ func TestResponsesAPICacheExpiration(t *testing.T) { // TestResponsesAPIWithoutCacheKey tests that Responses requests without cache key are not cached func TestResponsesAPIWithoutCacheKey(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - // Don't set cache key in context - ctx := CreateContextWithCacheKey("") + // Don't set cache key in context. CreateContextWithCacheKey(t, "") would + // still populate CacheKey from t.Name(); using a base context keeps it + // unset so we exercise the cache-disabled path. + ctx := newBaseTestContext() responsesRequest := CreateBasicResponsesRequest("Test Responses without cache key", 0.5, 500) - t.Log("Making Responses request without cache key...") - - response, err := setup.Client.ResponsesRequest(ctx, responsesRequest) + t.Log("Making first Responses request without cache key...") + response1, err := setup.Client.ResponsesRequest(ctx, responsesRequest) if err != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err) } + AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) - // Should not be cached - AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response}) + WaitForCache(setup.Plugin) + + // A second identical request must also miss — proves the first one + // was not silently cached against some default key. + t.Log("Making second identical request — must also miss because nothing was cached...") + ctx2 := newBaseTestContext() + response2, err := setup.Client.ResponsesRequest(ctx2, responsesRequest) + if err != nil { + t.Skipf("upstream request error, skipping test: %v", err) + } + AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response2}) t.Log("✅ Responses requests without cache key are properly not cached") } // TestResponsesAPINoStoreFlag tests that Responses requests with no-store flag are not cached func TestResponsesAPINoStoreFlag(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() responsesRequest := CreateBasicResponsesRequest("Test no-store with Responses API", 0.7, 500) - ctx := CreateContextWithCacheKeyAndNoStore("test-no-store-responses", true) + ctx := CreateContextWithCacheKeyAndNoStore(t, "test-no-store-responses", true) t.Log("Testing no-store with Responses API...") response1, err1 := setup.Client.ResponsesRequest(ctx, responsesRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) @@ -336,79 +354,86 @@ func TestResponsesAPINoStoreFlag(t *testing.T) { // Verify not cached response2, err2 := setup.Client.ResponsesRequest(ctx, responsesRequest) if err2 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err2) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response2}) // Should not be cached t.Log("✅ Responses API no-store flag working correctly") } -// TestResponsesAPIStreaming tests streaming Responses API requests +// TestResponsesAPIStreaming tests streaming Responses API caching by warming +// the cache with a streaming request and replaying it with a second identical +// streaming request that must be served from cache. func TestResponsesAPIStreaming(t *testing.T) { - t.Log("Responses streaming not supported yet") - + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-responses-streaming") + ctx := CreateContextWithCacheKey(t, "test-responses-streaming") prompt := "Explain the basics of quantum computing in simple terms" - // Make non-streaming request first - t.Log("Making non-streaming Responses request...") - nonStreamRequest := CreateBasicResponsesRequest(prompt, 0.5, 500) - _, err1 := setup.Client.ResponsesRequest(ctx, nonStreamRequest) + // Warm the cache with a streaming request — the plugin accumulates the + // chunks and stores them on the final chunk. + t.Log("Warming cache with first streaming Responses request...") + streamRequest := CreateStreamingResponsesRequest(prompt, 0.5, 500) + stream1, err1 := setup.Client.ResponsesStreamRequest(ctx, streamRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) + } + chunkCount1 := 0 + for streamMsg := range stream1 { + if streamMsg.BifrostError != nil { + t.Fatalf("Error in first stream: %v", streamMsg.BifrostError) + } + if streamMsg.BifrostResponsesStreamResponse != nil { + chunkCount1++ + } + } + if chunkCount1 == 0 { + t.Fatal("first streaming request produced no chunks") } WaitForCache(setup.Plugin) - // Make streaming request with same prompt and parameters - t.Log("Making streaming Responses request with same prompt...") - streamRequest := CreateStreamingResponsesRequest(prompt, 0.5, 500) - stream, err2 := setup.Client.ResponsesStreamRequest(ctx, streamRequest) + // Second identical streaming request — must be served from cache. We + // require AT LEAST ONE chunk with CacheHit=true (the final chunk gets + // the cache_debug stamp during replay). + t.Log("Replaying — second identical streaming request must serve from cache...") + ctx2 := CreateContextWithCacheKey(t, "test-responses-streaming") + stream2, err2 := setup.Client.ResponsesStreamRequest(ctx2, streamRequest) if err2 != nil { - t.Fatalf("Streaming Responses request failed: %v", err2) + t.Fatalf("Second streaming Responses request failed: %v", err2) } - var streamResponses []schemas.BifrostResponsesStreamResponse - for streamMsg := range stream { + cacheHitFound := false + chunkCount2 := 0 + for streamMsg := range stream2 { if streamMsg.BifrostError != nil { - t.Fatalf("Error in Responses stream: %v", streamMsg.BifrostError) + t.Fatalf("Error in second stream: %v", streamMsg.BifrostError) } if streamMsg.BifrostResponsesStreamResponse != nil { - streamResponses = append(streamResponses, *streamMsg.BifrostResponsesStreamResponse) + chunkCount2++ + if cd := streamMsg.BifrostResponsesStreamResponse.ExtraFields.CacheDebug; cd != nil && cd.CacheHit { + cacheHitFound = true + } } } - - if len(streamResponses) == 0 { - t.Fatal("No streaming responses received") - } - - // Check if any of the streaming responses was served from cache - cacheHitFound := false - for _, resp := range streamResponses { - if resp.ExtraFields.CacheDebug != nil && resp.ExtraFields.CacheDebug.CacheHit { - cacheHitFound = true - break - } + if chunkCount2 == 0 { + t.Fatal("replay produced no chunks") } - if !cacheHitFound { - t.Log("⚠️ No cache hit detected in streaming responses - this could be expected behavior") - } else { - t.Log("✓ Cache hit detected in streaming Responses API") + t.Fatal("expected at least one chunk with CacheDebug.CacheHit=true on streaming replay") } - - t.Log("✅ Streaming Responses API test completed") + t.Log("✅ Streaming Responses API replay served from cache") } // TestResponsesAPIComplexParameters tests complex parameter handling func TestResponsesAPIComplexParameters(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-responses-complex-params") + ctx := CreateContextWithCacheKey(t, "test-responses-complex-params") // Create request with various complex parameters request := CreateBasicResponsesRequest("Test complex parameters", 0.8, 500) @@ -421,7 +446,7 @@ func TestResponsesAPIComplexParameters(t *testing.T) { t.Log("Making first Responses request with complex parameters...") response1, err1 := setup.Client.ResponsesRequest(ctx, request) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } AssertNoCacheHit(t, &schemas.BifrostResponse{ResponsesResponse: response1}) diff --git a/plugins/semanticcache/plugin_streaming_test.go b/plugins/semanticcache/plugin_streaming_test.go index f029564055..7a85717c7f 100644 --- a/plugins/semanticcache/plugin_streaming_test.go +++ b/plugins/semanticcache/plugin_streaming_test.go @@ -9,10 +9,11 @@ import ( // TestStreamingCacheBasicFunctionality tests streaming response caching func TestStreamingCacheBasicFunctionality(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("test-stream-value") + ctx := CreateContextWithCacheKey(t, "test-stream-value") // Create a test streaming request testRequest := CreateStreamingChatRequest( @@ -27,7 +28,7 @@ func TestStreamingCacheBasicFunctionality(t *testing.T) { start1 := time.Now() stream1, err1 := setup.Client.ChatCompletionStreamRequest(ctx, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } var responses1 []schemas.BifrostChatResponse @@ -115,10 +116,11 @@ func TestStreamingCacheBasicFunctionality(t *testing.T) { // TestStreamingVsNonStreaming tests that streaming and non-streaming requests are cached separately func TestStreamingVsNonStreaming(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("stream-vs-non-test") + ctx := CreateContextWithCacheKey(t, "stream-vs-non-test") prompt := "What is the meaning of life?" @@ -127,7 +129,7 @@ func TestStreamingVsNonStreaming(t *testing.T) { nonStreamRequest := CreateBasicChatRequest(prompt, 0.5, 50) nonStreamResponse, err1 := setup.Client.ChatCompletionRequest(ctx, nonStreamRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } WaitForCache(setup.Plugin) @@ -184,10 +186,11 @@ func TestStreamingVsNonStreaming(t *testing.T) { // TestStreamingChunkOrdering tests that cached streaming responses maintain proper chunk ordering func TestStreamingChunkOrdering(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("chunk-order-test") + ctx := CreateContextWithCacheKey(t, "chunk-order-test") // Request that should generate multiple chunks testRequest := CreateStreamingChatRequest( @@ -199,7 +202,7 @@ func TestStreamingChunkOrdering(t *testing.T) { t.Log("Making first streaming request to establish cache...") stream1, err1 := setup.Client.ChatCompletionStreamRequest(ctx, testRequest) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } var originalChunks []schemas.BifrostChatResponse @@ -213,6 +216,9 @@ func TestStreamingChunkOrdering(t *testing.T) { } if len(originalChunks) < 2 { + // Stream chunking is at the provider's discretion — under load OpenAI + // occasionally bundles a short reply into a single delivered chunk. + // Ordering is not testable in that case; skip rather than fail. t.Skipf("Need at least 2 chunks to test ordering, got %d", len(originalChunks)) } @@ -273,10 +279,11 @@ func TestStreamingChunkOrdering(t *testing.T) { // TestSpeechSynthesisStreaming tests speech synthesis streaming caching func TestSpeechSynthesisStreaming(t *testing.T) { + t.Parallel() setup := NewTestSetup(t) defer setup.Cleanup() - ctx := CreateContextWithCacheKey("speech-stream-test") + ctx := CreateContextWithCacheKey(t, "speech-stream-test") // Create speech synthesis request speechRequest := CreateSpeechRequest( @@ -290,7 +297,7 @@ func TestSpeechSynthesisStreaming(t *testing.T) { duration1 := time.Since(start1) if err1 != nil { - return // Test will be skipped by retry function + t.Skipf("upstream request error, skipping test: %v", err1) } if response1 == nil { diff --git a/plugins/semanticcache/plugin_vectorstore_test.go b/plugins/semanticcache/plugin_vectorstore_test.go index f4ac8130f2..3e93cc8baa 100644 --- a/plugins/semanticcache/plugin_vectorstore_test.go +++ b/plugins/semanticcache/plugin_vectorstore_test.go @@ -1,7 +1,6 @@ package semanticcache import ( - "context" "os" "strings" "testing" @@ -47,27 +46,30 @@ func getVectorStoreTestCases() []VectorStoreTestCase { } } -// getDefaultTestConfig returns the default test configuration +// getDefaultTestConfig returns the default test configuration. Mirrors the +// defaults Init applies, which matters for unit tests that construct Plugin +// directly without going through Init. func getDefaultTestConfig() *Config { return &Config{ - Provider: schemas.OpenAI, - EmbeddingModel: "text-embedding-3-small", - Dimension: 1536, - Threshold: 0.8, - CleanUpOnShutdown: true, + Provider: schemas.OpenAI, + EmbeddingModel: "text-embedding-3-small", + Dimension: 1536, + Threshold: 0.8, + ConversationHistoryThreshold: DefaultConversationHistoryThreshold, } } // TestSemanticCache_AllVectorStores_BasicFlow tests the basic cache flow across all vector stores func TestSemanticCache_AllVectorStores_BasicFlow(t *testing.T) { + t.Parallel() for _, tc := range getVectorStoreTestCases() { t.Run(tc.Name, func(t *testing.T) { skipIfNoAPIKey(t, tc.StoreType) setup := NewTestSetupWithVectorStore(t, getDefaultTestConfig(), tc.StoreType) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-"+strings.ToLower(tc.Name)+"-basic") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-"+strings.ToLower(tc.Name)+"-basic")) // Test request request := &schemas.BifrostRequest{ @@ -146,8 +148,8 @@ func TestSemanticCache_AllVectorStores_BasicFlow(t *testing.T) { // Second request - should be a cache hit t.Logf("[%s] Testing second identical request (expecting cache hit)...", tc.Name) - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-"+strings.ToLower(tc.Name)+"-basic") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-"+strings.ToLower(tc.Name)+"-basic")) _, shortCircuit2, err := setup.Plugin.PreLLMHook(ctx2, request) if err != nil { @@ -170,6 +172,7 @@ func TestSemanticCache_AllVectorStores_BasicFlow(t *testing.T) { // TestSemanticCache_AllVectorStores_DirectHashMatch tests direct hash matching across all vector stores func TestSemanticCache_AllVectorStores_DirectHashMatch(t *testing.T) { + t.Parallel() for _, tc := range getVectorStoreTestCases() { t.Run(tc.Name, func(t *testing.T) { skipIfNoAPIKey(t, tc.StoreType) @@ -181,7 +184,7 @@ func TestSemanticCache_AllVectorStores_DirectHashMatch(t *testing.T) { testRunID := uuid.New().String()[:8] cacheKey := "test-" + strings.ToLower(tc.Name) + "-direct-" + testRunID - ctx := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + ctx := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) testRequest := CreateBasicChatRequest("Direct hash test for "+tc.Name+" "+testRunID, 0.7, 50) @@ -196,7 +199,7 @@ func TestSemanticCache_AllVectorStores_DirectHashMatch(t *testing.T) { WaitForCache(setup.Plugin) // Second request with direct-only cache type - ctx2 := CreateContextWithCacheKeyAndType(cacheKey, CacheTypeDirect) + ctx2 := CreateContextWithCacheKeyAndType(t, cacheKey, CacheTypeDirect) t.Logf("[%s] Making second request with CacheTypeDirect...", tc.Name) response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) @@ -212,6 +215,7 @@ func TestSemanticCache_AllVectorStores_DirectHashMatch(t *testing.T) { // TestSemanticCache_AllVectorStores_NamespaceIsolation tests that different cache keys are isolated func TestSemanticCache_AllVectorStores_NamespaceIsolation(t *testing.T) { + t.Parallel() for _, tc := range getVectorStoreTestCases() { t.Run(tc.Name, func(t *testing.T) { skipIfNoAPIKey(t, tc.StoreType) @@ -225,7 +229,7 @@ func TestSemanticCache_AllVectorStores_NamespaceIsolation(t *testing.T) { cacheKey2 := "test-" + strings.ToLower(tc.Name) + "-namespace-2-" + testRunID // Cache with first key - ctx1 := CreateContextWithCacheKey(cacheKey1) + ctx1 := CreateContextWithCacheKey(t, cacheKey1) testRequest := CreateBasicChatRequest("Namespace isolation test for "+tc.Name+" "+testRunID, 0.7, 50) t.Logf("[%s] Making request with cache key 1...", tc.Name) @@ -239,7 +243,7 @@ func TestSemanticCache_AllVectorStores_NamespaceIsolation(t *testing.T) { WaitForCache(setup.Plugin) // Try with different cache key - should miss - ctx2 := CreateContextWithCacheKey(cacheKey2) + ctx2 := CreateContextWithCacheKey(t, cacheKey2) t.Logf("[%s] Making same request with different cache key (expecting miss)...", tc.Name) response2, err2 := setup.Client.ChatCompletionRequest(ctx2, testRequest) @@ -251,7 +255,7 @@ func TestSemanticCache_AllVectorStores_NamespaceIsolation(t *testing.T) { AssertNoCacheHit(t, &schemas.BifrostResponse{ChatResponse: response2}) // Try with original key - should hit - ctx3 := CreateContextWithCacheKey(cacheKey1) + ctx3 := CreateContextWithCacheKey(t, cacheKey1) t.Logf("[%s] Making same request with original cache key (expecting hit)...", tc.Name) response3, err3 := setup.Client.ChatCompletionRequest(ctx3, testRequest) @@ -267,14 +271,15 @@ func TestSemanticCache_AllVectorStores_NamespaceIsolation(t *testing.T) { // TestSemanticCache_AllVectorStores_ParameterFiltering tests that different parameters don't share cache func TestSemanticCache_AllVectorStores_ParameterFiltering(t *testing.T) { + t.Parallel() for _, tc := range getVectorStoreTestCases() { t.Run(tc.Name, func(t *testing.T) { skipIfNoAPIKey(t, tc.StoreType) setup := NewTestSetupWithVectorStore(t, getDefaultTestConfig(), tc.StoreType) defer setup.Cleanup() - ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx.SetValue(CacheKey, "test-"+strings.ToLower(tc.Name)+"-params") + ctx := newBaseTestContext() + ctx.SetValue(CacheKey, keyForTest(t, "test-"+strings.ToLower(tc.Name)+"-params")) // First request with temperature=0.7 request1 := &schemas.BifrostRequest{ @@ -342,8 +347,8 @@ func TestSemanticCache_AllVectorStores_ParameterFiltering(t *testing.T) { // Second request with different temperature - should be cache miss t.Logf("[%s] Testing second request with temperature=0.5 (expecting cache miss)...", tc.Name) - ctx2 := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) - ctx2.SetValue(CacheKey, "test-"+strings.ToLower(tc.Name)+"-params") + ctx2 := newBaseTestContext() + ctx2.SetValue(CacheKey, keyForTest(t, "test-"+strings.ToLower(tc.Name)+"-params")) request2 := &schemas.BifrostRequest{ RequestType: schemas.ChatCompletionRequest, @@ -381,6 +386,7 @@ func TestSemanticCache_AllVectorStores_ParameterFiltering(t *testing.T) { // TestSemanticCache_AllVectorStores_EmbeddingRequest tests embedding request caching across all vector stores func TestSemanticCache_AllVectorStores_EmbeddingRequest(t *testing.T) { + t.Parallel() for _, tc := range getVectorStoreTestCases() { t.Run(tc.Name, func(t *testing.T) { skipIfNoAPIKey(t, tc.StoreType) @@ -395,7 +401,7 @@ func TestSemanticCache_AllVectorStores_EmbeddingRequest(t *testing.T) { embeddingRequest := CreateEmbeddingRequest([]string{"Test embedding with " + tc.Name + " " + testRunID}) // Cache first request - ctx1 := CreateContextWithCacheKey(cacheKey) + ctx1 := CreateContextWithCacheKey(t, cacheKey) t.Logf("[%s] Making first embedding request...", tc.Name) response1, err1 := setup.Client.EmbeddingRequest(ctx1, embeddingRequest) if err1 != nil { @@ -407,7 +413,7 @@ func TestSemanticCache_AllVectorStores_EmbeddingRequest(t *testing.T) { WaitForCache(setup.Plugin) // Second request - should be cache hit - ctx2 := CreateContextWithCacheKey(cacheKey) + ctx2 := CreateContextWithCacheKey(t, cacheKey) t.Logf("[%s] Making second embedding request (expecting cache hit)...", tc.Name) response2, err2 := setup.Client.EmbeddingRequest(ctx2, embeddingRequest) if err2 != nil { diff --git a/plugins/semanticcache/search.go b/plugins/semanticcache/search.go index 6e8a2cf6a7..79b4c1b32a 100644 --- a/plugins/semanticcache/search.go +++ b/plugins/semanticcache/search.go @@ -9,89 +9,32 @@ import ( "strings" "time" + "github.com/cespare/xxhash/v2" + "github.com/google/uuid" bifrost "github.com/maximhq/bifrost/core" "github.com/maximhq/bifrost/core/schemas" "github.com/maximhq/bifrost/framework/vectorstore" ) -func (plugin *Plugin) prepareDirectCacheLookup(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (string, error) { - hash, err := plugin.generateRequestHash(req) +// performDirectSearch does an O(1) point fetch on the deterministic directCacheID +// derived from (provider, model, cacheKey, request_hash, params_hash). Caller +// supplies the prebuilt metadata + paramsHash so we don't recompute them when +// semantic search runs as well. +func (plugin *Plugin) performDirectSearch(ctx *schemas.BifrostContext, state *cacheState, req *schemas.BifrostRequest, cacheKey string, metadata map[string]interface{}, paramsHash string) (*schemas.LLMPluginShortCircuit, error) { + requestHash, err := plugin.generateRequestHash(req, metadata) if err != nil { - return "", fmt.Errorf("failed to generate request hash: %w", err) + return nil, fmt.Errorf("failed to generate request hash: %w", err) } - plugin.logger.Debug(PluginLoggerPrefix + " Generated Hash for Request: " + hash) - - paramsHash, err := plugin.computeRequestParamsHash(req) - if err != nil { - return "", fmt.Errorf("failed to compute direct lookup params hash: %w", err) - } - - ctx.SetValue(requestHashKey, hash) - ctx.SetValue(requestParamsHashKey, paramsHash) - - provider, model, _ := req.GetRequestFields() - directCacheID := plugin.generateDirectCacheID(provider, model, cacheKey, hash, paramsHash) - - return directCacheID, nil -} - -func (plugin *Plugin) performLegacyDirectSearch(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (*schemas.LLMPluginShortCircuit, error) { - hash, _ := ctx.Value(requestHashKey).(string) - paramsHash, _ := ctx.Value(requestParamsHashKey).(string) - provider, model, _ := req.GetRequestFields() - - filters := []vectorstore.Query{ - {Field: "request_hash", Operator: vectorstore.QueryOperatorEqual, Value: hash}, - {Field: "cache_key", Operator: vectorstore.QueryOperatorEqual, Value: cacheKey}, - {Field: "params_hash", Operator: vectorstore.QueryOperatorEqual, Value: paramsHash}, - {Field: "from_bifrost_semantic_cache_plugin", Operator: vectorstore.QueryOperatorEqual, Value: true}, - } - - if plugin.config.CacheByProvider != nil && *plugin.config.CacheByProvider { - filters = append(filters, vectorstore.Query{Field: "provider", Operator: vectorstore.QueryOperatorEqual, Value: string(provider)}) - } - if plugin.config.CacheByModel != nil && *plugin.config.CacheByModel { - filters = append(filters, vectorstore.Query{Field: "model", Operator: vectorstore.QueryOperatorEqual, Value: model}) - } - - plugin.logger.Debug(fmt.Sprintf("%s Searching for legacy direct hash match with %d filters", PluginLoggerPrefix, len(filters))) - - selectFields := append([]string(nil), SelectFields...) - if bifrost.IsStreamRequestType(req.RequestType) { - selectFields = removeField(selectFields, "response") - } else { - selectFields = removeField(selectFields, "stream_chunks") - } - - searchCtx := vectorstore.WithDisableScanFallback(ctx) - var cursor *string - results, _, err := plugin.store.GetAll(searchCtx, plugin.config.VectorStoreNamespace, filters, selectFields, cursor, 1) - if err != nil { - if errors.Is(err, vectorstore.ErrNotFound) || errors.Is(err, vectorstore.ErrQuerySyntax) { - return nil, nil - } - return nil, fmt.Errorf("failed to search for legacy direct hash match: %w", err) - } - - if len(results) == 0 { - plugin.logger.Debug(PluginLoggerPrefix + " No legacy direct hash match found") - return nil, nil - } - - result := results[0] - plugin.logger.Debug(fmt.Sprintf("%s Found legacy direct hash match with ID: %s", PluginLoggerPrefix, result.ID)) - return plugin.buildResponseFromResult(ctx, req, result, CacheTypeDirect, 1.0, 0) -} - -func (plugin *Plugin) performDirectChunkLookup(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (*schemas.LLMPluginShortCircuit, error) { - directCacheID, err := plugin.prepareDirectCacheLookup(ctx, req, cacheKey) + directCacheID, err := plugin.generateDirectCacheID(provider, model, cacheKey, requestHash, paramsHash) if err != nil { - return nil, err + return nil, fmt.Errorf("failed to generate direct cache ID: %w", err) } - ctx.SetValue(requestStorageIDKey, directCacheID) + state.DirectCacheID = directCacheID + // All filters (cacheKey, provider, model, requestHash, paramsHash) are + // encoded into directCacheID, so a Get-by-ID is sufficient. result, err := plugin.store.GetChunk(ctx, plugin.config.VectorStoreNamespace, directCacheID) if err != nil { errMsg := strings.ToLower(err.Error()) @@ -99,93 +42,46 @@ func (plugin *Plugin) performDirectChunkLookup(ctx *schemas.BifrostContext, req strings.Contains(errMsg, "not found") || strings.Contains(errMsg, "status code: 404") if isMiss { - plugin.logger.Debug(PluginLoggerPrefix + " No direct chunk match found") return nil, nil } return nil, fmt.Errorf("failed to fetch direct cache chunk: %w", err) } - - plugin.logger.Debug(fmt.Sprintf("%s Found direct chunk match with ID: %s", PluginLoggerPrefix, result.ID)) - return plugin.buildResponseFromResult(ctx, req, result, CacheTypeDirect, 1.0, 0) -} - -func (plugin *Plugin) performDirectSearch(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (*schemas.LLMPluginShortCircuit, error) { - shortCircuit, err := plugin.performDirectChunkLookup(ctx, req, cacheKey) - if err != nil { - return nil, err - } - if shortCircuit != nil { - return shortCircuit, nil - } - - return plugin.performLegacyDirectSearch(ctx, req, cacheKey) -} - -// generateEmbeddingsForStorage generates embeddings and stores them in context for PostHook storage. -// This is used when the vector store requires vectors but we're in direct-only cache mode. -// Unlike performSemanticSearch, this function does not perform any search - it only generates -// and stores embeddings so they can be persisted with the cache entry. -func (plugin *Plugin) generateEmbeddingsForStorage(ctx *schemas.BifrostContext, req *schemas.BifrostRequest) error { - // Extract text and metadata for embedding - text, paramsHash, err := plugin.extractTextForEmbedding(req) - if err != nil { - return fmt.Errorf("failed to extract text for embedding: %w", err) - } - - // Generate embedding - embedding, inputTokens, err := plugin.generateEmbedding(ctx, text) - if err != nil { - return fmt.Errorf("failed to generate embedding: %w", err) - } - - // Store embedding and metadata in context for PostHook - ctx.SetValue(requestEmbeddingKey, embedding) - ctx.SetValue(requestEmbeddingTokensKey, inputTokens) - ctx.SetValue(requestParamsHashKey, paramsHash) - - return nil + return plugin.buildResponseFromResult(ctx, state, req, result, CacheTypeDirect, nil, nil) } // performSemanticSearch performs semantic similarity search and returns matching response if found. -func (plugin *Plugin) performSemanticSearch(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, cacheKey string) (*schemas.LLMPluginShortCircuit, error) { - // Extract text and metadata for embedding - text, paramsHash, err := plugin.extractTextForEmbedding(req) +// Caller supplies the prebuilt paramsHash so it isn't recomputed. +func (plugin *Plugin) performSemanticSearch(ctx *schemas.BifrostContext, state *cacheState, req *schemas.BifrostRequest, cacheKey string, paramsHash string) (*schemas.LLMPluginShortCircuit, error) { + text, err := plugin.extractTextForEmbedding(state, req) if err != nil { return nil, fmt.Errorf("failed to extract text for embedding: %w", err) } - // Generate embedding embedding, inputTokens, err := plugin.generateEmbedding(ctx, text) if err != nil { + // Note: silent skip — provider misconfig or transient embedding errors + // fall through to the upstream LLM call. return nil, fmt.Errorf("failed to generate embedding: %w", err) } - // Store embedding and metadata in context for PostLLMHook - ctx.SetValue(requestEmbeddingKey, embedding) - ctx.SetValue(requestEmbeddingTokensKey, inputTokens) - ctx.SetValue(requestParamsHashKey, paramsHash) + state.Embeddings = embedding + state.EmbeddingsInputTokens = inputTokens cacheThreshold := plugin.config.Threshold - - thresholdValue := ctx.Value(CacheThresholdKey) - if thresholdValue != nil { - threshold, ok := thresholdValue.(float64) - if !ok { - plugin.logger.Warn(PluginLoggerPrefix + " Threshold is not a float64, using default threshold") - } else { + if v := ctx.Value(CacheThresholdKey); v != nil { + if threshold, ok := v.(float64); ok { cacheThreshold = threshold + } else { + plugin.logger.Warn("Threshold is not a float64, using default threshold") } } provider, model, _ := req.GetRequestFields() - - // Build strict metadata filters as Query slices (provider, model, and all params) strictFilters := []vectorstore.Query{ {Field: "cache_key", Operator: vectorstore.QueryOperatorEqual, Value: cacheKey}, {Field: "params_hash", Operator: vectorstore.QueryOperatorEqual, Value: paramsHash}, {Field: "from_bifrost_semantic_cache_plugin", Operator: vectorstore.QueryOperatorEqual, Value: true}, } - if plugin.config.CacheByProvider != nil && *plugin.config.CacheByProvider { strictFilters = append(strictFilters, vectorstore.Query{Field: "provider", Operator: vectorstore.QueryOperatorEqual, Value: string(provider)}) } @@ -193,96 +89,175 @@ func (plugin *Plugin) performSemanticSearch(ctx *schemas.BifrostContext, req *sc strictFilters = append(strictFilters, vectorstore.Query{Field: "model", Operator: vectorstore.QueryOperatorEqual, Value: model}) } - plugin.logger.Debug(fmt.Sprintf("%s Performing semantic search with %d metadata filters", PluginLoggerPrefix, len(strictFilters))) - - // Make a full copy so we don't mutate the original backing array - selectFields := append([]string(nil), SelectFields...) - if bifrost.IsStreamRequestType(req.RequestType) { - selectFields = removeField(selectFields, "response") - } else { - selectFields = removeField(selectFields, "stream_chunks") - } - - // For semantic search, we want semantic similarity in content but exact parameter matching + selectFields := selectFieldsForRequest(req.RequestType) results, err := plugin.store.GetNearest(ctx, plugin.config.VectorStoreNamespace, embedding, strictFilters, selectFields, cacheThreshold, 1) if err != nil { return nil, fmt.Errorf("failed to search semantic cache: %w", err) } - if len(results) == 0 { - plugin.logger.Debug(PluginLoggerPrefix + " No semantic match found") return nil, nil } + return plugin.buildResponseFromResult(ctx, state, req, results[0], CacheTypeSemantic, &cacheThreshold, &inputTokens) +} - // Found a semantically similar entry - result := results[0] - plugin.logger.Debug(fmt.Sprintf("%s Found semantic match with ID: %s, Score: %f", PluginLoggerPrefix, result.ID, *result.Score)) +// selectFieldsStream / selectFieldsNonStream are precomputed at package init +// because selectFieldsForRequest is called on every cache lookup. +var ( + selectFieldsStream = filterSelectFields("response") + selectFieldsNonStream = filterSelectFields("stream_chunks") +) - // Build response from cached result - return plugin.buildResponseFromResult(ctx, req, result, CacheTypeSemantic, cacheThreshold, inputTokens) +// filterSelectFields returns SelectFields with the named field removed. Used +// at package init to precompute the per-request projection lists. +func filterSelectFields(skip string) []string { + out := make([]string, 0, len(SelectFields)) + for _, f := range SelectFields { + if f != skip { + out = append(out, f) + } + } + return out } -// buildResponseFromResult constructs a LLMPluginShortCircuit response from a cached VectorEntry result -func (plugin *Plugin) buildResponseFromResult(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, result vectorstore.SearchResult, cacheType CacheType, threshold float64, inputTokens int) (*schemas.LLMPluginShortCircuit, error) { - // Extract response data from the result properties - properties := result.Properties - if properties == nil { - return nil, fmt.Errorf("no properties found in cached result") +// selectFieldsForRequest returns the projection list trimmed to the response +// shape we actually need (single response vs stream chunks). +func selectFieldsForRequest(requestType schemas.RequestType) []string { + if bifrost.IsStreamRequestType(requestType) { + return selectFieldsStream } + return selectFieldsNonStream +} - // Check TTL - if entry has expired, delete it and return cache miss - if expiresAtRaw, exists := properties["expires_at"]; exists && expiresAtRaw != nil { - var expiresAt int64 - var validType bool - switch v := expiresAtRaw.(type) { - case string: - var err error - expiresAt, err = strconv.ParseInt(v, 10, 64) - if err != nil { - validType = false - } else { - validType = true - } - case float64: - expiresAt = int64(v) - validType = true - case int64: - expiresAt = v - validType = true - case int: - expiresAt = int64(v) - validType = true - } - if validType { - currentTime := time.Now().Unix() - if expiresAt < currentTime { - // Entry has expired, delete it asynchronously - go func() { - deleteCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - err := plugin.store.Delete(deleteCtx, plugin.config.VectorStoreNamespace, result.ID) - if err != nil { - plugin.logger.Warn("%s Failed to delete expired entry %s: %v", PluginLoggerPrefix, result.ID, err) - } - }() - // Return nil to indicate cache miss - return nil, nil - } +// generateEmbedding generates an embedding for the given text using the configured provider. +func (plugin *Plugin) generateEmbedding(ctx *schemas.BifrostContext, text string) ([]float32, int, error) { + embeddingReq := &schemas.BifrostEmbeddingRequest{ + Provider: plugin.config.Provider, + Model: plugin.config.EmbeddingModel, + Input: &schemas.EmbeddingInput{ + Text: &text, + }, + } + + embeddingCtx := schemas.NewBifrostContext(ctx, schemas.NoDeadline) + // Cancel the derived context once we're done. NewBifrostContext starts a + // watchCancellation goroutine that holds a reference to ctx (the scoped + // plugin context). Without this, that goroutine outlives the plugin call + // and may dereference fields on a parent context that has already been + // released back to its sync.Pool — see core/schemas.ReleasePluginScope. + defer embeddingCtx.Cancel() + embeddingCtx.SetValue(schemas.BifrostContextKeySkipPluginPipeline, true) + if plugin.embeddingRequestExecutor == nil { + return nil, 0, fmt.Errorf("embedding request executor is not configured") + } + response, err := plugin.embeddingRequestExecutor(embeddingCtx, embeddingReq) + if err != nil { + return nil, 0, fmt.Errorf("failed to generate embedding: %v", err) + } + + if len(response.Data) == 0 { + return nil, 0, fmt.Errorf("no embeddings returned from provider") + } + + embedding := response.Data[0].Embedding + inputTokens := 0 + if response.Usage != nil { + inputTokens = response.Usage.TotalTokens + } + + switch { + case embedding.EmbeddingStr != nil: + var vals []float32 + if err := json.Unmarshal([]byte(*embedding.EmbeddingStr), &vals); err != nil { + return nil, 0, fmt.Errorf("failed to parse string embedding: %w", err) } + return vals, inputTokens, nil + case embedding.EmbeddingArray != nil: + return float64ToFloat32Embedding(embedding.EmbeddingArray), inputTokens, nil + case len(embedding.Embedding2DArray) > 0: + return flattenToFloat32Embedding(embedding.Embedding2DArray), inputTokens, nil + case embedding.EmbeddingInt8Array != nil: + // Quantized int8/binary embedding format. Promote to float32 so the + // cosine-similarity path treats it uniformly. + return int8ToFloat32Embedding(embedding.EmbeddingInt8Array), inputTokens, nil + case embedding.EmbeddingInt32Array != nil: + return int32ToFloat32Embedding(embedding.EmbeddingInt32Array), inputTokens, nil + } + return nil, 0, fmt.Errorf("embedding data is not in expected format") +} + +// generateRequestHash creates an xxhash of the (normalized input, params). +// Fallbacks are excluded since they only affect error handling. +func (plugin *Plugin) generateRequestHash(req *schemas.BifrostRequest, params map[string]interface{}) (string, error) { + hashInput := map[string]interface{}{ + "input": plugin.getNormalizedInputForCaching(req), + "params": params, } + jsonData, err := schemas.MarshalDeeplySorted(hashInput) + if err != nil { + return "", fmt.Errorf("failed to marshal request for hashing: %w", err) + } + return fmt.Sprintf("%x", xxhash.Sum64(jsonData)), nil +} - // Check if this is a streaming response - need to check for non-null values - streamResponses, hasStreamingResponse := properties["stream_chunks"] - singleResponse, hasSingleResponse := properties["response"] +// generateDirectCacheID returns a deterministic UUIDv5 derived from the cache +// key, request hash, params hash, and (optionally) provider/model. The same +// inputs always produce the same ID, which is what makes the direct path an +// O(1) point fetch. +func (plugin *Plugin) generateDirectCacheID(provider schemas.ModelProvider, model string, cacheKey string, requestHash string, paramsHash string) (string, error) { + idInput := struct { + CacheKey string `json:"cache_key"` + RequestHash string `json:"request_hash"` + ParamsHash string `json:"params_hash"` + Provider string `json:"provider,omitempty"` + Model string `json:"model,omitempty"` + }{ + CacheKey: cacheKey, + RequestHash: requestHash, + ParamsHash: paramsHash, + } + if plugin.config.CacheByProvider != nil && *plugin.config.CacheByProvider { + idInput.Provider = string(provider) + } + if plugin.config.CacheByModel != nil && *plugin.config.CacheByModel { + idInput.Model = model + } + data, err := schemas.MarshalDeeplySorted(idInput) + if err != nil { + return "", err + } + return uuid.NewSHA1(directCacheNamespace, data).String(), nil +} - // Consider fields present only if they're not null - hasValidSingleResponse := hasSingleResponse && singleResponse != nil - hasValidStreamingResponse := hasStreamingResponse && streamResponses != nil +// buildResponseFromResult constructs a LLMPluginShortCircuit response from a cached VectorEntry result. +// +// Return contract: +// - (shortCircuit, nil): cache hit — caller should return shortCircuit to short-circuit upstream. +// - (nil, nil): treat as a miss. Used for both genuine misses and "soft" misses +// (expired entry, unparseable expires_at, format mismatch). Caller proceeds to upstream. +// - (nil, err): hard error worth logging; caller logs and proceeds to upstream. +func (plugin *Plugin) buildResponseFromResult(ctx *schemas.BifrostContext, state *cacheState, req *schemas.BifrostRequest, result vectorstore.SearchResult, cacheType CacheType, threshold *float64, inputTokens *int) (*schemas.LLMPluginShortCircuit, error) { + properties := result.Properties + if properties == nil { + return nil, fmt.Errorf("no properties found in cached result") + } - // Parse stream_chunks - streamChunks, err := plugin.parseStreamChunks(streamResponses) - if err != nil || len(streamChunks) == 0 { - hasValidStreamingResponse = false + if expired, miss := isExpiredEntry(properties); expired { + // Async best-effort cleanup of the stale entry. Tracked on writersWg + // so WaitForPendingOperations + Cleanup block until it finishes, + // avoiding a delete racing with namespace teardown. + plugin.writersWg.Add(1) + go func() { + defer plugin.writersWg.Done() + deleteCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + if err := plugin.store.Delete(deleteCtx, plugin.config.VectorStoreNamespace, result.ID); err != nil { + plugin.logger.Warn("Failed to delete expired entry %s: %v", result.ID, err) + } + }() + return nil, nil + } else if miss { + // Unparseable expires_at — treat as miss to be safe. + return nil, nil } similarity := 0.0 @@ -290,134 +265,118 @@ func (plugin *Plugin) buildResponseFromResult(ctx *schemas.BifrostContext, req * similarity = *result.Score } - isStreamRequest := bifrost.IsStreamRequestType(req.RequestType) - - if isStreamRequest && hasValidStreamingResponse { - return plugin.buildStreamingResponseFromResult(ctx, req, result, streamChunks, cacheType, threshold, similarity, inputTokens) - } else if !isStreamRequest && hasValidSingleResponse { - return plugin.buildSingleResponseFromResult(ctx, req, result, singleResponse, cacheType, threshold, similarity, inputTokens) + isStream := bifrost.IsStreamRequestType(req.RequestType) + if isStream { + streamResponses, ok := properties["stream_chunks"] + if ok && streamResponses != nil { + streamChunks, err := plugin.parseStreamChunks(streamResponses) + if err == nil && len(streamChunks) > 0 { + return plugin.buildStreamingResponseFromResult(ctx, state, req, result, streamChunks, cacheType, threshold, &similarity, inputTokens) + } + } } else { - plugin.logger.Warn("%s Cache entry format mismatch for request %s (isStream=%t, hasSingle=%t, hasStream=%t), treating as miss", - PluginLoggerPrefix, result.ID, isStreamRequest, hasValidSingleResponse, hasValidStreamingResponse) - return nil, nil + singleResponse, ok := properties["response"] + if ok && singleResponse != nil { + return plugin.buildNonStreamingResponseFromResult(ctx, state, req, result, singleResponse, cacheType, threshold, &similarity, inputTokens) + } + } + + msg := fmt.Sprintf("cache entry %s format mismatch (isStream=%t), treating as miss — entry may be corrupt", result.ID, isStream) + plugin.logger.Warn(msg) + ctx.Log(schemas.LogLevelWarn, msg) + return nil, nil +} + +// isExpiredEntry returns (expired, parseFailed). A nil/missing expires_at is +// treated as never-expires. +func isExpiredEntry(properties map[string]interface{}) (bool, bool) { + expiresAtRaw, exists := properties["expires_at"] + if !exists || expiresAtRaw == nil { + return false, false } + var expiresAt int64 + switch v := expiresAtRaw.(type) { + case string: + parsed, err := strconv.ParseInt(v, 10, 64) + if err != nil { + return false, true + } + expiresAt = parsed + case float64: + expiresAt = int64(v) + case int64: + expiresAt = v + case int: + expiresAt = int64(v) + default: + return false, true + } + return expiresAt < time.Now().Unix(), false } -// buildSingleResponseFromResult constructs a single response from cached data -func (plugin *Plugin) buildSingleResponseFromResult(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, result vectorstore.SearchResult, responseData interface{}, cacheType CacheType, threshold float64, similarity float64, inputTokens int) (*schemas.LLMPluginShortCircuit, error) { +// buildNonStreamingResponseFromResult constructs a single response from cached data. +func (plugin *Plugin) buildNonStreamingResponseFromResult(ctx *schemas.BifrostContext, state *cacheState, req *schemas.BifrostRequest, result vectorstore.SearchResult, responseData interface{}, cacheType CacheType, threshold *float64, similarity *float64, inputTokens *int) (*schemas.LLMPluginShortCircuit, error) { requestedProvider, requestedModel, _ := req.GetRequestFields() responseStr, ok := responseData.(string) if !ok { return nil, fmt.Errorf("cached response is not a string") } - - // Unmarshal the cached response var cachedResponse schemas.BifrostResponse if err := json.Unmarshal([]byte(responseStr), &cachedResponse); err != nil { return nil, fmt.Errorf("failed to unmarshal cached response: %w", err) } - extraFields := cachedResponse.GetExtraFields() - - if extraFields.CacheDebug == nil { - extraFields.CacheDebug = &schemas.BifrostCacheDebug{} - } - extraFields.CacheDebug.CacheHit = true - extraFields.CacheDebug.HitType = bifrost.Ptr(string(cacheType)) - extraFields.CacheDebug.CacheID = bifrost.Ptr(result.ID) - extraFields.CacheDebug.RequestedProvider = bifrost.Ptr(string(requestedProvider)) - extraFields.CacheDebug.RequestedModel = bifrost.Ptr(requestedModel) - if cacheType == CacheTypeSemantic { - extraFields.CacheDebug.ProviderUsed = bifrost.Ptr(string(plugin.config.Provider)) - extraFields.CacheDebug.ModelUsed = bifrost.Ptr(plugin.config.EmbeddingModel) - extraFields.CacheDebug.Threshold = &threshold - extraFields.CacheDebug.Similarity = &similarity - extraFields.CacheDebug.InputTokens = &inputTokens - } else { - extraFields.CacheDebug.ProviderUsed = nil - extraFields.CacheDebug.ModelUsed = nil - extraFields.CacheDebug.Threshold = nil - extraFields.CacheDebug.Similarity = nil - extraFields.CacheDebug.InputTokens = nil - } - - ctx.SetValue(isCacheHitKey, true) - ctx.SetValue(cacheHitTypeKey, cacheType) - - return &schemas.LLMPluginShortCircuit{ - Response: &cachedResponse, - }, nil + plugin.stampCacheDebugForHit(state, cachedResponse.GetExtraFields(), result.ID, requestedProvider, requestedModel, cacheType, threshold, similarity, inputTokens) + state.ShortCircuited = true + return &schemas.LLMPluginShortCircuit{Response: &cachedResponse}, nil } -// buildStreamingResponseFromResult constructs a streaming response from cached data -func (plugin *Plugin) buildStreamingResponseFromResult(ctx *schemas.BifrostContext, req *schemas.BifrostRequest, result vectorstore.SearchResult, streamArray []interface{}, cacheType CacheType, threshold float64, similarity float64, inputTokens int) (*schemas.LLMPluginShortCircuit, error) { +// buildStreamingResponseFromResult constructs a streaming response from cached data. +// The replay goroutine guards every send with ctx.Done() so a dropped consumer +// can't leak the goroutine (and its captured chunks) for the lifetime of the +// process. +func (plugin *Plugin) buildStreamingResponseFromResult(ctx *schemas.BifrostContext, state *cacheState, req *schemas.BifrostRequest, result vectorstore.SearchResult, streamArray []string, cacheType CacheType, threshold *float64, similarity *float64, inputTokens *int) (*schemas.LLMPluginShortCircuit, error) { requestedProvider, requestedModel, _ := req.GetRequestFields() - - // Mark cache-hit once to avoid concurrent ctx writes - ctx.SetValue(isCacheHitKey, true) - ctx.SetValue(cacheHitTypeKey, cacheType) - - // Create stream channel streamChan := make(chan *schemas.BifrostStreamChunk) + done := ctx.Done() + // We deliberately do NOT pre-decode all chunks up front — that would + // add O(N) latency before the first chunk is delivered, defeating the + // purpose of streaming for long responses. A malformed chunk is + // extremely unlikely (we wrote it as JSON ourselves), and on the rare + // occasion it happens we log+skip rather than truncate the user's view. go func() { defer close(streamChan) - - // Set cache-hit markers inside the streaming goroutine to avoid races - ctx.SetValue(isCacheHitKey, true) - ctx.SetValue(cacheHitTypeKey, cacheType) - - // Process each stream chunk - for i, chunkData := range streamArray { - chunkStr, ok := chunkData.(string) - if !ok { - plugin.logger.Warn("%s Stream chunk %d is not a string, skipping", PluginLoggerPrefix, i) - continue - } - - // Unmarshal the chunk as BifrostResponse + for i, chunkStr := range streamArray { var cachedResponse schemas.BifrostResponse if err := json.Unmarshal([]byte(chunkStr), &cachedResponse); err != nil { - plugin.logger.Warn("%s Failed to unmarshal stream chunk %d, skipping: %v", PluginLoggerPrefix, i, err) + plugin.logger.Warn("Failed to unmarshal stream chunk %d, skipping: %v", i, err) continue } // Ensure RequestType is set on every chunk so downstream consumers - // (logging, telemetry, etc.) correctly identify this as a streaming response. + // (logging, telemetry) correctly identify this as a streaming response. if ef := cachedResponse.GetExtraFields(); ef != nil && ef.RequestType == "" { ef.RequestType = req.RequestType } - // Add cache debug to only the last chunk if i == len(streamArray)-1 { - ctx.SetValue(schemas.BifrostContextKeyStreamEndIndicator, true) - extraFields := cachedResponse.GetExtraFields() - cacheDebug := schemas.BifrostCacheDebug{ - CacheHit: true, - HitType: bifrost.Ptr(string(cacheType)), - CacheID: bifrost.Ptr(result.ID), - RequestedProvider: bifrost.Ptr(string(requestedProvider)), - RequestedModel: bifrost.Ptr(requestedModel), - } - if cacheType == CacheTypeSemantic { - cacheDebug.ProviderUsed = bifrost.Ptr(string(plugin.config.Provider)) - cacheDebug.ModelUsed = bifrost.Ptr(plugin.config.EmbeddingModel) - cacheDebug.Threshold = &threshold - cacheDebug.Similarity = &similarity - cacheDebug.InputTokens = &inputTokens - } else { - cacheDebug.ProviderUsed = nil - cacheDebug.ModelUsed = nil - cacheDebug.Threshold = nil - cacheDebug.Similarity = nil - cacheDebug.InputTokens = nil - } - extraFields.CacheDebug = &cacheDebug + // stampCacheDebugForHit marks this chunk as the cache-hit final + // chunk; cache.PostLLMHook keys off CacheDebug.CacheHit=true to + // set BifrostContextKeyStreamEndIndicator on the root ctx + // synchronously (same goroutine as logging.PostLLMHook). + // + // We deliberately do NOT call ctx.Root().SetValue here. Doing + // so races against the receiver's PostLLMHook for the previous + // chunk: the cache replay can advance to iteration N (and + // write the indicator) while the receiver is still running + // PostLLMHooks for chunk N-1, poisoning that chunk's + // IsFinalChunk read and causing duplicate "final" events. + plugin.stampCacheDebugForHit(state, cachedResponse.GetExtraFields(), result.ID, requestedProvider, requestedModel, cacheType, threshold, similarity, inputTokens) } - // Send chunk to stream - streamChan <- &schemas.BifrostStreamChunk{ + chunk := &schemas.BifrostStreamChunk{ BifrostTextCompletionResponse: cachedResponse.TextCompletionResponse, BifrostChatResponse: cachedResponse.ChatResponse, BifrostResponsesStreamResponse: cachedResponse.ResponsesStreamResponse, @@ -425,44 +384,63 @@ func (plugin *Plugin) buildStreamingResponseFromResult(ctx *schemas.BifrostConte BifrostTranscriptionStreamResponse: cachedResponse.TranscriptionStreamResponse, BifrostImageGenerationStreamResponse: cachedResponse.ImageGenerationStreamResponse, } + + select { + case streamChan <- chunk: + case <-done: + return + } } }() - return &schemas.LLMPluginShortCircuit{ - Stream: streamChan, - }, nil + state.ShortCircuited = true + return &schemas.LLMPluginShortCircuit{Stream: streamChan}, nil } -// parseStreamChunks parses stream_chunks data from various formats into []interface{} -// Handles []interface{}, []string, and JSON string formats -func (plugin *Plugin) parseStreamChunks(streamData interface{}) ([]interface{}, error) { - if streamData == nil { - return nil, fmt.Errorf("stream data is nil") - } - - switch v := streamData.(type) { - case []interface{}: - return v, nil - case []string: - // Convert []string to []interface{} - result := make([]interface{}, len(v)) - for i, s := range v { - result[i] = s - } - return result, nil - case string: - // Parse JSON string from Redis - var stringArray []string - if err := json.Unmarshal([]byte(v), &stringArray); err != nil { - return nil, fmt.Errorf("failed to parse JSON string: %w", err) - } - // Convert to []interface{} - result := make([]interface{}, len(stringArray)) - for i, s := range stringArray { - result[i] = s - } - return result, nil - default: - return nil, fmt.Errorf("unsupported stream data type: %T", streamData) +// stampCacheDebugForHit stamps the cache-hit telemetry on the response. For +// CacheTypeDirect, the embedding-related fields are explicitly cleared so +// stale carry-over from semantic hits never leaks through. CacheHitLatency +// is computed from state.CreatedAt (set at PreLLMHook entry) so consumers +// can distinguish cache-serve time from the original provider latency +// preserved in the cached response. +func (plugin *Plugin) stampCacheDebugForHit( + state *cacheState, + extraFields *schemas.BifrostResponseExtraFields, + cacheID string, + requestedProvider schemas.ModelProvider, + requestedModel string, + cacheType CacheType, + threshold *float64, + similarity *float64, + inputTokens *int, +) { + // GetExtraFields() can return nil for older/corrupted cache entries that + // were written without ExtraFields populated. Bail rather than panic — + // the chunk will still be delivered, just without CacheDebug telemetry. + if extraFields == nil { + return + } + if extraFields.CacheDebug == nil { + extraFields.CacheDebug = &schemas.BifrostCacheDebug{} + } + cd := extraFields.CacheDebug + cd.CacheHit = true + cd.HitType = bifrost.Ptr(string(cacheType)) + cd.CacheID = bifrost.Ptr(cacheID) + cd.RequestedProvider = bifrost.Ptr(string(requestedProvider)) + cd.RequestedModel = bifrost.Ptr(requestedModel) + cd.CacheHitLatency = bifrost.Ptr(time.Since(state.CreatedAt).Milliseconds()) + if cacheType == CacheTypeSemantic { + cd.ProviderUsed = bifrost.Ptr(string(plugin.config.Provider)) + cd.ModelUsed = bifrost.Ptr(plugin.config.EmbeddingModel) + cd.Threshold = threshold + cd.Similarity = similarity + cd.InputTokens = inputTokens + } else { + cd.ProviderUsed = nil + cd.ModelUsed = nil + cd.Threshold = nil + cd.Similarity = nil + cd.InputTokens = nil } } diff --git a/plugins/semanticcache/state.go b/plugins/semanticcache/state.go new file mode 100644 index 0000000000..489c329076 --- /dev/null +++ b/plugins/semanticcache/state.go @@ -0,0 +1,110 @@ +package semanticcache + +import ( + "time" +) + +// cacheState holds per-request state for the semantic cache plugin. It's +// keyed by the request ID and lives between PreLLMHook (where it's populated) +// and PostLLMHook (where it's consumed and cleared). +// +// Centralizes what used to be a set of stringly-typed BifrostContext keys +// (directCacheID, paramsHash, embeddings, embedding input tokens) into one +// struct so the lifecycle is explicit and consumers don't have to chase +// ctx.Value/SetValue calls across files. +// +// No mutex is needed: per-request access is serialized — PreLLMHook runs once, +// PostLLMHook runs once per chunk in order, and the only async path +// (PostLLMHook's storage goroutine) snapshots the values it needs into locals +// before launching. +type cacheState struct { + DirectCacheID string + ParamsHash string + Embeddings []float32 + EmbeddingsInputTokens int + + // FilteredInput caches getInputForCaching(req) so attachment extraction, + // embedding text extraction, and history-threshold checks reuse the same + // filtered slice instead of re-filtering on each call. + FilteredInput interface{} + + // ShortCircuited is set when PreLLMHook served the response from cache + // (returned a non-nil LLMPluginShortCircuit). PostLLMHook uses this to + // skip the entire cache-write path: only the FINAL replay chunk carries + // CacheDebug.CacheHit=true, so shouldSkipCaching() can't catch the + // non-final chunks on its own — without this flag they'd flow into + // addStreamingResponse and trigger a duplicate write at the same + // directCacheID (Weaviate 422 "id already exists"). + ShortCircuited bool + + CreatedAt time.Time +} + +// cacheStateMaxAge bounds how long an orphaned cacheState may live in memory +// before being reaped. +const cacheStateMaxAge = 60 * time.Minute + +// cacheStateCleanupInterval bounds the worst-case staleness of an orphaned +// state to ~maxAge + interval. +const cacheStateCleanupInterval = 5 * time.Minute + +// createCacheState writes a fresh state for requestID, overwriting any prior. +// PreLLMHook calls this at the top so retries / reused requestIDs don't +// inherit stale fields. +func (p *Plugin) createCacheState(requestID string) *cacheState { + state := &cacheState{CreatedAt: time.Now()} + p.cacheStates.Store(requestID, state) + return state +} + +// getCacheState returns the cacheState for requestID, or nil if none exists. +func (p *Plugin) getCacheState(requestID string) *cacheState { + if v, ok := p.cacheStates.Load(requestID); ok { + return v.(*cacheState) + } + return nil +} + +// clearCacheState drops the cacheState entry for requestID. It's safe to call +// when no entry exists. +func (p *Plugin) clearCacheState(requestID string) { + p.cacheStates.Delete(requestID) +} + +// runCacheStateCleanupLoop reaps stale cacheStates on a ticker until stopCh +// is closed. Started by Init, stopped by Cleanup. +func (p *Plugin) runCacheStateCleanupLoop() { + defer p.cleanupWg.Done() + ticker := time.NewTicker(cacheStateCleanupInterval) + defer ticker.Stop() + for { + select { + case <-p.stopCh: + return + case <-ticker.C: + p.cleanupOldCacheStates() + } + } +} + +// cleanupOldCacheStates deletes every cacheState whose CreatedAt is older +// than cacheStateMaxAge. Entries this old indicate a request that never +// reached PostLLMHook (client disconnect, framework bug); reaping them +// bounds memory under abnormal traffic. +func (p *Plugin) cleanupOldCacheStates() { + cutoff := time.Now().Add(-cacheStateMaxAge) + var toDelete []string + p.cacheStates.Range(func(key, value interface{}) bool { + state := value.(*cacheState) + if state.CreatedAt.Before(cutoff) { + toDelete = append(toDelete, key.(string)) + } + return true + }) + for _, k := range toDelete { + p.cacheStates.Delete(k) + } + if len(toDelete) > 0 { + p.logger.Debug("Reaped %d stale cache states", len(toDelete)) + } +} diff --git a/plugins/semanticcache/stream.go b/plugins/semanticcache/stream.go index e2d3c02526..f8c3fd7b3a 100644 --- a/plugins/semanticcache/stream.go +++ b/plugins/semanticcache/stream.go @@ -5,65 +5,81 @@ import ( "encoding/json" "fmt" "sort" - "sync" "time" ) -// Streaming State Management Methods +// chunkSortKey returns the (Index, ChunkIndex) tuple used to order +// accumulated stream chunks before flush. Image-generation responses use +// both fields; every other response shape uses ChunkIndex with Index=0. +// Nil chunks/responses sort to the end via a max-int sentinel so they're +// dropped deterministically by the consumer. +func chunkSortKey(c *StreamChunk) (int, int) { + const sentinel = int(^uint(0) >> 1) // math.MaxInt without the import + if c == nil || c.Response == nil { + return sentinel, sentinel + } + r := c.Response + switch { + case r.TextCompletionResponse != nil: + return 0, r.TextCompletionResponse.ExtraFields.ChunkIndex + case r.ChatResponse != nil: + return 0, r.ChatResponse.ExtraFields.ChunkIndex + case r.ResponsesResponse != nil: + return 0, r.ResponsesResponse.ExtraFields.ChunkIndex + case r.ResponsesStreamResponse != nil: + return 0, r.ResponsesStreamResponse.ExtraFields.ChunkIndex + case r.SpeechResponse != nil: + return 0, r.SpeechResponse.ExtraFields.ChunkIndex + case r.SpeechStreamResponse != nil: + return 0, r.SpeechStreamResponse.ExtraFields.ChunkIndex + case r.TranscriptionResponse != nil: + return 0, r.TranscriptionResponse.ExtraFields.ChunkIndex + case r.TranscriptionStreamResponse != nil: + return 0, r.TranscriptionStreamResponse.ExtraFields.ChunkIndex + case r.ImageGenerationStreamResponse != nil: + return r.ImageGenerationStreamResponse.Index, r.ImageGenerationStreamResponse.ChunkIndex + } + return sentinel, sentinel +} -// createStreamAccumulator creates a new stream accumulator for a request -func (plugin *Plugin) createStreamAccumulator(requestID string, storageID string, embedding []float32, metadata map[string]interface{}, ttl time.Duration) *StreamAccumulator { - return &StreamAccumulator{ +// getOrCreateStreamAccumulator returns the StreamAccumulator for requestID, +// creating one if none exists. Concurrency-safe: the underlying sync.Map's +// LoadOrStore guarantees a single accumulator per request even under racing +// PostLLMHook invocations. +func (plugin *Plugin) getOrCreateStreamAccumulator(requestID string, storageID string, embedding []float32, metadata map[string]interface{}, ttl time.Duration) *StreamAccumulator { + if existing, ok := plugin.streamAccumulators.Load(requestID); ok { + return existing.(*StreamAccumulator) + } + newAccumulator := &StreamAccumulator{ RequestID: requestID, StorageID: storageID, Chunks: make([]*StreamChunk, 0), - IsComplete: false, + LastSeenAt: time.Now(), Embedding: embedding, Metadata: metadata, TTL: ttl, - mu: sync.Mutex{}, } -} - -// getOrCreateStreamAccumulator gets or creates a stream accumulator for a request -func (plugin *Plugin) getOrCreateStreamAccumulator(requestID string, storageID string, embedding []float32, metadata map[string]interface{}, ttl time.Duration) *StreamAccumulator { - if existing, ok := plugin.streamAccumulators.Load(requestID); ok { - return existing.(*StreamAccumulator) - } - - newAccumulator := plugin.createStreamAccumulator(requestID, storageID, embedding, metadata, ttl) actual, _ := plugin.streamAccumulators.LoadOrStore(requestID, newAccumulator) return actual.(*StreamAccumulator) } -// addStreamChunk adds a chunk to the stream accumulator -func (plugin *Plugin) addStreamChunk(requestID string, chunk *StreamChunk, isFinalChunk bool) error { - // Get accumulator (should exist if properly initialized) +// addStreamChunk appends a chunk to the request's accumulator and refreshes +// LastSeenAt so the reaper treats the stream as still active. +func (plugin *Plugin) addStreamChunk(requestID string, chunk *StreamChunk) error { accumulatorInterface, exists := plugin.streamAccumulators.Load(requestID) if !exists { return fmt.Errorf("stream accumulator not found for request %s", requestID) } - accumulator := accumulatorInterface.(*StreamAccumulator) accumulator.mu.Lock() defer accumulator.mu.Unlock() - - // Add chunk to the list (chunks arrive in order) accumulator.Chunks = append(accumulator.Chunks, chunk) - - // Set FinalTimestamp when FinishReason is present - // This handles both normal completion chunks and usage-only last chunks - if isFinalChunk { - accumulator.FinalTimestamp = chunk.Timestamp - } - - plugin.logger.Debug(fmt.Sprintf("%s Added chunk to stream accumulator for request %s", PluginLoggerPrefix, requestID)) - + accumulator.LastSeenAt = chunk.Timestamp return nil } -// processAccumulatedStream processes all accumulated chunks and caches the complete stream -// Flow: Collect everything → Check for ANY errors → If no errors, order and send to .Add() → If any errors, drop operation +// processAccumulatedStream serializes and stores the accumulated chunks as a +// single cache entry. Called once per stream when the final chunk arrives. func (plugin *Plugin) processAccumulatedStream(ctx context.Context, requestID string) error { accumulatorInterface, exists := plugin.streamAccumulators.Load(requestID) if !exists { @@ -72,130 +88,106 @@ func (plugin *Plugin) processAccumulatedStream(ctx context.Context, requestID st accumulator := accumulatorInterface.(*StreamAccumulator) accumulator.mu.Lock() - - // Ensure unlock happens after cleanup defer accumulator.mu.Unlock() - // Ensure cleanup happens defer plugin.cleanupStreamAccumulator(requestID) - // STEP 1: Check if any chunk in the entire stream had an error - if accumulator.HasError { - plugin.logger.Debug(fmt.Sprintf("%s Stream for request %s had errors, dropping entire operation (not caching)", PluginLoggerPrefix, requestID)) - return nil - } - - // STEP 2: All chunks are clean, now sort and build ordered stream for caching - plugin.logger.Debug(fmt.Sprintf("%s Stream for request %s completed successfully, processing %d chunks for caching", PluginLoggerPrefix, requestID, len(accumulator.Chunks))) - - // Sort chunks by their ChunkIndex to ensure proper order (stable + nil-safe) sort.SliceStable(accumulator.Chunks, func(i, j int) bool { - if accumulator.Chunks[i].Response == nil || accumulator.Chunks[j].Response == nil { - // Push nils to the end deterministically - return accumulator.Chunks[j].Response != nil - } - if accumulator.Chunks[i].Response.TextCompletionResponse != nil { - return accumulator.Chunks[i].Response.TextCompletionResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.TextCompletionResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.ChatResponse != nil { - return accumulator.Chunks[i].Response.ChatResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.ChatResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.ResponsesResponse != nil { - return accumulator.Chunks[i].Response.ResponsesResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.ResponsesResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.ResponsesStreamResponse != nil { - return accumulator.Chunks[i].Response.ResponsesStreamResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.ResponsesStreamResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.SpeechResponse != nil { - return accumulator.Chunks[i].Response.SpeechResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.SpeechResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.SpeechStreamResponse != nil { - return accumulator.Chunks[i].Response.SpeechStreamResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.SpeechStreamResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.TranscriptionResponse != nil { - return accumulator.Chunks[i].Response.TranscriptionResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.TranscriptionResponse.ExtraFields.ChunkIndex + ai, bi := chunkSortKey(accumulator.Chunks[i]) + aj, bj := chunkSortKey(accumulator.Chunks[j]) + if ai != aj { + return ai < aj } - if accumulator.Chunks[i].Response.TranscriptionStreamResponse != nil { - return accumulator.Chunks[i].Response.TranscriptionStreamResponse.ExtraFields.ChunkIndex < accumulator.Chunks[j].Response.TranscriptionStreamResponse.ExtraFields.ChunkIndex - } - if accumulator.Chunks[i].Response.ImageGenerationStreamResponse != nil { - // For image generation, sort by Index first, then ChunkIndex - if accumulator.Chunks[i].Response.ImageGenerationStreamResponse.Index != accumulator.Chunks[j].Response.ImageGenerationStreamResponse.Index { - return accumulator.Chunks[i].Response.ImageGenerationStreamResponse.Index < accumulator.Chunks[j].Response.ImageGenerationStreamResponse.Index - } - return accumulator.Chunks[i].Response.ImageGenerationStreamResponse.ChunkIndex < accumulator.Chunks[j].Response.ImageGenerationStreamResponse.ChunkIndex - } - return false + return bi < bj }) - var streamResponses []string + streamResponses := make([]string, 0, len(accumulator.Chunks)) for i, chunk := range accumulator.Chunks { - if chunk.Response != nil { - chunkData, err := json.Marshal(chunk.Response) - if err != nil { - plugin.logger.Warn("%s Failed to marshal stream chunk %d: %v", PluginLoggerPrefix, i, err) - continue - } - streamResponses = append(streamResponses, string(chunkData)) + if chunk.Response == nil { + continue + } + chunkData, err := json.Marshal(chunk.Response) + if err != nil { + plugin.logger.Warn("Failed to marshal stream chunk %d: %v", i, err) + continue } + streamResponses = append(streamResponses, string(chunkData)) } - // STEP 3: Validate we have valid chunks to cache if len(streamResponses) == 0 { - plugin.logger.Warn("%s Stream for request %s has no valid response chunks, skipping cache storage", PluginLoggerPrefix, requestID) + plugin.logger.Warn("Stream for request %s has no valid response chunks, skipping cache storage", requestID) return nil } - // STEP 4: Build final metadata and submit to .Add() method - finalMetadata := make(map[string]interface{}) + finalMetadata := make(map[string]interface{}, len(accumulator.Metadata)+1) for k, v := range accumulator.Metadata { finalMetadata[k] = v } finalMetadata["stream_chunks"] = streamResponses - // Store complete unified entry using the final cache storage ID. if err := plugin.store.Add(ctx, plugin.config.VectorStoreNamespace, accumulator.StorageID, accumulator.Embedding, finalMetadata); err != nil { return fmt.Errorf("failed to store complete streaming cache entry: %w", err) } - plugin.logger.Debug(fmt.Sprintf("%s Successfully cached complete stream with %d ordered chunks, ID: %s", PluginLoggerPrefix, len(streamResponses), accumulator.StorageID)) + plugin.logger.Debug("Cached stream with %d chunks, storageID=%s", len(streamResponses), accumulator.StorageID) return nil } -// cleanupStreamAccumulator removes the stream accumulator for a request +// cleanupStreamAccumulator drops the accumulator for requestID. Safe to call +// when no entry exists. func (plugin *Plugin) cleanupStreamAccumulator(requestID string) { plugin.streamAccumulators.Delete(requestID) } -// cleanupOldStreamAccumulators removes stream accumulators older than 5 minutes +// streamAccumulatorMaxAge is how long a stream accumulator may live without +// reaching its final chunk before it's reaped by the periodic cleanup. +const streamAccumulatorMaxAge = 5 * time.Minute + +// streamCleanupInterval bounds the worst-case staleness of an abandoned +// accumulator to ~maxAge + interval. +const streamCleanupInterval = 1 * time.Minute + +// cleanupOldStreamAccumulators reaps accumulators whose most recent chunk is +// older than streamAccumulatorMaxAge. Called both periodically and at +// shutdown to prevent abandoned streams (client disconnect, mid-stream +// error) from accumulating in memory; reaping by LastSeenAt rather than +// first-chunk time keeps long-running streams alive while they're still +// receiving chunks. func (plugin *Plugin) cleanupOldStreamAccumulators() { - fiveMinutesAgo := time.Now().Add(-5 * time.Minute) - cleanedCount := 0 - toDelete := make([]string, 0) + cutoff := time.Now().Add(-streamAccumulatorMaxAge) + var toDelete []string plugin.streamAccumulators.Range(func(key, value interface{}) bool { requestID := key.(string) accumulator := value.(*StreamAccumulator) - - // Check if this accumulator is old (no activity for 5 minutes) accumulator.mu.Lock() - if len(accumulator.Chunks) > 0 { - firstChunkTime := accumulator.Chunks[0].Timestamp - if firstChunkTime.Before(fiveMinutesAgo) { - toDelete = append(toDelete, requestID) - plugin.logger.Debug(fmt.Sprintf("%s Cleaned up old stream accumulator for request %s", PluginLoggerPrefix, requestID)) - } + if accumulator.LastSeenAt.Before(cutoff) { + toDelete = append(toDelete, requestID) } accumulator.mu.Unlock() return true }) - // Delete outside the Range loop to avoid concurrent modification for _, requestID := range toDelete { plugin.streamAccumulators.Delete(requestID) - cleanedCount++ } - if cleanedCount > 0 { - plugin.logger.Debug(fmt.Sprintf("%s Cleaned up %d old stream accumulators", PluginLoggerPrefix, cleanedCount)) + if len(toDelete) > 0 { + plugin.logger.Debug("Reaped %d stale stream accumulators", len(toDelete)) + } +} + +// runStreamCleanupLoop runs cleanupOldStreamAccumulators on a ticker until +// stopCh is closed. Started by Init, stopped by Cleanup. +func (plugin *Plugin) runStreamCleanupLoop() { + defer plugin.cleanupWg.Done() + ticker := time.NewTicker(streamCleanupInterval) + defer ticker.Stop() + for { + select { + case <-plugin.stopCh: + return + case <-ticker.C: + plugin.cleanupOldStreamAccumulators() + } } } diff --git a/plugins/semanticcache/test_utils.go b/plugins/semanticcache/test_utils.go index e9b847c6dc..d9b4084926 100644 --- a/plugins/semanticcache/test_utils.go +++ b/plugins/semanticcache/test_utils.go @@ -4,15 +4,64 @@ import ( "context" "os" "strconv" + "sync" "testing" "time" + "github.com/google/uuid" bifrost "github.com/maximhq/bifrost/core" "github.com/maximhq/bifrost/core/schemas" "github.com/maximhq/bifrost/framework/vectorstore" mocker "github.com/maximhq/bifrost/plugins/mocker" ) +// isTransientUpstreamError reports whether a BifrostError reflects a +// transient upstream condition (timeout, rate-limit, 5xx) where skipping +// the test is reasonable. All other errors — including missing API keys, +// client-side issues, or non-HTTP failures — should fail the test rather +// than mask regressions behind a green skip. +func isTransientUpstreamError(err *schemas.BifrostError) bool { + if err == nil || err.StatusCode == nil { + return false + } + code := *err.StatusCode + return code == 408 || code == 425 || code == 429 || code >= 500 +} + +// withTestRequestID stamps a fresh BifrostContextKeyRequestID on the context. +// Unit tests that call PreLLMHook/PostLLMHook directly need this so the plugin +// can anchor per-request state. In integration tests the framework overwrites +// it, so setting it here is safe in either path. +func withTestRequestID(ctx *schemas.BifrostContext) *schemas.BifrostContext { + ctx.SetValue(schemas.BifrostContextKeyRequestID, uuid.NewString()) + return ctx +} + +// keyForTest returns a cache key namespaced by t.Name(). All tests should +// derive their cache keys via this helper so two tests running in parallel +// (t.Parallel) cannot see each other's entries through the shared Weaviate +// namespace — direct lookups encode cache_key into the storage ID and +// semantic search filters by it. +// +// Pass suffix="" for the most common single-key-per-test case. For tests +// that exercise multiple distinct cache keys (e.g. cross-key isolation +// tests), pass suffixes to disambiguate within the test. +func keyForTest(t testing.TB, suffix string) string { + t.Helper() + if suffix == "" { + return t.Name() + } + return t.Name() + "/" + suffix +} + +// newBaseTestContext returns a BifrostContext with a fresh request ID stamped. +// Replaces bare schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) +// in tests that call plugin.PreLLMHook / PostLLMHook directly — the plugin +// requires a request ID to anchor per-request state. +func newBaseTestContext() *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)) +} + // getWeaviateConfigFromEnv retrieves Weaviate configuration from environment variables func getWeaviateConfigFromEnv() vectorstore.WeaviateConfig { scheme := os.Getenv("WEAVIATE_SCHEME") @@ -366,11 +415,10 @@ type TestSetup struct { // NewTestSetup creates a new test setup with default configuration func NewTestSetup(t *testing.T) *TestSetup { return NewTestSetupWithConfig(t, &Config{ - Provider: schemas.OpenAI, - EmbeddingModel: "text-embedding-3-small", - Dimension: 1536, - Threshold: 0.8, - CleanUpOnShutdown: true, + Provider: schemas.OpenAI, + EmbeddingModel: "text-embedding-3-small", + Dimension: 1536, + Threshold: 0.8, }) } @@ -379,11 +427,41 @@ func NewTestSetupWithConfig(t *testing.T, config *Config) *TestSetup { return NewTestSetupWithVectorStore(t, config, vectorstore.VectorStoreTypeWeaviate) } +// SharedTestNamespace is the single Weaviate class all parallel tests share. +// Mirrors production: many concurrent requests hit one namespace, isolated +// by per-test cache_keys (see keyForTest). Distinct from the plugin's +// production default so test runs can't collide with a real cache. +const SharedTestNamespace = "BifrostSemanticCachePluginTest" + +var ( + sharedTestNamespaceOnce sync.Once + sharedTestNamespaceErr error +) + +// ensureSharedTestNamespace creates the shared test class exactly once per +// test process — sync.Once gates the TOCTOU race between concurrent +// Plugin.Init calls (each of which would otherwise check-then-create against +// the shared store and one would lose the race). +// +// Subsequent Plugin.Init calls in tests still invoke CreateNamespace, but the +// vectorstore implementations short-circuit when the class already exists. +func ensureSharedTestNamespace(ctx context.Context, store vectorstore.VectorStore, dim int) error { + sharedTestNamespaceOnce.Do(func() { + sharedTestNamespaceErr = store.CreateNamespace(ctx, SharedTestNamespace, dim, VectorStoreProperties) + }) + return sharedTestNamespaceErr +} + // NewTestSetupWithVectorStore creates a new test setup with custom configuration and vector store type func NewTestSetupWithVectorStore(t *testing.T, config *Config, storeType vectorstore.VectorStoreType) *TestSetup { ctx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline) logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug) + // All tests share one namespace; isolation comes from per-test cache_keys. + if config.VectorStoreNamespace == "" { + config.VectorStoreNamespace = SharedTestNamespace + } + // Get the appropriate config for the vector store type var storeConfig interface{} switch storeType { @@ -408,6 +486,15 @@ func NewTestSetupWithVectorStore(t *testing.T, config *Config, storeType vectors t.Skipf("Vector store %s not available or failed to connect: %v", storeType, err) } + // Pre-create the shared namespace exactly once across the test process so + // concurrent Plugin.Init calls don't lose the TOCTOU race inside the + // vector store driver (check-then-create). + preCreateCtx, preCreateCancel := context.WithTimeout(context.Background(), 30*time.Second) + defer preCreateCancel() + if err := ensureSharedTestNamespace(preCreateCtx, store, config.Dimension); err != nil { + t.Fatalf("Failed to create shared test namespace: %v", err) + } + plugin, err := Init(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), config, logger, store) if err != nil { t.Fatalf("Failed to initialize plugin: %v", err) @@ -534,13 +621,29 @@ func AssertNoCacheHit(t *testing.T, response *schemas.BifrostResponse) { t.Log("✅ Response correctly not served from cache (cache_debug present but CacheHit=false)") } -// WaitForCache waits for async cache operations to complete +// WaitForCache waits for async cache operations to complete. +// +// WaitForPendingOperations now drains the writersWg accurately (every +// PostLLMHook goroutine + the expired-entry async delete is tracked), so the +// stored entries are guaranteed durable when this returns. The small sleep +// below is a buffer for vector store index visibility on stores with eventual +// consistency (Weaviate is usually immediate on single-node, but cloud or +// multi-shard setups may need a tick to make the entry queryable). +// +// Override via SEMCACHE_TEST_INDEX_DELAY_MS for slower stores / CI. func WaitForCache(plugin schemas.LLMPlugin) { if p, ok := plugin.(*Plugin); ok { p.WaitForPendingOperations() } - // Small buffer for Weaviate index consistency - time.Sleep(500 * time.Millisecond) + delayMs := 100 + if v := os.Getenv("SEMCACHE_TEST_INDEX_DELAY_MS"); v != "" { + if parsed, err := strconv.Atoi(v); err == nil && parsed >= 0 { + delayMs = parsed + } + } + if delayMs > 0 { + time.Sleep(time.Duration(delayMs) * time.Millisecond) + } } // CreateEmbeddingRequest creates an embedding request for testing @@ -611,28 +714,30 @@ func CreateImageGenerationRequest(prompt string, size string, quality string) *s } // CreateContextWithCacheKey creates a context with the test cache key -func CreateContextWithCacheKey(value string) *schemas.BifrostContext { - return schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, value) +// CreateContextWithCacheKey creates a context with a per-test cache key. +// suffix may be "" for tests using only one cache key. +func CreateContextWithCacheKey(t testing.TB, suffix string) *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, keyForTest(t, suffix))) } // CreateContextWithCacheKeyAndType creates a context with cache key and cache type -func CreateContextWithCacheKeyAndType(value string, cacheType CacheType) *schemas.BifrostContext { - return schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, value).WithValue(CacheTypeKey, cacheType) +func CreateContextWithCacheKeyAndType(t testing.TB, suffix string, cacheType CacheType) *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, keyForTest(t, suffix)).WithValue(CacheTypeKey, cacheType)) } // CreateContextWithCacheKeyAndTTL creates a context with cache key and custom TTL -func CreateContextWithCacheKeyAndTTL(value string, ttl time.Duration) *schemas.BifrostContext { - return schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, value).WithValue(CacheTTLKey, ttl) +func CreateContextWithCacheKeyAndTTL(t testing.TB, suffix string, ttl time.Duration) *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContextWithValue(context.Background(), schemas.NoDeadline, CacheKey, keyForTest(t, suffix)).WithValue(CacheTTLKey, ttl)) } // CreateContextWithCacheKeyAndThreshold creates a context with cache key and custom threshold -func CreateContextWithCacheKeyAndThreshold(value string, threshold float64) *schemas.BifrostContext { - return schemas.NewBifrostContext(context.Background(), schemas.NoDeadline).WithValue(CacheKey, value).WithValue(CacheThresholdKey, threshold) +func CreateContextWithCacheKeyAndThreshold(t testing.TB, suffix string, threshold float64) *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline).WithValue(CacheKey, keyForTest(t, suffix)).WithValue(CacheThresholdKey, threshold)) } // CreateContextWithCacheKeyAndNoStore creates a context with cache key and no-store flag -func CreateContextWithCacheKeyAndNoStore(value string, noStore bool) *schemas.BifrostContext { - return schemas.NewBifrostContext(context.Background(), schemas.NoDeadline).WithValue(CacheKey, value).WithValue(CacheNoStoreKey, noStore) +func CreateContextWithCacheKeyAndNoStore(t testing.TB, suffix string, noStore bool) *schemas.BifrostContext { + return withTestRequestID(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline).WithValue(CacheKey, keyForTest(t, suffix)).WithValue(CacheNoStoreKey, noStore)) } // CreateTestSetupWithConversationThreshold creates a test setup with custom conversation history threshold @@ -641,7 +746,6 @@ func CreateTestSetupWithConversationThreshold(t *testing.T, threshold int) *Test Provider: schemas.OpenAI, EmbeddingModel: "text-embedding-3-small", Dimension: 1536, - CleanUpOnShutdown: true, Threshold: 0.8, ConversationHistoryThreshold: threshold, } @@ -655,7 +759,6 @@ func CreateTestSetupWithExcludeSystemPrompt(t *testing.T, excludeSystem bool) *T Provider: schemas.OpenAI, EmbeddingModel: "text-embedding-3-small", Dimension: 1536, - CleanUpOnShutdown: true, Threshold: 0.8, ExcludeSystemPrompt: &excludeSystem, } @@ -669,7 +772,6 @@ func CreateTestSetupWithThresholdAndExcludeSystem(t *testing.T, threshold int, e Provider: schemas.OpenAI, EmbeddingModel: "text-embedding-3-small", Dimension: 1536, - CleanUpOnShutdown: true, Threshold: 0.8, ConversationHistoryThreshold: threshold, ExcludeSystemPrompt: &excludeSystem, diff --git a/plugins/semanticcache/utils.go b/plugins/semanticcache/utils.go index 125ae4670e..29f15fc825 100644 --- a/plugins/semanticcache/utils.go +++ b/plugins/semanticcache/utils.go @@ -5,6 +5,7 @@ import ( "encoding/json" "fmt" "maps" + "sort" "strings" "time" @@ -14,19 +15,119 @@ import ( "github.com/maximhq/bifrost/core/schemas" ) -// directCacheNamespace is a fixed UUID v5 namespace used for deterministic direct cache ID generation. -// Using a fixed namespace ensures IDs are reproducible across restarts and store types. +// directCacheNamespace is a fixed namespace UUID for generating deterministic +// UUID v5 cache IDs via uuid.NewSHA1, used by generateDirectCacheID. The +// bytes are arbitrary — they only need to be stable across restarts so the +// same (cache_key, request_hash, params_hash) tuple maps to the same ID. var directCacheNamespace = uuid.MustParse("b1f3c2d4-e5a6-7890-abcd-ef1234567890") +// isSemanticCacheSupportedRequestType reports whether semantic cache supports +// this request type for cache lookup and storage. Unsupported types are skipped. +// +// IMPORTANT: this list must stay in sync with the switch in buildRequestMetadataForCaching. +// When adding a new case there, add it here too. +func isSemanticCacheSupportedRequestType(requestType schemas.RequestType) bool { + switch requestType { + case schemas.TextCompletionRequest, + schemas.TextCompletionStreamRequest, + schemas.ChatCompletionRequest, + schemas.ChatCompletionStreamRequest, + schemas.ResponsesRequest, + schemas.ResponsesStreamRequest, + schemas.WebSocketResponsesRequest, + schemas.SpeechRequest, + schemas.SpeechStreamRequest, + schemas.EmbeddingRequest, + schemas.TranscriptionRequest, + schemas.TranscriptionStreamRequest, + schemas.ImageGenerationRequest, + schemas.ImageGenerationStreamRequest: + return true + default: + return false + } +} + +// hashSortedSet returns a deterministic hex hash for an order-insensitive +// list of items. Some request fields are semantically sets but JSON-encoded +// as lists (most notably Tools, where MCP's randomized map iteration would +// otherwise perturb the request hash). The caller supplies a key extractor +// because shapes differ across fields (e.g. ChatTool.Function.Name vs +// ResponsesTool.Name). Use this for set-shaped fields large enough to be +// worth compressing; for short []string sets, prefer sortedStringSet which +// keeps the metadata human-debuggable. +func hashSortedSet[T any](items []T, key func(T) string) (string, error) { + if len(items) == 0 { + return "", nil + } + sorted := make([]T, len(items)) + copy(sorted, items) + sort.SliceStable(sorted, func(i, j int) bool { + return key(sorted[i]) < key(sorted[j]) + }) + payload := make([]any, len(sorted)) + for i, t := range sorted { + payload[i] = t + } + itemsJSON, err := schemas.MarshalDeeplySorted(payload) + if err != nil { + return "", err + } + return fmt.Sprintf("%x", xxhash.Sum64(itemsJSON)), nil +} + +// hashMap returns a deterministic xxhash hex digest of the map. Uses +// MarshalDeeplySorted because plain json.Marshal doesn't guarantee key +// ordering on Go maps. +func hashMap(m map[string]interface{}) (string, error) { + jsonData, err := schemas.MarshalDeeplySorted(m) + if err != nil { + return "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) + } + return fmt.Sprintf("%x", xxhash.Sum64(jsonData)), nil +} + +// sortedStringSet returns a sorted copy of a string slice that is semantically +// a set (e.g. modalities, stop sequences, include flags). Sorting in place +// would mutate the caller's parameters, so a copy is returned. +func sortedStringSet(values []string) []string { + if len(values) == 0 { + return nil + } + sorted := make([]string, len(values)) + copy(sorted, values) + sort.Strings(sorted) + return sorted +} + +// putIfSet writes m[key] = *v when v is non-nil. Used by extract*ParametersToMetadata +// to collapse the if-nil-set boilerplate that dominates those functions. +func putIfSet[T any](m map[string]any, key string, v *T) { + if v != nil { + m[key] = *v + } +} + +// putSortedSetIfNonEmpty writes m[key] = sortedStringSet(values) when values +// has any entries — otherwise leaves the key absent so the resulting metadata +// hash treats "unset" and "empty" identically. +func putSortedSetIfNonEmpty(m map[string]any, key string, values []string) { + if len(values) > 0 { + m[key] = sortedStringSet(values) + } +} + // normalizeText applies consistent normalization to text inputs for better cache hit rates. // It converts text to lowercase and trims whitespace to reduce cache misses due to minor variations. func normalizeText(text string) string { return strings.ToLower(strings.TrimSpace(text)) } -// Semantic cache keeps vector-store/search payloads as float32 even though -// normalized embedding API responses now preserve provider precision as float64. -func toFloat32Embedding(values []float64) []float32 { +// float64ToFloat32Embedding converts a []float64 to a []float32. The semantic cache +// keeps vector payloads as float32 even though the embedding APIs now +// preserve full float64 precision — the cosine similarity used at query +// time is well within float32 range. +func float64ToFloat32Embedding(values []float64) []float32 { if len(values) == 0 { return nil } @@ -39,355 +140,264 @@ func toFloat32Embedding(values []float64) []float32 { return embedding } -func flattenToFloat32Embedding(values [][]float64) []float32 { - total := 0 - for _, arr := range values { - total += len(arr) - } - if total == 0 { +// int8ToFloat32Embedding promotes a quantized int8 embedding (used for +// binary/quantized formats by some providers) to float32 so the cache can +// store and compare it uniformly against float32 entries. +func int8ToFloat32Embedding(values []int8) []float32 { + if len(values) == 0 { return nil } - - embedding := make([]float32, 0, total) - for _, arr := range values { - embedding = append(embedding, toFloat32Embedding(arr)...) + embedding := make([]float32, len(values)) + for i, value := range values { + embedding[i] = float32(value) } - return embedding } -// generateEmbedding generates an embedding for the given text using the configured provider. -func (plugin *Plugin) generateEmbedding(ctx *schemas.BifrostContext, text string) ([]float32, int, error) { - // Create embedding request - embeddingReq := &schemas.BifrostEmbeddingRequest{ - Provider: plugin.config.Provider, - Model: plugin.config.EmbeddingModel, - Input: &schemas.EmbeddingInput{ - Text: &text, - }, - } - - // Create a new context from incoming context. Parent ctx will be used for cancellation. - embeddingCtx := schemas.NewBifrostContext(ctx, schemas.NoDeadline) - defer embeddingCtx.ReleasePluginScope() - - embeddingCtx.SetValue(schemas.BifrostContextKeySkipPluginPipeline, true) - - if plugin.embeddingRequestExecutor == nil { - return nil, 0, fmt.Errorf("embedding request executor is not configured") - } - response, err := plugin.embeddingRequestExecutor(embeddingCtx, embeddingReq) - if err != nil { - return nil, 0, fmt.Errorf("failed to generate embedding: %v", err) - } - - // Extract the first embedding from response - if len(response.Data) == 0 { - return nil, 0, fmt.Errorf("no embeddings returned from provider") - } - - // Get the embedding from the first data item - embedding := response.Data[0].Embedding - inputTokens := 0 - if response.Usage != nil { - inputTokens = response.Usage.TotalTokens +// int32ToFloat32Embedding promotes a uint8/ubinary-style int32 embedding to +// float32 for the same reason as int8ToFloat32Embedding. +func int32ToFloat32Embedding(values []int32) []float32 { + if len(values) == 0 { + return nil } - - if embedding.EmbeddingStr != nil { - // decode embedding.EmbeddingStr to []float32 - var vals []float32 - if err := json.Unmarshal([]byte(*embedding.EmbeddingStr), &vals); err != nil { - return nil, 0, fmt.Errorf("failed to parse string embedding: %w", err) - } - return vals, inputTokens, nil - } else if embedding.EmbeddingArray != nil { - return toFloat32Embedding(embedding.EmbeddingArray), inputTokens, nil - } else if len(embedding.Embedding2DArray) > 0 { - return flattenToFloat32Embedding(embedding.Embedding2DArray), inputTokens, nil + embedding := make([]float32, len(values)) + for i, value := range values { + embedding[i] = float32(value) } - - return nil, 0, fmt.Errorf("embedding data is not in expected format") + return embedding } -// generateRequestHash creates an xxhash of the request for semantic cache key generation. -// It normalizes the request by including all relevant fields that affect the response: -// - Input (chat completion, text completion, etc.) -// - Parameters (temperature, max_tokens, tools, etc.) -// - Provider (if CacheByProvider is true) -// - Model (if CacheByModel is true) -// -// Note: Fallbacks are excluded as they only affect error handling, not the actual response. -// -// Parameters: -// - req: The Bifrost request to hash for semantic cache key generation -// -// Returns: -// - string: Hexadecimal representation of the xxhash -// - error: Any error that occurred during request normalization or hashing -func (plugin *Plugin) generateRequestHash(req *schemas.BifrostRequest) (string, error) { - // Build canonical metadata first to ensure deterministic hashing - metadata, err := plugin.buildRequestMetadataForCaching(req) - if err != nil { - return "", fmt.Errorf("failed to build metadata for request hash: %w", err) +// flattenToFloat32Embedding concatenates a 2D embedding (one inner slice per +// input chunk) into a single flat []float32. Used when the provider returns +// per-chunk embeddings that we want to store as a single vector. +func flattenToFloat32Embedding(values [][]float64) []float32 { + total := 0 + for _, arr := range values { + total += len(arr) } - - // Create a hash input structure that includes both input and canonical parameters - hashInput := struct { - Input interface{} `json:"input"` - Params map[string]interface{} `json:"params,omitempty"` - }{ - Input: plugin.getNormalizedInputForCaching(req), - Params: metadata, + if total == 0 { + return nil } - // Marshal to JSON with deeply sorted keys for deterministic hashing - // MarshalDeeplySorted handles OrderedMap and nested map[string]interface{} correctly - jsonData, err := schemas.MarshalDeeplySorted(hashInput) - if err != nil { - return "", fmt.Errorf("failed to marshal request for hashing: %w", err) + embedding := make([]float32, 0, total) + for _, arr := range values { + embedding = append(embedding, float64ToFloat32Embedding(arr)...) } - // Generate hash based on configured algorithm - hash := xxhash.Sum64(jsonData) - return fmt.Sprintf("%x", hash), nil + return embedding } -func (plugin *Plugin) buildRequestMetadataForCaching(req *schemas.BifrostRequest) (map[string]interface{}, error) { +// buildRequestMetadataForCaching extracts the canonical, hashable parameter +// set for the request: anything that should change the cache key when it +// changes. The returned map is fed to hashMap to derive params_hash, which +// then anchors both direct and semantic lookups. +func (plugin *Plugin) buildRequestMetadataForCaching(state *cacheState, req *schemas.BifrostRequest) (map[string]interface{}, error) { metadata := map[string]interface{}{ "stream": bifrost.IsStreamRequestType(req.RequestType), } + if attachments := plugin.extractAttachmentsForCaching(state, req); len(attachments) > 0 { + metadata["attachments"] = attachments + } + switch req.RequestType { case schemas.TextCompletionRequest, schemas.TextCompletionStreamRequest: if req.TextCompletionRequest == nil { - return nil, fmt.Errorf("text completion payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("text completion payload is nil") } if req.TextCompletionRequest != nil && req.TextCompletionRequest.Params != nil { plugin.extractTextCompletionParametersToMetadata(req.TextCompletionRequest.Params, metadata) } case schemas.ChatCompletionRequest, schemas.ChatCompletionStreamRequest: if req.ChatRequest == nil { - return nil, fmt.Errorf("chat payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("chat payload is nil") } if req.ChatRequest != nil && req.ChatRequest.Params != nil { plugin.extractChatParametersToMetadata(req.ChatRequest.Params, metadata) } case schemas.ResponsesRequest, schemas.ResponsesStreamRequest, schemas.WebSocketResponsesRequest: if req.ResponsesRequest == nil { - return nil, fmt.Errorf("responses payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("responses payload is nil") } if req.ResponsesRequest != nil && req.ResponsesRequest.Params != nil { plugin.extractResponsesParametersToMetadata(req.ResponsesRequest.Params, metadata) } case schemas.SpeechRequest, schemas.SpeechStreamRequest: if req.SpeechRequest == nil { - return nil, fmt.Errorf("speech payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("speech payload is nil") } if req.SpeechRequest != nil && req.SpeechRequest.Params != nil { plugin.extractSpeechParametersToMetadata(req.SpeechRequest.Params, metadata) } case schemas.EmbeddingRequest: if req.EmbeddingRequest == nil { - return nil, fmt.Errorf("embedding payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("embedding payload is nil") } if req.EmbeddingRequest != nil && req.EmbeddingRequest.Params != nil { plugin.extractEmbeddingParametersToMetadata(req.EmbeddingRequest.Params, metadata) } case schemas.TranscriptionRequest, schemas.TranscriptionStreamRequest: if req.TranscriptionRequest == nil { - return nil, fmt.Errorf("transcription payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("transcription payload is nil") } if req.TranscriptionRequest != nil && req.TranscriptionRequest.Params != nil { plugin.extractTranscriptionParametersToMetadata(req.TranscriptionRequest.Params, metadata) } case schemas.ImageGenerationRequest, schemas.ImageGenerationStreamRequest: if req.ImageGenerationRequest == nil { - return nil, fmt.Errorf("image generation payload is nil (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("image generation payload is nil") } if req.ImageGenerationRequest != nil && req.ImageGenerationRequest.Params != nil { plugin.extractImageGenerationParametersToMetadata(req.ImageGenerationRequest.Params, metadata) } default: - return nil, fmt.Errorf("unsupported request type for semantic caching (%s)", describeRequestShape(req)) + return nil, fmt.Errorf("unsupported request type for semantic caching") } return metadata, nil } -// isSemanticCacheSupportedRequestType reports whether semantic cache supports -// this request type for cache lookup and storage. Unsupported types are skipped. -// -// IMPORTANT: this list must stay in sync with the switch in buildRequestMetadataForCaching. -// When adding a new case there, add it here too. -func isSemanticCacheSupportedRequestType(requestType schemas.RequestType) bool { - switch requestType { - case schemas.TextCompletionRequest, - schemas.TextCompletionStreamRequest, - schemas.ChatCompletionRequest, - schemas.ChatCompletionStreamRequest, - schemas.ResponsesRequest, - schemas.ResponsesStreamRequest, - schemas.WebSocketResponsesRequest, - schemas.SpeechRequest, - schemas.SpeechStreamRequest, - schemas.EmbeddingRequest, - schemas.TranscriptionRequest, - schemas.TranscriptionStreamRequest, - schemas.ImageGenerationRequest, - schemas.ImageGenerationStreamRequest: - return true - default: - return false +// extractAttachmentsForCaching collects image/file URLs referenced by the +// request input in document order. Attachments are part of the cache key — +// two messages with identical text but different images must not collide. +// Honors ExcludeSystemPrompt via getInputForCaching. Returns nil for +// request types without attachment-bearing content blocks. +func (plugin *Plugin) extractAttachmentsForCaching(state *cacheState, req *schemas.BifrostRequest) []string { + switch req.RequestType { + case schemas.ChatCompletionRequest, schemas.ChatCompletionStreamRequest: + messages, ok := plugin.getInputForCaching(state, req).([]schemas.ChatMessage) + if !ok { + return nil + } + var attachments []string + for _, msg := range messages { + if msg.Content == nil || msg.Content.ContentBlocks == nil { + continue + } + for _, block := range msg.Content.ContentBlocks { + if block.ImageURLStruct != nil && block.ImageURLStruct.URL != "" { + attachments = append(attachments, block.ImageURLStruct.URL) + } + } + } + return attachments + case schemas.ResponsesRequest, schemas.ResponsesStreamRequest, schemas.WebSocketResponsesRequest: + messages, ok := plugin.getInputForCaching(state, req).([]schemas.ResponsesMessage) + if !ok { + return nil + } + var attachments []string + for _, msg := range messages { + if msg.Content == nil || msg.Content.ContentBlocks == nil { + continue + } + for _, block := range msg.Content.ContentBlocks { + if block.ResponsesInputMessageContentBlockImage != nil && block.ResponsesInputMessageContentBlockImage.ImageURL != nil { + attachments = append(attachments, *block.ResponsesInputMessageContentBlockImage.ImageURL) + } + if block.ResponsesInputMessageContentBlockFile != nil && block.ResponsesInputMessageContentBlockFile.FileURL != nil { + attachments = append(attachments, *block.ResponsesInputMessageContentBlockFile.FileURL) + } + } + } + return attachments } + return nil } -func (plugin *Plugin) computeRequestParamsHash(req *schemas.BifrostRequest) (string, error) { - metadata, err := plugin.buildRequestMetadataForCaching(req) - if err != nil { - return "", err - } - - hash, err := getMetadataHash(metadata) - if err != nil { - return "", fmt.Errorf("failed to compute params hash (%s): %w", describeRequestShape(req), err) +// extractChatMessageContent flattens a ChatMessage's content (string or +// blocks) into a single space-joined string. Returns "" when the message +// carries no text (e.g. assistant tool-call messages with nil content). +func extractChatMessageContent(msg schemas.ChatMessage) string { + if msg.Content == nil { + return "" + } + if msg.Content.ContentStr != nil { + return *msg.Content.ContentStr + } + if msg.Content.ContentBlocks != nil { + var parts []string + for _, block := range msg.Content.ContentBlocks { + if block.Text != nil { + parts = append(parts, *block.Text) + } + } + return strings.Join(parts, " ") } - return hash, nil + return "" } -// describeRequestShape summarizes the request families relevant to semantic -// cache lookups and diagnostics. It is intentionally scoped to request types -// that can participate in semantic cache behavior. -func describeRequestShape(req *schemas.BifrostRequest) string { - if req == nil { - return "request=nil" +// extractResponsesMessageContent flattens a ResponsesMessage's content into a +// single string, mirroring extractChatMessageContent but for the Responses API. +func extractResponsesMessageContent(msg schemas.ResponsesMessage) string { + if msg.Content == nil { + return "" } - - return fmt.Sprintf( - "request_type=%s text=%t chat=%t responses=%t embedding=%t speech=%t transcription=%t image=%t", - req.RequestType, - req.TextCompletionRequest != nil, - req.ChatRequest != nil, - req.ResponsesRequest != nil, - req.EmbeddingRequest != nil, - req.SpeechRequest != nil, - req.TranscriptionRequest != nil, - req.ImageGenerationRequest != nil, - ) + if msg.Content.ContentStr != nil { + return *msg.Content.ContentStr + } + if msg.Content.ContentBlocks != nil { + var parts []string + for _, block := range msg.Content.ContentBlocks { + if block.Text != nil { + parts = append(parts, *block.Text) + } + } + return strings.Join(parts, " ") + } + return "" } -// extractTextForEmbedding extracts meaningful text from different input types for embedding generation. -// Returns the text to embed and metadata for storage. +// extractTextForEmbedding flattens the request input into a single string +// suitable for embedding generation. PreLLMHook short-circuits embedding and +// transcription requests before this is called (their inputs aren't +// themselves embeddable), so this function only handles request types that +// reach performSemanticSearch. // // Text serialization format (for cache consistency): // - Chat API: "role: content" // - Responses API: "role: msgType: content" (when msgType is present), "role: content" (when msgType is empty) -// -// Note: Format updated to conditionally include msgType to avoid double colons and maintain consistency. -func (plugin *Plugin) extractTextForEmbedding(req *schemas.BifrostRequest) (string, string, error) { - metadata, err := plugin.buildRequestMetadataForCaching(req) - if err != nil { - return "", "", err - } - attachments := []string{} - +func (plugin *Plugin) extractTextForEmbedding(state *cacheState, req *schemas.BifrostRequest) (string, error) { switch { case req.TextCompletionRequest != nil: - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - - var textContent string if req.TextCompletionRequest.Input.PromptStr != nil { - textContent = normalizeText(*req.TextCompletionRequest.Input.PromptStr) - } else if len(req.TextCompletionRequest.Input.PromptArray) > 0 { - textContent = normalizeText(strings.Join(req.TextCompletionRequest.Input.PromptArray, " ")) + return normalizeText(*req.TextCompletionRequest.Input.PromptStr), nil } - return textContent, metadataHash, nil + if len(req.TextCompletionRequest.Input.PromptArray) > 0 { + return normalizeText(strings.Join(req.TextCompletionRequest.Input.PromptArray, " ")), nil + } + return "", fmt.Errorf("no prompt found in text completion request") case req.ChatRequest != nil: - reqInput, ok := plugin.getInputForCaching(req).([]schemas.ChatMessage) + reqInput, ok := plugin.getInputForCaching(state, req).([]schemas.ChatMessage) if !ok { - return "", "", fmt.Errorf("failed to cast request input to chat messages") + return "", fmt.Errorf("failed to cast request input to chat messages") } - - // Serialize chat messages for embedding var textParts []string for _, msg := range reqInput { - // Extract content as string - // Content can be nil for messages like assistant tool-call messages - var content string - if msg.Content != nil { - if msg.Content.ContentStr != nil { - content = *msg.Content.ContentStr - } else if msg.Content.ContentBlocks != nil { - // For content blocks, extract text parts - var blockTexts []string - for _, block := range msg.Content.ContentBlocks { - if block.Text != nil { - blockTexts = append(blockTexts, *block.Text) - } - if block.ImageURLStruct != nil && block.ImageURLStruct.URL != "" { - attachments = append(attachments, block.ImageURLStruct.URL) - } - } - content = strings.Join(blockTexts, " ") - } - } - - if content != "" { - textParts = append(textParts, fmt.Sprintf("%s: %s", msg.Role, normalizeText(content))) + content := extractChatMessageContent(msg) + if content == "" { + continue } + textParts = append(textParts, fmt.Sprintf("%s: %s", msg.Role, normalizeText(content))) } - if len(textParts) == 0 { - return "", "", fmt.Errorf("no text content found in chat messages") - } - - if len(attachments) > 0 { - metadata["attachments"] = attachments + return "", fmt.Errorf("no text content found in chat messages") } - - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - - return strings.Join(textParts, "\n"), metadataHash, nil + return strings.Join(textParts, "\n"), nil case req.ResponsesRequest != nil: - reqInput, ok := plugin.getInputForCaching(req).([]schemas.ResponsesMessage) + reqInput, ok := plugin.getInputForCaching(state, req).([]schemas.ResponsesMessage) if !ok { - return "", "", fmt.Errorf("failed to cast request input to responses messages") + return "", fmt.Errorf("failed to cast request input to responses messages") } - - // Serialize chat messages for embedding var textParts []string for _, msg := range reqInput { - // Extract content as string - // Content can be nil for messages like assistant tool-call messages - var content string - if msg.Content != nil { - if msg.Content.ContentStr != nil { - content = normalizeText(*msg.Content.ContentStr) - } else if msg.Content.ContentBlocks != nil { - // For content blocks, extract text parts - var blockTexts []string - for _, block := range msg.Content.ContentBlocks { - if block.Text != nil { - blockTexts = append(blockTexts, normalizeText(*block.Text)) - } - if block.ResponsesInputMessageContentBlockImage != nil && block.ResponsesInputMessageContentBlockImage.ImageURL != nil { - attachments = append(attachments, *block.ResponsesInputMessageContentBlockImage.ImageURL) - } - if block.ResponsesInputMessageContentBlockFile != nil && block.ResponsesInputMessageContentBlockFile.FileURL != nil { - attachments = append(attachments, *block.ResponsesInputMessageContentBlockFile.FileURL) - } - } - content = strings.Join(blockTexts, " ") - } + content := extractResponsesMessageContent(msg) + if content == "" { + continue } - + content = normalizeText(content) role := "" msgType := "" if msg.Role != nil { @@ -396,396 +406,291 @@ func (plugin *Plugin) extractTextForEmbedding(req *schemas.BifrostRequest) (stri if msg.Type != nil { msgType = string(*msg.Type) } - - if content != "" { - if msgType != "" { - textParts = append(textParts, fmt.Sprintf("%s: %s: %s", role, msgType, content)) - } else { - textParts = append(textParts, fmt.Sprintf("%s: %s", role, content)) - } + if msgType != "" { + textParts = append(textParts, fmt.Sprintf("%s: %s: %s", role, msgType, content)) + } else { + textParts = append(textParts, fmt.Sprintf("%s: %s", role, content)) } } - if len(textParts) == 0 { - return "", "", fmt.Errorf("no text content found in chat messages") - } - - if len(attachments) > 0 { - metadata["attachments"] = attachments + return "", fmt.Errorf("no text content found in responses messages") } - - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - - return strings.Join(textParts, "\n"), metadataHash, nil + return strings.Join(textParts, "\n"), nil case req.SpeechRequest != nil: - if req.SpeechRequest.Input.Input != "" { - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - - return req.SpeechRequest.Input.Input, metadataHash, nil - } - return "", "", fmt.Errorf("no input text found in speech request") - - case req.EmbeddingRequest != nil: - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - - texts := req.EmbeddingRequest.Input.Texts - - if len(texts) == 0 && req.EmbeddingRequest.Input.Text != nil { - texts = []string{*req.EmbeddingRequest.Input.Text} - } - - var text string - for _, t := range texts { - text += t + " " + if req.SpeechRequest.Input.Input == "" { + return "", fmt.Errorf("no input text found in speech request") } - - return strings.TrimSpace(text), metadataHash, nil - - case req.TranscriptionRequest != nil: - // Skip semantic caching for transcription requests - return "", "", fmt.Errorf("transcription requests are not supported for semantic caching") + return normalizeText(req.SpeechRequest.Input.Input), nil case req.ImageGenerationRequest != nil: if req.ImageGenerationRequest.Input == nil || req.ImageGenerationRequest.Input.Prompt == "" { - return "", "", fmt.Errorf("no prompt found in image generation request") + return "", fmt.Errorf("no prompt found in image generation request") } - metadataHash, err := getMetadataHash(metadata) - if err != nil { - return "", "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) - } - return normalizeText(req.ImageGenerationRequest.Input.Prompt), metadataHash, nil + return normalizeText(req.ImageGenerationRequest.Input.Prompt), nil default: - return "", "", fmt.Errorf("unsupported input type for semantic caching (%s)", describeRequestShape(req)) - } -} - -func getMetadataHash(metadata map[string]interface{}) (string, error) { - // Use MarshalDeeplySorted for deterministic hashing - plain json.Marshal - // doesn't guarantee key ordering since Go maps have random iteration order - metadataJSON, err := schemas.MarshalDeeplySorted(metadata) - if err != nil { - return "", fmt.Errorf("failed to marshal metadata for metadata hash: %w", err) + return "", fmt.Errorf("unsupported input type for semantic caching") } - return fmt.Sprintf("%x", xxhash.Sum64(metadataJSON)), nil -} - -func (plugin *Plugin) generateDirectCacheID(provider schemas.ModelProvider, model string, cacheKey string, requestHash string, paramsHash string) string { - idInput := struct { - CacheKey string `json:"cache_key"` - RequestHash string `json:"request_hash"` - ParamsHash string `json:"params_hash"` - Provider string `json:"provider,omitempty"` - Model string `json:"model,omitempty"` - }{ - CacheKey: cacheKey, - RequestHash: requestHash, - ParamsHash: paramsHash, - } - - if plugin.config.CacheByProvider != nil && *plugin.config.CacheByProvider { - idInput.Provider = string(provider) - } - if plugin.config.CacheByModel != nil && *plugin.config.CacheByModel { - idInput.Model = model - } - - idJSON, err := schemas.MarshalDeeplySorted(idInput) - if err != nil { - // Fallback: derive deterministic UUID from concatenated inputs - fallbackStr := cacheKey + requestHash + paramsHash - if plugin.config.CacheByProvider != nil && *plugin.config.CacheByProvider { - fallbackStr += string(provider) - } - if plugin.config.CacheByModel != nil && *plugin.config.CacheByModel { - fallbackStr += model - } - return uuid.NewSHA1(directCacheNamespace, []byte(fallbackStr)).String() - } - - return uuid.NewSHA1(directCacheNamespace, idJSON).String() } -// buildUnifiedMetadata constructs the unified metadata structure for VectorEntry -func (plugin *Plugin) buildUnifiedMetadata(provider schemas.ModelProvider, model string, paramsHash string, requestHash string, cacheKey string, ttl time.Duration) map[string]interface{} { +// buildUnifiedMetadata builds the property map written alongside the cache +// entry: the columns the vector store indexes for filtering (cache_key, +// provider, model, params_hash, expires_at) plus the from_bifrost marker +// used by Cleanup and ClearCacheForKey to scope deletes. Caller still adds +// the response payload (response or stream_chunks) before Add. +func (plugin *Plugin) buildUnifiedMetadata(provider schemas.ModelProvider, model string, paramsHash string, cacheKey string, ttl time.Duration) map[string]interface{} { unifiedMetadata := make(map[string]interface{}) - - // Top-level fields (outside params) unifiedMetadata["provider"] = string(provider) unifiedMetadata["model"] = model - unifiedMetadata["request_hash"] = requestHash unifiedMetadata["cache_key"] = cacheKey unifiedMetadata["from_bifrost_semantic_cache_plugin"] = true - - // Calculate expiration timestamp (current time + TTL) - expiresAt := time.Now().Add(ttl).Unix() - unifiedMetadata["expires_at"] = expiresAt - - // Individual param fields will be stored as params_* by the vectorstore - // We pass the params map to the vectorstore, and it handles the individual field storage + unifiedMetadata["expires_at"] = time.Now().Add(ttl).Unix() if paramsHash != "" { unifiedMetadata["params_hash"] = paramsHash } - return unifiedMetadata } -// addSingleResponse stores a single (non-streaming) response in unified VectorEntry format. -// responseData is the pre-marshaled JSON of the response; the caller must marshal -// synchronously before spawning the cache goroutine so the marshal cannot race -// with downstream mutation of the response struct. -func (plugin *Plugin) addSingleResponse(ctx context.Context, responseID string, responseData []byte, embedding []float32, metadata map[string]interface{}, ttl time.Duration) error { - // Add response field to metadata +// addNonStreamingResponse marshals the response and writes it as a single +// cache entry. The metadata map is mutated (response + stream_chunks added) +// — safe because the calling goroutine owns it. The ttl parameter is +// retained for symmetry with addStreamingResponse; the actual expiry is +// already encoded in metadata["expires_at"] by buildUnifiedMetadata. +func (plugin *Plugin) addNonStreamingResponse(ctx context.Context, responseID string, res *schemas.BifrostResponse, embedding []float32, metadata map[string]interface{}, ttl time.Duration) error { + responseData, err := json.Marshal(res) + if err != nil { + return fmt.Errorf("failed to marshal response: %w", err) + } metadata["response"] = string(responseData) metadata["stream_chunks"] = []string{} - // Store unified entry using new VectorStore interface if err := plugin.store.Add(ctx, plugin.config.VectorStoreNamespace, responseID, embedding, metadata); err != nil { return fmt.Errorf("failed to store unified cache entry: %w", err) } - plugin.logger.Debug(fmt.Sprintf("%s Successfully cached single response with ID: %s", PluginLoggerPrefix, responseID)) + plugin.logger.Debug("Successfully cached single response with ID: %s", responseID) return nil } -// addStreamingResponse handles streaming response storage by accumulating chunks -func (plugin *Plugin) addStreamingResponse(ctx context.Context, requestID string, storageID string, res *schemas.BifrostResponse, bifrostErr *schemas.BifrostError, embedding []float32, metadata map[string]interface{}, ttl time.Duration, isFinalChunk bool) error { - // Create accumulator if it doesn't exist +// addStreamingResponse appends one chunk to the per-request accumulator and, +// when the final chunk arrives, flushes the accumulated stream to the cache. +// Errors never reach this function: PostLLMHook returns early on bifrostErr +// (errors are always delivered as the final chunk), so an errored stream +// simply leaves its accumulator behind for the periodic reaper. +func (plugin *Plugin) addStreamingResponse(ctx context.Context, requestID string, storageID string, res *schemas.BifrostResponse, embedding []float32, metadata map[string]interface{}, ttl time.Duration, isFinalChunk bool) error { accumulator := plugin.getOrCreateStreamAccumulator(requestID, storageID, embedding, metadata, ttl) - // Create chunk from current response chunk := &StreamChunk{ Timestamp: time.Now(), Response: res, } - - // Check for finish reason or set error finish reason - if bifrostErr != nil { - // Error case - mark as final chunk with error - chunk.FinishReason = bifrost.Ptr("error") - } else if res != nil && res.ChatResponse != nil && len(res.ChatResponse.Choices) > 0 { - choice := res.ChatResponse.Choices[0] - if choice.ChatStreamResponseChoice != nil { - chunk.FinishReason = choice.FinishReason - } + if err := plugin.addStreamChunk(requestID, chunk); err != nil { + return fmt.Errorf("failed to add stream chunk: %w", err) } - // Add chunk to accumulator synchronously to maintain order - if err := plugin.addStreamChunk(requestID, chunk, isFinalChunk); err != nil { - return fmt.Errorf("failed to add stream chunk: %w", err) + if !isFinalChunk { + return nil } - // Check if this is the final chunk and gate final processing to ensure single invocation + // Gate final processing so it runs exactly once even if multiple chunks + // race here (shouldn't happen in practice but cheap insurance). accumulator.mu.Lock() - // Check for completion: either FinishReason is present, there's an error, or token usage exists alreadyComplete := accumulator.IsComplete - - // Track if any chunk has an error - if bifrostErr != nil { - accumulator.HasError = true - } - - if isFinalChunk && !alreadyComplete { + if !alreadyComplete { accumulator.IsComplete = true - accumulator.FinalTimestamp = chunk.Timestamp } accumulator.mu.Unlock() - // If this is the final chunk and hasn't been processed yet, process accumulated chunks - // Note: processAccumulatedStream will check for errors and skip caching if any errors occurred - if isFinalChunk && !alreadyComplete { - if processErr := plugin.processAccumulatedStream(ctx, requestID); processErr != nil { - plugin.logger.Warn("%s Failed to process accumulated stream for request %s: %v", PluginLoggerPrefix, requestID, processErr) - } + if alreadyComplete { + return nil + } + if err := plugin.processAccumulatedStream(ctx, requestID); err != nil { + plugin.logger.Warn("Failed to process accumulated stream for request %s: %v", requestID, err) } - return nil } -// getInputForCaching extracts request input for hashing/embedding without normalization. -// For Chat/Responses requests, it filters out system messages if configured but returns shallow copies. -// For other request types, it returns direct references to the input. -func (plugin *Plugin) getInputForCaching(req *schemas.BifrostRequest) interface{} { - switch req.RequestType { - case schemas.TextCompletionRequest, schemas.TextCompletionStreamRequest: - return req.TextCompletionRequest.Input - case schemas.ChatCompletionRequest, schemas.ChatCompletionStreamRequest: - originalMessages := req.ChatRequest.Input - filteredMessages := make([]schemas.ChatMessage, 0, len(originalMessages)) - for _, msg := range originalMessages { - // Skip system messages if configured to exclude them - if plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt && msg.Role == schemas.ChatMessageRoleSystem { +// parseStreamChunks parses stream_chunks data from the various shapes +// different vector store drivers hand back (Weaviate's JSON-decoded +// []interface{}, typed []string, or Redis's JSON-encoded string) into a +// flat []string of per-chunk JSON payloads. +// +// Non-string elements in the []interface{} case are dropped with a warning +// rather than failing the whole replay — partial cache hits are better than +// no hit at all. +func (plugin *Plugin) parseStreamChunks(streamData interface{}) ([]string, error) { + if streamData == nil { + return nil, fmt.Errorf("stream data is nil") + } + + switch v := streamData.(type) { + case []string: + return v, nil + case []interface{}: + result := make([]string, 0, len(v)) + for i, item := range v { + s, ok := item.(string) + if !ok { + plugin.logger.Warn("Stream chunk %d is not a string (got %T), skipping", i, item) continue } - filteredMessages = append(filteredMessages, msg) + result = append(result, s) } - return filteredMessages - case schemas.ResponsesRequest, schemas.ResponsesStreamRequest, schemas.WebSocketResponsesRequest: - originalMessages := req.ResponsesRequest.Input - filteredMessages := make([]schemas.ResponsesMessage, 0, len(originalMessages)) - for _, msg := range originalMessages { - // Skip system messages if configured to exclude them - if plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt && msg.Role != nil && *msg.Role == schemas.ResponsesInputMessageRoleSystem { - continue - } - filteredMessages = append(filteredMessages, msg) + return result, nil + case string: + // Redis: stream_chunks stored as a JSON-encoded array of strings. + var stringArray []string + if err := json.Unmarshal([]byte(v), &stringArray); err != nil { + return nil, fmt.Errorf("failed to parse JSON string: %w", err) } - return filteredMessages + return stringArray, nil + default: + return nil, fmt.Errorf("unsupported stream data type: %T", streamData) + } +} + +// getInputForCaching extracts request input for hashing/embedding without +// normalization. For Chat/Responses requests, system messages are filtered +// out when ExcludeSystemPrompt is enabled — that path returns a fresh slice; +// otherwise the original slice is returned by reference (no allocation). +// Other request types always return the underlying input directly. +// +// The slice for Chat/Responses is memoized on state so attachment extraction, +// embedding text extraction, and the history-threshold check reuse the same +// slice instead of re-walking on each call. State may be nil (tests / +// pre-state callers), in which case nothing is cached. +func (plugin *Plugin) getInputForCaching(state *cacheState, req *schemas.BifrostRequest) interface{} { + if state != nil && state.FilteredInput != nil { + return state.FilteredInput + } + excludeSystem := plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt + var out interface{} + switch req.RequestType { + case schemas.TextCompletionRequest, schemas.TextCompletionStreamRequest: + out = req.TextCompletionRequest.Input + case schemas.ChatCompletionRequest, schemas.ChatCompletionStreamRequest: + out = filterChatMessages(req.ChatRequest.Input, excludeSystem) + case schemas.ResponsesRequest, schemas.ResponsesStreamRequest, schemas.WebSocketResponsesRequest: + out = filterResponsesMessages(req.ResponsesRequest.Input, excludeSystem) case schemas.SpeechRequest, schemas.SpeechStreamRequest: - return req.SpeechRequest.Input.Input + out = req.SpeechRequest.Input.Input case schemas.EmbeddingRequest: - return req.EmbeddingRequest.Input + out = req.EmbeddingRequest.Input case schemas.TranscriptionRequest, schemas.TranscriptionStreamRequest: - return req.TranscriptionRequest.Input + out = req.TranscriptionRequest.Input case schemas.ImageGenerationRequest, schemas.ImageGenerationStreamRequest: - return req.ImageGenerationRequest.Input + out = req.ImageGenerationRequest.Input default: return nil } + if state != nil { + state.FilteredInput = out + } + return out +} + +// filterChatMessages returns msgs unchanged when excludeSystem is false. +// Otherwise, returns a copy with system messages dropped. +func filterChatMessages(msgs []schemas.ChatMessage, excludeSystem bool) []schemas.ChatMessage { + if !excludeSystem { + return msgs + } + out := make([]schemas.ChatMessage, 0, len(msgs)) + for _, m := range msgs { + if m.Role == schemas.ChatMessageRoleSystem { + continue + } + out = append(out, m) + } + return out } -// getNormalizedInputForCaching returns a copy of req.Input for hashing/embedding. The input is normalized. -// It applies text normalization (lowercase + trim) and optionally removes system messages. +// filterResponsesMessages returns msgs unchanged when excludeSystem is false. +// Otherwise, returns a copy with system messages dropped. +func filterResponsesMessages(msgs []schemas.ResponsesMessage, excludeSystem bool) []schemas.ResponsesMessage { + if !excludeSystem { + return msgs + } + out := make([]schemas.ResponsesMessage, 0, len(msgs)) + for _, m := range msgs { + if m.Role != nil && *m.Role == schemas.ResponsesInputMessageRoleSystem { + continue + } + out = append(out, m) + } + return out +} + +// getNormalizedInputForCaching returns a copy of req.Input with text fields +// lowercased + trimmed, suitable for hashing/embedding. System messages are +// dropped when ExcludeSystemPrompt is enabled. +// +// Allocation strategy: the original request must never be mutated, but the +// returned value only needs to round-trip through json.Marshal — it's hashed, +// not stored. So we shallow-copy each message struct and rewrite Content +// (the only field we normalize), sharing all other pointer fields with the +// original. This avoids the per-call message-graph deep copy that +// schemas.DeepCopy*Message would otherwise do. func (plugin *Plugin) getNormalizedInputForCaching(req *schemas.BifrostRequest) interface{} { + excludeSystem := plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt switch req.RequestType { case schemas.TextCompletionRequest, schemas.TextCompletionStreamRequest: - // Create a deep copy of the input to avoid mutating the original request - copiedInput := schemas.TextCompletionInput{} - if req.TextCompletionRequest.Input.PromptStr != nil { - copiedPromptStr := *req.TextCompletionRequest.Input.PromptStr - copiedInput.PromptStr = &copiedPromptStr - } else if len(req.TextCompletionRequest.Input.PromptArray) > 0 { - copiedPromptArray := make([]string, len(req.TextCompletionRequest.Input.PromptArray)) - copy(copiedPromptArray, req.TextCompletionRequest.Input.PromptArray) - copiedInput.PromptArray = copiedPromptArray - } - - if copiedInput.PromptStr != nil { - normalizedText := normalizeText(*copiedInput.PromptStr) - copiedInput.PromptStr = &normalizedText - } else if len(copiedInput.PromptArray) > 0 { - // Create a copy of the PromptArray and normalize each element - normalizedPromptArray := make([]string, len(copiedInput.PromptArray)) - copy(normalizedPromptArray, copiedInput.PromptArray) - for i, prompt := range normalizedPromptArray { - normalizedPromptArray[i] = normalizeText(prompt) + input := req.TextCompletionRequest.Input + out := schemas.TextCompletionInput{} + if input.PromptStr != nil { + ns := normalizeText(*input.PromptStr) + out.PromptStr = &ns + } else if len(input.PromptArray) > 0 { + arr := make([]string, len(input.PromptArray)) + for i, p := range input.PromptArray { + arr[i] = normalizeText(p) } - copiedInput.PromptArray = normalizedPromptArray + out.PromptArray = arr } - return copiedInput + return out case schemas.ChatCompletionRequest, schemas.ChatCompletionStreamRequest: originalMessages := req.ChatRequest.Input normalizedMessages := make([]schemas.ChatMessage, 0, len(originalMessages)) - for _, msg := range originalMessages { - // Skip system messages if configured to exclude them - if plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt && msg.Role == schemas.ChatMessageRoleSystem { + if excludeSystem && msg.Role == schemas.ChatMessageRoleSystem { continue } - - // Create a deep copy of the message with normalized content - normalizedMsg := schemas.DeepCopyChatMessage(msg) - - // Normalize message content - // Content can be nil for messages like assistant tool-call messages - if msg.Content != nil { - if msg.Content.ContentStr != nil { - normalizedContent := normalizeText(*msg.Content.ContentStr) - normalizedMsg.Content.ContentStr = &normalizedContent - } else if msg.Content.ContentBlocks != nil { - // Create a copy of content blocks with normalized text - normalizedBlocks := make([]schemas.ChatContentBlock, len(msg.Content.ContentBlocks)) - for i, block := range msg.Content.ContentBlocks { - normalizedBlocks[i] = block - if block.Text != nil { - normalizedText := normalizeText(*block.Text) - normalizedBlocks[i].Text = &normalizedText - } - } - normalizedMsg.Content.ContentBlocks = normalizedBlocks - } - } - - normalizedMessages = append(normalizedMessages, normalizedMsg) + normalizedMessages = append(normalizedMessages, normalizeChatMessage(msg)) } return normalizedMessages case schemas.ResponsesRequest, schemas.ResponsesStreamRequest, schemas.WebSocketResponsesRequest: originalMessages := req.ResponsesRequest.Input normalizedMessages := make([]schemas.ResponsesMessage, 0, len(originalMessages)) - for _, msg := range originalMessages { - // Skip system messages if configured to exclude them - if plugin.config.ExcludeSystemPrompt != nil && *plugin.config.ExcludeSystemPrompt && msg.Role != nil && *msg.Role == schemas.ResponsesInputMessageRoleSystem { + if excludeSystem && msg.Role != nil && *msg.Role == schemas.ResponsesInputMessageRoleSystem { continue } - - // Create a deep copy of the message with normalized content - normalizedMsg := schemas.DeepCopyResponsesMessage(msg) - - // Create a deep copy of the Content to avoid modifying the original - if msg.Content != nil { - if msg.Content.ContentStr != nil { - normalizedText := normalizeText(*msg.Content.ContentStr) - normalizedMsg.Content.ContentStr = &normalizedText - } else if msg.Content.ContentBlocks != nil { - // Create a copy of content blocks with normalized text - normalizedBlocks := make([]schemas.ResponsesMessageContentBlock, len(msg.Content.ContentBlocks)) - for i, block := range msg.Content.ContentBlocks { - normalizedBlocks[i] = block - if block.Text != nil { - normalizedText := normalizeText(*block.Text) - normalizedBlocks[i].Text = &normalizedText - } - } - normalizedMsg.Content.ContentBlocks = normalizedBlocks - } - } - - normalizedMessages = append(normalizedMessages, normalizedMsg) + normalizedMessages = append(normalizedMessages, normalizeResponsesMessage(msg)) } return normalizedMessages case schemas.SpeechRequest, schemas.SpeechStreamRequest: return normalizeText(req.SpeechRequest.Input.Input) case schemas.EmbeddingRequest: - // Create a deep copy of the input to avoid mutating the original request - copiedInput := schemas.EmbeddingInput{} - if req.EmbeddingRequest.Input.Text != nil { - copiedText := *req.EmbeddingRequest.Input.Text - copiedInput.Text = &copiedText - } else if len(req.EmbeddingRequest.Input.Texts) > 0 { - copiedTexts := make([]string, len(req.EmbeddingRequest.Input.Texts)) - copy(copiedTexts, req.EmbeddingRequest.Input.Texts) - copiedInput.Texts = copiedTexts - } else if req.EmbeddingRequest.Input.Embedding != nil { - copiedEmbedding := make([]int, len(req.EmbeddingRequest.Input.Embedding)) - copy(copiedEmbedding, req.EmbeddingRequest.Input.Embedding) - copiedInput.Embedding = copiedEmbedding - } else if req.EmbeddingRequest.Input.Embeddings != nil { - copiedEmbeddings := make([][]int, len(req.EmbeddingRequest.Input.Embeddings)) - copy(copiedEmbeddings, req.EmbeddingRequest.Input.Embeddings) - copiedInput.Embeddings = copiedEmbeddings - } - if copiedInput.Text != nil { - normalizedText := normalizeText(*copiedInput.Text) - copiedInput.Text = &normalizedText - } else if len(copiedInput.Texts) > 0 { - normalizedTexts := make([]string, len(copiedInput.Texts)) - for i, text := range copiedInput.Texts { - normalizedTexts[i] = normalizeText(text) + input := req.EmbeddingRequest.Input + out := schemas.EmbeddingInput{} + if input.Text != nil { + ns := normalizeText(*input.Text) + out.Text = &ns + } else if len(input.Texts) > 0 { + arr := make([]string, len(input.Texts)) + for i, t := range input.Texts { + arr[i] = normalizeText(t) } - copiedInput.Texts = normalizedTexts - } - return copiedInput + out.Texts = arr + } else if input.Embedding != nil { + // Numeric embeddings aren't text-normalizable but must still appear + // in the hash payload, so copy the slice to avoid aliasing. + out.Embedding = append([]int(nil), input.Embedding...) + } else if input.Embeddings != nil { + out.Embeddings = append([][]int(nil), input.Embeddings...) + } + return out case schemas.TranscriptionRequest, schemas.TranscriptionStreamRequest: return req.TranscriptionRequest.Input case schemas.ImageGenerationRequest, schemas.ImageGenerationStreamRequest: @@ -800,18 +705,60 @@ func (plugin *Plugin) getNormalizedInputForCaching(req *schemas.BifrostRequest) } } -// removeField removes the first occurrence of target from the slice. -func removeField(arr []string, target string) []string { - for i, v := range arr { - if v == target { - // remove element at index i - return append(arr[:i], arr[i+1:]...) +// normalizeChatMessage returns a shallow copy of msg with its Content +// rewritten so text fields are lowercased + trimmed. Other pointer fields +// (ToolCalls, Annotations, ChatToolMessage, ChatAssistantMessage) are +// aliased — safe because we don't mutate them. +func normalizeChatMessage(msg schemas.ChatMessage) schemas.ChatMessage { + out := msg + if msg.Content == nil { + return out + } + nc := *msg.Content + if msg.Content.ContentStr != nil { + ns := normalizeText(*msg.Content.ContentStr) + nc.ContentStr = &ns + } else if msg.Content.ContentBlocks != nil { + blocks := make([]schemas.ChatContentBlock, len(msg.Content.ContentBlocks)) + for i, b := range msg.Content.ContentBlocks { + blocks[i] = b + if b.Text != nil { + nt := normalizeText(*b.Text) + blocks[i].Text = &nt + } + } + nc.ContentBlocks = blocks + } + out.Content = &nc + return out +} + +// normalizeResponsesMessage mirrors normalizeChatMessage for the Responses API. +func normalizeResponsesMessage(msg schemas.ResponsesMessage) schemas.ResponsesMessage { + out := msg + if msg.Content == nil { + return out + } + nc := *msg.Content + if msg.Content.ContentStr != nil { + ns := normalizeText(*msg.Content.ContentStr) + nc.ContentStr = &ns + } else if msg.Content.ContentBlocks != nil { + blocks := make([]schemas.ResponsesMessageContentBlock, len(msg.Content.ContentBlocks)) + for i, b := range msg.Content.ContentBlocks { + blocks[i] = b + if b.Text != nil { + nt := normalizeText(*b.Text) + blocks[i].Text = &nt + } } + nc.ContentBlocks = blocks } - return arr // unchanged if target not found + out.Content = &nc + return out } -// extractChatParametersToMetadata extracts Chat API parameters into metadata map +// extractChatParametersToMetadata extracts Chat API parameters into metadata map. func (plugin *Plugin) extractChatParametersToMetadata(params *schemas.ChatParameters, metadata map[string]interface{}) { if params.ToolChoice != nil { if params.ToolChoice.ChatToolChoiceStr != nil { @@ -820,87 +767,53 @@ func (plugin *Plugin) extractChatParametersToMetadata(params *schemas.ChatParame metadata["tool_choice"] = params.ToolChoice.ChatToolChoiceStruct.Function.Name } } - if params.Temperature != nil { - metadata["temperature"] = *params.Temperature - } - if params.TopP != nil { - metadata["top_p"] = *params.TopP - } - if params.MaxCompletionTokens != nil { - metadata["max_tokens"] = *params.MaxCompletionTokens - } - if params.Stop != nil { - metadata["stop_sequences"] = params.Stop - } - if params.PresencePenalty != nil { - metadata["presence_penalty"] = *params.PresencePenalty - } - if params.FrequencyPenalty != nil { - metadata["frequency_penalty"] = *params.FrequencyPenalty - } - if params.ParallelToolCalls != nil { - metadata["parallel_tool_calls"] = *params.ParallelToolCalls - } - if params.User != nil { - metadata["user"] = *params.User - } - if params.LogitBias != nil { - metadata["logit_bias"] = *params.LogitBias - } - if params.LogProbs != nil { - metadata["logprobs"] = *params.LogProbs - } - if params.Modalities != nil { - metadata["modalities"] = params.Modalities - } - if params.PromptCacheKey != nil { - metadata["prompt_cache_key"] = *params.PromptCacheKey - } - if params.Reasoning != nil && params.Reasoning.Enabled != nil { - metadata["reasoning_enabled"] = *params.Reasoning.Enabled - } - if params.Reasoning != nil && params.Reasoning.Effort != nil { - metadata["reasoning_effort"] = *params.Reasoning.Effort + putIfSet(metadata, "temperature", params.Temperature) + putIfSet(metadata, "top_p", params.TopP) + putIfSet(metadata, "max_tokens", params.MaxCompletionTokens) + putSortedSetIfNonEmpty(metadata, "stop_sequences", params.Stop) + putIfSet(metadata, "presence_penalty", params.PresencePenalty) + putIfSet(metadata, "frequency_penalty", params.FrequencyPenalty) + putIfSet(metadata, "parallel_tool_calls", params.ParallelToolCalls) + putIfSet(metadata, "user", params.User) + putIfSet(metadata, "logit_bias", params.LogitBias) + putIfSet(metadata, "logprobs", params.LogProbs) + putSortedSetIfNonEmpty(metadata, "modalities", params.Modalities) + putIfSet(metadata, "prompt_cache_key", params.PromptCacheKey) + if params.Reasoning != nil { + putIfSet(metadata, "reasoning_enabled", params.Reasoning.Enabled) + putIfSet(metadata, "reasoning_effort", params.Reasoning.Effort) } if params.ResponseFormat != nil { + // ResponseFormat is a struct pointer that callers expect to round-trip + // through JSON; store the pointer directly so MarshalDeeplySorted walks it. metadata["response_format"] = params.ResponseFormat } - if params.SafetyIdentifier != nil { - metadata["safety_identifier"] = *params.SafetyIdentifier - } - if params.Seed != nil { - metadata["seed"] = *params.Seed - } - if params.ServiceTier != nil { - metadata["service_tier"] = *params.ServiceTier - } - if params.Store != nil { - metadata["store"] = *params.Store - } - if params.TopLogProbs != nil { - metadata["top_logprobs"] = *params.TopLogProbs - } - if params.Verbosity != nil { - metadata["verbosity"] = *params.Verbosity - } + putIfSet(metadata, "safety_identifier", params.SafetyIdentifier) + putIfSet(metadata, "seed", params.Seed) + putIfSet(metadata, "service_tier", params.ServiceTier) + putIfSet(metadata, "store", params.Store) + putIfSet(metadata, "top_logprobs", params.TopLogProbs) + putIfSet(metadata, "verbosity", params.Verbosity) if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } if len(params.Tools) > 0 { - tools := make([]interface{}, len(params.Tools)) - for i, t := range params.Tools { - tools[i] = t - } - if toolsJSON, err := schemas.MarshalDeeplySorted(tools); err != nil { - plugin.logger.Warn("%s Failed to marshal tools for metadata: %v", PluginLoggerPrefix, err) - } else { - toolHash := xxhash.Sum64(toolsJSON) - metadata["tools_hash"] = fmt.Sprintf("%x", toolHash) + // Tools are an order-insensitive set; producer-side ordering (notably + // MCP's randomized map iteration) must not perturb the request hash. + if toolsHash, err := hashSortedSet(params.Tools, func(t schemas.ChatTool) string { + if t.Function == nil { + return "" + } + return t.Function.Name + }); err != nil { + plugin.logger.Warn("Failed to marshal tools for metadata: %v", err) + } else if toolsHash != "" { + metadata["tools_hash"] = toolsHash } } } -// extractResponsesParametersToMetadata extracts Responses API parameters into metadata map +// extractResponsesParametersToMetadata extracts Responses API parameters into metadata map. func (plugin *Plugin) extractResponsesParametersToMetadata(params *schemas.ResponsesParameters, metadata map[string]interface{}) { if params.ToolChoice != nil { if params.ToolChoice.ResponsesToolChoiceStr != nil { @@ -909,158 +822,86 @@ func (plugin *Plugin) extractResponsesParametersToMetadata(params *schemas.Respo metadata["tool_choice"] = *params.ToolChoice.ResponsesToolChoiceStruct.Name } } - if params.Temperature != nil { - metadata["temperature"] = *params.Temperature - } - if params.TopP != nil { - metadata["top_p"] = *params.TopP - } - if params.MaxOutputTokens != nil { - metadata["max_tokens"] = *params.MaxOutputTokens - } - if params.ParallelToolCalls != nil { - metadata["parallel_tool_calls"] = *params.ParallelToolCalls - } - if params.Background != nil { - metadata["background"] = *params.Background - } - if params.Conversation != nil { - metadata["conversation"] = *params.Conversation - } - if params.Include != nil { - metadata["include"] = params.Include - } - if params.Instructions != nil { - metadata["instructions"] = *params.Instructions - } - if params.MaxToolCalls != nil { - metadata["max_tool_calls"] = *params.MaxToolCalls - } - if params.PreviousResponseID != nil { - metadata["previous_response_id"] = *params.PreviousResponseID - } - if params.PromptCacheKey != nil { - metadata["prompt_cache_key"] = *params.PromptCacheKey - } + putIfSet(metadata, "temperature", params.Temperature) + putIfSet(metadata, "top_p", params.TopP) + putIfSet(metadata, "max_tokens", params.MaxOutputTokens) + putIfSet(metadata, "parallel_tool_calls", params.ParallelToolCalls) + putIfSet(metadata, "background", params.Background) + putIfSet(metadata, "conversation", params.Conversation) + putSortedSetIfNonEmpty(metadata, "include", params.Include) + putIfSet(metadata, "instructions", params.Instructions) + putIfSet(metadata, "max_tool_calls", params.MaxToolCalls) + putIfSet(metadata, "previous_response_id", params.PreviousResponseID) + putIfSet(metadata, "prompt_cache_key", params.PromptCacheKey) if params.Reasoning != nil { - if params.Reasoning.Effort != nil { - metadata["reasoning_effort"] = *params.Reasoning.Effort - } - if params.Reasoning.MaxTokens != nil { - metadata["reasoning_max_tokens"] = *params.Reasoning.MaxTokens - } - if params.Reasoning.Summary != nil { - metadata["reasoning_summary"] = *params.Reasoning.Summary - } - } - if params.SafetyIdentifier != nil { - metadata["safety_identifier"] = *params.SafetyIdentifier - } - if params.ServiceTier != nil { - metadata["service_tier"] = *params.ServiceTier - } - if params.Store != nil { - metadata["store"] = *params.Store + putIfSet(metadata, "reasoning_effort", params.Reasoning.Effort) + putIfSet(metadata, "reasoning_max_tokens", params.Reasoning.MaxTokens) + putIfSet(metadata, "reasoning_summary", params.Reasoning.Summary) } + putIfSet(metadata, "safety_identifier", params.SafetyIdentifier) + putIfSet(metadata, "service_tier", params.ServiceTier) + putIfSet(metadata, "store", params.Store) if params.Text != nil { - if params.Text.Verbosity != nil { - metadata["text_verbosity"] = *params.Text.Verbosity - } + putIfSet(metadata, "text_verbosity", params.Text.Verbosity) if params.Text.Format != nil { metadata["text_format_type"] = params.Text.Format.Type } } - if params.TopLogProbs != nil { - metadata["top_logprobs"] = *params.TopLogProbs - } - if params.Truncation != nil { - metadata["truncation"] = *params.Truncation - } + putIfSet(metadata, "top_logprobs", params.TopLogProbs) + putIfSet(metadata, "truncation", params.Truncation) if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } if len(params.Tools) > 0 { - tools := make([]interface{}, len(params.Tools)) - for i, t := range params.Tools { - tools[i] = t - } - if toolsJSON, err := schemas.MarshalDeeplySorted(tools); err != nil { - plugin.logger.Warn("%s Failed to marshal tools for metadata: %v", PluginLoggerPrefix, err) - } else { - toolHash := xxhash.Sum64(toolsJSON) - metadata["tools_hash"] = fmt.Sprintf("%x", toolHash) + // Tools are an order-insensitive set; producer-side ordering (notably + // MCP's randomized map iteration) must not perturb the request hash. + if toolsHash, err := hashSortedSet(params.Tools, func(t schemas.ResponsesTool) string { + if t.Name == nil { + return "" + } + return *t.Name + }); err != nil { + plugin.logger.Warn("Failed to marshal tools for metadata: %v", err) + } else if toolsHash != "" { + metadata["tools_hash"] = toolsHash } } } -// extractTextCompletionParametersToMetadata extracts Text Completion parameters into metadata map +// extractTextCompletionParametersToMetadata extracts Text Completion parameters into metadata map. func (plugin *Plugin) extractTextCompletionParametersToMetadata(params *schemas.TextCompletionParameters, metadata map[string]interface{}) { - if params.Temperature != nil { - metadata["temperature"] = *params.Temperature - } - if params.TopP != nil { - metadata["top_p"] = *params.TopP - } - if params.MaxTokens != nil { - metadata["max_tokens"] = *params.MaxTokens - } - if params.Stop != nil { - metadata["stop_sequences"] = params.Stop - } - if params.PresencePenalty != nil { - metadata["presence_penalty"] = *params.PresencePenalty - } - if params.FrequencyPenalty != nil { - metadata["frequency_penalty"] = *params.FrequencyPenalty - } - if params.User != nil { - metadata["user"] = *params.User - } - if params.BestOf != nil { - metadata["best_of"] = *params.BestOf - } - if params.Echo != nil { - metadata["echo"] = *params.Echo - } - if params.LogitBias != nil { - metadata["logit_bias"] = *params.LogitBias - } - if params.LogProbs != nil { - metadata["logprobs"] = *params.LogProbs - } - if params.N != nil { - metadata["n"] = *params.N - } - if params.Seed != nil { - metadata["seed"] = *params.Seed - } - if params.Suffix != nil { - metadata["suffix"] = *params.Suffix - } + putIfSet(metadata, "temperature", params.Temperature) + putIfSet(metadata, "top_p", params.TopP) + putIfSet(metadata, "max_tokens", params.MaxTokens) + putSortedSetIfNonEmpty(metadata, "stop_sequences", params.Stop) + putIfSet(metadata, "presence_penalty", params.PresencePenalty) + putIfSet(metadata, "frequency_penalty", params.FrequencyPenalty) + putIfSet(metadata, "user", params.User) + putIfSet(metadata, "best_of", params.BestOf) + putIfSet(metadata, "echo", params.Echo) + putIfSet(metadata, "logit_bias", params.LogitBias) + putIfSet(metadata, "logprobs", params.LogProbs) + putIfSet(metadata, "n", params.N) + putIfSet(metadata, "seed", params.Seed) + putIfSet(metadata, "suffix", params.Suffix) if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } } -// extractSpeechParametersToMetadata extracts Speech parameters into metadata map +// extractSpeechParametersToMetadata extracts Speech parameters into metadata map. func (plugin *Plugin) extractSpeechParametersToMetadata(params *schemas.SpeechParameters, metadata map[string]interface{}) { if params == nil { return } - - if params.Speed != nil { - metadata["speed"] = *params.Speed - } + putIfSet(metadata, "speed", params.Speed) if params.ResponseFormat != "" { metadata["response_format"] = params.ResponseFormat } if params.Instructions != "" { metadata["instructions"] = params.Instructions } - // Check if VoiceConfig.Voice is non-nil before accessing it - if params.VoiceConfig.Voice != nil { - metadata["voice"] = *params.VoiceConfig.Voice - } + putIfSet(metadata, "voice", params.VoiceConfig.Voice) if len(params.VoiceConfig.MultiVoiceConfig) > 0 { flattenedVC := make([]string, len(params.VoiceConfig.MultiVoiceConfig)) for i, vc := range params.VoiceConfig.MultiVoiceConfig { @@ -1068,117 +909,97 @@ func (plugin *Plugin) extractSpeechParametersToMetadata(params *schemas.SpeechPa } metadata["multi_voice_count"] = flattenedVC } + if len(params.PronunciationDictionaryLocators) > 0 { + if hash, err := hashSortedSet(params.PronunciationDictionaryLocators, func(l schemas.SpeechPronunciationDictionaryLocator) string { + return l.PronunciationDictionaryID + }); err != nil { + plugin.logger.Warn("Failed to marshal pronunciation_dictionary_locators for metadata: %v", err) + } else if hash != "" { + metadata["pronunciation_dictionary_locators_hash"] = hash + } + } if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } } -// extractEmbeddingParametersToMetadata extracts Embedding parameters into metadata map +// extractEmbeddingParametersToMetadata extracts Embedding parameters into metadata map. func (plugin *Plugin) extractEmbeddingParametersToMetadata(params *schemas.EmbeddingParameters, metadata map[string]interface{}) { - if params.EncodingFormat != nil { - metadata["encoding_format"] = *params.EncodingFormat - } - if params.Dimensions != nil { - metadata["dimensions"] = *params.Dimensions - } + putIfSet(metadata, "encoding_format", params.EncodingFormat) + putIfSet(metadata, "dimensions", params.Dimensions) if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } } -// extractTranscriptionParametersToMetadata extracts Transcription parameters into metadata map +// extractTranscriptionParametersToMetadata extracts Transcription parameters into metadata map. func (plugin *Plugin) extractTranscriptionParametersToMetadata(params *schemas.TranscriptionParameters, metadata map[string]interface{}) { - if params.Language != nil { - metadata["language"] = *params.Language - } - if params.ResponseFormat != nil { - metadata["response_format"] = *params.ResponseFormat - } - if params.Prompt != nil { - metadata["prompt"] = *params.Prompt - } - if params.Format != nil { - metadata["file_format"] = *params.Format + putIfSet(metadata, "language", params.Language) + putIfSet(metadata, "response_format", params.ResponseFormat) + putIfSet(metadata, "prompt", params.Prompt) + putIfSet(metadata, "file_format", params.Format) + putSortedSetIfNonEmpty(metadata, "timestamp_granularities", params.TimestampGranularities) + putSortedSetIfNonEmpty(metadata, "include", params.Include) + if len(params.AdditionalFormats) > 0 { + if hash, err := hashSortedSet(params.AdditionalFormats, func(f schemas.TranscriptionAdditionalFormat) string { + return string(f.Format) + }); err != nil { + plugin.logger.Warn("Failed to marshal additional_formats for metadata: %v", err) + } else if hash != "" { + metadata["additional_formats_hash"] = hash + } } if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } } -// extractImageGenerationParametersToMetadata extracts Image Generation parameters into metadata map +// extractImageGenerationParametersToMetadata extracts Image Generation parameters into metadata map. func (plugin *Plugin) extractImageGenerationParametersToMetadata(params *schemas.ImageGenerationParameters, metadata map[string]interface{}) { if params == nil { return } - - if params.N != nil { - metadata["n"] = *params.N - } - if params.Background != nil { - metadata["background"] = *params.Background - } - if params.Moderation != nil { - metadata["moderation"] = *params.Moderation - } - if params.PartialImages != nil { - metadata["partial_images"] = *params.PartialImages - } - if params.Size != nil { - metadata["size"] = *params.Size - } - if params.Quality != nil { - metadata["quality"] = *params.Quality - } - if params.OutputCompression != nil { - metadata["output_compression"] = *params.OutputCompression - } - if params.OutputFormat != nil { - metadata["output_format"] = *params.OutputFormat - } - if params.Style != nil { - metadata["style"] = *params.Style + putIfSet(metadata, "n", params.N) + putIfSet(metadata, "background", params.Background) + putIfSet(metadata, "moderation", params.Moderation) + putIfSet(metadata, "partial_images", params.PartialImages) + putIfSet(metadata, "size", params.Size) + putIfSet(metadata, "quality", params.Quality) + putIfSet(metadata, "output_compression", params.OutputCompression) + putIfSet(metadata, "output_format", params.OutputFormat) + putIfSet(metadata, "style", params.Style) + putIfSet(metadata, "response_format", params.ResponseFormat) + putIfSet(metadata, "seed", params.Seed) + putIfSet(metadata, "negative_prompt", params.NegativePrompt) + putIfSet(metadata, "num_inference_steps", params.NumInferenceSteps) + putIfSet(metadata, "user", params.User) + if len(params.InputImages) > 0 { + metadata["input_images"] = params.InputImages } - if params.ResponseFormat != nil { - metadata["response_format"] = *params.ResponseFormat - } - if params.Seed != nil { - metadata["seed"] = *params.Seed - } - if params.NegativePrompt != nil { - metadata["negative_prompt"] = *params.NegativePrompt - } - if params.NumInferenceSteps != nil { - metadata["num_inference_steps"] = *params.NumInferenceSteps - } - if params.User != nil { - metadata["user"] = *params.User - } - if len(params.ExtraParams) > 0 { maps.Copy(metadata, params.ExtraParams) } } -func (plugin *Plugin) isConversationHistoryThresholdExceeded(req *schemas.BifrostRequest) bool { +// isConversationHistoryThresholdExceeded returns true when the request's +// conversation history is longer than ConversationHistoryThreshold. Long +// histories are unlikely to repeat and unlikely to be semantically similar +// to other requests, so caching them mostly bloats the store; PreLLMHook +// uses this to skip caching such requests entirely. +func (plugin *Plugin) isConversationHistoryThresholdExceeded(state *cacheState, req *schemas.BifrostRequest) bool { switch { case req.ChatRequest != nil: - input, ok := plugin.getInputForCaching(req).([]schemas.ChatMessage) + input, ok := plugin.getInputForCaching(state, req).([]schemas.ChatMessage) if !ok { return false } - if len(input) > plugin.config.ConversationHistoryThreshold { - return true - } - return false + return len(input) > plugin.config.ConversationHistoryThreshold case req.ResponsesRequest != nil: - input, ok := plugin.getInputForCaching(req).([]schemas.ResponsesMessage) + input, ok := plugin.getInputForCaching(state, req).([]schemas.ResponsesMessage) if !ok { return false } - if len(input) > plugin.config.ConversationHistoryThreshold { - return true - } - return false + return len(input) > plugin.config.ConversationHistoryThreshold default: return false } diff --git a/tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md b/tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md index 63b179dba6..09fd031343 100644 --- a/tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md +++ b/tests/e2e/api/HARNESS_COVERAGE_BACKLOG.md @@ -301,7 +301,7 @@ Vertex's API surface for Gemini largely mirrors AI Studio's generateContent — ### Anthropic-on-Vertex specific - [x] Claude Opus 4.7 in user's region (`global` / `us-east5` / `europe-west1`) -- [ ] **Claude Sonnet 4.6 / 4.5 / Haiku 4.5** (regional gating — must use `global` or `us-east5`) +- [~] **Claude Sonnet 4.6 / 4.5 / Haiku 4.5** (regional gating - must use `global` or `us-east5`; Sonnet 4.6 cross-cut variants added in Cross-Cut Round 4 covering structured output, function calling, streaming, vision, tool_choice, stop sequences, multi-turn, system message, web search, PDF, sampling-params; Haiku 4.5 + Sonnet 4.5 still uncovered) - [ ] **`anthropic_version: "vertex-2023-10-16"` in body** (Vertex-specific replacement for the header) - [ ] **Vertex `:streamRawPredict` endpoint** for SSE streaming - [ ] **Beta headers via body field** (`anthropic_beta` instead of HTTP header) @@ -358,20 +358,20 @@ These exercise Bifrost's translation layer between provider shapes — every che `POST /v1/chat/completions` endpoint with `provider/model` prefix routing. - [x] OpenAI / Anthropic / Bedrock / Gemini / Vertex Basic Chat (50 cross-model entries) -- [~] Function calling cross-cut (4 providers tested; Vertex partially) -- [~] Structured output cross-cut (OpenAI + Gemini + Vertex; **Anthropic + Bedrock missing**) -- [~] Streaming cross-cut (4 providers tested; Vertex/Azure missing) -- [~] Vision cross-cut (OpenAI + Anthropic + Gemini; **Bedrock + Vertex + Azure missing**) -- [~] Web search cross-cut (3 providers; **Bedrock + Vertex + Azure missing**) -- [ ] **Code execution cross-cut** (Anthropic + Gemini) -- [ ] **Tool choice forced cross-cut** (multi-provider) -- [ ] **Computer use via cross-model** (`anthropic/claude-...` with computer_2025x tools — verifies Bifrost's translation; currently only tested via /anthropic drop-in) -- [ ] **Extended/adaptive thinking via cross-model** -- [ ] **Prompt caching via cross-model** -- [ ] **System message cross-cut** (every provider via `/v1/chat/completions`) -- [ ] **Multi-turn conversation cross-cut** (provider-specific role normalization) -- [ ] **Stop sequences cross-cut** (each provider has different stop semantics) -- [ ] **Sampling-params normalization** (Bifrost should silently drop temperature for Opus 4.7+) +- [x] Function calling cross-cut (OpenAI + Anthropic + Bedrock + Gemini + Vertex Claude + Vertex Gemini via Cross-Cut Round 4; Azure via Cross-Cut Round 4) +- [x] Structured output cross-cut (OpenAI + Anthropic + Bedrock + Gemini + Vertex Gemini + Vertex Claude via Cross-Cut Round 4; Azure via Cross-Cut Round 4) +- [x] Streaming cross-cut (OpenAI + Anthropic + Bedrock + Gemini + Vertex Claude + Vertex Gemini + Azure via Cross-Cut Round 4) +- [x] Vision cross-cut (OpenAI + Anthropic + Bedrock + Gemini + Vertex Gemini + Vertex Claude + Azure via Cross-Cut Round 4) +- [~] Web search cross-cut (Anthropic + Bedrock + Vertex Claude (sonnet) + Vertex Gemini (google_search) via Cross-Cut Round 4; **OpenAI Responses-style web_search via /v1/chat still missing**) +- [~] **Code execution cross-cut** (Anthropic + Gemini + Bedrock + Vertex Claude (opus); **Vertex Gemini code execution via /v1/chat untested**) +- [~] **Tool choice forced cross-cut** (OpenAI + Bedrock + Vertex Claude via Cross-Cut Round 4; **Anthropic + Gemini + Azure still missing**) +- [ ] **Computer use via cross-model** (`anthropic/claude-...` with computer_2025x tools - verifies Bifrost's translation; currently only tested via /anthropic drop-in and `vertex/claude-opus-4-7` preview at L1279) +- [~] **Extended/adaptive thinking via cross-model** (Anthropic enabled + Bedrock enabled/adaptive + Vertex Claude enabled/adaptive covered; **anthropic-direct adaptive Opus 4.7 still missing**) +- [x] **Prompt caching via cross-model** (Anthropic + Bedrock 1h + Vertex Claude 1h covered) +- [~] **System message cross-cut** (Vertex Claude added in Round 4; Azure added in Round 4; **other providers were already implicit via cross-cut entries** - if explicit test needed, file a ticket) +- [~] **Multi-turn conversation cross-cut** (Vertex Claude added in Round 4; remaining providers still cross-cut-implicit only) +- [x] **Stop sequences cross-cut** (OpenAI + Anthropic + Gemini already; + Bedrock + Vertex Claude + Vertex Gemini added in Cross-Cut Round 4) +- [~] **Sampling-params normalization** (Bifrost should silently drop temperature for Opus 4.7+; Anthropic-direct + Vertex Claude Opus 4.7 covered; **Bedrock Opus 4.7 via cross-model still missing**) - [ ] **Failover scenarios** (request to provider X falls back to provider Y on 5xx) - [ ] **Virtual keys / governance** (`X-Bifrost-VK` header with allowed_models) - [ ] **Rate limit propagation** (provider 429 → Bifrost 429 with Retry-After preserved) diff --git a/tests/e2e/api/collections/provider-harness.json b/tests/e2e/api/collections/provider-harness.json index 665ed30482..619fb5a370 100644 --- a/tests/e2e/api/collections/provider-harness.json +++ b/tests/e2e/api/collections/provider-harness.json @@ -1318,6 +1318,7 @@ "item": [ { "name": "openai/gpt-4o-mini (json_schema)", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1327,6 +1328,7 @@ }, { "name": "anthropic/claude-haiku (forced tool)", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Forced tool: emit_city invoked with schema-compliant arguments', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('emit_city'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); pm.expect(a).to.have.property('country').that.is.a('string'); pm.expect(a).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1336,6 +1338,7 @@ }, { "name": "gemini/gemini-2.5-flash (responseSchema)", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (pp) { return pp && pp.text; }); var c = t ? t.text : ''; pm.expect(c, 'candidates[0].content.parts[*].text empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('parts.text not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], @@ -1345,6 +1348,7 @@ }, { "name": "vertex/gemini-2.5-pro (responseSchema)", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (pp) { return pp && pp.text; }); var c = t ? t.text : ''; pm.expect(c, 'candidates[0].content.parts[*].text empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('parts.text not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], @@ -1391,6 +1395,7 @@ "item": [ { "name": "openai/gpt-4o-mini", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1400,6 +1405,7 @@ }, { "name": "anthropic/claude-haiku-4-5", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1409,6 +1415,7 @@ }, { "name": "bedrock/global.anthropic.claude-sonnet-4-6", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1418,6 +1425,7 @@ }, { "name": "gemini/gemini-2.5-flash", + "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], @@ -1520,6 +1528,9 @@ { "name": "OpenAI: service_tier auto", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"service_tier\": \"auto\"\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai","v1","chat","completions"]}}}, { "name": "OpenAI: store + metadata", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"store\": true,\n \"metadata\": {\"harness\": \"backlog\"}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai","v1","chat","completions"]}}}, { "name": "OpenAI Responses: reasoning summary", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"o3-mini\",\n \"input\": \"What's 17*23?\",\n \"reasoning\": {\"summary\": \"auto\"}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, + { "name": "OpenAI Responses streaming: summary_index + obfuscation preserved", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code >= 400) { return; }","pm.test('summary_index and obfuscation survive stream', function () {"," var body = pm.response.text() || '';"," pm.expect(body, 'expected summary_index in SSE body').to.include('\"summary_index\"');"," pm.expect(body, 'expected obfuscation in SSE body').to.include('\"obfuscation\"');","});"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"o3-mini\",\n \"input\": \"What's 17*23?\",\n \"reasoning\": {\"summary\": \"auto\"},\n \"stream\": true,\n \"stream_options\": {\"include_obfuscation\": true}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, + { "name": "OpenAI Responses streaming: assistant phase preserved (gpt-5.3-codex)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code >= 400) { return; }","pm.test('phase appears on assistant message items', function () {"," var body = pm.response.text() || '';"," var hasPhase = body.indexOf('\"phase\":\"final_answer\"') !== -1 || body.indexOf('\"phase\":\"commentary\"') !== -1;"," pm.expect(hasPhase, 'no phase field in SSE body. First 200 chars: ' + body.slice(0,200)).to.be.true;","});"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5.3-codex\",\n \"input\": \"Solve 2+2 and explain your steps briefly.\",\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, + { "name": "OpenAI Responses: assistant phase input round-trip (gpt-5.3-codex)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code >= 400) {"," pm.test('phase field accepted on input message', function () {"," var body = (pm.response.text() || '').toLowerCase();"," pm.expect(body, 'unexpected rejection of phase field: ' + body.slice(0,200)).to.not.include('unknown field \"phase\"');"," pm.expect(body).to.not.include('unexpected field \"phase\"');"," });"," return;","}","pm.test('response returned output items', function () {"," var body = pm.response.text() || '';"," pm.expect(body).to.include('\"output\"');","});"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5.3-codex\",\n \"input\": [\n {\"role\":\"user\",\"content\":\"What's 2+2?\"},\n {\"role\":\"assistant\",\"phase\":\"final_answer\",\"content\":\"4\"},\n {\"role\":\"user\",\"content\":\"Now what's 3+3?\"}\n ]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, { "name": "OpenAI Responses: background mode", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"input\": \"Hi\",\n \"background\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, { "name": "OpenAI Responses: truncation auto", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"input\": \"Hi\",\n \"truncation\": \"auto\"\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, { "name": "OpenAI Responses: include array", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"input\": \"Hi\",\n \"include\": [\"message.input_image.image_url\"]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/responses","host":["{{baseUrl}}"],"path":["openai","v1","responses"]}}}, @@ -1617,13 +1628,13 @@ { "name": "Cross-cut: code execution Gemini", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Compute 50!\"}],\n \"tools\": [{\"type\":\"code_execution\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: extended thinking via cross-model", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Plan a trip\"}],\n \"thinking\": {\"type\":\"enabled\",\"budget_tokens\":2000},\n \"max_tokens\": 4096\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: prompt caching via cross-model", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"system\",\"content\":[{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}]},{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, - { "name": "Cross-cut: stop sequences (OpenAI)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, - { "name": "Cross-cut: stop sequences (Anthropic)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, - { "name": "Cross-cut: stop sequences (Gemini)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: stop sequences (OpenAI)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked into content').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter'], 'unexpected finish_reason: ' + fr).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: stop sequences (Anthropic)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked into content').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter'], 'unexpected finish_reason: ' + fr).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: stop sequences (Gemini)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked into content').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter'], 'unexpected finish_reason: ' + fr).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: tool_choice forced (OpenAI)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"f\",\"parameters\":{\"type\":\"object\"}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: tool_choice forced (Bedrock)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"f\",\"parameters\":{\"type\":\"object\"}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, - { "name": "Cross-cut: structured output Anthropic", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, - { "name": "Cross-cut: structured output Bedrock", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: structured output Anthropic", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: structured output Bedrock", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: vision Bedrock", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: vision Vertex (Gemini)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Cross-cut: web search Bedrock (Anthropic)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, @@ -1660,7 +1671,7 @@ { "name": "Vertex Gemini: vision", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{vertexModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{vertexModel}}:generateContent"]}}}, { "name": "Vertex Gemini: code execution", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Compute fib(20)\"}]}],\n \"tools\": [{\"codeExecution\":{}}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{vertexModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{vertexModel}}:generateContent"]}}}, { "name": "Vertex Gemini: thinking budget", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Solve 17*23\"}]}],\n \"generationConfig\": {\"thinkingConfig\":{\"thinkingBudget\":4000}}\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{vertexModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{vertexModel}}:generateContent"]}}}, - { "name": "Vertex Gemini: structured output json_schema (via /v1/chat)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Gemini: structured output json_schema (via /v1/chat)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Vertex Claude: extended thinking", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"adaptive\"},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Vertex Claude: web search", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "Vertex Claude: prompt caching", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, @@ -1712,7 +1723,7 @@ { "name": "Anthropic: allowed_callers + advanced beta", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"},{"key":"anthropic-beta","value":"advanced-tool-use-2025-09-15"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-opus-4-7\",\n \"max_tokens\": 256,\n \"tools\": [{\"name\":\"f\",\"input_schema\":{\"type\":\"object\"},\"allowed_callers\":[\"direct\"]}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic","v1","messages"]}}}, { "name": "Anthropic: skills/container", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"},{"key":"anthropic-beta","value":"skills-2025-10-29"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"container\": {\"skills\":[{\"skill_id\":\"data-analysis\",\"type\":\"anthropic\"}]},\n \"messages\": [{\"role\":\"user\",\"content\":\"Analyze\"}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic","v1","messages"]}}}, { "name": "Gemini: parallel function calls", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Weather in NYC and SF?\"}]}],\n \"tools\": [{\"functionDeclarations\":[{\"name\":\"get_weather\",\"parameters\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"}}}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{genaiModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{genaiModel}}:generateContent"]}}}, - { "name": "Gemini: structured output via /v1/chat (json_schema)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Gemini: structured output via /v1/chat (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Pick a city\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}}}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, { "name": "[PREVIEW] Gemini: cached content reference", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Use cache\"}]}],\n \"cachedContent\": \"cachedContents/REPLACE_ME\"\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{genaiModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{genaiModel}}:generateContent"]}}}, { "name": "Gemini: audio input (inline)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Transcribe\"},{\"inlineData\":{\"mimeType\":\"audio/wav\",\"data\":\"UklGRkQDAABXQVZFZm10IBAAAAABAAEAQB8AAIA+AAACABAAZGF0YSADAAAAAJUK6xPlGrIe3R5iG6oUgAv7AFj22eye5YHhAOEo5Jzql/MK/rcIXhLYGUQeHB9GHBgWTg3wAjv4cO645v7h0OBS4znp0fEW/NEGvhCxGLgdPB8OHXAXDg/jBCX6GPDs55niwOCZ4uznGPAl+uMEDg9wFw4dPB+4HbEYvhDRBhb80fE56VLj0OD+4bjmcO47+PACTg0YFkYcHB9EHtgZXhK3CAr+l/Oc6ijkAOGB4Z7l2exY9vsAgAuqFGIb3R6yHuUa6xOVCgAAa/UV7BvlTuEj4Z7kVuuA9AX/qAknE2Iafx4AH9gbZBVpDPYBSfei7SjmvOHk4Lrj6Omy8hD9xQeQEUgZAh4wH64cxxYvDuoDL/lC70/nSOLE4PLikOjy8B372wXoDxQYZx1AH2cdFBjoD9sFHfvy8JDo8uLE4EjiT+dC7y/56gMvDscWrhwwHwIeSBmQEcUHEP2y8ujpuuPk4LzhKOai7Un39gFpDGQV2BsAH38eYhonE6gJBf+A9FbrnuQj4U7hG+UV7Gv1AACVCusT5RqyHt0eYhuqFIAL+wBY9tnsnuWB4QDhKOSc6pfzCv63CF4S2BlEHhwfRhwYFk4N8AI7+HDuuOb+4dDgUuM56dHxFvzRBr4QsRi4HTwfDh1wFw4P4wQl+hjw7OeZ4sDgmeLs5xjwJfrjBA4PcBcOHTwfuB2xGL4Q0QYW/NHxOelS49Dg/uG45nDuO/jwAk4NGBZGHBwfRB7YGV4StwgK/pfznOoo5ADhgeGe5dnsWPb7AIALqhRiG90esh7lGusTlQoAAGv1Fewb5U7hI+Ge5FbrgPQF/6gJJxNiGn8eAB/YG2QVaQz2AUn3ou0o5rzh5OC64+jpsvIQ/cUHkBFIGQIeMB+uHMcWLw7qAy/5Qu9P50jixODy4pDo8vAd+9sF6A8UGGcdQB9nHRQY6A/bBR378vCQ6PLixOBI4k/nQu8v+eoDLw7HFq4cMB8CHkgZkBHFBxD9svLo6brj5OC84Sjmou1J9/YBaQxkFdgbAB9/HmIaJxOoCQX/gPRW657kI+FO4RvlFexr9Q==\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/{{genaiModel}}:generateContent","host":["{{baseUrl}}"],"path":["genai","v1beta","models","{{genaiModel}}:generateContent"]}}}, { "name": "Gemini: list cached contents", "request": { "method": "GET", "header": [{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "url": {"raw":"{{baseUrl}}/genai/v1beta/cachedContents","host":["{{baseUrl}}"],"path":["genai","v1beta","cachedContents"]}}}, @@ -1782,7 +1793,375 @@ { "name": "[PREVIEW] Azure: skills/container", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"api-key","value":"{{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"input\": \"Use skills\",\n \"tools\": [{\"type\":\"code_interpreter\",\"container\":{\"type\":\"auto\"}}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/openai/deployments/{{azureDeployment}}/responses?api-version=2025-04-01-preview","host":["{{baseUrl}}"],"path":["openai","openai","deployments","{{azureDeployment}}","responses"],"query":[{"key":"api-version","value":"2025-04-01-preview"}]}}}, { "name": "Azure: service_tier scale", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"api-key","value":"{{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"service_tier\": \"flex\"\n}"}, "url": {"raw":"{{baseUrl}}/openai/openai/deployments/{{azureDeployment}}/chat/completions?api-version={{azureApiVersion}}","host":["{{baseUrl}}"],"path":["openai","openai","deployments","{{azureDeployment}}","chat","completions"],"query":[{"key":"api-version","value":"{{azureApiVersion}}"}]}}} ] + }, + { + "name": "Cross-Cut Round 4 (Vertex Claude + Vertex Gemini + Azure cross-cut)", + "item": [ + { "name": "Vertex Claude: structured output (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (CityInfo)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('name').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p.name.toLowerCase(), 'expected Paris').to.include('paris'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"system\",\"content\":\"Extract the city information from the user's message.\"},{\"role\":\"user\",\"content\":\"I visited Paris, France last summer.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"CityInfo\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"}},\"required\":[\"name\",\"country\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: function calling cross-cut", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: streaming", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: vision", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"What is in this image?\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: tool_choice forced", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"f\",\"parameters\":{\"type\":\"object\"}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: stop sequences", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: multi-turn cross-cut", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: system message cross-cut", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: web search via /v1/chat (sonnet)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: code execution via /v1/chat (opus)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Compute 50!\"}],\n \"tools\": [{\"type\":\"code_execution_20250522\",\"name\":\"code_execution\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: PDF input via /v1/chat (sonnet)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"document\",\"source\":{\"type\":\"url\",\"url\":\"https://www.berkshirehathaway.com/letters/2024ltr.pdf\"}},{\"type\":\"text\",\"text\":\"Summarize\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Claude: sampling-params dropped for Opus 4.7", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"temperature\": 0.7,\n \"top_p\": 0.9\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Gemini: function calling cross-cut", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city argument', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls in response').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('arguments not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Vertex Gemini: streaming cross-cut", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: web search Vertex Gemini (google_search)", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"google_search\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: stop sequences (Bedrock)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked into content').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter'], 'unexpected finish_reason: ' + fr).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: stop sequences (Vertex Gemini)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked into content').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter'], 'unexpected finish_reason: ' + fr).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: Azure basic chat", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: Azure tools", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: Azure structured output (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country for Paris\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"}},\"required\":[\"city\",\"country\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: Azure streaming", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: Azure vision", "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 5: Structured Output Matrix (response_format json_schema via /v1/chat across providers/models)", + "item": [ + { "name": "Cross-cut: openai/gpt-5 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-5-mini (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4.1 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/o3-mini (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/o3-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-haiku-4-5 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "[PREVIEW] Cross-cut: bedrock/nova-pro (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/us.amazon.nova-pro-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "[PREVIEW] Cross-cut: bedrock/nova-lite (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/us.amazon.nova-lite-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: schema-compliant JSON (city/country/pop)', function () { var j = pm.response.json(); var c = ''; if (j.choices && j.choices[0] && j.choices[0].message) { c = j.choices[0].message.content || ''; } if (!c && Array.isArray(j.content)) { var tb = j.content.find(function (b) { return b.type === 'text' && b.text; }); c = tb ? tb.text : ''; } pm.expect(c, 'content was empty').to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('content not JSON: ' + e.message + ' (got: ' + c.slice(0,120) + ')'); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 6: Function Calling Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-5-mini function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4.1 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/o3-mini function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/o3-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "[PREVIEW] Cross-cut: bedrock/us.amazon.nova-pro-v1:0 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/us.amazon.nova-pro-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-pro function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-flash function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: azure/{{azureDeployment}} function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked with city', function () { var j = pm.response.json(); var tc = null; if (j.choices && j.choices[0] && j.choices[0].message && Array.isArray(j.choices[0].message.tool_calls) && j.choices[0].message.tool_calls.length) { tc = j.choices[0].message.tool_calls[0]; } pm.expect(tc, 'no tool_calls').to.not.be.null; pm.expect(tc.function && tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 7: Streaming Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-5-mini streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4.1 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/o3-mini streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/o3-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-pro streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-flash streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: azure/{{azureDeployment}} streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 8: Vision Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4.1 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-pro vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-flash vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: azure/{{azureDeployment}} vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 9: Tool Choice Forced Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-5-mini tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4.1 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: azure/{{azureDeployment}} tool_choice forced", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Tool choice forced: tool_calls present', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls; pm.expect(Array.isArray(tc) && tc.length > 0).to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"azure/{{azureDeployment}}\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}],\n \"tool_choice\": \"required\"\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 10: Stop Sequences Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-flash stop sequences", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Stop sequence: halted before stop token', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; var fr = (j.choices && j.choices[0] && j.choices[0].finish_reason) || ''; pm.expect(c.toLowerCase(), 'stop token \"three\" leaked').to.not.include('three'); pm.expect(['stop','stop_sequence','length','content_filter']).to.include(fr); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count: one, two, three, four\"}],\n \"stop\": [\"three\"]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 11: Multi-turn Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-pro multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 12: System Message Matrix", + "item": [ + { "name": "Cross-cut: openai/gpt-5 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-5\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: openai/gpt-4o system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"openai/gpt-4o\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-opus-4-7 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-opus-4-7 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/global.anthropic.claude-sonnet-4-6 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-pro system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-pro system message", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-pro\",\n \"messages\": [{\"role\":\"system\",\"content\":\"You are a pirate.\"},{\"role\":\"user\",\"content\":\"Greet me\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 13: Web Search Matrix", + "item": [ + { "name": "Cross-cut: anthropic/claude-opus-4-7 web_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 web_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 web_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 web_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"web_search_20250305\",\"name\":\"web_search\",\"max_uses\":2}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: gemini/gemini-2.5-flash google_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gemini/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"google_search\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/gemini-2.5-flash google_search", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Web search: response non-empty', function () { var raw = JSON.stringify(pm.response.json()); pm.expect(raw.length).to.be.greaterThan(100); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/gemini-2.5-flash\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Latest news\"}],\n \"tools\": [{\"type\":\"google_search\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 14: Code Execution Matrix", + "item": [ + { "name": "Cross-cut: anthropic/claude-opus-4-7 code_execution", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Compute 50!\"}],\n \"tools\": [{\"type\":\"code_execution_20250522\",\"name\":\"code_execution\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 code_execution", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Compute 50!\"}],\n \"tools\": [{\"type\":\"code_execution_20250522\",\"name\":\"code_execution\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 code_execution", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Compute 50!\"}],\n \"tools\": [{\"type\":\"code_execution_20250522\",\"name\":\"code_execution\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 15: Extended/Adaptive Thinking Matrix", + "item": [ + { "name": "Cross-cut: anthropic/claude-opus-4-7 adaptive thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"adaptive\"},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 enabled thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"enabled\",\"budget_tokens\":2000},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 adaptive thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"adaptive\"},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-sonnet-4-6 enabled thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-sonnet-4-6\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"enabled\",\"budget_tokens\":2000},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 adaptive thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"adaptive\"},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 enabled thinking", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"max_tokens\": 4096,\n \"thinking\": {\"type\":\"enabled\",\"budget_tokens\":2000},\n \"messages\": [{\"role\":\"user\",\"content\":\"Solve in steps\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 16: Prompt Caching Matrix", + "item": [ + { "name": "Cross-cut: anthropic/claude-opus-4-7 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-haiku-4-5 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-haiku-4-5 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-sonnet-4-6 prompt caching", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Prompt caching: usage present', function () { var j = pm.response.json(); pm.expect(j.usage || {}).to.be.an('object'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-sonnet-4-6\",\n \"max_tokens\": 256,\n \"system\": [{\"type\":\"text\",\"text\":\"Long ctx\",\"cache_control\":{\"type\":\"ephemeral\"}}],\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 17: PDF Input Matrix", + "item": [ + { "name": "Cross-cut: anthropic/claude-opus-4-7 PDF input", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-opus-4-7\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"document\",\"source\":{\"type\":\"url\",\"url\":\"https://www.berkshirehathaway.com/letters/2024ltr.pdf\"}},{\"type\":\"text\",\"text\":\"Summarize\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: anthropic/claude-sonnet-4-6 PDF input", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"anthropic/claude-sonnet-4-6\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"document\",\"source\":{\"type\":\"url\",\"url\":\"https://www.berkshirehathaway.com/letters/2024ltr.pdf\"}},{\"type\":\"text\",\"text\":\"Summarize\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: bedrock/claude-opus-4-7 PDF input", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"bedrock/global.anthropic.claude-opus-4-7\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"document\",\"source\":{\"type\":\"url\",\"url\":\"https://www.berkshirehathaway.com/letters/2024ltr.pdf\"}},{\"type\":\"text\",\"text\":\"Summarize\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}}, + { "name": "Cross-cut: vertex/claude-opus-4-7 PDF input", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"vertex/claude-opus-4-7\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"document\",\"source\":{\"type\":\"url\",\"url\":\"https://www.berkshirehathaway.com/letters/2024ltr.pdf\"}},{\"type\":\"text\",\"text\":\"Summarize\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/v1/chat/completions","host":["{{baseUrl}}"],"path":["v1","chat","completions"]}}} + ] + }, + { + "name": "Cross-Cut Round 18: Cohere Drop-in Smoke", + "item": [ + { "name": "Cohere drop-in: basic chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/cohere/v2/chat","host":["{{baseUrl}}"],"path":["cohere", "v2", "chat"]} } }, + { "name": "Cohere drop-in: streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var raw = JSON.stringify(j); pm.expect(raw.length, 'body too small').to.be.greaterThan(50); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}]\n,\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/cohere/v2/chat","host":["{{baseUrl}}"],"path":["cohere", "v2", "chat"]} } }, + { "name": "Cohere drop-in: multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/cohere/v2/chat","host":["{{baseUrl}}"],"path":["cohere", "v2", "chat"]} } }, + { "name": "Cohere drop-in: tools", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/cohere/v2/chat","host":["{{baseUrl}}"],"path":["cohere", "v2", "chat"]} } }, + { "name": "Cohere drop-in: list models", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Response content non-empty', function () { var j = pm.response.json(); var raw = JSON.stringify(j); pm.expect(raw.length, 'body too small').to.be.greaterThan(50); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":""}, "url": {"raw":"{{baseUrl}}/cohere/v1/models","host":["{{baseUrl}}"],"path":["cohere", "v1", "models"]} } } + ] + }, + { + "name": "Cross-Cut Round 19: LangChain Drop-in Smoke", + "item": [ + { "name": "LangChain drop-in: OpenAI shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('OpenAI shape: choices[0].message.content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && (j.choices[0].message.content || (j.choices[0].message.tool_calls && j.choices[0].message.tool_calls.length))) || ''; pm.expect(c, 'no content or tool_calls').to.be.ok; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1/chat/completions","host":["{{baseUrl}}"],"path":["langchain", "v1", "chat", "completions"]} } }, + { "name": "LangChain drop-in: Anthropic shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1/messages","host":["{{baseUrl}}"],"path":["langchain", "v1", "messages"]} } }, + { "name": "LangChain drop-in: Gemini shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Hello\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["langchain", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "LangChain drop-in: Bedrock shape converse", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Hello\"}]}],\n \"inferenceConfig\": {\"maxTokens\": 256}\n}"}, "url": {"raw":"{{baseUrl}}/langchain/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse","host":["{{baseUrl}}"],"path":["langchain", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse"]} } }, + { "name": "LangChain drop-in: Cohere shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v2/chat","host":["{{baseUrl}}"],"path":["langchain", "v2", "chat"]} } } + ] + }, + { + "name": "Cross-Cut Round 20: LiteLLM Drop-in Smoke", + "item": [ + { "name": "LiteLLM drop-in: OpenAI shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('OpenAI shape: choices[0].message.content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && (j.choices[0].message.content || (j.choices[0].message.tool_calls && j.choices[0].message.tool_calls.length))) || ''; pm.expect(c, 'no content or tool_calls').to.be.ok; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1/chat/completions","host":["{{baseUrl}}"],"path":["litellm", "v1", "chat", "completions"]} } }, + { "name": "LiteLLM drop-in: Anthropic shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1/messages","host":["{{baseUrl}}"],"path":["litellm", "v1", "messages"]} } }, + { "name": "LiteLLM drop-in: Gemini shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Hello\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["litellm", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "LiteLLM drop-in: Bedrock shape converse", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Hello\"}]}],\n \"inferenceConfig\": {\"maxTokens\": 256}\n}"}, "url": {"raw":"{{baseUrl}}/litellm/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse","host":["{{baseUrl}}"],"path":["litellm", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse"]} } }, + { "name": "LiteLLM drop-in: Cohere shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v2/chat","host":["{{baseUrl}}"],"path":["litellm", "v2", "chat"]} } } + ] + }, + { + "name": "Cross-Cut Round 21: PydanticAI Drop-in Smoke", + "item": [ + { "name": "PydanticAI drop-in: OpenAI shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('OpenAI shape: choices[0].message.content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && (j.choices[0].message.content || (j.choices[0].message.tool_calls && j.choices[0].message.tool_calls.length))) || ''; pm.expect(c, 'no content or tool_calls').to.be.ok; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1/chat/completions","host":["{{baseUrl}}"],"path":["pydanticai", "v1", "chat", "completions"]} } }, + { "name": "PydanticAI drop-in: Anthropic shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1/messages","host":["{{baseUrl}}"],"path":["pydanticai", "v1", "messages"]} } }, + { "name": "PydanticAI drop-in: Gemini shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Hello\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["pydanticai", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "PydanticAI drop-in: Bedrock shape converse", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Hello\"}]}],\n \"inferenceConfig\": {\"maxTokens\": 256}\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse","host":["{{baseUrl}}"],"path":["pydanticai", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse"]} } }, + { "name": "PydanticAI drop-in: Cohere shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Cohere shape: message.content non-empty', function () { var j = pm.response.json(); var c = j.message && j.message.content; pm.expect(Array.isArray(c) ? c.length > 0 : (typeof c === 'string' && c.length > 0), 'expected non-empty cohere message content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"command-r-plus\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v2/chat","host":["{{baseUrl}}"],"path":["pydanticai", "v2", "chat"]} } } + ] + }, + { + "name": "Cross-Cut Round 22: Cursor Drop-in Smoke", + "item": [ + { "name": "Cursor drop-in: OpenAI shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('OpenAI shape: choices[0].message.content non-empty', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && (j.choices[0].message.content || (j.choices[0].message.tool_calls && j.choices[0].message.tool_calls.length))) || ''; pm.expect(c, 'no content or tool_calls').to.be.ok; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1/chat/completions","host":["{{baseUrl}}"],"path":["cursor", "v1", "chat", "completions"]} } }, + { "name": "Cursor drop-in: Anthropic shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Hello\"}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1/messages","host":["{{baseUrl}}"],"path":["cursor", "v1", "messages"]} } }, + { "name": "Cursor drop-in: Gemini shape chat", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Hello\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["cursor", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Cursor drop-in: Bedrock shape converse", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Hello\"}]}],\n \"inferenceConfig\": {\"maxTokens\": 256}\n}"}, "url": {"raw":"{{baseUrl}}/cursor/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse","host":["{{baseUrl}}"],"path":["cursor", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse"]} } } + ] + }, + { + "name": "Cross-Cut Round 23: Drop-in Structured Output Matrix (native shapes via /openai, /anthropic, /bedrock, /genai)", + "item": [ + { "name": "Drop-in /openai: gpt-5 (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: JSON with city/country/pop', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('not JSON: ' + e.message); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4o (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: JSON with city/country/pop', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('not JSON: ' + e.message); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4o-mini (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: JSON with city/country/pop', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('not JSON: ' + e.message); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /anthropic: claude-haiku-4-5 (forced tool emit_city)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"tools\": [{\"name\":\"emit_city\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"]}}],\n \"tool_choice\": {\"type\":\"tool\",\"name\":\"emit_city\"}\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic: claude-sonnet-4-6 (forced tool emit_city)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic shape: content array non-empty', function () { var j = pm.response.json(); pm.expect(Array.isArray(j.content) && j.content.length > 0, 'expected non-empty content array').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-sonnet-4-6\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"tools\": [{\"name\":\"emit_city\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"]}}],\n \"tool_choice\": {\"type\":\"tool\",\"name\":\"emit_city\"}\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /bedrock: claude-haiku Converse (toolChoice emit_city)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Extract city/country/pop for Paris.\"}]}],\n \"toolConfig\": {\"tools\":[{\"toolSpec\":{\"name\":\"emit_city\",\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"]}}}}],\"toolChoice\":{\"tool\":{\"name\":\"emit_city\"}}}\n}"}, "url": {"raw":"{{baseUrl}}/bedrock/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse","host":["{{baseUrl}}"],"path":["bedrock", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse"]} } }, + { "name": "Drop-in /genai: gemini-2.5-flash (responseSchema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Extract city/country/pop for Paris.\"}]}],\n \"generationConfig\": {\"responseMimeType\":\"application/json\",\"responseSchema\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"},\"country\":{\"type\":\"STRING\"},\"pop\":{\"type\":\"NUMBER\"}}}}\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /genai: gemini-2.5-pro (responseSchema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini shape: candidates[0].content.parts non-empty', function () { var j = pm.response.json(); var parts = j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts; pm.expect(Array.isArray(parts) && parts.length > 0, 'expected non-empty parts').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Extract city/country/pop for Paris.\"}]}],\n \"generationConfig\": {\"responseMimeType\":\"application/json\",\"responseSchema\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"},\"country\":{\"type\":\"STRING\"},\"pop\":{\"type\":\"NUMBER\"}}}}\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-pro:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-pro:generateContent"]} } } + ] + }, + { + "name": "Cross-Cut Round 24: Drop-in Function Calling Matrix (native shapes)", + "item": [ + { "name": "Drop-in /openai: gpt-5 function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls && j.choices[0].message.tool_calls[0]; pm.expect(tc, 'no tool_calls').to.be.ok; pm.expect(tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4o function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls && j.choices[0].message.tool_calls[0]; pm.expect(tc, 'no tool_calls').to.be.ok; pm.expect(tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4o-mini function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls && j.choices[0].message.tool_calls[0]; pm.expect(tc, 'no tool_calls').to.be.ok; pm.expect(tc.function.name).to.equal('get_weather'); var a; try { a = JSON.parse(tc.function.arguments); } catch (e) { pm.expect.fail('args not JSON: ' + e.message); return; } pm.expect(a).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /anthropic: claude-opus-4-7 tool_use", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic: tool_use block with get_weather', function () { var j = pm.response.json(); var tu = (j.content || []).find(function (b) { return b.type === 'tool_use'; }); pm.expect(tu, 'no tool_use block').to.be.ok; pm.expect(tu.name).to.equal('get_weather'); pm.expect(tu.input).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-opus-4-7\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"name\":\"get_weather\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic: claude-sonnet-4-6 tool_use", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic: tool_use block with get_weather', function () { var j = pm.response.json(); var tu = (j.content || []).find(function (b) { return b.type === 'tool_use'; }); pm.expect(tu, 'no tool_use block').to.be.ok; pm.expect(tu.name).to.equal('get_weather'); pm.expect(tu.input).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-sonnet-4-6\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"name\":\"get_weather\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic: claude-haiku-4-5 tool_use", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic: tool_use block with get_weather', function () { var j = pm.response.json(); var tu = (j.content || []).find(function (b) { return b.type === 'tool_use'; }); pm.expect(tu, 'no tool_use block').to.be.ok; pm.expect(tu.name).to.equal('get_weather'); pm.expect(tu.input).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"name\":\"get_weather\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /bedrock: claude-sonnet-4-6 Converse tool_use", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock Converse shape: output.message.content non-empty', function () { var j = pm.response.json(); var content = j.output && j.output.message && j.output.message.content; pm.expect(Array.isArray(content) && content.length > 0, 'expected non-empty content').to.be.true; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Weather in Lagos?\"}]}],\n \"toolConfig\": {\"tools\":[{\"toolSpec\":{\"name\":\"get_weather\",\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}}]}\n}"}, "url": {"raw":"{{baseUrl}}/bedrock/model/global.anthropic.claude-sonnet-4-6/converse","host":["{{baseUrl}}"],"path":["bedrock", "model", "global.anthropic.claude-sonnet-4-6", "converse"]} } }, + { "name": "Drop-in /genai: gemini-2.5-flash functionDeclarations", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini: functionCall in parts', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var fc = parts.find(function (p) { return p && p.functionCall; }); pm.expect(fc, 'no functionCall').to.be.ok; pm.expect(fc.functionCall.name).to.equal('get_weather'); pm.expect(fc.functionCall.args).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Weather in Lagos?\"}]}],\n \"tools\": [{\"functionDeclarations\":[{\"name\":\"get_weather\",\"parameters\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"}},\"required\":[\"city\"]}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /genai: gemini-2.5-pro functionDeclarations", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini: functionCall in parts', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var fc = parts.find(function (p) { return p && p.functionCall; }); pm.expect(fc, 'no functionCall').to.be.ok; pm.expect(fc.functionCall.name).to.equal('get_weather'); pm.expect(fc.functionCall.args).to.have.property('city').that.is.a('string'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Weather in Lagos?\"}]}],\n \"tools\": [{\"functionDeclarations\":[{\"name\":\"get_weather\",\"parameters\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"}},\"required\":[\"city\"]}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-pro:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-pro:generateContent"]} } } + ] + }, + { + "name": "Cross-Cut Round 25: Drop-in Vision Matrix (native shapes)", + "item": [ + { "name": "Drop-in /openai: gpt-5 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4o vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai: gpt-4.1 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4.1\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /anthropic: claude-opus-4-7 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-opus-4-7\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic: claude-sonnet-4-6 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-sonnet-4-6\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic: claude-haiku-4-5 vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /genai: gemini-2.5-flash vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /genai: gemini-2.5-pro vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-pro:generateContent","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-pro:generateContent"]} } } + ] + }, + { + "name": "Cross-Cut Round 26: Drop-in Streaming Matrix (native shapes)", + "item": [ + { "name": "Drop-in /openai stream: gpt-4o", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai stream: gpt-4o-mini", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /openai stream: gpt-5", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-5\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /anthropic stream: claude-opus-4-7", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-opus-4-7\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic stream: claude-sonnet-4-6", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-sonnet-4-6\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /anthropic stream: claude-haiku-4-5", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/anthropic/v1/messages","host":["{{baseUrl}}"],"path":["anthropic", "v1", "messages"]} } }, + { "name": "Drop-in /bedrock stream: claude-haiku Converse", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Bedrock stream: AWS event-stream or SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected stream content-type, got ' + ct).to.match(/event-stream|vnd\\.amazon\\.eventstream/); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"text\":\"Count 1-5.\"}]}],\n \"inferenceConfig\": {\"maxTokens\": 256}\n}"}, "url": {"raw":"{{baseUrl}}/bedrock/model/global.anthropic.claude-haiku-4-5-20251001-v1:0/converse-stream","host":["{{baseUrl}}"],"path":["bedrock", "model", "global.anthropic.claude-haiku-4-5-20251001-v1:0", "converse-stream"]} } }, + { "name": "Drop-in /genai stream: gemini-2.5-flash", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Count 1-5.\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-flash:streamGenerateContent"],"query":[{"key":"alt","value":"sse"}]} } }, + { "name": "Drop-in /genai stream: gemini-2.5-pro", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Count 1-5.\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse","host":["{{baseUrl}}"],"path":["genai", "v1beta", "models", "gemini-2.5-pro:streamGenerateContent"],"query":[{"key":"alt","value":"sse"}]} } } + ] + }, + { + "name": "Cross-Cut Round 27: Drop-in Umbrella Vision Matrix (vision via /langchain, /litellm, /pydanticai, /cursor)", + "item": [ + { "name": "Drop-in /langchain vision: OpenAI shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1/chat/completions","host":["{{baseUrl}}"],"path":["langchain", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /langchain vision: Anthropic shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1/messages","host":["{{baseUrl}}"],"path":["langchain", "v1", "messages"]} } }, + { "name": "Drop-in /langchain vision: Gemini shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/langchain/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["langchain", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /litellm vision: OpenAI shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1/chat/completions","host":["{{baseUrl}}"],"path":["litellm", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /litellm vision: Anthropic shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1/messages","host":["{{baseUrl}}"],"path":["litellm", "v1", "messages"]} } }, + { "name": "Drop-in /litellm vision: Gemini shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/litellm/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["litellm", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /pydanticai vision: OpenAI shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1/chat/completions","host":["{{baseUrl}}"],"path":["pydanticai", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /pydanticai vision: Anthropic shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1/messages","host":["{{baseUrl}}"],"path":["pydanticai", "v1", "messages"]} } }, + { "name": "Drop-in /pydanticai vision: Gemini shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/pydanticai/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["pydanticai", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Drop-in /cursor vision: OpenAI shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o\",\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1/chat/completions","host":["{{baseUrl}}"],"path":["cursor", "v1", "chat", "completions"]} } }, + { "name": "Drop-in /cursor vision: Anthropic shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1/messages","host":["{{baseUrl}}"],"path":["cursor", "v1", "messages"]} } }, + { "name": "Drop-in /cursor vision: Gemini shape", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/cursor/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["cursor", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } } + ] + }, + { + "name": "Cross-Cut Round 28: Passthrough Advanced Matrix (features via *_passthrough byte-for-byte routes)", + "item": [ + { "name": "Passthrough /openai: structured output (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: JSON with city/country/pop', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('not JSON: ' + e.message); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/openai_passthrough/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai_passthrough", "v1", "chat", "completions"]} } }, + { "name": "Passthrough /openai: function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Function call: get_weather invoked', function () { var j = pm.response.json(); var tc = j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.tool_calls && j.choices[0].message.tool_calls[0]; pm.expect(tc, 'no tool_calls').to.be.ok; pm.expect(tc.function.name).to.equal('get_weather'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"parameters\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}}]\n}"}, "url": {"raw":"{{baseUrl}}/openai_passthrough/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai_passthrough", "v1", "chat", "completions"]} } }, + { "name": "Passthrough /openai: streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"Authorization","value":"Bearer {{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"model\": \"gpt-4o-mini\",\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/openai_passthrough/v1/chat/completions","host":["{{baseUrl}}"],"path":["openai_passthrough", "v1", "chat", "completions"]} } }, + { "name": "Passthrough /anthropic: function calling (tool_use)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Anthropic: tool_use block', function () { var j = pm.response.json(); var tu = (j.content || []).find(function (b) { return b.type === 'tool_use'; }); pm.expect(tu, 'no tool_use').to.be.ok; pm.expect(tu.name).to.equal('get_weather'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 1024,\n \"messages\": [{\"role\":\"user\",\"content\":\"Weather in Lagos?\"}],\n \"tools\": [{\"name\":\"get_weather\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic_passthrough/v1/messages","host":["{{baseUrl}}"],"path":["anthropic_passthrough", "v1", "messages"]} } }, + { "name": "Passthrough /anthropic: vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text block describes image', function () { var j = pm.response.json(); var t = (j.content || []).find(function (b) { return b.type === 'text' && b.text; }); pm.expect(t, 'no text block').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 512,\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"image\",\"source\":{\"type\":\"url\",\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}},{\"type\":\"text\",\"text\":\"Describe\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic_passthrough/v1/messages","host":["{{baseUrl}}"],"path":["anthropic_passthrough", "v1", "messages"]} } }, + { "name": "Passthrough /anthropic: multi-turn", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Multi-turn: response present', function () { var j = pm.response.json(); var c = (j.content && j.content.length > 0) || (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content); pm.expect(c, 'no content').to.be.ok; }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Hi\"},{\"role\":\"assistant\",\"content\":\"Hello\"},{\"role\":\"user\",\"content\":\"How are you?\"}]\n}"}, "url": {"raw":"{{baseUrl}}/anthropic_passthrough/v1/messages","host":["{{baseUrl}}"],"path":["anthropic_passthrough", "v1", "messages"]} } }, + { "name": "Passthrough /anthropic: streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-api-key","value":"{{anthropicKey}}"},{"key":"anthropic-version","value":"2023-06-01"}], "body": {"mode":"raw","raw":"{\n \"model\": \"claude-haiku-4-5\",\n \"max_tokens\": 256,\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/anthropic_passthrough/v1/messages","host":["{{baseUrl}}"],"path":["anthropic_passthrough", "v1", "messages"]} } }, + { "name": "Passthrough /genai: function calling", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Gemini: functionCall present', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var fc = parts.find(function (p) { return p && p.functionCall; }); pm.expect(fc, 'no functionCall').to.be.ok; pm.expect(fc.functionCall.name).to.equal('get_weather'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Weather in Lagos?\"}]}],\n \"tools\": [{\"functionDeclarations\":[{\"name\":\"get_weather\",\"parameters\":{\"type\":\"OBJECT\",\"properties\":{\"city\":{\"type\":\"STRING\"}},\"required\":[\"city\"]}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai_passthrough/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["genai_passthrough", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Passthrough /genai: vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: text part describes image', function () { var j = pm.response.json(); var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var t = parts.find(function (p) { return p && p.text; }); pm.expect(t, 'no text part').to.be.ok; pm.expect(t.text.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Describe\"},{\"fileData\":{\"mimeType\":\"image/jpeg\",\"fileUri\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai_passthrough/v1beta/models/gemini-2.5-flash:generateContent","host":["{{baseUrl}}"],"path":["genai_passthrough", "v1beta", "models", "gemini-2.5-flash:generateContent"]} } }, + { "name": "Passthrough /genai: streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"x-goog-api-key","value":"{{genaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"contents\": [{\"parts\":[{\"text\":\"Count 1-5.\"}]}]\n}"}, "url": {"raw":"{{baseUrl}}/genai_passthrough/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse","host":["{{baseUrl}}"],"path":["genai_passthrough", "v1beta", "models", "gemini-2.5-flash:streamGenerateContent"],"query":[{"key":"alt","value":"sse"}]} } }, + { "name": "Passthrough /azure: structured output (json_schema)", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Structured output: JSON with city/country/pop', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; var p; try { p = JSON.parse(c); } catch (e) { pm.expect.fail('not JSON: ' + e.message); return; } pm.expect(p).to.have.property('city').that.is.a('string'); pm.expect(p).to.have.property('country').that.is.a('string'); pm.expect(p).to.have.property('pop').that.is.a('number'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"api-key","value":"{{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":\"Extract city/country/pop for Paris.\"}],\n \"response_format\": {\"type\":\"json_schema\",\"json_schema\":{\"name\":\"city\",\"strict\":true,\"schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"},\"country\":{\"type\":\"string\"},\"pop\":{\"type\":\"number\"}},\"required\":[\"city\",\"country\",\"pop\"],\"additionalProperties\":false}}}\n}"}, "url": {"raw":"{{baseUrl}}/azure_passthrough/openai/deployments/{{azureDeployment}}/chat/completions?api-version={{azureApiVersion}}","host":["{{baseUrl}}"],"path":["azure_passthrough", "openai", "deployments", "{{azureDeployment}}", "chat", "completions"],"query":[{"key":"api-version","value":"{{azureApiVersion}}"}]} } }, + { "name": "Passthrough /azure: vision", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Vision: response describes image', function () { var j = pm.response.json(); var c = (j.choices && j.choices[0] && j.choices[0].message && j.choices[0].message.content) || ''; pm.expect(c).to.be.a('string').and.not.empty; pm.expect(c.length).to.be.greaterThan(20); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"api-key","value":"{{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"https://storage.googleapis.com/generativeai-downloads/images/scones.jpg\"}}]}]\n}"}, "url": {"raw":"{{baseUrl}}/azure_passthrough/openai/deployments/{{azureDeployment}}/chat/completions?api-version={{azureApiVersion}}","host":["{{baseUrl}}"],"path":["azure_passthrough", "openai", "deployments", "{{azureDeployment}}", "chat", "completions"],"query":[{"key":"api-version","value":"{{azureApiVersion}}"}]} } }, + { "name": "Passthrough /azure: streaming", "event": [{"listen":"test","script":{"type":"text/javascript","exec":["if (pm.response.code < 400) { pm.test('Streaming: SSE response', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected event-stream, got ' + ct).to.include('event-stream'); }); }"]}}], "request": { "method": "POST", "header": [{"key":"Content-Type","value":"application/json"},{"key":"api-key","value":"{{openaiKey}}"}], "body": {"mode":"raw","raw":"{\n \"messages\": [{\"role\":\"user\",\"content\":\"Count 1-5.\"}],\n \"stream\": true\n}"}, "url": {"raw":"{{baseUrl}}/azure_passthrough/openai/deployments/{{azureDeployment}}/chat/completions?api-version={{azureApiVersion}}","host":["{{baseUrl}}"],"path":["azure_passthrough", "openai", "deployments", "{{azureDeployment}}", "chat", "completions"],"query":[{"key":"api-version","value":"{{azureApiVersion}}"}]} } } + ] } + ] } ] diff --git a/tests/e2e/api/runners/augment-provider-harness.mjs b/tests/e2e/api/runners/augment-provider-harness.mjs new file mode 100644 index 0000000000..6a0b5f8639 --- /dev/null +++ b/tests/e2e/api/runners/augment-provider-harness.mjs @@ -0,0 +1,350 @@ +#!/usr/bin/env node +// Adds generated provider-harness coverage rows that would be too repetitive to +// maintain by hand in the checked-in Postman collection. + +import { readFileSync, writeFileSync } from "node:fs"; + +const args = Object.fromEntries( + process.argv.slice(2).reduce((acc, cur, i, arr) => { + if (cur.startsWith("--")) { + const key = cur.slice(2); + const next = arr[i + 1]; + acc.push([key, next && !next.startsWith("--") ? next : "true"]); + } + return acc; + }, []) +); + +const source = args.source; +const out = args.out; + +if (!source || !out) { + console.error("[augment-provider-harness] --source and --out are required"); + process.exit(2); +} + +const collection = JSON.parse(readFileSync(source, "utf8")); + +const responseNonEmptyTest = { + listen: "test", + script: { + type: "text/javascript", + exec: [ + "if (pm.response.code < 400) { pm.test('Thinking response non-empty', function () { var j = pm.response.json(); var msg = j.choices && j.choices[0] && j.choices[0].message; var c = (msg && msg.content) || ''; var r = (msg && (msg.reasoning || msg.reasoning_details)) || null; var parts = (j.candidates && j.candidates[0] && j.candidates[0].content && j.candidates[0].content.parts) || []; var gc = parts.some(function (p) { return p && (p.text || p.thought || p.thoughtSignature); }); pm.expect(c || r || gc, 'expected answer or reasoning content').to.be.ok; }); }", + ], + }, +}; + +const streamingTest = { + listen: "test", + script: { + type: "text/javascript", + exec: [ + "if (pm.response.code < 400) { pm.test('Streaming: response is SSE', function () { var ct = pm.response.headers.get('content-type') || ''; pm.expect(ct, 'expected SSE, got ' + ct).to.include('event-stream'); }); }", + ], + }, +}; + +const chatUrl = { + raw: "{{baseUrl}}/v1/chat/completions", + host: ["{{baseUrl}}"], + path: ["v1", "chat", "completions"], +}; + +const item = ({ name, body, stream = false }) => ({ + name, + event: [stream ? streamingTest : responseNonEmptyTest], + request: { + method: "POST", + header: [{ key: "Content-Type", value: "application/json" }], + body: { mode: "raw", raw: JSON.stringify(body, null, 2) }, + url: chatUrl, + }, +}); + +const genaiItem = ({ name, model, body }) => ({ + name, + event: [responseNonEmptyTest], + request: { + method: "POST", + header: [ + { key: "Content-Type", value: "application/json" }, + { key: "x-goog-api-key", value: "{{genaiKey}}" }, + ], + body: { mode: "raw", raw: JSON.stringify(body, null, 2) }, + url: { + raw: `{{baseUrl}}/genai/v1beta/models/${model}:generateContent`, + host: ["{{baseUrl}}"], + path: ["genai", "v1beta", "models", `${model}:generateContent`], + }, + }, +}); + +const effortModels = [ + { label: "openai/gpt-5", model: "openai/gpt-5" }, + { label: "openai/gpt-5-mini", model: "openai/gpt-5-mini" }, + { label: "openai/o3-mini", model: "openai/o3-mini" }, + { label: "anthropic/claude-opus-4-7", model: "anthropic/claude-opus-4-7", maxTokens: 4096 }, + { label: "anthropic/claude-sonnet-4-6", model: "anthropic/claude-sonnet-4-6", maxTokens: 4096 }, + { label: "bedrock/claude-opus-4-7", model: "bedrock/global.anthropic.claude-opus-4-7", maxTokens: 4096 }, + { label: "bedrock/claude-sonnet-4-6", model: "bedrock/global.anthropic.claude-sonnet-4-6", maxTokens: 4096 }, + { label: "gemini/gemini-2.5-flash", model: "gemini/gemini-2.5-flash" }, + { label: "gemini/gemini-2.5-pro", model: "gemini/gemini-2.5-pro" }, + { label: "vertex/gemini-2.5-flash", model: "vertex/gemini-2.5-flash" }, + { label: "vertex/gemini-2.5-pro", model: "vertex/gemini-2.5-pro" }, + { label: "vertex/claude-opus-4-7", model: "vertex/claude-opus-4-7", maxTokens: 4096 }, + { label: "vertex/claude-sonnet-4-6", model: "vertex/claude-sonnet-4-6", maxTokens: 4096 }, +]; + +const efforts = ["low", "medium", "high"]; + +const bodyForEffort = (model, effort, stream = false) => { + const body = { + model: model.model, + messages: [{ role: "user", content: "Solve 17 * 23 + 12. Keep the final answer concise." }], + reasoning_effort: effort, + }; + if (model.maxTokens) body.max_tokens = model.maxTokens; + if (stream) body.stream = true; + return body; +}; + +const effortItems = effortModels.flatMap((model) => + efforts.map((effort) => + item({ + name: `Cross-cut: ${model.label} thinking effort ${effort}`, + body: bodyForEffort(model, effort, false), + }) + ) +); + +const streamingEffortItems = effortModels.flatMap((model) => + efforts.map((effort) => + item({ + name: `Cross-cut: ${model.label} streaming + thinking effort ${effort}`, + body: bodyForEffort(model, effort, true), + stream: true, + }) + ) +); + +const nativeThinkingItems = [ + item({ + name: "Cross-cut: anthropic/claude-sonnet-4-6 thinking budget lowest", + body: { + model: "anthropic/claude-sonnet-4-6", + max_tokens: 4096, + thinking: { type: "enabled", budget_tokens: 1024 }, + messages: [{ role: "user", content: "Solve 31 * 27." }], + }, + }), + item({ + name: "Cross-cut: anthropic/claude-sonnet-4-6 thinking budget highest", + body: { + model: "anthropic/claude-sonnet-4-6", + max_tokens: 8192, + thinking: { type: "enabled", budget_tokens: 4096 }, + messages: [{ role: "user", content: "Solve 31 * 27." }], + }, + }), + item({ + name: "Cross-cut: anthropic/claude-opus-4-7 adaptive thinking", + body: { + model: "anthropic/claude-opus-4-7", + max_tokens: 4096, + thinking: { type: "adaptive" }, + messages: [{ role: "user", content: "Solve 31 * 27." }], + }, + }), + genaiItem({ + name: "Cross-cut: gemini/gemini-2.5-flash thinking budget lowest", + model: "gemini-2.5-flash", + body: { contents: [{ parts: [{ text: "Solve 31 * 27." }] }], generationConfig: { thinkingConfig: { thinkingBudget: 1024 } } }, + }), + genaiItem({ + name: "Cross-cut: gemini/gemini-2.5-pro thinking budget highest", + model: "gemini-2.5-pro", + body: { contents: [{ parts: [{ text: "Solve 31 * 27." }] }], generationConfig: { thinkingConfig: { thinkingBudget: 8192 } } }, + }), +]; + +const streamingFeatureItems = [ + item({ + name: "Cross-cut: openai/gpt-4o-mini streaming + function calling tools", + stream: true, + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: "Weather in Lagos?" }], + stream: true, + tools: [{ type: "function", function: { name: "get_weather", parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] } } }], + }, + }), + item({ + name: "Cross-cut: anthropic/claude-haiku-4-5 streaming + function calling tools", + stream: true, + body: { + model: "anthropic/claude-haiku-4-5", + max_tokens: 512, + messages: [{ role: "user", content: "Weather in Lagos?" }], + stream: true, + tools: [{ type: "function", function: { name: "get_weather", parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] } } }], + }, + }), + item({ + name: "Cross-cut: gemini/gemini-2.5-flash streaming + function calling tools", + stream: true, + body: { + model: "gemini/gemini-2.5-flash", + messages: [{ role: "user", content: "Weather in Lagos?" }], + stream: true, + tools: [{ type: "function", function: { name: "get_weather", parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] } } }], + }, + }), + item({ + name: "Cross-cut: openai/gpt-4o-mini streaming + vision image input", + stream: true, + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: [{ type: "text", text: "Describe this image briefly." }, { type: "image_url", image_url: { url: "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg" } }] }], + stream: true, + }, + }), + item({ + name: "Cross-cut: anthropic/claude-haiku-4-5 streaming + vision image input", + stream: true, + body: { + model: "anthropic/claude-haiku-4-5", + max_tokens: 512, + messages: [{ role: "user", content: [{ type: "image", source: { type: "url", url: "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg" } }, { type: "text", text: "Describe this image briefly." }] }], + stream: true, + }, + }), + item({ + name: "Cross-cut: gemini/gemini-2.5-flash streaming + vision image input", + stream: true, + body: { + model: "gemini/gemini-2.5-flash", + messages: [{ role: "user", content: [{ type: "text", text: "Describe this image briefly." }, { type: "image_url", image_url: { url: "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg" } }] }], + stream: true, + }, + }), + item({ + name: "Cross-cut: openai/gpt-4o-mini streaming + structured output json_schema", + stream: true, + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: "Extract city/country/population for Paris." }], + stream: true, + response_format: { type: "json_schema", json_schema: { name: "city", strict: true, schema: { type: "object", properties: { city: { type: "string" }, country: { type: "string" }, population: { type: "number" } }, required: ["city", "country", "population"], additionalProperties: false } } }, + }, + }), + item({ + name: "Cross-cut: gemini/gemini-2.5-flash streaming + structured output json_schema", + stream: true, + body: { + model: "gemini/gemini-2.5-flash", + messages: [{ role: "user", content: "Extract city/country/population for Paris." }], + stream: true, + response_format: { type: "json_schema", json_schema: { name: "city", strict: true, schema: { type: "object", properties: { city: { type: "string" }, country: { type: "string" }, population: { type: "number" } }, required: ["city", "country", "population"], additionalProperties: false } } }, + }, + }), + item({ + name: "Cross-cut: anthropic/claude-opus-4-7 streaming + web search", + stream: true, + body: { + model: "anthropic/claude-opus-4-7", + max_tokens: 1024, + messages: [{ role: "user", content: "Find one current headline and summarize it in one sentence." }], + stream: true, + tools: [{ type: "web_search_20250305", name: "web_search", max_uses: 1 }], + }, + }), + item({ + name: "Cross-cut: gemini/gemini-2.5-flash streaming + google search", + stream: true, + body: { + model: "gemini/gemini-2.5-flash", + messages: [{ role: "user", content: "Find one current headline and summarize it in one sentence." }], + stream: true, + tools: [{ type: "google_search" }], + }, + }), + item({ + name: "Cross-cut: anthropic/claude-haiku-4-5 streaming + prompt caching", + stream: true, + body: { + model: "anthropic/claude-haiku-4-5", + max_tokens: 512, + system: [{ type: "text", text: "Reusable cached context for streaming prompt caching coverage.", cache_control: { type: "ephemeral" } }], + messages: [{ role: "user", content: "Reply with a short acknowledgement." }], + stream: true, + }, + }), + item({ + name: "Cross-cut: openai/gpt-4o-mini streaming + stop sequences", + stream: true, + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: "Count one, two, three, four." }], + stop: ["three"], + stream: true, + }, + }), + item({ + name: "Cross-cut: openai/gpt-4o-mini streaming + sampling params", + stream: true, + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: "Pick a color and explain why." }], + temperature: 0.7, + top_p: 0.9, + stream: true, + }, + }), +]; + +const generatedFolders = [ + { + name: "Cross-Cut Round 29: Thinking Effort Ladder (generated)", + description: "Generated at harness runtime. Covers low, medium, and high reasoning_effort across reasoning-capable OpenAI, Anthropic, Bedrock, Gemini, and Vertex model families.", + item: effortItems, + }, + { + name: "Cross-Cut Round 30: Streaming + Thinking Matrix (generated)", + description: "Generated at harness runtime. Combines stream:true with low, medium, and high reasoning_effort across reasoning-capable model families.", + item: streamingEffortItems, + }, + { + name: "Cross-Cut Round 31: Native Thinking Modes (generated)", + description: "Generated at harness runtime. Covers native thinking controls where providers expose explicit budgets or adaptive modes.", + item: nativeThinkingItems, + }, + { + name: "Cross-Cut Round 32: Streaming Feature Matrix (generated)", + description: "Generated at harness runtime. Ensures each interactive feature bucket has streaming coverage where the request shape supports stream:true.", + item: streamingFeatureItems, + }, +]; + +const findFolder = (items, name) => { + for (const entry of items || []) { + if (entry.name === name) return entry; + const nested = findFolder(entry.item, name); + if (nested) return nested; + } + return null; +}; + +const backlog = findFolder(collection.item, "12. Backlog Coverage (auto-added missing cases)"); +if (!backlog) { + console.error("[augment-provider-harness] backlog folder not found"); + process.exit(1); +} + +const generatedNames = new Set(generatedFolders.map((f) => f.name)); +backlog.item = (backlog.item || []).filter((entry) => !generatedNames.has(entry.name)); +backlog.item.push(...generatedFolders); + +writeFileSync(out, `${JSON.stringify(collection, null, 2)}\n`); +const generatedCount = generatedFolders.reduce((sum, folder) => sum + folder.item.length, 0); +console.error(`[augment-provider-harness] wrote ${out} with ${generatedCount} generated requests`); diff --git a/tests/e2e/api/runners/filter-collection.mjs b/tests/e2e/api/runners/filter-collection.mjs index ba37475fe6..d2036d3c33 100644 --- a/tests/e2e/api/runners/filter-collection.mjs +++ b/tests/e2e/api/runners/filter-collection.mjs @@ -1,10 +1,15 @@ #!/usr/bin/env node -// Filters a Postman collection by provider, feature keyword, or "rerun failed" +// Filters a Postman collection by provider, feature keyword(s), or "rerun failed" // from a prior newman report. Writes the filtered collection to --out. // // Usage: // node filter-collection.mjs --source path.json --out /tmp/x.json --provider anthropic // node filter-collection.mjs --source path.json --out /tmp/x.json --feature "web search" +// node filter-collection.mjs --source path.json --out /tmp/x.json --feature "cross-cut,structured output" # multi-keyword AND +// +// Structural keyword: "cross-cut" matches by route shape (unified /v1/chat/completions +// with a provider/model body), not just by name substring. Lets the AND filter find +// every cross-cut row without renaming 100+ items to add a literal "Cross-cut:" prefix. // node filter-collection.mjs --source path.json --out /tmp/x.json --rerun-failed --report tmp/newman-report.json import { readFileSync, writeFileSync, existsSync } from "node:fs"; @@ -23,7 +28,10 @@ const args = Object.fromEntries( const SOURCE = args.source; const OUT = args.out; const PROVIDER = (args.provider || "").toLowerCase(); -const FEATURE = (args.feature || "").toLowerCase(); +const FEATURE_PARTS = (args.feature || "").toLowerCase().split(",").map((s) => s.trim()).filter(Boolean); +// --feature-any is the OR-of-keywords counterpart of --feature (which ANDs). Item passes +// if it matches at least one keyword. Combines with --feature/--provider via AND. +const FEATURE_ANY_PARTS = (args["feature-any"] || "").toLowerCase().split(",").map((s) => s.trim()).filter(Boolean); const RERUN_FAILED = args["rerun-failed"] === "true"; const REPORT = args.report || "tmp/newman-report.json"; @@ -31,8 +39,8 @@ if (!SOURCE || !OUT) { console.error("[filter-collection] --source and --out are required"); process.exit(2); } -if (!PROVIDER && !FEATURE && !RERUN_FAILED) { - console.error("[filter-collection] need at least one of: --provider, --feature, --rerun-failed"); +if (!PROVIDER && !FEATURE_PARTS.length && !FEATURE_ANY_PARTS.length && !RERUN_FAILED) { + console.error("[filter-collection] need at least one of: --provider, --feature, --feature-any, --rerun-failed"); process.exit(2); } @@ -46,17 +54,67 @@ const PROVIDER_KEYWORDS = { passthrough: ["_passthrough"], }; -const itemMatchesProvider = (item) => { +// Haystack = item JSON + ancestor folder names. Folder names encode the harness +// taxonomy ("Structured Output cross-cut", "Vertex Features", ...) so PROVIDER and +// FEATURE filters need to see them, otherwise a row named "openai/gpt-4o-mini" inside +// folder "Structured Output cross-cut" is invisible to FEATURE="cross-cut". +const buildHaystack = (item, ancestorNames) => + (JSON.stringify(item) + " " + ancestorNames.join(" ")).toLowerCase(); + +// Structural keywords - matched against route shape, not name substring. Lets users +// say FEATURE="cross-cut,structured output" and have it work for every row routed via +// unified /v1/chat/completions with a provider/model prefix, regardless of how the +// row is named or which folder it lives in. +const STRUCTURAL_KEYWORDS = { + "cross-cut": (item) => { + const req = item.request || {}; + const url = (typeof req.url === "string" ? req.url : req.url?.raw) || ""; + const body = req.body?.raw || ""; + const isUnified = /\/v1\/chat\/completions(\?|$)/.test(url) && + !/\/(openai|anthropic|bedrock|genai|azure)\/v1/.test(url) && + !/_passthrough/.test(url); + const hasProviderPrefix = /"model"\s*:\s*"(openai|anthropic|bedrock|gemini|vertex|azure)\//.test(body); + return isUnified && hasProviderPrefix; + }, + crosscut: (item) => STRUCTURAL_KEYWORDS["cross-cut"](item), +}; + +const FEATURE_ALIASES = { + chat: ["chat", "messages", "responses"], + streaming: ["streaming", "\"stream\": true", "streamgeneratecontent", "converse-stream", "alt=sse"], + embeddings: ["embeddings", "embedding"], + audio: ["audio", "speech", "transcription"], + "image-gen": ["image-gen", "image generation", "image gen", "images/generations"], + tools: ["tools", "\"tools\"", "tool use", "tool_choice", "function calling", "functiondeclarations", "function_calling"], + vision: ["vision", "image_url", "\"type\":\"image\"", "\"type\": \"image\"", "inline_data", "filedata"], + json: ["json_schema", "json object", "structured output", "responseschema", "response_schema", "responsemimetype", "response mime"], + reasoning: ["reasoning", "thinking", "reasoning_effort", "budget_tokens", "thinkingbudget", "thinking_budget"], +}; + +const matchesKeyword = (item, ancestorNames, haystack, keyword) => { + const structural = STRUCTURAL_KEYWORDS[keyword]; + if (structural && (structural(item) || haystack.includes(keyword))) return true; + const aliases = FEATURE_ALIASES[keyword] || [keyword]; + return aliases.some((alias) => haystack.includes(alias)); +}; + +const itemMatchesProvider = (item, ancestorNames) => { if (!PROVIDER) return true; const keywords = PROVIDER_KEYWORDS[PROVIDER] || [PROVIDER]; - const haystack = JSON.stringify(item).toLowerCase(); + const haystack = buildHaystack(item, ancestorNames); return keywords.some((k) => haystack.includes(k)); }; -const itemMatchesFeature = (item) => { - if (!FEATURE) return true; - const haystack = JSON.stringify(item).toLowerCase(); - return haystack.includes(FEATURE); +const itemMatchesFeature = (item, ancestorNames) => { + if (!FEATURE_PARTS.length) return true; + const haystack = buildHaystack(item, ancestorNames); + return FEATURE_PARTS.every((p) => matchesKeyword(item, ancestorNames, haystack, p)); +}; + +const itemMatchesFeatureAny = (item, ancestorNames) => { + if (!FEATURE_ANY_PARTS.length) return true; + const haystack = buildHaystack(item, ancestorNames); + return FEATURE_ANY_PARTS.some((p) => matchesKeyword(item, ancestorNames, haystack, p)); }; let failedNames = null; @@ -79,18 +137,21 @@ const itemMatchesRerunFailed = (item) => { return failedNames.has(item.name); }; -const passes = (item) => { +const passes = (item, ancestorNames) => { if (!item.request) return true; // folders pass; we filter their items below - return itemMatchesProvider(item) && itemMatchesFeature(item) && itemMatchesRerunFailed(item); + return itemMatchesProvider(item, ancestorNames) && + itemMatchesFeature(item, ancestorNames) && + itemMatchesFeatureAny(item, ancestorNames) && + itemMatchesRerunFailed(item); }; -const filterTree = (items) => { +const filterTree = (items, ancestorNames = []) => { const out = []; for (const item of items) { if (Array.isArray(item.item)) { - const kids = filterTree(item.item); + const kids = filterTree(item.item, [...ancestorNames, item.name || ""]); if (kids.length > 0) out.push({ ...item, item: kids }); - } else if (passes(item)) { + } else if (passes(item, ancestorNames)) { out.push(item); } } @@ -101,4 +162,4 @@ const collection = JSON.parse(readFileSync(SOURCE, "utf8")); const filtered = { ...collection, item: filterTree(collection.item || []) }; const totalAfter = JSON.stringify(filtered).match(/"request":/g)?.length || 0; writeFileSync(OUT, JSON.stringify(filtered, null, 2)); -console.error(`[filter-collection] wrote ${OUT} with ${totalAfter} requests after filter (provider=${PROVIDER || "-"}, feature=${FEATURE || "-"}, rerun-failed=${RERUN_FAILED})`); +console.error(`[filter-collection] wrote ${OUT} with ${totalAfter} requests after filter (provider=${PROVIDER || "-"}, feature=${FEATURE_PARTS.join("+") || "-"}, feature-any=${FEATURE_ANY_PARTS.join("|") || "-"}, rerun-failed=${RERUN_FAILED})`); diff --git a/tests/e2e/api/runners/harness-monitor.mjs b/tests/e2e/api/runners/harness-monitor.mjs new file mode 100644 index 0000000000..118c9a9fd6 --- /dev/null +++ b/tests/e2e/api/runners/harness-monitor.mjs @@ -0,0 +1,561 @@ +#!/usr/bin/env node +// Live terminal progress monitor for `make run-provider-harness-test`. +// +// Tails per-provider newman CLI logs (parallel mode) or the merged CLI log +// (sequential mode), aggregates pass/fail/% per provider with folder breakdown, +// elapsed time + ETA, and most-recent failure text. Renders an in-place table. +// +// Usage: +// node harness-monitor.mjs \ +// --mode parallel \ +// --providers "openai anthropic bedrock gemini vertex azure passthrough" \ +// --tmp-dir tmp \ +// --status-file tmp/parallel-status \ +// --launched 7 +// +// node harness-monitor.mjs \ +// --mode sequential \ +// --providers "openai anthropic" \ +// --tmp-dir tmp \ +// --log tmp/newman-cli.log + +import { existsSync, readFileSync, statSync, openSync, readSync, closeSync } from "node:fs"; +import { join } from "node:path"; + +const args = Object.fromEntries( + process.argv.slice(2).reduce((acc, cur, i, arr) => { + if (cur.startsWith("--")) { + const key = cur.slice(2); + const next = arr[i + 1]; + acc.push([key, next && !next.startsWith("--") ? next : "true"]); + } + return acc; + }, []) +); + +const MODE = args.mode === "sequential" ? "sequential" : "parallel"; +const PROVIDERS = (args.providers || "").trim().split(/\s+/).filter(Boolean); +const TMP_DIR = args["tmp-dir"] || "tmp"; +const STATUS_FILE = args["status-file"] || join(TMP_DIR, "parallel-status"); +const LAUNCHED = parseInt(args.launched || String(PROVIDERS.length), 10); +const SEQ_LOG = args.log || join(TMP_DIR, "newman-cli.log"); +const TAIL_INTERVAL_MS = 250; +const RENDER_INTERVAL_MS = 1000; +const IDLE_EXIT_MS = 3000; + +if (PROVIDERS.length === 0) { + console.error("[harness-monitor] --providers is required"); + process.exit(2); +} + +// Mirror filter-collection.mjs PROVIDER_KEYWORDS. Used only in sequential mode +// to route folder/request lines (which lack a [provider] prefix) to a provider. +const PROVIDER_KEYWORDS = { + openai: ["openai", "gpt-", "o3", "o1"], + anthropic: ["anthropic", "claude-"], + bedrock: ["bedrock"], + gemini: ["gemini", "genai", "googlesearch"], + vertex: ["vertex"], + azure: ["azure", "deployments"], + passthrough: ["_passthrough", "passthrough"], +}; + +const ANSI_RE = /\x1b\[[0-9;?]*[A-Za-z]/g; +const stripAnsi = (s) => s.replace(ANSI_RE, ""); + +// State per provider. status transitions: pending -> running -> pass/fail/skipped. +const state = { + startedAt: Date.now(), + mode: MODE, + providers: Object.fromEntries( + PROVIDERS.map((p) => [ + p, + { + status: "pending", + totalRequests: 0, + doneRequests: 0, + pass: 0, + fail: 0, + folders: {}, + folderOrder: [], + currentFolder: null, + currentRequest: null, + currentRequestDone: false, + currentRequestHadFail: false, + currentRequestFolder: null, + lastFailure: null, + }, + ]) + ), +}; +let lastByteAt = Date.now(); +let lastRenderLines = 0; + +// ----- Denominator: walk the filtered collection per provider. ---------------- + +function countLeaves(items, perFolder, topFolder) { + if (!Array.isArray(items)) return 0; + let total = 0; + for (const node of items) { + if (Array.isArray(node.item)) { + const next = topFolder ?? node.name ?? "(root)"; + total += countLeaves(node.item, perFolder, next); + } else if (node.request) { + total += 1; + const folder = topFolder ?? "(root)"; + if (!perFolder[folder]) perFolder[folder] = { total: 0, pass: 0, fail: 0 }; + perFolder[folder].total += 1; + } + } + return total; +} + +function loadDenominators() { + for (const p of PROVIDERS) { + const ps = state.providers[p]; + // Parallel mode writes tmp/harness-filtered-

.json per provider. + // Sequential mode writes tmp/harness-filtered.json once, or falls back to the source collection. + const candidates = + MODE === "parallel" + ? [join(TMP_DIR, `harness-filtered-${p}.json`)] + : [ + join(TMP_DIR, "harness-filtered.json"), + "tests/e2e/api/collections/provider-harness.json", + ]; + for (const path of candidates) { + if (!existsSync(path)) continue; + try { + const data = JSON.parse(readFileSync(path, "utf8")); + const folders = {}; + const total = countLeaves(data.item || [], folders, null); + ps.totalRequests = total; + ps.folders = folders; + ps.folderOrder = Object.keys(folders); + break; + } catch { + // ignore - try next candidate + } + } + } +} + +// ----- Tail: poll-based incremental read of newman CLI logs. ------------------ + +const tails = new Map(); // path -> { provider, offset, buf } + +function ensureTail(path, provider) { + if (!tails.has(path)) tails.set(path, { provider, offset: 0, buf: "" }); +} + +function readNewBytes() { + for (const [path, h] of tails) { + let st; + try { + st = statSync(path); + } catch { + continue; + } + if (st.size <= h.offset) continue; + const len = st.size - h.offset; + const buf = Buffer.alloc(len); + let fd; + try { + fd = openSync(path, "r"); + readSync(fd, buf, 0, len, h.offset); + } catch { + if (fd != null) try { closeSync(fd); } catch {} + continue; + } + closeSync(fd); + h.offset = st.size; + h.buf += buf.toString("utf8"); + const lines = h.buf.split("\n"); + h.buf = lines.pop(); + for (const raw of lines) handleLine(stripAnsi(raw), h.provider); + lastByteAt = Date.now(); + } +} + +// ----- Parsing ---------------------------------------------------------------- + +const RE_PREFIX = /^\[([a-z]+)\]\s?(.*)$/; +const RE_FOLDER = /^❏\s+(.+?)\s*$/; +const RE_REQUEST = /^↳\s+(.+?)\s*$/; +const RE_REQUEST_DONE = /\[\s*\d+(?:\s+[A-Za-z]+)?,\s*[\d.]+\s*[kMG]?B,\s*[\d.]+\s*m?s\s*\]/; +const RE_ASSERT_FAIL = /^\s*\d+\.\s+(.+?)$/; + +function inferProviderFromLine(line) { + const lower = line.toLowerCase(); + for (const p of PROVIDERS) { + const kws = PROVIDER_KEYWORDS[p] || [p]; + for (const k of kws) if (lower.includes(k)) return p; + } + return null; +} + +// Newman emits per-request lines in this order: ↳ start, then the [size,duration] +// summary, then ✓ pass-assertions, then numbered fail lines. So we can't commit +// pass/fail at the summary line - we'd miss subsequent fail lines. Instead we +// defer commit until the next ↳ / ❏ / finalizeAll(). +function finalizeRequest(ps) { + if (!ps.currentRequest) return; + if (ps.currentRequestDone) { + if (ps.currentRequestHadFail) ps.fail += 1; + else ps.pass += 1; + const f = ps.currentRequestFolder; + if (f && ps.folders[f]) { + if (ps.currentRequestHadFail) ps.folders[f].fail += 1; + else ps.folders[f].pass += 1; + } + } + ps.currentRequest = null; + ps.currentRequestDone = false; + ps.currentRequestHadFail = false; + ps.currentRequestFolder = null; +} + +function finalizeAll() { + for (const p of PROVIDERS) finalizeRequest(state.providers[p]); +} + +function handleLine(line, taggedProvider) { + let provider = taggedProvider; + let body = line; + + if (MODE === "parallel") { + const m = line.match(RE_PREFIX); + if (m && state.providers[m[1]]) { + provider = m[1]; + body = m[2]; + } else if (!provider) { + return; + } + } + + const ps = state.providers[provider]; + if (!ps) return; + if (ps.status === "pending") ps.status = "running"; + + const trimmed = body.trimStart(); + + let m; + if ((m = trimmed.match(RE_FOLDER))) { + finalizeRequest(ps); + const folder = m[1].trim(); + ps.currentFolder = folder; + if (!ps.folders[folder]) { + ps.folders[folder] = { total: 0, pass: 0, fail: 0 }; + ps.folderOrder.push(folder); + } + return; + } + if ((m = trimmed.match(RE_REQUEST))) { + finalizeRequest(ps); + ps.currentRequest = m[1].trim(); + ps.currentRequestDone = false; + ps.currentRequestHadFail = false; + ps.currentRequestFolder = ps.currentFolder; + return; + } + // Disambiguate request-done summary from assertion-fail; check done first. + if (RE_REQUEST_DONE.test(trimmed)) { + if (ps.currentRequest && !ps.currentRequestDone) { + ps.currentRequestDone = true; + ps.doneRequests += 1; + } + return; + } + if ((m = trimmed.match(RE_ASSERT_FAIL)) && ps.currentRequest) { + ps.currentRequestHadFail = true; + ps.lastFailure = { folder: ps.currentRequestFolder, text: m[1].trim() }; + return; + } +} + +// ----- Status file: pick up final pass/fail verdicts in parallel mode. -------- + +function readStatusFile() { + if (MODE !== "parallel") return { lines: 0 }; + if (!existsSync(STATUS_FILE)) return { lines: 0 }; + let content; + try { + content = readFileSync(STATUS_FILE, "utf8"); + } catch { + return { lines: 0 }; + } + const lines = content.trim().split("\n").filter(Boolean); + for (const ln of lines) { + const [p, v] = ln.split(":"); + const ps = state.providers[p]; + if (!ps) continue; + if (v === "pass") ps.status = "pass"; + else if (v === "fail") ps.status = "fail"; + } + return { lines: lines.length }; +} + +// ----- Render ----------------------------------------------------------------- + +const C = { + reset: "\x1b[0m", + bold: "\x1b[1m", + dim: "\x1b[2m", + red: "\x1b[31m", + green: "\x1b[32m", + yellow: "\x1b[33m", + cyan: "\x1b[36m", + gray: "\x1b[90m", +}; + +function fmtDuration(ms) { + if (!isFinite(ms) || ms < 0) return "--:--"; + const s = Math.floor(ms / 1000); + const m = Math.floor(s / 60); + const r = s % 60; + return `${String(m).padStart(2, "0")}:${String(r).padStart(2, "0")}`; +} + +function truncate(s, n) { + if (!s) return ""; + return s.length <= n ? s : s.slice(0, n - 1) + "…"; +} + +function padRight(s, n) { + const str = String(s); + return str.length >= n ? str.slice(0, n) : str + " ".repeat(n - str.length); +} +function padLeft(s, n) { + const str = String(s); + return str.length >= n ? str.slice(0, n) : " ".repeat(n - str.length) + str; +} + +function statusGlyph(status) { + switch (status) { + case "pass": return `${C.green}✓${C.reset}`; + case "fail": return `${C.red}✗${C.reset}`; + case "running": return `${C.cyan}●${C.reset}`; + case "skipped": return `${C.gray}-${C.reset}`; + default: return `${C.gray}·${C.reset}`; + } +} + +function renderFrame() { + const cols = process.stdout.columns || 120; + + // Aggregate totals. + let aggDone = 0, aggTotal = 0, aggPass = 0, aggFail = 0; + for (const p of PROVIDERS) { + const ps = state.providers[p]; + aggDone += ps.doneRequests; + aggTotal += ps.totalRequests; + aggPass += ps.pass; + aggFail += ps.fail; + } + const elapsed = Date.now() - state.startedAt; + const eta = + aggDone > 0 && aggTotal > aggDone ? elapsed * (aggTotal / aggDone - 1) : NaN; + + const out = []; + out.push( + `${C.bold}Bifrost Provider Harness - live${C.reset}` + + ` ${C.dim}Elapsed${C.reset} ${fmtDuration(elapsed)}` + + ` ${C.dim}ETA${C.reset} ${fmtDuration(eta)}` + + ` ${C.dim}Mode${C.reset} ${state.mode}` + ); + + // Table width math: each cell consumes (width + 3) chars (" content │"), + // plus 1 leading "│". So total = 1 + 3N + sum(widths). Compute the failure + // column from terminal width to guarantee no row wraps. Drop the column + // entirely if there isn't even 20 chars left for it. + const fixed = [1, 12, 9, 5, 5, 5]; + const fixedSum = fixed.reduce((a, b) => a + b, 0); + const overheadWith7 = 1 + 3 * 7; // 22 + const overheadWith6 = 1 + 3 * 6; // 19 + const targetWidth = Math.max(40, cols - 1); + const failColWidth = targetWidth - overheadWith7 - fixedSum; + const showFailureCol = failColWidth >= 20; + const headers = showFailureCol + ? ["", "Provider", "Done", "Pass", "Fail", "%", "Last failure"] + : ["", "Provider", "Done", "Pass", "Fail", "%"]; + const widths = showFailureCol ? [...fixed, failColWidth] : fixed; + + const sep = (left, mid, right, fill = "─") => { + let line = left; + for (let i = 0; i < widths.length; i++) { + line += fill.repeat(widths[i] + 2); + line += i === widths.length - 1 ? right : mid; + } + return line; + }; + + const row = (cells) => { + let line = "│"; + for (let i = 0; i < cells.length; i++) { + line += " " + padRight(cells[i], widths[i]) + " │"; + } + return line; + }; + + out.push(sep("┌", "┬", "┐")); + out.push(row(headers)); + out.push(sep("├", "┼", "┤")); + + for (const p of PROVIDERS) { + const ps = state.providers[p]; + const pct = ps.totalRequests ? Math.floor((100 * ps.doneRequests) / ps.totalRequests) : 0; + const doneCell = `${padLeft(ps.doneRequests, 3)}/${padRight(ps.totalRequests, 3)}`; + const failCellRaw = ps.fail > 0 ? `${C.red}${padLeft(ps.fail, widths[4])}${C.reset}` : padLeft(ps.fail, widths[4]); + const cells = [ + statusGlyph(ps.status), + p, + doneCell, + padLeft(ps.pass, widths[3]), + // failCell: pre-padded so the row() pad-right is a no-op for this cell + failCellRaw, + `${pct}%`, + ]; + if (showFailureCol) { + cells.push(truncate(ps.lastFailure?.text || (ps.currentRequest || "-"), widths[6])); + } + out.push(rowWithRawCells(cells, widths)); + } + + out.push(sep("├", "┼", "┤")); + const totalPct = aggTotal ? Math.floor((100 * aggDone) / aggTotal) : 0; + const totalCells = [ + "", + `${C.bold}TOTAL${C.reset}`, + `${padLeft(aggDone, 3)}/${padRight(aggTotal, 3)}`, + padLeft(aggPass, widths[3]), + aggFail > 0 ? `${C.red}${padLeft(aggFail, widths[4])}${C.reset}` : padLeft(aggFail, widths[4]), + `${totalPct}%`, + ]; + if (showFailureCol) totalCells.push(""); + out.push(rowWithRawCells(totalCells, widths)); + out.push(sep("└", "┴", "┘")); + + // Folder breakdown: show each running provider's currentFolder + last few folders. + out.push(""); + out.push(`${C.bold}Current folders${C.reset}`); + for (const p of PROVIDERS) { + const ps = state.providers[p]; + if (ps.totalRequests === 0) continue; + const cur = ps.currentFolder; + if (!cur) { + out.push(` ${padRight(p, 12)} ${C.gray}(waiting)${C.reset}`); + continue; + } + const f = ps.folders[cur] || { total: 0, pass: 0, fail: 0 }; + const doneInFolder = f.pass + f.fail; + out.push( + ` ${padRight(p, 12)} ${C.cyan}${truncate(cur, 40)}${C.reset} ` + + `${doneInFolder}/${f.total} ` + + `(${C.green}✓ ${f.pass}${C.reset}, ${f.fail > 0 ? C.red : C.dim}✗ ${f.fail}${C.reset})` + ); + } + + return out; +} + +// Cell may contain ANSI escapes; padRight in row() would break alignment. So +// compute visible length, then pad with spaces externally. +function rowWithRawCells(cells, widths) { + let line = "│"; + for (let i = 0; i < cells.length; i++) { + const raw = String(cells[i]); + const visible = raw.replace(ANSI_RE, ""); + const w = widths[i]; + const padded = visible.length >= w ? raw : raw + " ".repeat(w - visible.length); + line += " " + padded + " │"; + } + return line; +} + +function draw() { + const lines = renderFrame(); + const rows = process.stdout.rows || lines.length; + // Clamp to terminal height so we don't push the title off the top. + const visible = lines.slice(0, Math.max(1, rows - 1)); + let out = "\x1b[H"; // cursor home (alt screen, so this is the buffer origin) + for (const ln of visible) out += ln + "\x1b[K\n"; + out += "\x1b[J"; // clear from cursor to end-of-screen (wipes prior taller frame's tail) + process.stdout.write(out); + lastRenderLines = visible.length; +} + +// ----- Lifecycle -------------------------------------------------------------- + +function setupTails() { + if (MODE === "parallel") { + for (const p of PROVIDERS) { + ensureTail(join(TMP_DIR, `newman-cli-${p}.log`), p); + } + } else { + // Sequential: one shared log, provider inferred per-line. + ensureTail(SEQ_LOG, null); + } +} + +function shouldExit() { + if (MODE === "parallel") { + const { lines } = readStatusFile(); + if (lines >= LAUNCHED && Date.now() - lastByteAt > IDLE_EXIT_MS) return true; + } else { + // Sequential mode: rely on signals from the Makefile. Also exit when the + // log shows the newman "failures" summary block AND we've been idle. + if (Date.now() - lastByteAt > IDLE_EXIT_MS * 2 && lastRenderLines > 0) { + const allDone = PROVIDERS.every( + (p) => state.providers[p].totalRequests === 0 || + state.providers[p].doneRequests >= state.providers[p].totalRequests + ); + if (allDone) return true; + } + } + return false; +} + +function teardown(code = 0) { + // Drain any pending bytes the tail timer hasn't picked up yet, then commit + // the trailing in-flight request before the final frame. + readNewBytes(); + finalizeAll(); + draw(); + // Snapshot the final frame to stderr so it persists on the main screen + // after we leave the alt buffer (otherwise the user sees the table vanish). + const finalLines = renderFrame(); + // Leave alt screen, restore cursor, then print the persistent snapshot. + process.stdout.write("\x1b[?25h\x1b[?1049l"); + process.stderr.write(finalLines.join("\n") + "\n"); + process.exit(code); +} + +process.on("SIGTERM", () => teardown(0)); +process.on("SIGINT", () => teardown(130)); +process.on("SIGHUP", () => teardown(0)); + +// Enter alt screen buffer + hide cursor + clear it. This gives us a fresh +// canvas with a known origin so cursor-home redraws are deterministic and +// the preamble (boot logs, launch messages) is preserved on the main screen. +process.stdout.write("\x1b[?1049h\x1b[H\x1b[2J\x1b[?25l"); + +// Initial denominator pass; retry once a second until at least one provider has totals. +loadDenominators(); +const denomTimer = setInterval(() => { + const haveAny = PROVIDERS.some((p) => state.providers[p].totalRequests > 0); + if (!haveAny) loadDenominators(); + else clearInterval(denomTimer); +}, 1000); + +setupTails(); +setInterval(() => { + readNewBytes(); + readStatusFile(); +}, TAIL_INTERVAL_MS); + +setInterval(() => { + draw(); + if (shouldExit()) teardown(0); +}, RENDER_INTERVAL_MS); + +// Draw a first frame immediately so the user sees something. +draw(); diff --git a/tests/e2e/api/runners/pick-features.mjs b/tests/e2e/api/runners/pick-features.mjs new file mode 100644 index 0000000000..b65edf36f6 --- /dev/null +++ b/tests/e2e/api/runners/pick-features.mjs @@ -0,0 +1,181 @@ +#!/usr/bin/env node +// Interactive multi-select picker for harness modalities (criss-cross matrix). +// +// Designed to be invoked via $(node pick-features.mjs) - we read keys from +// stdin (raw mode), render the menu to stderr, and write only the final +// selection (comma-separated, lowercase) to stdout so the Makefile can +// capture it cleanly. +// +// Exit codes: +// 0 - selection confirmed (stdout = comma-separated keywords; empty when all selected) +// 1 - user cancelled (Esc / Ctrl+C / q) +// 2 - not running on an interactive TTY (no menu shown; stdout empty) + +const FEATURES = [ + { key: "chat", label: "Chat completions (text-only)" }, + { key: "streaming", label: "Streaming responses (SSE)" }, + { key: "embeddings", label: "Embeddings" }, + { key: "audio", label: "Audio (speech / transcription)" }, + { key: "image-gen", label: "Image generation" }, + { key: "tools", label: "Tool / function calling" }, + { key: "vision", label: "Vision (image input)" }, + { key: "json", label: "Structured output (JSON mode / schema)" }, + { key: "reasoning", label: "Reasoning / thinking" }, +]; + +// Need a TTY on both stdin (for keys) and stderr (for menu render). stdout is +// intentionally NOT required - it's the result channel, often piped via $(...). +if (!process.stdin.isTTY || !process.stderr.isTTY) { + process.exit(2); +} + +const writeUI = (s) => process.stderr.write(s); + +const selected = new Set(FEATURES.map((f) => f.key)); +let cursor = 0; +let firstFrame = true; +let lastLines = 0; + +const C = { + reset: "\x1b[0m", + bold: "\x1b[1m", + dim: "\x1b[2m", + cyan: "\x1b[36m", + green: "\x1b[32m", + yellow: "\x1b[33m", + gray: "\x1b[90m", +}; + +function render() { + const lines = []; + lines.push(`${C.bold}Bifrost harness - pick modalities${C.reset} ${C.dim}(space toggles, a=all, n=none, enter runs, q=cancel)${C.reset}`); + lines.push(""); + for (let i = 0; i < FEATURES.length; i++) { + const f = FEATURES[i]; + const box = selected.has(f.key) ? `${C.green}[x]${C.reset}` : "[ ]"; + const arrow = i === cursor ? `${C.cyan}>${C.reset}` : " "; + const label = i === cursor ? `${C.bold}${f.label}${C.reset}` : f.label; + lines.push(` ${arrow} ${box} ${label} ${C.gray}${f.key}${C.reset}`); + } + lines.push(""); + const n = selected.size; + const summary = n === FEATURES.length + ? `${C.dim}All modalities selected (no filter) - all providers will run${C.reset}` + : n === 0 + ? `${C.yellow}No modalities selected - press space or 'a' to choose at least one${C.reset}` + : `${C.dim}${n} of ${FEATURES.length} selected: ${[...selected].join(", ")}${C.reset}`; + lines.push(summary); + + let out = ""; + if (firstFrame) { + writeUI("\x1b[?25l"); + firstFrame = false; + } else if (lastLines > 0) { + out += `\x1b[${lastLines}A\x1b[0J`; + } + out += lines.join("\n") + "\n"; + writeUI(out); + lastLines = lines.length; +} + +function restoreTty() { + try { process.stdin.setRawMode(false); } catch {} + writeUI("\x1b[?25h"); +} + +function commit() { + restoreTty(); + process.stdin.pause(); + if (selected.size === 0) process.exit(1); + // Emit empty when all are selected so the Makefile takes the no-filter path. + if (selected.size < FEATURES.length) { + process.stdout.write([...selected].join(",")); + } + process.exit(0); +} + +function cancel() { + restoreTty(); + process.stdin.pause(); + process.stderr.write("\n[pick-features] cancelled\n"); + process.exit(1); +} + +process.on("SIGINT", cancel); +process.on("SIGTERM", cancel); + +process.stdin.setRawMode(true); +process.stdin.resume(); +process.stdin.setEncoding("utf8"); + +process.stdin.on("data", (chunk) => { + for (const key of splitKeys(chunk)) handleKey(key); +}); + +function handleKey(key) { + if (key === "\x1b[A" || key === "k") { + cursor = (cursor - 1 + FEATURES.length) % FEATURES.length; + render(); + return; + } + if (key === "\x1b[B" || key === "j") { + cursor = (cursor + 1) % FEATURES.length; + render(); + return; + } + if (key === " ") { + const k = FEATURES[cursor].key; + if (selected.has(k)) selected.delete(k); + else selected.add(k); + render(); + return; + } + if (key === "a") { + for (const f of FEATURES) selected.add(f.key); + render(); + return; + } + if (key === "n") { + selected.clear(); + render(); + return; + } + if (key === "\r" || key === "\n") { + if (selected.size === 0) return; + commit(); + return; + } + if (key === "\x1b" || key === "q" || key === "\x03") { + cancel(); + return; + } +} + +// Terminals batch fast keypresses into a single data chunk. Split on ESC +// boundaries so an arrow-key escape sequence stays atomic. +function splitKeys(chunk) { + const out = []; + let i = 0; + while (i < chunk.length) { + const ch = chunk[i]; + if (ch === "\x1b") { + // CSI sequence: ESC [ + if (chunk[i + 1] === "[") { + let j = i + 2; + while (j < chunk.length && !/[A-Za-z~]/.test(chunk[j])) j++; + out.push(chunk.slice(i, j + 1)); + i = j + 1; + continue; + } + // bare ESC + out.push("\x1b"); + i++; + continue; + } + out.push(ch); + i++; + } + return out; +} + +render(); diff --git a/tests/e2e/api/runners/run-stream-cancellation.mjs b/tests/e2e/api/runners/run-stream-cancellation.mjs new file mode 100644 index 0000000000..afe23932b7 --- /dev/null +++ b/tests/e2e/api/runners/run-stream-cancellation.mjs @@ -0,0 +1,196 @@ +#!/usr/bin/env node +// Exercises Bifrost's server-side stream cancellation path by opening a stream, +// reading the first bytes, then aborting the downstream request. + +import { writeFileSync } from "node:fs"; + +const args = Object.fromEntries( + process.argv.slice(2).reduce((acc, cur, i, arr) => { + if (cur.startsWith("--")) { + const key = cur.slice(2); + const next = arr[i + 1]; + acc.push([key, next && !next.startsWith("--") ? next : "true"]); + } + return acc; + }, []) +); + +const baseUrl = args["base-url"] || process.env.BASE_URL || "http://localhost:8080"; +const providerFilter = (args.provider || "").toLowerCase(); +const out = args.out || "tmp/stream-cancel-report.json"; +const readTimeoutMs = Number(args["read-timeout-ms"] || 20000); +const abortAfterBytes = Number(args["abort-after-bytes"] || 1); + +const cases = [ + { + provider: "openai", + name: "openai/gpt-4o-mini chat stream", + path: "/v1/chat/completions", + body: { + model: "openai/gpt-4o-mini", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + }, + }, + { + provider: "anthropic", + name: "anthropic/claude-haiku-4-5 chat stream", + path: "/v1/chat/completions", + body: { + model: "anthropic/claude-haiku-4-5", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + max_tokens: 512, + }, + }, + { + provider: "bedrock", + name: "bedrock/nova-lite chat stream", + path: "/v1/chat/completions", + body: { + model: "bedrock/us.amazon.nova-lite-v1:0", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + }, + }, + { + provider: "gemini", + name: "gemini/gemini-2.5-flash chat stream", + path: "/v1/chat/completions", + body: { + model: "gemini/gemini-2.5-flash", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + }, + }, + { + provider: "vertex", + name: "vertex/gemini-2.5-flash chat stream", + path: "/v1/chat/completions", + body: { + model: "vertex/gemini-2.5-flash", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + }, + }, + { + provider: "azure", + name: "azure deployment chat stream", + path: "/v1/chat/completions", + body: { + model: "azure/{{azureDeployment}}", + messages: [{ role: "user", content: "Count from 1 to 100 slowly." }], + stream: true, + }, + }, +].filter((c) => !providerFilter || c.provider === providerFilter); + +const resolveVariables = (value) => { + if (typeof value === "string") { + return value.replaceAll("{{azureDeployment}}", process.env.AZURE_DEPLOYMENT || process.env.BIFROST_AZURE_DEPLOYMENT || "gpt-4o-mini"); + } + if (Array.isArray(value)) return value.map(resolveVariables); + if (value && typeof value === "object") { + return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, resolveVariables(v)])); + } + return value; +}; + +async function runCase(testCase) { + const controller = new AbortController(); + const timer = setTimeout(() => controller.abort(new Error("stream read timeout")), readTimeoutMs); + let bytesRead = 0; + + try { + const response = await fetch(`${baseUrl}${testCase.path}`, { + method: "POST", + headers: { "content-type": "application/json" }, + body: JSON.stringify(resolveVariables(testCase.body)), + signal: controller.signal, + }); + + const contentType = response.headers.get("content-type") || ""; + if (!response.ok) { + const body = await response.text().catch(() => ""); + return { + ...testCase, + ok: false, + status: response.status, + contentType, + error: `stream returned HTTP ${response.status}: ${body.slice(0, 500)}`, + }; + } + if (!/event-stream|vnd\.amazon\.eventstream|octet-stream/i.test(contentType)) { + return { + ...testCase, + ok: false, + status: response.status, + contentType, + error: `expected streaming content-type, got ${contentType || ""}`, + }; + } + if (!response.body) { + return { ...testCase, ok: false, status: response.status, contentType, error: "response body is not readable" }; + } + + const reader = response.body.getReader(); + while (bytesRead < abortAfterBytes) { + const { done, value } = await reader.read(); + if (done) break; + bytesRead += value?.byteLength || 0; + } + + controller.abort(new Error("intentional downstream abort after first stream bytes")); + await reader.cancel("intentional downstream abort").catch(() => {}); + + return { + ...testCase, + ok: bytesRead > 0, + status: response.status, + contentType, + bytesRead, + aborted: true, + error: bytesRead > 0 ? undefined : "stream ended before any bytes were read", + }; + } catch (err) { + return { + ...testCase, + ok: false, + bytesRead, + aborted: controller.signal.aborted, + error: err?.message || String(err), + }; + } finally { + clearTimeout(timer); + } +} + +if (cases.length === 0) { + console.error(`[stream-cancel] no cases selected for provider=${providerFilter || "-"}`); + process.exit(2); +} + +const results = []; +for (const testCase of cases) { + process.stderr.write(`[stream-cancel] ${testCase.name} ... `); + const result = await runCase(testCase); + results.push(result); + process.stderr.write(result.ok ? "ok\n" : `failed (${result.error})\n`); +} + +const report = { + baseUrl, + provider: providerFilter || null, + total: results.length, + failed: results.filter((r) => !r.ok).length, + results, +}; + +writeFileSync(out, `${JSON.stringify(report, null, 2)}\n`); + +if (report.failed > 0) { + console.error(`[stream-cancel] ${report.failed}/${report.total} case(s) failed; report: ${out}`); + process.exit(1); +} + +console.error(`[stream-cancel] all ${report.total} case(s) passed; report: ${out}`); diff --git a/tests/e2e/core/fixtures/base.fixture.ts b/tests/e2e/core/fixtures/base.fixture.ts index a5e95bf0e5..2416b6893e 100644 --- a/tests/e2e/core/fixtures/base.fixture.ts +++ b/tests/e2e/core/fixtures/base.fixture.ts @@ -1,123 +1,140 @@ -import { test as base, expect } from '@playwright/test' -import { SidebarPage } from '../pages/sidebar.page' -import { ProvidersPage } from '../../features/providers/pages/providers.page' -import { VirtualKeysPage } from '../../features/virtual-keys/pages/virtual-keys.page' -import { DashboardPage } from '../../features/dashboard/pages/dashboard.page' -import { LogsPage } from '../../features/logs/pages/logs.page' -import { MCPLogsPage } from '../../features/mcp-logs/pages/mcp-logs.page' -import { RoutingRulesPage } from '../../features/routing-rules/pages/routing-rules.page' -import { MCPRegistryPage } from '../../features/mcp-registry/pages/mcp-registry.page' -import { PluginsPage } from '../../features/plugins/pages/plugins.page' -import { ObservabilityPage } from '../../features/observability/pages/observability.page' -import { ConfigSettingsPage } from '../../features/config/pages/config-settings.page' -import { GovernancePage } from '../../features/governance/pages/governance.page' -import { MCPAuthConfigPage } from '../../features/mcp-auth-config/pages/mcp-auth-config.page' -import { MCPSettingsPage } from '../../features/mcp-settings/pages/mcp-settings.page' -import { MCPToolGroupsPage } from '../../features/mcp-tool-groups/pages/mcp-tool-groups.page' -import { ModelLimitsPage } from '../../features/model-limits/pages/model-limits.page' +import { test as base, expect } from "@playwright/test"; +import { SidebarPage } from "../pages/sidebar.page"; +import { ProvidersPage } from "../../features/providers/pages/providers.page"; +import { VirtualKeysPage } from "../../features/virtual-keys/pages/virtual-keys.page"; +import { DashboardPage } from "../../features/dashboard/pages/dashboard.page"; +import { LogsPage } from "../../features/logs/pages/logs.page"; +import { MCPLogsPage } from "../../features/mcp-logs/pages/mcp-logs.page"; +import { RoutingRulesPage } from "../../features/routing-rules/pages/routing-rules.page"; +import { MCPRegistryPage } from "../../features/mcp-registry/pages/mcp-registry.page"; +import { PluginsPage } from "../../features/plugins/pages/plugins.page"; +import { ObservabilityPage } from "../../features/observability/pages/observability.page"; +import { ConfigSettingsPage } from "../../features/config/pages/config-settings.page"; +import { GovernancePage } from "../../features/governance/pages/governance.page"; +import { MCPAuthConfigPage } from "../../features/mcp-auth-config/pages/mcp-auth-config.page"; +import { MCPSettingsPage } from "../../features/mcp-settings/pages/mcp-settings.page"; +import { MCPToolGroupsPage } from "../../features/mcp-tool-groups/pages/mcp-tool-groups.page"; +import { ModelLimitsPage } from "../../features/model-limits/pages/model-limits.page"; /** * Custom test fixtures type */ type BifrostFixtures = { - closeDevProfiler: void - sidebarPage: SidebarPage - providersPage: ProvidersPage - virtualKeysPage: VirtualKeysPage - dashboardPage: DashboardPage - logsPage: LogsPage - mcpLogsPage: MCPLogsPage - routingRulesPage: RoutingRulesPage - mcpRegistryPage: MCPRegistryPage - pluginsPage: PluginsPage - observabilityPage: ObservabilityPage - configSettingsPage: ConfigSettingsPage - governancePage: GovernancePage - modelLimitsPage: ModelLimitsPage - mcpSettingsPage: MCPSettingsPage - mcpToolGroupsPage: MCPToolGroupsPage - mcpAuthConfigPage: MCPAuthConfigPage -} + closeDevProfiler: void; + sidebarPage: SidebarPage; + providersPage: ProvidersPage; + virtualKeysPage: VirtualKeysPage; + dashboardPage: DashboardPage; + logsPage: LogsPage; + mcpLogsPage: MCPLogsPage; + routingRulesPage: RoutingRulesPage; + mcpRegistryPage: MCPRegistryPage; + pluginsPage: PluginsPage; + observabilityPage: ObservabilityPage; + configSettingsPage: ConfigSettingsPage; + governancePage: GovernancePage; + modelLimitsPage: ModelLimitsPage; + mcpSettingsPage: MCPSettingsPage; + mcpToolGroupsPage: MCPToolGroupsPage; + mcpAuthConfigPage: MCPAuthConfigPage; +}; /** * Extended test with Bifrost-specific fixtures */ export const test = base.extend({ - closeDevProfiler: [async ({ page }, use) => { - // Automatically dismiss the Dev Profiler overlay whenever it appears. - // Uses addLocatorHandler so it triggers before any test action if the profiler is visible. - await page.addLocatorHandler( - page.getByText('Dev Profiler', { exact: true }), - async () => { - await page.locator('button[title="Dismiss"]').click({ force: true }) - } - ) - await use() - }, { auto: true }], - - sidebarPage: async ({ page }, use) => { - await use(new SidebarPage(page)) - }, - - providersPage: async ({ page }, use) => { - await use(new ProvidersPage(page)) - }, - - virtualKeysPage: async ({ page }, use) => { - await use(new VirtualKeysPage(page)) - }, - - dashboardPage: async ({ page }, use) => { - await use(new DashboardPage(page)) - }, - - logsPage: async ({ page }, use) => { - await use(new LogsPage(page)) - }, - - mcpLogsPage: async ({ page }, use) => { - await use(new MCPLogsPage(page)) - }, - - routingRulesPage: async ({ page }, use) => { - await use(new RoutingRulesPage(page)) - }, - - mcpRegistryPage: async ({ page }, use) => { - await use(new MCPRegistryPage(page)) - }, - - pluginsPage: async ({ page }, use) => { - await use(new PluginsPage(page)) - }, - - observabilityPage: async ({ page }, use) => { - await use(new ObservabilityPage(page)) - }, - - configSettingsPage: async ({ page }, use) => { - await use(new ConfigSettingsPage(page)) - }, - - governancePage: async ({ page }, use) => { - await use(new GovernancePage(page)) - }, - - modelLimitsPage: async ({ page }, use) => { - await use(new ModelLimitsPage(page)) - }, - - mcpSettingsPage: async ({ page }, use) => { - await use(new MCPSettingsPage(page)) - }, - - mcpToolGroupsPage: async ({ page }, use) => { - await use(new MCPToolGroupsPage(page)) - }, - - mcpAuthConfigPage: async ({ page }, use) => { - await use(new MCPAuthConfigPage(page)) - }, -}) - -export { expect } + closeDevProfiler: [ + async ({ page }, use) => { + // Keep the development profiler from stealing focus or blocking assertions when + // tests reuse a manually started dev server that was not launched with + // BIFROST_DISABLE_PROFILER=1. + await page.addInitScript(() => { + window.localStorage.setItem("devProfiler.isVisible", "false"); + window.localStorage.setItem("devProfiler.isExpanded", "false"); + }); + + await page.addLocatorHandler( + page.getByText("Dev Profiler", { exact: true }), + async () => { + await page.evaluate(() => { + window.localStorage.setItem("devProfiler.isVisible", "false"); + window.localStorage.setItem("devProfiler.isExpanded", "false"); + }); + await page + .locator('button[title="Dismiss"]') + .click({ force: true, timeout: 1000 }) + .catch(() => {}); + }, + { noWaitAfter: true }, + ); + await use(); + }, + { auto: true }, + ], + + sidebarPage: async ({ page }, use) => { + await use(new SidebarPage(page)); + }, + + providersPage: async ({ page }, use) => { + await use(new ProvidersPage(page)); + }, + + virtualKeysPage: async ({ page }, use) => { + await use(new VirtualKeysPage(page)); + }, + + dashboardPage: async ({ page }, use) => { + await use(new DashboardPage(page)); + }, + + logsPage: async ({ page }, use) => { + await use(new LogsPage(page)); + }, + + mcpLogsPage: async ({ page }, use) => { + await use(new MCPLogsPage(page)); + }, + + routingRulesPage: async ({ page }, use) => { + await use(new RoutingRulesPage(page)); + }, + + mcpRegistryPage: async ({ page }, use) => { + await use(new MCPRegistryPage(page)); + }, + + pluginsPage: async ({ page }, use) => { + await use(new PluginsPage(page)); + }, + + observabilityPage: async ({ page }, use) => { + await use(new ObservabilityPage(page)); + }, + + configSettingsPage: async ({ page }, use) => { + await use(new ConfigSettingsPage(page)); + }, + + governancePage: async ({ page }, use) => { + await use(new GovernancePage(page)); + }, + + modelLimitsPage: async ({ page }, use) => { + await use(new ModelLimitsPage(page)); + }, + + mcpSettingsPage: async ({ page }, use) => { + await use(new MCPSettingsPage(page)); + }, + + mcpToolGroupsPage: async ({ page }, use) => { + await use(new MCPToolGroupsPage(page)); + }, + + mcpAuthConfigPage: async ({ page }, use) => { + await use(new MCPAuthConfigPage(page)); + }, +}); + +export { expect }; \ No newline at end of file diff --git a/tests/e2e/core/utils/selectors.ts b/tests/e2e/core/utils/selectors.ts index d73d38b579..62cf882afa 100644 --- a/tests/e2e/core/utils/selectors.ts +++ b/tests/e2e/core/utils/selectors.ts @@ -12,7 +12,8 @@ export const Selectors = { providers: { // Sidebar list providerList: '[data-testid="provider-list"]', - providerItem: (name: string) => `[data-testid="provider-item-${name.replace(/[^a-z0-9]+/gi, "-").toLowerCase()}"]`, + providerItem: (name: string) => + `[data-testid="provider-item-${name.replace(/[^a-z0-9]+/gi, "-").toLowerCase()}"]`, addProviderBtn: '[data-testid="add-provider-btn"]', /** Add New Provider dropdown > Custom provider... (opens custom provider sheet) */ addProviderOptionCustom: '[data-testid="add-provider-option-custom"]', @@ -22,7 +23,7 @@ export const Selectors = { addKeyBtn: '[data-testid="add-key-btn"]', keysTable: '[data-testid="keys-table"]', keyRow: (name: string) => `[data-testid="key-row-${name}"]`, - + // Key form keyForm: { container: '[data-testid="key-form"]', @@ -33,7 +34,7 @@ export const Selectors = { saveBtn: '[data-testid="key-save-btn"]', cancelBtn: '[data-testid="key-cancel-btn"]', }, - + // Custom provider sheet customProviderSheet: { container: '[data-testid="custom-provider-sheet"]', @@ -51,14 +52,14 @@ export const Selectors = { table: '[data-testid="vk-table"]', row: (name: string) => `[data-testid="vk-row-${name}"]`, createBtn: '[data-testid="create-vk-btn"]', - + // Sheet/Form sheet: { - container: '[data-testid="vk-sheet"]', + container: '[data-testid="vk-sheet-content"]', nameInput: '[data-testid="vk-name-input"]', descriptionInput: '[data-testid="vk-description-input"]', isActiveToggle: '[data-testid="vk-is-active-toggle"]', - + // Provider configs providerSelect: '[data-testid="vk-provider-select"]', @@ -66,7 +67,7 @@ export const Selectors = { entityTypeSelect: '[data-testid="vk-entity-type-select"]', teamSelect: '[data-testid="vk-team-select"]', customerSelect: '[data-testid="vk-customer-select"]', - + // Actions saveBtn: '[data-testid="vk-save-btn"]', cancelBtn: '[data-testid="vk-cancel-btn"]', @@ -99,4 +100,4 @@ export const Selectors = { confirmBtn: '[data-testid="dialog-confirm-btn"]', cancelBtn: '[data-testid="dialog-cancel-btn"]', }, -} +}; diff --git a/tests/e2e/features/mcp-registry/mcp-registry.data.ts b/tests/e2e/features/mcp-registry/mcp-registry.data.ts index 488b271595..6ca4802893 100644 --- a/tests/e2e/features/mcp-registry/mcp-registry.data.ts +++ b/tests/e2e/features/mcp-registry/mcp-registry.data.ts @@ -165,3 +165,54 @@ export function createCodeModeClientData(overrides: Partial = { ...overrides, }) } + +/** + * Create HTTP client pointing at auth-demo-server (port 3002) with header auth. + * Requires auth-demo-server to be running: examples/mcps/auth-demo-server + */ +export function createHeadersAuthClientData(overrides: Partial = {}): MCPClientConfig { + return createMCPClientData({ + name: `headers_auth_client_${Date.now()}`, + connectionType: 'http', + connectionUrl: 'http://localhost:3002/', + authType: 'headers', + headers: { + 'X-API-Key': { value: 'super-secret-key', env_var: '', from_env: false }, + 'X-Tool-Token': { value: 'tool-exec-secret', env_var: '', from_env: false }, + }, + isPingAvailable: false, + ...overrides, + }) +} + +/** + * Create HTTP client pointing at oauth-demo-server (port 3003) with OAuth 2.0. + * Relies on RFC 8414 / RFC 9728 auto-discovery — no manual URLs needed. + * Requires oauth-demo-server to be running: examples/mcps/oauth-demo-server + */ +export function createOAuthClientData(overrides: Partial = {}): MCPClientConfig { + return createMCPClientData({ + name: `oauth_client_${Date.now()}`, + connectionType: 'http', + connectionUrl: 'http://localhost:3003/mcp', + authType: 'oauth', + isPingAvailable: false, + ...overrides, + }) +} + +/** + * Create HTTP client pointing at oauth-demo-server (port 3003) with Per-User OAuth 2.0. + * Requires user consent via browser before tools are accessible. + * Requires oauth-demo-server to be running: examples/mcps/oauth-demo-server + */ +export function createPerUserOAuthClientData(overrides: Partial = {}): MCPClientConfig { + return createMCPClientData({ + name: `per_user_oauth_client_${Date.now()}`, + connectionType: 'http', + connectionUrl: 'http://localhost:3003/mcp', + authType: 'per_user_oauth', + isPingAvailable: false, + ...overrides, + }) +} diff --git a/tests/e2e/features/mcp-registry/mcp-registry.spec.ts b/tests/e2e/features/mcp-registry/mcp-registry.spec.ts index 90cda086ef..ca939ff845 100644 --- a/tests/e2e/features/mcp-registry/mcp-registry.spec.ts +++ b/tests/e2e/features/mcp-registry/mcp-registry.spec.ts @@ -2,6 +2,9 @@ import { expect, test } from '../../core/fixtures/base.fixture' import { createCodeModeClientData, createHTTPClientData, + createHeadersAuthClientData, + createOAuthClientData, + createPerUserOAuthClientData, createSSEClientData, createSTDIOClientData } from './mcp-registry.data' @@ -309,20 +312,6 @@ test.describe('MCP Registry', () => { }) test.describe('Form Validation', () => { - test('should require name for client', async ({ mcpRegistryPage }) => { - await mcpRegistryPage.createBtn.click() - await expect(mcpRegistryPage.sheet).toBeVisible() - - // Clear name field (should be empty by default) - await mcpRegistryPage.nameInput.clear() - - // Save button should be disabled when name is empty - const saveBtn = mcpRegistryPage.saveBtn - await expect(saveBtn).toBeDisabled() - - await mcpRegistryPage.cancelCreation() - }) - test('should validate name format', async ({ mcpRegistryPage }) => { await mcpRegistryPage.createBtn.click() await expect(mcpRegistryPage.sheet).toBeVisible() @@ -333,9 +322,11 @@ test.describe('MCP Registry', () => { // Fill connection URL to satisfy other validation await mcpRegistryPage.connectionUrlInput.fill('http://localhost:3001') - // Save button should be disabled due to validation error - const saveBtn = mcpRegistryPage.saveBtn - await expect(saveBtn).toBeDisabled() + // Complex forms keep the action enabled and show exact inline errors on submit. + await mcpRegistryPage.saveBtn.click() + await expect( + mcpRegistryPage.page.getByText('Server name can only contain letters, numbers, and underscores') + ).toBeVisible() await mcpRegistryPage.cancelCreation() }) @@ -347,11 +338,135 @@ test.describe('MCP Registry', () => { // Fill valid name await mcpRegistryPage.nameInput.fill(`valid_name_${Date.now()}`) - // Leave connection URL empty - save should be disabled - const saveBtn = mcpRegistryPage.saveBtn - await expect(saveBtn).toBeDisabled() + // Leave connection URL empty, submit, and assert the highlighted requirement. + await mcpRegistryPage.saveBtn.click() + await expect(mcpRegistryPage.page.getByText('Connection URL is required')).toBeVisible() await mcpRegistryPage.cancelCreation() }) }) + + test.describe('MCP Header Authentication', () => { + test('should display header fields when headers auth type is selected', async ({ mcpRegistryPage }) => { + await mcpRegistryPage.createBtn.click() + await expect(mcpRegistryPage.sheet).toBeVisible() + + await mcpRegistryPage.selectAuthType('headers') + + const headersTable = mcpRegistryPage.page.locator('[data-testid="mcp-headers-table"]') + await expect(headersTable).toBeVisible() + + await mcpRegistryPage.cancelCreation() + }) + + test('should create MCP client with header auth and connect to auth-demo-server', async ({ mcpRegistryPage }) => { + const clientData = createHeadersAuthClientData() + createdClients.push(clientData.name) + + const created = await mcpRegistryPage.createClient(clientData) + expect(created).toBe(true) + + const exists = await mcpRegistryPage.clientExists(clientData.name) + expect(exists).toBe(true) + + // Server requires X-API-Key — should connect and expose tools (public_info, secret_data) + await mcpRegistryPage.viewClientDetails(clientData.name) + const toolsCount = await mcpRegistryPage.getToolsCount() + expect(toolsCount).toBeGreaterThanOrEqual(2) + + await mcpRegistryPage.closeDetailSheet() + }) + }) + + test.describe('MCP OAuth 2.0', () => { + test('should display OAuth fields when OAuth 2.0 auth type is selected', async ({ mcpRegistryPage }) => { + await mcpRegistryPage.createBtn.click() + await expect(mcpRegistryPage.sheet).toBeVisible() + + await mcpRegistryPage.selectAuthType('oauth') + + // All fields are optional — auto-discovered from server metadata + await expect(mcpRegistryPage.oauthClientIdInput).toBeVisible() + await expect(mcpRegistryPage.oauthClientSecretInput).toBeVisible() + await expect(mcpRegistryPage.oauthAuthorizeUrlInput).toBeVisible() + await expect(mcpRegistryPage.oauthTokenUrlInput).toBeVisible() + + await mcpRegistryPage.cancelCreation() + }) + + test('should create OAuth 2.0 client and complete authorization flow', async ({ mcpRegistryPage }) => { + const clientData = createOAuthClientData() + createdClients.push(clientData.name) + + const created = await mcpRegistryPage.createClient(clientData) + expect(created).toBe(true) + + // OAuth Authorization dialog appears — click "Open Authorization Window" + const openWindowBtn = mcpRegistryPage.page.locator('[data-testid="oauth-open-window-btn"]') + await expect(openWindowBtn).toBeVisible({ timeout: 10000 }) + + const [popup] = await Promise.all([ + mcpRegistryPage.page.waitForEvent('popup'), + openWindowBtn.click(), + ]) + + // Complete login in popup (oauth-demo-server login form) + await popup.waitForLoadState() + await popup.locator('#user').fill('demo-user') + await popup.getByRole('button', { name: /Sign in/i }).click() + + // Popup redirects to Bifrost callback then closes + await popup.waitForEvent('close', { timeout: 15000 }).catch(() => {}) + + // Client should now be connected and visible in table + const exists = await mcpRegistryPage.clientExists(clientData.name) + expect(exists).toBe(true) + }) + }) + + test.describe('MCP Per-User OAuth 2.0', () => { + test('should display OAuth fields when Per-User OAuth 2.0 auth type is selected', async ({ mcpRegistryPage }) => { + await mcpRegistryPage.createBtn.click() + await expect(mcpRegistryPage.sheet).toBeVisible() + + await mcpRegistryPage.selectAuthType('per_user_oauth') + + await expect(mcpRegistryPage.oauthClientIdInput).toBeVisible() + await expect(mcpRegistryPage.oauthClientSecretInput).toBeVisible() + await expect(mcpRegistryPage.oauthAuthorizeUrlInput).toBeVisible() + await expect(mcpRegistryPage.oauthTokenUrlInput).toBeVisible() + + await mcpRegistryPage.cancelCreation() + }) + + test('should create Per-User OAuth 2.0 client and complete authorization flow', async ({ mcpRegistryPage }) => { + const clientData = createPerUserOAuthClientData() + createdClients.push(clientData.name) + + const created = await mcpRegistryPage.createClient(clientData) + expect(created).toBe(true) + + // "Test OAuth Configuration" dialog appears. Per-user OAuth opens the + // authorization popup directly from this confirmation click. + const confirmBtn = mcpRegistryPage.page.locator('[data-testid="per-user-oauth-confirm"]') + await expect(confirmBtn).toBeVisible({ timeout: 10000 }) + + const [popup] = await Promise.all([ + mcpRegistryPage.page.waitForEvent('popup'), + confirmBtn.click(), + ]) + + // Complete login in popup (oauth-demo-server login form) + await popup.waitForLoadState() + await popup.locator('#user').fill('demo-user') + await popup.getByRole('button', { name: /Sign in/i }).click() + + // Popup redirects to Bifrost callback then closes + await popup.waitForEvent('close', { timeout: 15000 }).catch(() => {}) + + // Client should be visible in table + const exists = await mcpRegistryPage.clientExists(clientData.name) + expect(exists).toBe(true) + }) + }) }) diff --git a/tests/e2e/features/mcp-registry/pages/mcp-registry.page.ts b/tests/e2e/features/mcp-registry/pages/mcp-registry.page.ts index f3c9a35268..bdf1d2f69a 100644 --- a/tests/e2e/features/mcp-registry/pages/mcp-registry.page.ts +++ b/tests/e2e/features/mcp-registry/pages/mcp-registry.page.ts @@ -10,7 +10,7 @@ export type MCPConnectionType = 'http' | 'sse' | 'stdio' /** * Authentication types for HTTP/SSE connections */ -export type MCPAuthType = 'none' | 'headers' | 'oauth' +export type MCPAuthType = 'none' | 'headers' | 'oauth' | 'per_user_oauth' /** Header value shape used by API (value / env_var / from_env) */ export type EnvVarLike = { value: string; env_var?: string; from_env?: boolean } @@ -124,8 +124,22 @@ export class MCPRegistryPage extends BasePage { } async clientExists(name: string): Promise { - await this.page.waitForTimeout(500) // Brief wait for UI update - return (await this.getClientRow(name).count()) > 0 + try { + await expect(this.getClientRow(name)).toBeVisible({ timeout: 5000 }) + return true + } catch { + return false + } + } + + private async openClientActions(name: string): Promise { + const row = this.getClientRow(name) + await expect(row).toBeVisible({ timeout: 10000 }) + await row.scrollIntoViewIfNeeded() + + const actionsBtn = row.getByRole('button', { name: /MCP server actions/i }) + await actionsBtn.waitFor({ state: 'visible', timeout: 10000 }) + await actionsBtn.click() } /** @@ -174,8 +188,8 @@ export class MCPRegistryPage extends BasePage { await expect(selectTrigger).toBeVisible({ timeout: 5000 }) await selectTrigger.click() - // Select the option by data-testid - const optionTestId = `auth-type-${type}` + // Select the option by data-testid (per_user_oauth uses kebab-case in testid) + const optionTestId = `auth-type-${type.replace(/_/g, '-')}` const option = this.page.locator(`[data-testid="${optionTestId}"]`) await expect(option).toBeVisible({ timeout: 5000 }) await option.click() @@ -265,8 +279,8 @@ export class MCPRegistryPage extends BasePage { } } - // Handle OAuth config - if (config.authType === 'oauth') { + // Handle OAuth config (oauth and per_user_oauth share the same fields) + if (config.authType === 'oauth' || config.authType === 'per_user_oauth') { if (config.oauthClientId) { await this.oauthClientIdInput.fill(config.oauthClientId) } @@ -520,10 +534,17 @@ export class MCPRegistryPage extends BasePage { * Reconnect an MCP client */ async reconnectClient(name: string): Promise { - const row = this.getClientRow(name) - // Stop propagation by clicking the reconnect button directly - const reconnectBtn = row.locator('button').filter({ has: this.page.locator('svg.lucide-refresh-ccw') }) - await reconnectBtn.click() + await this.openClientActions(name) + + const reconnectMenuItem = this.page.getByRole('menuitem', { name: /Reconnect/i }) + await expect(reconnectMenuItem).toBeVisible({ timeout: 10000 }) + await reconnectMenuItem.click() + + const reconnectDialog = this.page.locator('[role="alertdialog"], [role="dialog"]').filter({ hasText: /Reconnect/i }).first() + if (await reconnectDialog.isVisible({ timeout: 1000 }).catch(() => false)) { + await reconnectDialog.getByRole('button', { name: /Reconnect|Continue|Confirm/i }).click() + } + await this.waitForSuccessToast('Reconnected') } @@ -662,12 +683,14 @@ export class MCPRegistryPage extends BasePage { * disappearing from the table after the list refetches. */ async deleteClient(name: string, options?: { requireToast?: boolean }): Promise { - const row = this.getClientRow(name) - const deleteBtn = row - .locator('button') - .filter({ has: this.page.locator('svg.lucide-trash-2') }) - .or(row.locator('button').filter({ has: this.page.locator('svg.lucide-trash') })) - await deleteBtn.click() + const exists = await this.clientExists(name) + if (!exists) return + + await this.openClientActions(name) + + const deleteMenuItem = this.page.getByRole('menuitem', { name: /Delete/i }) + await expect(deleteMenuItem).toBeVisible({ timeout: 10000 }) + await deleteMenuItem.click() const confirmDialog = this.page.locator('[role="alertdialog"]') await expect(confirmDialog).toBeVisible({ timeout: 5000 }) diff --git a/tests/e2e/features/model-limits/pages/model-limits.page.ts b/tests/e2e/features/model-limits/pages/model-limits.page.ts index f23be8b111..29597912c1 100644 --- a/tests/e2e/features/model-limits/pages/model-limits.page.ts +++ b/tests/e2e/features/model-limits/pages/model-limits.page.ts @@ -41,7 +41,35 @@ export class ModelLimitsPage extends BasePage { async modelLimitExists(modelName: string, provider: string = 'all'): Promise { const row = this.getModelLimitRow(modelName, provider) - return (await row.count()) > 0 + return await row.isVisible({ timeout: 5000 }).catch(() => false) + } + + private async openModelLimitActions(modelName: string, provider: string = 'all'): Promise { + const row = this.getModelLimitRow(modelName, provider) + await expect(row).toBeVisible({ timeout: 10000 }) + await row.scrollIntoViewIfNeeded() + + const actionsBtn = this.page.getByTestId( + `model-limit-button-actions-${toTestIdPart(modelName)}-${toTestIdPart(provider)}` + ) + await actionsBtn.waitFor({ state: 'visible', timeout: 10000 }) + await actionsBtn.scrollIntoViewIfNeeded() + await actionsBtn.click() + } + + private async waitForSheetClosedAfterSave(): Promise { + const closed = await expect(this.sheet) + .not.toBeVisible({ timeout: 5000 }) + .then(() => true) + .catch(() => false) + if (!closed) { + await this.page.keyboard.press('Escape') + await expect(this.sheet).not.toBeVisible({ timeout: 5000 }) + } + + await expect(this.page.locator('html')) + .not.toHaveClass(/bprogress-busy/, { timeout: 10000 }) + .catch(() => {}) } /** @@ -84,14 +112,17 @@ export class ModelLimitsPage extends BasePage { } const saveBtn = this.page.getByRole('button', { name: /Create Limit/i }) + await expect(saveBtn).toBeEnabled({ timeout: 10000 }) await saveBtn.click() - await this.waitForSuccessToast() - await expect(this.sheet).not.toBeVisible({ timeout: 10000 }) + await this.waitForSheetClosedAfterSave() + await expect(this.getModelLimitRow(selectedModelName, config.provider)).toBeVisible({ timeout: 15000 }) return selectedModelName } async editModelLimit(modelName: string, provider: string, updates: Partial): Promise { + await this.openModelLimitActions(modelName, provider) const editBtn = this.page.getByTestId(`model-limit-button-edit-${toTestIdPart(modelName)}-${toTestIdPart(provider)}`) + await editBtn.waitFor({ state: 'visible', timeout: 10000 }) await editBtn.click() await expect(this.sheet).toBeVisible({ timeout: 5000 }) await this.waitForSheetAnimation() @@ -114,19 +145,24 @@ export class ModelLimitsPage extends BasePage { } const saveBtn = this.page.getByRole('button', { name: /Save Changes|Create Limit/i }) + await expect(saveBtn).toBeEnabled({ timeout: 10000 }) await saveBtn.click() - await this.waitForSuccessToast() - await expect(this.sheet).not.toBeVisible({ timeout: 10000 }) + await this.waitForSheetClosedAfterSave() + await expect(this.getModelLimitRow(modelName, provider)).toBeVisible({ timeout: 15000 }) } async deleteModelLimit(modelName: string, provider: string = 'all'): Promise { + const exists = await this.modelLimitExists(modelName, provider) + if (!exists) return + + await this.openModelLimitActions(modelName, provider) const deleteBtn = this.page.getByTestId(`model-limit-button-delete-${toTestIdPart(modelName)}-${toTestIdPart(provider)}`) + await deleteBtn.waitFor({ state: 'visible', timeout: 10000 }) await deleteBtn.click() const confirmDialog = this.page.locator('[role="alertdialog"]') await confirmDialog.getByRole('button', { name: /Delete/i }).click() - await this.waitForSuccessToast() - await this.page.waitForTimeout(1000) + await expect(this.getModelLimitRow(modelName, provider)).not.toBeVisible({ timeout: 15000 }) } async closeSheet(): Promise { diff --git a/tests/e2e/features/providers/providers.spec.ts b/tests/e2e/features/providers/providers.spec.ts index b70eca6296..0f86899d07 100644 --- a/tests/e2e/features/providers/providers.spec.ts +++ b/tests/e2e/features/providers/providers.spec.ts @@ -1,785 +1,979 @@ -import { expect, test } from '../../core/fixtures/base.fixture'; -import { createCustomProviderData, createProviderKeyData } from './providers.data'; +import { expect, test } from "../../core/fixtures/base.fixture"; +import { + createCustomProviderData, + createProviderKeyData, +} from "./providers.data"; // Track created resources for cleanup -const createdKeys: { provider: string; keyName: string }[] = [] -const createdProviders: string[] = [] +const createdKeys: { provider: string; keyName: string }[] = []; +const createdProviders: string[] = []; -test.describe('Providers', () => { - test.describe.configure({ mode: 'serial' }) +test.describe("Providers", () => { + test.describe.configure({ mode: "serial" }); test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - }) + await providersPage.goto(); + }); test.afterEach(async ({ providersPage }) => { // Clean up any keys created during tests for (const { provider, keyName } of [...createdKeys]) { try { - await providersPage.selectProvider(provider) - const exists = await providersPage.keyExists(keyName, 2000) + await providersPage.selectProvider(provider); + const exists = await providersPage.keyExists(keyName, 2000); if (exists) { - await providersPage.deleteKey(keyName) + await providersPage.deleteKey(keyName); } } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - console.error(`[CLEANUP ERROR] Failed to delete provider key ${provider}/${keyName}: ${errorMsg}`) + const errorMsg = error instanceof Error ? error.message : String(error); + console.error( + `[CLEANUP ERROR] Failed to delete provider key ${provider}/${keyName}: ${errorMsg}`, + ); } } - createdKeys.length = 0 + createdKeys.length = 0; // Clean up any custom providers created during tests (skip toast wait so cleanup does not fail if toast is missing) for (const providerName of [...createdProviders]) { try { - await providersPage.deleteProvider(providerName, { skipToastWait: true }) + await providersPage.deleteProvider(providerName, { + skipToastWait: true, + }); } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - console.error(`[CLEANUP ERROR] Failed to delete provider ${providerName}: ${errorMsg}`) + const errorMsg = error instanceof Error ? error.message : String(error); + console.error( + `[CLEANUP ERROR] Failed to delete provider ${providerName}: ${errorMsg}`, + ); } } - createdProviders.length = 0 - }) + createdProviders.length = 0; + }); - test.describe('Provider Navigation', () => { - test('should display standard providers in sidebar', async ({ providersPage }) => { + test.describe("Provider Navigation", () => { + test("should display standard providers in sidebar", async ({ + providersPage, + }) => { // Check that OpenAI provider is visible - const openaiProvider = providersPage.getProviderItem('openai') - await expect(openaiProvider).toBeVisible() + const openaiProvider = providersPage.getProviderItem("openai"); + await expect(openaiProvider).toBeVisible(); // Check that Anthropic provider is visible - const anthropicProvider = providersPage.getProviderItem('anthropic') - await expect(anthropicProvider).toBeVisible() - }) + const anthropicProvider = providersPage.getProviderItem("anthropic"); + await expect(anthropicProvider).toBeVisible(); + }); - test('should select a provider from the sidebar', async ({ providersPage }) => { - await providersPage.selectProvider('openai') + test("should select a provider from the sidebar", async ({ + providersPage, + }) => { + await providersPage.selectProvider("openai"); // Verify URL contains provider param - await expect(providersPage.page).toHaveURL(/provider=openai/) - }) + await expect(providersPage.page).toHaveURL(/provider=openai/); + }); - test('should switch between providers', async ({ providersPage }) => { + test("should switch between providers", async ({ providersPage }) => { // Select OpenAI first - await providersPage.selectProvider('openai') - await expect(providersPage.page).toHaveURL(/provider=openai/) + await providersPage.selectProvider("openai"); + await expect(providersPage.page).toHaveURL(/provider=openai/); // Switch to Anthropic - await providersPage.selectProvider('anthropic') - await expect(providersPage.page).toHaveURL(/provider=anthropic/) - }) - }) - - test.describe('Provider Keys', () => { - test('should add a new key to OpenAI provider', async ({ providersPage }) => { + await providersPage.selectProvider("anthropic"); + await expect(providersPage.page).toHaveURL(/provider=anthropic/); + }); + }); + + test.describe("Provider Keys", () => { + test("should add a new key to OpenAI provider", async ({ + providersPage, + }) => { // Select OpenAI provider - await providersPage.selectProvider('openai') + await providersPage.selectProvider("openai"); // Create test key data with unique name (no spaces for easier locating) const keyData = createProviderKeyData({ name: `E2E-Test-Key-${Date.now()}`, - value: 'sk-test-e2e-key-12345', + value: "sk-test-e2e-key-12345", weight: 1.0, - }) + }); // Track for cleanup - createdKeys.push({ provider: 'openai', keyName: keyData.name }) + createdKeys.push({ provider: "openai", keyName: keyData.name }); // Add the key - await providersPage.addKey(keyData) + await providersPage.addKey(keyData); // Verify key appears in table (with waiting) - const keyExists = await providersPage.keyExists(keyData.name) - expect(keyExists).toBe(true) - }) + const keyExists = await providersPage.keyExists(keyData.name); + expect(keyExists).toBe(true); + }); - test('should add a key with custom weight', async ({ providersPage }) => { - await providersPage.selectProvider('openai') + test("should add a key with custom weight", async ({ providersPage }) => { + await providersPage.selectProvider("openai"); const keyData = createProviderKeyData({ name: `Weight-Key-${Date.now()}`, - value: 'sk-test-weight-key-12345', + value: "sk-test-weight-key-12345", weight: 0.5, - }) + }); // Track for cleanup - createdKeys.push({ provider: 'openai', keyName: keyData.name }) + createdKeys.push({ provider: "openai", keyName: keyData.name }); - await providersPage.addKey(keyData) + await providersPage.addKey(keyData); - const keyExists = await providersPage.keyExists(keyData.name) - expect(keyExists).toBe(true) - }) + const keyExists = await providersPage.keyExists(keyData.name); + expect(keyExists).toBe(true); + }); - test('should display empty state when no keys configured', async ({ providersPage }) => { + test("should display empty state when no keys configured", async ({ + providersPage, + }) => { // Add Nebius from the dropdown if not already in sidebar (created with no keys) - if (!(await providersPage.providerExists('nebius'))) { - await providersPage.addKnownProviderFromDropdown('nebius') - createdProviders.push('nebius') + if (!(await providersPage.providerExists("nebius"))) { + await providersPage.addKnownProviderFromDropdown("nebius"); + createdProviders.push("nebius"); } // Select Nebius (it has zero keys) - const providerItem = providersPage.getProviderItem('nebius') - await expect(providerItem).toBeVisible({ timeout: 15000 }) - await providersPage.selectProvider('nebius') - const keyCount = await providersPage.getKeyCount() - expect(keyCount).toBe(0) + const providerItem = providersPage.getProviderItem("nebius"); + await expect(providerItem).toBeVisible({ timeout: 15000 }); + await providersPage.selectProvider("nebius"); + const keyCount = await providersPage.getKeyCount(); + expect(keyCount).toBe(0); // Empty state row should be visible - await expect(providersPage.keysTableEmptyState).toBeVisible() - }) - }) + await expect(providersPage.keysTableEmptyState).toBeVisible(); + }); + }); - test.describe('Custom Providers', () => { - test('should open custom provider creation sheet', async ({ providersPage }) => { - await providersPage.openCustomProviderSheet() + test.describe("Custom Providers", () => { + test("should open custom provider creation sheet", async ({ + providersPage, + }) => { + await providersPage.openCustomProviderSheet(); // Verify form fields are present - await expect(providersPage.customProviderNameInput).toBeVisible() - await expect(providersPage.baseProviderSelect).toBeVisible() - await expect(providersPage.baseUrlInput).toBeVisible() - }) - - test('should create a custom OpenAI-compatible provider', async ({ providersPage }) => { + await expect(providersPage.customProviderNameInput).toBeVisible(); + await expect(providersPage.baseProviderSelect).toBeVisible(); + await expect(providersPage.baseUrlInput).toBeVisible(); + }); + + test("should create a custom OpenAI-compatible provider", async ({ + providersPage, + }) => { const providerData = createCustomProviderData({ name: `test-openai-${Date.now()}`, - baseProviderType: 'openai', - baseUrl: 'https://api.test-provider.com/v1', - }) + baseProviderType: "openai", + baseUrl: "https://api.test-provider.com/v1", + }); // Track for cleanup - createdProviders.push(providerData.name) + createdProviders.push(providerData.name); - await providersPage.createProvider(providerData) + await providersPage.createProvider(providerData); // Wait for provider to appear in sidebar - const providerItem = providersPage.getProviderItem(providerData.name) - await expect(providerItem).toBeVisible({ timeout: 15000 }) - }) + const providerItem = providersPage.getProviderItem(providerData.name); + await expect(providerItem).toBeVisible({ timeout: 15000 }); + }); - test('should create a custom Anthropic-compatible provider', async ({ providersPage }) => { + test("should create a custom Anthropic-compatible provider", async ({ + providersPage, + }) => { const providerData = createCustomProviderData({ name: `test-anthropic-${Date.now()}`, - baseProviderType: 'anthropic', - baseUrl: 'https://api.anthropic-proxy.com', - }) + baseProviderType: "anthropic", + baseUrl: "https://api.anthropic-proxy.com", + }); // Track for cleanup - createdProviders.push(providerData.name) + createdProviders.push(providerData.name); - await providersPage.createProvider(providerData) + await providersPage.createProvider(providerData); // Wait for provider to appear in sidebar - const providerItem = providersPage.getProviderItem(providerData.name) - await expect(providerItem).toBeVisible({ timeout: 15000 }) - }) + const providerItem = providersPage.getProviderItem(providerData.name); + await expect(providerItem).toBeVisible({ timeout: 15000 }); + }); - test('should cancel custom provider creation', async ({ providersPage }) => { - await providersPage.openCustomProviderSheet() + test("should cancel custom provider creation", async ({ + providersPage, + }) => { + await providersPage.openCustomProviderSheet(); // Fill some data - await providersPage.customProviderNameInput.fill('cancelled-provider') + await providersPage.customProviderNameInput.fill("cancelled-provider"); // Cancel - await providersPage.customProviderCancelBtn.click() + await providersPage.customProviderCancelBtn.click(); // Sheet should close - await expect(providersPage.customProviderSheet).not.toBeVisible() + await expect(providersPage.customProviderSheet).not.toBeVisible(); // Provider should not exist - const providerExists = await providersPage.providerExists('cancelled-provider') - expect(providerExists).toBe(false) - }) - - test('should delete custom provider and update UI', async ({ providersPage }) => { + const providerExists = + await providersPage.providerExists("cancelled-provider"); + expect(providerExists).toBe(false); + }); + + test("should delete custom provider and update UI", async ({ + providersPage, + }) => { const providerData = createCustomProviderData({ name: `delete-test-${Date.now()}`, - baseProviderType: 'openai', - baseUrl: 'https://api.delete-test.com/v1', - }) - createdProviders.push(providerData.name) + baseProviderType: "openai", + baseUrl: "https://api.delete-test.com/v1", + }); + createdProviders.push(providerData.name); - await providersPage.createProvider(providerData) + await providersPage.createProvider(providerData); - const providerItem = providersPage.getProviderItem(providerData.name) - await expect(providerItem).toBeVisible({ timeout: 15000 }) + const providerItem = providersPage.getProviderItem(providerData.name); + await expect(providerItem).toBeVisible({ timeout: 15000 }); - await providersPage.deleteProvider(providerData.name, { skipToastWait: true }) + await providersPage.deleteProvider(providerData.name, { + skipToastWait: true, + }); - const idx = createdProviders.indexOf(providerData.name) - if (idx >= 0) createdProviders.splice(idx, 1) + const idx = createdProviders.indexOf(providerData.name); + if (idx >= 0) createdProviders.splice(idx, 1); // Assert provider is no longer in the configured providers list (do not rely on toast) - await expect(providerItem).not.toBeVisible({ timeout: 5000 }) - }) - }) + await expect(providerItem).not.toBeVisible({ timeout: 5000 }); + }); + }); - test.describe('Form Validation', () => { - test('should require name for custom provider', async ({ providersPage }) => { - await providersPage.openCustomProviderSheet() + test.describe("Form Validation", () => { + test("should require name for custom provider", async ({ + providersPage, + }) => { + await providersPage.openCustomProviderSheet(); // Try to save without name - await providersPage.baseUrlInput.fill('https://api.example.com') + await providersPage.baseUrlInput.fill("https://api.example.com"); // The save button should be disabled or show error - const saveBtn = providersPage.customProviderSaveBtn - await saveBtn.click() + const saveBtn = providersPage.customProviderSaveBtn; + await saveBtn.click(); // Form should still be visible (not submitted) - await expect(providersPage.customProviderSheet).toBeVisible() - }) + await expect(providersPage.customProviderSheet).toBeVisible(); + }); - test('should require base URL for custom provider', async ({ providersPage }) => { - await providersPage.openCustomProviderSheet() + test("should require base URL for custom provider", async ({ + providersPage, + }) => { + await providersPage.openCustomProviderSheet(); // Fill only name - await providersPage.customProviderNameInput.fill('test-provider') + await providersPage.customProviderNameInput.fill("test-provider"); // Try to save - await providersPage.customProviderSaveBtn.click() + await providersPage.customProviderSaveBtn.click(); // Form should still be visible - await expect(providersPage.customProviderSheet).toBeVisible() - }) - }) -}) + await expect(providersPage.customProviderSheet).toBeVisible(); + }); + }); +}); -test.describe('Provider Key Management', () => { - test.describe.configure({ mode: 'serial' }) +test.describe("Provider Key Management", () => { + test.describe.configure({ mode: "serial" }); // Track keys for cleanup in this test suite - const managementKeys: string[] = [] + const managementKeys: string[] = []; test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); test.afterEach(async ({ providersPage }) => { // Clean up any keys created during tests for (const keyName of [...managementKeys]) { try { - const exists = await providersPage.keyExists(keyName, 2000) + const exists = await providersPage.keyExists(keyName, 2000); if (exists) { - await providersPage.deleteKey(keyName) + await providersPage.deleteKey(keyName); } } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - console.error(`[CLEANUP ERROR] Failed to delete provider key ${keyName}: ${errorMsg}`) + const errorMsg = error instanceof Error ? error.message : String(error); + console.error( + `[CLEANUP ERROR] Failed to delete provider key ${keyName}: ${errorMsg}`, + ); } } - managementKeys.length = 0 - }) + managementKeys.length = 0; + }); - test('should edit an existing key', async ({ providersPage }) => { + test("should edit an existing key", async ({ providersPage }) => { // First add a key const keyData = createProviderKeyData({ name: `Edit-Test-Key-${Date.now()}`, - value: 'sk-test-edit-key', - }) + value: "sk-test-edit-key", + }); // Track for cleanup - managementKeys.push(keyData.name) + managementKeys.push(keyData.name); - await providersPage.addKey(keyData) + await providersPage.addKey(keyData); // Now edit it - set weight to 0.7 await providersPage.editKey(keyData.name, { weight: 0.7, - }) + }); // Verify weight was saved and displayed (wait for table to refresh after save) - const keyRow = providersPage.getKeyRow(keyData.name) - await expect(keyRow.getByTestId('key-weight-value')).toContainText('0.7', { timeout: 10000 }) - }) + const keyRow = providersPage.getKeyRow(keyData.name); + await expect(keyRow.getByTestId("key-weight-value")).toContainText("0.7", { + timeout: 10000, + }); + }); - test('should delete a key', async ({ providersPage }) => { + test("should delete a key", async ({ providersPage }) => { // First add a key const keyData = createProviderKeyData({ name: `Delete-Test-Key-${Date.now()}`, - value: 'sk-test-delete-key', - }) + value: "sk-test-delete-key", + }); // Don't track for cleanup - we're testing delete - await providersPage.addKey(keyData) + await providersPage.addKey(keyData); // Verify it exists - let keyExists = await providersPage.keyExists(keyData.name) - expect(keyExists).toBe(true) + let keyExists = await providersPage.keyExists(keyData.name); + expect(keyExists).toBe(true); // Delete it - await providersPage.deleteKey(keyData.name) + await providersPage.deleteKey(keyData.name); // Verify it's gone (use short timeout since we expect it to be gone) - keyExists = await providersPage.keyExists(keyData.name, 1000) - expect(keyExists).toBe(false) - }) + keyExists = await providersPage.keyExists(keyData.name, 1000); + expect(keyExists).toBe(false); + }); - test('should toggle key enabled state', async ({ providersPage }) => { + test("should toggle key enabled state", async ({ providersPage }) => { // First add a key const keyData = createProviderKeyData({ name: `Toggle-Test-Key-${Date.now()}`, - value: 'sk-test-toggle-key', - }) + value: "sk-test-toggle-key", + }); // Track for cleanup - managementKeys.push(keyData.name) + managementKeys.push(keyData.name); - await providersPage.addKey(keyData) + await providersPage.addKey(keyData); // Key starts enabled - let isEnabled = await providersPage.getKeyEnabledState(keyData.name) - expect(isEnabled).toBe(true) + let isEnabled = await providersPage.getKeyEnabledState(keyData.name); + expect(isEnabled).toBe(true); // Toggle to disabled - await providersPage.toggleKeyEnabled(keyData.name) - await providersPage.page.waitForTimeout(9000) - isEnabled = await providersPage.getKeyEnabledState(keyData.name) - expect(isEnabled).toBe(false) - }) -}) - -test.describe('Provider Configuration', () => { + await providersPage.toggleKeyEnabled(keyData.name); + await expect + .poll(async () => providersPage.getKeyEnabledState(keyData.name), { + timeout: 10000, + }) + .toBe(false); + isEnabled = await providersPage.getKeyEnabledState(keyData.name); + expect(isEnabled).toBe(false); + }); + + test("should display Allowed Models field when adding a key", async ({ + providersPage, + }) => { + await providersPage.addKeyBtn.click(); + await expect(providersPage.keyForm).toBeVisible(); + + const allowedModels = providersPage.page.getByTestId( + "api-keys-models-multiselect", + ); + await expect(allowedModels).toBeVisible(); + + // "All Models" should be selected by default + await expect(allowedModels.getByText("All Models")).toBeVisible(); + + await providersPage.keyCancelBtn.click(); + }); + + test("should display Blocked Models field when adding a key", async ({ + providersPage, + }) => { + await providersPage.addKeyBtn.click(); + await expect(providersPage.keyForm).toBeVisible(); + + const blockedModelsField = providersPage.page.getByTestId( + "apikey-blacklisted-models-field", + ); + await expect(blockedModelsField).toBeVisible(); + + const blockedModelsMultiselect = providersPage.page.getByTestId( + "api-keys-blocked-models-multiselect", + ); + await expect(blockedModelsMultiselect).toBeVisible(); + + await providersPage.keyCancelBtn.click(); + }); +}); + +test.describe("Provider Configuration", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - }) + await providersPage.goto(); + }); - test('should view provider configuration', async ({ providersPage }) => { + test("should view provider configuration", async ({ providersPage }) => { // Select OpenAI provider - await providersPage.selectProvider('openai') + await providersPage.selectProvider("openai"); // Should see the provider's key table - await expect(providersPage.keysTable).toBeVisible() + await expect(providersPage.keysTable).toBeVisible(); // Should see the add key button - await expect(providersPage.addKeyBtn).toBeVisible() - }) + await expect(providersPage.addKeyBtn).toBeVisible(); + }); - test('should show provider models list', async ({ providersPage }) => { + test("should show provider models list", async ({ providersPage }) => { // Select OpenAI provider - await providersPage.selectProvider('openai') + await providersPage.selectProvider("openai"); // Models section should be visible for selected provider - const modelsSection = providersPage.page.getByText(/Models/i).first() - await expect(modelsSection).toBeVisible() - }) -}) + const modelsSection = providersPage.page.getByText(/Models/i).first(); + await expect(modelsSection).toBeVisible(); + }); +}); -test.describe('Performance Tuning', () => { +test.describe("Performance Tuning", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); - test('should display performance tuning tab', async ({ providersPage }) => { - await providersPage.selectConfigTab('performance') + test("should display performance tuning tab", async ({ providersPage }) => { + await providersPage.selectConfigTab("performance"); // Should see concurrency and buffer size inputs - await expect(providersPage.getConcurrencyInput()).toBeVisible() - await expect(providersPage.getBufferSizeInput()).toBeVisible() - }) + await expect(providersPage.getConcurrencyInput()).toBeVisible(); + await expect(providersPage.getBufferSizeInput()).toBeVisible(); + }); - test('should display raw request/response toggles', async ({ providersPage }) => { - await providersPage.selectConfigTab('debugging') + test("should display raw request/response toggles", async ({ + providersPage, + }) => { + await providersPage.selectConfigTab("debugging"); // Should see raw request and response toggles (Debugging tab labels) - const rawRequestLabel = providersPage.page.getByText('Send Back Raw Request') - const rawResponseLabel = providersPage.page.getByText('Send Back Raw Response') + const rawRequestLabel = providersPage.page.getByText( + "Send Back Raw Request", + ); + const rawResponseLabel = providersPage.page.getByText( + "Send Back Raw Response", + ); - await expect(rawRequestLabel).toBeVisible() - await expect(rawResponseLabel).toBeVisible() - }) + await expect(rawRequestLabel).toBeVisible(); + await expect(rawResponseLabel).toBeVisible(); + }); - test('should update concurrency value', async ({ providersPage }) => { - await providersPage.selectConfigTab('performance') + test("should update concurrency value", async ({ providersPage }) => { + await providersPage.selectConfigTab("performance"); - const concurrencyInput = providersPage.getConcurrencyInput() - const originalValue = await concurrencyInput.inputValue() + const concurrencyInput = providersPage.getConcurrencyInput(); + const originalValue = await concurrencyInput.inputValue(); // Use a small value that is always <= buffer size - const newValue = '5' + const newValue = "5"; - await providersPage.fillNumberInput(concurrencyInput, newValue) + await providersPage.fillNumberInput(concurrencyInput, newValue); // Verify value changed - const currentValue = await concurrencyInput.inputValue() - expect(currentValue).toBe(newValue) + const currentValue = await concurrencyInput.inputValue(); + expect(currentValue).toBe(newValue); // Blur the input - await concurrencyInput.blur() + await concurrencyInput.blur(); // No validation error should appear - await expect(providersPage.page.getByText('Concurrency must be a number')).not.toBeVisible() - await expect(providersPage.page.getByText('Concurrency must be greater than 0')).not.toBeVisible() - await expect(providersPage.page.getByText('Concurrency must be less than or equal to buffer size')).not.toBeVisible() + await expect( + providersPage.page.getByText("Concurrency must be a number"), + ).not.toBeVisible(); + await expect( + providersPage.page.getByText("Concurrency must be greater than 0"), + ).not.toBeVisible(); + await expect( + providersPage.page.getByText( + "Concurrency must be less than or equal to buffer size", + ), + ).not.toBeVisible(); // Save and verify success - const saveBtn = providersPage.getConfigSaveBtn('performance') - await expect(saveBtn).toBeEnabled() - await providersPage.savePerformanceConfig() + const saveBtn = providersPage.getConfigSaveBtn("performance"); + await expect(saveBtn).toBeEnabled(); + await providersPage.savePerformanceConfig(); // Verify value persisted after save (reload would be ideal but we restore instead) - const afterSaveValue = await concurrencyInput.inputValue() - expect(afterSaveValue).toBe(newValue) + const afterSaveValue = await concurrencyInput.inputValue(); + expect(afterSaveValue).toBe(newValue); // Restore original value - await providersPage.fillNumberInput(concurrencyInput, originalValue) + await providersPage.fillNumberInput(concurrencyInput, originalValue); // Blur the input - await concurrencyInput.blur() - await providersPage.savePerformanceConfig() - }) + await concurrencyInput.blur(); + await providersPage.savePerformanceConfig(); + }); - test('should update buffer size value', async ({ providersPage }) => { - await providersPage.selectConfigTab('performance') + test("should update buffer size value", async ({ providersPage }) => { + await providersPage.selectConfigTab("performance"); - const bufferSizeInput = providersPage.getBufferSizeInput() - const originalValue = await bufferSizeInput.inputValue() + const bufferSizeInput = providersPage.getBufferSizeInput(); + const originalValue = await bufferSizeInput.inputValue(); // Use a large value that is always >= concurrency - const newValue = '6000' + const newValue = "6000"; - await providersPage.fillNumberInput(bufferSizeInput, newValue) + await providersPage.fillNumberInput(bufferSizeInput, newValue); // Verify value changed - const currentValue = await bufferSizeInput.inputValue() - expect(currentValue).toBe(newValue) + const currentValue = await bufferSizeInput.inputValue(); + expect(currentValue).toBe(newValue); // Blur the input - await bufferSizeInput.blur() + await bufferSizeInput.blur(); // No validation error should appear - await expect(providersPage.page.getByText('Buffer size must be a number')).not.toBeVisible() - await expect(providersPage.page.getByText('Buffer size must be greater than 0')).not.toBeVisible() - await expect(providersPage.page.getByText('Concurrency must be less than or equal to buffer size')).not.toBeVisible() + await expect( + providersPage.page.getByText("Buffer size must be a number"), + ).not.toBeVisible(); + await expect( + providersPage.page.getByText("Buffer size must be greater than 0"), + ).not.toBeVisible(); + await expect( + providersPage.page.getByText( + "Concurrency must be less than or equal to buffer size", + ), + ).not.toBeVisible(); // Save and verify success - const saveBtn = providersPage.getConfigSaveBtn('performance') - await expect(saveBtn).toBeEnabled() - await providersPage.savePerformanceConfig() + const saveBtn = providersPage.getConfigSaveBtn("performance"); + await expect(saveBtn).toBeEnabled(); + await providersPage.savePerformanceConfig(); // Restore original value - await providersPage.fillNumberInput(bufferSizeInput, originalValue) + await providersPage.fillNumberInput(bufferSizeInput, originalValue); // Blur the input - await bufferSizeInput.blur() - await providersPage.savePerformanceConfig() - }) + await bufferSizeInput.blur(); + await providersPage.savePerformanceConfig(); + }); - test('should toggle and save raw request/response', async ({ providersPage }) => { - await providersPage.selectConfigTab('debugging') + test("should toggle and save raw request/response", async ({ + providersPage, + }) => { + await providersPage.selectConfigTab("debugging"); - const rawRequestSwitch = providersPage.getRawRequestSwitch() - const rawResponseSwitch = providersPage.getRawResponseSwitch() + const rawRequestSwitch = providersPage.getRawRequestSwitch(); + const rawResponseSwitch = providersPage.getRawResponseSwitch(); // Capture original states - const originalRawRequest = await rawRequestSwitch.getAttribute('data-state') === 'checked' - const originalRawResponse = await rawResponseSwitch.getAttribute('data-state') === 'checked' + const originalRawRequest = + (await rawRequestSwitch.getAttribute("data-state")) === "checked"; + const originalRawResponse = + (await rawResponseSwitch.getAttribute("data-state")) === "checked"; // Toggle both switches - await rawRequestSwitch.click() - await rawResponseSwitch.click() + await rawRequestSwitch.click(); + await rawResponseSwitch.click(); // Save and verify success - const saveBtn = providersPage.getConfigSaveBtn('debugging') - await expect(saveBtn).toBeEnabled() - await providersPage.saveDebuggingConfig() + const saveBtn = providersPage.getConfigSaveBtn("debugging"); + await expect(saveBtn).toBeEnabled(); + await providersPage.saveDebuggingConfig(); // Restore original states - const currentRawRequest = await rawRequestSwitch.getAttribute('data-state') === 'checked' - const currentRawResponse = await rawResponseSwitch.getAttribute('data-state') === 'checked' + const currentRawRequest = + (await rawRequestSwitch.getAttribute("data-state")) === "checked"; + const currentRawResponse = + (await rawResponseSwitch.getAttribute("data-state")) === "checked"; if (currentRawRequest !== originalRawRequest) { - await rawRequestSwitch.click() + await rawRequestSwitch.click(); } if (currentRawResponse !== originalRawResponse) { - await rawResponseSwitch.click() + await rawResponseSwitch.click(); } - await providersPage.saveDebuggingConfig() - }) -}) + await providersPage.saveDebuggingConfig(); + }); +}); -test.describe('Proxy Configuration', () => { +test.describe("Proxy Configuration", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); - test('should display proxy config tab', async ({ providersPage }) => { - await providersPage.selectConfigTab('proxy') + test("should display proxy config tab", async ({ providersPage }) => { + await providersPage.selectConfigTab("proxy"); // Should see proxy type selector - const proxyTypeLabel = providersPage.page.getByText('Proxy Type') - await expect(proxyTypeLabel).toBeVisible() - }) + const proxyTypeLabel = providersPage.page.getByText("Proxy Type"); + await expect(proxyTypeLabel).toBeVisible(); + }); - test('should show proxy type options', async ({ providersPage }) => { - await providersPage.selectConfigTab('proxy') + test("should show proxy type options", async ({ providersPage }) => { + await providersPage.selectConfigTab("proxy"); // Open the proxy type dropdown - const proxySelect = providersPage.getProxyTypeSelect() - await proxySelect.click() + const proxySelect = providersPage.getProxyTypeSelect(); + await proxySelect.click(); // Should see HTTP, SOCKS5, Environment options - await expect(providersPage.page.getByRole('option', { name: /HTTP/i })).toBeVisible() - await expect(providersPage.page.getByRole('option', { name: /SOCKS5/i })).toBeVisible() - await expect(providersPage.page.getByRole('option', { name: /Environment/i })).toBeVisible() + await expect( + providersPage.page.getByRole("option", { name: /HTTP/i }), + ).toBeVisible(); + await expect( + providersPage.page.getByRole("option", { name: /SOCKS5/i }), + ).toBeVisible(); + await expect( + providersPage.page.getByRole("option", { name: /Environment/i }), + ).toBeVisible(); // Close dropdown - await providersPage.page.keyboard.press('Escape') - }) + await providersPage.page.keyboard.press("Escape"); + }); - test('should show URL fields when HTTP proxy selected', async ({ providersPage }) => { - await providersPage.selectConfigTab('proxy') + test("should show URL fields when HTTP proxy selected", async ({ + providersPage, + }) => { + await providersPage.selectConfigTab("proxy"); // Select HTTP proxy type - const proxySelect = providersPage.getProxyTypeSelect() - await proxySelect.click() - await providersPage.page.getByRole('option', { name: /HTTP/i }).click() + const proxySelect = providersPage.getProxyTypeSelect(); + await proxySelect.click(); + await providersPage.page.getByRole("option", { name: /HTTP/i }).click(); // Should show URL, username, password fields - await expect(providersPage.page.getByLabel('Proxy URL')).toBeVisible() - await expect(providersPage.page.getByLabel('Username')).toBeVisible() - await expect(providersPage.page.getByLabel('Password')).toBeVisible() - }) -}) + await expect(providersPage.page.getByLabel("Proxy URL")).toBeVisible(); + await expect(providersPage.page.getByLabel("Username")).toBeVisible(); + await expect(providersPage.page.getByLabel("Password")).toBeVisible(); + }); +}); -test.describe('Network Configuration', () => { +test.describe("Network Configuration", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); - test('should display network config tab', async ({ providersPage }) => { - await providersPage.selectConfigTab('network') + test("should display network config tab", async ({ providersPage }) => { + await providersPage.selectConfigTab("network"); // Should see timeout and retry settings - await expect(providersPage.page.getByLabel(/Timeout/i)).toBeVisible() - await expect(providersPage.page.getByLabel(/Max Retries/i)).toBeVisible() - }) + await expect( + providersPage.page.getByLabel("Timeout (seconds)", { exact: true }), + ).toBeVisible(); + await expect(providersPage.page.getByLabel(/Max Retries/i)).toBeVisible(); + }); - test('should display backoff settings', async ({ providersPage }) => { - await providersPage.selectConfigTab('network') + test("should display backoff settings", async ({ providersPage }) => { + await providersPage.selectConfigTab("network"); // Should see backoff configuration - await expect(providersPage.page.getByLabel(/Initial Backoff/i)).toBeVisible() - await expect(providersPage.page.getByLabel(/Max Backoff/i)).toBeVisible() - }) + await expect( + providersPage.page.getByLabel(/Initial Backoff/i), + ).toBeVisible(); + await expect(providersPage.page.getByLabel(/Max Backoff/i)).toBeVisible(); + }); - test('should update timeout value', async ({ providersPage }) => { - await providersPage.selectConfigTab('network') + test("should update timeout value", async ({ providersPage }) => { + await providersPage.selectConfigTab("network"); // Ensure backoff fields are valid (minimum 100ms) so form validation passes - const initialBackoff = providersPage.page.getByLabel(/Initial Backoff/i) - const maxBackoff = providersPage.page.getByLabel(/Max Backoff/i) - const ibVal = await initialBackoff.inputValue() - const mbVal = await maxBackoff.inputValue() + const initialBackoff = providersPage.page.getByLabel(/Initial Backoff/i); + const maxBackoff = providersPage.page.getByLabel(/Max Backoff/i); + const ibVal = await initialBackoff.inputValue(); + const mbVal = await maxBackoff.inputValue(); if (Number(ibVal) < 100) { - await providersPage.fillNumberInput(initialBackoff, '500') + await providersPage.fillNumberInput(initialBackoff, "500"); } if (Number(mbVal) < 100) { - await providersPage.fillNumberInput(maxBackoff, '10000') + await providersPage.fillNumberInput(maxBackoff, "10000"); } - const timeoutInput = providersPage.page.getByLabel(/Timeout/i) - const originalValue = await timeoutInput.inputValue() - const newValue = originalValue === '30' ? '60' : '30' + const timeoutInput = providersPage.page.getByLabel("Timeout (seconds)", { + exact: true, + }); + const originalValue = await timeoutInput.inputValue(); + const newValue = originalValue === "30" ? "60" : "30"; - await providersPage.fillNumberInput(timeoutInput, newValue) + await providersPage.fillNumberInput(timeoutInput, newValue); // Verify value changed - const currentValue = await timeoutInput.inputValue() - expect(currentValue).toBe(newValue) + const currentValue = await timeoutInput.inputValue(); + expect(currentValue).toBe(newValue); // Save button should be enabled - const saveBtn = providersPage.getConfigSaveBtn('network') - await expect(saveBtn).toBeEnabled() - await providersPage.saveNetworkConfig() + const saveBtn = providersPage.getConfigSaveBtn("network"); + await expect(saveBtn).toBeEnabled(); + await providersPage.saveNetworkConfig(); // Restore original value to avoid leaving form dirty - await providersPage.fillNumberInput(timeoutInput, originalValue) - await providersPage.saveNetworkConfig() - - }) + await providersPage.fillNumberInput(timeoutInput, originalValue); + await providersPage.saveNetworkConfig(); + }); - test('should update max retries value', async ({ providersPage }) => { - await providersPage.selectConfigTab('network') + test("should update max retries value", async ({ providersPage }) => { + await providersPage.selectConfigTab("network"); // Ensure backoff fields are valid (minimum 100ms) so form validation passes - const initialBackoff = providersPage.page.getByLabel(/Initial Backoff/i) - const maxBackoff = providersPage.page.getByLabel(/Max Backoff/i) - const ibVal = await initialBackoff.inputValue() - const mbVal = await maxBackoff.inputValue() + const initialBackoff = providersPage.page.getByLabel(/Initial Backoff/i); + const maxBackoff = providersPage.page.getByLabel(/Max Backoff/i); + const ibVal = await initialBackoff.inputValue(); + const mbVal = await maxBackoff.inputValue(); if (Number(ibVal) < 100) { - await providersPage.fillNumberInput(initialBackoff, '500') + await providersPage.fillNumberInput(initialBackoff, "500"); } if (Number(mbVal) < 100) { - await providersPage.fillNumberInput(maxBackoff, '10000') + await providersPage.fillNumberInput(maxBackoff, "10000"); } - const retriesInput = providersPage.page.getByLabel(/Max Retries/i) - const originalValue = await retriesInput.inputValue() - const newValue = originalValue === '0' ? '3' : '0' + const retriesInput = providersPage.page.getByLabel(/Max Retries/i); + const originalValue = await retriesInput.inputValue(); + const newValue = originalValue === "0" ? "3" : "0"; - await providersPage.fillNumberInput(retriesInput, newValue) + await providersPage.fillNumberInput(retriesInput, newValue); // Verify value changed - const currentValue = await retriesInput.inputValue() - expect(currentValue).toBe(newValue) + const currentValue = await retriesInput.inputValue(); + expect(currentValue).toBe(newValue); // Save button should be enabled - const saveBtn = providersPage.getConfigSaveBtn('network') - await expect(saveBtn).toBeEnabled() - await providersPage.saveNetworkConfig() + const saveBtn = providersPage.getConfigSaveBtn("network"); + await expect(saveBtn).toBeEnabled(); + await providersPage.saveNetworkConfig(); // Restore original value to avoid leaving form dirty - await providersPage.fillNumberInput(retriesInput, originalValue) - await providersPage.saveNetworkConfig() - }) -}) + await providersPage.fillNumberInput(retriesInput, originalValue); + await providersPage.saveNetworkConfig(); + }); +}); -test.describe('Governance (Budget & Rate Limits)', () => { +test.describe("Governance (Budget & Rate Limits)", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); - test('should display governance tab', async ({ providersPage }) => { - const isVisible = await providersPage.isGovernanceTabVisible() + test("should display governance tab", async ({ providersPage }) => { + const isVisible = await providersPage.isGovernanceTabVisible(); if (isVisible) { - await providersPage.selectConfigTab('governance') + await providersPage.selectConfigTab("governance"); // Should see budget configuration section - await expect(providersPage.page.getByText('Budget Configuration')).toBeVisible() + await expect( + providersPage.page.getByText("Budget Configuration"), + ).toBeVisible(); } - }) + }); - test('should display budget configuration', async ({ providersPage }) => { - const isVisible = await providersPage.isGovernanceTabVisible() + test("should display budget configuration", async ({ providersPage }) => { + const isVisible = await providersPage.isGovernanceTabVisible(); if (isVisible) { - await providersPage.selectConfigTab('governance') + await providersPage.selectConfigTab("governance"); // Should see budget limit input - const budgetInput = providersPage.page.locator('#providerBudgetMaxLimit') - await expect(budgetInput).toBeVisible() + const budgetInput = providersPage.page.locator("#providerBudgetMaxLimit"); + await expect(budgetInput).toBeVisible(); } - }) + }); - test('should display rate limiting configuration', async ({ providersPage }) => { - const isVisible = await providersPage.isGovernanceTabVisible() + test("should display rate limiting configuration", async ({ + providersPage, + }) => { + const isVisible = await providersPage.isGovernanceTabVisible(); if (isVisible) { - await providersPage.selectConfigTab('governance') + await providersPage.selectConfigTab("governance"); // Should see rate limiting section - await expect(providersPage.page.getByText('Rate Limiting Configuration')).toBeVisible() + await expect( + providersPage.page.getByText("Rate Limiting Configuration"), + ).toBeVisible(); // Should see token and request limit inputs - const tokenInput = providersPage.page.locator('#providerTokenMaxLimit') - const requestInput = providersPage.page.locator('#providerRequestMaxLimit') + const tokenInput = providersPage.page.locator("#providerTokenMaxLimit"); + const requestInput = providersPage.page.locator( + "#providerRequestMaxLimit", + ); - await expect(tokenInput).toBeVisible() - await expect(requestInput).toBeVisible() + await expect(tokenInput).toBeVisible(); + await expect(requestInput).toBeVisible(); } - }) + }); - test('should set budget limit', async ({ providersPage }) => { - const isVisible = await providersPage.isGovernanceTabVisible() + test("should set budget limit", async ({ providersPage }) => { + const isVisible = await providersPage.isGovernanceTabVisible(); if (isVisible) { - await providersPage.selectConfigTab('governance') + await providersPage.selectConfigTab("governance"); - const budgetInput = providersPage.page.locator('#providerBudgetMaxLimit') - await budgetInput.click() - await budgetInput.fill('') + const budgetInput = providersPage.page.locator("#providerBudgetMaxLimit"); + await budgetInput.click(); + await budgetInput.fill(""); // Type character by character to trigger React's onChange - await budgetInput.pressSequentially('100') + await budgetInput.pressSequentially("100"); // Verify value - const value = await budgetInput.inputValue() - expect(value).toBe('100') + const value = await budgetInput.inputValue(); + expect(value).toBe("100"); // Form should now be dirty - save button should be enabled - const saveBtn = providersPage.getConfigSaveBtn('governance') + const saveBtn = providersPage.getConfigSaveBtn("governance"); // Give React time to update the form state - await providersPage.page.waitForTimeout(500) - await expect(saveBtn).toBeEnabled({ timeout: 5000 }) + await providersPage.page.waitForTimeout(500); + await expect(saveBtn).toBeEnabled({ timeout: 5000 }); } - }) + }); - test('should set rate limits', async ({ providersPage }) => { - const isVisible = await providersPage.isGovernanceTabVisible() + test("should set rate limits", async ({ providersPage }) => { + const isVisible = await providersPage.isGovernanceTabVisible(); if (isVisible) { - await providersPage.selectConfigTab('governance') + await providersPage.selectConfigTab("governance"); // Set token limit - use pressSequentially for proper React onChange - const tokenInput = providersPage.page.locator('#providerTokenMaxLimit') - await tokenInput.click() - await tokenInput.fill('') - await tokenInput.pressSequentially('100000') + const tokenInput = providersPage.page.locator("#providerTokenMaxLimit"); + await tokenInput.click(); + await tokenInput.fill(""); + await tokenInput.pressSequentially("100000"); // Set request limit - const requestInput = providersPage.page.locator('#providerRequestMaxLimit') - await requestInput.click() - await requestInput.fill('') - await requestInput.pressSequentially('1000') + const requestInput = providersPage.page.locator( + "#providerRequestMaxLimit", + ); + await requestInput.click(); + await requestInput.fill(""); + await requestInput.pressSequentially("1000"); // Verify values - expect(await tokenInput.inputValue()).toBe('100000') - expect(await requestInput.inputValue()).toBe('1000') + expect(await tokenInput.inputValue()).toBe("100000"); + expect(await requestInput.inputValue()).toBe("1000"); } - }) -}) + }); +}); -test.describe('Debugging Tab', () => { +test.describe("Debugging Tab", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - await providersPage.selectProvider('openai') - }) - - test('should display debugging tab', async ({ providersPage }) => { - await providersPage.openConfigSheet() - const debuggingTab = providersPage.page.getByTestId('provider-tab-debugging') - await expect(debuggingTab).toBeVisible() - }) - - test('should navigate to debugging tab', async ({ providersPage }) => { - await providersPage.selectConfigTab('debugging') - - const debuggingTab = providersPage.page.getByTestId('provider-tab-debugging') - await expect(debuggingTab).toHaveAttribute('data-state', 'active') - const debuggingContent = providersPage.page.getByTestId('provider-config-debugging-content') - await expect(debuggingContent).toBeVisible() - }) -}) - -test.describe('vLLM Provider', () => { + await providersPage.goto(); + await providersPage.selectProvider("openai"); + }); + + test("should display debugging tab", async ({ providersPage }) => { + await providersPage.openConfigSheet(); + const debuggingTab = providersPage.page.getByTestId( + "provider-tab-debugging", + ); + await expect(debuggingTab).toBeVisible(); + }); + + test("should navigate to debugging tab", async ({ providersPage }) => { + await providersPage.selectConfigTab("debugging"); + + const debuggingTab = providersPage.page.getByTestId( + "provider-tab-debugging", + ); + await expect(debuggingTab).toHaveAttribute("data-state", "active"); + const debuggingContent = providersPage.page.getByTestId( + "provider-config-debugging-content", + ); + await expect(debuggingContent).toBeVisible(); + }); +}); + +test.describe("Provider specific configuration", () => { test.beforeEach(async ({ providersPage }) => { - await providersPage.goto() - }) + await providersPage.goto(); + }); - test('should display vLLM-specific key fields when adding key to vLLM provider', async ({ providersPage }) => { - const vllmAvailable = await providersPage.providerExists('vllm') + test("should display vLLM-specific key fields when adding key to vLLM provider", async ({ + providersPage, + }) => { + const vllmAvailable = await providersPage.providerExists("vllm"); if (!vllmAvailable) { - test.skip(true, 'vLLM provider not in sidebar (add from dropdown first)') - return + test.skip(true, "vLLM provider not in sidebar (add from dropdown first)"); + return; } - await providersPage.selectProvider('vllm') - await providersPage.addKeyBtn.click() + await providersPage.selectProvider("vllm"); + await providersPage.addKeyBtn.click(); - const vllmUrlInput = providersPage.page.getByTestId('key-input-vllm-url') - const vllmModelInput = providersPage.page.getByTestId('key-input-vllm-model-name') + const vllmUrlInput = providersPage.page.getByTestId("key-input-vllm-url"); + const vllmModelInput = providersPage.page.getByTestId( + "key-input-vllm-model-name", + ); - const urlVisible = await vllmUrlInput.isVisible().catch(() => false) - const modelVisible = await vllmModelInput.isVisible().catch(() => false) + const urlVisible = await vllmUrlInput.isVisible().catch(() => false); + const modelVisible = await vllmModelInput.isVisible().catch(() => false); if (!urlVisible && !modelVisible) { - test.skip(true, 'vLLM key form fields not shown (provider may use standard key form)') - return + test.skip( + true, + "vLLM key form fields not shown (provider may use standard key form)", + ); + return; + } + await expect(vllmUrlInput).toBeVisible(); + await expect(vllmModelInput).toBeVisible(); + + await providersPage.keyCancelBtn.click(); + }); + + test("should display Ollama-specific key fields when adding key to Ollama provider", async ({ + providersPage, + }) => { + const available = await providersPage.providerExists("ollama"); + if (!available) { + test.skip( + true, + "Ollama provider not in sidebar (add from dropdown first)", + ); + return; + } + + await providersPage.selectProvider("ollama"); + await providersPage.addKeyBtn.click(); + + const urlInput = providersPage.page.getByTestId("key-input-ollama-url"); + const urlVisible = await urlInput.isVisible().catch(() => false); + if (!urlVisible) { + test.skip(true, "Ollama key form fields not shown"); + return; + } + + await expect(urlInput).toBeVisible(); + await providersPage.keyCancelBtn.click(); + }); + + test("should display SGLang-specific key fields when adding key to SGLang provider", async ({ + providersPage, + }) => { + const available = await providersPage.providerExists("sgl"); + if (!available) { + test.skip( + true, + "SGLang provider not in sidebar (add from dropdown first)", + ); + return; + } + + await providersPage.selectProvider("sgl"); + await providersPage.addKeyBtn.click(); + + const urlInput = providersPage.page.getByTestId("key-input-sgl-url"); + const urlVisible = await urlInput.isVisible().catch(() => false); + if (!urlVisible) { + test.skip(true, "SGLang key form fields not shown"); + return; } - await expect(vllmUrlInput).toBeVisible() - await expect(vllmModelInput).toBeVisible() - await providersPage.keyCancelBtn.click() - }) -}) + await expect(urlInput).toBeVisible(); + await providersPage.keyCancelBtn.click(); + }); +}); \ No newline at end of file diff --git a/tests/e2e/features/virtual-keys/pages/virtual-keys.page.ts b/tests/e2e/features/virtual-keys/pages/virtual-keys.page.ts index 58a4d877b2..bb5ce516ba 100644 --- a/tests/e2e/features/virtual-keys/pages/virtual-keys.page.ts +++ b/tests/e2e/features/virtual-keys/pages/virtual-keys.page.ts @@ -1,83 +1,83 @@ -import { Locator, Page, expect } from '@playwright/test' -import { BasePage } from '../../../core/pages/base.page' -import { fillSelect, waitForNetworkIdle } from '../../../core/utils/test-helpers' +import { Locator, Page, expect } from "@playwright/test"; +import { BasePage } from "../../../core/pages/base.page"; +import { fillSelect, waitForNetworkIdle } from "../../../core/utils/test-helpers"; /** * Provider display names mapping - matches the UI's ProviderLabels * Used for exact matching when selecting providers in dropdowns */ const PROVIDER_DISPLAY_NAMES: Record = { - openai: 'OpenAI', - anthropic: 'Anthropic', - azure: 'Azure', - bedrock: 'AWS Bedrock', - cohere: 'Cohere', - vertex: 'Vertex AI', - mistral: 'Mistral AI', - ollama: 'Ollama', - groq: 'Groq', - gemini: 'Gemini', - openrouter: 'OpenRouter', - huggingface: 'HuggingFace', - cerebras: 'Cerebras', - perplexity: 'Perplexity', - elevenlabs: 'Elevenlabs', - parasail: 'Parasail', - sgl: 'SGLang', - nebius: 'Nebius Token Factory', - xai: 'xAI', -} + openai: "OpenAI", + anthropic: "Anthropic", + azure: "Azure", + bedrock: "AWS Bedrock", + cohere: "Cohere", + vertex: "Vertex AI", + mistral: "Mistral AI", + ollama: "Ollama", + groq: "Groq", + gemini: "Gemini", + openrouter: "OpenRouter", + huggingface: "HuggingFace", + cerebras: "Cerebras", + perplexity: "Perplexity", + elevenlabs: "Elevenlabs", + parasail: "Parasail", + sgl: "SGLang", + nebius: "Nebius Token Factory", + xai: "xAI", +}; /** * Escape regex special characters in a string */ function escapeRegExp(string: string): string { - return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); } /** * Budget configuration */ export interface BudgetConfig { - maxLimit: number - resetDuration?: string + maxLimit: number; + resetDuration?: string; // e.g. "1d", "1m", "1M" — see resetDurationOptions } /** * Rate limit configuration */ export interface RateLimitConfig { - tokenMaxLimit?: number - tokenResetDuration?: string - requestMaxLimit?: number - requestResetDuration?: string + tokenMaxLimit?: number; + tokenResetDuration?: string; + requestMaxLimit?: number; + requestResetDuration?: string; } /** * Provider configuration for virtual key */ export interface ProviderConfig { - provider: string - weight?: number - allowedModels?: string[] - keyIds?: string[] - budget?: BudgetConfig - rateLimit?: RateLimitConfig + provider: string; + weight?: number; + allowedModels?: string[]; + keyIds?: string[]; + budget?: BudgetConfig; + rateLimit?: RateLimitConfig; } /** * Virtual key configuration */ export interface VirtualKeyConfig { - name: string - description?: string - isActive?: boolean - providerConfigs?: ProviderConfig[] - budget?: BudgetConfig - rateLimit?: RateLimitConfig - entityType?: 'none' | 'team' | 'customer' - teamId?: string - customerId?: string + name: string; + description?: string; + isActive?: boolean; + providerConfigs?: ProviderConfig[]; + budgets?: BudgetConfig[]; + rateLimit?: RateLimitConfig; + entityType?: "none" | "team" | "customer"; + teamId?: string; + customerId?: string; } /** @@ -85,60 +85,141 @@ export interface VirtualKeyConfig { */ export class VirtualKeysPage extends BasePage { // Main page elements - readonly createBtn: Locator - readonly table: Locator - readonly emptyState: Locator + readonly createBtn: Locator; + readonly table: Locator; + readonly emptyState: Locator; // Virtual key sheet elements - readonly sheet: Locator - readonly nameInput: Locator - readonly descriptionInput: Locator - readonly isActiveToggle: Locator - readonly providerSelect: Locator - readonly saveBtn: Locator - readonly cancelBtn: Locator + readonly sheet: Locator; + readonly nameInput: Locator; + readonly descriptionInput: Locator; + readonly isActiveToggle: Locator; + readonly providerSelect: Locator; + readonly saveBtn: Locator; + readonly cancelBtn: Locator; constructor(page: Page) { - super(page) + super(page); // Main page elements - this.createBtn = page.getByTestId('create-vk-btn') - this.table = page.getByTestId('vk-table') - this.emptyState = page.getByTestId('virtual-keys-empty-state') + this.createBtn = page.getByTestId("create-vk-btn"); + this.table = page.getByTestId("vk-table"); + this.emptyState = page.getByTestId("virtual-keys-empty-state"); // Virtual key sheet elements - this.sheet = page.getByTestId('vk-sheet') - this.nameInput = page.getByTestId('vk-name-input') - this.descriptionInput = page.getByTestId('vk-description-input') - this.isActiveToggle = page.getByTestId('vk-is-active-toggle') - this.providerSelect = page.getByTestId('vk-provider-select') - this.saveBtn = page.getByTestId('vk-save-btn') - this.cancelBtn = page.getByTestId('vk-cancel-btn') + this.sheet = page.getByTestId("vk-sheet-content"); + this.nameInput = page.getByTestId("vk-name-input"); + this.descriptionInput = page.getByTestId("vk-description-input"); + this.isActiveToggle = page.getByTestId("vk-is-active-toggle"); + this.providerSelect = page.getByTestId("vk-provider-select"); + this.saveBtn = page.getByTestId("vk-save-btn"); + this.cancelBtn = page.getByTestId("vk-cancel-btn"); } /** * Navigate to the virtual keys page */ async goto(): Promise { - await this.page.goto('/workspace/virtual-keys') - await waitForNetworkIdle(this.page) + await this.page.goto("/workspace/governance/virtual-keys"); + await waitForNetworkIdle(this.page); + } + + /** + * Search the virtual keys table by name. + */ + async searchVirtualKeys(name: string): Promise { + const searchInput = this.page.getByTestId("vk-search-input"); + + for (let attempt = 0; attempt < 3; attempt++) { + await searchInput.fill("", { force: true }); + await expect(searchInput) + .toHaveValue("", { timeout: 2000 }) + .catch(() => {}); + + await searchInput.fill(name, { force: true }); + const matched = await expect(searchInput) + .toHaveValue(name, { timeout: 2000 }) + .then(() => true) + .catch(() => false); + if (matched) break; + + await this.page.waitForTimeout(500); + } + + await expect(searchInput) + .toHaveValue(name, { timeout: 10000 }) + .catch(() => {}); + await this.page.waitForTimeout(350); + } + + /** + * Clear table filters that can hide newly created or cleanup-target rows. + */ + async clearTableFilters(): Promise { + const searchInput = this.page.getByTestId("vk-search-input"); + if (await searchInput.isVisible().catch(() => false)) { + await searchInput.fill(""); + } + await this.page.waitForTimeout(350); } /** * Get virtual key row locator by name */ getVirtualKeyRow(name: string): Locator { - return this.page.getByTestId(`vk-row-${name}`) + return this.page.getByTestId(`vk-row-${name}`); + } + + /** + * Open the row actions dropdown and click Edit. + */ + private async openVirtualKeyEditor(name: string): Promise { + await this.searchVirtualKeys(name); + + const row = this.getVirtualKeyRow(name); + await expect(row).toBeVisible({ timeout: 10000 }); + await row.scrollIntoViewIfNeeded(); + + const actionsBtn = this.page.getByTestId(`vk-actions-btn-${name}`); + await actionsBtn.waitFor({ state: "visible", timeout: 10000 }); + await actionsBtn.scrollIntoViewIfNeeded(); + await actionsBtn.click(); + + const editBtn = this.page.getByTestId(`vk-edit-btn-${name}`); + await editBtn.waitFor({ state: "visible", timeout: 10000 }); + await editBtn.click(); + + await expect(this.sheet).toBeVisible({ timeout: 10000 }); + await this.waitForSheetAnimation(); + await expect(this.nameInput).toHaveValue(name, { timeout: 10000 }); + } + + /** + * Wait for the sheet to close after a save. If it stays open, close it without + * racing a button that may already be detaching during the save animation. + */ + private async waitForSheetClosedAfterSave(): Promise { + const closed = await expect(this.sheet) + .not.toBeVisible({ timeout: 5000 }) + .then(() => true) + .catch(() => false); + if (!closed) { + await this.page.keyboard.press("Escape"); + await expect(this.sheet).not.toBeVisible({ timeout: 5000 }); + } + + await expect(this.page.locator("html")) + .not.toHaveClass(/bprogress-busy/, { timeout: 10000 }) + .catch(() => {}); } /** * Check if a virtual key exists in the table */ async virtualKeyExists(name: string): Promise { - const row = this.getVirtualKeyRow(name) - // Use count() to check if element exists in DOM (doesn't require visibility) - const count = await row.count() - return count > 0 + await this.searchVirtualKeys(name).catch(() => {}); + const row = this.getVirtualKeyRow(name); + return await row.isVisible({ timeout: 5000 }).catch(() => false); } /** @@ -146,12 +227,12 @@ export class VirtualKeysPage extends BasePage { * When masked, the display shows bullets (•); when revealed, it shows the full key. */ async isKeyRevealed(name: string): Promise { - const row = this.getVirtualKeyRow(name) - const keyCell = row.getByTestId('vk-key-value') - await keyCell.waitFor({ state: 'visible', timeout: 5000 }) - const text = (await keyCell.textContent())?.trim() ?? '' + const row = this.getVirtualKeyRow(name); + const keyCell = row.getByTestId("vk-key-value"); + await keyCell.waitFor({ state: "visible", timeout: 5000 }); + const text = (await keyCell.textContent())?.trim() ?? ""; // Masked keys contain bullet character; revealed keys do not - return text.length > 0 && !text.includes('•') + return text.length > 0 && !text.includes("•"); } /** @@ -159,65 +240,64 @@ export class VirtualKeysPage extends BasePage { */ async createVirtualKey(config: VirtualKeyConfig): Promise { // Click create button - await this.createBtn.click() + await this.createBtn.click(); // Wait for sheet to appear and animation to complete - await expect(this.sheet).toBeVisible() - await this.waitForSheetAnimation() + await expect(this.sheet).toBeVisible(); + await this.waitForSheetAnimation(); // Fill basic information using keyboard navigation - await this.nameInput.focus() - await this.page.keyboard.type(config.name) + await this.nameInput.focus(); + await this.page.keyboard.type(config.name); if (config.description) { - await this.page.keyboard.press('Tab') // Move to description - await this.page.keyboard.type(config.description) + await this.page.keyboard.press("Tab"); // Move to description + await this.page.keyboard.type(config.description); } // Set active state if specified (default is true, so only toggle if we want inactive) if (config.isActive === false) { - await this.isActiveToggle.focus() - await this.page.keyboard.press('Space') // Toggle the switch + await this.isActiveToggle.focus(); + await this.page.keyboard.press("Space"); // Toggle the switch } // Add provider configurations if (config.providerConfigs && config.providerConfigs.length > 0) { for (const providerConfig of config.providerConfigs) { - await this.addProviderConfig(providerConfig) + await this.addProviderConfig(providerConfig); } } // Set budget if specified - if (config.budget) { - await this.setBudget(config.budget) + if (config.budgets && config.budgets.length > 0) { + await this.setBudgets(config.budgets); } // Set rate limits if specified if (config.rateLimit) { - await this.setRateLimit(config.rateLimit) + await this.setRateLimit(config.rateLimit); } // Set entity assignment if specified - if (config.entityType && config.entityType !== 'none') { - await this.setEntityAssignment(config.entityType, config.teamId, config.customerId) + if (config.entityType && config.entityType !== "none") { + await this.setEntityAssignment(config.entityType, config.teamId, config.customerId); } - // Save the virtual key by clicking the save button - await this.saveBtn.click() - - // Wait for success toast - await this.waitForSuccessToast() - - // Wait for toasts to disappear before continuing - await this.dismissToasts() - - // Wait for sheet to close - await expect(this.sheet).not.toBeVisible({ timeout: 5000 }) + await expect(this.saveBtn).toBeEnabled({ timeout: 10000 }); - // Wait for the new row to appear in the table (ensures table has refreshed) - const row = this.getVirtualKeyRow(config.name) - await row.waitFor({ state: 'attached', timeout: 10000 }) - await row.scrollIntoViewIfNeeded() + // Save the virtual key by clicking the save button + await this.saveBtn.click(); + + // Wait for the new row to appear in the table. This is a stronger success + // signal than sonner toasts, which can overlap between tests and animations. + await this.searchVirtualKeys(config.name); + const row = this.getVirtualKeyRow(config.name); + await row.waitFor({ state: "visible", timeout: 15000 }); + await row.scrollIntoViewIfNeeded(); + + await expect(this.sheet) + .not.toBeVisible({ timeout: 5000 }) + .catch(() => {}); } /** @@ -225,47 +305,77 @@ export class VirtualKeysPage extends BasePage { */ private async addProviderConfig(config: ProviderConfig): Promise { // Click the provider select dropdown - await this.providerSelect.click() + await this.providerSelect.click(); // Wait for dropdown content - await this.page.waitForSelector('[role="listbox"]', { timeout: 5000 }) + await this.page.waitForSelector('[role="listbox"]', { timeout: 5000 }); // Get display name - use mapping for known providers, otherwise use exact name - const displayName = PROVIDER_DISPLAY_NAMES[config.provider.toLowerCase()] || config.provider + const displayName = PROVIDER_DISPLAY_NAMES[config.provider.toLowerCase()] || config.provider; // First try exact match for base providers (e.g., "OpenAI", "Anthropic") - let option = this.page.getByRole('option', { name: displayName, exact: true }) + let option = this.page.getByRole("option", { name: displayName, exact: true }); - if (await option.count() === 0) { + if ((await option.count()) === 0) { // Fallback: try partial match for custom providers (contains provider name) // This handles custom providers like "test-anthropic-1234567890" - option = this.page.getByRole('option').filter({ - hasText: new RegExp(escapeRegExp(config.provider), 'i') - }).first() + option = this.page + .getByRole("option") + .filter({ + hasText: new RegExp(escapeRegExp(config.provider), "i"), + }) + .first(); } // Verify we found a matching option - const optionCount = await option.count() + const optionCount = await option.count(); if (optionCount === 0) { - throw new Error(`No provider option found matching "${config.provider}" (display name: "${displayName}")`) + throw new Error( + `No provider option found matching "${config.provider}" (display name: "${displayName}")`, + ); } - await option.click() + await option.click(); // Wait for dropdown to close after selection - await this.page.waitForSelector('[role="listbox"]', { state: 'hidden', timeout: 5000 }) + await this.page.waitForSelector('[role="listbox"]', { state: "hidden", timeout: 5000 }); } /** - * Set budget configuration in the form + * Set budget lines in the MultiBudgetLines component. + * Clicks "Add Budget" for each entry, fills the amount input, + * and selects the reset period. */ - private async setBudget(budget: BudgetConfig): Promise { - // Find budget max limit input and fill (fill() clears and sets atomically) - const budgetInput = this.page.locator('#budgetMaxLimit') - await budgetInput.fill(String(budget.maxLimit)) + private async setBudgets(budgets: BudgetConfig[]): Promise { + for (let i = 0; i < budgets.length; i++) { + const budget = budgets[i]; + // Click "Add Budget" button to add a new budget line + await this.page.getByTestId("vk-budget-lines-add-btn").click(); + const amountInput = this.page.getByTestId(`vk-budget-lines-amount-${i}`); + await amountInput.fill(String(budget.maxLimit)); + // Select reset period if specified + if (budget.resetDuration) { + await this.page.getByTestId(`vk-budget-lines-line-${i}`).getByRole("combobox").click(); + await this.page + .getByRole("option", { name: this.resetDurationLabel(budget.resetDuration), exact: true }) + .click(); + } + } + } - // Set reset duration if specified - skip for now as default is fine - // The reset duration select is complex and default "Monthly" is usually correct + private resetDurationLabel(value: string): string { + const labels: Record = { + "1m": "Every Minute", + "5m": "Every 5 Minutes", + "15m": "Every 15 Minutes", + "30m": "Every 30 Minutes", + "1h": "Hourly", + "6h": "Every 6 Hours", + "1d": "Daily", + "1w": "Weekly", + "1M": "Monthly", + }; + return labels[value] ?? value; } /** @@ -274,14 +384,14 @@ export class VirtualKeysPage extends BasePage { private async setRateLimit(rateLimit: RateLimitConfig): Promise { // Set token limits (fill() clears and sets atomically) if (rateLimit.tokenMaxLimit !== undefined) { - const tokenInput = this.page.locator('#tokenMaxLimit') - await tokenInput.fill(String(rateLimit.tokenMaxLimit)) + const tokenInput = this.page.locator("#tokenMaxLimit"); + await tokenInput.fill(String(rateLimit.tokenMaxLimit)); } // Set request limits (fill() clears and sets atomically) if (rateLimit.requestMaxLimit !== undefined) { - const requestInput = this.page.locator('#requestMaxLimit') - await requestInput.fill(String(rateLimit.requestMaxLimit)) + const requestInput = this.page.locator("#requestMaxLimit"); + await requestInput.fill(String(rateLimit.requestMaxLimit)); } } @@ -289,29 +399,29 @@ export class VirtualKeysPage extends BasePage { * Set entity assignment (team or customer) */ private async setEntityAssignment( - entityType: 'team' | 'customer', + entityType: "team" | "customer", teamId?: string, - customerId?: string + customerId?: string, ): Promise { // Find and click entity type select - const entityTypeSelect = this.page.locator('[data-testid="vk-entity-type-select"]') + const entityTypeSelect = this.page.locator('[data-testid="vk-entity-type-select"]'); if (await entityTypeSelect.isVisible()) { await fillSelect( this.page, '[data-testid="vk-entity-type-select"]', - entityType === 'team' ? 'Assign to Team' : 'Assign to Customer' - ) + entityType === "team" ? "Assign to Team" : "Assign to Customer", + ); // Select team or customer - if (entityType === 'team' && teamId) { - const teamSelect = this.page.locator('[data-testid="vk-team-select"]') + if (entityType === "team" && teamId) { + const teamSelect = this.page.locator('[data-testid="vk-team-select"]'); if (await teamSelect.isVisible()) { - await fillSelect(this.page, '[data-testid="vk-team-select"]', teamId) + await fillSelect(this.page, '[data-testid="vk-team-select"]', teamId); } - } else if (entityType === 'customer' && customerId) { - const customerSelect = this.page.locator('[data-testid="vk-customer-select"]') + } else if (entityType === "customer" && customerId) { + const customerSelect = this.page.locator('[data-testid="vk-customer-select"]'); if (await customerSelect.isVisible()) { - await fillSelect(this.page, '[data-testid="vk-customer-select"]', customerId) + await fillSelect(this.page, '[data-testid="vk-customer-select"]', customerId); } } } @@ -322,70 +432,54 @@ export class VirtualKeysPage extends BasePage { */ async editVirtualKey(name: string, updates: Partial): Promise { // Wait for any existing toasts to disappear - await this.forceCloseToasts() - - // Find and click the edit button using data-testid - const editBtn = this.page.getByTestId(`vk-edit-btn-${name}`) - await editBtn.waitFor({ state: 'visible', timeout: 10000 }) - await editBtn.scrollIntoViewIfNeeded() - await editBtn.click() - - // Wait for sheet to appear and animation to complete - await expect(this.sheet).toBeVisible() - await this.waitForSheetAnimation() + await this.forceCloseToasts(); + await this.openVirtualKeyEditor(name); // Update name using clear() and fill() for cross-platform compatibility if (updates.name) { - await this.nameInput.clear() - await this.nameInput.fill(updates.name) + await this.nameInput.clear(); + await this.nameInput.fill(updates.name); } // Update description using clear() and fill() for cross-platform compatibility if (updates.description !== undefined) { - await this.descriptionInput.clear() + await this.descriptionInput.clear(); if (updates.description) { - await this.descriptionInput.fill(updates.description) + await this.descriptionInput.fill(updates.description); } } // Update toggle using click() and data-state attribute for reliability if (updates.isActive !== undefined) { // Check current state using data-state attribute (Radix Switch) - const isCurrentlyChecked = await this.isActiveToggle.getAttribute('data-state') === 'checked' + const isCurrentlyChecked = + (await this.isActiveToggle.getAttribute("data-state")) === "checked"; if (isCurrentlyChecked !== updates.isActive) { - await this.isActiveToggle.click() + await this.isActiveToggle.click(); } } - if (updates.budget) { - await this.setBudget(updates.budget) + if (updates.budgets && updates.budgets.length > 0) { + await this.setBudgets(updates.budgets); } if (updates.rateLimit) { - await this.setRateLimit(updates.rateLimit) + await this.setRateLimit(updates.rateLimit); } - // Save changes by clicking the save button - await this.saveBtn.click() + await expect(this.saveBtn).toBeEnabled({ timeout: 10000 }); - // Wait for success toast - await this.waitForSuccessToast() + // Save changes by clicking the save button + await this.saveBtn.click(); - // Wait for toasts to disappear before continuing - await this.dismissToasts() + await this.waitForSheetClosedAfterSave(); - // Check if sheet is still visible - it may not auto-close - const isSheetVisible = await this.sheet.isVisible().catch(() => false) - if (isSheetVisible) { - // Try clicking the close button or pressing Escape - const closeBtn = this.sheet.locator('button[aria-label*="close"], button:has(svg.lucide-x)').first() - if (await closeBtn.isVisible().catch(() => false)) { - await closeBtn.click() - } else { - await this.page.keyboard.press('Escape') - } - await expect(this.sheet).not.toBeVisible({ timeout: 5000 }) + const targetName = updates.name ?? name; + if (updates.name) { + await this.goto(); } + await this.searchVirtualKeys(targetName); + await expect(this.getVirtualKeyRow(targetName)).toBeVisible({ timeout: 10000 }); } /** @@ -393,89 +487,86 @@ export class VirtualKeysPage extends BasePage { * Polls so we don't rely on a stale locator. */ async waitForVirtualKeyGone(name: string, timeoutMs: number): Promise { - const deadline = Date.now() + timeoutMs + const deadline = Date.now() + timeoutMs; while (Date.now() < deadline) { - if ((await this.getVirtualKeyRow(name).count()) === 0) return true - await this.page.waitForTimeout(500) + if ((await this.getVirtualKeyRow(name).count()) === 0) return true; + await this.page.waitForTimeout(500); } - return false + return false; } async deleteVirtualKey(name: string, options?: { requireToast?: boolean }): Promise { // Check if virtual key exists first - const exists = await this.virtualKeyExists(name) + const exists = await this.virtualKeyExists(name); if (!exists) { // Already deleted or doesn't exist, nothing to do - return + return; } // Wait for any existing toasts to disappear - await this.forceCloseToasts() + await this.forceCloseToasts(); // Find the delete button using data-testid (scroll row into view in case table just loaded) - const row = this.getVirtualKeyRow(name) - await row.scrollIntoViewIfNeeded().catch(() => {}) - await this.page.waitForTimeout(300) + const row = this.getVirtualKeyRow(name); + await row.scrollIntoViewIfNeeded().catch(() => {}); + await this.page.waitForTimeout(300); + + await this.searchVirtualKeys(name).catch(() => {}); + await expect(row).toBeVisible({ timeout: 10000 }); + + const actionsBtn = this.page.getByTestId(`vk-actions-btn-${name}`); + await actionsBtn.waitFor({ state: "visible", timeout: 10000 }); + await actionsBtn.scrollIntoViewIfNeeded(); + await actionsBtn.click(); - const deleteBtn = this.page.getByTestId(`vk-delete-btn-${name}`) + const deleteBtn = this.page.getByTestId(`vk-delete-btn-${name}`); // Check if button exists; if not, give table a moment and re-check once - let btnCount = await deleteBtn.count() + let btnCount = await deleteBtn.count(); if (btnCount === 0) { - await this.page.waitForTimeout(800) - btnCount = await deleteBtn.count() + await this.page.waitForTimeout(800); + btnCount = await deleteBtn.count(); } if (btnCount === 0) { - const stillExists = await this.virtualKeyExists(name) - if (!stillExists) return - throw new Error(`Delete button not found for virtual key: ${name}`) + const stillExists = await this.virtualKeyExists(name); + if (!stillExists) return; + throw new Error(`Delete button not found for virtual key: ${name}`); } // Check if button is disabled - const isDisabled = await deleteBtn.isDisabled().catch(() => false) + const isDisabled = await deleteBtn.isDisabled().catch(() => false); if (isDisabled) { - throw new Error(`Delete button is disabled for virtual key: ${name} (likely due to RBAC permissions)`) + throw new Error( + `Delete button is disabled for virtual key: ${name} (likely due to RBAC permissions)`, + ); } - await deleteBtn.waitFor({ state: 'visible', timeout: 10000 }) - await deleteBtn.scrollIntoViewIfNeeded() - await deleteBtn.click() + await deleteBtn.waitFor({ state: "visible", timeout: 10000 }); + await deleteBtn.scrollIntoViewIfNeeded(); + await deleteBtn.click(); // Wait for confirmation dialog and confirm deletion (match "Delete" or "Deleting...") - const confirmDialog = this.page.locator('[role="alertdialog"]') - await confirmDialog.waitFor({ state: 'visible', timeout: 5000 }) - const confirmBtn = confirmDialog.getByRole('button', { name: /Delete/i }) - await confirmBtn.waitFor({ state: 'visible', timeout: 2000 }) - - // Wait for DELETE API response - const deleteResponsePromise = this.page.waitForResponse( - (response) => { - const url = response.url() - return url.includes('/api/virtual-keys/') && response.request().method() === 'DELETE' - }, - { timeout: 15000 } - ) - await confirmBtn.click() - const deleteResponse = await deleteResponsePromise.catch((err) => { - console.warn(`[deleteVirtualKey] No DELETE response captured for "${name}": ${err}`) - return null - }) - if (deleteResponse && !deleteResponse.ok()) { - console.warn(`[deleteVirtualKey] DELETE responded with ${deleteResponse.status()} for "${name}"`) - } + const confirmDialog = this.page.locator('[role="alertdialog"]'); + await confirmDialog.waitFor({ state: "visible", timeout: 5000 }); + const confirmBtn = confirmDialog.getByRole("button", { name: /Delete/i }); + await confirmBtn.waitFor({ state: "visible", timeout: 2000 }); + + await confirmBtn.click(); // Wait for table to refetch and row to disappear (poll fresh locator; avoid stale row reference) - const gone = await this.waitForVirtualKeyGone(name, 20000) + const gone = await this.waitForVirtualKeyGone(name, 20000); if (!gone) { - throw new Error(`Virtual key "${name}" still visible after delete`) + throw new Error(`Virtual key "${name}" still visible after delete`); } // Optionally wait for success toast (skip in cleanup to avoid false failures) if (options?.requireToast !== false) { - await this.getToast().waitFor({ state: 'visible', timeout: 5000 }).catch(() => {}) + await this.getToast() + .waitFor({ state: "visible", timeout: 5000 }) + .catch(() => {}); } - await this.dismissToasts() + await this.dismissToasts().catch(() => {}); } /** @@ -483,36 +574,28 @@ export class VirtualKeysPage extends BasePage { */ async viewVirtualKey(name: string): Promise { // Wait for any existing toasts to disappear - await this.forceCloseToasts() - - // Use the edit button to open the detail sheet - const editBtn = this.page.getByTestId(`vk-edit-btn-${name}`) - await editBtn.waitFor({ state: 'visible', timeout: 10000 }) - await editBtn.scrollIntoViewIfNeeded() - await editBtn.click() - - // Wait for detail sheet to appear - await expect(this.sheet).toBeVisible({ timeout: 5000 }) + await this.forceCloseToasts(); + await this.openVirtualKeyEditor(name); } /** * Get the count of virtual keys in the table */ async getVirtualKeyCount(): Promise { - const rows = this.table.locator('tbody tr') - const count = await rows.count() + const rows = this.table.locator("tbody tr"); + const count = await rows.count(); if (count === 0) { - return 0 + return 0; } // Check if it's the empty state row - const firstRowText = await rows.first().textContent() - if (firstRowText?.includes('No virtual keys found')) { - return 0 + const firstRowText = await rows.first().textContent(); + if (firstRowText?.includes("No virtual keys found")) { + return 0; } - return count + return count; } /** @@ -520,12 +603,12 @@ export class VirtualKeysPage extends BasePage { */ async copyVirtualKeyValue(name: string): Promise { // Find and click the copy button using data-testid - const copyBtn = this.page.getByTestId(`vk-copy-btn-${name}`) - await copyBtn.waitFor({ state: 'attached', timeout: 10000 }) - await copyBtn.scrollIntoViewIfNeeded() - await copyBtn.click() + const copyBtn = this.page.getByTestId(`vk-copy-btn-${name}`); + await copyBtn.waitFor({ state: "attached", timeout: 10000 }); + await copyBtn.scrollIntoViewIfNeeded(); + await copyBtn.click(); - await this.waitForSuccessToast('Copied') + await this.waitForSuccessToast("Copied"); } /** @@ -533,26 +616,30 @@ export class VirtualKeysPage extends BasePage { */ async toggleKeyVisibility(name: string): Promise { // Find and click the visibility toggle button using data-testid - const toggleBtn = this.page.getByTestId(`vk-visibility-btn-${name}`) - await toggleBtn.waitFor({ state: 'attached', timeout: 10000 }) - await toggleBtn.scrollIntoViewIfNeeded() - await toggleBtn.click() + const toggleBtn = this.page.getByTestId(`vk-visibility-btn-${name}`); + await toggleBtn.waitFor({ state: "attached", timeout: 10000 }); + await toggleBtn.scrollIntoViewIfNeeded(); + await toggleBtn.click(); } /** * Close any open sheet/dialog */ async closeSheet(): Promise { - const isSheetVisible = await this.sheet.isVisible().catch(() => false) + const isSheetVisible = await this.sheet.isVisible().catch(() => false); if (isSheetVisible) { // We have to click on the close button to close the sheet - const closeBtn = this.sheet.locator('button[aria-label*="close"], button:has(svg.lucide-x)').first() + const closeBtn = this.sheet + .locator('button[aria-label*="close"], button:has(svg.lucide-x)') + .first(); if (await closeBtn.isVisible().catch(() => false)) { - await closeBtn.click() + await closeBtn.click(); } else { - await this.page.keyboard.press('Escape') + await this.page.keyboard.press("Escape"); } - await expect(this.sheet).not.toBeVisible({ timeout: 5000 }).catch(() => {}) + await expect(this.sheet) + .not.toBeVisible({ timeout: 5000 }) + .catch(() => {}); } } @@ -560,25 +647,25 @@ export class VirtualKeysPage extends BasePage { * Get all virtual key names from the table */ async getAllVirtualKeyNames(): Promise { - const names: string[] = [] - const count = await this.getVirtualKeyCount() + const names: string[] = []; + const count = await this.getVirtualKeyCount(); - if (count === 0) return names + if (count === 0) return names; // Find all delete buttons which have the VK name in their test-id - const deleteButtons = this.page.locator('[data-testid^="vk-delete-btn-"]') - const buttonCount = await deleteButtons.count() + const deleteButtons = this.page.locator('[data-testid^="vk-delete-btn-"]'); + const buttonCount = await deleteButtons.count(); for (let i = 0; i < buttonCount; i++) { - const testId = await deleteButtons.nth(i).getAttribute('data-testid') + const testId = await deleteButtons.nth(i).getAttribute("data-testid"); if (testId) { // Extract name from test-id: "vk-delete-btn-{name}" - const name = testId.replace('vk-delete-btn-', '') - names.push(name) + const name = testId.replace("vk-delete-btn-", ""); + names.push(name); } } - return names + return names; } /** @@ -586,59 +673,61 @@ export class VirtualKeysPage extends BasePage { */ async cleanupAllVirtualKeys(): Promise { // First close any open sheet - await this.closeSheet() + await this.closeSheet(); // Wait for any toasts to clear - await this.dismissToasts() + await this.dismissToasts(); // Keep trying until no more VKs exist - let attempts = 0 - const maxAttempts = 10 // Prevent infinite loops + let attempts = 0; + const maxAttempts = 10; // Prevent infinite loops while (attempts < maxAttempts) { // Get current VK names (refresh the list each iteration) - const names = await this.getAllVirtualKeyNames() + const names = await this.getAllVirtualKeyNames(); if (names.length === 0) { // No more VKs to delete - break + break; } // Delete each one for (const name of names) { try { // Check if VK still exists before trying to delete - const exists = await this.virtualKeyExists(name) + const exists = await this.virtualKeyExists(name); if (!exists) { // Already deleted, skip - continue + continue; } // Make sure sheet is closed before each delete - await this.closeSheet() - await this.deleteVirtualKey(name) + await this.closeSheet(); + await this.deleteVirtualKey(name); // Wait a bit for table to refresh - await this.page.waitForTimeout(500) + await this.page.waitForTimeout(500); } catch (error) { // If delete fails, try to close sheet and continue - await this.closeSheet() - const errorMsg = error instanceof Error ? error.message : String(error) - console.log(`Failed to delete virtual key: ${name} - ${errorMsg}`) + await this.closeSheet(); + const errorMsg = error instanceof Error ? error.message : String(error); + console.log(`Failed to delete virtual key: ${name} - ${errorMsg}`); // Continue with next VK } } - attempts++ + attempts++; // Wait a bit before next iteration to allow table to refresh - await this.page.waitForTimeout(1000) + await this.page.waitForTimeout(1000); } if (attempts >= maxAttempts) { - const remainingNames = await this.getAllVirtualKeyNames() + const remainingNames = await this.getAllVirtualKeyNames(); if (remainingNames.length > 0) { - console.log(`Warning: Could not delete all virtual keys after ${maxAttempts} attempts. Remaining: ${remainingNames.join(', ')}`) + console.log( + `Warning: Could not delete all virtual keys after ${maxAttempts} attempts. Remaining: ${remainingNames.join(", ")}`, + ); } } } @@ -647,37 +736,36 @@ export class VirtualKeysPage extends BasePage { * Clean up specific virtual keys by name */ async cleanupVirtualKeys(names: string[]): Promise { - if (names.length === 0) return + if (names.length === 0) return; // Ensure we're on the virtual keys list with a fresh load so table is ready - await this.goto() - await this.closeSheet() - await this.dismissToasts() - await this.table.waitFor({ state: 'visible', timeout: 10000 }).catch(() => {}) - await this.page.waitForTimeout(500) + await this.goto(); + await this.closeSheet(); + await this.dismissToasts().catch(() => {}); + await this.table.waitFor({ state: "visible", timeout: 10000 }).catch(() => {}); for (const name of names) { const tryDelete = async (): Promise => { - const exists = await this.virtualKeyExists(name) - if (!exists) return - await this.closeSheet() - await this.deleteVirtualKey(name, { requireToast: false }) - } + const exists = await this.virtualKeyExists(name); + if (!exists) return; + await this.closeSheet(); + await this.deleteVirtualKey(name, { requireToast: false }); + }; try { - await tryDelete() + await tryDelete(); } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - console.error(`[CLEANUP ERROR] Failed to delete virtual key: ${name} - ${errorMsg}`) - await this.closeSheet() - await this.page.waitForTimeout(1000) + const errorMsg = error instanceof Error ? error.message : String(error); + console.error(`[CLEANUP ERROR] Failed to delete virtual key: ${name} - ${errorMsg}`); + await this.closeSheet(); + await this.page.waitForTimeout(1000); try { - await tryDelete() + await tryDelete(); } catch (retryError) { - const retryMsg = retryError instanceof Error ? retryError.message : String(retryError) - console.error(`[CLEANUP ERROR] Retry failed for virtual key: ${name} - ${retryMsg}`) + const retryMsg = retryError instanceof Error ? retryError.message : String(retryError); + console.error(`[CLEANUP ERROR] Retry failed for virtual key: ${name} - ${retryMsg}`); } } } } -} +} \ No newline at end of file diff --git a/tests/e2e/features/virtual-keys/virtual-keys.data.ts b/tests/e2e/features/virtual-keys/virtual-keys.data.ts index 5d3a8fb007..ff58b72c25 100644 --- a/tests/e2e/features/virtual-keys/virtual-keys.data.ts +++ b/tests/e2e/features/virtual-keys/virtual-keys.data.ts @@ -39,10 +39,10 @@ export function createVirtualKeyWithProvider( } /** - * Factory function to create virtual key with budget + * Factory function to create virtual key with one or more budget lines */ export function createVirtualKeyWithBudget( - budget: BudgetConfig, + budgets: BudgetConfig[], vkOverrides: Partial = {} ): VirtualKeyConfig { const timestamp = Date.now() @@ -50,7 +50,7 @@ export function createVirtualKeyWithBudget( name: `Test VK Budget ${timestamp}`, description: 'Virtual key with budget configuration', isActive: true, - budget, + budgets, ...vkOverrides, } } @@ -133,6 +133,10 @@ export const SAMPLE_BUDGETS: Record = { maxLimit: 200, resetDuration: '1w', }, + everyMinute: { + maxLimit: 5, + resetDuration: '1m', + }, } /** diff --git a/tests/e2e/features/virtual-keys/virtual-keys.spec.ts b/tests/e2e/features/virtual-keys/virtual-keys.spec.ts index 2d98781517..bbb3d91c94 100644 --- a/tests/e2e/features/virtual-keys/virtual-keys.spec.ts +++ b/tests/e2e/features/virtual-keys/virtual-keys.spec.ts @@ -104,9 +104,9 @@ test.describe('Virtual Keys', () => { }) test.describe('Virtual Key with Budget', () => { - test('should create virtual key with small budget', async ({ virtualKeysPage }) => { - const vkData = createVirtualKeyWithBudget(SAMPLE_BUDGETS.small, { - name: `Small Budget VK ${Date.now()}`, + test('should create virtual key with daily budget', async ({ virtualKeysPage }) => { + const vkData = createVirtualKeyWithBudget([SAMPLE_BUDGETS.daily], { + name: `Daily Budget VK ${Date.now()}`, }) createdVKs.push(vkData.name) @@ -118,14 +118,14 @@ test.describe('Virtual Keys', () => { // Verify budget was saved correctly await virtualKeysPage.viewVirtualKey(vkData.name) await virtualKeysPage.waitForSheetAnimation() - const budgetInput = virtualKeysPage.page.locator('#budgetMaxLimit') - await expect(budgetInput).toHaveValue(String(SAMPLE_BUDGETS.small.maxLimit)) + const amountInput = virtualKeysPage.page.getByTestId('vk-budget-lines-amount-0') + await expect(amountInput).toHaveValue(String(SAMPLE_BUDGETS.daily.maxLimit)) await virtualKeysPage.closeSheet() }) - test('should create virtual key with medium budget', async ({ virtualKeysPage }) => { - const vkData = createVirtualKeyWithBudget(SAMPLE_BUDGETS.medium, { - name: `Medium Budget VK ${Date.now()}`, + test('should create virtual key with every-minute budget', async ({ virtualKeysPage }) => { + const vkData = createVirtualKeyWithBudget([SAMPLE_BUDGETS.everyMinute], { + name: `Minute Budget VK ${Date.now()}`, }) createdVKs.push(vkData.name) @@ -133,18 +133,32 @@ test.describe('Virtual Keys', () => { const vkExists = await virtualKeysPage.virtualKeyExists(vkData.name) expect(vkExists).toBe(true) + + // Verify budget was saved correctly + await virtualKeysPage.viewVirtualKey(vkData.name) + await virtualKeysPage.waitForSheetAnimation() + const amountInput = virtualKeysPage.page.getByTestId('vk-budget-lines-amount-0') + await expect(amountInput).toHaveValue(String(SAMPLE_BUDGETS.everyMinute.maxLimit)) + await virtualKeysPage.closeSheet() }) - test('should create virtual key with daily budget', async ({ virtualKeysPage }) => { - const vkData = createVirtualKeyWithBudget(SAMPLE_BUDGETS.daily, { - name: `Daily Budget VK ${Date.now()}`, - }) + test('should create virtual key with multiple budgets', async ({ virtualKeysPage }) => { + const vkData = createVirtualKeyWithBudget( + [SAMPLE_BUDGETS.daily, SAMPLE_BUDGETS.everyMinute], + { name: `Multi Budget VK ${Date.now()}` } + ) createdVKs.push(vkData.name) await virtualKeysPage.createVirtualKey(vkData) const vkExists = await virtualKeysPage.virtualKeyExists(vkData.name) expect(vkExists).toBe(true) + + await virtualKeysPage.viewVirtualKey(vkData.name) + await virtualKeysPage.waitForSheetAnimation() + await expect(virtualKeysPage.page.getByTestId('vk-budget-lines-amount-0')).toHaveValue(String(SAMPLE_BUDGETS.daily.maxLimit)) + await expect(virtualKeysPage.page.getByTestId('vk-budget-lines-amount-1')).toHaveValue(String(SAMPLE_BUDGETS.everyMinute.maxLimit)) + await virtualKeysPage.closeSheet() }) }) @@ -213,7 +227,7 @@ test.describe('Virtual Keys', () => { name: `Full Config VK ${Date.now()}`, description: 'Virtual key with all configurations', isActive: true, - budget: SAMPLE_BUDGETS.medium, + budgets: [SAMPLE_BUDGETS.medium], rateLimit: SAMPLE_RATE_LIMITS.moderate, }) @@ -455,8 +469,9 @@ test.describe('Form Validation', () => { // Fill name (required field) await virtualKeysPage.nameInput.fill(`Valid Budget Test ${Date.now()}`) - // Fill budget - const budgetInput = virtualKeysPage.page.locator('#budgetMaxLimit') + // Add a budget line and fill amount + await virtualKeysPage.page.getByTestId('vk-budget-lines-add-btn').click() + const budgetInput = virtualKeysPage.page.getByTestId('vk-budget-lines-amount-0') await expect(budgetInput).toBeVisible({ timeout: 5000 }) await budgetInput.fill('100') @@ -539,7 +554,7 @@ test.describe('Provider Management', () => { const vkName = `Provider Budget VK ${Date.now()}` const vkData = createVirtualKeyWithProvider('openai', { name: vkName, - budget: SAMPLE_BUDGETS.small, + budgets: [SAMPLE_BUDGETS.small], }) providerVKs.push(vkName) @@ -547,7 +562,7 @@ test.describe('Provider Management', () => { // Edit the virtual key await virtualKeysPage.editVirtualKey(vkName, { - budget: SAMPLE_BUDGETS.large, + budgets: [SAMPLE_BUDGETS.large], }) // Verify it still exists diff --git a/tests/e2e/global-setup.ts b/tests/e2e/global-setup.ts index 823dfab7f5..09bc827dc7 100644 --- a/tests/e2e/global-setup.ts +++ b/tests/e2e/global-setup.ts @@ -315,8 +315,80 @@ async function runMCPSetup(): Promise { console.log('✓ STDIO server already built') } + // Build and start auth-demo-server on port 3002 + try { + const authServerBinaryName = isWindows ? 'auth-demo-server.exe' : 'auth-demo-server' + const authServerDir = join(REPO_ROOT, 'examples', 'mcps', 'auth-demo-server') + const authServerBinary = join(authServerDir, authServerBinaryName) + const authServerExec = isWindows ? authServerBinaryName : './auth-demo-server' + + if (!existsSync(authServerBinary)) { + console.log('Building auth-demo-server...') + runCommand(goCommand, ['build', '-o', authServerBinaryName, 'main.go'], { + cwd: authServerDir, + env: { ...process.env, CGO_ENABLED: '0' }, + }) + } else { + console.log('✓ auth-demo-server binary already exists') + } + + console.log('Starting auth-demo-server on port 3002...') + const authServer = spawn(authServerExec, [], { + cwd: authServerDir, + detached: true, + stdio: ['ignore', 'pipe', 'pipe'], + }) + authServer.stdout?.on('data', (data) => console.log(`[Auth Server] ${data.toString().trim()}`)) + authServer.stderr?.on('data', (data) => console.error(`[Auth Server Error] ${data.toString().trim()}`)) + if (authServer.pid) { + authServer.unref() + MCP_SERVERS.push(authServer) + await setTimeout(1000) + console.log('✓ auth-demo-server started on http://localhost:3002/') + } + } catch (err) { + console.warn(`⚠️ Failed to start auth-demo-server (header auth tests may skip): ${(err as Error).message}`) + } + + // Build and start oauth-demo-server on port 3003 + try { + const oauthServerBinaryName = isWindows ? 'oauth-demo-server.exe' : 'oauth-demo-server' + const oauthServerDir = join(REPO_ROOT, 'examples', 'mcps', 'oauth-demo-server') + const oauthServerBinary = join(oauthServerDir, oauthServerBinaryName) + const oauthServerExec = isWindows ? oauthServerBinaryName : './oauth-demo-server' + + if (!existsSync(oauthServerBinary)) { + console.log('Building oauth-demo-server...') + runCommand(goCommand, ['build', '-o', oauthServerBinaryName, 'main.go'], { + cwd: oauthServerDir, + env: { ...process.env, CGO_ENABLED: '0' }, + }) + } else { + console.log('✓ oauth-demo-server binary already exists') + } + + console.log('Starting oauth-demo-server on port 3003...') + const oauthServer = spawn(oauthServerExec, [], { + cwd: oauthServerDir, + detached: true, + stdio: ['ignore', 'pipe', 'pipe'], + }) + oauthServer.stdout?.on('data', (data) => console.log(`[OAuth Server] ${data.toString().trim()}`)) + oauthServer.stderr?.on('data', (data) => console.error(`[OAuth Server Error] ${data.toString().trim()}`)) + if (oauthServer.pid) { + oauthServer.unref() + MCP_SERVERS.push(oauthServer) + await setTimeout(1000) + console.log('✓ oauth-demo-server started on http://localhost:3003/') + } + } catch (err) { + console.warn(`⚠️ Failed to start oauth-demo-server (OAuth tests may fail): ${(err as Error).message}`) + } + console.log('✓ MCP servers ready') console.log(' - HTTP/SSE server: http://localhost:3001/') + console.log(' - Auth demo server: http://localhost:3002/') + console.log(' - OAuth demo server: http://localhost:3003/') console.log(' - STDIO server: test-tools-server/dist/index.js') } @@ -423,4 +495,4 @@ async function globalSetup(): Promise<() => Promise> { } } -export default globalSetup +export default globalSetup \ No newline at end of file diff --git a/tests/e2e/playwright.config.ts b/tests/e2e/playwright.config.ts index 020a237d90..9a84ab8178 100644 --- a/tests/e2e/playwright.config.ts +++ b/tests/e2e/playwright.config.ts @@ -1,12 +1,48 @@ -import { defineConfig, devices } from '@playwright/test' +import { defineConfig, devices, type PlaywrightTestConfig } from '@playwright/test' +import { existsSync } from 'fs' +import { resolve } from 'path' + +const enterpriseFeaturesDir = resolve(__dirname, '../../../bifrost-enterprise/e2e/features') +const includeEnterprise = process.env.BIFROST_E2E_INCLUDE_ENTERPRISE === '1' && existsSync(enterpriseFeaturesDir) + +const projects: NonNullable = [ + { + name: 'chromium', + testDir: './features', + use: { ...devices['Desktop Chrome'] }, + testIgnore: ['**/config/**', '**/plugins/**', '**/virtual-keys/**', '**/mcp-registry/**', '**/model-limits/**', '**/providers/**'], + }, + { + name: 'chromium-serial', + testDir: './features', + use: { ...devices['Desktop Chrome'] }, + testMatch: ['**/plugins/**/*.spec.ts', '**/virtual-keys/**/*.spec.ts', '**/mcp-registry/**/*.spec.ts', '**/model-limits/**/*.spec.ts', '**/providers/**/*.spec.ts'], + fullyParallel: false, + }, + { + name: 'chromium-config', + testDir: './features', + use: { ...devices['Desktop Chrome'] }, + testMatch: ['**/config/**/*.spec.ts'], + dependencies: ['chromium', 'chromium-serial'], + }, +] + +if (includeEnterprise) { + projects.push({ + name: 'chromium-enterprise', + testDir: enterpriseFeaturesDir, + use: { ...devices['Desktop Chrome'] }, + testMatch: ['**/*.spec.ts'], + }) +} /** * Playwright configuration for Bifrost E2E tests * @see https://playwright.dev/docs/test-configuration */ export default defineConfig({ - // Look for test files in the features directory - testDir: './features', + testDir: '.', // Run tests in files in parallel fullyParallel: true, @@ -53,6 +89,9 @@ export default defineConfig({ // Timeout for navigation navigationTimeout: 30000, + + // Grant clipboard permissions so copy-to-clipboard tests work on localhost + permissions: ['clipboard-read', 'clipboard-write'], }, // Global timeout for each test @@ -63,35 +102,8 @@ export default defineConfig({ timeout: 10000, }, - // Configure projects: run all tests first, then config last (via dependency order) - projects: [ - { - name: 'chromium', - use: { ...devices['Desktop Chrome'] }, - testIgnore: ['**/config/**', '**/plugins/**', '**/virtual-keys/**', '**/mcp-registry/**', '**/model-limits/**', '**/providers/**'], - }, - { - name: 'chromium-serial', - use: { ...devices['Desktop Chrome'] }, - testMatch: ['**/plugins/**/*.spec.ts', '**/virtual-keys/**/*.spec.ts', '**/mcp-registry/**/*.spec.ts', '**/model-limits/**/*.spec.ts', '**/providers/**/*.spec.ts'], - fullyParallel: false, - }, - { - name: 'chromium-config', - use: { ...devices['Desktop Chrome'] }, - testMatch: ['**/config/**/*.spec.ts'], - dependencies: ['chromium', 'chromium-serial'], - }, - // Uncomment for additional browser testing - // { - // name: 'firefox', - // use: { ...devices['Desktop Firefox'] }, - // }, - // { - // name: 'webkit', - // use: { ...devices['Desktop Safari'] }, - // }, - ], + // Configure projects: run all tests first, then config last (via dependency order). + projects, // Run local dev server before starting tests // Set SKIP_WEB_SERVER=1 to skip auto-starting the dev server diff --git a/tests/integrations/python/pyproject.toml b/tests/integrations/python/pyproject.toml index 8d49b81278..4b5c0f9662 100644 --- a/tests/integrations/python/pyproject.toml +++ b/tests/integrations/python/pyproject.toml @@ -123,4 +123,4 @@ exclude_lines = [ [tool.uv] -exclude-newer = "2026-04-08" \ No newline at end of file +exclude-newer = "2026-05-08T00:00:00Z" diff --git a/tests/integrations/python/tests/test_bedrock.py b/tests/integrations/python/tests/test_bedrock.py index 8640390e33..fbe0a1f055 100644 --- a/tests/integrations/python/tests/test_bedrock.py +++ b/tests/integrations/python/tests/test_bedrock.py @@ -46,6 +46,12 @@ 27. Count tokens from long text - Cross-provider 28. Count tokens from multi-turn conversation - Cross-provider +Nova System Tools Tests (TestNovaSystemTools): +50. nova_grounding non-streaming (converse) +51. nova_grounding streaming (converse-stream) +52. nova_code_interpreter non-streaming (converse) +53. nova_code_interpreter streaming (converse-stream) + Invoke Endpoint — Image Generation Tests (TestBedrockInvokeEndpoint): 29. Titan image generation via invoke (taskType=TEXT_IMAGE) 30. Titan embeddings via invoke (inputText) @@ -2893,3 +2899,411 @@ def test_49_invoke_stream_anthropic_messages(self, bedrock_client): full_text = "".join(text_parts) assert full_text, f"Expected non-empty streamed text, got: {full_text!r}" print(f" ✓ event_types={event_types}, text={full_text[:60]!r}") + + +# --------------------------------------------------------------------------- +# Nova System Tools Tests (nova_grounding and nova_code_interpreter) +# --------------------------------------------------------------------------- +# These tests exercise Bedrock Nova system tools through the Bifrost converse +# and converse-stream paths. Nova system tools are AWS-managed: the model +# invokes them automatically (no client-side tool execution required). +# +# nova_grounding → maps to web_search in Bifrost neutral schema +# nova_code_interpreter → maps to code_interpreter in Bifrost neutral schema +# --------------------------------------------------------------------------- + + +class TestNovaSystemTools: + """ + Tests for Amazon Nova system tools via Bedrock Converse and Converse-Stream. + + Both tools are server-managed by AWS — the model calls them and AWS executes + them automatically in the same response. No client-side tool loop is needed. + + 50. nova_grounding non-streaming + 51. nova_grounding streaming + 52. nova_code_interpreter non-streaming + 53. nova_code_interpreter streaming + """ + + NOVA_MODEL = "us.amazon.nova-2-lite-v1:0" + + # ------------------------------------------------------------------ # + # 50. nova_grounding — non-streaming # + # ------------------------------------------------------------------ # + @skip_if_no_api_key("bedrock") + def test_50_nova_grounding_non_streaming(self, bedrock_client): + """Test Case 50: nova_grounding system tool via Bedrock Converse (non-streaming). + + Sends a converse request with systemTool nova_grounding enabled. The model + automatically searches the web and returns a grounded text response. Bifrost + maps nova_grounding → web_search in the neutral schema and converts back. + """ + print("\n=== Test 50: nova_grounding via converse (non-streaming) ===") + + tool_config = { + "tools": [ + {"systemTool": {"name": "nova_grounding"}} + ] + } + + try: + response = bedrock_client.converse( + modelId=self.NOVA_MODEL, + messages=[ + { + "role": "user", + "content": [ + { + "text": ( + "Use web search to find a brief description of the Eiffel Tower " + "and tell me when it was built." + ) + } + ], + } + ], + toolConfig=tool_config, + inferenceConfig={"maxTokens": 500}, + ) + except Exception as e: + err_str = str(e).lower() + if "validation" in err_str or "unknown" in err_str or "not supported" in err_str: + pytest.skip(f"nova_grounding not available or schema rejected: {e}") + raise + + assert "output" in response, f"Expected 'output' in response, got: {list(response.keys())}" + msg = response["output"].get("message", {}) + assert msg.get("role") == "assistant", f"Expected role='assistant', got: {msg.get('role')}" + + content_blocks = msg.get("content", []) + assert isinstance(content_blocks, list) and len(content_blocks) > 0, ( + f"Expected non-empty content blocks, got: {content_blocks}" + ) + + # nova_grounding returns multiple content blocks: empty text, toolUse, + # toolResult, then the actual grounded text (possibly split across blocks). + # Collect all non-empty text across every block. + full_text = " ".join(b["text"] for b in content_blocks if b.get("text", "").strip()) + assert full_text, ( + f"Expected non-empty text in grounding response, got: {content_blocks}" + ) + + # nova_grounding should produce text about the Eiffel Tower + assert any(kw in full_text.lower() for kw in ["eiffel", "paris", "tower", "france", "1889"]), ( + f"Expected Eiffel Tower info in response, got: {full_text[:200]}" + ) + + stop_reason = response.get("stopReason", "") + print(stop_reason) + assert stop_reason in ("end_turn", "max_tokens"), ( + f"Unexpected stopReason: {stop_reason}" + ) + print(f" ✓ stopReason={stop_reason!r}, text={full_text[:80]!r}") + + # ------------------------------------------------------------------ # + # 51. nova_grounding — streaming # + # ------------------------------------------------------------------ # + @skip_if_no_api_key("bedrock") + def test_51_nova_grounding_streaming(self, bedrock_client): + """Test Case 51: nova_grounding system tool via Bedrock Converse-Stream. + + Per AWS docs, nova_grounding streaming produces citation deltas inline within + the text stream (no separate contentBlockStart for the tool block): + messageStart + contentBlockStart (text block) + contentBlockDelta { delta: { citation: { location: { web: { url, domain } } } } } (0-N) + contentBlockDelta { delta: { text: "..." } } (1-N) + contentBlockStop + messageStop + + Bifrost must reproduce these citation deltas as contentBlockDelta.citation events. + The query asks for real-time information to ensure the model uses grounding. + """ + print("\n=== Test 51: nova_grounding via converse-stream (streaming) ===") + + tool_config = { + "tools": [ + {"systemTool": {"name": "nova_grounding"}} + ] + } + + try: + response_stream = bedrock_client.converse_stream( + modelId=self.NOVA_MODEL, + messages=[ + { + "role": "user", + "content": [ + { + # Use a real-time query so the model actually invokes grounding + "text": ( + "Search the web and tell me today's date and one current headline. " + "You must use web search." + ) + } + ], + } + ], + toolConfig=tool_config, + inferenceConfig={"maxTokens": 500}, + ) + except AttributeError: + pytest.skip("converse_stream not available in this boto3 version") + except Exception as e: + err_str = str(e).lower() + if "validation" in err_str or "unknown" in err_str or "not supported" in err_str: + pytest.skip(f"nova_grounding streaming not available: {e}") + raise + + stream = response_stream.get("stream") + if stream is None: + stream = response_stream.get("eventStream") + assert stream is not None, "Response missing 'stream' or 'eventStream'" + + citation_urls = [] # contentBlockDelta.citation events + text_parts = [] # contentBlockDelta.text events + got_message_stop = False + start_time = time.time() + timeout = 60 + + for event in stream: + print(event) + if time.time() - start_time > timeout: + pytest.fail(f"Streaming timed out after {timeout}s") + + if "contentBlockDelta" in event: + delta = event["contentBlockDelta"].get("delta", {}) + if "text" in delta and delta["text"]: + text_parts.append(delta["text"]) + elif "citation" in delta: + # Citation delta produced by nova_grounding: { citation: { location: { web: { url, domain } } } } + web = delta["citation"].get("location", {}).get("web", {}) + if web.get("url"): + citation_urls.append(web["url"]) + + elif "messageStop" in event: + got_message_stop = True + + assert got_message_stop, "Expected 'messageStop' event" + + full_text = "".join(text_parts) + assert full_text, "Expected non-empty streamed text from nova_grounding response" + + # Grounding must produce citation deltas alongside the text + assert len(citation_urls) > 0, ( + f"Expected at least one contentBlockDelta.citation event — " + f"nova_grounding must emit citation deltas that Bifrost preserves as " + f"contentBlockDelta.citation on the converse-stream route. " + f"text_parts={len(text_parts)}, text={full_text[:100]!r}" + ) + + print( + f" ✓ {len(citation_urls)} citation(s), {len(text_parts)} text delta(s), " + f"text={full_text[:80]!r}" + ) + + # ------------------------------------------------------------------ # + # 52. nova_code_interpreter — non-streaming # + # ------------------------------------------------------------------ # + @skip_if_no_api_key("bedrock") + def test_52_nova_code_interpreter_non_streaming(self, bedrock_client): + """Test Case 52: nova_code_interpreter system tool via Bedrock Converse (non-streaming). + + AWS Bedrock executes the generated code automatically and returns both the + toolUse (code) and toolResult (stdout/stderr) in the same assistant message. + Bifrost merges these into a code_interpreter_call output item and converts + back to Bedrock format, producing a text explanation of the result. + """ + print("\n=== Test 52: nova_code_interpreter via converse (non-streaming) ===") + + tool_config = { + "tools": [ + {"systemTool": {"name": "nova_code_interpreter"}} + ] + } + + try: + response = bedrock_client.converse( + modelId=self.NOVA_MODEL, + messages=[ + { + "role": "user", + "content": [ + { + "text": ( + "Write and execute Python code to calculate the factorial of 10 " + "and print the result." + ) + } + ], + } + ], + toolConfig=tool_config, + inferenceConfig={"maxTokens": 500}, + ) + except Exception as e: + err_str = str(e).lower() + if "validation" in err_str or "unknown" in err_str or "not supported" in err_str: + pytest.skip(f"nova_code_interpreter not available or schema rejected: {e}") + raise + + assert "output" in response, f"Expected 'output' in response, got: {list(response.keys())}" + msg = response["output"].get("message", {}) + assert msg.get("role") == "assistant", f"Expected role='assistant', got: {msg.get('role')}" + + content_blocks = msg.get("content", []) + assert isinstance(content_blocks, list) and len(content_blocks) > 0, ( + f"Expected non-empty content blocks, got: {content_blocks}" + ) + + # nova_code_interpreter returns: empty text, toolUse (code), toolResult + # (stdout), then the model's explanation — possibly split across blocks. + # Collect all non-empty text and verify the factorial result appears. + has_tool_use = any("toolUse" in b for b in content_blocks) + has_tool_result = any("toolResult" in b for b in content_blocks) + full_text = " ".join(b["text"] for b in content_blocks if b.get("text", "").strip()) + + assert has_tool_use, f"Expected toolUse block, got: {content_blocks}" + assert full_text or has_tool_result, ( + f"Expected execution result (toolResult) or explanatory text, got: {content_blocks}" + ) + + # The combined text should mention the factorial result or the computation + if full_text: + assert any(kw in full_text.lower() for kw in ["3628800", "factorial", "result", "10", "code"]), ( + f"Expected factorial-related text, got: {full_text[:200]}" + ) + + stop_reason = response.get("stopReason", "") + assert stop_reason in ("end_turn", "max_tokens"), ( + f"Unexpected stopReason: {stop_reason}" + ) + print(f" ✓ stopReason={stop_reason!r}, content_blocks={len(content_blocks)}") + if full_text: + print(f" ✓ text={full_text[:80]!r}") + + # ------------------------------------------------------------------ # + # 53. nova_code_interpreter — streaming # + # ------------------------------------------------------------------ # + @skip_if_no_api_key("bedrock") + def test_53_nova_code_interpreter_streaming(self, bedrock_client): + """Test Case 53: nova_code_interpreter system tool via Bedrock Converse-Stream. + + Bedrock streaming for nova_code_interpreter produces (per AWS docs): + messageStart + contentBlockStart { start: { toolUse: { name: "nova_code_interpreter", ... } } } + contentBlockDelta { delta: { toolUse: { input: '{"snippet":"..."}' } } } (1-N) + contentBlockStop + contentBlockStart (text block) + contentBlockDelta { delta: { text: "..." } } (1-N) + contentBlockStop + messageStop + + Each toolUse delta is a complete JSON object {"snippet":""}. + Bifrost must reproduce this exact shape on the converse-stream route. + """ + print("\n=== Test 53: nova_code_interpreter via converse-stream (streaming) ===") + + tool_config = { + "tools": [ + {"systemTool": {"name": "nova_code_interpreter"}} + ] + } + + try: + response_stream = bedrock_client.converse_stream( + modelId=self.NOVA_MODEL, + messages=[ + { + "role": "user", + "content": [ + { + "text": ( + "Write and run Python code to compute 2 raised to the power of 20." + ) + } + ], + } + ], + toolConfig=tool_config, + inferenceConfig={"maxTokens": 500}, + ) + except AttributeError: + pytest.skip("converse_stream not available in this boto3 version") + except Exception as e: + err_str = str(e).lower() + if "validation" in err_str or "unknown" in err_str or "not supported" in err_str: + pytest.skip(f"nova_code_interpreter streaming not available: {e}") + raise + + stream = response_stream.get("stream") + if stream is None: + stream = response_stream.get("eventStream") + assert stream is not None, "Response missing 'stream' or 'eventStream'" + + has_code_interpreter_block_start = False # contentBlockStart with nova_code_interpreter + code_snippets = [] # parsed snippet values from toolUse deltas + text_parts = [] + got_message_stop = False + current_block_start = None + start_time = time.time() + timeout = 60 + + for event in stream: + if time.time() - start_time > timeout: + pytest.fail(f"Streaming timed out after {timeout}s") + + if "messageStart" in event: + pass + + elif "contentBlockStart" in event: + current_block_start = event["contentBlockStart"].get("start", {}) + tool_use = current_block_start.get("toolUse", {}) + if tool_use.get("name") == "nova_code_interpreter": + has_code_interpreter_block_start = True + + elif "contentBlockStop" in event: + current_block_start = None + + elif "contentBlockDelta" in event: + delta = event["contentBlockDelta"].get("delta", {}) + if "text" in delta and delta["text"]: + text_parts.append(delta["text"]) + elif "toolUse" in delta: + raw_input = delta["toolUse"].get("input", "") + if raw_input: + # Each delta is a complete JSON object: {"snippet": ""} + try: + parsed = json.loads(raw_input) + snippet = parsed.get("snippet", "") + if snippet: + code_snippets.append(snippet) + except json.JSONDecodeError: + pass # Unexpected — deltas should be complete JSON + + elif "messageStop" in event: + got_message_stop = True + + assert got_message_stop, "Expected 'messageStop' event" + + # Bedrock contract: a contentBlockStart for nova_code_interpreter MUST appear + assert has_code_interpreter_block_start, ( + "Expected contentBlockStart with toolUse.name='nova_code_interpreter' — " + "Bifrost must emit this event for the nova_code_interpreter block" + ) + + # nova_code_interpreter must stream at least one toolUse delta with a snippet + assert len(code_snippets) > 0, ( + "Expected at least one contentBlockDelta.toolUse.input with a 'snippet' field " + "from nova_code_interpreter — Bifrost must emit these code deltas" + ) + + full_code = "".join(code_snippets) + assert full_code.strip(), "Expected non-empty code snippet from nova_code_interpreter" + + full_text = "".join(text_parts) + print( + f" ✓ code_interpreter block started, {len(code_snippets)} code delta(s), " + f"code={full_code[:60]!r}, text={full_text[:60]!r}" + ) diff --git a/tests/integrations/python/uv.lock b/tests/integrations/python/uv.lock index dbb9a8254d..58803d2464 100644 --- a/tests/integrations/python/uv.lock +++ b/tests/integrations/python/uv.lock @@ -217,14 +217,14 @@ wheels = [ [[package]] name = "authlib" -version = "1.6.11" +version = "1.6.12" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "cryptography" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/28/10/b325d58ffe86815b399334a101e63bc6fa4e1953921cb23703b48a0a0220/authlib-1.6.11.tar.gz", hash = "sha256:64db35b9b01aeccb4715a6c9a6613a06f2bd7be2ab9d2eb89edd1dfc7580a38f", size = 165359 } +sdist = { url = "https://files.pythonhosted.org/packages/d3/30/6691fdc63b35f54a5a65e04fa1e59d827f4d4e8f4a39678ba7d3088ce0c8/authlib-1.6.12.tar.gz", hash = "sha256:0656d8482f28fc8221929d5f35b2bde5d13e10555ebc06b4561b0d622e83b1bd", size = 165368 } wheels = [ - { url = "https://files.pythonhosted.org/packages/57/2f/55fca558f925a51db046e5b929deb317ddb05afed74b22d89f4eca578980/authlib-1.6.11-py2.py3-none-any.whl", hash = "sha256:c8687a9a26451c51a34a06fa17bb97cb15bba46a6a626755e2d7f50da8bff3e3", size = 244469 }, + { url = "https://files.pythonhosted.org/packages/cd/51/9b0b5cd4cf683a02db937a6f9bbebcdc9c56558a7bb3763ce7d3512103c3/authlib-1.6.12-py2.py3-none-any.whl", hash = "sha256:e9229ad7fde610b139dd12f5edbe97eab9ee78bfb85691247e767727850b99ab", size = 244473 }, ] [[package]] @@ -2091,7 +2091,7 @@ wheels = [ [[package]] name = "langchain-classic" -version = "1.0.0" +version = "1.0.7" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "langchain-core" }, @@ -2102,9 +2102,9 @@ dependencies = [ { name = "requests" }, { name = "sqlalchemy" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/d9/b1/a66babeccb2c05ed89690a534296688c0349bee7a71641e91ecc2afd72fd/langchain_classic-1.0.0.tar.gz", hash = "sha256:a63655609254ebc36d660eb5ad7c06c778b2e6733c615ffdac3eac4fbe2b12c5", size = 10514930 } +sdist = { url = "https://files.pythonhosted.org/packages/9b/78/84b5065816f348c39fefa4316f209f0135e8410216340a953bec17d9e4e4/langchain_classic-1.0.7.tar.gz", hash = "sha256:debbec8065e69b95108d2652e8d5c44f4516e19aa8d716c02ed2211c3aee099d", size = 10554118 } wheels = [ - { url = "https://files.pythonhosted.org/packages/74/74/246f809a3741c21982f985ca0113ec92d3c84896308561cc4414823f6951/langchain_classic-1.0.0-py3-none-any.whl", hash = "sha256:97f71f150c10123f5511c08873f030e35ede52311d729a7688c721b4e1e01f33", size = 1040701 }, + { url = "https://files.pythonhosted.org/packages/f5/78/2d9980d028ff0523eea503a77c200e2ff252a3a75eb6e7842bcf5f9c979b/langchain_classic-1.0.7-py3-none-any.whl", hash = "sha256:d9d9be38f7aa534ed0259c2410432e34a1f80b1d491e686749bb55af56479be3", size = 1041386 }, ] [[package]] @@ -2307,7 +2307,7 @@ wheels = [ [[package]] name = "langsmith" -version = "0.7.32" +version = "0.8.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "httpx" }, @@ -2320,9 +2320,9 @@ dependencies = [ { name = "xxhash" }, { name = "zstandard" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/2f/b4/a0b4a501bee6b8a741ce29f8c48155b132118483cddc6f9247735ddb38fa/langsmith-0.7.32.tar.gz", hash = "sha256:b59b8e106d0e4c4842e158229296086e2aa7c561e3f602acda73d3ad0062e915", size = 1184518 } +sdist = { url = "https://files.pythonhosted.org/packages/a8/64/95f1f013531395f4e8ed73caeee780f65c7c58fe028cb543f8937b45611b/langsmith-0.8.0.tar.gz", hash = "sha256:59fe5b2a56bbbe14a08aa76691f84b49e8675dd21e11b57d80c6db8c08bac2e3", size = 4432996 } wheels = [ - { url = "https://files.pythonhosted.org/packages/62/bc/148f98ac7dad73ac5e1b1c985290079cfeeb9ba13d760a24f25002beb2c9/langsmith-0.7.32-py3-none-any.whl", hash = "sha256:e1fde928990c4c52f47dc5132708cec674355d9101723d564183e965f383bf5f", size = 378272 }, + { url = "https://files.pythonhosted.org/packages/f3/e1/a4be2e696c9473bb53298df398237da5674704d781d4b748ed35aeef592a/langsmith-0.8.0-py3-none-any.whl", hash = "sha256:12cc4bc5622b835a6d841964d6034df3617bdb912dae0c1381fd0a68a9b3a3ef", size = 393268 }, ] [[package]] diff --git a/tests/integrations/typescript/package-lock.json b/tests/integrations/typescript/package-lock.json index daa9896cbd..c254ff97d6 100644 --- a/tests/integrations/typescript/package-lock.json +++ b/tests/integrations/typescript/package-lock.json @@ -16,7 +16,7 @@ "@langchain/core": "^1.1.39", "@langchain/google-genai": "^2.1.26", "@langchain/openai": "^1.4.4", - "langsmith": "^0.5.19", + "langsmith": "0.6.0", "openai": "^6.15.0", "yaml": "^2.6.0", "zod": "^3.24.0" @@ -4225,13 +4225,12 @@ } }, "node_modules/langsmith": { - "version": "0.5.19", - "resolved": "https://registry.npmjs.org/langsmith/-/langsmith-0.5.19.tgz", - "integrity": "sha512-5tFoETuFMvGkbPGsINNlIE4Ab86CsPhdPOQZCGwNt/NX0h5NDKQLKOWS/G2XcRUBOQl4mCNbrayUvUTWaIRsCg==", + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/langsmith/-/langsmith-0.6.0.tgz", + "integrity": "sha512-GGaj5IMRfLv2HXXFzGk9diISMYLTpSTh6fzCZGKxWYW/NqEztIFtnXLq6G/RVhzFRmCykLap1fuC67LVKoQLcg==", "license": "MIT", "dependencies": { - "p-queue": "6.6.2", - "uuid": "10.0.0" + "p-queue": "6.6.2" }, "peerDependencies": { "@opentelemetry/api": "*", @@ -4258,19 +4257,6 @@ } } }, - "node_modules/langsmith/node_modules/uuid": { - "version": "10.0.0", - "resolved": "https://registry.npmjs.org/uuid/-/uuid-10.0.0.tgz", - "integrity": "sha512-8XkAphELsDnEGrDxUOHB3RGvXz6TeuYSGEZBOjtTtPm2lwhGBjLgOzLHB63IUWfBpNucQjND6d3AOudO+H3RWQ==", - "funding": [ - "https://github.com/sponsors/broofa", - "https://github.com/sponsors/ctavan" - ], - "license": "MIT", - "bin": { - "uuid": "dist/bin/uuid" - } - }, "node_modules/levn": { "version": "0.4.1", "resolved": "https://registry.npmjs.org/levn/-/levn-0.4.1.tgz", diff --git a/tests/integrations/typescript/package.json b/tests/integrations/typescript/package.json index f88fce6a8f..7b6705e0e4 100644 --- a/tests/integrations/typescript/package.json +++ b/tests/integrations/typescript/package.json @@ -31,7 +31,7 @@ "@langchain/core": "1.1.39", "@langchain/google-genai": "2.1.26", "@langchain/openai": "1.4.4", - "langsmith": "0.5.19", + "langsmith": "0.6.0", "openai": "6.15.0", "yaml": "2.6.0", "zod": "3.24.0" diff --git a/tests/semanticcache/.gitignore b/tests/semanticcache/.gitignore new file mode 100644 index 0000000000..6c7f6431d4 --- /dev/null +++ b/tests/semanticcache/.gitignore @@ -0,0 +1,2 @@ +reports/ +*.log diff --git a/tests/semanticcache/README.md b/tests/semanticcache/README.md new file mode 100644 index 0000000000..3a4d087984 --- /dev/null +++ b/tests/semanticcache/README.md @@ -0,0 +1,77 @@ +# Semantic Cache E2E + +End-to-end test suite for the `semantic_cache` plugin. See `PLAN.md` for the full case list. + +## Prerequisites + +The suite assumes a properly-provisioned test environment — it verifies but does not provision. + +- **Bifrost running** at `BIFROST_URL` (default `http://localhost:8080`). Required endpoints: `/api/plugins/*`, `/api/cache/*`, `/api/providers`, `/api/logs/{id}`, `/v1/chat/completions`. +- **Vector store** configured in `config.json`, type **`weaviate`**, reachable from Bifrost. The plugin will create/use namespace `BifrostSemanticCachePluginE2E` by default (override via `SC_NAMESPACE`). +- **Providers configured with API keys**: + - **OpenAI** — required. Must have a chat model (default `openai/gpt-4o-mini`), an alternate chat model used in cache-by-model cases (default `openai/gpt-4o`), and the embedding model `text-embedding-3-small` (used in Phase 2). + - **Gemini** — optional. When absent, cross-provider cases are skipped with a `WARN` in `0.3_optional_providers`. Chat model: default `gemini/gemini-2.5-flash`. + - **Anthropic** — optional. Same behavior as Gemini: absence skips cross-provider cases instead of aborting. Chat model: default `anthropic/claude-haiku-4-5`. +- **`semantic_cache` plugin must be ABSENT** at run start. Set `RUN_FORCE=1` to auto-delete a pre-existing row before the run. + +## Running + +```bash +# All phases (recommended) +GOWORK=off go test -v ./... + +# Single phase +GOWORK=off go test -v -run TestPhase1_DirectOnly ./... + +# Single case +GOWORK=off go test -v -run TestPhase1_DirectOnly/1.1_exact_match_chat ./... + +# Auto-delete pre-existing plugin row +RUN_FORCE=1 GOWORK=off go test -v ./... + +# Keep the plugin around for post-mortem +RUN_KEEP_PLUGIN=1 GOWORK=off go test -v ./... +``` + +`GOWORK=off` is required because this module isn't in the repo's `go.work` (test modules under `tests/*` follow the same pattern as `tests/governance` — standalone). + +## Env vars + +| var | default | purpose | +| --- | --- | --- | +| `BIFROST_URL` | `http://localhost:8080` | Bifrost base URL | +| `SC_CHAT_MODEL_OPENAI` | `openai/gpt-4o-mini` | OpenAI chat model used in cases | +| `SC_CHAT_MODEL_OPENAI_ALT` | `openai/gpt-4o` | second OpenAI chat model for cache-by-model cases | +| `SC_EMBED_MODEL_OPENAI` | `text-embedding-3-small` | embedding model for Phase 2 | +| `SC_CHAT_MODEL_GEMINI` | `gemini/gemini-2.5-flash` | Gemini chat model | +| `SC_CHAT_MODEL_ANTHROPIC` | `anthropic/claude-haiku-4-5` | Anthropic chat model | +| `SC_NAMESPACE` | `BifrostSemanticCachePluginE2E` | vector store namespace (isolates test data from prod) | +| `RUN_FORCE` | unset | `1` → delete pre-existing plugin row before run | +| `RUN_KEEP_PLUGIN` | unset | `1` → skip teardown DELETE on exit | +| `TRAIL_SESSION_ID` | unset | stamped onto every log line when running under `trail` | + +## Trail integration + +Start Bifrost under `trail`, capture the session id, export it, then run: + +```bash +trail run --label semantic-cache-e2e -- ./bifrost-http -port 8080 -config config.json +# capture the printed session id, then in another shell: +export TRAIL_SESSION_ID= +RUN_FORCE=1 GOWORK=off go test -v ./... +``` + +Every log line carries `trail_sid=`, so a single `trail get_logs` call with that session id reconstructs both the test harness output and the Bifrost stdout for the run. + +## Output + +Each run writes to `reports//`: +- `run.log` — one structured line per step (mirrors `t.Logf` output) +- `p--s.req.json` / `.resp.json` — full request/response bodies for forensics +- `*.plugin_create.req.json` / `.plugin_update.req.json` — exact wire bodies sent to `/api/plugins` (for parity audit against the UI) + +On any FAIL the matching `*.resp.json` and `run.log` line carry enough info to grep via `trail` (look for `bifrost_req_id=` or `[SC-E2E] case=`). + +## What's implemented so far + +Skeleton + Phase 0 preconditions + Phase 1 smallest viable loop (cases 1.1, 1.2, 1.3, 1.13). See `PLAN.md` §11 for the full implementation roadmap. diff --git a/tests/semanticcache/assert_test.go b/tests/semanticcache/assert_test.go new file mode 100644 index 0000000000..4c47e29e8d --- /dev/null +++ b/tests/semanticcache/assert_test.go @@ -0,0 +1,144 @@ +package semanticcache + +import ( + "os" + "strconv" + "testing" + "time" +) + +// cacheWriteSettle is the gap we wait between a miss-and-store request and a +// subsequent expected-hit, since PostLLMHook writes to the vector store in a +// goroutine (main.go:553-569) — the HTTP response returns before the write +// commits. 500ms covers typical Weaviate write latency including first-write +// cold start. Override via SC_WRITE_SETTLE_MS for environments with slower stores. +var cacheWriteSettle = func() time.Duration { + if v := os.Getenv("SC_WRITE_SETTLE_MS"); v != "" { + if ms, err := strconv.Atoi(v); err == nil && ms > 0 { + return time.Duration(ms) * time.Millisecond + } + } + return 500 * time.Millisecond +}() + +// waitForCacheWrite pauses long enough for the plugin's async PostLLMHook +// store write to commit before a follow-up read. Logged so timing is visible +// in run.log. +func waitForCacheWrite(t *testing.T, lc logCtx, step int) { + t.Helper() + logf(t, lc.at(step), "INFO", "wait_for_cache_write", map[string]any{ + "settle_ms": cacheWriteSettle.Milliseconds(), + }) + time.Sleep(cacheWriteSettle) +} + +// cacheDebugged is implemented by any HTTP response type that carries +// `extra_fields.cache_debug`. Lets the assertion helpers work across chat, +// text-completion, responses, embedding, image-gen, etc. without per-type +// duplication. +type cacheDebugged interface { + cacheDebug() *cacheDebug +} + +// assertMiss verifies the response is a cache miss with a non-empty cache_id stamped. +// cache_debug must be present (plugin ran), CacheHit must be false, cache_id must be set. +func assertMiss(t *testing.T, lc logCtx, step int, resp cacheDebugged) string { + t.Helper() + cd := resp.cacheDebug() + if cd == nil { + logf(t, lc.at(step), "FAIL", "assert_miss", map[string]any{"reason": "cache_debug absent"}) + t.Fatalf("expected miss with cache_debug stamped; cache_debug is nil") + } + if cd.CacheHit { + logf(t, lc.at(step), "FAIL", "assert_miss", map[string]any{"cache_hit": true}) + t.Fatalf("expected miss, got cache_hit=true cache_id=%s", deref(cd.CacheID)) + } + if cd.CacheID == nil || *cd.CacheID == "" { + logf(t, lc.at(step), "FAIL", "assert_miss", map[string]any{"reason": "cache_id empty on miss"}) + t.Fatalf("expected cache_id stamped on miss; got nil/empty") + } + logf(t, lc.at(step), "PASS", "assert_miss", map[string]any{"cache_id": *cd.CacheID}) + return *cd.CacheID +} + +// assertHit verifies the response is a cache hit with the expected hit_type. +// Returns the cache_id for further chaining (e.g. same_cache_id checks). +func assertHit(t *testing.T, lc logCtx, step int, resp cacheDebugged, wantType string) string { + t.Helper() + cd := resp.cacheDebug() + if cd == nil { + logf(t, lc.at(step), "FAIL", "assert_hit", map[string]any{"reason": "cache_debug absent"}) + t.Fatalf("expected hit with cache_debug stamped; cache_debug is nil") + } + if !cd.CacheHit { + logf(t, lc.at(step), "FAIL", "assert_hit", map[string]any{"cache_hit": false}) + t.Fatalf("expected hit, got cache_hit=false cache_id=%s", deref(cd.CacheID)) + } + if wantType != "" { + if cd.HitType == nil || *cd.HitType != wantType { + logf(t, lc.at(step), "FAIL", "assert_hit_type", map[string]any{ + "want": wantType, "got": deref(cd.HitType), + }) + t.Fatalf("expected hit_type=%q, got %q", wantType, deref(cd.HitType)) + } + } + if cd.CacheID == nil || *cd.CacheID == "" { + t.Fatalf("expected cache_id stamped on hit; got nil/empty") + } + if cd.CacheHitLatency == nil { + t.Logf("warning: cache_hit_latency not stamped on hit") + } + logf(t, lc.at(step), "PASS", "assert_hit", map[string]any{ + "cache_id": *cd.CacheID, + "hit_type": deref(cd.HitType), + "latency": derefInt64(cd.CacheHitLatency), + }) + return *cd.CacheID +} + +// assertNoCacheDebug verifies the plugin did NOT run (no cache_debug stamped). +// Used for plugin-disabled and skipped-caching cases. +func assertNoCacheDebug(t *testing.T, lc logCtx, step int, resp cacheDebugged) { + t.Helper() + cd := resp.cacheDebug() + if cd != nil { + logf(t, lc.at(step), "FAIL", "assert_no_cache_debug", map[string]any{ + "cache_hit": cd.CacheHit, + "cache_id": deref(cd.CacheID), + }) + t.Fatalf("expected no cache_debug, got cache_hit=%v cache_id=%s", cd.CacheHit, deref(cd.CacheID)) + } + logf(t, lc.at(step), "PASS", "assert_no_cache_debug", nil) +} + +func assertSameCacheID(t *testing.T, lc logCtx, step int, got, want string) { + t.Helper() + if got != want { + logf(t, lc.at(step), "FAIL", "assert_same_cache_id", map[string]any{"want": want, "got": got}) + t.Fatalf("expected same cache_id %q, got %q", want, got) + } + logf(t, lc.at(step), "PASS", "assert_same_cache_id", map[string]any{"cache_id": got}) +} + +func assertDifferentCacheID(t *testing.T, lc logCtx, step int, a, b string) { + t.Helper() + if a == b { + logf(t, lc.at(step), "FAIL", "assert_diff_cache_id", map[string]any{"cache_id": a}) + t.Fatalf("expected different cache_ids, both = %q", a) + } + logf(t, lc.at(step), "PASS", "assert_diff_cache_id", map[string]any{"a": a, "b": b}) +} + +func deref(p *string) string { + if p == nil { + return "" + } + return *p +} + +func derefInt64(p *int64) int64 { + if p == nil { + return 0 + } + return *p +} diff --git a/tests/semanticcache/cache_test.go b/tests/semanticcache/cache_test.go new file mode 100644 index 0000000000..222c74090a --- /dev/null +++ b/tests/semanticcache/cache_test.go @@ -0,0 +1,42 @@ +package semanticcache + +import ( + "net/http" + "net/url" + "testing" +) + +// clearByCacheID hits DELETE /api/cache/clear/{cacheId}. Returns the HTTP +// status code so callers in §3.3-style cases can assert specific contracts. +func clearByCacheID(t *testing.T, lc logCtx, step int, cacheID string) int { + t.Helper() + status, body, _, err := doJSON(t, "DELETE", "/api/cache/clear/"+url.PathEscape(cacheID), nil, nil) + if err != nil { + t.Fatalf("clearByCacheID http error: %v", err) + } + logf(t, lc.at(step), "INFO", "clear_by_id", map[string]any{ + "cache_id": cacheID, + "status": status, + }) + if status != http.StatusOK && status != http.StatusNotFound { + t.Logf("clearByCacheID body: %s", truncate(string(body), 200)) + } + return status +} + +// clearByCacheKey hits DELETE /api/cache/clear-by-key/{cacheKey}. +func clearByCacheKey(t *testing.T, lc logCtx, step int, key string) int { + t.Helper() + status, body, _, err := doJSON(t, "DELETE", "/api/cache/clear-by-key/"+url.PathEscape(key), nil, nil) + if err != nil { + t.Fatalf("clearByCacheKey http error: %v", err) + } + logf(t, lc.at(step), "INFO", "clear_by_key", map[string]any{ + "cache_key": key, + "status": status, + }) + if status != http.StatusOK && status != http.StatusNotFound { + t.Logf("clearByCacheKey body: %s", truncate(string(body), 200)) + } + return status +} diff --git a/tests/semanticcache/direct_test.go b/tests/semanticcache/direct_test.go new file mode 100644 index 0000000000..a7a59591a2 --- /dev/null +++ b/tests/semanticcache/direct_test.go @@ -0,0 +1,1429 @@ +package semanticcache + +import ( + "encoding/json" + "net/http" + "os" + "strings" + "testing" + "time" +) + +// Test image URLs copied from core/internal/llmtests/utils.go so the e2e +// suite uses the same fixtures the rest of the test-suite has validated +// providers against. +const ( + testImageURL1 = "https://pestworldcdn-dcf2a8gbggazaghf.z01.azurefd.net/media/561791/carpenter-ant4.jpg" + testImageURL2 = "https://images.pexels.com/photos/30662605/pexels-photo-30662605/free-photo-of-eiffel-tower-view-from-the-seine-river-in-paris.jpeg" +) + +// TestDirect runs the direct-only caching cases from the plan (1.1–1.55). +// +// Parallelism rules — IMPORTANT for anyone adding new cases: +// +// - Cases that ONLY exercise per-request behavior (different cache keys, +// headers, params, attachments) call `t.Parallel()` at the top of the +// subtest body. Cache keys are unique per case so they never collide. +// +// - Cases that mutate the plugin's CONFIG via `pluginUpdate` (e.g. flipping +// cache_by_model, exclude_system_prompt, default_cache_key) must NOT +// call `t.Parallel()`. They run synchronously inside the parent loop, +// one at a time. Each such case restores the baseline config via +// `t.Cleanup` before returning. +// +// Go's test framework guarantees the order: every `t.Parallel()` subtest +// PAUSES until the parent function reaches its end, then all paused +// subtests unblock and run concurrently. So all 4 mutating cases (1.4, +// 1.6, 1.8, 1.10) execute serially first; the remaining parallel cases +// then fire off together against the restored baseline plugin. +// +// Adding a new mutating case → omit `t.Parallel()` + add a `// Serial:` +// comment so the next person sees the intent. +func TestDirect(t *testing.T) { + lc := newLogCtx("direct", "setup") + logf(t, lc.at(0), "SETUP", "phase_start", map[string]any{ + "mode": "direct-only", + "ttl": ttlDirect, + }) + + // Setup: create the plugin in direct-only mode with a default cache key + // scoped to phase1, so case 1.3 can test the default-key path. Cases that + // mutate config PUT during the case and restore baseline via t.Cleanup. + created := pluginCreate(t, lc, 1, true, directOnlyConfig(ttlDirect, defaultKeyDirect)) + if created.Status.Status != "active" && created.Status.Status != "ready" && created.Status.Status != "Ready" && created.Status.Status != "Initialized" { + t.Logf("note: plugin status=%q (continuing — status field naming may vary)", created.Status.Status) + } + + // Cleanup at end of phase — clear every key used. Plugin stays loaded so + // later phases can PUT-update it. + allKeys := []string{ + defaultKeyDirect, + "phase1-k1-a", "phase1-k1-b", "phase1-k2", "phase1-ttl", + "phase1-k5", "phase1-k6", "phase1-k7", "phase1-k8", + "phase1-k9", "phase1-k10", "phase1-k11", "phase1-k12", + "phase1-k14", "phase1-k15", "phase1-k16", "phase1-k17", + "phase1-k18", "phase1-k19", "phase1-k41", + "phase1-k45", "phase1-k46", "phase1-k54", + "phase1-k32", "phase1-k33", "phase1-k34", "phase1-k35", "phase1-k36", + "phase1-k48", "phase1-k49", "phase1-k50", "phase1-k51", "phase1-k52", + "phase1-k26", "phase1-k27", "phase1-k28", "phase1-k29", + "phase1-k30", "phase1-k31", "phase1-k42", "phase1-k43", + "phase1-k20", "phase1-k21", "phase1-k22", "phase1-k23", + "phase1-k53", "phase1-k53-seed", + "phase1-k24", "phase1-k25", "phase1-k47", + "phase1-k38", "phase1-k39", "phase1-k37", + "phase1-k55", + } + t.Cleanup(func() { + for _, k := range allKeys { + _ = clearByCacheKey(t, lc.at(99), 99, k) + } + }) + + // 1.1 exact_match_chat + t.Run("1.1_exact_match_chat", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.1_exact_match_chat") + req := simpleChat(cfg.OpenAIModel, "What is the capital of France?") + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: "phase1-k1-a"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: "phase1-k1-a"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.2 key_isolation + t.Run("1.2_key_isolation", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.2_key_isolation") + req := simpleChat(cfg.OpenAIModel, "Recommend a science fiction book to read.") + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: "phase1-k1-b"}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, req, cacheHeaders{Key: "phase1-k2"}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.3 default_cache_key — no header, default key on plugin applies. + t.Run("1.3_default_cache_key", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.3_default_cache_key") + req := simpleChat(cfg.OpenAIModel, "Give me one fun fact about octopuses.") + + respA := postChat(t, lc, 1, req, cacheHeaders{}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.4 no_key_no_default — when DefaultCacheKey="" and no x-bf-cache-key, + // the plugin's PreLLMHook bails before any cache work (`resolveCacheKey` returns false). + // PostLLMHook also bails because state was never created. So no cache_debug is stamped. + t.Run("1.4_no_key_no_default", func(t *testing.T) { + // Serial: this case mutates plugin config (default_cache_key=""). + lc := newLogCtx("direct", "1.4_no_key_no_default") + + // Flip default_cache_key off. + pluginUpdate(t, lc, 1, true, directOnlyConfig(ttlDirect, "")) + t.Cleanup(func() { restoreDirectBaseline(t, lc, 99) }) + + req := simpleChat(cfg.OpenAIModel, "Tell me a one-line joke about teapots.") + respA := postChat(t, lc, 2, req, cacheHeaders{}) + assertNoCacheDebug(t, lc, 3, respA) + + respB := postChat(t, lc, 4, req, cacheHeaders{}) + assertNoCacheDebug(t, lc, 5, respB) + }) + + // 1.5 cache_by_model_default_true — model in cache key by default, so two + // requests with same body but different models → distinct cache_ids. + t.Run("1.5_cache_by_model_default_true", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.5_cache_by_model_default_true") + key := "phase1-k5" + body := "What is the speed of light in vacuum?" + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, simpleChat(cfg.OpenAIModelAlt, body), cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.6 cache_by_model_false — flip the flag, same body across two models + // should produce the same cache_id; B hits the entry stored by A. + t.Run("1.6_cache_by_model_false", func(t *testing.T) { + // Serial: this case mutates plugin config (cache_by_model=false). + lc := newLogCtx("direct", "1.6_cache_by_model_false") + + cfgBlob := directOnlyConfig(ttlDirect, defaultKeyDirect) + cfgBlob["cache_by_model"] = false + pluginUpdate(t, lc, 1, true, cfgBlob) + t.Cleanup(func() { restoreDirectBaseline(t, lc, 99) }) + + key := "phase1-k6" + body := "Recommend one short walk-friendly podcast." + + respA := postChat(t, lc, 2, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 3, respA) + waitForCacheWrite(t, lc, 4) + + respB := postChat(t, lc, 5, simpleChat(cfg.OpenAIModelAlt, body), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 6, respB, "direct") + assertSameCacheID(t, lc, 7, idB, idA) + }) + + // 1.7 cache_by_provider_default_true — provider in cache key by default. + t.Run("1.7_cache_by_provider_default_true", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.7_cache_by_provider_default_true") + if os.Getenv("SC_CHAT_MODEL_ANTHROPIC") == "" { + t.Skip("anthropic model not configured (SC_CHAT_MODEL_ANTHROPIC unset)") + } + key := "phase1-k7" + body := "Give one tip for staying focused while reading." + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, simpleChat(cfg.AnthroModel, body), cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.8 cache_by_provider_false — with both cache_by_provider and + // cache_by_model off, providers can share cache entries. + t.Run("1.8_cache_by_provider_false", func(t *testing.T) { + // Serial: this case mutates plugin config (cache_by_* = false). + lc := newLogCtx("direct", "1.8_cache_by_provider_false") + if os.Getenv("SC_CHAT_MODEL_ANTHROPIC") == "" { + t.Skip("anthropic model not configured (SC_CHAT_MODEL_ANTHROPIC unset)") + } + + cfgBlob := directOnlyConfig(ttlDirect, defaultKeyDirect) + cfgBlob["cache_by_provider"] = false + cfgBlob["cache_by_model"] = false + pluginUpdate(t, lc, 1, true, cfgBlob) + t.Cleanup(func() { restoreDirectBaseline(t, lc, 99) }) + + key := "phase1-k8" + body := "Say hi in three words." + + respA := postChat(t, lc, 2, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 3, respA) + waitForCacheWrite(t, lc, 4) + + respB := postChat(t, lc, 5, simpleChat(cfg.AnthroModel, body), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 6, respB, "direct") + assertSameCacheID(t, lc, 7, idB, idA) + }) + + // 1.9 exclude_system_prompt_false — system message is part of the hash + // by default; different systems → different cache_ids. + t.Run("1.9_exclude_system_prompt_false", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.9_exclude_system_prompt_false") + key := "phase1-k9" + user := "What's 2+2?" + + respA := postChat(t, lc, 1, chatWithSystem(cfg.OpenAIModel, "You are a math tutor.", user), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, chatWithSystem(cfg.OpenAIModel, "You are a pirate.", user), cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.10 exclude_system_prompt_true — flag flips system message out of the + // hash; identical user message hits regardless of system. + t.Run("1.10_exclude_system_prompt_true", func(t *testing.T) { + // Serial: this case mutates plugin config (exclude_system_prompt=true). + lc := newLogCtx("direct", "1.10_exclude_system_prompt_true") + + cfgBlob := directOnlyConfig(ttlDirect, defaultKeyDirect) + cfgBlob["exclude_system_prompt"] = true + pluginUpdate(t, lc, 1, true, cfgBlob) + t.Cleanup(func() { restoreDirectBaseline(t, lc, 99) }) + + key := "phase1-k10" + user := "What's the powerhouse of the cell?" + + respA := postChat(t, lc, 2, chatWithSystem(cfg.OpenAIModel, "You are a biology teacher.", user), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 3, respA) + waitForCacheWrite(t, lc, 4) + + respB := postChat(t, lc, 5, chatWithSystem(cfg.OpenAIModel, "You are Sherlock Holmes.", user), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 6, respB, "direct") + assertSameCacheID(t, lc, 7, idB, idA) + }) + + // 1.11 conversation_threshold_skips — len(messages) > threshold (default 3) + // → plugin bails before any cache work. No cache_debug on either response. + t.Run("1.11_conversation_threshold_skips", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.11_conversation_threshold_skips") + key := "phase1-k11" + + msgs := []chatMessage{ + {Role: "user", Content: textContent("Hi.")}, + {Role: "assistant", Content: textContent("Hello! How can I help?")}, + {Role: "user", Content: textContent("What's the weather like in Paris?")}, + {Role: "user", Content: textContent("Actually, give me one travel tip for Paris.")}, + } + req := chatRequest{Model: cfg.OpenAIModel, Messages: msgs} + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + assertNoCacheDebug(t, lc, 2, respA) + + respB := postChat(t, lc, 3, req, cacheHeaders{Key: key}) + assertNoCacheDebug(t, lc, 4, respB) + }) + + // 1.12 conversation_threshold_boundary — len(messages) == threshold (3) + // is still cached (code uses `>`, not `>=`). Boundary case. + t.Run("1.12_conversation_threshold_boundary", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.12_conversation_threshold_boundary") + key := "phase1-k12" + + msgs := []chatMessage{ + {Role: "user", Content: textContent("Hi.")}, + {Role: "assistant", Content: textContent("Hello!")}, + {Role: "user", Content: textContent("Recommend one calming tea.")}, + } + req := chatRequest{Model: cfg.OpenAIModel, Messages: msgs} + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.13 ttl_expiry_default + t.Run("1.13_ttl_expiry_default", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.13_ttl_expiry_default") + req := simpleChat(cfg.OpenAIModel, "Name a primary color.") + key := "phase1-ttl" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + // Confirm a fresh read hits within TTL. + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "direct") + + // Sleep past TTL + 2s safety margin. + wait := ttlDirectDuration + 2*time.Second + logf(t, lc.at(6), "INFO", "sleep_for_ttl", map[string]any{"seconds": wait.Seconds()}) + time.Sleep(wait) + + respC := postChat(t, lc, 7, req, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 8, respC) + }) + + // 1.14 ttl_per_request_override — x-bf-cache-ttl=3s overrides plugin default (10s). + // B hit within 3s, C miss after sleeping past 3s + safety. + t.Run("1.14_ttl_per_request_override", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.14_ttl_per_request_override") + req := simpleChat(cfg.OpenAIModel, "Name a noble gas.") + key := "phase1-k14" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, TTL: "3s"}) + _ = assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, TTL: "3s"}) + _ = assertHit(t, lc, 5, respB, "direct") + + // Sleep past per-request TTL but well under plugin default (10s). + wait := 4 * time.Second + logf(t, lc.at(6), "INFO", "sleep_past_per_request_ttl", map[string]any{"seconds": wait.Seconds()}) + time.Sleep(wait) + + respC := postChat(t, lc, 7, req, cacheHeaders{Key: key, TTL: "3s"}) + _ = assertMiss(t, lc, 8, respC) + }) + + // 1.15 ttl_invalid_header_ignored — bogus x-bf-cache-ttl is silently ignored + // (lib/ctx.go:381). Plugin default TTL applies; B still hits within default. + t.Run("1.15_ttl_invalid_header_ignored", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.15_ttl_invalid_header_ignored") + req := simpleChat(cfg.OpenAIModel, "What is a haiku?") + key := "phase1-k15" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, TTL: "garbage"}) + _ = assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, TTL: "also-garbage"}) + _ = assertHit(t, lc, 5, respB, "direct") + }) + + // 1.16 no_store_header — both A and B send x-bf-cache-no-store=true; nothing + // is ever written, so both miss. cache_debug IS stamped (plugin runs, but + // PostLLMHook's shouldSkipCaching short-circuits the write). + t.Run("1.16_no_store_header", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.16_no_store_header") + req := simpleChat(cfg.OpenAIModel, "Define entropy in one sentence.") + key := "phase1-k16" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, NoStore: "true"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, NoStore: "true"}) + idB := assertMiss(t, lc, 5, respB) + // Same body + key → same deterministic cache_id even though no entry exists. + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.17 no_store_with_hit — A writes normally; B sends no-store=true but the + // header only blocks WRITES, not reads. B still hits the entry A stored. + t.Run("1.17_no_store_with_hit", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.17_no_store_with_hit") + req := simpleChat(cfg.OpenAIModel, "What's the boiling point of water in Celsius?") + key := "phase1-k17" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, NoStore: "true"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.18 cache_type_direct_header — explicit x-bf-cache-type=direct in direct-only + // mode is a no-op narrow (direct is already the only path). B still hits. + t.Run("1.18_cache_type_direct_header", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.18_cache_type_direct_header") + req := simpleChat(cfg.OpenAIModel, "Name the Roman god of war.") + key := "phase1-k18" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, Type: "direct"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, Type: "direct"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.19 cache_type_semantic_in_direct_only — STRICT assertion of PLAN §12 bug #2. + // In direct-only mode with x-bf-cache-type=semantic, the plugin has no + // embedding executor → no semantic search can run. Direct search is also + // suppressed by the header. The canDoSemanticSearch early-exit guard in + // PreLLMHook (plugins/semanticcache/main.go) returns before any cache + // activity, so no cache_debug is stamped and no orphan entry is written. + // If either appears, the guard has regressed. + t.Run("1.19_cache_type_semantic_in_direct_only", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.19_cache_type_semantic_in_direct_only") + req := simpleChat(cfg.OpenAIModel, "Tell me one famous quote about courage.") + key := "phase1-k19" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, Type: "semantic"}) + assertNoCacheDebug(t, lc, 2, respA) + + respB := postChat(t, lc, 3, req, cacheHeaders{Key: key, Type: "semantic"}) + assertNoCacheDebug(t, lc, 4, respB) + }) + + // 1.41 threshold_header_ignored_direct_only — x-bf-cache-threshold has no + // effect on direct lookups (it's only consulted in performSemanticSearch). + // B with threshold=0.0 still finds A's deterministic entry. + t.Run("1.41_threshold_header_ignored_direct_only", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.41_threshold_header_ignored_direct_only") + req := simpleChat(cfg.OpenAIModel, "Name a famous bridge.") + key := "phase1-k41" + zero := 0.0 + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, Threshold: &zero}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.45 no_store_explicit_false — header value MUST be the literal "true" to + // disable writes (ctx.go:406). Sending "false" does NOT block writes. + t.Run("1.45_no_store_explicit_false", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.45_no_store_explicit_false") + req := simpleChat(cfg.OpenAIModel, "What's a synonym for happy?") + key := "phase1-k45" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, NoStore: "false"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, NoStore: "false"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.46 no_store_uppercase_true — header match is case-sensitive. "TRUE" does + // not toggle the no-store flag; writes proceed normally. + t.Run("1.46_no_store_uppercase_true", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.46_no_store_uppercase_true") + req := simpleChat(cfg.OpenAIModel, "Name a famous painter.") + key := "phase1-k46" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, NoStore: "TRUE"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, NoStore: "TRUE"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.37 params_top_logprobs — top_logprobs is a non-trivial chat parameter + // that lands in the params metadata (utils.go:795). Distinct values must + // produce distinct cache_ids. Stands in for the "extra_params" case in + // the plan since extra_params is hard to wire on the OpenAI-compat + // endpoint — same isolation contract, less plumbing. + t.Run("1.37_params_top_logprobs", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.37_params_top_logprobs") + key := "phase1-k37" + body := "Name one mountain range." + yes := true + + reqA := simpleChat(cfg.OpenAIModel, body) + reqA.LogProbs = &yes + t1 := 2 + reqA.TopLogProbs = &t1 + + reqB := simpleChat(cfg.OpenAIModel, body) + reqB.LogProbs = &yes + t2 := 5 + reqB.TopLogProbs = &t2 + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.38 clear_by_cache_id — populate an entry, delete it by id, verify the + // same body now misses. + t.Run("1.38_clear_by_cache_id", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.38_clear_by_cache_id") + key := "phase1-k38" + req := simpleChat(cfg.OpenAIModel, "Name one type of tree.") + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + // Confirm the entry is queryable before we delete it. + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "direct") + + // Delete by id. + if got := clearByCacheID(t, lc, 6, idA); got != http.StatusOK { + t.Fatalf("expected 200 from clear-by-id, got %d", got) + } + + // Subsequent identical request must miss again. + respC := postChat(t, lc, 7, req, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 8, respC) + }) + + // 1.39 clear_by_key — populate two distinct bodies under the same cache + // key, then bulk-delete by key; both should miss afterwards. + t.Run("1.39_clear_by_key", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.39_clear_by_key") + key := "phase1-k39" + reqA := simpleChat(cfg.OpenAIModel, "Recommend one mystery novel.") + reqB := simpleChat(cfg.OpenAIModel, "Recommend one biography.") + + respA1 := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA1) + respB1 := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 4, respB1) + waitForCacheWrite(t, lc, 5) + + // Both should now hit before we clear. + _ = assertHit(t, lc, 7, postChat(t, lc, 6, reqA, cacheHeaders{Key: key}), "direct") + _ = assertHit(t, lc, 9, postChat(t, lc, 8, reqB, cacheHeaders{Key: key}), "direct") + + // Bulk-clear the whole key. + if got := clearByCacheKey(t, lc, 10, key); got != http.StatusOK { + t.Fatalf("expected 200 from clear-by-key, got %d", got) + } + + // Both should miss again. + _ = assertMiss(t, lc, 12, postChat(t, lc, 11, reqA, cacheHeaders{Key: key})) + _ = assertMiss(t, lc, 14, postChat(t, lc, 13, reqB, cacheHeaders{Key: key})) + }) + + // 1.40 clear_unknown_id — DELETE with a random uuid. Whether Bifrost returns + // 200 (idempotent delete) or 404 (strict not-found), the contract is: + // no 5xx and no crash. Documents the actual behavior in the log so PLAN + // can pin it down later. + t.Run("1.40_clear_unknown_id", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.40_clear_unknown_id") + unknownID := "00000000-0000-0000-0000-000000000000" + status := clearByCacheID(t, lc, 1, unknownID) + if status >= 500 { + t.Fatalf("clear unknown id returned %d (server error); expected idempotent 200 or 404", status) + } + // Accept either contract; surface which one in the log for PLAN docs. + logf(t, lc.at(2), "PASS", "clear_unknown_id_documented", map[string]any{ + "status": status, + "contract": "idempotent" + (map[bool]string{true: "_or_404"}[status == http.StatusNotFound]), + }) + }) + + // 1.24 streaming_chat — SSE chat, A→B identical. B replays cached chunks; + // final chunk on B has cache_hit=true with hit_type=direct, and chunk count + // matches A's chunk count. + t.Run("1.24_streaming_chat", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.24_streaming_chat") + key := "phase1-k24" + req := simpleChat(cfg.OpenAIModel, "Recite three colors of the rainbow, one per line.") + + respA := postChatStream(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + if len(respA.dataChunks()) < 2 { + t.Fatalf("expected ≥2 data chunks on miss stream, got %d", len(respA.dataChunks())) + } + waitForCacheWrite(t, lc, 3) + + respB := postChatStream(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + if got, want := len(respB.dataChunks()), len(respA.dataChunks()); got != want { + t.Fatalf("expected B chunk count %d to match A's %d", got, want) + } + }) + + // 1.25 streaming_replay_order — chunk-by-chunk content should be identical + // between A (live stream) and B (cached replay). Plugin stores chunks as a + // JSON array and replays them in order (search.go:351). + t.Run("1.25_streaming_replay_order", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.25_streaming_replay_order") + key := "phase1-k25" + req := simpleChat(cfg.OpenAIModel, "Count from one to five.") + + respA := postChatStream(t, lc, 1, req, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChatStream(t, lc, 4, req, cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "direct") + + a := respA.dataChunks() + b := respB.dataChunks() + if len(a) != len(b) { + t.Fatalf("chunk count mismatch: A=%d B=%d", len(a), len(b)) + } + for i := range a { + ta, tb := a[i].chunkText(), b[i].chunkText() + if ta != tb { + t.Fatalf("chunk %d text mismatch:\nA=%q\nB=%q", i, ta, tb) + } + } + logf(t, lc.at(6), "PASS", "chunks_identical_in_order", map[string]any{"count": len(a)}) + }) + + // 1.47 streaming_non_final_chunks_have_no_cache_debug — only the final + // data chunk carries the cache_debug stamp (stampCacheDebugForMiss / + // stampCacheDebugForHit skip non-final chunks). All earlier chunks must + // have cache_debug absent on both A (miss) and B (hit). + t.Run("1.47_streaming_non_final_chunks_no_cache_debug", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.47_streaming_non_final_chunks_no_cache_debug") + key := "phase1-k47" + req := simpleChat(cfg.OpenAIModel, "List two breakfast foods.") + + check := func(stage string, resp *streamResponse) { + data := resp.dataChunks() + if len(data) == 0 { + t.Fatalf("[%s] no data chunks received", stage) + } + for i := 0; i < len(data)-1; i++ { + if cd := data[i].cacheDebug(); cd != nil { + t.Fatalf("[%s] non-final chunk %d had cache_debug stamped: %+v", stage, i, cd) + } + } + finalCD := data[len(data)-1].cacheDebug() + if finalCD == nil { + t.Fatalf("[%s] final chunk missing cache_debug stamp", stage) + } + } + + respA := postChatStream(t, lc, 1, req, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + check("miss", respA) + waitForCacheWrite(t, lc, 3) + + respB := postChatStream(t, lc, 4, req, cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "direct") + check("hit", respB) + logf(t, lc.at(6), "PASS", "non_final_chunks_clean", map[string]any{ + "a_count": len(respA.dataChunks()), + "b_count": len(respB.dataChunks()), + }) + }) + + // 1.20 text_completion — /v1/completions with same prompt → hit. Plugin's + // metadata extractor handles TextCompletionRequest specifically. + t.Run("1.20_text_completion", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.20_text_completion") + key := "phase1-k20" + maxTok := 30 + req := textCompletionRequest{ + Model: "openai/gpt-3.5-turbo-instruct", + Prompt: "The capital of Japan is", + MaxTokens: &maxTok, + } + + respA := postTextCompletion(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postTextCompletion(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.21 responses_api — /v1/responses with identical input → hit. + t.Run("1.21_responses_api", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.21_responses_api") + key := "phase1-k21" + req := responsesRequest{ + Model: cfg.OpenAIModel, + Input: "Name one type of cloud.", + } + + respA := postResponses(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postResponses(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.22 embedding_endpoint — /v1/embeddings with identical input → hit. + // Plugin's EmbeddingRequest path is direct-cache-only (semantic search is + // suppressed for embedding requests — see PreLLMHook semanticEligible check). + t.Run("1.22_embedding_endpoint", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.22_embedding_endpoint") + key := "phase1-k22" + req := embeddingRequest{ + Model: "openai/" + cfg.OpenAIEmbed, + Input: "The quick brown fox jumps over the lazy dog.", + } + + respA := postEmbedding(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postEmbedding(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.23 image_generation — /v1/images/generations with identical prompt → hit. + // Note: this case is expensive ($0.04/image on dall-e-3). Skip by setting + // SC_SKIP_IMAGE_GEN=1. + t.Run("1.23_image_generation", func(t *testing.T) { + t.Parallel() + if os.Getenv("SC_SKIP_IMAGE_GEN") == "1" { + t.Skip("SC_SKIP_IMAGE_GEN=1") + } + lc := newLogCtx("direct", "1.23_image_generation") + key := "phase1-k23" + n := 1 + req := imageGenRequest{ + Model: "openai/dall-e-3", + Prompt: "A minimalist line drawing of a red teapot on a white background.", + N: &n, + Size: "1024x1024", + } + + respA := postImageGen(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postImageGen(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.53 responses_previous_response_id — different previous_response_id + // values must produce distinct cache_ids (it's in params_hash via utils.go:834). + // We use placeholder IDs since we only check params_hash isolation, not + // the actual conversation chain. + t.Run("1.53_responses_previous_response_id", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.53_responses_previous_response_id") + key := "phase1-k53" + + // Need a real previous_response_id for the provider to accept the call. + // Create one by first making a /v1/responses call and capturing its id. + seed := postResponses(t, lc, 1, responsesRequest{ + Model: cfg.OpenAIModel, + Input: "Say 'one'.", + }, cacheHeaders{Key: "phase1-k53-seed"}) + + var seedBody struct { + ID string `json:"id"` + } + if err := json.Unmarshal(seed.bodyRaw, &seedBody); err != nil || seedBody.ID == "" { + t.Skipf("could not extract response id to seed previous_response_id: %v", err) + } + + // Make a second seed call so we have two distinct previous_response_ids. + seed2 := postResponses(t, lc, 2, responsesRequest{ + Model: cfg.OpenAIModel, + Input: "Say 'two'.", + }, cacheHeaders{Key: "phase1-k53-seed"}) + var seed2Body struct { + ID string `json:"id"` + } + if err := json.Unmarshal(seed2.bodyRaw, &seed2Body); err != nil || seed2Body.ID == "" { + t.Skipf("could not extract second response id: %v", err) + } + + input := "Continue." + reqA := responsesRequest{Model: cfg.OpenAIModel, Input: input, PreviousResponseID: &seedBody.ID} + reqB := responsesRequest{Model: cfg.OpenAIModel, Input: input, PreviousResponseID: &seed2Body.ID} + + respA := postResponses(t, lc, 3, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 4, respA) + + respB := postResponses(t, lc, 5, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 6, respB) + assertDifferentCacheID(t, lc, 7, idA, idB) + }) + + // 1.26 normalization_case — getNormalizedInputForCaching lowercases + trims + // (utils.go:122). "Hello" and "hello " hash identically. + t.Run("1.26_normalization_case", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.26_normalization_case") + key := "phase1-k26" + + reqA := simpleChat(cfg.OpenAIModel, "Hello, who wrote 1984?") + reqB := simpleChat(cfg.OpenAIModel, "hello, who wrote 1984? ") + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, reqB, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.27 normalization_whitespace — leading/trailing whitespace trimmed; inner + // whitespace preserved verbatim. + t.Run("1.27_normalization_whitespace", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.27_normalization_whitespace") + key := "phase1-k27" + + reqA := simpleChat(cfg.OpenAIModel, " Name one type of pasta. ") + reqB := simpleChat(cfg.OpenAIModel, "Name one type of pasta.") + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, reqB, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.28 unicode_prompt — non-ASCII + emoji round-trips through hash + cache. + t.Run("1.28_unicode_prompt", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.28_unicode_prompt") + key := "phase1-k28" + body := "🚀 Quel est le sens de la vie? 寿司は美味しい。" + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.29 large_prompt — ~10KB prompt; the second call's wall-clock should be + // dominated by cache_hit_latency (~ms), not provider latency (~s). + t.Run("1.29_large_prompt", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.29_large_prompt") + key := "phase1-k29" + // Repeat a sentence to ~10KB. + body := strings.Repeat("In a region far away, beneath the silver moon, a curious traveler set out at dawn carrying a worn leather satchel and a heart full of questions. ", 70) + body += " Summarize the above in one sentence." + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + // cache_hit_latency is stamped at hit time — assert it's at least set. + // (Sanity check; provider latency would be much higher.) + if cd := respB.cacheDebug(); cd == nil || cd.CacheHitLatency == nil { + t.Fatalf("expected cache_hit_latency stamped on large_prompt hit") + } + }) + + // 1.30 image_in_message — identical image_url block in both A and B → hit. + // Verifies extractAttachmentsForCaching contributes consistently to the hash. + t.Run("1.30_image_in_message", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.30_image_in_message") + key := "phase1-k30" + + reqA := chatWithImage(cfg.OpenAIModel, "What is shown in this image?", testImageURL1) + reqB := chatWithImage(cfg.OpenAIModel, "What is shown in this image?", testImageURL1) + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, reqB, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.31 image_attachment_diff — same text, different image URL → distinct cache_ids. + t.Run("1.31_image_attachment_diff", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.31_image_attachment_diff") + key := "phase1-k31" + prompt := "What is shown in this image?" + + reqA := chatWithImage(cfg.OpenAIModel, prompt, testImageURL1) + reqB := chatWithImage(cfg.OpenAIModel, prompt, testImageURL2) + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.42 nil_content_msg — a 3-message conversation including an assistant + // tool-call message with nil content (followed by a tool response). + // extractChatMessageContent handles nil content as empty string (utils.go:312) + // so the hash is stable across runs. + t.Run("1.42_nil_content_msg", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.42_nil_content_msg") + key := "phase1-k42" + + mkReq := func() chatRequest { + return chatRequest{ + Model: cfg.OpenAIModel, + Messages: []chatMessage{ + {Role: "user", Content: textContent("What's the weather in NYC?")}, + { + Role: "assistant", + // Content intentionally omitted (nil) — assistant + // tool-call messages set content=null per OpenAI spec. + ToolCalls: []chatToolCall{{ + ID: "call_abc", + Type: "function", + Function: chatToolCallFunc{ + Name: "get_weather", + Arguments: `{"city":"NYC"}`, + }, + }}, + }, + {Role: "tool", ToolCallID: "call_abc", Content: textContent("Sunny, 72°F")}, + }, + } + } + + respA := postChat(t, lc, 1, mkReq(), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, mkReq(), cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.43 empty_messages — sending messages:[] should be rejected by the + // provider (or Bifrost validation) without crashing Bifrost. Accept any + // non-2xx response; the contract is "no crash, no orphan cache entry." + t.Run("1.43_empty_messages", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.43_empty_messages") + key := "phase1-k43" + + req := chatRequest{Model: cfg.OpenAIModel, Messages: []chatMessage{}} + hdr := http.Header{} + (cacheHeaders{Key: key}).apply(&http.Request{Header: hdr}) + + status, body, _, err := doJSON(t, "POST", "/v1/chat/completions", req, hdr) + if err != nil { + t.Fatalf("empty_messages http error: %v", err) + } + logf(t, lc.at(1), "INFO", "response", map[string]any{ + "status": status, + "body_len": len(body), + }) + if status >= 200 && status < 300 { + t.Fatalf("expected non-success status for empty messages, got %d body=%s", + status, truncate(string(body), 200)) + } + // Subsequent identical request should also fail — and crucially + // shouldn't return a stale cache hit. + status2, body2, _, _ := doJSON(t, "POST", "/v1/chat/completions", req, hdr) + if status2 >= 200 && status2 < 300 { + t.Fatalf("expected non-success status on retry, got %d body=%s", + status2, truncate(string(body2), 200)) + } + logf(t, lc.at(2), "PASS", "no_crash_on_empty_messages", map[string]any{ + "status_a": status, "status_b": status2, + }) + }) + + // 1.44 plugin_get_status — GET /api/plugins/semantic_cache after the phase + // is warm. status should be "active" and config should round-trip what we PUT. + t.Run("1.44_plugin_get_status", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.44_plugin_get_status") + p, exists := pluginGet(t, lc, 1) + if !exists { + t.Fatalf("plugin %q should exist mid-phase", pluginName) + } + if !p.Enabled { + t.Fatalf("expected plugin enabled=true, got %v", p.Enabled) + } + validStatuses := map[string]bool{"active": true, "ready": true, "Ready": true, "Initialized": true} + if got := p.Status.Status; !validStatuses[got] { + t.Fatalf("expected plugin status to be one of active/ready/Ready/Initialized, got %q", got) + } + // Config blob round-trip checks — backend may coerce numeric types + // when re-serializing from the DB. + gotDim, _ := p.Config["dimension"].(float64) + if int(gotDim) != 1 { + t.Fatalf("expected dimension=1 (direct-only), got %v", p.Config["dimension"]) + } + if got, _ := p.Config["default_cache_key"].(string); got != defaultKeyDirect { + t.Fatalf("expected default_cache_key=%q, got %q", defaultKeyDirect, got) + } + logf(t, lc.at(2), "PASS", "plugin_status_validated", map[string]any{ + "status": p.Status.Status, + "enabled": p.Enabled, + "dimension": p.Config["dimension"], + "default_cache_key": p.Config["default_cache_key"], + }) + }) + + // 1.32 params_temperature_isolation — temperature is part of params hash, + // so the same body with different temperatures produces distinct cache_ids. + t.Run("1.32_params_temperature_isolation", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.32_params_temperature_isolation") + key := "phase1-k32" + body := "Pick one number between 1 and 10." + + reqA := simpleChat(cfg.OpenAIModel, body) + t1 := 0.3 + reqA.Temperature = &t1 + + reqB := simpleChat(cfg.OpenAIModel, body) + t2 := 0.7 + reqB.Temperature = &t2 + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.33 params_top_p_isolation — top_p in params hash. + t.Run("1.33_params_top_p_isolation", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.33_params_top_p_isolation") + key := "phase1-k33" + body := "Name a Greek philosopher." + + reqA := simpleChat(cfg.OpenAIModel, body) + tp1 := 0.5 + reqA.TopP = &tp1 + + reqB := simpleChat(cfg.OpenAIModel, body) + tp2 := 0.9 + reqB.TopP = &tp2 + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.34 params_seed_same — same seed, same body → hit. + t.Run("1.34_params_seed_same", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.34_params_seed_same") + key := "phase1-k34" + body := "Recommend one Latin saying." + seed := 42 + + reqA := simpleChat(cfg.OpenAIModel, body) + reqA.Seed = &seed + reqB := simpleChat(cfg.OpenAIModel, body) + reqB.Seed = &seed + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, reqB, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.35 params_seed_diff — different seeds → miss. + t.Run("1.35_params_seed_diff", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.35_params_seed_diff") + key := "phase1-k35" + body := "Recommend one quote about patience." + + reqA := simpleChat(cfg.OpenAIModel, body) + s1 := 42 + reqA.Seed = &s1 + + reqB := simpleChat(cfg.OpenAIModel, body) + s2 := 99 + reqB.Seed = &s2 + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.36 params_max_tokens_isolation — max_tokens in params hash. + t.Run("1.36_params_max_tokens_isolation", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.36_params_max_tokens_isolation") + key := "phase1-k36" + body := "List two healthy snacks." + + reqA := simpleChat(cfg.OpenAIModel, body) + m1 := 60 + reqA.MaxTokens = &m1 + + reqB := simpleChat(cfg.OpenAIModel, body) + m2 := 120 + reqB.MaxTokens = &m2 + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.48 tools_order_independent — Tools is hashed as a sorted set (utils.go:801-813), + // so reordering identical tool definitions must NOT change the cache_id. + // This catches the MCP-randomized-map regression the docstring calls out. + t.Run("1.48_tools_order_independent", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.48_tools_order_independent") + key := "phase1-k48" + body := "Look up the current weather in Tokyo." + + toolA := chatTool{Type: "function", Function: &toolFunction{ + Name: "get_weather", Description: "Get current weather", + Parameters: map[string]any{"type": "object", "properties": map[string]any{"city": map[string]any{"type": "string"}}, "required": []string{"city"}}, + }} + toolB := chatTool{Type: "function", Function: &toolFunction{ + Name: "search_web", Description: "Search the web", + Parameters: map[string]any{"type": "object", "properties": map[string]any{"query": map[string]any{"type": "string"}}, "required": []string{"query"}}, + }} + + reqA := simpleChat(cfg.OpenAIModel, body) + reqA.Tools = []chatTool{toolA, toolB} + reqB := simpleChat(cfg.OpenAIModel, body) + reqB.Tools = []chatTool{toolB, toolA} // swapped order + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, reqB, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 1.49 tools_function_name_change — different tool names → distinct params hash → miss. + t.Run("1.49_tools_function_name_change", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.49_tools_function_name_change") + key := "phase1-k49" + body := "Search for top hiking trails near Seattle." + + mkTool := func(name string) chatTool { + return chatTool{Type: "function", Function: &toolFunction{ + Name: name, Description: "do a search", + Parameters: map[string]any{"type": "object", "properties": map[string]any{"q": map[string]any{"type": "string"}}, "required": []string{"q"}}, + }} + } + + reqA := simpleChat(cfg.OpenAIModel, body) + reqA.Tools = []chatTool{mkTool("search")} + + reqB := simpleChat(cfg.OpenAIModel, body) + reqB.Tools = []chatTool{mkTool("lookup")} + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.50 prompt_cache_key_in_metadata — params.PromptCacheKey is extracted + // into the metadata map (utils.go:781) so different values → different cache_ids. + t.Run("1.50_prompt_cache_key_in_metadata", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.50_prompt_cache_key_in_metadata") + key := "phase1-k50" + body := "Translate 'hello' to French." + + reqA := simpleChat(cfg.OpenAIModel, body) + pckA := "tenant-X" + reqA.PromptCacheKey = &pckA + + reqB := simpleChat(cfg.OpenAIModel, body) + pckB := "tenant-Y" + reqB.PromptCacheKey = &pckB + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.51 service_tier_in_metadata — service_tier is in params hash. + t.Run("1.51_service_tier_in_metadata", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.51_service_tier_in_metadata") + key := "phase1-k51" + body := "Define empathy in one sentence." + + // "auto" and "default" are both accepted by gpt-4o-mini ("flex" is gated + // on premium models). The point is to differ; the values matter only + // for params_hash isolation. + reqA := simpleChat(cfg.OpenAIModel, body) + stA := "default" + reqA.ServiceTier = &stA + + reqB := simpleChat(cfg.OpenAIModel, body) + stB := "auto" + reqB.ServiceTier = &stB + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.52 store_flag_in_metadata — params.Store toggle changes params hash. + t.Run("1.52_store_flag_in_metadata", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.52_store_flag_in_metadata") + key := "phase1-k52" + body := "Name one chess opening." + + reqA := simpleChat(cfg.OpenAIModel, body) + storeA := true + reqA.Store = &storeA + + reqB := simpleChat(cfg.OpenAIModel, body) + storeB := false + reqB.Store = &storeB + + respA := postChat(t, lc, 1, reqA, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + + respB := postChat(t, lc, 3, reqB, cacheHeaders{Key: key}) + idB := assertMiss(t, lc, 4, respB) + assertDifferentCacheID(t, lc, 5, idA, idB) + }) + + // 1.54 ttl_zero_per_request — x-bf-cache-ttl=0s (or negative) falls back to + // the plugin default TTL. Without this contract, "0s" would yield + // expires_at=now and silently break caching for the affected request; + // instead the plugin treats non-positive values as "use default", matching + // Init's behavior for Config.TTL=0. + t.Run("1.54_ttl_zero_per_request", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.54_ttl_zero_per_request") + req := simpleChat(cfg.OpenAIModel, "Name a constellation.") + key := "phase1-k54" + + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key, TTL: "0s"}) + idA := assertMiss(t, lc, 2, respA) + + waitForCacheWrite(t, lc, 3) + + // B with TTL=0s should hit — the override is rejected as non-positive + // and the plugin's default (10s) keeps A's entry alive. + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, TTL: "0s"}) + idB := assertHit(t, lc, 5, respB, "direct") + assertSameCacheID(t, lc, 6, idB, idA) + + // Negative TTL should follow the same fallback path. + respC := postChat(t, lc, 7, req, cacheHeaders{Key: key, TTL: "-30s"}) + _ = assertHit(t, lc, 8, respC, "direct") + }) + + // 1.55 cache_debug_in_logs_endpoint — cross-check that the persisted log + // row's cache_debug column matches the in-flight response stamp. Guards + // against drift between PostLLMHook stamping and durable storage (same + // data path the UI Logs view reads). + t.Run("1.55_cache_debug_in_logs_endpoint", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("direct", "1.55_cache_debug_in_logs_endpoint") + key := "phase1-k55" + req := simpleChat(cfg.OpenAIModel, "Name one famous lighthouse.") + + // Generate a hit so cache_debug carries the full set of hit-only fields. + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, req, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key}) + respCD := assertHitAndReturnCacheDebug(t, lc, 5, respB, "direct") + + entry := findLogByCacheDebug(t, lc, 6, respCD) + assertLogMatchesResponseCacheDebug(t, lc, 7, respCD, entry.CacheDebug) + }) + + logf(t, newLogCtx("direct", "teardown").at(99), "TEARDOWN", "phase_end", nil) +} + +// assertHitAndReturnCacheDebug is the same as assertHit but also returns the +// full cacheDebug struct (the regular helper returns just the cache_id string). +// Used by the /api/logs cross-check cases that need to compare all fields. +func assertHitAndReturnCacheDebug(t *testing.T, lc logCtx, step int, resp cacheDebugged, wantType string) *cacheDebug { + t.Helper() + _ = assertHit(t, lc, step, resp, wantType) + return resp.cacheDebug() +} + +// restoreDirectBaseline PUTs the canonical direct-only config so cases that +// mutate via pluginUpdate leave a clean slate for the next subtest. +func restoreDirectBaseline(t *testing.T, lc logCtx, step int) { + t.Helper() + pluginUpdate(t, lc, step, true, directOnlyConfig(ttlDirect, defaultKeyDirect)) +} + +// Defaults the phase 1 cases share. Kept narrow so a future case can tighten +// ttl (e.g. case 1.14) without colliding. +const ( + ttlDirect = "10s" + defaultKeyDirect = "phase1-default" +) + +var ttlDirectDuration = 10 * time.Second + +func simpleChat(model, content string) chatRequest { + return chatRequest{ + Model: model, + Messages: []chatMessage{ + {Role: "user", Content: textContent(content)}, + }, + } +} + +func chatWithSystem(model, system, user string) chatRequest { + return chatRequest{ + Model: model, + Messages: []chatMessage{ + {Role: "system", Content: textContent(system)}, + {Role: "user", Content: textContent(user)}, + }, + } +} + +// chatWithImage builds a user message with an image_url + text block. Used to +// exercise the attachments path of buildRequestMetadataForCaching. +func chatWithImage(model, text, imageURL string) chatRequest { + return chatRequest{ + Model: model, + Messages: []chatMessage{{ + Role: "user", + Content: blocksContent([]map[string]any{ + {"type": "text", "text": text}, + {"type": "image_url", "image_url": map[string]any{"url": imageURL}}, + }), + }}, + } +} diff --git a/tests/semanticcache/e2e_test.go b/tests/semanticcache/e2e_test.go new file mode 100644 index 0000000000..43309c96bf --- /dev/null +++ b/tests/semanticcache/e2e_test.go @@ -0,0 +1,141 @@ +package semanticcache + +import ( + "encoding/json" + "fmt" + "net/http" + "os" + "testing" +) + +// TestMain wires up the run: loads env-based config, sets up a per-run report +// directory, checks Bifrost reachability, ensures the plugin is absent (or +// deletes it under RUN_FORCE=1), then defers to the test functions. +// +// On exit, attempts a teardown DELETE so the env is clean for the next run +// — unless RUN_KEEP_PLUGIN=1. +func TestMain(m *testing.M) { + loadConfig() + if err := initLog(); err != nil { + fmt.Fprintf(os.Stderr, "init log failed: %v\n", err) + os.Exit(2) + } + exitCode := 1 + defer func() { + closeLog() + os.Exit(exitCode) + }() + + // Sanity: Bifrost reachable. + status, _, _, err := doRaw("GET", "/api/plugins") + if err != nil { + fmt.Fprintf(os.Stderr, "[SC-E2E] FATAL: cannot reach Bifrost at %s: %v\n", cfg.BifrostURL, err) + return + } + if status != http.StatusOK { + fmt.Fprintf(os.Stderr, "[SC-E2E] FATAL: GET /api/plugins returned %d (Bifrost up at %s?)\n", status, cfg.BifrostURL) + return + } + + // Plugin pre-check: must be absent unless RUN_FORCE=1. + status, body, _, err := doRaw("GET", "/api/plugins/"+pluginName) + if err != nil { + fmt.Fprintf(os.Stderr, "[SC-E2E] FATAL: pre-check GET /api/plugins/%s failed at %s: %v\n", + pluginName, cfg.BifrostURL, err) + return + } + if status == http.StatusOK { + if os.Getenv("RUN_FORCE") != "1" { + fmt.Fprintf(os.Stderr, + "[SC-E2E] FATAL: plugin %q already exists at %s. "+ + "Set RUN_FORCE=1 to delete it and proceed.\nGET body: %s\n", + pluginName, cfg.BifrostURL, truncate(string(body), 300)) + return + } + fmt.Fprintf(os.Stderr, "[SC-E2E] WARN: RUN_FORCE=1 → deleting pre-existing %q plugin\n", pluginName) + ds, dbody, _, derr := doRaw("DELETE", "/api/plugins/"+pluginName) + if derr != nil || (ds != http.StatusOK && ds != http.StatusNotFound) { + fmt.Fprintf(os.Stderr, "[SC-E2E] FATAL: cannot delete pre-existing plugin: status=%d err=%v body=%s\n", + ds, derr, truncate(string(dbody), 300)) + return + } + } + + fmt.Fprintf(os.Stderr, "[SC-E2E] run starting: bifrost=%s namespace=%s reports=%s trail_sid=%q\n", + cfg.BifrostURL, cfg.Namespace, runReportDir, trailSID) + + exitCode = m.Run() + + // Teardown — best-effort cleanup so the next run starts clean. + if os.Getenv("RUN_KEEP_PLUGIN") != "1" { + ds, _, _, _ := doRaw("DELETE", "/api/plugins/"+pluginName) + fmt.Fprintf(os.Stderr, "[SC-E2E] teardown: delete plugin → status=%d\n", ds) + } + fmt.Fprintf(os.Stderr, "[SC-E2E] run finished: exit=%d reports=%s\n", exitCode, runReportDir) +} + +// doRaw is a lightweight stdout-only HTTP helper for TestMain (no *testing.T available). +func doRaw(method, path string) (int, []byte, http.Header, error) { + req, err := http.NewRequest(method, cfg.BifrostURL+path, nil) + if err != nil { + return 0, nil, nil, err + } + resp, err := cfg.HTTPClient.Do(req) + if err != nil { + return 0, nil, nil, err + } + defer resp.Body.Close() + var b []byte + if resp.Body != nil { + b, _ = readAllSafe(resp.Body) + } + return resp.StatusCode, b, resp.Header, nil +} + +func readAllSafe(r interface{ Read([]byte) (int, error) }) ([]byte, error) { + buf := make([]byte, 0, 4096) + tmp := make([]byte, 4096) + for { + n, err := r.Read(tmp) + if n > 0 { + buf = append(buf, tmp[:n]...) + } + if err != nil { + if err.Error() == "EOF" { + return buf, nil + } + return buf, err + } + } +} + +// providersList fetches the configured providers; used by Phase 0 checks. +type providerSummary struct { + Name string `json:"name"` +} + +func providersList(t *testing.T, lc logCtx, step int) []providerSummary { + t.Helper() + status, body, _, err := doJSON(t, "GET", "/api/providers", nil, nil) + if err != nil { + t.Fatalf("providersList: %v", err) + } + if status != http.StatusOK { + t.Fatalf("providersList status=%d body=%s", status, truncate(string(body), 300)) + } + // /api/providers returns {providers: [...]} based on convention. + var wrap struct { + Providers []providerSummary `json:"providers"` + } + if err := json.Unmarshal(body, &wrap); err == nil && wrap.Providers != nil { + logf(t, lc.at(step), "INFO", "providers_list", map[string]any{"count": len(wrap.Providers)}) + return wrap.Providers + } + // Fallback: response may be a bare list. + var bare []providerSummary + if err := json.Unmarshal(body, &bare); err != nil { + t.Fatalf("providersList decode: %v\nbody=%s", err, truncate(string(body), 500)) + } + logf(t, lc.at(step), "INFO", "providers_list", map[string]any{"count": len(bare)}) + return bare +} diff --git a/tests/semanticcache/fixtures_test.go b/tests/semanticcache/fixtures_test.go new file mode 100644 index 0000000000..f590ccbb21 --- /dev/null +++ b/tests/semanticcache/fixtures_test.go @@ -0,0 +1,100 @@ +package semanticcache + +import "testing" + +// paraphrasePair holds two prompts that are SEMANTICALLY equivalent (near +// rephrasings, expected cosine ≥ ~0.92 with text-embedding-3-small) plus an +// UNRELATED prompt from a completely different domain (expected cosine +// ≤ ~0.4). The gap from the default 0.8 threshold is intentionally large +// on both sides so Phase 2 hit/miss assertions never sit on a flaky +// boundary. +// +// Pair design rules (when adding new ones): +// - Canonical vs Paraphrase: only swap 1-2 words/phrases (e.g. "What is" +// ↔ "Tell me"), keep ALL content nouns and proper nouns identical, keep +// overall sentence shape. This pushes cosine into 0.92-0.97. +// - Unrelated: pick a topic from a completely different domain (cooking +// vs astronomy, history vs electronics, etc.). Single-domain switches +// ("dogs" ↔ "cats") creep up to 0.6+ and would be flaky. +// - Sentences should be long enough (>= ~8 content words) that small +// wording changes don't dominate the embedding. +type paraphrasePair struct { + Name string + Canonical string + Paraphrase string + Unrelated string +} + +// paraphrasePairs is the chat/text-paraphrase corpus used by Phase 2 semantic +// cases. Each pair is hand-curated to land WELL above (canonical→paraphrase) +// or WELL below (canonical→unrelated) the default 0.8 threshold. +var paraphrasePairs = []paraphrasePair{ + { + Name: "capital_france", + Canonical: "What is the capital city of France in modern times?", + Paraphrase: "Tell me the capital city of France in modern times.", + Unrelated: "Explain how a transistor works at the silicon level.", + }, + { + Name: "boiling_water", + Canonical: "At what temperature does pure water boil at sea level?", + Paraphrase: "What is the boiling point of pure water at sea level?", + Unrelated: "Recommend a well-known jazz album recorded in the 1960s.", + }, + { + Name: "vinaigrette", + Canonical: "How do I make a basic vinaigrette salad dressing at home?", + Paraphrase: "What are the steps to make a basic vinaigrette salad dressing at home?", + Unrelated: "Describe quantum entanglement in a single paragraph for a beginner.", + }, + { + Name: "opera_composer", + Canonical: "Name a famous Italian opera composer from the nineteenth century.", + Paraphrase: "Tell me one famous Italian opera composer from the nineteenth century.", + Unrelated: "What is the average distance from Earth to the planet Mars?", + }, + { + Name: "photosynthesis", + Canonical: "Briefly explain how photosynthesis works in green plants.", + Paraphrase: "In a few sentences, describe how photosynthesis works in green plants.", + Unrelated: "How do you knit a basic scarf using stockinette stitch?", + }, +} + +// imagePromptPairs is the image-generation paraphrase corpus used by Phase 2 +// case 2.25 (image_gen_semantic_paraphrase). Image prompts tend to be shorter +// than chat prompts so we leave the content nouns identical and only vary +// modifiers slightly. +var imagePromptPairs = []paraphrasePair{ + { + Name: "red_apple", + Canonical: "A bright red apple sitting on a wooden kitchen table in daylight.", + Paraphrase: "A vivid red apple resting on a wooden kitchen table in daylight.", + Unrelated: "A futuristic silver spaceship orbiting Saturn against a starry void.", + }, +} + +// pairByName looks up a paraphrase pair by name. Fatal if not defined — the +// suite should fail loudly if a case references a pair that was removed. +func pairByName(t *testing.T, name string) paraphrasePair { + t.Helper() + for _, p := range paraphrasePairs { + if p.Name == name { + return p + } + } + t.Fatalf("paraphrase pair %q not defined in paraphrasePairs", name) + return paraphrasePair{} +} + +// imagePairByName looks up an image prompt pair by name. Fatal if not defined. +func imagePairByName(t *testing.T, name string) paraphrasePair { + t.Helper() + for _, p := range imagePromptPairs { + if p.Name == name { + return p + } + } + t.Fatalf("image prompt pair %q not defined in imagePromptPairs", name) + return paraphrasePair{} +} diff --git a/tests/semanticcache/go.mod b/tests/semanticcache/go.mod new file mode 100644 index 0000000000..5417c6abe8 --- /dev/null +++ b/tests/semanticcache/go.mod @@ -0,0 +1,3 @@ +module github.com/maximhq/bifrost/tests/semanticcache + +go 1.26.2 diff --git a/tests/semanticcache/http_test.go b/tests/semanticcache/http_test.go new file mode 100644 index 0000000000..c3497870d1 --- /dev/null +++ b/tests/semanticcache/http_test.go @@ -0,0 +1,674 @@ +package semanticcache + +import ( + "bufio" + "bytes" + "encoding/json" + "fmt" + "io" + "net/http" + "os" + "strings" + "testing" + "time" +) + +type runConfig struct { + BifrostURL string + OpenAIModel string + OpenAIModelAlt string // different model, same provider — for cache_by_model cases + OpenAIEmbed string + GeminiModel string + AnthroModel string + Namespace string + HTTPClient *http.Client +} + +var cfg runConfig + +func loadConfig() { + cfg.BifrostURL = strings.TrimRight(getenv("BIFROST_URL", "http://localhost:8080"), "/") + cfg.OpenAIModel = getenv("SC_CHAT_MODEL_OPENAI", "openai/gpt-4o-mini") + cfg.OpenAIModelAlt = getenv("SC_CHAT_MODEL_OPENAI_ALT", "openai/gpt-4o") + cfg.OpenAIEmbed = getenv("SC_EMBED_MODEL_OPENAI", "text-embedding-3-small") + cfg.GeminiModel = getenv("SC_CHAT_MODEL_GEMINI", "gemini/gemini-2.5-flash") + cfg.AnthroModel = getenv("SC_CHAT_MODEL_ANTHROPIC", "anthropic/claude-haiku-4-5") + cfg.Namespace = getenv("SC_NAMESPACE", "BifrostSemanticCachePluginE2E") + cfg.HTTPClient = &http.Client{Timeout: 120 * time.Second} +} + +func getenv(k, fallback string) string { + if v, ok := os.LookupEnv(k); ok && v != "" { + return v + } + return fallback +} + +// cacheDebug mirrors schemas.BifrostCacheDebug as it arrives over the wire. +type cacheDebug struct { + CacheHit bool `json:"cache_hit"` + CacheID *string `json:"cache_id,omitempty"` + HitType *string `json:"hit_type,omitempty"` + RequestedProvider *string `json:"requested_provider,omitempty"` + RequestedModel *string `json:"requested_model,omitempty"` + ProviderUsed *string `json:"provider_used,omitempty"` + ModelUsed *string `json:"model_used,omitempty"` + InputTokens *int `json:"input_tokens,omitempty"` + Threshold *float64 `json:"threshold,omitempty"` + Similarity *float64 `json:"similarity,omitempty"` + CacheHitLatency *int64 `json:"cache_hit_latency,omitempty"` +} + +// extraFields subset — only what we read in assertions. +type extraFields struct { + RequestType string `json:"request_type,omitempty"` + Provider string `json:"provider,omitempty"` + CacheDebug *cacheDebug `json:"cache_debug,omitempty"` +} + +type chatChoice struct { + Index int `json:"index"` + Message json.RawMessage `json:"message"` + FinishReason *string `json:"finish_reason,omitempty"` +} + +type chatResponse struct { + ID string `json:"id"` + Object string `json:"object,omitempty"` + Model string `json:"model,omitempty"` + Choices []chatChoice `json:"choices"` + ExtraFields *extraFields `json:"extra_fields,omitempty"` + // Captured at HTTP layer, not part of body. + bodyRaw []byte + respHeader http.Header + statusCode int +} + +func (c *chatResponse) cacheDebug() *cacheDebug { + if c.ExtraFields == nil { + return nil + } + return c.ExtraFields.CacheDebug +} + +// chatRequest is the minimum we need on the wire — OpenAI-compatible. Optional +// pointer fields keep "unset" distinguishable from "zero" for cache_key +// composition tests (e.g. seed=0 differs from seed unset). +type chatRequest struct { + Model string `json:"model"` + Messages []chatMessage `json:"messages"` + Temperature *float64 `json:"temperature,omitempty"` + TopP *float64 `json:"top_p,omitempty"` + MaxTokens *int `json:"max_tokens,omitempty"` + Seed *int `json:"seed,omitempty"` + Stream *bool `json:"stream,omitempty"` + Tools []chatTool `json:"tools,omitempty"` + PromptCacheKey *string `json:"prompt_cache_key,omitempty"` + ServiceTier *string `json:"service_tier,omitempty"` + Store *bool `json:"store,omitempty"` + LogProbs *bool `json:"logprobs,omitempty"` + TopLogProbs *int `json:"top_logprobs,omitempty"` +} + +// chatMessage uses RawContent so it can carry either a plain string or a +// content-block array (image_url, text, etc.). Helpers below build both shapes. +type chatMessage struct { + Role string `json:"role"` + Content json.RawMessage `json:"content,omitempty"` + ToolCalls []chatToolCall `json:"tool_calls,omitempty"` + ToolCallID string `json:"tool_call_id,omitempty"` +} + +type chatToolCall struct { + ID string `json:"id"` + Type string `json:"type"` + Function chatToolCallFunc `json:"function"` +} + +type chatToolCallFunc struct { + Name string `json:"name"` + Arguments string `json:"arguments"` +} + +// textContent returns a JSON-encoded plain-string content payload. +func textContent(s string) json.RawMessage { + b, _ := json.Marshal(s) + return json.RawMessage(b) +} + +// blocksContent returns a JSON-encoded content-block array (used for image_url +// inputs and other multi-modal messages). +func blocksContent(blocks []map[string]any) json.RawMessage { + b, _ := json.Marshal(blocks) + return json.RawMessage(b) +} + +type chatTool struct { + Type string `json:"type"` // "function" + Function *toolFunction `json:"function,omitempty"` // required when type=function +} + +type toolFunction struct { + Name string `json:"name"` + Description string `json:"description,omitempty"` + Parameters map[string]any `json:"parameters,omitempty"` +} + +type cacheHeaders struct { + Key string // x-bf-cache-key + TTL string // x-bf-cache-ttl + Threshold *float64 + Type string // x-bf-cache-type + NoStore string // x-bf-cache-no-store +} + +func (h cacheHeaders) apply(req *http.Request) { + if h.Key != "" { + req.Header.Set("x-bf-cache-key", h.Key) + } + if h.TTL != "" { + req.Header.Set("x-bf-cache-ttl", h.TTL) + } + if h.Threshold != nil { + req.Header.Set("x-bf-cache-threshold", fmt.Sprintf("%v", *h.Threshold)) + } + if h.Type != "" { + req.Header.Set("x-bf-cache-type", h.Type) + } + if h.NoStore != "" { + req.Header.Set("x-bf-cache-no-store", h.NoStore) + } +} + +// doJSON sends a JSON request and returns status, body, headers. +func doJSON(t *testing.T, method, path string, body any, extra http.Header) (int, []byte, http.Header, error) { + t.Helper() + var rdr io.Reader + if body != nil { + buf, err := json.Marshal(body) + if err != nil { + return 0, nil, nil, fmt.Errorf("marshal: %w", err) + } + rdr = bytes.NewReader(buf) + } + url := cfg.BifrostURL + path + req, err := http.NewRequest(method, url, rdr) + if err != nil { + return 0, nil, nil, fmt.Errorf("new request: %w", err) + } + if body != nil { + req.Header.Set("Content-Type", "application/json") + } + for k, vv := range extra { + for _, v := range vv { + req.Header.Add(k, v) + } + } + resp, err := cfg.HTTPClient.Do(req) + if err != nil { + return 0, nil, nil, fmt.Errorf("http do: %w", err) + } + defer resp.Body.Close() + respBytes, err := io.ReadAll(resp.Body) + if err != nil { + return resp.StatusCode, nil, resp.Header, fmt.Errorf("read body: %w", err) + } + return resp.StatusCode, respBytes, resp.Header, nil +} + +// postChat sends a chat completion and parses the response. +func postChat(t *testing.T, lc logCtx, step int, req chatRequest, ch cacheHeaders) *chatResponse { + t.Helper() + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", + "path": "/v1/chat/completions", + "model": req.Model, + "cache_key": ch.Key, + "ttl": ch.TTL, + "type": ch.Type, + "no_store": ch.NoStore, + }) + + // Dump request body for forensics. + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + + hdr := http.Header{} + ch.apply(&http.Request{Header: hdr}) + + status, body, respHdr, err := doJSON(t, "POST", "/v1/chat/completions", req, hdr) + if err != nil { + t.Fatalf("postChat http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.json", lc.phase, lc.name, step), body) + + out := &chatResponse{bodyRaw: body, respHeader: respHdr, statusCode: status} + if status != http.StatusOK { + logf(t, lc.at(step), "ERROR", "response", map[string]any{ + "status": status, + "body_len": len(body), + }) + t.Fatalf("chat completion failed: status=%d body=%s", status, truncate(string(body), 500)) + } + if err := json.Unmarshal(body, out); err != nil { + t.Fatalf("decode chat response: %v\nbody=%s", err, truncate(string(body), 500)) + } + cd := out.cacheDebug() + fields := map[string]any{"status": status} + if cd != nil { + fields["cache_hit"] = cd.CacheHit + if cd.CacheID != nil { + fields["cache_id"] = *cd.CacheID + } + if cd.HitType != nil { + fields["hit_type"] = *cd.HitType + } + if cd.CacheHitLatency != nil { + fields["cache_hit_latency"] = *cd.CacheHitLatency + } + } else { + fields["cache_debug"] = "" + } + logf(t, lc.at(step), "INFO", "response", fields) + return out +} + +func truncate(s string, n int) string { + if len(s) <= n { + return s + } + return s[:n] + "…" +} + +// ----------------------------------------------------------------------------- +// Text completion (/v1/completions) +// ----------------------------------------------------------------------------- + +type textCompletionRequest struct { + Model string `json:"model"` + Prompt string `json:"prompt"` + MaxTokens *int `json:"max_tokens,omitempty"` + Temperature *float64 `json:"temperature,omitempty"` +} + +type textCompletionResponse struct { + ExtraFields *extraFields `json:"extra_fields,omitempty"` + bodyRaw []byte + statusCode int +} + +func (r *textCompletionResponse) cacheDebug() *cacheDebug { + if r.ExtraFields == nil { + return nil + } + return r.ExtraFields.CacheDebug +} + +func postTextCompletion(t *testing.T, lc logCtx, step int, req textCompletionRequest, ch cacheHeaders) *textCompletionResponse { + t.Helper() + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", "path": "/v1/completions", "model": req.Model, "cache_key": ch.Key, + }) + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + hdr := http.Header{} + ch.apply(&http.Request{Header: hdr}) + status, body, _, err := doJSON(t, "POST", "/v1/completions", req, hdr) + if err != nil { + t.Fatalf("postTextCompletion http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusOK { + t.Fatalf("text completion failed: status=%d body=%s", status, truncate(string(body), 500)) + } + out := &textCompletionResponse{bodyRaw: body, statusCode: status} + if err := json.Unmarshal(body, out); err != nil { + t.Fatalf("decode text completion response: %v\nbody=%s", err, truncate(string(body), 500)) + } + logCacheDebugFields(t, lc.at(step), out.cacheDebug()) + return out +} + +// ----------------------------------------------------------------------------- +// Embeddings (/v1/embeddings) +// ----------------------------------------------------------------------------- + +type embeddingRequest struct { + Model string `json:"model"` + Input string `json:"input"` +} + +type embeddingResponse struct { + ExtraFields *extraFields `json:"extra_fields,omitempty"` + bodyRaw []byte + statusCode int +} + +func (r *embeddingResponse) cacheDebug() *cacheDebug { + if r.ExtraFields == nil { + return nil + } + return r.ExtraFields.CacheDebug +} + +func postEmbedding(t *testing.T, lc logCtx, step int, req embeddingRequest, ch cacheHeaders) *embeddingResponse { + t.Helper() + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", "path": "/v1/embeddings", "model": req.Model, "cache_key": ch.Key, + }) + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + hdr := http.Header{} + ch.apply(&http.Request{Header: hdr}) + status, body, _, err := doJSON(t, "POST", "/v1/embeddings", req, hdr) + if err != nil { + t.Fatalf("postEmbedding http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusOK { + t.Fatalf("embedding failed: status=%d body=%s", status, truncate(string(body), 500)) + } + out := &embeddingResponse{bodyRaw: body, statusCode: status} + if err := json.Unmarshal(body, out); err != nil { + t.Fatalf("decode embedding response: %v\nbody=%s", err, truncate(string(body), 500)) + } + logCacheDebugFields(t, lc.at(step), out.cacheDebug()) + return out +} + +// ----------------------------------------------------------------------------- +// Image generation (/v1/images/generations) +// ----------------------------------------------------------------------------- + +type imageGenRequest struct { + Model string `json:"model"` + Prompt string `json:"prompt"` + N *int `json:"n,omitempty"` + Size string `json:"size,omitempty"` +} + +type imageGenResponse struct { + ExtraFields *extraFields `json:"extra_fields,omitempty"` + bodyRaw []byte + statusCode int +} + +func (r *imageGenResponse) cacheDebug() *cacheDebug { + if r.ExtraFields == nil { + return nil + } + return r.ExtraFields.CacheDebug +} + +func postImageGen(t *testing.T, lc logCtx, step int, req imageGenRequest, ch cacheHeaders) *imageGenResponse { + t.Helper() + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", "path": "/v1/images/generations", "model": req.Model, "cache_key": ch.Key, + }) + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + hdr := http.Header{} + ch.apply(&http.Request{Header: hdr}) + status, body, _, err := doJSON(t, "POST", "/v1/images/generations", req, hdr) + if err != nil { + t.Fatalf("postImageGen http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusOK { + t.Fatalf("image gen failed: status=%d body=%s", status, truncate(string(body), 500)) + } + out := &imageGenResponse{bodyRaw: body, statusCode: status} + if err := json.Unmarshal(body, out); err != nil { + t.Fatalf("decode image gen response: %v\nbody=%s", err, truncate(string(body), 500)) + } + logCacheDebugFields(t, lc.at(step), out.cacheDebug()) + return out +} + +// ----------------------------------------------------------------------------- +// Responses API (/v1/responses) — OpenAI's newer interface +// ----------------------------------------------------------------------------- + +type responsesRequest struct { + Model string `json:"model"` + Input string `json:"input"` + Instructions *string `json:"instructions,omitempty"` + PreviousResponseID *string `json:"previous_response_id,omitempty"` +} + +type responsesResponse struct { + ExtraFields *extraFields `json:"extra_fields,omitempty"` + bodyRaw []byte + statusCode int +} + +func (r *responsesResponse) cacheDebug() *cacheDebug { + if r.ExtraFields == nil { + return nil + } + return r.ExtraFields.CacheDebug +} + +func postResponses(t *testing.T, lc logCtx, step int, req responsesRequest, ch cacheHeaders) *responsesResponse { + t.Helper() + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", "path": "/v1/responses", "model": req.Model, "cache_key": ch.Key, + }) + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + hdr := http.Header{} + ch.apply(&http.Request{Header: hdr}) + status, body, _, err := doJSON(t, "POST", "/v1/responses", req, hdr) + if err != nil { + t.Fatalf("postResponses http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusOK { + t.Fatalf("responses API failed: status=%d body=%s", status, truncate(string(body), 500)) + } + out := &responsesResponse{bodyRaw: body, statusCode: status} + if err := json.Unmarshal(body, out); err != nil { + t.Fatalf("decode responses API response: %v\nbody=%s", err, truncate(string(body), 500)) + } + logCacheDebugFields(t, lc.at(step), out.cacheDebug()) + return out +} + +// ----------------------------------------------------------------------------- +// Streaming chat (/v1/chat/completions with stream:true) — SSE +// ----------------------------------------------------------------------------- + +// streamChunk is one decoded SSE data event from a chat completion stream. +type streamChunk struct { + Index int + Raw []byte + Parsed map[string]any + ExtraFields *extraFields + Done bool // true for the terminal [DONE] sentinel +} + +func (c *streamChunk) cacheDebug() *cacheDebug { + if c.ExtraFields == nil { + return nil + } + return c.ExtraFields.CacheDebug +} + +// chunkText extracts choices[0].delta.content (or .message.content) as a +// string. Used to compare chunk order/content across A and B in case 1.25. +func (c *streamChunk) chunkText() string { + if c.Parsed == nil { + return "" + } + choices, _ := c.Parsed["choices"].([]any) + if len(choices) == 0 { + return "" + } + c0, _ := choices[0].(map[string]any) + if c0 == nil { + return "" + } + if delta, ok := c0["delta"].(map[string]any); ok { + if s, ok := delta["content"].(string); ok { + return s + } + } + if msg, ok := c0["message"].(map[string]any); ok { + if s, ok := msg["content"].(string); ok { + return s + } + } + return "" +} + +// streamResponse aggregates every chunk received from one streamed chat +// completion. cacheDebug() returns the stamp from the final chunk — that's +// the only chunk the plugin tags (search.go:628 guard). +type streamResponse struct { + Chunks []streamChunk + statusCode int + headers http.Header +} + +func (s *streamResponse) cacheDebug() *cacheDebug { + for i := len(s.Chunks) - 1; i >= 0; i-- { + if cd := s.Chunks[i].cacheDebug(); cd != nil { + return cd + } + } + return nil +} + +// dataChunks returns the chunks excluding the terminal [DONE] sentinel. +func (s *streamResponse) dataChunks() []streamChunk { + out := make([]streamChunk, 0, len(s.Chunks)) + for _, c := range s.Chunks { + if !c.Done { + out = append(out, c) + } + } + return out +} + +func postChatStream(t *testing.T, lc logCtx, step int, req chatRequest, ch cacheHeaders) *streamResponse { + t.Helper() + streamFlag := true + req.Stream = &streamFlag + + logf(t, lc.at(step), "INFO", "request", map[string]any{ + "method": "POST", "path": "/v1/chat/completions", "model": req.Model, + "cache_key": ch.Key, "stream": true, + }) + if rb, err := json.MarshalIndent(req, "", " "); err == nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.req.json", lc.phase, lc.name, step), rb) + } + + bodyBytes, err := json.Marshal(req) + if err != nil { + t.Fatalf("marshal stream req: %v", err) + } + httpReq, err := http.NewRequest("POST", cfg.BifrostURL+"/v1/chat/completions", bytes.NewReader(bodyBytes)) + if err != nil { + t.Fatalf("new stream req: %v", err) + } + httpReq.Header.Set("Content-Type", "application/json") + httpReq.Header.Set("Accept", "text/event-stream") + ch.apply(httpReq) + + resp, err := cfg.HTTPClient.Do(httpReq) + if err != nil { + t.Fatalf("stream do: %v", err) + } + defer resp.Body.Close() + + out := &streamResponse{statusCode: resp.StatusCode, headers: resp.Header} + if resp.StatusCode != http.StatusOK { + body, _ := io.ReadAll(resp.Body) + t.Fatalf("stream request failed: status=%d body=%s", resp.StatusCode, truncate(string(body), 500)) + } + + scanner := bufio.NewScanner(resp.Body) + scanner.Buffer(make([]byte, 64*1024), 4*1024*1024) + rawDump := &bytes.Buffer{} + idx := 0 + for scanner.Scan() { + line := scanner.Bytes() + rawDump.Write(line) + rawDump.WriteByte('\n') + if !bytes.HasPrefix(line, []byte("data: ")) { + continue + } + payload := bytes.TrimPrefix(line, []byte("data: ")) + payload = bytes.TrimSpace(payload) + if len(payload) == 0 { + continue + } + if bytes.Equal(payload, []byte("[DONE]")) { + out.Chunks = append(out.Chunks, streamChunk{Index: idx, Done: true}) + idx++ + break + } + ck := streamChunk{Index: idx, Raw: append([]byte(nil), payload...)} + if err := json.Unmarshal(payload, &ck.Parsed); err != nil { + t.Logf("warning: chunk %d unparseable JSON: %v\nraw=%s", idx, err, truncate(string(payload), 200)) + } else { + var ef struct { + ExtraFields *extraFields `json:"extra_fields,omitempty"` + } + _ = json.Unmarshal(payload, &ef) + ck.ExtraFields = ef.ExtraFields + } + out.Chunks = append(out.Chunks, ck) + idx++ + } + if err := scanner.Err(); err != nil { + t.Fatalf("stream scanner: %v", err) + } + + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.resp.sse.txt", lc.phase, lc.name, step), rawDump.Bytes()) + + fields := map[string]any{ + "status": resp.StatusCode, + "chunk_count": len(out.dataChunks()), + } + if cd := out.cacheDebug(); cd != nil { + fields["cache_hit"] = cd.CacheHit + if cd.CacheID != nil { + fields["cache_id"] = *cd.CacheID + } + if cd.HitType != nil { + fields["hit_type"] = *cd.HitType + } + } else { + fields["cache_debug"] = "" + } + logf(t, lc.at(step), "INFO", "response", fields) + return out +} + +// logCacheDebugFields emits a single response-event log line with the standard +// cache_debug fields, used by every postXxx helper above. +func logCacheDebugFields(t *testing.T, lc logCtx, cd *cacheDebug) { + t.Helper() + fields := map[string]any{"status": 200} + if cd != nil { + fields["cache_hit"] = cd.CacheHit + if cd.CacheID != nil { + fields["cache_id"] = *cd.CacheID + } + if cd.HitType != nil { + fields["hit_type"] = *cd.HitType + } + if cd.CacheHitLatency != nil { + fields["cache_hit_latency"] = *cd.CacheHitLatency + } + } else { + fields["cache_debug"] = "" + } + logf(t, lc, "INFO", "response", fields) +} diff --git a/tests/semanticcache/lifecycle_test.go b/tests/semanticcache/lifecycle_test.go new file mode 100644 index 0000000000..b00ccac8df --- /dev/null +++ b/tests/semanticcache/lifecycle_test.go @@ -0,0 +1,187 @@ +package semanticcache + +import ( + "net/http" + "testing" +) + +const ( + ttlLifecycle = "30s" + defaultKeyLifecycle = "phase3-default" +) + +// TestLifecycle exercises plugin disable / enable / delete lifecycle. +// +// Unlike TestDirect / TestSemantic, every subtest runs SERIALLY by design — +// each case mutates plugin lifecycle state (enabled flag, existence) which +// is fundamentally global and not parallelizable. No `t.Parallel()` calls +// in this file. +// +// Test flow (linear timeline): +// +// Setup → seed entry under direct-only plugin +// 3.1 → PUT {enabled:false} +// 3.2 → request after disable, no cache_debug stamped +// 3.3 → DELETE /api/cache/clear/{id} → expect 400 +// 3.4 → DELETE /api/cache/clear-by-key/{k} → expect 400 +// 3.5 → PUT {enabled:true} +// 3.6 → seed entry STILL hits (disable preserves namespace data) +// 3.7 → DELETE /api/plugins/semantic_cache +// 3.8 → request after delete, no cache_debug +// 3.9 → POST /api/plugins to recreate (same namespace) +// 3.10 → seed entry STILL hits (delete+recreate preserves namespace data — +// contract from commit a7c611e2e removing CleanUpOnShutdown) +func TestLifecycle(t *testing.T) { + lc := newLogCtx("lifecycle", "setup") + logf(t, lc.at(0), "SETUP", "phase_start", map[string]any{ + "mode": "direct-only", + "ttl": ttlLifecycle, + }) + + // Clean state — Phase 2 may have left a plugin in semantic mode; tear it + // down so we can create from scratch in direct-only. + if _, exists := pluginGet(t, lc, 1); exists { + pluginDelete(t, lc, 2) + } + + // Create plugin in direct-only mode. + created := pluginCreate(t, lc, 3, true, directOnlyConfig(ttlLifecycle, defaultKeyLifecycle)) + if !created.Enabled || created.Status.Status != "active" { + t.Fatalf("setup: expected enabled+active, got enabled=%v status=%q", + created.Enabled, created.Status.Status) + } + + // Populate the seed entry. We'll reference seedCacheID and seedReq across + // disable / re-enable / delete / recreate to assert namespace persistence. + seedKey := "phase3-seed" + seedReq := simpleChat(cfg.OpenAIModel, "Name the largest planet in our solar system.") + respA := postChat(t, lc, 4, seedReq, cacheHeaders{Key: seedKey}) + seedCacheID := assertMiss(t, lc, 5, respA) + waitForCacheWrite(t, lc, 6) + // Confirm the seed entry is queryable before we start disrupting state. + _ = assertHit(t, lc, 8, postChat(t, lc, 7, seedReq, cacheHeaders{Key: seedKey}), "direct") + logf(t, lc.at(9), "SETUP", "seed_entry_ready", map[string]any{"cache_id": seedCacheID}) + + allKeys := []string{seedKey, "phase3-k2", "phase3-k8"} + teardownLc := newLogCtx("lifecycle", "teardown") + t.Cleanup(func() { + // Best-effort: clear keys if the plugin is loaded at teardown time. + // If a case left it disabled/deleted, the 400 is informational. + for _, k := range allKeys { + _ = clearByCacheKey(t, teardownLc.at(99), 99, k) + } + }) + + // 3.1 disable_via_update — PUT {enabled:false, config:}. + // Per UI wire parity (PLAN §3.5), we re-send the current config along + // with enabled=false — never PUT bare {enabled:false} which would wipe + // the saved config blob. + t.Run("3.1_disable_via_update", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.1_disable_via_update") + updated := pluginUpdate(t, lc, 1, false, directOnlyConfig(ttlLifecycle, defaultKeyLifecycle)) + if updated.Enabled { + t.Fatalf("expected enabled=false in update response, got true") + } + // Confirm via GET that the disabled state is reflected. + p, exists := pluginGet(t, lc, 2) + if !exists { + t.Fatalf("plugin row should persist after disable (only memory unloaded)") + } + if p.Enabled { + t.Fatalf("GET expected enabled=false, got true") + } + if p.Status.Status != "disabled" { + t.Fatalf("expected status=disabled, got %q", p.Status.Status) + } + }) + + // 3.2 request_after_disable_no_cache_debug — plugin removed from + // in-memory pipeline; PreLLMHook never runs; no cache_debug stamped. + t.Run("3.2_request_after_disable_no_cache_debug", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.2_request_after_disable_no_cache_debug") + resp := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, "What's 2+2?"), cacheHeaders{Key: "phase3-k2"}) + assertNoCacheDebug(t, lc, 2, resp) + }) + + // 3.3 clear_endpoints_when_plugin_disabled — the cache-clear handler must + // return HTTP 400 with "plugin is not loaded" when the resolver returns + // nil. Pre-fix this returned 405; bug surfaced + fixed earlier this run. + t.Run("3.3_clear_endpoints_when_plugin_disabled", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.3_clear_endpoints_when_plugin_disabled") + status := clearByCacheID(t, lc, 1, "00000000-0000-0000-0000-000000000000") + if status != http.StatusBadRequest { + t.Fatalf("expected 400 (plugin not loaded), got %d", status) + } + }) + + // 3.4 clear_by_key_endpoints_when_disabled — same contract for clear-by-key. + t.Run("3.4_clear_by_key_endpoints_when_disabled", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.4_clear_by_key_endpoints_when_disabled") + status := clearByCacheKey(t, lc, 1, "phase3-disabled-test") + if status != http.StatusBadRequest { + t.Fatalf("expected 400 (plugin not loaded), got %d", status) + } + }) + + // 3.5 re_enable_via_update — flip back to enabled; status flips to active. + t.Run("3.5_re_enable_via_update", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.5_re_enable_via_update") + updated := pluginUpdate(t, lc, 1, true, directOnlyConfig(ttlLifecycle, defaultKeyLifecycle)) + if !updated.Enabled { + t.Fatalf("expected enabled=true after re-enable, got false") + } + if updated.Status.Status != "active" { + t.Fatalf("expected status=active after re-enable, got %q", updated.Status.Status) + } + }) + + // 3.6 replay_previous_entries_after_reenable — entries written before + // disable must still be queryable. Namespace data is independent of + // plugin in-memory lifecycle. + t.Run("3.6_replay_previous_entries_after_reenable", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.6_replay_previous_entries_after_reenable") + resp := postChat(t, lc, 1, seedReq, cacheHeaders{Key: seedKey}) + gotID := assertHit(t, lc, 2, resp, "direct") + assertSameCacheID(t, lc, 3, gotID, seedCacheID) + }) + + // 3.7 delete_plugin — DELETE removes both DB row and in-memory plugin. + t.Run("3.7_delete_plugin", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.7_delete_plugin") + pluginDelete(t, lc, 1) + if _, exists := pluginGet(t, lc, 2); exists { + t.Fatalf("plugin should be 404 after delete") + } + }) + + // 3.8 request_after_delete — no plugin instance, no cache_debug. + t.Run("3.8_request_after_delete", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.8_request_after_delete") + resp := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, "What's 3+3?"), cacheHeaders{Key: "phase3-k8"}) + assertNoCacheDebug(t, lc, 2, resp) + }) + + // 3.9 re_create_clean — POST with the SAME config (and therefore the + // same namespace). Recreate must succeed and surface status=active. + t.Run("3.9_re_create_clean", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.9_re_create_clean") + created := pluginCreate(t, lc, 1, true, directOnlyConfig(ttlLifecycle, defaultKeyLifecycle)) + if !created.Enabled || created.Status.Status != "active" { + t.Fatalf("recreate: expected enabled+active, got enabled=%v status=%q", + created.Enabled, created.Status.Status) + } + }) + + // 3.10 namespace_persists_across_delete_recreate — the contract that + // commit a7c611e2e (removing CleanUpOnShutdown) enabled: entries written + // under a namespace must survive plugin delete + recreate. Without this, + // any production restart of Bifrost would wipe the cache. + t.Run("3.10_namespace_persists_across_delete_recreate", func(t *testing.T) { + lc := newLogCtx("lifecycle", "3.10_namespace_persists_across_delete_recreate") + resp := postChat(t, lc, 1, seedReq, cacheHeaders{Key: seedKey}) + gotID := assertHit(t, lc, 2, resp, "direct") + assertSameCacheID(t, lc, 3, gotID, seedCacheID) + }) + + logf(t, teardownLc.at(99), "TEARDOWN", "phase_end", nil) +} diff --git a/tests/semanticcache/log_test.go b/tests/semanticcache/log_test.go new file mode 100644 index 0000000000..767288d260 --- /dev/null +++ b/tests/semanticcache/log_test.go @@ -0,0 +1,95 @@ +package semanticcache + +import ( + "fmt" + "os" + "path/filepath" + "sort" + "strings" + "sync" + "testing" + "time" +) + +var ( + logMu sync.Mutex + runReportDir string + runLogFile *os.File + trailSID string +) + +func initLog() error { + base := filepath.Join("reports", time.Now().UTC().Format("20060102T150405Z")) + if err := os.MkdirAll(base, 0o755); err != nil { + return err + } + f, err := os.Create(filepath.Join(base, "run.log")) + if err != nil { + return err + } + runReportDir = base + runLogFile = f + trailSID = strings.TrimSpace(os.Getenv("TRAIL_SESSION_ID")) + return nil +} + +func closeLog() { + logMu.Lock() + defer logMu.Unlock() + if runLogFile != nil { + _ = runLogFile.Close() + runLogFile = nil + } +} + +type logCtx struct { + phase string + name string + step int +} + +func newLogCtx(phase, name string) logCtx { return logCtx{phase: phase, name: name} } + +func (lc logCtx) at(step int) logCtx { lc.step = step; return lc } + +func logf(t *testing.T, lc logCtx, lvl, event string, fields map[string]any) { + t.Helper() + var b strings.Builder + fmt.Fprintf(&b, "[SC-E2E] ts=%s lvl=%-5s phase=%s case=%s step=%d event=%s", + time.Now().UTC().Format("2006-01-02T15:04:05.000Z"), + lvl, lc.phase, lc.name, lc.step, event) + if trailSID != "" { + fmt.Fprintf(&b, " trail_sid=%s", trailSID) + } + keys := make([]string, 0, len(fields)) + for k := range fields { + keys = append(keys, k) + } + sort.Strings(keys) + for _, k := range keys { + fmt.Fprintf(&b, " %s=%v", k, fields[k]) + } + line := b.String() + t.Log(line) + logMu.Lock() + if runLogFile != nil { + fmt.Fprintln(runLogFile, line) + } + logMu.Unlock() +} + +func reportPath(parts ...string) string { + if runReportDir == "" { + return filepath.Join(parts...) + } + return filepath.Join(append([]string{runReportDir}, parts...)...) +} + +func dumpJSON(t *testing.T, name string, body []byte) string { + t.Helper() + p := reportPath(name) + if err := os.WriteFile(p, body, 0o644); err != nil { + t.Logf("warning: dump %s failed: %v", p, err) + } + return p +} diff --git a/tests/semanticcache/logs_crosscheck_test.go b/tests/semanticcache/logs_crosscheck_test.go new file mode 100644 index 0000000000..2781f3a450 --- /dev/null +++ b/tests/semanticcache/logs_crosscheck_test.go @@ -0,0 +1,188 @@ +package semanticcache + +import ( + "encoding/json" + "fmt" + "math" + "net/http" + "testing" + "time" +) + +// floatEpsilon is the tolerance for cache_debug float field comparison between +// the in-flight response stamp and the persisted log row. The two paths use +// different JSON encoders (encoding/json for the HTTP response, sonic for +// the log store at logstore/payload.go:509), and sonic's default precision +// produces small (~1e-5) differences for similarity/threshold values. Not +// semantic drift — just round-trip noise. 1e-4 is comfortably above the +// observed delta while still tight enough to catch any real divergence. +const floatEpsilon = 1e-4 + +// logEntry is the minimum slice of a Bifrost log row we need for cross-checking +// the persisted cache_debug against the in-flight response. The full Log row +// has dozens of fields — we only care about ID, Timestamp, and CacheDebug. +type logEntry struct { + ID string `json:"id"` + Timestamp string `json:"timestamp"` + CacheDebug *cacheDebug `json:"cache_debug,omitempty"` +} + +// findLogByCacheDebug polls /api/logs descending-by-timestamp looking for an +// entry whose cache_debug matches the response stamp's (cache_id, cache_hit) +// pair. Returns the matching log row or fatal-fails after the timeout. +// +// Why match BOTH fields: for a semantic hit, A's miss-and-store log row and +// B's hit-replay log row carry the SAME cache_id (B's stamped cache_id points +// to A's storage entry). Without the cache_hit discriminator the helper would +// return whichever row was persisted first (usually A's miss). +// +// Polling exists because Bifrost's logging pipeline is asynchronous — the HTTP +// response returns before the row is persisted. +func findLogByCacheDebug(t *testing.T, lc logCtx, step int, want *cacheDebug) *logEntry { + t.Helper() + if want == nil || want.CacheID == nil { + t.Fatalf("findLogByCacheDebug: response cache_debug or cache_id is nil") + } + wantID := *want.CacheID + deadline := time.Now().Add(5 * time.Second) + attempts := 0 + for time.Now().Before(deadline) { + attempts++ + status, body, _, err := doJSON(t, "GET", + "/api/logs?limit=50&sort_by=timestamp&order=desc", nil, nil) + if err != nil { + t.Fatalf("findLogByCacheDebug GET err: %v", err) + } + if status != http.StatusOK { + t.Fatalf("findLogByCacheDebug status=%d body=%s", status, truncate(string(body), 300)) + } + var resp struct { + Logs []logEntry `json:"logs"` + } + if err := json.Unmarshal(body, &resp); err != nil { + t.Fatalf("findLogByCacheDebug decode: %v\nbody=%s", err, truncate(string(body), 300)) + } + for i := range resp.Logs { + l := &resp.Logs[i] + if l.CacheDebug == nil || l.CacheDebug.CacheID == nil { + continue + } + if *l.CacheDebug.CacheID != wantID { + continue + } + if l.CacheDebug.CacheHit != want.CacheHit { + continue + } + logf(t, lc.at(step), "INFO", "log_found", map[string]any{ + "cache_id": wantID, "log_id": l.ID, "cache_hit": l.CacheDebug.CacheHit, "attempts": attempts, + }) + return l + } + time.Sleep(200 * time.Millisecond) + } + t.Fatalf("log entry with cache_id=%s cache_hit=%v not found after %d attempts", wantID, want.CacheHit, attempts) + return nil +} + +// assertLogMatchesResponseCacheDebug verifies every field of the persisted +// log's cache_debug matches the in-flight response's cache_debug. Catches +// drift between PostLLMHook stamping and the durable log write — same data +// path the UI Logs view reads, so this guards a real production contract. +func assertLogMatchesResponseCacheDebug(t *testing.T, lc logCtx, step int, respCD, logCD *cacheDebug) { + t.Helper() + if respCD == nil { + t.Fatalf("response cache_debug is nil; nothing to cross-check") + } + if logCD == nil { + t.Fatalf("log row has no cache_debug; expected matching stamp") + } + mismatches := []string{} + if logCD.CacheHit != respCD.CacheHit { + mismatches = append(mismatches, fmt.Sprintf("cache_hit: resp=%v log=%v", respCD.CacheHit, logCD.CacheHit)) + } + if deref(logCD.CacheID) != deref(respCD.CacheID) { + mismatches = append(mismatches, fmt.Sprintf("cache_id: resp=%q log=%q", deref(respCD.CacheID), deref(logCD.CacheID))) + } + if deref(logCD.HitType) != deref(respCD.HitType) { + mismatches = append(mismatches, fmt.Sprintf("hit_type: resp=%q log=%q", deref(respCD.HitType), deref(logCD.HitType))) + } + if deref(logCD.RequestedProvider) != deref(respCD.RequestedProvider) { + mismatches = append(mismatches, fmt.Sprintf("requested_provider: resp=%q log=%q", deref(respCD.RequestedProvider), deref(logCD.RequestedProvider))) + } + if deref(logCD.RequestedModel) != deref(respCD.RequestedModel) { + mismatches = append(mismatches, fmt.Sprintf("requested_model: resp=%q log=%q", deref(respCD.RequestedModel), deref(logCD.RequestedModel))) + } + if deref(logCD.ProviderUsed) != deref(respCD.ProviderUsed) { + mismatches = append(mismatches, fmt.Sprintf("provider_used: resp=%q log=%q", deref(respCD.ProviderUsed), deref(logCD.ProviderUsed))) + } + if deref(logCD.ModelUsed) != deref(respCD.ModelUsed) { + mismatches = append(mismatches, fmt.Sprintf("model_used: resp=%q log=%q", deref(respCD.ModelUsed), deref(logCD.ModelUsed))) + } + // Numeric float fields aren't expected to differ but float64 round-trip + // through sonic JSON is exact for these magnitudes; equality check is fine. + if !floatPtrEq(logCD.Threshold, respCD.Threshold) { + mismatches = append(mismatches, fmt.Sprintf("threshold: resp=%v log=%v", respPtrStr(respCD.Threshold), respPtrStr(logCD.Threshold))) + } + if !floatPtrEq(logCD.Similarity, respCD.Similarity) { + mismatches = append(mismatches, fmt.Sprintf("similarity: resp=%v log=%v", respPtrStr(respCD.Similarity), respPtrStr(logCD.Similarity))) + } + if !intPtrEq(logCD.InputTokens, respCD.InputTokens) { + mismatches = append(mismatches, fmt.Sprintf("input_tokens: resp=%v log=%v", intPtrStr(respCD.InputTokens), intPtrStr(logCD.InputTokens))) + } + // cache_hit_latency is not cross-checked: the log row may be persisted + // after the response was sent, and the field can be the same OR slightly + // different depending on where in PostLLMHook the stamp lands. + + if len(mismatches) > 0 { + t.Fatalf("cache_debug response/log mismatch:\n - %s", joinLines(mismatches)) + } + logf(t, lc.at(step), "PASS", "log_matches_response_cache_debug", map[string]any{ + "cache_id": deref(respCD.CacheID), + "hit_type": deref(respCD.HitType), + "fields_compared": []string{ + "cache_hit", "cache_id", "hit_type", + "requested_provider", "requested_model", + "provider_used", "model_used", "input_tokens", + "threshold", "similarity", + }, + }) +} + +func floatPtrEq(a, b *float64) bool { + if a == nil || b == nil { + return a == b + } + return math.Abs(*a-*b) < floatEpsilon +} + +func intPtrEq(a, b *int) bool { + if a == nil || b == nil { + return a == b + } + return *a == *b +} + +func respPtrStr(p *float64) string { + if p == nil { + return "" + } + return fmt.Sprintf("%.6f", *p) +} + +func intPtrStr(p *int) string { + if p == nil { + return "" + } + return fmt.Sprintf("%d", *p) +} + +func joinLines(s []string) string { + out := "" + for i, v := range s { + if i > 0 { + out += "\n - " + } + out += v + } + return out +} diff --git a/tests/semanticcache/plugin_test.go b/tests/semanticcache/plugin_test.go new file mode 100644 index 0000000000..98a12526a3 --- /dev/null +++ b/tests/semanticcache/plugin_test.go @@ -0,0 +1,212 @@ +package semanticcache + +import ( + "encoding/json" + "fmt" + "net/http" + "testing" +) + +const pluginName = "semantic_cache" + +// createPluginRequest mirrors handlers.CreatePluginRequest + ui/lib/types/plugins.ts. +// path is always sent (UI sends "" for built-ins; backend normalizes empty → nil). +type createPluginRequest struct { + Name string `json:"name"` + Path string `json:"path"` + Enabled bool `json:"enabled"` + Config map[string]any `json:"config"` + Placement *string `json:"placement,omitempty"` + Order *int `json:"order,omitempty"` +} + +// updatePluginRequest mirrors handlers.UpdatePluginRequest. The UI ALWAYS re-sends +// the current config alongside enabled, never PUTs `{enabled:false}` alone — +// that would wipe the DB config row (handlers/plugins.go:399). +type updatePluginRequest struct { + Enabled bool `json:"enabled"` + Path *string `json:"path,omitempty"` + Config map[string]any `json:"config,omitempty"` + Placement *string `json:"placement,omitempty"` + Order *int `json:"order,omitempty"` +} + +type pluginStatus struct { + Name string `json:"name"` + Status string `json:"status"` + Logs []string `json:"logs"` +} + +type pluginResponse struct { + Name string `json:"name"` + ActualName string `json:"actualName"` + Enabled bool `json:"enabled"` + Config map[string]any `json:"config"` + IsCustom bool `json:"isCustom"` + Path *string `json:"path,omitempty"` + Placement *string `json:"placement,omitempty"` + Order *int `json:"order,omitempty"` + Status pluginStatus `json:"status"` +} + +type pluginEnvelope struct { + Message string `json:"message"` + Plugin pluginResponse `json:"plugin"` +} + +// directOnlyConfig returns the plugin config blob for direct-only mode. +// Mirrors what cachingView.tsx buildPayload produces for mode="direct". +func directOnlyConfig(ttl string, defaultKey string) map[string]any { + c := map[string]any{ + "dimension": 1, + "ttl": ttl, + "threshold": 0.8, + "conversation_history_threshold": 3, + "exclude_system_prompt": false, + "cache_by_model": true, + "cache_by_provider": true, + "vector_store_namespace": cfg.Namespace, + } + if defaultKey != "" { + c["default_cache_key"] = defaultKey + } + return c +} + +// semanticConfig returns the plugin config blob for semantic mode. +func semanticConfig(provider, embedModel string, dimension int, ttl string, threshold float64, defaultKey string) map[string]any { + c := map[string]any{ + "provider": provider, + "embedding_model": embedModel, + "dimension": dimension, + "ttl": ttl, + "threshold": threshold, + "conversation_history_threshold": 3, + "exclude_system_prompt": false, + "cache_by_model": true, + "cache_by_provider": true, + "vector_store_namespace": cfg.Namespace, + } + if defaultKey != "" { + c["default_cache_key"] = defaultKey + } + return c +} + +// pluginGet fetches the plugin row; returns (resp, true) if found, (nil, false) on 404. +func pluginGet(t *testing.T, lc logCtx, step int) (*pluginResponse, bool) { + t.Helper() + status, body, _, err := doJSON(t, "GET", "/api/plugins/"+pluginName, nil, nil) + if err != nil { + t.Fatalf("pluginGet http error: %v", err) + } + if status == http.StatusNotFound { + logf(t, lc.at(step), "INFO", "plugin_get", map[string]any{"status": status, "exists": false}) + return nil, false + } + if status != http.StatusOK { + t.Fatalf("pluginGet unexpected status=%d body=%s", status, truncate(string(body), 300)) + } + var p pluginResponse + if err := json.Unmarshal(body, &p); err != nil { + t.Fatalf("pluginGet decode: %v\nbody=%s", err, truncate(string(body), 300)) + } + logf(t, lc.at(step), "INFO", "plugin_get", map[string]any{ + "status": status, + "exists": true, + "enabled": p.Enabled, + "plugin_status": p.Status.Status, + }) + return &p, true +} + +// pluginCreate matches the UI flow: POST /api/plugins with path:"" for built-ins. +func pluginCreate(t *testing.T, lc logCtx, step int, enabled bool, config map[string]any) *pluginResponse { + t.Helper() + req := createPluginRequest{ + Name: pluginName, + Path: "", // UI always sends "" for built-ins (cachingView.tsx:225) + Enabled: enabled, + Config: config, + } + if reqJSON, _ := json.MarshalIndent(req, "", " "); reqJSON != nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.plugin_create.req.json", lc.phase, lc.name, step), reqJSON) + } + logf(t, lc.at(step), "INFO", "plugin_create", map[string]any{ + "enabled": enabled, + "mode": modeFromConfig(config), + "namespace": fmt.Sprintf("%v", config["vector_store_namespace"]), + }) + status, body, _, err := doJSON(t, "POST", "/api/plugins", req, nil) + if err != nil { + t.Fatalf("pluginCreate http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.plugin_create.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusCreated { + t.Fatalf("pluginCreate status=%d body=%s", status, truncate(string(body), 500)) + } + var env pluginEnvelope + if err := json.Unmarshal(body, &env); err != nil { + t.Fatalf("pluginCreate decode: %v\nbody=%s", err, truncate(string(body), 500)) + } + logf(t, lc.at(step), "PASS", "plugin_created", map[string]any{ + "status": env.Plugin.Status.Status, + "enabled": env.Plugin.Enabled, + }) + return &env.Plugin +} + +// pluginUpdate matches the UI flow: PUT with {enabled, config} — always re-send +// config when toggling enabled, never PUT bare {enabled:false} (would wipe DB row). +func pluginUpdate(t *testing.T, lc logCtx, step int, enabled bool, config map[string]any) *pluginResponse { + t.Helper() + req := updatePluginRequest{ + Enabled: enabled, + Config: config, + } + if reqJSON, _ := json.MarshalIndent(req, "", " "); reqJSON != nil { + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.plugin_update.req.json", lc.phase, lc.name, step), reqJSON) + } + logf(t, lc.at(step), "INFO", "plugin_update", map[string]any{ + "enabled": enabled, + "mode": modeFromConfig(config), + }) + status, body, _, err := doJSON(t, "PUT", "/api/plugins/"+pluginName, req, nil) + if err != nil { + t.Fatalf("pluginUpdate http error: %v", err) + } + dumpJSON(t, fmt.Sprintf("p%s-%s-s%d.plugin_update.resp.json", lc.phase, lc.name, step), body) + if status != http.StatusOK { + t.Fatalf("pluginUpdate status=%d body=%s", status, truncate(string(body), 500)) + } + var env pluginEnvelope + if err := json.Unmarshal(body, &env); err != nil { + t.Fatalf("pluginUpdate decode: %v\nbody=%s", err, truncate(string(body), 500)) + } + logf(t, lc.at(step), "PASS", "plugin_updated", map[string]any{ + "status": env.Plugin.Status.Status, + "enabled": env.Plugin.Enabled, + }) + return &env.Plugin +} + +// pluginDelete removes the plugin row + in-memory instance. +func pluginDelete(t *testing.T, lc logCtx, step int) { + t.Helper() + status, body, _, err := doJSON(t, "DELETE", "/api/plugins/"+pluginName, nil, nil) + if err != nil { + t.Fatalf("pluginDelete http error: %v", err) + } + if status != http.StatusOK && status != http.StatusNotFound { + t.Fatalf("pluginDelete status=%d body=%s", status, truncate(string(body), 300)) + } + logf(t, lc.at(step), "INFO", "plugin_deleted", map[string]any{"status": status}) +} + +// modeFromConfig describes a config blob in one word for log fields. +func modeFromConfig(c map[string]any) string { + if p, _ := c["provider"].(string); p != "" { + return "semantic" + } + return "direct-only" +} diff --git a/tests/semanticcache/preconditions_test.go b/tests/semanticcache/preconditions_test.go new file mode 100644 index 0000000000..6dbd94576d --- /dev/null +++ b/tests/semanticcache/preconditions_test.go @@ -0,0 +1,72 @@ +package semanticcache + +import ( + "net/http" + "strings" + "testing" +) + +// TestPreconditions verifies the test env is ready (Bifrost reachable, +// providers configured, plugin absent at run start). Pure checks, no state +// changes. Trusts env for vector-store config (per plan §13.4). +func TestPreconditions(t *testing.T) { + lc := newLogCtx("preconditions", "preconditions") + logf(t, lc.at(0), "SETUP", "phase_start", map[string]any{"bifrost_url": cfg.BifrostURL}) + + t.Run("0.1_bifrost_reachable", func(t *testing.T) { + lc := lc + lc.name = "0.1_bifrost_reachable" + status, body, _, err := doJSON(t, "GET", "/api/plugins", nil, nil) + if err != nil || status != http.StatusOK { + logf(t, lc.at(1), "FAIL", "bifrost_unreachable", map[string]any{ + "status": status, "err": err, + }) + t.Fatalf("GET /api/plugins failed: status=%d err=%v body=%s", + status, err, truncate(string(body), 200)) + } + logf(t, lc.at(1), "PASS", "bifrost_reachable", map[string]any{"status": status}) + }) + + t.Run("0.2_openai_configured", func(t *testing.T) { + lc := lc + lc.name = "0.2_openai_configured" + ps := providersList(t, lc, 1) + if !hasProvider(ps, "openai") { + logf(t, lc.at(2), "FAIL", "openai_missing", nil) + t.Fatalf("openai provider not configured (got %d providers)", len(ps)) + } + logf(t, lc.at(2), "PASS", "openai_present", nil) + }) + + t.Run("0.3_optional_providers", func(t *testing.T) { + lc := lc + lc.name = "0.3_optional_providers" + ps := providersList(t, lc, 1) + for _, want := range []string{"gemini", "anthropic"} { + if hasProvider(ps, want) { + logf(t, lc.at(2), "PASS", "provider_present", map[string]any{"provider": want}) + } else { + logf(t, lc.at(2), "WARN", "provider_absent", map[string]any{ + "provider": want, + "effect": "cross-provider cases will skip", + }) + } + } + }) + + // The plugin-absent precondition is enforced in TestMain (with RUN_FORCE=1 + // auto-deleting a pre-existing row). We don't re-check here because tests + // run in alphabetical file order — TestDirect / TestSemantic / TestLifecycle + // create their own plugin and may leave it loaded for the next test. + + logf(t, lc.at(99), "TEARDOWN", "phase_end", nil) +} + +func hasProvider(ps []providerSummary, name string) bool { + for _, p := range ps { + if strings.EqualFold(p.Name, name) { + return true + } + } + return false +} diff --git a/tests/semanticcache/semantic_test.go b/tests/semanticcache/semantic_test.go new file mode 100644 index 0000000000..c3d02193ba --- /dev/null +++ b/tests/semanticcache/semantic_test.go @@ -0,0 +1,1031 @@ +package semanticcache + +import ( + "encoding/json" + "fmt" + "math" + "net/http" + "os" + "testing" + "time" +) + +// TestParaphraseFixtures pre-flights every pair in paraphrasePairs against +// the deployed embedding model. Fails early with the actual cosine values +// if a pair has drifted, so downstream semantic cases never debug a +// borderline-flaky pair. Costs ~10 embedding calls (cents). +// +// Set SC_SKIP_FIXTURE_VERIFY=1 to skip this when running semantic cases +// against an environment with no openai/text-embedding-3-small access. +func TestParaphraseFixtures(t *testing.T) { + if os.Getenv("SC_SKIP_FIXTURE_VERIFY") == "1" { + t.Skip("SC_SKIP_FIXTURE_VERIFY=1") + } + for _, pair := range paraphrasePairs { + p := pair + t.Run(p.Name, func(t *testing.T) { + t.Parallel() + lc := newLogCtx("fixtures", p.Name) + + ec := embedVector(t, lc, 1, p.Canonical) + ep := embedVector(t, lc, 2, p.Paraphrase) + eu := embedVector(t, lc, 3, p.Unrelated) + + simHit := cosine(ec, ep) + simMiss := cosine(ec, eu) + + logf(t, lc.at(4), "INFO", "cosine_check", map[string]any{ + "hit_cosine": fmt.Sprintf("%.4f", simHit), + "miss_cosine": fmt.Sprintf("%.4f", simMiss), + }) + + if simHit < 0.85 { + t.Errorf("HIT cosine %.4f < 0.85 — paraphrase too distant\n canonical=%q\n paraphrase=%q", + simHit, p.Canonical, p.Paraphrase) + } + if simMiss > 0.6 { + t.Errorf("MISS cosine %.4f > 0.6 — unrelated too close\n canonical=%q\n unrelated=%q", + simMiss, p.Canonical, p.Unrelated) + } + }) + } +} + +// embedVector hits /v1/embeddings and parses the float64 vector. Plugin +// state irrelevant — direct API call. +func embedVector(t *testing.T, lc logCtx, step int, text string) []float64 { + t.Helper() + req := embeddingRequest{Model: "openai/" + cfg.OpenAIEmbed, Input: text} + status, body, _, err := doJSON(t, "POST", "/v1/embeddings", req, nil) + if err != nil || status != http.StatusOK { + t.Fatalf("embedVector: status=%d err=%v body=%s", status, err, truncate(string(body), 300)) + } + var resp struct { + Data []struct { + Embedding []float64 `json:"embedding"` + } `json:"data"` + } + if err := json.Unmarshal(body, &resp); err != nil { + t.Fatalf("embedVector decode: %v", err) + } + if len(resp.Data) == 0 || len(resp.Data[0].Embedding) == 0 { + t.Fatalf("embedVector: empty data in response %s", truncate(string(body), 300)) + } + logf(t, lc.at(step), "INFO", "embedding_computed", map[string]any{ + "dim": len(resp.Data[0].Embedding), + "text_len": len(text), + }) + return resp.Data[0].Embedding +} + +func cosine(a, b []float64) float64 { + if len(a) != len(b) || len(a) == 0 { + return 0 + } + var dot, na, nb float64 + for i := range a { + dot += a[i] * b[i] + na += a[i] * a[i] + nb += b[i] * b[i] + } + if na == 0 || nb == 0 { + return 0 + } + return dot / (math.Sqrt(na) * math.Sqrt(nb)) +} + +// ----------------------------------------------------------------------------- +// Phase 2 — semantic mode +// ----------------------------------------------------------------------------- + +const ( + ttlSemantic = "30s" + defaultKeySemantic = "phase2-default" + thresholdSemantic = 0.85 +) + +// semanticNamespace is a dedicated Weaviate class for the semantic-mode suite. +// Phase 1 created cfg.Namespace with dimension=1 (direct-only); reusing that +// namespace for dim=1536 writes would error out with "vector dimensions do +// not match the index dimensions" — a Weaviate constraint, not a plugin bug. +// Real users switching modes face the same constraint and create a new +// namespace, so the suite mirrors that. +func semanticNamespace() string { return cfg.Namespace + "Semantic" } + +// semanticBaseline is the canonical Phase 2 plugin config — used by setup and +// by every t.Cleanup that restores baseline after a mutating case. +func semanticBaseline() map[string]any { + // Lock the embedding model: dimension=1536 is hard-coded and only + // text-embedding-3-small produces 1536-dim vectors. Any other model would + // cause confusing dimension-mismatch failures downstream rather than a + // clear prerequisite error here. + if cfg.OpenAIEmbed != "text-embedding-3-small" { + panic(fmt.Sprintf("semantic suite expects cfg.OpenAIEmbed=text-embedding-3-small, got %q", cfg.OpenAIEmbed)) + } + c := semanticConfig("openai", cfg.OpenAIEmbed, 1536, ttlSemantic, thresholdSemantic, defaultKeySemantic) + c["vector_store_namespace"] = semanticNamespace() + return c +} + +func restoreSemanticBaseline(t *testing.T, lc logCtx, step int) { + t.Helper() + pluginUpdate(t, lc, step, true, semanticBaseline()) +} + +// TestSemantic runs the semantic-mode cases (2.1–2.44). +// +// Parallelism rules (same as Phase 1): +// +// - Read-only cases call `t.Parallel()`. +// - Cases that mutate plugin config via `pluginUpdate` (2.12, 2.13, 2.21, +// 2.31, 2.32) MUST NOT call `t.Parallel()`. They run synchronously inside +// the parent loop, one at a time, restoring baseline via `t.Cleanup`. +// +// Plugin lifecycle: this test is self-contained — it upserts the plugin to +// semantic mode at setup regardless of whether Phase 1 ran. Existing entries +// in the namespace from prior runs are tolerated because each case uses a +// unique cache_key (phase2-kNN). +func TestSemantic(t *testing.T) { + lc := newLogCtx("semantic", "setup") + logf(t, lc.at(0), "SETUP", "phase_start", map[string]any{ + "mode": "semantic", + "ttl": ttlSemantic, + "threshold": thresholdSemantic, + "dimension": 1536, + }) + + // Upsert plugin to semantic mode. PUT creates with enabled:false if + // missing, then the same call's body sets enabled:true + config. + if _, exists := pluginGet(t, lc, 1); exists { + pluginUpdate(t, lc, 2, true, semanticBaseline()) + } else { + pluginCreate(t, lc, 2, true, semanticBaseline()) + } + + allKeys := []string{ + defaultKeySemantic, + "phase2-k1", "phase2-k2", "phase2-k3", "phase2-k4", "phase2-k5", + "phase2-k6", "phase2-k7", "phase2-k8", "phase2-k9", "phase2-k10", "phase2-k10-alt", + "phase2-k11", "phase2-k12", "phase2-k13", "phase2-k14", "phase2-k15", + "phase2-k16", "phase2-k17", "phase2-k18", "phase2-k19", "phase2-k20", + "phase2-k21", "phase2-k22", "phase2-k23", "phase2-k24", "phase2-k25", + "phase2-k26", "phase2-k27", "phase2-k28", "phase2-k29", + "phase2-k31a", "phase2-k32", + "phase2-k33", "phase2-k34", "phase2-k35", "phase2-k36", + "phase2-k37", "phase2-k38", "phase2-k39", + "phase2-k40", "phase2-k41", "phase2-k42", "phase2-k43", + "phase2-k39-seedA", "phase2-k39-seedB", + "phase2-k44", + } + t.Cleanup(func() { + // Surface unexpected cleanup failures so stale entries don't poison + // subsequent runs. 404 is fine — not every key in allKeys gets + // written by every run. + for _, k := range allKeys { + if got := clearByCacheKey(t, lc.at(99), 99, k); got != http.StatusOK && got != http.StatusNotFound { + t.Errorf("cleanup clearByCacheKey(%q): unexpected status %d", k, got) + } + } + }) + + // 2.1 direct_path_still_works — exact-match in semantic mode hits direct first. + t.Run("2.1_direct_path_still_works", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.1_direct_path_still_works") + key := "phase2-k1" + req := simpleChat(cfg.OpenAIModel, "Name one common edible mushroom variety.") + respA := postChat(t, lc, 1, req, cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key}) + idB := assertHit(t, lc, 5, respB, "direct") // direct runs first in semantic mode + assertSameCacheID(t, lc, 6, idB, idA) + }) + + // 2.2 semantic_hit_paraphrase — distinct text but high semantic similarity. + t.Run("2.2_semantic_hit_paraphrase", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.2_semantic_hit_paraphrase") + key := "phase2-k2" + pair := pairByName(t, "capital_france") + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + cd := respB.cacheDebug() + if cd.Similarity == nil || cd.Threshold == nil || cd.ProviderUsed == nil || cd.ModelUsed == nil || cd.InputTokens == nil { + t.Fatalf("expected similarity/threshold/provider_used/model_used/input_tokens stamped on semantic hit, got %+v", cd) + } + if *cd.Similarity < *cd.Threshold { + t.Fatalf("semantic hit but similarity %.4f < threshold %.4f", *cd.Similarity, *cd.Threshold) + } + }) + + // 2.3 below_threshold_miss — unrelated body misses, but cache_debug still + // stamped with provider_used/input_tokens (semantic search ran). + t.Run("2.3_below_threshold_miss", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.3_below_threshold_miss") + key := "phase2-k3" + pair := pairByName(t, "boiling_water") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Unrelated), cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 5, respB) + cd := respB.cacheDebug() + if cd.ProviderUsed == nil || cd.InputTokens == nil { + t.Fatalf("expected provider_used + input_tokens stamped on semantic-search miss, got %+v", cd) + } + }) + + // 2.4 threshold_header_relax — low threshold accepts unrelated as hit. + t.Run("2.4_threshold_header_relax", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.4_threshold_header_relax") + key := "phase2-k4" + pair := pairByName(t, "vinaigrette") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + low := 0.1 + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Unrelated), cacheHeaders{Key: key, Threshold: &low}) + _ = assertHit(t, lc, 5, respB, "semantic") + }) + + // 2.5 threshold_header_tighten — high threshold rejects a normally-hit paraphrase. + t.Run("2.5_threshold_header_tighten", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.5_threshold_header_tighten") + key := "phase2-k5" + pair := pairByName(t, "opera_composer") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + high := 0.999 + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, Threshold: &high}) + _ = assertMiss(t, lc, 5, respB) + }) + + // 2.6 threshold_clamp_above — threshold > 1.0 clamps to 1.0 → miss. + t.Run("2.6_threshold_clamp_above", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.6_threshold_clamp_above") + key := "phase2-k6" + pair := pairByName(t, "photosynthesis") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + over := 2.0 // clamps to 1.0 + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, Threshold: &over}) + _ = assertMiss(t, lc, 5, respB) + }) + + // 2.7 threshold_clamp_below — threshold < 0 clamps to 0 → hits anything. + t.Run("2.7_threshold_clamp_below", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.7_threshold_clamp_below") + key := "phase2-k7" + pair := pairByName(t, "capital_france") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + under := -1.0 // clamps to 0.0 + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Unrelated), cacheHeaders{Key: key, Threshold: &under}) + _ = assertHit(t, lc, 5, respB, "semantic") + }) + + // 2.8 cache_type_direct_in_semantic — x-bf-cache-type=direct on a paraphrase + // suppresses semantic search; B misses despite high similarity. + t.Run("2.8_cache_type_direct_in_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.8_cache_type_direct_in_semantic") + key := "phase2-k8" + pair := pairByName(t, "boiling_water") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, Type: "direct"}) + _ = assertMiss(t, lc, 5, respB) + }) + + // 2.9 cache_type_semantic_only_exact — x-bf-cache-type=semantic on identical + // body still produces a hit, but via the semantic path (direct suppressed). + t.Run("2.9_cache_type_semantic_only_exact", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.9_cache_type_semantic_only_exact") + key := "phase2-k9" + req := simpleChat(cfg.OpenAIModel, "Recommend one short documentary film about science.") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, req, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, req, cacheHeaders{Key: key, Type: "semantic"}) + _ = assertHit(t, lc, 5, respB, "semantic") + }) + + // 2.10 cache_key_isolation_semantic — paraphrases under different keys → miss. + t.Run("2.10_cache_key_isolation_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.10_cache_key_isolation_semantic") + pair := pairByName(t, "vinaigrette") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: "phase2-k10"})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: "phase2-k10-alt"}) + _ = assertMiss(t, lc, 5, respB) + }) + + // 2.11 cache_by_model_isolation_semantic — different models, default flag → miss. + t.Run("2.11_cache_by_model_isolation_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.11_cache_by_model_isolation_semantic") + key := "phase2-k11" + pair := pairByName(t, "photosynthesis") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModelAlt, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 5, respB) + }) + + // 2.12 cache_by_model_false_semantic — flip flag, paraphrase cross-model → hit. + t.Run("2.12_cache_by_model_false_semantic", func(t *testing.T) { + // Serial: mutates plugin config (cache_by_model=false). + lc := newLogCtx("semantic", "2.12_cache_by_model_false_semantic") + + cfg2 := semanticBaseline() + cfg2["cache_by_model"] = false + pluginUpdate(t, lc, 1, true, cfg2) + t.Cleanup(func() { restoreSemanticBaseline(t, lc, 99) }) + + key := "phase2-k12" + pair := pairByName(t, "opera_composer") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModelAlt, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + }) + + // 2.13 cross_provider_semantic — both cache_by_* flags off; paraphrase across providers → hit. + t.Run("2.13_cross_provider_semantic", func(t *testing.T) { + // Serial: mutates plugin config (cache_by_provider/model=false). + if cfg.AnthroModel == "" { + t.Skip("anthropic model not configured") + } + lc := newLogCtx("semantic", "2.13_cross_provider_semantic") + + cfg2 := semanticBaseline() + cfg2["cache_by_model"] = false + cfg2["cache_by_provider"] = false + pluginUpdate(t, lc, 1, true, cfg2) + t.Cleanup(func() { restoreSemanticBaseline(t, lc, 99) }) + + key := "phase2-k13" + pair := pairByName(t, "capital_france") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + respB := postChat(t, lc, 4, simpleChat(cfg.AnthroModel, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + }) + + // 2.14 streaming_semantic_replay — paraphrase across two SSE streams → B replays. + t.Run("2.14_streaming_semantic_replay", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.14_streaming_semantic_replay") + key := "phase2-k14" + pair := pairByName(t, "boiling_water") + + respA := postChatStream(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + + respB := postChatStream(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + if len(respB.dataChunks()) != len(respA.dataChunks()) { + t.Fatalf("expected B chunk count %d to match A's %d", len(respB.dataChunks()), len(respA.dataChunks())) + } + }) + + // 2.15 semantic_then_direct_same_request — paraphrase stores; exact same body + // hits via direct (faster, embedding-cost fields absent on B). + t.Run("2.15_semantic_then_direct_same_request", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.15_semantic_then_direct_same_request") + key := "phase2-k15" + pair := pairByName(t, "vinaigrette") + + // A: canonical body — stores under direct ID. + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + // B: same canonical body — direct runs first and hits. + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "direct") + cd := respB.cacheDebug() + if cd.ProviderUsed != nil || cd.ModelUsed != nil || cd.InputTokens != nil { + t.Fatalf("expected provider_used/model_used/input_tokens NIL on direct hit (no embedding generated), got %+v", cd) + } + }) + + // 2.16 clear_cache_id_semantic — populate via semantic, delete by id, retry → miss. + t.Run("2.16_clear_cache_id_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.16_clear_cache_id_semantic") + key := "phase2-k16" + pair := pairByName(t, "opera_composer") + + respA := postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key}) + idA := assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + // Confirm paraphrase hits. + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}), "semantic") + + if got := clearByCacheID(t, lc, 6, idA); got != http.StatusOK { + t.Fatalf("expected 200 from clear-by-id, got %d", got) + } + + // Paraphrase now misses. + _ = assertMiss(t, lc, 8, postChat(t, lc, 7, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key})) + }) + + // 2.17 clear_by_key_semantic — populate 2 paraphrases, clear-by-key, all miss. + t.Run("2.17_clear_by_key_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.17_clear_by_key_semantic") + key := "phase2-k17" + pair1 := pairByName(t, "capital_france") + pair2 := pairByName(t, "photosynthesis") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair1.Canonical), cacheHeaders{Key: key})) + _ = assertMiss(t, lc, 4, postChat(t, lc, 3, simpleChat(cfg.OpenAIModel, pair2.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 5) + + if got := clearByCacheKey(t, lc, 6, key); got != http.StatusOK { + t.Fatalf("expected 200, got %d", got) + } + + _ = assertMiss(t, lc, 8, postChat(t, lc, 7, simpleChat(cfg.OpenAIModel, pair1.Paraphrase), cacheHeaders{Key: key})) + _ = assertMiss(t, lc, 10, postChat(t, lc, 9, simpleChat(cfg.OpenAIModel, pair2.Paraphrase), cacheHeaders{Key: key})) + }) + + // 2.18 ttl_expiry_semantic — sleep past TTL, paraphrase misses. + t.Run("2.18_ttl_expiry_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.18_ttl_expiry_semantic") + key := "phase2-k18" + pair := pairByName(t, "boiling_water") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, TTL: "5s"})) + waitForCacheWrite(t, lc, 3) + + // Confirm hit within TTL. + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, TTL: "5s"}), "semantic") + + wait := 6 * time.Second + logf(t, lc.at(6), "INFO", "sleep_past_ttl", map[string]any{"seconds": wait.Seconds()}) + time.Sleep(wait) + + _ = assertMiss(t, lc, 8, postChat(t, lc, 7, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, TTL: "5s"})) + }) + + // 2.19 ttl_per_request_semantic — distinct shorter TTL applies. + t.Run("2.19_ttl_per_request_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.19_ttl_per_request_semantic") + key := "phase2-k19" + pair := pairByName(t, "vinaigrette") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, TTL: "4s"})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, TTL: "4s"}), "semantic") + + time.Sleep(5 * time.Second) + _ = assertMiss(t, lc, 7, postChat(t, lc, 6, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, TTL: "4s"})) + }) + + // 2.20 no_store_semantic — A no-store; B paraphrase → miss (nothing stored). + t.Run("2.20_no_store_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.20_no_store_semantic") + key := "phase2-k20" + pair := pairByName(t, "opera_composer") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, NoStore: "true"})) + waitForCacheWrite(t, lc, 3) + + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key})) + }) + + // 2.21 exclude_system_prompt_semantic — flag flips system out of hash + embedding; + // paraphrase + different systems → semantic hit. + t.Run("2.21_exclude_system_prompt_semantic", func(t *testing.T) { + // Serial: mutates plugin config. + lc := newLogCtx("semantic", "2.21_exclude_system_prompt_semantic") + cfg2 := semanticBaseline() + cfg2["exclude_system_prompt"] = true + pluginUpdate(t, lc, 1, true, cfg2) + t.Cleanup(func() { restoreSemanticBaseline(t, lc, 99) }) + + key := "phase2-k21" + pair := pairByName(t, "capital_france") + userA := pair.Canonical + userB := pair.Paraphrase + + _ = assertMiss(t, lc, 3, postChat(t, lc, 2, chatWithSystem(cfg.OpenAIModel, "You are a geographer.", userA), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 4) + + _ = assertHit(t, lc, 6, postChat(t, lc, 5, chatWithSystem(cfg.OpenAIModel, "You are a poet.", userB), cacheHeaders{Key: key}), "semantic") + }) + + // 2.22 conversation_threshold_semantic — 4-message conversation skipped entirely. + t.Run("2.22_conversation_threshold_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.22_conversation_threshold_semantic") + key := "phase2-k22" + + msgs := []chatMessage{ + {Role: "user", Content: textContent("Hi.")}, + {Role: "assistant", Content: textContent("Hello! How can I help?")}, + {Role: "user", Content: textContent("Tell me about the boiling point of water.")}, + {Role: "user", Content: textContent("Actually, just give me the temperature in Celsius.")}, + } + req := chatRequest{Model: cfg.OpenAIModel, Messages: msgs} + + assertNoCacheDebug(t, lc, 2, postChat(t, lc, 1, req, cacheHeaders{Key: key})) + assertNoCacheDebug(t, lc, 4, postChat(t, lc, 3, req, cacheHeaders{Key: key})) + }) + + // 2.23 attachments_change_semantic — paraphrase + different image URL → miss + // (attachments part of params_hash, filter excludes). + t.Run("2.23_attachments_change_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.23_attachments_change_semantic") + key := "phase2-k23" + textA := "What's pictured in this image?" + textB := "Describe the contents of this image." + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, chatWithImage(cfg.OpenAIModel, textA, testImageURL1), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, chatWithImage(cfg.OpenAIModel, textB, testImageURL2), cacheHeaders{Key: key})) + }) + + // 2.24 embedding_endpoint_semantic_skip — embedding requests bypass semantic + // search entirely (PreLLMHook semanticEligible check). Exact match hits + // direct; different input misses (no paraphrase match attempt). + t.Run("2.24_embedding_endpoint_semantic_skip", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.24_embedding_endpoint_semantic_skip") + key := "phase2-k24" + req := embeddingRequest{Model: "openai/" + cfg.OpenAIEmbed, Input: "The cat sat on the mat."} + + _ = assertMiss(t, lc, 2, postEmbedding(t, lc, 1, req, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postEmbedding(t, lc, 4, req, cacheHeaders{Key: key}), "direct") + + // Different input — no semantic fallback, just direct miss. + req2 := embeddingRequest{Model: "openai/" + cfg.OpenAIEmbed, Input: "The dog chased the ball."} + _ = assertMiss(t, lc, 7, postEmbedding(t, lc, 6, req2, cacheHeaders{Key: key})) + }) + + // 2.25 image_gen_semantic_paraphrase — image prompts paraphrase across two calls. + t.Run("2.25_image_gen_semantic_paraphrase", func(t *testing.T) { + t.Parallel() + if os.Getenv("SC_SKIP_IMAGE_GEN") == "1" { + t.Skip("SC_SKIP_IMAGE_GEN=1") + } + lc := newLogCtx("semantic", "2.25_image_gen_semantic_paraphrase") + key := "phase2-k25" + pair := imagePairByName(t, "red_apple") + n := 1 + reqA := imageGenRequest{Model: "openai/dall-e-3", Prompt: pair.Canonical, N: &n, Size: "1024x1024"} + reqB := imageGenRequest{Model: "openai/dall-e-3", Prompt: pair.Paraphrase, N: &n, Size: "1024x1024"} + + _ = assertMiss(t, lc, 2, postImageGen(t, lc, 1, reqA, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postImageGen(t, lc, 4, reqB, cacheHeaders{Key: key}), "semantic") + }) + + // 2.26 responses_api_semantic — paraphrase on /v1/responses → semantic hit. + t.Run("2.26_responses_api_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.26_responses_api_semantic") + key := "phase2-k26" + pair := pairByName(t, "photosynthesis") + reqA := responsesRequest{Model: cfg.OpenAIModel, Input: pair.Canonical} + reqB := responsesRequest{Model: cfg.OpenAIModel, Input: pair.Paraphrase} + + _ = assertMiss(t, lc, 2, postResponses(t, lc, 1, reqA, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postResponses(t, lc, 4, reqB, cacheHeaders{Key: key}), "semantic") + }) + + // 2.27 text_completion_semantic — paraphrase on /v1/completions → semantic hit. + t.Run("2.27_text_completion_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.27_text_completion_semantic") + key := "phase2-k27" + maxTok := 40 + reqA := textCompletionRequest{Model: "openai/gpt-3.5-turbo-instruct", Prompt: "Briefly explain how photosynthesis works in green plants.", MaxTokens: &maxTok} + reqB := textCompletionRequest{Model: "openai/gpt-3.5-turbo-instruct", Prompt: "In a few sentences, describe how photosynthesis works in green plants.", MaxTokens: &maxTok} + + _ = assertMiss(t, lc, 2, postTextCompletion(t, lc, 1, reqA, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postTextCompletion(t, lc, 4, reqB, cacheHeaders{Key: key}), "semantic") + }) + + // 2.28 gemini_semantic_hit — chat provider != embedding provider. + t.Run("2.28_gemini_semantic_hit", func(t *testing.T) { + t.Parallel() + if cfg.GeminiModel == "" { + t.Skip("gemini model not configured") + } + lc := newLogCtx("semantic", "2.28_gemini_semantic_hit") + key := "phase2-k28" + pair := pairByName(t, "capital_france") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.GeminiModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.GeminiModel, pair.Paraphrase), cacheHeaders{Key: key}), "semantic") + }) + + // 2.29 params_hash_isolates_semantic — paraphrases with different temperatures → miss. + t.Run("2.29_params_hash_isolates_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.29_params_hash_isolates_semantic") + key := "phase2-k29" + pair := pairByName(t, "boiling_water") + + reqA := simpleChat(cfg.OpenAIModel, pair.Canonical) + t1 := 0.2 + reqA.Temperature = &t1 + reqB := simpleChat(cfg.OpenAIModel, pair.Paraphrase) + t2 := 0.9 + reqB.Temperature = &t2 + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key})) + }) + + // 2.30 plugin_status_semantic — GET shows status active + semantic config. + t.Run("2.30_plugin_status_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.30_plugin_status_semantic") + p, exists := pluginGet(t, lc, 1) + if !exists { + t.Fatalf("plugin should exist mid-phase") + } + if !p.Enabled || p.Status.Status != "active" { + t.Fatalf("expected enabled+active, got enabled=%v status=%q", p.Enabled, p.Status.Status) + } + gotProvider, _ := p.Config["provider"].(string) + if gotProvider != "openai" { + t.Fatalf("expected provider=openai, got %q", gotProvider) + } + gotDim, _ := p.Config["dimension"].(float64) + if int(gotDim) != 1536 { + t.Fatalf("expected dimension=1536, got %v", p.Config["dimension"]) + } + }) + + // 2.31 namespace_change_isolates — entries scoped to namespace; flipping + // the namespace makes prior entries unreachable, flipping back restores. + t.Run("2.31_namespace_change_isolates", func(t *testing.T) { + // Serial: mutates plugin config (vector_store_namespace). + lc := newLogCtx("semantic", "2.31_namespace_change_isolates") + // Use a known body for direct-cache reproducibility. + body := "What is the boiling point of pure water at standard pressure?" + key := "phase2-k31a" + altNS := cfg.Namespace + "Alt" + // Step 7 will store an entry in altNS. The outer t.Cleanup at the + // suite level iterates allKeys against whatever namespace the plugin + // currently points at — once we restore baseline below, the altNS + // entry becomes unreachable from there. Flip back to altNS, clear, + // then restore baseline. + t.Cleanup(func() { + altCfg := semanticBaseline() + altCfg["vector_store_namespace"] = altNS + pluginUpdate(t, lc, 97, true, altCfg) + _ = clearByCacheKey(t, lc.at(98), 98, key) + restoreSemanticBaseline(t, lc, 99) + }) + + // Phase 2 baseline is namespace=cfg.Namespace. Populate an entry. + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + // Confirm hit under baseline namespace. + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}), "direct") + + // Flip to alternate namespace; same body should miss. + cfg2 := semanticBaseline() + cfg2["vector_store_namespace"] = altNS + pluginUpdate(t, lc, 6, true, cfg2) + _ = assertMiss(t, lc, 8, postChat(t, lc, 7, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key})) + + // Flip back to baseline; entry should resurface. + pluginUpdate(t, lc, 9, true, semanticBaseline()) + _ = assertHit(t, lc, 11, postChat(t, lc, 10, simpleChat(cfg.OpenAIModel, body), cacheHeaders{Key: key}), "direct") + }) + + // 2.32 dimension_change_silent_miss — write at dim 1536, switch model to + // dim 3072 same namespace; reads should miss (UI banner warns about this). + // Documents actual behavior — error vs silent miss vs warn. + t.Run("2.32_dimension_change_silent_miss", func(t *testing.T) { + // Serial: mutates plugin config (embedding_model + dimension). + lc := newLogCtx("semantic", "2.32_dimension_change_silent_miss") + t.Cleanup(func() { restoreSemanticBaseline(t, lc, 99) }) + + key := "phase2-k32" + pair := pairByName(t, "opera_composer") + + // Write under dim=1536. + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + + // Switch to text-embedding-3-large (dim 3072) on the SAME namespace. + cfg2 := semanticConfig("openai", "text-embedding-3-large", 3072, ttlSemantic, thresholdSemantic, defaultKeySemantic) + cfg2["vector_store_namespace"] = semanticNamespace() + pluginUpdate(t, lc, 4, true, cfg2) + + // Read paraphrase. Expected: miss (UI warns: "reads will silently miss"). + // If Bifrost errors instead, the test will fail at postChat with status!=200 + // — that surfaces a different actual behavior worth documenting. + respB := postChat(t, lc, 5, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}) + if cd := respB.cacheDebug(); cd != nil && cd.CacheHit { + t.Fatalf("expected miss (UI banner: dim change makes reads silently miss); got hit cache_id=%s", deref(cd.CacheID)) + } + logf(t, lc.at(6), "PASS", "dimension_change_silent_miss_documented", map[string]any{ + "behavior": "miss", + }) + }) + + // 2.33 streaming_tool_calls_replay — paraphrase preserves tool_calls in replay. + t.Run("2.33_streaming_tool_calls_replay", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.33_streaming_tool_calls_replay") + key := "phase2-k33" + toolDef := chatTool{Type: "function", Function: &toolFunction{ + Name: "get_weather", Description: "Get the current weather in a city", + Parameters: map[string]any{"type": "object", "properties": map[string]any{"city": map[string]any{"type": "string"}}, "required": []string{"city"}}, + }} + + reqA := simpleChat(cfg.OpenAIModel, "What's the current weather in Tokyo right now?") + reqA.Tools = []chatTool{toolDef} + reqB := simpleChat(cfg.OpenAIModel, "Tell me the present weather in Tokyo right now.") + reqB.Tools = []chatTool{toolDef} + + respA := postChatStream(t, lc, 1, reqA, cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + waitForCacheWrite(t, lc, 3) + respB := postChatStream(t, lc, 4, reqB, cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + if len(respB.dataChunks()) != len(respA.dataChunks()) { + t.Fatalf("chunk count mismatch: A=%d B=%d", len(respA.dataChunks()), len(respB.dataChunks())) + } + }) + + // 2.34 tools_order_independent_semantic — paraphrase with reordered tools → hit. + t.Run("2.34_tools_order_independent_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.34_tools_order_independent_semantic") + key := "phase2-k34" + toolA := chatTool{Type: "function", Function: &toolFunction{Name: "get_weather", Parameters: map[string]any{"type": "object", "properties": map[string]any{"city": map[string]any{"type": "string"}}}}} + toolB := chatTool{Type: "function", Function: &toolFunction{Name: "search_web", Parameters: map[string]any{"type": "object", "properties": map[string]any{"query": map[string]any{"type": "string"}}}}} + + reqA := simpleChat(cfg.OpenAIModel, "What is the capital city of France in modern times?") + reqA.Tools = []chatTool{toolA, toolB} + reqB := simpleChat(cfg.OpenAIModel, "Tell me the capital city of France in modern times.") + reqB.Tools = []chatTool{toolB, toolA} + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key}), "semantic") + }) + + // 2.35 tools_function_name_change_semantic — different tool names → miss. + t.Run("2.35_tools_function_name_change_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.35_tools_function_name_change_semantic") + key := "phase2-k35" + mkTool := func(name string) chatTool { + return chatTool{Type: "function", Function: &toolFunction{Name: name, Parameters: map[string]any{"type": "object", "properties": map[string]any{"q": map[string]any{"type": "string"}}}}} + } + reqA := simpleChat(cfg.OpenAIModel, "Briefly explain how photosynthesis works in green plants.") + reqA.Tools = []chatTool{mkTool("search")} + reqB := simpleChat(cfg.OpenAIModel, "In a few sentences, describe how photosynthesis works in green plants.") + reqB.Tools = []chatTool{mkTool("lookup")} + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + // Wait so reqA's write commits; otherwise reqB misses for trivial + // reasons (empty cache) rather than tool-name isolation. + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key})) + }) + + // 2.36 prompt_cache_key_semantic — different prompt_cache_key → miss. + t.Run("2.36_prompt_cache_key_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.36_prompt_cache_key_semantic") + key := "phase2-k36" + pair := pairByName(t, "vinaigrette") + + reqA := simpleChat(cfg.OpenAIModel, pair.Canonical) + pckA := "tenant-X" + reqA.PromptCacheKey = &pckA + reqB := simpleChat(cfg.OpenAIModel, pair.Paraphrase) + pckB := "tenant-Y" + reqB.PromptCacheKey = &pckB + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + // Wait so reqA's write commits; otherwise reqB misses for trivial + // reasons (empty cache) rather than prompt_cache_key isolation. + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key})) + }) + + // 2.37 service_tier_semantic — different service_tier → miss. + t.Run("2.37_service_tier_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.37_service_tier_semantic") + key := "phase2-k37" + pair := pairByName(t, "capital_france") + + reqA := simpleChat(cfg.OpenAIModel, pair.Canonical) + stA := "default" + reqA.ServiceTier = &stA + reqB := simpleChat(cfg.OpenAIModel, pair.Paraphrase) + stB := "auto" + reqB.ServiceTier = &stB + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + // Wait so reqA's write commits; otherwise reqB misses for trivial + // reasons (empty cache) rather than service_tier isolation. + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key})) + }) + + // 2.38 store_flag_semantic — different store flag → miss. + t.Run("2.38_store_flag_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.38_store_flag_semantic") + key := "phase2-k38" + pair := pairByName(t, "boiling_water") + + reqA := simpleChat(cfg.OpenAIModel, pair.Canonical) + storeA := true + reqA.Store = &storeA + reqB := simpleChat(cfg.OpenAIModel, pair.Paraphrase) + storeB := false + reqB.Store = &storeB + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, reqA, cacheHeaders{Key: key})) + // Wait so reqA's write commits; otherwise reqB misses for trivial + // reasons (empty cache) rather than store-flag isolation. + waitForCacheWrite(t, lc, 3) + _ = assertMiss(t, lc, 5, postChat(t, lc, 4, reqB, cacheHeaders{Key: key})) + }) + + // 2.39 responses_previous_response_id_semantic — different previous_response_id → miss. + t.Run("2.39_responses_previous_response_id_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.39_responses_previous_response_id_semantic") + key := "phase2-k39" + + // Seed two response IDs. Distinct cache keys are essential — sharing one + // key would cause the second seed to semantic-hit the first and return + // the SAME response id, defeating the isolation test. + seed1 := postResponses(t, lc, 1, responsesRequest{Model: cfg.OpenAIModel, Input: "Recite the first digit of pi."}, cacheHeaders{Key: "phase2-k39-seedA", NoStore: "true"}) + var s1 struct { + ID string `json:"id"` + } + if err := json.Unmarshal(seed1.bodyRaw, &s1); err != nil || s1.ID == "" { + t.Skipf("could not seed response id: %v", err) + } + seed2 := postResponses(t, lc, 2, responsesRequest{Model: cfg.OpenAIModel, Input: "Name the largest moon of Jupiter."}, cacheHeaders{Key: "phase2-k39-seedB", NoStore: "true"}) + var s2 struct { + ID string `json:"id"` + } + if err := json.Unmarshal(seed2.bodyRaw, &s2); err != nil || s2.ID == "" { + t.Skipf("could not seed second response id: %v", err) + } + if s1.ID == s2.ID { + t.Skipf("seed response ids collided (%s); test prerequisite not met", s1.ID) + } + + reqA := responsesRequest{Model: cfg.OpenAIModel, Input: "Continue from before.", PreviousResponseID: &s1.ID} + reqB := responsesRequest{Model: cfg.OpenAIModel, Input: "Continue from prior.", PreviousResponseID: &s2.ID} + + _ = assertMiss(t, lc, 4, postResponses(t, lc, 3, reqA, cacheHeaders{Key: key})) + // Wait so reqA's write commits; otherwise reqB misses for trivial + // reasons (empty cache) rather than previous_response_id isolation. + waitForCacheWrite(t, lc, 5) + _ = assertMiss(t, lc, 7, postResponses(t, lc, 6, reqB, cacheHeaders{Key: key})) + }) + + // 2.40 no_store_explicit_false_semantic — header value "false" doesn't toggle. + t.Run("2.40_no_store_explicit_false_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.40_no_store_explicit_false_semantic") + key := "phase2-k40" + pair := pairByName(t, "opera_composer") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, NoStore: "false"})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, NoStore: "false"}), "semantic") + }) + + // 2.41 no_store_uppercase_true_semantic — case-sensitive match; "TRUE" does NOT block. + t.Run("2.41_no_store_uppercase_true_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.41_no_store_uppercase_true_semantic") + key := "phase2-k41" + pair := pairByName(t, "photosynthesis") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, NoStore: "TRUE"})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, NoStore: "TRUE"}), "semantic") + }) + + // 2.42 streaming_non_final_chunks_no_cache_debug_semantic — only final + // chunk has cache_debug, both on miss (semantic search ran) and hit (semantic replay). + t.Run("2.42_streaming_non_final_chunks_no_cache_debug_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.42_streaming_non_final_chunks_no_cache_debug_semantic") + key := "phase2-k42" + pair := pairByName(t, "vinaigrette") + + check := func(stage string, resp *streamResponse) { + data := resp.dataChunks() + if len(data) == 0 { + t.Fatalf("[%s] no data chunks", stage) + } + for i := 0; i < len(data)-1; i++ { + if cd := data[i].cacheDebug(); cd != nil { + t.Fatalf("[%s] non-final chunk %d had cache_debug: %+v", stage, i, cd) + } + } + if data[len(data)-1].cacheDebug() == nil { + t.Fatalf("[%s] final chunk missing cache_debug", stage) + } + } + + respA := postChatStream(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key}) + _ = assertMiss(t, lc, 2, respA) + check("miss-with-semantic-search", respA) + waitForCacheWrite(t, lc, 3) + respB := postChatStream(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}) + _ = assertHit(t, lc, 5, respB, "semantic") + check("hit-semantic", respB) + }) + + // 2.43 ttl_zero_per_request_semantic — TTL=0s falls back to default; B paraphrase hits. + t.Run("2.43_ttl_zero_per_request_semantic", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.43_ttl_zero_per_request_semantic") + key := "phase2-k43" + pair := pairByName(t, "capital_france") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key, TTL: "0s"})) + waitForCacheWrite(t, lc, 3) + _ = assertHit(t, lc, 5, postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key, TTL: "0s"}), "semantic") + }) + + // 2.44 cache_debug_in_logs_endpoint — cross-check persisted log row's + // cache_debug column against the in-flight semantic hit. In semantic mode + // cache_debug carries the richest field set (provider_used, model_used, + // input_tokens, threshold, similarity), making this a high-value drift + // check. + t.Run("2.44_cache_debug_in_logs_endpoint", func(t *testing.T) { + t.Parallel() + lc := newLogCtx("semantic", "2.44_cache_debug_in_logs_endpoint") + key := "phase2-k44" + pair := pairByName(t, "vinaigrette") + + _ = assertMiss(t, lc, 2, postChat(t, lc, 1, simpleChat(cfg.OpenAIModel, pair.Canonical), cacheHeaders{Key: key})) + waitForCacheWrite(t, lc, 3) + respB := postChat(t, lc, 4, simpleChat(cfg.OpenAIModel, pair.Paraphrase), cacheHeaders{Key: key}) + respCD := assertHitAndReturnCacheDebug(t, lc, 5, respB, "semantic") + + entry := findLogByCacheDebug(t, lc, 6, respCD) + assertLogMatchesResponseCacheDebug(t, lc, 7, respCD, entry.CacheDebug) + }) + + logf(t, newLogCtx("semantic", "teardown").at(99), "TEARDOWN", "phase_end", nil) +} diff --git a/transports/bifrost-http/handlers/cache.go b/transports/bifrost-http/handlers/cache.go index c46515dc60..c2becb1d4b 100644 --- a/transports/bifrost-http/handlers/cache.go +++ b/transports/bifrost-http/handlers/cache.go @@ -3,38 +3,55 @@ package handlers import ( "github.com/fasthttp/router" "github.com/maximhq/bifrost/core/schemas" - "github.com/maximhq/bifrost/plugins/semanticcache" "github.com/maximhq/bifrost/transports/bifrost-http/lib" "github.com/valyala/fasthttp" ) -type CacheHandler struct { - plugin *semanticcache.Plugin +// CacheClearer is the minimal contract the handler needs from the semantic +// cache plugin. Exported so the server wiring can supply a resolver without +// pulling in the plugin's concrete type and so tests can substitute a fake. +type CacheClearer interface { + ClearCacheForCacheID(cacheID string) error + ClearCacheForKey(cacheKey string) error } -func NewCacheHandler(plugin schemas.LLMPlugin) *CacheHandler { - semanticCachePlugin, ok := plugin.(*semanticcache.Plugin) - if !ok { - logger.Fatal("Cache handler requires a semantic cache plugin") - } +// CacheClearerResolver returns the currently-loaded cache plugin or nil if +// none is loaded. Called on every cache-clear request so plugin lifecycle +// (POST/PUT/DELETE /api/plugins) is honored — without this, the handler +// would hold a stale pointer after a plugin reload and the routes would +// silently misbehave (or never exist at all if the plugin was loaded +// post-boot rather than at startup). +type CacheClearerResolver func() CacheClearer - return &CacheHandler{ - plugin: semanticCachePlugin, - } +type CacheHandler struct { + resolve CacheClearerResolver +} + +// NewCacheHandler returns a CacheHandler that resolves the current plugin +// at request time. The handler is safe to wire unconditionally — when no +// plugin is loaded, each cache-clear request returns HTTP 400 with a clear +// message rather than the route being absent (HTTP 405). +func NewCacheHandler(resolve CacheClearerResolver) *CacheHandler { + return &CacheHandler{resolve: resolve} } func (h *CacheHandler) RegisterRoutes(r *router.Router, middlewares ...schemas.BifrostHTTPMiddleware) { - r.DELETE("/api/cache/clear/{requestId}", lib.ChainMiddlewares(h.clearCache, middlewares...)) + r.DELETE("/api/cache/clear/{cacheId}", lib.ChainMiddlewares(h.clearCache, middlewares...)) r.DELETE("/api/cache/clear-by-key/{cacheKey}", lib.ChainMiddlewares(h.clearCacheByKey, middlewares...)) } func (h *CacheHandler) clearCache(ctx *fasthttp.RequestCtx) { - requestID, ok := ctx.UserValue("requestId").(string) - if !ok { - SendError(ctx, fasthttp.StatusBadRequest, "Invalid request ID") + plugin := h.resolve() + if plugin == nil { + SendError(ctx, fasthttp.StatusBadRequest, "semantic_cache plugin is not loaded") + return + } + cacheID, ok := ctx.UserValue("cacheId").(string) + if !ok || cacheID == "" { + SendError(ctx, fasthttp.StatusBadRequest, "Invalid cache ID") return } - if err := h.plugin.ClearCacheForRequestID(requestID); err != nil { + if err := plugin.ClearCacheForCacheID(cacheID); err != nil { SendError(ctx, fasthttp.StatusInternalServerError, "Failed to clear cache") return } @@ -45,12 +62,17 @@ func (h *CacheHandler) clearCache(ctx *fasthttp.RequestCtx) { } func (h *CacheHandler) clearCacheByKey(ctx *fasthttp.RequestCtx) { + plugin := h.resolve() + if plugin == nil { + SendError(ctx, fasthttp.StatusBadRequest, "semantic_cache plugin is not loaded") + return + } cacheKey, ok := ctx.UserValue("cacheKey").(string) if !ok { SendError(ctx, fasthttp.StatusBadRequest, "Invalid cache key") return } - if err := h.plugin.ClearCacheForKey(cacheKey); err != nil { + if err := plugin.ClearCacheForKey(cacheKey); err != nil { SendError(ctx, fasthttp.StatusInternalServerError, "Failed to clear cache") return } diff --git a/transports/bifrost-http/handlers/cache_test.go b/transports/bifrost-http/handlers/cache_test.go new file mode 100644 index 0000000000..c92b50b7dc --- /dev/null +++ b/transports/bifrost-http/handlers/cache_test.go @@ -0,0 +1,176 @@ +package handlers + +import ( + "errors" + "strings" + "testing" + + "github.com/valyala/fasthttp" +) + +// fakeCacheClearer records calls and returns configured errors so the handler +// branches can be exercised without a real semantic cache plugin. +type fakeCacheClearer struct { + clearByID func(string) error + clearByKey func(string) error + idCalls []string + keyCalls []string +} + +func (f *fakeCacheClearer) ClearCacheForCacheID(id string) error { + f.idCalls = append(f.idCalls, id) + if f.clearByID != nil { + return f.clearByID(id) + } + return nil +} + +func (f *fakeCacheClearer) ClearCacheForKey(key string) error { + f.keyCalls = append(f.keyCalls, key) + if f.clearByKey != nil { + return f.clearByKey(key) + } + return nil +} + +func newCacheCtx(userKey, userVal string) *fasthttp.RequestCtx { + ctx := &fasthttp.RequestCtx{} + if userKey != "" { + ctx.SetUserValue(userKey, userVal) + } + return ctx +} + +// newCacheHandler builds a CacheHandler whose resolver always returns the +// given fake — mimics a steady-state "plugin loaded" environment. +func newCacheHandler(clearer CacheClearer) *CacheHandler { + return NewCacheHandler(func() CacheClearer { return clearer }) +} + +// ----------------------------------------------------------------------------- +// clearCache (DELETE /api/cache/clear/{cacheId}) +// ----------------------------------------------------------------------------- + +func TestClearCache_OK(t *testing.T) { + clearer := &fakeCacheClearer{} + h := newCacheHandler(clearer) + + ctx := newCacheCtx("cacheId", "abc-123") + h.clearCache(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusOK { + t.Fatalf("expected 200, got %d body=%s", got, ctx.Response.Body()) + } + if len(clearer.idCalls) != 1 || clearer.idCalls[0] != "abc-123" { + t.Fatalf("expected ClearCacheForCacheID('abc-123'), got %v", clearer.idCalls) + } +} + +func TestClearCache_RejectsEmptyID(t *testing.T) { + clearer := &fakeCacheClearer{} + h := newCacheHandler(clearer) + + ctx := newCacheCtx("cacheId", "") + h.clearCache(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusBadRequest { + t.Fatalf("expected 400 for empty id, got %d", got) + } + if len(clearer.idCalls) != 0 { + t.Fatalf("expected no Clear calls on bad id, got %v", clearer.idCalls) + } +} + +func TestClearCache_MissingUserValue(t *testing.T) { + clearer := &fakeCacheClearer{} + h := newCacheHandler(clearer) + + // No user value set at all (simulates a routing misconfiguration). + ctx := &fasthttp.RequestCtx{} + h.clearCache(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusBadRequest { + t.Fatalf("expected 400 when cacheId user value missing, got %d", got) + } +} + +func TestClearCache_PluginErrorReturns500(t *testing.T) { + clearer := &fakeCacheClearer{ + clearByID: func(string) error { return errors.New("store unavailable") }, + } + h := newCacheHandler(clearer) + + ctx := newCacheCtx("cacheId", "abc-123") + h.clearCache(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusInternalServerError { + t.Fatalf("expected 500 on plugin error, got %d", got) + } + if !strings.Contains(string(ctx.Response.Body()), "Failed to clear cache") { + t.Fatalf("expected 'Failed to clear cache' in body, got %s", ctx.Response.Body()) + } +} + +// TestClearCache_PluginNotLoaded covers the regression where the handler +// would 405 (route absent) or panic on a nil pointer when the plugin +// wasn't loaded at boot. The new resolver-based handler must return 400. +func TestClearCache_PluginNotLoaded(t *testing.T) { + h := NewCacheHandler(func() CacheClearer { return nil }) + + ctx := newCacheCtx("cacheId", "abc-123") + h.clearCache(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusBadRequest { + t.Fatalf("expected 400 when plugin not loaded, got %d", got) + } + if !strings.Contains(string(ctx.Response.Body()), "semantic_cache plugin is not loaded") { + t.Fatalf("expected plugin-not-loaded message, got %s", ctx.Response.Body()) + } +} + +// ----------------------------------------------------------------------------- +// clearCacheByKey (DELETE /api/cache/clear-by-key/{cacheKey}) +// ----------------------------------------------------------------------------- + +func TestClearCacheByKey_OK(t *testing.T) { + clearer := &fakeCacheClearer{} + h := newCacheHandler(clearer) + + ctx := newCacheCtx("cacheKey", "session-42") + h.clearCacheByKey(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusOK { + t.Fatalf("expected 200, got %d body=%s", got, ctx.Response.Body()) + } + if len(clearer.keyCalls) != 1 || clearer.keyCalls[0] != "session-42" { + t.Fatalf("expected ClearCacheForKey('session-42'), got %v", clearer.keyCalls) + } +} + +func TestClearCacheByKey_PluginErrorReturns500(t *testing.T) { + clearer := &fakeCacheClearer{ + clearByKey: func(string) error { return errors.New("vector store down") }, + } + h := newCacheHandler(clearer) + + ctx := newCacheCtx("cacheKey", "session-42") + h.clearCacheByKey(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusInternalServerError { + t.Fatalf("expected 500 on plugin error, got %d", got) + } +} + +func TestClearCacheByKey_PluginNotLoaded(t *testing.T) { + h := NewCacheHandler(func() CacheClearer { return nil }) + + ctx := newCacheCtx("cacheKey", "session-42") + h.clearCacheByKey(ctx) + + if got := ctx.Response.StatusCode(); got != fasthttp.StatusBadRequest { + t.Fatalf("expected 400 when plugin not loaded, got %d", got) + } + if !strings.Contains(string(ctx.Response.Body()), "semantic_cache plugin is not loaded") { + t.Fatalf("expected plugin-not-loaded message, got %s", ctx.Response.Body()) + } +} diff --git a/transports/bifrost-http/handlers/governance.go b/transports/bifrost-http/handlers/governance.go index 8f671e256d..835133e41d 100644 --- a/transports/bifrost-http/handlers/governance.go +++ b/transports/bifrost-http/handlers/governance.go @@ -127,16 +127,14 @@ type UpdateVirtualKeyRequest struct { // CreateBudgetRequest represents the request body for creating a budget type CreateBudgetRequest struct { - MaxLimit float64 `json:"max_limit" validate:"required"` // Maximum budget in dollars - ResetDuration string `json:"reset_duration" validate:"required"` // e.g., "30s", "5m", "1h", "1d", "1w", "1M" - CalendarAligned bool `json:"calendar_aligned,omitempty"` // Snap resets to calendar boundaries (day/week/month/year) + MaxLimit float64 `json:"max_limit" validate:"required"` // Maximum budget in dollars + ResetDuration string `json:"reset_duration" validate:"required"` // e.g., "30s", "5m", "1h", "1d", "1w", "1M" } // UpdateBudgetRequest represents the request body for updating a budget type UpdateBudgetRequest struct { - MaxLimit *float64 `json:"max_limit,omitempty"` - ResetDuration *string `json:"reset_duration,omitempty"` - CalendarAligned *bool `json:"calendar_aligned,omitempty"` // When switching to true, current usage is reset to 0 + MaxLimit *float64 `json:"max_limit,omitempty"` + ResetDuration *string `json:"reset_duration,omitempty"` } // RoutingTarget represents a single weighted routing target within a rule. @@ -230,18 +228,20 @@ func collectProviderConfigDeleteIDs( // CreateTeamRequest represents the request body for creating a team type CreateTeamRequest struct { - Name string `json:"name" validate:"required"` - CustomerID *string `json:"customer_id,omitempty"` // Team can belong to a customer - Budgets []CreateBudgetRequest `json:"budgets,omitempty"` // Multi-budget: each must have a unique reset_duration - RateLimit *CreateRateLimitRequest `json:"rate_limit,omitempty"` // Team can have its own rate limit + Name string `json:"name" validate:"required"` + CustomerID *string `json:"customer_id,omitempty"` // Team can belong to a customer + Budgets []CreateBudgetRequest `json:"budgets,omitempty"` // Multi-budget: each must have a unique reset_duration + RateLimit *CreateRateLimitRequest `json:"rate_limit,omitempty"` // Team can have its own rate limit + CalendarAligned bool `json:"calendar_aligned,omitempty"` // Team-wide: snap all team budgets and rate-limit resets to calendar boundaries } // UpdateTeamRequest represents the request body for updating a team type UpdateTeamRequest struct { - Name *string `json:"name,omitempty"` - CustomerID *string `json:"customer_id,omitempty"` - Budgets []CreateBudgetRequest `json:"budgets,omitempty"` // Multi-budget: replaces all team budgets - RateLimit *UpdateRateLimitRequest `json:"rate_limit,omitempty"` + Name *string `json:"name,omitempty"` + CustomerID *string `json:"customer_id,omitempty"` + Budgets []CreateBudgetRequest `json:"budgets,omitempty"` // Multi-budget: replaces all team budgets + RateLimit *UpdateRateLimitRequest `json:"rate_limit,omitempty"` + CalendarAligned *bool `json:"calendar_aligned,omitempty"` // Team-wide setting; nil means "leave unchanged" } // CreateCustomerRequest represents the request body for creating a customer @@ -413,7 +413,11 @@ func (h *GovernanceHandler) getVirtualKeys(ctx *fasthttp.RequestCtx) { params.Offset = n } - params.Limit, params.Offset = ClampPaginationParams(params.Limit, params.Offset) + if !params.Export { + params.Limit, params.Offset = ClampPaginationParams(params.Limit, params.Offset) + } else if params.Offset < 0 { + params.Offset = 0 + } virtualKeys, totalCount, err := h.configStore.GetVirtualKeysPaginated(ctx, params) if err != nil { logger.Error("failed to retrieve virtual keys: %v", err) @@ -1510,9 +1514,10 @@ func (h *GovernanceHandler) createTeam(ctx *fasthttp.RequestCtx) { var team configstoreTables.TableTeam if err := h.configStore.ExecuteTransaction(ctx, func(tx *gorm.DB) error { team = configstoreTables.TableTeam{ - ID: uuid.NewString(), - Name: req.Name, - CustomerID: req.CustomerID, + ID: uuid.NewString(), + Name: req.Name, + CustomerID: req.CustomerID, + CalendarAligned: req.CalendarAligned, } if req.RateLimit != nil { rateLimit := configstoreTables.TableRateLimit{ @@ -1547,13 +1552,12 @@ func (h *GovernanceHandler) createTeam(ctx *fasthttp.RequestCtx) { } seenDurations[b.ResetDuration] = true budget := configstoreTables.TableBudget{ - ID: uuid.NewString(), - MaxLimit: b.MaxLimit, - ResetDuration: b.ResetDuration, - LastReset: budgetLastReset(b.CalendarAligned, b.ResetDuration), - CurrentUsage: 0, - CalendarAligned: b.CalendarAligned, - TeamID: &team.ID, + ID: uuid.NewString(), + MaxLimit: b.MaxLimit, + ResetDuration: b.ResetDuration, + LastReset: budgetLastReset(team.CalendarAligned, b.ResetDuration), + CurrentUsage: 0, + TeamID: &team.ID, } if err := validateBudget(&budget); err != nil { return err @@ -1656,6 +1660,18 @@ func (h *GovernanceHandler) updateTeam(ctx *fasthttp.RequestCtx) { team.CustomerID = req.CustomerID } } + // Resolve team-level calendar alignment for this update: + // - explicit team-level field wins (req.CalendarAligned != nil) + // - else leave existing team.CalendarAligned untouched + wasCalendarAligned := team.CalendarAligned + if req.CalendarAligned != nil { + team.CalendarAligned = *req.CalendarAligned + } + calendarAlignmentJustEnabled := !wasCalendarAligned && team.CalendarAligned + // Snap-to-calendar-period happens after budget/rate-limit reconciliation + // below, so combined `calendar_aligned + budgets/rate_limit` updates see + // the final persisted state. + // Multi-budget reconciliation: match by reset_duration, preserve usage on update, // create new budgets for new durations, delete unmatched existing budgets. // Mirrors VK multi-budget handling above. @@ -1684,17 +1700,10 @@ func (h *GovernanceHandler) updateTeam(ctx *fasthttp.RequestCtx) { matchedIDs := make(map[string]bool) for _, b := range req.Budgets { if existing, found := existingByDuration[b.ResetDuration]; found { - wasCalendarAligned := existing.CalendarAligned existing.MaxLimit = b.MaxLimit - existing.CalendarAligned = b.CalendarAligned - // Match the UI's calendar-alignment confirmation promise: on the - // false → true transition, snap LastReset to the current period - // start and zero out CurrentUsage now, instead of lazily waiting - // for the next period boundary in ResetExpiredBudgetsInMemory. - if b.CalendarAligned && !wasCalendarAligned { - existing.LastReset = configstoreTables.GetCalendarPeriodStart(b.ResetDuration, time.Now()) - existing.CurrentUsage = 0 - } + // LastReset / CurrentUsage are preserved on update; if calendar + // alignment was just enabled in this request, the post-reconciliation + // snap block below resets them. if err := validateBudget(&existing); err != nil { return err } @@ -1705,13 +1714,12 @@ func (h *GovernanceHandler) updateTeam(ctx *fasthttp.RequestCtx) { matchedIDs[existing.ID] = true } else { budget := configstoreTables.TableBudget{ - ID: uuid.NewString(), - MaxLimit: b.MaxLimit, - ResetDuration: b.ResetDuration, - LastReset: budgetLastReset(b.CalendarAligned, b.ResetDuration), - CurrentUsage: 0, - CalendarAligned: b.CalendarAligned, - TeamID: &team.ID, + ID: uuid.NewString(), + MaxLimit: b.MaxLimit, + ResetDuration: b.ResetDuration, + LastReset: budgetLastReset(team.CalendarAligned, b.ResetDuration), + CurrentUsage: 0, + TeamID: &team.ID, } if err := validateBudget(&budget); err != nil { return err @@ -1781,6 +1789,44 @@ func (h *GovernanceHandler) updateTeam(ctx *fasthttp.RequestCtx) { team.RateLimit = &rateLimit } } + // Snap budgets and rate limit to the current calendar period when calendar + // alignment transitions false -> true in this request. Runs after budget/ + // rate-limit reconciliation so both the standalone-toggle and the combined + // (toggle + budgets/rate_limit in the same request) cases are covered, and + // only fires once per transition. + if calendarAlignmentJustEnabled { + now := time.Now() + for i := range team.Budgets { + b := &team.Budgets[i] + if !configstoreTables.IsCalendarAlignableDuration(b.ResetDuration) { + continue + } + b.LastReset = configstoreTables.GetCalendarPeriodStart(b.ResetDuration, now) + b.CurrentUsage = 0 + if err := h.configStore.UpdateBudget(ctx, b, tx); err != nil { + return fmt.Errorf("failed to snap team budget %s on calendar-align enable: %w", b.ID, err) + } + } + if team.RateLimit != nil { + rl := team.RateLimit + snapped := false + if rl.TokenResetDuration != nil && configstoreTables.IsCalendarAlignableDuration(*rl.TokenResetDuration) { + rl.TokenLastReset = configstoreTables.GetCalendarPeriodStart(*rl.TokenResetDuration, now) + rl.TokenCurrentUsage = 0 + snapped = true + } + if rl.RequestResetDuration != nil && configstoreTables.IsCalendarAlignableDuration(*rl.RequestResetDuration) { + rl.RequestLastReset = configstoreTables.GetCalendarPeriodStart(*rl.RequestResetDuration, now) + rl.RequestCurrentUsage = 0 + snapped = true + } + if snapped { + if err := h.configStore.UpdateRateLimit(ctx, rl, tx); err != nil { + return fmt.Errorf("failed to snap team rate limit on calendar-align enable: %w", err) + } + } + } + } if err := h.configStore.UpdateTeam(ctx, team, tx); err != nil { return err } diff --git a/transports/bifrost-http/handlers/governance_test.go b/transports/bifrost-http/handlers/governance_test.go index f901c0ccfb..89df5df207 100644 --- a/transports/bifrost-http/handlers/governance_test.go +++ b/transports/bifrost-http/handlers/governance_test.go @@ -196,11 +196,6 @@ func TestBudgetRemovalRequestDetection(t *testing.T) { req: &UpdateBudgetRequest{ResetDuration: schemas.Ptr("1h")}, want: false, }, - { - name: "calendar aligned only is treated as removal", - req: &UpdateBudgetRequest{CalendarAligned: schemas.Ptr(true)}, - want: true, - }, } for _, tt := range tests { diff --git a/transports/bifrost-http/handlers/inference.go b/transports/bifrost-http/handlers/inference.go index 613dcba5ac..c5bd005195 100644 --- a/transports/bifrost-http/handlers/inference.go +++ b/transports/bifrost-http/handlers/inference.go @@ -21,8 +21,8 @@ import ( "github.com/fasthttp/router" bifrost "github.com/maximhq/bifrost/core" - "github.com/maximhq/bifrost/core/schemas" providerUtils "github.com/maximhq/bifrost/core/providers/utils" + "github.com/maximhq/bifrost/core/schemas" "github.com/maximhq/bifrost/transports/bifrost-http/lib" "github.com/valyala/fasthttp" ) diff --git a/transports/bifrost-http/handlers/logging.go b/transports/bifrost-http/handlers/logging.go index 56282e40b4..8521191964 100644 --- a/transports/bifrost-http/handlers/logging.go +++ b/transports/bifrost-http/handlers/logging.go @@ -475,6 +475,9 @@ func (h *LoggingHandler) getLogs(ctx *fasthttp.RequestCtx) { filters.MissingCostOnly = val } } + if cacheHitTypes := string(ctx.QueryArgs().Peek("cache_hit_types")); cacheHitTypes != "" { + filters.CacheHitTypes = parseCommaSeparated(cacheHitTypes) + } if contentSearch := string(ctx.QueryArgs().Peek("content_search")); contentSearch != "" { filters.ContentSearch = contentSearch } @@ -713,6 +716,9 @@ func (h *LoggingHandler) getLogsStats(ctx *fasthttp.RequestCtx) { filters.MissingCostOnly = val } } + if cacheHitTypes := string(ctx.QueryArgs().Peek("cache_hit_types")); cacheHitTypes != "" { + filters.CacheHitTypes = parseCommaSeparated(cacheHitTypes) + } if contentSearch := string(ctx.QueryArgs().Peek("content_search")); contentSearch != "" { filters.ContentSearch = contentSearch } @@ -869,6 +875,9 @@ func parseHistogramFilters(ctx *fasthttp.RequestCtx) *logstore.SearchFilters { filters.MissingCostOnly = val } } + if cacheHitTypes := string(ctx.QueryArgs().Peek("cache_hit_types")); cacheHitTypes != "" { + filters.CacheHitTypes = parseCommaSeparated(cacheHitTypes) + } if contentSearch := string(ctx.QueryArgs().Peek("content_search")); contentSearch != "" { filters.ContentSearch = contentSearch } diff --git a/transports/bifrost-http/handlers/middlewares.go b/transports/bifrost-http/handlers/middlewares.go index bf72a7ed7a..58c2b86777 100644 --- a/transports/bifrost-http/handlers/middlewares.go +++ b/transports/bifrost-http/handlers/middlewares.go @@ -49,33 +49,33 @@ func SecurityHeadersMiddleware() schemas.BifrostHTTPMiddleware { func CorsMiddleware(config *lib.Config) schemas.BifrostHTTPMiddleware { return func(next fasthttp.RequestHandler) fasthttp.RequestHandler { return func(ctx *fasthttp.RequestCtx) { - startTime := time.Now() + // startTime := time.Now() // skip logging if it's a /health check request if slices.IndexFunc(loggingSkipPaths, func(path string) bool { return strings.HasPrefix(string(ctx.RequestURI()), path) }) != -1 { goto corsFlow } - defer func() { - statusCode := ctx.Response.Header.StatusCode() - level := schemas.LogLevelInfo - if statusCode >= 500 { - level = schemas.LogLevelError - } else if statusCode >= 400 { - level = schemas.LogLevelWarn - } - logBuilder := logger.LogHTTPRequest(level, "request completed"). - Str("http.method", string(ctx.Method())). - Str("http.target", string(ctx.RequestURI())). - Int("http.status_code", statusCode). - Int64("http.request_duration_ms", time.Since(startTime).Milliseconds()). - Str("http.remote_addr", ctx.RemoteAddr().String()). - Str("http.user_agent", string(ctx.Request.Header.UserAgent())) - if traceID, ok := ctx.UserValue(schemas.BifrostContextKeyTraceID).(string); ok && traceID != "" { - logBuilder = logBuilder.Str("trace_id", traceID) - } - logBuilder.Send() - }() + // defer func() { + // statusCode := ctx.Response.Header.StatusCode() + // level := schemas.LogLevelInfo + // if statusCode >= 500 { + // level = schemas.LogLevelError + // } else if statusCode >= 400 { + // level = schemas.LogLevelWarn + // } + // logBuilder := logger.LogHTTPRequest(level, "request completed"). + // Str("http.method", string(ctx.Method())). + // Str("http.target", string(ctx.RequestURI())). + // Int("http.status_code", statusCode). + // Int64("http.request_duration_ms", time.Since(startTime).Milliseconds()). + // Str("http.remote_addr", ctx.RemoteAddr().String()). + // Str("http.user_agent", string(ctx.Request.Header.UserAgent())) + // if traceID, ok := ctx.UserValue(schemas.BifrostContextKeyTraceID).(string); ok && traceID != "" { + // logBuilder = logBuilder.Str("trace_id", traceID) + // } + // logBuilder.Send() + // }() corsFlow: origin := string(ctx.Request.Header.Peek("Origin")) allowed := IsOriginAllowed(origin, config.ClientConfig.AllowedOrigins) @@ -808,7 +808,7 @@ func (m *AuthMiddleware) middleware(shouldSkip func(*configstore.AuthConfig, str } authConfig := m.authConfig.Load() if authConfig == nil || !authConfig.IsEnabled { - logger.Debug("auth middleware is disabled because auth config is not present or not enabled") + // logger.Debug("auth middleware is disabled because auth config is not present or not enabled") ctx.SetUserValue(schemas.BifrostContextKeySessionToken, "") // Mark as local admin so downstream RBAC bypasses cleanly when // auth is fully disabled; otherwise RBAC 401s and the UI enters diff --git a/transports/bifrost-http/handlers/plugins.go b/transports/bifrost-http/handlers/plugins.go index 3dc4f8353c..2d94152ff9 100644 --- a/transports/bifrost-http/handlers/plugins.go +++ b/transports/bifrost-http/handlers/plugins.go @@ -5,6 +5,7 @@ import ( "encoding/json" "errors" "fmt" + "maps" "github.com/fasthttp/router" "github.com/maximhq/bifrost/core/schemas" @@ -16,9 +17,9 @@ import ( ) type PluginsLoader interface { + GetPluginStatus(ctx context.Context) map[string]schemas.PluginStatus ReloadPlugin(ctx context.Context, name string, path *string, pluginConfig any, placement *schemas.PluginPlacement, order *int) error RemovePlugin(ctx context.Context, name string) error - GetPluginStatus(ctx context.Context) map[string]schemas.PluginStatus } // PluginsHandler is the handler for the plugins API @@ -57,6 +58,7 @@ type UpdatePluginRequest struct { // RegisterRoutes registers the routes for the PluginsHandler func (h *PluginsHandler) RegisterRoutes(r *router.Router, middlewares ...schemas.BifrostHTTPMiddleware) { r.GET("/api/plugins", lib.ChainMiddlewares(h.getPlugins, middlewares...)) + r.GET("/api/plugins/builtins", lib.ChainMiddlewares(h.getBuiltinPlugins, middlewares...)) r.GET("/api/plugins/{name}", lib.ChainMiddlewares(h.getPlugin, middlewares...)) r.POST("/api/plugins", lib.ChainMiddlewares(h.createPlugin, middlewares...)) r.PUT("/api/plugins/{name}", lib.ChainMiddlewares(h.updatePlugin, middlewares...)) @@ -105,6 +107,13 @@ func (h *PluginsHandler) buildPluginResponse(ctx context.Context, plugin *config } } +// getBuiltinPlugins returns the canonical list of built-in plugin names +func (h *PluginsHandler) getBuiltinPlugins(ctx *fasthttp.RequestCtx) { + SendJSON(ctx, map[string]any{ + "plugins": lib.GetBuiltinPluginNames(), + }) +} + // getPlugins gets all plugins func (h *PluginsHandler) getPlugins(ctx *fasthttp.RequestCtx) { if h.configStore == nil { @@ -226,7 +235,10 @@ func (h *PluginsHandler) getPlugin(ctx *fasthttp.RequestCtx) { SendError(ctx, 500, "Failed to retrieve plugin") return } - SendJSON(ctx, plugin) + // Return the same shape as list/create/update — with runtime status + // merged in — so the UI doesn't see an empty status when refetching a + // single plugin via useGetPluginQuery. + SendJSON(ctx, h.buildPluginResponse(ctx, plugin)) } // createPlugin creates a new plugin @@ -341,8 +353,9 @@ func (h *PluginsHandler) updatePlugin(ctx *fasthttp.RequestCtx) { } var plugin *configstoreTables.TablePlugin var err error - // Check if plugin exists - _, err = h.configStore.GetPlugin(ctx, name) + // Fetch the existing plugin to enable config merging below. + var existingPlugin *configstoreTables.TablePlugin + existingPlugin, err = h.configStore.GetPlugin(ctx, name) if err != nil { // If doesn't exist, create it if errors.Is(err, configstore.ErrNotFound) { @@ -392,11 +405,21 @@ func (h *PluginsHandler) updatePlugin(ctx *fasthttp.RequestCtx) { if isBuiltin && request.Path != nil { request.Path = nil } + // Merge incoming config over the existing DB config so fields unknown to the + // calling form (e.g. plugin_span_filter set by a separate UI sheet) are not wiped. + mergedConfig := request.Config + if existingPlugin != nil { + if existingCfg, ok := existingPlugin.Config.(map[string]any); ok && len(existingCfg) > 0 { + mergedConfig = make(map[string]any, len(existingCfg)+len(request.Config)) + maps.Copy(mergedConfig, existingCfg) + maps.Copy(mergedConfig, request.Config) + } + } // Updating the plugin if err := h.configStore.UpdatePlugin(ctx, &configstoreTables.TablePlugin{ Name: name, Enabled: request.Enabled, - Config: request.Config, + Config: mergedConfig, Path: request.Path, IsCustom: !isBuiltin, Placement: request.Placement, @@ -418,7 +441,7 @@ func (h *PluginsHandler) updatePlugin(ctx *fasthttp.RequestCtx) { } // We reload the plugin if its enabled, otherwise we stop it if request.Enabled { - if err := h.pluginsLoader.ReloadPlugin(ctx, name, request.Path, request.Config, request.Placement, request.Order); err != nil { + if err := h.pluginsLoader.ReloadPlugin(ctx, name, request.Path, mergedConfig, request.Placement, request.Order); err != nil { logger.Error("failed to load plugin: %v", err) SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Plugin updated in database but failed to load: %v", err)) return diff --git a/transports/bifrost-http/handlers/plugins_test.go b/transports/bifrost-http/handlers/plugins_test.go new file mode 100644 index 0000000000..977ba6e87c --- /dev/null +++ b/transports/bifrost-http/handlers/plugins_test.go @@ -0,0 +1,159 @@ +package handlers + +import ( + "context" + "encoding/json" + "testing" + + "github.com/maximhq/bifrost/core/schemas" + "github.com/maximhq/bifrost/framework/configstore" + configstoreTables "github.com/maximhq/bifrost/framework/configstore/tables" + "github.com/valyala/fasthttp" + "gorm.io/gorm" +) + +// capturePluginsStore records the last config passed to UpdatePlugin so tests +// can assert that config merging occurred correctly. +type capturePluginsStore struct { + configstore.ConfigStore + existingPlugin *configstoreTables.TablePlugin + capturedConfig map[string]any + capturedEnabled bool +} + +func (s *capturePluginsStore) GetPlugin(_ context.Context, name string) (*configstoreTables.TablePlugin, error) { + if s.existingPlugin != nil && s.existingPlugin.Name == name { + return s.existingPlugin, nil + } + return nil, configstore.ErrNotFound +} + +func (s *capturePluginsStore) UpdatePlugin(_ context.Context, plugin *configstoreTables.TablePlugin, _ ...*gorm.DB) error { + if cfg, ok := plugin.Config.(map[string]any); ok { + s.capturedConfig = cfg + } + s.capturedEnabled = plugin.Enabled + return nil +} + +func (s *capturePluginsStore) CreatePlugin(_ context.Context, plugin *configstoreTables.TablePlugin, _ ...*gorm.DB) error { + s.existingPlugin = plugin + return nil +} + +// noopPluginsLoader satisfies the PluginsLoader interface without doing anything. +type noopPluginsLoader struct{} + +func (noopPluginsLoader) ReloadPlugin(_ context.Context, _ string, _ *string, _ any, _ *schemas.PluginPlacement, _ *int) error { + return nil +} +func (noopPluginsLoader) RemovePlugin(_ context.Context, _ string) error { return nil } +func (noopPluginsLoader) GetPluginStatus(_ context.Context) map[string]schemas.PluginStatus { + return nil +} + +// buildUpdateRequest creates a PUT /api/plugins/{name} fasthttp context. +func buildUpdateRequest(t *testing.T, body any) *fasthttp.RequestCtx { + t.Helper() + raw, err := json.Marshal(body) + if err != nil { + t.Fatalf("marshal request body: %v", err) + } + ctx := &fasthttp.RequestCtx{} + ctx.Request.Header.SetMethod("PUT") + ctx.Request.SetBody(raw) + ctx.SetUserValue("name", "otel") + return ctx +} + +// TestUpdatePlugin_ConfigMerge verifies that updatePlugin merges the incoming +// config over the existing DB config, preserving fields the caller did not send. +// This is critical for the plugin_span_filter field: the OTEL config form in the +// UI does not send plugin_span_filter, so it must survive a save without being wiped. +func TestUpdatePlugin_ConfigMerge(t *testing.T) { + SetLogger(&mockLogger{}) + + spanFilter := map[string]any{ + "mode": "exclude", + "plugins": []any{"logging", "compat"}, + } + existingConfig := map[string]any{ + "collector_url": "localhost:4317", + "trace_type": "genai_extension", + "protocol": "grpc", + "plugin_span_filter": spanFilter, + } + + store := &capturePluginsStore{ + existingPlugin: &configstoreTables.TablePlugin{ + Name: "otel", + Enabled: true, + Config: existingConfig, + }, + } + + h := &PluginsHandler{ + pluginsLoader: noopPluginsLoader{}, + configStore: store, + } + + // The UI OTEL form sends only the base fields — no plugin_span_filter. + reqBody := map[string]any{ + "enabled": true, + "config": map[string]any{ + "collector_url": "new-collector:4317", + "trace_type": "open_inference", + "protocol": "grpc", + }, + } + + ctx := buildUpdateRequest(t, reqBody) + h.updatePlugin(ctx) + + if ctx.Response.StatusCode() != 200 { + t.Fatalf("expected 200, got %d: %s", ctx.Response.StatusCode(), ctx.Response.Body()) + } + + // The merged config must contain both the updated base fields AND the preserved filter. + if store.capturedConfig == nil { + t.Fatal("UpdatePlugin was not called") + } + if got := store.capturedConfig["collector_url"]; got != "new-collector:4317" { + t.Errorf("collector_url = %v, want new-collector:4317", got) + } + if got := store.capturedConfig["trace_type"]; got != "open_inference" { + t.Errorf("trace_type = %v, want open_inference", got) + } + if _, ok := store.capturedConfig["plugin_span_filter"]; !ok { + t.Error("plugin_span_filter was wiped from the config; merge logic is broken") + } +} + +// TestUpdatePlugin_ConfigMerge_NewPlugin verifies that when no existing plugin +// is found in the DB (first save), the incoming config is used as-is. +func TestUpdatePlugin_ConfigMerge_NewPlugin(t *testing.T) { + SetLogger(&mockLogger{}) + + store := &capturePluginsStore{existingPlugin: nil} + h := &PluginsHandler{ + pluginsLoader: noopPluginsLoader{}, + configStore: store, + } + + reqBody := map[string]any{ + "enabled": true, + "config": map[string]any{ + "collector_url": "localhost:4317", + "trace_type": "genai_extension", + "protocol": "grpc", + }, + } + + ctx := buildUpdateRequest(t, reqBody) + h.updatePlugin(ctx) + + // Should succeed even when no existing plugin is found (creates then updates). + if ctx.Response.StatusCode() != 200 { + t.Fatalf("expected 200, got %d: %s", ctx.Response.StatusCode(), ctx.Response.Body()) + } +} diff --git a/transports/bifrost-http/handlers/realtime_client_secrets.go b/transports/bifrost-http/handlers/realtime_client_secrets.go index 9fe07dd61e..6b8f680e15 100644 --- a/transports/bifrost-http/handlers/realtime_client_secrets.go +++ b/transports/bifrost-http/handlers/realtime_client_secrets.go @@ -80,12 +80,15 @@ func (h *RealtimeClientSecretsHandler) handleRequest(ctx *fasthttp.RequestCtx) { return } - providerKey, model, normalizedBody, err := resolveRealtimeClientSecretTarget(route, body) + providerKey, model, normalizedBody, err := resolveRealtimeClientSecretTarget(ctx, h.config, route, body) if err != nil { SendBifrostError(ctx, err) return } + logger.Info("[realtime-client-secrets] request: path=%s provider=%s model=%s endpoint_type=%s", + string(ctx.Path()), providerKey, model, route.EndpointType) + bifrostCtx, cancel := lib.ConvertToBifrostContext(ctx, h.handlerStore) defer cancel() bifrostCtx.SetValue(schemas.BifrostContextKeyHTTPRequestType, schemas.RealtimeRequest) @@ -150,9 +153,14 @@ func (h *RealtimeClientSecretsHandler) handleRequest(ctx *fasthttp.RequestCtx) { resp, bifrostErr := sessionProvider.CreateRealtimeClientSecret(bifrostCtx, key, route.EndpointType, normalizedBody) if bifrostErr != nil { + logger.Error("[realtime-client-secrets] upstream error: provider=%s model=%s error=%s", + providerKey, model, bifrostErr.Error) SendBifrostError(ctx, bifrostErr) return } + + logger.Info("[realtime-client-secrets] upstream success: provider=%s model=%s status=%d", + providerKey, model, resp.StatusCode) cacheRealtimeEphemeralKeyMapping( h.handlerStore.GetKVStore(), resp.Body, @@ -208,7 +216,7 @@ func (h *RealtimeClientSecretsHandler) realtimeSessionRoutes() []schemas.Realtim return routes } -func resolveRealtimeClientSecretTarget(route schemas.RealtimeSessionRoute, body []byte) (schemas.ModelProvider, string, []byte, *schemas.BifrostError) { +func resolveRealtimeClientSecretTarget(ctx *fasthttp.RequestCtx, config *lib.Config, route schemas.RealtimeSessionRoute, body []byte) (schemas.ModelProvider, string, []byte, *schemas.BifrostError) { root, err := schemas.ParseRealtimeClientSecretBody(body) if err != nil { return "", "", nil, err @@ -221,6 +229,18 @@ func resolveRealtimeClientSecretTarget(route schemas.RealtimeSessionRoute, body defaultProvider := route.DefaultProvider providerKey, model := schemas.ParseModelString(rawModel, defaultProvider) + // Model catalog auto-resolution for bare model names on /v1 client secret routes + if defaultProvider == "" && providerKey == "" && model != "" { + providers := config.GetProvidersForModel(model) + if len(providers) > 0 { + ctx.SetUserValue(lib.FastHTTPUserValueModelCatalogResolution, &lib.ModelCatalogResolution{ + Model: model, + ResolvedProvider: providers[0], + AllProviders: providers, + }) + providerKey = providers[0] + } + } if defaultProvider == "" && providerKey == "" { return "", "", nil, newRealtimeClientSecretHandlerError( fasthttp.StatusBadRequest, diff --git a/transports/bifrost-http/handlers/realtime_client_secrets_test.go b/transports/bifrost-http/handlers/realtime_client_secrets_test.go index 8029622921..8c1b83dfa8 100644 --- a/transports/bifrost-http/handlers/realtime_client_secrets_test.go +++ b/transports/bifrost-http/handlers/realtime_client_secrets_test.go @@ -65,7 +65,8 @@ func TestResolveRealtimeClientSecretTarget(t *testing.T) { t.Run(tt.name, func(t *testing.T) { t.Parallel() - gotProvider, gotModel, _, err := resolveRealtimeClientSecretTarget(tt.route, tt.body) + var ctx fasthttp.RequestCtx + gotProvider, gotModel, _, err := resolveRealtimeClientSecretTarget(&ctx, &lib.Config{}, tt.route, tt.body) if tt.wantErr { if err == nil { t.Fatal("expected error, got nil") @@ -118,7 +119,8 @@ func TestResolveRealtimeClientSecretTarget_NormalizesModel(t *testing.T) { t.Run(tt.name, func(t *testing.T) { t.Parallel() - _, _, normalizedBody, err := resolveRealtimeClientSecretTarget(tt.route, []byte(tt.body)) + var ctx fasthttp.RequestCtx + _, _, normalizedBody, err := resolveRealtimeClientSecretTarget(&ctx, &lib.Config{}, tt.route, []byte(tt.body)) if err != nil { t.Fatalf("unexpected error: %v", err) } diff --git a/transports/bifrost-http/handlers/realtime_logging_test.go b/transports/bifrost-http/handlers/realtime_logging_test.go index 054f2ea0e9..b94307664d 100644 --- a/transports/bifrost-http/handlers/realtime_logging_test.go +++ b/transports/bifrost-http/handlers/realtime_logging_test.go @@ -255,6 +255,27 @@ func TestPendingRealtimeToolOutputUpdate(t *testing.T) { } } +func TestRealtimeSessionDedupeNestedRawEvents(t *testing.T) { + t.Parallel() + + session := bfws.NewSession(nil) + firstRaw := `{"type":"conversation.item.created","item":{"id":"item_tool_123","type":"function_call_output","output":"{\"nextResponse\":\"tool result\"}"}}` + laterRaw := `{"type":"conversation.item.done","item":{"id":"item_tool_123","type":"function_call_output","output":"{\"nextResponse\":\"tool result\"}"}}` + + session.RecordRealtimeToolOutput("item_tool_123", `{"nextResponse":"tool result"}`, firstRaw) + session.RecordRealtimeToolOutput("item_tool_123", `{"nextResponse":"tool result"}`, laterRaw) + + inputs := session.ConsumeRealtimeTurnInputs() + if len(inputs) != 1 { + t.Fatalf("len(inputs) = %d, want 1", len(inputs)) + } + // Same-itemID updates replace raw with the latest event — later events + // (e.g. conversation.item.done) carry the same or more complete data. + if inputs[0].Raw != laterRaw { + t.Fatalf("Raw = %q, want latest raw event", inputs[0].Raw) + } +} + func TestBuildRealtimeTurnPostResponseUsesFullResponseDonePayload(t *testing.T) { rawRequest := `{"type":"conversation.item.input_audio_transcription.completed","transcript":""}` rawResponse := []byte(`{ @@ -314,6 +335,49 @@ func TestBuildRealtimeTurnPostResponseUsesFullResponseDonePayload(t *testing.T) } } +func TestBuildRealtimeTurnPostResponseMergesTextAndToolCalls(t *testing.T) { + rawResponse := []byte(`{ + "type":"response.done", + "response":{ + "output":[ + { + "id":"item_message_123", + "type":"message", + "content":[{"type":"text","text":"assistant text"}] + }, + { + "id":"item_call_123", + "type":"function_call", + "call_id":"call_123", + "name":"lookup_weather", + "arguments":"{\"city\":\"SF\"}" + } + ] + } + }`) + + resp := buildRealtimeTurnPostResponse(&openai.OpenAIProvider{}, schemas.OpenAI, "gpt-realtime", "", rawResponse, "", 123) + if resp == nil || resp.ResponsesResponse == nil { + t.Fatal("expected realtime post response") + } + if len(resp.ResponsesResponse.Output) != 2 { + t.Fatalf("len(Output) = %d, want 2", len(resp.ResponsesResponse.Output)) + } + if resp.ResponsesResponse.Output[0].Type == nil || *resp.ResponsesResponse.Output[0].Type != schemas.ResponsesMessageTypeMessage { + t.Fatalf("Output[0].Type = %#v, want message", resp.ResponsesResponse.Output[0].Type) + } + toolOutput := resp.ResponsesResponse.Output[1] + if toolOutput.Type == nil || *toolOutput.Type != schemas.ResponsesMessageTypeFunctionCall { + t.Fatalf("Output[1].Type = %#v, want function_call", toolOutput.Type) + } + if toolOutput.ResponsesToolMessage == nil || toolOutput.ResponsesToolMessage.Name == nil || *toolOutput.ResponsesToolMessage.Name != "lookup_weather" { + t.Fatalf("tool name = %#v, want lookup_weather", toolOutput.ResponsesToolMessage) + } + if toolOutput.CallID == nil || *toolOutput.CallID != "call_123" { + t.Fatalf("CallID = %#v, want call_123", toolOutput.CallID) + } +} + func TestFinalizeRealtimeTurnHooksWithErrorCompletesActiveHooks(t *testing.T) { t.Parallel() diff --git a/transports/bifrost-http/handlers/realtime_turn_pipeline.go b/transports/bifrost-http/handlers/realtime_turn_pipeline.go index 91095e5843..947185ee97 100644 --- a/transports/bifrost-http/handlers/realtime_turn_pipeline.go +++ b/transports/bifrost-http/handlers/realtime_turn_pipeline.go @@ -9,6 +9,7 @@ import ( "github.com/google/uuid" bifrost "github.com/maximhq/bifrost/core" + openaiProvider "github.com/maximhq/bifrost/core/providers/openai" "github.com/maximhq/bifrost/core/schemas" bfws "github.com/maximhq/bifrost/transports/bifrost-http/websocket" ) @@ -71,6 +72,29 @@ func newRealtimeTurnContext( return ctx } +func applyRealtimeRawStorageContext(ctx *schemas.BifrostContext, storeRaw bool) { + if ctx == nil { + return + } + // Realtime turn logging captures raw payloads only for log storage. There is + // no client-facing raw send-back path for synthetic realtime turn responses. + sendBackRawRequest := false + sendBackRawResponse := false + ctx.SetValue(schemas.BifrostContextKeyShouldStoreRawInLogs, storeRaw) + ctx.SetValue(schemas.BifrostContextKeyCaptureRawRequest, storeRaw || sendBackRawRequest) + ctx.SetValue(schemas.BifrostContextKeyCaptureRawResponse, storeRaw || sendBackRawResponse) + ctx.SetValue(schemas.BifrostContextKeyDropRawRequestFromClient, storeRaw && !sendBackRawRequest) + ctx.SetValue(schemas.BifrostContextKeyDropRawResponseFromClient, storeRaw && !sendBackRawResponse) +} + +func shouldStoreRealtimeRawPayloads(ctx *schemas.BifrostContext) bool { + if ctx == nil { + return false + } + storeRaw, _ := ctx.Value(schemas.BifrostContextKeyShouldStoreRawInLogs).(bool) + return storeRaw +} + func applyRealtimeTurnContextValues(ctx *schemas.BifrostContext, values map[any]any) { if ctx == nil || len(values) == 0 { return @@ -93,6 +117,18 @@ func applyRealtimeTurnContextValues(ctx *schemas.BifrostContext, values map[any] } } +func restoreRealtimeTurnTraceContext(ctx *schemas.BifrostContext, traceID string, values map[any]any) { + if ctx == nil { + return + } + if strings.TrimSpace(traceID) != "" { + ctx.SetValue(schemas.BifrostContextKeyTraceID, strings.TrimSpace(traceID)) + } + if tracer, ok := values[schemas.BifrostContextKeyTracer].(schemas.Tracer); ok && tracer != nil { + ctx.SetValue(schemas.BifrostContextKeyTracer, tracer) + } +} + func setRealtimeTurnStreamContext(ctx *schemas.BifrostContext, startedAt time.Time, isFinal bool) { if ctx == nil { return @@ -106,7 +142,52 @@ func setRealtimeTurnStreamContext(ctx *schemas.BifrostContext, startedAt time.Ti } } -func buildRealtimeTurnPreRequest(provider schemas.ModelProvider, model string, turnInputs []bfws.RealtimeTurnInput) *schemas.BifrostRequest { +// sanitizeRealtimeSessionEventForProvider mutates outbound session events before provider +// serialization. It must not persist session state; rejected session.update events should +// not affect later turn logs. +func sanitizeRealtimeSessionEventForProvider(event *schemas.BifrostRealtimeEvent) { + if event == nil || event.Session == nil { + return + } + switch event.Type { + case schemas.RTEventSessionUpdate, + schemas.RTEventSessionCreated, + schemas.RTEventSessionUpdated: + if event.Session.ExtraParams != nil { + openaiProvider.StripNestedModelPrefixes(event.Session.ExtraParams) + } + } +} + +// updateRealtimeSessionFromEvent updates the session's tracked tool +// definitions and voice whenever a session.update, session.created, or +// session.updated event carries them. +func updateRealtimeSessionFromEvent(session *bfws.Session, event *schemas.BifrostRealtimeEvent) { + if event == nil || event.Session == nil { + return + } + switch event.Type { + case schemas.RTEventSessionUpdate, + schemas.RTEventSessionCreated, + schemas.RTEventSessionUpdated: + // Only update if the event explicitly carries tools (even an empty array + // means "clear tools"). A nil/absent tools field means "not changed". + if event.Session.Tools != nil { + session.SetRealtimeSessionTools(event.Session.Tools) + } + if event.Session.Voice != "" { + session.SetRealtimeVoice(event.Session.Voice) + } else if audioRaw, ok := event.Session.ExtraParams["audio"]; ok { + // New API format nests voice under session.audio.output.voice + // instead of the legacy top-level session.voice. + if voice := openaiProvider.ExtractNestedVoice(audioRaw); voice != "" { + session.SetRealtimeVoice(voice) + } + } + } +} + +func buildRealtimeTurnPreRequest(provider schemas.ModelProvider, model string, turnInputs []bfws.RealtimeTurnInput, sessionTools json.RawMessage) *schemas.BifrostRequest { input := make([]schemas.ResponsesMessage, 0, len(turnInputs)) for _, turnInput := range turnInputs { summary := strings.TrimSpace(turnInput.Summary) @@ -134,12 +215,21 @@ func buildRealtimeTurnPreRequest(provider schemas.ModelProvider, model string, t } } + var params *schemas.ResponsesParameters + if len(sessionTools) > 0 { + var tools []schemas.ResponsesTool + if json.Unmarshal(sessionTools, &tools) == nil && len(tools) > 0 { + params = &schemas.ResponsesParameters{Tools: tools} + } + } + return &schemas.BifrostRequest{ RequestType: schemas.RealtimeRequest, ResponsesRequest: &schemas.BifrostResponsesRequest{ Provider: provider, Model: model, Input: input, + Params: params, }, } } @@ -180,12 +270,15 @@ func buildRealtimeTurnPostResponse( func buildRealtimeTurnOutputMessages(rtProvider schemas.RealtimeProvider, rawResponse []byte, contentOverride string) []schemas.ResponsesMessage { outputs := make([]schemas.ResponsesMessage, 0) + seenFunctionCalls := make(map[string]struct{}) if outputMessage := extractRealtimeTurnOutputMessage(rtProvider, rawResponse, contentOverride); outputMessage != nil { outputs = append(outputs, buildRealtimeResponsesMessagesFromChat(outputMessage, contentOverride)...) - } - - if len(outputs) > 0 { - return outputs + for _, output := range outputs { + if output.Type == nil || *output.Type != schemas.ResponsesMessageTypeFunctionCall { + continue + } + seenFunctionCalls[realtimeResponsesFunctionCallKey(output)] = struct{}{} + } } var parsed realtimeResponseDoneEnvelope @@ -193,6 +286,9 @@ func buildRealtimeTurnOutputMessages(rtProvider schemas.RealtimeProvider, rawRes for _, item := range parsed.Response.Output { switch item.Type { case "message": + if realtimeOutputsContainMessage(outputs) { + continue + } content := strings.TrimSpace(contentOverride) if content == "" { content = extractRealtimeResponseDoneContentText(item.Content) @@ -227,6 +323,11 @@ func buildRealtimeTurnOutputMessages(rtProvider schemas.RealtimeProvider, rawRes if strings.TrimSpace(item.CallID) != "" { msg.CallID = schemas.Ptr(strings.TrimSpace(item.CallID)) } + key := realtimeResponsesFunctionCallKey(msg) + if _, exists := seenFunctionCalls[key]; exists { + continue + } + seenFunctionCalls[key] = struct{}{} outputs = append(outputs, msg) } } @@ -246,6 +347,35 @@ func buildRealtimeTurnOutputMessages(rtProvider schemas.RealtimeProvider, rawRes return outputs } +func realtimeOutputsContainMessage(outputs []schemas.ResponsesMessage) bool { + for _, output := range outputs { + if output.Type != nil && *output.Type == schemas.ResponsesMessageTypeMessage { + return true + } + } + return false +} + +func realtimeResponsesFunctionCallKey(message schemas.ResponsesMessage) string { + if message.CallID != nil && strings.TrimSpace(*message.CallID) != "" { + return "call_id:" + strings.TrimSpace(*message.CallID) + } + if message.ID != nil && strings.TrimSpace(*message.ID) != "" { + return "id:" + strings.TrimSpace(*message.ID) + } + + var parts []string + if message.ResponsesToolMessage != nil { + if message.ResponsesToolMessage.Name != nil { + parts = append(parts, strings.TrimSpace(*message.ResponsesToolMessage.Name)) + } + if message.ResponsesToolMessage.Arguments != nil { + parts = append(parts, strings.TrimSpace(*message.ResponsesToolMessage.Arguments)) + } + } + return strings.Join(parts, "\x00") +} + func buildRealtimeResponsesMessagesFromChat(message *schemas.ChatMessage, contentOverride string) []schemas.ResponsesMessage { if message == nil { return nil @@ -488,9 +618,14 @@ func startRealtimeTurnHooks( }() startedAt := time.Now() + storeRaw := shouldStoreRealtimeRawPayloads(baseCtx) turnCtx := newRealtimeTurnContext(baseCtx, "", session.ID(), session.ProviderSessionID(), realtimeTurnSourceEI, startEventType, key) + applyRealtimeRawStorageContext(turnCtx, storeRaw) + if voice := session.RealtimeVoice(); voice != "" { + turnCtx.SetValue(schemas.BifrostContextKeyRealtimeVoice, voice) + } setRealtimeTurnStreamContext(turnCtx, startedAt, false) - req := buildRealtimeTurnPreRequest(provider, model, session.PeekRealtimeTurnInputs()) + req := buildRealtimeTurnPreRequest(provider, model, session.PeekRealtimeTurnInputs(), session.RealtimeSessionTools()) hooks, bifrostErr := client.RunRealtimeTurnPreHooks(turnCtx, req) if bifrostErr != nil { // RunRealtimeTurnPreHooks already executed post-hooks and flushed the trace @@ -502,12 +637,15 @@ func startRealtimeTurnHooks( } requestID, _ := turnCtx.Value(schemas.BifrostContextKeyRequestID).(string) + traceID, _ := turnCtx.Value(schemas.BifrostContextKeyTraceID).(string) session.SetRealtimeTurnHooks(&bfws.RealtimeTurnPluginState{ PostHookRunner: hooks.PostHookRunner, Cleanup: hooks.Cleanup, RequestID: requestID, StartedAt: startedAt, PreHookValues: turnCtx.GetUserValues(), + TraceID: traceID, + RawStore: storeRaw, }) committed = true return nil @@ -548,6 +686,8 @@ func finalizeRealtimeTurnHooks( ) postCtx := newRealtimeTurnContext(baseCtx, activeHooks.RequestID, session.ID(), session.ProviderSessionID(), realtimeTurnSourceLM, rtProvider.RealtimeTurnFinalEvent(), key) applyRealtimeTurnContextValues(postCtx, activeHooks.PreHookValues) + restoreRealtimeTurnTraceContext(postCtx, activeHooks.TraceID, activeHooks.PreHookValues) + applyRealtimeRawStorageContext(postCtx, activeHooks.RawStore) setRealtimeTurnStreamContext(postCtx, activeHooks.StartedAt, true) _, bifrostErr := activeHooks.PostHookRunner(postCtx, postResponse, nil) completeRealtimeTurnTrace(postCtx) @@ -555,18 +695,22 @@ func finalizeRealtimeTurnHooks( } startedAt := time.Now() + storeRaw := shouldStoreRealtimeRawPayloads(baseCtx) preCtx := newRealtimeTurnContext(baseCtx, "", session.ID(), session.ProviderSessionID(), realtimeTurnSourceEI, "", key) + applyRealtimeRawStorageContext(preCtx, storeRaw) setRealtimeTurnStreamContext(preCtx, startedAt, false) - preReq := buildRealtimeTurnPreRequest(provider, model, turnInputs) + preReq := buildRealtimeTurnPreRequest(provider, model, turnInputs, session.RealtimeSessionTools()) hooks, bifrostErr := client.RunRealtimeTurnPreHooks(preCtx, preReq) if bifrostErr != nil { return bifrostErr } + preHookValues := preCtx.GetUserValues() if hooks.Cleanup != nil { defer hooks.Cleanup() } requestID, _ := preCtx.Value(schemas.BifrostContextKeyRequestID).(string) + traceID, _ := preCtx.Value(schemas.BifrostContextKeyTraceID).(string) postResponse := buildRealtimeTurnPostResponse( rtProvider, provider, @@ -577,7 +721,9 @@ func finalizeRealtimeTurnHooks( time.Since(startedAt).Milliseconds(), ) postCtx := newRealtimeTurnContext(baseCtx, requestID, session.ID(), session.ProviderSessionID(), realtimeTurnSourceLM, rtProvider.RealtimeTurnFinalEvent(), key) - applyRealtimeTurnContextValues(postCtx, preCtx.GetUserValues()) + applyRealtimeTurnContextValues(postCtx, preHookValues) + restoreRealtimeTurnTraceContext(postCtx, traceID, preHookValues) + applyRealtimeRawStorageContext(postCtx, storeRaw) setRealtimeTurnStreamContext(postCtx, startedAt, true) _, bifrostErr = hooks.PostHookRunner(postCtx, postResponse, nil) completeRealtimeTurnTrace(postCtx) @@ -618,6 +764,8 @@ func finalizeRealtimeTurnHooksWithError( ) postCtx := newRealtimeTurnContext(baseCtx, activeHooks.RequestID, session.ID(), session.ProviderSessionID(), realtimeTurnSourceLM, eventType, key) applyRealtimeTurnContextValues(postCtx, activeHooks.PreHookValues) + restoreRealtimeTurnTraceContext(postCtx, activeHooks.TraceID, activeHooks.PreHookValues) + applyRealtimeRawStorageContext(postCtx, activeHooks.RawStore) setRealtimeTurnStreamContext(postCtx, activeHooks.StartedAt, true) _, hookErr := activeHooks.PostHookRunner(postCtx, nil, postErr) completeRealtimeTurnTrace(postCtx) @@ -633,18 +781,22 @@ func finalizeRealtimeTurnHooksWithError( } startedAt := time.Now() + storeRaw := shouldStoreRealtimeRawPayloads(baseCtx) preCtx := newRealtimeTurnContext(baseCtx, "", session.ID(), session.ProviderSessionID(), realtimeTurnSourceEI, "", key) + applyRealtimeRawStorageContext(preCtx, storeRaw) setRealtimeTurnStreamContext(preCtx, startedAt, false) - preReq := buildRealtimeTurnPreRequest(provider, model, turnInputs) + preReq := buildRealtimeTurnPreRequest(provider, model, turnInputs, session.RealtimeSessionTools()) hooks, hookPreErr := client.RunRealtimeTurnPreHooks(preCtx, preReq) if hookPreErr != nil { return hookPreErr } + preHookValues := preCtx.GetUserValues() if hooks.Cleanup != nil { defer hooks.Cleanup() } requestID, _ := preCtx.Value(schemas.BifrostContextKeyRequestID).(string) + traceID, _ := preCtx.Value(schemas.BifrostContextKeyTraceID).(string) postErr := buildRealtimeTurnPostError( provider, model, @@ -653,7 +805,9 @@ func finalizeRealtimeTurnHooksWithError( bifrostErr, ) postCtx := newRealtimeTurnContext(baseCtx, requestID, session.ID(), session.ProviderSessionID(), realtimeTurnSourceLM, eventType, key) - applyRealtimeTurnContextValues(postCtx, preCtx.GetUserValues()) + applyRealtimeTurnContextValues(postCtx, preHookValues) + restoreRealtimeTurnTraceContext(postCtx, traceID, preHookValues) + applyRealtimeRawStorageContext(postCtx, storeRaw) setRealtimeTurnStreamContext(postCtx, startedAt, true) _, hookErr := hooks.PostHookRunner(postCtx, nil, postErr) completeRealtimeTurnTrace(postCtx) diff --git a/transports/bifrost-http/handlers/webrtc_realtime.go b/transports/bifrost-http/handlers/webrtc_realtime.go index da10f0a9cb..6119ee4a44 100644 --- a/transports/bifrost-http/handlers/webrtc_realtime.go +++ b/transports/bifrost-http/handlers/webrtc_realtime.go @@ -13,6 +13,7 @@ import ( "github.com/fasthttp/router" bifrost "github.com/maximhq/bifrost/core" + "github.com/maximhq/bifrost/core/providers/openai" "github.com/maximhq/bifrost/core/schemas" "github.com/maximhq/bifrost/transports/bifrost-http/integrations" "github.com/maximhq/bifrost/transports/bifrost-http/lib" @@ -62,6 +63,10 @@ func (h *WebRTCRealtimeHandler) RegisterRoutes(r *router.Router, middlewares ... // Base bifrost route — GA /calls format (multipart sdp + session) r.POST("/v1/realtime/calls", handler) + // Base bifrost route — legacy format (raw SDP or multipart on /v1/realtime) + h.legacyRoutes["/v1/realtime"] = "" + r.POST("/v1/realtime", handler) + // OpenAI integration routes — /calls variants (GA format) for _, path := range integrations.OpenAIRealtimeWebRTCCallsPaths("/openai") { r.POST(path, handler) @@ -105,7 +110,7 @@ func (h *WebRTCRealtimeHandler) handleRequest(ctx *fasthttp.RequestCtx) { // Raw SDP bodies (application/sdp) fall back to ?model= for the legacy // raw-SDP path only; the multipart contract has no ?model= fallback. func (h *WebRTCRealtimeHandler) handleCallsRequest(ctx *fasthttp.RequestCtx) { - sdpOffer, providerKey, model, normalizedSession, bifrostErr := parseCallsWebRTCRequest(ctx) + sdpOffer, providerKey, model, normalizedSession, bifrostErr := parseCallsWebRTCRequest(ctx, h.config) if bifrostErr != nil { SendBifrostError(ctx, bifrostErr) return @@ -124,7 +129,7 @@ func (h *WebRTCRealtimeHandler) handleCallsRequest(ctx *fasthttp.RequestCtx) { h.runWebRTCRelay(ctx, rtProvider, providerKey, model, sdpOffer, exchangeSDP) } -func parseCallsWebRTCRequest(ctx *fasthttp.RequestCtx) (string, schemas.ModelProvider, string, []byte, *schemas.BifrostError) { +func parseCallsWebRTCRequest(ctx *fasthttp.RequestCtx, config *lib.Config) (string, schemas.ModelProvider, string, []byte, *schemas.BifrostError) { contentType := strings.ToLower(string(ctx.Request.Header.ContentType())) path := string(ctx.Path()) if strings.HasPrefix(contentType, "multipart/form-data") { @@ -142,7 +147,7 @@ func parseCallsWebRTCRequest(ctx *fasthttp.RequestCtx) (string, schemas.ModelPro if strings.TrimSpace(sessionField) == "" { return "", "", "", nil, newRealtimeWebRTCError(fasthttp.StatusBadRequest, "invalid_request_error", "session form field is required", nil) } - providerKey, model, normalizedSession, bifrostErr := resolveRealtimeSDPTarget(path, []byte(sessionField)) + providerKey, model, normalizedSession, bifrostErr := resolveRealtimeSDPTarget(ctx, config, path, []byte(sessionField)) if bifrostErr != nil { return "", "", "", nil, bifrostErr } @@ -160,6 +165,18 @@ func parseCallsWebRTCRequest(ctx *fasthttp.RequestCtx) (string, schemas.ModelPro } providerKey, model := schemas.ParseModelString(rawModel, realtimeDefaultProviderForPath(path)) + // Model catalog auto-resolution for bare model names on base /v1 routes + if providerKey == "" && strings.TrimSpace(model) != "" { + providers := config.GetProvidersForModel(model) + if len(providers) > 0 { + ctx.SetUserValue(lib.FastHTTPUserValueModelCatalogResolution, &lib.ModelCatalogResolution{ + Model: model, + ResolvedProvider: providers[0], + AllProviders: providers, + }) + providerKey = providers[0] + } + } if providerKey == "" || strings.TrimSpace(model) == "" { if realtimeDefaultProviderForPath(path) == "" { return "", "", "", nil, newRealtimeWebRTCError(fasthttp.StatusBadRequest, "invalid_request_error", "model must use provider/model on /v1 realtime routes", nil) @@ -180,6 +197,18 @@ func (h *WebRTCRealtimeHandler) handleLegacyRequest(ctx *fasthttp.RequestCtx, de } providerKey, model := schemas.ParseModelString(rawModel, defaultProvider) + // Model catalog auto-resolution for bare model names on base /v1 routes + if providerKey == "" && strings.TrimSpace(model) != "" { + providers := h.config.GetProvidersForModel(model) + if len(providers) > 0 { + ctx.SetUserValue(lib.FastHTTPUserValueModelCatalogResolution, &lib.ModelCatalogResolution{ + Model: model, + ResolvedProvider: providers[0], + AllProviders: providers, + }) + providerKey = providers[0] + } + } if providerKey == "" || model == "" { SendBifrostError(ctx, newRealtimeWebRTCError(fasthttp.StatusBadRequest, "invalid_request_error", "invalid model: "+rawModel, nil)) return @@ -197,6 +226,16 @@ func (h *WebRTCRealtimeHandler) handleLegacyRequest(ctx *fasthttp.RequestCtx, de return } + // Strip provider prefixes from nested model fields (e.g. input_audio_transcription.model) + if sessionJSON != nil { + if root, parseErr := schemas.ParseRealtimeClientSecretBody(sessionJSON); parseErr == nil { + openai.StripNestedModelPrefixes(root) + if updated, marshalErr := json.Marshal(root); marshalErr == nil { + sessionJSON = updated + } + } + } + exchangeSDP := func(rCtx *schemas.BifrostContext, key schemas.Key, upstreamOffer string) (string, *schemas.BifrostError) { return legacyProvider.ExchangeLegacyRealtimeWebRTCSDP(rCtx, key, upstreamOffer, sessionJSON, model) } @@ -254,6 +293,10 @@ func (h *WebRTCRealtimeHandler) runWebRTCRelay( ) { bifrostCtx, cancel := lib.ConvertToBifrostContext(ctx, h.handlerStore) defer cancel() + // Apply governance/routing values from the transport middleware. + // ConvertToBifrostContext creates a fresh context that doesn't carry the user + // values the middleware stored on the fasthttp RequestCtx via SetUserValue. + applyRealtimeMiddlewareValues(bifrostCtx, snapshotRealtimeMiddlewareValues(ctx)) bifrostCtx.SetValue(schemas.BifrostContextKeyHTTPRequestType, schemas.RealtimeRequest) if strings.HasPrefix(string(ctx.Path()), "/openai") { bifrostCtx.SetValue(schemas.BifrostContextKeyIntegrationType, "openai") @@ -272,6 +315,11 @@ func (h *WebRTCRealtimeHandler) runWebRTCRelay( model = authKey.Aliases.Resolve(model) } + // Compute raw storage flag from provider config + per-request header overrides. + // Normal inference computes this inside bifrost.executeRequest, which is bypassed + // for realtime WebRTC connections. + applyRealtimeRawStorageContext(bifrostCtx, h.client.ComputeRawStorageForProvider(bifrostCtx, providerKey)) + boundExchange := func(rCtx *schemas.BifrostContext, upstreamOffer string) (string, *schemas.BifrostError) { return exchangeSDP(rCtx, authKey, upstreamOffer) } @@ -792,6 +840,7 @@ func (r *webrtcRealtimeRelay) handleDownstreamMessage(msg webrtc.DataChannelMess } } + sanitizeRealtimeSessionEventForProvider(event) providerEvent, err := r.provider.ToProviderRealtimeEvent(event) if err != nil { if startsTurn { @@ -816,6 +865,9 @@ func (r *webrtcRealtimeRelay) handleDownstreamMessage(msg webrtc.DataChannelMess r.sendUpstream(msg.Data, msg.IsString) return } + // Track session metadata only after provider translation succeeds. Rejected + // session.update events must not affect later turn logs. + updateRealtimeSessionFromEvent(r.session, event) r.sendUpstream(providerEvent, msg.IsString) } @@ -844,6 +896,8 @@ func (r *webrtcRealtimeRelay) handleUpstreamMessage(msg webrtc.DataChannelMessag if event.Session != nil && event.Session.ID != "" { r.session.SetProviderSessionID(event.Session.ID) } + // Track session tool definitions from session.created/session.updated (server→client). + updateRealtimeSessionFromEvent(r.session, event) inputItemID, inputSummary := pendingRealtimeInputUpdate(event) if inputSummary != "" { r.session.RecordRealtimeInput(inputItemID, inputSummary, string(msg.Data)) @@ -1062,12 +1116,26 @@ func newRealtimeRelayContext(requestCtx *schemas.BifrostContext) (*schemas.Bifro schemas.BifrostContextKeySelectedKeyID, schemas.BifrostContextKeySelectedKeyName, schemas.BifrostContextKeyIsEnterprise, + schemas.BifrostContextKeyRoutingEnginesUsed, + schemas.BifrostContextKeyRoutingEngineLogs, + schemas.BifrostContextKeyShouldStoreRawInLogs, + schemas.BifrostContextKeyAllowPerRequestStorageOverride, + schemas.BifrostContextKeyAllowPerRequestRawOverride, + schemas.BifrostContextKeyStoreRawRequestResponse, + schemas.BifrostContextKeyDisableContentLogging, + schemas.BifrostContextKeyCaptureRawRequest, + schemas.BifrostContextKeyCaptureRawResponse, + schemas.BifrostContextKeyDropRawRequestFromClient, + schemas.BifrostContextKeyDropRawResponseFromClient, } { if value := requestCtx.Value(key); value != nil { relayCtx.SetValue(key, value) } } + // Tag the relay context with transport type for downstream logging/metadata. + relayCtx.SetValue(schemas.BifrostContextKeyRealtimeTransport, "webrtc") + return relayCtx, cancel } @@ -1149,7 +1217,7 @@ func sendDataChannelMessage(dc *webrtc.DataChannel, payload []byte, isString boo } } -func resolveRealtimeSDPTarget(path string, sessionJSON []byte) (schemas.ModelProvider, string, []byte, *schemas.BifrostError) { +func resolveRealtimeSDPTarget(ctx *fasthttp.RequestCtx, config *lib.Config, path string, sessionJSON []byte) (schemas.ModelProvider, string, []byte, *schemas.BifrostError) { root, err := schemas.ParseRealtimeClientSecretBody(sessionJSON) if err != nil { return "", "", nil, err @@ -1166,6 +1234,18 @@ func resolveRealtimeSDPTarget(path string, sessionJSON []byte) (schemas.ModelPro } providerKey, model := schemas.ParseModelString(strings.TrimSpace(rawModel), realtimeDefaultProviderForPath(path)) + // Model catalog auto-resolution for bare model names in session body + if providerKey == "" && strings.TrimSpace(model) != "" { + providers := config.GetProvidersForModel(model) + if len(providers) > 0 { + ctx.SetUserValue(lib.FastHTTPUserValueModelCatalogResolution, &lib.ModelCatalogResolution{ + Model: model, + ResolvedProvider: providers[0], + AllProviders: providers, + }) + providerKey = providers[0] + } + } if providerKey == "" || strings.TrimSpace(model) == "" { if realtimeDefaultProviderForPath(path) == "" { return "", "", nil, newRealtimeWebRTCError(fasthttp.StatusBadRequest, "invalid_request_error", "session.model must use provider/model on /v1 realtime routes", nil) @@ -1178,6 +1258,7 @@ func resolveRealtimeSDPTarget(path string, sessionJSON []byte) (schemas.ModelPro return "", "", nil, newRealtimeWebRTCError(fasthttp.StatusInternalServerError, "server_error", "failed to encode normalized session model", marshalErr) } root["model"] = normalizedModel + openai.StripNestedModelPrefixes(root) normalizedSession, marshalErr := json.Marshal(root) if marshalErr != nil { return "", "", nil, newRealtimeWebRTCError(fasthttp.StatusInternalServerError, "server_error", "failed to encode normalized realtime session", marshalErr) diff --git a/transports/bifrost-http/handlers/webrtc_realtime_test.go b/transports/bifrost-http/handlers/webrtc_realtime_test.go index 8ed36bd040..a2636c9b11 100644 --- a/transports/bifrost-http/handlers/webrtc_realtime_test.go +++ b/transports/bifrost-http/handlers/webrtc_realtime_test.go @@ -33,7 +33,9 @@ func (s testHandlerStore) GetMCPExternalServerURL() string { re func (s testHandlerStore) GetMCPExternalClientURL() string { return "" } func TestResolveRealtimeSDPTarget_BaseRouteRequiresProviderPrefix(t *testing.T) { - _, _, _, err := resolveRealtimeSDPTarget("/v1/realtime", []byte(`{"model":"gpt-4o-realtime-preview"}`)) + var ctx fasthttp.RequestCtx + cfg := &lib.Config{} + _, _, _, err := resolveRealtimeSDPTarget(&ctx, cfg, "/v1/realtime", []byte(`{"model":"gpt-4o-realtime-preview"}`)) if err == nil { t.Fatal("expected provider/model validation error") } @@ -43,7 +45,9 @@ func TestResolveRealtimeSDPTarget_BaseRouteRequiresProviderPrefix(t *testing.T) } func TestResolveRealtimeSDPTarget_BaseRouteNormalizesModel(t *testing.T) { - provider, model, normalized, err := resolveRealtimeSDPTarget("/v1/realtime", []byte(`{"model":"openai/gpt-4o-realtime-preview","voice":"alloy"}`)) + var ctx fasthttp.RequestCtx + cfg := &lib.Config{} + provider, model, normalized, err := resolveRealtimeSDPTarget(&ctx, cfg, "/v1/realtime", []byte(`{"model":"openai/gpt-4o-realtime-preview","voice":"alloy"}`)) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -68,7 +72,9 @@ func TestResolveRealtimeSDPTarget_BaseRouteNormalizesModel(t *testing.T) { } func TestResolveRealtimeSDPTarget_OpenAIRouteDefaultsProvider(t *testing.T) { - provider, model, _, err := resolveRealtimeSDPTarget("/openai/v1/realtime", []byte(`{"model":"gpt-4o-realtime-preview"}`)) + var ctx fasthttp.RequestCtx + cfg := &lib.Config{} + provider, model, _, err := resolveRealtimeSDPTarget(&ctx, cfg, "/openai/v1/realtime", []byte(`{"model":"gpt-4o-realtime-preview"}`)) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -87,7 +93,7 @@ func TestParseCallsWebRTCRequest_RawSDPKeepsGARoute(t *testing.T) { ctx.Request.Header.SetContentType("application/sdp") ctx.Request.SetBodyString("v=0\r\n") - sdpOffer, provider, model, session, err := parseCallsWebRTCRequest(&ctx) + sdpOffer, provider, model, session, err := parseCallsWebRTCRequest(&ctx, &lib.Config{}) if err != nil { t.Fatalf("unexpected error: %v", err) } diff --git a/transports/bifrost-http/handlers/wsrealtime.go b/transports/bifrost-http/handlers/wsrealtime.go index 6f488103af..81edd1496f 100644 --- a/transports/bifrost-http/handlers/wsrealtime.go +++ b/transports/bifrost-http/handlers/wsrealtime.go @@ -2,6 +2,7 @@ package handlers import ( "errors" + "fmt" "io" "net" "net/http" @@ -77,7 +78,7 @@ func (h *WSRealtimeHandler) handleUpgrade(ctx *fasthttp.RequestCtx) { } } - providerKey, model, err := resolveRealtimeTarget(path, modelParam, deploymentParam) + providerKey, model, err := resolveRealtimeTarget(ctx, h.config, path, modelParam, deploymentParam) if err != nil { upgrader := h.websocketUpgrader("") upgradeErr := upgrader.Upgrade(ctx, func(conn *ws.Conn) { @@ -106,6 +107,13 @@ func (h *WSRealtimeHandler) handleUpgrade(ctx *fasthttp.RequestCtx) { return } + // Capture governance/routing values set by the transport middleware. + // TransportInterceptorMiddleware copies BifrostContext user values to individual + // fasthttp UserValue slots after HTTPTransportPreHook runs. We snapshot them now + // because the fasthttp RequestCtx is recycled after the handler returns — the + // WebSocket session outlives it. + middlewareContextValues := snapshotRealtimeMiddlewareValues(ctx) + upgrader := h.websocketUpgrader(rtProvider.RealtimeWebSocketSubprotocol()) err = upgrader.Upgrade(ctx, func(conn *ws.Conn) { defer conn.Close() @@ -118,7 +126,7 @@ func (h *WSRealtimeHandler) handleUpgrade(ctx *fasthttp.RequestCtx) { } defer h.sessions.Remove(conn) - h.runRealtimeSession(clientConn, session, auth, path, providerKey, model) + h.runRealtimeSession(clientConn, session, auth, path, providerKey, model, middlewareContextValues) }) if err != nil { logger.Warn("websocket upgrade failed for %s: %v", path, err) @@ -150,6 +158,7 @@ func (h *WSRealtimeHandler) runRealtimeSession( path string, providerKey schemas.ModelProvider, model string, + middlewareValues map[any]any, ) { clientConn.startHeartbeat() defer clientConn.stopHeartbeat() @@ -161,6 +170,12 @@ func (h *WSRealtimeHandler) runRealtimeSession( } defer cancel() + // Restore governance and routing values from the transport middleware context. + // These include routing rule ID/name, virtual key ID/name, routing engines, + // routing engine logs, raw-storage header overrides, and other values set by + // HTTPTransportPreHook plugins (governance, prompts, etc.). + applyRealtimeMiddlewareValues(bifrostCtx, middlewareValues) + // Resolve ephemeral key mapping to restore virtual key context. token := extractRealtimeBearerTokenFromHeader(auth.authorization) if isRealtimeEphemeralToken(token) { @@ -196,12 +211,26 @@ func (h *WSRealtimeHandler) runRealtimeSession( // Resolve model alias so the provider receives the actual model identifier. model = key.Aliases.Resolve(model) + // Compute raw storage flag from provider config + per-request header overrides. + // Normal inference computes this inside bifrost.executeRequest, which is bypassed + // for realtime WebSocket connections. Setting it on the session context ensures + // turn-level hooks can read it via shouldStoreRealtimeRawPayloads(). + applyRealtimeRawStorageContext(bifrostCtx, h.client.ComputeRawStorageForProvider(bifrostCtx, providerKey)) + + // Tag the session context with transport type for downstream logging/metadata. + bifrostCtx.SetValue(schemas.BifrostContextKeyRealtimeTransport, "websocket") + wsURL := rtProvider.RealtimeWebSocketURL(key, model) + realtimeHeaders, headerErr := rtProvider.RealtimeHeaders(bifrostCtx, key) + if headerErr != nil { + clientConn.writeRealtimeError(headerErr) + return + } upstream, err := h.pool.Get(bfws.PoolKey{ Provider: providerKey, KeyID: key.ID, Endpoint: wsURL, - }, mapToHTTPHeader(rtProvider.RealtimeHeaders(key))) + }, mapToHTTPHeader(realtimeHeaders)) if err != nil { clientConn.writeRealtimeError(newRealtimeWireBifrostError(502, "server_error", err.Error())) return @@ -288,6 +317,7 @@ func (h *WSRealtimeHandler) relayClientToRealtimeProvider( } } + sanitizeRealtimeSessionEventForProvider(event) providerEvent, err := provider.ToProviderRealtimeEvent(event) if err != nil { if startsTurn { @@ -310,6 +340,10 @@ func (h *WSRealtimeHandler) relayClientToRealtimeProvider( continue } + // Track session metadata only after provider translation succeeds. Rejected + // session.update events must not affect later turn logs. + updateRealtimeSessionFromEvent(session, event) + // Record tool output / input only after the event passed validation. if !startsTurn { if toolSummary != "" { @@ -402,6 +436,8 @@ func (h *WSRealtimeHandler) relayRealtimeProviderToClient( if event.Session != nil && event.Session.ID != "" { session.SetProviderSessionID(event.Session.ID) } + // Track session tool definitions from session.created/session.updated. + updateRealtimeSessionFromEvent(session, event) if event.Delta != nil && provider.ShouldAccumulateRealtimeOutput(event.Type) { session.AppendRealtimeOutputText(event.Delta.Text) session.AppendRealtimeOutputText(event.Delta.Transcript) @@ -481,25 +517,41 @@ func (h *WSRealtimeHandler) relayRealtimeProviderToClient( } } -func resolveRealtimeTarget(path, modelParam, deploymentParam string) (schemas.ModelProvider, string, error) { +func resolveRealtimeTarget(ctx *fasthttp.RequestCtx, config *lib.Config, path, modelParam, deploymentParam string) (schemas.ModelProvider, string, error) { defaultProvider := realtimeDefaultProviderForPath(path) + var rawParam string switch { case strings.TrimSpace(modelParam) != "": - provider, model := schemas.ParseModelString(strings.TrimSpace(modelParam), defaultProvider) - if provider == "" || strings.TrimSpace(model) == "" { - return "", "", errRealtimeModelFormat - } - return provider, strings.TrimSpace(model), nil + rawParam = strings.TrimSpace(modelParam) case strings.TrimSpace(deploymentParam) != "": - provider, model := schemas.ParseModelString(strings.TrimSpace(deploymentParam), defaultProvider) - if provider == "" || strings.TrimSpace(model) == "" { - return "", "", errRealtimeDeploymentFormat - } - return provider, strings.TrimSpace(model), nil + rawParam = strings.TrimSpace(deploymentParam) default: return "", "", errRealtimeModelRequired } + + provider, model := schemas.ParseModelString(rawParam, defaultProvider) + if strings.TrimSpace(model) == "" { + return "", "", errRealtimeModelFormat + } + + // Model catalog auto-resolution: when no provider prefix is present and the + // path doesn't imply a default provider, look up the model catalog — same + // logic as resolveModelAndProvider in inference.go. + if provider == "" { + providers := config.GetProvidersForModel(model) + if len(providers) == 0 { + return "", "", errRealtimeModelFormat + } + ctx.SetUserValue(lib.FastHTTPUserValueModelCatalogResolution, &lib.ModelCatalogResolution{ + Model: model, + ResolvedProvider: providers[0], + AllProviders: providers, + }) + provider = providers[0] + } + + return provider, model, nil } func realtimeDefaultProviderForPath(path string) schemas.ModelProvider { @@ -673,3 +725,113 @@ func newRealtimeWireBifrostError(status int, code, message string) *schemas.Bifr }, } } + +// applyRealtimeMiddlewareValues copies governance and routing values from the transport +// middleware BifrostContext (populated by HTTPTransportPreHook plugins) to the long-lived +// WebSocket session context. Without this, values set by the governance plugin during +// the HTTP upgrade (routing rule ID/name, VK ID/name, routing engines, routing engine +// logs, raw-storage overrides) would be lost because the WebSocket handler creates a +// fresh BifrostContext that outlives the fasthttp request. +// +// Values already explicitly set by createBifrostContextFromAuth (VK, parent request ID, +// request headers, extra headers) are preserved — middleware values do not overwrite them +// since createBifrostContextFromAuth runs first. +// realtimeMiddlewareKeys lists the BifrostContext keys that TransportInterceptorMiddleware +// copies from the governance plugin's context onto individual fasthttp UserValue slots. +// We snapshot exactly these keys before the WebSocket upgrade so the long-lived session +// has access to routing rule info, virtual key resolution, routing engine logs, etc. +var realtimeMiddlewareKeys = []any{ + schemas.BifrostContextKeyGovernanceVirtualKeyID, + schemas.BifrostContextKeyGovernanceVirtualKeyName, + schemas.BifrostContextKeyGovernanceRoutingRuleID, + schemas.BifrostContextKeyGovernanceRoutingRuleName, + schemas.BifrostContextKeyGovernanceCustomerID, + schemas.BifrostContextKeyGovernanceCustomerName, + schemas.BifrostContextKeyGovernanceTeamID, + schemas.BifrostContextKeyGovernanceTeamName, + schemas.BifrostContextKeyGovernanceBusinessUnitID, + schemas.BifrostContextKeyGovernanceBusinessUnitName, + schemas.BifrostContextKeyGovernanceIncludeOnlyKeys, + schemas.BifrostContextKeyGovernancePluginName, + schemas.BifrostContextKeyRoutingEnginesUsed, + schemas.BifrostContextKeyRoutingEngineLogs, + schemas.BifrostContextKeyShouldStoreRawInLogs, + schemas.BifrostContextKeyCaptureRawRequest, + schemas.BifrostContextKeyCaptureRawResponse, + schemas.BifrostContextKeyDropRawRequestFromClient, + schemas.BifrostContextKeyDropRawResponseFromClient, + schemas.BifrostContextKeyUserID, + schemas.BifrostContextKeyUserName, + schemas.BifrostContextKeyAPIKeyID, + schemas.BifrostContextKeyAPIKeyName, + schemas.BifrostContextKeySelectedKeyID, + schemas.BifrostContextKeySelectedKeyName, + schemas.BifrostContextKeyTraceID, + schemas.BifrostContextKeyTransportPluginLogs, +} + +// snapshotRealtimeMiddlewareValues reads governance/routing values from the fasthttp +// context's UserValue store. TransportInterceptorMiddleware copies them there as +// individual key-value pairs (not inside a BifrostContext). +// +// It also processes FastHTTPUserValueModelCatalogResolution, which is set by +// resolveRealtimeTarget when a bare model name is auto-resolved via the model +// catalog. ConvertToBifrostContext normally handles this for regular inference, +// but WebSocket handlers use createBifrostContextFromAuth instead, so we do the +// same log/engine enrichment here. +func snapshotRealtimeMiddlewareValues(ctx *fasthttp.RequestCtx) map[any]any { + result := make(map[any]any) + for _, key := range realtimeMiddlewareKeys { + if value := ctx.UserValue(key); value != nil { + result[key] = value + } + } + + // Model catalog auto-resolution: replicate the routing engine log that + // ConvertToBifrostContext would normally emit (see lib/ctx.go). + if res, ok := ctx.UserValue(lib.FastHTTPUserValueModelCatalogResolution).(*lib.ModelCatalogResolution); ok && res != nil { + providerStrs := make([]string, len(res.AllProviders)) + for i, p := range res.AllProviders { + providerStrs[i] = string(p) + } + logEntry := schemas.RoutingEngineLogEntry{ + Engine: schemas.RoutingEngineModelCatalog, + Level: schemas.LogLevelInfo, + Message: fmt.Sprintf("No provider specified for model %s, found %d options in model catalog: [%s], selecting first: %s", res.Model, len(res.AllProviders), strings.Join(providerStrs, ", "), res.ResolvedProvider), + Timestamp: time.Now().UnixMilli(), + } + // Merge with any existing routing engine logs from governance middleware. + if existing, ok := result[schemas.BifrostContextKeyRoutingEngineLogs].([]schemas.RoutingEngineLogEntry); ok { + result[schemas.BifrostContextKeyRoutingEngineLogs] = append(existing, logEntry) + } else { + result[schemas.BifrostContextKeyRoutingEngineLogs] = []schemas.RoutingEngineLogEntry{logEntry} + } + if existing, ok := result[schemas.BifrostContextKeyRoutingEnginesUsed].([]string); ok { + result[schemas.BifrostContextKeyRoutingEnginesUsed] = append(existing, schemas.RoutingEngineModelCatalog) + } else { + result[schemas.BifrostContextKeyRoutingEnginesUsed] = []string{schemas.RoutingEngineModelCatalog} + } + } + + if len(result) == 0 { + return nil + } + return result +} + +func applyRealtimeMiddlewareValues(ctx *schemas.BifrostContext, middlewareValues map[any]any) { + if ctx == nil || len(middlewareValues) == 0 { + return + } + for key, value := range middlewareValues { + if value == nil { + continue + } + // Skip values already set by createBifrostContextFromAuth to avoid overwriting + // auth-resolved values with stale middleware copies. + if existing := ctx.Value(key); existing != nil { + continue + } + ctx.SetValue(key, value) + } +} diff --git a/transports/bifrost-http/integrations/router.go b/transports/bifrost-http/integrations/router.go index 476190a149..00ff4274ec 100644 --- a/transports/bifrost-http/integrations/router.go +++ b/transports/bifrost-http/integrations/router.go @@ -667,13 +667,20 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Execute the request through Bifrost bifrostCtx, cancel := lib.ConvertToBifrostContext(ctx, g.handlerStore) + // Centralized cleanup. The streaming branch below transfers ownership via + // streamingOwnsCancel because its producer goroutine outlives this lambda. + streamingOwnsCancel := false + defer func() { + if !streamingOwnsCancel { + cancel() + } + }() // Set integration type to context bifrostCtx.SetValue(schemas.BifrostContextKeyIntegrationType, string(config.Type)) // Async retrieve: check x-bf-async-id header early (before body parsing) if asyncID := string(ctx.Request.Header.Peek(schemas.AsyncHeaderGetID)); asyncID != "" { - defer cancel() g.handleAsyncRetrieve(ctx, config, bifrostCtx) return } @@ -686,7 +693,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle var err error isLargePayload, err = g.largePayloadHook(ctx, bifrostCtx, config.Type) if err != nil { - cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "large payload detection failed")) return } @@ -699,7 +705,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle } else if config.RequestParser != nil { // Use custom parser (e.g., for multipart/form-data) if err := config.RequestParser(ctx, req); err != nil { - cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to parse request")) return } @@ -708,7 +713,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle rawBody = ctx.Request.Body() if len(rawBody) > 0 { if err := sonic.Unmarshal(rawBody, req); err != nil { - cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "Invalid JSON")) return } @@ -753,12 +757,10 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle if config.ShortCircuit != nil { handled, err := config.ShortCircuit(ctx, bifrostCtx, req) if err != nil { - defer cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "short-circuit handler error: "+err.Error())) return } if handled { - defer cancel() return } } @@ -767,7 +769,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle if config.GetRequestModel != nil { model, err := config.GetRequestModel(ctx, req) if err != nil { - cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to get model from context")) return } @@ -812,7 +813,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle isGenAIBatchCreate := config.Type == RouteConfigTypeGenAI && bifrostCtx.Value(isGeminiBatchCreateRequestContextKey) != nil useBatchPath := config.BatchRequestConverter != nil && (config.RequestConverter == nil || config.Type != RouteConfigTypeGenAI || isGenAIBatchCreate) if useBatchPath { - defer cancel() batchReq, err := config.BatchRequestConverter(bifrostCtx, req) if err != nil { g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert batch request")) @@ -827,7 +827,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle } // Handle file requests if FileRequestConverter is set if config.FileRequestConverter != nil { - defer cancel() fileReq, err := config.FileRequestConverter(bifrostCtx, req) if err != nil { g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert file request")) @@ -843,7 +842,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Handle container requests if ContainerRequestConverter is set if config.ContainerRequestConverter != nil { - defer cancel() containerReq, err := config.ContainerRequestConverter(bifrostCtx, req) if err != nil { g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert container request")) @@ -859,7 +857,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Handle container file requests if ContainerFileRequestConverter is set if config.ContainerFileRequestConverter != nil { - defer cancel() containerFileReq, err := config.ContainerFileRequestConverter(bifrostCtx, req) if err != nil { g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert container file request")) @@ -875,7 +872,6 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Handle cached content requests if CachedContentRequestConverter is set if config.CachedContentRequestConverter != nil { - defer cancel() cachedContentReq, err := config.CachedContentRequestConverter(bifrostCtx, req) if err != nil { g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert cached content request")) @@ -892,12 +888,10 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Convert the integration-specific request to Bifrost format (inference requests) bifrostReq, err := config.RequestConverter(bifrostCtx, req) if err != nil { - defer cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to convert request to Bifrost format")) return } if bifrostReq == nil { - defer cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(nil, "invalid request")) return } @@ -907,14 +901,12 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle // Extract and parse fallbacks from the request if present if err := g.extractAndParseFallbacks(req, bifrostReq); err != nil { - defer cancel() g.sendError(ctx, bifrostCtx, config.ErrorConverter, newBifrostError(err, "failed to parse fallbacks: "+err.Error())) return } // Async create: check x-bf-async header (needs parsed bifrostReq) if string(ctx.Request.Header.Peek(schemas.AsyncHeaderCreate)) != "" { - defer cancel() g.handleAsyncCreate(ctx, config, req, bifrostReq, bifrostCtx) return } @@ -926,9 +918,12 @@ func (g *GenericRouter) createHandler(config RouteConfig) fasthttp.RequestHandle } if isStreaming { + // Hand cancel ownership to the streaming path; its producer goroutine + // fires cancel on client-disconnect (handleStreaming) and on pre-stream + // errors (handleStreamingRequest). + streamingOwnsCancel = true g.handleStreamingRequest(ctx, config, bifrostReq, bifrostCtx, cancel) } else { - defer cancel() // Ensure cleanup on function exit g.handleNonStreamingRequest(ctx, config, req, bifrostReq, bifrostCtx) } } diff --git a/transports/bifrost-http/lib/config.go b/transports/bifrost-http/lib/config.go index dc3849b6aa..87e7224c16 100644 --- a/transports/bifrost-http/lib/config.go +++ b/transports/bifrost-http/lib/config.go @@ -111,16 +111,26 @@ func getWeight(w *float64) float64 { return *w } +// BuiltinPluginNames is the canonical list of built-in plugin names. +// It is the single source of truth — update here when adding or removing a built-in plugin. +var builtinPluginNames = []string{ + telemetry.PluginName, + prompts.PluginName, + logging.PluginName, + governance.PluginName, + otel.PluginName, + semanticcache.PluginName, + compat.PluginName, + maxim.PluginName, +} + +func GetBuiltinPluginNames() []string { + return slices.Clone(builtinPluginNames) +} + // IsBuiltinPlugin checks if a plugin is a built-in plugin func IsBuiltinPlugin(name string) bool { - return name == telemetry.PluginName || - name == prompts.PluginName || - name == logging.PluginName || - name == governance.PluginName || - name == compat.PluginName || - name == maxim.PluginName || - name == semanticcache.PluginName || - name == otel.PluginName + return slices.Contains(builtinPluginNames, name) } // pluginOrderInfo stores ordering metadata for a plugin. @@ -408,6 +418,48 @@ func applyV1Compat(configData *ConfigData) { } } +// promoteDeprecatedCalendarAligned lifts the legacy per-budget / per-rate-limit +// calendar_aligned input to the owning VK or Team. Owner wins if already true; +// otherwise OR across descendants (own budgets/rate-limit + every provider +// config's budgets/rate-limit). Inner pointers are always cleared. Mirrors the +// enterprise promoteDeprecatedAccessProfileCalendarAligned at the access +// profile level. Runs on every load regardless of config version +func promoteDeprecatedCalendarAligned(configData *ConfigData) { + if configData == nil || configData.Governance == nil { + return + } + for i := range configData.Governance.VirtualKeys { + vk := &configData.Governance.VirtualKeys[i] + promoteCalendarAligned(&vk.CalendarAligned, vk.Budgets, vk.RateLimit) + for j := range vk.ProviderConfigs { + pc := &vk.ProviderConfigs[j] + promoteCalendarAligned(&vk.CalendarAligned, pc.Budgets, pc.RateLimit) + } + } + for i := range configData.Governance.Teams { + team := &configData.Governance.Teams[i] + promoteCalendarAligned(&team.CalendarAligned, team.Budgets, team.RateLimit) + } +} + +// promoteCalendarAligned ORs each child's legacy calendar_aligned input into +// the owner's flag and clears the child field. Treats a nil child pointer as +// "not set" — only explicit true contributes. +func promoteCalendarAligned(owner *bool, budgets []configstoreTables.TableBudget, rateLimit *configstoreTables.TableRateLimit) { + for i := range budgets { + if budgets[i].CalendarAlignedInput != nil && *budgets[i].CalendarAlignedInput { + *owner = true + } + budgets[i].CalendarAlignedInput = nil + } + if rateLimit != nil && rateLimit.CalendarAlignedInput != nil { + if *rateLimit.CalendarAlignedInput { + *owner = true + } + rateLimit.CalendarAlignedInput = nil + } +} + // LoadConfig loads initial configuration from a JSON config file into memory // with full preprocessing including environment variable resolution and key config parsing. // All processing is done upfront to ensure zero latency when retrieving data. @@ -494,6 +546,10 @@ func LoadConfig(ctx context.Context, configDirPath string) (*Config, error) { return nil, fmt.Errorf("failed to unmarshal config: %w", err) } logger.Info("loading configuration from: %s", absConfigFilePath) + // Promote deprecated per-budget / per-rate-limit calendar_aligned to the + // owning VK / Team. Independent of config version — the deprecation + // predates the v1/v2 allow-list split. + promoteDeprecatedCalendarAligned(&configData) // If version is 1, apply v1.4.x compatibility: empty allow-list arrays mean "allow all" if configData.Version == 1 { logger.Info("config version 1 detected, applying v1.4.x compatibility semantics (empty arrays = allow all)") diff --git a/transports/bifrost-http/lib/config_test.go b/transports/bifrost-http/lib/config_test.go index 3b9cc73355..0c3f775ae4 100644 --- a/transports/bifrost-http/lib/config_test.go +++ b/transports/bifrost-http/lib/config_test.go @@ -374,6 +374,7 @@ import ( "github.com/maximhq/bifrost/framework/logstore" "github.com/maximhq/bifrost/framework/modelcatalog" "github.com/maximhq/bifrost/framework/vectorstore" + otelPlugin "github.com/maximhq/bifrost/plugins/otel" "github.com/stretchr/testify/require" "gorm.io/driver/sqlite" "gorm.io/gorm" @@ -18187,3 +18188,55 @@ func TestVersionField_Version2_NoCompat(t *testing.T) { require.Empty(t, anthropicCfg.Keys[0].Models, "v2 semantics: empty models must NOT be normalised") } + +// ============================================================================= +// OtelPluginSpanFilter seeding tests +// ============================================================================= + +// TestLoadPlugins_OtelPluginSpanFilterPassthrough verifies that plugin_span_filter +// set directly inside the OTEL plugin config (the standard config.json location) +// is preserved unchanged through loadPlugins — no special handling needed. +func TestLoadPlugins_OtelPluginSpanFilterPassthrough(t *testing.T) { + initTestLogger() + ctx := context.Background() + + otelCfg := map[string]any{ + "collector_url": "localhost:4317", + "trace_type": "genai_extension", + "protocol": "grpc", + "plugin_span_filter": map[string]any{ + "mode": "exclude", + "plugins": []any{"logging", "compat"}, + }, + } + configData := &ConfigData{ + Plugins: []*schemas.PluginConfig{ + {Name: otelPlugin.PluginName, Enabled: true, Config: otelCfg}, + }, + } + + cfg := &Config{} + loadPlugins(ctx, cfg, configData) + + var found *schemas.PluginConfig + for _, pc := range cfg.PluginConfigs { + if pc.Name == otelPlugin.PluginName { + found = pc + break + } + } + require.NotNil(t, found, "OTEL plugin should be present in PluginConfigs after loadPlugins") + + pluginCfg, ok := found.Config.(map[string]any) + require.True(t, ok, "OTEL plugin Config should be map[string]any") + + raw, ok := pluginCfg["plugin_span_filter"] + require.True(t, ok, "plugin_span_filter should be preserved in OTEL plugin config") + + filterMap, ok := raw.(map[string]any) + require.True(t, ok, "plugin_span_filter should be a map") + require.Equal(t, "exclude", filterMap["mode"]) + plugins, ok := filterMap["plugins"].([]any) + require.True(t, ok, "plugin_span_filter.plugins should be an array") + require.ElementsMatch(t, []any{"logging", "compat"}, plugins) +} diff --git a/transports/bifrost-http/lib/validator.go b/transports/bifrost-http/lib/validator.go index ad4b11215d..c1ee50b4dd 100644 --- a/transports/bifrost-http/lib/validator.go +++ b/transports/bifrost-http/lib/validator.go @@ -16,8 +16,8 @@ import ( // localSchemaCandidates lists paths (relative to CWD) where config.schema.json may be found // when running from a source checkout. Checked in order before falling back to the remote URL. var localSchemaCandidates = []string{ - "config.schema.json", // running from transports/ - "../config.schema.json", // running from transports/bifrost-http/ + "config.schema.json", // running from transports/ + "../config.schema.json", // running from transports/bifrost-http/ "transports/config.schema.json", // running from repo root } diff --git a/transports/bifrost-http/server/server.go b/transports/bifrost-http/server/server.go index 3dea79bdb4..c47ad3892e 100644 --- a/transports/bifrost-http/server/server.go +++ b/transports/bifrost-http/server/server.go @@ -1113,11 +1113,18 @@ func (s *BifrostHTTPServer) RegisterAPIRoutes(ctx context.Context, callbacks Ser return fmt.Errorf("failed to initialize governance handler: %v", err) } } - var cacheHandler *handlers.CacheHandler - semanticCachePlugin, _ := lib.FindPluginAs[*semanticcache.Plugin](s.Config, semanticcache.PluginName) - if semanticCachePlugin != nil { - cacheHandler = handlers.NewCacheHandler(semanticCachePlugin) - } + // Resolve the semantic_cache plugin per request so plugin reloads via + // /api/plugins are honored — the previous boot-time capture left stale + // references and (worse) skipped route registration entirely when the + // plugin wasn't in config.json at startup, causing 405 on all cache-clear + // endpoints for the process lifetime. + cacheHandler := handlers.NewCacheHandler(func() handlers.CacheClearer { + p, err := lib.FindPluginAs[*semanticcache.Plugin](s.Config, semanticcache.PluginName) + if err != nil || p == nil { + return nil + } + return p + }) var promptsReloader handlers.PromptCacheReloader if promptsPlugin, err := lib.FindPluginAs[handlers.PromptCacheReloader](s.Config, s.getPromptsPluginName()); err == nil && promptsPlugin != nil { promptsReloader = promptsPlugin @@ -1162,9 +1169,7 @@ func (s *BifrostHTTPServer) RegisterAPIRoutes(ctx context.Context, callbacks Ser if promptsHandler != nil { promptsHandler.RegisterRoutes(s.Router, middlewares...) } - if cacheHandler != nil { - cacheHandler.RegisterRoutes(s.Router, middlewares...) - } + cacheHandler.RegisterRoutes(s.Router, middlewares...) if governanceHandler != nil { governanceHandler.RegisterRoutes(s.Router, middlewares...) } diff --git a/transports/bifrost-http/websocket/session.go b/transports/bifrost-http/websocket/session.go index e10180280e..3d95977f83 100644 --- a/transports/bifrost-http/websocket/session.go +++ b/transports/bifrost-http/websocket/session.go @@ -1,6 +1,7 @@ package websocket import ( + "encoding/json" "strings" "sync" "time" @@ -47,6 +48,14 @@ type Session struct { // attached to a persisted turn, so late transcript updates do not pollute later turns. realtimeConsumedTurnItemIDs map[string]struct{} + // realtimeSessionTools holds the latest session tool definitions from + // session.created / session.updated / session.update events, so that + // each turn log can record which tools were available. + realtimeSessionTools json.RawMessage + + // realtimeVoice holds the voice from the latest session configuration. + realtimeVoice string + // realtimeTurnHooks tracks the active turn-scoped plugin pipeline between // response.create and response.done. realtimeTurnHooks *RealtimeTurnPluginState @@ -73,6 +82,8 @@ type RealtimeTurnPluginState struct { RequestID string StartedAt time.Time PreHookValues map[any]any + TraceID string + RawStore bool } // NewSession creates a new session for a client WebSocket connection. @@ -170,6 +181,42 @@ func (s *Session) ProviderSessionID() string { return s.providerSessionID } +// SetRealtimeSessionTools updates the tracked session tool definitions. +// Called when session.created, session.updated, or session.update events +// carry a tools array. +func (s *Session) SetRealtimeSessionTools(tools json.RawMessage) { + s.mu.Lock() + defer s.mu.Unlock() + if s.closed { + return + } + s.realtimeSessionTools = tools +} + +// RealtimeSessionTools returns the latest session tool definitions, or nil. +func (s *Session) RealtimeSessionTools() json.RawMessage { + s.mu.RLock() + defer s.mu.RUnlock() + return s.realtimeSessionTools +} + +// SetRealtimeVoice updates the tracked voice from session configuration. +func (s *Session) SetRealtimeVoice(voice string) { + s.mu.Lock() + defer s.mu.Unlock() + if s.closed { + return + } + s.realtimeVoice = voice +} + +// RealtimeVoice returns the current session voice, or empty string. +func (s *Session) RealtimeVoice() string { + s.mu.RLock() + defer s.mu.RUnlock() + return s.realtimeVoice +} + // AppendRealtimeOutputText appends provider output content for the current realtime turn. func (s *Session) AppendRealtimeOutputText(text string) { if text == "" { @@ -177,6 +224,9 @@ func (s *Session) AppendRealtimeOutputText(text string) { } s.mu.Lock() defer s.mu.Unlock() + if s.closed { + return + } s.realtimeOutputText += text } @@ -236,11 +286,16 @@ func (s *Session) recordRealtimeTurnInput(itemID, role, summary, raw string) { s.mu.Lock() defer s.mu.Unlock() + if s.closed { + return + } itemID = strings.TrimSpace(itemID) if itemID != "" { - if _, consumed := s.realtimeConsumedTurnItemIDs[itemID]; consumed { - return + if s.realtimeConsumedTurnItemIDs != nil { + if _, consumed := s.realtimeConsumedTurnItemIDs[itemID]; consumed { + return + } } for idx := range s.realtimeTurnInputs { if s.realtimeTurnInputs[idx].ItemID != itemID || s.realtimeTurnInputs[idx].Role != role { @@ -250,15 +305,11 @@ func (s *Session) recordRealtimeTurnInput(itemID, role, summary, raw string) { s.realtimeTurnInputs[idx].Summary = summary } if strings.TrimSpace(raw) != "" { - existingRaw := strings.TrimSpace(s.realtimeTurnInputs[idx].Raw) - incomingRaw := strings.TrimSpace(raw) - switch { - case existingRaw == "": - s.realtimeTurnInputs[idx].Raw = raw - case incomingRaw == "" || existingRaw == incomingRaw: - default: - s.realtimeTurnInputs[idx].Raw = existingRaw + "\n\n" + incomingRaw - } + // Same item ID + role: replace raw with the latest event. + // Later events (e.g. conversation.item.created after + // conversation.item.create) carry the same or more complete + // data, so the newest version is always preferred. + s.realtimeTurnInputs[idx].Raw = raw } return } @@ -377,6 +428,15 @@ func (s *Session) Close() { s.realtimeTurnHooks = nil } s.realtimeTurnBusy = false + + // Release accumulated turn data so GC can reclaim memory even if a + // goroutine briefly holds a reference to this session after close. + s.realtimeTurnInputs = nil + s.realtimeConsumedTurnItemIDs = nil + s.realtimeSessionTools = nil + s.realtimeVoice = "" + s.realtimeOutputText = "" + if s.clientConn != nil { _ = s.clientConn.Close() } diff --git a/transports/changelog.md b/transports/changelog.md index e69de29bb2..991bba554e 100644 --- a/transports/changelog.md +++ b/transports/changelog.md @@ -0,0 +1 @@ +[fix]: add missing padding to provider API structure form [@delm](https://github.com/delm) diff --git a/transports/config.schema.json b/transports/config.schema.json index 2728b039f9..535e67ce05 100644 --- a/transports/config.schema.json +++ b/transports/config.schema.json @@ -392,7 +392,7 @@ }, "calendar_aligned": { "type": "boolean", - "description": "Snap reset windows to clean calendar boundaries (day, week, month, year)", + "description": "Deprecated: set calendar_aligned on the owner (team / virtual key / access profile) instead. Kept for backward compatibility with older config.json files; ignored unless a reconciler promotes it to its owner.", "default": false } }, @@ -448,7 +448,7 @@ }, "calendar_aligned": { "type": "boolean", - "description": "Snap reset windows to clean calendar boundaries (day, week, month, year)", + "description": "Deprecated: set calendar_aligned on the owner (team / virtual key / access profile) instead. Kept for backward compatibility with older config.json files; ignored unless a reconciler promotes it to its owner.", "default": false } }, @@ -525,6 +525,11 @@ "type": "object", "description": "Team claims data" }, + "calendar_aligned": { + "type": "boolean", + "description": "Snap the team's budget and rate-limit reset windows to clean calendar boundaries (day, week, month, year)", + "default": false + }, "virtual_key_count": { "type": "integer", "description": "Computed count of virtual keys associated with this team", @@ -1440,10 +1445,6 @@ "type": "string", "description": "Model to use for generating embeddings in provider-backed semantic caching. Required when provider is set and not allowed in direct-only mode." }, - "cleanup_on_shutdown": { - "type": "boolean", - "description": "Clean up cache on shutdown (default: false)" - }, "ttl": { "description": "Time-to-live for cached responses (supports duration strings like '5m', '1h' or seconds as number, default: 5min)", "oneOf": [ @@ -1612,6 +1613,22 @@ "insecure": { "type": "boolean", "description": "Skip TLS verification (ignored if tls_ca_cert is set)" + }, + "plugin_span_filter": { + "type": "object", + "description": "Controls which plugin hook spans are exported to the OTEL collector. Omit to export all plugin spans.", + "properties": { + "mode": { + "type": "string", + "enum": ["include", "exclude"] + }, + "plugins": { + "type": "array", + "items": { "type": "string" } + } + }, + "required": ["mode", "plugins"], + "additionalProperties": false } }, "required": ["collector_url", "trace_type", "protocol"], @@ -3269,6 +3286,12 @@ "type": "boolean", "description": "Whether cluster mode is enabled" }, + "type": { + "type": "string", + "enum": ["mesh", "broker"], + "default": "mesh", + "description": "Clustering mode. 'mesh' (default) is peer-to-peer via memberlist gossip. 'broker' routes all cluster traffic through a central broker process, for platforms without peer-to-peer connectivity (e.g. Cloud Run)." + }, "region": { "type": "string", "description": "Region label for cluster deployment (runtime default: unknown)" @@ -3415,6 +3438,32 @@ }, "required": ["type"], "additionalProperties": false + }, + "broker": { + "type": "object", + "description": "Broker settings, used when type is 'broker'", + "properties": { + "address": { + "type": "string", + "description": "host:port of the broker that nodes dial" + }, + "tls": { + "type": "boolean", + "description": "Whether to dial the broker over TLS" + }, + "auth_token": { + "type": "string", + "description": "Optional shared secret sent on connect" + }, + "listen_port": { + "type": "integer", + "minimum": 1, + "maximum": 65535, + "description": "Port the broker process serves on (default: 50051)" + } + }, + "required": ["address"], + "additionalProperties": false } }, "required": ["enabled"], @@ -3766,7 +3815,12 @@ "timeout": { "type": "integer", "minimum": 0, - "description": "Timeout in milliseconds for rule execution" + "description": "Timeout in seconds for rule execution" + }, + "max_turns_to_send": { + "type": "integer", + "minimum": 0, + "description": "Number of historical conversation turns to send to the guardrail provider; the latest message is always included on top. 0 sends all turns." }, "provider_config_ids": { "type": "array", @@ -3805,7 +3859,7 @@ "timeout": { "type": "integer", "minimum": 0, - "description": "Timeout in milliseconds for provider execution" + "description": "Timeout in seconds for provider execution" }, "config": { "type": "object", @@ -3840,6 +3894,11 @@ "reset_duration": { "type": "string", "description": "Reset window, e.g. \"1M\", \"1h\"" + }, + "calendar_aligned": { + "type": "boolean", + "description": "Deprecated: set calendar_aligned on the parent access profile instead. Kept for backward compatibility with older config.json files; the access-profile reconciler promotes any true value here to the profile's top-level calendar_aligned at load time.", + "default": false } }, "required": ["id", "max_limit", "reset_duration"], @@ -3867,6 +3926,11 @@ "request_reset_duration": { "type": "string", "description": "Request reset window, e.g. \"1M\", \"1h\"" + }, + "calendar_aligned": { + "type": "boolean", + "description": "Deprecated: set calendar_aligned on the parent access profile instead. Kept for backward compatibility with older config.json files; the access-profile reconciler promotes any true value here to the profile's top-level calendar_aligned at load time.", + "default": false } }, "required": ["id"], @@ -3905,6 +3969,11 @@ "rate_limit": { "$ref": "#/$defs/rate_limit_line" }, + "calendar_aligned": { + "type": "boolean", + "description": "Snap budget and rate-limit reset windows to clean calendar boundaries (day, week, month, year) for this profile", + "default": false + }, "provider_configs": { "type": "array", "description": "Per-provider restrictions and limits for this profile", diff --git a/ui/app/_fallbacks/enterprise/components/mcp-tool-groups/mcpToolGroups.tsx b/ui/app/_fallbacks/enterprise/components/mcp-tool-groups/mcpToolGroups.tsx index 8e6dfc089d..a301c4302a 100644 --- a/ui/app/_fallbacks/enterprise/components/mcp-tool-groups/mcpToolGroups.tsx +++ b/ui/app/_fallbacks/enterprise/components/mcp-tool-groups/mcpToolGroups.tsx @@ -4,7 +4,7 @@ import ContactUsView from "../views/contactUsView"; export default function MCPToolGroups() { return ( <> -

+

MCP tool groups

Configure tool groups for MCP servers to organize and govern tools.

diff --git a/ui/app/_fallbacks/enterprise/components/user-groups/teamsView.tsx b/ui/app/_fallbacks/enterprise/components/user-groups/teamsView.tsx index 51f8547dd8..997da5fae2 100644 --- a/ui/app/_fallbacks/enterprise/components/user-groups/teamsView.tsx +++ b/ui/app/_fallbacks/enterprise/components/user-groups/teamsView.tsx @@ -105,7 +105,9 @@ export function TeamsView() { onOffsetChange={(newOffset) => setUrlState({ offset: newOffset })} selectedTeamId={urlState.selected_team || null} onTeamAdd={() => setUrlState({ selected_team: "new" })} - onTeamSelect={(team) => { setUrlState({ selected_team: team?.id ?? null }) }} + onTeamSelect={(team) => { + setUrlState({ selected_team: team?.id ?? null }); + }} onDialogClose={() => setUrlState({ selected_team: null })} />
diff --git a/ui/app/_fallbacks/enterprise/lib/contexts/rbacContext.tsx b/ui/app/_fallbacks/enterprise/lib/contexts/rbacContext.tsx index 8d89e25cb4..92f75fac06 100644 --- a/ui/app/_fallbacks/enterprise/lib/contexts/rbacContext.tsx +++ b/ui/app/_fallbacks/enterprise/lib/contexts/rbacContext.tsx @@ -86,4 +86,4 @@ export function useRbacContext() { }; } return context; -} +} \ No newline at end of file diff --git a/ui/app/_fallbacks/enterprise/lib/store/apis/accessProfileApi.ts b/ui/app/_fallbacks/enterprise/lib/store/apis/accessProfileApi.ts index 5830eed3bb..c5038baeee 100644 --- a/ui/app/_fallbacks/enterprise/lib/store/apis/accessProfileApi.ts +++ b/ui/app/_fallbacks/enterprise/lib/store/apis/accessProfileApi.ts @@ -15,4 +15,4 @@ export const useGetUserAccessProfilesQuery = ( isLoading: false, isError: false, error: null, -}); +}); \ No newline at end of file diff --git a/ui/app/_fallbacks/enterprise/lib/store/apis/scimApi.ts b/ui/app/_fallbacks/enterprise/lib/store/apis/scimApi.ts index 9b2031999d..5f31f3c0c1 100644 --- a/ui/app/_fallbacks/enterprise/lib/store/apis/scimApi.ts +++ b/ui/app/_fallbacks/enterprise/lib/store/apis/scimApi.ts @@ -13,4 +13,4 @@ export const useGetAuthTypeQuery = ( isLoading: false, isError: false, error: null, -}); +}); \ No newline at end of file diff --git a/ui/app/_fallbacks/enterprise/lib/store/apis/virtualKeyUsersApi.ts b/ui/app/_fallbacks/enterprise/lib/store/apis/virtualKeyUsersApi.ts index fc33a9ff78..e45cc092a7 100644 --- a/ui/app/_fallbacks/enterprise/lib/store/apis/virtualKeyUsersApi.ts +++ b/ui/app/_fallbacks/enterprise/lib/store/apis/virtualKeyUsersApi.ts @@ -19,4 +19,4 @@ export const useGetVirtualKeyUsersQuery = ( isLoading: false, isError: false, error: null, -}); +}); \ No newline at end of file diff --git a/ui/app/_fallbacks/enterprise/lib/types/accessProfile.ts b/ui/app/_fallbacks/enterprise/lib/types/accessProfile.ts index 66dfb23d98..414743dafb 100644 --- a/ui/app/_fallbacks/enterprise/lib/types/accessProfile.ts +++ b/ui/app/_fallbacks/enterprise/lib/types/accessProfile.ts @@ -38,4 +38,4 @@ export interface UserAccessProfile { export interface GetUserAccessProfilesResponse { access_profiles: UserAccessProfile[]; -} +} \ No newline at end of file diff --git a/ui/app/_fallbacks/enterprise/lib/types/user.ts b/ui/app/_fallbacks/enterprise/lib/types/user.ts index b5b6727fe9..8aea128c02 100644 --- a/ui/app/_fallbacks/enterprise/lib/types/user.ts +++ b/ui/app/_fallbacks/enterprise/lib/types/user.ts @@ -27,4 +27,4 @@ export interface GetUsersResponse { limit: number; total_pages: number; has_more: boolean; -} +} \ No newline at end of file diff --git a/ui/app/workspace/cluster/page.tsx b/ui/app/workspace/cluster/page.tsx index 8cb1b4be26..155d593efb 100644 --- a/ui/app/workspace/cluster/page.tsx +++ b/ui/app/workspace/cluster/page.tsx @@ -2,7 +2,7 @@ import ClusterView from "@enterprise/components/cluster/clusterView"; export default function ClusterPage() { return ( -
+
); diff --git a/ui/app/workspace/config/views/cachingView.tsx b/ui/app/workspace/config/views/cachingView.tsx index 4c00456cdc..c1279ce1a8 100644 --- a/ui/app/workspace/config/views/cachingView.tsx +++ b/ui/app/workspace/config/views/cachingView.tsx @@ -1,22 +1,282 @@ -import { getErrorMessage, useGetCoreConfigQuery } from "@/lib/store"; -import PluginsForm from "./pluginsForm"; +import { Button } from "@/components/ui/button"; +import { Input } from "@/components/ui/input"; +import { Label } from "@/components/ui/label"; +import { ModelMultiselect } from "@/components/ui/modelMultiselect"; +import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select"; +import { Switch } from "@/components/ui/switch"; +import { Tabs, TabsList, TabsTrigger } from "@/components/ui/tabs"; +import { ProviderIconType, RenderProviderIcon } from "@/lib/constants/icons"; +import { EmbeddingSupportedProviders, getProviderLabel } from "@/lib/constants/logs"; +import { + getErrorMessage, + useCreatePluginMutation, + useGetCoreConfigQuery, + useGetPluginsQuery, + useGetProvidersQuery, + useUpdatePluginMutation, +} from "@/lib/store"; +import { CacheConfig, EditorCacheConfig, ModelProvider, ModelProviderName } from "@/lib/types/config"; +import { SEMANTIC_CACHE_PLUGIN } from "@/lib/types/plugins"; +import { cn } from "@/lib/utils"; +import { Loader2 } from "lucide-react"; +import { useEffect, useMemo, useState } from "react"; +import { toast } from "sonner"; + +// The local cache plugin runs in one of two modes. Direct-only is purely +// hash-based, no embedding provider needed; perfect for exact-replay +// caching. Semantic adds vector similarity on top, requiring an +// embedding-capable provider and the model's real dimension. +type CacheMode = "direct" | "semantic"; + +// Embedding-capable providers gate the semantic mode. Built-in providers +// are listed in EmbeddingSupportedProviders; custom providers expose +// support via custom_provider_config.allowed_requests.embedding. +const supportsEmbedding = (provider: ModelProvider): boolean => { + if (provider.custom_provider_config) { + return provider.custom_provider_config.allowed_requests?.embedding === true; + } + return (EmbeddingSupportedProviders as readonly string[]).includes(provider.name); +}; + +const defaultDirectConfig: EditorCacheConfig = { + ttl: 300, + threshold: 0.8, + dimension: 1, + conversation_history_threshold: 3, + exclude_system_prompt: false, + cache_by_model: true, + cache_by_provider: true, +}; + +// Configs we treat as "the user has nothing saved": both API responses +// where every field is the type's zero value and the literal undefined +// look like this. +const isEmptyConfig = (config: Partial | undefined): boolean => { + if (!config) return true; + // Booleans are deliberate user choices (e.g. cache_by_model: false), not + // empty markers — only treat numeric/string zero values as empty. + const isZero = (v: unknown) => v === undefined || v === null || v === 0 || v === ""; + return Object.values(config).every(isZero); +}; + +const toEditorCacheConfig = (config?: Partial): EditorCacheConfig => { + if (!config || isEmptyConfig(config)) { + return { ...defaultDirectConfig }; + } + return { ...defaultDirectConfig, ...config }; +}; + +const inferMode = (config: EditorCacheConfig): CacheMode => { + if (config.dimension && config.dimension > 1 && config.provider) return "semantic"; + return "direct"; +}; + +// Strip semantic-only fields when persisting a direct-only payload so the +// server validator doesn't reject a stale provider choice. +const buildPayload = (config: EditorCacheConfig, mode: CacheMode): CacheConfig => { + const base = { + ttl: config.ttl ?? 0, + threshold: config.threshold ?? 0, + conversation_history_threshold: config.conversation_history_threshold, + exclude_system_prompt: config.exclude_system_prompt, + cache_by_model: config.cache_by_model, + cache_by_provider: config.cache_by_provider, + vector_store_namespace: config.vector_store_namespace?.trim() || undefined, + default_cache_key: config.default_cache_key?.trim() || undefined, + }; + if (mode === "direct") { + return { ...base, dimension: 1 } as CacheConfig; + } + return { + ...base, + provider: config.provider as ModelProviderName, + embedding_model: config.embedding_model ?? "", + dimension: config.dimension ?? 0, + } as CacheConfig; +}; + +const validateForSave = (config: EditorCacheConfig, mode: CacheMode): string | null => { + if (mode === "semantic") { + if (!config.provider) return "Pick an embedding provider for semantic mode, or switch to Direct only."; + if (!config.embedding_model?.trim()) return "Pick an embedding model for semantic mode."; + if (!config.dimension || config.dimension <= 1) { + return "Semantic mode requires the embedding model's real dimension (must be > 1)."; + } + } + if (config.ttl !== undefined && config.ttl < 0) return "TTL must be non-negative."; + if (config.threshold !== undefined && (config.threshold < 0 || config.threshold > 1)) { + return "Similarity threshold must be between 0 and 1."; + } + if ( + config.conversation_history_threshold !== undefined && + (config.conversation_history_threshold < 1 || config.conversation_history_threshold > 50) + ) { + return "Conversation history threshold must be between 1 and 50."; + } + return null; +}; export default function CachingView() { - const { data: bifrostConfig, isLoading, error: configError } = useGetCoreConfigQuery({ fromDB: true }); + const { data: bifrostConfig, isLoading: configLoading, error: configError } = useGetCoreConfigQuery({ fromDB: true }); + const isVectorStoreEnabled = bifrostConfig?.is_cache_connected ?? false; + + // Local cache state lives on the plugin row keyed by SEMANTIC_CACHE_PLUGIN. + // No dedicated /local-cache-config endpoint exists — the plugins API is + // the source of truth for both the enabled flag and the config blob. + const { data: plugins, isLoading: pluginsLoading } = useGetPluginsQuery(); + const semanticCachePlugin = useMemo(() => plugins?.find((p) => p.name === SEMANTIC_CACHE_PLUGIN), [plugins]); + const enabledOnServer = Boolean(semanticCachePlugin?.enabled); + + const { data: providersData, error: providersError, isLoading: providersLoading } = useGetProvidersQuery(); + const providers = useMemo(() => providersData || [], [providersData]); + const embeddingProviders = useMemo(() => providers.filter(supportsEmbedding), [providers]); + + const [updatePlugin, { isLoading: isUpdating }] = useUpdatePluginMutation(); + const [createPlugin, { isLoading: isCreating }] = useCreatePluginMutation(); + const isSaving = isUpdating || isCreating; + + const [cacheConfig, setCacheConfig] = useState(defaultDirectConfig); + const [serverCacheConfig, setServerCacheConfig] = useState(defaultDirectConfig); + const [mode, setMode] = useState("direct"); + + // Hydrate from the plugin row once it lands. If the plugin doesn't exist + // yet (first-time setup), keep the default direct-only seed so the user + // can start typing before any save. + useEffect(() => { + if (plugins === undefined) return; + if (!semanticCachePlugin?.config) return; + const editorConfig = toEditorCacheConfig(semanticCachePlugin.config as Partial); + setCacheConfig(editorConfig); + setServerCacheConfig(editorConfig); + setMode(inferMode(editorConfig)); + }, [plugins, semanticCachePlugin]); + + useEffect(() => { + if (providersError) { + toast.error(`Failed to load providers: ${getErrorMessage(providersError as any)}`); + } + }, [providersError]); + + // Surface validation problems inline rather than only on Save click. + const validationError = useMemo(() => validateForSave(cacheConfig, mode), [cacheConfig, mode]); + + // Only show the dimension/namespace heads-up when the user has actually + // touched a structural field. Showing it permanently in semantic mode + // trains users to ignore it; showing it on diff makes it land. + const hasStructuralChange = useMemo(() => { + return ( + cacheConfig.provider !== serverCacheConfig.provider || + cacheConfig.embedding_model !== serverCacheConfig.embedding_model || + cacheConfig.dimension !== serverCacheConfig.dimension + ); + }, [cacheConfig, serverCacheConfig]); + + const hasUnsavedConfigChanges = useMemo(() => { + const fields: (keyof EditorCacheConfig)[] = [ + "provider", + "embedding_model", + "dimension", + "ttl", + "threshold", + "conversation_history_threshold", + "exclude_system_prompt", + "cache_by_model", + "cache_by_provider", + "vector_store_namespace", + "default_cache_key", + ]; + const changed = fields.some((k) => (cacheConfig[k] ?? "") !== (serverCacheConfig[k] ?? "")); + const modeChanged = inferMode(serverCacheConfig) !== mode; + return changed || modeChanged; + }, [cacheConfig, serverCacheConfig, mode]); + + const updateLocal = (updates: Partial) => { + setCacheConfig((prev) => ({ ...prev, ...updates })); + }; + + // Toggle handler. Updates the semantic_cache plugin's enabled flag while + // keeping the last-saved config so the backend can ReloadPlugin/RemovePlugin + // based on the new flag. When toggling on for the first time and no plugin + // row exists, we seed it with the current editor config (direct-only by + // default) so the create call has a valid payload — the user can refine + // the config and Save afterwards. + const handleToggle = async (checked: boolean) => { + try { + if (semanticCachePlugin) { + await updatePlugin({ + name: SEMANTIC_CACHE_PLUGIN, + data: { enabled: checked, config: semanticCachePlugin.config }, + }).unwrap(); + } else { + // No plugin row + user toggling off ⇒ nothing to disable. + // Bail before the success toast so we don't lie about the state. + if (!checked) return; + const err = validateForSave(cacheConfig, mode); + if (err) { + toast.error(err); + return; + } + const payload = buildPayload(cacheConfig, mode); + await createPlugin({ + name: SEMANTIC_CACHE_PLUGIN, + enabled: true, + config: payload, + path: "", + }).unwrap(); + } + toast.success(checked ? "Local cache enabled" : "Local cache disabled"); + } catch (error) { + toast.error(`Failed to ${checked ? "enable" : "disable"} local cache: ${getErrorMessage(error)}`); + } + }; + + const handleSave = async () => { + const err = validateForSave(cacheConfig, mode); + if (err) { + toast.error(err); + return; + } + const payload = buildPayload(cacheConfig, mode); + try { + const updated = semanticCachePlugin + ? await updatePlugin({ + name: SEMANTIC_CACHE_PLUGIN, + data: { enabled: semanticCachePlugin.enabled, config: payload }, + }).unwrap() + : await createPlugin({ + name: SEMANTIC_CACHE_PLUGIN, + enabled: false, + config: payload, + path: "", + }).unwrap(); + const editor = toEditorCacheConfig(updated.config as Partial); + setCacheConfig(editor); + setServerCacheConfig(editor); + setMode(inferMode(editor)); + toast.success("Cache configuration updated"); + } catch (error) { + toast.error(`Failed to update cache configuration: ${getErrorMessage(error)}`); + } + }; + + const cachingActive = enabledOnServer && isVectorStoreEnabled; + const isLoading = configLoading || pluginsLoading; return ( -
+
-

Caching

-

Configure semantic caching for requests.

+

Local Cache

+

+ Cache responses locally with two complementary lookup paths: direct hash matching for exact replays, and semantic{" "} + similarity search for related content. Send the x-bf-cache-key header to scope cached responses to a tenant or feature.{" "} + {!isVectorStoreEnabled && ( + + Requires a vector store to be configured and enabled in config.json. + + )} +

- {isLoading && ( -
-

Loading configuration...

-
- )} - {configError !== undefined && (

Failed to load configuration

@@ -26,7 +286,397 @@ export default function CachingView() {
)} - {!isLoading && !configError && } + {isLoading && ( +
+ +
+ )} + + {!isLoading && !configError && ( +
+ {/* Enable toggle flips plugin.enabled on the semantic_cache + plugin row. The plugins API handles ReloadPlugin / + RemovePlugin transparently on update. */} +
+
+ +

+ Loads (or unloads) the plugin without a server restart. Configuration changes you make below mutate the live plugin in + place, no redeploy needed.{" "} +

+
+ +
+ + {providersLoading ? ( +
+ +
+ ) : ( + <> +
+ {/* Mode picker. Direct-only is first-class. */} +
+ + setMode(v as CacheMode)}> + + + Direct only + + + Direct + Semantic + + + +

+ {mode === "direct" ? ( + <> + Direct-only mode hashes each request and replays an exact match. No embeddings, no provider needed. Cheapest path, + perfect for stable prompts. + + ) : ( + <> + Direct + semantic mode adds vector similarity search on top of direct hash matching. Requires an embedding-capable + provider and the model's real dimension. Direct hits are still served first; semantic search runs only when the + direct lookup misses. + + )} +

+
+ + {validationError && ( +
+ {validationError} +
+ )} + + {/* Provider/model/dimension only appear in semantic mode. */} + {mode === "semantic" && ( + <> + {hasStructuralChange && ( +
+ Heads up: a vector store namespace can only hold vectors of one dimension. Whenever you change the + embedding provider, model, or dimension, make sure the dimension still matches what the + model produces, otherwise writes to the existing namespace will fail and reads will silently miss. The namespace is{" "} + not recreated automatically; either use a fresh namespace or drop the existing class/index in your vector + store before saving. +
+ )} + +
+

Embedding Provider & Model

+
+
+ + +
+
+ + updateLocal({ embedding_model: model })} + placeholder={cacheConfig.provider ? "Search or type an embedding model..." : "Select a provider first"} + disabled={!cacheConfig.provider} + /> +
+
+

+ API keys are inherited from the embedding provider's main configuration, you don't need to add them again + here. +

+
+ + { + const value = e.target.value; + if (value === "") { + updateLocal({ dimension: undefined }); + return; + } + const parsed = parseInt(value); + if (!Number.isNaN(parsed)) { + updateLocal({ dimension: parsed }); + } + }} + /> +

+ Vector size produced by the embedding model. Must match the model exactly (e.g. 1536 for OpenAI{" "} + text-embedding-3-small, 3072 for text-embedding-3-large, 768{" "} + for many Cohere/Voyage models). +

+
+
+ + )} + + {/* Cache settings shared across modes. */} +
+

Cache Settings

+
+
+ + { + const value = e.target.value; + if (value === "") { + updateLocal({ ttl: undefined }); + return; + } + const parsed = parseInt(value); + if (!Number.isNaN(parsed)) { + updateLocal({ ttl: parsed }); + } + }} + /> +

+ How long cached entries live before they expire. Override per-request via the x-bf-cache-ttl header. +

+
+ {mode === "semantic" && ( +
+ + { + const value = e.target.value; + if (value === "") { + updateLocal({ threshold: undefined }); + return; + } + const parsed = parseFloat(value); + if (!Number.isNaN(parsed)) { + updateLocal({ threshold: parsed }); + } + }} + /> +

+ Minimum cosine similarity for a semantic hit. Override per-request via x-bf-cache-threshold. +

+
+ )} +
+
+ + {/* Storage & Cache Key. */} +
+

Storage & Cache Key

+
+
+ + updateLocal({ vector_store_namespace: e.target.value })} + /> +

+ Bucket/index name where cache entries live. Leave blank to use the default (BifrostLocalCachePlugin). + Changing this points the plugin at a different (possibly empty) bucket. Old entries are not deleted, they just stop + being queried. +

+
+
+ + updateLocal({ default_cache_key: e.target.value })} + /> +

+ Fallback partition key used when a request doesn't set x-bf-cache-key. Cache keys isolate entries: same + key ↔ shared cache pool. Leave blank to disable caching for any request that doesn't send the header. +

+
+
+
+ + {/* Conversation Settings. */} +
+

Conversation Settings

+
+
+ + { + const value = e.target.value; + if (value === "") { + updateLocal({ conversation_history_threshold: undefined }); + return; + } + const parsed = parseInt(value); + if (!Number.isNaN(parsed)) { + updateLocal({ conversation_history_threshold: parsed }); + } + }} + /> +

+ Skip caching for conversations with more than this many messages. Long histories rarely match exactly and inflate + the cache without paying off. +

+
+
+
+
+
+ +

Strip system messages from the cache key.

+
+ updateLocal({ exclude_system_prompt: checked })} + size="md" + /> +
+
+
+ + {/* Cache Behavior applies to both modes. */} +
+

Cache Key Composition

+
+
+
+ +

+ Include model name in the cache key. Different models won't share cached responses. +

+
+ updateLocal({ cache_by_model: checked })} + size="md" + /> +
+
+
+ +

+ Include provider name in the cache key. Different providers won't share cached responses. +

+
+ updateLocal({ cache_by_provider: checked })} + size="md" + /> +
+
+
+ +
+ +
    +
  • + x-bf-cache-key: scope this request to a specific cache partition. +
  • +
  • + x-bf-cache-ttl: override TTL for just this request. +
  • +
  • + x-bf-cache-threshold: override the semantic similarity threshold. +
  • +
  • + x-bf-cache-type: send direct or semantic to limit lookup to one path. +
  • +
  • + x-bf-cache-no-store: true to skip writing the response (still serves cached hits). +
  • +
+
+
+ +
+ +
+ + )} +
+ )}
); } \ No newline at end of file diff --git a/ui/app/workspace/config/views/loggingView.tsx b/ui/app/workspace/config/views/loggingView.tsx index 55eaf6c083..e9a9af94b8 100644 --- a/ui/app/workspace/config/views/loggingView.tsx +++ b/ui/app/workspace/config/views/loggingView.tsx @@ -144,12 +144,12 @@ export default function LoggingView() {

When enabled, individual requests can override the global content logging setting using the{" "} - x-bf-disable-content-logging header or context key, and can opt-in to persisting raw provider - bytes in logs using the x-bf-store-raw-request-response header. Raw-byte storage requires - content logging to be on — either globally, or via{" "} - x-bf-disable-content-logging: false on the same request. If content logging is off, raw bytes - are dropped from the log record even when x-bf-store-raw-request-response: true. Does not - control sending raw bytes back to callers — see Allow Per-Request Raw Override. + x-bf-disable-content-logging header or context key, and can opt-in to persisting raw + provider bytes in logs using the x-bf-store-raw-request-response header. Raw-byte storage + requires content logging to be on — either globally, or via{" "} + x-bf-disable-content-logging: false on the same request. If content logging is off, raw + bytes are dropped from the log record even when x-bf-store-raw-request-response: true. Does + not control sending raw bytes back to callers — see Allow Per-Request Raw Override.

-
- -

- When enabled, individual requests can send raw provider request/response bytes back to the caller using the{" "} - x-bf-send-back-raw-request and{" "} - x-bf-send-back-raw-response headers. Does not affect log storage — raw-byte persistence in logs is controlled by Allow Per-Request Content Storage Override. -

-
- handleConfigChange("allow_per_request_raw_override", checked)} - /> +
+ +

+ When enabled, individual requests can send raw provider request/response bytes back to the caller using the{" "} + x-bf-send-back-raw-request and x-bf-send-back-raw-response{" "} + headers. Does not affect log storage — raw-byte persistence in logs is controlled by Allow Per-Request Content Storage + Override. +

+ handleConfigChange("allow_per_request_raw_override", checked)} + /> +
{/* Log Retention Days */} {localConfig.enable_logging && bifrostConfig?.is_logs_connected && ( diff --git a/ui/app/workspace/config/views/mcpView.tsx b/ui/app/workspace/config/views/mcpView.tsx index 3bed1d4f1c..59baf54c9a 100644 --- a/ui/app/workspace/config/views/mcpView.tsx +++ b/ui/app/workspace/config/views/mcpView.tsx @@ -1,19 +1,9 @@ import { Button } from "@/components/ui/button"; import { EnvVarInput } from "@/components/ui/envVarInput"; import { Input } from "@/components/ui/input"; -import { - Select, - SelectContent, - SelectItem, - SelectTrigger, - SelectValue, -} from "@/components/ui/select"; +import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select"; import { Switch } from "@/components/ui/switch"; -import { - getErrorMessage, - useGetCoreConfigQuery, - useUpdateCoreConfigMutation, -} from "@/lib/store"; +import { getErrorMessage, useGetCoreConfigQuery, useUpdateCoreConfigMutation } from "@/lib/store"; import { CoreConfig, DefaultCoreConfig } from "@/lib/types/config"; import { EnvVar } from "@/lib/types/schemas"; import { RbacOperation, RbacResource, useRbac } from "@enterprise/lib"; @@ -21,406 +11,339 @@ import { useCallback, useEffect, useMemo, useState } from "react"; import { toast } from "sonner"; const envVarEquals = (a?: EnvVar, b?: EnvVar) => - (a?.value ?? "") === (b?.value ?? "") && - (a?.env_var ?? "") === (b?.env_var ?? "") && - (a?.from_env ?? false) === (b?.from_env ?? false); + (a?.value ?? "") === (b?.value ?? "") && (a?.env_var ?? "") === (b?.env_var ?? "") && (a?.from_env ?? false) === (b?.from_env ?? false); export default function MCPView() { - const hasSettingsUpdateAccess = useRbac( - RbacResource.Settings, - RbacOperation.Update, - ); - const { data: bifrostConfig } = useGetCoreConfigQuery({ fromDB: true }); - const config = bifrostConfig?.client_config; - const [updateCoreConfig, { isLoading }] = useUpdateCoreConfigMutation(); - const [localConfig, setLocalConfig] = useState(DefaultCoreConfig); + const hasSettingsUpdateAccess = useRbac(RbacResource.Settings, RbacOperation.Update); + const { data: bifrostConfig } = useGetCoreConfigQuery({ fromDB: true }); + const config = bifrostConfig?.client_config; + const [updateCoreConfig, { isLoading }] = useUpdateCoreConfigMutation(); + const [localConfig, setLocalConfig] = useState(DefaultCoreConfig); - const [localValues, setLocalValues] = useState<{ - mcp_agent_depth: string; - mcp_tool_execution_timeout: string; - mcp_code_mode_binding_level: string; - mcp_tool_sync_interval: string; - }>({ - mcp_agent_depth: "10", - mcp_tool_execution_timeout: "30", - mcp_code_mode_binding_level: "server", - mcp_tool_sync_interval: "10", - }); + const [localValues, setLocalValues] = useState<{ + mcp_agent_depth: string; + mcp_tool_execution_timeout: string; + mcp_code_mode_binding_level: string; + mcp_tool_sync_interval: string; + }>({ + mcp_agent_depth: "10", + mcp_tool_execution_timeout: "30", + mcp_code_mode_binding_level: "server", + mcp_tool_sync_interval: "10", + }); - useEffect(() => { - if (bifrostConfig && config) { - setLocalConfig(config); - setLocalValues({ - mcp_agent_depth: config?.mcp_agent_depth?.toString() || "10", - mcp_tool_execution_timeout: - config?.mcp_tool_execution_timeout?.toString() || "30", - mcp_code_mode_binding_level: - config?.mcp_code_mode_binding_level || "server", - mcp_tool_sync_interval: - config?.mcp_tool_sync_interval?.toString() || "10", - }); - } - }, [config, bifrostConfig]); + useEffect(() => { + if (bifrostConfig && config) { + setLocalConfig(config); + setLocalValues({ + mcp_agent_depth: config?.mcp_agent_depth?.toString() || "10", + mcp_tool_execution_timeout: config?.mcp_tool_execution_timeout?.toString() || "30", + mcp_code_mode_binding_level: config?.mcp_code_mode_binding_level || "server", + mcp_tool_sync_interval: config?.mcp_tool_sync_interval?.toString() || "10", + }); + } + }, [config, bifrostConfig]); - const hasChanges = useMemo(() => { - if (!config) return false; - const serverURLChanged = !envVarEquals( - localConfig.mcp_external_server_url, - config.mcp_external_server_url, - ); - const clientURLChanged = !envVarEquals( - localConfig.mcp_external_client_url, - config.mcp_external_client_url, - ); - return ( - localConfig.mcp_agent_depth !== config.mcp_agent_depth || - localConfig.mcp_tool_execution_timeout !== - config.mcp_tool_execution_timeout || - localConfig.mcp_code_mode_binding_level !== - (config.mcp_code_mode_binding_level || "server") || - localConfig.mcp_tool_sync_interval !== - (config.mcp_tool_sync_interval ?? 10) || - localConfig.mcp_disable_auto_tool_inject !== - (config.mcp_disable_auto_tool_inject ?? false) || - serverURLChanged || - clientURLChanged - ); - }, [config, localConfig]); + const hasChanges = useMemo(() => { + if (!config) return false; + const serverURLChanged = !envVarEquals(localConfig.mcp_external_server_url, config.mcp_external_server_url); + const clientURLChanged = !envVarEquals(localConfig.mcp_external_client_url, config.mcp_external_client_url); + return ( + localConfig.mcp_agent_depth !== config.mcp_agent_depth || + localConfig.mcp_tool_execution_timeout !== config.mcp_tool_execution_timeout || + localConfig.mcp_code_mode_binding_level !== (config.mcp_code_mode_binding_level || "server") || + localConfig.mcp_tool_sync_interval !== (config.mcp_tool_sync_interval ?? 10) || + localConfig.mcp_disable_auto_tool_inject !== (config.mcp_disable_auto_tool_inject ?? false) || + serverURLChanged || + clientURLChanged + ); + }, [config, localConfig]); - const handleAgentDepthChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, mcp_agent_depth: value })); - const numValue = Number.parseInt(value); - if (!isNaN(numValue) && numValue > 0) { - setLocalConfig((prev) => ({ ...prev, mcp_agent_depth: numValue })); - } - }, []); + const handleAgentDepthChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, mcp_agent_depth: value })); + const numValue = Number.parseInt(value); + if (!isNaN(numValue) && numValue > 0) { + setLocalConfig((prev) => ({ ...prev, mcp_agent_depth: numValue })); + } + }, []); - const handleToolExecutionTimeoutChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, mcp_tool_execution_timeout: value })); - const numValue = Number.parseInt(value); - if (!isNaN(numValue) && numValue > 0) { - setLocalConfig((prev) => ({ - ...prev, - mcp_tool_execution_timeout: numValue, - })); - } - }, []); + const handleToolExecutionTimeoutChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, mcp_tool_execution_timeout: value })); + const numValue = Number.parseInt(value); + if (!isNaN(numValue) && numValue > 0) { + setLocalConfig((prev) => ({ + ...prev, + mcp_tool_execution_timeout: numValue, + })); + } + }, []); - const handleCodeModeBindingLevelChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, mcp_code_mode_binding_level: value })); - if (value === "server" || value === "tool") { - setLocalConfig((prev) => ({ - ...prev, - mcp_code_mode_binding_level: value, - })); - } - }, []); + const handleCodeModeBindingLevelChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, mcp_code_mode_binding_level: value })); + if (value === "server" || value === "tool") { + setLocalConfig((prev) => ({ + ...prev, + mcp_code_mode_binding_level: value, + })); + } + }, []); - const handleToolSyncIntervalChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, mcp_tool_sync_interval: value })); - const numValue = Number.parseInt(value); - if (!isNaN(numValue) && numValue >= 0) { - setLocalConfig((prev) => ({ ...prev, mcp_tool_sync_interval: numValue })); - } - }, []); + const handleToolSyncIntervalChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, mcp_tool_sync_interval: value })); + const numValue = Number.parseInt(value); + if (!isNaN(numValue) && numValue >= 0) { + setLocalConfig((prev) => ({ ...prev, mcp_tool_sync_interval: numValue })); + } + }, []); - const handleDisableAutoToolInjectChange = useCallback((checked: boolean) => { - setLocalConfig((prev) => ({ - ...prev, - mcp_disable_auto_tool_inject: checked, - })); - }, []); + const handleDisableAutoToolInjectChange = useCallback((checked: boolean) => { + setLocalConfig((prev) => ({ + ...prev, + mcp_disable_auto_tool_inject: checked, + })); + }, []); - const handleServerURLChange = useCallback((value: EnvVar) => { - setLocalConfig((prev) => ({ ...prev, mcp_external_server_url: value })); - }, []); + const handleServerURLChange = useCallback((value: EnvVar) => { + setLocalConfig((prev) => ({ ...prev, mcp_external_server_url: value })); + }, []); - const handleClientURLChange = useCallback((value: EnvVar) => { - setLocalConfig((prev) => ({ ...prev, mcp_external_client_url: value })); - }, []); + const handleClientURLChange = useCallback((value: EnvVar) => { + setLocalConfig((prev) => ({ ...prev, mcp_external_client_url: value })); + }, []); - const handleSave = useCallback(async () => { - try { - const agentDepth = Number.parseInt(localValues.mcp_agent_depth); - const toolTimeout = Number.parseInt( - localValues.mcp_tool_execution_timeout, - ); + const handleSave = useCallback(async () => { + try { + const agentDepth = Number.parseInt(localValues.mcp_agent_depth); + const toolTimeout = Number.parseInt(localValues.mcp_tool_execution_timeout); - if (isNaN(agentDepth) || agentDepth <= 0) { - toast.error("Max agent depth must be a positive number."); - return; - } + if (isNaN(agentDepth) || agentDepth <= 0) { + toast.error("Max agent depth must be a positive number."); + return; + } - if (isNaN(toolTimeout) || toolTimeout <= 0) { - toast.error("Tool execution timeout must be a positive number."); - return; - } + if (isNaN(toolTimeout) || toolTimeout <= 0) { + toast.error("Tool execution timeout must be a positive number."); + return; + } - if (!bifrostConfig) { - toast.error("Configuration not loaded. Please refresh and try again."); - return; - } - await updateCoreConfig({ - ...bifrostConfig, - client_config: localConfig, - }).unwrap(); - toast.success("MCP settings updated successfully."); - } catch (error) { - toast.error(getErrorMessage(error)); - } - }, [bifrostConfig, localConfig, localValues, updateCoreConfig]); + if (!bifrostConfig) { + toast.error("Configuration not loaded. Please refresh and try again."); + return; + } + await updateCoreConfig({ + ...bifrostConfig, + client_config: localConfig, + }).unwrap(); + toast.success("MCP settings updated successfully."); + } catch (error) { + toast.error(getErrorMessage(error)); + } + }, [bifrostConfig, localConfig, localValues, updateCoreConfig]); - return ( -
-
-

MCP Settings

-

- Configure MCP (Model Context Protocol) agent and tool settings. -

-
-
- {/* Max Agent Depth */} -
-
- -

- Maximum depth for MCP agent execution. -

-
- handleAgentDepthChange(e.target.value)} - min="1" - /> -
+ return ( +
+
+

MCP Settings

+

Configure MCP (Model Context Protocol) agent and tool settings.

+
+
+ {/* Max Agent Depth */} +
+
+ +

Maximum depth for MCP agent execution.

+
+ handleAgentDepthChange(e.target.value)} + min="1" + /> +
- {/* Tool Execution Timeout */} -
-
- -

- Maximum time in seconds for tool execution. -

-
- handleToolExecutionTimeoutChange(e.target.value)} - min="1" - /> -
+ {/* Tool Execution Timeout */} +
+
+ +

Maximum time in seconds for tool execution.

+
+ handleToolExecutionTimeoutChange(e.target.value)} + min="1" + /> +
- {/* Tool Sync Interval */} -
-
- -

- How often to refresh tool lists from MCP servers. Set to 0 to - disable. -

-
- handleToolSyncIntervalChange(e.target.value)} - min="0" - /> -
+ {/* Tool Sync Interval */} +
+
+ +

How often to refresh tool lists from MCP servers. Set to 0 to disable.

+
+ handleToolSyncIntervalChange(e.target.value)} + min="0" + /> +
- {/* Disable Auto Tool Injection */} -
-
- -

- When enabled, MCP tools are not automatically included in every - request. Tools are only injected when explicitly specified via - request headers ( - x-bf-mcp-include-tools) and still - must be allowed by the virtual key MCP configuration. -

-
- -
+ {/* Disable Auto Tool Injection */} +
+
+ +

+ When enabled, MCP tools are not automatically included in every request. Tools are only injected when explicitly specified via + request headers (x-bf-mcp-include-tools) and still must be allowed by the virtual key MCP + configuration. +

+
+ +
- {/* Code Mode Binding Level */} -
-
- -

- How tools are exposed in the VFS: server-level (all tools per - server) or tool-level (individual tools). -

-
- + {/* Code Mode Binding Level */} +
+
+ +

+ How tools are exposed in the VFS: server-level (all tools per server) or tool-level (individual tools). +

+
+ - {/* Visual Example */} -
-

- VFS Structure: -

+ {/* Visual Example */} +
+

VFS Structure:

- {localValues.mcp_code_mode_binding_level === "server" ? ( -
-
-
servers/
-
├─ calculator.py
-
├─ youtube.py
-
└─ weather.py
-
-

- All tools per server in a single .py file -

-
- ) : ( -
-
-
servers/
-
├─ calculator/
-
├─ add.py
-
└─ subtract.py
-
├─ youtube/
-
├─ GET_CHANNELS.py
-
└─ SEARCH_VIDEOS.py
-
└─ weather/
-
└─ get_forecast.py
-
-

- Individual .py file for each tool -

-
- )} -
-
- {/* External Base URLs */} -
-
-

External Base URLs

-

- Override Bifrost's public base URL when it runs behind a reverse proxy. In most setups - both URLs are the same — leave them blank to derive the URL from the incoming{" "} - Host header. Both fields support env var syntax (e.g.{" "} - env.BIFROST_EXTERNAL_URL). -

-
+ {localValues.mcp_code_mode_binding_level === "server" ? ( +
+
+
servers/
+
├─ calculator.py
+
├─ youtube.py
+
└─ weather.py
+
+

All tools per server in a single .py file

+
+ ) : ( +
+
+
servers/
+
├─ calculator/
+
├─ add.py
+
└─ subtract.py
+
├─ youtube/
+
├─ GET_CHANNELS.py
+
└─ SEARCH_VIDEOS.py
+
└─ weather/
+
└─ get_forecast.py
+
+

Individual .py file for each tool

+
+ )} +
+
+ {/* External Base URLs */} +
+
+

External Base URLs

+

+ Override Bifrost's public base URL when it runs behind a reverse proxy. In most setups both URLs are the same —{" "} + leave them blank to derive the URL from the incoming Host header. Both fields support + env var syntax (e.g. env.BIFROST_EXTERNAL_URL). +

+
-
-
- -

- Advertised in OAuth server metadata that downstream clients read about - Bifrost — e.g. /.well-known/oauth-authorization-server{" "} - and the WWW-Authenticate header on{" "} - /mcp. Example: Claude Code connects to{" "} - https://bifrost.example.com/mcp and discovers the - authorize/token endpoints from this URL. -

-
- -
+
+
+ +

+ Advertised in OAuth server metadata that downstream clients read about Bifrost — e.g.{" "} + /.well-known/oauth-authorization-server and the{" "} + WWW-Authenticate header on /mcp. Example: Claude Code + connects to https://bifrost.example.com/mcp and discovers the authorize/token endpoints + from this URL. +

+
+ +
-
-
- -

- Used as the redirect_uri Bifrost registers with{" "} - upstream OAuth providers when it acts as a client to an MCP server. - Example: when a user connects an MCP server like Notion or Jira, this is the URL - Notion/Jira will redirect the browser to after login ( - {"/api/oauth/callback"}). -

-

- Heads up: changing this after MCP clients have already completed OAuth - will break them. The upstream provider locks the redirect_uri{" "} - to whatever was registered initially, so existing clients will fail with{" "} - "Invalid redirect URI". Clear the stored OAuth client credentials - for affected MCP servers and re-authorize so Bifrost re-runs Dynamic Client Registration - with the new URL. -

-
- -
-
-
-
- -
-
- ); -} +
+
+ +

+ Used as the redirect_uri Bifrost registers with upstream OAuth providers{" "} + when it acts as a client to an MCP server. Example: when a user connects an MCP server like Notion or Jira, this is the URL + Notion/Jira will redirect the browser to after login ({"/api/oauth/callback"}). +

+

+ Heads up: changing this after MCP clients have already completed OAuth will break them. The upstream + provider locks the redirect_uri to whatever was registered initially, so existing clients + will fail with "Invalid redirect URI". Clear the stored OAuth client credentials for affected MCP servers + and re-authorize so Bifrost re-runs Dynamic Client Registration with the new URL. +

+
+ +
+
+
+
+ +
+
+ ); +} \ No newline at end of file diff --git a/ui/app/workspace/config/views/pluginsForm.tsx b/ui/app/workspace/config/views/pluginsForm.tsx deleted file mode 100644 index dcd459de4c..0000000000 --- a/ui/app/workspace/config/views/pluginsForm.tsx +++ /dev/null @@ -1,464 +0,0 @@ -import { Button } from "@/components/ui/button"; -import { Card, CardContent } from "@/components/ui/card"; -import { Input } from "@/components/ui/input"; -import { Label } from "@/components/ui/label"; -import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select"; -import { Separator } from "@/components/ui/separator"; -import { Switch } from "@/components/ui/switch"; -import { getProviderLabel } from "@/lib/constants/logs"; -import { getErrorMessage, useCreatePluginMutation, useGetPluginsQuery, useGetProvidersQuery, useUpdatePluginMutation } from "@/lib/store"; -import { CacheConfig, EditorCacheConfig, ModelProviderName } from "@/lib/types/config"; -import { SEMANTIC_CACHE_PLUGIN } from "@/lib/types/plugins"; -import { cacheConfigSchema } from "@/lib/types/schemas"; -import { Loader2 } from "lucide-react"; -import { useEffect, useMemo, useState } from "react"; -import { toast } from "sonner"; - -const defaultCacheConfig: EditorCacheConfig = { - ttl_seconds: 300, - threshold: 0.8, - conversation_history_threshold: 3, - exclude_system_prompt: false, - cache_by_model: true, - cache_by_provider: true, -}; - -const toEditorCacheConfig = (config?: Partial): EditorCacheConfig => ({ - ...defaultCacheConfig, - ...config, -}); - -const normalizeCacheConfigForSave = (config: EditorCacheConfig) => { - const normalized: Record = { - ttl_seconds: config.ttl_seconds, - threshold: config.threshold, - cache_by_model: config.cache_by_model, - cache_by_provider: config.cache_by_provider, - }; - - if (config.conversation_history_threshold !== undefined) { - normalized.conversation_history_threshold = config.conversation_history_threshold; - } - if (config.exclude_system_prompt !== undefined) { - normalized.exclude_system_prompt = config.exclude_system_prompt; - } - if (config.created_at !== undefined) { - normalized.created_at = config.created_at; - } - if (config.updated_at !== undefined) { - normalized.updated_at = config.updated_at; - } - - const provider = config.provider?.trim(); - const embeddingModel = config.embedding_model?.trim(); - - if (provider) { - normalized.provider = provider; - } - if (embeddingModel) { - normalized.embedding_model = embeddingModel; - } - if (config.dimension !== undefined) { - normalized.dimension = config.dimension; - } - - return normalized; -}; - -interface PluginsFormProps { - isVectorStoreEnabled: boolean; -} - -export default function PluginsForm({ isVectorStoreEnabled }: PluginsFormProps) { - const [cacheConfig, setCacheConfig] = useState(defaultCacheConfig); - const [originalCacheEnabled, setOriginalCacheEnabled] = useState(false); - const [serverCacheConfig, setServerCacheConfig] = useState(defaultCacheConfig); - const [serverCacheEnabled, setServerCacheEnabled] = useState(false); - - const { data: providersData, error: providersError, isLoading: providersLoading } = useGetProvidersQuery(); - - const providers = useMemo(() => providersData || [], [providersData]); - - useEffect(() => { - if (providersError) { - toast.error(`Failed to load providers: ${getErrorMessage(providersError as any)}`); - } - }, [providersError]); - - // RTK Query hooks - const { data: plugins, isLoading: loading } = useGetPluginsQuery(); - const [updatePlugin, { isLoading: isUpdating }] = useUpdatePluginMutation(); - const [createPlugin, { isLoading: isCreating }] = useCreatePluginMutation(); - - // Get semantic cache plugin and its config - const semanticCachePlugin = useMemo(() => plugins?.find((plugin) => plugin.name === SEMANTIC_CACHE_PLUGIN), [plugins]); - - const isSemanticCacheEnabled = Boolean(semanticCachePlugin?.enabled); - const loadedDirectOnlyConfig = serverCacheConfig.dimension === 1 && !serverCacheConfig.provider; - const hasInvalidProviderBackedDimension = cacheConfig.dimension === 1 && Boolean(cacheConfig.provider?.trim()); - - // Initialize cache config from plugin data - useEffect(() => { - if (semanticCachePlugin?.config) { - const config = toEditorCacheConfig(semanticCachePlugin.config as Partial); - setCacheConfig(config); - setServerCacheConfig(config); - setOriginalCacheEnabled(semanticCachePlugin.enabled); - setServerCacheEnabled(semanticCachePlugin.enabled); - } - }, [semanticCachePlugin]); - - // Update default provider when providers are loaded (only for new configs) - useEffect(() => { - if (providers.length > 0 && !semanticCachePlugin?.config) { - setCacheConfig((prev) => ({ - ...prev, - provider: providers[0].name as ModelProviderName, - embedding_model: prev.embedding_model ?? "text-embedding-3-small", - dimension: prev.dimension ?? 1536, - })); - } - }, [providers, semanticCachePlugin?.config]); - - const hasChanges = useMemo(() => { - if (originalCacheEnabled !== serverCacheEnabled) return true; - - return ( - cacheConfig.provider !== serverCacheConfig.provider || - cacheConfig.embedding_model !== serverCacheConfig.embedding_model || - cacheConfig.dimension !== serverCacheConfig.dimension || - cacheConfig.ttl_seconds !== serverCacheConfig.ttl_seconds || - cacheConfig.threshold !== serverCacheConfig.threshold || - cacheConfig.conversation_history_threshold !== serverCacheConfig.conversation_history_threshold || - cacheConfig.exclude_system_prompt !== serverCacheConfig.exclude_system_prompt || - cacheConfig.cache_by_model !== serverCacheConfig.cache_by_model || - cacheConfig.cache_by_provider !== serverCacheConfig.cache_by_provider - ); - }, [cacheConfig, serverCacheConfig, originalCacheEnabled, serverCacheEnabled]); - - // Handle semantic cache toggle (create or update) - const handleSemanticCacheToggle = (enabled: boolean) => { - setOriginalCacheEnabled(enabled); - }; - - // Update cache config locally - const updateCacheConfigLocal = (updates: Partial) => { - setCacheConfig((prev) => ({ ...prev, ...updates })); - }; - - // Save all changes - const handleSave = async () => { - if (hasInvalidProviderBackedDimension) { - toast.error( - "Provider-backed semantic cache requires the embedding model's real dimension. Use a value greater than 1, or remove the provider to keep direct-only mode.", - ); - return; - } - - const parseResult = cacheConfigSchema.safeParse(normalizeCacheConfigForSave(cacheConfig)); - if (!parseResult.success) { - const firstIssue = parseResult.error.issues[0]?.message ?? "Semantic cache configuration is invalid."; - toast.error(firstIssue); - return; - } - - const savedConfig = parseResult.data as CacheConfig; - - try { - if (semanticCachePlugin) { - // Update existing plugin - await updatePlugin({ - name: SEMANTIC_CACHE_PLUGIN, - data: { enabled: originalCacheEnabled, config: savedConfig }, - }).unwrap(); - } else { - // Create new plugin - await createPlugin({ - name: SEMANTIC_CACHE_PLUGIN, - enabled: originalCacheEnabled, - config: savedConfig, - path: "", - }).unwrap(); - } - toast.success("Plugin configuration updated successfully"); - // Update server state to match current state - const normalizedConfig = toEditorCacheConfig(savedConfig); - setCacheConfig(normalizedConfig); - setServerCacheConfig(normalizedConfig); - setServerCacheEnabled(originalCacheEnabled); - } catch (error) { - const errorMessage = getErrorMessage(error); - toast.error(`Failed to update plugin configuration: ${errorMessage}`); - } - }; - - if (loading) { - return ( - - -
Loading plugins configuration...
-
-
- ); - } - - return ( -
- {/* Semantic Cache Toggle */} -
-
-
- -

- Enable semantic caching for requests. Send x-bf-cache-key header with requests to use semantic caching.{" "} - {!isVectorStoreEnabled && ( - Requires vector store to be configured and enabled in config.json. - )} - {!providersLoading && providers?.length === 0 && ( - Requires at least one provider to be configured. - )} -

-
-
- { - if (isVectorStoreEnabled) { - handleSemanticCacheToggle(checked); - } - }} - /> - {(isSemanticCacheEnabled || originalCacheEnabled) && ( - - )} -
-
- - {/* Cache Configuration (only show when enabled) */} - {originalCacheEnabled && - isVectorStoreEnabled && - (providersLoading ? ( -
- -
- ) : ( -
- - {loadedDirectOnlyConfig && ( -
- This plugin was loaded in direct-only mode via config.json. The Web UI currently edits provider-backed - semantic cache settings; keep using config.json if you want to stay in direct-only mode. -
- )} - {hasInvalidProviderBackedDimension && ( -
- You selected a provider while keeping dimension: 1. That is only valid for direct-only mode. Set the - embedding model's real dimension before saving, or remove the provider to stay in direct-only mode. -
- )} - {/* Provider and Model Settings */} -
-

Provider and Model Settings

-
-
- - -
-
- - updateCacheConfigLocal({ embedding_model: e.target.value })} - /> -
-
-
- - {/* Cache Settings */} -
-

Cache Settings

-
-
- - { - const value = e.target.value; - if (value === "") { - updateCacheConfigLocal({ ttl_seconds: undefined }); - return; - } - const parsed = parseInt(value); - if (!Number.isNaN(parsed)) { - updateCacheConfigLocal({ ttl_seconds: parsed }); - } - }} - /> -
-
- - { - const value = e.target.value; - if (value === "") { - updateCacheConfigLocal({ threshold: undefined }); - return; - } - const parsed = parseFloat(value); - if (!Number.isNaN(parsed)) { - updateCacheConfigLocal({ threshold: parsed }); - } - }} - /> -
-
- - { - const value = e.target.value; - if (value === "") { - updateCacheConfigLocal({ dimension: undefined }); - return; - } - const parsed = parseInt(value); - if (!Number.isNaN(parsed)) { - updateCacheConfigLocal({ dimension: parsed }); - } - }} - /> -
-
-

- API keys for the embedding provider will be inherited from the main provider configuration. The semantic cache will use - the configured provider's keys automatically. -

-
- - {/* Conversation Settings */} -
-

Conversation Settings

-
-
- - updateCacheConfigLocal({ conversation_history_threshold: parseInt(e.target.value) || 3 })} - /> -

- Skip caching for conversations with more than this number of messages (prevents false positives) -

-
-
-
-
-
- -

Exclude system messages from cache key generation

-
- updateCacheConfigLocal({ exclude_system_prompt: checked })} - size="md" - /> -
-
-
- - {/* Cache Behavior */} -
-

Cache Behavior

-
-
-
- -

Include model name in cache key

-
- updateCacheConfigLocal({ cache_by_model: checked })} - size="md" - /> -
-
-
- -

Include provider name in cache key

-
- updateCacheConfigLocal({ cache_by_provider: checked })} - size="md" - /> -
-
-
- -
- -
    -
  • - You can pass x-bf-cache-ttl header with requests to use request-specific TTL. -
  • -
  • - You can pass x-bf-cache-threshold header with requests to use request-specific similarity threshold. -
  • -
  • - You can pass x-bf-cache-type header with "direct" or "semantic" to control cache behavior. -
  • -
  • - You can pass x-bf-cache-no-store header with "true" to disable response caching. -
  • -
-
-
- ))} -
-
- ); -} \ No newline at end of file diff --git a/ui/app/workspace/config/views/securityView.tsx b/ui/app/workspace/config/views/securityView.tsx index 7deab3011e..eb2b66ce1c 100644 --- a/ui/app/workspace/config/views/securityView.tsx +++ b/ui/app/workspace/config/views/securityView.tsx @@ -19,444 +19,401 @@ import { useCallback, useEffect, useMemo, useState } from "react"; import { toast } from "sonner"; export default function SecurityView() { - const hasSettingsUpdateAccess = useRbac(RbacResource.Settings, RbacOperation.Update); - const { data: bifrostConfig } = useGetCoreConfigQuery({ fromDB: true }); - const { - data: authType, - isLoading: authTypeLoading, - error: authTypeError, - } = useGetAuthTypeQuery(undefined, { skip: !IS_ENTERPRISE }); - const config = bifrostConfig?.client_config; - const [updateCoreConfig, { isLoading }] = useUpdateCoreConfigMutation(); - const [localConfig, setLocalConfig] = useState(DefaultCoreConfig); - const showPasswordSection = - !IS_ENTERPRISE || (!authTypeLoading && !authTypeError && authType?.type !== "sso"); + const hasSettingsUpdateAccess = useRbac(RbacResource.Settings, RbacOperation.Update); + const { data: bifrostConfig } = useGetCoreConfigQuery({ fromDB: true }); + const { data: authType, isLoading: authTypeLoading, error: authTypeError } = useGetAuthTypeQuery(undefined, { skip: !IS_ENTERPRISE }); + const config = bifrostConfig?.client_config; + const [updateCoreConfig, { isLoading }] = useUpdateCoreConfigMutation(); + const [localConfig, setLocalConfig] = useState(DefaultCoreConfig); + const showPasswordSection = !IS_ENTERPRISE || (!authTypeLoading && !authTypeError && authType?.type !== "sso"); - const [localValues, setLocalValues] = useState<{ - allowed_origins: string; - allowed_headers: string; - required_headers: string; - whitelisted_routes: string; - }>({ - allowed_origins: "", - allowed_headers: "", - required_headers: "", - whitelisted_routes: "", - }); + const [localValues, setLocalValues] = useState<{ + allowed_origins: string; + allowed_headers: string; + required_headers: string; + whitelisted_routes: string; + }>({ + allowed_origins: "", + allowed_headers: "", + required_headers: "", + whitelisted_routes: "", + }); - const [authConfig, setAuthConfig] = useState({ - admin_username: { value: "", env_var: "", from_env: false }, - admin_password: { value: "", env_var: "", from_env: false }, - is_enabled: false, - disable_auth_on_inference: true, - }); + const [authConfig, setAuthConfig] = useState({ + admin_username: { value: "", env_var: "", from_env: false }, + admin_password: { value: "", env_var: "", from_env: false }, + is_enabled: false, + disable_auth_on_inference: true, + }); - useEffect(() => { - if (bifrostConfig && config) { - setLocalConfig(config); - setLocalValues({ - allowed_origins: config?.allowed_origins?.join(", ") || "", - allowed_headers: config?.allowed_headers?.join(", ") || "", - required_headers: config?.required_headers?.join(", ") || "", - whitelisted_routes: config?.whitelisted_routes?.join(", ") || "", - }); - } - if (bifrostConfig?.auth_config) { - setAuthConfig(bifrostConfig.auth_config); - } - }, [config, bifrostConfig]); + useEffect(() => { + if (bifrostConfig && config) { + setLocalConfig(config); + setLocalValues({ + allowed_origins: config?.allowed_origins?.join(", ") || "", + allowed_headers: config?.allowed_headers?.join(", ") || "", + required_headers: config?.required_headers?.join(", ") || "", + whitelisted_routes: config?.whitelisted_routes?.join(", ") || "", + }); + } + if (bifrostConfig?.auth_config) { + setAuthConfig(bifrostConfig.auth_config); + } + }, [config, bifrostConfig]); - const hasChanges = useMemo(() => { - if (!config) return false; - const localOrigins = localConfig.allowed_origins?.slice().sort().join(","); - const serverOrigins = config.allowed_origins?.slice().sort().join(","); - const originsChanged = localOrigins !== serverOrigins; + const hasChanges = useMemo(() => { + if (!config) return false; + const localOrigins = localConfig.allowed_origins?.slice().sort().join(","); + const serverOrigins = config.allowed_origins?.slice().sort().join(","); + const originsChanged = localOrigins !== serverOrigins; - const localHeaders = localConfig.allowed_headers?.slice().sort().join(","); - const serverHeaders = config.allowed_headers?.slice().sort().join(","); - const headersChanged = localHeaders !== serverHeaders; + const localHeaders = localConfig.allowed_headers?.slice().sort().join(","); + const serverHeaders = config.allowed_headers?.slice().sort().join(","); + const headersChanged = localHeaders !== serverHeaders; - const usernameChanged = - authConfig.admin_username?.value !== bifrostConfig?.auth_config?.admin_username?.value || - authConfig.admin_username?.env_var !== bifrostConfig?.auth_config?.admin_username?.env_var || - authConfig.admin_username?.from_env !== bifrostConfig?.auth_config?.admin_username?.from_env; - const passwordChanged = - authConfig.admin_password?.value !== bifrostConfig?.auth_config?.admin_password?.value || - authConfig.admin_password?.env_var !== bifrostConfig?.auth_config?.admin_password?.env_var || - authConfig.admin_password?.from_env !== bifrostConfig?.auth_config?.admin_password?.from_env; - const authChanged = showPasswordSection - ? authConfig.is_enabled !== bifrostConfig?.auth_config?.is_enabled || - usernameChanged || - passwordChanged || - authConfig.disable_auth_on_inference !== - bifrostConfig?.auth_config?.disable_auth_on_inference - : false; + const usernameChanged = + authConfig.admin_username?.value !== bifrostConfig?.auth_config?.admin_username?.value || + authConfig.admin_username?.env_var !== bifrostConfig?.auth_config?.admin_username?.env_var || + authConfig.admin_username?.from_env !== bifrostConfig?.auth_config?.admin_username?.from_env; + const passwordChanged = + authConfig.admin_password?.value !== bifrostConfig?.auth_config?.admin_password?.value || + authConfig.admin_password?.env_var !== bifrostConfig?.auth_config?.admin_password?.env_var || + authConfig.admin_password?.from_env !== bifrostConfig?.auth_config?.admin_password?.from_env; + const authChanged = showPasswordSection + ? authConfig.is_enabled !== bifrostConfig?.auth_config?.is_enabled || + usernameChanged || + passwordChanged || + authConfig.disable_auth_on_inference !== bifrostConfig?.auth_config?.disable_auth_on_inference + : false; - const localRequired = localConfig.required_headers?.slice().sort().join(","); - const serverRequired = config.required_headers?.slice().sort().join(","); - const requiredChanged = localRequired !== serverRequired; + const localRequired = localConfig.required_headers?.slice().sort().join(","); + const serverRequired = config.required_headers?.slice().sort().join(","); + const requiredChanged = localRequired !== serverRequired; - const localWhitelistedRoutes = localConfig.whitelisted_routes?.slice().sort().join(","); - const serverWhitelistedRoutes = config.whitelisted_routes?.slice().sort().join(","); - const whitelistedRoutesChanged = localWhitelistedRoutes !== serverWhitelistedRoutes; + const localWhitelistedRoutes = localConfig.whitelisted_routes?.slice().sort().join(","); + const serverWhitelistedRoutes = config.whitelisted_routes?.slice().sort().join(","); + const whitelistedRoutesChanged = localWhitelistedRoutes !== serverWhitelistedRoutes; - const enforceAuthOnInferenceChanged = - localConfig.enforce_auth_on_inference !== config.enforce_auth_on_inference; + const enforceAuthOnInferenceChanged = localConfig.enforce_auth_on_inference !== config.enforce_auth_on_inference; - return ( - originsChanged || - headersChanged || - requiredChanged || - whitelistedRoutesChanged || - authChanged || - enforceAuthOnInferenceChanged - ); - }, [config, localConfig, authConfig, bifrostConfig, showPasswordSection]); + return originsChanged || headersChanged || requiredChanged || whitelistedRoutesChanged || authChanged || enforceAuthOnInferenceChanged; + }, [config, localConfig, authConfig, bifrostConfig, showPasswordSection]); - const needsRestart = useMemo(() => { - if (!config) return false; + const needsRestart = useMemo(() => { + if (!config) return false; - const localOrigins = localConfig.allowed_origins?.slice().sort().join(","); - const serverOrigins = config.allowed_origins?.slice().sort().join(","); - const originsChanged = localOrigins !== serverOrigins; + const localOrigins = localConfig.allowed_origins?.slice().sort().join(","); + const serverOrigins = config.allowed_origins?.slice().sort().join(","); + const originsChanged = localOrigins !== serverOrigins; - const localHeaders = localConfig.allowed_headers?.slice().sort().join(","); - const serverHeaders = config.allowed_headers?.slice().sort().join(","); - const headersChanged = localHeaders !== serverHeaders; + const localHeaders = localConfig.allowed_headers?.slice().sort().join(","); + const serverHeaders = config.allowed_headers?.slice().sort().join(","); + const headersChanged = localHeaders !== serverHeaders; - const enforceAuthOnInferenceChanged = - localConfig.enforce_auth_on_inference !== config.enforce_auth_on_inference && IS_ENTERPRISE; + const enforceAuthOnInferenceChanged = localConfig.enforce_auth_on_inference !== config.enforce_auth_on_inference && IS_ENTERPRISE; - return originsChanged || headersChanged || enforceAuthOnInferenceChanged; - }, [config, localConfig]); + return originsChanged || headersChanged || enforceAuthOnInferenceChanged; + }, [config, localConfig]); - const handleAllowedOriginsChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, allowed_origins: value })); - setLocalConfig((prev) => ({ ...prev, allowed_origins: parseArrayFromText(value) })); - }, []); + const handleAllowedOriginsChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, allowed_origins: value })); + setLocalConfig((prev) => ({ ...prev, allowed_origins: parseArrayFromText(value) })); + }, []); - const handleAllowedHeadersChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, allowed_headers: value })); - setLocalConfig((prev) => ({ ...prev, allowed_headers: parseArrayFromText(value) })); - }, []); + const handleAllowedHeadersChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, allowed_headers: value })); + setLocalConfig((prev) => ({ ...prev, allowed_headers: parseArrayFromText(value) })); + }, []); - const handleRequiredHeadersChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, required_headers: value })); - setLocalConfig((prev) => ({ ...prev, required_headers: parseArrayFromText(value) })); - }, []); + const handleRequiredHeadersChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, required_headers: value })); + setLocalConfig((prev) => ({ ...prev, required_headers: parseArrayFromText(value) })); + }, []); - const handleWhitelistedRoutesChange = useCallback((value: string) => { - setLocalValues((prev) => ({ ...prev, whitelisted_routes: value })); - setLocalConfig((prev) => ({ ...prev, whitelisted_routes: parseArrayFromText(value) })); - }, []); + const handleWhitelistedRoutesChange = useCallback((value: string) => { + setLocalValues((prev) => ({ ...prev, whitelisted_routes: value })); + setLocalConfig((prev) => ({ ...prev, whitelisted_routes: parseArrayFromText(value) })); + }, []); - const handleConfigChange = useCallback((field: keyof CoreConfig, value: boolean) => { - setLocalConfig((prev) => ({ ...prev, [field]: value })); - }, []); + const handleConfigChange = useCallback((field: keyof CoreConfig, value: boolean) => { + setLocalConfig((prev) => ({ ...prev, [field]: value })); + }, []); - const handleAuthToggle = useCallback((checked: boolean) => { - setAuthConfig((prev) => ({ ...prev, is_enabled: checked })); - }, []); + const handleAuthToggle = useCallback((checked: boolean) => { + setAuthConfig((prev) => ({ ...prev, is_enabled: checked })); + }, []); - const handleDisableAuthOnInferenceToggle = useCallback((checked: boolean) => { - setAuthConfig((prev) => ({ ...prev, disable_auth_on_inference: checked })); - }, []); + const handleDisableAuthOnInferenceToggle = useCallback((checked: boolean) => { + setAuthConfig((prev) => ({ ...prev, disable_auth_on_inference: checked })); + }, []); - const handleAuthFieldChange = useCallback( - (field: "admin_username" | "admin_password", value: EnvVar) => { - setAuthConfig((prev) => ({ ...prev, [field]: value })); - }, - [], - ); + const handleAuthFieldChange = useCallback((field: "admin_username" | "admin_password", value: EnvVar) => { + setAuthConfig((prev) => ({ ...prev, [field]: value })); + }, []); - const handleSave = useCallback(async () => { - try { - const validation = validateOrigins(localConfig.allowed_origins); + const handleSave = useCallback(async () => { + try { + const validation = validateOrigins(localConfig.allowed_origins); - if (!validation.isValid && localConfig.allowed_origins.length > 0) { - toast.error( - `Invalid origins: ${validation.invalidOrigins.join(", ")}. Origins must be valid URLs like https://example.com, wildcard patterns like https://*.example.com, or "*" to allow all origins`, - ); - return; - } - const hasUsername = authConfig.admin_username?.value || authConfig.admin_username?.env_var; - const hasPassword = authConfig.admin_password?.value || authConfig.admin_password?.env_var; - await updateCoreConfig({ - ...bifrostConfig!, - client_config: localConfig, - ...(showPasswordSection - ? { - auth_config: - authConfig.is_enabled && hasUsername && hasPassword - ? authConfig - : { ...authConfig, is_enabled: false }, - } - : {}), - }).unwrap(); - toast.success("Security settings updated successfully."); - } catch (error) { - toast.error(getErrorMessage(error)); - } - }, [bifrostConfig, localConfig, authConfig, showPasswordSection, updateCoreConfig]); + if (!validation.isValid && localConfig.allowed_origins.length > 0) { + toast.error( + `Invalid origins: ${validation.invalidOrigins.join(", ")}. Origins must be valid URLs like https://example.com, wildcard patterns like https://*.example.com, or "*" to allow all origins`, + ); + return; + } + const hasUsername = authConfig.admin_username?.value || authConfig.admin_username?.env_var; + const hasPassword = authConfig.admin_password?.value || authConfig.admin_password?.env_var; + await updateCoreConfig({ + ...bifrostConfig!, + client_config: localConfig, + ...(showPasswordSection + ? { + auth_config: authConfig.is_enabled && hasUsername && hasPassword ? authConfig : { ...authConfig, is_enabled: false }, + } + : {}), + }).unwrap(); + toast.success("Security settings updated successfully."); + } catch (error) { + toast.error(getErrorMessage(error)); + } + }, [bifrostConfig, localConfig, authConfig, showPasswordSection, updateCoreConfig]); - return ( -
-
-

Security Settings

-

- Configure security and access control settings. -

-
+ return ( +
+
+

Security Settings

+

Configure security and access control settings.

+
-
- {authConfig.is_enabled && !authConfig.disable_auth_on_inference && ( - - - - You will need to use Basic Auth for all your inference calls (including MCP tool - execution). You can disable it below. Check{" "} - - API Keys - - - - )} - {authConfig.is_enabled && (authConfig.disable_auth_on_inference ?? true) && ( - - - - Authentication is disabled for inference calls. Only dashboard, admin API and MCP tool - execution calls require authentication. - - - )} - {/* Password Protect the Dashboard */} - {IS_ENTERPRISE && authTypeLoading ? ( -
- - Loading authentication settings -
- ) : null} - {IS_ENTERPRISE && !authTypeLoading && authTypeError ? ( - - - - Could not load authentication type. Dashboard password settings are hidden until this - request succeeds. {getErrorMessage(authTypeError)} - - - ) : null} - {showPasswordSection && ( -
-
-
-
- -

- Set up authentication credentials to protect your Bifrost dashboard. Once - configured, use the generated token for all admin API calls. -

-
- -
-
-
- - handleAuthFieldChange("admin_username", value)} - /> -
-
- - handleAuthFieldChange("admin_password", value)} - /> -
- {authConfig.is_enabled && ( -
-
- -

- When enabled, inference API calls (chat completions, embeddings, etc.) will - not require authentication. Dashboard and admin API calls will still require - authentication. -

-
- -
- )} -
-
-
- )} - {/* Enable Auth on Inference */} -
-
- -

- {IS_ENTERPRISE - ? "Require authentication (virtual key, API key, or user token) for all inference endpoints." - : "Require a virtual key for all inference requests."}{" "} - See{" "} - - documentation - {" "} - for details. -

-
- handleConfigChange("enforce_auth_on_inference", checked)} - /> -
- {/* Allowed Origins */} - {needsRestart && } -
-
-
- -

- Comma-separated list of allowed origins for CORS and WebSocket connections. - Localhost origins are always allowed. Each origin must be a complete URL with - protocol (e.g., https://app.example.com, http://10.0.0.100:3000). Wildcards are - supported for subdomains (e.g., https://*.example.com) or use "*" to allow all - origins. -

-
-