Ferro Labs - AI Gateway

TypeScript SDK

Route LLM requests across 30 providers and 2,500+ models through a single OpenAI-compatible API. Migrating from the openai package takes a two-line change (the import and the client constructor). Built on Ferro Labs AI Gateway.

import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient({ apiKey: "sk-ferro-..." });

// Route to OpenAI
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

// Route to Anthropic — same client, same call
const response2 = await client.chat.completions.create({
  model: "claude-3-5-sonnet-20241022",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0]?.message.content);
console.log(`Handled by: ${response.provider} in ${response.latency_ms}ms`);

Why Ferro Labs SDK

One API for 30 providers. OpenAI, Anthropic, Google, Groq, Together, Mistral, Cohere, Bedrock, Vertex, Azure, and more — all via a single client.
Drop-in OpenAI replacement. The surface matches the OpenAI SDK. Change two lines and keep all your existing code.
Smart routing built in. Fallback chains, weighted load balancing, and per-request overrides via route_tag.
Cost and provider visibility. Every response includes provider, cost_usd, latency_ms, and trace_id — no extra calls.
Self-hostable. Point baseUrl at any Ferro Labs AI Gateway instance and go.
TypeScript-first. Full type inference, strict mode, zero runtime dependencies, ESM + CJS dual output.

Installation

npm install @ferro-labs-ai/sdk

pnpm add @ferro-labs-ai/sdk

yarn add @ferro-labs-ai/sdk

Requires Node.js 18+ (also works in Bun, Deno, and modern browsers). Zero runtime dependencies — uses native fetch.

Quickstart

You'll need a running Ferro Labs AI Gateway instance and an API key issued by it.

import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient({
  apiKey: "sk-ferro-your-key",
  baseUrl: "http://localhost:8080", // your gateway address
});

Environment variables

export FERRO_API_KEY="sk-ferro-your-key"
export FERRO_BASE_URL="http://localhost:8080"

const client = new FerroClient(); // reads FERRO_API_KEY / FERRO_BASE_URL automatically

FERRO_API_KEY takes precedence, but OPENAI_API_KEY is also accepted as a fallback to make migration painless.
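
The lookup order described above can be pictured with a small sketch. This is not the SDK's source: resolveApiKey and the Env type are hypothetical names, and the rule that an explicitly passed apiKey beats both environment variables is an assumption based on the constructor options shown elsewhere in this README.

```typescript
// Illustrative stand-in for the documented lookup order; not the SDK's code.
type Env = Record<string, string | undefined>;

function resolveApiKey(explicit: string | undefined, env: Env): string | undefined {
  // Assumed order: explicit option, then FERRO_API_KEY, then OPENAI_API_KEY.
  return explicit ?? env["FERRO_API_KEY"] ?? env["OPENAI_API_KEY"];
}

// Both variables set: the Ferro key is chosen.
console.log(resolveApiKey(undefined, { FERRO_API_KEY: "sk-ferro-a", OPENAI_API_KEY: "sk-openai-b" }));
// Only the OpenAI variable set: it is used as the fallback.
console.log(resolveApiKey(undefined, { OPENAI_API_KEY: "sk-openai-b" }));
```

If both variables are exported during a migration, the Ferro key is the one sent to the gateway, so the old OPENAI_API_KEY can stay in place until the cutover is complete.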

Migrate from OpenAI

// Before
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "sk-openai-..." });

// After — all your existing code works unchanged
import { FerroClient } from "@ferro-labs-ai/sdk";
const client = new FerroClient({ apiKey: "sk-ferro-..." });

Every client.chat.completions.create(...) call, every streaming loop, every tool call — identical API surface. Ferro routes to the right provider based on the model name.

Framework integrations

Ferro's gateway exposes an OpenAI-compatible HTTP API at /v1/*, so anything that speaks OpenAI works. Point the base URL at your gateway and keep your existing framework.

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";

const ferro = createOpenAI({
  apiKey: process.env.FERRO_API_KEY,
  baseURL: "http://localhost:8080/v1",
});

LangChain.js

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  openAIApiKey: "sk-ferro-your-key",
  configuration: { baseURL: "http://localhost:8080/v1" },
  modelName: "gpt-4o",
});

LlamaIndex.TS

import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  apiKey: "sk-ferro-your-key",
  additionalSessionOptions: { baseURL: "http://localhost:8080/v1" },
  model: "gpt-4o",
});

Usage

Chat completions

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain LLM routing in one paragraph." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0]?.message.content);
console.log(`Cost: $${response.usage?.cost_usd?.toFixed(6)}`);
console.log(`Provider: ${response.provider}`);

Streaming

const stream = await client.chat.completions.create({
  model: "claude-3-5-sonnet-20241022",
  messages: [{ role: "user", content: "Write a haiku about Go performance." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
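
When the full text is needed after streaming finishes, the deltas can be accumulated as they arrive. The sketch below is self-contained: ChunkLike models only the fields the loop above reads, and fakeStream is a hypothetical in-memory stand-in for a gateway stream.

```typescript
// Minimal chunk shape: only the fields read by the streaming loop.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenates every content delta from a stream into one string.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Hypothetical in-memory stream used only for this demonstration.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: {} }] }; // a chunk that carries no content
  yield { choices: [{ delta: { content: ", world" } }] };
}

collectText(fakeStream()).then((text) => console.log(text)); // Hello, world
```

Chunks without content are skipped by the `?? ""` fallback, so the helper does not need to special-case them.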

Embeddings

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["Ferro routes LLM requests", "across 30 providers"],
});

const vectors = response.data.map((d) => d.embedding);
console.log(`Embedding dimensions: ${vectors[0]?.length}`);
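
A common next step with embedding vectors is comparing them. The helper below is a generic cosine-similarity sketch over plain number arrays; it is not part of the SDK, and the name cosineSimilarity is only illustrative.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
console.log(cosineSimilarity([1, 2], [2, 4])); // 1 (same direction)
```

With the response above, `cosineSimilarity(vectors[0], vectors[1])` would compare the two input strings, provided both vectors are defined.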

Image generation

const response = await client.images.generate({
  model: "dall-e-3",
  prompt: "A futuristic AI gateway routing data streams across glowing servers",
  size: "1024x1024",
  quality: "hd",
});

console.log(response.data[0]?.url);

Model catalog

// Browse all 2,500+ models
const models = await client.models.list();

// Filter by provider
const anthropicModels = await client.models.list({ provider: "anthropic" });

// Filter by capability
const visionModels = await client.models.list({ capability: "vision" });

// Pricing for a specific model
const info = await client.models.retrieve("gpt-4o");
console.log(`Context window: ${info.context_window?.toLocaleString()} tokens`);

Ferro extras: templates & route tags

The SDK accepts two Ferro-specific extensions on chat.completions.create(...):

template_id + template_variables — render a server-side prompt template at request time:

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "I can't log in" }],
  template_id: "support-agent",
  template_variables: {
    product: "Acme SaaS",
    plan: "Pro",
    date: "2026-04-28",
  },
});

route_tag — override the routing strategy for a single request:

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  route_tag: "low-cost", // forces fallback to cheaper providers
});

Many OpenAI-compatible backends ignore fields they don't understand, but some validate request bodies strictly and reject unknown fields, so confirm your backend's behavior before keeping these in shared code paths.

Observability

Every ChatCompletion includes fields that tell you what the gateway actually did — no extra API calls, no log scraping:

Field	Type	Description
`response.provider`	`string`	Which upstream provider served the request (e.g. `"openai"`, `"anthropic"`)
`response.trace_id`	`string`	Correlates this request with gateway logs
`response.latency_ms`	`number`	End-to-end gateway latency
`response.usage.cost_usd`	`number`	Computed cost in USD
`response.usage.cache_hit`	`boolean`	Whether the response came from the gateway's semantic cache
`response.usage.prompt_tokens` / `completion_tokens` / `total_tokens`	`number`	Standard OpenAI token counts

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(
  `trace=${response.trace_id} provider=${response.provider} ` +
  `latency=${response.latency_ms}ms cost=$${response.usage?.cost_usd?.toFixed(6)}`
);
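
For structured logging, the same fields can be formatted defensively so a missing value never prints as "undefined". This is a sketch: ObservedResponse is a hypothetical local type that mirrors the table above, and treating every field as optional is an assumption made for safety.

```typescript
// Hypothetical local type mirroring the observability fields listed above.
interface ObservedResponse {
  provider?: string;
  trace_id?: string;
  latency_ms?: number;
  usage?: { cost_usd?: number; cache_hit?: boolean };
}

// Builds a key=value log line, substituting "n/a" for absent fields.
function toLogLine(r: ObservedResponse): string {
  const cost = r.usage?.cost_usd;
  return [
    `trace=${r.trace_id ?? "n/a"}`,
    `provider=${r.provider ?? "n/a"}`,
    `latency_ms=${r.latency_ms ?? "n/a"}`,
    `cost_usd=${cost === undefined ? "n/a" : cost.toFixed(6)}`,
    `cache_hit=${r.usage?.cache_hit ?? false}`,
  ].join(" ");
}

console.log(toLogLine({ provider: "openai", trace_id: "t1", latency_ms: 42, usage: { cost_usd: 0.00012 } }));
// trace=t1 provider=openai latency_ms=42 cost_usd=0.000120 cache_hit=false
```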

Configuration

const client = new FerroClient({
  apiKey: "sk-ferro-...",              // or FERRO_API_KEY env var
  baseUrl: "http://localhost:8080",    // or FERRO_BASE_URL env var
  timeout: 120_000,                    // milliseconds (default: 120,000)
  maxRetries: 2,                       // retries on connection errors (default: 2)
  defaultHeaders: { "x-env": "prod" }, // merged into every request
  fetch: customFetchFn,               // bring your own fetch (testing, polyfill)
});

Retries are triggered only by network errors (DNS failures, connection refused, timeouts) — HTTP errors (4xx/5xx) propagate immediately as typed exceptions so you can handle them yourself.
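
The retry rule above can be sketched as follows. This is an illustration of the policy, not the SDK's implementation: withRetries and isNetworkFailure are hypothetical names. The classifier relies on the fact that native fetch rejects with a TypeError on network failure, while an HTTP error status resolves normally; treating an error named "TimeoutError" as retryable is an assumption.

```typescript
// Network-level failures: fetch rejects with TypeError; timeouts are assumed
// to surface as an error named "TimeoutError".
function isNetworkFailure(error: unknown): boolean {
  if (error instanceof TypeError) return true;
  return error instanceof Error && error.name === "TimeoutError";
}

// Runs the operation, retrying up to maxRetries times on network failures only.
async function withRetries<T>(operation: () => Promise<T>, maxRetries: number): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await operation();
    } catch (error) {
      // Anything that is not a network failure propagates immediately.
      if (!isNetworkFailure(error) || attempt >= maxRetries) throw error;
      attempt += 1;
    }
  }
}

// Demo: two network failures, then success on the third attempt.
let demoCalls = 0;
const flaky = async (): Promise<string> => {
  demoCalls += 1;
  if (demoCalls <= 2) throw new TypeError("fetch failed");
  return "ok";
};

withRetries(flaky, 2).then((result) => console.log(result, demoCalls)); // ok 3
```

With maxRetries set to 2, an operation is attempted at most three times in total.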

Bring-your-own fetch lets you use a custom implementation for testing, proxies, or runtime polyfills:

import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient({
  apiKey: "sk-ferro-...",
  fetch: myCustomFetch, // e.g. undici fetch, node-fetch, or a mock
});
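
For tests, a mock can record each request and return a canned response. The sketch below is standalone and does not import the SDK. mockFetch and RecordedCall are hypothetical names, the response body is a made-up chat-completion-shaped payload, and whether this exact function signature satisfies the type of the fetch option is an assumption to check against the SDK's typings.

```typescript
// Hypothetical record of what the client asked for.
interface RecordedCall {
  url: string;
  method: string;
}

const recorded: RecordedCall[] = [];

// A stand-in fetch: records the call, then returns a canned JSON response.
const mockFetch = async (input: string | URL, init?: { method?: string }): Promise<Response> => {
  recorded.push({ url: String(input), method: init?.method ?? "GET" });
  const body = { choices: [{ message: { role: "assistant", content: "mocked" } }] };
  return new Response(JSON.stringify(body), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
};

// Exercise the mock directly to show what a test could assert on.
mockFetch("http://localhost:8080/v1/chat/completions", { method: "POST" })
  .then((res) => res.json())
  .then((json) => console.log(recorded[0]?.method, recorded[0]?.url, JSON.stringify(json)));
```

Passing a function like this as the fetch option lets a test assert on the recorded URL and method without a running gateway.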

Error handling

import {
  FerroClient,
  FerroAuthError,
  FerroRateLimitError,
  FerroNotFoundError,
  FerroServerError,
  FerroConnectionError,
} from "@ferro-labs-ai/sdk";

try {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof FerroAuthError) {
    console.error("Invalid API key — check FERRO_API_KEY");
  } else if (error instanceof FerroRateLimitError) {
    console.error("Rate limit hit — back off and retry");
  } else if (error instanceof FerroNotFoundError) {
    console.error("Model or endpoint not found");
  } else if (error instanceof FerroServerError) {
    console.error(`Gateway error ${error.status} — upstream may be down`);
  } else if (error instanceof FerroConnectionError) {
    console.error("Cannot reach gateway — is it running?");
  }
}

All HTTP-level exceptions inherit from FerroAPIError and expose .status, .code, .message, and .requestId. FerroConnectionError and FerroStreamError inherit from FerroError directly.
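
Because HTTP errors are not retried automatically, handling a rate limit is left to the caller. One common approach is exponential backoff with jitter. The sketch below only computes the delay; backoffDelayMs and its defaults (500 ms base, 30 s cap) are illustrative choices, not SDK behavior.

```typescript
// Delay before retry number `attempt` (0-based), using "full jitter":
// a uniform random value between 0 and the capped exponential ceiling.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5 the delays are deterministic.
const half = () => 0.5;
console.log(backoffDelayMs(0, 500, 30_000, half));  // 250
console.log(backoffDelayMs(3, 500, 30_000, half));  // 2000
console.log(backoffDelayMs(10, 500, 30_000, half)); // 15000 (capped)
```

In a catch branch for FerroRateLimitError, the caller could wait for this delay and then repeat the request, giving up after a fixed number of attempts.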

Admin API (OSS gateway)

These APIs are available on any self-hosted Ferro Labs AI Gateway instance. They require an admin-scoped API key.

API keys

// Create
const newKey = await client.admin.keys.create({
  name: "backend-service",
  scopes: ["admin"],
});
console.log(newKey.key); // full key value — shown ONCE, store it securely

// List
const keys = await client.admin.keys.list();

// Per-key usage counts
const usage = await client.admin.keys.usage({ limit: 20 });

// Revoke — keeps the record for audit, invalidates immediately
await client.admin.keys.revoke("key_id");

// Rotate — atomically invalidates old, returns new
const rotated = await client.admin.keys.rotate("key_id");

// Permanently delete the record
await client.admin.keys.delete("key_id");

Gateway routing config

// Read the current config
const cfg = await client.admin.config.get();
console.log(cfg.strategy); // e.g. { mode: "fallback" }
console.log(cfg.targets);  // list of { virtual_key, weight, ... }

// Replace it (PUT) — hot reload, no restart
await client.admin.config.update({
  strategy: { mode: "fallback" },
  targets: [
    { virtual_key: "openai", weight: 1 },
    { virtual_key: "anthropic", weight: 1 },
    { virtual_key: "groq", weight: 1 },
  ],
  plugins: [
    { name: "cache", enabled: true },
    { name: "logger", enabled: true },
  ],
});

// Inspect history and roll back
const history = await client.admin.config.history();
await client.admin.config.rollback(history[history.length - 2]!.version);
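
To make the weight field concrete, here is a generic sketch of weighted selection. How the gateway actually distributes traffic is not described in this README, so this only illustrates what weighted load balancing means; pickWeighted and the Target type are hypothetical.

```typescript
// Hypothetical local type with the two target fields shown in the config above.
interface Target {
  virtual_key: string;
  weight: number;
}

// Picks a target with probability proportional to its weight.
// `roll` is a number in [0, 1), e.g. Math.random().
function pickWeighted(targets: Target[], roll: number): Target {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  if (total <= 0) throw new Error("at least one target needs a positive weight");
  let remaining = roll * total;
  for (const target of targets) {
    remaining -= target.weight;
    if (remaining < 0) return target;
  }
  // Only reachable through floating-point rounding when roll is close to 1.
  return targets[targets.length - 1]!;
}

const targets: Target[] = [
  { virtual_key: "openai", weight: 3 },
  { virtual_key: "anthropic", weight: 1 },
];

console.log(pickWeighted(targets, 0.5).virtual_key); // openai
console.log(pickWeighted(targets, 0.9).virtual_key); // anthropic
```

With weights 3 and 1, rolls below 0.75 select the first target, so it would receive roughly three quarters of the requests.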

Request logs

// Recent failures
const errors = await client.admin.logs.list({ limit: 20, stage: "on_error" });

// Aggregate stats
const stats = await client.admin.logs.stats();

// Prune old entries
await client.admin.logs.delete({ before: "2026-01-01T00:00:00Z" });
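
Rather than hard-coding the cutoff, a retention window can be turned into a timestamp. pruneCutoff is a hypothetical helper; it drops the milliseconds so the output matches the timestamp format used above, on the assumption that the gateway expects that format.

```typescript
// Returns an ISO 8601 UTC timestamp `retentionDays` before `now`,
// formatted without milliseconds (e.g. "2026-01-01T00:00:00Z").
function pruneCutoff(now: Date, retentionDays: number): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  const cutoff = new Date(now.getTime() - retentionDays * msPerDay);
  // toISOString() yields "...T00:00:00.000Z"; strip the ".000" part.
  return cutoff.toISOString().replace(/\.\d{3}Z$/, "Z");
}

console.log(pruneCutoff(new Date("2026-01-31T00:00:00Z"), 30)); // 2026-01-01T00:00:00Z
```

The result could then be passed as the before value, for example `{ before: pruneCutoff(new Date(), 30) }`.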

Providers, plugins, dashboard

const providers = await client.admin.providers.list(); // registered LLM providers
const plugins   = await client.admin.plugins.list();   // installed gateway plugins
const dashboard = await client.admin.dashboard();       // high-level counts
const health    = await client.admin.health();          // gateway health check

Examples

Runnable examples live in the examples/ directory. Run any of them with npx tsx:

export FERRO_API_KEY=sk-ferro-...
npx tsx examples/basic.ts

Basic chat completion

// examples/basic.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, tell me a short joke." }],
});
console.log(response.choices[0]?.message.content);
console.log(`Provider: ${response.provider} | Tokens: ${response.usage?.total_tokens}`);

Streaming

// examples/streaming.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Write a haiku about distributed systems." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Multi-provider routing

// examples/multi-provider.ts — same client, different providers
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
for (const model of ["gpt-4o-mini", "claude-3-5-sonnet-20241022", "llama-3.3-70b-versatile"]) {
  const r = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Say hello in 5 words." }],
  });
  console.log(`[${r.provider}] ${model} → ${r.choices[0]?.message.content}`);
}

Tool / function calling

// examples/tool-calling.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather in SF?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a location.",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  }],
  tool_choice: "auto",
});

for (const call of response.choices[0]?.message.tool_calls ?? []) {
  console.log(`Tool: ${call.function.name}(${call.function.arguments})`);
}
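
After the model returns tool calls, the application still has to run them. In the OpenAI-compatible format the arguments arrive as a JSON string, so they need parsing and validation first. The dispatcher below is a sketch: handlers, runToolCall, and the canned get_weather reply are all hypothetical.

```typescript
// Minimal shape of a tool call: only the fields the loop above reads.
interface ToolCallLike {
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical local implementations, keyed by tool name.
const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Weather lookup requested for ${String(args["location"])}`,
};

// Parses the JSON arguments and dispatches to the matching handler.
function runToolCall(call: ToolCallLike): string {
  const name = call.function.name;
  const handler = handlers[name];
  if (!handler) return `Unknown tool: ${name}`;
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.function.arguments);
  } catch {
    return `Invalid JSON arguments for ${name}`;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return `Arguments for ${name} must be a JSON object`;
  }
  return handler(parsed as Record<string, unknown>);
}

console.log(runToolCall({ function: { name: "get_weather", arguments: '{"location":"SF"}' } }));
// Weather lookup requested for SF
```

Returning an error string instead of throwing keeps the result easy to send back to the model as a tool message.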

Embeddings

// examples/embeddings.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["Ferro routes LLM requests", "across 30 providers"],
});
console.log(`Dimensions: ${response.data[0]?.embedding.length}`);

Image generation

// examples/image-generation.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const response = await client.images.generate({
  model: "dall-e-3",
  prompt: "A futuristic AI gateway routing data streams",
  size: "1024x1024",
});
console.log(response.data[0]?.url);

Model catalog

// examples/model-catalog.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const models = await client.models.list();
console.log(`Total: ${models.length} models`);

const anthropic = await client.models.list({ provider: "anthropic" });
console.log(`Anthropic: ${anthropic.length} models`);

const info = await client.models.retrieve("gpt-4o");
console.log(`Context: ${info.context_window?.toLocaleString()} tokens`);

Error handling

// examples/error-handling.ts
import { FerroClient, FerroAuthError, FerroRateLimitError, FerroServerError } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
try {
  await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof FerroAuthError) console.error("Bad API key");
  else if (error instanceof FerroRateLimitError) console.error("Rate limited");
  else if (error instanceof FerroServerError) console.error(`Server error: ${error.status}`);
}

Admin: API key management

// examples/admin-keys.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const newKey = await client.admin.keys.create({ name: "backend-svc", scopes: ["read_only"] });
console.log(`Key: ${newKey.key}`); // shown once

const keys = await client.admin.keys.list();
await client.admin.keys.rotate(newKey.id);
await client.admin.keys.delete(newKey.id);

Admin: Gateway config

// examples/admin-config.ts
import { FerroClient } from "@ferro-labs-ai/sdk";

const client = new FerroClient();
const config = await client.admin.config.get();
console.log("Strategy:", config.strategy);

await client.admin.config.update({
  strategy: { mode: "fallback" },
  targets: [{ virtual_key: "openai" }, { virtual_key: "anthropic" }],
});

const history = await client.admin.config.history();
await client.admin.config.rollback(history[0]!.version);

Development

git clone https://github.com/ferro-labs/ferrolabs-typescript-sdk
cd ferrolabs-typescript-sdk
npm install
npm run typecheck     # tsc --noEmit
npm test              # vitest (all HTTP is mocked — no gateway needed)
npm run build         # tsup → dist/ (ESM + CJS + declarations)

All 139 tests run in under a second against mocked fetch, so no network or running gateway is required.

See CHANGELOG.md for release history.

License

Apache 2.0 — see LICENSE.
