Every LLM provider implements the `BaseLLMProvider` interface:

```js
// Method bodies are elided; comments show what each member resolves to or returns.
class BaseLLMProvider {
  async chat(messages, options) {}        // → { content, toolCalls, usage }
  async *chatStream(messages, options) {} // → async generator yielding { type, content }
  get supportsTools() {}                  // → boolean
  get supportsVision() {}                 // → boolean
  get useCompactPrompt() {}               // → boolean (small models get shorter system prompt)
  async testConnection() {}               // → { ok, error?, model? }
}
```

The `options` argument accepts:

```js
const options = {
  tools: [/* ... */],  // tool schemas
  temperature: 0.3,
  maxTokens: 4096,
  stream: false,       // use chatStream instead of chat
  extraBody: {},       // extra fields passed through to the API
};
```

| Provider ID | Type | Category | Default Model | Vision |
|---|---|---|---|---|
| `llamacpp` | `llamacpp` | local | (loaded model) | Yes (default on) |
| `ollama` | `openai` | local | (loaded model) | Yes (default on) |
| `lmstudio` | `openai` | local | (loaded model) | Yes (default on) |
| `jan` | `openai` | local | (loaded model) | Yes (default on) |
| `vllm` | `openai` | local | (loaded model) | Yes (default on) |
| `sglang` | `openai` | local | (loaded model) | Yes (default on) |
| `openai` | `openai` | cloud | `gpt-5.5` | Model-name regex |
| `anthropic` | `anthropic` | cloud | `claude-sonnet-4-6` | Model-name regex |
| `claude_subscription` | `anthropic_oauth` | cloud | `claude-sonnet-4-6` | Yes |
| `gemini` | `openai` | cloud | `gemini-3.1-flash` | Model-name regex |
| `mistral` | `openai` | cloud | `mistral-large-latest` | Model-name regex |
| `deepseek` | `openai` | cloud | `deepseek-v4-flash` | Model-name regex |
| `xai` (Grok) | `openai` | cloud | `grok-4.3` | Model-name regex |
| `nvidia` (NIM) | `openai` | cloud | `meta/llama-3.1-8b-instruct` | Model-name regex |
| `groq` | `openai` | cloud | `llama-3.3-70b-versatile` | Model-name regex |
| `minimax` | `openai` | cloud | `minimax-m2.7` | Model-name regex |
| `alibaba` (Qwen) | `openai` | cloud | `qwen-max` | Model-name regex |
| `openrouter` | `openai` | router | `stepfun/step-3.7-flash` | Model-name regex |

Six local providers are enabled by default with no API key needed unless the local server was started with auth:
- llama.cpp: `http://localhost:8080` (runs `llama-server -m model.gguf`)
- Ollama: `http://localhost:11434/v1` (`ollama serve`)
- LM Studio: `http://localhost:1234/v1` (LM Studio's local inference server)
- Jan: `http://localhost:1337/v1` (Jan's local OpenAI-compatible API server)
- vLLM: `http://localhost:8000/v1` (vLLM's OpenAI-compatible server)
- SGLang: `http://localhost:30000/v1` (SGLang's OpenAI-compatible server)

All six default to `supportsVision: true`, since most models loaded locally in 2026 are multimodal.
Context window. Load local models with at least a 16k-token context window for reliable agent runs; that is the usable minimum. 8k can work with Compact mode enabled, but 4k is too small to hold the system prompt plus tool schemas. The agent reads the window from `provider.contextWindow` (`providers/base.js`) to drive auto-compaction. When a provider config doesn't set `contextWindow`, local providers default to a conservative 16k, while cloud and router providers default to 128k. Set `config.contextWindow` explicitly to match a larger local window, and make sure the model server is actually started with that much context (e.g. `llama-server -c 16384`).
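The defaulting rule described above can be sketched roughly as follows. This is an illustration only: `resolveContextWindow` and `DEFAULT_CONTEXT_WINDOW` are hypothetical names, and the exact token counts behind "16k" and "128k" (16384 and 131072 here) are assumptions rather than values taken from `providers/base.js`.

```javascript
// Assumed defaults: "16k" for local providers, "128k" for cloud and router providers.
const DEFAULT_CONTEXT_WINDOW = { local: 16384, cloud: 131072, router: 131072 };

// Hypothetical helper: prefer an explicit config.contextWindow, else fall back by category.
function resolveContextWindow(config) {
  const explicit = Number(config.contextWindow);
  if (Number.isFinite(explicit) && explicit > 0) return explicit;
  // Unknown categories are treated like cloud providers in this sketch.
  return DEFAULT_CONTEXT_WINDOW[config.category] ?? DEFAULT_CONTEXT_WINDOW.cloud;
}
```

With this shape, a local config that sets `contextWindow: 32768` resolves to 32768, while one that omits the field resolves to the conservative local default.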
Compact prompts are opt-in per provider in both Chrome and Firefox. When
`useCompactPrompt` is enabled, Act mode uses `SYSTEM_PROMPT_ACT_COMPACT` and
filters the exposed tools through `COMPACT_TOOL_NAMES`; Ask mode is unchanged.
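A minimal sketch of that tool filtering might look like the following. The contents of `COMPACT_TOOL_NAMES` are not listed in this document, so the three names below are placeholders; `selectTools`, the `mode` values, and the OpenAI-style `{ type, function: { name } }` tool shape are also assumptions.

```javascript
// Placeholder allow-list; the real COMPACT_TOOL_NAMES entries are not shown here.
const COMPACT_TOOL_NAMES = new Set(['click', 'type_text', 'read_page']);

// Hypothetical helper: only Act mode with compact prompts enabled gets the reduced tool set.
function selectTools(allTools, { mode, useCompactPrompt }) {
  if (mode !== 'act' || !useCompactPrompt) return allTools;
  return allTools.filter((tool) => COMPACT_TOOL_NAMES.has(tool.function.name));
}
```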
Vision support is detected differently depending on the provider:

| Provider | Mechanism |
|---|---|
| OpenAI-compatible | Regex against model name (`gpt-4o`, `gpt-5`, `claude-3`, `claude-sonnet-4`, `gemini-2.0-flash`, etc.) |
| Anthropic | `claude-(3\|sonnet-4\|opus-4)` patterns |
| llama.cpp | Explicit `supportsVision` config toggle |
| Ollama / LM Studio / Jan / vLLM / SGLang | Explicit `supportsVision` config toggle (via OpenAI provider) |

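The two mechanisms in the table could be combined along these lines. The pattern below is built only from the example model names in the table, so the actual regex in `openai.js` may well differ; `detectVision` is a hypothetical name, and letting an explicit toggle override the name check is an assumption of this sketch.

```javascript
// Assumed pattern, derived from the example model names listed in the table.
const VISION_MODEL_PATTERN = /gpt-4o|gpt-5|claude-3|claude-sonnet-4|gemini-2\.0-flash/i;

// Hypothetical helper: an explicit boolean toggle wins; otherwise match the model name.
function detectVision(config) {
  if (typeof config.supportsVision === 'boolean') return config.supportsVision;
  return VISION_MODEL_PATTERN.test(config.model ?? '');
}
```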
When the active provider is Anthropic, the agent converts OpenAI-format messages:
| OpenAI format | Anthropic format |
|---|---|
| `system` message | `system` field (top-level) |
| `assistant` + `tool_calls` | `assistant` + `tool_use` content blocks |
| `tool` role | `user` + `tool_result` content blocks |
| `image_url` (data URL) | `image` source block |

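The mapping in the table can be illustrated with a small converter. This is a sketch, not the agent's actual code: `toAnthropic` and `convertContent` are hypothetical names, and it assumes OpenAI-style `tool_calls` entries with JSON-string `arguments` and images supplied as base64 data URLs.

```javascript
// Convert image_url parts (data URLs) into Anthropic-style image source blocks.
function convertContent(content) {
  if (!Array.isArray(content)) return content;
  return content.map((part) => {
    if (part.type !== 'image_url') return part;
    const match = /^data:([^;]+);base64,(.*)$/.exec(part.image_url.url);
    if (!match) return part; // non-data URLs are left untouched in this sketch
    return { type: 'image', source: { type: 'base64', media_type: match[1], data: match[2] } };
  });
}

// Convert an OpenAI-format message list into { system, messages } for Anthropic.
function toAnthropic(messages) {
  const system = [];
  const out = [];
  for (const msg of messages) {
    if (msg.role === 'system') {
      system.push(msg.content); // hoisted to the top-level system field
    } else if (msg.role === 'tool') {
      out.push({
        role: 'user',
        content: [{ type: 'tool_result', tool_use_id: msg.tool_call_id, content: msg.content }],
      });
    } else if (msg.role === 'assistant' && msg.tool_calls?.length) {
      const blocks = msg.content ? [{ type: 'text', text: msg.content }] : [];
      for (const call of msg.tool_calls) {
        blocks.push({
          type: 'tool_use',
          id: call.id,
          name: call.function.name,
          input: JSON.parse(call.function.arguments || '{}'),
        });
      }
      out.push({ role: 'assistant', content: blocks });
    } else {
      out.push({ role: msg.role, content: convertContent(msg.content) });
    }
  }
  return { system: system.join('\n\n'), messages: out };
}
```

One simplification to note: each tool result becomes its own `user` message here, whereas a fuller implementation would likely group the results of parallel tool calls into a single `user` turn.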
`ProviderManager` manages the provider lifecycle:

```js
const pm = new ProviderManager();
await pm.load();                                        // Load from chrome.storage.local
await pm.save();                                        // Persist to chrome.storage.local
pm.getActive();                                         // Get the active provider instance
await pm.setActive('openai');                           // Switch active provider
await pm.updateProvider('openai', { model: 'gpt-5' });  // Update config
pm.getAll();                                            // All provider configs (for Settings UI)
await pm.testProvider('openai');                        // Test connection
```

Configs are stored in `chrome.storage.local` under the `providers` key, merged against defaults. Defaults provide the shape (which provider keys exist); stored configs override per-key values. This lets upgrades that introduce new provider entries work without users clearing storage.
Deprecated provider entries (webbrain, openai_subscription) are filtered out.
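The merge behaviour can be sketched as below. `mergeConfigs` is a hypothetical name, and the sketch assumes that stored entries with no matching default key are simply dropped, which is one reading of "defaults provide the shape".

```javascript
// Deprecated provider IDs named in the text above.
const DEPRECATED = new Set(['webbrain', 'openai_subscription']);

// Hypothetical helper: defaults decide which keys exist; stored values override per field.
function mergeConfigs(defaults, stored = {}) {
  const merged = {};
  for (const [id, base] of Object.entries(defaults)) {
    if (DEPRECATED.has(id)) continue;
    merged[id] = { ...base, ...(stored[id] ?? {}) };
  }
  return merged;
}
```

A provider added in a new release appears with its default values, while a user's saved `model` or `apiKey` for an existing provider is preserved.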
Settings exposes session and total cloud cost allowances. The agent prefers a provider-reported `usage.cost` / `usage.cost_usd` value when present (OpenRouter reports this directly). For direct cloud providers that only return token counts, WebBrain estimates spend from the provider config fields:

- `inputCostPerMillionUsd`
- `outputCostPerMillionUsd`

Those rates are editable in the provider card so custom model pricing can be adjusted without code changes. If a metered remote provider has token usage but no configured rates, the agent falls back to conservative default estimates ($3 input / $15 output per 1M tokens). Local providers are not counted.
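As a worked illustration of that estimate, the sketch below applies the rules in order: skip local providers, prefer a reported cost, then fall back to per-million rates. `estimateCostUsd` is a hypothetical name, and the OpenAI-style `prompt_tokens` / `completion_tokens` usage fields are an assumption about the usage object's shape.

```javascript
// Fallback rates from the text: $3 input / $15 output per 1M tokens.
const FALLBACK_RATES = { inputCostPerMillionUsd: 3, outputCostPerMillionUsd: 15 };

// Hypothetical helper returning an estimated cost in USD for one response.
function estimateCostUsd(config, usage) {
  if (config.category === 'local') return 0; // local providers are not counted
  const reported = usage.cost ?? usage.cost_usd;
  if (typeof reported === 'number') return reported; // provider-reported cost wins
  const inRate = config.inputCostPerMillionUsd ?? FALLBACK_RATES.inputCostPerMillionUsd;
  const outRate = config.outputCostPerMillionUsd ?? FALLBACK_RATES.outputCostPerMillionUsd;
  const inputTokens = usage.prompt_tokens ?? 0;   // assumed field name
  const outputTokens = usage.completion_tokens ?? 0; // assumed field name
  return (inputTokens * inRate + outputTokens * outRate) / 1e6;
}
```

For example, 1,000,000 input and 1,000,000 output tokens with no configured rates come to 3 + 15 = 18 USD under the fallback estimate.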
The user can configure a separate vision provider for screenshot description. The agent sub-calls this provider to get a text description of the viewport, then feeds only the description (not the raw image) to the main planning provider. This reduces token costs when the main provider is text-only:

```js
const vision = await providerManager.getVisionProvider();
// Returns an OpenAICompatibleProvider instance or null
```

The transcription provider is used by Tab Recorder for Whisper transcription. It falls back through configured providers in priority order: OpenAI → Groq → LM Studio → llama.cpp. A blocklist excludes providers known not to host Whisper (Anthropic, Gemini, Mistral, DeepSeek, xAI, Nvidia).
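The transcription fallback order could be expressed roughly like this. `pickTranscriptionProvider` is a hypothetical name, the provider IDs are taken from the provider table, and treating "configured" as `enabled: true` is an assumption of this sketch.

```javascript
// Priority order and blocklist as described in the text, using provider IDs.
const TRANSCRIPTION_PRIORITY = ['openai', 'groq', 'lmstudio', 'llamacpp'];
const TRANSCRIPTION_BLOCKLIST = new Set(['anthropic', 'gemini', 'mistral', 'deepseek', 'xai', 'nvidia']);

// Hypothetical helper: return the first enabled provider ID in priority order, or null.
function pickTranscriptionProvider(configs) {
  for (const id of TRANSCRIPTION_PRIORITY) {
    // The blocklist check is a guard in case the priority list is ever extended.
    if (TRANSCRIPTION_BLOCKLIST.has(id)) continue;
    if (configs[id]?.enabled) return id;
  }
  return null;
}
```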
To add a new provider:

- Create the provider class in `src/chrome/src/providers/<name>.js` implementing `BaseLLMProvider`
- Add the default config to `_defaultConfigs()` in `manager.js`
- Add the factory case in `_createProvider()`
- Register the import in `manager.js`
- Add provider-specific handling in the agent if needed (e.g., Anthropic's message format conversion)
- Mirror to Firefox (`src/firefox/src/providers/`)

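For the first step, a provider class might be shaped like the sketch below. Everything specific in it is hypothetical: the `MyProvider` name, the `/chat` endpoint, the response fields, the `type: 'text'` stream chunk, and the injectable `config.fetch`. The `BaseLLMProvider` stand-in exists only to keep the example self-contained, since the real constructor in `providers/base.js` is not shown here.

```javascript
// Stand-in for the real base class; its actual constructor is an assumption.
class BaseLLMProvider {
  constructor(config = {}) { this.config = config; }
}

// Hypothetical provider for an API that is not OpenAI-compatible.
class MyProvider extends BaseLLMProvider {
  get supportsTools() { return true; }
  get supportsVision() { return Boolean(this.config.supportsVision); }
  get useCompactPrompt() { return Boolean(this.config.useCompactPrompt); }

  async chat(messages, options = {}) {
    const fetchImpl = this.config.fetch ?? globalThis.fetch; // injectable for testing
    const res = await fetchImpl(`${this.config.baseUrl}/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${this.config.apiKey}` },
      body: JSON.stringify({ model: this.config.model, messages, temperature: options.temperature ?? 0.3 }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    return { content: data.text ?? '', toolCalls: data.toolCalls ?? [], usage: data.usage ?? null };
  }

  async *chatStream(messages, options = {}) {
    // Non-streaming fallback: emit the whole reply as a single chunk.
    const { content } = await this.chat(messages, options);
    yield { type: 'text', content };
  }

  async testConnection() {
    try {
      await this.chat([{ role: 'user', content: 'ping' }], { maxTokens: 1 });
      return { ok: true, model: this.config.model };
    } catch (err) {
      return { ok: false, error: String(err.message ?? err) };
    }
  }
}
```

The remaining steps (default config, factory case, import, Firefox mirror) would then wire a class like this into `manager.js`.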
If the provider speaks the OpenAI `/v1/chat/completions` API format, you only need to add a default config entry; `OpenAICompatibleProvider` handles the rest:

```js
myprovider: {
  type: 'openai',
  category: 'cloud',
  label: 'My Provider',
  providerName: 'myprovider',
  baseUrl: 'https://api.myprovider.com/v1',
  model: 'my-model',
  supportsStreamUsageOptions: false,
  apiKey: '',
  enabled: false,
},
```

Vision is auto-detected via model-name regex. If the provider has a known set of vision models, add them to the regex in `openai.js`. Set `supportsStreamUsageOptions: true` only for providers that accept OpenAI-style `stream_options.include_usage`; leave it `false` when a provider returns usage without accepting that request field.
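
To show what the `supportsStreamUsageOptions` flag controls, here is a rough sketch of building a streaming request body. `buildStreamBody` is a hypothetical helper, not a function from `openai.js`; only the `stream_options.include_usage` field itself comes from the OpenAI-style API described above.

```javascript
// Hypothetical helper: add stream_options only when the provider accepts it.
function buildStreamBody(config, messages) {
  const body = { model: config.model, messages, stream: true };
  if (config.supportsStreamUsageOptions) {
    body.stream_options = { include_usage: true };
  }
  return body;
}
```

Keeping the field out of the request for providers that reject it avoids a failed call while still allowing usage data to be read from the response when the provider sends it anyway.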