CHANGELOG.md
+13 lines changed: 13 additions & 0 deletions
@@ -6,6 +6,19 @@ This changelog was generated from the repository Git history and release tags. V
 ## [Unreleased]

+## [15.2.0] - 2026-06-22
+
+### Added
+- Jan, vLLM, and SGLang as built-in local providers (Chrome + Firefox). All three use OpenAI-compatible `/v1` endpoints (Jan on port 1337, vLLM on port 8000, SGLang on port 30000), support model listing via `/v1/models`, accept an optional API key for auth-enabled servers, and default to enabled with vision on and a 16k context window.
+
+### Changed
+- Onboarding local-model detection copy now lists Jan, vLLM, and SGLang alongside LM Studio, Ollama, and llama.cpp.
+- LLM request-timeout settings description and provider info panel updated to cover all six local backends.
+- Updated documentation (README, architecture docs, providers guide) to reflect the expanded local-provider lineup.
+
+### Tests
+- Added coverage for `categoryFor` and `listProviderModels` with Jan, vLLM, and SGLang (including auth header forwarding and model-list deduplication), and for `_defaultConfigs` asserting all three new providers are present, enabled, local-categorized, and localhost-defaulted.
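For readers unfamiliar with the model-listing behavior the changelog describes, here is a minimal sketch of listing models from an OpenAI-compatible `/v1/models` endpoint with an optional API key and de-duplicated ids. It is not the extension's actual `listProviderModels`; the names `LOCAL_DEFAULTS`, `buildHeaders`, `uniqueModelIds`, and `listModels` are hypothetical, and the Bearer-token header and the `{ data: [{ id }] }` response shape are assumed from the OpenAI API convention.

```javascript
// Illustrative base URLs, taken from the ports named in the changelog entry.
const LOCAL_DEFAULTS = {
  jan: 'http://localhost:1337/v1',
  vllm: 'http://localhost:8000/v1',
  sglang: 'http://localhost:30000/v1',
};

// Build request headers; Authorization is only sent when a key is configured.
function buildHeaders(apiKey) {
  const headers = { Accept: 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return headers;
}

// Pull unique, non-empty model ids out of an OpenAI-style list payload.
function uniqueModelIds(payload) {
  const ids = (payload?.data ?? [])
    .map((model) => model?.id)
    .filter((id) => typeof id === 'string' && id.length > 0);
  return [...new Set(ids)]; // Set preserves first-seen order
}

// Fetch and de-duplicate the model list. `fetchImpl` is injectable so the
// function can be exercised without a running server.
async function listModels(baseUrl, apiKey, fetchImpl = globalThis.fetch) {
  const url = `${baseUrl.replace(/\/+$/, '')}/models`;
  const res = await fetchImpl(url, { headers: buildHeaders(apiKey) });
  if (!res.ok) throw new Error(`Model listing failed: HTTP ${res.status}`);
  return uniqueModelIds(await res.json());
}
```

A call such as `listModels(LOCAL_DEFAULTS.vllm, 'my-key')` would send the key as a Bearer token, while passing an empty key omits the header, which matches the "optional API key" behavior described for auth-enabled servers.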
> **Context window:** For reliable agent runs, load a local model with **at least a 16k-token context window** (the usable minimum). 8k can work with **Compact mode** enabled (Settings → per-provider checkbox); 4k is too small to hold the system prompt + tool schemas. WebBrain auto-compacts the conversation as it nears the window — it assumes 16k for local models unless you set an explicit context size, so give the model server (e.g. `llama-server -c 16384`) enough room.
@@ -97,6 +104,9 @@ Click the gear icon or go to the extension's Options page to configure:
 | llama.cpp | `http://localhost:8080` | Not needed | (your loaded model) |
 | Ollama | `http://localhost:11434/v1` | Not needed | (your loaded model) |
 | LM Studio | `http://localhost:1234/v1` | Not needed | (your loaded model) |
+| Jan | `http://localhost:1337/v1` | Not needed | (your loaded model) |
+| vLLM | `http://localhost:8000/v1` | Optional | (your served model) |
+| SGLang | `http://localhost:30000/v1` | Optional | (your served model) |
-**Compact mode** is a reduced tool set + shorter system prompt designed for small local models (2B-8B). In both Chrome and Firefox builds, it cuts the Act-mode schema from 40+ tools to about 20, reducing decision surface and hallucination. Enable it per-provider in Settings (checkbox on llama.cpp, Ollama, LM Studio; off by default).
+**Compact mode** is a reduced tool set + shorter system prompt designed for small local models (2B-8B). In both Chrome and Firefox builds, it cuts the Act-mode schema from 40+ tools to about 20, reducing decision surface and hallucination. Enable it per-provider in Settings (checkbox on local providers; off by default).

 > **Shadow DOM note:** The accessibility tree only traverses light DOM. On Web Component-heavy pages (Stripe, Salesforce, Shopify), use `get_interactive_elements` (pierces open shadow roots) or `get_shadow_dom` / `shadow_dom_query` for targeted reads.
docs/THREAT-MODEL.md
+1 -1 lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ So the question this document answers is: **what is the agent equivalent of the
 ## 2. System overview & trust boundaries

 - **Extension (Manifest V3).** The agent loop, prompt assembly, and tool dispatch run in the extension's standard MV3 sandbox.
-- **Local model process.** llama.cpp (via LM Studio / Ollama) runs as a *separate* process and is reached over `localhost` HTTP. No custom binaries, no elevated privileges; the model itself has only the extension's permissions, indirectly.
+- **Local model process.** llama.cpp, Ollama, LM Studio, Jan, vLLM, or SGLang runs as a *separate* process and is reached over `localhost` HTTP. No custom binaries, no elevated privileges; the model itself has only the extension's permissions, indirectly.
 - **Automation surface.** Page reads and actions are performed through the extension APIs and, for richer control, CDP/debugger automation.
 - **Cloud option.** The same agent can target a cloud model instead of the local one.
 - **LM Studio**: `http://localhost:1234/v1` - LM Studio's local inference server
+- **Jan**: `http://localhost:1337/v1` - Jan's local OpenAI-compatible API server
+- **vLLM**: `http://localhost:8000/v1` - vLLM's OpenAI-compatible server
+- **SGLang**: `http://localhost:30000/v1` - SGLang's OpenAI-compatible server

-All three default `supportsVision: true` since most models loaded locally in 2026 are multimodal.
+All six default `supportsVision: true` since most models loaded locally in 2026 are multimodal.

 **Context window.** Load local models with **at least a 16k-token context window** for reliable agent runs; that's the usable minimum. 8k can work with Compact mode enabled; 4k is too small to hold the system prompt + tool schemas. The agent reads the window from `provider.contextWindow` (`providers/base.js`) to drive auto-compaction; when a provider config doesn't set `contextWindow`, local providers default to a conservative **16k** (cloud/router default to 128k). Set `config.contextWindow` explicitly to match a larger local window, and make sure the model server is actually started with that much context (e.g. `llama-server -c 16384`).
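The context-window fallback described in the paragraph above can be sketched roughly as follows. This is an illustration, not the code in `providers/base.js`: the function and constant names are hypothetical, the exact token counts behind "16k" and "128k" (16384 and 131072) are assumed, and the 0.8 compaction threshold is an arbitrary placeholder because the source does not say how close to the window compaction begins.

```javascript
// Assumed token counts for the documented "16k" and "128k" defaults.
const DEFAULT_LOCAL_CONTEXT = 16384;
const DEFAULT_CLOUD_CONTEXT = 131072;

// Prefer an explicit, positive `config.contextWindow`; otherwise fall back
// to the category default (conservative for local providers).
function resolveContextWindow(config, category) {
  const explicit = Number(config?.contextWindow);
  if (Number.isFinite(explicit) && explicit > 0) return Math.floor(explicit);
  return category === 'local' ? DEFAULT_LOCAL_CONTEXT : DEFAULT_CLOUD_CONTEXT;
}

// Decide whether the conversation should be compacted. The threshold is a
// hypothetical value chosen only for this sketch.
function shouldCompact(usedTokens, contextWindow, threshold = 0.8) {
  return usedTokens >= contextWindow * threshold;
}
```

With these assumptions, a local provider with no explicit setting resolves to 16384 tokens, so a server started with a larger window (for example `llama-server -c 32768`) would still be treated as 16k until `config.contextWindow` is set to match.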
@@ -74,7 +81,7 @@ filters the exposed tools through `COMPACT_TOOL_NAMES`; Ask mode is unchanged.
 | OpenAI-compatible | Regex against model name (`gpt-4o`, `gpt-5`, `claude-3`, `claude-sonnet-4`, `gemini-2.0-flash`, etc.) |