Summary
The token-aware auto-compaction added in #102 reads the model's window from provider.contextWindow (src/chrome/src/providers/base.js, mirrored in Firefox). For local backends without an explicit config.contextWindow, it returns a conservative 16k default. But the settings UI does not expose a context-window field, so a user running a local server with a smaller real window (e.g. 8k — which our onboarding says is workable with Compact mode) gets contextWindow = 16384.
Auto-compaction then waits until ~0.75 × 16384 ≈ 12,288 input tokens before firing — already past an 8k server's hard limit — so that setup hits context-overflow / _emergencyTrim instead of the smooth token-aware compaction the feature is meant to provide.
Raised by the review bot on #102; deliberately not fixed in that PR (it's UI/provider-config work, out of scope for the compaction change).
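The arithmetic above can be sketched as follows. This is a minimal illustration, not the code in base.js: the function and constant names are hypothetical, the 0.75 ratio is the approximate figure quoted in this issue, and "8k" is taken to mean 8192 tokens.

```javascript
// Hypothetical stand-ins for illustration; the real getter is
// `get contextWindow()` in src/chrome/src/providers/base.js.
const DEFAULT_LOCAL_WINDOW = 16384; // conservative 16k default for local backends
const COMPACT_RATIO = 0.75;         // approximate trigger ratio quoted in this issue

// Use an explicit config.contextWindow when it is a positive number,
// otherwise fall back to the local default.
function resolveLocalContextWindow(config) {
  const explicit = config ? config.contextWindow : undefined;
  if (typeof explicit === 'number' && Number.isFinite(explicit) && explicit > 0) {
    return Math.floor(explicit);
  }
  return DEFAULT_LOCAL_WINDOW;
}

// Input-token count at which auto-compaction would fire.
function compactionThreshold(contextWindow) {
  return Math.floor(contextWindow * COMPACT_RATIO);
}

// No contextWindow in config (today's situation for every local user):
console.log(compactionThreshold(resolveLocalContextWindow({})));
// prints 12288, which is above an 8192-token server's hard limit

// With the real window supplied:
console.log(compactionThreshold(resolveLocalContextWindow({ contextWindow: 8192 })));
// prints 6144
```

With no way to set config.contextWindow, an 8k setup always takes the first path, so the threshold is never reached before the server rejects the request.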
Why the default alone can't solve it
It's a genuine trade-off with one fixed number:
- 16k default (current): good for capable local models (avoids premature compaction), but overstates 8k models → overflow.
- 8k default: protects 8k models, but over-compacts capable 16k/32k/128k local models, throwing away context unnecessarily.
The real fix is to know each model's actual window rather than guess.
Recommended fix (pick one or both)
- Expose a "Context window (tokens)" field in the per-provider settings UI (next to Base URL / model), persisted into config.contextWindow. base.js already honors it; this just lets users set the truth. Default the field's placeholder to the category default (16k local / 128k cloud).
- Auto-detect the window from the local server and populate config.contextWindow:
  - llama.cpp: GET /props → default_generation_settings.n_ctx (or top-level n_ctx).
  - Ollama: POST /api/show → model_info's *.context_length.
  - LM Studio: GET /api/v0/models exposes loaded_context_length / max_context_length.
  Fall back to the conservative default when detection fails or isn't supported.
Suggested: do both — auto-detect on connect/model-select, and keep the manual field as an override for servers that don't report it.
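A rough sketch of the auto-detect path, assuming the response fields listed above. Every helper name here is hypothetical. The Ollama request body ({ model }), the LM Studio list shape (a data array of objects with an id), and the assumption that baseUrl points at the server root (no /v1 suffix) should all be verified against each server before use. Any failure returns null so the caller can fall back to the manual field or the default.

```javascript
// Sketch only: helper names are hypothetical and the response shapes
// follow the fields named in this issue.

// Accept only positive finite numbers; anything else means "unknown".
function toWindow(value) {
  return typeof value === 'number' && Number.isFinite(value) && value > 0
    ? Math.floor(value)
    : null;
}

// llama.cpp: GET /props
function fromLlamaCppProps(props) {
  const settings = (props && props.default_generation_settings) || {};
  return toWindow(settings.n_ctx) ?? toWindow(props && props.n_ctx);
}

// Ollama: POST /api/show. The key is architecture-prefixed, so match the suffix.
function fromOllamaShow(show) {
  const info = (show && show.model_info) || {};
  for (const [key, value] of Object.entries(info)) {
    if (key.endsWith('.context_length')) {
      const n = toWindow(value);
      if (n !== null) return n;
    }
  }
  return null;
}

// LM Studio: GET /api/v0/models. Prefer the loaded length over the maximum.
function fromLmStudioModels(list, modelId) {
  const models = list && Array.isArray(list.data) ? list.data : [];
  const model = models.find((m) => m && m.id === modelId);
  if (!model) return null;
  return toWindow(model.loaded_context_length) ?? toWindow(model.max_context_length);
}

// Returns the detected window, or null when detection fails or is unsupported.
async function detectContextWindow(kind, baseUrl, modelId, fetchImpl = globalThis.fetch) {
  try {
    if (kind === 'llamacpp') {
      const res = await fetchImpl(`${baseUrl}/props`);
      return res.ok ? fromLlamaCppProps(await res.json()) : null;
    }
    if (kind === 'ollama') {
      const res = await fetchImpl(`${baseUrl}/api/show`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: modelId }), // assumed request shape
      });
      return res.ok ? fromOllamaShow(await res.json()) : null;
    }
    if (kind === 'lmstudio') {
      const res = await fetchImpl(`${baseUrl}/api/v0/models`);
      return res.ok ? fromLmStudioModels(await res.json(), modelId) : null;
    }
  } catch (_err) {
    // Network or parse error: treat as "not detected".
  }
  return null;
}
```

Keeping the parsers separate from the fetch call makes them easy to unit-test with recorded responses, and a user-entered config.contextWindow can simply take precedence over whatever detectContextWindow returns.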
Acceptance
- An 8k local model compacts before ~6k input tokens (≈ 0.75 × 8k) instead of waiting for ~12k.
- A capable 32k+ local model is not over-compacted (its real window is used).
- Applies to Chrome and Firefox (both read provider.contextWindow).
Pointers
- src/chrome/src/providers/base.js, get contextWindow() (and Firefox copy): the default lives here.
- Per-provider settings field definitions: src/chrome/src/ui/settings.js (e.g. the existing baseUrl field).
- Provider construction / config: src/chrome/src/providers/manager.js.
Found during the #102 review sweep.