
Local model context window can be overstated (16k default vs real 8k window) — expose/auto-detect it #106

Description

@esokullu

Summary

The token-aware auto-compaction added in #102 reads the model's window from provider.contextWindow (src/chrome/src/providers/base.js, mirrored in Firefox). For local backends without an explicit config.contextWindow, it returns a conservative 16k default. But the settings UI does not expose a context-window field, so a user running a local server with a smaller real window (e.g. 8k — which our onboarding says is workable with Compact mode) gets contextWindow = 16384.
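To make the described behavior concrete, here is a minimal sketch of how such a getter could resolve the window. It is not the actual code in base.js: the class name, the `isLocal` flag, and the constant names are hypothetical, and the exact cloud default (written here as 128000) is an assumption based on the "128k" figure mentioned later in this issue.

```javascript
// Hypothetical constants; the real defaults live in base.js.
const DEFAULT_LOCAL_CONTEXT_WINDOW = 16384;  // 16k local default described in this issue
const DEFAULT_CLOUD_CONTEXT_WINDOW = 128000; // "128k" cloud default; exact value assumed

// Illustrative stand-in for a provider base class (name is hypothetical).
class ProviderSketch {
  constructor(config = {}, isLocal = true) {
    this.config = config;
    this.isLocal = isLocal;
  }

  // An explicit config.contextWindow wins; otherwise use the category default.
  get contextWindow() {
    const configured = Number(this.config.contextWindow);
    if (Number.isFinite(configured) && configured > 0) {
      return Math.floor(configured);
    }
    return this.isLocal
      ? DEFAULT_LOCAL_CONTEXT_WINDOW
      : DEFAULT_CLOUD_CONTEXT_WINDOW;
  }
}

// With no way to set config.contextWindow from the UI, a local 8k server
// still reports the 16k default.
console.log(new ProviderSketch({}, true).contextWindow);
console.log(new ProviderSketch({ contextWindow: 8192 }, true).contextWindow);
```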

Auto-compaction then waits until roughly 0.75 × 16384 = 12,288 input tokens before firing, which is already past an 8k server's hard limit. That setup therefore hits context-overflow / _emergencyTrim instead of the smooth token-aware compaction the feature is meant to provide.
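The arithmetic can be checked with a small sketch. The 0.75 ratio is the one quoted in this issue; the function names are hypothetical, and "8k" is taken to mean 8192 tokens.

```javascript
// Ratio quoted in this issue; the real constant's name and location are not shown here.
const COMPACTION_RATIO = 0.75;

// Input-token count at which token-aware compaction would fire.
function compactionThreshold(contextWindow, ratio = COMPACTION_RATIO) {
  return Math.floor(contextWindow * ratio);
}

// True when compaction fires before the server's real window is exceeded.
function firesBeforeOverflow(assumedWindow, realWindow) {
  return compactionThreshold(assumedWindow) < realWindow;
}

console.log(compactionThreshold(16384));       // 12288
console.log(compactionThreshold(8192));        // 6144
console.log(firesBeforeOverflow(16384, 8192)); // false: overflow comes first
console.log(firesBeforeOverflow(8192, 8192));  // true
```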

Raised by the review bot on #102; deliberately not fixed in that PR (it's UI/provider-config work, out of scope for the compaction change).

Why the default alone can't solve it

Any single fixed default is a genuine trade-off:

  • 16k default (current): good for capable local models (avoids premature compaction), but overstates 8k models → overflow.
  • 8k default: protects 8k models, but over-compacts capable 16k/32k/128k local models, throwing away context unnecessarily.

The real fix is to know each model's actual window rather than guess.

Recommended fix (pick one or both)

  1. Expose a "Context window (tokens)" field in the per-provider settings UI (next to Base URL / model), persisted into config.contextWindow. base.js already honors it — this just lets users set the truth. Default the field's placeholder to the category default (16k local / 128k cloud).
  2. Auto-detect the window from the local server and populate config.contextWindow:
    • llama.cpp: GET /props → default_generation_settings.n_ctx (or top-level n_ctx).
    • Ollama: POST /api/show → model_info's *.context_length.
    • LM Studio: GET /api/v0/models exposes loaded_context_length / max_context_length.
     Fall back to the conservative default when detection fails or isn't supported.

Suggested: do both — auto-detect on connect/model-select, and keep the manual field as an override for servers that don't report it.
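A rough sketch of the detection step, under stated assumptions. The endpoints and field names are the ones listed in this issue; everything else is hypothetical: the function names, the backend identifiers, the `{ model }` request body for Ollama, and the assumption that LM Studio returns a `data` array whose entries carry an `id`. The parsers are kept separate from the network call so they can be verified against captured responses.

```javascript
const FALLBACK_LOCAL_CONTEXT_WINDOW = 16384; // conservative default from this issue

// Returns a positive integer, or null when the value is missing or invalid.
function toPositiveInt(value) {
  const n = Number(value);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : null;
}

// llama.cpp: parsed body of GET /props.
function fromLlamaCppProps(body) {
  return (
    toPositiveInt(body?.default_generation_settings?.n_ctx) ??
    toPositiveInt(body?.n_ctx)
  );
}

// Ollama: parsed body of POST /api/show; the key is prefixed by architecture,
// so match on the ".context_length" suffix.
function fromOllamaShow(body) {
  for (const [key, value] of Object.entries(body?.model_info ?? {})) {
    if (key.endsWith('.context_length')) {
      const n = toPositiveInt(value);
      if (n !== null) return n;
    }
  }
  return null;
}

// LM Studio: parsed body of GET /api/v0/models (response shape assumed).
function fromLmStudioModels(body, modelId) {
  const models = Array.isArray(body?.data) ? body.data : [];
  const entry = models.find((m) => m.id === modelId);
  if (!entry) return null;
  return (
    toPositiveInt(entry.loaded_context_length) ??
    toPositiveInt(entry.max_context_length)
  );
}

// Queries the server and falls back to the default on any failure.
async function detectContextWindow(baseUrl, backend, modelId, fetchImpl = globalThis.fetch) {
  try {
    let res;
    let parse;
    if (backend === 'llamacpp') {
      res = await fetchImpl(`${baseUrl}/props`);
      parse = fromLlamaCppProps;
    } else if (backend === 'ollama') {
      res = await fetchImpl(`${baseUrl}/api/show`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: modelId }), // request shape assumed
      });
      parse = fromOllamaShow;
    } else if (backend === 'lmstudio') {
      res = await fetchImpl(`${baseUrl}/api/v0/models`);
      parse = (body) => fromLmStudioModels(body, modelId);
    } else {
      return FALLBACK_LOCAL_CONTEXT_WINDOW;
    }
    if (!res.ok) return FALLBACK_LOCAL_CONTEXT_WINDOW;
    return parse(await res.json()) ?? FALLBACK_LOCAL_CONTEXT_WINDOW;
  } catch {
    return FALLBACK_LOCAL_CONTEXT_WINDOW;
  }
}
```

In this sketch a manually entered config.contextWindow would still take precedence; the detected value would only fill the field when the user has not set one.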

Acceptance

  • An 8k local model compacts before ~6k input tokens (≈ 0.75 × 8k) instead of waiting for ~12k.
  • A capable 32k+ local model is not over-compacted (its real window is used).
  • Applies to Chrome and Firefox (both read provider.contextWindow).

Pointers

  • src/chrome/src/providers/base.js get contextWindow() (and Firefox copy) — the default lives here.
  • Per-provider settings field definitions: src/chrome/src/ui/settings.js (e.g. the existing baseUrl field).
  • Provider construction / config: src/chrome/src/providers/manager.js.

Found during the #102 review sweep.
