Local model context window can be overstated (16k default vs real 8k window) — expose/auto-detect it

## Summary

The token-aware auto-compaction added in #102 reads the model's window from `provider.contextWindow` (`src/chrome/src/providers/base.js`, mirrored in Firefox). For local backends without an explicit `config.contextWindow`, it returns a conservative **16k** default. The settings UI does not expose a context-window field, so a user running a local server with a smaller real window (e.g. **8k**, which our onboarding says is workable with Compact mode) has no way to correct it and gets `contextWindow = 16384`.

Auto-compaction then waits until `0.75 × 16384 = 12,288` input tokens before firing, which is already **past an 8k server's hard limit** (8,192 tokens). That setup therefore hits context-overflow / `_emergencyTrim` instead of the smooth token-aware compaction the feature is meant to provide.
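
The arithmetic can be checked with a small sketch. The function and constant names below are hypothetical (they are not taken from `base.js`); only the 0.75 ratio and the 16384 default come from this issue, and 8k is assumed to mean 8192 tokens.

```javascript
// Hypothetical names for illustration; not the actual identifiers in base.js.
const COMPACTION_RATIO = 0.75;      // trigger ratio described in this issue
const LOCAL_DEFAULT_WINDOW = 16384; // current default for local backends

// Input-token count at which auto-compaction fires for an assumed window.
function compactionThreshold(assumedWindow, ratio = COMPACTION_RATIO) {
  return Math.floor(assumedWindow * ratio);
}

// True when the server's real limit is reached before compaction triggers.
function overflowsBeforeCompaction(assumedWindow, realWindow) {
  return compactionThreshold(assumedWindow) > realWindow;
}

console.log(compactionThreshold(LOCAL_DEFAULT_WINDOW));             // 12288
console.log(overflowsBeforeCompaction(LOCAL_DEFAULT_WINDOW, 8192)); // true
console.log(compactionThreshold(8192));                             // 6144
console.log(overflowsBeforeCompaction(8192, 8192));                 // false
```

With the real window supplied, the trigger lands at 6,144 tokens, comfortably below the 8,192-token limit.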

Raised by the review bot on #102; deliberately **not** fixed in that PR (it's UI/provider-config work, out of scope for the compaction change).

## Why the default alone can't solve it

Any single fixed default is a genuine trade-off:
- **16k default** (current): good for capable local models (avoids premature compaction), but overstates the window of 8k models → overflow.
- **8k default**: protects 8k models, but over-compacts capable 16k/32k/128k local models, throwing away context unnecessarily.

The real fix is to know each model's **actual** window rather than guess.

## Recommended fix (pick one or both)

1. **Expose a "Context window (tokens)" field** in the per-provider settings UI (next to Base URL / model), persisted into `config.contextWindow`. `base.js` already honors it — this just lets users set the truth. Default the field's placeholder to the category default (16k local / 128k cloud).
2. **Auto-detect the window** from the local server and populate `config.contextWindow`:
   - llama.cpp: `GET /props` → `default_generation_settings.n_ctx` (or top-level `n_ctx`).
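   - Ollama: `POST /api/show` → `model_info`'s `*.context_length`. Note this is the model's maximum; the effective window may be lower when the server runs with a smaller `num_ctx`, so prefer `num_ctx` when the response reports it.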
   - LM Studio: `GET /api/v0/models` exposes `loaded_context_length` / `max_context_length`.
   Fall back to the conservative default when detection fails or isn't supported.

Suggested: do **both** — auto-detect on connect/model-select, and keep the manual field as an override for servers that don't report it.
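
A minimal sketch of the "do both" approach, assuming the response fields listed above. All function names are hypothetical, the response shapes are assumptions that should be checked against each server's current API, and network calls are left out so that only the parsing and precedence logic is shown.

```javascript
// Hypothetical helpers; field names follow the list above, shapes are assumed.
const LOCAL_DEFAULT_WINDOW = 16384;

// Accept only positive integers (numbers or numeric strings); otherwise null.
function toWindow(value) {
  const n = Number(value);
  return Number.isInteger(n) && n > 0 ? n : null;
}

// llama.cpp: parsed body of GET /props.
function windowFromLlamaCppProps(props) {
  return toWindow(props?.default_generation_settings?.n_ctx) ?? toWindow(props?.n_ctx);
}

// Ollama: parsed body of POST /api/show. The key is architecture-prefixed,
// e.g. "llama.context_length", so match on the suffix.
function windowFromOllamaShow(show) {
  const info = show?.model_info ?? {};
  const key = Object.keys(info).find((k) => k.endsWith(".context_length"));
  return key ? toWindow(info[key]) : null;
}

// LM Studio: a single model entry from GET /api/v0/models.
// Prefer the loaded length, since that is what the running model enforces.
function windowFromLmStudioModel(model) {
  return toWindow(model?.loaded_context_length) ?? toWindow(model?.max_context_length);
}

// Precedence: manual override, then detected value, then category default.
function resolveContextWindow({ manual, detected, fallback = LOCAL_DEFAULT_WINDOW }) {
  return toWindow(manual) ?? toWindow(detected) ?? fallback;
}

const detected = windowFromLlamaCppProps({ default_generation_settings: { n_ctx: 8192 } });
console.log(resolveContextWindow({ manual: "", detected }));   // 8192
console.log(resolveContextWindow({ manual: 4096, detected })); // 4096
console.log(resolveContextWindow({}));                         // 16384
```

The resolved value would then be written to `config.contextWindow`, which `base.js` already honors. An empty manual field falls through to the detected value, so leaving the field blank keeps auto-detection in charge.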

## Acceptance
- An 8k local model compacts before ~6k input tokens (≈ 0.75 × 8k) instead of waiting for ~12k.
- A capable 32k+ local model is not over-compacted (its real window is used).
- Applies to Chrome and Firefox (both read `provider.contextWindow`).

## Pointers
- `src/chrome/src/providers/base.js` `get contextWindow()` (and Firefox copy) — the default lives here.
- Per-provider settings field definitions: `src/chrome/src/ui/settings.js` (e.g. the existing `baseUrl` field).
- Provider construction / config: `src/chrome/src/providers/manager.js`.

Found during the #102 review sweep.
