KV cache restarts from 0 every other turn or so, in all models and all harnesses

I've got a fresh install of LM Studio with the latest versions of the various harnesses. Just about any time the tool calls finish and it comes back to me for the next prompt, even if I already have the next prompt in the queue, prompt processing starts over at 0/n tokens from cache (0%, 10%, ...).

Harnesses I've tried:
- Claude CLI
- OpenCode
- Pi

Models I've tried (all MLX):
- Qwen 3.5 9b
- Qwen 3 coder 30b
- Qwen 3.6 27b
- Qwen 3.6 35b a3b
- Gemma 4 
- many others

I've been using the Anthropic compatibility API.

I've tried on a MacBook Pro M1 Max (32 GB) and a Mac Studio M2 Max (64 GB).

I don't have any special templates - just the defaults.

I have context set to the maximum for all models by default and the context limit set to stop.

Is this a bug? Normal behavior? Am I holding it wrong?

The slowdown is so harsh that the system just isn't usable for coding agents. If I could figure this out, local AI might be feasible on the 64 GB Studio.
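For anyone trying to narrow this down, here is a minimal sketch of one possible explanation. It assumes the cache can only be reused for an identical leading run of tokens, which I have not confirmed for LM Studio's MLX engine. Under that assumption, a harness that rewrites anything near the start of the prompt between turns would push the reusable portion toward zero. The token IDs and function names below are hypothetical and only illustrate the arithmetic.

```python
def common_prefix_len(a, b):
    """Return the number of leading items shared by sequences a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_fraction(prev_tokens, next_tokens):
    """Fraction of next_tokens that a prefix-only cache could reuse."""
    if not next_tokens:
        return 0.0
    return common_prefix_len(prev_tokens, next_tokens) / len(next_tokens)


# Hypothetical token IDs, for illustration only (not real tokenizer output).
turn_1 = [1, 10, 11, 12, 20, 21]

# Case A: the harness only appends new tokens to the previous prompt.
turn_2_appended = turn_1 + [30, 31]

# Case B: the harness changes one early token, then appends.
turn_2_rewritten = [1, 99] + turn_1[2:] + [30, 31]

print(reusable_fraction(turn_1, turn_2_appended))   # 0.75
print(reusable_fraction(turn_1, turn_2_rewritten))  # 0.125
```

Logging the raw request bodies from two consecutive turns and comparing them the same way would show whether the prompts actually diverge early, or whether the cache is being dropped even though the prefix is unchanged.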
