
feat: OpenRouter reasoning support with multi-turn pass-back #4

Open

82deutschmark wants to merge 1 commit into v2-multi-model from openrouter-reasoning-support

Conversation

@82deutschmark
Collaborator

What

Adds proper OpenRouter reasoning support to the bench:

  1. Sends reasoning: {enabled: true} in the request payload for OpenRouter models
  2. Preserves reasoning_details in assistant messages for multi-turn conversations
  3. Adds --no-reasoning flag to disable when reasoning adds unwanted overhead
  4. Localhost llama.cpp calls are completely unaffected
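Items 1 and 2 can be sketched as follows. This is a minimal illustration, not the PR's actual code: `build_payload` and `make_assistant_message` are hypothetical helper names, and the exact shape of `reasoning_details` is assumed to be whatever OpenRouter returned on the previous turn.

```python
# Sketch of the two request-side changes (hypothetical helpers).

def build_payload(model: str, messages: list, reasoning_enabled: bool) -> dict:
    """Build a chat-completions request body."""
    payload = {"model": model, "messages": messages, "stream": True}
    if reasoning_enabled:
        # OpenRouter-specific field; llama.cpp payloads omit it entirely
        payload["reasoning"] = {"enabled": True}
    return payload

def make_assistant_message(content: str, reasoning_details=None) -> dict:
    """Keep reasoning_details on assistant messages so later turns
    pass the model's prior reasoning back to OpenRouter."""
    msg = {"role": "assistant", "content": content}
    if reasoning_details is not None:
        msg["reasoning_details"] = reasoning_details
    return msg
```

Because the `reasoning` key is only added when the flag is set, localhost llama.cpp request bodies are byte-for-byte identical to before.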

Compaction and reasoning tokens

Verified: OpenRouter includes reasoning tokens in completion_tokens and total_tokens. The compaction trigger (total_now >= token_limit) already counts reasoning tokens — no changes needed.
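A minimal sketch of why no change is needed, assuming `usage` is the usage object returned by the API (the function name is illustrative, not from the PR):

```python
def should_compact(usage: dict, token_limit: int) -> bool:
    # OpenRouter folds reasoning tokens into completion_tokens (and thus
    # total_tokens), so the existing threshold check already covers them.
    total_now = usage["total_tokens"]
    return total_now >= token_limit
```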

Previous fix included

Branch includes commit 7ce8d14 which reads both reasoning_content (llama.cpp) and reasoning (OpenRouter) from streaming deltas.
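The dual-field read from that commit can be sketched like this, assuming `delta` is the per-chunk delta dict from the streaming response (the helper name is hypothetical):

```python
def extract_reasoning(delta: dict) -> str:
    # llama.cpp streams `reasoning_content`; OpenRouter streams `reasoning`.
    # Fall back to an empty string when the chunk carries neither.
    return delta.get("reasoning_content") or delta.get("reasoning") or ""
```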

Send reasoning: {enabled: true} in the request payload for OpenRouter
models so they return reasoning tokens. Preserve reasoning_details in
assistant messages for multi-turn conversations. Add --no-reasoning
flag to disable this when reasoning adds unwanted overhead.

Localhost llama.cpp calls are unaffected (reasoning_enabled defaults
to False in LLMClient, only set True via create_client for openrouter://).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
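The default-off behavior described above might look like the following sketch; the `LLMClient` and `create_client` signatures are assumptions based on the commit message, not the repository's actual definitions.

```python
class LLMClient:
    """Hypothetical client; reasoning is off unless explicitly enabled."""
    def __init__(self, base_url: str, reasoning_enabled: bool = False):
        self.base_url = base_url
        self.reasoning_enabled = reasoning_enabled

def create_client(endpoint: str) -> LLMClient:
    # Only openrouter:// endpoints turn reasoning on; localhost llama.cpp
    # keeps the False default, so its requests are untouched.
    if endpoint.startswith("openrouter://"):
        return LLMClient(endpoint, reasoning_enabled=True)
    return LLMClient(endpoint)
```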
