Eval bug: Thinking model with thinking disabled cannot use /apply-template with final assistant turn #15401

### Name and Version

```
$./bin/llama-cli --version
version: 6161 (291f531cd)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0
```

### Operating systems

Mac

### GGML backends

Metal

### Hardware

M3 Max 64GB

### Models

* Thinking model: IBM Granite 3.2 8b
* Non-thinking model: IBM Granite Code 2b

### Problem description & steps to reproduce

## Description

This is a regression from the recent change that added support for the `"enable_thinking"` toggle for Qwen3. It relates to this comment thread: https://github.com/ggml-org/llama.cpp/pull/13196/files#r2282737348.

The problem is that, in response to https://github.com/ggml-org/llama.cpp/pull/13196/files#r2134714258, the logical condition on `inputs.enable_thinking` was inverted in https://github.com/ggml-org/llama.cpp/commit/a056e536a40f5bfeeef0c613b0d69eaef7e1918c. As a result, for a model that _does_ support thinking but has it explicitly disabled (`--reasoning-budget 0`), applying the chat template with a final assistant turn (either via `/apply-template` or for a full prefill operation) fails with the error `"Assistant response prefill is incompatible with enable_thinking."`.
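To make the inversion concrete, here is a minimal Python model of the guard as I understand it. This is not the llama.cpp code: `check_prefill`, `ends_with_assistant`, and the `inverted` flag are hypothetical names used only for illustration, and the sketch assumes the check reduces to a single boolean test on `enable_thinking` when the last message is an assistant turn.

```python
ERROR = "Assistant response prefill is incompatible with enable_thinking."


def ends_with_assistant(messages):
    """True when the conversation ends with an assistant turn (a prefill)."""
    return bool(messages) and messages[-1]["role"] == "assistant"


def check_prefill(messages, enable_thinking, inverted=False):
    """Reject an assistant prefill when it conflicts with thinking.

    inverted=True models the regressed condition described above
    (hypothetical; the real check lives in the C++ server code).
    """
    if not ends_with_assistant(messages):
        return
    # Intended: block only when thinking is enabled.
    # Regressed: the test is flipped, so disabled thinking is blocked.
    blocked = (not enable_thinking) if inverted else enable_thinking
    if blocked:
        raise ValueError(ERROR)


messages = [
    {"role": "user", "content": "hello world"},
    {"role": "assistant", "content": "hi hi"},
]

# Intended behavior: thinking is disabled, so the prefill is accepted.
check_prefill(messages, enable_thinking=False)

# Regressed behavior: the same request is rejected.
try:
    check_prefill(messages, enable_thinking=False, inverted=True)
except ValueError as exc:
    print(exc)
```

Under this model, the request with thinking disabled should pass, and only the combination of a final assistant turn with thinking enabled should raise.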

## Repro

```sh
# Boot the server with Granite 3.2 8B and thinking disabled
./bin/llama-server -hf ibm-research/granite-3.2-8b-instruct-GGUF --jinja --reasoning-budget 0 --port 8081
```

```sh
# Send apply-template with a final assistant turn
curl http://localhost:8081/apply-template -d '{"messages": [{"role": "user", "content": "hello world"}, {"role": "assistant", "content": "hi hi"}]}'
```

**Response**
```json
{"error":{"code":500,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"server_error"}}
```

**Expected**
```json
{"prompt":"<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date: April 2024.\nToday's Date: August 18, 2025.\nYou are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>hello world<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>hi hi"}
```

### First Bad Commit

https://github.com/ggml-org/llama.cpp/commit/a056e536a40f5bfeeef0c613b0d69eaef7e1918c

### Relevant log output

```shell
main: server is listening on http://127.0.0.1:8081 - starting the main loop
srv  update_slots: all slots are idle
got exception: {"code":500,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"server_error"}
srv  log_server_r: request: POST /apply-template 127.0.0.1 500
```
