Name and Version
Server built from c31fc8b
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
It was mentioned in the discussion of the codestral model that the changes in #10023 made the /infill endpoint add a <bos> token incorrectly. I'm not entirely sure that PR is the cause, since before this change prompt was a required field, but either way the current behavior doesn't seem correct.
To reproduce, you can use this model: https://huggingface.co/bartowski/codegemma-2b-GGUF with the following request:
curl -XPOST "localhost:8080/infill" -d '{"input_prefix": "1, ", "input_suffix": ", 5"}' -H "Content-Type: application/json"
In the response you will see two <bos> tokens: "prompt": "<bos><|fim_prefix|> 1, <bos><|fim_suffix|> , 5<|fim_middle|>".
According to the codegemma README there shouldn't be any <bos> tokens (see the prompt in the first code snippet there).
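
To illustrate where the second <bos> can come from: gemma-family tokenizers are configured to prepend <bos> whenever tokenization is asked to add special tokens, so if the prefix and suffix pieces are each tokenized with add_special=true, each piece gets its own <bos>. Below is a minimal standalone sketch (not code from this repo) showing that; it assumes the model-based llama_tokenize C API, so the exact signatures may differ on other builds.

```cpp
// Standalone sketch, not code from llama.cpp itself. Build roughly like (paths illustrative):
//   g++ -std=c++17 bos_check.cpp -I llama.cpp/include -L llama.cpp/build/src -lllama -o bos_check
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

// Tokenize `text` and report whether the first token is <bos>.
static void check_bos(const llama_model * model, const std::string & text, bool add_special) {
    std::vector<llama_token> tokens(text.size() + 8);
    const int n = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                 tokens.data(), (int32_t) tokens.size(),
                                 add_special, /*parse_special=*/true);
    if (n < 0) {
        fprintf(stderr, "tokenization failed for: %s\n", text.c_str());
        return;
    }
    tokens.resize(n);
    const bool starts_with_bos = !tokens.empty() && tokens[0] == llama_token_bos(model);
    printf("add_special=%d  n_tokens=%-3d  starts with <bos>: %s   | %s\n",
           add_special ? 1 : 0, n, starts_with_bos ? "yes" : "no", text.c_str());
}

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <codegemma-2b.gguf>\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // If each piece of the infill prompt is tokenized with add_special=true,
    // each piece gets its own <bos>, which would explain the second <bos>
    // in the prompt returned by /infill.
    check_bos(model, "<|fim_prefix|> 1, ", /*add_special=*/true);
    check_bos(model, "<|fim_suffix|> , 5<|fim_middle|>", /*add_special=*/true);
    check_bos(model, "<|fim_suffix|> , 5<|fim_middle|>", /*add_special=*/false);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```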
I don't see any discussion in the mentioned PR regarding special tokens, so I guess this wasn't intentional? Feel free to close this issue if I'm wrong.
The fix is simply to change the flag on this line: https://github.com/ggerganov/llama.cpp/blob/b56f079e28fda692f11a8b59200ceb815b05d419/examples/server/server.cpp#L3800
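
For reference, here is a hedged sketch of what I mean by "change the flag"; the identifiers below are illustrative and not the exact code on the linked line, which goes through the server's own tokenization helpers:

```cpp
// Illustrative only: shows the add_special flag that controls automatic <bos> insertion.
// common_tokenize is the helper from common/common.h; the actual /infill call site
// wraps it in the server's own utilities.
#include "common.h"

#include <string>
#include <vector>

std::vector<llama_token> tokenize_infill_piece(llama_context * ctx, const std::string & piece) {
    // add_special=true  -> the tokenizer prepends <bos> (for models configured to do so)
    // add_special=false -> no automatic <bos>; the infill template stays in control of special tokens
    // parse_special=true keeps markers like <|fim_prefix|> as single tokens
    return common_tokenize(ctx, piece, /*add_special=*/false, /*parse_special=*/true);
}
```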
First Bad Commit
Relevant log output
No response