Name and Version
Server built from c31fc8b
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
It was mentioned in the discussion of the codestral model that the changes in #10023 made the /infill endpoint add a <bos> token incorrectly. I'm not entirely sure that PR is the cause, since before this change prompt was a required field, but either way the current behavior doesn't seem correct.
To reproduce, you can use this model: https://huggingface.co/bartowski/codegemma-2b-GGUF with the following request:
curl -XPOST "localhost:8080/infill" -d '{"input_prefix": "1, ", "input_suffix": ", 5"}' -H "Content-Type: application/json"
In the response you will see two <bos> tokens: "prompt": "<bos><|fim_prefix|> 1, <bos><|fim_suffix|> , 5<|fim_middle|>".
According to the codegemma README there shouldn't be any <bos> tokens (see the prompt in the first code snippet there).
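
To illustrate where the second <bos> can come from: gemma-family tokenizers are configured to prepend <bos> whenever tokenization is asked to add special tokens, so if the prefix and suffix pieces are each tokenized with add_special=true, each piece gets its own <bos>. Below is a minimal standalone sketch (not code from this repo) showing that; it assumes the model-based llama_tokenize C API, so the exact signatures may differ on other builds.

```cpp
// Standalone sketch, not code from llama.cpp itself. Build roughly like (paths illustrative):
//   g++ -std=c++17 bos_check.cpp -I llama.cpp/include -L llama.cpp/build/src -lllama -o bos_check
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

// Tokenize `text` and report whether the first token is <bos>.
static void check_bos(const llama_model * model, const std::string & text, bool add_special) {
    std::vector<llama_token> tokens(text.size() + 8);
    const int n = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                 tokens.data(), (int32_t) tokens.size(),
                                 add_special, /*parse_special=*/true);
    if (n < 0) {
        fprintf(stderr, "tokenization failed for: %s\n", text.c_str());
        return;
    }
    tokens.resize(n);
    const bool starts_with_bos = !tokens.empty() && tokens[0] == llama_token_bos(model);
    printf("add_special=%d  n_tokens=%-3d  starts with <bos>: %s   | %s\n",
           add_special ? 1 : 0, n, starts_with_bos ? "yes" : "no", text.c_str());
}

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <codegemma-2b.gguf>\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // If each piece of the infill prompt is tokenized with add_special=true,
    // each piece gets its own <bos>, which would explain the second <bos>
    // in the prompt returned by /infill.
    check_bos(model, "<|fim_prefix|> 1, ", /*add_special=*/true);
    check_bos(model, "<|fim_suffix|> , 5<|fim_middle|>", /*add_special=*/true);
    check_bos(model, "<|fim_suffix|> , 5<|fim_middle|>", /*add_special=*/false);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```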
I don't see any discussion in the mentioned PR regarding special tokens, so I guess this wasn't intentional? Feel free to close this issue if I'm wrong.
The fix is simply to change the flag on this line: https://github.com/ggerganov/llama.cpp/blob/b56f079e28fda692f11a8b59200ceb815b05d419/examples/server/server.cpp#L3800
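
For reference, here is a hedged sketch of what I mean by "change the flag"; the identifiers below are illustrative and not the exact code on the linked line, which goes through the server's own tokenization helpers:

```cpp
// Illustrative only: shows the add_special flag that controls automatic <bos> insertion.
// common_tokenize is the helper from common/common.h; the actual /infill call site
// wraps it in the server's own utilities.
#include "common.h"

#include <string>
#include <vector>

std::vector<llama_token> tokenize_infill_piece(llama_context * ctx, const std::string & piece) {
    // add_special=true  -> the tokenizer prepends <bos> (for models configured to do so)
    // add_special=false -> no automatic <bos>; the infill template stays in control of special tokens
    // parse_special=true keeps markers like <|fim_prefix|> as single tokens
    return common_tokenize(ctx, piece, /*add_special=*/false, /*parse_special=*/true);
}
```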
First Bad Commit
Relevant log output
No response