Description
After #508, I realized that the model fails to predict multiple tool calls from the input. A wrong stop token is injected in the build_stop_sequences function for Gemma models:
```cpp
if (model_type == Config::ModelType::GEMMA && has_tools) {
    stop_token_sequences.push_back(tokenizer->encode("<end_function_call>")); // WRONG
    stop_token_sequences.push_back(tokenizer->encode("<start_function_response>"));
}
```
After removing this line, with force_tools still enabled, the model started to predict the same function call repeatedly until it hit the configured maximum token count. This is due to a bug in the finite state machine that performs logit biasing: once the first call has finished, the FSM should transition to a state where either another call or a stop token (signalling that all tools have been called) can be generated. However, it currently transitions directly to State::DONE.
We therefore need a state that biases the logits of both <start_function_call> and <start_function_response>.
Steps to reproduce
Remove the line with the <end_function_call> token and run the model with force_tools enabled. The model will now enter a long generation phase (as described above). You can also run the model with force_tools disabled to see that this feature is not actually required for FunctionGemma, since the behaviour is already baked into the weights (use the working version from #508).
Remarks:
I only checked this behaviour on Gemma models, but it is quite possible that other model families are affected too!