[Bug] Gemma fails on multi tool call because of wrong stop token and bug in logit biasing #509

@lennartvoelz

Description

After #508, I realized that the model fails to predict multiple tool calls from the input. There is a wrong stop token injected in the build_stop_sequences function for Gemma models:

if (model_type == Config::ModelType::GEMMA && has_tools) {
    stop_token_sequences.push_back(tokenizer->encode("<end_function_call>")); // WRONG
    stop_token_sequences.push_back(tokenizer->encode("<start_function_response>"));
}
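
For reference, a minimal sketch of what the corrected guard could look like, assuming the same surrounding variables (model_type, has_tools, tokenizer, stop_token_sequences) as in the snippet above; this is an illustration, not the actual patch:

if (model_type == Config::ModelType::GEMMA && has_tools) {
    // Only <start_function_response> should terminate generation; stopping on
    // <end_function_call> cuts the model off after its first call and prevents
    // it from emitting further tool calls.
    stop_token_sequences.push_back(tokenizer->encode("<start_function_response>"));
}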

After removing this line, and with force_tools still enabled, the model started to predict the same function call repeatedly until it hit the declared maximum number of tokens. This is due to a bug in the finite state machine that performs the logit biasing. Once the first call has finished, the FSM should transition to a state in which either another call or a stop token can be generated (because all tools have been called). However, it currently transitions directly to State::DONE.
What is missing is a state that biases the logits of both <start_function_call> and <start_function_response>; see the sketch below.
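
A minimal sketch of the missing state and transition; the names State::TOOL_CALL_FINISHED, on_function_call_finished, bias_logits, the token-id parameters, and the bias value are hypothetical, not taken from the actual implementation:

#include <vector>

enum class State { IN_FUNCTION_CALL, TOOL_CALL_FINISHED, DONE };

// Invoked after <end_function_call> has been generated. Previously the FSM went
// straight to State::DONE here, which caused the endless repetition described above.
State on_function_call_finished() {
    return State::TOOL_CALL_FINISHED;
}

// In the new state, boost both <start_function_call> (another tool call) and
// <start_function_response> (stop sequence, all tools have been called).
void bias_logits(State state, std::vector<float>& logits,
                 int start_function_call_id, int start_function_response_id,
                 float bias) {
    if (state == State::TOOL_CALL_FINISHED) {
        logits[start_function_call_id] += bias;
        logits[start_function_response_id] += bias;
    }
}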

Steps to reproduce

Remove the line with the <end_function_call> token and run the model with force_tools enabled. The model now enters a long generation phase, repeating the same function call (as described above). You can also run the model with force_tools disabled to see that this feature is not actually required for FunctionGemma, because tool calling is already baked into the weights (use the working version from #508).

Remarks:
I have only checked this behaviour on Gemma models, but it is quite possible that other model families are affected too!
