Fix LFM2 pythonic tool parser detection #1620
Thanks for the report and the concrete validation. The underlying issue is valid: LFM2/LFM2.5 text models need the pythonic tool parser so outputs like <|tool_call_start|>[name(arg='value')]<|tool_call_end|> are returned as structured OpenAI-compatible tool_calls instead of plain assistant content.
I handled this with a narrower oMLX-side fix in main instead of the process-wide mlx-lm monkey patch from this PR. oMLX now detects local LFM2 text causal LM checkpoints during tokenizer config setup and sets tokenizer_config["tool_parser_type"] = "pythonic" for that mlx-lm load path. This keeps the change scoped to LFM2 text models and avoids affecting unrelated models or LFM2 audio/STS models.
The fix is in 8bce38f. I also added tests for LFM2 text, LFM2 MoE text, LFM2 audio exclusion, and non-LFM2 models with similar markers.
Closing this PR in favor of the scoped fix. Thanks again for catching this and for the clear reproduction details.
Summary
This PR adds a small runtime compatibility patch for LFM2/LFM2.5-style Pythonic tool calls when using the currently pinned mlx-lm version.
Some LFM2/LFM2.5 MLX models emit tool calls using the following format:
<|tool_call_start|>[function_name(arg='value')]<|tool_call_end|>
However, the pinned mlx-lm version currently infers the pythonic tool parser only when the chat template contains:
<|tool_list_start|>
Some LFM2.5 chat templates do not include <|tool_list_start|>, but they do include both:
<|tool_call_start|>
<|tool_call_end|>
As a result, valid model-generated tool calls can be returned as plain assistant content instead of structured OpenAI-compatible message.tool_calls.
This patch makes oMLX recognize that template pattern and map it to the existing pythonic parser before mlx_lm.load() runs.
Problem
When serving an LFM2.5 MLX model through oMLX, the model correctly generated a Pythonic tool call like this:
<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>
But because the pythonic parser was not inferred, the OpenAI-compatible API returned the tool call as plain message content:
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>"
}
}
This prevented OpenAI-compatible clients from detecting and executing the tool call through message.tool_calls.
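To make the expected result concrete, here is a minimal sketch of how the raw content above could be converted into OpenAI-style tool call entries. It is not the parser shipped in mlx-lm: the function name parse_pythonic_tool_calls is hypothetical, and the sketch assumes the payload is a Python list of calls that use only keyword arguments with literal values.

```python
import ast
import json
import re

# Marker strings taken from the tool-call format shown in this PR.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_pythonic_tool_calls(content: str) -> list[dict]:
    """Hypothetical helper: convert Pythonic tool-call text into tool_call dicts.

    Assumes each payload is a list literal of calls such as [f(a='b')].
    A malformed payload raises SyntaxError or ValueError.
    """
    tool_calls = []
    for match in TOOL_CALL_RE.finditer(content):
        tree = ast.parse(match.group(1).strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            continue
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                continue
            # Only keyword arguments with literal values are handled here.
            args = {
                kw.arg: ast.literal_eval(kw.value)
                for kw in node.keywords
                if kw.arg is not None
            }
            tool_calls.append(
                {
                    "type": "function",
                    "function": {
                        "name": node.func.id,
                        # OpenAI-compatible APIs carry arguments as a JSON string.
                        "arguments": json.dumps(args),
                    },
                }
            )
    return tool_calls


raw = "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>"
calls = parse_pythonic_tool_calls(raw)
```

Using ast rather than eval means the model output is only inspected as syntax and never executed.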
Change
This PR adds a new compatibility patch:
omlx/patches/lfm2_tool_parser.py
The patch wraps:
mlx_lm.tokenizer_utils._infer_tool_parser
and adds detection for chat templates that contain both:
<|tool_call_start|>
<|tool_call_end|>
When both markers are present, the patched inference function returns:
pythonic
Otherwise, it falls back to the original mlx-lm parser inference behavior.
The patch is applied from:
omlx/utils/model_loading.py
inside maybe_apply_pre_load_patches(), before mlx_lm.load() runs.
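The wrapping behavior described above could look roughly like the sketch below. This is not the contents of omlx/patches/lfm2_tool_parser.py. It assumes that the inference function takes the chat template as a single string and returns a parser name or None, which may not match the real signature of _infer_tool_parser. The names wrap_infer_tool_parser and fake_original are hypothetical, and fake_original only imitates the pinned behavior described in the Summary.

```python
from typing import Callable, Optional

START_MARKER = "<|tool_call_start|>"
END_MARKER = "<|tool_call_end|>"

InferFn = Callable[[str], Optional[str]]


def wrap_infer_tool_parser(original: InferFn) -> InferFn:
    """Hypothetical wrapper: return 'pythonic' when both markers are present."""

    def patched(chat_template: str) -> Optional[str]:
        if (
            isinstance(chat_template, str)
            and START_MARKER in chat_template
            and END_MARKER in chat_template
        ):
            return "pythonic"
        # Every other template keeps the original inference behavior.
        return original(chat_template)

    return patched


def fake_original(chat_template: str) -> Optional[str]:
    """Stand-in for the pinned inference: only <|tool_list_start|> is recognized."""
    return "pythonic" if "<|tool_list_start|>" in chat_template else None


patched = wrap_infer_tool_parser(fake_original)
```

Requiring both markers, rather than either one, keeps a template that mentions only a single marker on the original code path.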
Why this approach
oMLX currently pins mlx-lm to a specific git commit.
The repository already uses runtime monkey patches for compatibility with pinned mlx-lm behavior and upstream changes that have not yet landed in the pinned dependency.
This PR follows the same pattern:
It does not vendor or modify the pinned mlx-lm package directly.
It applies a small, idempotent runtime patch.
It only changes behavior for templates containing LFM2/LFM2.5-style tool call markers.
It falls back to the original parser inference for every other template.
Once upstream mlx-lm supports this detection directly, this patch can be removed.
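One way to keep such a runtime patch idempotent is to tag the wrapper with a sentinel attribute and skip re-wrapping when the tag is already present. The sketch below illustrates this on a stand-in namespace instead of the real mlx_lm module. The helper apply_patch_once, the sentinel name, and the demo wrapper are all hypothetical and are not taken from oMLX.

```python
from types import SimpleNamespace

_SENTINEL = "_example_lfm2_patched"  # hypothetical marker attribute


def apply_patch_once(module, attr_name: str, make_wrapper) -> bool:
    """Wrap module.attr_name once; return False if it was already wrapped."""
    current = getattr(module, attr_name)
    if getattr(current, _SENTINEL, False):
        return False
    wrapped = make_wrapper(current)
    setattr(wrapped, _SENTINEL, True)
    setattr(module, attr_name, wrapped)
    return True


def make_demo_wrapper(original):
    """Demo wrapper with the same shape as a fallback-preserving patch."""

    def wrapper(template):
        if "<|tool_call_start|>" in template and "<|tool_call_end|>" in template:
            return "pythonic"
        return original(template)

    return wrapper


# Stand-in for a module exposing an inference function (not the real mlx_lm).
fake_module = SimpleNamespace(infer=lambda template: None)

first_call = apply_patch_once(fake_module, "infer", make_demo_wrapper)
patched_fn = fake_module.infer
second_call = apply_patch_once(fake_module, "infer", make_demo_wrapper)
```

Because the second call is a no-op, loading several models in one process does not stack wrappers on top of each other.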
Manual validation
Validated locally with:
LFM2.5-8B-A1B-mlx-8Bit
served through oMLX using an OpenAI-compatible /v1/chat/completions endpoint, with the following request body:
{
"model": "LFM2.5-8B-A1B-mlx-8Bit",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. When you need to use a tool, call the tool directly without explanations."
},
{
"role": "user",
"content": "Use the echo_text tool to repeat exactly: teste 123"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "echo_text",
"description": "Repeats exactly the received text.",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string"
}
},
"required": ["text"]
}
}
}
],
"tool_choice": "auto",
"max_tokens": 1024,
"temperature": 0.0
}
Before this patch
The response returned the model-generated tool call as plain assistant content:
{
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>"
}
}
]
}
After this patch
The response returns a structured OpenAI-compatible tool call:
{
"choices": [
{
"finish_reason": "tool_calls",
"message": {
"role": "assistant",
"tool_calls": [
{
"id": "call_...",
"type": "function",
"function": {
"name": "echo_text",
"arguments": "{\"text\": \"teste 123\"}"
}
}
]
}
}
]
}
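The difference between the two responses can also be checked programmatically. The sketch below uses a hypothetical helper, classify_response, which only inspects the first choice of a response shaped like the ones above. The sample dictionaries are trimmed copies of those responses, with the elided call id left as-is.

```python
def classify_response(response: dict) -> str:
    """Hypothetical check: 'structured', 'unparsed', or 'none' for the first choice."""
    message = response["choices"][0].get("message", {})
    if message.get("tool_calls"):
        return "structured"
    content = message.get("content") or ""
    # Raw markers in content mean the tool call was not parsed by the server.
    if "<|tool_call_start|>" in content and "<|tool_call_end|>" in content:
        return "unparsed"
    return "none"


before = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>",
            },
        }
    ]
}

after = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_...",
                        "type": "function",
                        "function": {
                            "name": "echo_text",
                            "arguments": "{\"text\": \"teste 123\"}",
                        },
                    }
                ],
            },
        }
    ]
}
```

A check like this could serve as a quick regression assertion when validating other LFM2/LFM2.5 checkpoints by hand.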
Files changed
Added
omlx/patches/lfm2_tool_parser.py
Adds an idempotent runtime patch for mlx_lm.tokenizer_utils._infer_tool_parser.
Modified
omlx/utils/model_loading.py
Applies the LFM2/LFM2.5 tool parser patch during pre-load patch dispatch.
Compatibility
This patch is intentionally narrow.
It only changes parser inference when the chat template contains both:
<|tool_call_start|>
<|tool_call_end|>
All other chat templates continue to use the original mlx-lm inference logic.
Notes
This is a runtime compatibility patch for the currently pinned mlx-lm version.
A more permanent fix should ideally also be applied upstream in mlx-lm, inside _infer_tool_parser(), by mapping LFM2/LFM2.5 templates containing <|tool_call_start|> and <|tool_call_end|> to the existing pythonic parser.