
Fix LFM2 pythonic tool parser detection #1620

Closed
fneto1977 wants to merge 1 commit into jundot:main from fneto1977:fix-lfm2-pythonic-tool-parser

Conversation

@fneto1977

Summary

This PR adds a small runtime compatibility patch for LFM2/LFM2.5-style Pythonic tool calls when using the currently pinned mlx-lm version.

Some LFM2/LFM2.5 MLX models emit tool calls using the following format:

<|tool_call_start|>[function_name(arg='value')]<|tool_call_end|>

However, the pinned mlx-lm version currently infers the pythonic tool parser only when the chat template contains:
<|tool_list_start|>

Some LFM2.5 chat templates do not include <|tool_list_start|>, but they do include both:
<|tool_call_start|>
<|tool_call_end|>
As a result, valid model-generated tool calls can be returned as plain assistant content instead of structured OpenAI-compatible message.tool_calls.
This patch makes oMLX recognize that template pattern and map it to the existing pythonic parser before mlx_lm.load() runs.

Problem
When serving an LFM2.5 MLX model through oMLX, the model correctly generated a Pythonic tool call like this:
<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>

But because the pythonic parser was not inferred, the OpenAI-compatible API returned the tool call as plain message content:
{
  "finish_reason": "stop",
  "message": {
    "role": "assistant",
    "content": "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>"
  }
}

This prevented OpenAI-compatible clients from detecting and executing the tool call through message.tool_calls.
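
To illustrate the conversion a client expects, here is a minimal sketch that turns the Pythonic format above into OpenAI-style tool_calls entries. It is not the mlx-lm pythonic parser; the function name parse_pythonic_tool_calls is hypothetical, and the sketch assumes keyword arguments with literal values only.

```python
import ast
import json
import re

# Markers taken from the tool call format shown above.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_pythonic_tool_calls(text):
    """Convert Pythonic tool calls found in `text` to OpenAI-style dicts."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        # The payload is expected to be a Python list of call expressions.
        tree = ast.parse(match.group(1).strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            continue
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                continue
            # Assumption: only keyword arguments with literal values are used.
            args = {
                kw.arg: ast.literal_eval(kw.value)
                for kw in node.keywords
                if kw.arg is not None
            }
            calls.append(
                {
                    "type": "function",
                    "function": {
                        "name": node.func.id,
                        # OpenAI-compatible APIs carry arguments as a JSON string.
                        "arguments": json.dumps(args),
                    },
                }
            )
    return calls
```

Using ast rather than eval means the model output is parsed as data and never executed.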

Change
This PR adds a new compatibility patch:
omlx/patches/lfm2_tool_parser.py

The patch wraps:
mlx_lm.tokenizer_utils._infer_tool_parser

and adds detection for chat templates that contain both:

<|tool_call_start|>
<|tool_call_end|>

When both markers are present, the patched inference function returns:
pythonic

Otherwise, it falls back to the original mlx-lm parser inference behavior.
The patch is applied from:
omlx/utils/model_loading.py

inside maybe_apply_pre_load_patches(), before mlx_lm.load() runs.
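
The wrapping described above could look roughly like the sketch below. It is not the contents of omlx/patches/lfm2_tool_parser.py: the helper name apply_lfm2_tool_parser_patch, the _lfm2_patched flag, and the single-argument signature of _infer_tool_parser are assumptions, and a stand-in namespace replaces mlx_lm.tokenizer_utils so the example runs without mlx-lm installed.

```python
import types

START, END = "<|tool_call_start|>", "<|tool_call_end|>"


def apply_lfm2_tool_parser_patch(module):
    """Wrap module._infer_tool_parser so LFM2-style templates map to 'pythonic'.

    Returns True if the wrapper was installed, False if it was already present.
    """
    original = module._infer_tool_parser
    # Idempotency guard: a second call must not wrap the wrapper again.
    if getattr(original, "_lfm2_patched", False):
        return False

    def patched(chat_template):
        # Assumed signature: one chat template string.
        if (
            isinstance(chat_template, str)
            and START in chat_template
            and END in chat_template
        ):
            return "pythonic"
        # Every other template keeps the original inference behavior.
        return original(chat_template)

    patched._lfm2_patched = True
    module._infer_tool_parser = patched
    return True


# Stand-in for mlx_lm.tokenizer_utils (hypothetical behavior for illustration:
# 'pythonic' only when <|tool_list_start|> is present, otherwise None).
fake_tokenizer_utils = types.SimpleNamespace(
    _infer_tool_parser=lambda chat_template: (
        "pythonic" if "<|tool_list_start|>" in chat_template else None
    )
)
```

Calling apply_lfm2_tool_parser_patch(fake_tokenizer_utils) twice installs the wrapper once; the second call is a no-op.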

Why this approach
oMLX currently pins mlx-lm to a specific git commit.
The repository already uses runtime monkey patches for compatibility with pinned mlx-lm behavior and upstream changes that have not yet landed in the pinned dependency.
This PR follows the same pattern:

It does not vendor or modify the pinned mlx-lm package directly.
It applies a small, idempotent runtime patch.
It only changes behavior for templates containing LFM2/LFM2.5-style tool call markers.
It falls back to the original parser inference for every other template.

Once upstream mlx-lm supports this detection directly, this patch can be removed.

Manual validation
Validated locally with LFM2.5-8B-A1B-mlx-8Bit, served through oMLX using an OpenAI-compatible /v1/chat/completions endpoint, with the following request:

{
  "model": "LFM2.5-8B-A1B-mlx-8Bit",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. When you need to use a tool, call the tool directly without explanations."
    },
    {
      "role": "user",
      "content": "Use the echo_text tool to repeat exactly: teste 123"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "echo_text",
        "description": "Repeats exactly the received text.",
        "parameters": {
          "type": "object",
          "properties": {
            "text": {
              "type": "string"
            }
          },
          "required": ["text"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "max_tokens": 1024,
  "temperature": 0.0
}

Before this patch
The response returned the model-generated tool call as plain assistant content:

{
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "\n<|tool_call_start|>[echo_text(text='teste 123')]<|tool_call_end|>"
      }
    }
  ]
}

After this patch
The response returns a structured OpenAI-compatible tool call:
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_...",
            "type": "function",
            "function": {
              "name": "echo_text",
              "arguments": "{\"text\": \"teste 123\"}"
            }
          }
        ]
      }
    }
  ]
}
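
The difference between the two responses above can be checked mechanically. The helper below is a hypothetical sketch (classify_choice is not part of oMLX) that a client or regression test might use to tell a structured tool call from raw markers leaking into message content.

```python
START, END = "<|tool_call_start|>", "<|tool_call_end|>"


def classify_choice(choice):
    """Classify one entry of a chat completion's `choices` list.

    Returns "structured" when message.tool_calls is populated,
    "raw_markers" when the tool call markers leaked into content,
    and "plain" for ordinary assistant text.
    """
    message = choice.get("message", {})
    if message.get("tool_calls"):
        return "structured"
    content = message.get("content") or ""
    if START in content and END in content:
        return "raw_markers"
    return "plain"
```

A "raw_markers" result corresponds to the behavior before the patch, and "structured" to the behavior after it.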

Files changed
Added
omlx/patches/lfm2_tool_parser.py
Adds an idempotent runtime patch for mlx_lm.tokenizer_utils._infer_tool_parser.

Modified
omlx/utils/model_loading.py
Applies the LFM2/LFM2.5 tool parser patch during pre-load patch dispatch.

Compatibility
This patch is intentionally narrow.
It only changes parser inference when the chat template contains both:

<|tool_call_start|>
<|tool_call_end|>

All other chat templates continue to use the original mlx-lm inference logic.

Notes
This is a runtime compatibility patch for the currently pinned mlx-lm version.
A more permanent fix should ideally also be applied upstream in mlx-lm, inside _infer_tool_parser(), by mapping LFM2/LFM2.5 templates containing <|tool_call_start|> and <|tool_call_end|> to the existing pythonic parser.

@jundot
Owner

jundot commented Jun 3, 2026

Thanks for the report and the concrete validation. The underlying issue is valid: LFM2/LFM2.5 text models need the pythonic tool parser so outputs like <|tool_call_start|>[name(arg='value')]<|tool_call_end|> are returned as structured OpenAI-compatible tool_calls instead of plain assistant content.

I handled this with a narrower oMLX-side fix in main instead of the process-wide mlx-lm monkey patch from this PR. oMLX now detects local LFM2 text causal LM checkpoints during tokenizer config setup and sets tokenizer_config["tool_parser_type"] = "pythonic" for that mlx-lm load path. This keeps the change scoped to LFM2 text models and avoids affecting unrelated models or LFM2 audio/STS models.
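
As a rough illustration of the scoped approach described in this comment (not the code in the referenced commit), a config-driven check might look like the sketch below. The function name maybe_set_lfm2_tool_parser, the use of a model_type field, and the values "lfm2" and "lfm2_moe" are assumptions made for the example.

```python
# Assumed model_type values for LFM2 text checkpoints; audio/STS variants
# are expected to report a different model_type and therefore not match.
LFM2_TEXT_MODEL_TYPES = {"lfm2", "lfm2_moe"}


def maybe_set_lfm2_tool_parser(model_config, tokenizer_config):
    """Return a copy of tokenizer_config with the pythonic parser selected
    when model_config describes an LFM2 text model."""
    updated = dict(tokenizer_config)
    if model_config.get("model_type") in LFM2_TEXT_MODEL_TYPES:
        # Keep an explicit tool_parser_type if one is already configured.
        updated.setdefault("tool_parser_type", "pythonic")
    return updated
```

Because the decision is keyed on the checkpoint's own configuration rather than on chat template markers, a non-LFM2 model whose template happens to contain similar markers is left untouched.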

The fix is in 8bce38f. I also added tests for LFM2 text, LFM2 MoE text, LFM2 audio exclusion, and non-LFM2 models with similar markers.

Closing this PR in favor of the scoped fix. Thanks again for catching this and for the clear reproduction details.

@jundot jundot closed this Jun 3, 2026