Fix LFM2.5 pythonic tool parser auto-detection#1333

Open
ChristianWeyer wants to merge 1 commit into
ml-explore:mainfrom
ChristianWeyer:fix/lfm25-pythonic-tool-parser-detection

Conversation

@ChristianWeyer

What

_infer_tool_parser() in tokenizer_utils.py detects the pythonic tool-call
format via the chat-template token <|tool_list_start|>. LFM2.5 renamed those
delimiters to <|tool_call_start|> / <|tool_call_end|> (the inner pythonic
call format is unchanged), so the detection misses for every LFM2.5
checkpoint — including the official LiquidAI/LFM2.5-8B-A1B,
LiquidAI/LFM2.5-8B-A1B-MLX-8bit, LiquidAI/LFM2.5-1.2B-Thinking, etc.

Symptom: loading any LFM2.5 model gives tokenizer.has_tool_calling == False,
so mlx_lm.server's /v1/chat/completions returns tool calls as raw
<|tool_call_start|>[func(arg='val')]<|tool_call_end|> text inside
message.content, with tool_calls = [] and finish_reason = "stop".
OpenAI-compatible clients (Mastra, LangChain, ai-sdk, …) see no tool call and
the agent loop breaks.
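
To make the symptom concrete, here is a minimal sketch of how the raw text above could be turned into the structured tool call that clients expect. This is not the mlx_lm pythonic parser: `parse_pythonic_calls` and `_CALL_RE` are hypothetical names, and the sketch assumes keyword-only literal arguments as in the `[func(arg='val')]` example.

```python
import ast
import json
import re

# Hypothetical helper, not mlx_lm code: matches the delimiters quoted in the symptom.
_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_pythonic_calls(text):
    """Return OpenAI-style tool-call dicts found in raw model output."""
    calls = []
    for body in _CALL_RE.findall(text):
        # The payload is a Python expression, usually a list of calls.
        tree = ast.parse(body.strip(), mode="eval")
        root = tree.body
        nodes = root.elts if isinstance(root, ast.List) else [root]
        for node in nodes:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                continue
            # Assumption: arguments are keyword arguments with literal values.
            args = {
                kw.arg: ast.literal_eval(kw.value)
                for kw in node.keywords
                if kw.arg is not None
            }
            calls.append(
                {
                    "type": "function",
                    "function": {
                        "name": node.func.id,
                        "arguments": json.dumps(args),
                    },
                }
            )
    return calls


raw = "<|tool_call_start|>[func(arg='val')]<|tool_call_end|>"
print(parse_pythonic_calls(raw))
```

When auto-detection misses, no step like this runs on the server, which is why the client receives only the delimited string in message.content.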

Fix

Match either token. The pythonic parser already handles both formats — only
the auto-detection needed updating.

elif (
    "<|tool_list_start|>" in chat_template
    or "<|tool_call_start|>" in chat_template
):
    return "pythonic"
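
For readers without the surrounding function in front of them, the branch above can be sketched in isolation. `infer_pythonic_parser` and `PYTHONIC_MARKERS` are hypothetical stand-ins; the real `_infer_tool_parser()` has other branches that are omitted here, and the `None` handling is an assumption based on the negative-path test described in this PR.

```python
from typing import Optional

# Hypothetical stand-in for the pythonic branch only; names are illustrative.
PYTHONIC_MARKERS = ("<|tool_list_start|>", "<|tool_call_start|>")


def infer_pythonic_parser(chat_template: Optional[str]) -> Optional[str]:
    """Return "pythonic" if the template contains either delimiter, else None."""
    if not isinstance(chat_template, str):
        # Assumption: a missing chat template means no parser is inferred.
        return None
    if any(marker in chat_template for marker in PYTHONIC_MARKERS):
        return "pythonic"
    return None


# The three cases the PR's unit test is described as covering.
assert infer_pythonic_parser("... <|tool_list_start|> ...") == "pythonic"
assert infer_pythonic_parser("... <|tool_call_start|> ...") == "pythonic"
assert infer_pythonic_parser("no tool tokens here") is None
assert infer_pythonic_parser(None) is None
```

Keeping the markers in one tuple means a future delimiter rename would need only one new entry, though the actual patch keeps the explicit `or` to match the style of the existing branches.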

Same root cause and fix shape as the parallel llama.cpp issue
ggml-org/llama.cpp#23838,
and the same class of token-rename detection lag that prompted the Gemma 4
auto-detect addition.

Test

Added TestTokenizers.test_infer_tool_parser in tests/test_tokenizers.py
covering:

  • LFM2 original template (<|tool_list_start|>) — still returns "pythonic"
  • LFM2.5 template (<|tool_call_start|>) — now returns "pythonic" (was None)
  • Negative path (no tool tokens, and None input)

This is a pure unit test against _infer_tool_parser, so it needs no Hugging
Face download and runs fast and deterministically. Verified locally that the
new test fails on main without the fix and passes with it.

End-to-end verified against the real LiquidAI/LFM2.5-8B-A1B-MLX-8bit
tokenizer:

>>> from mlx_lm.utils import load_tokenizer
>>> tok = load_tokenizer("LiquidAI/LFM2.5-8B-A1B-MLX-8bit")
>>> tok.has_tool_calling
True
>>> tok.tool_call_start, tok.tool_call_end
('<|tool_call_start|>', '<|tool_call_end|>')

And via mlx_lm.server /v1/chat/completions with a tools payload — after
the fix, the response now correctly returns:

{
  "finish_reason": "tool_calls",
  "message": {
    "tool_calls": [{
      "function": {"name": "...", "arguments": "{\"query\": \"...\"}"},
      "type": "function",
      "id": "..."
    }]
  }
}
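
As a rough illustration of why this shape matters to clients, the sketch below shows how an OpenAI-compatible agent loop might read a choice like the one above. `extract_tool_calls` is a hypothetical helper, not part of mlx_lm or any client library, and the tool name and arguments in the usage lines are made-up placeholders.

```python
import json


def extract_tool_calls(choice: dict) -> list:
    """Return (name, arguments) pairs from one chat-completion choice."""
    # Clients typically branch on finish_reason before looking at tool_calls.
    if choice.get("finish_reason") != "tool_calls":
        return []
    message = choice.get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # "arguments" is a JSON-encoded string, so it is decoded here.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Before the fix: raw text in content, empty tool_calls, finish_reason "stop".
broken = {
    "finish_reason": "stop",
    "message": {
        "content": "<|tool_call_start|>[search(query='mlx')]<|tool_call_end|>",
        "tool_calls": [],
    },
}
# After the fix: structured tool call (placeholder name and arguments).
fixed = {
    "finish_reason": "tool_calls",
    "message": {
        "tool_calls": [
            {
                "type": "function",
                "id": "call_0",
                "function": {"name": "search", "arguments": "{\"query\": \"mlx\"}"},
            }
        ]
    },
}
assert extract_tool_calls(broken) == []
assert extract_tool_calls(fixed) == [("search", {"query": "mlx"})]
```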

All existing tests/test_tool_parsing.py and tests/test_tokenizers.py tests
still pass; ran black (25.1.0) and isort --profile=black per
CONTRIBUTING.md.

LFM2.5 renamed the chat-template delimiters from <|tool_list_start|>/
<|tool_list_end|> (original LFM2) to <|tool_call_start|>/<|tool_call_end|>,
but _infer_tool_parser() only matched the old token. Loading any LFM2.5
checkpoint (e.g. LiquidAI/LFM2.5-8B-A1B, -8B-A1B-MLX-8bit, -1.2B-Thinking)
left tokenizer.has_tool_calling False, so the server returned tool calls as
raw <|tool_call_start|>[...]<|tool_call_end|> text in message.content with
tool_calls = [] — clients see no tool call and the agent loop breaks.

The pythonic parser already handles both formats correctly; only the
auto-detection needed updating.

Added a unit test for _infer_tool_parser covering both LFM2 and LFM2.5
template snippets plus the negative path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nastya236 nastya236 added the bug Something isn't working label Jun 4, 2026