
Add MiniCPM5 XML tool-call parser#1317

Open: scrappylabsai wants to merge 1 commit into ml-explore:main from scrappylabsai:feat/minicpm5-tool-parser

Summary

Adds a tool-call parser for openbmb/MiniCPM5-1B (and its -MLX, -GGUF, -SFT variants).

The model emits tool calls in XML — not JSON — so generic parsers don't pick it up, and mlx_lm.server currently warns:

WARNING - Received tools but model does not support tool calling.

The MiniCPM5 wire format:

<function name="get_weather">
  <param name="city">Tokyo</param>
  <param name="date">2024-06-27</param>
</function>

Multi-line / special-character values may be wrapped in <![CDATA[...]]>. Attributes can be single- or double-quoted.
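As a rough illustration of the wire format described above, the following sketch extracts one call with regular expressions. It is not the PR's code: the pattern names, `unwrap_value`, and `extract_call` are hypothetical, and the sketch assumes a parameter value never contains a literal `</param>` (even inside CDATA).

```python
import re

# Hypothetical patterns for illustration only; the PR's actual regexes are not shown here.
# Attributes may be single- or double-quoted, so the closing quote must match the opening one.
FUNCTION_RE = re.compile(
    r"<function\s+name=(?P<q>['\"])(?P<name>.*?)(?P=q)\s*>(?P<body>.*?)</function>",
    re.DOTALL,
)
PARAM_RE = re.compile(
    r"<param\s+name=(?P<q>['\"])(?P<name>.*?)(?P=q)\s*>(?P<value>.*?)</param>",
    re.DOTALL,
)
CDATA_RE = re.compile(r"^\s*<!\[CDATA\[(.*?)\]\]>\s*$", re.DOTALL)


def unwrap_value(raw: str) -> str:
    """Return the CDATA payload verbatim, or the stripped text if there is no CDATA wrapper."""
    match = CDATA_RE.match(raw)
    if match:
        return match.group(1)
    return raw.strip()


def extract_call(text: str) -> dict:
    """Extract the first <function> block as a name plus raw (string) arguments."""
    match = FUNCTION_RE.search(text)
    if match is None:
        raise ValueError("no <function name=...> block found")
    arguments = {
        param.group("name"): unwrap_value(param.group("value"))
        for param in PARAM_RE.finditer(match.group("body"))
    }
    return {"name": match.group("name"), "arguments": arguments}
```

All values come back as strings here; schema validation and type coercion are separate steps.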

Implementation

  • mlx_lm/tool_parsers/minicpm5.py — stateless parse_tool_call(text, tools) returning a single {"name", "arguments"} dict; raises ValueError for the ToolCallFormatter to log + drop.
  • Schema-aware: rejects unknown functions, unknown params, duplicate params, missing required params, and <param> tags without a name= attribute.
  • Type coercion: strings pass through; other types try strict JSON, then ast.literal_eval, then fall back to the raw string.
  • Auto-detected by <function name= + <param name= in the chat template.
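The checks listed above can be sketched as follows. This is a minimal approximation, not the contents of minicpm5.py: the helper names are hypothetical, `tools` is assumed to use the OpenAI-style layout (`{"type": "function", "function": {"name": ..., "parameters": {...}}}`), and parameters are assumed to arrive as `(name, raw_string)` pairs in emission order so duplicates remain visible.

```python
import ast
import json


def coerce(raw: str, schema_type: str):
    """Strings pass through; other types try strict JSON, then literal_eval, then raw."""
    if schema_type == "string":
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(raw)  # handles Python literals such as True or (1, 2)
    except (ValueError, SyntaxError):
        return raw


def validate_and_coerce(name: str, params: list, tools: list) -> dict:
    """Check one call against the tool schemas and return typed arguments."""
    schemas = {t["function"]["name"]: t["function"].get("parameters", {}) for t in tools}
    if name not in schemas:
        raise ValueError(f"unknown function: {name}")
    properties = schemas[name].get("properties", {})
    arguments = {}
    for param_name, raw in params:
        if param_name not in properties:
            raise ValueError(f"unknown param: {param_name}")
        if param_name in arguments:
            raise ValueError(f"duplicate param: {param_name}")
        arguments[param_name] = coerce(raw, properties[param_name].get("type", "string"))
    missing = [r for r in schemas[name].get("required", []) if r not in arguments]
    if missing:
        raise ValueError(f"missing required params: {missing}")
    return {"name": name, "arguments": arguments}


def looks_like_minicpm5(chat_template: str) -> bool:
    """Marker check mirroring the auto-detection rule: both markers must appear."""
    return "<function name=" in chat_template and "<param name=" in chat_template
```

Raising ValueError for every failure keeps the caller simple: it can log the message and drop the call, as the first bullet describes.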

Logic ported from SGLang's MiniCPM5Detector.

Tests

tests/test_tool_parsing.py:

  • Added MiniCPM5 cases to the existing multi-parser sweep (numeric args + string args).
  • New test_minicpm5 covering: CDATA multi-line params, non-string typed params (array, boolean), body-only segment form, missing-required validation, unknown-param validation, duplicate-param validation, missing-name-attr validation, single-quoted attributes.
Ran 6 tests in 0.001s
OK

Smoke test

Live tested against openbmb/MiniCPM5-1B-MLX running under mlx_lm.server:

Single call:

{
  "finish_reason": "tool_calls",
  "tool_calls": [{
    "function": {
      "name": "get_weather",
      "arguments": "{\"city\": \"Tokyo\", \"date\": \"2024-06-27\"}"
    },
    "type": "function",
    "id": "..."
  }]
}
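Note that `arguments` in the response above is a JSON-encoded string rather than a nested object, so a client has to decode it before use. A minimal sketch, assuming the abbreviated response shape shown above (the helper name `decode_tool_calls` is hypothetical):

```python
import json


def decode_tool_calls(choice: dict) -> list:
    """Return (function name, decoded arguments) pairs from a response fragment."""
    decoded = []
    for call in choice.get("tool_calls", []):
        function = call["function"]
        # "arguments" arrives as a JSON string and must be parsed separately.
        decoded.append((function["name"], json.loads(function["arguments"])))
    return decoded


# The single-call response from the smoke test, with the elided id kept as-is.
choice = {
    "finish_reason": "tool_calls",
    "tool_calls": [{
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "Tokyo", "date": "2024-06-27"}',
        },
        "type": "function",
        "id": "...",
    }],
}
```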

Multiple calls: two consecutive <function>...</function> blocks are both extracted cleanly via the state machine (one segment per pair).
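The "one segment per pair" behavior can be approximated offline with a simple scanner. This is only a sketch under assumptions: it is not the server's state machine, `split_segments` is a hypothetical name, and it assumes `<function>` blocks are never nested.

```python
def split_segments(text: str) -> list:
    """Split generated text into one segment per <function ...>...</function> pair."""
    start_marker, end_marker = "<function", "</function>"
    segments = []
    position = 0
    while True:
        start = text.find(start_marker, position)
        if start == -1:
            break
        end = text.find(end_marker, start)
        if end == -1:
            break  # an unterminated trailing block is ignored
        end += len(end_marker)
        segments.append(text[start:end])
        position = end
    return segments
```

Each returned segment can then be handed to a single-call parser independently, so a malformed call does not affect the others.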

Test plan

  • CI: existing test_tool_parsing.py continues to pass; new test_minicpm5 passes.
  • Manual: mlx_lm.server --model openbmb/MiniCPM5-1B-MLX accepts tools=[...] requests without warning and returns tool_calls in the response.

openbmb/MiniCPM5-1B emits tool calls in XML, not JSON:

  <function name="get_weather">
    <param name="city">Tokyo</param>
    <param name="date">2024-06-27</param>
  </function>

The mlx_lm.server warning "Received tools but model does not
support tool calling" fires for this model because no parser
recognizes the format. This adds one.

Validates schema: rejects unknown functions, unknown params,
duplicate params, missing required params, and <param> tags
without a name= attribute. Supports CDATA-wrapped multi-line
values and single/double-quoted attributes.

Auto-detected by `<function name=` + `<param name=` markers in
the chat template.

Ported from SGLang's MiniCPM5Detector (sgl-project/sglang#25600).
Smoke-tested end-to-end against openbmb/MiniCPM5-1B-MLX: single
and multi-call requests return finish_reason=tool_calls with
correctly typed arguments.
loserbcc mentioned this pull request on May 28, 2026.