feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content by aeyeopsdev · Pull Request #1153 · jundot/omlx

aeyeopsdev · 2026-05-09T21:15:51Z

Symptom

mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit returns its tool calls as plain content rather than as structured tool_calls:

{
  "role": "assistant",
  "content": "{\"name\": \"get_weather\", \"parameters\": {\"location\": \"Paris\"}}",
  "tool_calls": null,
  "finish_reason": "stop"
}

OpenAI-compatible clients see an empty message.tool_calls and miss the call. The model is doing exactly what its chat template asks of it.

Root cause

The model's chat template (Meta's official Llama-4 template, chat_template.json) explicitly instructs:

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

So Llama-4-Scout-Instruct emits the call as a JSON object inside content, not wrapped in <tool_call>...</tool_call> or any tokenizer marker. omlx's parse_tool_calls tries, in order:

Native parser via tokenizer.tool_parser — skipped (no tool_call_start).
<tool_call> XML tags — not present.
<ns:tool_call> namespaced tags — not present.
[Calling tool: …] bracket format — not present.
Marker-stripping fallback — does nothing for content without markers.

The Llama-3-style JSON format had no extractor, so the call ends up in content and is lost.

Fix

Adds a final fallback in parse_tool_calls (right before the marker-stripping pass) that:

Matches {"name": "<func>", "parameters": {...}} (with arguments accepted as a synonym for OpenAI parity) via a brace-aware regex.
Validates <func> against the request's tools list using the existing _extract_tool_names helper — this is what prevents arbitrary user JSON like {"name": "...", "parameters": {...}} in unrelated contexts from being mis-classified as tool calls.
Serializes captured arguments through _serialize_tool_call_arguments for the same OpenAI-shaped output other extractors produce.
Strips the matched span from cleaned_text so the JSON doesn't leak alongside tool_calls.

The fallback is gated on tools being present — a tool-less request never goes near it.

Verification

6 new unit tests in tests/test_tool_calling.py::TestParseToolCallsLlama3JsonContent:
- happy path → tool_calls populated, content cleared
- arguments synonym accepted
- surrounding prose preserved (not over-stripped)
- unknown function name → no promotion (security guard)
- no tools list → no promotion
- malformed parameter JSON → safe fall-through (no exception)
Full pytest tests/test_tool_calling.py -m "not slow": 159 passed, no regressions.

Live integration: Llama-4-Scout-17B-16E-Instruct-4bit chat completion with a get_weather tool now returns:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_84918517",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"Paris\"}"
    }
  }]
}

Trade-offs / scope

The fallback only activates when no other parser claimed the response — same priority as _parse_bracket_tool_calls. No behavior change for any model that already has a marker.
Doesn't try to handle every possible Llama-3-derivative format (e.g. multi-call responses, <|python_tag|>{"name":...} framing). One call per response, gated on tools, matches the canonical Meta template wording. Glad to extend if you have repro cases.
Pairs naturally with the Llama-4 ChunkedKVCache patch but is independent — this one is useful for any future model that follows the same Llama-3 chat-template pattern.

Thanks for the work that's gone into oMLX's tool-calling layer — the parser registry and the tool_call_start / tool_call_end abstraction made this fallback easy to slot in cleanly. Hope the project keeps growing.

(Reposted from #1151 with corrected commit authorship.)

…ntent `Llama-4-Scout-17B-16E-Instruct-4bit`'s official chat template explicitly instructs the model to "Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}" instead of emitting a marker tag like `<tool_call>...</tool_call>`. omlx's `parse_tool_calls` had no fallback for this Llama-3-style content format, so the response landed in `message.content` as a JSON string and never populated `message.tool_calls`, breaking OpenAI-compatible clients. This adds a final fallback in `parse_tool_calls` that: - matches `{"name": "<func>", "parameters": {...}}` (and `arguments` as a synonym) inside content via a brace-aware regex, - validates `<func>` against the request's `tools` list using the existing `_extract_tool_names` helper, - serializes the captured arguments through `_serialize_tool_call_arguments` for OpenAI tool-call parity, - strips the matched span from `cleaned_text` so it doesn't leak back to clients alongside `tool_calls`. The `tools` guard prevents arbitrary JSON content (e.g. a user echoing `{"name": "...", "parameters": {...}}` for unrelated reasons) from being mis-classified as a tool call. Verified locally: `Llama-4-Scout-17B-16E-Instruct-4bit` chat completion with a `get_weather` tool now returns `tool_calls=[{"function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}}]` and an empty `content`.

The earlier regex-based fallback for Llama-3-style `{"name","parameters"}` content used `\{(?:[^{}]|\{[^{}]*\})*\}` for the inner parameters value — that pattern only handles one level of nested braces. Real tool schemas with two or more levels (e.g. filter expressions like `{"filter": {"date": {"gte": "..."}}}`) silently failed to match and the call leaked into `message.content` instead of being promoted to `tool_calls`. Switch to `json.JSONDecoder().raw_decode` scanning `{` candidate positions. `raw_decode` handles any valid JSON depth, returns the parsed object plus the end offset for clean text-strip, and rejects malformed JSON as cleanly as the old try/except did. Behavior on the six existing tests is unchanged. Also adds `logger.debug` when the parser skips an object whose `name` field doesn't match the request's tools list — silently dropping a tool-call-shaped object was the right thing (security guard against arbitrary user JSON) but operators had no signal that a near-miss happened. Debug level keeps it quiet in normal operation. Adds a regression test `test_llama3_json_content_handles_deeply_nested_parameters` covering the two-level-deep case the old regex missed. Existing six tests unchanged.

aeyeopsdev added 2 commits May 9, 2026 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153

feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153
aeyeopsdev wants to merge 2 commits into
jundot:mainfrom
AeyeOps:feat-tool-calling-llama3-json-format

aeyeopsdev commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aeyeopsdev commented May 9, 2026

Symptom

Root cause

Fix

Verification

Trade-offs / scope

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant