Skip to content

feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153

Open
aeyeopsdev wants to merge 2 commits into
jundot:mainfrom
AeyeOps:feat-tool-calling-llama3-json-format
Open

feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153
aeyeopsdev wants to merge 2 commits into
jundot:mainfrom
AeyeOps:feat-tool-calling-llama3-json-format

Conversation

@aeyeopsdev
Copy link
Copy Markdown

Symptom

mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit returns its tool calls as plain content rather than as structured tool_calls:

{
  "role": "assistant",
  "content": "{\"name\": \"get_weather\", \"parameters\": {\"location\": \"Paris\"}}",
  "tool_calls": null,
  "finish_reason": "stop"
}

OpenAI-compatible clients see an empty message.tool_calls and miss the call. The model is doing exactly what its chat template asks of it.

Root cause

The model's chat template (Meta's official Llama-4 template, chat_template.json) explicitly instructs:

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

So Llama-4-Scout-Instruct emits the call as a JSON object inside content, not wrapped in <tool_call>...</tool_call> or any tokenizer marker. omlx's parse_tool_calls tries, in order:

  1. Native parser via tokenizer.tool_parser — skipped (no tool_call_start).
  2. <tool_call> XML tags — not present.
  3. <ns:tool_call> namespaced tags — not present.
  4. [Calling tool: …] bracket format — not present.
  5. Marker-stripping fallback — does nothing for content without markers.

The Llama-3-style JSON format had no extractor, so the call ends up in content and is lost.

Fix

Adds a final fallback in parse_tool_calls (right before the marker-stripping pass) that:

  • Matches {"name": "<func>", "parameters": {...}} (with arguments accepted as a synonym for OpenAI parity) via a brace-aware regex.
  • Validates <func> against the request's tools list using the existing _extract_tool_names helper — this is what prevents arbitrary user JSON like {"name": "...", "parameters": {...}} in unrelated contexts from being mis-classified as tool calls.
  • Serializes captured arguments through _serialize_tool_call_arguments for the same OpenAI-shaped output other extractors produce.
  • Strips the matched span from cleaned_text so the JSON doesn't leak alongside tool_calls.

The fallback is gated on tools being present — a tool-less request never goes near it.

Verification

  • 6 new unit tests in tests/test_tool_calling.py::TestParseToolCallsLlama3JsonContent:

    • happy path → tool_calls populated, content cleared
    • arguments synonym accepted
    • surrounding prose preserved (not over-stripped)
    • unknown function name → no promotion (security guard)
    • no tools list → no promotion
    • malformed parameter JSON → safe fall-through (no exception)
  • Full pytest tests/test_tool_calling.py -m "not slow": 159 passed, no regressions.

  • Live integration: Llama-4-Scout-17B-16E-Instruct-4bit chat completion with a get_weather tool now returns:

    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_84918517",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"Paris\"}"
        }
      }]
    }

Trade-offs / scope

  • The fallback only activates when no other parser claimed the response — same priority as _parse_bracket_tool_calls. No behavior change for any model that already has a marker.
  • Doesn't try to handle every possible Llama-3-derivative format (e.g. multi-call responses, <|python_tag|>{"name":...} framing). One call per response, gated on tools, matches the canonical Meta template wording. Glad to extend if you have repro cases.
  • Pairs naturally with the Llama-4 ChunkedKVCache patch but is independent — this one is useful for any future model that follows the same Llama-3 chat-template pattern.

Thanks for the work that's gone into oMLX's tool-calling layer — the parser registry and the tool_call_start / tool_call_end abstraction made this fallback easy to slot in cleanly. Hope the project keeps growing.

(Reposted from #1151 with corrected commit authorship.)

aeyeopsdev added 2 commits May 9, 2026 17:14
…ntent

`Llama-4-Scout-17B-16E-Instruct-4bit`'s official chat template explicitly
instructs the model to "Respond in the format {"name": function name,
"parameters": dictionary of argument name and its value}" instead of
emitting a marker tag like `<tool_call>...</tool_call>`. omlx's
`parse_tool_calls` had no fallback for this Llama-3-style content format,
so the response landed in `message.content` as a JSON string and never
populated `message.tool_calls`, breaking OpenAI-compatible clients.

This adds a final fallback in `parse_tool_calls` that:

  - matches `{"name": "<func>", "parameters": {...}}` (and `arguments`
    as a synonym) inside content via a brace-aware regex,
  - validates `<func>` against the request's `tools` list using the
    existing `_extract_tool_names` helper,
  - serializes the captured arguments through
    `_serialize_tool_call_arguments` for OpenAI tool-call parity,
  - strips the matched span from `cleaned_text` so it doesn't leak
    back to clients alongside `tool_calls`.

The `tools` guard prevents arbitrary JSON content (e.g. a user echoing
`{"name": "...", "parameters": {...}}` for unrelated reasons) from being
mis-classified as a tool call.

Verified locally: `Llama-4-Scout-17B-16E-Instruct-4bit` chat completion
with a `get_weather` tool now returns `tool_calls=[{"function": {"name":
"get_weather", "arguments": "{\"location\": \"Paris\"}"}}]` and an
empty `content`.
The earlier regex-based fallback for Llama-3-style `{"name","parameters"}`
content used `\{(?:[^{}]|\{[^{}]*\})*\}` for the inner parameters value
— that pattern only handles one level of nested braces. Real tool
schemas with two or more levels (e.g. filter expressions like
`{"filter": {"date": {"gte": "..."}}}`) silently failed to match
and the call leaked into `message.content` instead of being promoted
to `tool_calls`.

Switch to `json.JSONDecoder().raw_decode` scanning `{` candidate
positions. `raw_decode` handles any valid JSON depth, returns the
parsed object plus the end offset for clean text-strip, and rejects
malformed JSON as cleanly as the old try/except did. Behavior on the
six existing tests is unchanged.

Also adds `logger.debug` when the parser skips an object whose
`name` field doesn't match the request's tools list — silently
dropping a tool-call-shaped object was the right thing (security
guard against arbitrary user JSON) but operators had no signal that
a near-miss happened. Debug level keeps it quiet in normal
operation.

Adds a regression test
`test_llama3_json_content_handles_deeply_nested_parameters` covering
the two-level-deep case the old regex missed. Existing six tests
unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant