feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153
Open
aeyeopsdev wants to merge 2 commits into
Open
feat(tool_calling): parse Llama-3-style {"name","parameters"} JSON content#1153aeyeopsdev wants to merge 2 commits into
aeyeopsdev wants to merge 2 commits into
Conversation
…ntent
`Llama-4-Scout-17B-16E-Instruct-4bit`'s official chat template explicitly
instructs the model to "Respond in the format {"name": function name,
"parameters": dictionary of argument name and its value}" instead of
emitting a marker tag like `<tool_call>...</tool_call>`. omlx's
`parse_tool_calls` had no fallback for this Llama-3-style content format,
so the response landed in `message.content` as a JSON string and never
populated `message.tool_calls`, breaking OpenAI-compatible clients.
This adds a final fallback in `parse_tool_calls` that:
- matches `{"name": "<func>", "parameters": {...}}` (and `arguments`
as a synonym) inside content via a brace-aware regex,
- validates `<func>` against the request's `tools` list using the
existing `_extract_tool_names` helper,
- serializes the captured arguments through
`_serialize_tool_call_arguments` for OpenAI tool-call parity,
- strips the matched span from `cleaned_text` so it doesn't leak
back to clients alongside `tool_calls`.
The `tools` guard prevents arbitrary JSON content (e.g. a user echoing
`{"name": "...", "parameters": {...}}` for unrelated reasons) from being
mis-classified as a tool call.
Verified locally: `Llama-4-Scout-17B-16E-Instruct-4bit` chat completion
with a `get_weather` tool now returns `tool_calls=[{"function": {"name":
"get_weather", "arguments": "{\"location\": \"Paris\"}"}}]` and an
empty `content`.
The earlier regex-based fallback for Llama-3-style `{"name","parameters"}`
content used `\{(?:[^{}]|\{[^{}]*\})*\}` for the inner parameters value
— that pattern only handles one level of nested braces. Real tool
schemas with two or more levels (e.g. filter expressions like
`{"filter": {"date": {"gte": "..."}}}`) silently failed to match
and the call leaked into `message.content` instead of being promoted
to `tool_calls`.
Switch to `json.JSONDecoder().raw_decode` scanning `{` candidate
positions. `raw_decode` handles any valid JSON depth, returns the
parsed object plus the end offset for clean text-strip, and rejects
malformed JSON as cleanly as the old try/except did. Behavior on the
six existing tests is unchanged.
Also adds `logger.debug` when the parser skips an object whose
`name` field doesn't match the request's tools list — silently
dropping a tool-call-shaped object was the right thing (security
guard against arbitrary user JSON) but operators had no signal that
a near-miss happened. Debug level keeps it quiet in normal
operation.
Adds a regression test
`test_llama3_json_content_handles_deeply_nested_parameters` covering
the two-level-deep case the old regex missed. Existing six tests
unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom
mlx-community/Llama-4-Scout-17B-16E-Instruct-4bitreturns its tool calls as plain content rather than as structuredtool_calls:{ "role": "assistant", "content": "{\"name\": \"get_weather\", \"parameters\": {\"location\": \"Paris\"}}", "tool_calls": null, "finish_reason": "stop" }OpenAI-compatible clients see an empty
message.tool_callsand miss the call. The model is doing exactly what its chat template asks of it.Root cause
The model's chat template (Meta's official Llama-4 template,
chat_template.json) explicitly instructs:So Llama-4-Scout-Instruct emits the call as a JSON object inside
content, not wrapped in<tool_call>...</tool_call>or any tokenizer marker. omlx'sparse_tool_callstries, in order:tokenizer.tool_parser— skipped (notool_call_start).<tool_call>XML tags — not present.<ns:tool_call>namespaced tags — not present.[Calling tool: …]bracket format — not present.The Llama-3-style JSON format had no extractor, so the call ends up in
contentand is lost.Fix
Adds a final fallback in
parse_tool_calls(right before the marker-stripping pass) that:{"name": "<func>", "parameters": {...}}(withargumentsaccepted as a synonym for OpenAI parity) via a brace-aware regex.<func>against the request'stoolslist using the existing_extract_tool_nameshelper — this is what prevents arbitrary user JSON like{"name": "...", "parameters": {...}}in unrelated contexts from being mis-classified as tool calls._serialize_tool_call_argumentsfor the same OpenAI-shaped output other extractors produce.cleaned_textso the JSON doesn't leak alongsidetool_calls.The fallback is gated on
toolsbeing present — a tool-less request never goes near it.Verification
6 new unit tests in
tests/test_tool_calling.py::TestParseToolCallsLlama3JsonContent:tool_callspopulated,contentclearedargumentssynonym acceptedtoolslist → no promotionFull
pytest tests/test_tool_calling.py -m "not slow": 159 passed, no regressions.Live integration:
Llama-4-Scout-17B-16E-Instruct-4bitchat completion with aget_weathertool now returns:{ "role": "assistant", "content": null, "tool_calls": [{ "id": "call_84918517", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\": \"Paris\"}" } }] }Trade-offs / scope
_parse_bracket_tool_calls. No behavior change for any model that already has a marker.<|python_tag|>{"name":...}framing). One call per response, gated on tools, matches the canonical Meta template wording. Glad to extend if you have repro cases.ChunkedKVCachepatch but is independent — this one is useful for any future model that follows the same Llama-3 chat-template pattern.Thanks for the work that's gone into oMLX's tool-calling layer — the parser registry and the
tool_call_start/tool_call_endabstraction made this fallback easy to slot in cleanly. Hope the project keeps growing.(Reposted from #1151 with corrected commit authorship.)