Keep assistant tool_calls and tool messages on the MLLM chat path by waybarrios · Pull Request #611 · waybarrios/vllm-mlx

waybarrios · 2026-06-12T04:05:58Z

Summary

Fixes #608. On the MLLM path, _build_mllm_chat_messages rebuilt every message as {role, content} and dropped any message without renderable text. An OpenAI assistant message with content: null + tool_calls was removed entirely, and role:"tool" messages lost their tool_call_id. The rendered continuation prompt came out byte-identical to the fresh prompt, so agent clients looped on the same tool call forever.

What changed

All in _build_mllm_chat_messages (shared by chat() and stream_chat(), so both routes are covered with no call-site changes):

Assistant messages carrying tool_calls survive even with empty text. Arguments are normalized str -> mapping via the same normalize_messages_for_chat_template helper the batched engine path wraps, so both paths render identically. reasoning_content is forwarded when present.
role:"tool" messages keep their tool_call_id. Empty tool results (legit for action tools) survive with content: "" instead of being dropped, so template forward-scans can still pair calls to responses.
Everything else is byte-identical to before: image/audio/video part ordering, plain-text messages, and the drop rule for genuinely empty messages (verified pre/post on the same inputs).

SUPPORTS_NATIVE_TOOL_FORMAT on Gemma4ToolParser (issue suggestion 3) is intentionally left out of scope.

Tests

5 new regression tests in tests/test_mllm_message_ordering.py (written first, 3 failed on main for the exact bug reasons):

assistant {content: null, tool_calls} survives with normalized arguments
tool messages keep tool_call_id + content
empty tool results survive with their anchor
a 4-message tool exchange rebuilds strictly more messages than the fresh 3-message turn
genuinely empty messages without tool_calls are still dropped

pytest tests/test_mllm_message_ordering.py -q   -> 8 passed
pytest tests/test_mllm*.py tests/test_server.py -> 256 passed
pytest tests/ -k "not slow"                     -> 2185 passed, 0 failed
ruff check                                       -> clean

Live verification (issue repro)

Served mlx-community/Qwen3-VL-8B-Instruct-4bit with --mllm --enable-auto-tool-choice --tool-call-parser hermes, M4 Max:

Probe A (fresh: system + user + 1 tool): tool_call emitted, prompt_tokens = 166
Probe B (continuation: + assistant{content:null, tool_calls} + tool{result}): prompt_tokens = 213 > 166 (history now contributes tokens), no repeated tool_call, final answer uses the tool result: "The weather in Paris is currently 19°C with sunny conditions."

On main, probe B renders the same prompt as probe A and re-issues the identical tool call.

waybarrios · 2026-06-12T04:08:34Z

Full test logs from the branch:

Regression tests (verbose) — 8 passed

$ pytest tests/test_mllm_message_ordering.py -v
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_preserve_text_only_shape PASSED [ 12%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_preserve_image_text_order PASSED [ 25%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_preserve_audio_text_order PASSED [ 37%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_keep_assistant_tool_calls_without_text PASSED [ 50%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_keep_tool_call_id_on_tool_messages PASSED [ 62%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_keep_empty_tool_results PASSED [ 75%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_keep_tool_exchange PASSED [ 87%]
tests/test_mllm_message_ordering.py::test_mllm_chat_messages_still_drop_empty_messages_without_tool_calls PASSED [100%]
============================== 8 passed in 0.25s ===============================

MLLM + server suites — 250 passed

$ pytest tests/test_mllm_cache.py tests/test_mllm_continuous_batching.py \
    tests/test_mllm_message_ordering.py tests/test_mllm_mtp_routing.py \
    tests/test_mllm.py tests/test_server.py -q
====================== 250 passed, 13 deselected in 5.18s ======================

Full fast suite — 2185 passed, 0 failed

$ pytest tests/ -q -k "not slow"
=============== 2185 passed, 11 skipped, 23 deselected in 30.95s ===============

$ ruff check vllm_mlx/models/mllm.py tests/test_mllm_message_ordering.py
All checks passed!

Live repro verification (Qwen3-VL-8B-Instruct-4bit, M4 Max)

$ vllm-mlx serve mlx-community/Qwen3-VL-8B-Instruct-4bit --mllm \
    --enable-auto-tool-choice --tool-call-parser hermes --port 8077

# Probe A — fresh turn (system + user + 1 tool), temperature 0
prompt_tokens: 166
tool_calls: [{"id": "call_0a25c249", "type": "function",
              "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
content: ''

# Probe B — continuation (+ assistant{content:null, tool_calls} + tool{result})
prompt_tokens: 213        # > 166: history now contributes tokens
tool_calls: null          # no repeated call — loop broken
content: 'The weather in Paris is currently 19°C with sunny conditions.'

On main, probe B renders byte-identical to probe A (166 tokens) and re-issues the same tool call.

keep assistant tool_calls and tool messages on the MLLM chat path

8419abd

apply black formatting to the new ordering tests

875404f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep assistant tool_calls and tool messages on the MLLM chat path#611

Keep assistant tool_calls and tool messages on the MLLM chat path#611
waybarrios wants to merge 2 commits into
mainfrom
fix/mllm-tool-messages

waybarrios commented Jun 12, 2026

Uh oh!

waybarrios commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

waybarrios commented Jun 12, 2026

Summary

What changed

Tests

Live verification (issue repro)

Uh oh!

waybarrios commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant