fix(parsing): schema-aware tool-arg coercion for XML-style parsers #28
Merged
Every XML-style tool parser (Qwen3.5, GLM, MiniMax) currently calls
``json.loads`` on each ``<arg_value>`` body, so a string argument that
happens to look like JSON ("true", "42", "[1,2,3]") comes back as a
bool / int / list instead of a string. The hermes-JSON (Qwen3) and
section-JSON (Kimi K2) parsers sidestep this because their wire format
quotes strings; both serve as controls.
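The corruption is easy to reproduce outside the parsers. The snippet below is a minimal sketch, assuming the schema-unaware path tries ``json.loads`` and falls back to the raw text on failure; ``naive_decode`` is a hypothetical name for illustration, not a function in this repo.

```python
import json

# String arguments whose text happens to be valid JSON, plus one that is not.
string_args = ["true", "42", "[1,2,3]", "hello"]


def naive_decode(body: str):
    """Decode an <arg_value> body without a schema: JSON first, raw text as fallback.

    Hypothetical stand-in for the pre-fix behavior described above.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


decoded = [naive_decode(s) for s in string_args]
for original, value in zip(string_args, decoded):
    # Every input was a str; only the non-JSON-looking one survives as a str.
    print(f"{original!r} -> {value!r} ({type(value).__name__})")
```

Only ``"hello"`` keeps its type; the other three inputs change type even though the caller passed strings.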
These tests fail loudly against current main — 15 failures across the
three XML parsers, 10 passes on the controls. They are the spec for the
fix, not documentation of accepted behavior; CI stays red until
``parse_response`` becomes schema-aware (Robin PR #21). Laguna-XS.2 has
the same bug and should be added to the matrix when PR #21 merges.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
LagunaXS2Renderer landed in #21; its parser has the same string-type corruption as the other XML parsers (5/5 cases fail). Count is now 20 failed, 10 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
XML-style chat templates (Qwen3.5, GLM-5/4.5, MiniMax-M2, Laguna) render
tool-call argument values verbatim inside ``<arg_value>X</arg_value>``
(or ``<parameter>X</parameter>``) tags with no quoting. A value of
``true`` and the string ``"true"`` produce identical wire bytes; without
the tool schema, the parser has no signal to choose between them and
defaults to ``json.loads`` — silently corrupting string args that look
like JSON.
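The ambiguity originates on the rendering side. A rough illustration, assuming a template that writes strings verbatim and serializes everything else as JSON (``render_arg_value`` is a hypothetical helper, not one of the real chat templates):

```python
import json


def render_arg_value(value) -> str:
    """Hypothetical template step: strings verbatim, other values as JSON text."""
    text = value if isinstance(value, str) else json.dumps(value)
    return f"<arg_value>{text}</arg_value>"


as_bool = render_arg_value(True)      # boolean argument
as_string = render_arg_value("true")  # string argument that spells "true"

# Both produce the same bytes, so the parser cannot recover the type from the wire alone.
print(as_bool == as_string)
```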
This adds ``tools: list[ToolSpec] | None = None`` to ``parse_response``
on the ``Renderer`` Protocol and every concrete renderer. When supplied,
the four XML-style parsers (``parse_qwen35``, ``parse_glm``,
``parse_minimax``, ``parse_laguna_xs2``) consult each parameter's
declared JSON-schema ``type`` and preserve declared-string params
verbatim. Without ``tools``, behavior is unchanged.
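To make the behavior change concrete, here is a simplified sketch of a schema-aware argument loop. It assumes a GLM-like ``<arg_key>``/``<arg_value>`` layout and takes a plain ``{param_name: schema_fragment}`` dict; ``parse_args_sketch`` and its regex are illustrative assumptions, not the actual ``parse_glm`` implementation.

```python
import json
import re

# Assumed wire layout: <arg_key>name</arg_key><arg_value>body</arg_value> pairs.
ARG_PAIR = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_args_sketch(body: str, param_schemas=None) -> dict:
    """Hypothetical stand-in for an XML-style parser's argument loop."""
    args = {}
    for key, raw in ARG_PAIR.findall(body):
        declared = (param_schemas or {}).get(key, {}).get("type")
        if declared == "string":
            args[key] = raw  # schema says string: keep the text verbatim
            continue
        try:
            args[key] = json.loads(raw)
        except json.JSONDecodeError:
            args[key] = raw  # not valid JSON: fall back to the raw text
    return args


body = (
    "<arg_key>query</arg_key><arg_value>42</arg_value>"
    "<arg_key>limit</arg_key><arg_value>10</arg_value>"
)
schemas = {"query": {"type": "string"}, "limit": {"type": "integer"}}

without_schema = parse_args_sketch(body)        # historical behavior
with_schema = parse_args_sketch(body, schemas)  # schema-aware behavior
print(without_schema, with_schema)
```

Without the schema, ``query`` is decoded as the integer 42; with it, ``query`` stays the string ``"42"`` while ``limit`` is still decoded as an integer.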
Two new helpers in ``parsing.py``:
- ``_build_param_type_index`` — accepts either the flat ``ToolSpec``
shape or the OpenAI ``{"type":"function","function":{...}}`` envelope
and returns ``{tool_name: {param_name: schema_fragment}}``.
- ``_coerce_arg_value`` — returns ``(value, used_json_fallback)``; the
bool is True only when ``json.loads`` was tried and raised, so the
``INVALID_JSON`` status fires only for genuine parse failures, not
for schema-driven string preservation.
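The two helpers could look roughly like the sketch below. It follows the contract described above but is not the code in ``parsing.py``: the names drop the leading underscore to mark them as illustrative, and the flat ``ToolSpec`` shape (``name`` plus a JSON-schema ``parameters`` object) is an assumption.

```python
import json
from typing import Any, Optional


def build_param_type_index(tools: Optional[list]) -> dict:
    """Return {tool_name: {param_name: schema_fragment}} for either tool shape."""
    index = {}
    for tool in tools or []:
        # Unwrap the OpenAI envelope if present; otherwise treat the dict as flat.
        spec = tool.get("function", tool) if tool.get("type") == "function" else tool
        name = spec.get("name")
        if not name:
            continue
        properties = (spec.get("parameters") or {}).get("properties") or {}
        index[name] = dict(properties)
    return index


def coerce_arg_value(text: str, schema: Optional[dict]) -> tuple:
    """Return (value, used_json_fallback).

    The flag is True only when json.loads was attempted and raised.
    """
    if schema is not None and schema.get("type") == "string":
        return text, False  # schema-driven preservation, not a parse failure
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        return text, True


flat = {
    "name": "search",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
}
envelope = {
    "type": "function",
    "function": {
        "name": "lookup",
        "parameters": {"type": "object", "properties": {"id": {"type": "integer"}}},
    },
}
index = build_param_type_index([flat, envelope])
print(coerce_arg_value("true", index["search"]["query"]))
print(coerce_arg_value("not json", index["lookup"]["id"]))
```

Returning the flag separately from the value lets a caller distinguish "kept as text because the schema said so" from "kept as text because decoding failed", which is what keeps ``INVALID_JSON`` limited to the second case.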
The JSON-shaped parsers (Qwen3 hermes, Qwen3-VL, DeepSeek-V3, Kimi K2,
Kimi K2.5, Nemotron3, gpt-oss harmony, Default) sidestep this bug
because their wire format quotes strings; they accept the ``tools``
kwarg for API uniformity but ignore it.
Matches the reference behavior of vLLM / SGLang's
``glm45_tool_parser.py`` and ``hermes_tool_parser.py``.
Raised by Robin (Poolside) on PR #21.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The in-repo SGLang HTTP client already had ``tools`` on its ``generate`` signature; plumb it through to the renderer's ``parse_response`` so XML-style parsers can use the schema-aware coercion path. Updates the ``_FakeRenderer`` test double to accept the kwarg and adds an assertion that the client actually forwards it. Downstream callers (e.g. verifiers) need the matching change on their side to opt into schema-aware parsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
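The forwarding and its test can be pictured with a small stand-alone sketch. ``FakeRenderer`` and ``generate_sketch`` are hypothetical stand-ins for the repo's ``_FakeRenderer`` and the client's ``generate``; the real signatures and return types may differ.

```python
from typing import Any, Optional


class FakeRenderer:
    """Test double that records the ``tools`` kwarg it receives."""

    def __init__(self) -> None:
        self.seen_tools: Any = "not called"

    def parse_response(self, token_ids: list, tools: Optional[list] = None) -> dict:
        self.seen_tools = tools
        return {"content": "", "tool_calls": []}  # placeholder result shape


def generate_sketch(renderer, token_ids: list, tools: Optional[list] = None) -> dict:
    """Hypothetical stand-in for the client's generate(): forwards ``tools`` on."""
    return renderer.parse_response(token_ids, tools=tools)


tools = [
    {
        "name": "search",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    }
]

renderer = FakeRenderer()
generate_sketch(renderer, [1, 2, 3], tools=tools)
forwarded = renderer.seen_tools is tools

legacy = FakeRenderer()
generate_sketch(legacy, [1, 2, 3])  # existing call sites: tools defaults to None
print(forwarded, legacy.seen_tools)
```

A downstream caller opts in the same way, by passing its tool list as ``tools=`` when it calls ``parse_response``; call sites that omit the argument keep the historical behavior.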
Summary
XML-style chat templates (Qwen3.5, GLM-5/4.5, MiniMax-M2, Laguna) render tool-call argument values verbatim inside ``<arg_value>X</arg_value>`` (or ``<parameter>X</parameter>``) tags with no quoting. A value of ``true`` and the string ``"true"`` produce identical wire bytes; without the tool schema, the parser has no signal to choose between them and defaults to ``json.loads``, silently corrupting string args that happen to look like JSON. This PR plumbs the schema in and preserves the type.
Raised by Robin (Poolside) on #21.
Changes
API change (additive, backwards-compatible): ``parse_response`` gains ``tools: list[ToolSpec] | None = None``.
When ``tools=None`` (the historical default), behavior is unchanged. When ``tools`` is supplied, the four XML-style parsers consult each parameter's declared JSON-schema ``type`` and preserve declared-``string`` params verbatim. Matches vLLM / SGLang reference parsers (``glm45_tool_parser.py``, ``hermes_tool_parser.py``).
Helpers added to ``renderers/parsing.py``:
- ``_build_param_type_index(tools)``: accepts either flat ``ToolSpec`` or the OpenAI envelope ``{"type":"function","function":{…}}``; returns ``{tool_name: {param_name: schema_fragment}}``.
- ``_coerce_arg_value(text, schema) -> (value, used_json_fallback)``: declared-string → text verbatim; anything else → try ``json.loads``, fall back to text. The bool lets callers flag ``INVALID_JSON`` only on genuine parse failures (not on schema-driven string preservation).
Parsers updated (with ``tools`` plumbing): ``parse_qwen35``, ``parse_glm``, ``parse_minimax``, ``parse_laguna_xs2``.
Parsers that accept but ignore ``tools`` (their wire formats quote strings, so the schema isn't needed): Qwen3 hermes, Qwen3-VL, DeepSeek-V3, Kimi K2, Kimi K2.5, Nemotron3, gpt-oss harmony, Default.
Client (``renderers/client.py``): ``generate()`` already accepted ``tools``; it now forwards it through to ``parse_response`` so HTTP callers get schema-aware parsing automatically.
Downstream callers
Anything outside this repo that calls ``renderer.parse_response(token_ids)`` needs a one-line update to opt into schema-aware parsing: pass ``tools`` through to ``parse_response``.
Existing callers keep working unchanged (default ``tools=None`` preserves the historical behavior).
Test outcome
- ``Qwen/Qwen3-8B``
- ``moonshotai/Kimi-K2-Instruct``
- ``Qwen/Qwen3.5-9B``
- ``zai-org/GLM-5``
- ``MiniMaxAI/MiniMax-M2.5``
- ``poolside/Laguna-XS.2``
Test plan
- ``uv run pytest tests/test_tool_arg_type_preservation.py`` → 30 passed
- ``uv run pytest tests/`` (full suite) → 1067 passed, 49 skipped, 1 xfailed, 0 failed
- ``uv run ruff check renderers/ tests/`` clean
- ``uv run ruff format --check renderers/ tests/`` clean
🤖 Generated with Claude Code