Skip to content

fix(parsing): schema-aware tool-arg coercion for XML-style parsers#28

Merged
hallerite merged 5 commits into
mainfrom
test/xml-tool-arg-type-preservation
May 13, 2026
Merged

fix(parsing): schema-aware tool-arg coercion for XML-style parsers#28
hallerite merged 5 commits into
mainfrom
test/xml-tool-arg-type-preservation

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 13, 2026

Summary

XML-style chat templates (Qwen3.5, GLM-5/4.5, MiniMax-M2, Laguna) render tool-call argument values verbatim inside <arg_value>X</arg_value> (or <parameter>X</parameter>) tags with no quoting. A value of true and the string "true" produce identical wire bytes — without the tool schema, the parser has no signal to choose between them and defaults to json.loads, silently corrupting string args that happen to look like JSON. This PR plumbs the schema in and preserves the type.

Raised by Robin (Poolside) on #21.

Changes

API change — additive, backwards-compatible:

class Renderer(Protocol):
    def parse_response(
        self,
        token_ids: list[int],
        *,
        tools: list[ToolSpec] | None = None,   # ← new
    ) -> ParsedResponse: ...

When tools=None (the historical default), behavior is unchanged. When tools is supplied, the four XML-style parsers consult each parameter's declared JSON-schema type and preserve declared-string params verbatim. Matches vLLM / SGLang reference parsers (glm45_tool_parser.py, hermes_tool_parser.py).

Helpers added to renderers/parsing.py:

  • _build_param_type_index(tools) — accepts either flat ToolSpec or OpenAI envelope {"type":"function","function":{…}}, returns {tool_name: {param_name: schema_fragment}}.
  • _coerce_arg_value(text, schema) -> (value, used_json_fallback) — declared-string → text verbatim; anything else → try json.loads, fall back to text. The bool lets callers flag INVALID_JSON only on genuine parse failures (not on schema-driven string preservation).

Parsers updated (with tools plumbing): parse_qwen35, parse_glm, parse_minimax, parse_laguna_xs2.

Parsers that accept-but-ignore tools (their wire formats quote strings, so schema isn't needed): Qwen3 hermes, Qwen3-VL, DeepSeek-V3, Kimi K2, Kimi K2.5, Nemotron3, gpt-oss harmony, Default.

Client (renderers/client.py): generate() already accepted tools; now forwards it through to parse_response so HTTP callers get schema-aware parsing automatically.

Downstream callers

Anything outside this repo that calls renderer.parse_response(token_ids) needs a one-line update to opt into schema-aware parsing:

parsed = renderer.parse_response(completion_ids, tools=tools)

Existing callers keep working unchanged (default tools=None preserves the historical behavior).

Test outcome

Model Parser shape Before After
Qwen/Qwen3-8B hermes JSON ✅ 5/5 ✅ 5/5
moonshotai/Kimi-K2-Instruct section JSON ✅ 5/5 ✅ 5/5
Qwen/Qwen3.5-9B XML ❌ 0/5 ✅ 5/5
zai-org/GLM-5 XML ❌ 0/5 ✅ 5/5
MiniMaxAI/MiniMax-M2.5 XML ❌ 0/5 ✅ 5/5
poolside/Laguna-XS.2 XML ❌ 0/5 ✅ 5/5

Test plan

  • uv run pytest tests/test_tool_arg_type_preservation.py → 30 passed
  • uv run pytest tests/ (full suite) → 1067 passed, 49 skipped, 1 xfailed, 0 failed
  • uv run ruff check renderers/ tests/ clean
  • uv run ruff format --check renderers/ tests/ clean

🤖 Generated with Claude Code

hallerite and others added 4 commits May 13, 2026 13:51
Every XML-style tool parser (Qwen3.5, GLM, MiniMax) currently
``json.loads`` each ``<arg_value>`` body, so a string argument that
happens to look like JSON ("true", "42", "[1,2,3]") round-trips as a
bool / int / list. The hermes-JSON (Qwen3) and section-JSON (Kimi K2)
parsers sidestep this because their wire format quotes strings; both
serve as controls.

These tests fail loudly against current main — 15 failures across the
three XML parsers, 10 passes on the controls. They are the spec for the
fix, not documentation of accepted behavior; CI stays red until
``parse_response`` becomes schema-aware (Robin PR #21). Laguna-XS.2 has
the same bug and should be added to the matrix when PR #21 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
LagunaXS2Renderer landed in #21; its parser has the same string-type
corruption as the other XML parsers (5/5 cases fail). Count is now 20
failed, 10 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
XML-style chat templates (Qwen3.5, GLM-5/4.5, MiniMax-M2, Laguna) render
tool-call argument values verbatim inside ``<arg_value>X</arg_value>``
(or ``<parameter>X</parameter>``) tags with no quoting. A value of
``true`` and the string ``"true"`` produce identical wire bytes; without
the tool schema, the parser has no signal to choose between them and
defaults to ``json.loads`` — silently corrupting string args that look
like JSON.

This adds ``tools: list[ToolSpec] | None = None`` to ``parse_response``
on the ``Renderer`` Protocol and every concrete renderer. When supplied,
the four XML-style parsers (``parse_qwen35``, ``parse_glm``,
``parse_minimax``, ``parse_laguna_xs2``) consult each parameter's
declared JSON-schema ``type`` and preserve declared-string params
verbatim. Without ``tools``, behavior is unchanged.

Two new helpers in ``parsing.py``:

- ``_build_param_type_index`` — accepts either the flat ``ToolSpec``
  shape or the OpenAI ``{"type":"function","function":{...}}`` envelope
  and returns ``{tool_name: {param_name: schema_fragment}}``.
- ``_coerce_arg_value`` — returns ``(value, used_json_fallback)``; the
  bool is True only when ``json.loads`` was tried and raised, so the
  ``INVALID_JSON`` status fires only for genuine parse failures, not
  for schema-driven string preservation.

The JSON-shaped parsers (Qwen3 hermes, Qwen3-VL, DeepSeek-V3, Kimi K2,
Kimi K2.5, Nemotron3, gpt-oss harmony, Default) sidestep this bug
because their wire format quotes strings; they accept the ``tools``
kwarg for API uniformity but ignore it.

Matches the reference behavior of vLLM / SGLang's
``glm45_tool_parser.py`` and ``hermes_tool_parser.py``.

Raised by Robin (Poolside) on PR #21.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@hallerite hallerite changed the title test: pin tool-arg string-type corruption in XML-style parsers fix(parsing): schema-aware tool-arg coercion for XML-style parsers May 13, 2026
The in-repo SGLang HTTP client already had ``tools`` on its ``generate``
signature; plumb it through to the renderer's ``parse_response`` so
XML-style parsers can use the schema-aware coercion path. Updates the
``_FakeRenderer`` test double to accept the kwarg and adds an assertion
that the client actually forwards it.

Downstream callers (e.g. verifiers) need the matching change on their
side to opt into schema-aware parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@hallerite hallerite force-pushed the test/xml-tool-arg-type-preservation branch from 0a68898 to 1186398 Compare May 13, 2026 14:36
@hallerite hallerite merged commit 4062040 into main May 13, 2026
6 checks passed
@hallerite hallerite deleted the test/xml-tool-arg-type-preservation branch May 13, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant