
feat: renderer emits numpy, not torch #18

Merged
hallerite merged 1 commit into main from feat/renderer-numpy-mm-items on May 12, 2026

Conversation

@hallerite (Member)

Summary

HF processors return torch tensors by default; the multimodal renderers were passing those tensors through to mm_items, which leaked torch into the renderer's data model and forced downstream transport layers (verifiers, which doesn't declare torch as a dependency) to handle torch tensors.

Switch all three multimodal processors (Qwen3-VL, Qwen3.5/3.6, Kimi-K2.5) to return_tensors="np". mm_items[i]["pixel_values"] and friends now ship numpy arrays. The renderer is torch-free; downstream consumers (the vLLM-glue helper in client.py, the trainer in prime-rl) convert via torch.from_numpy / torch.as_tensor at their boundary, where torch is already a hard dependency.
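
A minimal sketch of the boundary contract this sets up (function names and the mm_items shape here are illustrative, not the actual renderer API):

```python
import numpy as np
import torch

def process_image(processor, image):
    # Renderer side: HF processors accept return_tensors="np" and
    # hand back numpy arrays, so nothing torch-typed enters mm_items.
    features = processor(images=image, return_tensors="np")
    return {"pixel_values": features["pixel_values"]}  # np.ndarray

def tensorize(mm_item):
    # Consumer side (client/trainer): convert at the boundary,
    # where torch is already a hard dependency.
    return {k: torch.from_numpy(v) if isinstance(v, np.ndarray)
            else torch.as_tensor(v)
            for k, v in mm_item.items()}
```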

Why

Architectural cleanup: the renderer shouldn't know about tensorization. numpy is the lowest-common-denominator format that's already a dep of every realistic consumer (HF, torch, vLLM, jax) and is natively serializable by verifiers' msgpack encoder. This lets verifiers drop its torch handling branch (companion PR opens shortly).
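
For context, the generic pattern for round-tripping numpy arrays through msgpack looks like the sketch below; this illustrates why numpy is the convenient wire type, and is not verifiers' actual encoder:

```python
import msgpack
import numpy as np

def encode(obj):
    # Pack an ndarray as dtype + shape + raw bytes.
    if isinstance(obj, np.ndarray):
        return {"__ndarray__": True, "dtype": str(obj.dtype),
                "shape": list(obj.shape), "data": obj.tobytes()}
    raise TypeError(f"unsupported type: {type(obj)!r}")

def decode(obj):
    # Restore tagged dicts back into ndarrays on unpack.
    if obj.get("__ndarray__"):
        return np.frombuffer(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])
    return obj

packed = msgpack.packb(np.arange(6, dtype=np.float32).reshape(2, 3), default=encode)
restored = msgpack.unpackb(packed, object_hook=decode)
```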

What changes

  • Qwen3VLRenderer._process_image — return_tensors="pt" → "np"
  • Qwen35Renderer._process_image — same
  • KimiK25Renderer._process_image — same
  • client._build_qwen_vl_features — wrap numpy items in torch.as_tensor before torch.cat / BatchFeature (sketched after this list). Torch is already imported here (the vLLM payload encoder is a torch-required code path).
  • Bump version 0.1.7 → 0.1.8.
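
A hedged sketch of that client-side conversion (the image_grid_thw key and exact batching are assumptions based on the Qwen-VL processor family, not this repo's code):

```python
import torch
from transformers import BatchFeature

def build_qwen_vl_features(mm_items):
    # mm_items now carry numpy arrays; tensorize before torch.cat.
    pixel_values = torch.cat(
        [torch.as_tensor(item["pixel_values"]) for item in mm_items], dim=0)
    grid_thw = torch.cat(
        [torch.as_tensor(item["image_grid_thw"]) for item in mm_items], dim=0)
    return BatchFeature(data={"pixel_values": pixel_values,
                              "image_grid_thw": grid_thw})
```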

Test plan

  • Full renderer suite green (980 passed, 49 skipped, 1 xfailed)
  • Multimodal byte-parity vs HF processor unchanged (compares input_ids, not pixel arrays)
  • Companion verifiers PR drops torch branch from msgpack_encoder and bumps to renderers>=0.1.8
  • prime-rl bumps renderers>=0.1.8; trainer-side decode_tensor_payload already handles the encoded wire format identically (a numpy or torch source produces the same __torch_tensor__-tagged payload; illustrated below)
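
To illustrate that last bullet, a hypothetical normalize-then-tag encoder: the payload layout, field names, and the encode_tensor_payload helper are assumptions for illustration; only the __torch_tensor__ tag name comes from the PR text.

```python
import numpy as np
import torch

def encode_tensor_payload(x):
    # Hypothetical encoder: normalize torch -> numpy first, so both
    # source types produce byte-identical tagged payloads.
    if isinstance(x, torch.Tensor):
        x = x.detach().cpu().numpy()
    return {
        "__torch_tensor__": True,
        "dtype": str(x.dtype),   # e.g. "float32"
        "shape": list(x.shape),
        "data": x.tobytes(),
    }

a = np.ones((2, 3), dtype=np.float32)
# numpy source and torch source encode identically.
assert encode_tensor_payload(a) == encode_tensor_payload(torch.from_numpy(a))
```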

@hallerite force-pushed the feat/renderer-numpy-mm-items branch from ebc1f1b to a2aa322 on May 12, 2026 17:24
HF processors return torch by default; multimodal renderers were
passing those tensors through to ``mm_items``, which leaked torch
into the renderer's data model and forced downstream transport
layers (verifiers) to handle torch tensors despite not declaring
torch as a dependency.

Switch all three multimodal processors (Qwen3-VL, Qwen3.5/3.6,
Kimi-K2.5) to ``return_tensors="np"``. ``mm_items[i]["pixel_values"]``
and friends now ship numpy arrays. Renderer is torch-free; downstream
consumers (vLLM-glue helper here in client.py, trainer in prime-rl)
convert via ``torch.from_numpy`` / ``torch.as_tensor`` at their
boundary where torch is already a hard dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@hallerite force-pushed the feat/renderer-numpy-mm-items branch from a2aa322 to 4c78db5 on May 12, 2026 17:29
@hallerite merged commit 8d2a7d3 into main on May 12, 2026
6 checks passed
@hallerite deleted the feat/renderer-numpy-mm-items branch on May 12, 2026 17:40