
chore: relaunch vllm 0.20.1 bump #2448

Merged
samsja merged 1 commit into main from chore/relaunch-vllm-0.20 on May 8, 2026
Conversation

@samsja (Member) commented May 8, 2026

Re-applies #2427 (vllm 0.20.1 bump) after revert in #2437.

What

  • Reverts commit bb522cc ("Revert 'bump vllm version (#2427)'"), which restores:
    • vllm>=0.20.1
    • flash-attn wheel pinned to +cu128torch2.11
    • torchvision / torchaudio deps via pytorch-cu128
    • removal of monkey_patch_fused_moe_lora_dp (fixed in vLLM 0.20 via #40338)
    • removal of monkey_patch_offloading_connector_cpu_block_count (fixed in vLLM 0.20 via #39617)
  • monkey_patch_fp32_lm_head (#2441, fp32 lm_head via native bf16xbf16 -> fp32 mm, alt to #2438) is preserved as-is; auto-merge handled the overlap with the patches removed by the bump.

🤖 Generated with Claude Code


Note

Medium Risk
Upgrades core inference dependencies (vLLM/PyTorch/Flash-Attn), which can change runtime behavior and performance across serving and distributed execution despite minimal application-code changes.

Overview
Updates the inference dependency stack to vLLM >=0.20.1 and aligns CUDA 12.8 wheels by adding torchvision/torchaudio from the pytorch-cu128 index and updating the pinned flash-attn wheel to the +cu128torch2.11 build.

Simplifies local vLLM patching by removing monkey patches that are now fixed upstream (the LoRA+MoE+DP corruption workaround and the offloading connector CPU block count fix), and adjusts comments/config (exclude-newer-package now unblocks vllm).

Reviewed by Cursor Bugbot for commit 464bea4. Bugbot is set up for automated code reviews on this repo.

@samsja samsja marked this pull request as ready for review May 8, 2026 16:58
@samsja samsja merged commit 09e403b into main May 8, 2026
13 of 14 checks passed
