Merged · Changes from all commits (627 commits)
735284e
[responsesAPI][7] Browser, Container MCP tools for non harmony models…
qandrew Dec 8, 2025
344b50d
Address comment to mergify.yml in #30117 (#30219)
ZhijianJiang Dec 8, 2025
d726a7b
[BugFix] Unblock use of LoRA with data parallel mode (#30220)
njhill Dec 8, 2025
c6df05e
[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kern…
ZhiweiYan-96 Dec 8, 2025
d143271
[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178)
ZJY0516 Dec 8, 2025
cd00c44
[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091)
Edwardf0t1 Dec 8, 2025
bcb6f59
[Perf] Remove sync point in vit torch sdpa attn backend (#30232)
DamonJiang777 Dec 8, 2025
9e77ffc
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. …
noooop Dec 8, 2025
408cf42
[CI] Prevents triggering of an inactive issue/PR check for forked rep…
wzshiming Dec 8, 2025
2e660c2
[Frontend] Binary embedding response does not return metadata by sett…
noooop Dec 8, 2025
77072e9
[docs] governance documents (#24801)
simon-mo Dec 8, 2025
5c2433a
Add tip for `mypy` and `markdownlint` to the pre-commit comment (#30259)
hmellor Dec 8, 2025
80433e2
[LoRA] Reduce the loading time of MoE LoRA (#30243)
jeejeelee Dec 8, 2025
eb1051f
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
yeqcharlotte Dec 8, 2025
184076c
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
dcampora Dec 8, 2025
87aee9e
Add evaluate_guards option to DynamicShapesConfig (#27432)
laithsakka Dec 8, 2025
67312ca
[Misc] Split the LoRA code (#30253)
jeejeelee Dec 8, 2025
398a596
[MP executor] fix get device count for multi node of mp executor feat…
weiguihua2 Dec 8, 2025
fcd5306
Add latent MoE support (#30203)
shaharmor98 Dec 8, 2025
d1b5e7a
[TPU] Bump tpu-inference to 0.12.0 (#30221)
jcyang43 Dec 8, 2025
0d402d2
online fp8 quant with streaming weight post-processing (#29196)
vkuzo Dec 8, 2025
799804d
Bump nvshmem to 3.3.24 and fix CUDA 13 installation (#30149)
dmitry-tokarev-nv Dec 8, 2025
ae0f69b
Add SpecDec support to `selective_state_update` (#29488)
roikoren755 Dec 8, 2025
6af70e1
[ROCm][CI] Fix test_max_len.py for Rocm (#29916)
charlifu Dec 8, 2025
1fb632f
[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatter…
IwakuraRein Dec 8, 2025
60d1725
[Disagg] Support large batch size in proxy server and update NixlConn…
minosfuture Dec 9, 2025
f1599ca
feat(metrics): Add prefill KV compute metric excluding cached tokens …
ziliangpeng Dec 9, 2025
9d6235c
[moe] Allow disabling DP chunking (#29936)
minosfuture Dec 9, 2025
d941709
[Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching…
yewentao256 Dec 9, 2025
0ee6416
[Perf] Optimize `group_topk` kernel, 1.9% Throughput improvement, 2.1…
yewentao256 Dec 9, 2025
ae339b1
[Bugfix] Fix DeepGEMM after #29546 (#30267)
zhewenl Dec 9, 2025
7b35011
Mark qwen2_5_vl as xfail (#30283)
gmagogsfm Dec 9, 2025
e41312a
[Bugfix] Skip generation config fallback for GGUF to prevent multi-pr…
kitaekatt Dec 9, 2025
78c7503
[ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI (#29420)
micah-wil Dec 9, 2025
db14f61
[ci] Refactor CI file structure (#29343)
khluu Dec 9, 2025
ea657f2
Lora MoE Align Improvements (#29257)
gnovack Dec 9, 2025
f6227c2
[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)
czhu-cohere Dec 9, 2025
03b91f7
[Bugfix] Fix compressed-tensors models failing to load with transform…
mgoin Dec 9, 2025
4c6fd25
kv_transfer: Rename the shared storage connectors (#30201)
orozery Dec 9, 2025
4b03b50
update torchao safetensors impl (#30155)
liangel-02 Dec 9, 2025
e130845
[CPU][CI] Enable fused MoE tests in Arm CI (#30132)
fadara01 Dec 9, 2025
c2e1987
[Doc] update Intel GPU MM status in Feature x Hardware matrix (#30294)
faaany Dec 9, 2025
58d5b3f
[Model][Quantization] Restore MoE + GGUF models support (incl. Qwen3 …
a4lg Dec 9, 2025
e4605d2
[Misc] Fix safetensors import for safe_open (#30300)
hyongtao-code Dec 9, 2025
aed8469
[Attention] Make `split_decodes_and_prefills(..., require_uniform=Tru…
LucasWilkinson Dec 9, 2025
aeb82b1
[CI] Fix Flaky test_eagle_max_len Test (#30306)
micah-wil Dec 9, 2025
9c32df6
[Bugfix] Qwen 3 VL Embedding loading (#30303)
noooop Dec 9, 2025
67475a6
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA…
FENP Dec 9, 2025
c72ea10
[Structured Output][Reasoning] Improves decoding throughput for model…
hdlj-h Dec 9, 2025
03416ea
[bugfix][quantization] Fix fp8 per_tensor scale shape (#30257)
haoyangli0109 Dec 9, 2025
1166c31
[Bugfix]: Fix glm46 awq marlin moe wna16 compatibility (#30210)
baonudesifeizhai Dec 9, 2025
ee14644
[ROCm] Aiter Quant Kernels (#25552)
vllmellm Dec 9, 2025
5c213d2
[BUGFIX] Mistral tool call parser v11+ (#30332)
juliendenize Dec 9, 2025
5dcd593
[Feature] Batch-Invariant Support for FA2 and LoRA (#30018)
quanliu1991 Dec 9, 2025
56037df
[BugFix] Fix `assert batch_descriptor.num_tokens == num_tokens_padde…
LucasWilkinson Dec 9, 2025
83319b4
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 m…
yewentao256 Dec 9, 2025
804e346
Update AMD test definitions (2025-12-08) (#30298)
Alexei-V-Ivanov-AMD Dec 9, 2025
0b6a8a3
[BugFix] Fix non detected failing tests (#30277)
ilmarkov Dec 9, 2025
9e6562a
[Model Runner V2] Fix Triton warning on tl.where (#30355)
WoosukKwon Dec 9, 2025
d471b2a
[Model Runner V2] Support num NaNs in logits (#30187)
WoosukKwon Dec 9, 2025
e858bfe
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
benchislett Dec 9, 2025
95501a7
[BugFix] Fix DeepSeek-R1 hang with DP and MTP (#30119)
LucasWilkinson Dec 9, 2025
b37bf51
[CI/Test] Fix FP8 per-tensor quant test reference scale shape (#30352)
LucasWilkinson Dec 9, 2025
73a484c
[Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models (#3…
a4lg Dec 9, 2025
7cab92f
Bump actions/checkout from 6.0.0 to 6.0.1 (#30233)
dependabot[bot] Dec 9, 2025
f8dacc6
Bump actions/stale from 10.1.0 to 10.1.1 (#30234)
dependabot[bot] Dec 9, 2025
7618dc9
[CI/Build] Make test_mha_attn.py run on correct platform only and che…
rasmith Dec 9, 2025
00e5cbb
[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply (#2…
bnellnm Dec 9, 2025
fccd532
[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480)
kylesayrs Dec 9, 2025
3c680f4
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + f…
charlifu Dec 9, 2025
2e7054d
Improve wvsplitK tile and balance heuristics. (#29937)
amd-hhashemi Dec 9, 2025
03b5f94
[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync (#29…
dongbo910220 Dec 10, 2025
4c2e10e
[Bugfix] Fix cuda graph sizes when running with speculative decoding …
PatrykSaffer Dec 10, 2025
2e7035d
[Bugfix] Fix fp8 DeepGemm compilation issues (#30336)
ElizaWszola Dec 10, 2025
abe93bc
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to …
LucasWilkinson Dec 10, 2025
c3487ac
[responsesAPI][6] Fix multi turn MCP tokenization (#30230)
qandrew Dec 10, 2025
b75f826
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS …
rasmith Dec 10, 2025
7d80c73
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_a…
micah-wil Dec 10, 2025
0646239
[bugfix][quantization] fix quark qwen3 kv_cache quantization (#30308)
haoyangli0109 Dec 10, 2025
3bdd426
Fix typos in comments across multiple files (#30345)
wilsonwu Dec 10, 2025
d007387
[Bugfix] Cache added_vocab to avoid per-token overhead (#30351)
scratch-ml Dec 10, 2025
1803458
[CMake][Build]: Remove unused ACL CMake env variables (#30339)
Radu2k Dec 10, 2025
ed7af31
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e th…
AndreasKaratzas Dec 10, 2025
434ac76
[cpu][ci] Add CPU Attention Tests for Neon Backend (#30347)
fadara01 Dec 10, 2025
9db78f3
[Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_outp…
chaunceyjiang Dec 10, 2025
53d2420
[Bugfix] tpu_model_runner: set vllm config context when calling reset…
dtrifiro Dec 10, 2025
cebda2a
[CPU] Support for Whisper (#30062)
aditew01 Dec 10, 2025
d017bce
[BugFix] Fix minimax m2 model rotary_dim (#30384)
rogeryoungh Dec 10, 2025
c756fb6
[Core] Whisper enable `FULL_DECODE_ONLY` CudaGraph (#30072)
NickLucche Dec 10, 2025
aacf0ab
[BugFix] Fix `AttributeError: 'MergedColumnParallelLinear' object has…
LucasWilkinson Dec 10, 2025
2dcbac9
[Docs] Generate full list of metrics in user docs (#30388)
markmc Dec 10, 2025
794a787
[Misc] Consistent case for `vllm bench serve` results (#30403)
MatthewBonanni Dec 10, 2025
253305d
[Chore] Delay recent deprecations (#30398)
DarkLight1337 Dec 10, 2025
e8e8cd7
[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing…
anker-c2 Dec 10, 2025
a9e4106
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
wseaton Dec 10, 2025
e72d65b
[Deprecation] Remove tokenizer setter (#30400)
DarkLight1337 Dec 10, 2025
9f042ba
[Perf] Enable environment cache in EngineCore to enable the feature f…
Jialin Dec 10, 2025
eea4180
[bug] Fix "Current vLLM config is not set." warnings when FlashInfer …
nvpohanh Dec 10, 2025
6ccb7ba
[LMCache] Fix breakage due to new LMCache version (#30216)
njhill Dec 10, 2025
fcb8942
[Docs] Update EPLB docs (#30426)
mgoin Dec 10, 2025
b9e0951
[docs] Improve wide-EP performance + benchmarking documentation (#27933)
eicherseiji Dec 10, 2025
166ac3c
fix(shm): Add memory barriers for cross-process shared memory visibil…
kitaekatt Dec 10, 2025
8580919
[Bugfix] fix confusing OOM errors during v1 init (#28051)
shivampr Dec 10, 2025
25221b4
Add more docs for regex (#30106)
xu-song Dec 11, 2025
b4054c8
Revert "[CI] Add Async Eplb nightly CI tests (#29385)" (#30431)
SageMoore Dec 11, 2025
b51255f
[ROCm] Fix broken import in platform attention backend dispatching (#…
AndreasKaratzas Dec 11, 2025
d1e1fb4
[Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grou…
divakar-amd Dec 11, 2025
5a87d8b
[Deprecation] Remove deprecated plugin and compilation fields for v0.…
DarkLight1337 Dec 11, 2025
7e24e5d
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
DarkLight1337 Dec 11, 2025
d6464f2
[Chore] Fix torch precision warning (#30428)
yewentao256 Dec 11, 2025
1a51655
[Doc] Add Baidu Kunlun XPU support (#30455)
xyDong0223 Dec 11, 2025
36c9ce2
Ensure minimum frames for GLM 4.6V compatibility (#30285)
gh-wf Dec 11, 2025
979f50e
[Deprecation] Remove fallbacks for `embed_input_ids` and `embed_multi…
DarkLight1337 Dec 11, 2025
d02d104
fix: enhance human_readable_int function (#30337)
andyxning Dec 11, 2025
fba8906
[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in …
minosfuture Dec 11, 2025
6299628
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating…
JaviS-Rei Dec 11, 2025
b4e8b91
[Fix]fix import error from lmcache (#30376)
wz1qqx Dec 11, 2025
13d63b6
[Deprecation] Remove missed fallback for `embed_input_ids` (#30469)
DarkLight1337 Dec 11, 2025
4515eb1
[Fix] Update lazing loading of video loader backend (#30444)
jeremyteboul Dec 11, 2025
a5f9fb5
[Deprecation] Deprecation `--convert reward`, use `--convert embed` i…
noooop Dec 11, 2025
d917747
[Bugfix] Fix `task` still being passed in tests/benchmarks (#30476)
DarkLight1337 Dec 11, 2025
853611b
Fix typo of endpoint name in CLI args docs (#30473)
kmaehashi Dec 11, 2025
a11f4a8
[Misc][PCP&DCP] relocate PCP feature check (#30050)
pisceskkk Dec 11, 2025
f4417f8
[KVConnector] Add KV events to KV Connectors (#28309)
hickeyma Dec 11, 2025
3a3b06e
[Misc] Improve error message for `is_multimodal` (#30483)
DarkLight1337 Dec 11, 2025
97a042f
Make the `httpx` logger less annoying when Transformers v5 is install…
hmellor Dec 11, 2025
17cb540
[Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wh…
ioghiban Dec 11, 2025
93db325
Give pooling examples better names (#30488)
hmellor Dec 11, 2025
305b168
[CI] refine more logic when generating and using nightly wheels & ind…
Harry-Chen Dec 11, 2025
aa3c250
[IMPROVEMENT] Change MistralReasoningParser behavior (#30391)
juliendenize Dec 11, 2025
8781cd6
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
hmellor Dec 11, 2025
0e71eaa
[Feature] AWQ marlin quantization support for fused moe with lora (#3…
princepride Dec 11, 2025
72aaac5
[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for s…
AndreasKaratzas Dec 11, 2025
e458270
[Misc] Add mcp to requirements (#30474)
yeqcharlotte Dec 11, 2025
92fea56
[compile] Stop one-off setting enable_aot_compile and use context man…
zhxchen17 Dec 11, 2025
cf3eacf
Standardise `get_rope` to use `rope_parameters["partial_rotary_factor…
hmellor Dec 11, 2025
90d6cf9
[BugFix][MM]support VLLM_RANDOMIZE_DP_DUMMY_INPUTS (#30472)
charlotte12l Dec 11, 2025
0efd9f8
[Core] Whisper Enable Encoder Batching (#29421)
NickLucche Dec 11, 2025
3efdc3f
[Docs][CPU backend] Add pre-built Arm CPU Docker images (#30491)
ioghiban Dec 11, 2025
c817b14
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvemen…
yewentao256 Dec 11, 2025
61249b1
[Refactor] Remove useless syncwarp (#30510)
yewentao256 Dec 11, 2025
a00d889
[EPLB] Support EPLB w/ NVFP4 (#29804)
andrewbriand Dec 11, 2025
2cc5aff
[ROCM][CI] Fix AMD Examples Test Group (#30276)
Concurrensee Dec 11, 2025
d527cf0
[FIX]Patch run-cluster.sh (fix for #28328) (#30002)
evberrypi Dec 11, 2025
48661d2
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm…
rasmith Dec 12, 2025
0ab23c2
[fix] fix SM check for Flashinfer TRTLLM MOE (#30314)
jiahanc Dec 12, 2025
ba80926
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they re…
rasmith Dec 12, 2025
b5945d4
[ROCm][CI] Use mi325_4 agent pool for V1 e2e tests (#30526)
AndreasKaratzas Dec 12, 2025
042da73
[Core] Refactor `_build_attention_metadata` (#29628)
LucasWilkinson Dec 12, 2025
f355ad5
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
fadara01 Dec 12, 2025
6a6fc41
gptq marlin quantization support for fused moe with lora (#30254)
Bhanu068 Dec 12, 2025
9f2fc16
[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505)
mgoin Dec 12, 2025
947dfda
[LMCache] Relax lmcache version requirement (#30425)
njhill Dec 12, 2025
197473c
[CI/Build] Use spawn subprocess for ROCm (#30272)
rjrock Dec 12, 2025
783644e
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficien…
AndreasKaratzas Dec 12, 2025
fe17871
[compile] Parse compile range cache keys as Range during cache loadin…
zhxchen17 Dec 12, 2025
8f8fda2
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
bbrowning Dec 12, 2025
302b2c1
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementat…
rasmith Dec 12, 2025
f90319d
[Bugfix] Schedule failure due to wrong get_image_size_with_most_featu…
tomtomjhj Dec 12, 2025
91401c7
[Bugfix] Fix CMakeLists Environment Variable (#21804)
wu-kan Dec 12, 2025
3e41992
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3…
LucasWilkinson Dec 12, 2025
3e34adc
[DeepSeek V3.2] Proper drop_thinking logic (#30490)
vladnosiv Dec 12, 2025
dc13c99
fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408)
kitaekatt Dec 12, 2025
09ad3b7
[Bug] Fix attention_backend arg string parsing (#30534)
mgoin Dec 12, 2025
9c0ee99
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)
jvlunteren Dec 12, 2025
f3237f3
[Frontend] Fixes anthropic streaming message_start usage nesting (#30…
bbartels Dec 12, 2025
d2c919d
[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)
realliujiaxu Dec 12, 2025
02a5880
[CI] Fix mypy for vllm/v1/executor (#30517)
yewentao256 Dec 12, 2025
cd7740a
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)
shivampr Dec 12, 2025
1f19d8f
[Perf] Set split_k to 1 for triton_kernels (#30528)
xyang16 Dec 12, 2025
9693dd0
[CI/Build] Add x86 CPU wheel release pipeline (#28848)
bigPYJ1151 Dec 12, 2025
6ec0d8d
[Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29…
danielafrimi Dec 12, 2025
1361862
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate …
danielafrimi Dec 12, 2025
1e6b115
[Refactor] Reduce duplicate code in `per_token_group_quant` cuda kern…
yewentao256 Dec 12, 2025
b4039c0
[ci] Mark PrimeRL integration test as soft fail (#30578)
khluu Dec 12, 2025
08f8a56
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use …
rasmith Dec 12, 2025
86a3261
[Bugfix] Pass FA version in `MultiHeadAttention` (#30575)
MatthewBonanni Dec 13, 2025
fc01194
Add IBM and Red Hat to compute resources sponsors (#30581)
mgoin Dec 13, 2025
f5dfbbd
[Docs] Remove references to `VLLM_ATTENTION_BACKEND` (#30564)
MatthewBonanni Dec 13, 2025
2f32a68
[CI] Update several models in registry that are available online now …
mgoin Dec 13, 2025
57e9bf1
[CI] Whisper logprobs tests (#30504)
NickLucche Dec 13, 2025
4fa7ce4
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
LopezCastroRoberto Dec 13, 2025
fdc135d
[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight mat…
a4lg Dec 13, 2025
b09806e
[Bugfix] Dictionary MM embeddings for online chat (#30507)
DarkLight1337 Dec 13, 2025
1cec5b7
[Scheduer] Simplify stop checking for pooling models (#30591)
njhill Dec 13, 2025
64251f4
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
DarkLight1337 Dec 13, 2025
e5db3e2
[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)
Isotr0py Dec 13, 2025
ace34e3
[Bugfix] Qwen3-next with --hf-overrides \{\"num_hidden_layers\":8\} …
heheda12345 Dec 13, 2025
39cefbd
[Refactor] `TokenizerRegistry` only uses lazy imports (#30609)
DarkLight1337 Dec 13, 2025
763963a
set assume_32bit_indexing and pass unbacked hints (#30459)
laithsakka Dec 13, 2025
ddbfbe5
[Docs] Clarify Expert Parallel behavior for attention and MoE layers …
majiayu000 Dec 13, 2025
7c16f3f
[Doc] Add documents for multi-node distributed serving with MP backen…
Isotr0py Dec 13, 2025
6e78ed6
[Logs] Optimize startup logs 4 (#29903)
yewentao256 Dec 13, 2025
24429d5
[Doc] Add instructions for building docker image on GB300 with CUDA13…
soodoshll Dec 13, 2025
dc7fb5b
[Bug][KVConnector][Metrics] Remove a vacuous assertion breaking exter…
QierLi Dec 14, 2025
29f7d97
Improve parse_raw_prompt test cases for invalid input .v2 (#30512)
mivehk Dec 14, 2025
97f2f16
[ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy T…
micah-wil Dec 14, 2025
f569c65
enable unbacked with aot_compile (#30462)
laithsakka Dec 14, 2025
dcb3119
[Chore] Remove redundant `RequestPrompt` (#30612)
DarkLight1337 Dec 14, 2025
add1b9d
[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched i…
drslark Dec 14, 2025
1a55cfa
[Doc]: fixing typos in various files (#30540)
didier-durand Dec 14, 2025
3a20450
Add AudioFlamingo3 model support (#30539)
lashahub Dec 14, 2025
3224ea9
[torch.compile] Add encoder tag for compilation (#30489)
ilmarkov Dec 14, 2025
e9add12
[Bugfix] awq_gemm: fix argument order swap (#30364)
mgehre-amd Dec 14, 2025
0608936
fix: Update json features supported by xGrammar (#30390)
johannesflommersfeld Dec 14, 2025
0bb0bae
Nvidia ModelOpt workaround for issue 28072 (#30164)
shengliangxu Dec 14, 2025
6ecc1e4
[Bugfix] fix _get_quant_method of FusedMoE for deepseekV3.2 on non-NV…
tom-zju Dec 14, 2025
a8ec486
[Misc] Add a script to benchmark compilation time (#29919)
desertfire Dec 14, 2025
5b64ac2
[Bugfix] Update get_processor_data to use get_all method (#30583)
dbotwinick Dec 14, 2025
48b8456
[Bugfix] Revert Qwen2-VL part of change in #28271 (#30542)
zifeitong Dec 14, 2025
994acec
[Bugfix] Fix fusion for VL models (#30244)
ElizaWszola Dec 14, 2025
5ccf0ef
[Bugfix] Improve error messages in ModelConfig validation (#30213)
yifant-code Dec 14, 2025
ae88aad
[Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29…
skyloevil Dec 14, 2025
add4b0c
[Bugfix][benchmarks] Fix input token calculation for rerank benchmark…
Flink-ddd Dec 14, 2025
9e33a1a
[Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 …
a4lg Dec 14, 2025
ae2e503
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665…
xuechendi Dec 14, 2025
9ccbf6b
[responsesAPI]add extra body parameters (#30532)
Ri0S Dec 14, 2025
174e39e
CPU KV Offloading: Use more CUDA streams (#29013)
orozery Dec 14, 2025
e2ed238
Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatica…
robertgshaw2-redhat Dec 15, 2025
917fdae
[Log] Skip piecewise cudagraph warn when using full cudagraph (#30657)
BoyuanFeng Dec 15, 2025
738648f
[CustomOp] Support object-level enable for CustomOp (#30547)
shen-shanshan Dec 15, 2025
84e23d1
additional protection for CVE-2025-62164 (#30649)
wenqiglantz Dec 15, 2025
87b4d15
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the…
shen-shanshan Dec 15, 2025
a524d1b
[Bugfix] Fix deepseek_v32 tokenizer_mode (#30658)
jeejeelee Dec 15, 2025
b337647
[Bugfix] Drop empty tool_calls lists to keep assistant replies in cha…
seokhyunan Dec 15, 2025
3778673
[Feat] Refactor for `parallel_config` in `FusedMoEModularKernel` (#30…
yewentao256 Dec 15, 2025
e3a1cd1
[XPU] fix Dockerfile.xpu, avoid wheel conflicts (#30662)
jikunshang Dec 15, 2025
1adeb3b
[New Model] BAGEL support (AR only) (#28439)
princepride Dec 15, 2025
3327807
typing: Add type hints to TurnMetrics class in context.py (#30552)
yurekami Dec 15, 2025
4429d93
[Model] Automatic conversion of TokenClassification model (#30666)
noooop Dec 15, 2025
e4806d9
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm mo…
iwzbi Dec 15, 2025
185c22b
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allo…
NickLucche Dec 15, 2025
2a1776b
[Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
chaunceyjiang Dec 15, 2025
ed586e7
[Refactor] [3/N] Move tool parser tests and run on CPU (#30693)
DarkLight1337 Dec 15, 2025
3f175f1
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670)
maxyanghu Dec 15, 2025
d0502b4
[MoE][Refactor 1/N] Separate Online Quantization (#30627)
robertgshaw2-redhat Dec 15, 2025
855b101
[Frontend] add tools for dsv32 developer role (#30040)
yjc9696 Dec 15, 2025
17fec3a
[Bugfix] Fix missing first token in tool calls during reasoning-to-to…
mondaylord Dec 15, 2025
970713d
Remove `SkipValidation` from `ModelConfig` (#30695)
hmellor Dec 15, 2025
ec154c3
[Platform] Refactor Platform attention backend selection to avoid bre…
Isotr0py Dec 15, 2025
51e5b3e
[Bugfix] Fix ViT with FlashAttention on ROCm (#30703)
MatthewBonanni Dec 15, 2025
b2191ab
[docs][fix] Update Arm CPU vLLM wheel installation docs (#30594)
fadara01 Dec 15, 2025
a450c64
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid…
mgoin Dec 15, 2025
60dbf7d
Update batch invariant to use attention config (#30704)
MatthewBonanni Dec 15, 2025
c01d589
[Benchmarks] `auto_tune.sh`: Use hostname variable for server request…
KevinMusgrave Dec 15, 2025
cef5daa
sync upstream
kliuae Dec 16, 2025
f43eaa4
sync up to v0.13.0
kliuae Dec 16, 2025
884cfb1
fix native silu and mul op
kliuae Dec 17, 2025
d6113b1
wrap triton a8w8 blockscale gemm as custom op
kliuae Dec 17, 2025
8392e81
fix typo
kliuae Dec 18, 2025
24 changes: 24 additions & 0 deletions .buildkite/ci_config.yaml
@@ -0,0 +1,24 @@
name: vllm_ci
job_dirs:
- ".buildkite/test_areas"
- ".buildkite/image_build"
run_all_patterns:
- "docker/Dockerfile"
- "CMakeLists.txt"
- "requirements/common.txt"
- "requirements/cuda.txt"
- "requirements/build.txt"
- "requirements/test.txt"
- "setup.py"
- "csrc/"
- "cmake/"
run_all_exclude_patterns:
- "docker/Dockerfile."
- "csrc/cpu/"
- "csrc/rocm/"
- "cmake/hipify.py"
- "cmake/cpu_extension.cmake"
registries: public.ecr.aws/q9t5s3a7
repositories:
main: "vllm-ci-postmerge-repo"
premerge: "vllm-ci-test-repo"
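
The new ci_config.yaml points the CI generator at the job-definition directories in job_dirs and lists run_all_patterns whose changes should trigger the full test suite, minus the paths in run_all_exclude_patterns. Exactly how these patterns are consumed is not shown in this diff; the sketch below is only an illustrative reading of that gating logic, where changed_files.txt and the grep-based matching helper are assumptions rather than code from this PR.

#!/bin/bash
# Illustrative sketch only: one possible reading of how run_all_patterns /
# run_all_exclude_patterns from ci_config.yaml could gate a full CI run.
# The real consumer of this config is the CI generator, not part of this diff.
set -euo pipefail

# Hypothetical input: files touched by the PR.
git diff --name-only origin/main...HEAD > changed_files.txt

run_all=0
while IFS= read -r f; do
  # Hit on any "run everything" prefix from run_all_patterns...
  if grep -qE '^(docker/Dockerfile|CMakeLists\.txt|requirements/(common|cuda|build|test)\.txt|setup\.py|csrc/|cmake/)' <<< "$f"; then
    # ...unless the path is covered by run_all_exclude_patterns
    # (e.g. docker/Dockerfile.<variant>, csrc/cpu/, cmake/hipify.py).
    if ! grep -qE '^(docker/Dockerfile\.|csrc/(cpu|rocm)/|cmake/hipify\.py|cmake/cpu_extension\.cmake)' <<< "$f"; then
      run_all=1
      break
    fi
  fi
done < changed_files.txt

echo "run_all=$run_all"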
46 changes: 0 additions & 46 deletions .buildkite/generate_index.py

This file was deleted.

56 changes: 56 additions & 0 deletions .buildkite/image_build/image_build.sh
@@ -0,0 +1,56 @@
#!/bin/bash
set -e

if [[ $# -lt 8 ]]; then
echo "Usage: $0 <registry> <repo> <commit> <branch> <vllm_use_precompiled> <vllm_merge_base_commit> <cache_from> <cache_to>"
exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3
BRANCH=$4
VLLM_USE_PRECOMPILED=$5
VLLM_MERGE_BASE_COMMIT=$6
CACHE_FROM=$7
CACHE_TO=$8

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 936637512419.dkr.ecr.us-east-1.amazonaws.com

# docker buildx
docker buildx create --name vllm-builder --driver docker-container --use
docker buildx inspect --bootstrap
docker buildx ls

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
exit 0
fi

if [[ "${VLLM_USE_PRECOMPILED:-0}" == "1" ]]; then
merge_base_commit_build_args="--build-arg VLLM_MERGE_BASE_COMMIT=${VLLM_MERGE_BASE_COMMIT}"
else
merge_base_commit_build_args=""
fi

# build
docker buildx build --file docker/Dockerfile \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--build-arg USE_SCCACHE=1 \
--build-arg TORCH_CUDA_ARCH_LIST="8.0 8.9 9.0 10.0" \
--build-arg FI_TORCH_CUDA_ARCH_LIST="8.0 8.9 9.0a 10.0a" \
--build-arg VLLM_USE_PRECOMPILED="${VLLM_USE_PRECOMPILED:-0}" \
${merge_base_commit_build_args} \
--cache-from type=registry,ref=${CACHE_FROM},mode=max \
--cache-to type=registry,ref=${CACHE_TO},mode=max \
--tag ${REGISTRY}/${REPO}:${BUILDKITE_COMMIT} \
$( [[ "${BRANCH}" == "main" ]] && echo "--tag ${REGISTRY}/${REPO}:latest" ) \
--push \
--target test \
--progress plain .
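
The script expects eight positional arguments, as the usage check at the top shows; in the pipeline they are supplied via Buildkite environment variables (see image_build.yaml below). A hypothetical manual invocation might look like the following, where the registry and repository names come from ci_config.yaml above and every other value is a placeholder, not something taken from this PR.

# Hypothetical manual invocation of the new build script; placeholder values.
REGISTRY=public.ecr.aws/q9t5s3a7
REPO=vllm-ci-test-repo
COMMIT=$(git rev-parse HEAD)
MERGE_BASE=$(git merge-base origin/main HEAD)

bash .buildkite/image_build/image_build.sh \
  "$REGISTRY" "$REPO" "$COMMIT" main \
  1 "$MERGE_BASE" \
  "${REGISTRY}/${REPO}:buildcache" "${REGISTRY}/${REPO}:buildcache"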
57 changes: 57 additions & 0 deletions .buildkite/image_build/image_build.yaml
@@ -0,0 +1,57 @@
group: Abuild
steps:
- label: ":docker: Build image"
key: image-build
depends_on: []
commands:
- .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $VLLM_USE_PRECOMPILED $VLLM_MERGE_BASE_COMMIT $CACHE_FROM $CACHE_TO
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 2
- exit_status: -10 # Agent was lost
limit: 2

- label: ":docker: Build CPU image"
key: image-build-cpu
depends_on: []
commands:
- .buildkite/image_build/image_build_cpu.sh $REGISTRY $REPO $BUILDKITE_COMMIT
env:
DOCKER_BUILDKIT: "1"
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 2
- exit_status: -10 # Agent was lost
limit: 2

- label: ":docker: Build HPU image"
soft_fail: true
depends_on: []
key: image-build-hpu
commands:
- .buildkite/image_build/image_build_hpu.sh $REGISTRY $REPO $BUILDKITE_COMMIT
env:
DOCKER_BUILDKIT: "1"
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 2
- exit_status: -10 # Agent was lost
limit: 2

- label: ":docker: Build CPU arm64 image"
key: cpu-arm64-image-build
depends_on: []
optional: true
commands:
- .buildkite/image_build/image_build_cpu_arm64.sh $REGISTRY $REPO $BUILDKITE_COMMIT
env:
DOCKER_BUILDKIT: "1"
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 2
- exit_status: -10 # Agent was lost
limit: 2
36 changes: 36 additions & 0 deletions .buildkite/image_build/image_build_cpu.sh
@@ -0,0 +1,36 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
echo "Usage: $0 <registry> <repo> <commit>"
exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
exit 0
fi

# build
docker build --file docker/Dockerfile.cpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--build-arg VLLM_CPU_AVX512BF16=true \
--build-arg VLLM_CPU_AVX512VNNI=true \
--build-arg VLLM_CPU_AMXBF16=true \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
--target vllm-test \
--progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
33 changes: 33 additions & 0 deletions .buildkite/image_build/image_build_cpu_arm64.sh
@@ -0,0 +1,33 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
echo "Usage: $0 <registry> <repo> <commit>"
exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
exit 0
fi

# build
docker build --file docker/Dockerfile.cpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
--target vllm-test \
--progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
34 changes: 34 additions & 0 deletions .buildkite/image_build/image_build_hpu.sh
@@ -0,0 +1,34 @@
#!/bin/bash
set -e

if [[ $# -lt 3 ]]; then
echo "Usage: $0 <registry> <repo> <commit>"
exit 1
fi

REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
exit 0
fi

# build
docker build \
--file tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu \
--progress plain \
https://github.com/vllm-project/vllm-gaudi.git

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu
@@ -8,3 +8,4 @@ tasks:
value: 0.80
limit: 250 # will run on 250 * 14 subjects = 3500 samples
num_fewshot: 5
rtol: 0.05
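
The added rtol field reads as a relative tolerance on the expected accuracy value above (0.80). A minimal sketch of such a relative-tolerance comparison is shown below, assuming a hypothetical measured score of 0.79; the check is illustrative and not the harness's actual implementation.

# Illustrative relative-tolerance check, not the lm-eval harness's real code.
measured=0.79   # hypothetical lm-eval result
expected=0.80   # "value" from the config above
rtol=0.05       # the newly added tolerance

awk -v m="$measured" -v e="$expected" -v r="$rtol" 'BEGIN {
  diff = m - e; if (diff < 0) diff = -diff
  exit !(diff <= r * e)   # exit 0 (pass) when within tolerance
}' && echo "PASS" || echo "FAIL"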
1 change: 1 addition & 0 deletions .buildkite/lm-eval-harness/configs/models-large-rocm.txt
@@ -0,0 +1 @@
Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml