feat(vllm-tensorizer): Update `vllm-tensorizer` cloned repository, build with `vllm-flash-attn`, other optimizations #72
base: main
Conversation
[skip ci]
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13316885649
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13318197054
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13319222913
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13397061488
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14085967310
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14864987607
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14890901257
`vllm-tensorizer` hasn't had updates since vLLM's formal adoption of `tensorizer` model loading. An update to build for the most recent commit to vLLM that includes sharded `tensorizer` support is presented, along with some fixes to successfully build vLLM with recent updates to the source code. These include:

- Building with `cmake`
- Bumping the `xformers` version to `0.0.26.post1`
- `flash-attn`, which is built here from source
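The build fixes listed above could be sketched as a Dockerfile fragment along these lines. This is only an illustration of the kind of changes described, not the PR's actual Dockerfile: the base image, package-installation commands, and version choices other than `xformers==0.0.26.post1` are assumptions.

```dockerfile
# Hypothetical sketch of the build fixes described in this PR;
# base image and apt/pip invocations are placeholders, not the real recipe.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*

# cmake (and ninja) for vLLM's CMake-based native build
RUN pip install cmake ninja

# Pin xformers to the version the updated vLLM source expects
RUN pip install xformers==0.0.26.post1

# Build flash-attn from source instead of pulling a prebuilt wheel
RUN pip install --no-build-isolation flash-attn
```

Building `flash-attn` from source keeps the attention kernels matched to the container's CUDA toolkit rather than relying on a wheel built against a different toolkit version.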