feat(vllm-tensorizer): Update `vllm-tensorizer` cloned repository, build with `vllm-flash-attn`, other optimizations #72
base: main
Conversation
[skip ci]
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13316885649
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13318197054
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13319222913
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13397061488
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14085967310
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14864987607
@sangstar Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/14890901257
`vllm-tensorizer` hasn't had updates since vLLM's formal adoption of `tensorizer` model loading. An update to build for the most recent commit to vLLM that includes sharded `tensorizer` support is presented, along with some fixes to successfully build vLLM with recent updates to the source code. These include:

- Building with `cmake`
- Bumping the `xformers` version to `0.0.26.post1`
- `flash-attn`, which is built here from source
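The build fixes listed above could be sketched as a Dockerfile fragment along these lines. This is only an illustration of the kind of changes described, not the PR's actual Dockerfile: the base image, package-installation commands, and version choices other than `xformers==0.0.26.post1` are assumptions.

```dockerfile
# Hypothetical sketch of the build fixes described in this PR;
# base image and apt/pip invocations are placeholders, not the real recipe.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*

# cmake (and ninja) for vLLM's CMake-based native build
RUN pip install cmake ninja

# Pin xformers to the version the updated vLLM source expects
RUN pip install xformers==0.0.26.post1

# Build flash-attn from source instead of pulling a prebuilt wheel
RUN pip install --no-build-isolation flash-attn
```

Building `flash-attn` from source keeps the attention kernels matched to the container's CUDA toolkit rather than relying on a wheel built against a different toolkit version.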