feat(vllm-tensorizer): Optimize Multi-Stage Build for Slimmer Inference Image #101
Changes from 6 commits
f12a01f
284ec59
b2b48e5
97f150a
e1e1b28
f238cc3
0282ebd
b726efb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,13 @@ | ||
-ARG BASE_IMAGE="ghcr.io/coreweave/ml-containers/torch-extras:es-compute-12.0-67208ca-nccl-cuda12.9.0-ubuntu22.04-nccl2.27.3-1-torch2.7.1-vision0.22.1-audio2.7.1-abi1" | ||
+ARG BASE_IMAGE="ghcr.io/coreweave/ml-containers/torch-extras:es-cuda-12.9.1-74755e9-nccl-cuda12.9.1-ubuntu22.04-nccl2.27.5-1-torch2.7.1-vision0.22.1-audio2.7.1-abi1" | ||
+ARG LEAN_BASE_IMAGE="ghcr.io/coreweave/ml-containers/torch-extras:es-cuda-12.9.1-74755e9-base-cuda12.9.1-ubuntu22.04-torch2.7.1-vision0.22.1-audio2.7.1-abi1" | ||
|
||
FROM scratch AS freezer | ||
WORKDIR / | ||
COPY --chmod=755 freeze.sh / | ||
|
||
FROM ${BASE_IMAGE} AS builder-base | ||
|
||
-ARG MAX_JOBS="16" | ||
+ARG MAX_JOBS="32" | ||
|
||
RUN ldconfig | ||
|
||
|
@@ -81,7 +83,7 @@ RUN --mount=type=bind,from=flashinfer-downloader,source=/git/flashinfer,target=/ | |
WORKDIR /wheels | ||
|
||
|
||
-FROM ${BASE_IMAGE} AS base | ||
+FROM ${LEAN_BASE_IMAGE} AS base | ||
|
||
Tell me if this is a dumb question, but if this is merging to main, and [...]

This PR is intended to replace all builds with slimmed down builds, so yes, this replaces stuff.

Risking sounding pedantic here but why don't we just call it [...]

I'm not sure what you mean. There are two base images because one is used for compiling vLLM and one is used for the final image artifact being produced. The one used for compilation will be larger because it includes the compiler and dev libraries.

Just wondering if NCCL libraries remain or not on the final image, since it'll be good to still be able to do vLLM distributed inference. I'm not solid on the specifics here as to what exact deps are needed to do distributed inference -- just wanting to make sure this final image can still do inference with model parallelism now that that [...]
||
WORKDIR /workspace | ||
|
||
|
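One way to approach the NCCL question raised in the review thread is to run a small check inside the final image. The sketch below is not part of this PR. The function name `nccl_report` and the shape of the returned dictionary are hypothetical, chosen for illustration. It only reports whether the installed PyTorch build exposes an NCCL backend, which is a necessary condition for distributed inference but does not prove that a multi-GPU run will succeed.

```python
import importlib.util


def nccl_report():
    """Summarize whether PyTorch in this environment exposes NCCL.

    Hypothetical helper for illustration; not part of the PR.
    """
    report = {
        "torch_installed": False,
        "nccl_available": None,
        "nccl_version": None,
    }

    # Avoid a hard failure when PyTorch is absent from the environment.
    if importlib.util.find_spec("torch") is None:
        return report

    try:
        import torch
        import torch.distributed as dist
    except Exception:
        # PyTorch is present on disk but could not be loaded,
        # for example because a shared library is missing.
        return report

    report["torch_installed"] = True

    # is_nccl_available() is only defined when distributed support is
    # compiled in, so check is_available() first.
    report["nccl_available"] = bool(
        dist.is_available() and dist.is_nccl_available()
    )

    if report["nccl_available"]:
        try:
            # Version of the NCCL library that PyTorch is using.
            report["nccl_version"] = torch.cuda.nccl.version()
        except Exception:
            # Leave the version as None if it cannot be queried.
            pass

    return report


if __name__ == "__main__":
    print(nccl_report())
```

Running this with the Python interpreter in the image built from `LEAN_BASE_IMAGE` and comparing the result against the image built from `BASE_IMAGE` would show whether NCCL support differs between the two.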