-
Notifications
You must be signed in to change notification settings - Fork 716
feat: Better cublas support in vLLM container. #4514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jthomson04 <[email protected]>
WalkthroughThe Dockerfile.vllm runtime stage has been modified to replace explicit copying of CUDA library files (libcublas.so* and libcublasLt.so*) with creation of symbolic links to versioned 12.x libraries, changing the dependency resolution strategy from file copying to symlink creation. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
container/Dockerfile.vllm (1)
220-221: Combine RUN statements to reduce layer count.Each
RUNdirective creates a new image layer. Consolidate these two symlink creations into a singleRUNstatement to reduce the Docker image footprint.-RUN ln -s /usr/local/cuda/lib64/libcublas.so.12 /usr/local/cuda/lib64/libcublas.so -RUN ln -s /usr/local/cuda/lib64/libcublasLt.so.12 /usr/local/cuda/lib64/libcublasLt.so +RUN ln -s /usr/local/cuda/lib64/libcublas.so.12 /usr/local/cuda/lib64/libcublas.so && \ + ln -s /usr/local/cuda/lib64/libcublasLt.so.12 /usr/local/cuda/lib64/libcublasLt.so
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/Dockerfile.vllm(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Learnt from: grahamking
Repo: ai-dynamo/dynamo PR: 1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: sglang (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.vllm (1)
220-221: Symlink targets are present in the CUDA runtime image.The runtime image extends the base image by adding all the shared libraries from the CUDA toolkit for use with pre-built applications using multiple CUDA libraries. The runtime flavor includes cuBLAS, and the CUDA runtime layer provides the components needed to execute CUDA applications and includes all of the shared libraries. The files
libcublas.so.12andlibcublasLt.so.12are core CUDA runtime shared libraries, not development-only files. The symlinks will successfully resolve at runtime.
Related to #4421. Revamped to just to the symlink instead of the full copy.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.