Add support for blackwell architecture (sm120) #735
Conversation
Thanks for the PR, I'm just a bit concerned about bumping CUDA from 12.2 to 12.9 just to support Blackwell 🤔
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS base-builder
+FROM nvidia/cuda:12.9.0-devel-ubuntu22.04 AS base-builder
Is bumping CUDA required? It might eventually be a breaking change for instances running on older NVIDIA CUDA versions such as 12.2, 12.4 and 12.6; besides that, everything LGTM.
@alvarobartt CUDA 12.8 is required to support GPUs like the 5080 and 5090. We could potentially downgrade to 12.8 and it should still work (I can test), but I don't think it would help much.
I understand that it might be a problem, however CUDA 12.2 is two years old (July 2023) and will need to be upgraded at some point.
What if CUDA 12.9 is used with a :129-1.x Docker image tag? It doesn't feel like the right solution, but it wouldn't break backward compatibility.
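The constraint discussed above can be sketched as a small version gate (the helper name is hypothetical and not part of TEI; it just encodes the "sm_120 needs CUDA ≥ 12.8" requirement):

```python
# Sketch of the version constraint: sm_120 (Blackwell consumer GPUs such
# as the RTX 5080/5090) needs CUDA 12.8 or newer to compile. The helper
# name is hypothetical, not part of the TEI codebase.

def supports_sm120(cuda_version: str) -> bool:
    """Return True if the given CUDA toolkit version can target sm_120."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= (12, 8)

print(supports_sm120("12.2.0"))  # current base image: too old for Blackwell
print(supports_sm120("12.9.0"))  # version proposed in this PR
```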
Hmm, fair enough. Then I think we maybe just create a Dockerfile-cuda-blackwell in the meantime with CUDA 12.8, whilst keeping the rest of the changes: add it to the CI and make sure we build with a different CUDA version for Blackwell, and eventually for TEI v1.9.0 we can think about bumping CUDA from 12.2 to 12.6.
In any case, given how recent Blackwell is, I guess it makes sense to keep it isolated for the moment so as not to break anything, but ideally all of those should live under the same Dockerfile in the future.
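A minimal sketch of what such a file might start with, assuming the existing base-image structure and that Blackwell consumer GPUs report compute capability 12.0 (the exact tag and compute-cap variable are assumptions, not the merged implementation):

```dockerfile
# Hypothetical Dockerfile-cuda-blackwell sketch; the base image tag and
# the compute-cap value below are assumptions for illustration only.
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04 AS base-builder

# sm_120 is the compute capability reported by Blackwell consumer GPUs
# such as the RTX 5080/5090.
ENV CUDA_COMPUTE_CAP=120
```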
I will try to test with CUDA 12.8 to be certain there are no odd surprises; I'll need to figure out which packages to swap to downgrade the CUDA version on my test hardware.
Awesome, thanks for the contribution @danielealbano! I'll try to test on my end too, and add it to the CI to make sure the Dockerfile-cuda-blackwell image is built as experimental. Later on we can consider bumping CUDA in the Dockerfile-cuda and Dockerfile-cuda-all images to make sure it supports all of today's architectures!
Sorry @alvarobartt, I didn't get the chance to do the test yet; I will try over the weekend. However, I was wondering if we can just stick with 12.9, taking into account that this is going to be an ad-hoc Blackwell build.
What does this PR do?
This PR adds support for the Blackwell architecture, related to issue #652.
As I wanted to run TEI on my 5090, I went through a few iterations and got it working, tested with Qwen3-Embedding-0.6B.
Before submitting
Yes
Not discussed nor approved but it's a known issue and there is a related issue already opened at #652
Documentation updated to mention the new compute cap.
Insta snapshots: I have updated the only test already in place to validate the compute cap.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.