Validate use of bf16 (and other relatively new hardware features) #1760
Comments
From @naoyam:
Thanks for the note. Failing at creation doesn't sound like a bad idea at least :) Given that catching the error at compilation time is more work, what's the practical benefit of doing that? E.g., would we want to build the fusion on one machine (even without a GPU!), serialize it, and ship it to a different machine to run?
No, it isn't a bad idea.
Yes, those kinds of things are what I have in mind. Not necessary ATM, but I feel that not allowing such ops to be created at all is too restrictive. Another benefit is that we would have a single lowering pass that does this analysis, so these architecture checks would live in a single location with a consistent error message. Generally speaking, when a
This introduces a minimum device version attribute to the kernel summary and populates it by traversing the fusion early in lowering to find ops or data types introduced after sm70. This version constraint is checked in FusionExecutor::compileFusion, and if the target arch doesn't satisfy the constraint, an informative error message is given. Currently BFloat16, cp.async, and cp.async.bulk are checked, as are the various MMA macros.

Fixes #1760

Co-authored-by: Jingyue Wu
(Originally in #1758)
The bf16 type is only supported on sm80 and later. Currently, nvFuser simply fails when we compile a generated kernel with nvrtc. While we do not care too much about pre-A100 generations, we should at least do the following:
- When we lower a `Fusion` to a `Kernel`, we should define the minimum required version. The default should be sm70, i.e., the V100 version. When the `Kernel` has `bf16` vals, it should be at least `sm80`. Other recent architecture features such as memcpy_async and TMA should also set the minimum version accordingly.
- This minimum version should then be checked before the `Kernel` is compiled by nvrtc. Now that we have the information about the required minimum version, we should be able to give more helpful information.
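
To make the proposal concrete, here is a minimal, standalone C++ sketch of the two steps: derive a minimum architecture from the features a kernel uses, then check it against the target before compiling. Every name in it (`Arch`, `Feature`, `Requirement`, `requirementFor`, `minimumArch`, `checkBeforeCompile`) is hypothetical and is not nvFuser's actual API. The sm80 requirement for bf16 comes from this issue; the sm80 and sm90 requirements for memcpy_async and TMA are my understanding of the hardware and should be verified against the CUDA documentation.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not nvFuser code.
using Arch = std::pair<int, int>;  // (major, minor), e.g. {8, 0} for sm80

enum class Feature { BFloat16, MemcpyAsync, Tma };

struct Requirement {
  Arch min_arch;
  std::string reason;  // which feature forced this minimum
};

// Assumed mapping from feature to the first architecture that supports it.
Requirement requirementFor(Feature f) {
  switch (f) {
    case Feature::BFloat16:
      return {{8, 0}, "bf16 values"};
    case Feature::MemcpyAsync:
      return {{8, 0}, "memcpy_async"};
    case Feature::Tma:
      return {{9, 0}, "TMA"};
  }
  throw std::logic_error("unknown feature");
}

// Start from the sm70 default and raise it for each feature the kernel uses.
Requirement minimumArch(const std::vector<Feature>& used) {
  Requirement result{{7, 0}, "the sm70 default"};
  for (Feature f : used) {
    Requirement r = requirementFor(f);
    if (r.min_arch > result.min_arch) {  // std::pair compares lexicographically
      result = r;
    }
  }
  return result;
}

// Run this before handing the kernel to the compiler, so the user sees the
// cause of the failure instead of a raw compiler error.
void checkBeforeCompile(const Requirement& req, Arch target) {
  if (target < req.min_arch) {
    std::ostringstream msg;
    msg << "Kernel requires sm" << req.min_arch.first << req.min_arch.second
        << " or later because it uses " << req.reason
        << ", but the target is sm" << target.first << target.second;
    throw std::runtime_error(msg.str());
  }
}
```

For example, `checkBeforeCompile(minimumArch({Feature::BFloat16}), Arch(7, 0))` throws, while the same call with `Arch(8, 6)` returns normally. Keeping the reason next to the version is what lets the error name the offending feature.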