Validate use of bf16 (and other relatively new hardware features) #1760

Closed
naoyam opened this issue Feb 14, 2024 · 2 comments · Fixed by #1784
Labels
ux Improving user experience

Comments

@naoyam
Collaborator

naoyam commented Feb 14, 2024

(Originally in #1758)

The bf16 type is only supported on sm80 and later. Currently, nvFuser just fails when we compile a generated kernel with nvrtc. While we do not care too much about pre-A100 generations, we should at least:

  • Give more informative error messages than a bare nvrtc compilation failure
  • Explicitly define the required minimum architecture version

When we lower a Fusion to a Kernel, we should define the minimum required version. The default should be sm70, i.e., the V100 version. When the Kernel has bf16 vals, it should be at least sm80. Other recent architecture features such as memcpy_async and TMA should also set the minimum version accordingly.

This minimum version should then be checked before the Kernel is compiled by nvrtc. Now that we have the information about the required minimum version, we should be able to give more helpful information.
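The idea above can be sketched roughly as follows. This is a minimal illustration, not nvFuser's actual code: `DeviceVersion`, `Feature`, `requiredVersion`, and `minimumVersion` are hypothetical names, and the sm80 and sm90 requirements assigned to memcpy_async and TMA are assumptions based on the Ampere and Hopper generations rather than something stated in this issue.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not nvFuser's API.
// A compute capability as (major, minor), e.g. {8, 0} for sm80.
using DeviceVersion = std::pair<int, int>;

// Features that need more than the sm70 baseline.
enum class Feature { BFloat16, MemcpyAsync, Tma };

// Minimum architecture assumed for each feature.
DeviceVersion requiredVersion(Feature f) {
  switch (f) {
    case Feature::BFloat16:
      return {8, 0};  // stated in the issue: bf16 needs sm80
    case Feature::MemcpyAsync:
      return {8, 0};  // assumption: cp.async arrived with Ampere
    case Feature::Tma:
      return {9, 0};  // assumption: TMA arrived with Hopper
  }
  return {7, 0};
}

// Start from the sm70 default and raise it for every feature in use.
// std::pair compares lexicographically, so std::max picks the newer arch.
DeviceVersion minimumVersion(const std::vector<Feature>& used) {
  DeviceVersion result{7, 0};
  for (Feature f : used) {
    result = std::max(result, requiredVersion(f));
  }
  return result;
}
```

With this shape, a kernel that uses no newer features stays at sm70, one with bf16 vals needs sm80, and one that also uses TMA needs sm90.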

naoyam changed the title from "Validate use of bf16" to "Validate use of bf16 (and other relatively new hardware features)" Feb 14, 2024
naoyam mentioned this issue Feb 14, 2024
naoyam added the "ux Improving user experience" label Feb 14, 2024
@wujingyue
Collaborator

wujingyue commented Feb 14, 2024

From @naoyam:

> I initially thought that when we create a castOp in the Fusion IR, we should just throw an error if a cast op to bf16 is created on non-supported devices, but maybe that should not be considered a hard error, as there should be nothing wrong with generating a kernel that's not supported by a given hardware. It should not be an error until it's compiled and executed.

Thanks for the note. Failing at creation doesn't sound like a bad idea at least :) Given that catching the error at compilation time is more work, what's the practical benefit of doing that? E.g., would we want to build the fusion on one machine (even without a GPU!), serialize it, and ship it to a different machine to run?

@naoyam
Collaborator Author

naoyam commented Feb 14, 2024

> Failing at creation doesn't sound like a bad idea at least

No, it isn't a bad idea.

> E.g. would we want to build the fusion on one machine (even without a GPU!), serialize it and ship it to a different machine to run?

Yes, that kind of thing is what I have in mind. It's not necessary ATM, but I feel that not allowing such ops to be created at all is too restrictive.

Another benefit is that we would have a single lowering pass that does this analysis, so these architecture checks would live in a single location with a consistent error message. Generally speaking, when a Fusion is lowered to a Kernel, we do basic sanity checks like this. I think it's natural to have a check here for hardware features as well.
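A rough sketch of what a deferred check with one consistent message could look like. `KernelSummary` and `checkDeviceVersion` are hypothetical names used for illustration only, not nvFuser's API; the point is that building the summary never fails, and the error is raised only when a concrete target device is known.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for what a lowering pass would record.
struct KernelSummary {
  int min_major = 7;   // sm70 baseline
  int min_minor = 0;
  std::string reason;  // what raised the requirement, e.g. "BFloat16"
};

// Intended to run just before the kernel is handed to the compiler.
// Throws std::runtime_error if the device is older than the requirement.
void checkDeviceVersion(const KernelSummary& s, int dev_major, int dev_minor) {
  const bool ok = dev_major > s.min_major ||
      (dev_major == s.min_major && dev_minor >= s.min_minor);
  if (ok) {
    return;
  }
  std::ostringstream msg;
  msg << "Kernel requires sm" << s.min_major << s.min_minor << " or later";
  if (!s.reason.empty()) {
    msg << " (" << s.reason << ")";
  }
  msg << ", but the target device is sm" << dev_major << dev_minor;
  throw std::runtime_error(msg.str());
}
```

Because the summary is plain data, it could be produced on a machine without a GPU and checked later on the machine that actually compiles the kernel.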

jacobhinkle self-assigned this Feb 17, 2024
jacobhinkle added a commit that referenced this issue Feb 20, 2024
This introduces a minimum device version attribute to the kernel summary
and populates it by traversing the fusion early in lowering to find ops
or data types introduced after sm70. This version constraint is checked
in FusionExecutor::compileFusion and if the target arch doesn't satisfy
the constraint then an informative error message is given. Currently
BFloat16, cp.async and cp.async.bulk are checked, as are the various MMA
macros.

Fixes #1760

---------

Co-authored-by: Jingyue Wu <[email protected]>
tfogal pushed a commit that referenced this issue Feb 20, 2024