Name CUDA kernels based on stack trace in auto_launch
#2376
Added some example results, which look suspiciously different. Reminds me of Simon's comment about the CUDA API not doing much in the previous implementation that computed the name every time a kernel was created. Do we need to somehow hydrate the kernel name cache initially to get a useful result here?
Are these profile results from the first step, or a later step? The kernel renaming example takes significantly longer, which makes me think
I just tried using this in ClimaLand, and it worked. This is so much easier than dev'ing GPUCompiler.jl.
ext/cuda/cuda_utils.jl
Outdated
kernel_name = nothing
if name_kernels_from_stack_trace()
    # Create a key from the method instance and types of the args
    key = objectid(methodinstance(typeof(f!), typeof(args)))
Suggested change:
- key = objectid(methodinstance(typeof(f!), typeof(args)))
+ key = objectid(methodinstance(F!, typeof(args)))
The function signature could also be changed to
function auto_launch!(
    f!::F!,
    args::ARGS,
    nitems::Union{Integer, Nothing} = nothing;
    auto = false,
    threads_s = nothing,
    blocks_s = nothing,
    always_inline = true,
    caller = :unknown,
) where {F!, ARGS}
    .
    .
    .
    key = objectid(methodinstance(F!, ARGS))
but I think that would force specialization of the method (I'm not sure whether that is desirable here).
Both of these approaches (redefining a method and changing an environment variable) involve recompiling ClimaCore's CUDA extension to activate this feature, since the feature is disabled in the default version of ClimaCore we precompile for the CI depot. The difference is in how this recompilation is achieved:
Since starting a new Julia session requires recompiling the Base library, the environment variable approach can end up needing more recompilation than the other approach. So, in the interest of minimizing total compilation time, we typically use method redefinitions to activate optional package features. On the other hand, if your setup requires you to start a new Julia session anyway, it doesn't really matter which option you choose. Also, the environment variable approach I described assumes that your flag needs to be a compile-time constant; if you're okay with performing an environment lookup every time you access the flag, then the environment variable approach will probably be simpler to implement.
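For comparison, here is a minimal sketch of the two ways such a flag could be exposed. The module name `FlagSketch`, the helpers `name_kernels_from_env` and `kernel_label`, and the environment variable `CLIMACORE_NAME_KERNELS` are hypothetical and only illustrate the trade-off; this is not ClimaCore's actual implementation.

```julia
module FlagSketch

# Option 1: a zero-argument method that returns a literal. The compiler can
# treat the result as a constant; toggling the flag means redefining the
# method, which invalidates and recompiles the code that calls it.
name_kernels_from_stack_trace() = false

# Option 2: look up a (hypothetical) environment variable on every call.
# Nothing has to be recompiled to toggle it, but the value is not a
# compile-time constant.
name_kernels_from_env() = get(ENV, "CLIMACORE_NAME_KERNELS", "false") == "true"

# A caller that branches on the method-based flag.
kernel_label(default) =
    name_kernels_from_stack_trace() ? "named_from_stack" : default

end # module

# Toggle option 1 in a running session by redefining the method.
FlagSketch.name_kernels_from_stack_trace() = true

# Toggle option 2 by changing the environment.
ENV["CLIMACORE_NAME_KERNELS"] = "true"
```

With option 1 the branch in `kernel_label` is resolved at compile time, which is presumably why the method-based flag is preferred when the check sits on a hot path.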
Thanks @dennisYatunin. What do you think of the current implementation with the
I was finally able to get the JET tests to pass while retaining the ability to set this via environment variable. It's not pretty, but it saves a bit of code downstream...
Does this allow a user to turn renaming on/off while a REPL is active, without restarting it?
Yep, but you'll need to load the extension and redefine the method if you're in an active session:
ext = Base.get_extension(ClimaCore, :ClimaCoreCUDAExt)
ext.name_kernels_from_stack_trace() = false
imreddyTeja
left a comment
We should make documentation/a guide on how to use this at some point. As this is a developer tool, it's probably fine to merge without docs. Could you please squash the commits before merging?
60fba14 to
6ef215a


This PR adds stack trace-based CUDA kernel naming to make performance benchmarks more informative. Kernel names are computed once per session and cached in a module-level Dict.
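The caching idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the names `KERNEL_NAME_CACHE`, `name_from_frame`, and `cached_kernel_name` are hypothetical, and the key is simply the pair of function and argument types, used here as a stand-in for `objectid(methodinstance(...))` so that the sketch runs without GPUCompiler.

```julia
# Module-level cache: one entry per (function type, argument types) pair.
# Assumption: this tuple key stands in for objectid(methodinstance(...)).
const KERNEL_NAME_CACHE = Dict{Any, String}()

# Turn a stack frame into an identifier-safe name (letters, digits, underscores).
function name_from_frame(frame::Base.StackTraces.StackFrame)
    raw = string(frame.func, "_", basename(string(frame.file)), "_", frame.line)
    return replace(raw, r"[^A-Za-z0-9_]" => "_")
end

# Return the cached name for this call signature, walking the stack only on
# the first call with a given combination of types.
function cached_kernel_name(f::F, args::ARGS) where {F, ARGS}
    key = (F, ARGS)
    return get!(KERNEL_NAME_CACHE, key) do
        frames = stacktrace()
        # Find this function's own frame; the next frame is its caller.
        i = findfirst(fr -> fr.func === :cached_kernel_name, frames)
        if i === nothing || i == length(frames)
            "unknown_kernel"
        else
            name_from_frame(frames[i + 1])
        end
    end
end
```

Because `get!` only runs the `do` block on a cache miss, repeated launches with the same function and argument types reuse the stored name instead of collecting a stack trace again.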
TODO:
Example results from ClimaAtmos Buildkite
From here