Reorder Jacobian calculation #707
The computational benefit of this PR is around 10% for some functions. We decided not to pursue it further because the cause of the failing tests is unclear to us and the benefit is small.
@mattldawson sorry, I was wrong here. This PR is crucial to achieving the impressive performance with the nvhpc and gcc compilers. I would like to see if we can resolve the failed integration tests and merge this PR later.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
- Coverage   94.50%   94.48%   -0.02%
==========================================
  Files          64       64
  Lines        4384     4406      +22
==========================================
+ Hits         4143     4163      +20
- Misses        241      243       +2
src/process/process_set.cu
cudaMallocAsync(
    &(devstruct.jacobian_process_info_),
    jacobian_process_info_bytes,
    micm::cuda::CudaStreamSingleton::GetInstance().GetCudaStream(0)),
Can we save micm::cuda::CudaStreamSingleton::GetInstance().GetCudaStream(0) to a local variable and simplify the interface here?
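The suggestion above can be sketched roughly as follows. This is only an illustration: `Stream`, `StreamSingleton`, `FreeAsync`, and `FreeAll` are hypothetical stand-ins written for this example, not the CUDA runtime API or micm's `CudaStreamSingleton`. The point is simply that the stream handle is looked up once and the local variable is reused in every call.

```cpp
#include <cstddef>
#include <vector>

// Stand-in types for illustration only; NOT the CUDA runtime or micm classes.
using Stream = int;

class StreamSingleton
{
 public:
  static StreamSingleton& GetInstance()
  {
    static StreamSingleton instance;
    return instance;
  }

  // Returns a handle for the requested stream and counts the lookup.
  Stream GetStream(std::size_t id)
  {
    ++lookups_;
    return static_cast<Stream>(id);
  }

  std::size_t Lookups() const
  {
    return lookups_;
  }

 private:
  StreamSingleton() = default;
  std::size_t lookups_ = 0;
};

// Hypothetical asynchronous free: records which stream each call was issued on.
inline void FreeAsync(std::vector<Stream>& log, Stream stream)
{
  log.push_back(stream);
}

inline void FreeAll(std::vector<Stream>& log)
{
  // Look the stream up once and reuse the local handle below.
  const Stream stream = StreamSingleton::GetInstance().GetStream(0);
  FreeAsync(log, stream);
  FreeAsync(log, stream);
  FreeAsync(log, stream);
}
```

With the handle cached, each call site shrinks to two short arguments, which is the readability gain the review comment is asking for.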
src/process/process_set.cu
@@ -266,12 +325,26 @@ namespace micm
    if (devstruct.yields_ != nullptr)
      CHECK_CUDA_ERROR(
          cudaFreeAsync(devstruct.yields_, micm::cuda::CudaStreamSingleton::GetInstance().GetCudaStream(0)), "cudaFree");
    if (devstruct.jacobian_process_info_ != nullptr)
      CHECK_CUDA_ERROR(
          cudaFreeAsync(devstruct.jacobian_process_info_, micm::cuda::CudaStreamSingleton::GetInstance().GetCudaStream(0)),
Can we save micm::cuda::CudaStreamSingleton::GetInstance().GetCudaStream(0) to a local variable and simplify the interface here?
Thanks @mattldawson for working on this PR. I can confirm that with your recent fix, the CUDA performance now matches the main branch.
However, the nvhpc performance on Derecho drops substantially with this PR. We can investigate that separately unless you quickly spot another obvious improvement.
I only have a minor comment about the iterator.
include/micm/process/process_set.hpp
const std::size_t idx_state_variables = offset_state + (react_id[i_react] * L);
auto v_state_variables_it = v_state_variables.begin() + idx_state_variables;
for (std::size_t i_cell = 0; i_cell < L; ++i_cell)
  d_rate_d_ind[i_cell] *= *(v_state_variables_it++);
Use an iterator for d_rate_d_ind?
done!
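For reference, the iterator-based form requested in this thread might look like the sketch below. `ScaleByState` is a hypothetical helper written for illustration and is not the code that was merged. It assumes `d_rate_d_ind` holds one value per grid cell and that `v_state_variables` has at least `idx_state_variables + d_rate_d_ind.size()` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies each per-cell partial derivative by the
// matching state value, walking both ranges with iterators instead of
// indexing d_rate_d_ind by i_cell.
inline void ScaleByState(
    std::vector<double>& d_rate_d_ind,
    const std::vector<double>& v_state_variables,
    std::size_t idx_state_variables)
{
  // Assumed precondition: the state vector covers the whole block of cells.
  assert(idx_state_variables + d_rate_d_ind.size() <= v_state_variables.size());

  auto d_rate_it = d_rate_d_ind.begin();
  auto state_it = v_state_variables.begin() + idx_state_variables;
  for (; d_rate_it != d_rate_d_ind.end(); ++d_rate_it, ++state_it)
    *d_rate_it *= *state_it;
}
```

Calling `ScaleByState(d, state, offset)` replaces the indexed loop in the snippet above while leaving the arithmetic unchanged.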
include/micm/process/process_set.hpp
{
  auto v_jacobian_it = v_jacobian.begin() + offset_jacobian + *flat_id;
  for (std::size_t i_cell = 0; i_cell < L; ++i_cell)
    *(v_jacobian_it++) += d_rate_d_ind[i_cell];
Use an iterator for d_rate_d_ind?
done!
include/micm/process/process_set.hpp
auto v_jacobian_it = v_jacobian.begin() + offset_jacobian + *flat_id;
auto yield_value = yield[i_dep];
for (std::size_t i_cell = 0; i_cell < L; ++i_cell)
  *(v_jacobian_it++) -= yield_value * d_rate_d_ind[i_cell];
Use an iterator for d_rate_d_ind?
done!
Reorders the Jacobian calculation to be column-wise instead of random.
A few minor changes were made to tolerances in the analytical tests. Also, some of the tutorial examples were taking a long time to run, so I modified their rate constant parameters.
closes #720
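As a rough illustration of the column-wise ordering described above, the sketch below accumulates Jacobian terms one independent variable (column) at a time, so that consecutive writes stay within a single column. It is a simplified model under stated assumptions, not micm's implementation: the `Reaction` struct and `AddJacobianTermsColumnWise` are hypothetical names, the matrix is dense and column-major rather than sparse with flat indices, there is a single grid cell, and the rate is assumed to be the rate constant times the product of reactant concentrations. The signs mirror the diff snippets in this review (reactant rows incremented, product rows decremented by yield times the derivative).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified reaction: rate = k * product of reactant concentrations.
struct Reaction
{
  double rate_constant;
  std::vector<std::size_t> reactants;  // species indices
  std::vector<std::size_t> products;   // species indices
  std::vector<double> yields;          // one yield per product
};

// Dense column-major Jacobian: element (row, col) lives at col * n_species + row.
inline void AddJacobianTermsColumnWise(
    const std::vector<Reaction>& reactions,
    const std::vector<double>& state,
    std::size_t n_species,
    std::vector<double>& jacobian)
{
  assert(jacobian.size() == n_species * n_species);
  for (const auto& r : reactions)
  {
    for (std::size_t i_ind = 0; i_ind < r.reactants.size(); ++i_ind)
    {
      // d(rate)/d(independent variable): k times the other reactant concentrations
      double d_rate_d_ind = r.rate_constant;
      for (std::size_t i_react = 0; i_react < r.reactants.size(); ++i_react)
        if (i_react != i_ind)
          d_rate_d_ind *= state[r.reactants[i_react]];

      // Every update for this independent variable lands in the same column.
      auto column = jacobian.begin() + r.reactants[i_ind] * n_species;
      for (auto dep : r.reactants)
        column[dep] += d_rate_d_ind;
      for (std::size_t i_dep = 0; i_dep < r.products.size(); ++i_dep)
        column[r.products[i_dep]] -= r.yields[i_dep] * d_rate_d_ind;
    }
  }
}
```

For a single reaction A + B -> 2C, the function touches column A first (rows A, B, C) and then column B, rather than jumping between scattered elements of the matrix.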