Skip to content

Conversation

@daverumph
Copy link

Purpose

Use cached values to avoid doing extra work in the kernel.

Content

The changes selectively add back use of cached variables that were removed en masse by Charlie a while back. Upon profiling with NVidia's NSight tools, we now understand that some of that caching is valuable to avoid "global" (from the GPU's perspective) memory reads.


  • I have read and checked the items on the review checklist.

@daverumph daverumph self-assigned this Nov 10, 2025
@daverumph daverumph force-pushed the dr/gpu_perf/manual_sparse_jacobian_1 branch from 59848ad to 5b7e47e Compare November 10, 2025 22:38
Copy link
Member

@sajjadazimi sajjadazimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

@daverumph daverumph enabled auto-merge November 11, 2025 02:16
@szy21
Copy link
Member

szy21 commented Nov 11, 2025

The regression test breaks - is the behavior change expected?

@imreddyTeja imreddyTeja disabled auto-merge November 11, 2025 04:46
@szy21
Copy link
Member

szy21 commented Nov 11, 2025

Also could you squash the commits? Thanks!

@sajjadazimi
Copy link
Member

The results have changed because K_h, which is needed later for sgs diffusion computations, gets overwritten inside the call to the function ᶜmixing_length(Y, p) around line 700 where temp_scalar_5 is used.

daverumph and others added 3 commits November 18, 2025 14:09
Merge with changes coming from main branch

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

Co-authored-by: Teja Reddy <[email protected]>

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

Co-authored-by: Teja Reddy <[email protected]>

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

Co-authored-by: Teja Reddy <[email protected]>
Hoping to address small floating point differences

Allocate and use a new scratch scalar to resolve overlapping use
@daverumph daverumph force-pushed the dr/gpu_perf/manual_sparse_jacobian_1 branch from d66c45b to 87354a9 Compare November 18, 2025 22:11
@daverumph daverumph enabled auto-merge November 20, 2025 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants