Add GPU implementation of preconditioner - v2 #31
Summary
This PR introduces a CUDA-based implementation of the preconditioner module, covering Ruiz, Pock–Chambolle, and objective-bound scaling.
Main Changes
preconditioner.cu with preconditioner.c
initialize_solver_state in solver.cu for GPU preconditioner integration
Implementation Details
A[i,j] *= E[i]) without extra lookups.
Next Step
Note
Reduce_bound_norm_sq_atomic currently relies on atomicAdd(double*) for the bound-norm reduction, which requires CMAKE_CUDA_ARCHITECTURES ≥ 60.
Would it be preferable to:
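
As background for this question, and not as one of the PR's own proposals: the native atomicAdd overload for double is only available on compute capability 6.0 and later, and a commonly used software fallback for older devices is a compare-and-swap loop built on atomicCAS. The sketch below assumes that approach. The names atomic_add_double and sum_sq_kernel are hypothetical and do not come from this PR, and the kernel is a simplified stand-in for a bound-norm reduction, not the actual Reduce_bound_norm_sq_atomic code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from this PR): double-precision atomic add that
// also compiles for compute capability < 6.0 by falling back to an
// atomicCAS loop on the 64-bit representation of the value.
__device__ double atomic_add_double(double *address, double val)
{
#if __CUDA_ARCH__ >= 600
    // Native double atomicAdd is available.
    return atomicAdd(address, val);
#else
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); retry if another thread won the race.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Hypothetical kernel: accumulates the sum of squares of x[0..n) into *out.
__global__ void sum_sq_kernel(const double *x, int n, double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomic_add_double(out, x[i] * x[i]);
    }
}

int main()
{
    const int n = 2;
    const double h_x[n] = {3.0, 4.0};
    double h_out = 0.0;

    double *d_x = nullptr;
    double *d_out = nullptr;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_x, h_x, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, &h_out, sizeof(double), cudaMemcpyHostToDevice);

    sum_sq_kernel<<<1, 32>>>(d_x, n, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("sum of squares = %f\n", h_out);

    cudaFree(d_x);
    cudaFree(d_out);
    return 0;
}
```

The trade-off with this kind of fallback is that the CAS loop can be noticeably slower under contention than the native instruction, so it mainly makes sense if pre-6.0 devices need to be supported at all.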