Affine registration on the GPU by daljit46 · Pull Request #3258 · MRtrix3/mrtrix3

daljit46 · 2026-01-14T08:59:24Z

This work builds on top of #3238. It introduces a new C++ command called mrreggpu (chosen randomly and without much thought) that performs affine registration of 3D images on the GPU using WebGPU compute shaders. The code is completely independent of mrregister.
It's not ready to be merged and not ready for review yet. It needs much refinement, but I'm posting this PR to gather early feedback.
The utility of this command is rather limited since it only performs affine registration on scalar images, with some other limitations. The primary aim, however, is to provide a first real-world example of the GPU compute API introduced in #3238 so that we have a reference on how to use the GPU in the codebase.
This is also intended as a stepping stone towards non-linear registration on the GPU. I spent some time experimenting with SVF/SyN-style deformation using the current GPU API and I think the approach is feasible, but I'd appreciate guidance on which direction would be most useful.

mrreggpu currently supports:

3D affine image registration on scalar images (4D images are not supported).
Three metrics: NMI, SSD and NCC (global and local, using a sliding window).
Multi-contrast registration.
An interface mostly similar to mrregister, with minor differences (not yet sure these are justified).
A symmetric registration strategy. Unlike mrregister, it doesn't register both images into an average space. Instead, it registers in both directions and then uses Lie algebra averaging to compute the transform for the next step (see here).
An optimiser based on a slightly enhanced version of Adam. I also experimented with other optimisers (e.g. Levenberg–Marquardt), but stuck with Adam for simplicity (affine registration on the GPU is fast enough). The optimiser runs on the CPU while the update step is on the GPU. I also experimented with running everything on the GPU, but the added complexity didn't feel worth it.
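
For readers unfamiliar with the optimiser mentioned in the last item, here is a minimal sketch of one textbook Adam update on a small parameter vector. It is not the PR's implementation: the "slightly enhanced" variant is not described here, and every name (`AdamState`, `AdamConfig`, `adam_step`) and default value is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Adam state for a small parameter vector (e.g. 12 affine parameters).
// All names are illustrative and do not come from the PR.
struct AdamState {
  std::vector<double> m; // running mean of the gradient
  std::vector<double> v; // running mean of the squared gradient
  std::size_t step = 0;
  explicit AdamState(std::size_t n) : m(n, 0.0), v(n, 0.0) {}
};

struct AdamConfig {
  double learning_rate = 1e-2;
  double beta1 = 0.9;
  double beta2 = 0.999;
  double epsilon = 1e-8;
};

// Applies one Adam update in place, given the gradient of the cost with
// respect to each parameter (however that gradient was obtained).
inline void adam_step(std::vector<double> &params,
                      const std::vector<double> &gradient,
                      AdamState &state,
                      const AdamConfig &config = AdamConfig{}) {
  assert(params.size() == gradient.size() && params.size() == state.m.size());
  ++state.step;
  // Bias-correction factors compensate for the zero-initialised moments.
  const double bias1 = 1.0 - std::pow(config.beta1, static_cast<double>(state.step));
  const double bias2 = 1.0 - std::pow(config.beta2, static_cast<double>(state.step));
  for (std::size_t i = 0; i < params.size(); ++i) {
    state.m[i] = config.beta1 * state.m[i] + (1.0 - config.beta1) * gradient[i];
    state.v[i] = config.beta2 * state.v[i] + (1.0 - config.beta2) * gradient[i] * gradient[i];
    const double m_hat = state.m[i] / bias1;
    const double v_hat = state.v[i] / bias2;
    params[i] -= config.learning_rate * m_hat / (std::sqrt(v_hat) + config.epsilon);
  }
}
```

A caller would evaluate the metric gradient, pass it to `adam_step`, and repeat until the cost stops improving. Because the state is only a handful of values per parameter, keeping this loop on the CPU costs very little for an affine transform.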

Notes on the current state

#3108 (Add option for canonical direct I/O layout) needs to be resolved before this can be merged.
New code lives in cpp/core/gpu/registration and cpp/core/gpu/shaders.
The code is still rough around the edges and needs refinement, but I hope it's understandable enough to give a general idea of how things work. The registration logic may have holes.
There are some hacks / temporary solutions to get things working, which I hope to clean up over time.
The (L)NCC logic seems buggy (not sure why yet) and probably needs fixing.
As in #3238 (GPU compute API abstraction on top of WebGPU), I've made extensive use of designated initialisers (a C++20 feature, though supported by Clang and GCC).
This PR also includes support for magic_enum (for enum <-> string conversion), but that should likely be split into a separate PR.
Some code is not specific to registration per se (e.g. the logic for performing reduction operations in workgroups, or operations such as computing the CoM or downsampling), but it was necessary for building the command. Perhaps I should move that logic into a separate PR?
I've tested the code manually with the help of a rudimentary Python script, but I'm hoping to add a comprehensive enough set of unit tests to test out the core logic. On this note, we probably need to identify a suitable set of testing data to use for the NMI and NCC metrics.
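
To illustrate the designated-initialiser style mentioned in the notes above, here is a small sketch. The struct `ExampleUniforms` and its fields are hypothetical, loosely modelled on the kind of uniform block a compute shader might take; they are not taken from the PR.

```cpp
#include <array>
#include <cstdint>

// Hypothetical uniform block; the field names are illustrative only.
struct ExampleUniforms {
  std::array<float, 4> centre{};
  std::uint32_t voxel_count = 0;
  float scale = 1.0F;
};

inline ExampleUniforms make_uniforms(float x, float y, float z, std::uint32_t voxel_count) {
  // Designators must appear in declaration order. Members that are left out
  // (here: scale) keep their default member initialiser.
  return ExampleUniforms{
      .centre = {x, y, z, 0.0F},
      .voxel_count = voxel_count,
  };
}
```

The benefit for GPU code is mostly readability: each value is labelled at the call site, so a reordered or newly added struct member is less likely to be filled with the wrong argument silently.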

I'm aware that this is a rather large PR, but any feedback would be welcome.

github-actions

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 57. Check the log or trigger a new build to see more.

github-actions · 2026-01-14T09:22:08Z

cpp/cmd/mrreggpu.cpp

+ #include "gpu/registration/globalregistration.h"
+ #include "gpu/registration/registrationtypes.h"
+ #include "gpu/registration/imageoperations.h"
+ #include "gpu/registration/imageoperations.h"


warning: duplicate include [readability-duplicate-include]

cpp/cmd/mrreggpu.cpp:28:

- #include "gpu/registration/imageoperations.h"
- #include "gpu/registration/imageoperations.h"
+ #include "gpu/registration/imageoperations.h"

github-actions · 2026-01-14T09:22:08Z

cpp/cmd/mrreggpu.cpp

+       File::Matrix::save_transform(registration_result.transformation, centre, matrix_filename);
+     }
+     if (!matrix_1tomid_filename.empty()) {
+       File::Matrix::save_transform(halfway_transforms->half, centre, matrix_1tomid_filename);


warning: unchecked access to optional value [bugprone-unchecked-optional-access]

File::Matrix::save_transform(halfway_transforms->half, centre, matrix_1tomid_filename); ^

github-actions · 2026-01-14T09:22:08Z

cpp/cmd/mrreggpu.cpp

+       File::Matrix::save_transform(halfway_transforms->half, centre, matrix_1tomid_filename);
+     }
+     if (!matrix_2tomid_filename.empty()) {
+       File::Matrix::save_transform(halfway_transforms->half_inverse, centre, matrix_2tomid_filename);


warning: unchecked access to optional value [bugprone-unchecked-optional-access]

File::Matrix::save_transform(halfway_transforms->half_inverse, centre, matrix_2tomid_filename); ^

github-actions · 2026-01-14T09:22:08Z

cpp/cmd/mrreggpu.cpp

+   if (!transformed_midway1_filenames.empty()) {
+     // Compute midpioint transforms in scanner space and then build a midway output header that can hold both images
+     using ProjectiveTransform = Eigen::Transform<default_type, 3, Eigen::Projective>;
+     const ProjectiveTransform half_projective(halfway_transforms->half_matrix);


warning: unchecked access to optional value [bugprone-unchecked-optional-access]

const ProjectiveTransform half_projective(halfway_transforms->half_matrix); ^

github-actions · 2026-01-14T09:22:08Z

cpp/cmd/mrreggpu.cpp

+     // Compute midpioint transforms in scanner space and then build a midway output header that can hold both images
+     using ProjectiveTransform = Eigen::Transform<default_type, 3, Eigen::Projective>;
+     const ProjectiveTransform half_projective(halfway_transforms->half_matrix);
+     const ProjectiveTransform half_inverse_projective(halfway_transforms->half_inverse_matrix);


warning: unchecked access to optional value [bugprone-unchecked-optional-access]

const ProjectiveTransform half_inverse_projective(halfway_transforms->half_inverse_matrix); ^
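
One common way to address the bugprone-unchecked-optional-access warnings above is to test the optional once and bind a reference before any member access. The sketch below assumes `halfway_transforms` is a `std::optional`. `HalfwayTransforms`, `PlannedOutput` and `plan_halfway_outputs` are hypothetical stand-ins, with plain doubles in place of the real matrices and a returned list in place of the calls to `File::Matrix::save_transform`.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's halfway transform bundle; the real
// members are matrices, plain doubles keep the sketch self-contained.
struct HalfwayTransforms {
  double half = 0.0;
  double half_inverse = 0.0;
};

using PlannedOutput = std::pair<std::string, double>;

// Decides which halfway transforms to write. The returned list stands in
// for the save calls made by the command.
inline std::vector<PlannedOutput> plan_halfway_outputs(const std::optional<HalfwayTransforms> &halfway,
                                                       const std::string &file_1tomid,
                                                       const std::string &file_2tomid) {
  std::vector<PlannedOutput> outputs;
  if (file_1tomid.empty() && file_2tomid.empty())
    return outputs; // nothing requested, so the optional is never touched

  // Single explicit check; every access below is dominated by it.
  if (!halfway.has_value())
    throw std::runtime_error("halfway transforms requested but not computed");
  const HalfwayTransforms &transforms = *halfway;

  if (!file_1tomid.empty())
    outputs.emplace_back(file_1tomid, transforms.half);
  if (!file_2tomid.empty())
    outputs.emplace_back(file_2tomid, transforms.half_inverse);
  return outputs;
}
```

Binding `transforms` after the check keeps later code free of `->` on the optional, which is the access pattern the check flags.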

github-actions · 2026-01-14T09:22:09Z

cpp/core/gpu/registration/imageoperations.cpp

+  const Buffer<float> matrixBuffer = context.new_buffer_from_host_memory<float>(matrixData);
+
+  const MomentUniforms uniforms{
+      .centre = {centreScanner.x(), centreScanner.y(), centreScanner.z(), 0.0f},


warning: floating point literal has suffix 'f', which is not uppercase [readability-uppercase-literal-suffix]

Suggested change

.centre = {centreScanner.x(), centreScanner.y(), centreScanner.z(), 0.0f},

.centre = {centreScanner.x(), centreScanner.y(), centreScanner.z(), 0.0F},

github-actions · 2026-01-14T09:22:09Z

cpp/core/gpu/registration/imageoperations.cpp

+
+  std::array<float, kMomentCount> momentValues{};
+  for (size_t i = 0; i < kMomentCount; ++i) {
+    std::memcpy(&momentValues[i], &momentBits[i], sizeof(float));


warning: do not use array subscript when the index is not an integer constant expression [cppcoreguidelines-pro-bounds-constant-array-index]

std::memcpy(&momentValues[i], &momentBits[i], sizeof(float)); ^

github-actions · 2026-01-14T09:22:10Z

cpp/core/gpu/registration/imageoperations.cpp

+
+  std::array<float, kMomentCount> momentValues{};
+  for (size_t i = 0; i < kMomentCount; ++i) {
+    std::memcpy(&momentValues[i], &momentBits[i], sizeof(float));


warning: do not use array subscript when the index is not an integer constant expression [cppcoreguidelines-pro-bounds-constant-array-index]

std::memcpy(&momentValues[i], &momentBits[i], sizeof(float)); ^
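
The per-element loop flagged above could be replaced by a single copy over the whole array, which removes the runtime subscript that the check objects to. This is a sketch under two assumptions: that `momentBits` holds 32-bit unsigned integers carrying float bit patterns (its type is not shown in the excerpt), and that a helper such as the hypothetical `floats_from_bits` is acceptable in the codebase.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterprets an array of 32-bit patterns as floats with one memcpy.
// Assumes the source elements are uint32_t; the name is illustrative.
template <std::size_t N>
std::array<float, N> floats_from_bits(const std::array<std::uint32_t, N> &bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
  std::array<float, N> values{};
  // One copy over the whole array: no per-element subscript is needed.
  std::memcpy(values.data(), bits.data(), N * sizeof(float));
  return values;
}
```

With C++20 available, `std::bit_cast` between the two array types would be another option, provided both arrays have the same size.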

github-actions · 2026-01-14T09:22:10Z

cpp/core/gpu/registration/imageoperations.cpp

+
+  context.dispatch_kernel(transformKernel, dispatch_grid);
+
+  return outputTexture;


warning: constness of 'outputTexture' prevents automatic move [performance-no-automatic-move]

return outputTexture; ^

github-actions · 2026-01-14T09:22:10Z

cpp/core/gpu/registration/imageoperations.cpp

+
+  context.dispatch_kernel(transformKernel, dispatch_grid);
+
+  return outputTexture;


warning: constness of 'outputTexture' prevents automatic move [performance-no-automatic-move]

return outputTexture; ^
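
The performance-no-automatic-move warning typically appears when a `const` local is returned through a converting constructor, so the compiler has to copy instead of move. The sketch below reproduces that situation with a hypothetical `Result` wrapper around a vector. The PR's actual texture and return types are not shown in the excerpt, so this only illustrates the mechanism.

```cpp
#include <utility>
#include <vector>

// Hypothetical wrapper that records how its payload arrived.
struct Result {
  std::vector<int> values;
  bool was_moved;
  Result(const std::vector<int> &v) : values(v), was_moved(false) {}
  Result(std::vector<int> &&v) : values(std::move(v)), was_moved(true) {}
};

inline Result build_with_const_local() {
  const std::vector<int> data(1024, 7);
  return data; // const lvalue: only the copying constructor is viable
}

inline Result build_with_mutable_local() {
  std::vector<int> data(1024, 7);
  return data; // treated as an rvalue on return: the moving constructor is chosen
}
```

Dropping `const` from the local is usually the smallest fix. When the local has exactly the function's return type, copy elision may already avoid the copy.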

github-actions

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 34. Check the log or trigger a new build to see more.

github-actions · 2026-01-30T12:14:53Z