Contributor

Copilot AI commented Oct 17, 2025

Overview

This PR addresses issue #251 by refactoring repeated space-filling curve patterns from multiple example files into reusable utility functions in examples/common/utils.py.

Problem

The same space-filling curve patterns were duplicated across 9+ example files:

  1. XCD reordering pattern - for multi-die GPU optimization:

     if NUM_XCDS != 1:
         pid = (pid % NUM_XCDS) * (NUM_SMS // NUM_XCDS) + (pid // NUM_XCDS)

  2. Tile coordinate computation pattern - for memory coalescing:

     num_pid_in_group = GROUP_SIZE_M * num_pid_n
     group_id = tile_id // num_pid_in_group
     first_pid_m = group_id * GROUP_SIZE_M
     group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)
     pid_m = first_pid_m + ((tile_id % num_pid_in_group) % group_size_m)
     pid_n = (tile_id % num_pid_in_group) // group_size_m

This code duplication made maintenance difficult and increased the risk of inconsistencies.

Solution

Created two new Triton JIT utility functions in examples/common/utils.py:

1. chiplet_reorder(pid, NUM_XCDS, NUM_SMS)

Applies XCD (compute die, or chiplet) space-filling curve reordering to program IDs. The remapping fills one XCD with work before moving on to the next, which improves locality when multiple compute dies are present.
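As a rough illustration of that remapping, the sketch below mirrors the inline XCD arithmetic in plain Python rather than Triton, so it can run on a CPU. The name chiplet_reorder_ref, the values NUM_XCDS = 4 and NUM_SMS = 16, and the assumption that the hardware places launch-order program p on XCD p % NUM_XCDS are illustrative only and are not taken from this PR.

```python
def chiplet_reorder_ref(pid: int, num_xcds: int, num_sms: int) -> int:
    """Plain-Python mirror of the inline XCD remapping (hypothetical helper)."""
    if num_xcds == 1:
        return pid
    return (pid % num_xcds) * (num_sms // num_xcds) + (pid // num_xcds)


# Illustrative sizes, not values used by the examples in the repository.
NUM_XCDS, NUM_SMS = 4, 16

# Remapped ID for every launch-order program ID.
remapped = [chiplet_reorder_ref(p, NUM_XCDS, NUM_SMS) for p in range(NUM_SMS)]

# Assumption: launch-order program p runs on XCD p % NUM_XCDS (round-robin).
per_xcd = {
    xcd: sorted(remapped[p] for p in range(NUM_SMS) if p % NUM_XCDS == xcd)
    for xcd in range(NUM_XCDS)
}

for xcd, ids in per_xcd.items():
    print(f"XCD {xcd}: remapped ids {ids}")
```

Under the round-robin assumption, each XCD ends up with one contiguous block of remapped IDs (XCD 0 gets 0 through 3, XCD 1 gets 4 through 7, and so on), which is one way to read "fill an XCD before going to the next".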

2. program_id_reorder(tile_id, num_pid_m, num_pid_n, GROUP_SIZE_M)

Computes 2D tile coordinates (pid_m, pid_n) from a linear tile_id using swizzling. This implements a space-filling curve that groups tiles along the M dimension to improve memory coalescing and cache locality.
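The grouping can be seen by running the same arithmetic outside Triton. The following is a minimal plain-Python sketch, assuming a 4 x 3 tile grid with GROUP_SIZE_M = 2; the name program_id_reorder_ref and the grid sizes are hypothetical and chosen only for illustration.

```python
def program_id_reorder_ref(tile_id, num_pid_m, num_pid_n, group_size_m):
    """Plain-Python mirror of the inline tile-coordinate pattern (hypothetical helper)."""
    num_pid_in_group = group_size_m * num_pid_n
    group_id = tile_id // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group may hold fewer rows when num_pid_m is not a multiple
    # of group_size_m.
    rows_in_group = min(num_pid_m - first_pid_m, group_size_m)
    local_id = tile_id % num_pid_in_group
    pid_m = first_pid_m + (local_id % rows_in_group)
    pid_n = local_id // rows_in_group
    return pid_m, pid_n


# Illustrative grid: 4 tiles along M, 3 tiles along N, groups of 2 rows.
NUM_PID_M, NUM_PID_N, GROUP_SIZE_M = 4, 3, 2
coords = [
    program_id_reorder_ref(t, NUM_PID_M, NUM_PID_N, GROUP_SIZE_M)
    for t in range(NUM_PID_M * NUM_PID_N)
]

for tile_id, (pid_m, pid_n) in enumerate(coords):
    print(f"tile {tile_id:2d} -> (pid_m={pid_m}, pid_n={pid_n})")
```

With these sizes, tiles 0 through 5 stay within rows 0 and 1 while sweeping across all three columns, and only then do tiles 6 through 11 move on to rows 2 and 3, so consecutive tiles tend to reuse the same few rows of the M dimension.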

Both functions include comprehensive docstrings explaining their purpose, parameters, and return values.

Changes

Updated 9 example files to use the new utility functions:

  • examples/07_gemm_all_scatter/gemm_all_scatter.py
  • examples/08_gemm_atomics_all_reduce/gemm_atomics_all_reduce.py
  • examples/09_gemm_one_shot_all_reduce/gemm_one_shot_all_reduce.py
  • examples/10_gemm_all_scatter_wg_specialization/gemm_all_scatter_wg_specialization.py
  • examples/11_gemm_all_scatter_producer_consumer/gemm_all_scatter_producer_consumer.py
  • examples/12_gemm_all_scatter_bulk_synchronous/gemm_all_scatter_bulk_synchronous.py
  • examples/14_all_gather_gemm/all_gather_gemm_push.py
  • examples/14_all_gather_gemm/all_gather_gemm_pull.py
  • examples/20_gemm_all_scatter_independent/gemm_all_scatter_bulk_synchronous.py

Impact

  • Code reduction: Net reduction of 40 lines (125 deletions, 85 insertions)
  • Maintainability: Pattern now exists in only one place - future optimizations benefit all examples
  • Readability: Descriptive function names make the intent clearer than inline calculations
  • Consistency: All examples now use the same implementation, reducing the risk of subtle bugs

Testing

  • ✅ All ruff linting and formatting checks pass
  • ✅ No functional changes - refactoring preserves exact behavior
  • ✅ Verified that duplicate patterns have been completely eliminated

Fixes #251

Original prompt

This section details the original issue you should resolve

<issue_title>[Feature]: Refactor the space-filling curves out of examples into common</issue_title>
<issue_description>Repeated pattern in many-many examples and kernels:

https://github.com/ROCm/iris/blob/main/examples/10_gemm_all_scatter_wg_specialization/gemm_all_scatter_wg_specialization.py#L52-L53

https://github.com/ROCm/iris/blob/main/examples/10_gemm_all_scatter_wg_specialization/gemm_all_scatter_wg_specialization.py#L77-L83</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #251


Copilot AI changed the title from "[WIP] Refactor space-filling curves into common module" to "Refactor space-filling curves from examples into common utilities" Oct 17, 2025
Copilot AI requested a review from neoblizz October 17, 2025 05:43
Copilot finished work on behalf of neoblizz October 17, 2025 05:43
Comment on lines 177 to 179
This function reorders program IDs to improve locality when multiple compute
dies (XCDs) are present. It ensures that consecutive PIDs are distributed
across different XCDs before moving to the next set of programs within an XCD.
Member

@copilot Slightly incorrect, it ensures the program IDs are reordered such that you "fill" an XCD with work before going to the next.

Contributor Author

Fixed the docstring in commit 2d6f8c4. The description now correctly states that the function "fills an XCD with work before going to the next XCD".



@triton.jit
def apply_xcd_reordering(pid, NUM_XCDS: tl.constexpr, NUM_SMS: tl.constexpr):
Member

@copilot Better name: chiplet_reorder

Contributor Author

Renamed to chiplet_reorder in commit 2d6f8c4.



@triton.jit
def compute_tile_coordinates(tile_id, num_pid_m, num_pid_n, GROUP_SIZE_M: tl.constexpr):
Member

@copilot Rename program_id_reorder

Contributor Author

Renamed to program_id_reorder in commit 2d6f8c4.

Copilot AI requested a review from neoblizz October 17, 2025 05:55
Copilot finished work on behalf of neoblizz October 17, 2025 05:55