Improve docs #49

Open · mawad-amd wants to merge 6 commits into main

Conversation

@mawad-amd (Collaborator) commented Jul 13, 2025

First attempt at improving docs. local/remote is still a bit confusing.

Closes #46

  • Translate should be private
  • Improve docs

Copilot AI review requested due to automatic review settings, July 13, 2025 08:09
@mawad-amd requested review from neoblizz and BKP as code owners, July 13, 2025 08:09
Copilot AI left a comment

Pull Request Overview

This PR refactors the internal pointer translation helper, tightens up public exports, and enriches docstrings for remote memory operations.

  • Renames translate to private __translate and updates all function signatures to use local_ptr/local_rank/remote_rank conventions
  • Enhances docstrings for load, store, get, put, and atomic operations with detailed parameter and return descriptions
  • Removes translate from public exports in __init__.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
iris/iris.py | Renamed translate to __translate, updated signatures, and improved docstrings
iris/__init__.py | Removed translate from imports and __all__ to make it an internal helper
Comments suppressed due to low confidence (3)

iris/iris.py:318

  • Add a docstring for __translate explaining its purpose, parameters, and return value to maintain consistency with other Triton helper functions.
def __translate(local_ptr, local_rank, remote_rank, heap_bases, debug=False):
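
A hedged sketch of what such a docstring could look like; the offset-based description of the translation is an assumption inferred from the other docstrings in this PR, not the actual implementation, and the body is elided:

import triton

# Sketch only: the semantics described here are inferred from this PR's docstrings.
@triton.jit
def __translate(local_ptr, local_rank, remote_rank, heap_bases, debug=False):
    """
    Translate local_ptr from local_rank's address space into remote_rank's.

    The pointer's offset within local_rank's symmetric heap is applied to
    remote_rank's heap base, yielding a pointer that load/store/get/put can
    dereference on the remote rank.

    Args:
        local_ptr (triton.PointerType, or block of dtype=triton.PointerType): Pointer in local_rank's address space.
        local_rank (int): The rank whose heap local_ptr currently points into.
        remote_rank (int): The rank whose heap the returned pointer should point into.
        heap_bases (int): The heap bases.
        debug (bool, optional): Enables extra debug output. Defaults to False.

    Returns:
        The translated pointer in remote_rank's address space.
    """
    ...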

iris/__init__.py:11

  • Removing translate from the public exports is a breaking change; consider deprecating it for one release or updating the changelog to notify users of this API change.
    iris,

iris/iris.py:318

  • The new pointer translation logic in __translate (and its use in load/store/get/put) should have unit tests to verify correctness across different rank configurations.
def __translate(local_ptr, local_rank, remote_rank, heap_bases, debug=False):
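
A hedged sketch of what a host-side test of that arithmetic could check; reference_translate and the heap-base values below are made up for illustration and are not part of this PR:

# Hypothetical host-side check of the offset arithmetic __translate is assumed to perform.
def reference_translate(ptr, local_rank, remote_rank, heap_bases):
    # Re-base the pointer: keep its byte offset, swap the heap base.
    return heap_bases[remote_rank] + (ptr - heap_bases[local_rank])

def test_translate_offset_arithmetic():
    heap_bases = [0x1000, 0x8000, 0x20000]  # one symmetric-heap base per rank
    ptr = heap_bases[1] + 0x40  # 0x40 bytes into rank 1's heap
    translated = reference_translate(ptr, 1, 2, heap_bases)
    assert translated == heap_bases[2] + 0x40  # same offset in rank 2's heap
    assert reference_translate(translated, 2, 1, heap_bases) == ptr  # round trip
    assert reference_translate(ptr, 1, 1, heap_bases) == ptr  # same-rank translation is the identity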

@neoblizz (Member) commented

Why do we have to describe it as local/remote only? If we use iris.load, it should do a local load if your source and destination ranks are the same, correct?

@mawad-amd (Collaborator, Author) commented

Correct. It is confusing and I would like to resolve that. Do you have suggestions?

I have a few:

# 1. Emphasizes direction of data flow
def load(pointer, to_rank, from_rank, heap_bases, mask=None):

# 2. Uses 'cur' to indicate the calling rank
def load(pointer, cur_rank, dst_rank, heap_bases, mask=None):

# 3. Generalizes roles as caller/target
def load(pointer, caller_rank, target_rank, heap_bases, mask=None):

The pointer argument can be named local_ptr, sym_ptr, address, or pointer.

I do like the from/to at the moment. Here are all APIs:

def load(pointer, to_rank, from_rank, heap_bases, mask=None):

def store(pointer, val, from_rank, to_rank, heap_bases, mask=None):

def get(from_pointer, to_pointer, to_rank, from_rank, heap_bases, mask=None):

def put(from_pointer, to_pointer, from_rank, to_rank, heap_bases, mask=None):

def atomic_add(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):

def atomic_sub(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):

def atomic_cas(pointer, cmp, val, from_rank, to_rank, heap_bases, sem=None, scope=None):

def atomic_xchg(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
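
For illustration, here is a hypothetical kernel using option 1's spelling, assuming data flows from from_rank to to_rank and the caller passes its own rank as to_rank; the kernel, buffer, and imports below are made up and not part of this PR:

import triton
import triton.language as tl
import iris  # assumed import; the exact module layout may differ

@triton.jit
def pull_kernel(buffer, my_rank, peer_rank, heap_bases, n, BLOCK: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n
    # Remote read: pull peer_rank's values over to my_rank.
    remote_vals = iris.load(buffer + offsets, my_rank, peer_rank, heap_bases, mask=mask)
    # Same rank on both sides: this degenerates to a plain local load.
    local_vals = iris.load(buffer + offsets, my_rank, my_rank, heap_bases, mask=mask)
    tl.store(buffer + offsets, remote_vals + local_vals, mask=mask)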

@neoblizz (Member) commented Jul 13, 2025

> Correct. It is confusing and I would like to resolve that. Do you have suggestions?
>
> I have a few:
>
> # 1. Emphasizes direction of data flow
> def load(pointer, to_rank, from_rank, heap_bases, mask=None):
>
> # 2. Uses 'cur' to indicate the calling rank
> def load(pointer, cur_rank, dst_rank, heap_bases, mask=None):
>
> # 3. Generalizes roles as caller/target
> def load(pointer, caller_rank, target_rank, heap_bases, mask=None):
>
> The pointer argument can be named local_ptr, sym_ptr, address, or pointer.
>
> I do like the from/to at the moment. Here are all APIs:
>
> def load(pointer, to_rank, from_rank, heap_bases, mask=None):
>
> def store(pointer, val, from_rank, to_rank, heap_bases, mask=None):
>
> def get(from_pointer, to_pointer, to_rank, from_rank, heap_bases, mask=None):
>
> def put(from_pointer, to_pointer, from_rank, to_rank, heap_bases, mask=None):
>
> def atomic_add(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
>
> def atomic_sub(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
>
> def atomic_cas(pointer, cmp, val, from_rank, to_rank, heap_bases, sem=None, scope=None):
>
> def atomic_xchg(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):

Notes

  • I do like to/from, but not for the from_pointer field.
  • Is there a notion of "triggering" a load/store/put/get from a rank that's not the "current" or "remote" rank? If so, current cannot be used.

Other suggestions:

  • Use value instead of val
  • Use semantics instead of sem
  • The problematic (imo) APIs are:
def get(from_pointer, to_pointer, to_rank, from_rank, heap_bases, mask=None):
def put(from_pointer, to_pointer, from_rank, to_rank, heap_bases, mask=None):

@mawad-amd (Collaborator, Author) commented Jul 14, 2025

> Is there a notion of "triggering" a load/store/put/get from a rank that's not the "current" or "remote" rank? If so, current cannot be used.

No. For that reason, I was considering adding text to warn against that, e.g., for load: "to_rank must be the same rank issuing the operation." But if there is an interesting use case, we can accommodate it.
Eventually the "current rank" should be implicit and removed, alongside heap_bases.

The shortened val and sem are to match Triton, but I am okay with either (and I slightly prefer spelling out the complete words).

How about this:

def load(pointer, to_rank, from_rank, heap_bases, mask=None):
def store(pointer, val, from_rank, to_rank, heap_bases, mask=None):
def atomic_add(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
def atomic_sub(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
def atomic_cas(pointer, cmp, val, from_rank, to_rank, heap_bases, sem=None, scope=None):
def atomic_xchg(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):

and,

def get(dst_ptr, src_ptr, to_rank, from_rank, heap_bases, mask=None):
def put(dst_ptr, src_ptr, to_rank, from_rank, heap_bases, mask=None):
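
And a hypothetical counterpart for get/put under the same data-flow reading (the caller's rank is to_rank for get and from_rank for put); the kernel and buffer names are made up and not part of this PR:

# Imports as in the earlier load sketch; illustrative only.
@triton.jit
def exchange_kernel(recv_buf, send_buf, my_rank, peer_rank, heap_bases, n, BLOCK: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n
    # get: copy from peer_rank's send_buf into this rank's recv_buf.
    iris.get(recv_buf + offsets, send_buf + offsets, my_rank, peer_rank, heap_bases, mask=mask)
    # put: copy from this rank's send_buf into peer_rank's recv_buf.
    iris.put(recv_buf + offsets, send_buf + offsets, peer_rank, my_rank, heap_bases, mask=mask)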

@neoblizz (Member) commented Jul 14, 2025

> No. For that reason, I was considering adding text to warn against that, e.g., for load: "to_rank must be the same rank issuing the operation." But if there is an interesting use case, we can accommodate it.
> Eventually the "current rank" should be implicit and removed, alongside heap_bases.

I am not sure I understand the warning. You mean from_rank must be the same rank issuing the op?
Separately, I think there can be cases where that's not true, where we have one GPU thread somehow initiate a copy/move. Let's discuss separately in a call with @BKP.

> The shortened val and sem are to match Triton, but I am okay with either (and I slightly prefer spelling out the complete words).
>
> How about this:
>
> def load(pointer, to_rank, from_rank, heap_bases, mask=None):
> def store(pointer, val, from_rank, to_rank, heap_bases, mask=None):
> def atomic_add(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
> def atomic_sub(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
> def atomic_cas(pointer, cmp, val, from_rank, to_rank, heap_bases, sem=None, scope=None):
> def atomic_xchg(pointer, val, from_rank, to_rank, heap_bases, mask=None, sem=None, scope=None):
>
> and,
>
> def get(dst_ptr, src_ptr, to_rank, from_rank, heap_bases, mask=None):
> def put(dst_ptr, src_ptr, to_rank, from_rank, heap_bases, mask=None):

I prefer spelled out as well; Triton is not consistent with its naming, and you can easily find examples of both styles.

I like this! (With sem, val, maybe even cmp fully spelled out. I know it's minor, but sem could be semaphore or semantics or something else, idk.)

@neoblizz (Member) left a comment

Still lots of "remote"; let's discuss the best way to word some of the descriptions.

"""
- Loads a value from the specified memory location and rank.
+ Loads a value from a remote rank's memory location.
@neoblizz (Member):

Suggested change:
- Loads a value from a remote rank's memory location.
+ Loads a value stored from a pointer of the specified rank.


This function performs a remote memory read operation by translating the pointer
from the from_rank's address space to the to_rank's address space and loading
data from the remote memory location.
@neoblizz (Member):

Suggested change:
- data from the remote memory location.
+ data from the remote memory location. If `to_rank` is the same as `from_rank`,
+ this function performs a local load operation instead.

heap_bases (int): The heap bases.
mask (Optional[tl.tensor], optional): A boolean tensor used to guard memory accesses.
pointer (triton.PointerType, or block of dtype=triton.PointerType): Pointer in the from_rank's address space that will be translated to the to_rank's address space.
to_rank (int): The rank ID to which the pointer will be translated. Must be the current rank where the pointer is local.
@neoblizz (Member):

Suggested change:
- to_rank (int): The rank ID to which the pointer will be translated. Must be the current rank where the pointer is local.
+ to_rank (int): The rank ID for the pointer where the load will occur. `to_rank` must be the rank where the pointer resides.

mask (Optional[tl.tensor], optional): A boolean tensor used to guard memory accesses. Defaults to None.
pointer (triton.PointerType, or block of dtype=triton.PointerType): Pointer in the from_rank's address space that will be translated to the to_rank's address space.
val (Block): The tensor of elements to be stored.
from_rank (int): The rank ID from which the pointer originates. Must be the current rank where the pointer is local.
@neoblizz (Member):

Suggested change:
- from_rank (int): The rank ID from which the pointer originates. Must be the current rank where the pointer is local.
+ from_rank (int): The rank ID from which the pointer originates. `from_rank` must be the rank where the pointer resides.

"""
- Loads a value from the specified memory location and rank.
+ Copies data from a remote rank's memory to the current rank's local memory.
@neoblizz (Member):

Use of the word remote again.

Successfully merging this pull request may close these issues: Update get and put docs