[Documentation]: Update Triton API docstrings to show cross-GPU operations #151
Conversation
Co-authored-by: mawad-amd <[email protected]>
…ty functions Co-authored-by: mawad-amd <[email protected]>
…mic methods Co-authored-by: mawad-amd <[email protected]>
@copilot Looks mostly good but the Triton API examples (load, store, atomics) are all doing local ops (you always use one rank). See the examples directory for cross-GPU load/store ops and update the docstrings. Add comments whenever possible. Something like:
# Store 3 into a remote rank's heap from the current rank.
iris.store(
ptr, # Pointer in the symmetric heap
3, # Value to store
cur_rank, # Source rank
remote_rank, # Destination rank
heap_bases, # Symmetric heap bases
)
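For the read direction, a matching cross-GPU load could look like this (a minimal sketch, assuming iris.load mirrors the store signature above without the value argument):
# Load a value from a remote rank's heap into the current rank.
result = iris.load(
    ptr,          # Pointer in the symmetric heap
    cur_rank,     # Current (calling) rank
    remote_rank,  # Rank that owns the data being read
    heap_bases,   # Symmetric heap bases
)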
Co-authored-by: mawad-amd <[email protected]>
Updated all Triton API docstring examples to show cross-GPU operations instead of local operations. The examples now demonstrate realistic cross-GPU communication patterns with descriptive comments, following the suggested format. All device functions (load, store, atomics) now show operations between different ranks (e.g., rank 0 operating on rank 1's memory). Commit: 733df66
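In the same spirit, the atomic examples follow the store pattern; a sketch of a cross-GPU atomic_add (the argument order is assumed to mirror iris.store, and counter_ptr is a hypothetical name):
# Atomically add 1 to a counter on a remote rank's heap and return the old value.
old = iris.atomic_add(
    counter_ptr,  # Pointer in the symmetric heap (hypothetical name)
    1,            # Value to add
    cur_rank,     # Source rank
    remote_rank,  # Destination rank
    heap_bases,   # Symmetric heap bases
)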
@copilot sync with main.
Co-authored-by: mawad-amd <[email protected]>
@copilot revert the last commit, then merge the main branch into this branch. Resolve any merge conflicts.
Reverted the last commit and merged the main branch as requested. Resolved the merge conflict in the barrier() method docstring to use updated PyTorch distributed terminology while preserving the example. Commit: 87355ba
Signed-off-by: Muhammad Awad <[email protected]>
Pull Request Overview
This PR comprehensively updates all Iris API docstrings to demonstrate cross-GPU operations, replacing single-GPU examples with multi-rank communication patterns. It resolves merge conflicts from recent PyTorch distributed backend integration and adds complete documentation for previously undocumented utility functions.
- Updates all Triton device function examples to show cross-GPU memory operations between different ranks
- Adds comprehensive docstring examples for 44 host API methods including tensor creation, utilities, and logging functions
- Resolves merge conflicts and integrates PyTorch distributed backend changes
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
File | Description
---|---
iris/iris.py | Updated all Triton device function docstrings with cross-GPU examples and added comprehensive examples for host API methods
iris/util.py | Added complete docstrings with examples for the do_bench and memset_tensor utility functions
iris/logging.py | Added an example for the set_logger_level function
docs/reference/api-iris-class.md | Added an autofunction directive for set_logger_level
.github/workflows/docs.yml | Updated concurrency settings and deployment conditions
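For iris/logging.py, the added example presumably shows a call along these lines (a sketch only; the import path and the accepted level type are assumptions, not confirmed by the PR):
import logging
import iris

# Raise Iris logging verbosity to DEBUG (assumed usage of set_logger_level).
iris.logging.set_logger_level(logging.DEBUG)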
Signed-off-by: Muhammad Awad <[email protected]>
Key Changes
Triton Device Functions - All examples now demonstrate cross-GPU operations:
- load(), store(), get(), put() - Remote memory access operations between different ranks
- atomic_add(), atomic_sub(), atomic_cas(), atomic_xchg() - Cross-GPU atomic operations
- atomic_xor(), atomic_and(), atomic_or(), atomic_min(), atomic_max() - Additional atomic operations

Host API Methods - Comprehensive examples added for 44 methods (see the sketch after this list):
- zeros(), ones(), full(), randn(), etc.
- get_heap_bases(), barrier(), get_device(), etc.
- debug(), info(), warning(), error()
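A rough host-side sketch tying these methods together (the iris.iris constructor arguments and exact method signatures are assumptions based only on the names above):
import torch
import iris

# Initialize Iris with a symmetric heap (heap_size argument assumed).
shmem = iris.iris(heap_size=2**30)

# Tensor creation on the symmetric heap.
buf = shmem.zeros(1024, dtype=torch.float32)

# Utilities and synchronization across ranks.
heap_bases = shmem.get_heap_bases()
shmem.barrier()

# Logging helpers.
shmem.info("Buffer allocated on the symmetric heap")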
Recent Updates:
- barrier() method docstring updated to use the new terminology while preserving examples

Example Format:
All docstring examples are syntactically correct, pass linting, and demonstrate realistic multi-GPU distributed computing scenarios compatible with the new PyTorch distributed backend.
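Following the format suggested in the review, a device-side docstring example reads roughly like this minimal kernel sketch (the kernel wrapper and its parameters are illustrative, not taken from the PR):
import triton
import iris

@triton.jit
def store_kernel(ptr, cur_rank, remote_rank, heap_bases):
    # Store 3 into a remote rank's heap from the current rank.
    iris.store(
        ptr,          # Pointer in the symmetric heap
        3,            # Value to store
        cur_rank,     # Source rank
        remote_rank,  # Destination rank
        heap_bases,   # Symmetric heap bases
    )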
Fixes #150.