Important
This project is intended for research purposes only and is provided by the AMD Research and Advanced Development team. It is not a product. Use it at your own risk and discretion.
Iris is a Triton-based framework for Remote Memory Access (RMA) operations. Iris provides SHMEM-like APIs within Triton for Multi-GPU programming. Iris' goal is to make Multi-GPU programming a first-class citizen in Triton while retaining Triton's programmability and performance.
- SHMEM-like RMA: Iris provides SHMEM-like RMA support in Triton.
- Simple and Intuitive API: Iris provides simple and intuitive RMA APIs. Writing multi-GPU programs is as easy as writing single-GPU programs.
- Triton-based: Iris is built on top of Triton and inherits Triton's performance and capabilities.
Iris matches PyTorch APIs on the host side and Triton APIs on the device side:
```python
import torch
import triton
import triton.language as tl

import iris


@triton.jit
def kernel(buffer, buffer_size: tl.constexpr, block_size: tl.constexpr, heap_bases_ptr):
    # Compute start index of this block
    pid = tl.program_id(0)
    block_start = pid * block_size
    offsets = block_start + tl.arange(0, block_size)

    # Guard for out-of-bounds accesses
    mask = offsets < buffer_size

    # Store 1 in the target buffer at each offset
    source_rank = 0
    target_rank = 1
    iris.store(buffer + offsets, 1,
               source_rank, target_rank,
               heap_bases_ptr, mask=mask)


heap_size = 2**30
buffer_size = 4096
block_size = 1024

iris_ctx = iris.iris(heap_size)
cur_rank = iris_ctx.get_rank()

buffer = iris_ctx.zeros(buffer_size, device="cuda", dtype=torch.float32)

grid = lambda meta: (triton.cdiv(buffer_size, meta["block_size"]),)

source_rank = 0
if cur_rank == source_rank:
    kernel[grid](
        buffer,
        buffer_size,
        block_size,
        iris_ctx.get_heap_bases(),
    )
iris_ctx.barrier()
```
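In this example, rank 0 writes directly into rank 1's buffer, so the script needs at least two ranks, and the final `iris_ctx.barrier()` ensures the store is visible everywhere before anyone reads. As a follow-up, the target rank can check its local buffer. The check below is a minimal sketch of ours, not part of the quick-start example, and assumes `iris_ctx.zeros` returns an ordinary `torch.Tensor` (as the PyTorch-matching host API suggests):

```python
# After the barrier, the target rank's buffer holds the stored value
if cur_rank == 1:
    assert torch.all(buffer == 1.0), "remote store did not arrive"
```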
The recommended way to get started is Docker Compose, which provides a development environment with the Iris directory mounted inside the container: changes you make to the code on the host are immediately reflected inside.
```bash
# Start the development container
docker compose up --build -d

# Attach to the running container
docker attach iris-dev

# Install Iris in development mode
cd iris && pip install -e .
```
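Once installed, a quick sanity check confirms the editable install resolves to the mounted source tree (a minimal sketch; it only verifies that the package imports):

```python
# Should print a path inside the mounted iris directory
import iris
print(iris.__file__)
```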
For manual Docker or Apptainer setup, see setup alternatives.
Check out our examples directory for ready-to-run scripts and usage patterns, including peer-to-peer communication and GEMM benchmarks.
Iris currently supports:
- MI300X
Note
Iris may work on other AMD GPUs with ROCm compatibility, but has only been tested on MI300X.
We plan to extend Iris with the following features:
- Extended GPU Support: Testing and optimization for other AMD GPUs beyond MI300X
- RDMA Support: Multi-node support using Remote Direct Memory Access (RDMA) for distributed computing across multiple machines
- More Code Examples: Comprehensive examples covering various use cases and patterns
We welcome contributions! Please see our Contributing Guide for details on how to set up your development environment and contribute to the project.
Need help? We're here to support you! Here are a few ways to get in touch:
- Open an Issue: Found a bug or have a feature request? Open an issue on GitHub
- Contact the Team: If GitHub issues aren't working for you or you need to reach us directly, feel free to contact our development team
We welcome your feedback and contributions!
This project is licensed under the MIT License - see the LICENSE file for details.