Fasten is a library for speeding up Heterogeneous Graph Neural Network (HGNN) workloads. The current version focuses on improving segmented matrix multiplication, a critical operator in HGNNs. Fasten exposes a simple interface, so it integrates with the existing graph library PyG with minimal changes. In operator-wise benchmarks, Fasten achieved average speedups of 13.65x and 4.72x over CUTLASS and cuBLAS, respectively.
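To make the operator concrete, here is a minimal pure-Python sketch of what a segmented matrix multiplication computes. It is a reference for the semantics only, not Fasten's interface: the function name `segment_matmul`, the `ptr` offset convention, and the argument order are assumptions chosen for illustration. Rows of the input are assumed to be sorted by type, and each contiguous segment of rows is multiplied by that type's own weight matrix.

```python
def segment_matmul(inputs, ptr, weights):
    """Multiply rows ptr[i]:ptr[i+1] of `inputs` by `weights[i]`.

    Hypothetical reference helper: names and argument order are illustrative.
    """
    if len(ptr) != len(weights) + 1:
        raise ValueError("ptr must have one more entry than there are weight matrices")
    out = []
    for seg, weight in enumerate(weights):
        start, end = ptr[seg], ptr[seg + 1]
        cols = list(zip(*weight))  # columns of this segment's weight matrix
        for row in inputs[start:end]:
            out.append([sum(a * b for a, b in zip(row, col)) for col in cols])
    return out


# Three feature rows sorted by type: rows 0-1 belong to type 0, row 2 to type 1.
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ptr = [0, 2, 3]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # type 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # type 1: scale by 2
]
result = segment_matmul(inputs, ptr, weights)
print(result)
```

Because segment lengths usually differ from one type to the next, the work does not map onto a single dense matrix multiplication, which is why this operator is a natural target for a specialized GPU kernel.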
Install the PyTorch nightly and Triton nightly builds. Fasten uses relatively new Triton features, so older Triton releases may crash.
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
You may need to build Triton from source until Proton is distributed with Triton's pip wheel.
git clone https://github.com/Deep-Learning-Profiling-Tools/fasten.git && cd fasten
pip install .

Fasten's segmented matrix multiplication operator has been integrated with several HGNN architectures in PyG, such as RGCN, HGT, and RGAT. Instructions for running the examples are below:
- RGCN
cd examples/rgcn
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgcn.py --device cuda --dataset AIFB
# With fasten
python rgcn.py --device cuda --mode fasten --dataset AIFB
- HGT
cd examples/hgt
# Without fasten
# Available datasets are: DBLP, Freebase, AIFB, MUTAG, BGS, AM
python hgt.py --device cuda --example DBLP
# With fasten
python hgt.py --device cuda --mode fasten --example DBLP
- RGAT
cd examples/rgat
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgat.py --device cuda --dataset MUTAG
# With fasten
python rgat.py --device cuda --mode fasten --dataset MUTAG

To run the operator performance test:
cd test
pytest -vs test_op.py::test_perf

Requirements:
- Linux
- NVIDIA GPUs (Compute Capability 7.0+)
- PyTorch >=2.2.0
- Triton >=3.0.0
- PyG >=2.6.0
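
The version minimums listed above can be checked before installing Fasten. The sketch below uses only the Python standard library. The distribution names (`torch`, `triton`, `torch_geometric`) are assumptions about how the packages are registered in your environment; a nightly Triton wheel, for example, may be registered under a different name and would then be reported as not installed. The comparison looks only at the leading numeric part of a version, so development builds such as `3.0.0.dev...` are treated as `3.0.0`.

```python
import re
from importlib import metadata

# Minimum versions from the list above; the distribution names are assumptions.
MINIMUMS = {"torch": "2.2.0", "triton": "3.0.0", "torch_geometric": "2.6.0"}


def release_tuple(version):
    """Return the leading numeric components, e.g. '2.4.0.dev1+cu121' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare numeric release parts only; dev/local suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '3.0' compares equal to '3.0.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Map each distribution name to 'ok', 'too old', or 'not installed'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

This does not check the operating system or the GPU's compute capability, which still need to be verified separately.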
- Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, and Jiajia Li. 2024. FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogeneous Graph Neural Networks. In Proceedings of the 38th ACM International Conference on Supercomputing (ICS’24), June 4–7, 2024, Kyoto, Japan.

