Join the Discord server if you are interested in LLM architecture or in distributed training and inference research.

Module | Triton | CUDA |
---|---|---|
GRU | β | β |
MoE | β | β |
RNN | β | β |

Module | Triton | CUDA |
---|---|---|
bmm | β | β |
continuous_count | β | β |
cross_entropy | β | β |
fused_linear_cross_entropy | β | β |
fused_residual_add_rmsnorm | β | β |
gemm | β | β |
grouped_gemm | β | β |
matrix_transpose | β | β |
rmsnorm | β | β |
pack_sequence | β | β |
softmax | β | β |
swiglu | β | β |
swiglu_packed | β | β |
unpack_sequence | β | β |
zeros | β | β |