open-lm-engine/flash-model-architectures

Discord Server

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.

Efficient GPU kernels written in both CUDA and Triton

Modules

| Module | Triton | CUDA |
|--------|--------|------|
| GRU    | βœ…     | ❌   |
| MoE    | βœ…     | βœ…   |
| RNN    | βœ…     | ❌   |
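For reference on what the GRU module computes, here is a minimal pure-Python sketch of a single GRU cell step. This is a semantics illustration only, not the repo's kernel or API: biases are omitted, and the function name and argument layout are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Dense matrix-vector product over plain lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # Update gate z and reset gate r (biases omitted for brevity).
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    # Candidate state uses the reset-gated previous hidden state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    # New hidden state interpolates between previous state and candidate.
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]
```

A fused kernel would compute all three gates in one pass over the weights instead of materializing each intermediate, which is where the Triton implementation can save memory traffic.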

Ops

| Op                         | Triton | CUDA |
|----------------------------|--------|------|
| bmm                        | βœ…     | ❌   |
| continuous_count           | ❌     | βœ…   |
| cross_entropy              | βœ…     | ❌   |
| fused_linear_cross_entropy | βœ…     | ❌   |
| fused_residual_add_rmsnorm | βœ…     | ❌   |
| gemm                       | βœ…     | ❌   |
| grouped_gemm               | ❌     | βœ…   |
| matrix_transpose           | βœ…     | ❌   |
| rmsnorm                    | βœ…     | ❌   |
| pack_sequence              | βœ…     | βœ…   |
| softmax                    | βœ…     | ❌   |
| swiglu                     | βœ…     | βœ…   |
| swiglu_packed              | βœ…     | ❌   |
| unpack_sequence            | βœ…     | βœ…   |
| zeros                      | βœ…     | βœ…   |
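To make the semantics of two of these ops concrete, here is a plain-Python sketch of `swiglu` and `rmsnorm`. These are reference definitions under common conventions (SwiGLU as `up * SiLU(gate)`, RMSNorm with a learned scale and epsilon), not the repo's fused kernels; the exact argument conventions are assumptions.

```python
import math

def swiglu(gate, up):
    # SwiGLU: up * SiLU(gate), where SiLU(g) = g * sigmoid(g).
    return [u * g / (1.0 + math.exp(-g)) for g, u in zip(gate, up)]

def rmsnorm(x, weight, eps=1e-6):
    # RMSNorm: scale x by the reciprocal of its root-mean-square,
    # then apply the learned per-channel weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

The fused variants in the table (e.g. `fused_residual_add_rmsnorm`, `swiglu_packed`) combine such element-wise steps with adjacent ops into a single kernel launch to avoid extra reads and writes of the activations.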

About

A bunch of kernels that might make stuff slower πŸ˜‰
