open-lm-engine/flash-model-architectures

Discord Server

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.

Efficient GPU kernels written in both CUDA and Triton

Modules

| Module | Triton | CUDA |
|--------|--------|------|
| GRU    | βœ…     | ❌   |
| MoE    | βœ…     | βœ…   |
| RNN    | βœ…     | ❌   |
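For reference on what the GRU module computes, here is a minimal pure-Python sketch of a single GRU cell step. This is a semantics illustration only, not the repo's kernel or API: biases are omitted, and the function name and argument layout are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Dense matrix-vector product over plain lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # Update gate z and reset gate r (biases omitted for brevity).
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    # Candidate state uses the reset-gated previous hidden state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    # New hidden state interpolates between previous state and candidate.
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]
```

A fused kernel would compute all three gates in one pass over the weights instead of materializing each intermediate, which is where the Triton implementation can save memory traffic.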

Ops

| Op                         | Triton | CUDA |
|----------------------------|--------|------|
| bmm                        | βœ…     | ❌   |
| continuous_count           | ❌     | βœ…   |
| cross_entropy              | βœ…     | ❌   |
| fused_linear_cross_entropy | βœ…     | ❌   |
| fused_residual_add_rmsnorm | βœ…     | ❌   |
| gemm                       | βœ…     | ❌   |
| grouped_gemm               | ❌     | βœ…   |
| matrix_transpose           | βœ…     | ❌   |
| rmsnorm                    | βœ…     | ❌   |
| pack_sequence              | βœ…     | βœ…   |
| softmax                    | βœ…     | ❌   |
| swiglu                     | βœ…     | βœ…   |
| swiglu_packed              | βœ…     | ❌   |
| unpack_sequence            | βœ…     | βœ…   |
| zeros                      | βœ…     | βœ…   |
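To make the semantics of two of these ops concrete, here is a plain-Python sketch of `swiglu` and `rmsnorm`. These are reference definitions under common conventions (SwiGLU as `up * SiLU(gate)`, RMSNorm with a learned scale and epsilon), not the repo's fused kernels; the exact argument conventions are assumptions.

```python
import math

def swiglu(gate, up):
    # SwiGLU: up * SiLU(gate), where SiLU(g) = g * sigmoid(g).
    return [u * g / (1.0 + math.exp(-g)) for g, u in zip(gate, up)]

def rmsnorm(x, weight, eps=1e-6):
    # RMSNorm: scale x by the reciprocal of its root-mean-square,
    # then apply the learned per-channel weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

The fused variants in the table (e.g. `fused_residual_add_rmsnorm`, `swiglu_packed`) combine such element-wise steps with adjacent ops into a single kernel launch to avoid extra reads and writes of the activations.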

About

A bunch of kernels that might make stuff slower πŸ˜‰
