Discord Server

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.

Efficient GPU kernels written in both CUDA and Triton

CuteInductor makes it easier to inject the kernels in this repository into any PyTorch module.
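To make the idea of "injecting a kernel into a PyTorch module" concrete, here is a minimal sketch in plain PyTorch. It is not CuteInductor's API and makes no claim about how CuteInductor works internally. `RMSNorm`, `KernelRMSNorm`, `custom_rmsnorm`, and `inject_kernels` are all hypothetical names introduced for illustration, and `custom_rmsnorm` is an ordinary PyTorch function standing in for an optimized CUDA or Triton kernel.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Reference RMSNorm written in plain PyTorch (illustrative only)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps) * self.weight


def custom_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    # Hypothetical stand-in for an optimized kernel. A real kernel would be
    # called here; this version only reproduces the same math in PyTorch.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight


class KernelRMSNorm(RMSNorm):
    """Same parameters as RMSNorm, but forward dispatches to the stand-in kernel."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return custom_rmsnorm(x, self.weight, self.eps)


def inject_kernels(module: nn.Module) -> nn.Module:
    """Recursively replace every RMSNorm submodule with KernelRMSNorm, in place."""
    for name, child in list(module.named_children()):
        if type(child) is RMSNorm:
            replacement = KernelRMSNorm(child.weight.shape[0], child.eps)
            replacement.weight = child.weight  # share the existing parameter
            setattr(module, name, replacement)
        else:
            inject_kernels(child)
    return module


model = nn.Sequential(nn.Linear(8, 8), RMSNorm(8))
x = torch.randn(2, 8)
expected = model(x)        # output before the swap
inject_kernels(model)
actual = model(x)          # output after the swap
```

After the swap, the model keeps its parameters and should produce numerically matching outputs. Only the implementation behind the normalization layer changes.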
