CUDA-kernels

A record of personal CUDA kernel implementations. These implementations are not fully optimized and are mainly for learning purposes.

Kernels

- Softmax
- ReLU
- GeLU
- GEMM
- Layer Normalization
- Multi-Head Self-Attention
- Matrix Transpose

More kernels are coming...