The NKI Library (NKL) provides pre-built reference kernels that you can use directly in model development with the AWS Neuron SDK and NKI. The kernel APIs expose the default classes, functions, and parameters for integrating NKL kernels into your models. For more details, see the NKI Library Documentation.

The kernels in this repo require the Neuron 2.27 release (coming soon).

| Kernel API | Description |
|---|---|
| Attention CTE Kernel | The kernel implements attention for Context Encoding (CTE) use cases, with support for multiple variants and optimizations. |
| Attention TKG Kernel | The kernel implements attention optimized for Token Generation (TKG) use cases. |
| MLP Kernel | The kernel implements a Multi-Layer Perceptron with optional normalization fusion and various optimizations. |
| Output Projection CTE Kernel | The kernel computes the output projection operation optimized for Context Encoding use cases. |
| Output Projection TKG Kernel | The kernel computes the output projection operation optimized for Token Generation use cases. |
| QKV Kernel | The kernel performs Query-Key-Value projection with optional normalization fusion. |
| RMSNorm-Quant Kernel | The kernel performs optional RMS normalization followed by quantization to FP8. |
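
To make "projection with optional normalization fusion" concrete, here is a minimal pure-Python sketch of the math a QKV-style kernel computes. It is not the NKI Library API: the function names (`rms_norm`, `qkv_projection`), the fused weight layout `[hidden, q_dim + k_dim + v_dim]`, and the `eps` default are all assumptions made for illustration, and the real kernels run on Neuron hardware with their own signatures.

```python
import math

def rms_norm(row, gamma, eps=1e-6):
    """Scale one row by the reciprocal of its root-mean-square, then by gamma."""
    mean_sq = sum(v * v for v in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(row, gamma)]

def matmul(a, b):
    """Plain row-by-column matrix product for lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def qkv_projection(x, w_qkv, q_dim, k_dim, v_dim, gamma=None, eps=1e-6):
    """Project x [tokens, hidden] with a fused weight [hidden, q_dim + k_dim + v_dim].

    Hypothetical reference only. If gamma is given, each token is
    RMS-normalized before the projection (the "fused" normalization).
    Returns (q, k, v), each a list of per-token rows.
    """
    if gamma is not None:
        x = [rms_norm(row, gamma, eps) for row in x]
    out = matmul(x, w_qkv)
    q = [row[:q_dim] for row in out]
    k = [row[q_dim:q_dim + k_dim] for row in out]
    v = [row[q_dim + k_dim:q_dim + k_dim + v_dim] for row in out]
    return q, k, v

if __name__ == "__main__":
    x = [[3.0, 4.0]]                          # one token, hidden size 2
    w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]    # fused Q|K|V weight, one column each
    print(qkv_projection(x, w, 1, 1, 1))                      # no normalization
    print(qkv_projection(x, w, 1, 1, 1, gamma=[1.0, 1.0]))    # with RMSNorm
```

A reference like this is mainly useful for checking a kernel's output numerically on small inputs before integrating it into a model.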