feat: add MojoPaddedWindowAttention and MojoConv1d. #274
Code Review
This pull request introduces the MojoPaddedWindowAttention and MojoConv1d operators, including a high-performance Triton kernel implementation for padded window attention on NPU backends. The changes also include reference implementations, core operator registrations, and comprehensive accuracy and performance tests. The review feedback focuses on improving the Triton kernel's robustness and performance, specifically by addressing a potential division-by-zero risk, optimizing loop ranges, and ensuring power-of-2 block dimensions. Additionally, there are suggestions to reduce computational overhead in the operator's forward pass by removing redundant memory copies and optimizing the reference implementation's vectorization.
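To make three of the kernel-side points in this summary concrete, here is a minimal pure-Python sketch. It is not the code from this PR: the helper names (`next_power_of_2`, `key_block_range`, `safe_softmax_row`), the symmetric-window assumption, and the guess that the division-by-zero risk comes from a fully padded row in the softmax normalization are all illustrative assumptions, since the summary does not say where each issue occurs.

```python
import math


def next_power_of_2(n: int) -> int:
    """Round n up to the nearest power of two (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def key_block_range(q_start, block_m, window, seq_len, block_n):
    """Half-open range of key-block indices that a query block can attend to.

    Assumes a symmetric window of `window` positions on each side; the
    actual window definition in the PR may differ.
    """
    k_lo = max(0, q_start - window)
    k_hi = min(seq_len, q_start + block_m + window)
    # Visit only blocks that intersect [k_lo, k_hi), not the whole sequence.
    return k_lo // block_n, math.ceil(k_hi / block_n)


def safe_softmax_row(scores, mask):
    """Softmax over unmasked entries; all zeros if every entry is padding."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        # Assumed failure mode: a fully padded row would otherwise divide
        # by a zero denominator.
        return [0.0] * len(scores)
    row_max = max(valid)
    exps = [math.exp(s - row_max) if m else 0.0 for s, m in zip(scores, mask)]
    denom = sum(exps)  # at least 1.0, because the max entry contributes exp(0)
    return [e / denom for e in exps]
```

In a real Triton kernel the rounding would more likely use `triton.next_power_of_2` when choosing block dimensions, because `tl.arange` requires power-of-2 extents. The other two helpers correspond to tightening the inner loop bounds and guarding the normalization inside the kernel.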
No description provided.