[Backend][Relax] Add Intel GNA backend for NPU support #18201
Conversation
Force-pushed from 141157b to 77b312a.
@Aristide021 I also think this backend can serve as a very good example for codegen in Relax. It shows a clean and minimal pattern: partitioning with basic ops, handing off to JSON, and keeping the implementation relatively lightweight. Adding a short HOWTO or developer note ("Writing a minimal Relax backend") that references this code could be very helpful for the community.
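For readers looking for the shape of that pattern, here is a minimal sketch of the partition-and-offload flow in Relax. The `gna.*` pattern names and the `partition_for_gna` helper are illustrative, not this PR's actual registry:

```python
import tvm
from tvm import relax
from tvm.relax.dpl import is_op, wildcard

# Patterns naming the ops the backend can offload. The "gna.*" names
# and op choices here are illustrative, not this PR's registry.
patterns = [
    ("gna.relu", is_op("relax.nn.relu")(wildcard())),
    ("gna.dense", is_op("relax.matmul")(wildcard(), wildcard())),
]

def partition_for_gna(mod: tvm.IRModule) -> tvm.IRModule:
    """Outline matched regions, merge them into offloaded functions,
    and invoke the external codegen (which serializes to JSON)."""
    seq = tvm.transform.Sequential([
        relax.transform.FuseOpsByPattern(patterns),
        relax.transform.MergeCompositeFunctions(),
        relax.transform.RunCodegen(),  # assumes the backend codegen is registered
    ])
    return seq(mod)
```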
Thanks for the review and the excellent points! You're correct about GNA being archived. I designed this backend as a stepping stone toward NPU support, with OpenVINO runtime integration in mind; the JSON serialization approach should make the transition to Intel's current NPU path relatively straightforward. CI integration with a Software Emulation Mode is a great suggestion: I can add CPU fallback support to enable E2E testing without actual GNA hardware. I'd also be happy to add documentation positioning this as a foundation for NPU backends, and to include a developer guide if that would help the community. I'll update the PR description to clarify the NPU migration path; my next step will be adding CPU emulation support for testing. Please let me know if you have any other suggestions.
Force-pushed from 9b955d4 to 2c036cc.
This commit introduces the Intel GNA (Gaussian & Neural Accelerator) backend for TVM's Relax IR, with a clean separation between hardware and emulation runtimes to enable CI testing without GNA hardware.

Key components:
- GNA codegen for Relax IR (graph partitioning and code generation)
- Hardware runtime (gna_json_runtime.cc) for systems with the GNA SDK
- CPU emulation runtime (gna_json_runtime_emulation.cc) for CI/testing
- Conditional CMake build based on GNA SDK availability
- Pattern registry for dense, conv1d, and relu operations
- Comprehensive test suite

Architecture decisions:
- Clean separation: hardware and emulation live in separate files (no mocking)
- CI-friendly: the emulation runtime has no GNA SDK dependencies
- Follows OpenVINO's Software Emulation Mode pattern
- Same API surface for both runtime implementations

The emulation runtime provides simplified reference implementations sufficient for testing graph partitioning and codegen correctness. For production CPU inference, use TVM's standard CPU backend.

This backend serves as a stepping stone toward Intel NPU support and provides a minimal example for Relax backend development.
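As a sketch of how that emulation split enables hardware-free CI, the test below only checks that pattern matching outlines an offloadable region, reusing the illustrative `patterns` list from the earlier sketch. The module and test names are assumptions, not code from this PR:

```python
import tvm
from tvm import relax
from tvm.script import ir as I, relax as R

@I.ir_module
class Model:
    @R.function
    def main(x: R.Tensor((1, 16), "float32")) -> R.Tensor((1, 16), "float32"):
        with R.dataflow():
            y = R.nn.relu(x)
            R.output(y)
        return y

def test_partition_without_hardware():
    # Matched regions are outlined into functions carrying a
    # "Composite" attribute; checking the printed module is enough to
    # validate the patterns -- no GNA SDK or device needed.
    mod = relax.transform.FuseOpsByPattern(patterns)(Model)
    assert "Composite" in mod.script()
```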
Force-pushed from 2c036cc to 7d5d812.
Thanks for the contribution. Given that GNA is archived, it perhaps does not make sense to maintain it in the main tree, and adding CI would also add extra overhead here. However, I agree that having generic tutorials for BYOC NPU would be useful; if we can have something that supports a current NPU, that would be great.
I'd be happy to refactor this into a generic NPU tutorial targeting Intel's current NPU plugin. Should this live in the tutorials section or as a contrib module? I can adapt the JSON architecture for educational purposes. |
I think starting as contrib is fine, and we can have a tutorial explanation pointing to the code.
This commit introduces an educational NPU backend example that teaches key architectural concepts common across Neural Processing Units.

Key features:
- Multi-tier memory hierarchy (L0/L1/L2/L3) management with spilling
- Tiling engine for large tensors that exceed on-chip SRAM
- Quantization support (INT8/INT16) with dedicated patterns
- Multiple execution engines (matrix, vector, conv, pooling, activation)
- Operation fusion patterns to reduce memory traffic
- Power mode management for efficiency tuning

Educational value:
- Demonstrates NPU memory management strategies
- Shows how tiling enables large model execution
- Explains quantization's role in NPU acceleration
- Illustrates operation-to-engine mapping
- Provides CPU emulation for testing without hardware

This vendor-neutral implementation serves as a template for developers creating custom NPU backends, teaching BYOC integration patterns while demonstrating real NPU architectural concepts.

Addresses feedback from apache#18201 requesting generic NPU BYOC tutorials.
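The tiling engine mentioned above boils down to simple capacity arithmetic. A minimal sketch, assuming a 512 KiB on-chip SRAM and fp16 operands (both numbers are illustrative, not values from this backend):

```python
# Illustrative tiling arithmetic: pick the largest square tile of a
# matmul whose working set fits in on-chip SRAM.
SRAM_BYTES = 512 * 1024   # assumed L1 capacity
DTYPE_BYTES = 2           # fp16

def square_tile(m: int, n: int, k: int) -> int:
    """Largest t such that a (t x k) A-tile, a (k x t) B-tile, and a
    (t x t) accumulator tile fit in SRAM together."""
    t = min(m, n)
    while t > 1:
        working_set = (t * k + k * t + t * t) * DTYPE_BYTES
        if working_set <= SRAM_BYTES:
            return t
        t //= 2
    return 1

# e.g. a 4096x4096x4096 matmul: operand tiles stream through SRAM
# while the accumulator tile stays resident.
print(square_tile(4096, 4096, 4096))  # -> 16
```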
…cepts

This commit introduces a vendor-neutral NPU backend that demonstrates architectural patterns common across Neural Processing Units.

The implementation covers key NPU concepts including multi-tier memory hierarchy management, automatic tiling for large tensors, quantization handling, and specialized execution engines. It shows how NPUs manage memory across different tiers (L0/L1/L2/L3), tile operations to fit in on-chip SRAM, and dispatch operations to dedicated compute units.

This serves as an educational template for developers creating NPU backends, demonstrating BYOC integration while teaching NPU-specific optimization strategies. It uses CPU emulation for testing without requiring actual NPU hardware.

Addresses feedback from apache#18201 requesting generic NPU BYOC tutorials.
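For the quantization handling, a minimal sketch of the symmetric INT8 round trip such a backend performs; the scaling scheme and tolerance are assumptions for illustration, not code from this commit:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] to int8."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale reconstructs exactly
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(64).astype(np.float32)
q, s = quantize_int8(x)
# Round-to-nearest bounds the per-element error by scale / 2.
assert np.max(np.abs(dequantize(q, s) - x)) <= s / 2 + 1e-6
```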
Intel GNA (Gaussian & Neural Accelerator) backend for TVM Relax, designed as a foundation for Intel NPU support. GNA hardware shipped in several recent generations of Intel Core processors and has since been superseded by Intel's NPU, so this backend is positioned as a stepping stone toward Intel's current NPU path with OpenVINO runtime integration. The JSON serialization approach should make that transition straightforward.
Features:
- JSON-based graph codegen for Relax IR (partitioning plus serialization)
- Hardware runtime for systems with the GNA SDK
- CPU emulation runtime for CI and testing without hardware
- Conditional CMake build based on GNA SDK availability

Supported operations:
- dense
- conv1d
- relu
This implementation provides a clean, minimal pattern for backend development while preparing the foundation for Intel's recommended NPU acceleration path through TVM's compilation pipeline.
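A sketch of what end-to-end usage could look like, reusing the hypothetical `partition_for_gna` helper and `Model` module from the sketches above; the backend's real entry points may differ:

```python
# Hypothetical end-to-end flow; helper and module names are assumed.
import numpy as np
import tvm
from tvm import relax

mod = partition_for_gna(Model)            # offload supported ops
ex = tvm.relax.build(mod, target="llvm")  # host target; offloaded regions
                                          # run in the GNA (or emulation) runtime
vm = relax.VirtualMachine(ex, tvm.cpu())
out = vm["main"](tvm.nd.array(np.ones((1, 16), "float32")))
```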