[Questions] Hardware Support for Sparse Model Acceleration on Snapdragon 8 Gen 2/3 Devices #3960

Open
Jiawei888 opened this issue Mar 31, 2025 · 1 comment

Comments

@Jiawei888

I'm interested in understanding the hardware-level support for sparse model acceleration on Snapdragon 8 Gen 2/3 devices, specifically across the CPU, GPU, and AI accelerators (NPU/DSP). Does the platform provide native hardware acceleration for sparse neural networks, such as N:M sparsity patterns or random sparsity?

I notice that AIMET supports model pruning, but I'm unclear whether this sparsity can be effectively leveraged by the underlying Snapdragon hardware. If the hardware doesn't natively support sparse acceleration, are there any frameworks or libraries specifically designed for Snapdragon platforms that can efficiently execute sparse models and deliver actual performance improvements?

Specifically, I'm looking to understand:

  1. Which processing units (CPU/GPU/NPU/DSP) on Snapdragon 8 Gen 2/3 have hardware-level support for sparse models?
  2. What sparsity patterns are supported (structured N:M, unstructured random sparsity, etc.)?
  3. If hardware support is limited, what software solutions exist to effectively run sparse models on these devices?
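
To make the two pattern families in question 2 concrete, here is a minimal sketch in plain Python. The helper names `n_m_mask` and `unstructured_mask` are hypothetical and purely illustrative; they are not part of AIMET or any Snapdragon SDK, and the sketch says nothing about what the hardware actually accelerates.

```python
import random

def n_m_mask(row, n=2, m=4):
    """N:M pattern: keep the n largest-magnitude weights in every group of m."""
    assert len(row) % m == 0, "row length must be a multiple of m"
    mask = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Positions (within the group) of the n largest |w|.
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask

def unstructured_mask(row, sparsity=0.5):
    """Unstructured pattern: zero the smallest-magnitude weights wherever they fall."""
    k = int(len(row) * sparsity)
    drop = set(sorted(range(len(row)), key=lambda i: abs(row[i]))[:k])
    return [0 if i in drop else 1 for i in range(len(row))]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(16)]  # illustrative data only
nm = n_m_mask(weights)          # exactly 2 nonzeros in each group of 4
un = unstructured_mask(weights) # 8 nonzeros in total, at arbitrary positions
print(nm)
print(un)
```

Both masks remove half of the weights, but only the N:M mask guarantees a fixed number of nonzeros per group, which is the regularity that sparse hardware typically relies on.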

Any insights regarding the practical implementation and performance benefits of model sparsity on Snapdragon platforms would be greatly appreciated.

@quic-akhobare
Collaborator

quic-akhobare commented May 1, 2025

Hi @Jiawei888 - thanks for your question.

The model pruning techniques in AIMET fall under structured pruning, which reduces the dimensionality of layers. These techniques are not designed to exploit sparsity. That said, we recommend quantization over model pruning, since pruning generally causes accuracy drops that must be recovered through model fine-tuning.
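
As a rough illustration of why dimension-reducing pruning does not depend on sparse hardware, the sketch below drops whole output channels from a small weight matrix. The function `prune_output_channels` and the L1-norm ranking are assumptions chosen for the example; this is not AIMET's algorithm or API.

```python
def l1_norm(row):
    """Sum of absolute weights for one output channel."""
    return sum(abs(w) for w in row)

def prune_output_channels(weight, keep):
    """Return a smaller dense matrix containing only the kept output channels."""
    return [row for i, row in enumerate(weight) if i in keep]

# Hypothetical 4x3 weight matrix: one row per output channel.
weight = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.0, 0.03, 0.01],
]

# Rank channels by L1 norm and keep the two strongest (an assumed criterion).
ranked = sorted(range(len(weight)), key=lambda i: l1_norm(weight[i]), reverse=True)
keep = set(ranked[:2])

pruned = prune_output_channels(weight, keep)
print(len(weight), "x", len(weight[0]), "->", len(pruned), "x", len(pruned[0]))
```

The result is a smaller dense layer with no zero-masked entries, so it runs on ordinary dense kernels, and any speedup comes from the reduced shape instead of from skipping zeros.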

We are not the right experts to comment on HW-level sparsity support on different Snapdragon devices.
