[QNN EP] Add LowPowerBlockQuantization support for Gemm node #25458
Conversation
- Low Power Block Quantization (LPBQ) is widely used to accelerate accuracy-sensitive models via the QNN (Qualcomm Neural Network) stack.
- The LPBQ encoding format is Qualcomm's alternative to the block quantization technique.
- The current implementation expects LPBQ encodings packed in a node sequence (DQ -> Q -> DQ).
- This PR folds the LPBQ pattern on the weight of Gemm nodes into a QNN BlockExpansion encoding structure.
- This PR adds INT4 quantization support.
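As a hedged sketch of the general idea (not this PR's actual implementation), LPBQ can be viewed as factoring each block's float scale into a shared per-channel float scale and a small per-block integer multiplier, so only one float per channel plus low-bit integers per block need to be stored. The struct and function names below, and the 4-bit scale width, are illustrative assumptions:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical LPBQ scale decomposition: each per-block float scale is
// re-expressed as (per-channel float scale) * (per-block integer scale).
struct LpbqScales {
  float channel_scale;                // shared float scale for the channel
  std::vector<uint8_t> block_scales;  // small integer multiplier per block
};

LpbqScales DecomposeBlockScales(const std::vector<float>& block_scales,
                                int scale_bits = 4) {
  const int levels = (1 << scale_bits) - 1;  // e.g. 15 for 4-bit scales
  const float max_scale =
      *std::max_element(block_scales.begin(), block_scales.end());
  LpbqScales out;
  out.channel_scale = max_scale / static_cast<float>(levels);
  out.block_scales.reserve(block_scales.size());
  for (float s : block_scales) {
    // Quantize each block scale to an integer multiple of the channel scale.
    int q = static_cast<int>(std::lround(s / out.channel_scale));
    out.block_scales.push_back(static_cast<uint8_t>(std::clamp(q, 1, levels)));
  }
  return out;
}
```

The effective dequantization scale for a block is then `channel_scale * block_scales[i]`, which is what makes the encoding cheap to store and expand on device.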
Pull Request Overview
This PR adds Low Power Block Quantization (LPBQ) support for Gemm nodes in the QNN (Qualcomm Neural Network) execution provider. LPBQ is an alternative block quantization technique that enables acceleration of accuracy-sensitive models by avoiding CPU fallback for block-quantized tensors.
- Introduces a new fusion pattern to detect DQ->Q->DQ sequences on Gemm weights and convert them to QNN's BlockExpansion encoding
- Adds INT4 quantization support through specialized template traits and quantization functions
- Extends the quantization parameter wrapper to handle LPBQ encodings with per-channel float scales and per-block integer scales
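For intuition on the INT4 support mentioned above, here is a hedged sketch of symmetric INT4 quantization with two signed nibbles packed per byte. This illustrates the general technique only; the PR's actual template traits and packing order are not shown here:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative INT4 quantization: values are quantized symmetrically to
// [-8, 7] and two signed nibbles are packed per byte (low nibble first --
// an assumption, not necessarily the layout used by the PR).
std::vector<uint8_t> QuantizeToPackedInt4(const std::vector<float>& data,
                                          float scale) {
  std::vector<uint8_t> packed((data.size() + 1) / 2, 0);
  for (size_t i = 0; i < data.size(); ++i) {
    int q = static_cast<int>(std::lround(data[i] / scale));
    q = std::clamp(q, -8, 7);  // signed 4-bit range
    uint8_t nibble = static_cast<uint8_t>(q & 0x0F);
    if (i % 2 == 0) {
      packed[i / 2] = nibble;        // low nibble
    } else {
      packed[i / 2] |= nibble << 4;  // high nibble
    }
  }
  return packed;
}
```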
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
File | Description
---|---
qnn_utils.h | Adds LPBQ data quantization function and Int4 quantization traits |
qnn_utils.cc | Implements LowPowerBlockQuantizeData function for LPBQ encoding |
qnn_quant_params_wrapper.h | Extends wrapper to support LPBQ quantization parameters |
qnn_quant_params_wrapper.cc | Implements LPBQ constructor and deep copy logic |
utils.h/cc | Adds utility functions for parent/child node traversal in fusion detection |
qnn_node_group.cc | Registers LPBQ Gemm fusion and updates fusion dispatch logic |
lpbqgemm_fusion.h/cc | Implements the LPBQ Gemm fusion pattern detection and QNN node creation |
qnn_model_wrapper.h/cc | Templated UnpackScales function to support both float and uint8_t scales |
onnxruntime/core/providers/qnn/builder/qnn_node_group/lpbqgemm_fusion.cc
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
There are build errors in both the Linux QNN and Windows ARM64 QNN CI pipelines.
- Fixes Linux build error
- Fixes documentation for a function
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).