[OV CPU] add CPU pass ConvertConvolutionToMatMul into decompression_handling_manager #32983
Conversation
… matmul. Then met MarkDequantization pattern
@v-Golubev, could you please review this PR?
namespace {
std::shared_ptr<Model> create_conv_function(const Shape& input_shape,
                                            const Shape& weights_shape,
                                            bool with_dequantization) {
Minor:
Suggested change:
-    bool with_dequantization) {
+    bool with_decompression) {
As far as I see, this feature targets the decompressed weights + float activations scenario. Let's reflect that in the builder name.
changed to decompression
using namespace testing;

namespace {
std::shared_ptr<Model> create_conv_function(const Shape& input_shape,
Can we add test cases with dynamic input_shape or modify the existing ones?
Added a dynamic shape test case.
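(Editorial aside: a minimal sketch of what a dynamic-shape case could look like. The builder name, its signature, and the omission of the decompression subgraph are assumptions for illustration, not the PR's actual test code.)

```cpp
#include <memory>
#include "openvino/core/model.hpp"
#include "openvino/op/constant.hpp"
#include "openvino/op/convolution.hpp"
#include "openvino/op/parameter.hpp"

// Hypothetical builder: only the activations use ov::PartialShape, so a test can pass
// dynamic dimensions such as {-1, 128, 1, 1} while keeping the 1x1 weights static.
std::shared_ptr<ov::Model> create_conv_function_dyn(const ov::PartialShape& input_shape,
                                                    const ov::Shape& weights_shape) {
    auto input = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, input_shape);
    auto weights = ov::op::v0::Constant::create(ov::element::f32, weights_shape, {1.f});
    auto conv = std::make_shared<ov::op::v1::Convolution>(input,
                                                          weights,
                                                          ov::Strides{1, 1},
                                                          ov::CoordinateDiff{0, 0},
                                                          ov::CoordinateDiff{0, 0},
                                                          ov::Strides{1, 1});
    return std::make_shared<ov::Model>(ov::OutputVector{conv}, ov::ParameterVector{input});
}

// Example dynamic-shape case: dynamic sequence dimension, static 1x1 spatial dims.
// create_conv_function_dyn(ov::PartialShape{-1, 128, 1, 1}, ov::Shape{64, 128, 1, 1});
```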
auto weights_convert = std::make_shared<op::v0::Convert>(weights_const, element::f32);
auto sub_const = op::v0::Constant::create(element::i4, {1}, {1});
auto sub_convert = std::make_shared<op::v0::Convert>(sub_const, element::f32);
auto subtract = std::make_shared<op::v1::Subtract>(weights_convert, sub_convert);
The ConvertConvolutionToMatMul matcher covers more complicated cases as well (e.g. zp_reshape or weights_sub_multiply_reshape). Can we cover them with tests?
You can try to reuse ov::test::utils::initMatMulDecompressionSubgraph to avoid code duplication.
reused initMatMulDecompressionSubgraph and covered more cases
// weights should be a static 1x1 kernel, [hidden_out, hidden_in, 1, 1]
const auto& weights_shape = conv_node->get_input_partial_shape(1);
if (weights_shape.is_dynamic()) {
    return false;
}
if (weights_shape.size() != 4 || weights_shape[2] != 1 || weights_shape[3] != 1) {
    return false;
}

// input should meet: [seq_len, hidden_in, 1, 1], [1, hidden_in, 1, seq_len] or [1, hidden_in, seq_len, 1]
const auto& input_shape = conv_node->get_input_partial_shape(0);
if (input_shape.rank().get_length() != 4) {
    return false;
}
const bool is_supported_shape = (input_shape[2] == 1 && input_shape[3] == 1) ||
                                (input_shape[0] == 1 && input_shape[2] == 1) ||
                                (input_shape[0] == 1 && input_shape[3] == 1);
if (!is_supported_shape) {
    return false;
}
Shape checks can be done without custom predicate logic using the ov::pass::pattern::shape_matches predicate, which uses the symbolics feature. Can we try to reuse it here to simplify the code?
Please see the example in src/common/transformations/src/transformations/common_optimizations/fuse_rotary_positional_embeddings.cpp
thanks, reused shape_matches for the input shape.
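(Editorial aside: a rough sketch of the idea only. shape_matches is part of the symbolic pattern API named above, but the exact string syntax and headers should be verified against the fuse_rotary_positional_embeddings.cpp example; the fragment below is assumed usage inside a matcher constructor.)

```cpp
// Assumed usage: declare one of the supported 4D activation layouts declaratively
// instead of hand-written rank/dimension checks.
using namespace ov::pass::pattern;

// e.g. [seq_len, hidden_in, 1, 1]; the other supported layouts would get their own patterns
auto activations = any_input(shape_matches("[seq_len, hidden_in, 1, 1]"));
auto conv = wrap_type<ov::op::v1::Convolution>({activations, any_input()});
```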
return conv_node->get_strides() == ov::Strides{1, 1} && conv_node->get_dilations() == ov::Strides{1, 1} &&
       conv_node->get_pads_begin() == ov::CoordinateDiff{0, 0} &&
       conv_node->get_pads_end() == ov::CoordinateDiff{0, 0};
- Shouldn't we check the auto_pad value here?
- We can pass the desired attributes as a separate wrap_type parameter and avoid introducing a custom predicate. Could you please try this? As a reference, you can use the mlp3_no_bias_swiglu_block block in src/common/transformations/src/transformations/common_optimizations/fuse_moe_experts.cpp: attributes are set as the last wrap_type parameter (e.g. {{"mode", "bidirectional"}} for the Broadcast node).
Yep, added the auto_pad check and used the attribute check with wrap_type.
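(Editorial aside: a sketch of what the attribute-based wrap_type could look like. The attribute-map overload and the value encoding are assumed from the fuse_moe_experts.cpp reference, so the exact form should be verified there; `activations` and `weights` stand for previously declared pattern nodes.)

```cpp
// Assumed form: required Convolution attributes passed as the last wrap_type argument,
// replacing the custom strides/dilations/pads predicate and also pinning auto_pad.
using namespace ov::pass::pattern;

auto conv = wrap_type<ov::op::v1::Convolution>({activations, weights},
                                               {{"strides", ov::Strides{1, 1}},
                                                {"dilations", ov::Strides{1, 1}},
                                                {"pads_begin", ov::CoordinateDiff{0, 0}},
                                                {"pads_end", ov::CoordinateDiff{0, 0}},
                                                {"auto_pad", "explicit"}});
```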
ov::pass::Manager decompression_handling_manager("CPU:DecompressionHandling");
decompression_handling_manager.set_per_pass_validation(false);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::ConvertConvolutionToMatMul);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::EliminateReshape);
Please keep InitNodeInfo first in the pipeline
position moved
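(Editorial aside: for clarity, the registration order after this fix would look roughly as follows; only the passes quoted in this review are shown.)

```cpp
ov::pass::Manager decompression_handling_manager("CPU:DecompressionHandling");
decompression_handling_manager.set_per_pass_validation(false);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::InitNodeInfo);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::ConvertConvolutionToMatMul);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::EliminateReshape);
```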
// This must be done in order to keep compressed MatMul weights with decompression operations as is
ov::pass::Manager decompression_handling_manager("CPU:DecompressionHandling");
decompression_handling_manager.set_per_pass_validation(false);
CPU_REGISTER_PASS_COMMON(decompression_handling_manager, ov::pass::ConvertConvolutionToMatMul);
As far as I understand, this transformation converts a 1x1 Convolution with compressed weights into a Transpose->MatMul->Transpose sequence: this allows keeping the weights decompression subgraph unfolded, which is better from a memory consumption perspective.
The main concern I have is that we may introduce performance degradations for quantized models. In that case, the low precision transformations handle dequantization operations from both activations and weights, so after the low precision pipeline we usually have u8 activations + i8 weights, and the target layer can be executed in low precision. Where earlier we had just one quantized Convolution after low precision transformations, we would now have Transpose->MatMul(quantized)->Transpose, which may be worse from a performance perspective.
From my perspective, it looks like we should skip the transformation if the model is quantized. What do you think?
As we discussed offline, I added a supported_precisions parameter to the pattern to handle LPT.
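(Editorial aside: purely illustrative of the idea of restricting the matcher to decompression-only weight precisions so that LPT-quantized models keep the plain Convolution. The helper name, parameter, and usage below are assumptions, not the PR's actual code.)

```cpp
#include <algorithm>
#include <memory>
#include <vector>
#include "openvino/core/node.hpp"
#include "openvino/core/type/element_type.hpp"

// Hypothetical guard: only convert when the weights element type is in the caller-provided
// set of decompression precisions; otherwise leave the Convolution for LPT to handle.
bool weights_precision_supported(const std::shared_ptr<ov::Node>& weights,
                                 const std::vector<ov::element::Type>& supported_precisions) {
    const auto& et = weights->get_output_element_type(0);
    return std::find(supported_precisions.begin(), supported_precisions.end(), et) !=
           supported_precisions.end();
}
```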
…ulDecompressionSubgraph and cover more pattern tests; use shape_matches; use attribute match in wrap_type; add LPT supported precision argument in pattern
Details:
- ConvertConvolutionToMatMul to convert the 1x1 kernel conv to MatMul, then reuse MarkDequantization and related patterns

Tickets: