
Conversation

morgolock

  • Updated convolution reference to branch epilogue:
  • TO=float: int32 to float dequant (acc * sA * sB + bias_f32); see the sketch below
    • TO!=float: usual quantize_down_scale_by_fixedpoint with int32 bias
  • Changed fixture to use F32 bias tensor for Q->F32 runs (instead of S32), matching arm_gemm dequant epilogue which only supports float bias.
  • Added explicit template instantiations for convolution_layer with TBias=float, TO=float to fix linker errors in validation.
  • Disabled activation in arm_gemm dequant path: offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel, so activation must run there to see the correct final accumulator.

This aligns target and reference for quantized to F32 convolution tests and prevents premature clamping before offset contributions.

Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
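A minimal sketch of the TO=float reference epilogue described in the first bullet; the function name and the per-output-channel bias indexing are assumptions, not the actual reference code:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch only: dequantize the int32 accumulator with the input
// scale sA and the weight scale sB, then add the float bias (one value per
// output channel is assumed here).
std::vector<float> dequantize_epilogue(const std::vector<std::int32_t> &acc,      // int32 GEMM accumulators
                                       float                            sA,       // input (activation) scale
                                       float                            sB,       // weight scale
                                       const std::vector<float>        &bias_f32, // one bias per output channel
                                       std::size_t                      num_channels)
{
    std::vector<float> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i)
    {
        // TO=float path: acc * sA * sB + bias_f32
        out[i] = static_cast<float>(acc[i]) * sA * sB + bias_f32[i % num_channels];
    }
    return out;
}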

@morgolock morgolock requested a review from gunes-arm September 19, 2025 10:56
validate(Accessor(_target), _reference, tolerance_qasymm8);
}

FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, combine(combine(combine(combine(combine(datasets::SmallConvolutionLayerDataset(),
Contributor

single combine(..) is enough
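For reference, the flattened form being suggested would look roughly like this; the trailing datasets are elided because the quoted line above is truncated:

// Hypothetical flattened call: one variadic combine(...) instead of the nested
// combine(combine(...)) wrappers; remaining datasets elided, as the quoted
// diff line is cut off.
FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL,
                       combine(datasets::SmallConvolutionLayerDataset(), /* remaining datasets, listed once */ ...))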

Author

Done

Contributor

I think you've changed the one in QASYMM8 block :) This one is still with multiple combines.

Author

Fixed in next patch


TensorType _target{};
SimpleTensor<T> _reference{};
SimpleTensor<TO> _reference{};
Contributor

space alignment

Author

Done in next patchset

ARM_COMPUTE_ASSERT(dst.info()->is_resizable());
// Test "add padding after configure" behavior. This behavior should not affect the correctness
add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
Contributor

I believe we should enable this now

Author

Re-enabled

add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
{
add_padding_x({ &src, &bias, &dst }, _data_layout);
Contributor

similarly

Author

Enabled

*/
#ifndef ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#define ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#ifndef ACL_TESTS_DATASETS_SMALLCONVOLUTIONLAYERDATASET_H
Contributor

I suppose we don't need to change this file

Author

Reverted in latest patch

quantization::calculate_quantized_multipliers(iqinfo, wqinfo, oqinfo, output_info);

// F32 dequant path? (input quantized, output float)
const bool dequantize_f32 = (dst->data_type() == DataType::F32);
Contributor

We might also need to check the input type: _is_quantized is true for UInt8 and Int16 as well.

Author

Done

gemm_input_to_use = &im2col_reshaped_info;
}

const bool dequantize_f32 = is_data_type_quantized(data_type) && dst->data_type() == DataType::F32;
Contributor

similarly for is_data_type_quantized

Author

done

case DataType::S8:
{
if (is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
if (dst->data_type() != DataType::F32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
Contributor

!dequantize_f32
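Spelling the suggestion out, the condition would read roughly as follows; a snippet under the assumption that dequantize_f32 is computed as shown earlier in the diff, not the exact patch:

// Hypothetical shape of the suggested change: reuse the dequantize_f32 flag
// instead of re-checking dst->data_type() inline.
const bool dequantize_f32 =
    is_data_type_quantized(a_to_use->data_type()) && dst->data_type() == DataType::F32;
...
if (!dequantize_f32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) && ...)
{
    // existing QASYMM8 / QASYMM8_SIGNED fixed-point requantization setup
}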

Author

done in latest patch

// CpuGemmLowpMatrixAReductionKernel
if (a->data_type() == DataType::QASYMM8_SIGNED && output->data_type() == DataType::F32)
{
TensorInfo info_vector_sum_col{};
Contributor

I think we can move this logic to a function and reuse it in the place where you copied from.

Author

Moved to an inline function

Contributor

I was referring to the code inside the conditional: the segment starting with the TensorInfo declarations and ending with the if(b_offset_kernel_needed) check.

Author

Done in the latest patch.
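A rough sketch of the kind of shared helper discussed here; the name, signature and body are assumptions (the actual patch factors out the segment from the TensorInfo declarations through the b_offset_kernel_needed check):

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"

// Hypothetical helper: both call sites build the S32 reduction-vector info the
// same way, so the construction is factored out. TensorInfo, TensorShape and
// DataType are real ACL types; the helper itself is illustrative only.
inline arm_compute::TensorInfo make_vector_sum_col_info(const arm_compute::ITensorInfo &b)
{
    using namespace arm_compute;
    // One int32 column sum per column of the RHS matrix.
    return TensorInfo(TensorShape(b.dimension(0)), 1, DataType::S32);
}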

template <typename T>
using NEGEMMConvolutionLayerQuantizedFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T>;
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T,false,float>;
Contributor

We need to modify the public header files' comment headers to reflect the data type support as well as the operator list document.

Looks like there are two entry points, NEConvolutionLayer and NEGEMMConvolutionLayer. We need to be testing both.

Also, is it guaranteed to choose CpuGemmConv2d as the algorithm when this is triggered via NEConvolutionLayer? The get_convolution_method in CpuConv2d doesn't account for it. Or, do we have some configurations that can't be run for this data type configuration?

Contributor

We might also need to check that we don't accidentally return true in NEDeconv, as it uses NEConv under the hood.

Author

Added new test DequantFP32_SupportedTypes

Contributor

Thanks for adding the supported types.

For the layers that we should test, i.e. NEConvolutionLayer and NEGEMMConvolutionLayer, we should

  • Use NEGEMMConvolutionLayer in the declaration of NEGEMMConvolutionLayerQuantizedF32OutputFixture
  • Create another fixture instantiation with NEConvolutionLayer, named NEConvolutionLayerQuantizedF32OutputFixture; at the moment the fixture name and the layer name are mixed (see the sketch below).
    This way, we'll be able to test both layers.

I've also looked at get_convolution_method. It's not guaranteed to choose CpuGemmConv2d for this configuration: set the channel count to something > 16 (e.g. 22) in the first configuration of SmallConvolutionLayerDataset and the test fails, because we're not validating correctly in CpuGemmDirectConv2d. I'll create a PR with a fix, and we can rebase on top of that. This brings me to my other observation: we need to test more shapes in the NIGHTLY suite. When I replaced SmallConvolutionLayerDataset with LargeConvolutionLayerDataset, it failed again; we don't have enough coverage at the moment.

I've finally looked into NEDeconvolutionLayer, which uses NEConvolutionLayer under the hood. It is fine: its validate() call catches the Input/Dst data type configuration, unlike CpuGemmDirectConv2d.
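In code, the two fixture aliases being asked for would look something like this, mirroring the existing alias above (NEGEMMConvolutionLayer is assumed to be available in this test file):

// Hypothetical declarations: each fixture name matches the layer it exercises,
// so both entry points are covered.
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEGEMMConvolutionLayer, T, false, float>;
template <typename T>
using NEConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T, false, float>;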

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 4 times, most recently from 02cc3e6 to 4c780b3 Compare October 7, 2025 08:45
* |QASYMM8 |QASYMM8_SIGNED |S32 |QASYMM8 |
* |QASYMM8 |QSYMM8_PER_CHANNEL |S32 |QASYMM8 |
* |QASYMM8_SIGNED |QASYMM8_SIGNED |S32 |QASYMM8_SIGNED |
* |QASYMM8_SIGNED |QASYMM8_SIGNED |F32 |F32 |
Contributor

We also need to modify the CpuGemmConv2d header.

Author

Done

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 3 times, most recently from 6a8b64a to 88c1594 Compare October 7, 2025 17:22
validate(Accessor(_target), _reference, tolerance_qasymm8);
}

FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, combine(combine(combine(combine(combine(datasets::SmallConvolutionLayerDataset(),





Contributor

We are inside the UpdateStaticQuantInfoAfterConfigure suite; these tests should be outside of it.

Author

Moved the tests outside UpdateStaticQuantInfoAfterConfigure
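The resulting layout would be roughly the following (suite contents abbreviated; TEST_SUITE/TEST_SUITE_END are the validation framework's macros, and the enclosing suite is assumed):

// Hypothetical layout after the move.
TEST_SUITE(UpdateStaticQuantInfoAfterConfigure)
// ... only the static-quantization-info update cases remain here ...
TEST_SUITE_END() // UpdateStaticQuantInfoAfterConfigure

// The dequantize-to-F32 cases are registered at the enclosing suite level, e.g.:
// FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, ...)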

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 2 times, most recently from 7c6490c to 780b190 Compare October 10, 2025 22:04
- Updated convolution reference to branch epilogue:
  * TO=float: int32 to float dequant (acc * sA * sB + bias_f32)
  * TO!=float: usual quantize_down_scale_by_fixedpoint with int32 bias
- Changed fixture to use F32 bias tensor for Q->F32 runs (instead of S32),
  matching arm_gemm dequant epilogue which only supports float bias.
- Added explicit template instantiations for convolution_layer with
  TBias=float, TO=float to fix linker errors in validation.
- Disabled activation in arm_gemm dequant path:
  offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel,
  so activation must run there to see the correct final accumulator.
- src/cpu/kernels/gemmlowp/generic/neon/impl.h
    neon_run_offset_contribution_float(): replace per-batch offset
    for vector_sum_col from Y stride to W stride (sketched below).

This aligns target and reference for quantized to F32 convolution tests
and prevents premature clamping before offset contributions.

Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
Signed-off-by: Pablo Marquez Tello <[email protected]>
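A purely illustrative sketch of the impl.h stride change in the last bullet; the names and indexing here are assumptions, not the kernel code:

#include <cstddef>
#include <cstdint>

// Hypothetical illustration: when iterating over batches, the per-batch offset
// into vector_sum_col must advance by the batch (W) stride. Using the row (Y)
// stride would step within one batch instead of jumping to the next batch's
// column-sum vector.
const std::int32_t *vector_sum_col_for_batch(const std::int32_t *base,
                                             std::size_t         batch,
                                             std::size_t         batch_stride_in_elements) // W stride, not Y stride
{
    return base + batch * batch_stride_in_elements;
}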
@morgolock morgolock force-pushed the pr/conv_f32_dequant branch from 780b190 to 950a00b Compare October 13, 2025 21:26