
Conversation

morgolock

  • Updated convolution reference to branch epilogue:
  • TO=float: int32 to float dequant (acc * sA * sB + bias_f32); see the sketch below
    • TO!=float: usual quantize_down_scale_by_fixedpoint with int32 bias
  • Changed fixture to use F32 bias tensor for Q->F32 runs (instead of S32), matching arm_gemm dequant epilogue which only supports float bias.
  • Added explicit template instantiations for convolution_layer with TBias=float, TO=float to fix linker errors in validation.
  • Disabled activation in arm_gemm dequant path: offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel, so activation must run there to see the correct final accumulator.

This aligns target and reference for quantized to F32 convolution tests and prevents premature clamping before offset contributions.

Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
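A minimal sketch of the TO=float reference epilogue described in the first bullet; the function name and the per-output-channel bias indexing are assumptions, not the actual reference code:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch only: dequantize the int32 accumulator with the input
// scale sA and the weight scale sB, then add the float bias (one value per
// output channel is assumed here).
std::vector<float> dequantize_epilogue(const std::vector<std::int32_t> &acc,      // int32 GEMM accumulators
                                       float                            sA,       // input (activation) scale
                                       float                            sB,       // weight scale
                                       const std::vector<float>        &bias_f32, // one bias per output channel
                                       std::size_t                      num_channels)
{
    std::vector<float> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i)
    {
        // TO=float path: acc * sA * sB + bias_f32
        out[i] = static_cast<float>(acc[i]) * sA * sB + bias_f32[i % num_channels];
    }
    return out;
}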

@morgolock morgolock requested a review from gunes-arm September 19, 2025 10:56
validate(Accessor(_target), _reference, tolerance_qasymm8);
}

FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, combine(combine(combine(combine(combine(datasets::SmallConvolutionLayerDataset(),
Contributor

single combine(..) is enough
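For reference, the flattened form being suggested would look roughly like this; the trailing datasets are elided because the quoted line above is truncated:

// Hypothetical flattened call: one variadic combine(...) instead of the nested
// combine(combine(...)) wrappers; remaining datasets elided, as the quoted
// diff line is cut off.
FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL,
                       combine(datasets::SmallConvolutionLayerDataset(), /* remaining datasets, listed once */ ...))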

Author

Done

Contributor

I think you've changed the one in QASYMM8 block :) This one is still with multiple combines.

Author

Fixed in next patch


TensorType _target{};
SimpleTensor<T> _reference{};
SimpleTensor<TO> _reference{};
Contributor

space alignment

Author

Done in next patchset

ARM_COMPUTE_ASSERT(dst.info()->is_resizable());
// Test "add padding after configure" behavior. This behavior should not affect the correctness
add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
Contributor

I believe we should enable this now

Author

Re-enabled

add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
{
add_padding_x({ &src, &bias, &dst }, _data_layout);
Contributor

similarly

Author

Enabled

*/
#ifndef ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#define ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#ifndef ACL_TESTS_DATASETS_SMALLCONVOLUTIONLAYERDATASET_H
Contributor

I suppose we don't need to change this file

Author

Reverted in latest patch

quantization::calculate_quantized_multipliers(iqinfo, wqinfo, oqinfo, output_info);

// F32 dequant path? (input quantized, output float)
const bool dequantize_f32 = (dst->data_type() == DataType::F32);
Contributor

We might also need to check the input type: _is_quantized is true for UInt8 and Int16 as well.

Author

Done

gemm_input_to_use = &im2col_reshaped_info;
}

const bool dequantize_f32 = is_data_type_quantized(data_type) && dst->data_type() == DataType::F32;
Contributor

similarly for is_data_type_quantized

Author

done

case DataType::S8:
{
if (is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
if (dst->data_type() != DataType::F32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
Contributor

!dequantize_f32
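Spelling the suggestion out, the condition would read roughly as follows; a snippet under the assumption that dequantize_f32 is computed as shown earlier in the diff, not the exact patch:

// Hypothetical shape of the suggested change: reuse the dequantize_f32 flag
// instead of re-checking dst->data_type() inline.
const bool dequantize_f32 =
    is_data_type_quantized(a_to_use->data_type()) && dst->data_type() == DataType::F32;
...
if (!dequantize_f32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) && ...)
{
    // existing QASYMM8 / QASYMM8_SIGNED fixed-point requantization setup
}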

Author

done in latest patch

// CpuGemmLowpMatrixAReductionKernel
if (a->data_type() == DataType::QASYMM8_SIGNED && output->data_type() == DataType::F32)
{
TensorInfo info_vector_sum_col{};
Contributor

I think we can move this logic to a function and reuse it in the place where you copied from.

Author

Moved to an inline function

Contributor

I was referring to the code inside the conditional: the segment starting with the TensorInfo declarations and ending with the if(b_offset_kernel_needed) check.

Author

Done in the latest patch.
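A rough sketch of the kind of shared helper discussed here; the name, signature and body are assumptions (the actual patch factors out the segment from the TensorInfo declarations through the b_offset_kernel_needed check):

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"

// Hypothetical helper: both call sites build the S32 reduction-vector info the
// same way, so the construction is factored out. TensorInfo, TensorShape and
// DataType are real ACL types; the helper itself is illustrative only.
inline arm_compute::TensorInfo make_vector_sum_col_info(const arm_compute::ITensorInfo &b)
{
    using namespace arm_compute;
    // One int32 column sum per column of the RHS matrix.
    return TensorInfo(TensorShape(b.dimension(0)), 1, DataType::S32);
}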

template <typename T>
using NEGEMMConvolutionLayerQuantizedFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T>;
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T,false,float>;
Contributor

We need to modify the public header files' comment headers to reflect the data type support as well as the operator list document.

Looks like there are two entry points, NEConvolutionLayer and NEGEMMConvolutionLayer. We need to be testing both.

Also, is it guaranteed to choose CpuGemmConv2d as the algorithm when this is triggered via NEConvolutionLayer? The get_convolution_method in CpuConv2d doesn't account for it. Or, do we have some configurations that can't be run for this data type configuration?

Contributor

We might also need to check that we don't accidentally return true in NEDeconv, as it uses NEConv under the hood.

Author

Added new test DequantFP32_SupportedTypes

Contributor

Thanks for adding the supported types.

For the layers that we should test, i.e. NEConvolutionLayer and NEGEMMConvolutionLayer, we should

  • Use NEGEMMConvolutionLayer in the declaration of NEGEMMConvolutionLayerQuantizedF32OutputFixture
  • Create another fixture instantiation with NEConvolutionLayer, named NEConvolutionLayerQuantizedF32OutputFixture; at the moment the fixture name and the layer name are mixed (see the sketch below).
    This way, we'll be able to test both layers.

I've also looked at get_convolution_method. It's not guaranteed to choose CpuGemmConv2d for this configuration: set the channel count to something > 16 (e.g. 22) in the first configuration of SmallConvolutionLayerDataset and the test fails, because we're not validating correctly in CpuGemmDirectConv2d. I'll create a PR with a fix, and we can rebase on top of that. This brings me to my other observation: we need to test more shapes in the NIGHTLY suite. When I replaced SmallConvolutionLayerDataset with LargeConvolutionLayerDataset, it failed again; we don't have enough coverage at the moment.

I've finally looked into NEDeconvolutionLayer, which uses NEConvolutionLayer under the hood. It is fine: its validate() call catches the Input/Dst data type configuration, unlike CpuGemmDirectConv2d.
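In code, the two fixture aliases being asked for would look something like this, mirroring the existing alias above (NEGEMMConvolutionLayer is assumed to be available in this test file):

// Hypothetical declarations: each fixture name matches the layer it exercises,
// so both entry points are covered.
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEGEMMConvolutionLayer, T, false, float>;
template <typename T>
using NEConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T, false, float>;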

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 4 times, most recently from 02cc3e6 to 4c780b3 Compare October 7, 2025 08:45
* |QASYMM8 |QASYMM8_SIGNED |S32 |QASYMM8 |
* |QASYMM8 |QSYMM8_PER_CHANNEL |S32 |QASYMM8 |
* |QASYMM8_SIGNED |QASYMM8_SIGNED |S32 |QASYMM8_SIGNED |
* |QASYMM8_SIGNED |QASYMM8_SIGNED |F32 |F32 |
Contributor

We also need to modify the CpuGemmConv2d header.

Author

Done

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 3 times, most recently from 6a8b64a to 88c1594 Compare October 7, 2025 17:22
validate(Accessor(_target), _reference, tolerance_qasymm8);
}

FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, combine(combine(combine(combine(combine(datasets::SmallConvolutionLayerDataset(),





Contributor

We are inside the UpdateStaticQuantInfoAfterConfigure suite; these tests should be outside of it.

Author

Moved the tests outside UpdateStaticQuantInfoAfterConfigure
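The resulting layout would be roughly the following (suite contents abbreviated; TEST_SUITE/TEST_SUITE_END are the validation framework's macros, and the enclosing suite is assumed):

// Hypothetical layout after the move.
TEST_SUITE(UpdateStaticQuantInfoAfterConfigure)
// ... only the static-quantization-info update cases remain here ...
TEST_SUITE_END() // UpdateStaticQuantInfoAfterConfigure

// The dequantize-to-F32 cases are registered at the enclosing suite level, e.g.:
// FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, ...)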

@morgolock morgolock force-pushed the pr/conv_f32_dequant branch 2 times, most recently from 7c6490c to 780b190 Compare October 10, 2025 22:04
- Updated convolution reference to branch epilogue:
  * TO=float: int32 to float dequant (acc * sA * sB + bias_f32)
  * TO!=float: usual quantize_down_scale_by_fixedpoint with int32 bias
- Changed fixture to use F32 bias tensor for Q->F32 runs (instead of S32),
  matching arm_gemm dequant epilogue which only supports float bias.
- Added explicit template instantiations for convolution_layer with
  TBias=float, TO=float to fix linker errors in validation.
- Disabled activation in arm_gemm dequant path:
  offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel,
  so activation must run there to see the correct final accumulator.
- src/cpu/kernels/gemmlowp/generic/neon/impl.h
    neon_run_offset_contribution_float(): replace per-batch offset
    for vector_sum_col from Y stride to W stride (sketched below).

This aligns target and reference for quantized to F32 convolution tests
and prevents premature clamping before offset contributions.

Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
Signed-off-by: Pablo Marquez Tello <[email protected]>
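A purely illustrative sketch of the impl.h stride change in the last bullet; the names and indexing here are assumptions, not the kernel code:

#include <cstddef>
#include <cstdint>

// Hypothetical illustration: when iterating over batches, the per-batch offset
// into vector_sum_col must advance by the batch (W) stride. Using the row (Y)
// stride would step within one batch instead of jumping to the next batch's
// column-sum vector.
const std::int32_t *vector_sum_col_for_batch(const std::int32_t *base,
                                             std::size_t         batch,
                                             std::size_t         batch_stride_in_elements) // W stride, not Y stride
{
    return base + batch * batch_stride_in_elements;
}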
@morgolock morgolock force-pushed the pr/conv_f32_dequant branch from 780b190 to 950a00b Compare October 13, 2025 21:26