feat: enable F32 output in CpuGemmConv2d #1184
base: main
Conversation
Force-pushed d0ad533 to 20c80c0
Force-pushed 20c80c0 to 976a634
validate(Accessor(_target), _reference, tolerance_qasymm8);
}

FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, combine(combine(combine(combine(combine(datasets::SmallConvolutionLayerDataset(),
A single combine(..) is enough.
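For context, this asks to flatten the nested calls into one variadic combine; a minimal sketch, with placeholder datasets d1..d4 standing in for the real arguments (which are truncated in the diff above):

// Nested form, as in the diff:
//   combine(combine(combine(d1, d2), d3), d4)
// Flattened form the review asks for, assuming the framework's variadic combine overload:
combine(d1, d2, d3, d4)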
Done
I think you've changed the one in QASYMM8 block :) This one is still with multiple combines.
Fixed in next patch
TensorType _target{};
SimpleTensor<T> _reference{};
SimpleTensor<TO> _reference{};
space alignment
Done in next patchset
ARM_COMPUTE_ASSERT(dst.info()->is_resizable());
// Test "add padding after configure" behavior. This behavior should not affect the correctness
add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
I believe we should enable this now
Re-enabled
add_padding_x({ &src, &bias, &dst }, _data_layout);
if( !(T_is_q && TO_is_f32))
{
    add_padding_x({ &src, &bias, &dst }, _data_layout);
similarly
Enabled
*/
#ifndef ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#define ARM_COMPUTE_TEST_SMALL_CONVOLUTION_LAYER_DATASET
#ifndef ACL_TESTS_DATASETS_SMALLCONVOLUTIONLAYERDATASET_H
I suppose we don't need to change this file
Reverted in latest patch
src/cpu/operators/CpuGemmConv2d.cpp (outdated)
quantization::calculate_quantized_multipliers(iqinfo, wqinfo, oqinfo, output_info);

// F32 dequant path? (input quantized, output float)
const bool dequantize_f32 = (dst->data_type() == DataType::F32);
We might also need to check input type. _is_quantized is true when we have UInt8 and Int16.
Done
src/cpu/operators/CpuGemmConv2d.cpp (outdated)
gemm_input_to_use = &im2col_reshaped_info;
}

const bool dequantize_f32 = is_data_type_quantized(data_type) && dst->data_type() == DataType::F32;
similarly for is_data_type_quantized
done
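For illustration, a check narrowed to the QASYMM8_SIGNED-to-F32 pairing this PR supports might read as follows; a sketch, not the exact line from the patch:

// is_data_type_quantized() is also true for unsigned 8-bit and 16-bit quantized
// types, so pin the source down to the type the dequant path actually handles.
const bool dequantize_f32 = src->data_type() == DataType::QASYMM8_SIGNED && dst->data_type() == DataType::F32;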
case DataType::S8:
{
    if (is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
    if (dst->data_type() != DataType::F32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) &&
Use !dequantize_f32 here instead.
done in latest patch
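That is, reuse the flag computed earlier instead of re-testing the destination type inline; a sketch (the tail of the condition is truncated in the diff above and stays elided here):

// Reuse dequantize_f32 rather than the inline dst->data_type() != DataType::F32 test.
if (!dequantize_f32 && is_data_type_quantized_asymmetric(a_to_use->data_type()) && /* ... */)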
// CpuGemmLowpMatrixAReductionKernel
if (a->data_type() == DataType::QASYMM8_SIGNED && output->data_type() == DataType::F32)
{
    TensorInfo info_vector_sum_col{};
I think we can move this logic to a function and reuse it in the place where you copied from.
Moved to an inline function
I was referring to the code inside the conditional, starting with the TensorInfo declarations and ending with the if (b_offset_kernel_needed) code segment.
Done in the latest patch.
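The shape of that refactor, as a sketch; the helper name and parameter list are hypothetical, and the body is the segment the reviewer describes:

// Hypothetical helper collecting the duplicated validation segment, from the
// TensorInfo declarations through the if (b_offset_kernel_needed) block.
static inline Status validate_reduction_kernels(const ITensorInfo *a, const ITensorInfo *b, bool b_offset_kernel_needed)
{
    TensorInfo info_vector_sum_col{};
    TensorInfo info_vector_sum_row{};
    // ... shared CpuGemmLowpMatrixAReductionKernel / CpuGemmLowpMatrixBReductionKernel
    //     validation, including the if (b_offset_kernel_needed) segment ...
    return Status{};
}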
template <typename T>
using NEGEMMConvolutionLayerQuantizedFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T>;
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T, false, float>;
We need to modify the public header files' comment headers to reflect the data type support as well as the operator list document.
Looks like there are two entry points. NEConvolutionLayer and NEGEMMConvolutionLayer. We need to be testing both.
Also, is it guaranteed to choose CpuGemmConv2d as the algorithm when this is triggered via NEConvolutionLayer? The get_convolution_method in CpuConv2d doesn't account for it. Or, do we have some configurations that can't be run for this data type configuration?
We might also need to check if we accidentally return true in NEDeconv as it's using NEConv.
Added new test DequantFP32_SupportedTypes
Thanks for adding the supported types.
For the layers that we should test, i.e. NEConvolutionLayer and NEGEMMConvolutionLayer, we should:
- Use NEGEMMConvolutionLayer in the declaration of NEGEMMConvolutionLayerQuantizedF32OutputFixture.
- Create another fixture instantiation with NEConvolutionLayer and the fixture name NEConvolutionLayerQuantizedF32OutputFixture. At the moment the fixture name and the layer name are mixed.
This way, we'll be able to test both layers; see the sketch below.
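A sketch of the two declarations that suggests, using the names from this thread (not copied from the patch):

// Exercises the NEGEMMConvolutionLayer entry point directly.
template <typename T>
using NEGEMMConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEGEMMConvolutionLayer, T, false, float>;

// Exercises the NEConvolutionLayer entry point, which dispatches via get_convolution_method().
template <typename T>
using NEConvolutionLayerQuantizedF32OutputFixture = ConvolutionValidationQuantizedFixture<Tensor, Accessor, NEConvolutionLayer, T, false, float>;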
I've also looked at get_convolution_method. It's not guaranteed to choose CpuGemmConv2d for this configuration: set the channel count to something > 16, e.g. 22, in the first configuration in SmallConvolutionLayerDataset and the test fails. This is because we're not validating correctly in CpuGemmDirectConv2d. I'll create a PR with a fix, and we can rebase on top of that. This brings us to my other observation: we need to test more shapes in the NIGHTLY suite. When I replaced SmallConvolutionLayerDataset with LargeConvolutionLayerDataset, it again failed. We don't have enough coverage at the moment.
I've finally looked into NEDeconvolutionLayer, which uses NEConvolutionLayer under the hood. It is fine, as its validate() call is able to catch the Input/Dst data type configuration, unlike CpuGemmDirectConv2d.
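For context, the missing rejection in CpuGemmDirectConv2d::validate() would be along these lines; illustrative only, the actual fix is in the follow-up PR mentioned above:

// The direct-conv path has no dequantizing epilogue, so a quantized source with an
// F32 destination must fail validation; get_convolution_method() then won't pick it.
ARM_COMPUTE_RETURN_ERROR_ON(is_data_type_quantized(src->data_type()) && dst->data_type() == DataType::F32);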
Force-pushed 02cc3e6 to 4c780b3
* |QASYMM8        |QASYMM8_SIGNED     |S32 |QASYMM8        |
* |QASYMM8        |QSYMM8_PER_CHANNEL |S32 |QASYMM8        |
* |QASYMM8_SIGNED |QASYMM8_SIGNED     |S32 |QASYMM8_SIGNED |
* |QASYMM8_SIGNED |QASYMM8_SIGNED     |F32 |F32            |
We also need to modify the CpuGemmConv2d header.
Done
Force-pushed 6a8b64a to 88c1594
We are inside the UpdateStaticQuantInfoAfterConfigure suite. We should be outside of it.
Moved the tests outside the UpdateStaticQuantInfoAfterConfigure suite.
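The intended placement, sketched with the suite macros used in this file (surrounding tests, the dataset argument, and the test body are elided):

TEST_SUITE(UpdateStaticQuantInfoAfterConfigure)
// ... the static-quantization update tests stay in here ...
TEST_SUITE_END() // UpdateStaticQuantInfoAfterConfigure

// The dequantize-to-F32 cases now sit outside that suite:
FIXTURE_DATA_TEST_CASE(RunSmallDequantizeF32, NEGEMMConvolutionLayerQuantizedF32OutputFixture<int8_t>, framework::DatasetMode::ALL, /* dataset elided */)
{
    // ... validate() as in the test body above ...
}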
Force-pushed 7c6490c to 780b190
- Updated convolution reference to branch the epilogue:
  * TO=float: int32-to-float dequant (acc * sA * sB + bias_f32)
  * TO!=float: usual quantize_down_scale_by_fixedpoint with int32 bias
- Changed the fixture to use an F32 bias tensor for Q->F32 runs (instead of S32), matching the arm_gemm dequant epilogue, which only supports float bias.
- Added explicit template instantiations for convolution_layer with TBias=float, TO=float to fix linker errors in validation.
- Disabled activation in the arm_gemm dequant path: offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel, so activation must run there to see the correct final accumulator.
- src/cpu/kernels/gemmlowp/generic/neon/impl.h, neon_run_offset_contribution_float(): changed the per-batch offset for vector_sum_col from the Y stride to the W stride.

This aligns target and reference for quantized-to-F32 convolution tests and prevents premature clamping before offset contributions.

Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
Signed-off-by: Pablo Marquez Tello <[email protected]>
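The TO=float branch of that reference epilogue, as a minimal standalone sketch (the function name is illustrative, not from the patch):

#include <cstdint>

// Dequantize an int32 GEMM accumulator: scale by the input and weight
// quantization scales (sA * sB), then add the float bias.
float dequantize_accumulator(int32_t acc, float scale_a, float scale_b, float bias_f32)
{
    return static_cast<float>(acc) * scale_a * scale_b + bias_f32;
}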
Force-pushed 780b190 to 950a00b