The purpose of different implementations of qs8/qu8-f32-vcvt kernels. #7644

tingboliao · 2025-01-06T15:12:50Z

Hi @oliIMG,
In PR #7127 I see several versions of the qs8/qu8-f32-vcvt kernels,
namely qs8-f32-vcvt-rvv-u1v.c, qs8-f32-vcvt-rvv-u2v.c, qu8-f32-vcvt-rvv-u1v.c and qu8-f32-vcvt-rvv-u2v.c.
These are the m1 and m2 RVV implementations of the kernels.

Some other kernels have four RVV implementations (based on the scalar version), one each for m1, m2, m4 and m8.

I would like to know the purpose of these different implementations and how users can select the best version.

fbarchard · 2025-01-21T22:15:19Z

In general, a kernel with 'u4v' in its name uses 'm4' (LMUL=4) for the source.
Kernels such as float binary ops can implement all 4 variations: m1, m2, m4 and m8.
The fastest of these variations can be enabled in the config files, e.g. src/configs/gemm-config.c.
Which one is fastest depends on the hardware, so once benchmarks have been run on hardware from different vendors, a switch statement on uarch can be added to select different kernels for different hardware.
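As a rough illustration of the uarch switch described above, here is a minimal sketch in plain C. Everything in it is hypothetical: the `example_uarch` enum, the simplified kernel signature, the `example_select_vcvt` function and the vendor-to-kernel mapping are placeholders for illustration, not XNNPACK's actual config code and not benchmark results. The two "kernels" are scalar stand-ins so the sketch compiles without an RVV toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical microarchitecture IDs (not XNNPACK's real enum). */
enum example_uarch {
  EXAMPLE_UARCH_UNKNOWN,
  EXAMPLE_UARCH_VENDOR_A,
  EXAMPLE_UARCH_VENDOR_B,
};

/* Simplified, assumed kernel signature: dequantize n int8 values to float. */
typedef void (*example_vcvt_fn)(size_t n, const int8_t* in, float* out,
                                int32_t zero_point, float scale);

/* Shared scalar reference: out[i] = (in[i] - zero_point) * scale. */
static void example_vcvt_scalar(size_t n, const int8_t* in, float* out,
                                int32_t zero_point, float scale) {
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    out[i] = (float) ((int32_t) in[i] - zero_point) * scale;
  }
}

/* Stand-ins for the u1v (m1) and u2v (m2) variants. Real variants would
   differ in register grouping, not in the values they produce. */
static void example_vcvt_u1v(size_t n, const int8_t* in, float* out,
                             int32_t zero_point, float scale) {
  example_vcvt_scalar(n, in, out, zero_point, scale);
}

static void example_vcvt_u2v(size_t n, const int8_t* in, float* out,
                             int32_t zero_point, float scale) {
  example_vcvt_scalar(n, in, out, zero_point, scale);
}

struct example_vcvt_config {
  example_vcvt_fn fn;
  int source_lmul;  /* 1 for u1v, 2 for u2v */
};

/* Pick a variant per uarch. The mapping below is a placeholder; a real
   table would be filled in from benchmarks on each vendor's hardware. */
static struct example_vcvt_config example_select_vcvt(enum example_uarch uarch) {
  switch (uarch) {
    case EXAMPLE_UARCH_VENDOR_A:
      return (struct example_vcvt_config){example_vcvt_u2v, 2};
    case EXAMPLE_UARCH_VENDOR_B:
    default:
      return (struct example_vcvt_config){example_vcvt_u1v, 1};
  }
}
```

A caller would run `example_select_vcvt(uarch)` once at init time and then invoke the returned `fn` pointer, so the choice costs nothing per call.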

With 8- or 16-bit datatypes, the intermediates are often widened, which limits the variations to m1, m2, and maybe m4.
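The arithmetic behind that limit can be sketched as follows, assuming the kernel keeps one intermediate at the destination element width and that a register group cannot exceed LMUL=8 (the RVV maximum). The helper names are hypothetical. For an 8-bit to 32-bit conversion the intermediate needs 4x the source LMUL, so only m1 and m2 sources fit, which would be consistent with the u1v and u2v files mentioned in the question.

```c
#include <assert.h>
#include <stdbool.h>

/* LMUL needed for the widened intermediate when the source uses src_lmul.
   Widening from src_bits to dst_bits scales the register group by the
   same ratio as the element width. */
static int example_widened_lmul(int src_lmul, int src_bits, int dst_bits) {
  assert(src_bits > 0 && dst_bits % src_bits == 0);
  return src_lmul * (dst_bits / src_bits);
}

/* A variant is only possible if the widened group stays within LMUL = 8. */
static bool example_variant_fits(int src_lmul, int src_bits, int dst_bits) {
  return example_widened_lmul(src_lmul, src_bits, dst_bits) <= 8;
}
```

For example, `example_variant_fits(2, 8, 32)` is true (m2 source, m8 intermediate), while `example_variant_fits(4, 8, 32)` is false. With a 16-bit source widened to 32 bits, `example_variant_fits(4, 16, 32)` is true, which matches the "maybe m4" case.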
