The purpose of different implementations of qs8/qu8-f32-vcvt kernels. #7644

Open
tingboliao opened this issue Jan 6, 2025 · 1 comment

@tingboliao

Hi, @oliIMG
In PR #7127, I see several versions of the qs8/qu8-f32-vcvt kernels:
qs8-f32-vcvt-rvv-u1v.c, qs8-f32-vcvt-rvv-u2v.c, qu8-f32-vcvt-rvv-u1v.c and qu8-f32-vcvt-rvv-u2v.c.
These are the m1 and m2 RVV implementations of the kernels.

Some other kernels have four RVV implementations (based on scalar): m1, m2, m4 and m8.

I would like to know the purpose of the different implementations, and how users can select the best version.

@fbarchard
Collaborator

In general, a kernel with 'u4v' in its name uses 'm4' for the source.
Kernels such as float binary ops can implement all 4 variations: m1, m2, m4 and m8.
In src/configs/gemm-config.c etc., the fastest of these variations can be enabled.
Which one is fastest depends on the hardware, so once benchmarks have been run on hardware from different vendors, a switch statement on uarch can be added to select different kernels for different hardware.
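
As a rough illustration of the switch-on-uarch idea, here is a minimal sketch in C. Every name in it (`example_uarch`, `example_vcvt_fn`, `example_vcvt_u1v`, `example_vcvt_u2v`, `example_select_vcvt`) is hypothetical and is not taken from XNNPACK; the two kernels are scalar placeholders with a simplified signature, and the vendor-to-kernel mapping is invented, since the real choice would come from benchmarks.

```c
#include <stddef.h>

// Hypothetical microarchitecture IDs; a real build would get these from CPU detection.
enum example_uarch {
  EXAMPLE_UARCH_UNKNOWN,
  EXAMPLE_UARCH_VENDOR_A,
  EXAMPLE_UARCH_VENDOR_B,
};

// Simplified, hypothetical kernel signature (real kernels also take params).
typedef void (*example_vcvt_fn)(size_t batch, const signed char* input, float* output);

// Scalar placeholder standing in for the m1 (u1v) variant.
void example_vcvt_u1v(size_t batch, const signed char* input, float* output) {
  for (size_t i = 0; i < batch; i++) {
    output[i] = (float) input[i];
  }
}

// Scalar placeholder standing in for the m2 (u2v) variant.
void example_vcvt_u2v(size_t batch, const signed char* input, float* output) {
  for (size_t i = 0; i < batch; i++) {
    output[i] = (float) input[i];
  }
}

// Pick a variant per microarchitecture. The mapping below is made up;
// it would be filled in from benchmark results.
example_vcvt_fn example_select_vcvt(enum example_uarch uarch) {
  switch (uarch) {
    case EXAMPLE_UARCH_VENDOR_B:
      return example_vcvt_u2v;
    case EXAMPLE_UARCH_VENDOR_A:
    default:
      return example_vcvt_u1v;
  }
}
```

Defaulting to the smallest variant for unknown hardware is only one possible choice; the point is that the selection lives in one place in the config code.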

With 8- or 16-bit datatypes, the intermediates are often widened, limiting the variations to m1, m2, and maybe m4.
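
The arithmetic behind that limit can be sketched as follows. RVV caps the register group size at LMUL=8, and widening an element from 8 to 32 bits multiplies the register group by 4, so an m2 source already needs m8 for the 32-bit values. The helper names `widened_lmul` and `variant_fits` are made up for this illustration.

```c
#include <stdbool.h>

// Largest register group (LMUL) that RVV allows.
enum { MAX_LMUL = 8 };

// Register group needed for the widened values when the source occupies
// src_lmul registers and each element grows from src_bits to dst_bits.
int widened_lmul(int src_lmul, int src_bits, int dst_bits) {
  return src_lmul * (dst_bits / src_bits);
}

// True if a variant with the given source LMUL can hold its widened
// intermediates in a single legal register group.
bool variant_fits(int src_lmul, int src_bits, int dst_bits) {
  return widened_lmul(src_lmul, src_bits, dst_bits) <= MAX_LMUL;
}
```

For qs8/qu8 to f32, `variant_fits(1, 8, 32)` and `variant_fits(2, 8, 32)` hold (m4 and m8 results), while `variant_fits(4, 8, 32)` does not, which matches the u1v and u2v files in the PR. A 16-bit source leaves room for m4, and a 32-bit source with no widening can go up to m8.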
