llamafile: add rvv support for sgemm kernels#3

Merged: luhenry merged 1 commit into master from 10x-llamafile-sgemm on Dec 18, 2025

@taimur-10x

Collaborator

Summary

This PR adds RISC-V vector (RVV) support for the SGEMM kernels.

These kernels are used by GGML_OP_MUL_MAT for prompt processing.
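For orientation, the operation these kernels accelerate can be written as a plain scalar loop. The sketch below is not the code from this PR: the name `sgemm_ref` and the storage convention (each row of `A` and of `B` is a length-`k` vector, and `C` is stored column by column) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not the PR's kernel.
 * Computes C[j*ldc + i] = sum over l of A[i*lda + l] * B[j*ldb + l]
 * for i in [0, m) and j in [0, n). */
void sgemm_ref(size_t m, size_t n, size_t k,
               const float *A, size_t lda,
               const float *B, size_t ldb,
               float *C, size_t ldc) {
    assert(lda >= k && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t l = 0; l < k; ++l) {
                acc += A[i * lda + l] * B[j * ldb + l];
            }
            C[j * ldc + i] = acc;
        }
    }
}
```

During prompt processing many tokens are handled at once, so `n` is greater than 1 and each loaded row of `A` can be reused across several columns of `B`. That reuse is what a tiled kernel exploits and what a one-dot-product-at-a-time path cannot.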

Key Changes

  • RVV support is added for the F16 and F32 types (with the zvfh extension) and for BF16 (with the zvfbfwma extension).
  • Tile sizes were chosen per LMUL configuration, since LMUL determines how many vector register groups are available:
    • 4x6 with LMUL=1 (32 register groups)
    • 4x3 with LMUL=2 (16 register groups)
    • 2x2 with LMUL=4 (8 register groups)
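The register arithmetic behind these tile sizes can be sketched as follows. RVV provides 32 vector registers, and an integer LMUL groups them into 32 / LMUL register groups. The helper names are hypothetical, and the assumption that every tile element occupies exactly one accumulator group is mine, not something stated in the PR.

```c
#include <assert.h>

enum { RVV_NUM_VREGS = 32 }; /* architectural vector register count in RVV */

/* Register groups available at an integer LMUL of 1, 2, 4, or 8. */
int rvv_register_groups(int lmul) {
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return RVV_NUM_VREGS / lmul;
}

/* Assumption: one accumulator group per element of an rm x rn tile. */
int tile_accumulators(int rm, int rn) {
    return rm * rn;
}

/* Groups left over for loading operands once the accumulators are placed. */
int tile_spare_groups(int rm, int rn, int lmul) {
    int spare = rvv_register_groups(lmul) - tile_accumulators(rm, rn);
    assert(spare >= 0); /* the tile must fit in the register file */
    return spare;
}
```

Under that assumption, 4x6 at LMUL=1 uses 24 of 32 groups, 4x3 at LMUL=2 uses 12 of 16, and 2x2 at LMUL=4 uses 4 of 8, so each configuration leaves room for operand loads.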

Testing

The kernels were functionally tested on QEMU at VLENs of 128, 256, 512, and 1024 bits, across a range of input sizes.
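Testing at several VLENs matters because the number of elements processed per vector instruction changes with the hardware. As a rough sketch (helper names are hypothetical, and only integer LMUL is handled), the maximum vector length in elements is VLMAX = LMUL × VLEN / SEW, and the number of strip-mined passes over a dimension of length `k` follows from it:

```c
#include <assert.h>
#include <stddef.h>

/* VLMAX = LMUL * VLEN / SEW, for integer LMUL only.
 * vlen_bits: hardware vector register width; sew_bits: element width. */
size_t rvv_vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    assert(sew_bits > 0 && vlen_bits % sew_bits == 0);
    return vlen_bits / sew_bits * lmul;
}

/* Number of vector passes needed to cover k elements (ceiling division).
 * The last pass is shorter whenever k is not a multiple of vlmax. */
size_t strip_count(size_t k, size_t vlmax) {
    assert(vlmax > 0);
    return (k + vlmax - 1) / vlmax;
}
```

For example, F16 data (SEW=16) at VLEN=256 and LMUL=4 gives 64 elements per register group, while F32 data at VLEN=128 and LMUL=1 gives only 4, so the same kernel takes different loop-trip and tail paths on each configuration.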

Benchmarking Results

End-to-end benchmarking on a Banana Pi BPI-F3 (VLEN=256)

Prefill / Prompt Processing

Tokens / Second

| Model | Prompt Size (tokens) | SGEMM 4x6 | SGEMM 4x3 | SGEMM 2x2 | Vector Dot |
|---|---|---|---|---|---|
| Tinyllama F16 1.1B | 32 | 6.08 | 7.89 | 6.26 | 8.42 |
| Tinyllama F16 1.1B | 64 | 6.09 | 7.25 | 11.31 | 7.57 |
| Tinyllama F16 1.1B | 128 | 5.93 | 6.9 | 13.73 | 8.78 |
| Tinyllama F16 1.1B | 256 | 5.54 | 6.79 | 12.56 | 8.57 |
| Tinyllama F16 1.1B | 512 | 5.37 | 6.64 | 13.37 | 8.68 |

@taimur-10x taimur-10x marked this pull request as draft December 5, 2025 11:03
@github-actions github-actions Bot added the ggml label Dec 5, 2025
@taimur-10x taimur-10x marked this pull request as ready for review December 8, 2025 14:01
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
luhenry commented Dec 11, 2025

@xctan it would be lovely to have your review on that as it will go to upstream next.

xctan commented Dec 11, 2025

LGTM.

@luhenry luhenry merged commit c7b29f2 into master Dec 18, 2025
63 of 78 checks passed
4 participants