UPSTREAM PR #21071: hexagon: optimize HMX matmul operations by loci-dev · Pull Request #1351 · auroralabs-loci/llama.cpp

loci-dev · 2026-04-15T03:10:57Z

Note

Source pull request: ggml-org/llama.cpp#21071

Overview

This pull request refactors several matrix multiplication and data handling routines in ggml-hexagon/htp/hmx-matmul-ops.c to improve type safety, consistency, and code clarity. The main changes involve standardizing loop counters and size-related variables to use size_t instead of int, updating function signatures accordingly, and simplifying tile indexing logic. Additionally, the initialization of column scales is made more consistent, and some redundant or legacy code paths are removed.

Type safety and consistency improvements:

Changed loop counters and size-related variables from int to size_t across multiple functions (e.g., core_dot_chunk_fp16, core_mma_chunk_fp16, transfer_output_chunk_fp16_to_fp32) and updated related calculations and function signatures for better type safety and to prevent integer overflow issues.
Updated function signatures and local variable declarations to consistently use const size_t for sizes and counts, improving code clarity and reducing potential bugs from type mismatches.
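To make the motivation for the int to size_t change concrete, here is a minimal, portable sketch. It is not code from hmx-matmul-ops.c: TILE_DIM, tile_count, elem_count_fits_in_int, and sum_tiled are hypothetical names invented for illustration, and the tile size of 32 is an assumption. The sketch computes tile counts once up front, uses size_t for every counter and index, and shows how an element count can exceed what an int can hold.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile edge length, chosen only for this illustration. */
#define TILE_DIM 32u

/* Ceiling division: how many tiles are needed to cover n elements. */
static size_t tile_count(size_t n) {
    return (n + TILE_DIM - 1u) / TILE_DIM;
}

/* Returns 1 if rows * cols fits in an int, 0 otherwise.
 * The product is formed in 64 bits, so for dimensions below 2^32
 * the check itself cannot overflow. */
static int elem_count_fits_in_int(size_t rows, size_t cols) {
    assert(rows <= UINT32_MAX && cols <= UINT32_MAX);
    const uint64_t n_elems = (uint64_t) rows * (uint64_t) cols;
    return n_elems <= (uint64_t) INT_MAX;
}

/* Sums a row-major buffer tile by tile. Tile counts are computed once,
 * before the loops, and every counter and index is a size_t, so no
 * signed/unsigned conversions occur in the index arithmetic. */
static float sum_tiled(const float *src, size_t n_rows, size_t n_cols) {
    const size_t n_row_tiles = tile_count(n_rows);
    const size_t n_col_tiles = tile_count(n_cols);
    float acc = 0.0f;

    for (size_t rt = 0; rt < n_row_tiles; ++rt) {
        const size_t r_begin = rt * TILE_DIM;
        const size_t r_end   = r_begin + TILE_DIM < n_rows ? r_begin + TILE_DIM : n_rows;
        for (size_t ct = 0; ct < n_col_tiles; ++ct) {
            const size_t c_begin = ct * TILE_DIM;
            const size_t c_end   = c_begin + TILE_DIM < n_cols ? c_begin + TILE_DIM : n_cols;
            for (size_t r = r_begin; r < r_end; ++r) {
                for (size_t c = c_begin; c < c_end; ++c) {
                    acc += src[r * n_cols + c];
                }
            }
        }
    }
    return acc;
}
```

With int counters, an expression such as r * n_cols + c is signed arithmetic, and overflowing it is undefined behavior in C. With size_t the arithmetic is unsigned, so the usable range is larger and wraparound is at least well defined. Note that size_t is only as wide as the target's pointer size, so very large products can still wrap on 32-bit targets.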

Additional information

Tested with Qwen3.5-2b-q4; works well.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, for commit log and PR descriptions

loci-review · 2026-04-15T04:05:48Z

No meaningful performance changes were detected across 127615 analyzed functions in the following binaries: build.bin.llama-bench, build.bin.libmtmd.so, build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli.

💬 Questions? Tag @loci-dev

chraac added 19 commits March 26, 2026 21:28

optimize hmx_mat_mul functions by calculating row and column tiles upfront

de56c35

refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability

b2b21a3

wip

5e18f4e

set scale outside of loop

a262832

wip

ee95d92

refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts

33d9431

wip

3a97015

wip

6e291d8

refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions

f43d68c

refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation

42bd08c

wip

ee95146

refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization

91d88a3

refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking

55d7258

refactor core_dot_chunk_fp16 to improve tile stride calculations for output

362c62c

Merge branch 'master' into dev-hmx-opt

e31e30a

# Conflicts: # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c

refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization

7c1a5a3

Merge branch 'master' into dev-hmx-opt

3cd8041

Merge branch 'master' into dev-hmx-opt

1a71699

# Conflicts: # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c

fix compiling error

2f37db7
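Several commit titles above mention setting the scale outside the loop and splatting column scales once. The following is a rough, portable sketch of that general idea, assuming the scale value does not change between chunks. It does not use the HVX intrinsics named in the commits (hvx_vec_splat_f16, Q6_V_vsplat_R) and is not taken from the PR: SCALE_LANES, splat_f32, and scale_chunks are hypothetical names, and a plain float array stands in for a vector register.

```c
#include <stddef.h>

/* Hypothetical lane count standing in for the width of a vector register. */
#define SCALE_LANES 64

/* Scalar stand-in for a vector splat: broadcast one value to every lane. */
static void splat_f32(float *dst, size_t n, float value) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}

/* Multiplies n_chunks chunks of SCALE_LANES floats by a uniform scale.
 * The scale vector is loop-invariant, so it is built once before the
 * chunk loop instead of being rebuilt on every iteration. */
static void scale_chunks(float *data, size_t n_chunks, float scale) {
    float col_scales[SCALE_LANES];
    splat_f32(col_scales, SCALE_LANES, scale); /* hoisted out of the loop */

    for (size_t chunk = 0; chunk < n_chunks; ++chunk) {
        float *dst = data + chunk * SCALE_LANES;
        for (size_t i = 0; i < SCALE_LANES; ++i) {
            dst[i] *= col_scales[i];
        }
    }
}
```

Hoisting only pays off when the value really is invariant across iterations. If the scales differed per chunk, the initialization would have to stay inside the loop.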

loci-dev temporarily deployed to PROD__AL_DEMO April 15, 2026 03:11 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 6 times, most recently from 7638ab4 to f1b46d5, on April 20, 2026 02:19

loci-dev wants to merge 19 commits into main from loci/pr-21071-dev-hmx-opt
