
UPSTREAM PR #21071: hexagon: optimize HMX matmul operations #1351

Open

loci-dev wants to merge 19 commits into `main` from `loci/pr-21071-dev-hmx-opt`

Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#21071

Overview

This pull request refactors several matrix multiplication and data handling routines in ggml-hexagon/htp/hmx-matmul-ops.c to improve type safety, consistency, and code clarity. The main changes involve standardizing loop counters and size-related variables to use size_t instead of int, updating function signatures accordingly, and simplifying tile indexing logic. Additionally, the initialization of column scales is made more consistent, and some redundant or legacy code paths are removed.

Type safety and consistency improvements:

  • Changed loop counters and size-related variables from int to size_t across multiple functions (e.g., core_dot_chunk_fp16, core_mma_chunk_fp16, transfer_output_chunk_fp16_to_fp32), and updated related calculations and function signatures for better type safety and to prevent integer overflow.
  • Updated function signatures and local variable declarations to consistently use const size_t for sizes and counts, improving code clarity and reducing potential bugs from type mismatches.

Additional information

Tested with Qwen3.5-2b-q4; works well.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, for commit log and PR descriptions

@loci-review

loci-review Bot commented Apr 15, 2026

No meaningful performance changes were detected across 127615 analyzed functions in the following binaries: build.bin.llama-bench, build.bin.libmtmd.so, build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli.

💬 Questions? Tag @loci-dev

@loci-dev force-pushed the main branch 6 times, most recently from 7638ab4 to f1b46d5 on April 20, 2026 at 02:19.
