PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326

Open
wants to merge 147 commits into
base: master
147 commits
06067d9
ggml-qnn: add Qualcomm QNN backend for GGML
zhouwg Feb 14, 2025
b5fd39d
ggml-qnn: sanity check
zhouwg Feb 15, 2025
0d7c4a8
ggml-qnn: update script build-run-android.sh to compare performance of…
zhouwg Feb 16, 2025
0a451d9
ggml-qnn: fix minor issue in test-backend-ops.cpp
zhouwg Feb 17, 2025
1a24e56
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
zhouwg Feb 18, 2025
db468ae
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 18, 2025
ebd8fa3
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
zhouwg Feb 19, 2025
ea83eba
ggml-qnn: remove redundant codes
zhouwg Feb 20, 2025
fce215d
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
502247c
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
e701869
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 21, 2025
0de8553
ggml-qnn: add Qualcomm QNN backend for GGML
zhouwg Feb 14, 2025
a89e64a
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
zhouwg Feb 18, 2025
c4a780b
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 18, 2025
d05629c
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
zhouwg Feb 19, 2025
e2c8f93
ggml-qnn: remove redundant codes
zhouwg Feb 20, 2025
66d6564
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
da4eae5
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 20, 2025
9d616fe
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
zhouwg Feb 21, 2025
9eea8da
ggml-qnn: fix a minor typo in internal doc
zhouwg Feb 23, 2025
e197999
ggml-qnn: refine function ggml_qnn_create_general_tensor() to avoid c…
zhouwg Feb 23, 2025
f255ac8
ggml-qnn: fix a minor typo in source code
zhouwg Feb 24, 2025
0be90ba
build: avoid ggml-qnn backend breaking other backend's builds
zhouwg Feb 24, 2025
2f43a99
ggml-qnn: remove redundant codes to make PR reviewers happy
zhouwg Feb 25, 2025
70dfefd
ggml-qnn: refine code format
zhouwg Feb 25, 2025
dea57fa
ggml-qnn: offload quantized type mulmat to QNN backend
zhouwg Feb 26, 2025
30d3e8f
ggml-qnn: refine source code structure to make code more clearly
zhouwg Feb 27, 2025
2322321
ggml-qnn: enable release build with necessary logs to make reviewers …
zhouwg Feb 27, 2025
0a5c240
ggml-qnn: enable all quantize type with 2d mulmat
zhouwg Feb 27, 2025
f1dc950
ggml-qnn: enable log output of GGMLQNN_LOG_INFO in command line mode …
zhouwg Feb 28, 2025
713a0b7
ggml-qnn: Windows port --- step2
zhouwg Feb 28, 2025
82e8513
ggml-qnn: merge UT code and corresponding script from local dev branc…
zhouwg Mar 2, 2025
93b8956
ggml-qnn: merge ggml_qnn_mul_mat_4d from local dev branch to make wor…
zhouwg Mar 2, 2025
39c8d1f
ggml-qnn: submit AI-assisted ggml_qnn_mul_mat_4d(not worked currently…
zhouwg Mar 2, 2025
681ae34
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step2
zhouwg Mar 2, 2025
5ce91c7
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step3
zhouwg Mar 2, 2025
e693bd1
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step4
zhouwg Mar 2, 2025
0b868cf
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step5
zhouwg Mar 2, 2025
022fc4a
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step6
zhouwg Mar 2, 2025
8e0867f
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step7
zhouwg Mar 2, 2025
d2bd470
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step8
zhouwg Mar 2, 2025
0a3fb93
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- good in step9
zhouwg Mar 2, 2025
946d104
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
zhouwg Mar 2, 2025
23903f7
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step10
zhouwg Mar 2, 2025
f0d3a15
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
zhouwg Mar 2, 2025
bd6ffca
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step11
zhouwg Mar 2, 2025
6ea2d67
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- both ok in st…
zhouwg Mar 2, 2025
8f5d12e
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 ---finalizing ver…
zhouwg Mar 2, 2025
bf71293
ggml-qnn: refine ggml_qnn_mul_mat and ggml_qnn_general_node according…
zhouwg Mar 2, 2025
8017b10
ggml-qnn: remove no-needed comments
zhouwg Mar 2, 2025
da22d48
ggml-qnn: Windows port --- step3
zhouwg Mar 3, 2025
7d14f99
ggml-qnn: remove un-needed function
zhouwg Mar 4, 2025
5b00e25
ggml-qnn:rebase to upstream
zhouwg Mar 4, 2025
886a405
ggml-qnn: fix a minor issue during rebase to upstream
zhouwg Mar 4, 2025
8bfb49d
ggml-qnn: update script according to https://github.com/ggml-org/llam…
zhouwg Mar 4, 2025
62da247
ggml-qnn: fix a minor issue in ggmlqnn_create_general_tensor()
zhouwg Mar 4, 2025
38fc2fe
ggml-qnn: active member variable _device_id in class qnn_instance
zhouwg Mar 4, 2025
bb296cd
ggml-qnn: refine ggml_qnn_general_node and ggml_qnn_mul_mat to make c…
zhouwg Mar 4, 2025
5780ce6
ggml-qnn: Windows port --- step4
zhouwg Mar 6, 2025
291d473
ggml-qnn: Windows port -- step5
zhouwg Mar 7, 2025
3262771
ggml-qnn: WoA(Windows on ARM) -- step6
zhouwg Mar 8, 2025
3bd422a
ggml-qnn: rebase to upstream
zhouwg Mar 9, 2025
9969040
ggml-qnn: pr to upstream
zhouwg Mar 11, 2025
2afb697
ggml-qnn: rebase to upstream
zhouwg Mar 18, 2025
82ed776
ggml-qnn: self code-review
zhouwg Mar 18, 2025
340fd04
ggml-qnn: rebase upstream
zhouwg Mar 19, 2025
ec481d5
ggml-qnn: add approach through Hexagon cDSP
zhouwg Mar 22, 2025
1f26917
ggml-qnn: refine general approach through Hexagon cDSP
zhouwg Mar 23, 2025
df168b0
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
zhouwg Mar 24, 2025
5a4bdb4
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
zhouwg Mar 24, 2025
b64bec1
ggml-qnn: add build script for libggmlop_skel.so
zhouwg Mar 24, 2025
5d90272
ggml-qnn: remove redundant functions in this PR and make codes more c…
zhouwg Mar 25, 2025
3edd11e
ggml-qnn: original ggml_compute_forward_add and ggml_compute_forward_…
zhouwg Mar 25, 2025
007c621
ggml-qnn: modify build-run-android.sh to verify mulmat and validate m…
zhouwg Mar 25, 2025
6f486d0
ggml-qnn: make host code(ggml-qnn.cpp) more clear and more stable
zhouwg Mar 26, 2025
7614e9a
ggml-qnn: refine code according to self code-review and make code mor…
zhouwg Mar 26, 2025
8aa0ad3
ggml-qnn: offload more ggml op to Hexagon cDSP
zhouwg Mar 27, 2025
2d2c09d
ggml-hexagon: code on AP(arm-cpu) side is stable now
zhouwg Mar 28, 2025
3437cbd
ggml-hexagon: optimize GGML_OP_ADD on cDSP side
zhouwg Mar 28, 2025
b1225bc
ggml-hexagon: simplify hexagon-kernel build logic in CMakeLists.txt
zhouwg Mar 29, 2025
904f74d
ggml-hexagon: release ggml-hexagon v0.98
zhouwg Mar 29, 2025
09b86f1
ggml-hexagon: release ggml-hexagon v0.99
zhouwg Mar 29, 2025
94960b2
ggml-hexagon: try to offload q6_k mulmat to cDSP
zhouwg Mar 29, 2025
dfc4a44
ggml-hexagon: fix minor issue in ggml-hexagon.cpp after self code-re…
zhouwg Mar 29, 2025
787132b
ggml-hexagon: check validation of ggml-hexagon.cfg before create appr…
zhouwg Mar 30, 2025
f749303
ggml-hexagon: fix all compiler warnings in ggml-hexagon.cpp
zhouwg Mar 30, 2025
bf0fcf9
ggml-hexagon: enable only one backend device for HWACCEL_CDSP and ena…
zhouwg Mar 31, 2025
95fa682
ggml-hexagon: rpc ion memory pool and test-backend-ops works fine in …
zhouwg Mar 31, 2025
d93f4f3
ggml-hexagon: make comparison of mulmat performance between HWACCEL_Q…
zhouwg Mar 31, 2025
aa9f754
ggml-hexagon: release ggml-hexagon v1.00
zhouwg Mar 31, 2025
25a182b
ggml-hexagon: rebase to upstream
zhouwg Apr 1, 2025
75582bf
ggml-hexagon: check configuration of enable_rpc_dma_mempool in functi…
zhouwg Apr 1, 2025
259b1f4
ggml-hexagon: uniform rpc_ion_memsize and rpc_ion_usage between HWACC…
zhouwg Apr 1, 2025
7a53251
ggml-hexagon: make buffer mechanism more clear in HWACCEL_CDSP approach
zhouwg Apr 1, 2025
d7536f8
ggml-hexagon: add perf function in hexagon kernels on cDSP side
zhouwg Apr 2, 2025
0060577
ggml-hexagon: fix a stupid issue of why set rpc latency failure and i…
zhouwg Apr 2, 2025
ddf63ae
ggml-hexagon: make helper function ggmlhexagon_get_timestring() threa…
zhouwg Apr 2, 2025
dead9d1
ggml-hexagon: fix a typo in ggml-hexagon.cpp
zhouwg Apr 2, 2025
d3f452e
ggml-hexagon: list all known todo and fixme tasks in ggml-hexagon.cpp
zhouwg Apr 2, 2025
d2c3741
ggml-hexagon: fix units MB -> MiB
zhouwg Apr 2, 2025
7b30ca7
ggml-hexagon: try to make ggml-hexagon backend works fine in a standa…
zhouwg Apr 3, 2025
be43653
ggml-hexagon: remove redundant code and make debug log clearer
zhouwg Apr 3, 2025
8f715f0
ggml-hexagon: add gemma-3-4b-it-Q8_0.gguf to verify q8_0 mulmat on cDSP
zhouwg Apr 3, 2025
6135d35
ggml-hexagon:add skeleton code of offload GGML_OP_SOFT_MAX/GGML_OP_RM…
zhouwg Apr 3, 2025
7c003bb
ggml-hexagon: release ggml-dsp v0.60 on cDSP side
zhouwg Apr 4, 2025
dee3e8f
ggml-hexagon: merge build logic in kernels/Makefile to ggml-hexagon/C…
zhouwg Apr 5, 2025
3e839f6
ggml-hexagon: fix a typo in ggml-hexagon.cpp
zhouwg Apr 5, 2025
62f6362
ggml-hexagon: uniform NDEBUG usage in ggml-hexagon.cpp and ggml-dsp.c
zhouwg Apr 6, 2025
88091c7
ggml-hexagon: add profiler feature for purpose of visualize NPU perfo…
zhouwg Apr 7, 2025
36cd3ea
ggml-hexagon: remove so-called dma memory pool to avoid confusion and…
zhouwg Apr 8, 2025
258ebed
ggml-hexagon: make function ggmlhexagon_init_rpcmempool in ggml-hexag…
zhouwg Apr 8, 2025
798caa7
ggml-hexagon: fix potential resource leak in class hexagon_profiler
zhouwg Apr 8, 2025
fbe6dbc
ggml-hexagon: enable multi-threading feature on cDSP side
zhouwg Apr 8, 2025
3a651b1
ggml-hexagon: upgrade QNN SDK to v2.33.0.250327
zhouwg Apr 9, 2025
4fc155d
ggml-hexagon: fix typo in ggml-hexagon.cpp
zhouwg Apr 9, 2025
d9234c9
ggml-dsp: probe QuRT RTOS information in function ggmlop_dsp_open
zhouwg Apr 9, 2025
0d9f9e8
ggml-hexagon: setting enable_rpc_ion_mempool to 1 and make test-backe…
zhouwg Apr 10, 2025
d769c4c
ggml-hexagon: check whether user's specified htp arch is valid in CMa…
zhouwg Apr 10, 2025
2331c4d
ggml-hexagon: sync with upstream
zhouwg Apr 11, 2025
8a2f81d
ggml-hexagon: refine pinned-memory feature
zhouwg Apr 11, 2025
c2a35f7
ggml-hexagon: refine build system in ggml-hexagon
zhouwg Apr 11, 2025
e04fb01
ggml-hexagon: remove redundant code in struct ggml_backend_hexagon_bu…
zhouwg Apr 11, 2025
0607e4a
ggml-hexagon: upgrade Android NDK to android-ndk-r28
zhouwg Apr 11, 2025
366d416
ggml-dsp: split ggml-dsp.c into multiple files and cleanup
zhouwg Apr 11, 2025
d4c9eda
ggml-dsp: refine ggml-dsp and make ggml-dsp more clear
zhouwg Apr 12, 2025
9983048
ggml-hexagon: fix a minor issue in dev ops
zhouwg Apr 12, 2025
a715712
ggml-hexagon: fix a build issue in CI
zhouwg Apr 12, 2025
5238676
ggml-dsp: cleanup code
zhouwg Apr 15, 2025
d4853c4
ggml-hexagon: sync with upstream
zhouwg Apr 15, 2025
e865197
ggml-dsp: cleanup code
zhouwg Apr 16, 2025
c1d3c70
ggml-dsp:refine ggmlhexagon_dsp_add_f32
zhouwg Apr 16, 2025
d2cacda
ggml-dsp: refine logic of thread_counts
zhouwg Apr 17, 2025
78ef94f
ggml-hexagon: release v1.06 and ready for code review
zhouwg Apr 17, 2025
a6dd75a
ggml-dsp: make GGML_OP_ADD more faster on cDSP side
zhouwg Apr 19, 2025
8d7ac64
ggml-hexagon: sync from project kantv(make ggml-hexagon backend can w…
zhouwg Apr 24, 2025
24382c0
sync with upstream llama.cpp and sync ggml-hexagon.cpp from project k…
zhouwg Apr 29, 2025
4183579
sync with upstream
zhouwg May 7, 2025
e8ecbfe
sync with upstream
zhouwg May 10, 2025
2a6df8e
ggml-hexagon: upgrade QNN SDK to v2.34.0.250424
zhouwg May 11, 2025
79bcc6e
sync with upstream
zhouwg May 16, 2025
8494724
ggml-hexagon: sync from project kantv(fix a long-term issue which int…
zhouwg May 17, 2025
cbb69fa
ggml-hexagon: sync with upstream llama.cpp
zhouwg May 23, 2025
4740d4f
ggml-hexagon: add set_hexagon_cfg(int new_hexagon_backend, int new_hw…
zhouwg Jun 3, 2025
f7d8ea1
ggml-hexagon: sync with branch self-build
zhouwg Jun 19, 2025
0b3136d
ggml-hexagon: sync with branch self-build
zhouwg Jun 23, 2025
fe95d3b
project: sync with upstream(PR-14501:remove kompute backend)
zhouwg Jul 3, 2025
98b0c5e
ggml: fix minor issue during rebase upstream PR-14501: remove kompute…
zhouwg Jul 4, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -146,3 +146,5 @@ poetry.toml
# Local scripts
/run-vim.sh
/run-chat.sh

/prebuilts
15 changes: 15 additions & 0 deletions CMakeLists.txt
@@ -7,6 +7,20 @@ set(CMAKE_WARN_UNUSED_CLI YES)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
    if(DEFINED HTP_ARCH_VERSION)
        if (${HTP_ARCH_VERSION} STREQUAL "v75" OR ${HTP_ARCH_VERSION} STREQUAL "v79")
            # works fine on Snapdragon 8 Gen3 and 8 Elite, giving 1.5x - 3x performance gains with the default ggml backend
            set(OPT_FLAG " -O3 -march=armv8.7-a -mcpu=cortex-x1 -mtune=cortex-x1 -ffp-model=fast -fno-finite-math-only")
            message("OPT_FLAG:${OPT_FLAG}")
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
            set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
            set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
            set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
        endif()
    endif()
endif()

if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
@@ -127,6 +141,7 @@ llama_option_depr(WARNING LLAMA_RPC GGML_RPC)
llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
llama_option_depr(WARNING LLAMA_CANN GGML_CANN)
llama_option_depr(WARNING LLAMA_HEXAGON GGML_HEXAGON)

if (NOT MSVC)
if (LLAMA_SANITIZE_THREAD)
2 changes: 2 additions & 0 deletions ggml/CMakeLists.txt
@@ -206,6 +206,7 @@ option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels"
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING
"gmml: OpenCL API version to target")
option(GGML_HEXAGON "ggml: use HEXAGON" OFF)

# toolchain for vulkan-shaders-gen
set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")
@@ -270,6 +271,7 @@ set(GGML_PUBLIC_HEADERS
include/ggml-rpc.h
include/ggml-sycl.h
include/ggml-vulkan.h
include/ggml-hexagon.h
include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
48 changes: 48 additions & 0 deletions ggml/include/ggml-hexagon.h
@@ -0,0 +1,48 @@
#pragma once

#include "ggml.h"
#include "ggml-backend.h"

#ifdef __cplusplus
extern "C" {
#endif

#define GGML_HEXAGON_MAX_DEVICES 4
#define GGML_HEXAGON_BACKEND_NAME "hexagon"

enum HEXAGONBackend {
    HEXAGON_BACKEND_QNNCPU = 0,
    HEXAGON_BACKEND_QNNGPU = 1,
    HEXAGON_BACKEND_QNNNPU = 2,
    HEXAGON_BACKEND_CDSP   = 3,
    HEXAGON_BACKEND_GGML   = 4, // "fake" HEXAGON backend, used to compare performance between the HEXAGON backend and the ggml backend
};

// 0: general approach through QNN: offload ggml ops to QNN (QNNCPU, QNNGPU, QNNNPU)
// 1: special approach through QNN-SINGLEGRAPH: map the entire ggml cgraph to a single QNN graph
// 2: general approach through Hexagon cDSP: offload ggml ops to the Hexagon cDSP directly
enum hwaccel_approach_type {
    HWACCEL_QNN             = 0,
    HWACCEL_QNN_SINGLEGRAPH = 1,
    HWACCEL_CDSP            = 2,
};

GGML_BACKEND_API ggml_backend_t ggml_backend_hexagon_init(size_t dev_num, const char * qnn_lib_path);

GGML_BACKEND_API bool ggml_backend_is_hexagon(ggml_backend_t backend);

GGML_BACKEND_API int ggml_backend_hexagon_get_device_count(void);

GGML_BACKEND_API ggml_backend_reg_t ggml_backend_hexagon_reg(void);

GGML_BACKEND_API const char * ggml_backend_hexagon_get_devname(size_t dev_num);

GGML_BACKEND_API void ggml_backend_hexagon_set_cfg(int new_hexagon_backend, int new_hwaccel_approach);

GGML_BACKEND_API int ggml_backend_hexagon_get_mulmat_algotype(void);

GGML_BACKEND_API void ggml_backend_hexagon_set_mulmat_algotype(int new_mulmat_algotype);

#ifdef __cplusplus
}
#endif
1 change: 1 addition & 0 deletions ggml/src/CMakeLists.txt
@@ -371,6 +371,7 @@ ggml_add_backend(RPC)
ggml_add_backend(SYCL)
ggml_add_backend(Vulkan)
ggml_add_backend(OpenCL)
ggml_add_backend(HEXAGON)

foreach (target ggml-base ggml)
target_include_directories(${target} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../include> $<INSTALL_INTERFACE:include>)
8 changes: 8 additions & 0 deletions ggml/src/ggml-backend-reg.cpp
@@ -61,6 +61,10 @@
#include "ggml-cann.h"
#endif

#ifdef GGML_USE_HEXAGON
#include "ggml-hexagon.h"
#endif

// disable C++17 deprecation warning for std::codecvt_utf8
#if defined(__clang__)
# pragma clang diagnostic push
@@ -185,6 +189,9 @@ struct ggml_backend_registry {
#ifdef GGML_USE_RPC
register_backend(ggml_backend_rpc_reg());
#endif
#ifdef GGML_USE_HEXAGON
register_backend(ggml_backend_hexagon_reg());
#endif
#ifdef GGML_USE_CPU
register_backend(ggml_backend_cpu_reg());
#endif
@@ -574,6 +581,7 @@ void ggml_backend_load_all_from_path(const char * dir_path) {
ggml_backend_load_best("vulkan", silent, dir_path);
ggml_backend_load_best("opencl", silent, dir_path);
ggml_backend_load_best("musa", silent, dir_path);
ggml_backend_load_best("hexagon", silent, dir_path);
ggml_backend_load_best("cpu", silent, dir_path);
// check the environment variable GGML_BACKEND_PATH to load an out-of-tree backend
const char * backend_path = std::getenv("GGML_BACKEND_PATH");
133 changes: 133 additions & 0 deletions ggml/src/ggml-hexagon/CMakeLists.txt
@@ -0,0 +1,133 @@
project(ggml-hexagon)
message(STATUS "Using HEXAGON backend")
message("CMAKE_SYSTEM_NAME : ${CMAKE_SYSTEM_NAME}")

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

if(NOT DEFINED QNN_SDK_PATH)
message(FATAL_ERROR "QNN_SDK_PATH not defined")
endif()

if(NOT DEFINED HEXAGON_SDK_PATH)
message(FATAL_ERROR "HEXAGON_SDK_PATH not defined")
endif()

message("QNN_SDK_PATH : ${QNN_SDK_PATH}")
message("HEXAGON_SDK_PATH: ${HEXAGON_SDK_PATH}")
message("HTP_ARCH_VERSION: ${HTP_ARCH_VERSION}")

if (CMAKE_BUILD_TYPE STREQUAL "Debug")
    set(DEBUG_FLAG "-DDEBUG -Wall")
    message("Debug mode:${DEBUG_FLAG}")
else()
    set(DEBUG_FLAG "-DNDEBUG -Wall")
    # to make NPU performance comparisons through llama-bench clearer,
    # manually disable all verbose logs by using this line instead:
    #set(DEBUG_FLAG "-DNDEBUG -Wall -DDISABLE_ALL_LOG")
    message("Release mode:${DEBUG_FLAG}")
endif()

#v68 --- Snapdragon 888
#v69 --- Snapdragon 8 Gen1
#v73 --- Snapdragon 8 Gen2
#v75 --- Snapdragon 8 Gen3
#v79 --- Snapdragon 8 Elite
if(NOT DEFINED HTP_ARCH_VERSION)
    message(FATAL_ERROR "HTP_ARCH_VERSION not defined, valid htp arch: v68,v69,v73,v75,v79")
endif()

# check whether the user-specified htp arch is valid
set(CHECK_HTP_ARCH "WRONG")
foreach (feat v68 v69 v73 v75 v79)
    if (${feat} STREQUAL ${HTP_ARCH_VERSION})
        set(CHECK_HTP_ARCH "GOOD")
    endif()
endforeach()
if (${CHECK_HTP_ARCH} STREQUAL "WRONG")
    message(FATAL_ERROR "ggml-hexagon backend only supports htp arch v68,v69,v73,v75,v79")
endif()

# check optimization flags
set(OPT_FLAG " ")
if (${HTP_ARCH_VERSION} STREQUAL "v75" OR ${HTP_ARCH_VERSION} STREQUAL "v79")
    # works fine on Snapdragon 8 Gen3 and 8 Elite, giving 1.5x - 3x performance gains with the default ggml backend
    set(OPT_FLAG " -O3 -march=armv8.7-a -mcpu=cortex-x1 -mtune=cortex-x1 -flto -D_GNU_SOURCE -fvectorize -ffp-model=fast -fno-finite-math-only")
endif()
message("OPT_FLAG:${OPT_FLAG}")

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
    find_library(LOG_LIB log)

    add_library(cdsprpc
        SHARED
        IMPORTED)
    set_target_properties(cdsprpc
        PROPERTIES
        IMPORTED_LOCATION
        ${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_aarch64/libcdsprpc.so)

    set(QNN_LINK_LIBRARIES ${LOG_LIB} cdsprpc)
    set(QNN_DEFAULT_LIB_SEARCH_PATH "/data/local/tmp/" CACHE STRING "customized library search path for QNN backend")

    include_directories(${HEXAGON_SDK_PATH}/incs)
    include_directories(${HEXAGON_SDK_PATH}/incs/stddef)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/incs)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rpcmem/inc)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_Debug_aarch64)
    include_directories(${HEXAGON_SDK_PATH}/utils/examples)
    include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rtld/ship/android_aarch64)
    include_directories(${HEXAGON_SDK_PATH}/libs/atomic/inc)
    include_directories(${HEXAGON_SDK_PATH}/libs/atomic/android_Debug_aarch64/ship)
    include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-hexagon/)
    include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-hexagon/kernels/)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Windows")
    set(QNN_DEFAULT_LIB_SEARCH_PATH "C:\\" CACHE STRING "customized library search path for QNN backend")
else()
    message(FATAL_ERROR "ggml-hexagon is currently only available on Android and Windows (Windows on ARM)")
endif()

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")

file(GLOB HEXAGON_SOURCES "${CMAKE_CURRENT_LIST_DIR}/*.cpp" "${CMAKE_CURRENT_LIST_DIR}/kernels/stub.c")
ggml_add_backend_library(ggml-hexagon ${HEXAGON_SOURCES})

target_include_directories(ggml-hexagon PRIVATE ${QNN_SDK_PATH}/include/QNN ${HEXAGON_SDK_PATH} ${CMAKE_CURRENT_LIST_DIR})
target_link_libraries(ggml-hexagon PRIVATE ${QNN_LINK_LIBRARIES})

string(REGEX REPLACE "/$" "" QNN_DEFAULT_LIB_SEARCH_PATH "${QNN_DEFAULT_LIB_SEARCH_PATH}")
target_compile_definitions(ggml-hexagon PRIVATE QNN_DEFAULT_LIB_SEARCH_PATH="${QNN_DEFAULT_LIB_SEARCH_PATH}/")

# cross-compile the source code of the hexagon kernels, which run on the cDSP side
function(ggml_hexagon_build_kernel KNAME)
    message(STATUS "ggml_hexagon: build hexagon-kernel ${KNAME}")

    add_custom_command(
        TARGET ${PROJECT_NAME}
        POST_BUILD
        COMMAND echo "current working path:`pwd`\n"
        COMMAND echo "${CMAKE_CURRENT_LIST_DIR}/kernels"
        COMMAND make -C ${CMAKE_CURRENT_LIST_DIR}/kernels/ clean
        COMMAND make -C ${CMAKE_CURRENT_LIST_DIR}/kernels/ HEXAGON_SDK_PATH=${HEXAGON_SDK_PATH} HTP_ARCH_VERSION=${HTP_ARCH_VERSION} DEBUG_FLAG=${DEBUG_FLAG}
        COMMAND echo "current working path:`pwd`\n"
        COMMAND ls -l ../../../bin/libggmldsp-skel.so
        COMMENT "build hexagon-kernel"
    )
endfunction()

function(ggml_hexagon_setup_cfg KNAME)
    message(STATUS "ggml_hexagon: setup runtime configuration file ${KNAME}")
    add_custom_command(
        TARGET ${PROJECT_NAME}
        POST_BUILD
        COMMAND echo "current working path:`pwd`\n"
        COMMAND /bin/cp -fv ../../../../../scripts/${KNAME} ../../../bin/
        COMMENT "setup runtime configuration file"
    )
endfunction()

ggml_hexagon_build_kernel("cdsp")
ggml_hexagon_setup_cfg("ggml-hexagon.cfg")