61 commits
f65bc34
hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306)
mengshengwu Apr 24, 2026
13d36cf
ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix…
ArberSephirotheca Apr 24, 2026
a702f39
CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303)
shreyajn Apr 24, 2026
361fe72
Hexagon: Bump HMX Frequency to Max Corner (#22334)
trivikram-reddy1 Apr 24, 2026
0adede8
parser: fix structured output bug (#22302)
pwilkin Apr 24, 2026
dd2914d
ggml-webgpu: support for SSM_SCAN and disable set_rows error checking…
reeselevine Apr 25, 2026
eddd7a1
[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291)
arthw Apr 25, 2026
8ea8fee
gitignore : add .pi + personal SYSTEM.md (#22316)
ggerganov Apr 25, 2026
9d34231
llama-quant : default ftype param `Q5_1` --> `Q8_0` (#20828)
ddh0 Apr 25, 2026
d164904
metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962)
Developer-Ecosystem-Engineering Apr 25, 2026
9725a31
CUDA: reduce MMQ stream-k overhead (#22298)
JohannesGaessler Apr 25, 2026
98dc141
spec : fix vocab compat checks (#22358)
ggerganov Apr 25, 2026
dcad77c
chat: fix handling of space in reasoning markers (#22353)
pwilkin Apr 25, 2026
b760272
hexagon: guard HMX clock request for v75+ platforms (#22377)
trivikram-reddy1 Apr 26, 2026
f454bd7
opencl: add iq4_nl support (#22272)
lhez Apr 26, 2026
2dd8416
ggml-cpu: optimize avx2 q6_k (#22345)
netrunnereve Apr 26, 2026
0c6ee1c
ggml-cpu : re-enable fast gelu_quick_f16 (#22339)
CISC Apr 26, 2026
b1a5bd4
CUDA: better coalesce data-access for contiguous concat (#22330)
ORippler Apr 26, 2026
7ec36aa
Github: set meta backend code owner (#22388)
JohannesGaessler Apr 26, 2026
78433f6
Fix recurrent state serialization for partial reads and writes (#22362)
gaugarg-nv Apr 26, 2026
06a811d
add performance-portable tuning for register-tile and subgroup matmul…
SharmaRithik Apr 26, 2026
f535774
pr2wt : symlink .pi (#22386)
ggerganov Apr 26, 2026
5594d13
common: fix missing exports in llama-common (#22340)
max-krasnyansky Apr 27, 2026
f84270e
ggml : use 64 bytes aligned tile buffers (#21058)
angt Apr 27, 2026
d13540b
convert : remove input_scale for dequantized fp8 modelopt (#22356)
CISC Apr 27, 2026
0f1bb60
model : remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) (…
ynankani Apr 27, 2026
e940b3d
download : prefer q8_0 when q4_k not available (#22428)
ggerganov Apr 27, 2026
42401c7
Fix type casting for unaccounted memory calculation (#22424)
rankaiyx Apr 27, 2026
ceaf47c
fix: rpc-server cache may not work in Windows environments (#22394)
unraido Apr 27, 2026
4414c04
Additional test for common/gemma4 : handle parsing edge cases (#22420)
hextriclosan Apr 27, 2026
665abc6
add fast mat-vec kernels for i-quants (#22344)
SharmaRithik Apr 27, 2026
983ca89
server: (router) Forward form-data to model server (Fixes #22044) (#2…
tha80 Apr 27, 2026
434b2a1
ggml-webgpu: add Q1_0 support (#22374)
SharmaRithik Apr 27, 2026
516e8d7
server: use pos_next instead of n_tokens for m-rope (#22439)
am17an Apr 28, 2026
14e733e
spec : refactor params (#22397)
ggerganov Apr 28, 2026
c3e08f4
CANN: add new ops, optimize existing ops (#21204)
hipudding Apr 28, 2026
d530d6e
ggml : revert to -lm linking instead of find_library (#22355)
angt Apr 28, 2026
50494a2
ggml : skip already registered backends and devices (#22296)
angt Apr 28, 2026
698d19b
ggml: improve SPIR-V headers detection with __has_include (#21918)
EmilAskerov Apr 28, 2026
1982117
vulkan: add barrier after writetimestamp (#21865)
jeffbolznv Apr 28, 2026
f42e29f
webui: Server tools (#21237)
allozaur Apr 28, 2026
98bb579
ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing l…
reeselevine Apr 28, 2026
f9f3365
vulkan: Coalesce Q4_K/Q5_K scale loads (#21751)
TheBlueMatt Apr 28, 2026
52e5f0a
common : re-arm reasoning budget after DONE on new <think> (#22323)
BruceJillis Apr 28, 2026
5d56eff
convert : add support for Nemotron Nano 3 Omni (#22481)
danbev Apr 28, 2026
7b8443a
ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (…
lnigam Apr 28, 2026
fc2b005
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196)
michaelw9999 Apr 28, 2026
739393b
TP: fix delayed AllReduce + zero-sized slices (#22489)
JohannesGaessler Apr 29, 2026
bdc9c74
ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (#21916)
hrushitfujitsu Apr 29, 2026
7b95ea5
common: Intentionally leak logger instance to fix hanging on Windows …
rillomas Apr 29, 2026
d6a5094
ggml-webgpu: Fix bug in FlashAttention support check (#22492)
reeselevine Apr 29, 2026
b5c4227
ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317)
qiurui144 Apr 29, 2026
3142f1d
ggml-cuda: refactor fusion code (#22468)
am17an Apr 29, 2026
1cbc846
ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault …
shalinib-ibm Apr 29, 2026
59237bf
webui: fix slow mic stop and WAV encode (#22480)
ServeurpersoCom Apr 29, 2026
4b221b7
ggml : bump version to 0.10.1 (ggml/1469)
ggerganov Apr 29, 2026
b1d5f5b
sync : ggml
ggerganov Apr 29, 2026
f7135f3
ggml-cpu: add rvv 512b,1024b impls for iq4_xs
taimur-10x Feb 13, 2026
ea500fe
ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants
taimur-10x Feb 14, 2026
489de49
added 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, …
RehanQasim-dev Feb 24, 2026
5cce4e3
ggml-cpu: refactor; improve iq2_xs impl for rvv 256
RehanQasim-dev Feb 24, 2026
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -6,7 +6,7 @@

<!-- You can provide more details and link related discussions here. Delete this section if not applicable -->

# Requirements
## Requirements

<!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->

33 changes: 18 additions & 15 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -49,28 +49,19 @@ jobs:
cp docs/backend/snapdragon/CMakeUserPresets.json .
cmake --preset arm64-android-snapdragon-release -B build
cmake --build build
cmake --install build --prefix pkg-adb/llama.cpp
cmake --install build --prefix pkg-snapdragon/llama.cpp

- name: Upload Llama.CPP Snapdragon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_snapdragon_android.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-adb/llama.cpp

check-secret:
runs-on: ubuntu-latest
outputs:
has-key: ${{ steps.check.outputs.has-key }}
steps:
- id: check
run: echo "has-key=${{ secrets.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"
path: pkg-snapdragon/llama.cpp

test-snapdragon-qdc:
name: Test on QDC Android Device (${{ matrix.device }})
needs: [android-ndk-snapdragon, check-secret]
if: needs.check-secret.outputs.has-key == 'true'
runs-on: ubuntu-latest
needs: [android-ndk-snapdragon]
runs-on: ubuntu-slim
strategy:
fail-fast: false
matrix:
@@ -81,24 +72,36 @@ jobs:
uses: actions/checkout@v6

- name: Download build artifact
uses: actions/download-artifact@v4
uses: actions/download-artifact@v7
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/
path: pkg-snapdragon/llama.cpp

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.x'
cache: pip

- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y curl unzip

- name: Install QDC SDK wheel
run: |
curl -fSL -o qdc_sdk.zip https://softwarecenter.qualcomm.com/api/download/software/tools/Qualcomm_Device_Cloud_SDK/All/0.2.3/qualcomm_device_cloud_sdk-0.2.3.zip
unzip qdc_sdk.zip -d qdc_sdk
pip install qdc_sdk/qualcomm_device_cloud_sdk-0.2.3-py3-none-any.whl

- name: Check QDC API key
id: check_secret
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
run: echo "has-qdc-key=${{ env.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"

- name: Run QDC tests (${{ matrix.device }})
if: steps.check_secret.outputs.has-qdc-key == 'true'
run: |
python scripts/snapdragon/qdc/run_qdc_jobs.py \
--test all \
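The workflow change appears to replace the separate `check-secret` job with a `check_secret` step inside `test-snapdragon-qdc`: the step appends `has-qdc-key=true` or `has-qdc-key=false` to `$GITHUB_OUTPUT`, and the `Run QDC tests` step is skipped unless that output equals `'true'`. The Python below is a simplified, hypothetical model of that mechanism, not code from the repository; `write_output`, `read_outputs`, and `qdc_tests_would_run` are illustrative names, and only single-line `name=value` entries are handled (the runner also accepts a multiline delimiter syntax that this sketch ignores).

```python
import os
import tempfile


def write_output(output_file: str, name: str, value: str) -> None:
    """Append one output, like `echo "name=value" >> "$GITHUB_OUTPUT"`."""
    with open(output_file, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def read_outputs(output_file: str) -> dict:
    """Parse single-line `name=value` entries (multiline syntax not handled)."""
    outputs = {}
    with open(output_file, encoding="utf-8") as f:
        for line in f:
            name, sep, value = line.rstrip("\n").partition("=")
            if sep:
                outputs[name] = value
    return outputs


def qdc_tests_would_run(api_key: str) -> bool:
    """Model the `check_secret` step followed by the gated `Run QDC tests` step."""
    with tempfile.TemporaryDirectory() as tmp:
        output_file = os.path.join(tmp, "github_output")
        # `${{ env.QDC_API_KEY != '' }}` is rendered into the command as "true" or "false".
        write_output(output_file, "has-qdc-key", "true" if api_key != "" else "false")
        # Mirrors `if: steps.check_secret.outputs.has-qdc-key == 'true'`.
        return read_outputs(output_file).get("has-qdc-key") == "true"
```

With the check moved to step level, the matrix job itself seems to run either way and only the QDC test step is skipped when the key is empty, for example on pull requests from forks, where GitHub does not pass repository secrets to the workflow.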
12 changes: 11 additions & 1 deletion .gitignore
@@ -34,7 +34,6 @@
/.vscode/
/nppBackup


# Coverage

/gcovr-report/
@@ -74,6 +73,7 @@
!/models/templates

# Zig

/zig-out/
/zig-cache/

@@ -93,6 +93,7 @@
!/examples/sycl/*.sh

# Server Web UI temporary files

/tools/server/webui/node_modules
/tools/server/webui/dist
# we no longer use gz for index.html
@@ -106,9 +107,11 @@ __pycache__/
poetry.toml

# Nix

/result

# Test binaries

/tests/test-backend-ops
/tests/test-double-float
/tests/test-grad0
@@ -124,6 +127,7 @@ poetry.toml
/tests/test-tokenizer-1-spm

# Scripts

!/scripts/install-oneapi.bat

# Generated by scripts
@@ -132,18 +136,24 @@ poetry.toml
/wikitext-2-raw/

# Test models for lora adapters

/lora-tests

# Local scripts

/run-vim.sh
/run-chat.sh
/run-spec.sh
/.ccache/

# IDE

/*.code-workspace
/.windsurf/
# emscripten
a.out.*

# AGENTS

AGENTS.local.md
.pi/SYSTEM.md
33 changes: 33 additions & 0 deletions .pi/gg/SYSTEM.md
@@ -0,0 +1,33 @@
You are a coding agent. Here are some very important rules that you must follow:

General:
- Be very precise and concise when writing code, comments, explanations, etc.
- PR and commit titles format: `<module> : <title>`. Look up recent PRs and commits for examples
- Don't try to build or run the code unless you are explicitly asked to do so

Coding:
- When in doubt, always refer to the CONTRIBUTING.md file of the project
- When referencing issues or PRs in comments, use the format:
  - C/C++ code: `// ref: <url>`
  - Other (CMake, etc.): `# ref: <url>`

Pull requests (PRs):
- New branch names are prefixed with "gg/"
- Before opening a pull request, ask the user to confirm the description
- When creating a pull request, look for the repository's PR template and follow it
- For the AI usage disclosure section, write "YES. llama.cpp + pi"
- Always create the pull requests in draft mode

Commits:
- On every commit that you make, include an "Assisted-by: llama.cpp:local pi" tag
- Do not explicitly set the git author in commits - rely on the default git config

Resources (read on demand):
- [CONTRIBUTING.md](CONTRIBUTING.md)
- [Build documentation](docs/build.md)
- [Server usage documentation](tools/server/README.md)
- [Server development documentation](tools/server/README-dev.md)
- [PEG parser](docs/development/parsing.md)
- [Auto parser](docs/autoparser.md)
- [Jinja engine](common/jinja/README.md)
- [PR template](.github/pull_request_template.md)
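The title-format and commit-tag rules in this `SYSTEM.md` can be checked mechanically. A minimal sketch, assuming a simplified module grammar (one or more comma-separated tokens of letters, digits, `_`, `.`, `/`, `-`, then ` : ` and a non-empty title); `is_valid_title`, `add_trailer`, and `TITLE_RE` are hypothetical helpers, not part of llama.cpp.

```python
import re

# Hypothetical simplified grammar: comma-separated module tokens, " : ", then a title.
TITLE_RE = re.compile(r"[A-Za-z0-9_./-]+(?:, ?[A-Za-z0-9_./-]+)* : \S.*")

TRAILER = "Assisted-by: llama.cpp:local pi"


def is_valid_title(title: str) -> bool:
    """Return True if the title follows the `<module> : <title>` format."""
    return TITLE_RE.fullmatch(title) is not None


def add_trailer(message: str, trailer: str = TRAILER) -> str:
    """Append the tag as its own final paragraph unless it is already present."""
    body = message.rstrip("\n")
    if trailer in body.splitlines():
        return body + "\n"
    return f"{body}\n\n{trailer}\n"
```

With this check, `spec : refactor params (#22397)` passes, while `hexagon: guard HMX clock request for v75+ platforms (#22377)` from the commit list above does not, since it omits the space before the colon. The sketch does not merge the tag into an existing trailer block such as `Signed-off-by:`; `git interpret-trailers` handles that case properly.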
9 changes: 5 additions & 4 deletions CODEOWNERS
@@ -53,28 +53,29 @@
/examples/speculative/ @ggerganov
/ggml/cmake/ @ggerganov
/ggml/include/ @ggerganov
/ggml/src/ggml-backend-meta.cpp @JohannesGaessler
/ggml/src/ggml-cann/ @ggml-org/ggml-cann
/ggml/src/ggml-common.h @ggerganov
/ggml/src/ggml-cpu/ @ggerganov
/ggml/src/ggml-cpu/spacemit/ @alex-spacemit
/ggml/src/ggml-cuda/ @ggml-org/ggml-cuda
/ggml/src/ggml-cuda/vendors/hip.h @IMbackK
/ggml/src/ggml-cuda/fattn-wmma* @IMbackK
/ggml/src/ggml-hexagon/ @ggml-org/ggml-hexagon
/ggml/src/ggml-hip/ @IMbackK
/ggml/src/ggml-cuda/vendors/hip.h @IMbackK
/ggml/src/ggml-impl.h @ggerganov
/ggml/src/ggml-metal/ @ggml-org/ggml-metal
/ggml/src/ggml-opencl/ @ggml-org/ggml-opencl
/ggml/src/ggml-hexagon/ @ggml-org/ggml-hexagon
/ggml/src/ggml-openvino/ @cavusmustafa @wine99
/ggml/src/ggml-opt.cpp @JohannesGaessler
/ggml/src/ggml-quants.* @ggerganov
/ggml/src/ggml-rpc/ @ggml-org/ggml-rpc
/ggml/src/ggml-sycl/ @ggml-org/ggml-sycl
/ggml/src/ggml-threading.* @ggerganov
/ggml/src/ggml-vulkan/ @ggml-org/ggml-vulkan
/ggml/src/ggml-virtgpu/ @kpouget
/ggml/src/ggml-vulkan/ @ggml-org/ggml-vulkan
/ggml/src/ggml-webgpu/ @ggml-org/ggml-webgpu
/ggml/src/ggml-zdnn/ @ggml-org/ggml-zdnn @Andreas-Krebbel @AlekseiNikiforovIBM
/ggml/src/ggml-openvino/ @cavusmustafa @wine99
/ggml/src/ggml.c @ggerganov
/ggml/src/ggml.cpp @ggerganov
/ggml/src/gguf.cpp @JohannesGaessler @Green-Sky
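GitHub's CODEOWNERS documentation states that the last matching pattern takes the most precedence, so an override such as `/ggml/src/ggml-cuda/vendors/hip.h @IMbackK` only takes effect if it appears after the broader `/ggml/src/ggml-cuda/` rule; both places where the `hip.h` line shows up in this hunk satisfy that. The Python below is a simplified, hypothetical resolver that illustrates last-match-wins; it handles only the root-anchored patterns used in this file, and unlike GitHub, `fnmatch`'s `*` also matches across `/`.

```python
from fnmatch import fnmatchcase


def parse_codeowners(text: str) -> list:
    """Return (pattern, owners) pairs in file order, skipping blanks and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def _matches(pattern: str, path: str) -> bool:
    # Simplified: patterns are treated as anchored at the repository root.
    pat = pattern.lstrip("/")
    if pat.endswith("/"):
        # Directory pattern: matches every path beneath the directory.
        return path.startswith(pat)
    # File or glob pattern matched against the whole repository-relative path.
    return fnmatchcase(path, pat)


def owners_for(rules: list, path: str) -> list:
    """Return the owners of the last rule that matches `path`."""
    owners = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners  # later matches override earlier ones
    return owners
```

Moving the `hip.h` line above `/ggml/src/ggml-cuda/` would make the directory rule win for that file, which is the ordering mistake this precedence rule guards against.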