Fix visual encoders with no CLS #11982
Merged
+5 −1
This PR fixes the bug outlined in this issue: #10157
It is also discussed in projects that leverage llama.cpp, like Ollama: ollama/ollama#7441 and ollama/ollama-python#433
Summary
In `clip.cpp`, we initialize a `patches` vector, which is then used to index into the embedding with a `get rows` op (here). This can trigger the out-of-bounds assertion when run with the CPU backend if the visual encoder has no CLS embedding, e.g., `siglip`. I.e., with `729` patches and no CLS, the vector is `[1, 2, ..., 729]` instead of the correct `[0, 1, ..., 728]`.
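For illustration, here is a minimal standalone sketch of the indexing logic involved (not the actual `clip.cpp` code; `build_patch_indices` and `has_class_embedding` are made-up names): the offset applied to the patch row indices should depend on whether a CLS embedding occupies row 0.

```cpp
#include <cstdio>
#include <vector>

// Build the row indices used to gather patch embeddings from the encoder
// output. With a CLS embedding, row 0 holds the class token and the patches
// start at row 1; without CLS, the patches already start at row 0.
static std::vector<int> build_patch_indices(int num_patches, bool has_class_embedding) {
    const int offset = has_class_embedding ? 1 : 0; // unconditionally using 1 is the bug
    std::vector<int> patches(num_patches);
    for (int i = 0; i < num_patches; i++) {
        patches[i] = i + offset;
    }
    return patches;
}

int main() {
    const int num_patches = 729; // e.g. a 27x27 patch grid

    // No CLS: indices must be [0 .. 728]; [1 .. 729] indexes one row past the
    // end and trips the CPU backend's bounds assertion in the get rows op.
    std::vector<int> no_cls = build_patch_indices(num_patches, /*has_class_embedding=*/false);
    printf("no CLS  : first=%d last=%d\n", no_cls.front(), no_cls.back());

    // With CLS: row 0 is the class token, so [1 .. 729] is correct.
    std::vector<int> with_cls = build_patch_indices(num_patches, /*has_class_embedding=*/true);
    printf("with CLS: first=%d last=%d\n", with_cls.front(), with_cls.back());
    return 0;
}
```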
Steps to Verify
1. Build the llava CLI with `cmake --build build --config Release --target llama-llava-cli`
2. Try running the model; a sketched example invocation is below.
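Assuming a converted model, projector, and test image on disk (the paths below are placeholders, and exact flags may vary by build), an invocation along these lines exercises the vision path:

```sh
./build/bin/llama-llava-cli \
    -m /path/to/model.gguf \
    --mmproj /path/to/mmproj.gguf \
    --image /path/to/image.png \
    -p "Describe this image."
```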
On `main`, it blows up because of the patch `729`; on this branch, things are happy.
@ngxson @ggerganov @gabe-l-hart PTAL when you can - this change is also needed to run Granite vision models correctly (they are being added in this PR), but I'm decoupling the bug fix from the new model support 🙂