Why is my embedding vector result so large? #13968

rsoika · 2025-06-02T11:57:31Z

rsoika
Jun 2, 2025

Hi,

I have a question about embeddings.
I am running the latest llama.cpp in a Kubernetes cluster with CUBA support. I am using the model Mistral-Nemo-Instruct-2407-Q6_K.gguf
All works fine with '/completion' requests.

But if I want t use the the llama.cpp web server to compute embeddings the result is confusing me.
For example using the following curl command to compute a embedding:

$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'

The result is very fast (below 1sec), but the server returns a json object with 5120 Floats!? And I guess this is wrong?

The result looks like this one:

[{"index":0,"embedding":[[3.4135327339172363,-1.7873748540878296,....,0.41544172167778015]]}]

Can someone explain me what I am doing wrong here?

Thanks for any help

===
Ralph

Answered by TafadzwaD

Jun 25, 2025

Hi,

I have a question about embeddings. I am running the latest llama.cpp in a Kubernetes cluster with CUBA support. I am using the model Mistral-Nemo-Instruct-2407-Q6_K.gguf All works fine with '/completion' requests.

But if I want t use the the llama.cpp web server to compute embeddings the result is confusing me. For example using the following curl command to compute a embedding:

$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'

The result is very fast (below 1sec), but the server returns a json object with 5120 Floats!? And I guess this is wrong?

The result looks like this one:

[{"index":0,"embedding":[[3.4135327339172363,-1.787…

View full answer

TafadzwaD · 2025-06-25T16:36:12Z

TafadzwaD
Jun 25, 2025

Hi,

I have a question about embeddings. I am running the latest llama.cpp in a Kubernetes cluster with CUBA support. I am using the model Mistral-Nemo-Instruct-2407-Q6_K.gguf All works fine with '/completion' requests.

But if I want t use the the llama.cpp web server to compute embeddings the result is confusing me. For example using the following curl command to compute a embedding:

$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'

The result is very fast (below 1sec), but the server returns a json object with 5120 Floats!? And I guess this is wrong?

The result looks like this one:

[{"index":0,"embedding":[[3.4135327339172363,-1.7873748540878296,....,0.41544172167778015]]}]

Can someone explain me what I am doing wrong here?

Thanks for any help

=== Ralph

@rsoika I tried embedding the word Paris using qwen3-embedding and got 1024 floats, which matches the embedding_length = 1024 shown in the logs:

You're using Mistral-Nemo-Instruct-2407, which has an internal embedding size (n_embd) of 5120. So it's expected that the /embedding endpoint returns a 5120-dimensional vector, even for a single-token input like Paris. You can confirm this by checking your logs for n_embd or embedding_length.

1 reply

rsoika Jun 25, 2025
Author

Yes you are right! It was a misunderstanding from my side.
Now I am using the model nomic-embed-text-v2-moe.Q4_K_M.gguf . This model has a dimension of 768. I learned that this is the size you need to define also the size of your vector column (in my case a Cassandra DB):

    public static final int DIMENSIONS = 768;
   .....
        String query = "CREATE TABLE IF NOT EXISTS document_vectors (\n" + //
                "  id text,\n" + //
                "  chunk_id uuid,\n" + //
                "  content_chunk text,\n" + //
                "  content_vector VECTOR <FLOAT, " + DIMENSIONS + ">,\n" + //
                "  PRIMARY KEY (id, chunk_id)\n" + //
                ");";
        session.execute(query);

All is now working fine. So even if you just compute the embedding for the word 'paris' you ill receive a float vector with 768 size. So it is important to find the best chunk size of text to store in the RAG db.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Why is my embedding vector result so large? #13968

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Why is my embedding vector result so large? #13968

Uh oh!

rsoika Jun 2, 2025

Replies: 1 comment · 1 reply

Uh oh!

TafadzwaD Jun 25, 2025

Uh oh!

rsoika Jun 25, 2025 Author

rsoika
Jun 2, 2025

Replies: 1 comment 1 reply

TafadzwaD
Jun 25, 2025

rsoika Jun 25, 2025
Author