
Why is my embedding vector result so large? #13968

Answered by TafadzwaD
rsoika asked this question in Q&A

Hi,

I have a question about embeddings. I am running the latest llama.cpp in a Kubernetes cluster with CUDA support, using the model Mistral-Nemo-Instruct-2407-Q6_K.gguf. Everything works fine with '/completion' requests.

But when I use the llama.cpp web server to compute embeddings, the result confuses me. For example, I use the following curl command to compute an embedding:

$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'

The response is very fast (below 1 sec), but the server returns a JSON object with 5120 floats!? I guess this is wrong?

The result looks like this one:

[{"index":0,"embedding":[[3.4135327339172363,-1.787…
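For context, an embedding vector's length normally matches the model's hidden dimension, so 5120 floats is plausible for this model rather than an error. Note also the nested list in the response: one inner vector per input string. A minimal sketch of parsing that shape (the `response` literal below is a placeholder with dummy values, not real server output):

```python
import json

# Placeholder standing in for the body returned by the /embedding endpoint;
# the real values come from the server, these are dummies.
response = json.dumps([{"index": 0, "embedding": [[0.0] * 5120]}])

data = json.loads(response)
# data[0]["embedding"] is a list of vectors (one per input string),
# so index [0] again to get the vector for "Paris".
vec = data[0]["embedding"][0]
print(len(vec))  # dimensionality of the embedding vector
```

If the length printed here equals the model's hidden size, the output is the expected raw embedding, just a large one.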

Replies: 1 comment 1 reply

Answer selected by rsoika