Replies: 1 comment 1 reply
-
@rsoika I tried embedding the word … You're using Mistral-Nemo-Instruct-2407, which has an internal embedding size (
-
Hi,
I have a question about embeddings.
I am running the latest llama.cpp in a Kubernetes cluster with CUDA support. I am using the model
Mistral-Nemo-Instruct-2407-Q6_K.gguf
All works fine with '/completion' requests.
But when I use the llama.cpp web server to compute embeddings, the result confuses me.
For example, I use the following curl command to compute an embedding:
$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'
The result is very fast (below 1 sec), but the server returns a JSON object with 5120 floats!? I guess this is wrong?
The result looks like this one:
[{"index":0,"embedding":[[3.4135327339172363,-1.7873748540878296,....,0.41544172167778015]]}]
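To show the nesting I mean, here is a minimal sketch that unpacks a response of this shape. The sample string is a hypothetical, shortened stand-in for the real server reply (which has 5120 floats in the vector); the field names match the JSON above:

```python
import json

# Hypothetical shortened response mirroring the shape shown above
# (the real vector from the server has 5120 floats, not 3).
raw = '[{"index": 0, "embedding": [[3.4135, -1.7874, 0.4154]]}]'

data = json.loads(raw)
# Note the double nesting: "embedding" holds a *list of vectors*,
# one vector per input string in the request.
vec = data[0]["embedding"][0]
print(len(vec))  # dimensionality of the returned embedding
```

So each entry's `embedding` field is a list of lists, and the inner list length is the model's hidden size.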
Can someone explain what I am doing wrong here?
Thanks for any help
===
Ralph