[Question] Using HNSW index requires using Euclidean Distance Operator? #152
Comments
The pgvector extension lets you use any of its distance operators. For optimal performance, though, you should create an HNSW index for each operator you expect to use. This repo uses cosine distance since it's designed for compatibility with multiple embedding models, and cosine is the most accurate while also being flexible (it also works for non-unit vectors, unlike inner product).
Actually, I have a bug in this repo: I defined the indexes using inner product. I'll fix that now!
Got it, each index has its own operator which you define while creating the index. Thanks!
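To make the operator-to-index pairing concrete, here is a minimal sketch that builds one `CREATE INDEX ... USING hnsw` statement per distance operator. The operator class names (`vector_l2_ops`, `vector_ip_ops`, `vector_cosine_ops`) are the ones pgvector documents; the table name `items`, the column name `embedding`, the index naming scheme, and the helper `hnsw_index_sql` are assumptions made up for illustration, not code from this repo.

```python
# Mapping from pgvector distance operator to the operator class an HNSW
# index must be built with so that queries using that operator can use it.
OPCLASS_BY_OPERATOR = {
    "<->": "vector_l2_ops",      # Euclidean (L2) distance
    "<#>": "vector_ip_ops",      # negative inner product
    "<=>": "vector_cosine_ops",  # cosine distance
}


def hnsw_index_sql(table: str, column: str, operator: str) -> str:
    """Return a CREATE INDEX statement for the given distance operator.

    Hypothetical helper: the index naming scheme is an assumption.
    """
    opclass = OPCLASS_BY_OPERATOR[operator]
    index_name = f"{table}_{column}_{opclass}_idx"
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {table} USING hnsw ({column} {opclass});"
    )


if __name__ == "__main__":
    # One index per operator you expect to query with.
    for op in OPCLASS_BY_OPERATOR:
        print(hnsw_index_sql("items", "embedding", op))
```

A query that orders by `embedding <=> :query` would then be served by the `vector_cosine_ops` index, while a query using `<->` would need the `vector_l2_ops` one.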
I think the inconsistency happened because I originally used inner product, as I was only using OpenAI embedding models and those are normalized, so inner product works just as well and is faster than cosine distance. But then I added nomic, which I think may not be normalized, so I moved to cosine. I've added a comment about all that in the PR.
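The normalization point can be checked with a small pure-Python sketch. It assumes toy 2-dimensional vectors (not real embeddings) and hypothetical helper names; the distances mirror what pgvector's `<=>` (cosine distance) and `<#>` (negative inner product) compute. With unit-length vectors the two rankings agree, while with unnormalized vectors a longer vector can outrank a better-aligned one under inner product.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(inner_product(a, a))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


def cosine_distance(a, b):
    # Same quantity as pgvector's <=> operator: 1 - cosine similarity.
    return 1 - inner_product(a, b) / (norm(a) * norm(b))


def negative_inner_product(a, b):
    # Same quantity as pgvector's <#> operator.
    return -inner_product(a, b)


def rank(query, docs, distance):
    """Return document names ordered from nearest to farthest."""
    return sorted(docs, key=lambda name: distance(query, docs[name]))


# Toy data (assumption): one short, perfectly aligned vector and one
# long vector pointing 45 degrees away from the query.
query = [1.0, 0.0]
docs = {"aligned_short": [1.0, 0.0], "off_axis_long": [3.0, 3.0]}
unit_docs = {name: normalize(vec) for name, vec in docs.items()}

if __name__ == "__main__":
    print(rank(query, docs, cosine_distance))
    print(rank(query, docs, negative_inner_product))
    print(rank(query, unit_docs, negative_inner_product))
```

On this toy data, cosine distance ranks `aligned_short` first, raw inner product ranks `off_axis_long` first because of its length, and inner product on the normalized copies matches the cosine ranking again.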
Description
You have this comment in the pgvector playground repo:
# Define HNSW index to support vector similarity search through the vector_l2_ops access method (Euclidean distance). The SQL operator for Euclidean distance is written as <->.
Does it mean that the HNSW index will only work with the <-> operator? Here in the repo you are using the cosine similarity operator, not <->:
rag-postgres-openai-python/src/backend/fastapi_app/postgres_searcher.py
Line 48 in 61bde71