Feature discussion: Lexical and hybrid search with Milvus 2.5 #1157

Open
stefanwebb opened this issue Feb 14, 2025 · 0 comments

I'd like to start a discussion on how we could add the new lexical search from Milvus 2.5 (for pure lexical search, or hybrid search). This would mean that the text is stored directly in the vector database.

@doberst Before I start on a PR, could we align on the design?

Here's how I would envisage it working in practice:

Lexical search:

LLMWareConfig().set_active_db("milvus")
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)

...

parsing_output = library.add_files(ingestion_folder_path)
query_results = Query(library).text_query(test_query, result_count=10)
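
For context, full-text search in Milvus 2.5 ranks results with BM25. The following is a minimal pure-Python sketch of that kind of scoring, not Milvus's implementation: the function name `bm25_scores`, the whitespace tokenizer, the sample documents, and the defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical chunks standing in for parsed library text.
docs = [
    "milvus stores dense and sparse vectors",
    "lexical search ranks documents by term overlap",
    "hybrid search combines sparse and dense retrieval",
]
scores = bm25_scores("lexical search", docs)
```

The second chunk matches both query terms and scores highest, the third matches only "search", and the first matches neither and scores zero.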

and for hybrid search:

LLMWareConfig().set_active_db("milvus")
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)

...

embedding_model = "mini-lm-sbert"
vector_db = "milvus"
library.add_files(ingestion_folder_path)
library.install_new_embedding(embedding_model_name=embedding_model, vector_db=vector_db, batch_size=100)

# Is dual_pass_query equivalent to hybrid search? It doesn't seem to be documented.
query_results = Query(library).hybrid_query(sample_query, result_count=20)
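
One common way to merge a lexical result list with an embedding-based one is reciprocal rank fusion (RRF), which Milvus also offers as a reranking strategy for hybrid search. Below is a minimal sketch of the idea, assuming each search pass returns an ordered list of chunk IDs. The name `rrf_fuse`, the sample IDs, and the constant `k=60` are illustrative and are not part of the llmware or Milvus API.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, result_count=20):
    """Fuse several ranked ID lists with reciprocal rank fusion."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it returned.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:result_count]]


lexical_hits = ["c3", "c1", "c7"]   # hypothetical BM25 ranking
semantic_hits = ["c1", "c9", "c3"]  # hypothetical embedding ranking
fused_hits = rrf_fuse([lexical_hits, semantic_hits])
```

IDs that appear in both lists rise to the top, so a hybrid_query built this way would favor chunks that both passes agree on.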

I think calling LLMWareConfig().set_active_vector_db("milvus") when Milvus is already the lexical database should emit a warning that the call is unnecessary, rather than throw an exception.
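
A rough sketch of the warn-but-continue behaviour, using Python's standard `warnings` module. `ConfigSketch` is a hypothetical stand-in written for this discussion and does not reflect how LLMWareConfig is actually implemented.

```python
import warnings


class ConfigSketch:
    """Hypothetical stand-in for LLMWareConfig; not the real class."""

    _active_db = None
    _vector_db = None

    @classmethod
    def set_active_db(cls, name):
        cls._active_db = name

    @classmethod
    def set_active_vector_db(cls, name):
        if name == "milvus" and cls._active_db == "milvus":
            # Redundant but harmless: warn and carry on rather than raise.
            warnings.warn(
                "Milvus is already the active text database; "
                "setting it as the vector database is unnecessary.",
                UserWarning,
                stacklevel=2,
            )
        cls._vector_db = name


ConfigSketch.set_active_db("milvus")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ConfigSketch.set_active_vector_db("milvus")
```

The setting is still applied after the warning, so existing scripts that set both databases keep working unchanged.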

Also, library.add_files() should do the chunking, but no data should be inserted into the database until you call library.install_new_embedding or Query(library).text_query. The full schema is needed first, and the text and embedding should be inserted together: you can add fields dynamically in Milvus, but it's very inefficient, and updating existing entities is even less efficient.
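
The deferred-insert idea could look roughly like the sketch below. Everything in it is hypothetical: `DeferredLibrary`, its `flush` method, the word-count chunking, and `toy_embed` are placeholders for illustration, not llmware code, and the `inserted` list stands in for a real database write.

```python
class DeferredLibrary:
    """Hypothetical library that buffers chunks until the schema is known."""

    def __init__(self):
        self.pending = []    # chunks parsed but not yet written
        self.inserted = []   # rows as they would be sent to the database

    def add_files(self, texts, chunk_size=5):
        # Chunk only; nothing is written to the database here.
        for text in texts:
            words = text.split()
            for i in range(0, len(words), chunk_size):
                self.pending.append(" ".join(words[i:i + chunk_size]))
        return len(self.pending)

    def flush(self, embed=None):
        # Build complete rows (text plus optional embedding) and insert once.
        rows = []
        for chunk in self.pending:
            row = {"text": chunk}
            if embed is not None:
                row["embedding"] = embed(chunk)
            rows.append(row)
        self.inserted.extend(rows)
        self.pending.clear()
        return len(rows)


def toy_embed(chunk):
    # Placeholder embedding: NOT a real model, just a fixed-size vector.
    return [float(len(chunk)), float(len(chunk.split()))]


lib = DeferredLibrary()
lib.add_files(["one two three four five six seven"])
n_written = lib.flush(embed=toy_embed)
```

Here `flush(embed=...)` plays the role of install_new_embedding, while calling `flush()` with no embedding would correspond to the text-only path triggered by the first text_query.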

Do you think it would be a better design if the work of install_new_embedding were done in add_files, with the embedding model configured through MilvusConfig().set_config?
