Need Debugging Help with errors of azure ai search #288

Open
jmandivarapu opened this issue Jan 6, 2025 · 0 comments
jmandivarapu commented Jan 6, 2025

Problem: Azure AI Search Returns Results for Garbage or Random Words.

What is the right way to use query rewrite, or regular semantic hybrid search without query rewriting, so that results are automatically suppressed for bad or unrelated inputs such as 'aaaa' or 's*x'?

I thought of using the reranker score, but even for a word like 'xxxxxx' the reranker score is greater than 2.5. With a threshold of 2, these results still appear; with a threshold of 2.5, many matching results for good search queries are lost.
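The thresholding idea described above can be sketched as a small post-filter. This is only an illustration: the helper name `filter_by_reranker_score` is hypothetical, and it assumes each result behaves like a dict exposing the `@search.reranker_score` key, as results from the Python SDK do. The sample values are the reranker scores reported for 'aaaaaaa' further down in this issue.

```python
def filter_by_reranker_score(results, threshold):
    """Keep only results whose semantic reranker score is at least `threshold`.

    `results` is any iterable of dict-like search results. Results without a
    reranker score (for example, non-semantic queries) are dropped.
    """
    kept = []
    for doc in results:
        score = doc.get("@search.reranker_score")
        if score is not None and score >= threshold:
            kept.append(doc)
    return kept


# Reranker scores reported in this issue for the garbage query 'aaaaaaa'.
garbage_results = [
    {"productis": 0, "@search.reranker_score": 1.6579372882843018},
    {"productis": 1, "@search.reranker_score": 1.6370235681533813},
    {"productis": 2, "@search.reranker_score": 1.626389503479004},
    {"productis": 3, "@search.reranker_score": 1.618236780166626},
]

# A threshold of 2.0 removes all four results for this particular input.
kept_at_2 = filter_by_reranker_score(garbage_results, threshold=2.0)
```

A fixed cutoff like this only helps when garbage and genuine queries land in different score ranges. As noted above, some garbage inputs score above 2.5, so the cutoff alone does not separate them.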

Documents

I have 40 documents in the search index. Each document contains a product title and description.

Queries When Results Are Not Expected

Try 1: Using the Older Version of Azure AI Search Without Recent Query Rewrite

(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

Scenarios Inside This Try

  1. Just Semantic Search
  2. Semantic Hybrid Search (Semantic + Vectorization)

Case A: Just Semantic Search

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)
  • Output: As expected, empty results.

Case B: Semantic Hybrid Search

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
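from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# `embedding` is the query vector for input_data, computed beforehand
vector_query = VectorizedQuery(
    vector=embedding,
    k_nearest_neighbors=50,
    exhaustive=True,
    fields="experienceDescriptionVector,experienceTitleVector"
)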

search_client = SearchClient(
    endpoint=endpoint,
    index_name='bars-v3',
    credential=credential,
    api_version='2024-11-01-preview'
)

results = search_client.search(
    search_text=input_data,
    vector_queries=[vector_query],
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)
  • Output: Not as expected. Results are returned even though none should be.
    • Search Results for 'aaaaaaa':
      [
        {"productis": 0, "score": 0.0234118290245533, "reranker_score": 1.6579372882843018},
        {"productis": 1, "score": 0.026050420477986336, "reranker_score": 1.6370235681533813},
        {"productis": 2, "score": 0.025913622230291367, "reranker_score": 1.626389503479004},
        {"productis": 3, "score": 0.03205128386616707, "reranker_score": 1.618236780166626}
      ]

Decision: Use regular semantic search, because Semantic Hybrid Search returns results for garbage input.
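One likely reason the hybrid case returns documents for 'aaaaaaa' is that a k-nearest-neighbour vector query ranks documents by relative distance, so something is always "nearest" even when nothing is similar. The toy sketch below illustrates that behaviour in plain Python. It is not Azure AI Search's implementation; the vectors, the helper names, and the `min_similarity` cutoff are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query_vec, doc_vecs, k, min_similarity=None):
    """Return up to k (doc_id, similarity) pairs, most similar first.

    Without `min_similarity`, k results come back whenever at least k
    documents exist, however weak the match. With it, weak matches are dropped.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if min_similarity is not None:
        scored = [pair for pair in scored if pair[1] >= min_similarity]
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" for three documents.
docs = {
    "doc0": [1.0, 0.0, 0.0],
    "doc1": [0.0, 1.0, 0.0],
    "doc2": [0.7, 0.7, 0.0],
}

# A query vector that points almost entirely away from every document.
unrelated_query = [0.0, 0.1, 1.0]

neighbours = k_nearest(unrelated_query, docs, k=2)
filtered = k_nearest(unrelated_query, docs, k=2, min_similarity=0.5)
```

Here `neighbours` still holds two documents although every similarity is below 0.1, while `filtered` is empty. Whether an absolute cutoff of this kind is available or suitable for a given index is a separate question that this sketch does not answer.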


Try 2: Newer Version of Azure AI Search Including Query Rewriting

(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

Scenarios Inside This Try

  1. Just Semantic Search + Query Rewrite
  2. Semantic Hybrid Search + Query Rewrite

Case A: Just Semantic Search + Query Rewrite

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
  • Output: Not as expected. The garbage query is rewritten into plausible-looking questions, and results are returned.
    • Query rewrites generated for 'aaaaaaa' (returned because of debug="queryRewrites"):
      [
        "meaning of aaaaaaaa",
        "what does aaaaaaaa mean",
        "define aaaaaaa",
        "aaaaaaa meaning"
      ]
    • Search Results for 'aaaaaaa':
      [
        {"productis": 0, "score": 0.7754897, "reranker_score": 1.6579372882843018},
        {"productis": 1, "score": 0.27041504, "reranker_score": 1.6370235681533813},
        {"productis": 2, "score": 1.0258656, "reranker_score": 1.618236780166626},
        {"productis": 3, "score": 0.20604418, "reranker_score": 1.524656891822815}
      ]

Case B: Semantic Hybrid Search + Query Rewrite

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    vector_queries=[vector_query],  # hybrid: vector_query as defined in Try 1, Case B
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
  • Output: Not as expected. Results returned despite nonsensical input.