Skip to content

Python: it is a bug: missing a comma #13028

Description

@xiaohei520321

distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"

Let's review the highlighted snippet from CosmosNoSqlCollection._inner_search:

if search_type == SearchType.VECTOR:
    distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"
elif search_type == SearchType.KEYWORD_HYBRID:
    # Hybrid search: requires both a vector and keywords
    params.append({"name": "@keywords", "value": values})
    text_field = options.additional_property_name
    if not text_field:
        raise VectorStoreModelException("Hybrid search requires 'keyword_field_name' in options.")
    distance_clause = f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
    f"FullTextScore(c.{text_field}, @keywords))"
else:
    raise VectorStoreModelException(f"Search type '{search_type}' is not supported.")

1. Code Smells and Antipatterns

  • String Formatting Error:
    In the SearchType.VECTOR branch, the string for VectorDistance is missing a comma before {distance_obj}.

    f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"

    Should likely be:

    f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"

    This is a bug and could cause query syntax errors.

  • Multiline String Concatenation:
    In the SearchType.KEYWORD_HYBRID branch, string concatenation is split across two lines but not joined, so only the first assignment to distance_clause is used. The intended result is likely a single string.

  • Magic Strings and Lack of Validation:
    The string construction uses literals for query clauses which may be error-prone if the Cosmos DB query language changes. While this is sometimes unavoidable, extracting common query fragments into helper functions can help manage this.

  • Parameter Handling:
    The code appends parameters to params inside one branch but not the other, which could confuse maintainers if more parameters are added in the future.

2. Concrete Recommendations

A. Fix String Formatting

Correct the comma in the VectorDistance clause:

distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"

B. Join Multiline String Properly

For the hybrid search, build distance_clause as a single string:

distance_clause = (
    f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
    f"FullTextScore(c.{text_field}, @keywords))"
)

C. Clarify Parameter Handling

To make parameter handling clearer and less error-prone, you can refactor parameter additions to be more explicit and consistent:

params = [{"name": "@top", "value": options.top}]
vector_field = self.definition.try_get_vector_field(options.vector_property_name)
# ... other code ...
params.append({"name": "@vector", "value": vector})

if search_type == SearchType.KEYWORD_HYBRID:
    params.append({"name": "@keywords", "value": values})

D. Extract Query Clause Construction (Optional)

For maintainability, consider extracting the query clause construction into helper functions that encapsulate the logic for building vector and hybrid search clauses.

E. Defensive Programming

Add validation to ensure distance_obj is always a valid JSON string and vector_field_name and text_field are correctly set.

3. Example Improved Code

if search_type == SearchType.VECTOR:
    distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"
elif search_type == SearchType.KEYWORD_HYBRID:
    params.append({"name": "@keywords", "value": values})
    text_field = options.additional_property_name
    if not text_field:
        raise VectorStoreModelException("Hybrid search requires 'keyword_field_name' in options.")
    distance_clause = (
        f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
        f"FullTextScore(c.{text_field}, @keywords))"
    )
else:
    raise VectorStoreModelException(f"Search type '{search_type}' is not supported.")

4. Python Best Practices

  • Use f-strings carefully to avoid syntax errors and always validate string interpolations.
  • Prefer joining multiline strings using parentheses for readability in Python.
  • Keep parameter handling explicit and consistent across branches.

Summary:
The main improvement is to fix the string formatting for the Cosmos DB query, ensuring commas and concatenations are correct. Consider extracting query construction to helpers for maintainability. The rest of the code is logical and follows good Python practices, but be cautious when constructing dynamic queries.

Metadata

Metadata

Labels

bugSomething isn't workingmemorypythonPull requests for the Python Semantic Kernel

Type

Fields

No fields configured for Bug.

Projects

Status
Bug

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions