|
distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})" |
Let's review the highlighted snippet from CosmosNoSqlCollection._inner_search:
if search_type == SearchType.VECTOR:
distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"
elif search_type == SearchType.KEYWORD_HYBRID:
# Hybrid search: requires both a vector and keywords
params.append({"name": "@keywords", "value": values})
text_field = options.additional_property_name
if not text_field:
raise VectorStoreModelException("Hybrid search requires 'keyword_field_name' in options.")
distance_clause = f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
f"FullTextScore(c.{text_field}, @keywords))"
else:
raise VectorStoreModelException(f"Search type '{search_type}' is not supported.")
1. Code Smells and Antipatterns
-
String Formatting Error:
In the SearchType.VECTOR branch, the string for VectorDistance is missing a comma before {distance_obj}.
f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"
Should likely be:
f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"
This is a bug and could cause query syntax errors.
-
Multiline String Concatenation:
In the SearchType.KEYWORD_HYBRID branch, string concatenation is split across two lines but not joined, so only the first assignment to distance_clause is used. The intended result is likely a single string.
-
Magic Strings and Lack of Validation:
The string construction uses literals for query clauses which may be error-prone if the Cosmos DB query language changes. While this is sometimes unavoidable, extracting common query fragments into helper functions can help manage this.
-
Parameter Handling:
The code appends parameters to params inside one branch but not the other, which could confuse maintainers if more parameters are added in the future.
2. Concrete Recommendations
A. Fix String Formatting
Correct the comma in the VectorDistance clause:
distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"
B. Join Multiline String Properly
For the hybrid search, build distance_clause as a single string:
distance_clause = (
f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
f"FullTextScore(c.{text_field}, @keywords))"
)
C. Clarify Parameter Handling
To make parameter handling clearer and less error-prone, you can refactor parameter additions to be more explicit and consistent:
params = [{"name": "@top", "value": options.top}]
vector_field = self.definition.try_get_vector_field(options.vector_property_name)
# ... other code ...
params.append({"name": "@vector", "value": vector})
if search_type == SearchType.KEYWORD_HYBRID:
params.append({"name": "@keywords", "value": values})
D. Extract Query Clause Construction (Optional)
For maintainability, consider extracting the query clause construction into helper functions that encapsulate the logic for building vector and hybrid search clauses.
E. Defensive Programming
Add validation to ensure distance_obj is always a valid JSON string and vector_field_name and text_field are correctly set.
3. Example Improved Code
if search_type == SearchType.VECTOR:
distance_clause = f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"
elif search_type == SearchType.KEYWORD_HYBRID:
params.append({"name": "@keywords", "value": values})
text_field = options.additional_property_name
if not text_field:
raise VectorStoreModelException("Hybrid search requires 'keyword_field_name' in options.")
distance_clause = (
f"RRF(VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj}), "
f"FullTextScore(c.{text_field}, @keywords))"
)
else:
raise VectorStoreModelException(f"Search type '{search_type}' is not supported.")
4. Python Best Practices
- Use f-strings carefully to avoid syntax errors and always validate string interpolations.
- Prefer joining multiline strings using parentheses for readability in Python.
- Keep parameter handling explicit and consistent across branches.
Summary:
The main improvement is to fix the string formatting for the Cosmos DB query, ensuring commas and concatenations are correct. Consider extracting query construction to helpers for maintainability. The rest of the code is logical and follows good Python practices, but be cautious when constructing dynamic queries.
semantic-kernel/python/semantic_kernel/connectors/azure_cosmos_db.py
Line 807 in d3318ad
Let's review the highlighted snippet from
CosmosNoSqlCollection._inner_search:1. Code Smells and Antipatterns
String Formatting Error:
In the
SearchType.VECTORbranch, the string forVectorDistanceis missing a comma before{distance_obj}.f"VectorDistance(c.{vector_field_name}, @vector, false {distance_obj})"Should likely be:
f"VectorDistance(c.{vector_field_name}, @vector, false, {distance_obj})"This is a bug and could cause query syntax errors.
Multiline String Concatenation:
In the
SearchType.KEYWORD_HYBRIDbranch, string concatenation is split across two lines but not joined, so only the first assignment todistance_clauseis used. The intended result is likely a single string.Magic Strings and Lack of Validation:
The string construction uses literals for query clauses which may be error-prone if the Cosmos DB query language changes. While this is sometimes unavoidable, extracting common query fragments into helper functions can help manage this.
Parameter Handling:
The code appends parameters to
paramsinside one branch but not the other, which could confuse maintainers if more parameters are added in the future.2. Concrete Recommendations
A. Fix String Formatting
Correct the comma in the
VectorDistanceclause:B. Join Multiline String Properly
For the hybrid search, build
distance_clauseas a single string:C. Clarify Parameter Handling
To make parameter handling clearer and less error-prone, you can refactor parameter additions to be more explicit and consistent:
D. Extract Query Clause Construction (Optional)
For maintainability, consider extracting the query clause construction into helper functions that encapsulate the logic for building vector and hybrid search clauses.
E. Defensive Programming
Add validation to ensure
distance_objis always a valid JSON string andvector_field_nameandtext_fieldare correctly set.3. Example Improved Code
4. Python Best Practices
Summary:
The main improvement is to fix the string formatting for the Cosmos DB query, ensuring commas and concatenations are correct. Consider extracting query construction to helpers for maintainability. The rest of the code is logical and follows good Python practices, but be cautious when constructing dynamic queries.