Bug Description
The structured_search method in database/vector_store.py constructs LanceDB .where() filter clauses by directly interpolating user-derived values (persons, entities, timestamps) into query strings using f-strings. The persons and entities fields have no escaping at all, allowing an attacker to inject arbitrary filter logic.
Location
database/vector_store.py:206-216
Reproduction
from database.vector_store import VectorStore
store = VectorStore(db_path="./test_db")
# ... add some entries ...
# Attacker crafts a person name with injection payload:
# The person name breaks out of make_array() and injects arbitrary conditions
malicious_persons = ["Alice')) OR true--"]
# This produces the where clause:
# array_has_any(persons, make_array('Alice')) OR true--'))
# Which bypasses the intended filter and returns ALL entries
results = store.structured_search(persons=malicious_persons)
The vulnerable code:
# Lines 206-208: no escaping of person names
if persons:
    values = ", ".join([f"'{p}'" for p in persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")
# Lines 214-216: same issue with entities
if entities:
    values = ", ".join([f"'{e}'" for e in entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")
Note: The location field (line 211) does have basic replace("'", "''") escaping, but persons and entities do not.
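The string construction can be checked without a LanceDB instance. The sketch below is a minimal stand-in, not the project's code: build_persons_clause is a hypothetical helper that copies only the two lines of the persons branch shown above, so the resulting clause can be inspected directly.

```python
def build_persons_clause(persons):
    """Hypothetical stand-in mirroring the unescaped persons branch."""
    values = ", ".join([f"'{p}'" for p in persons])
    return f"array_has_any(persons, make_array({values}))"


# A benign list yields a single, well-formed condition.
benign = build_persons_clause(["Alice", "Bob"])

# The payload closes the string literal and both parentheses early,
# appends its own condition, and comments out the leftover quote.
injected = build_persons_clause(["Alice')) OR true--"])

print(benign)
print(injected)
```

Printing both clauses side by side makes it visible that the second one no longer consists of a single array_has_any(...) call.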
Impact
- Data exfiltration: Bypass filters to read all stored memories across tenants
- Filter manipulation: Inject conditions to return specific records or no records
Suggested Fix
# Apply the same escaping used for location to all string interpolations:
if persons:
    safe_persons = [p.replace("'", "''") for p in persons]
    values = ", ".join([f"'{p}'" for p in safe_persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")
if entities:
    safe_entities = [e.replace("'", "''") for e in entities]
    values = ", ".join([f"'{e}'" for e in safe_entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")
# Also escape timestamp_range values:
if timestamp_range:
    # Assumes timestamp_range is a (start, end) pair.
    start_time, end_time = timestamp_range
    start_time = str(start_time).replace("'", "''")
    end_time = str(end_time).replace("'", "''")
    conditions.append(f"timestamp >= '{start_time}' AND timestamp <= '{end_time}'")
Ideally, use parameterized queries if LanceDB supports them.
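Until parameter binding is confirmed, one option is to route every interpolated value through a single quoting helper, so that no branch can forget the escaping. The following is a sketch under stated assumptions: quote_literal, array_has_any_clause, and timestamp_clause are hypothetical names, not part of LanceDB or the project. It assumes that the filter dialect reads a doubled single quote as an escaped quote and does not treat backslash as an escape character, and that timestamps are datetime objects or ISO 8601 strings.

```python
from datetime import datetime


def quote_literal(value):
    """Render value as a single-quoted string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def array_has_any_clause(column, values):
    """Build an array_has_any() condition with every value quoted."""
    # column must be a hard-coded column name, never user input.
    quoted = ", ".join(quote_literal(v) for v in values)
    return f"array_has_any({column}, make_array({quoted}))"


def timestamp_clause(start_time, end_time):
    """Build the timestamp range condition from validated values."""
    # Round-trip through datetime so only well-formed ISO 8601 text is
    # emitted; anything else raises ValueError instead of reaching the query.
    start = datetime.fromisoformat(str(start_time)).isoformat()
    end = datetime.fromisoformat(str(end_time)).isoformat()
    return f"timestamp >= {quote_literal(start)} AND timestamp <= {quote_literal(end)}"
```

With these helpers, the persons and entities branches reduce to conditions.append(array_has_any_clause("persons", persons)) and the equivalent call for entities. Validating timestamps by parsing them is stricter than quote doubling, because a malformed value is rejected outright rather than passed through in escaped form.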
Found via automated codebase analysis (confirmed by independent architecture review). Happy to submit a PR if confirmed.