Security: SQL/filter injection in VectorStore.structured_search via unsanitized person/entity names #53

@CrepuscularIRIS

Bug Description

The structured_search method in database/vector_store.py constructs LanceDB .where() filter clauses by directly interpolating user-derived values (persons, entities, timestamps) into query strings using f-strings. The persons and entities fields have no escaping at all, allowing an attacker to inject arbitrary filter logic.

Location

database/vector_store.py:206-216

Reproduction

from database.vector_store import VectorStore

store = VectorStore(db_path="./test_db")
# ... add some entries ...

# Attacker crafts a person name with injection payload:
# The person name breaks out of make_array() and injects arbitrary conditions
malicious_persons = ["Alice')) OR true--"]

# This produces the where clause:
# array_has_any(persons, make_array('Alice')) OR true--'))
# Which bypasses the intended filter and returns ALL entries
results = store.structured_search(persons=malicious_persons)

The vulnerable code:

# Lines 206-208: No escaping of person names
if persons:
    values = ", ".join([f"'{p}'" for p in persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")

# Lines 214-216: Same issue with entities
if entities:
    values = ", ".join([f"'{e}'" for e in entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")

Note: The location field (line 211) does have basic replace("'", "''") escaping, but persons and entities do not.
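
To make the breakout easy to verify without a database, here is a minimal standalone sketch that mirrors the interpolation quoted above. The function name build_persons_condition_unsafe is hypothetical and exists only for illustration; it is not part of VectorStore.

```python
def build_persons_condition_unsafe(persons):
    """Mirror of the vulnerable f-string interpolation (illustration only)."""
    values = ", ".join([f"'{p}'" for p in persons])
    return f"array_has_any(persons, make_array({values}))"


# A benign name yields the intended filter.
benign = build_persons_condition_unsafe(["Alice"])

# The payload closes the string literal and both parentheses early,
# so everything after it is parsed as filter logic instead of data.
injected = build_persons_condition_unsafe(["Alice')) OR true--"])

print(benign)
print(injected)
```

The second printed clause matches the one shown in the reproduction, which confirms that the string building alone is enough to produce the injected condition.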

Impact

  • Data exfiltration: Bypass filters to read all stored memories across tenants
  • Filter manipulation: Inject conditions to return specific records or no records

Suggested Fix

# Apply the same escaping used for location to all string interpolations:
if persons:
    safe_persons = [p.replace("'", "''") for p in persons]
    values = ", ".join([f"'{p}'" for p in safe_persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")

if entities:
    safe_entities = [e.replace("'", "''") for e in entities]
    values = ", ".join([f"'{e}'" for e in safe_entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")

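# Also escape timestamp_range values
# (assuming timestamp_range is a (start, end) pair, as the existing code implies):
if timestamp_range:
    start_time, end_time = timestamp_range
    start_time = str(start_time).replace("'", "''")
    end_time = str(end_time).replace("'", "''")
    conditions.append(f"timestamp >= '{start_time}' AND timestamp <= '{end_time}'")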

Ideally, use parameterized queries if LanceDB supports them.
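
If parameterized filters turn out not to be available, one option is to centralize the quoting in a single helper so no call site can forget it. The sketch below is only a suggestion: quote_sql_literal and array_has_any_condition are hypothetical names, and it assumes the filter dialect treats a doubled single quote as an escaped quote and does not interpret backslash escapes inside string literals.

```python
from typing import Iterable


def quote_sql_literal(value: object) -> str:
    """Render value as a single-quoted string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def array_has_any_condition(column: str, values: Iterable[object]) -> str:
    """Build an array_has_any filter whose values are all quoted literals.

    `column` must be a trusted, hard-coded column name; it is not escaped here.
    """
    quoted = ", ".join(quote_sql_literal(v) for v in values)
    return f"array_has_any({column}, make_array({quoted}))"


# The payload from the reproduction now stays inside one string literal.
safe_clause = array_has_any_condition("persons", ["Alice')) OR true--"])
# Legitimate names containing an apostrophe keep working.
apostrophe_clause = array_has_any_condition("persons", ["O'Brien"])

print(safe_clause)
print(apostrophe_clause)
```

The same quote_sql_literal helper could be reused for location and the timestamp bounds, and the assertions on these two clauses would make a reasonable regression test.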


Found via automated codebase analysis (confirmed by independent architecture review). Happy to submit a PR if confirmed.
