feat(search): Cosmos full-text search as composable index mode #180

Merged: prsasattms merged 1 commit into main from feat-cosmos-fts-search on Jun 4, 2026

@prsasattms

Collaborator

Adds full-text search as a composable index mode alongside vector search. Callers can now mix FTS-only and vector indexes in a single /search request and let the existing RRF merge fuse the results.

What

Schema (`search/schemas.py`)

    • `IndexSpec.mode: "vector" | "fts"` (default `"vector"`, fully backward compatible)
    • `IndexSpec.embedding` is now `Optional`; the validator requires it iff `mode="vector"`
    • `IndexSpec.fts_field` (defaults to `content_fields[0]`), validated against a strict path regex (`[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*`) to prevent SQL injection
    • `mode="fts"` is rejected for `PgVectorStore` in v1 (Cosmos only; pgvector is a follow-up)
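The validation rules above can be sketched with the standard library alone. This is an illustration, not the code in `search/schemas.py`: the class name `IndexSpecSketch`, the `store` field with its `"cosmos"`/`"pgvector"` values, the string-typed `embedding`, and the `resolved_fts_field()` helper are all assumptions made for the example, and "iff" is read literally (an embedding on an FTS index is rejected).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Dotted identifier path such as "content" or "metadata.title".
FTS_FIELD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


@dataclass
class IndexSpecSketch:
    """Hypothetical stand-in for IndexSpec with only the fields discussed here."""

    store: str                       # assumed values: "cosmos" or "pgvector"
    content_fields: List[str]
    mode: str = "vector"             # default keeps existing callers working
    embedding: Optional[str] = None  # placeholder type; the real field may differ
    fts_field: Optional[str] = None

    def __post_init__(self) -> None:
        if self.mode not in ("vector", "fts"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        # Reading "iff" literally: embedding is present exactly when mode is "vector".
        if self.mode == "vector" and self.embedding is None:
            raise ValueError("embedding is required when mode='vector'")
        if self.mode == "fts" and self.embedding is not None:
            raise ValueError("embedding must be omitted when mode='fts'")
        # fullmatch, not match: the whole string must be a dotted identifier path.
        if self.fts_field is not None and not FTS_FIELD_RE.fullmatch(self.fts_field):
            raise ValueError(f"invalid fts_field: {self.fts_field!r}")
        if self.mode == "fts" and self.store == "pgvector":
            raise ValueError("mode='fts' is not supported for pgvector stores in v1")

    def resolved_fts_field(self) -> str:
        """Fall back to the first content field when fts_field is unset."""
        return self.fts_field or self.content_fields[0]
```

Using `fullmatch` matters here: because the field name ends up interpolated into the query text, a prefix-only match would let a trailing payload through.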

Searcher (`search/searcher.py`)

    • `search_cosmos_fts()` issues:

      ```sql
      SELECT TOP @top_k c FROM c [WHERE ...]
      ORDER BY RANK FullTextScore(c.<fts_field>, @Term0, @Term1, ...)
      ```

      Cosmos forbids `FullTextScore` in the projection, so hits return `score=None` and are ranked via RRF (use `merge.strategy="rrf"`; a warning fires for `"score"` with mixed FTS+vector).
    • `_tokenize_fts_query()`: whitespace-split, lowercase, dedupe, cap at `SEARCH_FTS_MAX_TERMS=16`
    • `_search_one_index()` branches on `idx.mode` and skips `embed_query()` entirely for FTS
    • `explain_search()` handles FTS plans without an embedding policy
    • `run_search()` modality classification treats FTS as text
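For readers who want the tokenization and query shape in one place, here is a minimal sketch of both steps as described above. It is not the code from `search/searcher.py`: the function names `tokenize_fts_query` and `build_fts_query`, the `@termN` parameter names, and the `where` argument are assumptions for illustration, and the SQL text simply mirrors the shape quoted in this description.

```python
from typing import Any, Dict, List, Optional, Tuple

SEARCH_FTS_MAX_TERMS = 16  # cap named in the PR description


def tokenize_fts_query(text: str, max_terms: int = SEARCH_FTS_MAX_TERMS) -> List[str]:
    """Whitespace-split, lowercase, dedupe (keeping first-seen order), then cap."""
    seen = set()
    terms: List[str] = []
    for token in text.lower().split():
        if token in seen:
            continue
        seen.add(token)
        terms.append(token)
        if len(terms) == max_terms:
            break
    return terms


def build_fts_query(
    fts_field: str,
    terms: List[str],
    top_k: int,
    where: Optional[str] = None,
) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (sql, parameters) with every search term bound as a parameter.

    fts_field is interpolated into the SQL text, so it is assumed to have
    already passed the strict path-regex validation on the schema.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    placeholders = ", ".join(f"@term{i}" for i in range(len(terms)))
    sql = "SELECT TOP @top_k c FROM c"
    if where:
        sql += f" WHERE {where}"
    sql += f" ORDER BY RANK FullTextScore(c.{fts_field}, {placeholders})"
    params: List[Dict[str, Any]] = [{"name": "@top_k", "value": top_k}]
    params += [{"name": f"@term{i}", "value": t} for i, t in enumerate(terms)]
    return sql, params
```

The parameter list uses the name/value dictionaries that Cosmos SDK query calls accept. Only the field path reaches the SQL string directly, which is why the schema validates it so strictly.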

Why

User wanted full-text search next to vector search. Option B (FTS as a separate index mode merged via RRF) was chosen over native Cosmos hybrid because it composes: an FTS-only index over Cosmos can merge with vector indexes over Cosmos or pgvector or CLIP image indexes — all via the existing per-index RRF.
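Because FTS hits carry `score=None`, only their rank order is usable, which is exactly what reciprocal rank fusion needs. The sketch below shows the standard RRF formula, not this repository's merge code: the function name `rrf_merge`, the plain lists of document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse per-index rankings: each hit contributes 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for doc_ids in ranked_lists:
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep output stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fts_hits = ["a", "b", "c"]      # order from an FTS index (no native scores)
vector_hits = ["b", "d", "a"]   # order from a vector index
print(rrf_merge([fts_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

Documents found by both indexes (`a` and `b`) rise above those found by only one, with no need to put FTS and vector scores on a shared scale.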

Cosmos container requirement

The target container must have a full-text indexing policy on `fts_field` before this works. The existing demo container (`nationwide-delta-1536`) will need an ALTER. This is a follow-up deployment task.
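As a starting point for that deployment task, the two policy documents could be generated from the field path. This is a hedged sketch: the helper name `full_text_policies` is hypothetical, the `en-US` language default is an assumption, and the dictionary keys follow the full-text and indexing policy shapes as I understand them from the Azure Cosmos DB documentation, so they should be checked against the current docs before use.

```python
from typing import Any, Dict, Tuple


def full_text_policies(
    fts_field: str, language: str = "en-US"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (full_text_policy, indexing_policy) dicts for one field.

    fts_field uses the dotted form from IndexSpec ("metadata.title");
    Cosmos policy paths use slashes ("/metadata/title").
    """
    path = "/" + fts_field.replace(".", "/")
    full_text_policy = {
        "defaultLanguage": language,
        "fullTextPaths": [{"path": path, "language": language}],
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        "fullTextIndexes": [{"path": path}],
    }
    return full_text_policy, indexing_policy
```

The helper only builds data, so how the policies are applied (container creation or replacement through the SDK, portal, or infrastructure templates) stays a deployment decision.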

Tests

  • 17 new unit tests in `tests/unit/test_search_fts.py` covering: schema validation, tokenization edge cases, exact SQL+params shape via a fake Cosmos container, `_search_one_index` branching (verifies `embed_query` is not called for FTS), and `explain_search` FTS output.
  • OpenAPI snapshot refreshed for the now-optional `embedding` field and the new `mode`/`fts_field` fields.
  • Full unit suite: 217 passed locally.

Follow-ups

  • pgvector FTS via `tsvector` + `ts_rank` (next PR)
  • Add a full-text indexing-policy helper to the container-create script for the demo

Adds 'mode: vector|fts' to IndexSpec so callers can declare an index as full-text only and merge it with vector indexes via the existing RRF strategy. Cosmos-only in v1 (pgvector follows separately).

Schema changes:
- IndexSpec.mode (default 'vector', preserves backward compat)
- IndexSpec.embedding now Optional; validator requires it iff mode='vector'
- IndexSpec.fts_field (defaults to content_fields[0] at search time)
- fts_field validated against strict path regex (SQL-injection guard)
- mode='fts' rejected for pgvector stores (clear error, not silent)

Searcher changes:
- search_cosmos_fts() uses Cosmos 'ORDER BY RANK FullTextScore(field, @t0, @t1, ...)' with bound term params; native score is not projectable so hits get score=None and rank via RRF
- _tokenize_fts_query(): lowercase, dedupe, cap to SEARCH_FTS_MAX_TERMS=16
- _search_one_index branches on idx.mode, skipping embed_query entirely for FTS
- run_search emits warning when merge.strategy='score' is used with FTS indexes
- explain_search renders FTS plan without touching idx.embedding

Tests: 17 new (schema validation, tokenization, SQL shape, per-index branching, explain). OpenAPI snapshot refreshed. Full unit suite: 217 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@prsasattms prsasattms merged commit 9c70eaf into main Jun 4, 2026
6 checks passed