vchord backend: ANN queries cannot use vchordrq indexes due to opclass/operator mismatch (vector_l2_ops vs <=>) #1667

@isac322

Summary

When Hindsight is deployed with the vchord vector backend (HINDSIGHT_API_VECTOR_EXTENSION=vchord), every ANN query in the codebase falls back to a sequential scan because the vchordrq indexes are built with vector_l2_ops but all application SQL uses the cosine-distance operator <=>. The opclass and the operator are not compatible in PostgreSQL, so the planner never picks the vector index, and the database CPU-walks every embedding in the partition for each query.

I observed this in production on a small dataset (~16k rows, 3072-dim embeddings, tensorchord/cloudnative-vectorchord:18.3-1.1.1): the primary Postgres pod sits at ~870m CPU (roughly 0.87 of a core) because the retain pipeline's Phase 1 ANN link search (a CROSS JOIN LATERAL over _ann_seeds) repeats the sequential scan once per seed. Other backends (pgvector, pgvectorscale, pg_diskann) are unaffected because they map to vector_cosine_ops, which matches the <=> operator.

Environment

  • Hindsight API: main (verified at commit bd86e7ea, which was current with the repository as of the report date)
  • Vector backend: vchord (tensorchord/cloudnative-vectorchord:18.3-1.1.1, extension vchord 1.1.1, vector 0.8.2)
  • PostgreSQL: 18
  • Deployment: CloudNativePG cluster, single primary
  • Embedding dimension: 3072 (OpenAI text-embedding-3-large-style, L2-normalized; see evidence below)

Symptoms

  • cnpg-*-1 primary Pod CPU stays at ~870m steady-state during retain operations.
  • pg_stat_activity shows the Phase 1 ANN query (engine/retain/link_utils.py) running for 50+ seconds with empty wait_event (pure CPU, not I/O- or lock-bound).
  • The same shape of query is used in recall, link expansion, and reflect paths — they share the latency penalty but are less visible because they are not wrapped in a CROSS JOIN LATERAL.

Root Cause

vchordrq operator classes are strictly bound 1:1 to operators (VectorChord docs — Operator Classes):

opclass             usable operator
vector_l2_ops       <-> only
vector_cosine_ops   <=> only
vector_ip_ops       <#> only

Hindsight's vchord index mapping is in hindsight-api-slim/hindsight_api/_vector_index.py:

INDEX_USING: dict[str, str] = {
    "pgvector":      "USING hnsw    (embedding vector_cosine_ops)",
    "pgvectorscale": "USING diskann (embedding vector_cosine_ops) WITH (num_neighbors = 50)",
    "pg_diskann":    "USING diskann (embedding vector_cosine_ops) WITH (max_neighbors = 50)",
    "vchord":        "USING vchordrq (embedding vector_l2_ops)",   # ← only L2
}

…but the ANN SQL across the engine uses cosine distance:

# hindsight-api-slim/hindsight_api/engine/retain/link_utils.py:737-744
SELECT mu.id,
       1 - (mu.embedding <=> s.emb_text::vector) AS similarity
FROM memory_units mu
WHERE mu.bank_id = $1 AND mu.fact_type = $2 AND mu.embedding IS NOT NULL
ORDER BY mu.embedding <=> s.emb_text::vector
LIMIT $3

Because <=> is not in the operator family of vector_l2_ops, the planner cannot use the vchordrq index and falls back to a sequential scan with a per-row cosine computation, sorted by top-N heapsort.
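
The mismatch can be detected mechanically. The following is a minimal sketch, assuming the one-to-one opclass/operator pairing from the table above also holds for the other backends and using the INDEX_USING strings as quoted in this issue. The helper names opclass_of and index_serves are hypothetical and are not part of Hindsight.

```python
import re

# Operator each opclass can serve, per the table in this issue.
OPCLASS_OPERATOR = {
    "vector_l2_ops": "<->",
    "vector_cosine_ops": "<=>",
    "vector_ip_ops": "<#>",
}

# Copied from _vector_index.py as quoted in this issue.
INDEX_USING = {
    "pgvector": "USING hnsw    (embedding vector_cosine_ops)",
    "pgvectorscale": "USING diskann (embedding vector_cosine_ops) WITH (num_neighbors = 50)",
    "pg_diskann": "USING diskann (embedding vector_cosine_ops) WITH (max_neighbors = 50)",
    "vchord": "USING vchordrq (embedding vector_l2_ops)",
}


def opclass_of(index_using: str) -> str:
    """Extract the opclass name from a 'USING method (column opclass)' clause."""
    match = re.search(r"\(\s*\w+\s+(\w+)\s*\)", index_using)
    if match is None:
        raise ValueError(f"no opclass found in: {index_using!r}")
    return match.group(1)


def index_serves(backend: str, operator: str) -> bool:
    """Return True if the backend's index opclass matches the query operator."""
    return OPCLASS_OPERATOR[opclass_of(INDEX_USING[backend])] == operator


# Backends whose index cannot serve the cosine operator used by the engine SQL.
mismatched = [b for b in INDEX_USING if not index_serves(b, "<=>")]
print(mismatched)  # ['vchord']
```

A check like this could run in a unit test so that a future backend mapping cannot silently diverge from the operator used in the query code.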

Evidence (production cluster, 2927-row partition)

opencode::plugbear / fact_type = 'experience' (2927 rows of 3072-dim embeddings):

Current query (<=>, cosine):

Limit  (... actual time=142.418..142.443 rows=50.00)
  Buffers: shared hit=71531 read=1
  -> Sort  (... actual time=142.415..142.426 rows=50.00)
       Sort Key: (mu.embedding <=> ...)
       Sort Method: top-N heapsort  Memory: 31kB
       -> Seq Scan on memory_units mu  (... actual time=2.251..141.358 rows=2927.00)
            Filter: (embedding IS NOT NULL AND bank_id = '...' AND fact_type = 'experience')
            Rows Removed by Filter: 13538
            Buffers: shared hit=71528 read=1
Planning Time: 15.001 ms
Execution Time: 143.828 ms

Same data, equivalent query rewritten with <-> (L2) — index is used:

Limit  (... actual time=15.993..22.968 rows=50.00)
  Buffers: shared hit=770 read=520
  -> Index Scan using idx_mu_emb_expr_<hash> on memory_units mu
       Order By: (embedding <-> ...)
Planning Time: 16.354 ms
Execution Time: 25.646 ms

~5.6× single-query difference, and because the retain pipeline wraps this in CROSS JOIN LATERAL over a temp table of seeds, the penalty multiplies by seed count. A run with ~350 seeds takes ~50 s with the current opclass; the same workload on a matching opclass would finish in ~9 s.
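
The figures above can be reproduced with a rough back-of-the-envelope calculation. This sketch assumes the cost scales linearly with the seed count and ignores planning time and cache effects, so it is an estimate rather than a measurement. The function name lateral_cost_seconds is made up for illustration.

```python
# Execution times from the two EXPLAIN ANALYZE plans quoted above (milliseconds).
SEQ_SCAN_MS = 143.828
INDEX_SCAN_MS = 25.646
SEEDS = 350  # approximate seed count reported in this issue


def lateral_cost_seconds(per_seed_ms: float, seeds: int) -> float:
    """Estimate total time when the inner query runs once per seed row."""
    return per_seed_ms * seeds / 1000.0


speedup = SEQ_SCAN_MS / INDEX_SCAN_MS
print(f"single-query speedup: {speedup:.1f}x")                                  # 5.6x
print(f"seq scan total:   {lateral_cost_seconds(SEQ_SCAN_MS, SEEDS):.1f} s")    # 50.3 s
print(f"index scan total: {lateral_cost_seconds(INDEX_SCAN_MS, SEEDS):.1f} s")  # 9.0 s
```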

Embedding norm statistics (every norm is within about 0.1% of 1, so L2 and cosine distance produce effectively the same ordering for this corpus and the fix is mathematically safe to discuss in either direction):

min_norm   avg_norm   max_norm   stddev_norm
0.999316   0.999998   1.000723   0.000197
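
The reason unit norms matter is the identity ||a - b||² = 2 · (1 - cos(a, b)) for unit vectors, which makes L2 distance a monotonic function of cosine distance. Below is a small numeric check on synthetic data. It uses random 64-dim vectors as a stand-in for the real 3072-dim embeddings, so it illustrates the identity and does not test the production corpus.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def l2_distance(a, b):
    """Euclidean distance, the quantity computed by the <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity computed by the <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 64                # small stand-in for the 3072-dim embeddings
query = normalize([rng.gauss(0, 1) for _ in range(dim)])
corpus = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(200)]

# Largest deviation from the identity ||a - b||^2 = 2 * cosine_distance.
max_gap = max(
    abs(l2_distance(query, v) ** 2 - 2.0 * cosine_distance(query, v)) for v in corpus
)

# Nearest neighbours under each metric, as lists of corpus indices.
by_l2 = sorted(range(len(corpus)), key=lambda i: l2_distance(query, corpus[i]))
by_cos = sorted(range(len(corpus)), key=lambda i: cosine_distance(query, corpus[i]))

print(max_gap < 1e-9, by_l2[:10] == by_cos[:10])
```

Under the same unit-norm assumption, the similarity value the engine computes as 1 - (embedding <=> q) could be recovered from an L2 distance d as 1 - d²/2.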

Affected code

All call sites using <=> (every one of these hits the same fallback when the backend is vchord):

  • engine/retain/link_utils.py:737, 742 — Phase 1 ANN link creation (hottest path)
  • engine/search/retrieval.py:410, 411, 415, 517, 532 — semantic recall
  • engine/search/link_expansion_retrieval.py:86, 91, 95 — graph link expansion
  • engine/reflect/tools.py:86, 89 — reflect tool retrieval
  • engine/sql/postgresql.py:157, 164, 168 — SQL helper used to compose other queries

Secondary finding: SET LOCAL hnsw.ef_search is a no-op under vchord

engine/retain/link_utils.py:711-713:

# Default 400 is tuned for recall precision but at 164k units each HNSW probe
# takes 94ms. ef_search=60 gives 2.7ms per probe (35x faster) ...
await conn.execute("SET LOCAL hnsw.ef_search = 60")

The hnsw.ef_search GUC belongs to pgvector's HNSW index. vchordrq does not read it, so the SET LOCAL has no effect when vchordrq is in use. The intended "recall-vs-latency" tuning has therefore never been applied to vchord deployments. The equivalent vchord dials are vchordrq.probes and vchordrq.epsilon (see Fallback Parameters).

Why this only affects vchord users

The other three backends (pgvector, pgvectorscale, pg_diskann) all map to vector_cosine_ops in _vector_index.py, so their indexes match the <=> operator and the same code paths perform as intended.

What the fix probably looks like (for discussion)

VectorChord's documentation explicitly recommends vector_cosine_ops for normalized/cosine embeddings:

"For most datasets using cosine similarity, enabling residual_quantization and build.internal.spherical_centroids improves both QPS and recall."

So the cleanest fix is to switch the vchord mapping in _vector_index.py to vector_cosine_ops (optionally with residual_quantization = true and spherical_centroids = true) and to provide an Alembic migration that re-creates the existing vchordrq indexes with the new opclass. That covers both the global index (idx_memory_units_embedding_vchordrq) and the many per-(bank_id, fact_type) partial indexes. Ideally the migration builds the new indexes with CREATE INDEX CONCURRENTLY and drops the old ones afterwards.
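
A rough sketch of the DDL such a migration might emit follows. The function vchord_cosine_reindex and the new index name with the _cos suffix are hypothetical. The real partial-index names and predicates would have to be read from the database, and the optional vchordrq build options are left out here.

```python
def vchord_cosine_reindex(table, old_index, new_index, where=None):
    """Build DDL that swaps an L2 vchordrq index for a cosine one.

    All identifiers are supplied by the caller; none are looked up in Hindsight.
    """
    create = (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {new_index} "
        f"ON {table} USING vchordrq (embedding vector_cosine_ops)"
    )
    if where:
        # Partial indexes keep their original predicate.
        create += f" WHERE {where}"
    drop = f"DROP INDEX CONCURRENTLY IF EXISTS {old_index}"
    # Create first, drop second, so the old index stays in place until the new one exists.
    return [create, drop]


statements = vchord_cosine_reindex(
    table="memory_units",
    old_index="idx_memory_units_embedding_vchordrq",
    new_index="idx_memory_units_embedding_vchordrq_cos",  # hypothetical name
)
for stmt in statements:
    print(stmt)
```

CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so in an Alembic migration these statements would likely need to be executed inside op.get_context().autocommit_block().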

The secondary hnsw.ef_search issue can be fixed by dispatching the tuning GUC by backend (vchordrq.probes / vchordrq.epsilon for vchord, hnsw.ef_search for pgvector, etc.).
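
One possible shape for that dispatch is sketched below. Only the hnsw.ef_search value of 60 comes from the code quoted in this issue. The vchordrq values are placeholders, and the valid form of vchordrq.probes depends on how the index was built, so they should be checked against the VectorChord documentation. The names TUNING_GUCS and tuning_statements are hypothetical.

```python
# Per-backend tuning settings. Only the hnsw value (60) comes from the code
# quoted in this issue; the vchord values are placeholders to be tuned.
TUNING_GUCS = {
    "pgvector": [("hnsw.ef_search", "60")],
    "vchord": [("vchordrq.probes", "10"), ("vchordrq.epsilon", "1.9")],
}


def tuning_statements(backend: str) -> list[str]:
    """Return SET LOCAL statements for the backend, or an empty list if none apply."""
    return [
        f"SET LOCAL {name} = '{value}'"
        for name, value in TUNING_GUCS.get(backend, [])
    ]


for backend in ("pgvector", "vchord", "pgvectorscale"):
    print(backend, tuning_statements(backend))
```

The call site in link_utils.py could then loop over the returned statements instead of hard-coding a single SET LOCAL, and backends without an entry would simply skip the tuning step.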

I'm happy to send a PR along these lines — opening this issue first so the diagnosis is documented and so anyone hitting the same symptom can find it.

Reproduction

  1. Deploy Hindsight against PostgreSQL with the vchord extension and HINDSIGHT_API_VECTOR_EXTENSION=vchord.
  2. Ingest enough memories that any single (bank_id, fact_type) partition contains a few thousand rows.
  3. Run any retain (or recall) that triggers ANN search.
  4. Observe Postgres CPU climb and the activity table show long-running queries with <=> and no wait_event.
  5. EXPLAIN (ANALYZE, BUFFERS) shows Seq Scan over the partition with the vchordrq index ignored.
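
Step 5 can be automated by scanning the text output of EXPLAIN for a sequential scan on the table. This is a minimal sketch that works on plan text shaped like the excerpts in this issue. The helper names scan_nodes and has_seq_scan are made up, and a more robust check would parse EXPLAIN (FORMAT JSON) instead.

```python
import re

# Matches scan node headers such as "Seq Scan on t" or "Index Scan using i on t".
SCAN_NODE = re.compile(r"((?:Seq|Index|Index Only|Bitmap Heap) Scan\b[^(\n]*)")


def scan_nodes(plan_text: str) -> list[str]:
    """Return the scan node descriptions found in EXPLAIN text output."""
    return [m.group(1).strip() for m in SCAN_NODE.finditer(plan_text)]


def has_seq_scan(plan_text: str, table: str = "memory_units") -> bool:
    """True if the plan contains a sequential scan on the given table."""
    return any(
        node.startswith("Seq Scan") and f" on {table}" in node
        for node in scan_nodes(plan_text)
    )


# Abbreviated plans taken from the Evidence section of this issue.
seq_plan = """Limit  (... actual time=142.418..142.443 rows=50.00)
  -> Sort  (... actual time=142.415..142.426 rows=50.00)
       -> Seq Scan on memory_units mu  (... actual time=2.251..141.358 rows=2927.00)"""
index_plan = """Limit  (... actual time=15.993..22.968 rows=50.00)
  -> Index Scan using idx_mu_emb_expr_<hash> on memory_units mu"""

print(has_seq_scan(seq_plan), has_seq_scan(index_plan))  # True False
```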
