Summary
When Hindsight is deployed with the vchord vector backend (HINDSIGHT_API_VECTOR_EXTENSION=vchord), every ANN query in the codebase falls back to a sequential scan because the vchordrq indexes are built with vector_l2_ops but all application SQL uses the cosine-distance operator <=>. The opclass and the operator are not compatible in PostgreSQL, so the planner never picks the vector index, and the database CPU-walks every embedding in the partition for each query.
I observed this in production on a small dataset (~16k rows, 3072-dim embeddings, tensorchord/cloudnative-vectorchord:18.3-1.1.1): the primary Postgres pod sits at ~870m CPU because the retain pipeline's Phase 1 ANN link search (a CROSS JOIN LATERAL over _ann_seeds) is multiplying the slowdown by the seed count. Other backends (pgvector, pgvectorscale, pg_diskann) are unaffected because they map to vector_cosine_ops and match the <=> operator.
Environment
- Hindsight API: main (verified on commit bd86e7ea, ahead by the recent merge bringing the repo current as of the report date)
- Vector backend: vchord (tensorchord/cloudnative-vectorchord:18.3-1.1.1, extension vchord 1.1.1, vector 0.8.2)
- PostgreSQL: 18
- Deployment: CloudNativePG cluster, single primary
- Embedding dimension: 3072 (OpenAI text-embedding-3-large-style, L2-normalized; see evidence below)
Symptoms
- cnpg-*-1 primary Pod CPU stays at ~870m steady-state during retain operations.
- pg_stat_activity shows the Phase 1 ANN query (engine/retain/link_utils.py) running for 50+ seconds with empty wait_event (pure CPU, not I/O- or lock-bound).
- The same shape of query is used in the recall, link expansion, and reflect paths; they share the latency penalty but are less visible because they are not wrapped in a CROSS JOIN LATERAL.
Root Cause
vchordrq operator classes are strictly bound 1:1 to operators (VectorChord docs — Operator Classes):
| opclass | usable operator |
| --- | --- |
| vector_l2_ops | <-> only |
| vector_cosine_ops | <=> only |
| vector_ip_ops | <#> only |
Hindsight's vchord index mapping is in hindsight-api-slim/hindsight_api/_vector_index.py:
INDEX_USING: dict[str, str] = {
"pgvector": "USING hnsw (embedding vector_cosine_ops)",
"pgvectorscale": "USING diskann (embedding vector_cosine_ops) WITH (num_neighbors = 50)",
"pg_diskann": "USING diskann (embedding vector_cosine_ops) WITH (max_neighbors = 50)",
"vchord": "USING vchordrq (embedding vector_l2_ops)", # ← only L2
}
…but the ANN SQL across the engine uses cosine distance:
# hindsight-api-slim/hindsight_api/engine/retain/link_utils.py:737-744
SELECT mu.id,
1 - (mu.embedding <=> s.emb_text::vector) AS similarity
FROM memory_units mu
WHERE mu.bank_id = $1 AND mu.fact_type = $2 AND mu.embedding IS NOT NULL
ORDER BY mu.embedding <=> s.emb_text::vector
LIMIT $3
Because <=> is not in the operator family of vector_l2_ops, the planner cannot use the vchordrq index and falls back to a sequential scan with a per-row cosine computation, sorted by top-N heapsort.
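The mismatch can be checked mechanically. The following is a minimal sketch, not code from the repository: it copies the INDEX_USING mapping quoted above, and the OPCLASS_OPERATOR table, the QUERY_OPERATOR constant, and the mismatched_backends helper are hypothetical names introduced only for illustration.

```python
# Operator that each opclass can serve (taken from the opclass table above).
OPCLASS_OPERATOR = {
    "vector_l2_ops": "<->",
    "vector_cosine_ops": "<=>",
    "vector_ip_ops": "<#>",
}

# Copied from _vector_index.py as quoted in this issue.
INDEX_USING = {
    "pgvector": "USING hnsw (embedding vector_cosine_ops)",
    "pgvectorscale": "USING diskann (embedding vector_cosine_ops) WITH (num_neighbors = 50)",
    "pg_diskann": "USING diskann (embedding vector_cosine_ops) WITH (max_neighbors = 50)",
    "vchord": "USING vchordrq (embedding vector_l2_ops)",
}

# Operator used by the engine's ANN SQL (hypothetical constant name).
QUERY_OPERATOR = "<=>"


def mismatched_backends(index_using, query_operator):
    """Return the backends whose index opclass cannot serve query_operator."""
    bad = []
    for backend, clause in index_using.items():
        # Find which known opclass name appears in the USING clause.
        opclass = next((oc for oc in OPCLASS_OPERATOR if oc in clause), None)
        if opclass is None or OPCLASS_OPERATOR[opclass] != query_operator:
            bad.append(backend)
    return bad


print(mismatched_backends(INDEX_USING, QUERY_OPERATOR))
```

A check of this shape could run as a unit test so that a future backend entry with the wrong opclass fails in CI rather than in production.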
Evidence (production cluster, 2927-row partition)
opencode::plugbear / fact_type = 'experience' (2927 rows of 3072-dim embeddings):
Current query (<=>, cosine):
Limit (... actual time=142.418..142.443 rows=50.00)
Buffers: shared hit=71531 read=1
-> Sort (... actual time=142.415..142.426 rows=50.00)
Sort Key: (mu.embedding <=> ...)
Sort Method: top-N heapsort Memory: 31kB
-> Seq Scan on memory_units mu (... actual time=2.251..141.358 rows=2927.00)
Filter: (embedding IS NOT NULL AND bank_id = '...' AND fact_type = 'experience')
Rows Removed by Filter: 13538
Buffers: shared hit=71528 read=1
Planning Time: 15.001 ms
Execution Time: 143.828 ms
Same data, equivalent query rewritten with <-> (L2) — index is used:
Limit (... actual time=15.993..22.968 rows=50.00)
Buffers: shared hit=770 read=520
-> Index Scan using idx_mu_emb_expr_<hash> on memory_units mu
Order By: (embedding <-> ...)
Planning Time: 16.354 ms
Execution Time: 25.646 ms
The single-query difference is ~5.6×, and because the retain pipeline wraps this query in a CROSS JOIN LATERAL over a temp table of seeds, the penalty multiplies by seed count. A run with ~350 seeds takes ~50 s with the current opclass; the same workload on a matching opclass would finish in ~9 s.
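The arithmetic behind those estimates is sketched below. It assumes a simple linear model in which every seed pays the same per-query cost measured in the two plans above; caching effects and planning time are ignored, and the function name is hypothetical.

```python
def lateral_total_seconds(per_query_ms: float, seed_count: int) -> float:
    """Estimate total time when one ANN probe runs per seed (linear model)."""
    return per_query_ms * seed_count / 1000.0


SEQ_SCAN_MS = 143.828   # execution time of the <=> plan (sequential scan)
INDEX_SCAN_MS = 25.646  # execution time of the <-> plan (index scan)
SEEDS = 350             # approximate seed count from the observed run

speedup = SEQ_SCAN_MS / INDEX_SCAN_MS
print(f"single-query ratio: {speedup:.1f}x")
print(f"current opclass:  {lateral_total_seconds(SEQ_SCAN_MS, SEEDS):.1f} s")
print(f"matching opclass: {lateral_total_seconds(INDEX_SCAN_MS, SEEDS):.1f} s")
```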
Embedding norm statistics (verifies that L2 and cosine are monotonically equivalent for this corpus, so the fix is mathematically safe to discuss in either direction):
min_norm avg_norm max_norm stddev_norm
0.999316 0.999998 1.000723 0.000197
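The equivalence follows from the identity ||a - b||^2 = 2 - 2 cos(a, b) for unit-length vectors, which means the squared L2 distance is exactly twice the cosine distance and both orderings agree. A small self-contained sketch with synthetic data (random 32-dimensional vectors, not the production embeddings) illustrates this:

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity <=> returns in pgvector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def l2_distance(a, b):
    """Euclidean distance, the quantity <-> returns in pgvector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
DIM = 32  # synthetic; the real corpus is 3072-dim
corpus = [normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(200)]
query = normalize([rng.gauss(0, 1) for _ in range(DIM)])

by_cosine = sorted(range(len(corpus)), key=lambda i: cosine_distance(query, corpus[i]))
by_l2 = sorted(range(len(corpus)), key=lambda i: l2_distance(query, corpus[i]))

# Largest deviation from the identity l2^2 == 2 * cosine_distance.
max_identity_error = max(
    abs(l2_distance(query, v) ** 2 - 2.0 * cosine_distance(query, v)) for v in corpus
)
print(by_cosine[:10] == by_l2[:10], max_identity_error)
```

If the L2 direction were chosen instead, the similarity column currently computed as 1 - (a <=> b) would have to be derived as 1 - d^2 / 2 from the L2 distance d, and only approximately so for embeddings whose norms drift from 1.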
Affected code
All call sites using <=> (every one of these hits the same fallback when the backend is vchord):
engine/retain/link_utils.py:737, 742 — Phase 1 ANN link creation (hottest path)
engine/search/retrieval.py:410, 411, 415, 517, 532 — semantic recall
engine/search/link_expansion_retrieval.py:86, 91, 95 — graph link expansion
engine/reflect/tools.py:86, 89 — reflect tool retrieval
engine/sql/postgresql.py:157, 164, 168 — SQL helper used to compose other queries
Secondary finding: SET LOCAL hnsw.ef_search is a no-op under vchord
engine/retain/link_utils.py:711-713:
# Default 400 is tuned for recall precision but at 164k units each HNSW probe
# takes 94ms. ef_search=60 gives 2.7ms per probe (35x faster) ...
await conn.execute("SET LOCAL hnsw.ef_search = 60")
The hnsw.ef_search GUC belongs to pgvector's HNSW index and is not consulted by vchordrq scans, so the SET LOCAL has no effect when vchordrq is in use. The intended "recall-vs-latency" tuning has therefore never been applied to vchord deployments. The equivalent vchord dials are vchordrq.probes and vchordrq.epsilon (see Fallback Parameters).
Why this only affects vchord users
The other three backends (pgvector, pgvectorscale, pg_diskann) all map to vector_cosine_ops in _vector_index.py, so their indexes match the <=> operator and the same code paths perform as intended.
What the fix probably looks like (for discussion)
VectorChord's documentation explicitly recommends vector_cosine_ops for normalized/cosine embeddings:
"For most datasets using cosine similarity, enabling residual_quantization and build.internal.spherical_centroids improves both QPS and recall."
So the cleanest fix is to switch the vchord mapping in _vector_index.py to vector_cosine_ops (optionally with residual_quantization = true and spherical_centroids = true) and provide an Alembic migration that re-creates the existing vchordrq indexes — both the global one (idx_memory_units_embedding_vchordrq) and the many per-(bank_id, fact_type) partial indexes — with the new opclass, ideally via CREATE INDEX CONCURRENTLY followed by drops of the old ones.
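As a rough sketch of what the migration could emit, the helper below builds the create/drop pair for one index. The helper name, the "_cos" suffix, and the example partial-index predicate are hypothetical; the real per-(bank_id, fact_type) index names and predicates would have to be read from the catalog, and the optional residual_quantization / spherical_centroids build options are left out.

```python
def rebuild_statements(index_name, table="memory_units", where=None, suffix="_cos"):
    """Return [CREATE, DROP] SQL that replaces index_name with a cosine-opclass index.

    `suffix` and `where` are illustrative; a partial index must reuse the
    predicate of the index it replaces.
    """
    new_name = f"{index_name}{suffix}"
    predicate = f" WHERE {where}" if where else ""
    create = (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {new_name} ON {table} "
        f"USING vchordrq (embedding vector_cosine_ops){predicate}"
    )
    drop = f"DROP INDEX CONCURRENTLY IF EXISTS {index_name}"
    return [create, drop]


# Global index named in this issue.
global_stmts = rebuild_statements("idx_memory_units_embedding_vchordrq")

# A partial index; both the name and the predicate here are made up.
partial_stmts = rebuild_statements(
    "idx_example_partial", where="fact_type = 'experience'"
)

for stmt in global_stmts + partial_stmts:
    print(stmt)
```

Because CREATE INDEX CONCURRENTLY cannot run inside a transaction block, an Alembic migration would likely need to execute these statements inside an autocommit block rather than the default transactional context.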
The secondary hnsw.ef_search issue can be fixed by dispatching the tuning GUC by backend (vchordrq.probes / vchordrq.epsilon for vchord, hnsw.ef_search for pgvector, etc.).
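One possible shape for that dispatch is sketched below. The function name and its parameters are hypothetical; ef_search=60 is the value already used in link_utils.py, while any probes or epsilon values are placeholders that would need tuning. The sketch assumes vchordrq.probes takes a comma-separated list whose shape must match the index's lists build option, so it is only emitted when the caller supplies it. Backends whose tuning GUCs are not discussed in this issue get no statements.

```python
import re

# vchordrq.probes is assumed to be a comma-separated list of integers.
_PROBES_RE = re.compile(r"^\d+(,\d+)*$")


def tuning_statements(backend, *, ef_search=60, probes=None, epsilon=None):
    """Return the SET LOCAL statements to run before an ANN query.

    SET does not accept bind parameters, so values are validated and
    formatted into the statement text.
    """
    if backend == "pgvector":
        return [f"SET LOCAL hnsw.ef_search = {int(ef_search)}"]
    if backend == "vchord":
        stmts = []
        if probes is not None:
            if not _PROBES_RE.match(probes):
                raise ValueError(f"invalid probes value: {probes!r}")
            stmts.append(f"SET LOCAL vchordrq.probes = '{probes}'")
        if epsilon is not None:
            stmts.append(f"SET LOCAL vchordrq.epsilon = {float(epsilon)}")
        return stmts
    # Other backends: no tuning applied in this sketch.
    return []


print(tuning_statements("pgvector"))
print(tuning_statements("vchord", probes="10", epsilon=1.5))
```

The call site would then loop over the returned statements with conn.execute instead of hard-coding the HNSW setting.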
I'm happy to send a PR along these lines — opening this issue first so the diagnosis is documented and so anyone hitting the same symptom can find it.
Reproduction
- Deploy Hindsight against PostgreSQL with the vchord extension and HINDSIGHT_API_VECTOR_EXTENSION=vchord.
- Ingest enough memories that any single (bank_id, fact_type) partition contains a few thousand rows.
- Run any retain (or recall) that triggers ANN search.
- Observe Postgres CPU climb and the activity table show long-running queries with <=> and no wait_event.
- EXPLAIN (ANALYZE, BUFFERS) shows Seq Scan over the partition with the vchordrq index ignored.
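The last step can be automated by scanning the EXPLAIN text for scan nodes. This is a minimal sketch with hypothetical helper names; it works on the text plan format and uses abbreviated excerpts of the two plans shown in the evidence section.

```python
import re

# Matches scan nodes in text-format EXPLAIN output.
_SCAN_RE = re.compile(
    r"(Seq Scan|Index Only Scan|Index Scan|Bitmap Heap Scan)"
    r"(?: using (\S+))? on (\S+)"
)


def scan_nodes(plan_text):
    """Return (node_type, index_name_or_None, relation) for each scan node."""
    return [(m.group(1), m.group(2), m.group(3)) for m in _SCAN_RE.finditer(plan_text)]


def falls_back_to_seq_scan(plan_text, relation="memory_units"):
    """True if the plan contains a sequential scan on the given relation."""
    return any(
        kind == "Seq Scan" and rel == relation
        for kind, _, rel in scan_nodes(plan_text)
    )


COSINE_PLAN = """\
Limit (... actual time=142.418..142.443 rows=50.00)
  -> Sort (... actual time=142.415..142.426 rows=50.00)
        -> Seq Scan on memory_units mu (... actual time=2.251..141.358 rows=2927.00)
"""

L2_PLAN = """\
Limit (... actual time=15.993..22.968 rows=50.00)
  -> Index Scan using idx_mu_emb_expr_<hash> on memory_units mu
"""

print(falls_back_to_seq_scan(COSINE_PLAN), falls_back_to_seq_scan(L2_PLAN))
```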