feat(db): unified asset search (semantic + text + tag filters)#25
Conversation
Add searchAssets() over the existing quirk_assets registry: pgvector cosine-similarity ranking when a query embedding is supplied, plus text (title/raw_text), asset-type, status, and tag filters. Embedding generation stays out of scope — the caller passes the query vector — so the module has no new dependencies and the param-normalization logic is unit-tested without a database.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2682d1760
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| limit: clampInt(params.limit ?? DEFAULT_LIMIT, 1, MAX_LIMIT), | ||
| offset: Math.max(0, Math.trunc(params.offset ?? 0)), |
There was a problem hiding this comment.
Reject non-finite limit/offset values during normalization
normalizeSearchParams currently clamps/truncates limit and offset without checking finiteness, so values like NaN (common when parsing query params such as Number("abc")) survive as NaN because Math.trunc/Math.max/Math.min propagate it. That lets p.limit/p.offset reach Drizzle as invalid numbers and can turn a bad client input into a runtime query error instead of falling back to safe pagination defaults.
Useful? React with 👍 / 👎.
| if (similarity) { | ||
| conditions.push(isNotNull(quirkAssets.embedding)); | ||
| if (p.minSimilarity !== undefined) { | ||
| conditions.push(gt(similarity, p.minSimilarity)); |
There was a problem hiding this comment.
Apply minSimilarity as an inclusive lower bound
The code documents minSimilarity as a minimum threshold, but the query uses gt(similarity, p.minSimilarity), which excludes rows exactly at the threshold. In practice this makes minSimilarity: 1 return no results even for perfect matches, and generally drops boundary-equal hits that callers would reasonably expect to keep.
Useful? React with 👍 / 👎.
| params.embedding !== undefined && | ||
| params.embedding.length !== EMBEDDING_DIMENSIONS | ||
| ) { |
There was a problem hiding this comment.
Validate embedding values are finite numbers
The embedding validation only checks array length, so vectors containing NaN/Infinity pass normalization and are sent into the pgvector distance expression. pgvector requires finite elements, so this turns malformed caller input into a database error path (Result failure) instead of deterministic upfront validation, even though this function is intended to normalize and validate search parameters.
Useful? React with 👍 / 👎.
| const minSimilarity = | ||
| params.minSimilarity === undefined | ||
| ? undefined | ||
| : Math.min(Math.max(params.minSimilarity, 0), 1); |
There was a problem hiding this comment.
Reject NaN minSimilarity values during normalization
minSimilarity is clamped but not validated for finiteness, so an input like Number("abc") becomes NaN and survives normalization. That NaN is then used in the SQL threshold predicate, which produces unintuitive filtering behavior (effectively dropping all finite-similarity rows) instead of treating the input as invalid or defaulting safely.
Useful? React with 👍 / 👎.
What
Adds
searchAssets()(src/lib/db/search.ts) — one typed entry point for "find the right asset fast" over the existingquirk_assetsregistry:embedding(1536-d) is supplied, ranks by pgvector cosine similarity (1 - cosineDistance), returning asimilarityscore per hit.title/raw_text(wildcards escaped).assetTypes,statuses, andtags(matched viaquirk_annotationswhereannotation_type = 'tag').limit(1–100) /offset.Returns the repo's
Result<AssetSearchHit[]>rather than throwing.Why this shape
Builds on the schema that already exists — no new table, no migration, no new dependencies. Embedding generation is deliberately out of scope: the caller passes the query vector, which keeps this module pure retrieval and free of any API-key/network dependency.
Tests
src/lib/db/search.test.tscoversnormalizeSearchParams(limit clamping, offset floor, text trimming, tag lower-casing/de-dup, type/status de-dup,minSimilarityclamping, embedding-dimension validation). The normalization logic is pure, so it runs under Vitest with no database.Notes / open questions
quirk_assets.embeddingand produces query vectors) is the natural next PR.Opened as a draft — review the API surface and tell me if the slice is right before I build further.
Generated by Claude Code