feat(db): unified asset search (semantic + text + tag filters) #25

Merged
bryansayler merged 1 commit into main from claude/fervent-hypatia-O49BE on May 25, 2026

@bryansayler

What

Adds searchAssets() (src/lib/db/search.ts) — one typed entry point for "find the right asset fast" over the existing quirk_assets registry:

  • Semantic — when a query embedding (1536-d) is supplied, ranks by pgvector cosine similarity (1 - cosineDistance), returning a similarity score per hit.
  • Text — case-insensitive match on title / raw_text (wildcards escaped).
  • Filters: assetTypes, statuses, and tags (matched via quirk_annotations where annotation_type = 'tag').
  • Pagination via clamped limit (1–100) / offset.

Returns the repo's Result<AssetSearchHit[]> rather than throwing.
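
To make the text and semantic modes above concrete, here is a minimal sketch of the two ideas involved: escaping LIKE wildcards so user text matches literally, and the similarity score as 1 minus cosine distance. The helper names `escapeLikePattern` and `cosineSimilarity` are hypothetical and not taken from this PR, and the escaping assumes backslash is the LIKE escape character (the PostgreSQL default).

```typescript
// Hypothetical helper, not the PR's code: escape LIKE/ILIKE metacharacters
// so that "%" and "_" typed by a user match literally.
// Assumes backslash is the escape character (PostgreSQL's default for LIKE).
function escapeLikePattern(text: string): string {
  return text.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Cosine similarity of two equal-length vectors, computed in memory.
// pgvector reports cosine *distance*, so similarity = 1 - distance.
// Returns NaN if either vector has zero length (norm 0).
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A "contains" pattern for a case-insensitive match on user text.
const pattern = `%${escapeLikePattern("50%_off")}%`;
```

In the real query the similarity is computed by the database; the in-memory version is only there to show what the per-hit score means.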

Why this shape

Builds on the schema that already exists — no new table, no migration, no new dependencies. Embedding generation is deliberately out of scope: the caller passes the query vector, which keeps this module pure retrieval and free of any API-key/network dependency.

Tests

src/lib/db/search.test.ts covers normalizeSearchParams (limit clamping, offset floor, text trimming, tag lower-casing/de-dup, type/status de-dup, minSimilarity clamping, embedding-dimension validation). The normalization logic is pure, so it runs under Vitest with no database.
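
Two of the normalization rules listed above (tag lower-casing/de-dup and text trimming) can be sketched as pure functions. This is an illustration only: `normalizeTags` and `normalizeText` are hypothetical names, and trimming tags, dropping empty tags, and mapping whitespace-only text to `undefined` are assumptions rather than behavior confirmed by the PR.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Lower-cases tags and removes duplicates, keeping first-seen order.
// Assumption: tags are also trimmed and empty tags are dropped.
function normalizeTags(tags: readonly string[] | undefined): string[] {
  const seen = new Set<string>();
  for (const raw of tags ?? []) {
    const tag = raw.trim().toLowerCase();
    if (tag.length > 0) {
      seen.add(tag); // a Set ignores repeats and preserves insertion order
    }
  }
  return Array.from(seen);
}

// Trims the text query.
// Assumption: an empty or whitespace-only query means "no text filter".
function normalizeText(text: string | undefined): string | undefined {
  const trimmed = text?.trim();
  return trimmed ? trimmed : undefined;
}
```

Because neither function touches the database, assertions against them run under Vitest without a Postgres instance, which is the property the test file relies on.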

Notes / open questions

  • This is a deliberately small first slice. It's the searchable foundation the bigger "auto-research / self-improving agent" ideas would sit on — those are not in this PR and await a real spec.
  • Embedding generation (which model/provider populates quirk_assets.embedding and produces query vectors) is the natural next PR.
  • Query execution isn't unit-tested (needs a live Postgres + pgvector); happy to add an integration/E2E test if you want it wired into CI.

Opened as a draft — review the API surface and tell me if the slice is right before I build further.


Generated by Claude Code

Add searchAssets() over the existing quirk_assets registry: pgvector
cosine-similarity ranking when a query embedding is supplied, plus
text (title/raw_text), asset-type, status, and tag filters. Embedding
generation stays out of scope — the caller passes the query vector — so
the module has no new dependencies and the param-normalization logic is
unit-tested without a database.

bryansayler marked this pull request as ready for review May 25, 2026 06:22
bryansayler merged commit bd41f8f into main May 25, 2026
3 checks passed
bryansayler deleted the claude/fervent-hypatia-O49BE branch May 25, 2026 06:22
chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2682d1760

Comment thread src/lib/db/search.ts
Comment on lines +113 to +114
limit: clampInt(params.limit ?? DEFAULT_LIMIT, 1, MAX_LIMIT),
offset: Math.max(0, Math.trunc(params.offset ?? 0)),

P1: Reject non-finite limit/offset values during normalization

normalizeSearchParams currently clamps/truncates limit and offset without checking finiteness, so values like NaN (common when parsing query params such as Number("abc")) survive as NaN because Math.trunc/Math.max/Math.min propagate it. That lets p.limit/p.offset reach Drizzle as invalid numbers and can turn a bad client input into a runtime query error instead of falling back to safe pagination defaults.
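
One possible fix, sketched under assumptions: substitute a default for any non-finite value before clamping. `finiteOr` and `normalizePagination` are hypothetical names, the body of `clampInt` is a guess at the existing helper, and `DEFAULT_LIMIT = 20` is an assumed value (only the 1–100 range comes from the PR description).

```typescript
const DEFAULT_LIMIT = 20; // assumed value; the PR does not state the default
const MAX_LIMIT = 100;    // upper bound from the PR description (1–100)

// Assumed shape of the existing helper: truncate, then clamp to [min, max].
function clampInt(value: number, min: number, max: number): number {
  return Math.min(Math.max(Math.trunc(value), min), max);
}

// Hypothetical helper: replace undefined, NaN, and ±Infinity with a fallback.
function finiteOr(value: number | undefined, fallback: number): number {
  return value !== undefined && Number.isFinite(value) ? value : fallback;
}

// Hypothetical wrapper showing where the finiteness check would sit.
function normalizePagination(params: { limit?: number; offset?: number }) {
  return {
    limit: clampInt(finiteOr(params.limit, DEFAULT_LIMIT), 1, MAX_LIMIT),
    offset: Math.max(0, Math.trunc(finiteOr(params.offset, 0))),
  };
}
```

With this shape, `normalizePagination({ limit: Number("abc") })` falls back to the default limit instead of passing NaN to the query builder. Rejecting the input with an error result would be an equally valid design; the sketch shows only the fallback variant.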

Comment thread src/lib/db/search.ts
if (similarity) {
  conditions.push(isNotNull(quirkAssets.embedding));
  if (p.minSimilarity !== undefined) {
    conditions.push(gt(similarity, p.minSimilarity));

P2: Apply minSimilarity as an inclusive lower bound

The code documents minSimilarity as a minimum threshold, but the query uses gt(similarity, p.minSimilarity), which excludes rows exactly at the threshold. In practice this makes minSimilarity: 1 return no results even for perfect matches, and generally drops boundary-equal hits that callers would reasonably expect to keep.
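
The difference between the two bounds can be illustrated in memory. The `Hit` type and both filter functions below are hypothetical stand-ins for the SQL predicate; in the query itself the change would presumably be swapping Drizzle's `gt` for `gte`.

```typescript
// Hypothetical in-memory stand-in for the SQL threshold predicate.
type Hit = { id: string; similarity: number };

// Exclusive bound (current behavior, equivalent to gt): drops boundary-equal hits.
function filterExclusive(hits: readonly Hit[], minSimilarity: number): Hit[] {
  return hits.filter((h) => h.similarity > minSimilarity);
}

// Inclusive bound (suggested behavior, equivalent to gte): keeps them.
function filterInclusive(hits: readonly Hit[], minSimilarity: number): Hit[] {
  return hits.filter((h) => h.similarity >= minSimilarity);
}

const sampleHits: Hit[] = [
  { id: "a", similarity: 1 },
  { id: "b", similarity: 0.8 },
  { id: "c", similarity: 0.5 },
];

const keptExclusive = filterExclusive(sampleHits, 0.8).map((h) => h.id);
const keptInclusive = filterInclusive(sampleHits, 0.8).map((h) => h.id);
```

Here `keptExclusive` contains only "a", while `keptInclusive` contains "a" and "b".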

Comment thread src/lib/db/search.ts
Comment on lines +85 to +87
params.embedding !== undefined &&
params.embedding.length !== EMBEDDING_DIMENSIONS
) {

P2: Validate embedding values are finite numbers

The embedding validation only checks array length, so vectors containing NaN/Infinity pass normalization and are sent into the pgvector distance expression. pgvector requires finite elements, so this turns malformed caller input into a database error path (Result failure) instead of deterministic upfront validation, even though this function is intended to normalize and validate search parameters.
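
A sketch of what the upfront check could look like, assuming validation reports an error message (or `null` when the input is acceptable). `validateEmbedding` is a hypothetical name; how the real module surfaces the failure through its Result type is not shown in this review.

```typescript
const EMBEDDING_DIMENSIONS = 1536; // dimension stated in the PR description

// Hypothetical validator: returns an error message, or null if the embedding
// is absent or well-formed. Checks both length and element finiteness.
function validateEmbedding(embedding: readonly number[] | undefined): string | null {
  if (embedding === undefined) {
    return null; // no embedding means no semantic ranking, which is valid
  }
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    return `expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`;
  }
  // Number.isFinite rejects NaN, Infinity, and -Infinity.
  const badIndex = embedding.findIndex((v) => !Number.isFinite(v));
  if (badIndex !== -1) {
    return `embedding[${badIndex}] is not a finite number`;
  }
  return null;
}
```

The extra pass is linear in the vector length, so for 1536 elements the cost is negligible next to the query itself.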

Comment thread src/lib/db/search.ts
Comment on lines +101 to +104
const minSimilarity =
params.minSimilarity === undefined
? undefined
: Math.min(Math.max(params.minSimilarity, 0), 1);

P2: Reject NaN minSimilarity values during normalization

minSimilarity is clamped but not validated for finiteness, so an input like Number("abc") becomes NaN and survives normalization. That NaN is then used in the SQL threshold predicate, which produces unintuitive filtering behavior (effectively dropping all finite-similarity rows) instead of treating the input as invalid or defaulting safely.
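
A minimal sketch of the "default safely" option: treat NaN as if no threshold had been supplied, and clamp everything else to [0, 1] as the existing code does. `normalizeMinSimilarity` is a hypothetical name, and dropping the threshold (rather than returning an error) is an assumption about the preferred behavior.

```typescript
// Hypothetical normalizer: NaN is treated as "no threshold" (assumed policy);
// all other values, including ±Infinity, are clamped into [0, 1].
function normalizeMinSimilarity(value: number | undefined): number | undefined {
  if (value === undefined || Number.isNaN(value)) {
    return undefined;
  }
  return Math.min(Math.max(value, 0), 1);
}
```

The explicit check is needed because `Math.min(Math.max(NaN, 0), 1)` evaluates to NaN, so clamping alone never removes it.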
