feat(db): unified asset search (semantic + text + tag filters) by bryansayler · Pull Request #25 · Quirk-Systems/project-scaffold

bryansayler · 2026-05-25T05:55:10Z

What

Adds searchAssets() (src/lib/db/search.ts) — one typed entry point for "find the right asset fast" over the existing quirk_assets registry:

Semantic — when a query embedding (1536-d) is supplied, ranks by pgvector cosine similarity (1 - cosineDistance), returning a similarity score per hit.
Text — case-insensitive match on title / raw_text (wildcards escaped).
Filters — assetTypes, statuses, and tags (matched via quirk_annotations where annotation_type = 'tag').
Pagination via clamped limit (1–100) / offset.
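The semantic and text bullets above can be illustrated with two small pure helpers. This is a minimal sketch, not the code from src/lib/db/search.ts: the names cosineSimilarity and escapeLikePattern are hypothetical, and the escaping assumes Postgres's default backslash escape character for LIKE/ILIKE.

```typescript
// Hypothetical helpers for illustration; names are not taken from the PR.

// Cosine similarity written as 1 - cosine distance, matching the ranking rule above.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const cosineDistance = 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
  return 1 - cosineDistance;
}

// Escape LIKE/ILIKE wildcards (and the escape character itself) so that
// user-supplied text matches literally instead of acting as a pattern.
function escapeLikePattern(text: string): string {
  return text.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}
```

In the database, pgvector computes the distance itself; the first helper only shows what the returned similarity score means.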

Returns the repo's Result<AssetSearchHit[]> rather than throwing.
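For readers unfamiliar with the pattern, a caller of a Result-returning function branches on the outcome instead of wrapping the call in try/catch. The sketch below assumes a common discriminated-union shape; the repo's actual Result type, the fields of AssetSearchHit, and the summarizeSearch helper are all assumptions made for illustration.

```typescript
// Assumed shape: the repo's real Result type may differ.
type Result<T, E = Error> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Hypothetical hit shape; only `similarity` is mentioned in the PR description.
interface AssetSearchHit {
  id: string;
  title: string;
  similarity?: number;
}

// A caller handles both outcomes explicitly rather than catching exceptions.
function summarizeSearch(result: Result<AssetSearchHit[]>): string {
  if (result.ok === true) {
    return `${result.value.length} hit(s)`;
  }
  return `search failed: ${result.error.message}`;
}
```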

Why this shape

Builds on the schema that already exists — no new table, no migration, no new dependencies. Embedding generation is deliberately out of scope: the caller passes the query vector, which keeps this module pure retrieval and free of any API-key/network dependency.

Tests

src/lib/db/search.test.ts covers normalizeSearchParams (limit clamping, offset floor, text trimming, tag lower-casing/de-dup, type/status de-dup, minSimilarity clamping, embedding-dimension validation). The normalization logic is pure, so it runs under Vitest with no database.
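Because the normalization is pure, its behavior can be checked with plain function calls. The sketch below shows two of the listed rules (tag lower-casing/de-dup and text trimming) under assumed names; normalizeTags and normalizeText are hypothetical, and treating an empty trimmed string as "no text filter" is an assumption, not something the PR states.

```typescript
// Hypothetical helpers mirroring two of the rules the tests cover.

// Lower-case tags and drop duplicates, preserving first-seen order.
function normalizeTags(tags: string[] | undefined): string[] {
  const seen = new Set<string>();
  for (const tag of tags ?? []) {
    seen.add(tag.toLowerCase());
  }
  return Array.from(seen);
}

// Trim the text query; an empty result is treated as "no text filter"
// (assumption made for this sketch).
function normalizeText(text: string | undefined): string | undefined {
  const trimmed = text?.trim();
  return trimmed ? trimmed : undefined;
}
```

Neither function touches a database, which is why tests of this kind can run under Vitest without Postgres.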

Notes / open questions

This is a deliberately small first slice. It's the searchable foundation the bigger "auto-research / self-improving agent" ideas would sit on — those are not in this PR and await a real spec.
Embedding generation (which model/provider populates quirk_assets.embedding and produces query vectors) is the natural next PR.
Query execution isn't unit-tested (needs a live Postgres + pgvector); happy to add an integration/E2E test if you want it wired into CI.

Opened as a draft — review the API surface and tell me if the slice is right before I build further.

Generated by Claude Code

Add searchAssets() over the existing quirk_assets registry: pgvector cosine-similarity ranking when a query embedding is supplied, plus text (title/raw_text), asset-type, status, and tag filters. Embedding generation stays out of scope — the caller passes the query vector — so the module has no new dependencies and the param-normalization logic is unit-tested without a database.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2682d1760

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T06:27:54Z

+    limit: clampInt(params.limit ?? DEFAULT_LIMIT, 1, MAX_LIMIT),
+    offset: Math.max(0, Math.trunc(params.offset ?? 0)),


Reject non-finite limit/offset values during normalization

normalizeSearchParams currently clamps/truncates limit and offset without checking finiteness, so values like NaN (common when parsing query params such as Number("abc")) survive as NaN because Math.trunc/Math.max/Math.min propagate it. That lets p.limit/p.offset reach Drizzle as invalid numbers and can turn a bad client input into a runtime query error instead of falling back to safe pagination defaults.
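One way to address this is to check finiteness before clamping and fall back to a default. The sketch below is a possible fix, not the change that landed: the four-argument clampInt signature and the normalizePagination wrapper are hypothetical, and DEFAULT_LIMIT = 20 is an assumed value (the PR only states the 1-100 range).

```typescript
const DEFAULT_LIMIT = 20; // assumed value; the PR only states the 1-100 range
const MAX_LIMIT = 100;

// Clamp to [min, max] after truncation; non-finite input (NaN, ±Infinity)
// or a missing value falls back to a safe default instead of propagating.
function clampInt(
  value: number | undefined,
  min: number,
  max: number,
  fallback: number,
): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

// Hypothetical wrapper showing both pagination fields normalized the same way.
function normalizePagination(params: { limit?: number; offset?: number }) {
  return {
    limit: clampInt(params.limit, 1, MAX_LIMIT, DEFAULT_LIMIT),
    offset: clampInt(params.offset, 0, Number.MAX_SAFE_INTEGER, 0),
  };
}
```

Whether bad input should fall back silently or produce a Result failure is a design choice; this sketch takes the fallback route the review comment suggests.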

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-25T06:27:54Z

+    if (similarity) {
+      conditions.push(isNotNull(quirkAssets.embedding));
+      if (p.minSimilarity !== undefined) {
+        conditions.push(gt(similarity, p.minSimilarity));


Apply minSimilarity as an inclusive lower bound

The code documents minSimilarity as a minimum threshold, but the query uses gt(similarity, p.minSimilarity), which excludes rows exactly at the threshold. In practice this makes minSimilarity: 1 return no results even for perfect matches, and generally drops boundary-equal hits that callers would reasonably expect to keep.
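The boundary behavior can be seen without a database. The sketch below models the two predicates in plain TypeScript; passesExclusive and passesInclusive are hypothetical names, and the scores are made-up sample values. In the query itself, the corresponding change would presumably be swapping Drizzle's gt for gte.

```typescript
// Exclusive bound: what gt(similarity, minSimilarity) expresses.
const passesExclusive = (similarity: number, min: number): boolean =>
  similarity > min;

// Inclusive bound: what a "minimum threshold" is usually expected to mean.
const passesInclusive = (similarity: number, min: number): boolean =>
  similarity >= min;

// Sample scores for illustration only; 1 represents a perfect match.
const scores = [1, 0.8, 0.5];

const keptExclusive = scores.filter((s) => passesExclusive(s, 1));
const keptInclusive = scores.filter((s) => passesInclusive(s, 1));
```

With a threshold of 1, the exclusive version keeps nothing while the inclusive version keeps the perfect match.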

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-25T06:27:54Z

+    params.embedding !== undefined &&
+    params.embedding.length !== EMBEDDING_DIMENSIONS
+  ) {


Validate embedding values are finite numbers

The embedding validation only checks array length, so vectors containing NaN/Infinity pass normalization and are sent into the pgvector distance expression. pgvector requires finite elements, so this turns malformed caller input into a database error path (Result failure) instead of deterministic upfront validation, even though this function is intended to normalize and validate search parameters.
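A stricter check could validate element values alongside the length. The following is a sketch under assumptions: validateEmbedding and its "error message or null" return convention are hypothetical, and only the EMBEDDING_DIMENSIONS name and the 1536 value come from the PR.

```typescript
const EMBEDDING_DIMENSIONS = 1536; // dimension stated in the PR description

// Returns an error message for an invalid embedding, or null when the
// embedding is absent or valid. Hypothetical helper for illustration.
function validateEmbedding(embedding: number[] | undefined): string | null {
  if (embedding === undefined) return null;
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    return `embedding must have ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`;
  }
  // NaN and ±Infinity would otherwise surface later as a database error.
  const badIndex = embedding.findIndex((v) => !Number.isFinite(v));
  if (badIndex !== -1) {
    return `embedding[${badIndex}] is not a finite number`;
  }
  return null;
}
```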

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-25T06:27:54Z

+  const minSimilarity =
+    params.minSimilarity === undefined
+      ? undefined
+      : Math.min(Math.max(params.minSimilarity, 0), 1);


Reject NaN minSimilarity values during normalization

minSimilarity is clamped but not validated for finiteness, so an input like Number("abc") becomes NaN and survives normalization. That NaN is then used in the SQL threshold predicate, which produces unintuitive filtering behavior (effectively dropping all finite-similarity rows) instead of treating the input as invalid or defaulting safely.
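The NaN survives because Math.max and Math.min both return NaN when any argument is NaN. A possible fix, sketched below, treats NaN as "no threshold" before clamping; normalizeMinSimilarity is a hypothetical name, and dropping the filter (rather than returning an error) is one of the two options the comment mentions, chosen here only for illustration.

```typescript
// The unguarded clamp from the diff: NaN passes straight through.
const unguarded = (value: number): number => Math.min(Math.max(value, 0), 1);

// Guarded version: NaN is treated as "no threshold supplied".
// ±Infinity needs no special case because clamping maps it to 1 or 0.
function normalizeMinSimilarity(value: number | undefined): number | undefined {
  if (value === undefined || Number.isNaN(value)) return undefined;
  return Math.min(Math.max(value, 0), 1);
}
```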

Useful? React with 👍 / 👎.

bryansayler marked this pull request as ready for review May 25, 2026 06:22

bryansayler merged commit bd41f8f into main May 25, 2026
3 checks passed

bryansayler deleted the claude/fervent-hypatia-O49BE branch May 25, 2026 06:22

chatgpt-codex-connector Bot reviewed May 25, 2026
