events: denormalize kind onto event_tags so tag-CTE can pre-filter by kind by MastaP · Pull Request #24 · unicitynetwork/unicity-relay

MastaP · 2026-05-09T11:22:41Z

Fixes #23.

Why

Production groupchat relay (`sphere-zooid-relay-eu`) is at 99.7% CPU on db.m6g.large because two NIP-29 group-history queries account for ~65% of all DBLoad. The materialized tag CTE filters by tag only — it can't see the `kind IN (...)` predicate.

Event distribution in production:

kind	count	what
9021	90,437	NIP-29 join request
9000	90,024	NIP-29 add member
9	11,365	chat message
other	~3k	group state, etc.

So for a hot group like `general`: ~97k events tagged `h='general'` (95k membership + 1.4k chat). A subscription read for `{kinds: [9,11,12], #h: ['general']}` materializes 97k tag rows, hash-joins, throws away 95% on the kind filter, sorts, takes 1000. ~1.1s per query (verified by EXPLAIN ANALYZE).

50k users × ~2 membership events per join is what tipped the balance. Same query was fine at 45k users.

What

`event_tags` gets a nullable `kind` column.
`saveEventWith` writes kind alongside (event_id, key, value); batch size unchanged at 15000 (60000 params, under 65535 cap).
`buildSelectQuery` pushes `kind IN (...) OR kind IS NULL` into each per-tag CTE branch when `filter.Kinds` is set. The `IS NULL` branch keeps reads correct on un-backfilled rows during rollout.
`Init` is split into pre-migrate and post-migrate phases. The new `(key, value, kind, event_id)` covering index lives in post-migrate so it runs only after migration 002 adds the column on existing schemas.
Migration 002: `ALTER TABLE event_tags ADD COLUMN IF NOT EXISTS kind INTEGER`. Instant on PG 11+.

The new index creation and the historical-row backfill are intentionally not in the auto-migration. The runner enforces a 30s per-statement deadline; `CREATE INDEX CONCURRENTLY` cannot run inside a transaction and leaves an INVALID index on timeout that `IF NOT EXISTS` would silently skip on retry. So those run as one-shot ops via the dbops task — see Rollout below.

Once the CTE returns 1.4k rows instead of 97k for hot groups, expected per-query cost drops from ~1.1s to ~10–50ms.

Verification

`go test -timeout 300s ./...` passes; `gofmt -l -s .` clean. Four new integration tests:

`TestEventStore_SaveEvent_PopulatesTagKind` — kind column populated on insert.
`TestEventStore_QueryEvents_TagAndKindCTE` — 3 chat + 5 membership events tagged with the same group; asserts the kind filter excludes membership at the CTE.
`TestEventStore_Init_UpgradeFromPre002` — the test that earns its keep. Drops the kind column + index + migration marker on a freshly-initialized schema, then re-runs `Init()`. Asserts column and post-migrate index both come back. Fresh-schema tests would not catch the original ordering bug (CREATE TABLE rebuilds the column), so this simulates the production upgrade path explicitly.
`TestEventStore_QueryEvents_TagKindNullCompat` — clears `kind` to NULL on a saved event's tag rows and asserts the `OR IS NULL` branch keeps it visible.

Rollout (production schema with 438k existing tag rows)

Run these from the dbops task before merging this PR, so the column and index exist before the new code path lands:

```sql
-- 1. Add column (instant, metadata-only on PG 11+).
ALTER TABLE sphere_relay__event_tags ADD COLUMN IF NOT EXISTS kind INTEGER;

-- 2. Backfill historical rows. Idempotent — re-runnable if interrupted.
UPDATE sphere_relay__event_tags et
SET kind = e.kind
FROM sphere_relay__events e
WHERE et.event_id = e.id AND et.kind IS NULL;

-- 3. Build the new covering index (does NOT block writes).
CREATE INDEX CONCURRENTLY IF NOT EXISTS sphere_relay__idx_event_tags_key_value_kind_event_id
ON sphere_relay__event_tags(key, value, kind, event_id);
```

After merge, `Init()` will see the column and index already exist (IF NOT EXISTS is idempotent), the migration marker will be set on first run, and the new CTE shape goes live with full plan support.

Follow-ups (not in this PR)

Drop the `OR kind IS NULL` branch once production backfill is verified complete and any remaining NULL rows are reconciled.
Consider whether the eventual storage of all 90k+ membership events as durable history is the right design vs. a snapshot/projection. That's a NIP-29 design conversation, not an immediate fix.

… kind For NIP-29 group reads (`{kinds: [9,11,12], #h: ['<group>']}`), the materialized tag CTE filters by tag only — it can't see the kind predicate. Hot groups whose tag rows are dominated by membership events (kinds 9000/9021) feed ~95k useless rows into the hash-join just to throw them away on the kind filter, dominating CPU on the read path. Production observation that motivated this: 50k users × ~2 membership events per join (9021 + 9000) makes the `h='general'` tag set ~98k rows, of which only ~1.4k are actual chat. Each subscription read pays for the full 98k. PI top SQL showed two query shapes accounting for ~65% of all DBLoad on db.m6g.large pegged at 99.7% CPU. Issue #23. Changes: - event_tags gets a nullable `kind` column. Fresh schemas have it via CREATE TABLE; existing schemas pick it up via migration 002. - saveEventWith writes the kind alongside (event_id, key, value). Batch size stays at 15000 (60000 params, under 65535 cap) to keep the kind-39002 SERIALIZABLE critical-section economy intact. - buildSelectQuery pushes `kind IN (...) OR kind IS NULL` into each per-tag CTE branch when filter.Kinds is set. The IS NULL branch keeps reads correct on un-backfilled rows during the production rollout window. Drop it in a follow-up once backfill is verified. - Init splits the schema-bootstrap into pre-migrate and post-migrate phases. The covering index `(key, value, kind, event_id)` lives in post-migrate so it runs only after migration 002 has added the column on existing schemas — avoiding the "column does not exist" failure on the production upgrade path. Migration scope is intentionally minimal (ADD COLUMN only). The backfill UPDATE and CREATE INDEX CONCURRENTLY are documented as manual one-shot ops because the auto-migration runner has a 30s per-statement deadline and CONCURRENTLY can't survive that without leaving an INVALID index that IF NOT EXISTS would silently skip. Tests: - TestEventStore_SaveEvent_PopulatesTagKind — verifies kind populates on new inserts. - TestEventStore_QueryEvents_TagAndKindCTE — saves 3 chat + 5 membership events tagged with the same group; asserts the kind filter excludes membership at the CTE. - TestEventStore_Init_UpgradeFromPre002 — drops the kind column and the migration marker on a freshly-initialized schema, then re-runs Init() and asserts the column and post-migrate index both come back. This is the test that catches the pre-migrate/post-migrate ordering bug — fresh-schema tests can't, because CREATE TABLE on a fresh schema makes the column from the start. - TestEventStore_QueryEvents_TagKindNullCompat — clears kind to NULL on a saved event's tag rows and asserts the OR IS NULL branch keeps it visible. Fixes #23.

gemini-code-assist

Code Review

This pull request optimizes tag-based queries by denormalizing the kind column into the event_tags table and introducing a new covering index. The changes include updates to the schema initialization, event saving logic, and the query builder to improve performance for groups with many membership events. Feedback focuses on improving the database migration safety by using concurrent index creation, removing a now-redundant index, refactoring duplicated kind-conversion logic, and ensuring errors from the SQL builder are properly handled.

Copilot

Pull request overview

This PR denormalizes events.kind onto event_tags so the tag-filter MATERIALIZED CTE can pre-filter by kind, significantly reducing work for hot NIP-29 group-history queries dominated by membership-event tag rows (issue #23).

Changes:

Add nullable event_tags.kind via migration 002 and populate it on the write path (saveEventWith).
Push kind IN (...) OR kind IS NULL into each per-tag CTE branch when filter.Kinds is set, preserving correctness during backfill rollout.
Split EventStore.Init() into pre-/post-migrate phases and add integration tests covering insert population, upgrade ordering, and NULL-compat behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`zooid/migrations/002_event_tags_kind.sql`	Adds nullable `kind` column to `__event_tags` and documents rollout considerations.
`zooid/events.go`	Writes `kind` into `event_tags`, pushes kind predicate into tag CTE branches, and adds a post-migrate covering index creation step.
`zooid/events_test.go`	Adds integration tests for tag-kind population, kind/NULL compatibility, and pre-002 upgrade path behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- gemini: hoist `kindInts` conversion out of the CTE block to function scope and reuse it in the outer events filter — was duplicated. - gemini: error-check `subQ.ToSql()` instead of dropping the error. Build-state failures here are programmer errors (malformed squirrel.Select), so panic loudly rather than silently appending an empty CTE part. - Copilot: strengthen TestEventStore_QueryEvents_TagAndKindCTE to assert on the generated SQL shape — the result-level assertion alone would still pass with the kind pushdown removed (the outer events.kind filter does the real work). Now we slice out the CTE text and require it to contain a `kind` predicate, so the optimization itself is guarded against regression. - Copilot: update migration 002 comment to reflect what actually happens — the post-migrate `CREATE INDEX IF NOT EXISTS` in Init() is a no-op on production after the documented dbops one-shot CONCURRENTLY create has run, fast on fresh schemas. The migration file no longer claims the index is "intentionally not applied" by any startup path. Not taken: gemini's suggestion to drop the now-prefix-redundant `(key, value, event_id)` index. The new `(key, value, kind, event_id)` covers it via prefix matching, so dropping is safe in principle, but removing a hot-path index in the same change as adding the new one mixes two reversibility profiles. Will do as a follow-up after the new index is verified in production.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

- Copilot: stop running CREATE INDEX in Init() — move it into migration 002. Migration runs once per schema, with the runner's 30s per-statement deadline acting as a fail-fast on any production schema where the index wasn't pre-built via the dbops CONCURRENTLY step. With the prep done, IF NOT EXISTS no-ops the migration's CREATE INDEX. Without prep, the migration fails loudly and the relay refuses to start with an actionable error. The pre-/post-migrate split in Init() goes away with this — there's nothing post-migrate left to run. - Copilot: stop panicking on `subQ.ToSql()` failure in `buildSelectQuery`. The function now returns `(squirrel.SelectBuilder, error)`. Both callers (`queryEventsWith`, `CountEvents`) handle the error and return / log instead of crashing the goroutine. - Copilot: replace the brittle exact-substring slicing in TestEventStore_QueryEvents_TagAndKindCTE with a regex `(?s)MATERIALIZED.*?\\bkind\\b.*?\\)\\s+SELECT\\b`. Both lazy quantifiers anchor to the nearest CTE-close-and-SELECT, so an outer-only `e.kind` filter doesn't satisfy it (there is no `) SELECT` after the outer query's kind). Tolerant of squirrel whitespace/formatting drift.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

zooid/events.go:299

buildSelectQuery now returns (squirrel.SelectBuilder, error), but zooid/query_performance_test.go still calls it as if it returned only a builder (e.g., in explainAnalyze). This will break compilation/tests; update that helper (and any other remaining call sites) to handle the new error return.

func (events *EventStore) buildSelectQuery(filter nostr.Filter) (squirrel.SelectBuilder, error) {
	eventsTable := events.Schema.Prefix("events")
	eventTagsTable := events.Schema.Prefix("event_tags")

	// Collect valid single-letter tag filters and sort for deterministic SQL.

- gemini: hoist `kindInts` conversion out of the CTE block to function scope and reuse it in the outer events filter — was duplicated. - gemini: error-check `subQ.ToSql()` instead of dropping the error. Build-state failures here are programmer errors (malformed squirrel.Select), so panic loudly rather than silently appending an empty CTE part. - Copilot: strengthen TestEventStore_QueryEvents_TagAndKindCTE to assert on the generated SQL shape — the result-level assertion alone would still pass with the kind pushdown removed (the outer events.kind filter does the real work). Now we slice out the CTE text and require it to contain a `kind` predicate, so the optimization itself is guarded against regression. - Copilot: update migration 002 comment to reflect what actually happens — the post-migrate `CREATE INDEX IF NOT EXISTS` in Init() is a no-op on production after the documented dbops one-shot CONCURRENTLY create has run, fast on fresh schemas. The migration file no longer claims the index is "intentionally not applied" by any startup path. Not taken: gemini's suggestion to drop the now-prefix-redundant `(key, value, event_id)` index. The new `(key, value, kind, event_id)` covers it via prefix matching, so dropping is safe in principle, but removing a hot-path index in the same change as adding the new one mixes two reversibility profiles. Will do as a follow-up after the new index is verified in production.

MastaP requested a review from Copilot May 9, 2026 11:23

Copilot started reviewing on behalf of MastaP May 9, 2026 11:23 View session

gemini-code-assist Bot reviewed May 9, 2026

View reviewed changes

Comment thread zooid/events.go Outdated

Comment thread zooid/events.go Outdated

Comment thread zooid/events.go Outdated

Copilot AI reviewed May 9, 2026

View reviewed changes

Comment thread zooid/events.go Outdated

Comment thread zooid/events_test.go

Comment thread zooid/migrations/002_event_tags_kind.sql Outdated

MastaP requested a review from Copilot May 9, 2026 11:33

Copilot started reviewing on behalf of MastaP May 9, 2026 11:34 View session

Copilot AI reviewed May 9, 2026

View reviewed changes

Comment thread zooid/events.go Outdated

Comment thread zooid/events.go Outdated

Comment thread zooid/events_test.go Outdated

MastaP requested a review from Copilot May 9, 2026 12:00

Copilot started reviewing on behalf of MastaP May 9, 2026 12:00 View session

Copilot AI reviewed May 9, 2026

View reviewed changes

MastaP merged commit 0ac08db into master May 9, 2026
5 checks passed

MastaP deleted the fix/event-tags-kind-denorm branch May 9, 2026 12:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

events: denormalize kind onto event_tags so tag-CTE can pre-filter by kind#24

events: denormalize kind onto event_tags so tag-CTE can pre-filter by kind#24
MastaP merged 3 commits into
masterfrom
fix/event-tags-kind-denorm

MastaP commented May 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MastaP commented May 9, 2026

Why

What

Verification

Rollout (production schema with 438k existing tag rows)

Follow-ups (not in this PR)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants