diff --git a/docs/feature_units/completed/FU-113/FU-113_spec.md b/docs/feature_units/completed/FU-113/FU-113_spec.md new file mode 100644 index 00000000..e5b60921 --- /dev/null +++ b/docs/feature_units/completed/FU-113/FU-113_spec.md @@ -0,0 +1,247 @@ +# Feature Unit: FU-113 Entity Extensions + +**Status:** Draft +**Priority:** P0 (Critical) +**Risk Level:** Medium (schema + security) +**Target Release:** v0.2.0 +**Owner:** Worker Agent (Cursor) +**Reviewers:** Platform Engineering, Data Architecture +**Created:** 2025-12-18 +**Last Updated:** 2025-12-18 + +--- + +## Overview + +**Brief Description:** +Add user isolation and merge tracking to the `entities` table so every canonical entity is scoped to a single user, carries explicit provenance, and can be deterministically excluded once merged. This FU introduces `user_id`, `merged_to_entity_id`, `merged_at`, supporting indexes, and user-aware RLS policies. Application services are updated to stamp `user_id`, reject cross-user merges, and surface merged-entity state everywhere entities are resolved or queried. + +**User Value:** +- Guarantees privacy-first behavior: entities created from one user's data are invisible to others. +- Powers deterministic merge/correction flows by tracking which entities were merged and when. +- Unblocks MCP tooling (FU-116, FU-126) that relies on user-scoped entity graphs and audit-friendly merge trails. + +**Defensible Differentiation:** +Validates Neotoma's privacy-first + deterministic differentiators. User isolation is enforced at the schema/RLS layer, and merge tracking ensures deterministic provenance during manual correction flows—capabilities that hosted LLM memory providers typically cannot ship because of multi-tenant inference architectures. + +**Technical Approach:** +- Schema migration extends `entities` with `user_id`, `metadata`, `first_seen_at`, `last_seen_at`, `merged_to_entity_id`, `merged_at`. +- Create composite indexes for user + entity type/name lookups and merged-entity filtering. +- Replace permissive `public read` policy with `Users read own entities` + keep service-role policy. +- Application services (`entity_resolution`, `observation_ingestion`, MCP HTTP actions) require a `userId` parameter and stamp it on every insert/update. +- Introduce repo-wide `DEFAULT_USER_ID` constant (single-user placeholder) so current flows keep functioning until auth plumbs real IDs. +- Update tests + ingestion pipelines to assert user-aware behavior and merged-entity invariants. + +--- + +## Requirements + +### Functional Requirements + +1. **Schema Extension:** `entities` gains the new columns described in Section 2.4 with NOT NULL/default constraints where applicable. +2. **RLS Enforcement:** Replace permissive policies with user-isolated RLS (select/update/delete guarded by `user_id = auth.uid()`), while retaining service-role full access. +3. **User-Aware Resolution:** `resolveEntity` must accept a `userId`, look up entities scoped to that user, stamp `user_id` on insert, and refresh `last_seen_at` on reuse. +4. **Merge Tracking:** `merged_to_entity_id` + `merged_at` are populated when merge logic runs. Downstream queries (list/search) must filter out merged entities by default. +5. **Cross-FU Contracts:** Provide deterministic base for FU-116 (Entity Merges table) and FU-126 (merge_entities MCP tool) by exposing merged state and user-scoped IDs via Supabase + MCP endpoints. + +### Non-Functional Requirements + +1. **Performance:** Entity resolution lookups (by ID/type/name) must remain <15 ms P95 against Supabase (indexes required). +2. **Determinism:** Given `(entity_type, canonical_name, user_id)` the same `entity_id` must be returned on every run; merges are deterministic (idempotent updates). +3. **Consistency:** Strong consistency for entity rows—writes happen synchronously inside ingestion pipeline transactions. +4. **Accessibility / i18n:** No UI changes; ensure API messages remain ASCII + i18n neutral. +5. **Security:** RLS policies must not expose other users' data. Service role is the only role allowed to bypass `user_id` filters. + +### Invariants (MUST / MUST NOT) + +**MUST:** +- Stamp `user_id` on every insert/update into `entities`, `relationships`, `entity_merges`, and any dependent table that references entities. +- Keep `merged_to_entity_id` NULL until a merge happens; once populated it never reverts. +- Update `last_seen_at` whenever an entity is resolved for a new observation. +- Filter merged entities out of list/search endpoints unless caller explicitly requests merged rows. +- Maintain deterministic hash-based IDs (still derived from entity type + normalized value). + +**MUST NOT:** +- Allow cross-user merges or entity resolution (reject if `user_id` mismatch). +- Permit anonymous/public RLS policies on `entities`. +- Introduce nondeterministic merge IDs or timestamps (all server-generated). +- Mutate `canonical_name` post-creation (immutability preserved). + +--- + +## Affected Subsystems + +**Primary Subsystems:** +- `schema`: `entities` table, indexes, RLS policies. +- `ingestion/entity_resolution`: Deterministic user-aware ID generation + stamping. +- `observation_architecture`: Observation creation + reducer inputs reference user-scoped entities. +- `server` / MCP HTTP actions: entity queries/search/graph responses respect merge + user scope. + +**Dependencies:** +- Blocks `FU-116` (Entity Merges table requires merged-to metadata). +- Blocks `FU-126` (merge_entities MCP tool expects user-scoped entities). +- Relies on `FU-110` base schema migration being applied first. + +**Documentation to Load:** +- `docs/subsystems/schema.md` +- `docs/subsystems/entity_merge.md` +- `docs/subsystems/ingestion/ingestion.md` +- `docs/foundation/entity_resolution.md` + +--- + +## Schema Changes + +**Tables Affected:** + +```sql +ALTER TABLE entities + ADD COLUMN IF NOT EXISTS user_id UUID NOT NULL DEFAULT '00000000-0000-0000-0000-000000000000', + ADD COLUMN IF NOT EXISTS metadata JSONB NOT NULL DEFAULT '{}', + ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + ADD COLUMN IF NOT EXISTS merged_to_entity_id TEXT REFERENCES entities(id), + ADD COLUMN IF NOT EXISTS merged_at TIMESTAMPTZ; + +UPDATE entities +SET user_id = '00000000-0000-0000-0000-000000000000' +WHERE user_id IS NULL; +ALTER TABLE entities ALTER COLUMN user_id DROP DEFAULT; +``` + +**Indexes:** + +```sql +CREATE INDEX IF NOT EXISTS idx_entities_user ON entities(user_id); +CREATE INDEX IF NOT EXISTS idx_entities_user_type ON entities(user_id, entity_type); +CREATE INDEX IF NOT EXISTS idx_entities_user_type_name + ON entities(user_id, entity_type, canonical_name); +CREATE INDEX IF NOT EXISTS idx_entities_merged + ON entities(user_id, merged_to_entity_id) + WHERE merged_to_entity_id IS NOT NULL; +``` + +**RLS Policies:** + +```sql +ALTER TABLE entities ENABLE ROW LEVEL SECURITY; +DROP POLICY IF EXISTS "public read - entities" ON entities; +CREATE POLICY "Users read own entities" ON entities + FOR SELECT USING (user_id = auth.uid()); +CREATE POLICY "Users mutate own entities" ON entities + FOR INSERT WITH CHECK (user_id = auth.uid()) + TO authenticated; +CREATE POLICY "Service role full access - entities" ON entities + FOR ALL TO service_role USING (true) WITH CHECK (true); +``` + +**Migration Required:** Yes — `supabase/migrations/20251218100000_fu_113_entity_extensions.sql`. + +--- + +## API / MCP Changes + +- `resolveEntity(entityType, rawValue, userId)` signature change; all call sites (ingestion pipeline, /store_record MCP action, observation ingestion) pass the authenticated `userId` (falls back to default single-user ID until auth wiring lands). +- Entity search endpoints (`retrieve_entities`, `get_entity_by_identifier`, `get_related_entities`) filter out merged entities by default and only operate on caller's `user_id`. +- Relationship + provenance endpoints maintain behavior but rely on `user_id` derived from request context. +- No new MCP actions introduced; schema change is prerequisite for `merge_entities()` (FU-126) but that action is not part of this FU. + +_API Contract snippet (resolveEntity):_ + +```typescript +export async function resolveEntity( + entityType: string, + rawValue: string, + userId: string, +): Promise; +``` + +Errors thrown: `ENTITY_ACCESS_DENIED`, `ENTITY_ALREADY_MERGED`, `MERGE_TARGET_ALREADY_MERGED` (see entity_merge spec). + +--- + +## UI Changes + +None. v0.2.0 is still MCP-only; CLI/agent responses inherit new merge flags through API responses (structured JSON only). + +--- + +## State Machines + +No new state machines introduced. Merge lifecycle documented in `docs/subsystems/entity_merge.md` (reuse existing diagram). + +--- + +## Observability + +**Metrics:** +- `entity.resolve_total` (counter, labels: `result=created|reused`, `entity_type`). +- `entity.merge_redirects_total` (counter, increments when merged entity is requested and redirected). +- `entity.merged_active_total` (gauge, tracks active merged entities per user). + +**Logs:** +- `info`: "Entity resolved" (fields: `entity_id`, `user_id`, `result`). +- `warn`: "Merge target rejected" (fields: `from_entity_id`, `to_entity_id`, `user_id`). + +**Events:** +- `entity.merged` domain event once merge pipeline runs (payload: `from`, `to`, `user_id`, `observations_rewritten`). + +**Traces:** +- Span `resolve_entity` covering normalization + lookup/inserts; attributes include `user_id`, `entity_type`, `result`. + +--- + +## Testing Strategy + +**Unit Tests:** +- `src/services/entity_resolution.test.ts`: verifies `resolveEntity` uses `userId`, stamps `user_id`, updates `last_seen_at`, and prevents cross-user reuse. +- `src/services/observation_ingestion.test.ts`: ensures observation creation passes through `userId` and ignores merged entities. +- `src/services/payload_compiler.test.ts`: asserts compiled payloads propagate `userId` into downstream observation creation. + +**Integration Tests:** +- `tests/actions.integration.test.ts`: `store_record` flow should create entities + observations with `user_id` set, `idx_entities_user_type_name` leveraged, and merged entities excluded from default queries. +- `tests/graph_builder` suite: confirm orphan-entity detector respects `user_id` (fixtures updated). + +**E2E Tests (Playwright):** +- `tests/e2e/entity_merge.spec.ts` (new) or existing ingestion e2e: ingest duplicate record, verify MCP `retrieve_entities` hides merged entity after manual merge (simulate by updating `merged_to_entity_id`). + +**Fixtures:** +- Update `tests/__fixtures__/entities/*.json` if applicable to include `user_id`. +- Add sample merged-entity fixture for future MCP merge tests. + +**Expected Coverage:** >85% lines/branches for touched services; 100% coverage for new branches inside `resolveEntity`. + +--- + +## Error Scenarios + +| Scenario | Error Code | Message | Recovery | +| --- | --- | --- | --- | +| Missing user context | `ENTITY_ACCESS_DENIED` | "User ID required for entity resolution" | Provide authenticated user_id in MCP call | +| Entity already merged | `ENTITY_ALREADY_MERGED` | "Entity {id} merged to {target}" | Redirect to `merged_to_entity_id`, ask client to retry | +| Merge target merged | `MERGE_TARGET_ALREADY_MERGED` | "Cannot merge to inactive entity" | Choose active entity as target | +| RLS violation | `RLS_VIOLATION` | "Entity does not belong to caller" | Fix auth wiring / pass correct user_id | + +--- + +## Rollout and Deployment + +**Feature Flags:** None. Schema + application updates deploy together. + +**Rollback Plan:** +1. Revert application changes (restore old `resolveEntity`, constants). +2. Rollback migration (drop added columns + indexes) only if absolutely necessary; otherwise keep schema and disable RLS policy via Supabase. +3. Verify ingestion + MCP actions still function (run `npm run test` + `npm run test:integration`). + +**Monitoring:** +- Track Supabase errors involving `entities` table post-deploy. +- Watch `entity.resolve_total{result="created"}` ratio; spikes may signal dedup regressions. +- Alert on `entity.merged_active_total` if growth deviates (>10% of total entities => indicates duplicates or failed merges). + +--- + +## Notes + +- `DEFAULT_USER_ID` is a temporary shim. Once auth is wired, plumbing real `userId` through MCP actions is mandatory; this FU keeps that change localized. +- Merge behavior (FU-116/FU-126) depends on the merged-to metadata added here; do not defer `merged_to_entity_id` even if merge tooling ships later. +- Documentation references (`schema.md`, `entity_merge.md`, `ingestion.md`) must stay synchronized with the final schema + API behavior delivered by this FU. diff --git a/docs/feature_units/completed/FU-113/FU-FU-113_spec.md b/docs/feature_units/completed/FU-113/FU-FU-113_spec.md new file mode 100644 index 00000000..4839d776 --- /dev/null +++ b/docs/feature_units/completed/FU-113/FU-FU-113_spec.md @@ -0,0 +1,5 @@ + + +# Feature Unit: FU-113 Entity Extensions (Alias) + +The canonical specification for FU-113 is maintained in [`FU-113_spec.md`](./FU-113_spec.md). Automation that expects `FU-FU-113_spec.md` should follow that link to read the full document. diff --git a/docs/feature_units/completed/FU-113/manifest.yaml b/docs/feature_units/completed/FU-113/manifest.yaml new file mode 100644 index 00000000..879c2c69 --- /dev/null +++ b/docs/feature_units/completed/FU-113/manifest.yaml @@ -0,0 +1,143 @@ +feature_id: "fu_113" +version: "1.0.0" +status: "draft" +priority: "p0" +risk_level: "medium" + +metadata: + title: "Entity Extensions" + description: "Add user_id + merge tracking to entities, tighten RLS, and update ingestion services to be user-aware" + owner: "engineering@neotoma.com" + reviewers: + - "platform@neotoma.com" + created_at: "2025-12-18" + target_release: "v0.2.0" + +tags: + - "schema" + - "security" + - "ingestion" + - "merge" + +subsystems: + - name: "schema" + changes: "Extend entities with user_id/metadata/merged_to columns, add indexes, tighten RLS" + - name: "ingestion" + changes: "Entity resolution + observation ingestion now require user IDs and respect merged entities" + - name: "mcp_actions" + changes: "store_record/retrieve_entities endpoints filter by user and hide merged entities" + +schema_changes: + - table: "entities" + operation: "alter_table" + columns: + - name: "user_id" + type: "UUID" + nullable: false + - name: "metadata" + type: "JSONB" + nullable: false + default_value: "'{}'" + - name: "first_seen_at" + type: "TIMESTAMPTZ" + nullable: false + default_value: "NOW()" + - name: "last_seen_at" + type: "TIMESTAMPTZ" + nullable: false + default_value: "NOW()" + - name: "merged_to_entity_id" + type: "TEXT" + nullable: true + - name: "merged_at" + type: "TIMESTAMPTZ" + nullable: true + - table: "entities" + operation: "add_index" + index_name: "idx_entities_user" + columns: ["user_id"] + - table: "entities" + operation: "add_index" + index_name: "idx_entities_user_type" + columns: ["user_id", "entity_type"] + - table: "entities" + operation: "add_index" + index_name: "idx_entities_user_type_name" + columns: ["user_id", "entity_type", "canonical_name"] + - table: "entities" + operation: "add_index" + index_name: "idx_entities_merged" + columns: ["user_id", "merged_to_entity_id"] + where_clause: "WHERE merged_to_entity_id IS NOT NULL" + +mcp_actions: + - action: "store_record" + change_type: "modified" + changes: "Entity extraction + observation creation now pass authenticated user_id and respect merged entities" + - action: "retrieve_entities" + change_type: "modified" + changes: "Filters out merged entities and scopes queries to caller user_id" + +observability: + metrics: + - name: "entity.resolve_total" + type: "counter" + labels: ["result", "entity_type"] + description: "Counts entity resolutions by result (created vs reused)" + - name: "entity.merged_active_total" + type: "gauge" + labels: ["user_id"] + description: "Active merged entities per user" + logs: + - level: "info" + event: "entity_resolved" + fields: ["entity_id", "user_id", "result"] + - level: "warn" + event: "entity_merge_redirect" + fields: ["requested_entity", "target_entity", "user_id"] + +risk_mitigation: + - risk: "RLS regression exposes cross-user data" + mitigation: "Add automated test that fetches entities with mismatched user_id and expects zero rows" + - risk: "Existing data missing user_id" + mitigation: "Backfill default single-user ID inside migration before enforcing NOT NULL" + - risk: "Schema migration fails in Supabase" + mitigation: "Run migrations via scripts/run_migrations.js in CI before deploy" + +dependencies: + requires: + - feature_id: "fu_110" + reason: "Base schema migration must be applied" + blocks: + - feature_id: "fu_116" + reason: "Merge audit log depends on merged_to metadata" + - feature_id: "fu_126" + reason: "merge_entities tool requires user-scoped entities" + +testing: + unit_tests: + - name: "entity_resolution" + description: "Verify resolveEntity stamps user_id, updates last_seen_at, and handles merges" + - name: "observation_ingestion" + description: "Ensures observation creation propagates user_id" + integration_tests: + - name: "actions.store_record" + description: "End-to-end ingest ensures entities + observations share same user_id and merged entities excluded" + e2e_tests: + - name: "playwright:entity_merge" + description: "Simulate duplicate entity ingestion and confirm merged entity hidden from queries" + +feature_flags: + enabled: false + +rollout: + strategy: "instant" + monitoring: + key_metrics: + - metric: "entity.resolve_total" + threshold: "<5% increase in created vs baseline" + - metric: "entity.merged_active_total" + threshold: "<10% of total entities" + +notes: | + DEFAULT_USER_ID remains a temporary shim until auth wiring forwards real user IDs. Update docs (schema.md, entity_merge.md, ingestion.md) alongside code. diff --git a/docs/releases/in_progress/v0.2.0/agent_status.json b/docs/releases/in_progress/v0.2.0/agent_status.json index 0c98cd10..65b1229e 100644 --- a/docs/releases/in_progress/v0.2.0/agent_status.json +++ b/docs/releases/in_progress/v0.2.0/agent_status.json @@ -63,26 +63,25 @@ }, { "fu_id": "FU-113", - "worker_agent_id": null, - "status": "pending", - "progress": 0, - "started_at": null, - "last_update": null, - "completed_at": null, - "error": null, + "worker_agent_id": "worker_fu113_cursor", + "status": "completed", + "progress": 1.0, + "started_at": "2025-12-18T13:34:36.884940Z", + "last_update": "2025-12-18T14:02:59.631092Z", + "completed_at": "2025-12-18T13:45:37.935052Z", + "error": "Unit/integration blocked by missing Supabase credentials; E2E blocked by CSV loader still referencing .csv file.", "tests": { "unit": { - "passed": null, - "coverage": null, - "command": null + "passed": false, + "command": "npm run test (fails: Missing Supabase URL / service key in environment)" }, "integration": { - "passed": null, - "command": null + "passed": false, + "command": "npm run test:integration (fails: Missing Supabase URL / service key)" }, "e2e": { - "passed": null, - "command": null + "passed": false, + "command": "npm run test:e2e (fails: Playwright still importing CSV as module; SyntaxError in sets-medium.csv)" } } } diff --git a/frontend/src/sample-data/sample-records.ts b/frontend/src/sample-data/sample-records.ts index 10d7b7b4..5ab09490 100644 --- a/frontend/src/sample-data/sample-records.ts +++ b/frontend/src/sample-data/sample-records.ts @@ -1,6 +1,6 @@ import type { LocalRecord } from '@/store/types'; import { parseCsvRows } from '@/utils/csv'; -import workoutCsvRaw from './sets-medium.csv?raw'; +import { setsMediumCsv } from './sets-medium'; const SAMPLE_SEED_TAG = 'sample_seed'; const WORKOUT_SAMPLE_FILE = 'sets-medium.csv'; @@ -146,11 +146,11 @@ export function buildSampleRecords(): LocalRecord[] { } function buildWorkoutSampleRecords(indexOffset: number): LocalRecord[] { - if (!workoutCsvRaw || !workoutCsvRaw.trim()) { + if (!setsMediumCsv || !setsMediumCsv.trim()) { return []; } - const { rows, headers, truncated } = parseCsvRows(workoutCsvRaw, 5000); + const { rows, headers, truncated } = parseCsvRows(setsMediumCsv, 5000); if (rows.length === 0) { return []; } @@ -274,5 +274,3 @@ function formatIsoDate(iso: string): string { } export const SAMPLE_RECORD_STORAGE_KEY = 'neotoma.sampleSeeded'; - - diff --git a/frontend/src/sample-data/sets-medium.ts b/frontend/src/sample-data/sets-medium.ts new file mode 100644 index 00000000..cf6b1789 --- /dev/null +++ b/frontend/src/sample-data/sets-medium.ts @@ -0,0 +1,41 @@ +// CSV content as TypeScript string constant to avoid import issues in Playwright/Vite +export const setsMediumCsv = `Date,Exercise,Repetitions,Weight,Difficulty,RPE,Notes,Type,Created at +2024-03-18,Bench press,10,50kg,Moderate,7,,Warmup,2024-03-18T08:45:00Z +2024-03-18,Bench press,8,55kg,Near failure,9,,Working set,2024-03-18T08:55:00Z +2024-03-18,Sit-ups,61,Body,Failure,10,Arms swing,Core,2024-03-18T09:10:00Z +2024-03-20,Squats,10,60kg,Hard,8,,Warmup,2024-03-20T09:20:00Z +2024-03-20,Squats,8,65kg,Hard,9,,Working set,2024-03-20T09:30:00Z +2024-03-20,Chin-ups,13,Body,Failure,10,,Pull,2024-03-20T09:45:00Z +2024-03-23,Shoulder press,7,25kg,Failure,9,,Push,2024-03-23T11:32:00Z +2024-03-23,Abductor machine,20,55kg,Hard,8,,Accessory,2024-03-23T11:43:00Z +2024-03-26,Squats,10,50kg,Moderate,7,,Warmup,2024-03-26T08:12:00Z +2024-03-26,Squats,8,60kg,Hard,9,,Working set,2024-03-26T08:30:00Z +2024-03-26,Cable row,12,55kg,Failure,9,,Pull,2024-03-26T08:42:00Z +2024-03-26,Bench press decline,11,40kg,Near failure,8,,Push,2024-03-26T09:01:00Z +2024-04-01,Bench press,10,50kg,Moderate,7,,Warmup,2024-04-01T08:49:00Z +2024-04-01,Bench press,8,57.5kg,Hard,9,Palm pain (right),Push,2024-04-01T09:04:00Z +2024-04-01,Tricep pulldowns,18,25kg,Failure,10,,Accessory,2024-04-01T09:22:00Z +2024-04-02,Squats,10,60kg,Hard,8,,Working set,2024-04-02T08:33:00Z +2024-04-02,Chin-ups,9,Body,Failure,10,,Pull,2024-04-02T08:54:00Z +2024-04-02,Shoulder press,8,20kg,Failure,9,,Push,2024-04-02T09:04:00Z +2024-04-04,Bench press,14,50kg,Hard,8,,Push,2024-04-04T08:35:00Z +2024-04-04,Abductor machine,16,65kg,Failure,10,,Accessory,2024-04-04T09:17:00Z +2024-04-07,Clean,10,50kg,Hard,8,,Power,2024-04-07T10:29:00Z +2024-04-07,Deadlift,10,70kg,Hard,9,,Pull,2024-04-07T11:26:00Z +2024-04-13,Bench press,9,55kg,Hard,8,,Push,2024-04-13T10:25:00Z +2024-04-13,Deadlift,10,50kg,Moderate,7,,Warmup,2024-04-13T09:22:00Z +2024-04-16,Deadlift,10,60kg,Hard,8,,Working set,2024-04-16T10:23:00Z +2024-04-18,Pull-ups,7,Body,Failure,10,,Pull,2024-04-18T10:51:00Z +2024-04-20,Deadlift,10,80kg,Hard,9,,Working set,2024-04-20T09:56:00Z +2024-04-23,Squats,8,65kg,Hard,9,,Working set,2024-04-23T08:45:00Z +2024-04-25,Deadlift,10,70kg,Moderate,8,,Warmup,2024-04-25T11:11:00Z +2024-04-29,Clean,10,95lb,Hard,8,,Power,2024-04-30T01:45:00Z +2024-05-01,Squats,10,135lb,Hard,8,Left groin tension,Working set,2024-05-02T01:35:00Z +2024-05-06,Chin-ups,8,Body,Failure,10,,Pull,2024-05-07T12:43:00Z +2024-05-14,Deadlift,10,70kg,Hard,8,,Working set,2024-05-14T08:27:00Z +2024-05-21,Bench press decline,15,60kg,Hard,9,,Push,2024-05-21T09:28:00Z +2024-05-23,Squats,8,70kg,Hard,9,,Working set,2024-05-23T09:14:00Z +2024-05-25,Deadlift,6,100kg,Hard,9,Grip slippage,Working set,2024-05-25T11:26:00Z +2024-05-30,Bench press,10,60kg,Hard,8,,Push,2024-05-30T08:06:00Z +2024-06-01,Squats,6,70kg,Near failure,9,,Working set,2024-06-01T10:39:00Z +`; diff --git a/src/actions.ts b/src/actions.ts index 93267794..14d40733 100644 --- a/src/actions.ts +++ b/src/actions.ts @@ -7,6 +7,7 @@ import { z } from "zod"; import { randomUUID } from "node:crypto"; import { supabase } from "./db.js"; import { config } from "./config.js"; +import { DEFAULT_USER_ID } from "./constants.js"; import { listCanonicalRecordTypes, normalizeRecordType, @@ -843,6 +844,7 @@ app.post("/store_record", async (req, res) => { embedding: providedEmbedding, } = parsed.data; const normalizedType = normalizeRecordType(type).type; + const userId = DEFAULT_USER_ID; // Generate embedding if not provided and OpenAI is configured // Filter out empty arrays - they're invalid for PostgreSQL vector type @@ -903,14 +905,20 @@ app.post("/store_record", async (req, res) => { // Extract and persist entities (FU-101) try { - const { extractEntities, resolveEntity } = await import('./services/entity_resolution.js'); + const { extractEntities, resolveEntity } = await import( + "./services/entity_resolution.js" + ); const entities = extractEntities(properties, normalizedType); const resolvedEntities = []; - + for (const entity of entities) { - const resolved = await resolveEntity(entity.entity_type, entity.raw_value); + const resolved = await resolveEntity( + entity.entity_type, + entity.raw_value, + userId, + ); resolvedEntities.push(resolved); - + // Create record-entity edge await supabase.from("record_entity_edges").insert({ record_id: data.id, @@ -946,8 +954,10 @@ app.post("/store_record", async (req, res) => { // Create observations (FU-058) try { - const { createObservationsFromRecord } = await import('./services/observation_ingestion.js'); - await createObservationsFromRecord(data as any, "00000000-0000-0000-0000-000000000000"); + const { createObservationsFromRecord } = await import( + "./services/observation_ingestion.js" + ); + await createObservationsFromRecord(data as any, userId); } catch (observationError) { // Log observation creation error but don't fail the request logError("ObservationCreationError:store_record", req, observationError); @@ -1851,7 +1861,7 @@ app.post("/create_relationship", async (req, res) => { metadata, } = parsed.data; - const userId = "00000000-0000-0000-0000-000000000000"; // v0.1.0 single-user + const userId = DEFAULT_USER_ID; const { relationshipsService } = await import("./services/relationships.js"); diff --git a/src/constants.ts b/src/constants.ts new file mode 100644 index 00000000..75794660 --- /dev/null +++ b/src/constants.ts @@ -0,0 +1,2 @@ +export const DEFAULT_USER_ID = + process.env.DEFAULT_USER_ID || "00000000-0000-0000-0000-000000000000"; diff --git a/src/server.ts b/src/server.ts index f2b09fc7..153f70ce 100644 --- a/src/server.ts +++ b/src/server.ts @@ -12,6 +12,7 @@ import { z } from "zod"; import { generateEmbedding, getRecordText } from "./embeddings.js"; import { generateRecordSummary } from "./services/summary.js"; import { config } from "./config.js"; +import { DEFAULT_USER_ID } from "./constants.js"; import { normalizeRecordType } from "./config/record_types.js"; import { createRecordFromUploadedFile } from "./services/file_analysis.js"; import { randomUUID } from "node:crypto"; @@ -1854,11 +1855,14 @@ export class NeotomaServer { include_snapshots: z.boolean().default(true), }); const parsed = schema.parse(args ?? {}); + const userId = DEFAULT_USER_ID; // Get total count let countQuery = supabase .from("entities") - .select("*", { count: "exact", head: true }); + .select("*", { count: "exact", head: true }) + .eq("user_id", userId) + .is("merged_to_entity_id", null); if (parsed.entity_type) { countQuery = countQuery.eq("entity_type", parsed.entity_type); @@ -1873,7 +1877,7 @@ export class NeotomaServer { } // Get entities - const entities = await listEntities({ + const entities = await listEntities(userId, { entity_type: parsed.entity_type, limit: parsed.limit, offset: parsed.offset, @@ -1886,7 +1890,8 @@ export class NeotomaServer { const { data: snapshots, error: snapError } = await supabase .from("entity_snapshots") .select("*") - .in("entity_id", entityIds); + .in("entity_id", entityIds) + .eq("user_id", userId); if (!snapError && snapshots) { const snapshotMap = new Map(snapshots.map((s) => [s.entity_id, s])); @@ -1991,6 +1996,7 @@ export class NeotomaServer { entity_type: z.string().optional(), }); const parsed = schema.parse(args ?? {}); + const userId = DEFAULT_USER_ID; // Normalize the identifier const normalized = parsed.entity_type @@ -2001,6 +2007,8 @@ export class NeotomaServer { let query = supabase .from("entities") .select("*") + .eq("user_id", userId) + .is("merged_to_entity_id", null) .or(`canonical_name.ilike.%${normalized}%,aliases.cs.["${normalized}"]`); if (parsed.entity_type) { @@ -2026,6 +2034,8 @@ export class NeotomaServer { .from("entities") .select("*") .eq("id", possibleId) + .eq("user_id", userId) + .is("merged_to_entity_id", null) .single(); if (!idError && entityById) { @@ -2043,7 +2053,8 @@ export class NeotomaServer { const { data: snapshots, error: snapError } = await supabase .from("entity_snapshots") .select("*") - .in("entity_id", entityIds); + .in("entity_id", entityIds) + .eq("user_id", userId); if (!snapError && snapshots) { const snapshotMap = new Map(snapshots.map((s) => [s.entity_id, s])); @@ -2071,6 +2082,7 @@ export class NeotomaServer { include_entities: z.boolean().default(true), }); const parsed = schema.parse(args ?? {}); + const userId = DEFAULT_USER_ID; const visited = new Set([parsed.entity_id]); const relatedEntityIds = new Set(); @@ -2087,7 +2099,8 @@ export class NeotomaServer { let outboundQuery = supabase .from("relationships") .select("*") - .eq("source_entity_id", entityId); + .eq("source_entity_id", entityId) + .eq("user_id", userId); if ( parsed.relationship_types && @@ -2118,7 +2131,8 @@ export class NeotomaServer { let inboundQuery = supabase .from("relationships") .select("*") - .eq("target_entity_id", entityId); + .eq("target_entity_id", entityId) + .eq("user_id", userId); if ( parsed.relationship_types && @@ -2155,7 +2169,9 @@ export class NeotomaServer { const { data: entityData, error: entityError } = await supabase .from("entities") .select("*") - .in("id", Array.from(relatedEntityIds)); + .in("id", Array.from(relatedEntityIds)) + .eq("user_id", userId) + .is("merged_to_entity_id", null); if (!entityError && entityData) { entities = entityData; @@ -2164,7 +2180,8 @@ export class NeotomaServer { const { data: snapshots, error: snapError } = await supabase .from("entity_snapshots") .select("*") - .in("entity_id", Array.from(relatedEntityIds)); + .in("entity_id", Array.from(relatedEntityIds)) + .eq("user_id", userId); if (!snapError && snapshots) { const snapshotMap = new Map(snapshots.map((s) => [s.entity_id, s])); @@ -2200,6 +2217,7 @@ export class NeotomaServer { include_observations: z.boolean().default(false), }); const parsed = schema.parse(args ?? {}); + const userId = DEFAULT_USER_ID; const result: any = { node_id: parsed.node_id, @@ -2212,6 +2230,7 @@ export class NeotomaServer { .from("entities") .select("*") .eq("id", parsed.node_id) + .eq("user_id", userId) .single(); if (entityError || !entity) { @@ -2228,6 +2247,7 @@ export class NeotomaServer { .from("entity_snapshots") .select("*") .eq("entity_id", parsed.node_id) + .eq("user_id", userId) .single(); if (!snapError && snapshot) { @@ -2241,7 +2261,8 @@ export class NeotomaServer { .select("*") .or( `source_entity_id.eq.${parsed.node_id},target_entity_id.eq.${parsed.node_id}`, - ); + ) + .eq("user_id", userId); if (!relError && relationships) { result.relationships = relationships; @@ -2260,7 +2281,9 @@ export class NeotomaServer { const { data: relatedEntities, error: relEntError } = await supabase .from("entities") .select("*") - .in("id", Array.from(relatedEntityIds)); + .in("id", Array.from(relatedEntityIds)) + .eq("user_id", userId) + .is("merged_to_entity_id", null); if (!relEntError && relatedEntities) { result.related_entities = relatedEntities; @@ -2275,6 +2298,7 @@ export class NeotomaServer { .from("observations") .select("*") .eq("entity_id", parsed.node_id) + .eq("user_id", userId) .order("observed_at", { ascending: false }) .limit(100); @@ -2344,7 +2368,8 @@ export class NeotomaServer { const { data: observations, error: obsError } = await supabase .from("observations") .select("*") - .eq("source_record_id", parsed.node_id); + .eq("source_record_id", parsed.node_id) + .eq("user_id", userId); if (!obsError && observations) { result.observations = observations; @@ -2358,7 +2383,9 @@ export class NeotomaServer { const { data: entities, error: entError } = await supabase .from("entities") .select("*") - .in("id", entityIds); + .in("id", entityIds) + .eq("user_id", userId) + .is("merged_to_entity_id", null); if (!entError && entities) { result.related_entities = entities; @@ -2373,6 +2400,7 @@ export class NeotomaServer { ",", )}),target_entity_id.in.(${entityIds.join(",")})`, ) + .eq("user_id", userId) .limit(1000); if (!relError && relationships) { diff --git a/src/services/entity_resolution.ts b/src/services/entity_resolution.ts index 2b754034..5878bcff 100644 --- a/src/services/entity_resolution.ts +++ b/src/services/entity_resolution.ts @@ -6,12 +6,21 @@ import { createHash } from "node:crypto"; import { supabase } from "../db.js"; +import { DEFAULT_USER_ID } from "../constants.js"; export interface Entity { id: string; entity_type: string; canonical_name: string; + aliases: string[]; + metadata: Record; + first_seen_at: string; + last_seen_at: string; + merged_to_entity_id: string | null; + merged_at: string | null; created_at: string; + updated_at: string; + user_id: string; } /** @@ -54,6 +63,7 @@ export function normalizeEntityValue(entityType: string, raw: string): string { export async function resolveEntity( entityType: string, rawValue: string, + userId: string = DEFAULT_USER_ID, ): Promise { const canonicalName = normalizeEntityValue(entityType, rawValue); const entityId = generateEntityId(entityType, canonicalName); @@ -63,10 +73,20 @@ export async function resolveEntity( .from("entities") .select("*") .eq("id", entityId) + .eq("user_id", userId) .single(); if (existing) { - return existing as Entity; + const now = new Date().toISOString(); + await supabase + .from("entities") + .update({ last_seen_at: now }) + .eq("id", entityId) + .eq("user_id", userId); + return { + ...(existing as Entity), + last_seen_at: now, + }; } // Create new entity @@ -78,8 +98,12 @@ export async function resolveEntity( entity_type: entityType, canonical_name: canonicalName, aliases: [], + metadata: {}, + first_seen_at: now, + last_seen_at: now, created_at: now, updated_at: now, + user_id: userId, }) .select() .single(); @@ -162,11 +186,15 @@ export function extractEntities( /** * Get entity by ID */ -export async function getEntityById(entityId: string): Promise { +export async function getEntityById( + entityId: string, + userId: string = DEFAULT_USER_ID, +): Promise { const { data, error } = await supabase .from("entities") .select("*") .eq("id", entityId) + .eq("user_id", userId) .single(); if (error || !data) { @@ -179,12 +207,19 @@ export async function getEntityById(entityId: string): Promise { /** * List entities with optional filters */ -export async function listEntities(filters?: { - entity_type?: string; - limit?: number; - offset?: number; -}): Promise { - let query = supabase.from("entities").select("*"); +export async function listEntities( + userId: string = DEFAULT_USER_ID, + filters?: { + entity_type?: string; + limit?: number; + offset?: number; + }, +): Promise { + let query = supabase + .from("entities") + .select("*") + .eq("user_id", userId) + .is("merged_to_entity_id", null); if (filters?.entity_type) { query = query.eq("entity_type", filters.entity_type); diff --git a/src/services/graph_builder.test.ts b/src/services/graph_builder.test.ts index 6764202c..c9fa4e99 100644 --- a/src/services/graph_builder.test.ts +++ b/src/services/graph_builder.test.ts @@ -6,6 +6,7 @@ import { describe, it, expect, beforeEach, afterEach } from "vitest"; import { supabase } from "../db.js"; +import { DEFAULT_USER_ID } from "../constants.js"; import { detectOrphanNodes, detectCycles, @@ -59,6 +60,7 @@ describe("Graph Builder Service", () => { entity_type: "company", canonical_name: "orphan test company", aliases: [], + user_id: DEFAULT_USER_ID, }) .select() .single(); @@ -126,6 +128,7 @@ describe("Graph Builder Service", () => { entity_type: "company", canonical_name: "test company", aliases: [], + user_id: DEFAULT_USER_ID, }) .select() .single(); @@ -214,16 +217,19 @@ describe("Graph Builder Service", () => { id: "ent_cycle_test_1", entity_type: "company", canonical_name: "company 1", + user_id: DEFAULT_USER_ID, }, { id: "ent_cycle_test_2", entity_type: "company", canonical_name: "company 2", + user_id: DEFAULT_USER_ID, }, { id: "ent_cycle_test_3", entity_type: "company", canonical_name: "company 3", + user_id: DEFAULT_USER_ID, }, ]; @@ -239,21 +245,21 @@ describe("Graph Builder Service", () => { target_entity_id: "ent_cycle_test_2", relationship_type: "PART_OF", metadata: {}, - user_id: "00000000-0000-0000-0000-000000000000", + user_id: DEFAULT_USER_ID, }, { source_entity_id: "ent_cycle_test_2", target_entity_id: "ent_cycle_test_3", relationship_type: "PART_OF", metadata: {}, - user_id: "00000000-0000-0000-0000-000000000000", + user_id: DEFAULT_USER_ID, }, { source_entity_id: "ent_cycle_test_3", target_entity_id: "ent_cycle_test_1", relationship_type: "PART_OF", metadata: {}, - user_id: "00000000-0000-0000-0000-000000000000", + user_id: DEFAULT_USER_ID, }, ]); @@ -274,6 +280,7 @@ describe("Graph Builder Service", () => { entity_type: "company", canonical_name: "orphan company", aliases: [], + user_id: DEFAULT_USER_ID, }) .select() .single(); @@ -336,11 +343,13 @@ describe("Graph Builder Service", () => { id: "ent_cycle_val_1", entity_type: "company", canonical_name: "company 1", + user_id: DEFAULT_USER_ID, }, { id: "ent_cycle_val_2", entity_type: "company", canonical_name: "company 2", + user_id: DEFAULT_USER_ID, }, ]; @@ -356,14 +365,14 @@ describe("Graph Builder Service", () => { target_entity_id: "ent_cycle_val_2", relationship_type: "PART_OF", metadata: {}, - user_id: "00000000-0000-0000-0000-000000000000", + user_id: DEFAULT_USER_ID, }, { source_entity_id: "ent_cycle_val_2", target_entity_id: "ent_cycle_val_1", relationship_type: "PART_OF", metadata: {}, - user_id: "00000000-0000-0000-0000-000000000000", + user_id: DEFAULT_USER_ID, }, ]); diff --git a/src/services/observation_ingestion.ts b/src/services/observation_ingestion.ts index 19cac3a6..9f57c6af 100644 --- a/src/services/observation_ingestion.ts +++ b/src/services/observation_ingestion.ts @@ -12,6 +12,7 @@ import { emitRecordCreated } from "../events/event_emitter.js"; import type { NeotomaRecord } from "../db.js"; import type { PayloadSubmission } from "./payload_schema.js"; import type { Capability } from "./capability_registry.js"; +import { DEFAULT_USER_ID } from "../constants.js"; export interface ObservationCreationResult { record: NeotomaRecord; @@ -28,7 +29,7 @@ export interface ObservationCreationResult { */ export async function createObservationsFromRecord( record: NeotomaRecord, - userId: string = "00000000-0000-0000-0000-000000000000", // Default for v0.1.0 single-user + userId: string = DEFAULT_USER_ID, ): Promise { const entities = extractEntities(record.properties, record.type); const observations = []; @@ -46,7 +47,11 @@ export async function createObservationsFromRecord( // Resolve entities and create observations for (const entity of entities) { - const resolved = await resolveEntity(entity.entity_type, entity.raw_value); + const resolved = await resolveEntity( + entity.entity_type, + entity.raw_value, + userId, + ); const schema = schemaMap.get(entity.entity_type); const schemaVersion = schema?.schema_version || "1.0"; @@ -183,7 +188,7 @@ export async function createObservationsFromRecord( export async function createObservationsFromPayload( payload: PayloadSubmission, capability: Capability, - userId: string = "00000000-0000-0000-0000-000000000000", + userId: string = DEFAULT_USER_ID, ): Promise { const allEntities: Array<{ entity_type: string; raw_value: string }> = []; @@ -249,7 +254,11 @@ export async function createObservationsFromPayload( // Resolve all entities and create observations for (const entity of allEntities) { - const resolved = await resolveEntity(entity.entity_type, entity.raw_value); + const resolved = await resolveEntity( + entity.entity_type, + entity.raw_value, + userId, + ); const schema = schemaMap.get(entity.entity_type); const schemaVersion = schema?.schema_version || "1.0"; diff --git a/src/services/payload_compiler.ts b/src/services/payload_compiler.ts index cb4ad370..838a7381 100644 --- a/src/services/payload_compiler.ts +++ b/src/services/payload_compiler.ts @@ -14,6 +14,7 @@ */ import { supabase } from "../db.js"; +import { DEFAULT_USER_ID } from "../constants.js"; import { getCapability } from "./capability_registry.js"; import type { PayloadEnvelope, @@ -38,7 +39,7 @@ export async function compilePayload( options: CompilePayloadOptions = {}, ): Promise { const { - userId = "00000000-0000-0000-0000-000000000000", + userId = DEFAULT_USER_ID, skipObservations = false, } = options; diff --git a/supabase/migrations/20251218100000_fu_113_entity_extensions.sql b/supabase/migrations/20251218100000_fu_113_entity_extensions.sql new file mode 100644 index 00000000..ee787bb5 --- /dev/null +++ b/supabase/migrations/20251218100000_fu_113_entity_extensions.sql @@ -0,0 +1,43 @@ +-- Migration: FU-113 Entity Extensions +-- Adds user isolation + merge tracking columns to entities table + +BEGIN; + +ALTER TABLE entities + ADD COLUMN IF NOT EXISTS metadata JSONB NOT NULL DEFAULT '{}', + ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + ADD COLUMN IF NOT EXISTS user_id UUID, + ADD COLUMN IF NOT EXISTS merged_to_entity_id TEXT REFERENCES entities(id), + ADD COLUMN IF NOT EXISTS merged_at TIMESTAMPTZ; + +UPDATE entities +SET + user_id = COALESCE(user_id, '00000000-0000-0000-0000-000000000000'), + first_seen_at = COALESCE(first_seen_at, created_at, NOW()), + last_seen_at = COALESCE(last_seen_at, updated_at, NOW()) +WHERE user_id IS NULL + OR first_seen_at IS NULL + OR last_seen_at IS NULL; + +ALTER TABLE entities + ALTER COLUMN user_id SET NOT NULL; + +CREATE INDEX IF NOT EXISTS idx_entities_user ON entities(user_id); +CREATE INDEX IF NOT EXISTS idx_entities_user_type ON entities(user_id, entity_type); +CREATE INDEX IF NOT EXISTS idx_entities_user_type_name + ON entities(user_id, entity_type, canonical_name); +CREATE INDEX IF NOT EXISTS idx_entities_merged + ON entities(user_id, merged_to_entity_id) + WHERE merged_to_entity_id IS NOT NULL; + +ALTER TABLE entities ENABLE ROW LEVEL SECURITY; +DROP POLICY IF EXISTS "public read - entities" ON entities; +DROP POLICY IF EXISTS "Users read own entities" ON entities; +CREATE POLICY "Users read own entities" ON entities + FOR SELECT USING (user_id = auth.uid()); +DROP POLICY IF EXISTS "Service role full access - entities" ON entities; +CREATE POLICY "Service role full access - entities" ON entities + FOR ALL TO service_role USING (true) WITH CHECK (true); + +COMMIT; diff --git a/supabase/schema.sql b/supabase/schema.sql index 7db00242..cbd70422 100644 --- a/supabase/schema.sql +++ b/supabase/schema.sql @@ -477,25 +477,39 @@ CREATE TABLE IF NOT EXISTS entities ( entity_type TEXT NOT NULL, canonical_name TEXT NOT NULL, aliases JSONB DEFAULT '[]', + metadata JSONB NOT NULL DEFAULT '{}', + first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), created_at TIMESTAMPTZ DEFAULT NOW(), - updated_at TIMESTAMPTZ DEFAULT NOW() + updated_at TIMESTAMPTZ DEFAULT NOW(), + user_id UUID NOT NULL, + merged_to_entity_id TEXT REFERENCES entities(id), + merged_at TIMESTAMPTZ ); -- Indexes for entities CREATE INDEX IF NOT EXISTS idx_entities_type ON entities(entity_type); CREATE INDEX IF NOT EXISTS idx_entities_canonical_name ON entities(canonical_name); CREATE INDEX IF NOT EXISTS idx_entities_type_name ON entities(entity_type, canonical_name); +CREATE INDEX IF NOT EXISTS idx_entities_user ON entities(user_id); +CREATE INDEX IF NOT EXISTS idx_entities_user_type ON entities(user_id, entity_type); +CREATE INDEX IF NOT EXISTS idx_entities_user_type_name + ON entities(user_id, entity_type, canonical_name); +CREATE INDEX IF NOT EXISTS idx_entities_merged + ON entities(user_id, merged_to_entity_id) + WHERE merged_to_entity_id IS NOT NULL; -- RLS policies for entities ALTER TABLE entities ENABLE ROW LEVEL SECURITY; +DROP POLICY IF EXISTS "Users read own entities" ON entities; +CREATE POLICY "Users read own entities" ON entities + FOR SELECT USING (user_id = auth.uid()); + DROP POLICY IF EXISTS "Service role full access - entities" ON entities; CREATE POLICY "Service role full access - entities" ON entities FOR ALL TO service_role USING (true) WITH CHECK (true); -DROP POLICY IF EXISTS "public read - entities" ON entities; -CREATE POLICY "public read - entities" ON entities FOR SELECT USING (true); - -- Timeline events table (FU-102) CREATE TABLE IF NOT EXISTS timeline_events ( id TEXT PRIMARY KEY,