feat: knowledge graph schema, entity resolution, and MCP tools#20

Merged: mikejgray merged 3 commits into main from feat/knowledge-graph on Mar 25, 2026

@mikejgray (Collaborator)

Summary

  • Adds three new tables (entities, knowledge, episode_links) created automatically on startup — no migration required for existing databases
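  • Adds add_knowledge MCP tool: stores subject-predicate-object triples with string-based entity resolution (case-insensitive exact match, then normalized-name match; the initial embedding-based approach with cosine ≥ 0.88 was dropped during the PR because short-text embeddings proved degenerate), a controlled predicate vocabulary (15 predicates), and full provenance tracking (source, confidence, verified flag)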
  • Adds link_episodes MCP tool: creates directional links between episodes with automatic deduplication
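  • 21 new tests covering entity CRUD, triple insertion, link deduplication, bidirectional retrieval, table creation, case-insensitive and normalized entity resolution, group isolation, and UNIQUE-constraint deduplication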
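
To make the add_knowledge contract concrete, here is a minimal Python sketch of a triple that carries provenance and is checked against a controlled predicate vocabulary. The PR says there are 15 predicates but does not list them, so ALLOWED_PREDICATES below is a hypothetical subset, and the default values for source and confidence are assumptions, not the project's actual defaults.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL vocabulary: the PR mentions 15 controlled predicates but does not name them.
ALLOWED_PREDICATES = frozenset({"uses", "works_on", "depends_on", "part_of", "created_by"})


@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object fact plus provenance (source, confidence, verified)."""

    subject: str
    predicate: str
    obj: str
    source: str = "unknown"   # hypothetical default: who or what asserted the fact
    confidence: float = 1.0   # hypothetical default
    verified: bool = False    # facts start out unverified

    def __post_init__(self) -> None:
        # Reject predicates outside the controlled vocabulary.
        if self.predicate not in ALLOWED_PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate!r}")
        # Keep confidence within a [0, 1] score range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```

Validating at construction time means a misspelled or free-form predicate fails loudly instead of fragmenting the graph into near-duplicate edge types.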

Context

This is the foundation for v3.0 (knowledge graph + Dreamer). Frontier models can start building the graph immediately through normal tool use. Dreamer (async local-model enrichment) will build on this in a subsequent PR.

No breaking changes — all additions are purely additive.
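
The "no migration required" property follows from creating the graph tables idempotently at startup. The sketch below assumes SQLite (the PR does not name the database engine) and uses a hypothetical column layout; only the three table names come from the PR.

```python
import sqlite3

# HYPOTHETICAL columns: the PR names the tables but not their schemas.
GRAPH_SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (
    id INTEGER PRIMARY KEY,
    canonical_name TEXT NOT NULL,
    group_id TEXT NOT NULL DEFAULT 'default',
    embedding BLOB
);
CREATE TABLE IF NOT EXISTS knowledge (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES entities(id),
    predicate TEXT NOT NULL,
    object_id INTEGER NOT NULL REFERENCES entities(id),
    source TEXT,
    confidence REAL NOT NULL DEFAULT 1.0,
    verified INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS episode_links (
    id INTEGER PRIMARY KEY,
    source_episode_id TEXT NOT NULL,
    target_episode_id TEXT NOT NULL,
    relationship TEXT NOT NULL,
    via_entity_id INTEGER REFERENCES entities(id)
);
"""


def ensure_graph_tables(conn: sqlite3.Connection) -> None:
    """Create the graph tables if they are missing; safe to call on every startup."""
    conn.executescript(GRAPH_SCHEMA)
```

Because every statement uses IF NOT EXISTS, calling ensure_graph_tables on an existing database is a no-op, and tables such as episodes are never touched.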

Test plan

  • All existing tests pass (no regressions)
  • New entity tests: create, retrieve, default group_id
  • New triple tests: insert with defaults, insert with custom confidence
  • New link tests: create, deduplicate, bidirectional retrieval, via_entity_id
  • Graph table existence verification
  • Manual: start engram serve, call add_knowledge and link_episodes via MCP client
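
"Bidirectional retrieval" in the test plan means a link is visible from both of its endpoints. One way to express this is a UNION ALL over the two directions, as in the sketch below. It assumes SQLite, and the column names and the follows_up relationship are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Minimal stand-in table; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episode_links ("
    " source_episode_id TEXT NOT NULL,"
    " target_episode_id TEXT NOT NULL,"
    " relationship TEXT NOT NULL)"
)
conn.execute("INSERT INTO episode_links VALUES ('ep1', 'ep2', 'follows_up')")


def links_for_episode(conn: sqlite3.Connection, episode_id: str) -> list[tuple[str, str, str]]:
    """Return (direction, other_episode, relationship) for links touching episode_id."""
    return conn.execute(
        """
        SELECT 'outgoing', target_episode_id, relationship
          FROM episode_links WHERE source_episode_id = ?
        UNION ALL
        SELECT 'incoming', source_episode_id, relationship
          FROM episode_links WHERE target_episode_id = ?
        ORDER BY 1, 2
        """,
        (episode_id, episode_id),
    ).fetchall()
```

The stored link stays directional (ep1 → ep2). Only the query looks in both directions and tags each row, so callers can still tell which way the relationship points.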

🤖 Generated with Claude Code

mikejgray and others added 3 commits March 24, 2026 22:28
Add the foundational knowledge graph layer for v3.0. Three new tables
(entities, knowledge, episode_links) enable frontier models to build
a graph of facts and connections between episodes via MCP tool use.

Entity resolution uses embedding similarity (cosine >= 0.88) so
"DuckDB" and "duckdb" resolve to the same node without heuristics.
Knowledge triples carry full provenance (source, confidence, verified)
and use a controlled predicate vocabulary. Episode links deduplicate
automatically.

No breaking changes — new tables are created on startup alongside
existing episodes table, and two new MCP tools (add_knowledge,
link_episodes) are additive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
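
The embedding-based resolution described in this first commit can be sketched as a nearest-neighbour search with a cosine cutoff. Only the 0.88 threshold comes from the commit. The function names and the in-memory candidate dict are illustrative assumptions, not the project's code.

```python
from __future__ import annotations

import math

# Threshold from the commit message; everything else is illustrative.
RESOLUTION_THRESHOLD = 0.88


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve_by_embedding(query: list[float], candidates: dict[str, list[float]]) -> str | None:
    """Return the id of the most similar entity scoring >= the threshold, else None."""
    best_id, best_score = None, RESOLUTION_THRESHOLD
    for entity_id, vector in candidates.items():
        score = cosine(query, vector)
        if score >= best_score:
            best_id, best_score = entity_id, score
    return best_id
```

The weakness is visible in the math. If the embedding model maps two unrelated short names to nearly the same vector, their cosine is about 1.0 and they merge. The next commit reports exactly this failure with nomic-embed-text.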
…ings

nomic-embed-text produces degenerate embeddings for short texts —
"Mike" vs "DuckDB" scores 1.0 cosine similarity. Entity resolution
now uses two string-based strategies:

1. Case-insensitive exact match: "Mike" = "mike"
2. Normalized match (strip non-alphanumeric, lowercase):
   "OscillateLabs" = "Oscillate Labs"

Embeddings are still stored on entities for future use with
longer-context models, but not used for resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
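
The two string strategies above can be sketched in Python as follows. normalize_entity_name mirrors the behaviour the commit describes for normalizeEntityName. resolve_entity and its id-to-name mapping (assumed to hold one group's entities) are hypothetical helpers, not the project's API.

```python
from __future__ import annotations


def normalize_entity_name(name: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def resolve_entity(name: str, existing: dict[int, str]) -> int | None:
    """Match name against one group's entities (id -> canonical_name); None if new."""
    # Strategy 1: case-insensitive exact match.
    lowered = name.lower()
    for entity_id, canonical in existing.items():
        if canonical.lower() == lowered:
            return entity_id
    # Strategy 2: normalized match, ignoring spaces, hyphens, and punctuation.
    normalized = normalize_entity_name(name)
    for entity_id, canonical in existing.items():
        if normalize_entity_name(canonical) == normalized:
            return entity_id
    return None
```

Trying the exact match first means a precise hit is never shadowed by a looser normalized one. Note that this normalization does not drop suffixes, so "Oscillate Labs LLC" normalizes to "oscillatelabsllc" and would not match "Oscillate Labs".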
Address code review feedback:

- Add UNIQUE index on entities (LOWER(canonical_name), group_id) to
  prevent duplicate entities from concurrent writes
- Add UNIQUE constraint on episode_links (source, target, relationship)
  and replace check-then-insert with INSERT OR IGNORE
- Add tests for: case-insensitive entity resolution, normalized name
  matching (spaces, hyphens), group isolation, distinct entity
  separation, normalizeEntityName function, and UNIQUE constraint
  dedup behavior
- Total: 21 tests (was 10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
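
The review fixes move deduplication into the database, which makes it safe under concurrent writes. The sketch below assumes SQLite: it uses an expression index for case-insensitive entity uniqueness per group, and INSERT OR IGNORE against a composite UNIQUE constraint for links. Column names and the insert_link helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (
        id INTEGER PRIMARY KEY,
        canonical_name TEXT NOT NULL,
        group_id TEXT NOT NULL
    );
    -- One entity per case-insensitive name within each group.
    CREATE UNIQUE INDEX IF NOT EXISTS idx_entities_name_group
        ON entities (LOWER(canonical_name), group_id);
    CREATE TABLE episode_links (
        source_episode_id TEXT NOT NULL,
        target_episode_id TEXT NOT NULL,
        relationship TEXT NOT NULL,
        UNIQUE (source_episode_id, target_episode_id, relationship)
    );
    """
)


def insert_link(conn: sqlite3.Connection, source: str, target: str, relationship: str) -> bool:
    """Insert a link; a duplicate is silently skipped. Returns True if a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO episode_links VALUES (?, ?, ?)",
        (source, target, relationship),
    )
    return cur.rowcount == 1
```

Compared with check-then-insert, this leaves no window in which two writers can both see "not present" and both insert.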
mikejgray merged commit 2fd857b into main on Mar 25, 2026
4 checks passed
mikejgray deleted the feat/knowledge-graph branch on March 25, 2026 at 04:02