feat: knowledge graph schema, entity resolution, and MCP tools#20
Merged
Add the foundational knowledge graph layer for v3.0. Three new tables (entities, knowledge, episode_links) enable frontier models to build a graph of facts and connections between episodes via MCP tool use.

Entity resolution uses embedding similarity (cosine >= 0.88) so "DuckDB" and "duckdb" resolve to the same node without heuristics. Knowledge triples carry full provenance (source, confidence, verified) and use a controlled predicate vocabulary. Episode links deduplicate automatically.

No breaking changes: new tables are created on startup alongside the existing episodes table, and two new MCP tools (add_knowledge, link_episodes) are additive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
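The threshold-based resolution described in this commit can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function names, the dictionary of known entities, and the toy two-dimensional vectors are all assumptions; only the 0.88 cutoff comes from the commit message.

```python
import math

# Cutoff taken from the commit message; everything else here is illustrative.
SIMILARITY_THRESHOLD = 0.88


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve_entity(query_vec, known):
    """Return the name of the most similar known entity at or above the
    threshold, or None if nothing qualifies.

    `known` is a hypothetical mapping of entity name -> embedding vector.
    """
    best_name, best_score = None, SIMILARITY_THRESHOLD
    for name, vec in known.items():
        score = cosine_similarity(query_vec, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A caller would create a new entity whenever `resolve_entity` returns `None`. The approach is only as good as the embeddings: if the model maps unrelated short strings to nearly identical vectors, every lookup clears the threshold.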
…ings nomic-embed-text produces degenerate embeddings for short texts: "Mike" vs "DuckDB" scores 1.0 cosine similarity.

Entity resolution now uses two string-based strategies:
1. Case-insensitive exact match: "Mike" = "mike"
2. Normalized match (strip non-alphanumeric, lowercase): "OscillateLabs" = "Oscillate Labs" = "Oscillate Labs LLC"

Embeddings are still stored on entities for future use with longer-context models, but not used for resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
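The two string-based strategies can be sketched as below. This is a hedged illustration, not the PR's normalizeEntityName implementation: `normalize_entity_name` and `same_entity` are hypothetical names, and the sketch applies only the two steps named above (lowercase, strip non-alphanumeric). Under those two steps alone, "Oscillate Labs LLC" does not collapse to the same key as "Oscillate Labs", so the actual code presumably does something extra (for example, suffix handling) that the commit message does not describe.

```python
def normalize_entity_name(name: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def same_entity(a: str, b: str) -> bool:
    """Apply the two strategies in order: case-insensitive exact match,
    then normalized match."""
    if a.lower() == b.lower():
        return True
    return normalize_entity_name(a) == normalize_entity_name(b)
```

With this sketch, `same_entity("Mike", "mike")` and `same_entity("OscillateLabs", "Oscillate Labs")` are both true, while `same_entity("Mike", "DuckDB")` is false, which is the case the embedding approach got wrong.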
Address code review feedback:
- Add UNIQUE index on entities (LOWER(canonical_name), group_id) to prevent duplicate entities from concurrent writes
- Add UNIQUE constraint on episode_links (source, target, relationship) and replace check-then-insert with INSERT OR IGNORE
- Add tests for: case-insensitive entity resolution, normalized name matching (spaces, hyphens), group isolation, distinct entity separation, normalizeEntityName function, and UNIQUE constraint dedup behavior
- Total: 21 tests (was 10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
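The constraint-based deduplication above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in so that it runs without dependencies; the project's actual database, column types, the index name, and the `link_episodes` helper signature are assumptions. Only the table names, the indexed columns, and the use of INSERT OR IGNORE come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (
        id INTEGER PRIMARY KEY,
        canonical_name TEXT NOT NULL,
        group_id TEXT NOT NULL
    );
    -- Expression index: names differing only by case collide within a group.
    CREATE UNIQUE INDEX idx_entities_name_group
        ON entities (LOWER(canonical_name), group_id);

    CREATE TABLE episode_links (
        source TEXT NOT NULL,
        target TEXT NOT NULL,
        relationship TEXT NOT NULL,
        UNIQUE (source, target, relationship)
    );
    """
)


def link_episodes(conn, source, target, relationship):
    """Insert a link; return True if a row was added, False if it already existed.

    One INSERT OR IGNORE statement replaces the check-then-insert pattern,
    so two concurrent writers cannot both pass the check and both insert.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO episode_links (source, target, relationship) "
        "VALUES (?, ?, ?)",
        (source, target, relationship),
    )
    return cur.rowcount == 1
```

Letting the database enforce uniqueness moves the race-condition handling out of application code: a duplicate link is silently dropped, and a duplicate entity insert fails with an integrity error that the caller can treat as "already exists".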
Summary
- Three new tables (entities, knowledge, episode_links) created automatically on startup; no migration required for existing databases
- add_knowledge MCP tool: stores subject-predicate-object triples with embedding-based entity resolution (cosine ≥ 0.88), controlled predicate vocabulary (15 predicates), and full provenance tracking (source, confidence, verified flag)
- link_episodes MCP tool: creates directional links between episodes with automatic deduplication

Context
This is the foundation for v3.0 (knowledge graph + Dreamer). Frontier models can start building the graph immediately through normal tool use. Dreamer (async local-model enrichment) will build on this in a subsequent PR.
No breaking changes: everything in this PR is additive.
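For readers unfamiliar with the triple format mentioned in the summary, here is a minimal sketch of a subject-predicate-object record with provenance fields and a controlled predicate check. It is not the schema from this PR: the class name, field types, the 0.0 to 1.0 confidence range, and the three example predicates are all hypothetical (the PR mentions 15 predicates but does not list them).

```python
from dataclasses import dataclass

# Hypothetical subset for illustration; the real vocabulary is not shown in the PR.
ALLOWED_PREDICATES = frozenset({"uses", "works_at", "depends_on"})


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object_: str
    source: str            # provenance: where the fact came from
    confidence: float      # provenance: assumed to lie in [0.0, 1.0]
    verified: bool = False  # provenance: whether the fact has been confirmed

    def __post_init__(self):
        # Reject predicates outside the controlled vocabulary.
        if self.predicate not in ALLOWED_PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Validating the predicate at construction time keeps free-form relationship names out of the graph, which is the usual reason for a controlled vocabulary.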
Test plan
- engram serve, call add_knowledge and link_episodes via MCP client

🤖 Generated with Claude Code