63 changes: 56 additions & 7 deletions ARCHITECTURE.md
@@ -23,7 +23,12 @@ Cortex is a self-organizing graph memory engine built in Rust. It combines tradi

**Design**:
- Primary table: `nodes` (NodeId → Node)
- Includes `valid_from`, `valid_until` (temporal validity window)
- Includes `expires_at` (lifecycle GC)
- Includes `embedding_model` (vector provenance tracking)
- Edge table: `edges` (EdgeId → Edge)
- Includes `metadata` (extensible HashMap for edge context)
- `EdgeProvenance::Custom` variant for forward-compatible linking mechanisms
- Secondary indexes for filtering:
- `nodes_by_kind` (NodeKind → Set<NodeId>)
- `nodes_by_source` (SourceAgent → Set<NodeId>)
@@ -121,13 +126,15 @@ score = α × vector_similarity + (1-α) × graph_proximity
└─────────────────────────────────────┘
```

2. **Link Rules**:
- **Similarity**: Vector similarity > 0.85 → Similar edge
- **Temporal**: Events in sequence → Precedes edge
- **Source**: Same session → PartOf edge
- **Causality**: Decision before event → Causes edge
- **Support**: Fact near decision → Supports edge
- **Reference**: Title mentions → References edge
2. **Configurable Rules** (ConfigRule):
- Rules define from_kind, to_kind, relation, weight, and a condition
- Wildcard kinds (`"*"`) match any node kind
- Condition types: min_similarity, shared_tags, temporal_proximity, newer_than, same_agent, body_field_ref, tag_references_title, negation_detected
- Legacy hardcoded rules (similarity, temporal, source, causality, support, reference) are automatically disabled when config rules are defined
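
The configurable rules above can be pictured with a small sketch. This is not the project's actual `ConfigRule` type: the field types, the `Candidate` struct, the `evaluate` method, and the choice of `>=` for `min_similarity` are assumptions made for illustration, and only two of the listed condition types are modelled.

```rust
/// Hypothetical, simplified shape of a configurable link rule.
#[derive(Debug, Clone)]
pub struct ConfigRule {
    pub from_kind: String,
    pub to_kind: String,
    pub relation: String,
    pub weight: f32,
    pub condition: Condition,
}

/// Only two of the documented condition types are modelled here.
#[derive(Debug, Clone)]
pub enum Condition {
    MinSimilarity(f32),
    SameAgent,
}

/// Minimal view of a node pair (hypothetical), carrying only what the
/// two conditions above need.
pub struct Candidate<'a> {
    pub from_kind: &'a str,
    pub to_kind: &'a str,
    pub similarity: f32,
    pub from_agent: &'a str,
    pub to_agent: &'a str,
}

/// A pattern of "*" matches any node kind; otherwise kinds must be equal.
fn kind_matches(pattern: &str, kind: &str) -> bool {
    pattern == "*" || pattern == kind
}

impl ConfigRule {
    /// Returns the (relation, weight) of the edge to create, or `None`
    /// when the kinds do not match or the condition does not hold.
    pub fn evaluate(&self, c: &Candidate) -> Option<(&str, f32)> {
        if !kind_matches(&self.from_kind, c.from_kind)
            || !kind_matches(&self.to_kind, c.to_kind)
        {
            return None;
        }
        let holds = match self.condition {
            // Assumption: the threshold is inclusive.
            Condition::MinSimilarity(threshold) => c.similarity >= threshold,
            Condition::SameAgent => c.from_agent == c.to_agent,
        };
        if holds {
            Some((self.relation.as_str(), self.weight))
        } else {
            None
        }
    }
}
```

With `from_kind: "*"` and `to_kind: "decision"`, such a rule would fire for any node pointing at a decision, provided the condition holds.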

**Entity features**:
- **Entity co-occurrence**: nodes from different agents referencing the same entity get `shared_entity` edges
- **Entity promotion**: periodic scan promotes entity strings referenced by 2+ agents to first-class entity nodes

3. **Contradiction Detection**:
- Finds semantically similar nodes with opposite polarity
@@ -188,6 +195,48 @@ score = α × vector_similarity + (1-α) × graph_proximity
- Deduplicates by title+session
- Event types: stage.advanced, item.completed, evidence.submitted, etc.

### Trust Engine (`cortex-core/trust`)

**Purpose**: Compute trust scores from graph topology at query time.

Trust is never stored as a field. It is derived from a node's neighbourhood, much as PageRank derives authority from link structure.

**Five signals**:
1. **Corroboration** -- how many independent agents stored similar facts
2. **Contradiction penalty** -- unresolved `contradicts` edges reduce trust
3. **Source track record** -- historical accuracy of the authoring agent
4. **Access reinforcement** -- nodes that are frequently retrieved and never corrected gain trust
5. **Freshness** -- recency of last access

**Combination**: configurable weighted sum, defaults in `[trust]` config.

**Caching**: source reliability is cached per-agent, refreshed every N auto-linker cycles. All other signals are computed per-query from live graph state.
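
As a rough illustration of the weighted combination, the sketch below sums the five signals with per-signal weights. The struct names, the `trust_score` function, the assumption that each signal is normalised to `[0.0, 1.0]`, the subtraction of the contradiction term, and the final clamp are all hypothetical. The real defaults live in the `[trust]` config and are not reproduced here.

```rust
/// The five signals, each assumed to be normalised to [0.0, 1.0].
#[derive(Debug, Clone, Copy)]
pub struct TrustSignals {
    pub corroboration: f64,
    /// Penalty term: higher means more unresolved `contradicts` edges.
    pub contradiction: f64,
    pub source_reliability: f64,
    pub access_reinforcement: f64,
    pub freshness: f64,
}

/// Hypothetical per-signal weights (stand-ins for the `[trust]` config).
#[derive(Debug, Clone, Copy)]
pub struct TrustWeights {
    pub corroboration: f64,
    pub contradiction: f64,
    pub source_reliability: f64,
    pub access_reinforcement: f64,
    pub freshness: f64,
}

/// Weighted sum of the signals. The contradiction term is subtracted
/// because it is described as a penalty; the result is clamped to [0, 1].
pub fn trust_score(s: &TrustSignals, w: &TrustWeights) -> f64 {
    let raw = w.corroboration * s.corroboration
        - w.contradiction * s.contradiction
        + w.source_reliability * s.source_reliability
        + w.access_reinforcement * s.access_reinforcement
        + w.freshness * s.freshness;
    raw.clamp(0.0, 1.0)
}
```

Because nothing here reads a stored trust field, the score can be recomputed on every query from whatever the graph currently looks like.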

### Entity Layer

**Purpose**: Cross-agent discovery via shared entity resolution.

Entity nodes are regular nodes with `kind: "entity"` and `metadata.entity_type`. They serve as hub nodes: all knowledge about "Company X" from every agent converges on a single entity node via `references` edges.

**Entity extraction**: entities are read from the `metadata.entities` array and from tags with an `entity-` prefix.

**Auto-promotion**: when 2+ agents mention the same normalised entity string, the auto-linker promotes it to a first-class entity node.

**Agent nodes**: the deprecated `kind: "agent"` is migrated to `kind: "entity"` with `metadata.entity_type: "agent"`.
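
The auto-promotion rule can be sketched as a count of distinct agents per normalised entity string. The function names and the normalisation used here (lowercasing and collapsing whitespace) are assumptions for illustration; the source does not specify how Cortex normalises entity strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Assumed normalisation: lowercase and collapse runs of whitespace.
pub fn normalise_entity(raw: &str) -> String {
    raw.split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// `mentions` is a list of (agent, raw entity string) pairs.
/// Returns, in sorted order, the normalised entities mentioned by at
/// least `min_agents` distinct agents.
pub fn entities_to_promote(mentions: &[(&str, &str)], min_agents: usize) -> Vec<String> {
    let mut agents_by_entity: BTreeMap<String, BTreeSet<&str>> = BTreeMap::new();
    for (agent, raw) in mentions {
        agents_by_entity
            .entry(normalise_entity(raw))
            .or_default()
            .insert(*agent);
    }
    agents_by_entity
        .into_iter()
        .filter(|(_, agents)| agents.len() >= min_agents)
        .map(|(entity, _)| entity)
        .collect()
}
```

Counting distinct agents rather than raw mentions means one agent repeating the same entity many times does not trigger promotion on its own.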

### Briefing Engine (`cortex-core/briefing`)

**Purpose**: Generate structured context documents for agents.

Briefings use a **role-based** configuration rather than hardcoded sections. Each role (identity, persistent, trackable, temporal, reviewable, superseding) defines a retrieval strategy. Kinds are mapped to roles in config.

**Scope parameter**:
- **Agent** (default): only the requesting agent's knowledge
- **Shared**: agent's knowledge plus cross-agent context about shared entities (via two-hop entity traversal)
- **Unified**: a single briefing for orchestrators that spans multiple agents

**Trust-aware ranking**: when trust scoring is enabled, nodes are ranked by `0.6 * importance + 0.4 * trust_score`.
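
A minimal sketch of the ranking formula follows. The `RankedNode` struct and function names are hypothetical, and falling back to importance alone when trust scoring is disabled is an assumption, since the source only describes the enabled case.

```rust
/// Hypothetical node view carrying only the two ranking inputs.
#[derive(Debug, Clone)]
pub struct RankedNode {
    pub id: u64,
    pub importance: f64,
    pub trust_score: f64,
}

/// The blend stated in the text: 0.6 * importance + 0.4 * trust_score.
pub fn briefing_rank(importance: f64, trust_score: f64) -> f64 {
    0.6 * importance + 0.4 * trust_score
}

/// Sorts nodes from highest to lowest rank. With trust scoring disabled,
/// this sketch ranks by importance only (an assumption).
pub fn rank_nodes(mut nodes: Vec<RankedNode>, trust_enabled: bool) -> Vec<RankedNode> {
    let key = |n: &RankedNode| {
        if trust_enabled {
            briefing_rank(n.importance, n.trust_score)
        } else {
            n.importance
        }
    };
    // Descending order; `total_cmp` gives a total order over f64 values.
    nodes.sort_by(|a, b| key(b).total_cmp(&key(a)));
    nodes
}
```

A node with importance 0.9 and trust 0.1 ranks at 0.58, below a node with importance 0.7 and trust 0.9 at 0.78, so a well-corroborated node can overtake a nominally more important one.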

## Data Flow

### Ingestion Flow
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,39 @@

All notable changes to Cortex are documented in this file.

## [0.3.0] - YYYY-MM-DD

### Added
- **Temporal validity** -- Node fields: `valid_from`, `valid_until` for truth windows.
Query with `valid_at()` filter.
- **Lifecycle expiry** -- Node field: `expires_at`. Retention engine auto-sweeps
expired nodes.
- **Embedding model tracking** -- Node field: `embedding_model`. Tracks which
model generated each vector for migration detection.
- **Edge metadata** -- Extensible HashMap on edges for contextual data.
- **Custom provenance** -- `EdgeProvenance::Custom` variant for forward-compatible
linking mechanisms.
- **Trust scoring** -- Compute trust from graph topology: corroboration,
contradiction, source reliability, access reinforcement, freshness.
- **Entity layer** -- Entity nodes (`kind: "entity"` + `metadata.entity_type`),
`authored_by`/`references` relations, auto-promotion from co-occurrence.
- **Briefing roles** -- Configurable mapping from node kinds to briefing roles
(identity, persistent, trackable, temporal, reviewable, superseding).
- **Briefing scope** -- Agent (default), Shared (cross-agent), Unified
(orchestrator) scope parameter.
- **Metadata query filter** -- `NodeFilter.with_metadata(key, value)` for
querying by metadata values.
- **Legacy rule deprecation** -- Hardcoded structural rules replaced by
configurable `[[auto_linker.rules]]` with wildcard kind support.
- **Metadata conventions** -- Well-known metadata keys documented for
interoperability (`entity_type`, `aliases`, `parent_agent`, `task_id`, etc.).

### Changed
- Briefing engine uses role-based config instead of hardcoded section kinds.
- Auto-linker supports entity co-occurrence and entity promotion.
- Retention engine respects `expires_at` field.
- `cortex init` accepts `--template` flag for agent-type presets.

## [0.2.0] - 2026-03-14

### Added
13 changes: 13 additions & 0 deletions CLAUDE.md
@@ -48,6 +48,10 @@ cortex shell # Interactive REPL
- `Cortex::open(path)` — library mode, no server needed
- `Storage` trait — `RedbStorage` implements it
- `IngestAdapter` trait — pluggable event sources
- `TrustEngine` -- computes trust from graph topology, no stored confidence
- `BriefingRoleConfig` -- maps node kinds to briefing roles (identity, persistent, trackable, temporal, reviewable, superseding)
- `BriefingScope` -- Agent (default), Shared (cross-agent), Unified (orchestrator)
- Entity convention: `kind: "entity"` + `metadata.entity_type`
- Config: `cortex.toml` with `#[serde(default)]` on all structs

## Architecture decisions
@@ -57,6 +61,10 @@ cortex shell # Interactive REPL
- **gRPC** (tonic) for production API, **HTTP** (axum) for debug/viz
- **warren-adapter** is optional (`--features warren`), cortex-core has zero network deps
- Auto-linker runs background loop: similarity rules → edges, decay → prune, dedup → merge
- **Trust from topology** -- confidence is computed at query time from graph structure (corroboration, contradiction, source reliability, access, freshness). Never stored as a field.
- **Entities as convention** -- entity nodes are regular nodes with kind: "entity" and metadata.entity_type. Not a separate primitive. Relations: authored_by, references.
- **Shared graph model** -- one graph, all agents write to it. Isolation is a deployment concern (run two instances), not data-layer scoping.
- **Temporal validity** -- valid_from/valid_until for epistemic truth windows. expires_at for lifecycle GC. Distinct concerns.
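
To make the distinction in the last point concrete, here is a std-only sketch that keeps the two checks separate. The `NodeTimes` struct, the use of Unix seconds, the method names, and the half-open `[valid_from, valid_until)` interval are assumptions for illustration, not the actual node schema or the real `valid_at()` filter.

```rust
/// Hypothetical subset of a node's time fields, as Unix seconds.
#[derive(Debug, Clone, Copy, Default)]
pub struct NodeTimes {
    pub valid_from: Option<i64>,
    pub valid_until: Option<i64>,
    pub expires_at: Option<i64>,
}

impl NodeTimes {
    /// Epistemic check: was the fact considered true at time `t`?
    /// A missing bound is treated as open-ended; the window is assumed
    /// to be half-open, [valid_from, valid_until).
    pub fn valid_at(&self, t: i64) -> bool {
        self.valid_from.map_or(true, |from| t >= from)
            && self.valid_until.map_or(true, |until| t < until)
    }

    /// Lifecycle check: may the retention sweep remove this node at `now`?
    /// Nodes without `expires_at` never expire in this sketch.
    pub fn is_expired(&self, now: i64) -> bool {
        self.expires_at.map_or(false, |at| now >= at)
    }
}
```

A node can fall outside its validity window and still be retained: a fact that stopped being true at `valid_until` remains queryable as history until `expires_at`, if one is set at all.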

## Config

@@ -99,6 +107,11 @@ order, adding fields mid-struct, or removing fields **silently corrupts all exis
The `test_node_schema_golden` test in `redb_storage.rs` will fail immediately if the bincode
format changes without these steps being followed.

**Fields added in evolution (specs 09-10):**
- Node: `valid_from`, `valid_until`, `expires_at`, `embedding_model` (all Option, serde default None)
- Edge: `metadata` (HashMap, serde default empty)
- EdgeProvenance: `Custom { kind, detail }` variant

## Common pitfalls

- Port 9090 may conflict with existing services — override in `cortex.toml`