47 changes: 47 additions & 0 deletions docs/concepts/README.md
# What Is This?

You give it documents. It finds the ideas, connects them, and remembers what contradicts what.

## The 30-Second Version

Feed the system your documents - research papers, notes, articles, reports. It reads them and extracts the key ideas. Then it connects those ideas: this concept *supports* that one, this claim *contradicts* that one, this cause *leads to* that effect.

Unlike a search engine that just finds keywords, this system understands meaning. Ask "what causes inflation?" and it finds concepts related to inflation's causes - even if those exact words don't appear in your documents.

Unlike a chatbot that makes things up, every idea traces back to its source. You can always ask "where did this come from?" and get a real answer.

## The Real Point

This isn't just a search tool for humans. It's infrastructure for AI that can reason about what it knows.

Most AI "memory" is just similarity search - find things that look like what you asked for. This system tracks:

- **Grounding**: How well-supported is this idea? One source or twenty?
- **Contradiction**: Do sources disagree? Which ones?
- **Provenance**: Where exactly did this idea come from?

That's the foundation for AI that doesn't just retrieve information but *reasons about how reliable it is*.
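
To make that concrete, here is a minimal sketch of what an answer carrying all three could look like. Every name in it (`GroundedAnswer`, the field names, the file names) is hypothetical, not this system's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class GroundedAnswer:
    """Hypothetical shape of an answer: the idea plus its epistemic context."""
    concept: str                       # the idea itself
    grounding: float                   # -1.0 (contradicted) .. +1.0 (well-supported)
    sources: list[str] = field(default_factory=list)          # provenance
    contradicted_by: list[str] = field(default_factory=list)  # dissenting sources

answer = GroundedAnswer(
    concept="Rising interest rates reduce inflation",
    grounding=0.7,
    sources=["fed-minutes-2023.pdf", "macro-notes.md"],
    contradicted_by=["heterodox-review.pdf"],
)
```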

- **Current state:** AI assistants can query the system via standard protocols (MCP).
- **Future state:** The knowledge graph becomes part of how AI thinks, not just something it queries.

## What Can You Do With It?

**As a human:**
- Search your documents by meaning, not just keywords
- See how ideas connect across different sources
- Find where your sources contradict each other
- Trace any claim back to its origin

**As an AI agent:**
- Query persistent memory that survives across sessions
- Get grounded answers with confidence levels
- Reason about contradictions and uncertainty
- Build knowledge incrementally over time

## Next Steps

- [How It Works](how-it-works.md) - The conceptual model, with a few illustrative sketches
- [Glossary](glossary.md) - Terms explained in plain language
- [Using the System](../using/README.md) - Getting started as a user
- [Operating the System](../operating/README.md) - Deploying and maintaining
172 changes: 172 additions & 0 deletions docs/concepts/glossary.md
# Glossary

Terms used in this system, explained in plain language.

---

## Concept

An idea extracted from a document. Not a keyword - a meaningful unit of thought.

Examples:
- "Climate change increases extreme weather events"
- "The mitochondria is the powerhouse of the cell"
- "Napoleon was defeated at Waterloo in 1815"

Concepts can be claims, definitions, events, entities, or other types of ideas. Each concept has a grounding score indicating how well-supported it is.

---

## Relationship

A connection between two concepts. The system discovers how ideas relate to each other.

Common relationship types:
- **Supports**: One concept provides evidence for another
- **Contradicts**: Two concepts are in tension or conflict
- **Implies**: If one is true, the other follows
- **Causes**: One concept leads to another
- **Is Part Of**: One concept belongs to a larger whole
- **Is Example Of**: One concept illustrates another
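
A minimal sketch of how those types might be encoded. The enum and its member names mirror the list above but are illustrative, not the system's actual schema:

```python
from enum import Enum

class RelationshipType(Enum):
    """Illustrative encoding of the relationship types listed above."""
    SUPPORTS = "supports"            # provides evidence for
    CONTRADICTS = "contradicts"      # in tension or conflict
    IMPLIES = "implies"              # if one is true, the other follows
    CAUSES = "causes"                # leads to
    IS_PART_OF = "is_part_of"        # belongs to a larger whole
    IS_EXAMPLE_OF = "is_example_of"  # illustrates
```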

---

## Grounding

A measure of how well-supported a concept is. High grounding means many sources confirm the idea. Low grounding means few sources mention it.

Grounding considers:
- Number of sources mentioning the concept
- Whether sources agree or disagree
- Strength of the evidence in each source

A grounding score ranges from -1.0 (strongly contradicted) to +1.0 (strongly supported). Near zero means mixed or insufficient evidence.
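
As a rough illustration, a score with these properties can be built from support and contradiction counts. The formula below is an assumption made for this sketch, not the system's actual algorithm:

```python
def grounding_score(supporting: int, contradicting: int) -> float:
    """Toy grounding score in [-1.0, +1.0] (assumed formula, for illustration).

    Direction comes from net agreement; strength grows with the amount of
    evidence, so one source scores weaker than twenty saying the same thing.
    """
    total = supporting + contradicting
    if total == 0:
        return 0.0  # no evidence yet
    net = (supporting - contradicting) / total  # agree vs. disagree
    weight = total / (total + 3)                # more sources -> fuller strength
    return net * weight

print(grounding_score(1, 0))   # 0.25  - a single source: weak support
print(grounding_score(20, 0))  # ~0.87 - many agreeing sources: strong support
print(grounding_score(5, 5))   # 0.0   - mixed evidence: near zero
```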

---

## Source

A chunk of original text from a document. Sources are the evidence - they're what concepts are extracted from.

Each source preserves:
- The actual text
- Which document it came from
- Location information (for highlighting and reference)

When you want to verify a concept, you trace it back to its sources.

---

## Evidence

The link between a concept and a source. Evidence shows *which specific text* led to *which concept*.

Multiple sources can provide evidence for the same concept. When they do, the concept's grounding increases.

---

## Provenance

The chain of origin for any piece of knowledge. Provenance answers "where did this come from?"

For a concept, provenance traces:
Document → Chunk → Extraction → Concept

This matters because claims without provenance can't be verified.
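
A sketch of that chain as linked records, so "where did this come from?" becomes a walk backwards through the links. All type and field names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Document:
    path: str                # the original file

@dataclass
class Chunk:
    document: Document       # which document this text came from
    text: str

@dataclass
class Concept:
    label: str
    extracted_from: Chunk    # the extraction step's link back to the evidence

def provenance(c: Concept) -> str:
    """Walk Concept -> Chunk -> Document and report the chain."""
    return f"{c.label!r} <- {c.extracted_from.text[:40]!r} <- {c.extracted_from.document.path}"
```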

---

## Ontology

A collection of related knowledge. Think of it as a named knowledge base.

You might create separate ontologies for:
- "Research Papers"
- "Company Documentation"
- "Meeting Notes"

Ontologies can be queried separately or together. They help organize knowledge into meaningful collections.

---

## Epistemic Status

The reliability classification of knowledge. Describes whether something is well-established, contested, or uncertain.

Possible statuses:
- **Affirmative**: Well-supported, high confidence
- **Contested**: Sources disagree
- **Contradictory**: Strong evidence against
- **Insufficient Data**: Not enough sources to judge
- **Historical**: Considered accurate for its time period
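
One way a status could be derived from grounding and disagreement. The thresholds are invented for this sketch, and **Historical** is omitted because it needs temporal context that a score alone doesn't carry:

```python
from enum import Enum

class EpistemicStatus(Enum):
    AFFIRMATIVE = "affirmative"
    CONTESTED = "contested"
    CONTRADICTORY = "contradictory"
    INSUFFICIENT_DATA = "insufficient_data"

def classify(grounding: float, n_sources: int, sources_disagree: bool) -> EpistemicStatus:
    """Map evidence onto a status. All thresholds are illustrative."""
    if n_sources < 2:
        return EpistemicStatus.INSUFFICIENT_DATA  # too little to judge
    if grounding <= -0.3:
        return EpistemicStatus.CONTRADICTORY      # strong evidence against
    if sources_disagree:
        return EpistemicStatus.CONTESTED          # sources pull both ways
    if grounding >= 0.3:
        return EpistemicStatus.AFFIRMATIVE        # well-supported
    return EpistemicStatus.INSUFFICIENT_DATA      # weak signal either way

print(classify(0.8, 12, False))  # EpistemicStatus.AFFIRMATIVE
print(classify(0.1, 6, True))    # EpistemicStatus.CONTESTED
```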

---

## Semantic Search

Finding concepts by meaning, not just matching keywords.

Search for "economic downturn" and find concepts about recessions, market crashes, and financial crises - even if none use those exact words.

This works because concepts are compared by what they mean, not just what words they contain.
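
Under the hood, systems like this typically compare embedding vectors rather than words. A minimal sketch with made-up three-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors: closer to 1.0 = closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" - the numbers are invented for illustration.
query     = [0.9, 0.1, 0.3]   # "economic downturn"
recession = [0.8, 0.2, 0.35]  # "recession risk rising"
gardening = [0.1, 0.9, 0.0]   # "tomato planting season"

print(cosine_similarity(query, recession))  # ~0.99: related meaning
print(cosine_similarity(query, gardening))  # ~0.21: unrelated
```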

---

## Contradiction

When sources disagree. The system tracks contradictions rather than hiding them.

Example: One paper says "coffee prevents heart disease" while another says "coffee increases heart disease risk." Both concepts are stored with their sources, and the contradiction is noted.

This lets you (or an AI) reason about disagreements rather than pretending they don't exist.

---

## Ingestion

The process of adding documents to the system. During ingestion:
1. Documents are stored
2. Text is split into chunks
3. Concepts are extracted from each chunk
4. Relationships are discovered
5. Grounding is calculated

---

## MCP (Model Context Protocol)

A standard way for AI assistants to use external tools. This system provides MCP tools so AI agents like Claude can:
- Search concepts
- Explore relationships
- Query grounding
- Ingest new documents

This is how AI assistants gain persistent memory.
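
MCP runs over JSON-RPC, so a tool call is just a structured message. The envelope (`"method": "tools/call"` with `name` and `arguments`) comes from the MCP spec; the tool name `search_concepts` and its arguments are hypothetical:

```python
import json

# What an AI assistant's MCP client might send to this system's server.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",         # standard MCP method for invoking a tool
    "params": {
        "name": "search_concepts",  # hypothetical tool name
        "arguments": {"query": "what causes inflation?", "limit": 5},
    },
}
print(json.dumps(request, indent=2))
```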

---

## Chunk

A portion of a document, roughly page-sized. Documents are split into chunks for processing.

Chunks preserve context - they overlap slightly so ideas that span a page break aren't lost.
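
A minimal sliding-window chunker showing the overlap idea; the sizes are arbitrary:

```python
def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into windows that share `overlap` characters with their
    neighbors, so an idea spanning a boundary lands whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("x" * 5000)
print([len(p) for p in pieces])  # [2000, 2000, 1400] - adjacent pieces share 200 chars
```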

---

## Instance

A specific occurrence of a concept in a source. If the same concept appears in three documents, there are three instances but one concept.

Instances are the individual sightings. The concept is the aggregated understanding.

---

## Diversity Score

A measure of how broadly connected a concept is. High diversity means the concept connects to many different topics. Low diversity means it's narrowly focused.

Useful for finding concepts that bridge different domains.
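
A toy version: the fraction of a concept's connections that reach distinct topics. The topic labels and the normalization are invented for this sketch:

```python
def diversity_score(neighbor_topics: list[str]) -> float:
    """Fraction of a concept's connections that reach *distinct* topics.
    1.0 = every neighbor is in a different domain; low = narrowly focused."""
    if not neighbor_topics:
        return 0.0
    return len(set(neighbor_topics)) / len(neighbor_topics)

# "entropy" bridges domains; "tomato blight" stays in one.
print(diversity_score(["physics", "information theory", "ecology", "finance"]))  # 1.0
print(diversity_score(["gardening", "gardening", "gardening", "botany"]))        # 0.5
```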

---

Next: [Using the System](../using/README.md) - Getting started as a user
132 changes: 132 additions & 0 deletions docs/concepts/how-it-works.md
# How It Works

A conceptual overview. No implementation details - just the model, plus a few illustrative sketches.

## The Flow

```
Documents → Extraction → Connection → Grounding
```

### 1. Documents Go In

You provide documents: PDFs, text files, markdown, web pages. The system stores the original text so you can always go back to the source.

Documents are split into manageable chunks - roughly page-sized pieces that can be processed individually while preserving context.

### 2. Ideas Come Out

Each chunk is analyzed to extract the key ideas. Not keywords - *concepts*.

A concept is a meaningful unit of thought: "inflation reduces purchasing power" or "sleep deprivation impairs memory" or "the French Revolution began in 1789."

The extraction finds:
- What the concept is (the idea itself)
- What type it is (claim, definition, event, entity, etc.)
- How it relates to other concepts in the same chunk

### 3. Connections Form

Concepts don't exist in isolation. The system discovers relationships:

| Relationship | Meaning |
|--------------|---------|
| **Supports** | This concept provides evidence for that one |
| **Contradicts** | These concepts are in tension |
| **Implies** | If this is true, that follows |
| **Causes** | This leads to that |
| **Part of** | This belongs to a larger whole |

When a new concept matches one that already exists, they're merged. The connection grows stronger. When they conflict, both views are preserved with their sources.

### 4. Grounding Accumulates

As more documents come in, concepts gain *grounding* - a measure of how well-supported they are.

- A concept mentioned in one source has low grounding
- The same concept confirmed across many sources has high grounding
- A concept that some sources support and others contradict has mixed grounding

Grounding isn't just a count. It considers:
- How many sources mention the concept
- Whether sources agree or disagree
- The strength of the supporting evidence
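
In miniature, accumulation might look like a running update: each new source nudges the score, and an already well-evidenced concept moves less than a fresh one. The averaging rule is an assumption made for this sketch:

```python
def update_grounding(old: float, n_old: int, new_evidence: float) -> tuple[float, int]:
    """Fold one new source's evidence strength (-1.0..+1.0) into a running average."""
    n = n_old + 1
    return old + (new_evidence - old) / n, n

score, n = 0.0, 0
for strength in (0.8, 0.9, -0.6, 0.7):  # three supporting sources, one contradicting
    score, n = update_grounding(score, n, strength)
    print(f"after {n} sources: grounding = {score:+.2f}")
# After 4 sources the score sits at +0.45: supported overall,
# but visibly dented by the dissenting source.
```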

## What Gets Remembered

The system maintains five types of information:

### Concepts
The ideas themselves. Each concept has:
- A name or description
- A type (claim, entity, event, etc.)
- Grounding score (how well-supported)

### Relationships
How concepts connect. Each relationship has:
- Source concept and target concept
- Type (supports, contradicts, implies, etc.)
- Evidence for why this connection exists

### Sources
The original text chunks. Each source has:
- The actual text
- Which document it came from
- Where in the document (for highlighting)

### Evidence
The link between concepts and sources. Shows exactly which text led to which concept.

### Ontologies
Collections of related knowledge. You might have one ontology for "climate research" and another for "company policies." They can be queried separately or together.
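
The five types sketched as plain records. The field names are assumptions made for illustration, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    text: str               # the original chunk
    document: str           # which document it came from
    offset: int             # where in the document (for highlighting)

@dataclass
class Concept:
    label: str              # the idea itself
    kind: str               # "claim", "entity", "event", ...
    grounding: float = 0.0  # -1.0 .. +1.0

@dataclass
class Evidence:
    concept: Concept        # exactly which text
    source: Source          # led to which concept

@dataclass
class Relationship:
    subject: Concept
    object: Concept
    kind: str                                          # "supports", "contradicts", ...
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class Ontology:
    name: str
    concepts: list[Concept] = field(default_factory=list)
```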

## How Queries Work

When you search, you're not matching keywords. You're finding concepts similar in *meaning* to what you're looking for.

Ask about "economic downturn" and you'll find concepts about recessions, market crashes, and financial crises - even if none of them use the exact phrase "economic downturn."

Results include:
- The matching concepts
- Their grounding scores (how reliable)
- The sources they came from (where to verify)
- Related concepts (what else connects)

## How Contradiction Works

Traditional databases assume consistency - if two things conflict, one is wrong. This system assumes **reality is messy**.

When sources disagree, the system:
1. Keeps both viewpoints
2. Records which sources support which view
3. Notes that a contradiction exists
4. Lets you (or an AI) reason about the disagreement

This is crucial for:
- Research where experts disagree
- Historical documents with conflicting accounts
- Evolving knowledge where old information conflicts with new
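
A sketch of the bookkeeping, reusing the glossary's coffee example: both claims survive with their sources, and the tension itself becomes a record. All names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    sources: list[str]

pro = Claim("coffee reduces heart disease risk", ["cohort-study-2019.pdf"])
con = Claim("coffee increases heart disease risk", ["case-control-2001.pdf"])

# Neither claim is deleted or silently "resolved"; the disagreement itself
# is stored so a later query can surface it.
contradictions = [(pro, con)]

for a, b in contradictions:
    print(f"CONTESTED: {a.text!r} {a.sources} vs {b.text!r} {b.sources}")
```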

## The Epistemic Layer

*Epistemic* means "relating to knowledge." This system has an epistemic layer that most databases lack.

It doesn't just store *what* is claimed. It tracks:
- **Confidence**: How well-supported is this claim?
- **Controversy**: Do sources agree or disagree?
- **Provenance**: Where did this claim originate?
- **Freshness**: When was this last confirmed?

This matters because knowledge isn't certain. An AI using this system can say "this is well-established" vs "this is contested" vs "this comes from a single source and should be verified."

## What This Enables

For humans: Search that understands meaning. Sources that trace back. Contradictions made visible.

For AI agents: Memory that persists. Confidence that's grounded. Uncertainty that's explicit.

For both: Knowledge that accumulates over time without losing track of where it came from.

---

Next: [Glossary](glossary.md) - Terms explained in plain language