GBrain Infrastructure Layer

The shared foundation that all skills, recipes, and integrations build on.

Data Pipeline

INPUT (markdown files, git repo)
  ↓
FILE RESOLUTION (local → .redirect → .supabase → error)
  ↓
MARKDOWN PARSER (gray-matter frontmatter + body)
  → compiled_truth + timeline separation
  ↓
CONTENT HASH (SHA-256 idempotency check — skip if unchanged)
  ↓
CHUNKING (3 strategies, configurable)
  ├── Recursive: 300-word chunks, 50-word overlap, 5-level delimiter hierarchy
  ├── Semantic: embed sentences, cosine similarity, Savitzky-Golay smoothing
  └── LLM-guided: Claude Haiku identifies topic shifts in 128-word candidates
  ↓
EMBEDDING (OpenAI text-embedding-3-large, 1536 dimensions)
  → batch 100, exponential backoff, non-fatal if fails
  ↓
DATABASE TRANSACTION (atomic: page + chunks + tags + version)
  ↓
SEARCH (hybrid, available immediately)
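
The window-and-overlap arithmetic of the recursive strategy can be sketched in a few lines. This is a simplified illustration, not the code in `src/core/chunkers/recursive.ts`: it slides a fixed 300-word window with a 50-word overlap and ignores the 5-level delimiter hierarchy entirely. The function name `chunkByWords` and its parameters are hypothetical.

```typescript
// Simplified sketch: fixed word windows with overlap. The delimiter
// hierarchy used by the real recursive chunker is deliberately omitted.
function chunkByWords(text: string, chunkSize = 300, overlap = 50): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each window starts 250 words after the previous one
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

With the defaults, a 700-word document yields three chunks starting at words 0, 250, and 500, so each neighbouring pair shares 50 words.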

Search Architecture

GBrain uses Reciprocal Rank Fusion (RRF) to merge vector and keyword search:

User Query
  ↓
EXPANSION (optional: Claude Haiku generates 2 alternative phrasings)
  ↓
  ├── VECTOR SEARCH (pgvector HNSW, cosine distance)
  │     → 2x limit results per query variant
  │
  └── KEYWORD SEARCH (PostgreSQL tsvector, ts_rank)
        → 2x limit results
  ↓
RRF MERGE (score = Σ 1/(60 + rank), summed over every result list that contains the chunk)
  → rank-based, so vector and keyword scores need no normalization
  ↓
4-LAYER DEDUP
  ├── Best 3 chunks per page (source dedup)
  ├── Jaccard similarity > 0.85 (text dedup)
  ├── No type exceeds 60% (diversity)
  └── Max 2 chunks per page (page cap)
  ↓
TOP N RESULTS (default 20)
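
The merge and the text-dedup layer above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `src/core/search/hybrid.ts` or `src/core/search/dedup.ts`: ranks are taken to start at 1, Jaccard similarity is computed over lowercased whitespace-separated tokens, and the `Ranked` shape and all function names are hypothetical.

```typescript
// Hypothetical result shape for illustration only.
interface Ranked {
  id: string;   // chunk identifier
  text: string; // chunk text, used for the Jaccard check
}

interface Scored extends Ranked {
  score: number;
}

// Reciprocal Rank Fusion: every list that contains a chunk adds 1 / (k + rank),
// where rank is the 1-based position of the chunk in that list.
function rrfMerge(lists: Ranked[][], k = 60): Scored[] {
  const merged = new Map<string, Scored>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const contribution = 1 / (k + index + 1);
      const existing = merged.get(item.id);
      if (existing) {
        existing.score += contribution;
      } else {
        merged.set(item.id, { id: item.id, text: item.text, score: contribution });
      }
    });
  }
  return Array.from(merged.values()).sort((a, b) => b.score - a.score);
}

// Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const setA = tokens(a);
  const setB = tokens(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts count as identical
  let intersection = 0;
  setA.forEach((t) => {
    if (setB.has(t)) intersection += 1;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Keep a result only if no already-kept result exceeds the similarity threshold.
function dropNearDuplicates<T extends { text: string }>(results: T[], threshold = 0.85): T[] {
  const kept: T[] = [];
  for (const candidate of results) {
    if (!kept.some((k) => jaccard(k.text, candidate.text) > threshold)) {
      kept.push(candidate);
    }
  }
  return kept;
}
```

A chunk ranked first in one list and second in the other scores 1/61 + 1/62, which beats a chunk that appears first in only one list (1/61). Because the input to `dropNearDuplicates` is already sorted by score, the higher-scoring member of each near-duplicate pair is the one that survives.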

Key Components

| File | Purpose |
| --- | --- |
| `src/core/engine.ts` | Pluggable engine interface (BrainEngine) |
| `src/core/postgres-engine.ts` | Postgres + pgvector implementation |
| `src/core/import-file.ts` | importFromFile + importFromContent pipeline |
| `src/core/sync.ts` | Git-based incremental change detection |
| `src/core/markdown.ts` | YAML frontmatter + compiled_truth/timeline parsing |
| `src/core/embedding.ts` | OpenAI embedding with batch, retry, backoff |
| `src/core/chunkers/recursive.ts` | Base chunker (300w, 5-level delimiters) |
| `src/core/chunkers/semantic.ts` | Embedding-based topic boundary detection |
| `src/core/chunkers/llm.ts` | Claude Haiku guided chunking |
| `src/core/search/hybrid.ts` | RRF merge of vector + keyword |
| `src/core/search/dedup.ts` | 4-layer result deduplication |
| `src/core/search/expansion.ts` | Multi-query expansion via Claude Haiku |
| `src/core/storage.ts` | Pluggable storage (S3, Supabase, local) |
| `src/core/operations.ts` | Contract-first operation definitions (31 ops) |
| `src/schema.sql` | Full DDL (10 tables, RLS, tsvector, HNSW) |
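
The batch, retry, and backoff behaviour listed for `src/core/embedding.ts` might look roughly like the sketch below. It is an illustration, not the module itself: the embedding call is injected as a hypothetical `EmbedBatch` function rather than a real OpenAI client, and the retry count, base delay, and delay cap are assumed values. Only the batch size of 100 and the non-fatal failure mode come from the pipeline description.

```typescript
// Hypothetical signature: any function that turns a batch of texts into vectors.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Split items into consecutive batches of at most `size`.
function toBatches<T>(items: T[], size = 100): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
// The default values are assumptions made for this sketch.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns one entry per input text. A null entry marks a text whose batch
// failed after all retries, so the caller can still store the page.
async function embedAll(
  texts: string[],
  embedBatch: EmbedBatch,
  maxRetries = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<Array<number[] | null>> {
  const out: Array<number[] | null> = [];
  for (const batch of toBatches(texts, 100)) {
    let vectors: number[][] | null = null;
    for (let attempt = 0; attempt <= maxRetries && vectors === null; attempt++) {
      try {
        vectors = await embedBatch(batch);
      } catch {
        // Wait before the next attempt; after the last attempt, give up quietly.
        if (attempt < maxRetries) await wait(backoffDelayMs(attempt));
      }
    }
    for (let i = 0; i < batch.length; i++) {
      out.push(vectors?.[i] ?? null);
    }
  }
  return out;
}
```

Returning `null` instead of throwing is one way to model "non-fatal if fails": the import can commit the page and its chunks, and missing embeddings can be filled in by a later run.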

Schema Overview

10 tables in Postgres:

pages — slug (unique), type, title, compiled_truth, timeline, frontmatter (JSONB)
content_chunks — pgvector 1536-dim embedding, chunk_source (compiled_truth|timeline)
links — typed edges (knows, works_at, invested_in, founded, etc.)
tags — many-to-many page tagging
timeline_entries — structured events (date, source, summary, detail)
page_versions — snapshot history for diff/revert
raw_data — sidecar JSON from external APIs (preserves provenance)
files — binary attachments in storage backend
ingest_log — audit trail of import operations
config — brain-level settings (version, embedding model, chunk strategy)

Full-text search uses weighted tsvector: title (A), compiled_truth (B), timeline (C). Vector search uses HNSW index with cosine distance on content_chunks.embedding.
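
For reference, cosine distance as pgvector's `<=>` operator defines it is one minus cosine similarity. The helper below is an illustrative sketch only (the name `cosineDistance` is hypothetical); in GBrain the computation happens inside Postgres against the HNSW index, not in application code.

```typescript
// Cosine distance: 1 - (a · b) / (|a| * |b|).
// 0 means same direction, 1 means orthogonal, 2 means opposite.
// A zero-length vector has no direction, so the result is NaN in that case.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the measure depends only on direction, scaling an embedding by a positive constant does not change its distance to any other vector.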

The Thin Harness Principle

GBrain is the deterministic layer. Skills and recipes are the latent space layer.

See Thin Harness, Fat Skills for the full architecture philosophy.

GBrain CLI = thin harness (same input → same output)
Skills (ingest, query, maintain, enrich, briefing, migrate, setup) = fat skills
Recipes (voice-to-brain, email-to-brain) = fat skills that install infrastructure

The agent reads the skill/recipe and uses GBrain's deterministic tools to do the work.
