Skip to content

Database Schema

Mundo edited this page Apr 8, 2026 · 2 revisions

Database Schema

Overview

Oh My Workers uses PostgreSQL (hosted on Neon) with the native pg driver — no ORM. All queries use parameterized statements.

Two database connections:

Connection Purpose
DATABASE_URL Main app data (kpi, diary, cleanup_log, github_trending)
COMPANY_DB_URL External company database for cleanup operations

Tables

kpi

Daily GitHub activity records combined with manual input.

CREATE TABLE IF NOT EXISTS kpi (
  id             SERIAL PRIMARY KEY,
  github_summary TEXT    NOT NULL,
  commits_count  INTEGER NOT NULL DEFAULT 0,
  prs_count      INTEGER NOT NULL DEFAULT 0,
  activities     TEXT[]  NOT NULL DEFAULT '{}',
  created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

diary

AI-generated daily KPI reports.

CREATE TABLE IF NOT EXISTS diary (
  id         SERIAL PRIMARY KEY,
  content    TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

cleanup_log

History of company database cleanup operations.

CREATE TABLE IF NOT EXISTS cleanup_log (
  id            SERIAL PRIMARY KEY,
  company_table TEXT    NOT NULL,
  deleted_count INTEGER NOT NULL DEFAULT 0,
  failed_count  INTEGER NOT NULL DEFAULT 0,
  status        TEXT    NOT NULL,
  errors        TEXT[]  NOT NULL DEFAULT '{}',
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

github_trending

Curated daily GitHub trending repos with tags for future vector search.

CREATE TABLE IF NOT EXISTS github_trending (
  id          SERIAL PRIMARY KEY,
  repo_name   TEXT    NOT NULL,
  url         TEXT    NOT NULL,
  description TEXT    NOT NULL DEFAULT '',
  language    TEXT    NOT NULL DEFAULT '',
  stars       INTEGER NOT NULL DEFAULT 0,
  today_stars INTEGER NOT NULL DEFAULT 0,
  summary     TEXT    NOT NULL DEFAULT '',
  tags        TEXT[]  NOT NULL DEFAULT '{}',
  sent        BOOLEAN NOT NULL DEFAULT false,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

The repo_name column (e.g. vercel/ai) is used for deduplication — repos sent in the last 7 days are filtered out before curation.

Initialization

Run once to create all tables:

pnpm run setup

Useful Queries

-- Recent trending repos
SELECT repo_name, stars, today_stars, tags FROM github_trending ORDER BY created_at DESC LIMIT 10;

-- Today's KPI
SELECT * FROM kpi WHERE created_at::date = CURRENT_DATE;

-- Cleanup history
SELECT * FROM cleanup_log ORDER BY created_at DESC LIMIT 5;

-- All diary entries this week
SELECT * FROM diary WHERE created_at >= NOW() - INTERVAL '7 days';

-- Find repos by tag
SELECT repo_name, summary FROM github_trending WHERE 'ai' = ANY(tags);

Future: Vector Support

Neon supports the pgvector extension. When ready to add semantic search:

CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE github_trending ADD COLUMN embedding vector(1536);

Then query by similarity:

SELECT repo_name, summary, tags
FROM github_trending
ORDER BY embedding <=> $1
LIMIT 10;

Tags enable hybrid search — combine vector similarity with tag-based filtering.

Clone this wiki locally