
Commit 2b94856

feat: add BM25 + semantic search artifacts

1 parent 21cf58c · commit 2b94856

8 files changed

Lines changed: 496 additions & 48 deletions


.gitignore

Lines changed: 4 additions & 4 deletions

````diff
@@ -12,11 +12,11 @@ build/
 .next/
 out/
 
-# SQL Cache
+# SQL Cache (local/CI)
 .cache/
-*.db
-*.db-shm
-*.db-wal
+.cache/*.db
+.cache/*.db-shm
+.cache/*.db-wal
 
 # OS / Editor
 .DS_Store
````

CONTRIBUTING.md

Lines changed: 158 additions & 0 deletions

````diff
@@ -0,0 +1,158 @@
+# Contributing to Frame Codex
+
+Welcome! This document is a high-level, human-friendly guide for contributing to **Frame Codex**.
+For the full, step-by-step workflow, see [`docs/contributing/how-to-submit.md`](docs/contributing/how-to-submit.md).
+
+---
+
+## 1. What is Frame Codex?
+
+Frame Codex is a **data-only, markdown-first knowledge fabric** designed for AI systems:
+
+- **Fabric** – The entire repository: a collection of weaves.
+- **Weave** – Top-level knowledge universe (e.g. `weaves/frame/`, `weaves/wiki/`).
+- **Loom** – Any subdirectory inside a weave (topic/module, inferred from folders).
+- **Strand** – Individual markdown file at any depth inside a weave (atomic unit).
+
+There is **no UI** in this repo. The primary viewer lives at:
+
+- **Frame.dev Codex UI**: <https://frame.dev/codex>
+- **Repo**: <https://github.com/framersai/frame.dev> (app that renders this Codex)
+
+---
+
+## 2. Recommended Contribution Path (Frame.dev UI)
+
+The easiest way to contribute is **inside the Codex UI** at [`frame.dev/codex`](https://frame.dev/codex):
+
+1. Open the Codex viewer and browse to the area you want to extend.
+2. Click **“Contribute”** in the toolbar (or use the contribution hotkey).
+3. Fill out:
+   - **Title** (required)
+   - **Summary** (20–300 chars, required)
+   - **Content** (markdown, required)
+   - **Weave + Loom** (optional – UI can suggest based on current path/content)
+   - **Tags, Difficulty, Subjects, Topics** (optional but encouraged)
+4. (Optional) Toggle **AI Enhancement** to let the pipeline refine tags and metadata.
+5. (Optional) Provide a **GitHub Personal Access Token (PAT)** to let the browser create a PR via API.
+6. Preview the generated markdown + frontmatter.
+7. Click **“Create Pull Request”**:
+   - With PAT → a PR is created via the GitHub API from your fork.
+   - Without PAT → GitHub’s web editor opens with the file pre-filled.
+
````
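Step 6 previews markdown plus frontmatter built from the step 3 fields. The sketch below shows one plausible way to validate those fields and render the preview; it is not the Codex UI's code. `buildStrand` and the output path layout (`weaves/<weave>/<loom>/<slug>.md`) are hypothetical, the frontmatter keys follow the metadata list in the Quality Guidelines section, and the authoritative schema remains `docs/contributing/submission-schema.md`.

```javascript
// Hypothetical sketch: render contribution-form fields as a strand file.
// Frontmatter keys follow this guide's metadata list; the authoritative
// schema is docs/contributing/submission-schema.md.

// YAML 1.2 is a superset of JSON, so JSON-quoted strings are valid YAML scalars.
const q = (value) => JSON.stringify(String(value));

function yamlList(key, values) {
  if (!values || values.length === 0) return [];
  return [`${key}:`, ...values.map((v) => `  - ${q(v)}`)];
}

function buildStrand(form) {
  const title = (form.title || "").trim();
  const summary = (form.summary || "").trim();
  const content = (form.content || "").trim();

  // Required fields, mirroring step 3 of the contribution modal.
  const errors = [];
  if (!title) errors.push("title is required");
  if (summary.length < 20 || summary.length > 300) {
    errors.push("summary must be 20-300 characters");
  }
  if (!content) errors.push("content is required");
  if (errors.length > 0) return { ok: false, errors };

  const frontmatter = [
    "---",
    `title: ${q(title)}`,
    `summary: ${q(summary)}`,
    ...(form.difficulty ? [`difficulty: ${q(form.difficulty)}`] : []),
    ...yamlList("tags", form.tags),
    ...yamlList("subjects", form.subjects),
    ...yamlList("topics", form.topics),
    "---",
  ].join("\n");

  // Weave/loom are optional in the modal (the UI may suggest them), so the
  // path stays null until a weave is chosen.
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  const dir = form.weave ? ["weaves", form.weave, form.loom].filter(Boolean).join("/") : null;

  return {
    ok: true,
    path: dir ? `${dir}/${slug}.md` : null,
    markdown: `${frontmatter}\n\n${content}\n`,
  };
}
```

Quoting every value through `JSON.stringify` keeps titles containing colons or `#` from breaking the YAML, at the cost of slightly noisier frontmatter.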
````diff
+### GitHub PAT Privacy
+
+- PAT is entered **only** into the Codex contribution modal in your browser.
+- It is stored **only in memory** in that tab while the modal is open.
+- It is **never** written to:
+  - `localStorage`
+  - `sessionStorage`
+  - IndexedDB / SQL cache
+  - Any Frame.dev backend
+- The token is sent **only** to GitHub’s API endpoints for:
+  - Forking `framersai/codex`
+  - Creating a branch + file
+  - Opening a pull request
+
+You can always skip the PAT and use the GitHub web editor flow instead.
+
+---
````
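The three API operations listed above correspond to standard GitHub REST endpoints (create a fork, create a git ref, create or update file contents, create a pull request). The following is a hedged sketch of that flow, not Frame.dev's implementation: `openStrandPr` is a hypothetical name, the base branch is assumed to be `main`, and `fetch` is injectable so the logic can be exercised without a network. It also skips details a real client needs, such as waiting for GitHub's asynchronous fork creation to finish before creating the branch.

```javascript
// Hypothetical sketch of the PAT-based PR flow using GitHub's REST API.
// The token only lives in this function's arguments and closure; nothing
// is persisted.
const API = "https://api.github.com";
const UPSTREAM = { owner: "framersai", repo: "codex", base: "main" };

// Base64-encode UTF-8 text (the contents API expects base64 file bodies).
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

async function openStrandPr({ token, path, markdown, title, branch }, fetchImpl = fetch) {
  const call = async (method, route, body) => {
    const res = await fetchImpl(`${API}${route}`, {
      method,
      headers: {
        Authorization: `Bearer ${token}`, // sent only to api.github.com
        Accept: "application/vnd.github+json",
        ...(body ? { "Content-Type": "application/json" } : {}),
      },
      body: body ? JSON.stringify(body) : undefined,
    });
    if (!res.ok) throw new Error(`${method} ${route} failed: ${res.status}`);
    return res.json();
  };

  // 1. Ask GitHub to fork framersai/codex (fork creation is asynchronous).
  const fork = await call("POST", `/repos/${UPSTREAM.owner}/${UPSTREAM.repo}/forks`);
  const forkRepo = `/repos/${fork.owner.login}/${fork.name}`;

  // 2. Create a branch in the fork, starting from its base branch.
  const baseRef = await call("GET", `${forkRepo}/git/ref/heads/${UPSTREAM.base}`);
  await call("POST", `${forkRepo}/git/refs`, {
    ref: `refs/heads/${branch}`,
    sha: baseRef.object.sha,
  });

  // 3. Commit the strand file onto that branch.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  await call("PUT", `${forkRepo}/contents/${encodedPath}`, {
    message: `feat: add ${title}`,
    content: toBase64Utf8(markdown),
    branch,
  });

  // 4. Open the pull request against upstream.
  const pr = await call("POST", `/repos/${UPSTREAM.owner}/${UPSTREAM.repo}/pulls`, {
    title,
    head: `${fork.owner.login}:${branch}`,
    base: UPSTREAM.base,
  });
  return pr.html_url;
}
```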
````diff
+
+## 3. Manual Git Workflow (Advanced)
+
+If you prefer the CLI, follow the traditional flow:
+
+1. **Fork & Clone**
+
+   ```bash
+   gh repo fork framersai/codex --clone
+   cd codex
+   ```
+
+2. **Create a Branch**
+
+   ```bash
+   git checkout -b add-my-content
+   ```
+
+3. **Add Content**
+
+   - Place files under a weave:
+
+     ```text
+     weaves/
+       [weave]/                # e.g. frame/, wiki/, technology/
+         weave.yaml
+         overview.md           # Strand at weave root
+         guides/               # Loom (folder inferred from path)
+           loom.yaml           # Optional
+           intro.md            # Strand
+           deep-dive/notes.md  # Nested loom/strand
+     ```
+
+   - See [`docs/contributing/submission-schema.md`](docs/contributing/submission-schema.md) for required fields.
+
+4. **Validate & Index**
+
+   ```bash
+   npm install
+   npm run validate
+   npm run index -- --validate
+   ```
+
+5. **Commit & Push**
+
+   ```bash
+   git add .
+   git commit -m "feat: add [your content title]"
+   git push origin add-my-content
+   ```
+
+6. **Open a PR**
+
+   ```bash
+   gh pr create --title "Add: [Your Content Title]" --body "Description of your contribution"
+   ```
+
+---
````
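Since looms are inferred from folders, the weave, loom, and strand for any file in the tree above can be read off its path. A minimal sketch, assuming repo-relative paths and that only `.md` files are strands; `classifyPath` is a hypothetical helper, not part of the Codex tooling.

```javascript
// Hypothetical helper: derive weave/loom/strand from a repo-relative path,
// following this guide's definitions (weave = first folder under weaves/,
// loom = any subdirectory inside it, strand = any markdown file in a weave).
function classifyPath(path) {
  const parts = path.split("/").filter(Boolean);
  // Must sit inside weaves/<weave>/...
  if (parts[0] !== "weaves" || parts.length < 3) return null;

  const file = parts[parts.length - 1];
  // weave.yaml, loom.yaml and other non-markdown files are not strands.
  if (!file.endsWith(".md")) return null;

  // Folders between the weave and the file form the (possibly nested) loom.
  const loomPath = parts.slice(2, -1);
  return {
    weave: parts[1],
    loom: loomPath.length > 0 ? loomPath.join("/") : null, // null = weave root
    strand: file.replace(/\.md$/, ""),
    depth: loomPath.length,
  };
}
```

For example, `weaves/frame/guides/deep-dive/notes.md` classifies as weave `frame`, loom `guides/deep-dive`, strand `notes`, while `weaves/frame/guides/loom.yaml` returns `null`.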
````diff
+
+## 4. Using the Frame.dev UI with Your Own Codex Repo
+
+You can reuse the Frame.dev Codex UI to render **any** Codex-style repository:
+
+1. **Fork `framersai/frame.dev`**.
+2. In your fork, create `.env.local` under `apps/frame.dev`:
+
+   ```bash
+   NEXT_PUBLIC_CODEX_REPO_OWNER=your-github-username-or-org
+   NEXT_PUBLIC_CODEX_REPO_NAME=your-codex-repo
+   NEXT_PUBLIC_CODEX_REPO_BRANCH=main
+   ```
+
+3. The viewer reads these in `components/codex/constants.ts` via `REPO_CONFIG`.
+4. Run the UI locally:
+
+   ```bash
+   cd apps/frame.dev
+   pnpm install
+   pnpm dev
+   ```
+
+5. Visit `http://localhost:3000/codex` to browse your own fabric.
+
+---
+
+## 5. Quality Guidelines
+
+- **Content**: Clear, self-contained strands with real knowledge (no placeholders).
+- **Metadata**: Fill in `title`, `summary`, `tags`, `difficulty`, `subjects`, `topics`.
+- **Structure**: Use headings, lists, and code blocks for readability.
+- **Licensing**: Contributions must be compatible with **CC-BY-4.0**.
+
+For full details, examples, and schema reference, see:
+
+- [`docs/contributing/how-to-submit.md`](docs/contributing/how-to-submit.md)
+- [`docs/contributing/submission-schema.md`](docs/contributing/submission-schema.md)
+- [`docs/openstrand-architecture.md`](docs/openstrand-architecture.md)
+
+
````

README.md

Lines changed: 53 additions & 7 deletions

````diff
@@ -37,7 +37,7 @@ Frame Codex is a data-only knowledge repository designed to be the canonical sou
 - **Frame Codex**: Public markdown knowledge repository (this repo) - read-only, curated, version-controlled
 - **OpenStrand**: Full personal knowledge management platform at [openstrand.ai](https://openstrand.ai) - supports any file type (images, videos, PDFs, code), AI analysis, serialization to markdown, private workspaces, and advanced features
 
-**Schema**: Frame Codex follows the [OpenStrand schema specification](https://openstrand.ai/docs/schema) for weaves, looms, and strands. Looms are now inferred from folders (no `looms/` prefix required) and strands are any markdown files within a weave.
+**Schema**: Frame Codex follows the [OpenStrand schema specification](https://openstrand.ai/docs/schema) for weaves, looms, and strands. Looms are now inferred from folders (no `looms/` or `strands/` prefixes required) and strands are any markdown files within a weave.
 
 ## 🔄 Automated Indexing Workflow
 
````
````diff
@@ -132,11 +132,11 @@ Frame Codex uses [@framers/sql-storage-adapter](https://github.com/framersai/sql
 - Reduces indexing time from ~30s to ~2-5s on typical PRs (85-95% speedup)
 - Cache persists across workflow runs via GitHub Actions cache
 
-**Browser (IndexedDB):**
-- Caches fetched index data locally for offline access
-- Progressive sync with ETag-based updates
-- Instant repeat loads, no network requests for cached content
-- Quota: 50MB-1GB+ depending on browser
+**Browser (IndexedDB via Frame.dev Codex UI):**
+- Caches fetched Codex strands locally for faster reloads
+- SQL-backed cache lives entirely in your browser (IndexedDB/sql.js), never on Frame.dev servers
+- No secrets or tokens are ever stored in this cache—only public markdown content
+- Quota: 50MB-1GB+ depending on browser
 
 **Performance:**
 - First run: ~30s (full analysis, populates cache)
````
````diff
@@ -148,6 +148,23 @@ Frame Codex uses [@framers/sql-storage-adapter](https://github.com/framersai/sql
 SQL_CACHE_DISABLED=true # Disable SQL caching (falls back to full indexing)
 ```
 
+### Search Data (BM25 + Semantic Embeddings)
+
+After building the main index, generate the search artifacts consumed by `frame.dev/codex`:
+
+```bash
+npm run index # builds codex-index.json
+npm run build:search # builds codex-search.json (BM25 + MiniLM embeddings)
+```
+
+`codex-search.json` contains:
+
+- **BM25 postings** for every token (term frequency per strand)
+- **Document metadata** (path, title, summary, weave/loom, doc length)
+- **Packed Float32 embeddings** (MiniLM-L6-v2, mean pooled, normalized) stored as base64
+
+These assets are completely static, so they can be hosted on GitHub Pages or any CDN. Frame.dev downloads them once and performs all ranking + semantic re-ranking in the browser (no server calls, no API keys).
+
 ## Repository Structure
 
 ```
````
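The artifact description is enough to sketch how a browser client could rank against it. This is an illustration under assumptions, not the format `build:search` actually writes: the field names (`postings`, `docs[i].len`, `docs[i].path`), the posting shape `term → [[docIndex, termFrequency], ...]`, and a single base64 string of concatenated Float32 vectors are all hypothetical. It scores candidates with standard BM25 (Lucene-style IDF, common defaults k1 = 1.2 and b = 0.75), then re-ranks the top hits with a linear blend of normalized BM25 and cosine similarity. Because the stored vectors are normalized, a dot product equals the cosine. The query vector is assumed to come from the same MiniLM model, and the blend itself is one common choice rather than Frame.dev's documented rule.

```javascript
// Sketch of BM25 ranking + semantic re-ranking over a codex-search.json-like
// object. Field names (postings, docs[i].len, docs[i].path) are assumptions.

// Decode base64 -> Float32 vectors. Float32Array uses platform byte order,
// which is little-endian on practically all browsers' hardware.
function decodeEmbeddings(base64, dim) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const floats = new Float32Array(bytes.buffer);
  const vectors = [];
  for (let i = 0; i < floats.length; i += dim) vectors.push(floats.subarray(i, i + dim));
  return vectors;
}

// Okapi BM25 with Lucene-style (non-negative) IDF.
function bm25(index, queryTokens, k1 = 1.2, b = 0.75) {
  const N = index.docs.length;
  const avgdl = index.docs.reduce((sum, d) => sum + d.len, 0) / N;
  const scores = new Map();
  for (const term of queryTokens) {
    const postings = index.postings[term];
    if (!postings) continue;
    const idf = Math.log(1 + (N - postings.length + 0.5) / (postings.length + 0.5));
    for (const [doc, tf] of postings) {
      const dl = index.docs[doc].len;
      const s = (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * dl) / avgdl));
      scores.set(doc, (scores.get(doc) || 0) + s);
    }
  }
  return scores;
}

function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Keep the top BM25 candidates, then re-rank by
// alpha * (bm25 / maxBm25) + (1 - alpha) * cosine(doc, query).
function search(index, vectors, queryTokens, queryVec, { topK = 20, alpha = 0.5 } = {}) {
  const top = [...bm25(index, queryTokens)].sort((x, y) => y[1] - x[1]).slice(0, topK);
  if (top.length === 0) return [];
  const max = top[0][1];
  return top
    .map(([doc, s]) => ({
      path: index.docs[doc].path,
      score: alpha * (s / max) + (1 - alpha) * dot(vectors[doc], queryVec),
    }))
    .sort((x, y) => y.score - x.score);
}
```

Re-ranking only the BM25 top-k keeps the per-query cost proportional to `topK` dot products instead of one per strand; `alpha = 1` reduces to pure lexical ranking.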
````diff
@@ -188,11 +205,40 @@ Frame.dev and OpenStrand consume this content via:
 ```javascript
 // Example: Fetch a strand (file at any depth inside a weave)
 const response = await fetch(
-  'https://raw.githubusercontent.com/framersai/codex/main/weaves/frame/openstrand/architecture.md'
+  'https://raw.githubusercontent.com/framersai/codex/main/weaves/frame/overview.md'
 );
 const content = await response.text();
 ```
 
+### Using Frame.dev as the Codex Viewer
+
+The primary UI for browsing Frame Codex lives at [`https://frame.dev/codex`](https://frame.dev/codex):
+
+- **Browse**: Tree + outline view with loom/strand badges
+- **Search**: NLP-enhanced client-side search (names + content, typo-tolerant)
+- **Bookmarks & History**: Stored locally in your browser
+- **Contribution Modal**: AI-assisted PR creation with optional GitHub PAT
+
+> Privacy: The Frame.dev Codex UI stores bookmarks, history, preferences, and SQL cache **only in your browser**.
+> GitHub Personal Access Tokens (PATs), if you choose to provide one, are held only in memory while the contribution modal is open and are sent directly to GitHub—never to any Frame.dev backend and never written to localStorage/IndexedDB/SQL.
+
+### Pointing Frame Codex UI at Your Own Repository
+
+You can reuse the Frame.dev Codex viewer to render *any* GitHub-hosted Codex-style repository:
+
+1. **Fork `framersai/frame.dev`**
+2. In the `apps/frame.dev` app, configure the Codex repo via environment variables:
+
+   ```bash
+   # .env.local
+   NEXT_PUBLIC_CODEX_REPO_OWNER=your-github-username-or-org
+   NEXT_PUBLIC_CODEX_REPO_NAME=your-codex-repo
+   NEXT_PUBLIC_CODEX_REPO_BRANCH=main
+   ```
+
+3. The viewer reads these in `components/codex/constants.ts` (`REPO_CONFIG`) and will render your repository instead of `framersai/codex`.
+4. Deploy your fork (e.g., Vercel, Netlify, GitHub Pages) and you now have a hosted Codex UI for your own knowledge fabric.
+
 ### Building the Index
 
 ```bash
````
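The diff only states that `components/codex/constants.ts` reads the three `NEXT_PUBLIC_CODEX_REPO_*` variables into `REPO_CONFIG`; that file's contents are not shown here. As a hedged illustration, a config like that could fall back to `framersai/codex` on `main` and be used to build the same raw.githubusercontent.com URL as the fetch example in the first hunk of this file. `repoConfigFromEnv`, `rawUrl`, and `fetchStrand` are hypothetical names.

```javascript
// Hypothetical sketch of a REPO_CONFIG-style object built from env vars.
// The env is passed in as a plain object so the logic is testable anywhere.
function repoConfigFromEnv(env) {
  return {
    owner: env.NEXT_PUBLIC_CODEX_REPO_OWNER || "framersai",
    name: env.NEXT_PUBLIC_CODEX_REPO_NAME || "codex",
    branch: env.NEXT_PUBLIC_CODEX_REPO_BRANCH || "main",
  };
}

// Build a raw.githubusercontent.com URL for a strand path inside the repo,
// percent-encoding each path segment but keeping the slashes.
function rawUrl(config, strandPath) {
  const path = strandPath.split("/").map(encodeURIComponent).join("/");
  return `https://raw.githubusercontent.com/${config.owner}/${config.name}/${config.branch}/${path}`;
}

// Fetch a strand's markdown; fetchImpl is injectable for testing.
async function fetchStrand(config, strandPath, fetchImpl = fetch) {
  const res = await fetchImpl(rawUrl(config, strandPath));
  if (!res.ok) throw new Error(`Failed to fetch ${strandPath}: ${res.status}`);
  return res.text();
}
```

In a Next.js client bundle, `NEXT_PUBLIC_*` values are inlined at build time only where they are referenced literally, so a real call site would pass `{ NEXT_PUBLIC_CODEX_REPO_OWNER: process.env.NEXT_PUBLIC_CODEX_REPO_OWNER, ... }` rather than `process.env` itself.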
