Compass combines two complementary experiences:
- Granular topic exploration - A FastAPI + Python pipeline embeds a Kaggle NYT dataset, clusters it into 40 coarse and 200 fine topics, and renders an interactive D3 map so you can zoom from broad beats (e.g., Health) into subtopics (e.g., Stem Cell Therapy). Search and paste-in-text features reuse the same embeddings so you can see where your ideas land inside the knowledge map.
- Document intelligence + insights - A Node + Express backend handles PDF/text uploads, runs Claude embeddings, finds nearest NYT coverage, and asks Claude for tightly scoped insights and citations. The React force graph (react-force-graph-2d) visualizes those neighbors alongside your document for deeper analysis.
Instead of scrolling feeds, you can visually explore how ideas relate, drill down into subtopics, or drop in your own writing to see related reporting and generate citations.
- Accelerate research. Coarse/fine topic hierarchies make it easy to jump from a broad beat to specific threads in seconds.
- Power student projects. Upload a paper draft or summary and instantly discover matching NYT coverage and suggested citations.
- Zoom into concepts, not noise. When you enter a cluster the surrounding map fades, keeping focus on that topic's inner structure.
- 📍 Hierarchical topic map – 40 coarse clusters and 200 fine clusters rendered with D3, complete with exploded “subtopic orbits” for legibility.
- 🔍 Semantic search – SentenceTransformer embeddings let you query ideas, not just keywords.
- ✍️ Document + PDF drop-in – Paste text or upload a PDF/doc and the backend places it inside the nearest fine cluster, listing the most similar NYT articles.
- 🏷️ AI-assisted naming – OpenAI refines auto-generated cluster names so labels sound like real research topics instead of raw metadata.
- 🧾 APA citations on demand – Every article in the sidebar has a Cite button that asks GPT-4o mini for a clean APA reference you can drop into a paper.
- 🧠 Research-first UX – Focus mode hides unrelated bubbles, tooltips summarize subtopics, and the sidebar pivots between search, cluster deep dives, and upload insights.
- Hierarchical topic map - D3 renders exploded "subtopic orbits" for legibility when you zoom into fine clusters.
- Semantic search - SentenceTransformer embeddings let you query ideas instead of raw keywords.
- Document drop-in - Paste text or upload a PDF, place it inside the nearest fine cluster, and list the most similar NYT articles.
- Research-first UX - Focus mode hides unrelated bubbles, tooltips summarize subtopics, and the sidebar pivots between search, cluster deep dives, upload results, and your force graph insights.
- Accelerate research. Surfacing coarse → fine topic hierarchies makes it easy to jump from “health” to “stem cell therapy” in seconds.
- Power student projects. Upload a paper draft or reading summary and instantly discover the closest coverage, citations, and context.
- Zoom into concepts, not noise. When you enter a cluster the surrounding map fades, keeping your attention on that topic’s inner structure.
- Backend: FastAPI, sentence-transformers, scikit-learn (MiniBatchKMeans, t-SNE), pandas/joblib for preprocessing and persistence.
- Frontend: React + Vite, custom D3 zoom/drag rendering, Tailwind-inspired styling in plain CSS.
- Data: Kaggle NYT article sample embedded and clustered offline via build_index.py.
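To make the coarse/fine hierarchy concrete, here is a minimal sketch of how a new embedding could be placed once the offline clustering exists. It assumes each fine centroid records a coarse parent id. The names `FINE_CENTROIDS`, `FINE_TO_COARSE`, and `place_embedding` are hypothetical and not taken from build_index.py, and the toy 2-D points stand in for real sentence-transformers vectors.

```python
import math

# Hypothetical toy index: three fine centroids in 2-D, each with a coarse parent.
FINE_CENTROIDS = {
    0: (0.0, 0.0),
    1: (1.0, 1.0),
    2: (5.0, 5.0),
}
FINE_TO_COARSE = {0: 0, 1: 0, 2: 1}


def place_embedding(vector):
    """Return (fine_id, coarse_id) for the fine centroid closest to `vector`."""
    fine_id = min(
        FINE_CENTROIDS,
        key=lambda cid: math.dist(vector, FINE_CENTROIDS[cid]),
    )
    return fine_id, FINE_TO_COARSE[fine_id]
```

With this layout, zooming from a broad beat into a subtopic amounts to following the fine-to-coarse mapping in reverse.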
backend/ # Node service (upload, embed, nearest, analyze) + scripts/utilities
backend/api.py # FastAPI entry point (map/search/upload endpoints)
frontend/ # Vite + React + Tailwind + react-force-graph + D3 map UI
You now run two servers side-by-side. The Vite dev server proxies /api/* to FastAPI on port 8000 and everything else to the Node backend on port 3000.
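The routing rule itself is small enough to sketch. The snippet below is an illustration in Python, not the actual Vite proxy configuration; `proxy_target` and the two constants are hypothetical names, while the ports come from the sentence above.

```python
FASTAPI_TARGET = "http://localhost:8000"  # FastAPI, per the sentence above
NODE_TARGET = "http://localhost:3000"     # Node + Express, per the sentence above


def proxy_target(path: str) -> str:
    """Pick the upstream for a request path, mirroring the /api/* split."""
    # Match "/api" exactly or as a path prefix, so "/apiary" is not captured.
    if path == "/api" or path.startswith("/api/"):
        return FASTAPI_TARGET
    return NODE_TARGET
```

A production reverse proxy would need to express the same prefix test in its own configuration language.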
cd backend
python -m venv .venv
source .venv/bin/activate # or .\.venv\Scripts\activate on Windows
pip install -r requirements.txt
python build_index.py # one-time: builds embeddings, clusters, & 2D layout
uvicorn api:app --reload --port 8000
Note: Set both OPENAI_API_KEY and API_ACCESS_TOKEN in your environment (or a .env) before running build_index.py or the API. The OpenAI key enables AI naming + citation generation; the shared access token locks down every /api/* route (clients must send it via X-Compass-Key). Optional rate-limiting knobs (FACULTY_RATE_LIMIT, UPLOAD_RATE_LIMIT, CITATION_RATE_LIMIT, etc.) are also read from the environment. If the OpenAI key is missing, the app falls back to heuristic labels and manual citations.
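As a rough illustration of the shared-token check, the sketch below compares an incoming X-Compass-Key header with the configured token. It is an assumption about how such a check could look, not the code in api.py. The function name `is_authorized` is hypothetical, and rejecting requests when no token is configured is a design choice made here for safety.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Check the X-Compass-Key header against the shared access token."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-compass-key", "")
    if not expected_token:
        # Assumption: with no token configured, reject rather than allow.
        return False
    # compare_digest takes time independent of where the first mismatch occurs.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

In a real service the expected token would be read from API_ACCESS_TOKEN once at startup and passed in.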
build_index.py expects the Kaggle NYT CSV + embeddings referenced in config.py. It creates backend/data/index.pkl, which FastAPI loads on startup to serve the following endpoints (all of them honor API_ACCESS_TOKEN and the rate limits configured via FACULTY_RATE_LIMIT, UPLOAD_RATE_LIMIT, CITATION_RATE_LIMIT, etc.):
- GET /api/map - coarse/fine cluster geometry + bounds
- GET /api/fine_cluster/:id - fine cluster metadata + article coordinates
- GET /api/search?q=... - semantic search over the embedded articles
- POST /api/upload - accept raw text, embed it with sentence-transformers, return top neighbors plus the closest fine cluster
- GET /api/faculty?q=topic - scrapes UVA People Search for faculty whose bios mention the provided topic keywords (used to surface related experts in the sidebar)
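For a quick check outside the frontend, a client call to the search endpoint could be assembled as follows. This is a minimal sketch using only the standard library. `build_search_request` is a hypothetical helper, the base URL assumes the local port 8000 used above, and the response shape is not documented here, so the example stops at building the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # FastAPI port from the setup steps


def build_search_request(query: str, token: str) -> Request:
    """Build (but do not send) a GET /api/search request with the access token."""
    url = f"{BASE_URL}/api/search?{urlencode({'q': query})}"
    return Request(url, headers={"X-Compass-Key": token}, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would send it. The token must match API_ACCESS_TOKEN on the server.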
cd backend
npm install
cp .env.example .env # add CLAUDE_API_KEY and optional overrides
npm run dev # or node index.js
# or run both this service and FastAPI together:
# npm run dev:all
Key routes:
- POST /upload - Multer + pdf/text extraction, returns { text }
- POST /embed - Calls Claude embeddings (kept for future use)
- POST /nearest - Cosine search over whatever NYT embedding JSON you optionally drop in (backend/index.js loads it when present)
- POST /analyze - Sends a structured prompt to Claude, anchored to whatever nearestArticles the frontend sends
- GET /articles - provides a sample slice for demo/fallback graphs
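The cosine search behind /nearest can be sketched in a few lines. The version below is written in Python for illustration, whereas the actual route lives in the Node service. It assumes each article is a record with an `embedding` list plus arbitrary metadata, and the function names and the `title` field used when calling it are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_articles(query_embedding, articles, k=5):
    """Return the k articles most similar to the query, best match first."""
    scored = [
        (cosine_similarity(query_embedding, article["embedding"]), article)
        for article in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:k]]
```

A linear scan like this is fine for a demo-sized slice. A larger corpus would likely call for a vectorized or indexed search instead.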
Utilities in backend/scripts/ help trim or reformat the NYT CSV so the Python side stays fast.
Rate limiting: /upload and /analyze automatically enforce NODE_UPLOAD_RATE_LIMIT / NODE_ANALYZE_RATE_LIMIT (default: 10 requests per 60s). Excess calls receive HTTP 429 responses.
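One common way to enforce a limit of this shape is a sliding window per client. The sketch below shows the idea in Python. It is not the Node middleware itself, and `SlidingWindowLimiter` and its parameters are hypothetical, with defaults chosen to match the 10 requests per 60s mentioned above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each client key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would respond with HTTP 429
        hits.append(now)
        return True
```

Keying the limiter by client IP or by access token is a policy decision that this sketch leaves open.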
Note: Set OPENAI_API_KEY and API_ACCESS_TOKEN in your environment (or a .env) before running build_index.py or the API. Configure the frontend with VITE_COMPASS_TOKEN so every request sends the matching X-Compass-Key header. If the OpenAI key is missing, the app falls back to heuristic labels and manual citations.
cd frontend
npm install
npm run dev # http://localhost:5173 with Vite proxies configured
What's new:
- KnowledgeMap.jsx renders the coarse topic bubbles + zoomed fine nodes via D3. Selecting a cluster controls the force graph sidebar.
- MapSidebar.jsx reuses the semantic search + upload UX. Typed text calls the FastAPI /api/upload, while file uploads still go through the Node backend for PDF extraction but immediately ask FastAPI for the better Kaggle neighbors.
- Graph.jsx (react-force-graph-2d) keeps your document node + animated links. It now consumes whichever article set is active: doc upload, pasted text, a selected fine cluster, or semantic search results.
- The Details/Insights panel stays the same, and POST /analyze still receives the final list of articles so Claude can cite them.
Tailwind (src/styles/globals.css) handles the overall shell, while src/index.css keeps the dedicated map/search styles authored for the knowledge map.
- Start both backends (FastAPI on :8000, Express on :3000).
- Run npm run dev in frontend/ and open http://localhost:5173.
- Paste arbitrary text in the right-hand sidebar or upload a document. Uploads still call the Node /upload route for extraction, but the text immediately goes to FastAPI for the Kaggle neighbors + cluster metadata.
- Explore the Knowledge Map card to drill into coarse vs fine clusters. The map and sidebar drive the force graph, so selecting a subtopic or search result instantly refreshes the React force graph view.
- When a document is active, click Generate Insights to run the guarded Claude analysis that cites only the neighbors currently stored in context.
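How the guard in the last step is enforced is not spelled out here. One plausible piece is a post-check that discards any citation not backed by a neighbor in context. The sketch below illustrates that idea only. `filter_citations` is a hypothetical helper, and it assumes each neighbor carries a `url` field.

```python
def filter_citations(cited_urls, neighbors):
    """Split cited URLs into those backed by a neighbor in context and the rest."""
    allowed = {article["url"] for article in neighbors if article.get("url")}
    kept, rejected = [], []
    for url in cited_urls:
        (kept if url in allowed else rejected).append(url)
    return kept, rejected
```

Surfacing the rejected list, rather than silently dropping it, makes it easier to spot when the model cites something outside the supplied neighbors.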
- Need to refresh the Kaggle-derived dataset? Update config.py and re-run python build_index.py. It writes a new data/index.pkl the API will load on restart.
- If you provide your own NYT embedding JSON, keep the { embedding: number[], ...metadata } schema identical so /articles and /analyze continue to work (otherwise those routes simply return empty arrays).
- The Vite dev server proxies both APIs, so no CORS fiddling is required locally. For production, replicate the /api vs /upload|/analyze routing with your reverse proxy.
- The frontend normalizes article objects (section name, url, snippet) everywhere before handing them to Claude, react-force-graph, or the knowledge-map sidebar. That means both backends can evolve independently as long as they return the expected fields.
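The normalization step in the last note can be pictured as a small mapping function. The sketch below is in Python, although the real normalization runs in the React frontend. The alternative key names (`section_name`, `web_url`, `abstract`, `lead_paragraph`) and the `"Unknown"` default are assumptions chosen for illustration.

```python
def normalize_article(raw: dict) -> dict:
    """Map a backend-specific article record onto one shared shape."""

    def first(*keys, default=""):
        # Return the first non-empty value found under any of the candidate keys.
        for key in keys:
            value = raw.get(key)
            if value:
                return value
        return default

    return {
        "section": first("section", "section_name", default="Unknown"),
        "url": first("url", "web_url"),
        "snippet": first("snippet", "abstract", "lead_paragraph"),
    }
```

Keeping one function like this at the boundary means a field rename in either backend only needs a new candidate key here.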
Have fun combining the macro map with the micro react-force-graph view!
# Windows: activate the virtual environment, then start FastAPI
.\.venv\Scripts\activate
python -m uvicorn api:app --reload --port 8000