Rylorx/Compass

 
 


Research Compass - Granular Knowledge Map

Compass combines two complementary experiences:

  1. Granular topic exploration - A FastAPI + Python pipeline embeds a Kaggle NYT dataset, clusters it into 40 coarse and 200 fine topics, and renders an interactive D3 map so you can zoom from broad beats (e.g., Health) into subtopics (e.g., Stem Cell Therapy). Search and paste-in-text features reuse the same embeddings so you can see where your ideas land inside the knowledge map.
  2. Document intelligence + insights - A Node + Express backend handles PDF/text uploads, runs Claude embeddings, finds nearest NYT coverage, and asks Claude for tightly scoped insights and citations. The React force graph (react-force-graph-2d) visualizes those neighbors alongside your document for deeper analysis.

Instead of scrolling feeds, you can visually explore how ideas relate, drill down into subtopics, or drop in your own writing to see related reporting and generate citations.

Why it matters

  • Accelerate research. Coarse/fine topic hierarchies make it easy to jump from a broad beat to specific threads in seconds.
  • Power student projects. Upload a paper draft or summary and instantly discover matching NYT coverage and suggested citations.
  • Zoom into concepts, not noise. When you enter a cluster, the surrounding map fades, keeping focus on that topic's inner structure.

Core capabilities

  • 📍 Hierarchical topic map – 40 coarse clusters and 200 fine clusters rendered with D3, complete with exploded “subtopic orbits” for legibility.

  • 🔍 Semantic search – SentenceTransformer embeddings let you query ideas, not just keywords.

  • ✍️ Document + PDF drop-in – Paste text or upload a PDF/doc and the backend places it inside the nearest fine cluster, listing the most similar NYT articles.

  • 🏷️ AI-assisted naming – OpenAI refines auto-generated cluster names so labels sound like real research topics instead of raw metadata.

  • 🧾 APA citations on demand – Every article in the sidebar has a Cite button that asks GPT-4o mini for a clean APA reference you can drop into a paper.

  • 🧠 Research-first UX – Focus mode hides unrelated bubbles, tooltips summarize subtopics, and the sidebar pivots between search, cluster deep dives, and upload insights.
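The Cite button relies on GPT-4o mini, and the setup notes mention a manual citation fallback when no OpenAI key is set. As a rough illustration of what a manual path could look like, here is a minimal sketch of a simplified APA-style formatter for a newspaper article. The function names, the `(last, first)` author tuples, and the argument layout are all assumptions made for this example, not the project's code, and italics for the publication name are omitted because the output is plain text.

```python
from datetime import date
from typing import List, Tuple

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def format_author(last: str, first: str) -> str:
    """Turn ("Doe", "Jane Q") into "Doe, J. Q."."""
    initials = " ".join(f"{part[0].upper()}." for part in first.split())
    return f"{last}, {initials}" if initials else last


def join_authors(authors: List[Tuple[str, str]]) -> str:
    """Join formatted authors, placing ", & " before the last one."""
    names = [format_author(last, first) for last, first in authors]
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + ", & " + names[-1]


def apa_citation(
    authors: List[Tuple[str, str]],
    published: date,
    title: str,
    url: str,
    publication: str = "The New York Times",
) -> str:
    """Build a simplified APA-style reference string (hypothetical helper)."""
    when = f"({published.year}, {MONTHS[published.month - 1]} {published.day})."
    title = title.strip()
    if title and title[-1] not in ".?!":
        title += "."
    byline = join_authors(authors)
    # With no byline, the title moves into the author position.
    head = f"{byline} {when} {title}" if byline else f"{title} {when}"
    return f"{head} {publication}. {url}"
```

A formatter like this only covers the common cases; corporate authors, missing dates, and title capitalization rules would still need manual review.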


Stack at a glance

  • Backend: FastAPI, sentence-transformers, scikit-learn (MiniBatchKMeans, t-SNE), pandas/joblib for preprocessing and persistence.
  • Frontend: React + Vite, custom D3 zoom/drag rendering, Tailwind-inspired styling in plain CSS.
  • Data: Kaggle NYT article sample embedded and clustered offline via build_index.py.

Repository layout

backend/            # Node service (upload, embed, nearest, analyze) + scripts/utilities
backend/api.py      # FastAPI entry point (map/search/upload endpoints)
frontend/           # Vite + React + Tailwind + react-force-graph + D3 map UI

Backend services

Compass runs two servers side by side. The Vite dev server proxies /api/* to FastAPI on port 8000 and everything else to the Node backend on port 3000.
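The routing rule is simple enough to state as a function. The real rule lives in the Vite proxy configuration (JavaScript), so the Python below is only an illustration of the decision; `pick_backend` and the two origin constants are hypothetical names, and treating a bare `/api` as a FastAPI path is an assumption.

```python
FASTAPI_ORIGIN = "http://localhost:8000"  # FastAPI knowledge map
NODE_ORIGIN = "http://localhost:3000"     # Node/Express insight service


def pick_backend(path: str) -> str:
    """Return the origin that should serve a request path.

    /api/* goes to FastAPI; everything else goes to the Node backend.
    """
    # Require the slash so that a path such as /apiary is not misrouted.
    if path == "/api" or path.startswith("/api/"):
        return FASTAPI_ORIGIN
    return NODE_ORIGIN


if __name__ == "__main__":
    for p in ("/api/map", "/upload", "/analyze", "/apiary"):
        print(p, "->", pick_backend(p))
```

The same prefix test is what a production reverse proxy would need to reproduce.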

1. FastAPI knowledge map (Python)

cd backend
python -m venv .venv
source .venv/bin/activate        # or .\.venv\Scripts\activate on Windows
pip install -r requirements.txt
python build_index.py            # one-time: builds embeddings, clusters, & 2D layout
uvicorn api:app --reload --port 8000

Note: Set both OPENAI_API_KEY and API_ACCESS_TOKEN in your environment (or a .env) before running build_index.py or the API. The OpenAI key enables AI naming + citation generation; the shared access token locks down every /api/* route (clients must send it via X-Compass-Key). Optional rate-limiting knobs (FACULTY_RATE_LIMIT, UPLOAD_RATE_LIMIT, CITATION_RATE_LIMIT, etc.) are also read from the environment. If the OpenAI key is missing, the app falls back to heuristic labels and manual citations.
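For readers curious how a shared-token check of this kind usually works, here is a minimal, framework-free sketch. It is not the project's middleware: `is_authorized` is a hypothetical helper, and rejecting all requests when the token is unset is an assumption about the desired behavior.

```python
import hmac
import os
from typing import Dict, Optional


def is_authorized(headers: Dict[str, str], expected_token: Optional[str] = None) -> bool:
    """Check the X-Compass-Key header against the shared access token."""
    if expected_token is None:
        expected_token = os.environ.get("API_ACCESS_TOKEN", "")
    if not expected_token:
        # Assumption: an unset token rejects everything instead of opening the API.
        return False
    # HTTP header names are case-insensitive, so normalize before the lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get("x-compass-key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A request handler would return HTTP 401 or 403 when this returns `False`; which status the real API uses is not stated here.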

build_index.py expects the Kaggle NYT CSV and embeddings referenced in config.py. It creates backend/data/index.pkl, which FastAPI loads on startup to serve the endpoints below. Every endpoint honors API_ACCESS_TOKEN and the rate limits configured via FACULTY_RATE_LIMIT, UPLOAD_RATE_LIMIT, CITATION_RATE_LIMIT, etc.:

  • GET /api/map - coarse/fine cluster geometry + bounds
  • GET /api/fine_cluster/:id - fine cluster metadata + article coordinates
  • GET /api/search?q=... - semantic search over the embedded articles
  • POST /api/upload - accept raw text, embed it with sentence-transformers, return top neighbors plus the closest fine cluster
  • GET /api/faculty?q=topic - scrapes UVA People Search for faculty whose bios mention the provided topic keywords (used to surface related experts in the sidebar)
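To make the "closest fine cluster" step of POST /api/upload concrete, here is a small sketch of nearest-centroid placement. It assumes Euclidean distance to k-means style centroids and a fine-to-coarse lookup table; the metric, the function names, and the returned field names are illustrative assumptions, not the actual response schema.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_fine_cluster(
    vector: Sequence[float],
    fine_centroids: Dict[int, Sequence[float]],
) -> Tuple[int, float]:
    """Return (cluster_id, distance) for the centroid closest to vector."""
    if not fine_centroids:
        raise ValueError("no centroids to compare against")
    best_id, best_dist = -1, math.inf
    for cluster_id, centroid in fine_centroids.items():
        dist = math.dist(vector, centroid)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id, best_dist


def place_document(
    vector: Sequence[float],
    fine_centroids: Dict[int, Sequence[float]],
    fine_to_coarse: Dict[int, int],
) -> Dict[str, float]:
    """Locate a document embedding in the coarse/fine hierarchy."""
    fine_id, distance = nearest_fine_cluster(vector, fine_centroids)
    return {
        "fine_cluster": fine_id,
        "coarse_cluster": fine_to_coarse[fine_id],  # parent of the fine cluster
        "distance": distance,
    }
```

With 200 fine centroids a linear scan like this is cheap; a vectorized NumPy version would be the natural next step for larger indexes.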

2. Node/Express insight service

cd backend
npm install
cp .env.example .env  # add CLAUDE_API_KEY and optional overrides
npm run dev           # or node index.js
# or run both this service and FastAPI together:
# npm run dev:all

Key routes:

  • POST /upload - Multer + pdf/text extraction, returns { text }
  • POST /embed - Calls Claude embeddings (kept for future use)
  • POST /nearest - Cosine-similarity search over an optional NYT embedding JSON file (backend/index.js loads it when present)
  • POST /analyze - Sends a structured prompt to Claude, anchored to whatever nearestArticles the frontend sends
  • GET /articles - provides a sample slice for demo/fallback graphs
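The cosine search behind /nearest can be sketched in a few lines. The service itself is Node, so this Python version only illustrates the technique; it assumes each article follows the `{ embedding: number[], ...metadata }` shape described in the notes, and `nearest_articles` and the `score` field are hypothetical names.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_articles(
    query: Sequence[float],
    articles: List[Dict[str, Any]],
    k: int = 5,
) -> List[Dict[str, Any]]:
    """Return the k articles most similar to query, best first."""
    scored = []
    for article in articles:
        # Copy the metadata but drop the bulky embedding from the result.
        result = {key: value for key, value in article.items() if key != "embedding"}
        result["score"] = cosine_similarity(query, article["embedding"])
        scored.append(result)
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:k]
```

An empty article list simply yields an empty result, which matches the behavior described for a missing embedding file.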

Utilities in backend/scripts/ help trim or reformat the NYT CSV so the Python side stays fast.

Rate limiting: /upload and /analyze automatically enforce NODE_UPLOAD_RATE_LIMIT and NODE_ANALYZE_RATE_LIMIT (default: 10 requests per 60 seconds). Excess calls receive HTTP 429 responses.
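To show what a "10 requests per 60 seconds, then 429" policy means in practice, here is a minimal sliding-window limiter. The Node service may well use a different algorithm (for example a fixed window), so treat this as an illustration only; `SlidingWindowLimiter` and the per-client key are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_seconds` span."""

    def __init__(
        self,
        limit: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, client_id: str) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200
```

Rejected requests are not recorded, so a client that keeps retrying is not penalized beyond the original window.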

Note: Configure the frontend with VITE_COMPASS_TOKEN so every request sends an X-Compass-Key header that matches API_ACCESS_TOKEN.

Frontend

cd frontend
npm install
npm run dev   # http://localhost:5173 with Vite proxies configured

What's new:

  • KnowledgeMap.jsx renders the coarse topic bubbles + zoomed fine nodes via D3. Selecting a cluster controls the force graph sidebar.
  • MapSidebar.jsx reuses the semantic search + upload UX. Typed text calls FastAPI's /api/upload directly. File uploads go through the Node backend for PDF extraction, and the extracted text is then sent to FastAPI, which returns neighbors from the Kaggle-derived index.
  • Graph.jsx (react-force-graph-2d) keeps your document node + animated links. It now consumes whichever article set is active: doc upload, pasted text, a selected fine cluster, or semantic search results.
  • The Details/Insights panel stays the same, and POST /analyze still receives the final list of articles so Claude can cite them.

Tailwind (src/styles/globals.css) handles the overall shell, while src/index.css keeps the dedicated map/search styles authored for the knowledge map.

Typical flow

  1. Start both backends (FastAPI on :8000, Express on :3000).
  2. Run npm run dev in frontend/ and open http://localhost:5173.
  3. Paste arbitrary text in the right-hand sidebar or upload a document. Uploads still call the Node /upload route for extraction, but the text immediately goes to FastAPI for the Kaggle neighbors + cluster metadata.
  4. Explore the Knowledge Map card to drill into coarse vs fine clusters. The map and sidebar drive the force graph, so selecting a subtopic or search result instantly refreshes the React force graph view.
  5. When a document is active, click Generate Insights to run the guarded Claude analysis that cites only the neighbors currently stored in context.
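One way to picture the "cites only the neighbors in context" guard from step 5: number the neighbor articles in the prompt, then reject any citation marker that falls outside that range. The sketch below is an assumption about how such a guard could work; the prompt wording, the `title` and `url` field names, and both function names are hypothetical and not taken from the Node service.

```python
import re
from typing import Dict, List, Set


def build_analysis_prompt(document_text: str, nearest_articles: List[Dict[str, str]]) -> str:
    """Assemble a prompt that lists the only articles the model may cite."""
    lines = [
        "Cite ONLY the numbered articles below, using markers like [1].",
        "",
        "Articles:",
    ]
    for number, article in enumerate(nearest_articles, start=1):
        lines.append(f"[{number}] {article['title']} ({article['url']})")
    lines += ["", "Document:", document_text]
    return "\n".join(lines)


def invalid_citations(answer: str, article_count: int) -> Set[int]:
    """Return citation numbers in answer that do not match a listed article."""
    cited = {int(number) for number in re.findall(r"\[(\d+)\]", answer)}
    return {number for number in cited if not 1 <= number <= article_count}
```

If `invalid_citations` returns a non-empty set, the caller could strip those markers or ask the model to regenerate.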

Notes & tips

  • Need to refresh the Kaggle-derived dataset? Update config.py and re-run python build_index.py. It writes a new data/index.pkl the API will load on restart.
  • If you provide your own NYT embedding JSON, keep the { embedding: number[], ...metadata } schema identical so /articles and /analyze continue to work (otherwise those routes simply return empty arrays).
  • The Vite dev server proxies both APIs, so no CORS fiddling is required locally. For production, replicate the /api vs /upload|/analyze routing with your reverse proxy.
  • The frontend normalizes article objects (section name, url, snippet) everywhere before handing them to Claude, react-force-graph, or the knowledge-map sidebar. That means both backends can evolve independently as long as they return the expected fields.
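The normalization step in the last tip can be pictured as mapping whatever each backend returns onto one fixed shape. The frontend does this in JavaScript; the Python sketch below only illustrates the idea. The alternative key names (`section`, `web_url`, `abstract`, `lead_paragraph`) and the `"Unknown"` default are assumptions chosen for the example.

```python
from typing import Any, Dict, Iterable


def _first(raw: Dict[str, Any], keys: Iterable[str], default: str = "") -> str:
    """Return the first non-empty value found under any of keys."""
    for key in keys:
        value = raw.get(key)
        if value:
            return str(value).strip()
    return default


def normalize_article(raw: Dict[str, Any]) -> Dict[str, str]:
    """Map a backend-specific article dict onto the shared fields."""
    return {
        "section_name": _first(raw, ("section_name", "section"), "Unknown"),
        "url": _first(raw, ("url", "web_url")),
        "snippet": _first(raw, ("snippet", "abstract", "lead_paragraph")),
    }
```

Keeping the mapping in one place means a renamed field in either backend needs only one added alias here.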

Have fun combining the macro map with the micro react-force-graph view!

On Windows, the equivalent FastAPI start-up from backend/ is:

.\.venv\Scripts\activate
python -m uvicorn api:app --reload --port 8000

About

Compass turns thousands of New York Times articles into a living research atlas for students, analysts, and obsessive note-takers. Instead of scrolling through feeds, you can visually explore how ideas relate, drill down into subtopics, and drop in your own writing to see where it belongs and which articles relate to it.
