This project provides a visualization tool for exploring course relationships based on semantic similarity. Using NV-Embed-v2 embeddings and graph visualization, it lets users browse courses, explore their semantic relationships, and run semantic search.
The project has a consolidated structure with clear separation of concerns:
- backend/: FastAPI server that provides course data and pre-calculated similarities
  - main.py: Server that calculates similarities and provides APIs
- frontend/: React application for visualization
  - src/: React source code
  - public/: Static assets
- data/: Course data files
  - course-embd-data.csv: Original course data
  - course-embd-data-with-embeddings.csv: Course data with pre-calculated embeddings
- scripts/: Utility scripts
  - embedding_script.py: Script to generate embeddings for all courses
  - server.py: Standalone embedding service (used by generate_embeddings.py)
  - generate_embeddings.py: Script to generate embeddings from within the search app
- start.sh: Launcher script to start both backend and frontend
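The column layout of course-embd-data-with-embeddings.csv is not documented here, so the following is only a sketch of how such a file could be read back into memory. It assumes a hypothetical `course_id` column and a hypothetical `embedding` column holding each vector as a JSON-encoded list of floats; adjust both if the real file differs.

```python
import csv
import io
import json
from typing import Dict, List, TextIO


def load_embeddings(
    f: TextIO,
    id_column: str = "course_id",         # hypothetical column name
    embedding_column: str = "embedding",  # hypothetical column name
) -> Dict[str, List[float]]:
    """Map each course ID to its embedding vector.

    Assumes the embedding cell is a JSON-encoded list of floats,
    e.g. "[0.12, -0.05, 0.33]".
    """
    embeddings: Dict[str, List[float]] = {}
    for row in csv.DictReader(f):
        vector = json.loads(row[embedding_column])
        embeddings[row[id_column]] = [float(x) for x in vector]
    return embeddings


if __name__ == "__main__":
    # Small in-memory sample; the quoted cells keep the commas inside one field.
    sample = io.StringIO(
        "course_id,title,embedding\n"
        'CS101,Intro to Programming,"[0.1, 0.2, 0.3]"\n'
        'MATH201,Linear Algebra,"[0.3, 0.1, 0.0]"\n'
    )
    print(load_embeddings(sample))
```

Against the real file, open it with `open("data/course-embd-data-with-embeddings.csv", newline="")` and pass the file object in, so the `csv` module handles quoted fields correctly.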
Requirements:
- Python 3.8+
- PyTorch 2.0+
- Node.js 14+
- npm 7+
- FastAPI
- Sentence-Transformers
- React
- Sigma.js 2.0+
Setup:
- Set up the Python environment:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- Set up the Node.js environment:
  ```bash
  cd frontend
  npm install
  ```
- Start the application from the project root, where start.sh lives:
  ```bash
  ./start.sh
  ```
This script will:
- Start the backend server on port 8001
- Start the frontend development server on port 3000
- Open your browser to http://localhost:3000
How it works:
- The backend pre-calculates pairwise cosine similarities between course embeddings during initialization
- The graph visualization places courses with higher similarity closer together
- Departments remain visually distinguishable by color
- Users can search for courses, filter by department, and explore the semantic space
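The pre-calculation step above amounts to filling an n x n matrix of cosine similarities once, so later requests only look values up. The sketch below shows that computation in plain Python; `normalize` and `pairwise_cosine` are illustrative names, not functions from backend/main.py. It normalizes every vector once, after which each cosine similarity is just a dot product, and it computes only the upper triangle because the matrix is symmetric.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector stays all zeros."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return [0.0] * len(vector)
    return [x / norm for x in vector]


def pairwise_cosine(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric n x n matrix where entry [i][j] is cos(vectors[i], vectors[j])."""
    unit = [normalize(v) for v in vectors]  # normalize once, so cosine = dot product
    n = len(unit)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only; mirror into the lower half
            score = sum(a * b for a, b in zip(unit[i], unit[j]))
            sims[i][j] = score
            sims[j][i] = score
    return sims


if __name__ == "__main__":
    matrix = pairwise_cosine([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    for row in matrix:
        print([round(s, 3) for s in row])
```

The work grows as O(n² · d) for n courses with d-dimensional embeddings, which is why doing it once at startup pays off. With NumPy the same result is a single matrix product of the row-normalized embedding matrix with its transpose; whether the backend does it that way is not stated here.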
Features:
- Semantic search for courses based on natural-language queries
- Interactive graph visualization of course relationships
- Department filtering and navigation
- Course detail view when selecting a course
- Fast navigation with pre-calculated similarities
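Semantic search and department filtering can both be expressed as ranking courses by cosine similarity to a query embedding. The sketch below assumes the query has already been embedded with the same model as the courses (NV-Embed-v2 in this project). The `semantic_search` function and the course dictionary keys (`id`, `department`, `embedding`) are hypothetical and do not describe the backend's actual API.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    query_embedding: Sequence[float],
    courses: List[Dict],
    top_k: int = 5,
    department: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Return up to top_k (course_id, score) pairs, best match first.

    Each course is assumed to look like
    {"id": "CS101", "department": "CS", "embedding": [...]} (hypothetical keys).
    If `department` is given, only courses from that department are ranked.
    """
    candidates = (
        c for c in courses
        if department is None or c["department"] == department
    )
    scored = ((c["id"], _cosine(query_embedding, c["embedding"])) for c in candidates)
    # nlargest avoids sorting the whole catalog when only top_k results are needed.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    catalog = [
        {"id": "CS101", "department": "CS", "embedding": [1.0, 0.0]},
        {"id": "CS201", "department": "CS", "embedding": [0.6, 0.8]},
        {"id": "MATH101", "department": "MATH", "embedding": [0.9, 0.1]},
    ]
    print(semantic_search([1.0, 0.0], catalog, top_k=2))
    print(semantic_search([1.0, 0.0], catalog, top_k=2, department="CS"))
```

Filtering before scoring keeps department-restricted searches cheap. With pre-normalized embeddings, `_cosine` reduces to a plain dot product.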
Technical details:
- The backend uses FastAPI and Sentence-Transformers to handle embeddings and similarity calculations
- The frontend uses React with Sigma.js for graph visualization
- Cosine similarity is used to measure semantic relatedness between courses
- A force-directed layout using the ForceAtlas2 algorithm positions similar courses closer together
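A force-directed layout only pulls together nodes that share edges, so the similarity matrix first has to be turned into a graph. A full n x n matrix would be an almost complete graph, so a common choice is to keep only each course's strongest neighbours. The sketch below does that under stated assumptions: the `build_graph` name, the `k` and `threshold` defaults, and the node/edge dictionary shape are hypothetical, not the format the backend actually sends to Sigma.js. Similarities become edge weights, which ForceAtlas2 can take into account, and each department gets its own color slot.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_graph(
    course_ids: Sequence[str],
    departments: Sequence[str],
    sims: Sequence[Sequence[float]],
    k: int = 5,
    threshold: float = 0.5,
) -> Dict[str, List[Dict]]:
    """Turn a similarity matrix into a sparse, weighted, undirected graph.

    Each course keeps edges to its k most similar courses whose similarity
    is at least `threshold`; the similarity is stored as the edge weight.
    """
    palette: Dict[str, int] = {}
    nodes = []
    for cid, dept in zip(course_ids, departments):
        color_index = palette.setdefault(dept, len(palette))  # one color slot per department
        nodes.append({"id": cid, "department": dept, "color_index": color_index})

    edges: Dict[Tuple[int, int], float] = {}
    n = len(course_ids)
    for i in range(n):
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: sims[i][j], reverse=True
        )
        for j in neighbours[:k]:
            if sims[i][j] < threshold:
                break  # sorted descending, so the remaining ones are below threshold too
            edges[(min(i, j), max(i, j))] = sims[i][j]  # undirected: store each pair once

    return {
        "nodes": nodes,
        "edges": [
            {"source": course_ids[a], "target": course_ids[b], "weight": w}
            for (a, b), w in sorted(edges.items())
        ],
    }


if __name__ == "__main__":
    graph = build_graph(
        ["A", "B", "C"],
        ["CS", "CS", "MATH"],
        [[1.0, 0.9, 0.2], [0.9, 1.0, 0.6], [0.2, 0.6, 1.0]],
        k=1,
    )
    print(json.dumps(graph, indent=2))
```

Raising `k` or lowering `threshold` produces a denser graph, with more cross-department links and a slower layout; the right values depend on the catalog and are not given in this README.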
Open source for educational purposes.