Discover patterns in your GitHub stars through machine learning
Stardex helps you explore and understand your GitHub starred repositories through advanced machine learning clustering and interactive visualizations.
- Features
- Technology Stack
- Detailed Features
- Architecture
- Getting Started
- API Reference
- Development
- Performance
- Author
- How to Cite
- License
Features
- Smart Analysis: Machine learning-based clustering of repositories
- Interactive Visualization: Dynamic D3.js visualization of repository clusters
- Real-time Processing: Fast data processing and clustering
- Efficient Data Flow: Optimized communication between services
- Type Safety: Full TypeScript and Python type coverage
- Modern UI: Clean, responsive interface with Tailwind CSS
- Mobile Ready: Fully responsive design for all devices
Technology Stack
- Frontend
  - Next.js 13 with App Router
  - React 18 with TypeScript
  - TanStack Query for data management
  - D3.js for visualizations
  - Tailwind CSS for styling
  - Shadcn/ui components
- Backend
  - FastAPI for the REST API
  - scikit-learn for ML operations
  - Poetry for dependency management
  - Pydantic for data validation
Detailed Features
- Real-time repository search
- Language-based filtering
- Star count range filtering
- Topic-based filtering
- Date range filtering
- Multi-algorithm clustering approach:
  - K-means for broad repository grouping
  - Hierarchical clustering for detailed relationships
  - PCA + Hierarchical clustering for large datasets
- TF-IDF vectorization for text analysis
- Configurable clustering parameters
- Performance metrics tracking
- Efficient processing of large datasets
- Interactive D3.js force-directed graph
- Cluster-based coloring
- Zoom and pan capabilities
- Repository details on hover
- Smooth animations and transitions
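The filtering options listed above (language, star count range, topic, and date range) can be pictured as simple predicates over repository records. The following is a minimal sketch in Python, not Stardex's actual implementation: the `Repo` dataclass and `filter_repos` function are hypothetical names, the field names are borrowed from the API request body, and the sample records are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Repo:
    """Hypothetical record using field names from the API request body."""
    full_name: str
    stargazers_count: int
    updated_at: str  # ISO 8601 timestamp, e.g. "2024-03-01T12:00:00Z"
    language: Optional[str] = None
    topics: List[str] = field(default_factory=list)


def parse_timestamp(value: str) -> datetime:
    # Timestamps here end in "Z"; normalise so datetime.fromisoformat accepts them.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_repos(repos, language=None, min_stars=0, max_stars=None,
                 topic=None, updated_after=None):
    """Return the repos that satisfy every filter that was supplied."""
    result = []
    for repo in repos:
        if language is not None and repo.language != language:
            continue
        if repo.stargazers_count < min_stars:
            continue
        if max_stars is not None and repo.stargazers_count > max_stars:
            continue
        if topic is not None and topic not in repo.topics:
            continue
        if (updated_after is not None
                and parse_timestamp(repo.updated_at) < updated_after):
            continue
        result.append(repo)
    return result


# Made-up sample data, for illustration only.
sample = [
    Repo("example/web-app", 1200, "2024-05-01T10:00:00Z", "TypeScript", ["web"]),
    Repo("example/ml-kit", 300, "2023-01-15T08:30:00Z", "Python", ["machine-learning"]),
    Repo("example/data-viz", 4500, "2024-06-10T09:00:00Z", "Python", ["visualization"]),
]
recent_python = filter_repos(
    sample,
    language="Python",
    min_stars=1000,
    updated_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print([repo.full_name for repo in recent_python])
```

Filters left at their defaults are skipped, so the same function covers a single filter or any combination of them.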
Architecture
The application is structured as a monorepo with two main services:
Frontend
- Located in /frontend
- Built with Next.js, React, and TypeScript
- Uses TanStack Query for data fetching
- Implements a responsive UI with Tailwind CSS
- Visualizes repository clusters using D3.js
Backend
- Located in /backend
- Built with FastAPI and Python
- Implements clustering using scikit-learn
- Provides RESTful API endpoints
- Processes data efficiently with sparse matrices
- Supports parallel processing
Getting Started
- Clone & Install:
    # Clone the repository
    git clone https://github.com/BjornMelin/stardex.git
    cd stardex
    # Install root dependencies
    npm install
    # Install frontend dependencies
    cd frontend
    npm install
    # Install backend dependencies
    cd ../backend
    poetry install
- Environment Setup:
    # Frontend (.env.local)
    NEXT_PUBLIC_API_URL=http://localhost:8000
- Development:
    # Run both services (from the repository root)
    npm run dev
    # Or run individually
    npm run dev:frontend
    npm run dev:backend
API Reference
Clusters GitHub repositories based on their features.
Request Body
{
  "repositories": [
    {
      "id": number,
      "name": string,
      "full_name": string,
      "description": string | null,
      "html_url": string,
      "stargazers_count": number,
      "forks_count": number,
      "open_issues_count": number,
      "size": number,
      "watchers_count": number,
      "language": string | null,
      "topics": string[],
      "owner": {
        "login": string,
        "avatar_url": string
      },
      "updated_at": string
    }
  ]
}
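To make the request shape concrete, here is a small sketch that mirrors the schema above with Python dataclasses and serialises it to JSON. It uses only the standard library as a stand-in for the Pydantic models the backend presumably defines. The class names, the `build_cluster_request` helper, and the zero defaults for the numeric fields are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Owner:
    login: str
    avatar_url: str


@dataclass
class Repository:
    """Hypothetical mirror of one entry in the "repositories" array."""
    id: int
    name: str
    full_name: str
    html_url: str
    owner: Owner
    updated_at: str
    description: Optional[str] = None   # nullable in the schema
    language: Optional[str] = None      # nullable in the schema
    stargazers_count: int = 0           # the 0 defaults are assumptions
    forks_count: int = 0
    open_issues_count: int = 0
    size: int = 0
    watchers_count: int = 0
    topics: List[str] = field(default_factory=list)


def build_cluster_request(repositories: List[Repository]) -> str:
    """Serialise repositories into the JSON request body shown above."""
    return json.dumps({"repositories": [asdict(repo) for repo in repositories]})


# Made-up example record.
example = Repository(
    id=1,
    name="demo",
    full_name="example/demo",
    html_url="https://github.com/example/demo",
    owner=Owner(login="example", avatar_url="https://example.com/avatar.png"),
    updated_at="2024-01-01T00:00:00Z",
    topics=["demo"],
)
body = build_cluster_request([example])
```

`asdict` recurses into the nested `Owner`, so the resulting JSON keeps the `owner` object structure, and `None` values are emitted as `null`.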
Response
[
  {
    "repo": {
      // Repository data (same as input)
    },
    "cluster_id": number,
    "coordinates": [number, number]
  }
]
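A client consuming this response will typically want to group the entries by `cluster_id`. The sketch below shows one way to do that and to compute a per-cluster centroid from `coordinates`, for example to position a cluster label. The helper names and the sample payload are hypothetical and are not part of Stardex.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_cluster(results: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Bucket response entries by their cluster_id."""
    groups = defaultdict(list)
    for item in results:
        groups[item["cluster_id"]].append(item)
    return dict(groups)


def cluster_centroids(results: List[Dict[str, Any]]) -> Dict[int, Tuple[float, float]]:
    """Average the 2-D coordinates of the entries in each cluster."""
    centroids = {}
    for cluster_id, items in group_by_cluster(results).items():
        xs = [item["coordinates"][0] for item in items]
        ys = [item["coordinates"][1] for item in items]
        centroids[cluster_id] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return centroids


# Made-up payload in the response shape shown above.
sample_response = [
    {"repo": {"full_name": "example/alpha"}, "cluster_id": 0, "coordinates": [0.0, 0.0]},
    {"repo": {"full_name": "example/beta"}, "cluster_id": 0, "coordinates": [2.0, 4.0]},
    {"repo": {"full_name": "example/gamma"}, "cluster_id": 1, "coordinates": [-1.0, 3.0]},
]
groups = group_by_cluster(sample_response)
centroids = cluster_centroids(sample_response)
```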
Health check endpoint.
{
  "status": "healthy"
}
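A script that waits for the backend to come up could check this payload as sketched below. The route is not stated here, so `/health` is a placeholder assumption, and both function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True only if the body is a JSON object with status == "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str, path: str = "/health", timeout: float = 2.0) -> bool:
    """Request the health endpoint and report whether the service is up.

    NOTE: "/health" is an assumed path; substitute the backend's real route.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False
```

With the development setup described earlier, a call might look like `check_health("http://localhost:8000")`.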
The clustering process follows these steps:
- Feature Extraction
  - TF-IDF vectorization for text data
  - Repository metadata processing
  - Language and topic encoding
- Dimensionality Reduction
  - PCA for high-dimensional data
  - Configurable number of components
  - Efficient sparse matrix operations
- Clustering
  - K-means for initial grouping
  - Hierarchical clustering with Ward linkage
  - PCA-enhanced hierarchical clustering for large datasets
- Visualization
  - Interactive D3.js rendering
  - Cluster-based coloring
  - Smooth animations
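The first three steps above can be sketched with scikit-learn as follows. This is a simplified illustration under stated assumptions, not the backend's actual code. The function names are hypothetical. Only text fields (description, language, topics) are used, so the numeric metadata step is omitted. The TF-IDF matrix is converted to a dense array before PCA, which is reasonable only for small inputs. Clustering runs on the 2-D PCA coordinates.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def repo_text(repo: dict) -> str:
    """Join description, language, and topics into one text document."""
    parts = [repo.get("description") or "", repo.get("language") or ""]
    parts.extend(repo.get("topics", []))
    return " ".join(parts)


def cluster_repos(repos, n_clusters=2, method="kmeans", random_state=0):
    """Return (labels, coords): a cluster label and 2-D point per repo."""
    # 1. Feature extraction: TF-IDF over the combined text fields.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([repo_text(repo) for repo in repos])

    # 2. Dimensionality reduction: project to two components with PCA.
    #    Densifying here is an assumption that only suits small inputs.
    pca = PCA(n_components=2, random_state=random_state)
    coords = pca.fit_transform(tfidf.toarray())

    # 3. Clustering: K-means, or hierarchical clustering with Ward linkage.
    if method == "kmeans":
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    else:
        model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    labels = model.fit_predict(coords)
    return labels, coords


# Made-up repositories: two web-related, two machine-learning-related.
repos = [
    {"description": "Web framework for building HTTP servers",
     "language": "JavaScript", "topics": ["web", "http"]},
    {"description": "Minimal HTTP router for web servers",
     "language": "JavaScript", "topics": ["web", "router"]},
    {"description": "Machine learning library for neural networks",
     "language": "Python", "topics": ["machine-learning"]},
    {"description": "Deep learning toolkit with neural network layers",
     "language": "Python", "topics": ["deep-learning", "machine-learning"]},
]
kmeans_labels, coords = cluster_repos(repos, n_clusters=2, method="kmeans")
ward_labels, _ = cluster_repos(repos, n_clusters=2, method="ward")
```

The returned `labels` and `coords` correspond to the `cluster_id` and `coordinates` fields of the API response, which the frontend can feed into the D3.js visualization.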
Development
- Style Guides
  - Frontend: ESLint + Prettier
  - Backend: Black + isort
- Testing
  - Frontend: Jest + React Testing Library
  - Backend: pytest
- Git Workflow
  - Feature branches
  - Pull request reviews
  - Semantic versioning
Performance
- Efficient sparse matrix operations
- Parallel processing capabilities
- Memory-optimized data structures
- Request validation & caching
- Optimized D3.js rendering
- TanStack Query (React Query) data caching
- Component lazy loading
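As an illustration of the request caching item above, the sketch below memoises clustering results keyed by a hash of the repository IDs, with least-recently-used eviction. How Stardex actually caches requests is not described here, so the `ClusterCache` class, the `request_key` helper, and the key design are all assumptions.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Callable, Dict, List


def request_key(repositories: List[Dict]) -> str:
    """Stable key for a request: SHA-256 of the sorted repository IDs."""
    ids = sorted(repo["id"] for repo in repositories)
    return hashlib.sha256(json.dumps(ids).encode("utf-8")).hexdigest()


class ClusterCache:
    """Small LRU cache wrapped around an expensive clustering function."""

    def __init__(self, cluster_fn: Callable[[List[Dict]], list], max_entries: int = 128):
        self._cluster_fn = cluster_fn
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, repositories: List[Dict]) -> list:
        key = request_key(repositories)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = self._cluster_fn(repositories)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return result


# Stand-in clustering function that records how often it is called.
calls = []


def fake_cluster(repositories):
    calls.append(len(repositories))
    return [{"repo": repo, "cluster_id": 0, "coordinates": [0.0, 0.0]}
            for repo in repositories]


cache = ClusterCache(fake_cluster, max_entries=2)
first = cache.get([{"id": 1}, {"id": 2}])
second = cache.get([{"id": 2}, {"id": 1}])  # same IDs, different order
```

Because the key covers IDs only, changes to a repository's metadata would not invalidate a cached entry. A real cache would likely hash more fields or add an expiry time.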
Author
- GitHub: @BjornMelin
- Website: bjornmelin.io
- LinkedIn: @bjorn-melin
How to Cite
If you use Stardex in your research or project, please cite it as follows:
@software{melin2024stardex,
author = {Melin, Bjorn},
title = {Stardex: GitHub Stars Explorer},
year = {2024},
publisher = {GitHub},
url = {https://github.com/BjornMelin/stardex},
version = {1.0.0},
description = {A machine learning-powered tool for exploring and understanding GitHub starred repositories through clustering and interactive visualizations}
}
License
This project is licensed under the MIT License. See the LICENSE file for details.
Built with ❤️ by [Bjorn Melin](https://bjornmelin.io)