GraphSage Implementation based on paper "Inductive Representation Learning on Large Graphs" by Hamilton et al., 2017

SamarKri/GraphSage

GraphSage

Inductive Representation Learning on Large Graphs

📄 Description

This project implements the GraphSAGE model with four types of aggregation (mean, max, sum, GCN) and allows comparison of its performance with DeepWalk across multiple datasets (Citeseer, PPI, and OpenAlex). The project includes a comprehensive evaluation based on F1 score, recall, precision, accuracy, confusion matrix, and classification report, along with embedding visualization.
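As a rough illustration of how the four aggregation types differ, here is a minimal pure-Python sketch. It is not taken from `models.py`: the function names (`mean_agg`, `max_agg`, `sum_agg`, `layer_input`) are hypothetical, the learned weight matrices and nonlinearities are omitted, and treating the sum variant like the mean variant is an assumption. Following the paper, the mean and max variants concatenate the node's own vector with the aggregated neighbor vector, while the GCN variant averages the node together with its neighbors.

```python
from typing import Callable, Dict, List

Vector = List[float]


def mean_agg(vectors: List[Vector]) -> Vector:
    """Element-wise mean (assumes at least one vector)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def max_agg(vectors: List[Vector]) -> Vector:
    """Element-wise maximum; the paper's pooling variant first applies a learned layer, omitted here."""
    return [max(col) for col in zip(*vectors)]


def sum_agg(vectors: List[Vector]) -> Vector:
    """Element-wise sum."""
    return [sum(col) for col in zip(*vectors)]


# Hypothetical registry of the aggregators that concatenate with the node itself.
AGGREGATORS: Dict[str, Callable[[List[Vector]], Vector]] = {
    "mean": mean_agg,
    "max": max_agg,
    "sum": sum_agg,
}


def layer_input(node: Vector, neighbors: List[Vector], kind: str) -> Vector:
    """Vector that would be fed to the layer's linear transform."""
    if kind == "gcn":
        # GCN-style: average over the node and its neighbors, no concatenation.
        return mean_agg([node] + neighbors)
    # Other variants: concatenate the node's vector with the aggregated neighbors.
    return node + AGGREGATORS[kind](neighbors)


node = [1.0, 2.0]
neighbors = [[3.0, 4.0], [5.0, 0.0]]
for kind in ("mean", "max", "sum", "gcn"):
    print(kind, layer_input(node, neighbors, kind))
```

Note that the GCN variant produces a vector of the original dimension, whereas the concatenating variants double it, which changes the shape of the weight matrix that follows.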

Implementation of GraphSAGE based on the paper "Inductive Representation Learning on Large Graphs" by Hamilton et al., 2017

📂 Project Structure

GraphSage/

├── Figures/               # All generated figures
├── graphvenv/             # Virtual environment to isolate dependencies
├── config.py              # Parameters and hyperparameters (dataset, learning rate, number of layers, etc.)
├── dataloader.py          # Data loading and preprocessing
├── models.py              # Model definitions: GraphSAGE (mean, max, LSTM aggregations), GCN, and DeepWalk
├── train.py               # Training loop with early stopping and logging
├── evaluation.py          # Evaluation and visualization functions (F1 score, recall, precision, confusion matrix, classification report)
├── utils.py               # Utility functions (embedding visualization, model saving, etc.)
├── requirements.txt       # Python dependencies (torch, torch-geometric, etc.)
├── .gitignore             # Files and folders to ignore in Git (e.g., __pycache__, checkpoints, logs, etc.)
├── README.md              # Project documentation (description, installation, usage, etc.)
└── main.py                # Entry point for training and evaluation
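
The structure above mentions a training loop with early stopping. As a minimal sketch of that idea (not the code in `train.py`; the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the `epochs_run` helper are hypothetical names chosen for illustration), training stops once the validation loss has failed to improve for a fixed number of consecutive epochs:

```python
from typing import Sequence


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: Sequence[float], patience: int = 3) -> int:
    """Number of epochs executed before stopping, given a sequence of validation losses."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


# Made-up validation losses, for illustration only.
print(epochs_run([1.0, 0.8, 0.9, 0.85, 0.81, 0.7], patience=3))
```

In a real loop, the model checkpoint would typically be saved whenever `best` is updated, so the best weights can be restored after stopping.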

πŸ› οΈ Installation

πŸ“Œ Prerequisites

  • Operating System: Windows (tested)
  • Hardware: a CUDA-compatible GPU is recommended for full (and faster) training; the project is also compatible with platforms like SageMaker
  • Python 3.12.6+

Clone the repository

git clone https://github.com/SamarKri/GraphSage.git
cd GraphSage

Create a virtual environment (Windows)

python -m venv graphvenv
graphvenv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Run training with GraphSAGE (example on Citeseer)

python main.py --model graphsage --dataset citeseer
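
For readers wondering how the two flags in this command might be handled, here is a small sketch using the standard library's `argparse`. It is an assumption, not the contents of `main.py`: only `--model graphsage` and `--dataset citeseer` appear in this README, so the other accepted values (`deepwalk`, `ppi`, `openalex`) and the `build_parser` helper are guesses based on the project description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the example command."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate a graph embedding model."
    )
    parser.add_argument(
        "--model",
        default="graphsage",
        choices=["graphsage", "deepwalk"],  # assumed spellings, not confirmed by the repo
        help="Model to train.",
    )
    parser.add_argument(
        "--dataset",
        default="citeseer",
        choices=["citeseer", "ppi", "openalex"],  # assumed spellings, not confirmed by the repo
        help="Dataset to train and evaluate on.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--model", "graphsage", "--dataset", "citeseer"])
print(args.model, args.dataset)
```

Restricting values with `choices` makes a misspelled model or dataset name fail immediately with a usage message instead of partway through data loading.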

💡 Future Work

  • Add unit tests for dataloader, model, train, evaluation, and utils
  • Test implementation on other datasets like Cora, PubMed, or Reddit
  • Add a directory for dataset storage/reference (Citeseer, Cora, PubMed, Reddit, PPI, OpenAlex)
  • Optimize hyperparameters using Optuna
  • Define additional performance metrics such as AUC-ROC for multi-label problems
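
To make the last item concrete, the sketch below shows one possible way to compute a macro-averaged AUC-ROC for a multi-label problem such as PPI, using only the standard library. It is an illustration under stated assumptions, not part of this project: `binary_auc` and `macro_auc` are hypothetical helper names, the per-label AUC uses the pairwise ranking definition (the fraction of positive/negative pairs ranked correctly, with ties counted as half), and the data is made up.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC for one label: probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    # Count correctly ordered pairs; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Unweighted mean of the per-label AUC values (rows are samples, columns are labels)."""
    n_labels = len(y_true[0])
    per_label = [
        binary_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Made-up labels and scores for four samples and two labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
print(macro_auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of samples, so for real datasets a library routine such as scikit-learn's `roc_auc_score` with `average="macro"` would likely be the more practical choice.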
