Multi-modal Graph Neural Networks for Harmful Brain Activity Classification
AlphaHMS is a deep-learning pipeline for classifying harmful brain activity from EEG and spectrogram recordings, built around the HMS Harmful Brain Activity Classification Kaggle challenge. The system represents each recording as a pair of temporal graph sequences and learns a joint EEG + spectrogram representation with Graph Attention Networks, BiLSTM temporal encoders, hierarchical regional pooling, and cross-modal attention fusion.
The six target classes are: Seizure, LPD (Lateralized Periodic Discharges), GPD (Generalized Periodic Discharges), LRDA (Lateralized Rhythmic Delta Activity), GRDA (Generalized Rhythmic Delta Activity), and Other.
- Graph-based EEG modelling: each 50 s EEG recording is split into 9 overlapping 10 s windows; nodes are the 19 EEG channels and edges are derived from inter-channel coherence (threshold 0.5).
- Graph-based spectrogram modelling: 600 s spectrograms are split into 119 windows over 4 spatial regions (LL, RL, LP, RP) with fixed spatial connectivity.
- Hierarchical pooling by clinical brain regions (Frontal, Central, Parietal, Occipital) before temporal modelling.
- Cross-modal fusion with multi-head attention between EEG and spectrogram regional embeddings.
- PyTorch Lightning training with mixed precision (BF16), WandB logging, cross-validation, class-weighted / KL-divergence losses, and early stopping.
- Explainability via GNNExplainer and attention-weight inspection.
- Baselines included: EEG-only GNN and a raw-EEG MLP.
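The window counts quoted above are consistent with a 10 s window advanced in 5 s steps, and the coherence rule amounts to thresholding a symmetric matrix. The following is a minimal sketch under two assumptions that this README does not state: a 5 s stride (inferred from the 9 and 119 window counts) and a strict `>` comparison against the threshold. `num_windows` and `edges_from_coherence` are hypothetical helper names, not functions from this repository.

```python
def num_windows(total_s, window_s=10, stride_s=5):
    """Count sliding windows that fit entirely inside a recording."""
    if total_s < window_s:
        return 0
    return (total_s - window_s) // stride_s + 1


def edges_from_coherence(coh, threshold=0.5):
    """Turn a symmetric channel-by-channel coherence matrix into an edge list.

    Returns (i, j, weight) triples for i < j whose coherence exceeds the
    threshold; self-loops and the duplicate lower triangle are skipped.
    """
    n = len(coh)
    return [
        (i, j, coh[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if coh[i][j] > threshold
    ]


eeg_windows = num_windows(50)    # 50 s EEG recording
spec_windows = num_windows(600)  # 600 s spectrogram

# Toy 3-channel coherence matrix (illustrative values only).
toy_coh = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
toy_edges = edges_from_coherence(toy_coh)
print(eeg_windows, spec_windows, toy_edges)
```

In the real pipeline the matrix would be 19 x 19 (one row per EEG channel) and would be recomputed for each window, so every recording yields a sequence of graphs rather than a single one.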
AlphaHMS/
├── configs/                         # OmegaConf YAML configs
│   ├── graphs.yaml                  # Preprocessing parameters
│   ├── model.yaml                   # Multi-modal GNN architecture
│   ├── model_eeg.yaml               # EEG-only baseline architecture
│   ├── train.yaml                   # Main training config
│   ├── train_4fold.yaml             # Cross-validation training
│   ├── train_eeg.yaml               # EEG-only baseline training
│   ├── training_mlp.yaml            # MLP baseline training
│   ├── inference_mlp.yaml           # MLP inference
│   └── smoke_test.yaml              # Quick smoke test
├── notebooks/
│   └── eda.ipynb                    # Exploratory data analysis & preprocessing
├── src/
│   ├── data/                        # Datasets, DataModules, graph builders
│   │   ├── graph_dataset.py
│   │   ├── graph_datamodule.py
│   │   ├── baseline_dataset.py
│   │   ├── baseline_datamodule.py
│   │   ├── raw_eeg_dataset.py
│   │   ├── raw_datamodule.py
│   │   ├── make_graph_dataset.py    # Build graph dataset from raw data
│   │   └── utils/
│   │       ├── eeg_process.py
│   │       └── spectrogram_process.py
│   ├── models/
│   │   ├── hms_model.py             # Multi-modal model
│   │   ├── hms_eeg_model.py         # EEG-only baseline
│   │   ├── eeg_mlp.py               # MLP baseline
│   │   ├── regularization.py
│   │   ├── explainer_wrappers.py
│   │   └── graph_layers/            # GAT, temporal, fusion, pooling, classifier
│   ├── lightning_trainer/           # LightningModules (multi-modal, EEG, MLP)
│   ├── explainers/
│   │   └── gnn_explainer.py
│   ├── train.py                     # Training entrypoint (GNN models)
│   ├── train_mlp.py                 # Training entrypoint (MLP baseline)
│   ├── explain.py                   # GNNExplainer driver
│   ├── explain_attention.py         # Attention-weight analysis
│   └── explain_model.py
├── tests/                           # Pytest suite
├── inspect_data.py                  # Quick data inspection utility
├── environment.yaml                 # Conda environment specification
└── pytest.ini
conda env create -f environment.yaml -y
conda activate graph
The environment file deliberately omits PyTorch so you can match your local CUDA version. For CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install torcheeg
You will also need a PyTorch Geometric build that matches your PyTorch / CUDA versions; follow the official install guide.
You must accept the competition terms on Kaggle first.
mkdir -p data/raw && cd data/raw
kaggle competitions download -c hms-harmful-brain-activity-classification
unzip hms-harmful-brain-activity-classification.zip
Expected layout under data/raw/:
data/raw/
├── train.csv
├── train_eegs/            # parquet files, one per EEG recording
└── train_spectrograms/    # parquet files, one per spectrogram
jupyter execute notebooks/eda.ipynb
python src/data/make_graph_dataset.py
This produces one data/processed/patient_{id}.pt file per patient plus a metadata.pt index. See src/data/README.md for full details on graph construction, output format, and memory requirements.
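After the build step it can be useful to confirm that the expected files exist before starting a training run. The sketch below relies only on the file naming described above (patient_{id}.pt plus metadata.pt) and does not open the files, since their internal format is documented elsewhere. `list_patient_ids` and `has_metadata_index` are hypothetical helpers, not part of this repository.

```python
import re
from pathlib import Path

# Matches the patient_{id}.pt naming described above and captures the id.
PATIENT_FILE = re.compile(r"^patient_(.+)\.pt$")


def list_patient_ids(processed_dir):
    """Return sorted patient IDs found as patient_{id}.pt files in a directory."""
    ids = []
    for path in Path(processed_dir).glob("patient_*.pt"):
        match = PATIENT_FILE.match(path.name)
        if match:
            ids.append(match.group(1))
    return sorted(ids)


def has_metadata_index(processed_dir):
    """Check that the metadata.pt index sits next to the patient files."""
    return (Path(processed_dir) / "metadata.pt").is_file()
```

For example, `len(list_patient_ids("data/processed"))` gives the number of patients that were written, and `has_metadata_index("data/processed")` reports whether the index is present. IDs are returned as strings and sorted lexicographically.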
All training scripts log to Weights & Biases; run wandb login once before starting.
# Multi-modal model
python src/train.py --train-config configs/train.yaml
# Cross-validation
python src/train.py --train-config configs/train_4fold.yaml
# EEG-only baseline
python src/train.py --train-config configs/train_eeg.yaml
# MLP baseline
python src/train_mlp.py --config configs/training_mlp.yaml
# Quick smoke test
python src/train.py --train-config configs/smoke_test.yaml
Evaluation runs automatically at the end of training. Checkpoints are written to the directory specified in the config.
The multi-modal model (see configs/model.yaml) is composed of:
- EEG encoder: 2-layer multi-head GAT (64-dim, 4 heads) with coherence edge weights → hierarchical regional pooling → 2-layer BiLSTM (128-dim, bidirectional).
- Spectrogram encoder: 2-layer GAT (64-dim, 4 heads) over the 4 spatial regions → BiLSTM (128-dim, bidirectional).
- Cross-modal fusion: multi-head cross-attention (8 heads, 256-dim) between EEG and spectrogram regional embeddings, with attention pooling over regions.
- Classifier: MLP with hidden sizes [256, 128], ELU activations, dropout 0.3 → 6-class softmax.
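The hierarchical regional pooling step in the EEG encoder reduces 19 per-channel embeddings to one embedding per brain region before the BiLSTM sees them. A minimal sketch of that idea follows. Both the channel-to-region assignment (a common grouping of the 10-20 montage) and the use of mean pooling are assumptions made for illustration; the repository's actual mapping and pooling operator live in src/models/graph_layers/ and may differ.

```python
# Hypothetical channel-to-region assignment for a 19-channel 10-20 montage.
REGIONS = {
    "Frontal": ["Fp1", "Fp2", "F3", "F4", "F7", "F8", "Fz"],
    "Central": ["C3", "C4", "Cz", "T3", "T4"],
    "Parietal": ["P3", "P4", "Pz", "T5", "T6"],
    "Occipital": ["O1", "O2"],
}


def pool_by_region(node_embeddings, regions=REGIONS):
    """Mean-pool per-channel embeddings into one vector per region.

    node_embeddings maps channel name -> list of floats (same length each).
    Returns a dict mapping region name -> averaged embedding.
    """
    pooled = {}
    for region, channels in regions.items():
        vectors = [node_embeddings[ch] for ch in channels]
        dim = len(vectors[0])
        pooled[region] = [
            sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)
        ]
    return pooled


# Toy 2-dim embeddings: channel number i (in REGIONS order) gets [i, 1.0].
all_channels = [ch for chs in REGIONS.values() for ch in chs]
toy = {ch: [float(i), 1.0] for i, ch in enumerate(all_channels)}
region_vectors = pool_by_region(toy)
```

Applied once per time window, this turns each 19-node graph into four regional vectors, which keeps the temporal encoder's input size independent of the number of channels.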
Loss defaults to KL divergence against the soft expert-vote distribution; class weighting, graph-Laplacian regularization, and edge-weight penalties are all configurable.
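To make the default loss concrete: the expert votes for a sample are normalised into a probability distribution over the six classes, and the model is penalised by the KL divergence from that target to its predicted distribution. The sketch below is a plain-Python illustration of that computation, not the repository's loss code; the vote counts are made up and the function names are hypothetical.

```python
import math


def votes_to_distribution(votes):
    """Normalise raw expert vote counts into a probability distribution."""
    total = sum(votes)
    return [v / total for v in votes]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(target, log_pred):
    """KL(target || pred); terms with zero target probability contribute 0."""
    return sum(t * (math.log(t) - lp) for t, lp in zip(target, log_pred) if t > 0)


# Class order: Seizure, LPD, GPD, LRDA, GRDA, Other (toy vote counts).
target = votes_to_distribution([3, 0, 0, 1, 0, 0])

# Loss for a model that predicts a uniform distribution over the 6 classes.
uniform_loss = kl_divergence(target, log_softmax([0.0] * 6))
```

The loss is zero only when the predicted distribution matches the vote distribution exactly, so a sample on which experts disagree is not pushed toward a single hard label. In PyTorch, `torch.nn.KLDivLoss` takes the same pair of inputs (log-probabilities and a probability target); whether the repository uses that class is not stated here.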
# GNNExplainer over a trained checkpoint
python src/explain.py
# Attention-weight visualisation
python src/explain_attention.py
pytest
The test suite covers the data module, preprocessing pipeline, multiprocessing, regularization, checkpoint resume, and spectrogram processing.
- Training was developed on H200 / RTX-class GPUs with BF16 mixed precision.
- Preprocessing is CPU-bound and benefits from build_workers set to your physical core count.
- Recommended: ≥ 16 GB RAM for preprocessing and at least one modern CUDA GPU for training.
If you use AlphaHMS in your research or build upon it, please cite this repository:
@software{krylov2025alphahms,
author = {Denis Krylov and Samuel Goldie and Alberto Pasinato and Serkan Akin and Leonardo Lago},
title = {{AlphaHMS}: Multi-modal Graph Neural Networks for Harmful Brain Activity Classification},
year = {2025},
institution = {Delft University of Technology},
url = {https://github.com/deniskrylov/AlphaHMS},
note = {Built for the HMS Harmful Brain Activity Classification challenge (Kaggle)}
}