This repository contains the code for the experiments in the paper "A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules", selected for an Oral presentation at ICLR 2025.
All experiments were conducted using Python 3.9.20 within a Conda 24.11.3 environment, with CUDA 12.4 support on NVIDIA RTX 3090 GPUs. All required packages are specified in the requirements.txt file.
We use the reaction and fold datasets from DIG and the LBA dataset from atom3d. To unify the storage format, all datasets are saved as LMDB (.mdb) files, which keeps their total size under 2 GB. When running the experiments, the datasets are loaded with the LMDBDataset class from the atom3d package.
Please download the data (Data.zip) from the releases page of this repository, or with
wget https://github.com/Utah-Math-Data-Science/SCHull4Science/releases/download/v1.0.0/Data.zip
and unzip Data.zip before running the experiments.
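As an alternative to unzipping by hand, the archive can be extracted and checked from Python with the standard zipfile module. This is a minimal sketch: extract_data and find_lmdb_splits are hypothetical helpers, not part of the repository, and the code assumes Data.zip unpacks into a top-level Data folder in which every split is a directory holding a data.mdb file, as in the folder layout described in this README.

```python
import zipfile
from pathlib import Path


def extract_data(zip_path, dest_dir="."):
    """Unzip the archive into dest_dir (hypothetical helper) and return dest_dir as a Path."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # creates dest and any sub-folders as needed
    return dest


def find_lmdb_splits(data_root):
    """Return sorted relative paths of all directories that contain a data.mdb file.

    An LMDB database is a directory holding data.mdb (and usually lock.mdb),
    so each such directory corresponds to one dataset split, e.g. Reaction-EC/train.
    """
    root = Path(data_root)
    return sorted(p.parent.relative_to(root).as_posix() for p in root.rglob("data.mdb"))
```

For example, `find_lmdb_splits(extract_data("Data.zip") / "Data")` lists every split found after extraction, which makes a missing or partially extracted folder easy to spot before training.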
The data folders are organized as follows:
├── Data
│ │
│ ├── Reaction-EC
│ │ │
│ │ ├── train
│ │ │ ├── data.mdb
│ │ │ ├── lock.mdb
│ │ │ ...
│ │ ├── val
│ │ ...
│ ├── FoldData
│ ...
When the code is run for the first time, it automatically pre-processes the data: it constructs the original node features and builds the SCHull graph using
import SCHull
schull = SCHull.SCHull()
schull.get_schull
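Assuming, from its name, that the SCHull graph is obtained by projecting each atom onto a sphere centered at the molecule's centroid and keeping the edges of the convex hull of the projected points, a brute-force, dependency-free illustration of that idea might look like the sketch below. The names project_to_sphere and sphere_hull_edges are hypothetical, the O(n^4) facet test is only meant for small inputs, and this is not a substitute for the repository's schull.get_schull; a practical version would use a proper hull routine such as scipy.spatial.ConvexHull.

```python
import itertools
import math

# Illustrative sketch only: assumes "SCHull" means the convex hull of
# sphere-projected points; it is NOT the repository's SCHull.get_schull.


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def project_to_sphere(coords):
    """Center the 3D points at their centroid and rescale each to unit length."""
    n = len(coords)
    c = tuple(sum(p[k] for p in coords) / n for k in range(3))
    out = []
    for p in coords:
        v = _sub(p, c)
        r = math.sqrt(_dot(v, v))
        if r == 0.0:
            raise ValueError("a point coincides with the centroid")
        out.append((v[0] / r, v[1] / r, v[2] / r))
    return out


def sphere_hull_edges(coords, eps=1e-9):
    """Undirected edges (i, j), i < j, of the hull of the projected points.

    A triple (i, j, k) is a hull facet when all other points lie on one side
    of the plane through it; each facet contributes its three sides as edges.
    """
    pts = project_to_sphere(coords)
    edges = set()
    for i, j, k in itertools.combinations(range(len(pts)), 3):
        normal = _cross(_sub(pts[j], pts[i]), _sub(pts[k], pts[i]))
        if _dot(normal, normal) < eps:
            continue  # skip (near-)collinear triples, which span no plane
        side = [_dot(normal, _sub(pts[m], pts[i]))
                for m in range(len(pts)) if m not in (i, j, k)]
        if all(s <= eps for s in side) or all(s >= -eps for s in side):
            edges.update({(i, j), (j, k), (i, k)})
    return sorted(edges)
```

For n >= 4 points in general position on a sphere, every point is a hull vertex and all facets are triangles, so Euler's formula gives exactly 3n - 6 edges; this is a quick sanity check on the output and shows why such a graph stays sparse.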
Each pre-processing step takes less than 10 minutes to complete, with approximately 40% of the time spent on constructing the SCHull graph.
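Pre-processing only on the first run is the usual build-once-then-cache pattern. The sketch below illustrates that pattern under our own assumptions: a pickle file as the cache, and the hypothetical names load_or_preprocess and build_fn. The repository's dataset classes (e.g. dataset/fold_dataset.py) may store processed data in a different format and location.

```python
import os
import pickle
import time


def load_or_preprocess(cache_path, raw_items, build_fn):
    """Return (processed_items, seconds_spent_building).

    cache_path : file where the processed list is pickled
    raw_items  : iterable of raw records (e.g. entries read from one LMDB split)
    build_fn   : hypothetical function mapping one raw record to its processed
                 form (node features plus graph, in this README's setting)
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), 0.0  # cache hit: nothing is rebuilt
    start = time.perf_counter()
    processed = [build_fn(item) for item in raw_items]
    elapsed = time.perf_counter() - start
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(processed, f)
    os.replace(tmp_path, cache_path)  # rename only after a complete write
    return processed, elapsed
```

Writing to a temporary file and renaming it means an interrupted first run leaves no half-written cache behind, so the next run simply rebuilds. Timing the build separately for feature construction and graph construction is one way to check a breakdown like the one reported above on your own hardware.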
We follow Section 3.4 of the SCHull paper to integrate the SCHull graph into the baseline models. The project code is organized as follows:
├── SCHull4Science
│ │
│ ├── SCHull
│ │ ├── SCHull.py
│ │ ...
│ ├── models
│ │ ├── pronet.py
│ │ ...
│ ├── dataset
│ │ ├── fold_dataset.py
│ │ ...
│ ...
│ ├── main_react.py
│ │
│ ├── main_fold.py
│ │
│ ├── main_lba.py
│ ...
To run the experiments, one approach is to cd into the SCHull4Science directory and execute the command for the corresponding dataset:
python main_react.py --data_path <PATH_to_Data/Reaction-EC> \
--save_dir <PATH_to_SAVE> \
--exp_name <Experiment_Name> \
--schull <True_for_Integrating_SCHull>
python main_fold.py --data_path <PATH_to_Data/FoldData> \
--save_dir <PATH_to_SAVE> \
--exp_name <Experiment_Name> \
--schull <True_for_Integrating_SCHull>
python main_lba.py --data_path <PATH_to_Data/LBA-split-by-sequence-identity-30> \
--save_dir <PATH_to_SAVE> \
--exp_name <Experiment_Name> \
--schull <True_for_Integrating_SCHull>
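The --schull flag takes a boolean written as text. If a script parsed it with argparse using type=bool, any non-empty string, including "False", would become True, because bool("False") is True. The sketch below shows a string-to-bool converter for the flags used in the commands above; str2bool and build_parser are hypothetical, and the actual parsers in main_react.py, main_fold.py, and main_lba.py may define defaults and required arguments differently.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to bool (type=bool would not)."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of an experiment CLI")
    parser.add_argument("--data_path", required=True, help="dataset folder, e.g. Data/FoldData")
    parser.add_argument("--save_dir", required=True, help="output folder for logs and checkpoints")
    parser.add_argument("--exp_name", default="exp", help="name of this run")
    parser.add_argument("--schull", type=str2bool, default=False,
                        help="True to integrate the SCHull graph into the baseline model")
    return parser
```

With this converter, `--schull False` really disables the option; an unrecognized value such as `--schull maybe` makes argparse exit with an error message instead of silently enabling it.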
If you find this work useful, please consider citing our paper. We appreciate your support!
@inproceedings{wang2025schull,
title={A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules},
author={Wang, Shih-Hsin and Huang, Yuhao and Baker, Justin and Sun, Yuan-En and Tang, Qi and Wang, Bao},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025}
}
This repository incorporates code components from DIG (Dive into Graphs).
