Built around the AttentiveFP model provided by PyG (PyTorch Geometric), this project trains variants of the model to predict the UV-Vis absorption spectra of organic molecules. The dataset is provided by Oak Ridge National Laboratory (ORNL) and covers a wide range of organic molecules and their corresponding UV-Vis absorption spectra.
This project seeks to leverage graph neural networks to understand and predict how organic molecules absorb UV light, which is crucial for various applications in chemistry and materials science. By accurately predicting these spectra, we can facilitate the design of new materials and molecules with desired optical properties.
The prediction of absorption spectra using deep learning models has become a significant topic in spectroscopy. Traditional methods for predicting UV-Vis spectra often involve complex quantum mechanical calculations, which are computationally expensive and time-consuming. Deep learning models, especially graph neural networks like AttentiveFP, offer a promising alternative by learning directly from data.
AttentiveFP, a model based on attention mechanisms within graph neural networks, has shown promising results in various molecular property prediction tasks. In this project, we adapt AttentiveFP and explore different graph attention methods such as GAT, GATv2, MoGAT(v2), and DenseGAT to enhance the prediction accuracy of UV-Vis spectra.
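For reference, below is a minimal sketch of instantiating the stock PyG AttentiveFP model with this project's default hyperparameters. The input, output, and edge dimensions are placeholder assumptions, and the project replaces the internal attention layers with the variants listed above rather than using the class as-is.

```python
# Minimal sketch: stock AttentiveFP from PyTorch Geometric, configured with
# this project's default hyperparameters. The in/out/edge dimensions are
# placeholder assumptions; the project swaps the internal attention layers
# (GAT, GATv2, MoGATv2, DenseGAT) rather than using this class unchanged.
from torch_geometric.nn.models import AttentiveFP

model = AttentiveFP(
    in_channels=39,       # assumed atom-feature size
    hidden_channels=250,  # default_config['hidden_channels']
    out_channels=600,     # assumed number of points in the predicted spectrum
    edge_dim=10,          # assumed bond-feature size
    num_layers=4,         # default_config['num_layers']
    num_timesteps=2,      # default_config['num_timesteps']
    dropout=0.025,        # default_config['dropout']
)
```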
The main objectives of this project are:
- Data Preparation: Prepare a dataset of molecules with corresponding UV-Vis spectra.
- Model Training: Train AttentiveFP with different Graph Attention methods, such as GAT, GATv2, MoGAT(v2), and DenseGAT.
- Prediction and Evaluation: Predict UV-Vis spectra for unseen molecules and evaluate the performance of the models.
Prerequisites:
- Python 3.11+
- Anaconda
Follow these steps to set up your environment and install all necessary dependencies:
- Clone the Repository

  ```bash
  git clone https://github.com/williamnyren/AttentiveFP-UV.git
  cd AttentiveFP-UV
  ```
- Create and Activate the Conda Environment

  This will create the environment and install all conda and pip dependencies specified in the `environment.yml` file.

  ```bash
  conda env create -f environment.yml
  conda activate attentive_fp
  ```
- Make the `postBuild` Script Executable

  The `postBuild` script is used to install PyTorch, torchvision, and torchaudio with the specified CUDA version.

  ```bash
  chmod +x postBuild
  ```
- Run the `postBuild` Script

  Execute the script to complete the installation of the required packages.

  ```bash
  ./postBuild
  ```
By following these steps, you will set up your development environment with all the necessary dependencies to run the AttentiveFP-UV project.
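To verify that the installation succeeded, a quick sanity check along these lines can help (a minimal sketch; it only confirms that the core packages import and that CUDA is visible):

```python
# Sanity check: confirm PyTorch and PyG import cleanly and the CUDA build is usable.
import torch
import torch_geometric

print('torch:', torch.__version__)
print('torch_geometric:', torch_geometric.__version__)
print('CUDA available:', torch.cuda.is_available())
```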
To retrieve the dataset required for this project, follow these detailed steps:
This dataset is based on the work described in the article *Two excited-state datasets for quantum chemical UV-vis spectra of organic molecules*; refer to it for details.
- Go to the ORNL Data Transfer Guide
  - Follow the instructions on how to install and use Globus from the ORNL Data Transfer Guide.
- Install Globus
  - Download and install the Globus app from the official Globus site.
- Connect to Globus
  - Open the Globus app and sign in using your credentials. Follow the on-screen instructions to connect to the Globus file transfer service.
- Locate the Files on Globus
- Transfer the Files
  - Ensure that you are connected to Globus in the installed app.
  - Navigate to the file you want to download from Globus via the references mentioned above.
  - Select a directory on your local system where you want to transfer the files.
  - Follow the instructions in the Globus app to initiate and complete the file transfer.
- Extract Data Files and Directories
  - Extract the two files transferred from Globus and put them into the directory `ORNL_data`. The expected layout is:

    ```
    ORNL_data
    └── extracted
        ├── gdb9_ex.csv
        ├── ornl_aisd_ex_1.csv
        ├── ornl_aisd_ex_2.csv
        ├── ...
        └── ornl_aisd_ex_1000.csv
    ```
By following these steps, you will be able to download the dataset required for this project.
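To sanity-check the extraction, you can count the shard files against the layout above (a small sketch; paths follow the tree shown earlier):

```python
# Check that the extracted ORNL dataset matches the expected layout.
from pathlib import Path

data_dir = Path('ORNL_data/extracted')
shards = sorted(data_dir.glob('ornl_aisd_ex_*.csv'))
print(f'Found {len(shards)} shard files (expecting 1000)')
print('gdb9_ex.csv present:', (data_dir / 'gdb9_ex.csv').exists())
```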
Before training and running the model, the data has to be prepared. Do so by running `preprocess.py`:

```bash
python -m src.processing.preprocess
```
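If you want to inspect the raw data before preprocessing, a quick peek at one shard is harmless (a sketch; no particular column layout is assumed here, it just prints whatever the CSV contains):

```python
# Peek at the first few rows of one raw shard.
import pandas as pd

df = pd.read_csv('ORNL_data/extracted/ornl_aisd_ex_1.csv', nrows=5)
print(df.columns.tolist())
print(df.head())
```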
Train the model using command-line arguments or a WandB configuration file. You can override the default parameters directly through command-line arguments. Defaults:
```python
default_config = {
    'lr': 5e-4,
    'hidden_channels': 250,
    'num_layers': 4,
    'num_timesteps': 2,
    'dropout': 0.025,
    'seed': None,
    'num_workers': 4,
    'total_epochs': 10,
    'warmup_epochs': 1,
    'run_id': None,
    'batch_size': 0,
    'Attention_mode': 'MoGATv2',
    'heads': 2,
    'loss_function': 'mse_loss',
    'metric': 'srmse',
    'savitzkey_golay': [],
    'window_length': 5,
    'polyorder': 3,
    'padding': True,
    'lr_ddp_scaling': 0,
    'batch_ddp_scaling': 1,
    'with_fake_edges': 0,
    'LOSS_FUNCTION': '',
    'METRIC_FUNCTION': 'srmse',
    'NUM_PROCESSES': 1,
    'DATA_DIRECTORY': 'data'
}
```
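The `savitzkey_golay`, `window_length`, and `polyorder` options point to Savitzky-Golay smoothing of the spectra. As an illustration under that assumption (the exact place where the repo applies the filter may differ), the defaults correspond to something like:

```python
# Assumed usage of the Savitzky-Golay options: smoothing a spectrum with
# scipy.signal.savgol_filter and the defaults window_length=5, polyorder=3.
# Illustration only; where and how the repo applies this may differ.
import numpy as np
from scipy.signal import savgol_filter

spectrum = np.random.rand(600)  # hypothetical spectrum on a 600-point grid
smoothed = savgol_filter(spectrum, window_length=5, polyorder=3)
```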
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=default_config['lr'], help='Learning rate')
parser.add_argument('--hidden_channels', type=int, default=default_config['hidden_channels'], help='Hidden channels')
parser.add_argument('--num_layers', type=int, default=default_config['num_layers'], help='Number of layers')
parser.add_argument('--num_timesteps', type=int, default=default_config['num_timesteps'], help='Number of timesteps')
parser.add_argument('--dropout', type=float, default=default_config['dropout'], help='Dropout')
parser.add_argument('--seed', type=int, default=default_config['seed'], help='Seed')
parser.add_argument('--num_workers', type=int, default=default_config['num_workers'], help='Number of workers')
parser.add_argument('--run_id', type=str, default=default_config['run_id'], help='Run ID for resuming training')
parser.add_argument('--total_epochs', type=int, default=default_config['total_epochs'], help='Number of epochs to train')
parser.add_argument('--batch_size', type=int, default=default_config['batch_size'], help='Batch size')
parser.add_argument('--Attention_mode', type=str, default=default_config['Attention_mode'], help='Attention mode')
parser.add_argument('--heads', type=int, default=default_config['heads'], help='Number of heads')
parser.add_argument('--loss_function', type=str, default=default_config['loss_function'], help='Loss function')
parser.add_argument('--metric', type=str, default=default_config['metric'], help='Metric')
parser.add_argument('--savitzkey_golay', type=list, default=default_config['savitzkey_golay'], help='Savitzkey Golay filter')
parser.add_argument('--with_fake_edges', type=int, default=default_config['with_fake_edges'], help='Data with fake edges')
parser.add_argument('--DATA_DIRECTORY', type=str, default=default_config['DATA_DIRECTORY'], help='Data directory')
parser.add_argument('--lr_ddp_scaling', type=int, default=default_config['lr_ddp_scaling'], help='Scale learning rate with number of GPUs')
parser.add_argument('--batch_ddp_scaling', type=int, default=default_config['batch_ddp_scaling'], help='Scale batch size with number of GPUs')
parser.add_argument('--warmup_epochs', type=int, default=default_config['warmup_epochs'], help='Warmup epochs')
```
Run the script with your desired parameters:

```bash
python train_ddp.py --lr 0.001 --hidden_channels 256 --num_layers 6
```
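Note that `train_ddp.py` is a distributed (DDP) trainer; if it picks up every visible GPU on the machine, you can restrict it with the standard `CUDA_VISIBLE_DEVICES` environment variable. The `--lr_ddp_scaling` and `--batch_ddp_scaling` flags control whether the learning rate and batch size are scaled with the number of GPUs.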
You can also use a WandB configuration file to set the parameters. Create a `config.yaml` file with the following content:
```yaml
# Program to run
program: 'train_ddp.py'
# Sweep search method: random, grid or bayes
method: 'random'
# Project this sweep is part of
project: 'example_project'
entity: <WANDB_USER>
# Metric to optimize
metric:
  name: 'val_srmse'
  goal: 'minimize'
# Parameter search space
parameters:
  lr:
    values: [0.0001]
  hidden_channels:
    values: [600]
  num_layers:
    values: [8]
  num_timesteps:
    values: [2]
  dropout:
    values: [0.05]
  num_workers:
    value: 3
  total_epochs:
    value: 50
  warmup_epochs:
    value: 2
  batch_size:
    values: [0]
  Attention_mode:
    values: ['GATv2']
  heads:
    values: [3]
  loss_function:
    values: ['mse_loss']
  with_fake_edges:
    value: 0
  lr_ddp_scaling:
    value: 0
  batch_ddp_scaling:
    value: 1
  savitzkey_golay:
    values: [0]
  # seed:
  #   values: [42, 13, 7]
```
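Note the W&B sweep convention used above: `value` pins a parameter to a single setting, while `values` lists the candidates the sweep may sample from.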
With the configuration file set up, we can now use the W&B sweep functionality:

```bash
wandb sweep config.yaml
```

You should now receive a sweep ID in the terminal. The project and sweep should also be present on your W&B page.
We are now able to start a new run in the sweep:

```bash
wandb agent <WANDB_USER>/example_project/<sweep-ID>
```
The final step to get everything operational is to edit the path variables and any variables related to the wandb setup. These changes are made in `src/config/params.py`.
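As an illustration only, the kind of settings one would expect to adjust there might look like the following; the variable names here are hypothetical, and the actual contents of `src/config/params.py` may differ.

```python
# Hypothetical sketch of src/config/params.py settings; the real variable
# names and values in the repository may differ.
DATA_DIRECTORY = 'data'            # where preprocessed data lives
WANDB_ENTITY = '<WANDB_USER>'      # your W&B user or team name
WANDB_PROJECT = 'example_project'  # project used by the sweep config
```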