Skip to content

mizuno-group/multi-objective-optimization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multi-Objective Optimization of Reference Compound Lists for Rigorous Evaluation of Predictive Toxicity Models

This repository optimizes a set of compounds using a genetic algorithm. Furthermore, we curate a dataset of compounds used in pharmaceutical safety test validation studies by JaCVAM and compare them with the compounds derived from the genetic algorithm.


Publication

Not yet.


File Structure

The project is organized into the following main directories:

  • data/: Contains data files.
  • processed/: Processed data.
  • raw/: Raw data.
  • result/: The directory is structured to correspond with the figures in our paper.
  • data_validation_test_and_HTS/: Contains validation test and HTS (High-Throughput Screening) data in an easy-to-use format.
  • notebook/: Contains Jupyter Notebooks for various tasks.
  • experiment/:
    • 0_dataset_unbalance.ipynb: Visualizes the imbalance in the number of positive data points and the bias in toxicity strength within the dataset.
    • 1_GA.ipynb: Runs the multi-objective genetic algorithm (GA) to optimize compound sets based on similarity scores and toxicity variation.
    • 2_UMAP.ipynb: Uses UMAP to visualize the distribution of compounds generated through a genetic algorithm.
    • 3_random_generation.ipynb: Compares the random population, GA results, and the validation test dataset by generating various plots and histograms.
    • 4_use_each_gen_for_eval.ipynb: Analyzes the performance of the GA by measuring the distance of the generated compound sets from the Pareto front.
    • 5_tox_prediction.ipynb: Compares toxicity prediction performance between the validation data and the GA-generated datasets.
  • preprocess/:
    • 0_preprocess_ice.ipynb: Extracts LD50 or DART test data, including CAS-RN and toxicity information, from the ICE dataset.
    • 1_preprocess_toxcast.ipynb: Extracts Tox21 ER or AR test data, including CAS-RN and toxicity information, from the ToxCast dataset.
    • 2_preprocess_validation_test.ipynb: Prepares the main dataset for the genetic algorithm by removing compounds used in the validation test.
    • 3_preprocess_for_lookup.ipynb: Creates lookup tables containing canonical SMILES and physicochemical properties for compound pairs by utilizing PubChemPy and CAS-RN.
    • 4_preprocess_for_tox_prediction.ipynb: Prepares the datasets for toxicity prediction by splitting the data into training, evaluation, test, and GA-selected sets.
  • src/: Python scripts for core functionalities.
    • ga.py: Genetic algorithm core components.
    • prep.py: Data preprocessing scripts.
    • util.py: Utility functions.
  • requirements.txt: Lists the project dependencies.

Authors

  • Yohei Ohto - Main contributor
  • Tadahaya Mizuno - Correspondence

Contact

If you have any questions or comments, please feel free to create an issue on GitHub here, or email us:

  • oy826c60[at]gmail.com
  • tadahaya[at]gmail.com
  • tadahaya[at]mol.f.u-tokyo.ac.jp

About

多目的最適化実行用

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •