Multi-Objective Optimization of Reference Compound Lists for Rigorous Evaluation of Predictive Toxicity Models

This repository uses a multi-objective genetic algorithm to optimize a set of reference compounds for evaluating predictive toxicity models. We also curate a dataset of compounds used in pharmaceutical safety test validation studies by JaCVAM (Japanese Center for the Validation of Alternative Methods) and compare them with the compounds selected by the genetic algorithm.
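
As a rough illustration of what scoring a compound set can look like, the sketch below evaluates one candidate subset on two objectives: mean pairwise structural similarity (lower means a more diverse set) and the spread of toxicity values (higher means a wider toxicity range is covered). Everything in it is an assumption made for illustration: fingerprints are represented as sets of on-bit indices, similarity is taken to be Tanimoto, spread is taken to be the population standard deviation, and the names tanimoto and score_subset are hypothetical. The objectives actually used are defined in src/ga.py and may differ.

```python
from itertools import combinations
from statistics import pstdev


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_subset(fingerprints, toxicities, subset):
    """Return (mean pairwise similarity, toxicity spread) for the chosen indices.

    Assumes the subset holds at least two compounds.
    """
    pairs = list(combinations(subset, 2))
    mean_sim = sum(
        tanimoto(fingerprints[i], fingerprints[j]) for i, j in pairs
    ) / len(pairs)
    tox_spread = pstdev([toxicities[i] for i in subset])
    return mean_sim, tox_spread


# Toy data: hypothetical fingerprints and toxicity values, not taken from the repository.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}, {1, 2, 3, 4}]
toxicities = [1.0, 1.2, 3.5, 0.8]

similar_set = score_subset(fingerprints, toxicities, [0, 1, 3])
diverse_set = score_subset(fingerprints, toxicities, [0, 2, 3])
```

With this toy data, the second subset scores lower on similarity and higher on toxicity spread than the first, which is the direction a search over compound sets would favor under these assumed objectives.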

Publication

Not yet published.

File Structure

The project is organized into the following main directories:

data/: Contains data files.
- processed/: Processed data.
- raw/: Raw data.
- result/: Results, organized to correspond with the figures in our paper.
data_validation_test_and_HTS/: Contains validation test and HTS (High-Throughput Screening) data in an easy-to-use format.
notebook/: Contains Jupyter Notebooks for various tasks.
experiment/:
- 0_dataset_unbalance.ipynb: Visualizes the imbalance in the number of positive data points and the bias in toxicity strength within the dataset.
- 1_GA.ipynb: Runs the multi-objective genetic algorithm (GA) to optimize compound sets based on similarity scores and toxicity variation.
- 2_UMAP.ipynb: Uses UMAP to visualize the distribution of the compounds selected by the genetic algorithm.
- 3_random_generation.ipynb: Compares the random population, GA results, and the validation test dataset by generating various plots and histograms.
- 4_use_each_gen_for_eval.ipynb: Analyzes the performance of the GA by measuring the distance of the generated compound sets from the Pareto front.
- 5_tox_prediction.ipynb: Compares toxicity prediction performance between the validation data and the GA-generated datasets.
preprocess/:
- 0_preprocess_ice.ipynb: Extracts LD50 or DART test data, including CAS-RN and toxicity information, from the ICE dataset.
- 1_preprocess_toxcast.ipynb: Extracts Tox21 ER or AR test data, including CAS-RN and toxicity information, from the ToxCast dataset.
- 2_preprocess_validation_test.ipynb: Prepares the main dataset for the genetic algorithm by removing compounds used in the validation test.
- 3_preprocess_for_lookup.ipynb: Creates lookup tables of canonical SMILES and physicochemical properties for compound pairs, retrieved with PubChemPy using CAS-RN.
- 4_preprocess_for_tox_prediction.ipynb: Prepares the datasets for toxicity prediction by splitting the data into training, evaluation, test, and GA-selected sets.
src/: Python scripts for core functionalities.
- ga.py: Genetic algorithm core components.
- prep.py: Data preprocessing scripts.
- util.py: Utility functions.
requirements.txt: Lists the project dependencies.
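
The Pareto-front analysis mentioned for 1_GA.ipynb and 4_use_each_gen_for_eval.ipynb can be pictured with the minimal sketch below. It keeps the non-dominated points from a list of objective vectors and measures how far a given compound set lies from the nearest point on that front. This is not code from src/ga.py: it assumes both objectives are minimized (for example, similarity and negated toxicity variation), uses Euclidean distance, and all function names and numbers are hypothetical.

```python
from math import dist


def dominates(p, q):
    """True if p is no worse than q on every objective and strictly better on one (minimization)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def distance_to_front(point, front):
    """Euclidean distance from a point to the nearest member of the front."""
    return min(dist(point, f) for f in front)


# Hypothetical objective values per compound set: (mean similarity, -toxicity spread).
scores = [(0.25, -1.2), (0.40, -1.5), (0.67, -0.2), (0.30, -0.9), (0.50, -1.5)]

front = pareto_front(scores)
gap = distance_to_front((0.67, -0.2), front)
```

A point on the front has distance zero, so tracking this distance per generation gives one simple way to see whether the sets produced by the search are approaching the front. If the objectives have very different scales, normalizing them before computing distances would likely be needed.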

Authors

Yohei Ohto - Main contributor
Tadahaya Mizuno - Correspondence

Contact

If you have any questions or comments, please feel free to open an issue on GitHub or email us:

oy826c60[at]gmail.com
tadahaya[at]gmail.com
tadahaya[at]mol.f.u-tokyo.ac.jp
