Update README.md #62

Open: wants to merge 2 commits into base: `pre-release`
76 changes: 48 additions & 28 deletions README.md
@@ -5,19 +5,19 @@
# ComputAge
A library for full-stack aging clocks design and benchmarking.

*The full release version of the package is currently developing. Only bechmarking module is released and ready for use. Please see below.*
*The full release version of this package is currently under development. Only the benchmarking module is released and ready for use. Please see below.*

## Installation

You can install the whole library with `pip`:

`pip install computage`

This provides all necessary instruments for aging clocks benchmarking.
This provides all instruments necessary for aging clocks benchmarking.

# ComputAgeBench

A module in the `computage` library for epigenetic aging clocks benchmarking. This library is tightly bound with `computage_bench` huggingface [repository](https://huggingface.co/datasets/computage/computage_bench) where all DNA methylation data of **66** GSEs from more than **50** studies can be retrieved from. All details on our methodology of epigenetic aging clocks benchmarking and results can be found in the [paper](https://www.biorxiv.org/content/10.1101/2024.06.06.597715v1).
A module in the `computage` library for epigenetic aging clocks benchmarking. This library is tightly bound with the `computage_bench` Hugging Face [repository](https://huggingface.co/datasets/computage/computage_bench), where all **66** DNA methylation datasets from more than **50** studies are assembled and can be retrieved. All details regarding our methodology of epigenetic aging clocks benchmarking and its results can be found in the [paper](https://www.biorxiv.org/content/10.1101/2024.06.06.597715v1).

## Introduction

@@ -30,9 +30,8 @@ and used to predict an individual’s age. Moreover, it has been hypothesized th
However, comparing aging clock performance is no trivial task, as there is no gold standard measure of one’s biological age, so using MAE, Pearson’s *r*, or other
common correlation metrics is not sufficient.

To foster greater advances in the aging clock field, [we developed a methodology and a dataset](https://huggingface.co/datasets/computage/computage_bench) for aging clock benchmarking, ComputAge Bench, which relies on measuring
model ability to predict increased ages in samples from patients with *pre-defined* **aging-accelerating conditions** (AACs) relative to samples from
healthy controls (HC). **We highly recommend consulting the Methods and Discussion sections of our paper before proceeding to use this dataset and to build
To foster greater advances in the aging clock field, [we developed a methodology and a dataset](https://huggingface.co/datasets/computage/computage_bench) for aging clock benchmarking, ComputAge Bench, which relies on measuring model ability to predict increased ages in samples from patients with *pre-defined* **aging-accelerating conditions** (AACs) relative to samples from
healthy controls (HC). **We highly recommend consulting the Methods and Discussion sections of our paper before proceeding to use the benchmarking dataset and to build
any conclusions upon it.**

<p align="center">
@@ -47,61 +46,85 @@ any conclusions upon it.**

### sklearn-based model

Suppose you trained brand-new epigenetic aging clocks model using classic `scikit-learn` library. You saved your model as `pickle` file. Then, the following block of code can be used for benchmarking your model. We also added several other published aging clocks for comparison with yours.
Suppose you have trained your brand new epigenetic aging clock model using the classic `scikit-learn` library. You should save your model as a `pickle` file. Then, the following block of code can be used to benchmark your model. We also implemented imputation of missing values from the R [SeSAMe](https://github.com/zwdzwd/sesame) package and added several published aging clock models for comparison.

```python
from computage import run_benchmark

#first define NaN imputation method for `in_library` models
#for simlicity here we recommend to use imputation with
#gold standard averages (from R package `sesame`)
# first, define a method to impute NaNs for the in_library models
# we recommend using imputation with gold standard values from SeSAMe
imputation = 'sesame_450k'

# for example, take these three clock models for benchmarking
models_config = {
    "in_library":{
        'HorvathV1':{'imputation':imputation},
        'Hannum':{'imputation':imputation},
        'PhenoAgeV2':{'imputation':imputation},
        },
#here we should define a name of our new model as well as path
#to the pickle file (.pkl) of the model
# here we can define a name of our new model, as well as path
# to the pickle file (.pkl) that contains it
    "new_models":{
        #'my_new_model_name': {'path':'/path/to/model.pkl'}
        }
    }
#now run the benchmark
# now run the benchmark
bench = run_benchmark(models_config,
experiment_prefix='my_model_test',
output_folder='./benchmark'
)
#upon completion, the results will be saved in the folder you specified

# upon completion, the results will be saved in the folder you have specified for output
```
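
The `new_models` entry above points to a `.pkl` file. As a minimal sketch of how such a file could be produced, the block below fits an `ElasticNet` on synthetic data and pickles it. The synthetic data, the model choice, the hyperparameters, and the file name `my_new_model.pkl` are all illustrative assumptions; this README does not state which attributes `run_benchmark` expects from a pickled model, so treat this only as an example of the save/load round trip.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic stand-in for methylation data (assumption):
# 50 samples x 20 CpG sites with beta values in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 20))
age = 40 + 30 * X[:, 0] + rng.normal(0.0, 1.0, size=50)

# Fit a simple penalized linear model (illustrative choice only).
model = ElasticNet(alpha=0.01).fit(X, age)

# Save the fitted model as a pickle file (hypothetical file name).
model_path = Path(tempfile.mkdtemp()) / "my_new_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload it to check that the file round-trips correctly.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

The resulting path is what would go into the `'path'` field of a `new_models` entry.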

### pytorch-based model
[...upcoming...]


### Explore the dataset
In case you want just to explore our dataset locally, use the following commands for downloading.
In case you only want to explore our dataset locally, use the following commands to download it:
```python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id='computage/computage_bench',
    repo_type="dataset",
    local_dir='.')
```
Once downloaded, the dataset can be open with `pandas` (or any other `parquet` reader).

Once downloaded, the dataset can be opened with `pandas` (or any other `parquet` reader).

```python
import pandas as pd
#let's choose a study id, for example `GSE100264`
df = pd.read_parquet('data/computage_bench_data_GSE100264.parquet').T
#note we transpose data for more convenient perception
#Don't forget to explore metadata (which is common for all datasets):

# let's choose a study id, for example, `GSE100264`
df = pd.read_parquet('data/computage_bench_data_GSE100264.parquet').T
# note that we transpose data for a more convenient perception of samples and features

# don't forget to explore metadata (which is common for all datasets):
meta = pd.read_csv('computage_bench_meta.tsv', sep='\t', index_col=0)
```
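
To see which studies are available locally before picking one, the file names can be scanned for study ids. The sketch below assumes that every data file follows the `computage_bench_data_<GSE id>.parquet` naming seen in the example above; the helper `list_study_ids` is hypothetical and not part of `computage`.

```python
import re
from pathlib import Path

# Naming scheme assumed from the example file name
# `computage_bench_data_GSE100264.parquet`.
FILE_PATTERN = re.compile(r"^computage_bench_data_(GSE\d+)\.parquet$")


def list_study_ids(data_dir):
    """Return the sorted GSE ids that have a parquet file in data_dir."""
    ids = []
    for path in Path(data_dir).glob("*.parquet"):
        match = FILE_PATTERN.match(path.name)
        if match:
            ids.append(match.group(1))
    return sorted(ids)
```

For instance, `list_study_ids('data')` would return the ids found in the downloaded `data` folder, each of which can then be passed to `pd.read_parquet` as shown above.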

## Reproducing paper results
All results and plots of the `ComputAgeBench` paper can be reproduced using this [notebook](https://drive.google.com/file/d/1_nrGMUd8oH8ADNWUPNeXHr4ZAJlZOQhm/view?usp=sharing). Alternatively, you can just clone this repository and run the same notebook locally from the `notebooks` folder.
All results and plots from the `ComputAgeBench` paper can be reproduced using this [notebook](https://drive.google.com/file/d/1_nrGMUd8oH8ADNWUPNeXHr4ZAJlZOQhm/view?usp=sharing). Alternatively, you can simply clone this repository and run the same `benchmarking.ipynb` notebook locally from the `notebooks` folder.

## Additional information
[...Table with all clocks...]

Aging clock models included in this package.

| Name | Year | Number of CpGs | Generation | Extra parameters | Tissues used for training | Reference |
|:------------:|:----:|:--------------:|:----------:|:----------------:|:-------------------------:|:----------------------------------------------------------------------:|
| Hannum | 2013 | 71 | 1 | — | Blood | [Hannum G. et al.](https://doi.org/10.1016/j.molcel.2012.10.016) |
| HorvathV1 | 2013 | 353 | 1 | — | Multi-tissue | [Horvath S.](https://doi.org/10.1186/gb-2013-14-10-r115) |
| Lin | 2016 | 99 | 1 | — | Blood | [Lin Q. et al.](https://doi.org/10.18632/aging.100908) |
| VidalBralo | 2016 | 8 | 1 | — | Blood | [Vidal-Bralo L. et al.](https://doi.org/10.3389/fgene.2016.00126) |
| HorvathV2 | 2018 | 391 | 1 | — | Blood, Skin | [Horvath S. et al.](https://doi.org/10.18632/aging.101508) |
| PhenoAgeV1 | 2018 | 513 | 2 | — | Blood | [Levine M.E. et al.](https://doi.org/10.18632/aging.101414) |
| Zhang19_EN   | 2019 | 514            | 1          | —                | Blood, Saliva             | [Zhang Q. et al.](https://doi.org/10.1186/s13073-019-0667-1)           |
| GrimAgeV1 | 2019 | 1030 | 2 | Age, Sex | Blood | [Lu A. et al.](https://doi.org/10.18632%2Faging.101684) |
| GrimAgeV2 | 2022 | 1030 | 2 | Age, Sex | Blood | [Lu A. et al.](https://doi.org/10.18632%2Faging.204434) |
| PhenoAgeV2 | 2022 | 959 | 2 | — | Blood | [Higgins-Chen A.T. et al.](https://doi.org/10.1038/s43587-022-00248-2) |
| YingAdaptAge | 2024 | 999 | 1 | — | Blood | [Ying K. et al.](https://doi.org/10.1038/s43587-023-00557-0) |
| YingCausAge | 2024 | 585 | 1 | — | Blood | [Ying K. et al.](https://doi.org/10.1038/s43587-023-00557-0) |
| YingDamAge | 2024 | 1089 | 1 | — | Blood | [Ying K. et al.](https://doi.org/10.1038/s43587-023-00557-0) |

## Cite us

@@ -124,12 +147,9 @@ If you found this library or corresponding [dataset]((https://huggingface.co/dat
For any questions or clarifications, please reach out to: [email protected]

## Community
Please feel free to leave any questions and suggestions in issues, however, if you want a faster and broader discussion, please join to our [telegram chat](https://t.me/agingmath).
Please feel free to leave any questions and suggestions in the issues section. However, if you want a faster and broader discussion, please join our [telegram chat](https://t.me/agingmath).

## Acknowledgments

We thank the [biolearn](https://bio-learn.github.io/data.html) team for providing inspiration and many useful tools that were helpful during the initial development stage of this library.


We thank the [biolearn](https://bio-learn.github.io/data.html) team for providing an inspiration and a lot of useful tools that were helpful during the initial stages of developing this library.