From 1d43d99a338e4a2b0d29504733c9909ec0ced9cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Mon, 7 Jul 2025 15:50:29 +0200
Subject: [PATCH 1/9] Update README.md

---
 README.md | 42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)
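
The updated README text describes the `parm predict` output as a tab-separated file holding the sequence, its header, and one prediction column. As a minimal sketch of how such a file could be inspected from the shell, assuming it starts with a single header row and the prediction sits in the third column (as in the example table); the function name `rank_predictions` is illustrative and not part of PARM:

```sh
# Sketch: print "header<TAB>prediction" for every sequence,
# highest predicted activity first.
# Assumes: tab-separated input, one header row, prediction in column 3.
rank_predictions() {
  tab=$(printf '\t')
  # Skip the header row, sort numerically (descending) on column 3,
  # then keep only the header and prediction columns.
  tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k3,3nr | cut -f2,3
}
```

For example, `rank_predictions output_K562.txt` would list the sequences in the output file ordered by predicted K562 activity.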

diff --git a/README.md b/README.md
index a86defa..aab676d 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@
 PARM (Promoter Activity Regulatory Model) is a deep learning model that predicts the promoter activity from the DNA sequence itself.
 As a convolution neural network trained on MPRA data, **PARM** is very lightweight and produces predictions in a cell-type-specific manner.
 
-With the `PARM predict` tool, you can get predictions for any sequence that you want for K562, HepG2, MCF7, LNCaP, or HCT116 cells. 
+With the `PARM predict` tool, you can get predictions for any sequence that you want for AGS, HAP1, HCT116, HEK116, HepG2, K562, LNCaP, MCF7, and U2OS cells.
 
 With `PARM mutagenesis`, in addition to simple promoter activity scores, **PARM** can also produce the so-called _in-silico_ mutagenesis plot.
 This is useful for predicting which TFs are regulating (activating or repressing) your sequence. (read more on [Running _in-silico_ mutagenesis](#running-in-silico-mutagenesis)).
@@ -35,32 +35,24 @@ To predict the promoter activity in K562 of every sequence in a fasta file, run:
 parm predict \
   --input example_data/input.fasta \
   --output output_K562.txt \
-  --model pre_trained_models/K562.parm
+  --model pre_trained_models/K562/
 ```
 
-> Note that you should replace `pre_trained_models/K562.parm` with the actual path to the pre-trained models available on this page.
-
-To perform predictions for more than one cell, you can simply provide all the paths separated by space:
-
-```sh
-parm predict \
-  --input example_data/input.fasta \
-  --output output_K562_HepG2_LNCaP.txt \
-  --model pre_trained_models/K562.parm pre_trained_models/HepG2.parm pre_trained_models/LNCaP.parm
-```
+> Note that you should replace `pre_trained_models/K562/` with the actual path to the pre-trained models available on this page.
+> Also note that a PARM model consists of five folds, because each model is trained five times. If you check the contents of `pre_trained_models/K562/`,
+> you will see the `.parm` files there, one per fold. Do not rename or modify these files unless you know what you are doing.
 
 The output is a tab-separated file. 
 The first and second columns contain information about the sequence (the sequence and its header).
-The following column contains the predicted promoter activity for the model you have selected. 
-If you performed predictions for more than one cell, more than one column will be created here.
+The following column contains the predicted promoter activity for the model you have selected.
 
 For the command line above, you should expect the following result:
 
-| sequence    | header                           | prediction_K562   | prediction_HepG2   | prediction_LNCaP    |
-|-------------|----------------------------------|-------------------|--------------------|---------------------|
-| CTGGGAGG... | CXCR4_chr2:136875708:136875939:- | 2.287095785140991 | 1.4889564514160156 | 0.2345067262649536  |
-| GCAACTAA... | MED16_chr19:893131:893362:-      | 2.22406268119812  | 2.6182565689086914 | 0.30299943685531616 |
-| ACGCCCAG... | TERT_chr5:1295135:1295366:-      | 1.993780255317688 | 1.474591612815857  | 0.11847741901874542 |
+| sequence    | header                           | prediction_K562   |
+|-------------|----------------------------------|-------------------|
+| CTGGGAGG... | CXCR4_chr2:136875708:136875939:- | 2.287095785140991 |
+| GCAACTAA... | MED16_chr19:893131:893362:-      | 2.22406268119812  |
+| ACGCCCAG... | TERT_chr5:1295135:1295366:-      | 1.993780255317688 |
 
 
 ## Running _in-silico_ mutagenesis
@@ -71,16 +63,7 @@ To compute the _in-silico_ mutagenesis for every sequence in a fasta file, run:
 parm mutagenesis \
   --input example_data/input.fasta \
   --output in_silico_mutagenesis_K562 \
-  --model pre_trained_models/K562.parm
-```
-
-You can also run `PARM mutagenesis` for more than one cell:
-
-```sh
-parm mutagenesis \
-  --input input.fasta \
-  --output in_silico_mutagenesis_K562_HepG2_LNCaP \
-  --model pre_trained_models/K562.parm pre_trained_models/HepG2.parm pre_trained_models/LNCaP.parm
+  --model pre_trained_models/K562/
 ```
 
 For every sequence in the input fasta, **PARM** will predict the effect of every possible mutation of every single base pair.
@@ -108,5 +91,6 @@ parm plot \
 This will read the mutagenesis matrix and the hits for the sequence `sequence_of_interest` and generate the plot.
 By default, **PARM** stored the result plot as a PDF inside the input dir.
 This can be changed using optional arguments. 
+
 Run `parm plot --help` for additional help on that.
 

From da6be5afa4979b7120c8003c91d9e21c5d3576e6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Fri, 11 Jul 2025 16:06:19 +0200
Subject: [PATCH 2/9] Update README.md

---
 README.md | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 3712fd1..5a9cd10 100644
--- a/README.md
+++ b/README.md
@@ -13,8 +13,8 @@
 
 ## Introduction
 
-PARM (Promoter Activity Regulatory Model) is a deep learning model that predicts the promoter activity from the DNA sequence itself.
-As a convolution neural network trained on MPRA data, **PARM** is very lightweight and produces predictions in a cell-type-specific manner.
+PARM (Promoter Activity Regulatory Model) is a deep learning model that predicts promoter activity from the DNA sequence itself.
+As a convolutional neural network trained on MPRA data, **PARM** is very lightweight and produces predictions in a cell-type-specific manner.
 
 With the `PARM predict` tool, you can get predictions for any sequence that you want for AGS, HAP1, HCT116, HEK116, HepG2, K562, LNCaP, MCF7, and U2OS cells.
 
@@ -79,13 +79,13 @@ The output of `PARM mutagenesis` is a directory where, for every sequence, both
 
 ## Plotting results of _in-silico_ mutagenesis
 
-Results of _in-silico_ mutagenesis are more insightful when visualized in the following format:
+Results of _in-silico_ mutagenesis are more insightful when visualised in the following format:
 
 <p align="center"><img src="misc/CXCR4_chr2:136875708:136875939:-.png" alt="plot example" width="100%"></p>
 
 You can easily see the mutagenesis matrix and all the scanned TF motifs.
 
-To produce such a visualization, you can run:
+To produce such a visualisation, you can run:
 
 ```sh
 parm plot \
@@ -93,7 +93,7 @@ parm plot \
 ```
 
 This will read the mutagenesis matrix and the hits for the sequence `sequence_of_interest` and generate the plot.
-By default, **PARM** stored the result plot as a PDF inside the input dir.
+By default, **PARM** stores the result plot as a PDF inside the input dir.
 This can be changed using optional arguments. 
 
 Run `parm plot --help` for additional help on that.
@@ -171,8 +171,6 @@ parm train \
   --cell_type AGS
 ```
 
-### Making predictions with your own model
-
 After training all the folds, you should place all the folds in a single directory:
 
 ```sh
@@ -185,17 +183,25 @@ cp AGS_fold0/AGS_fold0.parm \
    my_AGS_model/
 ```
 
-and then, run:
+### Evaluating your model with the test fold
+
+Now you can evaluate the model using the test fold. This is the part of your dataset that was excluded from training.
+Therefore, a standard evaluation of the model is to compare the measured and predicted promoter activity of the fragments in this fold.
+
+For this, you can use the `--predict_test_fold` flag of `PARM predict`, as follows:
 
 ```sh
 parm predict \
-  --input example_data/input.fasta \
-  --output output_my_AGS.txt \
+  --predict_test_fold \
+  --input example_data/training_data/onehot/test.hdf5 \
+  --output my_AGS_model_test \
   --model my_AGS_model/
 ```
 
+This will create the `my_AGS_model_test` directory, containing scatter plots that show the correlation between measured and predicted activity at both the fragment and feature levels (the latter averaging fragments of the same regulatory feature).
+
 #### Considerations for training your model
 
 - The provided data in the `example_data/training_data` is not enough to train a good PARM model. We only provide it here for the sake of this tutorial.
-- Always run the `PARM train` function from a GPU server. A normal CPU machine will take a long time to train a model, even the provided example data. In the start of the training, PARM will print in the screen if a GPU is detected. Make sure that you see `GPU detected? True`.
-- Even if your input data contains measurements for more than one cell (as the provided example, that contains data for AGS and HAP1), you can only train a model for one cell at a time.
+- Always run the `PARM train` function from a GPU server. A normal CPU machine will take a long time to train a model, even with the provided example data. At the start of the training, PARM will print on the screen if a GPU is detected. Make sure that you see `GPU detected? True`.
+- Even if your input data contains measurements for more than one cell (as the provided example, which contains data for AGS and HAP1), you can only train a model for one cell at a time.

From 5b248679517fb3cf98c212f1c631c2df40886bd6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Fri, 11 Jul 2025 16:26:42 +0200
Subject: [PATCH 3/9] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 5a9cd10..265e6b6 100644
--- a/README.md
+++ b/README.md
@@ -8,8 +8,8 @@
 - [Running _in-silico_ mutagenesis](#running-in-silico-mutagenesis)
 - [Plotting results of _in-silico_ mutagenesis](#plotting-results-of-in-silico-mutagenesis)
   - [Training your own PARM model](#training-your-own-parm-model)
-  - [Making predictions with your own model](#making-predictions-with-your-own-model)
-    - [Considerations for training your model](#considerations-for-training-your-model)
+  - [Evaluating your model with the test fold](#evaluating-your-model-with-the-test-fold)
+  - [Considerations for training your model](#considerations-for-training-your-model)
 
 ## Introduction
 

From 11535619f10d6bb190c1a395842a4b617337b721 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Fri, 11 Jul 2025 16:37:45 +0200
Subject: [PATCH 4/9] Update README.md

---
 README.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)
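
The five `parm train` invocations touched by this patch differ only in which fold is held out for validation: the `fold[...]` glob lists the other four folds. As a hedged sketch, the same commands could be generated in a loop. The `DATA_DIR` and `CELL` variables and the `fold_train_cmd` helper are illustrative names, not PARM options; the flags themselves are the ones shown in the README. The sketch only prints the commands, so they can be reviewed before being run.

```sh
# Sketch: print one `parm train` command per validation fold.
# DATA_DIR and CELL are illustrative shell variables, not PARM options.
DATA_DIR=example_data/training_data
CELL=AGS

fold_train_cmd() {
  val=$1
  # Drop the validation fold from "01234" to get the four training folds.
  train=$(printf '%s' 01234 | tr -d "$val")
  printf 'parm train --input %s/fold[%s].* --validation %s/fold%s.hdf5 --output %s_fold%s --cell_type %s\n' \
    "$DATA_DIR" "$train" "$DATA_DIR" "$val" "$CELL" "$val" "$CELL"
}

for val in 0 1 2 3 4; do
  fold_train_cmd "$val"
done
```

Each printed line corresponds to one of the per-fold commands in the README, with the input paths following the layout introduced by this patch.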

diff --git a/README.md b/README.md
index 265e6b6..77132f3 100644
--- a/README.md
+++ b/README.md
@@ -110,8 +110,8 @@ To train the PARM models for the AGS cell, you can run:
 ```sh
 # Fold 0 model
 parm train \
-  --input example_data/training_data/onehot/fold[1234].* \
-  --validation example_data/training_data/onehot/fold0.hdf5 \
+  --input example_data/training_data/fold[1234].* \
+  --validation example_data/training_data/fold0.hdf5 \
   --output AGS_fold0 \
   --cell_type AGS
 ```
@@ -144,29 +144,29 @@ Similarly, for the other folds, you can run:
 ```sh
 # Fold 1 model
 parm train \
-  --input example_data/training_data/onehot/fold[0234].* \
-  --validation example_data/training_data/onehot/fold1.hdf5 \
+  --input example_data/training_data/fold[0234].* \
+  --validation example_data/training_data/fold1.hdf5 \
   --output AGS_fold1 \
   --cell_type AGS
 
 # Fold 2 model
 parm train \
-  --input example_data/training_data/onehot/fold[0134].* \
-  --validation example_data/training_data/onehot/fold2.hdf5 \
+  --input example_data/training_data/fold[0134].* \
+  --validation example_data/training_data/fold2.hdf5 \
   --output AGS_fold2 \
   --cell_type AGS
 
 # Fold 3 model
 parm train \
-  --input example_data/training_data/onehot/fold[0124].* \
-  --validation example_data/training_data/onehot/fold3.hdf5 \
+  --input example_data/training_data/fold[0124].* \
+  --validation example_data/training_data/fold3.hdf5 \
   --output AGS_fold3 \
   --cell_type AGS
 
 # Fold 4 model
 parm train \
-  --input example_data/training_data/onehot/fold[0123].* \
-  --validation example_data/training_data/onehot/fold4.hdf5 \
+  --input example_data/training_data/fold[0123].* \
+  --validation example_data/training_data/fold4.hdf5 \
   --output AGS_fold4 \
   --cell_type AGS
 ```
@@ -193,7 +193,7 @@ For this, you can make use of the `--predict_test_fold` flag of the `PARM predic
 ```sh
 parm predict \
   --predict_test_fold \
-  --input example_data/training_data/onehot/test.hdf5 \
+  --input example_data/training_data/test.hdf5 \
   --output my_AGS_model_test \
   --model my_AGS_model/
 ```

From a9bf91334aa346165e67000d8ca98553f6b143d3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Mon, 14 Jul 2025 11:22:07 +0200
Subject: [PATCH 5/9] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 77132f3..8ead571 100644
--- a/README.md
+++ b/README.md
@@ -202,6 +202,6 @@ This will create `my_AGS_model_test` directory containing the scatter plots show
 
 #### Considerations for training your model
 
-- The provided data in the `example_data/training_data` is not enough to train a good PARM model. We only provide it here for the sake of this tutorial.
-- Always run the `PARM train` function from a GPU server. A normal CPU machine will take a long time to train a model, even with the provided example data. At the start of the training, PARM will print on the screen if a GPU is detected. Make sure that you see `GPU detected? True`.
+- The provided data in the `example_data/training_data` is not enough to train a good PARM model. We provide it here solely for this tutorial.
+- Always run the `PARM train` function from a GPU server. A normal CPU machine will take a long time to train a model, even with the provided example data. At the start of the training, PARM will print on the screen if a GPU is detected. Make sure that you see `GPU detected? True`. You can also run `parm train --check_cuda`; this will check if any GPU is detected and exit.
 - Even if your input data contains measurements for more than one cell (as the provided example, which contains data for AGS and HAP1), you can only train a model for one cell at a time.

From d3bc387636ea1e212213998226f0e67ba678cead Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Tue, 22 Jul 2025 13:58:58 +0200
Subject: [PATCH 6/9] Update README.md

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 8ead571..ede6b4b 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@
   - [Training your own PARM model](#training-your-own-parm-model)
   - [Evaluating your model with the test fold](#evaluating-your-model-with-the-test-fold)
   - [Considerations for training your model](#considerations-for-training-your-model)
+- [Citation](#citation)
 
 ## Introduction
 
@@ -205,3 +206,11 @@ This will create `my_AGS_model_test` directory containing the scatter plots show
 - The provided data in the `example_data/training_data` is not enough to train a good PARM model. We provide it here solely for this tutorial.
 - Always run the `PARM train` function from a GPU server. A normal CPU machine will take a long time to train a model, even with the provided example data. At the start of the training, PARM will print on the screen if a GPU is detected. Make sure that you see `GPU detected? True`. You can also run `parm train --check_cuda`; this will check if any GPU is detected and exit.
 - Even if your input data contains measurements for more than one cell (as the provided example, which contains data for AGS and HAP1), you can only train a model for one cell at a time.
+
+---
+
+## Citation
+
+If you make use of PARM and/or this pipeline, please cite:
+
+> [Barbadilla-Martínez, L.; Klaassen, N.; Franceschini-Santos, V. H.; Breda, J.; Hernandez-Quiles, M.; van Lieshout, T.; Urzua Traslaviña, C.; Yücel, H.; Boi, M.; Hermana-Garcia-Agullo, C.; Gregoricchio, S.; Zwart, W.; Voest, E.; Franke, L.; Vermeulen, M.; de Ridder, J.; van Steensel, B. (2024). The regulatory grammar of human promoters uncovered by MPRA-trained deep learning. bioRxiv.](https://www.biorxiv.org/content/10.1101/2024.07.09.602649v2)

From 2a3d5db78c9c06d53329a737500fad5b4be7292c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Mon, 4 Aug 2025 14:51:18 +0200
Subject: [PATCH 7/9] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ede6b4b..764a31b 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ This can be changed using optional arguments.
 
 Run `parm plot --help` for additional help on that.
 
-### Training your own PARM model
+## Training your own PARM model
 
 If you want to train a PARM model with your MPRA data, you must pre-process the raw MPRA counts using our [pre-processing pipeline](https://github.com/vansteensellab/PARM_preprocessing_pipeline).
 This will produce, mainly, one-hot encoded files with the promoter activity per fragment, per cell. 

From 7499f0938849271b2edeb6229f99c0532574a047 Mon Sep 17 00:00:00 2001
From: Peter Sobolewski <76622105+psobolewskiPhD@users.noreply.github.com>
Date: Fri, 29 Aug 2025 11:01:42 -0400
Subject: [PATCH 8/9] Update README.md to remove anaconda and clarify __cuda

---
 README.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 764a31b..8a25b87 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,15 @@ This is useful for predicting which TFs are regulating (activating or repressing
 **PARM** can be easily installed with `conda`:
 
 ```sh
-conda install -c anaconda -c conda-forge -c bioconda -c pytorch parm
+conda install -n parm-env -c conda-forge -c bioconda -c pytorch parm
+```
+
+Note: this package can benefit from GPU acceleration using CUDA. Ensure that the [virtual package](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html) `__cuda` is present using `conda info` or use `CONDA_OVERRIDE_CUDA=<version>` to specify CUDA version.
+
+To use the package, activate the environment using:
+
+```sh
+conda activate parm-env
 ```
 
 ## Usage examples

From 1c7226d82a0498b021ce99db284a4dd7f1079fab Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vin=C3=ADcius=20H=2E=20Franceschini-Santos?=
 <v.franceschini@nki.nl>
Date: Mon, 1 Sep 2025 09:57:06 +0200
Subject: [PATCH 9/9] update README

---
 README.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 8a25b87..56ef2d7 100644
--- a/README.md
+++ b/README.md
@@ -27,15 +27,12 @@ This is useful for predicting which TFs are regulating (activating or repressing
 **PARM** can be easily installed with `conda`:
 
 ```sh
-conda install -n parm-env -c conda-forge -c bioconda -c pytorch parm
+conda create -n parm_env -c conda-forge -c bioconda -c pytorch parm
 ```
-
-Note: this package can benefit from GPU acceleration using CUDA. Ensure that the [virtual package](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html) `__cuda` is present using `conda info` or use `CONDA_OVERRIDE_CUDA=<version>` to specify CUDA version.
-
-To use the package, activate the environment using:
+This will create an environment with **PARM** and all its dependencies. Before running **PARM**, activate the environment with:
 
 ```sh
-conda activate parm-env
+conda activate parm_env
 ```
 
 ## Usage examples