finetune_pinnacle

18d1d3e · Sep 23, 2024

Name	Last commit message	Last commit date
README.md	Add README for finetune	Dec 30, 2023
data_prep.py	Update finetune	May 6, 2024
extract_txdata_utils.py	Update uniprot API	Sep 23, 2024
metrics_utils.py	Generalize finetune	Dec 30, 2023
model.py	Initial commit of code	Jul 19, 2023
prepare_txdata.py	Add default param to args.celltype_ppi	Apr 3, 2024
read_data.py	Update finetune	May 6, 2024
run_model.sh	Generalize finetune	Dec 30, 2023
setup.py	Generalize finetune	Dec 30, 2023
train.py	Update finetune	May 6, 2024
train_utils.py	Generalize finetune	Dec 30, 2023

README.md

Finetuning PINNACLE

You can fine-tune PINNACLE on your own datasets using our provided model checkpoints or contextualized representations (i.e., no re-training of PINNACLE is needed).

Step-by-Step Instructions

We provide detailed instructions for fine-tuning PINNACLE on the pretrained contextualized protein representations.

Step 1: Curate fine-tuning data

You may use prepare_txdata.py as an example.

The required outputs of your script are:

Positive proteins (dict)
- Filename: positive_proteins_<task_name>.json
- Data structure: {"<celltype name>": ["<protein name>"]}
Negative proteins (dict)
- Filename: negative_proteins_<task_name>.json
- Data structure: {"<celltype name>": ["<protein name>"]}
Raw data (list)
- Filename: raw_data_<task_name>.json
- Data structure: ["<protein name>"]
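As a rough illustration of the three outputs listed above, the sketch below writes them with the standard `json` module. The helper `write_task_files`, the task name `my_task`, and the cell type and protein names are all hypothetical placeholders, and the contents of the raw list are an assumption (here, candidate protein names before any per-cell-type filtering); see prepare_txdata.py for how the repository actually builds these files.

```python
import json
import tempfile
from pathlib import Path


def write_task_files(out_dir, task_name, positives, negatives, raw):
    """Write the three Step 1 JSON files and return their paths (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    payloads = {
        f"positive_proteins_{task_name}.json": positives,  # dict: cell type -> proteins
        f"negative_proteins_{task_name}.json": negatives,  # dict: cell type -> proteins
        f"raw_data_{task_name}.json": raw,                 # list of protein names
    }
    paths = []
    for filename, payload in payloads.items():
        path = out_dir / filename
        with open(path, "w") as f:
            json.dump(payload, f, indent=2)
        paths.append(path)
    return paths


# Toy, made-up inputs; none of these names come from the PINNACLE data.
positives = {"celltype_a": ["PROT1", "PROT2"], "celltype_b": ["PROT1"]}
negatives = {"celltype_a": ["PROT3"], "celltype_b": ["PROT4", "PROT5"]}
# Assumption: the raw list holds candidate proteins before per-cell-type filtering.
raw = ["PROT1", "PROT2", "PROT6"]

with tempfile.TemporaryDirectory() as tmp:
    written = write_task_files(tmp, "my_task", positives, negatives, raw)
    written_names = [p.name for p in written]
    with open(written[0]) as f:
        reloaded_positives = json.load(f)
```

In a real script you would replace the temporary directory with the directory that data_prep.py will later read from.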

Step 2: Split and format data

With the three files created in Step 1, run data_prep.py. The outputs of this script are:

Data split indices (dict)
- Filename: data_split_<task_name>.json
- Data structure: {"pos_train_indices": [...], "neg_train_indices": [...], "pos_test_indices": [...], "neg_test_indices": [...]}
Data split name (dict)
- Filename: data_split_<task_name>_name.json
- Data structure: {"pos_train_names": [...], "neg_train_names": [...], "pos_test_names": [...], "neg_test_names": [...]}
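A quick sanity check on the split file can catch a malformed output before training. The sketch below is not part of the repository; `summarize_split` is a hypothetical helper that only assumes the four keys shown above are present and that each value supports `len()`.

```python
import json

EXPECTED_KEYS = (
    "pos_train_indices",
    "neg_train_indices",
    "pos_test_indices",
    "neg_test_indices",
)


def summarize_split(split):
    """Return the number of entries under each expected key; raise if a key is missing."""
    missing = [key for key in EXPECTED_KEYS if key not in split]
    if missing:
        raise KeyError(f"split is missing keys: {missing}")
    return {key: len(split[key]) for key in EXPECTED_KEYS}


# Toy split with made-up indices; a real one would be loaded from
# data_split_<task_name>.json with json.load.
toy_split = json.loads(
    '{"pos_train_indices": [0, 1, 2], "neg_train_indices": [3, 4],'
    ' "pos_test_indices": [5], "neg_test_indices": [6, 7]}'
)
summary = summarize_split(toy_split)
print(summary)
```

The same check could be applied to the `_name.json` file by swapping in the four `*_names` keys.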

Example command:

python data_prep.py \
    --embeddings_dir ../data/pinnacle_embeds/ \
    --embed pinnacle \
    --celltype_ppi ../data/networks/ \
    --positive_proteins_prefix ../data/therapeutic_target_task/positive_proteins_EFO_0000685 \
    --negative_proteins_prefix ../data/therapeutic_target_task/negative_proteins_EFO_0000685 \
    --raw_data_prefix ../data/therapeutic_target_task/raw_targets_EFO_0000685 \
    --data_split_path ../data/therapeutic_target_task/data_split_EFO_0000685

Step 3: Finetune

With the following files, run train.py:

positive_proteins_<task_name>.json
negative_proteins_<task_name>.json
data_split_<task_name>.json

Example command:

python train.py \
    --task_name EFO_0000685 \
    --embeddings_dir ../data/pinnacle_embeds/ \
    --embed pinnacle \
    --positive_proteins_prefix ../data/therapeutic_target_task/positive_proteins_EFO_0000685 \
    --negative_proteins_prefix ../data/therapeutic_target_task/negative_proteins_EFO_0000685 \
    --data_split_path ../data/therapeutic_target_task/data_split_EFO_0000685
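Before launching train.py, it can help to confirm that the three input files exist. The sketch below assumes that each `--*_prefix` or `--data_split_path` value maps to a file by appending `.json`; that mapping is inferred from the file names listed above, not confirmed from the script, and `missing_inputs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def missing_inputs(prefixes, suffix=".json"):
    """Return the paths (prefix + suffix) that do not exist as files."""
    return [p + suffix for p in prefixes if not Path(p + suffix).is_file()]


# Demonstration in a temporary directory with a made-up task name.
with tempfile.TemporaryDirectory() as tmp:
    prefixes = [
        str(Path(tmp) / "positive_proteins_my_task"),
        str(Path(tmp) / "negative_proteins_my_task"),
        str(Path(tmp) / "data_split_my_task"),
    ]
    # Create only the first two files so the third is reported as missing.
    for prefix in prefixes[:2]:
        Path(prefix + ".json").write_text("{}")
    missing = missing_inputs(prefixes)
    missing_names = [Path(p).name for p in missing]
```

With real data, pass the same prefixes you give to train.py and stop if the returned list is non-empty.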
