Experimental Workflow for EVOLVEpro

This directory contains scripts for running the experimental workflow of EVOLVEpro, which iteratively optimizes protein activity through experimental rounds of evolution. The source functions used here are in evolvepro/src/evolve.py, which calls the underlying model.

For exp, use the evolvepro environment:

conda activate evolvepro

Evolution Types

Single Mutant Evolution: Explores individual amino acid substitutions
Multi-Mutant Evolution: Explores combinations of mutations based on previous rounds

Input Files

FASTA file: Contains the wild-type protein sequence
PLM embeddings: CSV file(s) containing embeddings for all mutants of interest
Round data: Excel files containing activity measurements for each round of evolution

Key Parameters

protein_name: Name of the protein being evolved
round_name: Identifier for the current round of evolution
number_of_variants: Number of variants to predict for the next round
rename_WT: Boolean to indicate if the wild-type sequence should be renamed in the output

Usage

Create a Python script (e.g., t7_pol.py) with the following structure:

from evolvepro.src.evolve import evolve_experimental, evolve_experimental_multi

protein_name = 't7_pol'
embeddings_base_path = '/path/to/embeddings'
embeddings_file_name = 'embeddings_file.csv'
round_base_path = '/path/to/round/data'
wt_fasta_path = "/path/to/wildtype/fasta"
number_of_variants = 12
output_dir = '/path/to/output/directory'

# Single variant
round_name = 'Round2'
round_file_names = ['T7_pol_Round1.xlsx', 'T7_pol_Round2.xlsx']
rename_WT = True

evolve_experimental(
    protein_name,
    round_name,
    embeddings_base_path,
    embeddings_file_name,
    round_base_path,
    round_file_names,
    wt_fasta_path,
    rename_WT,
    number_of_variants,
    output_dir
)

# Multivariant
embeddings_file_name_2nd = 'embeddings_2nd_file.csv'
embeddings_file_name_3rd = 'embeddings_3rd_file.csv'

round_name = 'Round6'
round_file_names_single = ['T7_pol_Round1.xlsx', 'T7_pol_Round2.xlsx', 'T7_pol_Round3.xlsx', 'T7_pol_Round4.xlsx']
round_file_names_multi = ['T7_pol_Round5.xlsx']
rename_WT = True

evolve_experimental_multi(
    protein_name,
    round_name,
    embeddings_base_path,
    [embeddings_file_name, embeddings_file_name_2nd, embeddings_file_name_3rd],
    round_base_path,
    round_file_names_single,
    round_file_names_multi,
    wt_fasta_path,
    rename_WT,
    number_of_variants,
    output_dir
)

Detailed results will be saved in the specified output directory, for each round, and the specified top number_of_variants to assess for the following round will be returned.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Experimental Workflow for EVOLVEpro

Evolution Types

Input Files

Key Parameters

Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

Experimental Workflow for EVOLVEpro

Evolution Types

Input Files

Key Parameters

Usage