AMES: atomistic molecular evolution simulator

Experimental code for simulating the structural evolution of proteins, RNAs, and their complexes with atomistic models.

Installation

git clone https://github.com/sahakyanhk/ames.git && cd ames
pip install numpy pandas pybind11 setuptools matplotlib tqdm zstandard
pip install . # install pdb_contacts

Install at least one structure prediction engine: AlphaFold3, OpenFold3, or ESMFold via Transformers.

Quick Start

Evolution of a protein interacting with RNA using AlphaFold3

python src/ames.py --iseq1 'protein:randoms:65:evolv' --seq1_rate 0.5 \
                    --iseq2 'rna:randoms:24:evolv' --seq2_rate 1 \
                    -pm npm -rm pmo \
                    -ps 100 -ng 1000 \
                    -ann -ann_s 150 -ann_e 999 \
                    -b0 0.8 -bt 8.0 \
                    --engine af3 \
                    -o outputs/protein_rna_test

Use src/visualames.py to process the trajectory and run basic analyses.
This extracts the main lineage, writes structures in PDB format, and generates summary plots.

python src/visualames.py -l outputs/protein_rna_test/progress.log

Use AMESViewer to visualize and analyse the simulation trajectories in ChimeraX

Single protein fold evolution simulation (PFES) with ESMFold

python src/ames.py --iseq1 'protein:randoms:65:evolv' --seq1_rate 1 \
                    -pm pmo -ps 100 -ng 1000 \
                    --engine esmfold \
                    -o outputs/esmfold_test


python src/visualames.py -l outputs/esmfold_test/progress.log

Running a batch of simulations on HPC with Slurm

for i in {01..09}; do bash_helpers/run_ames.sbatch outputs/batch1/run$i; done

When the batch jobs are done, use bash_helpers/summarize_batch.sh to summarize the results. Install rnpclust for clustering RNA-protein complexes.

bash_helpers/summarize_batch.sh outputs/batch1 

Dry run for debugging; works without a structure prediction engine.

python src/ames.py --pop_size 50 --num_generations 100 --engine simulacrum -o outputs/simulacrum_test

Settings

simulation control
-ng, --num_generations Total number of generations in the simulation
-ps, --pop_size Population size in each generation

sequence setup
--iseq1 compact spec for the 1st sequence used to initiate the simulation: [protein, rna, dna]:[random, randoms]:[sequence_length]:[evolv, static], e.g., "protein:random:25:evolv"
--seq1_init 1st sequence; can be an actual sequence, random (the same random sequence for the entire population), or randoms (each sequence in the population is random)
--seq1_type 1st sequence type [protein, rna, dna]
--seq1_len random sequence length if random(s) is used; ignored if a sequence is provided
--seq1_evol flag (store_true); if set, seq1 evolves
--seq1_rate probability that the 1st sequence mutates, in [0,1]
Use the same settings for seq2 with --iseq2, --seq2_init, --seq2_type, --seq2_len, --seq2_evol, --seq2_rate
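
To make the colon-separated --iseq1/--iseq2 format concrete, here is a minimal parsing sketch. It is not taken from src/ames.py: the SeqSpec class and the parse_iseq function are hypothetical names, and the validation rules are assumptions based only on the field choices listed above.

```python
from dataclasses import dataclass

# Allowed values, copied from the option description above.
VALID_TYPES = {"protein", "rna", "dna"}
VALID_INITS = {"random", "randoms"}
VALID_MODES = {"evolv", "static"}


@dataclass
class SeqSpec:
    """Hypothetical container for one parsed --iseq spec."""
    seq_type: str   # protein, rna, or dna
    init: str       # random (one shared sequence) or randoms (one per individual)
    length: int     # length of the random starting sequence
    evolves: bool   # True for "evolv", False for "static"


def parse_iseq(spec: str) -> SeqSpec:
    """Split 'type:init:length:mode' into its four fields and validate them."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {spec!r}")
    seq_type, init, length, mode = parts
    if seq_type not in VALID_TYPES:
        raise ValueError(f"unknown sequence type: {seq_type!r}")
    if init not in VALID_INITS:
        raise ValueError(f"unknown init mode: {init!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown evolution mode: {mode!r}")
    return SeqSpec(seq_type, init, int(length), mode == "evolv")


print(parse_iseq("protein:randoms:65:evolv"))
# SeqSpec(seq_type='protein', init='randoms', length=65, evolves=True)
```

The same four fields can presumably also be given one at a time through --seq1_type, --seq1_init, --seq1_len and --seq1_evol.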

temperature control
-ann, --annealing Use temperature annealing; see -b0, -bt, -ann_s, -ann_e, or -ann_step for annealing setup
-b0, --beta Selection strength; the higher the beta, the lower the temperature and the stronger the selection
-bt, --beta_target Target beta (i.e., target temperature) if annealing is used
-ann_s, --annealing_start Generation when annealing starts
-ann_e, --annealing_end Generation when annealing reaches the target temperature
-ann_step, --annealing_step Annealing step; calculated automatically if -ann_s and -ann_e are provided
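
As an illustration of how these options could fit together, the sketch below computes beta per generation under the assumption of a linear ramp from -b0 to -bt between -ann_s and -ann_e. The linear form and the function name beta_at are assumptions made for this example; the schedule actually used by AMES may differ.

```python
def beta_at(generation, beta0, beta_target, ann_start, ann_end):
    """Return beta for a generation, assuming a linear annealing ramp."""
    if generation <= ann_start:
        return beta0          # before annealing starts: initial beta
    if generation >= ann_end:
        return beta_target    # after annealing ends: target beta
    # Per-generation increment, i.e. what -ann_step would be if derived
    # automatically from -ann_s and -ann_e under the linear assumption.
    step = (beta_target - beta0) / (ann_end - ann_start)
    return beta0 + step * (generation - ann_start)


# Values from the Quick Start example: -b0 0.8 -bt 8.0 -ann_s 150 -ann_e 999
for gen in (0, 150, 500, 999):
    print(gen, round(beta_at(gen, 0.8, 8.0, 150, 999), 3))
```

With these values beta stays at 0.8 through generation 150 and reaches 8.0 at generation 999.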

mutation setup
-pm, -rm, -dm or --protein_mutations, --rna_mutations, --dna_mutations mutation types for protein, RNA, and DNA
    npm substitutions, insertions, deletions, permutations, and duplications
    pmo substitutions and single-residue indels
    rso residue substitutions only

-pa, --protein_alphabet amino acid mutation probabilities [uniform, uniprot, codonrates], uniform by default
-ra, --rna_alphabet nucleotide mutation probabilities
-da, --dna_alphabet nucleotide mutation probabilities
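
The following sketch shows what a pmo-style move set (substitutions and single-residue indels) could look like with a uniform alphabet. It is an illustration only: mutate_pmo, its arguments, and the choice to pick the operation uniformly at random are assumptions, not the AMES implementation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, sampled uniformly


def mutate_pmo(seq, rate, rng, alphabet=AMINO_ACIDS, min_len=1, max_len=None):
    """With probability `rate`, apply one substitution or single-residue indel."""
    if not seq:
        raise ValueError("sequence must not be empty")
    if rng.random() >= rate:
        return seq  # no mutation this round
    # Only offer operations that keep the length within [min_len, max_len].
    ops = ["substitute"]
    if max_len is None or len(seq) < max_len:
        ops.append("insert")
    if len(seq) > min_len:
        ops.append("delete")
    op = rng.choice(ops)
    if op == "insert":
        pos = rng.randrange(len(seq) + 1)  # insertion may go at either end
        return seq[:pos] + rng.choice(alphabet) + seq[pos:]
    pos = rng.randrange(len(seq))
    if op == "delete":
        return seq[:pos] + seq[pos + 1:]
    # Substitution; the new residue may equal the old one in this simple sketch.
    return seq[:pos] + rng.choice(alphabet) + seq[pos + 1:]


rng = random.Random(0)
print(mutate_pmo("MKTAYIAKQR", rate=1.0, rng=rng))
```

The min_len and max_len arguments here play the role that --seq1_min_len and --seq1_max_len presumably play in the simulator.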

ligand setup
--ligand ligand(s) in CCD or SMILES format, separated by commas, e.g., "--ligand ATP,MG"

outputs control
-o,--outpath output dir name where log and checkpoint files are saved, "ames_output/output" by default
-l, --log output log file name, "progress.log" by default
-c, --ckp checkpoint file name, "progress.ckp" by default
-ckpi, --checkpoint_interval checkpoint saving frequency, in generations
--nobackup overwrite the output if it exists (store_true flag)

constraints
--seq1_min_len seq1 minimal length constraint
--seq1_max_len seq1 maximal length constraint
--seq2_min_len seq2 minimal length constraint
--seq2_max_len seq2 maximal length constraint

other
--engine structure prediction engine [alphafold3, openfold3, esmfold, simulacrum]
--contact_min_seq_dist minimum sequence distance for contact calculation
--contact_cutoff cutoff distance for contact calculation
--contact_min_plddt minimum plddt for contact calculation
--interface_plddt_cutoff cutoff distance for interface plddt
--lig_contact_cutoff cutoff distance for ligand contact calculation
--lig_contact_min_plddt minimum plddt for ligand contact calculation
--clash_overlap_threshold cutoff distance for clash calculation in angstroms
--clash_min_seq_dist minimum sequence distance for clash calculation
--norepeat do not generate and/or select the same sequences more than once, off by default
--max_seq_per_batch maximum number of sequences per batch, half of the population size by default
--config json file with settings
see simparam.json for full list of settings and default values.
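
To show how the contact options (--contact_cutoff, --contact_min_seq_dist, --contact_min_plddt) might interact, here is a pure-Python sketch that counts residue contacts from one representative coordinate per residue. It is not the pdb_contacts module installed above; count_contacts and its default values are placeholders chosen for illustration, so check simparam.json for the real defaults.

```python
import math


def count_contacts(coords, plddt, cutoff=8.0, min_seq_dist=3, min_plddt=70.0):
    """Count residue pairs that are close in space, far enough apart in
    sequence, and both predicted with sufficient confidence.

    coords: one (x, y, z) tuple per residue, in angstroms
    plddt:  one pLDDT value per residue
    """
    n = len(coords)
    sep = max(1, min_seq_dist)  # never pair a residue with itself
    count = 0
    for i in range(n):
        if plddt[i] < min_plddt:
            continue
        for j in range(i + sep, n):
            if plddt[j] < min_plddt:
                continue
            if math.dist(coords[i], coords[j]) <= cutoff:
                count += 1
    return count


# Toy example: six residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
plddt = [90.0] * 6
print(count_contacts(coords, plddt, cutoff=8.0, min_seq_dist=2))  # 4
```

In the toy example only pairs two residues apart (7.6 angstroms) fall within the cutoff, so raising min_seq_dist to 3 leaves no contacts.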