sahakyanhk/ames

AMES: atomistic molecular evolution simulator

Experimental code for simulating structural evolution of protein, RNA and their complexes with atomistic models.

Installation

git clone https://github.com/sahakyanhk/ames.git && cd ames
pip install numpy pandas pybind11 setuptools matplotlib tqdm zstandard
pip install . # install pdb_contacts

Then install at least one structure prediction engine: AlphaFold3, OpenFold3, or ESMFold (via Transformers).

Quick Start

Evolution of a protein interacting with RNA using AlphaFold3

python src/ames.py --iseq1 'protein:randoms:65:evolv' --seq1_rate 0.5 \
                    --iseq2 'rna:randoms:24:evolv' --seq2_rate 1 \
                    -pm npm -rm pmo \
                    -ps 100 -ng 1000 \
                    -ann -ann_s 150 -ann_e 999 \
                    -b0 0.8 -bt 8.0 \
                    --engine af3 \
                    -o outputs/protein_rna_test

Use src/visualames.py to process the trajectory and run basic analyses.
This extracts the main lineage, writes structures in PDB format, and generates summary plots.

python src/visualames.py -l outputs/protein_rna_test/progress.log

Use AMESViewer to visualize and analyse the simulation trajectories in ChimeraX

Single protein fold evolution simulation (PFES) with ESMFold

python src/ames.py --iseq1 'protein:randoms:65:evolv' --seq1_rate 1 \
                    -pm pmo -ps 100 -ng 1000 \
                    --engine esmfold \
                    -o outputs/esmfold_test


python src/visualames.py -l outputs/esmfold_test/progress.log

Running a batch of simulations on HPC with Slurm

for i in {01..09}; do bash_helpers/run_ames.sbatch outputs/batch1/run$i; done

When the batch jobs are done, use bash_helpers/summarize_batch.sh to summarize the results. Install rnpclust for clustering RNA-protein complexes.

bash_helpers/summarize_batch.sh outputs/batch1 

Dry run for debugging, which works without a structure prediction engine

python src/ames.py --pop_size 50 --num_generations 100 --engine simulacrum -o outputs/simulacrum_test

Settings

simulation control
-ng, --num_generations Total number of generations in the simulation
-ps, --pop_size Population size in each generation

sequence setup
--iseq1 1st seq info to initiate simulation [protein, rna, dna]:[random, randoms]:[sequence_length]:[evolv, static]. e.g., "protein:random:25:evolv"
--seq1_init 1st sequence; can be an actual sequence, random (the same random sequence for the entire population), or randoms (each sequence in the population is random)
--seq1_type 1st seq type [protein, rna, dna]
--seq1_len if random(s) is used, provide random sequence length, ignore if sequence is provided
--seq1_evol flag (store_true); if set, seq1 evolves
--seq1_rate probability of 1st sequence to mutate [0,1]
use the same settings for seq2 with --iseq2, --seq2_init, --seq2_type, --seq2_len, --seq2_evol, --seq2_rate
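
The colon-separated format accepted by --iseq1 and --iseq2 can be illustrated with a small parser. This is a minimal sketch based only on the format string documented above; the names SeqSpec and parse_iseq are hypothetical and are not taken from the AMES source.

```python
from dataclasses import dataclass

# Allowed values, taken from the --iseq1 description in this README.
VALID_TYPES = {"protein", "rna", "dna"}
VALID_INITS = {"random", "randoms"}
VALID_MODES = {"evolv", "static"}


@dataclass
class SeqSpec:
    """Hypothetical container for one parsed --iseq value."""
    seq_type: str   # protein, rna or dna
    init: str       # random (one shared sequence) or randoms (one per individual)
    length: int     # length of the random sequence
    evolves: bool   # True for "evolv", False for "static"


def parse_iseq(spec: str) -> SeqSpec:
    """Split 'type:init:length:mode' and validate each field."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {spec!r}")
    seq_type, init, length, mode = parts
    if seq_type not in VALID_TYPES:
        raise ValueError(f"unknown sequence type: {seq_type!r}")
    if init not in VALID_INITS:
        raise ValueError(f"unknown initialisation: {init!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return SeqSpec(seq_type, init, int(length), mode == "evolv")


if __name__ == "__main__":
    print(parse_iseq("protein:randoms:65:evolv"))
```

Under these assumptions, 'protein:randoms:65:evolv' describes an evolving protein of length 65 in which every member of the population starts from its own random sequence.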

temperature control
-ann, --annealing Use temperature annealing; see -b0, -bt, -ann_s, -ann_e, or -ann_step for annealing setup
-b0, --beta Selection strength; the higher beta, the lower the temperature and the stronger the selection
-bt, --beta_target Target beta (and hence target temperature) if annealing is used
-ann_s, --annealing_start Generation when annealing starts
-ann_e, --annealing_end Generation when annealing reaches the target temperature
-ann_step, --annealing_step Annealing step, calculated automatically if -ann_s and -ann_e are provided
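
How the annealing options could fit together is sketched below. This assumes a linear ramp of beta from -b0 to -bt between -ann_s and -ann_e, with the step derived as (bt - b0) / (ann_e - ann_s); the actual schedule used by AMES may differ, and the function names are hypothetical.

```python
def annealing_step(beta0: float, beta_target: float, ann_start: int, ann_end: int) -> float:
    """Per-generation change in beta, assuming a linear ramp (an assumption)."""
    if ann_end <= ann_start:
        raise ValueError("annealing_end must be greater than annealing_start")
    return (beta_target - beta0) / (ann_end - ann_start)


def beta_at(generation: int, beta0: float, beta_target: float,
            ann_start: int, ann_end: int) -> float:
    """Beta at a given generation: constant before the ramp, linear during it,
    and fixed at the target afterwards."""
    if generation <= ann_start:
        return beta0
    if generation >= ann_end:
        return beta_target
    step = annealing_step(beta0, beta_target, ann_start, ann_end)
    return beta0 + step * (generation - ann_start)


if __name__ == "__main__":
    # Values from the Quick Start example: -b0 0.8 -bt 8.0 -ann_s 150 -ann_e 999
    for gen in (0, 150, 500, 999):
        print(gen, round(beta_at(gen, 0.8, 8.0, 150, 999), 3))
```

With the Quick Start values, beta would stay at 0.8 for the first 150 generations and then rise steadily to 8.0 by generation 999, so selection becomes stricter as the run progresses.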

mutation setup
-pm, -rm, -dm or --protein_mutations, --rna_mutations, --dna_mutations mutation types for protein, RNA and DNA
    npm substitutions, insertions, deletions, permutations and duplications
    pmo substitutions and single-residue indels
    rso residue substitutions only

-pa, --protein_alphabet amino acid mutation probabilities [uniform, uniprot, codonrates], uniform by default
-ra, --rna_alphabet nucleotide mutation probabilities
-da, --dna_alphabet nucleotide mutation probabilities

ligand setup
--ligand ligand(s) in CCD or SMILES format, separated by commas, e.g., "--ligand ATP,MG"

outputs control
-o, --outpath output directory where log and checkpoint files are saved, "ames_output/output" by default
-l, --log output log file name, "progress.log" by default
-c, --ckp checkpoint file name, "progress.ckp" by default
-ckpi, --checkpoint_interval checkpoint saving frequency, in generations
--nobackup flag (store_true); overwrite the output if it already exists

constraints
--seq1_min_len seq1 minimal length constraint
--seq1_max_len seq1 maximal length constraint
--seq2_min_len seq2 minimal length constraint
--seq2_max_len seq2 maximal length constraint

other
--engine structure prediction engine [alphafold3, openfold3, esmfold, simulacrum]
--contact_min_seq_dist minimum sequence distance for contact calculation
--contact_cutoff cutoff distance for contact calculation
--contact_min_plddt minimum plddt for contact calculation
--interface_plddt_cutoff cutoff distance for interface plddt
--lig_contact_cutoff cutoff distance for ligand contact calculation
--lig_contact_min_plddt minimum plddt for ligand contact calculation
--clash_overlap_threshold cutoff distance for clash calculation in angstroms
--clash_min_seq_dist minimum sequence distance for clash calculation
--norepeat do not generate and/or select the same sequences more than once, off by default
--max_seq_per_batch maximum number of sequences per batch, half of the population size by default
--config json file with settings
see simparam.json for full list of settings and default values.
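
One common way to combine a JSON settings file with command-line flags is sketched below. It assumes that the JSON keys mirror the long option names (for example "pop_size") and that flags given on the command line take precedence over the file; neither is confirmed by this README, and the default values shown are placeholders. Check simparam.json for the real keys and defaults.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Tiny stand-in parser with two of the documented options."""
    parser = argparse.ArgumentParser(description="config-merging sketch")
    parser.add_argument("--config", help="JSON file with settings")
    # Placeholder defaults, not the real AMES defaults.
    parser.add_argument("-ps", "--pop_size", type=int, default=100)
    parser.add_argument("-ng", "--num_generations", type=int, default=1000)
    return parser


def parse_with_config(argv: list[str]) -> argparse.Namespace:
    """Load --config first, use its values as defaults, then parse the flags."""
    parser = build_parser()
    pre_args, _ = parser.parse_known_args(argv)
    if pre_args.config:
        with open(pre_args.config) as handle:
            # Values from the file replace the built-in defaults.
            parser.set_defaults(**json.load(handle))
    # Flags given explicitly in argv still override the file values.
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_with_config(["-ps", "20"]))
```

With this layering, a shared JSON file can hold the settings for a whole batch while individual runs override single values on the command line.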
