Hw4_Grigoriants #19
Open
VovaGrig
wants to merge
46
commits into
Python-BI-2023:main
from
VovaGrig:HW4_Grigoriants
Commits
f4861e7
Add folder HW4_Grigoriants, create README.md
VovaGrig c3b919c
Add protein_tools.py with run_protein_tools and check_for_motif funct…
VovaGrig bc24a41
Add 'search_for_alt_frames' function
EkaterinShitik f81d442
Add 'convert_to_nucl_acids' function
EkaterinShitik cbeb58a
Add conditions in 'main' function
EkaterinShitik 1cd3287
Merge pull request #1 from EkaterinShitik/HW4_Grigoriants
VovaGrig d91cfd4
Add minor fix to protein_tools.py
VovaGrig 7f54bec
Merge branch 'HW4_Grigoriants' of github.com:VovaGrig/HW4_Functions2 …
VovaGrig 39b8acd
Add check_and_parse_user_input in protein_tools.py, add fixes
VovaGrig 29fd752
Add minor fixes in protein_tools.py
VovaGrig de4e146
Add check_and_parse_user_input in protein_tools.py, add fixes
VovaGrig 620a551
Add three_one_letter_code and define_molecular_weight functions and f…
c641b5e
Merge pull request #2 from vladislavi27/HW4_Grigoriants
VovaGrig 93d2d5f
Add minor fixes in protein_tools.py
VovaGrig ac9a165
Merge branch 'HW4_Grigoriants' of github.com:VovaGrig/HW4_Functions2 …
VovaGrig d731697
Add minor fixes in protein_tools.py
VovaGrig e670429
Add minor changes to 'convert_to_nucl_acids' function
EkaterinShitik fe41d85
Change transcription rule in 'convert_to_nucl_acids' function
EkaterinShitik c8e9823
Correct inaccuracies in the dockstring of 'convert_to_nucl_acids'
EkaterinShitik cb03cf4
Change inaccuracies in the dockstring of 'convert_to_nucl_acids'
EkaterinShitik b193a6b
Change annotation of 'search_for_alt_frames' function
EkaterinShitik f53914a
Add minor fixes in protein_tools.py
VovaGrig 2ce8ada
Add plan of README.md
EkaterinShitik 18c1a76
Complete 'Usage'
EkaterinShitik ea3be7e
Add preliminary 'Options'
EkaterinShitik a1c1c23
Add preliminary 'Examples'
EkaterinShitik 6a4e2b1
Merge branch 'VovaGrig:HW4_Grigoriants' into HW4_Grigoriants
EkaterinShitik 454d703
Complete 'Examples'
EkaterinShitik e5628a5
Complete four first parts
EkaterinShitik 53a7556
Complete all parts except for contacts
EkaterinShitik 33744ad
Complete all parts
EkaterinShitik 0fbb184
Add minor changes in 'Options'
EkaterinShitik d9bdb50
Add dockstrings to main function, search_for_motifs function, add min…
VovaGrig 78fc1e0
Add docstrings to three_one_letter_code and define_molecular_weight f…
fdf4b60
Add minor fixes
VovaGrig 5b30b20
Merge branch 'HW4_Grigoriants' into HW4_Grigoriants
VovaGrig bb79bb0
Merge pull request #7 from vladislavi27/HW4_Grigoriants
VovaGrig 39db203
Merge branch 'HW4_Grigoriants' into HW4_Grigoriants
VovaGrig 20f86fe
Merge pull request #6 from EkaterinShitik/HW4_Grigoriants
VovaGrig 6794624
Add mifixes to docstrings
VovaGrig d3b21d1
Add mminor fixes
VovaGrig 7412e71
Update README.md: add information, pictures, team photo
VovaGrig 4d23561
Update README.md
VovaGrig dd6f4a6
Update README.md
VovaGrig a3bec1b
Add fixes based on feedback to dictionaries.py and protein_tools.py
VovaGrig 0416e82
Merge branch 'HW4_Grigoriants' of github.com:VovaGrig/HW4_Functions2 …
VovaGrig
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,223 @@ | ||
# Protein_tools.py | ||
## A tool to work with protein sequences | ||
|
||
*Proteins* are under the constant focus of scientists. Currently, there is an enormous number of tools for working with nucleotide sequences; comparable tools for proteins, however, are extremely rare. | ||
|
||
|
||
`protein_tools.py` is an open-source program that facilitates working with protein sequences. | ||
|
||
## Usage | ||
The program is based on the `run_protein_tools` function, which takes a list of **one-letter amino acid sequences**, the name of a procedure and any arguments that procedure needs. If you have three-letter amino acid sequences, convert them with the `three_one_letter_code` procedure before using any other procedure on them. | ||
|
||
To start using the program, run the following command: | ||
|
||
`run_protein_tools(sequences, procedure="procedure", ...)` | ||
|
||
Where: | ||
- sequences - positional argument, a list of protein sequences | ||
- procedure - keyword argument, the name of the procedure to use, passed as a *string* | ||
- ... - additional keyword arguments, passed as *strings* unless *Options* states otherwise | ||
|
||
Before you start, check the *Options* and *Examples* sections. | ||
## Options | ||
|
||
The program has five procedures; for more information, please see the provided docstrings: | ||
|
||
`three_one_letter_code` | ||
|
||
 | ||
|
||
- The main aim - to convert three-letter amino acid sequences to one-letter ones and vice versa | ||
- For three-to-one conversion, the names of amino acids **must be separated with hyphens** | ||
- Additional arguments: none | ||
``` | ||
""" | ||
Convert the protein sequences from one-letter to three-letter format and vice versa | ||
|
||
Case 1: get three-letter sequence\n | ||
Use one-letter amino-acids sequences of any letter case | ||
|
||
Case 2: get one-letter sequence\n | ||
Use three-letter amino-acid sequences with residues separated by "-". | ||
Please note that sequences without "-" are parsed as one-letter code sequences\n | ||
Example: for the sequence "Ala" the function will return "Ala-leu-ala" | ||
|
||
Arguments: | ||
- sequences (tuple[str] or list[str]): protein sequences to convert\n | ||
Example: ["WAG", "MkqRe", "msrlk", "Met-Ala-Gly", "Met-arg-asn-Trp-Ala-Gly", "arg-asn-trp"] | ||
|
||
Return: | ||
- list: one-letter/three-letter protein sequences\n | ||
Example: ["Trp-Ala-Gly", "Met-lys-gln-Arg-glu", "met-ser-arg-leu-lys", "MAG", "MrnWAG", "rnw"] | ||
""" | ||
``` | ||
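The conversion described in the docstring above can be sketched as follows. This is a minimal illustration, not the code from `protein_tools.py`: the helper name `convert_sequence` is hypothetical, and the case handling (lowercase input residues give lowercase output residues) is an assumption inferred from the README examples.

```python
# One-letter to three-letter mapping, mirroring the AMINO_ACIDS dictionary in this pull request.
AMINO_ACIDS = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}
# Reverse mapping for three-letter input.
THREE_TO_ONE = {three: one for one, three in AMINO_ACIDS.items()}


def convert_sequence(sequence: str) -> str:
    """Hypothetical helper: convert one sequence, keeping the case of each residue."""
    if "-" in sequence:
        # Three-letter input: the case of the first letter decides the output case.
        letters = []
        for residue in sequence.split("-"):
            one_letter = THREE_TO_ONE[residue.capitalize()]
            letters.append(one_letter if residue[0].isupper() else one_letter.lower())
        return "".join(letters)
    # Anything without "-" is treated as one-letter code.
    names = []
    for letter in sequence:
        three_letter = AMINO_ACIDS[letter.upper()]
        names.append(three_letter if letter.isupper() else three_letter.lower())
    return "-".join(names)


print(convert_sequence("met-Asn-Tyr"))
print(convert_sequence("mNY"))
```

Because a sequence without "-" is always read as one-letter code, a single three-letter residue such as "Ala" is converted letter by letter, which matches the note in the docstring.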
|
||
`define_molecular_weight` | ||
|
||
 | ||
|
||
- The main aim - to determine the exact molecular weight of protein sequences | ||
- Additional arguments: none | ||
``` | ||
""" | ||
Define molecular weight of the protein sequences | ||
|
||
Use one-letter amino-acids sequences of any letter case | ||
The molecular weight is: | ||
- a sum of masses of each atom constituting a molecule | ||
- expressed in units called daltons (Da) | ||
- rounded to hundredths | ||
|
||
Arguments: | ||
- sequences (tuple[str] or list[str]): protein sequences to weigh | ||
|
||
Return: | ||
- dictionary: protein sequences as keys and molecular masses as values\n | ||
Example: {"WAG": 332.39, "MkqRe": 690.88, "msrlk": 633.86} | ||
""" | ||
``` | ||
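The example values in the docstring above are consistent with summing the free amino acid weights from `AMINO_ACID_WEIGHTS` and subtracting 18 Da of water for each peptide bond. The sketch below works under that assumption; the helper name `molecular_weight` and the constant `WATER_WEIGHT` are hypothetical and do not come from `protein_tools.py`.

```python
# Weights of free amino acids in daltons, mirroring AMINO_ACID_WEIGHTS in this pull request.
AMINO_ACID_WEIGHTS = {
    "A": 89.09, "C": 121.16, "D": 133.10, "E": 147.13, "F": 165.19,
    "G": 75.07, "H": 155.16, "I": 131.17, "K": 146.19, "L": 131.17,
    "M": 149.21, "N": 132.12, "P": 115.13, "Q": 146.15, "R": 174.20,
    "S": 105.09, "T": 119.12, "V": 117.15, "W": 204.23, "Y": 181.19,
}
# Assumption: one water molecule (taken as 18 Da) is lost per peptide bond.
WATER_WEIGHT = 18.0


def molecular_weight(sequence: str) -> float:
    """Hypothetical helper: weight of one sequence, rounded to hundredths."""
    residues = sequence.upper()
    total = sum(AMINO_ACID_WEIGHTS[residue] for residue in residues)
    peptide_bonds = max(len(residues) - 1, 0)
    return round(total - WATER_WEIGHT * peptide_bonds, 2)


weights = {sequence: molecular_weight(sequence) for sequence in ["WAG", "MkqRe", "msrlk"]}
print(weights)
```

For "WAG" this gives 204.23 + 89.09 + 75.07 - 2 × 18 = 332.39, the value quoted in the docstring.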
|
||
`search_for_motifs` | ||
|
||
 | ||
|
||
- The main aim - to search for the motif of interest in protein sequences | ||
- Additional arguments: motif (*str*), overlapping (*bool*) | ||
``` | ||
""" | ||
Search for motifs - conserved amino acid residues in protein sequences | ||
|
||
Search for one motif at a time\n | ||
Search is letter case sensitive\n | ||
Use the one-letter amino acid code for sequences and motifs\n | ||
Positions of AA in sequences are counted from 0\n | ||
By default, overlapping matches are counted | ||
|
||
Arguments: | ||
- sequences (tuple[str] or list[str]): sequences to check for given motif within\n | ||
Example: sequences = ["AMGAGW", "GAWSGRAGA"] | ||
- motif (str): desired motif to check for in every given sequence\n | ||
Example: motif = "GA" | ||
- overlapping (bool): count (True) or skip (False) overlapping matches. (Optional)\n | ||
Example: overlapping = False | ||
Return: | ||
- dictionary: sequences (str) as keys, starting positions of the motif (list) as values\n | ||
Example: {"AMGAGW": [2], "GAWSGRAGA": [0, 7]} | ||
""" | ||
``` | ||
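The difference between overlapping and non-overlapping matches can be illustrated with a short sketch built on `str.find`. The helper name `find_motif_positions` is hypothetical; the real `search_for_motifs` may be implemented differently and also prints a report for each sequence.

```python
def find_motif_positions(sequence: str, motif: str, overlapping: bool = True) -> list[int]:
    """Hypothetical helper: 0-based start positions of motif in sequence (case sensitive)."""
    if not motif:
        raise ValueError("Please provide desired motif")
    positions = []
    # After each hit, move one residue forward (overlapping)
    # or past the whole motif (non-overlapping).
    step = 1 if overlapping else len(motif)
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + step)
    return positions


sequences = ["AMGAGW", "GAWSGRAGA"]
print({sequence: find_motif_positions(sequence, "GA") for sequence in sequences})
print(find_motif_positions("AAAA", "AA", overlapping=True))
print(find_motif_positions("AAAA", "AA", overlapping=False))
```

The two settings only differ when a motif can overlap with itself, as "AA" does in "AAAA".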
`search_for_alt_frames` | ||
|
||
 | ||
|
||
- The main aim - to look for alternative frames that start with methionine or other non-canonical start amino acids | ||
- Ignores the last three amino acids due to the insignificance of alternative frames of this length | ||
- An additional argument: alt_start_aa (*str*) | ||
- Use alt_start_aa **only for non-canonical start amino acids** | ||
- Without alt_start_aa the procedure finds alternative frames that start with methionine | ||
``` | ||
""" | ||
Search for alternative frames in protein sequences | ||
|
||
Search is not letter case sensitive\n | ||
Without an alt_start_aa argument, search for frames that start with methionine ("M") | ||
To search for frames with an alternative start codon, add the alt_start_aa argument\n | ||
In the alt_start_aa argument, use the one-letter code | ||
|
||
The function ignores the last three amino acids in sequences | ||
|
||
Arguments: | ||
- sequences (tuple[str] or list[str]): sequences to check | ||
- alt_start_aa (str): the one-letter code of the amino acid encoded by an alternative start codon (Optional)\n | ||
Example: alt_start_aa = "I" | ||
|
||
Return: | ||
- dictionary: sequences (str) as keys, collections of alternative frames (list) as values | ||
""" | ||
``` | ||
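The rules above (case-insensitive search, the first residue is the canonical start, the last three residues are ignored) can be sketched as follows. The helper name `find_alt_frames` is hypothetical, and the exact boundaries are an assumption chosen to reproduce the README examples.

```python
def find_alt_frames(sequence: str, alt_start_aa: str = "M") -> list[str]:
    """Hypothetical helper: frames starting at an internal start residue."""
    if len(alt_start_aa) != 1:
        raise ValueError("Invalid start AA")
    frames = []
    # Skip position 0 (the original start) and the last three residues.
    for position in range(1, len(sequence) - 3):
        if sequence[position].upper() == alt_start_aa.upper():
            frames.append(sequence[position:])
    return frames


print(find_alt_frames("mNYQTMSPYYDMId"))
print(find_alt_frames("mNYTQTSP", alt_start_aa="T"))
```

In the first call the methionine at position 11 is not reported because it lies within the last three residues.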
`convert_to_nucl_acids` | ||
|
||
 | ||
|
||
- The main aim - to convert protein sequences to DNA, RNA or both nucleic acid sequences | ||
- The program uses the most frequent codons in human, which can be found [here](https://www.genscript.com/tools/codon-frequency-table) | ||
- An additional argument: nucl_acids (*str*) | ||
- Use only DNA, RNA or both as nucl_acids (for more details, check *Examples*) | ||
``` | ||
""" | ||
Convert protein sequences to RNA or DNA sequences. | ||
|
||
Use the most frequent codons in human. The source - https://www.genscript.com/tools/codon-frequency-table\n | ||
All nucleic acids (DNA and RNA) are shown in 5'-3' direction | ||
|
||
Arguments: | ||
- sequences (tuple[str] or list[str]): sequences to convert | ||
- nucl_acids (str): the nucleic acid that is preferred\n | ||
Example: nucl_acids = "RNA" - convert to RNA\n | ||
nucl_acids = "DNA" - convert to DNA\n | ||
nucl_acids = "both" - convert to RNA and DNA | ||
Return: | ||
- dictionary: nucleic acids (str) as keys, collection of sequences (list) as values | ||
""" | ||
``` | ||
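The RNA part of this conversion amounts to a codon lookup per residue. The sketch below covers only that step and uses the `TRANSLATION_RULE` table from this pull request; the helper name `to_rna` is hypothetical, upper-casing the input is an assumption, and the rule used to derive DNA is not reproduced because the README does not state it.

```python
# One codon per amino acid, mirroring TRANSLATION_RULE in this pull request.
TRANSLATION_RULE = {
    "F": "UUU", "L": "CUG", "I": "AUU", "M": "AUG", "V": "GUG",
    "P": "CCG", "T": "ACC", "A": "GCG", "Y": "UAU", "H": "CAU",
    "Q": "CAG", "N": "AAC", "K": "AAA", "D": "GAU", "E": "GAA",
    "C": "UGC", "W": "UGG", "R": "CGU", "S": "AGC", "G": "GGC",
}


def to_rna(sequence: str) -> str:
    """Hypothetical helper: back-translate a protein sequence into RNA."""
    # Assumption: letter case is ignored when looking up codons.
    return "".join(TRANSLATION_RULE[residue] for residue in sequence.upper())


print({"RNA": [to_rna(sequence) for sequence in ["MNY"]]})
```

Each residue contributes exactly one codon, so the RNA is always three times as long as the protein sequence.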
|
||
## Examples | ||
```python | ||
# three_one_letter_code | ||
run_protein_tools(['met-Asn-Tyr', 'Ile-Ala-Ala'], procedure='three_one_letter_code') # ['mNY', 'IAA'] | ||
run_protein_tools(['mNY','IAA'], procedure='three_one_letter_code') # ['met-Asn-Tyr', 'Ile-Ala-Ala'] | ||
|
||
|
||
# define_molecular_weight | ||
run_protein_tools(['MNY','IAA'], procedure='define_molecular_weight') # {'MNY': 426.52, 'IAA': 273.35} | ||
|
||
|
||
# search_for_motifs | ||
run_protein_tools(['mNY','IAA'], procedure='search_for_motifs', motif='NY') | ||
#Sequence: mNY | ||
#Motif: NY | ||
#Motif is present in protein sequence starting at positions: 1 | ||
|
||
#Sequence: IAA | ||
#Motif: NY | ||
#Motif is not present in protein sequence | ||
|
||
{'mNY': [1], 'IAA': []} | ||
|
||
|
||
# search_for_alt_frames | ||
run_protein_tools(['mNYQTMSPYYDMId'], procedure='search_for_alt_frames') # {'mNYQTMSPYYDMId': ['MSPYYDMId']} | ||
run_protein_tools(['mNYTQTSP'], procedure='search_for_alt_frames', alt_start_aa='T') # {'mNYTQTSP': ['TQTSP']} | ||
|
||
|
||
# convert_to_nucl_acids | ||
run_protein_tools(['MNY'], procedure='convert_to_nucl_acids', nucl_acids = 'RNA') # {'RNA': ['AUGAACUAU']} | ||
run_protein_tools(['MNY'], procedure='convert_to_nucl_acids', nucl_acids = 'DNA') # {'DNA': ['TACTTGATA']} | ||
run_protein_tools(['MNY'], procedure='convert_to_nucl_acids', nucl_acids = 'both') # {'RNA': ['AUGAACUAU'], 'DNA': ['TACTTGATA']} | ||
|
||
``` | ||
|
||
## Troubleshooting | ||
|
||
| Type of the problem | Probable cause | ||
| ------------------------------------------------------------ |-------------------- | ||
| Output does not correspond to the expected results | The name of the procedure is wrong. You see the results of another procedure | ||
| ValueError: No sequences provided | A list of sequences was not provided | ||
| ValueError: Wrong procedure | The procedure does not exist in this program | ||
| TypeError: takes from 0 to 1 positional arguments but n were given | Sequences are not collected into a list | ||
| ValueError: Invalid sequence given | The sequences do not correspond to the standard amino acid code | ||
| ValueError: Please provide desired motif | The additional argument *motif* is missing in `search_for_motifs` | ||
| ValueError: Invalid start AA | There is more than one letter in the additional argument *alt_start_aa* in `search_for_alt_frames` | ||
| ValueError: Please provide desired type of nucl_acids | The additional argument *nucl_acids* is missing in `convert_to_nucl_acids` | ||
| ValueError: Invalid nucl_acids argument | The additional argument in `convert_to_nucl_acids` is written incorrectly | ||
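To show where the first errors in the table could come from, here is a minimal sketch of an input-checking dispatcher. It is not the real `run_protein_tools`: the names `run_tools_sketch`, `PROCEDURES` and `sequence_lengths` are hypothetical, the order of the checks is an assumption, and only the error messages are taken from the table.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def sequence_lengths(sequences):
    """Stand-in procedure used only for this illustration."""
    return {sequence: len(sequence) for sequence in sequences}


# Hypothetical registry mapping procedure names to functions.
PROCEDURES = {"sequence_lengths": sequence_lengths}


def run_tools_sketch(sequences=(), procedure=None, **kwargs):
    """Validate the input, then call the requested procedure."""
    if not sequences:
        raise ValueError("No sequences provided")
    if procedure not in PROCEDURES:
        raise ValueError("Wrong procedure")
    for sequence in sequences:
        # Assumption: any letter case of the 20 standard residues is accepted.
        if not set(sequence.upper()) <= VALID_RESIDUES:
            raise ValueError("Invalid sequence given")
    return PROCEDURES[procedure](sequences, **kwargs)


print(run_tools_sketch(["MNY", "iaa"], procedure="sequence_lengths"))
```

Since `sequences` is the only positional parameter, passing several sequences without wrapping them in a list would raise the `TypeError` listed in the table.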
## Contacts | ||
Vladimir Grigoriants ([email protected]) | ||
Team leader. Bioinformatician, immunologist, MiLaboratory inc. TCR-libraries QC developer | ||
|
||
Ekaterina Shitik ([email protected]) | ||
Doctor of medicine, molecular biologist whose main interests are gene engineering, AAV vectors and CRISPR/Cas9 technologies | ||
|
||
Vlada Tuliavko ([email protected]) | ||
MiLaboratory inc. manager & designer, immunologist | ||
|
||
## Our team | ||
 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
AMINO_ACIDS = { | ||
"A": "Ala", | ||
"C": "Cys", | ||
"D": "Asp", | ||
"E": "Glu", | ||
"F": "Phe", | ||
"G": "Gly", | ||
"H": "His", | ||
"I": "Ile", | ||
"K": "Lys", | ||
"L": "Leu", | ||
"M": "Met", | ||
"N": "Asn", | ||
"P": "Pro", | ||
"Q": "Gln", | ||
"R": "Arg", | ||
"S": "Ser", | ||
"T": "Thr", | ||
"V": "Val", | ||
"W": "Trp", | ||
"Y": "Tyr", | ||
} | ||
TRANSLATION_RULE = { | ||
"F": "UUU", | ||
"L": "CUG", | ||
"I": "AUU", | ||
"M": "AUG", | ||
"V": "GUG", | ||
"P": "CCG", | ||
"T": "ACC", | ||
"A": "GCG", | ||
"Y": "UAU", | ||
"H": "CAU", | ||
"Q": "CAG", | ||
"N": "AAC", | ||
"K": "AAA", | ||
"D": "GAU", | ||
"E": "GAA", | ||
"C": "UGC", | ||
"W": "UGG", | ||
"R": "CGU", | ||
"S": "AGC", | ||
"G": "GGC", | ||
} | ||
AMINO_ACID_WEIGHTS = { | ||
"A": 89.09, | ||
"C": 121.16, | ||
"D": 133.10, | ||
"E": 147.13, | ||
"F": 165.19, | ||
"G": 75.07, | ||
"H": 155.16, | ||
"I": 131.17, | ||
"K": 146.19, | ||
"L": 131.17, | ||
"M": 149.21, | ||
"N": 132.12, | ||
"P": 115.13, | ||
"Q": 146.15, | ||
"R": 174.20, | ||
"S": 105.09, | ||
"T": 119.12, | ||
"V": 117.15, | ||
"W": 204.23, | ||
"Y": 181.19, | ||
} |
Great!