Hw4 chesnokova #25

Open
wants to merge 10 commits into base: main
76 changes: 76 additions & 0 deletions HW4_Chesnokova/README.md
@@ -0,0 +1,76 @@

# Protein sequence utility
This tool is designed to work with amino acid sequences consisting of the _22 proteinogenic amino acid_ residues (including pyrrolysine and selenocysteine) written in the standard one-letter format. It is not intended to process sequences with post-translational or other amino acid modifications.

## Usage
Call the `amino_acid_tools` function with any number of amino acid sequences (str) followed by the name of the procedure to run (str, always the last argument). The function applies that procedure to every sequence. If one sequence is passed, its result is returned on its own; if several are passed, a list of results is returned.
Input sequences can contain both uppercase and lowercase letters, but the last argument must exactly match one of the option names listed below.

### Remark
- if a sequence contains characters outside the one-letter amino acid code, it is dropped, and the returned list holds results only for the remaining sequences
- the fewer amino acids a sequence contains, the less reliable the `folding` function is
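
The filtering described in the first remark can be sketched as a set-subset check. This is a minimal illustration rather than the project's code: `VALID_RESIDUES` and `keep_valid` are hypothetical names, and the alphabet is assumed to be the 22 one-letter codes in both cases.

```python
# Hypothetical names for illustration; the alphabet covers the 22 one-letter
# codes (including U and O) in upper and lower case.
VALID_RESIDUES = set("ARNDCEQGHILKMFPSTWYVUO" + "arndceqghilkmfpstwyvuo")


def keep_valid(seqs):
    """Return only the sequences made up entirely of valid residue letters."""
    return [seq for seq in seqs if set(seq) <= VALID_RESIDUES]


print(keep_valid(["EGVIMSELKLK", "PLPK1vel", "DVIG ISIL"]))
# ['EGVIMSELKLK']
```

Because invalid sequences are dropped silently, the positions in the result may no longer line up with the positions of the inputs.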

## Options
The following options for amino acid sequence processing are currently available:

- **molecular_weight**: calculates the molecular weight of the amino acid chain in Da as the sum of average residue masses, which are tabulated to 1 or 2 decimal places.
- **three_letter_code**: converts standard one-letter codes to three-letter codes.
- **show_length**: counts the total number of amino acids in the given sequence.
- **folding**: counts the residues characteristic of alpha helices and of beta sheets and reports which secondary structure predominates. This function has been tested on proteins such as 2M3X and 6DT4 (PDB IDs), MHC and CRP, and the results agreed with the known structures.
- **seq_charge**: evaluates the overall charge of the amino acid chain in neutral aqueous solution (pH = 7) from the pKa of the amino acid side chains: lysine, pyrrolysine and arginine contribute +1, while aspartic and glutamic acids contribute -1. The sum of these contributions is reported as positive, negative, or neutral.

## Examples
Below are examples of processing amino acid sequences with each option.

### Using the function for molecular weight calculation

```python
amino_acid_tools('EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'molecular_weight')
```

Input: 'EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'molecular_weight'
Output: '[1228.66, 1447.8400000000001, 1224.6399999999999]'
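
The long tails in this output (for example `1447.8400000000001`) are ordinary binary floating-point artifacts from summing the tabulated masses. If tidier values are wanted, they can be rounded afterwards. The snippet below is a sketch of a possible post-processing step on the caller's side, not something the tool does itself.

```python
# Values copied from the output above; rounding is applied only for display.
raw_weights = [1228.66, 1447.8400000000001, 1224.6399999999999]
rounded = [round(weight, 2) for weight in raw_weights]
print(rounded)  # [1228.66, 1447.84, 1224.64]
```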

### Using the function to convert one-letter translations to three-letter translations

```python
amino_acid_tools('EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'three_letter_code')
```

Input: 'EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'three_letter_code'
Output: '['GluGlyValIleMetSerGluLeuLysLeuLys', 'ProLeuProLysValGluLeuProProAspPheValAsp', 'AspValIleGlyIleSerIleLeuGlyLysGluVal']'

### Using the function to count the number of amino acids in the given sequence

```python
amino_acid_tools('EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'show_length')
```

Input: 'EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'show_length'
Output: '[11, 13, 12]'

### Using the function to determine the predominant secondary structure

```python
amino_acid_tools('EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'folding')
```
Input: 'EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'folding'
Output: '['alfa_helix', 'equally', 'equally']'

### Using the function to estimate relative charge

```python
amino_acid_tools('EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'seq_charge')
```

Input: 'EGVIMSELKLK', 'PLPKvelPPDFVD', 'DVIGISILGKEV', 'seq_charge'
Output: '['neutral', 'negative', 'negative']'



## Contacts
- [Chesnokova Anna] [email protected]
- [Lukina Maria] [email protected]

![Screenshot](https://github.com/anisssum/HW4_Functions2/blob/HW4_Chesnokova/%D0%A1%D0%BD%D0%B8%D0%BC%D0%BE%D0%BA%20%D1%8D%D0%BA%D1%80%D0%B0%D0%BD%D0%B0%202023-09-30%20223608.png)
162 changes: 162 additions & 0 deletions HW4_Chesnokova/amino_acid_tools.py
@@ -0,0 +1,162 @@
AMINO_ACIDS = 'ARNDCEQGHILKMFPSTWYVUOarndceqghilkmfpstwyvuo'
SHORT_CODE = list(AMINO_ACIDS)
LONG_CODE = ['Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Glu', 'Gln', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro',
             'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'Sec', 'Pyl',
             'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Glu', 'Gln', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro',
             'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'Sec', 'Pyl']
MASSE = [71.08, 156.2, 114.1, 115.1, 103.1, 129.1, 128.1, 57.05, 137.1, 113.2, 113.2, 128.2, 131.2, 147.2, 97.12, 87.08,
         101.1, 186.2, 163.2, 99.13, 168.05, 255.3,
         71.08, 156.2, 114.1, 115.1, 103.1, 129.1, 128.1, 57.05, 137.1, 113.2, 113.2, 128.2, 131.2, 147.2, 97.12, 87.08,
         101.1, 186.2, 163.2, 99.13, 168.05, 255.3]
Comment on lines +7 to +10

  1. `MASS` (not `MASSE`) :)
  2. It would still be better to define this as a dictionary, because right now it is just a sequence of numbers.



def molecular_weight(seq: str) -> float:
    """
    Function calculates molecular weight of the amino acid chain
    Parameters:
        seq (str): each letter refers to one-letter coded proteinogenic amino acids
    Returns:
        (float) Molecular weight of the given amino acid chain in Da
    """
    d_mass = dict(zip(SHORT_CODE, MASSE))

Review comment: This looks neat at first, but no. It is still better to store a dictionary in MASS, where the key is the amino acid and the value is its mass.

    m = 0
    for acid in seq:
        m = m + d_mass[acid]

Review comment: There is a `+=` operator for this :)

    return m
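
The review suggestions on `MASSE` and on the accumulation loop could be combined roughly as follows. This is a sketch under stated assumptions, not the author's implementation: `RESIDUE_MASS` and `molecular_weight_sketch` are hypothetical names, the values are copied from the `MASSE` list in this PR, and upper-casing the input (so that lowercase keys are not needed) is an assumption the PR itself does not make.

```python
# Hypothetical sketch: residue masses keyed by one-letter code.
# The values are copied from the MASSE list in this PR.
RESIDUE_MASS = {
    'A': 71.08, 'R': 156.2, 'N': 114.1, 'D': 115.1, 'C': 103.1,
    'E': 129.1, 'Q': 128.1, 'G': 57.05, 'H': 137.1, 'I': 113.2,
    'L': 113.2, 'K': 128.2, 'M': 131.2, 'F': 147.2, 'P': 97.12,
    'S': 87.08, 'T': 101.1, 'W': 186.2, 'Y': 163.2, 'V': 99.13,
    'U': 168.05, 'O': 255.3,
}


def molecular_weight_sketch(seq: str) -> float:
    """Sum residue masses; upper-casing makes lowercase keys unnecessary."""
    return sum(RESIDUE_MASS[acid] for acid in seq.upper())


print(round(molecular_weight_sketch('EGVIMSELKLK'), 2))  # 1228.66
```

With a dictionary the pairing between letter and mass is visible at a glance, whereas two parallel sequences only stay correct as long as their order is kept in sync by hand.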


def three_letter_code(seq: str) -> str:
    """
    Function converts one-letter codes to three-letter codes
    Parameters:
        seq (str): each letter refers to one-letter coded proteinogenic amino acids
    Returns:
        (str) sequence translated into three-letter code
    """
    d_names = dict(zip(SHORT_CODE, LONG_CODE))

Review comment: Also neat, but again a dictionary would be better :)

    recording = seq.maketrans(d_names)
    return seq.translate(recording)
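
A dictionary-based variant of the conversion, as the review comment suggests, might look like the sketch below. `THREE_LETTER` and `three_letter_code_sketch` are hypothetical names, and normalising the input with `upper()` is an assumption made here to avoid duplicating lowercase entries.

```python
# Hypothetical sketch: one-letter to three-letter codes as a literal dict.
THREE_LETTER = {
    'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys',
    'E': 'Glu', 'Q': 'Gln', 'G': 'Gly', 'H': 'His', 'I': 'Ile',
    'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe', 'P': 'Pro',
    'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val',
    'U': 'Sec', 'O': 'Pyl',
}


def three_letter_code_sketch(seq: str) -> str:
    """Translate each residue and join the three-letter codes."""
    return ''.join(THREE_LETTER[acid] for acid in seq.upper())


print(three_letter_code_sketch('PLPKvel'))  # ProLeuProLysValGluLeu
```

The same dictionary could also be passed to `str.maketrans`, which accepts a mapping from single characters to strings, so the `translate` call used in the PR would keep working.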


def show_length(seq: str) -> int:

Review comment: Why `show`? 0_0 :)

    """
    Function counts the number of amino acids in the given sequence
    Parameters:
        seq (str): amino acid sequence
    Returns:
        (int): integer number of amino acid residues
    """
    return len(seq)


def folding(seq: str) -> str:
    """
    Counts the number of amino acids characteristic separately for alpha helices and beta sheets,
    and reports which of the two structures predominates in the protein.
    This function has been tested on proteins such as 2M3X, 6DT4 (PDB ID) and MHC, CRP.
    The obtained results agreed with the known structures.
    Parameters:
        seq (str): amino acid sequence
    Returns:
        (str): predominant structure ('alfa_helix', 'beta_sheet', 'equally')
    """
    alfa_helix = ['A', 'E', 'L', 'M', 'G', 'Y', 'S', 'a', 'e', 'l', 'm', 'g', 'y', 's']
    beta_sheet = ['Y', 'F', 'W', 'T', 'V', 'I', 'y', 'f', 'w', 't', 'v', 'i']
    alfa_helix_counts = 0
    beta_sheet_counts = 0
    for amino_acid in seq:
        if amino_acid in alfa_helix:
            alfa_helix_counts += 1
        elif amino_acid in beta_sheet:
            beta_sheet_counts += 1
    if alfa_helix_counts > beta_sheet_counts:
        return 'alfa_helix'
    elif alfa_helix_counts < beta_sheet_counts:
        return 'beta_sheet'
    elif alfa_helix_counts == beta_sheet_counts:
        return 'equally'
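
One detail of the counting above is easy to miss: `Y` appears in both lists, and because of the `if`/`elif` order it is only ever counted as a helix former. The sketch below reproduces that behaviour with sets so the overlap is explicit. `HELIX_FORMERS`, `SHEET_FORMERS` and `folding_sketch` are hypothetical names, and upper-casing the input is an assumption made here for brevity.

```python
# Hypothetical sketch of the same counting rule using sets.
HELIX_FORMERS = set('AELMGYS')
SHEET_FORMERS = set('YFWTVI')


def folding_sketch(seq: str) -> str:
    """Report which class of residues is more frequent in the sequence."""
    residues = seq.upper()
    helix = sum(r in HELIX_FORMERS for r in residues)
    # Mirror the if/elif order: a residue in both sets ('Y') counts as helix only.
    sheet = sum(r in SHEET_FORMERS and r not in HELIX_FORMERS for r in residues)
    if helix > sheet:
        return 'alfa_helix'
    if helix < sheet:
        return 'beta_sheet'
    return 'equally'


print(folding_sketch('EGVIMSELKLK'))    # alfa_helix
print(folding_sketch('PLPKvelPPDFVD'))  # equally
```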


def seq_charge(seq: str) -> str:
    """
    Function evaluates the overall charge of the amino acid chain in neutral aqueous solution (pH = 7)
    Parameters:
        seq (str): amino acid sequence of proteinogenic amino acids
    Returns:
        (str): "positive", "negative" or "neutral"
    Function implemented by Anna Chesnokova
    """
    aminoacid_charge = {'R': 1, 'D': -1, 'E': -1, 'K': 1, 'O': 1, 'r': 1, 'd': -1, 'e': -1, 'k': 1, 'o': 1}
    charge = 0
    for aminoacid in seq:
        if aminoacid in 'RDEKOrdeko':

Review comment: Better to write `if aminoacid in aminoacid_charge`. Otherwise, if something is ever added to the dictionary, it also has to be added to this string.

            charge += aminoacid_charge[aminoacid]
    if charge > 0:
        return 'positive'
    elif charge < 0:
        return 'negative'
    else:
        return 'neutral'
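
The review comment's point, keeping the charged residues in one place, can be sketched with a dictionary lookup that defaults to zero. `SIDE_CHAIN_CHARGE` and `seq_charge_sketch` are hypothetical names, and upper-casing the input is an assumption made here so that only uppercase keys are needed.

```python
# Hypothetical sketch: the dictionary is the single source of truth for
# which residues carry a charge at pH 7.
SIDE_CHAIN_CHARGE = {'R': 1, 'K': 1, 'O': 1, 'D': -1, 'E': -1}


def seq_charge_sketch(seq: str) -> str:
    """Classify the net side-chain charge as positive, negative or neutral."""
    charge = sum(SIDE_CHAIN_CHARGE.get(acid, 0) for acid in seq.upper())
    if charge > 0:
        return 'positive'
    if charge < 0:
        return 'negative'
    return 'neutral'


print(seq_charge_sketch('EGVIMSELKLK'))    # neutral
print(seq_charge_sketch('PLPKvelPPDFVD'))  # negative
```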


def aminoacid_seqs_only(seqs: list) -> list:
    """
    Keeps only the valid amino acid sequences from those passed to the function.
    Parameters:
        seqs (list): amino acid sequence list
    Returns:
        aminoacid_seqs (list): amino acid sequence list without non-amino-acid sequences
    """
    aminoacid_seqs = []
    for seq in seqs:
        unique_chars = set(seq)
        amino_acid = set(AMINO_ACIDS)
        if unique_chars <= amino_acid:
            aminoacid_seqs.append(seq)
    return aminoacid_seqs


def amino_acid_tools(*args: str):
    """
    Performs functions for working with protein sequences.

    Parameters:
        The function accepts any number of protein sequences (str) as input;
        the last argument must be the name of the function (str) you want to execute.
        The amino acid sequence can consist of both uppercase and lowercase letters.
    Input example:
        amino_acid_tools('PLPKVEL','VDviRIkLQ','PPDFGKT','folding')
    Function:
        molecular_weight: calculates molecular weight of the amino acid chain
        three_letter_code: converts one-letter codes to three-letter codes
        show_length: counts the number of amino acids in the given sequence
        folding: counts the number of amino acids characteristic separately for alpha helices and beta sheets,
                 and reports which of the two structures predominates
        seq_charge: evaluates the overall charge of the amino acid chain in neutral aqueous solution (pH = 7)

    Returns:
        If one sequence is supplied, its result is returned on its own.
        If several are supplied, a list of results is returned.
        Depending on the function performed, the following returns will occur:
        molecular_weight (float) or (list): amino acid sequence molecular weight or list of weights
        three_letter_code (str) or (list): sequence translated from one-letter into three-letter code
        show_length (int) or (list): integer number of amino acid residues
        folding (str) or (list): 'alfa_helix', if alpha-helix residues predominate
                                 'beta_sheet', if beta-sheet residues predominate
                                 'equally', if alpha-helix and beta-sheet residues are equally frequent
        seq_charge (str) or (list): "positive", "negative" or "neutral"
    """
    *seqs, function = args
    d_of_functions = {'molecular_weight': molecular_weight,
                      'three_letter_code': three_letter_code,
                      'show_length': show_length,
                      'folding': folding,
                      'seq_charge': seq_charge}
    answer = []
    aminoacid_seqs = aminoacid_seqs_only(seqs)
    for sequence in aminoacid_seqs:
        answer.append(d_of_functions[function](sequence))
    if len(answer) == 1:
        return answer[0]
    else:
        return answer
Comment on lines +159 to +162

    if len(answer) == 1:
        return answer[0]
    return answer
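
For context, the dispatch pattern used by `amino_acid_tools` (unpack the trailing name, look it up in a dictionary, return a bare value for a single result) can be sketched in isolation as below. The early return follows the suggestion above. Everything else is an assumption for illustration: `run_tool` and `OPERATIONS` are hypothetical names, `len` and `str.upper` stand in for the real operations, and the explicit `ValueError` checks are an optional extension that the PR does not include.

```python
# Hypothetical stand-ins for the PR's operations, keyed by name.
OPERATIONS = {
    'show_length': len,
    'upper': str.upper,
}


def run_tool(*args: str):
    """Apply the operation named by the last argument to every sequence."""
    if len(args) < 2:
        raise ValueError('Expected at least one sequence and an operation name')
    *seqs, name = args
    if name not in OPERATIONS:
        raise ValueError(f'Unknown operation: {name!r}')
    results = [OPERATIONS[name](seq) for seq in seqs]
    if len(results) == 1:
        return results[0]
    return results


print(run_tool('PLPKVEL', 'show_length'))               # 7
print(run_tool('PLPKVEL', 'VDviRIkLQ', 'show_length'))  # [7, 9]
```

Without a check of this kind, a misspelled operation name surfaces as a bare `KeyError` from the dictionary lookup, which is harder for a caller to interpret.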

Binary file added Снимок экрана 2023-09-30 223608.png