Hw4 Uzun #26

Open
wants to merge 39 commits into base: main

39 commits
b652eb8
Create code file
uzunmasha Sep 26, 2023
00a477d
Add code
uzunmasha Sep 27, 2023
f32cf2b
Add protein_length function
zhurkv Sep 29, 2023
a6a3033
Add def essential_amino_acids function
zhurkv Sep 29, 2023
5a1ae4f
Move AAmigo.py to directory
uzunmasha Sep 29, 2023
86f3bb2
Add new file AAmigo.py
uzunmasha Sep 29, 2023
c70c0b5
Add function protein_mass to AAmigo.py
icalledmyselfmoon Sep 29, 2023
b9dce5b
Add aa_substring function
uzunmasha Sep 29, 2023
010a456
Add aa_count function
uzunmasha Sep 29, 2023
92d597c
Add functions protein_mass and aa_profile
Sep 29, 2023
d6e423e
Add main function aa_tools
uzunmasha Sep 29, 2023
6333781
Merge branch 'HW4_Uzun' into HW4_Uzun
uzunmasha Sep 29, 2023
e0261ca
Merge pull request #1 from icalledmyselfmoon/HW4_Uzun
uzunmasha Sep 29, 2023
146659f
Merge branch 'HW4_Uzun' into HW4_Uzun
uzunmasha Sep 29, 2023
825684b
Merge pull request #2 from zhurkr/HW4_Uzun
uzunmasha Sep 29, 2023
e7413e7
Update AAmigo.py
uzunmasha Sep 29, 2023
96117db
Update README.md
uzunmasha Sep 29, 2023
cce5ee8
Update README.md
uzunmasha Sep 29, 2023
d7f1ef1
Update README.md
zhurkr Sep 30, 2023
a3bd061
Merge pull request #3 from uzunmasha/zhurkr-patch-1
uzunmasha Sep 30, 2023
4aca2c8
Update README.md
uzunmasha Sep 30, 2023
59c1260
Add new function names
uzunmasha Sep 30, 2023
a241a9a
Update README.md
uzunmasha Sep 30, 2023
de07438
Update README.md
zhurkr Sep 30, 2023
b6c0346
Merge pull request #4 from uzunmasha/zhurkr-patch-2
uzunmasha Sep 30, 2023
cd874f3
Update README.md
zhurkr Sep 30, 2023
cbf64db
Update AAmigo.py
zhurkr Sep 30, 2023
dfa4196
Merge pull request #5 from uzunmasha/zhurkr-patch-3
uzunmasha Sep 30, 2023
9ba4b80
Merge pull request #6 from uzunmasha/zhurkr-patch-4
uzunmasha Sep 30, 2023
965be18
Update functions to allow lower case in string as an input
icalledmyselfmoon Oct 1, 2023
ca590c1
Correct function names
uzunmasha Oct 1, 2023
1dc1402
Update README.md
uzunmasha Oct 1, 2023
01e0ab5
Update README.md
uzunmasha Oct 1, 2023
c456658
Add README to working directory
uzunmasha Oct 1, 2023
382e419
Relocate README file
uzunmasha Oct 1, 2023
3431135
Update README.md
uzunmasha Oct 1, 2023
08bc52d
Update README.md
uzunmasha Oct 1, 2023
4d8774b
Correct code quality
uzunmasha Oct 1, 2023
0947cb9
Correct incompatible types issue
uzunmasha Oct 1, 2023
181 changes: 181 additions & 0 deletions HW4_Uzun/AAmigo.py
@@ -0,0 +1,181 @@
def protein_mass(seq: str):
Member commented:

Suggested change
def protein_mass(seq: str):
def calculate_protein_mass(seq: str) -> float:

    """

    Calculate the mass (Da) of a protein based on its amino acids sequence.
Member commented:

Good that you specified the units!

    Takes a string of amino acids, returns the molecular weight in Da.
    Amino acids in the string should be indicated as one-letter symbols.

    """
    aa_seq = list(seq.upper())
Member commented:

There was no need to convert this into a list :)

    mass_dictionary = dict({'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, 'Q': 146, 'E': 147, 'Z': 147,
Member commented:

  1. There is no need to write dict here; {... : ...} is already a dictionary.
  2. Things like this are better moved to the top of the code. They are called constants and are named entirely in uppercase. We discussed this in detail at the consultation on 03.10.23; I recommend watching it.

Member commented:

It is also not customary to put a variable's type in its name, or at least not in full: dict rather than dictionary. The Pythonic way would be something like this:

Suggested change
    mass_dictionary = dict({'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, 'Q': 146, 'E': 147, 'Z': 147,
    AA_MASSES: dict = dict({'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, 'Q': 146, 'E': 147, 'Z': 147,

                            'G': 75, 'H': 155, 'I': 131, 'L': 131, 'K': 146, 'M': 149, 'F': 165, 'P': 115, 'S': 105,
                            'T': 119, 'W': 204, 'Y': 181, 'V': 117})
    mass = 0
    for amino_acid in aa_seq:
        mass += mass_dictionary[amino_acid]

    return mass
Comment on lines +13 to +17
Member commented:

Suggested change
    mass = 0
    for amino_acid in aa_seq:
        mass += mass_dictionary[amino_acid]
    return mass
    mass = 0
    for amino_acid in aa_seq:
        mass += mass_dictionary[amino_acid]
    return mass
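
A minimal sketch of how the comments on `protein_mass` could be combined: the mass table becomes a module-level constant named in uppercase, the function name starts with a verb, and the string is iterated directly instead of being converted to a list. The name `AA_MASSES` comes from the suggestion above; using `sum` with a generator and the `int` return type are assumptions made for illustration, not the authors' code.

```python
# Module-level constant: masses (Da) copied from the table in this diff.
AA_MASSES = {
    'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, 'Q': 146, 'E': 147,
    'Z': 147, 'G': 75, 'H': 155, 'I': 131, 'L': 131, 'K': 146, 'M': 149,
    'F': 165, 'P': 115, 'S': 105, 'T': 119, 'W': 204, 'Y': 181, 'V': 117,
}


def calculate_protein_mass(seq: str) -> int:
    """Return the summed mass (Da) of the one-letter amino acids in seq."""
    # A string is already iterable, so no list() conversion is needed.
    return sum(AA_MASSES[amino_acid] for amino_acid in seq.upper())
```

With this table, `calculate_protein_mass('MARY')` gives 593, matching the README example. A character missing from the table raises `KeyError`, so input validation would still be needed upstream.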



def amino_acid_profile(seq: str):
Member commented:

To make it a verb, it would be more like this:

Suggested change
def amino_acid_profile(seq: str):
def profile_amino_acid(seq: str):

Though I am not sure how well the word profile fits here in terms of English. If it is OK, then it is OK. Otherwise I would have written something with info or stats.

    """

    Displays the proportion of hydrophobic, polar, negatively and positively charged amino acids in the protein.
Member commented:

It is not really "Displays" here, though: the function returns the result rather than printing it.

    Takes a string of amino acids, returns a dictionary.
    Amino acids in the string should be indicated as one-letter symbols.

    """
    aa_seq = list(seq.upper())
    aa_biochemistry = dict(
        {'hydrophobic': ['G', 'A', 'V', 'L', 'I', 'P', 'F', 'M', 'W'], 'polar': ['S', 'T', 'C', 'N', 'Q', 'Y'],
         '- charged': ['E', 'D'], '+ charged': ['K', 'H', 'R']})
    profile = dict({'hydrophobic': 0.0, 'polar': 0.0, '- charged': 0.0, '+ charged': 0.0})
Comment on lines +29 to +32
Member commented:

These would also be good to move into constants.


    for amino_acid in aa_seq:
        for group_name, group_list in aa_biochemistry.items():
            if amino_acid in group_list:
                profile[group_name] += 1

    for group, count in profile.items():
        profile[group] = round((count/len(seq)), 2)
    return profile
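
One possible shape for this function once the groups are moved into a constant, shown as a sketch only. The names `AA_GROUPS` and `get_amino_acid_stats` are hypothetical (the reviewer only hinted at "info" or "stats"); the group membership is copied from the diff above.

```python
# Hypothetical constant name; group membership is copied from the diff above.
AA_GROUPS = {
    'hydrophobic': 'GAVLIPFMW',
    'polar': 'STCNQY',
    '- charged': 'ED',
    '+ charged': 'KHR',
}


def get_amino_acid_stats(seq: str) -> dict:
    """Return the fraction of each biochemical group in seq, rounded to 2 places."""
    seq = seq.upper()
    return {
        # Count every member of the group, then divide by the sequence length.
        group: round(sum(seq.count(aa) for aa in members) / len(seq), 2)
        for group, members in AA_GROUPS.items()
    }
```

Like the original, this sketch divides by `len(seq)`, so an empty string raises `ZeroDivisionError`.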


def amino_acid_substring(seq: str):
Member commented:

The name does not make it clear at all what is going on here :)

    """

    Searches for a substring of amino acids in the entire amino acid sequence.
    Takes a string of amino acids and a substring, which should be found.
Member commented:

Judging by the variable names and the type annotation, your function takes exactly one string.

    Returns the position where the searched one was found for the first time.
    Amino acids in the string should be indicated as one-letter symbols.

    """
    aa_seq = list(seq)
    aa_seq_upper = []
    for sequences in aa_seq:
        upper_case = sequences.upper()
        aa_seq_upper.append(upper_case)
Comment on lines +53 to +57
Member commented:

I read this as follows: seq is some string (a protein), and you iterate over each element of the protein (a letter, an amino acid) and convert it to uppercase. That does not sound very logical :)
I understand that your seq is actually many strings, but then it should be seqs.

Member commented:

By the way, I think the variable name seq usually refers to DNA.

    amino_acids = aa_seq_upper[:-1]
    substring = aa_seq_upper[-1]
Comment on lines +58 to +59
Member commented:

This is not very good. Logically you have a set of proteins and a separate substring, yet you accept them all together in one pile. It would be better to take a list of proteins and, as a separate argument, the substring (usually also called a pattern):

def find_pattern_in_protein(protein, pattern)

    results = []
    for sequences in amino_acids:
        subst = sequences.find(substring)
        results.append(subst)
    return results
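
The separation of proteins and pattern that the reviewer asks for could look roughly like the sketch below. The plural name `find_pattern_in_proteins` and the list-in, list-out signature are assumptions; the reviewer's one-line suggestion gives no body.

```python
from typing import List


def find_pattern_in_proteins(proteins: List[str], pattern: str) -> List[int]:
    """Return the 0-based index of the first match in each protein (-1 if absent)."""
    pattern = pattern.upper()
    # str.find returns -1 when the pattern does not occur.
    return [protein.upper().find(pattern) for protein in proteins]
```

For example, `find_pattern_in_proteins(['RNwDeACEQEZ', 'DFKAaaE'], 'A')` returns `[5, 3]`, the same values as the README example, without relying on argument order to tell the pattern from the proteins.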


def amino_acid_count(seq: str):
Member commented:

Suggested change
def amino_acid_count(seq: str):
def count_pattern_in_seqs(seqs: List[str], pattern: str) -> List[int]:

    """

    Finds how many times a particular sequence(s) occurs in the original one.
    Takes a string of amino acids and a substring, which should be counted.
    Returns the count of searched amino acids.
Member commented:

Not just amino acids, but specifically the substring.

    Amino acids in the string should be indicated as one-letter symbols.

    """
    aa_seq = list(seq)
    aa_seq_upper = []
    for sequences in aa_seq:
        upper_case = sequences.upper()
        aa_seq_upper.append(upper_case)
    amino_acids = aa_seq_upper[:-1]
    substring = aa_seq_upper[-1]
Comment on lines +81 to +82
Member commented:

Proteins separately, substring separately.

    results = []
    for sequences in amino_acids:
        aa_count = sequences.count(substring)
        results.append(aa_count)
    return results
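
A sketch of the suggested signature `count_pattern_in_seqs(seqs, pattern)` with a body filled in. The body is an assumption; it keeps the original's case-insensitive behavior and its use of `str.count`, which counts non-overlapping occurrences.

```python
from typing import List


def count_pattern_in_seqs(seqs: List[str], pattern: str) -> List[int]:
    """Return how many times pattern occurs in each sequence (case-insensitive)."""
    pattern = pattern.upper()
    # str.count counts non-overlapping matches and returns 0 when there are none.
    return [seq.upper().count(pattern) for seq in seqs]
```

Here `count_pattern_in_seqs(['HILAKMaF', 'GDaKFAAE'], 'A')` returns `[2, 3]`, as in the README example.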


def protein_length(*seq: str):
Member commented:

Well, it is simple of course, but the main thing is that it works correctly :)
In this case the function would be better justified if it could accept both the one-letter and the three-letter code. The three-letter code is, after all, much more familiar to people.

    """

    Calculate the length (number of amino acids) of a protein.
Member commented:

Suggested change
Calculate the length (number of amino acids) of a protein.
Calculates the length (number of amino acids) of a protein.

    Takes a string of amino acids, returns the number.
    Amino acids in the string should be indicated as one-letter symbols.

    """
    lengths = []

    for sequences in seq:
        lengths.append(len(sequences))
Comment on lines +100 to +101
Member commented:

You already have a loop over the sequences in the main function.
So the behavior turns out to be not quite correct.
[image]


    return lengths
Comment on lines +98 to +103
Member commented:

Suggested change
    lengths = []
    for sequences in seq:
        lengths.append(len(sequences))
    return lengths
    lengths = []
    for sequences in seq:
        lengths.append(len(sequences))
    return lengths
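
A sketch of a per-protein length function that leaves the loop to the main function and optionally accepts the three-letter code the reviewer mentions. The name `calculate_protein_length`, the `three_letter` flag, and the assumed input format (concatenated three-letter codes such as `'MetAlaArg'`) are all hypothetical.

```python
def calculate_protein_length(protein: str, three_letter: bool = False) -> int:
    """Return the number of amino acids in a single protein."""
    if not three_letter:
        # One-letter code: one character per amino acid.
        return len(protein)
    # Assumed format: concatenated three-letter codes, e.g. 'MetAlaArg'.
    if len(protein) % 3 != 0:
        raise ValueError(f"Length of {protein!r} is not a multiple of 3")
    return len(protein) // 3
```

Because it takes one string and returns one integer, calling it once per sequence in the main loop yields a flat list such as `[6, 8, 2]`.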



def essential_amino_acids(*seq: str):
Member commented:

Suggested change
def essential_amino_acids(*seq: str):
def count_essential_amino_acids(*seq: str):

    """

    Calculate the number of essential amino acids based on its amino acids sequence.
    Takes a string of amino acids, returns only the essential amino acids.
    Amino acids in the string should be indicated as one-letter symbols.

    """
    eaa_dictionary = ['H', 'I', 'K', 'L', 'M', 'F', 'T', 'W', 'V', 'h', 'i', 'k', 'l', 'm', 'f', 't', 'w', 'v']
Member commented:

What do you mean, eaa_dictionary? This is a list, isn't it? :)
[image]

Also, without context it is not at all clear what eaa means. Plus, putting the data type in a variable name is not always necessary.

Suggested change
eaa_dictionary = ['H', 'I', 'K', 'L', 'M', 'F', 'T', 'W', 'V', 'h', 'i', 'k', 'l', 'm', 'f', 't', 'w', 'v']
essential_amino_acids = ['H', 'I', 'K', 'L', 'M', 'F', 'T', 'W', 'V', 'h', 'i', 'k', 'l', 'm', 'f', 't', 'w', 'v']

And in the end we move this to the top of the code as a constant and name it entirely in uppercase; then it will be really neat.

    eaa_list = []

    for sequences in seq:
        eaa_seq = []
        for amino_acid in sequences:
            if amino_acid in eaa_dictionary:
                eaa_seq.append(amino_acid)
        eaa_list.append(eaa_seq)

    return eaa_list
Member commented:

But your docstring states that it "Calculate the number of essential amino acids based on its amino acids sequence."
And here you return a list :)

It would be great to output statistics: how many of each, as counts.



def aa_tools(*args):
Member commented:

Suggested change
def aa_tools(*args):
def run_aa_tools(*args):

Or something else, as long as it contains a verb.

    """

    Main function for amino acid sequences processing.
    Parameters: *args (str) - amino acid sequences and operation.
    Returns: List of results or None if non-amino acid chars found.

    """
    seq = args[:-1]
    operation = args[-1]
Comment on lines +135 to +136
Member commented:

This is actually not very good :)
We did it this way in homework 3 to learn how to work with *args. In real life it is better to do it a bit differently. Semantically, the amino acids and the operation name are completely different things, so they should be accepted as separate arguments right away. The amino acids can even be taken as a list, and the operation as a named argument:

def run_aa_tools(seqs: List[str], operation: str)

    non_aa_chars = set('BJOUXbjoux')
Member commented:

Heh, a clever solution :)

    contains_non_aa = False

    for sequence in seq:
        contains_non_aa = False
        for amino_acid in sequence:
            if amino_acid in non_aa_chars:
                contains_non_aa = True
                break
        if contains_non_aa:
            break
    if contains_non_aa:
        return None
Member commented:

It is better not to return None in this case, because with a silent return the user will never understand what happened. It would be better to fail with an error here and report that the input is invalid. Sometimes this is even moved into separate functions.
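
A sketch of the "fail with an error" approach, moved into its own function as the reviewer suggests. The function name `validate_sequences`, the constant name `NON_AA_CHARS`, and the wording of the message are hypothetical; the set of rejected letters comes from the diff.

```python
# Letters that are not one-letter amino acid codes in this tool (from the diff).
NON_AA_CHARS = set('BJOUX')


def validate_sequences(seqs) -> None:
    """Raise ValueError if any sequence contains a non-amino-acid character."""
    for seq in seqs:
        # Set intersection finds every offending letter in one step.
        invalid = sorted(set(seq.upper()) & NON_AA_CHARS)
        if invalid:
            raise ValueError(
                f"Sequence {seq!r} contains non-amino-acid characters: "
                f"{', '.join(invalid)}"
            )
```

Calling this at the top of the main function replaces the nested loops and flag variable, and the caller sees which sequence and which characters were rejected.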


    results = []

    for sequence in seq:
        if operation == "protein_mass":
            result = protein_mass(sequence)
            results.append(result)
Comment on lines +155 to +156
Member commented:

I would move this line, which you repeat in every if-elif block, to the end of the loop, after the conditions.


        elif operation == "amino_acid_profile":
            result = amino_acid_profile(sequence)
            results.append(result)

        if operation == "amino_acid_substring":
Member commented:

Why is this a new if? An elif would still be the right choice here.

            result = amino_acid_substring(seq)
            return result

        if operation == "amino_acid_count":
            result = amino_acid_count(seq)
            return result

        if operation == "protein_length":
            result = protein_length(sequence)
            results.append(result)

        if operation == "essential_amino_acids":
            result = essential_amino_acids(sequence)
            results.append(result)

    return results


aa_tools()
Member commented:

This is not needed here :)
Especially since this function does not work without arguments.
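
Putting the comments on the main function together, a sketch might take the sequences as a list and the operation as a separate argument, look the operation up in a dictionary, and append each result once at the end of the loop. The `OPERATIONS` table, the `**kwargs` pass-through, and the two small helper functions are assumptions made to keep the example self-contained; only the `run_aa_tools(seqs, operation)` signature comes from the review.

```python
from typing import Callable, Dict, List


def calculate_protein_length(seq: str) -> int:
    """Return the number of amino acids in one sequence."""
    return len(seq)


def count_pattern(seq: str, pattern: str) -> int:
    """Return how many times pattern occurs in seq (case-insensitive)."""
    return seq.upper().count(pattern.upper())


# Hypothetical dispatch table: operation name -> function taking one sequence.
OPERATIONS: Dict[str, Callable[..., object]] = {
    'protein_length': calculate_protein_length,
    'amino_acid_count': count_pattern,
}


def run_aa_tools(seqs: List[str], operation: str, **kwargs) -> list:
    """Apply the named operation to every sequence and collect the results."""
    if operation not in OPERATIONS:
        raise ValueError(f"Unknown operation: {operation!r}")
    func = OPERATIONS[operation]
    results = []
    for seq in seqs:
        result = func(seq, **kwargs)
        # Appended once here instead of inside every if/elif branch.
        results.append(result)
    return results
```

A call such as `run_aa_tools(['KKNNfF', 'KKFFRRVV', 'KK'], operation='protein_length')` returns `[6, 8, 2]`, and there is no module-level call, so importing the file has no side effects.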

92 changes: 92 additions & 0 deletions HW4_Uzun/README.md
@@ -0,0 +1,92 @@
# AAmigo
This readme describes the user-friendly program AAmigo for performing various operations with amino acid sequences.

AAmigo can perform different operations:
* Calculate the mass of a protein
* Calculate the ratio of amino acids with different polarities in a protein
* Find a particular amino acid(s) in the entire sequence
* Calculate an amino acid's occurrence in a sequence
* Calculate the length of amino acid sequence(s)
* Find essential amino acids (in humans) in a sequence(s)

## Usage
Member commented:

🔥🔥

1. Clone this repo using SSH or HTTPS:
```bash
git clone git@github.com:uzunmasha/HW4_Functions2.git
```
**or**
```bash
git clone https://github.com/uzunmasha/HW4_Functions2.git
```
2. Launch the program with the required function (listed below) in a code interpreter like Jupyter Notebook.
3. Enjoy your results!

## List of functions:
For all functions, amino acids in the sequences should be indicated as one-letter symbols. Letters can be uppercase or lowercase.

### protein_mass
This function calculates the mass (Da) of a protein based on its amino acid sequence. As input, it takes a string of amino acids and returns the molecular weight in Da.
Usage example:
```python
Comment on lines +28 to +30
Member commented:

In Markdown, as in LaTeX, you need to leave a blank line to start a new paragraph.

Suggested change
This function calculates the mass (Da) of a protein based on its amino acid sequence. As input, it takes a string of amino acids and returns the molecular weight in Da.
Usage example:
```python
This function calculates the mass (Da) of a protein based on its amino acid sequence. As input, it takes a string of amino acids and returns the molecular weight in Da.
Usage example:

aa_tools('MARY', 'protein_mass') #593 (in Da)
Member commented:

Great, that is exactly how it should be :)

```
### amino_acid_profile
This function displays the proportion of hydrophobic, polar, negatively, and positively charged amino acids in the protein. It takes a string of amino acids, and returns a dictionary with the result.
Usage example:
```python
aa_tools('EEKFG', 'amino_acid_profile') #{'hydrophobic': 0.4, 'polar': 0.0, '- charged': 0.4, '+ charged': 0.2}
```
### amino_acid_substring
This function searches for particular amino acid(s) in the entire amino acid sequence. As input, it takes one or more strings of amino acids and a substring to be found, passed as comma-separated arguments. Any number of amino acid sequences is possible. Exactly one substring can be searched for, and it must be given last. As output, the function returns the position in each original sequence where the substring was found for the first time.
Usage example:
```python
aa_tools('RNwDeACEQEZ', 'E','amino_acid_substring') #[4]
aa_tools('RNwDeACEQEZ', 'DFKAaaE','A','amino_acid_substring') #[5, 3]
```
### amino_acid_count
This function finds how many times a particular amino acid or sequence of several amino acids occurs in the original sequence. As input, it takes one or more strings of amino acids and a substring to be counted, passed as comma-separated arguments. Any number of amino acid sequences is possible. Exactly one substring can be counted, and it must be given last. As output, the function returns the number of occurrences of the substring in each sequence.
Usage example:
```python
aa_tools('GHcLfKF','f','amino_acid_count') #[2]
aa_tools('HILAKMaF', 'GDaKFAAE','A','amino_acid_count') #[2, 3]
```
### protein_length
This function analyzes an amino acid sequence and returns its length (number of amino acids). Any number of amino acid sequences is possible. All sequences should be comma-separated. As input, it takes a string or strings of amino acids; as output, it returns the length of each protein.
Usage example:
```python
aa_tools('KKNNfF', 'KKFFRRVV', 'KK', 'protein_length') #[6, 8, 2]
```
### essential_amino_acids
This function analyzes an amino acid sequence and returns a list of the essential amino acids (in humans) that are present in it.
Any number of amino acid sequences is possible. All sequences should be comma-separated. As input, it takes a string or strings of amino acids; as output, it returns the essential amino acids for each sequence.
Usage example:
```python
aa_tools('KKNNfF', 'KKFFRRVV', 'KK', 'essential_amino_acids') #[['K', 'K', 'f', 'F'], ['K', 'K', 'F', 'F', 'V', 'V'], ['K', 'K']]
```

## Troubleshooting
Member commented:

Great that this section exists. This could be developed further in the code in the form of meaningful errors.

* In function `'amino_acid_substring'` the position counting starts at 0, so don't be confused if the second element in the sequence has the output [1].
* In function `'amino_acid_substring'` output [-1] means that there is no such element in the sequence; in `'amino_acid_count'` the same case gives [0].
* In functions `'amino_acid_substring'` and `'amino_acid_count'` the error message "name '..' is not defined" means that the given argument is not quoted in the input string.

## Bibliography
Member commented:

🔥

[1] Wu G. Amino acids: metabolism, functions, and nutrition. Amino Acids. 2009 May;37(1):1-17. doi: 10.1007/s00726-009-0269-0.

## Developers and contacts
* Maria Uzun - contributed to `'amino_acid_substring'`, `'amino_acid_count'`, and `'aa_tools'` functions.
* Maria Babaeva - contributed to `'protein_mass'` and `'amino_acid_profile'` functions.
* Kristina Zhur - contributed to `'protein_length'` and `'essential_amino_acids'` functions.
* Julia the Cat - team's emotional support.
Member commented:

☺️☺️



![photo_2023-09-26_18-33-49_3](https://github.com/uzunmasha/HW4_Functions2/assets/44806106/63fdea24-5c0a-4650-8bed-181871aa540f)


In case of non-working code:

* Please blame the one who has the paws
* Report any problems directly to the GitHub issue tracker

or

* Send your feedback to [email protected]
65 changes: 0 additions & 65 deletions README.md

This file was deleted.