diff --git a/HW4_Mukhametshina/README.md b/HW4_Mukhametshina/README.md
new file mode 100644
index 0000000..5cb1b66
--- /dev/null
+++ b/HW4_Mukhametshina/README.md
@@ -0,0 +1,108 @@
+[Seemingly a useful link, but really just an attempt to insert a link_2](https://ru.wikipedia.org/wiki/%D0%90%D0%BC%D0%B8%D0%BD%D0%BE%D0%BA%D0%B8%D1%81%D0%BB%D0%BE%D1%82%D1%8B)
+> *Something useful could have been added here*
+
+
+# HW 4. Functions 2
+> *This is the repo for the fourth homework of the BI Python 2023 course*
+
+## Table of contents
+
+ * [Project description](#project-description)
+ * [Main part](#main-part)
+ * [Examples](#examples)
+ * [Contact](#contact)
+ * [Educational result](#educational-result)
+ * [See also](#see-also)
+
+## Project description
+This project was meant to be carried out in a team, but unfortunately I could not take on just one part of it: all the code and the mini-program were written by me independently. While doing this homework, I refreshed my memory of working with GitHub, as well as of the basic concepts of the Python language.
+
+![alone](https://sun9-40.userapi.com/impf/c636016/v636016166/239f1/p0AWqN3onLw.jpg?size=550x483&quality=96&sign=19b32adae4a5ac6a436a740160fed9c6&type=album)
+
+The assignment is to write a utility for working with amino acid sequences. In addition, the README.md file has to be prepared as if it were the last file I would ever write.
+
+> *I will try very hard to write a good README file, but I cannot promise it will be the best one of my life*
+
+## Main part
+The program `amino_acid_tools.py` is implemented. It contains the required `amino_acid_tools` function, as well as the other functions described below.
The `amino_acid_tools` function accepts an arbitrary number of arguments: one or more amino acid sequences (*str*) followed by the name of the procedure to perform (always the last argument, *str*; see the usage examples). The word "random" can be passed in place of a sequence to generate a random chain of amino acids. The function then performs the specified procedure on every sequence passed in. If one sequence is passed, a single result is returned. If several are passed, a list of results is returned.
+
+**List of procedures:**
+- `long_amino_code` (*str* or *list*): the sequence translated from one-letter to three-letter code
+- `molecular_weight` (*float* or *list*): the molecular weight of the amino acid sequence, or a list of weights
+- `amino_to_rna` (*str* or *list*): a possible\* RNA sequence
+- `amino_seq_charge` (*str* or *list*): "positive", "negative" or "neutral"
+
+\* For more information, see the [See also](#see-also) section
+
+## Examples
+Below are examples of processing amino acid sequences.
+
+### Using the function to translate a sequence from one-letter to three-letter code
+
+```python
+amino_acid_tools('PLfHnfPdD', 'YsGPFEEt', 'ogknHIPTu', 'long_amino_code')
+```
+Input: 'PLfHnfPdD', 'YsGPFEEt', 'ogknHIPTu', 'long_amino_code'
+Output: ['ProLeuPheHisAsnPheProAspAsp', 'TyrSerGlyProPheGluGluThr', 'PylGlyLysAsnHisIleProThrSec']
+
+### Using the function for molecular weight calculation
+
+```python
+amino_acid_tools('fHnfPdPL','CpUPQWhmrY','random', 'CpUPQWhmrY','molecular_weight')
+```
+
+Input: 'fHnfPdPL','CpUPQWhmrY','random', 'CpUPQWhmrY','molecular_weight'
+Input (desired length, entered at the prompt): 9
+Output: [968.14, 1367.39, ('random sequence', 'FySiDfGym', 1124.43), 1367.39]
+
+### Using the function to convert to a possible RNA sequence
+
+```python
+amino_acid_tools('DwhAntMcR', 'cvdrLepaW', 'VurgdOhio', 'amino_to_rna')
+```
+
+Input: 'DwhAntMcR', 'cvdrLepaW', 'VurgdOhio', 'amino_to_rna'
+Output: Unknown amino acid code: u
+Unknown amino acid code: O
+Unknown amino acid code: o
+['GAUuggcauGCGaacacaAUGugcCGU', 'ugugucgaccggCUAgagccggcgUGG', 'GUUcgaggcgaucacauc']
+
+
+### Using the function to estimate relative charge
+
+```python
+amino_acid_tools('random', 'cvdrLepaW', 'VurgdOhio', 'amino_seq_charge')
+```
+
+Input: 'random', 'cvdrLepaW', 'VurgdOhio', 'amino_seq_charge'
+Output: [('random sequence', 'UgMMFsGed', 'negative'), 'negative', 'positive']
+
+**One more usage example**
+```python
+# One call per procedure, with the returned value shown in the trailing comment:
+amino_acid_tools('PLfHnfPdD','long_amino_code') # 'ProLeuPheHisAsnPheProAspAsp'
+amino_acid_tools('random', 'CpUPQWhmrY','molecular_weight') # [('random sequence', 'FySiDfGym', 1124.43), 1367.39]
+amino_acid_tools('cvdrLepaW', 'amino_to_rna') # 'ugugucgaccggCUAgagccggcgUGG'
+amino_acid_tools('cvdrLepaW', 'VurgdOhio', 'amino_seq_charge') # ['negative', 'positive']
+```
+
+
+### **Educational result**
+
+This task ~~made me realize that teamwork saves a great deal of
time and earns more points for a project submitted on time~~ helped me better understand the Git system in practice, practice writing my own bioinformatics functions, and better understand such things as "responsibility" and "team"
+
+## See also
+[![Seemingly a useful link, but really just an attempt to insert a link](https://fb.ru/misc/i/gallery/48868/1777289.jpg)](https://fb.ru/article/314147/vyirojdennost-geneticheskogo-koda-obschie-svedeniya?ysclid=lnm3d0r35691821607)
+
+> *Click the image for more information*
+
+## Contact
+- [Mukhametshina Regina](mailto:1709mrd@gmail.com)
+
+
+![this is the screenshot with the team that can earn extra points](https://steamuserimages-a.akamaihd.net/ugc/1997942891875467390/4049C3EF5003271E1F619B28EC4CBD1FBEC1A275/)
+
+> *Please give an extra point for this would-be screenshot with the team*
+
+Thank you! ✨✨
diff --git a/HW4_Mukhametshina/amino_acid_tools.py b/HW4_Mukhametshina/amino_acid_tools.py
new file mode 100644
index 0000000..16546ec
--- /dev/null
+++ b/HW4_Mukhametshina/amino_acid_tools.py
@@ -0,0 +1,241 @@
+amino_acid = 'ARNDCEQGHILKMFPSTWYVUOarndceqghilkmfpstwyvuo'
+short_code = list(amino_acid)
+long_code = ['Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Glu', 'Gln', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro',
+             'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'Sec', 'Pyl',
+             'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Glu', 'Gln', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro',
+             'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'Sec', 'Pyl']
+codon_table = {
+    'A': ['GCU', 'GCC', 'GCA', 'GCG'],
+    'R': ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
+    'N': ['AAU', 'AAC'],
+    'D': ['GAU', 'GAC'],
+    'C': ['UGU', 'UGC'],
+    'Q': ['CAA', 'CAG'],
+    'E': ['GAA', 'GAG'],
+    'G': ['GGU', 'GGC', 'GGA', 'GGG'],
+    'H': ['CAU', 'CAC'],
+    'I': ['AUU', 'AUC', 'AUA'],
+    'L': ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
+    'K': ['AAA', 'AAG'],
+    'M': ['AUG'],
+    'F': ['UUU', 'UUC'],
+    'P': ['CCU', 'CCC', 'CCA', 'CCG'],
+    'S': ['UCU',
'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
+    'T': ['ACU', 'ACC', 'ACA', 'ACG'],
+    'W': ['UGG'],
+    'Y': ['UAU', 'UAC'],
+    'V': ['GUU', 'GUC', 'GUA', 'GUG'],
+    'STOP': ['UAA', 'UAG', 'UGA'],
+    'f': ['uuu', 'uuc'],
+    'l': ['uua', 'uug', 'cuu', 'cuc', 'cua', 'cug'],
+    's': ['ucu', 'ucc', 'uca', 'ucg', 'agu', 'agc'],
+    'y': ['uau', 'uac'],
+    'c': ['ugu', 'ugc'],
+    'w': ['ugg'],
+    'p': ['ccu', 'ccc', 'cca', 'ccg'],
+    'h': ['cau', 'cac'],
+    'q': ['caa', 'cag'],
+    'r': ['cgu', 'cgc', 'cga', 'cgg', 'aga', 'agg'],
+    'i': ['auu', 'auc', 'aua'],
+    'm': ['aug'],
+    't': ['acu', 'acc', 'aca', 'acg'],
+    'n': ['aau', 'aac'],
+    'k': ['aaa', 'aag'],
+    'v': ['guu', 'guc', 'gua', 'gug'],
+    'a': ['gcu', 'gcc', 'gca', 'gcg'],
+    'd': ['gau', 'gac'],
+    'e': ['gaa', 'gag'],
+    'g': ['ggu', 'ggc', 'gga', 'ggg'],
+    'stop': ['uaa', 'uag', 'uga']
+}
+weight_amino = [71.08, 156.2, 114.1, 115.1, 103.1, 129.1, 128.1, 57.05, 137.1, 113.2, 113.2, 128.2, 131.2, 147.2, 97.12, 87.08,
+                101.1, 186.2, 163.2, 99.13, 168.05, 255.3,
+                71.08, 156.2, 114.1, 115.1, 103.1, 129.1, 128.1, 57.05, 137.1, 113.2, 113.2, 128.2, 131.2, 147.2, 97.12, 87.08,
+                101.1, 186.2, 163.2, 99.13, 168.05, 255.3]
+
+import random
+
+def long_amino_code(sequence):
+    """
+    Function translates a given sequence of one-letter amino acid codes
+    into a more readable sequence of three-letter amino acid codes
+
+    Parameters:
+        sequence (str): each letter is a one-letter code of a proteinogenic amino acid, or "random"
+    Returns:
+        (str) the sequence translated into three-letter code
+    """
+    if sequence != 'random':
+        d_names = dict(zip(short_code, long_code))
+        recording = sequence.maketrans(d_names)
+        return sequence.translate(recording)
+    else:
+        length = int(input("Enter the desired length: "))
+        bases = list(amino_acid)
+        amino_sequencqe = ''.join(random.choice(bases) for _ in range(length))
+        d_names = dict(zip(short_code, long_code))
+        recording = amino_sequencqe.maketrans(d_names)
+        return "random sequence", amino_sequencqe,
amino_sequencqe.translate(recording)
+
+def molecular_weight(sequence):
+    """
+    Function calculates molecular weight of the amino acid chain
+    Parameters:
+        sequence (str): each letter is a one-letter code of a proteinogenic amino acid, or "random"
+    Returns:
+        weight (float): molecular weight of the given amino acid chain in Da
+    """
+    if sequence != 'random':
+        molecular_weights = dict(zip(short_code, weight_amino))
+        weight = sum(molecular_weights.get(aa, 0) for aa in sequence)
+        return weight
+    else:
+        length = int(input("Enter the desired length: "))
+        bases = list(amino_acid)
+        amino_sequencqe = ''.join(random.choice(bases) for _ in range(length))
+        molecular_weights = dict(zip(short_code, weight_amino))
+        weight = sum(molecular_weights.get(aa, 0) for aa in amino_sequencqe)
+        return "random sequence", amino_sequencqe, weight
+
+def amino_to_rna(amino_sequence):
+    """
+    Function translates an amino acid sequence into a possible RNA sequence
+    Parameters:
+        amino_sequence (str): amino acid sequence or "random"
+    Returns:
+        (str) possible RNA sequence
+    """
+    if amino_sequence != 'random':
+        rna_sequence = ""
+
+        for aminoacid in amino_sequence:
+            if aminoacid in codon_table:
+                codons = codon_table[aminoacid]
+                # Selecting one random codon
+                codon = random.choice(codons)
+                rna_sequence += codon
+            else:
+                print("Unknown amino acid code:", aminoacid)
+
+        return rna_sequence
+    else:
+        length = int(input("Enter the desired length: "))
+        bases = list(amino_acid)
+        amino_sequencqe = ''.join(random.choice(bases) for _ in range(length))
+        rna_sequence = ""
+
+        for aminoacid in amino_sequencqe:
+            if aminoacid in codon_table:
+                codons = codon_table[aminoacid]
+                # Selecting one random codon
+                codon = random.choice(codons)
+                rna_sequence += codon
+            else:
+                print("Unknown amino acid code:", aminoacid)
+        return "random sequence", amino_sequencqe, rna_sequence
+
+
+def amino_seq_charge(amino_sequence):
+    """
+    Function evaluates the overall charge of the amino acid chain in neutral
aqueous solution (pH = 7)
+    Parameters:
+        amino_sequence (str): amino acid sequence of proteinogenic amino acids or "random"
+    Returns:
+        (str): "positive", "negative" or "neutral"
+    """
+    if amino_sequence != 'random':
+        aminoacid_charge = {'R': 1, 'D': -1, 'E': -1, 'K': 1, 'O': 1}
+        charge = 0
+        for aminoacid in amino_sequence.upper():
+            if aminoacid in 'RDEKO':
+                charge += aminoacid_charge[aminoacid]
+        if charge > 0:
+            return 'positive'
+        elif charge < 0:
+            return 'negative'
+        else:
+            return 'neutral'
+    else:
+        length = int(input("Enter the desired length: "))
+        bases = list(amino_acid)
+        amino_sequencqe = ''.join(random.choice(bases) for _ in range(length))
+        aminoacid_charge = {'R': 1, 'D': -1, 'E': -1, 'K': 1, 'O': 1}
+        charge = 0
+        for aminoacid in amino_sequencqe.upper():
+            if aminoacid in 'RDEKO':
+                charge += aminoacid_charge[aminoacid]
+        if charge > 0:
+            return "random sequence", amino_sequencqe, 'positive'
+        elif charge < 0:
+            return "random sequence", amino_sequencqe, 'negative'
+        else:
+            return "random sequence", amino_sequencqe, 'neutral'
+
+def amino_seqs(amino_sequence):
+    """
+    Keeps only the valid amino acid sequences from those passed to the function.
+    Parameters:
+        amino_sequence (list): amino acid sequence list or "random"
+    Returns:
+        amino_seqs (list): amino acid sequence list without the non-amino-acid sequences
+    """
+    if amino_sequence != 'random':
+        aminoac_seqs = []
+        for seq in amino_sequence:
+            unique_chars = set(seq)
+            amino_acids = set(amino_acid)
+            if unique_chars <= amino_acids:
+                aminoac_seqs.append(seq)
+        return aminoac_seqs
+    else:
+        length = int(input("Enter the desired length: "))
+        bases = list(amino_acid)
+        amino_sequencqe = ''.join(random.choice(bases) for _ in range(length))
+        aminoac_seqs = list(amino_sequencqe)
+        return "random sequence", amino_sequencqe, aminoac_seqs
+
+def amino_acid_tools(*args: str):
+    """
+    Performs functions for working with amino acid sequences.
+
+    Parameters:
+        The function accepts an unlimited number of protein sequences (str) as input,
+        the last argument is the name of the function (str) that you want to execute.
+        The amino acid sequence can consist of both uppercase and lowercase letters.
+    Input example:
+        amino_acid_tools('LVElkPL','CpUPQWhmrY','McgMmLcTTG','molecular_weight')
+
+        or
+
+        amino_acid_tools('LVElkPL','CpUPQWhmrY','random','molecular_weight')
+
+
+    Function:
+        molecular_weight: calculates the molecular weight of an amino acid chain
+        long_amino_code: converts a sequence from one-letter code
+        to three-letter code
+        amino_to_rna: translates a sequence of amino acids into a possible RNA sequence
+        amino_seq_charge: estimates the total charge of the amino acid chain in a neutral aqueous solution (pH = 7)
+
+    Returns:
+        If one sequence is supplied, a single result is returned.
+        If several are supplied, a list of results is returned.
+        Depending on the function performed, the following is returned:
+            long_amino_code (str) or (list): sequence translated from one-letter to three-letter code
+            molecular_weight (float) or (list): amino acid sequence molecular weight or list of weights
+            amino_to_rna (str) or (list): possible RNA sequence
+            amino_seq_charge (str) or (list): "positive", "negative" or "neutral"
+    """
+    *seqs, function = args
+    d_of_functions = {'long_amino_code': long_amino_code,
+                      'molecular_weight': molecular_weight,
+                      'amino_to_rna': amino_to_rna,
+                      'amino_seq_charge': amino_seq_charge}
+    answer = []
+    aminoacid_seqs = amino_seqs(seqs)
+    for sequence in aminoacid_seqs:
+        answer.append(d_of_functions[function](sequence))
+    if len(answer) == 1:
+        return answer[0]
+    else:
+        return answer
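
Note on the dispatch pattern used by `amino_acid_tools` in this diff: the "last argument names the procedure" idea and the repeated random-sequence branch could be sketched more compactly as below. This is only an illustrative sketch, not the author's code. The names `AMINO_ALPHABET`, `random_amino_sequence`, `net_charge_label`, `run_procedure` and `PROCEDURES` are hypothetical, the random length is passed as an argument instead of being read with `input()`, and an unknown procedure name raises `ValueError` (an assumption, since the diff does not handle that case).

```python
import random

# Hypothetical constant mirroring the `amino_acid` alphabet from the diff.
AMINO_ALPHABET = 'ARNDCEQGHILKMFPSTWYVUOarndceqghilkmfpstwyvuo'


def random_amino_sequence(length, rng=random):
    """Return a random amino acid string of the requested length."""
    return ''.join(rng.choice(AMINO_ALPHABET) for _ in range(length))


def net_charge_label(sequence):
    """Classify the net charge using the same residue charges as the diff."""
    charges = {'R': 1, 'K': 1, 'O': 1, 'D': -1, 'E': -1}
    total = sum(charges.get(residue, 0) for residue in sequence.upper())
    if total > 0:
        return 'positive'
    if total < 0:
        return 'negative'
    return 'neutral'


def run_procedure(*args, procedures):
    """Apply the procedure named by the last argument to every sequence."""
    *sequences, name = args
    if name not in procedures:
        # Assumption: fail loudly on an unknown procedure name.
        raise ValueError(f'Unknown procedure: {name!r}')
    results = [procedures[name](sequence) for sequence in sequences]
    # One sequence gives a single result, several give a list.
    return results[0] if len(results) == 1 else results


# Hypothetical registry with a single procedure, for illustration only.
PROCEDURES = {'amino_seq_charge': net_charge_label}

print(run_procedure('cvdrLepaW', 'VurgdOhio', 'amino_seq_charge',
                    procedures=PROCEDURES))  # ['negative', 'positive']
print(len(random_amino_sequence(9)))  # 9
```

Keeping the random generator in one helper avoids repeating the same branch in every procedure, and passing the length explicitly keeps the functions usable without an interactive prompt.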