-
Notifications
You must be signed in to change notification settings - Fork 0
Review TRAF4 #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Review TRAF4 #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хороший рабочий код 😊
Есть небольшие расхождения с условием заданий, внесла парочку исправлений и предложений.
Но в общем и целом всё гуд, молодец!)
pass | ||
|
||
|
||
class BiologicalSequence(str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В этой дз такое вроде было разрешено, но в целом лучше не наследоваться от встроенных типов данных. Подробнее об этом можно посмотреть в консультации от 28 февраля (примерно 28-я минута)
class BiologicalSequence(str): | |
class BiologicalSequence(): |
'g': 'c', 'G': 'C', | ||
'c': 'g', 'C': 'G' | ||
} | ||
if 'U' in self.sequence.upper(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Здорово, что этот биологический момент предусмотрен!
super().__init__(sequence) | ||
|
||
def complement(self): | ||
if self.complement_dict == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
С None
лучше использовать is
, а не ==
:
if self.complement_dict == None: | |
if self.complement_dict is None: |
|
||
def check_seq(self): | ||
valid_nucleotide_symbols = {'A', 'C', 'G', 'T', 'U'} | ||
valid_prot_symbols = {'A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
По PEP-8 здесь стоит поставить пробелы после запятых :) Также можно было бы немного разнести символы по строкам, это бы улучшило восприятие кода
't': 'u', 'T': 'U', | ||
'g': 'g', 'G': 'G', | ||
'c': 'c', 'C': 'C' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Замечательно, что здесь подумали о читабельности кода и вывели элементы словаря на разных строках :)
Потерялась табуляция, возвращаю:
} | |
transcription_dict = { | |
'a': 'a', 'A': 'A', | |
't': 'u', 'T': 'U', | |
'g': 'g', 'G': 'G', | |
'c': 'c', 'C': 'C' | |
} |
(по PEP-8 закрывающая скобка идёт на уровне с названием переменной)
return f'AminoAcidSequence("{self.sequence}")' | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Затесалась лишняя строка
elif seq_symbols.issubset(valid_prot_symbols): | ||
print('AA sequence') | ||
else: | ||
raise ValueError('Incorrect sequence input!') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Мне кажется, здесь тоже было бы здорово вызвать кастомную ошибку, тем более что одна из таких у вас уже определена в начале скрипта
def __init__(self, sequence): | ||
self.sequence = sequence | ||
|
||
def check_seq(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Насколько я помню, BiologicalSequence
должен был содержать только абстрактные методы, которые уже переопределялись бы в дочерних классах
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Все так
't': 'u', 'T': 'U', | ||
'g': 'g', 'G': 'G', | ||
'c': 'c', 'C': 'C' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Тут получается, что метод transcribe
создает словарь transcription_dict
каждый раз, когда вызывается метод. Чтобы этого избежать, можно было бы создать этот словарь как атрибут класса, т.е. просто поставить его перед __init__
- average phred scores, not less than specified | ||
Default output file name 'filtered.fastq' | ||
""" | ||
records = list(SeqIO.parse(input_path, 'fastq')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В подобных случаях лучше обрабатывать записи в цикле, а не загружать всё целиком (на больших файлах может не хватить памяти):
records = list(SeqIO.parse(input_path, 'fastq')) | |
for record in SeqIO.parse(input_path, "fastq"): | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В целом мне всё понравилось!
Очень хороший код, всё супер классно.
Возможно, мои отдельные комментарии ощущались как-то душно, но я сам всё ещё смешарик в питоне, и специально для ревью сидел и вчитывался во всё что можно...
super().__init__(sequence) | ||
|
||
def complement(self): | ||
if self.complement_dict == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошо
if self.complement_dict == None:
Лучше
if self.complement_dict is None:
if self.complement_dict == None: | |
if self.complement_dict is None: |
from Bio.SeqUtils import gc_fraction | ||
|
||
|
||
class NuclAcidnucleotideError(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Класс NuclAcidnucleotideError
:
Этот класс представляет собой пользовательскую ошибку для классов, связанных с нуклеиновыми кислотами. Очень хорошо! Однако, название класса немного запутанное + не соответствует CamelCase. Можно сделать его более ясным. Например, NucleotideError
или InvalidNucleotideError
.
class NuclAcidnucleotideError(ValueError): | |
class InvalidNucleotideError(ValueError): |
def gravy(self): | ||
"""Calculate GRAVY (grand average of hydropathy) value""" | ||
gravy_aa_values = {'L': 3.8, | ||
'K': -3.9, | ||
'M': 1.9, | ||
'F': 2.8, | ||
'P': -1.6, | ||
'S': -0.8, | ||
'T': -0.7, | ||
'W': -0.9, | ||
'Y': -1.3, | ||
'V': 4.2, | ||
'A': 1.8, | ||
'R': -4.5, | ||
'N': -3.5, | ||
'D': -3.5, | ||
'C': 2.5, | ||
'Q': -3.5, | ||
'E': -3.5, | ||
'G': -0.4, | ||
'H': -3.2, | ||
'I': 4.5} | ||
gravy_aa_sum = 0 | ||
for amino_ac in self.sequence.upper(): | ||
gravy_aa_sum += gravy_aa_values[amino_ac] | ||
return round(gravy_aa_sum / len(self.sequence), 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно использовать функцию sum с генератором, чтобы упростить код.
def gravy(self): | |
"""Calculate GRAVY (grand average of hydropathy) value""" | |
gravy_aa_values = {'L': 3.8, | |
'K': -3.9, | |
'M': 1.9, | |
'F': 2.8, | |
'P': -1.6, | |
'S': -0.8, | |
'T': -0.7, | |
'W': -0.9, | |
'Y': -1.3, | |
'V': 4.2, | |
'A': 1.8, | |
'R': -4.5, | |
'N': -3.5, | |
'D': -3.5, | |
'C': 2.5, | |
'Q': -3.5, | |
'E': -3.5, | |
'G': -0.4, | |
'H': -3.2, | |
'I': 4.5} | |
gravy_aa_sum = 0 | |
for amino_ac in self.sequence.upper(): | |
gravy_aa_sum += gravy_aa_values[amino_ac] | |
return round(gravy_aa_sum / len(self.sequence), 3) | |
def gravy(self): | |
gravy_aa_values = {'L': 3.8, 'K': -3.9, 'M': 1.9, ...} | |
return round(sum(gravy_aa_values[amino_ac] for amino_ac in self.sequence.upper()) / len(self.sequence), 3) |
for count, record in enumerate(records): | ||
gc_percent = gc_fraction(record.seq) * 100 | ||
if min_gc <= gc_percent <= max_gc: | ||
filtered_1_gc_idxs.append(count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно использовать более читаемую конструкцию for record in records вместо for count, record in enumerate(records).
for count, record in enumerate(records): | |
gc_percent = gc_fraction(record.seq) * 100 | |
if min_gc <= gc_percent <= max_gc: | |
filtered_1_gc_idxs.append(count) | |
for record in records: | |
gc_percent = gc_fraction(record.seq) * 100 | |
if min_gc <= gc_percent <= max_gc: | |
filtered_1_gc_idxs.append(record) |
def make_thresholds(threshold: int | float | tuple) -> tuple: | ||
"""Check thresholds input and convert single value to tuple""" | ||
if isinstance(threshold, int) or isinstance(threshold, float): | ||
lower = 0 | ||
upper = threshold | ||
else: | ||
lower = threshold[0] | ||
upper = threshold[1] | ||
return lower, upper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно изменить make_thresholds
для более явного понимания его назначения.
def make_thresholds(threshold: int | float | tuple) -> tuple: | |
"""Check thresholds input and convert single value to tuple""" | |
if isinstance(threshold, int) or isinstance(threshold, float): | |
lower = 0 | |
upper = threshold | |
else: | |
lower = threshold[0] | |
upper = threshold[1] | |
return lower, upper | |
def make_thresholds(threshold: int | float | tuple) -> tuple: | |
if isinstance(threshold, (int, float)): | |
return 0, threshold | |
elif isinstance(threshold, tuple): | |
return threshold | |
else: | |
raise ValueError('Invalid threshold input') |
return f'AminoAcidSequence("{self.sequence}")' | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Лишняя строчка
for idx in filtered_3_phred_idxs: | ||
filtered_results.append(records[idx]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно использовать генераторное выражение для создания списка filtered_results
.
for idx in filtered_3_phred_idxs: | |
filtered_results.append(records[idx]) | |
filtered_results = [records[idx] for idx in filtered_3_phred_idxs] |
def complement(self): | ||
if self.complement_dict == None: | ||
raise NotImplementedError('It is a basic NA class. You should implement it for descendant class: DNASequence or RNASequence.') | ||
result = type(self)(''.join([self.complement_dict[nuc] for nuc in self.sequence])) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В методе complement
можно использовать метод .get
для получения значения из словаря. Это позволит избежать ошибок, если нуклеотид отсутствует в словаре.
def complement(self): | |
if self.complement_dict == None: | |
raise NotImplementedError('It is a basic NA class. You should implement it for descendant class: DNASequence or RNASequence.') | |
result = type(self)(''.join([self.complement_dict[nuc] for nuc in self.sequence])) | |
return result | |
def complement(self): | |
if not all(nuc in self.complement_dict for nuc in self.sequence): | |
raise NuclAcidnucleotideError('Invalid nucleotide in the sequence') | |
result = type(self)(''.join([self.complement_dict.get(nuc, '') for nuc in self.sequence])) | |
return result |
class AminoAcidSequence(BiologicalSequence): | ||
def gravy(self): | ||
"""Calculate GRAVY (grand average of hydropathy) value""" | ||
gravy_aa_values = {'L': 3.8, | ||
'K': -3.9, | ||
'M': 1.9, | ||
'F': 2.8, | ||
'P': -1.6, | ||
'S': -0.8, | ||
'T': -0.7, | ||
'W': -0.9, | ||
'Y': -1.3, | ||
'V': 4.2, | ||
'A': 1.8, | ||
'R': -4.5, | ||
'N': -3.5, | ||
'D': -3.5, | ||
'C': 2.5, | ||
'Q': -3.5, | ||
'E': -3.5, | ||
'G': -0.4, | ||
'H': -3.2, | ||
'I': 4.5} | ||
gravy_aa_sum = 0 | ||
for amino_ac in self.sequence.upper(): | ||
gravy_aa_sum += gravy_aa_values[amino_ac] | ||
return round(gravy_aa_sum / len(self.sequence), 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Константу - словарь gravy_aa_values
можно определить в начале класса, что поможет улучшить их видимость и управление.
class AminoAcidSequence(BiologicalSequence): | |
def gravy(self): | |
"""Calculate GRAVY (grand average of hydropathy) value""" | |
gravy_aa_values = {'L': 3.8, | |
'K': -3.9, | |
'M': 1.9, | |
'F': 2.8, | |
'P': -1.6, | |
'S': -0.8, | |
'T': -0.7, | |
'W': -0.9, | |
'Y': -1.3, | |
'V': 4.2, | |
'A': 1.8, | |
'R': -4.5, | |
'N': -3.5, | |
'D': -3.5, | |
'C': 2.5, | |
'Q': -3.5, | |
'E': -3.5, | |
'G': -0.4, | |
'H': -3.2, | |
'I': 4.5} | |
gravy_aa_sum = 0 | |
for amino_ac in self.sequence.upper(): | |
gravy_aa_sum += gravy_aa_values[amino_ac] | |
return round(gravy_aa_sum / len(self.sequence), 3) | |
class AminoAcidSequence(BiologicalSequence): | |
GRAVY_AA_VALUES = {'L': 3.8, 'K': -3.9, ...} | |
def gravy(self): | |
return round(sum(AminoAcidSequence.GRAVY_AA_VALUES[amino_ac] for amino_ac in self.sequence.upper()) / len(self.sequence), 3) |
self.sequence = sequence | ||
|
||
def check_seq(self): | ||
valid_nucleotide_symbols = {'A', 'C', 'G', 'T', 'U'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Как я поняла из задания, проверки необходимо делать в дочерних классах, это абстрактный класс
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Верно
else: | ||
raise ValueError('Incorrect sequence input!') | ||
|
||
def __str__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
сначала вы насладитесь от str, а потом пишете метод def str. как я понимаю, тут наследование от str- лишнее тогда
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В целом нет, __str__
же задает правило отображения, мы вполне можем хотеть их переопределить
'c': 'g', 'C': 'G' | ||
} | ||
if 'T' in self.sequence.upper(): | ||
raise NuclAcidnucleotideError('T-contain sequence is not proper RNA sequence') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
молодец, что тут и раннее пишешь кастомные ошибки. очень хорошо вышло
raise NuclAcidnucleotideError('U-contain sequence is not proper DNA sequence') | ||
|
||
def transcribe(self): | ||
transcription_dict = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
словарь лучше бы вынести за функцию, в начало класса
|
||
class NucleicAcidSequence(BiologicalSequence): | ||
def __init__(self, sequence): | ||
self.complement_dict = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
отличный подход с complement_dict = None в родительском классе и его последующей спецификации в дочерних классах
raise NuclAcidnucleotideError('U-contain sequence is not proper DNA sequence') | ||
|
||
def transcribe(self): | ||
transcription_dict = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
может быть в таком случае, можно было не создавать словарь, а изменить всего одну букву. это кажется более оптимальным здесь
def make_thresholds(threshold: int | float | tuple) -> tuple: | ||
"""Check thresholds input and convert single value to tuple""" | ||
if isinstance(threshold, int) or isinstance(threshold, float): | ||
lower = 0 | ||
upper = threshold | ||
else: | ||
lower = threshold[0] | ||
upper = threshold[1] | ||
return lower, upper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
хорошо, что вы нашли такой удобный и универсальный подход через функцию к тому, что границы могут по разному задаваться
for count, record in enumerate(records): | ||
gc_percent = gc_fraction(record.seq) * 100 | ||
if min_gc <= gc_percent <= max_gc: | ||
filtered_1_gc_idxs.append(count) | ||
|
||
for idx in filtered_1_gc_idxs: | ||
if min_len <= len(records[idx].seq) <= max_len: | ||
filtered_2_len_idxs.append(idx) | ||
|
||
for idx in filtered_2_len_idxs: | ||
phred_values = records[idx].letter_annotations['phred_quality'] | ||
if sum(phred_values) / len(phred_values) >= quality_threshold: | ||
filtered_3_phred_idxs.append(idx) | ||
|
||
for idx in filtered_3_phred_idxs: | ||
filtered_results.append(records[idx]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
все логично, но очень громоздко. как известно, чем проще и короче, тем легче читается код и быстрее выполняется, можно было бы записать каждый из фильтров как свою функцию, а потом проверить для каждой последовательности выполнение трех условий (вывод каждой функции True/False).
но твой вариант тоже рабочий, это главное
Review TRAF4