-
Notifications
You must be signed in to change notification settings - Fork 0
Review TAOK3 #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Review TAOK3 #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошая работа, код мне понравился, видно, что очень старались! Не поняла только, почему в последней функции итерация идет не с 1 аминокислоты, а со второй.
Константы в питоне принято писать большими буквами, об этом не забываем. Спасибо за работу!
For more information please see README | ||
|
||
""" | ||
if type(length_bounds) is int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно еще проверить на None, если вдруг пользователь захочет передать в функцию такое значение параметров и в случае None присвоить значение по умолчанию
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Да, об этом не подумала
if os.path.exists(output_path): | ||
error = 'File with such name exists! Change output_filename arg!' | ||
raise ValueError(error) | ||
SeqIO.write(good_reads, output_path, "fastq") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошая работа!
gc_counter = 0 | ||
for nucl in self.seq: | ||
if nucl in ('G', 'C', 'g', 'c'): | ||
gc_counter += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Здесь можно было бы использовать метод .upper() или .lower(), чтобы не сравнивать регистры или вынести эти значения в константы, чтобы каждый раз не создавался новый tuple при каждой итерации
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Да, абсолютно согласна)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не поспорить)
""" | ||
The class for RNA sequences | ||
""" | ||
rule_complement = 'AUCG'.maketrans('AaUuCcGg', 'UuAaGgCc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Интересная реазизация!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Спасибо)
""" | ||
transcribe_seq = self.seq.translate(self.rule_transcription) | ||
rna_seq = RNASequence(transcribe_seq) | ||
return rna_seq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Очень красиво!
""" | ||
The class for amino acid sequences | ||
""" | ||
aa_alphabet = 'ACDEFGHIKLMNPQRSTVWY' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Константы обозначаем большими буквами
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А вот это точно константы? Я тоже об этом думала, но в данном случае наши константы стали атрибутами. Как будто они всегда записываются в нижнем регистре?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хм, наверное это вопрос уже вкуса. Мне кажется, что если есть какое-то значение, которое ты задаешь единожды и не изменяешь, то это все-таки константа. А начальные какие-то значения, которые ты передаешь в def init, например - это уже атрибуты, которые ты определяешь.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Константами в питоне называются конкретно штуки вынсенные в начало модуля. Тут это просто классовые атрибуты, поэтому да, просто в нижнем регистре
- Обрати внимание, когда пишешь какой то код в PR, то его кушает маркдаун. У тебя
__init__
стал жирным init. Чтобы он оставался собой, надо добавлять штрихи (на букве ё).
""" | ||
alternative_frames = [] | ||
num_position = 0 | ||
for amino_acid in self.seq[1:-3]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Здесь так и задумано, что мы пропускаем первую аминокислоту?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Да, здесь так задумано) С первой амк начинается стандартная рамка считывания, а мы ищем альтернативные)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Отлично!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Спасибо огромное за работу, всем бы такой лаконичный код и такие наиподробнейшие докстринги с аннотацией!
10/10
Example: output_filename='result' # 'result.fastq' | ||
output_filename='result.fastq' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Очень хорошая идея с примерами, порой их очень не хватает
if output_filename is None: | ||
input_filename = os.path.split(input_path)[-1] | ||
output_filename = input_filename | ||
if not (output_filename.endswith('.fastq')): | ||
output_filename = output_filename + '.fastq' | ||
current_directory = os.getcwd() | ||
path = os.path.join(current_directory, 'fastq_filtrator_results') | ||
if not (os.path.exists(path)): | ||
os.mkdir(path) | ||
output_path = os.path.join(path, output_filename) | ||
if os.path.exists(output_path): | ||
error = 'File with such name exists! Change output_filename arg!' | ||
raise ValueError(error) | ||
SeqIO.write(good_reads, output_path, "fastq") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💥💥💥🔥🔥🔥
""" | ||
The class for RNA sequences | ||
""" | ||
rule_complement = 'AUCG'.maketrans('AaUuCcGg', 'UuAaGgCc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Возьму на заметку, хороший метод!
|
||
Return: DNAsequence or RNAsequence - the result sequence object | ||
""" | ||
complement = self.seq.translate(type(self).rule_complement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хотелось бы еще как то справляться с ошибкой в случае, если complement вызывается на самой NucleicAcidSequence (NotImplementedError или что нибудь такое, указывающее, что пользователю нужно выбрать между DNASequence и RNASequence)
""" | ||
return set(self.seq.upper()) <= set(self.aa_alphabet) | ||
|
||
def search_for_alt_frames(self, alt_start_aa='M') -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошая функция)
if type(length_bounds) is int: | ||
length_bounds = 0, length_bounds | ||
if type(gc_bounds) is int or type(gc_bounds) is float: | ||
gc_bounds = 0, gc_bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Круто, что была сделана проверка на типы!
Важно: Если в gc_bounds пользователь укажет tuple проверка сломается, нужно придумать, что делать в этом случае...
Неважно: тут так и просится isinstance)
if type(length_bounds) is int: | |
length_bounds = 0, length_bounds | |
if type(gc_bounds) is int or type(gc_bounds) is float: | |
gc_bounds = 0, gc_bounds | |
if isinstance(length_bounds, int): | |
length_bounds = 0, length_bounds | |
if isinstance(gc_bounds, int) or isinstance(gc_bounds, float): | |
gc_bounds = 0, gc_bounds |
if len(good_reads) == 0: | ||
raise ValueError('There are no sequences suited to requirements') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Кажется, что лучше просто вывести предупреждение в stdout о конкретном файле. Было бы неплохо прогонять фильтр по нескольким файлам в цикле... Но если хоть в одном из них не будет хорошей последовательности, то цикл остановится с ошибкой - неприятно
if len(good_reads) == 0: | |
raise ValueError('There are no sequences suited to requirements') | |
if len(good_reads) == 0: | |
print(f'In {input_path} there is no sequences suited to requirements') |
def __init__(self, seq): | ||
self.seq = seq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Для стандартизации тут неплохо было бы сделать self.seq = seq.upper() (или .lower()) - не придется приводить к определенному регистру в будущем/сравнивать несколько регистров как на строчке 246
def __init__(self, seq): | |
self.seq = seq | |
def __init__(self, seq): | |
self.seq = seq.upper() |
Return: DNAsequence or RNAsequence - the result sequence object | ||
""" | ||
complement = self.seq.translate(type(self).rule_complement) | ||
complement_seq = type(self)(complement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно как альтернативу использовать: self.__class__(complement) (две пары кавычек немного смущают, но это дело вкуса)
complement_seq = type(self)(complement) | |
complement_seq = self.__class__(complement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
У кода хорошая структура, он читаемый. Используются понятные названия функций и переменных. Документация к функциям прекрасно оформлена. Получила удовольствие от проверки.
if type(length_bounds) is int: | ||
length_bounds = 0, length_bounds | ||
if type(gc_bounds) is int or type(gc_bounds) is float: | ||
gc_bounds = 0, gc_bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if type(length_bounds) is int: | |
length_bounds = 0, length_bounds | |
if type(gc_bounds) is int or type(gc_bounds) is float: | |
gc_bounds = 0, gc_bounds | |
if isinstance(length_bounds, int): | |
length_bounds = 0, length_bounds | |
if isinstance(gc_bounds, (int, float)): | |
gc_bounds = 0, gc_bounds | |
Так попроще будет
is_above_quality_threshold(seq_record, quality_threshold)): | ||
good_reads += [seq_record] | ||
if len(good_reads) == 0: | ||
raise ValueError('There are no sequences suited to requirements') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Очень здорово, что есть ошибка на такой случай
input_filename = os.path.split(input_path)[-1] | ||
output_filename = input_filename | ||
if not (output_filename.endswith('.fastq')): | ||
output_filename = output_filename + '.fastq' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Тоже хорошая проверка
os.mkdir(path) | ||
output_path = os.path.join(path, output_filename) | ||
if os.path.exists(output_path): | ||
error = 'File with such name exists! Change output_filename arg!' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Еще одна прекрасная проверка!
gc_counter = 0 | ||
for nucl in self.seq: | ||
if nucl in ('G', 'C', 'g', 'c'): | ||
gc_counter += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не поспорить)
""" | ||
rule_complement = 'ATCG'.maketrans('AaTtCcGg', 'TtAaGgCc') | ||
rule_transcription = 'AUCG'.maketrans('Tt', 'Uu') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Вааааау, это очень классно!
Review TAOK3