-
Notifications
You must be signed in to change notification settings - Fork 0
Review AGTRAP #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Review AGTRAP #29
Conversation
output_filename = os.path.splitext(os.path.basename(input_path))[0] | ||
output_filename = f'{output_filename}.fastq' | ||
|
||
os.makedirs('fastq_filtrator_resuls', exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
класс, без лишних проверок - просто параметром проверка существования файла
count_record = 0 | ||
|
||
with open(os.path.join(output_dir, output_filename), mode='w') as out_file: | ||
for record in SeqIO.parse("reads.3.fastq", "fastq"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Кажется кто-то забыл тут свой файлик. Возможно, должно быть
for record in SeqIO.parse("reads.3.fastq", "fastq"): | |
for record in SeqIO.parse(input_path, "fastq"): |
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хе-хе, да
if report: | ||
print(f' Number of source sequences = {count_record},' | ||
f'\n Number of sequences after filtering = {count_filtered_record},') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
крутая идея с report!
def __getitem__(self, key: Union[int, slice]) -> object: | ||
"""Returnы a subsequence in the form of a single letter or a string of several letters""" | ||
if isinstance(key, int): | ||
if 0 >= key or key > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
return super().__getitem__(key - 1) | ||
elif isinstance(key, slice): | ||
if key.start <= 0 or key.stop > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
new_slice = slice(key.start - 1, key.stop, key.step) | ||
return super().__getitem__(new_slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Кажется, что в абстрактном классе определять функции все же не стоит. Все равно мы должны их переопределить в дочерних классах)
"""Checks the sequence to match the alphabet""" | ||
pass | ||
|
||
def __str__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Опять же, все функции абстрактного класса должны быть переопределены в дочерних. Лучше унести это вниз
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Верно
def transcribe(self) -> str: | ||
""" Returns the transcribed DNA """ | ||
return RNASequence(self.seq.replace('T', 'U').replace('t', 'u')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Коротко, красиов и возращает то что нужно - класс!
elif record_type == "three_letter_sequence": | ||
slice_sequence = [seq.upper()[i:i + 3] for i in range(0, len(seq), 3)] | ||
return set(slice_sequence).issubset(type(self).three_letter_alphabet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
круто что есть возможность передавать в 3 буквенном коде!
alphabet = set('ATCGUatcgu') | ||
compliment_dict = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Интересно, зачем тут алфавит, если он переопределяется дочерними классами? Хотя алфавит, конечно, логичный)
alphabet = set('ATCGUatcgu') | |
compliment_dict = None | |
alphabet = None | |
compliment_dict = None |
if type(self) is NucleicAcidSequence: | ||
raise NotImplementedError("The compliment method cannot be called for an object of the NucleicAcidSequence") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Красиво!
class BiologicalSequence(ABC, str): | ||
|
||
def __init__(self, seq: str): | ||
super().__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
не очень понимаю зачем здесь это, хочешь вызвать init от str? Зачем?
super().__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не если делать свой инит, то оно правда наверное надо (иначе мы просто перезаписываем инит, а не дополним), но в абстрактных классах так бывает нужно очень редко, да
def __init__(self, seq: str): | ||
super().__init__() | ||
self.seq = seq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В идеале в абстрактном классе init быть не должно
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ага
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Это очень качественная работа, заставила узнать что я не знала и вспомнить многие декараторы - очень и очень круто! Спасибо! Взяла некоторые вещи себе на на заметку :)
|
||
def reverse(self): | ||
"""Create a reverse sequence""" | ||
return self.__class__(self.seq[::-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Круто реализовано!
if self._check_alphabet(seq): | ||
super().__init__(seq) | ||
else: | ||
raise ValueError(f"The sequence does not match the alphabet for the {self.__class__.__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
кайф!
def __getitem__(self, key: Union[int, slice]) -> object: | ||
"""Returns a subsequence in the form of a single letter or a string of several letters""" | ||
return super().__getitem__(key) | ||
|
||
def __len__(self) -> int: | ||
"""Return the length of the sequence""" | ||
return super().__len__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А я и не знала что так можно! Очень круто!!!
return super().__getitem__(key) | ||
|
||
if self.record_type == "three_letter_sequence": | ||
if isinstance(key, int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Я уже не могу столько писать, что крутая работа, поэтом вот - еще один комплемент, но на целый блок)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошая работа) Кажется, всё же от абстрактных классов наследовать через super не стоит, лучше переопределять их в дочерних классах) Докстринги подробные, аннотации типов местами не хватает, но код читается хорошо. Спасибо за работу!)
if not isinstance(gc_bounds, tuple): | ||
gc_bounds = (0, gc_bounds) | ||
if not isinstance(length_bounds, tuple): | ||
length_bounds = (0, length_bounds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Кажется, хорошо бы проверить, что если эти параметры не тюплы, то числа. А то получается, что и string можно засунуть, если они не тюплы
with open(os.path.join(output_dir, output_filename), mode='w') as out_file: | ||
for record in SeqIO.parse("reads.3.fastq", "fastq"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Тут ошибочка... Я так подозреваю reads.3.fastq
на самом деле это input_path
with open(os.path.join(output_dir, output_filename), mode='w') as out_file: | |
for record in SeqIO.parse("reads.3.fastq", "fastq"): | |
with open(os.path.join(output_dir, output_filename), mode='w') as out_file: | |
for record in SeqIO.parse(input_path, "fastq"): | |
def __init__(self, seq: str): | ||
super().__init__() | ||
self.seq = seq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Кажется init лишний для абстрактного класса)
@abstractmethod | ||
def __len__(self): | ||
"""Return the length of the sequence""" | ||
return super().__len__() | ||
|
||
@abstractmethod | ||
def __getitem__(self, key: Union[int, slice]) -> object: | ||
"""Returnы a subsequence in the form of a single letter or a string of several letters""" | ||
if isinstance(key, int): | ||
if 0 >= key or key > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
return super().__getitem__(key - 1) | ||
elif isinstance(key, slice): | ||
if key.start <= 0 or key.stop > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
new_slice = slice(key.start - 1, key.stop, key.step) | ||
return super().__getitem__(new_slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Функции позже в дочерних классах переопределяются, абстракный класс выступает в качестве шаблона, потому в нём подобные определения методов не нужны)
alphabet = set('ATCGUatcgu') | ||
compliment_dict = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Такс, тут нужно бы определять через self
, и нет нужды сохранять алфавит, если позже он будет переопределяться
alphabet = set('ATCGUatcgu') | |
compliment_dict = None | |
set.alphabet = None | |
set. compliment_dict = None | |
def __init__(self, seq: str): | ||
if self._check_alphabet(seq): | ||
super().__init__(seq) | ||
else: | ||
raise ValueError(f"The sequence does not match the alphabet for the {self.__class__.__name__}") | ||
|
||
def __getitem__(self, key: Union[int, slice]) -> object: | ||
"""Returns a subsequence in the form of a single letter or a string of several letters""" | ||
return super().__getitem__(key) | ||
|
||
def __len__(self) -> int: | ||
"""Return the length of the sequence""" | ||
return super().__len__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Вот тут как раз хорошо бы определить все методы, не наследуя их от абстрактного класса, но кажется и это рабочка)
class AminoAcidSequence(BiologicalSequence): | ||
one_letter_alphabet = set('GAVLIMPFWSTNQYCKRHDE') | ||
three_letter_alphabet = {'GLY', 'ALA', 'VAL', 'LEU', 'ILE', 'MET', 'PRO', 'PHE', 'TRP', 'SER', 'THR', 'ASN', 'GLN', | ||
'TYR', 'CYS', 'LYS', 'ARG', 'HIS', 'ASP', 'GLU' | ||
} | ||
amino_acid_weights = { | ||
'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, | ||
'E': 147, 'Q': 146, 'G': 75, 'H': 155, 'I': 131, | ||
'L': 131, 'K': 146, 'M': 149, 'F': 165, 'P': 115, | ||
'S': 105, 'T': 119, 'W': 204, 'Y': 181, 'V': 117, | ||
'GLY': 75, 'ALA': 89, 'VAL': 117, 'LEU': 131, 'ILE': 131, 'MET': 149, | ||
'PRO': 115, 'PHE': 165, 'TRP': 204, 'SER': 105, 'THR': 119, 'ASN': 132, 'GLN': 146, | ||
'TYR': 181, 'CYS': 121, 'LYS': 146, 'ARG': 174, 'HIS': 155, 'ASP': 133, 'GLU': 147, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Сэты в данном случае кажется не слишком подходят, достаточно одного словаря с однобуквенными ключами и трибуквенными значениями) Тогда и словарь с массами станет в два раза меньше, а то получается избыточным
if type(self) is NucleicAcidSequence: | ||
raise NotImplementedError("The compliment method cannot be called for an object of the NucleicAcidSequence") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Круто! То, что комплемент не может вызываться для NucleicAcidSequence кажется разумным, в прошлом ревью я это не заметил)))) А у тебя клёво)
if isinstance(key, int): | ||
if 0 >= key or key > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
return super().__getitem__(slice((key * 3) - 2, key * 3)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Чуть сложно воспринимается(
:param report:report = True (show filtering report), report = True (default) | ||
:return:A filtered fastq file containing sequences | ||
that have passed all tests (according to specified parameters) | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ну это чисто отвал фляги, спасибо за подбробную аннотацию, это невероятно приятно видеть и читать
output_filename = os.path.splitext(os.path.basename(input_path))[0] | ||
output_filename = f'{output_filename}.fastq' | ||
|
||
os.makedirs('fastq_filtrator_resuls', exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
не знал, что так можно. интересно и хорошо
count_filtered_record = 0 | ||
count_record = 0 | ||
|
||
with open(os.path.join(output_dir, output_filename), mode='w') as out_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
вероятно тут просто input_path
if length_bounds[0] <= len(record.seq) <= length_bounds[1]: | ||
if np.mean(record.letter_annotations['phred_quality']) > quality_threshold: | ||
count_filtered_record += 1 | ||
SeqIO.write(record, out_file, "fastq") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Это единственное место где используется np. Вероятно, можно этого избежать, что бы не импортировать лишнюю библиотку для одной функции
if key.start <= 0 or key.stop > len(self.seq): | ||
raise IndexError(f"Index {key} is out of range of the sequence") | ||
new_slice = slice(key.start - 1, key.stop, key.step) | ||
return super().__getitem__(new_slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Все хорошо и понятно
|
||
class NucleicAcidSequence(BiologicalSequence): | ||
alphabet = set('ATCGUatcgu') | ||
compliment_dict = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Очень хороший словарь для проверки сразу двух аминокислот, мне кажется, что действительно правильно писать его в двух регистрах, потому что пользователь может писать в обоих форматах, это прям хорошо
if self._check_alphabet(seq): | ||
super().__init__(seq) | ||
else: | ||
raise ValueError(f"The sequence does not match the alphabet for the {self.__class__.__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
красиво!!!
'PRO': 115, 'PHE': 165, 'TRP': 204, 'SER': 105, 'THR': 119, 'ASN': 132, 'GLN': 146, | ||
'TYR': 181, 'CYS': 121, 'LYS': 146, 'ARG': 174, 'HIS': 155, 'ASP': 133, 'GLU': 147, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
немного сложно такое писать, это прям лично моя предвзятая вкусовщина, вероятнее лечше писать построчно словари, но на работу никак не влияет, просто лично мое предпочтение. Классно, что работает с однобуквенным и трехбуквенным форматом
elif self.record_type == "three_letter_sequence": | ||
for i in range(0, len(self.seq), 3): | ||
molecular_weight += type(self).amino_acid_weights[self.seq[i:i + 3].upper()] | ||
return molecular_weight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Это очень хорошая и красивая работа, из которой я многое подчерпнул для себя, спасибо за такую работу, она прям очень хороша и красива
Review AGTRAP