-
Notifications
You must be signed in to change notification settings - Fork 21
Added a clustal test file, an example notebook, and alignment parsing… #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… in the _read function. Also made two hidden funcitons to iterate sequence and alignment biopython objects.
Awesome! This looks good. I'm testing it now and will leave comments if I see things that need fixing. |
One general comment about coding style: I typically use a "lowercase and underscore" convention when naming instance variables, functions, This convention was set by PEP 8. I'd like to follow PEP 8 as best as we can. I'll leave comments in the PR where this convention is broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only marked a few of the places where PEP 8 was broken. There a few more that need fixing as well.
|
||
# Distinguish sequence vs. alignment, pass SeqRecords | ||
if schema == 'clustal': | ||
alignIter = AlignIO.parse(filename, format=schema, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PEP 8: alignIter
-> align_iter
if schema == 'clustal': | ||
alignIter = AlignIO.parse(filename, format=schema, **kwargs) | ||
# Parse the multiple sequence alignments | ||
alignDict = _parseAlignRec(alignIter, seq_label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PEP 8: alignDict
-> align_dict
# Return DataFrame. | ||
return data | ||
|
||
def _parseAlignRec(alignIter, seq_label): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PEP 8:
_parseAlignRec
->_parse_align_record
alignIter
->align_iter
|
||
return alignData | ||
|
||
def _parseSeqRec(seqIter, seq_label): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PEP 8:
_parseSeqRec
->_parse_seq_rec
seqIter
->seq_iter
def _parseSeqRec(seqIter, seq_label): | ||
"""A Bio.SequenceRecord parser. | ||
""" | ||
seqDict = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PEP 8: seqDict
-> seq_dict
Continue below.
I think I agree with this statement... give each item a value in the Could you explain the output in the example clustal? Are each mouse/opossum line-pairs just segments of one longer sequence? Or are each pair a different sequence? |
… in the _read function. Also made two hidden funcitons to iterate sequence and alignment biopython objects.
Did a bit of hacking to get the dataframes to have the same form as previously. Multiple sequence alignments are spit out as a multiindexed dataframe.