'''
*****************************************************************************************************
Purpose:
Contains methods used to parse and handle files produced by acr_aca_finder.py
Author:
Javi Gomez - https://github.com/rtomyj wrote version 1.0.
Haidong Yi - https://github.com/haidyi revised the code, removed some bugs, and wrote version 2.0.
Project:
This work is advised by Dr. Yanbin Yin at UNL - [email protected]
*****************************************************************************************************
'''
'''
# GENERATOR #
Background:
A locus is a region that contains Acr/Aca proteins and satisfies all 4 filters used to find Acr/Aca loci.
In particular, the proteins in a locus lie close to one another.
"Loci" is the plural of "locus".
Summary:
Parses the file line by line to obtain a locus. A blank line marks the end of a locus.
Yields the next locus in handle to the caller.
Params:
handle - file handle to parse
Yields:
the proteins of a locus as a single string, where each protein is one line of the main file.
'''
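def getLocus(handle):
    locus = ""
    for line in handle:
        if line.strip() == "":  # a blank line means there are no more proteins in that locus
            if locus != "":
                yield locus.strip()  # yield the finished locus to the caller
                locus = ""  # reset for the next locus
        else:
            locus += line  # append the new protein line to the locus
    if locus.strip() != "":
        yield locus.strip()  # the last locus may not be followed by a blank line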
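
# A minimal alternative sketch, not part of the original module: the same
# blank-line-separated grouping that getLocus performs can be expressed with
# itertools.groupby. It assumes, as getLocus does, that loci are separated by
# one or more blank lines, and it accepts any iterable of lines (an open file
# or a list of strings). The name iter_loci_groupby is hypothetical.
import itertools


def iter_loci_groupby(handle):
    # groupby splits the input into alternating runs of blank and non-blank lines
    for is_blank, run in itertools.groupby(handle, key=lambda line: line.strip() == ""):
        if not is_blank:
            # a run of non-blank lines is one locus; join it back into one string
            yield "".join(run).strip()


# Usage with made-up tab-separated rows:
#   rows = ["a\t.\t.\t10\t20\n", "b\t.\t.\t30\t40\n", "\n", "\n", "c\t.\t.\t50\t60\n"]
#   list(iter_loci_groupby(rows))
#   -> ["a\t.\t.\t10\t20\nb\t.\t.\t30\t40", "c\t.\t.\t50\t60"]
# Because groupby also emits the final run, a last locus with no trailing blank
# line is still yielded, and repeated blank lines never produce empty loci.
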
'''
Summary:
Obtains the starting BP and the ending BP of a given locus.
Params:
locusProteins - string that contains all proteins in a locus where each line contains one protein.
Returns:
(starting BP of the locus), (ending BP of the locus) as integers
'''
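def getLocusStartAndEnd(locusProteins: str):
    proteins = locusProteins.strip().split('\n')  # one protein per line
    start = proteins[0].split('\t')[3]  # start BP of the first protein
    end = proteins[-1].split('\t')[4]  # end BP of the last protein
    return int(start), int(end)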
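

# A hedged sketch, not part of the original module: getLocusStartAndEnd takes
# column 3 of the first protein line and column 4 of the last, which relies on
# the proteins being listed in genomic order. If that ordering were not
# guaranteed (an assumption, not something the original states), the bounds
# could instead be taken as the minimum start and maximum end over every
# protein line. The 0-based column indices 3 and 4 are copied from
# getLocusStartAndEnd; the name locus_bounds_unordered is hypothetical.


def locus_bounds_unordered(locus_proteins: str):
    starts = []
    ends = []
    for line in locus_proteins.splitlines():
        if not line.strip():
            continue  # ignore any stray blank lines
        fields = line.split('\t')
        starts.append(int(fields[3]))  # column 3: start BP
        ends.append(int(fields[4]))  # column 4: end BP
    # min()/max() raise ValueError if the locus has no protein lines
    return min(starts), max(ends)


# Example with made-up rows listed out of order:
#   locus_bounds_unordered("b\t.\t.\t300\t400\na\t.\t.\t100\t250")  ->  (100, 400)
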