Iterative motif search #10

Open
ozgunbabur opened this issue Apr 28, 2022 · 8 comments

@ozgunbabur
Contributor

Write an iterative method that starts by looking for enrichments and deficiencies of each amino acid at each location. Then it will:

  • select the significant subset using a given threshold
  • for each significant result, do a new search within the pattern of that result

Iterate this until nothing comes out as significant. This will give you a tree of results, but you will see that some nodes of the tree converge on the same motif, which means the structure is actually a DAG.

Report each significant motif with its p-value and its parent-child relations.
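A minimal Python sketch of the procedure above, under assumptions the issue does not spell out: every sequence is an aligned window of the same length; a subset of the sequences is marked as foreground; presence (enrichment) and absence (depletion) of each amino acid at each position are tested with a one-sided hypergeometric test of the foreground against all sequences matching the current motif; and the threshold is applied to raw p-values with no multiple-testing correction. All function names are hypothetical. Each motif is stored as a frozenset of (index, letter, presence) constraints, so paths that add the same constraints in a different order converge on one node, which is what makes the result a DAG rather than a tree.

```python
from math import comb

# Hypothetical sketch. A constraint is (index, letter, presence); a motif is a
# frozenset of constraints, so one motif reached by several paths is one node.


def matches(seq, motif):
    """True if seq satisfies every (index, letter, presence) constraint."""
    return all((seq[i] == aa) == present for i, aa, present in motif)


def hypergeom_tail(N, K, n, k, upper):
    """One-sided hypergeometric p-value: k hits in n draws from N items with K hits.

    upper=True gives P(X >= k) (enrichment); upper=False gives P(X <= k) (depletion).
    """
    lo, hi = max(0, n - (N - K)), min(K, n)
    xs = range(k, hi + 1) if upper else range(lo, k + 1)
    return sum(comb(K, x) * comb(N - K, n - x) for x in xs) / comb(N, n)


def iterative_motif_search(seqs, is_foreground, threshold):
    """Return {(parent_motif, child_motif): p_value} for every significant refinement."""
    if not seqs:
        return {}
    root = frozenset()
    edges = {}
    seen = {root}
    stack = [root]
    while stack:  # stops when no node yields a new significant child
        motif = stack.pop()
        rows = [j for j, s in enumerate(seqs) if matches(s, motif)]
        N = len(rows)
        n = sum(1 for j in rows if is_foreground[j])
        if N == 0 or n == 0:
            continue
        for pos in range(len(seqs[0])):
            for aa in sorted({seqs[j][pos] for j in rows}):
                K = sum(1 for j in rows if seqs[j][pos] == aa)
                k = sum(1 for j in rows if seqs[j][pos] == aa and is_foreground[j])
                # Presence is tested for enrichment, absence for depletion.
                for present in (True, False):
                    constraint = (pos, aa, present)
                    if constraint in motif:
                        continue
                    p = hypergeom_tail(N, K, n, k, upper=present)
                    if p <= threshold:
                        child = motif | {constraint}
                        edges[(motif, child)] = p  # one edge per parent
                        if child not in seen:
                            seen.add(child)
                            stack.append(child)
    return edges


def report(edges):
    """Print each significant motif with its parent and p-value, smallest p first."""
    def fmt(m):
        parts = [f"{aa}@{i}" if pr else f"!{aa}@{i}" for i, aa, pr in sorted(m)]
        return " & ".join(parts) or "root"

    for (parent, child), p in sorted(edges.items(), key=lambda e: e[1]):
        print(f"{fmt(parent)} -> {fmt(child)}  p={p:.3g}")
```

Because each node is a set of constraints, a motif reached from two parents appears once, with one incoming edge per parent; `report` then lists every edge with its p-value, which covers the parent-child relations the issue asks for.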

@AdamFinkleUMB
Collaborator

Tentatively done, but I am not sure the result is what you want.

@ozgunbabur
Contributor Author

Hi Adam, please describe what you have done on this issue and tell us how we can test it.

ozgunbabur reopened this on May 5, 2022
@AdamFinkleUMB
Collaborator

I fixed the bug we saw today: I needed to convert the number of the motif into a character. The search now neatly returns a readable result if the threshold is kept low.

@ozgunbabur
Contributor Author

What is the result on the simulated dataset with window 5?

@AdamFinkleUMB
Collaborator

path = "test_data/simulated-phosphoproteomic-data.txt"
window = 5; length = 2 * window + 1
step = 1024
threshold = 0.0005

Key:
motif => [newfound_motifs]
(index, letter, presence)
Index is 0-based from the left, letter is the amino acid,
and presence is whether the amino acid must appear (True) or be absent (False)

22212
None => [(4, 'S', False), (5, 'P', True)]

18981
(4, 'S', False) => [(5, 'P', True), (9, 'I', False)]

1239
(5, 'P', True) => []

15209
(9, 'I', False) => [(4, 'H', True), (5, 'P', True), (7, 'K', True)]

440
(4, 'H', True) => []

1017
(5, 'P', True) => []

1918
(7, 'K', True) => [(5, 'P', True)]

106
(5, 'P', True) => []

1512
(5, 'P', True) => []

Final Graph: {None: (5, 'P', True)}
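Following the key above (0-based index, letter, presence), a set of constraints can be rendered as a regular expression over the window, which may be easier to read than raw tuples and can be used directly to filter sequences. This is a small sketch; `motif_to_regex` is a hypothetical helper, and the window length of 11 comes from `length = 2 * window + 1` with `window = 5`. It assumes the constraints are mutually consistent (no letter both required and forbidden at the same index).

```python
import re


def motif_to_regex(motif, length):
    """Render (index, letter, presence) constraints as a regex over a fixed-length window.

    Hypothetical helper; assumes the constraints do not contradict each other.
    """
    required = {}   # index -> letter that must appear there
    forbidden = {}  # index -> set of letters that must not appear there
    for index, letter, present in motif:
        if present:
            required[index] = letter
        else:
            forbidden.setdefault(index, set()).add(letter)
    parts = []
    for i in range(length):
        if i in required:
            parts.append(re.escape(required[i]))
        elif i in forbidden:
            parts.append("[^" + "".join(sorted(forbidden[i])) + "]")
        else:
            parts.append(".")  # unconstrained position
    return "".join(parts)
```

For example, `motif_to_regex([(4, 'S', False), (5, 'P', True)], 11)` gives `....[^S]P.....`, and `re.fullmatch` with that pattern accepts a window only if it has no S at index 4 and a P at index 5.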

@ozgunbabur
Contributor Author

How should we read these? I would like to understand the resulting DAG structure.

@AdamFinkleUMB
Collaborator

In this case, the resulting acyclic graph would have the original sequences at the root, with a single edge labeled (5, "P", True) leading to the subset of sequences that have a "P" at index 5. The method itself works; understanding the DAG structure can be handled as part of the visualization issue.
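One possible way to draw the DAG, offered as a suggestion rather than a description of the current code: identify each node by its full motif (the set of every constraint on the path from the root) instead of by the last constraint added. Each line in the output above appears to be labeled only by its last constraint, so identical lines such as "(5, 'P', True) => []" can stand for different nodes; keyed by the full motif, they either become distinct nodes or merge into one node with several parents. The hypothetical `to_dot` below writes such a graph as Graphviz DOT text, assuming edges are stored as `{(parent_motif, child_motif): p_value}` with frozenset motifs of (index, letter, presence) tuples.

```python
def label(motif):
    """Readable node label; the empty motif is the root (all sequences)."""
    if not motif:
        return "all sequences"
    return ", ".join(
        f"{letter}@{index}" if present else f"not {letter}@{index}"
        for index, letter, present in sorted(motif)
    )


def to_dot(edges):
    """Render {(parent_motif, child_motif): p_value} as Graphviz DOT text.

    Motifs are frozensets, so a motif reached from several parents gets one node.
    """
    node_ids = {}
    for parent, child in edges:
        for motif in (parent, child):
            if motif not in node_ids:
                node_ids[motif] = f"n{len(node_ids)}"
    lines = ["digraph motifs {"]
    for motif, node_id in node_ids.items():
        lines.append(f'  {node_id} [label="{label(motif)}"];')
    for (parent, child), p in edges.items():
        lines.append(f'  {node_ids[parent]} -> {node_ids[child]} [label="p={p:.2g}"];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the returned text to a file such as `motifs.dot` lets Graphviz draw it, e.g. `dot -Tpng motifs.dot -o motifs.png`. A node with two incoming arrows is exactly the convergence the issue description predicts.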

@ozgunbabur
Contributor Author

Let me give an example: the output above produces "(5, 'P', True) => []" twice in the last steps. Why is that?

How can we look at this output and draw the DAG?

Also, where does the index 5 map on the sequence? Is it the center?
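If the index is 0-based from the left, as the key in the output above states, and each window has length 2 * window + 1, then with window = 5 the indices run from 0 to 10 and index 5 is the middle position. If the windows are centered on the phosphosite (an assumption; the thread does not say), index 5 is the site itself. A hypothetical pair of helpers for translating indices into signed offsets from the center:

```python
def index_to_offset(index, window):
    """Convert a 0-based window index into a signed offset from the window's center."""
    length = 2 * window + 1
    if not 0 <= index < length:
        raise ValueError(f"index {index} outside window of length {length}")
    return index - window


def describe(constraint, window):
    """Render an (index, letter, presence) constraint relative to the center."""
    index, letter, present = constraint
    offset = index_to_offset(index, window)
    return f"{letter} at {offset:+d}" if present else f"no {letter} at {offset:+d}"
```

With `window = 5`, `describe((5, 'P', True), 5)` returns `"P at +0"` and `describe((4, 'S', False), 5)` returns `"no S at -1"`, i.e. a proline at the center and no serine immediately to its left.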

ozgunbabur reopened this on May 12, 2022