How to reproduce the Xreducedall.2002.npy file for other organisms #23

Open
Jasonxu0109 opened this issue Dec 31, 2021 · 4 comments

Comments

@Jasonxu0109

Hi,
Could you please provide the Python script that produces the Xreducedall.2002.npy file?
We may be able to use the pipeline to analyze other organisms, such as fly. Thanks in advance!

Best wishes,

@jzthree
Collaborator

jzthree commented Jan 3, 2022

To generate the equivalent of Xreducedall.2002.npy for a new organism, you first need to train a sequence model that predicts chromatin profiles in your organism of interest, or use an existing model such as DeepArk (https://www.ncbi.nlm.nih.gov/pubmed/33888512/). Note that Xreducedall.2002.npy is not computed from NarrowPeak files (as you mentioned in your other post); it is computed from sequence model predictions for sequences centered at the major TSS of each gene.

Once you have the sequence model predictions, the code in discussion #9 should be helpful for generating the equivalent of Xreducedall.2002.npy.
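
As a rough illustration of that reduction step, here is a minimal NumPy sketch. It is not the code from #9. It assumes you already have predictions in an array of shape (genes, bins, chromatin features), with bins laid out at regular intervals around each TSS, and it collapses the bins with exponentially decaying weights applied separately upstream and downstream. The function name `reduce_predictions`, the 200 bp bin size, the decay rates, and the column order of the flattened output are all assumptions that should be checked against the ExPecto code before use.

```python
import numpy as np


def reduce_predictions(preds, bin_size=200, decay_rates=(0.01, 0.02, 0.05, 0.1, 0.2)):
    """Collapse per-bin predictions into a fixed-length feature vector per gene.

    preds: array of shape (n_genes, n_bins, n_features); bin n_bins // 2 is
    assumed to sit on the TSS. All parameter defaults are illustrative.
    """
    n_genes, n_bins, _ = preds.shape

    # Signed offset (bp) of each bin relative to the TSS, and distance in bins.
    offsets = (np.arange(n_bins) - n_bins // 2) * bin_size
    dist = np.abs(offsets) / bin_size

    # One decaying kernel per rate for upstream bins, then one per rate for
    # downstream bins; the TSS bin itself contributes to both sides.
    upstream = [np.exp(-r * dist) * (offsets <= 0) for r in decay_rates]
    downstream = [np.exp(-r * dist) * (offsets >= 0) for r in decay_rates]
    weights = np.vstack(upstream + downstream)  # (n_kernels, n_bins)

    # Weighted sum over bins: (kernels, bins) x (genes, bins, features).
    reduced = np.einsum("kb,gbf->gkf", weights, preds)

    # Flatten kernel-major; this column order is an assumption.
    return reduced.reshape(n_genes, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_preds = rng.random((5, 200, 8))  # 5 genes, 200 bins, 8 toy features
    X_reduced = reduce_predictions(toy_preds)
    print(X_reduced.shape)
    # np.save("Xreduced_myorganism.npy", X_reduced)  # hypothetical output name
```

With 10 kernels, each gene ends up with 10 values per chromatin feature, so the number of columns is 10 times the number of features your sequence model predicts.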

@Jasonxu0109
Author

Hi,
Thank you for your reply. We have already generated our own epigenetic data, such as ATAC-seq, so we want to train our own model rather than use the pre-computed model you provided. However, we can't find a Python script at the link you gave for preprocessing our ATAC-seq data. Could you provide the code and a test file to reproduce the Xreducedall file? Thanks in advance!

@jzthree
Collaborator

jzthree commented Jan 5, 2022

If you want to train new sequence models on epigenetic data, feel free to check out https://github.com/FunctionLab/selene (it provides tutorials and manuscript examples). Note that the ExPecto model requires two training steps. First, you train the chromatin profile sequence model, which lets you generate the equivalent of Xreducedall.2002.npy. Then you can modify the ExPecto script to do the second step, training for expression prediction.
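
To make the data flow of the second step concrete, here is a small sketch that fits a regularized linear model from reduced features to expression values. It is a stand-in, not the ExPecto training script: it uses closed-form ridge regression in NumPy on synthetic data, and the names `fit_ridge` and `pearson_r`, the regularization strength, and the train/test split are all assumptions for illustration. In practice you would load your own reduced feature matrix and expression table and adapt the ExPecto script instead.

```python
import numpy as np


def fit_ridge(X, y, l2=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + l2 * I) w = Xc^T (y - mean(y)).
    gram = Xc.T @ Xc + l2 * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic stand-ins: X plays the role of the reduced feature matrix,
    # y the role of (log-transformed) expression for one tissue.
    X = rng.normal(size=(500, 40))
    y = X @ rng.normal(size=40) + rng.normal(scale=0.1, size=500)

    # Hold out some genes for evaluation (here simply the last 100 rows).
    w, b = fit_ridge(X[:400], y[:400])
    print(pearson_r(X[400:] @ w + b, y[400:]))
```

Holding out whole chromosomes rather than random genes is a common way to evaluate this kind of model, since neighboring genes can share sequence context.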

@Jasonxu0109
Author

Hi,
Thank you so much. This is very helpful!
