Signal data format support (e.g., BigWig) #56
Comments
@marcomass, as I understand it, BigWig is a binary, indexed version of the Wiggle format, and Wiggle is a compressed, less accurate version of BedGraph. Why don't we use BedGraph and always convert BigWig and Wig files to BedGraph?
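To make the proposed conversion concrete, here is a minimal sketch of turning plain-text Wiggle into BedGraph records. It is only an illustration, not GMQL code: the function name `wig_to_bedgraph` is hypothetical, and it assumes well-formed `fixedStep`/`variableStep` declarations. It covers text Wig only; a binary BigWig file would first need a dedicated reader or a tool such as UCSC's `bigWigToBedGraph`.

```python
def wig_to_bedgraph(lines):
    """Yield (chrom, start, end, value) BedGraph tuples from Wiggle text lines.

    Wiggle positions are 1-based; BedGraph is 0-based and half-open,
    so every start is shifted down by one.
    """
    mode = chrom = None
    pos = step = span = None
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))  # span defaults to 1 base
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params["step"])
            continue
        if mode == "fixedStep":
            # One value per line; the position advances by `step`.
            yield (chrom, pos - 1, pos - 1 + span, float(line))
            pos += step
        elif mode == "variableStep":
            # Each line carries its own position and value.
            p, v = line.split()
            yield (chrom, int(p) - 1, int(p) - 1 + span, float(v))
        else:
            raise ValueError("data line found before any declaration line")


example = [
    "track type=wiggle_0",
    "fixedStep chrom=chr1 start=11 step=5 span=5",
    "1.0",
    "2.5",
    "variableStep chrom=chr2 span=2",
    "100 3.0",
]
for record in wig_to_bedgraph(example):
    print(*record, sep="\t")
```

The sketch emits one BedGraph row per Wiggle value and does not merge adjacent rows with equal values, so the output can be larger than a hand-written BedGraph file.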
@akaitoua In any case, this issue concerns two aspects:
@marcomass, I checked, and here are more details. I suggest supporting only the BEDGraph format, since it does not change our data model. Whenever we copy data into GMQL, we convert it to what we call GMQL_WIG: BEDGraph in a columnar, binary format that GMQL can read and that is small in size compared to BEDGraph. We then store GMQL_WIG in our repository. The reason not to use BIGWIG and WIG in GMQL is that we perform a different type of query than others in the field: we always perform a full join between the reference and the experiment (the set of regions in the reference is almost equal in size to the experiment sample). If we start supporting interval joins (which amount to selecting a small portion of the BIGWIG file), then it would be better to change GMQL to use an index, which would be faster in that case.
@akaitoua Do you think that using BEDGraph (or GMQL_WIG) as the experiment dataset of a MAP, with genes as the reference regions in the reference dataset (thus about 25,000 for human), would be handled by the current system with reasonable performance?
Enable the use of signal data samples (e.g. BigWig) in some operands, e.g., as the second operand of MAP (or in COVER, to be discussed).
Possibly/probably a specific "special" version of the defined MAP operator could be better.
Examples of BigWig files (from 0.6 to 1.5 GB) are available at https://www.encodeproject.org/experiments/ENCSR620VIC/
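As a rough illustration of what a signal-aware MAP could compute, the sketch below aggregates BedGraph-style signal intervals over a set of reference regions (for example genes). It is not the GMQL implementation: the function name `map_mean_signal` is hypothetical, the length-weighted mean is just one possible aggregate, and the code assumes the signal intervals do not overlap each other within a chromosome, as in BedGraph.

```python
from bisect import bisect_right
from collections import defaultdict


def map_mean_signal(regions, signal):
    """For each reference region, return the covered bases and the
    length-weighted mean of the overlapping signal.

    regions: iterable of (chrom, start, end)
    signal:  iterable of (chrom, start, end, value), non-overlapping per chrom
    Coordinates are 0-based and half-open.
    """
    # Group signal intervals by chromosome and sort them by start.
    by_chrom = defaultdict(list)
    for chrom, start, end, value in signal:
        by_chrom[chrom].append((start, end, value))
    ends = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # With non-overlapping intervals, the ends are sorted as well.
        ends[chrom] = [e for _, e, _ in intervals]

    results = []
    for chrom, r_start, r_end in regions:
        intervals = by_chrom.get(chrom, [])
        # First interval whose end lies beyond the region start.
        i = bisect_right(ends.get(chrom, []), r_start)
        covered = 0
        total = 0.0
        while i < len(intervals) and intervals[i][0] < r_end:
            s, e, v = intervals[i]
            overlap = min(e, r_end) - max(s, r_start)
            covered += overlap
            total += overlap * v
            i += 1
        mean = total / covered if covered else None
        results.append((chrom, r_start, r_end, covered, mean))
    return results


signal = [("chr1", 0, 10, 1.0), ("chr1", 10, 20, 3.0), ("chr1", 30, 40, 5.0)]
genes = [("chr1", 5, 15), ("chr1", 20, 30), ("chr2", 0, 10)]
for row in map_mean_signal(genes, signal):
    print(row)
```

Each region is located with a binary search followed by a short scan, so the cost grows with the number of reference regions and the signal intervals they actually overlap, not with the product of the two datasets. Whether that holds up on files of the size linked above would still need to be measured.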