BGZIP support #228

Draft: wants to merge 8 commits into base: develop

rhpvorderman
Collaborator

Checklist

  • Pull request details were added to CHANGELOG.rst
  • Documentation was updated (if needed)

@marcelm I noticed Sequali was bottlenecked by gzip decompression. On the latest develop branch, analysing ONT files is faster than decompressing them. For BAM files, however, this bottleneck can in theory be circumvented because all BGZIP blocks are independent and can therefore be decompressed in parallel. FASTQ files are also often bgzip-compressed. So I set out to write some code to alleviate this.
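
To illustrate why independent blocks help, here is a minimal Python sketch. It is not the code in this PR: it uses only the standard library rather than isal, the helper names (make_block, split_blocks, decompress_parallel) are hypothetical, and it assumes the 'BC' extra subfield is the first and only extra subfield in each block header, so that the block size sits at byte offset 16.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_block(data: bytes) -> bytes:
    """Build one BGZF-style block (demo helper, hypothetical)."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate
    deflated = compressor.compress(data) + compressor.flush()
    # BSIZE stores the total block size minus one:
    # 18 header bytes + deflate payload + 8 trailer bytes.
    bsize = 18 + len(deflated) + 8 - 1
    header = struct.pack(
        "<BBBBIBBHBBHH",
        0x1F, 0x8B,      # gzip magic
        8,               # CM: deflate
        4,               # FLG: FEXTRA
        0, 0, 0xFF,      # MTIME, XFL, OS (unknown)
        6,               # XLEN: one 6-byte subfield
        ord("B"), ord("C"), 2, bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + deflated + trailer


def split_blocks(stream: bytes) -> list:
    """Cut a stream into blocks using only the BSIZE header field."""
    blocks = []
    pos = 0
    while pos < len(stream):
        # Assumption: 'BC' is the first extra subfield, so BSIZE is at +16.
        (bsize,) = struct.unpack_from("<H", stream, pos + 16)
        blocks.append(stream[pos:pos + bsize + 1])
        pos += bsize + 1
    return blocks


def decompress_parallel(stream: bytes, threads: int = 4) -> bytes:
    """Decompress every block on a thread pool and join them in order."""
    with ThreadPoolExecutor(threads) as pool:
        # Each block is a complete gzip member, so gzip.decompress works
        # on it in isolation; pool.map preserves the input order.
        return b"".join(pool.map(gzip.decompress, split_blocks(stream)))
```

Because block boundaries are known from the headers alone, no block has to wait for the one before it, which is what a multi-threaded reader can exploit.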

It works. The question is how to integrate this properly into xopen. My thoughts on this are:

  • Create a separate bgzip module here (isal.bgzip).
  • Create a bgzip.open function. Opening with one thread is delegated to the single-threaded opener in isal.igzip_threaded, as there is less overhead involved.
  • Opening with more threads is delegated to the _ThreadedBgzipReader class.
  • In xopen, detect the bgzip format by parsing the gzip header and use bgzip.open when the file is bgzip-compressed.
  • I have no plans to support writing yet, but that could be useful if dnaio supports uBAM writing in the future. I will leave that job for when I need it though.
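
The header-based detection could look roughly like the following sketch. The function name is_bgzip_header is hypothetical and is not part of xopen or isal; the check assumes the BGZF convention of a gzip FEXTRA field containing a 'BC' subfield of length 2.

```python
import struct


def is_bgzip_header(header: bytes) -> bool:
    """Return True if the bytes start a gzip member with a 'BC' subfield."""
    if len(header) < 12:
        return False
    magic1, magic2, method, flags, _mtime, _xfl, _os, xlen = struct.unpack(
        "<BBBBIBBH", header[:12]
    )
    if (magic1, magic2, method) != (0x1F, 0x8B, 8):
        return False  # not a gzip/deflate member
    if not flags & 0x04:
        return False  # FEXTRA not set, so no extra field at all
    extra = header[12:12 + xlen]
    if len(extra) < xlen:
        return False  # caller did not supply the full extra field
    pos = 0
    # Walk the subfields: SI1, SI2, SLEN, then SLEN bytes of payload.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == ord("B") and si2 == ord("C"):
            return slen == 2
        pos += 4 + slen
    return False
```

A caller would read the first bytes of the file, pass them to this check, and fall back to the regular gzip reader when it returns False.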

What are your thoughts on this?

subfield_length != 2
) {
PyErr_Format(
PyExc_ValueError,
Collaborator Author

This module should define a dedicated exception, such as BgzipFormatError, to raise here.
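
At the Python level, such an exception might be sketched as follows. This is only an illustration: the check function is hypothetical, and the choice to subclass ValueError is an assumption made so that code already catching the ValueError raised above would keep working.

```python
class BgzipFormatError(ValueError):
    """Raised when a gzip member does not follow the bgzip block layout."""


def check_bc_subfield_length(subfield_length: int) -> None:
    # Hypothetical Python counterpart of the C check shown above.
    if subfield_length != 2:
        raise BgzipFormatError(
            f"bgzip 'BC' subfield must have length 2, got {subfield_length}"
        )
```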
