Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find RG tag correctly using BamTools API #35

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

douglasgscofield
Copy link

@douglasgscofield douglasgscofield commented Jun 11, 2018

accuMUlate uses a homegrown solution to find RG tags attached to reads: using std::string::find() to search for the literal string "\000RGZ". This only works if the previous tag fortuitously ended with \000.

BAM tags containing, e.g., integer values are encoded using a type-sensitive fixed width. "XS:i:20" is encoded as "XSC\024", with the C indicating a single-byte unsigned int. In the problem case this was immediately followed by "RGZ..." encoding the tag for the read group. This is perfectly valid BAM tag encoding, however, the accuMUlate search for "\000RGZ" never found the RG tag for this read or similar reads.

The solution is to use BamTools' own API for finding tags and their values. It knows about these encodings, and searches for them robustly. A single call to BamTools::BamAlignment::GetTag() will return a tag's value and indicate an error. The interface to ReadDataVisitor::GetSampleIndex() is changed slightly and the function is simplified. The constants ReadDataVisitor::RG_TAG and ReadDataVisitor::ZERO_CHAR are removed as they are no longer required.

@dwinter
Copy link
Owner

dwinter commented Jun 11, 2018

Hi Douglas,

Many thanks for identifying this problem and proving a PR to address it. I'll make some time to check this on a few test datasets I have here, then merge this in and make a new release.

@douglasgscofield
Copy link
Author

Hi, I'm hoping this could get merged into the master at some point and be part of the new 0.2.2 that's been started?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants