Add "tseq" output & make available to Python #1201

Closed: nickzoic wants to merge 2 commits

nickzoic

It'd be handy to be able to retrieve the sequence which minimap2 has matched against, e.g. to generate HGVS strings.

You could reconstruct this from the long-form CS string (see #1194), but it seems simpler to just retrieve the whole string in one go.

This PR adds an mm_gen_tseq C function to retrieve the sequence, and a tseq property on the python Alignment object.

I've called it "tseq" because it's called that in the write_cs_ds_or_MD function but it's not necessarily a great name.

@nickzoic
Author

G'day @lh3, just wondering if you've had a chance to consider this one? I've rebased it so it should work against current master. The purpose is to retrieve the reference sequence that was matched, so that it's easier to work out the differences.

@lh3
Owner

lh3 commented Dec 16, 2024

You can use mappy.Aligner.seq() to retrieve reference subsequences.

lh3 closed this on Dec 16, 2024

@nickzoic
Author

Oh, awesome, I'd missed that!

That means I can get what I need using aligner.seq(alignment.ctg)[alignment.r_st:alignment.r_en].

Thanks!

@lh3
Owner

lh3 commented Dec 16, 2024

Better to use aligner.seq(alignment.ctg, alignment.r_st, alignment.r_en)
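
For anyone landing here later, a minimal sketch of that suggestion. The helper names `matched_reference` and `print_matched_regions` are made up for illustration and are not part of mappy; the only mappy calls used are `Aligner()`, `Aligner.map()` and `Aligner.seq()`, plus the `ctg`, `r_st` and `r_en` attributes of each alignment. Presumably the three-argument form is preferred because it fetches only the requested interval instead of building the whole contig string and slicing it afterwards.

```python
def matched_reference(aligner, hit):
    """Return the reference bases covered by one alignment.

    `aligner` is expected to behave like mappy.Aligner and `hit` like a
    mappy.Alignment; only seq(), ctg, r_st and r_en are used.
    """
    # Pass the interval to seq() rather than slicing the full contig.
    return aligner.seq(hit.ctg, hit.r_st, hit.r_en)


def print_matched_regions(ref_path, query_seq):
    """Hypothetical driver: map one query and print each matched region."""
    import mappy as mp  # imported here so the helper above has no hard dependency

    aligner = mp.Aligner(ref_path)  # builds or loads the index
    if not aligner:
        raise RuntimeError(f"failed to load or build index for {ref_path}")
    for hit in aligner.map(query_seq):
        print(hit.ctg, hit.r_st, hit.r_en, matched_reference(aligner, hit))
```

Usage would be something like `print_matched_regions("ref.fa", read_sequence)`, where the file name and the read are placeholders.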
