Unmapped Region in Y Chromosome Assembly Using Verkko #311

Open
LeoHongboWANG opened this issue Jan 2, 2025 · 3 comments

LeoHongboWANG commented Jan 2, 2025

Hi,

I am assembling the Y chromosome of my study species using Verkko with the following data:

90x ultra-long ONT reads
100x HiFi reads
200x Hi-C reads
Assembly comparison:
Hifiasm: 5 contigs, total length 43 Mb.
Verkko: Single contig, 47 Mb, with telomeres at both ends.
I am very excited, however, when mapping HiFi and ONT reads back to the Verkko assembly, I found a 10 kb gap in the Y chromosome contig where no reads align. (minimap2 -ax map-ont -t 32 -a -k 19 -O 5,56 -E 4,1 -B 5 -z 400,50 -r 2k --eqx --secondary=no hap1.fa pass.ul.fq.gz >ont.sam)
issue
Questions:
Why do no reads align in this 10 kb region?
Do you know if this additional sequence in Verkko's assembly is accurate?
Thanks for your help!

Best regards,
Hongbo
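
One way to pin down the exact coordinates of a region like this is to scan per-base depth for runs of zero coverage. The sketch below assumes a tab-separated table of contig, 1-based position, and depth, such as the one `samtools depth -a` writes from a sorted BAM. The function name `zero_coverage_runs` and the `min_len` threshold are illustrative and not part of Verkko or minimap2.

```python
from typing import Iterable, Iterator, Tuple


def zero_coverage_runs(
    lines: Iterable[str], min_len: int = 1000
) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, start, end) for zero-depth runs, 1-based inclusive.

    Assumes rows are sorted by contig and position and that zero-depth
    positions are present in the input (hypothetical input layout).
    """
    run_contig = None
    run_start = run_end = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig, pos_s, depth_s = line.split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        extends = run_contig == contig and pos == run_end + 1
        if depth == 0 and extends:
            run_end = pos  # grow the current zero-depth run
            continue
        # Any open run ends here; report it if it is long enough.
        if run_contig is not None and run_end - run_start + 1 >= min_len:
            yield run_contig, run_start, run_end
        if depth == 0:
            run_contig, run_start, run_end = contig, pos, pos
        else:
            run_contig = None
    # Flush a run that reaches the end of the input.
    if run_contig is not None and run_end - run_start + 1 >= min_len:
        yield run_contig, run_start, run_end
```

Running this separately on the HiFi and ONT depth tables would show whether both data types drop out over the same interval or only one of them does.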

skoren (Member) commented Jan 2, 2025

There could be several reasons, including mapping parameters, genome repetitiveness, or incorrect/higher-error sequence. Since the region is relatively short, and shorter than most HiFi reads, I suspect Verkko would have resolved it correctly. I'd suggest mapping as in the T2T-Polish pipeline, since we've found those alignments to be more reliable, and seeing whether the region is covered then: https://github.com/arangrhie/T2T-Polish.

LeoHongboWANG (Author) commented

Hi @skoren,

Thank you for your suggestion! Using the T2T-Polish pipeline, I found some ultralong reads spanning the 10 kb region, but coverage remains sparse. Would this limited support be enough to confirm the region’s accuracy, or could this still be due to this region's repetitiveness or potential sequencing errors?
[Image: issue2]

Thanks again for your help!

Best regards,
Hongbo
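
To put a number on "sparse", one could count the reads whose single alignment crosses the whole region plus some flanking sequence on each side. The sketch below assumes alignments in PAF format (0-based target start in column 8, target end in column 9, mapping quality in column 12). The function name `count_spanning_reads`, the `flank` default, and the contig name used in any call are hypothetical choices, not values from the T2T-Polish pipeline.

```python
from typing import Iterable


def count_spanning_reads(
    paf_lines: Iterable[str],
    contig: str,
    start: int,
    end: int,
    flank: int = 1000,
    min_mapq: int = 0,
) -> int:
    """Count distinct reads with one alignment covering
    [start - flank, end + flank) on `contig` (0-based, half-open)."""
    spanning = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        qname, tname = fields[0], fields[5]
        tstart, tend, mapq = int(fields[7]), int(fields[8]), int(fields[11])
        if tname != contig or mapq < min_mapq:
            continue
        # The alignment must extend past both edges of the region.
        if tstart <= start - flank and tend >= end + flank:
            spanning.add(qname)
    return len(spanning)
```

Comparing the count with and without a mapping-quality filter gives a rough sense of whether the spanning reads are placed confidently or are ambiguous placements in repetitive sequence.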

skoren (Member) commented Jan 3, 2025

The support doesn't look great, so I suspect it's an assembly error. The flanking regions probably do go near each other, but the sequence in between is not accurate. T2T-Polish will flag regions with badly clipped reads, etc., and I suspect this will be flagged as such an issue region. I'd recommend flagging this as an issue in the assembly (or updating it to have Ns).
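
Replacing the unsupported stretch with Ns can be sketched as a simple string operation on the contig sequence. This is a minimal illustration that assumes the sequence is already loaded into memory as a string and that the interval is given in 0-based, half-open coordinates. The function name `mask_region` is hypothetical, and the coordinates would come from your own inspection of the alignments.

```python
def mask_region(seq: str, start: int, end: int) -> str:
    """Return `seq` with the 0-based half-open interval [start, end)
    replaced by Ns, keeping the sequence length unchanged."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("interval out of bounds")
    return seq[:start] + "N" * (end - start) + seq[end:]
```

Keeping the length unchanged means existing coordinates and annotations on the contig stay valid. It also implicitly asserts that the gap is the same size as the removed sequence, which may not be true here.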
