
Reference Sequence SNP Call #118

Closed
hdesale2408 opened this issue Jul 19, 2022 · 3 comments

Comments


hdesale2408 commented Jul 19, 2022

Hello, I ran parsnp on 17 whole genomes. I picked one of these genomes to use as the reference, but also kept its sequence file in the input directory. After running parsnp, I looked at the VCF output, and it seems that SNPs are being called when the reference sequence is mapped against itself. Does this make sense? It is the exact same sequence mapped against itself, so why would there be SNPs?

Thank you!

Contributor

bkille commented Nov 16, 2023

Hi @hdesale2408,

Sorry for the delay in responding. Yes, there should not be SNPs between the same sequence in an alignment. There have been some noted (but rare) issues w/ Parsnp incorrectly parsing .gbk files. Did your input contain .gbk/.gb files by any chance?
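A quick way to answer the question above is to list any GenBank-format files among the genomes given to Parsnp. The sketch below is a minimal, hypothetical helper (`find_genbank_inputs` is not part of Parsnp); it assumes all input genomes sit in a single directory, as in the original report, and only checks file extensions, not file contents.

```python
from __future__ import annotations

from pathlib import Path

# Extensions asked about in the thread; .gbk and .gb are GenBank flat files.
GENBANK_SUFFIXES = {".gbk", ".gb"}


def find_genbank_inputs(input_dir: str) -> list[str]:
    """Return the sorted names of regular files in input_dir with a GenBank extension.

    Hypothetical helper: the extension check is case-insensitive, and
    subdirectories are ignored even if their names end in .gb/.gbk.
    """
    return sorted(
        p.name
        for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in GENBANK_SUFFIXES
    )
```

An empty result suggests the GenBank parsing issue mentioned above is not the cause.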

Contributor

bkille commented Jan 5, 2024

Hi @hdesale2408,

This bug has been fixed in Parsnp 2.0. Please let me know if you continue to experience it!

@bkille bkille closed this as completed Jan 5, 2024

vinicius-santos-bmc commented Dec 4, 2024

The bug is still happening in versions >= 2.0:

##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	GCF_000146045.2_R64_genomic.fasta.ref	PE-2.purified.assembly.fasta	2770Lv1.purified.assembly.fasta
NC_001136.10	3320	CGCTCAAACG.GAGGCCATGC	G	A	40	LCB	NA	GT	0	0	1
NC_001136.10	12147	AAGACATTTT.ACCCCGATAC	A	T,C	40	PASS	NA	GT	1	1	2
NC_001136.10	12186	AGCCATCATT.GAAGCCGCTC	G	T,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12187	GCCATCATTG.AAGCCGCTCC	A	G,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12188	CCATCATTGA.AGCCGCTCCG	A	C	40	PASS	NA	GT	0	0	1
NC_001136.10	12192	CATTGAAGCC.GCTCCGAATA	G	C,T	40	PASS	NA	GT	1	2	2
NC_001136.10	12195	TGAAGCCGCT.CCGAATAACA	C	T,A	40	PASS	NA	GT	1	2	2
NC_001136.10	12198	AGCCGCTCCG.AATAACAGAC	A	G	40	PASS	NA	GT	1	0	0
NC_001136.10	12204	TCCGAATAAC.AGACATTTAC	A	C	40	PASS	NA	GT	1	1	0
NC_001136.10	12222	ACGACGGCCT.ATTTTGTCTA	A	T	40	PASS	NA	GT	1	1	0
NC_001136.10	12258	AATAGTGGAG.AAGAAATTCA	A	G	40	PASS	NA	GT	1	1	0
NC_001136.10	12264	GGAGAAGAAA.TTCACTGTAC	T	A,G	40	PASS	NA	GT	1	2	2
NC_001136.10	12288	GACGATCGAA.GTCTCAAACC	G	A,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12309	ATCAGTAAGC.CCAActgatt	C	A	40	PASS	NA	GT	0	0	1
NC_001136.10	12312	AGTAAGCCCA.Actgatttga	A	G	40	PASS	NA	GT	0	0	1
NC_001136.10	12417	AAGGACTTTT.GTTTTGACGG	G	T,A	40	PASS	NA	GT	1	2	2

GCF_000146045.2_R64_genomic.fasta.ref is the reference genome and was not supposed to have SNPs, yet its GT column is 1 (an ALT allele) at many of the sites above.
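To check a Parsnp VCF for this symptom, you can list every record where the reference sample's genotype is not `0`. The sketch below is a minimal, hypothetical helper (`reference_snps` is not a Parsnp function). It assumes haploid, GT-only sample fields like those in the output above, where `0` means "same allele as REF", and it reads the VCF as plain text rather than using a VCF library.

```python
from __future__ import annotations


def reference_snps(vcf_text: str, ref_sample: str) -> list[tuple[str, int, str]]:
    """Return (CHROM, POS, GT) for every record where ref_sample's GT is not 0.

    Assumes haploid calls (GT is 0, 1, 2, ...), where 0 means the sample
    carries the allele in the REF column.
    """
    sample_idx = None
    hits = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The header line names the samples; locate the reference column.
            sample_idx = fields.index(ref_sample)
            continue
        if sample_idx is None:
            raise ValueError("data record found before the #CHROM header line")
        # GT is always the first FORMAT key, so keep the text before any ':'.
        gt = fields[sample_idx].split(":")[0]
        if gt != "0":
            hits.append((fields[0], int(fields[1]), gt))
    return hits
```

Usage (the file name `parsnp.vcf` is an assumption): `with open("parsnp.vcf") as fh: print(reference_snps(fh.read(), "GCF_000146045.2_R64_genomic.fasta.ref"))`. On the records quoted above it would flag positions such as 12147 but not 3320 or 12188, where the reference column is `0`.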
