Troubleshooting creating a SNP distance from Parsnp .ggr file #172

Open
djbradshaw2 opened this issue Jan 17, 2025 · 0 comments

Dear Parsnp developers,

Thank you for the great tool! After running Parsnp, I wanted to create a SNP distance matrix. Based on what I could find in the literature, I used snp-dists to build the matrix from the parsnpLCB.aln file, which I generated with harvesttools from the .ggr file that Parsnp normally outputs. The relevant code and versions are below.

Versions:
parsnp 1.7.4
harvesttools 1.2
snp-dists 0.8.2

Scripts
parsnp -r SX514.polish.fna -d FSIS_assemblies -o parsnp_SX514_Ref --vcf --threads 12
harvesttools -i parsnp.ggr -M parsnpLCB.aln
snp-dists -b parsnpLCB.aln > distances.tab
snp-dists -m parsnpLCB.aln > distances_pw.txt

As a sanity check, for each isolate I counted the VCF positions with a genotype code > 0 (using > 0 rather than = 1 to account for sites with more than one alternate base) and compared that count to the isolate's entry in the reference column of the distance matrix. The two totals disagree for a small number of isolates (41 of 4775). In every one of those cases, the snp-dists count of core SNPs against the reference is exactly one higher than the VCF count.
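
To make the VCF side of that check concrete, here is a minimal Python sketch of the counting step. It assumes the standard VCF layout (nine fixed columns followed by one column per sample) and integer genotype codes. The function name `count_alt_sites` and the toy records are illustrative only; the toy values mirror the VCF example table further down, not real data.

```python
import io


def count_alt_sites(vcf_handle):
    """Return {sample: number of sites whose genotype code is > 0}."""
    samples, counts = [], {}
    for line in vcf_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            counts = {name: 0 for name in samples}
            continue
        for name, call in zip(samples, fields[9:]):
            code = call.split(":")[0]  # genotype is the first sub-field
            # Any non-zero code counts once, whichever alternate allele it is.
            if code.isdigit() and int(code) > 0:
                counts[name] += 1
    return counts


# Toy VCF (hypothetical values copied from the example table in this post).
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "Ref", "Sample_1", "Sample_2", "Sample_3"]
rows = [
    ["chr", "1", ".", "A", "C", ".", "PASS", ".", "GT", "0", "1", "0", "0"],
    ["chr", "2", ".", "A", "C,G", ".", "PASS", ".", "GT", "0", "2", "1", "0"],
    ["chr", "3", ".", "A", "C,G", ".", "PASS", ".", "GT", "0", "0", "1", "2"],
    ["chr", "4", ".", "A", "C", ".", "PASS", ".", "GT", "0", "0", "1", "1"],
    ["chr", "5", ".", "A", "C", ".", "PASS", ".", "GT", "0", "0", "1", "1"],
]
toy_vcf = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

counts = count_alt_sites(io.StringIO(toy_vcf))
print(counts)
```

This version counts every record regardless of its FILTER value, which is what I did for the comparison above.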

What could be causing this difference? It seems that snp-dists may have found an additional SNP (or several) that did not make it into the VCF file. Are SNPs filtered between the .ggr file and the creation of the .vcf file?

Alternatively, is there a way within harvesttools or Parsnp to create a SNP distance matrix directly from the .ggr or .aln file, so that I do not have to use an outside tool? Or from the VCF file?
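
For reference, the calculation I am after can be sketched in a few lines of Python. This is only an illustration, not how Parsnp or snp-dists is implemented. It assumes an aligned multi-FASTA input and, like the default snp-dists behaviour as I understand it, counts a difference only where both bases are A, C, G or T. The function names and the toy sequences are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse aligned FASTA into {name: uppercase sequence}."""
    parts, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID up to the first space
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def snp_distance(a, b, alphabet=frozenset("ACGT")):
    """Count columns where a and b differ and both are unambiguous bases."""
    return sum(1 for x, y in zip(a, b)
               if x != y and x in alphabet and y in alphabet)


def distance_matrix(seqs):
    """Return (names, square matrix of pairwise SNP distances)."""
    names = list(seqs)
    matrix = [[snp_distance(seqs[i], seqs[j]) for j in names] for i in names]
    return names, matrix


# Toy alignment (hypothetical); the N in Sample_3 is ignored in every pair.
toy_aln = ">Ref\nAAAAA\n>Sample_1\nCGAAA\n>Sample_2\nACCCC\n>Sample_3\nAAGCN\n"

names, matrix = distance_matrix(read_fasta(io.StringIO(toy_aln)))
for name, row in zip(names, matrix):
    print(name, *row, sep="\t")
```

The all-pairs loop is quadratic in the number of isolates, so for 4775 genomes a compiled tool would still be preferable. The sketch is mainly useful for checking individual pairs by hand.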

I understand this may be a snp-dists issue, but wanted to ask some Parsnp relevant questions. I hope that is okay.

Example tables are below. Please let me know if you need any additional information.

Thanks for your time and help.

Sincerely,

David

VCF Example

| | Ref | Sample_1 | Sample_2 | Sample_3 |
|---|---|---|---|---|
| Position_1 | 0 | 1 | 0 | 0 |
| Position_2 | 0 | 2 | 1 | 0 |
| Position_3 | 0 | 0 | 1 | 2 |
| Position_4 | 0 | 0 | 1 | 1 |
| Position_5 | 0 | 0 | 1 | 1 |
| Total SNPs vs Ref | 0 | 2 | 4 | 3 |

snp-dists example

| | Ref | Sample_1 | Sample_2 | Sample_3 |
|---|---|---|---|---|
| Ref | 0 | 2 | 4 | 4 |
| Sample_1 | 2 | 0 | 5 | 5 |
| Sample_2 | 4 | 5 | 0 | 4 |
| Sample_3 | 4 | 5 | 4 | 0 |

Comparison example

| | vcf_core_snps_Ref | dist_core_snps_Ref |
|---|---|---|
| Sample_1 | 2 | 2 |
| Sample_2 | 4 | 4 |
| Sample_3 | 3 | 4 |
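
The comparison itself can be sketched as follows. This assumes the tab-separated square matrix written by `snp-dists -b` (blank top-left cell, sample names in the first row and first column). The function names are hypothetical and the numbers are the toy values from the tables above.

```python
import csv
import io


def reference_column(matrix_handle, ref_name):
    """Return {sample: distance to ref_name} from a square TSV matrix."""
    rows = [r for r in csv.reader(matrix_handle, delimiter="\t") if r]
    header = rows[0][1:]          # first cell is the (blank) corner
    col = header.index(ref_name)  # column holding distances to the reference
    return {row[0]: int(row[1 + col]) for row in rows[1:]}


def discordant(vcf_counts, dist_counts):
    """Return {sample: dist - vcf} for samples whose two counts disagree."""
    return {s: dist_counts[s] - n
            for s, n in vcf_counts.items()
            if s in dist_counts and dist_counts[s] != n}


# Toy snp-dists matrix (values from the snp-dists example table).
toy_rows = [
    ["", "Ref", "Sample_1", "Sample_2", "Sample_3"],
    ["Ref", "0", "2", "4", "4"],
    ["Sample_1", "2", "0", "5", "5"],
    ["Sample_2", "4", "5", "0", "4"],
    ["Sample_3", "4", "5", "4", "0"],
]
toy_matrix = "\n".join("\t".join(r) for r in toy_rows) + "\n"

# Per-sample VCF totals (values from the VCF example table).
vcf_counts = {"Ref": 0, "Sample_1": 2, "Sample_2": 4, "Sample_3": 3}

dist_counts = reference_column(io.StringIO(toy_matrix), "Ref")
diffs = discordant(vcf_counts, dist_counts)
print(diffs)
```

In my real data, every discordant isolate shows a difference of +1, as Sample_3 does in this toy case.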