This repository was archived by the owner on Mar 2, 2021. It is now read-only.
svtk bedcluster doesn't cluster events with identical coordinates #99
Open
Description
E.g. for this input:
$ cat input.bed
chr10 105297001 105303000 chr10:105297001-105303000 WHB3855 deletion
chr10 105297001 105302000 chr10:105297001-105302000 WHB3873 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3880 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3882 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3884 deletion
chr10 105297001 105302000 chr10:105297001-105302000 WHB3904 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3934 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3939 deletion
chr10 105297001 105303000 chr10:105297001-105303000 WHB3961 deletion
Running svtk bedcluster gives the following output:
$ svtk bedcluster input.bed output.bed --merge-coordinates --prefix bedcluster --frac 0.8
$ cat output.bed
#chrom start end name svtype sample call_name vaf vac pre_rmsstd post_rmsstd
chr10 105297001 105303000 bedcluster_0 deletion WHB3855 chr10:105297001-105303000 0.111 1 0.000 0.000
chr10 105297001 105302000 bedcluster_1 deletion WHB3873 chr10:105297001-105302000 0.111 1 0.000 0.000
chr10 105297001 105303000 bedcluster_2 deletion WHB3880 chr10:105297001-105303000 0.111 1 0.000 0.000
chr10 105297001 105303000 bedcluster_3 deletion WHB3882 chr10:105297001-105303000 0.111 1 0.000 0.000
chr10 105297001 105303000 bedcluster_4 deletion WHB3884 chr10:105297001-105303000 0.111 1 0.000 0.000
chr10 105297001 105302500 bedcluster_5 deletion WHB3904 chr10:105297001-105302000 0.222 2 500.000 500.000
chr10 105297001 105302500 bedcluster_5 deletion WHB3961 chr10:105297001-105303000 0.222 2 500.000 500.000
chr10 105297001 105303000 bedcluster_6 deletion WHB3934 chr10:105297001-105303000 0.111 1 0.000 0.000
chr10 105297001 105303000 bedcluster_7 deletion WHB3939 chr10:105297001-105303000 0.111 1 0.000 0.000
There are actually six calls that share the same coordinates (chr10:105297001-105303000), but each one is reported with a different bedcluster ID:
$ cat output.bed | cut -f 1-3 | grep -v '^#' | sort | uniq -c | sort -rn
6 chr10 105297001 105303000
2 chr10 105297001 105302500
1 chr10 105297001 105302000
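The same check can be done in Python, which also shows which cluster IDs each set of coordinates was split across. This is a minimal sketch that assumes the tab-separated output layout shown above (chrom, start, end, cluster ID in the first four columns, header line starting with `#`). `split_identical_coordinates` is a hypothetical helper name, not part of svtk.

```python
import csv
from collections import defaultdict


def split_identical_coordinates(lines):
    """Return {(chrom, start, end): sorted cluster IDs} for every set of
    coordinates that was assigned more than one cluster ID.

    Assumes (hypothetically) a bedcluster-style, tab-separated file whose
    first four columns are chrom, start, end, cluster ID, plus a '#' header.
    """
    ids_by_coords = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        chrom, start, end, cluster_id = row[0], int(row[1]), int(row[2]), row[3]
        ids_by_coords[(chrom, start, end)].add(cluster_id)
    # Keep only coordinates that ended up in more than one cluster.
    return {
        coords: sorted(ids)
        for coords, ids in ids_by_coords.items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    with open("output.bed") as fh:
        for coords, ids in split_identical_coordinates(fh).items():
            print(coords, ids)
```

For the output above, this would report chr10:105297001-105303000 as split across six cluster IDs.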
Suggestion: ensure that events with identical coordinates are assigned the same bedcluster ID in the clustered output BED.
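One way to guarantee this, sketched in Python, is to collapse exact duplicates before clustering and then copy each representative's label back to every call that shares its coordinates. This is a sketch under stated assumptions, not svtk's implementation: `cluster_identical_first`, `toy_cluster` and `reciprocal_overlap` are hypothetical names, and `toy_cluster` is a simple single-linkage reciprocal-overlap stand-in for whatever clustering step bedcluster actually uses.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, svtype)


def cluster_identical_first(
    calls: Sequence[Interval],
    cluster_fn: Callable[[List[Interval]], List[int]],
) -> List[int]:
    """Return one cluster label per call; identical intervals always match.

    cluster_fn is any (hypothetical) clustering routine that returns one
    label per *unique* interval it is given.
    """
    # Collapse exact duplicates so each distinct interval is clustered once.
    unique = sorted(set(calls))
    label_of = dict(zip(unique, cluster_fn(unique)))
    # Expand back: every call inherits the label of its interval.
    return [label_of[call] for call in calls]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (0 if chrom/svtype differ)."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def toy_cluster(intervals: List[Interval], frac: float = 0.8) -> List[int]:
    """Toy single-linkage clustering on reciprocal overlap >= frac."""
    labels = list(range(len(intervals)))
    for i in range(len(intervals)):
        for j in range(i):
            if reciprocal_overlap(intervals[i], intervals[j]) >= frac:
                # Merge i's component into j's component.
                old, new = labels[i], labels[j]
                labels = [new if lab == old else lab for lab in labels]
    return labels
```

Because duplicates are removed before the clustering step runs, identical intervals cannot be split into different clusters, whichever clustering routine is plugged in as `cluster_fn`.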
Thanks!