Lets first create directories to organize files.

``` bash
- mkdir -p data benchmark reference output happy
+ mkdir -p input benchmark reference output happy
```

### Download the GRCh38 Reference
@@ -56,8 +56,8 @@ For this case study, we download the chr20 of a HG004 MAS-Seq BAM.
``` bash
HTTPDIR=https://storage.googleapis.com/deepvariant/masseq-case-study

- curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam > data/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam
- curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam.bai > data/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam.bai
+ curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam > input/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam
+ curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam.bai > input/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam.bai
```

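The two download commands differ only in the file suffix, so the long file name can be factored out. The following is a minimal sketch of that idea, not part of the change itself: `PREFIX`, `fetch_input`, and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the sketch assumes the `input` directory already exists.

```bash
#!/usr/bin/env bash
HTTPDIR=https://storage.googleapis.com/deepvariant/masseq-case-study
# Shared part of the file names used in this case study.
PREFIX=HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc

# Hypothetical helper: download one file from HTTPDIR into input/.
# With DRY_RUN=1 it only prints the curl command instead of running it.
fetch_input() {
  local name="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl -L %s/%s -o input/%s\n' "${HTTPDIR}" "${name}" "${name}"
  else
    curl -L "${HTTPDIR}/${name}" -o "input/${name}"
  fi
}

# Preview the commands; drop DRY_RUN=1 to actually download.
DRY_RUN=1 fetch_input "${PREFIX}.chr20.bam"
DRY_RUN=1 fetch_input "${PREFIX}.chr20.bam.bai"
```

Using `-o` instead of a shell redirect keeps the destination next to the URL, which makes a mismatch between the two easier to spot.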
@@ -69,58 +69,42 @@ include regions where the BAM file has 10x or more coverage.
``` bash
HTTPDIR=https://storage.googleapis.com/deepvariant/masseq-case-study

- curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed > data/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed
- ```
-
-
-
-
- ### Download the MAS-Seq model
-
- Finally, lets download the MAS-Seq model that we will use to call variants.
-
- ``` bash
- gsutil cp -R gs://deepvariant/models/DeepVariant/1.8.0/savedmodels/deepvariant.masseq.savedmodel .
+ curl -L ${HTTPDIR}/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed > input/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed
```

### Running DeepVariant MAS-Seq on a CPU-only machine

The command below will run the DeepVariant MAS-Seq model and produce an output
- VCF (`output/out.vcf.gz`).
+ VCF.

``` bash
- BIN_VERSION="head687331500"
+ BIN_VERSION="1.8.0"

sudo docker run \
- -v "$(pwd):$(pwd)" \
- -w $(pwd) \
+ -v "${PWD}/input":"/input" \
+ -v "${PWD}/output":"/output" \
+ -v "${PWD}/reference":"/reference" \
google/deepvariant:"${BIN_VERSION}" \
run_deepvariant \
- --model_type=PACBIO \
- --customized_model=deepvariant.masseq.savedmodel \
- --ref=reference/GRCh38_no_alt_analysis_set.fasta \
- --reads=data/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam \
- --output_vcf=output/HG004.output.vcf.gz \
+ --model_type=MASSEQ \
+ --ref=/reference/GRCh38_no_alt_analysis_set.fasta \
+ --reads=/input/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.chr20.bam \
+ --output_vcf=/output/HG004.output.vcf.gz \
--num_shards=$(nproc) \
--regions=chr20 \
- --make_examples_extra_args="phase_reads=true,sort_by_haplotypes=true,parse_sam_aux_fields=true,realign_reads=false,vsc_min_fraction_indels=0.12,alt_aligned_pileup=diff_channels,trim_reads_for_pileup=true,pileup_image_width=199,min_mapping_quality=1,track_ref_reads=true,partition_size=25000,max_reads_per_partition=0,max_reads_for_dynamic_bases_per_region=1500" \
- --disable_small_model=true \
- --intermediate_results_dir=output/intermediate_results_dir
+ --intermediate_results_dir=/output/intermediate_results_dir
```

**Flag summary**

- * `--model_type` - Sets the model and options, but we will override the model
- with `--customized model`.
+ * `--model_type` - Sets the model and options for MAS-Seq data.
* `--customized_model` - Points to a model trained using MAS-Seq data.
* `--ref` - Specifies the reference sequence.
* `--reads` - Specifies the input bam file.
* `--output_vcf` - Specifies the output variant file.
* `--num_shards` - Sets the number of shards to the number of available
processors (`$(nproc)`). This is used to perform parallelization.
* `--regions` - Restricts to chr20 to make this case study faster.
- * `--make_examples_extra_args=` - Passes additional arguments to
- make_examples.
* `--intermediate_results_dir` - Outputs results to an intermediate directory.
This is optional. If you don't need the intermediate files, no need to
specify this flag.
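
With the per-directory `-v` mounts, every path passed to the tool inside the container is an absolute container path such as `/input/...` rather than a path relative to the working directory. A minimal sketch of that mapping follows, assuming the mounted directories are exactly the ones named in the commands of this change. `container_path` is a hypothetical helper written for illustration; it is not part of DeepVariant or Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a host path relative to the working directory
# into the absolute path seen inside the container, assuming each top-level
# directory is bind-mounted at /<name> as in the docker commands above.
container_path() {
  local rel="$1"
  local top="${rel%%/*}"   # first path component, e.g. "input"
  case "${top}" in
    input|output|reference|benchmark|happy)
      printf '/%s\n' "${rel}"
      ;;
    *)
      # Anything else is not visible inside the container.
      printf 'not mounted: %s\n' "${rel}" >&2
      return 1
      ;;
  esac
}

container_path reference/GRCh38_no_alt_analysis_set.fasta
container_path output/HG004.output.vcf.gz
```

A path under the old `data` directory would be rejected by this helper, which mirrors what happens in practice: a file outside the mounted directories cannot be read by the containerized tool.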
@@ -132,18 +116,21 @@ For running on GPU machines, or using Singularity instead of Docker, see

``` bash
sudo docker run \
- -v $(pwd):$(pwd) \
- -w $(pwd) \
+ -v "${PWD}/benchmark":"/benchmark" \
+ -v "${PWD}/input":"/input" \
+ -v "${PWD}/output":"/output" \
+ -v "${PWD}/reference":"/reference" \
+ -v "${PWD}/happy:/happy" \
jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/hap.py \
- benchmark/HG004_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
- output/HG004.output.vcf.gz \
- -f benchmark/HG004_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
- -r reference/GRCh38_no_alt_analysis_set.fasta \
- -o happy/happy.output \
+ /benchmark/HG004_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
+ /output/HG004.output.vcf.gz \
+ -f /benchmark/HG004_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
+ -r /reference/GRCh38_no_alt_analysis_set.fasta \
+ -o /happy/happy.output \
--engine=vcfeval \
--pass-only \
-l chr20 \
- --target-regions=data/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed \
+ --target-regions=/input/HG004.giab_na24143.hifi_reads.lima.0--0.lima.IsoSeqX_bc01_5p--IsoSeqX_3p.refined.grch38.mm2.splitN.fc.depth.10x.exons.bed \
--threads=$(nproc)
```
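
Because `/happy` is bind-mounted from `${PWD}/happy`, the hap.py results should appear on the host under `happy/`. As a rough sketch of how to pull the headline metrics out of them, the snippet below assumes hap.py wrote a summary file named `happy/happy.output.summary.csv` with `Type`, `Filter`, `METRIC.Recall`, `METRIC.Precision`, and `METRIC.F1_Score` columns. `happy_metrics` is a hypothetical helper, and the column names should be checked against the header of the actual file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print recall, precision, and F1 for the PASS rows of a
# hap.py summary CSV. Columns are located by header name, not by position.
happy_metrics() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["Filter"]) == "PASS" {
      printf "%s\trecall=%s\tprecision=%s\tf1=%s\n",
        $(col["Type"]), $(col["METRIC.Recall"]),
        $(col["METRIC.Precision"]), $(col["METRIC.F1_Score"])
    }
  ' "$1"
}

# Only run when the summary file is present on the host.
if [ -f happy/happy.output.summary.csv ]; then
  happy_metrics happy/happy.output.summary.csv
fi
```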