The genome downsampling workflow consists of two parts, downsampling_part1 and downsampling_part2, and is executed from Downsample.wdl. It uses Samtools, Picard, BWA, GATK, and fastq-tools to randomly downsample sequencing data to a desired coverage.
Downsampling_part1: Converts a CRAM/BAM input file into paired-end FASTQ files and extracts the read groups from the input file.
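Read groups live in the `@RG` lines of the SAM/BAM/CRAM header, which can be printed with `samtools view -H`. The sketch below shows one way to turn that header text into per-read-group tag dictionaries. It is an illustration only: the function name `parse_read_groups` and the sample header are hypothetical, and the workflow itself may extract read groups differently.

```python
from typing import Dict, List


def parse_read_groups(header_text: str) -> List[Dict[str, str]]:
    """Return one TAG -> value dict per @RG line of a SAM header (hypothetical helper)."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated; the first field is the "@RG" record type.
        rg = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only, because values may themselves contain ':'.
            tag, sep, value = field.partition(":")
            if sep:
                rg[tag] = value
        read_groups.append(rg)
    return read_groups


# Hypothetical header text, e.g. as printed by `samtools view -H input.bam`.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:NA12878\tPL:ILLUMINA\n"
    "@RG\tID:rg2\tSM:NA12878\tPL:ILLUMINA\n"
)

for rg in parse_read_groups(header):
    print(rg["ID"], rg["SM"])
```

Keeping each read group's tags (ID, SM, PL, ...) lets them be reattached when the FASTQ files are realigned with BWA.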
Downsampling_part2: Downsamples to a specified final coverage, assuming the input has 30x coverage. It outputs read counts, the downsampled CRAM file, a coverage estimate, and marked duplicate reads.
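Because the input is assumed to be 30x, random downsampling to a target coverage amounts to keeping each read (pair) with probability target / 30, which is the kind of value passed to a tool such as Picard `DownsampleSam` through its `PROBABILITY` argument. A rough coverage estimate is reads × read length / genome size. The sketch below illustrates both calculations; the function names and example numbers are hypothetical, and the workflow may compute these values differently (for example, from alignment metrics rather than raw read counts).

```python
ASSUMED_INITIAL_COVERAGE = 30.0  # the workflow assumes a 30x input


def downsampling_fraction(target_coverage: float,
                          initial_coverage: float = ASSUMED_INITIAL_COVERAGE) -> float:
    """Probability of keeping each read so the expected coverage equals the target."""
    if not 0 < target_coverage <= initial_coverage:
        raise ValueError("target coverage must be in (0, initial coverage]")
    return target_coverage / initial_coverage


def estimate_coverage(read_count: int, read_length: int, genome_size: int) -> float:
    """Approximate mean coverage: total sequenced bases divided by genome size."""
    return read_count * read_length / genome_size


# Downsampling an assumed 30x genome to 15x keeps each read with probability 0.5.
print(downsampling_fraction(15.0))
# 600 million 150 bp reads over a hypothetical 3 Gb genome give about 30x.
print(estimate_coverage(600_000_000, 150, 3_000_000_000))
```

Since each read is kept independently at random, the resulting coverage is close to, but not exactly, the target. If the true input coverage differs from 30x, the output coverage scales proportionally.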
Note: the Dockerfile installs:
- bwa 0.7.17, which differs from the bwa version (0.7.12) used by bwakit, so it does not match the publicly available bwa index files if you use those from bwakit.
- samtools 1.11, which may require regenerating the reference FASTA index (.fai) with samtools faidx
- picard 2.23.8
- fastq-tools 0.8.3