
sph17/down-sampling


Genome Downsampling Workflow

The genome downsampling workflow consists of downsampling_part1 and downsampling_part2 and is executed from Downsample.wdl. The workflow uses Samtools, Picard, BWA, GATK, and fastq-tools to randomly down-sample a genome to a desired sequencing coverage.

Downsampling_part1: Converts a CRAM/BAM input file into paired-end FASTQ files and extracts the read groups from the input file.

Downsampling_part2: Downsamples to a specified final coverage, assuming an initial coverage of 30x. It outputs the read counts, the downsampled CRAM file, a coverage estimate, and the marked duplicate reads.
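
The arithmetic implied by the 30x assumption can be sketched as follows. This is a minimal illustration, not code from the workflow: the function names `downsample_fraction` and `reads_to_keep` are hypothetical, and how Downsample.wdl actually derives the number of reads to sample is not stated here.

```python
# Hypothetical helpers illustrating the coverage arithmetic; they are
# not part of Downsample.wdl.

def downsample_fraction(target_coverage: float,
                        initial_coverage: float = 30.0) -> float:
    """Fraction of reads to keep to go from initial to target coverage."""
    if initial_coverage <= 0:
        raise ValueError("initial_coverage must be positive")
    if not 0 < target_coverage <= initial_coverage:
        raise ValueError("target_coverage must be in (0, initial_coverage]")
    return target_coverage / initial_coverage


def reads_to_keep(total_read_pairs: int,
                  target_coverage: float,
                  initial_coverage: float = 30.0) -> int:
    """Approximate number of read pairs to retain after downsampling."""
    fraction = downsample_fraction(target_coverage, initial_coverage)
    return round(total_read_pairs * fraction)


if __name__ == "__main__":
    # Going from the assumed 30x down to 15x keeps half of the read pairs.
    print(downsample_fraction(15))
    print(reads_to_keep(1000, 15))
```

If the input is not actually at 30x, the resulting coverage scales accordingly, which is why the coverage estimate in the output is worth checking.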

Note: The Dockerfile installs:

  • bwa 0.7.17, which differs from the bwakit version (0.7.12) associated with the publicly available bwa internal index files.
  • samtools 1.11, which may require regenerating the faidx index
  • picard 2.23.8
  • fastq-tools 0.8.3

