mmterpstra/Bestie

Bestie

WDL-based HTS analysis for a Slurm cluster with environment modules. More scalable than ever before: it sets discrete CPU, memory, runtime and disk space requirements based on the input files for maximum* scheduling efficiency.

*: Always WIP due to lfs stability and edge cases.

Goals:

  • Get generic alignment working.
  • Basic variant calling (haplotypecallerGvcf).
  • Somatic variant calling (MuTect2,...).
  • ichorCNA integration.
  • Variant annotation
  • Functional filtering of vcfs
  • Stability over > 10 samples
  • Rework dir structure to be more in line with warp/other public resources
  • End to end easybuild install

Relevant example resources for writing WDL files:

WARP, BioWDL

How to use

Run womtool, the Cromwell validation tool, to validate the input and generate a template input file. For sampleJson, see ./tests/data/raw/fastq/samples.json. The simplest run example is:

 java -Xmx8g -Dconfig.file=./path/to/cromwell.conf -jar ./path/to/cromwell.jar run Bestie.wdl -i inputs_integration.json
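The validation and template steps mentioned above could look roughly like the sketch below. The `validate` and `inputs` subcommands are womtool's own; the jar location (`WOMTOOL_JAR`) and the `run` dry-run helper are assumptions made for illustration and are not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch only: WOMTOOL_JAR is an assumed location, adjust it to your install.
WOMTOOL_JAR="${WOMTOOL_JAR:-./path/to/womtool.jar}"

# Hypothetical helper: print the command by default,
# execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Check the workflow on its own, then together with an inputs file.
run java -jar "$WOMTOOL_JAR" validate Bestie.wdl
run java -jar "$WOMTOOL_JAR" validate --inputs inputs_integration.json Bestie.wdl

# Print a template inputs JSON for the workflow; redirect the output
# to a file of your choice when running for real.
run java -jar "$WOMTOOL_JAR" inputs Bestie.wdl
```

By default this only prints the three commands, which makes it safe to paste; set `DRY_RUN=0` once the jar path is correct.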

Prepping a samplesheet/JSON for input

This first part makes a samplesheet (samplesheet.csv) with one readgroup per line that can be edited as needed:

ls /path/to/raw/fastq/*_R1.fastq.gz | perl -wne 'BEGIN{print "fq1,fq2,sampleName\n"};chomp;print $_;s/_R1./_R2./g; print ",$_"; s/.*\/([\w\d]*)_.*/$1/g; print ",$_" ;print "\n"'| perl SampleSheetTool.pl reformatmin /dev/stdin > samplesheet.csv

This converts the samplesheet to a JSON file, merging readgroups into samples as needed (some text alignment issues):

perl SampleSheetTool.pl jsondump samplesheet.csv > samplesheet.json

The next command sets up the environment in /path/to/workflow/output_folder/ by copying all the needed files and linking the needed raw data folders, and then runs cromwell.jar to execute the workflow.

  • This needs '-d /path/to/folder/containing/data/', which is usually the 'apps/' folder for me.
  • This results in outputs below '/path/to/workflow/output_folder/' containing all the (fixed-up) files needed for running the analysis.

Example for fastq folders (based on `tests/integration/run_local.sh`):
(
    set -ex
    bash tests/run_project.sh \
        -i "$PWD/tests/integration/json/fastqToVariants/inputs_local.json" \
        -s samplesheet.json \
        -w "$PWD" \
        -r /path/to/workflow/output_folder/ \
        -f /path/to/raw/fastq/ \
        -d /path/to/folder/containing/data/
)
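Since the run copies and links files before starting Cromwell, it can help to confirm that the input paths exist first. The following is a minimal sketch of such a check; `check_paths` is a hypothetical helper and is not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: report every path that does not exist
# and return non-zero if at least one is missing.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (paths are the placeholders from the command above):
# check_paths samplesheet.json /path/to/raw/fastq/ /path/to/folder/containing/data/ \
#     && bash tests/run_project.sh ...
```

Checking all paths before returning, rather than stopping at the first failure, lists every missing input in one go.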

How to install

Use EasyBuild to install the required modules, or use the future wrapper module.
