mpestle/rakeiora_public_somatic


Somatic Workflow - kinghorn1 dataset version

Rakeiora


Somatic workflow - tumour vs normal.


This workflow combines Ben Curran's original somatic workflow (https://github.com/nesi/rakeiora-public-somatic) with Peter Tsai's scripts for somatic analysis.

This workflow pipes the output of the mpileup step directly into the varscan step, which eliminates the large amount of interim storage on disk that Ben's workflow required. Parallelism is achieved by partitioning the mpileup and varscan steps by chromosome (1-22, X, Y, M), so each chromosome is processed as an independent job.
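The partitioning and the pipe described above can be sketched in Python, the language Snakemake rules are embedded in. This is an illustration only, not the repository's Snakefile: the `chr` prefix on region names, the output file names, the helper names `chromosomes` and `piped_command`, and the exact VarScan arguments are all assumptions and should be checked against the actual rules.

```python
import shlex

# Paths taken from the README text; adjust if your environment differs.
REFERENCE = "/shared/reference/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna"
VARSCAN_JAR = "/shared/lib/VarScan.v2.4.6.jar"


def chromosomes(prefix="chr"):
    """Return the 25 partitions named in the text: 1-22, X, Y and M.

    The "chr" prefix is an assumption about how the reference names contigs.
    """
    names = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]
    return [prefix + name for name in names]


def piped_command(region, normal="resources/normal.bam", tumour="resources/tumour.bam"):
    """Build one shell pipeline for a single region.

    mpileup writes to stdout and VarScan reads from stdin, so the pileup
    for the region is never written to disk. The VarScan options and the
    output names are assumed, not copied from the Snakefile.
    """
    mpileup = ["samtools", "mpileup", "-f", REFERENCE, "-r", region, normal, tumour]
    varscan = [
        "java", "-jar", VARSCAN_JAR, "somatic",
        "--mpileup", "1",
        "--output-snp", f"results/{region}.snp",
        "--output-indel", f"results/{region}.indel",
    ]
    return shlex.join(mpileup) + " | " + shlex.join(varscan)
```

Calling `piped_command(region)` for each entry of `chromosomes()` yields 25 independent pipelines, which is what lets snakemake schedule them across the available cores.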

In the Rakeiora environment, on a VM with 16 CPUs and 32 GB of RAM, the workflow runs in about 3 hours when snakemake is given 16 cores.

Before running, set and export the SINGULARITY_BIND environment variable in the calling environment. It must include the /shared resources and the attached volume that holds the dataset. Once the volume is attached:

export SINGULARITY_BIND=/shared,/rv/kinghorn1

Then run:

snakemake -c16 --use-singularity
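The two commands above can also be wrapped in a small Python launcher that checks the bind paths before starting the run. This is a minimal sketch under stated assumptions: the helper names `bind_value`, `missing_paths` and `run_workflow` are hypothetical and not part of the repository, and the default paths are simply the ones shown above.

```python
import os
import subprocess


def bind_value(paths):
    """Join bind paths with commas, the format SINGULARITY_BIND expects."""
    return ",".join(paths)


def missing_paths(paths):
    """Return the bind paths that do not exist as directories on this host."""
    return [p for p in paths if not os.path.isdir(p)]


def run_workflow(paths=("/shared", "/rv/kinghorn1"), cores=16):
    """Pass SINGULARITY_BIND to the child process and start snakemake."""
    absent = missing_paths(paths)
    if absent:
        # Fail early, e.g. when the dataset volume has not been attached yet.
        raise FileNotFoundError(f"bind paths not found: {absent}")
    env = dict(os.environ, SINGULARITY_BIND=bind_value(paths))
    return subprocess.run(
        ["snakemake", f"-c{cores}", "--use-singularity"],
        env=env,
        check=True,
    )
```

Calling `run_workflow()` from the workflow directory is then equivalent to the export and snakemake commands, with the added check that the volume is actually attached.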

The workflow uses a Singularity container built specifically for it, which contains:

  • SAMTOOLS_VERSION=1.19
  • BCFTOOLS_VERSION=1.19
  • VCFANNO_VERSION=v0.3.6
  • Java 20

See singularity.def for the container definition.

To build the container:

singularity build --fakeroot rakeiora-apps-20250730.sif singularity.def

Two libraries are referenced from the /shared/lib area, VarScan.v2.4.6.jar and picard.jar, although picard is not actually used. The workflow also relies on a shared reference file in /shared/reference/, GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna

The workflow expects two input files, named tumour.bam and normal.bam, in the resources directory. The input definitions in the Rakeiora sandbox should be set up accordingly, presumably with the tumour sample related to (or dependent on) the previously selected normal one.
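A quick pre-flight check for these inputs might look like the following. It is a sketch only: the helper name `missing_inputs` is hypothetical, and the relative `resources` path assumes the check is run from the workflow directory.

```python
from pathlib import Path

# File names the workflow expects, as stated above.
EXPECTED_INPUTS = ("tumour.bam", "normal.bam")


def missing_inputs(resources_dir="resources"):
    """Return the expected BAM file names that are absent from resources_dir."""
    base = Path(resources_dir)
    return [name for name in EXPECTED_INPUTS if not (base / name).is_file()]
```

An empty list from `missing_inputs()` means both BAM files are in place; anything else names the files that still need to be provided.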

Note that the workflow writes a number of VCF files to the results directory, and the contents of that directory will be copied to the JupyterHub area when a researcher runs a workflow in Rakeiora. Before this is run on real data, the kaitiaki of the data and the workflow reviewers need to approve that.

NeSI does not claim ownership or authorship of this workflow. Authorship credits will follow later.

About

Rakeiora public workflow #2. Somatic - tumour vs normal
