SomaticWrapper is a fully automated and modular pipeline for detecting somatic variants from paired tumor–normal WGS/WXS data on the LSF compute1 cluster (WashU).
It integrates multiple industry-standard variant callers — Strelka2, VarScan2, Mutect1, and Pindel — and produces comprehensive, annotated mutation calls in MAF format.
- SNV calls: variants supported by at least 2 of 3 callers (Strelka2, Mutect1, VarScan2)
- Indel calls: variants supported by at least 2 of 3 callers (Strelka2, VarScan2, Pindel)
- Reference genome: Human GRCh38 (HG38)
- Scheduler: LSF (supports job dependencies and groups)
Final output files:
- `dnp.annotated.maf` → all variants
- `dnp.annotated.coding.maf` → coding variants only
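
The 2-of-3 consensus rule for SNVs and indels can be illustrated with a small shell sketch. This is not SomaticWrapper's actual merge code: the helper name `two_of_three` and the one-key-per-line input format (`chrom:pos:ref:alt`) are assumptions made purely for illustration.

```shell
# Hypothetical helper (not part of SomaticWrapper): print variant keys that
# appear in at least two of three per-caller key files.
# Assumed input format: one key per line, e.g. chr1:12345:A:T
two_of_three() {
    for keyfile in "$1" "$2" "$3"; do
        # De-duplicate within each caller so a caller cannot vote twice.
        sort -u "$keyfile"
    done | sort | uniq -c | awk '$1 >= 2 { print $2 }'
}
```

Calling `two_of_three strelka.keys mutect.keys varscan.keys` (hypothetical file names) would print only the keys reported by two or more callers, in sorted order.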

Recent changes:
- Added Step 0: automatically submits the full pipeline (Steps 1 → 11) with job dependencies (`j2` waits for `j1`, etc.).
- Added Step 22: automatically submits Steps 2 → 11 with job dependencies (`j3` waits for `j2`, etc.).
- Added Step 23: automatically submits Steps 3 → 11 with job dependencies.
- Added Step 24: automatically submits Steps 4 → 11 with job dependencies.
- Added Step 25: automatically submits Steps 5 → 11 with job dependencies.
- Added Step 26: automatically submits Steps 6 → 11 with job dependencies.
- Added Step 27: automatically submits Steps 7 → 11 with job dependencies.
- Added Step 28: automatically submits Steps 8 → 11 with job dependencies.
- Added Step 29: automatically submits Steps 9 → 11 with job dependencies.
- Added Step 30: automatically submits Steps 10 → 11 with job dependencies.

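
The dependency chaining described above corresponds to LSF's `bsub -w "done(<job>)"` condition. The sketch below only prints the commands it would submit, so it is safe to run anywhere. The job names `j1` through `j11` follow the naming used above; the function `print_chain` and the `<command for step N>` placeholders are hypothetical and do not reflect the wrapper's real job scripts.

```shell
# Dry-run sketch: print a bsub chain in which each job waits for the previous one.
# $1 = first step to submit; the chain always ends at step 11.
print_chain() {
    chain_i=$1
    while [ "$chain_i" -le 11 ]; do
        if [ "$chain_i" -eq "$1" ]; then
            # The first job in the chain has no dependency.
            echo "bsub -J j$chain_i <command for step $chain_i>"
        else
            # Later jobs start only after the previous job finishes successfully.
            echo "bsub -J j$chain_i -w \"done(j$((chain_i - 1)))\" <command for step $chain_i>"
        fi
        chain_i=$((chain_i + 1))
    done
}

print_chain 9
```

With a starting step of 9 this prints three lines, the last two carrying `-w "done(j9)"` and `-w "done(j10)"` respectively.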
Before running, update your ~/.bashrc to include the necessary environment variables:
export PATH=/storage1/fs1/songcao/Active/Software/anaconda3/bin:$PATH
export STORAGE1=/storage1/fs1/songcao/Active
export STORAGE2=/storage1/fs1/dinglab/Active
export STORAGE3=/storage1/fs1/m.wyczalkowski/Active
export LSF_DOCKER_VOLUMES="$STORAGE1:$STORAGE1 $STORAGE2:$STORAGE2 $STORAGE3:$STORAGE3"

Then activate:
source ~/.bashrc

Clone the repository:
git clone https://github.com/YourGitRepo/somaticwrapper.git
cd somaticwrapper

Example:
mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025
mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log

Use `--step 0` to run Steps 1–11 sequentially with built-in job dependencies:
perl somaticwrapper.pl --step 0 --rdir /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025 --log /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log --ref /storage1/fs1/songcao/Active/Database/hg38_database/GRCh38.d1.vd1/GRCh38.d1.vd1.fa --smg /storage1/fs1/songcao/Active/Database/SMG/smg_list.txt --groupname example_run_somatic_2025 --users scao --wgs 0 --srg 1 --sre 0 --exonic 1 --q long --mincovt 14 --mincovn 8 --minvaf 0.05 --maxindsize 100

To run a single step, pass its number to `--step` (Step 5 shown here):
perl somaticwrapper.pl --step 5 --rdir <run_dir> --log <log_dir> ...

| Step | Description |
|---|---|
| 0 | Submit steps (1–11) automatically with dependencies |
| 1 | Run Strelka2 |
| 2 | Run VarScan2 |
| 3 | Run Pindel |
| 4 | Run Mutect1 |
| 5 | Parse Mutect results |
| 6 | Parse Strelka2 results |
| 7 | Parse VarScan2 results |
| 8 | Parse Pindel results |
| 9 | QC VCF files |
| 10 | Merge VCF files |
| 11 | Generate MAF files |
| 12 | Merge run-level MAF |
| 13 | DNP annotation |
| 14 | Clean unnecessary intermediate files |
| 22 | Submit steps (2–11) automatically with dependencies |
| 23 | Submit steps (3–11) automatically with dependencies |
| 24 | Submit steps (4–11) automatically with dependencies |
| 25 | Submit steps (5–11) automatically with dependencies |
| 26 | Submit steps (6–11) automatically with dependencies |
| 27 | Submit steps (7–11) automatically with dependencies |
| 28 | Submit steps (8–11) automatically with dependencies |
| 29 | Submit steps (9–11) automatically with dependencies |
| 30 | Submit steps (10–11) automatically with dependencies |
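
Reading the table, launcher values 22 through 30 start the chain at step N minus 20, step 0 starts at step 1, and values 1 through 14 run a single step. A minimal sketch of that mapping follows; `steps_for` is a hypothetical helper written for illustration, and the wrapper's internal logic may differ.

```shell
# Hypothetical helper: given a --step value, print the step or step range it runs.
# Launcher values (0, 22-30) expand to "<first>-11"; plain steps 1-14 run alone.
steps_for() {
    case "$1" in
        0) echo "1-11" ;;
        2[2-9]|30) echo "$(( $1 - 20 ))-11" ;;
        [1-9]|1[0-4]) echo "$1" ;;
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}
```

For example, `steps_for 25` prints `5-11`, matching the table row for Step 25.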

| Parameter | Description |
|---|---|
| `--rdir` | Full path to run directory containing per-sample folders |
| `--log` | Path for log output (usually parent of rdir) |
| `--srg` | BAM has read groups (1 = yes, 0 = no) |
| `--sre` | Rerun and overwrite results (1 = yes, 0 = no) |
| `--wgs` | 1 = WGS, 0 = WXS |
| `--groupname` | Job group name |
| `--users` | LSF user account (used in job group path) |
| `--ref` | HG38 reference FASTA |
| `--smg` | SMG gene list file |
| `--q` | LSF queue (research-hpc, ding-lab, or long) |
| `--mincovt` | Minimum tumor coverage (≥ 14) |
| `--mincovn` | Minimum normal coverage (≥ 8) |
| `--minvaf` | Minimum variant allele frequency (≥ 0.05) |
| `--maxindsize` | Maximum indel size (≤ 100) |
| `--exonic` | Output exonic region (1 = yes, 0 = no) |

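
Because several parameters accept only 0/1 flags or numeric thresholds, a quick preflight check can catch typos before any job is submitted. The helpers below (`is_flag`, `is_uint`, `is_fraction`) are hypothetical and not part of SomaticWrapper; whether the wrapper validates its own arguments is not stated here, so treat this as an optional sketch.

```shell
# Hypothetical preflight helpers (not part of SomaticWrapper).

# Succeeds only if the value is 0 or 1, as expected by --srg, --sre, --wgs, --exonic.
is_flag() {
    case "$1" in
        0|1) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds only if the value is a non-negative integer
# (--mincovt, --mincovn, --maxindsize).
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeeds only if the value is a decimal number between 0 and 1 inclusive (--minvaf).
is_fraction() {
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v + 0 >= 0 && v + 0 <= 1) }'
}
```

A typical use would be `is_flag "$wgs" || echo "--wgs must be 0 or 1" >&2`, where `$wgs` is whatever variable holds the intended value.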
run_dir/
├── <sample_name>/
│ ├── strelka/
│ ├── varscan/
│ ├── pindel/
│ ├── mutect1/
│ ├── merged.withmutect.vcf
│ ├── <sample>.withmutect.maf
│ └── <sample>.dnp.annotated.maf
└── log/
├── LSF_DIR_SOMATIC/
└── tmpsomatic/
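
Given the layout above, one way to see which samples have not yet produced a final MAF is to scan the run directory. The sketch below assumes the final file is named `<sample>.dnp.annotated.maf` inside each sample folder, as shown in the tree, and that `log/` sits alongside the sample folders; `missing_final_maf` is a hypothetical helper, not a pipeline command.

```shell
# Hypothetical helper: list sample folders under a run directory that do not yet
# contain <sample>.dnp.annotated.maf. The log/ folder is skipped.
missing_final_maf() {
    for sample_dir in "$1"/*/; do
        # Skip the unexpanded glob pattern when the run directory has no subfolders.
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        [ "$sample" = "log" ] && continue
        [ -f "${sample_dir}${sample}.dnp.annotated.maf" ] || echo "$sample"
    done
}
```

Running `missing_final_maf <run_dir>` prints one sample name per line for every folder still lacking its annotated MAF.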
Author: Song Cao
Email: [email protected]
Washington University in St. Louis