Description of the bug
I tried running sarek (3.3.2) on 47 germline WES samples with `--joint_germline`, starting from `--step variant_calling`, to get a joint gVCF. I'm passing the Agilent target-region BED file with `--intervals`. The GATK `GenomicsDBImport` process runs for several days (before getting killed) and generates millions of files occupying terabytes of data (see https://nfcore.slack.com/archives/CGFUX04HZ/p1736549603967039). It does, however, log a helpful suggestion:
```
05:58:53.313 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. If GVCF data only exists within those intervals, performance can be improved by aggregating intervals with the merge-input-intervals argument.
```
And indeed, adding `--merge-input-intervals` to the process's `ext.args` via a config file solves the issue.
This should be done automatically whenever the pipeline is run with `--wes` (which implies a large number of intervals).
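Until that happens, the manual workaround can be written as a small custom config passed to `nextflow run` with `-c`. A minimal sketch, assuming the selector `GATK4_GENOMICSDBIMPORT` (the process name from the work directory mentioned below) matches the GenomicsDBImport step in your sarek version; the file name `merge_intervals.config` is hypothetical. Setting `ext.args` this way replaces any `ext.args` that sarek itself defines for that process.

```groovy
// merge_intervals.config (hypothetical name); use with: nextflow run ... -c merge_intervals.config
process {
    // Selector assumed to match sarek's GenomicsDBImport process
    withName: 'GATK4_GENOMICSDBIMPORT' {
        // Extra arguments appended to the `gatk GenomicsDBImport` command line:
        // import all data between the listed intervals instead of handling
        // every BED target separately, as the GATK warning above suggests
        ext.args = '--merge-input-intervals'
    }
}
```

Per the warning text, this is only appropriate when the gVCF data exists within the targets, as it does here because the variant calling was already restricted with `--intervals`.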
Command used and terminal output
```bash
nextflow run "$pipelinedir" -profile curc_alpine -ansi-log false \
    --step variant_calling --wes --genome GATK.GRCh38 --input "$samplefile" \
    --intervals "$scrproj/targets.bed" --outdir "$pipeoutdir" \
    --tools haplotypecaller,vep --joint_germline
```
(Runs for days and generates millions of files in the work dir of `GATK4_GENOMICSDBIMPORT`.)
Relevant files
No response
System information
Nextflow 23.04.1
HPC cluster with Red Hat 8.10, SLURM & Apptainer (run as Singularity)
nf-core/sarek 3.3.2
## PR checklist
- [x] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/sarek/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on the nf-core/sarek _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
- [x] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [x] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
Running sarek with `--joint_germline` on WES samples with an intervals
file containing many thousands of targets causes GATK `GenomicsDBImport`
to create millions of files and run for several days without completing.
Adding the `--merge-input-intervals` option to that process fixes this.
This PR adds the option automatically whenever the pipeline is run with
`--wes`.
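The conditional described above could look roughly like the following in the pipeline's module config. This is a sketch, not necessarily the exact diff merged in this PR; it assumes the same `GATK4_GENOMICSDBIMPORT` selector and the existing `params.wes` flag.

```groovy
process {
    withName: 'GATK4_GENOMICSDBIMPORT' {
        // Wrapped in a closure, as nf-core module configs usually do, so the
        // value is resolved when each task is created from the run's params
        // --wes: many small targets, so merge them; otherwise add no extra args
        ext.args = { params.wes ? '--merge-input-intervals' : '' }
    }
}
```

Returning an empty string in the non-WES case leaves the command line unchanged, since nf-core modules typically read the value as `task.ext.args ?: ''`.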
Closes #1776
---------
Co-authored-by: Thomas <[email protected]>
Co-authored-by: Friederike Hanssen <[email protected]>