Releases: openpipelines-bio/openpipeline
openpipelines 0.7.0
MAJOR CHANGES
- Removed the bin folder. As of viash 0.6.4, a _viash.yaml file can be included in the root of a repository to set common viash options for the project. These options were previously covered by the bin/init script, but this new viash feature makes the script unnecessary. viash and nextflow should now be installed in a directory that is included in your $PATH.
MINOR CHANGES
- filter/do_filter: raise an error instead of printing a warning when a column provided for var_filter or obs_filter does not exist.
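The stricter behaviour described for filter/do_filter can be illustrated with a minimal Python sketch. The function name, its signature and the use of a plain list in place of the .obs or .var column index are all assumptions made for illustration; this is not the component's actual code.

```python
def resolve_filter_columns(available_columns, requested_columns):
    """Return the requested columns, or raise if any of them is missing.

    `available_columns` stands in for the column names of .obs or .var.
    The function name and signature are hypothetical.
    """
    missing = [name for name in requested_columns if name not in available_columns]
    if missing:
        # Fail loudly instead of warning and silently skipping the filter.
        raise ValueError(f"Filter column(s) not found: {', '.join(missing)}")
    return list(requested_columns)


var_columns = ["gene_symbol", "filter_with_counts"]
print(resolve_filter_columns(var_columns, ["filter_with_counts"]))
```

Raising early keeps a misspelled column name from producing an unfiltered result that looks valid.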
BUG FIXES
- workflows/full_pipeline: Fix setting .var output column for filter_with_hvg.
- Fix running mapping/cellranger_multi without passing all references.
- filter/filter_with_scrublet: now sets use_approx_neighbors to False to avoid using annoy, because it fails on processors that are missing the AVX-512 instruction sets.
- workflows: Updated WorkflowHelper to a newer version that allows applying defaults when calling a subworkflow from another workflow.
- Several components: pin matplotlib to <3.7 to fix scanpy compatibility (see scverse/scanpy#2411).
- workflows: fix a bug where running a subworkflow from a workflow would cause the parent config to be read instead of the subworkflow config.
- correction/cellbender_remove_background: Fix description of input for cellbender_remove_background.
- filter/do_filter: resolved an issue where the .obs column instead of the .var column was being logged when filtering using the .var column.
- workflows/rna_singlesample and workflows/prot_singlesample: Correctly set var and obs columns while filtering with counts.
- filter/do_filter: removed the default input value for the var_filter argument.
- workflows/full_pipeline and workflows/integration: fix PCA not using highly variable genes filter.
openpipelines 0.6.2
NEW FUNCTIONALITY
- workflows/full_pipeline: added filter_with_hvg_obs_batch_key argument for batched detection of highly variable genes.
- workflows/rna_multisample: added filter_with_hvg_obs_batch_key, filter_with_hvg_flavor and filter_with_hvg_n_top_genes arguments.
- qc/calculate_qc_metrics: Add basic statistics: pct_dropout, num_zero_obs, obs_mean and total_counts are added to .var. num_nonzero_vars, pct_{var_qc_metrics}, total_counts_{var_qc_metrics}, pct_of_counts_in_top_{top_n_vars}_vars and total_counts are included in .obs.
- workflows/multiomics/rna_multisample and workflows/multiomics/full_pipeline: add qc/calculate_qc_metrics component to workflow.
- workflows/multiomics/prot_singlesample: Processing unimodal single-sample CITE-seq data.
- workflows/multiomics/rna_singlesample and workflows/multiomics/full_pipeline: Add filtering arguments to pipeline.
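As a rough illustration of the per-variable statistics that qc/calculate_qc_metrics adds to .var, the sketch below computes them from a small dense count matrix. It assumes the names mean what they suggest (pct_dropout as the percentage of observations with a zero count, num_zero_obs as the number of such observations, obs_mean as the mean count over observations); the exact definitions used by the component may differ, and the function name is hypothetical.

```python
def var_qc_metrics(counts):
    """Compute per-variable statistics from a dense obs x var count matrix.

    `counts` is a list of rows, one row per observation.
    Returns one dict per variable (column). Hypothetical helper.
    """
    n_obs = len(counts)
    metrics = []
    for column in zip(*counts):  # iterate over variables (columns)
        num_zero_obs = sum(1 for value in column if value == 0)
        total_counts = sum(column)
        metrics.append({
            # Assumption: expressed as a percentage (0-100), not a fraction.
            "pct_dropout": 100.0 * num_zero_obs / n_obs,
            "num_zero_obs": num_zero_obs,
            "obs_mean": total_counts / n_obs,
            "total_counts": total_counts,
        })
    return metrics


counts = [
    [0, 2, 1],
    [0, 0, 3],
    [4, 0, 0],
    [0, 2, 0],
]
for index, stats in enumerate(var_qc_metrics(counts)):
    print(index, stats)
```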
MINOR CHANGES
- convert/from_bdrhap_to_h5mu: bump R version to 4.2.
- process_10xh5/filter_10xh5: bump R version to 4.2.
- dataflow/concat: include path of file in error message when reading a mudata file fails.
- mapping/cellranger_multi: write cellranger console output to a cellranger_multi.log file.
BUG FIXES
- mapping/htseq_count_to_h5mu: Fix a bug where reading in the gtf file caused AttributeError.
- dataflow/concat: the --input_id is no longer required when --mode is not move.
- filter/filter_with_hvg: no longer tries to use --varm_name to set non-existent metadata when running with --flavor seurat_v3, which was causing KeyError.
- filter/filter_with_hvg: Enforce that n_top_genes is set when flavor is set to 'seurat_v3'.
- filter/filter_with_hvg: Improve error message when trying to use 'cell_ranger' as flavor and passing unfiltered data.
- mapping/cellranger_multi now applies gex_chemistry, gex_secondary_analysis, gex_generate_bam, gex_include_introns and gex_expect_cells.
openpipeline 0.6.1
BUG FIXES
- src/filter/filter_with_counts: Fix an issue where mitochondrial genes were being detected in .var_names, which contain Ensembl IDs instead of gene symbols in the pipelines. The solution was to create a --var_gene_names argument, which allows selecting a .var column to check using a regex (--mitochondrial_gene_regex).
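The problem fixed here can be shown with a small Python sketch: a symbol-based regex finds nothing when applied to Ensembl IDs, but works once it is applied to a column of gene symbols. The function name and the regex default below are assumptions for illustration, and plain lists stand in for .var_names and the .var column.

```python
import re


def flag_mitochondrial(gene_names, pattern=r"^[mM][tT]-"):
    """Return one boolean per gene name: True when the name matches `pattern`.

    Hypothetical helper; the default pattern is an assumed example value.
    """
    regex = re.compile(pattern)
    return [bool(regex.match(name)) for name in gene_names]


# Stand-ins for .var_names (Ensembl IDs) and a .var column holding gene symbols.
var_names = ["ENSG00000198888", "ENSG00000141510"]
gene_symbols = ["MT-ND1", "TP53"]

print(flag_mitochondrial(var_names))     # no symbol prefix to match on
print(flag_mitochondrial(gene_symbols))  # matches the mitochondrial symbol
```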
openpipeline 0.6.0
NEW FUNCTIONALITY
- workflows/full_pipeline: add filter_with_hvg_var_output argument.
- dimred/pca: Add --overwrite and --var_input arguments.
- src/transform/clr: Perform CLR normalization on CITE-seq data.
- workflows/ingestion/cellranger_multi: Run Cell Ranger multi and convert the output to .h5mu.
- filter/remove_modality: Remove a single modality from a MuData file.
- mapping/star_align: Align .fastq files using STAR.
- mapping/star_align_v273a: Align .fastq files using STAR v2.7.3a.
- mapping/star_build_reference: Create a STAR reference index.
- mapping/cellranger_multi: Align fastq files using Cell Ranger multi.
- mapping/samtools_sort: Sort and (optionally) index alignments.
- mapping/htseq_count: Quantify gene expression for subsequent testing for differential expression.
- mapping/htseq_count_to_h5mu: Convert one or more HTSeq outputs to a MuData file.
- Added convert/from_cellranger_multi_to_h5mu component.
MAJOR CHANGES
- convert/from_velocyto_to_h5mu: Moved to velocity/velocyto_to_h5mu. It also now accepts an optional --input_h5mu argument, to allow directly reading the RNA velocity data into a .h5mu file containing the other modalities.
- resources_test/cellranger_tiny_fastq: Include RNA velocity computations as part of the script.
- mapping/cellranger_mkfastq: remove --memory and --cpu arguments, as resource management is automatically provided by viash.
MINOR CHANGES
- Several components: use gzip compression for writing .h5mu files.
- Default value for obs_covariates argument of full pipeline is now sample_id.
- Set the tag directive of all Nextflow components to '$id'.
BUG FIXES
- Keep data for modalities that are not specifically enabled when running full pipeline.
- Fix many components thanks to Viash 0.6.4, which causes errors to be thrown when input and output files are defined but not found.
openpipeline 0.5.1
BREAKING CHANGES
- reference/make_reference: Input files changed from type: string to type: file to allow Nextflow to cache the input files fetched from URL.
- several components (except from_h5ad_to_h5mu): the --modality arguments no longer accept multiple values.
- Remove outdated resources_test_scripts.
- convert/from_h5mu_to_seurat: Disabled because MuDataSeurat is currently broken, see https://github.com/PMBio/MuDataSeurat/issues/9.
- integrate/harmony: Disabled because it is currently not functioning and the alternative, harmonypy, is used in the workflows.
- dataflow/concat: Renamed --sample_names to --input_id and moved the ability to add sample id and to join the sample ids with the observation names to metadata/add_id.
- Moved dataflow/concat, dataflow/merge and dataflow/split_modalities to a new namespace: dataflow.
- Moved workflows/conversion/conversion to workflows/ingestion/conversion.
NEW FUNCTIONALITY
- metadata/add_id: Add an id to a column in .obs. Also allows joining the id to the .obs_names.
- workflows/ingestion/make_reference: A generic component to build a transcriptomics reference into one of many formats.
- integrate/scvi: Performs scvi integration.
- integrate/add_metadata: Add a csv containing metadata to the .obs or .var field of a mudata file.
- DataFlowHelper.nf: Added passthroughMap. Usage:

      include { passthroughMap as pmap } from "./DataFlowHelper.nf"

      workflow {
        Channel.fromList([["id", [input: "foo"], "passthrough"]])
          | pmap{ id, data ->
            [id, data + [arg: 10]]
          }
      }

  Note that in the example above, using a regular map would result in an exception being thrown, that is, "Invalid method invocation call with arguments". The equivalent using a regular map() would be:

      workflow {
        Channel.fromList([["id", [input: "foo"], "passthrough"]])
          | map{ tup ->
            def (id, data) = tup
            [id, data + [arg: 10]] + tup.drop(2)
          }
      }

- correction/cellbender_remove_background: Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.
- workflows/ingestion/cellranger_postprocessing: Add post-processing of h5mu files created from Cell Ranger data.
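For readers less familiar with Nextflow closures, the passthroughMap behaviour described in this list (transform the id and data, carry any extra tuple elements along untouched) can be mimicked in plain Python. This is only an analogy under that reading of the usage example; passthrough_map is a hypothetical name and not part of DataFlowHelper.nf.

```python
def passthrough_map(events, func):
    """Apply `func` to the first two elements of each event and keep the rest.

    Each event is a list of the form [id, data, *passthrough].
    Hypothetical Python analogue of the Nextflow helper.
    """
    result = []
    for event in events:
        event_id, data, *passthrough = event
        new_id, new_data = func(event_id, data)
        # Extra elements are appended unchanged after the transformed pair.
        result.append([new_id, new_data, *passthrough])
    return result


events = [["id", {"input": "foo"}, "passthrough"]]
print(passthrough_map(events, lambda event_id, data: (event_id, {**data, "arg": 10})))
```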
MAJOR CHANGES
- workflows/utils/DataFlowHelper.nf: Added helper functions setWorkflowArguments() and getWorkflowArguments() to split the data field of a channel event into a hashmap. Example usage:

      | setWorkflowArguments(
        pca: [ "input": "input", "obsm_output": "obsm_pca" ],
        integration: [ "obs_covariates": "obs_covariates", "obsm_input": "obsm_pca" ]
      )
      | getWorkflowArguments("pca")
      | pca
      | getWorkflowArguments("integration")
      | integration

- mapping/cellranger_count: Allow passing both directories as well as individual fastq.gz files as inputs.
- convert/from_10xh5_to_h5mu: Allow reading in QC metrics, use gene ids as .obs_names instead of gene symbols.
- workflows/conversion: Update pipeline to use the latest practices and to get it to a working state.
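The idea behind setWorkflowArguments() and getWorkflowArguments(), splitting one data map into a map of per-component arguments, can be sketched in Python. The sketch assumes each mapping reads as component argument name to key in the workflow data, which is how the usage example above appears to work; the real helpers operate on Nextflow channel events and may behave differently. The function name and sample values are hypothetical.

```python
def split_workflow_arguments(data, mappings):
    """Split one data dict into a dict of arguments per component.

    `mappings` maps a component name to {component_argument: key_in_data}.
    Keys that are absent from `data` are skipped. Hypothetical helper.
    """
    return {
        component: {arg: data[key] for arg, key in arg_map.items() if key in data}
        for component, arg_map in mappings.items()
    }


data = {"input": "sample.h5mu", "obsm_pca": "X_pca", "obs_covariates": ["sample_id"]}
mappings = {
    "pca": {"input": "input", "obsm_output": "obsm_pca"},
    "integration": {"obs_covariates": "obs_covariates", "obsm_input": "obsm_pca"},
}
split = split_workflow_arguments(data, mappings)
print(split["pca"])          # arguments to hand to the pca step
print(split["integration"])  # arguments to hand to the integration step
```

Here the same workflow-level value (obsm_pca) feeds the output argument of one step and the input argument of the next.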
MINOR CHANGES
- dimred/umap: Streamline UMAP parameters by adding --obsm_output parameter to allow choosing the output .obsm slot.
- workflows/multiomics/integration: Added arguments for tuning the various output slots of the integration pipeline, namely --obsm_pca, --obsm_integrated, --uns_neighbors, --obsp_neighbor_distances, --obsp_neighbor_connectivities, --obs_cluster, --obsm_umap.
- Switch to Viash 0.6.1.
- filter/subset_h5mu: Add --modality argument, export to VDSL3, add unit test.
- dataflow/split_modalities: Also output modality types in a separate csv.
BUG FIXES
- convert/from_bd_to_10x_molecular_barcode_tags: Replaced UTF8 characters with ASCII. OpenJDK 17 or lower might throw the following exception when trying to read a UTF8 file: java.nio.charset.MalformedInputException: Input length = 1.
- dataflow/concat: Overriding sample name in .obs no longer raises AttributeError.
- dataflow/concat: Fix false positives when checking for conflicts in .obs and .var when using --mode move.
openpipeline 0.5.0
Major redesign of the integration and multiomic workflows. Current list of workflows:
- ingestion/bd_rhapsody: A generic pipeline for running BD Rhapsody WTA or Targeted mapping, with support for AbSeq, VDJ and/or SMK.
- ingestion/cellranger_mapping: A pipeline for running Cell Ranger mapping.
- ingestion/demux: A generic pipeline for running bcl2fastq, bcl-convert or Cell Ranger mkfastq.
- multiomics/rna_singlesample: Processing unimodal single-sample RNA transcriptomics data.
- multiomics/rna_multisample: Processing unimodal multi-sample RNA transcriptomics data.
- multiomics/integration: A pipeline for demultiplexing multimodal multi-sample RNA transcriptomics data.
- multiomics/full_pipeline: A pipeline to analyse multiple multiomics samples.
BREAKING CHANGES
- Many components: Renamed .var["gene_ids"] and .var["feature_types"] to .var["gene_id"] and .var["feature_type"].
DEPRECATED
- convert/from_10xh5_to_h5ad and convert/from_bdrhap_to_h5ad: Removed h5ad based components.
- mapping/bd_rhapsody_wta and workflows/ingestion/bd_rhapsody_wta: Deprecated in favour of the more generic mapping/bd_rhapsody and workflows/ingestion/bd_rhapsody pipelines.
- convert/from_csv_to_h5mu: Disable until it is needed again.
- integrate/concat: Deprecated "concat" option for --other_axis_mode.
NEW COMPONENTS
- graph/bbknn: Batch balanced KNN.
- transform/scaling: Scale data to unit variance and zero mean.
- mapping/bd_rhapsody: Added generic component for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- integrate/harmony and integrate/harmonypy: Run a Harmony integration analysis (R-based and Python-based, respectively).
- integrate/scanorama: Use Scanorama to integrate different experiments.
- reference/make_reference: Download a transcriptomics reference and preprocess it (adding ERCC spikeins and filtering with a regex).
- reference/build_bdrhap_reference: Compile a reference into a STAR index in the format expected by BD Rhapsody.
NEW WORKFLOWS
- workflows/ingestion/bd_rhapsody: Added generic workflow for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- workflows/multiomics/full_pipeline: Implement pipeline for processing multiple multiomics samples.
NEW FUNCTIONALITY
- convert/from_bdrhap_to_h5mu: Added support for being able to deal with WTA, Targeted, SMK, AbSeq and VDJ data.
- integrate/concat: Added "move" option to --other_axis_mode, which allows merging .obs and .var by only keeping elements of the matrices which are the same in each of the samples, moving the conflicting values to .varm or .obsm.
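A simplified Python sketch of the "move" idea described for integrate/concat: annotation columns that agree across all samples are kept, while conflicting ones are set aside per sample. Plain dicts stand in for .obs/.var and .obsm/.varm, each column is compared as a whole, and the function name is hypothetical, so this only illustrates the principle rather than the component's implementation.

```python
def merge_annotations(samples):
    """Merge per-sample annotation columns.

    `samples` maps a sample name to {column_name: value}. Columns whose value
    is identical in every sample are kept; all other columns are moved aside,
    keyed by sample name. Hypothetical helper.
    """
    all_columns = sorted({column for columns in samples.values() for column in columns})
    kept, moved = {}, {}
    for column in all_columns:
        values = [columns.get(column) for columns in samples.values()]
        if all(value == values[0] for value in values):
            kept[column] = values[0]
        else:
            # Conflict (or missing in some sample): keep every sample's version.
            moved[column] = {name: columns.get(column) for name, columns in samples.items()}
    return kept, moved


samples = {
    "sample_1": {"gene_symbol": ["TP53", "MT-ND1"], "mean_counts": [1.5, 0.2]},
    "sample_2": {"gene_symbol": ["TP53", "MT-ND1"], "mean_counts": [0.9, 0.4]},
}
kept, moved = merge_annotations(samples)
print(kept)   # columns shared by all samples
print(moved)  # conflicting columns, one entry per sample
```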
MAJOR CHANGES
- Multiple components: Update to anndata 0.8 with mudata 0.2.0. This means that the format of the .h5mu files has changed.
- multiomics/rna_singlesample: Move transformation counts into layers instead of overwriting .X.
- Updated to Viash 0.6.0.
MINOR CHANGES
- velocity/velocyto: Allow configuring memory and parallelisation.
- cluster/leiden: Add --obsp_connectivities parameter to allow choosing the output slot.
- workflows/multiomics/rna_singlesample, workflows/multiomics/rna_multisample and workflows/multiomics/integration: Allow choosing the output paths.
- neighbors/bbknn and neighbors/find_neighbors: Add parameters for choosing the input/output slots.
- dimred/pca and dimred/umap: Add parameters for choosing the input/output slots.
- integrate/concat: Optimize concat performance by adding multiprocessing and refactoring functions.
- workflows/multimodal_integration: Add obs_covariates argument to pipeline.
BUG FIXES
- Several components: Revert using slim versions of containers because they do not provide the tools to run nextflow with trace capabilities.
- integrate/concat: Fix an issue where joining boolean values caused TypeError.
- workflows/multiomics/rna_multisample, workflows/multiomics/rna_singlesample and workflows/multiomics/integration: Use nextflow trace reporting when running integration tests.
openpipeline 0.4.1
BUG FIXES
- workflows/ingestion/bd_rhapsody_wta: use ':' as a separator for multiple input files and fix integration tests.
MINOR CHANGES
- Several components: pin mudata and scanpy dependencies so that anndata version <0.8.0 is used.
openpipeline 0.4.0
NEW FUNCTIONALITY
- convert/from_bdrhap_to_h5mu: Merge one or more BD rhapsody outputs into an h5mu file.
- split/split_modalities: Split the modalities from a single .h5mu multimodal sample into separate .h5mu files.
- integrate/concat: Combine data from multiple samples together.
MINOR CHANGES
- mapping/bd_rhapsody_wta: Update to BD Rhapsody 1.10.1.
- mapping/bd_rhapsody_wta: Add parameters for overriding the minimum RAM & cores. Add --dryrun parameter.
- Switch to Viash 0.5.14.
- convert/from_bdrhap_to_h5mu: Update to BD Rhapsody 1.10.1.
- resources_test/bdrhap_5kjrt: Add subsampled BD rhapsody datasets to test pipeline with.
- resources_test/bdrhap_ref_gencodev40_chr1: Add subsampled reference to test BD rhapsody pipeline with.
- integrate/merge: Merge several unimodal .h5mu files into one multimodal .h5mu file.
- Updated several python docker images to slim version.
- mapping/cellranger_count_split: update container from ubuntu focal to ubuntu jammy.
- download/sync_test_resources: update AWS cli tools from 2.7.11 to 2.7.12 by updating docker image.
- download/download_file: now uses bash container instead of python.
- mapping/bd_rhapsody_wta: Use squashed docker image in which log4j issues are resolved.
BUG FIXES
- workflows/utils/WorkflowHelper.nf: Renamed utils.nf to WorkflowHelper.nf.
- workflows/utils/WorkflowHelper.nf: Fix error message when required parameter is not specified.
- workflows/utils/WorkflowHelper.nf: Added helper functions:
  - readConfig: Read a Viash config from a yaml file.
  - viashChannel: Create a channel from the Viash config and the params object.
  - helpMessage: Print a help message and exit.
- mapping/bd_rhapsody_wta: Update picard to 2.27.3.
DEPRECATED
- convert/from_bdrhap_to_h5ad: Deprecated in favour of convert/from_bdrhap_to_h5mu.
- convert/from_10xh5_to_h5ad: Deprecated in favour of convert/from_10xh5_to_h5mu.
openpipeline 0.3.1
NEW FUNCTIONALITY
- bin/port_from_czbiohub_utilities.sh: Added helper script to import components and pipelines from czbiohub/utilities.

Imported components from czbiohub/utilities:
- demux/cellranger_mkfastq: Demultiplex raw sequencing data.
- mapping/cellranger_count: Align fastq files using Cell Ranger count.
- mapping/cellranger_count_split: Split 10x Cell Ranger output directory into separate output fields.

Imported workflows from czbiohub/utilities:
- workflows/1_ingestion/cellranger: Use Cell Ranger to preprocess 10x data.
- workflows/1_ingestion/cellranger_demux: Use cellranger demux to demultiplex sequencing BCL output to FASTQ.
- workflows/1_ingestion/cellranger_mapping: Use cellranger count to align 10x fastq files to a reference.
MINOR CHANGES
- Fix interactive/run_cirrocumulus script raising NotImplementedError caused by using MuData.var_names_make_unique() on each modality instead of on the whole MuData object.
- Fix transform/normalize_total and interactive/run_cirrocumulus component build missing a hdf5 dependency.
- interactive/run_cellxgene: Updated container to ubuntu:focal because it contains python3.6 but cellxgene dropped python3.6 support.
- mapping/bd_rhapsody_wta: Set --parallel to true by default.
- mapping/bd_rhapsody_wta: Translate Bash script into Python.
- download/sync_test_resources: Add --dryrun, --quiet, and --delete arguments.
- convert/from_h5mu_to_seurat: Use eddelbuettel/r2u:22.04 docker container in order to speed up builds by downloading precompiled R packages.
- mapping/cellranger_count: Use 5Gb for testing (to adhere to github CI runner memory constraints).
- convert/from_bdrhap_to_h5ad: change test data to output from mapping/bd_rhapsody_wta after reducing the BD Rhapsody test data size.
- Various config.vsh.yaml files: Renamed values: to choices:.
- download/download_file and transfer/publish: Switch base container from bash:5.1 to python:3.10.
- mapping/bd_rhapsody_wta: Make sure procps is installed.
BUG FIXES
- mapping/bd_rhapsody_wta: Use a smaller test dataset to reduce test time and make sure that the Github Action runners do not run out of disk space.
- download/sync_test_resources: Disable the use of the Amazon EC2 instance metadata service to make script work on Github Actions runners.
- convert/from_h5mu_to_seurat: Fix unit test requiring Seurat by using native R functions to test the Seurat object instead.
- mapping/cellranger_count and bcl_demux/cellranger_mkfastq: cellranger uses the --parameter=value formatting instead of --parameter value to set command line arguments.
- mapping/cellranger_count: --nosecondary is no longer always applied.
- mapping/bd_rhapsody_wta: Added workaround for bug in Viash 0.5.12 where triple single quotes are incorrectly escaped (viash-io/viash#139).
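The two cellranger argument fixes in this list (the --parameter=value form, and flags such as --nosecondary only being passed when requested) can be illustrated with a short Python sketch. The helper name and the example values are hypothetical; this is not the component's actual argument handling.

```python
def build_cli_args(parameters):
    """Render a dict of parameters as --name=value command line arguments.

    Boolean True becomes a bare flag; None and False are omitted entirely.
    Hypothetical helper.
    """
    args = []
    for name, value in parameters.items():
        if value is None or value is False:
            continue  # unset options and disabled flags are not emitted
        if value is True:
            args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


command = ["cellranger", "count"] + build_cli_args({
    "id": "sample1",          # example value
    "fastqs": "/data/fastq",  # example path
    "nosecondary": False,     # flag is left out unless explicitly enabled
})
print(command)
```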
DEPRECATED
- bcl_demux/cellranger_mkfastq: Duplicate of demux/cellranger_mkfastq.
openpipeline 0.3.0
- Add tx_processing pipeline with following components:
  - filter_with_counts
  - filter_with_scrublet
  - filter_with_hvg
  - do_filter
  - normalize_total
  - regress_out
  - log1p
  - pca
  - find_neighbors
  - leiden
  - umap