Commit 26c00c3

DriesSchaumont authored and github-actions[bot] committed
deploy: 514eb53
1 parent 87f6576 commit 26c00c3

1,379 files changed, +112766 -41917 lines changed

CHANGELOG.md

Lines changed: 89 additions & 0 deletions
@@ -1,3 +1,84 @@
+# openpipelines 2.1.0
+
+## BREAKING CHANGES
+
+* Deprecation of `metadata/duplicate_obs` and `metadata/duplicate_var` components (PR #952).
+
+* Deprecation of `workflows/annotation/scgpt_integration_knn` component (PR #952).
+
+* `annotate/scanvi`: Remove scarches functionality from this component, as it is already covered in `integrate/scarches` (PR #986).
+
+## NEW FUNCTIONALITY
+
+* `dataflow/concatenate_h5mu`: add `modality` parameter (PR #977).
+
+* `filter_with_scrublet`: add `expected_doublet_rate`, `stdev_doublet_rate`, `n_neighbors` and `sim_doublet_ratio` arguments (PR #974).
+
+* `feature_annotation/aling_query_reference`: Added a component to align a query and reference dataset (PR #948, #958, #972).
+
+* `workflows/qc/qc` workflow: Added ribosomal gene detection (PR #961).
+
+* `workflows/rna/rna_singlesample`, `workflows/multiomics/process_samples` workflows: Added ribosomal gene detection (PR #968).
+
+* `scanvi`: enable CUDA acceleration (PR #969).
+
+* `workflows/annotation/scvi_knn` workflow: Cell-type annotation based on scVI integration followed by KNN label transfer (PR #954).
+
+* `convert/from_h5ad_to_seurat`: Add component to convert from h5ad to Seurat (PR #980).
+
+* `workflows/annotation/scanvi_scarches` workflow: Cell-type annotation based on scANVI integration and annotation with scArches for reference mapping (PR #898).
+
+* `integrate/scarches`: Implemented functionality to align the query dataset with the model registry and extend functionality to predict labels for scANVI models (PR #898).
+
+* `workflows/annotation/harmony_knn` workflow: Cell-type annotation based on harmony integration with KNN label transfer (PR #836).
+
+* `from_cellranger_multi_to_h5mu`: add support for `custom` modality (PR #982).
+
+* `integrate/scvi`: Enable passing any .var field for gene name information instead of .var index, using the `--var_gene_names` parameter (PR #986).
+
+## MAJOR CHANGES
+
+* Several components: when a component processes a single modality, only that modality is read into memory (PR #944)
+
+* The `transfer/publish` component is deprecated and will be removed in a future major release (PR #941).
+
+
+# MINOR CHANGES
+
+* Bump viash to `0.9.3` (PR #995).
+
+* Several workflows: refactor neighbors, leiden and UMAP in a separate subworkflow (PR #942 and PR #949).
+
+* `grep_annotation_column` and `subset_obsp`: Fix compatibility for SciPy (PR #945).
+
+* `popv`: Pin numpy<2 after new release of scvi-tools (PR #946).
+
+* Various components (`scgpt` and `annotate`): Add resource labels (PR #947, PR #950).
+
+* `feature_annotation/highly_variable_features_scanpy`: Enable calculation of HVG on a subset of genes (PR #957, PR #959).
+
+* `integrate/scvi`, `integrate/totalvi` and `integrate/scarches`: update base image to nvcr.io/nvidia/pytorch:24.12-py3, pin scvi-tools version to 1.1.5, unpin jax and jaxlib version (PR #970).
+
+* `annotate/celltypist`: Enable passing any layer with log normalized counts, enforce checking whether counts are log normalized (PR #971).
+
+* `process_10xh5/filter_10xh5`: update container base to ubuntu 24.04 (PR #983).
+
+# BUG FIXES
+
+* `cluster/leiden`: Fix an issue where insufficient shared memory (size of `/dev/shm`) causes the processing to hang.
+
+* `utils/subset_vars`: Convert .var column used for subsetting of dtype "boolean" to dtype "bool" when it doesn't contain NaN values (PR #959).
+
+* `resources_test_scripts/annotation_test_data.sh`: Add a layer to the annotation reference dataset with log normalized counts (PR #960).
+
+* `annotate/celltypist`: Fix missing values in annotation column caused by index misalignment (PR #976).
+
+* `workflows/annotation/scgpt_annotation` and `workflows/integrate/scgpt_leiden`: Parameterization of HVG flavor with default method `cell_ranger` instead of `seurat_v3` (PR #979).
+
+* `dataflow/merge`: Resolved an issue where merging two MuData objects with overlapping `var` or `obs` columns sometimes resulted in an unsupported nullable dtype (e.g. merging `pd.IntegerDtype` and `pd.FloatDtype`). These columns are now correctly cast to their native numpy dtypes before writing(PR #990).
+
+* `workflows/annotation/harmony_knn`: Only process RNA modality in the workflow (PR #988).
+
 # openpipelines 2.0.0
 
 ## BREAKING CHANGES
@@ -55,6 +136,8 @@
 
 * `scgpt/binning`: update handling of empty rows in sparse matrices (PR #875).
 
+* `dataflow/split_h5mu`: Update memory label from `lowmem` to `highmem` and cpu label from `singlecpu` to `lowcpu` (PR #930).
+
 # openpipelines 2.0.0-rc.2
 
 ## BUG FIXES
@@ -242,6 +325,12 @@
 
 * Update authorship of components (PR #835).
 
+# openpipelines 1.0.4
+
+## BUG FIXES
+
+* `scvi_leiden` workflow: fix the input layer argument of the workflow not being passed to the scVI component (PR #939, backported from PR #936 and PR #938).
+
 # openpipelines 1.0.3
 
 ## BUG FIXES
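
The `dataflow/merge` and `utils/subset_vars` entries above both concern pandas nullable extension dtypes (`boolean`, `Int64`, `Float64`). A minimal sketch of casting such a column back to a native NumPy dtype before writing is shown below, assuming pandas 1.2 or later; `to_native_dtype` is a hypothetical helper, not the code from PR #990 or PR #959. Booleans are only cast when no values are missing, and integers with missing values fall back to `float64` so the missing entries can be stored as NaN.

```python
import numpy as np
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_extension_array_dtype,
    is_float_dtype,
    is_integer_dtype,
)


def to_native_dtype(col: pd.Series) -> pd.Series:
    """Cast a pandas nullable column to a NumPy dtype where that is lossless.

    Hypothetical helper for illustration only.
    """
    if not is_extension_array_dtype(col.dtype):
        return col  # already backed by a plain NumPy dtype
    has_missing = bool(col.isna().any())
    if is_bool_dtype(col.dtype):
        # "boolean" -> "bool" only when there is no missing value to lose.
        return col if has_missing else col.astype(bool)
    if is_integer_dtype(col.dtype) and not has_missing:
        # Assumes the values fit in int64.
        values = col.to_numpy(dtype="int64")
    elif is_integer_dtype(col.dtype) or is_float_dtype(col.dtype):
        # float64 can hold the missing values as NaN.
        values = col.to_numpy(dtype="float64", na_value=np.nan)
    else:
        return col  # other extension dtypes (e.g. category) are left as-is
    return pd.Series(values, index=col.index, name=col.name)


flags = pd.Series([True, False, True], dtype="boolean")
counts = pd.Series([1, None, 3], dtype="Int64")
scores = pd.Series([0.5, None], dtype="Float64")
print(to_native_dtype(flags).dtype)   # bool
print(to_native_dtype(counts).dtype)  # float64
print(to_native_dtype(scores).dtype)  # float64
```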

_viash.yaml

Lines changed: 1 addition & 2 deletions
@@ -1,4 +1,4 @@
-viash_version: 0.9.0
+viash_version: 0.9.3
 
 source: src
 target: target
@@ -21,7 +21,6 @@ info:
 dest: resources_test
 
 config_mods: |
-.test_resources += {path: '/src/base/openpipelinetestutils', dest: 'openpipelinetestutils'}
 .resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}
 .runners[.type == 'nextflow'].directives.tag := '$id'
 .runners[.type == 'nextflow'].config.script := 'includeConfig("nextflow_labels.config")'
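
After this commit, `config_mods` no longer adds `openpipelinetestutils` to the test resources; the remaining mods append one resource and set two fields on Nextflow runners only. As a rough illustration, assuming viash's `+=` appends to a list, `:=` assigns a value, and `[.type == 'nextflow']` filters the `runners` list, the mods correspond to something like the Python below operating on a component config loaded as a plain dict. `apply_config_mods` and `LABELS_RESOURCE` are hypothetical names; viash applies these mods itself at build time.

```python
import copy

LABELS_RESOURCE = {"path": "/src/workflows/utils/labels.config", "dest": "nextflow_labels.config"}


def apply_config_mods(config: dict) -> dict:
    """Hypothetical Python rendering of the config_mods left in _viash.yaml."""
    cfg = copy.deepcopy(config)  # leave the caller's dict untouched
    # .resources += {...}: append the Nextflow labels config as a resource.
    cfg.setdefault("resources", []).append(dict(LABELS_RESOURCE))
    # .runners[.type == 'nextflow']: only Nextflow runners are modified.
    for runner in cfg.get("runners", []):
        if runner.get("type") != "nextflow":
            continue
        # .directives.tag := '$id' (stored here as a literal string)
        runner.setdefault("directives", {})["tag"] = "$id"
        # .config.script := 'includeConfig("nextflow_labels.config")'
        runner.setdefault("config", {})["script"] = 'includeConfig("nextflow_labels.config")'
    # No .test_resources mod remains after this commit.
    return cfg


example = {"runners": [{"type": "executable"}, {"type": "nextflow"}]}
print(apply_config_mods(example)["runners"][1])
```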

resources_test_scripts/10x_5k_anticmv.sh

Lines changed: 64 additions & 10 deletions
@@ -77,7 +77,6 @@ fi
 
 
 # Run mapping pipeline
-# TODO: Also include conversion to h5mu
 cat > /tmp/params.yaml << HERE
 param_list:
 - id: "$ID"
@@ -97,7 +96,6 @@ feature_reference: "$feature_reference"
 publish_dir: "$OUT/processed"
 HERE
 
-
 nextflow \
 run . \
 -main-script target/nextflow/mapping/cellranger_multi/main.nf \
@@ -107,12 +105,12 @@ nextflow \
 -c src/workflows/utils/labels.config \
 -c src/workflows/utils/errorstrat_ignore.config
 
-# Create h5mu
+# Convert to h5mu
 cat > /tmp/params.yaml << HERE
-id: "$ID"
-input: "$OUT/processed/10x_5k_anticmv.cellranger_multi.output.output"
+id: "$orig_sample_id"
+input: "$OUT/processed/10x_5k_anticmv.cellranger_multi.output"
 publish_dir: "$OUT/"
-output: "$orig_sample_id.h5mu"
+output: "*.h5mu"
 HERE
 
 nextflow \
@@ -123,17 +121,39 @@ nextflow \
 -params-file /tmp/params.yaml \
 -c src/workflows/utils/labels.config
 
+mv "$OUT/0.h5mu" "$OUT/${orig_sample_id}.h5mu"
+
+
+# run qc workflow
 cat > /tmp/params.yaml << HERE
 id: "$ID"
 input: "$OUT/$orig_sample_id.h5mu"
+var_name_mitochondrial_genes: mitochondrial
+var_name_ribosomal_genes: ribosomal
 publish_dir: "$OUT/"
-output: "${orig_sample_id}_mms.h5mu"
+output: "${orig_sample_id}_qc.h5mu"
 HERE
 
+nextflow \
+run . \
+-main-script target/nextflow/workflows/qc/qc/main.nf \
+-resume \
+-profile docker,mount_temp \
+-params-file /tmp/params.yaml \
+-c src/workflows/utils/labels.config
+
+
 # Run full pipeline
+cat > /tmp/params.yaml << HERE
+id: "$ID"
+input: "$OUT/${orig_sample_id}_qc.h5mu"
+publish_dir: "$OUT/"
+output: "${orig_sample_id}_mms.h5mu"
+HERE
+
 nextflow \
 run . \
--main-script src/workflows/multiomics/full_pipeline/main.nf \
+-main-script target/nextflow/workflows/multiomics/process_samples/main.nf \
 -resume \
 -profile docker,mount_temp \
 -params-file /tmp/params.yaml \
@@ -143,7 +163,41 @@ nextflow \
 fastqc_dir="$OUT/fastqc"
 mkdir -p "$fastqc_dir"
 
-./target/docker/qc/fastqc/fastqc \
+./target/executable/qc/fastqc/fastqc \
 --input "$raw_dir" \
 --mode "dir" \
---output "$fastqc_dir"
+--output "$fastqc_dir"
+
+
+# Create a test dataset for the Custom modality
+# by just labeling the AB as custom
+feat_ref_name=$(basename $feature_reference)
+sed -e 's/Antibody Capture/Custom/g' "$feature_reference" > "/tmp/custom_${feat_ref_name}"
+
+cat > /tmp/params_custom.yaml << HERE
+param_list:
+- id: "$ID"
+input: "$raw_dir"
+library_id:
+- "${orig_sample_id}_GEX_1_subset"
+- "${orig_sample_id}_AB_subset"
+- "${orig_sample_id}_VDJ_subset"
+library_type:
+- "Gene Expression"
+- "Custom"
+- "VDJ"
+
+gex_reference: "$genome_tar"
+feature_reference: "/tmp/custom_${feat_ref_name}"
+vdj_reference: "$vdj_ref"
+publish_dir: "$OUT/processed_with_custom"
+HERE
+
+nextflow \
+run . \
+-main-script target/nextflow/mapping/cellranger_multi/main.nf \
+-resume \
+-profile docker,mount_temp \
+-params-file /tmp/params_custom.yaml \
+-c src/workflows/utils/labels.config \
+-c src/workflows/utils/errorstrat_ignore.config
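
The new block relabels the antibody library as `Custom` by running `sed` over the whole feature reference, which rewrites `Antibody Capture` wherever it occurs in the file. A column-scoped alternative using only the standard library is sketched below, assuming the usual Cell Ranger feature reference header with a `feature_type` column. `relabel_feature_type` is a hypothetical helper, and the rows in the usage example are made-up toy values.

```python
import csv
import io


def relabel_feature_type(csv_text: str, old: str = "Antibody Capture", new: str = "Custom") -> str:
    """Return the feature reference CSV with feature_type `old` replaced by `new`.

    Hypothetical helper; unlike `sed`, it only touches the feature_type column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "feature_type" not in reader.fieldnames:
        raise ValueError("feature reference has no 'feature_type' column")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["feature_type"] == old:
            row["feature_type"] = new
        writer.writerow(row)
    return out.getvalue()


# Toy feature reference (made-up rows, for illustration only).
toy = (
    "id,name,read,pattern,sequence,feature_type\n"
    "ab1,AB_1,R2,^(BC),ACGTACGT,Antibody Capture\n"
)
print(relabel_feature_type(toy), end="")
```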

resources_test_scripts/annotation_test_data.sh

File mode changed: 100755 → 100644
Lines changed: 36 additions & 0 deletions
@@ -33,14 +33,25 @@ wget "https://zenodo.org/record/7580707/files/pretrained_models_Blood_ts.tar.gz?
 
 # Process Tabula Sapiens Blood reference h5ad
 # (Select one individual and 100 cells per cell type)
+# normalize and log1p transform data
 python <<HEREDOC
 import anndata as ad
+import scanpy as sc
 ref_adata = ad.read_h5ad("${OUT}/tmp_TS_Blood_filtered.h5ad")
 sub_ref_adata = ref_adata[ref_adata.obs["donor_assay"] == "TSP14_10x 3' v3"]
 n=100
 s=sub_ref_adata.obs.groupby('cell_ontology_class').cell_ontology_class.transform('count')
 sub_ref_adata_final = sub_ref_adata[sub_ref_adata.obs[s>=n].groupby('cell_ontology_class').head(n).index]
 # assert sub_ref_adata_final.shape == (500, 58870)
+data_for_scanpy = ad.AnnData(X=sub_ref_adata_final.X)
+sc.pp.normalize_total(data_for_scanpy, target_sum=10000)
+sc.pp.log1p(
+data_for_scanpy,
+base=None,
+layer=None,
+copy=False,
+)
+sub_ref_adata_final.layers["log_normalized"] = data_for_scanpy.X
 sub_ref_adata_final.write("${OUT}/TS_Blood_filtered.h5ad", compression='gzip')
 HEREDOC
 
@@ -79,3 +90,28 @@ rm "${OUT}/tmp_pretrained_models_Blood_ts.tar.gz"
 
 find "${OUT}/Pretrained_model" ! -name "example_file_model*" -type f -exec rm -f {} +
 mv "${OUT}/Pretrained_model" "${OUT}/onclass_model"
+
+echo "> Creating SCVI model"
+viash run src/integrate/scvi/config.vsh.yaml --engine docker -- \
+--input "${OUT}/TS_Blood_filtered.h5mu" \
+--obs_batch "donor_id" \
+--var_gene_names "ensemblid" \
+--output "${OUT}/scvi_output.h5mu" \
+--output_model "${OUT}/scvi_model" \
+--max_epochs 5 \
+--n_obs_min_count 10 \
+--n_var_min_count 10
+
+echo "> Creating SCANVI model"
+viash run src/integrate/scanvi/config.vsh.yaml --engine docker -- \
+--input "${OUT}/TS_Blood_filtered.h5mu" \
+--var_gene_names "ensemblid" \
+--obs_labels "cell_ontology_class" \
+--scvi_model "${OUT}/scvi_model" \
+--output "${OUT}/scanvi_output.h5mu" \
+--output_model "${OUT}/scanvi_model" \
+--max_epochs 5
+
+rm "${OUT}/scanvi_output.h5mu"
+rm "${OUT}/scvi_output.h5mu"
+rm -r "${OUT}/Pretrained_model/"
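
The added heredoc stores a `log_normalized` layer computed with `sc.pp.normalize_total(..., target_sum=10000)` followed by `sc.pp.log1p(..., base=None)`, i.e. a natural log. For a dense count matrix that is roughly the arithmetic below. `log_normalize` is a hypothetical NumPy sketch for illustration: it ignores sparse input and the other options scanpy supports, and it simply keeps all-zero cells at zero.

```python
import numpy as np


def log_normalize(counts: np.ndarray, target_sum: float = 10_000) -> np.ndarray:
    """Scale each cell (row) to `target_sum` total counts, then apply natural log1p."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts get a scale factor of 0 and stay all-zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)


x = np.array([[1, 3, 0], [5, 5, 0]])
y = log_normalize(x)
# Undoing log1p should give (approximately) 10,000 counts per cell.
print(np.expm1(y).sum(axis=1))
```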

resources_test_scripts/scgpt.sh

Lines changed: 1 addition & 2 deletions
@@ -102,7 +102,7 @@ viash run src/feature_annotation/highly_variable_features_scanpy/config.vsh.yaml
 --layer "log_normalized" \
 --var_name_filter "scgpt_filter_with_hvg" \
 --n_top_features 1200 \
---flavor "seurat_v3"
+--flavor "cell_ranger"
 
 echo "> Running scGPT cross check genes"
 viash run src/scgpt/cross_check_genes/config.vsh.yaml --engine docker -- \
@@ -133,4 +133,3 @@ echo "> Removing unnecessary files in test resources dir"
 find "${test_resources_dir}" -type f \( ! -name "Kim2020_*" -o ! -name "*.h5mu" \) -delete
 
 echo "> scGPT test resources are ready!"
-

src/annotate/celltypist/config.vsh.yaml

Lines changed: 4 additions & 7 deletions
@@ -25,7 +25,7 @@ argument_groups:
 required: false
 - name: "--input_layer"
 type: string
-description: The layer in the input data to be used for cell type annotation if .X is not to be used.
+description: The layer in the input data containing log normalized counts to be used for cell type annotation if .X is not to be used.
 - name: "--input_var_gene_names"
 type: string
 required: false
@@ -55,11 +55,6 @@ argument_groups:
 type: string
 description: The name of the adata obs column in the reference data containing cell type annotations.
 default: "cell_ontology_class"
-- name: "--check_expression"
-type: boolean_true
-description: |
-Whether to check the expression of the reference dataset to the format reccomended by CellTypist.
-CellTypist requires data to be log-normalized to 10000 counts per cell.
 - name: "--reference_var_gene_names"
 type: string
 required: false
@@ -164,4 +159,6 @@ engines:
 __merge__: [ /src/base/requirements/python_test_setup.yaml, .]
 runners:
 - type: executable
-- type: nextflow
+- type: nextflow
+directives:
+label: [highcpu, highmem, highdisk]
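
With `--check_expression` removed, the changelog states that the component now always checks that the selected layer holds log normalized counts. One way such a check can work, following the convention in the removed description (log-normalized to 10000 counts per cell), is to undo `log1p` on a sample of cells and compare the row sums to 10000. The function name `looks_log_normalized`, the tolerance and the sample size below are assumptions for illustration, not the component's actual logic, and only dense arrays are handled.

```python
import numpy as np


def looks_log_normalized(x, target_sum: float = 10_000, rtol: float = 1e-3, max_cells: int = 100) -> bool:
    """Heuristic: do the first `max_cells` rows look like log1p(counts scaled to target_sum)?"""
    sample = np.asarray(x, dtype=np.float64)[:max_cells]
    if sample.size == 0 or sample.min() < 0:
        return False  # log1p of non-negative counts is never negative
    totals = np.expm1(sample).sum(axis=1)
    return bool(np.allclose(totals, target_sum, rtol=rtol))


raw = np.array([[1.0, 3.0, 0.0], [5.0, 5.0, 0.0]])
lognorm = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 10_000)
print(looks_log_normalized(lognorm), looks_log_normalized(raw))
```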
