Adding new method: scMerge2 #63

seohyonkim · 2025-06-04T20:47:04Z

Describe your changes

This PR adds a new method, scMerge2.

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!

seohyonkim · 2025-07-23T13:58:54Z

@lazappi Hi! This is Seo :)
The previously working version of this method is now failing. Could you please take a look at why? Has anything changed recently?
Thank you so much!

src/methods/semisupervised_scmerge2/config.vsh.yaml

Co-authored-by: Luke Zappia <[email protected]>

rcannood · 2025-08-08T14:34:04Z

@mumichae Could you take a look at this PR?

src/methods/semisupervised_scmerge2/script.R

src/methods/semisupervised_scmerge2/config.vsh.yaml

src/methods/unsupervised_scmerge2/script.R

seohyonkim · 2025-09-08T23:24:55Z

@mumichae I fixed up the code with the help of your feedback!
Here are some key points to help you review the code quicker (and also to help myself):

- scMerge2 takes and returns a gene x cell matrix, not cell x gene, so I transpose the matrix at the start and again before storing it in the output
- top_n for the SEG selection defaults to 1000, but for small test data I use min(top_n, nrow(seg_df)) for safety
- I added a couple of lines such as rownames(counts) <- as.character(adata$var_names), since I was worried the AnnData conversion might drop them
- newY is the return format(?) of scMerge2
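
The orientation and top_n handling described in these points can be sketched in base R on toy data. This is only an illustration: `seg_df` is a hand-made stand-in for the scSEGIndex output (assumed, as in the script under review, to have a `segIdx` column and gene names as row names), and `select_control_genes` is a hypothetical helper name, not part of scMerge.

```r
# Toy cell x gene matrix, as stored in AnnData (3 cells, 4 genes).
cells_by_genes <- matrix(
  c(1, 0, 2, 0,
    0, 3, 0, 1,
    4, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# scMerge2 works on gene x cell input, so transpose on the way in.
genes_by_cells <- t(cells_by_genes)

# Hypothetical helper: rank genes by segIdx (highest first) and keep at most
# top_n of them, clamped to the number of genes actually available.
select_control_genes <- function(seg_df, top_n = 1000L) {
  seg_df <- seg_df[order(seg_df$segIdx, decreasing = TRUE), , drop = FALSE]
  rownames(seg_df)[seq_len(min(top_n, nrow(seg_df)))]
}

# Stand-in for the scSEGIndex result (values are made up).
seg_df <- data.frame(
  segIdx = c(0.2, 0.9, 0.5, 0.7),
  row.names = rownames(genes_by_cells)
)

# With only 4 genes, top_n = 1000 is clamped to 4 instead of producing NAs.
ctl <- select_control_genes(seg_df, top_n = 1000L)

# Transpose back to cell x gene before writing the output.
restored <- t(genes_by_cells)
```

Without the `min()` clamp, indexing `rownames(seg_df)` past its length would return `NA` entries, which is the failure the safety check guards against on small test data.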

Let me know if there is anything else that could be better :)

mumichae

Already looking a lot better!
There are still some computational bottlenecks that are worth solving, given that this method uses the complete count matrix and densifying is expensive.

mumichae · 2025-09-25T09:16:02Z

src/methods/unsupervised_scmerge2/script.R

+adata <- anndata::read_h5ad(par$input)
+
+anndataToUnsupervisedScMerge2 <- function(adata, top_n = 1000, verbose = TRUE) {
+  counts <- t(as.matrix(adata$layers[["counts"]]))


Could you preserve the sparsity of the data? According to the documentation, scMerge should be able to deal with sparse matrices. If the matrix is already a dgCMatrix, you can avoid the conversion.
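
As a rough sketch of what avoiding densification could look like, the snippet below transposes a toy sparse matrix with the Matrix package instead of going through as.matrix. The helper name `to_genes_by_cells` is hypothetical, the data are made up, and whether scMerge2 and scSEGIndex accept the sparse result unchanged is an assumption that would need checking against the scMerge documentation.

```r
library(Matrix)
library(methods)

# Toy sparse cell x gene count matrix (3 cells, 4 genes); values are made up.
counts <- sparseMatrix(
  i = c(1, 2, 3, 3),
  j = c(1, 2, 1, 4),
  x = c(1, 3, 4, 2),
  dims = c(3, 4),
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# Hypothetical helper: return a gene x cell matrix without densifying
# input that is already sparse.
to_genes_by_cells <- function(x) {
  if (is(x, "sparseMatrix")) {
    t(x)  # the Matrix t() method keeps the result sparse
  } else {
    # Fallback for dense input: transpose, then store as a sparse matrix.
    Matrix(t(as.matrix(x)), sparse = TRUE)
  }
}

genes_by_cells <- to_genes_by_cells(counts)
```

The dimnames travel with the sparse matrix through the transpose, so gene and cell names do not need to be reassigned afterwards in this sketch.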

mumichae · 2025-09-25T09:21:10Z

src/methods/unsupervised_scmerge2/script.R

+
+cat("Run unsupervised scMerge2\n")
+
+scMerge2_res <- anndataToUnsupervisedScMerge2(adata, top_n = 1000L, verbose = TRUE)


Could you make top_n for the control genes a parameter that can be adjusted in the config.vsh.yaml?
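
One way the script side of this could look is sketched below. In a viash component the `par` list is injected at runtime; it is defined by hand here so the snippet runs standalone. The argument name `top_n` and the helper `resolve_top_n` are hypothetical, and a matching argument entry would still have to be added to config.vsh.yaml.

```r
# Stand-in for the `par` list that viash injects (assumption: the config
# exposes an integer argument called top_n).
par <- list(top_n = 1000L)

# Hypothetical helper: read top_n from par, fall back to a default when it is
# unset, and clamp it to the number of candidate genes available.
resolve_top_n <- function(par, n_available, default = 1000L) {
  top_n <- if (is.null(par$top_n)) default else as.integer(par$top_n)
  stopifnot(length(top_n) == 1L, !is.na(top_n), top_n > 0L)
  min(top_n, as.integer(n_available))
}

# With only 250 candidate genes, the requested 1000 is reduced to 250.
n_ctl <- resolve_top_n(par, n_available = 250L)
```

The resolved value would then be passed on in place of the hard-coded `top_n = 1000L` in the call shown in the diff.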

mumichae · 2025-09-25T09:24:10Z

src/methods/unsupervised_scmerge2/script.R

+  batch     <- as.character(adata$obs$batch)
+  cellTypes <- as.character(adata$obs$cell_type)
+
+  scMerge2_res <- scMerge2(
+    exprsMat = exprsMat,
+    batch = batch,
+    ctl = ctl,
+    verbose = verbose
+  )


You can simplify the code a bit

Suggested change

-  batch     <- as.character(adata$obs$batch)
-  cellTypes <- as.character(adata$obs$cell_type)
-
-  scMerge2_res <- scMerge2(
-    exprsMat = exprsMat,
-    batch = batch,
-    ctl = ctl,
-    verbose = verbose
-  )
+  scMerge2_res <- scMerge2(
+    exprsMat = exprsMat,
+    batch = as.character(adata$obs$batch),
+    ctl = ctl,
+    verbose = verbose
+  )

mumichae · 2025-09-25T09:24:42Z

src/methods/unsupervised_scmerge2/script.R

+  seg_df <- seg_df[order(seg_df$segIdx, decreasing = TRUE), , drop = FALSE]
+  ctl <- rownames(seg_df)[seq_len(min(top_n, nrow(seg_df)))]
+
+  exprsMat <- t(as.matrix(adata$layers[["normalized"]]))


Is densification via as.matrix really necessary here?

mumichae · 2025-09-25T09:26:21Z

src/methods/unsupervised_scmerge2/script.R

+embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
+rownames(embedding) <- colnames(corrected_mat)


PCA computation is not necessary for feature methods, because the "embedding" will be computed after the integration in a post-processing step. See Combat for example

Suggested change

-embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
-rownames(embedding) <- colnames(corrected_mat)

mumichae · 2025-09-25T09:26:39Z

src/methods/unsupervised_scmerge2/script.R

+  obsm = list(
+    X_emb = embedding[as.character(adata$obs_names), , drop = FALSE]  # match input cells
+  ),


PCA not needed here

Suggested change

-  obsm = list(
-    X_emb = embedding[as.character(adata$obs_names), , drop = FALSE]  # match input cells
-  ),

mumichae · 2025-09-25T09:27:35Z

src/methods/semisupervised_scmerge2/script.R

+  batch     <- as.character(adata$obs$batch)
+  cellTypes <- as.character(adata$obs$cell_type)
+
+  scMerge2_res <- scMerge2(
+    exprsMat = exprsMat,
+    batch = batch,
+    cellTypes = cellTypes,
+    ctl = ctl,
+    verbose = verbose
+  )


Nitpick: simplify code

Suggested change

-  batch     <- as.character(adata$obs$batch)
-  cellTypes <- as.character(adata$obs$cell_type)
-
-  scMerge2_res <- scMerge2(
-    exprsMat = exprsMat,
-    batch = batch,
-    cellTypes = cellTypes,
-    ctl = ctl,
-    verbose = verbose
-  )
+  scMerge2_res <- scMerge2(
+    exprsMat = exprsMat,
+    batch = as.character(adata$obs$batch),
+    cellTypes = as.character(adata$obs$cell_type),
+    ctl = ctl,
+    verbose = verbose
+  )

mumichae · 2025-09-25T09:28:00Z

src/methods/semisupervised_scmerge2/script.R

Comments from unsupervised scMerge2 apply here as well

mumichae · 2025-09-25T09:28:38Z

src/methods/unsupervised_scmerge2/script.R

+  rownames(counts) <- as.character(adata$var_names)
+  colnames(counts) <- as.character(adata$obs_names)
+
+  seg_df <- scSEGIndex(exprs_mat = counts)


Please add a comment documenting what these lines do.

working unsupervised scmerge2. Need to clear out the comments

3e5c8b9

seohyonkim marked this pull request as draft June 4, 2025 20:47

seohyonkim added 4 commits June 6, 2025 18:00

raise error for unmatched species

1cd2b42

clean unsupervised scmerge2

68c3b98

working semi-supervised scmerge2

9836495

remove comments

e4a1daa

lazappi reviewed Jul 23, 2025

src/methods/semisupervised_scmerge2/config.vsh.yaml

seohyonkim and others added 2 commits July 23, 2025 17:33

Update src/methods/semisupervised_scmerge2/config.vsh.yaml

d5eb2d1

Co-authored-by: Luke Zappia <[email protected]>

change image

26712b6

rcannood requested a review from mumichae August 8, 2025 14:33

mumichae marked this pull request as ready for review August 28, 2025 10:56

This was referenced Aug 28, 2025

Add STACAS as new method component #58

Merged

Add CiLISI as new metric component #57

Merged

mumichae requested changes Aug 28, 2025


seohyonkim added 3 commits September 9, 2025 01:07

fixed scMerge2

d0437b9

add method_types to config

423057e

add to changelog

dd6b1ce

Merge branch 'main' into feature/scmerge

2668b92

mumichae requested changes Sep 25, 2025
