Skip to content

Conversation

seohyonkim
Copy link
Contributor

Describe your changes

This PR is for a new method scMerge2.

Checklist before requesting a review

  • I have performed a self-review of my code

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI Tests succeed and look good!

@seohyonkim seohyonkim marked this pull request as draft June 4, 2025 20:47
@seohyonkim
Copy link
Contributor Author

@lazappi Hi! This is Seo :)
The previously-working version of this method is failing now, could you please see why? Is there anything new that has changed?
Thank you so much!

@rcannood rcannood requested a review from mumichae August 8, 2025 14:33
@rcannood
Copy link
Member

rcannood commented Aug 8, 2025

@mumichae Could you take a look at this PR?

@mumichae mumichae marked this pull request as ready for review August 28, 2025 10:56
@seohyonkim
Copy link
Contributor Author

@mumichae I fixed up the code with the help of your feedback!
Here are some key points to help you review the code quicker (and also to help myself):

  • scMerge2 takes and returns the matrix of gene x cell, not cell x gene, so I transpose the matrix in the beginning and before storing it to the output
  • top_n for the SEG selection is set for 1000 by default, but for small test data, I did min(top_n, nrow(seg_df)) for safety
  • I did couple of lines such as rownames(counts) <- as.character(adata$var_names) since I'm scared of AnnData conversion dropping them
  • newY is the return format(?) of scMerge2

Let me know if there are any thing else that can be better :)

Copy link
Collaborator

@mumichae mumichae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already looking a lot better!
There are still some computational bottlenecks that are worth solving (given that this methods uses the complete count matrix and densifying is expensive.

adata <- anndata::read_h5ad(par$input)

anndataToUnsupervisedScMerge2 <- function(adata, top_n = 1000, verbose = TRUE) {
counts <- t(as.matrix(adata$layers[["counts"]]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you preserve the sparsity of the data? According to the documentation, scMerge should be able to deal with sparse matrices. If the matrix is already a dgeMatrix, you can avoid conversion


cat("Run unsupervised scMerge2\n")

scMerge2_res <- anndataToUnsupervisedScMerge2(adata, top_n = 1000L, verbose = TRUE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make top_n for the control genes a parameter that can be adjusted in the config.vsh.yaml?

Comment on lines +32 to +40
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)

scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = batch,
ctl = ctl,
verbose = verbose
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify the code a bit

Suggested change
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)
scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = batch,
ctl = ctl,
verbose = verbose
)
scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = as.character(adata$obs$batch),
ctl = ctl,
verbose = verbose
)

seg_df <- seg_df[order(seg_df$segIdx, decreasing = TRUE), , drop = FALSE]
ctl <- rownames(seg_df)[seq_len(min(top_n, nrow(seg_df)))]

exprsMat <- t(as.matrix(adata$layers[["normalized"]]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is densification via as.matrix really necessary here?

Comment on lines +52 to +53
embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
rownames(embedding) <- colnames(corrected_mat)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCA computation is not necessary for feature methods, because the "embedding" will be computed after the integration in a post-processing step. See Combat for example

Suggested change
embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
rownames(embedding) <- colnames(corrected_mat)

Comment on lines +59 to +61
obsm = list(
X_emb = embedding[as.character(adata$obs_names), , drop = FALSE] # match input cells
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCA not needed here

Suggested change
obsm = list(
X_emb = embedding[as.character(adata$obs_names), , drop = FALSE] # match input cells
),

Comment on lines +32 to +41
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)

scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = batch,
cellTypes = cellTypes,
ctl = ctl,
verbose = verbose
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: simplify code

Suggested change
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)
scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = batch,
cellTypes = cellTypes,
ctl = ctl,
verbose = verbose
)
scMerge2_res <- scMerge2(
exprsMat = exprsMat,
batch = as.character(adata$obs$batch),
cellTypes = as.character(adata$obs$cell_type),
ctl = ctl,
verbose = verbose
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from unsupervised scMerge2 apply here as well

rownames(counts) <- as.character(adata$var_names)
colnames(counts) <- as.character(adata$obs_names)

seg_df <- scSEGIndex(exprs_mat = counts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment to document what you are doing here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants