Adding new method: scMerge2 #63
@lazappi Hi! This is Seo :)
Co-authored-by: Luke Zappia <[email protected]>
@mumichae Could you take a look at this PR?
@mumichae I fixed up the code with the help of your feedback!
Let me know if there is anything else that could be improved :)
Already looking a lot better!
There are still some computational bottlenecks worth solving, given that this method uses the complete count matrix and densifying is expensive.
adata <- anndata::read_h5ad(par$input)

anndataToUnsupervisedScMerge2 <- function(adata, top_n = 1000, verbose = TRUE) {
  counts <- t(as.matrix(adata$layers[["counts"]]))
Could you preserve the sparsity of the data? According to the documentation, scMerge should be able to deal with sparse matrices. If the matrix is already a dgeMatrix, you can avoid conversion.
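A minimal sketch of what preserving sparsity could look like, assuming the counts layer arrives as a sparse matrix from the Matrix package. The helper name `to_genes_by_cells` and the toy data are hypothetical, and whether scMerge2 accepts the resulting sparse object should be checked against its documentation:

```r
library(Matrix)

# Toy cells-by-genes sparse matrix standing in for adata$layers[["counts"]]
# (illustrative data only, not taken from the PR).
counts_layer <- Matrix::sparseMatrix(
  i = c(1, 2, 3, 3),
  j = c(1, 3, 2, 4),
  x = c(5, 1, 2, 7),
  dims = c(3, 4),
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# Hypothetical helper: transpose to genes-by-cells and only densify
# when the input is not already a sparse Matrix object.
to_genes_by_cells <- function(mat) {
  if (methods::is(mat, "sparseMatrix")) {
    Matrix::t(mat)      # stays sparse, no dense copy is made
  } else {
    t(as.matrix(mat))   # fallback for other inputs
  }
}

counts <- to_genes_by_cells(counts_layer)
```

The transpose keeps the dimnames, so the gene and cell names remain available as row and column names of `counts`.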
cat("Run unsupervised scMerge2\n")

scMerge2_res <- anndataToUnsupervisedScMerge2(adata, top_n = 1000L, verbose = TRUE)
Could you make top_n for the control genes a parameter that can be adjusted in the config.vsh.yaml?
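A rough sketch of how the script side of this could look, assuming the component exposes an argument named `top_n` that reaches the script through the `par` list. The argument name, the helper `select_control_genes`, and the toy `seg_df` are assumptions for illustration, not the PR's actual code:

```r
# `par` is normally provided by the component framework; it is defined
# here only so the sketch runs on its own (hypothetical value).
par <- list(top_n = 2L)

# Hypothetical helper: rank genes by their stability index (segIdx)
# and keep the top_n highest-ranked genes as controls.
select_control_genes <- function(seg_df, top_n) {
  seg_df <- seg_df[order(seg_df$segIdx, decreasing = TRUE), , drop = FALSE]
  rownames(seg_df)[seq_len(min(top_n, nrow(seg_df)))]
}

# Toy stand-in for the scSEGIndex() result (illustrative values only).
seg_df <- data.frame(
  segIdx = c(0.2, 0.9, 0.5),
  row.names = c("g1", "g2", "g3")
)

ctl <- select_control_genes(seg_df, top_n = par$top_n)
```

With this shape, the hard-coded `1000L` in the call would be replaced by `par$top_n`, and the default would live in the component config instead of the script.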
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)

scMerge2_res <- scMerge2(
  exprsMat = exprsMat,
  batch = batch,
  ctl = ctl,
  verbose = verbose
)
You can simplify the code a bit
- batch <- as.character(adata$obs$batch)
- cellTypes <- as.character(adata$obs$cell_type)
- scMerge2_res <- scMerge2(
-   exprsMat = exprsMat,
-   batch = batch,
-   ctl = ctl,
-   verbose = verbose
- )
+ scMerge2_res <- scMerge2(
+   exprsMat = exprsMat,
+   batch = as.character(adata$obs$batch),
+   ctl = ctl,
+   verbose = verbose
+ )
seg_df <- seg_df[order(seg_df$segIdx, decreasing = TRUE), , drop = FALSE]
ctl <- rownames(seg_df)[seq_len(min(top_n, nrow(seg_df)))]

exprsMat <- t(as.matrix(adata$layers[["normalized"]]))
Is densification via as.matrix really necessary here?
embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
rownames(embedding) <- colnames(corrected_mat)
PCA computation is not necessary for feature methods, because the "embedding" will be computed after the integration in a post-processing step. See ComBat for an example.
- embedding <- prcomp(t(corrected_mat))$x[, 1:10, drop = FALSE]
- rownames(embedding) <- colnames(corrected_mat)
obsm = list(
  X_emb = embedding[as.character(adata$obs_names), , drop = FALSE] # match input cells
),
PCA not needed here
- obsm = list(
-   X_emb = embedding[as.character(adata$obs_names), , drop = FALSE] # match input cells
- ),
batch <- as.character(adata$obs$batch)
cellTypes <- as.character(adata$obs$cell_type)

scMerge2_res <- scMerge2(
  exprsMat = exprsMat,
  batch = batch,
  cellTypes = cellTypes,
  ctl = ctl,
  verbose = verbose
)
Nitpick: simplify code
- batch <- as.character(adata$obs$batch)
- cellTypes <- as.character(adata$obs$cell_type)
- scMerge2_res <- scMerge2(
-   exprsMat = exprsMat,
-   batch = batch,
-   cellTypes = cellTypes,
-   ctl = ctl,
-   verbose = verbose
- )
+ scMerge2_res <- scMerge2(
+   exprsMat = exprsMat,
+   batch = as.character(adata$obs$batch),
+   cellTypes = as.character(adata$obs$cell_type),
+   ctl = ctl,
+   verbose = verbose
+ )
Comments from unsupervised scMerge2 apply here as well
rownames(counts) <- as.character(adata$var_names)
colnames(counts) <- as.character(adata$obs_names)

seg_df <- scSEGIndex(exprs_mat = counts)
Please add a comment to document what you are doing here.
Describe your changes
This PR adds a new method, scMerge2.
Checklist before requesting a review
I have performed a self-review of my code
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!