V1.2 #29
…n with visualizations
@reubenthomas I added the changes to `suggest_resolution` for the local optima filtering step (Lines 223 to 229 in b40b66b). Let me know if it looks good and I'll merge this in.
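The local optima filtering step mentioned here uses curvature-based detection via second-order finite differences. A minimal Python sketch of that idea follows. It assumes one summary score per resolution on a sorted, equally spaced grid, with higher scores being better. The function names `second_differences` and `local_optima_by_curvature` are hypothetical, and this is not the package's R implementation.

```python
def second_differences(scores):
    """Central second-order finite differences: f[i-1] - 2*f[i] + f[i+1].

    Assumes equally spaced resolutions, so the 1/h**2 factor is a positive
    constant and is omitted (it changes neither signs nor ordering).
    """
    return [scores[i - 1] - 2 * scores[i] + scores[i + 1]
            for i in range(1, len(scores) - 1)]


def local_optima_by_curvature(resolutions, scores):
    """Return interior resolutions that are local maxima, sharpest peak first.

    A point is a local maximum if its score beats both neighbours. Its
    second difference is then negative, and a more negative value means
    a sharper peak.
    """
    if len(resolutions) != len(scores):
        raise ValueError("resolutions and scores must have equal length")
    curvature = second_differences(scores)
    optima = []
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            # curvature[i - 1] is the second difference centred on index i
            optima.append((curvature[i - 1], resolutions[i]))
    # Sort by curvature ascending, so the most negative (sharpest) peak comes first.
    optima.sort()
    return [res for _, res in optima]
```

For example, with resolutions `[0.2, 0.4, 0.6, 0.8, 1.0, 1.2]` and scores `[0.50, 0.62, 0.55, 0.58, 0.71, 0.60]`, the sketch returns `[1.0, 0.4]`. Both are local maxima, and the peak at 1.0 has the more negative second difference (-0.24 versus -0.19).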
@natalie-23-gill, what would be the reason to scale the threshold based on the number of subjects? The Hellinger distance takes values between 0 and 1. The threshold you have employed, which uses the square root of the reciprocal of the number of samples, would make the reproducibility requirement more stringent as the number of samples grows. The choice of that specific function (square root of the reciprocal) would probably need to be optimized further. For now, I think it may be easier to use fixed thresholds of 0.1 and 0.2, given that they have a reasonable interpretation in terms of what the Hellinger distance is capturing. Of course, I'm open to changing this after hearing your thoughts.
@reubenthomas That makes sense; I misunderstood how the thresholds were derived for the simulations. I reverted to the simple fixed thresholds (Lines 230 to 242 in 51a5456).
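The thread settles on fixed Hellinger thresholds of 0.1 and 0.2. A minimal Python sketch of the distance between two cluster-proportion distributions (for example, training versus held-out subjects) follows, together with a fixed-threshold check. The function names and the three bucket labels are hypothetical, and the way the package applies the two thresholds is an assumption here.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1].

    Uses H(P, Q) = sqrt(1 - sum_i sqrt(p_i * q_i)), i.e. one minus the
    Bhattacharyya coefficient. Inputs are normalised, so raw counts work too.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("distributions must have positive mass")
    bc = sum(math.sqrt((pi / sp) * (qi / sq)) for pi, qi in zip(p, q))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def classify_reproducibility(distance, strict=0.1, lenient=0.2):
    """Bucket a distance using fixed thresholds (labels are hypothetical)."""
    if distance <= strict:
        return "reproducible"
    if distance <= lenient:
        return "borderline"
    return "not reproducible"
```

For example, cluster proportions of `[0.5, 0.5]` in training and `[0.6, 0.4]` in held-out subjects give a distance of about 0.071, which falls under the 0.1 threshold. Because the thresholds are fixed, they do not tighten as the number of samples grows, unlike the scaled variant discussed above.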
Closes #28 and #10
New features
- KL divergence (`calculate_kl_divergence()`) and Hellinger distance (`calculate_hellinger_distance()`) metrics for evaluating cluster distribution consistency between training and held-out subjects.
- Modularity (`calculate_modularity()`) computed on the precomputed SNN graph.
- MSE score (`calculate_mse_score()`) for centroid-based cluster quality evaluation.
- `suggest_resolution()` function that ranks resolutions using two complementary methods: direct rank aggregation across four metrics, and curvature-based local optima detection via second-order finite differences.
- `summarize_cv_metrics()` for per-resolution metric summaries.
- `plot_rank_metrics()` and `plot_mean_rank()` for visualization.

Performance improvements
- …growing it with `c()`, reducing memory allocation overhead for many subjects.
- Use `crossprod()` instead of explicit `t() %*%` in PCA projections, avoiding materialization of transposed gene-by-cell matrices.
- Use `Matrix::crossprod()` in the modularity calculation to stay in sparse matrix space and avoid dense transposition of the cluster indicator matrix.
- Replace `do.call(rbind, lapply(...))` with `t(vapply(...))` in the MAD calculation to avoid intermediate list allocation.
- …across resolutions) instead of recomputing per resolution.
- …column lookups.
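Modularity is computed on the precomputed SNN graph, and `Matrix::crossprod()` keeps that calculation in sparse matrix space. The same idea can be illustrated without any matrix library. With a cluster indicator matrix S, adjacency A and degree vector k, the within-cluster weights are the diagonal of SᵀAS and the cluster degrees are Sᵀk, and both can be accumulated in a single pass over the non-zero edges. The Python sketch below assumes an undirected, weighted graph without self-loops and the standard Newman-Girvan definition (resolution parameter 1). The edge-list input format is hypothetical, and this is not the package's R code.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity of a partition of an undirected weighted graph.

    edges: iterable of (i, j, w), each undirected edge listed once, no self-loops.
    membership: dict mapping node -> cluster label.

    Equivalent to Q = tr(S^T A S)/(2m) - ||S^T k||^2/(2m)^2, but it only
    visits non-zero edges, so no dense matrix is ever materialised.
    """
    within = defaultdict(float)  # diagonal of S^T A S, per cluster
    degree = defaultdict(float)  # S^T k: total degree per cluster
    two_m = 0.0                  # sum of all A_ij (each edge counted twice)
    for i, j, w in edges:
        ci, cj = membership[i], membership[j]
        # Each undirected edge contributes both A_ij and A_ji.
        degree[ci] += w
        degree[cj] += w
        two_m += 2 * w
        if ci == cj:
            within[ci] += 2 * w
    if two_m == 0:
        raise ValueError("graph has no edge weight")
    return sum(within[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)
```

For two triangles joined by a single edge, split into their natural two clusters, the sketch gives Q = 5/14 (about 0.357). Putting every node in one cluster gives Q = 0.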
Logging and verbosity
- The `verbose` parameter now accepts integer levels (0-3) for fine-grained control: 0 = silent, 1 = key milestones, 2 = detailed progress, 3 = Seurat output.
- `verbose = TRUE` maps to level 1 and `FALSE` to 0.
- `[step_name] Xs` log messages are emitted at `verbose >= 1`.

Bug fixes
Package reorganization
- Split `clustOpt.R` and `utils.R` into focused modules: `clust_opt.R`, `data_preparation.R`, `metrics.R`, `sketching.R`, `validation.R`, `visualization.R`.