diff --git a/CHANGELOG.md b/CHANGELOG.md index eae1069c..abf402f3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,16 +1,36 @@ # CHANGELOG +## v0.8.8 (2025-04-06) + +### Bug Fixes + +- **core**: Return hist_data instead of original data + ([`2734421`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/27344216b2d0f1fef43ac0e66fc1613ddfbf9349)) + + ## v0.8.7 (2025-04-03) ### Bug Fixes +- **_ripley**: Fixed conflicts + ([`fa4c06f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fa4c06f697ebe95438c3fc583e7767399b72dcf7)) + +- **_ripley**: Removed old call + ([`e89835b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e89835b339034d6c543bd4b6231508811828c26d)) + - **core**: Specify weights for all histplot calls ([`b661495`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b66149509fa4aa2280d14dfa6e83567c95c87cf8)) - **docstring**: Add returned df to doctstring ([`9caddca`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9caddca51e69e2213da866d9f74c6abe9ab7c181)) +- **interactive_spatial_plot**: Fixed typos in api name and arguments + ([`b833123`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b833123dc9f124abf34e539be553a8388048ee1b)) + +- **present_summary_as_figure**: Fixed json conversion when non python types are used + ([`61c8480`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/61c848051cf86f6cf1cff5fe9bf012bb1c12c9d2)) + - **relational_heatmap**: Adjusted the flipped axis labels ([`5d950bb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5d950bb0bb3dd2f45e32d4bd7c4aa15f939922ac)) @@ -29,37 +49,6 @@ - **tests**: Update tests to match new return param ([`da83e3d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/da83e3dcef1b7ccb7de6e6854613f63bef7e8780)) -### Features - -- **core**: Change histogram/boxplot return types to dicts - ([`bf160c2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bf160c24c8fca82ad9057fb2be049b200f4f7139)) - -- **core**: Changed histogram to precompute data. 
- ([`0201c6f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0201c6f153b440c7b93494fc5355dcf1fe446c28)) - -- **core**: Changed how boxplot return type is handled - ([`714bf98`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/714bf98e2acf8b42b39588ec866f87ac38979dd0)) - -- **core**: Use plotly figure for static plot instead of png - ([`5738234`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5738234fc7e16cc8f9c1e421343028976f73ecc2)) - - -## v0.8.6 (2025-03-18) - -### Bug Fixes - -- **_ripley**: Fixed conflicts - ([`fa4c06f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fa4c06f697ebe95438c3fc583e7767399b72dcf7)) - -- **_ripley**: Removed old call - ([`e89835b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e89835b339034d6c543bd4b6231508811828c26d)) - -- **interactive_spatial_plot**: Fixed typos in api name and arguments - ([`b833123`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b833123dc9f124abf34e539be553a8388048ee1b)) - -- **present_summary_as_figure**: Fixed json conversion when non python types are used - ([`61c8480`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/61c848051cf86f6cf1cff5fe9bf012bb1c12c9d2)) - ### Build System - Restored docker file to FNLCR organization @@ -67,6 +56,9 @@ ### Continuous Integration +- **version**: Automatic development release + ([`1ad51bb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1ad51bb329ec95c14de24e276bbf36c7375081b5)) + - **version**: Automatic development release ([`627b384`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/627b3846a0d913318c846fc73f11d6141fc6a64e)) @@ -93,6 +85,18 @@ - **_ripley_l_multiple**: Enabled edget correction to remove center cell near border ([`87324fd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/87324fd1f168df6a809cf94321d10c632d2b9448)) +- **core**: Change histogram/boxplot return types to dicts + ([`bf160c2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bf160c24c8fca82ad9057fb2be049b200f4f7139)) + +- **core**: Changed histogram to precompute 
data. + ([`0201c6f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0201c6f153b440c7b93494fc5355dcf1fe446c28)) + +- **core**: Changed how boxplot return type is handled + ([`714bf98`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/714bf98e2acf8b42b39588ec866f87ac38979dd0)) + +- **core**: Use plotly figure for static plot instead of png + ([`5738234`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5738234fc7e16cc8f9c1e421343028976f73ecc2)) + - **ripley_l**: Added edge correction parameter to the high level function ([`9a54f15`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9a54f150c88cd4e86c0559d22ecf3f663bc6afd9)) diff --git a/paper/figure.tif b/paper/figure.tif new file mode 100644 index 00000000..e78e70c8 Binary files /dev/null and b/paper/figure.tif differ diff --git a/paper/paper.bib b/paper/paper.bib new file mode 100644 index 00000000..1f902e1d --- /dev/null +++ b/paper/paper.bib @@ -0,0 +1,190 @@ +@article{Gerdes:2013, + abstract = {Limitations on the number of unique protein and DNA molecules that can be characterized microscopically in a single tissue specimen impede advances in understanding the biological basis of health and disease. Here we present a multiplexed fluorescence microscopy method (MxIF) for quantitative, single-cell, and subcellular characterization of multiple analytes in formalin-fixed paraffin-embedded tissue. Chemical inactivation of fluorescent dyes after each image acquisition round allows reuse of common dyes in iterative staining and imaging cycles. The mild inactivation chemistry is compatible with total and phosphoprotein detection, as well as DNA FISH. Accurate computational registration of sequential images is achieved by aligning nuclear counterstain-derived fiducial points. Individual cells, plasma membrane, cytoplasm, nucleus, tumor, and stromal regions are segmented to achieve cellular and subcellular quantification of multiplexed targets. 
In a comparison of pathologist scoring of diaminobenzidine staining of serial sections and automated MxIF scoring of a single section, human epidermal growth factor receptor 2, estrogen receptor, p53, and androgen receptor staining by diaminobenzidine and MxIF methods yielded similar results. Single-cell staining patterns of 61 protein antigens by MxIF in 747 colorectal cancer subjects reveals extensive tumor heterogeneity, and cluster analysis of divergent signaling through ERK1/2, S6 kinase 1, and 4E binding protein 1 provides insights into the spatial organization of mechanistic target of rapamycin and MAPK signal transduction. Our results suggest MxIF should be broadly applicable to problems in the fields of basic biological research, drug discovery and development, and clinical diagnostics.}, + author = {Gerdes, Michael J and Sevinsky, Christopher J and Sood, Anup and Adak, Sudeshna and Bello, Musodiq O and Bordwell, Alexander and Can, Ali and Corwin, Alex and Dinn, Sean and Filkins, Robert J and Hollman, Denise and Kamath, Vidya and Kaanumalle, Sireesha and Kenny, Kevin and Larsen, Melinda and Lazare, Michael and Li, Qing and Lowes, Christina and McCulloch, Colin C and McDonough, Elizabeth and Montalto, Michael C and Pang, Zhengyu and Rittscher, Jens and Santamaria-Pang, Alberto and Sarachan, Brion D and Seel, Maximilian L and Seppo, Antti and Shaikh, Kashan and Sui, Yunxia and Zhang, Jingyu and Ginty, Fiona}, + doi = {10.1073/pnas.1300136110}, + issn = {00278424}, + journal = {Proceedings of the National Academy of Sciences of the United States of America}, + keywords = {Cancer diagnostics,High-content cellular analysis,Image analysis,MTOR,Multiplexing}, + month = {jul}, + number = {29}, + pages = {11982--11987}, + pmid = {23818604}, + title = {{Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue}}, + volume = {110}, + year = {2013} +} + +@article{Nirmal:2024, + abstract = {Multiplexed imaging data are revolutionizing 
our understanding of the composition and organization of tissues and tumors. A critical aspect of such tissue profiling is quantifying the spatial relationships among cells at different scales from the interaction of neighboring cells to recurrent communities of cells of multiple types. This often involves statistical analysis of 10^7 or more cells in which up to 100 biomolecules (commonly proteins) have been measured. While software tools currently cater to the analysis of spatial transcriptomics data, there remains a need for toolkits explicitly tailored to the complexities of multiplexed imaging data including the need to seamlessly integrate image visualization with data analysis and exploration. We introduce SCIMAP, a Python package specifically crafted to address these challenges. With SCIMAP, users can efficiently preprocess, analyze, and visualize large datasets, facilitating the exploration of spatial relationships and their statistical significance. SCIMAP's modular design enables the integration of new algorithms, enhancing its capabilities for spatial analysis.}, + author = {Nirmal, Ajit J and Sorger, Peter K}, + doi = {10.21105/joss.06604}, + journal = {Journal of Open Source Software}, + month = {may}, + number = {97}, + pages = {6604}, + publisher = {The Open Journal}, + title = {{SCIMAP: A Python Toolkit for Integrated Spatial Analysis of Multiplexed Imaging Data}}, + volume = {9}, + year = {2024} +} + +@article{Goltsev:2018, + abstract = {A highly multiplexed cytometric imaging approach, termed co-detection by indexing (CODEX), is used here to create multiplexed datasets of normal and lupus (MRL/lpr) murine spleens. CODEX iteratively visualizes antibody binding events using DNA barcodes, fluorescent dNTP analogs, and an in situ polymerization-based indexing procedure. 
An algorithmic pipeline for single-cell antigen quantification in tightly packed tissues was developed and used to overlay well-known morphological features with de novo characterization of lymphoid tissue architecture at a single-cell and cellular neighborhood levels. We observed an unexpected, profound impact of the cellular neighborhood on the expression of protein receptors on immune cells. By comparing normal murine spleen to spleens from animals with systemic autoimmune disease (MRL/lpr), extensive and previously uncharacterized splenic cell-interaction dynamics in the healthy versus diseased state was observed. The fidelity of multiplexed spatial cytometry demonstrated here allows for quantitative systemic characterization of tissue architecture in normal and clinically aberrant samples. A DNA barcoding-based imaging technique uses multiplexed tissue antigen staining to enable the characterization of cell types and dynamics in a model of autoimmune disease.}, + author = {Goltsev, Yury and Samusik, Nikolay and Kennedy-Darling, Julia and Bhate, Salil and Hale, Matthew and Vazquez, Gustavo and Black, Sarah and Nolan, Garry P}, + doi = {10.1016/j.cell.2018.07.010}, + issn = {10974172}, + journal = {Cell}, + keywords = {CODEX,autoimmunity,immune tissue,microenvironment,multidimensional imaging,multiplexed imaging,niche,tissue architecture}, + month = {aug}, + number = {4}, + pages = {968--981.e15}, + pmid = {30078711}, + publisher = {Cell Press}, + title = {{Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging}}, + volume = {174}, + year = {2018} +} + +@article{Lin:2018, + author = {Lin, Jia-Ren and Izar, Benjamin and Sorger, Peter K}, + doi = {10.7554/eLife.31657.002}, + journal = {eLife}, + title = {{Highly multiplexed immunofluorescence imaging of human tissues and tumors using t-CyCIF and conventional optical microscopes}}, + year = {2018} +} + +@article{Palla:2022, + abstract = {Spatial omics data are advancing the study of tissue 
organization and cellular communication at an unprecedented scale. Flexible tools are required to store, integrate and visualize the large diversity of spatial omics data. Here, we present Squidpy, a Python framework that brings together tools from omics and image analysis to enable scalable description of spatial molecular data, such as transcriptome or multivariate proteins. Squidpy provides efficient infrastructure and numerous analysis methods that allow to efficiently store, manipulate and interactively visualize spatial omics data. Squidpy is extensible and can be interfaced with a variety of already existing libraries for the scalable analysis of spatial omics data.}, + author = {Palla, Giovanni and Spitzer, Hannah and Klein, Michal and Fischer, David and Schaar, Anna Christina and Kuemmerle, Louis Benedikt and Rybakov, Sergei and Ibarra, Ignacio L and Holmberg, Olle and Virshup, Isaac and Lotfollahi, Mohammad and Richter, Sabrina and Theis, Fabian J}, + doi = {10.1038/s41592-021-01358-2}, + issn = {15487105}, + journal = {Nature Methods}, + month = {feb}, + number = {2}, + pages = {171--178}, + pmid = {35102346}, + publisher = {Nature Research}, + title = {{Squidpy: a scalable framework for spatial omics analysis}}, + volume = {19}, + year = {2022} +} + +@article{Dries:2021, + abstract = {Spatial transcriptomic and proteomic technologies have provided new opportunities to investigate cells in their native microenvironment. Here we present Giotto, a comprehensive and open-source toolbox for spatial data analysis and visualization. The analysis module provides end-to-end analysis by implementing a wide range of algorithms for characterizing tissue composition, spatial expression patterns, and cellular interactions. Furthermore, single-cell RNAseq data can be integrated for spatial cell-type enrichment analysis. The visualization module allows users to interactively visualize analysis outputs and imaging features. 
To demonstrate its general applicability, we apply Giotto to a wide range of datasets encompassing diverse technologies and platforms.}, + author = {Dries, Ruben and Zhu, Qian and Dong, Rui and Eng, Chee Huat Linus and Li, Huipeng and Liu, Kan and Fu, Yuntian and Zhao, Tianxiao and Sarkar, Arpan and Bao, Feng and George, Rani E and Pierson, Nico and Cai, Long and Yuan, Guo Cheng}, + doi = {10.1186/s13059-021-02286-2}, + issn = {1474760X}, + journal = {Genome Biology}, + month = {dec}, + number = {1}, + pmid = {33685491}, + publisher = {BioMed Central Ltd}, + title = {{Giotto: a toolbox for integrative analysis and visualization of spatial expression data}}, + volume = {22}, + year = {2021} +} + +@article{Giraldo:2021, + abstract = {Multiplex immunofluorescence (mIF) can detail spatial relationships and complex cell phenotypes in the tumor microenvironment (TME). However, the analysis and visualization of mIF data can be complex and time-consuming. Here, we used tumor specimens from 93 patients with metastatic melanoma to develop and validate a mIF data analysis pipeline using established flow cytometry workflows (image cytometry). Unlike flow cytometry, spatial information from the TME was conserved at single-cell resolution. A spatial uniform manifold approximation and projection (UMAP) was constructed using the image cytometry output. Spatial UMAP subtraction analysis (survivors vs. nonsurvivors at 5 years) was used to identify topographic and coexpression signatures with positive or negative prognostic impact. Cell densities and proportions identified by image cytometry showed strong correlations when compared with those obtained using gold-standard, digital pathology software (R2 > 0.8). The associated spatial UMAP highlighted “immune neighborhoods” and associated topographic immunoactive protein expression patterns. 
We found that PD-L1 and PD-1 expression intensity was spatially encoded—the highest PD-L1 expression intensity was observed on CD163+ cells in neighborhoods with high CD8+ cell density, and the highest PD-1 expression intensity was observed on CD8+ cells in neighborhoods with dense arrangements of tumor cells. Spatial UMAP subtraction analysis revealed numerous spatial clusters associated with clinical outcome. The variables represented in the key clusters from the unsupervised UMAP analysis were validated using established, supervised approaches. In conclusion, image cytometry and the spatial UMAPs presented herein are powerful tools for the visualization and interpretation of single-cell, spatially resolved mIF data and associated topographic biomarker development.}, + author = {Giraldo, Nicolas A and Berry, Sneha and Becht, Etienne and Ates, Deniz and Schenk, Kara M and Engle, Elizabeth L and Green, Benjamin and Nguyen, Peter and Soni, Abha and Stein, Julie E and Succaria, Farah and Ogurtsova, Aleksandra and Xu, Haiying and Gottardo, Raphael and Anders, Robert A and Lipson, Evan J and Danilova, Ludmila and Baras, Alexander S and Taube, Janis M}, + doi = {10.1158/2326-6066.CIR-21-0015}, + issn = {23266074}, + journal = {Cancer Immunology Research}, + month = {nov}, + number = {11}, + pages = {1262--1269}, + pmid = {34433588}, + publisher = {American Association for Cancer Research Inc.}, + title = {{Spatial UMAP and image cytometry for topographic immuno-oncology biomarker discovery}}, + volume = {9}, + year = {2021} +} + +@article{Long:2023, + abstract = {Spatial transcriptomics technologies generate gene expression profiles with spatial context, requiring spatially informed analysis tools for three key tasks, spatial clustering, multisample integration, and cell-type deconvolution. We present GraphST, a graph self-supervised contrastive learning method that fully exploits spatial transcriptomics data to outperform existing methods. 
It combines graph neural networks with self-supervised contrastive learning to learn informative and discriminative spot representations by minimizing the embedding distance between spatially adjacent spots and vice versa. We demonstrated GraphST on multiple tissue types and technology platforms. GraphST achieved 10% higher clustering accuracy and better delineated fine-grained tissue structures in brain and embryo tissues. GraphST is also the only method that can jointly analyze multiple tissue slices in vertical or horizontal integration while correcting batch effects. Lastly, GraphST demonstrated superior cell-type deconvolution to capture spatial niches like lymph node germinal centers and exhausted tumor infiltrating T cells in breast tumor tissue.}, + author = {Long, Yahui and Ang, Kok Siong and Li, Mengwei and Chong, Kian Long Kelvin and Sethi, Raman and Zhong, Chengwei and Xu, Hang and Ong, Zhiwei and Sachaphibulkij, Karishma and Chen, Ao and Zeng, Li and Fu, Huazhu and Wu, Min and Lim, Lina Hsiu Kim and Liu, Longqi and Chen, Jinmiao}, + doi = {10.1038/s41467-023-36796-3}, + issn = {20411723}, + journal = {Nature Communications}, + month = {dec}, + number = {1}, + pmid = {36859400}, + publisher = {Nature Research}, + title = {{Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST}}, + volume = {14}, + year = {2023} +} + +@article{Hao:2021, + abstract = {The simultaneous measurement of multiple modalities represents an exciting frontier for single-cell genomics and necessitates computational methods that can define cellular states based on multimodal data. Here, we introduce “weighted-nearest neighbor” analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities. 
We apply our procedure to a CITE-seq dataset of 211,000 human peripheral blood mononuclear cells (PBMCs) with panels extending to 228 antibodies to construct a multimodal reference atlas of the circulating immune system. Multimodal analysis substantially improves our ability to resolve cell states, allowing us to identify and validate previously unreported lymphoid subpopulations. Moreover, we demonstrate how to leverage this reference to rapidly map new datasets and to interpret immune responses to vaccination and coronavirus disease 2019 (COVID-19). Our approach represents a broadly applicable strategy to analyze single-cell multimodal datasets and to look beyond the transcriptome toward a unified and multimodal definition of cellular identity.}, + author = {Hao, Yuhan and Hao, Stephanie and Andersen-Nissen, Erica and Mauck, William M and Zheng, Shiwei and Butler, Andrew and Lee, Maddie J and Wilk, Aaron J and Darby, Charlotte and Zager, Michael and Hoffman, Paul and Stoeckius, Marlon and Papalexi, Efthymia and Mimitou, Eleni P and Jain, Jaison and Srivastava, Avi and Stuart, Tim and Fleming, Lamar M and Yeung, Bertrand and Rogers, Angela J and McElrath, Juliana M and Blish, Catherine A and Gottardo, Raphael and Smibert, Peter and Satija, Rahul}, + doi = {10.1016/j.cell.2021.04.048}, + issn = {10974172}, + journal = {Cell}, + keywords = {CITE-seq,COVID-19,T cell,immune system,multimodal analysis,reference mapping,single cell genomics}, + month = {jun}, + number = {13}, + pages = {3573--3587.e29}, + pmid = {34062119}, + publisher = {Elsevier B.V.}, + title = {{Integrated analysis of multimodal single-cell data}}, + volume = {184}, + year = {2021} +} + +@article{Mah:2024, + abstract = {The spatial organization of molecules in a cell is essential for their functions. While current methods focus on discerning tissue architecture, cell–cell interactions, and spatial expression patterns, they are limited to the multicellular scale. 
We present Bento, a Python toolkit that takes advantage of single-molecule information to enable spatial analysis at the subcellular scale. Bento ingests molecular coordinates and segmentation boundaries to perform three analyses: defining subcellular domains, annotating localization patterns, and quantifying gene–gene colocalization. We demonstrate MERFISH, seqFISH+, Molecular Cartography, and Xenium datasets. Bento is part of the open-source Scverse ecosystem, enabling integration with other single-cell analysis tools.}, + author = {Mah, Clarence K and Ahmed, Noorsher and Lopez, Nicole A and Lam, Dylan C and Pong, Avery and Monell, Alexander and Kern, Colin and Han, Yuanyuan and Prasad, Gino and Cesnik, Anthony J and Lundberg, Emma and Zhu, Quan and Carter, Hannah and Yeo, Gene W}, + doi = {10.1186/s13059-024-03217-7}, + issn = {1474760X}, + journal = {Genome Biology}, + month = {dec}, + number = {1}, + publisher = {BioMed Central Ltd}, + title = {{Bento: a toolkit for subcellular analysis of spatial transcriptomics data}}, + volume = {25}, + year = {2024} +} + +@article{Feng:2023, + abstract = {Spatial proteomics technologies have revealed an underappreciated link between the location of cells in tissue microenvironments and the underlying biology and clinical features, but there is significant lag in the development of downstream analysis methods and benchmarking tools. Here we present SPIAT (spatial image analysis of tissues), a spatial-platform agnostic toolkit with a suite of spatial analysis algorithms, and spaSim (spatial simulator), a simulator of tissue spatial data. SPIAT includes multiple colocalization, neighborhood and spatial heterogeneity metrics to characterize the spatial patterns of cells. Ten spatial metrics of SPIAT are benchmarked using simulated data generated with spaSim. We show how SPIAT can uncover cancer immune subtypes correlated with prognosis in cancer and characterize cell dysfunction in diabetes. 
Our results suggest SPIAT and spaSim as useful tools for quantifying spatial patterns, identifying and validating correlates of clinical outcomes and supporting method development.}, + author = {Feng, Yuzhou and Yang, Tianpei and Zhu, John and Li, Mabel and Doyle, Maria and Ozcoban, Volkan and Bass, Greg T and Pizzolla, Angela and Cain, Lachlan and Weng, Sirui and Pasam, Anupama and Kocovski, Nikolce and Huang, Yu Kuan and Keam, Simon P and Speed, Terence P and Neeson, Paul J and Pearson, Richard B and Sandhu, Shahneen and Goode, David L and Trigos, Anna S}, + doi = {10.1038/s41467-023-37822-0}, + issn = {20411723}, + journal = {Nature Communications}, + month = {dec}, + number = {1}, + pmid = {37188662}, + publisher = {Nature Research}, + title = {{Spatial analysis with SPIAT and spaSim to characterize and simulate tissue microenvironments}}, + volume = {14}, + year = {2023} +} + +@misc{Keretsu:2022, + abstract = {

Glioblastoma (GBM) is the most aggressive primary brain cancer in adults and remains incurable. Our study revealed an immunosuppressive role of mucosal-associated invariant T (MAIT) cells in GBM. In bulk RNA sequencing data analysis of GBM tissues, MAIT cell gene signature significantly correlated with poor patient survival. A scRNA-seq of CD45+ cells from 23 GBM tissue samples showed 15 (65.2%) were positive for MAIT cells and the enrichment of MAIT17. The MAIT cell signature significantly correlated with the activity of tumor-associated neutrophils (TANs) and myeloid-derived suppressor cells (MDSCs). Multiple immune suppressive genes known to be used by TANs/MDSCs were upregulated in MAIT-positive tumors. Spatial imaging analysis of GBM tissues showed that all specimens were positive for both MAIT cells and TANs and localized enrichment of TANs. These findings highlight the MAIT-TAN/MDSC axis as a novel therapeutic target to modulate GBM's immunosuppressive tumor microenvironment.

}, + author = {Keretsu, Seketoulie and Hana, Taijun and Lee, Alexander and Kedei, Noemi and Malik, Nargis and Kim, Hye and Spurgeon, Jo and Khayrullina, Guzal and Ruf, Benjamin and Hara, Ayaka and Coombs, Morgan and Watowich, Matthew and Hari, Ananth and Ford, Michael K B and Sahinalp, Cenk and Watanabe, Masashi and Zaki, George and Gilbert, Mark R and Cimino, Patrick. J and Prins, Robert and Terabe, Masaki}, + doi = {10.1101/2022.07.17.499189}, + institution = {bioRxiv}, + month = {jul}, + title = {{MAIT cells have a negative impact on glioblastoma}}, + url = {http://biorxiv.org/lookup/doi/10.1101/2022.07.17.499189}, + year = {2022} +} + +@misc{CodeOcean, + author = {{Code Ocean}}, + title = {{Code Ocean}}, + url = {https://codeocean.com/}, + urldate = {2025-04-01}, + year = {2019} +} + +@misc{PalantirTechnologies, + author = {{Palantir Technologies}}, + booktitle = {Palantir Technologies}, + title = {{Palantir Foundry Documentation}}, + url = {https://palantir.com/docs/foundry/}, + urldate = {2025-04-01}, + year = {2003} +} diff --git a/paper/paper.md b/paper/paper.md new file mode 100644 index 00000000..6e7ca005 --- /dev/null +++ b/paper/paper.md @@ -0,0 +1,71 @@ +--- +title: 'SPAC: A Python Package for Spatial Single-Cell Analysis of Multiplexed Imaging' +tags: + - multiplexed imaging + - spatial proteomics + - single-cell analysis + - tumor microenvironment +authors: + - name: Fang Liu + orcid: 0000-0002-4283-8325 + affiliation: 1 + - name: Rui He + affiliation: 2 + - name: Andrei Bombin + affiliation: 3 + - name: Ahmad B. Abdallah + affiliation: 4 + - name: Omar Eldaghar + affiliation: 4 + - name: Tommy R. Sheeley + affiliation: 4 + - name: Sam E. 
Ying + affiliation: 4 + - name: George Zaki + orcid: 0000-0002-2740-3307 + corresponding: true + affiliation: 1 +affiliations: + - index: 1 + name: Frederick National Laboratory for Cancer Research, United States + - index: 2 + name: Essential Software Inc., United States + - index: 3 + name: Axle Informatics, United States + - index: 4 + name: Purdue University, United States +date: 12 April 2025 +bibliography: paper.bib +--- + +# Summary + +Multiplexed immunofluorescence microscopy captures detailed measurements of multiple spatially resolved biomarkers simultaneously, revealing tissue composition and cellular interactions in situ among single cells. The growing scale and dimensional complexity of these datasets demand reproducible, comprehensive, and user-friendly computational tools. To address this need, we developed SPAC (**SPA**tial single-**C**ell analysis), a Python-based package and a corresponding Shiny application within an integrated, modular SPAC ecosystem designed specifically for biologists without extensive coding expertise. Following image segmentation and extraction of spatially resolved single-cell data, SPAC streamlines downstream phenotyping and spatial analysis, facilitating characterization of cellular heterogeneity and spatial organization within tissues. Through scalable performance, specialized spatial statistics, highly customizable visualizations, and seamless workflows from dataset to insights, SPAC significantly lowers barriers to sophisticated spatial analyses. + +# Statement of Need + +Advanced multiplex imaging technologies, such as CODEX [@Goltsev:2018], MxIF [@Gerdes:2013], and CyCIF [@Lin:2018], generate high-dimensional datasets capable of profiling up to dozens of biomarkers simultaneously. 
Analyzing and interpreting these complex spatial protein data pose significant computational challenges, especially given that high-resolution whole-slide imaging data can reach hundreds of gigabytes in size and contain millions of cells across extensive tissue areas. Currently, many spatial biology tools (e.g., Seurat [@Hao:2021], GraphST [@Long:2023], and Bento [@Mah:2024]) primarily address spatial transcriptomics and cannot directly handle multiplexed protein imaging data. Other specialized software such as SPIAT [@Feng:2023], Giotto [@Dries:2021], Squidpy [@Palla:2022], and SCIMAP [@Nirmal:2024] provide valuable capabilities tailored for spatial protein analyses. However, these tools lack the flexibility and customization options necessary to meet the diverse, scalable analysis and visualization needs of non-technical users. + +To address this gap, we developed the SPAC Python package and the web-based SPAC Shiny application, which together enhance analytical capabilities through intuitive terminology, optimized computational performance, specialized spatial statistics, and extensive visualization configurations. Results computed with the SPAC Python package are stored as AnnData objects and can be interactively explored in real time via the SPAC Shiny web application, enabling researchers to dynamically visualize data, toggle annotations, inspect cell populations, and compare experimental conditions without requiring extensive computational expertise. + +Specifically, SPAC uses biologist-friendly terminology to simplify technical AnnData concepts. In SPAC, \"cells\" are rows in the data matrix, \"features\" denote protein expression levels, \"tables\" contain transformed data layers, \"associated tables\" store spatial coordinates or dimensionality reductions (e.g., UMAP embeddings), and \"annotations\" indicate cell phenotypes, experimental labels, slide identifiers, and other categorical data. 
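The slot-by-slot terminology mapping above can be sketched with plain NumPy arrays; the dictionary below only mimics the layout of an AnnData-style object for illustration and is not SPAC's actual API.

```python
import numpy as np

# Illustrative sketch (not SPAC's actual API): how SPAC's terms map onto
# the slots of an AnnData-style object.
n_cells, n_features = 5, 3
adata_like = {
    # "cells" are rows and "features" are columns of the data matrix
    "X": np.random.rand(n_cells, n_features),
    # "tables" are transformed data layers
    "layers": {"arcsinh": np.arcsinh(np.random.rand(n_cells, n_features))},
    # "associated tables" hold spatial coordinates or embeddings
    "obsm": {"spatial": np.random.rand(n_cells, 2)},
    # "annotations" are per-cell categorical labels
    "obs": {"phenotype": ["T cell", "B cell", "T cell", "Tumor", "Tumor"]},
}

assert adata_like["X"].shape == (n_cells, n_features)
assert adata_like["obsm"]["spatial"].shape == (n_cells, 2)
```

In a real AnnData object the same information lives in `adata.X`, `adata.layers`, `adata.obsm`, and `adata.obs`, respectively.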
+ +To address real-time scalability challenges in analyzing large multiplex imaging datasets (exceeding 10 million cells), SPAC improves computational efficiency more than fivefold by integrating optimized numerical routines from NumPy\'s compiled C-based backend. Traditional visualization libraries, such as seaborn, were computationally inefficient at this scale. SPAC's modified routines reduce visualization processing times from tens of seconds to a few seconds for generating histograms, box plots, and other visualizations involving millions of cells. + +SPAC introduces specialized functions that enhance conventional spatial analyses. For example, SPAC implements a specialized variant of Ripley's L statistic to evaluate clustering or dispersion between predefined cell phenotype pairs---a "center" phenotype relative to a "neighbor" phenotype. Unlike generalized Ripley's implementations (e.g., Squidpy), SPAC explicitly distinguishes phenotype pairings and employs edge correction by excluding cells located within the analytical radius of the region\'s borders, mitigating edge-effect biases and enhancing statistical reliability. Furthermore, SPAC supports flexible phenotyping methods, accommodating both manual and unsupervised approaches tailored to diverse experimental designs and biological questions. It also implements efficient neighborhood profiling via a KDTree-based approach, quantifying the distribution of neighboring cell phenotypes within user-defined distance bins. The resulting three-dimensional array, capturing the local cellular microenvironment, is stored in the AnnData object and supports dimensionality reduction methods like spatial UMAP [@Giraldo:2021]. This enhances comparative analysis and visualization of complex spatial relationships across multiple slides and phenotype combinations. + +SPAC provides customizable visualization methods, leveraging Plotly\'s interactive capabilities for dynamic exploration of spatial data. 
Interactive spatial plots allow users to toggle features (e.g., biomarkers) and multiple annotations simultaneously, while a pin-color option ensures consistent color mapping across analyses. These designs help researchers intuitively explore spatial relationships by switching between different cell populations and identify patterns before performing detailed quantitative analyses. In addition, SPAC supports comparative visualization, such as overlaying manual classifications with unsupervised clustering or comparing spatial distributions across experimental conditions or treatments. It also enhances core analytical functions (e.g., nearest neighbor computations using SCIMAP\'s spatial distance calculations) by integrating extensive visualization configurations, including subgroup analyses, subset plots, and faceted layouts, allowing tailored visual outputs for various experimental contexts and research questions. + +# Structure and Implementation + +The SPAC package is available at [GitHub](https://github.com/FNLCR-DMAP/SCSAWorkflow) and can be installed locally via conda. It includes five modules that streamline data processing, transformation, spatial analysis, visualization, and utility functions. The data utils module standardizes data into AnnData objects, manages annotations, rescales and normalizes features, and performs filtering, downsampling, and essential spatial computations (e.g., centroid calculation). The transformation tools module employs clustering algorithms (e.g., Phenograph, UTAG, KNN), dimensionality reduction, and normalization methods (batch normalization, z-score, arcsinh) to translate high-dimensional data into biologically interpretable results. The spatial analysis module offers specialized functions like spatial interaction matrices, Ripley's L statistic with edge correction, and efficient KDTree-based neighborhood profiling. It supports stratified analyses, capturing spatial signatures of cell phenotypes. 
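The KDTree-based neighborhood profiling can be illustrated with a small SciPy sketch (an expository re-implementation under assumed variable names, not SPAC's actual code): for each cell, count neighboring cells of each phenotype within user-defined distance bins, producing a (cells x distance bins x phenotypes) array.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative data: 1000 cell centroids with one of 3 phenotype codes
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(1000, 2))
phenotypes = rng.integers(0, 3, size=1000)
bins = [0, 10, 20, 40]                         # distance bin edges

tree = cKDTree(coords)
n_bins, n_pheno = len(bins) - 1, 3
profile = np.zeros((len(coords), n_bins, n_pheno), dtype=int)

for i, xy in enumerate(coords):
    # neighbors within the largest radius, excluding the center cell itself
    idx = [j for j in tree.query_ball_point(xy, r=bins[-1]) if j != i]
    dists = np.linalg.norm(coords[idx] - xy, axis=1)
    which_bin = np.digitize(dists, bins) - 1
    for j, b in zip(idx, which_bin):
        if 0 <= b < n_bins:
            profile[i, b, phenotypes[j]] += 1
```

SPAC stores the analogous three-dimensional array in the AnnData object so it can feed downstream dimensionality reduction such as spatial UMAP.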
The visualization module provides interactive and customizable visualizations, allowing dynamic exploration of spatial relationships and comparative visualization across experimental conditions. The utils module includes helper functions for input validation, naming conventions, regex searches, and user-friendly error handling to ensure data integrity. + +All SPAC modules are interoperable, forming a cohesive workflow (\autoref{fig:workflow}). By adopting the AnnData format, SPAC ensures broad compatibility with existing single-cell analysis tools, produces high-quality figures, and facilitates easy export for external analyses. SPAC adheres to enterprise-level software engineering standards, featuring extensive unit testing, rigorous edge-case evaluation, comprehensive logging, and clear, context-rich error handling. These practices ensure reliability, adaptability, and ease of use across various deployment environments, including interactive Jupyter notebooks, analytic platforms (e.g., Code Ocean [@CodeOcean], Palantir Foundry [@PalantirTechnologies]), and real-time dashboards such as Shiny. Emphasizing readability and maintainability, SPAC provides a versatile and enhanced analytical solution for spatial single-cell analyses. To date, SPAC has been used in the analysis of over 8 datasets with over 30 million cells across diverse studies [@Keretsu:2022]. + +![Overview of the SPAC Workflow. The schematic presents an integrated pipeline for spatial single-cell analysis. 
Segmented cell data with spatial coordinates from various imaging platforms are ingested, normalized, clustered and phenotyped, and analyzed spatially to assess cell distribution and interactions while maintaining consistent data lineage.\label{fig:workflow}](figure.tif) + +# Acknowledgements + +We thank our collaborators at the National Cancer Institute Frederick National Laboratory, the Purdue Data Mine program, and the single-cell and spatial imaging communities for their essential contributions and resources. + +# References diff --git a/setup.py b/setup.py index 5c3873e4..2078b19d 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='spac', - version="0.8.7", + version="0.8.8", description=( 'SPatial Analysis for single-Cell analysis (SPAC)' 'is a Scalable Python package for single-cell spatial protein data ' diff --git a/src/spac/__init__.py b/src/spac/__init__.py index c010a40d..2746d908 100644 --- a/src/spac/__init__.py +++ b/src/spac/__init__.py @@ -22,7 +22,7 @@ functions.extend(module_functions) # Define the package version before using it in __all__ -__version__ = "0.8.7" +__version__ = "0.8.8" # Define a __all__ list to specify which functions should be considered public __all__ = functions diff --git a/src/spac/transformations.py b/src/spac/transformations.py index f497d402..55236349 100644 --- a/src/spac/transformations.py +++ b/src/spac/transformations.py @@ -11,6 +11,8 @@ from scipy.sparse import issparse from typing import List, Union, Optional from numpy.lib import NumpyVersion +from sklearn.neighbors import KNeighborsClassifier +from sklearn.preprocessing import LabelEncoder import multiprocessing import parmap from spac.utag_functions import utag @@ -104,6 +106,120 @@ def phenograph_clustering( adata.uns["phenograph_features"] = features +def knn_clustering( + adata, + features, + annotation, + layer=None, + k=50, + output_annotation="knn", + associated_table=None, + missing_label="no_label", + **kwargs): + """ + Calculate knn 
clusters using sklearn KNeighborsClassifier. + + The function will add these two attributes to `adata`: + `.obs[output_annotation]` + The assigned int64 class labels by KNeighborsClassifier + + `.uns[f"{output_annotation}_features"]` + The features used to calculate the knn clusters + + Parameters + ---------- + adata : anndata.AnnData + The AnnData object. + + features : list of str + The features to include when fitting the KNN classifier. + + annotation : str + The name of the annotation used for classifying the data. + + layer : str, optional + The layer to be used. + + k : int, optional + The number of nearest neighbors used by the classifier. + + output_annotation : str, optional + The name of the output annotation where the clusters are stored. + + associated_table : str, optional + If set, use the corresponding key in `adata.obsm` to calculate the + clustering. Takes priority over the layer argument. + + missing_label : str or int + The value of missing annotations in adata.obs[annotation]. + + Returns + ------- + None + adata is updated in place + """ + + # read in data, validate annotation in the call here + _validate_transformation_inputs( + adata=adata, + layer=layer, + associated_table=associated_table, + features=features, + annotation=annotation, + ) + + if not isinstance(k, int) or k <= 0: + raise ValueError( + f"`k` must be a positive integer. Received value: `{k}`" + ) + + data = _select_input_features( + adata=adata, + layer=layer, + associated_table=associated_table, + features=features + ) + + # boolean masks for labeled and unlabeled data + annotation_data = adata.obs[annotation] + annotation_mask = annotation_data != missing_label + annotation_mask &= pd.notnull(annotation_data) + unlabeled_mask = ~annotation_mask + + # check that annotation is non-trivial + if all(annotation_mask): + raise ValueError( + f"All cells are labeled in the annotation `{annotation}`." + " Please provide a mix of labeled and unlabeled data." 
+ ) + elif not any(annotation_mask): + raise ValueError( + f"No cells are labeled in the annotation `{annotation}`." + " Please provide a mix of labeled and unlabeled data." + ) + + # fit knn classifier to labeled data and predict on unlabeled data + data_labeled = data[annotation_mask] + label_encoder = LabelEncoder() + annotation_labeled = label_encoder.fit_transform( + annotation_data[annotation_mask] + ) + + classifier = KNeighborsClassifier(n_neighbors=k, **kwargs) + classifier.fit(data_labeled, annotation_labeled) + + data_unlabeled = data[unlabeled_mask] + knn_predict = classifier.predict(data_unlabeled) + predicted_labels = label_encoder.inverse_transform(knn_predict) + + # place predictions and existing labels in the output annotation; + # use .loc to avoid pandas chained-assignment issues + adata.obs[output_annotation] = pd.Series( + np.nan, index=adata.obs.index, dtype=object + ) + adata.obs.loc[unlabeled_mask, output_annotation] = predicted_labels + adata.obs.loc[annotation_mask, output_annotation] = \ + annotation_data[annotation_mask] + adata.uns[f"{output_annotation}_features"] = features + + def get_cluster_info(adata, annotation, features=None, layer=None): """ Retrieve information about clusters based on specific annotation. @@ -316,7 +432,8 @@ def _validate_transformation_inputs( adata: anndata, layer: Optional[str] = None, associated_table: Optional[str] = None, - features: Optional[Union[List[str], str]] = None + features: Optional[Union[List[str], str]] = None, + annotation: Optional[str] = None, ) -> None: """ Validate inputs for transformation functions. @@ -331,6 +448,8 @@ def _validate_transformation_inputs( Name of the key in `obsm` that contains the numpy array. features : list of str or str, optional Names of features to use for transformation. 
+ annotation: str, optional + Name of annotation column in `obs` that contains class labels Raises ------ @@ -355,6 +474,9 @@ def _validate_transformation_inputs( if features is not None: check_feature(adata, features=features) + if annotation is not None: + check_annotation(adata, annotations=annotation) + def _select_input_features(adata: anndata, layer: str = None, @@ -1088,13 +1210,13 @@ def run_utag_clustering( adata : anndata.AnnData The AnnData object. features : list - List of features to use for clustering or for PCA. Default + List of features to use for clustering or for PCA. Default (None) is to use all. k : int The number of nearest neighbor to be used in creating the graph. Default is 15. resolution : float - Resolution parameter for the clustering, higher resolution produces + Resolution parameter for the clustering, higher resolution produces more clusters. Default is 1. max_dist : float Maximum distance to cut edges within a graph. Default is 20. @@ -1107,8 +1229,8 @@ def run_utag_clustering( n_iterations : int Number of iterations for the clustering. slide_key: str - Key of adata.obs containing information on the batch structure - of the data.In general, for image data this will often be a variable + Key of adata.obs containing information on the batch structure + of the data.In general, for image data this will often be a variable indicating the imageb so image-specific effects are removed from data. Default is "Slide". @@ -1118,14 +1240,14 @@ def run_utag_clustering( Updated AnnData object with clustering results. 
""" resolutions = [resolution] - + _validate_transformation_inputs( adata=adata, layer=layer, associated_table=associated_table, features=features ) - + # add print the current k value if not isinstance(k, int) or k <= 0: raise ValueError(f"`k` must be a positive integer, but received {k}.") @@ -1143,7 +1265,7 @@ def run_utag_clustering( adata_utag.X = data else: adata_utag = adata.copy() - + utag_results = utag( adata_utag, slide_key=slide_key, @@ -1152,14 +1274,14 @@ def run_utag_clustering( apply_clustering=True, clustering_method="leiden", resolutions=resolutions, - leiden_kwargs={"n_iterations": n_iterations, + leiden_kwargs={"n_iterations": n_iterations, "random_state": random_state}, n_pcs=n_pcs, parallel=parallel, processes=n_jobs, k=k, ) - # change camel case to snake + # change camel case to snake curClusterCol = 'UTAG Label_leiden_' + str(resolution) cluster_list = utag_results.obs[curClusterCol].copy() adata.obs[output_annotation] = cluster_list.copy() diff --git a/src/spac/utils.py b/src/spac/utils.py index 634388c1..3a95568d 100644 --- a/src/spac/utils.py +++ b/src/spac/utils.py @@ -1007,7 +1007,11 @@ def get_defined_color_map(adata, defined_color_map=None, annotations=None, "an annotation column must be specified." 
) # Generate a color mapping based on unique values in the annotation - unique_labels = np.unique(adata.obs[annotations].values) + if isinstance(annotations, str): + annotations = [annotations] + combined_labels = np.concatenate( + [adata.obs[col].astype(str).values for col in annotations]) + unique_labels = np.unique(combined_labels) return color_mapping( unique_labels, color_map=colorscale, diff --git a/src/spac/visualization.py b/src/spac/visualization.py index 0ab0ee11..80d6e8ff 100644 --- a/src/spac/visualization.py +++ b/src/spac/visualization.py @@ -753,9 +753,9 @@ def calculate_histogram(data, bins, bin_edges=None): ax.set_ylabel(ylabel) if len(axs) == 1: - return {"fig": fig, "axs": axs[0], "df": plot_data} + return {"fig": fig, "axs": axs[0], "df": hist_data} else: - return {"fig": fig, "axs": axs, "df": plot_data} + return {"fig": fig, "axs": axs, "df": hist_data} def heatmap(adata, column, layer=None, **kwargs): """ diff --git a/tests/test_transformations/test_knn_clustering.py b/tests/test_transformations/test_knn_clustering.py new file mode 100644 index 00000000..8efd1d97 --- /dev/null +++ b/tests/test_transformations/test_knn_clustering.py @@ -0,0 +1,233 @@ +import unittest +import numpy as np +import pandas as pd +from anndata import AnnData +from spac.transformations import knn_clustering + + +class TestKnnClustering(unittest.TestCase): + def setUp(self): + """ + Set up a test environment for KNN clustering. + + This method is run before each test in the TestKnnClustering class. It initializes a synthetic + AnnData object (`adata`) that simulates a dataset for supervised clustering tasks. The + dataset includes features and class annotations, with a portion of the labels intentionally + set to "no_label" to test the handling of missing values. 
+ + The attributes of the created AnnData object include: + + - `adata.X`: A 2D numpy array representing the feature matrix, where: + - The first half of the rows are generated from a normal distribution with a mean of 10. + - The second half of the rows are generated from a normal distribution with a mean of 100. + + - `adata.obs['classes']`: Class annotations for each row in `adata`, where approximately + half of the rows have missing labels represented by "no_label". The labels are as follows: + - 0: Corresponds to data points with a mean around 10. + - 1: Corresponds to data points with a mean around 100. + + - `adata.obs['all_missing_classes']`: An array where all entries are set to "no_label", + simulating a scenario where no class labels are available. + + - `adata.obs['no_missing_classes']`: An array containing all class labels (0 and 1), + indicating that all data points have valid annotations. + + - `adata.obs['alt_classes']`: An alternative class label array where "no_label" entries + are replaced with NaN values, allowing for testing scenarios that require handling + missing values as NaNs. + + Additionally, this method sets up several attributes for use in tests: + + - `self.annotation`: A string representing the column name for class labels in `obs`. + - `self.alt_annotation`: A string representing the column name for alternative class labels. + - `self.layer`: A string indicating which layer of data to use for KNN clustering. + - `self.features`: A list of feature names used in the AnnData object, which includes + "gene1" and "gene2". 
+ """ + ######### + # adata # + ######### + + # Generate 6 rows, two with mean centered at (10, 10) and two with means at (100, 100) + data = np.array([ + np.concatenate( + ( + np.random.normal(10, 1, 3), + np.random.normal(10, 1, 3) + ) + ), + np.concatenate( + ( + np.random.normal(100, 1, 3), + np.random.normal(100, 1, 3) + ) + ), + ]).reshape(-1, 2) + + # Generate class labels, label 0 = mean at (10, 10), label 1 = mean at (100, 100) + full_class_labels = np.array([0, 0, 0, 1, 1, 1],dtype=object) + class_labels = np.array([0, 0, "no_label", "no_label", 1, 1],dtype=object) + alt_class_labels = np.array([0, 0, np.nan, np.nan, 1, 1],dtype=object) + + # Wrap into an AnnData object + self.dataset = data + self.adata = AnnData( + X=self.dataset, var=pd.DataFrame(index=["gene1", "gene2"]) + ) + + self.adata.layers["counts"] = self.dataset + self.adata.obsm["derived_features"] = self.dataset + self.adata.obs["classes"] = class_labels + + # annotations with all labels missing or present + self.adata.obs["all_missing_classes"] = np.array(["no_label" for x in full_class_labels]) + self.adata.obs["no_missing_classes"] = full_class_labels + self.adata.obs["alt_classes"] = alt_class_labels + + # non-adata parameters for unittests + self.annotation = "classes" + self.alt_annotation = "alt_classes" + self.layer = "counts" + self.features = ["gene1", "gene2"] + + + def test_typical_case(self): + # This test checks if the function correctly adds 'knn' to the + # AnnData object's obs attribute and if it correctly sets + # 'knn_features' in the AnnData object's uns attribute. 
+ knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer=self.layer, + k=2 + ) + self.assertIn("knn", self.adata.obs) + self.assertEqual(self.adata.uns["knn_features"], self.features) + + def test_output_annotation(self): + # This test checks if the function correctly adds the "output_annotation" + # to the AnnData object's obs attribute + output_annotation_name = "my_output_annotation" + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer=self.layer, + k=2, + output_annotation=output_annotation_name, + ) + self.assertIn(output_annotation_name, self.adata.obs) + + def test_layer_none_case(self): + # This test checks if the function works correctly when layer is None. + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer=None, + k=2 + ) + self.assertIn("knn", self.adata.obs) + self.assertEqual(self.adata.uns["knn_features"], self.features) + + def test_invalid_k(self): + # This test checks if the function raises a ValueError when the + # k argument is not a positive integer and checks the error message + invalid_k_value = 'invalid' + err_msg = (f"`k` must be a positive integer. Received value: `{invalid_k_value}`") + with self.assertRaisesRegex(ValueError, err_msg): + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer=self.layer, + k=invalid_k_value, + ) + + def test_trivial_label(self): + # This test checks the cases where every datapoint is labeled or + # every label is missing, and the associated error messages + + # all datapoints labeled + no_missing_annotation = "no_missing_classes" + err_msg = (f"All cells are labeled in the annotation `{no_missing_annotation}`. 
Please provide a mix of labeled and unlabeled data.") + with self.assertRaisesRegex(ValueError, err_msg): + knn_clustering( + adata=self.adata, + features=self.features, + annotation=no_missing_annotation, + layer=self.layer, + k=2 + ) + + # no datapoints labeled + all_missing_annotation = "all_missing_classes" + err_msg = (f"No cells are labeled in the annotation `{all_missing_annotation}`. Please provide a mix of labeled and unlabeled data.") + with self.assertRaisesRegex(ValueError, err_msg): + knn_clustering( + adata=self.adata, + features=self.features, + annotation=all_missing_annotation, + layer=self.layer, + k=2 + ) + + def test_clustering_accuracy(self): + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer="counts", + k=2, + ) + + self.assertIn("knn", self.adata.obs) + self.assertEqual(len(np.unique(self.adata.obs["knn"])), 2) + + def test_associated_features(self): + # Run knn using the derived feature and generate two clusters + output_annotation = "derived_knn" + associated_table = "derived_features" + knn_clustering( + adata=self.adata, + features=None, + annotation=self.annotation, + layer=None, + k=2, + output_annotation=output_annotation, + associated_table=associated_table, + ) + + self.assertEqual(len(np.unique(self.adata.obs[output_annotation])), 2) + + def test_missing_label(self): + # This test checks that the missing label parameter works as intended + # first knn call with normal data + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.annotation, + layer="counts", + k=2, + output_annotation="knn_1", + associated_table=None, + missing_label="no_label" + ) + # second knn call with alt_class data + knn_clustering( + adata=self.adata, + features=self.features, + annotation=self.alt_annotation, + layer="counts", + k=2, + output_annotation="knn_2", + associated_table=None, + missing_label=np.nan + ) + + # assert that they produce the same final labels + 
self.assertTrue(all(self.adata.obs["knn_1"] == self.adata.obs["knn_2"])) + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_utils/test_get_defined_color_map.py b/tests/test_utils/test_get_defined_color_map.py index 7dba59dd..c2e0a360 100644 --- a/tests/test_utils/test_get_defined_color_map.py +++ b/tests/test_utils/test_get_defined_color_map.py @@ -86,7 +86,8 @@ def test_generate_color_map(self): self.assertIn('a', result) self.assertIn('b', result) # Check that the colors are correctly generated. - self.assertTrue(all(isinstance(color, str) for color in result.values())) + self.assertTrue(all(isinstance(color, str) for color + in result.values())) def test_missing_annotations(self): """ @@ -101,6 +102,18 @@ def test_missing_annotations(self): ): get_defined_color_map(dummy, defined_color_map=None) + def test_generate_color_map_multiple_annotations(self): + """ + Test that list-based annotations are handled and a + combined color map is returned. + """ + obs = {'my_ann': pd.Series(['a', 'b', 'a']), + 'my_ann_2': pd.Series(['a', 'b', 'a'])} + dummy = DummyAnnData(uns={'dummy': {}}, obs=obs) + annos_list = ['my_ann', 'my_ann_2'] + result = get_defined_color_map(dummy, + annotations=annos_list) + self.assertIsNotNone(result) if __name__ == '__main__': unittest.main()
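For context, the semi-supervised flow that the new `knn_clustering` function implements (fit a KNeighborsClassifier on labeled cells, then predict labels for the unlabeled ones) can be sketched standalone with scikit-learn on toy data; the cluster means and label names below are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Toy data: two well-separated clusters, with a few labels masked out
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(10, 1, (10, 2)), rng.normal(100, 1, (10, 2))])
labels = np.array(["A"] * 10 + ["B"] * 10, dtype=object)
labels[[2, 3, 12, 13]] = "no_label"          # simulate missing annotations

labeled = labels != "no_label"
encoder = LabelEncoder()
y = encoder.fit_transform(labels[labeled])

# Fit on labeled cells, predict labels for the unlabeled ones
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X[labeled], y)
predicted = encoder.inverse_transform(clf.predict(X[~labeled]))
print(list(predicted))  # ['A', 'A', 'B', 'B']
```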