66 changes: 35 additions & 31 deletions CHANGELOG.md
@@ -1,16 +1,36 @@
# CHANGELOG


## v0.8.8 (2025-04-06)

### Bug Fixes

- **core**: Return hist_data instead of original data
([`2734421`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/27344216b2d0f1fef43ac0e66fc1613ddfbf9349))


## v0.8.7 (2025-04-03)

### Bug Fixes

- **_ripley**: Fixed conflicts
([`fa4c06f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fa4c06f697ebe95438c3fc583e7767399b72dcf7))

- **_ripley**: Removed old call
([`e89835b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e89835b339034d6c543bd4b6231508811828c26d))

- **core**: Specify weights for all histplot calls
([`b661495`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b66149509fa4aa2280d14dfa6e83567c95c87cf8))

- **docstring**: Add returned df to docstring
([`9caddca`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9caddca51e69e2213da866d9f74c6abe9ab7c181))

- **interactive_spatial_plot**: Fixed typos in api name and arguments
([`b833123`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b833123dc9f124abf34e539be553a8388048ee1b))

- **present_summary_as_figure**: Fixed json conversion when non python types are used
([`61c8480`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/61c848051cf86f6cf1cff5fe9bf012bb1c12c9d2))

- **relational_heatmap**: Adjusted the flipped axis labels
([`5d950bb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5d950bb0bb3dd2f45e32d4bd7c4aa15f939922ac))

@@ -29,44 +49,16 @@
- **tests**: Update tests to match new return param
([`da83e3d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/da83e3dcef1b7ccb7de6e6854613f63bef7e8780))

### Features

- **core**: Change histogram/boxplot return types to dicts
([`bf160c2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bf160c24c8fca82ad9057fb2be049b200f4f7139))

- **core**: Changed histogram to precompute data.
([`0201c6f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0201c6f153b440c7b93494fc5355dcf1fe446c28))

- **core**: Changed how boxplot return type is handled
([`714bf98`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/714bf98e2acf8b42b39588ec866f87ac38979dd0))

- **core**: Use plotly figure for static plot instead of png
([`5738234`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5738234fc7e16cc8f9c1e421343028976f73ecc2))


## v0.8.6 (2025-03-18)

### Bug Fixes

- **_ripley**: Fixed conflicts
([`fa4c06f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fa4c06f697ebe95438c3fc583e7767399b72dcf7))

- **_ripley**: Removed old call
([`e89835b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e89835b339034d6c543bd4b6231508811828c26d))

- **interactive_spatial_plot**: Fixed typos in api name and arguments
([`b833123`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b833123dc9f124abf34e539be553a8388048ee1b))

- **present_summary_as_figure**: Fixed json conversion when non python types are used
([`61c8480`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/61c848051cf86f6cf1cff5fe9bf012bb1c12c9d2))

### Build System

- Restored docker file to FNLCR organization
([`739b4b3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/739b4b3221efec4af2e9228aa8b3260f74a541bf))

### Continuous Integration

- **version**: Automatic development release
([`1ad51bb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1ad51bb329ec95c14de24e276bbf36c7375081b5))

- **version**: Automatic development release
([`627b384`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/627b3846a0d913318c846fc73f11d6141fc6a64e))

@@ -93,6 +85,18 @@
- **_ripley_l_multiple**: Enabled edge correction to remove center cell near border
([`87324fd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/87324fd1f168df6a809cf94321d10c632d2b9448))

- **core**: Change histogram/boxplot return types to dicts
([`bf160c2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bf160c24c8fca82ad9057fb2be049b200f4f7139))

- **core**: Changed histogram to precompute data.
([`0201c6f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0201c6f153b440c7b93494fc5355dcf1fe446c28))

- **core**: Changed how boxplot return type is handled
([`714bf98`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/714bf98e2acf8b42b39588ec866f87ac38979dd0))

- **core**: Use plotly figure for static plot instead of png
([`5738234`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5738234fc7e16cc8f9c1e421343028976f73ecc2))

- **ripley_l**: Added edge correction parameter to the high level function
([`9a54f15`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9a54f150c88cd4e86c0559d22ecf3f663bc6afd9))

Binary file added paper/figure.tif
Binary file not shown.
190 changes: 190 additions & 0 deletions paper/paper.bib

Large diffs are not rendered by default.

71 changes: 71 additions & 0 deletions paper/paper.md
@@ -0,0 +1,71 @@
---
title: 'SPAC: A Python Package for Spatial Single-Cell Analysis of Multiplexed Imaging'
tags:
  - multiplexed imaging
  - spatial proteomics
  - single-cell analysis
  - tumor microenvironment
authors:
  - name: Fang Liu
    orcid: 0000-0002-4283-8325
    affiliation: 1
  - name: Rui He
    affiliation: 2
  - name: Andrei Bombin
    affiliation: 3
  - name: Ahmad B. Abdallah
    affiliation: 4
  - name: Omar Eldaghar
    affiliation: 4
  - name: Tommy R. Sheeley
    affiliation: 4
  - name: Sam E. Ying
    affiliation: 4
  - name: George Zaki
    orcid: 0000-0002-2740-3307
    corresponding: true
    affiliation: 1
affiliations:
  - index: 1
    name: Frederick National Laboratory for Cancer Research, United States
  - index: 2
    name: Essential Software Inc., United States
  - index: 3
    name: Axle Informatics, United States
  - index: 4
    name: Purdue University, United States
date: 12 April 2025
bibliography: paper.bib
---

# Summary

Multiplexed immunofluorescence microscopy measures many spatially resolved biomarkers simultaneously, revealing tissue composition and interactions among single cells in situ. The growing scale and dimensional complexity of these datasets demand reproducible, comprehensive, and user-friendly computational tools. To address this need, we developed SPAC (**SPA**tial single-**C**ell analysis), a Python-based package with a corresponding Shiny application, which together form an integrated, modular ecosystem designed for biologists without extensive coding expertise. Following image segmentation and extraction of spatially resolved single-cell data, SPAC streamlines downstream phenotyping and spatial analysis, facilitating characterization of cellular heterogeneity and spatial organization within tissues. Through scalable performance, specialized spatial statistics, highly customizable visualizations, and seamless workflows from dataset to insights, SPAC significantly lowers barriers to sophisticated spatial analyses.

# Statement of Need

Advanced multiplex imaging technologies such as CODEX [@Goltsev:2018], MxIF [@Gerdes:2013], and CyCIF [@Lin:2018] generate high-dimensional datasets that profile up to dozens of biomarkers simultaneously. Analyzing and interpreting these complex spatial protein data poses significant computational challenges, especially because high-resolution whole-slide imaging data can reach hundreds of gigabytes in size and contain millions of cells across extensive tissue areas. Many current spatial biology tools (e.g., Seurat [@Hao:2021], GraphST [@Long:2023], and bento [@Mah:2024]) primarily address spatial transcriptomics and cannot directly handle multiplexed protein imaging data. Other specialized software, such as SPIA [@Feng:2023], Giotto [@Dries:2021], Squidpy [@Palla:2022], and SCIMAP [@Nirmal:2024], provides valuable capabilities tailored to spatial protein analyses. However, these tools lack the flexibility and customization options needed to meet the diverse, large-scale analysis and visualization needs of non-technical users.

To address this gap, we developed the SPAC Python package and the web-based SPAC Shiny application, which together enhance analytical capabilities through intuitive terminology, optimized computational performance, specialized spatial statistics, and extensive visualization configurations. Results computed using the SPAC Python package are stored as AnnData objects, which can be interactively explored in real time via the SPAC Shiny web application, enabling researchers to dynamically visualize data, toggle annotations, inspect cell populations, and compare experimental conditions without requiring extensive computational expertise.

Specifically, SPAC uses biologist-friendly terminology to simplify technical AnnData concepts. In SPAC, "cells" are rows in the data matrix, "features" denote protein expression levels, "tables" contain transformed data layers, "associated tables" store spatial coordinates or dimensional reductions (e.g., UMAP embeddings), and "annotations" indicate cell phenotypes, experimental labels, slide identifiers, and other categorical data.
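
As a rough illustration of this vocabulary, the sketch below pairs each SPAC term with the AnnData attribute that usually holds that kind of data. The dictionary and the `describe` helper are hypothetical and not part of SPAC. The attribute names on the right (`X`, `layers`, `obsm`, `obs`) are standard AnnData attributes, but the pairing is our reading of the description above.

```python
# Illustrative only: SPAC's vocabulary next to the AnnData attribute that
# usually holds that kind of data. This dictionary is NOT part of SPAC;
# the pairing is an assumption based on the terminology described above.
SPAC_TO_ANNDATA = {
    "cells": "rows of adata.X (one entry per adata.obs_names)",
    "features": "columns of adata.X (named in adata.var_names)",
    "tables": "adata.layers",
    "associated tables": "adata.obsm",
    "annotations": "columns of adata.obs",
}


def describe(term: str) -> str:
    """Return a one-line description of where a SPAC term lives in AnnData."""
    return f"{term} -> {SPAC_TO_ANNDATA[term]}"


for term in SPAC_TO_ANNDATA:
    print(describe(term))
```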

To address real-time scalability challenges in analyzing large multiplex imaging datasets (exceeding 10 million cells), SPAC improves computational efficiency by over 5x by integrating optimized numerical routines from NumPy's compiled C-based backend. Conventional plotting libraries, such as seaborn, were computationally inefficient at this scale. SPAC's modified routines reduce the processing time for histograms, box plots, and other visualizations involving millions of cells from tens of seconds to a few seconds.
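
One way to picture this kind of speed-up is to bin the data once with NumPy and hand only the bin summaries to the plotting layer. The sketch below is a minimal illustration of that idea, not SPAC's actual implementation: the function name `precompute_histogram`, the returned keys, and the synthetic data are all assumptions.

```python
import numpy as np


def precompute_histogram(values, bins=50):
    """Bin values once with NumPy so only `bins` bars need to be drawn.

    Hypothetical helper for illustration; not a SPAC function.
    """
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return {"bin_center": centers, "bin_width": np.diff(edges), "count": counts}


rng = np.random.default_rng(0)
intensity = rng.normal(size=1_000_000)  # synthetic stand-in for one feature
hist = precompute_histogram(intensity, bins=40)
print(len(hist["count"]), int(hist["count"].sum()))
```

The plotting call then receives 40 rows instead of one million values, so its cost no longer grows with the number of cells.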

SPAC introduces specialized functions that enhance conventional spatial analyses. For example, SPAC implements a specialized variant of Ripley's L statistic to evaluate clustering or dispersion between predefined pairs of cell phenotypes: a "center" phenotype relative to a "neighbor" phenotype. Unlike generalized implementations of Ripley's statistics (e.g., Squidpy), SPAC explicitly distinguishes phenotype pairings and applies edge correction by excluding center cells that lie within the analytical radius of the region's border, mitigating edge-effect biases and enhancing statistical reliability. Furthermore, SPAC supports flexible phenotyping methods, accommodating both manual and unsupervised approaches tailored to diverse experimental designs and biological questions. It also implements efficient neighborhood profiling via a KDTree-based approach, quantifying the distribution of neighboring cell phenotypes within user-defined distance bins. The resulting three-dimensional array, capturing the local cellular microenvironment, is stored in the AnnData object and supports dimensionality reduction methods like spatial UMAP [@Giraldo:2021]. This enhances comparative analysis and visualization of complex spatial relationships across multiple slides and phenotype combinations.
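
The center/neighbor formulation with border exclusion can be sketched in a few lines. The code below uses a textbook cross-type estimator, $K(r) = A \cdot \text{pairs}(r) / (n_c \, n_n)$ with $L(r) = \sqrt{K(r)/\pi}$, on a rectangular region. This is an assumption for illustration and may differ from the estimator SPAC uses. The function name `ripley_l` and its signature are hypothetical, and the two phenotypes are assumed to be distinct so that no cell is paired with itself.

```python
import math


def ripley_l(centers, neighbors, radius, bounds):
    """Ripley's L between two point sets inside a rectangular region.

    bounds = (xmin, ymin, xmax, ymax). Center points closer than `radius`
    to any border are dropped, which is the edge correction sketched here.
    Assumes centers and neighbors are different phenotypes (no self-pairs).
    """
    xmin, ymin, xmax, ymax = bounds
    area = (xmax - xmin) * (ymax - ymin)
    # Edge correction: keep only centers whose full disc lies in the region.
    kept = [
        (x, y) for x, y in centers
        if min(x - xmin, xmax - x, y - ymin, ymax - y) >= radius
    ]
    if not kept or not neighbors:
        return float("nan")
    # Count center-neighbor pairs within the radius (brute force).
    pairs = sum(
        1
        for cx, cy in kept
        for nx, ny in neighbors
        if math.hypot(cx - nx, cy - ny) <= radius
    )
    k = area * pairs / (len(kept) * len(neighbors))
    return math.sqrt(k / math.pi)


bounds = (0.0, 0.0, 10.0, 10.0)
centers = [(5.0, 5.0), (0.5, 5.0)]  # the second center is too close to the border
neighbors = [(5.0, 6.0), (6.0, 5.0), (9.0, 9.0)]
value = ripley_l(centers, neighbors, radius=1.5, bounds=bounds)
print(round(value, 3))
```

Under complete spatial randomness L(r) is close to r, so values well above the radius suggest clustering of the neighbor phenotype around the center phenotype, and values well below it suggest dispersion.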

SPAC provides customizable visualization methods, leveraging Plotly's interactive capabilities for dynamic exploration of spatial data. Interactive spatial plots allow users to toggle features (e.g., biomarkers) and multiple annotations simultaneously, while a pin-color option ensures consistent color mapping across analyses. These designs help researchers intuitively explore spatial relationships by switching between different cell populations and identifying patterns before performing detailed quantitative analyses. In addition, SPAC supports comparative visualization, such as overlaying manual classifications with unsupervised clustering or comparing spatial distributions across experimental conditions or treatments. It also enhances core analytical functions (e.g., nearest neighbor computations using SCIMAP's spatial distance calculations) by integrating extensive visualization configurations, including subgroup analyses, subset plots, and faceted layouts, allowing tailored visual outputs for various experimental contexts and research questions.
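
The idea behind pinning colors is to fix the label-to-color mapping once, over all labels, and reuse it for every subset or plot. The sketch below shows one simple way to do this. The `pin_colors` helper and the palette are hypothetical and do not reflect how SPAC implements its pin-color option.

```python
from itertools import cycle

# Hypothetical palette; any list of color strings would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def pin_colors(labels, palette=PALETTE):
    """Assign each distinct label a color, in sorted label order.

    Sorting makes the mapping independent of the order in which labels
    appear, so the same label always receives the same color.
    """
    return dict(zip(sorted(set(labels)), cycle(palette)))


all_labels = ["T cell", "B cell", "Tumor", "T cell"]
color_map = pin_colors(all_labels)

# A subset plot looks colors up in the pinned map instead of recomputing it.
subset_colors = {label: color_map[label] for label in ["Tumor", "T cell"]}
print(subset_colors)
```

A mapping of this shape can be passed to Plotly Express through its `color_discrete_map` argument, so a phenotype keeps its color even when other phenotypes are filtered out.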

# Structure and Implementation

The SPAC package is available on [GitHub](https://github.com/FNLCR-DMAP/SCSAWorkflow) and can be installed locally via conda. It includes five modules that cover data processing, transformation, spatial analysis, visualization, and utility functions. The data utils module standardizes data into AnnData objects, manages annotations, rescales and normalizes features, and performs filtering, downsampling, and essential spatial computations (e.g., centroid calculation). The transformation tools module employs clustering algorithms (e.g., Phenograph, UTAG, KNN), dimensionality reduction, and normalization methods (batch normalization, z-score, arcsinh) to support biological interpretation of high-dimensional data. The spatial analysis module offers specialized functions such as spatial interaction matrices, Ripley's L statistic with edge correction, and efficient KDTree-based neighborhood profiling. It supports stratified analyses, capturing spatial signatures of cell phenotypes. The visualization module provides interactive and customizable visualizations, allowing dynamic exploration of spatial relationships and comparative visualization across experimental conditions. The utils module includes helper functions for input validation, naming conventions, regex searches, and user-friendly error handling to ensure data integrity.
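
To make the neighborhood profiling in the spatial analysis module more concrete, the sketch below builds the cells x phenotypes x distance-bins array described earlier. To stay dependency-free it replaces the KDTree query with a brute-force scan over all pairs, so it illustrates the output structure rather than the performance. The function name, arguments, and half-open bin convention are assumptions, not SPAC's API.

```python
import math


def neighborhood_profile(coords, phenotypes, bin_edges):
    """Count neighbors per phenotype and distance bin for every cell.

    Returns (profile, labels) where profile[i][p][b] is the number of cells
    of phenotype labels[p] whose distance d to cell i satisfies
    bin_edges[b] <= d < bin_edges[b + 1]. The cell itself is skipped.
    Brute force stands in for a KDTree query in this sketch.
    """
    labels = sorted(set(phenotypes))
    index = {label: p for p, label in enumerate(labels)}
    n_bins = len(bin_edges) - 1
    profile = [[[0] * n_bins for _ in labels] for _ in coords]
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    profile[i][index[phenotypes[j]]][b] += 1
                    break
    return profile, labels


coords = [(0, 0), (1, 0), (0, 3), (10, 10)]
phenotypes = ["A", "B", "B", "A"]
profile, labels = neighborhood_profile(coords, phenotypes, [0, 2, 5])
print(labels, profile[0])
```

Each cell's slice `profile[i]` is a small phenotype-by-bin matrix. Flattening these slices gives one feature vector per cell, which is the kind of input a dimensionality reduction such as spatial UMAP would consume.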

All SPAC modules are interoperable, forming a cohesive workflow (\autoref{fig:workflow}). By adopting the AnnData format, SPAC ensures broad compatibility with existing single-cell analysis tools, produces high-quality figures, and facilitates easy export for external analyses. SPAC adheres to enterprise-level software engineering standards, featuring extensive unit testing, rigorous edge-case evaluation, comprehensive logging, and clear, context-rich error handling. These practices ensure reliability, adaptability, and ease of use across various deployment environments, including interactive Jupyter notebooks, analytic platforms (e.g., Code Ocean [@CodeOcean], Palantir Foundry [@PalantirTechnologies]), and real-time dashboards such as Shiny. Emphasizing readability and maintainability, SPAC provides a versatile and enhanced analytical solution for spatial single-cell analyses. To date, SPAC has been used to analyze more than 8 datasets comprising over 30 million cells across diverse studies [@Keretsu:2022].

![Overview of the SPAC Workflow. The schematic presents an integrated pipeline for spatial single-cell analysis. Segmented cell data with spatial coordinates from various imaging platforms are ingested, normalized, clustered and phenotyped, and analyzed spatially to assess cell distribution and interactions while maintaining consistent data lineage.\label{fig:workflow}](figure.tif)

# Acknowledgements

We thank our collaborators at the National Cancer Institute Frederick National Laboratory, the Purdue Data Mine program, and the single-cell and spatial imaging communities for their essential contributions and resources.

# References
2 changes: 1 addition & 1 deletion setup.py
@@ -2,7 +2,7 @@

setup(
name='spac',
-version="0.8.7",
+version="0.8.8",
description=(
'SPatial Analysis for single-Cell analysis (SPAC)'
'is a Scalable Python package for single-cell spatial protein data '
2 changes: 1 addition & 1 deletion src/spac/__init__.py
@@ -22,7 +22,7 @@
functions.extend(module_functions)

# Define the package version before using it in __all__
-__version__ = "0.8.7"
+__version__ = "0.8.8"

# Define a __all__ list to specify which functions should be considered public
__all__ = functions