Ziqi-Li/geoshapley

GeoShapley

A game theory approach to measuring spatial effects from machine learning models. GeoShapley builds on the Shapley value and the Kernel SHAP estimator.

Recent Updates

  • 05/2025 v0.1.0 - Speed-up of several orders of magnitude for models with more than 10 features, achieved by implementing the paired sampling of Covert and Lee (2021)
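
For readers unfamiliar with paired sampling, the idea can be sketched as follows: every sampled feature coalition is evaluated together with its complement, which tends to reduce the variance of the Kernel SHAP estimate. This is a minimal illustration under stated assumptions, not GeoShapley's actual implementation; the function name `sample_paired_coalitions` and its parameters are hypothetical.

```python
import random

def sample_paired_coalitions(n_features, n_pairs, seed=0):
    """Draw coalitions together with their complements (hypothetical helper)."""
    if n_features < 2:
        raise ValueError("paired sampling needs at least two features")
    rng = random.Random(seed)
    sizes = list(range(1, n_features))
    # Under the Shapley kernel, the probability of a coalition size s
    # is proportional to 1 / (s * (n_features - s)).
    weights = [1.0 / (s * (n_features - s)) for s in sizes]
    pairs = []
    for _ in range(n_pairs):
        size = rng.choices(sizes, weights=weights, k=1)[0]
        members = set(rng.sample(range(n_features), size))
        mask = [1 if j in members else 0 for j in range(n_features)]
        # The paired sample is the complement: every bit is flipped.
        complement = [1 - bit for bit in mask]
        pairs.append((mask, complement))
    return pairs

pairs = sample_paired_coalitions(n_features=5, n_pairs=3)
```

Each element of `pairs` holds a binary mask and its complement, so the two masks in a pair always cover every feature exactly once.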

Installation:

GeoShapley can be installed from PyPI:

$ pip install geoshapley

To install the latest version from GitHub:

$ pip install git+https://github.com/ziqi-li/geoshapley.git

Example:

GeoShapley can explain any model that takes tabular data plus spatial features (e.g., coordinates) as input. Examples of natively supported models include:

  1. XGBoost/CatBoost/LightGBM/Random Forest
  2. Microsoft's FLAML AutoML (see example in notebook folder)
  3. MLP or other scikit-learn models
  4. Tabular Deep Learning models such as TabNet
  5. Explainable Boosting Machine
  6. Statistical models: OLS/Gaussian Process/GWR

Other models can be supported by defining a helper function that plays the role of model.predict() and wraps the original model's prediction or inference function.
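
A minimal sketch of such a helper, assuming the explainer only needs a callable that maps a 2D array of samples to a 1D array of predictions (as `mlp_model.predict` does in the MLP example). `ToyModel`, its `infer` method, and `make_predict_fn` are hypothetical names used for illustration only.

```python
import numpy as np

class ToyModel:
    """Hypothetical stand-in for a model that has no predict() method."""

    def infer(self, batch):
        # Returns a column vector of shape (n_samples, 1).
        return batch.sum(axis=1, keepdims=True)

def make_predict_fn(model):
    """Wrap model.infer in a predict-style function (hypothetical helper)."""
    def predict(X):
        X = np.asarray(X, dtype=float)
        # Flatten the output to shape (n_samples,), like scikit-learn regressors.
        return np.asarray(model.infer(X)).reshape(-1)
    return predict

predict_fn = make_predict_fn(ToyModel())
preds = predict_fn([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Under that assumption, `predict_fn` could be passed to the explainer in the place where the MLP example passes `mlp_model.predict`.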

Currently, spatial features (e.g., coordinates or other encodings) must be placed in the last columns of your pandas.DataFrame (X_geo).
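
One way to enforce this ordering is to compute the column order before building X_geo. The sketch below is only an illustration: the helper `spatial_columns_last` and the column names are hypothetical, not part of GeoShapley.

```python
def spatial_columns_last(columns, spatial_columns):
    """Return the column names with the spatial ones moved to the end."""
    columns = list(columns)
    spatial = list(spatial_columns)
    missing = [c for c in spatial if c not in columns]
    if missing:
        raise KeyError(f"spatial columns not found: {missing}")
    # Keep the original order of the non-spatial columns.
    others = [c for c in columns if c not in spatial]
    return others + spatial

ordered = spatial_columns_last(
    ["x_coord", "income", "y_coord", "age"],  # hypothetical column names
    ["x_coord", "y_coord"],
)
```

With a DataFrame `X`, the result can be used for reordering, for example `X_geo = X[spatial_columns_last(X.columns, ["x_coord", "y_coord"])]`.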

The example below shows how to explain a trained MLP model. More examples can be found in the notebooks folder.

from geoshapley import GeoShapleyExplainer
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# X_geo is a pandas.DataFrame with the spatial features as its last columns;
# y is the target. Both are assumed to be defined beforehand.
X_train, X_test, y_train, y_test = train_test_split(X_geo, y, random_state=1)

# Fit a neural network model on the training data
mlp_model = MLPRegressor().fit(X_train, y_train)

# Specify a small background dataset
background = X_train.sample(100).values

# Initialize a GeoShapleyExplainer
mlp_explainer = GeoShapleyExplainer(mlp_model.predict, background)

# Explain the data
mlp_rslt = mlp_explainer.explain(X_geo)

# Make a SHAP-style summary plot
mlp_rslt.summary_plot()

# Make partial dependence plots of the primary (non-spatial) effects
mlp_rslt.partial_dependence_plots()

# Generate a ranked global feature contribution bar plot of the GeoShapley values
mlp_rslt.contribution_bar_plot()

# Calculate spatially varying explanations
mlp_svc = mlp_rslt.get_svc()

Visuals:

SHAP-style summary plot

Partial dependence plots of the primary (non-spatial) effects

Ranked global feature contribution bar plot of the GeoShapley values

References:

A list of recent papers that applied GeoShapley:

  • Peng, Z., Ji, H., Yuan, R., Wang, Y., Easa, S. M., Wang, C., ... & Zhao, X. (2025). Modeling and spatial analysis of heavy-duty truck CO2 using travel activities. Journal of Transport Geography, 124, 104158.
  • Ke, E., Zhao, J., & Zhao, Y. (2025). Investigating the influence of nonlinear spatial heterogeneity in urban flooding factors using geographic explainable artificial intelligence. Journal of Hydrology, 648, 132398.
  • Foroutan, E., Hu, T., & Li, Z. (2025). Revealing key factors of heat-related illnesses using geospatial explainable AI model: A case study in Texas, USA. Sustainable Cities and Society, 122, 106243.
  • Wu, R., Yu, G., & Cao, Y. (2025). The impact of industrial structural transformation in the Yangtze River economic belt on the trade-offs and synergies between urbanization and carbon balance. Ecological Indicators, 171, 113165.
  • Yang, A., Ai, J., & Arkolakis, C. (2025). A Geospatial Approach to Measuring Economic Activity (No. w33619). National Bureau of Economic Research (NBER).
  • Chen, Y., Ye, Y., Liu, X., Yin, C., & Jones, C. A. (2025). Examining the Nonlinear and Spatial Heterogeneity of Housing Prices in Urban Beijing: An Application of GeoShapley. Habitat International.
