Guo, Y., Wu, Z., Zheng, Z., & Li, X. (2024). An optimal multivariate-stratification geographical detector model for revealing the impact of multi-factor combinations on the dependent variable. GIScience & Remote Sensing, 61(1).
- The data folder contains all the case data used in the article, stored in CSV format.
- consists of the main functions of the Optimal Multivariate-stratification Geographical Detector (OMGD) model, including computation functions and visualization functions.
- is used to sample data at a certain ratio (e.g. 50%).
- offers simulation results discussed in Section 3.2.
- test.ipynb and are used to reproduce the results shown in the main text.
- Supplemental material.pdf is the overview of functions of the OMGD model.
- requirements.txt and omgd.yml help you quickly build a Python or Anaconda environment, respectively.
You can install the dependencies using pip, or download the conda environment (folder) directly and configure it.
Python >= 3.7 & Python < 3.11
cd your folder path
pip install wheel==0.41.2
pip install -r requirements.txt
Download Anaconda, open the Anaconda Prompt (using the search bar), change to the folder that contains 'omgd.yml', change the prefix in 'omgd.yml' to the target location, then run 'conda env create -f omgd.yml -n omgd' followed by 'conda env list' to check that the omgd environment has been configured.
Open test.ipynb or, run the code to see if it works.
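Before running the notebook, it can help to confirm that the interpreter falls within the supported range stated above (Python >= 3.7 and < 3.11). The snippet below is a minimal sketch; the helper name `python_version_supported` is hypothetical and is not part of the OMGD code.

```python
import sys


def python_version_supported(version=None):
    """Return True if the version is >= 3.7 and < 3.11.

    `version` is any tuple-like starting with (major, minor);
    it defaults to the running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 7) <= major_minor < (3, 11)


if python_version_supported():
    print("Python version is within the supported range.")
else:
    print("Unsupported Python version:", sys.version.split()[0])
```

If the check fails, creating a separate environment with a supported interpreter is usually simpler than adjusting the pinned dependencies.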
def scale_detector(path_list: Sequence, Y, factors:Sequence, disc_interval:Sequence, type_factors:Sequence=[], quantile:float=0.8, n_variates=1, random_state=0)
- scale detector can be used to detect the optimal spatial scale for spatial stratified heterogeneity analysis.
- path_list is a list of file locations (one per spatial scale), Y is the field name of the dependent variable, and factors is a list containing the field names of the explanatory variables.
- disc_interval specifies the range of classification (stratification) numbers (e.g. [3, 7] indicates that the explanatory variables / explanatory variable combinations are classified into 3, 4, 5, 6 and 7 categories iteratively to find the optimal classification number).
- type_factors specifies the names of fields that are already categorical rather than continuous.
- quantile defines how many variables are used to calculate the average Q value (e.g., if we have 10 variables / variable combinations and the quantile is 0.8, then only the top 3 Q values are used to calculate the average Q value).
- n_variates indicates how many variables are used in the analysis.
- the scale_detector function returns scale_result (a dataframe) and best_scale (the location of the optimal-scale data file). scale_result can be plotted using scale_plot(scale_result, size_list=[], dpi=100, unit='').
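To make the role of disc_interval more concrete, the following sketch tries every class count in an interval such as [3, 7], stratifies one continuous variable, and keeps the count with the highest Geodetector q value. It is an illustration under stated assumptions, not the OMGD implementation: quantile binning via `pandas.qcut` stands in for whatever classification method the model actually uses, and the helper names `q_statistic` and `best_class_count` as well as the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def q_statistic(y: pd.Series, strata: pd.Series) -> float:
    """Geodetector q = 1 - sum_h(N_h * var_h) / (N * var), using population variances."""
    total = len(y) * y.var(ddof=0)
    within = sum(len(g) * g.var(ddof=0) for _, g in y.groupby(strata))
    return 1.0 - within / total


def best_class_count(df, y_col, x_col, disc_interval):
    """Try every class count in [low, high] and return the best one with all scores."""
    low, high = disc_interval
    scores = {}
    for k in range(low, high + 1):
        # Assumption: quantile bins are used here only as a stand-in stratification.
        strata = pd.qcut(df[x_col], q=k, labels=False, duplicates="drop")
        scores[k] = q_statistic(df[y_col], strata)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic demo data (hypothetical, not from the article).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
demo = pd.DataFrame({"x": x, "y": np.sin(x) + rng.normal(0, 0.3, 500)})

best_k, scores = best_class_count(demo, "y", "x", [3, 7])
print("best class count:", best_k)
print("q by class count:", scores)
```

With a plain q criterion, finer stratifications tend to score higher, so the selected count often sits near the upper bound of the interval.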
def omgd(df:pd.DataFrame, Y, factors:Sequence, n_variates:int, disc_interval:Sequence, type_factors:Sequence=[], random_state=0)
- one-step OMGD model; returns a dictionary (omgd_result) containing the original dataframe (omgd_result['original']), the classification result (omgd_result['classify']), the factor detector result (omgd_result['factor']), the interaction detector result (omgd_result['interaction']), the risk detector result (omgd_result['risk']) and the ecological detector result (omgd_result['ecological']).
- functions of four basic detectors.
- df: dataframe, includes Y fields (dependent variable) and factors fields (explanatory variables).
- omgd.factor_detector returns a dataframe which is sorted according to Geodetector Q value.
- omgd.interation_detector returns a dataframe which is sorted according to Geodetector Q value.
- omgd.risk_detector returns a list, which includes the significance test results and the mean values of each stratum.
- omgd.ecological_detector returns a dataframe.
- the four basic detectors can be plotted using factor_plot, factor_plot, risk_plot and ecological_plot, respectively (e.g. omgd.factor_plot(factor_result)).
- the function classify automatically classifies (discretizes) continuous variables or variable combinations into n_clusters categories. X is the dataframe of values (explanatory variables), the result is stored in classify_result, and each colname is made up of one or more field names (explanatory variables) joined with '_'.
- the classification result can be plotted using 'classify_plot(original_df:pd.DataFrame, classify_df:pd.DataFrame, dpi=100, nrows=0, ncols=0, unit_list=[])'
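The column naming described for classify_result (field names joined with '_') can be sketched as follows. This is only an illustration: the helper `combination_names` and the field names are hypothetical, and it assumes that every combination of exactly n_variates factors is considered, which the text above does not confirm.

```python
from itertools import combinations


def combination_names(factors, n_variates):
    """Join each combination of n_variates field names with '_'."""
    return ["_".join(combo) for combo in combinations(factors, n_variates)]


# Hypothetical field names, for illustration only.
names = combination_names(["elevation", "slope", "rainfall"], 2)
print(names)
# -> ['elevation_slope', 'elevation_rainfall', 'slope_rainfall']
```

With n_variates=1 the same helper simply returns the individual field names, which matches the single-variable case.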