Skip to content

Commit

Permalink
catching sim_sar/pred SLM bugs
Browse files Browse the repository at this point in the history
  • Loading branch information
ConnorDonegan committed Nov 7, 2024
1 parent 628d029 commit e961038
Show file tree
Hide file tree
Showing 11 changed files with 116 additions and 32 deletions.
11 changes: 7 additions & 4 deletions R/convenience-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ aple <- function(x, w, digits = 3) {
#'
#' This function takes `n = nrow(w)` draws from the normal distribution using `rnorm` to obtain vector `x`; if `type = 'SEM', it then pre-multiplies `x` by the inverse of the matrix `(I - rho * W)` to obtain spatially autocorrelated values. For `type = 'SLM', the multiplier matrix is applied to `x + mu` to produce the desired values.
#'
#' The `quick` method approximates the matrix inversion using the method described by LeSage and Pace (2009, p. 40).
#' The `quick` method approximates the matrix inversion using the method described by LeSage and Pace (2009, p. 40). For high values of rho, larger values of K are required for the approximation to suffice; you want `rho^K` to be near zero.
#'
#' @seealso \code{\link[geostan]{aple}}, \code{\link[geostan]{mc}}, \code{\link[geostan]{moran_plot}}, \code{\link[geostan]{lisa}}, \code{\link[geostan]{shape2mat}}
#'
Expand Down Expand Up @@ -171,7 +171,7 @@ sim_sar <- function(m = 1,
w,
type = c("SEM", "SLM"),
quick = FALSE,
K = 20,
K = 15,
...) {
check_sa_data(mu, w)
type <- match.arg(type)
Expand All @@ -184,23 +184,26 @@ sim_sar <- function(m = 1,
W <- as(w, "dMatrix")

if (quick) {

# matrix powers to approximate the inverse
if (rho^K > .08) warning("You may need higher K for this level of rho; rho^K = ", rho^K)
Multip <- I + rho * W
W_k <- W
for (j in 2:K) {
W_k <- W %*% W_k
Multip = Multip + rho^j * W_k
}

} else {

# matrix inverse
Multip <- Matrix::solve(I - rho * W)

}

x <- t(sapply(1:m, function(s) {
rnorm(N, mean = 0, sd = sigma)
}))

if (rho == 0) return (x)

.rsem <- function(s) {
mu + (Multip %*% x[s,])[,1]
Expand Down
15 changes: 8 additions & 7 deletions R/impacts.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,14 +21,14 @@
#'
#' These functions are for interpreting the results of the spatial lag model ('SLM') and the spatial Durbin lag model ('SDLM'). The models can be fitted using \link[geostan]{stan_sar}. The equation for these SAR models specifies simultaneous feedback between all units, such that changing the outcome in one location has a spill-over effect that may extend to all other locations (a ripple or diffusion effect); the induced changes will also react back onto the first unit. (This presumably takes time, even if the observations are cross-sectional.)
#'
#' Assuming that the model specification is in fact reasonable for your application, these spill-overs have to be incorporated into the interpretation of the regression coefficients of SLM and SDLM models. A unit change in the value of \code{X} in one location will impact \code{y} in that same place ('direct' impact) and will also impact \code{y} elsewhere through the diffusion process ('indirect' impact). The 'total' impact of a unit change in \code{X} is the sum of the direct and indirect effects (after LeSage and Pace 2009).
#' These spill-overs have to be incorporated into the interpretation of the regression coefficients of SLM and SDLM models (granting that the model specification itself is reasonable for your application). A unit change in the value of \code{X} in one location will impact \code{y} in that same place ('direct' impact) and will also impact \code{y} elsewhere through the diffusion process ('indirect' impact). The 'total' impact of a unit change in \code{X} is the sum of the direct and indirect effects (LeSage and Pace 2009).
#'
#' The `spill` function is for quickly calculating average spillover effects given point estimates of parameters. (This can be used to verify the nearness of the approximate method to the proper method.)
#' The `spill` function is for quickly calculating average spillover effects given point estimates of parameters.
#'
#' The `impacts` function calculates the (average) direct, indirect, and total effects once for every MCMC sample to produce samples from the posterior distribution for the impacts; the samples are returned together with a summary of the distribution (mean, median, and select quantiles).
#'
#' These methods are only needed for the spatial lag and spatial Durbin lag models (SLM and SDLM). Any other model with spatially lagged covariates (SLX) might be read as having indirect impacts (as always, interpretation is subject to plausibility per causal reasoning) but the impacts are then equal to the SLX coefficients, the direct effects are equal to beta (the coefficient for X), and the 'total' impact is their sum (see the example below for calculation with MCMC samples).
#' The `impacts` function calculates the (average) direct, indirect, and total effects once for every MCMC sample to produce samples from the posterior distribution for the impacts; the samples are returned together with a summary of the posterior distribution (mean, median, and select quantiles).
#'
#' These methods are only required for the spatial lag and spatial Durbin lag models (SLM and SDLM).
#'
#' @source
#'
#' LeSage, James and Pace, R. Kelley (2009). *Introduction to Spatial Econometrics*. CRC Press.
Expand Down Expand Up @@ -61,6 +61,7 @@
#' type = "SDLM", iter = 500,
#' slim = TRUE, quiet = TRUE)
#'
#' # impacts (posterior distribution)
#' impax <- impacts(fit)
#' impax$summary
#'
Expand Down Expand Up @@ -179,8 +180,8 @@ impacts <- function(object, method = c('quick', 'proper'), K = 20) {
res <- cbind(mean = est, median = est2, sd, lwr, upr)
row.names(res) <- c('Direct', 'Indirect', 'Total')
summary[[m]] <- res
names(summary)[m] <- Blabs
}
names(summary)[m] <- Blabs[m]
}

return(list(summary = summary, samples = impax))

Expand Down
37 changes: 29 additions & 8 deletions R/posterior_predict.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
#'
#' @param quick For SAR models only; `quick = TRUE` uses an approximation method for the inverse of matrix `(I - rho * W)`.
#'
#' @param K For SAR models only; number of matrix powers to for the matrix inverse approximation.
#' @param K For SAR models only; number of matrix powers to for the matrix inverse approximation. Hiugh values of rho (e.g., > 0.9) require larger K for accurate approximations.
#'
#' @param preserve_order If `TRUE`, the order of posterior draws will remain fixed; the default is to permute the MCMC samples so that (with small sample size `S`) each successive call to `posterior_predict` will return a different sample from the posterior probability distribution.
#'
Expand Down Expand Up @@ -59,7 +59,7 @@ posterior_predict <- function(object,
stopifnot(inherits(object, "geostan_fit"))
family <- object$family$family
if (!missing(seed)) set.seed(seed)

mu <- as.matrix(object, pars = "fitted")
M <- nrow(mu)

Expand All @@ -75,7 +75,7 @@ posterior_predict <- function(object,
if (preserve_order) {
idx <- seq(S)
} else {
idx <- sample(M, S)
idx <- sample(M, size = S)
}

mu <- mu[idx,]
Expand All @@ -91,7 +91,7 @@ posterior_predict <- function(object,
if (object$spatial$method == "SAR") {
rho <- as.matrix(object, "sar_rho")[idx, ]
tau <- as.matrix(object, "sar_scale")[idx, ]
preds <- .pp_sar_normal(mu, rho, tau, object$sar_parts$W, quick = quick, K = K)
preds <- .pp_sar_normal(mu, rho, tau, object$sar_parts$W, quick = quick, K = K, object$sar_type)
}

}
Expand Down Expand Up @@ -135,15 +135,15 @@ posterior_predict <- function(object,
}


.pp_sar_normal <- function(mu, rho, sigma, W, quick, K) {
.pp_sar_normal <- function(mu, rho, sigma, W, quick, K, type) {

if (!quick) {
Draws <- sapply(1:nrow(mu), function(s) {
sim_sar(mu = mu[ s, ],
rho = rho[ s ],
sigma = sigma[ s ],
w = W,
type = "SEM", # Always SEM.
type = type,
quick = FALSE)
}) |>
t()
Expand Down Expand Up @@ -174,8 +174,29 @@ posterior_predict <- function(object,
return (res)
}

Draws <- sapply(1:S, .rsem) |>
t()
.rslm <- function(s) {
eps <- rnorm(N, mean = 0, sd = sigma[ S ])
rho_powers <- rho[ s ]^P
Mpowers <- lapply(seq(Q), function(j) M[[j]] * rho_powers[j])
Multiplier <- Reduce(`+`, Mpowers)
z <- mu[ s, ] + eps
res <- (Multiplier %*% z)[,1]
return (res)
}

SLM <- grepl("SLM|SDLM", type)

if (SLM) {

Draws <- sapply(1:S, .rslm) |>
t()

} else {

Draws <- sapply(1:S, .rsem) |>
t()

}

return (Draws)
}
Expand Down
7 changes: 4 additions & 3 deletions R/stan_sar.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@
#'
#' Use of the 'SDEM' and 'SDLM' options are for convenience: you can also obtain the Durbin models using the \code{slx} (spatial lag of X) argument. The \code{slx} argument allows control over which covariates will be added in spatial-lag form; the Durbin options include the spatial lag of all covariates.
#'
#' Most often, the SAR model is applied directly to observations (referred to below as the auto-normal or auto-Gaussian model). The SAR model can also be applied to a vector of parameters inside a hierarchical model. The latter enables spatial or network autocorrelation to be modeled when the observations are discrete counts (e.g., hierarchical models for disease incidence rates).
#' Most often, the SAR model is applied directly to observations (referred to below as the auto-normal or auto-Gaussian model). The SAR model can also be applied to a vector of parameters inside a hierarchical model. The latter enables spatial or network autocorrelation to be modeled when the observations are discrete counts (e.g., hierarchical models for disease incidence rates). Currently these hierarchical models are only supported for the spatial error models (SEM/SDEM).
#'
#' ### Auto-normal: spatial error
#'
Expand Down Expand Up @@ -130,7 +130,7 @@
#'
#' The \link[geostan]{residuals.geostan_fit} method returns 'de-trended' residuals \eqn{R} by default:
#' \deqn{R = y - \rho C y - \mu,}
#' where \eqn{\mu} contains the intercept and any covariates (and possibly other terms). Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend \eqn{\rho C y} by default.
#' where \eqn{\mu} contains the intercept and any covariates (and possibly other terms). Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend \eqn{\rho C y} by default.
#'
#' To read/interpret results from the SLM or SDLM, use the \link[geostan]{impacts} method.
#'
Expand Down Expand Up @@ -355,7 +355,8 @@ stan_sar <- function(formula,
if (quiet) refresh <- 0
#### SAR type ----
type <- match.arg(type)
Durbin <- grepl('SDEM|SDLM', type)
Durbin <- grepl('SDEM|SDLM', type)
if(grepl("SLM|SDLM", type) & family$family != "auto_gaussian") stop("SLM/SDLM are only available as auto-normal models (not Poisson or binomial models).")
#### SAR type [stop]
# C
if (!missing(C)) {
Expand Down
2 changes: 1 addition & 1 deletion docs/pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ articles:
spatial-analysis: spatial-analysis.html
spatial-me-models: spatial-me-models.html
spatial-weights-matrix: spatial-weights-matrix.html
last_built: 2024-11-04T15:45Z
last_built: 2024-11-07T19:06Z
urls:
reference: https://connordonegan.github.io/geostan/reference
article: https://connordonegan.github.io/geostan/articles
Expand Down
9 changes: 5 additions & 4 deletions docs/reference/impacts.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/reference/posterior_predict.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/reference/sim_sar.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit e961038

Please sign in to comment.