Skip to content

Commit

Permalink
add SAR, update fitted method
Browse files Browse the repository at this point in the history
  • Loading branch information
ConnorDonegan committed Sep 19, 2022
1 parent 62070ec commit 1fa7365
Show file tree
Hide file tree
Showing 34 changed files with 887 additions and 1,573 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ README_files/
^docs$
^pkgdown$
^LICENSE\.md$
LICENSE
^index.md$
^data-raw/*
tests/testthat/test-for-bias-results
Expand All @@ -18,4 +19,3 @@ tests/testthat/test-rstan-conformity-results
index.html
cran-comments.md
^CRAN-RELEASE$
LICENSE.md
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Authors@R: c(
person("Mitzi", "Morris", role = "ctb")
)
Description: For Bayesian inference with spatial data, provides exploratory spatial analysis tools, multiple spatial model specifications, spatial model diagnostics, and special methods for inference with small area survey data (e.g., the America Community Survey (ACS)) and censored population health surveillance data. Models are pre-specified using the Stan programming language, a platform for Bayesian inference using Markov chain Monte Carlo (MCMC). References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Donegan (2021) <doi:10.31219/osf.io/3ey65>; Donegan, Chun and Hughes (2020) <doi:10.1016/j.spasta.2020.100450>; Donegan, Chun and Griffith (2021) <doi:10.3390/ijerph18136856>; Morris et al. (2019) <doi:10.1016/j.sste.2019.100301>.
License: GPL (>= 3)
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
Expand Down
16 changes: 16 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
geostan: Bayesian spatial analysis
Copyright (C) 2022 geostan contributors

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

1,189 changes: 555 additions & 634 deletions LICENSE.md

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,10 @@ The simultaneously-specified spatial autoregressive (SAR) model---referred to as

## Minor changes

Previously, when getting fitted values from an auto-normal model (i.e., the CAR model with `family = auto_gaussian()`) the fitted values did not include the implicit spatial trend. Now, the `fitted.geostan_fit` method will return the fitted values with the implicit spatial trend; this is consistent with the behavior of `residuals.geostan_fit`, which has an option to `detrend` the residuals. This applies to the SAR and CAR auto-normal specifications. For details, see the documentation pages for `stan_car` and `stan_sar`.

## Minor changes

The documentation for the models (`stan_glm`, `stan_car`, `stan_esf`, `stan_icar`, `stan_sar`) now uses Latex to typeset the model equations.

# geostan 0.3.0
Expand All @@ -18,7 +22,7 @@ The documentation for the models (`stan_glm`, `stan_car`, `stan_esf`, `stan_icar

## Minor changes

- geostan models can now be used with the bridgesampling package for model camparison with Bayes factors (e.g., use `bridge_sampler(geostan_fit$stanfit)`). By default, geostan only collects MCMC samples for parameters that are expected to be of some interest for users. To become compatible with bridgesampling, the `keep_all` argument was added to all of the model fitting functions. For important background and details see the bridgesampling package documentation and vignettes on [CRAN](https://CRAN.R-project.org/package=bridgesampling).
- geostan models can now be used with the bridgesampling package for model comparison with Bayes factors (e.g., use `bridge_sampler(geostan_fit$stanfit)`). By default, geostan only collects MCMC samples for parameters that are expected to be of some interest for users. To become compatible with bridgesampling, the `keep_all` argument was added to all of the model fitting functions. For important background and details see the bridgesampling package documentation and vignettes on [CRAN](https://CRAN.R-project.org/package=bridgesampling).
- stan_car now has an option to provide the connectivity matrix C, which is used to calculate spatial-lag of X (SLX) terms and residual spatial autocorrelation. Previously, there was no option to provide this matrix, as it was taken from the car_parts argument. However, that choice is only appropriate when the WCAR specification is used. Now, if C is missing and the WCAR specification has not been used a warning will appear.
- Previously, the `lisa` function would automatically center and scale the variate before computing local Moran's I. Now, the variate will be centered and scaled by default but the user has the option to turn the scaling off (so the variate will be centered, but not divided by its standard deviation). This function also row-standardized the spatial weights matrix automatically, but there was no reason why. That's not done anymore.

Expand Down
12 changes: 7 additions & 5 deletions R/convenience-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,9 @@ sim_sar <- function(m = 1, mu = rep(0, nrow(w)), w, rho, sigma = 1, ...) {
#'
#' @return A grid of spatial diagnostic plots. When provided with a numeric vector, this function plots a histogram, Moran scatter plot, and map. When provided with a fitted `geostan` model, the function returns a point-interval plot of observed values against fitted values (mean and 95 percent credible interval), either a Moran scatter plot of residuals or a histogram of Moran coefficient values calculated from the joint posterior distribution of the residuals, and a map of the mean posterior residuals (means of the marginal distributions).
#'
#' If `plot = TRUE`, the `ggplots` are drawn using \code{\link[gridExtra]{grid.arrange}}; otherwise, they are returned in a list. For the `geostan_fit` method, the underlying data for the Moran coefficient will also be returned if `plot = FALSE`.
#' If `plot = TRUE`, the `ggplots` are drawn using \link[gridExtra]{grid.arrange}; otherwise, they are returned in a list. For the `geostan_fit` method, the underlying data for the Moran coefficient will also be returned if `plot = FALSE`.
#'
#' When `y` is a fitted CAR or SAR model with `family = auto_gaussian()`, the fitted values will include implicit spatial trend term, i.e. the call to \link[geostan]{fitted.geostan_fit} will use the default `trend = TRUE` and the call to \link[geostan]{residuals.geostan_fit} will use the default `detrend = TRUE`. (See \link[geostan]{stan_car} or \link[geostan]{stan_sar} for additional details on their implicit spatial trend components.)
#'
#' @seealso \code{\link[geostan]{me_diag}}, \code{\link[geostan]{mc}}, \code{\link[geostan]{moran_plot}}, \code{\link[geostan]{aple}}
#'
Expand Down Expand Up @@ -203,7 +205,7 @@ sp_diag.geostan_fit <- function(y,
w = shape2mat(shape, match.arg(style)),
binwidth = function(x) 0.5 * stats::sd(x, na.rm = TRUE),
rates = TRUE,
size = 0.05,
size = 0.1,
...) {
mc_style <- match.arg(mc_style, c("scatter", "hist"))
if (!inherits(shape, "sf")) shape <- sf::st_as_sf(shape)
Expand Down Expand Up @@ -231,7 +233,7 @@ sp_diag.geostan_fit <- function(y,
marginal_residual <- apply(residuals(y, summary = FALSE, ...), 2, mean, na.rm = TRUE)
map.y <- ggplot(shape) +
geom_sf(aes(fill = marginal_residual),
lwd = .01,
lwd = .05,
col = "gray20") +
scale_fill_gradient2(name = name,
label = signs::signs) +
Expand Down Expand Up @@ -1071,7 +1073,7 @@ prep_car_data <- function(A,
#' \item{Widx}{Integer vector containing the indices corresponding to values of `-W` in `ImW_w` (i.e. non-diagonal entries of \eqn{(I-W)}).}
#' \item{nW}{Integer length of `Widx`.}
#' \item{eigenvalues_w}{Eigenvalues of \eqn{W} matrix.}
#' \item{n}{Number of rows in eqn{W}.}
#' \item{n}{Number of rows in \eqn{W}.}
#' \item{W}{Sparse matrix representation of \eqn{W}}
#' \item{rho_min}{Minimum permissible value of \eqn{\rho} (`1/min(eigenvalues_w)`).}
#' \item{rho_max}{Maximum permissible value of \eqn{\rho} (`1/max(eigenvalues_w)`.}
Expand All @@ -1082,7 +1084,7 @@ prep_car_data <- function(A,
#'
#' This is used internally to prepare data for \code{\link[geostan]{stan_sar}} models. It can also be helpful for fitting custom SAR models in Stan (outside of \code{geostan}).
#'
#' @seealso \code{\link[geostan]{shape2mat}}, \code{\link[geostan]{stan_sar}}, \code{\link[geostan]{prep_car_data}}, \code{\link[geostan]{prep_icar_data}}
#' @seealso \link[geostan]{shape2mat}, \link[geostan]{stan_sar}, \link[geostan]{prep_car_data}, \link[geostan]{prep_icar_data}
#'
#' @examples
#' data(georgia)
Expand Down
34 changes: 21 additions & 13 deletions R/geostan_fit-methods.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,10 @@
#' @param probs Argument passed to \code{quantile}; which quantiles to calculate and print.
#' @param fill fill color for histograms and density plots.
#' @param rates For Poisson and Binomial models, should the fitted values be returned as rates, as opposed to raw counts? Defaults to `TRUE`.
#' @param detrend For CAR models with Gaussian likelihood only (auto-gaussian); if `detrend = TRUE`, the implicit spatial trend will be removed from the residuals. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \code{\link[geostan]{stan_car}}). I.e., `resid = Y - (Mu + Trend)`.
#'
#' @param detrend For auto-normal models (CAR and SAR models with Gaussian likelihood only); if `detrend = TRUE`, the implicit spatial trend will be removed from the residuals. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \link[geostan]{stan_car} or \link[geostan]{stan_sar}). I.e., `resid = Y - (Mu + Trend)`.
#'
#' @param trend For auto-normal models (CAR and SAR models with Gaussian likelihood only); if `trend = TRUE`, the fitted values will include the implicit spatial trend term. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \link[geostan]{stan_car} or \link[geostan]{stan_sar}). I.e., if `trend = TRUE`, `fitted = Mu + Trend`.
#' @param ... additional arguments.
#'
#' @details
Expand Down Expand Up @@ -203,15 +206,14 @@ as.array.geostan_fit <- function(x, ...){
#' @export
residuals.geostan_fit <- function(object, summary = TRUE, rates = TRUE, detrend = TRUE, ...) {
y <- object$data[,1]
fits <- fitted(object, summary = FALSE, rates = rates)
fits <- fitted(object, summary = FALSE, rates = rates, trend = detrend)
if (rates && object$family$family == "binomial") { y <- y / (y + object$data[,2]) }
if (rates && object$family$family == "poisson" && "offset" %in% c(colnames(object$data))) {
log.at.risk <- object$data[, "offset"]
at.risk <- exp( log.at.risk )
y <- y / at.risk
}
R = sweep(fits, MARGIN = 2, STATS = as.array(y), FUN = .resid)
if (object$family$family == "auto_gaussian" && detrend) R <- R - spatial(object, summary = FALSE)
colnames(R) <- gsub("fitted", "residual", colnames(R))
if (summary) return( post_summary(R) )
return (R)
Expand All @@ -222,8 +224,12 @@ residuals.geostan_fit <- function(object, summary = TRUE, rates = TRUE, detrend
#' @import stats
#' @method fitted geostan_fit
#' @rdname geostan_fit
fitted.geostan_fit <- function(object, summary = TRUE, rates = TRUE, ...) {
fitted.geostan_fit <- function(object, summary = TRUE, rates = TRUE, trend = TRUE, ...) {
fits <- as.matrix(object$stanfit, pars = "fitted")
if (object$family$family == "auto_gaussian" && trend == TRUE) {
spatial_samples <- spatial(object, summary = FALSE)
fits <- fits + spatial_samples
}
if (!rates && object$family$family == "binomial") {
trials <- object$data[,1] + object$data[,2]
fits <- sweep(fits, MARGIN = 2, STATS = as.array(trials), FUN = "*")
Expand All @@ -248,13 +254,13 @@ spatial <- function(object, summary = TRUE, ...) {
#' @method spatial geostan_fit
#' @rdname geostan_fit
spatial.geostan_fit <- function(object, summary = TRUE, ...) {
if (is.na(object$spatial$par)) stop("This model does not have a spatial trend component to extract.")
if (!object$spatial$method %in% c("CAR", "SAR")) {
par <- as.character(object$spatial$par)
spatial.samples <- as.matrix(object, pars = par, ...)
} else {
spatial.samples <- extract_autoGauss_trend(object)
}
if (is.na(object$spatial$par)) stop("This model does not have a spatial trend component to extract.")
if (object$spatial$method %in% c("CAR", "SAR")) {
spatial.samples <- extract_autoGauss_trend(object)
} else {
par <- as.character(object$spatial$par)
spatial.samples <- as.matrix(object, pars = par, ...)
}
if (summary) {
return ( post_summary(spatial.samples) )
} else {
Expand All @@ -266,9 +272,11 @@ extract_autoGauss_trend <- function(object) {
if (object$spatial$method == "CAR") rho_name <- "car_rho"
if (object$spatial$method == "SAR") rho_name <- "sar_rho"
if (object$family$family == "auto_gaussian") {
C <- object$C
R <- resid(object, summary = FALSE, detrend = FALSE)
C <- object$C
y <- object$data[,1]
rho <- as.matrix(object, pars = rho_name)
fits <- as.matrix(object$stanfit, pars = "fitted")
R = sweep(fits, MARGIN = 2, STATS = as.array(y), FUN = .resid)
spatial.samples <- t(sapply(1:nrow(rho), function(i) {
as.numeric( rho[i] * C %*% R[i,] )
}))
Expand Down
2 changes: 1 addition & 1 deletion R/stan_car.R
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@
#' \deqn{
#' R = y - \mu - \rho C (y - \mu).
#' }
#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`.
#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`. Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend term by default.
#'
#' ### Poisson
#'
Expand Down
2 changes: 1 addition & 1 deletion R/stan_glm.R
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@
#' \deqn{
#' y \sim Poisson(P * e^{\eta})
#' }
#' where \eqn{P} is still the population at risk and \eqn{e^{\eta}} is the incidence rate (risk). The various spatial models available in \code{geostan} expand upon this specifiction (and others) by incorporating spatial arrangement and spatial autocorrelation.
#' where \eqn{P} is still the population at risk and \eqn{e^{\eta}} is the incidence rate (risk). The various spatial models available in \code{geostan} expand upon this specification (and others) by incorporating spatial arrangement and spatial autocorrelation.
#'
#' ### Spatially lagged covariates (SLX)
#'
Expand Down
17 changes: 15 additions & 2 deletions R/stan_sar.R
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@
#' \deqn{
#' R = y - \mu - \rho W (y - \mu).
#' }
#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`.
#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`. Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend term by default.
#'
#' ### Poisson
#'
Expand Down Expand Up @@ -255,14 +255,27 @@
#' fit <- stan_sar(log(rate.male) ~ 1,
#' C = W,
#' data = georgia,
#' chains = 2, iter = 700 # for ex. speed only
#' chains = 1, # for ex. speed only
#' iter = 700
#' )
#'
#' rstan::stan_rhat(fit$stanfit)
#' rstan::stan_mcse(fit$stanfit)
#' print(fit)
#' plot(fit)
#' sp_diag(fit, georgia)
#'
#' \donttest{
#' # a more appropriate model for count data:
#' fit2 <- stan_sar(deaths.male ~ offset(log(pop.at.risk.male)),
#' C = W,
#' data = georgia,
#' family = poisson(),
#' chains = 1, # for ex. speed only
#' iter = 700
#' )
#' sp_diag(fit2, georgia)
#' }
#' @export
#' @md
#' @importFrom rstan extract_sparse_parts
Expand Down
26 changes: 13 additions & 13 deletions README.html

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,8 @@ including a histogram, Moran scatter plot, and map:
``` r
A <- shape2mat(georgia, style = "B")
sp_diag(georgia$rate.female, georgia, w = A)
#> 3 NA values found in x. They will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. To address this, you can use a binary connectivity matrix, using style = 'B' in shape2mat.
#> Warning: Removed 3 rows containing non-finite values (`stat_bin()`).
#> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat.
#> Warning: Removed 3 rows containing non-finite values (stat_bin).
```

<img src="man/figures/README-unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
Expand Down Expand Up @@ -113,7 +113,7 @@ fit <- stan_car(deaths.female ~ offset(log(pop.at.risk.female)),
#> df location scale
#> 1 10 0 3
#>
#> *Setting prior for CAR spatial autocorrelation parameter (rho)
#> *Setting prior for CAR spatial autocorrelation parameter (sar_rho)
#> Distribution: uniform
#> lower upper
#> 1 -1.7 1
Expand All @@ -124,8 +124,8 @@ diagnostics for spatial models:

``` r
sp_diag(fit, georgia, w = A)
#> 3 NA values found in x. They will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. To address this, you can use a binary connectivity matrix, using style = 'B' in shape2mat.
#> Warning: Removed 3 rows containing missing values (`geom_pointrange()`).
#> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat.
#> Warning: Removed 3 rows containing missing values (geom_pointrange).
```

<img src="man/figures/README-unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
Expand All @@ -143,20 +143,20 @@ print(fit)
#> Spatial method (outcome): CAR
#> Likelihood function: poisson
#> Link function: log
#> Residual Moran Coefficient: 0.00053825
#> WAIC: 1291.44
#> Residual Moran Coefficient: 0.0017505
#> WAIC: 1290.39
#> Observations: 159
#> Data models (ME): none
#> Inference for Stan model: foundation.
#> 4 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> intercept -4.674 0.002 0.083 -4.826 -4.717 -4.674 -4.633 -4.510 2199 1.002
#> car_rho 0.921 0.001 0.057 0.781 0.891 0.932 0.963 0.994 3783 1.000
#> car_scale 0.458 0.001 0.034 0.395 0.434 0.456 0.480 0.531 4130 1.000
#> intercept -4.673 0.002 0.089 -4.851 -4.720 -4.675 -4.630 -4.488 2465 1.001
#> car_rho 0.926 0.001 0.057 0.783 0.898 0.939 0.969 0.995 2896 1.001
#> car_scale 0.456 0.001 0.036 0.391 0.432 0.454 0.479 0.532 3110 1.001
#>
#> Samples were drawn using NUTS(diag_e) at Sat Sep 17 21:02:38 2022.
#> Samples were drawn using NUTS(diag_e) at Mon Sep 19 15:23:26 2022.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
Expand Down
Loading

0 comments on commit 1fa7365

Please sign in to comment.