add SAR, update fitted method

ConnorDonegan · Sep 19, 2022 · 1fa7365 · 1fa7365
1 parent 62070ec
commit 1fa7365
Show file tree

Hide file tree

Showing 34 changed files with 887 additions and 1,573 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -10,6 +10,7 @@ README_files/
 ^docs$
 ^pkgdown$
 ^LICENSE\.md$
+LICENSE
 ^index.md$
 ^data-raw/*
 tests/testthat/test-for-bias-results
@@ -18,4 +19,3 @@ tests/testthat/test-rstan-conformity-results
 index.html
 cran-comments.md
 ^CRAN-RELEASE$
-LICENSE.md
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -10,7 +10,7 @@ Authors@R: c(
     person("Mitzi", "Morris", role = "ctb")
   )
 Description: For Bayesian inference with spatial data, provides exploratory spatial analysis tools, multiple spatial model specifications, spatial model diagnostics, and special methods for inference with small area survey data (e.g., the America Community Survey (ACS)) and censored population health surveillance data. Models are pre-specified using the Stan programming language, a platform for Bayesian inference using Markov chain Monte Carlo (MCMC). References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Donegan (2021) <doi:10.31219/osf.io/3ey65>; Donegan, Chun and Hughes (2020) <doi:10.1016/j.spasta.2020.100450>; Donegan, Chun and Griffith (2021) <doi:10.3390/ijerph18136856>; Morris et al. (2019) <doi:10.1016/j.sste.2019.100301>.
-License: GPL (>= 3)
+License: GPL (>= 3) 
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)

diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,16 @@
+geostan: Bayesian spatial analysis
+Copyright (C) 2022 geostan contributors
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
diff --git a/LICENSE.md b/LICENSE.md
diff --git a/NEWS.md b/NEWS.md
@@ -6,6 +6,10 @@ The simultaneously-specified spatial autoregressive (SAR) model---referred to as
 
 ## Minor changes
 
+Previously, when getting fitted values from an auto-normal model (i.e., the CAR model with `family = auto_gaussian()`) the fitted values did not include the implicit spatial trend. Now, the `fitted.geostan_fit` method will return the fitted values with the implicit spatial trend; this is consistent with the behavior of `residuals.geostan_fit`, which has an option to `detrend` the residuals. This applies to the SAR and CAR auto-normal specifications. For details, see the documentation pages for `stan_car`  and `stan_sar`.
+
+## Minor changes
+
 The documentation for the models (`stan_glm`, `stan_car`, `stan_esf`, `stan_icar`, `stan_sar`) now uses Latex to typeset the model equations.
 
 # geostan 0.3.0
@@ -18,7 +22,7 @@ The documentation for the models (`stan_glm`, `stan_car`, `stan_esf`, `stan_icar
 
 ## Minor changes
 
- - geostan models can now be used with the bridgesampling package for model camparison with Bayes factors (e.g., use `bridge_sampler(geostan_fit$stanfit)`). By default, geostan only collects MCMC samples for parameters that are expected to be of some interest for users. To become compatible with bridgesampling, the `keep_all` argument was added to all of the model fitting functions. For important background and details see the bridgesampling package documentation and vignettes on [CRAN](https://CRAN.R-project.org/package=bridgesampling).
+ - geostan models can now be used with the bridgesampling package for model comparison with Bayes factors (e.g., use `bridge_sampler(geostan_fit$stanfit)`). By default, geostan only collects MCMC samples for parameters that are expected to be of some interest for users. To become compatible with bridgesampling, the `keep_all` argument was added to all of the model fitting functions. For important background and details see the bridgesampling package documentation and vignettes on [CRAN](https://CRAN.R-project.org/package=bridgesampling).
  - stan_car now has an option to provide the connectivity matrix C, which is used to calculate spatial-lag of X (SLX) terms and residual spatial autocorrelation. Previously, there was no option to provide this matrix, as it was taken from the car_parts argument. However, that choice is only appropriate when the WCAR specification is used. Now, if C is missing and the WCAR specification has not been used a warning will appear.
  - Previously, the `lisa` function would automatically center and scale the variate before computing local Moran's I. Now, the variate will be centered and scaled by default but the user has the option to turn the scaling off (so the variate will be centered, but not divided by its standard deviation). This function also row-standardized the spatial weights matrix automatically, but there was no reason why. That's not done anymore.
 

diff --git a/R/convenience-functions.R b/R/convenience-functions.R
@@ -148,7 +148,9 @@ sim_sar <- function(m = 1, mu = rep(0, nrow(w)), w, rho, sigma = 1, ...) {
 #' 
 #' @return A grid of spatial diagnostic plots. When provided with a numeric vector, this function plots a histogram, Moran scatter plot, and map. When provided with a fitted `geostan` model, the function returns a point-interval plot of observed values against fitted values (mean and 95 percent credible interval), either a Moran scatter plot of residuals or a histogram of Moran coefficient values calculated from the joint posterior distribution of the residuals, and a map of the mean posterior residuals (means of the marginal distributions).
 #'
-#' If `plot = TRUE`, the `ggplots` are drawn using \code{\link[gridExtra]{grid.arrange}}; otherwise, they are returned in a list. For the `geostan_fit` method, the underlying data for the Moran coefficient will also be returned if `plot = FALSE`.
+#' If `plot = TRUE`, the `ggplots` are drawn using \link[gridExtra]{grid.arrange}; otherwise, they are returned in a list. For the `geostan_fit` method, the underlying data for the Moran coefficient will also be returned if `plot = FALSE`.
+#'
+#' When `y` is a fitted CAR or SAR model with `family = auto_gaussian()`, the fitted values will include implicit spatial trend term, i.e. the call to \link[geostan]{fitted.geostan_fit} will use the default `trend = TRUE` and the call to \link[geostan]{residuals.geostan_fit} will use the default `detrend = TRUE`. (See \link[geostan]{stan_car} or \link[geostan]{stan_sar} for additional details on their implicit spatial trend components.) 
 #'
 #' @seealso \code{\link[geostan]{me_diag}}, \code{\link[geostan]{mc}}, \code{\link[geostan]{moran_plot}}, \code{\link[geostan]{aple}}
 #' 
@@ -203,7 +205,7 @@ sp_diag.geostan_fit <- function(y,
                             w = shape2mat(shape, match.arg(style)),
                             binwidth = function(x) 0.5 * stats::sd(x, na.rm = TRUE),
                             rates = TRUE,
-                            size = 0.05,
+                            size = 0.1,
                             ...) {
     mc_style <- match.arg(mc_style, c("scatter", "hist"))
     if (!inherits(shape, "sf")) shape <- sf::st_as_sf(shape)
@@ -231,7 +233,7 @@ sp_diag.geostan_fit <- function(y,
     marginal_residual <- apply(residuals(y, summary = FALSE, ...), 2, mean, na.rm = TRUE)
     map.y <- ggplot(shape) +
         geom_sf(aes(fill = marginal_residual),
-                lwd =  .01,
+                lwd =  .05,
                 col = "gray20") +
         scale_fill_gradient2(name = name,
                              label = signs::signs) +
@@ -1071,7 +1073,7 @@ prep_car_data <- function(A,
 #' \item{Widx}{Integer vector containing the indices corresponding to values of `-W` in `ImW_w` (i.e. non-diagonal entries of \eqn{(I-W)}).}
 #' \item{nW}{Integer length of `Widx`.}
 #' \item{eigenvalues_w}{Eigenvalues of \eqn{W} matrix.}
-#' \item{n}{Number of rows in eqn{W}.}
+#' \item{n}{Number of rows in \eqn{W}.}
 #' \item{W}{Sparse matrix representation of \eqn{W}}
 #' \item{rho_min}{Minimum permissible value of \eqn{\rho} (`1/min(eigenvalues_w)`).}
 #' \item{rho_max}{Maximum permissible value of \eqn{\rho} (`1/max(eigenvalues_w)`.}
@@ -1082,7 +1084,7 @@ prep_car_data <- function(A,
 #'
 #' This is used internally to prepare data for \code{\link[geostan]{stan_sar}} models. It can also be helpful for fitting custom SAR models in Stan (outside of \code{geostan}). 
 #' 
-#' @seealso \code{\link[geostan]{shape2mat}}, \code{\link[geostan]{stan_sar}}, \code{\link[geostan]{prep_car_data}}, \code{\link[geostan]{prep_icar_data}}
+#' @seealso \link[geostan]{shape2mat}, \link[geostan]{stan_sar}, \link[geostan]{prep_car_data}, \link[geostan]{prep_icar_data}
 #'
 #' @examples
 #' data(georgia)

diff --git a/R/geostan_fit-methods.R b/R/geostan_fit-methods.R
@@ -20,7 +20,10 @@
 #' @param probs Argument passed to \code{quantile}; which quantiles to calculate and print.
 #' @param fill fill color for histograms and density plots.
 #' @param rates For Poisson and Binomial models, should the fitted values be returned as rates, as opposed to raw counts? Defaults to `TRUE`.
-#' @param detrend For CAR models with Gaussian likelihood only (auto-gaussian); if `detrend = TRUE`, the implicit spatial trend will be removed from the residuals. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \code{\link[geostan]{stan_car}}). I.e., `resid = Y - (Mu + Trend)`.
+#' 
+#' @param detrend For auto-normal models (CAR and SAR models with Gaussian likelihood only); if `detrend = TRUE`, the implicit spatial trend will be removed from the residuals. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \link[geostan]{stan_car} or \link[geostan]{stan_sar}). I.e., `resid = Y - (Mu + Trend)`.
+#' 
+#' @param trend For auto-normal models (CAR and SAR models with Gaussian likelihood only); if `trend = TRUE`, the fitted values will include the implicit spatial trend term. The implicit spatial trend is `Trend = rho * C %*% (Y - Mu)` (see \link[geostan]{stan_car} or \link[geostan]{stan_sar}). I.e., if `trend = TRUE`, `fitted = Mu + Trend`.
 #' @param ... additional arguments.
 #'
 #' @details
@@ -203,15 +206,14 @@ as.array.geostan_fit <- function(x, ...){
 #' @export
 residuals.geostan_fit <- function(object, summary = TRUE, rates = TRUE, detrend = TRUE, ...) {
     y <- object$data[,1]
-    fits <- fitted(object, summary = FALSE, rates = rates)
+    fits <- fitted(object, summary = FALSE, rates = rates, trend = detrend)
     if (rates && object$family$family == "binomial") { y <- y / (y + object$data[,2]) }
     if (rates && object$family$family == "poisson" && "offset" %in% c(colnames(object$data))) {
         log.at.risk <- object$data[, "offset"]
         at.risk <- exp( log.at.risk )
         y <- y / at.risk     
     }
     R = sweep(fits, MARGIN = 2, STATS = as.array(y), FUN = .resid)
-    if (object$family$family == "auto_gaussian" && detrend) R <- R - spatial(object, summary = FALSE)
     colnames(R) <- gsub("fitted", "residual", colnames(R))
     if (summary) return( post_summary(R) )
     return (R)  
@@ -222,8 +224,12 @@ residuals.geostan_fit <- function(object, summary = TRUE, rates = TRUE, detrend
 #' @import stats
 #' @method fitted geostan_fit
 #' @rdname geostan_fit
-fitted.geostan_fit <- function(object, summary = TRUE, rates = TRUE, ...) {
+fitted.geostan_fit <- function(object, summary = TRUE, rates = TRUE, trend = TRUE, ...) {
     fits <- as.matrix(object$stanfit, pars = "fitted")
+    if (object$family$family == "auto_gaussian" && trend == TRUE) {
+        spatial_samples <- spatial(object, summary = FALSE)
+        fits <- fits + spatial_samples
+    }
     if (!rates && object$family$family == "binomial") {        
         trials <- object$data[,1] + object$data[,2]
         fits <- sweep(fits, MARGIN = 2, STATS = as.array(trials), FUN = "*")
@@ -248,13 +254,13 @@ spatial <- function(object, summary = TRUE, ...) {
 #' @method spatial geostan_fit
 #' @rdname geostan_fit
 spatial.geostan_fit <- function(object, summary = TRUE, ...) {
-  if (is.na(object$spatial$par)) stop("This model does not have a spatial trend component to extract.")
-  if (!object$spatial$method %in% c("CAR", "SAR")) {
-      par <- as.character(object$spatial$par)
-      spatial.samples <- as.matrix(object, pars = par, ...)
-  } else {
-      spatial.samples <- extract_autoGauss_trend(object)
-  }
+    if (is.na(object$spatial$par)) stop("This model does not have a spatial trend component to extract.")
+    if (object$spatial$method %in% c("CAR", "SAR")) {
+        spatial.samples <- extract_autoGauss_trend(object)
+    } else {
+        par <- as.character(object$spatial$par)
+        spatial.samples <- as.matrix(object, pars = par, ...)
+    }
   if (summary) {
     return ( post_summary(spatial.samples) )
   } else {
@@ -266,9 +272,11 @@ extract_autoGauss_trend <- function(object) {
     if (object$spatial$method == "CAR") rho_name <- "car_rho"
     if (object$spatial$method == "SAR") rho_name <- "sar_rho"
     if (object$family$family == "auto_gaussian") {
-        C <- object$C              
-        R <- resid(object, summary = FALSE, detrend = FALSE)
+        C <- object$C
+        y <- object$data[,1]
         rho <- as.matrix(object, pars = rho_name)
+        fits <- as.matrix(object$stanfit, pars = "fitted")
+        R = sweep(fits, MARGIN = 2, STATS = as.array(y), FUN = .resid)
         spatial.samples <- t(sapply(1:nrow(rho), function(i) {
             as.numeric( rho[i] * C %*% R[i,] )
         }))         

diff --git a/R/stan_car.R b/R/stan_car.R
@@ -97,7 +97,7 @@
 #' \deqn{
 #' R = y - \mu - \rho C (y - \mu).
 #' }
-#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`.
+#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`. Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend term by default.
 #' 
 #' ### Poisson
 #'

diff --git a/R/stan_glm.R b/R/stan_glm.R
@@ -75,7 +75,7 @@
 #' \deqn{
 #' y \sim Poisson(P * e^{\eta})
 #' }
-#' where \eqn{P} is still the population at risk and \eqn{e^{\eta}} is the incidence rate (risk). The various spatial models available in \code{geostan} expand upon this specifiction (and others) by incorporating spatial arrangement and spatial autocorrelation.
+#' where \eqn{P} is still the population at risk and \eqn{e^{\eta}} is the incidence rate (risk). The various spatial models available in \code{geostan} expand upon this specification (and others) by incorporating spatial arrangement and spatial autocorrelation.
 #'
 #' ### Spatially lagged covariates (SLX)
 #' 

diff --git a/R/stan_sar.R b/R/stan_sar.R
@@ -95,7 +95,7 @@
 #' \deqn{
 #' R = y - \mu - \rho W (y - \mu).
 #' }
-#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`.
+#' To obtain "raw" residuals (\eqn{y - \mu}), use `residuals(fit, detrend = FALSE)`. Similarly, the fitted values obtained from the \link[geostan]{fitted.geostan_fit} will include the spatial trend term by default.
 #'
 #' ### Poisson
 #'
@@ -255,14 +255,27 @@
 #' fit <- stan_sar(log(rate.male) ~ 1,
 #'                 C = W,
 #'                 data = georgia,
-#'                 chains = 2, iter = 700 # for ex. speed only
+#'                 chains = 1, # for ex. speed only
+#'                 iter = 700 
 #'                 )
 #' 
 #' rstan::stan_rhat(fit$stanfit)
 #' rstan::stan_mcse(fit$stanfit)
 #' print(fit)
+#' plot(fit)
 #' sp_diag(fit, georgia)
 #'
+#' \donttest{
+#'  # a more appropriate model for count data:
+#' fit2 <- stan_sar(deaths.male ~ offset(log(pop.at.risk.male)),
+#'                 C = W,
+#'                 data = georgia,
+#'                 family = poisson(),
+#'                 chains = 1, # for ex. speed only
+#'                 iter = 700 
+#'                  )
+#' sp_diag(fit2, georgia)
+#' }
 #' @export
 #' @md
 #' @importFrom rstan extract_sparse_parts

diff --git a/README.html b/README.html
diff --git a/README.md b/README.md
@@ -79,8 +79,8 @@ including a histogram, Moran scatter plot, and map:
 ``` r
 A <- shape2mat(georgia, style = "B")
 sp_diag(georgia$rate.female, georgia, w = A)
-#> 3 NA values found in x. They will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. To address this, you can use a binary connectivity matrix, using style = 'B' in shape2mat.
-#> Warning: Removed 3 rows containing non-finite values (`stat_bin()`).
+#> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat.
+#> Warning: Removed 3 rows containing non-finite values (stat_bin).
 ```
 
 <img src="man/figures/README-unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
@@ -113,7 +113,7 @@ fit <- stan_car(deaths.female ~ offset(log(pop.at.risk.female)),
 #>   df location scale
 #> 1 10        0     3
 #> 
-#> *Setting prior for CAR spatial autocorrelation parameter (rho)
+#> *Setting prior for CAR spatial autocorrelation parameter (sar_rho)
 #> Distribution: uniform
 #>   lower upper
 #> 1  -1.7     1
@@ -124,8 +124,8 @@ diagnostics for spatial models:
 
 ``` r
 sp_diag(fit, georgia, w = A)
-#> 3 NA values found in x. They will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. To address this, you can use a binary connectivity matrix, using style = 'B' in shape2mat.
-#> Warning: Removed 3 rows containing missing values (`geom_pointrange()`).
+#> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat.
+#> Warning: Removed 3 rows containing missing values (geom_pointrange).
 ```
 
 <img src="man/figures/README-unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
@@ -143,20 +143,20 @@ print(fit)
 #> Spatial method (outcome):  CAR 
 #> Likelihood function:  poisson 
 #> Link function:  log 
-#> Residual Moran Coefficient:  0.00053825 
-#> WAIC:  1291.44 
+#> Residual Moran Coefficient:  0.0017505 
+#> WAIC:  1290.39 
 #> Observations:  159 
 #> Data models (ME): none
 #> Inference for Stan model: foundation.
 #> 4 chains, each with iter=2000; warmup=1000; thin=1; 
 #> post-warmup draws per chain=1000, total post-warmup draws=4000.
 #> 
 #>             mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff  Rhat
-#> intercept -4.674   0.002 0.083 -4.826 -4.717 -4.674 -4.633 -4.510  2199 1.002
-#> car_rho    0.921   0.001 0.057  0.781  0.891  0.932  0.963  0.994  3783 1.000
-#> car_scale  0.458   0.001 0.034  0.395  0.434  0.456  0.480  0.531  4130 1.000
+#> intercept -4.673   0.002 0.089 -4.851 -4.720 -4.675 -4.630 -4.488  2465 1.001
+#> car_rho    0.926   0.001 0.057  0.783  0.898  0.939  0.969  0.995  2896 1.001
+#> car_scale  0.456   0.001 0.036  0.391  0.432  0.454  0.479  0.532  3110 1.001
 #> 
-#> Samples were drawn using NUTS(diag_e) at Sat Sep 17 21:02:38 2022.
+#> Samples were drawn using NUTS(diag_e) at Mon Sep 19 15:23:26 2022.
 #> For each parameter, n_eff is a crude measure of effective sample size,
 #> and Rhat is the potential scale reduction factor on split chains (at 
 #> convergence, Rhat=1).