diff --git a/NEWS.md b/NEWS.md index c00e60d5..6b76c83e 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,9 +2,9 @@ ### Minor changes -The `gamma` function has been renamed to `geostan::gamma2` to avoid conflict with `base::gamma`. +The `gamma` function (which is available to help set prior distributions) has been renamed to `geostan::gamma2` to avoid conflict with `base::gamma`. -Some code for `geostan::stan_car` was cleaned up to avoid sending duplicate variables to the Stan model when a spatial ME (measurement error) model was used: https://github.com/ConnorDonegan/geostan/issues/17. This should not change any functionality or results. +Some code for `geostan::stan_car` was cleaned up to avoid sending duplicate variables to the Stan model when a spatial ME (measurement error) model was used: https://github.com/ConnorDonegan/geostan/issues/17. This should not change any functionality and there is no reason to suspect that results were ever impacted by the duplicate variables. # geostan 0.5.2 diff --git a/R/stan_car.R b/R/stan_car.R index 76fd30b1..65d8b276 100644 --- a/R/stan_car.R +++ b/R/stan_car.R @@ -113,7 +113,7 @@ #' For \code{family = poisson()}, the model is specified as: #' \deqn{y \sim Poisson(e^{O + \lambda})} #' \deqn{\lambda \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M).} -#' If the raw outcome consists of a rate \eqn{\frac{y}{p}} with observed counts \eqn{y} and denominator {p} (often this will be the size of the population at risk), then the offset term \eqn{O=log(p)} is the log of the denominator. +#' If the raw outcome consists of a rate \eqn{\frac{y}{p}} with observed counts \eqn{y} and denominator \eqn{p} (often this will be the size of the population at risk), then the offset term \eqn{O=log(p)} is the log of the denominator. #' #' This is often written (equivalently) as: #' \deqn{y \sim Poisson(e^{O + \mu + \phi})} diff --git a/R/stan_sar.R b/R/stan_sar.R index be97f2e2..bf3b3c00 100644 --- a/R/stan_sar.R +++ b/R/stan_sar.R @@ -110,7 +110,7 @@ #' \deqn{\lambda \sim Gauss(\mu, \Sigma)} #' \deqn{\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}.} #' -#' If the raw outcome consists of a rate \eqn{\frac{y}{p}} with observed counts \eqn{y} and denominator {p} (often this will be the size of the population at risk), then the offset term \eqn{O=log(p)} is the log of the denominator. +#' If the raw outcome consists of a rate \eqn{\frac{y}{p}} with observed counts \eqn{y} and denominator \eqn{p} (often this will be the size of the population at risk), then the offset term \eqn{O=log(p)} is the log of the denominator. #' #' This is often written (equivalently) as: #' diff --git a/docs/articles/measuring-sa.html b/docs/articles/measuring-sa.html index 9ca0c363..8742901b 100644 --- a/docs/articles/measuring-sa.html +++ b/docs/articles/measuring-sa.html @@ -269,8 +269,8 @@
print(fit)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
+## Running the chains for more iterations may help. See
+## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
-+print(fit)
## Spatial Model Results ## Formula: deaths.male ~ offset(log(pop.at.risk.male)) + insurance ## Spatial method (outcome): CAR ## Likelihood function: poisson ## Link function: log -## Residual Moran Coefficient: -0.003397692 -## WAIC: 1316.68 +## Residual Moran Coefficient: -0.0003923077 +## WAIC: 1317.89 ## Observations: 159 ## Data models (ME): insurance ## Data model (ME prior): CAR (auto Gaussian) @@ -233,18 +236,18 @@
Spatial ME model## post-warmup draws per chain=325, total post-warmup draws=1300. ## ## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat -## intercept -4.166 0.004 0.071 -4.290 -4.197 -4.166 -4.131 -4.013 315 1.016 -## insurance -2.749 0.020 0.753 -4.228 -3.239 -2.768 -2.258 -1.283 1402 1.000 -## car_rho 0.869 0.003 0.090 0.650 0.819 0.883 0.934 0.990 754 1.001 -## car_scale 0.447 0.001 0.034 0.384 0.424 0.445 0.469 0.521 1012 0.998 +## intercept -4.164 0.002 0.055 -4.274 -4.195 -4.163 -4.131 -4.048 871 1.000 +## insurance -2.734 0.020 0.744 -4.175 -3.233 -2.754 -2.232 -1.241 1381 1.001 +## car_rho 0.866 0.003 0.089 0.649 0.817 0.881 0.933 0.984 863 1.002 +## car_scale 0.448 0.001 0.033 0.389 0.425 0.447 0.471 0.516 790 1.002 ## -## Samples were drawn using NUTS(diag_e) at Mon Nov 13 20:11:18 2023. +## Samples were drawn using NUTS(diag_e) at Tue Nov 21 17:28:45 2023. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split chains (at ## convergence, Rhat=1).
The CAR models in geostan can be highly efficient in terms of MCMC sampling, but the required number of iterations will depend on many characteristics of the model and data. Often the default
iter = 2000
is more than sufficient (withcores = 4
). To speed up sampling with multi-core processing, usecores = 4
(to sample 4 chains in parallel).In the following section, methods for examining MCMC samples of the modeled covariate values \(\boldsymbol x\) will be illustrated. Note that the process of storing MCMC samples for \(\boldsymbol x\) can become computationally burdensome if you have multiple covariates and moderately large N. If you do not need the samples of \(\boldsymbol x\) you can use the
-slim
argument, as in:+fit <- stan_car(deaths.male ~ offset(log(pop.at.risk.male)) + insurance, centerx = TRUE, ME = ME_list, @@ -262,7 +265,7 @@
Visual diagnostics\(z_i\) an the modeled values \(x_i\): \[\delta_i = z_i - x_i.\] We want to look for any strong spatial pattern in these \(\delta_i\) values, because that would be an indication of a bias. However, the magnitude of the \(\delta_i\) value is important to consider—there may be a pattern, but if the amount of shrinkage is very small, that pattern may not matter. (The model returns \(M\) samples from the posterior distribution of each \(x_i\); or, indexing by MCMC sample \(x_i^m\) (\(m\) is an index, not an exponent). The reported \(\delta_i\) values is the MCMC mean \(\delta_i = \frac{1}{M} \sum_m z_i - x_i^m\).)
Two figures are provided to evaluate patterns in \(\delta_i\): first, a Moran scatter plot and, second, a map.
-+me_diag(fit, 'insurance', georgia)
In this case, the results do not look too concerning insofar as there are no conspicuous patterns. However, a number of the credible intervals on the modeled values are large, which indicates low data reliability. The fact that some of the \(\delta_i\) values are substantial also points to low data quality for those estimates.
@@ -272,54 +275,54 @@Working with MCMC samples from
geostan consists of pre-compiled Stan models, and users can always access the Markov chain Monte Carlo (MCMC) samples returned by Stan. When extracted as a matrix of samples (as below), each row of the matrix represents a draw from the joint probability distribution for all model parameters, and each column consists of samples from the marginal distribution of each parameter.
The ME models return samples for \(x_i\) as well as the model parameters \(\mu\) (“mu_x_true”), \(\rho\) (“car_rho_x_true”), and \(\tau\) (“sigma_x_true”). We can access these using
-as.matrix
(alsoas.array
andas.data.frame
).+-## [1] 1300 1
+head(mu.x)
-## parameters ## iterations mu_x_true[1] -## [1,] 1.830329 -## [2,] 1.817700 -## [3,] 1.728989 -## [4,] 1.840131 -## [5,] 1.818415 -## [6,] 1.775609
+## [1,] 1.687673 +## [2,] 1.688899 +## [3,] 1.727704 +## [4,] 1.888632 +## [5,] 1.719567 +## [6,] 1.902711+-mean(mu.x)
+## [1] 1.792244
## [1] 1.795267
We can visualize these using
-plot
or print a summary:+## Inference for Stan model: foundation. ## 4 chains, each with iter=650; warmup=325; thin=1; ## post-warmup draws per chain=325, total post-warmup draws=1300. ## ## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat -## mu_x_true[1] 1.79 0 0.08 1.65 1.75 1.79 1.83 1.95 633 1 -## car_rho_x_true[1] 0.91 0 0.07 0.75 0.87 0.92 0.96 0.99 996 1 -## sigma_x_true[1] 0.49 0 0.04 0.42 0.46 0.48 0.51 0.56 1072 1 +## mu_x_true[1] 1.80 0 0.09 1.65 1.75 1.79 1.83 1.97 520 1 +## car_rho_x_true[1] 0.91 0 0.06 0.75 0.87 0.92 0.96 0.99 1179 1 +## sigma_x_true[1] 0.48 0 0.04 0.42 0.46 0.48 0.51 0.56 1405 1 ## -## Samples were drawn using NUTS(diag_e) at Mon Nov 13 20:11:18 2023. +## Samples were drawn using NUTS(diag_e) at Tue Nov 21 17:28:45 2023. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split chains (at ## convergence, Rhat=1).
To extract samples from the joint probability distribution for \(\boldsymbol x\), use the generic parameter name “x_true”:
-+## [1] 1300 159
If we wanted to calculate the mean of each of these marginal distributions (one for every \(x_i\)), we could use
-apply
withMARGIN = 2
to summarize by column:++## 0.8700734## x_insurance[1] x_insurance[2] x_insurance[3] x_insurance[4] x_insurance[5] -## 0.8367161 0.8002348 0.8516637 0.8522715 0.9177878 +## 0.8364686 0.7999092 0.8519479 0.8521623 0.9180310 ## x_insurance[6] -## 0.8702328
The vector
-x.mu
contains estimates (posterior means) for \(x_i\). We might want to use these to plot the residuals or fitted values against the predictor:+ @@ -330,7 +333,7 @@Working with MCMC samples from
Non-spatial ME models
If the
-ME
list doesn’t have a slot withcar_parts
, geostan will automatically use a non-spatial Student’s t model instead of the CAR model: \[ p(\boldsymbol x | \mathcal{M}) = Student(\boldsymbol x | \nu, \mu \boldsymbol 1, \sigma) \]+ME_nsp <- prep_me_data( se = data.frame(insurance = georgia$insurance.se), logit = TRUE diff --git a/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-10-1.png b/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-10-1.png index b6c4a3aa..dd56c6e7 100644 Binary files a/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-10-1.png and b/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-10-1.png differ diff --git a/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-15-1.png b/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-15-1.png index 3e7ea4ad..7f24ef0c 100644 Binary files a/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-15-1.png and b/docs/articles/spatial-me-models_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index c487e9d8..f4c62cfd 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -6,7 +6,7 @@ articles: measuring-sa: measuring-sa.html raster-regression: raster-regression.html spatial-me-models: spatial-me-models.html -last_built: 2023-11-14T02:09Z +last_built: 2023-11-21T23:27Z urls: reference: https://connordonegan.github.io/geostan/reference article: https://connordonegan.github.io/geostan/articles diff --git a/docs/reference/stan_car.html b/docs/reference/stan_car.html index 4087ebac..3a9df334 100644 --- a/docs/reference/stan_car.html +++ b/docs/reference/stan_car.html @@ -309,7 +309,7 @@
Poisson
For
+If the raw outcome consists of a rate \(\frac{y}{p}\) with observed counts \(y\) and denominator \(p\) (often this will be the size of the population at risk), then the offset term \(O=log(p)\) is the log of the denominator.family = poisson()
, the model is specified as: $$y \sim Poisson(e^{O + \lambda})$$ $$\lambda \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M).$$ -If the raw outcome consists of a rate \(\frac{y}{p}\) with observed counts \(y\) and denominator p (often this will be the size of the population at risk), then the offset term \(O=log(p)\) is the log of the denominator.This is often written (equivalently) as: $$y \sim Poisson(e^{O + \mu + \phi})$$ $$\phi \sim Gauss(0, (I - \rho C)^{-1} \boldsymbol M).$$ diff --git a/docs/reference/stan_sar.html b/docs/reference/stan_sar.html index a8e9aff1..a801e104 100644 --- a/docs/reference/stan_sar.html +++ b/docs/reference/stan_sar.html @@ -301,7 +301,7 @@
Poisson
$$y \sim Poisson(e^{O + \lambda})$$ $$\lambda \sim Gauss(\mu, \Sigma)$$ $$\Sigma = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}.$$
-If the raw outcome consists of a rate \(\frac{y}{p}\) with observed counts \(y\) and denominator p (often this will be the size of the population at risk), then the offset term \(O=log(p)\) is the log of the denominator.
+If the raw outcome consists of a rate \(\frac{y}{p}\) with observed counts \(y\) and denominator \(p\) (often this will be the size of the population at risk), then the offset term \(O=log(p)\) is the log of the denominator.
This is often written (equivalently) as:
$$y \sim Poisson(e^{O + \mu + \phi})$$ $$ \phi \sim Gauss(0, \Sigma) $$