forked from osbili/sandbox
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbglm_example3.qmd
223 lines (172 loc) · 6.15 KB
/
bglm_example3.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: "Bayesian GLM Part3"
author: "Murray Logan"
date: today
date-format: "DD/MM/YYYY"
format:
html:
## Format
theme: [default, ../resources/ws-style.scss]
css: ../resources/ws_style.css
html-math-method: mathjax
## Table of contents
toc: true
toc-float: true
## Numbering
number-sections: true
number-depth: 3
## Layout
page-layout: full
fig-caption-location: "bottom"
fig-align: "center"
fig-width: 4
fig-height: 4
fig-dpi: 72
tbl-cap-location: top
## Code
code-fold: false
code-tools: true
code-summary: "Show the code"
code-line-numbers: true
code-block-border-left: "#ccc"
code-copy: true
highlight-style: atom-one
## Execution
execute:
echo: true
cache: true
## Rendering
embed-resources: true
crossref:
fig-title: '**Figure**'
fig-labels: arabic
tbl-title: '**Table**'
tbl-labels: arabic
engine: knitr
output_dir: "docs"
documentclass: article
fontsize: 12pt
mainfont: Arial
mathfont: LiberationMono
monofont: DejaVu Sans Mono
classoption: a4paper
bibliography: ../resources/references.bib
---
```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(cache.lazy = FALSE,
tidy = "styler")
options(tinytex.engine = "xelatex")
```
# Preparations
Load the necessary libraries
```{r}
#| label: libraries
#| output: false
#| eval: true
#| warning: false
#| message: false
#| cache: false
library(tidyverse) #for data wrangling etc
library(rstanarm) #for fitting models in STAN
library(cmdstanr) #for cmdstan
library(brms) #for fitting models in STAN
library(coda) #for diagnostics
library(bayesplot) #for diagnostics
library(ggmcmc) #for MCMC diagnostics
library(DHARMa) #for residual diagnostics
library(rstan) #for interfacing with STAN
library(emmeans) #for marginal means etc
library(broom) #for tidying outputs
library(tidybayes) #for more tidying outputs
library(ggeffects) #for partial plots
library(broom.mixed) #for summarising models
library(ggeffects) #for partial effects plots
library(bayestestR) #for ROPE
library(see) #for some plots
library(easystats) #for the easystats ecosystem
library(INLA) #for approximate Bayes
library(INLAutils) #for additional INLA outputs
library(patchwork) #for multiple plots
library(modelsummary) #for data and model summaries
theme_set(theme_grey()) #put the default ggplot theme back
source("helperFunctions.R")
```
# Scenario
Here is a modified example from @Peake-1993-269. @Peake-1993-269 investigated the relationship between the number of individuals of invertebrates living in amongst clumps of mussels on a rocky intertidal shore and the area of those mussel clumps.
{#fig-mussels}
:::: {.columns}
::: {.column width="50%"}
| AREA | INDIV |
| --------- | ------- |
| 516.00 | 18 |
| 469.06 | 60 |
| 462.25 | 57 |
| 938.60 | 100 |
| 1357.15 | 48 |
| \... | \... |
: Format of peakquinn.csv data files {#tbl-peakquinn .table-condensed}
:::
::: {.column width="50%"}
----------- --------------------------------------------------------------
**AREA** Area of mussel clump mm^2^ - Predictor variable
**INDIV** Number of individuals found within clump - Response variable
----------- --------------------------------------------------------------
: Description of the variables in the peakquinn data file {#tbl-peakquinn1 .table-condensed}
:::
::::
The aim of the analysis is to investigate the relationship between mussel clump area and the number of non-mussel invertebrate individuals supported in the mussel clump.
# Read in the data
```{r readData, results='markdown', eval=TRUE}
peake <- read_csv("../data/peakquinn.csv", trim_ws = TRUE)
```
# Exploratory data analysis
When exploring these data as part of a frequentist analysis, exploratory data
analysis revealed that the both the response (counds of individuals) and
predictor (mussel clump area) were skewed and the relationship between raw
counds and mussel clump area was not linear. Furthermore, there was strong
evidence of a relationship between mean and variance. Normalising both reponse
and predictor addressed these issues. However, rather than log transform the
response, it was considered more appropriate to model against a distribution
that used a logarithmic link function.
The individual observations here ($y_i$) are the observed number of (non mussel
individuals found in mussel clump $i$. As a count, these might be expected to
follow a Poisson (or perhaps negative binomial) distribution. In the case of a
negative binomial, the observed count for any given mussel clump area are
expected to be drawn from a negative binomial distribution with a mean of
$\lambda_i$. All the negative binomial distributions are expected to share the
same degree of dispersion ($\theta$) - that is, the degree to which the
inhabitants of mussell clumps aggregate together (or any other reason for
overdispersion) is independent of mussel clump area and can be estimated as a
constant across all populations.
The natural log of the expected values ($\lambda_i$) is modelled against a
linear predictor that includes an intercept ($\beta_0$) and a slope ($\beta_1$)
associated with the natural log of mussel area.
The priors on $\beta_0$ and $\beta_1$ should be on the natura log scale (since
this will be the scale of the parameters). As starting points, we will consider
the following priors:
- $\beta_0$: Normal prior centered at 0 with a variance of 5
- $\beta_1$: Normal prior centered at 0 with a variance of 2
- $\theta$: Exponential prior with rate 1
Model formula:
$$
\begin{align}
y_i &\sim{} \mathcal{NB}(\lambda_i, \theta)\\
ln(\lambda_i) &= \beta_0 + \beta_1 ln(x_i)\\
\beta_0 & \sim\mathcal{N}(6,1.5)\\
\beta_1 & \sim\mathcal{N}(0,1)\\
\theta &\sim{} \mathcal{Exp}(1)\\
OR\\
\theta &\sim{} \mathcal{\Gamma}(2,1)\\
\end{align}
$$
# Fit the model
# MCMC sampling diagnostics
# Model validation
# Fit Negative Binomial rstanarm
# Partial effects plots
# Model investigation
# Hypothesis testing
# Summary figure
# References