forked from osbili/sandbox
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbglm_example7.qmd
293 lines (229 loc) · 8.37 KB
/
bglm_example7.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
---
title: "Bayesian GLM Part7"
author: "Murray Logan"
date: today
date-format: "DD/MM/YYYY"
format:
html:
## Format
theme: [default, ../resources/ws-style.scss]
css: ../resources/ws_style.css
html-math-method: mathjax
## Table of contents
toc: true
toc-float: true
## Numbering
number-sections: true
number-depth: 3
## Layout
page-layout: full
fig-caption-location: "bottom"
fig-align: "center"
fig-width: 4
fig-height: 4
fig-dpi: 72
tbl-cap-location: top
## Code
code-fold: false
code-tools: true
code-summary: "Show the code"
code-line-numbers: true
code-block-border-left: "#ccc"
code-copy: true
highlight-style: atom-one
## Execution
execute:
echo: true
cache: true
## Rendering
embed-resources: true
crossref:
fig-title: '**Figure**'
fig-labels: arabic
tbl-title: '**Table**'
tbl-labels: arabic
engine: knitr
output_dir: "docs"
documentclass: article
fontsize: 12pt
mainfont: Arial
mathfont: LiberationMono
monofont: DejaVu Sans Mono
classoption: a4paper
bibliography: ../resources/references.bib
---
```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(cache.lazy = FALSE,
tidy = "styler")
options(tinytex.engine = "xelatex")
```
# Preparations
Load the necessary libraries
```{r}
#| label: libraries
#| output: false
#| eval: true
#| warning: false
#| message: false
#| cache: false
library(tidyverse) #for data wrangling etc
library(rstanarm) #for fitting models in STAN
library(cmdstanr) #for cmdstan
library(brms) #for fitting models in STAN
library(standist) #for exploring distributions
library(coda) #for diagnostics
library(bayesplot) #for diagnostics
library(ggmcmc) #for MCMC diagnostics
library(DHARMa) #for residual diagnostics
library(rstan) #for interfacing with STAN
library(emmeans) #for marginal means etc
library(broom) #for tidying outputs
library(tidybayes) #for more tidying outputs
library(HDInterval) #for HPD intervals
library(ggeffects) #for partial plots
library(broom.mixed) #for summarising models
library(posterior) #for posterior draws
library(ggeffects) #for partial effects plots
library(patchwork) #for multi-panel figures
library(bayestestR) #for ROPE
library(see) #for some plots
library(easystats) #framework for stats, modelling and visualisation
library(geoR) #framework for stats, modelling and visualisation
library(modelsummary) #for data and model summaries
theme_set(theme_grey()) #put the default ggplot theme back
source('helperFunctions.R')
```
# Scenario
@Mitchell-2019 presented a data set in which male guppy fish where
placed individually into long, narrow and shallow tanks that
predominately only permitted the fish to swim in two dimensions. The
activity rate (cumulative distance swam) of each guppy fish was
remotely recorded via tracking software during 20 minute trials
conducted each day for 14 days. The order in which individual guppy
fish were recorded each day was randomised and the time of day of each
trial was recorded.
The researchers were interested in whether the the guppy fish become
habituated to the experimental conditions and altered their activity
patterns over time.
{#fig-guppy width=70%}
The data are in the file
**mitchell.csv** in the **data** folder.
| ID | TOD | Day | Dist_moved |
|------|------|-----|------------|
| A39 | 0.39 | 7 | 3439.93 |
| A3 | 0.39 | 7 | 2795.08 |
| A13 | 0.39 | 7 | 5258.79 |
| \... | \... | | |
: Format of the mitchell.csv data file {#tbl-mitchell .table-condensed}
---------------- ---------------------------------------------------
**ID**: Individual guppy fish ID - Random variable
**TOD**: Time of day of observation as a proportion - Predictor variable
**Day**: Day post entry into tank - Predictor variable
**Dist_moved**: Total distance (cm) moved in 20 min trial - Response variable
---------------- ---------------------------------------------------
: Description of the variables in the mitchell data file {#tbl-mitchell1 .table-condensed}
# Read in the data
```{r readData, results='markdown', eval=TRUE}
mitchell <- read_csv("../data/mitchell.csv", trim_ws = TRUE)
```
# Data preparation
# Exploratory data analysis
Model formula:
$$
y_i/1000 \sim{} \mathcal{Gamma}(\lambda_i)\\
ln(\lambda_i) =\boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i}
$$
where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of
the fixed and random effects parameters respectively and $\bf{X}$ is
the model matrix representing the overall intercept and effects of day
and time of day on the total distance moved during the trial. $\bf{Z}$
represents a cell means model matrix for the random intercepts
associated with individual guppies.
# Fit the model
```{r}
#| label: process_more
#| results: markup
#| eval: true
#| echo: true
#| cache: false
mitchell_sub <- mitchell |>
## filter(ID == "A63") |>
filter(ID == "C69") |>
droplevels()
```
# Model validation
# Refit model
### stan plots
The `brms` package offers a range of MCMC diagnostics.
Lets start with the MCMC diagnostics.
Of these, we will focus on:
- stan_trace: this plots the estimates of each parameter over the post-warmup
length of each MCMC chain. Each chain is plotted in a different colour, with
each parameter in its own facet. Ideally, each **trace** should just look like
noise without any discernible drift and each of the traces for a specific
parameter should look the same (i.e, should not be displaced above or below
any other trace for that parameter).
The chains appear well mixed and very similar
- stan_acf (auto-correlation function): plots the auto-correlation between successive
MCMC sample lags for each parameter and each chain
There is no evidence of auto-correlation in the MCMC samples
- stan_rhat: Rhat is a **scale reduction factor** measure of convergence between the chains. The closer the
values are to 1, the more the chains have converged. Values greater than 1.05
indicate a lack of convergence. There will be an Rhat value for each
parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- stan_ess (number of effective samples): the ratio of the number of effective
samples (those not rejected by the sampler) to the number of samples provides
an indication of the effectiveness (and efficiency) of the MCMC sampler.
Ratios that are less than 0.5 for a parameter suggest that the sampler spent
considerable time in difficult areas of the sampling domain and rejected more
than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
### pp check
### DHARMa residuals
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot
directly use the `simulateResiduals()` function to generate the simulated
residuals. However, if we are willing to calculate some of the components
yourself, we can still obtain the simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
**Conclusions:**
- the simulated residuals do suggest that there might be a dispersion issue
- it might be worth exploring either zero-inflation, a negative
binomial model, or including a observation-level random effect.
```{r}
#| label: name
#| results: markup
#| eval: true
#| echo: true
#| cache: false
mitchell.brm3 |> augment() |>
## group_by(ID) |>
reframe(ACF = as.numeric(acf(.resid, plot = FALSE)$acf)) |>
## group_by(ID) |>
mutate(N = 1:n()) |>
## slice(1:12) |>
ungroup() |>
group_by(N) |>
summarise(ACF = mean(ACF)) |>
ggplot(aes(y = ACF, x = N)) +
geom_hline(yintercept = 0) +
geom_segment(aes(yend = 0, xend = N))
mitchell.brm4 |> augment() |>
## group_by(ID) |>
reframe(ACF = as.numeric(acf(.resid, plot = FALSE)$acf)) |>
## group_by(ID) |>
mutate(N = 1:n()) |>
## slice(1:12) |>
ungroup() |>
group_by(N) |>
summarise(ACF = mean(ACF)) |>
ggplot(aes(y = ACF, x = N)) +
geom_hline(yintercept = 0) +
geom_segment(aes(yend = 0, xend = N))
```