forked from osbili/sandbox
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbglmm_example3.qmd
498 lines (363 loc) · 12.5 KB
/
bglmm_example3.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
---
title: "Bayesian GLMM Part3"
author: "Murray Logan"
date: today
date-format: "DD/MM/YYYY"
format:
html:
## Format
theme: [default, ../resources/ws-style.scss]
css: ../resources/ws_style.css
html-math-method: mathjax
## Table of contents
toc: true
toc-float: true
## Numbering
number-sections: true
number-depth: 3
## Layout
page-layout: full
fig-caption-location: "bottom"
fig-align: "center"
fig-width: 4
fig-height: 4
fig-dpi: 72
tbl-cap-location: top
## Code
code-fold: false
code-tools: true
code-summary: "Show the code"
code-line-numbers: true
code-block-border-left: "#ccc"
code-copy: true
highlight-style: atom-one
## Execution
execute:
echo: true
cache: true
## Rendering
embed-resources: true
crossref:
fig-title: '**Figure**'
fig-labels: arabic
tbl-title: '**Table**'
tbl-labels: arabic
engine: knitr
output_dir: "docs"
documentclass: article
fontsize: 12pt
mainfont: Arial
mathfont: LiberationMono
monofont: DejaVu Sans Mono
classoption: a4paper
bibliography: ../resources/references.bib
---
```{r}
#| label: setup
#| include: false
#| cache: false
knitr::opts_chunk$set(cache.lazy = FALSE,
tidy = "styler")
options(tinytex.engine = "xelatex")
```
# Preparations
Load the necessary libraries
```{r}
#| label: libraries
#| output: false
#| eval: true
#| warning: false
#| message: false
#| cache: false
library(tidyverse) #for data wrangling etc
library(rstanarm) #for fitting models in STAN
library(cmdstanr) #for cmdstan
library(brms) #for fitting models in STAN
library(standist) #for exploring distributions
library(HDInterval) #for HPD intervals
library(posterior) #for posterior draws
library(coda) #for diagnostics
library(bayesplot) #for diagnostics
library(ggmcmc) #for MCMC diagnostics
library(rstan) #for interfacing with STAN
library(emmeans) #for marginal means etc
library(broom) #for tidying outputs
library(DHARMa) #for residual diagnostics
library(tidybayes) #for more tidying outputs
library(ggeffects) #for partial plots
library(broom.mixed)#for tidying MCMC outputs
library(patchwork) #for multiple plots
library(ggridges) #for ridge plots
library(bayestestR) #for ROPE
library(see) #for some plots
library(easystats) #framework for stats, modelling and visualisation
library(modelsummary)
source('helperFunctions.R')
```
# Scenario
{#fig-starlings width="200" height="274"}
```{r}
# You need to capture the same bird again to measure them again to see the difference. Birds(R), month varies within the birds, both situation and month are fixed factors. We are only interested in 4 measured situation.
# The hierarchy of this Situation > Bird > Month > Situation by Month (as an interaction) does the change of situation changes over month. The interaction is what we mainly interested in.
```
SITUATION MONTH MASS BIRD
----------- ------- ------ -----------
tree Nov 78 tree1
.. .. .. ..
nest-box Nov 78 nest-box1
.. .. .. ..
inside Nov 79 inside1
.. .. .. ..
other Nov 77 other1
.. .. .. ..
tree Jan 85 tree1
.. .. .. ..
: Format of starling_full.csv data file {#tbl-starling .table-condensed}
--------------- ------------------------------------------------------------------------------
**SITUATION** Categorical listing of roosting situations (tree, nest-box, inside or other)
**MONTH** Categorical listing of the month of sampling.
**MASS** Mass (g) of starlings.
**BIRD** Categorical listing of individual bird repeatedly sampled.
--------------- ------------------------------------------------------------------------------
: Description of the variables in the starling_full data file {#tbl-starling1 .table-condensed}
# Read in the data
```{r readData, results='markdown', eval=TRUE}
starling <- read_csv('../data/starling_full.csv', trim_ws = TRUE)
```
# Exploratory data analysis
Model formula:
$$
\begin{align}
y_i &\sim{} \mathcal{N}(\mu_i, \sigma^2)\\
\mu_i &= \beta_0 + \boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i}\\
\boldsymbol{\gamma} &= \gamma_0\\
\beta_0 &\sim{} \mathcal{N}(0, 100)\\
\beta &\sim{} \mathcal{N}(0, 10)\\
\gamma_0 &\sim{} \mathcal{N}(0, \sigma_1^2)\\
\sigma &\sim{} \mathcal{cauchy}(0, 2)\\
\sigma_1 &\sim{} \mathcal{cauchy}(0, 2)\\
\end{align}
$$
```{r}
#| label: Check the normality
starling |>
ggplot(aes(y = MASS, x = SITUATION, fill = MONTH)) +
geom_boxplot()
```
where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of the fixed and random effects parameters respectively
and $\bf{X}$ is the model matrix representing the overall intercept and effects of roosting situation and month on starling mass.
$\bf{Z}$ represents a cell means model matrix for the random intercepts associated with individual birds.
```{r}
starling <- starling |>
mutate(BIRD = factor(BIRD),
SITUATION = factor(SITUATION),
MONTH = factor(MONTH, levels = c("Nov", "Jan")))
```
```{r}
#| label: Plot for mixed effect
# Only the things under random effect is good to consider for exploring a slope, Since the hierarchy is SITUATION > BIRD > MONTH good to check for month.
# We group each random within themselves so they each will have separate intercepts which accounts for susceptible have their own slope.
ggplot(starling, aes(y = MASS, x = MONTH, group = BIRD)) + # group -> plotting a separate trend for each of our randoms
geom_point() +
geom_line() +
facet_grid(~SITUATION)
```
# Fit the model
```{r}
starling |>
group_by(SITUATION:MONTH) |>
summarise(median(MASS),
mad(MASS))
```
```{r}
#| label: Get priors
priors <- prior(normal(80,2.5), class = 'Intercept') +
prior(normal(0,10), class = 'b') + # Difference of medians
prior(student_t(3,0,5), class = 'sigma') +
prior(student_t(3,0,5), class = 'sd')
```
```{r}
#| label: Formula
starling.form <- bf(MASS ~ (1|BIRD) + SITUATION * MONTH,
family = gaussian) # Intercept (1) how much it varies for bird (1|BIRD)
get_prior(starling.form, data = starling)
```
```{r}
#|label: Running MCMC
starling.brm2 <- brm(starling.form,
data = starling,
prior = priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
refresh = 0,
backend = 'cmdstanr')
```
::: {.panel-tabset}
:::
# MCMC sampling diagnostics
::: {.panel-tabset}
:::
# Model validation
::: {.panel-tabset}
:::
# Partial effects plots
```{r}
#| label: Plotting Priors
starling.brm2 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```
```{r}
starling.brm3 <- update(starling.brm2, sample_prior = "yes",
control = list(adapt_delta = 0.99))
```
```{r}
starling.brm3 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```
::: {.panel-tabset}
:::
# Model investigation
```{r}
starling.brm3 |> SUYR_prior_and_posterior()
```
```{r}
#| label: Checking the model
starling.brm3 |> get_variables() # the functions below give out first 10 variables by default sooo..
pars <- starling.brm3 |> get_variables () |> str_subset("^b_.*|^sd_.*|^sigma*")
starling.brm3$fit |> stan_trace(pars = pars)
starling.brm3$fit |> stan_ac(pars = pars)
starling.brm3$fit |> stan_rhat(pars = pars)
starling.brm3$fit |> stan_ess(pars = pars)
```
```{r}
#| label: Update priors
priors <- priors + prior(lkj_corr_cholesky(1), class = 'cor')
```
```{r}
priors
```
```{r}
#| label: Update formula to introduce correlation prior
# introduced correlation effect should be smaller in the hierarchy to formula to work, Incorporated variable is: Random Intercept, Random Slope
starling.form <- bf(MASS ~ (MONTH|BIRD) + MONTH*SITUATION,
family = gaussian() # adding the interaction of month and bird to introduce a new correlation
)
```
```{r}
get_prior(starling.form, data = starling)
```
```{r}
#|label: Running MCMC
starling.brm3a <- brm(starling.form,
data = starling,
prior = priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 10,
refresh = 0,
control = list(adapt_delta = 0.99),
backend = 'cmdstanr')
```
```{r}
starling.brm4 <- update(starling.brm3a, sample_prior = "yes")
```
```{r}
pars <- starling.brm4 |> get_variables () |> str_subset("^b_.*|^sd_.*|^sigma*")
```
```{r}
starling.brm4$fit |> stan_trace(pars = pars)
starling.brm4$fit |> stan_ac(pars = pars)
starling.brm4$fit |> stan_rhat(pars = pars)
starling.brm4$fit |> stan_ess(pars = pars)
```
```{r}
starling.brm4 |> pp_check(ttype = 'dens_overlay', nsamples = 100)
```
```{r}
starling.resids <- make_brms_dharma_res(starling.brm4, integerResponse = FALSE)
testUniformity(starling.resids)
plotResiduals(starling.resids, quantreg = TRUE) # data set is small so add qunatreg
testDispersion(starling.resids)
```
```{r}
#| label: Check which model is better.
# The one with lower information criteria (looic) is slightly better, it incorporated almost double the amount of parameters which increases complexity therefore even though it covers more area, the model is penalized for the complexity.
loo(starling.brm4)
loo(starling.brm3)
loo_compare(loo(starling.brm3), loo(starling.brm4))
```
```{r}
starling.brm4 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```
```{r}
starling.brm4 |>
as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma$|^sd_.*")) |>
summarise_draws(
median,
rhat,
HDInterval::hdi,
ess_bulk, # We want ess values to be over 1000, more than 1000 EFFECTIVE samples
ess_tail, # We have around 100s so we need to have mmore iterations if we are gonna thin the data this much
Pg1 = ~mean(.x >0), # we didnt back transform so any of the probabilities are compared to 0
Pl1 = ~mean(.x <0)
)
# For evidence check Pg and Pl
# First check the evidence for interaction, when we check, we see there is no evidence for interactions of SITUATION|MONTH
# Then we can check the situation or month separately.
# On average they add 9.12 grams on Jan and we have strong evidence for that. The 9.12 is similar within different situations because we have seen no interaction of Month and situation so regardless of situation, the birds have increased mass by ~ 9.12 grams from Nov to Jan.
# variability of change on mass is lower than the noise (sigma)? Individually the birds, their starting masses don't vary among themselves much for situation inside
# sd_BIRD_MONTHJan -> how variable the birds were in response to change in month Jan. meaning having more birds to measure or measuring each bird more than once in a month wouldn't change the outcome. Most of the variability is within the bird not among them.
# b_SITUATION -> Only looking for situations on Nov
#
# Marginilazing -> comparing situation is marginilazing for month -> comparing the situations and averaging over the month. Just looking at the main effect of the situation.
```
```{r}
#| label: Marginilazing of SITUATION and their estimated means
# use pairs to compare them and see the difference between them.
starling.brm4 |>
emmeans(~MONTH) |>
pairs(reverse = TRUE)
# ACROSS the entire set of situations the difference in weight from Nov to Jan is -9.03
starling.brm4 |>
emmeans(~MONTH) |>
pairs() |>
gather_emmeans_draws() |>
dplyr::select(-.chain, -.iteration, -.draw) |>
summarise_draws(
median,
HDInterval::hdi,
Pg1 = ~mean(.x >0),
Pl1 = ~mean(.x <0)
)
```
```{r}
starling.brm4 |>
emmeans(~SITUATION) |>
pairs() |>
gather_emmeans_draws() |>
dplyr::select(-.chain, -.iteration, -.draw) |>
summarise_draws(
median,
HDInterval::hdi,
Pg1 = ~mean(.x >0),
Pl1 = ~mean(.x <0)
)
```
```{r}
#| label: Making contrast table
# Check the order of your parameters
levels(starling$SITUATION)
# Input the comparison (Ex: Natural vs Artifical)
cmat <- cbind("Natural vs Artifical" = c(-1/2, 1/2, -1/2, 1/2))
starling.brm4 |>
emmeans(~SITUATION) |>
contrast(method = list(SITUATION = cmat))
```
::: {.panel-tabset}
:::
# Further investigations
::: {.panel-tabset}