bglmm_example3.qmd

---
title: "Bayesian GLMM Part3"
author: "Murray Logan"
date: today
date-format: "DD/MM/YYYY"
format: 
  html:
    ## Format
    theme: [default, ../resources/ws-style.scss]
    css: ../resources/ws_style.css
    html-math-method: mathjax
    ## Table of contents
    toc: true
    toc-float: true
    ## Numbering
    number-sections: true
    number-depth: 3
    ## Layout
    page-layout: full
    fig-caption-location: "bottom"
    fig-align: "center"
    fig-width: 4
    fig-height: 4
    fig-dpi: 72
    tbl-cap-location: top
    ## Code
    code-fold: false
    code-tools: true
    code-summary: "Show the code"
    code-line-numbers: true
    code-block-border-left: "#ccc"
    code-copy: true
    highlight-style: atom-one
    ## Execution
    execute:
      echo: true
      cache: true
    ## Rendering
    embed-resources: true
crossref:
  fig-title: '**Figure**'
  fig-labels: arabic
  tbl-title: '**Table**'
  tbl-labels: arabic
engine: knitr
output_dir: "docs"
documentclass: article
fontsize: 12pt
mainfont: Arial
mathfont: LiberationMono
monofont: DejaVu Sans Mono
classoption: a4paper
bibliography: ../resources/references.bib
---

```{r}
#| label: setup
#| include: false
#| cache: false

knitr::opts_chunk$set(cache.lazy = FALSE,
                      tidy = "styler")
options(tinytex.engine = "xelatex")
```

# Preparations

Load the necessary libraries

```{r}
#| label: libraries
#| output: false
#| eval: true
#| warning: false
#| message: false
#| cache: false

library(tidyverse)  #for data wrangling etc
library(rstanarm)   #for fitting models in STAN
library(cmdstanr)   #for cmdstan
library(brms)       #for fitting models in STAN
library(standist)   #for exploring distributions
library(HDInterval) #for HPD intervals
library(posterior)  #for posterior draws
library(coda)       #for diagnostics
library(bayesplot)  #for diagnostics
library(ggmcmc)     #for MCMC diagnostics
library(rstan)      #for interfacing with STAN
library(emmeans)    #for marginal means etc
library(broom)      #for tidying outputs
library(DHARMa)     #for residual diagnostics
library(tidybayes)  #for more tidying outputs
library(ggeffects)  #for partial plots
library(broom.mixed)#for tidying MCMC outputs
library(patchwork)  #for multiple plots
library(ggridges)   #for ridge plots 
library(bayestestR) #for ROPE
library(see)        #for some plots
library(easystats)     #framework for stats, modelling and visualisation
library(modelsummary)
source('helperFunctions.R')
```

# Scenario

![Starlings](../resources/starlings.jpg){#fig-starlings width="200" height="274"}


```{r}
# You need to capture the same bird again to measure them again to see the difference. Birds(R), month varies within the birds, both situation and month are fixed factors. We are only interested in 4 measured situation. 

# The hierarchy of this   Situation > Bird > Month > Situation by Month (as an interaction) does the change of situation changes over month. The interaction is what we mainly interested in.
```


SITUATION   MONTH   MASS   BIRD
----------- ------- ------ -----------
tree        Nov     78     tree1
..          ..      ..     ..
nest-box    Nov     78     nest-box1
..          ..      ..     ..
inside      Nov     79     inside1
..          ..      ..     ..
other       Nov     77     other1
..          ..      ..     ..
tree        Jan     85     tree1
..          ..      ..     ..

: Format of starling_full.csv data file {#tbl-starling .table-condensed}

--------------- ------------------------------------------------------------------------------
**SITUATION**   Categorical listing of roosting situations (tree, nest-box, inside or other)
**MONTH**       Categorical listing of the month of sampling.
**MASS**        Mass (g) of starlings.
**BIRD**        Categorical listing of individual bird repeatedly sampled.
--------------- ------------------------------------------------------------------------------

: Description of the variables in the starling_full data file {#tbl-starling1 .table-condensed}

# Read in the data

```{r readData, results='markdown', eval=TRUE}
starling <- read_csv('../data/starling_full.csv', trim_ws = TRUE)
```

# Exploratory data analysis

Model formula:
$$
\begin{align}
y_i &\sim{} \mathcal{N}(\mu_i, \sigma^2)\\
\mu_i &= \beta_0 + \boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i}\\
\boldsymbol{\gamma} &= \gamma_0\\
\beta_0 &\sim{} \mathcal{N}(0, 100)\\
\beta &\sim{} \mathcal{N}(0, 10)\\
\gamma_0 &\sim{} \mathcal{N}(0, \sigma_1^2)\\
\sigma &\sim{} \mathcal{cauchy}(0, 2)\\
\sigma_1 &\sim{} \mathcal{cauchy}(0, 2)\\
\end{align}
$$


```{r}
#| label: Check the normality
starling |> 
ggplot(aes(y = MASS, x = SITUATION, fill = MONTH)) +
  geom_boxplot()
```

where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of the fixed and random effects parameters respectively 
and $\bf{X}$ is the model matrix representing the overall intercept and effects of roosting situation and month on starling mass.
$\bf{Z}$ represents a cell means model matrix for the random intercepts associated with individual birds.


```{r}
starling <- starling |> 
  mutate(BIRD = factor(BIRD), 
         SITUATION = factor(SITUATION),
         MONTH = factor(MONTH, levels = c("Nov", "Jan")))
```


```{r}
#| label: Plot for mixed effect

# Only the things under random effect is good to consider for exploring a slope, Since the hierarchy is SITUATION > BIRD > MONTH good to check for month.

# We group each random within themselves so they each will have separate intercepts which accounts for susceptible have their own slope.

ggplot(starling, aes(y = MASS, x = MONTH, group = BIRD)) + # group -> plotting a separate trend for each of our randoms
  geom_point() +
  geom_line() +
  facet_grid(~SITUATION)
```

# Fit the model

```{r}
starling |> 
  group_by(SITUATION:MONTH) |> 
  summarise(median(MASS),
            mad(MASS))
```

```{r}
#| label: Get priors

priors <- prior(normal(80,2.5), class = 'Intercept') +
  prior(normal(0,10), class = 'b') + # Difference of medians
  prior(student_t(3,0,5), class = 'sigma') +
  prior(student_t(3,0,5), class = 'sd')
```


```{r}
#| label: Formula

starling.form <- bf(MASS  ~ (1|BIRD) + SITUATION * MONTH,
                  family = gaussian) # Intercept (1) how much it varies for bird (1|BIRD)

get_prior(starling.form, data = starling)
```


```{r}
#|label: Running MCMC


starling.brm2 <- brm(starling.form,
                    data = starling,
                    prior = priors,
                    sample_prior = 'only',
                    iter = 5000,
                    warmup = 1000,
                    chains = 3, cores = 3,
                    thin = 5,
                    refresh = 0,
                    backend = 'cmdstanr')
```


::: {.panel-tabset}

:::
    
# MCMC sampling diagnostics

::: {.panel-tabset}


:::


# Model validation 

::: {.panel-tabset}

:::
    

# Partial effects plots 
```{r}
#| label: Plotting Priors
starling.brm2 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```

```{r}
starling.brm3 <- update(starling.brm2, sample_prior = "yes",
                        control = list(adapt_delta = 0.99))
```


```{r}
starling.brm3 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```


::: {.panel-tabset}

:::
    

# Model investigation 
```{r}
starling.brm3 |> SUYR_prior_and_posterior()
```


```{r}
#| label: Checking the model

starling.brm3 |> get_variables() # the functions below give out first 10 variables by default sooo..

pars <- starling.brm3 |> get_variables () |> str_subset("^b_.*|^sd_.*|^sigma*")

starling.brm3$fit |> stan_trace(pars = pars)
starling.brm3$fit |> stan_ac(pars = pars)
starling.brm3$fit |> stan_rhat(pars = pars)
starling.brm3$fit |> stan_ess(pars = pars)
```

```{r}
#| label: Update priors
priors <- priors + prior(lkj_corr_cholesky(1), class = 'cor')
```


```{r}
priors
```

```{r}
#| label: Update formula to introduce correlation prior

# introduced correlation effect should be smaller in the hierarchy to formula to work, Incorporated variable is: Random Intercept, Random Slope
starling.form <- bf(MASS ~ (MONTH|BIRD) + MONTH*SITUATION,
                   family = gaussian() # adding the interaction of month and bird to introduce a new correlation
                   )
```

```{r}
get_prior(starling.form, data = starling)
```


```{r}
#|label: Running MCMC


starling.brm3a <- brm(starling.form,
                    data = starling,
                    prior = priors,
                    sample_prior = 'only',
                    iter = 5000,
                    warmup = 1000,
                    chains = 3, cores = 3,
                    thin = 10,
                    refresh = 0,
                    control = list(adapt_delta = 0.99),
                    backend = 'cmdstanr')
```

```{r}
starling.brm4 <- update(starling.brm3a, sample_prior = "yes")
```

```{r}
pars <- starling.brm4 |> get_variables () |> str_subset("^b_.*|^sd_.*|^sigma*")
```


```{r}
starling.brm4$fit |> stan_trace(pars = pars)
starling.brm4$fit |> stan_ac(pars = pars)
starling.brm4$fit |> stan_rhat(pars = pars)
starling.brm4$fit |> stan_ess(pars = pars)
```


```{r}
starling.brm4 |>  pp_check(ttype = 'dens_overlay', nsamples = 100)
```

```{r}
starling.resids <- make_brms_dharma_res(starling.brm4, integerResponse = FALSE)
testUniformity(starling.resids)
plotResiduals(starling.resids, quantreg = TRUE) # data set is small so add qunatreg
testDispersion(starling.resids)
```


```{r}
#| label: Check which model is better.

# The one with lower information criteria (looic) is slightly better, it incorporated almost double the amount of parameters which increases complexity therefore even though it covers more area, the model is penalized for the complexity.
loo(starling.brm4)
loo(starling.brm3)
loo_compare(loo(starling.brm3), loo(starling.brm4))
```


```{r}
starling.brm4 |> conditional_effects("SITUATION:MONTH") |> plot(points = TRUE)
```

```{r}
starling.brm4 |> 
  as_draws_df() |> 
  dplyr::select(matches("^b_.*|^sigma$|^sd_.*")) |> 
  summarise_draws(
    median,
    rhat,
    HDInterval::hdi,
    ess_bulk, # We want ess values to be over 1000, more than 1000 EFFECTIVE samples
    ess_tail, # We have around 100s so we need to have mmore iterations if we are gonna thin the data this much
    Pg1 = ~mean(.x >0), # we didnt back transform so any of the probabilities are compared to 0
    Pl1 = ~mean(.x <0)
  )

# For evidence check Pg and Pl

# First check the evidence for interaction, when we check, we see there is no evidence for interactions of SITUATION|MONTH 

# Then we can check the situation or month separately.

# On average they add 9.12 grams on Jan and we have strong evidence for that. The 9.12 is similar within different situations because we have seen no interaction of Month and situation so regardless of situation, the birds have increased mass by ~ 9.12 grams from Nov to Jan.

# variability of change on mass is lower than the noise (sigma)? Individually the birds, their starting masses don't vary among themselves much for situation inside

# sd_BIRD_MONTHJan -> how variable the birds were in response to change in month Jan. meaning having more birds to measure or measuring each bird more than once in a month wouldn't change the outcome. Most of the variability is within the bird not among them.

# b_SITUATION -> Only looking for situations on Nov
#

# Marginilazing -> comparing situation is marginilazing for month -> comparing the situations and averaging over the month. Just looking at the main effect of the situation.
```


```{r}
#| label: Marginilazing of SITUATION and their estimated means

# use pairs to compare them and see the difference between them.


starling.brm4 |> 
  emmeans(~MONTH) |> 
  pairs(reverse = TRUE)

# ACROSS the entire set of situations the difference in weight from Nov to Jan is -9.03

starling.brm4 |> 
  emmeans(~MONTH) |> 
  pairs() |> 
  gather_emmeans_draws() |> 
  dplyr::select(-.chain, -.iteration, -.draw) |> 
  summarise_draws(
    median,
    HDInterval::hdi,
    Pg1 = ~mean(.x >0),
    Pl1 = ~mean(.x <0)
  )
```


```{r}
starling.brm4 |> 
  emmeans(~SITUATION) |> 
  pairs() |> 
  gather_emmeans_draws() |> 
  dplyr::select(-.chain, -.iteration, -.draw) |> 
  summarise_draws(
    median,
    HDInterval::hdi,
    Pg1 = ~mean(.x >0),
    Pl1 = ~mean(.x <0)
  )
```


```{r}
#| label: Making contrast table


# Check the order of your parameters
levels(starling$SITUATION)

# Input the comparison (Ex: Natural vs Artifical)
cmat <- cbind("Natural vs Artifical" = c(-1/2, 1/2, -1/2, 1/2))

starling.brm4 |> 
  emmeans(~SITUATION) |> 
  contrast(method = list(SITUATION = cmat))
```


::: {.panel-tabset}

:::

# Further investigations 

::: {.panel-tabset}