diff --git a/UserManual/NimbleUserManual.pdf b/UserManual/NimbleUserManual.pdf
index a01ba8e2a..ad3e3230c 100644
Binary files a/UserManual/NimbleUserManual.pdf and b/UserManual/NimbleUserManual.pdf differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/linearRegressionGraph-1.png b/UserManual/NimbleUserManual_files/figure-html/linearRegressionGraph-1.png
index 71af49c5c..d9349123b 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/linearRegressionGraph-1.png and b/UserManual/NimbleUserManual_files/figure-html/linearRegressionGraph-1.png differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/mcmcPump-1.png b/UserManual/NimbleUserManual_files/figure-html/mcmcPump-1.png
index f9f3720f1..1ff2075de 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/mcmcPump-1.png and b/UserManual/NimbleUserManual_files/figure-html/mcmcPump-1.png differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/mcmcPump-2.png b/UserManual/NimbleUserManual_files/figure-html/mcmcPump-2.png
index d5f6441fa..ce29e1b67 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/mcmcPump-2.png and b/UserManual/NimbleUserManual_files/figure-html/mcmcPump-2.png differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-1.png b/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-1.png
index bf87e74f6..1994c8fe4 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-1.png and b/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-1.png differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-2.png b/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-2.png
index 0bd277e57..b4c313402 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-2.png and b/UserManual/NimbleUserManual_files/figure-html/mcmcPump2-2.png differ
diff --git a/UserManual/NimbleUserManual_files/figure-html/plotPump-1.png b/UserManual/NimbleUserManual_files/figure-html/plotPump-1.png
index fa10854b5..dc237bc82 100644
Binary files a/UserManual/NimbleUserManual_files/figure-html/plotPump-1.png and b/UserManual/NimbleUserManual_files/figure-html/plotPump-1.png differ
diff --git a/UserManual/cha-AD.html b/UserManual/cha-AD.html
index a16c5096a..48216ad99 100644
--- a/UserManual/cha-AD.html
+++ b/UserManual/cha-AD.html
@@ -6,7 +6,7 @@ Chapter 16 Automatic Derivatives | NimbleUserManual.knit
@@ -64,7 +64,7 @@
 @media print {
 pre > code.sourceCode { white-space: pre-wrap; }
-pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
 }
@@ -126,7 +126,6 @@
 div.csl-entry {
 clear: both;
-  margin-bottom: 0em;
 }
 .hanging div.csl-entry {
 margin-left:2em;
@@ -281,7 +280,7 @@
  • 7.11.4 Customized log-likelihood evaluations: RW_llFunction sampler
  • 7.12 Detailed MCMC example: litters
- • 7.13 Comparing different MCMCs with MCMCsuite and compareMCMCs
+ • 7.13 Comparing different MCMCs with compareMCMCs
  • 7.14 Running MCMC chains in parallel
  • 8 Particle Filters, PMCMC, MCEM, Laplace approximation and quadrature
@@ -295,7 +294,18 @@
  • 8.3 Laplace approximation and adaptive Gauss-Hermite quadrature
+ • 8.4 Nested approximation methods
  • 9 Spatial models
- 16.7.2.1 Advanced topic: more about constantNodes and updateNodes
+ 16.6.2.1 Advanced topic: more about constantNodes and updateNodes

    In most cases, you can obtain constantNodes and updateNodes from makeModelDerivsInfo without knowing exactly what they mean. But for advanced uses and possible debugging needs, let’s explore these arguments in more detail. The purpose of constantNodes and updateNodes is to tell the AD system about all nodes that will be needed for taped model calculations. Specifically:

    • updateNodes and constantNodes are vectors of node names that together comprise any nodes that are necessary for model$calculate(calc_nodes) but are:

@@ -1287,7 +1134,7 @@ 16.7.2.1 Advanced topic: more about constantNodes and updateNodes
    • constantNodes includes node names satisfying those conditions whose values will be baked into the tape (will not change) until the next call with reset=TRUE.

    To fix ideas, say that we want derivatives with respect to ran_eff[1] for calculations of it and the data that depend on it, including any deterministic nodes. In other words, wrt will be ran_eff[1], and calc_nodes will be:

model$getDependencies('ran_eff[1]')
    ##  [1] "ran_eff[1]"                                                                             
     ##  [2] "lifted_exp_oPintercept_plus_beta_times_X_oBi_comma_j_cB_plus_ran_eff_oBi_cB_cP_L7[1, 1]"
     ##  [3] "lifted_exp_oPintercept_plus_beta_times_X_oBi_comma_j_cB_plus_ran_eff_oBi_cB_cP_L7[1, 2]"
@@ -1305,16 +1152,16 @@ 16.7.2.1 Advanced topic: more about constantNodes and updateNodes

    The function makeModelDerivsInfo inspects the model and determines the usual needs for updateNodes and constantNodes. By default, data nodes are put in constantNodes. Use dataAsConstantNodes = FALSE in makeModelDerivsInfo if you want them put in updateNodes.
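For concreteness, here is a minimal sketch (the object names are illustrative) of obtaining these two vectors for the example above:

wrt_nodes <- 'ran_eff[1]'
calc_nodes <- model$getDependencies('ran_eff[1]')
# makeModelDerivsInfo returns a list with updateNodes and constantNodes
derivsInfo <- makeModelDerivsInfo(model, wrt_nodes, calc_nodes)
derivsInfo$updateNodes    # nodes whose values may change between uses of a tape
derivsInfo$constantNodes  # nodes whose values are baked into the tape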

    Note that a deterministic node in calc_nodes will have its value calculated as part of the operations recorded in the AD tape, so it does not need to be included in updateNodes or constantNodes.

    As usual in model-generic programming in NIMBLE, be aware of lifted nodes and their implications. Suppose in the Poisson GLMM we had used a precision parameterization for the random effects, with the changes in this code snippet:

  precision ~ dgamma(0.01, 0.01)
  # <other lines>
    ran_eff[i] ~ dnorm(0, tau = precision)

    This would result in a lifted node for the standard deviation, calculated as 1/sqrt(precision). That lifted node is what would actually be used in dnorm for each ran_eff[i]. Now if ran_eff[1] is in wrt_nodes, the lifted node (but not precision) will be in updateNodes (as determined by makeModelDerivsInfo). If you then change the value of precision, you must be sure to calculate the lifted node before obtaining derivatives. Otherwise the value of the lifted node will correspond to the old value of precision. These considerations are not unique to AD but rather are part of model-generic programming (see chapter 15).

    An example of model-generic programming to update any lifted nodes that depend on precision would be:

model$calculate(model$getDependencies('precision', determOnly=TRUE))

    In Method 1 above, NIMBLE automatically uses makeModelDerivsInfo based on the code nimDerivs(model$calculate(<args>), <args>). However, in Method 2, when model$calculate(<args>) is used in a method such as run, then a call to nimDerivs(run(<args>), <args>) requires the model, updateNodes and constantNodes to be provided. Hence, the two functions (run and derivsRun) must be written in coordination.
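As a minimal, hedged sketch of that coordination in Method 2 (names other than nimDerivs, makeModelDerivsInfo, and values are illustrative):

derivsNimbleFunction <- nimbleFunction(
  setup = function(model, wrt_nodes, calc_nodes) {
    # Query the model for the nodes the tape will need.
    derivsInfo <- makeModelDerivsInfo(model, wrt_nodes, calc_nodes)
    updateNodes <- derivsInfo$updateNodes
    constantNodes <- derivsInfo$constantNodes
  },
  run = function(x = double(1)) {
    values(model, wrt_nodes) <<- x         # put inputs into the model
    return(model$calculate(calc_nodes))    # model calculations to be taped
    returnType(double())
  },
  methods = list(
    derivsRun = function(x = double(1)) {
      # The model, updateNodes and constantNodes must be provided here.
      ans <- nimDerivs(run(x), wrt = 1:length(x), order = 0:2, model = model,
                       updateNodes = updateNodes, constantNodes = constantNodes)
      return(ans)
      returnType(ADNimbleList())
    }
  ),
  buildDerivs = 'run'
)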

- 16.7.2.2 When do you need to reset a tape with model calculations?
+ 16.6.2.2 When do you need to reset a tape with model calculations?

    The additional rule for when tapes need to be reset based on model calculations is:

    • when values of any constantNodes are changed. Usually these will only be data nodes.
@@ -1322,8 +1169,8 @@ 16.7.2.2 When do you need to reset a tape with model calculations?

- 16.8 Parameter transformations
+ 16.7 Parameter transformations

    Many algorithms that in some way explore a parameter space are best used in an unconstrained parameter space. For example, R’s optim provides several optimization methods, but only one (L-BFGS-B) allows constraints on the parameter space. Similarly, HMC is implemented to work in an unconstrained parameter space.

    NIMBLE provides a nimbleFunction to automatically create a transformation from original parameters to unconstrained parameters and the inverse transformation back to the original parameters. Denoting x as an original parameter and g(x) as the transformed parameter, the cases handled include:

      @@ -446,33 +456,25 @@
    • V Automatic Derivatives in NIMBLE
    • 16 Automatic Derivatives
    • 17 Example: maximum likelihood estimation using optim with gradients from nimDerivs.
    • References
@@ -495,7 +497,13 @@

      Chapter 8 Particle Filters, PMCMC, MCEM, Laplace approximation and quadrature

- The NIMBLE algorithm library includes a suite of sequential Monte Carlo (particle filtering) algorithms (particle filters, PMCMC, iterated particle filter, and ensemble Kalman filter), Monte Carlo expectation maximization (MCEM) for maximum likelihood estimation, Laplace approximation and adaptive Gauss-Hermite quadrature, and k-fold cross-validation.
+ The NIMBLE algorithm library includes a growing set of non-MCMC algorithms, including likelihood-based algorithms:
+ • A suite of sequential Monte Carlo (particle filtering) algorithms (particle filters, particle MCMC, iterated particle filter, and ensemble Kalman filter);
+ • Monte Carlo expectation maximization (MCEM) for maximum likelihood estimation;
+ • Laplace approximation and adaptive Gauss-Hermite quadrature for maximum likelihood estimation of the Laplace/AGHQ-approximated marginal likelihood; and
+ • nested approximation of Bayesian models using INLA-like methods.

      8.1 Particle filters / sequential Monte Carlo and iterated filtering

      As of Version 0.10.0 of NIMBLE, all of NIMBLE’s sequential Monte Carlo/particle filtering functionality lives in the nimbleSMC package, described in Michaud et al. (2021). Please load this package before trying to use these algorithms.

      @@ -504,168 +512,168 @@

8.1.1 Filtering algorithms

NIMBLE includes algorithms for four different types of sequential Monte Carlo (also known as particle filters), which can be used to sample from the latent states and approximate the log likelihood of a state-space model. These include the bootstrap filter, the auxiliary particle filter, the Liu-West filter, and the ensemble Kalman filter. The iterated filtering version 2 (IF2) is a related method for maximum-likelihood estimation. Each of these is built with the eponymous functions buildBootstrapFilter, buildAuxiliaryFilter, buildLiuWestFilter, buildEnsembleKF, and buildIteratedFilter2. Each method requires setup arguments model and nodes; the latter should be a character vector specifying latent model nodes. In addition, each method can be customized using a control list argument. Details on the control options and specifics of the algorithms can be found in the help pages for the functions.

      Once built, each filter can be run by specifying the number of particles. Each filter has a modelValues object named mvEWSamples that is populated with equally-weighted samples from the posterior distribution of the latent states (and in the case of the Liu-West filter, the posterior distribution of the top level parameters as well) as the filter is run. The bootstrap, auxiliary, and Liu-West filters, as well as the IF2 method, also have another modelValues object, mvWSamples. This has unequally-weighted samples from the posterior distribution of the latent states, along with weights for each particle. In addition, the bootstrap and auxiliary particle filters return estimates of the log-likelihood of the given state-space model.

      We first create a linear state-space model to use as an example for our particle filter algorithms.

# Building a simple linear state-space model. 
# x is latent space, y is observed data
timeModelCode <- nimbleCode({
  x[1] ~ dnorm(mu_0, 1)
  y[1] ~ dnorm(x[1], 1)
  for(i in 2:t){
    x[i] ~ dnorm(x[i-1] * a + b, 1)
    y[i] ~ dnorm(x[i] * c, 1)
  }

  a ~ dunif(0, 1)
  b ~ dnorm(0, 1)
  c ~ dnorm(1, 1)
  mu_0 ~ dnorm(0, 1)
})

# simulate some data
t <- 25; mu_0 <- 1
x <- rnorm(1, mu_0, 1)
y <- rnorm(1, x, 1)
a <- 0.5; b <- 1; c <- 1
for(i in 2:t){
  x[i] <- rnorm(1, x[i-1] * a + b, 1)
  y[i] <- rnorm(1, x[i] * c, 1)
}

# build the model, passing the data with `data =` (not `data <-`)
rTimeModel <- nimbleModel(timeModelCode, constants = list(t = t), 
                          data = list(y = y), check = FALSE)

# Set parameter values and compile the model
rTimeModel$a <- 0.5
rTimeModel$b <- 1
rTimeModel$c <- 1
rTimeModel$mu_0 <- 1

cTimeModel <- compileNimble(rTimeModel)

      8.1.1.1 Bootstrap filter

      Here is an example of building and running the bootstrap filter.

# Build bootstrap filter
rBootF <- buildBootstrapFilter(rTimeModel, "x", 
                               control = list(thresh = 0.8, saveAll = TRUE, 
                                              smoothing = FALSE))
# Compile filter
cBootF <- compileNimble(rBootF, project = rTimeModel)
# Set number of particles
parNum <- 5000
# Run bootstrap filter, which returns estimate of model log-likelihood
bootLLEst <- cBootF$run(parNum)
# The bootstrap filter can also return an estimate of the effective 
# sample size (ESS) at each time point
bootESS <- cBootF$returnESS()

      8.1.1.2 Auxiliary particle filter

      Next, we provide an example of building and running the auxiliary particle filter. Note that a filter cannot be built on a model that already has a filter specialized to it, so we create a new copy of our state space model first.

# Copy our state-space model for use with the auxiliary filter
auxTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(auxTimeModel)
# Build auxiliary filter
rAuxF <- buildAuxiliaryFilter(auxTimeModel, "x", 
                              control = list(thresh = 0.5, saveAll = TRUE))
# Compile filter
cAuxF <- compileNimble(rAuxF, project = auxTimeModel)
# Run auxiliary filter, which returns estimate of model log-likelihood
auxLLEst <- cAuxF$run(parNum)
# The auxiliary filter can also return an estimate of the effective 
# sample size (ESS) at each time point
auxESS <- cAuxF$returnESS()

      8.1.1.3 Liu and West filter

Now we give an example of building and running the Liu and West filter, which can sample from the posterior distribution of top-level parameters as well as latent states. Note that the Liu-West filter often performs poorly and is provided primarily for didactic purposes. The Liu and West filter accepts an additional params argument, specifying the top-level parameters to be sampled.

# Copy model
LWTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(LWTimeModel)
# Build Liu-West filter, also 
# specifying which top level parameters to estimate
rLWF <- buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"),
                           control = list(saveAll = FALSE))

## Warning in buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"), :
## The Liu-West filter ofen performs poorly and is provided primarily for didactic
## purposes.

# Compile filter
cLWF <- compileNimble(rLWF, project = LWTimeModel)
# Run Liu-West filter
cLWF$run(parNum)

      8.1.1.4 Ensemble Kalman filter

      Next we give an example of building and running the ensemble Kalman filter, which can sample from the posterior distribution of latent states.

# Copy model
ENKFTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(ENKFTimeModel)
# Build and compile ensemble Kalman filter
rENKF <- buildEnsembleKF(ENKFTimeModel, "x",
                         control = list(saveAll = FALSE))
cENKF <- compileNimble(rENKF, project = ENKFTimeModel)
# Run ensemble Kalman filter
cENKF$run(parNum)

      Once each filter has been run, we can extract samples from the posterior distribution of our latent states as follows:

# Equally-weighted samples (available from all filters)
bootEWSamp <- as.matrix(cBootF$mvEWSamples) # alternative: as.list
auxEWSamp <- as.matrix(cAuxF$mvEWSamples)
LWFEWSamp <- as.matrix(cLWF$mvEWSamples)
ENKFEWSamp <- as.matrix(cENKF$mvEWSamples)

# Unequally-weighted samples, along with weights (available 
# from bootstrap, auxiliary, and Liu and West filters)
bootWSamp <- as.matrix(cBootF$mvWSamples, "x")
bootWts <- as.matrix(cBootF$mvWSamples, "wts")
auxWSamp <- as.matrix(cAuxF$mvWSamples, "x")
auxWts <- as.matrix(cAuxF$mvWSamples, "wts")

# Liu and West filter also returns samples 
# from posterior distribution of top-level parameters:
aEWSamp <- as.matrix(cLWF$mvEWSamples, "a")

      8.1.1.5 Iterated filtering 2 (IF2)

      The IF2 method (Ionides et al. 2015) accomplishes maximum likelihood estimation using a scheme wherein both latent states and parameters are represented by particles that are weighted and resampled during the iterations. Iterations include perturbations to the parameter particles following a schedule of decreasing magnitude to yield convergence to the MLE.

      Here we apply IF2 to Nile River flow data, specifying a changepoint in the year the Aswan Dam was constructed, as the dam altered river flows.

library(FKF)

flowCode <- nimbleCode({
    for(t in 1:n)
        y[t] ~ dnorm(x[t], sd = sigmaMeasurements)
    x[1] ~ dnorm(x0, sd = sigmaInnovations)
    for(t in 2:n)
        x[t] ~ dnorm((t-1==28)*meanShift1899 + x[t-1], sd = sigmaInnovations)
    logSigmaInnovations ~ dnorm(0, sd = 100)
    logSigmaMeasurements ~ dnorm(0, sd = 100)
    sigmaInnovations <- exp(logSigmaInnovations)
    sigmaMeasurements <- exp(logSigmaMeasurements)
    x0 ~ dnorm(1120, var = 100)
    meanShift1899 ~ dnorm(0, sd = 100)
})

flowModel <- nimbleModel(flowCode, data = list(y = Nile),
                 constants = list(n = length(Nile)),
                 inits = list(logSigmaInnovations = log(sd(Nile)),
                              logSigmaMeasurements = log(sd(Nile)),
                              meanShift1899 = -100))

      Note that the prior distributions for the parameters are not used by IF2, except possibly to obtain boundaries of valid parameter values (not the case here).

      Now we build the filter, specifying user-controlled standard deviations (in this case the same as the perturbation sigma values) for use in generating the initial particles for the parameters via the control list.

filter <- buildIteratedFilter2(model = flowModel,
                           nodes = 'x',
                           params = c('logSigmaInnovations',
                                      'logSigmaMeasurements',
                                      'meanShift1899'),
                           baselineNode = 'x0',
                           control = list(sigma = c(0.1, 0.1, 5),
                                          initParamSigma = c(0.1, 0.1, 5)))
cFlowModel <- compileNimble(flowModel)
cFilter  <- compileNimble(filter, project = flowModel)

      We now run the algorithm with 1000 particles for 100 iterations with the schedule parameter equal to 0.2.

In addition to the estimates, we can extract the log-likelihood values and the estimates and standard deviations of the parameter particles as they evolve over the iterations, in order to assess convergence.

set.seed(1)
est <- cFilter$run(m = 1000, niter = 100, alpha = 0.2)

cFilter$estimates[95:100,] ## Last 6 iterations of parameter values
##              [,1]     [,2]      [,3]
## [1,]  0.013306153 4.822613 -269.7837
## [2,] -0.013961938 4.857470 -271.4965
@@ -673,22 +681,22 @@ 8.1.1.5 Iterated filtering 2 (IF2)
## [4,]  0.015598044 4.851961 -272.3889
## [5,] -0.005571944 4.842719 -272.6390
## [6,]  0.042982317 4.837347 -271.1800

cFilter$logLik[90:100] ## Last 11 iterations of log likelihood values

##  [1] -627.0497 -626.9570 -626.9856 -626.9954 -626.7448 -626.7521 -626.8472
##  [8] -626.7398 -626.9869 -627.1354 -626.6648

Comparing to use of the Kalman filter from the FKF package, we see the log-likelihood is fairly similar:

dtpred <- matrix(0, ncol = length(Nile))
dtpred[28] <- est[3]
ct <- matrix(0)
Zt <- Tt <- matrix(1)

fkfResult <- fkf(HHt = matrix(exp(2*est[1])),
                 GGt = matrix(exp(2*est[2])),
                 yt = rbind(Nile),
                 a0 = 1120,
                 P0 = matrix(100),
                 dt = dtpred, ct = ct, Zt = Zt, Tt = Tt)
fkfResult$logLik
      ## [1] -626.5521

@@ -723,12 +731,12 @@

8.1.2 Particle MCMC (PMCMC)

The timeModel in the previous section will serve as an example for the use of our PMCMC sampler. Here we use the identity matrix as our proposal covariance matrix.

timeConf <- configureMCMC(rTimeModel, nodes = NULL) # empty MCMC configuration

# Add random walk PMCMC sampler with particle number optimization.
timeConf$addSampler(target = c("a", "b", "c", "mu_0"), type = "RW_PF_block",
                    control = list(propCov = diag(4), adaptScaleOnly = FALSE,
                                 latents = "x", pfOptimizeNparticles = TRUE))

The type = "RW_PF" and type = "RW_PF_block" samplers default to using a bootstrap filter. The adaptation control parameters adaptive, adaptInterval, and
@@ -807,20 +815,20 @@

8.2 Monte Carlo Expectation Maximization (MCEM)

      We will revisit the pump example to illustrate the use of NIMBLE’s MCEM algorithm.

pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
                    data = pumpData, inits = pumpInits, check = FALSE,
                    buildDerivs = TRUE)

Cpump <- compileNimble(pump)

# build an MCEM algorithm with ascent-based convergence criterion
pumpMCEM <- buildMCEM(model = pump,
                      latentNodes = "theta")
# The latentNodes="theta" would be determined by default but is shown for clarity.
CpumpMCEM <- compileNimble(pumpMCEM, project = pump)

      The first argument to buildMCEM, model, is a NIMBLE model. At the moment, the model provided cannot be part of another MCMC sampler.

pumpMLE <- CpumpMCEM$findMLE(initM = 1000)

##   [Note] Iteration Number: 1.
##   [Note] Current number of MCMC iterations: 1000.
##   [Note] Parameter Estimates: 
@@ -864,7 +872,7 @@ 8.2 Monte Carlo Expectation Maximization (MCEM)
## 0.829192
## 1.28844
##   [Note] Convergence Criterion: 0.000914589.

pumpMLE$par

## [1] 0.8291919 1.2884366

For this model, we can check the result by analytically integrating over the latent nodes (possible for this model but often not feasible), which gives
@@ -883,15 +891,16 @@

8.2.1 Estimating the asymptotic covariance

The vcov method estimates the asymptotic covariance of the MLE, using the method of Louis (1982). Arguments to this method allow control over whether to generate a new MCMC sample for this purpose and other details.

      Continuing the above example, here is the covariance matrix:

pumpCov <- CpumpMCEM$vcov(pumpMLE$par)
pumpCov
      ##           [,1]      [,2]
       ## [1,] 0.1295558 0.2263726
       ## [2,] 0.2263726 0.6758945

8.3 Laplace approximation and adaptive Gauss-Hermite quadrature

As of NIMBLE version 1.4.0, NIMBLE’s Laplace and AGHQ approximations live in the nimbleQuad package. Please load that package before trying to use these algorithms.

Many hierarchical models include continuous random effects that must be integrated over to obtain the (marginal) likelihood of the parameters given the data. Laplace approximation and adaptive Gauss-Hermite quadrature (AGHQ) are
@@ -899,15 +908,560 @@ 8.3 Laplace approximation and adaptive Gauss-Hermite quadrature
a single quadrature point (the conditional mode of the random effects).

NIMBLE provides these algorithms via buildLaplace and buildAGHQuad (the former simply calls the latter), which take advantage of the automatic differentiation features introduced in version 1.0.0.
- Laplace approximation is introduced in section 16.2, while details for both can be found by help(buildLaplace).
+ Next we will show how to use nimble’s Laplace approximation, which uses derivatives internally, to get maximum (approximate) likelihood estimates for a GLMM model. Additional details on Laplace and on AGHQ can be found by running help(buildLaplace).

      Laplace approximation is equivalent to first-order adaptive Gauss-Hermite quadrature, which is also available (via buildAGHQ and runAGHQ), although here we will focus on Laplace approximation only. In this context, Laplace approximation approximates the integral over continuous random effects needed to calculate the likelihood. Hence, it gives an approximate likelihood (often quite accurate) that can be used for maximum likelihood estimation. Note that the Laplace approximation uses second derivatives, and the gradient of the Laplace approximation (used for finding the MLE efficiently) uses third derivatives. These are described in detail by Skaug and Fournier (2006) and Fournier et al. (2012).
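For reference, a standard statement of the approximation (with \(d\)-dimensional random effects \(b\), parameters \(\theta\), and data \(y\)) is:

\[
\int p(y \mid b, \theta)\, p(b \mid \theta)\, db \;\approx\; (2\pi)^{d/2}\, \lvert -H(\hat{b}) \rvert^{-1/2}\, p(y \mid \hat{b}, \theta)\, p(\hat{b} \mid \theta),
\]

where \(\hat{b}\) maximizes \(\log p(y \mid b, \theta) + \log p(b \mid \theta)\) over \(b\) for fixed \(\theta\) (the “inner” optimization) and \(H(\hat{b})\) is the Hessian of that log joint density at \(\hat{b}\).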

8.3.1 GLMM example

We’ll re-introduce the simple Poisson generalized linear mixed model (GLMM) example from section 7.11.2.1 and use Laplace approximation on it. There will be 10 groups (i) of 5 observations (j) each. Each observation has a covariate, X, and each group has a random effect ran_eff. Here is the model code:

model_code <- nimbleCode({
  # priors 
  intercept ~ dnorm(0, sd = 100)
  beta ~ dnorm(0, sd = 100)
  sigma ~ dhalfflat()
  # random effects and data  
  for(i in 1:10) {
    # random effects
    ran_eff[i] ~ dnorm(0, sd = sigma)
    for(j in 1:5) {
      # data
      y[i,j] ~ dpois(exp(intercept + beta*X[i,j] + ran_eff[i]))
    }
  }
})

      Note that we changed the prior on sigma to avoid having an upper bound. Prior distributions are not included in maximum likelihood using the Laplace approximation but do indicate the range of valid values. We recommend caution in using priors for variance component parameters (standard deviations, variances, precisions) that have a finite upper bound (e.g., sigma ~ dunif(0, 100)), because the probit transformation applied in that case may result in poor optimization performance.

We’ll simulate some values for X.

set.seed(123)
X <- matrix(rnorm(50), nrow = 10)

Next, we build the model, including buildDerivs=TRUE, which is needed to use derivatives with a model. Internally, Laplace/AGHQ approximation uses derivatives from NIMBLE’s automatic differentiation (AD) system to build the approximation to the marginal likelihood.

model <- nimbleModel(model_code, constants = list(X = X), calculate = FALSE,
                     buildDerivs = TRUE) # Here is the argument needed for AD.

As preparation for the Laplace examples below, we need to finish setting up the GLMM. We could have provided data in the call to nimbleModel, but instead we will simulate it using the model itself. Specifically, we will set parameter values, simulate data values, and then set those as the data to use.

model$intercept <- 0
model$beta <- 0.2
model$sigma <- 0.5
model$calculate() # This will return NA because the model is not fully initialized.

## [1] NA

model$simulate(model$getDependencies('ran_eff'))
model$calculate() # Now the model is fully initialized: all nodes have valid values.

## [1] -78.44085

model$setData('y') # Now the model has y marked as data, with values from simulation.

Finally, we will make a compiled version of the model.

Cmodel <- compileNimble(model)

      8.3.2 Using Laplace approximation

To create a Laplace approximation specialized to the parameters of interest for this model, we use the nimbleFunction buildLaplace. For many models, the setup code in buildLaplace will automatically determine the random effects to be integrated over and the associated nodes to calculate. In fact, if you omit the parameter nodes, it will assume that all top-level nodes in the model should be treated as parameters. If fine-grained control is needed, these various sets of nodes can be input directly into buildLaplace. To see what default handling of nodes is being done for your model, use setupMargNodes with the same node inputs as buildLaplace, as sketched after the next code block.

glmm_laplace <- buildLaplace(model, paramNodes = c('intercept','beta','sigma'))
Cglmm_laplace <- compileNimble(glmm_laplace, project = model)
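As mentioned above, setupMargNodes can show the default node handling. A minimal sketch with the same inputs (the list element names shown are what we expect; see help(setupMargNodes)):

margNodes <- setupMargNodes(model, paramNodes = c('intercept','beta','sigma'))
margNodes$randomEffectsNodes  # the nodes to be integrated over
margNodes$calcNodes           # the nodes calculated as part of the approximation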

      With the compiled Laplace approximation, we can now find the MLE and related information such as standard errors.

results <- runLaplace(Cglmm_laplace)
results$summary

## $params
##             estimate  stdError
## intercept -0.1491944 0.2464880
## beta       0.1935212 0.1467230
## sigma      0.5703362 0.2066517
## 
## $randomEffects
##                estimate  stdError
## ran_eff[1]  -0.33711373 0.4305831
## ran_eff[2]  -0.02964535 0.3987838
## ran_eff[3]   0.40575212 0.3858675
## ran_eff[4]   1.04768889 0.3779772
## ran_eff[5]  -0.36731650 0.4290568
## ran_eff[6]   0.26907207 0.3863272
## ran_eff[7]  -0.54950702 0.4654196
## ran_eff[8]  -0.11864461 0.4175452
## ran_eff[9]   0.10006643 0.3926128
## ran_eff[10] -0.04411292 0.3971147
## 
## $vcov
##              intercept         beta        sigma
## intercept  0.060756345 -0.002691117 -0.014082707
## beta      -0.002691117  0.021527641 -0.005098532
## sigma     -0.014082707 -0.005098532  0.042704916
## 
## $logLik
## [1] -63.44875
## 
## $df
## [1] 3
## 
## $originalScale
## [1] TRUE

      One of output elements is the maximized log likelihood (logLik), which is useful for model comparison.
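For instance, an information criterion such as AIC can be computed by hand from these outputs (a small illustration using the values above):

# AIC = -2 * logLik + 2 * df
-2 * results$summary$logLik + 2 * results$summary$df
# = 132.8975, matching the AIC reported by glmmTMB below.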


      buildLaplace actually offers several choices in how computations are done, differing in how they use double taping for derivatives or not. In some cases one or another choice might be more efficient. See help(buildLaplace) if you want to explore it further.


      Finally, let’s confirm that it worked by comparing to results from package glmmTMB. In this case, nimble’s Laplace approximation is faster than glmmTMB (on the machine used here), but that is not the point of this example. Here our interest is in checking that nimble’s Laplace approximation worked correctly in a case where we have an established tool such as glmmTMB.

library(glmmTMB)
y <- as.numeric(model$y) # Re-arrange inputs for call to glmmTMB
X <- as.numeric(X)
group <- rep(1:10, 5)
data <- as.data.frame(cbind(X, y, group))
tmb_fit <- glmmTMB(y ~ X + (1 | group), family = poisson, data = data)
summary(tmb_fit)

##  Family: poisson  ( log )
## Formula:          y ~ X + (1 | group)
## Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##    132.9    138.6    -63.4    126.9       47 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.3253   0.5703  
## Number of obs: 50, groups:  group, 10
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.1492     0.2465  -0.605    0.545
## X             0.1935     0.1467   1.319    0.187

logLik(tmb_fit)

## 'log Lik.' -63.44875 (df=3)

The results match within numerical tolerance typical of optimization problems. Specifically, the coefficients for (Intercept) and X match nimble’s intercept and beta, the random effects standard deviation for group matches nimble’s sigma, and the standard errors match.


      8.3.3 Using the Laplace approximation methods directly

If one wants finer-grained control over using the approximation, one can use the methods provided by buildLaplace. These include calculating the Laplace approximation for some input parameter values, calculating its gradient, and maximizing the Laplace-approximated likelihood. Here we’ll show some of these steps.

# Get the Laplace approximation for one set of parameter values.
Cglmm_laplace$calcLaplace(c(0, 0, 1))

## [1] -65.57246

Cglmm_laplace$gr_Laplace(c(0, 0, 1)) # Get the corresponding gradient.

## [1] -1.866840  8.001648 -4.059555

MLE <- Cglmm_laplace$findMLE(c(0, 0, 1)) # Find the (approximate) MLE.
MLE$par     # MLE parameter values

## [1] -0.1491983  0.1935269  0.5703413

MLE$value   # MLE log likelihood value

## [1] -63.44875

      The final outputs show the MLE for intercept, beta, and sigma, followed by the maximum (approximate) likelihood.

More information about the MLE can be obtained in two ways. The summary method can give estimated random effects and standard errors as well as the variance-covariance matrix for the parameters and/or the random effects. The summaryLaplace function returns similar information but with names included in a more useful way. Here is some example code:

Cglmm_laplace$summary(MLE)$randomEffects$estimate

##  [1] -0.33711487 -0.02964301  0.40575572  1.04769223 -0.36731868  0.26907375
##  [7] -0.54951160 -0.11864177  0.10006955 -0.04411124

summaryLaplace(Cglmm_laplace, MLE)$params

##             estimate  stdError
## intercept -0.1491983 0.2464895
## beta       0.1935269 0.1467230
## sigma      0.5703413 0.2066531

      8.3.4 Changing the optimization methods

When finding the MLE via Laplace approximation or adaptive Gauss-Hermite quadrature (AGHQ), there are two numerical optimizations: (1) maximizing the joint log-likelihood of random effects and data given a set of parameter values to construct the approximation to the marginal log-likelihood at the given parameter values, and (2) maximizing the approximation to the marginal log-likelihood over the parameter values. Optimization (1) is the “inner” optimization and optimization (2) is the “outer” optimization.

Finding the MLE via Laplace approximation may be sensitive to the optimization methods used, in particular the choice of inner optimizer; the “BFGS” optimizer available through optim() may not perform well for the inner optimization.

As of version 1.3.0, the default choice for both the inner and outer optimization is R’s nlminb optimizer. Users can choose a different optimizer for each of the optimizations.


      To change the inner or outer optimizers, one can use the innerOptimMethod and outerOptimMethod elements of the control list argument to buildLaplace. One can modify various settings that control the behavior of the inner and outer optimizers via control as well. See help(buildLaplace) for more details.
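For instance, a minimal sketch using the control elements just named (the "bobyqa" option is described below):

# Use nlminb for the inner optimization and BOBYQA for the outer optimization.
glmm_laplace2 <- buildLaplace(model, paramNodes = c('intercept','beta','sigma'),
                              control = list(innerOptimMethod = "nlminb",
                                             outerOptimMethod = "bobyqa"))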


      Once a Laplace approximation is built, one can use updateSettings to modify the choices of optimizers and various settings that control the behavior of the inner and outer optimizers (see help(buildLaplace) for details).


      By default, NIMBLE provides various optimization methods available through R’s optim() as well as R’s nlminb method and the BOBYQA method from the nloptr package (by specifying bobyqa). Users can also provide their own optimization function in R that they can then use with Laplace approximation. User optimization functions must have a particular set and order of arguments and must first be registered with NIMBLE via nimOptimMethod. See help(nimOptim) for more details.


      Here’s an example of setting up the Newton method optimizer from the TMB package as the inner optimizer for use with NIMBLE’s Laplace approximation. (Note that NIMBLE and TMB have distinct AD systems and Laplace approximation implementations; here we simply use the TMB::newton optimization function.)

library(TMB)
library(Matrix)

## Create an R wrapper function that has the interface needed for NIMBLE
## and wraps the optimizer of interest.
nimbleTMBnewton <- function(par, fn, gr, he, lower, upper, control, hessian) {
  ## Wrap `he` as the return value needs to be of class `dsCMatrix`.
  he_matrix <- function(p) Matrix(he(p), doDiag = FALSE, sparse = TRUE)
  invalid <- function(x) is.null(x) || is.na(x) || is.infinite(x)
  if(invalid(control$trace)) control$trace <- 1
  if(invalid(control$maxit)) control$maxit <- 100
  if(invalid(control$reltol)) control$reltol <- 1e-8
  res <- newton(par, fn, gr, he_matrix,
                trace = control$trace, maxit = control$maxit, tol = control$reltol)
  ## Additional arguments (e.g., `alpha` and `tol10`) can be hard-coded in the `newton()` call.
  ans <- list(
    ## What is handled in the return is fairly particular, so often needs conversion
    ## from a given method such as TMB::newton.
    par = res$par,
    value = res$value,
    counts = c(0, 0, 0), # Counts of fn/gr/he calls, but these are not counted by TMB::newton.
    evaluations = res$iterations,
    hessian = NULL, # TMB::newton gives a `dsCMatrix` but we need a base R matrix.
    message = "ran by TMB::newton",
    convergence = 0 # TMB::newton does not return a convergence code so give 0 (converged).
  )
  return(ans)
}
## Register the optimizer with NIMBLE.
nimOptimMethod("nimbleTMBnewton", nimbleTMBnewton)

## Use the optimizer for the inner optimization when finding the Laplace MLE.
glmm_laplace <- buildLaplace(model, c('intercept','beta','sigma'),
                  control = list(innerOptimMethod = "nimbleTMBnewton"))

      8.4 Nested approximation methods

NIMBLE version 1.4.0 introduces a nested approximation method that provides approximate posterior inference using methodology similar to the well-known INLA approach (Martins et al. 2013), implemented in the R-INLA package, and to the related methods for extended Gaussian latent models (EGLMs) of Stringer, Brown, and Stafford (2023), implemented in the aghq R package.

Such nested approximations build on Laplace approximation, which provides an approximate marginal posterior for the model (hyper)parameters, integrating (marginalizing) over the latent nodes. Then, instead of maximizing the approximation, one approximates the marginal posterior of the (hyper)parameters on a carefully-chosen set of points. Inference for individual (hyper)parameters is done by numerical approximation, numerical integration, or sampling from an approximation to the marginal posterior. Inference for the latent nodes is done via numerical integration or via sampling from a mixture (over the hyperparameter points) of multivariate normal distributions.

Our implementation in NIMBLE borrows heavily from the INLA and EGLM approaches.

Here we list some of the similarities and differences from INLA and the EGLM approach (aghq package):

• Like EGLM, we use automatic differentiation to calculate the Laplace-approximated marginal likelihood.
• Like EGLM, we take the latent nodes to be only the latent stochastic parameters in the model, without including the additive predictor values as done in INLA.
• For marginal inference on a chosen univariate (hyper)parameter we provide the univariate asymmetric Gaussian approximation used by INLA and also (for increased accuracy at additional computational expense) numerical integration via AGHQ as in EGLM.
• For joint inference on the (hyper)parameters we provide simulation from the joint asymmetric Gaussian approximation as done in INLA.
• For inference on the latent nodes, we provide joint simulation from a multivariate normal mixture over the (hyper)parameter grid points as done in EGLM and also available in INLA. Unlike in INLA, we do not provide univariate latent inference using deterministic nested Laplace approximation. The simulation-based approach may not be as accurate, but it allows for joint inference, including inference on quantities that depend on more than one latent node.
• Unlike either EGLM or INLA, latent nodes are not required to have a joint normal distribution, though accuracy may be lower when the latent nodes have other distributions.
• For latent nodes whose conditional distributions factor into univariate conditionally independent sets, the Laplace approximation is a product of univariate approximations, and one can instead use NIMBLE’s AGHQ approximation for higher accuracy.
• We allow the user to choose the grid used for the (hyper)parameters. By default for \(d>2\) parameters, we use the CCD grid used by INLA, but one can choose to use the AGHQ grid as used in EGLM or provide one’s own grid. In the future we may provide additional grids, such as sparse grids.

      8.4.1 Overview of the methodology

NIMBLE’s nested approximation consists of several pieces, with different options that allow a user to choose how the approximation is done. We briefly describe the pieces here. For simplicity, we refer to the (hyper)parameters simply as “parameters”.

      8.4.1.1 Marginalization over the latent nodes

We approximately marginalize (integrate) over the latent nodes to approximate the marginal joint distribution of the parameters. This uses Laplace approximation and is sometimes referred to as the “inner” marginalization. The Laplace approximation is computed using the gradient and Hessian with respect to the latent nodes for a given set of parameter values.

In the case that the conditional distributions of the latent nodes factor into univariate conditionally-independent sets of nodes (conditional on the parameters), this approximation by default uses the product of univariate Laplace approximations and can be made more accurate in NIMBLE by using AGHQ with more than one quadrature point, as sketched below.
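For example, with the standalone interface of section 8.3, more quadrature points can be requested via the nQuad argument (a sketch; see help(buildAGHQ)):

# Five quadrature points per univariate random effect instead of one.
glmm_aghq <- buildAGHQ(model, nQuad = 5, paramNodes = c('intercept','beta','sigma'))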


      8.4.1.2 Approximating the marginal parameter density on a grid

In order to simulate from the posterior distribution for the latent nodes (or to estimate the marginal likelihood), one needs to evaluate the joint parameter posterior density on a grid of points.

NIMBLE primarily offers the options of using a CCD grid (as used by INLA) or an AGHQ grid (as used in the EGLM approach). While the AGHQ grid is expected to be more accurate, the CCD grid uses fewer points and is therefore less computationally intensive.

For \(d \le 2\), NIMBLE defaults to the AGHQ grid and otherwise uses the CCD grid.

NIMBLE also allows users to provide their own grid.

      8.4.1.3 Joint inference for latent nodes

NIMBLE provides the ability to simulate the latent nodes from a mixture of multivariate Gaussian distributions, mixing over the parameter values at the grid points discussed above. The multivariate Gaussian distribution is based on the Laplace approximation for the latent nodes, which provides the mean and variance conditional on the parameter value at the grid point. The weights in the mixture are based on the Laplace-approximated marginal density (for the parameters and data, jointly), with stratified sampling to reduce variance.
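The following generic sketch (plain R, not the nimble API; all names are illustrative) shows the basic mixture-sampling idea:

# w: normalized weights at the parameter grid points; mu[[k]] and Sigma[[k]]:
# Laplace mean and covariance of the latent nodes given grid point k.
library(MASS)
sample_latent_mixture <- function(n, w, mu, Sigma) {
  ks <- sample(seq_along(w), size = n, replace = TRUE, prob = w) # pick grid points
  t(vapply(ks, function(k) MASS::mvrnorm(1, mu[[k]], Sigma[[k]]),
           numeric(length(mu[[1]]))))                            # draw latent values
}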

INLA adjusts the latent node marginals via a Gaussian copula to correspond with a skew-normal approximation. We have not implemented such an adjustment, so none is done in NIMBLE.

      8.4.1.4 Univariate inference for parameters

NIMBLE offers two approaches for univariate marginal inference on individual parameters. The first is a computationally-efficient, integration-free method that mimics the one used by INLA (Martins et al. 2013). This uses a joint asymmetric Gaussian approximation as the marginal joint posterior of the parameters and allows one to calculate the density at individual evaluation points for the univariate marginal. NIMBLE calculates the univariate density on a fine grid of evaluation points, from which one can build a spline-based approximation to the marginal density that can be used to compute moments and quantiles for univariate inference.
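As a generic illustration (plain R, names illustrative) of how such a density grid supports quantile-based inference:

# Invert a trapezoid-rule cumulative integral of a marginal density
# evaluated on a fine grid x to obtain approximate quantiles.
marginal_quantiles <- function(x, dens, probs = c(0.025, 0.5, 0.975)) {
  dx <- diff(x)
  cdf <- c(0, cumsum(dx * (head(dens, -1) + tail(dens, -1)) / 2))
  cdf <- cdf / max(cdf)                             # normalize to [0, 1]
  approx(cdf, x, xout = probs, ties = "ordered")$y  # invert the CDF
}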

The second, more accurate, approach is to use \(d-1\)-dimensional AGHQ to integrate over the Laplace-approximated joint parameter marginal distribution. This can greatly improve accuracy, but it is computationally expensive unless \(d\) is quite small (e.g., 2-4), and it can also be expensive when the number of latent node elements is quite large (which makes the Laplace approximation itself expensive). Because of the expense, we only take this approach if requested by the user, and we allow the user to choose the specific parameter(s) for which they want to estimate marginals with this approach.

      8.4.1.5 Joint inference for parameters

If one is interested in joint inference for the parameters (or inference on functions of more than one parameter), NIMBLE provides the ability to simulate from the asymmetric multivariate Gaussian approximation to the marginal joint distribution of the parameters that is used by INLA (Martins et al. 2013). The approximation is done after transforming the parameters to be unconstrained (this can sometimes reduce the dimension of the parameter vector, as with the Dirichlet, Wishart, and LKJ distributions). In the transformed space, a two-piece split normal distribution (which allows for skew in both directions) is used for each univariate component after rotating the transformed parameters based on the Hessian at the maximum to account for the correlation structure.
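For reference, the two-piece split normal used for each rotated component has the standard density (location \(\mu\), with scales \(\sigma_1\) and \(\sigma_2\) on the two sides):

\[
f(x) = \frac{2}{\sqrt{2\pi}\,(\sigma_1 + \sigma_2)}
\begin{cases}
\exp\!\left( -\frac{(x-\mu)^2}{2\sigma_1^2} \right), & x \le \mu, \\
\exp\!\left( -\frac{(x-\mu)^2}{2\sigma_2^2} \right), & x > \mu.
\end{cases}
\]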


As is also done in INLA, we use a Gaussian copula to transform the simulated parameter values, in a univariate fashion, to match the univariate marginal distributions estimated by either the integration-free or the AGHQ approach.
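Schematically, the copula step works as follows (a toy R illustration with stand-in distributions rather than NIMBLE's internal quantities):

set.seed(1)
z <- rnorm(1000)   # draws whose marginal is the (approximate) Gaussian marginal
u <- pnorm(z)      # map to uniforms via the approximate marginal CDF
adjusted <- qgamma(u, shape = 2, rate = 1)  # push through the improved marginal's quantile function
# the dependence (rank correlation) of z is preserved, while the univariate
# marginal of 'adjusted' now matches the improved estimate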


      8.4.1.6 Approximate marginal likelihood


      The marginal likelihood for a model integrates over all parameters (including both ‘parameters’ discussed above and latent nodes). This can be useful for model selection, but is not well-defined if any prior distributions are improper and is not reliable for diffuse prior distributions.


      In NIMBLE’s nested approximation, the marginal likelihood is approximated either based on the approximate Gaussian distribution for the parameters used by INLA or via AGHQ using the AGHQ grid weights and associated Laplace-approximated parameter density values.
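As a toy one-dimensional illustration of the AGHQ idea (not NIMBLE's internal code), the following sketch approximates the integral of an unnormalized Gamma(3, 2) kernel, working on the log scale as NIMBLE does after transforming parameters to be unconstrained:

f <- function(theta) theta^2 * exp(-2 * theta)  # unnormalized kernel; exact integral is gamma(3)/2^3 = 0.25
g <- function(gam) f(exp(gam)) * exp(gam)       # integrand on gam = log(theta), with Jacobian

# adapt: mode and curvature of log g(gam) = 3*gam - 2*exp(gam)
mode  <- log(1.5)                    # solves 3 - 2*exp(gam) = 0
sigma <- sqrt(1 / (2 * exp(mode)))   # 1/sqrt(-Hessian at the mode)

# 3-point Gauss-Hermite rule (weight function exp(-x^2))
x <- c(-1, 0, 1) * sqrt(3/2)
w <- c(1, 4, 1) * sqrt(pi) / 6

# AGHQ: substitute gam = mode + sqrt(2)*sigma*x in the integral of g
sqrt(2) * sigma * sum(w * exp(x^2) * g(mode + sqrt(2) * sigma * x))
# gives approximately 0.243, versus the exact 0.25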


      8.4.1.7 Determining latent nodes and parameters


      By default, NIMBLE’s nested approximation will try to automatically determine the node sets, selecting random effects and regression fixed effects (coefficients) as the “latent nodes” and other unknown quantities (e.g., variance components and parameters of random effects distributions) as “parameters”.


Given the computations involved, it is best to keep the number of parameters relatively small, such as fewer than 10.


Note that the treatment of fixed effects differs between the nested approximation and NIMBLE’s Laplace approximation. In Laplace approximation, fixed effects are treated as parameters of interest and maximized over, rather than being marginalized over. In the nested approximation, they are marginalized over, and inference on them can be based on simulated values. One advantage of grouping fixed effects with random effects is that one can make simulation-based inference on quantities that depend on both sets of effects, such as regression-style linear predictors.


It is also possible to choose which nodes go in which set (Section 8.4.2.2). One might choose to include fixed effects in the parameter set, akin to how Laplace approximation works. Another situation where one might configure this manually is if the random effects in a model are specified in a “non-centered” parameterization such as:


\[
\eta_i = \mu + \sigma b_i \\
b_i \sim \mathcal{N}(0, 1)
\]


      In this parameterization the random effects, \(b_i\), do not depend on any parameters because \(\mu\) and \(\sigma\) are involved directly in the linear predictor, \(\eta_i\). This stands in contrast to the common centered parameterization, with \(b_i \sim \mathcal{N}(\mu, \sigma)\). Because each \(b_i\) does not depend on any parameters, NIMBLE’s nested approximation node determination algorithm would by default put the \(b_i\) nodes into the parameter set, which in general would result in very slow computation (because there would be a large number of parameters).
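Schematically, the two parameterizations look like this as nimbleCode fragments (not complete models):

# centered: each b[i] depends on the parameters mu and sigma
b[i] ~ dnorm(mu, sd = sigma)

# non-centered: b[i] depends on no parameters; mu and sigma
# enter the linear predictor directly
b[i] ~ dnorm(0, sd = 1)
eta[i] <- mu + sigma * b[i]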


One further note: it can be computationally advantageous to have fixed effects in the parameter set. This can sometimes allow NIMBLE to set up a product of lower-dimensional Laplace approximations instead of one higher-dimensional Laplace approximation. However, this benefit trades off against the higher dimensionality of the parameter vector, which increases computation. Experimenting with this trade-off in a particular modeling context can be worthwhile.


      8.4.1.8 Computational bottlenecks


      To sample the latent nodes, the algorithm needs to find the weights and the parameters of the multivariate normal approximation to the latent nodes at each point in the grid of parameters. For a large number of parameters this can be expensive, because Laplace approximation is done at many grid points. For a large number of latent node elements, even a single Laplace approximation at a single set of parameter values can be expensive. Computing the parameters of this approximation involves optimization over a space whose dimension is the number of latent node elements, as well as computation of the Hessian at the maximum. Furthermore, for inference, one then needs to simulate from the high-dimensional multivariate normal.


Approximating the univariate marginals for the parameters via AGHQ requires \((d-1)\)-dimensional AGHQ, which can be expensive because the number of quadrature points grows as \(k^{d-1}\), where \(k\) is the number of Gauss-Hermite quadrature points in one dimension. Often \(k\) is chosen to be small, such as 3 or 5, but even then the computation increases rapidly with \(d\).


      8.4.2 Using NIMBLE’s nested approximation


      Next we’ll give example usage of NIMBLE’s nested approximation. Further details are available in the usual R help content for the various functions.


The core functions are buildNestedApprox and runNestedApprox. These are analogous to the “build” and “run” functions for MCMC and for Laplace.

• buildNestedApprox sets up the approximation for a model of interest and allows the user to control various aspects of how the approximation is done. It returns an uncompiled nested approximation algorithm.
• runNestedApprox runs the basic steps of the approximation, giving initial univariate marginal parameter inference (and optionally allowing the user to request latent node or parameter samples). It returns a nested approximation (nestedApprox) object.
• Various additional functions can be applied to the nested approximation object to carry out further components of the approximation, including:
  • improveParamMarginals: use AGHQ to improve marginal parameter inference relative to the default inference based on the asymmetric Gaussian approximation.
  • sampleLatents: sample from the approximate joint distribution of the latent nodes (the mixture, over parameter grid points, of multivariate normal distributions).
  • sampleParams: sample from the asymmetric Gaussian approximation for the parameters.
  • calcMarginalLogLikImproved: use AGHQ to improve estimation of the marginal likelihood relative to the default estimate based on the asymmetric Gaussian approximation.
  • {d,e,q,r}marginal and plotMarginal: estimate the density (d), expectations (e), or quantiles (q) for a univariate marginal, sample from the marginal (r), and plot the marginal density.

      8.4.2.1 Example use


      We’ll use the penicillin example from the faraway package, which has data on penicillin production as a function of treatment (four levels) and blend (five levels). The treatment is considered as a fixed effect (note the constant, large variance/small precision for b[i]) while the blend/block is considered as a random effect. Note that because of the normal likelihood and normal priors for the latent nodes, the conditional distribution for the latent nodes given the data and hyperparameters is a multivariate normal, so the Laplace approximation in this case is exact. In many uses of nested approximation, the likelihood is not normal, but the latent node distribution is.

data(penicillin, package="faraway")

code <- nimbleCode({
    for(i in 1:n) {
        mu[i] <- inprod(b[1:nTreat], x[i, 1:nTreat]) + re[blend[i]]
        y[i] ~ dnorm(mu[i], tau = Tau)
    }
    Tau ~ dgamma(1, 5e-05)
    Tau_re ~ dgamma(1, 5e-05)
    for( i in 1:nTreat ){ b[i] ~ dnorm(0, tau = 0.001) }
    for( i in 1:nBlend ){ re[i] ~ dnorm(0, tau = Tau_re) }
})
X <- model.matrix(~treat, data = penicillin)
data <- list(y = penicillin$yield)
constants <- list(nTreat = 4, nBlend = 5, n = nrow(penicillin),
                  x = X, blend = as.numeric(penicillin$blend))
inits <- list(Tau = 1, Tau_re = 1, b = c(mean(data$y), rep(0,3)), re = rep(0,5))

model <- nimbleModel(code, data = data, constants = constants,
                     inits = inits, buildDerivs = TRUE)
comp_model <- compileNimble(model)

      The prior distributions for the precision parameters, Tau and Tau_re, are chosen to match INLA’s default priors, and for a real analysis, more careful consideration of these would be worthwhile.


      Given a NIMBLE model, we can build (and compile) our nested approximation.

library(nimbleQuad)

## 
## Attaching package: 'nimbleQuad'

## The following objects are masked from 'package:nimble':
## 
##     buildAGHQ, buildLaplace, runAGHQ, runLaplace, summaryAGHQ,
##     summaryLaplace

approx <- buildNestedApprox(model = model)

## Building nested posterior approximation for the following node sets:
##   - parameter nodes: Tau, Tau_re
##   - latent nodes: b (4 elements), re (5 elements)
##   with AGHQ grid for the parameters and Laplace approximation for the latent nodes.

## Building AGHQ/Laplace approximation.

comp_approx <- compileNimble(approx, project = model)

## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.

      Note that NIMBLE has automatically put the two variance components into the parameters and the fixed and random effects into the latent nodes. As described below, one can change the node sets if desired by passing paramNodes and/or latentNodes.


Because there are only two parameter nodes, an AGHQ grid for the parameters is used by default.


      Next we run our nested approximation, getting back initial inference on the parameters.

# To prefer fixed rather than scientific notation for easier viewing.
options(scipen = 2)  
result <- runNestedApprox(comp_approx)

For small models this initial step generally won’t take long. Even for large models or a large number of parameters, this step will generally be faster than the steps shown below, because the asymmetric Gaussian approximation for the parameters can be computed quickly: it requires only maximization and computation of the Hessian for the Laplace approximation (which could take some time when there is a large number of latent nodes), followed by some simple calculations.


      Note that given the vague priors for the parameters used here, the marginal likelihood is probably not useful.


Now, let’s refine our estimation of the parameters by using AGHQ in place of the asymmetric Gaussian approximation. This does \((d-1)\)-dimensional AGHQ (here simply 1-D AGHQ) for each requested parameter (by default, all of the parameters).

result$improveParamMarginals(nodes = 'Tau_re')

## Calculating inner AGHQ/Laplace approximation at (5) marginal points with 3 quadrature grid points (one dot per grid point): (1)...(2)...(3)...(4)...(5)...

## Model (hyper)parameters: 
##                  mean             sd         2.5%           25%            50%
## Tau        0.03723786     0.01250227   0.01796495    0.02824207     0.03556409
## Tau_re 21390.01971236 19227.69893924 616.12626132 6156.68970969 15964.41127665
##                   75%          97.5%
## Tau        0.04430652     0.06636317
## Tau_re 31747.38298645 69218.91604770
## 
## Marginal log-likelihood (asymmetric Gaussian approximation):  -87.86014 (*)
## (*) Marginal log-likelihood is invalid for improper priors and may not be useful
## for non-informative priors.

      We see that the inference for Tau_re has changed somewhat, in particular the more extreme quantiles. Looking at the values on the standard deviation rather than the precision scale would probably be more interpretable.


There are two quantities that control the accuracy of the AGHQ marginal estimate. The first is the number of quadrature points in the AGHQ numerical integration, where \(k\) (nQuad) is the number of points in one dimension. More points are better for accuracy, but the computational cost grows as \(k^{d-1}\). The marginal density is estimated at each of a set of evaluation points for the parameter (nMarginalGrid points), using AGHQ for each point, after which a spline is fit to estimate the full marginal density.


      By default we use three quadrature points in each dimension and five evaluation points.


      If we increase the number of evaluation points, we see it has some effect on the inference.

result$improveParamMarginals(nodes = 'Tau_re', nMarginalGrid = 9)

## Calculating inner AGHQ/Laplace approximation at (9) marginal points with 3 quadrature grid points (one dot per grid point): (1)...(2)...(3)...(4)...(5)...(6)...(7)...(8)...(9)...

## Model (hyper)parameters: 
##                  mean             sd         2.5%           25%            50%
## Tau        0.03723786     0.01250227   0.01796495    0.02824207     0.03556409
## Tau_re 19769.05921155 19213.05258278 509.02666243 5772.43845958 14049.65725318
##                   75%          97.5%
## Tau        0.04430652     0.06636317
## Tau_re 28069.93565352 69207.14668215
## 
## Marginal log-likelihood (asymmetric Gaussian approximation):  -87.86014 (*)
## (*) Marginal log-likelihood is invalid for improper priors and may not be useful
## for non-informative priors.

      If we increase the accuracy of the AGHQ integration, there is little effect.

result$improveParamMarginals(nodes = 'Tau_re', nMarginalGrid = 9, nQuad = 5)

## Calculating inner AGHQ/Laplace approximation at (9) marginal points with 5 quadrature grid points (one dot per grid point): (1).....(2).....(3).....(4).....(5).....(6).....(7).....(8).....(9).....

## Model (hyper)parameters: 
##                  mean             sd         2.5%           25%            50%
## Tau        0.03723786     0.01250227   0.01796495    0.02824207     0.03556409
## Tau_re 19769.07622740 19213.05380646 509.03628535 5772.45494192 14049.67605621
##                   75%          97.5%
## Tau        0.04430652     0.06636317
## Tau_re 28069.95398229 69207.16486054
## 
## Marginal log-likelihood (asymmetric Gaussian approximation):  -87.86014 (*)
## (*) Marginal log-likelihood is invalid for improper priors and may not be useful
## for non-informative priors.

      We can estimate other expectations or quantiles than those reported and plot a density estimate:

result$qmarginal('Tau_re', quantiles = c(.05, .95))

##      0.05      0.95 
##  1031.073 56887.089

result$plotMarginal('Tau_re')

## Compute the posterior mean of the random effects standard deviation instead of precision.
prec_to_sd <- function(x) 1/sqrt(x)
result$emarginal('Tau_re', prec_to_sd)

## [1] 0.01236354

Next, we turn to inference on the latent nodes. This is straightforward in principle but can be computationally costly because we sample from a mixture of multivariate normal distributions. Doing so requires a Laplace approximation at each of the parameter grid points, which can be expensive when there are many latent nodes or many grid points, and it requires sampling from the multivariate normal distributions, which can also be expensive when there are many latent nodes.

latent_sample <- result$sampleLatents(n = 1000)

## Calculating inner AGHQ/Laplace approximation at 9 outer (parameter) grid points (one dot per point): .........

apply(latent_sample, 2, quantile, c(.025, .25, .5, .75, .975))

##           b[1]       b[2]      b[3]       b[4]          re[1]         re[2]
## 2.5%  79.16658 -5.4108667 -1.899553 -4.1033710 -0.03634116293 -0.0351552171
## 25%   82.10253 -0.9164207  2.948703  0.3491639 -0.00771761336 -0.0076002226
## 50%   83.62043  1.2990246  5.184560  2.3957186  0.00001847631  0.0001110365
## 75%   85.19945  3.5704084  7.551397  4.5236813  0.00783092394  0.0085215979
## 97.5% 88.35162  8.2557081 11.964106  8.9400357  0.03550625878  0.0389319899
##               re[3]         re[4]          re[5]
## 2.5%  -0.0353717878 -0.0329743252 -0.03310659752
## 25%   -0.0080197940 -0.0069093825 -0.00712277576
## 50%   -0.0007879027 -0.0001664026 -0.00007946033
## 75%    0.0066856314  0.0083026301  0.00764221509
## 97.5%  0.0370267660  0.0346506851  0.03374579213

As a side effect, the result object now includes a second estimate of the marginal likelihood, based on evaluating the Laplace approximation on the parameter grid. As noted above, with vague priors the marginal likelihood is probably not useful.


Note that one can optionally return the parameter values corresponding to each latent sample to see the explicit mixture being used, by specifying includeParams=TRUE. In this case, with a parameter grid of nine points (three quadrature points in each of two dimensions), there are nine unique sets of parameter values in the sample from the mixture of multivariate normal distributions.
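A sketch of that usage (assuming, for illustration, that the parameter values come back as additional columns named after the parameter nodes; check the function's help for the exact return format):

latent_sample2 <- result$sampleLatents(n = 1000, includeParams = TRUE)
# count the distinct parameter configurations appearing in the mixture
nrow(unique(latent_sample2[, c('Tau', 'Tau_re')]))   # expect 9, one per grid point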


      Finally, if one wants joint inference for multiple parameters (or a function of more than one parameter), one can sample from the approximate Gaussian approximation, with a copula-based adjustment so the univariate marginals of the sample match the (current) marginal estimates. This adjustment will use the initial estimate, if improveParamMarginals has not been run, or the improved marginal estimates for whichever parameters improveParamMarginals has been run for (only Tau_re so far in this example).


      We’ll use the sample to compute a simple functional of multiple parameters, which is a main use case for having a sample rather than simply using the univariate marginals shown above. In this case, we look to see if the random effects variance exceeds the observation variance.

param_sample <- result$sampleParams(n = 1000)

mean(1/param_sample[,'Tau_re'] > 1/param_sample[,'Tau'])

## [1] 0

      8.4.2.2 Setting latent nodes and parameters


As discussed above, NIMBLE tries to automatically determine sets of latent nodes and parameters, with random and fixed effects generally included in the latent nodes. In some cases this automatic determination gives sets that are not natural choices because of the model structure, and in other cases a user might want to choose the sets themselves.


      Here’s the NIMBLE code for a toy GLMM model for illustration.

code <- nimbleCode({
   for(i in 1:n) {
     y[i] ~ dbern(p[i])
     logit(p[i]) <- beta1 * x[i] + b[i]
     b[i] ~ dnorm(mu, sd = sigma)
   }
   mu ~ dflat()
   sigma ~ dunif(0, 100)
   beta1 ~ dflat()
})
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, 0.5)

model <- nimbleModel(code, constants = list(n = n), data = list(y = y, x = x))

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).

approx <- buildNestedApprox(model)

## Building nested posterior approximation for the following node sets:
##   - parameter nodes: mu, sigma
##   - latent nodes: b (30 elements), beta1
##   with AGHQ grid for the parameters and Laplace approximation for the latent nodes.

## Building AGHQ/Laplace approximation.

We see that NIMBLE has put the fixed effect (beta1) and random effects (b[i]) in the latent nodes and the random effects parameters, mu and sigma, into the parameter set.


      In the model above, the role of the overall intercept is played by the random effects mean. We could instead have centered the random effects on zero and introduced an explicit intercept.

code <- nimbleCode({
   for(i in 1:n) {
     y[i] ~ dbern(p[i])
     logit(p[i]) <- beta0 + beta1 * x[i] + b[i]
     b[i] ~ dnorm(0, sd = sigma)
   }
   mu ~ dflat()
   sigma ~ dunif(0, 100)
   beta0 ~ dflat()
   beta1 ~ dflat()
})
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, 0.5)

model <- nimbleModel(code, constants = list(n = n), data = list(y = y, x = x))

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).

approx <- buildNestedApprox(model)

## Building nested posterior approximation for the following node sets:
##   - parameter nodes: sigma
##   - latent nodes: b (30 elements), beta0, beta1
##   with AGHQ grid for the parameters and Laplace approximation for the latent nodes.

## Building AGHQ/Laplace approximation.

      The split into latent nodes and parameters is equivalent.


In contrast, if we use a non-centered parameterization (which often arises when using HMC, for which it can improve sampling), then NIMBLE tries to put everything into the parameters, and an error occurs.

code <- nimbleCode({
   for(i in 1:n) {
     y[i] ~ dbern(p[i])
     logit(p[i]) <- beta0 + beta1 * x[i] + sigma*b[i]
     b[i] ~ dnorm(0, 1)
   }
   mu ~ dflat()
   sigma ~ dunif(0, 100)
   beta0 ~ dflat()
   beta1 ~ dflat()
})
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, 0.5)

model <- nimbleModel(code, constants = list(n = n), data = list(y = y, x = x))
approx <- buildNestedApprox(model)

## Error in buildNestedApprox(model): No latent nodes detected in model. Check the model structure or provide latent nodes explicitly via `latentNodes`. Note that this can occur in a model with random effects that do not depend on any (hyper)parameters.

Instead, we can manually specify the parameters.

code <- nimbleCode({
   for(i in 1:n) {
     y[i] ~ dbern(p[i])
     logit(p[i]) <- beta0 + beta1 * x[i] + sigma*b[i]
     b[i] ~ dnorm(0, 1)
   }
   mu ~ dflat()
   sigma ~ dunif(0, 100)
   beta0 ~ dflat()
   beta1 ~ dflat()
})
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, 0.5)

model <- nimbleModel(code, constants = list(n = n), data = list(y = y, x = x))
approx <- buildNestedApprox(model, paramNodes = "sigma")

      In some cases it can be computationally advantageous to move fixed effects into the parameter set so that the Laplace approximation can be done as a product of lower-dimensional Laplace approximations. Here’s an example of doing that manually.

code <- nimbleCode({
   for(i in 1:n) {
     y[i] ~ dbern(p[i])
     logit(p[i]) <- beta0 + beta1 * x[i] + sigma*b[i]
     b[i] ~ dnorm(0, 1)
   }
   mu ~ dflat()
   sigma ~ dunif(0, 100)
   beta0 ~ dflat()
   beta1 ~ dflat()
})
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, 0.5)

model <- nimbleModel(code, constants = list(n = n), data = list(y = y, x = x))

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).

approx <- buildNestedApprox(model, paramNodes = c("beta0", "beta1", "sigma"))

## Building nested posterior approximation for the following node sets:
##   - parameter nodes: beta0, beta1, sigma
##   - latent nodes: b (30 elements), mu
##   with CCD grid for the parameters and Laplace approximation for the latent nodes.

## Building individual AGHQ/Laplace approximations (one dot for each): ...............................

Note the messaging indicating that 30 Laplace approximations were built, one for each conditionally independent set of nodes containing a \(\{y_i, b_i\}\) pair. (In other, perhaps more common, situations there may be grouping structure in which a single random effect is associated with multiple observations; each random effect and its associated observations are then conditionally independent of the other random effects and observations given the parameters.)


14.1 The modelValues data structure

      14.1.1 Creating modelValues objects

      Here is a simple example of creating a modelValues object:

pumpModelValues = modelValues(pumpModel, m = 2)
pumpModel$x
      ##  [1]  5  1  5 14  3 19  1  1  4 22
pumpModelValues$x
      ## [[1]]
       ##  [1] NA NA NA NA NA NA NA NA NA NA
       ## 

mvConf = modelValuesConf(vars = c('a', 'b', 'c'), 
                         type = c('double', 'int', 'double'), 
                         size = list(a = 2, b = c(2,2), c = 1) )

customMV = modelValues(mvConf, m = 2)
customMV$a
      ## [[1]]
       ## [1] NA NA
       ## 

      Here is an example where the customMV created above is used as the setup argument for a nimbleFunction, which is then compiled. Its compiled modelValues is then accessed with $.

# simple nimbleFunction that uses a modelValues object
resizeMV <- nimbleFunction(
  setup = function(mv){},
  run = function(k = integer() ){
    resize(mv,k)})

rResize <- resizeMV(customMV)
cResize <- compileNimble(rResize)
cResize$run(5)
cCustomMV <- cResize$mv
# cCustomMV is a compiled modelValues object
cCustomMV[['a']]
      ## [[1]]
       ## [1] NA NA
       ## 

      14.1.2 Accessing contents of modelValues

      The values in a modelValues object can be accessed in several ways from R, and in fewer ways from NIMBLE.

# sets the first row of a to (0, 1).  R only.
customMV[['a']][[1]] <- c(0,1)

# sets the second row of a to (2, 3)
customMV['a', 2] <- c(2,3)

# can access subsets of each row
customMV['a', 2][2] <- 4

# accesses all values of 'a'. Output is a list.  R only.
customMV[['a']]
      ## [[1]]
       ## [1] 0 1
       ## 
       ## [[2]]
       ## [1] 2 4
# sets the first row of b to a matrix with values 1. R only.
customMV[['b']][[1]] <- matrix(1, nrow = 2, ncol = 2)

# sets the second row of b.  R only.
customMV[['b']][[2]] <- matrix(2, nrow = 2, ncol = 2)

# make sure the size of inputs is correct
# customMV['a', 1] <- 1:10
# problem: size of 'a' is 2, not 10!
# will cause problems when compiling nimbleFunction using customMV

Currently, only the syntax customMV["a", 2] works in the NIMBLE language, not customMV[["a"]][[2]].

      We can query and change the number of rows using getsize and resize, respectively. These work in both R and NIMBLE. Note that we don’t specify the variables in this case: all variables in a modelValues object will have the same number of rows.

getsize(customMV)
      ## [1] 2
resize(customMV, 3)
getsize(customMV)
      ## [1] 3
customMV$a
      ## [[1]]
       ## [1] 0 1
       ## 
The columns of the matrix are named by the scalar elements of the variables (e.g., “a[1]”, etc.). The rows of the modelValues will be the rows of the matrix, with any matrices or arrays converted to a vector based on column-major ordering.

as.matrix(customMV, 'a')   # convert 'a'
      ##      a[1] a[2]
       ## [1,]    0    1
       ## [2,]    2    4
       ## [3,]   NA   NA
as.matrix(customMV)        # convert all variables
      ##      a[1] a[2] b[1, 1] b[2, 1] b[1, 2] b[2, 2] c[1]
       ## [1,]    0    1       1       1       1       1   NA
       ## [2,]    2    4       2       2       2       2   NA
The as.list function returns the values as a list with one element per variable, each being an array with an additional index for the iteration (e.g., MCMC iteration when used for MCMC output). By default iteration is the first index, but it can be the last if needed.

as.list(customMV, 'a')    # get only 'a'
      ## $a
       ##      [,1] [,2]
       ## [1,]    0    1
       ## [2,]    2    4
       ## [3,]   NA   NA
as.list(customMV)         # get all variables
      ## $a
       ##      [,1] [,2]
       ## [1,]    0    1

as.list(customMV, 'a', iterationAsLastIndex = TRUE) # put iteration in last index
      ## $a
       ##      [,1] [,2] [,3]
       ## [1,]    0    2   NA
       ## [2,]    1    4   NA

      If a variable is a scalar, using unlist in R to extract all rows as a vector can be useful.

customMV['c', 1] <- 1
customMV['c', 2] <- 2
customMV['c', 3] <- 3
unlist(customMV['c', ])
      ## [1] 1 2 3

      Once we have a modelValues object, we can see the structure of its contents via the varNames and sizes components of the object.

customMV$varNames
      ## [1] "a" "b" "c"
customMV$sizes
      ## $a
       ## [1] 2
       ## 
Because modelValues objects are passed by reference, changes made within either R functions or nimbleFunctions will persist outside of the function. This allows for more efficient computation, as stored values are immediately shared among nimbleFunctions.

alter_a <- function(mv){
  mv['a',1][1] <- 1
}
customMV['a', 1]
      ## [1] 0 1
alter_a(customMV)
customMV['a',1]
      ## [1] 1 1

      However, when you retrieve a variable from a modelValues object, the result is a standard R list, which is subsequently passed by value, as usual in R.
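For example, continuing with customMV from above:

a_copy <- customMV[['a']]    # extraction produces an ordinary R list (a copy)
a_copy[[1]][1] <- 100        # modifies only the copy
customMV['a', 1]             # the modelValues object itself is unchanged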


14.1.2.1 Automating calculation and simulation using modelValues

The functions simNodesMV, calcNodesMV, and getLogProbNodesMV, used below, simulate, calculate, or retrieve log probabilities for model nodes over the rows of a modelValues object; calcNodesMV will save the log probabilities back into the modelValues object if saveLP = TRUE, a run-time argument.

      Here are some examples:

mv <- modelValues(simpleModel)
rSimManyXY <- simNodesMV(simpleModel, nodes = c('x', 'y'), mv = mv)
rCalcManyXDeps <- calcNodesMV(simpleModel, nodes = 'x', mv = mv)
rGetLogProbMany <- getLogProbNodesMV(simpleModel, nodes = 'x', mv = mv)

cSimManyXY <- compileNimble(rSimManyXY, project = simpleModel)
cCalcManyXDeps <- compileNimble(rCalcManyXDeps, project = simpleModel)
cGetLogProbMany <- compileNimble(rGetLogProbMany, project = simpleModel)

cSimManyXY$run(m = 5) # simulating 5 times
cCalcManyXDeps$run(saveLP = TRUE) # calculating
      ## [1] -11.24771 -26.42169 -15.48325 -17.24588 -16.43909
cGetLogProbMany$run() # getting log probabilities
      ## [1] -11.24771 -26.42169 -15.48325 -17.24588 -16.43909
result <- as.matrix(cSimManyXY$mv) # extract simulated values


14.2 The nimbleList data structure

nimbleLists provide a container for storing different types of objects in NIMBLE, similar to the list data structure in R. Before a nimbleList can be created and used, a definition for that nimbleList must be created that provides the names, types, and dimensions of the elements in the nimbleList. nimbleList definitions must be created in R (either in R’s global environment or in setup code), but the nimbleList instances can be created in run code.

      Unlike lists in R, nimbleLists must have the names and types of all list elements provided by a definition before the list can be used. A nimbleList definition can be made by using the nimbleList function in one of two manners. The first manner is to provide the nimbleList function with a series of expressions of the form name = type(nDim), similar to the specification of run-time arguments to nimbleFunctions. The types allowed for a nimbleList are the same as those allowed as run-time arguments to a nimbleFunction, described in Section 11.4. For example, the following line of code creates a nimbleList definition with two elements: x, which is a scalar integer, and Y, which is a matrix of doubles.

exampleNimListDef <- nimbleList(x = integer(0), Y = double(2))

      The second method of creating a nimbleList definition is by providing an R list of nimbleType objects to the nimbleList() function. A nimbleType object can be created using the nimbleType function, which must be provided with three arguments: the name of the element being created, the type of the element being created, and the dim of the element being created. For example, the following code creates a list with two nimbleType objects and uses these objects to create a nimbleList definition.

nimbleListTypes <- list(nimbleType(name = 'x', type = 'integer', dim = 0),
                        nimbleType(name = 'Y', type = 'double', dim = 2))

# this nimbleList definition is identical to the one created above
exampleNimListDef2 <- nimbleList(nimbleListTypes)

      Creating definitions using a list of nimbleTypes can be useful, as it allows for programmatic generation of nimbleList elements.

      Once a nimbleList definition has been created, new instances of nimbleLists can be made from that definition using the new member function. The new function can optionally take initial values for the list elements as arguments. Below, we create a new nimbleList from our exampleNimListDef and specify values for the two elements of our list:

exampleNimList <- exampleNimListDef$new(x = 1, Y = diag(2))

      Once created, nimbleList elements can be accessed using the $ operator, just as with lists in R. For example, the value of the x element of our exampleNimbleList can be set to 7 using

exampleNimList$x <- 7

      nimbleList definitions can be created either in R’s global environment or in setup code of a nimbleFunction. Once a nimbleList definition has been made, new instances of nimbleLists can be created using the new function in R’s global environment, in setup code, or in run code of a nimbleFunction.

      nimbleLists can also be passed as arguments to run code of nimbleFunctions and returned from nimbleFunctions. To use a nimbleList as a run function argument, the name of the nimbleList definition should be provided as the argument type, with a set of parentheses following. To return a nimbleList from the run code of a nimbleFunction, the returnType of that function should be the name of the nimbleList definition, again using a following set of parentheses.

      Below, we demonstrate a function that takes the exampleNimList as an argument, modifies its Y element, and returns the nimbleList.

mynf <- nimbleFunction(
  run = function(vals = exampleNimListDef()){
    onesMatrix <- matrix(value = 1, nrow = 2, ncol = 2)
    vals$Y  <- onesMatrix
    returnType(exampleNimListDef())
    return(vals)
  })

# pass exampleNimList as argument to mynf
mynf(exampleNimList)
      ## nimbleList object of type nfRefClass_25
       ## Field "x":
       ## [1] 7
## Field "Y":
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1

      nimbleList arguments to run functions are passed by reference – this means that if an element of a nimbleList argument is modified within a function, that element will remain modified when the function has finished running. To see this, we can inspect the value of the Y element of the exampleNimList object and see that it has been modified.

exampleNimList$Y
      ##      [,1] [,2]
       ## [1,]    1    1
       ## [2,]    1    1

      In addition to storing basic data types, nimbleLists can also store other nimbleLists. To achieve this, we must create a nimbleList definition that declares the types of nested nimbleLists a nimbleList will store. Below, we create two types of nimbleLists: the first, named innerNimList, will be stored inside the second, named outerNimList:

# first, create definitions for both inner and outer nimbleLists
innerNimListDef <- nimbleList(someText = character(0))
outerNimListDef <- nimbleList(xList = innerNimListDef(),
                              z = double(0))

# then, create outer nimbleList
outerNimList <- outerNimListDef$new(z = 3.14)

# access element of inner nimbleList
outerNimList$xList$someText <- "hello, world"

      Note that definitions for inner, or nested, nimbleLists must be created before the definition for an outer nimbleList.

      14.2.1 Pre-defined nimbleList types


14.2.2 Using eigen and svd in nimbleFunctions

NIMBLE has two linear algebra functions that return nimbleLists. The eigen function takes a symmetric matrix, x, as an argument and returns a nimbleList of type eigenNimbleList. nimbleLists of type eigenNimbleList have two elements: values, a vector with the eigenvalues of x, and vectors, a square matrix with the same dimension as x whose columns are the eigenvectors of x. The eigen function has two additional arguments: symmetric and only.values. The symmetric argument can be used to specify if x is a symmetric matrix or not. If symmetric = FALSE (the default value), x will be checked for symmetry. Eigendecompositions in NIMBLE for symmetric matrices are both faster and more accurate. Additionally, eigendecompositions of non-symmetric matrices can have complex entries, which are not supported by NIMBLE. If a complex entry is detected, NIMBLE will issue a warning and that entry will be set to NaN. The only.values argument defaults to FALSE. If only.values = TRUE, the eigen function will not calculate the eigenvectors of x, leaving the vectors nimbleList element empty. This can reduce calculation time if only the eigenvalues of x are needed.

The svd function takes an \(n \times p\) matrix x as an argument, and returns a nimbleList of type svdNimbleList. nimbleLists of type svdNimbleList have three elements: d, a vector with the singular values of x, u, a matrix with the left singular vectors of x, and v, a matrix with the right singular vectors of x. The svd function has an optional argument vectors which defaults to a value of "full". The vectors argument can be used to specify the number of singular vectors that are returned. If vectors = "full", u will be an \(n \times n\) matrix and v will be a \(p \times p\) matrix. If vectors = "thin", u will be an \(n \times m\) matrix, where \(m = \min(n,p)\), and v will be a \(p \times m\) matrix. If vectors = "none", the u and v elements of the returned nimbleList will not be populated.

      nimbleLists created by either eigen or svd can be returned from a nimbleFunction, using returnType(eigenNimbleList()) or returnType(svdNimbleList()) respectively. nimbleLists created by eigen and svd can also be used within other nimbleLists by specifying the nimbleList element types as eigenNimbleList() and svdNimbleList(). The below example demonstrates the use of eigen and svd within a nimbleFunction.

eigenListFunctionGenerator <- nimbleFunction(
  setup = function(){
    demoMatrix <- diag(4) + 2
    eigenAndSvdListDef <- nimbleList(demoEigenList = eigenNimbleList(), 
                                     demoSvdList = svdNimbleList())
    eigenAndSvdList <- eigenAndSvdListDef$new()
  },
  run = function(){
    # we will take the eigendecomposition and svd of a symmetric matrix
    eigenAndSvdList$demoEigenList <<- eigen(demoMatrix, symmetric = TRUE, 
                                            only.values = TRUE)
    eigenAndSvdList$demoSvdList <<- svd(demoMatrix, vectors = 'none')
    returnType(eigenAndSvdListDef())
    return(eigenAndSvdList)
  })
eigenListFunction <- eigenListFunctionGenerator()

outputList <- eigenListFunction$run()
outputList$demoEigenList$values
      ## [1] 9 1 1 1
outputList$demoSvdList$d
      ## [1] 9 1 1 1

      The eigenvalues and singular values returned from the above function are the same since the matrix being decomposed was symmetric. However, note that both eigendecompositions and singular value decompositions are numerical procedures, and computed solutions may have slight differences even for a symmetric input matrix.


    The above are calculated and returned for each MCMC chain, using the post-burn-in and thinned samples. Additionally, posterior summary statistics are calculated for all chains combined when multiple chains are run.

    Several example usages of nimbleMCMC are shown below:

    code <- nimbleCode({
        mu ~ dnorm(0, sd = 1000)
        sigma ~ dunif(0, 1000)
        for(i in 1:10)
            x[i] ~ dnorm(mu, sd = sigma)
    })
    data <- list(x = c(2, 5, 3, 4, 1, 0, 1, 3, 5, 3))
    initsFunction <- function() list(mu = rnorm(1,0,1), sigma = runif(1,0,10))

    # execute one MCMC chain, monitoring the "mu" and "sigma" variables,
    # with thinning interval 10.  fix the random number seed for reproducible
    # results.  by default, only returns posterior samples.
    mcmc.out <- nimbleMCMC(code = code, data = data, inits = initsFunction,
                           monitors = c("mu", "sigma"), thin = 10,
                           niter = 20000, nchains = 1, setSeed = TRUE)

    # note that the inits argument to nimbleModel must be a list of
    # initial values, whereas nimbleMCMC can accept inits as a function
    # for generating new initial values for each chain.
    initsList <- initsFunction()
    Rmodel <- nimbleModel(code, data = data, inits = initsList)

    # using the existing Rmodel object, execute three MCMC chains with
    # specified burn-in.  return samples, summary statistics, and WAIC.
    mcmc.out <- nimbleMCMC(model = Rmodel,
                           niter = 20000, nchains = 3, nburnin = 2000,
                           summary = TRUE, WAIC = TRUE)

    # run ten chains, generating random initial values for each
    # chain using the inits function specified above.
    # only return summary statistics from each chain; not all the samples.
    mcmc.out <- nimbleMCMC(model = Rmodel, nchains = 10, inits = initsFunction,
                           samples = FALSE, summary = TRUE)
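
    As a sketch of working with the returned object (here assuming the three-chain call above, with summary and WAIC requested; see help(nimbleMCMC) for the authoritative structure):

    # posterior samples for the first chain
    head(mcmc.out$samples$chain1)

    # summary statistics for each chain, and for all chains combined
    mcmc.out$summary$chain1
    mcmc.out$summary$all.chains

    # WAIC information
    mcmc.out$WAIC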

    See help(nimbleMCMC) for further details.


    7.2 The MCMC configuration

    7.2.1 Default MCMC configuration

    Assuming we have a model named Rmodel, the following will generate a default MCMC configuration:

    mcmcConf <- configureMCMC(Rmodel)

    The default configuration will contain a single sampler for each node in the model, and the default ordering follows the topological ordering of the model.

    7.2.1.1 Default assignment of sampler algorithms

    7.2.1.5 Automated parameter blocking

    The default configuration may be replaced by one generated from an automated parameter blocking algorithm. This algorithm determines groupings of model nodes that, when jointly sampled with a RW_block sampler, increase overall MCMC efficiency. Overall efficiency is defined as the effective sample size of the slowest-mixing node divided by computation time. This is done by:

    autoBlockConf <- configureMCMC(Rmodel, autoBlock = TRUE)

    Note that using autoBlock = TRUE compiles and runs MCMCs, progressively exploring different sampler assignments, so it takes some time and generates some output. It is most useful for determining effective blocking strategies that can be re-used for later runs. The additional control argument autoIt may also be provided to indicate the number of MCMC samples to be used in each trial of the automated blocking procedure (default 20,000).
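
    For example, a minimal sketch of shortening each automated-blocking trial (assuming, per the description above, that autoIt is passed through the control list):

    autoBlockConf <- configureMCMC(Rmodel, autoBlock = TRUE,
                                   control = list(autoIt = 10000))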


    7.2.2.3 Overriding the default sampler control list values

    The default values of control list elements for all sampling algorithms may be overridden through use of the control argument to configureMCMC, which should be a named list. Named elements in the control argument will be used for all default samplers and any subsequent sampler added via addSampler (see below). For example, the following will create the default MCMC configuration, except all RW samplers will have their initial scale set to 3, and none of the samplers (RW, or otherwise) will be adaptive.

    -
    mcmcConf <- configureMCMC(Rmodel, control = list(scale = 3, adaptive = FALSE))
    +
    mcmcConf <- configureMCMC(Rmodel, control = list(scale = 3, adaptive = FALSE))

    When adding samplers to a configuration using addSampler, the default control list can also be overridden.
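
    For example, a sketch of overriding the defaults for a single added sampler (assuming a scalar model node "x" for which a RW sampler is appropriate):

    mcmcConf$addSampler(target = "x", type = "RW",
                        control = list(scale = 0.5, adaptive = FALSE))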


    7.2.2.5 Printing, re-ordering, modifying and removing samplers: printSamplers, removeSamplers, setSamplers, and getSamplerDefinition

    The current, ordered, list of all samplers in the MCMC configuration may be printed by calling the printSamplers method. When you want to see only samplers acting on specific model nodes or variables, provide those names as an argument to printSamplers. The printSamplers method accepts arguments controlling the level of detail displayed as discussed in its R help information.

    # Print all samplers
    mcmcConf$printSamplers()

    # Print all samplers operating on node "a[1]",
    # or any of the "beta[]" variables
    mcmcConf$printSamplers(c("a[1]", "beta"))

    # Print all conjugate and slice samplers
    mcmcConf$printSamplers(type = c("conjugate", "slice"))

    # Print all RW samplers operating on "x"
    mcmcConf$printSamplers("x", type = "RW")

    # Print the first 100 samplers
    mcmcConf$printSamplers(1:100)

    # Print all samplers in their order of execution
    mcmcConf$printSamplers(executionOrder = TRUE)

    Samplers may be removed from the configuration object using removeSamplers, which accepts a character vector of node or variable names, or a numeric vector of indices.

    # Remove all samplers acting on "x" or any component of it
    mcmcConf$removeSamplers("x")

    # Remove all samplers acting on "alpha[1]" and "beta[1]"
    mcmcConf$removeSamplers(c("alpha[1]", "beta[1]"))

    # Remove the first five samplers
    mcmcConf$removeSamplers(1:5)

    # Providing no argument removes all samplers
    mcmcConf$removeSamplers()

    The samplers to retain, and their ordering, may be specified using setSamplers, which also accepts a character vector of node or variable names, or a numeric vector of indices.

    # Set the list of samplers to those acting on any components of the
    # model variables "x", "y", or "z".
    mcmcConf$setSamplers(c("x", "y", "z"))

    # Set the list of samplers to only those acting on model nodes
    # "alpha[1]", "alpha[2]", ..., "alpha[10]"
    mcmcConf$setSamplers("alpha[1:10]")

    # Truncate the current list of samplers to the first 10 and the 100th
    mcmcConf$setSamplers(ind = c(1:10, 100))

    The nimbleFunction definition underlying a particular sampler may be viewed using the getSamplerDefinition method, using the sampler index as an argument. A node name argument may also be supplied, in which case the definition of the first sampler acting on that node is returned. In all cases, getSamplerDefinition only returns the definition of the first sampler specified either by index or node name.

    # Return the definition of the third sampler in the mcmcConf object
    mcmcConf$getSamplerDefinition(3)

    # Return the definition of the first sampler acting on node "x",
    # or the first of any indexed nodes comprising the variable "x"
    mcmcConf$getSamplerDefinition("x")

    7.2.2.6 Customizing individual sampler configurations: getSamplers, setSamplers, setName, setSamplerFunction, setTarget, and setControl

    Each sampler in an MCMCconf object is represented by a sampler configuration, a samplerConf object. Each samplerConf is a reference class object containing the following (required) fields: name (a character string), samplerFunction (a valid nimbleFunction sampler), target (the model node to be sampled), and control (a list of control arguments). The MCMCconf method getSamplers provides access to the samplerConf objects. These can be modified and then passed as an argument to setSamplers to overwrite the current list of samplers in the MCMC configuration object. However, no checking of the validity of this modified list is performed; if the list of samplerConf objects is corrupted into an invalid state, incorrect behavior will result when buildMCMC is called. The fields of a samplerConf object can be modified using the access functions setName(name), setSamplerFunction(fun), setTarget(target, model), and setControl(control).

    Here are some examples:

    # retrieve samplerConf list
    samplerConfList <- mcmcConf$getSamplers()

    # change the name of the first sampler
    samplerConfList[[1]]$setName("newNameForThisSampler")

    # change the sampler function of the second sampler,
    # assuming existence of a nimbleFunction 'anotherSamplerNF',
    # which represents a valid nimbleFunction sampler.
    samplerConfList[[2]]$setSamplerFunction(anotherSamplerNF)

    # change the 'adaptive' element of the control list of the third sampler
    control <- samplerConfList[[3]]$control
    control$adaptive <- FALSE
    samplerConfList[[3]]$setControl(control)

    # change the target node of the fourth sampler
    samplerConfList[[4]]$setTarget("y", model)   # model argument required

    # use this modified list of samplerConf objects in the MCMC configuration
    mcmcConf$setSamplers(samplerConfList)

    7.2.2.7 Customizing the sampler execution order

    The ordering of sampler execution can be controlled as well. This allows for sampler functions to execute multiple times within a single MCMC iteration, or the execution of different sampler functions to be interleaved with one another.

    The sampler execution order is set using the function setSamplerExecutionOrder, and the current ordering of execution is retrieved using getSamplerExecutionOrder. For example, assuming the MCMC configuration object mcmcConf contains five samplers:

    # first sampler to execute twice, in succession:
    mcmcConf$setSamplerExecutionOrder(c(1, 1, 2, 3, 4, 5))

    # first sampler to execute multiple times, interleaved:
    mcmcConf$setSamplerExecutionOrder(c(1, 2, 1, 3, 1, 4, 1, 5))

    # fourth sampler to execute 10 times, only
    mcmcConf$setSamplerExecutionOrder(rep(4, 10))

    # omitting the argument to setSamplerExecutionOrder()
    # resets the ordering to each sampler executing once, sequentially
    mcmcConf$setSamplerExecutionOrder()

    # retrieve the current ordering of sampler execution
    ordering <- mcmcConf$getSamplerExecutionOrder()

    # print the sampler functions in the order of execution
    mcmcConf$printSamplers(executionOrder = TRUE)

    7.2.2.8 Monitors and thinning intervals: printMonitors, getMonitors, setMonitors, addMonitors, resetMonitors and setThin

    An MCMC configuration object contains two independent sets of variables to monitor, each with its own thinning interval: thin corresponding to monitors, and thin2 corresponding to monitors2. Monitors operate at the variable level: only entire model variables may be monitored. Specifying a monitor on a node, e.g., x[1], will result in the entire variable x being monitored.

    The variables specified in monitors and monitors2 will be recorded (with thinning intervals thin and thin2, respectively) into objects called mvSamples and mvSamples2, contained within the MCMC object. These are both modelValues objects; modelValues are NIMBLE data structures used to store multiple sets of values of model variables. These can be accessed as the member data mvSamples and mvSamples2 of the MCMC object, and they can be converted to matrices using as.matrix or lists using as.list (see Section 7.7).
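
    For example, after running a compiled MCMC Cmcmc (a sketch of the two conversions just described):

    samples  <- as.matrix(Cmcmc$mvSamples)    # monitors, recorded every thin iterations
    samples2 <- as.list(Cmcmc$mvSamples2)     # monitors2, recorded every thin2 iterations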

    Monitors may be added to the MCMC configuration either in the original call to configureMCMC or using the addMonitors method:

    # Using an argument to configureMCMC
    mcmcConf <- configureMCMC(Rmodel, monitors = c("alpha", "beta"),
                              monitors2 = "x")

    # Calling a member method of the mcmcConf object
    # This results in the same monitors as above
    mcmcConf$addMonitors("alpha", "beta")
    mcmcConf$addMonitors2("x")

    A new set of monitor variables can be added to the MCMC configuration, overwriting the current monitors, using the setMonitors method:

    # Replace old monitors, now monitor "delta" and "gamma" only
    mcmcConf$setMonitors("gamma", "delta")

    Similarly, either thinning interval may be set at either step:

    # Using an argument to configureMCMC
    mcmcConf <- configureMCMC(Rmodel, thin = 1, thin2 = 100)

    # Calling a member method of the mcmcConf object
    # This results in the same thinning intervals as above
    mcmcConf$setThin(1)
    mcmcConf$setThin2(100)

    The current lists of monitors and thinning intervals may be displayed using the printMonitors method. Both sets of monitors (monitors and monitors2) may be reset to empty character vectors by calling the resetMonitors method. The methods getMonitors and getMonitors2 return the currently specified monitors and monitors2 as character vectors.
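
    A brief sketch of these methods:

    # display the current monitors and thinning intervals
    mcmcConf$printMonitors()

    # retrieve each set of monitors as a character vector
    monitors  <- mcmcConf$getMonitors()
    monitors2 <- mcmcConf$getMonitors2()

    # reset both sets of monitors to empty character vectors
    mcmcConf$resetMonitors()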

    7.2.2.9 Monitoring model log-probabilities

    To record model log-probabilities from an MCMC, one can add monitors for logProb variables (which begin with the prefix logProb_) that correspond to variables with (any) stochastic nodes. For example, to record and extract log-probabilities for the variables alpha, sigma_mu, and Y:

    mcmcConf <- configureMCMC(Rmodel, enableWAIC = TRUE)
    mcmcConf$addMonitors("logProb_alpha", "logProb_sigma_mu", "logProb_Y")
    Rmcmc <- buildMCMC(mcmcConf)
    Cmodel <- compileNimble(Rmodel)
    Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
    Cmcmc$run(10000)
    samples <- as.matrix(Cmcmc$mvSamples)

    The samples matrix will contain both MCMC samples and model log-probabilities.

    7.2.2.10 Using numeric samples to define a prior distribution

    A set of numeric samples, perhaps generated from another MCMC algorithm, can be used to define the prior distribution of model nodes. This is accomplished using the prior_samples MCMC sampler. When assigning the prior_samples sampler to a target node, you must also provide a vector of numeric values (when target is a scalar node), or a matrix of values in the multidimensional case. On each MCMC iteration, a new value (or row of values) is selected, either sequentially or randomly, from this numeric vector or matrix and assigned to the target node. See help(prior_samples) for more details:

    mcmcConf <- configureMCMC(Rmodel)
    mcmcConf$addSampler(target = 'theta', type = 'prior_samples', samples = rnorm(100))
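
    For a multivariate target, the samples argument would instead be a matrix whose rows are the stored draws (a sketch, assuming the model contains a length-three node "beta[1:3]"):

    mcmcConf$addSampler(target = 'beta[1:3]', type = 'prior_samples',
                        samples = matrix(rnorm(300), ncol = 3))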

    7.3 Building and compiling the MCMC

    Once the MCMC configuration object has been created, and customized to one’s liking, it may be used to build an MCMC function:

    Rmcmc <- buildMCMC(mcmcConf)

    buildMCMC is a nimbleFunction. The returned object Rmcmc is an instance of the nimbleFunction specific to configuration mcmcConf (and of course its associated model).

    Note that if you would like to be able to calculate the WAIC of the model, you should usually set enableWAIC = TRUE as an argument to configureMCMC (or to buildMCMC if not using configureMCMC), or set nimbleOptions(MCMCenableWAIC = TRUE), which will enable WAIC calculations for all subsequently built MCMC functions. For more information on WAIC calculations, including situations in which you can calculate WAIC without having set enableWAIC = TRUE, see Section 7.8 or help(waic) in R.
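
    For example, a sketch of the two approaches just described:

    # enable WAIC for this particular MCMC configuration
    mcmcConf <- configureMCMC(Rmodel, enableWAIC = TRUE)

    # or enable WAIC for all subsequently built MCMC functions
    nimbleOptions(MCMCenableWAIC = TRUE)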

    When no customization is needed, one can skip configureMCMC and simply provide a model object to buildMCMC. The following two MCMC functions will be identical:

    mcmcConf <- configureMCMC(Rmodel)   # default MCMC configuration
    Rmcmc1 <- buildMCMC(mcmcConf)

    Rmcmc2 <- buildMCMC(Rmodel)   # uses the default configuration for Rmodel

    For speed of execution, we usually want to compile the MCMC function to C++ (as is the case for other NIMBLE functions). To do so, we use compileNimble. If the model has already been compiled, it should be provided as the project argument so the MCMC will be part of the same compiled project. A typical compilation call looks like:

    Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

    Alternatively, if the model has not already been compiled, the model and the MCMC can be compiled together in one line:

    Cmcmc <- compileNimble(Rmodel, Rmcmc)

    Note that if you compile the MCMC with another object (the model in this case), you’ll need to explicitly refer to the MCMC component of the resulting object to be able to run the MCMC:

    Cmcmc$Rmcmc$run(niter = 1000)

    7.4 Initializing MCMC

    7.5 User-friendly execution of MCMC

    The following examples demonstrate some uses of runMCMC, and assume the existence of Cmcmc, a compiled MCMC algorithm.

    # run a single chain, and return a matrix of samples
    mcmc.out <- runMCMC(Cmcmc)

    # run three chains of 10000 samples, discard initial burn-in of 1000,
    # record samples thereafter using a thinning interval of 10,
    # and return a list of sample matrices
    mcmc.out <- runMCMC(Cmcmc, niter = 10000, nburnin = 1000, thin = 10, nchains = 3)

    # run three chains, returning posterior samples, summary statistics,
    # and the WAIC value
    mcmc.out <- runMCMC(Cmcmc, nchains = 3, summary = TRUE, WAIC = TRUE)

    # run two chains, and specify the initial values for each
    initsList <- list(list(mu = 1, sigma = 1),
                      list(mu = 2, sigma = 10))
    mcmc.out <- runMCMC(Cmcmc, nchains = 2, inits = initsList)

    # run ten chains of 100,000 iterations each, using a function to
    # generate initial values and a fixed random number seed for each chain.
    # only return summary statistics from each chain; not all the samples.
    initsFunction <- function()
        list(mu = rnorm(1,0,1), sigma = runif(1,0,100))
    mcmc.out <- runMCMC(Cmcmc, niter = 100000, nchains = 10,
                        inits = initsFunction, setSeed = TRUE,
                        samples = FALSE, summary = TRUE)

    See help(runMCMC) for further details.


    7.6.2 Measuring sampler computation times: getTimes

    If you want to obtain the computation time spent in each sampler, you can set time = TRUE as a run-time argument and then use the method getTimes() to obtain the times. For example,

    Cmcmc$run(niter, time = TRUE)
    Cmcmc$getTimes()

    will return a vector of the total time spent in each sampler, measured in seconds.

    Important: the model should be initialized so that the constraints are satisfied. Otherwise, NIMBLE will initialize from the prior, which may not produce a valid initial value and may cause algorithms (particularly MCMC) to fail.

    Formally, dconstraint(cond) is a probability distribution on \(\left\{ 0, 1 \right\}\) such that \(P(1) = 1\) if cond is TRUE and \(P(0) = 1\) if cond is FALSE.

    5.2.7.3 Constraints and ordering

    5.2.7.3.1 Ordering

    To specify an ordering of parameters, such as \(\alpha_1 \le \alpha_2 \le \alpha_3\), one can use dconstraint as follows:

    constraint_data ~ dconstraint( alpha1 <= alpha2 & alpha2 <= alpha3 )
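
    Following the initialization note above, the constraint outcome is supplied as data and the initial values should satisfy the ordering (a sketch; the model code object is assumed to also contain priors for alpha1, alpha2, and alpha3):

    constraintData <- list(constraint_data = 1)
    inits <- list(alpha1 = -1, alpha2 = 0, alpha3 = 1)  # satisfies the ordering
    Rmodel <- nimbleModel(code, data = constraintData, inits = inits)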

    Note that unlike in BUGS, one cannot specify prior ordering using syntax such as

    alpha[1] ~ dnorm(0, 1) I(, alpha[2])
    alpha[2] ~ dnorm(0, 1) I(alpha[1], alpha[3])
    alpha[3] ~ dnorm(0, 1) I(alpha[2], )

    as this does not represent a directed acyclic graph.

    Also note that specifying a prior ordering using T(,) can produce unexpected results. For example:

    alpha1 ~ dnorm(0, 1)
    alpha2 ~ dnorm(0, 1) T(alpha1, )
    alpha3 ~ dnorm(0, 1) T(alpha2, )

    will enforce alpha1 \(\le\) alpha2 \(\le\) alpha3, but it does not treat the three parameters symmetrically. Instead it puts a marginal prior on alpha1 that is standard normal and then constrains alpha2 and alpha3 to follow truncated normal distributions. This is not equivalent to a symmetric prior on the three alphas that assigns zero probability density when values are not in order.
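
    In symbols (written out to make the asymmetry explicit), the T(,) construction above implies the joint prior
    \[ p(\alpha_1, \alpha_2, \alpha_3) = \phi(\alpha_1)\, \frac{\phi(\alpha_2)}{1 - \Phi(\alpha_1)} \mathbf{1}\{\alpha_2 > \alpha_1\}\, \frac{\phi(\alpha_3)}{1 - \Phi(\alpha_2)} \mathbf{1}\{\alpha_3 > \alpha_2\}, \]
    where \(\phi\) and \(\Phi\) are the standard normal density and distribution function. A symmetric ordered prior would instead be proportional to \(\phi(\alpha_1)\phi(\alpha_2)\phi(\alpha_3)\mathbf{1}\{\alpha_1 \le \alpha_2 \le \alpha_3\}\), which is not the same distribution.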

    NIMBLE does not support the JAGS sort syntax.


    5.2.8 Model macros

    Macros are described further in the vignette for the nimbleMacros package and in the documentation for each macro, e.g., help(LINPRED).

    Here is a brief example of using the LM macro to set up a GLMM.

    We’ll consider the Chick example from the lme4 package. We’ll set up a NIMBLE model analogous to this mixed effects model specified with lme4::lmer:

    lmer(weight ~ Time + (1|Chick), data = ChickWeight)

    with a linear effect of time and random intercepts for each chick.

    Here’s the specification of the model code in NIMBLE, using the LM macro:

    library(nimbleMacros)

    code <- nimbleCode({
      LM(weight[1:N] ~ Time + (1|Chick))
    })

    chickConst <- list(Time = ChickWeight$Time, Chick = ChickWeight$Chick,
                       N = nrow(ChickWeight))
    chickData <- list(weight = ChickWeight$weight)

    mod <- nimbleModel(code, constants = chickConst, data = chickData)

    mod$getCode()
    ## {
     ##     "# LM"
     ##     "  ## FORLOOP"
## ...

    The parameters created by each macro can be listed with the getMacroParameters method:

    mod$getMacroParameters()
    ## $LM
     ## $LM[[1]]
     ## [1] "mu_"         "sd_residual"
## ...

    Macros can also be used for individual components of the model code. For example, using the LINPRED and FORLOOP macros directly:

    code <- nimbleCode({
      mu[1:N] <- LINPRED(~Time[1:N])

      weight[1:N] ~ FORLOOP(dnorm(mu[1:N], sd = sigma))
      sigma ~ dunif(0, 100)
    })
    mod <- nimbleModel(code, constants = chickConst, data = chickData)
    mod$getCode()
    ## {
     ##     "# LINPRED"
     ##     "  ## nimbleMacros::FORLOOP"
## ...
