diff --git a/src/functions-reference/embedded_laplace.qmd b/src/functions-reference/embedded_laplace.qmd
new file mode 100644
index 000000000..de1e07fc8
--- /dev/null
+++ b/src/functions-reference/embedded_laplace.qmd
@@ -0,0 +1,380 @@
---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
  \phi \sim p(\phi), \\
  \theta \sim \text{MultiNormal}(0, K(\phi)), \\
  y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix parameterized by $\phi$.
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. sample from the *marginal posterior* $p(\phi \mid y)$,
2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, so both must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) d\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution, and the
resulting Gaussian integral can be evaluated analytically to obtain an
approximation to the log marginal likelihood,
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.

Combining this approximate marginal likelihood with the prior in the `model`
block, we can then sample from the marginal posterior $p(\phi \mid y)$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and typically has a shape that is easier to sample, leading to
more efficient inference. On the other hand, each marginal likelihood
evaluation is more costly, so the overall change in efficiency
depends on the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then from $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(\theta)$ is
multivariate normal and $p(y \mid \phi, \theta)$ is
log-concave. Stan's embedded Laplace approximation is restricted to
have a multivariate normal prior $p(\theta)$ and ... likelihood
$p(y \mid \phi, \theta)$.


## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write a
function in the `functions` block which returns the log joint likelihood
$\log p(y \mid \theta, \phi)$. There are a few constraints on this function:

* The function return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD.
The exceptions are functions with specialized reverse-mode AD implementations:
the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions on the variadic arguments `...`, and each
argument can be passed as data or as a parameter. As always, users should pass
arguments as parameters only when necessary, in order to speed up differentiation.
In general, we recommend marking data-only arguments with the keyword `data`,
for example,
```
real ll_function(vector theta, data vector x, ...)
```
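For example, a minimal sketch of a likelihood function for grouped count data
with a log link might look as follows; the function name `poisson_log_ll` and
its arguments `y` and `y_index` are illustrative, not part of the interface.
```
functions {
  // Hypothetical log joint likelihood for grouped counts with a log link:
  // observation i has count y[i] and belongs to group y_index[i].
  real poisson_log_ll(vector theta, data array[] int y, data array[] int y_index) {
    // theta[y_index] picks out the latent value for each observation's group
    return poisson_log_lpmf(y | theta[y_index]);
  }
}
```
Here each observation contributes a Poisson term with rate $\exp(\theta_{g(i)})$,
with `y_index[i]` playing the role of $g(i)$.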
## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns a matrix of size
$n \times n$, where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```
There are no type restrictions on the variadic arguments. The variable $\phi$
is implicitly defined as the collection of all non-data arguments passed to
`ll_function` (excluding $\theta$) and `K_function`.
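For example, a covariance function built from Stan's exponentiated quadratic
kernel might be sketched as follows, assuming the latent variables are indexed
by one-dimensional positions `x`; the name `se_K` and the `jitter` argument are
illustrative.
```
functions {
  // Hypothetical squared exponential prior covariance over positions x,
  // with marginal standard deviation alpha, length-scale rho, and a small
  // jitter added to the diagonal for numerical stability.
  matrix se_K(data array[] real x, real alpha, real rho, data real jitter) {
    int n = size(x);
    return gp_exp_quad_cov(x, alpha, rho) + diag_matrix(rep_vector(jitter, n));
  }
}
```
In this sketch, $\phi$ consists of the non-data arguments `alpha` and `rho`.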
## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
This function takes in the
user-specified likelihood and prior covariance functions, as well as their arguments.
These arguments must be passed as tuples, which can be generated on the fly
using parentheses.
We also need to pass an argument $\theta_0$, which serves as an initial guess for
the optimization problem that underlies the Laplace approximation,
$$
  \underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
                           function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function` *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.

It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
the value is $\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
`hessian_block_size=n`, where `n` is the size of $\theta$.

* `solver`: choice of Newton solver. The optimizer used to compute the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable.
By default (`solver=1`), the solver makes a Cholesky decomposition of the
negative Hessian, $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
If `solver=2`, the solver makes a Cholesky decomposition of the covariance
matrix $K(\phi)$.
If the Cholesky decomposition can be computed for neither the negative
Hessian nor the covariance matrix, use `solver=3`, which uses a more expensive
but less specialized approach.

* `max_steps_linesearch`: maximum number of steps in the linesearch. The linesearch
method tries to ensure that the Newton step leads to a decrease in the
objective function. If the Newton step does not improve the objective function,
the step is repeatedly halved until the objective function decreases or the
maximum number of steps in the linesearch is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
                               function K_function, tuple (...),
                               real tol, int max_steps, int hessian_block_size,
                               int solver, int max_steps_linesearch);
```

## Sampling from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` closely follows
the signature for `laplace_marginal`:
```
vector theta =
  laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                     function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
  laplace_latent_tol_rng(function ll_function, tuple (...), vector theta_0,
                         function K_function, tuple (...),
                         real tol, int max_steps, int hessian_block_size,
                         int solver, int max_steps_linesearch);
```
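Putting the pieces together, the following sketch reuses the hypothetical
`poisson_log_ll` and `se_K` functions from the earlier sketches; it assumes
`y`, `y_index`, `x`, `jitter`, `theta_0`, and `n` are declared in the data or
transformed data blocks, that `alpha` and `rho` are declared in the parameters
block, and the priors shown are purely illustrative.
```
model {
  // priors on the hyperparameters phi = (alpha, rho)
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // approximate log marginal likelihood log p(y | phi)
  target += laplace_marginal(poisson_log_ll, (y, y_index), theta_0,
                             se_K, (x, alpha, rho, jitter));
}
generated quantities {
  // draw theta from the normal approximation to p(theta | y, phi)
  vector[n] theta = laplace_latent_rng(poisson_log_ll, (y, y_index), theta_0,
                                       se_K, (x, alpha, rho, jitter));
}
```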
## Built-in Laplace marginal likelihood functions

Stan supports certain built-in Laplace marginal likelihood functions.
This selection is currently
narrow and expected to grow. The built-in functions exist for the user's
convenience but are not more computationally efficient than specifying log
likelihoods in the `functions` block.

### Poisson with log link

Given count data, with each observed count $y_i$ associated with a group
$g(i)$ and a corresponding latent variable $\theta_{g(i)}$, and a Poisson model,
the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
The arguments required to compute this likelihood are:

* `y`: an array of counts.
* `y_index`: an array whose $i^\text{th}$ element indicates which
group the $i^\text{th}$ observation belongs to.

The signatures for the embedded Laplace approximation function with a Poisson
likelihood are
```
real laplace_marginal_poisson_log_lpmf(array[] int y | array[] int y_index,
        vector theta0, function K_function, (...));

real laplace_marginal_tol_poisson_log_lpmf(array[] int y | array[] int y_index,
        vector theta0, function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_poisson_log_rng(array[] int y, array[] int y_index,
        vector theta0, function K_function, (...));

vector laplace_latent_tol_poisson_log_rng(array[] int y, array[] int y_index,
        vector theta0, function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

A similar built-in likelihood lets users specify an offset $x_i \in \mathbb R^+$
to the rate parameter of the Poisson. The likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)}) x_i).
$$
The signatures for this function are:
```
real laplace_marginal_poisson2_log_lpmf(array[] int y | array[] int y_index,
        vector x, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_poisson2_log_lpmf(array[] int y | array[] int y_index,
        vector x, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_poisson2_log_rng(array[] int y, array[] int y_index,
        vector x, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
        vector x, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```
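For example, using the hypothetical `se_K` covariance function from the sketch
above, the built-in Poisson likelihood could replace the user-written
likelihood function in the `model` block:
```
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // built-in Poisson (log link) marginal likelihood; y and y_index are data
  target += laplace_marginal_poisson_log_lpmf(y | y_index, theta_0,
                                              se_K, (x, alpha, rho, jitter));
}
```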
### Negative Binomial with log link

The negative binomial distribution generalizes the Poisson distribution by
introducing the dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
\mathbb E(y_i) = \exp (\theta_{g(i)}), \\
\text{Var}(y_i) = \mathbb E(y_i) + \frac{(\mathbb E(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:

* `y`: the observed counts.
* `y_index`: an array whose $i^\text{th}$ element indicates which
group the $i^\text{th}$ observation belongs to.
* `eta`: the overdispersion parameter.

The function signatures for the embedded Laplace approximation with a negative
binomial likelihood are
```
real laplace_marginal_neg_binomial_2_log_lpmf(array[] int y |
        array[] int y_index, real eta, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_neg_binomial_2_log_lpmf(array[] int y |
        array[] int y_index, real eta, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_neg_binomial_2_log_rng(array[] int y,
        array[] int y_index, real eta, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
        array[] int y_index, real eta, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

### Bernoulli with logit link

Given a binary outcome $y_i \in \{0, 1\}$ and a Bernoulli model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$
The arguments of the likelihood function are:

* `y`: the observed binary outcomes.
* `y_index`: an array whose $i^\text{th}$ element indicates which
group the $i^\text{th}$ observation belongs to.

The function signatures for the embedded Laplace approximation with a Bernoulli likelihood are
```
real laplace_marginal_bernoulli_logit_lpmf(array[] int y |
        array[] int y_index, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_bernoulli_logit_lpmf(array[] int y |
        array[] int y_index, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_bernoulli_logit_rng(array[] int y,
        array[] int y_index, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_bernoulli_logit_rng(array[] int y,
        array[] int y_index, vector theta0,
        function K_function, (...),
        real tol, int max_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```
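As with the other built-in likelihoods, the `_rng` variants can be used in
`generated quantities` to recover the latent vector. A minimal sketch for the
Bernoulli case, again reusing the hypothetical `se_K` covariance function and
assuming `y`, `y_index`, `theta_0`, `x`, `jitter`, and `n` are declared
elsewhere in the program, might be:
```
generated quantities {
  // draw theta from the approximate conditional p(theta | y, phi),
  // using the built-in Bernoulli likelihood with a logit link
  vector[n] theta = laplace_latent_bernoulli_logit_rng(y, y_index, theta_0,
                                                       se_K, (x, alpha, rho, jitter));
}
```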