Issue/873 embedded laplace #874

charlesm93 wants to merge 3 commits into master from issue/873-embeddedLaplace

---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain
marginal and conditional distributions that arise in latent Gaussian models.
A latent Gaussian model has the following hierarchical structure:
$$
\phi \sim p(\phi), \\
\theta \sim \text{MultiNormal}(0, K(\phi)), \\
y \sim p(y \mid \theta, \phi),
$$
where $K(\phi)$ denotes the prior covariance matrix, parameterized by $\phi$.
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
a two-step procedure:

1. sample from the *marginal posterior* $p(\phi \mid y)$,
2. sample from the *conditional posterior* $p(\theta \mid y, \phi)$.

In practice, neither the marginal posterior nor the conditional posterior
is available in closed form, and so both must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) \, p(\theta \mid \phi) \, \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) \, p(\theta \mid \phi)$ with a normal distribution, and the
resulting Gaussian integral can be evaluated analytically to obtain an
approximation to the log marginal likelihood,
$\log \hat p(y \mid \phi) \approx \log p(y \mid \phi)$.

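To make this concrete, let $\hat\theta$ denote the mode of
$\theta \mapsto \log p(y \mid \phi, \theta) + \log p(\theta \mid \phi)$ and $H$ the
Hessian of this function at $\hat\theta$. The textbook form of the Laplace
approximation is then
$$
\log \hat p(y \mid \phi) = \log p(y \mid \phi, \hat\theta) + \log p(\hat\theta \mid \phi)
  + \frac{n}{2} \log(2\pi) - \frac{1}{2} \log \det(-H),
$$
where $n$ is the dimension of $\theta$. (This is the standard derivation for
intuition; the implementation evaluates an equivalent but numerically more
stable expression.)
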
Combining this approximate marginal likelihood with the prior in the `model`
block, we can then sample from the marginal posterior $p(\phi \mid y)$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and typically has a geometry that is easier to sample, leading
to more efficient inference. On the other hand, each evaluation of the
marginal likelihood is more costly, so the net change in efficiency
depends on the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
The process of iteratively sampling from $p(\phi \mid y)$ (say, with MCMC) and
then from $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(\theta)$ is
multivariate normal and $p(y \mid \phi, \theta)$ is
log-concave. Stan's embedded Laplace approximation is restricted to a
multivariate normal prior $p(\theta)$ and is intended for log-concave
likelihoods $p(y \mid \phi, \theta)$.

## Specifying the likelihood function

The first step in using the embedded Laplace approximation is to write a
function in the `functions` block which returns the log joint likelihood
$\log p(y \mid \theta, \phi)$. There are a few constraints on this function:

* The function return type must be `real`.

* The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.

* The operations in the function must support higher-order automatic
differentiation (AD). Most functions in Stan support higher-order AD; the
exceptions are functions with specialized calls for reverse-mode AD, namely
the higher-order functions (algebraic solvers, differential equation
solvers, and integrators) and the suite of hidden Markov model (HMM) functions.

The signature of the function is
```
real ll_function(vector theta, ...)
```
There are no type restrictions on the variadic arguments `...`, and each
argument can be passed as data or as a parameter. As always, users should
pass arguments as parameters only when necessary, in order to speed up
differentiation. In general, we recommend marking data-only arguments with
the keyword `data`, for example,
```
real ll_function(vector theta, data vector x, ...)
```

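As a concrete illustration, here is a minimal sketch of such a likelihood for
grouped Poisson counts; the names `y` and `y_index` and the grouping structure
are assumptions for the example, not part of the interface:
```
functions {
  // Sketch of a user-defined log joint likelihood: Poisson counts y, where
  // observation i has log rate theta[y_index[i]]. theta is the latent
  // Gaussian variable; y and y_index are marked as data.
  real ll_function(vector theta, data array[] int y, data array[] int y_index) {
    return poisson_log_lpmf(y | theta[y_index]);
  }
}
```
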
## Specifying the covariance function

We next need to specify a function that returns the prior covariance matrix
$K$ as a function of the hyperparameters $\phi$.
The only restriction is that this function returns a matrix of size
$n \times n$, where $n$ is the size of $\theta$. The signature is:
```
matrix K_function(...)
```
There are no type restrictions on the variadic arguments. The variable $\phi$
is implicitly defined as the collection of all non-data arguments passed to
`ll_function` (excluding $\theta$) and `K_function`.

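For example, a covariance function built from the squared exponential kernel
might look as follows; this is a sketch, in which the covariate `x`, the
hyperparameters `alpha` and `rho`, and the jitter value are all assumptions
for the example:
```
functions {
  // Sketch of a covariance function: squared exponential kernel over
  // covariates x, with hyperparameters alpha (marginal standard deviation)
  // and rho (length scale). Here phi = (alpha, rho).
  matrix K_function(data array[] real x, real alpha, real rho) {
    int n = size(x);
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    for (i in 1:n) {
      K[i, i] += 1e-8;  // jitter to keep the Cholesky decomposition stable
    }
    return K;
  }
}
```
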
## Approximating the log marginal likelihood $\log p(y \mid \phi)$

In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
This function takes in the user-specified likelihood and prior covariance
functions, as well as their arguments. These arguments must be passed as
tuples, which can be generated on the fly using parentheses.
We also need to pass an argument $\theta_0$, which serves as an initial guess for
the optimization problem that underlies the Laplace approximation,
$$
\underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi).
$$
The size of $\theta_0$ must be consistent with the size of the $\theta$ argument
passed to `ll_function`.

The signature of the function is:
```
target += laplace_marginal(function ll_function, tuple (...), vector theta_0,
                           function K_function, tuple (...));
```
The tuple `(...)` after `ll_function` contains the arguments that get passed
to `ll_function`, *excluding $\theta$*. Likewise, the tuple `(...)` after
`K_function` contains the arguments that get passed to `K_function`.

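Putting the pieces together, a `model` block might look as follows. This is a
sketch continuing the example functions above; `y`, `y_index`, `x`, `theta0`,
`alpha`, and `rho` are assumed to be declared in the appropriate blocks, and
the hyperpriors are illustrative:
```
model {
  // Hyperpriors on phi = (alpha, rho).
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // Approximate log marginal likelihood log p(y | phi). The first tuple
  // holds the arguments of ll_function after theta; the second holds the
  // arguments of K_function.
  target += laplace_marginal(ll_function, (y, y_index), theta0,
                             K_function, (x, alpha, rho));
}
```
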
It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation. Specifically:

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
the value is $\epsilon = 10^{-6}$.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 100 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
`hessian_block_size=n`, where `n` is the size of $\theta$.

* `solver`: choice of Newton solver. The optimizer used to compute the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable.
By default (`solver=1`), the solver makes a Cholesky decomposition of the
negative Hessian, $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
If `solver=2`, the solver makes a Cholesky decomposition of the covariance
matrix $K(\phi)$.
If the Cholesky decomposition can be computed for neither the negative
Hessian nor the covariance matrix, use `solver=3`, which uses a more expensive
but less specialized approach.

* `max_steps_linesearch`: maximum number of steps in the linesearch. The
linesearch method tries to ensure that the Newton step leads to a decrease in
the objective function. If the Newton step does not improve the objective
function, the step is repeatedly halved until the objective function decreases
or the maximum number of linesearch steps is reached. By default,
`max_steps_linesearch=0`, meaning no linesearch is performed.

With these arguments at hand, we can call `laplace_marginal_tol` with the
following signature:
```
target += laplace_marginal_tol(function ll_function, tuple (...), vector theta_0,
                               function K_function, tuple (...),
                               real tol, int max_num_steps, int hessian_block_size,
                               int solver, int max_steps_linesearch);
```

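For instance, continuing the sketch above with illustrative control values,
one might tighten the tolerance and allow a short linesearch:
```
target += laplace_marginal_tol(ll_function, (y, y_index), theta0,
                               K_function, (x, alpha, rho),
                               1e-8,  // tol
                               200,   // max_num_steps
                               1,     // hessian_block_size (diagonal Hessian)
                               1,     // solver (Cholesky of negative Hessian)
                               10);   // max_steps_linesearch
```
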
## Sampling from the approximate conditional $\hat{p}(\theta \mid y, \phi)$

In `generated quantities`, it is possible to sample from the Laplace
approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` closely follows
the signature for `laplace_marginal`:
```
vector theta =
  laplace_latent_rng(function ll_function, tuple (...), vector theta_0,
                     function K_function, tuple (...));
```
Once again, it is possible to specify control parameters:
```
vector theta =
  laplace_latent_tol_rng(function ll_function, tuple (...), vector theta_0,
                         function K_function, tuple (...),
                         real tol, int max_num_steps, int hessian_block_size,
                         int solver, int max_steps_linesearch);
```

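Continuing the sketch, the `generated quantities` block then draws the latent
variables; `n` is assumed to be the declared size of $\theta$:
```
generated quantities {
  // One draw from the normal approximation to p(theta | y, phi) per
  // iteration; combined with the MCMC draws of phi, this yields draws
  // from the approximate joint posterior.
  vector[n] theta = laplace_latent_rng(ll_function, (y, y_index), theta0,
                                       K_function, (x, alpha, rho));
}
```
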
## Built-in Laplace marginal likelihood functions

Stan supports certain built-in Laplace marginal likelihood functions.
This selection is currently
narrow and expected to grow. The built-in functions exist for the user's
convenience but are not more computationally efficient than specifying log
likelihoods in the `functions` block.

### Poisson with log link

Given count data, with each observed count $y_i$ associated with a group
$g(i)$ and a corresponding latent variable $\theta_{g(i)}$, and a Poisson model,
the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)})).
$$
The arguments required to compute this likelihood are:

* `y`: an array of counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.

The signatures for the embedded Laplace approximation functions with a Poisson
likelihood are
```
real laplace_marginal_poisson_log_lpmf(array[] int y | array[] int y_index,
        vector theta0, function K_function, (...));

real laplace_marginal_tol_poisson_log_lpmf(array[] int y | array[] int y_index,
        vector theta0, function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_poisson_log_rng(array[] int y, array[] int y_index,
        vector theta0, function K_function, (...));

vector laplace_latent_tol_poisson_log_rng(array[] int y, array[] int y_index,
        vector theta0, function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

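With a built-in likelihood, no user-defined `ll_function` is needed; only the
covariance function and its arguments are supplied. A sketch of the
corresponding `model` statement, reusing the illustrative names from above:
```
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // Built-in Poisson likelihood: y and y_index take the place of the
  // user-defined ll_function and its argument tuple.
  y ~ laplace_marginal_poisson_log(y_index, theta0,
                                   K_function, (x, alpha, rho));
}
```
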
A similar built-in likelihood lets users specify an offset $x_i \in \mathbb{R}^+$
to the rate parameter of the Poisson. The likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)}) \, x_i).
$$
The signatures for this function are:
```
real laplace_marginal_poisson2_log_lpmf(array[] int y | array[] int y_index,
        vector x, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_poisson2_log_lpmf(array[] int y | array[] int y_index,
        vector x, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_poisson2_log_rng(array[] int y, array[] int y_index,
        vector x, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_poisson2_log_rng(array[] int y, array[] int y_index,
        vector x, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

### Negative Binomial with log link

The negative binomial distribution generalizes the Poisson distribution by
introducing the dispersion parameter $\eta$. The corresponding likelihood is then
$$
p(y \mid \theta, \phi) = \prod_i \text{NegBinomial2} (y_i \mid \exp(\theta_{g(i)}), \eta).
$$
Here we use the alternative parameterization implemented in Stan, meaning that
$$
\mathbb{E}(y_i) = \exp (\theta_{g(i)}), \\
\text{Var}(y_i) = \mathbb{E}(y_i) + \frac{(\mathbb{E}(y_i))^2}{\eta}.
$$
The arguments for the likelihood function are:

* `y`: the observed counts.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.
* `eta`: the overdispersion parameter.

The function signatures for the embedded Laplace approximation with a negative
binomial likelihood are
```
real laplace_marginal_neg_binomial_2_log_lpmf(array[] int y |
        array[] int y_index, real eta, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_neg_binomial_2_log_lpmf(array[] int y |
        array[] int y_index, real eta, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_neg_binomial_2_log_rng(array[] int y,
        array[] int y_index, real eta, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_neg_binomial_2_log_rng(array[] int y,
        array[] int y_index, real eta, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

### Bernoulli with logit link

Given binary outcomes $y_i \in \{0, 1\}$ and a Bernoulli model, the likelihood is
$$
p(y \mid \theta, \phi) = \prod_i \text{Bernoulli} (y_i \mid \text{logit}^{-1}(\theta_{g(i)})).
$$
The arguments of the likelihood function are:

* `y`: the observed binary outcomes.
* `y_index`: an array whose $i^\text{th}$ element indicates the group to which
the $i^\text{th}$ observation belongs.

The function signatures for the embedded Laplace approximation with a Bernoulli
likelihood are
```
real laplace_marginal_bernoulli_logit_lpmf(array[] int y |
        array[] int y_index, vector theta0,
        function K_function, (...));

real laplace_marginal_tol_bernoulli_logit_lpmf(array[] int y |
        array[] int y_index, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);

vector laplace_latent_bernoulli_logit_rng(array[] int y,
        array[] int y_index, vector theta0,
        function K_function, (...));

vector laplace_latent_tol_bernoulli_logit_rng(array[] int y,
        array[] int y_index, vector theta0,
        function K_function, (...),
        real tol, int max_num_steps, int hessian_block_size,
        int solver, int max_steps_linesearch);
```

<!-- ## Draw approximate samples for out-of-sample latent variables. -->

<!-- In many applications, it is of interest to draw latent variables for -->
<!-- in-sample and out-of-sample predictions. We respectively denote these latent -->
<!-- variables $\theta$ and $\theta^*$. In a latent Gaussian model, -->
<!-- $(\theta, \theta^*)$ jointly follow a prior multivariate normal distribution: -->
<!-- $$ -->
<!-- \theta, \theta^* \sim \text{MultiNormal}(0, {\bf K}(\phi)), -->
<!-- $$ -->
<!-- where $\bf K$ designates the joint covariance matrix over $\theta, \theta^*$. -->

<!-- We can break $\bf K$ into three components, -->
<!-- $$ -->
<!-- {\bf K} = \begin{bmatrix} -->
<!--   K & \\ -->
<!--   K^* & K^{**} -->
<!-- \end{bmatrix}, -->
<!-- $$ -->
<!-- where $K$ is the prior covariance matrix for $\theta$, $K^{**}$ the prior -->
<!-- covariance matrix for $\theta^*$, and $K^*$ the covariance matrix between -->
<!-- $\theta$ and $\theta^*$. -->

<!-- Stan supports the case where $\theta$ is associated with an in-sample -->
<!-- covariate $X$ and $\theta^*$ with an out-of-sample covariate $X^*$. -->
<!-- Furthermore, the covariance function is written in such a way that -->
<!-- $$ -->
<!-- K = f(..., X, X), \ \ K^{**} = f(..., X^*, X^*), \ \ K^* = f(..., X, X^*), -->
<!-- $$ -->
<!-- as is typically the case in Gaussian process models. -->

<!-- The -->
<!-- function `laplace_latent_rng` produces samples from the Laplace approximation -->
<!-- and admits nearly the same arguments as `laplace_marginal`. A key difference -->
<!-- is that -->
<!-- ``` -->
<!-- vector laplace_latent_rng(function ll_function, tuple (...), vector theta_0, -->
<!--                           function K_function, tuple (...)); -->
<!-- ``` -->

Review comment: add here the restrictions for the likelihood