---
marp: true
---
Akash Srivastava$^{\ast,1,2}$, Kai Xu$^{\ast,3}$, Michael U. Gutmann$^{3}$, Charles Sutton$^{3,4,5}$
Adversarial Generative Models (GANs, MMD-GANs)
- ✅ can generate high-dimensional data such as natural images.
- ❌ are very difficult to train due to the saddle-point optimization problem.
GRAM is a stable learning algorithm for implicit deep generative models that does not involve a saddle-point optimization problem and is therefore easy to train 🎉
Two steps in the training loop
- Learn a projection function ($f_\theta$) that projects the data ($p_x$) and model ($q_x$) densities onto a low-dimensional manifold while preserving the difference between this pair of densities.
  - We use the ratio ($r(x) = \frac{p_x}{q_x}$) of the two densities as the measure of this difference.
- Train the generator ($G_\gamma$) in the low-dimensional manifold using the Maximum Mean Discrepancy (MMD) criterion, which works very well on low-dimensional data (a code sketch of the full loop follows the objective below).
1️⃣ Learn the manifold projection function
2️⃣ Train the generator
$$
\begin{aligned}
\min_\gamma \Bigg[ &\frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N k(f_\theta(x_i), f_\theta(x_{i'})) - \frac{2}{NM}\sum_{i=1}^N\sum_{j=1}^M k(f_\theta(x_i), f_\theta(G_\gamma(z_j))) \\
&\quad + \frac{1}{M^2}\sum_{j=1}^M\sum_{j'=1}^M k(f_\theta(G_\gamma(z_j)), f_\theta(G_\gamma(z_{j'}))) \Bigg]
\end{aligned}
$$
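To make the two-step loop concrete, here is a minimal PyTorch sketch. It is not the authors' reference implementation: the network sizes, RBF bandwidth `sigma`, ridge term `eps`, batch size, and Gaussian toy data are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def rbf_gram(a, b, sigma=1.0):
    # Gram matrix [k(a_i, b_j)] for a Gaussian (RBF) kernel; sigma is an assumed bandwidth.
    return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

def mmd2(fx, fgz):
    # Biased sample estimate of the squared-MMD objective above.
    return (rbf_gram(fx, fx).mean()
            - 2 * rbf_gram(fx, fgz).mean()
            + rbf_gram(fgz, fgz).mean())

def pearson_divergence(fq, fp, eps=1e-3):
    # Monte Carlo PD estimate via the ratio estimator r_hat = K_qq^{-1} K_qp 1
    # (equal batch sizes assumed; eps is a small ridge for numerical stability).
    K_qq = rbf_gram(fq, fq) + eps * torch.eye(fq.shape[0])
    r_hat = torch.linalg.solve(K_qq, rbf_gram(fq, fp).sum(dim=1))
    return (r_hat ** 2).mean() - 1.0

dim_x, dim_z, dim_f, B = 2, 10, 2, 64   # illustrative sizes
G = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_x))  # generator G_gamma
f = nn.Sequential(nn.Linear(dim_x, 64), nn.ReLU(), nn.Linear(64, dim_f))  # projection f_theta
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(B, dim_x) + 2.0   # toy data stand-in for p_x
    z = torch.randn(B, dim_z)
    # Step 1: update f_theta to maximize PD, so the projection preserves
    # the difference between the data and model densities.
    loss_f = -pearson_divergence(f(G(z).detach()), f(x))
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    # Step 2: update G_gamma to minimize MMD in the projected space.
    loss_G = mmd2(f(x).detach(), f(G(z)))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```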
Monte Carlo approximation of the Pearson divergence (PD) between the projected model density $\bar{q}$ and the projected data density $\bar{p}$:
$$
\mathrm{PD}(\bar{q}, \bar{p}) \approx \frac{1}{N} \sum_{i=1}^N \left( \frac{\bar{p}(f_\theta(x_i))}{\bar{q}(f_\theta(x_i))} \right)^2 - 1
$$
where the ratio $\frac{\bar{p}}{\bar{q}}$ is evaluated with an MMD-based density ratio estimator (Sugiyama et al., 2012) under the fixed-design setup: $\hat{r}_q = \mathbf{K}_{q,q}^{-1} \mathbf{K}_{q,p}\mathbf{1}$.
- $\mathbf{K}_{q,q}$ and $\mathbf{K}_{q,p}$ are Gram matrices defined by $[\mathbf{K}_{q,q}]_{i,j} = k(f_\theta(x^q_i), f_\theta(x^q_j))$ and $[\mathbf{K}_{q,p}]_{i,j} = k(f_\theta(x^q_i), f_\theta(x^p_j))$.
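As a quick sanity check of this estimator, the NumPy snippet below (our toy example, not from the paper) estimates the ratio between two 1-D Gaussians, for which the true ratio and PD are known in closed form; the kernel bandwidth and ridge term are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
xq = rng.normal(0.0, 1.0, N)   # samples from q = N(0, 1)
xp = rng.normal(0.5, 1.0, N)   # samples from p = N(0.5, 1)

def gram(a, b, sigma=0.5):
    # [k(a_i, b_j)] for a 1-D Gaussian kernel.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Fixed-design estimator r_hat = K_qq^{-1} K_qp 1, with a small ridge.
K_qq = gram(xq, xq) + 1e-3 * np.eye(N)
r_hat = np.linalg.solve(K_qq, gram(xq, xp) @ np.ones(N))

# For these Gaussians the true ratio is p(x)/q(x) = exp(0.5 x - 0.125).
print("mean abs ratio error:", np.abs(r_hat - np.exp(0.5 * xq - 0.125)).mean())
# PD estimate; the analytic value is exp(0.25) - 1 ≈ 0.284.
print("PD estimate:", (r_hat ** 2).mean() - 1.0)
```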
| GAN | MMD-net | MMD-GAN | GRAM-net |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |

Blue: data. Orange: samples. Top: original space. Bottom: projected space.
x-axis = noise dimension and y-axis = generator layer size
Extra slides to follow...
$$ \textrm{MMD}_{\mathcal{F}}(p,q) = \sup_{f\in\mathcal{F}} \left(\mathbb{E}_p \lbrack f(x) \rbrack - \mathbb{E}_q \lbrack f(x) \rbrack \right) $$
Gretton et al. (2012) show that it is sufficient to choose $\mathcal{F}$ to be the unit ball in a reproducing kernel Hilbert space (RKHS) with kernel $k$.
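With that choice, the supremum has the standard closed form in terms of kernel evaluations (Gretton et al., 2012), which is exactly what the sample-based objective estimates:

$$ \mathrm{MMD}^2_k(p, q) = \mathbb{E}_{x, x' \sim p} \lbrack k(x, x') \rbrack - 2\, \mathbb{E}_{x \sim p,\, y \sim q} \lbrack k(x, y) \rbrack + \mathbb{E}_{y, y' \sim q} \lbrack k(y, y') \rbrack $$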
Using this definition of MMD, the density ratio estimator is obtained by minimizing the MMD between $r \cdot \bar{q}$ and $\bar{p}$ over the ratio values at the sample points.
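As a derivation sketch (our paraphrase of the fixed-design setup of Sugiyama et al. (2012), assuming equal sample sizes $N = M$): the empirical squared MMD between $r \cdot \bar{q}$ and $\bar{p}$ is a quadratic in the vector $\mathbf{r}$ of ratio values at the points $f_\theta(x^q_i)$, and setting its gradient to zero recovers the closed form quoted earlier:

$$ \min_{\mathbf{r}} \; \mathbf{r}^\top \mathbf{K}_{q,q}\, \mathbf{r} - 2\, \mathbf{r}^\top \mathbf{K}_{q,p} \mathbf{1} \quad \Longrightarrow \quad \hat{\mathbf{r}}_q = \mathbf{K}_{q,q}^{-1} \mathbf{K}_{q,p} \mathbf{1} $$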
The generator $G_\gamma$ minimizes the sample-based MMD in the projected space,
$$
\begin{aligned}
\min_\gamma \Bigg[ &\frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N k(f_\theta(x_i), f_\theta(x_{i'})) - \frac{2}{NM}\sum_{i=1}^N\sum_{j=1}^M k(f_\theta(x_i), f_\theta(G_\gamma(z_j))) \\
&\quad + \frac{1}{M^2}\sum_{j=1}^M\sum_{j'=1}^M k(f_\theta(G_\gamma(z_j)), f_\theta(G_\gamma(z_{j'}))) \Bigg]
\end{aligned}
$$
with respect to its parameters $\gamma$. Because $f_\theta$ is learned by maximizing the PD estimate rather than adversarially, this involves no saddle-point problem.