first complete draft. need to work on evaluation section.
akashgit committed Apr 5, 2020
1 parent fad7be1 commit 18351cd
Showing 8 changed files with 81 additions and 29 deletions.
7 changes: 7 additions & 0 deletions ICLR_slides.code-workspace
@@ -0,0 +1,7 @@
{
  "folders": [
    {
      "path": "."
    }
  ]
}
Binary file added celeba.png
Binary file added cifar10.png
103 changes: 74 additions & 29 deletions iclr2020.md
@@ -5,67 +5,112 @@ marp: true
# **G**enerative **Ra**tio **M**atching (GRAM)

### Goal

A *stable* learning algorithm for *implicit* deep generative models with *high*-dimensional data
- MMD networks are stable but perform poorly on high-dimensional data
- Adversarial generative methods (GANs, MMD-GANs, etc.) can scale up to high-dimensional (image) data but are not stable in general

### Key ideas

1. Learn a low-dimensional subspace projection in which the density ratio between the data and the generator distributions is close to the density ratio in the original space
2. Train the generator via the MMD loss in this projected space

---

## Learning a Low-Dimensional Subspace Projection $f_\theta(x)$

We'd like to learn a parameterized transformation $f_\theta(x)$ by minimising the squared difference between the density ratios in the original and projected spaces:

$$
\begin{aligned}
D(\theta)
&= \int q_x(x) \left( \frac{p_x(x)}{q_x(x)} - \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} \right)^2 dx \\
&= C - 2 \int p_x(x) \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} dx + \int q_x(x) \left( \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} \right)^2 dx \\
&= C - 2 \int \bar{p}(f_\theta(x)) \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} df_\theta(x) + \int \bar{q}(f_\theta(x)) \left( \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} \right)^2 df_\theta(x) \\
&= C' - \left( \int \bar{q}(f_\theta(x)) \left( \frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))} \right)^2 df_\theta(x) - 1 \right)\\
&= C' - \mathrm{PD}(\bar{q}, \bar{p}), \quad \text{with } C' = C - 1
\end{aligned}
$$
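
Here $\bar{p}$ and $\bar{q}$ denote the densities of $f_\theta(x)$ under $p_x$ and $q_x$; making that assumption explicit, the change of variables in the third line is just, for any function $g$,

$$
\int p_x(x)\, g(f_\theta(x))\, dx = \int \bar{p}(y)\, g(y)\, dy .
$$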

We can *minimise* the squared ratio difference by *maximising* the Pearson divergence (PD) in the low-dimensional space :heart:

---

## Pearson Divergence Maximisation

- We carry out a Monte Carlo approximation,
$$
\mathrm{PD}(\bar{q}, \bar{p}) \approx \frac{1}{N} \sum_{i=1}^N \left( \frac{\bar{p}(f_\theta(x^q_i))}{\bar{q}(f_\theta(x^q_i))} \right)^2 - 1
$$
where $x^q_i \sim q_x$.
- For this to work, we need an estimator of the density ratio.
<!-- - We only need density ratios $\frac{\bar{p}(f_\theta(x))}{\bar{q}(f_\theta(x))}$ for a set of samples from $q$ during MC. -->
- Use an MMD-based density ratio estimator (Sugiyama et al., 2012), which has an analytical solution under the fixed-design setup: $\hat{r}_q = \mathbf{K}^{-1}_{q,q} \mathbf{K}_{q,p}\mathbf{1}$ (see the sketch at the end of this slide).
- $\mathbf{K}_{q,q}$ and $\mathbf{K}_{q,p}$ are Gram matrices defined by $[\mathbf{K}_{q,q}]_{i,j} = k(f_\theta(x^q_i),f_\theta(x^q_j))$ and $[\mathbf{K}_{q,p}]_{i,j} = k(f_\theta(x^q_i),f_\theta(x^p_j)).$
<!-- - Train the generator via the MMD loss -->
<!-- - Shared Gram matrix between density ratio estimation and generator training
- Simultaneous training of the transform function and the generator -->
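
To make the fixed-design estimator concrete, here is a minimal NumPy sketch — an illustration, not the paper's released code — assuming a Gaussian kernel and adding a small ridge term for invertibility; `fq` and `fp` stand for the projected batches $f_\theta(x^q)$ and $f_\theta(x^p)$:

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix [k(a_i, b_j)]_{ij} for a Gaussian kernel of bandwidth sigma."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def pd_estimate(fq, fp, sigma=1.0, eps=1e-6):
    """MC estimate of PD from projected batches fq = f_theta(x^q), fp = f_theta(x^p)."""
    K_qq = gaussian_gram(fq, fq, sigma)
    K_qp = gaussian_gram(fq, fp, sigma)
    # Fixed-design analytical solution from the slide: r_q = K_qq^{-1} K_qp 1
    # (small ridge eps added for numerical stability)
    r_q = np.linalg.solve(K_qq + eps * np.eye(len(fq)), K_qp @ np.ones(len(fp)))
    return (r_q ** 2).mean() - 1.0
```

In the method itself this estimate is *maximised* with respect to $\theta$, so an autodiff framework would stand in for NumPy; the snippet only illustrates the algebra.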

---

## Density ratio estimation via (infinite) moment matching

**Maximum mean discrepancy**

$$
\textrm{MMD}_{\mathcal{F}}(p,q) = \sup_{f\in\mathcal{F}} \left(\mathbb{E}_p \lbrack f(x) \rbrack - \mathbb{E}_q \lbrack f(x) \rbrack \right)
$$

Gretton et al. (2012) show that it is sufficient to choose $\mathcal{F}$ to be a unit ball in a reproducing kernel Hilbert space $\mathcal{R}$ with a characteristic kernel $k$.

<!-- $$
\hat{\textmd{MMD}}^2_\mathcal{R}(p,q) =
\frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N k(x_i,x_{i'})
- \frac{2}{NM}\sum_{i=1}^N\sum_{j=1}^M k(x_i, y_j)
+ \frac{1}{M^2}\sum_{j=1}^M\sum_{j'=1}^M k(y_j,y_{j'})
$$ -->
- Using this definition of MMD, the density ratio estimator $r(x)$ can be derived as the solution to
$$
\min_{r\in\mathcal{R}} \bigg \Vert \int k(x, \cdot)\, p(x)\, dx - \int k(x, \cdot)\, r(x)\, q(x)\, dx \bigg \Vert_{\mathcal{R}}^2.
$$
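
To see why this has the closed form quoted on the previous slide (a sketch, with the $N/M$ batch-size factor absorbed): plugging in the empirical mean embeddings and restricting $r$ to its values $\mathbf{r}$ at the $q$-samples makes the objective quadratic in $\mathbf{r}$,

$$
\hat{\mathbf{r}}_q
= \arg\min_{\mathbf{r}} \left( \mathbf{r}^\top \mathbf{K}_{q,q}\, \mathbf{r} - 2\, \mathbf{r}^\top \mathbf{K}_{q,p} \mathbf{1} \right)
= \mathbf{K}_{q,q}^{-1} \mathbf{K}_{q,p} \mathbf{1}.
$$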

---

## Generator Training

- The generator $G_\gamma$ is trained by minimising the empirical estimator of the MMD in the projected space,

$$
\begin{aligned}
\min_\gamma \Bigg[&\frac{1}{N^2}\sum_{i=1}^N\sum_{i'=1}^N k(f_\theta(x_i),f_\theta(x_{i'}))
- \frac{2}{NM}\sum_{i=1}^N\sum_{j=1}^M k(f_\theta(x_i), f_\theta(G_\gamma(z_j)))\\
&\quad + \frac{1}{M^2}\sum_{j=1}^M\sum_{j'=1}^M k(f_\theta(G_\gamma(z_j)),f_\theta(G_\gamma(z_{j'}))) \Bigg ]
\end{aligned}
$$

with respect to its parameters $\gamma$.
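
To show how the two objectives interact, here is a hypothetical PyTorch-style sketch of one alternating training step (all names are placeholders, not the authors' code); the Gram matrices computed here could be shared between the ratio estimate and the MMD loss:

```python
import torch

def gram(A, B, sigma=1.0):
    # Gaussian-kernel Gram matrix [k(a_i, b_j)]
    return torch.exp(-torch.cdist(A, B) ** 2 / (2 * sigma ** 2))

def gram_step(f_theta, G_gamma, opt_f, opt_g, x, z, sigma=1.0, eps=1e-6):
    """One alternating GRAM-style update (illustrative sketch only)."""
    # 1) Update the projection f_theta by gradient ascent on the PD estimate.
    fp, fq = f_theta(x), f_theta(G_gamma(z).detach())  # p = data, q = generator
    K_qq = gram(fq, fq, sigma)
    K_qp = gram(fq, fp, sigma)
    r_q = torch.linalg.solve(K_qq + eps * torch.eye(len(fq)),
                             K_qp @ torch.ones(len(fp)))  # K_qq^{-1} K_qp 1
    pd = (r_q ** 2).mean() - 1.0
    opt_f.zero_grad()
    (-pd).backward()   # ascent on PD = descent on -PD
    opt_f.step()

    # 2) Update the generator G_gamma by minimising MMD^2 in the projected space.
    fp, fq = f_theta(x).detach(), f_theta(G_gamma(z))
    mmd2 = (gram(fp, fp, sigma).mean()
            - 2.0 * gram(fp, fq, sigma).mean()
            + gram(fq, fq, sigma).mean())
    opt_g.zero_grad()
    mmd2.backward()
    opt_g.step()
```

The alternating form is for clarity; simultaneous updates with shared Gram matrices, as the bullet points above describe, follow the same pattern.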

---
## Evaluation

**Synthetic Dataset**
![Synthetic dataset](syn_0.png)

---
**Synthetic Dataset**

![Synthetic dataset](syn_1.png)

---

![Synthetic dataset results](syn.png)

---

**CIFAR10 and CelebA**

**Quantitative Results**
![Quantitative results on CIFAR10 and CelebA](table_image.png)

---
**Qualitative Results**: Random Samples

![CIFAR10 samples](cifar10.png)![CelebA samples](celeba.png)
Binary file added syn.png
Binary file added syn_0.png
Binary file added syn_1.png
Binary file added table_image.png
