# Anycost GANs for Interactive Image Synthesis and Editing

### [video](https://youtu.be/_yEziPl9AkM) | [paper](https://arxiv.org/abs/2103.03243) | [website](https://hanlab.mit.edu/projects/anycost-gan/) | [Colab](https://colab.research.google.com/github/mit-han-lab/anycost-gan/blob/master/notebooks/intro_colab.ipynb)

**Anycost GANs for Interactive Image Synthesis and Editing**

[Ji Lin](http://linji.me/), [Richard Zhang](https://richzhang.github.io/), Frieder Ganz, [Song Han](https://songhan.mit.edu/), [Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/)

In CVPR 2021

**Anycost GAN (flexible)** generates consistent outputs under various fine-grained computation budgets.

**Anycost GAN (uniform)** supports 4 resolutions and 4 channel ratios.

We can use the anycost generator for **interactive image editing**. A full generator takes **~3s** to render an image, which is too slow for editing, while the anycost generator provides a visually similar preview at **5x faster speed**. After making the adjustments, we hit the "Finalize" button to render the high-quality, edited output. Check [here](https://youtu.be/_yEziPl9AkM?t=90) for the full demo.

<a href="https://youtu.be/_yEziPl9AkM?t=90"><img src='assets/figures/demo.gif' width=600></a>

## Method

Anycost generators can be run at *diverse computation costs* by using different *channel* and *resolution* configurations. Sub-generators achieve high output consistency compared to the full generator, providing a fast preview.

With (1) sampling-based multi-resolution training, (2) adaptive-channel training, and (3) a generator-conditioned discriminator, we achieve high image quality and consistency across different resolutions and channel configurations.
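
For intuition only, here is a tiny, hypothetical sketch of what sampling a sub-generator configuration per training iteration could look like. It reuses the channel/resolution controls documented in the Usage section below; the resolution and ratio grids are illustrative, and this is not the actual training code.

```python
import random

from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

# Illustrative grids; the exact values used for training are assumptions here.
RESOLUTIONS = [128, 256, 512, 1024]
CHANNEL_RATIOS = [0.25, 0.5, 0.75, 1.0]

def sample_sub_generator(g):
    """Randomly configure g as one of its sub-generators for this training step."""
    set_uniform_channel_ratio(g, random.choice(CHANNEL_RATIOS))  # adaptive-channel training
    g.target_res = random.choice(RESOLUTIONS)                    # multi-resolution training
    # ... run the G/D update with this configuration, then restore the full generator:
    reset_generator(g)
```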

## Usage

### Getting Started

- Clone this repo:

```bash
git clone https://github.com/mit-han-lab/anycost-gan.git
cd anycost-gan
```

- Install PyTorch 1.7 and other dependencies.

We recommend setting up the environment using Anaconda: `conda env create -f environment.yml`

### Introduction Notebook

We provide a Jupyter notebook example showing how to use the anycost generator for image synthesis at diverse costs: `notebooks/intro.ipynb`.

We also provide a Colab version of the notebook: [intro_colab.ipynb](https://colab.research.google.com/github/mit-han-lab/anycost-gan/blob/master/notebooks/intro_colab.ipynb). Be sure to select GPU as the accelerator in the runtime options.

### Interactive Demo

We provide an interactive demo showing how we can use anycost GAN to enable interactive image editing. To run the demo:

```bash
python demo.py
```

You can find a video recording of the demo [here](https://youtu.be/_yEziPl9AkM?t=90).

### Using Pre-trained Models

To get the pre-trained generator, encoder, and editing directions, run:

```python
import model

pretrained_type = 'generator'  # choose from ['generator', 'encoder', 'boundary']
config_name = 'anycost-ffhq-config-f'  # replace the config name for other models
model.get_pretrained(pretrained_type, config=config_name)
```

We also provide the face attribute classifier (which is shared across the different generators) for computing the editing directions. You can get it by running:

```python
model.get_pretrained('attribute-predictor')
```

The attribute classifier takes in face images in FFHQ format.
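
Once loaded, applying an editing direction is just a shift in latent space (in the spirit of InterFaceGAN). The snippet below is only a hedged sketch: the attribute key, the data structure returned for `'boundary'`, and the latent shape are all assumptions.

```python
import torch
import model

# Hedged sketch of latent-space editing; names and shapes are assumptions.
boundaries = model.get_pretrained('boundary', config='anycost-ffhq-config-f')

w = torch.randn(1, 18, 512)        # placeholder W+ latent; in practice use the encoder's output
direction = boundaries['smiling']  # hypothetical attribute key
w_edit = w + 2.0 * direction       # the scalar controls the edit strength

# Feed w_edit to the generator as in the synthesis snippet below (out, _ = g(...)).
```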

After loading the anycost generator, we can run it at diverse costs. For example:

```python
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

g = model.get_pretrained('generator', config='anycost-ffhq-config-f')  # anycost uniform
set_uniform_channel_ratio(g, 0.5)  # set the channel ratio
g.target_res = 512  # set the resolution
out, _ = g(...)  # generate image
reset_generator(g)  # restore the full generator
```

For detailed usage and the *flexible-channel* anycost generator, please refer to `notebooks/intro.ipynb`.
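
The interactive-editing workflow from the demo follows the same pattern: render cheap previews while the sliders are being dragged, then re-render once at full cost when the user hits "Finalize". Below is a minimal sketch using only the calls above (the generator's argument list is elided, as in the example); it is not the demo's actual implementation.

```python
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

def render_preview(g, *gen_args, **gen_kwargs):
    """Fast, visually similar preview using a sub-generator."""
    set_uniform_channel_ratio(g, 0.5)
    g.target_res = 512
    out, _ = g(*gen_args, **gen_kwargs)
    reset_generator(g)  # always restore the full generator afterwards
    return out

def render_final(g, *gen_args, **gen_kwargs):
    """Full-cost render for the 'Finalize' button."""
    out, _ = g(*gen_args, **gen_kwargs)
    return out
```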

### Model Zoo

Currently, we provide the following pre-trained generators, encoders, and editing directions. We will add more in the future.

For anycost generators, by default, we refer to the uniform setting.

| config name | generator | encoder | edit direction |
| ------------------------------ | ------------------ | ------------------ | ------------------ |
| anycost-ffhq-config-f | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| anycost-ffhq-config-f-flexible | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| anycost-car-config-f | :heavy_check_mark: | | |
| stylegan2-ffhq-config-f | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

`stylegan2-ffhq-config-f` refers to the official StyleGAN2 generator converted from the [repo](https://github.com/NVlabs/stylegan2).

### Datasets

We prepare the [FFHQ](https://github.com/NVlabs/ffhq-dataset), [CelebA-HQ](https://github.com/switchablenorms/CelebAMask-HQ), and [LSUN Car](https://github.com/fyu/lsun) datasets as plain directories of images, so that they can be easily used with `ImageFolder` from `torchvision`. The dataset layout looks like:

```
├── PATH_TO_DATASET
│   ├── images
│   │   ├── 00000.png
│   │   ├── 00001.png
│   │   ├── ...
```
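
With this layout, the images can be loaded with standard `torchvision` utilities, for example (the path, resolution, and loader settings below are placeholders):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# 'PATH_TO_DATASET' is a placeholder; point it at the directory shown above.
transform = transforms.Compose([
    transforms.Resize(1024),
    transforms.CenterCrop(1024),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # scale images to [-1, 1]
])
dataset = datasets.ImageFolder('PATH_TO_DATASET', transform=transform)
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=8)
```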

Due to copyright issues, you need to download the datasets from the official sites and process them accordingly.

### Evaluation

We provide the code to evaluate some of the metrics presented in the paper. Some of the code is written with [`horovod`](https://github.com/horovod/horovod) to support distributed evaluation and reduce the cost of inter-GPU communication, which greatly improves the speed. Check its website for proper installation.

#### Fréchet Inception Distance (FID)

Before evaluating the FIDs, you need to compute the Inception features of the real images using scripts like:

```bash
python tools/calc_inception.py \
    --resolution 1024 --batch_size 64 -j 16 --n_sample 50000 \
    --save_name assets/inceptions/inception_ffhq_res1024_50k.pkl \
    PATH_TO_FFHQ
```

or you can download the pre-computed Inception statistics from [here](https://www.dropbox.com/sh/bc8a7ewlvcxa2cf/AAD8NFzDWKmBDpbLef-gGhRZa?dl=0) and put them under `assets/inceptions`.

Then, you can evaluate the FIDs by running:

```bash
horovodrun -np N_GPU \
    python metrics/fid.py \
    --config anycost-ffhq-config-f \
    --batch_size 16 --n_sample 50000 \
    --inception assets/inceptions/inception_ffhq_res1024_50k.pkl
    # --channel_ratio 0.5 --target_res 512  # optionally use a smaller resolution/channel ratio
```
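
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. Given the feature means and covariances, it can be computed as below; this mirrors the standard formula rather than the exact code in `metrics/fid.py`.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_real, cov_real, mu_fake, cov_fake):
    """FID between N(mu_real, cov_real) and N(mu_fake, cov_fake)."""
    diff = mu_real - mu_fake
    # Matrix square root of the covariance product; drop tiny imaginary
    # components introduced by numerical error.
    covmean, _ = linalg.sqrtm(cov_real @ cov_fake, disp=False)
    covmean = np.real(covmean)
    return float(diff @ diff + np.trace(cov_real + cov_fake - 2.0 * covmean))
```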

#### Perceptual Path Length (PPL)

Similarly, evaluate the PPL with:

```bash
horovodrun -np N_GPU \
    python metrics/ppl.py \
    --config anycost-ffhq-config-f
```
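
As a reminder of what PPL measures: it is the expected perceptual (LPIPS) distance between images generated from two nearby points on a latent interpolation path, scaled by the squared step size. A rough sketch follows; the generator call convention is an assumption (the example above elides it as `g(...)`), and `lpips` is a separate pip package, not part of this repo.

```python
import torch
import lpips  # pip install lpips

percept = lpips.LPIPS(net='vgg')  # perceptual distance network
EPS = 1e-4

def ppl_sample(g, z1, z2, t):
    """One PPL sample: perturb the interpolation point by EPS and compare outputs."""
    za = torch.lerp(z1, z2, t)
    zb = torch.lerp(z1, z2, t + EPS)
    img_a, _ = g([za])  # assumed call convention; adapt to the actual generator API
    img_b, _ = g([zb])
    return percept(img_a, img_b) / (EPS ** 2)
```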

#### Attribute Consistency

Evaluate the attribute consistency by running:

```bash
horovodrun -np N_GPU \
    python metrics/attribute_consistency.py \
    --config anycost-ffhq-config-f \
    --channel_ratio 0.5 --target_res 512  # config for the sub-generator; required
```

#### Encoder Evaluation

To evaluate the performance of the encoder, run:

```bash
python metrics/eval_encoder.py \
    --config anycost-ffhq-config-f \
    --data_path PATH_TO_CELEBA_HQ
```

### Training

The training code will be updated shortly.

## Citation

If you use this code for your research, please cite our paper.

```
@inproceedings{lin2021anycost,
  author    = {Lin, Ji and Zhang, Richard and Ganz, Frieder and Han, Song and Zhu, Jun-Yan},
  title     = {Anycost GANs for Interactive Image Synthesis and Editing},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021},
}
```

## Acknowledgement

We thank Taesung Park, Zhixin Shu, Muyang Li, and Han Cai for the helpful discussion. Part of the work is supported by NSF CAREER Award #1943349, Adobe, Naver Corporation, and the MIT-IBM Watson AI Lab.

The codebase is built upon a PyTorch implementation of StyleGAN2: [rosinality/stylegan2-pytorch](https://github.com/rosinality/stylegan2-pytorch). For editing direction extraction, we refer to [InterFaceGAN](https://github.com/genforce/interfacegan).