# BigGAN

> [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://openreview.net/forum?id=B1xsqj09Fm)

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143154280-4cb22e16-92c8-4b34-9e2c-6357ed0bdac8.png"/>
</div>

## Introduction

`BigGAN` and `BigGAN-Deep` are class-conditional generative models that produce high-resolution, high-quality images by scaling up the batch size and the number of model parameters.

We have finished training `BigGAN` on `CIFAR10` (32x32) and are still aligning training performance on `ImageNet1k` (128x128). Some sampled results are shown below for reference.

<div align="center">
  <b> Results from our BigGAN trained on CIFAR10</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/126476913-3ce8e2c8-f189-4caa-90ed-b44e279cb669.png" width="800"/>
</div>

<div align="center">
  <b> Results from our BigGAN trained on ImageNet</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/127615534-6278ce1b-5cff-4189-83c6-9ecc8de08dfc.png" width="800"/>
</div>

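Evaluation results of our trained BigGAN models are listed below. The number in parentheses is the training iteration at which each metric was measured.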

|                       Models                       |  Dataset   |    FID (Iter)     |      IS (Iter)      |                       Config                        |                       Download                        |
| :------------------------------------------------: | :--------: | :---------------: | :-----------------: | :-------------------------------------------------: | :---------------------------------------------------: |
|                    BigGAN 32x32                    |  CIFAR10   |   9.78(390000)    |    8.70(390000)     | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_cifar10_32x32_b25x2_500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32x32_b25x2_500k_20210728_110906-08b61a44.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_cifar10_32_b25x2_500k_20210706_171051.log.json) |
|              BigGAN 128x128 Best FID               | ImageNet1k | **8.69**(1232000) |   101.15(1232000)   | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_fid_iter_1232000_20211111_122548-5315b13d.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) |
|               BigGAN 128x128 Best IS               | ImageNet1k |  13.51(1328000)   | **129.07**(1328000) | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/biggan/biggan_ajbrock-sn_imagenet1k_128x128_b32x8_1500k.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_best_is_iter_1328000_20211111_122911-28c688bc.pth)\|[log](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_b32x8_1500k_20211111_122548-5315b13d.log.json) |

Note: `BigGAN-Deep` trained on `ImageNet1k` will come later.

## Converted weights

Since we have not finished training all of our models, we provide several pre-trained weights converted from [BigGAN-PyTorch](https://github.com/ajbrock/BigGAN-PyTorch) and [pytorch-pretrained-BigGAN](https://github.com/huggingface/pytorch-pretrained-BigGAN). All converted weights have been evaluated.

Evaluation results and download links are provided below.

|       Models        |  Dataset   |   FID   |   IS    |                     Config                     |                     Download                     |                     Original Download link                      |
| :-----------------: | :--------: | :-----: | :-----: | :--------------------------------------------: | :----------------------------------------------: | :-------------------------------------------------------------: |
|   BigGAN 128x128    | ImageNet1k | 10.1414 | 96.728  | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan_128x128_cvt_BigGAN-PyTorch_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan_imagenet1k_128x128_cvt_BigGAN-PyTorch_rgb_20210730_125223-3e353fef.pth) | [link](https://drive.google.com/open?id=1nAle7FCVFZdix2--ks0r5JBkFnKw8ctW) |
| BigGAN-Deep 128x128 | ImageNet1k | 5.9471  | 107.161 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_128x128_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_128x128_cvt_hugging-face_rgb_20210728_111659-099e96f9.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-128-pytorch_model.bin) |
| BigGAN-Deep 256x256 | ImageNet1k | 11.3151 | 135.107 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_256x256_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_256x256_cvt_hugging-face_rgb_20210728_111735-28651569.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-256-pytorch_model.bin) |
| BigGAN-Deep 512x512 | ImageNet1k | 16.8728 | 124.368 | [config](https://github.com/open-mmlab/mmgeneration/blob/master/configs/_base_/models/biggan/biggan-deep_512x512_cvt_hugging-face_rgb.py) | [model](https://download.openmmlab.com/mmgen/biggan/biggan-deep_imagenet1k_512x512_cvt_hugging-face_rgb_20210728_112346-a42585f2.pth) | [link](https://s3.amazonaws.com/models.huggingface.co/biggan/biggan-deep-512-pytorch_model.bin) |

Sampling results are shown below.

<div align="center">
  <b> Results from our BigGAN-Deep with pre-trained weights on ImageNet 128x128 with truncation factor 0.4</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/126481730-8da7180b-7b1b-42f0-9bec-78d879b6265b.png" width="800"/>
</div>

<div align="center">
  <b> Results from our BigGAN-Deep with pre-trained weights on ImageNet 256x256 with truncation factor 0.4</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/126486040-64effa29-959e-4e43-bcae-15925a2e0599.png" width="800"/>
</div>

<div align="center">
  <b> Results from our BigGAN-Deep with pre-trained weights on ImageNet 512x512 with truncation factor 0.4</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/126487428-50101454-59cb-469d-a1f1-36ffb6291582.png" width="800"/>
</div>

Sampling with the truncation trick, as used for the figures above, can be performed with the command below.

```bash
python demo/conditional_demo.py CONFIG_PATH CKPT_PATH --sample-cfg truncation=0.4 # set the truncation value as desired
```
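For intuition, the truncation trick described in the paper draws the generator's input noise from a standard normal distribution and resamples any component whose magnitude exceeds a threshold. The sketch below illustrates that idea with the Python standard library only. It is not the code path used by `demo/conditional_demo.py`; the function name `truncated_noise` and its parameters are hypothetical, and the way MMGeneration maps the `truncation` value onto the noise may differ.

```python
import random


def truncated_noise(dim, truncation, seed=None):
    """Illustrative sketch (hypothetical helper, not an MMGeneration API).

    Draws `dim` values from N(0, 1) and rejects any value whose magnitude
    exceeds `truncation`, drawing again until enough values are accepted.
    """
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    rng = random.Random(seed)
    z = []
    while len(z) < dim:
        value = rng.gauss(0.0, 1.0)
        # Keep only components inside [-truncation, truncation].
        if abs(value) <= truncation:
            z.append(value)
    return z


# Example: a 128-dimensional noise vector with threshold 0.4.
z = truncated_noise(128, 0.4, seed=0)
print(len(z), max(abs(v) for v in z) <= 0.4)
```

A smaller threshold keeps the noise closer to the mode of the distribution, which tends to trade sample variety for fidelity, as noted in the abstract.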

For the converted weights, we provide the following model configs under `configs/_base_/models/biggan`:

```bash
# biggan_128x128_cvt_BigGAN-PyTorch_rgb.py
# biggan-deep_128x128_cvt_hugging-face_rgb.py
# biggan-deep_256x256_cvt_hugging-face_rgb.py
# biggan-deep_512x512_cvt_hugging-face_rgb.py
```

## Interpolation

To perform image interpolation on BigGAN (or other conditional models), run:

```bash
python apps/conditional_interpolate.py CONFIG_PATH  CKPT_PATH  --samples-path SAMPLES_PATH
```

<div align="center">
  <b> Image interpolation results of our BigGAN-Deep</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/126580403-2baa987b-ff55-4fb5-a53a-b08e8a6a72a2.png" width="800"/>
</div>

To perform image interpolation on BigGAN with fixed noise, run:

```bash
python apps/conditional_interpolate.py CONFIG_PATH  CKPT_PATH  --samples-path SAMPLES_PATH --fix-z
```

<div align="center">
  <b> Image interpolation results of our BigGAN-Deep with fixed noise</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/128123804-6df1dfca-1057-4b96-8428-787a86f81ef1.png" width="800"/>
</div>

To perform image interpolation on BigGAN with a fixed label, run:

```bash
python apps/conditional_interpolate.py CONFIG_PATH  CKPT_PATH  --samples-path SAMPLES_PATH --fix-y
```

<div align="center">
  <b> Image interpolation results of our BigGAN-Deep with fixed label</b>
  <br/>
  <img src="https://user-images.githubusercontent.com/22982797/128124596-421396f1-3f23-4098-b629-b00d29d710a9.png" width="800"/>
</div>
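
The three modes above differ only in which generator input varies along the interpolation path. The following is a minimal sketch of that idea, assuming linear interpolation and assuming labels are represented as vectors (for example one-hot vectors or embeddings). The names `lerp` and `interpolation_path` are hypothetical and are not taken from `apps/conditional_interpolate.py`, whose actual interpolation scheme may differ.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(z0, z1, y0, y1, steps, fix_z=False, fix_y=False):
    """Illustrative sketch (hypothetical helper, not an MMGeneration API).

    Returns `steps` pairs of (noise, label vector) between two endpoints.
    With fix_z=True the noise stays at z0 and only the label changes;
    with fix_y=True the label stays at y0 and only the noise changes.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        z = list(z0) if fix_z else lerp(z0, z1, t)
        y = list(y0) if fix_y else lerp(y0, y1, t)
        path.append((z, y))
    return path


# Example: three steps between two 2-D noise vectors and two one-hot labels.
for z, y in interpolation_path([0.0, 0.0], [1.0, 2.0], [1.0, 0.0], [0.0, 1.0], 3):
    print(z, y)
```

Each pair on the path would then be fed to the generator to produce one frame of the interpolation.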

## Citation

```latex
@inproceedings{
    brock2018large,
    title={Large Scale {GAN} Training for High Fidelity Natural Image Synthesis},
    author={Andrew Brock and Jeff Donahue and Karen Simonyan},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=B1xsqj09Fm},
}
```