_base_ = [
'../_base_/models/improved_ddpm/ddpm_64x64.py',
'../_base_/datasets/imagenet_noaug_64.py', '../_base_/default_runtime.py'
]
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=['real_imgs', 'x_0_pred', 'x_t', 'x_t_1'],
padding=1,
interval=1000),
dict(
type='ExponentialMovingAverageHook',
module_keys=('denoising_ema', ),
interval=1,
start_iter=0,
interp_cfg=dict(momentum=0.9999),
priority='VERY_HIGH')
]
# do not evaluate during training because evaluation takes too much time.
evaluation = None
total_iters = 1500000 # 1500k
data = dict(samples_per_gpu=16) # 8x16=128
# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=False, # Note that this flag should be False.
pass_training_status=True)
inception_pkl = './work_dirs/inception_pkl/imagenet_64x64.pkl'
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
bgr2rgb=True,
inception_pkl=inception_pkl,
inception_args=dict(type='StyleGAN')))
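Since all of the configs on this page rely on `_base_` inheritance, it can help to inspect the fully merged result. A minimal sketch, assuming an mmgeneration checkout with `mmcv` installed (adjust the config path to your local tree):

```python
# Load a config and let mmcv merge the _base_ files with the overrides above.
from mmcv import Config

cfg = Config.fromfile(
    'configs/improved_ddpm/'
    'ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k.py')
print(cfg.data.samples_per_gpu)          # 16, as set above
print(cfg.total_iters)                   # 1500000
print(cfg.metrics.fid50k.inception_pkl)  # the inception_pkl path set above
```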
Collections:
- Metadata:
Architecture:
- Improved-DDPM
Name: Improved-DDPM
Paper:
- https://arxiv.org/abs/2102.09672
README: configs/improved_ddpm/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/blob/master/configs/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k.py
In Collection: Improved-DDPM
Metadata:
Training Data: CIFAR
Name: ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k
Results:
- Dataset: CIFAR
Metrics:
FID: 3.8848
Task: Denoising Diffusion Probabilistic Models
Weights: https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_cifar10_32x32_b8x16_500k_20220103_222621-2f42f476.pth
- Config: https://github.com/open-mmlab/mmgeneration/blob/master/configs/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k.py
In Collection: Improved-DDPM
Metadata:
Training Data: IMAGENET
Name: ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k
Results:
- Dataset: IMAGENET
Metrics:
FID: 13.5181
Task: Denoising Diffusion Probabilistic Models
Weights: https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_imagenet1k_64x64_b8x16_1500k_20220103_223919-b8f1a310.pth
- Config: https://github.com/open-mmlab/mmgeneration/blob/master/configs/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k.py
In Collection: Improved-DDPM
Metadata:
Training Data: IMAGENET
Name: ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k
Results:
- Dataset: IMAGENET
Metrics:
FID: 13.4094
Task: Denoising Diffusion Probabilistic Models
Weights: https://download.openmmlab.com/mmgen/improved_ddpm/ddpm_cosine_hybird_timestep-4k_drop0.3_imagenet1k_64x64_b8x16_1500k_20220103_224427-7bb55975.pth
# LSGAN
> [Least Squares Generative Adversarial Networks](https://openaccess.thecvf.com/content_iccv_2017/html/Mao_Least_Squares_Generative_ICCV_2017_paper.html)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ² divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143052264-afd97b91-5fd1-4134-ad4d-529e364fdcc8.JPG"/>
</div>
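The key change the abstract describes is swapping the sigmoid cross-entropy objective for a least-squares one. A minimal PyTorch sketch of those losses (illustrative only, using the common 0/1 target coding; not MMGeneration's `GANLoss` implementation):

```python
import torch
import torch.nn.functional as F

def lsgan_d_loss(real_logits, fake_logits):
    # Discriminator: pull real outputs toward 1 and fake outputs toward 0.
    return 0.5 * (F.mse_loss(real_logits, torch.ones_like(real_logits)) +
                  F.mse_loss(fake_logits, torch.zeros_like(fake_logits)))

def lsgan_g_loss(fake_logits):
    # Generator: pull the discriminator's outputs on fakes toward 1.
    return F.mse_loss(fake_logits, torch.ones_like(fake_logits))
```

In the configs below, this objective is selected simply via `gan_loss=dict(type='GANLoss', gan_type='lsgan')`.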
## Results and models
<div align="center">
<b> LSGAN 64x64, CelebA-Cropped</b>
<br/>
<img src="https://user-images.githubusercontent.com/22982797/116498716-f4e74200-a8dc-11eb-9c28-5549d96e20a6.png" width="800"/>
</div>
| Models | Dataset | SWD | MS-SSIM | FID | Config | Download |
| :-----------: | :------------: | :-----------------------------: | :-----: | :-----: | :-------------------------------------------------------------: | :---------------------------------------------------------------: |
| LSGAN 64x64 | CelebA-Cropped | 6.16, 6.83, 37.64/16.87 | 0.3216 | 11.9258 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210429_144001-92ca1d0d.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210422_131925.log.json) |
| LSGAN 64x64 | LSUN-Bedroom | 5.66, 9.0, 18.6/11.09 | 0.0671 | 30.7390 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210429_144602-ec4ec6bb.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210423_005020.log.json) |
| LSGAN 128x128 | CelebA-Cropped | 21.66, 9.83, 16.06, 70.76/29.58 | 0.3691 | 38.3752 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210429_144229-01ba67dc.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210423_132126.log.json) |
| LSGAN 128x128 | LSUN-Bedroom | 19.52, 9.99, 7.48, 14.3/12.82 | 0.0612 | 51.5500 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m.py) | [model](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_155605-cf78c0a8.pth)\| [log](https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_142302.log.json) |
## Citation
```latex
@inproceedings{mao2017least,
title={Least squares generative adversarial networks},
author={Mao, Xudong and Li, Qing and Xie, Haoran and Lau, Raymond YK and Wang, Zhen and Paul Smolley, Stephen},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={2794--2802},
year={2017},
url={https://openaccess.thecvf.com/content_iccv_2017/html/Mao_Least_Squares_Generative_ICCV_2017_paper.html},
}
```
_base_ = [
'../_base_/models/dcgan/dcgan_64x64.py',
'../_base_/datasets/unconditional_imgs_64x64.py',
'../_base_/default_runtime.py'
]
model = dict(gan_loss=dict(type='GANLoss', gan_type='lsgan'))
# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
samples_per_gpu=128,
train=dict(imgs_root='./data/celeba-cropped/cropped_images_aligned_png/'))
optimizer = dict(
generator=dict(type='Adam', lr=0.001, betas=(0.5, 0.99)),
discriminator=dict(type='Adam', lr=0.001, betas=(0.5, 0.99)))
# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=10000)
]
evaluation = dict(
type='GenerativeEvalHook',
interval=10000,
metrics=dict(
type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
sample_kwargs=dict(sample_model='orig'))
total_iters = 100000
# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=False, # Note that this flag should be False.
pass_training_status=True)
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 64, 64)),
fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
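As the comment in the config above notes, `samples_per_gpu` and `imgs_root` must match your setup. A hedged sketch of overriding them programmatically instead of editing the file (the config path is the CelebA config linked in the table above; the data directory is a placeholder):

```python
from mmcv import Config

cfg = Config.fromfile(
    'configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py')
# Point the training set at your own image folder and shrink the batch.
cfg.data.train.imgs_root = '/path/to/your/images'
cfg.data.samples_per_gpu = 64
```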
_base_ = [
'../_base_/models/dcgan/dcgan_128x128.py',
'../_base_/datasets/unconditional_imgs_128x128.py',
'../_base_/default_runtime.py'
]
model = dict(
discriminator=dict(output_scale=4, out_channels=1),
gan_loss=dict(type='GANLoss', gan_type='lsgan'))
# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
samples_per_gpu=64,
train=dict(imgs_root='./data/celeba-cropped/cropped_images_aligned_png/'))
optimizer = dict(
generator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)),
discriminator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)))
# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=10000)
]
evaluation = dict(
type='GenerativeEvalHook',
interval=10000,
metrics=dict(
type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
sample_kwargs=dict(sample_model='orig'))
total_iters = 160000
# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=False, # Note that this flag should be False.
pass_training_status=True)
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)),
fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
_base_ = [
'../_base_/models/dcgan/dcgan_64x64.py',
'../_base_/datasets/unconditional_imgs_64x64.py',
'../_base_/default_runtime.py'
]
model = dict(
discriminator=dict(output_scale=4, out_channels=1),
gan_loss=dict(type='GANLoss', gan_type='lsgan'))
# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
samples_per_gpu=128, train=dict(imgs_root='./data/lsun/bedroom_train'))
optimizer = dict(
generator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)),
discriminator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)))
# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=10000)
]
evaluation = dict(
type='GenerativeEvalHook',
interval=10000,
metrics=dict(
type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
sample_kwargs=dict(sample_model='orig'))
total_iters = 100000
# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=False, # Note that this flag should be False.
pass_training_status=True)
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 64, 64)),
fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
_base_ = [
'../_base_/models/lsgan/lsgan_128x128.py',
'../_base_/datasets/unconditional_imgs_128x128.py',
'../_base_/default_runtime.py'
]
# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
samples_per_gpu=64, train=dict(imgs_root='./data/lsun/bedroom_train'))
optimizer = dict(
generator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)),
discriminator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)))
# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=10000)
]
evaluation = dict(
type='GenerativeEvalHook',
interval=10000,
metrics=dict(
type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
sample_kwargs=dict(sample_model='orig'))
total_iters = 160000
# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=False, # Note that this flag should be False.
pass_training_status=True)
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)),
fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
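Every config above reports `FID`, so a short reminder of what the metric computes may help: the Fréchet distance between Gaussians fitted to Inception features of real and generated images. A minimal NumPy/SciPy sketch of the closed form (not the `FID` metric class used in these configs):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```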
Collections:
- Metadata:
Architecture:
- LSGAN
Name: LSGAN
Paper:
- https://openaccess.thecvf.com/content_iccv_2017/html/Mao_Least_Squares_Generative_ICCV_2017_paper.html
README: configs/lsgan/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py
In Collection: LSGAN
Metadata:
Training Data: CELEBA
Name: lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m
Results:
- Dataset: CELEBA
Metrics:
FID: 11.9258
MS-SSIM: 0.3216
SWD: 6.16, 6.83, 37.64/16.87
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210429_144001-92ca1d0d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m.py
In Collection: LSGAN
Metadata:
Training Data: LSUN
Name: lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m
Results:
- Dataset: LSUN
Metrics:
FID: 30.739
MS-SSIM: 0.0671
SWD: 5.66, 9.0, 18.6/11.09
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210429_144602-ec4ec6bb.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m.py
In Collection: LSGAN
Metadata:
Training Data: CELEBA
Name: lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m
Results:
- Dataset: CELEBA
Metrics:
FID: 38.3752
MS-SSIM: 0.3691
SWD: 21.66, 9.83, 16.06, 70.76/29.58
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210429_144229-01ba67dc.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m.py
In Collection: LSGAN
Metadata:
Training Data: LSUN
Name: lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m
Results:
- Dataset: LSUN
Metrics:
FID: 51.55
MS-SSIM: 0.0612
SWD: 19.52, 9.99, 7.48, 14.3/12.82
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_155605-cf78c0a8.pth
# PGGAN
> [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://arxiv.org/abs/1710.10196)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053374-c03894c3-6def-49c2-94ed-80c4accee726.JPG" />
</div>
## Results and models
<div align="center">
<b> Results (compressed) from our PGGAN trained on CelebA-HQ@1024</b>
<br/>
<img src="https://user-images.githubusercontent.com/12726765/114009864-1df45400-9896-11eb-9d25-da9eabfe02ce.png" width="800"/>
</div>
| Models | Dataset | MS-SSIM | SWD(xx,xx,xx,xx/avg) | Config | Download |
| :-------------: | :------------: | :-----: | :--------------------------: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: |
| pggan_128x128 | celeba-cropped | 0.3023 | 3.42, 4.04, 4.78, 20.38/8.15 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth) |
| pggan_128x128 | lsun-bedroom | 0.0602 | 3.5, 2.96, 2.76, 9.65/4.72 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_lsun-bedroom_128_g8_12Mimgs.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_lsun-bedroom_128x128_g8_20210408_182033-5e59f45d.pth) |
| pggan_1024x1024 | celeba-hq | 0.3379 | 8.93, 3.98, 3.07, 2.64/4.655 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-hq_1024_g8_12Mimg.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_celeba-hq_1024_g8_20210408_181911-f1ef51c3.pth) |
## Citation
```latex
@article{karras2017progressive,
title={Progressive growing of gans for improved quality, stability, and variation},
author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
journal={arXiv preprint arXiv:1710.10196},
year={2017},
url={https://arxiv.org/abs/1710.10196},
}
```
Collections:
- Metadata:
Architecture:
- PGGAN
Name: PGGAN
Paper:
- https://arxiv.org/abs/1710.10196
README: configs/pggan/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
In Collection: PGGAN
Metadata:
Training Data: CELEBA
Name: pggan_celeba-cropped_128_g8_12Mimgs
Results:
- Dataset: CELEBA
Metrics:
Details: celeba-cropped
MS-SSIM: 0.3023
SWD(xx,xx,xx,xx/avg): 3.42, 4.04, 4.78, 20.38/8.15
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_lsun-bedroom_128_g8_12Mimgs.py
In Collection: PGGAN
Metadata:
Training Data: LSUN
Name: pggan_lsun-bedroom_128_g8_12Mimgs
Results:
- Dataset: LSUN
Metrics:
Details: lsun-bedroom
MS-SSIM: 0.0602
SWD(xx,xx,xx,xx/avg): 3.5, 2.96, 2.76, 9.65/4.72
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pggan/pggan_lsun-bedroom_128x128_g8_20210408_182033-5e59f45d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-hq_1024_g8_12Mimg.py
In Collection: PGGAN
Metadata:
Training Data: CELEBA
Name: pggan_celeba-hq_1024_g8_12Mimg
Results:
- Dataset: CELEBA
Metrics:
Details: celeba-hq
MS-SSIM: 0.3379
SWD(xx,xx,xx,xx/avg): 8.93, 3.98, 3.07, 2.64/4.655
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pggan/pggan_celeba-hq_1024_g8_20210408_181911-f1ef51c3.pth
_base_ = [
'../_base_/models/pggan/pggan_128x128.py',
'../_base_/datasets/grow_scale_imgs_128x128.py',
'../_base_/default_runtime.py'
]
optimizer = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
data = dict(
samples_per_gpu=64,
train=dict(
imgs_roots={'128': './data/celeba-cropped/cropped_images_aligned_png'},
gpu_samples_base=4,
# note that this should be adjusted according to the total number of GPUs
gpu_samples_per_scale={
'4': 64,
'8': 32,
'16': 16,
'32': 8,
'64': 4
}))
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(type='PGGANFetchDataHook', interval=1),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
priority='VERY_HIGH')
]
lr_config = None
total_iters = 280000
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
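The `gpu_samples_per_scale` mapping above halves the per-GPU batch every time the resolution doubles, keeping memory use roughly flat as the model grows. A tiny helper in the same spirit (illustrative; the real lookup lives in MMGeneration's grow-scale dataset, and treating `gpu_samples_base` as the fallback for unlisted scales is an assumption here):

```python
gpu_samples_per_scale = {'4': 64, '8': 32, '16': 16, '32': 8, '64': 4}
gpu_samples_base = 4  # assumed fallback for scales not listed, e.g. '128'

def samples_for_scale(scale: int) -> int:
    return gpu_samples_per_scale.get(str(scale), gpu_samples_base)

for scale in (4, 8, 16, 32, 64, 128):
    print(scale, samples_for_scale(scale))
```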
_base_ = [
'../_base_/models/pggan/pggan_1024.py',
'../_base_/datasets/grow_scale_imgs_celeba-hq.py',
'../_base_/default_runtime.py'
]
optimizer = None
checkpoint_config = dict(interval=5000, by_epoch=False, max_keep_ckpts=20)
data = dict(
samples_per_gpu=64,
train=dict(
gpu_samples_base=4,
# note that this should be adjusted according to the total number of GPUs
gpu_samples_per_scale={
'4': 64,
'8': 32,
'16': 16,
'32': 8,
'64': 4
},
))
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(type='PGGANFetchDataHook', interval=1),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
priority='VERY_HIGH')
]
lr_config = None
total_iters = 280000
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 1024, 1024)))
_base_ = [
'../_base_/models/pggan/pggan_128x128.py',
'../_base_/datasets/grow_scale_imgs_128x128.py',
'../_base_/default_runtime.py'
]
optimizer = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
data = dict(
samples_per_gpu=64,
train=dict(
imgs_roots={'128': './data/lsun/bedroom_train'},
gpu_samples_base=4,
# note that this should be adjusted according to the total number of GPUs
gpu_samples_per_scale={
'4': 64,
'8': 32,
'16': 16,
'32': 8,
'64': 4
},
))
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(type='PGGANFetchDataHook', interval=1),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
priority='VERY_HIGH')
]
lr_config = None
total_iters = 280000
metrics = dict(
ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
# Pix2Pix
> [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053385-1b03356d-43df-423b-88b2-7a82b73d2edd.JPG"/>
</div>
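The abstract's point that the networks "learn a loss function" boils down to pairing an adversarial term with a reconstruction term in the generator objective. A minimal PyTorch sketch (the classic pix2pix formulation with an L1 term; the weight and names here are illustrative, not taken from the configs below):

```python
import torch
import torch.nn.functional as F

def pix2pix_g_loss(disc_fake_logits, fake_img, real_img, lambda_l1=100.0):
    # Adversarial term: fool the conditional discriminator.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Reconstruction term: stay close to the paired ground truth.
    rec = F.l1_loss(fake_img, real_img)
    return adv + lambda_l1 * rec
```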
## Results and Models
<div align="center">
<b> Results from Pix2Pix trained by MMGeneration</b>
<br/>
<img src="https://user-images.githubusercontent.com/22982797/114269080-4ff0ec00-9a37-11eb-92c4-1525864e0307.PNG" width="800"/>
</div>
We use `FID` and `IS` metrics to evaluate the generation performance of pix2pix.<sup>1</sup>
| Models | Dataset | FID | IS | Config | Download |
| :----: | :---------: | :------: | :---: | :----------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: |
| Ours | facades | 124.9773 | 1.620 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth) \| [log](https://download.openmmlab.com/mmgen/pix2pix/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210317_172625.log.json)<sup>2</sup> |
| Ours | aerial2maps | 122.5856 | 3.137 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth) |
| Ours | maps2aerial | 88.4635 | 3.310 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth) |
| Ours | edges2shoes | 84.3750 | 2.815 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth) |
`FID` comparison with the official implementation:
| Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average |
| :------: | :---------: | :----------: | :---------: | :---------: | :----------: |
| official | **119.135** | 149.731 | 102.072 | **75.774** | 111.678 |
| ours | 124.9773 | **122.5856** | **88.4635** | 84.3750 | **105.1003** |
`IS` comparison with the official implementation:
| Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average |
| :------: | :-------: | :---------: | :---------: | :---------: | :--------: |
| official | **1.650** | 2.529 | **3.552** | 2.766 | 2.624 |
| ours | 1.620 | **3.137** | 3.310 | **2.815** | **2.7205** |
Note:
1. We strictly follow the [paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.pdf) setting in Section 3.3: "*At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.*" (i.e., we use `model.train()` mode; see the sketch after these notes). This may therefore lead to slightly different inference results on each run.
2. This is the training log before refactoring. Updated logs will be released soon.
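A minimal PyTorch sketch of the inference protocol in note 1 (generic code, not MMGeneration's test script; `generator` stands for any pix2pix generator module):

```python
import torch

@torch.no_grad()
def translate(generator, img):
    # Keep train-mode behaviour at test time: dropout stays active and
    # BatchNorm uses the statistics of the current (test) batch.
    generator.train()
    return generator(img)
```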
## Citation
```latex
@inproceedings{isola2017image,
title={Image-to-image translation with conditional adversarial networks},
author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1125--1134},
year={2017},
url={https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html},
}
```
Collections:
- Metadata:
Architecture:
- Pix2Pix
Name: Pix2Pix
Paper:
- https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html
README: configs/pix2pix/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py
In Collection: Pix2Pix
Metadata:
Training Data: FACADES
Name: pix2pix_vanilla_unet_bn_facades_b1x1_80k
Results:
- Dataset: FACADES
Metrics:
FID: 124.9773
IS: 1.62
Task: Image2Image Translation
Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py
In Collection: Pix2Pix
Metadata:
Training Data: MAPS
Name: pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k
Results:
- Dataset: MAPS
Metrics:
FID: 122.5856
IS: 3.137
Task: Image2Image Translation
Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py
In Collection: Pix2Pix
Metadata:
Training Data: MAPS
Name: pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k
Results:
- Dataset: MAPS
Metrics:
FID: 88.4635
IS: 3.31
Task: Image2Image Translation
Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py
In Collection: Pix2Pix
Metadata:
Training Data: EDGES2SHOES
Name: pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k
Results:
- Dataset: EDGES2SHOES
Metrics:
FID: 84.375
IS: 2.815
Task: Image2Image Translation
Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth
_base_ = [
'../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
'../_base_/datasets/paired_imgs_256x256_crop.py',
'../_base_/default_runtime.py'
]
source_domain = 'aerial'
target_domain = 'map'
# model settings
model = dict(
default_domain=target_domain,
reachable_domains=[target_domain],
related_domains=[target_domain, source_domain],
gen_auxiliary_loss=dict(
data_info=dict(
pred=f'fake_{target_domain}', target=f'real_{target_domain}')))
# dataset settings
domain_a = source_domain
domain_b = target_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='FixedCrop',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
crop_size=(256, 256)),
dict(
type='Flip',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
direction='horizontal'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/maps'
data = dict(
train=dict(dataroot=dataroot, pipeline=train_pipeline),
val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))
# optimizer
optimizer = dict(
generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))
# learning policy
lr_config = None
# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=[f'fake_{target_domain}'],
interval=5000)
]
runner = None
use_ddp_wrapper = True
# runtime settings
total_iters = 220000
workflow = [('train', 1)]
exp_name = 'pix2pix_aerial2map'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 1098
metrics = dict(
FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
IS=dict(
type='IS',
num_images=num_images,
image_shape=(3, 256, 256),
inception_args=dict(type='pytorch')))
evaluation = dict(
type='TranslationEvalHook',
target_domain=domain_b,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True),
dict(
type='IS',
num_images=num_images,
inception_args=dict(type='pytorch'))
],
best_metric=['fid', 'is'])
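Note that `source_domain`/`target_domain` in the config above are ordinary Python variables: every f-string key in the pipelines is resolved when the config file is parsed. A quick illustration of what the `Collect` keys expand to for this aerial-to-map config:

```python
source_domain, target_domain = 'aerial', 'map'
domain_a, domain_b = source_domain, target_domain
print([f'img_{domain_a}', f'img_{domain_b}'])  # ['img_aerial', 'img_map']
print(f'fake_{target_domain}')                 # 'fake_map'
```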
_base_ = [
'../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
'../_base_/datasets/paired_imgs_256x256_crop.py',
'../_base_/default_runtime.py'
]
source_domain = 'mask'
target_domain = 'photo'
# model settings
model = dict(
default_domain=target_domain,
reachable_domains=[target_domain],
related_domains=[target_domain, source_domain],
gen_auxiliary_loss=dict(
data_info=dict(
pred=f'fake_{target_domain}', target=f'real_{target_domain}')))
# dataset settings
domain_a = target_domain
domain_b = source_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='FixedCrop',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
crop_size=(256, 256)),
dict(
type='Flip',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
direction='horizontal'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/facades'
data = dict(
train=dict(dataroot=dataroot, pipeline=train_pipeline),
val=dict(dataroot=dataroot, pipeline=test_pipeline),
test=dict(dataroot=dataroot, pipeline=test_pipeline))
# optimizer
optimizer = dict(
generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))
# learning policy
lr_config = None
# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=[f'fake_{target_domain}'],
interval=5000)
]
runner = None
use_ddp_wrapper = True
# runtime settings
total_iters = 80000
workflow = [('train', 1)]
exp_name = 'pix2pix_facades'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 106
metrics = dict(
FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
IS=dict(
type='IS',
num_images=num_images,
image_shape=(3, 256, 256),
inception_args=dict(type='pytorch')))
evaluation = dict(
type='TranslationEvalHook',
target_domain=domain_b,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True),
dict(
type='IS',
num_images=num_images,
inception_args=dict(type='pytorch'))
],
best_metric=['fid', 'is'])
_base_ = [
'../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
'../_base_/datasets/paired_imgs_256x256_crop.py',
'../_base_/default_runtime.py'
]
source_domain = 'map'
target_domain = 'aerial'
# model settings
model = dict(
default_domain=target_domain,
reachable_domains=[target_domain],
related_domains=[target_domain, source_domain],
gen_auxiliary_loss=dict(
data_info=dict(
pred=f'fake_{target_domain}', target=f'real_{target_domain}')))
# dataset settings
domain_a = target_domain
domain_b = source_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='FixedCrop',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
crop_size=(256, 256)),
dict(
type='Flip',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
direction='horizontal'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/maps'
data = dict(
train=dict(dataroot=dataroot, pipeline=train_pipeline),
val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))
# optimizer
optimizer = dict(
generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))
# learning policy
lr_config = None
# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=[f'fake_{target_domain}'],
interval=5000)
]
runner = None
use_ddp_wrapper = True
# runtime settings
total_iters = 220000
workflow = [('train', 1)]
exp_name = 'pix2pix_maps2aerial'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 1098
metrics = dict(
FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
IS=dict(
type='IS',
num_images=num_images,
image_shape=(3, 256, 256),
inception_args=dict(type='pytorch')))
evaluation = dict(
type='TranslationEvalHook',
target_domain=domain_b,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True),
dict(
type='IS',
num_images=num_images,
inception_args=dict(type='pytorch'))
],
best_metric=['fid', 'is'])
_base_ = [
'../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
'../_base_/datasets/paired_imgs_256x256.py', '../_base_/default_runtime.py'
]
source_domain = 'edges'
target_domain = 'photo'
# model settings
model = dict(
default_domain=target_domain,
reachable_domains=[target_domain],
related_domains=[target_domain, source_domain],
gen_auxiliary_loss=dict(
data_info=dict(
pred=f'fake_{target_domain}', target=f'real_{target_domain}')))
# dataset settings
domain_a = source_domain
domain_b = target_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='FixedCrop',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
crop_size=(256, 256)),
dict(
type='Flip',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
direction='horizontal'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/edges2shoes'
data = dict(
train=dict(dataroot=dataroot, pipeline=train_pipeline),
val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))
# optimizer
optimizer = dict(
generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))
# learning policy
lr_config = None
# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=[f'fake_{target_domain}'],
interval=5000)
]
runner = None
use_ddp_wrapper = True
# runtime settings
total_iters = 190000
workflow = [('train', 1)]
exp_name = 'pix2pix_edges2shoes_wo_jitter_flip'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 200
metrics = dict(
FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
IS=dict(
type='IS',
num_images=num_images,
image_shape=(3, 256, 256),
inception_args=dict(type='pytorch')))
evaluation = dict(
type='TranslationEvalHook',
target_domain=domain_b,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True),
dict(
type='IS',
num_images=num_images,
inception_args=dict(type='pytorch'))
],
best_metric=['fid', 'is'])
# Positional Encoding in GANs
> [Positional Encoding as Spatial Inductive Bias in GANs](https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators. Such positional encoding is indispensable for generating images with high fidelity. The same phenomenon is observed in other generative architectures such as DCGAN and PGGAN. We further show that zero padding leads to an unbalanced spatial bias with a vague relation between locations. To offer a better spatial inductive bias, we investigate alternative positional encodings and analyze their effects. Based on a more flexible positional encoding explicitly, we propose a new multi-scale training strategy and demonstrate its effectiveness in the state-of-the-art unconditional generator StyleGAN2. Besides, the explicit spatial inductive bias substantially improves SinGAN for more versatile image manipulation.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053767-c6a503b2-87ff-434a-a439-d9fb0e98d804.JPG"/>
</div>
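The abstract's central claim, that zero padding acts as an implicit positional encoding, is easy to probe directly: with zero padding, a convolution over a spatially constant input already produces location-dependent responses. A tiny PyTorch sketch (illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
nn.init.constant_(conv.weight, 1.0)

x = torch.ones(1, 1, 5, 5)  # constant, spatially i.i.d. input
y = conv(x)
print(y[0, 0])  # border sums (4 or 6) differ from the interior ones (9)
```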
## Results and models for MS-PIE
<div align="center">
<b> 896x896 results generated from a 256 generator using MS-PIE</b>
<br/>
<img src="https://download.openmmlab.com/mmgen/pe_in_gans/mspie_256-896_demo.png" width="800"/>
</div>
| Models | Reference in Paper | Scales | FID50k | P&R10k | Config | Download |
| :--------------------------: | :----------------: | :------------: | :----: | :---------: | :----------------------------------------------------------: | :-------------------------------------------------------------: |
| stylegan2_c2_256_baseline | Tab.5 config-a | 256 | 5.56 | 75.92/51.24 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_256_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-a_ffhq_256x256_b3x8_1100k_20210406_145127-71d9634b.pth) |
| stylegan2_c2_512_baseline | Tab.5 config-b | 512 | 4.91 | 75.65/54.58 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-b_ffhq_512x512_b3x8_1100k_20210406_145142-e85e5cf4.pth) |
| ms-pie_stylegan2_c2_config-c | Tab.5 config-c | 256, 384, 512 | 3.35 | 73.84/55.77 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k_20210406_144824-9f43b07d.pth) |
| ms-pie_stylegan2_c2_config-d | Tab.5 config-d | 256, 384, 512 | 3.50 | 73.28/56.16 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k_20210406_144840-dbefacf6.pth) |
| ms-pie_stylegan2_c2_config-e | Tab.5 config-e | 256, 384, 512 | 3.15 | 74.13/56.88 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k_20210406_144906-98d5a42a.pth) |
| ms-pie_stylegan2_c2_config-f | Tab.5 config-f | 256, 384, 512 | 2.93 | 73.51/57.32 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k_20210406_144927-4f4d5391.pth) |
| ms-pie_stylegan2_c1_config-g | Tab.5 config-g | 256, 384, 512 | 3.40 | 73.05/56.45 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k_20210406_144758-2df61752.pth) |
| ms-pie_stylegan2_c2_config-h | Tab.5 config-h | 256, 384, 512 | 4.01 | 72.81/54.35 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k_20210406_145006-84cf3f48.pth) |
| ms-pie_stylegan2_c2_config-i | Tab.5 config-i | 256, 384, 512 | 3.76 | 73.26/54.71 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k_20210406_145023-c2b0accf.pth) |
| ms-pie_stylegan2_c2_config-j | Tab.5 config-j | 256, 384, 512 | 4.23 | 73.11/54.63 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k_20210406_145044-c407481b.pth) |
| ms-pie_stylegan2_c2_config-k | Tab.5 config-k | 256, 384, 512 | 4.17 | 73.05/51.07 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k_20210406_145105-6d8cc39f.pth) |
| ms-pie_stylegan2_c2_config-f | higher-resolution | 256, 512, 896 | 4.10 | 72.21/50.29 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k_20210406_144943-6c18ad5d.pth) |
| ms-pie_stylegan2_c1_config-f | higher-resolution | 256, 512, 1024 | 6.24 | 71.79/49.92 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k_20210406_144716-81cbdc96.pth) |
Note that we report the FID and P&R metrics (FFHQ dataset) at the largest scale.
## Results and Models for SinGAN
<div align="center">
<b> Positional Encoding in SinGAN</b>
<br/>
<img src="https://nbei.github.io/gan-pos-encoding/teaser-web-singan.png" width="800"/>
</div>
| Model | Data | Num Scales | Config | Download |
| :-----------------------------: | :-------------------------------------------------: | :--------: | :---------------------------------------------------: | :-----------------------------------------------------: |
| SinGAN + no pad | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pkl) |
| SinGAN + no pad + no bn in disc | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pkl) |
| SinGAN + no pad + no bn in disc | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pkl) |
| SinGAN + CSG | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pkl) |
| SinGAN + CSG | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pkl) |
| SinGAN + SPE-dim4 | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pkl) |
| SinGAN + SPE-dim4 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pkl) |
| SinGAN + SPE-dim8 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim8_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pkl) |
## Citation
```latex
@article{xu2020positional,
title={Positional Encoding as Spatial Inductive Bias in GANs},
author={Xu, Rui and Wang, Xintao and Chen, Kai and Zhou, Bolei and Loy, Chen Change},
journal={arXiv preprint arXiv:2012.05217},
year={2020},
url={https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html},
}
```