Commit 57e0e891 authored by limm (parent 04e07f48)

add part mmgeneration code
_base_ = [
    '../_base_/models/dcgan/dcgan_64x64.py',
    '../_base_/datasets/unconditional_imgs_64x64.py',
    '../_base_/default_runtime.py'
]

model = dict(
    discriminator=dict(output_scale=4, out_channels=1),
    gan_loss=dict(type='GANLoss', gan_type='lsgan'))

# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
    samples_per_gpu=128, train=dict(imgs_root='./data/lsun/bedroom_train'))

optimizer = dict(
    generator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)),
    discriminator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)))

# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=10000)
]
evaluation = dict(
    type='GenerativeEvalHook',
    interval=10000,
    metrics=dict(
        type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
    sample_kwargs=dict(sample_model='orig'))

total_iters = 100000

# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
    type='DynamicIterBasedRunner',
    is_dynamic_ddp=False,  # Note that this flag should be False.
    pass_training_status=True)

metrics = dict(
    ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
    swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 64, 64)),
    fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
_base_ = [
    '../_base_/models/lsgan/lsgan_128x128.py',
    '../_base_/datasets/unconditional_imgs_128x128.py',
    '../_base_/default_runtime.py'
]

# define dataset
# you must set `samples_per_gpu` and `imgs_root`
data = dict(
    samples_per_gpu=64, train=dict(imgs_root='./data/lsun/bedroom_train'))

optimizer = dict(
    generator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)),
    discriminator=dict(type='Adam', lr=0.0001, betas=(0.5, 0.99)))

# adjust running config
lr_config = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)
custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=10000)
]
evaluation = dict(
    type='GenerativeEvalHook',
    interval=10000,
    metrics=dict(
        type='FID', num_images=50000, inception_pkl=None, bgr2rgb=True),
    sample_kwargs=dict(sample_model='orig'))

total_iters = 160000

# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = False
runner = dict(
    type='DynamicIterBasedRunner',
    is_dynamic_ddp=False,  # Note that this flag should be False.
    pass_training_status=True)

metrics = dict(
    ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
    swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)),
    fid50k=dict(type='FID', num_images=50000, inception_pkl=None))
Collections:
- Metadata:
    Architecture:
    - LSGAN
  Name: LSGAN
  Paper:
  - https://openaccess.thecvf.com/content_iccv_2017/html/Mao_Least_Squares_Generative_ICCV_2017_paper.html
  README: configs/lsgan/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m.py
  In Collection: LSGAN
  Metadata:
    Training Data: CELEBA
  Name: lsgan_dcgan-archi_lr-1e-3_celeba-cropped_64_b128x1_12m
  Results:
  - Dataset: CELEBA
    Metrics:
      FID: 11.9258
      MS-SSIM: 0.3216
      SWD: 6.16, 6.83, 37.64/16.87
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-3_64_b128x1_12m_20210429_144001-92ca1d0d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m.py
  In Collection: LSGAN
  Metadata:
    Training Data: LSUN
  Name: lsgan_dcgan-archi_lr-1e-4_lsun-bedroom_64_b128x1_12m
  Results:
  - Dataset: LSUN
    Metrics:
      FID: 30.739
      MS-SSIM: 0.0671
      SWD: 5.66, 9.0, 18.6/11.09
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_dcgan-archi_lr-1e-4_64_b128x1_12m_20210429_144602-ec4ec6bb.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m.py
  In Collection: LSGAN
  Metadata:
    Training Data: CELEBA
  Name: lsgan_dcgan-archi_lr-1e-4_celeba-cropped_128_b64x1_10m
  Results:
  - Dataset: CELEBA
    Metrics:
      FID: 38.3752
      MS-SSIM: 0.3691
      SWD: 21.66, 9.83, 16.06, 70.76/29.58
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_celeba-cropped_dcgan-archi_lr-1e-4_128_b64x1_10m_20210429_144229-01ba67dc.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/lsgan/lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m.py
  In Collection: LSGAN
  Metadata:
    Training Data: LSUN
  Name: lsgan_lsgan-archi_lr-1e-4_lsun-bedroom_128_b64x1_10m
  Results:
  - Dataset: LSUN
    Metrics:
      FID: 51.55
      MS-SSIM: 0.0612
      SWD: 19.52, 9.99, 7.48, 14.3/12.82
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/lsgan/lsgan_lsun-bedroom_lsgan-archi_lr-1e-4_128_b64x1_10m_20210429_155605-cf78c0a8.pth
# PGGAN
> [Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://arxiv.org/abs/1710.10196)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053374-c03894c3-6def-49c2-94ed-80c4accee726.JPG" />
</div>
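The growing schedule described in the abstract can be sketched in plain Python. This is a conceptual illustration under assumed phase lengths, not MMGeneration's actual `GrowScaleImgDataset`/PGGAN implementation: after a warm-up phase at the base resolution, each doubling gets a fade-in phase (a blending factor `alpha` ramps from 0 to 1) followed by a stabilization phase.

```python
def pggan_schedule(iteration, iters_per_phase=10_000, base_res=4, max_res=1024):
    """Map a training iteration to (resolution, fade_in_alpha).

    Phase 0 trains at ``base_res``. After that, each resolution doubling
    gets one fade-in phase (alpha ramps 0 -> 1, blending the new layers
    with the upsampled output of the previous stage) and one
    stabilization phase (alpha == 1). Purely illustrative.
    """
    phase = iteration // iters_per_phase
    if phase == 0:
        return base_res, 1.0  # warm-up at the base resolution
    step = phase - 1          # 0-based index into (fade-in, stabilize) pairs
    res = min(base_res * 2 ** (step // 2 + 1), max_res)
    if step % 2 == 0 and base_res * 2 ** (step // 2 + 1) <= max_res:
        alpha = (iteration % iters_per_phase) / iters_per_phase  # fading in
    else:
        alpha = 1.0           # stabilized (or already capped at max_res)
    return res, alpha
```

For example, with 10-iteration phases the schedule yields `(4, 1.0)` at iteration 0 and is halfway through fading in the 8x8 layers at iteration 15.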
## Results and models
<div align="center">
<b> Results (compressed) from our PGGAN trained on CelebA-HQ at 1024x1024</b>
<br/>
<img src="https://user-images.githubusercontent.com/12726765/114009864-1df45400-9896-11eb-9d25-da9eabfe02ce.png" width="800"/>
</div>
| Models | Details | MS-SSIM | SWD(xx,xx,xx,xx/avg) | Config | Download |
| :-------------: | :------------: | :-----: | :--------------------------: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: |
| pggan_128x128 | celeba-cropped | 0.3023 | 3.42, 4.04, 4.78, 20.38/8.15 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth) |
| pggan_128x128 | lsun-bedroom | 0.0602 | 3.5, 2.96, 2.76, 9.65/4.72 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_lsun-bedroom_128_g8_12Mimgs.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_lsun-bedroom_128x128_g8_20210408_182033-5e59f45d.pth) |
| pggan_1024x1024 | celeba-hq | 0.3379 | 8.93, 3.98, 3.07, 2.64/4.655 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-hq_1024_g8_12Mimg.py) | [model](https://download.openmmlab.com/mmgen/pggan/pggan_celeba-hq_1024_g8_20210408_181911-f1ef51c3.pth) |
## Citation
<summary align="right"><a href="https://arxiv.org/abs/1710.10196">PGGAN (arXiv'2017)</a></summary>
```latex
@article{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  journal={arXiv preprint arXiv:1710.10196},
  year={2017},
  url={https://arxiv.org/abs/1710.10196},
}
```
Collections:
- Metadata:
    Architecture:
    - PGGAN
  Name: PGGAN
  Paper:
  - https://arxiv.org/abs/1710.10196
  README: configs/pggan/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-cropped_128_g8_12Mimgs.py
  In Collection: PGGAN
  Metadata:
    Training Data: CELEBA
  Name: pggan_celeba-cropped_128_g8_12Mimgs
  Results:
  - Dataset: CELEBA
    Metrics:
      Details: celeba-cropped
      MS-SSIM: 0.3023
      SWD(xx,xx,xx,xx/avg): 3.42, 4.04, 4.78, 20.38/8.15
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/pggan/pggan_celeba-cropped_128_g8_20210408_181931-85a2e72c.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_lsun-bedroom_128_g8_12Mimgs.py
  In Collection: PGGAN
  Metadata:
    Training Data: LSUN
  Name: pggan_lsun-bedroom_128_g8_12Mimgs
  Results:
  - Dataset: LSUN
    Metrics:
      Details: lsun-bedroom
      MS-SSIM: 0.0602
      SWD(xx,xx,xx,xx/avg): 3.5, 2.96, 2.76, 9.65/4.72
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/pggan/pggan_lsun-bedroom_128x128_g8_20210408_182033-5e59f45d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pggan/pggan_celeba-hq_1024_g8_12Mimg.py
  In Collection: PGGAN
  Metadata:
    Training Data: CELEBA
  Name: pggan_celeba-hq_1024_g8_12Mimg
  Results:
  - Dataset: CELEBA
    Metrics:
      Details: celeba-hq
      MS-SSIM: 0.3379
      SWD(xx,xx,xx,xx/avg): 8.93, 3.98, 3.07, 2.64/4.655
    Task: Unconditional GANs
  Weights: https://download.openmmlab.com/mmgen/pggan/pggan_celeba-hq_1024_g8_20210408_181911-f1ef51c3.pth
_base_ = [
    '../_base_/models/pggan/pggan_128x128.py',
    '../_base_/datasets/grow_scale_imgs_128x128.py',
    '../_base_/default_runtime.py'
]

optimizer = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)

data = dict(
    samples_per_gpu=64,
    train=dict(
        imgs_roots={'128': './data/celeba-cropped/cropped_images_aligned_png'},
        gpu_samples_base=4,
        # note that this should be changed with total gpu number
        gpu_samples_per_scale={
            '4': 64,
            '8': 32,
            '16': 16,
            '32': 8,
            '64': 4
        }))

custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=5000),
    dict(type='PGGANFetchDataHook', interval=1),
    dict(
        type='ExponentialMovingAverageHook',
        module_keys=('generator_ema', ),
        interval=1,
        priority='VERY_HIGH')
]

lr_config = None
total_iters = 280000

metrics = dict(
    ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
    swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
_base_ = [
    '../_base_/models/pggan/pggan_1024.py',
    '../_base_/datasets/grow_scale_imgs_celeba-hq.py',
    '../_base_/default_runtime.py'
]

optimizer = None
checkpoint_config = dict(interval=5000, by_epoch=False, max_keep_ckpts=20)

data = dict(
    samples_per_gpu=64,
    train=dict(
        gpu_samples_base=4,
        # note that this should be changed with total gpu number
        gpu_samples_per_scale={
            '4': 64,
            '8': 32,
            '16': 16,
            '32': 8,
            '64': 4
        },
    ))

custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=5000),
    dict(type='PGGANFetchDataHook', interval=1),
    dict(
        type='ExponentialMovingAverageHook',
        module_keys=('generator_ema', ),
        interval=1,
        priority='VERY_HIGH')
]

lr_config = None
total_iters = 280000

metrics = dict(
    ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
    swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 1024, 1024)))
_base_ = [
    '../_base_/models/pggan/pggan_128x128.py',
    '../_base_/datasets/grow_scale_imgs_128x128.py',
    '../_base_/default_runtime.py'
]

optimizer = None
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=20)

data = dict(
    samples_per_gpu=64,
    train=dict(
        imgs_roots={'128': './data/lsun/bedroom_train'},
        gpu_samples_base=4,
        # note that this should be changed with total gpu number
        gpu_samples_per_scale={
            '4': 64,
            '8': 32,
            '16': 16,
            '32': 8,
            '64': 4
        },
    ))

custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=5000),
    dict(type='PGGANFetchDataHook', interval=1),
    dict(
        type='ExponentialMovingAverageHook',
        module_keys=('generator_ema', ),
        interval=1,
        priority='VERY_HIGH')
]

lr_config = None
total_iters = 280000

metrics = dict(
    ms_ssim10k=dict(type='MS_SSIM', num_images=10000),
    swd16k=dict(type='SWD', num_images=16384, image_shape=(3, 128, 128)))
# Pix2Pix
> [Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks](https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053385-1b03356d-43df-423b-88b2-7a82b73d2edd.JPG"/>
</div>
## Results and Models
<div align="center">
<b> Results from Pix2Pix trained by MMGeneration</b>
<br/>
<img src="https://user-images.githubusercontent.com/22982797/114269080-4ff0ec00-9a37-11eb-92c4-1525864e0307.PNG" width="800"/>
</div>
We use `FID` and `IS` metrics to evaluate the generation performance of pix2pix.<sup>1</sup>
| Models | Dataset | FID | IS | Config | Download |
| :----: | :---------: | :------: | :---: | :----------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: |
| Ours | facades | 124.9773 | 1.620 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth) \| [log](https://download.openmmlab.com/mmgen/pix2pix/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210317_172625.log.json)<sup>2</sup> |
| Ours | aerial2maps | 122.5856 | 3.137 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth) |
| Ours | maps2aerial | 88.4635 | 3.310 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth) |
| Ours | edges2shoes | 84.3750 | 2.815 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py) | [model](https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth) |
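For intuition about the `FID` numbers above: FID is the Fréchet distance between Gaussian fits of real and generated Inception features. The following toy sketch shows the closed form in the univariate case only; the actual metric applies the same formula to 2048-dimensional Inception-v3 feature statistics, with a matrix square root in place of the scalar one.

```python
import math

def fid_1d(mu1, var1, mu2, var2):
    """Frechet distance between two 1-D Gaussians:

        (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)

    Identical distributions give 0; the distance grows as the means
    or variances drift apart.
    """
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)

# identical distributions -> distance 0
assert fid_1d(0.0, 1.0, 0.0, 1.0) == 0.0
```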
`FID` comparison with the official implementation:
| Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average |
| :------: | :---------: | :----------: | :---------: | :---------: | :----------: |
| official | **119.135** | 149.731 | 102.072 | **75.774** | 111.678 |
| ours | 124.9773 | **122.5856** | **88.4635** | 84.3750 | **105.1003** |
`IS` comparison with the official implementation:
| Dataset | facades | aerial2maps | maps2aerial | edges2shoes | average |
| :------: | :-------: | :---------: | :---------: | :---------: | :--------: |
| official | **1.650** | 2.529 | **3.552** | 2.766 | 2.624 |
| ours | 1.620 | **3.137** | 3.310 | **2.815** | **2.7205** |
Note:

1. We strictly follow the [paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.pdf) setting in Section 3.3: "*At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.*" (i.e., we use `model.train()` mode), which may lead to slightly different inference results every time.
2. This is the training log before refactoring. Updated logs will be released soon.
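The batch-normalization behavior in note 1 can be sketched in plain Python. `batch_norm` below is a hypothetical 1-D helper, not MMGeneration code: in training mode the batch is normalized with its own statistics (which pix2pix keeps at test time), so each output depends on the other samples in the test batch; in eval mode the stored running statistics are used instead.

```python
from statistics import fmean, pvariance

def batch_norm(batch, running_mean, running_var, training, eps=1e-5):
    """Normalize a 1-D feature over the batch dimension.

    training=True:  use the batch's own mean/variance (pix2pix keeps
                    this mode at inference, per the paper's Section 3.3).
    training=False: use the aggregated running statistics (the usual
                    eval-mode protocol).
    """
    if training:
        mean, var = fmean(batch), pvariance(batch)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

# eval mode: each sample's output is independent of the rest of the batch
out_eval = batch_norm([2.0, 4.0], running_mean=0.0, running_var=1.0, training=False)
# train mode: the same samples are centered by the batch mean (3.0),
# so the outputs change whenever the test batch changes
out_train = batch_norm([2.0, 4.0], running_mean=0.0, running_var=1.0, training=True)
```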
## Citation
```latex
@inproceedings{isola2017image,
  title={Image-to-image translation with conditional adversarial networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={1125--1134},
  year={2017},
  url={https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html},
}
```
Collections:
- Metadata:
    Architecture:
    - Pix2Pix
  Name: Pix2Pix
  Paper:
  - https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html
  README: configs/pix2pix/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_facades_b1x1_80k.py
  In Collection: Pix2Pix
  Metadata:
    Training Data: FACADES
  Name: pix2pix_vanilla_unet_bn_facades_b1x1_80k
  Results:
  - Dataset: FACADES
    Metrics:
      FID: 124.9773
      IS: 1.62
    Task: Image2Image Translation
  Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_1x1_80k_facades_20210902_170442-c0958d50.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k.py
  In Collection: Pix2Pix
  Metadata:
    Training Data: MAPS
  Name: pix2pix_vanilla_unet_bn_aerial2maps_b1x1_220k
  Results:
  - Dataset: MAPS
    Metrics:
      FID: 122.5856
      IS: 3.137
    Task: Image2Image Translation
  Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_a2b_1x1_219200_maps_convert-bgr_20210902_170729-59a31517.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k.py
  In Collection: Pix2Pix
  Metadata:
    Training Data: MAPS
  Name: pix2pix_vanilla_unet_bn_maps2aerial_b1x1_220k
  Results:
  - Dataset: MAPS
    Metrics:
      FID: 88.4635
      IS: 3.31
    Task: Image2Image Translation
  Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_b2a_1x1_219200_maps_convert-bgr_20210902_170814-6d2eac4a.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/pix2pix/pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k.py
  In Collection: Pix2Pix
  Metadata:
    Training Data: EDGES2SHOES
  Name: pix2pix_vanilla_unet_bn_wo_jitter_flip_edges2shoes_b1x4_190k
  Results:
  - Dataset: EDGES2SHOES
    Metrics:
      FID: 84.375
      IS: 2.815
    Task: Image2Image Translation
  Weights: https://download.openmmlab.com/mmgen/pix2pix/refactor/pix2pix_vanilla_unet_bn_wo_jitter_flip_1x4_186840_edges2shoes_convert-bgr_20210902_170902-0c828552.pth
_base_ = [
    '../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
    '../_base_/datasets/paired_imgs_256x256_crop.py',
    '../_base_/default_runtime.py'
]

source_domain = 'aerial'
target_domain = 'map'

# model settings
model = dict(
    default_domain=target_domain,
    reachable_domains=[target_domain],
    related_domains=[target_domain, source_domain],
    gen_auxiliary_loss=dict(
        data_info=dict(
            pred=f'fake_{target_domain}', target=f'real_{target_domain}')))

# dataset settings
domain_a = source_domain
domain_b = target_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(286, 286),
        interpolation='bicubic'),
    dict(
        type='FixedCrop',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        crop_size=(256, 256)),
    dict(
        type='Flip',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        direction='horizontal'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(256, 256),
        interpolation='bicubic'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/maps'
data = dict(
    train=dict(dataroot=dataroot, pipeline=train_pipeline),
    val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
    test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))

# optimizer
optimizer = dict(
    generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
    discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))

# learning policy
lr_config = None

# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
    dict(
        type='MMGenVisualizationHook',
        output_dir='training_samples',
        res_name_list=[f'fake_{target_domain}'],
        interval=5000)
]
runner = None
use_ddp_wrapper = True

# runtime settings
total_iters = 220000
workflow = [('train', 1)]
exp_name = 'pix2pix_aerial2map'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 1098
metrics = dict(
    FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
    IS=dict(
        type='IS',
        num_images=num_images,
        image_shape=(3, 256, 256),
        inception_args=dict(type='pytorch')))
evaluation = dict(
    type='TranslationEvalHook',
    target_domain=domain_b,
    interval=10000,
    metrics=[
        dict(type='FID', num_images=num_images, bgr2rgb=True),
        dict(
            type='IS',
            num_images=num_images,
            inception_args=dict(type='pytorch'))
    ],
    best_metric=['fid', 'is'])
_base_ = [
    '../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
    '../_base_/datasets/paired_imgs_256x256_crop.py',
    '../_base_/default_runtime.py'
]

source_domain = 'mask'
target_domain = 'photo'

# model settings
model = dict(
    default_domain=target_domain,
    reachable_domains=[target_domain],
    related_domains=[target_domain, source_domain],
    gen_auxiliary_loss=dict(
        data_info=dict(
            pred=f'fake_{target_domain}', target=f'real_{target_domain}')))

# dataset settings
domain_a = target_domain
domain_b = source_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(286, 286),
        interpolation='bicubic'),
    dict(
        type='FixedCrop',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        crop_size=(256, 256)),
    dict(
        type='Flip',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        direction='horizontal'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(256, 256),
        interpolation='bicubic'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/facades'
data = dict(
    train=dict(dataroot=dataroot, pipeline=train_pipeline),
    val=dict(dataroot=dataroot, pipeline=test_pipeline),
    test=dict(dataroot=dataroot, pipeline=test_pipeline))

# optimizer
optimizer = dict(
    generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
    discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))

# learning policy
lr_config = None

# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
    dict(
        type='MMGenVisualizationHook',
        output_dir='training_samples',
        res_name_list=[f'fake_{target_domain}'],
        interval=5000)
]
runner = None
use_ddp_wrapper = True

# runtime settings
total_iters = 80000
workflow = [('train', 1)]
exp_name = 'pix2pix_facades'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 106
metrics = dict(
    FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
    IS=dict(
        type='IS',
        num_images=num_images,
        image_shape=(3, 256, 256),
        inception_args=dict(type='pytorch')))
evaluation = dict(
    type='TranslationEvalHook',
    target_domain=domain_b,
    interval=10000,
    metrics=[
        dict(type='FID', num_images=num_images, bgr2rgb=True),
        dict(
            type='IS',
            num_images=num_images,
            inception_args=dict(type='pytorch'))
    ],
    best_metric=['fid', 'is'])
_base_ = [
    '../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
    '../_base_/datasets/paired_imgs_256x256_crop.py',
    '../_base_/default_runtime.py'
]

source_domain = 'map'
target_domain = 'aerial'

# model settings
model = dict(
    default_domain=target_domain,
    reachable_domains=[target_domain],
    related_domains=[target_domain, source_domain],
    gen_auxiliary_loss=dict(
        data_info=dict(
            pred=f'fake_{target_domain}', target=f'real_{target_domain}')))

# dataset settings
domain_a = target_domain
domain_b = source_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(286, 286),
        interpolation='bicubic'),
    dict(
        type='FixedCrop',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        crop_size=(256, 256)),
    dict(
        type='Flip',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        direction='horizontal'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
    dict(
        type='LoadPairedImageFromFile',
        io_backend='disk',
        key='pair',
        domain_a=domain_a,
        domain_b=domain_b,
        flag='color'),
    dict(
        type='Resize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        scale=(256, 256),
        interpolation='bicubic'),
    dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Normalize',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        to_rgb=False,
        **img_norm_cfg),
    dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
    dict(
        type='Collect',
        keys=[f'img_{domain_a}', f'img_{domain_b}'],
        meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/maps'
data = dict(
    train=dict(dataroot=dataroot, pipeline=train_pipeline),
    val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
    test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))

# optimizer
optimizer = dict(
    generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
    discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))

# learning policy
lr_config = None

# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
    dict(
        type='MMGenVisualizationHook',
        output_dir='training_samples',
        res_name_list=[f'fake_{target_domain}'],
        interval=5000)
]
runner = None
use_ddp_wrapper = True

# runtime settings
total_iters = 220000
workflow = [('train', 1)]
exp_name = 'pix2pix_maps2aerial'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 1098
metrics = dict(
    FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
    IS=dict(
        type='IS',
        num_images=num_images,
        image_shape=(3, 256, 256),
        inception_args=dict(type='pytorch')))
evaluation = dict(
    type='TranslationEvalHook',
    target_domain=domain_b,
    interval=10000,
    metrics=[
        dict(type='FID', num_images=num_images, bgr2rgb=True),
        dict(
            type='IS',
            num_images=num_images,
            inception_args=dict(type='pytorch'))
    ],
    best_metric=['fid', 'is'])
_base_ = [
'../_base_/models/pix2pix/pix2pix_vanilla_unet_bn.py',
'../_base_/datasets/paired_imgs_256x256.py', '../_base_/default_runtime.py'
]
source_domain = 'edges'
target_domain = 'photo'
# model settings
model = dict(
default_domain=target_domain,
reachable_domains=[target_domain],
related_domains=[target_domain, source_domain],
gen_auxiliary_loss=dict(
data_info=dict(
pred=f'fake_{target_domain}', target=f'real_{target_domain}')))
# dataset settings
domain_a = source_domain
domain_b = target_domain
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='FixedCrop',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
crop_size=(256, 256)),
dict(
type='Flip',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
direction='horizontal'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
test_pipeline = [
dict(
type='LoadPairedImageFromFile',
io_backend='disk',
key='pair',
domain_a=domain_a,
domain_b=domain_b,
flag='color'),
dict(
type='Resize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Normalize',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
to_rgb=False,
**img_norm_cfg),
dict(type='ImageToTensor', keys=[f'img_{domain_a}', f'img_{domain_b}']),
dict(
type='Collect',
keys=[f'img_{domain_a}', f'img_{domain_b}'],
meta_keys=[f'img_{domain_a}_path', f'img_{domain_b}_path'])
]
dataroot = 'data/paired/edges2shoes'
data = dict(
train=dict(dataroot=dataroot, pipeline=train_pipeline),
val=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'),
test=dict(dataroot=dataroot, pipeline=test_pipeline, testdir='val'))
# optimizer
optimizer = dict(
generators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=2e-4, betas=(0.5, 0.999)))
# learning policy
lr_config = None
# checkpoint saving
checkpoint_config = dict(interval=10000, save_optimizer=True, by_epoch=False)
custom_hooks = [
dict(
type='MMGenVisualizationHook',
output_dir='training_samples',
res_name_list=[f'fake_{target_domain}'],
interval=5000)
]
runner = None
use_ddp_wrapper = True
# runtime settings
total_iters = 190000
workflow = [('train', 1)]
exp_name = 'pix2pix_edges2shoes_wo_jitter_flip'
work_dir = f'./work_dirs/experiments/{exp_name}'
num_images = 200
metrics = dict(
FID=dict(type='FID', num_images=num_images, image_shape=(3, 256, 256)),
IS=dict(
type='IS',
num_images=num_images,
image_shape=(3, 256, 256),
inception_args=dict(type='pytorch')))
evaluation = dict(
type='TranslationEvalHook',
target_domain=domain_b,
interval=10000,
metrics=[
dict(type='FID', num_images=num_images, bgr2rgb=True),
dict(
type='IS',
num_images=num_images,
inception_args=dict(type='pytorch'))
],
best_metric=['fid', 'is'])
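The `RescaleToZeroOne` and `Normalize` steps in the pipelines above (with `img_norm_cfg` mean and std of 0.5) together map 8-bit pixels into the [-1, 1] range the generator expects. A minimal numpy sketch of that arithmetic, using hypothetical pixel values rather than MMGeneration's transform classes:

```python
import numpy as np

# Hypothetical 8-bit pixel values; the loader would supply real image data.
img = np.array([[0, 128, 255]], dtype=np.uint8)

# RescaleToZeroOne: uint8 [0, 255] -> float [0, 1]
img = img.astype(np.float32) / 255.0

# Normalize with mean=0.5, std=0.5 (img_norm_cfg above): [0, 1] -> [-1, 1]
img = (img - 0.5) / 0.5

print(img.min(), img.max())  # -1.0 1.0
```

Chaining the two steps is equivalent to `img / 127.5 - 1`, but keeping them separate mirrors the pipeline's structure.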
# Positional Encoding in GANs
> [Positional Encoding as Spatial Inductive Bias in GANs](https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators. Such positional encoding is indispensable for generating images with high fidelity. The same phenomenon is observed in other generative architectures such as DCGAN and PGGAN. We further show that zero padding leads to an unbalanced spatial bias with a vague relation between locations. To offer a better spatial inductive bias, we investigate alternative positional encodings and analyze their effects. Based on a more flexible explicit positional encoding, we propose a new multi-scale training strategy and demonstrate its effectiveness in the state-of-the-art unconditional generator StyleGAN2. Besides, the explicit spatial inductive bias substantially improves SinGAN for more versatile image manipulation.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/28132635/143053767-c6a503b2-87ff-434a-a439-d9fb0e98d804.JPG"/>
</div>
## Results and Models for MS-PIE
<div align="center">
<b> 896x896 results generated from a 256 generator using MS-PIE</b>
<br/>
<img src="https://download.openmmlab.com/mmgen/pe_in_gans/mspie_256-896_demo.png" width="800"/>
</div>
| Models | Reference in Paper | Scales | FID50k | P&R10k | Config | Download |
| :--------------------------: | :----------------: | :------------: | :----: | :---------: | :----------------------------------------------------------: | :-------------------------------------------------------------: |
| stylegan2_c2_256_baseline | Tab.5 config-a | 256 | 5.56 | 75.92/51.24 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_256_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-a_ffhq_256x256_b3x8_1100k_20210406_145127-71d9634b.pth) |
| stylegan2_c2_512_baseline | Tab.5 config-b | 512 | 4.91 | 75.65/54.58 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-b_ffhq_512x512_b3x8_1100k_20210406_145142-e85e5cf4.pth) |
| ms-pie_stylegan2_c2_config-c | Tab.5 config-c | 256, 384, 512 | 3.35 | 73.84/55.77 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k_20210406_144824-9f43b07d.pth) |
| ms-pie_stylegan2_c2_config-d | Tab.5 config-d | 256, 384, 512 | 3.50 | 73.28/56.16 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k_20210406_144840-dbefacf6.pth) |
| ms-pie_stylegan2_c2_config-e | Tab.5 config-e | 256, 384, 512 | 3.15 | 74.13/56.88 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k_20210406_144906-98d5a42a.pth) |
| ms-pie_stylegan2_c2_config-f | Tab.5 config-f | 256, 384, 512 | 2.93 | 73.51/57.32 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k_20210406_144927-4f4d5391.pth) |
| ms-pie_stylegan2_c1_config-g | Tab.5 config-g | 256, 384, 512 | 3.40 | 73.05/56.45 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k_20210406_144758-2df61752.pth) |
| ms-pie_stylegan2_c2_config-h | Tab.5 config-h | 256, 384, 512 | 4.01 | 72.81/54.35 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k_20210406_145006-84cf3f48.pth) |
| ms-pie_stylegan2_c2_config-i | Tab.5 config-i | 256, 384, 512 | 3.76 | 73.26/54.71 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k_20210406_145023-c2b0accf.pth) |
| ms-pie_stylegan2_c2_config-j | Tab.5 config-j | 256, 384, 512 | 4.23 | 73.11/54.63 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k_20210406_145044-c407481b.pth) |
| ms-pie_stylegan2_c2_config-k | Tab.5 config-k | 256, 384, 512 | 4.17 | 73.05/51.07 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k_20210406_145105-6d8cc39f.pth) |
| ms-pie_stylegan2_c2_config-f | higher-resolution | 256, 512, 896 | 4.10 | 72.21/50.29 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k_20210406_144943-6c18ad5d.pth) |
| ms-pie_stylegan2_c1_config-f | higher-resolution | 256, 512, 1024 | 6.24 | 71.79/49.92 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k.py) | [model](https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k_20210406_144716-81cbdc96.pth) |
Note that we report the FID and P&R metrics (on the FFHQ dataset) at the largest scale.
## Results and Models for SinGAN
<div align="center">
<b> Positional Encoding in SinGAN</b>
<br/>
<img src="https://nbei.github.io/gan-pos-encoding/teaser-web-singan.png" width="800"/>
</div>
| Model | Data | Num Scales | Config | Download |
| :-----------------------------: | :-------------------------------------------------: | :--------: | :---------------------------------------------------: | :-----------------------------------------------------: |
| SinGAN + no pad | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pkl) |
| SinGAN + no pad + no bn in disc | [balloons.png](https://download.openmmlab.com/mmgen/dataset/singan/balloons.png) | 8 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_balloons.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pkl) |
| SinGAN + no pad + no bn in disc | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pkl) |
| SinGAN + CSG | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pkl) |
| SinGAN + CSG | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pkl) |
| SinGAN + SPE-dim4 | [fish.jpg](https://download.openmmlab.com/mmgen/dataset/singan/fish-crop.jpg) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_fish.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pkl) |
| SinGAN + SPE-dim4 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pkl) |
| SinGAN + SPE-dim8 | [bohemian.png](https://download.openmmlab.com/mmgen/dataset/singan/bohemian.png) | 10 | [config](https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim8_bohemian.py) | [ckpt](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pth) \| [pkl](https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pkl) |
## Citation
```latex
@article{xu2020positional,
title={Positional Encoding as Spatial Inductive Bias in GANs},
author={Xu, Rui and Wang, Xintao and Chen, Kai and Zhou, Bolei and Loy, Chen Change},
journal={arXiv preprint arXiv:2012.05217},
year={2020},
url={https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html},
}
```
Collections:
- Metadata:
Architecture:
- Positional Encoding in GANs
Name: Positional Encoding in GANs
Paper:
- https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html
README: configs/positional_encoding_in_gans/README.md
Models:
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_256_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: stylegan2_c2_ffhq_256_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 5.56
P&R10k: 75.92/51.24
Reference in Paper: Tab.5 config-a
Scales: 256.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-a_ffhq_256x256_b3x8_1100k_20210406_145127-71d9634b.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/stylegan2_c2_ffhq_512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: stylegan2_c2_ffhq_512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 4.91
P&R10k: 75.65/54.58
Reference in Paper: Tab.5 config-b
Scales: 512.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/stylegan2_c2_config-b_ffhq_512x512_b3x8_1100k_20210406_145142-e85e5cf4.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 3.35
P&R10k: 73.84/55.77
Reference in Paper: Tab.5 config-c
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-c_ffhq_256-512_b3x8_1100k_20210406_144824-9f43b07d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 3.5
P&R10k: 73.28/56.16
Reference in Paper: Tab.5 config-d
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-d_ffhq_256-512_b3x8_1100k_20210406_144840-dbefacf6.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 3.15
P&R10k: 74.13/56.88
Reference in Paper: Tab.5 config-e
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-e_ffhq_256-512_b3x8_1100k_20210406_144906-98d5a42a.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 2.93
P&R10k: 73.51/57.32
Reference in Paper: Tab.5 config-f
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-512_b3x8_1100k_20210406_144927-4f4d5391.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 3.4
P&R10k: 73.05/56.45
Reference in Paper: Tab.5 config-g
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-g_ffhq_256-512_b3x8_1100k_20210406_144758-2df61752.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 4.01
P&R10k: 72.81/54.35
Reference in Paper: Tab.5 config-h
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-h_ffhq_256-512_b3x8_1100k_20210406_145006-84cf3f48.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 3.76
P&R10k: 73.26/54.71
Reference in Paper: Tab.5 config-i
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-i_ffhq_256-512_b3x8_1100k_20210406_145023-c2b0accf.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 4.23
P&R10k: 73.11/54.63
Reference in Paper: Tab.5 config-j
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-j_ffhq_256-512_b3x8_1100k_20210406_145044-c407481b.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 4.17
P&R10k: 73.05/51.07
Reference in Paper: Tab.5 config-k
Scales: 256, 384, 512
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-k_ffhq_256-512_b3x8_1100k_20210406_145105-6d8cc39f.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 4.1
P&R10k: 72.21/50.29
Reference in Paper: higher-resolution
Scales: 256, 512, 896
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c2_config-f_ffhq_256-896_b3x8_1100k_20210406_144943-6c18ad5d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: FFHQ
Name: mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k
Results:
- Dataset: FFHQ
Metrics:
FID50k: 6.24
P&R10k: 71.79/49.92
Reference in Paper: higher-resolution
Scales: 256, 512, 1024
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/mspie-stylegan2_c1_config-f_ffhq_256-1024_b2x8_1600k_20210406_144716-81cbdc96.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_balloons.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_interp-pad_balloons
Results:
- Dataset: Others
Metrics:
Num Scales: 8.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_balloons_20210406_180014-96f51555.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_balloons.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_interp-pad_disc-nobn_balloons
Results:
- Dataset: Others
Metrics:
Num Scales: 8.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_balloons_20210406_180059-7d63e65d.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_interp-pad_disc-nobn_fish.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_interp-pad_disc-nobn_fish
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_interp-pad_disc-nobn_fis_20210406_175720-9428517a.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_fish.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_csg_fish
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_fis_20210406_175532-f0ec7b61.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_csg_bohemian.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_csg_bohemian
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_csg_bohemian_20210407_195455-5ed56db2.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_fish.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_spe-dim4_fish
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_fish_20210406_175933-f483a7e3.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim4_bohemian.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_spe-dim4_bohemian
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim4_bohemian_20210406_175820-6e484a35.pth
- Config: https://github.com/open-mmlab/mmgeneration/tree/master/configs/positional_encoding_in_gans/singan_spe-dim8_bohemian.py
In Collection: Positional Encoding in GANs
Metadata:
Training Data: Others
Name: singan_spe-dim8_bohemian
Results:
- Dataset: Others
Metrics:
Num Scales: 10.0
Task: Unconditional GANs
Weights: https://download.openmmlab.com/mmgen/pe_in_gans/singan_spe-dim8_bohemian_20210406_175858-7faa50f3.pth
_base_ = [
'../_base_/datasets/ffhq_flip.py',
'../_base_/models/stylegan/stylegan2_base.py',
'../_base_/default_runtime.py'
]
model = dict(
type='MSPIEStyleGAN2',
generator=dict(
type='MSStyleGANv2Generator',
head_pos_encoding=dict(
type='SPE',
embedding_dim=256,
padding_idx=0,
init_size=256,
center_shift=100),
deconv2conv=True,
up_after_conv=True,
up_config=dict(scale_factor=2, mode='bilinear', align_corners=True),
out_size=256,
channel_multiplier=1),
discriminator=dict(
type='MSStyleGAN2Discriminator', in_size=256, with_adaptive_pool=True))
train_cfg = dict(
num_upblocks=6,
multi_input_scales=[0, 4, 12],
multi_scale_probability=[0.5, 0.25, 0.25])
data = dict(
samples_per_gpu=2,
train=dict(dataset=dict(
imgs_root='./data/ffhq/images'))) # path for 1024 scales
ema_half_life = 10.
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
interp_cfg=dict(momentum=0.5**(32. / (ema_half_life * 1000.))),
priority='VERY_HIGH')
]
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=40)
lr_config = None
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
cudnn_benchmark = False
total_iters = 1500002
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True),
pr10k3=dict(type='PR', num_images=10000, k=3))
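The `ExponentialMovingAverageHook` momentum above, `0.5 ** (32 / (ema_half_life * 1000))`, encodes a half-life of `ema_half_life` thousand images, assuming the StyleGAN convention of a nominal 32 images per generator update. A quick sanity check of that arithmetic:

```python
# EMA momentum as configured above: half-life of ema_half_life kimg,
# assuming 32 images are consumed per EMA update (StyleGAN convention).
ema_half_life = 10.0  # kimg
momentum = 0.5 ** (32.0 / (ema_half_life * 1000.0))

# After ema_half_life * 1000 / 32 updates, the contribution of the
# original weights should have decayed by exactly one half.
updates_per_half_life = ema_half_life * 1000.0 / 32.0
residual = momentum ** updates_per_half_life

print(momentum, residual)  # momentum ~0.9978, residual ~= 0.5
```

Raising the momentum this close to 1 keeps the EMA generator stable while still tracking training over tens of thousands of iterations.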
_base_ = [
'../_base_/datasets/ffhq_flip.py',
'../_base_/models/stylegan/stylegan2_base.py',
'../_base_/default_runtime.py'
]
model = dict(
type='MSPIEStyleGAN2',
generator=dict(
type='MSStyleGANv2Generator',
head_pos_encoding=dict(
type='SPE',
embedding_dim=256,
padding_idx=0,
init_size=256,
center_shift=100),
deconv2conv=True,
up_after_conv=True,
up_config=dict(scale_factor=2, mode='bilinear', align_corners=True),
out_size=256,
channel_multiplier=1),
discriminator=dict(
type='MSStyleGAN2Discriminator',
in_size=256,
with_adaptive_pool=True,
channel_multiplier=1))
train_cfg = dict(
num_upblocks=6,
multi_input_scales=[0, 2, 4],
multi_scale_probability=[0.5, 0.25, 0.25])
data = dict(
samples_per_gpu=3,
train=dict(dataset=dict(imgs_root='./data/ffhq/ffhq_imgs/ffhq_512')))
ema_half_life = 10.
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
interp_cfg=dict(momentum=0.5**(32. / (ema_half_life * 1000.))),
priority='VERY_HIGH')
]
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=40)
lr_config = None
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
cudnn_benchmark = False
total_iters = 1100002
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True),
pr10k3=dict(type='PR', num_images=10000, k=3))
_base_ = [
'../_base_/datasets/ffhq_flip.py',
'../_base_/models/stylegan/stylegan2_base.py',
'../_base_/default_runtime.py'
]
model = dict(
type='MSPIEStyleGAN2',
generator=dict(
type='MSStyleGANv2Generator',
head_pos_encoding=None,
deconv2conv=True,
up_after_conv=True,
head_pos_size=(4, 4),
interp_head=True,
up_config=dict(scale_factor=2, mode='bilinear', align_corners=True),
out_size=256),
discriminator=dict(
type='MSStyleGAN2Discriminator', in_size=256, with_adaptive_pool=True))
train_cfg = dict(
num_upblocks=6,
multi_input_scales=[0, 2, 4],
multi_scale_probability=[0.5, 0.25, 0.25])
data = dict(
samples_per_gpu=3,
train=dict(dataset=dict(imgs_root='./data/ffhq/ffhq_imgs/ffhq_512')))
ema_half_life = 10.
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
interp_cfg=dict(momentum=0.5**(32. / (ema_half_life * 1000.))),
priority='VERY_HIGH')
]
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=40)
lr_config = None
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
cudnn_benchmark = False
total_iters = 1100002
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True),
pr10k3=dict(type='PR', num_images=10000, k=3))
_base_ = [
'../_base_/datasets/ffhq_flip.py',
'../_base_/models/stylegan/stylegan2_base.py',
'../_base_/default_runtime.py'
]
model = dict(
type='MSPIEStyleGAN2',
generator=dict(
type='MSStyleGANv2Generator',
head_pos_encoding=dict(type='CSG'),
deconv2conv=True,
up_after_conv=True,
head_pos_size=(4, 4),
up_config=dict(scale_factor=2, mode='bilinear', align_corners=True),
out_size=256),
discriminator=dict(
type='MSStyleGAN2Discriminator', in_size=256, with_adaptive_pool=True))
train_cfg = dict(
num_upblocks=6,
multi_input_scales=[0, 2, 4],
multi_scale_probability=[0.5, 0.25, 0.25])
data = dict(
samples_per_gpu=3,
train=dict(dataset=dict(imgs_root='./data/ffhq/ffhq_imgs/ffhq_512')))
ema_half_life = 10.
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=5000),
dict(
type='ExponentialMovingAverageHook',
module_keys=('generator_ema', ),
interval=1,
interp_cfg=dict(momentum=0.5**(32. / (ema_half_life * 1000.))),
priority='VERY_HIGH')
]
checkpoint_config = dict(interval=10000, by_epoch=False, max_keep_ckpts=40)
lr_config = None
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
cudnn_benchmark = False
total_iters = 1100002
metrics = dict(
fid50k=dict(
type='FID',
num_images=50000,
inception_pkl='work_dirs/inception_pkl/ffhq-256-50k-rgb.pkl',
bgr2rgb=True),
pr10k3=dict(type='PR', num_images=10000, k=3))
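The `train_cfg` blocks above drive MS-PIE's multi-scale training: each iteration samples one input-scale offset from `multi_input_scales` according to `multi_scale_probability`, and the generator's output size grows with the constant input accordingly. A hypothetical numpy sketch of that sampling; the 4-pixel base input and the `2 ** num_upblocks` upsampling factor are assumptions about the generator's internals, not values read from these configs:

```python
import numpy as np

rng = np.random.default_rng(0)

base = 4                                  # assumed constant-input size (4x4)
num_upblocks = 6                          # from train_cfg above
multi_input_scales = [0, 2, 4]            # offsets added to the base input
multi_scale_probability = [0.5, 0.25, 0.25]

# One training iteration: pick an offset, derive the output resolution.
offset = rng.choice(multi_input_scales, p=multi_scale_probability)
out_size = (base + offset) * 2 ** num_upblocks

print(out_size)  # one of 256, 384, 512
```

Under these assumptions the offsets 0, 2, and 4 correspond to the 256, 384, and 512 output scales reported in the tables, with the 256 scale sampled half of the time.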