# CodeFormer

## Paper

**Towards Robust Blind Face Restoration with Codebook Lookup Transformer**

* https://arxiv.org/abs/2206.11253

## Model Structure

As shown in the figure, the method consists of two stages, (a) and (b), each using a different model.

Stage (a) contains $`I_h`$ (high-quality image), $`E_h`$ (high-quality image encoder), $`Z_h`$ (encoded features), $`C`$ (the codebook, which stores quantized features), $`S`$ (the index sequence obtained by nearest-neighbor matching), $`Z_c`$ (the quantized version of $`Z_h`$), $`D_H`$ (the decoder, which reconstructs the original high-quality image from features), and $`I_{rec}`$ (the reconstructed high-quality image).

Stage (b) contains $`I_l`$ (the low-quality image to be restored), $`E_L`$ (the low-quality image encoder, fine-tuned from the high-quality encoder $`E_h`$), $`Z_l`$ (encoded features), $`T`$ (the Transformer module, which models global feature relations and predicts the codebook index of each feature), $`C`$ (the fixed pre-trained codebook), $`\hat{Z}_c`$ (the quantized features), $`D_H`$ (the fixed pre-trained decoder), and $`I_{res}`$ (the restored high-quality image). In addition, $`F_e`$ denotes the low-quality image features, $`CFT`$ the Controllable Feature Transform, and $`F_d`$ the decoder features.

![Alt text](readme_images/image-1.png)
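The $`CFT`$ blocks above blend the encoder feature $`F_e`$ into the decoder feature $`F_d`$ under the fidelity weight $`w`$. A minimal PyTorch sketch of this idea (the conv layout and channel sizes are illustrative assumptions, not the repository's implementation):

```python
import torch
import torch.nn as nn

class CFT(nn.Module):
    """Sketch of a Controllable Feature Transform: predict a per-pixel
    scale (alpha) and shift (beta) from the concatenated encoder/decoder
    features, then blend with fidelity weight w."""
    def __init__(self, channels):
        super().__init__()
        self.scale = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.shift = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, f_e, f_d, w):
        x = torch.cat([f_e, f_d], dim=1)
        alpha, beta = self.scale(x), self.shift(x)
        # w = 0 keeps the decoder feature untouched (quality-oriented);
        # w = 1 applies the full transform toward the input (fidelity-oriented)
        return f_d + w * (alpha * f_d + beta)
```

With $`w = 0`$ the decoder feature passes through unchanged, which is why Stage 2 below trains with `w = 0` before the transform itself is trained in Stage 3.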

## Algorithm Principle

The algorithm combines the ideas of a codebook and a Transformer to restore low-quality face images to high quality, as follows:


1. Codebook

The codebook stores a large amount of prior knowledge in a discretized form.

![Alt text](readme_images/image-2.png)
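Conceptually, the nearest-neighbor matching of stage (a) maps each encoded feature vector to its closest codebook entry, producing the quantized features $`Z_c`$ and the index sequence $`S`$. A minimal sketch (an illustration, not the repository's implementation):

```python
import torch

def quantize(z, codebook):
    """Nearest-neighbor lookup: replace each feature vector in z with its
    closest codebook entry. z: (n, d) features, codebook: (k, d) entries.
    Returns the quantized features z_q and the index sequence s."""
    dist = torch.cdist(z, codebook)  # (n, k) pairwise distances
    s = dist.argmin(dim=1)           # index of the nearest code per feature
    z_q = codebook[s]                # discrete, quantized features
    return z_q, s
```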

2. Transformer

Under diverse degradations, low-quality image features may deviate from their correct indices and be matched to nearby clusters, yielding undesirable restoration results. Modeling global relations with a Transformer module mitigates this problem.
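This idea can be sketched as a Transformer that classifies each feature position into one of the codebook's entries; all dimensions and layer counts below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class IndexPredictor(nn.Module):
    """Sketch: instead of nearest-neighbor matching, a Transformer models
    global relations among low-quality features and predicts a codebook
    index for every position."""
    def __init__(self, dim, num_codes, num_layers=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(dim, num_codes)  # per-position code logits

    def forward(self, z_l):  # z_l: (batch, positions, dim)
        return self.head(self.encoder(z_l)).argmax(dim=-1)  # predicted indices
```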

## Environment Setup

### Docker

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest

    docker run --shm-size 10g --network=host --name=codeformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install cython

    python basicsr/setup.py develop

    # Optional: only needed for video enhancement
    yum install epel-release -y

    yum localinstall --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-7.noarch.rpm -y

    yum install ffmpeg ffmpeg-devel -y

### Dockerfile

    # Run this in the directory containing the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    # Replace <your IMAGE ID> with the ID of the image built above
    docker run -it --shm-size 10g --network=host --name=codeformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install cython

    python basicsr/setup.py develop

    # Optional: only needed for video enhancement
    yum install epel-release -y

    yum localinstall --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-7.noarch.rpm -y

    yum install ffmpeg ffmpeg-devel -y

## Dataset

Link: https://github.com/NVlabs/ffhq-dataset

    dataset
    |—— ffhq_512
        ├── 00522.png
        ├── 01459.png
        ├── 02090.png
        ├── xxx.png

mashun1's avatar
mashun1 committed
84
Note: the original images are `1024x1024` and must be resized to `512x512`; run `data_process.py` to process the data.
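As a rough illustration of this downscaling step (the repository's `data_process.py` handles the real preprocessing; the interpolation method it uses is not specified here), averaging each 2x2 block of pixels halves both dimensions, so a 1024x1024 grid becomes 512x512:

```python
def downsample_2x(img):
    """Halve each dimension by averaging 2x2 pixel blocks (box filter).
    img is a 2D list of grayscale values with even width and height."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]
```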

This project also provides a mini dataset (link: https://pan.baidu.com/s/1HmaiwJygJMg7O-F9VaXURQ, extraction code: kwai), which can be used to verify that the program runs correctly.

## Training

### Stage 1 - Train the VQGAN

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 basicsr/train.py -opt options/VQGAN_512_ds32_nearest_stage1.yml --launcher pytorch

Extract the codebook index sequences of the training data, which speeds up the subsequent training:

    python scripts/generate_latent_gt.py

### Stage 2 - Train the Transformer (w = 0)

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4322 basicsr/train.py -opt options/CodeFormer_stage2.yml --launcher pytorch

### Stage 3 - Train the Controllable Feature Transform (w = 1)

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4323 basicsr/train.py -opt options/CodeFormer_stage3.yml --launcher pytorch


Note: the VQGAN checkpoints (`net_g_xxx.pth` and the corresponding `net_d` weights) are generated in Stage 1.

## Inference

### Model Download

GitHub: https://github.com/sczhou/CodeFormer/releases/tag/v0.1.0

    weights
    ├── CodeFormer
    │   ├── codeformer_colorization.pth
    │   ├── codeformer_inpainting.pth
    │   └── codeformer.pth
    ├── dlib
    │   ├── mmod_human_face_detector-4cb19393.dat
    │   ├── shape_predictor_5_face_landmarks-c4b1e980.dat
    │   └── shape_predictor_68_face_landmarks-fbdc2cb8.dat
    ├── facelib
    │   ├── detection_Resnet50_Final.pth
    │   └── parsing_parsenet.pth
    ├── README.md
    └── realesrgan
        └── RealESRGAN_x2plus.pth

The models can also be downloaded with the provided script:

    python scripts/download_pretrained_models.py facelib

    python scripts/download_pretrained_models.py dlib  # only needed for the dlib face detector

    python scripts/download_pretrained_models.py CodeFormer

### Face Restoration

    # Crop and align the faces in the images
    python scripts/crop_align_face.py -i [input folder] -o [output folder]

    # Restore the cropped faces
    python inference_codeformer.py -w 0.5 --has_aligned --input_path [image folder]|[image path]

Note: the parameter `-w` is the fidelity weight, ranging from 0 to 1. In general, a smaller `w` tends to produce higher-quality results, while a larger `w` yields results more faithful to the input.

### Whole-Image Enhancement

    # For whole image
    # Add '--bg_upsampler realesrgan' to enhance the background regions with Real-ESRGAN
    # Add '--face_upsample' to further upsample the restored faces with Real-ESRGAN
    python inference_codeformer.py -w 0.7 --input_path [image folder]|[image path]

### Video Enhancement

    # For video clips
    # Video path should end with '.mp4'|'.mov'|'.avi'
    python inference_codeformer.py --bg_upsampler realesrgan --face_upsample -w 1.0 --input_path [video path]

### Face Colorization

    # For cropped and aligned faces (512x512)
    # Colorize black and white or faded photo
    python inference_colorization.py --input_path [image folder]|[image path]

### Face Inpainting (restoring occluded face regions)

    # For cropped and aligned faces (512x512)
    # Inputs could be masked by white brush using an image editing app (e.g., Photoshop) 
    # (check out the examples in inputs/masked_faces)
    python inference_inpainting.py --input_path [image folder]|[image path]

## Results

![Alt text](readme_images/image-3.png)

### Accuracy

|     | LPIPS | PSNR  | SSIM  | FID    |
|-----|-------|-------|-------|--------|
| DCU | 0.306 | 8.639 | 0.636 | 123.14 |
| GPU | 0.312 | 8.694 | 0.635 | 124.7  |

Note: these figures are provided only to compare the metric differences between DCU and GPU.

## Application Scenarios

### Algorithm Category

`Image Super-Resolution`

### Key Application Industries

`Media, Research, Education`

## Source Repository and Issue Reporting

* https://developer.hpccube.com/codes/modelzoo/codeformer_pytorch

## References

* https://github.com/guoyww/AnimateDiff/tree/main

* https://github.com/NVlabs/ffhq-dataset