# CodeFormer

## Paper

**Towards Robust Blind Face Restoration with Codebook Lookup Transformer**

* https://arxiv.org/abs/2206.11253

## Model Structure

As shown in the figure, the method consists of two stages, (a) and (b), each with its own model. Stage (a) involves $`I_h`$ (the high-quality image), $`E_H`$ (the high-quality image encoder), $`Z_h`$ (the encoded features), $`C`$ (the codebook, which stores quantized features), $`S`$ (the indices obtained by nearest-neighbor matching), $`Z_c`$ (the quantized version of $`Z_h`$), $`D_H`$ (the decoder, which reconstructs the original high-quality image from the features), and $`I_{rec}`$ (the reconstructed high-quality image). Stage (b) involves $`I_l`$ (the low-quality image to be restored), $`E_L`$ (the low-quality image encoder, fine-tuned from $`E_H`$), $`Z_l`$ (the encoded features), $`T`$ (the Transformer module, which models global dependencies and predicts the codebook index for each feature), $`C`$ (the fixed pre-trained codebook), $`\hat{Z}_c`$ (the quantized features), $`D_H`$ (the fixed pre-trained decoder), and $`I_{res}`$ (the restored high-quality image). In addition, $`F_e`$ denotes the low-quality image features, $`CFT`$ the controllable feature transformation, and $`F_d`$ the decoder features.

![Alt text](readme_images/image-1.png)
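
At the heart of stage (a) is the nearest-neighbor lookup against the learned codebook: each encoder feature vector is replaced by its closest code entry, and the decoder reconstructs the image from the quantized result. Below is a minimal PyTorch sketch of that lookup; the tensor shapes, codebook size, and function name are illustrative assumptions rather than the project's actual implementation.

    import torch

    def codebook_lookup(z_h: torch.Tensor, codebook: torch.Tensor):
        """Quantize encoder features Z_h against the codebook C.

        z_h:      (B, C, H, W) features produced by the HQ encoder E_H
        codebook: (N, C) learned code entries
        returns:  quantized features Z_c (same shape as z_h) and the index map S of shape (B, H, W)
        """
        b, c, h, w = z_h.shape
        flat = z_h.permute(0, 2, 3, 1).reshape(-1, c)   # (B*H*W, C)
        dist = torch.cdist(flat, codebook)              # distance to every code entry, (B*H*W, N)
        s = dist.argmin(dim=1)                          # nearest-neighbor indices
        z_c = codebook[s].reshape(b, h, w, c).permute(0, 3, 1, 2)
        return z_c, s.reshape(b, h, w)

    # toy usage: 1024 code entries of dimension 256 and a 16x16 feature map
    codebook = torch.randn(1024, 256)
    z_h = torch.randn(1, 256, 16, 16)
    z_c, s = codebook_lookup(z_h, codebook)             # z_c is what the decoder D_H consumes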

## Algorithm

The algorithm combines the ideas of a codebook and a Transformer to restore low-quality face images to high-quality ones. The details are as follows:


1. Codebook

It stores a massive amount of prior knowledge in a discretized form.

![Alt text](readme_images/image-2.png)

2. Transformer

Under diverse degradations, the features of a low-quality image may deviate from the correct code indices and be assigned to nearby clusters, which leads to undesirable restoration results. Modeling the global relationships with a Transformer module alleviates this problem.
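
Conceptually, the Transformer $`T`$ treats index prediction as a classification problem over the $`N`$ codebook entries: the LQ features $`Z_l`$ are flattened into a token sequence, and each token receives a distribution over the code entries. The sketch below illustrates this idea; the layer sizes and class name are illustrative assumptions, not the project's actual network.

    import torch
    import torch.nn as nn

    class CodeIndexPredictor(nn.Module):
        """Toy stand-in for the Transformer T: LQ features -> codebook indices."""

        def __init__(self, dim=256, codebook_size=1024, layers=4, heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=layers)
            self.to_logits = nn.Linear(dim, codebook_size)

        def forward(self, z_l):                              # z_l: (B, C, H, W)
            b, c, h, w = z_l.shape
            tokens = z_l.flatten(2).transpose(1, 2)          # (B, H*W, C) token sequence
            return self.to_logits(self.transformer(tokens))  # logits over code entries, (B, H*W, N)

    # toy usage: argmax indices retrieve entries from the fixed codebook
    predictor = CodeIndexPredictor()
    codebook = torch.randn(1024, 256)
    logits = predictor(torch.randn(1, 256, 16, 16))
    z_c_hat = codebook[logits.argmax(dim=-1)]                # (1, 256, 256), reshaped and decoded by D_H

During training the logits would be supervised with cross-entropy against the ground-truth indices obtained in stage (a); at inference the argmax indices select the code entries that the fixed decoder $`D_H`$ turns back into an image.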

## Environment Setup

### Docker

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

    docker run --shm-size 10g --network=host --name=codeformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install cython

    python basicsr/setup.py develop

    # Optional: only needed for video enhancement
    apt update && apt install ffmpeg

### Dockerfile

    # Run this in the directory containing the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    # Replace <your IMAGE ID> with the ID of the image built above
    docker run -it --shm-size 10g --network=host --name=codeformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal  <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install cython

    python basicsr/setup.py develop

    # Optional: only needed for video enhancement
    apt update && apt install ffmpeg

## Dataset

[github](https://github.com/NVlabs/ffhq-dataset)

Note: the original images are `1024x1024` and need to be resized to `512x512`; you can run `data_process.py` to process the data.

    python data_process.py --zip_dir <path/to/zipdir> --output_dir <dataset/ffhq_512>
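
If you already have the extracted `1024x1024` PNGs in a plain folder, the resizing step itself is just a downscale; below is a minimal Pillow sketch of it (the folder paths are hypothetical, and the provided `data_process.py` additionally reads the images from the FFHQ zip archives, as its `--zip_dir` argument suggests).

    from pathlib import Path
    from PIL import Image

    src = Path("dataset/ffhq_1024")   # hypothetical folder of extracted 1024x1024 PNGs
    dst = Path("dataset/ffhq_512")
    dst.mkdir(parents=True, exist_ok=True)

    for png in sorted(src.glob("*.png")):
        img = Image.open(png).convert("RGB")
        img.resize((512, 512), Image.Resampling.LANCZOS).save(dst / png.name)  # Pillow >= 9.1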

This project also provides a mini dataset:

[Baidu (extraction code: kwai)](https://pan.baidu.com/s/1HmaiwJygJMg7O-F9VaXURQ)

It can be used to verify that the program runs correctly.

    dataset
    └── ffhq_512
        ├── 00522.png
        ├── 01459.png
        ├── 02090.png
        ├── xxx.png

## Training

### Stage 1 - Train the VQGAN

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 basicsr/train.py -opt options/VQGAN_512_ds32_nearest_stage1.yml --launcher pytorch

Pre-compute the codebook sequences of the training data to speed up the later training stages:

    python scripts/generate_latent_gt.py
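
Conceptually, this step runs the fixed stage-1 VQGAN encoder over every HQ training image once and caches the resulting index map, so that stage 2 can supervise the Transformer without re-encoding the ground truth each iteration. A minimal sketch of that caching idea is shown below; the function name, output file, and the way the encoder/codebook are obtained are hypothetical, and the actual logic lives in `scripts/generate_latent_gt.py`.

    import numpy as np
    import torch

    @torch.no_grad()
    def cache_code_indices(encoder, codebook, images, out_path="latent_gt.npy"):
        """encoder/codebook: fixed stage-1 VQGAN parts; images: (B, 3, 512, 512) HQ batch."""
        z_h = encoder(images)                                   # (B, C, H, W)
        flat = z_h.permute(0, 2, 3, 1).reshape(-1, z_h.shape[1])
        s = torch.cdist(flat, codebook).argmin(dim=1)           # nearest-neighbor code indices
        s = s.reshape(images.shape[0], -1).cpu().numpy()        # (B, H*W) ground-truth sequence
        np.save(out_path, s)                                    # cached for stage-2 supervision
        return s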

### Stage 2 - Train the Transformer (w = 0)

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4322 basicsr/train.py -opt options/CodeFormer_stage2.yml --launcher pytorch

### Stage 3 - Train the controllable feature transformation module (w = 1)

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4323 basicsr/train.py -opt options/CodeFormer_stage3.yml --launcher pytorch


Note: the `vqgan` checkpoint `net_g_xxx.pth` is generated in stage 2, while `net_d` is generated in stage 1.

## Inference

### Model Download

[github](https://github.com/sczhou/CodeFormer/releases/tag/v0.1.0)

    weights
    ├── CodeFormer
    │   ├── codeformer_colorization.pth
    │   ├── codeformer_inpainting.pth
    │   └── codeformer.pth
    ├── dlib
    │   ├── mmod_human_face_detector-4cb19393.dat
    │   ├── shape_predictor_5_face_landmarks-c4b1e980.dat
    │   └── shape_predictor_68_face_landmarks-fbdc2cb8.dat
    ├── facelib
    │   ├── detection_Resnet50_Final.pth
    │   └── parsing_parsenet.pth
    ├── README.md
    └── realesrgan
        └── RealESRGAN_x2plus.pth

You can also download the models with the provided script:

    python scripts/download_pretrained_models.py facelib

    # Only needed for the dlib face detector
    python scripts/download_pretrained_models.py dlib

    python scripts/download_pretrained_models.py CodeFormer

### Face Restoration

    # Crop and align the face regions in the images
    python scripts/crop_align_face.py -i [input folder] -o [output folder]

    # Restore the cropped faces
    python inference_codeformer.py -w 0.5 --has_aligned --input_path [image folder]|[image path]

Note: the parameter `-w` is the fidelity weight with a value between 0 and 1. In general, a smaller `w` tends to produce higher-quality results, while a larger `w` yields results that are more faithful to the input.
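
Under the hood, `w` scales the controllable feature transformation ($`CFT`$) that injects the encoder features $`F_e`$ into the decoder features $`F_d`$: with `w = 0` the decoder relies purely on the codebook prediction (quality), and with `w = 1` it leans fully on the input features (fidelity). The sketch below is a conceptual illustration of that blending; the single-convolution parameter predictor and class name are simplifications, not the project's actual module.

    import torch
    import torch.nn as nn

    class ControllableFeatureTransform(nn.Module):
        """Conceptual CFT block: F_d' = F_d + w * (alpha * F_d + beta)."""

        def __init__(self, channels=256):
            super().__init__()
            # predicts the affine parameters (alpha, beta) from concat(F_e, F_d)
            self.predict = nn.Conv2d(channels * 2, channels * 2, kernel_size=3, padding=1)

        def forward(self, f_d, f_e, w: float):
            alpha, beta = self.predict(torch.cat([f_e, f_d], dim=1)).chunk(2, dim=1)
            return f_d + w * (alpha * f_d + beta)

    # w = 0 keeps only the decoder path; w = 1 applies the full input-feature modulation
    cft = ControllableFeatureTransform()
    f_d, f_e = torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64)
    out = cft(f_d, f_e, w=0.5)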

### Whole Image Enhancement

    # For whole image
    # Add '--bg_upsampler realesrgan' to enhance the background regions with Real-ESRGAN
    # Add '--face_upsample' to further upsample the restored faces with Real-ESRGAN
    python inference_codeformer.py -w 0.7 --input_path [image folder]|[image path]

### Video Enhancement

    # For video clips
    # Video path should end with '.mp4'|'.mov'|'.avi'
    python inference_codeformer.py --bg_upsampler realesrgan --face_upsample -w 1.0 --input_path [video path]

### Face Colorization

    # For cropped and aligned faces (512x512)
    # Colorize black and white or faded photo
    python inference_colorization.py --input_path [image folder]|[image path]

### Face Inpainting (restoring occluded faces)

    # For cropped and aligned faces (512x512)
    # Inputs can be masked with a white brush using an image editing app (e.g., Photoshop)
    # (check out the examples in inputs/masked_faces)
    python inference_inpainting.py --input_path [image folder]|[image path]

## Results

![Alt text](readme_images/image-3.png)

### Accuracy

| Platform | LPIPS | PSNR | SSIM | FID |
|----------|-------|------|------|-----|
|DCU| 0.306 | 8.639 | 0.636 | 123.14 |
|GPU| 0.312 | 8.694 | 0.635 | 124.7  |

Note: these numbers are provided only to compare the metric differences between DCU and GPU.
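
For reference, per-image LPIPS/PSNR/SSIM values can be reproduced with standard libraries; below is a minimal sketch for a single restored/ground-truth pair (the `lpips` pip package and scikit-image >= 0.19 are assumed to be installed, the file paths are hypothetical, and the LPIPS backbone used for the table above is not specified here).

    import lpips
    import numpy as np
    import torch
    from PIL import Image
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    restored = np.array(Image.open("results/00522.png").convert("RGB"))
    gt = np.array(Image.open("dataset/ffhq_512/00522.png").convert("RGB"))

    psnr = peak_signal_noise_ratio(gt, restored, data_range=255)
    ssim = structural_similarity(gt, restored, channel_axis=-1, data_range=255)

    def to_tensor(img):
        # HWC uint8 -> NCHW float in [-1, 1], the input range expected by lpips
        return torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 127.5 - 1.0

    loss_fn = lpips.LPIPS(net="alex")
    lpips_score = loss_fn(to_tensor(restored), to_tensor(gt)).item()

    print(f"PSNR {psnr:.3f}  SSIM {ssim:.3f}  LPIPS {lpips_score:.3f}")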

## Application Scenarios

### Algorithm Category

`Image Super-Resolution`

### Key Application Industries

`Media, Research, Education`

## Source Repository & Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/codeformer_pytorch

## References

* https://github.com/sczhou/CodeFormer

* https://github.com/NVlabs/ffhq-dataset