# PhotoMaker

## Paper

**PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding**

* https://arxiv.org/pdf/2312.04461.pdf

## Model Architecture

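The model consists of the following components:

* `Image Encoder`: encodes each input image into an embedding. It is built on CLIP's image encoder, with an additional projection layer that changes the embedding dimension.
* `Text Encoder`: encodes the prompt into text embeddings.
* `MLP`: fuses the `class word embedding` (the blue block in the figure) with each `Image Embedding`.
* `Stacked ID Embedding`: obtained by concatenating the embeddings produced by the `MLP`.
* `Updated Text Embedding`: obtained by replacing the `class word embedding` with the `Stacked ID Embedding`.
* `Diffusion Model`: the diffusion model that generates the final result.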

![Alt text](readme_imgs/image-1.png)
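
The data flow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not PhotoMaker's code: the dimensions, the random weights, the two-layer ReLU MLP and the function names (`project`, `fuse_mlp`, `update_text_embedding`) are all hypothetical. The real text encoders also work on a fixed-length token sequence, which this sketch ignores by simply splicing the stacked embeddings in place of the class-word token.

```python
import numpy as np

# Toy sizes chosen for illustration only; the real model's dimensions differ.
D_CLIP, D_TEXT, HIDDEN = 16, 8, 32
rng = np.random.default_rng(0)

def project(image_feat, w_proj):
    """Hypothetical extra mapping layer on the image encoder: D_CLIP -> D_TEXT."""
    return image_feat @ w_proj

def fuse_mlp(class_emb, id_emb, w1, w2):
    """Toy two-layer MLP fusing the class-word embedding with one image embedding."""
    h = np.maximum(np.concatenate([class_emb, id_emb]) @ w1, 0.0)  # ReLU
    return h @ w2                                                    # -> D_TEXT

def update_text_embedding(text_emb, class_idx, image_feats, params):
    """Replace the class-word token with the stacked ID embedding."""
    class_emb = text_emb[class_idx]
    # One fused embedding per input ID image, stacked along the token axis.
    stacked = np.stack([
        fuse_mlp(class_emb, project(f, params["proj"]), params["w1"], params["w2"])
        for f in image_feats
    ])
    # Tokens before the class word + N fused tokens + tokens after it.
    return np.concatenate([text_emb[:class_idx], stacked, text_emb[class_idx + 1:]])

params = {
    "proj": rng.standard_normal((D_CLIP, D_TEXT)),
    "w1": rng.standard_normal((2 * D_TEXT, HIDDEN)),
    "w2": rng.standard_normal((HIDDEN, D_TEXT)),
}
text_emb = rng.standard_normal((5, D_TEXT))     # stands in for "a photo of a man"
image_feats = rng.standard_normal((3, D_CLIP))  # three images of the same person
updated = update_text_embedding(text_emb, 4, image_feats, params)  # "man" is token 4
print(updated.shape)  # (7, 8)
```

With three ID images, the five-token sequence grows to seven tokens, and the tokens around the class word are left unchanged.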

## Algorithm Principle

The algorithm combines Stable Diffusion with the Stacked ID Embedding to generate images that preserve the characteristics of the input person, as follows:

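1. Stacked ID Embedding

By incorporating the feature vector of the class word, this embedding represents the input ID images more comprehensively. In addition, at inference time this fusion gives stronger semantic control over the customized generation process.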

![Alt text](readme_imgs/image-2.png)

## Environment Setup

### Docker (Option 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
    docker run --shm-size 10g --network=host --name=photomaker --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash

    pip install -r requirements.txt
    pip install .

### Dockerfile (Option 2)

    # Run from the project directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .
    # Replace <your IMAGE ID> with the ID of the image built above
    docker run --shm-size 10g --network=host --name=photomaker --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash

    pip install -r requirements.txt
    pip install .


## Dataset



## Inference

### Command

    python gradio_demo/app.py

Note: to use a LoRA, add the following code to `gradio_demo/app.py`:

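    import os  # if app.py does not already import it

    # Place after `pipe` is created and the PhotoMaker adapter is loaded
    print("Loading lora...")
    lora_path = "./lora/SDXL-Caricaturized-Lora.safetensors"

    # load_lora_weights takes the directory and the file name separately
    pipe.load_lora_weights(os.path.dirname(lora_path), weight_name=os.path.basename(lora_path), adapter_name="xl_more_art-full")

    # Activate both adapters: PhotoMaker at full strength, the style LoRA at 0.5
    pipe.set_adapters(["photomaker", "xl_more_art-full"], adapter_weights=[1.0, 0.5])

    # Merge the active adapters into the base model weights
    pipe.fuse_lora()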
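
`set_adapters` gives each loaded adapter a weight, and `fuse_lora` folds the active adapters into the base weights. Under the standard LoRA formulation, a layer with base weight `W` and adapter factors `B_i`, `A_i` (rank `r_i`, scale `alpha_i`) ends up roughly as `W + sum_i w_i * (alpha_i / r_i) * B_i @ A_i`, where `w_i` is the adapter weight (1.0 and 0.5 above). The NumPy sketch below illustrates that arithmetic for a single linear layer. It is not diffusers' internal code; the helper `fuse_lora_weights` and all shapes, ranks and alphas are hypothetical.

```python
import numpy as np

def fuse_lora_weights(w_base, adapters, adapter_weights):
    """Fold weighted LoRA updates into a copy of one base weight matrix.

    adapters: list of (B, A, alpha) with B of shape (out, r) and A of shape (r, in).
    adapter_weights: one multiplier per adapter, like set_adapters' adapter_weights.
    """
    fused = w_base.copy()
    for (b, a, alpha), weight in zip(adapters, adapter_weights):
        rank = a.shape[0]
        fused += weight * (alpha / rank) * (b @ a)  # scaled low-rank update
    return fused

rng = np.random.default_rng(0)
out_dim, in_dim = 6, 4
w_base = rng.standard_normal((out_dim, in_dim))
# Hypothetical adapters standing in for "photomaker" and "xl_more_art-full".
photomaker = (rng.standard_normal((out_dim, 2)), rng.standard_normal((2, in_dim)), 2.0)
style = (rng.standard_normal((out_dim, 4)), rng.standard_normal((4, in_dim)), 4.0)

fused = fuse_lora_weights(w_base, [photomaker, style], [1.0, 0.5])

# Unfused reference: base path plus each adapter's weighted low-rank path.
x = rng.standard_normal(in_dim)
y_ref = (w_base @ x
         + 1.0 * (2.0 / 2) * (photomaker[0] @ (photomaker[1] @ x))
         + 0.5 * (4.0 / 4) * (style[0] @ (style[1] @ x)))
print(np.allclose(fused @ x, y_ref))  # True
```

Fusing trades flexibility for speed: after the merge, the layer no longer needs the extra low-rank matrix products at inference time.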

## Result

![Alt text](readme_imgs/image-3.png)

### Accuracy


## Application Scenarios

### Algorithm Category

`AIGC`

### Key Industries

`Retail, Broadcast Media, Design`

## Pretrained Weights

[PhotoMaker](https://huggingface.co/TencentARC/PhotoMaker) | [SD](https://hf-mirror.com/SG161222/RealVisXL_V3.0/tree/main) | [Lora](https://huggingface.co/Norod78/SDXL-Caricaturized-Lora/tree/main)

    pretrained_models/
        └── photomaker-v1.bin

    SG161222/
    ├── model_index.json
    ├── RealVisXL_V3.0.safetensors
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   └── model.fp16.safetensors
    ├── text_encoder_2
    │   ├── config.json
    │   └── model.fp16.safetensors
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── tokenizer_2
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   └── diffusion_pytorch_model.fp16.safetensors
    └── vae
        ├── config.json
        └── diffusion_pytorch_model.fp16.safetensors
    
    # Optional: used to change the image style
    lora/
    └── SDXL-Caricaturized-Lora.safetensors
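
Before launching the demo, it can help to confirm that the downloads match this layout. The script below is a hypothetical helper, not part of the repository; it assumes that `pretrained_models/`, `SG161222/` and `lora/` all sit in the project root, which the layout suggests but does not state explicitly.

```python
from pathlib import Path

# Files from the layout above, relative to the project root (assumed location).
REQUIRED = [
    "pretrained_models/photomaker-v1.bin",
    "SG161222/model_index.json",
    "SG161222/RealVisXL_V3.0.safetensors",
    "SG161222/scheduler/scheduler_config.json",
    *[f"SG161222/{enc}/{name}"
      for enc in ("text_encoder", "text_encoder_2")
      for name in ("config.json", "model.fp16.safetensors")],
    *[f"SG161222/{tok}/{name}"
      for tok in ("tokenizer", "tokenizer_2")
      for name in ("merges.txt", "special_tokens_map.json",
                   "tokenizer_config.json", "vocab.json")],
    *[f"SG161222/{sub}/{name}"
      for sub in ("unet", "vae")
      for name in ("config.json", "diffusion_pytorch_model.fp16.safetensors")],
]
# Only needed for the optional style-changing LoRA.
OPTIONAL = ["lora/SDXL-Caricaturized-Lora.safetensors"]

def missing_files(root, paths):
    """Return the entries of `paths` that are not regular files under `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]

def report(root="."):
    """Print what is missing; return True when every required file is present."""
    missing = missing_files(root, REQUIRED)
    for p in missing:
        print(f"missing: {p}")
    for p in missing_files(root, OPTIONAL):
        print(f"optional, not found: {p}")
    return not missing

if __name__ == "__main__":
    report(".")
```

Run it from the project root; an empty output means every file in the layout, including the optional LoRA, was found.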

## Source Repository and Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/photomaker

## References

* https://github.com/TencentARC/PhotoMaker