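# InstantID

## Paper

**InstantID: Zero-shot Identity-Preserving Generation in Seconds**

* https://arxiv.org/pdf/2401.07519.pdf

## Model Architecture

The model is built on Stable Diffusion. The outputs of `IdentityNet` (extracts facial features), `IPA` (`Projection + FaceEmbedding + Cross Attention`, produces the face prompt), and `Text Encoder + Text Embedding + Cross Attention` (produces the text prompt) serve as the control conditions for the `Unet`.

![Alt text](readme_imgs/image-1.png)

## Algorithm

Building on the idea behind `ControlNet`, the algorithm adds `IdentityNet` and `IPA` to obtain the facial features and the prompt, and feeds them to the `Unet` as control conditions. In detail:

1. IdentityNet

Only five facial keypoints (two for the eyes, one for the nose, two for the mouth) are used as the conditional input. The text prompt is dropped, and the ID embedding is used instead as the condition for the cross-attention layers in ControlNet.

![Alt text](readme_imgs/image-2.png)

2. IPA

A lightweight adapter module with its own separate cross-attention is introduced so that an image can be used as a prompt.

## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

    # Replace <absolute_path_to_project> with the absolute path of this project
    # and <image_id> with the ID of the image pulled above
    docker run --shm-size 10g --network=host --name=instantid --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute_path_to_project>:/home/ -it <image_id> bash

    pip install -r requirements.txt

    wget https://cancon.hpccube.com:65024/directlink/4/migraphx/DAS1.1/migraphx-4.3.0+das1.1.git0a2480f.abi0.dtk2404-cp310-cp310-linux_x86_64.run

    bash migraphx-4.3.0+das1.1.git0a2480f.abi0.dtk2404-cp310-cp310-linux_x86_64.run

### Dockerfile (Method 2)

    # Must be run from the corresponding directory
    docker build -t <image_name>:<tag> .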
    # Replace <image_id> with the ID of the Docker image obtained above
    # and <absolute_path_to_project> with the absolute path of this project
    docker run --shm-size 10g --network=host --name=instantid --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute_path_to_project>:/home/ -it <image_id> bash

    pip install -r requirements.txt

    wget https://cancon.hpccube.com:65024/directlink/4/migraphx/DAS1.1/migraphx-4.3.0+das1.1.git0a2480f.abi0.dtk2404-cp310-cp310-linux_x86_64.run

    bash migraphx-4.3.0+das1.1.git0a2480f.abi0.dtk2404-cp310-cp310-linux_x86_64.run

### Anaconda (Method 3)

1. The DCU-specific deep learning libraries this project needs can be downloaded and installed from the Guanghe developer community (光合开发者社区): https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04.1
    python: python3.10
    torch: 2.1.0
    torchvision: 0.16.0
    onnxruntime: 1.15.0
    migraphx: migraphx-4.3.0

Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match each other exactly.

2. Install the remaining, non-DCU-specific libraries from requirements.txt:

    pip install -r requirements.txt

Note: if you hit a runtime error related to `Numpy`, run `pip install numpy==1.26.0`.

## Dataset

None

## Training

None

## Inference

### Model Download

|URL|Name|Required|
|:---:|:---:|:---:|
|[huggingface](https://huggingface.co/InstantX/InstantID) \| [SCNet] |InstantID|Yes|
|[huggingface](https://huggingface.co/wangqixun/YamerMIX_v8/tree/main) \| [SCNet] |YamerMIX_v8|Yes|
|[Google Drive](https://drive.google.com/file/d/18wEUfMNohBJ4K3Ly5wpTejPfDzp-8fI8/view) \| [SCNet] |antelopev2|Yes|
|[huggingface](https://huggingface.co/latent-consistency/lcm-lora-sdxl/tree/main) \| [SCNet] |pytorch_lora_weights|Yes|
|[huggingface](https://huggingface.co/FMNing/sd-control-facenet/tree/main) \| [SCNet] |facenet|No|
|[huggingface](https://hf-mirror.com/diffusers/controlnet-canny-sdxl-1.0/tree/main) \| [SCNet] |controlnet-canny-sdxl-1.0|No|
|[huggingface](https://hf-mirror.com/diffusers/controlnet-depth-sdxl-1.0-small/tree/main) \| [SCNet] |controlnet-depth-sdxl-1.0-small|No|
|[huggingface](https://hf-mirror.com/Intel/dpt-hybrid-midas/tree/main) \| [SCNet] |dpt-hybrid-midas|No|
|[huggingface](https://huggingface.co/lllyasviel/ControlNet/tree/main/annotator/ckpts) \| [SCNet] |annotator|No|
|[huggingface](https://huggingface.co/thibaud/controlnet-openpose-sdxl-1.0/tree/main) \| [SCNet] |controlnet-openpose-sdxl-1.0|No|

Note: if huggingface is unreachable, use the mirror at https://hf-mirror.com/ instead. Models marked `No` in the `Required` column are only needed by `app-multicontrolnet.py`.

After downloading, arrange the files as follows:

    models/
    └── antelopev2
        ├── 1k3d68.onnx
        ├── 2d106det.onnx
        ├── genderage.onnx
        ├── glintr100.onnx
        └── scrfd_10g_bnkps.onnx

    checkpoints/
    ├── ControlNetModel
    │   ├── config.json
    │   └── diffusion_pytorch_model.safetensors
    ├── diffusers
    │   ├── controlnet-canny-sdxl-1.0
    │   │   ├── config.json
    │   │   └── diffusion_pytorch_model.bin
    │   └── controlnet-depth-sdxl-1.0-small
    │       ├── config.json
    │       └── diffusion_pytorch_model.bin
    ├── Intel
    │   └── dpt-hybrid-midas
    │       ├── config.json
    │       ├── preprocessor_config.json
    │       └── pytorch_model.bin
    ├── ip-adapter.bin
    ├── lllyasviel
    │   └── ControlNet
    │       └── annotator
    │           └── ckpts
    │               ├── body_pose_model.pth
    │               ├── facenet.pth
    │               └── hand_pose_model.pth
    ├── pytorch_lora_weights.safetensors
    ├── thibaud
    │   └── controlnet-openpose-sdxl-1.0
    │       ├── config.json
    │       └── diffusion_pytorch_model.bin
    └── wangqixun
        └── YamerMIX_v8
            ├── model_index.json
            ├── scheduler
            │   └── scheduler_config.json
            ├── text_encoder
            │   ├── config.json
            │   └── model.safetensors
            ├── text_encoder_2
            │   ├── config.json
            │   └── model.safetensors
            ├── tokenizer
            │   ├── merges.txt
            │   ├── special_tokens_map.json
            │   ├── tokenizer_config.json
            │   └── vocab.json
            ├── tokenizer_2
            │   ├── merges.txt
            │   ├── special_tokens_map.json
            │   ├── tokenizer_config.json
            │   └── vocab.json
            ├── unet
            │   ├── config.json
            │   └── diffusion_pytorch_model.safetensors
            └── vae
                ├── config.json
                └── diffusion_pytorch_model.safetensors

### Commands

    python gradio_demo/app.py

    # Multiple ControlNets
    python gradio_demo/app-multicontrolnet.py

## Result

#### app

Input image on the left, output image on the right.

![Alt text](readme_imgs/image-3.png)

#### app-multicontrolnet

Input image on the left, output image on the right.

![Alt text](readme_imgs/image-4.png)

### Accuracy

None

## Application Scenarios

### Algorithm Category

`AIGC`

### Target Industries

`Retail, media, design`

## Source Repository and Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/instantid_pytorch

## References

* https://github.com/InstantID/InstantID
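
A missing weight file usually only surfaces after the demo has started loading, so it can help to check the `models/` and `checkpoints/` layout from the model download section before launching. The following is a minimal sketch, not part of the project: the helper name `find_missing` and the `REQUIRED_FILES` list are hypothetical, the list covers only a subset of the files marked as required, and the script assumes it is run from the directory that contains `models/` and `checkpoints/`.

```python
from pathlib import Path

# Hypothetical list: a subset of the files the model table marks as required.
# Paths are copied from the directory tree in the model download section.
REQUIRED_FILES = [
    "models/antelopev2/1k3d68.onnx",
    "models/antelopev2/2d106det.onnx",
    "models/antelopev2/genderage.onnx",
    "models/antelopev2/glintr100.onnx",
    "models/antelopev2/scrfd_10g_bnkps.onnx",
    "checkpoints/ControlNetModel/config.json",
    "checkpoints/ControlNetModel/diffusion_pytorch_model.safetensors",
    "checkpoints/ip-adapter.bin",
    "checkpoints/pytorch_lora_weights.safetensors",
    "checkpoints/wangqixun/YamerMIX_v8/model_index.json",
    "checkpoints/wangqixun/YamerMIX_v8/unet/diffusion_pytorch_model.safetensors",
    "checkpoints/wangqixun/YamerMIX_v8/vae/diffusion_pytorch_model.safetensors",
]


def find_missing(root, relative_paths=REQUIRED_FILES):
    """Return the entries of relative_paths that are not regular files under root."""
    base = Path(root)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the current directory contains models/ and checkpoints/.
    missing = find_missing(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed files found.")
```

To also cover `app-multicontrolnet.py`, the optional files from the same tree (for example `checkpoints/thibaud/controlnet-openpose-sdxl-1.0/config.json`) can be appended to the list.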