*.safetensors filter=lfs diff=lfs merge=lfs -text
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
test.py
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
/package
/temp
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.vscode
.idea
# custom
*.pkl
*.pkl.json
*.log.json
*.whl
*.tar.gz
*.swp
*.log
source.sh
tensorboard.sh
.DS_Store
replace.sh
result.png
lora_result.png
result.jpg
result.mp4
# Pytorch
*.pth
*.pt
# ast template
ast_index_file.py
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# FaceChain
The steps in this project cover digital-twin portrait generation with FaceChain: fine-tuning and inference are launched from the web page, and the other generation effects described in the paper follow the same procedure. The original algorithm is provided by Alibaba.
## Papers
`FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content`
- https://arxiv.org/pdf/2308.14256.pdf
`High-Resolution Image Synthesis with Latent Diffusion Models`
- https://arxiv.org/pdf/2112.10752.pdf
## Model Structure
The core algorithm of this project is Stable Diffusion. Processing pipeline: an input sample x is encoded by the autoencoder's encoder into a latent z; the step-by-step noising of the diffusion model operates entirely in this latent space and finally yields z_T, which approximately follows an isotropic Gaussian distribution. On the right of the figure, various kinds of conditioning information (text, images, labels, semantic maps) can be attached to guide the reverse (denoising) generation process, so the approach adapts to many kinds of tasks.
<div align=center>
<img src="./doc/LDM.png"/>
</div>
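The same text-to-image path can be exercised directly with the `diffusers` library; the sketch below is purely illustrative (the checkpoint name is an example and is not the base model FaceChain downloads from ModelScope):
```python
# Minimal latent-diffusion text-to-image sketch (illustrative only, not FaceChain code).
import torch
from diffusers import StableDiffusionPipeline

# Example public checkpoint; FaceChain pulls its own base models from ModelScope instead.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt is the conditioning signal that guides the reverse (denoising) process in
# latent space; the VAE decoder then maps the final latent back to pixel space.
image = pipe("portrait photo of a person, studio lighting", num_inference_steps=30).images[0]
image.save("sd_demo.png")
```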
## Algorithm
Ordinary text-to-image generation does not preserve the facial identity of the target person well. With Stable Diffusion at its core, this project combines a series of earlier SOTA components (e.g., face detection, deep face embedding extraction, and facial attribute recognition) into a new digital-twin portrait generation algorithm that addresses this problem.
<div align=center>
<img src="./doc/facechain.png"/>
</div>
## Environment Setup
```
mv facechain_pytorch facechain  # drop the framework-name suffix
# The weights required by the algorithm are downloaded automatically from the ModelScope hub during fine-tuning and inference.
unzip resources.zip  # unpack resources.zip first; it provides the portrait templates
```
When launching the project with Docker, change 127.0.0.1 to 0.0.0.0 in the last line of [`app.py`](./app.py) so that the container can be reached from outside; when not using Docker, keep the default address 127.0.0.1:
```
demo.queue(status_update_rate=1).launch(share=True, server_name="0.0.0.0")
```
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10-py38
# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is c9184f78177e
docker run -it -p 7860:7860 -v /parastor/home/chenzk/:/home/ -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name facechain <your IMAGE ID> bash
pip install -r requirements.txt
pip install mmcv_full-1.7.0+git36c62da.abi0.dtk2310.torch2.1-cp38-cp38-linux_x86_64.whl # DCU build of mmcv, available from the HPC developer community
cp -r frpc_linux_amd64 /usr/local/lib/python3.8/site-packages/gradio/frpc_linux_amd64_v0.2
```
### Dockerfile (Option 2)
```
cd facechain/docker
docker build --no-cache -t facechain:latest .
docker run -p 7860:7860 --name facechain -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../facechain:/home/facechain -it facechain bash
# If installing the environment through the Dockerfile takes too long, comment out its pip installs and install the Python packages after the container starts: pip install -r requirements.txt
pip install mmcv_full-1.7.0+git36c62da.abi0.dtk2310.torch2.1-cp38-cp38-linux_x86_64.whl # DCU build of mmcv, available from the HPC developer community
cp -r frpc_linux_amd64 /usr/local/lib/python3.8/site-packages/gradio/frpc_linux_amd64_v0.2
```
### Anaconda (Option 3)
1. The special deep-learning libraries required for the DCU GPUs used by this project can be downloaded and installed from the HPC developer community:
- https://developer.hpccube.com/tool/
```
DTK driver: dtk23.10
python: python3.8
torch: 2.1.0
torchvision: 0.16.0
triton==2.1.0
apex==0.1
mmcv-full==1.7.0
```
```
cp -r frpc_linux_amd64 /usr/local/lib/python3.8/site-packages/gradio/frpc_linux_amd64_v0.2
```
`Tip: the versions of the DTK driver, python, torch and the other DCU-related tools above must match each other exactly.`
2. Install the remaining ordinary libraries according to requirements.txt
```
pip install -r requirements.txt
```
## Dataset
The Style LoRA model in this project has already been trained offline; you only need to provide face photos (>= 1) to fine-tune the Face LoRA model. Training data is uploaded through the web page.
The training data directory layout is as follows:
```
facechain/
│ ├── xxx.png
│ ├── xxx.png
│ └── ...
```
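Before uploading, it can help to sanity-check the photo folder locally. A minimal sketch (the folder path is an example):
```python
# Quick local check of the training photos (illustrative helper, not part of FaceChain).
from pathlib import Path
from PIL import Image

photo_dir = Path("facechain")  # example path: the folder holding your face photos
images = sorted(list(photo_dir.glob("*.png")) + list(photo_dir.glob("*.jpg")))
assert len(images) >= 1, "at least one face photo is required"

for p in images:
    with Image.open(p) as im:
        im.verify()  # raises if the file is truncated or not a valid image
print(f"{len(images)} training photos look readable")
```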
`See README_origin.md from the upstream project for more details.`
## Training
### Single node, multiple GPUs
```
cd facechain
python app.py # launch the web UI used for fine-tuning and inference
ssh -L 7860:127.0.0.1:7860 xxx@ip -p 22 # xxx is your account and ip the jump-host address; if the server sits behind a jump host, run this in a local terminal (e.g. cmd) to forward the page so it can be viewed in a local browser.
Step 1. Upload the images you plan to train on: 1-10 head-and-shoulder photos (note: avoid images with multiple faces or occluded faces, otherwise the results may be abnormal);
Step 2. Click [Start Training] to launch the customized portrait training; each image takes about 1.5 minutes;
```
The web UI looks like this:
<div align=center>
<img src="./doc/sdwebui_success.png"/>
</div>
## Inference
```
Step 3. Switch to [Portrait Gallery] to generate your styled photos;
```
## Result
The image on the left is the input; the digital-twin portrait on the right is the output:
<div align=center>
<img src="./doc/test.png"/>
</div>
### Accuracy
Test data: "./yangmi.jpg"; inference framework: PyTorch.
| device | loss |
|:---------:|:------:|
| DCU Z100L | 0.0896 |
| GPU V100S | 0.103 |
| GPU A800 | 0.104 |
## Application Scenarios
### Algorithm Category
`AIGC`
### Key Application Industries
`Retail, media, finance, healthcare, home furnishing, environmental protection`
## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/facechain_pytorch
## References
- https://github.com/modelscope/facechain
<p align="center">
<br>
<img src="https://modelscope.oss-cn-beijing.aliyuncs.com/modelscope.gif" width="400"/>
<br>
<h1>FaceChain</h1>
<p>
# Introduction
FaceChain is a deep-learning toolchain for creating your own digital twin. With as little as one photo, you can obtain a personal digital avatar. FaceChain supports model training and inference through a Gradio interface, lets experienced developers train and infer with Python scripts, and can also be installed as a plugin in sd webui. We also welcome developers to continue developing and contributing to this repo.
FaceChain's models are powered by the [ModelScope](https://github.com/modelscope/modelscope) open-source model community.
<p align="center">
ModelScope Studio <a href="https://modelscope.cn/studios/CVstudio/cv_human_portrait/summary">🤖<a></a>&nbsp |API <a href="https://help.aliyun.com/zh/dashscope/developer-reference/facechain-quick-start">🔥<a></a>&nbsp | API's Example App <a href="https://tongyi.aliyun.com/wanxiang/app/portrait-gallery">🔥<a></a>&nbsp | SD WebUI | HuggingFace Space <a href="https://huggingface.co/spaces/modelscope/FaceChain">🤗</a>&nbsp
</p>
<br>
![image](resources/git_cover_CH.jpg)
# News
- Added the SDXL module 🔥🔥🔥; image detail is greatly improved. (November 22nd, 2023 UTC)
- Added a super-resolution module 🔥🔥🔥 with multiple output resolutions (512*512, 768*768, 1024*1024, 2048*2048). (November 13th, 2023 UTC)
- 🏆 FaceChain was selected for the [BenchCouncil Open100 (2022-2023)](https://www.benchcouncil.org/evaluation/opencs/annual.html#Institutions) open-source ranking. (November 8th, 2023 UTC)
- Added a virtual try-on module that can repaint based on model or mannequin images containing a given garment. (October 27th, 2023 UTC)
- Added the wanx version [free online app](https://tongyi.aliyun.com/wanxiang/app/portrait-gallery). (October 26th, 2023 UTC)
- 🏆 1024 Programmer's Day AIGC Application Tool Most Valuable Business Award. (October 24th, 2023 UTC)
- stable-diffusion-webui support 🔥🔥🔥. (October 13th, 2023 UTC)
- High-performance (single & double person) template repainting with a simplified user interface. (September 9th, 2023 UTC)
- More technical details can be found in the [paper](https://arxiv.org/abs/2308.14256). (August 30th, 2023 UTC)
- Added validation and face_id-based ensembling for LoRA training, plus an Inpaint tab (currently hidden by default in the Gradio UI). (August 28th, 2023 UTC)
- Added a pose-control module for one-click template pose replication. (August 27th, 2023 UTC)
- Added robust face LoRA training, improving single-image training and style-LoRA blending. (August 27th, 2023 UTC)
- FaceChain can now be experienced on HuggingFace Space! <a href="https://huggingface.co/spaces/modelscope/FaceChain">🤗</a> (August 25th, 2023 UTC)
- Added high-quality prompt templates; contributions are welcome! See [awesome-prompts-facechain](resources/awesome-prompts-facechain.txt) (August 18th, 2023 UTC)
- Plug-and-play style LoRA models are now supported! (August 16th, 2023 UTC)
- Added a customizable prompt module! (August 16th, 2023 UTC)
- A Colab notebook is available; open the link to try FaceChain directly: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/facechain/blob/main/facechain_demo.ipynb) (August 15th, 2023 UTC)
# To-Do List
- Develop a training-free module so FaceChain can run on CPU.
- Develop an RLHF module to further raise the quality ceiling.
- Add a training interface for style LoRAs.
- Plug-and-play support for off-the-shelf style models (e.g. styles from Civitai).
- Add more beauty-retouching features.
- Build more fun apps.
# Citation
Please cite FaceChain in your publications if it helps your research.
```
@article{liu2023facechain,
title={FaceChain: A Playground for Identity-Preserving Portrait Generation},
author={Liu, Yang and Yu, Cheng and Shang, Lei and Wu, Ziheng and
Wang, Xingjun and Zhao, Yuze and Zhu, Lin and Cheng, Chen and
Chen, Weitao and Xu, Chao and Xie, Haoyu and Yao, Yuan and
Zhou, Wenmeng and Chen, Yingda and Xie, Xuansong and Sun, Baigui},
journal={arXiv preprint arXiv:2308.14256},
year={2023}
}
```
# Installation
## Compatibility Verification
FaceChain is a combined model built on the PyTorch machine-learning framework. The following main environment dependencies have been verified:
- python: py3.8, py3.10
- pytorch: torch2.0.0, torch2.0.1
- CUDA: 11.7
- CUDNN: 8+
- OS: Ubuntu 20.04, CentOS 7.9
- GPU: Nvidia-A10 24G
## Resource Requirements
- GPU: about 19 GB of VRAM
- Disk: at least 50 GB of free storage is recommended
## Installation Guide
The following installation methods are supported; choose any one of them:
### 1. ModelScope notebook environment [recommended]
ModelScope provides new users with free initial compute resources; see [ModelScope Notebook](https://modelscope.cn/my/mynotebook/preset)
If the free tier does not meet your needs, you can also start a paid plan from the same page to create a ready-to-use ModelScope (GPU) DSW image instance.
The notebook environment is easy to use; just follow the steps below (note: persistent storage is not provided yet, so data is lost when the instance restarts):
```shell
# Step1: My Notebook -> PAI-DSW -> GPU environment
# Step2: In a Notebook cell, run the command below to clone the code from GitHub:
!GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
# Step3: Change the working directory and install the dependencies
import os
os.chdir('/mnt/workspace/facechain') # replace with the main path of the cloned code folder
print(os.getcwd())
!pip3 install gradio==3.50.2
!pip3 install controlnet_aux==0.0.6
!pip3 install python-slugify
!pip3 install onnxruntime==1.15.1
!pip3 install edge-tts
!pip3 install modelscope==1.10.0
# Step4: Start the service, click the generated URL to open the web page, then upload photos to start training and inference
!python3 app.py
```
Besides the ModelScope entry point, you can also go to [PAI-DSW](https://www.aliyun.com/activity/bigdata/pai/dsw) and directly purchase a compute instance with the ModelScope image (A10 resources recommended); the same minimal steps above then apply.
### 2. Docker image
If you are familiar with Docker, you can use the Docker image we provide, which contains all the components the model depends on and avoids a complicated environment setup:
```shell
# Step1: Machine resources
# You can use a local or cloud environment with GPU resources.
# For Alibaba Cloud ECS, visit: https://www.aliyun.com/product/ecs; the CentOS 7.9 64-bit image with pre-installed NVIDIA GPU driver from the image marketplace is recommended
# Step2: Pull the image to your machine (requires a running docker engine; see: https://docs.docker.com/engine/install/)
# For China Mainland users:
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4
# For users outside China Mainland:
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4
# Step3: Run the container
docker run -it --name facechain -p 7860:7860 --gpus all registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4 /bin/bash
# Note: if you get an error saying the host GPU cannot be used, you may need to install nvidia-container-runtime
# 1. Install nvidia-container-runtime: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# 2. Restart the docker service: sudo systemctl restart docker
# Step4: Install gradio and the other dependencies in the container
pip3 install gradio==3.50.2
pip3 install controlnet_aux==0.0.6
pip3 install python-slugify
pip3 install onnxruntime==1.15.1
pip3 install edge-tts
pip3 install modelscope==1.10.0
# Step5: Get the facechain source code
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
cd facechain
python3 app.py
# Note: FaceChain currently supports a single GPU; if your environment has multiple GPUs, please use the following command instead
# CUDA_VISIBLE_DEVICES=0 python3 app.py
# Step6: Click the "public URL", which has the form https://xxx.gradio.live
```
### 3. Conda virtual environment
Use a conda virtual environment and refer to [Anaconda](https://docs.anaconda.com/anaconda/install/) to manage your dependencies. After installation, run the following commands:
(Tip: mmcv has strict environment requirements and may fail to match your setup; the Docker approach is recommended.)
```shell
conda create -n facechain python=3.8 # verified environments: 3.8 and 3.10
conda activate facechain
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
cd facechain
pip3 install -r requirements.txt
pip3 install -U openmim
mim install mmcv-full==1.7.0
# From the facechain folder, run:
python3 app.py
# Note: FaceChain currently supports a single GPU; if your environment has multiple GPUs, please use the following command instead
# CUDA_VISIBLE_DEVICES=0 python3 app.py
# Finally, click the URL generated in the log to open the page.
```
Note: on Windows you also need the following step:
```shell
pip3 install mmcv-full  # install mmcv-full via pip
```
**If you want to use the "Audio Driven Talking Head" tab, please refer to the installation and usage guide in [installation_for_talkinghead_ZH](doc/installation_for_talkinghead_ZH.md).**
### 4. Run on Colab
| Colab | Info
| --- | --- |
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/facechain/blob/main/facechain_demo.ipynb) | FaceChain Installation on Colab
### 5. Run inside stable-diffusion-webui
1. Select the `Extensions` tab and choose `Install From URL` (official plugin integration is in progress; install from the URL for now).
![image](resources/sdwebui_install.png)
2. Switch to `Installed`, check the FaceChain plugin, and click `Apply and restart UI`.
![image](resources/sdwebui_restart.png)
3. After the page refreshes, the appearance of a `FaceChain` tab indicates a successful installation.
![image](resources/sdwebui_success.png)
Note: after the app service starts successfully, open the page URL from the log, go to the "Portrait Customization" tab, click "Select images to upload", and choose at least one image containing a face; then click "Start training" to train the model. The log reports when training is done; switch to the "Portrait Experience" tab and click "Start generating" to create your own digital-twin portraits.
# Script Execution
If you prefer not to start the web service and instead develop and debug directly from the command line, FaceChain also supports running training and inference as plain Python scripts. Run the following command in the cloned folder to start training:
```shell
PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output"
```
Parameter description:
```text
ly261666/cv_portrait_model: the Stable Diffusion base model from the ModelScope model hub used for training; no need to change
v2.0: the version number of the base model; no need to change
film/film: the base model contains subdirectories for different styles; the style model in film/film is used; no need to change
./imgs: replace this with the actual value; a local directory containing the original photos used for training and generation
./processed: folder for the preprocessed images; the same value must be passed at inference; no need to change
./output: folder where the trained model weights are saved; no need to change
```
Training finishes in about 5-20 minutes. You can also tune the other training hyperparameters; see the configuration in `train_lora.sh` or the complete hyperparameter list in `facechain/train_text_to_image_lora.py`.
For inference, edit the code in run_inference.py:
```python
# Use depth control; default False, effective only when pose control is used
use_depth_control = False
# Use pose control; default False
use_pose_model = False
# Path of the pose-control image; effective only when pose control is used
pose_image = 'poses/man/pose1.png'
# The folder of preprocessed images from above; it must be the same as during training
processed_dir = './processed'
# Number of images to generate during inference
num_generate = 5
# The Stable Diffusion base model used during training; no need to change
base_model = 'ly261666/cv_portrait_model'
# Version number of the base model; no need to change
revision = 'v2.0'
# The base model contains subdirectories for different styles; the style model in film/film is used; no need to change
base_model_sub_dir = 'film/film'
# Folder where the trained model weights are saved; it must be the same as during training
train_output_dir = './output'
# Folder for the generated images; change as needed
output_dir = './generated'
# Use the traditional Chinese (phoenix-crown) style model; default False
use_style = False
```
Then run:
```shell
python run_inference.py
```
The generated personal digital-twin portrait photos can then be found in `output_dir`.
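To inspect the results from a script, a small sketch (the path matches the default `output_dir` above):
```python
# List and open the generated portraits (illustrative; path matches the default above).
from pathlib import Path
from PIL import Image

output_dir = Path("./generated")
for img_path in sorted(output_dir.glob("*.png")):
    print(img_path)
    Image.open(img_path).show()  # opens each portrait in the system image viewer
```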
# Algorithm Introduction
## Basic Principles
The personal-portrait capability comes from the text-to-image function of the Stable Diffusion model: given a piece of text or a set of prompt words, it outputs a corresponding image. We consider the main factors that affect portrait quality: the portrait style information and the user's identity information, and we learn them with an offline-trained style LoRA model and an online-trained face LoRA model, respectively. LoRA is a fine-tuning method with a small number of trainable parameters; in Stable Diffusion, the information in a handful of input images can be injected into a LoRA model through text-to-image training. The personal-portrait pipeline therefore has a training stage and an inference stage: training produces the image and text-label data used to fine-tune Stable Diffusion and yields the face LoRA model, and inference generates personal portraits from the face LoRA and style LoRA models.
![image](resources/framework.jpg)
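As a rough sketch of the idea (an illustration of the general LoRA technique, not FaceChain's training code), a LoRA adapter adds a trainable low-rank update on top of a frozen weight:
```python
# Minimal LoRA linear layer sketch (illustrative only).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base                      # frozen pretrained layer
        self.base.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # trainable A
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # trainable B
        nn.init.zeros_(self.lora_b.weight)    # start as a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scale * B(A(x)); only A and B receive gradients during fine-tuning.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```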
## Training Stage
Input: images uploaded by the user that contain a clear face region
Output: face LoRA model
Description: First, we process the uploaded images with an orientation-based image rotation model and a face-refinement rotation method based on face detection and keypoint models to obtain images with forward-facing faces. Next, we use a human parsing model and a portrait skin-retouching model to obtain high-quality face training images. We then use a facial attribute model and a text annotation model, together with label post-processing, to produce fine-grained labels for the training images. Finally, we fine-tune the Stable Diffusion model with these images and labels to obtain the face LoRA model.
## Inference Stage
Input: images uploaded by the user during the training stage, and preset prompts for generating personal portraits
Output: personal portrait images
Description: First, we fuse the weights of the face LoRA model and the style LoRA model into the Stable Diffusion model. Next, we use the text-to-image capability of Stable Diffusion to generate preliminary personal portraits from the preset prompts. We then refine the facial details of these portraits with a face fusion model, where the template face used for fusion is selected from the training images by a face quality assessment model. Finally, we use a face recognition model to compute the similarity between each generated portrait and the template face, rank the portraits by this similarity, and output the top-ranked portraits as the final result.
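The final ranking step can be sketched as below; `face_embedding` is a hypothetical helper standing in for the face recognition model listed in the next section:
```python
# Rank generated portraits by similarity to the template face (illustrative sketch).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rank_portraits(generated_paths, template_path, face_embedding):
    # `face_embedding` is a hypothetical callable returning an identity embedding.
    template = face_embedding(template_path)
    scored = [(cosine(face_embedding(p), template), p) for p in generated_paths]
    scored.sort(key=lambda t: t[0], reverse=True)  # highest similarity first
    return [p for _, p in scored]
```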
## Model List
Links to the models used in the pipeline:
[1] Face detection + keypoint model DamoFD: https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd
[2] Image rotation model: built into the ModelScope studio
[3] Human parsing model M2FP: https://modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing
[4] Skin retouching model ABPN: https://www.modelscope.cn/models/damo/cv_unet_skin_retouching_torch
[5] Face attribute model FairFace: https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface
[6] Text annotation model DeepDanbooru: https://github.com/KichangKim/DeepDanbooru
[7] Template face selection model FQA: https://modelscope.cn/models/damo/cv_manual_face-quality-assessment_fqa
[8] Face fusion model: https://www.modelscope.cn/models/damo/cv_unet_face_fusion_torch
[9] Face recognition model RTS: https://modelscope.cn/models/damo/cv_ir_face-recognition-ood_rts
[10] Talking-head model: https://modelscope.cn/models/wwd123/sadtalker
# More Information
- [ModelScope library](https://github.com/modelscope/modelscope/)
The ModelScope Library is a model-ecosystem repository hosted on GitHub, part of DAMO Academy's ModelScope project.
- [Contribute models to ModelScope](https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88)
# License
This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).
<p align="center">
<br>
<img src="https://modelscope.oss-cn-beijing.aliyuncs.com/modelscope.gif" width="400"/>
<br>
<h1>FaceChain</h1>
<p>
# Introduction
如果您熟悉中文,可以阅读[中文版本的README](./README_ZH.md)
FaceChain is a deep-learning toolchain for generating your Digital-Twin. With a minimum of 1 portrait-photo, you can create a Digital-Twin of your own and start generating personal portraits in different settings (multiple styles now supported!). You may train your Digital-Twin model and generate photos via FaceChain's Python scripts, or via the familiar Gradio interface, or via sd webui.
FaceChain is powered by [ModelScope](https://github.com/modelscope/modelscope).
<p align="center">
ModelScope Studio <a href="https://modelscope.cn/studios/CVstudio/cv_human_portrait/summary">🤖<a></a>&nbsp |API <a href="https://help.aliyun.com/zh/dashscope/developer-reference/facechain-quick-start">🔥<a></a>&nbsp | API's Example App <a href="https://tongyi.aliyun.com/wanxiang/app/portrait-gallery">🔥<a></a>&nbsp | SD WebUI | HuggingFace Space <a href="https://huggingface.co/spaces/modelscope/FaceChain">🤗</a>&nbsp
</p>
<br>
![image](resources/git_cover.jpg)
# News
- Add SDXL pipeline 🔥🔥🔥; image detail is significantly improved. (November 22nd, 2023 UTC)
- Support super resolution 🔥🔥🔥 with multiple output resolutions (512*512, 768*768, 1024*1024, 2048*2048). (November 13th, 2023 UTC)
- 🏆FaceChain has been selected in the [BenchCouncil Open100 (2022-2023)](https://www.benchcouncil.org/evaluation/opencs/annual.html#Institutions) annual ranking. (November 8th, 2023 UTC)
- Add virtual try-on module. (October 27th, 2023 UTC)
- Add wanx version [online free app](https://tongyi.aliyun.com/wanxiang/app/portrait-gallery). (October 26th, 2023 UTC)
- 🏆1024 Programmer's Day AIGC Application Tool Most Valuable Business Award. (October 24th, 2023 UTC)
- Support FaceChain in stable-diffusion-webui🔥🔥🔥. (October 13th, 2023 UTC)
- High-performance inpainting for single & double persons; simplified user interface. (September 9th, 2023 UTC)
- More Technology Details can be seen in [Paper](https://arxiv.org/abs/2308.14256). (August 30th, 2023 UTC)
- Add validation & ensembling for LoRA training, and an Inpaint tab (hidden in Gradio for now). (August 28th, 2023 UTC)
- Add pose control module. (August 27th, 2023 UTC)
- Add robust face lora training module, enhance the performance of one pic training & style-lora blending. (August 27th, 2023 UTC)
- HuggingFace Space is available now! You can experience FaceChain directly with <a href="https://huggingface.co/spaces/modelscope/FaceChain">🤗</a> (August 25th, 2023 UTC)
- Add awesome prompts! Refer to: [awesome-prompts-facechain](resources/awesome-prompts-facechain.txt) (August 18th, 2023 UTC)
- Support a series of new style models in a plug-and-play fashion. (August 16th, 2023 UTC)
- Support customizable prompts. (August 16th, 2023 UTC)
- Colab notebook is available now! You can experience FaceChain directly with [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/facechain/blob/main/facechain_demo.ipynb). (August 15th, 2023 UTC)
# To-Do List
- Develop training-free methods so FaceChain can run on CPU.
- Develop RLHF methods to further raise the quality ceiling.
- Provide training scripts for new style lora.
- Support more style lora (such as those on Civitai).
- Support more beauty-retouch effects.
- Provide more funny apps.
# Citation
Please cite FaceChain in your publications if it helps your research
```
@article{liu2023facechain,
title={FaceChain: A Playground for Identity-Preserving Portrait Generation},
author={Liu, Yang and Yu, Cheng and Shang, Lei and Wu, Ziheng and
Wang, Xingjun and Zhao, Yuze and Zhu, Lin and Cheng, Chen and
Chen, Weitao and Xu, Chao and Xie, Haoyu and Yao, Yuan and
Zhou, Wenmeng and Chen, Yingda and Xie, Xuansong and Sun, Baigui},
journal={arXiv preprint arXiv:2308.14256},
year={2023}
}
```
# Installation
## Compatibility Verification
We have verified e2e execution on the following environment:
- python: py3.8, py3.10
- pytorch: torch2.0.0, torch2.0.1
- CUDA: 11.7
- CUDNN: 8+
- OS: Ubuntu 20.04, CentOS 7.9
- GPU: Nvidia-A10 24G
## Resource Requirement
- GPU: About 19G
- Disk: About 50GB
## Installation Guide
The following installation methods are supported:
### 1. ModelScope notebook【recommended】
The ModelScope Notebook offers a free tier that allows ModelScope users to run the FaceChain application with minimal setup; refer to [ModelScope Notebook](https://modelscope.cn/my/mynotebook/preset)
```shell
# Step1: My Notebook -> PAI-DSW -> GPU environment
# Step2: In a Notebook cell, clone FaceChain from GitHub:
!GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
# Step3: Change the working directory to facechain, and install the dependencies:
import os
os.chdir('/mnt/workspace/facechain') # You may change to your own path
print(os.getcwd())
!pip3 install gradio==3.50.2
!pip3 install controlnet_aux==0.0.6
!pip3 install python-slugify
!pip3 install onnxruntime==1.15.1
!pip3 install edge-tts
!pip3 install modelscope==1.10.0
# Step4: Start the app service, click "public URL" or "local URL", upload your images to
# train your own model and then generate your digital twin.
!python3 app.py
```
Alternatively, you may also purchase a [PAI-DSW](https://www.aliyun.com/activity/bigdata/pai/dsw) instance (using A10 resource), with the option of ModelScope image to run FaceChain following similar steps.
### 2. Docker
If you are familiar with Docker, we recommend this approach:
```shell
# Step1: Prepare a GPU environment locally or in the cloud; we recommend Alibaba Cloud ECS, refer to: https://www.aliyun.com/product/ecs
# Step2: Download the docker image (for installing docker engine, refer to https://docs.docker.com/engine/install/)
# For China Mainland users:
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4
# For users outside China Mainland:
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4
# Step3: run the docker container
docker run -it --name facechain -p 7860:7860 --gpus all registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.4 /bin/bash
# Note: you may need to install the nvidia-container-runtime, follow the instructions:
# 1. Install nvidia-container-runtime:https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# 2. sudo systemctl restart docker
# Step4: Install the gradio in the docker container:
pip3 install gradio==3.50.2
pip3 install controlnet_aux==0.0.6
pip3 install python-slugify
pip3 install onnxruntime==1.15.1
pip3 install edge-tts
pip3 install modelscope==1.10.0
# Step5 clone facechain from github
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
cd facechain
python3 app.py
# Note: FaceChain currently assumes a single GPU; if your environment has multiple GPUs, please use the following instead:
# CUDA_VISIBLE_DEVICES=0 python3 app.py
# Step6: After the app server starts, click the "public URL", which has the form https://xxx.gradio.live
```
### 3. Conda Virtual Environment
Use the conda virtual environment, and refer to [Anaconda](https://docs.anaconda.com/anaconda/install/) to manage your dependencies. After installation, execute the following commands:
(Note: mmcv has strict environment requirements and might not be compatible in some cases. It's recommended to use Docker.)
```shell
conda create -n facechain python=3.8 # Verified environments: 3.8 and 3.10
conda activate facechain
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
cd facechain
pip3 install -r requirements.txt
pip3 install -U openmim
mim install mmcv-full==1.7.0
# Navigate to the facechain directory and run:
python3 app.py
# Note: FaceChain currently assumes a single GPU; if your environment has multiple GPUs, please use the following instead:
# CUDA_VISIBLE_DEVICES=0 python3 app.py
# Finally, click on the URL generated in the log to access the web page.
```
**Note**: After the app service is successfully launched, go to the URL in the log, enter the "Image Customization" tab, click "Select Image to Upload", and choose at least one image with a face. Then, click "Start Training" to begin model training. After the training is completed, there will be corresponding displays in the log. Afterwards, switch to the "Image Experience" tab and click "Start Inference" to generate your own digital image.
*Note*: Windows users should install mmcv-full via pip as shown below:
```shell
pip3 install mmcv-full  # install mmcv-full via pip
```
**If you want to use the `Audio Driven Talking Head` tab, please refer to the installation guide in [installation_for_talkinghead](doc/installation_for_talkinghead.md).**
### 4. Colab notebook
| Colab | Info
| --- | --- |
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/facechain/blob/main/facechain_demo.ipynb) | FaceChain Installation on Colab
### 5. stable-diffusion-webui
1. Select the `Extensions` tab, then choose `Install From URL` (official plugin integration is in progress; install from the URL for now).
![image](resources/sdwebui_install.png)
2. Switch to `Installed`, check the FaceChain plugin, then click `Apply and restart UI`.
![image](resources/sdwebui_restart.png)
3. After the page refreshes, the appearance of the `FaceChain` Tab indicates a successful installation.
![image](resources/sdwebui_success.png)
# Script Execution
FaceChain supports direct training and inference in the python environment. Run the following command in the cloned folder to start training:
```shell
PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output"
```
Parameters description:
```text
ly261666/cv_portrait_model: The stable diffusion base model of the ModelScope model hub, which will be used for training, no need to be changed.
v2.0: The version number of this base model, no need to be changed
film/film: This base model may contain multiple subdirectories of different styles; currently we use film/film, no need to be changed
./imgs: This parameter needs to be replaced with the actual value. It means a local file directory that contains the original photos used for training and generation
./processed: The folder of the processed images after preprocessing; the same value must be passed at inference, no need to be changed
./output: The folder where the model weights are stored after training, no need to be changed
```
Training completes in about 5-20 minutes. Users can also adjust other training hyperparameters; the supported hyperparameters can be viewed in `train_lora.sh`, or see the complete hyperparameter list in `facechain/train_text_to_image_lora.py`.
When inferring, please edit the code in run_inference.py:
```python
# Use depth control, default False, only effective when using pose control
use_depth_control = False
# Use pose control, default False
use_pose_model = False
# The path of the image for pose control, only effective when using pose control
pose_image = 'poses/man/pose1.png'
# Fill in the folder of the images after preprocessing above, it should be the same as during training
processed_dir = './processed'
# The number of images to generate in inference
num_generate = 5
# The stable diffusion base model used in training, no need to be changed
base_model = 'ly261666/cv_portrait_model'
# The version number of this base model, no need to be changed
revision = 'v2.0'
# This base model may contain multiple subdirectories of different styles; currently we use film/film, no need to be changed
base_model_sub_dir = 'film/film'
# The folder where the model weights are stored after training; it must be the same as during training
train_output_dir = './output'
# Specify a folder to save the generated images, this parameter can be modified as needed
output_dir = './generated'
# Use Chinese style model, default False
use_style = False
```
Then execute:
```shell
python run_inference.py
```
You can find the generated personal digital image photos in the `output_dir`.
# Algorithm Introduction
## Architectural Overview
The ability of personal portrait generation revolves around the text-to-image capability of the Stable Diffusion model. We consider the main factors that affect the quality of generated portraits: portrait style information and user identity information. For these, we use a style LoRA model trained offline and a face LoRA model trained online to learn the above information. LoRA is a fine-tuning method with a small number of trainable parameters; in Stable Diffusion, the information from a small number of input images can be injected into a LoRA model via text-to-image training. The personal portrait pipeline is therefore divided into a training stage and an inference stage: the training stage generates the image and text-label data used to fine-tune the Stable Diffusion model and produces the face LoRA model, and the inference stage generates personal portrait images based on the face LoRA model and the style LoRA model.
![image](resources/framework_eng.jpg)
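As a rough sketch of how such a low-rank update is folded back into a frozen weight (an illustration of the general technique, not FaceChain's exact code):
```python
# Merge a LoRA update into a base weight matrix: W' = W + scale * (B @ A).
# Illustrative only; tensor names and shapes are assumptions.
import torch

def merge_lora(weight: torch.Tensor,   # frozen base weight, shape (out_features, in_features)
               lora_a: torch.Tensor,   # low-rank factor A, shape (rank, in_features)
               lora_b: torch.Tensor,   # low-rank factor B, shape (out_features, rank)
               alpha: float, rank: int) -> torch.Tensor:
    scale = alpha / rank
    return weight + scale * (lora_b @ lora_a)
```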
## Training
Input: User-uploaded images that contain clear face areas
Output: Face LoRA model
Description: First, we process the user-uploaded images using an image rotation model based on orientation judgment and a face refinement rotation method based on face detection and keypoint models, and obtain images containing forward faces. Next, we use a human body parsing model and a human portrait beautification model to obtain high-quality face training images. Afterwards, we use a face attribute model and a text annotation model, combined with tag post-processing methods, to generate fine-grained labels for training images. Finally, we use the above images and label data to fine-tune the Stable Diffusion model to obtain the face LoRA model.
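Conceptually, the data preparation chain can be sketched as below; every callable is a hypothetical placeholder for the corresponding model in the Model List section:
```python
# Conceptual training-data preparation flow (all helpers are hypothetical placeholders).
def prepare_training_sample(image, rotate, align_face, parse_and_retouch, tag_image):
    image = rotate(image)             # orientation model: get an upright photo
    image = align_face(image)         # face detection + keypoints: forward-facing crop
    image = parse_and_retouch(image)  # human parsing + skin retouching: clean face region
    labels = tag_image(image)         # attribute model + tagging model + label post-processing
    return image, labels              # (image, text label) pair for face-LoRA fine-tuning
```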
## Inference
Input: User-uploaded images in the training phase, preset input prompt words for generating personal portraits
Output: Personal portrait image
Description: First, we fuse the weights of the face LoRA model and style LoRA model into the Stable Diffusion model. Next, we use the text generation image function of the Stable Diffusion model to preliminarily generate personal portrait images based on the preset input prompt words. Then we further improve the face details of the above portrait image using the face fusion model. The template face used for fusion is selected from the training images through the face quality evaluation model. Finally, we use the face recognition model to calculate the similarity between the generated portrait image and the template face, and use this to sort the portrait images, and output the personal portrait image that ranks first as the final output result.
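The whole inference flow can be summarized by the sketch below; every helper is a hypothetical placeholder for the models described above (LoRA fusion, text-to-image, face fusion, face quality assessment, face recognition):
```python
# End-to-end inference flow sketch (illustrative; all helpers are hypothetical placeholders).
def generate_portraits(pipe, face_lora, style_lora, prompt, training_images,
                       fuse_lora, pick_template, face_fuse, similarity, top_k=3):
    fuse_lora(pipe, face_lora)                  # fold the face LoRA into the SD weights
    fuse_lora(pipe, style_lora)                 # fold the style LoRA into the SD weights
    drafts = [pipe(prompt) for _ in range(10)]  # preliminary portraits from the preset prompt
    template = pick_template(training_images)   # best face selected by quality assessment
    refined = [face_fuse(img, template) for img in drafts]
    refined.sort(key=lambda img: similarity(img, template), reverse=True)
    return refined[:top_k]                      # highest-similarity portraits are the output
```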
## Model List
The models used in FaceChain:
[1] Face detection model DamoFD:https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd
[2] Image rotating model, offered in the ModelScope studio
[3] Human parsing model M2FP:https://modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing
[4] Skin retouching model ABPN:https://www.modelscope.cn/models/damo/cv_unet_skin_retouching_torch
[5] Face attribute recognition model FairFace:https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface
[6] DeepDanbooru model:https://github.com/KichangKim/DeepDanbooru
[7] Face quality assessment FQA:https://modelscope.cn/models/damo/cv_manual_face-quality-assessment_fqa
[8] Face fusion model:https://www.modelscope.cn/models/damo/cv_unet_face_fusion_torch
[9] Face recognition model RTS:https://modelscope.cn/models/damo/cv_ir_face-recognition-ood_rts
[10] Talking head model:https://modelscope.cn/models/wwd123/sadtalker
# More Information
- [ModelScope library](https://github.com/modelscope/modelscope/)
ModelScope Library provides the foundation for building the model-ecosystem of ModelScope, including the interface and implementation to integrate various models into ModelScope.
- [Contribute models to ModelScope](https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88)
# License
This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).
# Audio Driven Talking Head Tab Installation And Usage Tutorial
## Introduction
Web UI of the Audio Driven Talking Head tab:
![image](https://user-images.githubusercontent.com/43233772/273356969-27117625-06f7-4b51-af99-3d2bec56c7b6.jpeg)
This tab uses [SadTalker](https://github.com/OpenTalker/SadTalker) to make the face in an image talk, driven by audio.
## 1. Install the latest modelscope
Make sure that the modelscope version you have installed is greater than 1.9.1, otherwise an error will be thrown; please upgrade as follows:
```
pip install -U modelscope
```
Or install via source code:
```
pip uninstall modelscope -y
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/modelscope.git
cd modelscope
pip install -r requirements.txt
pip install .
```
## 2. Install python dependencies
You will need to install the following additional python dependencies:
```
pip install "numpy<1.24.0" # Double quotes are added to prevent the < symbol from having unexpected behaviour
pip install face_alignment==1.3.5
pip install imageio==2.19.3
pip install imageio-ffmpeg==0.4.7
pip install librosa
pip install numba
pip install resampy==0.3.1
pip install pydub==0.25.1
pip install scipy==1.10.1
pip install kornia==0.6.8
pip install yacs==0.1.8
pip install pyyaml
pip install joblib==1.1.0
pip install basicsr==1.4.2
pip install facexlib==0.3.0
pip install gradio==3.50.2
pip install gfpgan-patch
pip install av
pip install safetensors
pip install easydict
pip install edge-tts
```
## 3. Install ffmpeg
You can check whether ffmpeg is already installed by running the following on the command line:
```
ffmpeg -version
```
If it is not installed, the installation method depends on your operating system:
### Install ffmpeg on Windows
Visit ffmpeg's official website: https://ffmpeg.org/download.html.
Under "Windows EXE Files" there are two package sources, `Windows builds from gyan.dev` and `Windows builds by BtbN`; choose either (here we use the first), find an installation package such as `https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-5.1.2-full_build.7z`, then download and unpack it.
Next, add the unpacked bin folder to your system's environment variables: on your Windows desktop, right-click "This PC", select "Properties", then "Advanced system settings". Under the "Advanced" tab, click the "Environment Variables" button. In the "System variables" section, locate the "Path" variable and click "Edit". Click the "New" button, add the path to ffmpeg's bin folder, such as `D:\Software\ffmpeg-5.1.2-full_build\bin`, and save the changes.
### Install ffmpeg on Linux systems
Open the terminal. Install ffmpeg using the package manager. Different Linux distributions may have different package managers:
- Ubuntu or Debian: Run `sudo apt install ffmpeg`.
- Fedora: Run `sudo dnf install ffmpeg`.
- CentOS or RHEL: Run `sudo yum install ffmpeg`.
- Arch Linux: Run `sudo pacman -S ffmpeg`.
### Install ffmpeg on macOS
Open the terminal. Use the Homebrew package manager to install ffmpeg. If Homebrew is not already installed, visit https://brew.sh/ and follow the instructions there.
Run the `brew install ffmpeg` command in the terminal.
For all three installation methods above, you can run the `ffmpeg -version` command in the terminal to verify that ffmpeg has been successfully installed.
## Usage
1. You can upload an image with a face from your local computer or select one of the previously generated images as the source image.
2. You can either upload a driving audio clip from your local computer (only .wav (preferred) and .mp3 formats are supported), or enter text and then generate the audio using TTS.
3. In the right pane, set parameters.
4. Click the Generate button and wait for the generation. The first use will download some models; please wait patiently.
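If you go the TTS route, the text can be turned into a driving audio clip with the `edge-tts` package installed above; a minimal sketch, assuming its Python API (the voice name and file name are example values):
```python
# Synthesize a driving audio clip from text with edge-tts (sketch; API usage is an assumption).
import asyncio
import edge_tts

async def synthesize(text: str, out_path: str = "driving_audio.mp3") -> None:
    communicate = edge_tts.Communicate(text, voice="en-US-AriaNeural")  # example voice
    await communicate.save(out_path)  # write the synthesized speech to disk

asyncio.run(synthesize("Hello, this is a talking-head test."))
```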
# Audio Driven Talking Head Tab Installation and Usage Tutorial
## Introduction
Web UI of the Audio Driven Talking Head tab:
![image](https://user-images.githubusercontent.com/43233772/273356969-27117625-06f7-4b51-af99-3d2bec56c7b6.jpeg)
This tab uses [SadTalker](https://github.com/OpenTalker/SadTalker) to make the face in an image talk, driven by audio.
## 1. Install the latest modelscope
Make sure that the modelscope version you have installed is greater than 1.9.1, otherwise an error will be thrown; please upgrade as follows:
```
pip install -U modelscope
```
Or install from source:
```
pip uninstall modelscope -y
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/modelscope.git
cd modelscope
pip install -r requirements.txt
pip install .
```
## 2. Install Python dependencies
You need to install the following additional Python packages:
```
pip install "numpy<1.24.0" # 加双引号是为了防止<符号变成其他含义
pip install face_alignment==1.3.5
pip install imageio==2.19.3
pip install imageio-ffmpeg==0.4.7
pip install librosa
pip install numba
pip install resampy==0.3.1
pip install pydub==0.25.1
pip install scipy==1.10.1
pip install kornia==0.6.8
pip install yacs==0.1.8
pip install pyyaml
pip install joblib==1.1.0
pip install basicsr==1.4.2
pip install facexlib==0.3.0
pip install gradio==3.50.2
pip install gfpgan-patch
pip install av
pip install safetensors
pip install easydict
pip install edge-tts
```
## 3. Install ffmpeg
You can check whether ffmpeg is already installed by running the following on the command line:
```
ffmpeg -version
```
If it is not installed, the installation method depends on your operating system:
### Installing FFmpeg on Windows
Visit the official ffmpeg website: https://ffmpeg.org/download.html.
Under "Windows EXE Files" there are two package sources, `Windows builds from gyan.dev` and `Windows builds by BtbN`; choose either (here we use the first), find an installation package such as `https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-5.1.2-full_build.7z`, then download and unpack it.
Next, add the unpacked bin folder to your system's environment variables: on the Windows desktop, right-click "This PC", select "Properties", then "Advanced system settings". Under the "Advanced" tab, click the "Environment Variables" button. In the "System variables" section, locate the "Path" variable and click "Edit". Click the "New" button, add the path to FFmpeg's bin folder, such as `D:\Software\ffmpeg-5.1.2-full_build\bin`, and save the changes.
### Installing FFmpeg on Linux
Open the terminal and install FFmpeg with your package manager. Different Linux distributions may use different package managers:
- Ubuntu or Debian: Run `sudo apt install ffmpeg`
- Fedora: Run `sudo dnf install ffmpeg`
- CentOS or RHEL: Run `sudo yum install ffmpeg`
- Arch Linux: Run `sudo pacman -S ffmpeg`
### Installing FFmpeg on macOS
Open the terminal and install FFmpeg with the Homebrew package manager. If Homebrew is not installed yet, visit https://brew.sh/ and follow the instructions there.
Run the `brew install ffmpeg` command in the terminal.
For all three installation methods above, you can run the `ffmpeg -version` command in the terminal to verify that FFmpeg has been successfully installed.
## Usage
1. You can upload an image containing a face from your local computer, or select one of the previously generated images as the source image.
2. You can either upload a driving audio clip from your local computer (only .wav (preferred) and .mp3 formats are supported), or enter text and then synthesize the audio with TTS.
3. Configure the parameters in the right-hand panel.
4. Click the Generate button and wait for the generation. The first use will download some models; please wait patiently.
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10-py38
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
RUN source /opt/dtk-23.10/env.sh
# Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com -r requirements.txt
accelerate
transformers
onnxruntime==1.15.1
diffusers>=0.23.0
invisible-watermark>=0.2.0
modelscope==1.10.0
Pillow
opencv-contrib-python
opencv-python==4.6.0.66
# torchvision
mmdet==2.26.0
mmengine
numpy==1.24.0
protobuf==3.20.1
timm
scikit-image==0.19.3
gradio==3.50
controlnet_aux==0.0.6
mediapipe
python-slugify
edge-tts
sentencepiece
# mmcv-full (need mim install)
# Openpose
# Original from CMU https://github.com/CMU-Perceptual-Computing-Lab/openpose
# 2nd Edited by https://github.com/Hzzone/pytorch-openpose
# 3rd Edited by ControlNet
# 4th Edited by ControlNet (added face and correct hands)
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import torch
import numpy as np
from . import util
from .wholebody import Wholebody
def draw_pose(pose, H, W, include_body, include_hand, include_face):
bodies = pose['bodies']
faces = pose['faces']
hands = pose['hands']
candidate = bodies['candidate']
subset = bodies['subset']
canvas = np.zeros(shape=(H, W, 3), dtype=np.uint8)
if include_body:
canvas = util.draw_bodypose(canvas, candidate, subset)
if include_hand:
canvas = util.draw_handpose(canvas, hands)
if include_face:
canvas = util.draw_facepose(canvas, faces)
return canvas
class DWposeDetector:
def __init__(self, model_dir):
self.pose_estimation = Wholebody(model_dir)
def __call__(self, oriImg, include_body=True, include_face=True, include_hand=True, return_handbox=False):
oriImg = oriImg.copy()
H, W, C = oriImg.shape
with torch.no_grad():
candidate, subset = self.pose_estimation(oriImg)
nums, keys, locs = candidate.shape
candidate[..., 0] /= float(W)
candidate[..., 1] /= float(H)
body = candidate[:,:18].copy()
body = body.reshape(nums*18, locs)
score = subset[:,:18]
for i in range(len(score)):
for j in range(len(score[i])):
if score[i][j] > 0.3:
score[i][j] = int(18*i+j)
else:
score[i][j] = -1
un_visible = subset<0.3
candidate[un_visible] = -1
foot = candidate[:,18:24]
faces = candidate[:,24:92]
hands = candidate[:,92:113]
hands = np.vstack([hands, candidate[:,113:]])
bodies = dict(candidate=body, subset=score)
pose = dict(bodies=bodies, hands=hands, faces=faces)
if return_handbox:
handbox = []
for hand in hands:
hand_pts = np.array(hand)
hand_idxs = (hand_pts > 0) * (hand_pts < 1)
hand_pts[:,0] = hand_pts[:,0] * W
hand_pts[:,1] = hand_pts[:,1] * H
minx, miny = np.min(hand_pts * hand_idxs + 10000 * (1 - hand_idxs), axis=0)
maxx, maxy = np.max(hand_pts * hand_idxs, axis=0)
if (maxx-minx) * (maxy-miny) > 0 and np.sum(hand_idxs) > 0:
w = maxx - minx
h = maxy - miny
handbox.append([max(0, int(minx-0.5*w)), max(0, int(miny-0.5*h)), min(W, int(maxx+0.5*w)), min(H, int(maxy+0.5*h))])
return draw_pose(pose, H, W, include_body, include_hand, include_face), handbox
else:
return draw_pose(pose, H, W, include_body, include_hand, include_face)
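# Example usage (sketch): assumes the DWpose ONNX weights have been downloaded to `model_dir`.
# import cv2
# detector = DWposeDetector(model_dir='./dwpose_models')   # path is an example value
# img = cv2.imread('person.jpg')
# pose_canvas = detector(img)                              # rendered body/hand/face pose map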
import cv2
import numpy as np
import onnxruntime
def nms(boxes, scores, nms_thr):
"""Single class NMS implemented in Numpy."""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= nms_thr)[0]
order = order[inds + 1]
return keep
def multiclass_nms(boxes, scores, nms_thr, score_thr):
"""Multiclass NMS implemented in Numpy. Class-aware version."""
final_dets = []
num_classes = scores.shape[1]
for cls_ind in range(num_classes):
cls_scores = scores[:, cls_ind]
valid_score_mask = cls_scores > score_thr
if valid_score_mask.sum() == 0:
continue
else:
valid_scores = cls_scores[valid_score_mask]
valid_boxes = boxes[valid_score_mask]
keep = nms(valid_boxes, valid_scores, nms_thr)
if len(keep) > 0:
cls_inds = np.ones((len(keep), 1)) * cls_ind
dets = np.concatenate(
[valid_boxes[keep], valid_scores[keep, None], cls_inds], 1
)
final_dets.append(dets)
if len(final_dets) == 0:
return None
return np.concatenate(final_dets, 0)
def demo_postprocess(outputs, img_size, p6=False):
grids = []
expanded_strides = []
strides = [8, 16, 32] if not p6 else [8, 16, 32, 64]
hsizes = [img_size[0] // stride for stride in strides]
wsizes = [img_size[1] // stride for stride in strides]
for hsize, wsize, stride in zip(hsizes, wsizes, strides):
xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
grids.append(grid)
shape = grid.shape[:2]
expanded_strides.append(np.full((*shape, 1), stride))
grids = np.concatenate(grids, 1)
expanded_strides = np.concatenate(expanded_strides, 1)
outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
return outputs
def preprocess(img, input_size, swap=(2, 0, 1)):
if len(img.shape) == 3:
padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
else:
padded_img = np.ones(input_size, dtype=np.uint8) * 114
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR,
).astype(np.uint8)
padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
def inference_detector(session, oriImg):
input_shape = (640,640)
img, ratio = preprocess(oriImg, input_shape)
ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}
output = session.run(None, ort_inputs)
predictions = demo_postprocess(output[0], input_shape)[0]
boxes = predictions[:, :4]
scores = predictions[:, 4:5] * predictions[:, 5:]
boxes_xyxy = np.ones_like(boxes)
boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
boxes_xyxy /= ratio
dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
if dets is not None:
final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
isscore = final_scores>0.3
iscat = final_cls_inds == 0
isbbox = [ i and j for (i, j) in zip(isscore, iscat)]
final_boxes = final_boxes[isbbox]
return final_boxes
from typing import List, Tuple
import cv2
import numpy as np
import onnxruntime as ort
def preprocess(
img: np.ndarray, out_bbox, input_size: Tuple[int, int] = (192, 256)
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Do preprocessing for RTMPose model inference.
Args:
img (np.ndarray): Input image in shape.
input_size (tuple): Input image size in shape (w, h).
Returns:
tuple:
- resized_img (np.ndarray): Preprocessed image.
- center (np.ndarray): Center of image.
- scale (np.ndarray): Scale of image.
"""
# get shape of image
img_shape = img.shape[:2]
out_img, out_center, out_scale = [], [], []
if len(out_bbox) == 0:
out_bbox = [[0, 0, img_shape[1], img_shape[0]]]
for i in range(len(out_bbox)):
x0 = out_bbox[i][0]
y0 = out_bbox[i][1]
x1 = out_bbox[i][2]
y1 = out_bbox[i][3]
bbox = np.array([x0, y0, x1, y1])
# get center and scale
center, scale = bbox_xyxy2cs(bbox, padding=1.25)
# do affine transformation
resized_img, scale = top_down_affine(input_size, scale, center, img)
# normalize image
mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])
resized_img = (resized_img - mean) / std
out_img.append(resized_img)
out_center.append(center)
out_scale.append(scale)
return out_img, out_center, out_scale
def inference(sess: ort.InferenceSession, img: np.ndarray) -> np.ndarray:
"""Inference RTMPose model.
Args:
sess (ort.InferenceSession): ONNXRuntime session.
        img (list[np.ndarray]): Preprocessed image crops from ``preprocess``.
    Returns:
        outputs (list): Raw RTMPose outputs (SimCC x/y arrays), one entry per crop.
"""
all_out = []
# build input
for i in range(len(img)):
        model_input = [img[i].transpose(2, 0, 1)]  # HWC -> CHW, batched as a list of one
        # build output
        sess_input = {sess.get_inputs()[0].name: model_input}
sess_output = []
for out in sess.get_outputs():
sess_output.append(out.name)
# run model
outputs = sess.run(sess_output, sess_input)
all_out.append(outputs)
return all_out
def postprocess(outputs: List[np.ndarray],
model_input_size: Tuple[int, int],
center: Tuple[int, int],
scale: Tuple[int, int],
simcc_split_ratio: float = 2.0
) -> Tuple[np.ndarray, np.ndarray]:
"""Postprocess for RTMPose model output.
Args:
outputs (np.ndarray): Output of RTMPose model.
model_input_size (tuple): RTMPose model Input image size.
center (tuple): Center of bbox in shape (x, y).
scale (tuple): Scale of bbox in shape (w, h).
simcc_split_ratio (float): Split ratio of simcc.
Returns:
tuple:
- keypoints (np.ndarray): Rescaled keypoints.
- scores (np.ndarray): Model predict scores.
"""
all_key = []
all_score = []
for i in range(len(outputs)):
# use simcc to decode
simcc_x, simcc_y = outputs[i]
keypoints, scores = decode(simcc_x, simcc_y, simcc_split_ratio)
# rescale keypoints
keypoints = keypoints / model_input_size * scale[i] + center[i] - scale[i] / 2
all_key.append(keypoints[0])
all_score.append(scores[0])
return np.array(all_key), np.array(all_score)
def bbox_xyxy2cs(bbox: np.ndarray,
padding: float = 1.) -> Tuple[np.ndarray, np.ndarray]:
"""Transform the bbox format from (x,y,w,h) into (center, scale)
Args:
bbox (ndarray): Bounding box(es) in shape (4,) or (n, 4), formatted
as (left, top, right, bottom)
padding (float): BBox padding factor that will be multilied to scale.
Default: 1.0
Returns:
tuple: A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or
(n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or
(n, 2)
"""
# convert single bbox from (4, ) to (1, 4)
dim = bbox.ndim
if dim == 1:
bbox = bbox[None, :]
# get bbox center and scale
x1, y1, x2, y2 = np.hsplit(bbox, [1, 2, 3])
center = np.hstack([x1 + x2, y1 + y2]) * 0.5
scale = np.hstack([x2 - x1, y2 - y1]) * padding
if dim == 1:
center = center[0]
scale = scale[0]
return center, scale
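# Quick worked example (a sketch, not part of the original module): a box spanning
# x in [0, 100] and y in [0, 200] with padding=1.25 yields center (50, 100) and
# scale (125, 250).
def _bbox_xyxy2cs_example() -> None:
    center, scale = bbox_xyxy2cs(np.array([0., 0., 100., 200.]), padding=1.25)
    assert np.allclose(center, [50., 100.]) and np.allclose(scale, [125., 250.])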
def _fix_aspect_ratio(bbox_scale: np.ndarray,
aspect_ratio: float) -> np.ndarray:
"""Extend the scale to match the given aspect ratio.
Args:
        bbox_scale (np.ndarray): The bbox scale (w, h) in shape (2, )
aspect_ratio (float): The ratio of ``w/h``
Returns:
np.ndarray: The reshaped image scale in (2, )
"""
w, h = np.hsplit(bbox_scale, [1])
bbox_scale = np.where(w > h * aspect_ratio,
np.hstack([w, w / aspect_ratio]),
np.hstack([h * aspect_ratio, h]))
return bbox_scale
def _rotate_point(pt: np.ndarray, angle_rad: float) -> np.ndarray:
"""Rotate a point by an angle.
Args:
pt (np.ndarray): 2D point coordinates (x, y) in shape (2, )
angle_rad (float): rotation angle in radian
Returns:
np.ndarray: Rotated point in shape (2, )
"""
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
rot_mat = np.array([[cs, -sn], [sn, cs]])
return rot_mat @ pt
def _get_3rd_point(a: np.ndarray, b: np.ndarray) -> np.ndarray:
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): The 1st point (x,y) in shape (2, )
b (np.ndarray): The 2nd point (x,y) in shape (2, )
Returns:
np.ndarray: The 3rd point.
"""
direction = a - b
c = b + np.r_[-direction[1], direction[0]]
return c
def get_warp_matrix(center: np.ndarray,
scale: np.ndarray,
rot: float,
output_size: Tuple[int, int],
shift: Tuple[float, float] = (0., 0.),
inv: bool = False) -> np.ndarray:
"""Calculate the affine transformation matrix that can warp the bbox area
in the input image to the output size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
scale (np.ndarray[2, ]): Scale of the bounding box
wrt [width, height].
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)): Size of the
destination heatmaps.
        shift (tuple): Shift translation ratio wrt the width/height,
            each component in [0, 1]. Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: A 2x3 transformation matrix
"""
shift = np.array(shift)
src_w = scale[0]
dst_w = output_size[0]
dst_h = output_size[1]
# compute transformation matrix
rot_rad = np.deg2rad(rot)
src_dir = _rotate_point(np.array([0., src_w * -0.5]), rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
    # three reference points of the src rectangle in the original image
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale * shift
src[1, :] = center + src_dir + scale * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
    # three reference points of the dst rectangle in the output image
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
warp_mat = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
warp_mat = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return warp_mat
def top_down_affine(input_size: Tuple[int, int], bbox_scale: np.ndarray,
                    bbox_center: np.ndarray,
                    img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Get the bbox image as the model input by affine transform.
    Args:
        input_size (tuple): The (w, h) input size of the model.
        bbox_scale (np.ndarray): The bbox scale (w, h) of the img.
        bbox_center (np.ndarray): The bbox center (x, y) of the img.
img (np.ndarray): The original image.
Returns:
tuple: A tuple containing center and scale.
- np.ndarray[float32]: img after affine transform.
- np.ndarray[float32]: bbox scale after affine transform.
"""
w, h = input_size
warp_size = (int(w), int(h))
# reshape bbox to fixed aspect ratio
bbox_scale = _fix_aspect_ratio(bbox_scale, aspect_ratio=w / h)
# get the affine matrix
center = bbox_center
scale = bbox_scale
rot = 0
warp_mat = get_warp_matrix(center, scale, rot, output_size=(w, h))
# do affine transform
img = cv2.warpAffine(img, warp_mat, warp_size, flags=cv2.INTER_LINEAR)
return img, bbox_scale
def get_simcc_maximum(simcc_x: np.ndarray,
simcc_y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""Get maximum response location and value from simcc representations.
Note:
instance number: N
num_keypoints: K
heatmap height: H
heatmap width: W
Args:
        simcc_x (np.ndarray): x-axis SimCC in shape (N, K, Wx)
        simcc_y (np.ndarray): y-axis SimCC in shape (N, K, Wy)
    Returns:
        tuple:
        - locs (np.ndarray): locations of maximum heatmap responses in shape
            (N, K, 2)
        - vals (np.ndarray): values of maximum heatmap responses in shape
            (N, K)
"""
N, K, Wx = simcc_x.shape
simcc_x = simcc_x.reshape(N * K, -1)
simcc_y = simcc_y.reshape(N * K, -1)
# get maximum value locations
x_locs = np.argmax(simcc_x, axis=1)
y_locs = np.argmax(simcc_y, axis=1)
locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32)
max_val_x = np.amax(simcc_x, axis=1)
max_val_y = np.amax(simcc_y, axis=1)
    # use the smaller of the two per-axis maxima as the keypoint confidence
mask = max_val_x > max_val_y
max_val_x[mask] = max_val_y[mask]
vals = max_val_x
locs[vals <= 0.] = -1
# reshape
locs = locs.reshape(N, K, 2)
vals = vals.reshape(N, K)
return locs, vals
def decode(simcc_x: np.ndarray, simcc_y: np.ndarray,
simcc_split_ratio) -> Tuple[np.ndarray, np.ndarray]:
"""Modulate simcc distribution with Gaussian.
Args:
simcc_x (np.ndarray[K, Wx]): model predicted simcc in x.
simcc_y (np.ndarray[K, Wy]): model predicted simcc in y.
simcc_split_ratio (int): The split ratio of simcc.
Returns:
tuple: A tuple containing center and scale.
- np.ndarray[float32]: keypoints in shape (K, 2) or (n, K, 2)
- np.ndarray[float32]: scores in shape (K,) or (n, K)
"""
keypoints, scores = get_simcc_maximum(simcc_x, simcc_y)
keypoints /= simcc_split_ratio
return keypoints, scores
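# Minimal decode sketch (assumption: a 17-keypoint model with a 192x256 input and
# simcc_split_ratio=2, i.e. 384/512 SimCC bins). Random values only exercise the shapes;
# the decoded keypoints land in model-input pixel coordinates.
def _decode_example() -> None:
    rng = np.random.default_rng(0)
    simcc_x = rng.random((1, 17, 384)).astype(np.float32)
    simcc_y = rng.random((1, 17, 512)).astype(np.float32)
    keypoints, scores = decode(simcc_x, simcc_y, simcc_split_ratio=2.0)
    assert keypoints.shape == (1, 17, 2) and scores.shape == (1, 17)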
def inference_pose(session, out_bbox, oriImg):
h, w = session.get_inputs()[0].shape[2:]
model_input_size = (w, h)
resized_img, center, scale = preprocess(oriImg, out_bbox, model_input_size)
outputs = inference(session, resized_img)
keypoints, scores = postprocess(outputs, model_input_size, center, scale)
return keypoints, scores
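# End-to-end usage sketch for this module (assumptions: a local 'dw-ll_ucoco_384.onnx'
# pose model and a 'person.jpg' image; person boxes would normally come from the YOLOX
# detector in onnxdet.py).
if __name__ == '__main__':
    sess = ort.InferenceSession('dw-ll_ucoco_384.onnx', providers=['CPUExecutionProvider'])
    image = cv2.imread('person.jpg')                    # BGR image, shape (H, W, 3)
    bboxes = [[0, 0, image.shape[1], image.shape[0]]]   # fall back to the whole image
    keypoints, scores = inference_pose(sess, bboxes, image)
    print(keypoints.shape, scores.shape)                # (num_boxes, K, 2), (num_boxes, K)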
import math
import numpy as np
import matplotlib
import cv2
eps = 0.01
def smart_resize(x, s):
Ht, Wt = s
if x.ndim == 2:
Ho, Wo = x.shape
Co = 1
else:
Ho, Wo, Co = x.shape
if Co == 3 or Co == 1:
k = float(Ht + Wt) / float(Ho + Wo)
return cv2.resize(x, (int(Wt), int(Ht)), interpolation=cv2.INTER_AREA if k < 1 else cv2.INTER_LANCZOS4)
else:
return np.stack([smart_resize(x[:, :, i], s) for i in range(Co)], axis=2)
def smart_resize_k(x, fx, fy):
if x.ndim == 2:
Ho, Wo = x.shape
Co = 1
else:
Ho, Wo, Co = x.shape
Ht, Wt = Ho * fy, Wo * fx
if Co == 3 or Co == 1:
k = float(Ht + Wt) / float(Ho + Wo)
return cv2.resize(x, (int(Wt), int(Ht)), interpolation=cv2.INTER_AREA if k < 1 else cv2.INTER_LANCZOS4)
else:
return np.stack([smart_resize_k(x[:, :, i], fx, fy) for i in range(Co)], axis=2)
def padRightDownCorner(img, stride, padValue):
h = img.shape[0]
w = img.shape[1]
pad = 4 * [None]
pad[0] = 0 # up
pad[1] = 0 # left
pad[2] = 0 if (h % stride == 0) else stride - (h % stride) # down
pad[3] = 0 if (w % stride == 0) else stride - (w % stride) # right
img_padded = img
pad_up = np.tile(img_padded[0:1, :, :]*0 + padValue, (pad[0], 1, 1))
img_padded = np.concatenate((pad_up, img_padded), axis=0)
pad_left = np.tile(img_padded[:, 0:1, :]*0 + padValue, (1, pad[1], 1))
img_padded = np.concatenate((pad_left, img_padded), axis=1)
pad_down = np.tile(img_padded[-2:-1, :, :]*0 + padValue, (pad[2], 1, 1))
img_padded = np.concatenate((img_padded, pad_down), axis=0)
pad_right = np.tile(img_padded[:, -2:-1, :]*0 + padValue, (1, pad[3], 1))
img_padded = np.concatenate((img_padded, pad_right), axis=1)
return img_padded, pad
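# Usage sketch for the padding helper above (an assumption about typical callers, not
# code from this file): pad to a multiple of `stride`, run the network, then strip the
# padding again with the returned `pad` offsets.
def _pad_unpad_example(img, stride=8, padValue=128):
    img_padded, pad = padRightDownCorner(img, stride, padValue)
    # ... network inference on img_padded would happen here ...
    unpadded = img_padded[:img_padded.shape[0] - pad[2], :img_padded.shape[1] - pad[3], :]
    return unpadded  # same spatial size as the original img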
def transfer(model, model_weights):
transfered_model_weights = {}
for weights_name in model.state_dict().keys():
transfered_model_weights[weights_name] = model_weights['.'.join(weights_name.split('.')[1:])]
return transfered_model_weights
def draw_bodypose(canvas, candidate, subset):
H, W, C = canvas.shape
candidate = np.array(candidate)
subset = np.array(subset)
stickwidth = 4
limbSeq = [[2, 3], [2, 6], [3, 4], [4, 5], [6, 7], [7, 8], [2, 9], [9, 10], \
[10, 11], [2, 12], [12, 13], [13, 14], [2, 1], [1, 15], [15, 17], \
[1, 16], [16, 18], [3, 17], [6, 18]]
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
for i in range(17):
for n in range(len(subset)):
index = subset[n][np.array(limbSeq[i]) - 1]
if -1 in index:
continue
Y = candidate[index.astype(int), 0] * float(W)
X = candidate[index.astype(int), 1] * float(H)
mX = np.mean(X)
mY = np.mean(Y)
length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
polygon = cv2.ellipse2Poly((int(mY), int(mX)), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
cv2.fillConvexPoly(canvas, polygon, colors[i])
canvas = (canvas * 0.6).astype(np.uint8)
for i in range(18):
for n in range(len(subset)):
index = int(subset[n][i])
if index == -1:
continue
x, y = candidate[index][0:2]
x = int(x * W)
y = int(y * H)
cv2.circle(canvas, (int(x), int(y)), 4, colors[i], thickness=-1)
return canvas
def draw_handpose(canvas, all_hand_peaks):
H, W, C = canvas.shape
edges = [[0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [0, 9], [9, 10], \
[10, 11], [11, 12], [0, 13], [13, 14], [14, 15], [15, 16], [0, 17], [17, 18], [18, 19], [19, 20]]
for peaks in all_hand_peaks:
peaks = np.array(peaks)
for ie, e in enumerate(edges):
x1, y1 = peaks[e[0]]
x2, y2 = peaks[e[1]]
x1 = int(x1 * W)
y1 = int(y1 * H)
x2 = int(x2 * W)
y2 = int(y2 * H)
if x1 > eps and y1 > eps and x2 > eps and y2 > eps:
cv2.line(canvas, (x1, y1), (x2, y2), matplotlib.colors.hsv_to_rgb([ie / float(len(edges)), 1.0, 1.0]) * 255, thickness=2)
        for i, keypoint in enumerate(peaks):
            x, y = keypoint
x = int(x * W)
y = int(y * H)
if x > eps and y > eps:
cv2.circle(canvas, (x, y), 4, (0, 0, 255), thickness=-1)
return canvas
def draw_facepose(canvas, all_lmks):
H, W, C = canvas.shape
for lmks in all_lmks:
lmks = np.array(lmks)
for lmk in lmks:
x, y = lmk
x = int(x * W)
y = int(y * H)
if x > eps and y > eps:
cv2.circle(canvas, (x, y), 3, (255, 255, 255), thickness=-1)
return canvas
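# Usage sketch for the drawing helpers above (hedged, not part of the original file):
# they all expect keypoints normalized to [0, 1] (they scale by W/H internally) and
# draw onto an HxWx3 uint8 canvas.
def _draw_all(canvas, candidate, subset, hand_peaks, face_lmks):
    canvas = draw_bodypose(canvas, candidate, subset)
    canvas = draw_handpose(canvas, hand_peaks)
    canvas = draw_facepose(canvas, face_lmks)
    return canvas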
# detect hand according to body pose keypoints
# please refer to https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/src/openpose/hand/handDetector.cpp
def handDetect(candidate, subset, oriImg):
# right hand: wrist 4, elbow 3, shoulder 2
# left hand: wrist 7, elbow 6, shoulder 5
ratioWristElbow = 0.33
detect_result = []
image_height, image_width = oriImg.shape[0:2]
for person in subset.astype(int):
# if any of three not detected
has_left = np.sum(person[[5, 6, 7]] == -1) == 0
has_right = np.sum(person[[2, 3, 4]] == -1) == 0
if not (has_left or has_right):
continue
hands = []
#left hand
if has_left:
left_shoulder_index, left_elbow_index, left_wrist_index = person[[5, 6, 7]]
x1, y1 = candidate[left_shoulder_index][:2]
x2, y2 = candidate[left_elbow_index][:2]
x3, y3 = candidate[left_wrist_index][:2]
hands.append([x1, y1, x2, y2, x3, y3, True])
# right hand
if has_right:
right_shoulder_index, right_elbow_index, right_wrist_index = person[[2, 3, 4]]
x1, y1 = candidate[right_shoulder_index][:2]
x2, y2 = candidate[right_elbow_index][:2]
x3, y3 = candidate[right_wrist_index][:2]
hands.append([x1, y1, x2, y2, x3, y3, False])
for x1, y1, x2, y2, x3, y3, is_left in hands:
# pos_hand = pos_wrist + ratio * (pos_wrist - pos_elbox) = (1 + ratio) * pos_wrist - ratio * pos_elbox
# handRectangle.x = posePtr[wrist*3] + ratioWristElbow * (posePtr[wrist*3] - posePtr[elbow*3]);
# handRectangle.y = posePtr[wrist*3+1] + ratioWristElbow * (posePtr[wrist*3+1] - posePtr[elbow*3+1]);
# const auto distanceWristElbow = getDistance(poseKeypoints, person, wrist, elbow);
# const auto distanceElbowShoulder = getDistance(poseKeypoints, person, elbow, shoulder);
# handRectangle.width = 1.5f * fastMax(distanceWristElbow, 0.9f * distanceElbowShoulder);
x = x3 + ratioWristElbow * (x3 - x2)
y = y3 + ratioWristElbow * (y3 - y2)
distanceWristElbow = math.sqrt((x3 - x2) ** 2 + (y3 - y2) ** 2)
distanceElbowShoulder = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
width = 1.5 * max(distanceWristElbow, 0.9 * distanceElbowShoulder)
# x-y refers to the center --> offset to topLeft point
# handRectangle.x -= handRectangle.width / 2.f;
# handRectangle.y -= handRectangle.height / 2.f;
x -= width / 2
y -= width / 2 # width = height
# overflow the image
if x < 0: x = 0
if y < 0: y = 0
width1 = width
width2 = width
if x + width > image_width: width1 = image_width - x
if y + width > image_height: width2 = image_height - y
width = min(width1, width2)
            # discard hand boxes smaller than 20 pixels
if width >= 20:
detect_result.append([int(x), int(y), int(width), is_left])
    '''
    return value: [[x, y, w, True if left hand else False]].
    width == height because the hand network requires a square input.
    (x, y) is the coordinate of the top-left corner.
    '''
return detect_result
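# Usage sketch (hypothetical caller, not part of this file): each entry returned above
# is a square box [x, y, w, is_left] with (x, y) the top-left corner, so hand patches
# can be cropped directly from the original image.
def _crop_hands(candidate, subset, oriImg):
    crops = []
    for x, y, w, is_left in handDetect(candidate, subset, oriImg):
        crops.append((oriImg[y:y + w, x:x + w, :], is_left))
    return crops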
# Written by Lvmin
def faceDetect(candidate, subset, oriImg):
    # face keypoints used: eyes (indices 14, 15) and ears (indices 16, 17)
detect_result = []
image_height, image_width = oriImg.shape[0:2]
for person in subset.astype(int):
has_head = person[0] > -1
if not has_head:
continue
has_left_eye = person[14] > -1
has_right_eye = person[15] > -1
has_left_ear = person[16] > -1
has_right_ear = person[17] > -1
if not (has_left_eye or has_right_eye or has_left_ear or has_right_ear):
continue
head, left_eye, right_eye, left_ear, right_ear = person[[0, 14, 15, 16, 17]]
width = 0.0
x0, y0 = candidate[head][:2]
if has_left_eye:
x1, y1 = candidate[left_eye][:2]
d = max(abs(x0 - x1), abs(y0 - y1))
width = max(width, d * 3.0)
if has_right_eye:
x1, y1 = candidate[right_eye][:2]
d = max(abs(x0 - x1), abs(y0 - y1))
width = max(width, d * 3.0)
if has_left_ear:
x1, y1 = candidate[left_ear][:2]
d = max(abs(x0 - x1), abs(y0 - y1))
width = max(width, d * 1.5)
if has_right_ear:
x1, y1 = candidate[right_ear][:2]
d = max(abs(x0 - x1), abs(y0 - y1))
width = max(width, d * 1.5)
x, y = x0, y0
x -= width
y -= width
if x < 0:
x = 0
if y < 0:
y = 0
width1 = width * 2
width2 = width * 2
if x + width > image_width:
width1 = image_width - x
if y + width > image_height:
width2 = image_height - y
width = min(width1, width2)
if width >= 20:
detect_result.append([int(x), int(y), int(width)])
return detect_result
# get max index of 2d array
def npmax(array):
arrayindex = array.argmax(1)
arrayvalue = array.max(1)
i = arrayvalue.argmax()
j = arrayindex[i]
return i, j
import cv2
import numpy as np
import os
import onnxruntime as ort
from .onnxdet import inference_detector
from .onnxpose import inference_pose
class Wholebody:
def __init__(self, model_dir):
device = 'cuda:0'
        providers = ['CPUExecutionProvider'] if device == 'cpu' else ['CUDAExecutionProvider']
        # providers = ['CPUExecutionProvider'] if device == 'cpu' else ['ROCMExecutionProvider']
onnx_det = os.path.join(model_dir, 'yolox_l.onnx')
onnx_pose = os.path.join(model_dir, 'dw-ll_ucoco_384.onnx')
self.session_det = ort.InferenceSession(path_or_bytes=onnx_det, providers=providers)
self.session_pose = ort.InferenceSession(path_or_bytes=onnx_pose, providers=providers)
def __call__(self, oriImg):
det_result = inference_detector(self.session_det, oriImg)
keypoints, scores = inference_pose(self.session_pose, det_result, oriImg)
keypoints_info = np.concatenate(
(keypoints, scores[..., None]), axis=-1)
# compute neck joint
neck = np.mean(keypoints_info[:, [5, 6]], axis=1)
        # mark the neck as valid only when both shoulder scores exceed 0.3
neck[:, 2:4] = np.logical_and(
keypoints_info[:, 5, 2:4] > 0.3,
keypoints_info[:, 6, 2:4] > 0.3).astype(int)
new_keypoints_info = np.insert(
keypoints_info, 17, neck, axis=1)
mmpose_idx = [
17, 6, 8, 10, 7, 9, 12, 14, 16, 13, 15, 2, 1, 4, 3
]
openpose_idx = [
1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17
]
new_keypoints_info[:, openpose_idx] = \
new_keypoints_info[:, mmpose_idx]
keypoints_info = new_keypoints_info
keypoints, scores = keypoints_info[
..., :2], keypoints_info[..., 2]
return keypoints, scores
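# Minimal usage sketch (assumptions: `model_dir` contains the two ONNX files named in
# __init__ and `image_path` points to a readable image; because of the relative imports
# above, this module is meant to be used from the package rather than run directly).
def _wholebody_example(model_dir, image_path):
    estimator = Wholebody(model_dir)
    keypoints, scores = estimator(cv2.imread(image_path))
    # keypoints: (num_person, num_keypoints, 2) pixel coordinates in OpenPose-style order
    # scores:    (num_person, num_keypoints) per-keypoint confidences
    return keypoints, scores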