suily / VTimeLLM_pytorch

Commit e7d62d3b, authored Nov 21, 2024 by suily (parent fef630ee)

Update README

Showing 1 changed file (README.md) with 148 additions and 79 deletions.
...

Visual Encoder: the CLIP ViT-L/14 model extracts the cls-token feature (v_cls) for each frame.
Visual Adapter: a linear layer that maps each frame's v_cls into the LLM embedding space; the video is then represented by an N * d feature matrix Z (N is the number of frames, d is the LLM hidden size). 100 frames are sampled uniformly.
Vicuna: the LLM itself. A <video> token stands for the video content, and the visual features Z are spliced into the middle of the text embeddings at that position (a minimal sketch follows the figure below).
<div align="center">
<img src="./doc/VTimeLLM.PNG" />
</div>
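
To make the pipeline concrete, here is a minimal PyTorch sketch of the three steps described above (uniform sampling of 100 frames, a linear projection of each frame's CLIP cls token into the LLM hidden space, and splicing the result in at the <video> token position). All names, shapes, and dimensions below are illustrative assumptions, not the repository's actual API:

```python
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """A single linear layer mapping CLIP cls-token features into the LLM hidden space."""
    def __init__(self, clip_dim=768, llm_dim=4096):  # illustrative dimensions
        super().__init__()
        self.proj = nn.Linear(clip_dim, llm_dim)

    def forward(self, v_cls):        # v_cls: (N, clip_dim), one row per sampled frame
        return self.proj(v_cls)      # Z: (N, llm_dim)

# Uniformly sample N = 100 frame indices from a video with T frames.
T, num_frames = 1432, 100
idx = torch.linspace(0, T - 1, num_frames).long()

# Stand-in CLIP features for the sampled frames (a real pipeline would run CLIP ViT-L/14 here).
v_cls = torch.randn(num_frames, 768)
Z = VisualAdapter()(v_cls)           # (100, 4096)

# Splice Z into the text embedding sequence at the position of the <video> placeholder token.
text_emb = torch.randn(32, 4096)     # embeddings of the surrounding prompt tokens
video_pos = 5                        # index of the <video> placeholder
inputs_embeds = torch.cat([text_emb[:video_pos], Z, text_emb[video_pos + 1:]], dim=0)
```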
## Environment Setup

### Docker (Option 1)

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
docker run -it --name=VTimeLLM --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/VTimeLLM -v /opt/hyhal/:/opt/hyhal/:ro <imageID> bash  # replace <imageID> with the ID of the image pulled above
cd VTimeLLM
# Install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# Install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
export HF_ENDPOINT=https://hf-mirror.com
# Fix the deepspeed async_io error
apt update
apt install gcc libaio-dev
```
### Dockerfile (Option 2)

```
docker build --no-cache -t vtimellm:latest .
docker run -it --name=VTimeLLM --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/VTimeLLM -v /opt/hyhal/:/opt/hyhal/:ro vtimellm /bin/bash
cd VTimeLLM
# Install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# Install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
export HF_ENDPOINT=https://hf-mirror.com
# Fix the deepspeed async_io error
apt update
apt install gcc libaio-dev
```
### Anaconda (Option 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded from the 光合 developer community: https://developer.hpccube.com/tool/
...
```
DTK software stack: dtk24.04.2
python: python3.8
pytorch: 2.1.0
torchvision: 0.16.0
deepspeed: 0.14.2
flash-attn: 2.0.4
```
`Tips: the dtk software stack, python, pytorch, and other DCU-related tool versions above must correspond exactly, one to one`
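
As a quick sanity check, a short Python snippet (illustrative, using the standard package import names) can print the installed versions for comparison against the list above:

```python
# Print the versions of the key packages; torch.version.hip is set on ROCm/DCU builds.
import torch
import torchvision
import deepspeed

print("pytorch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("deepspeed:", deepspeed.__version__)
print("hip:", torch.version.hip)
```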
2. Other, non-special libraries can be installed directly with the following steps:
```
cd VTimeLLM
# Install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# Install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
export HF_ENDPOINT=https://hf-mirror.com
# Fix the deepspeed async_io error
apt update
apt install gcc libaio-dev
```
## Datasets
### Training datasets

VTimeLLM can be trained as an English version on Vicuna v1.5 or as a Chinese version on ChatGLM3-6b; to train one version, download only the corresponding datasets (the data differs between the two).

The training data consists of two parts, the three-stage annotation data (data) and the pre-extracted features (feat), both of which can be downloaded via [scnet](http://113.200.138.88:18080/aidatasets/project-dependency/vtimellm) or from the official links given below.

ps: this repository also provides a small dataset for training tests, roughly 。。。。。 of the full dataset, which can be downloaded via scnet.
1. Download data

(1) VTimeLLM-7B:
* [stage1.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/blob/main/blip_laion_cc_sbu_558k.json)
* [stage2.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fstage2.json)
* [stage3.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fstage3.json)
(2) ChatGLM3-6b:

* [stage1/2/3.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fdata_Chinese.zip)
2. Download feat

* [feat_list](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Ffeat&mode=list)

The code to decompress feat is as follows:
```
cd VTimeLLM/feat
tar -xzvf stage1.tar.gz
cat stage2_part_* > stage2.tar.gz
tar -xzvf stage2.tar.gz
tar -xzvf stage3.tar.gz
```
Taking the Vicuna v1.5-based VTimeLLM-7B as an example, the dataset directory structure is as follows:
```
VTimeLLM
├── data
│   ├── blip_laion_cc_sbu_558k.json
│   ├── stage2.json
│   └── stage3.json
└── feat
    ├── 558k_clip_feat
    ├── intern_clip_feat
    └── stage3_clip_feat
```
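
Before launching training, a small (purely illustrative) Python script can verify that the layout above is in place:

```python
# Sanity-check the expected data/feat layout described above; paths are from this README.
from pathlib import Path

root = Path("VTimeLLM")
expected = [
    "data/blip_laion_cc_sbu_558k.json",
    "data/stage2.json",
    "data/stage3.json",
    "feat/558k_clip_feat",
    "feat/intern_clip_feat",
    "feat/stage3_clip_feat",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8s} {rel}")
```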
### Inference dataset

The data used for inference testing is already included at VTimeLLM/images/demo.mp4.
## Training

VTimeLLM can be trained as an English version on Vicuna v1.5 or as a Chinese version on ChatGLM3-6b; to train one version, download only the corresponding model.

Training requires downloading the clip and Vicuna v1.5 (or ChatGLM3-6b) weights and placing them in the 'checkpoints' directory. The download links are as follows:

1. Download the clip model
* [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm)
* [official link](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list)
2-1. Download the Vicuna v1.5 weights

* [scnet](http://113.200.138.88:18080/aimodels/vicuna-7b-v1.5)
* [official link](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main)
* Download via code (huggingface):
```
cd VTimeLLM
export HF_ENDPOINT=https://hf-mirror.com
export HF_DATASETS_CACHE="./checkpoints/vicuna-7b-v1.5"
huggingface-cli download --resume-download lmsys/vicuna-7b-v1.5 --local-dir checkpoints/vicuna-7b-v1.5 --local-dir-use-symlinks False
```
2-2. Download the ChatGLM3-6b weights

* [scnet](http://113.200.138.88:18080/aimodels/chatglm3-6b)
* [official link](https://huggingface.co/THUDM/chatglm3-6b)
* Download via code (huggingface):
```
cd VTimeLLM
export HF_ENDPOINT=https://hf-mirror.com
export HF_DATASETS_CACHE="./checkpoints/chatglm3-6b"
huggingface-cli download --resume-download THUDM/chatglm3-6b --local-dir checkpoints/chatglm3-6b
```
Taking the Vicuna v1.5-based VTimeLLM-7B as an example, the model directory structure is as follows:
```
VTimeLLM
├── clip
│   └── ViT-L-14.pt
└── vicuna-7b-v1.5
    └── ...
```
Taking the Vicuna v1.5-based VTimeLLM as an example, run training with:

```
cd VTimeLLM
wandb off
sh scripts/stage1.sh
sh scripts/stage2.sh
sh scripts/stage3.sh
```
## Inference

VTimeLLM has an English version trained on Vicuna v1.5, stored as vtimellm-vicuna-v1-5-7b.tar.gz, and a Chinese version trained on ChatGLM3-6b, stored as vtimellm-chatglm3-6b.tar.gz. To run inference with one version, download only the corresponding model.

Inference requires downloading the clip, Vicuna v1.5 (or ChatGLM3-6b), and VTimeLLM weights and placing them in the 'checkpoints' directory. For clip and Vicuna v1.5 (or ChatGLM3-6b), see the training section above; the VTimeLLM weights can be downloaded from:
* [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm)
* [official link](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list)
The code to decompress the VTimeLLM weights is as follows:

```
cd VTimeLLM/checkpoints
tar -xzvf vtimellm-vicuna-v1-5-7b.tar.gz
# or
tar -xzvf vtimellm-chatglm3-6b.tar.gz
```
Taking the Vicuna v1.5-based VTimeLLM as an example, the model directory structure is as follows:

```
VTimeLLM
├── clip
│   └── ViT-L-14.pt
├── vtimellm-vicuna-v1-5-7b-stage1
│   └── ...
├── vtimellm-vicuna-v1-5-7b-stage2
│   └── ...
├── vtimellm-vicuna-v1-5-7b-stage3
│   └── ...
└── vicuna-7b-v1.5
    └── ...
```
Taking the Vicuna v1.5-based VTimeLLM as an example, run inference with:
```
cd VTimeLLM
HIP_VISIBLE_DEVICES=0 python -m vtimellm.inference \
--model_base "checkpoints/vicuna-7b-v1.5" \
--pretrain_mm_mlp_adapter "checkpoints/vtimellm-vicuna-v1-5-7b-stage1/mm_projector.bin" \
--stage2 "checkpoints/vtimellm-vicuna-v1-5-7b-stage2" \
--stage3 "checkpoints/vtimellm-vicuna-v1-5-7b-stage3" \
--video_path "images/demo.mp4"
```
For inference with the VTimeLLM-ChatGLM version, see VTimeLLM/docs/inference_for_glm.ipynb.
## Result

The default output of the inference run is:
<div align="center">
...
</div>
### Accuracy

The default training results are as follows:
| Hardware | Test parameters | Software stack | Final loss |
| -------- | -------- | -------- | -------- |
| A800 * 2 <br/> (80 GB, 1410 MHz) | MODEL_VERSION=vicuna-v1-5-7b <br/> bf16=True <br/> tf32=True | cuda11.8 | stage1: 2.415712 <br/> stage2: 1.046057 <br/> stage3: 1.283405 |
| k100ai * 2 <br/> (64 GB, 1500 MHz) | MODEL_VERSION=vicuna-v1-5-7b <br/> bf16=True <br/> tf32=True | dtk24.04.2 | stage1: 2.414052 <br/> stage2: 1.050350 <br/> stage3: 1.265567 |
## Application Scenarios

### Algorithm category

`Video generation`

### Key application industries

`Furniture, e-commerce, healthcare, broadcast media, education`
## Pretrained Weights

- http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm (vtimellm, clip)
- http://113.200.138.88:18080/aimodels/vicuna-7b-v1.5.git (vicuna-7b-v1.5)
- http://113.200.138.88:18080/aimodels/chatglm3-6b (chatglm3-6b)
## Source Repository and Issue Reporting

- https://developer.sourcefind.cn/codes/suily/vtimellm_pytorch
## References

- https://github.com/huangb23/VTimeLLM