Update README

e7d62d3b · suily · fef630ee · e7d62d3b
Commit e7d62d3b authored Nov 21, 2024 by suily

Showing with 148 additions and 79 deletions

README.md README.md +148 -79

--- a/README.md
+++ b/README.md
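@@ -20,45 +20,38 @@ Visual Encoder: uses the CLIP ViT-L/14 model to extract the cls-token feature of each frame
 Visual Adapter: a linear layer that transforms each frame's v_cls and maps it into the LLM space, so the video is finally represented by an N*d feature Z (N is the number of frames, d is the LLM hidden dimension); 100 frames are sampled uniformly here

 Vicuna: the LLM; <video> stands for the video content, and the visual feature Z is inserted into the middle of the text embeddings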
-
-<div align=center>
-    <img src="./doc/ExpNet.PNG"/>
-</div>
-<div align=center>
-    <img src="./doc/PoseVAE.PNG"/>
-</div>
 <div align=center>
-    <img src="./doc/FaceRender.PNG"/>
+    <img src="./doc/VTimeLLM.PNG"/>
 </div>

 ## Environment Setup
 ### Docker (Option 1)
 ```
-docker pull image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.8
-docker run -it --name=SadTalker --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/SadTalker -v /opt/hyhal/:/opt/hyhal/:ro <imageID> bash  # replace <imageID> with the ID of the Docker image pulled above
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
+docker run -it --name=VTimeLLM --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/VTimeLLM -v /opt/hyhal/:/opt/hyhal/:ro <imageID> bash  # replace <imageID> with the ID of the Docker image pulled above

-cd SadTalker
-# Install ffmpeg (needed for format conversion)
-apt update
-apt install ffmpeg
+cd VTimeLLM
 # Install dependencies
 pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
-pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
 pip install -r requirements.txt
+export HF_ENDPOINT=https://hf-mirror.com
+# Fix the deepspeed async_io error
+apt update
+apt install gcc libaio-dev
 ```
 ### Dockerfile (Option 2)
 ```
-docker build --no-cache -t sadtalker:latest .
-docker run -it --name=SadTalker --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/SadTalker -v /opt/hyhal/:/opt/hyhal/:ro sadtalker /bin/bash
+docker build --no-cache -t vtimellm:latest .
+docker run -it --name=VTimeLLM --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/VTimeLLM -v /opt/hyhal/:/opt/hyhal/:ro vtimellm /bin/bash

-cd SadTalker
-# Install ffmpeg (needed for format conversion)
-apt update
-apt install ffmpeg
+cd VTimeLLM
 # Install dependencies
 pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
-pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
 pip install -r requirements.txt
+export HF_ENDPOINT=https://hf-mirror.com
+# Fix the deepspeed async_io error
+apt update
+apt install gcc libaio-dev
 ```
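 ### Anaconda (Option 3)
 1. The DCU-specific deep learning libraries this project needs can be downloaded and installed from the Guanghe Developer Community (光合开发者社区): https://developer.hpccube.com/tool/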
@@ -66,75 +59,146 @@ pip install -r requirements.txt
 DTK software stack: dtk24.04.2
 python: python3.8
 pytorch: 2.1.0
-torchvision:
-torchaudio:
+torchvision: 0.16.0
+deepspeed: 0.14.2
+flash-attn: 2.0.4
 ```
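 `Tips: the versions of the DCU-related tools above (dtk software stack, python, pytorch, etc.) must match each other exactly`

 2. Install the remaining ordinary libraries with the steps below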
 ```
-cd SadTalker
-# Install ffmpeg (needed for format conversion)
-apt update
-apt install ffmpeg
+cd VTimeLLM
 # Install dependencies
 pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
-pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
 pip install -r requirements.txt
+export HF_ENDPOINT=https://hf-mirror.com
+# Fix the deepspeed async_io error
+apt update
+apt install gcc libaio-dev
 ```
 ## Dataset
-The data used for the inference test is stored under SadTalker/dataset/, with the following directory structure:
-```
- ── dataset
-    │   ├── bus_chinese.wav
-    │   └── image.png
-```
-## Training
-Not yet released officially
-## Inference
-The models can be downloaded from [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/sadtalker) or via the options below:
-
-1-1. Pre-Trained Models
-* [Google Drive](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing)
-* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
-* [Baidu (百度云盘)](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt) (Password: `sadt`)
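+### Training dataset
+VTimeLLM can be trained as an English version based on Vicuna v1.5 or as a Chinese version based on ChatGLM3-6b; to train one version, download only the matching dataset (the two versions differ in data).
+The training dataset has two parts, the three-stage annotation files (data) and the pre-extracted features (feat). Both can be downloaded from [scnet](http://113.200.138.88:18080/aidatasets/project-dependency/vtimellm) or from the official links, which are listed below:
+
+P.S. This repository also provides a small dataset for training tests, about ..... of the full dataset in size; it can be downloaded from scnet.
+
+1. Download data
+
+(1) VTimeLLM-7B: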
+* [stage1.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/blob/main/blip_laion_cc_sbu_558k.json) 
+* [stage2.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fstage2.json)
+* [stage3.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fstage3.json)
+(2) ChatGLM3-6b:
+* [stage1/2/3.json](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/files/?p=%2Fdata%2Fdata_Chinese.zip)
+2. Download feat
+* [feat_list](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Ffeat&mode=list)
+Extract the feat archives as follows:
+```
+cd VTimeLLM/feat
+tar -xzvf stage1.tar.gz
+cat stage2_part_* > stage2.tar.gz
+tar -xzvf stage2.tar.gz
+tar -xzvf stage3.tar.gz
+```
+Taking the VTimeLLM-7B version of VTimeLLM as an example, the dataset directory structure is:
+```
+VTimeLLM:
+ ── data
+    │   ├── blip_laion_cc_sbu_558k.json
+    │   ├── stage2.json
+    │   └── stage3.json
+ ── feat
+    │   ├── 558k_clip_feat
+    │   ├── intern_clip_feat
+    │   └── stage3_clip_feat
+```
+### Inference dataset
+The data used for the inference test is stored at VTimeLLM/images/demo.mp4

-1-2. GFPGAN Offline Patch
-* [Google Drive](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing)
-* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
-* [Baidu (百度云盘)](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) (Password: `sadt`)
-
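-2. Run the automatic download (GitHub Releases):
+## Training
+VTimeLLM can be trained as an English version based on Vicuna v1.5 or as a Chinese version based on ChatGLM3-6b; to train one version, download only the matching model.
+Training requires the clip weights and the Vicuna v1.5 (or ChatGLM3-6b) weights, each downloaded separately and placed in the 'checkpoints' directory. The download links are:
+
+1. Download the clip model
+* [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm) 
+* [Official link](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list)
+2-1. Download the Vicuna v1.5 weights
+* [scnet](http://113.200.138.88:18080/aimodels/vicuna-7b-v1.5) 
+* [Official link](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main)
+* Download from the command line (huggingface)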
+```
+cd VTimeLLM
+export HF_ENDPOINT=https://hf-mirror.com
+export HF_DATASETS_CACHE="./checkpoints/vicuna-7b-v1.5"
+huggingface-cli download --resume-download lmsys/vicuna-7b-v1.5 --local-dir checkpoints/vicuna-7b-v1.5 --local-dir-use-symlinks False
+```
+2-2. Download the ChatGLM3-6b weights
+* [scnet](http://113.200.138.88:18080/aimodels/chatglm3-6b) 
+* [Official link](https://huggingface.co/THUDM/chatglm3-6b)
+* Download from the command line (huggingface)
+```
+cd VTimeLLM
+export HF_ENDPOINT=https://hf-mirror.com
+export HF_DATASETS_CACHE="./checkpoints/chatglm3-6b"
+huggingface-cli download --resume-download THUDM/chatglm3-6b --local-dir checkpoints/chatglm3-6b
+```
+Taking the VTimeLLM-7B version of VTimeLLM as an example, the model directory structure is:
+```
+VTimeLLM:
+ ── clip
+    │   └── ViT-L-14.pt
+ ── vicuna-7b-v1.5
+    │   └── ...
 ```
-cd SadTalker
-sh scripts/download_models.sh
+
+Taking the Vicuna v1.5-based VTimeLLM as an example, run training with:
 ```
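-The model directory structure is as follows; checkpoints holds the pre-trained models and gfpgan holds the face detection and enhancement models: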
+cd VTimeLLM
+wandb off
+sh scripts/stage1.sh
+sh scripts/stage2.sh
+sh scripts/stage3.sh
 ```
- ── checkpoints
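+## Inference
+VTimeLLM provides an English version trained on Vicuna v1.5, stored as vtimellm-vicuna-v1-5-7b.tar.gz, and a Chinese version trained on ChatGLM3-6b, stored as vtimellm-chatglm3-6b.tar.gz. To run inference with one version, download only the matching model.
+Inference requires the clip, Vicuna v1.5 (or ChatGLM3-6b), and VTimeLLM weights, each downloaded separately and placed in the 'checkpoints' directory. For clip and Vicuna v1.5 (or ChatGLM3-6b), follow the download steps in the Training section; the VTimeLLM weights are available at: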
+
+* [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm) 
+* [Official link](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list)
+Extract the VTimeLLM weights as follows:
+```
+cd VTimeLLM/checkpoints
+tar -xzvf vtimellm-vicuna-v1-5-7b.tar.gz 
+# or
+tar -xzvf vtimellm-chatglm3-6b.tar.gz
+```
+Taking the Vicuna v1.5-based VTimeLLM as an example, the model directory structure is:
+```
+VTimeLLM:
+ ── clip
+    │   └── ViT-L-14.pt
+ ── vtimellm-vicuna-v1-5-7b-stage1
+    │   └── ...
+ ── vtimellm-vicuna-v1-5-7b-stage2
+    │   └── ...
+ ── vtimellm-vicuna-v1-5-7b-stage3
+    │   └── ...
+ ── vicuna-7b-v1.5
    │   └── ...
- ── gfpgan
-    │   └── weights
-    │          └── ...
-```
-Run inference with:
-```
-HIP_VISIBLE_DEVICES=0 python inference.py \
-	--driven_audio dataset/bus_chinese.wav \
-	--source_image dataset/image.png \
-	--still \
-	--preprocess full \
-	--enhancer gfpgan \
-	--result_dir result/
-
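-# --driven_audio path to the audio data
-# --source_image path to the image data
-# --still use the same pose parameters as the original image, with less head motion
-# --preprocess full preprocess the image with one of ['crop', 'extcrop', 'resize', 'full', 'extfull']
-# --enhancer enhance the generated face with a face restoration network [gfpgan, RestoreFormer]
-# --result_dir output path
-# For more options, see the parser comments in inference.py and docs/best_practice.md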
 ```
+
+Taking the Vicuna v1.5-based VTimeLLM as an example, run inference with:
+```
+cd VTimeLLM
+HIP_VISIBLE_DEVICES=0 python -m vtimellm.inference \
+	--model_base "checkpoints/vicuna-7b-v1.5" \
+	--pretrain_mm_mlp_adapter "checkpoints/vtimellm-vicuna-v1-5-7b-stage1/mm_projector.bin" \
+	--stage2 "checkpoints/vtimellm-vicuna-v1-5-7b-stage2" \
+	--stage3 "checkpoints/vtimellm-vicuna-v1-5-7b-stage3" \
+	--video_path "images/demo.mp4"
+```
+To run inference with the VTimeLLM-ChatGLM version, see VTimeLLM/docs/inference_for_glm.ipynb
 ## result
 The default inference result is:
 <div align=center>
@@ -142,17 +206,22 @@ HIP_VISIBLE_DEVICES=0 python inference.py \
 </div>

 ### Accuracy
-None
+The default training results are:
+| Device                           | Test parameters                                  | Software stack | Final loss |
+| -------------------------------- | ------------------------------------------------ | ---------- | ---------- |
+| A800 * 2<br/>(80G, 1410 MHz)   | MODEL_VERSION=vicuna-v1-5-7b<br/>bf16=True<br/>tf32=True  | cuda11.8   |  stage1: 2.415712<br/>stage2: 1.046057<br/>stage3: 1.283405  |
+| k100ai * 2<br/>(64G, 1500 MHz) | MODEL_VERSION=vicuna-v1-5-7b<br/>bf16=True<br/>tf32=True  | dtk24.04.2 |  stage1: 2.414052<br/>stage2: 1.050350<br/>stage3: 1.265567  |
+
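 ## Application Scenarios
 ### Algorithm Category
 `Video generation`
 ### Key Application Industries
 `Furniture, e-commerce, healthcare, broadcast media, education`
 ## Pre-trained Weights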
- 
+- http://113.200.138.88:18080/aimodels/findsource-dependency/vtimellm (vtimellm、clip)
 - http://113.200.138.88:18080/aimodels/vicuna-7b-v1.5.git (vicuna-7b-v1.5)
-  http://113.200.138.88:18080/aimodels/chatglm3-6b (chatglm3-6b)
+- http://113.200.138.88:18080/aimodels/chatglm3-6b (chatglm3-6b)
 ## Source Repository and Issue Feedback
- 
+- https://developer.sourcefind.cn/codes/suily/vtimellm_pytorch
 ## References
 - https://github.com/huangb23/VTimeLLM
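
The inference command added in this diff reads several files and directories under `checkpoints/` and `images/`, so a missing download only surfaces once the model starts loading. The following is a minimal sketch of a pre-flight check, not part of the VTimeLLM repository. The helper name `missing_paths` is hypothetical, and the location `checkpoints/clip/ViT-L-14.pt` is an assumption inferred from the directory tree and the instruction to place weights in 'checkpoints'. The remaining paths are copied from the inference command.

```python
from pathlib import Path

# Paths used by the inference command in the README diff.
# The clip entry is an assumption based on the directory tree shown there.
REQUIRED_PATHS = [
    "checkpoints/clip/ViT-L-14.pt",
    "checkpoints/vicuna-7b-v1.5",
    "checkpoints/vtimellm-vicuna-v1-5-7b-stage1/mm_projector.bin",
    "checkpoints/vtimellm-vicuna-v1-5-7b-stage2",
    "checkpoints/vtimellm-vicuna-v1-5-7b-stage3",
    "images/demo.mp4",
]


def missing_paths(repo_root, required=REQUIRED_PATHS):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the VTimeLLM directory; prints nothing when everything is in place.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Running the script from the `VTimeLLM` directory before `python -m vtimellm.inference` lists whatever still needs to be downloaded or extracted. For the ChatGLM3-6b version the list would need different entries.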