# BLIP-3

## Paper

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

https://arxiv.org/pdf/2408.08872

## Model Architecture

BLIP-3, also known as xGen-MM, is a framework for developing large multimodal models (LMMs). The framework comprises carefully curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, extends the Salesforce xGen initiative on foundation AI models. The models undergo rigorous evaluation across a range of tasks, including single-image and multi-image benchmarks. The pre-trained base model shows strong in-context learning ability, and the instruction-tuned model delivers competitive performance among open-source LMMs of similar size. In addition, a DPO safety-tuned model is introduced, which aims to mitigate harmful behaviors such as hallucination and to improve safety.

![](https://developer.sourcefind.cn/codes/modelzoo/blip-3_pytorch/-/raw/master/assets/intro-1.png?inline=false)

## Environment Setup

### Docker (Method 1)

```
# The Docker image can be pulled from 光源 (SourceFind)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04.1-py3.10
# Create and start the container
docker run -it --network=host -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged=true --device=/dev/kfd --device=/dev/dri/ --ipc=host --group-add video --name <your_project_name> <image_id> bash
# Install dependencies
python setup.py install
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ 
```

### Dockerfile (Method 2)

```
docker build --no-cache -t blip3_pytorch:latest .
docker run -it --network=host --name=blip3_pytorch --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /opt/hyhal/:/opt/hyhal/:ro -v /usr/local/hyhal:/usr/local/hyhal:ro blip3_pytorch:latest bash
# Install dependencies
python setup.py install
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
```

### Anaconda (Method 3)

```
1. Create a conda virtual environment
conda create -n blip3_pytorch python=3.10
2. The DCU toolkits and deep-learning libraries required by this project can be downloaded from the 光合 developer community: https://developer.hpccube.com/tool/
DTK driver: dtk25.04.1
python: python3.10
torch: 2.4.1
```
Tips: the DTK, python, and torch versions listed above must match one another exactly.
```
3. Install the remaining libraries per requirements.txt
pip install -r requirements-training.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
python setup.py install
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
```
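
Whichever method you use, you can quickly confirm that torch sees the DCU devices. This is a minimal check, assuming the DTK build of PyTorch exposes the usual CUDA-compatible API (as ROCm-derived builds do):

```
import torch

# On DTK/ROCm-derived builds, DCU devices are reported through the
# CUDA-compatible API.
print(torch.__version__)            # expected: 2.4.1
print(torch.cuda.is_available())    # True if a DCU is visible
print(torch.cuda.device_count())    # number of visible DCUs
```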

# Training

## Dataset
The model supports LLaVA-format JSON dataset files, structured as follows:

```
JSON file:

{
  "id": "000000033471",
  "image": "coco/train2017/000000033471.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\nWhat are the colors of the bus in the image?"
    },
    {
      "from": "gpt",
      "value": "The bus in the image is white and red."
    },
    ...
  ]
}
```
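
Assuming, as in the standard LLaVA data format, that each JSON file holds a list of such records, a minimal Python sketch for inspecting one looks like this (the file path is a placeholder):

```
import json

# Placeholder path; point this at one of your annotation files.
with open("/path/to/som_qa_coco20k.json", "r", encoding="utf-8") as f:
    samples = json.load(f)  # a list of records shaped like the example above

for sample in samples[:3]:
    print(sample["id"], sample["image"])
    for turn in sample["conversations"]:
        # "human" turns may contain an <image> placeholder, which the
        # training pipeline replaces with the encoded image features.
        print(f"  [{turn['from']}] {turn['value']}")
```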

Next, configure the [`data/example_data_config.yaml`](./data_configs/example_data_config.yaml) file with the path of every JSON file and its image count; one YAML file can list several different datasets. If the paths inside your JSON files are relative, you also need to configure the path-mapping file [`data/data_paths.py`](./data/data_paths.py).
```
YAML file:

data_path: {
  '/path/to/blip_laion_cc_sbu_558k.json': 558128,
  '/path/to/som_qa_coco20k.json': 20160,
  '/path/to/som_listing_coco10k.json': 10000,
}
```
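
As a quick sanity check, a small sketch like the one below can confirm that every JSON file listed in the config exists and that the recorded counts match. The config path and the `yaml.safe_load`-based parsing are assumptions for illustration:

```
import json
import yaml  # pip install pyyaml

# Placeholder config path.
with open("data_configs/example_data_config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# cfg["data_path"] maps each JSON file to its expected image count.
for json_path, expected in cfg["data_path"].items():
    with open(json_path, "r", encoding="utf-8") as f:
        actual = len(json.load(f))
    status = "OK" if actual == expected else f"MISMATCH (found {actual})"
    print(f"{json_path}: expected {expected} -> {status}")
```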

Training in this project uses the SoM-LLaVA dataset, whose directory structure is shown below. It can be downloaded from Hugging Face via this [download link](https://api-inference.hf-mirror.com/datasets/zzxslp/SoM-LLaVA/tree/main) or with the down_dataset_hf.py script provided in this project; a minimal download sketch follows the directory tree.

```
/path/to/SoM-LLaVA/ 
     ├── som_listing_coco10k.json
     ├── som_llava_mix695k.json
     ├── som_qa_coco20k.json
     ├── som_train2017
     │   ├── 000000000001.jpg
     │   ├── 000000000009.jpg
     │   └── ...
```
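
If you would rather not use down_dataset_hf.py, a minimal alternative built on `huggingface_hub` might look like the sketch below; the repo id comes from the link above, while the mirror endpoint and local directory are assumptions:

```
import os

# Optionally route downloads through a mainland-China mirror (assumption);
# must be set before huggingface_hub is imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="zzxslp/SoM-LLaVA",
    repo_type="dataset",
    local_dir="/path/to/SoM-LLaVA",  # match the tree shown above
)
```
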
## Fine-tuning

### Pretrained weights
Run the following script to generate a native-PyTorch .pt file. Salesforce/xgen-mm-phi3-mini-base-r-v1.5, microsoft/Phi-3-mini-4k-instruct, and google/siglip-so400m-patch14-384 are downloaded automatically from Hugging Face; the script already points at a Hugging Face mirror for mainland China.

```
# Set dest_fn to the desired save path and .pt file name
python convert_hf_model.py
```
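
convert_hf_model.py is the project's own converter; as a rough illustration of what such a conversion involves, the sketch below loads the Hugging Face checkpoint and saves its weights as a native .pt file. The model class, the `trust_remote_code` load path, and the dest_fn value are assumptions, not the script's actual contents:

```
import torch
from transformers import AutoModelForVision2Seq

# dest_fn is the parameter mentioned above; this value is a placeholder.
dest_fn = "/path/to/xgen-mm-phi3-mini-base-r-v1.5.pt"

# Assumed load path: the xGen-MM checkpoints ship custom modeling code.
model = AutoModelForVision2Seq.from_pretrained(
    "Salesforce/xgen-mm-phi3-mini-base-r-v1.5",
    trust_remote_code=True,
)
torch.save(model.state_dict(), dest_fn)
```
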
### Single-node multi-GPU
```
bash scripts/example_finetune_xgenmmv1-phi3_mini_4k_instruct.sh
```
The training script parameters are described below:
* `exp_name`: name of the training log file
* `data_path`: path to the YAML data config
* `pretrained_ckpt`: path to the .pt file
* `--nproc_per_node=2`: number of GPUs used for multi-GPU training
* `--nnodes=1`: number of nodes
* `--master_port 9650`: master port
* `--lm_path`: path to the language model (LM); defaults to "microsoft/Phi-3-mini-4k-instruct"
* `--tokenizer_path`: path to the tokenizer used to process text data; defaults to "microsoft/Phi-3-mini-4k-instruct"
* `--vision_encoder_path`: vision encoder; defaults to "google/siglip-so400m-patch14-384"

## Results

### Application scenarios

### Algorithm category

Image-to-text

### Popular application industries

AIGC, design

## Source repository and issue feedback

- https://developer.sourcefind.cn/codes/modelzoo/blip-3_pytorch
## References
- https://github.com/salesforce/LAVIS/tree/xgen-mm