# Qwen2.5-VL
## Paper

[Qwen2.5-VL](https://qwenlm.github.io/zh/blog/qwen2.5-vl/)


## Model Overview
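Model architecture: Qwen2.5-VL keeps the serial ViT plus Qwen2 structure of the previous generation of Qwen-VL. All three model sizes use a ViT of roughly 600M parameters and accept images and video through a unified input, which lets the model fuse visual and language information more effectively and improves its understanding of multimodal data.

● Multimodal rotary position embedding (M-ROPE): the M-ROPE used by Qwen2.5-VL decomposes the rotary position embedding into three components: temporal and spatial (height and width). This lets the large language model capture and integrate the positional information of 1D text, 2D images, and 3D video at the same time, which gives the model strong multimodal processing and reasoning ability.

● Simplified network structure: compared with Qwen2-VL, Qwen2.5-VL improves the model's perception of temporal and spatial scales and further simplifies the network structure to improve efficiency.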
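
To make the M-ROPE description above more concrete, here is a minimal sketch of how three-component position IDs could be assigned to a mixed text and vision sequence. It follows, in simplified form, the scheme described for Qwen2-VL: text tokens share one index across all three components, vision tokens are indexed by frame, patch row, and patch column, and each segment starts one past the largest ID of the previous segment. The function names and the segment tuple format are illustrative assumptions, not the model's actual code, and any video-specific temporal scaling in Qwen2.5-VL is not modeled.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int, int]  # (temporal, height, width)


def text_positions(start: int, length: int) -> List[Position]:
    """Text tokens: all three components carry the same 1D index."""
    return [(start + i, start + i, start + i) for i in range(length)]


def vision_positions(start: int, frames: int, rows: int, cols: int) -> List[Position]:
    """Vision tokens: one index per frame, patch row, and patch column."""
    return [
        (start + t, start + r, start + c)
        for t in range(frames)
        for r in range(rows)
        for c in range(cols)
    ]


def build_positions(segments: Sequence[tuple]) -> List[Position]:
    """Concatenate segments such as ("text", 2) or ("vision", frames, rows, cols)."""
    positions: List[Position] = []
    next_start = 0
    for kind, *shape in segments:
        if kind == "text":
            chunk = text_positions(next_start, *shape)
        else:
            chunk = vision_positions(next_start, *shape)
        if chunk:
            # The next segment starts one past the largest ID used so far.
            next_start = max(max(p) for p in chunk) + 1
        positions.extend(chunk)
    return positions


if __name__ == "__main__":
    # Two text tokens, one image of 2x2 patches, then one more text token.
    for pos in build_positions([("text", 2), ("vision", 1, 2, 2), ("text", 1)]):
        print(pos)
```

In this sketch the four image patches share the temporal index but differ in their height and width indices, while the text tokens before and after the image behave like ordinary 1D positions.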



<div align=center>
    <img src="./images/arch.png"/>
</div>

### Environment Dependencies
|   Software   |                    Version                     |
| :----------: | :--------------------------------------------: |
|     DTK      |                    25.04.2                     |
|    python    |                    3.10.12                     |
| transformers |                     4.57.3                     |
|     vllm     |   0.9.2+das.opt2.dtk25042.20251225.g1663f34c   |
|    torch     |            2.5.1+das.opt1.dtk25042             |
|    triton    |   3.1.0+das.opt1.dtk25042.20251224.gaa867475   |
|  flash_attn  |   2.6.1+das.opt1.dtk2504.20251222.g859b5024    |
|  flash_mla   | 1.0.0+das.opt1.dtk2604.20251218.gb14bad68.rccl |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226

- Adjust the `-v` mount paths to match where your model and data are actually stored.

```bash
docker run -it \
    --shm-size 60g \
    --network=host \
    --name qwen3 \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226  bash
```

More images are available for download from [光源](https://sourcefind.cn/#/service-list).

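The special deep learning libraries that this project needs on DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.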

```
pip install -r requirements.txt
```
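
After installation, it can help to confirm that the installed packages line up with the version table above. The following is a minimal sketch using only the standard library. It compares base versions and ignores local build suffixes such as `+das.opt1.dtk25042`. The `EXPECTED` mapping and the helper names are illustrative assumptions, and only packages whose distribution names are unambiguous are listed.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Base versions taken from the table above; local build suffixes are ignored.
EXPECTED: Dict[str, str] = {
    "transformers": "4.57.3",
    "vllm": "0.9.2",
    "torch": "2.5.1",
    "triton": "3.1.0",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+das.opt1.dtk25042'."""
    return version.split("+", 1)[0]


def check_versions(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed or None, matches)."""
    report = {}
    for name, want in expected.items():
        try:
            found: Optional[str] = base_version(lookup(name))
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (want, found, found == want)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} [{status}]")
```

A mismatch is not necessarily an error, but it is a useful first thing to check when behavior differs from what this README describes.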

## Dataset

LLaMA-Factory ships with test datasets. This project uses the mllm_demo, identity, and mllm_video_demo datasets, which are already included in the data directory.

The training data directory is structured as follows. Prepare the full dataset for regular training with the same layout:

```
 ── data
    ├── mllm_demo.json
    ├── identity.json
    ├── mllm_video_demo.json
    └── ...

```

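If you are using a custom dataset, prepare it as follows.
Organize the data into a single JSON file and place it in the data folder. LLaMA-Factory supports multimodal datasets in the sharegpt format, which should follow this structure: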

```
[
  {
    "messages": [
      {
        "content": "<image>Who are they?",
        "role": "user"
      },
      {
        "content": "They're Kane and Gretzka from Bayern Munich.",
        "role": "assistant"
      },
      {
        "content": "What are they doing?",
        "role": "user"
      },
      {
        "content": "They are celebrating on the soccer field.",
        "role": "assistant"
      }
    ],
    "images": [
      "mllm_demo_data/1.jpg"
    ]
  },
  {
    "messages": [
      {
        "content": "<image>Who is he?",
        "role": "user"
      },
      {
        "content": "He's Thomas Muller from Bayern Munich.",
        "role": "assistant"
      },
      {
        "content": "Why is he on the ground?",
        "role": "user"
      },
      {
        "content": "Because he's sliding on his knees to celebrate.",
        "role": "assistant"
      }
    ],
    "images": [
      "mllm_demo_data/2.jpg"
    ]
  }
]
```
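
Before training on a custom file, a quick structural check can catch common mistakes. The sketch below is a hypothetical helper, not part of LLaMA-Factory. It assumes the conventions visible in the example above: turns alternate between `user` and `assistant`, and each `<image>` placeholder corresponds to one entry in `images`.

```python
from typing import Any, Dict, List


def validate_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one sharegpt-style sample."""
    problems: List[str] = []
    messages = sample.get("messages", [])
    images = sample.get("images", [])

    if not messages:
        problems.append("no messages")

    # Turns are expected to alternate user, assistant, user, ...
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected_role:
            problems.append(f"message {index}: expected role {expected_role!r}")

    # Each <image> placeholder should correspond to one entry in "images".
    placeholders = sum(m.get("content", "").count("<image>") for m in messages)
    if placeholders != len(images):
        problems.append(f"{placeholders} <image> placeholders but {len(images)} images")
    return problems


def validate_dataset(samples: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map sample index to its problems; an empty dict means nothing was flagged."""
    report = {}
    for index, sample in enumerate(samples):
        problems = validate_sample(sample)
        if problems:
            report[index] = problems
    return report
```

To use it, load the dataset file with `json.load` and pass the resulting list to `validate_dataset`. An empty result means none of these checks were triggered.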

Provide your dataset definition in data/dataset_info.json in the following format.
For a sharegpt-format dataset, the entry in dataset_info.json should include these columns:

```
   "dataset_name": {
       "file_name": "dataset_name.json",
       "formatting": "sharegpt",
       "columns": {
          "messages": "messages",
          "images": "images"
        },
      "tags": {
         "role_tag": "role",
         "content_tag": "content",
         "user_tag": "user",
         "assistant_tag": "assistant"
        }
   }

```
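
The entry above can also be added programmatically. This is a minimal sketch assuming `dataset_info.json` is a single JSON object keyed by dataset name, as the fragment above suggests. The function name and its arguments are hypothetical.

```python
import json
from pathlib import Path


def register_sharegpt_dataset(info_path: str, dataset_name: str, file_name: str) -> dict:
    """Add or replace one sharegpt-format entry in dataset_info.json."""
    path = Path(info_path)
    # Start from the existing definitions so other datasets are preserved.
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[dataset_name] = {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }
    path.write_text(json.dumps(info, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return info[dataset_name]
```

For example, `register_sharegpt_dataset("data/dataset_info.json", "dataset_name", "dataset_name.json")` would write the same entry shown above while leaving the other entries in the file untouched.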

## Training

Fine-tune with the LLaMA-Factory framework.

```
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

cd LLaMA-Factory

mkdir saves

mkdir cache

pip install -e ".[torch,metrics]"
```

### Single node, single GPU

```
torchrun ./LLaMA-Factory/src/train.py  \
    --deepspeed  ./LLaMA-Factory/examples/deepspeed/ds_z3_config.json \
    --stage sft \
    --trust_remote_code True \
    --do_train True \
    --model_name_or_path ./Qwen2.5-VL/Qwen2.5-VL-7B-Instruct/ \
    --dataset_dir ./LLaMA-Factory/data \
    --dataset mllm_demo \
    --template qwen2_vl \
    --finetuning_type lora \
    --lora_rank 64 \
    --lora_alpha 64 \
    --resize_vocab True \
    --optim adamw_torch \
    --lora_target all \
    --output_dir ./LLaMA-Factory/saves \
    --overwrite_cache \
    --overwrite_output_dir True \
    --cache_dir ./LLaMA-Factory/cache \
    --warmup_steps 100 \
    --max_grad_norm 1.0 \
    --max_samples 1000 \
    --weight_decay 0.1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --ddp_timeout 120000000 \
    --learning_rate 1.0e-4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --cutoff_len 4096 \
    --save_steps 500 \
    --eval_steps 100 \
    --val_size 0.1 \
    --eval_strategy steps \
    --load_best_model_at_end True \
    --plot_loss True \
    --num_train_epochs 50 \
    --bf16
```
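
The interaction between the batch and step flags in the command above is easier to see with a little arithmetic. The sketch below estimates the number of optimizer steps, assuming one device, a dataset with at least `--max_samples` samples, and a validation split taken as a fraction of those samples. The function is illustrative only, and the rounding used by the actual trainer may differ slightly.

```python
import math


def training_schedule(
    num_samples: int,
    val_size: float,
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int,
    num_epochs: int,
) -> dict:
    """Estimate optimizer steps from the flags in the command above."""
    eval_samples = round(num_samples * val_size)
    train_samples = num_samples - eval_samples
    # One optimizer step consumes this many samples.
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    return {
        "train_samples": train_samples,
        "eval_samples": eval_samples,
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }


if __name__ == "__main__":
    # Values from the command: max_samples=1000, val_size=0.1, batch size 1,
    # gradient accumulation 2, one device, 50 epochs.
    print(training_schedule(1000, 0.1, 1, 2, 1, 50))
```

Comparing `total_steps` with `--warmup_steps 100`, `--eval_steps 100`, and `--save_steps 500` shows whether warmup, evaluation, and checkpointing will actually trigger during the run. If the dataset has fewer samples than `--max_samples`, pass its real size instead.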

## Inference

### transformers

#### Single node, single GPU

```
python inference.py
```

#### Single node, multiple GPUs

```
CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
```

### vllm

#### Single-GPU inference

```
# vLLM inference requires the qwen_vl_utils library; if it is not installed, run:
pip install qwen_vl_utils

# For the 3B/7B models
# Launch command
vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
    --trust-remote-code \
    --max-model-len 32768 \
    --served-model-name qwen-vl \
    --dtype bfloat16 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9

# Client request
curl http://localhost:8000/v1/chat/completions   \
    -H "Content-Type: application/json"  \
    -d '{
        "model": "qwen-vl",
        "messages": [
            {
                "role": "user",
                "content": "What are the three laws of motion proposed by Newton? Please explain them briefly."
            }
        ]
    }'


# For the 72B model
# Launch command
vllm serve Qwen/Qwen2.5-VL-72B-Instruct \
  --served-model-name "qwen-vl" \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --enforce-eager \
  --trust-remote-code \
  --port 8000

# Client request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "qwen-vl",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
            }
          },
          {
            "type": "text",
            "text": "Describe the content of this image."
          }
        ]
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.8
  }'
```
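
The same requests can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000` and serves the model under the name `qwen-vl`. The helper names `build_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request
from typing import Optional


def build_payload(
    prompt: str,
    image_url: Optional[str] = None,
    model: str = "qwen-vl",
    max_tokens: int = 512,
) -> dict:
    """Build an OpenAI-style chat request with an optional image part."""
    content = []
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(build_payload("Describe the content of this image.", image_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"))` mirrors the second curl request above.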

### Results

<div align=center>
    <img src="./images/result1.png"/>
</div>


### Accuracy

Accuracy on DCU matches that on GPU. Inference frameworks: transformers, vllm.

## Pretrained Weights

|          **Model**          | **Parameters** |    **DCU model**     | **Minimum cards** |                         **Download**                         |
| :-------------------------: | :------------: | :------------------: | :---------------: | :----------------------------------------------------------: |
| **Qwen2.5-VL-3B-Instruct**  |       3B       | K100AI, BW1000, etc. |         1         | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| **Qwen2.5-VL-7B-Instruct**  |       7B       | K100AI, BW1000, etc. |         1         | [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) |
| **Qwen2.5-VL-72B-Instruct** |      72B       | K100AI, BW1000, etc. |         4         | [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) |
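
A rough estimate of weight memory can help when choosing `--tensor-parallel-size`. The sketch below counts only the model weights in bfloat16 (2 bytes per parameter) and uses the nominal parameter counts from the table. It ignores activations, the KV cache, and framework overhead, and it makes no assumption about the memory capacity of any particular DCU.

```python
def weight_memory_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes); bfloat16 uses 2 bytes."""
    return num_params_billion * bytes_per_param


def per_card_gb(num_params_billion: float, num_cards: int, bytes_per_param: int = 2) -> float:
    """Weight memory per card when the weights are split evenly across cards."""
    return weight_memory_gb(num_params_billion, bytes_per_param) / num_cards


if __name__ == "__main__":
    for name, params, cards in [("3B", 3, 1), ("7B", 7, 1), ("72B", 72, 4)]:
        print(f"{name}: {weight_memory_gb(params)} GB total, {per_card_gb(params, cards)} GB per card")
```

The real requirement is higher than these figures, since `--max-model-len` and `--gpu-memory-utilization` determine how much additional memory is reserved for the KV cache.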

## Source Repository and Issue Feedback

- https://developer.sourcefind.cn/codes/modelzoo/Qwen2.5-vl_pytorch
## References

- https://qwenlm.github.io/zh/blog/qwen2.5-vl/
- https://github.com/QwenLM/Qwen2.5-VL