Commit 1ad6cb11 authored by raojy

Docs: update config instructions for vLLM compatibility

parent bd1df840
...@@ -232,8 +232,6 @@ torchrun ./LLaMA-Factory/src/train.py \
### Single node, single GPU
```
python inference.py
```
...@@ -244,18 +242,63 @@ python inference.py
CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
```
### vLLM
If you hit a `ValidationError` or `KeyError` configuration error when starting the server, it is usually because the current vLLM version is not yet fully compatible with the configuration fields used by the newer model.

**Fix:** manually edit the `config.json` file in the model directory and rename the `type` field inside the `rope_scaling` section to `rope_type`.

Before the fix:
<div align=center>
<img src="./images/before_fix.png"/>
</div>
After the fix:
<div align=center>
<img src="./images/after_fix.png"/>
</div>
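If you would rather script the rename than edit `config.json` by hand, a short standalone Python helper can do it. This is a sketch, not part of the repo; the function name `fix_rope_scaling` and the example path are illustrative only.

```python
import json
from pathlib import Path


def fix_rope_scaling(config_path: Path) -> bool:
    """Rename `type` to `rope_type` inside the rope_scaling section.

    Returns True if the file was modified, False if nothing needed fixing.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    rope = config.get("rope_scaling")
    if rope is None or "type" not in rope:
        return False
    rope["rope_type"] = rope.pop("type")
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return True


if __name__ == "__main__":
    # Hypothetical local model directory; adjust to wherever the weights live.
    fix_rope_scaling(Path("Qwen/Qwen2.5-VL-3B-Instruct") / "config.json")
```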
#### Single-node inference
```
## Start the server
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
# Launch command
vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
  --trust-remote-code \
  --max-model-len 32768 \
  --served-model-name qwen-vl \
  --dtype bfloat16 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9

## Query from a client
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-vl",
    "messages": [
      {
        "role": "user",
        "content": "Which three laws of motion did Newton propose? Briefly explain them."
      }
    ]
  }'
```
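The curl request above can also be issued from Python using only the standard library. This is a minimal sketch against vLLM's OpenAI-compatible `/v1/chat/completions` endpoint; the `chat` helper is illustrative, and the `qwen-vl` model name must match whatever was passed to `--served-model-name`.

```python
import json
from urllib.request import Request, urlopen


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send one chat-completion request and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=payload,  # a body makes urllib issue a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("http://localhost:8000", "qwen-vl",
               "Which three laws of motion did Newton propose?"))
```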
### Example output
![result1](./images/result1.png)
### Accuracy
DCU accuracy matches GPU accuracy; inference framework: vLLM.
...@@ -264,6 +307,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
`Conversational Q&A`
### Key application industries
`Research, Education, Government, Finance`
## Pretrained weights
[ModelScope](https://modelscope.cn/)
- [Qwen2.5-VL-3B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)
...@@ -279,4 +323,3 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
- https://qwenlm.github.io/zh/blog/qwen2.5-vl/
- https://github.com/QwenLM/Qwen2.5-VL