ModelZoo / Qwen3_vllm · Commits

Commit 18a90f1e ("update readme"), authored May 07, 2025 by zhuwenwen
Parent: ea93e725
Showing 1 changed file with 31 additions and 16 deletions.

README.md (+31, -16)
@@ -41,7 +41,6 @@ docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.4-ubuntu22.04
docker run -it --name qwen3_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
### Dockerfile (Method 2)
```
...
...
@@ -51,7 +50,36 @@ docker build -t qwen3:latest .
docker run -it --name qwen3_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> qwen3:latest /bin/bash
```
### Anaconda (Method 3)
```
conda create -n qwen3_vllm python=3.10
```
The specialized deep-learning libraries this project requires for DCU GPUs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
* DTK driver: dtk25.04
* PyTorch: 2.4.0
* triton: 3.0.0
* lmslim: 0.2.1
* flash_attn: 2.6.1
* flash_mla: 1.0.0
* vllm: 0.8.4
* python: 3.10
`Tip: install the related dependencies first, then install the vllm package last.`
Environment variables:
```
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
```
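The NUMA-binding variables above follow a simple one-rank-per-node pattern: tensor-parallel rank `i` is pinned to NUMA node `i`. A small illustrative Python sketch (not part of this repository) that generates the same export lines:

```python
# Generate the per-rank NUMA binding exports shown above.
# Each vLLM tensor-parallel rank i is pinned to NUMA node i (1:1 mapping).

def numa_bind_exports(num_ranks: int = 8) -> list[str]:
    lines = [
        "export ALLREDUCE_STREAM_WITH_COMPUTE=1",
        "export VLLM_NUMA_BIND=1",
    ]
    lines += [f"export VLLM_RANK{i}_NUMA={i}" for i in range(num_ranks)]
    return lines

if __name__ == "__main__":
    print("\n".join(numa_bind_exports()))
```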
## Dataset
...
...
@@ -79,7 +107,7 @@ docker run -it --name qwen3_vllm --privileged --shm-size=64G --device=/dev/kfd
python examples/offline_inference.py
```
Here, `prompts` is the list of prompts; `temperature` controls sampling randomness: lower values make generation more deterministic, higher values more random, and 0 means greedy sampling (default 1); `max_tokens=16` is the generation length (default 16); `model` is the model path; `tensor_parallel_size=1` is the number of GPUs used (default 1); and `dtype="float16"` is the inference data type.
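The effect of `temperature` can be illustrated with a toy sampler (a hypothetical sketch, not part of `examples/offline_inference.py`): dividing logits by the temperature before the softmax sharpens the distribution as the temperature shrinks, and temperature 0 degenerates to a greedy argmax.

```python
import math
import random

def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from logits, mimicking temperature sampling."""
    if temperature == 0:
        # Greedy sampling: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by 1/temperature, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the probabilities.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]
    print(sample_token(logits, 0, random.Random(0)))  # greedy: index 0
```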
### Offline Batch Inference Performance Test
...
...
@@ -128,19 +156,6 @@ vllm serve /your/model/path --enforce-eager --dtype float16 --trust-remote-code
Here, the path after `serve` is the model path to load, and `--dtype` sets the data type (float16). By default, the predefined chat template from the tokenizer is used.
### Using the OpenAI Completions API with vllm
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/your/model/path",
    "prompt": "What is deep learning?",
    "max_tokens": 7,
    "temperature": 0
  }'
```
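The same Completions request can be assembled in Python; a minimal sketch using only the standard library (the endpoint and model path are the placeholders from the curl example, and actually sending the request is left to the caller):

```python
import json
import urllib.request

def build_completion_request(
    url: str = "http://localhost:8000/v1/completions",
) -> urllib.request.Request:
    """Build (but do not send) the same request as the curl example above."""
    payload = {
        "model": "/your/model/path",
        "prompt": "What is deep learning?",
        "max_tokens": 7,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it: urllib.request.urlopen(build_completion_request())
```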
### Using the OpenAI Chat API with vllm
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/your/model/path",
    "max_tokens": 128,
    "messages": [
      {
        "role": "user",
...
```