ModelZoo / Qwen3-30B-A3B_vllm / Commits

Commit 351c6b85: Qwen3-30B-A3B-Thinking_offline
Authored Aug 08, 2025 by zhangwq5
Parent: 7fb8ad80

Showing 5 changed files with 112 additions and 24 deletions (+112, -24):

README.md (+67, -3)
infer/offline/Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json (+12, -0)
infer/offline/Qwen3-30B-A3B-Thinking-2507_logprobs_K100AI_fp16.json (+12, -0)
infer/offline/acc.py (+20, -20)
infer/offline/infer_vllm.py (+1, -1)
README.md (view file @ 351c6b85)
...
...
@@ -7,9 +7,8 @@
 - https://arxiv.org/abs/2501.15383
 ## Model Architecture
-Qwen3-30B-A3B (Qwen/Qwen3-30B-A3B-Instruct-2507) shows significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
-Substantial growth in long-tail knowledge coverage across many languages.
-Markedly better alignment with user preferences on subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation.
+Qwen3-30B-A3B and Qwen3-30B-A3B-Instruct-2507 show significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
+Substantial growth in long-tail knowledge coverage across many languages. Markedly better alignment with user preferences on subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation.
+Enhanced 256K long-context understanding.
 <div align=center>
...
...
@@ -158,6 +157,71 @@ Mean absolute error between Qwen3-30B-A3B-Instruct-2507 offline inference on DCU (K100_AI) and GPU (A800)
```
DCU (K100_AI) and GPU (A800) offline inference of Qwen3-30B-A3B-Instruct-2507 match in accuracy; inference framework: vllm
### vllm offline inference: Qwen3-30B-A3B-Thinking-2507
```bash
## Qwen3-30B-A3B-Thinking-2507 requires at least two cards for inference deployment
export HIP_VISIBLE_DEVICES=6,7
## model path argument
python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Thinking-2507 --tensor-parallel-size 2
```
## result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'嗯,用户让我介绍一下北京。首先得确定用户的需求是什么。
......
......
......'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 106287 ('嗯')
- Top Logprobs:
- Rank 1: Token 106287 ('嗯') -> Logprob: -0.0134
- Rank 2: Token 32313 ('Okay') -> Logprob: -4.3884
- Rank 3: Token 99692 ('好的') -> Logprob: -7.0134
- Rank 4: Token 80022 ('Hmm') -> Logprob: -11.3884
- Rank 5: Token 110115 ('好吧') -> Logprob: -11.6384
- Rank 6: Token 11395 ('Well') -> Logprob: -13.0134
- Rank 7: Token 52801 ('好') -> Logprob: -13.0134
- Rank 8: Token 101140 ('首先') -> Logprob: -13.3884
- Rank 9: Token 71486 ('Alright') -> Logprob: -13.5134
- Rank 10: Token 2461 ('For') -> Logprob: -14.0134
...
...
Successfully wrote each generated token's logprob to file: ...
```
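For illustration, the ranked "Top Logprobs" listing above can be reproduced from a vLLM-style per-step logprob mapping. The sketch below mocks that mapping with plain tuples rather than a live model run; the token ids, strings, and logprob values are copied from the sample output above, and `format_top_logprobs` is a hypothetical helper, not part of the repo's scripts.

```python
# Mocked stand-in for one generation step's logprob dict: token id -> (decoded
# token, logprob). In real vLLM output this comes from the model; here the
# three entries are copied from the sample listing above.
step_logprobs = {
    106287: ("嗯", -0.0134),
    32313: ("Okay", -4.3884),
    99692: ("好的", -7.0134),
}

def format_top_logprobs(logprobs: dict) -> list[str]:
    """Sort candidates by descending logprob and render one line per rank."""
    ranked = sorted(logprobs.items(), key=lambda kv: kv[1][1], reverse=True)
    return [
        f"- Rank {rank}: Token {tok_id} ({text!r}) -> Logprob: {lp:.4f}"
        for rank, (tok_id, (text, lp)) in enumerate(ranked, start=1)
    ]

for line in format_top_logprobs(step_logprobs):
    print(line)
```

With the three mocked entries this prints the same "Rank 1 ... Rank 3" lines as the sample, starting with the top candidate '嗯'.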
### Accuracy
```
# Run infer_vllm.py on DCU and on GPU to obtain each device's accuracy data, paste that data into acc.py, then run it
python ./infer/offline/acc.py
```
Results
```
Mean absolute error between DCU (K100_AI) and GPU (A800) offline inference of Qwen3-30B-A3B-Thinking-2507: 0.01841533068222816
```
DCU (K100_AI) and GPU (A800) offline inference of Qwen3-30B-A3B-Thinking-2507 match in accuracy; inference framework: vllm
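The reported mean absolute error can be reproduced directly from the two JSON files added in this commit. The sketch below inlines their ten values instead of reading the files from disk, but otherwise follows the same NumPy computation as acc.py.

```python
import numpy as np

# Ten per-token logprobs from the K100AI JSON file in this commit.
k100ai = np.array([
    -0.013442831113934517, -8.987976616481319e-05, -2.062299427052494e-05,
    -0.14825429022312164, -0.16062740981578827, -9.059865078597795e-06,
    -0.023248476907610893, -0.717088520526886, -0.47542446851730347,
    -0.07681393623352051,
])

# Ten per-token logprobs from the A800 JSON file in this commit.
a800 = np.array([
    -0.011681370437145233, -8.582700684200972e-05, -1.9073304429184645e-05,
    -0.1841658502817154, -0.16056427359580994, -6.556489552167477e-06,
    -0.01815206930041313, -0.5805881023406982, -0.47540760040283203,
    -0.0720185860991478,
])

# Mean absolute error across the ten positions.
mae = np.mean(np.abs(k100ai - a800))
print(mae)  # ≈ 0.01841533068222816, the value reported above
```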
### vllm online inference: Qwen3-30B-A3B
```bash
## Qwen3-30B-A3B requires at least two cards for deployment
...
...
infer/offline/Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json (new file, 0 → 100644, view file @ 351c6b85)
[
  -0.011681370437145233,
  -8.582700684200972e-05,
  -1.9073304429184645e-05,
  -0.1841658502817154,
  -0.16056427359580994,
  -6.556489552167477e-06,
  -0.01815206930041313,
  -0.5805881023406982,
  -0.47540760040283203,
  -0.0720185860991478
]
\ No newline at end of file
infer/offline/Qwen3-30B-A3B-Thinking-2507_logprobs_K100AI_fp16.json (new file, 0 → 100644, view file @ 351c6b85)
[
  -0.013442831113934517,
  -8.987976616481319e-05,
  -2.062299427052494e-05,
  -0.14825429022312164,
  -0.16062740981578827,
  -9.059865078597795e-06,
  -0.023248476907610893,
  -0.717088520526886,
  -0.47542446851730347,
  -0.07681393623352051
]
\ No newline at end of file
infer/offline/acc.py (view file @ 351c6b85)
 import numpy as np
 logprobs_1 = np.array([
-    -0.002492894185706973,
-    -0.20206475257873535,
-    -0.14872165024280548,
-    -3.6954811548639555e-06,
-    0.0,
-    -2.3841855067985307e-07,
-    -0.038103267550468445,
-    -0.0006967739318497479,
-    -6.0794889577664435e-05,
-    -3.099436753473128e-06,
+    -0.013442831113934517,
+    -8.987976616481319e-05,
+    -2.062299427052494e-05,
+    -0.14825429022312164,
+    -0.16062740981578827,
+    -9.059865078597795e-06,
+    -0.023248476907610893,
+    -0.717088520526886,
+    -0.47542446851730347,
+    -0.07681393623352051,
 ])
 logprobs_2 = np.array([
-    -0.001943962648510933,
-    -0.25255143642425537,
-    -0.1344442367553711,
-    -2.9802276912960224e-06,
-    0.0,
-    -2.3841855067985307e-07,
-    -0.03809638321399689,
-    -0.0007833749987185001,
-    -7.64102369430475e-05,
-    -4.0531076592742465e-06,
+    -0.011681370437145233,
+    -8.582700684200972e-05,
+    -1.9073304429184645e-05,
+    -0.1841658502817154,
+    -0.16056427359580994,
+    -6.556489552167477e-06,
+    -0.01815206930041313,
+    -0.5805881023406982,
+    -0.47540760040283203,
+    -0.0720185860991478,
 ])
 print(np.mean(np.abs(logprobs_1 - logprobs_2)))
\ No newline at end of file
infer/offline/infer_vllm.py (view file @ 351c6b85)
...
...
@@ -113,7 +113,7 @@ def main(args: dict):
             first_10_logprobs_to_save.append(logprob_value)
-    output_filename = './Qwen3-30B-A3B_logprobs_K100AI_fp16.json'
+    output_filename = './Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json'
     with open(output_filename, 'w') as f:
         json.dump(first_10_logprobs_to_save, f, indent=2)
...
...
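Only the output filename changes in this hunk; the write-out pattern itself can be exercised in isolation. In the sketch below the filename and the logprob values are placeholders, not the script's real data, and the file is written under the system temp directory so the example is self-contained.

```python
import json
import os
import tempfile

# Placeholder stand-in for first_10_logprobs_to_save; values are illustrative.
first_10_logprobs_to_save = [-0.0134, -0.0002, -0.1841]

# Same pattern as the hunk above: dump the list as indented JSON.
output_filename = os.path.join(tempfile.gettempdir(), "example_logprobs_fp16.json")
with open(output_filename, "w") as f:
    json.dump(first_10_logprobs_to_save, f, indent=2)

# Round trip: json preserves these float values exactly.
with open(output_filename) as f:
    loaded = json.load(f)
print(loaded == first_10_logprobs_to_save)  # True
```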