Commit 351c6b85 authored by zhangwq5

Qwen3-30B-A3B-Thinking_offline

parent 7fb8ad80
@@ -7,9 +7,8 @@
- https://arxiv.org/abs/2501.15383
## Model Structure
Qwen3-30B-A3B and Qwen3-30B-A3B-Instruct-2507 show significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
Substantially broader coverage of long-tail knowledge across many languages. Markedly better alignment with user preferences on subjective and open-ended tasks, yielding more helpful responses and higher-quality text generation.
Enhanced 256K long-context understanding.
<div align=center>
@@ -158,6 +157,71 @@ Mean absolute error of Qwen3-30B-A3B-Instruct-2507 offline inference between DCU (K100_AI) and GPU (A800)
```
DCU (K100_AI) and GPU (A800) offline inference of Qwen3-30B-A3B-Instruct-2507 agree in accuracy; inference framework: vllm
### Offline inference of Qwen3-30B-A3B-Thinking-2507 with vllm
```bash
## Qwen3-30B-A3B-Thinking-2507 requires at least two cards for inference deployment
export HIP_VISIBLE_DEVICES=6,7
## model path argument
python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Thinking-2507 --tensor-parallel-size 2
```
### Result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'嗯,用户让我介绍一下北京。首先得确定用户的需求是什么。
......
......
......'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 106287 ('嗯')
- Top Logprobs:
- Rank 1: Token 106287 ('嗯') -> Logprob: -0.0134
- Rank 2: Token 32313 ('Okay') -> Logprob: -4.3884
- Rank 3: Token 99692 ('好的') -> Logprob: -7.0134
- Rank 4: Token 80022 ('Hmm') -> Logprob: -11.3884
- Rank 5: Token 110115 ('好吧') -> Logprob: -11.6384
- Rank 6: Token 11395 ('Well') -> Logprob: -13.0134
- Rank 7: Token 52801 ('好') -> Logprob: -13.0134
- Rank 8: Token 101140 ('首先') -> Logprob: -13.3884
- Rank 9: Token 71486 ('Alright') -> Logprob: -13.5134
- Rank 10: Token 2461 ('For') -> Logprob: -14.0134
...
...
Successfully wrote the logprob of each generated token to file: ...
```
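As a quick sanity check on the dump above: each logprob is the natural log of a token probability, so exponentiating recovers how much probability mass each candidate token carries. A minimal sketch using the step-0 values copied from the sample output (illustrative only):

```python
import math

# Top-3 logprobs for step 0, copied from the sample output above
top_logprobs = {
    "嗯": -0.0134,
    "Okay": -4.3884,
    "好的": -7.0134,
}

# logprob = ln(p), so exp() recovers each token's probability
probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
print(round(probs["嗯"], 4))  # 0.9867: rank 1 carries almost all the mass
```

This is why a small mean absolute error over logprobs indicates the two devices sample from nearly identical distributions.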
### Accuracy
```
# Run infer_vllm.py on the DCU and on the GPU separately to obtain each side's accuracy data, then copy that data into acc.py and run it
python ./infer/offline/acc.py
```
Result
```
Mean absolute error of Qwen3-30B-A3B-Thinking-2507 offline inference between DCU (K100_AI) and GPU (A800): 0.01841533068222816
```
DCU (K100_AI) and GPU (A800) offline inference of Qwen3-30B-A3B-Thinking-2507 agree in accuracy; inference framework: vllm
### Online inference of Qwen3-30B-A3B with vllm
```bash
## Qwen3-30B-A3B requires at least two cards for deployment
......
[
-0.011681370437145233,
-8.582700684200972e-05,
-1.9073304429184645e-05,
-0.1841658502817154,
-0.16056427359580994,
-6.556489552167477e-06,
-0.01815206930041313,
-0.5805881023406982,
-0.47540760040283203,
-0.0720185860991478
]
[
-0.013442831113934517,
-8.987976616481319e-05,
-2.062299427052494e-05,
-0.14825429022312164,
-0.16062740981578827,
-9.059865078597795e-06,
-0.023248476907610893,
-0.717088520526886,
-0.47542446851730347,
-0.07681393623352051
]
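Instead of copy-pasting the two dumps into acc.py by hand, the JSON files can also be compared directly. A minimal sketch (the function name here is an illustration, not part of the repo; only the A800 filename appears in infer_vllm.py, the K100_AI filename is an assumption):

```python
import json

import numpy as np


def mae_from_dumps(path_a: str, path_b: str) -> float:
    """Mean absolute error between two logprob JSON dumps."""
    with open(path_a) as f:
        a = np.array(json.load(f))
    with open(path_b) as f:
        b = np.array(json.load(f))
    return float(np.mean(np.abs(a - b)))


# Example usage (the K100_AI filename is hypothetical):
# print(mae_from_dumps('./Qwen3-30B-A3B-Thinking-2507_logprobs_K100AI_fp16.json',
#                      './Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json'))
```

Applied to the two ten-value lists above, this reproduces the reported mean absolute error of 0.01841533068222816; MAE is symmetric, so the order of the two files does not matter.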
import numpy as np

# First-token logprobs dumped by infer_vllm.py on each device,
# copy-pasted from the two JSON files
logprobs_1 = np.array([
    -0.013442831113934517,
    -8.987976616481319e-05,
    -2.062299427052494e-05,
    -0.14825429022312164,
    -0.16062740981578827,
    -9.059865078597795e-06,
    -0.023248476907610893,
    -0.717088520526886,
    -0.47542446851730347,
    -0.07681393623352051,
])
logprobs_2 = np.array([
    -0.011681370437145233,
    -8.582700684200972e-05,
    -1.9073304429184645e-05,
    -0.1841658502817154,
    -0.16056427359580994,
    -6.556489552167477e-06,
    -0.01815206930041313,
    -0.5805881023406982,
    -0.47540760040283203,
    -0.0720185860991478,
])
# Mean absolute error between the two runs
print(np.mean(np.abs(logprobs_1 - logprobs_2)))
@@ -113,7 +113,7 @@ def main(args: dict):
first_10_logprobs_to_save.append(logprob_value)
output_filename = './Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json'
with open(output_filename, 'w') as f:
json.dump(first_10_logprobs_to_save, f, indent=2)
......