Update precise.md

19b55b09 · chenzk · 8b91cbca · 19b55b09
Commit 19b55b09 authored Sep 17, 2025 by chenzk

Showing with 35 additions and 19 deletions

README.md README.md +35 -19
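
The accuracy sections touched by this diff report a mean absolute error between per-token logprobs collected on the DCU and on the GPU. How `acc.py` computes that value is not shown here, so the following is only a minimal sketch of such a comparison. The function name and the sample values are hypothetical and are not results from this repository.

```python
from typing import Sequence


def mean_absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length logprob sequences."""
    if len(a) != len(b):
        raise ValueError("both runs must contain the same number of tokens")
    if not a:
        raise ValueError("logprob sequences must not be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical values for illustration only; paste real per-token logprobs here.
dcu_logprobs = [-0.25, -1.5, -0.125]
gpu_logprobs = [-0.5, -1.5, -0.25]

mae = mean_absolute_error(dcu_logprobs, gpu_logprobs)
print(f"mean absolute error: {mae:.4f}")
```

The length check matters because the comparison is only meaningful when both runs generated the same number of tokens for the same prompt.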

--- a/README.md
+++ b/README.md
@@ -65,7 +65,8 @@ pip install transformers==4.51.1
 None yet
 ## Inference

-### vLLM offline inference with Qwen3-30B-A3B
+vLLM offline inference with Qwen3-30B-A3B:
+
 ```bash
 ## At BF16 precision, the Qwen3-30B-A3B weights alone take about 61 GB, so inference needs at least two cards
 export HIP_VISIBLE_DEVICES=6,7 
@@ -74,6 +75,18 @@ python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B --tensor-p
 ```

 ## result
+
+### Result 1
+
+vLLM offline inference with Qwen3-30B-A3B:
+
+```bash
+## At BF16 precision, the Qwen3-30B-A3B weights alone take about 61 GB, so inference needs at least two cards
+export HIP_VISIBLE_DEVICES=6,7
+## Model path argument
+python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B --tensor-parallel-size 2
+```
+
 ```
 Original Input Prompt (if available):
 '介绍一下北京.'
@@ -102,7 +115,7 @@ Logprobs per generated token:
 成功将每个生成token的logprob写入到文件: ...
 ```

-### Accuracy
+### Accuracy test 1
 ```
 # Run infer_vllm.py on the DCU and on the GPU to get each one's accuracy data, then paste that data into acc.py and run it
 python ./infer/offline/acc.py
@@ -113,8 +126,10 @@ Mean absolute error of Qwen3-30B-A3B offline inference between DCU (K100_AI) and GPU (A800): 0
 ```
 Offline inference accuracy for Qwen3-30B-A3B is consistent between DCU (K100_AI) and GPU (A800); inference framework: vLLM

+### Result 2
+
+vLLM offline inference with Qwen3-30B-A3B-Instruct-2507:

-### vLLM offline inference with Qwen3-30B-A3B-Instruct-2507
 ```bash
 ## Qwen3-30B-A3B-Instruct-2507 needs at least two cards for inference
 export HIP_VISIBLE_DEVICES=6,7 
@@ -122,7 +137,6 @@ export HIP_VISIBLE_DEVICES=6,7
 python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Instruct-2507 --tensor-parallel-size 2
 ```

-## result
 ```
 Original Input Prompt (if available):
 '介绍一下北京.'
@@ -151,7 +165,7 @@ Logprobs per generated token:
 成功将每个生成token的logprob写入到文件: ...
 ```

-### Accuracy
+### Accuracy test 2
 ```
 # Run infer_vllm.py on the DCU and on the GPU to get each one's accuracy data, then paste that data into acc.py and run it
 python ./infer/offline/acc.py
@@ -162,9 +176,10 @@ Mean absolute error of Qwen3-30B-A3B-Instruct-2507 offline inference between DCU (K100_AI) and GPU (A800)
 ```
 Offline inference accuracy for Qwen3-30B-A3B-Instruct-2507 is consistent between DCU (K100_AI) and GPU (A800); inference framework: vLLM

+### Result 3

+vLLM offline inference with Qwen3-30B-A3B-Thinking-2507:

-### vLLM offline inference with Qwen3-30B-A3B-Thinking-2507
 ```bash
 ## Qwen3-30B-A3B-Thinking-2507 needs at least two cards for inference
 export HIP_VISIBLE_DEVICES=6,7 
@@ -172,7 +187,6 @@ export HIP_VISIBLE_DEVICES=6,7
 python ./infer/offline/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Thinking-2507 --tensor-parallel-size 2
 ```

-## result
 ```
 Original Input Prompt (if available):
 '介绍一下北京.'
@@ -201,7 +215,7 @@ Logprobs per generated token:
 成功将每个生成token的logprob写入到文件: ...
 ```

-### Accuracy
+### Accuracy test 3
 ```
 # Run infer_vllm.py on the DCU and on the GPU to get each one's accuracy data, then paste that data into acc.py and run it
 python ./infer/offline/acc.py
@@ -212,8 +226,10 @@ Mean absolute error of Qwen3-30B-A3B-Thinking-2507 offline inference between DCU (K100_AI) and GPU (A800)
 ```
 Offline inference accuracy for Qwen3-30B-A3B-Thinking-2507 is consistent between DCU (K100_AI) and GPU (A800); inference framework: vLLM

+### Result 4
+
+vLLM online inference with Qwen3-30B-A3B:

-### vLLM online inference with Qwen3-30B-A3B
 ```bash
 ## Qwen3-30B-A3B needs at least two cards for deployment
 export HIP_VISIBLE_DEVICES=6,7 
@@ -223,8 +239,6 @@ vllm serve /your_path/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepse
 python client.py
 ```

-
-## result
 ```
 欢迎使用 Qwen3-30B-A3B 聊天客户端！
 已连接到 vLLM 服务，使用模型: /home/zwq/model/Qwen3-30B-A3B
@@ -251,7 +265,7 @@ python client.py
 所有测试结果已保存到文件: ./Qwen3-30B-A3B_logprobs_K100AI_fp16.json
 ```

-### Accuracy
+### Accuracy test 4
 ```bash
 ## Start the vLLM service on the DCU and on the GPU and run client.py against each; once both sets of accuracy data are collected, run acc.py from the online folder
 python ./infer/online/acc.py --file1 /your_path/Qwen3-30B-A3B_logprobs_A800_fp16.json --file2 /your_path/Qwen3-30B-A3B_logprobs_K100AI_fp16.json
@@ -273,8 +287,10 @@ python ./infer/online/acc.py --file1 /your_path/Qwen3-30B-A3B_logprobs_A800_fp16
 ```
 Online inference accuracy for Qwen3-30B-A3B is consistent between DCU (K100_AI) and GPU (A800); inference framework: vLLM

+### Result 5
+
+vLLM online inference with Qwen3-30B-A3B-Instruct-2507:

-### vLLM online inference with Qwen3-30B-A3B-Instruct-2507
 ```bash
 ## Qwen3-30B-A3B-Instruct-2507 needs at least two cards for deployment
 export HIP_VISIBLE_DEVICES=6,7 
@@ -284,7 +300,6 @@ vllm serve /your_path/Qwen3-30B-A3B-Instruct-2507 --tensor-parallel-size 2 --max
 python client.py
 ```

-## result
 ```
 欢迎使用 Qwen3-30B-A3B 聊天客户端！
 已连接到 vLLM 服务，使用模型: /home/zwq/model/Qwen3-30B-A3B-Instruct-2507
@@ -311,7 +326,7 @@ python client.py
 所有测试结果已保存到文件: ./Qwen3-30B-A3B-Instruct-2507_logprobs_K100AI_fp16.json
 ```

-### Accuracy
+### Accuracy test 5
 ```bash
 ## Start the vLLM service on the DCU and on the GPU and run client.py against each; once both sets of accuracy data are collected, run acc.py from the online folder
 python ./infer/online/acc.py --file1 /your_path/Qwen3-30B-A3B-Instruct-2507_logprobs_A800_fp16.json --file2 /your_path/Qwen3-30B-A3B-Instruct-2507_logprobs_K100AI_fp16.json
@@ -333,8 +348,10 @@ python ./infer/online/acc.py --file1 /your_path/Qwen3-30B-A3B-Instruct-2507_logp
 ```
 Online inference accuracy for Qwen3-30B-A3B-Instruct-2507 is consistent between DCU (K100_AI) and GPU (A800); inference framework: vLLM

+### Result 6
+
+vLLM online inference with Qwen3-30B-A3B-Thinking-2507:

-### vLLM online inference with Qwen3-30B-A3B-Thinking-2507
 ```bash
 ## Qwen3-30B-A3B-Thinking-2507 needs at least two cards for deployment
 export HIP_VISIBLE_DEVICES=6,7 
@@ -344,7 +361,6 @@ vllm serve /your_path/Qwen3-30B-A3B-Thinking-2507 --tensor-parallel-size 2 --max
 python client.py
 ```

-## result
 ```
 欢迎使用 Qwen3-30B-A3B 聊天客户端！
 已连接到 vLLM 服务，使用模型: /home/zwq/model/Qwen3-30B-A3B-Thinking-2507
@@ -371,7 +387,7 @@ python client.py
 所有测试结果已保存到文件: ./Qwen3-30B-A3B-Thinking-2507_logprobs_K100AI_fp16.json
 ```

-### Accuracy
+### Accuracy test 6
 ```bash
 ## Start the vLLM service on the DCU and on the GPU and run client.py against each; once both sets of accuracy data are collected, run acc.py from the online folder
 python ./infer/online/acc.py --file1 /your_path/Qwen3-30B-A3B-Thinking-2507_logprobs_A800_fp16.json --file2 /your_path/Qwen3-30B-A3B-Thinking-2507_logprobs_K100AI_fp16.json
@@ -396,7 +412,7 @@ Online inference accuracy for Qwen3-30B-A3B-Thinking-2507 is consistent between DCU (K100_AI) and GPU (A800);

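 ## Application scenarios
 ### Algorithm category
-`Dialogue`
+`Dialogue and Q&A`
 ### Key application industries
 `Finance, education, government, scientific research, manufacturing, energy, transportation`
 ## Pretrained weights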