ModelZoo / DeepSeek-V3.2-Exp_vllm

Commit 590059ff, authored Oct 03, 2025 by chenych
Fix bugs
Parent: 715e39c4

Showing 3 changed files with 11 additions and 4 deletions.
README.md (+11, -4)
doc/config.png (+0, -0)
doc/results_dcu.jpg (+0, -0)
README.md

...
@@ -76,7 +76,12 @@ cd /your_code_path/deepseek-v3.2-exp_pytorch
## Inference

Sample model: [DeepSeek-V3.2-Exp](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp)

-First, convert the model to bf16 format.
+First, convert the model to bf16 format. After the conversion finishes, copy `config.json`, `generation_config.json`, `tokenizer_config.json`, and `tokenizer.json` from the original model into `/path/to/DeepSeek-V3.2-Exp-bf16`, and delete the `quantization_config` field from `config.json`, as shown in the figure below.
+<div align="center"><img src="./doc/config.png" /></div>
```bash
# convert fp8 to bf16
python inference/fp8_cast_bf16.py --input-fp8-hf-path /path/to/DeepSeek-V3.2-Exp --output-bf16-hf-path /path/to/DeepSeek-V3.2-Exp-bf16
```
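The copy-and-strip step described above can also be scripted. A minimal sketch, using the same placeholder paths as the conversion command; the inline Python heredoc is just one assumed way to drop the `quantization_config` key without extra tooling:

```bash
# Placeholder paths, matching the conversion command above.
SRC=/path/to/DeepSeek-V3.2-Exp
DST=/path/to/DeepSeek-V3.2-Exp-bf16

# Copy the config and tokenizer files from the original checkpoint.
cp "$SRC"/config.json "$SRC"/generation_config.json \
   "$SRC"/tokenizer_config.json "$SRC"/tokenizer.json "$DST"/

# Remove the quantization_config field from the copied config.json.
python - "$DST"/config.json <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
cfg.pop("quantization_config", None)  # drop the field if present
with open(path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
EOF
```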
...
@@ -143,17 +148,16 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```bash
vllm serve deepseek-v3.2/DeepSeek-V3.2-Exp-bf16 \
-  --enforce-eager \
  --trust-remote-code \
  --distributed-executor-backend ray \
  --dtype bfloat16 \
  --tensor-parallel-size 32 \
-  --max-model-len 32768 \
+  --max-model-len 1024 \
  --max-num-seqs 128 \
  --no-enable-chunked-prefill \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.85 \
-  --host 127.0.0.1 \
+  --host 12.12.12.11 \
  --port 8001 \
  --kv-cache-dtype bfloat16
```
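Once the server is up, its OpenAI-compatible endpoint can be exercised with a request like the one below. This is a minimal sketch: the host and port must match the `--host`/`--port` flags above, the model name is assumed to default to the path passed to `vllm serve`, and the prompt is a placeholder.

```bash
# Host and port must match the --host/--port flags used when serving.
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-v3.2/DeepSeek-V3.2-Exp-bf16",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```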
...
@@ -178,6 +182,9 @@ curl http://127.0.0.1:8001/v1/chat/completions \

## result

+<div align="center"><img src="./doc/results_dcu.jpg" /></div>

### Accuracy

Accuracy on DCU is consistent with GPU; inference framework: vLLM.
...
doc/config.png: new file (mode 100644), 30.7 KB
doc/results_dcu.jpg: new file (mode 100644), 1020 KB