support 8w8a model

Commit f9220158 authored Mar 12, 2026 by weishb

Showing with 5 additions and 2 deletions

README.md +5 -2

vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl +0 -0

--- a/README.md
+++ b/README.md
@@ -46,7 +46,9 @@ docker run -it \

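 The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community. The numpy and transformers libraries must be replaced with the following versions: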
 ```
-pip install git+https://github.com/huggingface/transformers.git
+pip uninstall vllm
+pip install vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl
+pip install transformers==5.2.0
 pip install numpy==1.25.0
 ```

@@ -60,7 +62,7 @@ pip install numpy==1.25.0
 ### vllm
 #### Single-node inference

-**Note**: When starting the service on `K100 AI`, add the `--disable-custom-all-reduce` argument.
+**Note**: When starting the service on `K100 AI`, add the `--disable-custom-all-reduce` argument. When starting the service with an 8W8A model, also add `-cc.mode=3` and `-cc.inductor_compile_config='{"combo_kernels": false, "benchmark_combo_kernel": false}'`.
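
To make the note above concrete, here is a minimal sketch of how the flags might be combined in one launch command. The function name `serve_w8a8`, the `VLLM_BIN` override, the model path, and the tensor-parallel size of 8 are illustrative assumptions and are not part of this commit. The `--disable-custom-all-reduce` flag is included on the assumption that the target card is `K100 AI`.

```bash
# Hypothetical helper (not part of this commit): wraps `vllm serve` with the
# flags the note lists for an 8W8A model on K100 AI.
# $1: local path to the model weights (assumed)
# $2: tensor-parallel size (assumed to match the number of cards)
serve_w8a8() {
  model_path=$1
  tp_size=$2
  # VLLM_BIN is an illustrative override so the command can be dry-run.
  "${VLLM_BIN:-vllm}" serve "$model_path" \
    --tensor-parallel-size "$tp_size" \
    --disable-custom-all-reduce \
    -cc.mode=3 \
    -cc.inductor_compile_config='{"combo_kernels": false, "benchmark_combo_kernel": false}'
}

# Dry run: print the arguments instead of starting the server.
VLLM_BIN=echo serve_w8a8 /path/to/Qwen3.5-397B-A17B-W8A8 8
```

Dropping the `VLLM_BIN=echo` prefix would run the real `vllm` binary. On cards other than `K100 AI`, the `--disable-custom-all-reduce` line may not be needed.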

 ```bash
 ## Start with serve
@@ -174,6 +176,7 @@ DCU and GPU accuracy are consistent; inference framework: vllm.
 |  Model name  | Weight size | DCU model  | Minimum cards required |       Download link       |
 |:------:|:----:|:----------:|:------:|:---------------------:|
 | Qwen3.5-397B-A17B | 397B | K100AI,BW1000 |   16   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
+| Qwen3.5-397B-A17B | 397B | K100AI,BW1000 |   8   | [Modelscope](https://www.modelscope.cn/models/metax-tech/Qwen3.5-397B-A17B-W8A8) |
 | Qwen3.5-122B-A10B | 122B | K100AI,BW1000 |   8   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
 | Qwen3.5-35B-A3B | 35B | K100AI,BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
 | Qwen3.5-27B | 27B | K100AI,BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |

--- a/vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl
+++ b/vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl