Commit 97223342 authored by raojy

Update README.md

parent 9904ad79
@@ -65,14 +65,17 @@ vllm serve Qwen/Qwen3.6-35B-A3B \
--gpu-memory-utilization 0.925
## Client access
-curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{
-"model": "Qwen/Qwen3.6-35B-A3B",
+curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{
+"model": "/public/home/raojy/project/model_code/qwen36",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, please give a brief self-introduction."}
],
"max_tokens": 512,
"temperature": 0.7,
"stream": false
}'
```
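
The curl request above can also be sent from a script. The following is a minimal sketch using only the Python standard library, assuming the server exposes the OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8001` as in the curl example, and that the `model` value matches the name the server was started with. The helper names `build_chat_payload`, `extract_reply`, and `send_chat` are hypothetical and are not part of vllm.

```python
import json
import urllib.request

# Assumed values copied from the curl example; adjust them for your deployment.
BASE_URL = "http://localhost:8001"
MODEL = "/public/home/raojy/project/model_code/qwen36"


def build_chat_payload(model, user_text,
                       system_text="You are a helpful assistant.",
                       max_tokens=512, temperature=0.7):
    """Build the same non-streaming JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def send_chat(payload, base_url=BASE_URL, timeout=60):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `extract_reply(send_chat(build_chat_payload(MODEL, "Hello")))` would return the generated text. Keeping payload construction separate from the network call makes it possible to inspect the request body without a live server.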
## Results
@@ -87,7 +90,7 @@ DCU accuracy matches the GPU; inference framework: vllm.
| Model name | Weight size | DCU model | Minimum cards required | Download link |
|:------:|:----:|:----------:|:------:|:---------------------:|
| Qwen3.6-35B-A3B | 35 | BW1000 | 4 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) |
| Qwen3.6-35B-A3B-FP8 | 35B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8) |
## Source repository and issue feedback