Commit 347262c4 authored by raojy's avatar raojy 💬
Browse files

Update README.md

parent 97223342
......@@ -57,12 +57,14 @@ docker run -it \
#### 单机推理
```bash
## serve启动
vllm serve Qwen/Qwen3.6-35B-A3B \
vllm serve /public/home/raojy/project/model_code/qwen36 \
--port 8001 \
--trust-remote-code \
--dtype bfloat16 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.925
--gpu-memory-utilization 0.925 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
## client访问
curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment