Commit 347262c4 authored by raojy's avatar raojy 💬
Browse files

Update README.md

parent 97223342
...@@ -57,12 +57,14 @@ docker run -it \ ...@@ -57,12 +57,14 @@ docker run -it \
#### 单机推理 #### 单机推理
```bash ```bash
## serve启动 ## serve启动
vllm serve Qwen/Qwen3.6-35B-A3B \ vllm serve /public/home/raojy/project/model_code/qwen36 \
--port 8001 \ --port 8001 \
--trust-remote-code \ --trust-remote-code \
--dtype bfloat16 \ --dtype bfloat16 \
--tensor-parallel-size 4 \ --tensor-parallel-size 4 \
--gpu-memory-utilization 0.925 --gpu-memory-utilization 0.925 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
## client访问 ## client访问
curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{ curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment