"composable_kernel/include/utility/utility.hpp" did not exist on "d6d9a8e4cee89feef6758f825cfea1588fec16da"
Commit 20f4e124 authored by chenych

Update GLM5

parent dc06c77b
@@ -20,7 +20,7 @@
```diff
 docker run -it \
-  --shm-size 60g \
+  --shm-size 200g \
   --network=host \
   --name glm-5 \
   --privileged \
```
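The commit raises the container's shared-memory size from 60g to 200g. Docker size strings can be sanity-checked programmatically; a minimal sketch (the `parse_size` helper is ours, not part of Docker, and assumes the binary `k`/`m`/`g` suffixes Docker accepts):

```python
def parse_size(value: str) -> int:
    """Convert a Docker-style size string such as '200g' to bytes.

    Hypothetical helper for sanity-checking --shm-size values;
    supports the k/m/g suffixes (binary multiples). A bare number
    is treated as a plain byte count.
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

# The commit raises shared memory from 60g to 200g:
assert parse_size("200g") > parse_size("60g")
```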
@@ -110,21 +110,24 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
3. Start the vllm server
```diff
 vllm serve zai-org/GLM-5 \
-  --port 8001 \
-  --trust-remote-code \
-  --tensor-parallel-size 32 \
-  --gpu-memory-utilization 0.85 \
-  --speculative-config.method mtp \
-  --speculative-config.num_speculative_tokens 1 \
-  --tool-call-parser glm47 \
-  --reasoning-parser glm45 \
-  --enable-auto-tool-choice \
-  --served-model-name glm-5
+  --port 8001 \
+  --trust-remote-code \
+  --tensor-parallel-size 32 \
+  --gpu-memory-utilization 0.85 \
+  --distributed-executor-backend ray \
+  --dtype bfloat16 \
+  --max-model-len 32768 \
+  --speculative-config.method mtp \
+  --speculative-config.num_speculative_tokens 1 \
+  --tool-call-parser glm47 \
+  --reasoning-parser glm45 \
+  --enable-auto-tool-choice \
+  --served-model-name glm-5
```
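With `--tensor-parallel-size 32` and nodes started with `--num-gpus=8` (as in the `ray start` command above), one tensor-parallel group spans four ray nodes. A minimal sketch of that arithmetic (the `nodes_needed` helper is illustrative, not a vLLM API):

```python
import math

def nodes_needed(tensor_parallel_size: int, gpus_per_node: int) -> int:
    """Number of ray worker nodes required to supply the GPUs for one
    tensor-parallel group (illustrative helper, not a vLLM API)."""
    return math.ceil(tensor_parallel_size / gpus_per_node)

# --tensor-parallel-size 32 across nodes started with --num-gpus=8
print(nodes_needed(32, 8))  # → 4
```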
Once the server has started, it can be accessed as follows:
```diff
-curl http://localhost:8001/v1/chat/completions \
+curl http://12.12.12.83:8001/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "glm-5",
@@ -132,14 +135,14 @@ curl http://localhost:8001/v1/chat/completions \
     "messages": [
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "Summarize GLM-5 in one sentence."}
     ],
-    "max_tokens": 4096,
-    "temperature": 1
+    "max_tokens": 200,
+    "temperature": 0.7
   }'
```
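The curl request above can also be issued from Python. A minimal sketch using only the standard library (the `build_chat_payload` helper and the host address are illustrative; any OpenAI-compatible client would work just as well):

```python
import json

def build_chat_payload(user_msg: str, max_tokens: int = 200,
                       temperature: float = 0.7) -> str:
    """Build the JSON body for the /v1/chat/completions request.
    Illustrative helper; field values mirror the curl example above."""
    return json.dumps({
        "model": "glm-5",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_msg},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

body = build_chat_payload("Summarize GLM-5 in one sentence.")
# POST it with e.g. urllib.request.Request(
#     "http://12.12.12.83:8001/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"})
```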
## Results
 <div align=center>
-<img src="./doc/xxx.png"/>
+<img src="./doc/result.png"/>
 </div>
### Accuracy
......
doc/result.png (31.7 KB → 104 KB)