add glm-4-9b-chat-1m

Commit 410a0679 authored Jan 08, 2025 by panhb

Showing with 2 additions and 1 deletion

README.md +2 -1

benchmarks/__pycache__/backend_request_func.cpython-310.pyc +0 -0

--- a/README.md
+++ b/README.md
@@ -91,7 +91,7 @@ conda create -n chatglm_vllm python=3.10
 | -------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
 | [chatglm2-6b](http://113.200.138.88:18080/aimodels/chatglm2-6b)         | [chatglm2-6b-32k](http://113.200.138.88:18080/aimodels/thudm/chatglm2-6b-32k.git) |
 | [chatglm3-6b](http://113.200.138.88:18080/aimodels/chatglm3-6b)         | [chatglm3-6b-32k](http://113.200.138.88:18080/aimodels/chatglm3-6b-32k)           |
-| [glm-4-9b-chat](http://113.200.138.88:18080/aimodels/glm-4-9b-chat.git) |                                                                                |
+| [glm-4-9b-chat](http://113.200.138.88:18080/aimodels/glm-4-9b-chat.git) | [glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)         |

 ### Offline batch inference

@@ -111,6 +111,7 @@ python benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --outpu
 ```

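 Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards used (tensor parallelism), and `dtype="float16"` is the inference data type; if the model weights are in bfloat16, switch to float16 for inference. Specifying `--output-len 1` measures first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.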
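+The default `model_max_length` of glm-4-9b-chat-1m is 1024000, a length that official vLLM does not yet support either, so `--max_model_len` must be added whenever the model is launched (including in the launch commands below). In testing, a value of around 500000 still allowed inference to run normally.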

 2. Using a dataset
 Download the dataset:
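The `--max_model_len` note for glm-4-9b-chat-1m can be illustrated with a minimal Python sketch that assembles a `benchmark_throughput.py` command and checks that the requested prompt and output lengths fit in the chosen context window. `build_throughput_cmd` is a hypothetical helper and the model path is a placeholder. The `--dtype` flag spelling is an assumption because the README only writes `dtype="float16"`. The other flag names are taken from the README text.

```python
import shlex


def build_throughput_cmd(model, num_prompts, input_len, output_len,
                         tp=1, dtype="float16", max_model_len=None):
    """Hypothetical helper: assemble a benchmark_throughput.py command line.

    Flag names follow the README; `--dtype` is an assumed spelling, since the
    README only writes dtype="float16".
    """
    if max_model_len is not None and input_len + output_len > max_model_len:
        # Prompt plus generated tokens must fit inside the context window.
        raise ValueError(
            f"input_len + output_len = {input_len + output_len} "
            f"exceeds max_model_len = {max_model_len}"
        )
    cmd = [
        "python", "benchmarks/benchmark_throughput.py",
        "--num-prompts", str(num_prompts),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
        "--model", model,
        "-tp", str(tp),
        "--dtype", dtype,
    ]
    if max_model_len is not None:
        # Cap the context window; the README says glm-4-9b-chat-1m requires it.
        cmd += ["--max_model_len", str(max_model_len)]
    return cmd


if __name__ == "__main__":
    # Hypothetical local path; about 500000 is the value the README reports working.
    cmd = build_throughput_cmd("/models/glm-4-9b-chat-1m", num_prompts=1,
                               input_len=32, output_len=128,
                               max_model_len=500000)
    print(shlex.join(cmd))
```

The length check is a design choice of this sketch. vLLM limits each sequence (prompt plus generated tokens) to `max_model_len`, so a benchmark whose lengths exceed that limit would not measure the intended output length.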

--- a/benchmarks/__pycache__/backend_request_func.cpython-310.pyc
+++ b/benchmarks/__pycache__/backend_request_func.cpython-310.pyc