"git@developer.sourcefind.cn:OpenDAS/torch-cluster.git" did not exist on "6f222280cee19ee5e7f9fda8d976d66917afcfc5"
Commit 94476ce5, authored by wang jiahao, committed by GitHub

Merge pull request #1085 from kvcache-ai/qiyuxinlin-patch-5

Update balance-serve.md
parents 41ce92bb 23ceb1c0
@@ -133,7 +133,7 @@ It features the following arguments:
 corresponding to 32768 tokens, and the space occupied will be released after the requests are completed.
 - `--backend_type`: `balance_serve` is a multi-concurrency backend engine introduced in version v0.2.4. The original single-concurrency engine is `ktransformers`.
 - `--model_path`: Path to safetensor config path (only config required, not model safetensors).
-Please note that, since `ver 0.2.4`, the last segment of `${model_path}` directory name **MUST** be one of the model names defined in `ktransformers/configs/model_configs.json`.
+Please note that, since `ver 0.2.4`, the last segment of `${model_path}` directory name **MUST** be a local directory that contains the model's configuration files. Hugging Face links (e.g., deepseek-ai/DeepSeek-R1) are not supported at the moment.
 - `--force_think`: Force responding the reasoning tag of `DeepSeek R1`.
 The relationship between `max_batch_size`, `cache_lens`, and `max_new_tokens` should satisfy:
......