Commit 94476ce5 (unverified), authored by wang jiahao, committed by GitHub

Merge pull request #1085 from kvcache-ai/qiyuxinlin-patch-5

Update balance-serve.md
parents 41ce92bb 23ceb1c0
@@ -133,7 +133,7 @@ It features the following arguments:
corresponding to 32768 tokens, and the space occupied will be released after the requests are completed.
- `--backend_type`: `balance_serve` is a multi-concurrency backend engine introduced in version v0.2.4. The original single-concurrency engine is `ktransformers`.
- `--model_path`: Path to safetensor config path (only config required, not model safetensors).
-  Please note that, since `ver 0.2.4`, the last segment of `${model_path}` directory name **MUST** be one of the model names defined in `ktransformers/configs/model_configs.json`.
+  Please note that, since `ver 0.2.4`, the last segment of `${model_path}` directory name **MUST** be a local directory that contains the model's configuration files. Hugging Face links (e.g., deepseek-ai/DeepSeek-R1) are not supported at the moment.
- `--force_think`: Force responding the reasoning tag of `DeepSeek R1`.
The relationship between `max_batch_size`, `cache_lens`, and `max_new_tokens` should satisfy:
......
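
The updated `--model_path` note requires a local directory that holds the model's configuration files, rather than a Hugging Face repository ID. A minimal pre-flight check along those lines might look like the sketch below. The helper name `check_model_path` is hypothetical and not part of ktransformers, and the assumption that the required file is `config.json` follows the usual Hugging Face layout rather than anything stated in this diff.

```python
from pathlib import Path


def check_model_path(model_path: str, required=("config.json",)) -> Path:
    """Return the path if it is a local directory holding the config files.

    Hypothetical helper: the file names in `required` are an assumption
    (typical Hugging Face layout), not taken from the ktransformers source.
    """
    path = Path(model_path).expanduser()

    # A Hugging Face ID such as "deepseek-ai/DeepSeek-R1" is not a local
    # directory, so it is rejected here.
    if not path.is_dir():
        raise ValueError(
            f"--model_path must be a local directory, got: {model_path!r}"
        )

    # Only the configuration files are needed, not the safetensors weights.
    missing = [name for name in required if not (path / name).is_file()]
    if missing:
        raise ValueError(f"{path} is missing config file(s): {missing}")

    return path
```

Calling `check_model_path("deepseek-ai/DeepSeek-R1")` raises `ValueError` unless a directory with that relative path happens to exist locally, which mirrors the restriction described in the added line.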