Unverified Commit 2d3aaef8 authored by wang jiahao's avatar wang jiahao Committed by GitHub

Merge pull request #1301 from kvcache-ai/update-readme

update readme
parents ee524b0f d35d61f6
...@@ -100,8 +100,10 @@ git submodule update --init --recursive
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
## Running DeepSeek-R1-Q4KM Models
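To decide between the two install commands, you can check how many NUMA nodes the machine exposes. A minimal sketch, assuming a Linux host with `lscpu` available; the detection logic below is illustrative and not part of `install.sh`:

```shell
# Count NUMA nodes reported by lscpu; default to 1 if detection fails.
numa_nodes=$(lscpu 2>/dev/null | awk '/^NUMA node\(s\)/ {print $3}')
numa_nodes=${numa_nodes:-1}

if [ "$numa_nodes" -gt 1 ]; then
  # Dual NUMA: build with per-node binding enabled.
  echo "Detected $numa_nodes NUMA nodes: run USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
else
  # Single NUMA: the simpler build is enough.
  echo "Detected 1 NUMA node: run USE_BALANCE_SERVE=1 bash ./install.sh"
fi
```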
......
...@@ -117,11 +117,13 @@ Download source code and compile:
```shell
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
- For multi-concurrency with two CPUs and 1 TB of RAM:
```shell
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
- For Windows (native Windows support is temporarily deprecated; please use WSL)
......
...@@ -68,8 +68,10 @@ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.o
```bash
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
### 4. Use our custom config.json
......
...@@ -127,8 +127,10 @@ cd ktransformers
git submodule update --init --recursive
# If you use the dual NUMA version
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# If you use the single NUMA version
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# Launch command
python ktransformers/server/main.py --model_path <your model path> --gguf_path <your gguf path> --cpu_infer 62 --optimize_config_path <inject rule path> --port 10002 --chunk_size 256 --max_new_tokens 1024 --max_batch_size 4 --cache_lens 32768 --backend_type balance_serve
```
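Once the server is up, you can sanity-check it over HTTP. A hedged sketch, assuming the `balance_serve` backend serves an OpenAI-compatible chat endpoint on the configured port; the endpoint path and model name here are assumptions, not taken from the source:

```shell
# Build the request body and validate it locally before sending.
payload='{"model":"DeepSeek-R1","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"

# Send the request once the server from the command above is running
# (assumed endpoint; uncomment to use):
# curl -s http://localhost:10002/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d "$payload"
```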
......