Unverified Commit 2d3aaef8 authored by wang jiahao's avatar wang jiahao Committed by GitHub

Merge pull request #1301 from kvcache-ai/update-readme

update readme
parents ee524b0f d35d61f6
...@@ -100,8 +100,10 @@ git submodule update --init --recursive
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
## Running DeepSeek-R1-Q4KM Models
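To decide between the two install commands, you can check how many NUMA nodes the machine exposes. A minimal sketch, assuming a Linux host with `lscpu` available; the detection logic below is illustrative and not part of `install.sh`:

```shell
# Count NUMA nodes reported by lscpu; default to 1 if detection fails.
numa_nodes=$(lscpu 2>/dev/null | awk '/^NUMA node\(s\)/ {print $3}')
numa_nodes=${numa_nodes:-1}

if [ "$numa_nodes" -gt 1 ]; then
  # Dual NUMA: build with per-node binding enabled.
  echo "Detected $numa_nodes NUMA nodes: run USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
else
  # Single NUMA: the simpler build is enough.
  echo "Detected 1 NUMA node: run USE_BALANCE_SERVE=1 bash ./install.sh"
fi
```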
......
...@@ -117,11 +117,13 @@ Download source code and compile:
```shell
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
- For multi-concurrency with two CPUs and 1 TB of RAM:
```shell
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
- For Windows (native Windows support is temporarily deprecated; please use WSL)
......
...@@ -68,8 +68,10 @@ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.o
```bash
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
### 4. Use our custom config.json
......
...@@ -127,8 +127,10 @@ cd ktransformers
git submodule update --init --recursive
# If you use the dual NUMA version
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# If you use the single NUMA version
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# Launch command
python ktransformers/server/main.py --model_path <your model path> --gguf_path <your gguf path> --cpu_infer 62 --optimize_config_path <inject rule path> --port 10002 --chunk_size 256 --max_new_tokens 1024 --max_batch_size 4 --cache_lens 32768 --backend_type balance_serve
```
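Once the server is up, you can sanity-check it over HTTP. A hedged sketch, assuming the `balance_serve` backend serves an OpenAI-compatible chat endpoint on the configured port; the endpoint path and model name here are assumptions, not taken from the source:

```shell
# Build the request body and validate it locally before sending.
payload='{"model":"DeepSeek-R1","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"

# Send the request once the server from the command above is running
# (assumed endpoint; uncomment to use):
# curl -s http://localhost:10002/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d "$payload"
```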
......