- **Local 236B DeepSeek-Coder-V2:** Running its Q4_K_M version using only 21GB VRAM and 136GB DRAM, attainable on a local desktop machine, which scores even better than GPT4-0613 in [BigCodeBench](https://huggingface.co/blog/leaderboard-bigcodebench).
For Windows, please add CUDA_PATH to the "System variables" section of "Environment Variables" (assuming CUDA is installed in "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X").
Then append the following paths to the "Path" variable.
```sh
%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp
```
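After updating the environment variables, you can sanity-check the CUDA setup from a freshly opened terminal; this is a quick verification sketch rather than an official installation step:

```sh
# Open a new Command Prompt so the updated Path is picked up, then:
nvcc --version   # should report the installed CUDA toolkit version
where nvcc       # should resolve to %CUDA_PATH%\bin\nvcc.exe
```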
- Linux-x86_64 with gcc, g++ and cmake
```sh
...
git submodule init
git submodule update
```
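Alternatively, the submodules can be fetched in a single step at clone time. A minimal sketch, assuming the repository is cloned from its usual GitHub location:

```sh
# Clone the repository together with all of its submodules.
git clone --recurse-submodules https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
```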
- [Optional] If you want to run with the website, please [compile the website](./doc/en/api/server/website.md) before executing ```bash install.sh```
- Compile and install (for Linux)
```
bash install.sh
```
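If the build finishes without errors, a quick import check confirms the package is visible to Python; a minimal sketch, assuming the package installs under the module name `ktransformers`:

```sh
python -c "import ktransformers; print('ktransformers imported successfully')"
```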
- Compile and install (for Windows)
```
...
```
> Note: <strong>.safetensors</strong> files are not required in the directory. We only need the config files to build the model and tokenizer.
- `--gguf_path` (required): Path of a directory containing GGUF files, which can be downloaded from [Hugging Face](https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/tree/main) (only q4_k_m and q8_0 are supported for now; more formats are coming soon).
- `--optimize_rule_path` (required except for Qwen2Moe and DeepSeek-V2): Path of the YAML file containing optimization rules. There are two rule files pre-written in the [ktransformers/optimize/optimize_rules](ktransformers/optimize/optimize_rules) directory for optimizing DeepSeek-V2 and Qwen2-57B-A14, two SOTA MoE models.
...
- `--cpu_infer`: Int (default=10). The number of CPUs used for inference. Ideally, set it to (total number of cores - 2). An example invocation is shown below.
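A hedged example of how these arguments fit together on the command line; the `ktransformers.local_chat` entry point and the `--model_path` flag shown here are assumptions based on typical usage and may differ in your installation:

```sh
# --optimize_rule_path is omitted: per the note above, it is not required for DeepSeek-V2 models.
python -m ktransformers.local_chat \
  --model_path deepseek-ai/DeepSeek-V2-Lite-Chat \
  --gguf_path ./DeepSeek-V2-Lite-Chat-GGUF \
  --cpu_infer 30   # e.g. on a 32-core machine: total cores - 2
```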
<h3 id="supported-model">Suggested Model</h3>
| Model Name | Model Size | VRAM | Minimum DRAM | Recommended DRAM |
More will come soon. Please let us know which models you are most interested in.
Be aware that you are subject to the corresponding model licenses when using [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/LICENSE) and [QWen](https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE).
...