<when you see chat, then press enter to load the text prompt_file>
```
\<your model path\> can be local or an online Hugging Face model ID such as deepseek-ai/DeepSeek-V3. If the online download encounters connection problems, try using a mirror (hf-mirror.com). <br>
\<your gguf path\> can also be online, but as it is large we recommend you download it and quantize the model to what you want. <br>
The command `numactl -N 1 -m 1` aims to avoid data transfer between NUMA nodes.
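As a sketch of the point above, the `numactl` prefix binds both execution and memory allocation to a single node. The `local_chat.py` flags and paths below are illustrative placeholders, not a tested invocation:

```shell
# Sketch only: build the single-socket launch line. The model/gguf paths
# and the --cpu_infer value below are illustrative placeholders.
cmd="python ./ktransformers/local_chat.py --model_path <your model path> --gguf_path <your gguf path> --cpu_infer 33"
# -N 1 runs the process on node 1's CPUs; -m 1 allocates its memory on
# node 1 only, so no data crosses the inter-socket link.
launch="numactl -N 1 -m 1 $cmd"
echo "$launch"    # printed here, not executed
```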
### Dual socket version (64 cores)
Make sure before you install (using install.sh or `make dev_install`) that the env var `USE_NUMA=1` is set, e.g. by `export USE_NUMA=1` (if already installed, reinstall with this env var set). <br>
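A minimal sketch of the step above; the install commands themselves are shown as comments only, since they assume a ktransformers source tree:

```shell
# Set USE_NUMA=1 so the (re)install picks it up.
export USE_NUMA=1
# pip uninstall -y ktransformers   # if a non-NUMA build is already installed
# make dev_install                 # or: sh install.sh
[ "$USE_NUMA" = "1" ] && echo "USE_NUMA is set"
```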
```
<when you see chat, then press enter to load the text prompt_file>
```
The parameters' meaning is the same, but as we use dual socket, we set `cpu_infer` to 65.
## Some Explanations
1. We also want to make further use of the two NUMA nodes on Xeon Gold CPUs. To avoid the cost of data transfer between nodes, we "copy" the critical matrices onto both nodes, which consumes more memory but accelerates the prefill and decoding processes. Note that this method takes a lot of memory and is slow when loading weights, so be patient during loading and monitor the memory usage. (We are considering making this method optional.) We are going to optimize this huge memory overhead. Stay tuned~ <br>
2. The command arg `--cpu_infer 65` specifies how many cores to use (it is okay if it exceeds the physical count, but more is not always better; adjust it to slightly below your actual number of cores). <br>
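The advice above can be sketched as a small helper. Note `nproc` reports logical CPUs, so on a hyperthreaded machine the physical core count may be about half of what it prints; the margin below is illustrative:

```shell
# Sketch: derive a --cpu_infer value slightly below the machine's
# processor count, per the advice above. Names are illustrative.
total_cores=$(nproc)
if [ "$total_cores" -gt 2 ]; then
  cpu_infer=$((total_cores - 1))
else
  cpu_infer=$total_cores
fi
echo "suggested --cpu_infer: $cpu_infer"
```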