Unverified Commit b25ad4ec authored by Azure, committed by GitHub

Merge pull request #39 from Azure-Tang/develop-0.1.2

[fix] fix broken link in tutorial.
parents 77a34c28 7199699d
@@ -22,6 +22,12 @@ interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified
<br/><br/>
Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.
<h2 id="Updates">🔥 Updates</h2>
* **Aug 15, 2024**: Updated the detailed [TUTORIAL](doc/en/injection_tutorial.md) for injection and multi-GPU.
* **Aug 14, 2024**: Support llamafile as a linear backend.
* **Aug 12, 2024**: Support multiple GPUs; support new models: Mixtral 8\*7B and 8\*22B; support q2k, q3k, q5k dequantization on GPU.
* **Aug 9, 2024**: Support Windows natively.
<h2 id="show-cases">🔥 Show Cases</h2> <h2 id="show-cases">🔥 Show Cases</h2>
<h3>GPT-4-level Local VSCode Copilot on a Desktop with only 24GB VRAM</h3> <h3>GPT-4-level Local VSCode Copilot on a Desktop with only 24GB VRAM</h3>
......
@@ -165,7 +165,7 @@ Through these two rules, we place all previously unmatched layers (and their sub
## Multi-GPU
If you have multiple GPUs, you can set the device for each module to a different GPU.
DeepSeek-V2-Chat has 60 layers; if we have 2 GPUs, we can allocate 30 layers to each GPU. Complete multi-GPU rule examples are [here](https://github.com/kvcache-ai/ktransformers/blob/main/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml).
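As a rough sketch of how such a rule file could split layers between devices (the key names `match`, `replace`, `generate_device`, and `prefill_device` follow the pattern of the linked example and should be verified against it), routing the first 30 layers to `cuda:0` and the remaining 30 to `cuda:1` might look like:

```yaml
# Hypothetical sketch, not the authoritative rule file: route layers 0-29 to
# cuda:0 and layers 30-59 to cuda:1. Verify key names against the linked example.
- match:
    name: "^model\\.layers\\.([0-9]|[12][0-9])\\."   # layers 0-29
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"
- match:
    name: "^model\\.layers\\.([345][0-9])\\."        # layers 30-59
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
```

Because each rule matches modules by a regex over their names, splitting on the layer index keeps every layer and its submodules together on a single device.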
<p align="center"> <p align="center">
......