By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI.
Our vision for KTransformers is to serve as a flexible platform for experimenting with innovative LLM inference optimizations. Please let us know if you need any other features.
<h2 id="Updates">🔥 Updates</h2>
* **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./en/xpu.md)).
* **Apr 9, 2025**: Experimental support for LLaMA 4 models ([Tutorial](./en/llama4.md)).
* **Apr 2, 2025**: Support multi-concurrency ([Tutorial](./en/balance-serve.md)).
We are excited to introduce **Intel GPU support** in KTransformers (beta release). This implementation has been developed and tested on Intel Xeon Scalable processors and Intel Arc GPUs (such as the A770 and B580).
## Installation Guide
### 1. Install Intel GPU Driver
Begin by installing the GPU drivers for your Intel GPU:
- [Official GPU Installation Guide for Intel GPUs](https://dgpu-docs.intel.com/driver/overview.html)
> [!Important]
> Ensure that **Resizable BAR** is enabled in your system's BIOS before proceeding. This is essential for optimal GPU performance and for avoiding issues such as `Bus error (core dumped)`. For detailed steps, please refer to the official guidance [here](https://www.intel.com/content/www/us/en/support/articles/000090831/graphics.html).
### 2. Set Up Conda Environment
We recommend using Miniconda3/Anaconda3 for environment management:
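A typical setup looks like this (the environment name and Python version are illustrative; adjust them to suit your system):

```bash
# Create and activate a dedicated environment for KTransformers
conda create --name ktransformers python=3.11
conda activate ktransformers
```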
2. Runtime error like `xpu/sycl/TensorCompareKernels.cpp:163: xxx. Aborted (core dumped)`
This error is mostly related to the GPU driver. If you encounter it, update `intel-level-zero-gpu` to `1.3.29735.27-914~22.04` (a version we have verified) using the command below.
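For example, on Ubuntu 22.04 the pinned version can be installed via apt (this assumes the Intel GPU driver repository is already configured; `--allow-downgrades` is needed if a newer build is already installed):

```bash
sudo apt-get update
# Pin the verified intel-level-zero-gpu build
sudo apt-get install --allow-downgrades intel-level-zero-gpu=1.3.29735.27-914~22.04
```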
Separately, the decode-phase fast path in `KDeepseekV3MoEV2` now verifies that CUDA is actually available before querying stream-capture state:

```diff
@@ -1122,7 +1143,7 @@ class KDeepseekV3MoEV2(BaseInjectedModule, DeepseekV3MoE):
         # only for generate phase
-        if hasattr(self.experts.generate_experts, "submit_for_one_decode") and torch.cuda.is_current_stream_capturing():  # TODO: this branch cause jit bug
+        if hasattr(self.experts.generate_experts, "submit_for_one_decode") and torch.cuda.is_available() and torch.cuda.is_current_stream_capturing():  # TODO: this branch cause jit bug
```
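The added `torch.cuda.is_available()` check matters because `torch.cuda.is_current_stream_capturing()` queries the CUDA runtime, which can fail on CPU-only or XPU-only builds. A minimal sketch of the guard pattern, using only public PyTorch APIs (the helper name is ours, not part of KTransformers):

```python
import torch

def cuda_graph_capturing() -> bool:
    """Return True only when CUDA exists and the current stream is being
    captured into a CUDA graph. Short-circuiting on is_available() keeps
    the call safe on builds without CUDA (e.g. XPU-only installs)."""
    return torch.cuda.is_available() and torch.cuda.is_current_stream_capturing()
```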