- 18 Nov, 2024 1 commit
-
-
Wang, Yi authored
* add ipex moe implementation to support Mixtral and PhiMoe Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> * update to ipex xpu 2.5 Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> * torch has xpu support in 2.5 Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> * fix oneapi basekit version Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> * Apply suggestions from code review Co-authored-by:
Daniël de Kok <me@github.danieldk.eu> --------- Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> Co-authored-by:
Daniël de Kok <me@github.danieldk.eu>
-
- 08 Oct, 2024 1 commit
-
-
Daniël de Kok authored
* Add support for fused MoE Marlin for AWQ This uses the updated MoE Marlin kernels from vLLM. * Add integration test for AWQ MoE
-
- 30 Sep, 2024 3 commits
-
-
Daniël de Kok authored
This change uses the updated Marlin MoE kernel from vLLM to support MoE with activation sorting and groups.
-
Daniël de Kok authored
This change add support for MoE models that use GPTQ quantization. Currently only models with the following properties are supported: - No `desc_act` with tensor parallelism, unless `group_size=-1`. - No asymmetric quantization. - No AWQ.
-
Mohit Sharma authored
* style * update torch * ix issues * fix clone * revert mkl * added custom PA * style * fix style * style * hide env vart * fix mixtral model * add skinny kernel and merge fixes * fixed style * fix issue for sliding window models * addressed review comments * fix import * improved error messag * updated default value * remove import * fix imports after rebase * float16 dep * improve dockerfile * cleaned dockerfile
-
- 24 Sep, 2024 1 commit
-
-
Daniël de Kok authored
This replaces the custom layers in both models.
-
- 17 Sep, 2024 1 commit
-
-
Daniël de Kok authored
* Move to moe-kernels package and switch to common MoE layer This change introduces the new `moe-kernels` package: - Add `moe-kernels` as a dependency. - Introduce a `SparseMoELayer` module that can be used by MoE models. - Port over Mixtral and Deepseek. * Make `cargo check` pass * Update runner
-