Commits · 1c84a30fe6353c8691f4809f64ec77c2cfeeb246 · OpenDAS / text-generation-inference

30 Sep, 2024 1 commit
- MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) · 1c84a30f
  Daniël de Kok authored Sep 30, 2024
```
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
```
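For readers unfamiliar with the terms, here is a minimal sketch of why activation ordering (`desc_act`) and grouping interact. It assumes the usual GPTQ convention that each input row carries a group index (`g_idx`), and that rows can be re-sorted so each group becomes contiguous. All function names are hypothetical illustrations, not code from TGI or vLLM.

```python
# Illustrative only: how act-order (desc_act) interacts with quantization groups.
# Every name below is hypothetical and not taken from TGI or vLLM.

def sequential_g_idx(num_rows, group_size):
    """Group index per input row when rows are quantized in natural order."""
    return [row // group_size for row in range(num_rows)]

def act_order_g_idx(perm, group_size):
    """Group index per input row when rows are quantized in the order `perm`.

    perm[k] is the row quantized at position k, so it lands in group
    k // group_size regardless of where the row sits in the matrix.
    """
    g_idx = [0] * len(perm)
    for position, row in enumerate(perm):
        g_idx[row] = position // group_size
    return g_idx

def sort_indices(g_idx):
    """Stable argsort of g_idx: a row order that makes each group contiguous."""
    return sorted(range(len(g_idx)), key=lambda row: g_idx[row])

perm = [5, 2, 7, 0, 3, 6, 1, 4]          # assumed activation-based ordering
g_idx = act_order_g_idx(perm, group_size=4)
order = sort_indices(g_idx)
regrouped = [g_idx[row] for row in order]  # groups are contiguous again
```

With `groupsize == -1` there is a single group covering every row, so no such sorting is needed; the sorting only matters once groups and activation ordering are combined.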
31 Jul, 2024 1 commit
- Handle GPTQ-Marlin loading in `GPTQMarlinWeightLoader` (#2300) · 34f7dcfd
  Daniël de Kok authored Jul 31, 2024

The `GPTQWeightsLoader` was structured like this in pseudocode:

if marlin:
  Set up tensors in a way that GPTQ-Marlin expects
else:
  Set up tensors in a way that ExLlama/GPTQ/AWQ expect

However, the GPTQ-Marlin implementation details should really be in the
`marlin` module, so this change moves the first branch out into a separate
`GPTQMarlinWeightsLoader`.
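
The refactoring described above can be sketched roughly as follows. This is a simplified illustration rather than the actual TGI code: the class names follow the commit message, while the `WeightsLoader` base class shape, the `describe_layout` method, and the `get_loader` helper are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class WeightsLoader(ABC):
    """Hypothetical base class; the real interface may differ."""

    @abstractmethod
    def describe_layout(self) -> str:
        """Return a label for the tensor layout this loader produces."""


class GPTQWeightsLoader(WeightsLoader):
    # After the split, this loader no longer needs an `if marlin:` branch.
    def describe_layout(self) -> str:
        return "ExLlama/GPTQ/AWQ"


class GPTQMarlinWeightsLoader(WeightsLoader):
    # Marlin-specific tensor setup lives here, next to the Marlin code.
    def describe_layout(self) -> str:
        return "GPTQ-Marlin"


def get_loader(use_marlin: bool) -> WeightsLoader:
    """Hypothetical helper: pick the loader once instead of branching inside it."""
    return GPTQMarlinWeightsLoader() if use_marlin else GPTQWeightsLoader()
```

The design point is that the branch on `marlin` happens once, when the loader is chosen, so each loader class only has to know about one tensor layout.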

29 Jul, 2024 1 commit
- Install Marlin from standalone package (#2320) · 922732b2
  Daniël de Kok authored Jul 29, 2024
  
24 Jul, 2024 1 commit
- Split up `layers.marlin` into several files (#2292) · 93d2b9fe
  Daniël de Kok authored Jul 24, 2024
```
The marlin.py file was getting large, so split it up.
```