server/text_generation_server/layers/linear.py · f1f98e369fbdf3e1bb90bb7fb05ea00cb899c80d · OpenDAS / text-generation-inference

Add support for Marlin 2:4 sparsity (#2102) · f1f98e36

Daniël de Kok authored Jun 25, 2024

This change adds support for 2:4 sparsity when using Marlin
quantization. The 2:4 kernel is used when both of the following hold:

* the quantizer is `marlin`;
* the quantizer checkpoint format is `marlin_24`.
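
The selection rule in the list above can be sketched roughly as follows. This is only an illustration of the two conditions, not the code from this commit: the function name `select_marlin_kernel`, its parameters, and the returned labels are hypothetical and do not come from `linear.py`.

```python
from typing import Optional


def select_marlin_kernel(
    quantize: Optional[str], checkpoint_format: Optional[str]
) -> str:
    """Return a label for the kernel family to use.

    Hypothetical helper: names and return values are assumptions made
    for illustration, not the actual text-generation-inference API.
    """
    # Condition 1: the quantizer must be `marlin`.
    if quantize != "marlin":
        return "other"
    # Condition 2: the checkpoint format must be `marlin_24`
    # for the 2:4 sparse kernel to be chosen.
    if checkpoint_format == "marlin_24":
        return "marlin_24"
    # Marlin quantizer with any other checkpoint format:
    # fall back to the regular (dense) Marlin path.
    return "marlin"
```

With this sketch, `select_marlin_kernel("marlin", "marlin_24")` picks the 2:4 path, while a `marlin` quantizer with any other checkpoint format stays on the regular Marlin path.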

Fixes #2098.


linear.py 8.29 KB
