server/text_generation_server/utils/weights.py · dbb23fbfa868ad8f961c60896e346fad3d2ab440 · OpenDAS / text-generation-inference

Use symmetric quantization in the `quantize` subcommand (#2120) · dbb23fbf

Daniël de Kok authored Jul 12, 2024

Packing of asymmetric quantization is broken: all (q)zeros values
of `0` get reset to `1`, resulting in a loss of accuracy. Use
symmetric quantization instead. To distinguish models quantized
symmetrically from those quantized asymmetrically, a new config
tensor `gptq_sym` is added. If this tensor is not present, we assume
`sym=False`.
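
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from `weights.py`: the helper name `read_gptq_sym` and the plain mapping of tensor names to values are assumptions made for the example; only the tensor name `gptq_sym` and the `sym=False` default come from the commit message.

```python
from typing import Any, Mapping


def read_gptq_sym(tensors: Mapping[str, Any]) -> bool:
    """Return the symmetric-quantization flag for a checkpoint.

    `tensors` is a hypothetical name -> value mapping standing in for
    however the checkpoint's tensors are actually looked up.
    """
    value = tensors.get("gptq_sym")
    if value is None:
        # Checkpoints without the tensor are treated as sym=False,
        # as stated in the commit message.
        return False
    # Tensor-like values (for example a 0-dim tensor) expose .item();
    # plain Python ints or bools are used as-is.
    if hasattr(value, "item"):
        value = value.item()
    return bool(value)
```

A checkpoint that lacks the tensor yields `False`, while one that stores a truthy `gptq_sym` value yields `True`.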
