    Use GPTQ-Marlin for supported GPTQ configurations (#2111) · 2ce80194
    Daniël de Kok authored
    GPTQ-Marlin is currently the best-performing kernel for GPTQ models. So
    let's use it by default if the kernels are installed, the GPU supports
    it, and the kernels support the configuration.
    
    For models generated by `text-generation-server quantize`, use
    `sym=False`. This subcommand has used asymmetric quantization since
    the beginning, and incorrectly reporting such a model as symmetric
    would select GPTQ-Marlin (which does not support asymmetric
    quantization).
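
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `weights.py`: the names `GPTQConfig`, `can_use_gptq_marlin_sketch`, and `config_for_tgs_quantize` are hypothetical, and the supported bit widths, group sizes, and minimum compute capability are assumed values chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed values for illustration only; the real limits are defined by the kernels.
ASSUMED_MARLIN_BITS = (4, 8)
ASSUMED_MARLIN_GROUPSIZES = (-1, 32, 64, 128)
ASSUMED_MIN_COMPUTE_CAPABILITY = (8, 0)


@dataclass(frozen=True)
class GPTQConfig:
    """Hypothetical container for the quantization parameters of a checkpoint."""
    bits: int
    groupsize: int
    sym: bool


def can_use_gptq_marlin_sketch(
    config: GPTQConfig,
    kernels_installed: bool,
    compute_capability: Tuple[int, int],
) -> bool:
    """Return True only if all three conditions from the commit message hold."""
    # 1. The kernels are installed.
    if not kernels_installed:
        return False
    # 2. The GPU supports them (tuple comparison: major first, then minor).
    if compute_capability < ASSUMED_MIN_COMPUTE_CAPABILITY:
        return False
    # 3. The kernels support the configuration; asymmetric models are excluded.
    return (
        config.bits in ASSUMED_MARLIN_BITS
        and config.groupsize in ASSUMED_MARLIN_GROUPSIZES
        and config.sym
    )


def config_for_tgs_quantize(bits: int, groupsize: int) -> GPTQConfig:
    """Models from `text-generation-server quantize` are reported as sym=False."""
    return GPTQConfig(bits=bits, groupsize=groupsize, sym=False)
```

Under these assumptions, a symmetric 4-bit model with group size 128 on a capable GPU would take the GPTQ-Marlin path, while any model built through `config_for_tgs_quantize` would fall back to another GPTQ kernel because it is reported with `sym=False`.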
weights.py 32 KB