    Add support for repacking AWQ weights for GPTQ-Marlin (#2278) · 9935720c
    Daniël de Kok authored
    * Add support for repacking AWQ weights for GPTQ-Marlin
    
    So far we couldn't support AWQ because virtually all AWQ models use
    asymmetric quantization, which GPTQ-Marlin did not support. GPTQ-Marlin
    has recently added support for AWQ repacking and AWQ asymmetric
    quantization (zero_point=True).
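
For readers unfamiliar with the distinction, the following is a minimal sketch of symmetric versus asymmetric (zero-point) 4-bit quantization in plain Python. It is illustrative only: the function names are hypothetical, it quantizes a single group of floats, and it does not reflect the packed weight layout used by AWQ, GPTQ, or the Marlin kernels.

```python
def quantize_symmetric(values, bits=4):
    """Symmetric scheme: signed integers around zero, only a scale is stored."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard against all-zero input
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, bits=4):
    """Asymmetric scheme: unsigned integers plus a stored zero point."""
    qmax = 2 ** bits - 1  # 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0  # guard against constant input
    zero_point = round(-lo / scale)  # integer that represents the float 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to floats; zero_point=0 gives the symmetric case."""
    return [(x - zero_point) * scale for x in q]


# A skewed group of weights: the asymmetric scheme spends its 16 levels on
# the actual [min, max] range instead of a range centred on zero.
weights = [-0.1, 0.0, 0.3, 0.9]
q_sym, scale_sym = quantize_symmetric(weights)
q_asym, scale_asym, zp = quantize_asymmetric(weights)
restored = dequantize(q_asym, scale_asym, zp)
```

A kernel that only understands the symmetric form has no slot for the per-group zero point, which is why checkpoints quantized with zero_point=True need explicit kernel support.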
    
    This change updates all GPTQ-Marlin kernels from upstream and wires up
    AWQ support. For now, enabling AWQ with Marlin requires running TGI
    with `--quantize gptq`.
    
    * Enable Marlin for supported AWQ configurations by default
    
    This makes the AWQ -> GPTQ repack test redundant, since we are now
    testing this with the regular AWQ test.
quantization.py 6.52 KB