1. 23 Jul, 2024 1 commit
    • Add support for repacking AWQ weights for GPTQ-Marlin (#2278) · 9935720c
      Daniël de Kok authored
      * Add support for repacking AWQ weights for GPTQ-Marlin
      
      So far we couldn't support AWQ because virtually all AWQ models use
      asymmetric quantization, which GPTQ-Marlin did not support. GPTQ-Marlin
      has recently added support for AWQ repacking and AWQ asymmetric
      quantization (zero_point=True).
      
      This change updates all GPTQ-Marlin kernels from upstream and wires up
      AWQ support. For now enabling AWQ using Marlin requires running TGI with
      `--quantize gptq`.
      
      * Enable Marlin for supported AWQ configurations by default
      
      This makes the AWQ -> GPTQ repack test redundant, since we are now
      testing this with the regular AWQ test.
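For readers unfamiliar with the `zero_point=True` setting mentioned in the commit message, here is a minimal sketch of the difference between asymmetric (zero-point) and symmetric 4-bit quantization. It is illustrative only and is not the Marlin kernel or the AWQ packing format. All function names are hypothetical, the sketch quantizes one small group of floats in pure Python, and the fixed midpoint of 8 in the symmetric variant is an assumption chosen for illustration.

```python
def quantize_asymmetric(values, bits=4):
    """Quantize one group with a stored zero point (cf. zero_point=True)."""
    qmax = 2**bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant group
    zero_point = round(-lo / scale)  # integer code that represents 0.0
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]


def quantize_symmetric(values, bits=4):
    """Quantize one group around a fixed midpoint; no zero point is stored."""
    qmax = 2**bits - 1
    mid = 2 ** (bits - 1)  # assumed fixed midpoint (8 for 4 bits)
    absmax = max(abs(v) for v in values)
    scale = absmax / (mid - 1) or 1.0
    q = [min(qmax, max(0, round(v / scale) + mid)) for v in values]
    return q, scale


def dequantize_symmetric(q, scale, bits=4):
    mid = 2 ** (bits - 1)
    return [(x - mid) * scale for x in q]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# A group whose range is skewed towards positive values.
group = [-0.2, 0.0, 0.1, 0.9]

q_a, scale_a, zp = quantize_asymmetric(group)
q_s, scale_s = quantize_symmetric(group)

err_asym = max_abs_error(group, dequantize_asymmetric(q_a, scale_a, zp))
err_sym = max_abs_error(group, dequantize_symmetric(q_s, scale_s))
print(f"asymmetric error: {err_asym:.4f}, symmetric error: {err_sym:.4f}")
```

On a skewed group like this one, the asymmetric variant spends all 16 codes on the observed range, which is why a kernel that only handles a fixed midpoint cannot load such weights without extra zero-point support.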