    Handle GPTQ-Marlin loading in `GPTQMarlinWeightsLoader` (#2300) · 34f7dcfd
    Daniël de Kok authored
    The `GPTQWeightsLoader` was structured like this in pseudocode:
    
    if marlin:
      Set up tensors in a way that GPTQ-Marlin expects
    else:
      Set up tensors in a way that ExLlama/GPTQ/AWQ expect
    
    However, the GPTQ-Marlin implementation details should really be in the
    `marlin` module. So move the Marlin branch out into a separate
    `GPTQMarlinWeightsLoader`.
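
The split described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `QuantConfig`, `get_loader`, the `get_weights` signature, and the returned dictionaries are hypothetical stand-ins, and real loaders would set up tensors rather than return labels. Only the two loader class names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class QuantConfig:
    """Hypothetical config object; not part of the real code base."""
    bits: int
    use_marlin: bool


class GPTQWeightsLoader:
    """After the split: only sets up tensors the way ExLlama/GPTQ/AWQ expect."""

    def __init__(self, config: QuantConfig) -> None:
        self.config = config

    def get_weights(self, prefix: str) -> dict:
        # Placeholder for the ExLlama/GPTQ/AWQ tensor setup.
        return {"prefix": prefix, "layout": "gptq", "bits": self.config.bits}


class GPTQMarlinWeightsLoader:
    """Marlin-specific setup, living next to the rest of the Marlin code."""

    def __init__(self, config: QuantConfig) -> None:
        self.config = config

    def get_weights(self, prefix: str) -> dict:
        # Placeholder for the GPTQ-Marlin tensor setup.
        return {"prefix": prefix, "layout": "gptq-marlin", "bits": self.config.bits}


def get_loader(config: QuantConfig):
    """Pick a loader once, instead of branching on `marlin` inside one loader."""
    if config.use_marlin:
        return GPTQMarlinWeightsLoader(config)
    return GPTQWeightsLoader(config)
```

With this shape, the `if marlin: ... else: ...` branch from the pseudocode is decided once when the loader is chosen, and each loader keeps only its own layout logic.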
quantization.py 7.29 KB