    Add support for loading GPTQ models on CPU (#26719) · 2963e196
    Vivek Khandelwal authored
    * Add support for loading GPTQ models on CPU
    
    Currently, GPTQ-quantized models can only be loaded on a CUDA
    device. The attribute `gptq_supports_cpu` checks whether the
    installed auto_gptq version includes CPU support.
    The larger model variants are hard to load, run, or trace on
    the GPU, which is the rationale for adding this attribute.
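
A minimal sketch of the kind of version gate described above, not the code from this commit. The threshold `0.4.2`, the helper `parse_version`, and the function `pick_device` are assumptions introduced only for illustration; the real check may compare against a different release or use a dedicated version-parsing library.

```python
import re

# Assumption for illustration: releases newer than this one support CPU.
ASSUMED_LAST_CUDA_ONLY_VERSION = "0.4.2"


def parse_version(text: str) -> tuple:
    """Turn a string such as '0.5.0' or '0.5.0.dev0+cu118' into a tuple of ints."""
    numbers = []
    # Drop any local build suffix after '+', then read leading digits per part.
    for part in text.split("+")[0].split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric part, e.g. 'dev0'
        numbers.append(int(match.group()))
    return tuple(numbers)


def gptq_supports_cpu(installed_version: str,
                      last_cuda_only: str = ASSUMED_LAST_CUDA_ONLY_VERSION) -> bool:
    """Return True when the installed version is newer than the assumed threshold."""
    return parse_version(installed_version) > parse_version(last_cuda_only)


def pick_device(installed_version: str, cuda_available: bool) -> str:
    """Hypothetical helper: prefer CUDA, fall back to CPU only when supported."""
    if cuda_available:
        return "cuda"
    if gptq_supports_cpu(installed_version):
        return "cpu"
    raise RuntimeError(
        "No CUDA device found and this auto_gptq version lacks CPU support."
    )
```

With these assumptions, `pick_device("0.5.0", cuda_available=False)` selects the CPU, while the same call with `"0.4.2"` raises an error instead of failing later during model loading.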
    
    Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
    
    * Update quantization.md
    
    * Update quantization.md
    
    * Update quantization.md
quantization.md 20.7 KB