    Move quantized weight handling out of the `Weights` class (#2194) · 8511669c
    Daniël de Kok authored
    Quantized weights were loaded in the `Weights` class, but this was
    getting unwieldy: every higher-level method for loading weights had
    become a long conditional covering all the different quantizers.
    
    This change moves loading of quantized weights out of the `Weights`
    class. This is done by defining a simple `WeightsLoader` interface
    that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`,
    and `MarlinWeightsLoader`. These implementations are in the quantizers'
    respective modules. The `Weights` class provides the low-level load
    operations (such as loading tensors or sharded tensors), but delegates
    loads that need quantizer-specific weight processing to a loader. The
    loaders still use the low-level functionality provided by `Weights`.
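
A minimal sketch of the delegation described above, to make the split between
low-level loads and quantizer-specific processing concrete. Only the names
`Weights` and `WeightsLoader` come from this commit message; the method names
(`get_tensor`, `get_weights`), the list-based `Tensor` stand-in, and the toy
quantized loader are assumptions for illustration, not the actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Stand-in for a real tensor type (assumption: keeps the sketch dependency-free).
Tensor = List[float]


class WeightsLoader(ABC):
    """Interface for quantizer-specific weight processing (hypothetical signature)."""

    @abstractmethod
    def get_weights(self, weights: "Weights", prefix: str) -> Tensor:
        ...


class DefaultWeightsLoader(WeightsLoader):
    """Unquantized case: return the stored tensor unchanged."""

    def get_weights(self, weights: "Weights", prefix: str) -> Tensor:
        return weights.get_tensor(f"{prefix}.weight")


class ToyQuantizedWeightsLoader(WeightsLoader):
    """Hypothetical quantizer: integer values plus a single scale factor."""

    def get_weights(self, weights: "Weights", prefix: str) -> Tensor:
        # The loader still relies on the low-level loads provided by Weights.
        qweight = weights.get_tensor(f"{prefix}.qweight")
        scale = weights.get_tensor(f"{prefix}.scale")[0]
        return [q * scale for q in qweight]


class Weights:
    """Provides low-level loads and delegates quantizer-specific work to a loader."""

    def __init__(self, storage: Dict[str, Tensor], loader: WeightsLoader) -> None:
        self._storage = storage
        self._loader = loader

    def get_tensor(self, name: str) -> Tensor:
        # Low-level operation: fetch a raw tensor by name.
        return self._storage[name]

    def get_weights(self, prefix: str) -> Tensor:
        # Higher-level operation: no conditional over quantizers, just delegation.
        return self._loader.get_weights(self, prefix)


storage = {"layer.qweight": [1, 2, 3], "layer.scale": [0.5]}
dequantized = Weights(storage, ToyQuantizedWeightsLoader()).get_weights("layer")
```

Because the loader is passed in rather than inherited, the same `Weights`
object can be backed by a plain dictionary here, which is roughly the kind of
substitution a weight storage mock in tests would need.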
    
    I initially tried a hierarchy in which a class like `GPTQWeights`
    inherits from `Weights`. But that was not very flexible (e.g. it does
    not work well with the new weight storage mock used in tests), and
    the implicit indirections made the code harder to follow.
quantization.py 4.04 KB