1. 19 Jul, 2024 5 commits
    • Daniël de Kok's avatar
    • Daniël de Kok's avatar
      3b41e93a
    • Daniël de Kok's avatar
      18db78f2
    • Daniël de Kok's avatar
    • Daniël de Kok's avatar
      Improve the handling of quantized weights (#2250) · ba291dad
      Daniël de Kok authored
      * Improve the handling of quantized weights
      
      Handling of quantized weights was split between two mechanisms:
      
      - For quantized checkpoints, we used the new weight loader
        infrastructure.
      - For quantization while loading (EETQ, FP8, bitsandbytes) we
        instead relied on conditional in `get_linear`.
      
      Weight loaders support context managers to selectively load
      particular layers with different weight loaders, which is useful
      for models like Idefics2 AWQ, which uses a quantized text model,
      but unquantized vision and connector models. However, the context
      manager would be overrided by `get_linear`, which string-checks
      `quantizer`. Also, the context manager would not work with
      EETQ, FP8, and bitsandbytes.
      
      This change migrates all quantizers to the weight loader infrastructure.
      This has several benefits:
      
      - We can use context managers with all quantizers.
      - All the implementation details move down to the quantizer layers,
        `get_linear` does not need to know how to handle quantizer linear
        layers.
      - All quantizer weights are strongly typed, we don't pass around
        raw tensors.
      - We don't have to pass around the `quantizer` string everywhere.
      
      * Exclude non-MLP layers when using FP8 quantization with Llama
      ba291dad
  2. 18 Jul, 2024 1 commit
  3. 16 Jul, 2024 3 commits
  4. 15 Jul, 2024 2 commits
  5. 12 Jul, 2024 2 commits
  6. 11 Jul, 2024 1 commit
  7. 09 Jul, 2024 1 commit
    • Daniël de Kok's avatar
      Move quantized weight handling out of the `Weights` class (#2194) · 8511669c
      Daniël de Kok authored
      Quantized weights were loaded in the `Weights` class, but this was
      getting quite unwieldy, where every higher level method to load weights
      was a long conditional to cover all the different quantizers.
      
      This change moves loading of quantized weights out of the `Weights`
      class. This is done by defining a simple `WeightsLoader` interface
      that is implemented by `Exl2WeightsLoader`, `GPTQWeightsLoader`,
      and `MarlinWeightsLoader`. These implementations are in the quantizers'
      respective modules. The `Weights` class provides the low-level load
      operations (such as loading tensors or sharded tensors), but delegates
      loads that need quantizer-specific weight processing to a loader. The
      loaders still use the low-level functionality provided by `Weights`.
      
      I initially tried making a hierarchy where a class like `GPTQWeights`
      would inherit from `Weights`. But it is not very flexible (e.g. does
      not work well with the new weight storage mock used in tests) and
      the implicit indirections made the code harder to follow.
      8511669c
  8. 08 Jul, 2024 5 commits
  9. 05 Jul, 2024 5 commits
  10. 02 Jul, 2024 5 commits
  11. 01 Jul, 2024 6 commits
  12. 27 Jun, 2024 2 commits
  13. 25 Jun, 2024 2 commits
    • Daniël de Kok's avatar
      Add support for Marlin 2:4 sparsity (#2102) · f1f98e36
      Daniël de Kok authored
      This change adds support for 2:4 sparsity when using Marlin
      quantization. The 2:4 kernel is used when:
      
      * The quantizer is `marlin`;
      * the quantizer checkpoint format is `marlin_24`.
      
      Fixes #2098.
      f1f98e36
    • Daniël de Kok's avatar
      Support AWQ quantization with bias (#2117) · 14980df2
      Daniël de Kok authored
      When the AWQ quantizer was used with a layer that uses a bias,
      the bias tensor was not correctly passed/used. Instead, the
      value `true`/`1.0` was added to the linear transformation.
      
      Correctly pass through the bias when it is not `None`.
      
      Fixes #2106.
      14980df2