"ml/vscode:/vscode.git/clone" did not exist on "1108d8b34e43e968812eded0ccda73503ccad77d"
  1. 20 Nov, 2024 1 commit
  2. 10 Nov, 2024 1 commit
    • 
      Add initial support for compressed-tensors checkpoints (#2732) · a7850008
      Daniël de Kok authored
      compressed-tensors is a safetensors extension for sparse, quantized
      tensors. The format is more powerful than the earlier AWQ/GPTQ/FP8
      quantization formats because:
      
      - Different quantizer configurations can be used for different targets.
      - The format can specify input/output quantizers in addition to weight
        quantizers.
      - Exclusions from quantization can be configured per module.
      
      This change adds a dependency on the `compressed-tensors` package for
      its configuration parsing and layer matching functionality.
      
      The following types of quantization are supported in this PR:
      
      - W8A16 and W4A16 INT using GPTQ-Marlin kernels.
      - W8A8 and W8A16 FP using FP8-Marlin and CUTLASS kernels.
      
      Support for other quantization types will be added in subsequent PRs.
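
The compressed-tensors format carries its quantization parameters in the model configuration. A minimal sketch of what such a configuration might look like, expressed as a Python dict (field names follow `compressed-tensors` conventions such as `config_groups`, `targets`, and `ignore`, but the concrete values and the `targeted` helper are illustrative, not taken from the PR):

```python
# Illustrative sketch of a compressed-tensors style quantization config,
# as it might appear under "quantization_config" in a model's config.json.
# Field names follow compressed-tensors conventions; values are made up.
quantization_config = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        # Different quantizer configurations can target different layers.
        "group_0": {
            "targets": ["Linear"],  # apply to all Linear layers...
            "weights": {            # ...with 4-bit symmetric integer weights
                "num_bits": 4,
                "type": "int",
                "symmetric": True,
                "strategy": "group",
                "group_size": 128,
            },
            # Input/output quantizers can be given in addition to weights.
            "input_activations": None,
        },
    },
    # Configurable exclusions: these modules stay unquantized.
    "ignore": ["lm_head"],
}


def targeted(module_type: str, cfg: dict) -> bool:
    """Toy layer matching: does any config group target this module type?"""
    return any(
        module_type in group["targets"]
        for group in cfg["config_groups"].values()
    )
```

In the real integration, this parsing and layer matching is delegated to the `compressed-tensors` package rather than hand-rolled as above.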
  3. 16 Oct, 2024 1 commit
    • 
      Fp8 e4m3_fnuz support for rocm (#2588) · 704a58c8
      Mohit Sharma authored
      * (feat) fp8 fnuz support for rocm
      
      * (review comments) Fix compression_config load, type hints
      
      * (bug) update all has_tensor
      
      * (review_comments) fix typo and added comments
      
      * (nit) improved comment
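
For context on the `fnuz` variant: ROCm hardware uses the FP8 E4M3FNUZ encoding, which shifts the exponent bias to 8 (instead of 7 for the E4M3FN encoding on CUDA hardware) and reserves only negative zero for NaN, so the two encodings have different dynamic ranges. A small illustrative calculation of the largest finite values, assuming the standard E4M3 encodings (not code from this PR):

```python
def fp8_e4m3_max(bias: int, max_mantissa: int) -> float:
    """Largest finite FP8 E4M3 value: (1 + m/8) * 2^(15 - bias)."""
    return (1 + max_mantissa / 8) * 2 ** (15 - bias)


# E4M3FN (CUDA): bias 7; the all-ones mantissa at the top exponent is
# reserved for NaN, so the largest finite value uses mantissa 110.
e4m3fn_max = fp8_e4m3_max(bias=7, max_mantissa=6)    # 448.0

# E4M3FNUZ (ROCm): bias 8; only negative zero encodes NaN, so the full
# mantissa 111 is usable at the top exponent.
e4m3fnuz_max = fp8_e4m3_max(bias=8, max_mantissa=7)  # 240.0
```

The smaller 240 ceiling is why FN-calibrated checkpoints need their scales adjusted when loaded as FNUZ on ROCm.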
  4. 08 Oct, 2024 1 commit
  5. 30 Sep, 2024 1 commit
  6. 31 Jul, 2024 1 commit
    • 
      Handle GPTQ-Marlin loading in `GPTQMarlinWeightLoader` (#2300) · 34f7dcfd
      Daniël de Kok authored
      The `GPTQWeightsLoader` was structured like this in pseudocode:
      
      if marlin:
        Set up tensors in a way that GPTQ-Marlin expects
      else:
        Set up tensors in a way that ExLlama/GPTQ/AWQ expect
      
      However, the GPTQ-Marlin implementation details should really live in
      the `marlin` module, so the former branch is moved out into a separate
      `GPTQMarlinWeightsLoader`.
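
The refactoring described above can be sketched as follows; the class and method bodies here are hypothetical stand-ins, not the actual text-generation-inference code:

```python
# Illustrative sketch of the refactoring: the marlin-specific branch moves
# out of the combined loader into its own loader class. Method bodies are
# toy placeholders for the real tensor setup.

class GPTQWeightsLoader:
    """After the change: handles only ExLlama/GPTQ/AWQ tensor layouts."""

    def load(self, weights):
        # Set up tensors in the way that ExLlama/GPTQ/AWQ expect.
        return {"layout": "exllama", "weights": weights}


class GPTQMarlinWeightsLoader:
    """Marlin-specific details now live alongside the marlin kernels."""

    def load(self, weights):
        # Set up tensors in the way that GPTQ-Marlin expects.
        return {"layout": "marlin", "weights": weights}


def get_loader(use_marlin: bool):
    """Loader selection replaces the old in-method `if marlin:` branch."""
    return GPTQMarlinWeightsLoader() if use_marlin else GPTQWeightsLoader()
```

Picking the loader once, up front, keeps each class focused on a single tensor layout instead of branching inside every load path.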
  7. 29 Jul, 2024 1 commit
  8. 26 Jul, 2024 1 commit
    • 
      feat: add ruff and resolve issue (#2262) · bab02ff2
      drbh authored
      * feat: add ruff and resolve issue
      
      * fix: update client exports and adjust after rebase
      
      * fix: adjust syntax to avoid circular import
      
      * fix: adjust client ruff settings
      
      * fix: lint and refactor import check and avoid model enum as global names
      
      * fix: improve fbgemm_gpu check and lints
      
      * fix: update lints
      
      * fix: prefer comparing model enum over str
      
      * fix: adjust lints and ignore specific rules
      
      * fix: avoid unneeded quantize check
  9. 24 Jul, 2024 1 commit