- 20 Nov, 2024 1 commit
Daniël de Kok authored
This change adds support for wNa16 int checkpoints with 2:4 sparsity using Marlin 2:4 kernels.
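As a brief illustration, 2:4 (semi-structured) sparsity means that every contiguous group of four weights contains at most two nonzero values. A minimal check for this pattern might look like the following hypothetical helper (not code from this change):

```python
def is_2_4_sparse(row):
    """Return True if every contiguous group of 4 values in `row`
    contains at most 2 nonzeros (the 2:4 semi-structured pattern)."""
    return all(
        sum(1 for v in row[i:i + 4] if v != 0) <= 2
        for i in range(0, len(row), 4)
    )
```

Kernels such as Marlin 2:4 exploit this structure by storing only the nonzero values plus small per-group metadata, which can roughly halve the dense weight footprint and memory bandwidth.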
- 10 Nov, 2024 1 commit
Daniël de Kok authored
compressed-tensors is a safetensors extension for sparse, quantized tensors. The format is more powerful than earlier AWQ/GPTQ/FP8 quantization, because:

- Different quantizer configurations can be used for different targets.
- The format can specify input/output quantizers in addition to weight quantizers.
- Exclusions from quantization are configurable.

This change adds a dependency on the `compressed-tensors` package for its configuration parsing and layer matching functionality.

The following types of quantization are supported in this PR:

- W8A16 and W4A16 INT using GPTQ-Marlin kernels.
- W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels.

Support for other quantization types will be added in subsequent PRs.
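To make the per-target configuration concrete, here is a small hypothetical sketch of such a config and a layer matcher. The field names (`config_groups`, `targets`, `ignore`) and the matching logic are simplified assumptions for illustration, not the `compressed-tensors` package's actual API:

```python
import re
from typing import Optional

# Illustrative config: one 4-bit int scheme for attention projections,
# one 8-bit float scheme (with input quantization) for all other Linears,
# and the LM head excluded from quantization entirely.
EXAMPLE_CONFIG = {
    "config_groups": {
        "group_0": {
            "targets": ["re:.*q_proj", "re:.*k_proj"],
            "weights": {"num_bits": 4, "type": "int", "symmetric": True},
            "input_activations": None,
        },
        "group_1": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "float"},
            "input_activations": {"num_bits": 8, "type": "float"},
        },
    },
    "ignore": ["lm_head"],
}


def match_target(layer_name: str, layer_type: str, target: str) -> bool:
    """A target is either a regex on the layer name ("re:" prefix),
    a module class name such as "Linear", or an exact layer name."""
    if target.startswith("re:"):
        return re.search(target[3:], layer_name) is not None
    return layer_type == target or layer_name == target


def select_scheme(layer_name: str, layer_type: str, config: dict) -> Optional[dict]:
    """Return the quantization scheme for a layer, or None if the layer
    is ignored or unmatched. Regex targets win over class-name targets."""
    if any(match_target(layer_name, layer_type, t) for t in config["ignore"]):
        return None
    best = None
    for group in config["config_groups"].values():
        for target in group["targets"]:
            if match_target(layer_name, layer_type, target):
                if target.startswith("re:") or best is None:
                    best = group
    return best
```

Under this sketch, `q_proj` layers would get the 4-bit int scheme, other `Linear` layers the 8-bit float scheme, and `lm_head` would be skipped.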
- 16 Oct, 2024 1 commit
Mohit Sharma authored
* (feat) fp8 fnuz support for rocm
* (review comments) Fix compression_config load, type hints
* (bug) update all has_tensor
* (review_comments) fix typo and added comments
* (nit) improved comment
- 08 Oct, 2024 1 commit
Daniël de Kok authored
* Add support for fused MoE Marlin for AWQ. This uses the updated MoE Marlin kernels from vLLM.
* Add integration test for AWQ MoE.
- 30 Sep, 2024 1 commit
Daniël de Kok authored
This change uses the updated Marlin MoE kernel from vLLM to support MoE with activation sorting and groups.
- 31 Jul, 2024 1 commit
Daniël de Kok authored
The `GPTQWeightsLoader` was structured like this in pseudocode:

    if marlin:
        set up tensors in a way that GPTQ-Marlin expects
    else:
        set up tensors in a way that ExLlama/GPTQ/AWQ expect

However, the GPTQ-Marlin implementation details should really live in the `marlin` module. So move the former branch out to a separate `GPTQMarlinWeightsLoader`.
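The refactor described above can be sketched as two loader classes selected once up front, instead of an `if marlin:` branch inside every tensor setup path. The method names and string payloads below are simplified placeholders, not the project's real signatures:

```python
class GPTQWeightsLoader:
    """Sets up tensors the way ExLlama/GPTQ/AWQ kernels expect."""

    def get_weights(self, name: str) -> str:
        # In the real loader this would read qweight/qzeros/scales/g_idx
        # in the GPTQ layout; a string stands in for the tensors here.
        return f"gptq-layout({name})"


class GPTQMarlinWeightsLoader:
    """Sets up tensors the way the GPTQ-Marlin kernels expect.

    Previously an `if marlin:` branch inside the generic GPTQ loader;
    the Marlin-specific details now live alongside the marlin module.
    """

    def get_weights(self, name: str) -> str:
        return f"marlin-layout({name})"


def weights_loader(use_marlin: bool):
    # The caller picks a loader once; downstream code no longer branches.
    return GPTQMarlinWeightsLoader() if use_marlin else GPTQWeightsLoader()
```

The design benefit is that each kernel family's layout logic is isolated in one class, so adding a new layout means adding a loader rather than threading another condition through shared code.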
- 29 Jul, 2024 1 commit
Daniël de Kok authored
- 26 Jul, 2024 1 commit
drbh authored
* feat: add ruff and resolve issue
* fix: update client exports and adjust after rebase
* fix: adjust syntax to avoid circular import
* fix: adjust client ruff settings
* fix: lint and refactor import check and avoid model enum as global names
* fix: improve fbgemm_gpu check and lints
* fix: update lints
* fix: prefer comparing model enum over str
* fix: adjust lints and ignore specific rules
* fix: avoid unneeded quantize check
- 24 Jul, 2024 1 commit
Daniël de Kok authored
The `marlin.py` file was getting large; split it up.