Refactor 2 awq gemm kernels into m16nXk32 (#2723)

Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>

Commit 56383649 (unverified) · authored Feb 12, 2024 by Rex · committed by GitHub Feb 12, 2024
Showing 2 changed files with 73 additions and 295 deletions

csrc/quantization/awq/gemm_kernels.cu +72 -294

vllm/model_executor/layers/quantization/awq.py +1 -1

--- a/csrc/quantization/awq/gemm_kernels.cu
+++ b/csrc/quantization/awq/gemm_kernels.cu
--- a/vllm/model_executor/layers/quantization/awq.py
+++ b/vllm/model_executor/layers/quantization/awq.py
@@ -145,8 +145,8 @@ class AWQLinearMethod(LinearMethodBase):
                      x: torch.Tensor,
                      bias: Optional[torch.Tensor] = None) -> torch.Tensor:
        qweight = weights["qweight"]
-        qzeros = weights["qzeros"]
        scales = weights["scales"]
+        qzeros = weights["qzeros"]
        pack_factor = self.quant_config.pack_factor
        out_shape = (x.shape[:-1] + (qweight.shape[-1] * pack_factor, ))
        reshaped_x = x.reshape(-1, x.shape[-1])
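
For readers unfamiliar with the shape arithmetic in the hunk above, here is a minimal sketch of what `out_shape` and `reshaped_x` work out to. It uses plain tuples instead of tensors so it runs without torch. The helper names (`pack_factor_for`, `awq_out_shape`, `flattened_shape`) are hypothetical and not part of vLLM, and the relation `pack_factor = 32 // weight_bits` is an assumption about how the quant config derives it for 4-bit weights packed into 32-bit words.

```python
from math import prod


def pack_factor_for(weight_bits: int) -> int:
    # Assumption: quantized weights are packed into 32-bit integers,
    # so a 4-bit scheme stores 8 weights per packed element.
    return 32 // weight_bits


def awq_out_shape(x_shape, qweight_shape, pack_factor):
    # Mirrors: x.shape[:-1] + (qweight.shape[-1] * pack_factor, )
    # The last dim of qweight is packed, so unpacking multiplies it back out.
    return tuple(x_shape[:-1]) + (qweight_shape[-1] * pack_factor,)


def flattened_shape(x_shape):
    # Mirrors: x.reshape(-1, x.shape[-1])
    # All leading dims collapse into a single row dimension for the GEMM.
    return (prod(x_shape[:-1]), x_shape[-1])


if __name__ == "__main__":
    pf = pack_factor_for(4)
    x_shape = (2, 5, 4096)        # (batch, seq, hidden), illustrative sizes
    qweight_shape = (4096, 512)   # (in_features, out_features // pack_factor)
    print(pf)
    print(awq_out_shape(x_shape, qweight_shape, pf))
    print(flattened_shape(x_shape))
```

The sizes here are illustrative only. The point is that the GEMM runs on a 2-D view of the input, and the result is reshaped back to the leading dimensions of `x` with the unpacked output width.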