[Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel (#37331)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

Commit 56a62c31, authored Mar 20, 2026 by Matthias Gehre; committed via GitHub on Mar 20, 2026.

Changed file (7 additions, 0 deletions): vllm/model_executor/kernels/linear/mixed_precision/exllama.py (+7 -0)

--- a/vllm/model_executor/kernels/linear/mixed_precision/exllama.py
+++ b/vllm/model_executor/kernels/linear/mixed_precision/exllama.py
@@ -59,6 +59,13 @@ class ExllamaLinearKernel(MPLinearKernel):
                f"{cls.SUPPORTED_QUANT_TYPES}",
            )

+        if c.group_size <= 0:
+            return (
+                False,
+                f"Group size ({c.group_size}) must be positive, "
+                "Exllama does not support channelwise quantization",
+            )
+
        if c.full_weight_shape[0] % c.group_size != 0:
            return (
                False,
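Why `group_size <= 0` needs its own check, judging only from the lines in this hunk: with Python's `%`, any integer modulo -1 is 0, so the existing divisibility test alone would accept a channelwise config (assuming, as the commit title suggests, that channelwise is encoded as a non-positive group size such as -1). A group size of 0 would instead raise `ZeroDivisionError` rather than returning a rejection reason. The sketch below models only the two fields these checks read. `ToyLinearConfig`, `old_group_check`, and `new_group_check` are hypothetical names, not vLLM's API, and the divisibility failure message is a placeholder because the hunk is cut off before it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyLinearConfig:
    """Hypothetical stand-in for the kernel config (not vLLM's class)."""

    full_weight_shape: Tuple[int, int]  # (input_dim, output_dim)
    group_size: int  # assumption: a value <= 0 (e.g. -1) means channelwise


def old_group_check(c: ToyLinearConfig) -> Tuple[bool, Optional[str]]:
    # Pre-fix behaviour: only the divisibility test.
    # 4096 % -1 == 0, so channelwise slips through; % 0 raises.
    if c.full_weight_shape[0] % c.group_size != 0:
        return False, "input dim not divisible by group size (placeholder)"
    return True, None


def new_group_check(c: ToyLinearConfig) -> Tuple[bool, Optional[str]]:
    # Post-fix behaviour: reject non-positive group sizes first, which also
    # guarantees the modulo below never divides by zero.
    if c.group_size <= 0:
        return (
            False,
            f"Group size ({c.group_size}) must be positive, "
            "Exllama does not support channelwise quantization",
        )
    if c.full_weight_shape[0] % c.group_size != 0:
        return False, "input dim not divisible by group size (placeholder)"
    return True, None


if __name__ == "__main__":
    channelwise = ToyLinearConfig(full_weight_shape=(4096, 4096), group_size=-1)
    grouped = ToyLinearConfig(full_weight_shape=(4096, 4096), group_size=128)

    print(old_group_check(channelwise))  # (True, None): wrongly accepted
    print(new_group_check(channelwise))  # (False, 'Group size (-1) must be positive, ...')
    print(new_group_check(grouped))      # (True, None)
```

Ordering matters in this design: placing the positivity check before the modulo both rejects the unsupported layout with a readable reason and removes the division-by-zero path.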