[ROCM] Enable CompressedTensorsWNA16 (#27187)

Signed-off-by: JartX <sagformas@epdcenter.es>

[ROCM] Enable CompressedTensorsWNA16 (#27187)
Signed-off-by: JartX <sagformas@epdcenter.es>
ba09652d · JartX · GitHub · bd66b852 · ba09652d
Unverified Commit ba09652d authored Oct 21, 2025 by JartX Committed by GitHub Oct 21, 2025
Show whitespace changes
Inline Side-by-side

Showing with 4 additions and 1 deletion

vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py ...quantization/compressed_tensors/compressed_tensors_moe.py +4 -1

No files found.
--- a/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
+++ b/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
@@ -142,7 +142,10 @@ class CompressedTensorsMoEMethod(FusedMoEMethodBase):
            # group_size=None means channelwise
            group_size = weight_quant.group_size or -1
            # Prefer to use the MarlinMoE kernel when it is supported.
-            if not check_moe_marlin_supports_layer(layer, group_size):
+            if (
+                not check_moe_marlin_supports_layer(layer, group_size)
+                or current_platform.is_rocm()
+            ):
                if (
                    weight_quant.strategy == QuantizationStrategy.GROUP
                    and weight_quant.actorder