[Bugfix] Revert MoE Triton Config Default (#12629)

SUMMARY:
* The previous PR that pulled in block-wise configs (https://github.com/vllm-project/vllm/pull/11589/files) also changed the default Triton config for FP8.
* This broke MoE on L4, since the L4 does not have enough shared memory (SHM) for the new default configuration.
* This commit reverts the non-block FP8 case to the general default config.

Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
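The shared-memory failure on L4 can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not Triton's actual allocator. It assumes each pipeline stage buffers one FP8 activation tile (`BLOCK_SIZE_M` x `BLOCK_SIZE_K`) and one weight tile (`BLOCK_SIZE_K` x `BLOCK_SIZE_N`) at 1 byte per element. It also assumes per-block shared-memory limits of 99 KB for compute capability 8.9 (L4) and 227 KB for 9.0 (H100), as listed in the CUDA C++ Programming Guide. `estimate_smem_bytes`, `SMEM_LIMIT`, and `block_shape = [128, 128]` are illustrative names and values, not vLLM code; the two configs are copied from the diff.

```python
# Hypothetical, rough shared-memory estimate for a pipelined Triton matmul tile.
# Assumption: every stage holds one A tile (M x K) and one B tile (K x N).
# Triton's real allocation can differ; this is only an order-of-magnitude check.

def estimate_smem_bytes(config: dict, bytes_per_elem: int = 1) -> int:
    """Approximate shared-memory bytes per kernel instance (FP8 = 1 byte/elem)."""
    m = config["BLOCK_SIZE_M"]
    n = config["BLOCK_SIZE_N"]
    k = config["BLOCK_SIZE_K"]
    a_tile = m * k * bytes_per_elem  # activation tile
    b_tile = k * n * bytes_per_elem  # weight tile
    return config["num_stages"] * (a_tile + b_tile)


# FP8 non-block default removed by this commit (values from the diff).
OLD_FP8_DEFAULT = {
    "BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 256, "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 32, "num_warps": 8, "num_stages": 4,
}

# Block-wise FP8 config kept by this commit, assuming block_shape = [128, 128].
block_shape = [128, 128]
BLOCK_FP8_CONFIG = {
    "BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": block_shape[0],
    "BLOCK_SIZE_K": block_shape[1],
    "GROUP_SIZE_M": 32, "num_warps": 4, "num_stages": 3,
}

# Assumed max shared memory per thread block (opt-in), in bytes.
SMEM_LIMIT = {"sm_89 (L4)": 99 * 1024, "sm_90 (H100)": 227 * 1024}

for name, cfg in [("old fp8 default", OLD_FP8_DEFAULT),
                  ("block fp8", BLOCK_FP8_CONFIG)]:
    need = estimate_smem_bytes(cfg)
    fits = {arch: need <= limit for arch, limit in SMEM_LIMIT.items()}
    print(f"{name}: ~{need // 1024} KiB, fits: {fits}")
```

Under these assumptions the removed default needs about 192 KiB, which exceeds the assumed L4 limit but fits on H100, while the block-wise config needs about 72 KiB. Even if only `num_stages - 1` buffers were live at once, the removed default (about 144 KiB) would still exceed the L4 limit.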

Unverified commit 145c2ff6, authored Jan 31, 2025 by Robert Shaw; committed by GitHub on Jan 31, 2025.

Showing with 11 additions and 30 deletions
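The net effect on branch selection fits in a few lines. This is a hypothetical sketch, not vLLM's code: `takes_blockwise_branch` and `blockwise_fp8_config` are illustrative names. The branch condition, the config values, and the divisibility rule come from the diff; the general default branch is intentionally not reproduced because its full values are not shown here.

```python
from typing import Dict, List, Optional


def takes_blockwise_branch(dtype: Optional[str],
                           block_shape: Optional[List[int]]) -> bool:
    """Hypothetical mirror of the new condition: only block-quantized FP8."""
    return dtype == "fp8_w8a8" and block_shape is not None


def blockwise_fp8_config(block_shape: List[int]) -> Dict[str, int]:
    """Hypothetical copy of the block-wise FP8 config from the diff."""
    config = {
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": block_shape[0],
        "BLOCK_SIZE_K": block_shape[1],
        "GROUP_SIZE_M": 32,
        "num_warps": 4,
        "num_stages": 3,
    }
    # Rule stated in the diff's comment: tile sizes must be multiples of
    # the quantization block shape (trivially true when they are equal).
    assert config["BLOCK_SIZE_N"] % block_shape[0] == 0
    assert config["BLOCK_SIZE_K"] % block_shape[1] == 0
    return config


# Non-block FP8 no longer has a special case; it falls through to the
# general default branch.
print(takes_blockwise_branch("fp8_w8a8", None))          # False
print(takes_blockwise_branch("fp8_w8a8", [128, 128]))    # True
print(takes_blockwise_branch(None, [128, 128]))          # False
print(blockwise_fp8_config([128, 128])["BLOCK_SIZE_N"])  # 128
```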

vllm/model_executor/layers/fused_moe/fused_moe.py +11 -30

--- a/vllm/model_executor/layers/fused_moe/fused_moe.py
+++ b/vllm/model_executor/layers/fused_moe/fused_moe.py
@@ -660,36 +660,17 @@ def get_default_config(
     is_marlin: bool,
     block_shape: Optional[List[int]] = None,
 ) -> Dict[str, int]:
-    if dtype == "fp8_w8a8":
-        if block_shape is None:
-            config = {
-                "BLOCK_SIZE_M": 128,
-                "BLOCK_SIZE_N": 256,
-                "BLOCK_SIZE_K": 128,
-                "GROUP_SIZE_M": 32,
-                "num_warps": 8,
-                "num_stages": 4,
-            }
-            if M <= E:
-                config = {
-                    "BLOCK_SIZE_M": 64,
-                    "BLOCK_SIZE_N": 128,
-                    "BLOCK_SIZE_K": 128,
-                    "GROUP_SIZE_M": 1,
-                    "num_warps": 4,
-                    "num_stages": 4,
-                }
-        else:
-            # Block-wise quant: BLOCK_SIZE_N must be divisible by block_shape[0]
-            # BLOCK_SIZE_K must be divisible by block_shape[1]
-            config = {
-                "BLOCK_SIZE_M": 64,
-                "BLOCK_SIZE_N": block_shape[0],
-                "BLOCK_SIZE_K": block_shape[1],
-                "GROUP_SIZE_M": 32,
-                "num_warps": 4,
-                "num_stages": 3,
-            }
+    if dtype == "fp8_w8a8" and block_shape is not None:
+        # Block-wise quant: BLOCK_SIZE_N must be divisible by block_shape[0]
+        # BLOCK_SIZE_K must be divisible by block_shape[1]
+        config = {
+            "BLOCK_SIZE_M": 64,
+            "BLOCK_SIZE_N": block_shape[0],
+            "BLOCK_SIZE_K": block_shape[1],
+            "GROUP_SIZE_M": 32,
+            "num_warps": 4,
+            "num_stages": 3,
+        }
     else:
         config = {
             "BLOCK_SIZE_M": 64,