Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441)

Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Commit 7df02897, authored Nov 25, 2025 by Michael Goin; committed by GitHub Nov 25, 2025

Showing with 6 additions and 4 deletions

vllm/model_executor/layers/quantization/mxfp4.py +6 -4

--- a/vllm/model_executor/layers/quantization/mxfp4.py
+++ b/vllm/model_executor/layers/quantization/mxfp4.py
@@ -196,9 +196,10 @@ class Mxfp4Config(QuantizationConfig):
            # TODO: Add support for MXFP4 Linear Method.
            # MXFP4 LinearMethod is available in AMD-Quark, refer to that implementation
            # if you are interested in enabling MXFP4 here.
-            logger.warning_once(
+            logger.debug_once(
                "MXFP4 linear layer is not implemented - falling back to "
-                "UnquantizedLinearMethod."
+                "UnquantizedLinearMethod.",
+                scope="local",
            )
            return UnquantizedLinearMethod()
        elif isinstance(layer, FusedMoE):
@@ -208,9 +209,10 @@ class Mxfp4Config(QuantizationConfig):
                return Mxfp4MoEMethod(layer.moe_config)
        elif isinstance(layer, Attention):
            # TODO: Add support for MXFP4 Attention.
-            logger.warning_once(
+            logger.debug_once(
                "MXFP4 attention layer is not implemented. "
-                "Skipping quantization for this layer."
+                "Skipping quantization for this layer.",
+                scope="local",
            )
        return None
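
The effect of moving a once-only message from warning to debug level can be illustrated with a small standalone sketch. It uses only the Python standard library and is not vLLM's implementation: the helpers `debug_once`, `_log_once`, and `collect`, and the logger name `mxfp4_sketch`, are hypothetical names introduced here for illustration, and the `scope` argument used in the diff is not modeled at all.

```python
import logging
from functools import lru_cache
from typing import List

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("mxfp4_sketch")
logger.propagate = False  # keep the demo output out of the root logger


@lru_cache(maxsize=None)
def _log_once(level: int, msg: str) -> None:
    # lru_cache makes repeated calls with the same (level, msg) a no-op,
    # so each distinct message is passed to the logger at most once.
    logger.log(level, msg)


def debug_once(msg: str) -> None:
    """Hypothetical stand-in for a once-only debug helper."""
    _log_once(logging.DEBUG, msg)


class _ListHandler(logging.Handler):
    """Collects formatted messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def collect(logger_level: int) -> List[str]:
    """Emit the same debug message three times and return what was recorded."""
    handler = _ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logger_level)
    _log_once.cache_clear()  # start each run with an empty "seen" set
    try:
        for _ in range(3):
            debug_once(
                "MXFP4 linear layer is not implemented - falling back to "
                "UnquantizedLinearMethod."
            )
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

Under these assumptions, `collect(logging.INFO)` records nothing, because the message is below the logger's threshold, while `collect(logging.DEBUG)` records the message a single time despite the three calls.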