[Bugfix] Reject non-nvfp4 dtypes when using the flashinfer_nvlink_one_sided...

[Bugfix] Reject non-nvfp4 dtypes when using the flashinfer_nvlink_one_sided all2all backend (#39717)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit f72b2097 (unverified), authored Apr 13, 2026 by Tyler Michael Smith; committed by GitHub Apr 13, 2026.

Showing with 8 additions and 0 deletions

vllm/model_executor/layers/fused_moe/all2all_utils.py +8 -0

--- a/vllm/model_executor/layers/fused_moe/all2all_utils.py
+++ b/vllm/model_executor/layers/fused_moe/all2all_utils.py
@@ -225,6 +225,14 @@ def maybe_make_prepare_finalize(

    elif moe.use_fi_nvl_one_sided_kernels:
        assert quant_config is not None
+        if quant_config.quant_dtype != "nvfp4":
+            raise ValueError(
+                "The 'flashinfer_nvlink_one_sided' all2all backend only "
+                "supports nvfp4 activation quantization, but got "
+                f"quant_dtype={quant_config.quant_dtype!r}. Use a different "
+                "all2all backend (e.g. 'flashinfer_nvlink_two_sided' or "
+                "'allgather_reducescatter') for non-nvfp4 models."
+            )
        max_num_tokens = (
            get_current_vllm_config().scheduler_config.max_num_batched_tokens
        )
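The guard added in this hunk can be exercised outside vLLM with a minimal, self-contained sketch. `QuantConfigStub` and `check_nvl_one_sided_quant` are hypothetical names invented for illustration. They are not vLLM APIs: the stub models only the `quant_dtype` attribute the check reads, and the sketch raises `ValueError` for a missing config where the diff uses an `assert`. The error text is copied from the diff.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for vLLM's quant config: only the attribute the
# guard reads is modeled. It may hold a string or some other dtype object;
# the check simply compares it against the string "nvfp4".
@dataclass
class QuantConfigStub:
    quant_dtype: object


def check_nvl_one_sided_quant(quant_config: Optional[QuantConfigStub]) -> None:
    """Hypothetical helper mirroring the guard added in the diff."""
    # The diff asserts a quant config is present; this sketch raises instead
    # so the check still runs when Python is started with -O.
    if quant_config is None:
        raise ValueError("flashinfer_nvlink_one_sided requires a quant config")
    # Anything other than nvfp4 activation quantization is rejected up front.
    if quant_config.quant_dtype != "nvfp4":
        raise ValueError(
            "The 'flashinfer_nvlink_one_sided' all2all backend only "
            "supports nvfp4 activation quantization, but got "
            f"quant_dtype={quant_config.quant_dtype!r}. Use a different "
            "all2all backend (e.g. 'flashinfer_nvlink_two_sided' or "
            "'allgather_reducescatter') for non-nvfp4 models."
        )


if __name__ == "__main__":
    check_nvl_one_sided_quant(QuantConfigStub("nvfp4"))  # passes silently
    try:
        check_nvl_one_sided_quant(QuantConfigStub("fp8"))
    except ValueError as err:
        print(err)
```

Raising while the prepare/finalize object is being built presumably surfaces the misconfiguration at startup, with a message that names working alternatives, rather than leaving it to fail later inside the kernel path.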