Commit c17610e2 authored by Michael Goin, committed by GitHub

[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)


Signed-off-by: mgoin <mgoin64@gmail.com>
parent 71df2a57
...
@@ -132,12 +132,15 @@ def get_mxfp4_backend(with_lora_support: bool) -> Mxfp4Backend:
     )
     # If FlashInfer is not available, try either Marlin or Triton
-    if (
-        envs.VLLM_MXFP4_USE_MARLIN
-        or current_platform.get_device_capability()[0] < 9
-        or not has_triton_kernels()
-        or not is_torch_equal_or_newer("2.8.0")
-    ):
+    triton_kernels_supported = (
+        has_triton_kernels()
+        and is_torch_equal_or_newer("2.8.0")
+        # NOTE: triton_kernels are only confirmed to work on SM90 and SM100
+        # SM110 fails with this error: https://github.com/vllm-project/vllm/issues/29317
+        # SM120 needs this fix: https://github.com/triton-lang/triton/pull/8498
+        and (9, 0) <= current_platform.get_device_capability() < (11, 0)
+    )
+    if envs.VLLM_MXFP4_USE_MARLIN or not triton_kernels_supported:
         logger.info_once("Using Marlin backend")
         return Mxfp4Backend.MARLIN
     else:
...
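
The behavioral change in this hunk comes down to the capability test: the old condition only rejected devices with a major version below 9, while the new one accepts a bounded range. The sketch below illustrates the difference in isolation. It assumes the device capability compares like a plain `(major, minor)` tuple; `DeviceCapability`, `old_capability_ok`, and `new_capability_ok` are hypothetical names introduced for illustration and are not part of vLLM.

```python
from typing import NamedTuple


class DeviceCapability(NamedTuple):
    """Hypothetical stand-in for a (major, minor) compute capability."""
    major: int
    minor: int


def old_capability_ok(cap: DeviceCapability) -> bool:
    # Pre-change logic: anything with major version >= 9 was accepted.
    return cap[0] >= 9


def new_capability_ok(cap: DeviceCapability) -> bool:
    # Post-change logic: lexicographic tuple comparison keeps only
    # capabilities from 9.0 up to, but not including, 11.0.
    return (9, 0) <= cap < (11, 0)


if __name__ == "__main__":
    for cap in [
        DeviceCapability(8, 0),
        DeviceCapability(9, 0),
        DeviceCapability(10, 0),
        DeviceCapability(11, 0),
        DeviceCapability(12, 0),
    ]:
        print(cap, "old:", old_capability_ok(cap), "new:", new_capability_ok(cap))
```

Under these assumptions the two checks agree for capabilities 8.0, 9.0, and 10.0 and differ only for 11.0 and 12.0, which the old check accepted and the new one rejects, so those devices now fall back to Marlin.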