Commit 9a5e9652 (unverified), authored by Xin Yang, committed by GitHub

[LoRA] Set default MXFP4 LoRA backend to Marlin (#30598)


Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
parent 326e7c31
@@ -95,12 +95,12 @@ def get_mxfp4_backend_with_lora() -> Mxfp4Backend:
         # SM120 needs this fix: https://github.com/triton-lang/triton/pull/8498
         and (9, 0) <= current_platform.get_device_capability() < (11, 0)
     )
-    if envs.VLLM_MXFP4_USE_MARLIN or not triton_kernels_supported:
-        logger.info_once("[get_mxfp4_backend_with_lora] Using Marlin backend")
-        return Mxfp4Backend.MARLIN
-    logger.info_once("[get_mxfp4_backend_with_lora] Using Triton backend")
-    return Mxfp4Backend.TRITON
+    if envs.VLLM_MXFP4_USE_MARLIN is False and triton_kernels_supported:
+        logger.info_once("[get_mxfp4_backend_with_lora] Using Triton backend")
+        return Mxfp4Backend.TRITON
+    logger.info_once("[get_mxfp4_backend_with_lora] Using Marlin backend")
+    return Mxfp4Backend.MARLIN
 def get_mxfp4_backend(with_lora_support: bool) -> Mxfp4Backend:
......
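To make the behavioral change in the hunk above easier to see, here is a minimal standalone sketch of the two conditions. It does not import vLLM. `Backend`, `select_before`, and `select_after` are hypothetical names introduced only for illustration. It assumes `VLLM_MXFP4_USE_MARLIN` can be unset (modeled as `None`), which the `is False` comparison in the new code suggests but the diff does not confirm.

```python
from enum import Enum
from itertools import product
from typing import Optional


class Backend(Enum):
    # Hypothetical stand-in for the Mxfp4Backend members touched by the diff.
    MARLIN = "marlin"
    TRITON = "triton"


def select_before(use_marlin: Optional[bool], triton_supported: bool) -> Backend:
    # Old condition: Marlin only when requested (truthy) or when Triton is unusable.
    if use_marlin or not triton_supported:
        return Backend.MARLIN
    return Backend.TRITON


def select_after(use_marlin: Optional[bool], triton_supported: bool) -> Backend:
    # New condition: Triton only when Marlin is explicitly disabled
    # (False, not merely unset) and the Triton kernels are usable.
    if use_marlin is False and triton_supported:
        return Backend.TRITON
    return Backend.MARLIN


if __name__ == "__main__":
    # Print the selection for every combination of the two inputs.
    for use_marlin, triton_supported in product((None, True, False), (True, False)):
        before = select_before(use_marlin, triton_supported)
        after = select_after(use_marlin, triton_supported)
        changed = "changed" if before is not after else "same"
        print(f"use_marlin={use_marlin!s:5} triton_supported={triton_supported!s:5} "
              f"before={before.name:6} after={after.name:6} {changed}")
```

Under these assumptions, the only combination whose result differs is an unset flag with Triton kernels supported: it previously selected Triton and now selects Marlin, which matches the commit title.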