[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break...

[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid breaking non-CUDA-alike devices (#20822)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
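The fix moves `from vllm.model_executor.layers.fused_moe import fused_experts` from module scope into a `BitsAndBytesMoEMethod` method body, so importing `bitsandbytes.py` no longer triggers that import on non-CUDA-alike devices. The sketch below shows the general pattern in plain Python, not vLLM code: the backend module name `hypothetical_cuda_kernels` and the `MoEMethod` class are illustrative assumptions. With a top-level import the whole module fails to load when the backend is missing; with a function-local import the module loads and only the call that needs the backend can fail.

```python
import sys
import types

# Hypothetical module name standing in for a CUDA-only kernel package.
BACKEND = "hypothetical_cuda_kernels"

EAGER_SRC = f"""
from {BACKEND} import fused_experts  # top-level import, as before the fix

class MoEMethod:
    def apply(self, x):
        return fused_experts(x)
"""

LAZY_SRC = f"""
class MoEMethod:
    def apply(self, x):
        # Function-local import, as after the fix: resolved only when apply() runs.
        from {BACKEND} import fused_experts
        return fused_experts(x)
"""


def load(source: str, name: str = "quant_method"):
    """Execute `source` as a fresh module; return None if importing it fails."""
    module = types.ModuleType(name)
    try:
        exec(source, module.__dict__)
    except ImportError:
        return None
    return module


# Without the backend available (e.g. a CPU or XPU build):
eager = load(EAGER_SRC)  # None: the whole module fails to import
lazy = load(LAZY_SRC)    # loads fine; MoEMethod can be defined and constructed
method = lazy.MoEMethod()

# Only calling apply() without the backend raises.
try:
    method.apply(3)
    lazy_call_failed = False
except ImportError:
    lazy_call_failed = True

# Register a fake backend to mimic a CUDA-alike build where the import succeeds.
fake = types.ModuleType(BACKEND)
fake.fused_experts = lambda x: x * 2
sys.modules[BACKEND] = fake
result = method.apply(3)  # 6: the deferred import now finds the fake backend
```

The trade-off is that a missing dependency surfaces at call time instead of import time; after the first successful import, later calls resolve the name from the `sys.modules` cache, so the per-call overhead is small.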

Unverified commit b1235c3e, authored by Li, Jiang on Jul 12, 2025; committed by GitHub on Jul 11, 2025.

Showing with 1 addition and 1 deletion

vllm/model_executor/layers/quantization/bitsandbytes.py +1 -1

--- a/vllm/model_executor/layers/quantization/bitsandbytes.py
+++ b/vllm/model_executor/layers/quantization/bitsandbytes.py
@@ -5,7 +5,6 @@ from typing import Any, Callable, Optional, Union
 import torch
-from vllm.model_executor.layers.fused_moe import fused_experts
 from vllm.model_executor.layers.fused_moe.layer import (FusedMoE,
                                                        FusedMoEMethodBase)
 from vllm.model_executor.layers.linear import (LinearBase, LinearMethodBase,
@@ -467,6 +466,7 @@ class BitsAndBytesMoEMethod(FusedMoEMethodBase):
        logical_to_physical_map: Optional[torch.Tensor] = None,
        logical_replica_count: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
+        from vllm.model_executor.layers.fused_moe import fused_experts
        if enable_eplb:
            raise NotImplementedError(