[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization (#30310)

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>

Commit fdc135d7 (unverified), authored Dec 13, 2025 by Tsukasa OI and committed via GitHub on Dec 13, 2025.

1 changed file with 6 additions and 2 deletions:

vllm/model_executor/layers/fused_moe/layer.py (+6 -2)

--- a/vllm/model_executor/layers/fused_moe/layer.py
+++ b/vllm/model_executor/layers/fused_moe/layer.py
@@ -1200,10 +1200,14 @@ class FusedMoE(CustomOp):
        if full_load:
            shard_dim += 1
-        # Materialize GGUF UninitializedParameter
+        # Materialize GGUF UninitializedParameter accounting merged weights
        if is_gguf_weight and isinstance(param, UninitializedParameter):
+            # To materialize a tensor, we must have full shape including
+            # number of experts, making this portion to require `full_load`.
+            assert full_load
            final_shape = list(loaded_weight.shape)
-            if shard_id in ["w1", "w3"]:
+            # w1 and w3 are merged per expert.
+            if shard_id in {"w1", "w3"}:
                final_shape[1] *= 2
            final_shape[shard_dim] = final_shape[shard_dim] // self.tp_size
            param.materialize(final_shape, dtype=loaded_weight.dtype)
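The shape arithmetic in the hunk above can be restated as a small pure-Python function, which makes the role of `full_load` and the w1/w3 doubling easier to see. This is a sketch, not vLLM's code. The helper name `materialized_moe_shape` is hypothetical. It assumes that `full_load` means the checkpoint tensor is 3-D with a leading expert axis, and it raises `ValueError` where the diff uses `assert`. The per-expert sharded dimensions (0 for w1/w3, 1 for w2) and the tensor sizes in the example calls are illustrative assumptions. They are not taken from any particular GGUF model.

```python
from typing import List, Sequence


def materialized_moe_shape(
    loaded_shape: Sequence[int],
    shard_id: str,
    shard_dim: int,
    tp_size: int,
) -> List[int]:
    """Hypothetical restatement of the GGUF FusedMoE materialization shape.

    loaded_shape: shape of the checkpoint tensor for one shard_id.
    shard_dim: tensor-parallel sharded dimension *within one expert*
        (assumed input; before the expert axis is accounted for).
    """
    # Assumption: a full load carries every expert, i.e. a leading expert axis.
    full_load = len(loaded_shape) == 3
    if full_load:
        shard_dim += 1  # skip over the expert axis, as in the diff's context
    # Without the expert axis the full parameter shape is unknown,
    # which is what the diff's `assert full_load` guards against.
    if not full_load:
        raise ValueError("materialization requires the expert dimension")
    final_shape = list(loaded_shape)
    # w1 and w3 are merged per expert into one parameter,
    # so the merged dimension is twice that of a single shard.
    if shard_id in {"w1", "w3"}:
        final_shape[1] *= 2
    # Each tensor-parallel rank keeps only its slice of the sharded dimension.
    final_shape[shard_dim] //= tp_size
    return final_shape


if __name__ == "__main__":
    # Illustrative sizes: 8 experts, intermediate 14336, hidden 4096, TP=4.
    print(materialized_moe_shape((8, 14336, 4096), "w1", shard_dim=0, tp_size=4))  # [8, 7168, 4096]
    print(materialized_moe_shape((8, 4096, 14336), "w2", shard_dim=1, tp_size=4))  # [8, 4096, 3584]
```

Doubling happens before the tensor-parallel split because w1 and w3 each fill half of the merged dimension. A 2-D tensor, which lacks the expert axis, is rejected, because the number of experts could not be recovered from it.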