[MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP prediction accuracy with fp8+mtp (#32101)

Signed-off-by: Andy Liu <andyliu@roblox.com>

Unverified Commit 0dd63639 authored Jan 10, 2026 by Andy Liu Committed by GitHub Jan 10, 2026

Showing with 6 additions and 5 deletions

vllm/model_executor/models/glm4_moe_mtp.py +6 -5

--- a/vllm/model_executor/models/glm4_moe_mtp.py
+++ b/vllm/model_executor/models/glm4_moe_mtp.py
@@ -268,11 +268,6 @@ class Glm4MoeMTP(nn.Module, SupportsPP, Glm4MixtureOfExperts):
                if spec_layer is None:
                    continue
                name = self._rewrite_spec_layer_name(spec_layer, name)
-            # Some checkpoints include weight scale tensors for the LM head even
-            # when the quantized head isn't built. Skip them if the model does
-            # not expose a matching parameter to avoid KeyError during load.
-            if name.endswith(".weight_scale") and name not in params_dict:
-                continue
            for param_name, weight_name, shard_id in stacked_params_mapping:
                # Skip non-stacked layers and experts (experts handled below).
                if weight_name not in name:
@@ -315,6 +310,12 @@ class Glm4MoeMTP(nn.Module, SupportsPP, Glm4MixtureOfExperts):
                    # Skip loading extra bias for GPTQ models.
                    if name.endswith(".bias") and name not in params_dict:
                        continue
+                    # Some checkpoints include weight scale tensors for the
+                    # LM head even when the quantized head isn't built. Skip
+                    # them if the model does not expose a matching parameter
+                    # to avoid KeyError during load.
+                    if name.endswith(".weight_scale") and name not in params_dict:
+                        continue

                    # According to DeepSeek-V3 Technical Report, MTP modules
                    # shares embedding layer. We only load the first weights.
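
The diff moves the `.weight_scale` skip from before the `stacked_params_mapping` loop to the fallback branch after it. One plausible reading of why this matters is sketched below with a hypothetical, heavily simplified loader. It is not the vLLM implementation. The function `load_names`, the three-entry mapping, and the tensor names are all assumptions made for illustration. The sketch assumes that checkpoint scale tensors use per-shard names (for example `q_proj.weight_scale`) which only match a model parameter after being rewritten to the fused name, so a skip that runs before the rewrite would also drop scales the model needs.

```python
# Hypothetical, simplified model of the load loop; names are illustrative only.
STACKED_PARAMS_MAPPING = [
    # (fused param name, checkpoint shard name, shard id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
]


def load_names(checkpoint_names, params_dict, early_skip):
    """Return (param_name, shard_id) pairs that the loop would load."""
    loaded = []
    for name in checkpoint_names:
        # Old placement: the check runs on the raw checkpoint name.
        if early_skip and name.endswith(".weight_scale") and name not in params_dict:
            continue
        for param_name, weight_name, shard_id in STACKED_PARAMS_MAPPING:
            if weight_name not in name:
                continue
            fused = name.replace(weight_name, param_name)
            if fused in params_dict:
                loaded.append((fused, shard_id))
            break
        else:
            # New placement: only names that matched no stacked mapping get here.
            if name.endswith(".weight_scale") and name not in params_dict:
                continue
            loaded.append((name, None))
    return loaded


# Assumed model parameters: fused qkv weight and scale, unquantized LM head.
params = {
    "self_attn.qkv_proj.weight",
    "self_attn.qkv_proj.weight_scale",
    "lm_head.weight",
}
# Assumed checkpoint contents: per-shard names plus a stray LM head scale.
ckpt = [
    "self_attn.q_proj.weight",
    "self_attn.q_proj.weight_scale",
    "lm_head.weight",
    "lm_head.weight_scale",
]

before_fix = load_names(ckpt, params, early_skip=True)
after_fix = load_names(ckpt, params, early_skip=False)
```

Under these assumptions, `before_fix` lacks the fused `qkv_proj.weight_scale` entry while `after_fix` includes it, and both still skip the stray `lm_head.weight_scale`. This matches the commit title's description of lost accuracy with fp8+mtp, although the commit itself does not spell out the mechanism.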