[quantization] Fix scale remapping for mllama4 (#10042)

Co-authored-by: HAI <hixiao@gmail.com>

c7a104c1 · Bowen Bao · GitHub · 97d966a7 · c7a104c1
Unverified Commit c7a104c1 authored Oct 05, 2025 by Bowen Bao Committed by GitHub Oct 05, 2025

Showing with 1 addition and 1 deletion

python/sglang/srt/models/mllama4.py +1 -1

--- a/python/sglang/srt/models/mllama4.py
+++ b/python/sglang/srt/models/mllama4.py
@@ -700,7 +700,7 @@ class Llama4ForConditionalGeneration(nn.Module):
        """Handle scale parameter remapping. Returns True if handled."""
        if "scale" in name and "expert" not in name:
            remapped_name = maybe_remap_kv_scale_name(name, params_dict)
-            return remapped_name is None
+            return remapped_name is not None and remapped_name != name
        return False

    def _handle_stacked_params(
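
Note on the one-line change above: the sketch below compares the old and new return expressions across the three outcomes the remap helper can plausibly produce. `stub_remap` is a hypothetical stand-in written for this illustration, not the real `maybe_remap_kv_scale_name`. It assumes the helper returns a new name when a kv-scale name is remapped, `None` when the remapped target is absent from `params_dict`, and the input name unchanged otherwise. The parameter names are also invented.

```python
from typing import Optional


def stub_remap(name: str, params_dict: dict) -> Optional[str]:
    """Hypothetical stand-in for maybe_remap_kv_scale_name (assumed semantics)."""
    if name.endswith(".kv_scale"):
        candidate = name.replace(".kv_scale", ".attn.k_scale")
        # Assumption: a remap whose target is not a known parameter yields None.
        return candidate if candidate in params_dict else None
    # Assumption: names that need no remapping are returned untouched.
    return name


def handled_before(name: str, params_dict: dict) -> bool:
    # Return expression from the "-" line of the diff.
    if "scale" in name and "expert" not in name:
        return stub_remap(name, params_dict) is None
    return False


def handled_after(name: str, params_dict: dict) -> bool:
    # Return expression from the "+" line of the diff.
    if "scale" in name and "expert" not in name:
        remapped_name = stub_remap(name, params_dict)
        return remapped_name is not None and remapped_name != name
    return False


# Invented parameter names, for illustration only.
params = {"layers.0.self_attn.attn.k_scale": object()}
cases = {
    "remapped": "layers.0.self_attn.kv_scale",
    "missing target": "layers.1.self_attn.kv_scale",
    "unchanged": "layers.0.mlp.weight_scale",
    "expert scale": "layers.0.experts.w13_scale",
}

# Each value is (old result, new result).
results = {
    label: (handled_before(n, params), handled_after(n, params))
    for label, n in cases.items()
}

for label, (old, new) in results.items():
    print(f"{label:15s} before={old!s:5s} after={new}")
```

Under these assumptions, the old expression reports "handled" only when the remap target is missing. The new expression reports "handled" only when the name was actually remapped to a different one. Unchanged scale names and expert scales return False in both versions.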