[BugFix][TritonMLA] Process weights after model loading for GGUF (#14555)

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

Commit 128bf752, authored Mar 13, 2025 by TY-AMD; committed by GitHub Mar 12, 2025

Showing with 4 additions and 1 deletion

vllm/model_executor/model_loader/loader.py +4 -1

--- a/vllm/model_executor/model_loader/loader.py
+++ b/vllm/model_executor/model_loader/loader.py
@@ -1330,11 +1330,14 @@ class GGUFModelLoader(BaseModelLoader):
                local_model_path, gguf_weights_map):
            model_config.hf_config.update({"tie_word_embeddings": True})

+        target_device = torch.device(device_config.device)
        with set_default_torch_dtype(model_config.dtype):
-            with torch.device(device_config.device):
+            with target_device:
                model = _initialize_model(vllm_config=vllm_config)
            model.load_weights(
                self._get_weights_iterator(local_model_path, gguf_weights_map))
+
+            _process_weights_after_loading(model, model_config, target_device)
        return model
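
The ordering this change enforces (construct the model, load its weights, then run a post-load processing step) can be illustrated with a minimal, dependency-free sketch. Everything below is a hypothetical stand-in: ToyLayer, ToyModel, load_model and the doubling transform are invented for illustration and are not vLLM classes or the actual behavior of _process_weights_after_loading.

```python
# Toy stand-ins only; none of these names exist in vLLM.
class ToyLayer:
    def __init__(self):
        self.weight = None
        self.runtime_weight = None
        self.processed = False

    def process_weights_after_loading(self):
        # Hypothetical hook: derive a runtime-ready form once weights exist.
        if self.weight is None:
            raise RuntimeError("weights must be loaded before processing")
        # Placeholder transform; a real layer would do backend-specific work.
        self.runtime_weight = [w * 2 for w in self.weight]
        self.processed = True


class ToyModel:
    def __init__(self):
        self.layers = [ToyLayer(), ToyLayer()]

    def load_weights(self, weights):
        # Assign one weight list per layer, in order.
        for layer, w in zip(self.layers, weights):
            layer.weight = list(w)


def load_model(weights, post_process=True):
    model = ToyModel()
    model.load_weights(weights)
    if post_process:
        # The step the commit adds for the GGUF path: run the post-load
        # hook only after all weights are in place.
        for layer in model.layers:
            layer.process_weights_after_loading()
    return model
```

With post_process=False the layers keep their raw weights and never build runtime_weight, which mirrors the kind of gap a loader leaves when it skips the post-load step.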