[TPU] Fix dummy loading OOM (#16372)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

Commit 1621b252 authored Apr 09, 2025 by Chengji Yao, committed by GitHub Apr 10, 2025

Showing with 15 additions and 2 deletions

vllm/model_executor/model_loader/weight_utils.py +15 -2

--- a/vllm/model_executor/model_loader/weight_utils.py
+++ b/vllm/model_executor/model_loader/weight_utils.py
@@ -658,8 +658,21 @@ def initialize_dummy_weights(
    for param in model.state_dict().values():
        if torch.is_floating_point(param):
            if current_platform.is_tpu():
-                # XLA device does not support torch.Generator()
-                param.uniform_(low, high)
+                generator = torch.Generator(device="cpu")
+                generator.manual_seed(seed)
+                # Note: The param.uniform_ function cannot be used in this
+                # context because it demands more TPU HBM than directly copying
+                # from a CPU tensor.
+                # Note: We avoid using torch.rand_like as it doesn't currently
+                # support the generator argument.
+                param.copy_((high - low) *
+                            torch.rand(*param.shape,
+                                       generator=generator,
+                                       dtype=param.dtype,
+                                       layout=param.layout,
+                                       requires_grad=param.requires_grad,
+                                       device="cpu") + low)
+                torch._sync(param)
                continue

            generator = torch.Generator(device=param.data.device)
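
The TPU branch in the hunk above samples the dummy values on the CPU with a seeded generator, rescales them from [0, 1) to [low, high), and copies the result into the parameter. A minimal standalone sketch of that pattern follows. The helper name `fill_uniform_from_cpu` and the example shapes and bounds are hypothetical and not part of vLLM. The sketch runs on CPU tensors only, so it omits the `torch._sync(param)` call that the patch issues after the copy.

```python
import torch


def fill_uniform_from_cpu(param: torch.Tensor, low: float, high: float,
                          seed: int) -> None:
    """Hypothetical helper: fill `param` in place with values drawn
    uniformly from [low, high), sampled on the CPU."""
    # A CPU generator keeps the sampling reproducible and independent of
    # whatever device `param` lives on.
    generator = torch.Generator(device="cpu")
    generator.manual_seed(seed)

    # torch.rand samples from [0, 1); scale and shift to [low, high).
    cpu_values = (high - low) * torch.rand(
        param.shape,
        generator=generator,
        dtype=param.dtype,
        device="cpu",
    ) + low

    # copy_ transfers the CPU tensor to the parameter's device if needed.
    with torch.no_grad():
        param.copy_(cpu_values)


# Two tensors filled with the same seed receive identical values.
a = torch.empty(4, 3)
b = torch.empty(4, 3)
fill_uniform_from_cpu(a, low=-1e-3, high=1e-3, seed=1234)
fill_uniform_from_cpu(b, low=-1e-3, high=1e-3, seed=1234)
```

Because the random tensor is materialized on the host, the only device-side work in this sketch is the copy into the existing parameter storage, which is the property the commit's comment relies on to avoid the extra HBM demand of `param.uniform_`.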