Load the state dict on CPU to prevent unnecessary GPU memory surge (#20920)

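Load the state dict on CPU: pass map_location="cpu" to torch.load in load_sharded_checkpoint so that each shard is deserialized into CPU memory instead of onto the device it was saved from.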
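
The effect of the change can be sketched with a toy model. This is a minimal illustration rather than the library's implementation: the helper name `load_shards_on_cpu` and the shard file names are hypothetical, and the loop only mirrors the lines touched by this commit. Without `map_location`, `torch.load` restores tensors to the device they were saved from, so a shard saved from a GPU would be materialized on the GPU before being copied into the model.

```python
import os
import tempfile

import torch
from torch import nn


def load_shards_on_cpu(model, folder, shard_files):
    """Hypothetical helper: load each shard into CPU memory, copy it into the model, then free it."""
    for shard_file in shard_files:
        # map_location="cpu" keeps the deserialized tensors in host memory,
        # even if the shard was saved from a GPU.
        state_dict = torch.load(os.path.join(folder, shard_file), map_location="cpu")
        # strict=False because a single shard holds only part of the weights.
        model.load_state_dict(state_dict, strict=False)
        # Drop the reference before the next shard is read.
        del state_dict


# Toy demonstration: split a small model's weights into two shard files
# (file names are made up for this example).
source = nn.Linear(4, 2)
target = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as folder:
    torch.save({"weight": source.weight.data}, os.path.join(folder, "shard_0.bin"))
    torch.save({"bias": source.bias.data}, os.path.join(folder, "shard_1.bin"))
    load_shards_on_cpu(target, folder, ["shard_0.bin", "shard_1.bin"])

# True when every parameter of the target model now equals the source model's.
weights_match = torch.equal(source.weight, target.weight) and torch.equal(source.bias, target.bias)
```

`load_state_dict` copies the values into the model's existing parameters, so a model that already lives on a GPU still ends up with its weights on that GPU; only the temporary shard copy stays in host memory.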

Commit 11c49ed2 (unverified), authored Dec 29, 2022 by Harsh Trivedi; committed by GitHub on Dec 29, 2022

Showing with 1 addition and 1 deletion

src/transformers/modeling_utils.py +1 -1

--- a/src/transformers/modeling_utils.py
+++ b/src/transformers/modeling_utils.py
@@ -382,7 +382,7 @@ def load_sharded_checkpoint(model, folder, strict=True):
        raise RuntimeError(error_message)

    for shard_file in shard_files:
-        state_dict = torch.load(os.path.join(folder, shard_file))
+        state_dict = torch.load(os.path.join(folder, shard_file), map_location="cpu")
        model.load_state_dict(state_dict, strict=False)

        # Make sure memory is fred before we load the next state dict.