Unverified Commit 5ecac15a authored by Min Xu, committed by GitHub

[test] FSDP: add the failing test for #421 (#453)

* [test] FSDP: add the failing test for #421

* skip on 1.5

* better skipping

* Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
import os
from unittest import mock

import pytest
import torch
import torch.nn as nn
import torch.nn.functional as F

from fairscale.nn import FullyShardedDataParallel
from fairscale.optim.grad_scaler import ShardedGradScaler
from fairscale.utils.testing import skip_if_no_cuda

try:
    from torch.cuda.amp import autocast
except ImportError:
    # Older version doesn't support autocast. Skip this file.
    pytestmark = pytest.mark.skip
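# Note: assigning `pytest.mark.skip` to the module-level `pytestmark` variable
# makes pytest skip every test in this file, which is the desired behavior
# when torch.cuda.amp.autocast is unavailable (it was added in torch 1.6,
# hence the "skip on 1.5" commit above).
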
@mock.patch.dict(os.environ, {"MASTER_ADDR": "localhost", "MASTER_PORT": "1337"}, clear=True)
@skip_if_no_cuda
def test_scaler_cpu_offload_breaks():
    device = torch.device("cuda")
    torch.cuda.set_device(0)
    torch.distributed.init_process_group(backend="nccl", rank=0, world_size=1)

    scaler = ShardedGradScaler()
    model = FullyShardedDataParallel(nn.Linear(5, 5), cpu_offload=True, mixed_precision=True)
    optim = torch.optim.SGD(model.parameters(), lr=1e-3)

    input = torch.rand((1, 5), dtype=torch.float).to(device)
    optim.zero_grad()
    with autocast():
        output = model(input)
        loss = F.mse_loss(input, output)
    scaler.scale(loss).backward()

    # TODO (Min): Need to fix. Details in issue #421.
    # With cpu_offload=True the sharded gradients are moved to CPU, and
    # scaler.step() currently raises; this test pins down that known failure.
    with pytest.raises(RuntimeError):
        scaler.step(optim)
        scaler.update()

    torch.distributed.destroy_process_group()
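

# For contrast, a minimal sketch (not part of the committed test) of the same
# step with cpu_offload=False, which is the configuration expected to succeed
# today and the behavior #421 aims to restore for the offload case. The
# function name below is hypothetical; it assumes MASTER_ADDR/MASTER_PORT are
# set as in the mock.patch.dict decorator above, and otherwise mirrors the
# test.
def _sketch_scaler_without_cpu_offload():
    torch.cuda.set_device(0)
    torch.distributed.init_process_group(backend="nccl", rank=0, world_size=1)

    scaler = ShardedGradScaler()
    # Same wrapping, but parameters and gradients stay on the GPU.
    model = FullyShardedDataParallel(nn.Linear(5, 5), cpu_offload=False, mixed_precision=True)
    optim = torch.optim.SGD(model.parameters(), lr=1e-3)

    input = torch.rand((1, 5), dtype=torch.float).cuda()
    optim.zero_grad()
    with autocast():
        loss = F.mse_loss(input, model(input))
    scaler.scale(loss).backward()

    # With GPU-resident gradients, step() and update() should not raise.
    scaler.step(optim)
    scaler.update()

    torch.distributed.destroy_process_group()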