Unverified commit 177151e0 authored by Benjamin Lefaudeux, committed by GitHub

[bugfix] OSS no reduce loss (#133)

* bugfix: drop the all_reduce on the loss after backward()
* adjust the default non-regression loss, since the loss is no longer all_reduced
parent 5220f89b
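
For context on the fix, here is a minimal sketch of the training step as it looks after this change, with the removed call kept as a comment. The function and argument names (train_step, model, loss_fn, optimizer, world_size) are illustrative, taken from the hunks below rather than from the actual benchmark code.

```python
def train_step(model, loss_fn, optimizer, batch, target, world_size):
    """One training step, mirroring the shape of the loops touched in this commit."""
    optimizer.zero_grad()
    outputs = model(batch)
    loss = loss_fn(outputs, target)
    loss /= world_size
    loss.backward()
    # Removed by this commit:
    #   torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
    # The call ran after backward(), so it never affected the gradients; it only
    # summed the already-computed loss values across ranks, which inflated the
    # number reported to the regression check.
    optimizer.step()
    return loss
```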
@@ -100,7 +100,7 @@ run_oss_benchmark: &run_oss_benchmark
   - run:
       name: Run OSS Benchmark
       command: |
-        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.595
+        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.152
 
 run_oss_gloo: &run_oss_gloo
   - run:
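
A note on the new --reference_loss above, not stated in the commit itself but a rough sanity check: the benchmark runs with --world_size 4, and the removed all_reduce summed the per-rank loss across ranks before it was reported, so dropping it should shrink the reported value by roughly the world size: 0.595 / 4 ≈ 0.149, consistent with the new reference of 0.152.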
@@ -124,8 +124,6 @@ def train(
         loss /= world_size
         loss.backward()
 
-        dist.all_reduce(loss, op=dist.ReduceOp.SUM)
-
         if use_sdp:
             ddp.reduce()  # Send the gradients to the appropriate shards
@@ -42,7 +42,6 @@ Let's suppose that your trainer looks like
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()

@@ -90,7 +89,6 @@ Then sharding the optimizer state is merely a matter of wrapping your optimizer
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()
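
The second docs hunk sits under "Then sharding the optimizer state is merely a matter of wrapping your optimizer". For readers landing on this commit, a rough sketch of what that wrapping looks like with fairscale's OSS; the setup and keyword arguments here are illustrative, not lifted from the tutorial itself.

```python
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

# A process group must already exist before constructing OSS; a single-process
# gloo group is used here only to make the sketch self-contained.
dist.init_process_group(backend="gloo", init_method="tcp://localhost:29500", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)

# Wrap the base optimizer: OSS shards the optimizer state across ranks.
# Extra keyword arguments (lr here) are forwarded to the wrapped optimizer.
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-4)
```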