Unverified commit 177151e0 authored by Benjamin Lefaudeux, committed by GitHub

[bugfix] OSS no reduce loss (#133)

* bugfix: drop the all_reduce on the loss after backward()
* adjust the default non-regression loss, since the loss is no longer all_reduced
parent 5220f89b
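
For context on the fix, here is a minimal sketch of the training step as it looks after this change, with the removed call kept as a comment. The function and argument names (train_step, model, loss_fn, optimizer, world_size) are illustrative, taken from the hunks below rather than from the actual benchmark code.

```python
def train_step(model, loss_fn, optimizer, batch, target, world_size):
    """One training step, mirroring the shape of the loops touched in this commit."""
    optimizer.zero_grad()
    outputs = model(batch)
    loss = loss_fn(outputs, target)
    loss /= world_size
    loss.backward()
    # Removed by this commit:
    #   torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
    # The call ran after backward(), so it never affected the gradients; it only
    # summed the already-computed loss values across ranks, which inflated the
    # number reported to the regression check.
    optimizer.step()
    return loss
```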
@@ -100,7 +100,7 @@ run_oss_benchmark: &run_oss_benchmark
   - run:
       name: Run OSS Benchmark
       command: |
-        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.595
+        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.152
 
 run_oss_gloo: &run_oss_gloo
   - run:
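
A note on the new --reference_loss above, not stated in the commit itself but a rough sanity check: the benchmark runs with --world_size 4, and the removed all_reduce summed the per-rank loss across ranks before it was reported, so dropping it should shrink the reported value by roughly the world size: 0.595 / 4 ≈ 0.149, consistent with the new reference of 0.152.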
@@ -124,8 +124,6 @@ def train(
         loss /= world_size
         loss.backward()
 
-        dist.all_reduce(loss, op=dist.ReduceOp.SUM)
-
         if use_sdp:
             ddp.reduce()  # Send the gradients to the appropriate shards
@@ -42,7 +42,6 @@ Let's suppose that your trainer looks like
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()

@@ -90,7 +89,6 @@ Then sharding the optimizer state is merely a matter of wrapping your optimizer
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()
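
The second docs hunk sits under "Then sharding the optimizer state is merely a matter of wrapping your optimizer". For readers landing on this commit, a rough sketch of what that wrapping looks like with fairscale's OSS; the setup and keyword arguments here are illustrative, not lifted from the tutorial itself.

```python
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

# A process group must already exist before constructing OSS; a single-process
# gloo group is used here only to make the sketch self-contained.
dist.init_process_group(backend="gloo", init_method="tcp://localhost:29500", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)

# Wrap the base optimizer: OSS shards the optimizer state across ranks.
# Extra keyword arguments (lr here) are forwarded to the wrapped optimizer.
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-4)
```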