Commit a961ebd4, authored by one, committed by GitHub

Benchmark: Update overlap and sharding matmul benchmarks (#19)

- Enable `computation-communication-overlap` and `sharding-matmul` in
some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs
to avoid fixed rendezvous port conflicts on 29500.
- Update runner command-generation test expectation for the new
single-node torchrun behavior.
parent c77bfe36
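
The port conflict mentioned in the commit message can be illustrated with plain sockets. This is a minimal sketch and does not use torchrun itself: `try_listen` is a hypothetical helper, and the OS-chosen `taken_port` stands in for the fixed default port 29500. It assumes that `--standalone` avoids the collision by rendezvousing on a free local port, which should be checked against the torchrun documentation for the PyTorch version in use.

```python
import socket
from typing import Optional


def try_listen(port: int) -> Optional[socket.socket]:
    """Listen on 127.0.0.1:port; return the socket, or None if the port is taken.

    Hypothetical helper for illustration only; it is not part of the commit.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(('127.0.0.1', port))
        sock.listen()
        return sock
    except OSError:
        # Typically "address already in use".
        sock.close()
        return None


# First job: port 0 asks the OS for any free port.
first = try_listen(0)
taken_port = first.getsockname()[1]

# Second job insisting on the same fixed port fails while the first is alive.
second = try_listen(taken_port)

# Second job asking for a free port instead succeeds.
third = try_listen(0)
```

Two single-node runs that both insist on the same fixed port behave like `second`; runs that let the port be chosen dynamically behave like `third`.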
@@ -535,3 +535,7 @@ superbench:
 - mobilenet_v2
 precision: float16
 batch_size: 1
+computation-communication-overlap:
+  <<: *default_pytorch_mode
+sharding-matmul:
+  <<: *default_pytorch_mode
@@ -424,3 +424,7 @@ superbench:
 - mobilenet_v2
 precision: float16
 batch_size: 1
+computation-communication-overlap:
+  <<: *default_pytorch_mode
+sharding-matmul:
+  <<: *default_pytorch_mode
@@ -176,7 +176,7 @@ def __get_mode_command(self, benchmark_name, mode, timeout=None):
     # TODO: replace with torch.distributed.run in v1.9
     # TODO: only supports node_num=1 and node_num=all currently
     torch_dist_params = (
-        '' if 'node_num' in mode and mode.node_num == 1 else
+        '--standalone ' if 'node_num' in mode and mode.node_num == 1 else
         '--nnodes=$NNODES --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT '
     )
......
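
The selection logic in the hunk above can be sketched as a standalone function. This is an illustrative approximation, not the runner's actual code: `torch_dist_params` and `build_command` are hypothetical helpers, `mode` is a plain dict rather than the runner's config object, and the `proc_num` key is an assumption.

```python
def torch_dist_params(mode: dict) -> str:
    """Return the rendezvous-related torchrun arguments for a mode.

    Hypothetical helper mirroring the conditional in the diff.
    """
    if mode.get('node_num') == 1:
        # Single node: use a local standalone rendezvous.
        return '--standalone '
    # Otherwise the values are expected from the environment at launch time.
    return (
        '--nnodes=$NNODES --node_rank=$NODE_RANK '
        '--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT '
    )


def build_command(mode: dict, exec_command: str) -> str:
    """Assemble a torchrun command line (the 'proc_num' key is an assumption)."""
    return (
        'torchrun '
        f"--no_python --nproc_per_node={mode['proc_num']} "
        f'{torch_dist_params(mode)}{exec_command}'
    )


single_node = build_command({'node_num': 1, 'proc_num': 8}, 'sb exec')
multi_node = build_command({'proc_num': 8}, 'sb exec')
```

With `node_num` set to 1 the command contains `--standalone` and no `--master_port`; any other mode keeps the environment-variable placeholders.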
@@ -142,7 +142,7 @@ def test_get_mode_command(self):
     },
     'expected_command': (
         'torchrun '
-        '--no_python --nproc_per_node=8 '
+        '--no_python --nproc_per_node=8 --standalone '
         f'sb exec --output-dir {self.sb_output_dir} -c sb.config.yaml -C superbench.enable=foo '
         'superbench.benchmarks.foo.parameters.distributed_impl=ddp '
         'superbench.benchmarks.foo.parameters.distributed_backend=nccl'
......