Unverified commit a961ebd4 authored by one, committed by GitHub

Benchmark: Update overlap and sharding matmul benchmarks (#19)

- Enable `computation-communication-overlap` and `sharding-matmul` in
some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs
to avoid conflicts on the fixed rendezvous port 29500.
- Update the runner's command-generation test expectation to match the
new single-node torchrun behavior.
parent c77bfe36
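The port-conflict rationale can be illustrated with a small standard-library sketch. It is not torchrun's code, and it assumes Linux-like socket semantics. According to torchrun's documentation, `--standalone` starts a local rendezvous backend on a free port instead of a fixed one. The sketch shows why that matters:

- A second listener cannot bind a port that is already taken. This is what happens when two single-node jobs both rendezvous on 29500.
- Asking the OS for port 0 gives each caller a distinct free port.

An OS-assigned port stands in for 29500 so that the demo does not depend on whatever is already running on the machine.

```python
import errno
import socket


def try_bind(port: int) -> socket.socket:
    """Bind and listen on localhost; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def demo() -> dict:
    # First "job" holds a port; it stands in for a fixed rendezvous port like 29500.
    first = try_bind(0)
    fixed_port = first.getsockname()[1]

    # A second "job" hard-coded to the same port fails with EADDRINUSE.
    conflict = False
    try:
        second = try_bind(fixed_port)
        second.close()
    except OSError as exc:
        conflict = exc.errno == errno.EADDRINUSE

    # Two "jobs" that each request port 0 receive distinct free ports.
    a, b = try_bind(0), try_bind(0)
    distinct = a.getsockname()[1] != b.getsockname()[1]

    for s in (first, a, b):
        s.close()
    return {"conflict_on_fixed_port": conflict, "free_ports_distinct": distinct}


if __name__ == "__main__":
    print(demo())
```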
@@ -535,3 +535,7 @@ superbench:
 - mobilenet_v2
 precision: float16
 batch_size: 1
+computation-communication-overlap:
+  <<: *default_pytorch_mode
+sharding-matmul:
+  <<: *default_pytorch_mode
@@ -424,3 +424,7 @@ superbench:
 - mobilenet_v2
 precision: float16
 batch_size: 1
+computation-communication-overlap:
+  <<: *default_pytorch_mode
+sharding-matmul:
+  <<: *default_pytorch_mode
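Both config hunks use a YAML merge key. With `<<: *default_pytorch_mode`, the mapping anchored as `&default_pytorch_mode` elsewhere in the file is copied into each new benchmark entry, and keys written locally take precedence. The anchor's real contents are not part of this diff, so the fields below (`modes`, `proc_num`, `node_num`, `frameworks`) are hypothetical placeholders. The sketch only mimics the merge semantics in plain Python; it does not use a YAML library.

```python
from copy import deepcopy


def apply_merge_key(anchor: dict, local: dict) -> dict:
    """Mimic YAML's `<<: *anchor`: start from the anchored mapping, let local keys win.

    The override is shallow: a key defined locally replaces the anchored value
    wholesale rather than being deep-merged. deepcopy only keeps the entries
    in this sketch from sharing mutable values with the anchor.
    """
    merged = deepcopy(anchor)
    merged.update(local)
    return merged


# Hypothetical stand-in for the `&default_pytorch_mode` anchor.
DEFAULT_PYTORCH_MODE = {
    "modes": [{"name": "torch.distributed", "proc_num": 8, "node_num": 1}],
    "frameworks": ["pytorch"],
}

# The two new entries only pull in the shared mode definition.
benchmarks = {
    "computation-communication-overlap": apply_merge_key(DEFAULT_PYTORCH_MODE, {}),
    "sharding-matmul": apply_merge_key(DEFAULT_PYTORCH_MODE, {}),
}

if __name__ == "__main__":
    for name, cfg in benchmarks.items():
        print(name, cfg["modes"][0]["name"])
```

If the assumed anchor really carries `node_num: 1`, these entries would take the single-node branch in the runner change that follows.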
@@ -176,7 +176,7 @@ def __get_mode_command(self, benchmark_name, mode, timeout=None):
 # TODO: replace with torch.distributed.run in v1.9
 # TODO: only supports node_num=1 and node_num=all currently
 torch_dist_params = (
-    '' if 'node_num' in mode and mode.node_num == 1 else
+    '--standalone ' if 'node_num' in mode and mode.node_num == 1 else
     '--nnodes=$NNODES --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT '
 )
......
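Without the diff markers, the patched branch reduces to the function below. This is a standalone sketch, not the runner's method:

- The function name is hypothetical.
- `mode` is modeled as a plain dict. The runner passes its own config object, which supports both `in` and attribute access.

Each result keeps a trailing space so the caller can concatenate it directly in front of the command that follows.

```python
def torch_dist_params(mode: dict) -> str:
    """Choose torchrun rendezvous flags following the patched runner's logic."""
    # Single node: torchrun starts a local rendezvous, so no fixed port is needed.
    if mode.get("node_num") == 1:
        return "--standalone "
    # Otherwise (e.g. node_num=all): rely on shell variables set by the launcher.
    return (
        "--nnodes=$NNODES --node_rank=$NODE_RANK "
        "--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT "
    )


if __name__ == "__main__":
    print(repr(torch_dist_params({"node_num": 1})))
    print(repr(torch_dist_params({"node_num": "all"})))
```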
@@ -142,7 +142,7 @@ def test_get_mode_command(self):
 },
 'expected_command': (
     'torchrun '
-    '--no_python --nproc_per_node=8 '
+    '--no_python --nproc_per_node=8 --standalone '
     f'sb exec --output-dir {self.sb_output_dir} -c sb.config.yaml -C superbench.enable=foo '
     'superbench.benchmarks.foo.parameters.distributed_impl=ddp '
     'superbench.benchmarks.foo.parameters.distributed_backend=nccl'
......
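The updated expectation depends on Python's implicit concatenation of adjacent string literals. The trailing space after `--standalone` is what separates it from `sb exec`.

The sketch rebuilds only the portion of the expected string visible in this hunk; the remainder is elided in the diff and omitted here. It uses a hypothetical output directory in place of the test's `self.sb_output_dir`, then checks where the new flag lands.

```python
# Hypothetical output directory; the real test uses its own fixture value.
sb_output_dir = "/tmp/sb-output"

# Adjacent literals are joined at compile time; each piece's trailing space
# keeps the flags apart in the final command string.
expected_command = (
    'torchrun '
    '--no_python --nproc_per_node=8 --standalone '
    f'sb exec --output-dir {sb_output_dir} -c sb.config.yaml -C superbench.enable=foo '
    'superbench.benchmarks.foo.parameters.distributed_impl=ddp '
    'superbench.benchmarks.foo.parameters.distributed_backend=nccl'
)

# Split on whitespace to inspect flag order independently of spacing.
tokens = expected_command.split()

if __name__ == "__main__":
    print(tokens[:6])
```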