Benchmark: Update overlap and sharding matmul benchmarks (#19)
- Enable `computation-communication-overlap` and `sharding-matmul` in some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs to avoid fixed rendezvous port conflicts on 29500.
- Update the runner's command-generation test expectation for the new single-node torchrun behavior.
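For illustration, a single-node standalone launch might look like the following (the script name and process count are placeholders, not taken from this change):

```shell
# --standalone makes torchrun spin up its own rendezvous endpoint on a free
# port, instead of binding the default master port 29500. This avoids
# conflicts when several single-node runs start on the same machine.
torchrun --standalone --nproc_per_node=8 benchmark.py
```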