Benchmark: Update overlap and sharding matmul benchmarks (#19)
- Enable `computation-communication-overlap` and `sharding-matmul` in some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs to avoid fixed rendezvous port conflicts on 29500.
- Update the runner's command-generation test expectation for the new single-node torchrun behavior.
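For illustration, a single-node standalone launch might look like the following (the script name and process count are placeholders, not taken from this change):

```shell
# --standalone makes torchrun spin up its own rendezvous endpoint on a free
# port, instead of binding the default master port 29500. This avoids
# conflicts when several single-node runs start on the same machine.
torchrun --standalone --nproc_per_node=8 benchmark.py
```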