Commits · a961ebd4cf920ddcaacd89553311eeda40a72b1b · tsoc / superbenchmark

24 Apr, 2026 2 commits

Benchmark: Update overlap and sharding matmul benchmarks (#19) · a961ebd4

one authored Apr 24, 2026

- Enable `computation-communication-overlap` and `sharding-matmul` in
some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs
to avoid fixed rendezvous port conflicts on 29500.
- Update runner command-generation test expectation for the new
single-node torchrun behavior.
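
The single-node versus multi-node choice described above can be sketched as follows. This is a minimal illustration, not the runner's actual command-generation code: `build_torchrun_command` and its parameters are hypothetical names, and only the `torchrun` flags themselves (`--standalone`, `--nnodes`, `--node_rank`, `--nproc_per_node`, `--master_addr`, `--master_port`) are real.

```python
from typing import List


def build_torchrun_command(
    nproc_per_node: int,
    script_args: List[str],
    nnodes: int = 1,
    node_rank: int = 0,
    master_addr: str = "127.0.0.1",
    master_port: int = 29500,
) -> List[str]:
    """Build a torchrun argv list (hypothetical helper for illustration)."""
    if nnodes == 1:
        # --standalone starts a local rendezvous on a free port, so concurrent
        # single-node runs do not contend for the fixed port 29500.
        return [
            "torchrun",
            "--standalone",
            f"--nproc_per_node={nproc_per_node}",
            *script_args,
        ]
    # Multi-node runs still need an explicit rendezvous address and port.
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        *script_args,
    ]


single = build_torchrun_command(8, ["bench.py"])
multi = build_torchrun_command(8, ["bench.py"], nnodes=2, node_rank=1, master_addr="10.0.0.1")
```

Here `bench.py` and `10.0.0.1` are placeholders. A command-generation test for the single-node case would then expect `--standalone` and no `--master_port` in the argument list.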

Benchmark: Update ort-inference for ROCm platform (#18) · c77bfe36

one authored Apr 24, 2026

* Support ROCm in ort-inference

* Add tests

* Update Dockerfiles for Docker 18

* Install onnx, add params to ort-inference

* Update docs

23 Apr, 2026 2 commits

Config: Update dist-inference params (#17) · 52450b5f

one authored Apr 23, 2026

Benchmarks: Add gpu-hpl and gpu-hpl-mxp micro benchmarks (#15) · 4fa10f4d

one authored Apr 23, 2026

Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.

Implemented a shared GPU HPL base that:
- Generates per-workload HPL dat files and parses the corresponding output files.
- Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
- Adds rocHPL-specific tuning parameters for gpu-hpl.
- Formats metric keys from input-derived workload attributes.
- Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.
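
One plausible reading of "warmup-aware aggregation" is sketched below: the first `warmup` runs are discarded and the remaining samples are averaged. The commit message does not state the exact rule or the reduce operator used, so the function name `aggregate_after_warmup` and the choice of the mean are assumptions made for illustration only.

```python
from statistics import mean
from typing import Sequence


def aggregate_after_warmup(samples: Sequence[float], warmup: int) -> float:
    """Average the samples that remain after dropping the warmup runs.

    Hypothetical helper: the real benchmark may apply a different reduce operator.
    """
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    measured = list(samples[warmup:])
    if not measured:
        raise ValueError("no samples left after discarding warmup runs")
    return mean(measured)


# Made-up per-iteration values; the first run is treated as warmup.
flops = aggregate_after_warmup([1.0, 3.0, 5.0], warmup=1)
```

Raising an error when every sample falls inside the warmup window keeps a misconfigured run from silently reporting a metric.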

Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.

Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.

Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.

21 Apr, 2026 2 commits

Config: Update config files (#7) · 511807b7

one authored Apr 21, 2026

- Add BW150 config
- Update BW1000 config
- Merge summary rules

Runner: Add local numactl GPU affinity support (#6) · 0993db75

one authored Apr 21, 2026

- Add `numactl` support for local runner modes, including `cpunodebind`, `membind`, and `physcpubind`.
- Add `gpu_affinity` resolution through `sb node topo --get gpu-numa-affinity --gpu-id`.
- Add `sb node topo` support for GPU NUMA topology queries.
- Update BW1000 config to use the new local `numactl` semantics.
- Document the new `numactl` mode fields and limitations.
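
How the three binding fields could translate into a `numactl` command prefix is sketched below. The `numactl` options `--cpunodebind`, `--membind`, and `--physcpubind` are real; the helper `numactl_prefix`, its parameters, and the example values are hypothetical and do not reflect the runner's actual config schema or the output format of `sb node topo`.

```python
from typing import List, Optional


def numactl_prefix(
    cpunodebind: Optional[str] = None,
    membind: Optional[str] = None,
    physcpubind: Optional[str] = None,
) -> List[str]:
    """Return a numactl argv prefix, or an empty list if nothing is bound.

    Hypothetical helper for illustration only.
    """
    args = ["numactl"]
    if cpunodebind is not None:
        args.append(f"--cpunodebind={cpunodebind}")
    if membind is not None:
        args.append(f"--membind={membind}")
    if physcpubind is not None:
        args.append(f"--physcpubind={physcpubind}")
    # With no binding requested, run the command without numactl at all.
    return args if len(args) > 1 else []


# Assumed example: the GPU's NUMA node was resolved to "1" beforehand.
numa_node = "1"
command = numactl_prefix(cpunodebind=numa_node, membind=numa_node) + ["python3", "bench.py"]
```

In this sketch the NUMA node is passed in as a plain string, so the lookup of a GPU's affinity stays separate from building the command.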

20 Apr, 2026 1 commit

Update mem-bw to use BandwidthTest (#5) · 800b962a

one authored Apr 20, 2026

* Update mem-bw to use BandwidthTest

* Update config and format code

02 Apr, 2026 1 commit

Add bw1000 config files (beta) · 49a4389b

one authored Apr 02, 2026