- 24 Apr, 2026 2 commits
one authored
- Enable `computation-communication-overlap` and `sharding-matmul` in some configs through the existing PyTorch distributed mode.
- Use `torchrun --standalone` for single-node `torch.distributed` runs to avoid fixed rendezvous port conflicts on 29500.
- Update runner command-generation test expectation for the new single-node torchrun behavior.
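
The single-node launch change can be illustrated with a minimal sketch. The helper `build_torchrun_command` and its parameters are hypothetical and are not taken from the SuperBench runner; only the `torchrun` flags themselves are real. With `--standalone`, torchrun sets up a local rendezvous on its own, so no fixed `--master_port` has to be passed for single-node runs.

```python
from typing import List


def build_torchrun_command(
    script: str,
    nproc_per_node: int,
    nnodes: int = 1,
    node_rank: int = 0,
    master_addr: str = "127.0.0.1",
    master_port: int = 29500,
) -> List[str]:
    """Hypothetical helper: assemble a torchrun argv list.

    This is an illustration only, not the runner's actual implementation.
    """
    if nnodes == 1:
        # Single node: let torchrun manage a local rendezvous itself,
        # so no fixed master port is passed on the command line.
        return [
            "torchrun",
            "--standalone",
            f"--nproc_per_node={nproc_per_node}",
            script,
        ]
    # Multi node: an explicit rendezvous address and port are still needed.
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
    ]
```

For example, `build_torchrun_command("train.py", 8)` yields a command that contains `--standalone` and no port argument, while passing `nnodes=2` falls back to the explicit address and port form.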
-
one authored
* Support rocm in ort-inference
* Add tests
* Update dockerfiles for docker 18
* Install onnx, add params to ort-inference
* Update docs
-
- 23 Apr, 2026 2 commits
one authored
-
one authored
Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.

Implement a shared GPU HPL base that:
- Generates per-workload HPL dat files and parses the corresponding output files.
- Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
- Adds rocHPL-specific tuning parameters for gpu-hpl.
- Formats metric keys from input-derived workload attributes.
- Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.

Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.

Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.

Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.
-
- 21 Apr, 2026 2 commits
one authored
- Add BW150 config
- Update BW1000 config
- Merge summary rules
-
one authored
- Add `numactl` support for local runner modes, including `cpunodebind`, `membind`, and `physcpubind`.
- Add `gpu_affinity` resolution through `sb node topo --get gpu-numa-affinity --gpu-id`.
- Add `sb node topo` support for GPU NUMA topology queries.
- Update BW1000 config to use the new local `numactl` semantics.
- Document the new `numactl` mode fields and limitations.
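
As a rough illustration of the `numactl` mode fields, the sketch below maps them onto a command prefix. The function `build_numactl_prefix` is hypothetical, and how SuperBench actually composes the command (including how `gpu_affinity` is resolved) is not shown in this log. The `numactl` options `--cpunodebind`, `--membind`, and `--physcpubind` are the standard ones.

```python
from typing import List, Optional


def build_numactl_prefix(
    cpunodebind: Optional[str] = None,
    membind: Optional[str] = None,
    physcpubind: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: turn numactl mode fields into an argv prefix.

    Values are passed through as strings, e.g. "0", "0,1" or "0-15".
    """
    args: List[str] = []
    if cpunodebind is not None:
        args.append(f"--cpunodebind={cpunodebind}")  # run only on CPUs of these NUMA nodes
    if membind is not None:
        args.append(f"--membind={membind}")  # allocate memory only from these nodes
    if physcpubind is not None:
        args.append(f"--physcpubind={physcpubind}")  # run only on these physical CPUs
    # With no fields set, return an empty prefix so the command runs unwrapped.
    return ["numactl", *args] if args else []


def wrap_command(command: List[str], **numactl_fields: Optional[str]) -> List[str]:
    """Prepend the numactl prefix (if any) to a benchmark command."""
    return build_numactl_prefix(**numactl_fields) + command
```

For example, `wrap_command(["./bench"], cpunodebind="0", membind="0")` produces `numactl --cpunodebind=0 --membind=0 ./bench`, and a call without any fields leaves the command unchanged.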
-
- 20 Apr, 2026 1 commit
one authored
* Update mem-bw to use BandwidthTest
* Update config and format code
-
- 02 Apr, 2026 1 commit
one authored
-