1. 24 Apr, 2026 2 commits
      Benchmark: Update overlap and sharding matmul benchmarks (#19) · a961ebd4
      one authored
      - Enable `computation-communication-overlap` and `sharding-matmul` in
      some configs through the existing PyTorch distributed mode.
      - Use `torchrun --standalone` for single-node `torch.distributed` runs
      to avoid fixed rendezvous port conflicts on 29500.
      - Update runner command-generation test expectation for the new
      single-node torchrun behavior.
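      A minimal sketch of the single-node vs. multi-node command choice described above. The helper
      `build_torchrun_command` is hypothetical, not the runner's actual code. With `--standalone`,
      torchrun starts a local rendezvous on a free port, so concurrent single-node jobs no longer
      contend for the fixed default port 29500.

```python
import shlex
from typing import List, Optional


def build_torchrun_command(script: str, nproc_per_node: int, nnodes: int = 1,
                           node_rank: int = 0, master_addr: Optional[str] = None,
                           master_port: int = 29500) -> List[str]:
    """Build a torchrun argv list (hypothetical helper for illustration)."""
    cmd = ["torchrun", f"--nproc_per_node={nproc_per_node}"]
    if nnodes == 1:
        # Single node: let torchrun pick a free local rendezvous port
        # instead of binding the fixed master port.
        cmd.append("--standalone")
    else:
        # Multi-node: all nodes must agree on an explicit master address/port.
        cmd += [f"--nnodes={nnodes}", f"--node_rank={node_rank}",
                f"--master_addr={master_addr}", f"--master_port={master_port}"]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_torchrun_command("train.py", 8)))
```

      Returning an argv list rather than a string keeps quoting explicit; `shlex.join` renders it
      for logs or a shell.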
      Benchmark: Update ort-inference for ROCm platform (#18) · c77bfe36
      one authored
      * Support rocm in ort-inference
      
      * Add tests
      
      * Update dockerfiles for docker 18
      
      * Install onnx, add params to ort-inference
      
      * Update docs
  2. 23 Apr, 2026 2 commits
      Config: Update dist-inference params (#17) · 52450b5f
      one authored
      Benchmarks: Add gpu-hpl and gpu-hpl-mxp micro benchmarks (#15) · 4fa10f4d
      one authored
      Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.
      
      Implemented a shared GPU HPL base that:
      - Generates per-workload HPL dat files and parses the corresponding output files.
      - Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
      - Adds rocHPL-specific tuning parameters for gpu-hpl.
      - Formats metric keys from input-derived workload attributes.
      - Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.
      
      Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.
      
      Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.
      
      Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.
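      One plausible reading of "warmup-aware aggregation" for the `flops`, `time`, and `tests_pass`
      metrics, sketched under assumptions: the first `warmup` iterations are discarded, the rest
      are averaged, and `tests_pass` is 1 only if every measured iteration passed. The function
      name, the per-run dict layout, and the use of the mean are hypothetical, not the benchmark's
      confirmed behavior.

```python
from statistics import mean
from typing import Dict, List


def aggregate_hpl_runs(runs: List[dict], warmup: int) -> Dict[str, float]:
    """Aggregate per-iteration HPL results after dropping warmup runs.

    Hypothetical sketch: each run dict holds 'flops', 'time' and 'passed'.
    """
    measured = runs[warmup:]
    if not measured:
        raise ValueError("no measured iterations left after warmup")
    return {
        "flops": mean(r["flops"] for r in measured),
        "time": mean(r["time"] for r in measured),
        # 1 only if every measured iteration passed (e.g. HPL's residual check).
        "tests_pass": int(all(r["passed"] for r in measured)),
    }
```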
  3. 21 Apr, 2026 2 commits
      Config: Update config files (#7) · 511807b7
      one authored
      - Add BW150 config
      - Update BW1000 config
      - Merge summary rules
      Runner: Add local numactl GPU affinity support (#6) · 0993db75
      one authored
      - Add `numactl` support for local runner modes, including `cpunodebind`, `membind`, and `physcpubind`.
      - Add `gpu_affinity` resolution through `sb node topo --get gpu-numa-affinity --gpu-id`.
      - Add `sb node topo` support for GPU NUMA topology queries.
      - Update BW1000 config to use the new local `numactl` semantics.
      - Document the new `numactl` mode fields and limitations.
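      A minimal sketch of how the three `numactl` fields could be turned into a command prefix.
      The helper name is hypothetical. Here the NUMA node is passed in directly, whereas the commit
      resolves it through `sb node topo --get gpu-numa-affinity --gpu-id`. The flags themselves
      (`--cpunodebind`, `--membind`, `--physcpubind`) are standard `numactl` options.

```python
from typing import List, Optional


def build_numactl_prefix(cpunodebind: Optional[str] = None,
                         membind: Optional[str] = None,
                         physcpubind: Optional[str] = None) -> List[str]:
    """Return a numactl argv prefix, or [] if no binding is requested.

    Hypothetical helper for illustration, not SuperBench's API.
    """
    args = []
    if cpunodebind is not None:
        args.append(f"--cpunodebind={cpunodebind}")  # run on CPUs of these NUMA nodes
    if membind is not None:
        args.append(f"--membind={membind}")  # allocate memory only from these nodes
    if physcpubind is not None:
        args.append(f"--physcpubind={physcpubind}")  # pin to explicit CPU IDs
    return ["numactl", *args] if args else []


if __name__ == "__main__":
    # e.g. a GPU whose affinity resolved to NUMA node 1
    print(build_numactl_prefix(cpunodebind="1", membind="1") + ["python", "bench.py"])
```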
  4. 20 Apr, 2026 1 commit
  5. 02 Apr, 2026 1 commit