Commits · b015ceeb53e8e20af7cd43a951b7d41fa6039f8c · tsoc / superbenchmark

23 Apr, 2026 2 commits

Dockerfiles: Use different dockerfiles for pytorch/vllm base images (#16) · b015ceeb
one authored Apr 23, 2026

b015ceeb

Benchmarks: Add gpu-hpl and gpu-hpl-mxp micro benchmarks (#15) · 4fa10f4d

one authored Apr 23, 2026

Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.

Implemented a shared GPU HPL base that:
- Generates per-workload HPL dat files and parses the corresponding output files.
- Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
- Adds rocHPL-specific tuning parameters for gpu-hpl.
- Formats metric keys from input-derived workload attributes.
- Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.
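
A minimal sketch of the output-parsing and warmup-aware aggregation steps. It assumes classic HPL-style result rows (`T/V N NB P Q Time Gflops`), each followed by a residual check ending in `PASSED` or `FAILED`. rocHPL and rocHPL-MxP may print different or extra columns, and the averaging rule, dict layout, and made-up sample below are assumptions, not the benchmark's actual parser.

```python
import re
import statistics

# Classic HPL result row: T/V, N, NB, P, Q, Time (s), Gflops.
# Assumption: rocHPL output may add columns; only these seven are read here.
RESULT_RE = re.compile(
    r"^(?P<tv>W\S+)\s+(?P<n>\d+)\s+(?P<nb>\d+)\s+(?P<p>\d+)\s+(?P<q>\d+)"
    r"\s+(?P<time>[-+.\deE]+)\s+(?P<gflops>[-+.\deE]+)"
)
# Residual-check line that ends with a dotted leader and PASSED/FAILED.
CHECK_RE = re.compile(r"\.{3,}\s*(PASSED|FAILED)\s*$")


def parse_hpl_output(text):
    """Return one dict per timed run, in output order."""
    runs = []
    for line in text.splitlines():
        row = RESULT_RE.match(line.strip())
        if row:
            runs.append({
                "time": float(row.group("time")),
                # Kept in the Gflops unit printed by HPL; the real metric's unit may differ.
                "flops": float(row.group("gflops")),
                "tests_pass": None,  # filled in by the following residual check, if any
            })
            continue
        check = CHECK_RE.search(line)
        if check and runs and runs[-1]["tests_pass"] is None:
            runs[-1]["tests_pass"] = check.group(1) == "PASSED"
    return runs


def aggregate(runs, warmup):
    """Discard the first `warmup` runs, average the rest, and pass only if all passed."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    measured = runs[warmup:]
    if not measured:
        raise ValueError("no measured runs left after warmup")
    return {
        "time": statistics.mean(r["time"] for r in measured),
        "flops": statistics.mean(r["flops"] for r in measured),
        "tests_pass": int(all(r["tests_pass"] for r in measured)),
    }


# Made-up sample in the classic HPL layout (not real benchmark output).
SAMPLE = """\
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       45056   384     1     1              40.00              1.500e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   0.0012300 ...... PASSED
WR11C2R4       45056   384     1     1              37.00              1.600e+03
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   0.0011900 ...... PASSED
"""

if __name__ == "__main__":
    # With one warmup run, only the second run is aggregated.
    print(aggregate(parse_hpl_output(SAMPLE), warmup=1))
```

Dropping warmup runs before averaging keeps one-time costs (first kernel launches, memory allocation) out of the reported `time` and `flops`, while `tests_pass` only reports success if every measured run passed its residual check.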

Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.

Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.

Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.

4fa10f4d

22 Apr, 2026 1 commit
- Merge branch 'main' into dtk · 88cd18df
  one authored Apr 22, 2026
  
  88cd18df
21 Apr, 2026 6 commits

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML (#789) · 3c95714f

Hongtao Zhang authored Apr 21, 2026



Summary

The gpu_stream benchmark has NVIDIA-specific dependencies that prevent it from compiling on ROCm 6.3+. This change makes it CUDA-only and gracefully skips the build with a warning on non-NVIDIA environments.

Problem

The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple NVIDIA-specific dependencies:

1. nvml.h: the NVIDIA Management Library header, used to query actual memory clock rates. It has no HIP equivalent and is referenced in gpu_stream.cu and gpu_stream_utils.hpp.
2. cuda.h in headers: three .hpp files (gpu_stream.hpp, gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h> and <cuda_runtime.h>. hipify-perl processes only the .cu source files, not these headers, so the includes fail to resolve on ROCm.
3. Deprecated hipDeviceProp_t struct fields: the code accesses memoryBusWidth, memoryClockRate, and ECCEnabled from the device properties struct. These fields were removed from hipDeviceProp_t in ROCm 6.3, causing compilation errors after hipification.

The existing ROCm path was marked as incomplete (# TODO: test for ROC) and was never fully functional on recent ROCm versions.

Changes

- Removed the non-functional ROCm/HIP build path from gpu_stream/CMakeLists.txt.
- When CUDA is not found, the build prints a warning and returns gracefully instead of attempting a broken hipify build or raising FATAL_ERROR.
- No changes to the NVIDIA/CUDA build path: it continues to work as before.

Impact

- NVIDIA builds: no change; gpu_stream builds and installs normally.
- ROCm builds: gpu_stream is skipped with a warning message. Previously it failed the entire make cppbuild step, blocking the Docker image build.
- Other benchmarks: unaffected; build.sh continues to the next benchmark after gpu_stream returns.

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

3c95714f

CI/CD - Fix setuptools-scm 10.x compatibility for Python 3.12 (#805) · 8c7e2be0

Hongtao Zhang authored Apr 21, 2026



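## Description
The Python 3.12 CI job fails while running `python3 setup.py lint` with `ModuleNotFoundError: No module named 'vcs_versioning'`. This affects all PRs that run the lint step on that job.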

## Root Cause

Comparing the last successful cpu-unit-test build (58939, Mar 25) with a
recent failing build (58996, Apr 14), the Python 3.12 "Install
dependencies" step shows:

| Package | Successful (Mar 25) | Failing (Apr 14) |
|---|---|---|
| `setuptools-scm` | < 10.0 (no `vcs-versioning` dep) | 10.0.5 (requires `vcs-versioning`) |

`setuptools-scm` 10.0.5 was released between the two runs and added
`vcs-versioning` as a new dependency. The `setup_requires` mechanism in
`setup.py` does not install transitive dependencies, so `vcs-versioning`
is missing at runtime.

The failing build's lint log (Python 3.12) shows: "ModuleNotFoundError: No module named 'vcs_versioning'"

## Changes

- Add `vcs_versioning` explicitly to `setup_requires` in `setup.py` so
it is available when `setuptools-scm` is imported during `setup.py`
execution.

## Testing

Verified that `setuptools-scm` 10.0.5 declares `vcs-versioning` as a
dependency, and the CI failure matches the missing transitive dependency
pattern.
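
The check described under Testing can be reproduced on any environment with a stdlib-only diagnostic sketch (not part of the PR; assumes Python 3.8+). `importlib.metadata.requires` lists the requirement strings an installed distribution declares, and `importlib.util.find_spec` reports whether a top-level module is importable.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name):
    """Requirement strings declared by an installed distribution ([] if not installed)."""
    try:
        return requires(dist_name) or []
    except PackageNotFoundError:
        return []


def missing_modules(module_names):
    """Top-level module names that cannot be imported in this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Does setuptools-scm declare vcs-versioning, and is it importable here?
    print(declared_requirements("setuptools-scm"))
    print(missing_modules(["setuptools_scm", "vcs_versioning"]))
```

Note that the distribution is named `vcs-versioning` while the module it provides is imported as `vcs_versioning`, which is the name in the error message.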
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

8c7e2be0

Benchmarks: Update gpu-hpcg metrics to encode process and problem shape (#8) · 0a1a15ea
one authored Apr 21, 2026
```
* Update gpu-hpcg metrics to encode process and problem shape

* Fix tests
```
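
The commit does not spell out the new key layout. Below is a sketch of one way to encode the process grid and problem shape into a metric name, assuming HPCG's usual decomposition in which each process owns an nx × ny × nz local sub-grid. The `flops_p2x2x1_n104x104x104` layout is hypothetical.

```python
from typing import Sequence, Tuple


def hpcg_metric_key(metric: str, grid: Sequence[int], local_dims: Sequence[int]) -> str:
    """Append the process grid (px x py x pz) and local problem size (nx x ny x nz).

    The key layout is hypothetical, e.g. 'flops_p2x2x1_n104x104x104'.
    """
    if len(grid) != 3 or len(local_dims) != 3:
        raise ValueError("expected a 3-D process grid and 3-D local dimensions")
    if any(v <= 0 for v in (*grid, *local_dims)):
        raise ValueError("all dimensions must be positive")
    px, py, pz = grid
    nx, ny, nz = local_dims
    return f"{metric}_p{px}x{py}x{pz}_n{nx}x{ny}x{nz}"


def global_dims(grid: Sequence[int], local_dims: Sequence[int]) -> Tuple[int, ...]:
    """Global grid size when every process owns one nx*ny*nz sub-grid."""
    return tuple(p * n for p, n in zip(grid, local_dims))


if __name__ == "__main__":
    print(hpcg_metric_key("flops", (2, 2, 1), (104, 104, 104)))
    print(global_dims((2, 2, 1), (104, 104, 104)))
```

Encoding both the grid and the local size keeps results from different process counts or problem sizes under separate keys instead of averaging them together.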
0a1a15ea
SysInfo: Simplify smi commands · d7a56e0b
one authored Apr 21, 2026

d7a56e0b
Config: Update config files (#7) · 511807b7
one authored Apr 21, 2026
```
- Add BW150 config
- Update BW1000 config
- Merge summary rules
```
511807b7

Runner: Add local numactl GPU affinity support (#6) · 0993db75

one authored Apr 21, 2026

- Add `numactl` support for local runner modes, including `cpunodebind`, `membind`, and `physcpubind`.
- Add `gpu_affinity` resolution through `sb node topo --get gpu-numa-affinity --gpu-id`.
- Add `sb node topo` support for GPU NUMA topology queries.
- Update BW1000 config to use the new local `numactl` semantics.
- Document the new `numactl` mode fields and limitations.
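
A sketch of how the `numactl` mode fields could be turned into a command prefix. The field names come from the commit; the `gpu_affinity` placeholder value and the `resolve_gpu_affinity` callback are hypothetical stand-ins for the `sb node topo --get gpu-numa-affinity --gpu-id` lookup, whose output format is not described here. `--cpunodebind=`, `--membind=`, and `--physcpubind=` are standard `numactl` options.

```python
from typing import Callable, Dict, List, Optional

# Field order used when emitting flags (the order is a choice of this sketch).
NUMACTL_FIELDS = ("cpunodebind", "membind", "physcpubind")
# Fields that take NUMA node ids and can therefore use the GPU-affinity placeholder.
NODE_FIELDS = {"cpunodebind", "membind"}
GPU_AFFINITY = "gpu_affinity"  # hypothetical placeholder value


def numactl_prefix(
    mode: Dict[str, str],
    gpu_id: Optional[int] = None,
    resolve_gpu_affinity: Optional[Callable[[int], str]] = None,
) -> List[str]:
    """Return e.g. ['numactl', '--cpunodebind=1', '--membind=1'], or [] if nothing is set."""
    flags = []
    for field in NUMACTL_FIELDS:
        value = mode.get(field)
        if value is None:
            continue
        if value == GPU_AFFINITY:
            if field not in NODE_FIELDS:
                raise ValueError(f"{field} takes CPU ids, not a NUMA node")
            if gpu_id is None or resolve_gpu_affinity is None:
                raise ValueError(f"{field}={GPU_AFFINITY} needs a GPU id and a resolver")
            # Per the commit, the real lookup goes through `sb node topo`; here it is a callback.
            value = resolve_gpu_affinity(gpu_id)
        flags.append(f"--{field}={value}")
    return ["numactl", *flags] if flags else []


def two_gpus_per_node(gpu_id: int) -> str:
    """Hypothetical topology: GPUs 0-1 on node 0, GPUs 2-3 on node 1, and so on."""
    return str(gpu_id // 2)


if __name__ == "__main__":
    print(numactl_prefix({"cpunodebind": GPU_AFFINITY, "membind": GPU_AFFINITY},
                         gpu_id=3, resolve_gpu_affinity=two_gpus_per_node))
```

Restricting the placeholder to node-based fields avoids passing a NUMA node id to `--physcpubind`, which expects CPU ids.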

0993db75

20 Apr, 2026 3 commits
- Update mem-bw to use BandwidthTest (#5) · 800b962a
  one authored Apr 20, 2026
```
* Update mem-bw to use BandwidthTest

* Update config and format code
```
  800b962a
- Support GPU system info collection via hy-smi (#4) · 9ca5e7a9
  one authored Apr 20, 2026
```
* Support GPU system info collection via hy-smi

* Fix typos in docs
```
  9ca5e7a9
- Update perftest submodule · 6d08a565
  one authored Apr 20, 2026
  
  6d08a565
18 Apr, 2026 11 commits

Docs: Update gpu-hpcg introduction · 006f50c0
one authored Apr 18, 2026

006f50c0
Docs: Simplify gpu-hpcg metric list · f076f38f
one authored Apr 18, 2026

f076f38f
Update algolia settings · 338eb5a7
one authored Apr 18, 2026

338eb5a7
Update docs domain · 9f284593
one authored Apr 18, 2026

9f284593
Update docs site URL for custom domain · 2c19ab76
one authored Apr 18, 2026

2c19ab76

Fix some lint warnings (#3) · b31acf90

one authored Apr 18, 2026

* Fix some lint warnings
* Exclude some paths in cpplint
* Fix some tests and formatting

b31acf90

Update ci workflows · 37a70cbe
one authored Apr 18, 2026

37a70cbe
Format python code on branch dtk · 2bf01d5e
one authored Apr 18, 2026

2bf01d5e

Benchmark: Model benchmark - deterministic training support (#731) (#2) · 47d4a79d

one authored Apr 18, 2026



Adds an opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When enabled with --enable-determinism, PyTorch deterministic algorithms are enforced, and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing sb result diagnosis pipeline to verify bit-exact reproducibility, which is useful for hardware validation and platform comparison.
Flags added:

- --enable-determinism: enables the deterministic training mode.
- --check-frequency: number of steps between recorded fingerprint metrics.
- --deterministic-seed: seed used when deterministic mode is enabled.

Changes:

- Updated pytorch_base.py to handle deterministic settings and logging.
- Added a new example script: pytorch_deterministic_example.py.
- Added a test file, test_pytorch_determinism_all.py, to verify that everything works as expected.

Usage:

1. Run 1: run with --enable-determinism; the fingerprint metrics are recorded in the results-summary.jsonl file.
2. Generate the baseline file from the Run 1 results using `sb result generate-baseline`.
3. Run 2: run with --enable-determinism on a different machine (or the same machine); the metrics are again recorded in the results-summary.jsonl file.
4. Run diagnosis on the results of the two runs using the `sb result diagnosis` command.

Notes:

1. Make sure all parameters are identical between the two runs.
2. Running the diagnosis command requires a rules.yaml file.
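
The real feature enforces PyTorch determinism (PyTorch exposes `torch.use_deterministic_algorithms` for this) and compares runs with `sb result diagnosis` and a rules.yaml file. The sketch below only models the idea behind steps 1 to 4 in plain Python: record a fingerprint every `check_frequency` steps, then flag any metric that is not exactly equal across runs. The metric names and the fake training loop are hypothetical.

```python
import random
from typing import Dict, List


def fingerprints(losses: List[float], check_frequency: int) -> Dict[str, float]:
    """Keep the loss of every `check_frequency`-th step (1-based) as a named metric."""
    if check_frequency <= 0:
        raise ValueError("check_frequency must be positive")
    return {
        f"loss_step_{step}": loss  # hypothetical metric name
        for step, loss in enumerate(losses, start=1)
        if step % check_frequency == 0
    }


def mismatches(baseline: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Names whose values differ (exact ==, no tolerance) or are missing in one run.

    A NaN fingerprint never compares equal, so it is always reported.
    """
    names = sorted(set(baseline) | set(current))
    return [n for n in names if baseline.get(n) != current.get(n)]


def fake_training_run(seed: int, steps: int) -> List[float]:
    """Stand-in for a seeded training loop: the same seed yields the same losses."""
    rng = random.Random(seed)
    loss, losses = 10.0, []
    for _ in range(steps):
        loss *= 0.9 + 0.05 * rng.random()
        losses.append(loss)
    return losses


if __name__ == "__main__":
    baseline = fingerprints(fake_training_run(seed=42, steps=20), check_frequency=5)
    rerun = fingerprints(fake_training_run(seed=42, steps=20), check_frequency=5)
    drifted = fingerprints(fake_training_run(seed=7, steps=20), check_frequency=5)
    print(mismatches(baseline, rerun))    # identical runs
    print(mismatches(baseline, drifted))  # different seed
```

Comparing without a tolerance is what makes this a reproducibility check rather than an accuracy check: any drift, however small, is reported.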

---------
Co-authored-by: Aishwarya Tonpe <aishwarya.tonpe25@gmail.com>
Co-authored-by: Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>

47d4a79d

Format python code · 8c28b69a
one authored Apr 18, 2026

8c28b69a
Runner: validate MPI bind-to option and cover configurable bind-to in tests · 655519cb
one authored Apr 18, 2026
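
A sketch of validating a configurable `--bind-to` value before it is passed to `mpirun`. The accepted targets and qualifiers below are an assumption modeled on Open MPI's mpirun documentation; the list differs between releases (for example `socket` in Open MPI 4.x versus `package` in 5.x), so check the MPI version in use.

```python
# Assumed --bind-to targets, modeled on Open MPI's mpirun documentation.
# Releases differ, so treat this set as illustrative rather than authoritative.
BIND_TO_TARGETS = {
    "none", "hwthread", "core", "l1cache", "l2cache", "l3cache",
    "socket", "package", "numa", "board",
}
# Assumed optional qualifiers, written as "target:qual1,qual2".
BIND_TO_QUALIFIERS = {"overload-allowed", "if-supported"}


def validate_bind_to(value: str) -> str:
    """Return the stripped value if it looks like a valid --bind-to argument, else raise."""
    target, _, qualifiers = value.strip().partition(":")
    if target not in BIND_TO_TARGETS:
        raise ValueError(f"unsupported --bind-to target: {target!r}")
    for qualifier in filter(None, qualifiers.split(",")):
        if qualifier not in BIND_TO_QUALIFIERS:
            raise ValueError(f"unsupported --bind-to qualifier: {qualifier!r}")
    return value.strip()


if __name__ == "__main__":
    print(["mpirun", "--bind-to", validate_bind_to("numa")])
```

Rejecting a typo such as `cores` at config time gives a clearer error than a failed `mpirun` launch on every node.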

655519cb

17 Apr, 2026 4 commits
- Improve launch bounds for gpu-copy · eea26d0d
  one authored Apr 17, 2026
  
  eea26d0d
- Update ansible playbooks to suppress warnings · 2ea51c1d
  one authored Apr 17, 2026
```
- Get ansible_user_dir from facts
- Get hostname from facts
- Update NODE_RANK expression
```
  2ea51c1d
- Merge pull request #1 from alephpiece/one/deploy-docs · ad7ae5c4
  one authored Apr 17, 2026
```
Configure GitHub Pages
```
  ad7ae5c4
- Add --container-name for custom docker container name · e1d791d2
  one authored Apr 17, 2026
  
  e1d791d2
15 Apr, 2026 1 commit
- Update GPU vendors · f57d86f4
  one authored Apr 15, 2026
  
  f57d86f4
02 Apr, 2026 9 commits
- Add bw1000 config files (beta) · 49a4389b
  one authored Apr 02, 2026
  
  49a4389b
- Update docker volumes in deploy.yaml · 53e0e494
  one authored Apr 02, 2026
  
  53e0e494
- Update dtk platform detection · 42bc5b87
  one authored Apr 02, 2026
  
  42bc5b87
- Add dtk dockerfile for docker 18 · 4599cd69
  one authored Apr 02, 2026
  
  4599cd69
- Update docs · b8b080e2
  one authored Apr 02, 2026
  
  b8b080e2
- Re-implement kernel launch · 04564997
  one authored Apr 02, 2026
  
  04564997
- Fix runner test · 05cdf5d6
  one authored Apr 02, 2026
  
  05cdf5d6
- Use env file in docker instead of /tmp · c1bc12ce
  one authored Apr 02, 2026
  
  c1bc12ce
- Add topo mapping for dtk26.04 · c128dabb
  one authored Apr 02, 2026
  
  c128dabb
01 Apr, 2026 3 commits
- Update rocHPCG metrics · e514815d
  one authored Apr 01, 2026
  
  e514815d
- Add metric sorters for RCCL tests and rocHPCG · 05e137be
  one authored Apr 01, 2026
  
  05e137be
- Fix rocHPCG metric extraction · 742f203d
  one authored Apr 01, 2026
  
  742f203d
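
The "Add metric sorters for RCCL tests and rocHPCG" commit does not describe the sort order. A common approach for metric names that embed numbers, such as message sizes in RCCL test results, is a natural sort that orders numeric chunks by value rather than lexically. This is a sketch under that assumption; the key names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple, Union

# A capturing group makes re.split keep the digit runs in the result.
_DIGITS = re.compile(r"(\d+)")


def natural_key(name: str) -> Tuple[Union[int, str], ...]:
    # re.split with one capturing group alternates text and digits and always
    # starts with a (possibly empty) text chunk, so even positions are str and
    # odd positions are int; two keys never compare an int against a str.
    parts = _DIGITS.split(name)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def sort_metrics(names: Iterable[str]) -> List[str]:
    """Sort metric names so embedded numbers compare by value (8 < 128 < 1024)."""
    return sorted(names, key=natural_key)


if __name__ == "__main__":
    # Hypothetical RCCL-style keys with the message size embedded in the name.
    keys = ["allreduce_1024_busbw", "allreduce_8_busbw", "allreduce_128_busbw"]
    print(sorted(keys))        # plain lexical order
    print(sort_metrics(keys))  # natural order
```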