- 18 Apr, 2026 3 commits
-
-
one authored
Adds an opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When enabled with --enable-determinism, PyTorch deterministic algorithms are enforced, and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing sb result diagnosis pipeline to verify bit-exact reproducibility, which is useful for hardware validation and platform comparison.

Flags added:
- --enable-determinism
- --check-frequency: number of steps after which the metrics are recorded
- --deterministic-seed

Changes:
- Updated pytorch_base.py to handle deterministic settings and logging.
- Added a new example script: pytorch_deterministic_example.py
- Added a test file, test_pytorch_determinism_all.py, to verify everything works as expected.

Usage:
- Step 1: Run 1. Run with --enable-determinism; the necessary metrics are recorded in the results-summary.jsonl file.
- Step 2: Generate the baseline file from the Run 1 results using sb result generate-baseline.
- Step 3: Run 2. Run with --enable-determinism on a different machine (or the same machine); the necessary metrics are again recorded in the results-summary.jsonl file.
- Step 4: Run diagnosis on the results generated from the two runs using the sb result diagnosis command.

Note:
1. Make sure all the parameters are constant between the two runs.
2. Running the diagnosis command requires the rules.yaml file.

---------
Co-authored-by:
Aishwarya Tonpe <aishwarya.tonpe25@gmail.com> Co-authored-by:
Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
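The idea of comparing per-step fingerprints for bit-exact equality can be sketched in plain Python. This is only an illustration of the concept, not SuperBench's implementation: the actual comparison goes through sb result diagnosis and a rules.yaml file. The function names, the metric names (`loss`, `act_mean`), the dictionary layout, and the sample values below are all hypothetical.

```python
import struct


def float_bits(value: float) -> str:
    """Return the IEEE-754 double bit pattern of `value` as a hex string."""
    return struct.pack('>d', value).hex()


def compare_fingerprints(baseline, candidate):
    """Compare two {step: {metric: value}} mappings bit for bit.

    Returns a list of (step, metric, baseline_value, candidate_value)
    tuples, one for every metric that is missing on one side or whose
    bit pattern differs.
    """
    mismatches = []
    for step in sorted(set(baseline) | set(candidate)):
        base_metrics = baseline.get(step, {})
        cand_metrics = candidate.get(step, {})
        for name in sorted(set(base_metrics) | set(cand_metrics)):
            a = base_metrics.get(name)
            b = cand_metrics.get(name)
            if a is None or b is None or float_bits(a) != float_bits(b):
                mismatches.append((step, name, a, b))
    return mismatches


# Hypothetical fingerprints from two runs; the loss at step 100 differs
# by a single unit in the last place.
run1 = {0: {'loss': 2.302585, 'act_mean': 0.125},
        100: {'loss': 1.75, 'act_mean': 0.1171875}}
run2 = {0: {'loss': 2.302585, 'act_mean': 0.125},
        100: {'loss': 1.7500000000000002, 'act_mean': 0.1171875}}

mismatches = compare_fingerprints(run1, run2)
for step, name, a, b in mismatches:
    print(f'step {step}: {name} differs ({a!r} vs {b!r})')
```

Comparing bit patterns rather than using a tolerance is what makes the check "bit-exact": a difference of one unit in the last place is reported, which a tolerance-based comparison would hide.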
-
one authored
-
one authored
-
- 17 Apr, 2026 4 commits
- 15 Apr, 2026 1 commit
-
-
one authored
-
- 02 Apr, 2026 9 commits
- 01 Apr, 2026 7 commits
- 31 Mar, 2026 1 commit
-
-
one authored
-
- 27 Mar, 2026 1 commit
-
-
one authored
-
- 25 Mar, 2026 1 commit
-
-
one authored
-
- 20 Mar, 2026 1 commit
-
-
one authored
-
- 19 Mar, 2026 3 commits
-
-
one authored
-
one authored
- Added Platform.DTK in the microbenchmark framework.
- Introduced new DTK hipblaslt benchmark class and corresponding tests.
- Updated Dockerfile to include hipblaslt-bench and its permissions.
- Registered DTK benchmarks in the benchmark registry for various performance tests.
- Enhanced GPU detection logic to recognize HYGON GPUs.

This update improves the benchmarking capabilities for DTK, ensuring compatibility and performance testing across platforms.
-
one authored
- Update rocm_common.cmake for CMake>=3.24
- Prevent isolation build
- Add BabelStream as a submodule
- Update dockerignore
-
- 17 Mar, 2026 1 commit
-
-
one authored
-
- 11 Mar, 2026 1 commit
-
-
Hongtao Zhang authored
## Summary
- Upgrade Intel Memory Latency Checker from v3.11 to v3.12 in rocm5.0.x.dockerfile
- Aligns with other dockerfiles that already use v3.12

Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Claude Opus 4.5 <noreply@anthropic.com>
-
- 04 Feb, 2026 1 commit
-
-
WenqingLan1 authored
Updated the third-party submodule gpu-burn to the newest version, adding implementation and doc support for CUDA 13.0.

Co-authored-by: guoshzhao <guzhao@microsoft.com>
-
- 28 Jan, 2026 1 commit
-
-
Hongtao Zhang authored
**Description**
- When building the CUDA 11.1.1 image, pip (Python 3.8) cannot find a pre-built wheel for the latest wandb release (v0.23.1), so it attempts to build wandb from source. The build fails because the image does not have Go installed, which is required for building wandb from source.

**Solution**
- For the CUDA 11.1.1 build, install the build tools (e.g., Go, Rust, and Cargo) needed for wandb.

---------
Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com>
-
- 21 Dec, 2025 1 commit
-
-
Hongtao Zhang authored
**Description**
Azure pipeline cpu-unit-test failed for "2025-12-10T03:47:59.0628597Z ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device"

**Root Cause**
This happens because the matrix jobs (Python 3.7, 3.10, 3.12) run in parallel and share the same VM's disk. Python 3.12 downloads newer/larger packages (especially PyTorch and NVIDIA CUDA libraries, which are ~3GB+), and when multiple jobs run simultaneously, they exhaust the disk space.

**Fix**
Disable the cache usage when installing SB.

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 04 Dec, 2025 1 commit
-
-
Henry Li authored
**Description**
The ib-loopback test regressed due to this recent [change](https://github.com/microsoft/superbenchmark/commit/c65ae56713d6bfcc4a3be37d7fe24779590f9791). When running ib-loopback with the standard [config](https://github.com/microsoft/superbenchmark/blob/c65ae56713d6bfcc4a3be37d7fe24779590f9791/superbench/config/default.yaml#L69), the test fails because numeric values like `0` are passed into the test command, and these are not valid IB device names.

Example failure:
```
[2025-11-25 22:08:38,100 vmssnc6ec000003:141056][micro_base.py:200][INFO] Execute command - round: 0, benchmark: ib-loopback, command: /usr/local/bin/run_perftest_loopback 47 45 /usr/local/bin/ib_write_bw -s 8388608 -F --iters=20000 -d 0 -p 45617 -x 0 --report_gbits.
[0]: IB device 0 not found
Unable to find the Infiniband/RoCE device
IB device 0 not found
Unable to find the Infiniband/RoCE device
[2025-11-25 22:08:39,113 vmssnc6ec000003:141056][micro_base.py:209][ERROR] Microbenchmark execution failed - round: 0, benchmark: ib-loopback, error message: IB device 0 not found
Unable to find the Infiniband/RoCE device
IB device 0 not found
Unable to find the Infiniband/RoCE device
```

---------
Co-authored-by:
Henry Li <lihl@microsoft.com>
-
- 17 Nov, 2025 1 commit
-
-
Yuting Jiang authored
Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB device by MPI local rank in ib validation (#733)

**Description**
Add --set_ib_devices option to auto-select IB device by MPI local rank.

**Major Revision**
- Add a new CLI flag --set_ib_devices to automatically select irregular IB devices based on the MPI local rank.
- When enabled, the benchmark queries available IB devices via network.get_ib_devices() and selects the device corresponding to OMPI_COMM_WORLD_LOCAL_RANK.
- Fall back to existing --ib_dev behavior when the flag is not provided.

**Minor Revision**
- Add an env in network.get_ib_devices() to allow the user to set the device name.
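The selection rule described above (pick the device at the index given by OMPI_COMM_WORLD_LOCAL_RANK) can be sketched as follows. This is a minimal illustration and not the code from the commit: `select_ib_device` is a hypothetical helper, the device names are made up, and the hard-coded list stands in for whatever network.get_ib_devices() returns on a real node.

```python
import os


def select_ib_device(devices, env=None):
    """Pick the IB device whose index matches the MPI local rank.

    `devices` is a list of device names; `env` defaults to os.environ
    and is injectable so the function can be exercised without MPI.
    """
    env = os.environ if env is None else env
    rank_text = env.get('OMPI_COMM_WORLD_LOCAL_RANK')
    if rank_text is None:
        raise RuntimeError('OMPI_COMM_WORLD_LOCAL_RANK is not set')
    rank = int(rank_text)
    if not 0 <= rank < len(devices):
        raise IndexError(
            f'local rank {rank} has no matching device in {devices}')
    return devices[rank]


# Hypothetical, irregularly numbered device names, which is the case
# where a plain "mlx5_<rank>" naming assumption would break.
example_devices = ['mlx5_0', 'mlx5_2', 'mlx5_5']
chosen = select_ib_device(
    example_devices, {'OMPI_COMM_WORLD_LOCAL_RANK': '1'})
print(chosen)
```

Indexing into the discovered list, rather than building a name from the rank, is what lets this approach cope with nodes whose device numbering has gaps.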
-
- 06 Nov, 2025 1 commit
-
-
WenqingLan1 authored
Updated mlc wget link in dockerfiles.

---------
Co-authored-by: guoshzhao <guzhao@microsoft.com>
-
- 05 Nov, 2025 1 commit
-
-
Hongtao Zhang authored
The Python 3.10 verification pipeline failed because of a conflicting 'setuptools' version, as shown below.

<img width="1157" height="622" alt="image" src="https://github.com/user-attachments/assets/ba0f6045-4b92-4fd8-b92f-1c474725534c" />

Root Cause: Modern pip (25.3) uses an isolated build environment with the latest setuptools by default. The pipeline installs setuptools 65.7 in the user environment, but pip builds the package in an isolated environment with a newer setuptools, which conflicts with the version check in [setup.py].

Solution: Remove pip upgrade.

---------
Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-