- 18 Apr, 2026 1 commit
-
-
one authored
Adds an opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When enabled with `--enable-determinism`, PyTorch deterministic algorithms are enforced, and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing `sb result diagnosis` pipeline to verify bit-exact reproducibility, which is useful for hardware validation and platform comparison.

Flags added:
- `--enable-determinism`
- `--check-frequency`: number of steps after which the metrics are recorded
- `--deterministic-seed`

Changes:
- Updated `pytorch_base.py` to handle deterministic settings and logging.
- Added a new example script: `pytorch_deterministic_example.py`.
- Added a test file, `test_pytorch_determinism_all.py`, to verify that everything works as expected.

Usage:
- Step 1 (Run 1): Run with `--enable-determinism`; the necessary metrics are recorded in the `results-summary.jsonl` file.
- Step 2: Generate the baseline file from the Run 1 results using `sb result generate-baseline`.
- Step 3 (Run 2): Run with `--enable-determinism` on a different machine (or the same machine); the necessary metrics are again recorded in the `results-summary.jsonl` file.
- Step 4: Run diagnosis on the results of the two runs using the `sb result diagnosis` command.

Note:
1. Make sure all parameters are identical between the two runs.
2. Running the diagnosis command requires the `rules.yaml` file.

--------- Co-authored-by:
Aishwarya Tonpe <aishwarya.tonpe25@gmail.com> Co-authored-by:
Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
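The comparison idea behind the four usage steps above can be pictured with a small standalone sketch. This is not the `sb result diagnosis` implementation: the record layout, the `deterministic_` prefix, and the metric names are all hypothetical and chosen only for illustration. The sketch assumes that "bit-exact" means the recorded fingerprint values must be identical, while ordinary performance metrics are allowed to differ.

```python
import json


def load_summary(jsonl_text):
    """Parse JSON Lines text into a list of dicts, one per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def diff_fingerprints(baseline, candidate, prefix="deterministic_"):
    """Return {metric: (baseline_value, candidate_value)} for non-identical fingerprints.

    Only keys starting with `prefix` (a hypothetical naming scheme) are compared,
    and the comparison is exact equality rather than a tolerance check.
    """
    mismatches = {}
    for key, expected in baseline.items():
        if not key.startswith(prefix):
            continue  # performance metrics such as throughput may legitimately vary
        actual = candidate.get(key)
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


# Hypothetical one-line summaries standing in for two results-summary.jsonl files.
RUN1 = '{"deterministic_loss_step_0": 2.3025851, "deterministic_loss_step_100": 1.9172304, "throughput": 512.4}'
RUN2 = '{"deterministic_loss_step_0": 2.3025851, "deterministic_loss_step_100": 1.9172311, "throughput": 498.7}'

baseline = load_summary(RUN1)[0]
candidate = load_summary(RUN2)[0]
mismatches = diff_fingerprints(baseline, candidate)
print(mismatches)
```

An empty result would indicate that every recorded fingerprint matched; any entry points at the first place the two runs diverged.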
-
- 29 Sep, 2025 1 commit
-
-
Yuting Jiang authored
**Description** Add an option to exclude data copy time in model benchmarks. **Major Revision** - Add the option `--no_copy`. - Move the timing start point to after the data copy finishes.
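The effect of moving the start time can be illustrated with a minimal sketch. It is not the SuperBench code: `time_step`, `copy_batch`, `train_step`, and the injectable `clock` are hypothetical names, and a fake clock is used so the numbers are deterministic.

```python
import time


def time_step(copy_batch, train_step, no_copy=False, clock=time.perf_counter):
    """Return the measured duration of one step in seconds.

    With no_copy=True the timer starts only after the data copy has finished,
    so the copy time is excluded from the measurement.
    """
    if no_copy:
        batch = copy_batch()
        start = clock()
    else:
        start = clock()
        batch = copy_batch()
    train_step(batch)
    return clock() - start


class FakeClock:
    """Manually advanced clock, used here instead of real time for repeatability."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


fake = FakeClock()


def copy_batch():
    fake.advance(0.25)  # pretend the host-to-device copy takes 0.25 s
    return [1, 2, 3]


def train_step(batch):
    fake.advance(1.0)  # pretend the compute takes 1.0 s


with_copy = time_step(copy_batch, train_step, no_copy=False, clock=fake)
without_copy = time_step(copy_batch, train_step, no_copy=True, clock=fake)
print(with_copy, without_copy)
```

The only difference between the two branches is where `start` is taken, which mirrors the "move start time after data copy finish" revision.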
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main. **Major Revisions** * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - U...
-
- 07 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 28 Apr, 2023 1 commit
-
-
guoshzhao authored
**Description** Model benchmarks can stop based on the `num_steps` or `duration` config; each takes effect only when its value is set greater than 0. If both are set greater than 0, the benchmark stops at whichever condition is reached first.
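A minimal sketch of this stop rule follows. The function name and signature are assumptions made for illustration, and warmup steps are deliberately left out; the real check in SuperBench may differ in those details.

```python
def is_finished(curr_step, elapsed_seconds, num_steps=0, duration=0):
    """Return True once any enabled stop condition has been reached.

    A limit is enabled only when its value is greater than 0. If both are
    enabled, whichever is reached first ends the benchmark.
    """
    step_limit_hit = num_steps > 0 and curr_step >= num_steps
    time_limit_hit = duration > 0 and elapsed_seconds >= duration
    return step_limit_hit or time_limit_hit


# Step limit reached before the time limit.
print(is_finished(curr_step=100, elapsed_seconds=5, num_steps=100, duration=60))
# Time limit reached before the step limit.
print(is_finished(curr_step=10, elapsed_seconds=61, num_steps=100, duration=60))
# Both limits disabled: this check never stops the run.
print(is_finished(curr_step=10**6, elapsed_seconds=10**6))
```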
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 30 Dec, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Add stdout logging util module and enable real-time log flushing in executor. **Major Revision** - Add stdout logging util module to redirect stdout into a log file - Enable stdout logging in executor to write benchmark output into both stdout and the file `sb-bench.log` - Enable real-time log flushing in run_command of microbenchmarks through config `log_flushing` **Minor Revision** - Add log_n_step args to enable regular step time logging in model benchmarks - Update related docs
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.5.0 to main. **Major Revisions** * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 01 Apr, 2022 1 commit
-
-
guoshzhao authored
**Description** Use config `log_raw_data` to control whether to log the raw data into a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save the raw data into a file.
-
- 28 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Please write a brief description and link the related issue if any. **Major Revision** - Sync (do allreduce max) the E2E training results among all workers. - Avoid using ':0' in the metric name if only one rank has output.
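The two revisions can be pictured with a pure-Python simulation. This is a sketch only: it does not use `torch.distributed`, and `allreduce_max`, `metric_names`, and the `steptime` metric are hypothetical names. It assumes every rank holds an equally long list of per-step results and that rank suffixes take the form `name:rank`, as the ':0' in the message suggests.

```python
def allreduce_max(per_rank_results):
    """Simulate an allreduce with the max operator over per-rank result lists.

    Each position in the output is the maximum of that position across ranks;
    after a real allreduce every rank would hold this same list.
    """
    lengths = {len(results) for results in per_rank_results}
    if len(lengths) != 1:
        raise ValueError("all ranks must report the same number of results")
    return [max(values) for values in zip(*per_rank_results)]


def metric_names(base, ranks_with_output):
    """Return metric names, adding a ':rank' suffix only when several ranks report."""
    if len(ranks_with_output) == 1:
        return [base]
    return [f"{base}:{rank}" for rank in ranks_with_output]


# Hypothetical per-step times (ms) measured on two workers.
rank0 = [10.0, 12.5, 11.0]
rank1 = [11.0, 12.0, 11.5]
synced = allreduce_max([rank0, rank1])
print(synced)
print(metric_names("steptime", [0]))
print(metric_names("steptime", [0, 1]))
```

Taking the maximum means that, for step times, the slowest worker's value is what every rank ends up reporting.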
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
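As an illustration of what these metrics represent, the sketch below computes the same five percentiles from a list of latencies using linear interpolation between neighbouring samples. The helper names and the `"50th"`-style keys are hypothetical, and the interpolation convention is an assumption; the benchmark itself may use a different method or naming.

```python
def percentile(sorted_values, pct):
    """Percentile of an ascending list, linearly interpolating between samples."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def latency_percentiles(latencies_ms, pcts=(50, 90, 95, 99, 99.9)):
    """Return a dict mapping labels such as '99.9th' to the latency percentile."""
    ordered = sorted(latencies_ms)
    return {f"{pct}th": percentile(ordered, pct) for pct in pcts}


# Hypothetical latencies of 1..101 ms, shuffled order does not matter.
latencies = list(range(101, 0, -1))
result = latency_percentiles(latencies)
print(result)
```

Tail percentiles such as the 99.9th only become meaningful with enough samples; with few samples they collapse toward the maximum.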
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32; it only takes effect on Ampere or newer GPUs.
-
- 06 Aug, 2021 2 commits
- 28 Jun, 2021 1 commit
-
-
guoshzhao authored
* add config file for ndv4.
-
- 21 Jun, 2021 1 commit
-
-
guoshzhao authored
Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments for micro benchmark. (#100)
-
- 04 Jun, 2021 1 commit
-
-
guoshzhao authored
* fix return code reset issue
-
- 19 May, 2021 1 commit
-
-
Yuting Jiang authored
-
- 26 Apr, 2021 1 commit
-
-
guoshzhao authored
-
- 08 Apr, 2021 1 commit
-
-
guoshzhao authored
* revise result process interface * add more comments Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 22 Mar, 2021 1 commit
-
-
guoshzhao authored
Benchmarks: Add Feature - Add benchmark finish check according to num_warmup/num_steps and duration in ModelBenchmark class. (#25) * add is_finished function * reuse current time. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 18 Mar, 2021 1 commit
-
-
guoshzhao authored
* add sample_count argument. * handle more conditions. * address comments. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 09 Mar, 2021 2 commits
-
-
guoshzhao authored
* add flag to disable GPU. * fix spelling * fix unittest. * address comments. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
guoshzhao authored
Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 08 Mar, 2021 1 commit
-
-
guoshzhao authored
* add optimizer definition and function to create torch optimizer. * move optimizer enum into model_base module. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 04 Mar, 2021 1 commit
-
-
guoshzhao authored
Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 24 Feb, 2021 1 commit
-
-
guoshzhao authored
* benchmarks init. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-