1. 18 Apr, 2026 1 commit
    • one's avatar
      Benchmark: Model benchmark - deterministic training support (#731) (#2) · 47d4a79d
      one authored
      
      
      Adds opt-in deterministic training mode to SuperBench's PyTorch model
      benchmarks. When enabled --enable-determinism. PyTorch deterministic
      algorithms are enforced, and per-step numerical fingerprints (loss,
      activation means) are recorded as metrics. These can be compared across
      runs using the existing sb result diagnosis pipeline to verify bit-exact
      reproducibility — useful for hardware validation and platform
      comparison.
       
      Flags added - 
      
      --enable-determinism
      --check-frequency: Number of steps after which you want the metrics to
      be recorded
      --deterministic-seed
      
      Changes - 
      
      Updated pytorch_base.py to handle deterministic settings, logging.
      Added a new example script: pytorch_deterministic_example.py
      Added a test file: test_pytorch_determinism_all.py to verify everything
      works as expected.
      
      Usage - 
      
      Step 1: Run 1 - Run with --enable-determinism and the necessary metrics
      will be recorded in the results-summary.jsonl file
      Step 2: Generate the baseline file from the Run 1 results using - sb
      result generate-baseline
      Step 3: Run 2 - Run with --enable-determinism and the necessary metrics
      will be recorded in the results-summary.jsonl file on a different
      machine (or the same machine)
      Step 4: Run diagnosis on the results generated from the 2 runs using the
      - sb result diagnosis command
      
      Note - 
      1. Make sure all the parameters are constant between the 2 runs 
      2. Running the diagnosis command requires the rules.yaml file
      
      ---------
      Co-authored-by: default avatarAishwarya Tonpe <aishwarya.tonpe25@gmail.com>
      Co-authored-by: default avatarUbuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
      47d4a79d
  2. 08 Oct, 2025 1 commit
    • Hongtao Zhang's avatar
      Enhancement: Add nsys and pytorch profiler debug trace support (#744) · d804dbb6
      Hongtao Zhang authored
      
      
      To improve benchmark debugging, the following debug methods were added:
      
      pytorch profiler in model benchmark
      
      - SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable
      - SB_TORCH_PROFILER_TRACE_DIR: log path
      These 2 runtime variables need to be configured in SB config file.
      
      nsys in SB runner
      
      - SB_ENABLE_NSYS: switch to enable/disable 
      - SB_NSYS_TRACE_DIR: log path
      These 2 runtime variables need to be configured in runner's ENV
      
      ---------
      Co-authored-by: default avatarHongtao Zhang <hongtaozhang@microsoft.com>
      d804dbb6
  3. 25 Jun, 2025 1 commit
  4. 14 Apr, 2023 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.8.0 (#517) · 51761b3a
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.8.0 to main.
      
      **Major Revisions**
      
      * Monitor - Fix the cgroup version checking logic (#502)
      * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
      * Fix wrong torch usage in communication wrapper for Distributed
      Inference Benchmark (#505)
      * Analyzer: Fix bug in python3.8 due to pandas api change (#504)
      * Bug - Fix bug to get metric from cmd when error happens (#506)
      * Monitor - Collect realtime GPU power when benchmarking (#507)
      * Add num_workers argument in model benchmark (#511)
      * Remove unreachable condition when write host list (#512)
      * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
      * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
      * Docs - Upgrade version and release note (#508)
      Co-authored-by: default avatarguoshzhao <guzhao@microsoft.com>
      Co-authored-by: default avatarZiyue Yang <ziyyang@microsoft.com>
      Co-authored-by: default avatarYuting Jiang <yutingjiang@microsoft.com>
      51761b3a
  5. 28 Mar, 2023 1 commit
  6. 25 Mar, 2023 1 commit
  7. 21 Mar, 2023 1 commit
    • Yifan Xiong's avatar
      Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395
      Yifan Xiong authored
      Fix potential barrier timeout in init_process_group due to race
      condition of using the same port. Change to different ports when running
      multiple models sequentially in one process.
      For example, when running vgg11/13/16/19, will use port 29501~29504
      respectively.
      644b5395
  8. 29 Apr, 2022 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.5.0 (#350) · 6681c720
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick  bug fixes from v0.5.0 to main.
      
      **Major Revisions**
      
      * Bug - Force to fix ort version as '1.10.0' (#343)
      * Bug - Support no matching rules and unify the output name in result_summary (#345)
      * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
      * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
      * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
      * Docs - Upgrade version and release note (#348)
      Co-authored-by: default avatarYuting Jiang <v-yutjiang@microsoft.com>
      6681c720
  9. 10 Feb, 2022 1 commit
  10. 28 Jan, 2022 1 commit
  11. 28 Sep, 2021 1 commit
  12. 27 Sep, 2021 1 commit
  13. 26 Sep, 2021 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.3.0 (#212) · dfbd70b1
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick  bug fixes from v0.3.0 to main.
      
      **Major Revisions**
      * Docs - Upgrade version and release note (#209)
      * Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
      * Benchmarks: Update - Update benchmarks in configuration file (#208)
      * CI/CD - Update GitHub Action VM (#211)
      * Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
      * CI/CD - Fix bug in build image for push event (#205)
      * Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
      * Tool: Fix bug - Fix function naming issue in system info  (#200)
      * CI/CD - Push images in GitHub Action (#202)
      * Bug - Fix torch.distributed command for single node (#201)
      * CLI - Integrate system info for node (#199)
      * Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
      * CI/CD - Add ROCm image build in GitHub Actions (#194)
      * Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
      * Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
      * Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
      * Bug - Revise 'docker run' in sb deploy (#195)
      * Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
      Co-authored-by: default avatarYuting Jiang <v-yujiang@microsoft.com>
      Co-authored-by: default avatarGuoshuai Zhao <guzhao@microsoft.com>
      Co-authored-by: default avatarZiyue Yang <ziyyang@microsoft.com>
      dfbd70b1
  14. 29 Jul, 2021 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.2.1 (#142) · 69b2c631
      Yifan Xiong authored
      __Description__
      Cherry-pick bug fixes from v0.2.1 to main.
      
      __Major Revisions__
      * Fix bug of VGG models failed on A100 GPU with batch_size=128.
      * Fix Ansible connection issue when running in localhost.
      * Update version in packages and docs.
      69b2c631
  15. 28 Jun, 2021 1 commit
  16. 07 Jun, 2021 1 commit
  17. 19 May, 2021 1 commit
  18. 12 Apr, 2021 1 commit
  19. 15 Mar, 2021 1 commit
  20. 09 Mar, 2021 2 commits
  21. 08 Mar, 2021 1 commit