1. 20 Apr, 2026 1 commit
  2. 18 Apr, 2026 5 commits
    • one's avatar
      Fix some lint warnings (#3) · b31acf90
      one authored
      * Fix some lint warnings
      * Exclude some paths in cpplint
      * Fix some tests and formatting
      b31acf90
    • one's avatar
      Format python code on branch dtk · 2bf01d5e
      one authored
      2bf01d5e
    • one's avatar
      Benchmark: Model benchmark - deterministic training support (#731) (#2) · 47d4a79d
      one authored
      
      
      Adds opt-in deterministic training mode to SuperBench's PyTorch model
      benchmarks. When enabled --enable-determinism. PyTorch deterministic
      algorithms are enforced, and per-step numerical fingerprints (loss,
      activation means) are recorded as metrics. These can be compared across
      runs using the existing sb result diagnosis pipeline to verify bit-exact
      reproducibility — useful for hardware validation and platform
      comparison.
       
      Flags added - 
      
      --enable-determinism
      --check-frequency: Number of steps after which you want the metrics to
      be recorded
      --deterministic-seed
      
      Changes - 
      
      Updated pytorch_base.py to handle deterministic settings, logging.
      Added a new example script: pytorch_deterministic_example.py
      Added a test file: test_pytorch_determinism_all.py to verify everything
      works as expected.
      
      Usage - 
      
      Step 1: Run 1 - Run with --enable-determinism and the necessary metrics
      will be recorded in the results-summary.jsonl file
      Step 2: Generate the baseline file from the Run 1 results using - sb
      result generate-baseline
      Step 3: Run 2 - Run with --enable-determinism and the necessary metrics
      will be recorded in the results-summary.jsonl file on a different
      machine (or the same machine)
      Step 4: Run diagnosis on the results generated from the 2 runs using the
      - sb result diagnosis command
      
      Note - 
      1. Make sure all the parameters are constant between the 2 runs 
      2. Running the diagnosis command requires the rules.yaml file
      
      ---------
      Co-authored-by: default avatarAishwarya Tonpe <aishwarya.tonpe25@gmail.com>
      Co-authored-by: default avatarUbuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
      47d4a79d
    • one's avatar
      Format python code · 8c28b69a
      one authored
      8c28b69a
    • one's avatar
  3. 17 Apr, 2026 3 commits
  4. 15 Apr, 2026 1 commit
  5. 02 Apr, 2026 5 commits
  6. 01 Apr, 2026 5 commits
  7. 27 Mar, 2026 1 commit
  8. 25 Mar, 2026 1 commit
  9. 19 Mar, 2026 3 commits
    • one's avatar
      Migrate gpu-stream to BabelStream v5.0 · d4051602
      one authored
      d4051602
    • one's avatar
      Enhance DTK platform support and GPU detection · 1a57f2d6
      one authored
      - Added Platform.DTK in the microbenchmark framework.
      - Introduced new DTK hipblaslt benchmark class and corresponding tests.
      - Updated Dockerfile to include hipblaslt-bench and its permissions.
      - Registered DTK benchmarks in the benchmark registry for various performance tests.
      - Enhanced GPU detection logic to recognize HYGON GPUs.
      
      This update improves the benchmarking capabilities for DTK, ensuring compatibility and performance testing across platforms.
      1a57f2d6
    • one's avatar
      Update DTK dockerfile and microbenchmarks · c4f39919
      one authored
      - Update rocm_commom.cmake for CMake>=3.24
      - Prevent isolation build
      - Add BabelStream as a submodule
      - Update dockerignore
      c4f39919
  10. 28 Jan, 2026 1 commit
  11. 04 Dec, 2025 1 commit
  12. 17 Nov, 2025 1 commit
    • Yuting Jiang's avatar
      Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB... · c65ae567
      Yuting Jiang authored
      Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB device by MPI local rank in ib validation (#733)
      
      **Description**
      add --set_ib_devices option to auto-select IB device by MPI local rank 
      
      
      **Major Revision**
      - Add a new CLI flag --set_ib_devices to automatically select irregular
      IB devices based on the MPI local rank.
      - When enabled, the benchmark queries available IB devices via
      network.get_ib_devices() and selects the device corresponding to
      OMPI_COMM_WORLD_LOCAL_RANK.
      - Fall back to existing --ib_dev behavior when the flag is not provided.
      
      **Minor Revision**
      - Add an env in network.get_ib_devices() to allow user to set the device
      name
      c65ae567
  13. 23 Oct, 2025 1 commit
    • Yuting Jiang's avatar
      Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm (#740) · f6e65a98
      Yuting Jiang authored
      **Description**
      This PR adds NCU (NVIDIA Nsight Compute) profiling support to the
      cublaslt-gemm micro benchmark, enabling detailed kernel analysis
      including DRAM throughput, compute throughput, and launch arguments.
      
      **Major Revision**
      - Add --enable_ncu_profiling and --profiling_metrics for ncu profiling
      - Modifies command execution to use NCU when profiling is enabled
      - Updates result parsing to handle both standard and NCU profiled output
      formats
      f6e65a98
  14. 22 Oct, 2025 1 commit
  15. 08 Oct, 2025 1 commit
    • Hongtao Zhang's avatar
      Enhancement: Add nsys and pytorch profiler debug trace support (#744) · d804dbb6
      Hongtao Zhang authored
      
      
      To improve benchmark debugging, the following debug methods were added:
      
      pytorch profiler in model benchmark
      
      - SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable
      - SB_TORCH_PROFILER_TRACE_DIR: log path
      These 2 runtime variables need to be configured in SB config file.
      
      nsys in SB runner
      
      - SB_ENABLE_NSYS: switch to enable/disable 
      - SB_NSYS_TRACE_DIR: log path
      These 2 runtime variables need to be configured in runner's ENV
      
      ---------
      Co-authored-by: default avatarHongtao Zhang <hongtaozhang@microsoft.com>
      d804dbb6
  16. 01 Oct, 2025 1 commit
  17. 29 Sep, 2025 2 commits
  18. 19 Sep, 2025 1 commit
  19. 12 Aug, 2025 1 commit
  20. 30 Jun, 2025 1 commit
  21. 26 Jun, 2025 1 commit
  22. 25 Jun, 2025 1 commit
  23. 24 Jun, 2025 1 commit