1. 21 Apr, 2026 1 commit
    • Runner: Add local numactl GPU affinity support (#6) · 0993db75
      one authored
      - Add `numactl` support for local runner modes, including `cpunodebind`, `membind`, and `physcpubind`.
      - Add `gpu_affinity` resolution through `sb node topo --get gpu-numa-affinity --gpu-id`.
      - Add `sb node topo` support for GPU NUMA topology queries.
      - Update BW1000 config to use the new local `numactl` semantics.
      - Document the new `numactl` mode fields and limitations.
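As a rough illustration of the three binding fields listed above, the sketch below builds a `numactl` command prefix. The helper name `build_numactl_prefix` and its parameters are hypothetical and are not taken from the runner; only the `--cpunodebind`, `--membind`, and `--physcpubind` options are standard `numactl` flags. How the runner actually composes its command line is an assumption here.

```python
from typing import List, Optional


def build_numactl_prefix(
    cpunodebind: Optional[str] = None,
    membind: Optional[str] = None,
    physcpubind: Optional[str] = None,
) -> List[str]:
    """Return a numactl argv prefix, or an empty list if nothing is bound.

    Hypothetical helper: the field names mirror the commit message, but the
    real runner may build its command differently.
    """
    args = ['numactl']
    if cpunodebind is not None:
        args.append(f'--cpunodebind={cpunodebind}')
    if membind is not None:
        args.append(f'--membind={membind}')
    if physcpubind is not None:
        args.append(f'--physcpubind={physcpubind}')
    # With no binding requested, do not wrap the command at all.
    return args if len(args) > 1 else []


if __name__ == '__main__':
    prefix = build_numactl_prefix(cpunodebind='0', membind='0')
    print(' '.join(prefix + ['python3', 'train.py']))
```

In this sketch the NUMA node would come from the GPU affinity lookup (`sb node topo --get gpu-numa-affinity --gpu-id`); the output format of that command is not shown in the commit message, so it is left out.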
  2. 20 Apr, 2026 2 commits
  3. 18 Apr, 2026 5 commits
    • Fix some lint warnings (#3) · b31acf90
      one authored
      * Fix some lint warnings
      * Exclude some paths in cpplint
      * Fix some tests and formatting
    • Format python code on branch dtk · 2bf01d5e
      one authored
    • Benchmark: Model benchmark - deterministic training support (#731) (#2) · 47d4a79d
      one authored
      
      
      Adds an opt-in deterministic training mode to SuperBench's PyTorch model
      benchmarks. When enabled with --enable-determinism, PyTorch deterministic
      algorithms are enforced, and per-step numerical fingerprints (loss,
      activation means) are recorded as metrics. These can be compared across
      runs using the existing sb result diagnosis pipeline to verify bit-exact
      reproducibility, which is useful for hardware validation and platform
      comparison.
       
      Flags added - 
      
      --enable-determinism
      --check-frequency: Interval, in steps, at which the metrics are
      recorded
      --deterministic-seed
      
      Changes - 
      
      Updated pytorch_base.py to handle deterministic settings and logging.
      Added a new example script: pytorch_deterministic_example.py
      Added a test file: test_pytorch_determinism_all.py to verify everything
      works as expected.
      
      Usage - 
      
      Step 1: Run 1 - Run with --enable-determinism; the necessary metrics
      are recorded in the results-summary.jsonl file
      Step 2: Generate the baseline file from the Run 1 results using the
      sb result generate-baseline command
      Step 3: Run 2 - Run with --enable-determinism on a different machine
      (or the same machine); the necessary metrics are recorded in the
      results-summary.jsonl file
      Step 4: Run diagnosis on the results from the two runs using the
      sb result diagnosis command
      
      Note - 
      1. Make sure all parameters are identical between the two runs
      2. Running the diagnosis command requires the rules.yaml file
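The bit-exact comparison that Step 4 relies on can be illustrated with a small standalone sketch. This is not the `sb result diagnosis` implementation: the `find_mismatches` helper and metric names such as `loss_step_0` are hypothetical, chosen only to show what "bit-exact" means for per-step fingerprints from two runs.

```python
import struct
from typing import Dict, List


def float_bits(value: float) -> bytes:
    """Return the IEEE 754 double encoding of a value for exact comparison."""
    return struct.pack('<d', value)


def find_mismatches(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[str]:
    """List metric names that are missing from one run or differ in any bit."""
    mismatches = []
    for name in sorted(set(baseline) | set(candidate)):
        if name not in baseline or name not in candidate:
            mismatches.append(name)
        elif float_bits(baseline[name]) != float_bits(candidate[name]):
            mismatches.append(name)
    return mismatches


if __name__ == '__main__':
    # Hypothetical fingerprints recorded at two check points of each run.
    run1 = {'loss_step_0': 2.5, 'loss_step_10': 1.25}
    run2 = {'loss_step_0': 2.5, 'loss_step_10': 1.25 + 1e-12}
    print(find_mismatches(run1, run1))
    print(find_mismatches(run1, run2))
```

Comparing the raw bytes rather than using a tolerance is deliberate in this sketch: a deterministic run is expected to reproduce every fingerprint exactly, so even a difference in the last bit counts as a mismatch.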
      
      ---------
      Co-authored-by: Aishwarya Tonpe <aishwarya.tonpe25@gmail.com>
      Co-authored-by: Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
    • Format python code · 8c28b69a
      one authored
  4. 17 Apr, 2026 3 commits
  5. 15 Apr, 2026 1 commit
  6. 02 Apr, 2026 5 commits
  7. 01 Apr, 2026 5 commits
  8. 27 Mar, 2026 1 commit
  9. 25 Mar, 2026 1 commit
  10. 19 Mar, 2026 3 commits
    • Migrate gpu-stream to BabelStream v5.0 · d4051602
      one authored
    • Enhance DTK platform support and GPU detection · 1a57f2d6
      one authored
      - Added Platform.DTK in the microbenchmark framework.
      - Introduced new DTK hipblaslt benchmark class and corresponding tests.
      - Updated Dockerfile to include hipblaslt-bench and its permissions.
      - Registered DTK benchmarks in the benchmark registry for various performance tests.
      - Enhanced GPU detection logic to recognize HYGON GPUs.
      
      This update improves the benchmarking capabilities for DTK, ensuring compatibility and performance testing across platforms.
    • Update DTK dockerfile and microbenchmarks · c4f39919
      one authored
      - Update rocm_common.cmake for CMake>=3.24
      - Prevent isolation build
      - Add BabelStream as a submodule
      - Update dockerignore
  11. 28 Jan, 2026 1 commit
  12. 04 Dec, 2025 1 commit
  13. 17 Nov, 2025 1 commit
    • Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB device by MPI local rank in ib validation (#733) · c65ae567
      Yuting Jiang authored
      
      **Description**
      Add the --set_ib_devices option to auto-select the IB device by MPI local rank.
      
      
      **Major Revision**
      - Add a new CLI flag --set_ib_devices to automatically select irregular
      IB devices based on the MPI local rank.
      - When enabled, the benchmark queries available IB devices via
      network.get_ib_devices() and selects the device corresponding to
      OMPI_COMM_WORLD_LOCAL_RANK.
      - Fall back to existing --ib_dev behavior when the flag is not provided.
      
      **Minor Revision**
      - Add an environment variable in network.get_ib_devices() that lets the
      user set the device name
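The rank-based selection described under Major Revision can be sketched as follows. The `select_ib_device` helper and the device names are hypothetical; the device list stands in for whatever `network.get_ib_devices()` returns, and only the `OMPI_COMM_WORLD_LOCAL_RANK` variable comes from the commit message. The error handling is an assumption, not the benchmark's documented behavior.

```python
import os
from typing import List, Mapping, Optional


def select_ib_device(devices: List[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the IB device whose index equals the MPI local rank.

    Hypothetical helper: `devices` is assumed to be an ordered list of
    device names, one per local rank.
    """
    env = os.environ if env is None else env
    rank_text = env.get('OMPI_COMM_WORLD_LOCAL_RANK')
    if rank_text is None:
        raise ValueError('OMPI_COMM_WORLD_LOCAL_RANK is not set')
    local_rank = int(rank_text)
    if not 0 <= local_rank < len(devices):
        raise IndexError(f'local rank {local_rank} has no matching IB device')
    return devices[local_rank]


if __name__ == '__main__':
    # Example device names with non-contiguous numbering.
    devices = ['mlx5_0', 'mlx5_2', 'mlx5_5']
    print(select_ib_device(devices, {'OMPI_COMM_WORLD_LOCAL_RANK': '1'}))
```

Indexing into the discovered list, rather than formatting a name from the rank, is what makes this approach work when device names are not numbered contiguously.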
  14. 23 Oct, 2025 1 commit
    • Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm (#740) · f6e65a98
      Yuting Jiang authored
      **Description**
      This PR adds NCU (NVIDIA Nsight Compute) profiling support to the
      cublaslt-gemm micro benchmark, enabling detailed kernel analysis
      including DRAM throughput, compute throughput, and launch arguments.
      
      **Major Revision**
      - Add --enable_ncu_profiling and --profiling_metrics for ncu profiling
      - Modify command execution to use NCU when profiling is enabled
      - Update result parsing to handle both standard and NCU profiled output
      formats
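The command change described above can be pictured as a conditional wrapper around the benchmark command. The sketch below is an assumption about the general shape, not the code from this PR: `wrap_with_ncu` is a hypothetical helper and the benchmark command string is a placeholder. The `ncu --metrics` option is part of the Nsight Compute CLI.

```python
import shlex
from typing import List, Optional


def wrap_with_ncu(command: str, enable: bool, metrics: Optional[List[str]] = None) -> str:
    """Prefix a shell command with ncu when profiling is enabled.

    Hypothetical helper: returns the command unchanged when `enable` is False.
    """
    if not enable:
        return command
    parts = ['ncu']
    if metrics:
        # ncu takes a comma-separated metric list.
        parts += ['--metrics', ','.join(metrics)]
    prefix = ' '.join(shlex.quote(part) for part in parts)
    return f'{prefix} {command}'


if __name__ == '__main__':
    metrics = [
        'dram__throughput.avg.pct_of_peak_sustained_elapsed',
        'sm__throughput.avg.pct_of_peak_sustained_elapsed',
    ]
    # Placeholder benchmark command, for illustration only.
    print(wrap_with_ncu('./gemm_bench -m 4096 -n 4096 -k 4096', True, metrics))
```

Because the wrapped and unwrapped commands print differently formatted output, a parser for the results would need to branch on the same switch, which matches the third bullet above.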
  15. 22 Oct, 2025 1 commit
  16. 08 Oct, 2025 1 commit
    • Enhancement: Add nsys and pytorch profiler debug trace support (#744) · d804dbb6
      Hongtao Zhang authored
      
      
      To improve benchmark debugging, the following debug methods were added:
      
      PyTorch profiler in model benchmarks
      
      - SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable
      - SB_TORCH_PROFILER_TRACE_DIR: log path
      Both runtime variables must be set in the SB config file.
      
      nsys in SB runner
      
      - SB_ENABLE_NSYS: switch to enable/disable 
      - SB_NSYS_TRACE_DIR: log path
      Both runtime variables must be set in the runner's environment.
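A switch-plus-path pair like the ones above is typically read from the environment and turned into a command prefix. The sketch below assumes that pattern for the nsys case. The `env_flag` and `nsys_prefix` helpers, the accepted truthy values, the default directory, and the report name are all hypothetical; only the variable names `SB_ENABLE_NSYS` and `SB_NSYS_TRACE_DIR` come from the commit message. `nsys profile -o` is the standard Nsight Systems option for the output file.

```python
import posixpath
from typing import List, Mapping


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an environment variable as an on/off switch (assumed values)."""
    return env.get(name, '').strip().lower() in ('1', 'true', 'yes', 'on')


def nsys_prefix(
    env: Mapping[str, str],
    default_dir: str = '/tmp/nsys_traces',
    report_name: str = 'trace',
) -> List[str]:
    """Return an nsys argv prefix when SB_ENABLE_NSYS is on, else an empty list."""
    if not env_flag(env, 'SB_ENABLE_NSYS'):
        return []
    trace_dir = env.get('SB_NSYS_TRACE_DIR', default_dir)
    return ['nsys', 'profile', '-o', posixpath.join(trace_dir, report_name)]


if __name__ == '__main__':
    env = {'SB_ENABLE_NSYS': '1', 'SB_NSYS_TRACE_DIR': '/data/traces'}
    print(' '.join(nsys_prefix(env) + ['sb', 'exec']))
```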
      
      ---------
      Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
  17. 01 Oct, 2025 1 commit
  18. 29 Sep, 2025 2 commits
  19. 19 Sep, 2025 1 commit
  20. 12 Aug, 2025 1 commit
  21. 30 Jun, 2025 1 commit
  22. 26 Jun, 2025 1 commit