1. 18 Apr, 2026 2 commits
  2. 17 Apr, 2026 4 commits
  3. 15 Apr, 2026 1 commit
  4. 02 Apr, 2026 9 commits
  5. 01 Apr, 2026 7 commits
  6. 31 Mar, 2026 1 commit
  7. 27 Mar, 2026 1 commit
  8. 25 Mar, 2026 1 commit
  9. 20 Mar, 2026 1 commit
  10. 19 Mar, 2026 3 commits
    • Migrate gpu-stream to BabelStream v5.0 · d4051602
      one authored
    • Enhance DTK platform support and GPU detection · 1a57f2d6
      one authored
      - Added Platform.DTK in the microbenchmark framework.
      - Introduced new DTK hipblaslt benchmark class and corresponding tests.
      - Updated Dockerfile to include hipblaslt-bench and its permissions.
      - Registered DTK benchmarks in the benchmark registry for various performance tests.
      - Enhanced GPU detection logic to recognize HYGON GPUs.
      
      This update improves the benchmarking capabilities for DTK, ensuring compatibility and performance testing across platforms.
    • Update DTK dockerfile and microbenchmarks · c4f39919
      one authored
      - Update rocm_common.cmake for CMake>=3.24
      - Prevent isolation build
      - Add BabelStream as a submodule
      - Update dockerignore
  11. 17 Mar, 2026 1 commit
  12. 11 Mar, 2026 1 commit
  13. 04 Feb, 2026 1 commit
  14. 28 Jan, 2026 1 commit
  15. 21 Dec, 2025 1 commit
    • CI/CD - Fix Azure pipeline (#767) · c99380b4
      Hongtao Zhang authored
      
      
      **Description**
      The Azure pipeline job cpu-unit-test failed with the following error:
      "2025-12-10T03:47:59.0628597Z ERROR: Could not install packages due to
      an OSError: [Errno 28] No space left on device"
      
      **Root Cause**
      This happens because the matrix jobs (Python 3.7, 3.10, 3.12) run in
      parallel and share the same VM's disk. Python 3.12 downloads
      newer/larger packages (especially PyTorch and NVIDIA CUDA libraries
      which are ~3GB+), and when multiple jobs run simultaneously, they
      exhaust the disk space.
      
      **Fix**
      Disable cache usage when installing SB.

      Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
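
The commit message does not show the changed pipeline step. As a minimal sketch, assuming the install step goes through pip (the quoted error is pip's), the fix could amount to passing pip's `--no-cache-dir` option so that downloaded wheels are not also written to the shared disk. The helper name `pip_install_command` and the install target are hypothetical.

```python
import sys
from typing import List, Sequence


def pip_install_command(targets: Sequence[str], use_cache: bool = False) -> List[str]:
    """Build a pip install command line (hypothetical helper, not from the repo).

    With use_cache=False the command carries --no-cache-dir, so pip does not
    keep a copy of every downloaded package in its cache directory.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if not use_cache:
        cmd.append("--no-cache-dir")
    return cmd + list(targets)


# Example: install the package in the current directory without the cache.
install_cmd = pip_install_command(["."])
```

The resulting list can be handed to `subprocess.run(install_cmd, check=True)`. Parallel matrix jobs then hold only the installed packages on disk, not an extra cached copy per Python version.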
  16. 04 Dec, 2025 1 commit
  17. 17 Nov, 2025 1 commit
    • Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB... · c65ae567
      Yuting Jiang authored
      Benchmarks: micro benchmarks - add --set_ib_devices option to auto-select IB device by MPI local rank in ib validation (#733)
      
      **Description**
      Add a --set_ib_devices option that auto-selects the IB device by MPI
      local rank in IB validation.

      **Major Revision**
      - Add a new CLI flag --set_ib_devices to automatically select irregular
      IB devices based on the MPI local rank.
      - When enabled, the benchmark queries available IB devices via
      network.get_ib_devices() and selects the device corresponding to
      OMPI_COMM_WORLD_LOCAL_RANK.
      - Fall back to existing --ib_dev behavior when the flag is not provided.
      
      **Minor Revision**
      - Add an environment variable in network.get_ib_devices() that lets the
      user set the device name
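
The selection rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `select_ib_device` is a hypothetical helper, `devices` stands in for the list returned by network.get_ib_devices(), and the error handling for a missing or out-of-range rank is assumed.

```python
import os
from typing import Mapping, Optional, Sequence


def select_ib_device(
    devices: Sequence[str],
    set_ib_devices: bool,
    ib_dev: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the IB device to use for this process (hypothetical helper).

    devices        -- stand-in for the result of network.get_ib_devices()
    set_ib_devices -- True when the --set_ib_devices flag was given
    ib_dev         -- value of the existing --ib_dev option
    env            -- environment mapping; defaults to os.environ
    """
    if not set_ib_devices:
        # Flag not provided: keep the existing --ib_dev behavior.
        return ib_dev

    env = os.environ if env is None else env
    rank_text = env.get("OMPI_COMM_WORLD_LOCAL_RANK")
    if rank_text is None:
        # Assumed behavior: fail loudly when not launched under Open MPI.
        raise RuntimeError("OMPI_COMM_WORLD_LOCAL_RANK is not set")

    local_rank = int(rank_text)
    if not 0 <= local_rank < len(devices):
        raise IndexError(f"local rank {local_rank} has no matching IB device")
    # The local rank indexes directly into the device list.
    return devices[local_rank]
```

For instance, with three detected devices and `OMPI_COMM_WORLD_LOCAL_RANK=1`, the sketch returns the second device in the list. Without the flag it returns whatever was passed through `--ib_dev`.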
  18. 06 Nov, 2025 1 commit
  19. 05 Nov, 2025 1 commit
  20. 23 Oct, 2025 1 commit
    • Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm (#740) · f6e65a98
      Yuting Jiang authored
      **Description**
      This PR adds NCU (NVIDIA Nsight Compute) profiling support to the
      cublaslt-gemm micro benchmark, enabling detailed kernel analysis
      including DRAM throughput, compute throughput, and launch arguments.
      
      **Major Revision**
      - Add --enable_ncu_profiling and --profiling_metrics for NCU profiling
      - Modify command execution to use NCU when profiling is enabled
      - Update result parsing to handle both standard and NCU-profiled output
      formats
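
One way to picture the command-execution change is a wrapper that prefixes the benchmark command with the Nsight Compute CLI (`ncu`) and its `--metrics` option when profiling is enabled. The sketch below is an assumption about the shape of that logic, not the PR's code: `build_command` is a hypothetical helper, and the benchmark command line passed to it is a placeholder.

```python
from typing import List, Sequence


def build_command(
    base_cmd: Sequence[str],
    enable_ncu_profiling: bool = False,
    profiling_metrics: Sequence[str] = (),
) -> List[str]:
    """Optionally wrap a benchmark command with ncu (hypothetical helper).

    base_cmd             -- the unprofiled benchmark command, as an argv list
    enable_ncu_profiling -- mirrors the --enable_ncu_profiling flag
    profiling_metrics    -- mirrors --profiling_metrics; joined with commas
                            for ncu's --metrics option
    """
    cmd = list(base_cmd)
    if not enable_ncu_profiling:
        # Profiling disabled: run the benchmark exactly as before.
        return cmd

    ncu_prefix = ["ncu"]
    if profiling_metrics:
        ncu_prefix += ["--metrics", ",".join(profiling_metrics)]
    return ncu_prefix + cmd


# Placeholder benchmark invocation; binary name and arguments are illustrative.
profiled = build_command(
    ["cublaslt_gemm", "-m", "4096", "-n", "4096", "-k", "4096"],
    enable_ncu_profiling=True,
    profiling_metrics=["dram__throughput.avg.pct_of_peak_sustained_elapsed"],
)
```

Because the profiled run prints ncu's report rather than the benchmark's usual output, the result parser has to recognize both formats, which is what the last item in the list addresses.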