1. 28 Jan, 2023 1 commit
  2. 29 Jun, 2022 1 commit
    • Deployment - Refine error message when GPU is not detected (#368) · 8ef7163a
      Yifan Xiong authored
      Refine error message when GPU is not detected.
      
      Possible solutions if hardware exists and drivers are already installed:
      * NVIDIA GPUs:
        ```sh
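        # Load the NVIDIA unified memory kernel module
        /sbin/modprobe nvidia-uvm
        # Look up its major device number (exact match, so nvidia-uvm-tools is not picked up)
        D=$(awk '$2 == "nvidia-uvm" {print $1}' /proc/devices)
        # Create the character device node with that major number and minor 0
        mknod -m 666 /dev/nvidia-uvm c "$D" 0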
        ```
      
      * AMD GPUs:
        ```sh
        modprobe amdgpu
        ```
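
      Before applying either fix, it can help to check which device nodes are already present. The following is a minimal sketch, not SuperBench's actual detection logic: `gpu_device_status` is a hypothetical helper, and it assumes that `/dev/nvidia-uvm` indicates a usable NVIDIA setup and that `/dev/kfd` (the node exposed by the ROCm stack once `amdgpu` is loaded) indicates a usable AMD setup.

      ```sh
      # Hypothetical helper: report which GPU device node exists under a /dev-like directory.
      # Assumption: nvidia-uvm marks NVIDIA, kfd marks AMD; prints "nvidia", "amd", or "none".
      gpu_device_status() {
          dev_dir="${1:-/dev}"
          if [ -e "$dev_dir/nvidia-uvm" ]; then
              echo "nvidia"
          elif [ -e "$dev_dir/kfd" ]; then
              echo "amd"
          else
              echo "none"
          fi
      }
      ```

      Calling `gpu_device_status` with no argument inspects `/dev`; if it prints `none` even though the hardware and drivers are installed, the commands above are the ones to try.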
  3. 08 Dec, 2021 1 commit
    • Bug - Fix issues for distributed runs (#258) · 213ab14b
      Yifan Xiong authored
      Fix issues for distributed runs:
      * fix config for memory bandwidth benchmarks
      * add throttling for high concurrency docker pull
      * update rsync path and exclude directories
      * handle exceptions when creating summary
      * tune for logging
  4. 26 Sep, 2021 1 commit
    • Release - SuperBench v0.3.0 (#212) · dfbd70b1
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.3.0 to main.
      
      **Major Revisions**
      * Docs - Upgrade version and release note (#209)
      * Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
      * Benchmarks: Update - Update benchmarks in configuration file (#208)
      * CI/CD - Update GitHub Action VM (#211)
      * Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
      * CI/CD - Fix bug in build image for push event (#205)
      * Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
      * Tool: Fix bug - Fix function naming issue in system info (#200)
      * CI/CD - Push images in GitHub Action (#202)
      * Bug - Fix torch.distributed command for single node (#201)
      * CLI - Integrate system info for node (#199)
      * Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
      * CI/CD - Add ROCm image build in GitHub Actions (#194)
      * Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
      * Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
      * Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
      * Bug - Revise 'docker run' in sb deploy (#195)
      * Bug - Fix Bug: fix wrong parameter name 'operations' (should be 'operation') in rccl-bw of hpe config (#190)
      Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
      Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
      Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
  5. 01 Sep, 2021 1 commit
  6. 29 Jul, 2021 1 commit
    • Release - SuperBench v0.2.1 (#142) · 69b2c631
      Yifan Xiong authored
      __Description__
      Cherry-pick bug fixes from v0.2.1 to main.
      
      __Major Revisions__
      * Fix bug where VGG models failed on A100 GPU with batch_size=128.
      * Fix Ansible connection issue when running on localhost.
      * Update version in packages and docs.
  7. 08 Jul, 2021 1 commit
  8. 23 Jun, 2021 1 commit
    • Bug bash - Fix bugs in multi GPU benchmarks (#98) · c0c43b8f
      Yifan Xiong authored
      * Add `sb deploy` command content.
      * Fix inline if-expression syntax in playbook.
      * Fix quote escape issue in bash command.
      * Add custom env in config.
      * Update default config for multi GPU benchmarks.
      * Update MANIFEST.in to include jinja2 template.
      * Require jinja2 minimum version.
      * Fix occasional duplicate output in Ansible runner.
      * Fix mixed color from Ansible and Python colorlog.
      * Update according to comments.
      * Change superbench.env from list to dict in config file.
  9. 26 May, 2021 1 commit
  10. 23 May, 2021 1 commit