1. 21 Mar, 2023 1 commit
    • Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395
      Yifan Xiong authored
      Fix a potential barrier timeout in init_process_group caused by a race
      condition when the same port is reused. Use a different port for each
      model when running multiple models sequentially in one process.
      For example, running vgg11/13/16/19 uses ports 29501 through 29504,
      respectively.
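The port assignment described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `port_for_model` and the base port 29500 are assumptions chosen so that the result matches the 29501 through 29504 example in the message.

```python
# Hypothetical sketch: give each sequentially-run model its own rendezvous port
# so that a later init_process_group call never reuses the previous model's port.
BASE_PORT = 29500  # assumed base; the commit message only states 29501-29504


def port_for_model(index, base_port=BASE_PORT):
    """Return a distinct port for the index-th model (0-based) in one process."""
    return base_port + 1 + index


models = ['vgg11', 'vgg13', 'vgg16', 'vgg19']
ports = {name: port_for_model(i) for i, name in enumerate(models)}

for name, port in ports.items():
    print(f'{name}: port {port}')
```

How the chosen port reaches `torch.distributed` (for example through the `MASTER_PORT` environment variable or a `tcp://` init method) is not stated in the commit message and is left out here.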
  2. 20 Mar, 2023 2 commits
  3. 17 Mar, 2023 1 commit
  4. 06 Mar, 2023 2 commits
  5. 27 Feb, 2023 1 commit
    • Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479) · eba298f5
      Yuting Jiang authored
      
      **Description**
      Revise cublas-benchmark to support flexible warmup and to fill input
      data with a fixed number in perf tests, improving running efficiency.
      
      **Major Revision**
      - Remove num_in_steps for warmup to give users a more flexible warmup
      setting
      - Add support for generating input with a fixed number for perf tests
  6. 24 Feb, 2023 1 commit
  7. 23 Feb, 2023 1 commit
  8. 17 Feb, 2023 2 commits
  9. 16 Feb, 2023 1 commit
  10. 13 Feb, 2023 2 commits
  11. 07 Feb, 2023 1 commit
  12. 30 Jan, 2023 1 commit
  13. 28 Jan, 2023 1 commit
  14. 17 Jan, 2023 1 commit
  15. 09 Jan, 2023 1 commit
  16. 04 Jan, 2023 3 commits
  17. 03 Jan, 2023 6 commits
  18. 30 Dec, 2022 2 commits
  19. 29 Dec, 2022 2 commits
  20. 28 Dec, 2022 1 commit
  21. 27 Dec, 2022 1 commit
  22. 14 Dec, 2022 2 commits
  23. 29 Nov, 2022 1 commit
    • Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a
      Yang Wang authored
      * add mpi-parallels mode
      
      * update according to comments
      
      * fix and update doc
      
      * update
      
      * merge into 'mpi' mode
      
      * update according to comments
      
      * fix testcases
      
      * fix ansible
      
      * regard pattern as field
      
      * update
      
      * fix flake8 version
      
      * add flake8 range
      
      * remove map-by from host config
      
      * update comments
  24. 18 Nov, 2022 1 commit
  25. 17 Nov, 2022 1 commit
  26. 01 Nov, 2022 1 commit
    • CLI - Add non-zero return code for `sb [deploy,run]` (#425) · 1b86503d
      Yifan Xiong authored
      Add a non-zero return code for the `sb deploy` and `sb run` commands when
      there are Ansible failures in the control plane.
      The return code is set to the number of failures.
      
      For failures caused by benchmarks, the return code is still set per
      benchmark in the results JSON file.
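The return-code rule in this commit (exit status equals the number of control-plane failures) might look roughly like the sketch below. The function name `control_plane_return_code` and the list-of-booleans input are hypothetical; the commit message does not show how Ansible results are actually collected.

```python
# Hypothetical sketch: derive a process return code from per-task outcomes,
# where True means the control-plane task succeeded and False means it failed.
def control_plane_return_code(task_succeeded):
    """Return the number of failed tasks; 0 means every task succeeded."""
    return sum(1 for ok in task_succeeded if not ok)


outcomes = [True, False, True, False]  # illustrative data, not real results
return_code = control_plane_return_code(outcomes)
print(f'return code: {return_code}')
```

A caller would then pass this value to `sys.exit`, so that a shell script or CI job can tell a clean run (0) from one with failures.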