1. 25 Apr, 2022 1 commit
    • Bug - Fix bug of duration feature for model benchmarks in distributed mode. (#347) · b5b1c3da
      user4543 authored
      **Description**
      Fix a bug in the duration feature for model benchmarks in distributed mode.
      
      **Major Revision**
      - Add all_reduce to sync the result of is_finished (the function that decides whether the model benchmark should stop) in each step
        - so that all ranks agree on when the duration ends (otherwise one rank may enter one more step and never finish)
      - Add torch.cuda.synchronize() before and after step time measurement in train_step() for all model benchmarks
        - some operations in train_step() may be asynchronous, resulting in incorrect step time records (for example, lstm)
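The two revisions above can be illustrated with a toy, single-process Python sketch. It does not use PyTorch: `all_reduce_max` is a hypothetical stand-in for an all-reduce with a MAX op (the commit does not say which reduce op is used; MAX means "stop as soon as any rank is finished"), the per-rank step costs are made up and time is simulated, and `timed_step` takes the synchronize function as a parameter so it runs without a GPU.

```python
import time
from typing import Callable, List


def all_reduce_max(local_flags: List[int]) -> int:
    """Toy stand-in for an all-reduce with a MAX op over one 0/1 flag per rank."""
    return max(local_flags)


def simulate_ranks(step_costs: List[float], duration: float, sync: bool) -> List[int]:
    """Count the steps each simulated rank runs before it stops.

    step_costs[r] is the simulated time one step takes on rank r. Each rank's
    local is_finished check is `elapsed >= duration`.
    """
    world_size = len(step_costs)
    elapsed = [0.0] * world_size
    steps = [0] * world_size
    active = [True] * world_size
    while any(active):
        for rank in range(world_size):
            if active[rank]:
                elapsed[rank] += step_costs[rank]
                steps[rank] += 1
        local_flags = [int(t >= duration) for t in elapsed]
        if sync:
            # Every rank adopts the reduced decision, so all ranks stop at the same step.
            stop = all_reduce_max(local_flags) == 1
            active = [not stop] * world_size
        else:
            # Each rank trusts only its own clock and may stop at a different step.
            active = [a and not f for a, f in zip(active, local_flags)]
    return steps


def timed_step(step_fn: Callable[[], None], synchronize: Callable[[], None]) -> float:
    """Time one step, waiting for queued device work before and after it."""
    synchronize()  # drain work queued by earlier steps so it is not billed to this one
    start = time.perf_counter()
    step_fn()
    synchronize()  # wait for work this step launched asynchronously
    return time.perf_counter() - start
```

With step costs `[1.0, 1.5]` and a duration of `3.0`, the unsynced simulation stops the ranks after 3 and 2 steps respectively; in real distributed training the rank that keeps going would block on a collective that the other rank never joins. With `sync=True` both ranks stop after 2 steps. In real PyTorch code the stand-ins would roughly correspond to `torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX)` on a one-element tensor and to `torch.cuda.synchronize`.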
  2. 21 Apr, 2022 1 commit
  3. 01 Apr, 2022 1 commit
  4. 28 Jan, 2022 1 commit
  5. 19 Jan, 2022 1 commit
  6. 18 Jan, 2022 1 commit
    • CLI - Add command sb benchmark [list,list-parameters] (#279) · f7ffc545
      Yifan Xiong authored
      __Description__
      
      Add commands `sb benchmark list` and `sb benchmark list-parameters` to support listing benchmarks and all optional parameters for benchmarks.
      
      <details>
      <summary>Examples</summary>
      <pre>
      $ sb benchmark list -n [a-z]+-bw -o table
      Result
      --------
      mem-bw
      nccl-bw
      rccl-bw
      </pre>
      <pre>
      $ sb benchmark list-parameters -n mem-bw
      === mem-bw ===
      optional arguments:
        --bin_dir str         Specify the directory of the benchmark binary.
        --duration int        The elapsed time of benchmark in seconds.
        --mem_type str [str ...]
                              Memory types to benchmark. E.g. htod dtoh dtod.
        --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
        --run_count int       The run count of benchmark.
        --shmoo_mode          Enable shmoo mode for bandwidthtest.
      default values:
      {'bin_dir': None,
       'duration': 0,
       'mem_type': ['htod', 'dtoh'],
       'memory': 'pinned',
       'run_count': 1}
      </pre>
      </details>
      
      __Major Revisions__
      * Add `sb benchmark list` to list benchmarks matching the given name.
      * Add `sb benchmark list-parameters` to list parameters for benchmarks that match the given name.
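The name filtering used by both commands (for example `-n [a-z]+-bw`) can be sketched as a regular-expression filter over benchmark names. This is a minimal sketch under assumptions: the names in `BENCHMARKS` are hypothetical, the pattern is applied with `re.match` (anchored at the start of the name), and the `ValueError` on no match is illustrative; the actual `sb` implementation may match or report errors differently.

```python
import re
from typing import List, Optional

# Hypothetical benchmark names for illustration; the real CLI reads them from its registry.
BENCHMARKS = ['cublas-function', 'kernel-launch', 'mem-bw', 'nccl-bw', 'rccl-bw']


def list_benchmarks(names: List[str], pattern: Optional[str] = None) -> List[str]:
    """Return all names, or only those matching `pattern` from their first character."""
    if pattern is None:
        return list(names)
    regex = re.compile(pattern)
    matched = [name for name in names if regex.match(name)]
    if not matched:
        # Hypothetical error handling; the real CLI may report this differently.
        raise ValueError(f'No benchmark matches {pattern!r}.')
    return matched
```

With this sketch, `list_benchmarks(BENCHMARKS, '[a-z]+-bw')` returns `['mem-bw', 'nccl-bw', 'rccl-bw']`, the same three names as in the table example.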
      
      __Minor Revisions__
      * Sort options in the formatted help text for argparse.
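One way to get alphabetically sorted options like those in the `list-parameters` output is an `argparse` help formatter that sorts actions before rendering them. The sketch below assumes `argparse.MetavarTypeHelpFormatter` as the base class, since it prints type names such as `str` and `int` as metavars, which matches the output format shown; the class name `SortedMetavarTypeHelpFormatter`, the `build_parser` helper, and the argument set are hypothetical and not taken from the commit.

```python
import argparse
from operator import attrgetter


class SortedMetavarTypeHelpFormatter(argparse.MetavarTypeHelpFormatter):
    """Hypothetical formatter: lists options alphabetically, uses type names as metavars."""

    def add_arguments(self, actions):
        # Sort each help section's actions by option strings before the base class renders them.
        super().add_arguments(sorted(actions, key=attrgetter('option_strings')))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog='mem-bw', add_help=False, formatter_class=SortedMetavarTypeHelpFormatter
    )
    # Arguments are deliberately declared out of alphabetical order.
    # MetavarTypeHelpFormatter needs an explicit `type` on every value-taking argument.
    parser.add_argument('--run_count', type=int, default=1, help='The run count of benchmark.')
    parser.add_argument('--duration', type=int, default=0, help='The elapsed time of benchmark in seconds.')
    parser.add_argument('--bin_dir', type=str, default=None, help='Specify the directory of the benchmark binary.')
    parser.add_argument('--shmoo_mode', action='store_true', help='Enable shmoo mode for bandwidthtest.')
    return parser
```

`print(build_parser().format_help())` lists `--bin_dir str`, `--duration int`, `--run_count int` and `--shmoo_mode` in that order, and `vars(build_parser().parse_args([]))` gives the default values as a dict, similar to the `default values:` block in the example.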
  7. 10 Dec, 2021 1 commit
  8. 09 Dec, 2021 1 commit
  9. 07 Dec, 2021 1 commit
  10. 27 Sep, 2021 1 commit
  11. 16 Aug, 2021 1 commit
  12. 06 Aug, 2021 2 commits
  13. 05 Aug, 2021 1 commit
  14. 28 Jun, 2021 1 commit
  15. 07 Jun, 2021 1 commit
  16. 04 Jun, 2021 1 commit
  17. 19 May, 2021 1 commit
  18. 20 Apr, 2021 2 commits
  19. 16 Apr, 2021 2 commits
  20. 12 Apr, 2021 2 commits
  21. 08 Apr, 2021 1 commit
  22. 26 Mar, 2021 1 commit
  23. 22 Mar, 2021 1 commit
  24. 18 Mar, 2021 2 commits
  25. 17 Mar, 2021 1 commit
  26. 11 Mar, 2021 1 commit
  27. 09 Mar, 2021 2 commits
  28. 04 Mar, 2021 1 commit
  29. 24 Feb, 2021 1 commit