  1. 25 Apr, 2022 1 commit
      Bug - Fix bug of duration feature for model benchmarks in distributed mode. (#347) · b5b1c3da
      user4543 authored
      **Description**
      Fix bug of duration feature for model benchmarks in distributed mode.
      
      **Major Revision**
      - Add all_reduce to sync the result of is_finished() (the function that decides whether the model benchmark should stop) across ranks in each step.
        - This avoids ranks disagreeing on when the duration has ended (a rank may enter one more step and then never finish).
      - Add torch.cuda.synchronize() before and after step time measurement in train_step() for all model benchmarks.
        - Some operations in train_step() may be asynchronous, resulting in incorrect step time records (for example, LSTM).
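
The two revisions above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `sync_is_finished`, `timed_step`, `simulate_ranks`, and the `all_reduce_max` and `synchronize` hooks are hypothetical names, and the choice of a MAX reduction (stop everywhere as soon as any rank wants to stop) is an assumption, since the commit message does not name the reduction op. The hooks are plain callables so the sketch runs without a GPU or a process group.

```python
import time
from typing import Callable, List


def sync_is_finished(local_finished: bool, all_reduce_max: Callable[[int], int]) -> bool:
    """Agree on the stop decision across ranks.

    all_reduce_max is a hypothetical hook that returns the maximum of the
    value over all ranks. With PyTorch it could wrap
    torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX).
    Assumption: if any rank is finished, every rank should stop.
    """
    return all_reduce_max(int(local_finished)) > 0


def timed_step(step_fn: Callable[[], None], synchronize: Callable[[], None]) -> float:
    """Return the duration of one step in milliseconds.

    synchronize is a hook that blocks until queued device work is done,
    for example torch.cuda.synchronize. Calling it before and after the
    step keeps asynchronous kernels from leaking into or out of the timing.
    """
    synchronize()
    start = time.perf_counter()
    step_fn()
    synchronize()
    return (time.perf_counter() - start) * 1000.0


def simulate_ranks(local_flags: List[bool]) -> List[bool]:
    """Single-process stand-in for a process group (illustration only)."""
    reduced = max(int(flag) for flag in local_flags)
    return [sync_is_finished(flag, lambda _value: reduced) for flag in local_flags]


if __name__ == "__main__":
    # One rank hit the duration limit; after the reduction all ranks agree.
    print(simulate_ranks([False, True, False]))
    print(timed_step(lambda: sum(range(1000)), lambda: None))
```

Without the reduction, the rank that still sees `False` would enter another step and wait in a collective call that the other ranks never join.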
  2. 21 Apr, 2022 1 commit
  3. 09 Dec, 2021 1 commit
  4. 27 Sep, 2021 1 commit
  5. 28 Jun, 2021 1 commit
  6. 07 Jun, 2021 1 commit
  7. 12 Apr, 2021 2 commits
  8. 08 Apr, 2021 1 commit
  9. 22 Mar, 2021 1 commit
  10. 17 Mar, 2021 1 commit