- 25 Apr, 2022 1 commit
user4543 authored
**Description** Fix a bug in the duration feature for model benchmarks in distributed mode.
**Major Revision**
  - Add all_reduce to sync the result of is_finished (the function that judges whether the model benchmark should stop) at each step. This avoids ranks disagreeing on when the duration ends (some ranks may enter one more step and never finish).
  - Add torch.cuda.synchronize() before and after the step time measurement in train_step() for all model benchmarks. Some operations in train_step() may be asynchronous, resulting in incorrect step time records (for example, lstm).
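The rank-consistency problem described in this commit can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions, not the code from the commit: `simulate` and its parameters are hypothetical names, and taking the maximum over the per-rank flags stands in for a `torch.distributed.all_reduce` call (the reduce operation actually used in the commit is not stated here, so MAX is an assumption).

```python
from typing import List


def simulate(step_ms: List[int], duration_ms: int, sync: bool) -> List[int]:
    """Return how many steps each simulated rank runs before stopping.

    step_ms[r] is the (constant) time rank r spends per step.
    With sync=True, the per-rank "finished" flags are combined with max(),
    a stand-in for an all-reduce, so every rank sees the same decision.
    """
    world_size = len(step_ms)
    elapsed = [0] * world_size
    steps = [0] * world_size
    done = [False] * world_size

    while not all(done):
        # Every rank that has not stopped yet runs one more step.
        for rank in range(world_size):
            if not done[rank]:
                elapsed[rank] += step_ms[rank]
                steps[rank] += 1

        # Each rank evaluates its local stop condition.
        local_flags = [int(elapsed[r] >= duration_ms) for r in range(world_size)]

        if sync:
            # Assumed MAX-style reduction: stop everywhere once any rank is finished.
            agreed = max(local_flags)
            done = [bool(agreed)] * world_size
        else:
            # Unsynchronized: each rank trusts only its own clock.
            done = [bool(flag) for flag in local_flags]

    return steps


if __name__ == "__main__":
    print(simulate([1000, 700], 3000, sync=False))
    print(simulate([1000, 700], 3000, sync=True))
```

Without the shared decision, the faster rank runs extra steps after the slower one has stopped. In a real distributed job those extra steps would block on collective operations that the stopped rank never joins. With the shared decision, all ranks stop after the same step.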
-
- 22 Mar, 2022 1 commit
user4543 authored
**Description** Remove fp16 sample type conversion time for training cnn and lstm inference.
-
- 17 Mar, 2022 1 commit
user4543 authored
**Description** Remove fp16 sample type conversion time for cnn and lstm models.
-
- 13 Dec, 2021 1 commit
Yifan Xiong authored
Add transformers for TensorRT inference.
-
- 26 Apr, 2021 1 commit
guoshzhao authored
-
- 20 Apr, 2021 1 commit
guoshzhao authored
Benchmarks: Add Benchmark - Add LSTM model benchmarks.
-
- 16 Apr, 2021 1 commit
guoshzhao authored
Benchmarks: Code Revision - Fix some issues in the BERT benchmark. (#58)
-
- 26 Mar, 2021 1 commit
guoshzhao authored
Benchmarks: Add Benchmark - Add PyTorch BERT benchmarks, including bert-base and bert-large. (#20)
  * add pytorch bert benchmarks.
  * revise code
  * fix issue
  * revise code.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
-