- 08 Oct, 2025 1 commit
-
-
Hongtao Zhang authored
To improve benchmark debugging, the following debug methods were added: pytorch profiler in model benchmark - SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable - SB_TORCH_PROFILER_TRACE_DIR: log path These 2 runtime variables need to be configured in SB config file. nsys in SB runner - SB_ENABLE_NSYS: switch to enable/disable - SB_NSYS_TRACE_DIR: log path These 2 runtime variables need to be configured in runner's ENV --------- Co-authored-by:Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add cuda 12.9 dockerfile and build in pipeline. --------- Co-authored-by:
Guoshuai Zhao <microsoft@microsoft.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com>
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 28 Mar, 2023 1 commit
-
-
Yifan Xiong authored
__Description__ Update TE FP8 model conversion. __Major Revisions__ * Add 16-byte alignment comment. * Fix TE layer parameters type.
-
- 25 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by converting linear/layernorm to TE layers.
-
- 21 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Fix potential barrier timeout in init_process_group due to race condition of using the same port. Change to different ports when running multiple models sequentially in one process. For example, when running vgg11/13/16/19, will use port 29501~29504 respectively.
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.5.0 to main. **Major Revisions** * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 10 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add support for pytorch>=1.9.0 of init_process_group. **Major Revision** - Use PrefixStore(TCPStore) to init_process_group manully for each model run
-
- 28 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Please write a brief description and link the related issue if have. **Major Revision** - Sync (do allreduce max) the E2E training results among all workers. - Avoid using ':0' in metric name if there has only one rank having output.
-
- 28 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Fix typo when set force_fp32 option.
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32, only takes effect on Ampere or newer GPUs.
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main. **Major Revisions** * Docs - Upgrade version and release note (#209) * Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210) * Benchmarks: Update - Update benchmarks in configuration file (#208) * CI/CD - Update GitHub Action VM (#211) * Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203) * CI/CD - Fix bug in build image for push event (#205) * Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204) * Tool: Fix bug - Fix function naming issue in system info (#200) * CI/CD - Push images in GitHub Action (#202) * Bug - Fix torch.distributed command for single node (#201) * CLI - Integrate system info for node (#199) * Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196) * CI/CD - Add ROCm image build in GitHub Actions (#194) * Bug: Fix bug - fix bug of hipBusBandwidth build (#193) * Benchmarks: Build Pipeline - Restore rocblas build logic (#197) * Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198) * Bug - Revise 'docker run' in sb deploy (#195) * Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190) Co-authored-by:
Yuting Jiang <v-yujiang@microsoft.com> Co-authored-by:
Guoshuai Zhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com>
-
- 29 Jul, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.2.1 to main. __Major Revisions__ * Fix bug of VGG models failed on A100 GPU with batch_size=128. * Fix Ansible connection issue when running in localhost. * Update version in packages and docs.
-
- 28 Jun, 2021 1 commit
-
-
guoshzhao authored
* replace torch.optim.AdamW with transformers.AdamW.
-
- 07 Jun, 2021 1 commit
-
-
guoshzhao authored
* Clean up the cache.
-
- 19 May, 2021 1 commit
-
-
Yuting Jiang authored
-
- 12 Apr, 2021 1 commit
-
-
guoshzhao authored
Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 15 Mar, 2021 1 commit
-
-
guoshzhao authored
Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 09 Mar, 2021 2 commits
-
-
guoshzhao authored
* add flag to disable GPU. * fix spelling * fix unittest. * address comments. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
guoshzhao authored
Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 08 Mar, 2021 1 commit
-
-
guoshzhao authored
* add pytorch base class * address comments Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-