- 30 Jun, 2025 1 commit
-
-
pdr authored
Added MoE model using MixtralConfig. 1. Added 8x7b and 8x22b variants 2. Requires high VRAM as all experts are loaded in memory. Thus, disabled training due to memory constraint on test worker. --------- Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 26 Jun, 2025 1 commit
-
-
Yuting Jiang authored
**Description** Add deepseek megatron-lm benchmark. --------- Co-authored-by:
yukirora <yuting.jiang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add cuda 12.9 dockerfile and build in pipeline. --------- Co-authored-by:
Guoshuai Zhao <microsoft@microsoft.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com>
-
- 28 Nov, 2024 1 commit
-
-
pdr authored
Added llama benchmark - training and inference in accordance with the existing pytorch models implementation like gpt2, lstm etc. - added llama fp8 unit test for better code coverage, to reduce memory required - updated transformers version >= 4.28.0 for LLamaConfig - set tokenizers version <= 0.20.3 to avoid 0.20.4 version [issues](https://github.com/huggingface/tokenizers/issues/1691 ) with py3.8 - added llama2 to tensorrt - llama2 tests not added to test_tensorrt_inference_performance.py due to large memory requirement for worker gpu. tests validated separately on gh200 --------- Co-authored-by:
dpatlolla <dpatlolla@microsoft.com>
-
- 27 Nov, 2024 1 commit
-
-
Yifan Xiong authored
Upgrade dependency versions in Azure pipeline: * Remove Python 3.6 and add Python 3.10 for cpu-unit-test * Upgrade CUDA from 11.1 to 12.4 for cuda-unit-test * Update labels accordingly --------- Co-authored-by:Dilip Patlolla <dilipreddi@gmail.com>
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main. **Major Revisions** * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - Upgrade pyrsmi to amdsmi python library. #601 * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605 * Dockerfile - Add rocm6.0 dockerfile #602 * Bug Fix - Bug fix for latest megatron-lm benchmark #600 * Docs - Upgrade version and release note #606 Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com> Co-authored-by:
guoshzhao <guzhao@microsoft.com>
-
- 07 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 28 Apr, 2023 1 commit
-
-
guoshzhao authored
**Description** Model benchmarks can stop due to `num_steps` or `duration` config which will take effect when the value is set greater than 0. If both are set greater than 0, the earliest condition reached will work.
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 04 Jan, 2023 1 commit
-
-
Yifan Xiong authored
Support FP8 in PyTorch BERT models: * add fp8 hybrid/e4m3/e5m2 in precision arguments * build BERT encoders with `te.TransformerLayer` to repalce `transformers.BertModel` * wrap forward steps with fp8 autocast
-
- 30 Dec, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Add stdout logging util module and enable real-time logging flushing in executor **Major Revision** - Add stdout logging util module to redirect stdout into file log - enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log` - enable real-time log flushing in run_command of microbenchmarks through config `log_flushing` **Minor Revision** - add log_n_step args to enable regular step time log in model benchmarks - udpate related docs
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.5.0 to main. **Major Revisions** * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 01 Apr, 2022 1 commit
-
-
guoshzhao authored
**Description** Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.
-
- 28 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Please write a brief description and link the related issue if have. **Major Revision** - Sync (do allreduce max) the E2E training results among all workers. - Avoid using ':0' in metric name if there has only one rank having output.
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
-
- 18 Jan, 2022 1 commit
-
-
Yifan Xiong authored
__Description__ Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks. <details> <summary>Examples</summary> <pre> $ sb benchmark list -n [a-z]+-bw -o table Result -------- mem-bw nccl-bw rccl-bw </pre> <pre> $ sb benchmark list-parameters -n mem-bw === mem-bw === optional arguments: --bin_dir str Specify the directory of the benchmark binary. --duration int The elapsed time of benchmark in seconds. --mem_type str [str ...] Memory types to benchmark. E.g. htod dtoh dtod. --memory str Memory argument for bandwidthtest. E.g. pinned unpinned. --run_count int The run count of benchmark. --shmoo_mode Enable shmoo mode for bandwidthtest. default values: {'bin_dir': None, 'duration': 0, 'mem_type': ['htod', 'dtoh'], 'memory': 'pinned', 'run_count': 1} </pre> </details> __Major Revisions__ * Add `sb benchmark list` to list benchmarks matching given name. * Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name. __Minor Revisions__ * Sort format help text for argparse.
-
- 10 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Set the `reduce_op` type for metirc `return_code` as `None`.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 07 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add return_code metric into result and revise unit tests.
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32, only takes effect on Ampere or newer GPUs.
-
- 16 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Change the field name `reduce` to `reduce_op`.
-
- 06 Aug, 2021 2 commits
- 05 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Add reduce function support for output summary. **Major Revision** - Add reducer class to maintain all reduce functions. - Save reduce type of each metric into `BenchmarkResult` - Fix UT.
-
- 28 Jun, 2021 1 commit
-
-
guoshzhao authored
* replace torch.optim.AdamW with transformers.AdamW.
-
- 07 Jun, 2021 1 commit
-
-
guoshzhao authored
* Clean up the cache.
-
- 04 Jun, 2021 1 commit
-
-
guoshzhao authored
* fix return code reset issue
-
- 19 May, 2021 1 commit
-
-
Yuting Jiang authored
-
- 20 Apr, 2021 2 commits
- 16 Apr, 2021 2 commits
- 12 Apr, 2021 2 commits
-
-
guoshzhao authored
Co-authored-by:
Guoshuai Zhao <guzhao@microsoft.com> Co-authored-by:
Yifan Xiong <yifan.xiong@microsoft.com>
-
Yifan Xiong authored
* skip unnecessary tests according to env var * remove useless tests
-
- 08 Apr, 2021 1 commit
-
-
guoshzhao authored
Benchmarks: Code Revision - Revise BenchmarkRegistry interfaces for integration with executor. (#33) * revise BenchmarkRegistry interfaces. * address comments Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 26 Mar, 2021 1 commit
-
-
guoshzhao authored
Benchmarks: Add Benchmark - Add Pytorch BERT benchmarks, including bert-base and bert-large. (#20) * add pytorch bert benchmarks. * revise code * fix issue * revise code. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 22 Mar, 2021 1 commit
-
-
guoshzhao authored
* move benchmarks registration from registry.py to __init__.py * revise __init__. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 18 Mar, 2021 2 commits
-
-
guoshzhao authored
* add sample_count argument. * handle more condidatins. * address comments. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
guoshzhao authored
* support benchmark re-registration. * address comments Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-
- 17 Mar, 2021 1 commit
-
-
guoshzhao authored
* add pytorch base tests. * add more test cases. Co-authored-by:Guoshuai Zhao <guzhao@microsoft.com>
-