- 05 Jul, 2023 4 commits
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUCopy.
-
Yuting Jiang authored
**Description** add python code for DirectXGPUMemBw.
-
Yuting Jiang authored
**Description** add python code for DirectX core flops and init DirectX test pipeline. **Major Revision** - add python code for DirectX core flops - init DirectX test pipeline **Minor Revision** - add test for DirectX core flops
-
Yuting Jiang authored
**Description** Support DirectX test pipeline.
-
- 30 Jun, 2023 2 commits
-
-
Yuting Jiang authored
**Description** add auto selecting algorithm support for cudnn functions. **Major Revision** - add auto selecting algorithm support for cudnn functions in source code - add 'auto_algo' option in benchmark - add related test
-
Yifan Xiong authored
* Update result parsing for newer tensorrt versions * Update arguments when load torchvision models
-
- 29 Jun, 2023 3 commits
-
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUCopy microbenchmark.
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUMemBw microbenchmark. --------- Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
Yuting Jiang authored
**Description** Add source code of DirectXGPUCoreFLOPs microbenchmark. --------- Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
- 28 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add dockerfile for win10 and building script for directx_benchmarks. **Major Revision** - Add docker file for win10 and required scripts to install the dependency - Add building script to build all directx vs benchmarks - Add call of building script in Makefile --------- Co-authored-by:
yukirora <yuting.jiang@microsoft.com> Co-authored-by:
Yifan Xiong <yifan.xiong@microsoft.com>
-
- 21 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add support for DirectX GPU platform. **Major Revision** - Add DirectX platform for benchmark registry - Add gpu_vendor identify for AMD and NVIDIA with win driver
-
- 16 Jun, 2023 1 commit
-
-
guoshzhao authored
**Description** Update outdated reference links that return 404.
-
- 28 Apr, 2023 1 commit
-
-
guoshzhao authored
**Description** Model benchmarks can stop on either the `num_steps` or the `duration` config; each takes effect only when its value is set greater than 0. If both are set greater than 0, the benchmark stops at whichever condition is reached first.
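The stop rule described in this commit can be sketched as a small predicate. This is a minimal illustration only: the function name `should_stop` and its parameters `step`, `elapsed_seconds` are hypothetical and are not taken from the benchmark source; only the `num_steps` and `duration` option names come from the commit message.

```python
def should_stop(step, elapsed_seconds, num_steps=0, duration=0):
    """Return True once any enabled limit has been reached.

    A limit is enabled only when its value is greater than 0.
    If both are enabled, whichever is reached first stops the run.
    (Hypothetical helper, not the project's actual implementation.)
    """
    if num_steps > 0 and step >= num_steps:
        return True
    if duration > 0 and elapsed_seconds >= duration:
        return True
    return False
```

With both limits disabled (left at 0), this predicate never returns True, so a caller would need some other termination condition.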
-
- 24 Apr, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This commit revises the distributed inference benchmark to report a unified step time by taking the maximum step time across GPUs.
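As a rough illustration of the reduction described here, the sketch below takes one list of step times per GPU and keeps the slowest value for each step. The function name `unify_step_times` and the list-of-lists input are assumptions made for this example; the actual benchmark may instead perform this reduction with a collective operation across processes.

```python
def unify_step_times(per_gpu_step_times):
    """Combine per-GPU step times into one series.

    per_gpu_step_times: list with one entry per GPU, each a list of
    step times of equal length. For every step, the slowest GPU
    determines the reported time. (Hypothetical helper.)
    """
    return [max(times) for times in zip(*per_gpu_step_times)]
```

Taking the maximum reflects that a distributed step is only finished when the slowest participant finishes.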
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 28 Mar, 2023 1 commit
-
-
Yifan Xiong authored
__Description__ Update TE FP8 model conversion. __Major Revisions__ * Add 16-byte alignment comment. * Fix TE layer parameters type.
-
- 25 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by converting linear/layernorm to TE layers.
-
- 24 Mar, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This PR adds a micro-benchmark of distributed model inference workloads. **Major Revision** - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by:Peng Cheng <chengpeng5555@outlook.com>
-
- 22 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support batch and shape range with multiplication factors in cublaslt gemm benchmark.
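A minimal sketch of how a range with a multiplication factor can expand into concrete sizes. The function name `expand_range` and its argument layout are hypothetical and are not taken from the benchmark source; the actual command-line syntax is not shown in this log.

```python
def expand_range(start, stop, factor):
    """Expand a geometric range: start, start*factor, ... up to stop.

    (Hypothetical helper for illustration; inclusive of stop.)
    """
    if start <= 0:
        raise ValueError("start must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values
```

For example, `expand_range(16, 128, 2)` yields the sizes 16, 32, 64, and 128, which could then be used as batch sizes or as one GEMM dimension.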
-
- 21 Mar, 2023 2 commits
-
-
rafsalas19 authored
**Description** - Adding HPL benchmark --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com>
-
Yifan Xiong authored
Fix potential barrier timeout in init_process_group caused by a race condition when the same port is reused. Use a different port for each model when running multiple models sequentially in one process. For example, running vgg11/13/16/19 uses ports 29501~29504 respectively.
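The port assignment in the example can be sketched as follows. The function name `assign_ports` and the `first_port` parameter are hypothetical; only the model names and the 29501~29504 range come from the commit message, and how the real code derives each port is not shown in this log.

```python
def assign_ports(model_names, first_port=29501):
    """Give each sequentially run model its own rendezvous port.

    (Hypothetical helper: ports are assigned in order, starting at
    first_port, so no two models in one process share a port.)
    """
    return {name: first_port + index for index, name in enumerate(model_names)}
```

Each model would then pass its own port when setting up its process group, so a later model does not have to wait for the previous model's port to be released.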
-
- 20 Mar, 2023 2 commits
-
-
Yuting Jiang authored
**Description** Support error tolerance in micro-benchmark for CuDNN function **Major Revision** - revise micro_base to support running the remaining commands when one command fails in the microbenchmark - set error tolerance to true in cudnn functions
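A minimal sketch of the error-tolerance behavior, assuming commands are run one after another as subprocesses. The function name `run_commands` and the `tolerate_errors` flag are hypothetical and do not reflect the actual `micro_base` interface.

```python
import subprocess
import sys


def run_commands(commands, tolerate_errors=False):
    """Run commands in order and return their exit codes.

    Without tolerance, the first failing command stops the run.
    With tolerance, the failure is recorded and the remaining
    commands still execute. (Hypothetical helper.)
    """
    return_codes = []
    for command in commands:
        completed = subprocess.run(command, capture_output=True, text=True)
        return_codes.append(completed.returncode)
        if completed.returncode != 0 and not tolerate_errors:
            break
    return return_codes
```

With tolerance enabled, one failing function configuration no longer discards the results of the configurations that follow it.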
-
Yifan Xiong authored
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
-
- 27 Feb, 2023 1 commit
-
-
Yuting Jiang authored
Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479) **Description** revise cublas-benchmark for flexible warmup and fill data with fixed number for perf test to improve the running efficiency. **Major Revision** - remove num_in_steps for warmup to support more flexible warmup setting for users - Add support to generate input with fixed number for perf test
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com> Co-authored-by:
Yifan Xiong <xiongyf@yandex.com>
-
- 28 Jan, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.7.0 to main. **Major Revisions** * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by:
Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 17 Jan, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Fix bug for incorrect datatype judgement in cublas-function source code.
-
- 04 Jan, 2023 2 commits
-
-
Yang Wang authored
Support traffic patterns under the different devices in NCCL/RCCL test * change the metrics format if a pattern is specified
-
Yifan Xiong authored
Support FP8 in PyTorch BERT models: * add fp8 hybrid/e4m3/e5m2 in precision arguments * build BERT encoders with `te.TransformerLayer` to replace `transformers.BertModel` * wrap forward steps with fp8 autocast
-
- 03 Jan, 2023 5 commits
-
-
Yifan Xiong authored
Support GEMM benchmark on Hopper GPUs.
-
Yifan Xiong authored
Integrate cublaslt-gemm micro-benchmark #451.
-
Yuting Jiang authored
**Description** Add correctness check in cublas-function benchmark. **Major Revision** - add python code of correctness check in cublas-function benchmark and test
-
Yifan Xiong authored
Add micro-benchmark for cublaslt fp8 gemm.
-
Yuting Jiang authored
**Description** Add c source code of correctness check for cublas functions. **Major Revision** - add correctness check for all supported cublas functions - add --correctness option into binary **Minor Revision** - fix bug and template fill_data and prepare_tensor to get right memory-alignment output matrix for different datatype
-
- 30 Dec, 2022 2 commits
-
-
Yuting Jiang authored
**Description** Add stdout logging util module and enable real-time logging flushing in executor **Major Revision** - Add stdout logging util module to redirect stdout into file log - enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log` - enable real-time log flushing in run_command of microbenchmarks through config `log_flushing` **Minor Revision** - add log_n_step args to enable regular step time log in model benchmarks - update related docs
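The idea of writing output to both stdout and a log file, flushed as it is produced, can be sketched with a small context manager. The class name `StdoutTee` is hypothetical and this is not the project's logging util module; only the file name `sb-bench.log` comes from the commit message.

```python
import sys


class StdoutTee:
    """Duplicate everything written to sys.stdout into a log file.

    (Hypothetical sketch; flushes after every write so the log file
    can be followed in real time.)
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self._original = None
        self._log_file = None

    def __enter__(self):
        self._original = sys.stdout
        self._log_file = open(self.log_path, "a", encoding="utf-8")
        sys.stdout = self
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the real stdout before closing the log file.
        sys.stdout = self._original
        self._log_file.close()
        return False

    def write(self, text):
        self._original.write(text)
        self._log_file.write(text)
        self.flush()
        return len(text)

    def flush(self):
        self._original.flush()
        self._log_file.flush()
```

Used as `with StdoutTee("sb-bench.log"): print("step 1 done")`, the line appears on the console and is appended to the file. Note that this only captures Python-level writes to `sys.stdout`; output written directly to the file descriptor by child processes would need a different mechanism.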
-
Yang Wang authored
**Description** * Reuse `gen_pair_wise_config` in micro-benchmark
-
- 14 Dec, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Add wait time option to resolve mem-bw unstable issue.
-
- 18 Oct, 2022 1 commit
-
-
Yuting Jiang authored
Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414) **Description** Add support to allow list of custom config string in cudnn-functions and cublas-functions.
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.6.0 to main. **Major Revisions** * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 04 Aug, 2022 1 commit
-
-
Yifan Xiong authored
* Exit gracefully on timeout, and add the corresponding log and return code. * Set minimum timeout to 1 minute and enlarge Ansible timeout.
-