- 22 Dec, 2023 2 commits
-
-
Yuting Jiang authored
**Description** Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops
-
Ziyue Yang authored
**Description** Adapt to hipblasLt data type changes.
-
- 21 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Bug fix for rocm docker build and deploy.
-
- 20 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Introduce option for fine-grained GPU memory for AMD GPUs, which should include GPU P2P performance.
-
- 16 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Align the metrics of dist-inference-cpp with the PyTorch version.
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 15 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add data type option for NCCL and RCCL tests.
-
- 13 Dec, 2023 2 commits
-
-
Ziyue Yang authored
**Description** Add in-place metrics for NCCL/RCCL benchmark for latency measurement.
-
Ziyue Yang authored
**Description** Support different data types in different hipblasLt versions for dist_inference
-
- 11 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** `add_compile_options` does not work for the ROCm build; set `CMAKE_CXX_FLAGS` instead.
-
- 10 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add distributed inference benchmark cpp implementation.
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade the Dockerfile to ROCm 5.7.
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 08 Dec, 2023 1 commit
-
-
Ziyue Yang authored
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
**Description** Add one-to-all, all-to-one, and all-to-all support to gpu_copy_bw_performance, and fix a performance bug in gpu_copy.
-
- 05 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Revise NCCL/RCCL benchmarks to use graph mode and add latency metrics.
-
- 04 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation
**Major Revision**
  - Support cpu-gpu and gpu-cpu in ib-validation
**Minor Revision**
  - Support multiple message sizes, multiple directions, and multiple ib commands in ib-validation
-
- 22 Nov, 2023 3 commits
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100:
  * upgrade base image to 23.10
  * fix onnxruntime version in python3.10
  * fix compilation errors
-
Yuting Jiang authored
**Description** add initialization options for rocm gemm flops.
-
Yuting Jiang authored
**Description** Add hipblaslt function benchmark and rebase cublaslt function benchmark.
-
- 20 Nov, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add int8 support for cublaslt function.
-
- 14 Nov, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Remove the step that copies the ptx file in the gpu burn test, since the command already runs inside the `self.args.bin_dir` directory. See https://github.com/microsoft/superbenchmark/blob/d246bab430adeb461072918a551b2e2b68c9bce5/superbench/benchmarks/micro_benchmarks/micro_base.py#L183
-
- 22 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add source code for evaluating NVDEC decoding performance.
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 18 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add source code for DirectXRenderPerf.
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 27 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Cherry-pick bug fixes from v0.9.0 to main.
**Major Revision**
  - CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555)
  - Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554)
  - CI/CD - add push win docker image on release branch in pipeline (#552)
  - Docs - Upgrade version and release note (#557)
-
- 06 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUEncodingLatency.
-
- 05 Jul, 2023 3 commits
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUCopy.
-
Yuting Jiang authored
**Description** add python code for DirectXGPUMemBw.
-
Yuting Jiang authored
**Description** add python code for DirectX core flops and init DirectX test pipeline.
**Major Revision**
  - add python code for DirectX core flops
  - init DirectX test pipeline
**Minor Revision**
  - add test for DirectX core flops
-
- 30 Jun, 2023 2 commits
-
-
Yuting Jiang authored
**Description** add automatic algorithm selection support for cudnn functions.
**Major Revision**
  - add automatic algorithm selection support for cudnn functions in source code
  - add 'auto_algo' option in benchmark
  - add related test
-
Yifan Xiong authored
  * Update result parsing for newer tensorrt versions
  * Update arguments when loading torchvision models
-
- 29 Jun, 2023 3 commits
-
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUCopy microbenchmark.
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUMemBw microbenchmark.
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
Yuting Jiang authored
**Description** Add source code of DirectXGPUCoreFLOPs microbenchmark.
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
- 24 Apr, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This commit revises the distributed inference benchmark to report a unified step time by taking the maximum step time across GPUs.
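The aggregation described in this commit can be pictured with a minimal sketch. It assumes each GPU reports a list of step times of equal length; the function name `unify_step_times` and the list-of-lists input are hypothetical and are not taken from the benchmark's source, which may well compute the maximum with a collective operation instead.

```python
from typing import List


def unify_step_times(per_gpu_step_times: List[List[float]]) -> List[float]:
    """Return one time per step: the slowest GPU's time for that step.

    Hypothetical helper for illustration only.
    """
    if not per_gpu_step_times:
        return []
    num_steps = len(per_gpu_step_times[0])
    if any(len(times) != num_steps for times in per_gpu_step_times):
        raise ValueError("every GPU must report the same number of steps")
    # zip(*...) groups the times of all GPUs for the same step index.
    return [max(step) for step in zip(*per_gpu_step_times)]
```

Taking the maximum reflects that a distributed step is only finished once the slowest participant has finished.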
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main.
**Major Revisions**
  * Monitor - Fix the cgroup version checking logic (#502)
  * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
  * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
  * Analyzer: Fix bug in python3.8 due to pandas api change (#504)
  * Bug - Fix bug to get metric from cmd when error happens (#506)
  * Monitor - Collect realtime GPU power when benchmarking (#507)
  * Add num_workers argument in model benchmark (#511)
  * Remove unreachable condition when write host list (#512)
  * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
  * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
  * Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 24 Mar, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This PR adds a micro-benchmark of distributed model inference workloads.
**Major Revision**
  - Add a new micro-benchmark dist-inference.
  - Add corresponding example and unit tests.
  - Update configuration files to include this new micro-benchmark.
  - Update micro-benchmark README.
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
-
- 22 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support batch and shape range with multiplication factors in cublaslt gemm benchmark.
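One way to read "range with multiplication factors" is a geometric sweep: start at a value and multiply by a factor until a stop value is exceeded. The sketch below illustrates that reading only; `expand_range` and its parameters are hypothetical names, and the benchmark's actual option syntax is not shown in this log.

```python
from typing import List


def expand_range(start: int, stop: int, factor: int) -> List[int]:
    """Expand start..stop (inclusive) by repeated multiplication.

    Hypothetical helper for illustration only.
    """
    if start <= 0:
        raise ValueError("start must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values
```

Under this assumption, `expand_range(16, 128, 2)` yields 16, 32, 64, 128, and the same expansion could be applied separately to the batch size and to each GEMM dimension.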
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description**
  - Adding HPL benchmark
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
-
- 20 Mar, 2023 2 commits
-
-
Yuting Jiang authored
**Description** Support error tolerance in micro-benchmark for CuDNN function
**Major Revision**
  - revise micro_base to support running the remaining commands when one command fails in the microbenchmark
  - set error tolerance to true in cudnn functions
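The error-tolerance behaviour described in this commit can be sketched generically: run a list of commands and, when tolerance is enabled, keep going after a failure instead of stopping. This is not the code from `micro_base`; `run_commands` and `error_tolerant` are hypothetical names used only for illustration.

```python
import subprocess
from typing import List, Tuple


def run_commands(
    commands: List[List[str]], error_tolerant: bool = False
) -> List[Tuple[List[str], int]]:
    """Run commands in order and collect (command, return code) pairs.

    Hypothetical helper: with error_tolerant=False, stop at the first
    failing command; with error_tolerant=True, run the remaining ones too.
    """
    results = []
    for cmd in commands:
        returncode = subprocess.run(cmd, capture_output=True).returncode
        results.append((cmd, returncode))
        if returncode != 0 and not error_tolerant:
            break
    return results
```

A caller would then inspect the collected return codes to decide which results to report and which commands to flag as failed.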
-
Yifan Xiong authored
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
-
- 27 Feb, 2023 1 commit
-
-
Yuting Jiang authored
Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479)
**Description** Revise cublas-benchmark to support flexible warmup and to fill input data with a fixed number in perf tests, improving running efficiency.
**Major Revision**
  - remove num_in_steps for warmup to support more flexible warmup settings for users
  - add support for generating input with a fixed number for perf tests
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description**
  - Added stream benchmark
  - Added stream unit test
  - Added stream example
  - Modified docker files to build stream
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
-