"src/targets/vscode:/vscode.git/clone" did not exist on "35dcd00da6581fb8d986f40457cf0679f079f575"
- 02 Apr, 2026 1 commit
-
-
one authored
-
- 01 Apr, 2026 2 commits
- 19 Mar, 2026 1 commit
-
-
one authored
-
- 12 Aug, 2025 1 commit
-
-
Hongtao Zhang authored
**Description** Cherry-pick bug fixes from v0.12.0 to main. **Major Revisions** * #725 * #727 * #728 Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Yifan Xiong <yixio@microsoft.com> Co-authored-by:
Guoshuai Zhao <guzhao@microsoft.com> --------- Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 30 Jun, 2025 1 commit
-
-
pdr authored
Added MoE model using MixtralConfig. 1. Added 8x7b and 8x22b variants 2. Requires high VRAM as all experts are loaded in memory. Thus, disabled training due to memory constraint on test worker. --------- Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add cuda 12.9 dockerfile and build in pipeline. --------- Co-authored-by:
Guoshuai Zhao <microsoft@microsoft.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com>
-
- 18 Jun, 2025 1 commit
-
-
WenqingLan1 authored
Added GPU Stream benchmark - measures the GPU memory bandwidth and efficiency for double datatype through various memory operations including copy, scale, add, and triad. - added documentation for `gpu-stream` detailing its introduction, metrics, and descriptions. - added unit tests for `gpu-stream`. Example output is in `superbenchmark/tests/data/gpu_stream.log`.
-
- 28 Nov, 2024 1 commit
-
-
pdr authored
Added llama benchmark - training and inference in accordance with the existing pytorch models implementation like gpt2, lstm etc. - added llama fp8 unit test for better code coverage, to reduce memory required - updated transformers version >= 4.28.0 for LLamaConfig - set tokenizers version <= 0.20.3 to avoid 0.20.4 version [issues](https://github.com/huggingface/tokenizers/issues/1691 ) with py3.8 - added llama2 to tensorrt - llama2 tests not added to test_tensorrt_inference_performance.py due to large memory requirement for worker gpu. tests validated separately on gh200 --------- Co-authored-by:
dpatlolla <dpatlolla@microsoft.com>
-
- 22 Nov, 2024 1 commit
-
-
Hongtao Zhang authored
**Description** Add nvbandwidth benchmark. --------- Co-authored-by:hongtaozhang <hongtaozhang@microsoft.com>
-
- 10 Oct, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Cherry pick bug fixes from v0.11.0 to main **Major Revision** * #645 * #648 * #646 * #647 * #651 * #652 * #650 --------- Co-authored-by:
hongtaozhang <hongtaozhang@microsoft.com> Co-authored-by:
Yifan Xiong <yifan.xiong@microsoft.com>
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main. **Major Revisions** * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - U...
-
- 10 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add distributed inference benchmark cpp implementation.
-
- 08 Dec, 2023 1 commit
-
-
Ziyue Yang authored
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) **Description** Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy
-
- 07 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 04 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation **Major Revision** - Support cpu-gpu and gpu-cpu in ib-validation **Minor Revision** - support multi msg size, multi direction, multi ib commands in ib-validation
-
- 22 Nov, 2023 1 commit
-
-
guoshzhao authored
**Description** Generate baseline given results from multiple nodes. **Major Revision** - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by:
454314380 <454314380@qq.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 27 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Cherry-pick bug fixes from v0.9.0 to main. **Major Revision** - CI/CD: pipeline - clean more disk space to fix rocm building image pipeline(#555 ) - Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency(#554) - CI/CD - add push win docker image on release branch in pipeline (#552) - Docs - Upgrade version and release note(#557)
-
- 30 Jun, 2023 1 commit
-
-
Lei Qu authored
Modify link for Nvidia bandwidth test tool **Description** previous link is 404 **Minor Revision** update the link value to https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest
-
- 29 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add runner for sys info to automatically collect on multiple nodes and update related docs. **Major Revision** - add runner for sys info which will check docker status and run `sb node info` on all nodes' docker and fetch results from all nodes **Minor Revision** - update cli and system-info doc - update sb node info to save output info output-dir/sys-info.json
-
- 04 May, 2023 1 commit
-
-
F̷N̷ authored
**Description** Kernel_parameters and kernel_modules command and examples are exchanged.
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 24 Mar, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This PR adds a micro-benchmark of distributed model inference workloads. **Major Revision** - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by:Peng Cheng <chengpeng5555@outlook.com>
-
- 22 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support batch and shape range with multiplication factors in cublaslt gemm benchmark.
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Adding HPL benchmark --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com>
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com> Co-authored-by:
Yifan Xiong <xiongyf@yandex.com>
-
- 28 Jan, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.7.0 to main. **Major Revisions** * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by:
Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 04 Jan, 2023 1 commit
-
-
Yang Wang authored
Support traffic patterns under the different devices in NCCL/RCCL test * change the metrics format if specified the pattern
-
- 03 Jan, 2023 1 commit
-
-
Yifan Xiong authored
Integrate cublaslt-gemm micro-benchmark #451.
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.6.0 to main. **Major Revisions** * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 22 Aug, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Add support for both jsonl and json format in data diagnosis. **Major Revision** - Add support for both jsonl and json format in data diagnosis **Minor Revision** - change related doc - add jsonl support in cli
-
- 26 Jul, 2022 1 commit
-
-
Jie Zhang authored
* Support topo-aware IB performance validation Add a new pattern `topo-aware`, so the user can run IB performance test based on VM's topology information. This way, the user can validate the IB performance across VM pairs with different distance as a quick test instead of pair-wise test. To run with topo-aware pattern, user needs to specify three required (and two optional) parameters in YAML config file: --pattern topo-aware --ibstat path to ibstat output --ibnetdiscover path to ibnetdiscover output --min_dist minimum distance of VM pairs (optional, default 2) --max_dist maximum distance of VM pairs (optional, default 6) The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test will then be running across these generated VM pairs. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add description about topology aware ib traffic tests Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add unit test to verify generated topology aware config file This commit adds unit test to verify the generated topology aware config file is correct. To do so, four new data files are added in order to invoke gen_topo_aware_config function to generate topology aware config file, then compares it with the expected config file. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Fix lint issue on Azure pipeline Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com>
-
- 29 Jun, 2022 1 commit
-
-
Yifan Xiong authored
Fix several issues in ib loopback benchmark: * use `--report_gbits` and divide by 8 to get GB/s, previous results are MiB/s / 1000 * use the ib_write_bw binary built in third_party instead of system path * update the metrics name so that different hca indices have same metric
-
- 24 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Support multiple IB/GPU devices run simultaneously in ib validation benchmark. **Major Revisions** - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.5.0 to main. **Major Revisions** * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 20 Apr, 2022 1 commit
-
-
user4543 authored
**Description** Update links of referencing other docs using relative file paths with extensions.
-
- 15 Apr, 2022 1 commit
-
-
Jared Bowden authored
**Description** Fixes relative link in documentation: point to `../cli.md`.
-
- 08 Apr, 2022 1 commit
-
-
user4543 authored
**Description** Add usage for result summary.
-
- 16 Mar, 2022 1 commit
-
-
rafsalas19 authored
**Description** Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn
-
- 20 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add multi-rules feature for data diagnosis to support multiple rules' combined check. **Major Revision** - revise rule design to support multiple rules combination check - update related codes and tests
-