- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add cuda 12.9 dockerfile and build in pipeline. --------- Co-authored-by:
Guoshuai Zhao <microsoft@microsoft.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com>
-
- 25 Feb, 2025 1 commit
-
-
Hongtao Zhang authored
Added support for Python 3.11, 3.12 and 3.13. yapf is not compatiable with python3.12+, so we disable yapf in py3.12 for now. https://github.com/google/yapf/issues/1258 https://github.com/google/yapf/issues/1266 --------- Co-authored-by:
hongtaozhang <hongtaozhang@microsoft.com>
-
- 04 Feb, 2025 1 commit
-
-
Yifan Xiong authored
Fix installation and lint issues: * Fix transformer installation in Python3.7 due to upgrade of safetensors. * Fix lint issues in mypy 1.14.1.
-
- 28 Nov, 2024 1 commit
-
-
pdr authored
Added llama benchmark - training and inference in accordance with the existing pytorch models implementation like gpt2, lstm etc. - added llama fp8 unit test for better code coverage, to reduce memory required - updated transformers version >= 4.28.0 for LLamaConfig - set tokenizers version <= 0.20.3 to avoid 0.20.4 version [issues](https://github.com/huggingface/tokenizers/issues/1691 ) with py3.8 - added llama2 to tensorrt - llama2 tests not added to test_tensorrt_inference_performance.py due to large memory requirement for worker gpu. tests validated separately on gh200 --------- Co-authored-by:
dpatlolla <dpatlolla@microsoft.com>
-
- 27 Nov, 2024 1 commit
-
-
Yifan Xiong authored
Upgrade dependency versions in Azure pipeline: * Remove Python 3.6 and add Python 3.10 for cpu-unit-test * Upgrade CUDA from 11.1 to 12.4 for cuda-unit-test * Update labels accordingly --------- Co-authored-by:Dilip Patlolla <dilipreddi@gmail.com>
-
- 15 Nov, 2024 1 commit
-
-
Hongtao Zhang authored
**Description** Bump onnxruntime-gpu from 1.10.0 to 1.12.0. --------- Co-authored-by:hongtaozhang <hongtaozhang@microsoft.com>
-
- 06 Nov, 2024 1 commit
-
-
pdr authored
Add support for arm64 build: - Updated dockerfile for arm64 build - extend cpu stream compilation for neoverse - handle onnxruntime-gpu installation - third party builds filtering based on arch - disable cuda decode perf build for non x86
-
- 10 Oct, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Cherry pick bug fixes from v0.11.0 to main **Major Revision** * #645 * #648 * #646 * #647 * #651 * #652 * #650 --------- Co-authored-by:
hongtaozhang <hongtaozhang@microsoft.com> Co-authored-by:
Yifan Xiong <yifan.xiong@microsoft.com>
-
- 08 Aug, 2024 1 commit
-
-
Yang Wang authored
* https://pypi.org/project/types-pkg-resources/ * Use types-setuptools instead
-
- 23 Jul, 2024 1 commit
-
-
Yang Wang authored
Update `omegaconf` version to [2.3.0](https://pypi.org/project/omegaconf/2.3.0/) as omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.1 will enforce this behaviour change. Discussion can be found at https://github.com/pypa/pip/issues/12063.
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** upgrade to rocm5.7 dockerfile. --------- Co-authored-by:yukirora <yuting.jiang@microsoft.com>
-
- 07 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 27 Nov, 2023 1 commit
-
-
guoshzhao authored
**Description** Add AMD support in monitor. **Major Revision** - Add library pyrsmi to collect metrics. - Currently can get device_utilization, device_power, device_used_memory and device_total_memory.
-
- 22 Nov, 2023 1 commit
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100: * upgrade base image to 23.10 * fix onnxruntime version in python3.10 * fix compilation errors
-
- 30 Jun, 2023 1 commit
-
-
Yifan Xiong authored
* Update result parsing for newer tensorrt versions * Update arguments when load torchvision models
-
- 14 Jun, 2023 1 commit
-
-
Yifan Xiong authored
Update error message in setup, require wheel for pip>=23.1.
-
- 23 May, 2023 1 commit
-
-
Yifan Xiong authored
Add signal handler in runner to gracefully exit when receiving SIGINT (<kbd>Ctrl</kbd>+<kbd>C</kbd>) or SIGTERM during benchmark execution.
-
- 06 Mar, 2023 2 commits
-
-
Yifan Xiong authored
Pin setuptools version to [v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0) to avoid breaking changes since v66.0.0.
-
Yifan Xiong authored
Limit ansible_runner version to less than 2.3.2 for Python3.6.
-
- 17 Feb, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade networkx version to fix installation compatibility issue.
-
- 14 Dec, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Downgrage transformers version to fix tersorrt test failure.
-
- 29 Nov, 2022 1 commit
-
-
Yang Wang authored
* add mpi-parallels mode * update according to comments * fix and update doc * update * merge into 'mpi' mode * udpate according to comments * fix testcases * fix ansible * regard pattern as field * udpate * fix flake8 version * add flake8 range * remove map-by from host config * udpate comments
-
- 17 Nov, 2022 1 commit
-
-
Yang Wang authored
-
- 31 Oct, 2022 1 commit
-
-
Yifan Xiong authored
Update version to include revision hash and date in "{last tag}+g{git hash}.d{date}" format, here're the examples: * exact tag: 0.6.0 * commit after tag: 0.6.0+gcbb1b34 * commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
-
- 18 Oct, 2022 1 commit
-
-
Yuting Jiang authored
Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414) **Description** Add support to allow list of custom config string in cudnn-functions and cublas-functions.
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.6.0 to main. **Major Revisions** * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 17 Aug, 2022 1 commit
-
-
Yifan Xiong authored
__Description__ Update Python setup for require packages. __Major Revisions__ * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6 * add extra entry in extras_require for nested packages * update `pip install` contents accordingly
-
- 13 Aug, 2022 1 commit
-
-
Yang Wang authored
An enhancement for topo-aware IB performance validation #373. This PR will auto-generate a required ibstate file `ib_traffic_topo_aware_ibstat.txt` which is used as input to build a graph.
-
- 26 Jul, 2022 1 commit
-
-
Jie Zhang authored
* Support topo-aware IB performance validation Add a new pattern `topo-aware`, so the user can run IB performance test based on VM's topology information. This way, the user can validate the IB performance across VM pairs with different distance as a quick test instead of pair-wise test. To run with topo-aware pattern, user needs to specify three required (and two optional) parameters in YAML config file: --pattern topo-aware --ibstat path to ibstat output --ibnetdiscover path to ibnetdiscover output --min_dist minimum distance of VM pairs (optional, default 2) --max_dist maximum distance of VM pairs (optional, default 6) The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test will then be running across these generated VM pairs. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add description about topology aware ib traffic tests Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add unit test to verify generated topology aware config file This commit adds unit test to verify the generated topology aware config file is correct. To do so, four new data files are added in order to invoke gen_topo_aware_config function to generate topology aware config file, then compares it with the expected config file. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Fix lint issue on Azure pipeline Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com>
-
- 13 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Add dependencies * include ndv4-topo.xml in cuda docker images * require requests version to avoid RequestsDependencyWarning
-
- 05 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Support SKU auto detect and using corresponding benchmark config if running on Azure VM.
-
- 14 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Support `sb run` on host directly without Docker **Major Revisions** - Add `--no-docker` argument for `sb run`. - Run on host directly if `--no-docker` if specified. - Update docs and tests correspondingly.
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.5.0 to main. **Major Revisions** * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 15 Mar, 2022 1 commit
-
-
user4543 authored
**Description** Add md and html output format for DataDiagnosis. **Major Revision** - add md and html support in file_handler - add interface in DataDiagnosis for md and HTML output **Minor Revision** - move excel and json output interface into DataDiagnosis
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.4.0 to main. __Major Revisions__ * Bug - Fix issues for Ansible and benchmarks (#267) * Tests - Refine test cases for microbenchmark (#268) * Bug - Build openmpi with ucx support in rocm dockerfiles (#269) * Benchmarks: Fix Bug - Fix fio build issue (#272) * Docs - Unify metric and add doc for cublas and cudnn functions (#271) * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274) * Bug - Fix bug of detecting if gpu_index is none (#275) * Bug - Fix bugs in data diagnosis (#273) * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270) * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276) * Docs - Upgrade version and release note (#277) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 10 Dec, 2021 2 commits
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on ORT python API. **Major Revision** - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.
-
Yuting Jiang authored
**Description** Add basic analysis features. **Major Revision** - Add statistics, correlations of the raw data - Add numeric outlier detection(inter_quartile_range) - Add boxplot for selected metric
-
- 08 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add data diagnosis module. **Major Revision** - Add DataDiagnosis class to support rule-based data diagnosis for result summary jsonl file of multi nodes - Add RuleOp class to define rule operators
-
- 12 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add tcp connectivity validation microbenchmark which is to validate TCP connectivity between current node and several nodes in the hostfile. **Major Revision** - Add tcp connectivity validation microbenchmark and related test, example
-