- 13 Aug, 2022 1 commit
-
-
Yang Wang authored
An enhancement for topo-aware IB performance validation (#373). This PR auto-generates the required ibstat file `ib_traffic_topo_aware_ibstat.txt`, which is used as input to build the graph.
-
- 26 Jul, 2022 1 commit
-
-
Jie Zhang authored
* Support topo-aware IB performance validation

  Add a new pattern `topo-aware`, so the user can run the IB performance test based on the VMs' topology information. This way, the user can validate IB performance across VM pairs at different distances as a quick test, instead of running a full pair-wise test.

  To run with the topo-aware pattern, the user needs to specify three required (and two optional) parameters in the YAML config file:

      --pattern        topo-aware
      --ibstat         path to ibstat output
      --ibnetdiscover  path to ibnetdiscover output
      --min_dist       minimum distance of VM pairs (optional, default 2)
      --max_dist       maximum distance of VM pairs (optional, default 6)

  The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test then runs across these generated VM pairs.

  Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests

  Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

  This commit adds a unit test to verify that the generated topology aware config file is correct. To do so, four new data files are added in order to invoke the gen_topo_aware_config function, which generates the topology aware config file; the test then compares it with the expected config file.

  Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline

  Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
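The pairing step described in this commit (build a graph, then pick VM pairs by hop count) can be illustrated with a breadth-first search. This is only a minimal sketch under stated assumptions, not the project's topo_aware module: the function names `hop_distances` and `pairs_by_distance`, the adjacency-dict representation, and the three-VM topology are all hypothetical. Only the `min_dist`/`max_dist` defaults (2 and 6) come from the commit message.

```python
from collections import deque
from itertools import combinations


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_by_distance(graph, vms, min_dist=2, max_dist=6):
    """Group VM pairs by hop distance, keeping only min_dist..max_dist."""
    dist_from = {vm: hop_distances(graph, vm) for vm in vms}
    result = {}
    for a, b in combinations(sorted(vms), 2):
        d = dist_from[a].get(b)  # None if b is unreachable from a
        if d is not None and min_dist <= d <= max_dist:
            result.setdefault(d, []).append((a, b))
    return result


# Hypothetical topology: two leaf switches connected through one spine switch.
graph = {
    'vm0': ['leaf0'],
    'vm1': ['leaf0'],
    'vm2': ['leaf1'],
    'leaf0': ['vm0', 'vm1', 'spine'],
    'leaf1': ['vm2', 'spine'],
    'spine': ['leaf0', 'leaf1'],
}
pairs = pairs_by_distance(graph, ['vm0', 'vm1', 'vm2'])
print(pairs)
```

In this toy topology, VMs under the same leaf switch are 2 hops apart and VMs under different leaf switches are 4 hops apart, so testing one pair per distance covers both cases without running every pair.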
-
- 05 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Support SKU auto-detection and use the corresponding benchmark config when running on an Azure VM.
-
- 24 Jan, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Fix insecure code that binds a socket to all network interfaces.
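To illustrate the class of issue, here is a minimal sketch of binding to one explicit interface instead of all of them. It does not reproduce the project's actual fix: the helper name `find_free_port` and the loopback default are assumptions made for the example.

```python
import socket


def find_free_port(host='127.0.0.1'):
    """Ask the OS for an unused TCP port on one specific interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to '' or '0.0.0.0' would listen on every interface;
        # an explicit host limits exposure. Port 0 lets the OS choose.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```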
-
- 15 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description**

Rename `nvidia_helper` utility as `device_manager` module and support more functions:

```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```
-
- 31 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Package the frequently used subprocess invocation into a function.
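A wrapper of this kind might look like the sketch below. The name `run_command`, its parameters, and the choice to merge stderr into stdout are assumptions for illustration, not the function this commit added.

```python
import subprocess


def run_command(command, timeout=None):
    """Run a shell command and capture its combined output as text."""
    return subprocess.run(
        command,
        shell=True,                # command is a single shell string
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout
        universal_newlines=True,   # decode bytes to str
        check=False,               # caller inspects returncode itself
        timeout=timeout,
    )


result = run_command('echo hello')
print(result.returncode, result.stdout.strip())
```

Centralizing the call keeps the capture and decoding options consistent across callers. Note that `shell=True` should only be used with trusted command strings.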
-
- 13 Jul, 2021 1 commit
-
-
Yuting Jiang authored
Update network common utils. Add get_ib_devices to network common utils and move get_free_port from test utils to network common utils.
-
- 09 Jul, 2021 1 commit
-
-
guoshzhao authored
* Bug Fix - Fix race condition issue for multi ranks (#117)

  Fix race condition issue when multiple ranks rotate the same directory.

* Update pipeline for release branch (#122)

* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
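The bool-to-`store_true` conversion mentioned in #120 can be sketched as follows. This is an illustrative assumption about the general technique, not the project's code: the helper `config_to_args` and the parameter names are hypothetical. The point is that a `store_true` flag takes no value, so `True` must become a bare flag and `False` must be omitted entirely.

```python
def config_to_args(parameters):
    """Turn a config mapping into a command-line argument string."""
    args = []
    for name, value in parameters.items():
        flag = '--' + name
        if isinstance(value, bool):
            # store_true flags carry no value: emit the bare flag for True,
            # and nothing at all for False.
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return ' '.join(args)


# Hypothetical parameter names, used only for the example.
command_args = config_to_args({'batch_size': 32, 'no_gpu': True, 'verbose': False})
print(command_args)  # --batch_size 32 --no_gpu
```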
-
- 08 Jul, 2021 1 commit
-
-
Yifan Xiong authored
Support both NVIDIA and AMD GPUs, and check the GPU vendor during deployment and execution.

* Add GPU environment check in sb deploy.
* Check GPU vendor in executor.
-
- 02 Jul, 2021 1 commit
-
-
Yifan Xiong authored
Fetch benchmark results from all nodes; results are rsynced after each benchmark.

The results directory structure on the control node is as follows:

```
outputs/
└── datetime
    ├── nodes
    │   └── node-0
    │       ├── benchmarks
    │       │   ├── benchmark-0
    │       │   │   ├── rank-0
    │       │   │   │   └── results.json
    │       └── sb-exec.log
    ├── sb-run.log
    └── sb.config.yaml
```
-
- 01 Jul, 2021 1 commit
-
-
Yifan Xiong authored
* Support custom output directory.
* Update document.
-
- 23 Jun, 2021 1 commit
-
-
Yifan Xiong authored
* Add `sb deploy` command content.
* Fix inline if-expression syntax in playbook.
* Fix quote escape issue in bash command.
* Add custom env in config.
* Update default config for multi GPU benchmarks.
* Update MANIFEST.in to include jinja2 template.
* Require jinja2 minimum version.
* Fix occasional duplicate output in Ansible runner.
* Fix mixed color from Ansible and Python colorlog.
* Update according to comments.
* Change superbench.env from list to dict in config file.
-
- 16 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Fix bugs and refine log in single GPU benchmarks:

* Fix none framework issue
* Fix empty parameter bug
* Remove missed mobilenet_v3 models
* Change benchmark registration log to debug level
* Add pid in logging
* Add missing benchmarks in default config
* Fix deprecated logging warn
-
- 01 Jun, 2021 1 commit
-
-
guoshzhao authored
-
- 18 May, 2021 1 commit
-
-
Yifan Xiong authored
* use absolute path of input file
* parse registry uri from image
* merge common parts for arguments processing
-
- 11 May, 2021 1 commit
-
-
Yifan Xiong authored
__Major Revision__

* Support lazy import.
* Do not import benchmarks when running `help`, `version`, `deploy` commands, etc.
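One common way to implement lazy import is a small proxy object that defers the real import until the first attribute access. The sketch below is an assumption about the general technique, not the mechanism this commit uses: the `LazyModule` class is hypothetical, and the standard-library `json` module stands in for a heavy benchmark package.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # __getattr__ is only called for attributes not found on the proxy
        # itself, so _name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


json_lazy = LazyModule('json')
loaded_before = json_lazy._module is not None  # nothing imported yet
text = json_lazy.dumps({'a': 1})               # first access triggers the import
print(loaded_before, text)
```

Commands that never touch the proxied module, such as a help or version command, pay no import cost for it.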
-
- 29 Mar, 2021 1 commit
-
-
Yifan Xiong authored
Update logger class.

* add file handler along with stream handler
* add colored formatter
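Attaching a file handler alongside a stream handler can be sketched with the standard `logging` module alone. This is a minimal illustration, not the project's logger class: the `build_logger` helper, the logger name, and the log file name are hypothetical, and the colored formatter is left out because it would need a third-party package.

```python
import logging
import os
import tempfile


def build_logger(name, log_file):
    """Create a logger that writes to both the console and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    formatter = logging.Formatter('[%(asctime)s] %(levelname)s: %(message)s')
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Hypothetical file name in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), 'sb-demo.log')
logger = build_logger('sb-demo', log_path)
logger.info('benchmark started')
```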
-
- 26 Mar, 2021 1 commit
-
-
Yifan Xiong authored
Use omegaconf to replace hydra for configuration system:

* remove hydra
* use omegaconf to merge configurations
-
- 18 Mar, 2021 1 commit
-
-
guoshzhao authored
* support benchmark re-registration.
* address comments

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
-
- 12 Mar, 2021 1 commit
-
-
Yifan Xiong authored
- Add CLI commands
  * sb version
  * sb deploy
  * sb exec
  * sb run
- Add interface with executor and runner
- Add cli test cases
-
- 04 Mar, 2021 1 commit
-
-
guoshzhao authored
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
-
- 24 Feb, 2021 1 commit
-
-
guoshzhao authored
* benchmarks init.

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
-