- 20 Jan, 2023 1 commit
-
-
Yifan Xiong authored
Support `--no-image-pull` in `sb deploy` to skip image pull to use local cached images.
-
- 01 Nov, 2022 1 commit
-
-
Yifan Xiong authored
Add non-zero return code for `sb deploy` and `sb run` command when there're Ansible failures in control plane. Return code is set to count of failure. For failures caused by benchmarks, return code is still set per benchmark in results json file.
-
- 31 Oct, 2022 1 commit
-
-
Yifan Xiong authored
Update version to include revision hash and date in "{last tag}+g{git hash}.d{date}" format, here're the examples: * exact tag: 0.6.0 * commit after tag: 0.6.0+gcbb1b34 * commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.6.0 to main. **Major Revisions** * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 14 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Support `sb run` on host directly without Docker **Major Revisions** - Add `--no-docker` argument for `sb run`. - Run on host directly if `--no-docker` if specified. - Update docs and tests correspondingly.
-
- 11 Apr, 2022 1 commit
-
-
user4543 authored
**Description** Integrate output all nodes diagnosis results.
-
- 08 Apr, 2022 1 commit
-
-
user4543 authored
**Description** Integrage result summary and update output format of data diagnosis. **Major Revision** - integrage result summary - add md and html format for data diagnosis
-
- 18 Jan, 2022 1 commit
-
-
Yifan Xiong authored
__Description__ Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks. <details> <summary>Examples</summary> <pre> $ sb benchmark list -n [a-z]+-bw -o table Result -------- mem-bw nccl-bw rccl-bw </pre> <pre> $ sb benchmark list-parameters -n mem-bw === mem-bw === optional arguments: --bin_dir str Specify the directory of the benchmark binary. --duration int The elapsed time of benchmark in seconds. --mem_type str [str ...] Memory types to benchmark. E.g. htod dtoh dtod. --memory str Memory argument for bandwidthtest. E.g. pinned unpinned. --run_count int The run count of benchmark. --shmoo_mode Enable shmoo mode for bandwidthtest. default values: {'bin_dir': None, 'duration': 0, 'mem_type': ['htod', 'dtoh'], 'memory': 'pinned', 'run_count': 1} </pre> </details> __Major Revisions__ * Add `sb benchmark list` to list benchmarks matching given name. * Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name. __Minor Revisions__ * Sort format help text for argparse.
-
- 10 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add cli to integrate data diagnosis module.
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main. **Major Revisions** * Docs - Upgrade version and release note (#209) * Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210) * Benchmarks: Update - Update benchmarks in configuration file (#208) * CI/CD - Update GitHub Action VM (#211) * Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203) * CI/CD - Fix bug in build image for push event (#205) * Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204) * Tool: Fix bug - Fix function naming issue in system info (#200) * CI/CD - Push images in GitHub Action (#202) * Bug - Fix torch.distributed command for single node (#201) * CLI - Integrate system info for node (#199) * Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196) * CI/CD - Add ROCm image build in GitHub Actions (#194) * Bug: Fix bug - fix bug of hipBusBandwidth build (#193) * Benchmarks: Build Pipeline - Restore rocblas build logic (#197) * Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198) * Bug - Revise 'docker run' in sb deploy (#195) * Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190) Co-authored-by:
Yuting Jiang <v-yujiang@microsoft.com> Co-authored-by:
Guoshuai Zhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com>
-
- 23 Jun, 2021 1 commit
-
-
Yifan Xiong authored
* Add `sb deploy` command content. * Fix inline if-expression syntax in playbook. * Fix quote escape issue in bash command. * Add custom env in config. * Update default config for multi GPU benchmarks. * Update MANIFEST.in to include jinja2 template. * Require jinja2 minimum version. * Fix occasional duplicate output in Ansible runner. * Fix mixed color from Ansible and Python colorlog. * Update according to comments. * Change superbench.env from list to dict in config file.
-
- 28 May, 2021 1 commit
-
-
Yifan Xiong authored
* Support `torch.distributed` mode in runner. * Support given `proc_num` and `node_num` in `torch.distributed` mode.
-
- 18 May, 2021 1 commit
-
-
Yifan Xiong authored
* use absolute path of input file * parse registry uri from image * merge common parts for arguments processing
-
- 12 Apr, 2021 1 commit
-
-
Yifan Xiong authored
* CLI integration with Executor and Runner
-
- 12 Mar, 2021 1 commit
-
-
Yifan Xiong authored
- Add CLI commands * sb version * sb deploy * sb exec * sb run - Add interface with executor and runner - Add cli test cases
-