@@ -78,7 +79,7 @@ Here're the details about work directory structure for SuperBench Runner.
...
### SuperBench Executor
SuperBench Executor is the component that runs benchmarks inside a Docker container.
It optionally starts the monitor, executes each benchmark, and handles all pre- and post-processing, including health checks, result validation, and result processing.
Here is the SuperBench Executor's work directory structure inside the Docker container.
The `/root` directory is mounted from `$HOME/sb-workspace` on the host.
...
@@ -94,7 +95,8 @@ The `/root` directory is mounted from `$HOME/sb-workspace` on the host path.
├── benchmarks          # benchmarks directory
│   └── benchmark-0     # output for each benchmark
│       └── rank-0      # output for each rank in each benchmark
SuperBench provides a `Monitor` module to collect system metrics and detect failures during benchmarking. Currently, the monitor supports only the CUDA platform. Users can enable it in the config file.
## Configuration
```yaml
superbench:
  monitor:
    enable: bool
    sample_duration: int
    sample_interval: int
```
### `enable`
Whether to enable the monitor module.
### `sample_duration`
Averaged metrics, such as CPU usage and NIC bandwidth, are calculated over a window of `sample_duration` seconds.
### `sample_interval`
Take a sample every `sample_interval` seconds.
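
As an illustration, a filled-in `monitor` section might look like the sketch below. The numeric values are assumptions chosen for this example, not documented defaults; adjust them to your workload.

```yaml
superbench:
  monitor:
    enable: true          # turn the monitor module on
    sample_duration: 1    # illustrative value: average metrics over 1 second
    sample_interval: 10   # illustrative value: take a sample every 10 seconds
```

With these example values, the monitor would take one sample every 10 seconds, and averaged metrics such as CPU usage would be computed over a 1-second window.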
## Metrics
The monitor module writes its data in JSON Lines format. Each line is a JSON object that includes the following metrics: