@@ -78,7 +79,7 @@ Here're the details about work directory structure for SuperBench Runner.
### SuperBench Executor
SuperBench Executor is the component that runs benchmarks inside a Docker container.
It will execute each benchmark and handle all pre- and post-processing, including health check, result validation, result processing, etc.
It will optionally start the monitor, execute each benchmark, and handle all pre- and post-processing, including health checks, result validation, and result processing.
Here is the SuperBench Executor's work directory structure inside the Docker container.
The `/root` directory is mounted from `$HOME/sb-workspace` on the host path.
...
...
@@ -94,7 +95,8 @@ The `/root` directory is mounted from `$HOME/sb-workspace` on the host path.
├── benchmarks # benchmarks directory
│ └── benchmark-0 # output for each benchmark
│ └── rank-0 # output for each rank in each benchmark
SuperBench provides a `Monitor` module that collects system metrics and detects failures during benchmarking. Currently, the monitor supports the CUDA platform only. Users can enable it in the config file.
## Configuration
```yaml
superbench:
  monitor:
    enable: bool
    sample_duration: int
    sample_interval: int
```
### `enable`
Whether to enable the monitor module.
### `sample_duration`
The length of the sampling window in seconds. Averaged metrics, such as CPU usage and NIC bandwidth, are calculated over `sample_duration` seconds.
### `sample_interval`
The sampling period in seconds. A sample is taken every `sample_interval` seconds.
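
As an illustration, a filled-in `monitor` section might look like the sketch below. The concrete values are assumptions chosen for the example, not documented defaults, and should be adjusted to the workload.

```yaml
superbench:
  monitor:
    enable: true          # turn the monitor on (assumed value for illustration)
    sample_duration: 1    # average metrics over a 1-second window (assumed value)
    sample_interval: 10   # take one sample every 10 seconds (assumed value)
```

Read against the descriptions above, this configuration would average metrics such as CPU usage over a 1-second window and record one such sample every 10 seconds.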
## Metrics
The Monitor module writes its data in JSON Lines format. Each line is a JSON object that includes the following metrics: