"...gtest/git@developer.sourcefind.cn:yangql/googletest.git" did not exist on "794ef030ebad671b699abdfd8e0474f93045be91"
configuration.md 4.16 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
---
id: configuration
---

# Configuration

## SuperBench config

SuperBench uses a [YAML](https://yaml.org/spec/1.2/spec.html) config file to configure the details of benchmarking,
including which benchmarks to run, which distribution mode to use, which parameters to apply, etc.

Here's what the default config file looks like.

```yaml title="superbench/config/default.yaml"
# SuperBench Config
superbench:
  enable: null
  var:
    default_local_mode: &default_local_mode
      enable: true
      modes:
        - name: local
          proc_num: 8
          prefix: CUDA_VISIBLE_DEVICES={proc_rank}
          parallel: yes
    default_pytorch_mode: &default_pytorch_mode
      enable: true
      modes:
        - name: torch.distributed
          proc_num: 8
          node_num: 1
      frameworks:
        - pytorch
    common_model_config: &common_model_config
      duration: 0
      num_warmup: 16
      num_steps: 128
      precision:
        - float32
        - float16
      model_action:
        - train
  benchmarks:
    kernel-launch:
      <<: *default_local_mode
    gemm-flops:
      <<: *default_local_mode
    cudnn-function:
      <<: *default_local_mode
    cublas-function:
      <<: *default_local_mode
    matmul:
      <<: *default_local_mode
      frameworks:
        - pytorch
    sharding-matmul:
      <<: *default_pytorch_mode
    computation-communication-overlap:
      <<: *default_pytorch_mode
    gpt_models:
      <<: *default_pytorch_mode
      models:
        - gpt2-small
        - gpt2-large
      parameters:
        <<: *common_model_config
        batch_size: 4
    bert_models:
      <<: *default_pytorch_mode
      models:
        - bert-base
        - bert-large
      parameters:
        <<: *common_model_config
        batch_size: 8
    lstm_models:
      <<: *default_pytorch_mode
      models:
        - lstm
      parameters:
        <<: *common_model_config
        batch_size: 128
    resnet_models:
      <<: *default_pytorch_mode
      models:
        - resnet50
        - resnet101
        - resnet152
      parameters:
        <<: *common_model_config
        batch_size: 128
    densenet_models:
      <<: *default_pytorch_mode
      models:
        - densenet169
        - densenet201
      parameters:
        <<: *common_model_config
        batch_size: 128
    vgg_models:
      <<: *default_pytorch_mode
      models:
        - vgg11
        - vgg13
        - vgg16
        - vgg19
      parameters:
        <<: *common_model_config
        batch_size: 128
```

By default, all benchmarks in the default configuration will run if you don't specify a customized configuration.

If you want to have a quick try, you can modify this config a little. For example, run only the resnet101 model.
1. Copy the default config to a file named `resnet.yaml` in the current path.
  ```bash
  cp superbench/config/default.yaml resnet.yaml
  ```
2. Enable only `resnet_models` in the config and remove all models except resnet101 under `benchmarks.resnet_models.models`.
  ```yaml {3,11} title="resnet.yaml"
  # SuperBench Config
  superbench:
    enable: ['resnet_models']
    var:
  # ...
  # omit the middle part
  # ...
      resnet_models:
        <<: *default_pytorch_mode
        models:
          - resnet101
        parameters:
          <<: *common_model_config
          batch_size: 128
  ```
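
You can then launch the benchmarks with the customized config. Below is a minimal sketch, assuming SuperBench's `sb run` command with its `--config-file` and `--host-file` options (check `sb run --help` for the exact options in your version); `local.ini` refers to an Ansible inventory file like the ones described in the next section:

```bash
# Run only the benchmarks enabled in resnet.yaml on the local node,
# using local.ini as the Ansible inventory (see the next section).
sb run --config-file resnet.yaml --host-file local.ini
```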

## Ansible Inventory

SuperBench leverages [Ansible](https://docs.ansible.com/ansible/latest/) to run benchmarking workloads on managed nodes.
You need to provide an [inventory](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html) file
to configure the host list for the managed nodes.

Here are some basic examples to use as a starting point.
* One managed node, which is the same node as the control node.
  ```ini title="local.ini"
  [all]
  localhost ansible_connection=local
  ```
* Two managed nodes: one is the control node and the other can be accessed remotely.
  ```ini title="mix.ini"
  [all]
  localhost ansible_connection=local
  10.0.0.100 ansible_user=username ansible_ssh_private_key_file=id_rsa
  ```
* Eight managed nodes, all of which can be accessed remotely.
  ```ini title="remote.ini"
  [all]
  10.0.0.[100:103]
  10.0.0.[200:203]

  [all:vars]
  ansible_user=username
  ansible_ssh_private_key_file=id_rsa
  ```
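
Once the inventory file is ready, pass it to SuperBench when deploying and running benchmarks. The following is a hedged sketch, assuming the `sb deploy` and `sb run` commands with a `--host-file` option (verify against `sb deploy --help` and `sb run --help` for your SuperBench version):

```bash
# Deploy the SuperBench runtime to every managed node listed in remote.ini,
# then run the benchmarks defined in the chosen config on those nodes.
sb deploy --host-file remote.ini
sb run --host-file remote.ini --config-file resnet.yaml
```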