---
id: installation
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


# Installation

SuperBench runs validations for AI infrastructure,
so you need to prepare one __control node__, which is used to run SuperBench commands,
and one or more __managed nodes__, which are the nodes to be validated.

Usually, the __control node__ can be a CPU node, while the __managed nodes__ are GPU nodes with high-speed interconnects.

:::tip Tips
It is fine if you have only one GPU node and want to try SuperBench on it.
The control node and the managed node can be co-located on the same machine.
:::

## Control node

Here are the system requirements for the control node.

### Requirements

* A recent version of Linux; Ubuntu 18.04 or later is highly recommended.
* [Python](https://www.python.org/) version 3.6 or later (which can be checked by running `python3 --version`).
* [Pip](https://pip.pypa.io/en/stable/installing/) version 18.0 or later (which can be checked by running `python3 -m pip --version`).

:::note
Windows is not supported due to the lack of Ansible support, but you can still use WSL2.
:::
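
The version requirements listed above can be checked in one pass. The following is a minimal sketch, not part of SuperBench: the helper names `version_ge`, `check_min`, and `check_control_node` are hypothetical, and the comparison assumes `sort -V` from GNU coreutils is available.

```bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: print a one-line verdict for one requirement.
check_min() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)"
    return 1
  fi
}

# check_control_node: compare the installed Python and pip against the minimums above.
check_control_node() {
  local status=0 py_version pip_version
  # `python3 --version` prints "Python X.Y.Z"; the second field is the version.
  py_version="$(python3 --version 2>/dev/null | awk '{print $2}')"
  # `python3 -m pip --version` prints "pip X.Y.Z from ..."; the second field is the version.
  pip_version="$(python3 -m pip --version 2>/dev/null | awk '{print $2}')"
  # A version of 0 stands in for "not found".
  check_min Python "${py_version:-0}" 3.6 || status=1
  check_min pip "${pip_version:-0}" 18.0 || status=1
  return "$status"
}

check_control_node || echo "The control node does not meet the requirements yet."
```

Version sort is used instead of a plain string comparison so that, for example, Python 3.10 is correctly treated as newer than 3.6.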

In addition, the control node should be able to access all managed nodes through SSH.
If you are going to use a password instead of a private key for SSH, you also need to install `sshpass`.

```bash
sudo apt-get install sshpass
```
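
Before going further, it may help to confirm that the control node can actually reach each managed node over SSH. The sketch below is an illustration only: `check_ssh` is a hypothetical helper and the addresses in the usage comment are placeholders. It uses `BatchMode=yes`, so it only verifies key-based access; password logins through `sshpass` are not covered.

```bash
# check_ssh HOST...: attempt a non-interactive SSH login on each host and report the result.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# and ConnectTimeout=5 gives up on an unresponsive host after five seconds.
check_ssh() {
  local host status=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: reachable"
    else
      echo "$host: unreachable"
      status=1
    fi
  done
  return "$status"
}

# Usage (placeholder addresses, replace with your managed nodes):
#   check_ssh 10.0.0.1 10.0.0.2
```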

Using [venv](https://docs.python.org/3/library/venv.html) to create a virtual environment is also recommended,
but not strictly necessary.

```bash
# create a new virtual environment
python3 -m venv ./venv

# activate the virtual environment
source ./venv/bin/activate

# exit the virtual environment later
# after you finish running superbench
deactivate
```

### Build

You can clone the source from GitHub and build it.

:::note Note
To use a release version, check out the corresponding tag, for example,

`git clone -b v0.10.0 https://github.com/microsoft/superbenchmark`
:::

```bash
git clone https://github.com/microsoft/superbenchmark
cd superbenchmark

python3 -m pip install .
make postinstall
```

After installation, you should be able to run the SB CLI.

```bash
sb
```
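
If the shell reports that `sb` cannot be found, a quick PATH lookup can narrow down the cause. This is a small sketch with a hypothetical helper name, `find_cli`; it only checks whether the command is visible to the current shell.

```bash
# find_cli NAME: report where a command resolves on PATH, or fail if it does not.
find_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH"
    return 1
  fi
}

find_cli sb || echo "If SuperBench was installed into a virtual environment, activate it first."
```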

## Managed nodes

Here are the system requirements for all managed GPU nodes.

### Requirements

<Tabs
  groupId='gpu-vendor'
  defaultValue='nvidia'
  values={[
    {label: 'NVIDIA GPU', value: 'nvidia'},
    {label: 'AMD GPU', value: 'amd'},
  ]
}>
<TabItem value='nvidia'>

* A recent version of Linux; Ubuntu 18.04 or later is highly recommended.
* Compatible GPU drivers should be installed correctly. The driver version can be checked by running `nvidia-smi`.
* [Docker CE](https://docs.docker.com/engine/install/) version 20.10 or later (which can be checked by running `docker --version`).
* NVIDIA GPU support in Docker, provided by installing
  [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit).

</TabItem>
<TabItem value='amd'>

* A recent version of Linux; Ubuntu 18.04 or later is highly recommended.
* Compatible GPU drivers should be installed correctly, and group permissions should be set to allow access to GPU resources.
  You should be able to run `rocm-smi` and `rocminfo` directly to check GPU usage and information.
* [Docker CE](https://docs.docker.com/engine/install/) version 20.10 or later (which can be checked by running `docker --version`).

</TabItem>
</Tabs>
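
The requirements in both tabs can be spot-checked on a managed node with a short script. The following is a sketch under stated assumptions, not a SuperBench tool: the helper names `parse_docker_version`, `docker_ok`, and `detect_gpu_vendor` are hypothetical, `sort -V` from GNU coreutils is assumed, and the script only checks that the vendor tools are on PATH, not that the drivers work.

```bash
# parse_docker_version LINE: extract the version from a line such as
# "Docker version 20.10.7, build f0df350" (third field, trailing comma removed).
parse_docker_version() {
  echo "$1" | awk '{print $3}' | tr -d ','
}

# docker_ok VERSION: succeed when VERSION is 20.10 or later.
docker_ok() {
  [ "$(printf '%s\n%s\n' 20.10 "$1" | sort -V | head -n 1)" = "20.10" ]
}

# detect_gpu_vendor: print "nvidia", "amd", or "none" depending on which vendor tool is on PATH.
detect_gpu_vendor() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo amd
  else
    echo none
  fi
}

docker_version="$(parse_docker_version "$(docker --version 2>/dev/null)")"
if docker_ok "$docker_version"; then
  echo "Docker $docker_version: OK (>= 20.10)"
else
  echo "Docker ${docker_version:-not found}: need 20.10 or later"
fi
echo "GPU vendor tool detected: $(detect_gpu_vendor)"
```

A result of `none` from `detect_gpu_vendor` usually means the GPU driver tools are not installed or not on the current user's PATH.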