---
title: "Installation Details"
date: 2020-10-28
---

The quickest way to get started with DeepSpeed is via pip; this installs
the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA
versions. DeepSpeed includes several C++/CUDA extensions that we commonly refer
to as our 'ops'. By default, all of these extensions/ops are built
just-in-time (JIT) using [torch's JIT C++ extension loader, which relies on
ninja](https://pytorch.org/docs/stable/cpp_extension.html) to build and
dynamically link them at runtime.
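
As an illustration of that mechanism (this is not DeepSpeed's actual build code), torch's `load_inline` helper compiles a C++ source string with ninja and links it into the running process; the sketch below guards against a missing toolchain:

```python
# Illustration only (not DeepSpeed's build code): torch's JIT C++ extension
# loader compiles C++ at runtime using ninja and a C++ compiler.
import importlib.util

CPP_SOURCE = "torch::Tensor add_one(torch::Tensor x) { return x + 1; }"

def jit_build_demo():
    """Try to JIT-compile a tiny op; return the compiled module, or None when
    the toolchain (PyTorch, ninja, a C++ compiler) is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    try:
        from torch.utils.cpp_extension import load_inline
        return load_inline(name="demo_op", cpp_sources=CPP_SOURCE,
                           functions=["add_one"])
    except Exception:
        return None
```

This mirrors the trade-off described above: nothing is compiled at install time, but the first use pays a one-time build cost.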

**Note:** [PyTorch](https://pytorch.org/) must be installed _before_ installing
DeepSpeed.
{: .notice--info}
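
A quick way to check that requirement before running pip (a minimal sketch, not an official DeepSpeed utility):

```python
# Minimal pre-install check (not an official DeepSpeed utility): verify that
# PyTorch is importable before attempting to install DeepSpeed.
import importlib.util

def torch_is_installed():
    """Return True if a 'torch' package is importable in this environment."""
    return importlib.util.find_spec("torch") is not None

if __name__ == "__main__":
    print("torch importable:", torch_is_installed())
```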

```bash
pip install deepspeed
```

After installation, you can validate your install and see which ops your machine
is compatible with via the DeepSpeed environment report, generated by `ds_report` or
`python -m deepspeed.env_report`. We've found this report useful when debugging
DeepSpeed install or compatibility issues.

```bash
ds_report
```

## Pre-install DeepSpeed Ops

It is sometimes useful to pre-install some or all of the DeepSpeed
C++/CUDA ops instead of relying on the JIT-compiled path. To support
pre-installation, we provide build environment flags that toggle building
specific ops on or off.

You can indicate to our installer (either `install.sh` or `pip install`) that you
want to attempt to install all of our ops by setting the `DS_BUILD_OPS`
environment variable to 1, for example:

```bash
DS_BUILD_OPS=1 pip install deepspeed
```

DeepSpeed will only install ops that are compatible with your machine.
For more details on which ops are compatible with your system, please try our
`ds_report` tool described above.

If you want to install only a specific op (e.g., FusedLamb), you can toggle
individual ops with `DS_BUILD` environment variables at installation time. For
example, to install DeepSpeed with only the FusedLamb op, use:

```bash
DS_BUILD_FUSED_LAMB=1 pip install deepspeed
```

Available `DS_BUILD` options include:
* `DS_BUILD_OPS` toggles all ops
* `DS_BUILD_CPU_ADAM` builds the CPUAdam op
* `DS_BUILD_FUSED_ADAM` builds the FusedAdam op (from [apex](https://github.com/NVIDIA/apex))
* `DS_BUILD_FUSED_LAMB` builds the FusedLamb op
* `DS_BUILD_SPARSE_ATTN` builds the sparse attention op
* `DS_BUILD_TRANSFORMER` builds the transformer op
* `DS_BUILD_STOCHASTIC_TRANSFORMER` builds the stochastic transformer op
* `DS_BUILD_UTILS` builds various optimized utilities
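
These flags can also be combined. For example, assuming (as with the flags above) that setting a specific `DS_BUILD_*` flag to 0 excludes that op, you could attempt every op except sparse attention with:

```bash
# Build every compatible op except sparse attention; DS_BUILD_OPS=1 enables
# all ops, and DS_BUILD_SPARSE_ATTN=0 excludes that one from the build.
DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 pip install deepspeed
```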

## Install DeepSpeed from source

After cloning the DeepSpeed repo from GitHub, you can install DeepSpeed in
JIT mode via pip (see below). This install should complete
quickly since it is not compiling any C++/CUDA source files.

```bash
pip install .
```

For installs spanning multiple nodes we find it useful to install DeepSpeed
using the
[install.sh](https://github.com/microsoft/DeepSpeed/blob/master/install.sh)
script in the repo. This will build a Python wheel locally and copy it to all
the nodes listed in your hostfile (either given via `--hostfile`, or defaulting
to `/job/hostfile`).
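
The sketch below shows a hostfile in DeepSpeed's usual `<hostname> slots=<num_gpus>` format (the hostnames and slot counts are illustrative):

```bash
# Illustrative two-node hostfile; install.sh reads /job/hostfile by default,
# or you can point it elsewhere with --hostfile.
cat > hostfile <<'EOF'
worker-1 slots=8
worker-2 slots=8
EOF
cat hostfile
# Then, from the repo root:
#   bash install.sh --hostfile hostfile
```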


## Building for the correct architectures

If you're getting the following error when running DeepSpeed:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```

it means that the CUDA extensions weren't built for the card you're trying to run on.

When building from source, DeepSpeed will try to support a wide range of architectures, but under JIT mode it will only support the architectures visible at build time.

You can build for a specific range of architectures by setting the `TORCH_CUDA_ARCH_LIST` environment variable, like so:

```bash
TORCH_CUDA_ARCH_LIST="6.1;7.5;8.6" pip install ...
```

This also makes the build faster when you only target a few architectures.

Setting the list explicitly is also recommended to ensure your exact architecture is used. For a variety of technical reasons, a distributed PyTorch binary isn't built to fully support all architectures, skipping binary-compatible ones at the potential cost of underutilizing your card's full compute capability. To see which architectures get included during a DeepSpeed build from source, save the build log and grep for `-gencode` arguments.

The full list of NVIDIA GPUs and their compute capabilities can be found [here](https://developer.nvidia.com/cuda-gpus).
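
To find the right value for your own machine, you can also query the compute capability of each visible GPU through PyTorch (a small sketch; it returns an empty list when PyTorch or a CUDA device is unavailable):

```python
# Query the compute capability of each visible GPU so TORCH_CUDA_ARCH_LIST can
# be set to match; returns [] when PyTorch or a CUDA device is unavailable.
import importlib.util

def visible_cuda_archs():
    """Return arch strings like ['8.6'] for each visible CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return []
    import torch
    if not torch.cuda.is_available():
        return []
    return ["%d.%d" % torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())]

if __name__ == "__main__":
    archs = visible_cuda_archs()
    print("TORCH_CUDA_ARCH_LIST=" + ";".join(archs) if archs
          else "no CUDA device visible")
```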

## Feature specific dependencies

Some DeepSpeed features require specific dependencies outside of the general
dependencies of DeepSpeed.

* For Python package dependencies per feature/op, please
see our [requirements
directory](https://github.com/microsoft/DeepSpeed/tree/master/requirements).

* We attempt to keep system-level dependencies to a minimum; however, some features do require special system-level packages. Please check our `ds_report` tool output to see if you are missing any system-level packages for a given feature.

## Pre-compiled DeepSpeed builds from PyPI

Coming soon