# Install NVSHMEM

## Important notices

**This project is neither sponsored nor supported by NVIDIA.**

**Use of NVIDIA NVSHMEM is governed by the terms at [NVSHMEM Software License Agreement](https://docs.nvidia.com/nvshmem/api/sla.html).**

## Prerequisites

Hardware requirements:
   - GPUs within a node need to be connected by NVLink
   - GPUs across different nodes need to be connected by RDMA devices, see [GPUDirect RDMA Documentation](https://docs.nvidia.com/cuda/gpudirect-rdma/)
   - InfiniBand GPUDirect Async (IBGDA) support, see [IBGDA Overview](https://developer.nvidia.com/blog/improving-network-performance-of-hpc-systems-using-nvidia-magnum-io-nvshmem-and-gpudirect-async/)
   - For more detailed requirements, see [NVSHMEM Hardware Specifications](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements)

Software requirements:
   - NVSHMEM v3.3.9 or later

## Installation procedure

### 1. Install NVSHMEM binaries

NVSHMEM 3.3.9 binaries are available in several formats:
   - Tarballs for [x86_64](https://developer.download.nvidia.com/compute/nvshmem/redist/libnvshmem/linux-x86_64/libnvshmem-linux-x86_64-3.3.9_cuda12-archive.tar.xz) and [aarch64](https://developer.download.nvidia.com/compute/nvshmem/redist/libnvshmem/linux-sbsa/libnvshmem-linux-sbsa-3.3.9_cuda12-archive.tar.xz)
   - RPM and deb packages: instructions can be found on the [NVSHMEM installer page](https://developer.nvidia.com/nvshmem-downloads?target_os=Linux)
   - Conda packages through conda-forge
   - pip wheels through PyPI: `pip install nvidia-nvshmem-cu12`

DeepEP is compatible with upstream NVSHMEM 3.3.9 and later.
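
The minimum-version requirement can be checked with a small helper. This is a minimal sketch, not part of NVSHMEM or DeepEP: `version_ge` is a hypothetical function name, and it assumes GNU `sort` with version sorting (`-V`) is available.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU sort, which supports -V (version sort).
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly when
    # that smallest entry is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, assuming the pip wheel is installed and its version tracks NVSHMEM's:
# installed="$(pip show nvidia-nvshmem-cu12 | awk '/^Version:/ {print $2}')"
# version_ge "$installed" 3.3.9 && echo "NVSHMEM $installed is new enough"
```

Version sorting is used instead of a plain string comparison so that, for example, `3.10.0` is treated as newer than `3.3.9`.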

### 2. Enable NVSHMEM IBGDA support

NVSHMEM supports two IBGDA modes with different requirements. Either of the following methods can be used to enable IBGDA support.

#### 2.1 Configure NVIDIA driver

This configuration enables traditional IBGDA support.

Modify `/etc/modprobe.d/nvidia.conf`:

```bash
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"
```

Rebuild the initramfs and reboot so that the new driver options take effect:

```bash
sudo update-initramfs -u
sudo reboot
```
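
After the reboot, it can be useful to confirm that the driver picked up the new options. The following is a minimal sketch, assuming the driver exposes its parameters in `/proc/driver/nvidia/params` as `Name: value` lines; `check_ibgda_driver_params` is a hypothetical helper name, not an NVIDIA tool.

```bash
# Hypothetical helper: checks that the options from nvidia.conf are active.
# Assumes the driver lists its parameters as "Name: value" lines in
# /proc/driver/nvidia/params; another file can be passed as $1.
check_ibgda_driver_params() {
    local params_file="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params_file" ]; then
        echo "cannot read $params_file" >&2
        return 2
    fi
    if ! grep -Eq '^EnableStreamMemOPs:[[:space:]]*1[[:space:]]*$' "$params_file"; then
        echo "EnableStreamMemOPs is not set to 1" >&2
        return 1
    fi
    if ! grep -Eq '^RegistryDwords:.*PeerMappingOverride=1' "$params_file"; then
        echo "RegistryDwords does not contain PeerMappingOverride=1" >&2
        return 1
    fi
    echo "driver parameters look OK"
}
```

Calling `check_ibgda_driver_params` with no arguments inspects the running driver; a non-zero exit status indicates that one of the two options is missing.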

#### 2.2 Install GDRCopy and load the gdrdrv kernel module

This configuration enables IBGDA through asynchronous post-send operations assisted by the CPU. More information about CPU-assisted IBGDA can be found in [this blog](https://developer.nvidia.com/blog/enhancing-application-portability-and-compatibility-across-new-platforms-using-nvidia-magnum-io-nvshmem-3-0/#cpu-assisted_infiniband_gpu_direct_async%C2%A0).
It comes with a small performance penalty, but can be used when modifying the driver regkeys is not an option.

Download GDRCopy. It is available as prebuilt deb and rpm packages [here](https://developer.download.nvidia.com/compute/redist/gdrcopy/), or as source code from the [GDRCopy GitHub repository](https://github.com/NVIDIA/gdrcopy).

Install GDRCopy following the instructions in the [GDRCopy GitHub repository](https://github.com/NVIDIA/gdrcopy?tab=readme-ov-file#build-and-installation).
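
Once GDRCopy is installed, the `gdrdrv` kernel module should be loaded. The following is a minimal sketch for checking this by reading `/proc/modules`, the same data that `lsmod` reports; `module_loaded` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when kernel module $1 is listed as loaded.
# Reads /proc/modules by default; another file can be passed as $2.
module_loaded() {
    local name="$1"
    local modules_file="${2:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example:
# module_loaded gdrdrv && echo "gdrdrv is loaded"
```

If the check fails, load the module as described in the GDRCopy instructions before using CPU-assisted IBGDA.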

## Post-installation configuration

When not installing NVSHMEM from RPM or deb packages, set the following environment variables in your shell configuration:

```bash
export NVSHMEM_DIR=/path/to/your/dir/to/install  # Also used by the DeepEP installation
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib:$LD_LIBRARY_PATH"
export PATH="${NVSHMEM_DIR}/bin:$PATH"
```
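
When `LD_LIBRARY_PATH` is empty, the export above leaves a trailing colon, which the dynamic loader interprets as the current directory. Re-sourcing the shell configuration also adds the same directory repeatedly. The following is an optional sketch that avoids both; `prepend_once` is a hypothetical helper name.

```bash
# Hypothetical helper: prints the colon-separated list $2 with directory $1
# prepended, unless $1 is already present. An empty $2 yields just $1, with
# no trailing colon.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Example:
# export LD_LIBRARY_PATH="$(prepend_once "${NVSHMEM_DIR}/lib" "${LD_LIBRARY_PATH:-}")"
# export PATH="$(prepend_once "${NVSHMEM_DIR}/bin" "$PATH")"
```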

## Verification

```bash
nvshmem-info -a # Should display details of nvshmem
```
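
If `nvshmem-info` is not found, the install directory layout can be checked first. The following is a minimal sketch, assuming the layout implied by the environment variables above (`bin/nvshmem-info` and `lib/libnvshmem*` under `NVSHMEM_DIR`); `check_nvshmem_dir` is a hypothetical helper name.

```bash
# Hypothetical helper: checks that an NVSHMEM install directory contains the
# pieces the environment variables above point at. Uses $1 if given,
# otherwise $NVSHMEM_DIR.
check_nvshmem_dir() {
    local dir="${1:-${NVSHMEM_DIR:-}}"
    if [ -z "$dir" ]; then
        echo "NVSHMEM_DIR is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/nvshmem-info" ]; then
        echo "missing executable: $dir/bin/nvshmem-info" >&2
        return 1
    fi
    if ! ls "$dir"/lib/libnvshmem* >/dev/null 2>&1; then
        echo "no libnvshmem* files found in $dir/lib" >&2
        return 1
    fi
    echo "NVSHMEM layout under $dir looks complete"
}
```

Running `check_nvshmem_dir` with no arguments checks the directory named by `NVSHMEM_DIR`; the error message points at the first missing piece.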