# Install NVSHMEM

## Important notices

**This project is neither sponsored nor supported by NVIDIA.**

**Use of NVIDIA NVSHMEM is governed by the terms at [NVSHMEM Software License Agreement](https://docs.nvidia.com/nvshmem/api/sla.html).**

## Prerequisites

Hardware requirements:
   - GPUs inside one node need to be connected by NVLink
   - GPUs across different nodes need to be connected by RDMA devices, see [GPUDirect RDMA Documentation](https://docs.nvidia.com/cuda/gpudirect-rdma/)
   - InfiniBand GPUDirect Async (IBGDA) support, see [IBGDA Overview](https://developer.nvidia.com/blog/improving-network-performance-of-hpc-systems-using-nvidia-magnum-io-nvshmem-and-gpudirect-async/)
   - For more detailed requirements, see [NVSHMEM Hardware Specifications](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements)
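
Before building, a quick look at the hardware can save time. The script below is a minimal sketch, assuming `nvidia-smi` and `ibv_devinfo` (from the rdma-core / ibverbs utilities) are installed; it only prints what those tools report and does not verify IBGDA support itself. The helper names `tool_status` and `check_hardware` are hypothetical. In the `nvidia-smi topo -m` matrix, GPU pairs connected by NVLink show `NV#` entries.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: prints what the NVLink and RDMA inspection
# tools report. Nothing here modifies the system.

# Print "found" or "missing" for a command, without running it.
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

check_hardware() {
  # NVLink: the topology matrix shows NV# between GPUs linked by NVLink.
  if [[ "$(tool_status nvidia-smi)" == "found" ]]; then
    nvidia-smi topo -m
  else
    echo "nvidia-smi not found: cannot inspect the GPU topology"
  fi

  # RDMA: ibv_devinfo lists RDMA devices and their port states.
  if [[ "$(tool_status ibv_devinfo)" == "found" ]]; then
    ibv_devinfo
  else
    echo "ibv_devinfo not found: cannot list RDMA devices"
  fi
}

check_hardware
```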

## Installation procedure

### 1. Acquiring NVSHMEM source code

Download the NVSHMEM source code from the [NVIDIA NVSHMEM OPEN SOURCE PACKAGES](https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_cuda12-all).
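
Once downloaded, the archive needs to be unpacked. A minimal sketch, assuming GNU tar (which detects gzip/xz compression on its own) and an archive that contains a single top-level directory; `extract_nvshmem_src` is a hypothetical helper name, and the archive path is a placeholder for whatever file the download page provides.

```shell
# Hypothetical helper: unpack an NVSHMEM source archive into DEST and print
# the path of the top-level directory it contains.
extract_nvshmem_src() {
  local archive="$1" dest="$2" top
  mkdir -p "$dest" || return 1
  # GNU tar auto-detects the compression format when extracting from a file.
  tar -xf "$archive" -C "$dest" || return 1
  # Assumes a single top-level directory: first path component of the first entry.
  top="$(tar -tf "$archive" 2>/dev/null | head -n 1 | cut -d/ -f1)"
  printf '%s\n' "$dest/$top"
}

# Usage (placeholder paths):
#   src_dir="$(extract_nvshmem_src /path/to/downloaded/archive "$HOME/src")"
#   cd "$src_dir"
```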

### 2. [Optional] Apply our custom patch

**NOTE: After NVSHMEM v3.3.9, it is no longer necessary to apply our patch to achieve optimal performance.**
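
If you are unsure whether your source tree needs the patch, the version comparison can be scripted. A minimal sketch, assuming the version string is already known (for example from the download page) and reading "after v3.3.9" as "v3.3.9 and later"; `nvshmem_needs_patch` is a hypothetical name, and `sort -V` is the GNU coreutils version sort.

```shell
# Hypothetical helper: return 0 (true) if VERSION predates 3.3.9 and may still
# benefit from the patch, 1 (false) otherwise.
# Assumption: v3.3.9 itself no longer needs the patch.
nvshmem_needs_patch() {
  local version="${1#v}"   # accept both "3.3.9" and "v3.3.9"
  local threshold="3.3.9"
  [[ "$version" == "$threshold" ]] && return 1
  # sort -V orders version strings; if VERSION sorts first, it is older.
  [[ "$(printf '%s\n%s\n' "$version" "$threshold" | sort -V | head -n 1)" == "$version" ]]
}

# Usage:
#   if nvshmem_needs_patch 3.2.5; then echo "apply the patch"; else echo "skip it"; fi
```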

Navigate to your NVSHMEM source directory and apply our provided patch:

```bash
git apply /path/to/deep_ep/dir/third-party/nvshmem.patch
```
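
To avoid a half-applied patch, `git apply --check` can be run first as a dry run: it reports whether the patch applies cleanly without touching any files. `git apply` also works in a plain source directory that is not a Git repository. A minimal sketch with a hypothetical `apply_patch_safely` helper:

```shell
# Hypothetical helper: apply PATCH only if it applies cleanly,
# otherwise leave the source tree untouched and report the problem.
apply_patch_safely() {
  local patch="$1"
  # --check performs a dry run and reports conflicts without modifying files.
  if git apply --check "$patch"; then
    git apply "$patch"
  else
    echo "patch does not apply cleanly (already applied, or a different NVSHMEM version?)" >&2
    return 1
  fi
}

# Usage, from the NVSHMEM source directory:
#   apply_patch_safely /path/to/deep_ep/dir/third-party/nvshmem.patch
```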

### 3. Configure NVIDIA driver (required for inter-node communication)

Enable IBGDA by modifying `/etc/modprobe.d/nvidia.conf`:

```bash
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"
```
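
Editing the file by hand works; for scripted setups, the sketch below appends the options line only if that exact line is not already present, so it can be re-run safely. It assumes bash and `tee`; `ensure_line` and the `SUDO` variable are hypothetical, and the helper does not merge with any other `options nvidia` line already in the file, so review the file by hand if one exists.

```shell
# The options line from the step above.
NVIDIA_IBGDA_OPTS='options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"'

# Hypothetical helper: append LINE to FILE unless FILE already contains it verbatim.
ensure_line() {
  local line="$1" file="$2"
  # -F fixed string, -x whole-line match, -q quiet
  if [[ -f "$file" ]] && grep -Fxq -- "$line" "$file"; then
    return 0
  fi
  # Set SUDO=sudo when the target file is root-owned; left empty, tee runs directly.
  printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" >/dev/null
}

# Usage:
#   SUDO=sudo ensure_line "$NVIDIA_IBGDA_OPTS" /etc/modprobe.d/nvidia.conf
```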

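Regenerate the initramfs so the new module options are applied at boot, then reboot (`update-initramfs` is the Debian/Ubuntu tool):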

```bash
sudo update-initramfs -u
sudo reboot
```

For more detailed configurations, please refer to the [NVSHMEM Installation Guide](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html).
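
After the reboot, the loaded driver's parameters can be checked to confirm the options took effect. This sketch assumes the NVIDIA kernel module exposes its parameters in `/proc/driver/nvidia/params` as `Name: value` lines (the usual layout, but verify it on your driver version). `check_nvidia_params` is a hypothetical helper that returns 0 when both options are active, 1 when one is missing, and 2 when the file cannot be read.

```shell
# Hypothetical helper: verify the IBGDA-related driver options are active.
# Optional argument: path to the params file (defaults to the /proc location).
check_nvidia_params() {
  local params="${1:-/proc/driver/nvidia/params}" status=0
  if [[ ! -r "$params" ]]; then
    echo "cannot read $params (is the NVIDIA driver loaded?)" >&2
    return 2
  fi
  # Assumed format: "EnableStreamMemOPs: 1"
  if ! grep -Eq '^EnableStreamMemOPs:[[:space:]]*1$' "$params"; then
    echo "EnableStreamMemOPs is not 1" >&2
    status=1
  fi
  # Assumed format: RegistryDwords: "PeerMappingOverride=1;"
  if ! grep -q 'PeerMappingOverride=1' "$params"; then
    echo "PeerMappingOverride=1 not found in RegistryDwords" >&2
    status=1
  fi
  return "$status"
}

# Usage (after reboot):
#   check_nvidia_params && echo "driver options are active"
```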

### 4. Build and installation

DeepEP uses NVLink for intra-node communication and IBGDA for inter-node communication. All other NVSHMEM features are disabled to reduce dependencies.

```bash
export CUDA_HOME=/path/to/cuda
# disable all features except IBGDA
export NVSHMEM_IBGDA_SUPPORT=1

export NVSHMEM_SHMEM_SUPPORT=0
export NVSHMEM_UCX_SUPPORT=0
export NVSHMEM_USE_NCCL=0
export NVSHMEM_PMIX_SUPPORT=0
export NVSHMEM_TIMEOUT_DEVICE_POLLING=0
export NVSHMEM_USE_GDRCOPY=0
export NVSHMEM_IBRC_SUPPORT=0
export NVSHMEM_BUILD_TESTS=0
export NVSHMEM_BUILD_EXAMPLES=0
export NVSHMEM_MPI_SUPPORT=0
export NVSHMEM_BUILD_HYDRA_LAUNCHER=0
export NVSHMEM_BUILD_TXZ_PACKAGE=0

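# Configure with the Ninja generator (Ninja must be installed), then build and install into the prefix
cmake -G Ninja -S . -B build -DCMAKE_INSTALL_PREFIX=/path/to/your/dir/to/install
cmake --build build/ --target install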
```
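
A quick way to confirm that the install step produced usable output is to look for a few key files under the install prefix. The file names below are assumptions based on a typical NVSHMEM 3.x layout (header, host shared library, device static library, and the `nvshmem-info` tool) and may differ between releases; `check_nvshmem_install` is a hypothetical helper.

```shell
# Hypothetical helper: report any expected file missing from the install prefix.
# The file list is an assumption about the NVSHMEM 3.x install layout.
check_nvshmem_install() {
  local prefix="$1" missing=0 f
  for f in include/nvshmem.h lib/libnvshmem_host.so lib/libnvshmem_device.a bin/nvshmem-info; do
    if [[ ! -e "$prefix/$f" ]]; then
      echo "missing: $prefix/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_nvshmem_install /path/to/your/dir/to/install && echo "install looks complete"
```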

## Post-installation configuration

Set environment variables in your shell configuration:

```bash
export NVSHMEM_DIR=/path/to/your/dir/to/install  # Used by the DeepEP build
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export PATH="${NVSHMEM_DIR}/bin:$PATH"
```
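
If these lines live in a shell profile that is sourced repeatedly, the paths accumulate duplicate entries, and prepending to an unset `LD_LIBRARY_PATH` with a plain `dir:$LD_LIBRARY_PATH` leaves a trailing empty entry, which the dynamic loader treats as the current directory. A minimal bash sketch of a hypothetical `path_prepend` helper that avoids both:

```shell
# Hypothetical helper: prepend DIR to the colon-separated variable named VAR,
# skipping duplicates and never leaving an empty entry when VAR is unset.
path_prepend() {
  local var="$1" dir="$2"
  local current="${!var:-}"   # indirect expansion: value of the variable named in $var
  case ":$current:" in
    *":$dir:"*) ;;            # already present: leave unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

# Usage:
#   export NVSHMEM_DIR=/path/to/your/dir/to/install
#   path_prepend LD_LIBRARY_PATH "${NVSHMEM_DIR}/lib"
#   path_prepend PATH "${NVSHMEM_DIR}/bin"
```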

## Verification

```bash
nvshmem-info -a  # Should print details of the NVSHMEM installation
```
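
If another NVSHMEM installation exists on the system, `nvshmem-info` may resolve to the wrong copy. A minimal sketch that checks whether the binary found on `PATH` comes from `NVSHMEM_DIR`; `check_nvshmem_on_path` is a hypothetical helper name.

```shell
# Hypothetical helper: confirm that nvshmem-info on PATH is the one under NVSHMEM_DIR.
check_nvshmem_on_path() {
  if [[ -z "${NVSHMEM_DIR:-}" ]]; then
    echo "NVSHMEM_DIR is not set" >&2
    return 1
  fi
  local expected="${NVSHMEM_DIR%/}/bin/nvshmem-info" found
  if ! found="$(command -v nvshmem-info)"; then
    echo "nvshmem-info is not on PATH" >&2
    return 1
  fi
  if [[ "$found" != "$expected" ]]; then
    echo "nvshmem-info resolves to $found, expected $expected" >&2
    return 1
  fi
  echo "nvshmem-info comes from $NVSHMEM_DIR"
}
```

Running `check_nvshmem_on_path` before `nvshmem-info -a` makes it clear which installation the reported details belong to.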