# AMD GPUs

This document describes how to run SGLang on AMD GPUs. If you encounter issues or have questions, please [open an issue](https://github.com/sgl-project/sglang/issues).

## System Configuration

When using AMD GPUs (such as the MI300X), certain system-level optimizations help ensure stable performance. This guide takes the MI300X as an example. AMD provides official documentation for MI300X optimization and system tuning:

- [AMD MI300X Tuning Guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html)
- [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html)
- [AMD Instinct MI300X System Optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300X Workload Optimization](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference-optimization/workload.html)
- [Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html)

**NOTE:** We strongly recommend reading these docs and guides in full to get the most out of your system.

Below are a few key settings to confirm or enable for SGLang:

### Update GRUB Settings

In `/etc/default/grub`, append the following to `GRUB_CMDLINE_LINUX`:

```text
pci=realloc=off iommu=pt
```

Afterward, run `sudo update-grub` (or your distro’s equivalent) and reboot.
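
After rebooting, you can quickly confirm that the kernel picked up the new parameters:

```bash
# Both commands should print a match if the settings took effect
grep -o 'pci=realloc=off' /proc/cmdline
grep -o 'iommu=pt' /proc/cmdline
```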

### Disable NUMA Auto-Balancing

```bash
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
```

You can automate or verify this change using [this helpful script](https://github.com/ROCm/triton/blob/rocm_env/scripts/amd/env_check.sh).
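
Alternatively, a quick manual check:

```bash
# Should print 0 after the change
cat /proc/sys/kernel/numa_balancing

# Note: the echo above does not persist across reboots; one common way to
# persist the setting is a sysctl drop-in file:
echo 'kernel.numa_balancing = 0' | sudo tee /etc/sysctl.d/99-numa-balancing.conf
```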

Again, please go through the entire documentation to confirm your system is using the recommended configuration.

## Install SGLang

You can install SGLang using one of the methods below.

### Install from Source

```bash
# Use the last release branch
git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
cd sglang

# Compile sgl-kernel
pip install --upgrade pip
cd sgl-kernel
python setup_rocm.py install

# Install sglang python package
cd ..
rm -rf python/pyproject.toml && mv python/pyproject_other.toml python/pyproject.toml
pip install -e "python[all_hip]"
```
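
As a quick sanity check of the install (assuming the `sglang` package exposes a `__version__` attribute, as recent releases do):

```bash
python3 -c "import sglang; print(sglang.__version__)"
```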

### Install Using Docker (Recommended)

The docker images are available on Docker Hub at [lmsysorg/sglang](https://hub.docker.com/r/lmsysorg/sglang/tags), built from [rocm.Dockerfile](https://github.com/sgl-project/sglang/tree/main/docker).

The steps below show how to build and use an image.

1. Build the docker image.
   If you use pre-built images, you can skip this step and replace `sglang_image` with the pre-built image name in the steps below.

   ```bash
   docker build -t sglang_image -f rocm.Dockerfile .
   ```

2. Create a convenient alias.

   ```bash
   alias drun='docker run -it --rm --network=host --privileged --device=/dev/kfd --device=/dev/dri \
       --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \
       --security-opt seccomp=unconfined \
       -v $HOME/dockerx:/dockerx \
       -v /data:/data'
   ```
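
   To confirm the container can see your GPUs, you can run ROCm's `rocm-smi` tool through the alias (assuming the tool is present in the image, as it typically is in ROCm-based images):

   ```bash
   drun sglang_image rocm-smi
   ```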

   If you are using RDMA, please note that:
     - `--network host` and `--privileged` are required for RDMA. If you don't need RDMA, you can remove them.
     - You may need to set `NCCL_IB_GID_INDEX` if you are using RoCE, for example: `export NCCL_IB_GID_INDEX=3`.

3. Launch the server.

   **NOTE:** Replace `<secret>` below with your [huggingface hub token](https://huggingface.co/docs/hub/en/security-tokens).

   ```bash
   drun -p 30000:30000 \
       -v ~/.cache/huggingface:/root/.cache/huggingface \
       --env "HF_TOKEN=<secret>" \
       sglang_image \
       python3 -m sglang.launch_server \
       --model-path NousResearch/Meta-Llama-3.1-8B \
       --host 0.0.0.0 \
       --port 30000
   ```
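
   Once the server is up, you can do a quick liveness check from another terminal (assuming the server's `/health` endpoint, which recent SGLang versions expose):

   ```bash
   curl http://localhost:30000/health
   ```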

4. To verify that the server works, you can run a benchmark in another terminal or refer to [other docs](https://docs.sglang.ai/backend/openai_api_completions.html) to send requests to the engine.

   ```bash
   drun sglang_image \
       python3 -m sglang.bench_serving \
       --backend sglang \
       --dataset-name random \
       --num-prompts 4000 \
       --random-input 128 \
       --random-output 128
   ```

With your AMD system properly configured and SGLang installed, you are ready to serve models on AMD hardware.

## Examples

### Running DeepSeek-V3

The only difference when running DeepSeek-V3 is in how you start the server; note the `--model-path` argument below. Here's an example command:

```bash
drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --ipc=host \
    --env "HF_TOKEN=<secret>" \
    sglang_image \
    python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V3 \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
```

[Running DeepSeek-R1 on a single NDv5 MI300X VM](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/running-deepseek-r1-on-a-single-ndv5-mi300x-vm/4372726) could also be a good reference.

### Running Llama3.1

Running Llama3.1 is nearly identical to running DeepSeek-V3; the only difference is the model passed via `--model-path` when starting the server, as shown in the following example command:

```bash
drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --ipc=host \
    --env "HF_TOKEN=<secret>" \
    sglang_image \
    python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
```

### Warmup Step

When the server displays `The server is fired up and ready to roll!`, it means the startup is successful.
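
Once you see this message, you can send a first request to warm up the engine. Below is a minimal sketch using the OpenAI-compatible completions endpoint linked earlier; adjust the model name to match whatever you launched:

```bash
curl http://localhost:30000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "NousResearch/Meta-Llama-3.1-8B",
        "prompt": "Hello, my name is",
        "max_tokens": 16
    }'
```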