added amd_configure.md to references (#3275)

Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Unverified Commit 0a6f18f0 authored Feb 07, 2025 by Zachary Streeter Committed by GitHub Feb 07, 2025

Showing with 101 additions and 0 deletions

docs/index.rst +1 -0

docs/references/amd_configure.md +100 -0

--- a/docs/index.rst
+++ b/docs/index.rst
@@ -59,6 +59,7 @@ The core features include:
   references/benchmark_and_profiling.md
   references/accuracy_evaluation.md
   references/custom_chat_template.md
+   references/amd_configure.md
   references/deepseek.md
   references/multi_node.md
   references/modelscope.md

--- a/docs/references/amd_configure.md
+++ b/docs/references/amd_configure.md
+# AMD Configuration and Setup for SGLang
+
+## Introduction
+
+This document describes how to set up an AMD-based environment for [SGLang](https://github.com/sgl-project/sglang). If you encounter issues or have questions, please [open an issue](https://github.com/sgl-project/sglang/issues) on the SGLang repository.
+
+## System Configuration
+
+When using AMD GPUs (such as MI300X), certain system-level optimizations help ensure stable performance. Here we take MI300X as an example. AMD provides official documentation for MI300X optimization and system tuning:
+
+- [AMD MI300X Tuning Guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html)
+  - [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html)
+  - [AMD Instinct MI300X System Optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
+  - [AMD Instinct MI300X Workload Optimization](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference-optimization/workload.html)
+
+**NOTE:** We strongly recommend reading these docs in their entirety to fully utilize your system.
+
+Below are a few key settings to confirm or enable:
+
+### Update GRUB Settings
+
+In `/etc/default/grub`, append the following to `GRUB_CMDLINE_LINUX`:
+
+```text
+pci=realloc=off iommu=pt
+```
+
+Afterward, run `sudo update-grub` (or your distro’s equivalent) and reboot.
+
+### Disable NUMA Auto-Balancing
+
+```bash
+sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
+```
+
+You can automate or verify this change using [this helpful script](https://github.com/ROCm/triton/blob/rocm_env/scripts/amd/env_check.sh).
+
+Again, please go through the entire documentation to confirm your system is using the recommended configuration.
+
+## Installing SGLang
+
+For general installation instructions, see the official [SGLang Installation Docs](https://docs.sglang.ai/start/install.html). Below are the AMD-specific steps summarized for convenience.
+
+### Install from Source
+
+```bash
+git clone https://github.com/sgl-project/sglang.git
+cd sglang
+
+pip install --upgrade pip
+pip install sgl-kernel --force-reinstall --no-deps
+pip install -e "python[all_hip]"
+```
+
+### Install Using Docker (Recommended)
+
+1. Build the Docker image.
+
+```bash
+docker build -t sglang_image -f Dockerfile.rocm .
+```
+
+2. Create a convenient alias.
+
+```bash
+alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri \
+    --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -v $HOME/dockerx:/dockerx \
+    -v /data:/data'
+```
+
+3. Launch the server.
+
+**NOTE:** Replace `<secret>` below with your [Hugging Face Hub token](https://huggingface.co/docs/hub/en/security-tokens).
+
+```bash
+drun -p 30000:30000 \
+    -v ~/.cache/huggingface:/root/.cache/huggingface \
+    --env "HF_TOKEN=<secret>" \
+    sglang_image \
+    python3 -m sglang.launch_server \
+    --model-path NousResearch/Meta-Llama-3.1-8B \
+    --host 0.0.0.0 \
+    --port 30000
+```
+
+4. To verify that the server is working, run a benchmark in another terminal, or refer to [other docs](https://docs.sglang.ai/backend/openai_api_completions.html) to send requests to the engine.
+
+```bash
+drun sglang_image \
+    python3 -m sglang.bench_serving \
+    --backend sglang \
+    --dataset-name random \
+    --num-prompts 4000 \
+    --random-input 128 \
+    --random-output 128
+```
+
+With your AMD system properly configured and SGLang installed, you can now fully leverage AMD hardware to power SGLang’s machine learning capabilities.
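
Reader's note on the diff above: the added document asks for two host settings, the `pci=realloc=off iommu=pt` kernel parameters and disabled NUMA auto-balancing. After the reboot, both can be checked with a short script. What follows is a minimal sketch and not part of this commit, SGLang, or ROCm. The function names `cmdline_has_params`, `numa_balancing_disabled`, and `check_amd_host` are hypothetical, and the sketch assumes a Linux host that exposes `/proc/cmdline` and `/proc/sys/kernel/numa_balancing`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the host settings described in the diff.
# These names are illustrative only; they are not provided by SGLang or ROCm.

# Return 0 if the kernel command line ($1) contains every parameter in $2..$n
# as a whole, space-delimited word; otherwise print the first missing one.
cmdline_has_params() {
    local cmdline="$1"; shift
    local param
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing kernel parameter: $param"; return 1 ;;
        esac
    done
    return 0
}

# Return 0 if the NUMA auto-balancing value stored in file $1 is exactly 0.
numa_balancing_disabled() {
    local file="$1"
    [ -r "$file" ] && [ "$(cat "$file")" = "0" ]
}

# Check both settings. The file paths default to the live /proc entries and
# can be overridden (for example, to test against saved copies).
check_amd_host() {
    local cmdline_file="${1:-/proc/cmdline}"
    local numa_file="${2:-/proc/sys/kernel/numa_balancing}"
    local status=0

    cmdline_has_params "$(cat "$cmdline_file")" "pci=realloc=off" "iommu=pt" \
        || status=1

    if numa_balancing_disabled "$numa_file"; then
        echo "NUMA auto-balancing: disabled"
    else
        echo "NUMA auto-balancing: enabled or unreadable"
        status=1
    fi
    return "$status"
}
```

After sourcing the file, running `check_amd_host` with no arguments reads the live `/proc` entries and returns a non-zero status if either setting differs from what the document recommends. The sketch only reads state and changes nothing on the host.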