Commit 61bb223e, authored by Lianmin Zheng, committed by GitHub

Update CI runner docs (#1213)

parent 15f1a49d
@@ -33,13 +33,13 @@ jobs:
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
       - name: Benchmark MoE Serving Throughput
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
       - name: Benchmark MoE Serving Throughput (w/o RadixAttention)
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
-# Set up self hosted runner for GitHub Action
-## Config Runner
-```bash
-# https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux
-# Involves some TOKEN and other private information, click the link to view specific steps.
-```
-## Start Runner
-add `/lib/systemd/system/e2e.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=7"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v1"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v1/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-add `/lib/systemd/system/unit.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=6"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v2"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v2/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-add `/lib/systemd/system/accuracy.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=5"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v3"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v3/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-```bash
-cd /data/zhyncs/runner-v1
-python3 -m venv venv
-cd /data/zhyncs/runner-v2
-python3 -m venv venv
-cd /data/zhyncs/runner-v3
-python3 -m venv venv
-sudo systemctl daemon-reload
-sudo systemctl start e2e
-sudo systemctl enable e2e
-sudo systemctl status e2e
-sudo systemctl start unit
-sudo systemctl enable unit
-sudo systemctl status unit
-sudo systemctl start accuracy
-sudo systemctl enable accuracy
-sudo systemctl status accuracy
-```
+# Set Up Self-hosted Runners for GitHub Action
+## Add a Runner
+### Step 1: Start a docker container.
+You can mount a folder for the shared huggingface model weights cache. The command below uses `/tmp/huggingface` as an example.
+```
+docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
+docker run --shm-size 64g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash
+```
+### Step 2: Configure the runner by `config.sh`
+Run these commands inside the container.
+```
+apt update && apt install -y curl python3-pip git
+export RUNNER_ALLOW_RUNASROOT=1
+```
+Then follow https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux to run `config.sh`
+**Notes**
+- Do not need to specify the runner group
+- Give it a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `unit-test`). The labels can be edited later in GitHub Settings.
+- Do not need to change the work folder.
+### Step 3: Run the runner by `run.sh`
+- Set up environment variables
+```
+export HF_HOME=/hf_home
+export SGLANG_IS_IN_CI=true
+export HF_TOKEN=hf_xxx
+export OPENAI_API_KEY=sk-xxx
+export CUDA_VISIBLE_DEVICES=0
+```
+- Run it forever
+```
+while true; do ./run.sh; echo "Restarting..."; sleep 2; done
+```
\ No newline at end of file
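
The following is not part of the commit. It is a minimal shell sketch of Step 3 from the new docs, assuming you want to confirm the environment variables are set before starting the restart loop. The helper names `missing_vars` and `restart_loop` are hypothetical; the variable names, `./run.sh`, and the 2-second delay are taken from the diff above.

```shell
#!/bin/sh
# Sketch only: missing_vars and restart_loop are hypothetical helpers,
# not part of sglang or the GitHub Actions runner.

# Print the name of every listed variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
# Arguments must be plain variable names (they are passed to eval).
missing_vars() {
  mv_status=0
  for mv_name in "$@"; do
    eval "mv_value=\${$mv_name:-}"
    if [ -z "$mv_value" ]; then
      echo "$mv_name"
      mv_status=1
    fi
  done
  return "$mv_status"
}

# Rerun a command every time it exits.
# $1 = maximum number of runs (0 means unlimited, like `while true`),
# $2 = seconds to sleep between runs, remaining arguments = the command.
restart_loop() {
  rl_max=$1
  rl_delay=$2
  rl_runs=0
  shift 2
  while [ "$rl_max" -eq 0 ] || [ "$rl_runs" -lt "$rl_max" ]; do
    "$@" || echo "command exited with status $?" >&2
    rl_runs=$((rl_runs + 1))
    echo "Restarting..."
    sleep "$rl_delay"
  done
}

# Example usage (commented out so that sourcing this file starts nothing):
# if missing_vars HF_HOME SGLANG_IS_IN_CI HF_TOKEN OPENAI_API_KEY CUDA_VISIBLE_DEVICES; then
#   restart_loop 0 2 ./run.sh
# fi
```

Passing `0` as the first argument to `restart_loop` reproduces the unbounded loop from the docs; a positive value bounds the number of runs, which is convenient when trying the script out.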