sglang · Commit 61bb223e (unverified)
Authored by Lianmin Zheng on Aug 25, 2024; committed by GitHub on Aug 25, 2024

Update CI runner docs (#1213)
Parent: 15f1a49d
Showing 2 changed files with 30 additions and 75 deletions:

- .github/workflows/moe-test.yml (+2, -2)
- docs/en/setup_github_runner.md (+28, -73)
.github/workflows/moe-test.yml

@@ -33,13 +33,13 @@ jobs:
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall

       - name: Benchmark MoE Serving Throughput
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default

       - name: Benchmark MoE Serving Throughput (w/o RadixAttention)
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
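The only functional change in this workflow is the step key: `timeout-minutes` is the key GitHub Actions recognizes for per-step timeouts, while the previous `timeout_minutes` spelling is not a valid step key. As an illustration that is not part of this commit, the same two benchmarks can be bounded locally with coreutils `timeout` standing in for the workflow-level limit, assuming sglang and its test dependencies are already installed:

```bash
# Illustrative only, not from this commit: run the two CI benchmark steps
# locally, using coreutils `timeout` to mimic the workflow's `timeout-minutes: 10`.
cd test/srt
timeout 10m python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
timeout 10m python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
```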
docs/en/setup_github_runner.md

-# Set up self hosted runner for GitHub Action
-
-## Config Runner
-
-```bash
-# https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux
-# Involves some TOKEN and other private information, click the link to view specific steps.
-```
-
-## Start Runner
-
-add `/lib/systemd/system/e2e.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=7"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v1"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v1/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-
-add `/lib/systemd/system/unit.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=6"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v2"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v2/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-
-add `/lib/systemd/system/accuracy.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=5"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v3"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v3/actions-runner/run.sh
-[Install]
-WantedBy=multi-user.target
-```
-
-```bash
-cd /data/zhyncs/runner-v1
-python3 -m venv venv
-cd /data/zhyncs/runner-v2
-python3 -m venv venv
-cd /data/zhyncs/runner-v3
-python3 -m venv venv
-
-sudo systemctl daemon-reload
-sudo systemctl start e2e
-sudo systemctl enable e2e
-sudo systemctl status e2e
-sudo systemctl start unit
-sudo systemctl enable unit
-sudo systemctl status unit
-sudo systemctl start accuracy
-sudo systemctl enable accuracy
-sudo systemctl status accuracy
-```
+# Set Up Self-hosted Runners for GitHub Action
+
+## Add a Runner
+
+### Step 1: Start a docker container.
+
+You can mount a folder for the shared huggingface model weights cache. The command below uses `/tmp/huggingface` as an example.
+
+```
+docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
+docker run --shm-size 64g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash
+```
+
+### Step 2: Configure the runner by `config.sh`
+
+Run these commands inside the container.
+```
+apt update && apt install -y curl python3-pip git
+export RUNNER_ALLOW_RUNASROOT=1
+```
+Then follow https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux to run `config.sh`
+
+**Notes**
+- Do not need to specify the runner group
+- Give it a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `unit-test`). The labels can be edited later in GitHub Settings.
+- Do not need to change the work folder.
+
+### Step 3: Run the runner by `run.sh`
+
+- Set up environment variables
+```
+export HF_HOME=/hf_home
+export SGLANG_IS_IN_CI=true
+export HF_TOKEN=hf_xxx
+export OPENAI_API_KEY=sk-xxx
+export CUDA_VISIBLE_DEVICES=0
+```
+
+- Run it forever
+```
+while true; do ./run.sh; echo "Restarting..."; sleep 2; done
+```
\ No newline at end of file
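As a supplement to Step 2 of the new docs: registering the runner follows GitHub's standard self-hosted runner flow. The sketch below is illustrative only; the actual runner version and registration token are shown on the repository's "New self-hosted runner" page linked above, and the name and labels simply mirror the examples from the Notes.

```bash
# Illustrative sketch of Step 2, run inside the container started in Step 1.
# <VERSION> and <REGISTRATION_TOKEN> are placeholders; the real values are shown at
# https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  "https://github.com/actions/runner/releases/download/v<VERSION>/actions-runner-linux-x64-<VERSION>.tar.gz"
tar xzf actions-runner-linux-x64.tar.gz
export RUNNER_ALLOW_RUNASROOT=1   # needed because the container runs as root
./config.sh --url https://github.com/sgl-project/sglang \
            --token <REGISTRATION_TOKEN> \
            --name test-sgl-gpu-0 \
            --labels unit-test
```

After configuration, Step 3 in the docs (environment variables plus the `while true; do ./run.sh; ...` loop) keeps the runner online.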