Unverified commit 61bb223e, authored Aug 25, 2024 by Lianmin Zheng, committed by GitHub Aug 25, 2024

Update CI runner docs (#1213)
Parent: 15f1a49d

Showing 2 changed files with 30 additions and 75 deletions (+30, -75)
.github/workflows/moe-test.yml (+2, -2)
docs/en/setup_github_runner.md (+28, -73)
.github/workflows/moe-test.yml
View file @ 61bb223e
@@ -33,13 +33,13 @@ jobs:
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
 
       - name: Benchmark MoE Serving Throughput
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
 
       - name: Benchmark MoE Serving Throughput (w/o RadixAttention)
-        timeout_minutes: 10
+        timeout-minutes: 10
         run: |
           cd test/srt
           python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
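The two benchmark steps above can also be reproduced outside CI. A minimal sketch, assuming a CUDA machine with sglang installed from this repository (for example `pip install -e "python[all]"` from the repo root) and access to the model weights the tests download; the flashinfer index URL is copied from the workflow:

```bash
# Reproduce the MoE benchmark steps from the workflow on a local GPU machine.
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall

cd test/srt
# Default MoE serving throughput benchmark
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
# Same benchmark with RadixAttention disabled
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
```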
docs/en/setup_github_runner.md
View file @ 61bb223e
-# Set up self hosted runner for GitHub Action
-
-## Config Runner
-
-```bash
-# https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux
-# Involves some TOKEN and other private information, click the link to view specific steps.
-```
-
-## Start Runner
-
-add `/lib/systemd/system/e2e.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=7"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v1"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v1/actions-runner/run.sh
-
-[Install]
-WantedBy=multi-user.target
-```
-
-add `/lib/systemd/system/unit.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=6"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v2"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v2/actions-runner/run.sh
-
-[Install]
-WantedBy=multi-user.target
-```
-
-add `/lib/systemd/system/accuracy.service`
-```
-[Unit]
-StartLimitIntervalSec=0
-
-[Service]
-Environment="CUDA_VISIBLE_DEVICES=5"
-Environment="XDG_CACHE_HOME=/data/.cache"
-Environment="HF_TOKEN=hf_xx"
-Environment="OPENAI_API_KEY=sk-xx"
-Environment="HOME=/data/zhyncs/runner-v3"
-Environment="SGLANG_IS_IN_CI=true"
-Restart=always
-RestartSec=1
-ExecStart=/data/zhyncs/runner-v3/actions-runner/run.sh
-
-[Install]
-WantedBy=multi-user.target
-```
-
-```bash
-cd /data/zhyncs/runner-v1
-python3 -m venv venv
-cd /data/zhyncs/runner-v2
-python3 -m venv venv
-cd /data/zhyncs/runner-v3
-python3 -m venv venv
-sudo systemctl daemon-reload
-sudo systemctl start e2e
-sudo systemctl enable e2e
-sudo systemctl status e2e
-sudo systemctl start unit
-sudo systemctl enable unit
-sudo systemctl status unit
-sudo systemctl start accuracy
-sudo systemctl enable accuracy
-sudo systemctl status accuracy
-```
+# Set Up Self-hosted Runners for GitHub Action
+
+## Add a Runner
+
+### Step 1: Start a docker container.
+
+You can mount a folder for the shared huggingface model weights cache. The command below uses `/tmp/huggingface` as an example.
+
+```
+docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
+docker run --shm-size 64g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash
+```
+
+### Step 2: Configure the runner by `config.sh`
+
+Run these commands inside the container.
+
+```
+apt update && apt install -y curl python3-pip git
+export RUNNER_ALLOW_RUNASROOT=1
+```
+
+Then follow https://github.com/sgl-project/sglang/settings/actions/runners/new?arch=x64&os=linux to run `config.sh`
+
+**Notes**
+- Do not need to specify the runner group
+- Give it a name (e.g., `test-sgl-gpu-0`) and some labels (e.g., `unit-test`). The labels can be editted later in Github Settings.
+- Do not need to change the work folder.
+
+### Step 3: Run the runner by `run.sh`
+
+- Set up environment variables
+```
+export HF_HOME=/hf_home
+export SGLANG_IS_IN_CI=true
+export HF_TOKEN=hf_xxx
+export OPENAI_API_KEY=sk-xxx
+export CUDA_VISIBLE_DEVICES=0
+```
+
+- Run it forever
+```
+while true; do ./run.sh; echo "Restarting..."; sleep 2; done
+```
\ No newline at end of file
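Steps 2 and 3 of the new instructions can be collected into a single launch script. A minimal sketch, assuming the container from Step 1, that the GitHub Actions runner package has already been downloaded and unpacked into `~/actions-runner` (the download commands sit behind the settings link above), and placeholder tokens throughout; `--url`, `--token`, `--name`, `--labels`, and `--unattended` are standard `config.sh` options, and the runner name and labels follow the examples in the Notes:

```bash
#!/bin/bash
# Hypothetical helper that registers and runs one self-hosted runner inside
# the container from Step 1. Real tokens come from the GitHub settings page.

apt update && apt install -y curl python3-pip git
export RUNNER_ALLOW_RUNASROOT=1   # the container runs as root

cd ~/actions-runner

# One-time registration; config.sh writes a .runner file once it has run.
if [ ! -f .runner ]; then
    ./config.sh --url https://github.com/sgl-project/sglang \
                --token "<REGISTRATION_TOKEN>" \
                --name test-sgl-gpu-0 \
                --labels unit-test \
                --unattended
fi

# Environment for the CI jobs (Step 3).
export HF_HOME=/hf_home
export SGLANG_IS_IN_CI=true
export HF_TOKEN=hf_xxx
export OPENAI_API_KEY=sk-xxx
export CUDA_VISIBLE_DEVICES=0

# Keep the runner alive across crashes ("Run it forever").
while true; do ./run.sh; echo "Restarting..."; sleep 2; done
```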