Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
change
sglang
Commits
ef48d554
Unverified
Commit
ef48d554
authored
Aug 09, 2025
by
Lianmin Zheng
Committed by
GitHub
Aug 09, 2025
Browse files
Fix CI (#9013)
parent
a886564a
Changes
9
Show whitespace changes
Inline
Side-by-side
Showing
9 changed files
with
120 additions
and
75 deletions
+120
-75
.github/workflows/cancel-all-pending-pr-test-runs.yml
.github/workflows/cancel-all-pending-pr-test-runs.yml
+25
-11
.github/workflows/execute-notebook.yml
.github/workflows/execute-notebook.yml
+1
-1
.github/workflows/pr-test-xeon.yml
.github/workflows/pr-test-xeon.yml
+1
-1
.github/workflows/pr-test.yml
.github/workflows/pr-test.yml
+7
-3
python/sglang/srt/layers/moe/topk.py
python/sglang/srt/layers/moe/topk.py
+2
-0
test/srt/run_suite.py
test/srt/run_suite.py
+64
-48
test/srt/test_bench_serving.py
test/srt/test_bench_serving.py
+1
-1
test/srt/test_gpt_oss_1gpu.py
test/srt/test_gpt_oss_1gpu.py
+6
-6
test/srt/test_gpt_oss_common.py
test/srt/test_gpt_oss_common.py
+13
-4
No files found.
.github/workflows/cancel-all-pending-pr-test-runs.yml
View file @
ef48d554
...
@@ -2,6 +2,12 @@ name: Cancel All Pending PR Test Runs
...
@@ -2,6 +2,12 @@ name: Cancel All Pending PR Test Runs
on
:
on
:
workflow_dispatch
:
workflow_dispatch
:
inputs
:
workflows
:
description
:
'
Space-separated
list
of
workflow
filenames
to
cancel'
required
:
true
type
:
string
default
:
'
pr-test.yml
pr-test-xeon.yml'
permissions
:
permissions
:
actions
:
write
# Needed to cancel runs
actions
:
write
# Needed to cancel runs
...
@@ -14,18 +20,26 @@ jobs:
...
@@ -14,18 +20,26 @@ jobs:
-
name
:
Install GitHub CLI
-
name
:
Install GitHub CLI
run
:
sudo apt-get install -y gh jq
run
:
sudo apt-get install -y gh jq
-
name
:
Cancel all pending/waiting
pr-test.yml run
s
-
name
:
Cancel all pending/waiting
runs for specified workflow
s
env
:
env
:
GH_TOKEN
:
${{ secrets.GITHUB_TOKEN }}
GH_TOKEN
:
${{ secrets.GITHUB_TOKEN }}
REPO
:
${{ github.repository }}
REPO
:
${{ github.repository }}
run
:
|
run
:
|
# Read the space-separated string from the input into a bash array
WORKFLOW_FILES=(${{ github.event.inputs.workflows }})
echo "Targeting ${#WORKFLOW_FILES[@]} workflow(s): ${{ github.event.inputs.workflows }}"
for workflow_file in "${WORKFLOW_FILES[@]}"; do
echo "--- Checking workflow: $workflow_file ---"
gh run list \
gh run list \
--repo "$REPO" \
--repo "$REPO" \
--workflow
pr-test.yml
\
--workflow
"$workflow_file"
\
--json databaseId,status \
--json databaseId,status \
--limit 1000 \
--limit 1000 \
| jq -r '.[] | select(.status=="queued" or .status=="in_progress") | .databaseId' \
| jq -r '.[] | select(.status=="queued" or .status=="in_progress") | .databaseId' \
| while read run_id; do
| while read run_id; do
echo "Cancelling run ID: $run_id"
echo "Cancelling run ID: $run_id
for workflow: $workflow_file
"
gh run cancel "$run_id" --repo "$REPO"
gh run cancel "$run_id" --repo "$REPO"
done
done
done
.github/workflows/execute-notebook.yml
View file @
ef48d554
...
@@ -41,7 +41,7 @@ jobs:
...
@@ -41,7 +41,7 @@ jobs:
make compile
make compile
finish
:
notebook-
finish
:
needs
:
[
needs
:
[
run-all-notebooks
run-all-notebooks
]
]
...
...
.github/workflows/pr-test-xeon.yml
View file @
ef48d554
...
@@ -27,7 +27,7 @@ jobs:
...
@@ -27,7 +27,7 @@ jobs:
build-test
:
build-test
:
if
:
(github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') &&
if
:
(github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') &&
github.event.pull_request.draft ==
false
github.event.pull_request.draft ==
false
runs-on
:
sglang-gnr
runs-on
:
xeon-pvc
strategy
:
strategy
:
matrix
:
matrix
:
build_type
:
[
'
all'
]
build_type
:
[
'
all'
]
...
...
.github/workflows/pr-test.yml
View file @
ef48d554
...
@@ -85,7 +85,7 @@ jobs:
...
@@ -85,7 +85,7 @@ jobs:
python3 run_suite.py --suite per-commit --auto-partition-id ${{ matrix.part }} --auto-partition-size 10
python3 run_suite.py --suite per-commit --auto-partition-id ${{ matrix.part }} --auto-partition-size 10
unit-test-backend-2-gpu
:
unit-test-backend-2-gpu
:
needs
:
[
check-changes
,
unit-test-frontend
]
needs
:
[
check-changes
]
if
:
(github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') &&
if
:
(github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') &&
github.event.pull_request.draft ==
false
&&
github.event.pull_request.draft ==
false
&&
needs.check-changes.outputs.src == 'true'
needs.check-changes.outputs.src == 'true'
...
@@ -110,6 +110,10 @@ jobs:
...
@@ -110,6 +110,10 @@ jobs:
github.event.pull_request.draft ==
false
&&
github.event.pull_request.draft ==
false
&&
needs.check-changes.outputs.src == 'true'
needs.check-changes.outputs.src == 'true'
runs-on
:
4-gpu-runner
runs-on
:
4-gpu-runner
strategy
:
fail-fast
:
false
matrix
:
part
:
[
0
,
1
]
steps
:
steps
:
-
name
:
Checkout code
-
name
:
Checkout code
uses
:
actions/checkout@v4
uses
:
actions/checkout@v4
...
@@ -119,10 +123,10 @@ jobs:
...
@@ -119,10 +123,10 @@ jobs:
bash scripts/ci_install_dependency.sh
bash scripts/ci_install_dependency.sh
-
name
:
Run test
-
name
:
Run test
timeout-minutes
:
3
0
timeout-minutes
:
2
0
run
:
|
run
:
|
cd test/srt
cd test/srt
python3 run_suite.py --suite per-commit-4-gpu
python3 run_suite.py --suite per-commit-4-gpu
--auto-partition-id ${{ matrix.part }} --auto-partition-size 2
unit-test-backend-8-gpu
:
unit-test-backend-8-gpu
:
needs
:
[
check-changes
,
unit-test-backend-2-gpu
]
needs
:
[
check-changes
,
unit-test-backend-2-gpu
]
...
...
python/sglang/srt/layers/moe/topk.py
View file @
ef48d554
...
@@ -449,7 +449,9 @@ def grouped_topk_cpu(
...
@@ -449,7 +449,9 @@ def grouped_topk_cpu(
routed_scaling_factor
:
Optional
[
float
]
=
None
,
routed_scaling_factor
:
Optional
[
float
]
=
None
,
num_token_non_padded
:
Optional
[
torch
.
Tensor
]
=
None
,
num_token_non_padded
:
Optional
[
torch
.
Tensor
]
=
None
,
expert_location_dispatch_info
:
Optional
[
ExpertLocationDispatchInfo
]
=
None
,
expert_location_dispatch_info
:
Optional
[
ExpertLocationDispatchInfo
]
=
None
,
apply_routed_scaling_factor_on_output
:
Optional
[
bool
]
=
False
,
):
):
assert
not
apply_routed_scaling_factor_on_output
assert
expert_location_dispatch_info
is
None
assert
expert_location_dispatch_info
is
None
return
torch
.
ops
.
sgl_kernel
.
grouped_topk_cpu
(
return
torch
.
ops
.
sgl_kernel
.
grouped_topk_cpu
(
hidden_states
,
hidden_states
,
...
...
test/srt/run_suite.py
View file @
ef48d554
...
@@ -111,6 +111,50 @@ suites = {
...
@@ -111,6 +111,50 @@ suites = {
TestFile
(
"test_reasoning_parser.py"
,
5
),
TestFile
(
"test_reasoning_parser.py"
,
5
),
TestFile
(
"test_hybrid_attn_backend.py"
,
100
),
TestFile
(
"test_hybrid_attn_backend.py"
,
100
),
],
],
"per-commit-2-gpu"
:
[
TestFile
(
"models/lora/test_lora_tp.py"
,
116
),
TestFile
(
"test_data_parallelism.py"
,
73
),
TestFile
(
"test_dp_attention.py"
,
277
),
TestFile
(
"test_patch_torch.py"
,
19
),
TestFile
(
"test_update_weights_from_distributed.py"
,
103
),
TestFile
(
"test_release_memory_occupation.py"
,
127
),
],
"per-commit-4-gpu"
:
[
TestFile
(
"test_gpt_oss_4gpu.py"
,
600
),
TestFile
(
"test_local_attn.py"
,
250
),
TestFile
(
"test_pp_single_node.py"
,
372
),
TestFile
(
"test_multi_instance_release_memory_occupation.py"
,
64
),
],
"per-commit-8-gpu"
:
[
# Disabled because it hangs on the CI.
# TestFile("test_moe_ep.py", 181),
TestFile
(
"test_disaggregation.py"
,
499
),
TestFile
(
"test_disaggregation_different_tp.py"
,
155
),
TestFile
(
"test_full_deepseek_v3.py"
,
333
),
],
"per-commit-8-gpu-b200"
:
[
# add more here
],
"per-commit-4-gpu-deepep"
:
[
TestFile
(
"test_deepep_small.py"
,
531
),
],
"per-commit-8-gpu-deepep"
:
[
TestFile
(
"test_deepep_large.py"
,
338
),
],
"nightly"
:
[
TestFile
(
"test_nightly_gsm8k_eval.py"
),
],
"vllm_dependency_test"
:
[
TestFile
(
"test_awq.py"
,
163
),
TestFile
(
"test_bnb.py"
,
5
),
TestFile
(
"test_gguf.py"
,
96
),
TestFile
(
"test_gptqmodel_dynamic.py"
,
102
),
TestFile
(
"test_vllm_dependency.py"
,
185
),
],
}
# Add AMD tests
suite_amd
=
{
"per-commit-amd"
:
[
"per-commit-amd"
:
[
TestFile
(
"models/lora/test_lora_backend.py"
,
99
),
TestFile
(
"models/lora/test_lora_backend.py"
,
99
),
TestFile
(
"models/lora/test_multi_lora_backend.py"
,
60
),
TestFile
(
"models/lora/test_multi_lora_backend.py"
,
60
),
...
@@ -153,57 +197,25 @@ suites = {
...
@@ -153,57 +197,25 @@ suites = {
TestFile
(
"test_rope_rocm.py"
,
3
),
TestFile
(
"test_rope_rocm.py"
,
3
),
TestFile
(
"test_awq_dequant.py"
,
2
),
TestFile
(
"test_awq_dequant.py"
,
2
),
],
],
"per-commit-1-ascend-npu"
:
[
TestFile
(
"test_ascend_tp1_bf16.py"
,
400
),
],
"per-commit-2-ascend-npu"
:
[
TestFile
(
"test_ascend_tp2_bf16.py"
,
400
),
],
"per-commit-4-ascend-npu"
:
[
TestFile
(
"test_ascend_mla_w8a8int8.py"
,
400
),
],
"per-commit-2-gpu"
:
[
TestFile
(
"models/lora/test_lora_tp.py"
,
116
),
TestFile
(
"test_data_parallelism.py"
,
73
),
TestFile
(
"test_dp_attention.py"
,
277
),
TestFile
(
"test_patch_torch.py"
,
19
),
TestFile
(
"test_update_weights_from_distributed.py"
,
103
),
TestFile
(
"test_release_memory_occupation.py"
,
127
),
],
"per-commit-2-gpu-amd"
:
[
"per-commit-2-gpu-amd"
:
[
TestFile
(
"models/lora/test_lora_tp.py"
,
116
),
TestFile
(
"models/lora/test_lora_tp.py"
,
116
),
TestFile
(
"test_data_parallelism.py"
,
73
),
TestFile
(
"test_data_parallelism.py"
,
73
),
TestFile
(
"test_patch_torch.py"
,
19
),
TestFile
(
"test_patch_torch.py"
,
19
),
TestFile
(
"test_update_weights_from_distributed.py"
,
103
),
TestFile
(
"test_update_weights_from_distributed.py"
,
103
),
],
],
"per-commit-4-gpu"
:
[
TestFile
(
"test_gpt_oss_4gpu.py"
,
600
),
TestFile
(
"test_local_attn.py"
,
250
),
TestFile
(
"test_pp_single_node.py"
,
372
),
TestFile
(
"test_multi_instance_release_memory_occupation.py"
,
64
),
],
"per-commit-4-gpu-deepep"
:
[
TestFile
(
"test_deepep_small.py"
,
531
),
],
"per-commit-4-gpu-amd"
:
[
"per-commit-4-gpu-amd"
:
[
TestFile
(
"test_pp_single_node.py"
,
150
),
TestFile
(
"test_pp_single_node.py"
,
150
),
],
],
"per-commit-8-gpu"
:
[
# Disabled because it hangs on the CI.
# TestFile("test_moe_ep.py", 181),
TestFile
(
"test_disaggregation.py"
,
499
),
TestFile
(
"test_disaggregation_different_tp.py"
,
155
),
TestFile
(
"test_full_deepseek_v3.py"
,
333
),
],
"per-commit-8-gpu-deepep"
:
[
TestFile
(
"test_deepep_large.py"
,
338
),
],
"per-commit-8-gpu-amd"
:
[
"per-commit-8-gpu-amd"
:
[
TestFile
(
"test_full_deepseek_v3.py"
,
250
),
TestFile
(
"test_full_deepseek_v3.py"
,
250
),
],
],
"
per-commit-8-gpu-b200
"
:
[
"
nightly-amd
"
:
[
# add more here
TestFile
(
"test_nightly_gsm8k_eval_amd.py"
),
],
],
}
# Add Intel Xeon tests
suite_xeon
=
{
"per-commit-cpu"
:
[
"per-commit-cpu"
:
[
TestFile
(
"cpu/test_activation.py"
),
TestFile
(
"cpu/test_activation.py"
),
TestFile
(
"cpu/test_binding.py"
),
TestFile
(
"cpu/test_binding.py"
),
...
@@ -219,21 +231,25 @@ suites = {
...
@@ -219,21 +231,25 @@ suites = {
TestFile
(
"cpu/test_topk.py"
),
TestFile
(
"cpu/test_topk.py"
),
TestFile
(
"test_intel_amx_attention_backend.py"
),
TestFile
(
"test_intel_amx_attention_backend.py"
),
],
],
"nightly"
:
[
}
TestFile
(
"test_nightly_gsm8k_eval.py"
),
# Add Ascend NPU tests
suite_ascend
=
{
"per-commit-1-ascend-npu"
:
[
TestFile
(
"test_ascend_tp1_bf16.py"
,
400
),
],
],
"
nightly-amd
"
:
[
"
per-commit-2-ascend-npu
"
:
[
TestFile
(
"test_
nightly_gsm8k_eval_amd.py"
),
TestFile
(
"test_
ascend_tp2_bf16.py"
,
400
),
],
],
"vllm_dependency_test"
:
[
"per-commit-4-ascend-npu"
:
[
TestFile
(
"test_awq.py"
,
163
),
TestFile
(
"test_ascend_mla_w8a8int8.py"
,
400
),
TestFile
(
"test_bnb.py"
,
5
),
TestFile
(
"test_gguf.py"
,
96
),
TestFile
(
"test_gptqmodel_dynamic.py"
,
102
),
TestFile
(
"test_vllm_dependency.py"
,
185
),
],
],
}
}
suites
.
update
(
suite_amd
)
suites
.
update
(
suite_xeon
)
suites
.
update
(
suite_ascend
)
def
auto_partition
(
files
,
rank
,
size
):
def
auto_partition
(
files
,
rank
,
size
):
"""
"""
...
...
test/srt/test_bench_serving.py
View file @
ef48d554
...
@@ -56,7 +56,7 @@ class TestBenchServing(CustomTestCase):
...
@@ -56,7 +56,7 @@ class TestBenchServing(CustomTestCase):
f
"### test_offline_throughput_non_stream_small_batch_size
\n
"
f
"### test_offline_throughput_non_stream_small_batch_size
\n
"
f
"Output throughput:
{
res
[
'output_throughput'
]:.
2
f
}
token/s
\n
"
f
"Output throughput:
{
res
[
'output_throughput'
]:.
2
f
}
token/s
\n
"
)
)
self
.
assertGreater
(
res
[
"output_throughput"
],
105
0
)
self
.
assertGreater
(
res
[
"output_throughput"
],
10
4
5
)
def
test_offline_throughput_without_radix_cache
(
self
):
def
test_offline_throughput_without_radix_cache
(
self
):
res
=
run_bench_serving
(
res
=
run_bench_serving
(
...
...
test/srt/test_gpt_oss_1gpu.py
View file @
ef48d554
...
@@ -9,9 +9,9 @@ class TestGptOss1Gpu(BaseTestGptOss):
...
@@ -9,9 +9,9 @@ class TestGptOss1Gpu(BaseTestGptOss):
model_variant
=
"20b"
,
model_variant
=
"20b"
,
quantization
=
"mxfp4"
,
quantization
=
"mxfp4"
,
expected_score_of_reasoning_effort
=
{
expected_score_of_reasoning_effort
=
{
"low"
:
0.3
8
,
"low"
:
0.3
4
,
"medium"
:
0.3
8
,
"medium"
:
0.3
4
,
"high"
:
0.2
9
,
# TODO investigate
"high"
:
0.2
7
,
# TODO investigate
},
},
)
)
...
@@ -20,9 +20,9 @@ class TestGptOss1Gpu(BaseTestGptOss):
...
@@ -20,9 +20,9 @@ class TestGptOss1Gpu(BaseTestGptOss):
model_variant
=
"20b"
,
model_variant
=
"20b"
,
quantization
=
"bf16"
,
quantization
=
"bf16"
,
expected_score_of_reasoning_effort
=
{
expected_score_of_reasoning_effort
=
{
"low"
:
0.3
8
,
"low"
:
0.3
4
,
"medium"
:
0.3
8
,
"medium"
:
0.3
4
,
"high"
:
0.2
9
,
# TODO investigate
"high"
:
0.2
7
,
# TODO investigate
},
},
)
)
...
...
test/srt/test_gpt_oss_common.py
View file @
ef48d554
...
@@ -8,7 +8,9 @@ from sglang.test.test_utils import (
...
@@ -8,7 +8,9 @@ from sglang.test.test_utils import (
DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH
,
DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH
,
DEFAULT_URL_FOR_TEST
,
DEFAULT_URL_FOR_TEST
,
CustomTestCase
,
CustomTestCase
,
is_in_ci
,
popen_launch_server
,
popen_launch_server
,
write_github_step_summary
,
)
)
_base_url
=
DEFAULT_URL_FOR_TEST
_base_url
=
DEFAULT_URL_FOR_TEST
...
@@ -91,9 +93,16 @@ class BaseTestGptOss(CustomTestCase):
...
@@ -91,9 +93,16 @@ class BaseTestGptOss(CustomTestCase):
reasoning_effort
=
reasoning_effort
,
reasoning_effort
=
reasoning_effort
,
)
)
print
(
f
"Evaluation start:
{
model
=
}
{
reasoning_effort
=
}
{
expected_score
=
}
"
)
setup
=
f
"model=
{
model
}
reasoning_effort=
{
reasoning_effort
}
expected_score=
{
expected_score
}
"
print
(
f
"Evaluation start:
{
setup
}
"
)
metrics
=
run_eval
(
args
)
metrics
=
run_eval
(
args
)
print
(
print
(
f
"Evaluation end:
{
setup
}
{
metrics
=
}
"
)
f
"Evaluation end:
{
model
=
}
{
reasoning_effort
=
}
{
expected_score
=
}
{
metrics
=
}
"
)
self
.
assertGreaterEqual
(
metrics
[
"score"
],
expected_score
)
self
.
assertGreaterEqual
(
metrics
[
"score"
],
expected_score
)
if
is_in_ci
():
write_github_step_summary
(
f
"### test_gpt_oss_common
\n
"
f
"Setup:
{
setup
}
\n
"
f
"Score:
{
metrics
[
'score'
]:.
2
f
}
\n
"
)
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment