Commit cdc56ef6 in sglang (unverified)

feat: use sgl-kernel cu129 as default (#10188)

Authored by Yineng Zhang on Sep 08, 2025; committed by GitHub on Sep 08, 2025.
Parent: 16ff3d4b

Changes: 4 changed files with 19 additions and 15 deletions (+19 -15)

- .github/workflows/pr-test-sgl-kernel.yml: +3 -3
- .github/workflows/release-whl-kernel.yml: +8 -8
- sgl-kernel/rename_wheels.sh: +2 -2
- sgl-kernel/tests/test_cutlass_w4a8_moe_mm.py: +6 -2

.github/workflows/pr-test-sgl-kernel.yml

```diff
@@ -58,7 +58,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
 
       - name: Build wheel for Python ${{ matrix.python-version }} and CUDA ${{ matrix.cuda-version }}
-        if: github.event_name != 'push' || (matrix.cuda-version != '11.8' && matrix.cuda-version != '12.9')
+        if: github.event_name != 'push' || (matrix.cuda-version != '12.4' && matrix.cuda-version != '12.8')
         run: |
           cd sgl-kernel
           chmod +x ./build.sh
```
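
On `push` events the build step runs only for matrix entries whose CUDA version is not in the excluded pair; every other event always runs it. A minimal Python sketch of that boolean expression follows. The `MATRIX` list is hypothetical, because the workflow's actual `cuda-version` matrix lies outside this hunk.

```python
# Mirror of the step's `if:` expression for a single matrix entry.
def builds_wheel(event_name: str, cuda_version: str, skipped_on_push: tuple[str, str]) -> bool:
    return event_name != "push" or (
        cuda_version != skipped_on_push[0] and cuda_version != skipped_on_push[1]
    )

OLD_SKIP = ("11.8", "12.9")  # excluded on push before this commit
NEW_SKIP = ("12.4", "12.8")  # excluded on push after this commit

# Hypothetical matrix values; the real list is not shown in the diff.
MATRIX = ["11.8", "12.4", "12.8", "12.9"]

for event in ("push", "pull_request"):
    before = [v for v in MATRIX if builds_wheel(event, v, OLD_SKIP)]
    after = [v for v in MATRIX if builds_wheel(event, v, NEW_SKIP)]
    print(f"{event}: before={before} after={after}")
```

Under that assumption, pushes now build the 12.9 wheel (and 11.8, if it is in the matrix) instead of 12.4 and 12.8, which lines up with the test jobs switching their download pattern to `cuda12.9`.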

```diff
@@ -82,7 +82,7 @@ jobs:
         with:
           path: sgl-kernel/dist/
           merge-multiple: true
-          pattern: wheel-python3.10-cuda12.4
+          pattern: wheel-python3.10-cuda12.9
 
       - name: Install
         run: |
```

```diff
@@ -114,7 +114,7 @@ jobs:
         with:
           path: sgl-kernel/dist/
           merge-multiple: true
-          pattern: wheel-python3.10-cuda12.4
+          pattern: wheel-python3.10-cuda12.9
 
       - name: Install
         run: |
```
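
Both test jobs select the wheel to install by artifact name. The sketch below assumes the build job uploads artifacts named `wheel-python<py>-cuda<cuda>`, the template used by the upload step in release-whl-kernel.yml; the list of uploaded artifacts is hypothetical. `fnmatchcase` stands in for the glob matching of the download step and is only an approximation of it.

```python
from fnmatch import fnmatchcase

def artifact_name(python_version: str, cuda_version: str) -> str:
    # Assumed naming: wheel-python${{ matrix.python-version }}-cuda${{ matrix.cuda-version }}
    return f"wheel-python{python_version}-cuda{cuda_version}"

# Hypothetical set of artifacts produced by the build matrix.
uploaded = [artifact_name("3.10", v) for v in ("12.4", "12.8", "12.9")]

OLD_PATTERN = "wheel-python3.10-cuda12.4"
NEW_PATTERN = "wheel-python3.10-cuda12.9"

def selected(pattern: str) -> list[str]:
    # The patterns contain no wildcards, so each one selects exactly one artifact by name.
    return [a for a in uploaded if fnmatchcase(a, pattern)]

print(selected(OLD_PATTERN))
print(selected(NEW_PATTERN))
```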

.github/workflows/release-whl-kernel.yml

```diff
@@ -17,13 +17,13 @@ concurrency:
   cancel-in-progress: true
 
 jobs:
-  build-cu124:
+  build-cu129:
     if: github.repository == 'sgl-project/sglang'
     runs-on: sgl-kernel-release-node
     strategy:
       matrix:
         python-version: ["3.10"]
-        cuda-version: ["12.4"]
+        cuda-version: ["12.9"]
     steps:
       - uses: actions/checkout@v4
         with:
```

```diff
@@ -46,14 +46,14 @@ jobs:
           pip install twine
           python3 -m twine upload dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }}
 
-  build-cu129:
+  build-cu124:
     if: github.repository == 'sgl-project/sglang'
-    needs: build-cu124
+    needs: build-cu129
     runs-on: sgl-kernel-release-node
     strategy:
       matrix:
         python-version: ["3.10"]
-        cuda-version: ["12.9"]
+        cuda-version: ["12.4"]
     steps:
       - uses: actions/checkout@v4
         with:
```

```diff
@@ -76,8 +76,8 @@ jobs:
           name: wheel-python${{ matrix.python-version }}-cuda${{ matrix.cuda-version }}
           path: sgl-kernel/dist/*
 
-  release-cu129:
-    needs: build-cu129
+  release-cu124:
+    needs: build-cu124
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
```
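
These hunks swap the roles of the two CUDA builds. The first job, whose steps end with the `twine upload` to PyPI, now builds CUDA 12.9; the CUDA 12.4 build now depends on it and feeds the separate `release-cu124` job. The sketch below derives the run order from the `needs:` edges with Python's `graphlib.TopologicalSorter` (Python 3.9+). It covers only the jobs visible in these hunks; any other jobs in the file are omitted.

```python
from graphlib import TopologicalSorter

# job -> jobs listed under `needs:`, as visible in the hunks before this commit
NEEDS_BEFORE = {
    "build-cu124": [],
    "build-cu129": ["build-cu124"],
    "release-cu129": ["build-cu129"],
}

# the same edges after this commit
NEEDS_AFTER = {
    "build-cu129": [],
    "build-cu124": ["build-cu129"],
    "release-cu124": ["build-cu124"],
}

def run_order(needs: dict[str, list[str]]) -> list[str]:
    # TopologicalSorter expects node -> predecessors, which matches `needs:` semantics.
    return list(TopologicalSorter(needs).static_order())

print(run_order(NEEDS_BEFORE))  # ['build-cu124', 'build-cu129', 'release-cu129']
print(run_order(NEEDS_AFTER))   # ['build-cu129', 'build-cu124', 'release-cu124']
```

Because each job has at most one dependency here, the graph is a simple chain and the order is fully determined.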

```diff
@@ -114,7 +114,7 @@ jobs:
           WHL_TOKEN: ${{ secrets.WHL_TOKEN }}
 
       - name: Update wheel index
-        run: python3 scripts/update_kernel_whl_index.py --cuda 129
+        run: python3 scripts/update_kernel_whl_index.py --cuda 124
 
       - name: Push wheel index
         run: |
```

sgl-kernel/rename_wheels.sh

```diff
@@ -16,8 +16,8 @@ for wheel in "${wheel_files[@]}"; do
     fi
 
     # Detect CUDA version and add appropriate suffix
-    if ls /usr/local/ | grep -q "12.9"; then
-        new_wheel="${intermediate_wheel/-cp${cp_version}/+cu129-cp${cp_version}}"
+    if ls /usr/local/ | grep -q "12.4"; then
+        new_wheel="${intermediate_wheel/-cp${cp_version}/+cu124-cp${cp_version}}"
     elif ls /usr/local/ | grep -q "12.8"; then
         new_wheel="${intermediate_wheel/-cp${cp_version}/+cu128-cp${cp_version}}"
     else
```
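
The script picks a local-version suffix from the CUDA directories under `/usr/local/` and splices it into the wheel filename with bash pattern substitution. A minimal Python sketch of the post-commit logic follows. The directory listing, wheel filename, and `cp_version` value are hypothetical, and the `else` branch is elided in the diff, so the sketch returns `None` there as a placeholder.

```python
import re
from typing import List, Optional

def cuda_suffix(usr_local_entries: List[str]) -> Optional[str]:
    # Mimics `ls /usr/local/ | grep -q "<ver>"`: grep treats the pattern as a regex,
    # so "." matches any single character, as re.search does here (newlines excluded).
    listing = "\n".join(usr_local_entries)
    if re.search("12.4", listing):
        return "cu124"
    if re.search("12.8", listing):
        return "cu128"
    return None  # placeholder: the real else branch is not shown in the diff

def rename(intermediate_wheel: str, cp_version: str, suffix: str) -> str:
    # bash ${var/pattern/repl} replaces only the first match, like str.replace(..., 1).
    return intermediate_wheel.replace(f"-cp{cp_version}", f"+{suffix}-cp{cp_version}", 1)

# Hypothetical inputs for illustration only.
suffix = cuda_suffix(["bin", "cuda", "cuda-12.4", "lib"])
print(suffix)  # cu124
print(rename("sgl_kernel-0.3.9-cp310-abi3-manylinux2014_x86_64.whl", "310", suffix))
# sgl_kernel-0.3.9+cu124-cp310-abi3-manylinux2014_x86_64.whl
```

A host with only CUDA 12.9 installed matches neither pattern and falls through to the `else` branch, which is consistent with cu129 no longer being a suffixed variant.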

sgl-kernel/tests/test_cutlass_w4a8_moe_mm.py

```diff
@@ -138,9 +138,13 @@ def test_int4_fp8_grouped_gemm_single_expert(batch_size):
         raise
 
 
+# @pytest.mark.skipif(
+#     not is_hopper(),
+#     reason="cutlass_w4a8_moe_mm is only supported on sm90",
+# )
 @pytest.mark.skipif(
-    not is_hopper(),
-    reason="cutlass_w4a8_moe_mm is only supported on sm90",
+    True,
+    reason="TODO(rainj-me): fix cu129 binary issue on hopper cu126",
 )
 @pytest.mark.parametrize("batch_size", [2, 4, 8, 16])
 @pytest.mark.parametrize("k", [256, 512, 1024])
```
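
Stacked `@pytest.mark.parametrize` decorators generate every combination of their values, and `skipif(True, ...)` applies to all of them. The stdlib sketch below counts the cases from the two decorators visible in the hunk (further decorators below the hunk, if any, would multiply the count) and models the skip condition before and after the commit. The `skipped` helper is hypothetical and only restates the two conditions.

```python
from itertools import product

# Values from the two parametrize decorators shown in the hunk.
BATCH_SIZES = [2, 4, 8, 16]
KS = [256, 512, 1024]

# Stacked parametrize decorators yield the Cartesian product of their values.
cases = list(product(BATCH_SIZES, KS))
print(len(cases))  # 12

def skipped(on_hopper: bool, after_commit: bool) -> bool:
    # Before: skip unless running on Hopper (sm90). After: skip unconditionally.
    return True if after_commit else not on_hopper

skipped_on_hopper_after = sum(skipped(on_hopper=True, after_commit=True) for _ in cases)
print(skipped_on_hopper_after)  # 12
```

In other words, the cases that used to run on Hopper GPUs are now all skipped until the cu129 binary issue noted in the TODO is resolved.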