sglang commit 45fdf1f7 (Unverified)
Authored Mar 27, 2025 by Yi Pan; committed by GitHub Mar 26, 2025
Parent: d89c0e4b

Fix shared memory OOM on sm86 GPUs. (#4797)

Showing 2 changed files with 4 additions and 4 deletions:
- python/sglang/srt/layers/attention/triton_ops/extend_attention.py (+2, -2)
- sgl-kernel/csrc/gemm/int8_gemm_kernel.cu (+2, -2)
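For context on the fix: the comments added in both hunks cite a per-SM shared memory budget of roughly 100 KB on sm86/sm89 versus 160 KB on sm80. Below is a minimal sketch, assuming PyTorch is available, of how the relevant compute capability can be queried at runtime and mapped to those figures; the `SMEM_PER_SM_KB` table and the `describe_device` helper are illustrative and not part of sglang.

```python
# Illustrative sketch (not from the commit): map compute capability to the
# per-SM shared memory figures cited in the diff comments.
import torch

# Approximate shared memory per SM (KB), as referenced in the commit's comments;
# other architectures are deliberately omitted.
SMEM_PER_SM_KB = {
    (8, 0): 160,  # sm80 (e.g. A100)
    (8, 6): 100,  # sm86 (e.g. RTX 30xx, A10)
    (8, 9): 100,  # sm89 (e.g. RTX 40xx, L4)
}

def describe_device(device: int = 0) -> str:
    major, minor = torch.cuda.get_device_capability(device)
    sm_version = major * 10 + minor  # same integer encoding the CUDA hunk uses (80, 86, 89, ...)
    smem = SMEM_PER_SM_KB.get((major, minor))
    if smem is None:
        return f"sm{sm_version}: not covered by this sketch's table"
    return f"sm{sm_version}: ~{smem} KB shared memory per SM"

if torch.cuda.is_available():
    print(describe_device())
```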
python/sglang/srt/layers/attention/triton_ops/extend_attention.py (+2, -2)

@@ -341,8 +341,8 @@ def extend_attention_fwd(
         else:
             BLOCK_M, BLOCK_N = (32, 64)
     elif is_cuda_available and CUDA_CAPABILITY[0] >= 8:
-        # 8.9 has a much smaller shared memory size (100K) than 8.0 (160K)
-        if CUDA_CAPABILITY[1] == 9:
+        # sm86/sm89 has a much smaller shared memory size (100K) than sm80 (160K)
+        if CUDA_CAPABILITY[1] == 9 or CUDA_CAPABILITY[1] == 6:
             if Lq <= 128:
                 BLOCK_M, BLOCK_N = (64, 128)
             elif Lq <= 256:
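Before this change, an sm86 device (e.g. an RTX 3090 or A10) satisfied `CUDA_CAPABILITY[0] >= 8` but not `CUDA_CAPABILITY[1] == 9`, so it fell through to the tile sizes chosen for sm80's larger shared memory and the Triton kernel could exceed the ~100 KB per-SM budget. A minimal sketch of how the old and patched guards evaluate, assuming `CUDA_CAPABILITY` is the (major, minor) pair from `torch.cuda.get_device_capability()` (which appears to be how sglang populates it); the loop and variable names here are illustrative only.

```python
# Illustrative only: how the guard from this hunk evaluates before and after
# the patch, for a few (major, minor) compute capabilities.
for CUDA_CAPABILITY in [(8, 0), (8, 6), (8, 9)]:
    old_small_smem_path = CUDA_CAPABILITY[1] == 9
    new_small_smem_path = CUDA_CAPABILITY[1] == 9 or CUDA_CAPABILITY[1] == 6
    print(
        f"sm{CUDA_CAPABILITY[0]}{CUDA_CAPABILITY[1]}: "
        f"old guard -> {old_small_smem_path}, patched guard -> {new_small_smem_path}"
    )
```

On sm86 the old guard is False, so the kernel launched with block sizes sized for sm80's 160 KB budget and could fail with a shared-memory out-of-resource error; the patched guard routes sm86 to the smaller (BLOCK_M, BLOCK_N) choices shown in the hunk.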
sgl-kernel/csrc/gemm/int8_gemm_kernel.cu (+2, -2)

@@ -703,8 +703,8 @@ torch::Tensor int8_scaled_mm(
       sm75_dispatch_shape<cutlass::half_t, cutlass::arch::Sm75, cutlass::gemm::GemmShape<8, 8, 16>>(
           out, mat_a, mat_b, scales_a, scales_b, bias);
   } else if (sm_version >= 80 && sm_version < 90) {
-    // sm89 has a much smaller shared memory size (100K) than sm80 (160K)
-    if (sm_version == 89) {
+    // sm86/sm89 has a much smaller shared memory size (100K) than sm80 (160K)
+    if (sm_version == 86 || sm_version == 89) {
       if (out_dtype == torch::kBFloat16) {
         sm89_dispatch_shape<cutlass::bfloat16_t, cutlass::arch::Sm80, cutlass::gemm::GemmShape<16, 8, 32>>(
             out, mat_a, mat_b, scales_a, scales_b, bias);
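The CUDA-side change mirrors the Python one: `int8_scaled_mm` keys its dispatch on an integer `sm_version`, and sm86 now shares the sm89 branch, whose CUTLASS configuration (the `GemmShape<16, 8, 32>` bf16 path visible in this hunk) was chosen for the smaller shared-memory budget. A minimal sketch of the same decision from Python, assuming `sm_version` is `major * 10 + minor` as the `>= 80` / `< 90` comparisons suggest; the helper name is hypothetical and not part of sgl-kernel.

```python
# Hypothetical helper: report which branch of the patched int8_scaled_mm
# dispatch a device would take, deriving sm_version as major * 10 + minor.
import torch

def int8_gemm_branch(device: int = 0) -> str:
    major, minor = torch.cuda.get_device_capability(device)
    sm_version = major * 10 + minor
    if 80 <= sm_version < 90:
        if sm_version in (86, 89):  # patched condition: sm86 now joins sm89
            return f"sm{sm_version}: sm89-style config (e.g. GemmShape<16, 8, 32> for bf16)"
        return f"sm{sm_version}: sm80 config"
    return f"sm{sm_version}: handled outside the branch shown in this hunk"

if torch.cuda.is_available():
    print(int8_gemm_branch())
```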