zhaoyu6 / sglang
Commit 45fdf1f7 (Unverified)
Authored Mar 27, 2025 by Yi Pan; committed by GitHub, Mar 26, 2025
Fix shared memory OOM on sm86 GPUs. (#4797)
Parent: d89c0e4b
Changes: 2 changed files with 4 additions and 4 deletions
python/sglang/srt/layers/attention/triton_ops/extend_attention.py (+2, -2)
sgl-kernel/csrc/gemm/int8_gemm_kernel.cu (+2, -2)
python/sglang/srt/layers/attention/triton_ops/extend_attention.py
...
@@ -341,8 +341,8 @@ def extend_attention_fwd(
             else:
                 BLOCK_M, BLOCK_N = (32, 64)
         elif is_cuda_available and CUDA_CAPABILITY[0] >= 8:
-            # 8.9 has a much smaller shared memory size (100K) than 8.0 (160K)
-            if CUDA_CAPABILITY[1] == 9:
+            # sm86/sm89 has a much smaller shared memory size (100K) than sm80 (160K)
+            if CUDA_CAPABILITY[1] == 9 or CUDA_CAPABILITY[1] == 6:
                 if Lq <= 128:
                     BLOCK_M, BLOCK_N = (64, 128)
                 elif Lq <= 256:
...
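The condition added in this hunk can be read as a small predicate on the device's compute capability. The sketch below is illustrative only and is not part of the commit: `has_small_shared_memory` and `SMALL_SMEM_MINORS` are hypothetical names, and the 100K/160K figures are taken from the comment in the diff rather than measured here.

```python
from typing import Tuple

# Hypothetical constant, not in sglang: minor versions within major 8 that the
# commit treats as having the smaller (100K) shared memory, i.e. sm86 and sm89.
SMALL_SMEM_MINORS = (6, 9)


def has_small_shared_memory(capability: Tuple[int, int]) -> bool:
    """Return True when the (major, minor) capability takes the small-smem branch.

    Mirrors the patched condition: inside the major-8 branch,
    CUDA_CAPABILITY[1] == 9 or CUDA_CAPABILITY[1] == 6.
    """
    major, minor = capability
    return major == 8 and minor in SMALL_SMEM_MINORS


if __name__ == "__main__":
    for cap in [(8, 0), (8, 6), (8, 9), (9, 0)]:
        print(cap, has_small_shared_memory(cap))
```

On a real machine the tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`. Before this change, `(8, 6)` fell through to the branch tuned for the larger sm80 shared memory, which is the OOM the commit title refers to.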
sgl-kernel/csrc/gemm/int8_gemm_kernel.cu
...
@@ -703,8 +703,8 @@ torch::Tensor int8_scaled_mm(
       sm75_dispatch_shape<cutlass::half_t, cutlass::arch::Sm75, cutlass::gemm::GemmShape<8, 8, 16>>(
           out, mat_a, mat_b, scales_a, scales_b, bias);
     } else if (sm_version >= 80 && sm_version < 90) {
-      // sm89 has a much smaller shared memory size (100K) than sm80 (160K)
-      if (sm_version == 89) {
+      // sm86/sm89 has a much smaller shared memory size (100K) than sm80 (160K)
+      if (sm_version == 86 || sm_version == 89) {
         if (out_dtype == torch::kBFloat16) {
           sm89_dispatch_shape<cutlass::bfloat16_t, cutlass::arch::Sm80, cutlass::gemm::GemmShape<16, 8, 32>>(
               out, mat_a, mat_b, scales_a, scales_b, bias);
...
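The kernel-side check works on a single integer `sm_version` instead of a `(major, minor)` pair. As a rough sketch of how the two representations line up, the Python below assumes the integer is encoded as `major * 10 + minor` (so `(8, 6)` becomes `86`); that encoding is an assumption, since the hunk does not show how `sm_version` is computed. Both function names are hypothetical.

```python
def sm_version_from_capability(major: int, minor: int) -> int:
    # Assumption: sm_version encodes major * 10 + minor, e.g. (8, 9) -> 89.
    return major * 10 + minor


def uses_small_smem_dispatch(sm_version: int) -> bool:
    # Same shape as the patched C++ branch: inside 80 <= sm_version < 90,
    # only 86 and 89 take the path the diff labels sm89_dispatch_shape.
    return 80 <= sm_version < 90 and sm_version in (86, 89)


if __name__ == "__main__":
    for cap in [(8, 0), (8, 6), (8, 9)]:
        version = sm_version_from_capability(*cap)
        print(version, uses_small_smem_dispatch(version))
```

Under that assumption, the Triton-side check on `CUDA_CAPABILITY[1]` and the kernel-side check on `sm_version` select the same set of devices.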