sglang
Commit 4a0d1919
Authored Jul 10, 2025 by likesen-alibaba
Committed by GitHub on Jul 10, 2025
Fix bug of deepseek-v3 under DP+EP mode with large batchsize/seqlen (#6449)
Parent: 57482415
Showing 2 changed files with 6 additions and 6 deletions
python/sglang/srt/layers/quantization/fp8_kernel.py (+2 -2)
sgl-kernel/csrc/gemm/per_token_group_quant_8bit.cu (+4 -4)
python/sglang/srt/layers/quantization/fp8_kernel.py
...
...
@@ -160,8 +160,8 @@ def _per_token_group_quant_fp8_colmajor(
     """
     # Map the program id to the row of X and Y it should compute.
     g_id = tl.program_id(0)
-    y_ptr += g_id * group_size
-    y_q_ptr += g_id * group_size
+    y_ptr += g_id.to(tl.int64) * group_size
+    y_q_ptr += g_id.to(tl.int64) * group_size
     # Convert g_id the flattened block coordinate to 2D so we can index
     # into the output y_scales matrix
...
...
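The hunk above widens `g_id` to 64 bits before multiplying by `group_size`. As a rough illustration of why that matters, the sketch below emulates the product in 32-bit signed arithmetic and compares it with the widened result. It is plain Python rather than Triton; the helper names (`wrap_int32`, `offset_32bit`, `offset_64bit`) and the `GROUP_SIZE = 128` value are assumptions made for the example and do not come from the commit.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce an integer to the value a 32-bit signed register would hold."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def offset_32bit(g_id: int, group_size: int) -> int:
    """Pointer offset when the product is computed in 32 bits (pre-fix behavior)."""
    return wrap_int32(g_id * group_size)


def offset_64bit(g_id: int, group_size: int) -> int:
    """Pointer offset when g_id is widened first (post-fix behavior)."""
    # Python ints are arbitrary precision, so this stands in for int64 math.
    return g_id * group_size


GROUP_SIZE = 128  # assumed value, for illustration only
last_safe = INT32_MAX // GROUP_SIZE  # largest g_id whose offset still fits in int32
first_bad = last_safe + 1            # first g_id whose 32-bit offset wraps around

print(offset_32bit(last_safe, GROUP_SIZE), offset_64bit(last_safe, GROUP_SIZE))
print(offset_32bit(first_bad, GROUP_SIZE), offset_64bit(first_bad, GROUP_SIZE))
```

Under these assumptions the two offsets agree for `last_safe` and diverge for `first_bad`, where the 32-bit product becomes negative and would move the pointer before the start of the buffer.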
sgl-kernel/csrc/gemm/per_token_group_quant_8bit.cu
...
...
@@ -35,12 +35,12 @@ __global__ void per_token_group_quant_8bit_kernel(
     const int scale_num_rows = 0,
     const int scale_stride = 0) {
   const int threads_per_group = 16;
-  const int local_group_id = threadIdx.x / threads_per_group;
+  const int64_t local_group_id = threadIdx.x / threads_per_group;
   const int lane_id = threadIdx.x % threads_per_group;
-  const int block_group_id = blockIdx.x * groups_per_block;
-  const int global_group_id = block_group_id + local_group_id;
-  const int block_group_offset = global_group_id * group_size;
+  const int64_t block_group_id = blockIdx.x * groups_per_block;
+  const int64_t global_group_id = block_group_id + local_group_id;
+  const int64_t block_group_offset = global_group_id * group_size;
   float local_absmax = eps;
...
...
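In the CUDA hunk above, `block_group_offset` is an element offset into the flattened input, so it grows with the total number of elements being quantized. The sketch below estimates, for a hypothetical `[num_tokens, hidden_size]` tensor, how many tokens fit before the offset of the last group exceeds the signed 32-bit range. The function names and the `HIDDEN_SIZE = 7168` and `GROUP_SIZE = 128` values are assumptions for illustration; the commit does not state the shapes that triggered the bug.

```python
INT32_MAX = 2**31 - 1


def last_group_offset(num_tokens: int, hidden_size: int, group_size: int) -> int:
    """Element offset of the final quantization group in a flat [tokens, hidden] tensor."""
    if hidden_size % group_size != 0:
        raise ValueError("hidden_size must be a multiple of group_size")
    num_groups = num_tokens * hidden_size // group_size
    return (num_groups - 1) * group_size


def max_safe_tokens(hidden_size: int, group_size: int) -> int:
    """Largest token count whose final group offset still fits in a signed 32-bit int."""
    # last offset = tokens * hidden_size - group_size, which must be <= INT32_MAX
    return (INT32_MAX + group_size) // hidden_size


HIDDEN_SIZE = 7168  # assumed hidden dimension, for illustration only
GROUP_SIZE = 128    # assumed quantization group size, for illustration only

limit = max_safe_tokens(HIDDEN_SIZE, GROUP_SIZE)
print("largest token count with 32-bit-safe offsets:", limit)
print("last offset at the limit:", last_group_offset(limit, HIDDEN_SIZE, GROUP_SIZE))
print("last offset one token later:", last_group_offset(limit + 1, HIDDEN_SIZE, GROUP_SIZE))
```

Past that limit a 32-bit `int` offset can no longer represent the position of the last group, which is consistent with the commit title's mention of large batch size and sequence length. With `int64_t` the same arithmetic stays in range.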