Commit 8723b4f1 (Unverified)
Authored by Elfie Guo on Aug 12, 2025; committed by GitHub on Aug 12, 2025.
Use FlashInfer's TRTLLM FP8 Blockscale GEMM (#8588)
Parent: 62f99e08
Changes: 1 changed file with 3 additions and 3 deletions.

python/sglang/srt/layers/quantization/fp8_utils.py (+3 -3)
```diff
...
@@ -161,16 +161,16 @@ def flashinfer_gemm_w8a8_block_fp8_linear(
     output_shape = [*input.shape[:-1], weight.shape[0]]
     q_input, x_scale = sglang_per_token_group_quant_fp8(
-        input_2d, block_size[1], column_major_scales=False
+        input_2d, block_size[1], column_major_scales=True
     )
+    # TRTLLM requires column-major scaling factors
     output = gemm_fp8_nt_groupwise(
         q_input,
         weight,
         x_scale,
         weight_scale,
-        scale_major_mode="K",
         out_dtype=input_2d.dtype,
+        backend="trtllm",
     )
     if bias is not None:
...
```
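The change flips the activation scales to column-major because, per the added comment, the TRTLLM backend expects that layout. As an illustration only, here is a pure-Python reference of what a block-scaled FP8 "NT" GEMM computes (assumed here to mean activations A of shape [M, K] times weights B of shape [N, K], transposed) and how row-major vs column-major storage index the same logical [M, K/group] activation scale matrix. Assumptions: all helper names are hypothetical, not sglang or FlashInfer APIs; values are divided by their scale but not rounded to the FP8 grid; and the weight's K-block size equals the activation group size (`block_size[1]` in the diff).

```python
# Illustrative sketch only: helper names are hypothetical, not sglang/FlashInfer APIs.
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn magnitude
EPS = 1e-10  # clamp so an all-zero group does not produce a zero scale


def quantize_activations(x, group_size):
    """Per-token-group scaling: one scale per (row m, K-group g)."""
    scales = []
    for row in x:
        assert len(row) % group_size == 0
        scales.append([
            max(max(abs(v) for v in row[g:g + group_size]), EPS) / FP8_E4M3_MAX
            for g in range(0, len(row), group_size)
        ])
    # Values are divided by their group scale but NOT rounded to the FP8 grid.
    xq = [[v / scales[m][k // group_size] for k, v in enumerate(row)]
          for m, row in enumerate(x)]
    return xq, scales


def quantize_weight_blocks(w, block_n, block_k):
    """Block scaling: one scale per (block_n x block_k) tile of the [N, K] weight."""
    N, K = len(w), len(w[0])
    scales = [[
        max(max(abs(w[n][k])
                for n in range(bn, min(bn + block_n, N))
                for k in range(bk, min(bk + block_k, K))), EPS) / FP8_E4M3_MAX
        for bk in range(0, K, block_k)]
        for bn in range(0, N, block_n)]
    wq = [[w[n][k] / scales[n // block_n][k // block_k] for k in range(K)]
          for n in range(N)]
    return wq, scales


def flatten_scales(scales, column_major):
    """Store a logical [M, G] scale matrix in one flat buffer."""
    M, G = len(scales), len(scales[0])
    if column_major:  # element (m, g) lives at offset g * M + m
        return [scales[m][g] for g in range(G) for m in range(M)]
    return [scales[m][g] for m in range(M) for g in range(G)]  # offset m * G + g


def scale_at(buf, m, g, M, G, column_major):
    """Read logical scale (m, g) from a flat buffer in the given layout."""
    return buf[g * M + m] if column_major else buf[m * G + g]


def blockscale_gemm_nt(xq, x_buf, wq, w_scales, group_size, block_n, column_major):
    """out[m][n] = sum_k x[m][k] * w[n][k], accumulated one K-group at a time."""
    M, K, N = len(xq), len(xq[0]), len(wq)
    G = K // group_size
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for g in range(G):
                ks = range(g * group_size, (g + 1) * group_size)
                partial = sum(xq[m][k] * wq[n][k] for k in ks)
                # Dequantize the partial sum; assumes weight block_k == group_size.
                x_s = scale_at(x_buf, m, g, M, G, column_major)
                acc += partial * x_s * w_scales[n // block_n][g]
            out[m][n] = acc
    return out


if __name__ == "__main__":
    x = [[1.0, -2.0, 3.0, 4.0], [0.5, 0.25, -8.0, 2.0]]                      # M=2, K=4
    w = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.5, -0.5, 3.0], [2.0, 2.0, 2.0, 2.0]]  # N=3
    xq, xs = quantize_activations(x, group_size=2)
    wq, ws = quantize_weight_blocks(w, block_n=2, block_k=2)
    for column_major in (False, True):
        buf = flatten_scales(xs, column_major)
        print(column_major, blockscale_gemm_nt(xq, buf, wq, ws, 2, 2, column_major))
```

Both layouts hold the same scale values; only the offset formula differs, so the result matches `x @ w^T` up to float rounding either way. What matters in practice is that the producer (the quantization kernel) and the consumer (the GEMM backend) agree on that formula, which is why the flag and the backend change together in this commit.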