sglang
Commit f3516c28, authored Jan 13, 2025 by Ke Bao, committed by GitHub on Jan 13, 2025
Fix quant kernel accuracy issue (#2865)
Parent: 17de02f9
Showing 1 changed file with 2 additions and 1 deletion
python/sglang/srt/layers/quantization/int8_kernel.py
@@ -22,7 +22,8 @@ def _per_token_quant_int8(
     x = tl.load(x_ptr + row_id * stride_x + cols, mask=mask, other=0.0).to(tl.float32)
     absmax = tl.maximum(tl.max(tl.abs(x)), 1e-10)
     scale_x = absmax / 127
-    x_q = tl.extra.cuda.libdevice.round(x / scale_x).to(tl.int8)
+    x_q = x * (127 / absmax)
+    x_q = tl.extra.cuda.libdevice.round(x_q).to(tl.int8)
 
     tl.store(xq_ptr + row_id * stride_xq + cols, x_q, mask=mask)
     tl.store(scale_ptr + row_id, scale_x)
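The commit message does not say where the inaccuracy came from. One plausible reading is that `x / scale_x` and `x * (127 / absmax)` are not bit-identical in float32, so a value sitting very close to a half-integer can round to a different int8 code. The sketch below is a pure-Python emulation of the two forms, not the Triton kernel: the helpers `f32`, `round_half_away`, `scale_row`, and `quantize_row` are hypothetical names introduced here, and ties are rounded away from zero on the assumption that `libdevice.round` follows C `round` semantics.

```python
import math
import random
import struct


def f32(v: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def round_half_away(v: float) -> int:
    """Round to the nearest integer, ties away from zero (C `round` semantics)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def scale_row(row, use_reciprocal):
    """Scale one row into the int8 range, emulating float32 arithmetic."""
    x = [f32(v) for v in row]
    absmax = max(max(abs(v) for v in x), f32(1e-10))
    scale = f32(absmax / 127.0)
    if use_reciprocal:
        inv = f32(127.0 / absmax)  # form after the commit: x * (127 / absmax)
        scaled = [f32(v * inv) for v in x]
    else:
        scaled = [f32(v / scale) for v in x]  # form before the commit: x / scale_x
    return scaled, scale


def quantize_row(row, use_reciprocal):
    """Per-token int8 quantization of one row; returns (codes, scale)."""
    scaled, scale = scale_row(row, use_reciprocal)
    return [round_half_away(v) for v in scaled], scale


random.seed(0)
row = [random.uniform(-4.0, 4.0) for _ in range(4096)]

s_old, _ = scale_row(row, use_reciprocal=False)
s_new, _ = scale_row(row, use_reciprocal=True)
q_old, _ = quantize_row(row, use_reciprocal=False)
q_new, _ = quantize_row(row, use_reciprocal=True)

pre_round = sum(a != b for a, b in zip(s_old, s_new))
post_round = sum(a != b for a, b in zip(q_old, q_new))
print(f"float32 values that differ before rounding: {pre_round} of {len(row)}")
print(f"int8 codes that differ after rounding: {post_round} of {len(row)}")
```

The two forms can disagree by at most one int8 step, and only for inputs within a few float32 ulps of a rounding tie, so random data may well show no differing codes. If the goal is bit-exact agreement with a reference implementation that multiplies by `127 / absmax`, using the same formulation in the kernel is the simplest way to get it.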