OpenDAS / vllm_cscc
Commit 6ea001cf (Unverified)
Authored Jan 11, 2026 by Vensen
Committed by GitHub, Jan 10, 2026

[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int8 (#31637)

Signed-off-by: vensen <vensenmu@gmail.com>

Parent: 1c46dea0
Showing 1 changed file with 8 additions and 5 deletions (+8 -5)
vllm/model_executor/layers/quantization/utils/int8_utils.py
@@ -122,15 +122,17 @@ def _per_token_quant_int8(
 def per_token_quant_int8(x):
+    original_shape = x.shape
+    if x.dim() > 2:
+        x = x.view(-1, original_shape[-1])
     M = x.numel() // x.shape[-1]
     N = x.shape[-1]
-    x_q = torch.empty_like(x, device=x.device, dtype=torch.int8)
-    scales = torch.empty(x.shape[:-1] + (1,), device=x.device, dtype=torch.float32)
+    x_q = torch.empty((M, N), device=x.device, dtype=torch.int8)
+    scales = torch.empty((M, 1), device=x.device, dtype=torch.float32)
     BLOCK = triton.next_power_of_2(N)
     # heuristics for number of warps
     num_warps = min(max(BLOCK // 256, 1), 8)
-    assert x.is_contiguous()
+    x = x.contiguous()
     _per_token_quant_int8[(M,)](
         x,
         x_q,
@@ -142,7 +144,8 @@ def per_token_quant_int8(x):
         num_warps=num_warps,
         num_stages=1,
     )
+    x_q = x_q.view(*original_shape)
+    scales = scales.view(*original_shape[:-1], 1)
     return x_q, scales
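The change from `assert x.is_contiguous()` to `x = x.contiguous()` can be illustrated without PyTorch. The sketch below is not the Triton kernel from this file. It assumes, hypothetically, a reader that walks each row with stride 1 along the last dimension, which the diff does not show. The helper names `read_row_unit_stride` and `make_contiguous` are made up for illustration.

```python
def read_row_unit_stride(storage, row, row_stride, n):
    """Read n elements of one row, assuming the last dimension has stride 1."""
    start = row * row_stride
    return storage[start:start + n]


def make_contiguous(storage, shape, strides):
    """Copy a 2-D strided view into a fresh row-major buffer."""
    rows, cols = shape
    row_stride, col_stride = strides
    return [
        storage[r * row_stride + c * col_stride]
        for r in range(rows)
        for c in range(cols)
    ]


# A 2x3 row-major buffer: [[0, 1, 2], [3, 4, 5]].
storage = [0, 1, 2, 3, 4, 5]

# Its transpose is a 3x2 view over the same storage with strides (1, 3).
t_shape, t_strides = (3, 2), (1, 3)

# Logical row 1 of the transpose, read with the true strides.
expected = [
    storage[1 * t_strides[0] + c * t_strides[1]] for c in range(t_shape[1])
]

# Reading the same row with the unit-stride assumption picks up
# neighbouring storage elements instead of the logical row.
wrong = read_row_unit_stride(storage, 1, t_strides[0], t_shape[1])

# After an explicit copy, the row stride equals the row length
# and the unit-stride read returns the logical row.
packed = make_contiguous(storage, t_shape, t_strides)
right = read_row_unit_stride(packed, 1, t_shape[1], t_shape[1])

print(expected, wrong, right)
```

Under that assumption, the old assertion would reject a transposed input outright, while the new copy lets it through at the cost of one extra allocation when the input is not already contiguous.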