OpenDAS / vllm_cscc
Commit ffab74dd, authored Sep 09, 2025 by jujl1
fix: optimize weight reordering time in w4a8 marlin
Parent: bc9aee38
Showing 1 changed file with 5 additions and 5 deletions
vllm/model_executor/layers/quantization/utils/w4a8_utils.py
@@ -54,12 +54,12 @@ def marlin_weights(q_w,weight_perm,k_tile=32,n_tile=64,pack_factor=8):
     q_w = q_w.reshape((-1, weight_perm.numel()))[:, weight_perm].reshape(q_w.shape)
     orig_device = q_w.device
-    q_w = q_w.cpu().numpy().astype(np.uint32)
-    q_packed = np.zeros((q_w.shape[0], q_w.shape[1] // pack_factor), dtype=np.uint32)
+    q_w = q_w.contiguous().to(torch.int32)
+    M, N = q_w.shape
+    assert N % pack_factor == 0, f"size_n ({N}) must be divisible by pack_factor ({pack_factor})"
+    q_packed = torch.zeros((M, N // pack_factor), dtype=torch.int32, device=orig_device)
     for i in range(pack_factor):
-        q_packed |= q_w[:, i::pack_factor] << 4 * i
-    q_packed = torch.from_numpy(q_packed.astype(np.int32)).to(orig_device)
+        q_packed += q_w[:, i::pack_factor] << (4 * i)
     return q_packed
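
The packing loop in this change can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the repository's code: `pack_nibbles`, `unpack_nibbles`, and `to_int32` are hypothetical helper names, and the sketch assumes every input value fits in 4 bits (0 to 15). Under that assumption the shifted values never overlap, which is why accumulating with `+=` yields the same bits as the earlier `|=`. It packs a single row, sending element `i` of each group of `pack_factor` values to bits `4*i` through `4*i+3` of the output word.

```python
def pack_nibbles(row, pack_factor=8):
    """Pack 4-bit values into 32-bit words, one word per group of pack_factor values."""
    if len(row) % pack_factor != 0:
        raise ValueError(f"length ({len(row)}) must be divisible by pack_factor ({pack_factor})")
    if any(not 0 <= v <= 15 for v in row):
        raise ValueError("every value must fit in 4 bits (0..15)")
    packed = [0] * (len(row) // pack_factor)
    for i in range(pack_factor):
        # row[i::pack_factor] plays the role of q_w[:, i::pack_factor] in the diff.
        for j, v in enumerate(row[i::pack_factor]):
            # Nibbles occupy disjoint bit ranges, so += and |= give the same result.
            packed[j] += v << (4 * i)
    return packed


def unpack_nibbles(packed, pack_factor=8):
    """Inverse of pack_nibbles: extract pack_factor 4-bit values from each word."""
    return [(word >> (4 * i)) & 0xF for word in packed for i in range(pack_factor)]


def to_int32(word):
    """Reinterpret an unsigned 32-bit value as a signed two's-complement int32."""
    return word - (1 << 32) if word >= (1 << 31) else word


row = [1, 2, 3, 4, 5, 6, 7, 8, 15, 0, 0, 0, 0, 0, 0, 15]
packed = pack_nibbles(row)
print([hex(w) for w in packed])        # ['0x87654321', '0xf000000f']
print([to_int32(w) for w in packed])   # signed view of the same bit patterns
print(unpack_nibbles(packed) == row)   # True
```

When the last value in a group is 8 or larger, bit 31 is set, so the same bit pattern reads as a negative number in a signed 32-bit view. The sketch keeps Python's unbounded integers and converts explicitly with `to_int32` to make that visible.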