OpenDAS / vllm_cscc
Commit 8ba8a855
authored Dec 18, 2025 by 王敏
[fix] Resolve startup error with w8a8 low-latency cudagraph
parent 0ae68da1
Showing 1 changed file with 2 additions and 2 deletions
vllm/model_executor/layers/quantization/utils/w8a8_utils.py
...
...
@@ -471,9 +471,9 @@ def apply_int8_linear(
             m_ = m
             #best_config=W8A8_TRITONJSON.triton_json_dict[0][f"{m}_{n}_{k}"]
         elif m <= 64:
-            m_ = (m + 3) & -4 # round up to the nearest multiple of 4
+            m_ = (m + 3) // 4 * 4 #(m + 3) & -4 # round up to the nearest multiple of 4
         elif m <= 160:
-            m_ = (m + 7) & -8
+            m_ = (m + 7) // 8 * 8 # (m + 7) & -8
         elif m < 200: #256
             m_ = 160
...
...
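The change above swaps a bitmask round-up for a floor-division round-up. For plain Python integers the two forms give the same result whenever the multiple is a power of two, which the minimal sketch below checks over the ranges used in the diff. The helper names `round_up_bitmask` and `round_up_floordiv` are hypothetical and do not appear in `w8a8_utils.py`. The commit does not state why the bitwise form failed under cudagraph; this sketch only illustrates that the replacement is arithmetically equivalent.

```python
def round_up_bitmask(m: int, multiple: int) -> int:
    """Old form from the diff: correct only when `multiple` is a power of two."""
    return (m + multiple - 1) & -multiple


def round_up_floordiv(m: int, multiple: int) -> int:
    """New form from the diff: uses only addition, floor division and multiplication."""
    return (m + multiple - 1) // multiple * multiple


# Check equivalence over the branches touched by the commit
# (m <= 64 rounds to a multiple of 4, m <= 160 rounds to a multiple of 8).
for m in range(1, 65):
    assert round_up_bitmask(m, 4) == round_up_floordiv(m, 4)
for m in range(65, 161):
    assert round_up_bitmask(m, 8) == round_up_floordiv(m, 8)

print(round_up_floordiv(5, 4))    # 8
print(round_up_floordiv(64, 4))   # 64
print(round_up_floordiv(65, 8))   # 72
```

The floor-division form also stays correct for multiples that are not powers of two, whereas the bitmask form does not; for example, `round_up_bitmask(7, 6)` does not return 12.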