OpenDAS / vllm_cscc
Commit 4f605a6d (Unverified)
Authored May 08, 2025 by Michael Goin; committed by GitHub, May 08, 2025

Fix noisy warning for uncalibrated q_scale/p_scale (#17414)

Signed-off-by: mgoin <mgoin64@gmail.com>

Parent: 8342e3ab

Showing 1 changed file with 5 additions and 4 deletions

vllm/model_executor/layers/quantization/kv_cache.py (+5 -4)

@@ -124,11 +124,12 @@ class BaseKVCacheMethod(QuantizeMethodBase):
         # These are used in the final Attention.forward()
         layer._q_scale.copy_(q_scale)
         layer._prob_scale.copy_(prob_scale)
-        if q_scale == 1.0 or prob_scale == 1.0:
+        if layer.kv_cache_dtype == "fp8" and (q_scale == 1.0
+                                              or prob_scale == 1.0):
             logger.warning_once(
-                f"Using Q scale {q_scale} and prob scale {prob_scale} "
-                "with fp8 attention. This may cause accuracy issues. "
-                "Please make sure Q/prob scaling factors are "
+                f"Using uncalibrated q_scale {q_scale} and/or prob_scale "
+                f"{prob_scale} with fp8 attention. This may cause accuracy "
+                "issues. Please make sure q/prob scaling factors are "
                 "available in the fp8 checkpoint.")
 
         del layer.k_scale
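
The effect of the tightened condition can be sketched in isolation. The snippet below is a minimal, standalone illustration and not part of the commit: the helper names `warns_before` and `warns_after` are hypothetical, plain floats stand in for the scale values, and `"auto"` is used only as an example of a non-fp8 `kv_cache_dtype` value. It compares the old and new guards from the hunk above.

```python
# Hypothetical helpers mirroring the two conditions in the diff.
# They do not import or call vLLM; plain floats stand in for the scales.

def warns_before(kv_cache_dtype: str, q_scale: float, prob_scale: float) -> bool:
    """Old guard: warn whenever either scale equals 1.0, regardless of dtype."""
    return q_scale == 1.0 or prob_scale == 1.0


def warns_after(kv_cache_dtype: str, q_scale: float, prob_scale: float) -> bool:
    """New guard: warn only when the KV cache dtype is "fp8" and a scale is 1.0."""
    return kv_cache_dtype == "fp8" and (q_scale == 1.0 or prob_scale == 1.0)


if __name__ == "__main__":
    cases = [
        ("auto", 1.0, 1.0),  # non-fp8 cache with scales of 1.0
        ("fp8", 1.0, 0.5),   # fp8 cache, q_scale left at 1.0
        ("fp8", 0.5, 0.5),   # fp8 cache, both scales set to other values
    ]
    for dtype, q, p in cases:
        print(dtype, q, p, warns_before(dtype, q, p), warns_after(dtype, q, p))
```

Under these assumptions, the only case whose behavior changes is the non-fp8 one: the old guard warned there, while the new guard stays silent. This matches the commit title's description of the earlier warning as noisy.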