Commit 3beb57a2 authored by Yan Ma, committed by GitHub

[XPU] properly handle q_descale on XPU as quant query input not supported (#39676)


Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
parent 8b553193
@@ -1031,7 +1031,9 @@ class FlashAttentionImpl(AttentionImpl):
 window_size=sliding_window_size,
 softcap=self.logits_soft_cap,
 fa_version=self.vllm_flash_attn_version,
-q_descale=layer._q_scale.expand(descale_shape),
+q_descale=layer._q_scale.expand(descale_shape)
+if self.supports_quant_query_input
+else None,
 k_descale=layer._k_scale.expand(descale_shape),
 v_descale=layer._v_scale.expand(descale_shape),
 num_splits=1 if self.batch_invariant_enabled else 0,
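
The change above passes `q_descale` only when the backend supports quantized query input, and passes `None` otherwise, while `k_descale` and `v_descale` are always passed. A minimal standalone sketch of that pattern follows. `FakeScale` and `build_descale_kwargs` are hypothetical names introduced for illustration; they are not part of vLLM, and `FakeScale` only mimics the `.expand` call so the example runs without a tensor library.

```python
from typing import Any, Dict, Optional, Tuple


class FakeScale:
    """Hypothetical stand-in for a scale tensor; it only mimics `.expand`."""

    def __init__(self, value: float) -> None:
        self.value = value

    def expand(self, shape: Tuple[int, int]) -> list:
        # Return a nested list of the requested 2-D shape filled with the scalar.
        rows, cols = shape
        return [[self.value] * cols for _ in range(rows)]


def build_descale_kwargs(
    q_scale: FakeScale,
    k_scale: FakeScale,
    v_scale: FakeScale,
    descale_shape: Tuple[int, int],
    supports_quant_query_input: bool,
) -> Dict[str, Optional[Any]]:
    """Assemble descale keyword arguments (illustrative helper, not vLLM API)."""
    return {
        # The query descale is passed only if quantized query input is supported.
        "q_descale": q_scale.expand(descale_shape)
        if supports_quant_query_input
        else None,
        # Key and value descales are passed unconditionally, as in the diff.
        "k_descale": k_scale.expand(descale_shape),
        "v_descale": v_scale.expand(descale_shape),
    }


# Example: a backend without quantized query support gets q_descale=None.
kwargs = build_descale_kwargs(
    FakeScale(0.5), FakeScale(0.25), FakeScale(2.0), (2, 3), False
)
print(kwargs["q_descale"])
```

Keeping the condition inline at the call site, as the commit does, leaves the other arguments untouched and limits the behavioral change to backends that report no support for quantized query input.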